GENOMICS 12, 13-17 (1992)

Exon Skipping in Human β-Casein

RAVI S. MENON, YING-FON CHANG, KATHLEEN F. JEFFERS, AND RICHARD G. HAM

Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, Colorado 80309

Received June 13, 1991; revised September 18, 1991
Previous interspecies comparisons of amino acid sequences of β-caseins showed that the human protein was offset by nine residues, which in earlier alignments was depicted as an N-terminal truncation and a C-terminal extension in human β-casein. It was speculated that this could reflect a difference in the number or location of exons in the human gene (Greenberg et al., 1984; Holt and Sawyer, 1988). Here we report the cloning of a segment of the human β-casein gene spanning exons 2 to 4 and the presence of a sequence that codes for exon 3 in the intervening sequence. Comparative sequence analysis reveals a possible mechanism for the omission of exon 3 in the mature β-casein message.
Earlier amino acid alignments of mature β-caseins showed that the human protein was shifted in alignment relative to other species, with amino acid deletions in the N-terminal region and others inserted in the C-terminal region. Our alignment, based on cDNA sequences and their translation products, has shown that the amino acid deletions correspond exactly to exon 3 in the other species. Cloning and sequencing of a segment of the human β-casein gene between exons 2 and 4 revealed the presence of an intact exon 3 sequence in the gene. An interruption of the polypyrimidine tract adjacent to the 5' end of exon 3 sequence may account for the omission of the exon from human β-casein mRNA. © 1992 Academic Press, Inc.
INTRODUCTION

β-Casein is the major casein in human milk, accounting for as much as 30% of its total protein mass. Apart from being the primary source of essential amino acids, β-casein in concert with κ-casein forms micelles that transport calcium and phosphorus to the developing infant. cDNAs for β-caseins from six species have been cloned (Menon and Ham, 1989a, 1989b; Lönnerdahl et al., 1990; Baev et al., 1987; Stewart et al., 1987; Jiminez-Florez et al., 1987; Blackburn et al., 1982; Schaerer et al., 1988; Yoshimura et al., 1986; Provot et al., 1989) and genomic clones for four species are now available (Bonsing et al., 1988; Jones et al., 1986; Yoshimura and Oka, 1989; Thépot et al., 1991). Caseins are among the most rapidly evolving proteins (Dayhoff, 1976). Nevertheless, a comparison of all available β-casein amino acid sequences reveals the presence of a number of well-conserved residues distributed along the entire length of the protein (Greenberg et al., 1984; Holt and Sawyer, 1988). These residues are thought to play an important role in conserving the three-dimensional structure of β-casein (Holt and Sawyer, 1988).

MATERIALS AND METHODS
Sequence Alignment

Amino acid and nucleic acid sequence comparisons were performed using an iterative procedure (Feng and Doolittle, 1987) from the EuGene sequence analysis package obtained from the Molecular Biology Information Resource at Baylor College, Houston, TX. Manual adjustment of the computer-generated global alignment was performed to maximize local alignment.

Polymerase Chain Reaction (PCR)
Primers used in polymerase chain reactions were synthesized at Operon Technologies Inc., CA, based on the known cDNA sequence of human β-casein (Menon and Ham, 1989b). The 5' primer was a 38mer that coded for most of the signal peptide and resides entirely within human exon 2. The 3' primer was a 42mer that was complementary to the entire length of exon 6. Total human placental genomic DNA, obtained from a single individual, was purchased from Clontech, CA. Approximately 250 ng genomic DNA was used per reaction. PCR was performed in a 50-μl volume containing 50 pmol of each of the primers, 0.2 mM each of dATP, dCTP, dGTP, and dTTP, 20 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2, and 1.25 U Thermus aquaticus (Taq) DNA polymerase (Perkin-Elmer/Cetus Corp.) in a DNA thermal cycler (Perkin-Elmer/Cetus) for 32 cycles. Optimum target amplification was achieved using the following profile: denaturation was at 95°C for 5 min in the first cycle, 2 min in the next 8 cycles, 1 min in the following 22 cycles, and 1 min in the last cycle; annealing and extension were performed at 65°C for 2 min and 72°C for 3 min, respectively, in all except the last cycle, in which extension was prolonged to 6 min. Amplification products were analyzed by agarose gel electrophoresis.

The product of the first PCR was then used in a second PCR to generate a subfragment spanning exons 2 to 4, using the 5' primer described above and a 3' primer complementary to the entire length of human exon 4. This subfragment facilitated sequencing of the region upstream of exon 4, without having to generate a nested set of deletion clones to acquire sequence information beyond the range of the sequencing primers, which would have otherwise been necessary on account of the long intervening sequence.

The human sequence data reported in this article have been deposited with the GenBank Data Libraries under Accession No. M69198.

0888-7543/92 $3.00. Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.

Subcloning and Sequencing
The amplified target sequence spanning exons 2 to 4 was filled in with Klenow to ensure that the ends were blunt and ligated into the SmaI site of pGEM7Zf(+) (Promega Corp.). Circular double-stranded recombinant plasmids were sequenced by the dideoxy chain termination method using the Sequenase kit (United States Biochemical Corp.) as described earlier (Kraft et al., 1988). The sequence obtained was confirmed from several independent clones to rule out the possibility of amplification-mediated errors in the PCR product.

RESULTS AND DISCUSSION
We cloned and sequenced the cDNA for human β-casein (Menon and Ham, 1989b) and aligned the nucleotide sequence and the translated amino acid sequence with β-caseins from all other available species. Manual adjustment of the computer-generated global alignment was performed to maximize local alignment. The precise codon alignment that was achieved made it possible to project exon boundaries from published mouse, rat, bovine, and rabbit genomic sequences to the cDNA sequences for sheep and human. This in turn revealed that the apparent N-terminal truncation of human β-casein was actually due to the absence of an exon that is otherwise present in all five of the other species (Fig. 1). The C-terminal extension, on the other hand, is due to an elongation of exon 7 (Menon and Ham, unpublished data).

Exon 2, which contains 13 bp of the 5'-noncoding region, the entire signal peptide, and the first two amino acids of the mature protein, is highly conserved and is present in its entirety in all six species, including human. However, exon 3, which codes for the next nine amino acids in the other five species, is totally absent in the human cDNA. Exon 4, which codes for the phosphorylation sites, is well conserved in all six species and is directly spliced to exon 2 in the human cDNA.

FIG. 1. Amino acid alignment of exons 2 through 4 for β-caseins from six species. Arrow indicates N-terminal amino acid of mature protein. Gaps were introduced to maximize homology.

To ascertain whether the exon 3 sequence is present in the human β-casein gene, we subcloned a segment of the gene extending from exon 2 to exon 4 using PCR. Sequencing upstream from exon 4 and downstream from exon 2 revealed that exon 3 sequence is present in the human β-casein gene, as is evident from sequence homology to exon 3 from the other species (Figs. 2a and 2b). The first three amino acids of the human exon 3 are identical to the rabbit sequence and the last three amino acids are identical to mouse and rat sequences (Fig. 2b). Thus, the absence of exon 3 in human β-casein mRNA results from a splicing anomaly.

Three cis elements contribute to normal splice acceptor function: (1) a YAG consensus sequence (the 3' splice site) at the intron/exon boundary, with a near absolute requirement for A and G; (2) a polypyrimidine tract, usually located just upstream from the YAG; and (3) a branch point consensus sequence usually 20 to 40 bp upstream from the 3' splice site (Padgett et al., 1986). Mutations in any of these elements can result in splicing anomalies such as exon skipping, reduced splicing efficiency, abolition of splicing, or activation of cryptic splice sites.

Comparison of the sequence upstream from human exon 3 with genomic sequences from other species shows that the YAG sequence adjacent to human exon 3 is unaltered (Fig. 2a). The mammalian branch point consensus sequence is YNYURAY (Smith et al., 1989), with YURAY being the most critical. The bovine genomic sequence has a putative branch point sequence UAUUUAAU (YNYYURAY) located about 45 bp upstream from exon 3.
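The consensus strings used here follow the IUPAC ambiguity codes (Y = pyrimidine, R = purine, N = any base). As a minimal sketch, the Python below checks the putative branch point octamers quoted in this section against YNYYURAY and computes the pyrimidine fraction of a candidate tract. The function names and the two 12-base example tracts are illustrative assumptions invented for this sketch; they are not taken from Fig. 2a and this is not the procedure used by the authors.

```python
import re

# IUPAC ambiguity codes needed for the consensus strings quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "R": "AG",    # purine
         "Y": "CU",    # pyrimidine
         "N": "ACGU"}  # any base


def consensus_regex(consensus):
    """Compile an IUPAC consensus such as 'YNYYURAY' into a regular expression."""
    return re.compile("".join("[" + IUPAC[code] + "]" for code in consensus.upper()))


def matches_consensus(seq, consensus):
    """Return True if the whole of seq fits the consensus."""
    return consensus_regex(consensus).fullmatch(seq.upper()) is not None


def pyrimidine_fraction(tract):
    """Return the fraction of C and U bases in a candidate polypyrimidine tract."""
    tract = tract.upper()
    return sum(base in "CU" for base in tract) / len(tract)


# Putative branch point octamers as quoted in the text.
BRANCH_POINTS = {
    "bovine": "UAUUUAAU",
    "rabbit": "CAUUUAAU",
    "human": "UGUUUAAC",
    "mouse/rat": "UGUUUAGU",
}

# Hypothetical 12-base tracts, invented for illustration only.
INTACT_TRACT = "UUUUCUCUUUCC"
INTERRUPTED_TRACT = "UUCAAAUAUUCC"  # four adenines, three of them contiguous

if __name__ == "__main__":
    for species, octamer in BRANCH_POINTS.items():
        print(species, octamer, matches_consensus(octamer, "YNYYURAY"))
    print(pyrimidine_fraction(INTACT_TRACT), pyrimidine_fraction(INTERRUPTED_TRACT))
```

Under these assumptions the bovine, rabbit, and human octamers fit YNYYURAY, whereas the mouse and rat octamer fails only at the branch point adenine and instead fits YNYYURGY. A full-length match is used rather than a search because each octamer is already an aligned candidate site.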
The rabbit genomic sequence also has a putative branch point sequence CAUUUAAU (YNYYURAY) located at exactly the same position as in the bovine sequence (Fig. 2a). The critical elements of this sequence are all conserved in a UGUUUAAC (YNYYURAY) sequence at a similar location in the human genome. Homologous regions also occur at comparable locations in the mouse and rat sequences, but in both cases they are changed to UGUUUAGU (YNYYURGY), which is not expected to be functional. There are alternative YURAY sequences closer to the polypyrimidine tracts, which may serve as branch points in these species (Fig. 2a).

FIG. 2. (a) Alignment of pre-mRNA sequences (inferred from genomic sequence data) from bovine, human, rabbit, rat, and mouse β-caseins spanning the 3' portion of intron 2 to exon 4. Gaps were introduced to maximize homology. Bases in uppercase characters are exons (including the unexpressed human exon 3 sequence) and those in lowercase are intron sequences. The underlined adenine residues are putative branch points. Adenine residues that interrupt the polypyrimidine tract are shown with asterisks above. (b) Alignment of translation products of exon 3 from cow, sheep, rabbit, human, mouse, and rat.

The most likely cause of exon 3 skipping in human β-casein appears to be a defective polypyrimidine tract. There are a total of four adenines that interrupt the polypyrimidine tract between positions -10 and -14 from the first base of the human exon 3, including three that are contiguous (Fig. 2a). A limited number of purine substitutions at any position within the polypyrimidine tract have little effect on acceptor function, provided that a critical density of polypyrimidines is retained (Padgett et al., 1986). However, excessive substitution can block splice acceptor function. In the α-tropomyosin transcript, exon 1 is spliced to exon 3 by default. The basis for this pattern of splicing has been shown to be a function of the relative strengths of the polypyrimidine tracts and branch points upstream of each of those exons. The polypyrimidine tract upstream of exon 2 in the α-tropomyosin transcript is shorter and has more purine interruptions than its counterpart upstream of exon 3. As a result, exon 3 outcompetes exon 2 for trans-acting factors and is spliced directly to exon 1 (Nadal-Ginard, 1990). An extreme example is seen in the case of a variant of the β-thalassemia gene, in which a single transversional event in the polypyrimidine tract in intron 2 is the only abnormality that results in a thalassemia phenotype (Beldjord et al., 1988). Comparison of the sequences in Fig. 2a shows that the polypyrimidine tract upstream from the human exon 3 sequence has both the lowest pyrimidine density and the lowest total number of pyrimidines.

Mutations in the 5' splice site are also known to cause skipping of the preceding exon (DiLella et al., 1986; Cole et al., 1990). The 5' splice site immediately following the human exon 3 sequence, however, agrees with the eukaryotic 5' splice site consensus GURAGU (Fig. 2a) and cannot account for the omission of exon 3. In view of the lack of other obvious splicing defects, it appears that interruption of the polypyrimidine tract is the cause for failure of splicing of exon 3, resulting in exon 2 being spliced directly to exon 4. No firm conclusions can be drawn, however, until in vitro mutagenesis studies demonstrate that restoration of an uninterrupted polypyrimidine tract will restore splicing of exon 3.

Mutations that result in exon skipping in certain members within a species are not unusual and have been reported for proteins as diverse as human Type III procollagen (Kuivaniemi et al., 1990) and the human retinoblastoma gene product (Horowitz et al., 1989). In addition, a variant form of αs1-casein in goats that correlates with low milk yield has been shown to lack an exon that is present in αs1-caseins from other members of the species (Brignon et al., 1990).

The original amino acid sequence (Greenberg et al., 1984) and our cDNA sequence (Menon and Ham, 1989b) are from two different individuals, both of which confirm the absence of exon 3 in the mature human β-casein mRNA. Moreover, two earlier reports on the amino acid content of human β-casein provide additional indirect evidence in support of this view (Chtourou et al., 1985; Groves and Gordon, 1970).
Both studies reported a total of 10 serine residues, which matches the figure for the published amino acid sequence of human β-casein (Greenberg et al., 1984) and the translation product of our cDNA (Menon and Ham, 1989b). Inclusion of exon 3 would have increased the total number of serine residues to 13. Lönnerdahl et al. (1990) have recently confirmed the human β-casein cDNA sequence, but their sequence adds no further evidence in support, since it was obtained from the same cDNA library as ours.

The physiological consequence of the lack of the nine residues coded for by exon 3 is not clear, but it is worth noting that the human exon 3 codes for two additional phosphorylation sites (Fig. 2b, serine residues 7 and 8). The N-terminal phosphoserine/phosphothreonine amino acids of β-casein are crucial to the biological function of the molecule, and variations in their number could affect the overall quality of milk (Brignon et al., 1990). A broader sampling will be required for a firm conclusion that exon 3 is never expressed in human β-casein. Nevertheless, the lack of expression of exon 3 is at the very least a frequent occurrence in the human population and may well be species specific.

ACKNOWLEDGMENTS

This research was supported by Grant CA30028 from the National Cancer Institute. We thank the W. M. Keck Foundation for generous support of RNA science on the Boulder campus.
REFERENCES

1. BAEV, A. A., SMIRNOV, I. K., AND GORODETSKII, I. I. (1987). Primary structure of bovine β-casein cDNA. Mol. Biol. (Moscow) 21: 255-265.
2. BELDJORD, C., LAPOUMEROULIE, C., PAGNIER, J., BENABADJI, M., KRISHNAMOORTHY, R., LABIE, D., AND BANK, A. (1988). A novel β-thalassemia gene with a single base mutation in the conserved polypyrimidine sequence at the 3' end of IVS 2. Nucleic Acids Res. 16: 4927-4935.
3. BLACKBURN, D. E., HOBBS, A. A., AND ROSEN, J. M. (1982). Rat β-casein cDNA: Sequence analysis and evolutionary comparisons. Nucleic Acids Res. 10: 2295-2307.
4. BONSING, J., RING, J. M., STEWART, A. F., AND MACKINLAY, A. G. (1988). Complete nucleotide sequence of the bovine β-casein gene. Aust. J. Biol. Sci. 41: 527-537.
5. BRIGNON, G., MAHE, M. F., AND RIBADEAU-DUMAS, B. (1990). Two of the three genetic variants of goat αs1-casein which are synthesized at a reduced level have an internal deletion possibly due to altered RNA splicing. Eur. J. Biochem. 193: 237-241.
6. CHTOUROU, A., BRIGNON, G., AND RIBADEAU-DUMAS, B. (1985). Quantification of β-casein in human milk. J. Dairy Res. 52: 239-247.
7. COLE, W. G., CHIODO, A. A., LAMANDE, S. R., JANECZKO, R., RAMIREZ, F., DAHL, M., CHAN, D., AND BATEMAN, J. F. (1990). A base substitution at a splice site in the COL3A1 gene causes exon skipping and generates abnormal type III procollagen in a patient with Ehlers-Danlos syndrome type IV. J. Biol. Chem. 265: 17070-17077.
8. DAYHOFF, M. O. (1976). "Atlas of Protein Sequence and Structure," National Biomedical Research Foundation, Silver Spring, MD.
9. DILELLA, A. G., MARVIT, J., LIDSKY, A. S., GÜTTLER, F., AND WOO, S. L. C. (1986). Tight linkage between a splicing mutation and a specific DNA haplotype in phenylketonuria. Nature 322: 799-803.
10. FENG, D. F., AND DOOLITTLE, R. F. (1987). Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25: 351-360.
11. GREENBERG, R., GROVES, M. L., AND DOWER, H. J. (1984). Human β-casein: Amino acid sequence and identification of phosphorylation sites. J. Biol. Chem. 259: 5132-5138.
12. GROVES, M. L., AND GORDON, W. G. (1970). The major component of human casein: A protein phosphorylated at different levels. Arch. Biochem. Biophys. 140: 47-51.
13. HOLT, C., AND SAWYER, L. (1988). Primary and predicted secondary structures of the caseins in relation to their biological functions. Protein Eng. 2: 251-259.
14. HOROWITZ, J. M., YANDELL, D. W., PARK, S.-H., CANNING, S., WHYTE, P., BUCHKOVICH, K., HARLOW, E., WEINBERG, R. A., AND DRYJA, T. P. (1989). Science 243: 937-940.
15. JIMINEZ-FLOREZ, R., KANG, Y. C., AND RICHARDSON, T. (1987). Cloning and sequence analysis of bovine β-casein cDNA. Biochem. Biophys. Res. Commun. 142: 617-621.
16. JONES, W. K., YU-LEE, L-Y., CLIFT, S. M., BROWN, T. L., AND ROSEN, J. M. (1986). The rat casein multigene family: Fine structure and evolution of the β-casein gene. J. Biol. Chem. 260: 7042-7050.
17. KRAFT, R., TARDIFF, J., KRAUTER, K. S., AND LEINWAND, L. A. (1988). Using mini-prep plasmid DNA for sequencing double stranded templates with Sequenase. BioTechniques 6: 544-549.
18. KUIVANIEMI, H., KONTUSAARI, S., TROMP, G., ZHAO, M., SABOL, C., AND PROCKOP, D. J. (1990). Identical G+1 to A mutations in three different introns of the type III procollagen gene (COL3A1) produce different patterns of RNA splicing in three variants of Ehlers-Danlos syndrome IV. J. Biol. Chem. 265: 12067-12074.
19. LÖNNERDAHL, B., BERGSTROM, S., AND ANDERSSON, Y. (1990). Cloning and sequencing of a cDNA encoding human milk β-casein. FEBS Lett. 269: 153-156.
20. MENON, R. S., AND HAM, R. G. (1989a). Human β-casein cDNA: Partial cDNA sequence and apparent polymorphism. Nucleic Acids Res. 17: 2869.
21. MENON, R. S., AND HAM, R. G. (1989b). EMBL Accession No. X17070.
22. NADAL-GINARD, B. (1990). Muscle cell differentiation and alternative splicing. Curr. Opinions Cell Biol. 2: 1058-1064.
23. PADGETT, R. A., GRABOWSKI, P. J., KONARSKA, M. M., SEILER, S., AND SHARP, P. A. (1986). Splicing of messenger RNA precursors. Annu. Rev. Biochem. 55: 1119-1150.
24. PROVOT, C., PERSUY, M-A., AND MERCIER, J-C. (1989). Complete nucleotide sequence of ovine β-casein cDNA: Inter-species comparison. Biochimie 71: 827-832.
25. SCHAERER, E., DEVINOY, E., KRAEHENBUHL, J-P., AND HOUDEBINE, L-M. (1988). Sequence of the rabbit β-casein: Comparison with other casein cDNA sequences. Nucleic Acids Res. 16: 11814.
26. SMITH, C. W. J., PATTON, J. G., AND NADAL-GINARD, B. (1989). Alternative splicing in the control of gene expression. Annu. Rev. Genet. 23: 527-577.
27. STEWART, A. F., BONSING, J., BEATTIE, C. W., SHAH, F., WILLIS, I. A., AND MACKINLAY, A. J. (1987). Complete nucleotide sequences of bovine αs2- and β-casein cDNAs: Comparison with related sequences in other species. Mol. Biol. Evol. 4: 231-241.
28. THÉPOT, D., DEVINOY, E., FONTAINE, M. L., AND HOUDEBINE, L. M. (1991). Structure of the gene encoding rabbit β-casein. Gene 97: 301-306.
29. YOSHIMURA, M., BANNERJEE, M. R., AND OKA, T. (1986). Nucleotide sequence of a cDNA encoding mouse β-casein. Nucleic Acids Res. 14: 8224.
30. YOSHIMURA, M., AND OKA, T. (1989). Isolation and structural analysis of the mouse β-casein gene. Gene 78: 267-275.