Archs oral Biol. Vol. 42, No. 7, pp. 489-496, 1997 © 1997 Elsevier Science Ltd. All rights reserved. Printed in Great Britain. PII: S0003-9969(97)00039-3 0003-9969/97 $17.00 + 0.00
MOLECULAR CLONING AND CHARACTERIZATION OF THE BOVINE TUFTELIN GENE
M. M. BASHIR,* W. R. ABRAMS and J. ROSENBLOOM
Research Center in Oral Biology and Department of Anatomy and Histology, University of Pennsylvania School of Dental Medicine, Philadelphia, PA 19104, U.S.A.
(Accepted 4 April 1997)
Summary--The bovine tuftelin gene was cloned and its structure determined by DNA sequence analysis and comparison to that of bovine tuftelin cDNA. The analyses demonstrated that the cDNA contains a 1014-bp open reading frame encoding a protein of 338 residues with a calculated mol. wt of 38630 and an isoelectric point of 5.85. These results differ from those previously published (Deutsch et al., 1991), which contained a different conceptual amino acid sequence for the carboxy-terminal region and identified a different termination codon. The protein does not appear to share homology or domain motifs with any other known protein. The gene consists of 13 exons ranging in size from 66 to 1531 bp, the latter containing the encoded carboxy-terminal and 3' untranslated regions. The exons are embedded in more than 28 kbp of genomic DNA. Codons are generally not divided at exon/intron borders. Several alternatively spliced transcripts were identified by DNA sequence analysis of the isolated products of reverse transcriptase/polymerase chain reaction. © 1997 Elsevier Science Ltd
Key words: enamel matrix; gene structure; tuftelin; alternative splicing; amelogenins.
INTRODUCTION
During development of tooth enamel, ameloblasts secrete an organic matrix composed primarily of the proteins called amelogenins and various nonamelogenin proteins, such as enamelins and tuft proteins (Eastoe, 1979; Deutsch, 1989; Limeback, 1991; Brookes et al., 1995). Amelogenins constitute approx. 90% of the enamel matrix protein (Termine et al., 1980) and are thought to be crucial for proper enamel structure and function, as mutations within the X-chromosomal amelogenin gene are strongly correlated with the inherited enamel disease, amelogenesis imperfecta (Lagerstrom et al., 1991; Aldred et al., 1992; Lench and Winter, 1995; Collier et al., 1996). Furthermore, inhibition of amelogenin expression in cultured tooth explants results in disorganized enamel containing smaller than normal hydroxyapatite crystals, supporting the theory that amelogenins have an important role in development of enamel mineral (Diekwisch et al., 1993).

Although great progress has been made in our understanding of amelogenins, information on enamelins is limited. As a group, the enamelins are acidic proteins rich in aspartic acid, glutamic acid, serine and glycine, and they differ in mass (28-70 kDa) from the amelogenins (5-25 kDa) (Deutsch, 1989). Only one protein that may belong to the enamelins has been cloned and it has been named tuftelin (Deutsch et al., 1991). The human tuftelin gene has been localized to chromosome 1q21-q31 (Deutsch et al., 1994).

The role of enamel proteins in the formation of enamel is not well understood, but it is generally accepted that such proteins, in conjunction with enamel organ cells, in some manner control the initiation of crystallization as well as the shape and size of hydroxyapatite crystals in the extracellular enamel matrix (for a review, see Simmer and Fincham, 1995). When enamel proteins are extracted from developing teeth, multiple components are found in both the amelogenin and enamelin classes of proteins (Termine et al., 1980). Some of these species are the result of proteolytic processing (reviewed in Brookes et al., 1995), but in the case of amelogenins it has been demonstrated that extensive alternative splicing may also be partially responsible for the observed protein heterogeneity (Gibson et al., 1991; Gibson et al., 1992; Lau et al., 1992; Simmer et al., 1994; reviewed in Zeichner-David et al., 1995).

To understand better the function of the enamelins, it is necessary to determine the nature of the primary translation products to distinguish them from proteolytic breakdown products. To this end, we have now analysed tuftelin cDNAs made from mRNA isolated from bovine developing enamel organs. To further our understanding of the gene structure of enamel proteins, we have also cloned and characterized the gene encoding bovine tuftelin.

*To whom all correspondence should be addressed.
Abbreviations: RT/PCR, reverse transcriptase/polymerase chain reaction.
EXPERIMENTAL PROCEDURES
Construction and screening of cDNA and genomic libraries
A bovine genomic library constructed by the insertion of genomic DNA partially digested with Sau3A into the BamHI site of EMBL-3 Sp6/T7 (Clontech) was screened with oligonucleotides corresponding to the 5', middle and 3' segments of the previously published sequence (Deutsch et al., 1991). A clone was identified that was subsequently shown to contain the 3' region of the tuftelin gene. Total RNA was extracted in guanidine thiocyanate (Chomczynski and Sacchi, 1987) from ameloblast-rich tissue prepared from 120- to 160-day bovine fetal molar teeth, and poly(A+) RNA was isolated by oligo(dT)-cellulose affinity chromatography. A cDNA library was prepared in λ Zap Express (Stratagene) by the method of Gubler and Hoffman (1983) using oligo(dT) as primer for first-strand synthesis. This cDNA library was then screened using a restriction fragment of the genomic clone that contained exon 13, a large exon in the gene encoding the carboxy terminus of the protein and the 3' untranslated segment (see Figs 1 and 4). Positive plaques were further purified by multiple cycles of screening followed by in vivo excision of the phagemid according to the instructions from Stratagene. Positive clones were characterized by restriction endonuclease mapping, Southern blotting and sequencing. The genomic library was then screened with restriction fragments of the tuftelin cDNA; after multiple rounds of screening, five positive clones that together contained the entire tuftelin cDNA sequence were isolated and purified.

DNA sequencing
Restriction fragments of the cDNA and genomic clones were isolated after electrophoresis on 1% agarose gels and subcloned into pUC19.
The inserts were sequenced by the Sanger dideoxynucleotide chain-termination method (Sanger et al., 1977), as modified for Taq DNA polymerase cycle sequencing using an ABI 373A automated DNA sequencer, in order to compare the genomic sequence with that of the cDNA and to determine the exon/intron structure of the gene. Internal regions of the subclones were sequenced by the use of oligonucleotide primers complementary to insert sequences. All sequences have been confirmed by comparing genomic and cDNA sequences and by multiple sequencing of both strands. Sequence data were assembled and discrepancies were resolved using the Wisconsin Package (Genetics Computer Group, Madison, Wisconsin).
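Agreement between reads from the two strands can be checked by reverse-complementing one read and comparing it with the other. The following is a minimal sketch in Python, not the procedure used with the Wisconsin Package; the function name is illustrative, and the two 11-base reads are the strand labels shown in Fig. 2.

```python
# Reverse-complement check for reads obtained from opposite strands.
# Illustrative helper only; not part of any sequencing software.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


# Reads as labelled in Fig. 2, both written 5' to 3'.
sense_read = "GATAGAAAAAC"
antisense_read = "GTTTTTCTATC"

# The two reads agree if one is the reverse complement of the other.
strands_agree = reverse_complement(antisense_read) == sense_read
print(strands_agree)
```

A mismatch at any position would make the comparison fail, which is the kind of discrepancy that repeated sequencing of both strands is meant to resolve.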
Identification of alternatively spliced transcripts
Approx. 900 ng poly(A+) RNA prepared from ameloblast-rich tissue was reverse transcribed by using AMV reverse transcriptase (Molecular Genetic Resources) and priming with a synthetic oligonucleotide complementary to a segment of the 3' untranslated region of the tuftelin message (see Fig. 1 for the position of oligonucleotides employed). The cDNA products were amplified by 30 cycles of the PCR (denature 1.5 min at 95°C, anneal 2 min at 55°C, synthesize 2 min at 72°C) using the same 3' oligonucleotide and one corresponding to the beginning of the translated portion of the tuftelin message. The products of the PCR reaction were first separated by electrophoresis on a 1.5% agarose gel, followed by cloning into pUC19 (the primers contained HindIII restriction sites at their ends) and DNA sequencing.
RESULTS
cDNA and Northern analyses
To obtain probes for genomic analyses and to determine whether any heterogeneity existed in the bovine tuftelin message, cDNA clones were obtained and analysed. The results of these analyses agreed with those previously published (Deutsch et al., 1991), except for the following differences (see Fig. 1 for the complete cDNA sequence, where differences are indicated): (1) the 5' untranslated segment was 28 bases longer than the published sequence; (2) one clone was missing bases 113-187, suggesting alternative splicing (see below); (3) one clone contained a 3' untranslated segment that was 240 bases shorter than the published cDNA (a second polyadenylation consensus sequence, aataaa, is present at bases 2440-2445); (4) there were several differences in individual bases (either an additional base or a different base), which are identified in Fig. 1; however, only one of these differences resulted in an amino acid change (at residue 190, where aspartate was changed to glutamate); and most importantly, (5) an additional guanine (confirmed in the genomic sequence) was found at base position 1097 in all clones, which caused a frameshift and a difference in the conceptual amino acid sequence of the carboxy portion of the protein. The new protein sequence contains only three cysteines compared to five in the published sequence; it does not contain tryptophan, but does contain a consensus sequence, NFS, for potential N-glycosylation at residues 304-306. Figure 2 illustrates the original sequence data obtained for this region of both strands of the cDNA and genomic DNA. The sequence shown and reported in this paper was obtained from two independently isolated cDNA clones and two independently isolated genomic clones that were repeatedly sequenced with identical results. The new sequence does not contain any recognizable motifs and the cDNA now encodes a protein containing 338 residues instead of 389 as in the previously published sequence. The calculated isoelectric point is 5.85. A search of GenBank, the EMBL Data Library, and the Brookhaven Protein Data Bank failed to identify any homologous sequences.

Northern analysis of poly(A+) RNA isolated from the teeth of 120-160-day bovine fetuses using cloned cDNA as a probe identified a single, somewhat broad band centred at 2.7 kb (Fig. 3). This is consistent with the size of the cloned cDNA and suggests that the entire mRNA has been cloned.

[Fig. 1 sequence listing (2697 nucleotides with the deduced amino acid sequence) not reproduced.]
Fig. 1. Nucleotide and deduced amino acid sequence of tuftelin. The 5' and 3' untranslated segments are in lower-case letters. Amino acids are numbered to the left from the start site of translation; nucleotides are numbered to the right. Division of the sequence into exons is indicated. Two polyadenylation consensus sequences, aataaa, are heavily underlined. The additional bases found at the 5' end are indicated by dashed underlining; individual bases that differ from the previously published sequence (Deutsch et al., 1991), either a base difference or an additional base, are in bold; the additional guanine located at position 1097, which alters the reading frame, is indicated; the previously identified stop codon is indicated; sequences corresponding to the oligonucleotides used in the RT/PCR amplification of tuftelin mRNA are indicated by solid arrows over the nucleotide sequences.

Fig. 2. Tracings obtained by automated DNA sequencing of the region that contained an additional guanine residue compared to the previously published sequence (Deutsch et al., 1991). Tracings are shown for cDNA and genomic DNA sources in the sense (5' GATAGAAAAAC 3') and antisense (5' GTTTTTCTATC 3') strand orientations. The additional base is marked by an arrowhead and shaded for easy identification.
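The effect of a single inserted base on the conceptual translation can be illustrated with a short Python sketch. It uses the standard genetic code and a hypothetical 12-base toy sequence, not the tuftelin sequence itself; the helper names `translate` and `insert_base` are illustrative assumptions. The same codon arithmetic relates the 1014-bp open reading frame to 338 codons (1014 / 3 = 338).

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate in frame 1; '*' marks a stop codon; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def insert_base(dna: str, index: int, base: str) -> str:
    """Insert one base before the given 0-based index."""
    return dna[:index] + base + dna[index:]


# Hypothetical toy reading frame (not taken from Fig. 1).
toy = "ATGGAAACATGA"
shifted = insert_base(toy, 3, "G")

print(translate(toy))      # MET*
print(translate(shifted))  # MGNM : every codon after the insertion is read differently

# Codon arithmetic for the reported open reading frame.
print(1014 // 3)           # 338
```

In the toy case the stop codon is lost after the insertion, so the reading frame runs on to a different termination point, which is the same kind of change described for the additional guanine at position 1097.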
Structure of the tuftelin gene
The genomic clones, encompassing more than 28 kbp, were characterized by restriction endonuclease mapping and comparison of the DNA sequence to that of the cDNA (see Fig. 4 for a diagram of the gene). These analyses demonstrated that the entire cDNA sequence was contained within the cloned genomic DNA and permitted the definition of the exon/intron structure of the coding and 3' untranslated portions of the gene. This region of the gene is composed of 13 exons, which are indicated in Figs 1 and 4. There was 100% agreement between the sequence contained in the designated exons and the corresponding cDNA, including the 3' untranslated segment. The exons range in size from 66 bp (exon 6) to 1531 bp (exon 13, including the 3' untranslated sequence). Exon/intron borders conform to the consensus deduced from analysis of a large number of eukaryotic genes (Fig. 5) (Padgett et al., 1986). In most cases, border codons were not split. To determine the start site of transcription, we have carried out primer extension and nuclease protection experiments. As yet, these have been inconclusive (data not shown). It is possible that additional transcribed sequences exist 5' of the exon designated number 1, possibly in an additional exon, so that the present exon numbering must be regarded as provisional.
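The most basic part of the splice-site consensus is that an intron begins with gt and ends with ag. A minimal Python sketch of that check follows; it tests only these terminal dinucleotides, not the full consensus of Padgett et al. (1986), and the intron strings are hypothetical examples rather than tuftelin sequences.

```python
def has_gt_ag_boundaries(intron: str) -> bool:
    """Check only the terminal dinucleotides: 5' gt ... ag 3'."""
    intron = intron.lower()
    return len(intron) >= 4 and intron.startswith("gt") and intron.endswith("ag")


# Hypothetical intron sequences for illustration.
candidate_introns = {
    "conforming": "gtaagtctttctttccag",
    "non_conforming": "gcaagtctttctttccag",
}

for name, seq in candidate_introns.items():
    print(name, has_gt_ag_boundaries(seq))
```

A fuller check would also score the bases surrounding each junction and the polypyrimidine tract upstream of the ag, which this sketch deliberately leaves out.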
Isolation and characterization of alternatively spliced transcripts
There is compelling evidence that the primary amelogenin transcript is alternatively spliced in several species including cows (Zeichner-David et al., 1995; Gibson et al., 1991, 1992). This observation and our finding that one tuftelin cDNA clone lacked bases 113-187 (corresponding to exon 2) raised the possibility that the tuftelin transcript is also alternatively spliced. To test this hypothesis, we used RT/PCR with poly(A+) RNA isolated from fetal bovine teeth. After separation of the products by electrophoresis in agarose (Fig. 6), several products were identified, eluted and sequenced. The complete sequences of five clones were obtained. Three contained all the translated exons illustrated in Fig. 1, while in one clone the segment corresponding to exon 5 was missing and in another the segment corresponding to exon 6 was similarly missing.

Fig. 3. Northern blot analysis. Poly(A+) RNA (5 µg) was separated on a formaldehyde-agarose gel, transferred to a nitrocellulose membrane and hybridized to radiolabelled tuftelin cDNA. The arrow identifies the 2.7-kb tuftelin mRNA. Size markers (kb): 4.4, 2.4, 1.4.

Fig. 4. Diagram of bovine tuftelin gene. The entire cDNA illustrated in Fig. 1 is contained in five λ clones (gBTF.1 to gBTF.5), which are identified. Note that clone gBTF.2 does not overlap the adjacent clones. Exons are numbered 1 to 13, the position of the poly A signal is marked, and the scale is in kb (0 to 30). Restriction sites: B, BamHI; E, EcoRI; S, SstI; H, HindIII.

[Fig. 5 junction table (exon number, exon size in bp, border codons and flanking intron sequences) not reproduced; each intron shown begins with gt and ends with ag.]
Fig. 5. Nucleotide sequence of the tuftelin gene exon/intron junctions. Exons are numbered in a 5' to 3' direction. Divided border codons are underlined with a solid bar, while undivided border codons are underlined with an open bar. Borders of untranslated exons are italicized. Twenty bases of each flanking intron are shown.

Fig. 6. Agarose gel electrophoresis of the products of RT/PCR of the tuftelin-coding region. Ameloblast cDNA obtained by RT was amplified by 30 PCR cycles using the oligonucleotides indicated in Fig. 1. Arrows identify reaction products that were isolated, cloned into pUC19 and sequenced. The lower band of 1.078 kbp contained two alternatively spliced products, one missing exon 5 and the other exon 6, which migrated to the same position. Size markers (kbp): 1.353, 1.078, 0.872, 0.603.

DISCUSSION
Enamel proteins are central to the process of amelogenesis, a highly regulated developmental cascade yielding the most mineralized of all vertebrate tissues (Eastoe, 1979; Deutsch, 1989). Early biochemical analysis of the enamel proteins resulted in their division into two classes of proteins: (i) those proteins extractable in 4 M guanidine-HCl were designated amelogenins; and (ii) those proteins that became solubilized when 4 M guanidine-HCl extraction was repeated in the presence of the demineralizing agent EDTA were termed enamelins (Termine et al., 1980). Both the amelogenins and enamelins appear rather heterogeneous when analysed by various techniques (Deutsch, 1989; Termine et al., 1980; Ogata et al., 1988). Numerous studies of amelogenins indicate that the heterogeneity results from both proteolytic processing and translation of alternatively spliced transcripts. The cause of the enamelin heterogeneity is much less well characterized, although it is highly likely that proteolytic cleavage is at least partially responsible (Ogata et al., 1988; Uchida et al., 1991).

Using affinity-purified polyclonal antibodies against an enamelin 66 kDa protein as probe, a 2.7 kbp tuftelin cDNA was cloned from an expression library prepared from the bovine enamel organ (Deutsch et al., 1991). Antibodies produced to peptide sequences deduced from the cloned cDNA recognized several polypeptides ranging in size from 28 to 66 kDa on Western blot analysis. Immunohistochemical studies using these antibodies indicated that the tuftelin polypeptides occur mainly interprismatically, surrounding the enamel prisms. The data also suggested that some of the enamelin proteins, including tuftelin, secreted during the early stages of enamel formation are retained and found, perhaps in partially degraded form, in the mature enamel.

In the present work, we have identified four distinct types of transcripts (not counting the use of alternative polyadenylation sites). One type of transcript contained the entire sequence illustrated in Fig. 1. The three other types of transcripts lacked individual exons 2, 5 or 6. If these transcripts are all translated, three different isoforms of tuftelin would result, as we have designated exon 2 as untranslated in agreement with Deutsch et al. (1991). It should be noted, however, that the open reading frame extends 5' to another potential ATG codon located at bases 53-55. Thus, it is possible that the primary structure of tuftelin is larger and alternative splicing of exon 2 (which contains 75 nucleotides) could result in further variation. Furthermore, the phasing of exon/intron borders (Fig. 5) permits alternative splicing of many exons in a cassette fashion without disturbance of the reading frame, so that a number of other alternatively spliced products could be generated.

While most of the differences between the present sequence and that of Deutsch et al. (1991) can be attributed to conventional polymorphic variation, the difference in the carboxy-terminal segment, caused by an additional guanine residue at base 1097 in our sequence and the resulting frameshift, must have another explanation, such as a novel type of polymorphism. Additional sequence data from cloned bovine tuftelin DNA, as well as that from other species, should help to resolve this quandary. Our findings certainly raise the distinct possibility that functional differences may exist between tuftelin isoforms.

Another potential source of enamelin heterogeneity is variation in glycosylation. Enamelins isolated from bovine teeth had apparent sizes of 28, 30, 45 and 70 kDa (Ogata et al., 1988). They had no detectable N-glycosidic linkages but did contain O-glycosidic linkages, and had relatively large amounts of sialic acid and hexosamine. While the proteins had similar amino acid compositions, they apparently contained no cysteine and only small amounts of tyrosine. Tuftelin (Fig. 1) contains only one potential N-glycosylation consensus sequence and relatively small amounts of cysteine and tyrosine, but does contain many potential O-glycosylation sites. Thus, tuftelin may have been a component of the heterogeneous enamelin fraction as originally characterized by Termine et al. (1980).

It is clear from the nucleotide sequences and structures of the amelogenin (Lau et al., 1989; Gibson et al., 1992; Salido et al., 1992) and tuftelin genes that they have nothing in common. The amelogenin gene is much more compact, with a large portion of the coding sequence contained in a single exon. In contrast, the tuftelin coding sequence is more uniformly dispersed and is contained in at least 11 coding exons. Thus, they appear to be evolutionarily distinct. Similarly, there is no apparent relation between either amelogenin or tuftelin and the recently described protein, ameloblastin (Krebsbach et al., 1996). So far, our attempts to relate the organization of the tuftelin gene to potential protein-domain structures have been unproductive. Also, the potential functions of the alternatively spliced products are not readily apparent. The translated exons 5 and 6, which are alternatively spliced, contain serine and/or threonine residues that may be glycosylated, but the role of such glycosylation in enamel matrix structure and function is far from clear. However, studies on recombinantly produced protein should facilitate development of structural models and functional relations.
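The statement that cassette exons can be skipped without disturbing the reading frame comes down to exon length being a multiple of three, given that border codons are mostly undivided. The Python sketch below applies that test to the two exon sizes stated in the text (exon 2, 75 nucleotides; exon 6, 66 bp). The function name is illustrative, and the sizes of the other exons are not assumed here.

```python
def skipping_preserves_frame(exon_length: int) -> bool:
    """An exon flanked by undivided codons can be skipped in frame
    only if its length is a whole number of codons."""
    return exon_length % 3 == 0


# Exon sizes stated in the text; other exon sizes are not assumed.
exon_lengths = {"exon 2": 75, "exon 6": 66}

for name, length in exon_lengths.items():
    in_frame = skipping_preserves_frame(length)
    codons = length // 3
    print(f"{name}: {length} nt, in-frame skip = {in_frame}, codons removed = {codons}")
```

Under these assumptions, skipping exon 6 would remove 22 codons and leave the downstream frame intact. Exon 2 corresponds to 25 codons, which would matter only if translation began at the upstream ATG at bases 53-55.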
Acknowledgements--We thank Tom Tucker and the Biopolymer Analysis Laboratory, University of Pennsylvania School of Dental Medicine, for DNA sequence determination and informatics support. This work was supported by NIH grants DE10620 and DE08239.
REFERENCES
Aldred M. J., Crawford P. J. M., Roberts E. and Thomas N. S. T. (1992) Identification of a nonsense mutation in the amelogenin gene (AMELX) in a family with X-linked amelogenesis imperfecta (AIH1). Hum. Genet. 90, 413-416.
Brookes S. J., Robinson C., Kirkham J. and Bonass W. A. (1995) Biochemistry and molecular biology of amelogenin proteins of developing dental enamel. Archs oral Biol. 40, 1-14.
Chomczynski P. and Sacchi N. (1987) Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal. Biochem. 162, 156-159.
Collier P. M., Sauk J. J., Rosenbloom J., Yuan Z. A. and Gibson C. W. (1996) A new amelogenin gene defect associated with X-linked Amelogenesis Imperfecta. J. Dent. Res. 75, 198.
Deutsch D. (1989) Structure and function of enamel gene products. Anat. Rec. 224, 189-210.
Deutsch D., Palmon A., Fisher L. W., Kolodny N., Termine J. D. and Young M. F. (1991) Sequencing of bovine enamelin ('tuftelin') a novel acidic enamel protein. J. Biol. Chem. 266, 16021-16028.
Deutsch D., Palmon A., Young M. F., Selig S., Kearns W. G. and Fisher L. W. (1994) Mapping of the human tuftelin (TUFT1) gene to chromosome 1 by fluorescence in situ hybridization. Mammalian Genome 5, 461-462.
Diekwisch T., David S., Bringas P., Santos V. and Slavkin H. C. (1993) Antisense inhibition of AMEL translation demonstrates supramolecular controls for enamel HAP crystal growth during embryonic mouse molar development. Development 117, 471-482.
Eastoe J. E. (1979) Enamel protein chemistry-past, present and future. J. Dent. Res. 58(B), 753-763.
Gibson C. W., Golub E., Ding W., Shimokawa H., Young M., Termine J. and Rosenbloom J. (1991) Identification of the leucine-rich amelogenin peptide (LRAP) as the translation product of an alternatively spliced transcript. Biochem. Biophys. Res. Commun. 174, 1306-1312.
Gibson C. W., Golub E. E., Abrams W. R., Shen G., Ding W. and Rosenbloom J. (1992) Bovine amelogenin message heterogeneity: alternative splicing and Y-chromosomal gene transcription. Biochemistry 31, 8384-8388.
Gubler U. and Hoffman B. J. (1983) A simple and very efficient method for generating cDNA libraries. Gene 25, 263-269.
Krebsbach P. H., Lee S. K., Matsuki Y., Kozak C. A., Yamada K. M. and Yamada Y. (1996) Full-length sequence, localization, and chromosomal mapping of ameloblastin. A novel tooth-specific gene. J. Biol. Chem. 271, 4431-4435.
Lagerstrom M., Dahl N., Nakahori Y., Nakagome Y., Backman B., Landegren U. and Pettersson U. (1991) A deletion in the amelogenin gene (AMG) causes X-linked Amelogenesis Imperfecta (AIH1). Genomics 10, 971-975.
Lau E. C., Mohandas T. D., Shapiro L. J., Slavkin H. C. and Snead M. L. (1989) Human and mouse amelogenin gene loci are on the sex chromosomes. Genomics 4, 162-168.
Lau E. C., Simmer J. P., Bringas P., Hsu D. D. J., Hu C. C., Zeichner-David M., Thiemann F., Snead M. L., Slavkin H. C. and Fincham A. G. (1992) Alternative splicing of the mouse amelogenin primary RNA transcript contributes to amelogenin heterogeneity. Biochem. Biophys. Res. Commun. 188, 1253-1260.
Lench N. J. and Winter G. B. (1995) Characterisation of molecular defects in X-linked Amelogenesis Imperfecta (AIH1). Human Mutation 5, 251-259.
Limeback H. (1991) Molecular mechanisms in dental hard tissue mineralization. Curr. Opin. Dentistry 1, 826-835.
Ogata Y., Shimokawa H. and Sasaki S. (1988) Purification, characterization, and biosynthesis of bovine enamelins. Calcif. Tissue Int. 43, 389-399.
Padgett R. A., Grabowski P. J., Konarska M. M., Seiler S. and Sharp P. (1986) Splicing of messenger RNA precursors. Annu. Rev. Biochem. 55, 1119-1150.
Salido E. C., Yen P. H., Koprivnikar K., Yu L. C. and Shapiro L. J. (1992) The human enamel protein gene amelogenin is expressed from both the X and the Y chromosomes. Am. J. Hum. Genet. 50, 303-316.
Sanger F., Nicklen S. and Coulson A. R. (1977) DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. U.S.A. 74, 5463-5467.
Shimokawa H., Sobel M. E., Sasaki M., Termine J. D. and Young M. F. (1987) Heterogeneity of amelogenin mRNA in the bovine tooth. J. Biol. Chem. 262, 4042-4047.
Simmer J. P., Hu C-C., Lau E. C., Moradian-Oldak J., Slavkin H. C. and Fincham A. G. (1994) Alternative splicing of the mouse amelogenin primary RNA transcript. Calcif. Tissue Int. 54, 302-310.
Simmer J. P. and Fincham A. G. (1995) Molecular mechanisms of dental enamel formation. Crit. Rev. Oral Biol. Med. 6, 84-108.
Termine J. D., Belcourt A. B., Christner P. J., Conn K. M. and Nylen M. U. (1980) Properties of dissociatively extracted fetal tooth matrix proteins. J. Biol. Chem. 255, 9760-9768.
Uchida T., Tanabe T., Fukae M. and Shimizu M. (1991) Immunocytochemical and immunochemical detection of a 32 kDa nonamelogenin and related proteins in porcine tooth germs. Archs Histol. Cytol. 54, 527-538.
Zeichner-David M., Diekwisch T., Fincham A., Lau E., MacDougall M., Moradian-Oldak J., Simmer J., Snead M. and Slavkin H. C. (1995) Control of ameloblast differentiation. Int. J. Dev. Biol. 39, 69-92.