Molecular Structure and Transcriptional Regulation of the Salivary Gland Proline-Rich Protein Multigene Families
DON M. CARLSON,* JIE ZHOU, AND PAUL S. WRIGHT
Department of Biochemistry and Biophysics University of California-Davis Davis, California 95616
I. Background
II. PRP mRNAs and Cell-free Translation Analysis
III. PRP cDNAs and Amino-acid Sequences
IV. Sequence and Structural Analyses of PRP Genes
V. Regulation of Expression of PRP Genes
VI. Functional Aspects of PRPs
* To whom correspondence may be addressed.
Present address: Neurological Sciences Institute, Good Samaritan Hospital and Medical Center, Portland, Oregon 97209.
Present address: Merrell Dow Pharmaceuticals, Inc., Cincinnati, Ohio 45215.
Progress in Nucleic Acid Research and Molecular Biology, Vol. 41. Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.

The proline-rich proteins (PRPs) in mammalian salivary glands are encoded by tissue-specific multigene families whose members have diverged with respect to structure and regulation of expression. A common evolutionary origin of the PRP genes is evident from the extensive conservation of 5'-untranslated regions, coding sequences, and intron/exon organizations. The 42-nucleotide repeat unit CCA CCA CCA CCA GGA GGC CCA CAG CCG AGA CCC CCT CAA GGC has been proposed (1) as the ancestral unit, with multiples of three bases probably being recruited into, or deleted from, this ancestral sequence during gene duplication. Gene conversion was possibly the mechanism that homogenized the internal repeats. Two nonallelic mouse PRP genes (MP2 and M14) have essentially identical sequences, with two major differences (2). MP2 has 13 tandemly arranged 42-nucleotide repeats, whereas M14 has 17 repeats. M14 has an insertion by transposition of a two-kilobase member of the long, interspersed elements of repeated mouse DNA (LINE family) into intron I. The 5'-untranslated sequences and regions encoding the signal peptides of all PRP mRNAs, regardless of source, are nearly identical. In another multigene family from rat submandibular glands that encodes contiguous repeat proteins (CRPs) or glutamic acid/glutamine-rich proteins (Glx-rich proteins), the 5'-untranslated sequences and the regions encoding the signal peptides of the mRNAs are 91% identical (nucleotides) and 92% identical (amino acids) to the PRP mRNAs (3, 4). Two mRNA size-classes, each containing multiple PRP mRNAs, are transcripts from PRP gene families of mice (5), hamsters (6), rats (7), and humans (8). The CRP or Glx-rich multigene family also encodes two size-classes of mRNAs, and this multigene family has the same intron/exon organization as the mouse and rat PRP genes. Cell-free translations show some unusual differences in PRPs encoded by mRNAs from parotid glands of four mouse strains (BALB/cJ, DBA/2J, CD-1, and C57BL/6J) after isoproterenol treatment (5). Reasons for the variations of translation products in these mouse strains after induction of the PRP gene families are unknown.

Repeated administration of the β-agonist isoproterenol causes hypertrophy and hyperplasia of rat and mouse parotid and submandibular glands (9, 10). The morphological changes are accompanied by a dramatic increase, or induction, in the synthesis of PRPs. Typically, these proteins contain 25-45% proline, 18-22% glycine, and 18-22% glutamine and glutamic acid. Aromatic and sulfur-containing amino acids are either very low in amount or absent. Generally, PRPs can be divided into acidic and basic groups, and both groups may be glycosylated and phosphorylated. PRPs may compose more than 70% of the protein in salivary gland extracts after treatment with isoproterenol. All proteins derived from the nucleotide sequences of PRP cDNAs and PRP genes are characterized by four general regions: a signal peptide region, a transition region, a repeat region, and a carboxyl-terminal region (11).
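As a quick consistency check on the 42-nucleotide repeat unit quoted above, the unit can be translated codon by codon. The sketch below is illustrative only: the names `CODON_TABLE`, `REPEAT_UNIT`, and `translate` are hypothetical, and the table is deliberately limited to the codons that occur in this unit (taken from the standard genetic code) rather than being a complete translation table.

```python
# Partial codon table: only the codons present in the proposed ancestral unit.
# Assignments follow the standard genetic code (DNA alphabet).
CODON_TABLE = {
    "CCA": "P", "CCG": "P", "CCC": "P", "CCT": "P",  # proline
    "GGA": "G", "GGC": "G",                          # glycine
    "CAG": "Q", "CAA": "Q",                          # glutamine
    "AGA": "R",                                      # arginine
}

# The repeat unit exactly as written in the text, codons separated by spaces.
REPEAT_UNIT = "CCA CCA CCA CCA GGA GGC CCA CAG CCG AGA CCC CCT CAA GGC"


def translate(spaced_codons: str) -> str:
    """Translate a space-separated codon string into one-letter amino acids."""
    return "".join(CODON_TABLE[codon] for codon in spaced_codons.split())


peptide = translate(REPEAT_UNIT)
nucleotides = len(REPEAT_UNIT.replace(" ", ""))
print(nucleotides, peptide)  # 42 PPPPGGPQPRPPQG
```

The translated 14-residue peptide is the same as the prototype repeat P P P P G G P Q P R P P Q G given for the MP2 gene product in Section IV.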
The apparent tissue-specific synthesis and the appearance of PRPs in saliva in such large quantities, either constitutive (as in humans) or induced by isoproterenol, suggest biological functions in the oral cavity and the gastrointestinal tract. Several functions, such as calcium binding, inhibition of hydroxylapatite formation, and formation of the dental-acquired pellicle, have been attributed to the human salivary PRPs (12). PRPs have an unusually high affinity for such multihydroxylated phenols as tannins; feeding tannins to rats and mice mimics the effects of isoproterenol on the parotid glands (13). The induction of PRP synthesis by dietary tannins clearly results in a protective response against the detrimental effects of the tannins (13). Unlike mice and rats, hamsters do not respond to tannins in the diet by the induction of PRPs. Pronounced detrimental effects are observed in weanling hamsters specifically. When these animals are maintained on a 2% tannin diet for 6 months, they fail to grow (6). Tannins are unusually toxic to weanling hamsters; an increase of tannin in the diet to 4% causes death to most animals within 3 days. The association of tannins with pathological problems, including carcinogenesis and hepatotoxicity, and the influences on growth and toxicity in hamsters, have led to the proposal that PRPs may act as a first line of defense against these multihydroxylated phenols (13).

This review focuses on the biochemistry and molecular biology of the salivary PRPs; it is not intended to be an overall or complete review of PRPs. To those who have contributed to the PRP literature and whose work is not mentioned, we apologize. Previous reviews are used for many references and studies.
I. Background⁴

Salivary glands of various animals synthesize, or can be induced to synthesize, a group of proteins unusually high in proline, the so-called proline-rich proteins (PRPs) (12, 14-20). These proteins collectively constitute the largest group of proteins in human salivary secretions, making up more than 70% of the secreted proteins (12). PRPs may be divided into acidic and basic groups, and members of each group may be phosphorylated or glycosylated, or both. These unusual proteins are constitutive in human saliva, but families of similar proteins are dramatically increased or induced in parotid and submandibular glands of rats, mice, and hamsters by isoproterenol treatment (6, 18, 19, 21). Profound morphological effects on rat parotid glands by isoproterenol treatment were first observed in 1961 (9, 10). Repeated pharmacological doses cause dramatic glandular hypertrophy (Fig. 1). The increase in DNA synthesis with isoproterenol treatment (25, 26) probably results mainly from polyploidy; by 4-5 days, more polyploid than diploid nuclei are seen (Fig. 2) (see 27 for a review on the regulation of salivary gland size and the effects of isoproterenol).

The dramatic accumulation of PRPs in the parotid glands of rats treated with isoproterenol was first described in 1974 (16, 18, 28). After 7-10 days of treatment (5 mg of isoproterenol per day), PRPs composed about 70% of the total soluble proteins in parotid gland extracts. Initially, an acidic PRP (pI = 4.5) was identified (Ipr-1A2), and this protein was phosphorylated and glycosylated (16, 18, 19). Subsequently, six basic PRPs unusually high in proline (40-44%), glutamine plus glutamate (22-25%), and glycine (18-20%), containing varying amounts of lysine plus arginine (7-9%), were isolated and characterized (18, 19). Aromatic and sulfur-containing amino acids were either absent or present in very low amounts. Therefore, PRPs have little or no absorbance at 280 nm. Neither hydroxylysine nor hydroxyproline is present, and the treatment of these PRPs with purified prolyl hydroxylase failed to convert proline into hydroxyproline. The molecular weights of the basic proteins, from sedimentation equilibrium, ranged from 15,000 to 18,000, and that of PRP Ipr-1A2 was 25,000. A high apparent molecular weight (71,000) was observed following chromatography on Sephadex G-100, but the unusually high axial ratio (>25) of these proteins undoubtedly caused this value to be substantially overestimated. S values ranged from 1.1 to 1.4. Circular dichroism spectra showed no α-helical or polyproline conformations.

⁴ Reviews describing mainly the human PRP families are available (12, 22, 23). These unusual proteins were first observed in human saliva by Mandel, Thompson, and Ellison (24) and were first purified and characterized by Bennick and Connell (14) and by Oppenheim, Hay, and Franzblau (15). The genetics of this human multigene family were described in a review by Bennick (23). Other than for comparisons of the human cDNAs and multigene families, this review focuses primarily on the tissue-specific inducible multigene PRP families of mouse, rat, and hamster.

FIG. 1. Hypertrophic effects of isoproterenol treatment on rat salivary glands. Rats (150-200 g of body weight) were injected intraperitoneally with 5 mg of isoproterenol daily for 7 days. The parotid glands (p), submandibular glands (sm), and sublingual glands (sl) were removed from control (bottom) and isoproterenol-treated animals (top). No changes were noted for the sublingual glands, which secrete principally mucous glycoproteins. Parotid glands, which are serous secretors, showed a dramatic increase in weight of about 6- to 10-fold. Submandibular glands are of a mixed cell population and showed an intermediate response to isoproterenol.
FIG. 2. Karyotypes of (a) a mouse bone marrow cell and (b) a mouse parotid gland cell. The chromosomal display of the mouse bone marrow cell showed the normal 2n (= 40) chromosomes after 2 days of isoproterenol treatment. The mouse parotid gland cells (>50% of the cells) showed 4n chromosomes after 2 days of isoproterenol treatment. (Courtesy of Christopher Bidwell.)
II. PRP mRNAs and Cell-free Translation Analysis

Studies by cell-free translation analysis using the reticulocyte lysate system and labeling with [3H]proline or [35S]methionine showed dramatic and definitive changes in the patterns of protein synthesis in parotid glands of isoproterenol-treated rats, and PRP mRNAs were highly elevated in the treated animals (29). There was very little synthesis of PRPs from poly(A)+ RNAs from glands of control rats; poly(A)+ RNAs from the glands of treated animals synthesized mainly PRPs; translation patterns with [3H]proline and [35S]methionine gave identical labeling patterns; and PRPs from cell-free translations were all precipitated by antibodies to PRPs. [35S]Methionine was incorporated only into the initiation site, as determined by sequence analysis and by the fact that PRPs synthesized by tissue slices of parotid glands of isoproterenol-treated rats in the presence of [35S]methionine contained no 35S label. Because most PRPs are acid-soluble, a property first used in the purification procedures of rat submandibular gland PRPs (30), it is imperative that cell-free translation products be precipitated with a solution containing both trichloroacetic and phosphotungstic acids (29).

The induction of PRP mRNAs in the parotid and submandibular glands of both rats and mice by isoproterenol treatment has been demonstrated by Northern and dot-blot hybridizations (21). PRP mRNAs either are very low or are not detectable in the glands of untreated rats and mice. After 4-5 days of isoproterenol treatment, mRNAs encoding these unusual proteins compose over 50% of the total glandular mRNAs (5). For example, plasmid pRP25 does not hybridize with RNAs from control rats (Fig. 3A), but does hybridize with PRP mRNAs of two size-classes, ranging from 600 to 1100 bases, from isoproterenol-treated animals. These size ranges of mRNAs are consistent with all rat RNA preparations tested.
The multiplicity of PRPs encoded by the PRP mRNAs from treated rats is evident from Fig. 3B, since about 12 PRPs were identified by cell-free translation analysis and immunoprecipitation. The PRP cDNA insert of pUMP40 (11), prepared from mRNAs from BALB/cJ mice, has been tentatively identified as the transcript of the mouse PRP gene MP2 (1). However, the nucleotide sequences of MP2 and the PRP insert of pUMP40 showed only 98% homology (1). MP2 was cloned from a genomic library prepared from chromosomal DNA from the CD-1 mouse strain. In an attempt to reconcile the heterologous regions and base differences between the CD-1 mouse gene MP2 and the BALB/cJ mouse mRNA, we isolated mRNAs from four mouse strains. Northern blots of total RNA from the parotid glands of mouse strains CD-1 and BALB/cJ and from strains DBA/2J and C57BL/6J, from both control and isoproterenol-treated mice, were probed with 32P-labeled exon IIb (see Fig. 10) of PRP gene MP2 (5). Two major classes of PRP mRNAs were detected in the treated animals. RNA species of about 1050 and 1300 bases for BALB/cJ and DBA/2J mice and about 1100 and 1200 bases for CD-1 and C57BL/6J mice were observed.

FIG. 3. Northern blot of parotid gland RNA from normal and isoproterenol-treated rats and cell-free translations of "sized" PRP mRNAs. (A) Parotid gland RNAs (10 μg) from normal and isoproterenol-treated rats were electrophoresed on a 1.5% agarose gel containing 5 mM methyl mercury hydroxide and transferred to nitrocellulose. The blot was probed with 32P-labeled pRP25 (11). (B) RNA was isolated from a methyl mercury denaturing low-melting-point agarose gel and translated in vitro with [35S]methionine. The translation products were separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Lanes 1 and 8 show 35S label incorporated in the absence of RNA and with total RNA from the parotid glands of isoproterenol-treated rats, respectively. Lanes 2-7 are the translation products obtained from RNA indicated in (A). Molecular-weight standards (×10⁻³) are indicated at the right, and nucleotide standards are indicated at the left. [Reprinted with permission from the Journal of Biological Chemistry (5).]
[Figure not reproduced; panels A and B; nucleotide standards 1353, 1078, 872, and 603.]

Cell-free translations of total RNA from these four mouse strains showed interesting and unusual differences in the PRPs synthesized (Fig. 4). Similar labeling patterns were observed with both [3H]proline and [35S]methionine. The amounts or levels of incorporation varied considerably between controls and treated animals, and α-amylase (upper arrow, Fig. 4) and parotid-specific protein (lower arrow, Fig. 4) were dramatically reduced. α-Amylase and parotid-specific protein expressions appear to be regulated in concert (31). Earlier cell-free translation experiments (21, 29) and RNA/DNA hybridization results (5) show that α-amylase mRNA is dramatically decreased by isoproterenol treatment. In a related study, a polymorphism in an androgen-regulated single-copy mouse gene (RP2) produced three major mRNAs (32). These RP2 mRNAs differed in the lengths of their untranslated 3' regions as a result of using different polyadenylation sites, and additional variability resulted from the insertion of a member of the mouse B1 family. However, these RP2 polymorphisms had no effect on the translation product. Whether the PRP polymorphisms are the result of different PRP genes or are caused by differential RNA splicing remains to be determined.

FIG. 4. Translation products of RNAs prepared from four different mouse strains. Ten micrograms of total RNA from parotid glands of the mouse strains indicated, both before and after isoproterenol treatment, was translated with either [35S]methionine or [3H]proline; 200,000 cpm of 35S and 50,000 cpm of 3H were applied to gel electrophoresis. α-Amylase and parotid-specific protein are indicated by the upper and lower arrows, respectively. Molecular-weight standards (×10⁻³) (M.W. Std.) are indicated at the left. [Reprinted with permission from the Journal of Biological Chemistry (5).]
[Figure not reproduced; panels 35S-Met and 3H-Pro; lanes for normal and isoproterenol-treated (IPR) animals; molecular-weight standards 45.0, 31.0, and 21.5.]
III. PRP cDNAs and Amino-acid Sequences

Plasmids containing cDNAs for PRPs were first isolated from a cDNA library prepared from RNA isolated from the parotid glands of isoproterenol-treated rats (7). Four recombinant plasmids (pRP8, pRP18, pRP25, and pRP33) were selected. Several mRNAs hybridized to each PRP cDNA, which emphasized the similarities in nucleotide sequences of the PRP mRNAs. This could have resulted from the expression of a family of closely related genes or from the production of multiple mRNAs from the same or similar genes by different splicing patterns. Whether one or both possibilities are responsible for the multiple PRP mRNAs has not been unequivocally demonstrated. Subsequently, several more PRP cDNAs were cloned from mouse and rat parotid gland mRNAs after isoproterenol treatment (11) and from the human parotid gland (33).⁵

The nucleotide sequence of the PRP cDNA insert of pRP33 (7) encodes the acidic PRP, Ipr-1A2 (Fig. 5). The 13 amino-terminal amino acids are highly hydrophobic and are probably part of a signal peptide (the signal peptide region) (see Fig. 8). The next 60 amino acids (the "transition" region) contain numerous acidic residues, with 10 aspartic acids in the 16-amino-acid sequence of Asp-58 to Asp-73. The "proline-rich" region (residues 80-189) is high in proline, glycine, and glutamine, and includes six repeats of 18 or 19 amino acids (the "repeat" region). The 17 carboxyl-terminal amino acids (the carboxyl-terminal region) contain single residues of tyrosine (-201), tryptophan (-203), and phenylalanine (-204) clustered close to the carboxyl-terminal serine (-206). These data, derived from the nucleotide sequence of pRP33, gave the first complete amino-acid sequence of a PRP. This sequence has been compared (7) with the partial amino-acid sequences reported for the human acidic (34) and basic (12) PRPs.
Subsequent data derived from several PRP cDNA and PRP gene sequences show that the first 100 nucleotides in the 5' regions of PRP mRNAs, which contain the 5'-flanking sequence and encode exon I (or the putative signal peptide), have unusually strong homologies (>95% identity) (11) (Fig. 6). Sequences from a multigene family encoding CRPs from rat submandibular glands (3), proteins that are unusually high in glutamine and glutamate (4), are very similar in this region: 91% of the nucleotides and 92% of the amino acids are identical to those of the PRPs (3).

⁵ Differential splicing is considered to contribute to the multiple PRP mRNAs in the human salivary gland (33).

FIG. 5. The amino-acid sequence derived from primer-extended pRP33, arranged to align similar sequences. [Reprinted with permission from the Journal of Biological Chemistry (7).]
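The per-position legend of Fig. 6 (one digit per aligned position, coding 0, 1, 2, or 3-or-more differences) lends itself to a simple tally of how many positions are conserved. The sketch below is a hedged illustration: `DIFF_CODES` is the digit string as it survives in this copy of the figure, which may not be complete, and the function name `fraction_conserved` is hypothetical. The result should therefore be read as approximate and compared with the figure legend rather than taken as a replacement for it.

```python
# Digit string from the Fig. 6 legend as extracted in this copy (assumed to be
# one digit per aligned position; 3 stands for three or more differences).
DIFF_CODES = (
    "0203000000" "0330220001" "0030112130" "1000000023" "0130300130"
    "3300033010" "3001012331" "1010010230" "1111233333" "33300010"
)


def fraction_conserved(codes: str, max_changes: int = 1) -> float:
    """Fraction of positions whose difference code is <= max_changes."""
    counts = [int(ch) for ch in codes if ch.isdigit()]
    conserved = sum(1 for n in counts if n <= max_changes)
    return conserved / len(counts)


frac = fraction_conserved(DIFF_CODES)
print(f"{len(DIFF_CODES)} positions; {frac:.1%} show 0-1 base changes")
```

Raising `max_changes` to 2 gives the share of positions with at most two differences, which is a quick way to see how sharply conservation falls off toward the 3' end of the compared region.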
IV. Sequence and Structural Analyses of PRP Genes

One of the family of mouse PRP genes was isolated on a cloned 3600-bp EcoRI/BglII-generated DNA fragment from a partial Sau3A bacteriophage library of CD-1 mouse chromosomal DNA (1). The transcriptional unit included three exonic sequences separated by 1434 bp (intron I) and by 450 bp (intron II). The upstream sequence (Fig. 7) had putative induction sites for cAMP (box III and box I) and an activator or enhancer sequence (box II), Z-DNA sequences that flanked an 86-bp sequence, a TATA box, and a CAAT box (1). The derived amino-acid sequence of this PRP gene (MP2) revealed a protein that contained 13 tandemly arranged repeats of 14 amino acids with the prototype sequence P P P P G G P Q P R P P Q G (Fig. 8). Each amino
FIG. 6. Comparisons of the 5'-flanking regions and sequences encoding exon I of PRP cDNAs and PRP genes from mouse (MP2, pUMP40, pUMP12, and pUMP1), rat (pRP25, pRP33, and pRP18), human (CPTI, CPT3, and CP6), hamster (H29), and rat CRPB. The legend indicating differences of 0, 1, 2, and ≥3 denotes the relative conservation of bases at each position. In 65% of the positions, there is only 0-1 base change.
Legend (number of differences at each aligned position): 02030000000330220001003011213010000000230130300130330003301030010123311010010230111123333333300010
FIG. 7. Comparisons of upstream sequences of mouse PRP gene MP2 and hamster PRP gene H29. The upstream sequences of MP2 and H29 are aligned to maximize sequence similarities. Putative regulatory regions are indicated. Boxes III and I of MP2, -640 to -623 and -218 to -203, respectively; arrows, AP-1 binding sites; GCCAAT (-513 to -508), CCAAT box.
  1  M L V V L F T V A L L A L S S              SIGNAL PEPTIDE
 16  A Q G P R E E L Q N Q I Q I P N Q R        TRANSITION REGION
 34  P P P S G F Q P R P P V N G S Q Q G
 52  P P P P G G P Q P R P P Q G                REPEAT REGION
 66  P P P P G G P Q P R P P Q G
 80  P P P P G G P Q P R P P Q G
 94  P P P P G G P Q P R P P Q G
108  P P P P G G P Q Q R P P Q G
122  P P P P G G P Q P R P P Q G
136  P P P P G G P Q L R P P Q G
150  P P P P A G P Q P R P P Q G
164  P P P P A G P Q P R P P Q G
178  P P T T - G P Q P R P T Q G
191  P P P T G G P Q Q R P P Q G
205  P P P P G G P Q P R P P Q G
219  P P P P G G P Q P S P T Q G
233  P P P T G G P Q Q T P P L A G N T X G
252  P P Q G R P Q G P R (261) STOP             CARBOXYL TERMINUS

FIG. 8. Amino-acid sequence of PRP GPMsm derived from the nucleotide sequence of mouse PRP gene MP2.
acid within the repeat had its "favored" codon (1) (Table I), and six amino acids had a total conservation of codons for all 13 repeats. Subsequent studies (2) showed that two nonallelic PRP genes (MP2 and M14) are tandemly arrayed and separated by about 30 kbp. Analysis of DNA sequences suggested that MP2 and M14 arose via gene duplication of a common ancestor. A homology matrix, or "dot-plot," showed virtually no spurious background, and, aside from three differences, the sequences of the two genes, including the introns, were nearly identical. The differences observed were two additional sequences in intron I of M14 of 223 and 2005 bp, four additional repeats (17 repeats total) in M14, and fractional sequence differences of the simple repetitive sequences (CA, TA, and TAGA) of intron
TABLE I
COMPARISON OF CODON USAGEᵃ

Amino acidᵇ    Codon    MP2    Other mouse genes
Pro (47)       CCU       16     31
               CCC       16     25
               CCA       60     38
               CCG        8      6
Gly (19)       GGU        0     21
               GGC       62     26
               GGA       31     32
               GGG        7     21
Gln (17)       CAA       53     35
               CAG       47     65
Arg (7)        CGU        6      7
               CGC        0     13
               CGA        6     13
               CGG        6     12
               AGA       64     31
               AGG       18     24
Thr (4)        ACU       23     25
               ACC        0     36
               ACA       77     32
               ACG        0      7
term           UAA      100      0
               UAG        0      0
               UGA        0    100

ᵃ Reprinted with permission from the Journal of Biological Chemistry (1).
ᵇ Amino-acid composition in mol%.
I. The additional 2005-bp sequence was the 3' portion of the mouse LINE element, and it apparently had been transposed into intron I, but in the orientation opposite to that of M14 (Fig. 9). This mouse LINE sequence (L1Md-PRP), like most mouse LINE elements, is truncated at the 5' end (35). L1Md-PRP contains the typical polyadenylation signal (AATAAA) and an adenine-rich sequence, and it is flanked by a pair of 10-bp imperfect repeats (TGTCTTTTTT and TGTCTTTCTT). This 10-bp sequence is present only once in MP2. These and other data are strong evidence that L1Md-PRP entered this PRP locus via transposition.

Both MP2 and M14 are transcriptionally active in the parotid gland when the mouse is treated with isoproterenol (11). PRP cDNAs pUMP4 and pUMP40 are encoded by M14 and MP2, respectively. The number of tandem repeats within each gene varies, as we indicated (1, 36), and is similar to those reported (37) using variable number of tandem repeats (VNTRs) as markers for mapping human genes. However, PRP tandem repeats are the major body of the active gene, and no sequence similarity exists between this repeat and the invariant core sequence of VNTRs (37, 38).

FIG. 9. Linkage of mouse PRP genes MP2 and M14. The organization of MP2 and M14 is shown by the expanded scales, and the relative lengths are indicated by the kilobase bar. Solid bars show the three exonic regions, and the open arrow (M14) represents the LINE insert. Arrows show the direction of transcription. [Reprinted with permission from the Journal of Biological Chemistry (2).]
[Map not reproduced; scale 0-80 kb; rows for genes, restriction sites (HindIII, SalI, BamHI, EcoRI), and clones.]

Sequence analysis of a hamster PRP gene (H29) showed that the hamster, rat, mouse, and human PRP genes are all closely related. Mouse and rat PRP genes have two exons encoding PRPs (I and IIb) (Fig. 10), while hamster and human genes have three exons (I, IIa, and IIb). Exon IIa of both the hamster and the human PRP genes comprises 36 bp, seemingly coming from the 5' sequences of exon IIb of the mouse and rat. Whether this difference in PRP gene organization resulted from a separation or combination of exonic regions is unknown. Upstream regulatory regions of mouse, hamster, and human PRP genes are discussed in Section V.

FIG. 10. Comparison of PRP gene organizations in mice, hamsters, and humans. Related exonic regions (bars) are connected by dashed lines. The gene organizations of hamster H29 (36) and human PRH1 (8) show the additional 36-bp exonic sequence. [Reprinted with permission from the Journal of Biological Chemistry (2).]

Unlike PRPs from mice and rats, which are all blocked on the amino terminus, hamster PRPs Hp43a and Hp43b were partially sequenced from the amino terminus (39). The open reading frame of exon I (H29) encodes hydrophobic residues, the putative signal peptide. Exon IIa contains only 36 bp, and the derived amino-acid sequence is A T I Y E D S I S Q L S, which is exactly the sequence of the amino terminus of Hp43a, except in position 8, which is D instead of I. Exon IIb contains 514 bp and encodes the mature protein, except for the first 12 amino acids. Exon IIb has 10 HaeIII sites and six Sau96I restriction enzyme sites, which is one of the unusual characteristics of PRP genes (1). One open reading frame of exon IIb encodes a 20-amino-acid peptide that is repeated five times and has a prototype sequence of P P Q Q E G Q Q Q N R P P K P Q N Q E G. The first 43 amino acids and the last 12 amino acids derived from the nucleotide sequences of exons IIa and IIb diverge from this prototype repeat pattern and give rise to the transition region and the carboxyl-terminal region, respectively.

While the 5'-noncoding regions and the sequences encoding the putative signal peptides (about 100-110 bp) are highly conserved in all PRP mRNAs (Fig. 6), there is a discrepancy in the apparent cleavage site by the signal peptidase in hamster PRPs, as suggested by the amino-acid sequence of Hp43a. From the amino-acid sequence derived from PRP gene H29, the nascent polypeptide chain has the sequence M L V V L L T A A L L A ↓ E H ↑ A T I Y E ----, with Glu (E) and His (H) preceding the apparent site of cleavage (↑) (see the amino-terminal sequence encoded by exon IIa, above, A T I Y E D -----). Histidine has not been observed in position -1 (counting from the cleavage site) (40). We have proposed that the signal peptidase cleaves between Ala and Glu (↓) and that Glu and His are removed by further processing (36). However, this proposal predicts an unusually short signal peptide of only 12 amino acids.
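The degree to which the 13 tandem repeats of the MP2 product (Fig. 8) have drifted from the 14-residue prototype can be made concrete by counting mismatches per repeat. The sketch below is illustrative only: the repeats are transcribed from Fig. 8 as it appears in this chapter (with "-" kept as the alignment gap in the repeat starting at residue 178), and the names `PROTOTYPE`, `REPEATS`, and `mismatches` are hypothetical. A gap is counted as a mismatch, which is an assumption of this sketch rather than a convention stated in the text.

```python
PROTOTYPE = "PPPPGGPQPRPPQG"

# Start residue -> repeat sequence, transcribed from Fig. 8 ("-" = gap).
REPEATS = {
    52: "PPPPGGPQPRPPQG",
    66: "PPPPGGPQPRPPQG",
    80: "PPPPGGPQPRPPQG",
    94: "PPPPGGPQPRPPQG",
    108: "PPPPGGPQQRPPQG",
    122: "PPPPGGPQPRPPQG",
    136: "PPPPGGPQLRPPQG",
    150: "PPPPAGPQPRPPQG",
    164: "PPPPAGPQPRPPQG",
    178: "PPTT-GPQPRPTQG",
    191: "PPPTGGPQQRPPQG",
    205: "PPPPGGPQPRPPQG",
    219: "PPPPGGPQPSPTQG",
}


def mismatches(repeat: str, prototype: str = PROTOTYPE) -> int:
    """Number of aligned positions at which a repeat differs from the prototype."""
    return sum(a != b for a, b in zip(repeat, prototype))


for start, repeat in REPEATS.items():
    print(f"{start:>3}  {repeat}  {mismatches(repeat)} mismatch(es)")

# Repeats that match the prototype exactly.
identical = [start for start, repeat in REPEATS.items() if mismatches(repeat) == 0]
print("identical to prototype:", identical)
```

In this transcription the amino-terminal repeats match the prototype most closely, while the repeats nearer the carboxyl terminus carry most of the substitutions.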
V. Regulation of Expression of PRP Genes

The upstream sequences of mouse PRP genes MP2 and M14 and hamster PRP gene H29 contain potential regulatory elements (Figs. 7 and 11). In each of these genes, three highly conserved regions were identified (boxes I-III) (1). These regions include two putative cAMP response elements (boxes
Source                      Start   Sequence                                  End    Reference
E. coli CRP binding site            A A N T G T G A N N - T N N N - N C A            (41)
Bovine α-gonadotropin       -122    T T A T G T G A A G - T A C - - - C A    -108    (55)
Human VIP                   -81     T A C T G T G A C G - T C T T - T C A    -65     (56)
PRP genes, box III
  Mouse MP2                 -640    A A A T G T A A C A G T C A A - A C A    -623    (1)
  Mouse M14                 -640    C A A T G T A A C A G T C A A - A C A    -623    (2)
  Hamster H29               -435    G C A T G T A A C A - T C A T G C C A    -418    (36)
PRP genes, box I
  Mouse MP2                 -218    A A A T G T C A G C - A A A T - G C A    -203    (1)
  Mouse M14                 -217    A A T T G T C A G C - T A A T - G C A    -202    (2)
  Hamster H29               -114    A A A T G T C A G T - - G A T - G C A    -99     (36)
Human PRP genes
  PRH1, PRH2                -484    A A A T G T G A A A A T A C C A A A T A T A C A A A T A T C    -467    (8)

FIG. 11. Sequence comparisons of the E. coli CRP binding site with putative cAMP regulatory sites (boxes I and III) of PRP gene MP2. Similar sequences in the E. coli CRP binding site are reported in bovine α-gonadotropin, human vasoactive intestinal peptide (VIP), and human PRH1 (PRP) genes. The relative frequencies of A, T, G, and C at each position are indicated. Positions 4, 5, 6, and 8 (TGT-A) are totally conserved. Positions 12, 18, and 19 each show only one substitution: A for T (12) in MP2, and A for C (18) and C for A (19), both found in human PRH1.
I and III) with sequences similar to the CRP binding site required for transcriptional activation of cAMP-regulated genes in Escherichia coli (1, 41) (Fig. 11). Also, box III of the mouse PRP genes contains a sequence (-637 TAACAGTCA -629) which resembles the 8-bp palindromic sequence TGACGTCA, a cAMP response element (CRE) in eukaryotes (42). The palindrome is imperfect by one base (A for G), and it is interrupted by an A. This sequence in box III of the hamster gene is similar, but it lacks a G (TAAC-TGA). Such overlapping sequences of CRP and CRE have been shown to be functionally related (43). Mammalian activating transcription factors (ATFs) can bind specifically to some E. coli CRP sites, and, conversely, E. coli CRP-binding protein specifically binds to some mammalian ATF sites (43). Of considerable interest is the observation that multiple AP-1 binding sites (44) are present immediately 3' to box III in PRP genes MP2 and M14 (Fig. 11). These sequences are not present in the hamster gene. A perfect
copy of the AP-1 heptamer, TGACTCA, is located at positions -594 to -588. A totally conserved CCAAT box (GCCAAT) is located at positions -516 to -509 in MP2 and M14. Proteins known to bind to the CCAAT box (CTF/NF-1 proteins) activate both transcription and replication (45). The proline-rich transcriptional activator of CTF/NF-1 is distinct from the replication and DNA binding domain in that it requires an additional carboxyl-terminal domain (46). Preliminary studies using gel-mobility-shift assays (47) have shown that nuclear extracts from the parotid glands of isoproterenol-treated mice have about a 6-fold increase in protein(s) binding to the upstream sequence (-702 to -574 bp) of MP2. “Footprint” assays indicate that the nuclear protein(s) binds to the AP-1 repeats. Adding Bt2cAMP or forskolin to hamster parotid gland primary-cell cultures resulted in a large increase (i.e., 15- to 30-fold) in PRP mRNA levels (48). The increase was most dramatic between 10 and 18 hr of treatment. Treatment of the cells with cycloheximide blocked this induction of PRP mRNAs, which is further evidence that the synthesis of a trans-acting factor is necessary for the dramatic increase in transcriptional activation of the PRP genes. α-Amylase mRNA was not significantly affected by the cycloheximide treatment. Transfections have been performed using the plasmid pUMP2BE, which contains the complete MP2 gene, and with various constructs containing deletions of the upstream sequence of MP2. Constructs containing the sequence -702 to -574 bp of MP2 in tandem with the Rous sarcoma virus (RSV) promoter and the chloramphenicol acetyltransferase (CAT) gene showed induction of PRP mRNAs of 2- to 4-fold. Various cell types have been used for the transfection experiments, including PC-12, AtT20, and LM cells. Presently, we are attempting transfection experiments with a parotid-hepatoma cell line prepared by fusion of FTO-2B cells and parotid gland primary cells (49).
This may be the only “immortalized” cell line of parotid gland cells available, and these cells may respond more dramatically to transfections with the PRP gene regulatory sequence.
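The comparison of the box III sequence with the eukaryotic CRE described in this section can be checked with a few lines of code. The following is only an illustrative sketch, not a method used in the studies cited: the three sequences are taken from the text, while the function names and the approach (removing one base at a time and counting mismatches) are assumptions made for this example.

```python
# Sequences quoted in the text.
CRE = "TGACGTCA"        # 8-bp cAMP response element (42)
BOX_III = "TAACAGTCA"   # mouse box III, positions -637 to -629
AP1 = "TGACTCA"         # AP-1 heptamer (44)

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindrome(seq):
    """A DNA palindrome reads the same as its reverse complement."""
    return seq == reverse_complement(seq)


def mismatches(a, b):
    """List (1-based position, base in a, base in b) where two
    equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def best_single_deletion(site, consensus):
    """Delete each base of `site` in turn (site is one base longer than
    `consensus`) and keep the deletion giving the fewest mismatches.
    Returns (1-based deleted position, trimmed site, mismatch list)."""
    best = None
    for i in range(len(site)):
        trimmed = site[:i] + site[i + 1:]
        found = mismatches(trimmed, consensus)
        if best is None or len(found) < len(best[2]):
            best = (i + 1, trimmed, found)
    return best


if __name__ == "__main__":
    print(is_palindrome(CRE))                  # True
    print(best_single_deletion(BOX_III, CRE))  # (5, 'TAACGTCA', [(2, 'A', 'G')])
```

Under these assumptions, removing the A at position 5 of TAACAGTCA leaves TAACGTCA, which differs from the CRE at a single position (A for G), in agreement with the description of the element as a palindrome that is imperfect by one base and interrupted by an A.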
VI. Functional Aspects of PRPs
The high conservation of the sequences and structures of PRP genes and PRPs argues for specific biological functions for these unusual proteins. Some of the proposed functions, such as calcium binding, hydroxylapatite binding, formation of the dental-acquired pellicle, and agglutination of oral bacteria, have been reviewed, especially for human PRPs (12, 22, 23). In 1983, we showed that PRPs, which were dramatically induced in rat parotid glands by feeding tannins, are beneficial to the rat (50). Condensed tannins (proanthocyanidins, oligomers of flavan-3-ols) and hydrolyzable tannins
(oligomers of gallic acid) are present in many foods. Antinutritional effects and other toxic and pathological properties, such as carcinogenicity and hepatotoxicity, have been associated with the ingestion of tannins and with the medicinal use of tannins. The general properties of tannins, their effects on biological systems, and specifically their roles in the induction of PRPs have recently been reviewed (13). Seeds of bird-resistant cultivars of sorghum, a major cereal crop of the semi-arid tropics, contain high levels of tannin (high-tannin sorghum), which diminishes the nutritional value of the grain. Studies designed to define the interactions of tannins and proteins show that tannins have an extremely high affinity for proteins rich in proline (51), and that the salivary PRPs have the highest affinity (50). Because the gastrointestinal tract, specifically the oral cavity, is the source of PRPs, it was suggested that salivary PRPs might interact with tannins and serve as a defense against the detrimental effects of dietary tannins. While the dramatic induction of PRPs in the rat following isoproterenol treatment clearly offsets the usual detrimental effects of dietary tannins (50), parotid glands from rats fed high-tannin sorghum (i.e., 2% of their diet) without isoproterenol treatment were also enlarged about 4-fold, and there was a dramatic increase in PRPs within 3 days. Thus, tannins in the diet mimic the effects of the β-agonist isoproterenol on the parotid glands. There was an initial weight loss on the 2% tannin diet, which was reversed at 3 days, the time of maximal stimulation of PRP synthesis. After this time, the animals grew at close to the normal rate. Amino-acid analysis, electrophoretic patterns of proteins, and cell-free translations of mRNAs all confirmed that the PRPs induced in parotid glands by feeding tannin are identical to those induced by isoproterenol.
Subsequent studies show that the β-antagonists propranolol (a mixed β1, β2-antagonist) and atenolol (a β1-antagonist), when included in the diet, block the induction of PRPs by dietary tannin. Butoxamine, a β2-specific blocker, had no effect; therefore, the β-adrenergic receptor affected by tannin feeding is the β1-receptor. The addition of either propranolol or atenolol to the diet of rats also causes substantial increases in four proteins in the submandibular glands, of MW 145,000 (GP145), 42,000 (P42), 40,000 (P40), and 39,000 (P39) (52). GP145 is glycosylated. These proteins are tissue-specific, as they were not detected in the parotid or sublingual gland, lung, liver, pancreas, kidney, heart, or small intestine either before or after propranolol treatment. We believe that this is the first report on the induction or regulation of protein synthesis by a β-adrenergic blocker. The hamster was used as another animal model to study the regulation and expression of PRPs. The hamster responded to isoproterenol by the induction of a series of proteins (39). However, the protein encoded by “PRP” gene H29 was unusually high (34%) in Gln and was only 15% Pro.
Also, there was no evidence of a hypertrophic response in the hamster salivary glands. Subsequent studies of feeding tannins to hamsters also showed essentially no hypertrophic response, and PRPs were not induced. Weanling hamsters fed a diet of 2% tannin lose weight for about 3 days, as do rats and mice, but then an unusual growth inhibition occurs (39). Hamsters maintained on a 2% tannin diet failed to grow, and even at 60 days were essentially the same body weight as at 3 days after starting the feeding trial. When diets were switched, the experimental animals gained weight at close to the normal rate for young hamsters, while the control animals, then on a 2% tannin diet, lost about 20% of their weight. Clearly, the detrimental effects of tannins are reversed or inhibited by the induction of PRPs in rats and mice, but hamsters are unusually susceptible to tannins. In fact, increasing the tannin content of the diet to 4% was fatal to most hamsters within 3 days.
VII. Discussion
Specialized cells in eukaryotes variably express different genes during differentiation and development. Exocrine glands, such as the pancreas and the salivary glands, have served as models of secretory tissues. Under ordinary conditions, the salivary glands of adult animals are relatively stable and do not change appreciably in cell size or number (27). However, administration of the catecholamine isoproterenol causes dramatic morphological, cytological, and biochemical changes. Morphologically, the parotid glands can increase up to 10-fold in size. Cytologically, about 50% of the acinar cells are polyploid within 2 days of treatment. Biochemically, a dramatic induction of the multigene family encoding the PRPs is observed. The expression of PRPs in the parotid and submandibular glands is tissue-specific or, possibly more correctly, cell-specific. PRPs have been identified immunochemically in the trachea (53) and the pancreas (54), but there is no evidence that PRP genes in these tissues respond as in the salivary glands to isoproterenol treatment. Small amounts of PRP mRNAs are observed in the mouse pancreas after isoproterenol treatment (C. A. Bidwell and D. M. Carlson, unpublished), but these results were variable. Current data suggest that transcriptional controls, tissue-specific factors, and post-translational modification share the role of principal modulators of the expression of the PRP gene families.
ACKNOWLEDGMENT
These studies were supported in part by NIH grant DK 36812. P.S.W. was supported by NIH training grant T32 HL 7013-13.
REFERENCES
1. D. K. Ann and D. M. Carlson, JBC 260, 15863 (1985).
2. D. K. Ann, M. K. Smith and D. M. Carlson, JBC 263, 10887 (1988).
3. G. Heinrich and J. F. Habener, JBC 262, 5262 (1987).
4. L. Mirels, G. S. Bedi, D. P. Dickison, K. W. Gross and L. A. Tabak, JBC 262, 7289 (1987).
5. D. K. Ann, S. Clements, E. M. Johnstone and D. M. Carlson, JBC 262, 899 (1987).
6. H. Mehansho, D. K. Ann, L. G. Butler, J. Rogler and D. M. Carlson, JBC 262, 12344 (1987).
7. M. A. Ziemer, W. F. Swain, W. J. Rutter, S. Clements, D. K. Ann and D. M. Carlson, JBC 259, 10475 (1984).
8. H. S. Kim and N. Maeda, JBC 261, 6712 (1986).
9. H. Selye, M. Cantin and R. Veilleux, Growth 25, 243 (1961).
10. K. Brown-Grant, Nature 191, 1076 (1961).
11. S. Clements, H. Mehansho and D. M. Carlson, JBC 260, 13471 (1985).
12. A. Bennick, MCBchem 45, 83 (1982).
13. H. Mehansho, L. G. Butler and D. M. Carlson, Annu. Rev. Nutr. 7, 423 (1987).
14. A. Bennick and G. E. Connell, BJ 123, 455 (1971).
15. F. G. Oppenheim, D. I. Hay and P. Franzblau, Bchem 10, 4233 (1971).
16. A. Fernandez-Sorenson and D. M. Carlson, BBRC 60, 249 (1974).
17. D. L. Kauffman and P. J. Keller, Arch. Oral Biol. 24, 249 (1979).
18. J. Muenzer, C. Bildstein, M. Gleason and D. M. Carlson, JBC 254, 5623 (1979).
19. J. Muenzer, C. Bildstein, M. Gleason and D. M. Carlson, JBC 254, 5629 (1979).
20. R. S. C. Wong and A. Bennick, JBC 255, 5943 (1980).
21. H. Mehansho, S. Clements, B. T. Sheares, S. Smith and D. M. Carlson, JBC 260, 4418 (1985).
22. A. Bennick, J. Dental Res. 66, 457 (1987).
23. A. Bennick, J. Dental Res. 68, 2 (1989).
24. I. D. Mandel, R. H. Thompson and S. A. Ellison, Arch. Oral Biol. 10, 499 (1965).
25. T. Barka, Exp. Cell Res. 37, 662 (1965).
26. R. Baserga, FP 29, 1443 (1970).
27. C. A. Schneyer, in “Regulation of Organ and Tissue Growth” (R. J. Gross, ed.), 211 pp. Academic Press, New York, 1972.
28. M. R. Robinovitch, P. J. Keller, D. A. Johnson, J. M. Iverson and D. L. Kaufman, J. Dental Res. 56, 290 (1977).
29. M. A. Ziemer, A. Mason and D. M. Carlson, JBC 257, 11176 (1982).
30. H. Mehansho and D. M. Carlson, JBC 258, 6616 (1983).
31. H. O. Madsen and J. B. Hjorth, NARes 13, 1 (1985).
32. D. King, L. D. Snider and J. B. Lingrel, MCBiol 6, 209 (1986).
33. N. Maeda, H.-S. Kim, E. A. Azen and O. Smithies, JBC 260, 11123 (1985).
34. D. Kauffman, R. Wong, A. Bennick and P. Keller, Bchem 21, 6558 (1982).
35. M. F. Singer and J. Skowronski, TIBS 10, 119 (1985).
36. D. K. Ann, D. Gadbois and D. M. Carlson, JBC 262, 3958 (1987).
37. Y. Nakamura, M. Leppert, P. O’Connell, R. Wolff, T. Holm, M. Culver, C. Martin, E. Fujimoto, M. Hoff, E. Kumlin and R. White, Science 235, 1616 (1987).
38. A. J. Jeffreys, V. Wilson and S. L. Thein, Nature 314, 67 (1985).
39. H. Mehansho, D. K. Ann, L. G. Butler, J. Rogler and D. M. Carlson, JBC 262, 12344 (1987).
40. G. von Heijne, JMB 173, 243 (1984).
41. B. de Crombrugghe, S. Busby and H. Buc, Science 224, 831 (1984).
42. W. J. Roesler, G. R. Vanderback and R. W. Hansen, JBC 263, 9063 (1988).
43. Y.-S. Lin and M. R. Green, Nature 340, 656 (1989).
44. P. K. V. Vogt and T. J. Bos, TIBS 14, 172 (1989).
45. C. Santoro, N. Mermod, P. C. Andrews and R. Tjian, Nature 334, 218 (1988).
46. N. Mermod, E. A. O’Neill, T. J. Kelly and R. Tjian, Cell 58, 741 (1989).
47. J. Zhou and D. M. Carlson, FASEB J. 4, 2131 (1990).
48. P. S. Wright, C. Lenney and D. M. Carlson, J. Mol. Endocrinol. 4, 81 (1990).
49. P. S. Wright and D. M. Carlson, FASEB J. 2, 3104 (1988).
50. H. Mehansho, A. Hagerman, S. Clements, L. G. Butler, J. Rogler and D. M. Carlson, PNAS 80, 3948 (1983).
51. A. Hagerman and L. G. Butler, JBC 256, 4494 (1981).
52. V. N. Subramaniam and D. M. Carlson, FASEB J. 4, 1980 (1990).
53. T. F. Warner and E. A. Azen, Am. Rev. Respir. Dis. 130, 115 (1984).
54. S. Ito, S. Isemura, E. Saitoh, K. Sanada, T. Suzuki and A. Shibita, Acta Endocrinol. 103, 544 (1983).
55. R. G. Goodwin, C. L. Moncman, F. M. Rottman and J. H. Nilson, NARes 11, 6873 (1983).
56. T. Tsukada, J. S. Fink, G. Mandel and R. H. Goodman, JBC 262, 8743 (1987).