BioSystems 45 (1998) 29 – 44
Computer analysis of distribution of putative cis- and trans-regulatory elements in milk protein gene promoters
Tadeusz Malewski
Institute of Genetics and Animal Breeding, Polish Academy of Sciences, Jastrzębiec, 05-551 Mroków, Poland
Received 9 June 1997; accepted 11 August 1997
Abstract

Multiple alignment of 28 milk protein gene promoters belonging to seven protein superfamilies is described. In these gene promoters three groups of common motifs were found: group I, specific for all milk protein gene promoters; group II, specific only for one gene superfamily; and group III, motifs shared by several gene superfamilies. Motifs of groups I and III do not have any preferential location in the promoters, while group II motifs are located in the proximal part, from −36 to −224. Milk protein gene promoters were analysed for the presence of putative binding sites for nine transcription factors important for the expression of this group of genes. The transcription factor binding sites for C/EBP, CTF/NF1, MAF and MGF were found in all promoters investigated. The set of putative transcription factor binding sites or response elements for GRE, IRE, PMF, STR and YY1 is unique for every gene superfamily. © 1998 Elsevier Science Ireland Ltd.

Keywords: Transcription factor; Promoter; Gene expression; Casein; Mammary gland
0303-2647/98/$19.00 © 1998 Elsevier Science Ireland Ltd. All rights reserved. PII S0303-2647(97)00059-2

1. Introduction

Milk proteins are a small group of proteins belonging to different superfamilies. According to the Protein Information Resource (PIR) database, this group consists of seven superfamilies: αS1-, αS2-, β- and κ-caseins, lysozyme c, lipocalin and antileucoproteinase. The casein superfamilies are represented only by the αS1-, αS2-, β- and κ-casein genes, respectively, and lysozyme c by the α-lactalbumin and lysozyme genes; the lipocalin and antileucoproteinase superfamilies include many genes, among them milk protein genes. β-Lactoglobulin belongs to the lipocalin superfamily and whey acidic protein (WAP) to the antileucoproteinase superfamily.

Expression patterns of milk protein genes share common features. The genes are expressed only in the mammary gland during late pregnancy and lactation; their expression is stimulated by the synergistic action of three lactogenic hormones (insulin, glucocorticoids and prolactin) and attenuated by progesterone (Vonderhaar and Ziska, 1989). Transcription of these genes is very active during lactation: casein mRNA constitutes 80% (Guyette et al., 1979) and WAP mRNA 10–15% (Hobbs et al., 1982) of total cellular mRNA, which makes them attractive objects for biotechnology.

Common expression features suggest the existence of common motifs and of a common set of transcription factor binding sites in the promoter regions of these genes. Multiple alignment of three to eight milk protein gene promoters showed the existence of common motifs (Groenen, 1992; Laird et al., 1988; Yoshimura and Oka, 1989; Yu-Lee et al., 1986); however, the question of which of these motifs are specific for the whole group of milk protein gene promoters was left open.

So far eight transcription factors important for the expression of milk protein genes, and the locations of their binding sites in some of these gene promoters, have been described: (1) mammary gland-specific nuclear factor (MGF), recently identified as a member of the signal transducers and activators of transcription family (STAT5) (Wakao et al., 1994), which also appears to be identical to the milk protein-binding factor (MPBF) (Burdon et al., 1994); (2) mammary cell-activating factor (MAF), a member of the Ets-related proteins (Welte et al., 1994a); (3) pregnancy-specific mammary nuclear factor (PMF) (Lee and Oka, 1992); (4) CCAAT/enhancer-binding protein (C/EBP) (Raught et al., 1995); (5) CTF/NF1 (Li and Rosen, 1995a); (6) single-stranded DNA-binding transcriptional regulator (STR) (Altiok and Groner, 1994); (7) yin and yang (YY1) (Meier and Groner, 1994); (8) glucocorticoid receptor (GR) (Welte et al., 1993).

Transcription factors recognize sequences mostly 6–16 bp long (Wingender, 1993), and the theoretical probability of the accidental appearance of a transcription factor binding site at a given position is low: $1/4^{6}$ to $1/4^{16}$. Thus, the occurrence of various consensuses in gene promoters can indicate which transcription factors may be involved in the regulation of a gene.
Estimation of putative transcription factor binding sites can yield an insight into regulation of gene expression. Recent computer analysis of milk protein gene promoters showed their very complex structure (Malewski and Zwierzchowski, 1995).
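The chance estimate given above can be made concrete with a short calculation. The following is a minimal sketch, assuming equiprobable and independent bases and a fully specified (non-degenerate) site; the function names and the 2000-bp promoter length are illustrative assumptions and are not part of the original analysis.

```python
def chance_probability(site_length: int) -> float:
    """Probability that one fixed position matches a fully specified site,
    assuming the four bases are equally likely and independent."""
    return 0.25 ** site_length


def expected_chance_hits(site_length: int, promoter_length: int,
                         both_strands: bool = True) -> float:
    """Expected number of chance matches in a promoter of the given length.

    Assumption: every start position is treated as an independent trial and
    the two strands are simply counted twice (palindromic sites would be
    double-counted by this simplification).
    """
    positions = max(promoter_length - site_length + 1, 0)
    strands = 2 if both_strands else 1
    return strands * positions * chance_probability(site_length)


if __name__ == "__main__":
    for k in (6, 16):  # the 6-16 bp range quoted in the text
        print(k, chance_probability(k), expected_chance_hits(k, 2000))
```

Under these assumptions a 6-bp site is expected about once by chance on the two strands of a 2-kb promoter, whereas a 16-bp site is expected roughly 10^-6 times. This is one reason why matches to short or degenerate consensuses are best regarded as putative sites only.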
In this report we describe multiple alignment of the promoter regions of milk protein genes and a search of these regions for putative cis- and trans-regulatory sequences. The aims of this study were (1) to find motif(s) common to the whole group of gene promoters as well as superfamily-specific ones, (2) to map putative regulatory sites in the individual gene promoters, and (3) to estimate the sets of transcription factor binding sites common to the whole group of milk protein genes and those which are superfamily-specific.
2. Material and methods

The Milk Gene Promoters database was compiled by extracting all 5′ upstream sequences of milk protein genes from GenBank (release R92.0). The database is presented in Table 1. It consists of 28 promoters which represent eight genes in 11 species. The sequenced regions cover 391–4835 bp upstream of the transcription initiation site.

Table 1
Sequenced milk protein gene promoters

Species                  Promoter gene      GenBank access. no.   Abbreviation   Length (bp)a
Bos taurus               αS1-Casein         X59856                BOVAS1         2131
                         αS2-Casein         M94327                BOVAS2         2486
                         β-Casein           X14711                BOVB           1722
                         κ-Casein           M75887                BOVK           2143
                         α-Lactalbumin      M90645                BOVLA          1977
                         β-Lactoglobulin    X14710                BOVBGL         2171
Bos indicus              α-Lactalbumin      Z12029                BOVINLA        510
Capra hircus             β-Casein           M90559                GOATB          1740
                         κ-Casein           Z33882                GOATK          2245
                         α-Lactalbumin      M63868                GOATLA         770
                         β-Lactoglobulin    Z33881                GOATBGL        2148
Cavia cutleri            α-Lactalbumin      Y00726                GUILA          1173
Homo sapiens             α-Lactalbumin      X05153                HUMLA          735
Macropus eugenii         β-Lactoglobulin    L14954                MACBGL         678
Mus musculus             β-Casein           X13484                MUSB           4835
                         α-Lactalbumin      M87863                MUSLA          546
                         WAP                X79437                MUSWAP         4070
Ovis aries               κ-Casein           L31372                SHEEPK         1219
                         β-Lactoglobulin    X68105                SHBGL          4203
Oryctolagus cuniculus    α-Casein           b                     RABA           1341
                         β-Casein           b                     RABB           2096
                         κ-Casein           b                     RABK           70
                         WAP                b                     RABWAP         1798
Rattus norvegicus        α-Casein           X03584                RATA           682
                         β-Casein           M10936                RATB           783
                         γ-Casein           X03589                RATG           679
                         α-Lactalbumin      X00461                RATLA          1246
                         WAP                X01153                RATWAP         1199
Sus scrofa               α-Lactalbumin      L31944                PIGLA          391

a Length of the sequenced 5′ upstream part of the promoter.
b Only three accession numbers (M77195, X15735, X52564) are listed for the four rabbit promoters; their assignment to the individual genes cannot be recovered from this copy.

Multiple alignment of milk protein gene promoters and the search for homology to specific motifs and transcription factor response elements were performed with the HIBIO DNASIS program package (Hitachi), version 2.10. Multiple alignment was performed using the Higgins and Sharp algorithm with the following analysis parameters: gap penalty, from 1 to 5; fixed gap penalty, from 1 to 10; floating gap penalty, from 1 to 10; number of top diagonals, 5; k-tuple, 4; window size, 5. The homology search was performed at two levels: high, >82% (e.g. one base substitution in a 6-bp sequence), and low, 67–82% (e.g. two base substitutions in a 6-bp sequence). For the homology search the algorithm developed by Lipman and Pearson was used with the following analysis parameters: cutoff score, −16; ktup, 1; sequence mode, normal; initial gap penalty score, −12; incremental gap penalty score, 4.

Homology was searched for motifs described as typical for milk protein genes:

a 224-bp-long sequence described by Groenen (1992), henceforth referred to as the ‘Groenen structure’:
G(N)5TNT(N)4TNNT(N)6ANT(N)15ANTNC(N)4TTCCTGGACACATTTCCTTT(N)9TNANTNNNT(N)5TNNNT(N)4A(N)4CNNNGAATNT(N)4GGANANNANANT(N)4AGAANA(N)4TTTCNNAT(N)7ANTTCTTNGAATTNA(N)12T(N)11AAACCACANAATTAGCATNTNA(N)13TATAWAT

the milk box, a 33-bp-long nucleotide sequence (Laird et al., 1988):
RGAAGRAAANTGGACAGAAANTCAACGTTTCTA

sequences described by Yu-Lee et al. (1986), henceforth referred to as ‘Yu-Lee sequences’:
(1) YYGTTKRAGA; (2) MCYYAGAATYT; (3) RGAASRAAWTVSAVAGAARHGAWTTTCYWAT; (4) TTCTTAGAATT; (5) RAAACCACARAATTAGCAT; (6) RGTWTAWATAG

sequences described by Yoshimura and Oka (1989), henceforth referred to as ‘Oka boxes’:
(a) CCCTAGAATTTCTGG; (b) TTCTTGAATTAA; (c) AAACCACAAAATTAGCATTTTA

a sequence described by Vilotte and Soulier (1992), henceforth referred to as the ‘Vilotte sequence’:
KRMTTCYTRGAAYYR

The transcription factors analysed and their consensuses are listed in Table 2.
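The two-level homology criterion described in the Methods can be illustrated with a simplified scan. The sketch below is not the Lipman and Pearson search used in DNASIS; it is an ungapped sliding-window comparison that interprets degenerate positions with the IUPAC one-letter code. The function names, the rounding of percentages to whole numbers and the treatment of N as always matching are assumptions made for illustration.

```python
# IUPAC one-letter nucleotide codes (degenerate symbols expand to base sets).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def window_homology(motif: str, window: str) -> float:
    """Percentage of motif positions compatible with the aligned window."""
    matches = sum(base in IUPAC[code] for code, base in zip(motif, window))
    return 100.0 * matches / len(motif)


def classify(score: float):
    """Map a percentage onto the two levels used in the text.

    Assumption: percentages are rounded to whole numbers, so that 5/6 (83%)
    counts as high and 4/6 (67%) as low, as in the 6-bp examples of the text.
    """
    rounded = round(score)
    if rounded > 82:
        return "high"
    if rounded >= 67:
        return "low"
    return None


def scan(motif: str, sequence: str):
    """Return (start, score, level) for every ungapped window that reaches
    at least the low homology level."""
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        score = window_homology(motif, sequence[start:start + len(motif)])
        level = classify(score)
        if level is not None:
            hits.append((start, score, level))
    return hits


if __name__ == "__main__":
    # Yu-Lee sequence (4) embedded in an arbitrary toy sequence.
    for hit in scan("TTCTTAGAATT", "ACGTTTCTTAGAATTACGT"):
        print(hit)
```

Because no gaps are allowed and N positions always count as matches, this sketch would over-score highly degenerate motifs such as the Groenen structure; it is meant only to show how the >82% and 67–82% thresholds partition candidate hits.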
3. Results
3.1. Comparison of milk protein gene promoters

The efficiency of multiple alignment depends on a correct choice of analysis parameters. To estimate the optimal parameters, alignment was performed on a set of short, 250-bp-long fragments of milk protein gene promoters at three levels, with increasing stringency for introducing a gap and for increasing its size. The parameters of analysis were as follows:

Level 1: gap penalty, 1; fixed gap penalty, 1; floating gap penalty, 1.
Level 2: gap penalty, 4; fixed gap penalty, 4; floating gap penalty, 4.
Level 3: gap penalty, 5; fixed gap penalty, 10; floating gap penalty, 10.

Comparison of dendrograms of the aligned 250-bp-long promoter fragments shows that the results obtained at stringency levels 2 and 3 are similar but differ from those obtained at level 1 (data not shown). Because level 2 has a better ability to detect common motifs, it was chosen for alignment of the entire milk protein gene 5′ upstream sequences extracted from the GenBank database as well as of their 1900-bp-long and 510-bp-long proximal fragments. The results of alignment are shown in Fig. 1.

The entire 5′ upstream regions of the β-casein and α-lactalbumin genes as well as their 250-bp-long and 510-bp-long fragments have similar levels of sequence homology. The level of homology did not increase in the region adjacent to the transcription initiation site. The opposite was found for WAP gene promoters: their 250-bp-long proximal fragments have higher homology than more distal sequences. Regions adjacent to the transcription initiation site are also more conserved than distal regions in the bovine αS2- and rat γ-casein gene promoters. A 250-bp-long proximal fragment of the bovine κ-casein promoter also has higher homology to the corresponding fragments of the goat and sheep κ-casein promoters than the more distal sequences.

Promoters of the β-lactoglobulin, β- and κ-casein genes form a heterogeneous group. The similarity of different fragments of the goat and sheep β-lactoglobulin and of the goat and bovine β-casein gene promoters does not depend on their position in relation to the transcription initiation site. The most surprising features are shown by the goat and sheep κ-casein, mouse and rat β-casein, and bovine, goat and sheep β-lactoglobulin gene promoters: their distal fragments show a higher similarity than their proximal fragments. For example, the proximal 250-bp-long fragments of the mouse and rat β-casein gene promoters show 10% homology and the 510-bp-long fragments 33%, while the entire promoters show 53% homology.

Table 2
Transcription factors and their consensuses analysed in milk protein gene promoters

Transcription factor or response element                 Abbreviation   Consensus a         Ref.
CCAAT/enhancer-binding protein                           C/EBP          TKNNGNAAK           1
CCAAT-binding transcription factor/Nuclear Factor 1      CTF/NF1        GCCAAT              2
Glucocorticoid/progesterone receptor                     GRE/PRE        GGTACA(N)3TGTTCT    2
Mammary cell-activating factor                           MAF            GRRGSAAGK           3
Mammary gland-specific nuclear factor                    MGF            TTCNNNGAA           4
Insulin response element                                 IRE            CCGCCTC             2
Pregnancy-specific mammary nuclear factor                PMF            TGAAT(N)4–7ATCA     5
Single-stranded DNA-binding factor                       STR            TTMAYCWBCKYYH       5
Yin and yang                                             YY1            CCATNT              6

a The nucleotides are described in one-letter code: A, adenine; C, cytosine; G, guanine; T, thymine; R, A/G; Y, C/T; S, G/C; W, A/T; K, G/T; M, A/C; B, C/G/T; H, A/C/T; V, A/C/G; N, A/C/G/T.
References: 1 (Wingender, 1993); 2 (Locker, 1993); 3 (Mink et al., 1992); 4 (Welte et al., 1994b); 5 (Altiok and Groner, 1994); 6 (Shi et al., 1991).
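The consensuses of Table 2 can be turned into search patterns in a straightforward way. The following minimal sketch converts an IUPAC consensus into a regular expression, handles the variable 4–7-nucleotide spacer of the PMF site, and reports hits in coordinates relative to the transcription initiation site. It looks for exact consensus matches on the given strand only, which is an assumption of this example; the helper names and the convention that the last base of the upstream sequence is position −1 are likewise illustrative.

```python
import re

# Regular-expression character classes for the IUPAC one-letter code.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[GC]", "W": "[AT]", "K": "[GT]",
    "M": "[AC]", "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus such as TTCNNNGAA into a regex string."""
    return "".join(IUPAC_CLASS[symbol] for symbol in consensus)


# Three of the Table 2 consensuses; PMF needs an explicit variable spacer.
PATTERNS = {
    "MGF": consensus_to_regex("TTCNNNGAA"),
    "YY1": consensus_to_regex("CCATNT"),
    "PMF": consensus_to_regex("TGAAT") + "[ACGT]{4,7}" + consensus_to_regex("ATCA"),
}


def find_sites(pattern: str, upstream: str):
    """Return (first, last, matched sequence) for every exact match.

    Assumption: `upstream` ends immediately before the transcription
    initiation site, so its last base is numbered -1. A lookahead is used
    so that overlapping matches are also reported.
    """
    length = len(upstream)
    sites = []
    for match in re.finditer("(?=(" + pattern + "))", upstream):
        hit = match.group(1)
        start = match.start()
        sites.append((start - length, start + len(hit) - 1 - length, hit))
    return sites


if __name__ == "__main__":
    toy_promoter = "GGTTCTAAGAACCACGT"  # arbitrary toy sequence
    print(find_sites(PATTERNS["MGF"], toy_promoter))
```

A complete search would also scan the reverse complement and tolerate mismatches at the homology levels defined in the Methods; both are omitted here to keep the example short.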
Homology of the analysed gene promoter sequences is closely related to the phylogenetic proximity of the analysed species; however, the level of similarity depends on the superfamily of milk protein genes analysed. In groups of animals separated by short phylogenetic distances (e.g. cattle and goat in Ruminantia, mouse and rat in Muridae) the most conserved are the promoters of α-lactalbumin genes. In cattle and goat they show 91% and in mouse and rat 67% homology. β-Caseins are less conserved in this respect; comparison of their gene promoters between the same pairs of species showed 85 and 53% homology, respectively. The lowest level of homology is displayed by WAP gene promoters; in mouse and rat they have only 43% homology. Comparison of animals separated by quite long phylogenetic distances (e.g. Artiodactyla and Rodentia) showed that α-lactalbumin gene promoters are more conserved (25% homology) than those of β-caseins (12%). The homology of the latter is almost the same as that of WAP promoters, which show 8% homology between Rodentia and Lagomorpha. Homology between milk protein gene promoters from different protein superfamilies, even in the same species, is very low (10–15%).

Fig. 1. Phylogenetic tree of milk protein gene promoters. (A) Entire promoters; (B) −1900 to −1 fragment of promoters; (C) −510 to −1 fragment of promoters; (D) −250 to −1 fragment of promoters.
3.2. Motifs typical for milk protein gene promoters

No motifs common to all of the 28 milk protein gene promoters were found at the assumed levels of alignment stringency. This negative result suggests that either this group of genes is not homogeneous or the method of multiple alignment is not sensitive enough to detect small regions of homology located at different distances from the transcription start point without allowing the insertion of a large number of gaps. The results of promoter analysis for the presence of the Groenen structure, milk box, Oka boxes and Yu-Lee sequences are presented in Fig. 2. At the high level of homology (>82%) these motifs were found only in some promoters; at the low level (67–82%), the Yu-Lee1 and Yu-Lee6 motifs were found in all promoters analysed. They have no preferential location in the promoters. In addition to the Yu-Lee1 and Yu-Lee6 sequences, low levels of homology to the Groenen structure and Oka box A sequences were found in the α-casein gene superfamilies.

The existence of motifs specific to the α-casein superfamilies suggested that motifs specific to other superfamilies could also exist. To check this possibility, multiple alignment of promoters grouped by superfamily was performed. Two criteria were used to select a sequence as a candidate motif: a minimal length of 6 bp and no more than two sequential gaps. The homology of each candidate motif was estimated in all promoters in a group. Sequences having >82% homology to all promoters in a superfamily were chosen as motifs, while sequences whose homology to some promoters in a group was <82% were excluded from further analysis. Multiple alignment showed the existence of many superfamily-specific motifs. The α-casein gene promoters were found to have six motifs, designated A1–A6.
Fig. 2. Map of motifs in milk protein gene promoters (motifs of group III not shown).
Motifs A1 to A4 are located in the proximal region at positions −197/−13 (numbering refers to the sequence of the bovine αS1-casein gene promoter). In addition to the above-described motifs, other motifs were found in a more distal part of the promoters: RGAARAT [A5] and ATTAYYYY [A6]. The motifs found in the α-casein gene promoters are similar to motifs previously reported but not the same. They have higher homology to α-casein gene promoter sequences (>82%) than the Groenen structure (>75%) and Oka box A (>67%). The A2–A4 motifs have 83–100% homology to the Groenen structure.
A1 and A5 are parts of the milk box, and the milk box is a part of A4. The Yu-Lee3, Yu-Lee4 and Vilotte sequences are also parts of A4.
The 46-bp-long LA1 motif is located at positions from −130 to −85 (numbering refers to the sequence of the Bos indicus α-lactalbumin gene). The LA2 and LA3 motifs are located at positions from −259 to −265 and from −269 to −302, respectively. Promoters of the whey acidic protein gene have 12 common motifs; 11 of them are short: YCCARRSTCYTCCTCCTG [W1]; AYGACCRC [W3]; TTMATTT [W4]; CTTYYTTYTCT [W5]; ACAYTTGYYRSDGA [W6]; AGGGCA [W7]; TCHCYACR [W8]; GAGYCA [W9]; ARYATG [W10]; GCRAGRG [W11]; CYCBGGA [W12]; and one is extended [W2]. Motifs W1–W6 are located at positions between −305 and −55 in relation to the transcription initiation site; motifs W7–W12 are in the more distal part of the promoters (numbering refers to the sequence of the mouse WAP gene promoter).
Promoters of the β-casein and β-lactoglobulin genes have no extended regions of homology. Sets of short, 6–14-bp-long motifs are typical for them. These motifs are located at some distance from the transcription initiation site. The most proximally located motif, BL2, was found at position −173 in the marsupial β-lactoglobulin gene promoter. β-Lactoglobulin gene promoters have the following short motifs: CCAGCCTGGACC [BL1]; YCTTTYC [BL2]; CTCTTTYCTGC [BL3]; YGTGAGTTCCT [BL4]; GTRAAGACC [BL5]; ATCAGRAYTCCC [BL6]; CARCCYCYCY [BL7]; ACCCAGTT [BL8]; CCTGGCT [BL9]; ATTTYCT [BL10]; GGTTCACAGRAA [BL11] and TYATTT [BL12]. Six short motifs, designated B1–B6, were found in the promoters of the β-casein gene superfamily: RYTAYTGGRCAATT [B1]; YTGRRRACTR [B2]; TMATWG [B3]; ATCTYWR [B4]; TWYRTRA [B5] and TTYTRC [B6]. They are located at positions beginning from −188 (goat β-casein B1 motif). The 1220-bp-long κ-casein promoter fragments show a high degree of homology (>86%), which does not allow individual common motifs to be distinguished. Motifs found in the α-lactalbumin gene promoters have no substantial homology to published milk protein gene motifs. Some homology to LA1 is found in motifs typical for WAP and β-lactoglobulin promoters: W5, W7, W12 and BL7, respectively.
Motifs W4 and W12 found in whey acidic protein gene promoters have 83% similarity to the Groenen structure.
The short motifs W3, W6, W8, W10, W11 and W12 are homologous to W2, and some of them show some similarity to each other. Comparison of the motifs detected by superfamily-grouped gene promoter alignment showed that many motifs have significant homology to each other. Results of the comparison are presented below. It is possible to group these motifs in two blocks. In block I the motifs can be arranged around W2 and in block II around W1.
Table 3
Occurrence of putative transcription factor binding sites in milk protein gene promoters

                         Transcription factor
Protein superfamilies    C/EBP   CTF/NF1   GRE   IRE   MAF   MGF   PMF   STR   YY1
α-Caseins                +       +         −     −     +     +     ±     +     +
β-Caseins                +       +         −     −     +     +     +     ±     +
κ-Caseins                +       +         −     ±     +     +     +     +     +
α-Lactalbumin            +       +         −     −     +     +     ±     ±     ±
β-Lactoglobulin          +       +         −     +     +     +     +     +     +
WAP                      +       +         ±     +     +     +     ±     +     ±
The significant sequence homology of the motifs among themselves suggests that these motifs also have a slow rate of evolutionary divergence. Analysis of promoters grouped within a given gene superfamily (e.g. α-casein) does not answer the question of whether the same motifs are present in promoters of other gene superfamilies (e.g. in β- and κ-caseins, α-lactalbumins, β-lactoglobulins and WAP). The set of 28 sequenced milk protein gene promoters was therefore analysed for homology to the motifs found in this study. The results of the analysis are presented in Fig. 2. Motifs found in promoters of different gene superfamilies can be divided into three groups: group I, specific for all milk protein gene promoters; group II, specific only for one gene superfamily; and group III, motifs shared by several gene superfamilies. Group I was found to consist of three motifs: B4, BL10 and W10. These motifs do not have preferential locations in the promoters. Group II consists of motifs A4, B1, LA1, LA3, BL4, W1 and W2. Most of these are located in the proximal parts of promoters. Motif A5 is typical for α-caseins only. Sequences homologous to A4 begin at positions −47 to −83 and end at positions −143 to −169. Motif B1 is typical for β-caseins and is located at positions from −188 to −224. Motifs LA1 and LA3 are typical for α-lactalbumins and are also located in the proximal end of promoters. Motifs W1 and W2 are typical for WAP gene promoters. W1 is located at positions −29 to −63, while sequences homologous to W2 begin at positions −36 to −72 and end at −108 to −144. An exception to this rule is BL4, which has no preferential location in β-lactoglobulin gene promoters.
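The division of motifs into groups I, II and III amounts to a simple rule on the set of superfamilies in which a motif is found. The following is a minimal sketch of that rule, assuming the six promoter groups of Table 3 as the full set; the function name and the group III example are hypothetical, since the group III motifs are not listed in the text.

```python
# Assumption: the six promoter groups of Table 3 stand for "all" superfamilies.
ALL_SUPERFAMILIES = frozenset({
    "alpha-casein", "beta-casein", "kappa-casein",
    "alpha-lactalbumin", "beta-lactoglobulin", "WAP",
})


def classify_motif(found_in, all_superfamilies=ALL_SUPERFAMILIES) -> str:
    """Return 'I', 'II' or 'III' for the set of superfamilies with the motif."""
    found = set(found_in)
    if not found or not found <= all_superfamilies:
        raise ValueError("motif must occur in at least one known superfamily")
    if found == all_superfamilies:
        return "I"    # present in all milk protein gene promoters
    if len(found) == 1:
        return "II"   # specific for a single superfamily
    return "III"      # shared by several, but not all, superfamilies


if __name__ == "__main__":
    print(classify_motif(ALL_SUPERFAMILIES))           # e.g. B4, BL10, W10
    print(classify_motif({"beta-casein"}))             # e.g. B1
    print(classify_motif({"WAP", "alpha-lactalbumin"}))  # hypothetical motif
```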
3.3. Putative transcription factor binding sites in milk protein gene promoters

The set of 28 sequenced milk protein gene promoters was analysed for homology to nine transcription factor binding sites reported as essential for the expression of this group of genes. The results of this analysis are shown in Fig. 3 and summarised in Table 3. All promoters analysed were found to have sequences highly homologous to four transcription factor binding sites: C/EBP, CTF/NF1, MAF and MGF/STAT5. These transcription factors are probably responsible for the common regulatory features of milk protein genes. The set of IRE-binding factor, PMF, STR and YY1 transcription factors probably governs superfamily-specific gene expression patterns.

β-Lactoglobulin gene promoters have sequences highly homologous to all the above-mentioned transcription factor binding sites. The transcription factors which could bind to κ-casein gene promoters are the same as those for β-lactoglobulins. The only exception is the absence of a putative IRE in the sheep κ-casein gene promoter. The high homology of the sheep κ-casein gene promoter to the goat (93%) and cow (75%) promoters and the similar locations of most putative transcription factor binding sites suggest that the putative IRE is located in the unsequenced part of this promoter. Like κ-casein and β-lactoglobulin promoters, WAP promoters have a putative IRE. This could suggest a strong influence of insulin on the expression of genes belonging to these superfamilies. Probably not only insulin but also glucocorticoids have a strong influence on whey acidic protein gene expression: WAP gene promoters are the only group of promoters which has sequences highly homologous to the canonical GRE.

Putative STR and YY1 binding sites were found in most of the analysed promoters. The absence of their putative binding sites may be an individual feature of a promoter (STR, rabbit β-casein; YY1, rat α-lactalbumin and rabbit WAP genes), or the sites may be located in the unsequenced part of promoters (STR, pig α-lactalbumin gene). Putative PMF binding sites were found in all promoters of the β- and κ-casein and β-lactoglobulin genes. Promoters of α-casein, α-lactalbumin and WAP are a heterogeneous group; however, the sequenced parts in which putative PMF binding sites were not found are quite short compared to the distance from the transcription initiation site at which putative PMF binding sites are located in the respective gene superfamily. The presence of a PMF binding site may suggest that the expression of these genes is suppressed during pregnancy.

Location of putative transcription factor binding sites at a short distance from the transcription initiation site increases the probability that these factors are involved in the regulation of gene expression. In this respect, the first binding sites of MAF are located at a short distance (from −7 to −138) in κ-casein, α-lactalbumin and WAP gene promoters, those of C/EBP in κ-casein (from −28 to −58) and those of CTF/NF1 in WAP (from −48 to −100) gene promoters. Putative MGF binding sites are located at distances from −13 to −101 in calcium-sensitive (α- and β-) casein promoters.
The results obtained in mapping putative transcription factor binding sites in milk protein gene promoters allowed us to divide the genes into several groups with respect to their transcription regulation features: WAP genes, sensitive to insulin and glucocorticoids; κ-casein and β-lactoglobulin, sensitive to insulin; calcium-sensitive caseins, with increased sensitivity to prolactin compared to other milk protein genes; and α-lactalbumin, with an increased role of the MAF transcription factor in the regulation of its expression. Analysis of the motifs common to all milk protein genes, as well as of those which are superfamily-specific, for the presence of putative binding sites of all known transcription factors may be an important topic for future investigations, which should provide new insight into the regulation of these groups of genes.
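The distinction between common and superfamily-specific factors drawn in this section can be read directly off Table 3. The sketch below encodes the table and derives both sets; "+/-" stands for the table's ± symbol (taken here to mean a site found in some but not all promoters of a group), and the ASCII superfamily names and function names are conventions of this example only.

```python
SUPERFAMILIES = [
    "alpha-caseins", "beta-caseins", "kappa-caseins",
    "alpha-lactalbumin", "beta-lactoglobulin", "WAP",
]

# One row per factor, entries in the order of SUPERFAMILIES (from Table 3).
ROWS = {
    "C/EBP":   "+ + + + + +",
    "CTF/NF1": "+ + + + + +",
    "GRE":     "- - - - - +/-",
    "IRE":     "- - +/- - + +",
    "MAF":     "+ + + + + +",
    "MGF":     "+ + + + + +",
    "PMF":     "+/- + + +/- + +/-",
    "STR":     "+ +/- + +/- + +",
    "YY1":     "+ + + +/- + +/-",
}

TABLE3 = {tf: dict(zip(SUPERFAMILIES, row.split())) for tf, row in ROWS.items()}


def common_factors(table):
    """Factors with a putative site in every promoter of every superfamily."""
    return [tf for tf, row in table.items() if all(v == "+" for v in row.values())]


def superfamilies_with(table, factor):
    """Superfamilies in which at least some promoters carry the site."""
    return [sf for sf, v in table[factor].items() if v != "-"]


if __name__ == "__main__":
    print(common_factors(TABLE3))
    print(superfamilies_with(TABLE3, "IRE"))
    print(superfamilies_with(TABLE3, "GRE"))
```

Run on the table as given, the first function returns C/EBP, CTF/NF1, MAF and MGF, and the IRE and GRE queries return the insulin- and glucocorticoid-sensitive groups named in the text.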
Fig. 3. Map of putative transcription factor binding sites in milk protein gene promoters.

4. Discussion

Milk protein genes have a high rate of divergence. The estimated rates of divergence for αS1- and β-caseins are 113 and 97 PAM/100 M.Y. (accepted point mutations per 100 million years), respectively (Stewart et al., 1984), and for rat and mouse WAP 64 PAM/100 M.Y. (Hennighausen et al., 1982), while for fibrinopeptides the rate is only 37 PAM/100 M.Y. (Dayhoff, 1978). The homology at the nucleotide level between the coding sequences of bovine αS1- and rat α-caseins is 63.3% (Stewart et al., 1984) and between mouse, rat and rabbit WAP 64% (Devinoy et al., 1988). The overall homologies of porcine β-casein cDNA to β-casein cDNAs of other species vary from 82% for bovine and ovine to 67% for mouse and rat (Alexander and Beattie, 1992). The signal peptide sequence is much more conserved. Homologies to the porcine β-casein signal peptide sequence vary from 84% for mouse to 91% for human (Alexander and Beattie, 1992). For bovine αS1- and rat α-caseins the homology is 93% (Stewart et al., 1984) and for mouse, rat and rabbit WAP 89%.

Homologies of the 5′ flanking regions estimated in this analysis are lower than those of coding sequences. It is possible that high-homology regions correspond only to motifs typical for some groups of promoters. The other possibility is that the level of stringency, especially the gap initiation and continuation penalties, at which sequence comparison was made for regions corresponding to coding sequences, signal peptide sequences and 5′ flanking regions was not the same. When interspecies comparison of different parts of the β-lactoglobulin gene sequence was performed at the same stringency parameters, domains corresponding to the mature peptide showed higher homology than the 5′ flanking regions (Folch et al., 1994). Low homology of the 5′ flanking regions was shown not only in casein and WAP genes but also in α-lactalbumin genes, in which the sequences encoding the signal peptide and the mature protein show similar rates of divergence. Interspecies comparison shows 44–99% homology in coding sequences and 50–100% homology in the signal peptide sequences.

Comparison of the promoter regions of milk protein genes revealed motifs specific for all milk protein gene promoters as well as superfamily-specific ones. Vilotte and Soulier (1992) reported a motif typical for 20 promoters. It was a good candidate for a group I motif (specific for all milk protein gene promoters); however, it was not found in all promoters tested. Groenen suggested that an approximately 224-bp-long sequence is typical for calcium-sensitive caseins (α and β). In this investigation, sequences with low or high homology to the Groenen structure were found in all α-casein gene promoters; however, significant homology was not always found in β-casein gene promoters. While the bovine, goat and rabbit β-casein gene promoters have 79–100% homology to the Groenen structure, the mouse and rat β-casein gene promoters have only 47 and 32%, respectively. This disagreement is probably a result of differences in the stringency of the analyses performed, especially in the penalties for gap initiation and continuation. In the sequence alignment performed by Groenen (1992) there are 11 and 19 gaps in the mouse and rat β-casein gene promoters, respectively. Lowering the analysis stringency parameters (zero penalties for gap initiation and continuation) increased the level of homology between the mouse β-casein gene promoter and the Groenen sequence to 81% but introduced 35 gaps (data not shown).
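The dependence of apparent homology on gap penalties, discussed above, can be reproduced on a toy example. The sketch below is a plain Needleman-Wunsch global alignment with a single linear gap penalty; it is not the Higgins and Sharp procedure or the DNASIS scoring scheme, and the scoring values, function names and toy sequences are assumptions chosen for illustration.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns the two aligned strings, with '-' marking gaps."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def alignment_stats(x: str, y: str):
    """Return (identical columns, gap characters) for an aligned pair."""
    matches = sum(p == q and p != "-" for p, q in zip(x, y))
    gaps = x.count("-") + y.count("-")
    return matches, gaps


if __name__ == "__main__":
    seq1, seq2 = "AAAACCCC", "CCCCAAAA"  # toy sequences
    for penalty in (-5, 0):
        aligned = global_align(seq1, seq2, gap=penalty)
        print(penalty, aligned, alignment_stats(*aligned))
```

With a strict penalty the two toy sequences align without gaps and share no identical positions; with a zero penalty four identical positions appear, at the cost of eight gap characters. The same trade-off underlies the 81% homology obtained with 35 gaps reported above.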
In searches of sets composed of three to nine promoters, different motifs were found depending on the promoters chosen for comparison and on the stringency of alignment (Yu-Lee et al., 1986; Yoshimura and Oka, 1989; Groenen, 1992). Studies on DNA–protein interactions have been performed with several different milk protein gene promoters; however, both computer and experimental data are limited. The TRANSFAC database (Wingender, 1994) has only 21 entries related to three promoters (rat β-casein, sheep β-lactoglobulin and mouse WAP) and to the binding sites of two transcription factors, MGF (MPBF) and CTF/NF1. The TFD transcription factor database (release 7.5, 1996) (Ghosh, 1990) has only nine records; however, a computer search of the rabbit β-casein gene promoter revealed its very complex structure (Malewski and Zwierzchowski, 1995).

In addition to the entries in the above-mentioned databases, binding of several transcription factors, mostly to the rat β-casein and WAP gene promoters, has been described recently. In the rat β-casein gene promoter, binding sites for five transcription factors were found: C/EBP, MGF, PMF, STR and YY1. The minimal LHRR (lactogenic hormone response region) of the rat β-casein gene promoter contains at least three different functional elements. Elimination of the MGF recognition site at −89/−82 led to a complete loss of the lactogenic hormone-dependent activity. The second functional element, the C/EBP transcription factor binding sites, is located at positions −142/−133, −165/−155, −181/−171 and −220/−210 (Doppler et al., 1995). The binding site of the third element, YY1, is located at −114 (Raught et al., 1994). In the rat β-casein gene promoter, PMF binding sites were also found at −9/+4 and −364/−349 (Lee and Oka, 1992), as well as STR binding sites at −194/−163 (Altiok and Groner, 1994). The mouse β-casein gene promoter has been investigated less; the presence of C/EBP (Doppler et al., 1995) and STR transcription factor binding sites (Altiok and Groner, 1994) has been reported. They occur at approximately the same positions as in the rat β-casein gene promoter. In the bovine αS2-casein gene, MGF and OCT1 binding sites were found. Two MGF binding sites were found, at positions between −87 and −99 (higher affinity) and between −130 and −150 (lower affinity) (Wakao et al., 1994).
Two of the OCT1 binding sites, at positions −50 and −260, exhibit strong and intermediate affinity for OCT1, respectively. The other sites, at positions −480 and −210, are very weak OCT1 binding sites (Groenen et al., 1992). The binding sites for MPBF, a transcription factor related to MGF, were found in ovine β-lactoglobulin promoter sequences at positions −93, −210 and −278 (Watson et al., 1991).

Another relatively extensively studied promoter is the rat WAP gene promoter. Recently, CTF/NF1, GR, MAF and MGF transcription factor binding sites were localised in this promoter. Human GR was shown to protect three regions in the noncoding strand of the rat WAP promoter against DNase I digestion: −818/−800, −782/−764 and −753/−745 (Li and Rosen, 1995b). None of these binding sites shares extensive similarity with the canonical GR consensus sequence, but each contains one GR half-site, TGTNCY. The GR half-sites are colocalised with CTF/NF1 binding sites in the distal DNase I hypersensitive sites of region I, which suggests interaction between the trans-acting factors binding to these sites (Li and Rosen, 1995b). The MGF binding site is located far from the transcription initiation site (−726), but it is important for transcription: mutation of this site decreases WAP gene expression in transgenic animals about 10-fold. The sequence −120/−100 in the mouse WAP gene promoter contains the MAF binding site (Welte et al., 1994a).

The computer-predicted putative transcription factor binding sites agreed well with the experimental data obtained for most transcription factors. All transcription factor binding sites found by experimental methods in milk protein gene promoters were also predicted by computer analysis. Experimental results obtained in the analysis of milk protein gene promoters suggest that the roles of prolactin, glucocorticoids and insulin can differ in the regulation of expression of individual genes. Genes whose promoters have canonical GR binding sites (WAP and guinea pig α-lactalbumin) are probably regulated by glucocorticoids directly.
Addition of cortisol to rabbit mammary gland explants induced rapid accumulation of WAP mRNA but not of αS1- and β-casein mRNA, which supports the suggestion that WAP gene transcription is directly dependent on the glucocorticoid receptor, while casein genes are only indirectly dependent (Puissant and Houdebine, 1991). The indirect effect of glucocorticoids on milk protein gene expression can occur through functional interactions between
MGF/STAT5 and the glucocorticoid receptor (Stöcklin et al., 1996) or by binding to GRE half-sites (Li and Rosen, 1995a). Knowledge of the distribution of putative transcription factor binding sites can be useful in choosing promoters and estimating their length in transgenic experiments.
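As an illustration of how a degenerate consensus such as the GR half-site TGTNCY discussed above can be located in a promoter sequence, the following minimal Python sketch scans both strands with an IUPAC-aware regular expression. It is not the search procedure used in this study. The function names (`find_motif`, `relative_position`, `reverse_complement`), the convention of reporting the leftmost nucleotide of each site relative to the transcription start site, and the short test sequence are all assumptions made for the example.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def relative_position(index, tss_index):
    """Convert a 0-based index to a promoter coordinate (no position 0).

    tss_index is the 0-based index of the +1 nucleotide (assumed convention).
    """
    offset = index - tss_index
    return offset if offset < 0 else offset + 1


def find_motif(promoter, motif, tss_index):
    """Return sorted (position, strand) pairs for every motif occurrence.

    The position is that of the leftmost nucleotide of the site on the
    given sequence, expressed relative to the transcription start site.
    """
    promoter = promoter.upper()
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead lets overlapping occurrences be reported as well.
    regex = re.compile("(?=(" + pattern + "))")
    hits = []
    for match in regex.finditer(promoter):
        hits.append((relative_position(match.start(), tss_index), "+"))
    length = len(promoter)
    for match in regex.finditer(reverse_complement(promoter)):
        # Map the start on the reverse complement back to the given strand.
        left = length - match.start() - len(motif)
        hits.append((relative_position(left, tss_index), "-"))
    return sorted(hits)


# Hypothetical 18-nt upstream fragment; the TSS is taken to follow it directly.
fragment = "CCTGTACTAAAGAACAGG"
for position, strand in find_motif(fragment, "TGTNCY", len(fragment)):
    print(position, strand)
```

Scanning the reverse complement as well as the given strand matters here because a half-site may lie in either orientation. A palindromic motif would be reported once per strand under this scheme.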
References
Alexander, L.J., Beattie, C.W., 1992. The sequence of porcine β-casein cDNA. Animal Genet. 23, 369–371.
Altiok, S., Groner, B., 1994. β-casein mRNA sequesters a single-stranded nucleic acid-binding protein which negatively regulates the β-casein gene promoter. Mol. Cell. Biol. 14 (9), 6004–6012.
Burdon, T.G., Demmer, J., Clark, A.J., Watson, Ch.J., 1994. The mammary factor MPBF is a prolactin-induced transcriptional regulator which binds to STAT factor recognition sites. FEBS Lett. 350, 177–182.
Dayhoff, M.O., 1978. Atlas of protein sequence and structure, vol. 5, Suppl. 3. In: Dayhoff, M.O. (Ed.). National Biomedical Research Foundation, MD.
Devinoy, E., Hubert, Ch., Schaerer, E., Houdebine, L.-M., Kraehenbuhl, J.-P., 1988. Sequence of the rabbit whey acidic protein cDNA. Nucleic Acids Res. 16 (16), 8180–8181.
Doppler, W., Welte, T., Philipp, S., 1995. CCAAT/enhancer-binding protein isoforms β and δ are expressed in mammary epithelial cells and bind to multiple sites in the β-casein gene promoter. J. Biol. Chem. 270 (30), 17962–17969.
Folch, J.M., Coll, A., Sanches, A., 1994. Complete sequence of the caprine β-lactoglobulin gene. J. Dairy Sci. 77, 3493–3497.
Ghost, D., 1990. A relational database of transcriptional factors. Nucleic Acids Res. 18, 1749–1756.
Groenen, M.A.M., 1992. Regulation of expression of milk protein genes. 43rd Annual Meeting of the EAAP, 14–17 September, pp. 1–18.
Groenen, M.A.M., Dijnhof, R.J.M., van der Poel Diggelen, R., Verstege, E., 1992. Multiple octamer binding sites in the promoter region of the bovine αS2-casein gene. Nucleic Acids Res. 20 (16), 4311–4318.
Guyette, W.A., Matusik, R.J., Rosen, J.M., 1979. Prolactin-mediated transcriptional and post-transcriptional control of casein gene expression. Cell 17, 1013–1014.
Hennighausen, L.G., Soppel, A.E., Hobbs, A.A., Rosen, J.M., 1982. Nucleic Acids Res. 10, 3733–3744.
Hobbs, A.A., Richards, D.A., Kessle, D.J., Rosen, J.M., 1982. Complex hormonal regulation of rat casein gene expression. J. Biol. Chem. 257, 3598–3605.
Lee, Ch.S., Oka, T., 1992. A pregnancy-specific mammary nuclear factor involved in the repression of the mouse β-casein gene transcription by progesterone. J. Biol. Chem. 267, 5795–5801.
Laird, J.E., Jack, L., Hall, L., Boulton, A., Parker, D., Craig, R.K., 1988. Structure and expression of the guinea-pig α-lactalbumin gene. Biochem. J. 254, 85–94.
Li, S., Rosen, J.M., 1995a. Glucocorticoid regulation of rat whey acidic protein gene expression involves hormone-induced alterations of chromatin structure in the distal promoter region. Mol. Endocrinol. 8, 1328–1335.
Li, S., Rosen, J.M., 1995b. Nuclear factor I and mammary gland factor (STAT5) play a critical role in regulating rat whey acidic protein gene expression in transgenic mice. Mol. Cell. Biol. 15, 2063–2070.
Locker, J., 1993. Transcription controls: Cis-elements and trans-factors. In: Hames, B.D., Higgins, S.J. (Eds.), Gene Transcription. A Practical Approach. Oxford University Press, New York, pp. 2103–2121.
Meier, V.S., Groner, B., 1994. The nuclear factor YY1 participates in repression of the β-casein gene promoter in mammary epithelial cells and is counteracted by mammary gland factor during lactogenic hormone induction. Mol. Cell. Biol. 14, 128–137.
Mink, S., Hartig, E., Jennewein, P., Doppler, W., Cato, A.C.B., 1992. A mammary cell-specific enhancer in mouse mammary tumor virus DNA is composed of multiple regulatory elements including binding sites for CTF/NF1 and a novel transcription factor, mammary cell-activating factor. Mol. Cell. Biol. 12, 4906–4918.
Malewski, T., Zwierzchowski, L., 1995. Computer-aided analysis of potential transcription-factor binding sites in the rabbit β-casein gene promoter. BioSystems 36, 109–119.
Puissant, C., Houdebine, L.-M., 1991. Cortisol induces rapid accumulation of whey acid protein mRNA but not of αS1 and β-casein mRNA in rabbit mammary explants. Cell Biol. Int. Rep. 15 (2), 121–129.
Raught, B., Kursheed, B., Kazansky, A., Rosen, J., 1994. YY1 represses β-casein gene expression by preventing the formation of a lactation-associated complex. Mol. Cell. Biol. 14 (3), 1752–1763.
Raught, B., Liao, W.S.L., Rosen, J.M., 1995. Developmentally and hormonally regulated CCAAT/enhancer-binding protein isoforms influence β-casein gene expression. Mol. Endocrinol. 9, 1223–1232.
Shi, Y., Seto, E., Chang, L.S., Shenk, T., 1991. Transcriptional repression by YY1, a human GLI-Krüppel-related protein, and relief of repression by adenovirus E1A protein. Cell 67, 377–388.
Stewart, A.F., Willis, I.M., Mackinlay, A.G., 1984. Nucleotide sequences of bovine αs1- and κ-casein cDNAs. Nucleic Acids Res. 12 (9), 3895–3907.
Stöcklin, E., Wissler, M., Gouilleux, F., Groner, B., 1996. Functional interaction between STAT5 and glucocorticoid receptor. Nature 383, 726–728.
Vilotte, J.-L., Soulier, S., 1992. Isolation and characterisation of the mouse α-lactalbumin-encoding gene: interspecies comparison, tissue- and stage-specific expression. Gene 119, 287–292.
Vonderhaar, B.K., Ziska, S.E., 1989. Hormonal regulation of milk protein gene expression. Ann. Rev. Physiol. 51, 641–649.
Wakao, H., Gouilleux, F., Groner, B., 1994. Mammary gland factor (MGF) is a novel member of the cytokine regulated transcription factor gene family and confers the prolactin response. EMBO J. 13 (9), 2182–2191.
Watson, J.W., Gordon, K.E., Robertson, M., Clark, A.J., 1991. Interaction of DNA-binding proteins with a milk protein gene promoter in vitro: identification of a mammary gland-specific factor. Nucleic Acids Res. 19, 6603–6610.
Welte, T., Philipp, S., Cairns, C., Gustafson, J.-A., Doppler, T., 1993. Glucocorticoid receptor binding sites in the promoter region of milk protein genes. J. Steroid Biochem. Mol. Biol. 47, 75–81.
Welte, T., Garimorth, K., Philipp, S., Jennewein, P., Huck Cato, A.C.B., Doppler, T., 1994a. Involvement of Ets-related proteins in hormone-independent mammary cell-specific gene expression. Eur. J. Biochem. 223, 997–1006.
Welte, T., Garimorth, K., Philipp, S., Doppler, T., 1994b. Prolactin-dependent activation of a tyrosine phosphorylated DNA binding factor in mouse mammary epithelial cells. Mol. Endocrinol. 8, 1091–1102.
Wingender, E., 1993. Gene Regulation in Eucaryotes. VCH Verlagsgesellschaft mbH (Ed.), Weinheim.
Wingender, E., 1994. Recognition of regulatory regions in genomic sequences. J. Biotechnol. 35, 273–280.
Yoshimura, M., Oka, T., 1989. Isolation and structural analysis of mouse β-casein gene. Gene 78, 267–275.
Yu-Lee, L., Richter-Mann, L., Couch, C.H., Stewart, A.F., Mackinlay, A.G., Rosen, J.M., 1986. Evolution of the casein multigene family: conserved sequences in the flanking and exon regions. Nucleic Acids Res. 14, 1883–1902.