Journal of Steroid Biochemistry & Molecular Biology 103 (2007) 352–356
In silico analysis of the 5′ region of the Vitamin D receptor gene: Functional implications of evolutionary conservation
J.A. Halsall a,∗, J.E. Osborne b, P.E. Hutchinson b, J.H. Pringle a
a Department of Cancer Studies and Molecular Medicine, University of Leicester, Level 3, Robert Kilpatrick Clinical Sciences Building, Leicester LE2 7LX, UK
b Department of Dermatology, Leicester Royal Infirmary, Leicester LE1 5WW, UK
Received 30 November 2006
Abstract
In recent years, the complexity of the 5′ region of the Vitamin D receptor (VDR) gene has become apparent. Six exons, 1a–1f, lie upstream of the translation start codon in exon 2. Transcription has been reported beginning in exons 1f, 1a, 1d and 1c, and alternative splicing can produce a large number of alternative mRNA transcripts. Exon 1d transcripts can code for two alternative proteins. This pattern of transcription produces several potential promoter regions. We have used a number of in silico tools to study the evolutionary conservation of the exon and promoter sequences of the 5′ region. Those exons involved in the alternative VDR proteins, exons 1d and 1c, were well conserved from mouse and rat, unlike exons 1f, 1e and 1b, which showed little homology. Exon 1a was also well conserved. Furthermore, 1a was shown to be within a strong CpG island and TraFac revealed a Sp1-MazR-Sp1-MazR cluster of transcription factor binding sites immediately upstream of exon 1a, a common motif in strong promoters. The promoter region upstream of 1f showed a highly conserved pattern of transcription factor binding sites and was also shown to be within a CpG island. No significant clusters of conserved sites were observed in the 1c promoter region, despite reports of 1c promoter activity.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: Vitamin D receptor; Evolutionary conservation; Promoter regions; Bioinformatics
∗ Corresponding author: J.A. Halsall.
0960-0760/$ – see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.jsbmb.2006.12.046
1. Introduction
The 5′ region of the Vitamin D receptor (VDR) gene is complex, consisting of six exons, 1a–1f. Downstream, exons 2–9 code for a 424 or 427 amino acid protein, depending on the FokI polymorphism [1], with a 3′ untranslated region in exon 9. Crofts et al. [2] have reported transcripts beginning in exons 1a, 1d and 1f in a range of tissues and cell lines, with the majority of transcripts initiated in exon 1a. Further variation is introduced by the alternative splicing of the remaining 5′ exons: exon 1a and 1d transcripts can splice directly to exon 2 or include one or both of exons 1b and 1c; similarly, exon 1f transcripts can include either one or both of exons 1e and 1c, or neither. Exon 1d contains an ATG translation start codon such that transcripts beginning in this exon
can code for two alternative proteins, VDRB1 and VDRB2, depending on alternative splicing. The VDRB1 protein has been identified in several tissues [3]. Transcripts initiating in exon 1c have also been reported in breast cancer cells [4].
This complexity has major implications for transcriptional regulation of the VDR. With transcription reported to initiate in four separate exons, it follows that there are potentially four promoter regions. Exons 1a and 1d are separated by only 400 bases and so may, for the most part, share a promoter. Exons 1f and 1c are 36,000 bases upstream and 22,000 bases downstream of exons 1a and 1d, respectively, and therefore must be independently regulated. It remains unclear how the expression of the different VDR isoforms is regulated or why such a diversity of mRNA transcripts has evolved.
Here, the 5′ exons and promoter regions were examined for evolutionary conservation. The identification of evolutionarily conserved regions of DNA can help identify functional sequences. The more critically functional a sequence is, the more likely it is that alteration of this sequence will be selected against and that the sequence will remain preserved from species to species. Exonic sequences are long enough that a simple sequence comparison approach is sufficient to identify conserved regions. For promoter regions, however, functional sequences may be much shorter: a single transcription factor binding site may be only six bases in length, making these approaches impractical (reviewed in [5]). Novel approaches have therefore been developed to identify conserved regulatory regions, combining the prediction of transcription factor binding sites within a sequence with a sequence comparison approach to identify regions of conserved transcriptional regulation.
Fig. 1. Dotplot comparisons of human, mouse and rat genomic DNA (x axes) plotted against human mRNA, exons 1f–2 (y axes). Dotplot analysis was carried out with a window size of 25 and a mismatch limit of 7.
2. Materials and methods
Dotplot comparison of the mouse and rat VDR genes with the human gene was carried out using the Molecular Toolkit nucleic acid dot plot software [6] (footnote 1) with a window size of 25 and a mismatch limit of 7. The genomic sequences used for comparison were human chromosome 12 (Build 36.1), 46,630K–46,550K; mouse chromosome 15 (Build 35.1), 97,968K–97,888K; and rat chromosome 7 (v3.4), 136,674K–136,594K. In all three organisms, the VDR gene is on the reverse strand of the DNA, so the reverse complement of these sequences was used. The human mRNA sequences were compared with the genomic DNA from each species, including the human gene as reference.
The human and mouse sequences above were also used to produce a percentage identity plot (PIP) [7] (footnote 2). The human sequence was first analysed for common human repeat regions by the RepeatMasker software (footnote 3) and repeat regions in the sequence were “masked” so that regions of functional homology were not obscured by homologous regions of non-functional repeats.
The putative promoter regions upstream of exons 1a, 1d, 1f and 1c were analysed for conservation of transcription factor binding sites using TraFac (footnote 4). The VDR human and mouse sequences are available on the TraFac database under the names VDR Hs and VDR Mm. A regulogram was produced for the regions immediately upstream of the four exons. The regulograms were used to identify regions where there was significant sequence homology and a significant number of conserved transcription factor binding sites. These regions were then analysed to produce TraFac images showing transcription factor binding sites present in equivalent regions in both sequences. To further improve the accuracy of the analysis, TraFac images were produced showing only transcription factor binding sites that occurred in parallel positions in the two sequences.
Footnote 1: http://arbl.cvmbs.colostate.edu/molkit/dnadot/index.html
Footnote 2: http://pipmaker.bx.psu.edu/pipmaker/
Footnote 3: http://www.repeatmasker.org
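As a rough illustration of the dotplot parameters given in the methods, the sketch below marks a dot wherever a 25-base window of one sequence matches a window of the other with at most 7 mismatches, and includes the reverse-complement step needed because the VDR gene lies on the reverse strand. It is not the Molecular Toolkit implementation: the function names `reverse_complement` and `dotplot` are hypothetical, and the brute-force comparison is an assumption chosen for clarity rather than speed.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def dotplot(x_seq: str, y_seq: str, window: int = 25, max_mismatch: int = 7):
    """Return (i, j) pairs where x_seq[i:i+window] and y_seq[j:j+window]
    differ at no more than max_mismatch positions.

    Hypothetical helper: a brute-force scan, assumed here for illustration.
    """
    x_seq, y_seq = x_seq.upper(), y_seq.upper()
    hits = []
    for i in range(len(x_seq) - window + 1):
        for j in range(len(y_seq) - window + 1):
            mismatches = 0
            for k in range(window):
                if x_seq[i + k] != y_seq[j + k]:
                    mismatches += 1
                    if mismatches > max_mismatch:
                        break  # too many mismatches: no dot at (i, j)
            else:
                # Inner loop finished without exceeding the limit.
                hits.append((i, j))
    return hits
```

A call such as `dotplot(reverse_complement(genomic), mrna)` would give the coordinates to plot with genomic DNA on the x axis and mRNA on the y axis. For sequences of the size analysed here (tens of kilobases), a faster seed-based implementation would probably be needed in practice.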
3. Results
3.1. Exonic conservation
Exons 1d and 1c, the two exons that code for the alternative proteins, were both well conserved from mouse and rat (Fig. 1). The major initiating exon for transcripts encoding the wildtype protein, exon 1a, was also well conserved. Exon 1b was poorly conserved in mouse and not present in rat. Exon 1e showed very slight homology in both species, while no sequence relating to the human exon 1f was found in either species. PIPmaker revealed that exon 1a was within a strong CpG island while exon 1f was in a slightly weaker island.
3.2. Conservation of regulatory regions
As would be expected, TraFac revealed significant conservation of the 1a promoter region from mouse to human (Fig. 2). A cluster of conserved transcription factor binding sites was found immediately upstream of the exon, including an Sp1-MazR-Sp1-MazR motif characteristic of a major promoter. There was no significant conservation of the promoter region between exons 1a and 1d. While exon 1f was not found in the mouse, a cluster of transcription factor binding sites was found to be conserved between the human 1f promoter and the equivalent region in the mouse (Fig. 3). There was no significant conservation of the 1c promoter region (Fig. 4).
Footnote 4: http://trafac.cchmc.org/trafac/index.jsp
Fig. 2. Analysis of the 1a and 1d promoter region by TraFac, showing strong conservation immediately upstream of exon 1a. The regulogram (top) shows the two aligned sequences, with the mouse sequence above and the human below. The number of predicted transcription factor binding sites for each sequence, the percentage identity and the number of conserved transcription factor binding sites (hits) are shown graphically between the two sequences. The TraFac image below shows the two sequences aligned with the human on the left and the mouse on the right. Only conserved, parallel transcription factor binding sites are shown.
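The idea of keeping only conserved, parallel transcription factor binding sites can be sketched as follows. This is a minimal illustration under stated assumptions, not TraFac's algorithm: the `Site` record, the function `parallel_conserved_sites`, the use of alignment-column coordinates and the `tolerance` parameter are all hypothetical, and the site predictions themselves are assumed to come from some upstream tool.

```python
from typing import NamedTuple


class Site(NamedTuple):
    factor: str   # transcription factor name, e.g. "Sp1"
    aln_pos: int  # start of the predicted site, in alignment columns


def parallel_conserved_sites(human, mouse, tolerance=5):
    """Pair each human site with the nearest unused mouse site for the same
    factor, provided their alignment positions differ by <= tolerance.

    Hypothetical helper; the tolerance of 5 columns is an arbitrary assumption.
    """
    pairs = []
    used = set()  # indices of mouse sites that are already paired
    for h in sorted(human, key=lambda s: s.aln_pos):
        best = None  # (distance, mouse index) of the closest candidate
        for idx, m in enumerate(mouse):
            if idx in used or m.factor != h.factor:
                continue
            dist = abs(m.aln_pos - h.aln_pos)
            if dist <= tolerance and (best is None or dist < best[0]):
                best = (dist, idx)
        if best is not None:
            used.add(best[1])
            pairs.append((h, mouse[best[1]]))
    return pairs
```

With made-up input such as human sites Sp1 at column 10 and MazR at column 22, and mouse sites Sp1 at column 12 and MazR at column 21, both pairs would be retained, whereas a site present in only one species, or displaced far from its counterpart, would be dropped.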
Fig. 3. Regulogram and TraFac analysis of the 1f promoter region, showing a strong region of homology.
4. Discussion
The strong evolutionary conservation of exons 1d and 1c and the poor conservation of 1b are consistent with exons 1d and 1c being involved in coding for the alternative VDR proteins while 1b is purely part of the untranslated region. The strong conservation of exon 1a, despite its being noncoding, is consistent with it being the major initiating exon and suggests it may contain binding sites for translational proteins.
The promoter region immediately upstream of exon 1a contained a distinctive Sp1-MazR-Sp1-MazR motif, indicative of a major promoter region [8]. Furthermore, exon 1a was within a strong CpG island and transcription from 1a may therefore be controlled by methylation, also indicative of a major promoter. While exons 1f and 1e were very poorly conserved, the promoter region upstream of exon 1f showed significant conservation and exon 1f was also shown to be within a CpG island. It may therefore be that transcripts beginning in an exon equivalent to 1f exist in the mouse, although the exon sequence has diverged beyond recognition.
There was little conservation of the promoter region between exons 1a and 1d, suggesting that the strong promoter region upstream of exon 1a may regulate transcription from both exons. This is in agreement with previous reports of low promoter activity between exons 1a and 1d [2]. It remains unclear whether there is any mechanism for differential expression of the VDRA and VDRB proteins or whether the two are always co-expressed. It is perhaps surprising that the major initiating exons for both the wildtype and variant protein are under the control of the same promoter while exon 1f transcripts, relatively minor transcripts coding for the same wildtype protein, are under the control of an entirely separate promoter.
There was little conservation of the 1c promoter region; the Sp1 and Ap-2 sites reported by Byrne et al. [4] in human did not appear to be conserved from mouse. It may be that initiation from 1c is a phenomenon that appears only in tumour cells and is not involved in normal VDR signalling.
Fig. 4. A TraFac regulogram of the 1c promoter region. There was little sequence homology upstream of exon 1c and TraFac analysis did not show significant conservation.
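The CpG island calls discussed here came from PIPmaker. As a rough illustration of what such a call measures, the sketch below computes GC content and the observed/expected CpG ratio in sliding windows. The thresholds (window of 200 bases, GC content above 50%, observed/expected ratio above 0.6) are the commonly cited Gardiner-Garden and Frommer criteria, assumed here for illustration; they are not necessarily those applied by PIPmaker, and the function names are hypothetical.

```python
def cpg_stats(seq: str):
    """Return (GC fraction, observed/expected CpG ratio) for a sequence.

    Observed/expected = (count of CG dinucleotides * length) / (C count * G count).
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def cpg_island_windows(seq, window=200, step=1, min_gc=0.5, min_obs_exp=0.6):
    """Return start positions of windows that pass both thresholds.

    Hypothetical helper; the thresholds are assumptions (see the text above).
    """
    starts = []
    for start in range(0, len(seq) - window + 1, step):
        gc, ratio = cpg_stats(seq[start:start + window])
        if gc > min_gc and ratio > min_obs_exp:
            starts.append(start)
    return starts
```

Overlapping or adjacent passing windows would then be merged into candidate islands; how "strong" and "slightly weaker" islands are distinguished is left to the tool and is not modelled in this sketch.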
5. Conclusion
In silico analysis confirmed that the complexity of the 5′ region of the VDR gene is not a purely human phenomenon and is evolutionarily relatively well conserved. Exons encoding the alternative VDR proteins were well conserved. While exons 1f and 1e were poorly conserved, the conservation of the 1f promoter region suggests that transcripts involving an exon 1f equivalent may exist in mouse, although the exonic sequence has evolved beyond recognition. There was little evidence that the 1c promoter region is an important promoter.
Acknowledgements
We would like to thank Dr Bruce Aronow, Associate Professor and Co-director, Computational Medicine Center, Cincinnati Children’s Hospital, for his help in updating and analysing the VDR sequences on the TraFac databases.
References
[1] H. Arai, K. Miyamoto, Y. Taketani, H. Yamamoto, Y. Iemori, K. Morita, T. Tonai, T. Nishisho, S. Mori, E. Takeda, A Vitamin D receptor gene polymorphism in the translation initiation codon: effect on protein activity and relation to bone mineral density in Japanese women, J. Bone Miner. Res. 12 (6) (1997) 915–921.
[2] L.A. Crofts, M.S. Hancock, N.A. Morrison, J.A. Eisman, Multiple promoters direct the tissue-specific expression of novel N-terminal variant human Vitamin D receptor gene transcripts, Proc. Natl. Acad. Sci. USA 95 (18) (1998) 10529–10534.
[3] K.L. Sunn, T.A. Cock, L.A. Crofts, J.A. Eisman, E.M. Gardiner, Novel N-terminal variant of human VDR, Mol. Endocrinol. 15 (9) (2001) 1599–1609.
[4] I.M. Byrne, L. Flanagan, M.P. Tenniswood, J. Welsh, Identification of a hormone-responsive promoter immediately upstream of exon 1c in the human Vitamin D receptor gene, Endocrinology 141 (8) (2000) 2829–2836.
[5] M.A. Nobrega, L.A. Pennacchio, Comparative genomic analysis as a tool for biological discovery, J. Physiol. (Lond) 554 (1) (2004) 31–39.
[6] J. Pustell, F.C. Kafatos, A high speed, high capacity homology matrix: zooming through SV40 and polyoma, Nucleic Acids Res. 10 (15) (1982) 4765–4782.
[7] S. Schwartz, Z. Zhang, K.A. Frazer, A. Smit, C. Riemer, J. Bouck, R. Gibbs, R. Hardison, W. Miller, PipMaker—a web server for aligning two genomic DNA sequences, Genome Res. 10 (4) (2000) 577–586.
[8] J. Song, H. Ugai, K. Ogawa, Y. Wang, A. Sarai, Y. Obata, I. Kanazawa, K. Sun, K. Itakura, K.K. Yokoyama, Two consecutive zinc fingers in Sp1 and in MAZ are essential for interactions with cis-elements, J. Biol. Chem. 276 (32) (2001) 30429–30434.