Complex controls: the role of alternative promoters in mammalian genomes




640

Review

TRENDS in Genetics Vol.19 No.11 November 2003

Josette-Renée Landry*, Dixie L. Mager and Brian T. Wilhelm*
Terry Fox Laboratory, British Columbia Cancer Agency and Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 1L3, Canada

Despite the wealth of sequence data available from human, mouse and other genomes, our understanding of the mechanisms involved in regulating transcription and creating protein diversity is incomplete. Although effects such as alternative splicing have been extensively studied, other less well-characterized phenomena that increase the complexity of the transcribed portion of the genome are now being recognized. One mechanism, supported by a growing body of research, is the use of alternative promoters. In this article, we describe the consequences and significance of alternative promoter usage in human and mouse genomes, demonstrating that mammalian genes commonly employ multiple promoters to regulate and increase their transcriptional and translational potential.

Estimates for the total number of human and mouse protein-coding genes currently fall in the range of 20 000–30 000 [1,2], whereas those for simpler organisms such as Drosophila melanogaster (fruit fly) and Caenorhabditis elegans (worm) are lower, at ~13 000 and 18 000 genes, respectively [1]. Given that mammalian genomes have fewer than twice as many genes as D. melanogaster and C. elegans, it is generally believed that the phenotypic complexity of higher organisms is achieved not only through higher gene numbers, but also through multiple proteins being encoded by a single gene and through the number of protein–protein interactions [1]. The best-described mechanism that produces multiple protein isoforms from a single gene locus is alternative splicing, and it has been estimated that 35–50% of all human genes give rise to alternatively spliced mRNAs [1,3]. A growing number of recent studies reporting alternative promoters (see Glossary) for genes demonstrates that this phenomenon is another important source of protein and regulatory diversity.
In this article we examine the diverse consequences of alternative promoter usage by highlighting various recently published examples. We also discuss the role of repetitive elements in alternative promotion and the connection between alternative splicing and alternative promoter usage.

Supplementary data associated with this article can be found at doi:10.1016/j.tig.2003.09.014. * These authors contributed equally. Corresponding author: Dixie L. Mager ([email protected]).

Glossary
Promoter: the genomic sequence immediately upstream of the transcriptional start site defined by the 5′ end of an mRNA. It is this region that is presumed to bind the trans-acting factors required to transcribe the gene.
Alternative promoter: an alternative region from which transcripts of a gene originate. The existence of multiple transcripts for a single gene that differ in their 5′ termini reflects the presence of alternative promoters.

http://tigs.trends.com 0168-9525/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2003.09.014

Prevalence of alternative promoters

To date, no genome-wide analysis has specifically addressed the prevalence of alternative promoter usage in mammals, although two recent studies shed valuable light on the potential impact of alternative promoters. Zavolan et al. investigated the frequency of alternatively spliced transcripts in the mouse genome, including those with alternative first exons (a category that includes both alternatively spliced first exons and transcripts originating from alternative promoters), using 21 000 full-length mouse cDNA clones available from RIKEN (http://genome.gsc.riken.go.jp). This analysis revealed that 9% of the mouse genes in the dataset contained alternative first exons [4]. In a separate study, Trinklein et al. used a database of full-length human transcripts generated by the Mammalian Gene Collection (http://mgc.nci.nih.gov/) to test 152 randomly selected putative promoter regions, each defined as the sequence from −550 to +50 relative to the 5′ end of a cDNA clone. They noted that 28 (18%) of the genes examined had more than one transcriptional start site, separated by more than 500 bp, and that these were likely to represent alternative promoters. Of these 28 genes, at least 20 exhibited promoter activity, suggesting that these putative alternative promoters were functional [5].

To obtain an overall estimate of alternative promoter usage in humans, we analyzed ~67 000 human transcripts (~18 000 loci) from the NCBI LocusLink database (http://www.ncbi.nlm.nih.gov/LocusLink/) using a simple computational method, which is described in more detail in the supplementary material online. We compared all transcripts for a given gene using the BLAST2 [6] program to identify transcripts that differed at their 5′ termini, because alternatively promoted transcripts should begin with non-identical sequences. This analysis revealed that ~18% of all human genes in the dataset show evidence of alternative promoter usage; this estimate also accounts for false positives caused by unreliable expressed sequence tags (ESTs), as determined through manual validation. The calculated frequency agrees with the previously published human analysis, although it is higher than that reported for the mouse. It is possible that a more comprehensive investigation, specifically focused on alternative promoter usage in mice, would yield a frequency closer to that in humans. In addition, the fact that over 200 papers dealing with the identification of alternative promoters have been published since January 2000 indicates that this is a common phenomenon. A summary of recently published studies of genes with alternative promoters is shown in Table 1, organized by class according to the outcome of alternative promoter usage, as depicted in Figure 1.

Consequences of alternative promoters

Transcript variants with diverse transcription patterns and translation efficiencies

For many genes for which multiple promoters have been documented, no variation in the resulting proteins has been reported [7–24]. In these genes, although the mRNAs have alternative initial exons, they share a common downstream exon that contains the translation initiation site, and they therefore have the same open reading frame (ORF) (Figure 1a). Although no protein isoforms are generated in these instances, the mRNA variants differ in their transcriptional patterns and translational efficiencies. These alternative promoters have different tissue specificity [7,8], developmental activity [9,10] and/or expression level [11,12], or the variant 5′ untranslated regions (UTRs)

[Figure 1 panels: (a) Same protein: (i) CYP19, (ii) p18 (INK4c). (b) Different N-termini: (i) PCDHγ, (ii) PPARγ, (iii) SHC (p52-p46 and p66), (iv) p73. (c) Different protein: (i) INK4a-ARF, (ii) p21 (Waf1-Cip1) and p21B.]

Figure 1. The types and consequences of alternative promoters. (a) The use of alternative promoters (represented by arrows) does not result in protein isoforms because the variant 5′ initial exons (coloured boxes) are joined to a common second exon that contains the translation initiation site, shown as ATG. The black boxes illustrate coding exons; 3′ untranslated regions (UTRs) are not shown. Note that splicing, represented by solid lines, is shown only between the first and second exons for (a) and (b). (b) Using multiple promoters produces mRNAs that encode protein isoforms differing in their N-termini. (c) Use of the alternative promoters creates transcripts that code for different proteins because they are translated in different reading frames (represented by the black and white boxes). For each type of alternative promoter usage, an example of a gene present in both human and mouse is given, with the exception of Figure 1c,ii, where the alternative promoter has only been identified in human. In some cases, not all promoters are shown.


Table 1. Selected human and rodent genes with alternative promotersa b,c

Gene

Species

Type

CYP19

Human: >five

a, i

Mouse: three

a, ii

SRC

Human: two

a, i

Ly49

Mouse: two

a, i

MID1

Human: five

a,i

APOC1 EDNRB GATA-2

Mouse: five Rat: three Human: two Human: two Human: two

a, ii a, i a, i a, i

d

Feature

Refs

Promoters have different tissue specificity and one human promoter is of retroviral origin

[7,16,26]

Differential expression; one promoter is tissue restricted, the second is housekeeping Differential expression during development; one promoter might be required for induction, the second for maintenance of expression Promoters have different tissue specificity and one human promoter is of retroviral origin

[8] [9] [10,17]

Alternative promoter is of retroviral origin Alternative promoter is of retroviral origin Differential expression; one promoter predominantly erythroid; the second is house-keeping

[11,12] [11,12] [18,19]

Different first exon has different translation efficiency and might have different tissue specificity

[15]

Different first exons have different tissue specificity and translation efficiency. Promoter usage influences alternative splicing. N-truncated form is from an intronic promoter, and is also expressed in human testis

[14,20,29]

Mouse: two PPARβ/δ

NOS1

UROS

Mouse: four

a, i

Human: nine

a, ii a, i

Rat: three

b, iv

Human: two

a, ii

Differential expression; one promoter predominantly expressed in erythroid; the second is house-keeping

[21,22]

a, i

Different tissue specificity; one isoform is expressed predominantly in ovary, another in liver (rat)

[23,24]

Mouse: two PRLR

Human: six Rat: three

DAB1

Human: six

a, i

One of the 5′ UTR isoforms is over 1 kb and consists of seven exons in human and ten in mouse

[53]

p18 (INK4c)

Mouse: four M: two Human: two

a, ii

Different first exon has different tissue specificity and translation efficiency

[13]

PCDH-γ

Human: 22

0

b, i

Genomic organization similar to Ig and TCR loci, but variable 5′ ends are encoded by alternative first exons, each with its own promoter

[41]

b, i

Genomic organization similar to Ig and TCR loci, but variable 5′ ends are encoded by alternative first exons, each with its own promoter

[40]

Mouse: 22 PCDH-α

Human: 15 Mouse: 14

p63

Human: two Mouse: two

b, iv

Different function; ΔN isoform is a dominant negative of p53 and p63

[34]

p73

Human: two

b, iv

Different function; ΔN isoform is a dominant negative of p53 and p73 (p73 is pro-apoptotic, whereas ΔNp73 is anti-apoptotic)

[30,31]

SHC (p52-p46 and p66)

Human: two

b, iii

Differential expression and function: p52 and p46 are ubiquitous and involved in Ras activation whereas p66 is tissue-specific and not involved in Ras activation

[54,55]

NFATC1

Mouse: two Human: two

b, i

Different lineage-specificity and the upstream promoter is positively regulated by gene product

[56]

b, iv

Isoforms have similar activity but different subcellular localization and tissue specificity

[38]

b, ii

The use of four promoters results in two isoforms with different tissue specificity

[57,58]

Mouse: two

Mouse: two DNMT3A

Human: two Mouse: two

PPARγ

Human: four Mouse: two

HNF4α

Human: two

b, i

Different activity and expression; one isoform is predominantly expressed in islet of Langerhans whereas the other is expressed in liver and kidney; one isoform has reduced activity

[59,60]

Oprm

Mouse: two Mouse: two

b, i

[61]

PTCH1

Human: three

b, ii

Differential expression: one 5′ isoform is ubiquitous in brain whereas the other has a regional distribution in brain. Different activity; all isoforms interact with Smoh but only one inhibits its signaling

Ink4a-ARF

Human: two

c, i

Different function; INK4 inhibits CDKs, which leads to G1 arrest; ARF inhibits MDM2

[42,44]

p21 (Waf1-Cip1) and p21B

Human: two

c, ii

Different function; p21B induces apoptosis whereas p21 induces cell cycle arrest

[45]

[62]

Mouse: two

(a) Abbreviations: APOC1, apolipoprotein C1; CYP19, cytochrome P450 family 19; DAB1, disabled homologue 1; DNMT3A, DNA (cytosine-5-)-methyltransferase 3 alpha; EDNRB, endothelin B receptor; GATA-2, GATA binding protein 2; Ig, immunoglobulin; HNF4α, hepatocyte nuclear factor 4α; Ly49, lymphocyte antigen 49; MID1, midline 1; NOS1, neuronal nitric oxide synthase; NFATC1, nuclear factor of activated T-cells C1; Oprm, opioid receptor mu; PPARγ, peroxisome proliferator-activated receptor γ; p18 (INK4c), inhibitor of CDK4; PCDH, protocadherin; PRLR, prolactin receptor; PTCH1, patched homolog 1; Ink4a-ARF, inhibitor of CDK4 and alternative reading frame; TCR, T-cell receptor; SHC (p52-p46 and p66), Src homology 2 domain containing; Smoh, Smoothened; SRC, sarcoma; UROS, uroporphyrinogen III synthase; p21 (Cip1/Waf1), cyclin-dependent kinase (CDK) inhibitor (where Cip1 is CDK-interacting protein 1 and Waf1 is wild-type p53-activated fragment 1).
(b) Species in which the alternative promoters have been identified.
(c) The number of alternative promoters identified for each species.
(d) Type refers to the representation in Figure 1.


might differ in their secondary structure and/or in the presence of upstream ORFs, which can affect translation [13–15]. A well-documented example of a human gene whose tissue-specific expression is governed by alternative promoters is the CYP19 gene (Figure 1a,i). This gene encodes the aromatase P450 protein, which converts C19 steroids to C18 estrogens. As in other mammals, gonadal and brain-specific promoters in humans direct the tissue-restricted expression of CYP19 [7,16]. In addition, an upstream promoter situated ~100 kb from the coding region, which is unique to humans and other primates, drives high expression of this gene in the placenta [25]. Notably, although this placenta-specific promoter has been studied extensively, it was only recently appreciated that it is derived from a long terminal repeat (LTR) of a primate endogenous retrovirus [26]. Indeed, the use of endogenous retroviral sequences and other transposable elements as transcriptional promoters, although often overlooked, is not uncommon in the genome (Box 1).

An interesting example of alternative promoter usage during development is provided by the murine Ly49 multigene family, whose members are expressed on the surface of natural killer (NK) cells and regulate the cytolytic activity of these cells. Each NK cell randomly initiates expression of a subset of these genes, and individual genes are expressed at characteristic frequencies despite a high level of similarity in the promoter regions used in adult NK cells [27,28]. Recently, an alternative Ly49 promoter (of type a,i in Figure 1) has been shown to exist a few kilobases upstream of several genes; its use appears to be restricted to bone marrow and fetal thymus, where receptor expression is likely to be initiated [9].
Interestingly, the in vitro activity of this novel 'immature' promoter region appeared to correlate with the frequency with which the genes are expressed in NK cells in vivo [9]. This suggests that the role of this alternative promoter might be to initiate expression of a gene during development, allowing the downstream promoter to maintain expression at later stages.

Alternative promoter usage can also influence processing of the transcript by the translational machinery. The p18 (INK4c) gene, a member of the p16/INK4 family of cyclin-dependent kinase inhibitors, illustrates this effect and also shows differences in promoter usage during development. Two promoters have been identified for the murine p18 (INK4c) gene; they produce transcripts that encode the same protein but differ by an additional 1.1 kb of sequence in the 5′ UTR [13] (Figure 1a,ii). In undifferentiated C2C12 myoblasts, all detectable p18 transcripts originate at the upstream promoter. Once differentiation begins, transcription rapidly and completely shifts to the downstream promoter, which results in a significantly shortened 5′ UTR. The consequence of this UTR shortening is a 50-fold increase in the amount of p18 protein in the cell, even though the steady-state concentration of p18 mRNA remains unchanged. This increase in p18 is postulated to be involved in the permanent cell-cycle arrest that accompanies terminal differentiation.
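The p18 result can be made explicit with a minimal steady-state calculation. This is an illustrative simplification of ours, not a model from the cited study [13]: assume p18 protein P is synthesized from mRNA M at a per-transcript translation rate k_tl and removed by first-order degradation at rate k_deg, and assume that M and k_deg are the same before and after the promoter switch (superscripts "up" and "down" denote transcripts from the upstream and downstream promoters).

```latex
\begin{aligned}
\frac{dP}{dt} &= k_{\mathrm{tl}}\,M - k_{\mathrm{deg}}\,P
  \;\Longrightarrow\; P_{\mathrm{ss}} = \frac{k_{\mathrm{tl}}\,M}{k_{\mathrm{deg}}},\\[4pt]
\frac{P_{\mathrm{ss}}^{\mathrm{down}}}{P_{\mathrm{ss}}^{\mathrm{up}}}
  &= \frac{k_{\mathrm{tl}}^{\mathrm{down}}\,M / k_{\mathrm{deg}}}{k_{\mathrm{tl}}^{\mathrm{up}}\,M / k_{\mathrm{deg}}}
   = \frac{k_{\mathrm{tl}}^{\mathrm{down}}}{k_{\mathrm{tl}}^{\mathrm{up}}} \approx 50.
\end{aligned}
```

Under these assumptions, a ~50-fold rise in protein with unchanged mRNA corresponds to a ~50-fold gain in translational output per transcript from the shortened 5′ UTR. If protein stability also changed during differentiation, part of the increase would have to be attributed to k_deg instead.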


The human neuronal isoform of the nitric oxide synthase gene (NOS1) is another striking example of multiple transcriptional and translational effects caused by alternative promoters. This gene encodes one of three enzymes that synthesize nitric oxide, which acts as an intracellular messenger in numerous biological processes. The human NOS1 gene has nine alternative first exons (of type a,i in Figure 1) that differ in tissue specificity and translational efficiency. Although these alternative first exons all splice into a common second exon, in some cases an alternatively spliced exon is inserted between the first exon and the common second exon [14]. Inclusion of this alternative exon varies in frequency depending on which first exon is used, suggesting a possible correlation between the use of alternative promoters and the extent of alternative splicing observed in that gene (Box 2). In addition to the nine initial exons mentioned above, a tenth 5′ transcript variant has been identified for NOS1 (of type b,iv in Figure 1), which originates from an intronic promoter. This variant is expressed specifically in testis and results in an N-terminally truncated protein, unlike the other NOS1 mRNA isoforms, which encode identical proteins [29]. Although the function of this truncated transcript and protein is unknown, other better-characterized examples of alternative promoters creating protein variants are discussed in the following sections.

Protein isoforms with different N-termini

In the previously described examples, the use of alternative promoters does not result in the production of different proteins. However, in other cases, transcripts driven from different promoters are translated into distinct protein isoforms because their variant 5′ exons contain alternative

Box 1. Repetitive elements

Mammalian genomes harbor several classes of interspersed repetitive DNA derived from mobile genetic elements, which together account for at least 45% of the human genome [1] and 39% of the mouse genome [2]. Although the bulk of these repeats is generally considered 'junk DNA', reports in recent years have indicated that specific elements have evolved a biological function by donating transcriptional regulatory signals, including alternative promoters, to cellular genes. For example, in addition to driving the expression of CYP19, endogenous retroviruses (ERVs) have been found to contribute alternative promoters to the human apolipoprotein C1 (APOC1), endothelin B receptor (EDNRB) [11,12] and midline 1 (MID1) [10] genes, whereas in mouse, many examples of oncogene activation owing to retroviral promoter insertions have been reported [47]. Two recent large-scale studies have determined that repeats do indeed participate in the regulation of numerous human and mouse genes. Our group has shown that 3.6% of human and 1% of mouse RefSeq genes (http://www.ncbi.nlm.nih.gov/RefSeq/) initiate within a long interspersed nuclear element (LINE), short interspersed nuclear element (SINE) or ERV, suggesting that these transcripts are promoted by these transposable elements (TEs) [26]. In addition, Jordan et al. have determined that nearly 25% of regulatory regions in the Human Promoter Database (http://zlab.bu.edu/~mfrith/HPD.html) contain TE sequences [48], although this high value does not directly reflect functional relevance for these elements. Combined, these studies suggest that a large number of alternative promoters are TE-derived and demonstrate how the genome has co-opted such elements for the benefit of the host.
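The genome-wide figures in Box 1 rest on a simple positional test: does a gene's transcription start site (TSS) fall inside an annotated LINE, SINE or ERV? The sketch below shows one way to run that test with sorted intervals and binary search. It is not the pipeline used in [26]; the record layout, the helper names `repeat_class_at` and `te_initiated_genes`, and all coordinates are hypothetical, and it assumes non-overlapping repeats and ignores strand.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Hypothetical repeat record: (start, end, te_class) on one chromosome, half-open [start, end).
Repeat = Tuple[int, int, str]

# Only these classes are counted, mirroring the LINE/SINE/ERV screen described in Box 1.
COUNTED_CLASSES = {"LINE", "SINE", "ERV"}


def repeat_class_at(pos: int, repeats: List[Repeat], starts: List[int]) -> Optional[str]:
    """Return the class of the repeat covering pos, or None.

    Assumes repeats are sorted by start and do not overlap, so the only
    candidate is the last repeat starting at or before pos.
    """
    i = bisect_right(starts, pos) - 1
    if i >= 0 and repeats[i][0] <= pos < repeats[i][1]:
        return repeats[i][2]
    return None


def te_initiated_genes(
    tss_by_gene: Dict[str, Tuple[str, int]],
    repeats_by_chrom: Dict[str, List[Repeat]],
) -> Dict[str, str]:
    """Map each gene whose TSS lies inside a LINE, SINE or ERV to that class."""
    # Sort each chromosome's repeats once and cache their start coordinates for bisection.
    index = {}
    for chrom, reps in repeats_by_chrom.items():
        ordered = sorted(reps)
        index[chrom] = (ordered, [r[0] for r in ordered])

    hits: Dict[str, str] = {}
    for gene, (chrom, tss) in tss_by_gene.items():
        if chrom not in index:
            continue  # no repeats annotated on this chromosome
        ordered, starts = index[chrom]
        te_class = repeat_class_at(tss, ordered, starts)
        if te_class in COUNTED_CLASSES:
            hits[gene] = te_class
    return hits


# Toy coordinates, invented for illustration only.
repeats = {"chr1": [(100, 400, "ERV"), (1000, 1300, "SINE"), (5000, 6000, "DNA")]}
tss = {
    "GENE_A": ("chr1", 150),   # inside the ERV: counted
    "GENE_B": ("chr1", 5200),  # inside a DNA transposon: not counted
    "GENE_C": ("chr1", 800),   # between repeats
    "GENE_D": ("chr2", 120),   # chromosome with no annotated repeats
}
hits = te_initiated_genes(tss, repeats)
fraction = len(hits) / len(tss)
print(hits, fraction)  # {'GENE_A': 'ERV'} 0.25
```

A real screen would also have to deal with nested or overlapping repeats, strand, and uncertainty in where a cDNA's 5′ end truly lies; the sketch deliberately leaves these out.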


Box 2. Connection between alternative splicing and alternative promoters

The role of alternative promoters might extend beyond the direct effects discussed in this article. There is recent evidence, from both human and mouse genes, that alternative promoters help regulate the products of the splicing process, thereby indirectly altering the protein-coding potential of a transcript. For several genes, including the human neuronal nitric oxide synthase gene (NOS1) [14], the mouse bcl-X gene [49] and the human caspase-2 gene (CASP2) [50], the downstream alternative splicing of the transcripts correlates with the promoter used to generate them. For example, usage of the second, downstream non-coding first exon of the caspase-2 gene is associated with inclusion of the variable ninth exon, which produces the short isoform. Similarly, the use of several of the bcl-X promoters is associated with an alternative splicing pattern that produces a specific ratio of long and short isoforms of the protein. Thus, for these genes, promoter choice appears to determine the inclusion or exclusion of an alternatively spliced exon. Two mechanisms have been proposed to explain the observed connection between promoter usage and splicing events. The first involves differential interaction between the alternative promoter regions and the trans-acting factors that influence the splicing machinery [51]. It is proposed that these factors favor either the inclusion or the exclusion of alternative exons because they differ in their ability to stimulate the initiation or the elongation steps of transcription by RNA polymerase II [52]. The second mechanism explains alternative splicing as a result of differences in secondary structure caused by the inclusion of unique sequence at the 5′ end of the transcript [49]. These differences are in turn proposed to affect the activity of the splicing machinery. The two explanations are not necessarily mutually exclusive, however, and further experimentation will be required to resolve the precise mechanisms involved in each case.

ATGs. For some genes, the use of alternative transcription start sites results in proteins that differ in their N-terminus (Figure 1b,i–iii), whereas for others, the use of an intronic promoter results in an N-terminally truncated (ΔN) isoform (Figure 1b,iv). A well-characterized example of a gene that produces a truncated protein isoform is p73, a member of the p53 family. The normal function of the p73 protein largely overlaps with that of p53: it is involved in cell-cycle arrest, initiation of apoptosis and transactivation of similar genes. A total of six splice variants have been identified for the human p73 gene; five involve the last five exons at the C-terminus, and the sixth splices out the second exon (Δexon2). Recently, another p73 transcript has been characterized, first in mouse [30] and subsequently in humans [31], that originates from a novel promoter within the third intron of the gene (Figure 1b,iv). This transcript encodes the ΔNp73 protein, which functions as a dominant-negative regulator of both p53 and p73 [30,32] in a fashion similar to that described for the Δexon2 form [33]. Analysis of a variety of tumors by Zaika and co-workers demonstrated a strong trend towards upregulation of the dominant-negative p73 isoforms (Δexon2 or ΔNp73) and the […] of wild-type p53 in the tumors [31]. Although the sample size was small, these data suggest that tumors expressing the dominant-negative forms have a competitive growth advantage. The existence of this alternative promoter, and more importantly its conservation between species, suggests that an organism gains some benefit from expressing a form of the p73 protein that is potentially oncogenic. Evidence for a vital role of this protein was provided by Pozniak et al., who examined the function of p73 in developing mouse neurons of the sympathetic superior cervical ganglion (SCG) [30]. They showed that most p73 was present as the alternatively promoted ΔNp73 isoform and that this protein was necessary to counteract the apoptotic signals generated by p53. This functional role is supported by the observation that neuronal cell death is greatly increased in mice that lack all p73 protein. The situation is similar for the p63 gene, which is not only closely related to p73 but also encodes an N-terminally truncated isoform [34]. The ΔNp63 form, like ΔNp73, acts as a dominant-negative regulator of p63 and p53. The similarity between p63 and p73 is also shown by the fact that ΔNp63 is upregulated in some human tumors [35]. For both p63 and p73, the dominant-negative forms created by the alternative promoters appear to be vital for normal growth and development [30,36]. Despite their necessity, unregulated activity of these alternative promoters seems to predispose the cell to malignancy. Dominant-negative protein isoforms have also been described in D. melanogaster, showing that this role of alternative promoters is not restricted to vertebrates. A recent study of the STAT92E gene (signal transducer and activator of transcription) [37] demonstrated that an internal promoter generates an N-terminally truncated dominant-negative protein isoform. Interestingly, most transcripts produced from this gene during various developmental stages encode the dominant-negative protein.
RNA interference (RNAi) experiments that altered the ratio of full-length to truncated protein showed that disrupting the normal expression ratio, but not the overall amount of STAT92E, was associated with severe developmental defects. There are other examples of changes in the N-terminus resulting from alternative promoter usage that do not lead to dominant-negative effects. Chen and co-workers recently demonstrated that a novel internal promoter (of type b,iv in Figure 1) exists in the sixth exon of both the human and mouse Dnmt3a genes, which encode a cytosine-5 methyltransferase [38]. Analysis of the mouse promoter revealed that it produces a protein isoform lacking the N-terminal 219 amino acid residues (called Dnmt3a2). Although the deleted N-terminus does not affect the enzymatic activity of the protein, the subcellular localization of Dnmt3a2 is altered: fluorescence microscopy and cell fractionation experiments revealed that Dnmt3a2 is located in euchromatin, whereas the full-length protein is located in heterochromatic regions. For other genes, the functional significance of protein variants with different N-termini produced from alternative promoters is not yet established. The human and mouse protocadherin (PCDH) genes represent an extreme example of this phenomenon (Figure 1b,i). These genes are organized in three tandem clusters, PCDH-α, -β and -γ, situated on human chromosome 5 and mouse chromosome 18 [39]. The PCDH-α and -γ gene clusters each have a variable region containing at least 15 alternative first exons, which encode the distinct N-terminal cadherin-like extracellular and transmembrane domains, as well as a constant region with three exons encoding the invariable cytoplasmic domain [39]. Although the genomic structure of the PCDH clusters resembles that of the immunoglobulin (Ig) and T-cell receptor (TCR) loci, expression of the various PCDH isoforms does not result from somatic DNA recombination but is instead controlled by multiple promoters. Each PCDH variable first exon appears to be transcribed from a separate promoter, suggesting that the choice of initial exon in a PCDH transcript depends on the alternative promoter used [40,41].

Different proteins encoded by alternative open reading frames

The role of alternative promoters is not limited to generating transcripts that vary at their 5′ end. Two well-documented examples exist in which alternative promoters encode different proteins, through either alternative reading frames or splicing variation that creates novel ORFs (Figure 1c). The cyclin-dependent kinase inhibitor 2A gene (CDKN2A, also known as INK4a and ARF) encodes two different proteins that influence the activity of the tumor suppressors p53 and retinoblastoma 1 (RB1). Transcripts from this gene are generated from two alternative promoters, resulting in first exons with different translational start sites [42]. The two first exons splice into a common second exon but are read in different frames; as a result, the protein from the upstream promoter terminates in the second exon, whereas the protein translated from the downstream promoter terminates in the third exon (Figure 1c,i).
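The frame logic behind CDKN2A can be illustrated with a toy translation. This is a sketch under stated assumptions: the exon sequences are invented and are not CDKN2A sequence, and `translate_first_orf` is a hypothetical helper that simply translates from the first ATG to the first in-frame stop codon using the standard genetic code. Because the number of nucleotides between each ATG and the shared splice junction differs by one, the same second exon is read in two different frames.

```python
# Standard genetic code (NCBI translation table 1), with codons enumerated in
# T, C, A, G order at the first, second and third positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_first_orf(mrna: str) -> str:
    """Translate from the first ATG up to, but not including, the first in-frame stop ('*')."""
    start = mrna.find("ATG")
    if start == -1:
        return ""
    peptide = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical toy exons; these are NOT CDKN2A sequences.
EXON_1A = "GCCATGAAA"  # 6 nt from its ATG to the splice junction (a multiple of 3)
EXON_1B = "TTCATGAA"   # 5 nt from its ATG to the splice junction (not a multiple of 3)
SHARED_EXON_2 = "GCTGAGTAACTAG"

protein_a = translate_first_orf(EXON_1A + SHARED_EXON_2)
protein_b = translate_first_orf(EXON_1B + SHARED_EXON_2)
print(protein_a, protein_b)  # MKAE MKLSN
```

The two peptides diverge as soon as translation proceeds into the shared exon, and each meets a different stop codon, which is the same principle that lets one CDKN2A exon 2 contribute to two unrelated proteins.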
In addition to alternative promoter usage, the promoters of the human gene have differing tissue specificity, and the downstream-promoted transcript also undergoes alternative splicing to generate protein isoforms [43]. Given this combination of alternative promoters, alternative reading frames, differences in tissue-specific expression and alternative splicing, CDKN2A is an excellent example of the multiple effects that generate protein diversity from a single locus. This gene is also of particular interest because the proteins it generates function in the same biological pathway but differ in their roles, with the INK4 protein inhibiting CDKs and the ARF protein blocking the activity of MDM2 [44]. A second example of the combination of alternative promoters and alternative reading frames is the human p21 gene. Analysis of this gene has recently revealed that it has two promoters, which generate three distinct transcripts encoding two different proteins [45]. The novel promoter in the first intron of the gene generates two transcripts that differ in their splice pattern. Whereas one transcript splices into the same exon (containing the coding region) as the upstream-promoted transcript, the other fails to splice out a large part of the first intron, which contains a novel ORF encoding p21B (Figure 1c,ii).


Interestingly, all three transcripts are upregulated by p53, but whereas the full-length p21 (produced by two of the transcripts) induces cell-cycle arrest, p21B appears to stimulate apoptosis.

Conclusions and unanswered questions

It is clear from the wide variety of documented examples that the effects of alternative promoter usage are highly varied. The simplest and most common scenario involves two or more promoters that produce transcripts with identical ORFs (Figure 1a). From studies published to date, it appears that 60–80% of genes with alternative promoters are of this type. Because the protein remains unchanged, alterations in tissue or developmental specificity are the main consequences in such cases. In more complex situations, in addition to possible transcriptional differences, the usage of alternative promoters can alter the N-termini of proteins (Figure 1b) or create alternative ORFs (Figure 1c), although the latter situation is likely to be quite rare. Several questions remain regarding the prevalence, origin and function of alternative promoters. Of central importance is whether the resulting variation in transcript and protein forms is necessary for the cell or organism, or simply reflects an inherent 'leakiness' in transcriptional control of the genome. Perhaps the strongest evidence that complex control of genes is not simply a biological artifact is the extent to which complex promoter structures are conserved between species. The examples of homologous genes with multiple promoters that are conserved in humans and rodents (Table 1) suggest that, in many cases, selective pressure has acted to maintain this complex regulation of gene expression. No doubt there are other genes regulated by multiple promoters in both species, and it would be interesting to determine how many genes had identifiable alternative promoters in the last common mammalian ancestor.
One can also ask whether alternative promoters play a role in generating species-specific diversity. A comprehensive analysis of the alternatively promoted genes that are conserved between species, as well as those that are species-specific, would be useful. It would also be interesting to determine whether genes of certain functional classes are more or less likely to have alternative promoters.

The common occurrence of alternative promoter usage in mammalian genomes raises the question of how alternative promoters evolved and how they are regulated. One can envisage several pathways that could lead to the creation of an alternative promoter, and we will mention four here (Figure 2). First, mutations could, over time, create a cluster of new functional motifs that can recruit the transcriptional machinery and that lie in a favorable genomic position to serve as a promoter for the downstream gene (Figure 2a). Because many predicted transcription factor binding sites are relatively short, such an occurrence is not unlikely. Second, a recombination event could duplicate an entire promoter region, and subsequent mutations could alter the strengths and tissue specificities of the promoters (Figure 2b). In such cases, the alternative promoter regions should share some sequence similarity, depending on the age of the duplication and on selective pressures [46]. Third, the insertion of a transposable element (TE) in the vicinity of a gene can create a promoter, and numerous examples of this phenomenon are now known (Box 1; Figure 2c). Finally, other genomic rearrangements or duplications could introduce novel promoter elements into the vicinity of the gene (Figure 2d).

Regardless of the mechanism by which an alternative promoter is created, it will persist only if it can withstand selective pressure. Therefore, unless the instantaneous donation of a promoter (through recombination or TE insertion) is beneficial, or at least neutral, to the host, cells or offspring carrying the novel promoter might be at a ‘metabolic disadvantage’. The consequences of this disadvantage would vary significantly depending on gene function. It is perhaps more likely that the most frequent evolutionary path is a slow, gradual accumulation of mutations that creates a new site of transcription initiation. A gradual process would allow time for subfunctionalization of the alternatively promoted transcripts and avoid the immediate negative selection that could follow the sudden creation of a strong promoter. For instance, a gene involved in glucose transport might acquire a weak novel promoter through gradual mutations in the region upstream of the gene. Individuals with this novel promoter might not experience any selective pressure until several generations later, when a second random mutation creates a binding site for a liver-specific transcription factor, allowing the altered transcript to provide some novel function in the liver. In the case of the sudden creation of potential promoters, such as those derived from TE insertions, the TE might be silenced upon insertion (thereby escaping negative selection) but later become active in certain tissues or developmental stages if alternative gene promotion from the TE provides a beneficial level of transcriptional diversity.

To determine how the regulation of genes with multiple promoters has evolved, it will be informative to examine other species. Conservation of alternative promoters across large evolutionary distances implies either a common origin or convergent evolution. Careful analysis might be able to distinguish the two and give an estimate of the relative frequency at which alternative promoters evolve.

[Figure 2 image not reproduced. Panels: (a) Gradual mutations; (b) Local duplication; (c) Transposable element insertion; (d) Other genomic rearrangement. Each panel compares the original gene structure (exon 1 followed by downstream exons) with the derived structure carrying alternative first exons 1a and 1b.]

Figure 2. Proposed mechanisms that could lead to the creation of alternative promoters. A hypothetical gene is shown with a non-coding first exon (green box) and several coding exons (black boxes). Promoters are represented by bent arrows. (a) Random mutations (represented by blue arrows) generate functional transcription factor binding sites, which create an alternative promoter and alternative first exon (blue box). (b) Duplication of the 5′ end of a gene (denoted by a bracket) results in the use of an alternative promoter and first exon (shown as a light green box), which might diverge with time from the parental sequence. (c) An inserted mobile element, such as a retroviral long terminal repeat (shown as a large blue arrow), contributes an alternative promoter to the gene. (d) A potential promoter region and exon (shown as a blue box) is brought into the vicinity of a gene as a result of a large-scale deletion or other genomic rearrangement.

In conclusion, the usage of alternative promoters is highly prevalent in mammalian genomes, and it seems likely that these promoters play an important role in our biology. Given the diversity that alternative promoters create, both in mRNA distribution and in protein structure, this phenomenon appears to be a key mechanism for generating organismal complexity.
Acknowledgements We thank Catherine Dunn for helpful discussions. This work was supported by a Canadian Institutes of Health Research (CIHR) grant to D.L.M. J-R.L. was supported by studentships from CIHR and the Michael Smith Foundation for Health Research (MSFHR) and B.T.W. by a studentship from MSFHR.

References
1 Lander, E.S. et al. (2001) Initial sequencing and analysis of the human genome. Nature 409, 860–921
2 Waterston, R.H. et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562
3 Modrek, B. et al. (2001) Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res. 29, 2850–2859
4 Zavolan, M. et al. (2002) Splice variation in mouse full-length cDNAs identified by mapping to the mouse genome. Genome Res. 12, 1377–1385
5 Trinklein, N.D. et al. (2003) Identification and functional analysis of human transcriptional promoters. Genome Res. 13, 308–312
6 Tatusova, T.A. and Madden, T.L. (1999) BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol. Lett. 174, 247–250
7 Kamat, A. et al. (2002) Mechanisms in tissue-specific regulation of estrogen biosynthesis in humans. Trends Endocrinol. Metab. 13, 122–128
8 Bonham, K. et al. (2000) An alternative human SRC promoter and its regulation by hepatic nuclear factor-1alpha. J. Biol. Chem. 275, 37604–37611
9 Saleh, A. et al. (2002) Identification of a novel Ly49 promoter that is active in bone marrow and fetal thymus. J. Immunol. 168, 5163–5169
10 Landry, J-R. et al. (2002) The Opitz syndrome gene Mid1 is transcribed from a human endogenous retroviral promoter. Mol. Biol. Evol. 19, 1934–1942
11 Medstrand, P. et al. (2001) Long terminal repeats are used as alternative promoters for the endothelin B receptor and apolipoprotein C-I genes in humans. J. Biol. Chem. 276, 1896–1903
12 Landry, J-R. and Mager, D.L. (2003) Functional analysis of the endogenous retroviral promoter of the human endothelin B receptor gene. J. Virol. 77, 7459–7466
13 Phelps, D.E. et al. (1998) Coupled transcriptional and translational control of cyclin-dependent kinase inhibitor p18INK4c expression during myogenesis. Mol. Cell. Biol. 18, 2334–2343
14 Wang, Y. et al. (1999) RNA diversity has profound effects on the translation of neuronal nitric oxide synthase. Proc. Natl. Acad. Sci. U. S. A. 96, 12150–12155
15 Larsen, L.K. et al. (2002) Genomic organization of the mouse peroxisome proliferator-activated receptor beta/delta gene: alternative promoter usage and splicing yield transcripts exhibiting differential translational efficiency. Biochem. J. 366, 767–775
16 Golovine, K. et al. (2003) Three different promoters control expression of the aromatase cytochrome P450 gene (Cyp19) in mouse gonads and testis. Biol. Reprod. 68, 978–984
17 Landry, J-R. and Mager, D.L. (2002) Widely spaced alternative promoters, conserved between human and rodent, control expression of the Opitz Syndrome gene MID1. Genomics 80, 499–508
18 Minegishi, N. et al. (1998) Alternative promoters regulate transcription of the mouse GATA-2 gene. J. Biol. Chem. 273, 3625–3634
19 Pan, X. et al. (2000) Identification of human GATA-2 gene distal IS exon and its expression in hematopoietic stem cell fractions. J. Biochem. (Tokyo) 127, 105–112
20 Lee, M.A. et al. (1997) Tissue- and development-specific expression of multiple alternatively spliced transcripts of rat neuronal nitric oxide synthase. J. Clin. Invest. 100, 1507–1512
21 Aizencang, G.I. et al. (2000) Uroporphyrinogen III synthase: an alternative promoter controls erythroid-specific expression in the murine gene. J. Biol. Chem. 275, 2295–2304
22 Aizencang, G.I. et al. (2000) Human uroporphyrinogen-III synthase: genomic organization, alternative promoters, and erythroid-specific expression. Genomics 70, 223–231
23 Hu, Z. et al. (1996) Multiple and tissue-specific promoter control of gonadal and non-gonadal prolactin receptor gene expression. J. Biol. Chem. 271, 10242–10246
24 Hu, Z-Z. et al. (2002) Complex 5′ genomic structure of the human prolactin receptor: multiple alternative exons 1 and promoter utilization. Endocrinology 143, 2139–2142
25 Kamat, A. et al. (1999) A 500-bp region, ~40 kb upstream of the human CYP19 (aromatase) gene, mediates placenta-specific expression in transgenic mice. Proc. Natl. Acad. Sci. U. S. A. 96, 4575–4580
26 van de Lagemaat, L.N. et al. Transposable elements promote regulatory variation and diversification of genes with specialized functions in mammals. Trends Genet. (in press)
27 Kubota, A. et al. (1999) Diversity of NK cell receptor repertoire in adult and neonatal mice. J. Immunol. 163, 212–216
28 Wilhelm, B.T. et al. (2001) Comparative analysis of the promoter regions and transcriptional start sites of mouse Ly49 genes. Immunogenetics 53, 215–224
29 Wang, Y. et al. (1997) A novel, testis-specific mRNA transcript encoding an NH2-terminal truncated nitric-oxide synthase. J. Biol. Chem. 272, 11392–11401
30 Pozniak, C.D. et al. (2000) An anti-apoptotic role for the p53 family member, p73, during developmental neuron death. Science 289, 304–306
31 Zaika, A.I. et al. (2002) ΔNp73, a dominant-negative inhibitor of wild-type p53 and TAp73, is up-regulated in human tumors. J. Exp. Med. 196, 765–780
32 Nakagawa, T. et al. (2002) Autoinhibitory regulation of p73 by ΔNp73 to modulate cell survival and death through a p73-specific target element within the ΔNp73 promoter. Mol. Cell. Biol. 22, 2575–2585
33 Fillippovich, I. et al. (2001) Transactivation-deficient p73alpha (p73Δexon2) inhibits apoptosis and competes with p53. Oncogene 20, 514–522
34 Yang, A. et al. (1998) p63, a p53 homolog at 3q27-29, encodes multiple products with transactivating, death-inducing, and dominant-negative activities. Mol. Cell 2, 305–316
35 Nylander, K. et al. (2002) Differential expression of p63 isoforms in normal tissues and neoplastic cells. J. Pathol. 198, 417–427
36 Yang, A. et al. (1999) p63 is essential for regenerative proliferation in limb, craniofacial and epithelial development. Nature 398, 714–718
37 Henriksen, M.A. et al. (2002) Negative regulation of STAT92E by an N-terminally truncated STAT protein derived from an alternative promoter site. Genes Dev. 16, 2379–2389
38 Chen, T. et al. (2002) A novel Dnmt3a isoform produced from an alternative promoter localizes to euchromatin and its expression correlates with active de novo methylation. J. Biol. Chem. 277, 38746–38754
39 Wu, Q. and Maniatis, T. (1999) A striking organization of a large family of human neural cadherin-like cell adhesion genes. Cell 97, 779–790
40 Tasic, B. et al. (2002) Promoter choice determines splice site selection in protocadherin alpha and gamma pre-mRNA splicing. Mol. Cell 10, 21–33
41 Wang, X. et al. (2002) Molecular mechanisms governing Pcdh-gamma gene expression: evidence for a multiple promoter and cis-alternative splicing model. Genes Dev. 16, 1890–1905
42 Quelle, D.E. et al. (1995) Alternative reading frames of the INK4a tumor suppressor gene encode two unrelated proteins capable of inducing cell cycle arrest. Cell 83, 993–1000
43 Robertson, K.D. and Jones, P.A. (1999) Tissue-specific alternative splicing in the human INK4a/ARF cell cycle regulatory locus. Oncogene 18, 3810–3820
44 Sherr, C.J. (2001) The INK4a/ARF network in tumour suppression. Nat. Rev. Mol. Cell Biol. 2, 731–737
45 Nozell, S. and Chen, X. (2002) p21B, a variant of p21Waf1/Cip1, is induced by the p53 family. Oncogene 21, 1285–1294
46 Lynch, M. and Conery, J.S. (2000) The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155
47 Rosenberg, N. and Jolicoeur, P. (1997) Retroviral pathogenesis. In Retroviruses (Coffin, J.M. et al., eds), pp. 475–586, Cold Spring Harbor Press
48 Jordan, I.K. et al. (2003) Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet. 19, 68–72
49 Pecci, A. et al. (2001) Promoter choice influences alternative splicing and determines the balance of isoforms expressed from the mouse bclX gene. J. Biol. Chem. 276, 21062–21069
50 Logette, E. et al. (2003) The human caspase-2 gene: alternative promoters, pre-mRNA splicing and AUG usage direct isoform-specific expression. Oncogene 22, 935–946
51 Cramer, P. et al. (1999) Coupling of transcription with alternative splicing: RNA Pol II promoters modulate SF2/ASF and 9G8 effects on an exonic splicing enhancer. Mol. Cell 4, 251–258
52 Nogues, G. et al. (2002) Transcriptional activators differ in their abilities to control alternative splicing. J. Biol. Chem. 277, 43110–43114
53 Bar, I. et al. (2003) The gene encoding disabled-1 (DAB1), the intracellular adaptor of the Reelin pathway, reveals unusual complexity in human and mouse. J. Biol. Chem. 278, 5802–5812
54 Ventura, A. et al. (2002) The p66Shc longevity gene is silenced through epigenetic modifications of an alternative promoter. J. Biol. Chem. 277, 22370–22376
55 Luzi, L. et al. (2000) Evolution of the Shc functions from nematode to human. Curr. Opin. Genet. Dev. 10, 668–674
56 Chuvpilo, S. et al. (2002) Autoregulation of NFATc1/A expression facilitates effector T cells to escape from rapid apoptosis. Immunity 16, 881–895
57 Fajas, L. et al. (1997) The organization, promoter analysis, and expression of the human PPARgamma gene. J. Biol. Chem. 272, 18779–18789
58 Sundvold, H. and Lien, S. (2001) Identification of a novel peroxisome proliferator-activated receptor (PPAR) gamma promoter in man and transactivation by the nuclear receptor RORalpha1. Biochem. Biophys. Res. Commun. 287, 383–390
59 Boj, S.F. et al. (2001) A transcription factor regulatory circuit in differentiated pancreatic cells. Proc. Natl. Acad. Sci. U. S. A. 98, 14481–14486
60 Nakhei, H. et al. (1998) An alternative splice variant of the tissue specific transcription factor HNF4alpha predominates in undifferentiated murine cell types. Nucleic Acids Res. 26, 497–504
61 Pan, Y-X. et al. (2001) Generation of the mu opioid receptor (MOR-1) protein by three new splice variants of the Oprm gene. Proc. Natl. Acad. Sci. U. S. A. 98, 14084–14089
62 Kogerman, P. et al. (2002) Alternative first exons of PTCH1 are differentially regulated in vivo and might confer different functions to the PTCH1 protein. Oncogene 21, 6007–6016
