Sequence and analysis of the murine Hmgiy (Hmga1) gene locus


Gene 271 (2001) 51–58

www.elsevier.com/locate/gene

Marisa L. Pedulla a, Nathan R. Treff b, Linda M.S. Resar c, Raymond Reeves b,*

a Pittsburgh Bacteriophage Institute, Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
b Biochemistry and Biophysics, School of Molecular Biosciences, Washington State University, Pullman, WA 99164-4660, USA
c Hematology Division, Departments of Pediatrics, Molecular Biology and Genetics, Oncology, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA

Received 14 February 2001; received in revised form 30 March 2001; accepted 20 April 2001. Received by A.J. van Wijnen.

Abstract

The HMGIY non-histone proteins play important roles as architectural transcription factors that regulate gene transcription in mammalian cells and also act as host-supplied cofactors necessary for retroviral integration. The genes coding for the HMGIY proteins are proto-oncogenes, and their aberrant or over-expression is correlated with both neoplastic transformation and metastatic progression in a wide variety of tumors. Here, we report the first complete sequence of the murine Hmgiy (a.k.a. Hmga1) gene and provide a detailed comparison of this with the sequence and organization of the human HMGIY gene, including an analysis of its promoter region with the previously unreported 5′ upstream region of the human gene. These analyses reveal a remarkable degree of overall sequence conservation in both the protein coding and promoter regions of the murine and human genes, including conservation of the c-Myc binding site that has been demonstrated to regulate murine Hmgiy transcription (Wood et al., 2000. Mol. Cell. Biol. 20, 5490–5502). The promoters of both genes contain other conserved transcription factor binding sites that may also represent important cis-regulatory elements. Two exons present in the 5′ untranslated region of the human gene, however, are missing from the murine gene, suggesting that these two closely related mammalian species regulate transcription of their Hmgiy genes in an individualistic manner. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Cancer; Chromatin proteins; High mobility group (HMG) proteins; Genomic clones; c-Myc

Abbreviations: bp, base pair; HMG, high mobility group; s, seconds; min, minute; nt, nucleotide

GenBank Accession numbers: The DNA sequence of the murine Hmgiy gene, including the upstream promoter region, is available through Accession number AF285780. The DNA sequence of the 5′ promoter region of the human HMGIY gene is deposited at Accession Number AF286367.

* Corresponding author. Tel.: +1-509-335-1948; fax: +1-509-335-9688. E-mail address: [email protected] (R. Reeves).

0378-1119/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0378-1119(01)00500-5

1. Introduction

High mobility group (HMG) proteins are eukaryotic DNA-binding proteins that fall into three families: HMGIY, HMG-1/2 and HMG-14/17 (reviewed in Bustin and Reeves, 1996; Reeves and Beckerbauer, 2001). The HMGIY proteins were initially discovered by Lund and his colleagues (Lund et al., 1983) in proliferating HeLa cells and were later demonstrated to bind preferentially to A·T-rich alpha satellite DNA in vitro (Strauss and Varshavsky, 1984). In vivo, the HMGIY proteins are localized in the A·T-rich G/Q- and C-bands of human and mouse metaphase chromosomes (Disney et al., 1989). HMG-I and HMG-Y are isoform proteins (which differ only by an internal deletion of 11 amino acids in the latter) that are translated from alternatively spliced mRNAs transcribed from a single gene, HMGIY (a.k.a. HMGA1 under a new nomenclature; Bustin, 2001), located at chromosomal locus 6p21 in humans (Friedmann et al., 1993). In the mouse, the Hmgiy (a.k.a. Hmga1) gene is located near the t-locus of chromosome 17 (Johnson et al., 1992). Another HMGIY family member, the HMGI-C protein, is coded for by a different gene, HMGI-C (a.k.a. Hmga2; Bustin, 2001), located at chromosomal locus 12q14-15 in humans (Chau et al., 1995) and the pygmy locus in the distal end of chromosome 10 in mice (Zhou et al., 1995).

Two features that distinguish HMGIY proteins from other HMG proteins are that they lack structure when free in solution and that each protein contains three independent DNA-binding domains, referred to as A·T-hooks, that preferentially bind to the narrow minor groove of A·T-rich DNA (Reeves and Nissen, 1990; Huth et al., 1997). The A·T-hook peptide motif is highly conserved in evolution and is also found in 'structured' proteins, many of which are transcription factors, in organisms ranging from bacteria to humans (Aravind and Landsman, 1998). The HMGIY proteins recognize structural features of DNA substrates rather than the sequence of nucleotides (Reeves and Wolffe, 1996; Hill et al., 1999). In vitro and in vivo, the HMGIY proteins also have the ability to interact specifically with both nucleosome core particles (Reeves and Wolffe, 1996; Reeves et al., 2000) and many transcription factors (Bustin and Reeves, 1996). Specific protein-protein and protein-DNA interactions allow the HMGIY proteins to function as 'architectural transcription factors' in the formation of stereo-specific complexes called 'enhanceosomes' on gene promoter/enhancer regions (Thanos and Maniatis, 1995), thereby regulating the expression of a large number of mammalian genes (Reeves and Beckerbauer, 2001).

Aberrant or over-expression of the HMGI/Y proteins is strongly correlated with both the cancerous transformation and metastatic progression of many types of tumors (reviewed in Hess, 1998; Tallini and Dal Cin, 1999; Reeves, 2000). Moreover, transgenic cells overexpressing full-length human HMG-I and HMG-Y proteins have recently been demonstrated to form tumors in nude mice (Wood et al., 2000a,b; Reeves et al., 2001), and truncated or chimeric (as well as full-length) forms of the HMGI-C protein induce neoplastic transformation of mouse fibroblast cells in culture (Fedele et al., 1998; Wood et al., 2000a) and tumors in nude mice (Wood et al., 2000a). In addition, expression of the truncated HMGI-C gene in transgenic mice induces gigantism and lipomatosis (Battista et al., 1999).

The HMGIY proteins also participate in retroviral integration into the host cell genome, a non-homologous recombination event. In vivo, HMGIY is present in complexes with HIV-1 viral DNA and integrase protein and is required for efficient integration reactions in vitro (Farnet and Bushman, 1997; Hindmarsh et al., 1999). This situation is analogous to that of bacteriophages that utilize host DNA-binding proteins and phage-encoded integrase enzymes to construct recombinagenic higher-order nucleoprotein complexes (Pedulla and Hatfull, 1998).

Although the overall intron/exon organization of the murine Hmgiy gene was recently reported (Liu et al., 2000), the sequence of the whole gene, including the noncoding regions, has not previously been determined, nor has any detailed comparison of the gene's sequence been made with that of the human gene. Through a combination of genomic library screening, genomic walking and PCR amplification techniques, we have isolated and sequenced the entire murine Hmgiy gene and its 5′ upstream promoter regulatory region. Here, we report this complete genomic sequence and present a detailed comparison of this with the sequence and organization of the human HMGIY gene. In addition, we determined the DNA sequence of the promoter elements 5′ upstream of the first exon of the human HMGIY gene and present a comparison of this region with the corresponding 5′ upstream region of the mouse gene. Together, these analyses revealed that there is a remarkable degree of overall sequence conservation in both the protein coding and promoter regions of the murine and human genes, but also demonstrated structural differences in their organization that may be important for differential regulation of transcription in these two mammalian species.

2. Materials and methods

2.1. Cloning of the mouse and human 5′ promoter and enhancer regions

The HMG-I/Y-pBluescript II KS (−) vector previously described by Wood et al. (2000b), which contains inserts of murine genomic DNA from approximately 2 kb upstream to 2.9 kb downstream of the transcription start site (at nucleotide (nt) +1), was employed to obtain the sequences of the mouse 5′ promoter/enhancer region, exons I and II, all of intron 1 and most of intron 2 (see Fig. 1A). The sequence of the promoter/enhancer region of the human gene 5′ upstream of the transcription start site at nt +1 (Fig. 1B) was determined from a cloned 2 kb genomic DNA fragment isolated by hybridization from a male Caucasian placental DNA library (Stratagene, Cat #946203) whose 3′ end is the EcoRI site located upstream of exon I of the published HMGIY genomic clone (Friedmann et al., 1993).

Fig. 1. Diagram of the overall intron/exon organization of (A) the mouse gene and transcriptional start sites 1, 2, 3, and 4 (Wood et al., 2000b) and 5 (Johnson et al., 1988) and (B) the human gene and transcriptional start sites (Friedmann et al., 1993). Under the current numbering system, the transcription start sites for the mouse gene are: nt +43 (start site 3), nt +159 (start site 1), nt +162 (start site 2), nt +172 (start site 4) and nt +287 (start site 5). Roman numerals denote exons (displayed as boxes), with the small Arabic numbers flanking the boxes denoting the nucleotides at the beginning and end of each exon. Introns are displayed as lines connecting the exon boxes and their sizes are indicated by large Arabic numbers. The human introns 3 and 5 are shortened as indicated with line breaks. In both the mouse and human genes the nucleotide at the beginning of exon I is arbitrarily given the designation nt +1. The bent arrows indicate the transcription start sites detected in vivo for both the mouse and human genes. The key to the exon regions (bottom) is as follows: open boxes, untranslated regions transcribed into mRNA; gray boxes, protein-coding regions of exons; black boxes, regions of exons that are spliced out of transcripts to produce the HMG-Y isoform protein. The 1 kb size marker applies to both panels.

2.2. Oligonucleotide synthesis and sequencing

Both strands of the complete murine Hmgiy gene, as well as the 5′ promoter region of the human HMGIY genomic clone, were sequenced using sense and antisense synthetic oligonucleotide primers and a Perkin-Elmer/Applied BioSystems, Inc. (PE/ABI; Foster City, CA) Model 377 Automated DNA Sequencer. The BigDye™ Terminator Cycle Sequencing Ready Reaction Kit (PE/ABI) was used following the manufacturer's instructions. Synthetic oligonucleotides used for both PCR and sequencing reactions were synthesized on a PE/ABI Model 380B DNA synthesizer.

2.3. Sequence analysis and comparisons

The SeqWeb™ Sequence Analysis Program (version 1.1), developed by the Genetics Computer Group (GCG) as part of the Wisconsin Package™ (version 10), was used for sequence alignments and sequence comparisons. Potential transcription factor binding sites were identified using either the FindPatterns program and the Transcription Factor Binding Site Database of the GCG Wisconsin Package or the MatInspector Program (version 2.2; GSF-National Research Center for Environmental Health, Germany) as described by Quandt et al. (1995). The Wisconsin Package Gap program was used for making optimal alignments of sequences. Parameters for the best-fit local alignment and percentage identity of two nucleic acid sequences employed the local homology algorithm of Smith and Waterman (Smith and Waterman, 1981) and included a gap creation penalty of 50, a gap extension penalty of 3, and no penalization for gap extensions longer than 20.
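To make the alignment parameters quoted above concrete, the following minimal Python sketch shows one way a single gap could be scored and how a percent identity could be computed from a finished gapped alignment. It is an illustration only, not the Wisconsin Package implementation: the function names are hypothetical, the gap cost is assumed to be the creation penalty plus the extension penalty times the gap length (capped at 20 positions), and identity is assumed to be counted only over columns where neither sequence has a gap.

```python
GAP_CREATION = 50          # gap creation penalty quoted in Section 2.3
GAP_EXTENSION = 3          # gap extension penalty quoted in Section 2.3
MAX_PENALIZED_LENGTH = 20  # assumed reading: positions beyond 20 add no further penalty


def gap_cost(length: int) -> int:
    """Penalty for one gap of `length` positions (assumed convention, see text above)."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return GAP_CREATION + GAP_EXTENSION * min(length, MAX_PENALIZED_LENGTH)


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Toy alignment (made-up sequences, not taken from the deposited GenBank entries).
print(gap_cost(1), gap_cost(20), gap_cost(500))
print(percent_identity("ACGT-A", "ACCTTA"))
```

Under these assumptions a gap longer than 20 positions costs the same as a gap of exactly 20, which is one plausible interpretation of "no penalization for gap extensions longer than 20".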

3. Results

3.1. The murine Hmgiy gene: isolation and characterization of the intron/exon organization

Based on the known structure of the human HMGIY gene (Fig. 1B; Friedmann et al., 1993), we designed mouse PCR primer sets (Johnson et al., 1988) to amplify unique murine introns, initially assuming that the intron/exon structures of the mouse and human genes would be similar. However, as shown in Fig. 1A, this approach revealed that although the exon/intron organization of the 3′ half of the mouse gene was similar to that of the human gene, the organization of its 5′ half was somewhat different in that it was missing two exons (i.e. human exons III and IV) that were present in the corresponding region of the human gene. This unexpected organization made it necessary to employ a combination of PCR amplification reactions, genomic DNA walking strategies and sequence analysis of a genomic clone containing the 5′ end of the gene (Wood et al., 2000b) in order to obtain the complete genomic sequence, including the promoter region. Genetic analyses have localized the authentic Hmgiy gene near the t-locus of chromosome 17 in mice (Johnson et al., 1992). PCR amplification of unique sequence fragments from a YAC genomic library of mouse chromosome 17 confirmed both the identity and chromosomal location of the gene reported here (data not shown).

The similarities and differences in intron/exon organization of the mouse and human Hmgiy genes are summarized in Fig. 1. Fig. 1A shows a diagram of the organization of the mouse gene and the approximate positions of the transcriptional start sites identified by Johnson et al. (1988) and Wood et al. (2000b). For comparison, Fig. 1B shows a diagram of the organization of the human gene, including its transcription start sites (Friedmann et al., 1993; Ogram and Reeves, 1995). The mouse gene, from the beginning of exon I (at nt +1) to the end of exon VI (7088 bp), is considerably shorter than the corresponding region of the human gene (9346 bp), with most of the difference being accounted for by variation in the lengths of various introns. As noted above, however, the most noticeable difference between the two genes is that the mouse gene appears to lack the equivalent of the human exons III and IV. Thus, the mouse gene has five introns separating six exons (Fig. 1A), whereas the human gene contains seven introns separating eight exons (Fig. 1B), with the protein coding regions of the genes starting in murine exon III and human exon V, respectively. A similar overall intron/exon organization for the mouse Hmgiy gene was also recently reported by Liu et al. (2000), but with some notable differences, as discussed below.

3.2. The exon and intron sequences of the mouse and human genes are highly conserved

A compilation of the lengths and percent nucleotide sequence identities of various regions of the human and mouse Hmgiy genes is shown in Table 1. The mouse intron 2 was compared with the region spanning human introns 2 through 4, including the human exons III and IV not found in the mouse gene (Fig. 1). As is evident from this table, there is a high degree of similarity (~80% identity) between the exons containing protein coding information in the murine and human Hmgiy genes, with an even higher degree of conservation (>90% identity) when only the actual protein-coding portions of these exons are considered. Overall, the protein-coding nucleotide sequences of the human and murine genes are 91.4% identical whereas (as a consequence of third base nucleotide 'wobble') the amino acid sequences of the encoded proteins are 96.3% identical, with only four varying amino acids between the mouse and human HMG-I isoform proteins. In addition, the DNA sequences of exon I of the human and mouse genes are nearly identical (95.6%), suggesting that this non-coding exon may have a conserved regulatory function in both species. For this reason, the first nucleotide of exon I in both species has been designated nt +1 and is considered to be the beginning of the Hmgiy gene.

Individual intron sequences, even given the variations in their lengths between the species, have similarities ranging from 69.1% for murine intron 4 and human intron 6, to 82% for murine and human intron 1. Interestingly, although intron 3 of the mouse and intron 5 of the human genes share approximately 72% sequence identity, there are several islands of unusual nucleotide composition and sequence in the murine intron that are absent from the human intron. For example, murine intron 3 contains several long homopolymer nucleotide stretches (e.g. A15, G12, G9, etc.), as well as several G/C- and G/A-rich direct, or inverted, repeat sequences that are absent from intron 5 of the human gene. These unusual murine-specific sequences are located in the intron that is alternatively spliced out of transcripts to produce either the HMG-I or HMG-Y isoform proteins. The role(s) played by these sequences in the mouse remains to be determined, although a possible function could be in transcript processing.

Table 1
A comparison of the size (in bp) and the overall percent nucleotide sequence identity of the individual introns and exons of the mouse and human Hmgiy genes. The percentages of nucleotide sequence identity in the protein-coding regions of the exons are also given. For these calculations, the mouse intron 2 sequence was compared with the sequence of the human gene corresponding to all of intron 2, exon III, intron 3, exon IV and intron 4.

Human                     Mouse                     Nucleotide percent identity
Region      Size (bp)     Region      Size (bp)     Overall     Coding region
Exon I      91            Exon I      96            95.6
Intron 1    188           Intron 1    190           82
Exon II     164           Exon II     168           81.9
Intron 2    298           Intron 2 a  2468          76.5
Exon III    175
Intron 3    1489
Exon IV     100
Intron 4    1344
Exon V      147           Exon III    138           80.4        91.2
Exon V′     180           Exon III′   171           82.5        91.1
Intron 5    1800          Intron 3    1240          72.5
Exon VI     84            Exon IV     84            90.5        90.5
Intron 6    675           Intron 4    449           69.1
Exon VII    51            Exon V      51            90.2        90.2
Intron 7    1311          Intron 5    938           72.1
Exon VIII   1396          Exon VI     1233          78.5        94.4

a The mouse intron 2 sequence was compared with all of the human intron 2, exon III, intron 3, exon IV and intron 4 sequences.

3.3. The 5′ upstream promoter regions of the mouse and human Hmgiy genes

Fig. 2A presents a diagrammatic comparison of the human and mouse 5′ upstream promoter regions, shows the percentage sequence identity of various areas, and indicates some of the numerous transcription factor consensus recognition sequences that are conserved between the two species. A number of lines of evidence suggest that many of these putative transcription factor binding sites may represent important in vivo cis-regulatory elements that have been functionally conserved during evolution. For example, the murine promoter contains an E-box binding site for the c-Myc transcription factor and its protein partner Max, as previously reported (Wood et al., 2000b). Resar and her colleagues have recently demonstrated that both the E-box and the c-Myc/Max proteins regulate murine Hmgiy gene transcriptional expression in vivo. In the current numbering system (Fig. 2B), the E-box is located at bp −1179 in the murine promoter when nt +1 is used as the reference nucleotide. Under this numbering system, it should be noted that transcription start site 1 identified by Wood et al. (2000b) is now located at nt +159 (Fig. 1). When a genomic clone (isolated by Friedmann et al., 1993) covering the region 5′ upstream of nt +1 of the human gene was sequenced, it was also found to contain an E-box situated in a region that has high homology to the E-box-containing area of the mouse promoter.

Fig. 2. (A) Diagrammatic comparison of the human and mouse 5′ upstream promoter regions illustrating the percentage sequence identity of various areas and indicating some of the numerous transcription factor recognition sequences that are conserved between the two species. The middle bar (black) of the diagram represents the entire sequenced portion (4192 bp) of the human gene promoter. Above and below the middle bar are expanded regions of the human promoter (black bars) shown in comparison with corresponding homologous regions of the mouse promoter (open bars), with some of the consensus transcription factor binding sites that are conserved between these species indicated. The MatInspector Program (Quandt et al., 1995) was used to identify the conserved sequences. Detailed information about each of the putative transcription factors that bind to these consensus sites can be retrieved from the TRANSFAC database at http://transfect.gbf.de. (B) A comparison of the sequence of the promoter region of the human gene between nt −1705 and nt −854 (upper line) and the promoter of the mouse gene between nt −1444 and nt −791 (lower line). In both the mouse and human genes nt +1 designates the first nucleotide of exon I. The two sequences are 77.7% identical and contain numerous conserved consensus binding sites for transcription factors including c-Myc, AP-1 and AP-1-like, which are underlined and labeled above the sequences. The promoter regions depicted here are bracketed in Fig. 2A.

Fig. 2B shows the results of a Wisconsin Package Gap program comparison of the sequence of the promoter region of the human gene from bp −1705 to −854 upstream of exon I (upper line) with the promoter region of the mouse gene from bp −1444 to −791 upstream of its exon I (lower line). This analysis program makes optimal alignments of sequences by introducing gaps and indicates that the human and mouse promoter sequences are 77% identical in this region. As shown, this region contains conserved binding sites for a variety of different transcription factors, including an E-box and two AP-1 related sequences, which are similarly positioned in the murine and human promoters. The conservation of the E-box in the promoter region of the human gene strongly suggests that, as in mouse cells, the human Hmgiy gene is under the transcriptional regulation of c-Myc/Max transcription factors. In addition, HMGIY protein expression was shown to be increased in human Burkitt's lymphoma cells with c-Myc over-expression, further suggesting similar regulation of human and murine HMGIY by c-Myc and Max (Wood et al., 2000). Likewise, conservation of the AP-1 and AP-1-like sites is consistent with the fact that transcription of the Hmgiy genes in both species is induced by stimuli (such as growth factors and phorbol esters) that activate AP-1 transcription factors (Johnson et al., 1990; Lanahan et al., 1992; Ogram and Reeves, 1995; Cmarik et al., 1998).
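As a rough illustration of how conserved sites such as the E-box and AP-1 elements discussed above can be located and reported in promoter coordinates, the Python sketch below scans an upstream sequence with regular expressions. It is not the MatInspector or FindPatterns procedure used in this study: the two patterns are textbook consensus sequences chosen as assumptions (CACGTG for the c-Myc/Max E-box, TGA[CG]TCA for AP-1), the function name is hypothetical, and the test sequence is made up rather than taken from AF285780 or AF286367.

```python
import re

# Textbook consensus patterns (assumptions; not the matrices used by MatInspector).
MOTIFS = {
    "E-box": re.compile(r"CACGTG"),     # canonical c-Myc/Max E-box
    "AP-1": re.compile(r"TGA[CG]TCA"),  # canonical AP-1 (TRE) site
}


def scan_promoter(upstream: str) -> list[tuple[str, int, str]]:
    """Return (motif name, position, matched sequence) for each hit.

    `upstream` is the sequence immediately 5' of nt +1, so its last base is
    nt -1 and there is no position 0, matching the numbering used in Fig. 2.
    """
    seq = upstream.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(seq):
            position = match.start() - len(seq)  # first base of the hit
            hits.append((name, position, match.group()))
    return sorted(hits, key=lambda hit: hit[1])


toy = "GGCACGTGAATTTGACTCAGG"  # made-up 21-nt sequence for illustration only
for name, position, site in scan_promoter(toy):
    print(f"{name} at nt {position}: {site}")
```

Both patterns cover their own reverse complements, so scanning a single strand is sufficient for this particular pair; a matrix-based tool would additionally score partial matches, which this sketch does not attempt.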

4. Discussion

4.1. Genome organization

As illustrated by the data in Fig. 1 and Table 2, the overall genomic exon/intron structure of the murine Hmgiy gene reported here is in agreement with that recently reported by Liu et al. (2000), with some notable exceptions. The present gene is 7088 bp in length whereas that reported by Liu et al. (2000) is 7287 bp. This discrepancy is due primarily to significant differences in the apparent lengths of introns 2, 3 and 5 (Table 2). The reason for these differences is unknown but may be attributed either to polymorphic variation between mouse strains or to differences in experimental methodologies. Liu and co-investigators derived the genomic length, in part, from restriction maps and the apparent sizes of PCR-amplified DNA fragments, whereas in this study the entire gene was sequenced on both complementary strands of the DNA.

4.2. The mouse Hmgiy gene sequence is highly conserved

Aside from lacking exon sequences corresponding to human exons III and IV (Fig. 1), and thus having only six rather than eight exons, the nucleotide sequences of the murine and human Hmgiy genes are remarkably similar in their introns, exons and 5′ upstream promoter regions (Table 1). This high degree of sequence identity implies that each gene region has an evolutionarily conserved function. For the protein coding regions such conservation is expected, since the amino acid sequences of the mouse and human HMG-I and HMG-Y proteins are approximately 96% identical and, furthermore, these proteins appear to serve similar roles in mouse and human cells (Bustin and Reeves, 1996). The high degree of conservation in the 5′ upstream promoter region is also reasonable considering that transcription of the Hmgiy gene is induced in both mouse and human cells by such stimuli as serum, growth factors, phorbol esters and mitogens (reviewed in Reeves and Beckerbauer, 2001). Interestingly, there is also a high degree of sequence identity (>76%) between intron 2 of the mouse gene and the entire region of the human gene between the end of exon II and the beginning of exon V, even though this region lacks two introns found in the human gene. Such active conservation of intron nucleotide sequences suggests that this region may serve some common functional purpose in the mouse and human genes. Nevertheless, the absence of two human exons, one of which (exon III) is a transcription start site in human cells (Johnson et al., 1989), also implies that under some in vivo circumstances the murine and human genes may be under different transcriptional controls.

4.3. Transcription start sites

Primer extension and S1 nuclease analyses by Wood et al. (2000b) demonstrated that the mouse Hmgiy gene has four in vivo transcription start sites which are situated, according to the current gene numbering system (Fig. 1), at nucleotides +43 (start site 3), +159 (start site 1), +162 (start site 2) and +172 (start site 4). A fifth in vivo transcription start site was identified at the beginning of exon II (at nt +287) by the cloning of cDNA transcripts from mouse cells (Johnson et al., 1988). Multiple transcription start sites also occur in the regions of exons I and II in human cells (Johnson et al., 1989; Ogram and Reeves, 1995; Holth et al., 1997), and these human start sites overlap the mouse start sites noted above.

The above transcription start sites, and upstream gene regulatory elements involved in serum and growth factor stimulation, differ significantly from those recently reported by Liu and co-investigators (Liu et al., 2000). Employing a primer extension procedure using murine embryonic cell RNA, Liu et al. detected three transcription start sites in intron 2 located 86, 120, and 128 nucleotides upstream of the AUG translation initiation site in exon III. In contrast, these transcription start sites were not detected in several mouse tissue culture cell lines of somatic origin by other investigators using primer extension and S1 nuclease analysis (Johnson et al., 1988; Wood et al., 2000b). Of particular note, the region upstream of exon III of the mouse gene was not necessary for the serum induction of HMG-IY in transfection experiments (Wood et al., 2000b). Nevertheless, since it is known that transcription can initiate from exon III in human cells (Johnson et al., 1989), it is possible that the start sites detected in murine intron 2 may be confined to messages expressed only in mouse embryonic tissues.

Table 2
The sequence lengths of various regions of the mouse Hmgiy gene as determined by Liu et al. (2000) compared with those reported here.

Region          Size (bp)
                Data reported here    Liu et al. (2000)
Exon I          96                    94
Intron 1        190                   189
Exon II         168                   165
Intron 2        2468                  2800
Exon III        138                   138
Exon III′       171                   171
Intron 3        1240                  1300
Exon IV         84                    84
Intron 4        449                   500
Exon V          51                    51
Intron 5        938                   700
Exon VI         1233                  1233
Total length    7088                  7287
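The gene lengths quoted in the text can be cross-checked against the tabulated region sizes with a short Python sketch such as the one below. The numbers are transcribed from Tables 1 and 2; the function names are hypothetical. One assumption is made and should be read as such: the alternatively spliced exon is counted once, using its longer (primed) form, because that choice reproduces the published totals.

```python
# Region sizes in bp, transcribed from Tables 1 and 2 (keys use an ASCII apostrophe for the prime).
MOUSE_THIS_STUDY = {
    "Exon I": 96, "Intron 1": 190, "Exon II": 168, "Intron 2": 2468,
    "Exon III": 138, "Exon III'": 171, "Intron 3": 1240, "Exon IV": 84,
    "Intron 4": 449, "Exon V": 51, "Intron 5": 938, "Exon VI": 1233,
}
MOUSE_LIU_2000 = {
    "Exon I": 94, "Intron 1": 189, "Exon II": 165, "Intron 2": 2800,
    "Exon III": 138, "Exon III'": 171, "Intron 3": 1300, "Exon IV": 84,
    "Intron 4": 500, "Exon V": 51, "Intron 5": 700, "Exon VI": 1233,
}
HUMAN = {
    "Exon I": 91, "Intron 1": 188, "Exon II": 164, "Intron 2": 298,
    "Exon III": 175, "Intron 3": 1489, "Exon IV": 100, "Intron 4": 1344,
    "Exon V": 147, "Exon V'": 180, "Intron 5": 1800, "Exon VI": 84,
    "Intron 6": 675, "Exon VII": 51, "Intron 7": 1311, "Exon VIII": 1396,
}


def gene_length(sizes: dict[str, int], short_exon: str) -> int:
    """Sum all regions, skipping the short form of the alternatively spliced exon (assumption)."""
    return sum(bp for region, bp in sizes.items() if region != short_exon)


def region_start(sizes: dict[str, int], region: str, short_exon: str) -> int:
    """Coordinate of the first nucleotide of `region`, with exon I starting at nt +1."""
    position = 1
    for name, bp in sizes.items():
        if name == region:
            return position
        if name != short_exon:
            position += bp
    raise KeyError(region)


# Regions whose reported sizes differ between the two studies (Liu et al. minus this study).
differences = {
    region: MOUSE_LIU_2000[region] - bp
    for region, bp in MOUSE_THIS_STUDY.items()
    if MOUSE_LIU_2000[region] != bp
}

print(gene_length(MOUSE_THIS_STUDY, "Exon III"))   # 7088
print(gene_length(MOUSE_LIU_2000, "Exon III"))     # 7287
print(gene_length(HUMAN, "Exon V"))                # 9346
print(region_start(MOUSE_THIS_STUDY, "Exon II", "Exon III"))  # 287
print(differences)
```

With that assumption the sums match the 7088 bp, 7287 bp and 9346 bp figures given in the text, and the computed start of mouse exon II (nt +287) agrees with the position reported for transcription start site 5.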
Further investigations are warranted to determine the biologic significance of the transcription start sites located immediately 5′ upstream of exon III in the murine gene.

Acknowledgements

The authors wish to acknowledge Dr. Michael Friedmann, Dale Edberg, Derek Pouchnik, Dustin Thomas, Jason Roth, Lois Beckerbauer and Mark Nissen for their many contributions to this project. This work was supported by NIH grant # GM46352 to R. Reeves, NIH NRSA #1 F32 HD08613-01 to M.L. Pedulla, and NIH grants # K11 CA59793-5 and R29 CA76130-01 to L.M.S. Resar. N.R. Treff was supported by the NIH Training in Biotechnology Grant # T32-GM08336.

References

Aravind, L., Landsman, D., 1998. AT-hook motifs identified in a wide variety of DNA-binding proteins. Nucleic Acids Res. 26, 4413–4421.
Battista, S., Fidanza, V., Fedele, M., Klein-Szanto, A.J.P., Outwater, E., Brunner, H., Santoro, M., Croce, C.M., Fusco, A., 1999. The expression of a truncated HMGI-C gene induces gigantism associated with lipomatosis. Cancer Res. 59, 4793–4797.
Bustin, M., 2001. Revised nomenclature for high mobility group (HMG) chromosomal proteins. Trends Biochem. Sci. 26, 152–153.
Bustin, M., Reeves, R., 1996. High-mobility-group chromosomal proteins: architectural components that facilitate chromatin function. Prog. Nucleic Acid Res. Mol. Biol. 54, 35–100.
Chau, K.Y., Patel, U.A., Lee, K.L., Lam, H.Y., Crane-Robinson, C., 1995. The gene for the human architectural transcription factor HMGI-C consists of five exons each coding for a distinct functional element. Nucleic Acids Res. 23, 4262–4266.
Cmarik, J.L., Li, Y., Ogram, S.A., Min, H., Reeves, R., Colburn, N.H., 1998. Tumor promoter induces high mobility group HMG-Y protein expression in transformation-sensitive but not -resistant cells. Oncogene 16, 3387–3396.
Disney, J.E., Johnson, K.R., Magnuson, N.S., Sylvester, S.R., Reeves, R., 1989. High-mobility group protein HMG-I localizes to G/Q- and C-bands of human and mouse chromosomes. J. Cell Biol. 109, 1975–1982.
Farnet, C.M., Bushman, F.D., 1997. HIV-1 cDNA integration: requirement of HMG I(Y) protein for function of preintegration complexes in vitro. Cell 88, 483–492.
Fedele, M., Berlingieri, M.T., Scala, S., Chiariotti, L., Viglietto, G., Rippel, V., Bullerdiek, J., Santoro, M., Fusco, A., 1998. Truncated and chimeric HMGI-C genes induce neoplastic transformation of NIH3T3 murine fibroblasts. Oncogene 17, 413–418.
Friedmann, M., Holth, L.T., Zoghbi, H.Y., Reeves, R., 1993. Organization, inducible-expression and chromosome localization of the human HMGI(Y) nonhistone protein gene. Nucleic Acids Res. 21, 4259–4267.
Hess, J.L., 1998. Chromosomal translocations in benign tumors. Am. J. Clin. Path. 109, 251–261.
Hill, D.A., Pedulla, M.L., Reeves, R., 1999. Directional binding of HMGI(Y) on four-way junction DNA and the molecular basis for competitive binding with HMG-1 and histone H1. Nucleic Acids Res. 27, 2135–2144.
Hindmarsh, P., Ridky, T., Reeves, R., Andrake, M., Skalka, A.M., Leis, J., 1999. HMG protein family members stimulate human immunodeficiency virus type 1 and avian sarcoma virus concerted DNA integration in vitro. J. Virol. 73, 2994–3003.
Holth, L.T., Thorlacius, A.E., Reeves, R., 1997. Effects of epidermal growth factor and estrogen on the regulation of the HMG-I/Y gene in human mammary epithelial cell lines. DNA Cell Biol. 16, 1299–1309.
Huth, J.R., Bewley, C.A., Nissen, M.S., Evans, J.N., Reeves, R., Gronenborn, A.M., Clore, G.M., 1997. The solution structure of an HMG-I(Y)-DNA complex defines a new architectural minor groove binding motif. Nat. Struct. Biol. 4, 657–665.
Johnson, K.R., Cook, S.A., Davisson, M.T., 1992. Chromosomal localization of the murine gene and two related sequences encoding high-mobility-group I and Y proteins. Genomics 12, 503–509.
Johnson, K.R., Disney, J.E., Wyatt, C.R., Reeves, R., 1990. Expression of mRNAs encoding mammalian chromosomal proteins HMG-I and HMG-Y during cellular proliferation. Exp. Cell Res. 187, 69–76.
Johnson, K.R., Lehn, D.A., Elton, T.S., Barr, P.J., Reeves, R., 1988. Complete murine cDNA sequence, genomic structure, and tissue expression of the high mobility group protein HMG-I(Y). J. Biol. Chem. 263, 18338–18342.
Johnson, K.R., Lehn, D.A., Reeves, R., 1989. Alternative processing of mRNAs encoding mammalian chromosomal high-mobility-group proteins HMG-I and HMG-Y. Mol. Cell Biol. 9, 2114–2123.
Lanahan, A., Williams, J.B., Sanders, L.K., Nathans, D., 1992. Growth factor-induced delayed early response genes. Mol. Cell Biol. 12, 3919–3929.
Liu, J., Schiltz, J.F., Shah, P.C., Benson, K.F., Chada, K.K., 2000. Genomic structure and expression of the murine Hmgi(y) gene. Gene 246, 197–207.
Lund, T., Holtlund, J., Fredriksen, M., Laland, S.G., 1983. On the presence of two new high mobility group-like proteins in HeLa S3 cells. FEBS Lett. 152, 163–167.
Ogram, S.A., Reeves, R., 1995. Differential regulation of a multipromoter gene: selective 12-O-tetradecanoylphorbol-13-acetate induction of a single transcription start site in the HMG-I/Y gene. J. Biol. Chem. 270, 14235–14242.
Pedulla, M.L., Hatfull, G.F., 1998. Characterization of the mIHF gene of Mycobacterium smegmatis. J. Bacteriol. 180, 5473–5477.
Quandt, K., Frech, K., Karas, H., Wingender, E., Werner, T., 1995. MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res. 23, 4878–4884.
Reeves, R., 2000. Structure and function of the HMGI(Y) family of architectural transcription factors. Environ. Health Perspect. 108, 803–809.
Reeves, R., Beckerbauer, L., 2001. HMGI/Y proteins: flexible regulators of transcription and chromatin structure. Biochim. Biophys. Acta 1519, 13–29.
Reeves, R., Leonard, W.J., Nissen, M.S., 2000. Binding of HMGI(Y) imparts architectural specificity to a positioned nucleosome on the promoter of the human interleukin-2 receptor α gene. Mol. Cell Biol. 20, 4666–4679.
Reeves, R., Edberg, D.D., Li, Y., 2001. Architectural transcription factor HMGI(Y) promotes tumor progression and mesenchymal transition of human epithelial cells. Mol. Cell Biol. 21, 575–594.
Reeves, R., Nissen, M.S., 1990. The A·T-DNA-binding domain of mammalian high mobility group I chromosomal proteins: a novel peptide motif for recognizing DNA structure. J. Biol. Chem. 265, 8573–8582.
Reeves, R., Wolffe, A.P., 1996. Substrate structure influences binding of the non-histone protein HMG-I(Y) to free and nucleosomal DNA. Biochemistry 35, 5063–5074.
Smith, T.F., Waterman, M.S., 1981. Comparison of bio-sequences. Adv. Appl. Math. 2, 482–489.
Strauss, F., Varshavsky, A., 1984. A protein binds to a satellite DNA repeat at three specific sites that would be brought into mutual proximity by DNA folding in the nucleosome. Cell 37, 889–901.
Tallini, G., Dal Cin, P., 1999. HMGI(Y) and HMGI-C dysregulation: a common occurrence in human tumors. Adv. Anat. Pathol. 6, 237–246.
Thanos, D., Maniatis, T., 1995. Virus induction of human IFN beta gene expression requires the assembly of an enhanceosome. Cell 83, 1091–1100.
Wood, L.J., Maher, J.F., Bunton, T.E., Resar, L.M., 2000a. The oncogenic properties of the HMG-I gene family. Cancer Res. 60, 4256–4261.
Wood, L.J., Mukerjee, M., Dolde, C.E., Xu, Y., Maher, J.F., Bunton, T.E., Williams, J.B., Resar, L.M., 2000b. HMG-I/Y, a new c-Myc target gene and potential oncogene. Mol. Cell. Biol. 20, 5490–5502.
Zhou, X., Benson, K.F., Ashar, H.R., Chada, K., 1995. Mutation responsible for the mouse pygmy phenotype in the developmentally regulated factor HMGI-C. Nature 376, 771–774.