GENOMICS 52, 298–304 (1998)
ARTICLE NO. GE985463
Identification of a Novel Highly Conserved Gene in the Centromeric Part of the Major Histocompatibility Complex
Lutz Walter1 and Eberhard Günther
Division of Immunogenetics, University of Göttingen, Goßlerstrasse 12d, D-37073 Göttingen, Germany
Received March 31, 1998; accepted July 5, 1998
A novel highly conserved gene designated Sacm2l (alias Are1) has been identified and fine mapped in the centromeric part of the major histocompatibility complex in rat, human, and mouse. Sacm2l is closely linked to the ribosomal protein S18 gene Rps18, with a distance of about 450 bp between the respective translational start points. Numerous Sacm2l-homologous EST sequences can be identified in the database. Northern blot experiments with rat Sacm2l revealed a transcript of 3 kb in each organ tested, and differentially spliced products could be detected in testis RNA by RT-PCR. The deduced amino acid sequence of the rat Sacm2l gene shows a putative coiled-coil region and significant homology to a putative Caenorhabditis elegans protein and the yeast SAC2 protein. © 1998 Academic Press
INTRODUCTION
The major histocompatibility complex (MHC) has been divided into the class I, II, and III regions. The class I and class II gene products are involved in the presentation of foreign antigens to T lymphocytes. The human MHC, the HLA system, comprises about 4000 kb on chromosome 6p21.3, and the regions are organized from centromere to telomere as class II–class III–class I (for review see Trowsdale, 1995). The centromeric part of the MHC contains genes like Rps18, Ring1, Ring2, Ke4, Rxrb, and Col11a2, which are well conserved and show clear orthologous relationships in human, rat, and mouse (Abe et al., 1988; Hanson and Trowsdale, 1991; Walter et al., 1996). In contrast to human, a second class I gene-containing region, H2-K and RT1-A in mouse and rat, respectively, is found centromeric to the class II region between the Rps18 and the Ring1 genes (Hanson and Trowsdale, 1991; Walter et al., 1996). This peculiar organization is assumed to be due to a class I gene translocation from the telomeric part of the MHC during evolution of these rodent species.
We have examined the centromeric part of the MHC for the occurrence of novel genes and describe here the identification and characterization of a highly conserved new gene, which is designated Sacm2l.
The gene symbol SACM2L (alias ARE1) has been approved by the HUGO Nomenclature Committee.
Sequence data for this article have been deposited with the EMBL/GenBank Data Libraries under Accession Nos. AJ223319, AJ223830, AJ223831, and AJ224960.
1 To whom correspondence should be addressed at Abteilung Immungenetik, Universität Göttingen, Goßlerstrasse 12d, 37073 Göttingen, Germany. Telephone: (49) 551-395854. Fax: (49) 551-395852. E-mail: [email protected].
0888-7543/98 $25.00 Copyright © 1998 by Academic Press. All rights of reproduction in any form reserved.
MATERIALS AND METHODS
DNA. Genomic DNA was isolated from livers of inbred rat and mouse strains LEW.1W/Gun and C3H/HeN, respectively, and from human peripheral blood cells.
PCR analysis. Conventional PCR was performed using 10 ng of cosmid DNA or 200 ng of genomic DNA and 25 pmol of primers in a 50-µl reaction containing 200 µM each dNTP and 1 unit of Taq DNA polymerase (Boehringer Mannheim, Mannheim, Germany). The PCR profile was 94°C for 60 s and subsequently 30 cycles of 94°C for 30 s, 63°C for 45 s, and 72°C for 90 s. Long-range PCR was carried out with primers Are1-17 and RT1A-2 according to the recommendations of the manufacturer (Boehringer Mannheim). Briefly, 10 ng of cosmid DNA, 15 pmol of each primer, 500 µM each dNTP, and 3 units of a mixture of Taq and Pwo DNA polymerase were used in the following PCR protocol: 1 min denaturation at 93°C and subsequently 30 cycles of denaturation at 93°C for 20 s, annealing at 65°C for 30 s, and elongation at 68°C for 20 min with 20-s increments for each cycle. For long-range PCR with primers Are1-21 and Rps18-1 the reaction mix contained 350 µM each dNTP, and the same PCR profile as above was applied except for an elongation time of 8 min and an increment of 5 s. For RT-PCR, 2 µl of first-strand cDNA (see below) was used with 25 pmol of primers Are1-3 and Are1-4, 200 µM each dNTP, and 1 unit of Taq DNA polymerase (Boehringer Mannheim). The cycling protocol was 1 min denaturation at 94°C and subsequently 30 cycles of 94°C for 30 s, 60°C for 45 s, and 72°C for 2 min.
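The long-range PCR profiles above lengthen the elongation step on every cycle. The following Python sketch tabulates the resulting per-cycle elongation times for the two protocols. It assumes that the first cycle uses the stated base time and that each later cycle adds one further increment; the text does not say at which cycle the increments begin, and the function name `elongation_times` is purely illustrative.

```python
def elongation_times(base_s, increment_s, cycles):
    """Return the elongation time in seconds for each cycle.

    Assumption (not stated in the text): cycle 1 runs for the base
    time and every following cycle is one increment longer.
    """
    return [base_s + increment_s * i for i in range(cycles)]


# Primers Are1-17 / RT1A-2: 20 min base, 20-s increments, 30 cycles
profile_15kb = elongation_times(20 * 60, 20, 30)

# Primers Are1-21 / Rps18-1: 8 min base, 5-s increments, 30 cycles
profile_13kb = elongation_times(8 * 60, 5, 30)

for label, profile in (("Are1-17/RT1A-2", profile_15kb),
                       ("Are1-21/Rps18-1", profile_13kb)):
    total_h = sum(profile) / 3600
    print(f"{label}: last cycle {profile[-1]} s, "
          f"total elongation {total_h:.1f} h")
```

Under this assumption the elongation steps alone account for most of the run time, which is typical of long-range protocols.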
Primer sequences and positions in nucleotides (figure or accession number in parentheses) were as follows:
Are1-3, ACAGTTGCCGGAAGTGAGGCT, 162 to 182 (W73311);
Are1-4, CAGAAGTTCGGCTTGTGTTTC, 205 to 224 (W65987);
Are1-16, TGTGAACATCCACTTCATCCAA, 175 to 196 (Fig. 2);
Are1-17, ATCCTGGAGATGAATGTGCAGAGCGTC, 1462 to 1488 (Fig. 2);
Are1-19, CTCCTCCTCCATATCTGAAGCGCC, 70 to 93 (Fig. 2);
Are1-20, TTCCTCCTCCATATCTGAGGTCCC, 228 to 251 (Fig. 2);
Are1-21, AAGCGATGGTAGAGCTGGATCAGCTG, 2052 to 2077 (Fig. 2);
Rps18-1, GTGCTGAAACTTCTCGGGGATCACTAGGGA, 41 to 70 (X57529);
Rps18-3, GAACTTTTCAGGGATCACTAGAGA, 50 to 73 (X69150);
Rps18-4, TGGAACTTCTCAGGGATCACTAGAGA, 445 to 470 (M76762);
RT1A-2, GCTGCCTGAGCCACTTTCCCAGAGATGG, 2399 to 2426 (X82669);
hSACM2L-2, AACTGTCAGTCCCGGCGAGT, 406 to 425 (Fig. 4); and
Sacm2l-1, AGCTGTCAGTCTCTGAGAGT, 416 to 435 (Fig. 4).
FIG. 1. Fine mapping of the Sacm2l gene in the centromeric part of the rat and human MHC. The rat data are based on two cosmid contigs of the RT1u haplotype (Walter and Günther, 1995; Walter et al., 1996). The gap between cosmids 9b1 and 26.1 is bridged by a Rps18- and Ring1-containing rat PAC clone (to be published). The class I gene (RT1-A)-containing translocated material is indicated by a dotted line (and is absent from the human MHC). For Tapasin and Bing4 see text. The HLA map, based on Campbell and Trowsdale (1997) and Herberg et al. (1998), shows only genes that are also known in the rat. Information about sequence and transcriptional orientation of human BING4 was obtained from database Accession No. Z97184. For RPS18 orientation see text. HLA-DPB2 and RT1-Hb are orthologous class II genes in human and rat (Walter et al., 1996).
Sequencing. Cycle sequencing was carried out either with 200 ng of purified PCR product or with 1 µg of plasmid or 3 µg of cosmid DNA, using a DyePrism sequencing kit (ABI, Weiterstadt, Germany). Reaction products were analyzed on an ABI377 automated sequencer.
Primer extension analysis. The transcriptional start sites of human and rat Sacm2l were mapped by primer extension analysis. Briefly, 20 µg of total RNA from rat lung and liver and from human melanoma cell lines PARL and MelWei (kindly provided by Dr. J. P. Johnson, Munich, Germany) was reverse transcribed for 1 h using 6-FAM-labeled oligonucleotides Sacm2l-1 or hSACM2L-2 and 400 units of SuperScript reverse transcriptase (Gibco BRL, Eggenstein, Germany) in a total volume of 50 µl. Aliquots of 4 µl were analyzed on an ABI310 automated sequencer using the GeneScan software and GeneScan-500 TAMRA size standard (ABI). Fluorescence intensities of single peaks were compared, and major and minor start sites were deduced.
Southern and Northern blot analysis.
Southern blot analysis was performed according to standard procedures (Sambrook et al., 1989). For Northern blot and RT-PCR analysis, total RNA was prepared according to Chomczynski and Sacchi (1987). Thirty micrograms of total RNA was separated in 1.2% agarose gels containing formaldehyde and subsequently transferred to nylon membranes (Amersham Buchler, Braunschweig, Germany). For RT-PCR, 10 µg of total RNA was reverse transcribed for 2 h using an oligo(dT) primer and 400 units of reverse transcriptase (Gibco BRL) in a total volume of 50 µl.
Computational analysis. Searches for homologous DNA and amino acid sequences as well as amino acid motifs in the PROSITE database were performed with the BCM Search Launcher at http://kiwi.imgen.bcm.tmc.edu:8088/search-launcher/launcher.html. Predictions for coiled-coil domains and transmembrane regions were based on programs at http://ulrec3.unil.ch/software/coils_form.html and http://ulrec3.unil.ch/software/TMPRED_form.html, respectively.
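Primer positions in the list under PCR analysis are given as inclusive nucleotide ranges, so a primer spanning positions a to b should be b − a + 1 nucleotides long. The Python sketch below applies this check to the five primers positioned on the rat cDNA sequence of Fig. 2. Sequences and coordinates are copied from the primer list; the dictionary layout and the helper name `span_length` are assumptions made only for this illustration.

```python
# Primers positioned on the rat Sacm2l cDNA (Fig. 2):
# name -> (sequence, first position, last position)
FIG2_PRIMERS = {
    "Are1-16": ("TGTGAACATCCACTTCATCCAA", 175, 196),
    "Are1-17": ("ATCCTGGAGATGAATGTGCAGAGCGTC", 1462, 1488),
    "Are1-19": ("CTCCTCCTCCATATCTGAAGCGCC", 70, 93),
    "Are1-20": ("TTCCTCCTCCATATCTGAGGTCCC", 228, 251),
    "Are1-21": ("AAGCGATGGTAGAGCTGGATCAGCTG", 2052, 2077),
}


def span_length(first, last):
    """Number of nucleotides covered by an inclusive position range."""
    return last - first + 1


def coordinates_consistent(primers):
    """Map each primer name to True if its length matches its range."""
    return {
        name: len(seq) == span_length(first, last)
        for name, (seq, first, last) in primers.items()
    }


for name, ok in coordinates_consistent(FIG2_PRIMERS).items():
    seq, first, last = FIG2_PRIMERS[name]
    print(f"{name}: {len(seq)} nt, positions {first}-{last}, consistent={ok}")
```

The same check could be extended to the primers positioned on database entries, provided the coordinate convention of each entry is known.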
RESULTS
Isolation and sequence. We have described previously the isolation of cosmid clone 9b1, which maps to the RT1-A region (Walter et al., 1995). This clone contains the MHC class I gene RT1-A as well as the ribosomal protein S18 gene Rps18 (formerly designated Ke3), which are separated by about 25 kb (Walter and Günther, 1995; see Fig. 1). To search for further genes in this region we sequenced shotgun-cloned fragments of cosmid 9b1. One of these sequences showed a high degree of similarity with expressed sequence tags (EST) of human and mouse origin. On the basis of the cosmid-derived sequence as well as the various EST, oligonucleotides were designed that allowed PCR amplification of a fragment of about 2.2 kb from testis cDNA of inbred rat strain LEW.1W. The sequence of this cDNA (Fig. 2) did not show any homology to known vertebrate genes in the database. This indicated that we had identified a novel gene on cosmid 9b1, which we initially designated Are1 (A region expressed 1). The longest open reading frame contains 680 codons (Fig. 2). Numerous EST sequences of human, mouse, zebrafish, and Drosophila (Fig. 3) could be identified in the database that display 95, 98, 82, and 50% identity, respectively, with the deduced amino acid sequence of ARE1. Interestingly, a putative Caenorhabditis elegans protein designated F08C6.3 (database Accession No. U29378) and the yeast SAC2 (suppressor of actin mutation 2; Kolling et al., 1994) protein showed 33 and 20% sequence identity, respectively, with Are1 (Fig. 2). Therefore, the designation Are1 was changed to Sacm2l (suppressor of actin mutation 2-like).
FIG. 2. cDNA sequence of the rat Sacm2l gene. The translational start codons are underlined. The stop codon is indicated by an asterisk, the putative coiled-coil domain is doubly underlined, putative membrane-spanning regions are indicated by a dotted line, and arrows mark amino acid sequences 97–560 and 330–459, missing in alternatively spliced transcripts. Amino acid residues identical between SACM2L, C. elegans protein F08C6.3, and yeast SAC2 are shown in bold and italics, whereas SACM2L residues found in either F08C6.3 or SAC2 are in bold. The sequence was initially generated with primers derived from human and mouse EST (see Materials and Methods). The corresponding rat nucleotides have been subsequently established by sequencing rat cosmid 9b1.
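The identity values quoted above compare aligned amino acid sequences, but the text does not state how the percentages were computed. One common convention, sketched here in Python, counts identical residues over the alignment columns in which neither sequence has a gap. The function name `percent_identity` and the two short peptide strings are invented for illustration and are not taken from SACM2L or its homologs.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over gap-free alignment columns.

    Both arguments must be rows of the same pairwise alignment,
    i.e. strings of equal length in which gaps are written as '-'.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy alignment (hypothetical residues): 4 identical of 6 gap-free columns
toy_a = "MKT-LLE"
toy_b = "MRTALLD"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so values from different sources are comparable only when the denominator is known.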
Inspection of the deduced SACM2L amino acid sequence indicated the presence of a coiled-coil motif (probability value 99.5%) and three putative transmembrane regions with scores above significance threshold at amino acid positions 109 to 150, 171 to 191, 397 to 415, and 433 to 451 (Fig. 2). In the homologous C. elegans and yeast proteins, the coiled-coil domain is also found, whereas the probability for the presence of transmembrane regions is low. The 5′ end of the coding region contains two ATG codons that are in accord with the consensus sequence of translational start codons (Kozak, 1996). The 5′ upstream AUG will
most probably be used for the start of translation (Kozak, 1995). No further motifs indicative of function could be identified in the putative SACM2L protein.
Fine mapping of Sacm2l in the rat MHC. To fine map Sacm2l on rat cosmid 9b1 and to determine its transcriptional orientation, long-range PCR was performed with primers Are1-17 and RT1A-2, which are derived from the sense strands of Sacm2l and RT1-A, respectively. The distance between these primers was found to be about 15 kb, and the transcriptional orientation of Sacm2l could be deduced as shown in Fig. 1. Thus, RT1-A and Sacm2l are in a tail-to-tail arrangement. The distance between Sacm2l and Rps18 was determined by conventional PCR with primers Rps18-1 and Are1-19, which are derived from the antisense strands of the respective genes and yielded a product of 1030 bp. Sequence analysis revealed that Sacm2l and Rps18 are oriented head-to-head and that the translational start points of these genes are separated by 455 bp (Fig. 4). Putative TATA and GC boxes are found at positions 74 to 70 (Rps18) and 91 to 101 (Sacm2l), respectively (Fig. 4). The size of the rat Sacm2l gene was determined by long-range PCR with primers Are1-21 and Rps18-1. A product of about 13 kb was obtained. Since it also contains 1 kb of Rps18 and Rps18/Sacm2l intergenic sequence, the Sacm2l gene comprises about 12 kb.
FIG. 3. Schematic representation of Sacm2l homologous genes and EST sequences. The coding part and flanking sequences of Sacm2l are indicated by a bar and a line, respectively. The database accession numbers or the gene products are shown above the lines of the respective EST or gene sequences.
Fine mapping of the human and mouse Sacm2l genes. A conventional PCR was carried out with human genomic DNA and primers Are1-20 and Rps18-3, which were derived from human EST AA088818 (Fig. 3) and the RPS18 gene (Chassin et al., 1993). A product of size similar to that in the rat was obtained and sequenced. The distance between the translational start points of the
human SACM2L and RPS18 genes is 444 bp (Fig. 4). Similarly, PCR was performed with mouse genomic DNA and primers Are1-16 and Rps18-4, which were derived from the rat Sacm2l gene and the mouse Rps18 sequence (MacMurray and Shin, 1992), respectively. The sequence of the PCR product revealed a distance of 482 bp between the respective translational start codons of mouse Sacm2l and Rps18 (Fig. 4). These fine mapping data were confirmed by cross-hybridization of a rat Sacm2l probe to Rps18-containing human and mouse cosmid clones HPB.ALL 51 and H25, respectively (our unpublished data). Thus, rat, human, and mouse Sacm2l map to an orthologous position in the MHC, and the Sacm2l and Rps18 genes have the same orientation and similar distances in all three species. It should be noted that the transcriptional orientation of the human RPS18 gene as published by Campbell and Trowsdale (1997) is opposite. Therefore, the orientation described here was further confirmed by long-range PCR using primers derived from the sense strands of the human BING4 gene (GTACATGGCCACCTCTGGCCTAGA), which maps centromerically of RPS18 (Herberg et al., 1998), and of the RPS18 gene (CAGCACACCAAGACCACTGGCCGC; Chassin et al., 1993). PCR fragments of 4 kb were obtained from human genomic DNA as well as from rat cosmid clone 9b1.B, confirming the transcriptional orientation of the human RPS18 gene as revealed here (Fig. 1).
FIG. 4. Alignment of the human, rat, and mouse Rps18–Sacm2l intergenic sequences. The translational start codons of Rps18 and Sacm2l are indicated by bent arrows. Putative TATA and GC promoter elements are boxed. Open arrowheads mark the 5′ ends of Rps18 cDNA (DDBJ/GenBank/EMBL database). Filled arrowheads indicate transcription start sites of human (above sequence) and rat (below sequence) Sacm2l. Gaps introduced to maximize homology are indicated by dashes.
Expression of Sacm2l. A transcript of about 3 kb is detected in all tissues examined by Northern blot analysis of total RNA (Fig. 5). RT-PCR of testis cDNA revealed this transcript and two additional Sacm2l amplification products. These products had open reading frames in which codons 330 to 459 and 97 to 560 (Fig. 2), respectively, are missing. This indicates the occurrence of alternative splicing, since Sacm2l is a single-copy gene in the human and rat genomes according to Southern blot analysis (not shown). The transcriptional start sites of human and rat Sacm2l were mapped by primer extension analysis. Several start sites could be determined for both human and rat Sacm2l. The human gene shows minor start sites at positions 204 and 240 (Fig. 4) and major start sites at positions 283, 284, 288, and 298 (Fig. 4). Minor start sites for the rat Sacm2l gene could be mapped at positions 230, 273, 344, and 345 (Fig. 4), and major start sites were found at positions 284, 288, and 289 (Fig. 4). Thus, transcription of the human and rat Sacm2l genes starts at multiple sites. This is in accord with the different 5′ ends of human EST sequences W7331, AA303068, AA0800818, and AA323253 (Fig. 3).
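The two shorter RT-PCR products described above lack codons 330 to 459 and 97 to 560 of the open reading frame. The Python sketch below works out how many codons each deletion removes and how many remain, taking the 680-codon length reported for the longest open reading frame as the reference. Whether that count includes the stop codon is not stated, so the remaining lengths are approximate to within one codon; the names `ORF_CODONS` and `deleted_codons` are illustrative.

```python
ORF_CODONS = 680  # longest open reading frame reported for Fig. 2

# Codon ranges (inclusive) missing from the two shorter products
MISSING_RANGES = {
    "codons 330-459": (330, 459),
    "codons 97-560": (97, 560),
}


def deleted_codons(first, last):
    """Number of codons removed by an inclusive codon range."""
    return last - first + 1


for label, (first, last) in MISSING_RANGES.items():
    lost = deleted_codons(first, last)
    remaining = ORF_CODONS - lost
    # Whole codons are removed, so the deletion is a multiple of
    # three nucleotides and the downstream reading frame is kept.
    print(f"{label}: {lost} codons ({3 * lost} nt) missing, "
          f"{remaining} codons remain")
```

Because both deletions are defined in whole codons, the products can retain an open reading frame, which agrees with the description in the text.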
FIG. 5. Expression analysis of rat Sacm2l by Northern blot with total RNA. Organs tested: spleen (sp), thymus (th), liver (li), kidney (ki), brain (br), lung (lu), testis (te), heart (he). The 28S and 18S rRNA bands are indicated. Rehybridization of the blot with a human β-actin probe is shown in the lower part.
DISCUSSION
Sacm2l is a novel highly conserved single-copy gene in the MHC. It is closely linked to the well-mapped Rps18 gene and, therefore, could be precisely physically mapped. In human, rat, and mouse the Rps18 and Sacm2l genes are oriented head-to-head, and the intergenic region is conserved with respect to sequence and length. Interestingly, a similar organization of neighboring genes can be found in the MHC genes TAP1 and LMP2, which are separated by 593 bp and share a bidirectional promoter (Wright et al., 1995).
The centromeric part of the rat (and mouse) MHC has undergone a translocation of class I gene(s) to the Rps18–Ring1 interval (Walter et al., 1996). Since SACM2L maps to an orthologous position in the HLA system, the rat single-copy gene Sacm2l was most probably not part of the translocation. Hence, the centromeric border of the translocation can now be narrowed to the 12-kb interval between the 3′ ends of Sacm2l and RT1-A in the MHC haplotype analyzed.
The deduced amino acid sequence of Sacm2l shows significant homology to a C. elegans protein and the yeast SAC2 protein. However, no homologous vertebrate gene is known so far. According to Novick and colleagues (1989) the SAC proteins interact with actin, probably as components or controllers of the assembly or stability of the yeast actin cytoskeleton. Since the SACM2L protein appears to be highly conserved, one might assume a similar function in mammalian cells.
Recently, genes involved in the processing and presentation of antigen have been localized to the centromeric part of the MHC. Thus, the Tapasin gene maps about 20 kb centromeric of the Rps18 gene in human and rat (Herberg et al., 1998; Walter and Günther, manuscript in preparation). Furthermore, a locus in mouse and rat, designated cim2, which affects presentation of peptides by class I molecules, maps to the interval between H2-K and H2-Pb in the mouse (Simmons et al., 1997). Sacm2l can formally be excluded as a candidate gene for cim2 due to its map position.
ACKNOWLEDGMENTS
The authors are grateful to Drs. E. Weiss and J. Trowsdale for the gifts of cosmids H25 and HPB.ALL 51, respectively, and to Dr. J. P. Johnson for providing human melanoma cell lines PARL and MelWei. This study was supported by EU Grant PL 96562.
REFERENCES
Abe, K., Wei, J.-F., Wei, F.-S., Hsu, Y.-C., Uehara, H., Artzt, K., and Bennett, D. (1988). Searching for coding sequences in the mammalian genome: The H-2K region of the mouse MHC is replete with genes expressed in embryos. EMBO J. 7: 3441–3449.
Campbell, D., and Trowsdale, J. (1997). Map of the human MHC. Immunol. Today 14: 43.
Chassin, D., Bellet, D., and Koman, A. (1993). The human homolog of ribosomal protein S18. Nucleic Acids Res. 21: 745.
Chomczynski, P., and Sacchi, N. (1987). Single-step method of RNA isolation by acid guanidinium thiocyanate–phenol–chloroform extraction. Anal. Biochem. 162: 156–159.
Hanson, I. M., and Trowsdale, J. (1991). Colinearity of novel genes in the class II regions of the MHC in mouse and human. Immunogenetics 34: 5–11.
Herberg, J. A., Sgouros, J., Jones, T., Copeman, J., Humphray, S. J., Sheer, D., Cresswell, P., Beck, S., and Trowsdale, J. (1998). Genomic analysis of the Tapasin gene, located close to the TAP loci in the MHC. Eur. J. Immunol. 28: 459–467.
Kolling, R., Lee, A., Chen, E. Y., and Botstein, D. (1994). Nucleotide sequence of the SAC2 gene of Saccharomyces cerevisiae. Yeast 10: 1211–1216.
Kozak, M. (1995). Adherence to the first-AUG rule when a second AUG codon follows closely upon the first. Proc. Natl. Acad. Sci. USA 92: 2662–2666.
Kozak, M. (1996). Interpreting cDNA sequences: Some insights from studies on translation. Mamm. Genome 7: 563–574.
MacMurray, A. J., and Shin, H. S. (1992). The murine MHC encodes a mammalian homolog of bacterial ribosomal protein S13. Mamm. Genome 2: 87–95.
Novick, P., Osmond, B. C., and Botstein, D. (1989). Suppressors of yeast actin mutations. Genetics 121: 659–674.
Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989). "Molecular Cloning: A Laboratory Manual," 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Simmons, W. A., Roopenian, D. C., Summerfield, S. G., Jones, R. C., Galocha, B., Christianson, G. J., Maika, S. D., Zhou, M., Gaskell, S. J., Bordoli, R. S., Ploegh, H. L., Slaughter, C. A., Fischer Lindahl, K., Hammer, R. E., and Taurog, J. D. (1997). A new MHC locus that influences class I peptide presentation. Immunity 7: 641–651.
Trowsdale, J. (1995). "Both man & bird & beast": Comparative organization of MHC genes. Immunogenetics 41: 1–17.
Walter, L., and Günther, E. (1995). Cosmid cloning of the RT1.A encompassing region of the rat major histocompatibility complex. Transplant. Proc. 27: 1501.
Walter, L., Tiemann, C., Heine, L., and Günther, E. (1995). Genomic organization and sequence of the rat major histocompatibility complex class Ia gene RT1.Au. Immunogenetics 41: 332.
Walter, L., Fischer, K., and Günther, E. (1996). Physical mapping of the Ring1, Ring2, Ke6, Ke4, Rxrb, Col11a2, and RT1-Hb genes in the rat major histocompatibility complex. Immunogenetics 44: 218–221.
Wright, K. L., White, L. C., Kelly, A., Beck, S., Trowsdale, J., and Ting, J. P. (1995). Coordinate regulation of the human TAP1 and LMP2 genes from a shared bidirectional promoter. J. Exp. Med. 181: 1459–1471.