A Cluster of Ten Novel MHC Class I Related Genes on Human Chromosome 6q24.2–q25.3


Article

doi:10.1006/geno.2001.6673, available online at http://www.idealibrary.com on IDEAL

Mirjana Radosavljevic,1 Benoît Cuillerier,1 Michael J. Wilson,2,* Oliver Clément,1 Sophie Wicker,1 Susan Gilfillan,3 Stephan Beck,4 John Trowsdale,2 and Seiamak Bahram1,†

1 INSERM-CReS, Centre de Recherche d’Immunologie et d’Hématologie, 4 rue Kirschleger, 67085 Strasbourg Cedex, France
2 Immunology Division, Department of Pathology, Tennis Court Road, Cambridge CB2 1QP, UK
3 Basel Institute for Immunology, Grenzacherstrasse 487, CH-4005, Basel, Switzerland
4 The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
* Present address: GlaxoSmithKline, Gunnels Wood Road, Stevenage, SG1 2NY, UK

† To whom correspondence and reprint requests should be addressed. Fax: + 33 390.244.016. E-mail: [email protected].

We have identified a novel family of human major histocompatibility complex (MHC) class I genes. This MHC class I related gene family is defined by 10 members, 6 of which encode potentially functional glycoproteins. The 180-kb cluster containing them has been generated by serial duplication and minimal diversification of an ancestral prototype. The genes are not located within the MHC on 6p21.3, but near the tip of the long arm of chromosome 6 at q24.2–q25.3, close to the human equivalent of the mouse H2-linked t-complex, a subchromosomal region syntenic to a segment of mouse chromosome 10 harboring the orthologous MHC class I related retinoic acid early transcript loci, Raet1a–d. Hence we have named the identified loci RAET1E–N. Human RAET1 products are all devoid of the membrane-proximal immunoglobulin-like α3 domain; most, but not all, are predicted to remain membrane-anchored via a glycosylphosphatidylinositol linkage, and they display an atypical pattern of polymorphism. RAET1 transcripts are absent from hematopoietic tissues, but largely expressed in tumors. The involvement of orthologous mouse RAET1A–D/H60 in natural killer and T-cell activation through NKG2D engagement augurs a similar function for the human RAET1 proteins.

Key words: major histocompatibility complex, MHC, MIC, RAET1, chromosome 6q, t-complex

GENOMICS Vol. 79, Number 1, January 2002 Copyright © 2002 by Academic Press. All rights of reproduction in any form reserved. 0888-7543/01 $35.00

INTRODUCTION

The past 15 years have witnessed the identification of a growing list of major histocompatibility complex class I (MHC-I) related molecules which, despite sharing the same tridomain backbone, are engaged in a variety of critical physiological processes, some unrelated to immunity. Most are encoded outside the MHC (which lies on the short arm of chromosome 6 in human) and, in chronological order of their identification, are CD1 [1], AZGP1 (also known as ZAG) [2], FCGRT (also called FcRn) [3], PROCR (previously designated EPCR) [4], MIC [5], HLALS (originally named MR1) [6], and HFE [7]. The chromosome 1q-encoded CD1 molecules do not primarily bind peptides; instead, they display self and pathogen-derived lipids and glycolipids to distinct T-cell subpopulations. The 19q-located gene FCGRT (neonatal Fc receptor) governs neonatal immunity through binding and transcytosis of maternal IgG, whereas the 7q-stationed AZGP1 (zinc-α2-glycoprotein) has been implicated in fat catabolism and the 20q-based PROCR (endothelial protein C receptor) operates in the blood coagulation pathway. The only members of this heteroclitic gene superfamily that reside within the 6p21.3 MHC itself, the MHC class I polypeptide-related MIC genes, seem to function as stress signals to activating natural killer (NK) cell receptors. The two most recently identified family members are the 1q-situated MHC class I-like sequence gene HLALS, of as yet unknown function, and the 6p22-encoded gene HFE, which is mutated in hereditary hemochromatosis [8–10].

Among the above-mentioned loci, MIC genes occupy a singular position, both physically and functionally. In addition to being encoded within the MHC itself, they display unique structural and biological features, including a mucosa-specific expression pattern, stress induction, a high degree of polymorphism (a feature previously exclusive to the classical, peptide-binding/αβ TCR-interacting class I genes), independence from β2-microglobulin (a feature shared only with AZGP1 thus far), and interaction with a novel C-type lectin activating receptor, NKG2D, broadly expressed on T and NK lymphocytes [11,12]. The latter interaction might indeed be pertinent for the surveillance exerted by NK and T cells against infectious threats and malignant development, as engagement of NKG2D seems to override the negative signal generated by an array of inhibitory receptors recognizing classical/nonclassical MHC-I proteins [12].

Though MICs are absent from rodents, NKG2D has been mapped and characterized within the mouse natural killer complex on chromosome 6 [5,13,14]. This observation raises the question of whether MICs are the sole NKG2D ligands [15,16] and intimates the existence of other, perhaps more physiologically relevant, NKG2D ligands that have been preserved throughout the 60 million years of rodent/primate lineage separation. To search for such novel ligands, we undertook an in silico cloning strategy that led to the identification of a novel cluster of MHC class I genes. These unusual MHC-I genes are devoid of the membrane-proximal α3 domain and are located at the opposite end of chromosome 6 with respect to the MHC, close to the tip of its long arm on chromosomal bands q24.2–q25.3. By virtue of their location, structure, and sequence, these MIC-homologous genes are most likely the human orthologs of the previously identified mouse retinoic acid early transcript-1 loci, Raet1a–d (alternatively called Rae1a–d [17,18]; recently reported to interact with mouse Nkg2d [19,20]), and are therefore called RAET1E–N (as approved by the HGNC).

FIG. 1. A novel family of MHC class I genes on human chromosome 6q24.2–q25.3. (A) Physical map: closed arrows depict genes, whereas open arrows depict gene fragments or pseudogenes. Underneath is shown the location of genomic clones with relevant accession numbers as well as length. Centromeric (Cent.) and telomeric (Telo.) clones are only partially shown. Scale is in kb. (B) Exon/intron structure: closed boxes represent exons with open reading frames, whereas open boxes represent pseudoexons. Regarding RAET1E and RAET1G, the last exon, despite being classified as TM, ends with hydrophilic sequences potentially representing cytoplasmic tails.

RESULTS AND DISCUSSION

The identification of the activating C-type lectin NK/T receptor NKG2D as a receptor for the nonclassical MHC-I MICA glycoprotein creates a dilemma [11]. Although NKG2D is strongly conserved throughout human and mouse evolution [13,14], MIC genes are absent not only from rodents but also from certain human populations [5,9,21,22]. This fact implies that MICA/B are not the exclusive NKG2D ligands [15,16]. Hence, we set out to identify evolutionarily conserved ligands, presumably members of the MHC-I superfamily (by analogy to MIC genes), that might have a more general role in the activation of NKG2D-bearing NK and T lymphocytes. Full-length cDNAs of several prototypical MHC-I glycoproteins were used to screen the GenBank expressed sequence tag (EST) databases (http://www.ncbi.nlm.nih.gov/BLAST) via the tBLASTn program. The search parameters were set widely, not only to retrieve all relevant sequences but also to overcome the hurdle posed by the large number of highly related MHC-I loci and their numerous alleles (MICA-G and HLA genes) cluttering up the database. One such search with the human MICA polypeptide retrieved 2054 EST sequences. Each of these sequences was then individually searched against the GenBank nr nucleotide database (defined by the NCBI as including all GenBank + EMBL + DDBJ + PDB sequences, but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences, “no longer nonredundant”)


TABLE 1: Genomic annotation of human RAET1 genes

Locus           Status      Orient.  Acc. no./clone         Gene start  Gene end  Exon  Start   End     Putative cDNA  EST acc. no.
RAET1E          gene        -        AL355312/RP11-350J20   2404        1         1     2404    2320    2586           BE545401
                                                                                  2     1648    1388
                                                                                  3     1126    851
                                                                                  4     170     1
RAET1F          pseudogene  +        AL583835/RP11-244K5    21530       21992     4     21883   21992   N/A
RAET1G          gene        -        AL583835/RP11-244K5    34513       29502     1     34513   34429   2050           AW510737, BF513861
                                                                                  2     31319   31056
                                                                                  3     30827   30546
                                                                                  4     29887   29502
ULBP2 (RAET1H)  gene        +        AL583835/RP11-244K5    53576       58932     1     53576   53660   2625           AW162943, AA359492, AW959307
                                                                                  2     56812   57075
                                                                                  3     57875   58156
                                                                                  4     58823   58932
ULBP1 (RAET1I)  gene        +        AL355497/RP11-472G23   75553       81628     1     75553   75637   2789           R25716, AI830832
                                                                                  2     80110   80373
                                                                                  3     80588   80863
                                                                                  4     81519   81628
RAET1J          pseudogene  +        AL355497/RP11-472G23   88165       90837     2     88939   89237   N/A
                                                                                  3     90322   90595
RAET1K          pseudogene  -        AL355497/RP11-472G23   116660      111591    1     116660  116576  N/A
                                                                                  2     113171  112854
                                                                                  3     112631  112352
                                                                                  4     111697  111591
RAET1L          gene        -        AL355497/RP11-472G23   136974      131633    1     136974  136890  3336           AA884304, AI017460
                                                                                  2     133746  133483
                                                                                  3     132689  132408
                                                                                  4     131742  131633
RAET1M          pseudogene  +        AL355497/RP11-472G23   141687      145407    2     144272  144549  N/A
                                                                                  3     144742  145016
ULBP3 (RAET1N)  gene        -        AL355497/RP11-472G23   180569      176110    1     180569  180482  3276           AI091180
                                                                                  2     177665  177402
                                                                                  3     177176  176901
                                                                                  4     176216  176110

All sizes in bp. The length of putative cDNAs was inferred from the location of hypothetical polyadenylation signals (AATAAA, ATTAAA) within the genomic DNA.
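The footnote's rule for estimating cDNA length can be illustrated with a short scan for the two polyadenylation hexamers it names. The sketch below is only an illustration: the function names, the toy sequence, and the choice of reporting the first signal at or downstream of a given position are assumptions, not the authors' procedure. For genes on the minus strand, the sequence would first have to be reverse-complemented.

```python
import re

# The two hexamers named in the Table 1 footnote.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")

def find_polya_signals(seq):
    """Return sorted (0-based position, hexamer) pairs found on the given strand."""
    seq = seq.upper()
    hits = []
    for signal in POLYA_SIGNALS:
        # The lookahead also reports overlapping occurrences.
        for match in re.finditer(f"(?={signal})", seq):
            hits.append((match.start(), signal))
    return sorted(hits)

def first_signal_after(seq, position):
    """Return the first hit at or downstream of `position`, or None if there is none."""
    for start, signal in find_polya_signals(seq):
        if start >= position:
            return start, signal
    return None

# Toy sequence (hypothetical, not taken from the RAET1 contig).
toy = "GGCTAATAAAGGCATTAAAC"
print(find_polya_signals(toy))
print(first_signal_after(toy, 5))
```

In such a scheme, a putative cDNA length would be the summed length of the upstream exons plus the distance from the start of the last exon to the chosen signal; that bookkeeping is left out here because the source does not spell it out.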

to systematically exclude previously known loci. All but three of these ESTs had cognate sequences in this database. The latter include classical as well as nonclassical HLA class I and II loci and alleles, atypical CD1, HLALS, HFE, AZGP1, and FCGRT class I genes, immunoglobulin C-2 type domains and T-cell receptor loci. The hitherto unknown trio (GenBank acc. nos. AI091180, AI830832, and AW959307) was subsequently subjected to BLASTx analysis in the nr database. Although no strong homologies were uncovered, remote resemblance to atypical MHC class I MICA and HLALS genes prompted


further analysis. To physically map the location of putative genes giving rise to these three distinct cDNAs, each EST was used to BLASTn screen htgs (high-throughput genomic sequences), nr or the human genome BLAST databases. This search resulted in the identification of two target sequences in the htgs database, that is BAC clones RP11-244K5 and RP11-472G23 (GenBank acc. nos. AL583835 and AL355497, respectively) on human chromosome 6q24.2–q25.3. The first three exons of EST clones AI091180 and AI830832 were located within the 215,397-bp AL355497 sequence, whereas the genomic coordinates of all four exons of EST AW959307 were pinpointed on the 46,941-bp AL583835. This analysis also exposed further hits that turned out to correspond to the location of six additional homologous loci. Given the multiplicity of loci, we explored adjacent genomic segments for additional relevant coding sequences. Available contiguous clones were screened by Genescan and Pipeline gene identification programs. This analysis revealed the existence of one additional homologous sequence on the BAC clone RP11-350J20 (188,684 bp; GenBank acc. no. AL355312). Screening the entire virtual contig of 180,569 bp encompassing all of the novel MHC-I loci with the aid of homology search and exon prediction programs did not reveal any other immunologically relevant genes, as the only notable hits were those of a pseudogene remotely homologous to the endothelial cell multimerin precursor gene (starts at 15,751 and ends at 19,454), a general transcription factor BTF3a-related gene (starts at 89,428 and ends at 89,916), an anonymous EST (GenBank acc. no. AW168418; starts at 93,925 and ends at 93,517), and a prohibitin pseudogene (GenBank acc. no. L14273; starts at 154,049 and ends at 155,019). Altogether, this novel MHC-I cluster appears relatively gene-rich, with one gene per every 12.8 kb (14 loci over 180 kb), a feature mirrored by its high GC density of 48.75%, which classifies it as belonging to the isochore H1/H2 [23] (Fig. 1A and Table 1).

All in all, 10 genomic loci defining novel members of the extended MHC class I superfamily were uncovered here, 6 of which were matched by corresponding EST sequences; however, as most ESTs lacked proper 5’- and/or 3’-end sequences (absent or of poor quality), they were recloned by RT-PCR (see GenBank accession numbers provided elsewhere in the manuscript and check http://mhc-x.u-strasbg.fr for further updates). These genes were further characterized by various BLAST searches against nucleotide and peptide sequence databases as well as desktop manual alignments with representative members of the superfamily. Although these loci seem to be equidistantly homologous (20–35% sequence identity; Fig. 2) to a subgroup of atypical class I genes (for example MIC, HLALS, and HFE), given their orthology to the mouse chromosome 10 genes Raet1a–d, we designated them RAET1E to RAET1N (Figs. 1 and 2, and Table 1).

Analysis of the exon–intron organization of the RAET1 genes revealed a canonical MHC-I genomic architecture, that is, separate exons encode distinct extracellular α1 and α2 domains flanked by two highly hydrophobic membrane-anchorage domains (Figs. 1B and 3). The most surprising finding was the lack of the immunoglobulin-like α3-encoding exon in the RAET1 genes. Although engineered (H2-Kb) or proteolytic (HLA-Aw68) deletion of this membrane-proximal α3 domain has been shown to have no effect on the structural integrity or the peptide-binding capacity of typical MHC-I glycoproteins, this is the first detection of a “wild-type” human α3-less MHC-I [24,25]. The lack of an α3 domain most likely precludes CD8 coreceptor binding.

FIG. 2. RAET1 glycoprotein structure in the atypical MHC-I gene superfamily. Sequence distance was inferred from domain-by-domain multiple alignment with Clustal.

The absence of any sequence corresponding to an obvious cytoplasmic tail (in all but two of the RAET1 sequences) focused our attention on the putative


FIG. 3. Hydropathy profiles. Sequences other than RAET1 were extracted from GenBank and subjected to the Kyte-Doolittle algorithm.

transmembrane domain. A hydrophobicity plot analysis of putative RAET1 polypeptides revealed hydrophobic sequences at both ends of the molecule (Fig. 3). The former represented a typical signal sequence, whereas the latter had a topology characteristic of glycosylphosphatidylinositol (GPI)-anchored proteins. This observation was corroborated by the statistically significant prediction of the so-called ω-site, the location of the proteolytic cleavage that removes the carboxy terminus of the precursor sequence, designated the propeptide (GPI Predictor at http://mendel.imp.univie.ac.at/). Indeed, this analysis successfully predicted GPI modification for RAET1H (officially and hence hereafter designated ULBP2; P value 0.0155), RAET1I (officially and hence hereafter designated ULBP1; P value 0.0192), RAET1L (P value 0.0155), and RAET1N (officially and hence hereafter designated ULBP3; P value 0.0178). Hence RAET1 proteins seem to be part of a diverse subset of eukaryotic proteins linked to the membrane not through a permanent proteinaceous segment, but through anchorage to a GPI moiety [26]. However, two members of the family (RAET1E and RAET1G) seem to possess bona fide transmembrane segments followed by a short hydrophilic cytoplasmic tail (Fig. 3).

In addition to the notable absence of the α3 and cytoplasmic tail exons, as well as the probable presence of a GPI-anchorage motif within RAET1 propeptides, RAET1 genes are also unusual in that they harbor relatively large first introns (with an average of several kb, except for RAET1E) evocative of those seen in the atypical genes MICA/B and HLALS, but distinct from the classical HLA and H2 loci, in which this intron is rather short (< 100 bp on average). Moreover, all expressed RAET1 loci harbor at least one (ULBP1, position 82; ULBP3, position 37) or several (RAET1E, positions 36 and 154; RAET1G, positions 82 and 323; ULBP2 and RAET1L, positions 68, 82, and 112) N-linked glycosylation sites within the α1-α2 superdomain. Finally, RAET1F, RAET1J, and RAET1M are gene fragments: RAET1F carries only the fourth exon, whereas RAET1J and RAET1M lack exons 1 and 4. RAET1K is a pseudogene owing to an invariable in-frame stop codon within the α2 domain (20/20 individuals studied carried this TAG at position 148).

Dot-matrix analysis of the entire 180,569-bp RAET1 cluster and its immediate vicinity (total size of 250 kb) against itself revealed evidence of genomic duplication centered between RAET1J and RAET1K (Fig. 4). Accordingly, the first event was most likely an inverted-tandem duplication leading to the present-day head-to-head disposition of ULBP2–ULBP1 on the centromeric side and RAET1K–RAET1L on the telomeric side of a central island harboring a 1-kb minisatellite, flanked (1.4 kb on the telomeric side) by a closely packed cluster of Alu sequences (5 in 1.6 kb as opposed to 1 per every 3.2 kb in the whole 180-kb sequence), collectively defining a probable recombinational hot spot. Several subsequent partial duplications must be invoked to complete the region. These include the creation of RAET1G from ULBP2 and of RAET1M from RAET1G, whereas RAET1K gave birth to ULBP3. Finally, RAET1E, the most divergent sequence, is most likely the origin of the RAET1F and RAET1J


FIG. 4. A scenario for the genesis of the RAET1 complex. Dot-matrix analysis of the 250 kb of DNA harboring the 180,569-bp RAET1 cluster versus itself. Diagonals indicate the regions where contiguous sequences conform to parameters detailed in the main text. The location of the central minisatellite is boxed. Numbers above the arrows depict percent identity between the corresponding genomic sequences.
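The kind of self-comparison summarized in Fig. 4 can be sketched with a naive windowed dot matrix. This is a toy illustration, not the OMIGA implementation: only the window size (30) and stringency (95%) reported in the main text are modeled, the hash and jump parameters are ignored, the function name and the 70-nt test sequence are hypothetical, and the quadratic loop would be far too slow for a 250-kb sequence.

```python
def dot_matrix(seq_a, seq_b, window=30, stringency=0.95):
    """Return (i, j) window start pairs whose identity meets the stringency.

    A dot is recorded when the windows seq_a[i:i+window] and
    seq_b[j:j+window] match at >= stringency * window positions.
    """
    dots = []
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            win_b = seq_b[j:j + window]
            matches = sum(a == b for a, b in zip(win_a, win_b))
            if matches >= stringency * window:
                dots.append((i, j))
    return dots

# Hypothetical toy sequence: a 30-nt unit duplicated around a 10-nt spacer.
unit = "ACGTTGCAAGCTTAGGCTAACGGATCCATG"
toy = unit + "TTTTTTTTTT" + unit

dots = dot_matrix(toy, toy)
off_diagonal = [(i, j) for i, j in dots if i != j]
print(off_diagonal)  # dots away from the main diagonal mark the duplicated unit
```

Dots on the main diagonal are trivial self-matches; a direct duplication shows up as a parallel off-diagonal run, whereas an inverted duplication would only appear if one sequence were compared against the reverse complement of the other.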

gene fragments (this is inferred only from sequence relatedness and not from dot-plot analysis, unlike all other predictions, which are supported both by sequence homology and dot-plot scrutiny). This scenario is corroborated by two lines of evidence: first, a highly stringent dot-plot analysis (Stringency, 95%; Window size, 30; Hash size, 6; and Jump size, 1); and, second, equally strong sequence identity (Figs. 2 and 4). For instance, ULBP2 and RAET1L are almost 97% identical (at both the genomic and polypeptide levels), whereas RAET1G and ULBP2 share 92% identity at the amino acid level.

This gene organization is remarkable on two fronts. With the exception of the CD1 genes (CD1A–E in human and CD1.1 and CD1.2 in mouse), this is the first example of a non-MHC-linked class I multigene family. It is also the most numerous outside the MHC and one of the first examples of such an obvious duplication event (none have been so clearly encountered either within the 6p21.3 MHC or the 1q21–q22 CD1 region) [27,28], reflecting, perhaps, its young age. However, despite the relative youth of these serial duplications, the region per se seems to be evolutionarily old


TABLE 2: RAET1 nucleotide variation

Locus    Exon (domain)        Position  Codons        Amino acids
RAET1E   exon 2 (α1) (1/3)    46        CTG -> TTG    Leu -> Leu
                              54        AAT -> TAT    Asn -> Tyr
                              61        GAA -> GAG    Glu -> Glu
         exon 3 (α2) (3/3)    100       CGT -> CAT    Arg -> His
                              113       ACC -> GCC    Thr -> Ala
                              114       ATC -> ACC    Ile -> Thr
ULBP2    exon 3 (α2) (0/1)    95        GCA -> GCC    Ala -> Ala
ULBP1    exon 3 (α2) (1/1)    179       CCA -> CTA    Pro -> Leu
RAET1L   exon 2 (α1) (2/2)    57        ACG -> ATG    Thr -> Met
                              78        CGT -> CTT    Arg -> Leu
         exon 3 (α2) (1/1)    119       ATC -> ACC    Ile -> Thr
ULBP3    exon 3 (α2) (1/3)    118       CGG -> CAG    Arg -> Gln
                              139       CGG -> AGG    Arg -> Arg
                              148       AGC -> AGT    Ser -> Ser

Positions refer to amino acid residues, whereas numbers in parentheses refer to the ratio of non-synonymous over total substitutions. The single ULBP2 polymorphism was identified through co-amplification with RAET1L, given the high degree of sequence identity.
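The non-synonymous/total ratios in Table 2 follow directly from translating each codon pair with the standard genetic code. The sketch below recomputes the RAET1E exon 2 ratio as an example; the helper names are hypothetical, and the codon table is the standard nuclear code built in the conventional TCAG order.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify(ref_codon, alt_codon):
    """Label a codon substitution as 'synonymous' or 'non-synonymous'."""
    same = CODON_TABLE[ref_codon.upper()] == CODON_TABLE[alt_codon.upper()]
    return "synonymous" if same else "non-synonymous"

def nonsyn_ratio(substitutions):
    """Return (non-synonymous count, total count) for (ref, alt) codon pairs."""
    nonsyn = sum(classify(ref, alt) == "non-synonymous" for ref, alt in substitutions)
    return nonsyn, len(substitutions)

# RAET1E exon 2 substitutions as listed in Table 2 (positions 46, 54, 61).
raet1e_exon2 = [("CTG", "TTG"), ("AAT", "TAT"), ("GAA", "GAG")]
print(nonsyn_ratio(raet1e_exon2))
```

Running the same function over the remaining rows is a quick way to check that the codon changes and the amino acid column of the table agree.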

as reflected by an analysis of various repetitive elements, for example an Alu S/J ratio of 1.94 (comparable to the genome average of 3) and the presence of three times more mammalian-specific than primate-specific LINEs, both arguing for its presence before human/mouse divergence [29].

As previously noted, RAET1 genes seem to be the human orthologs of the mouse Raet1a–d loci, an apparently less complex genetic region, encoding so far three (RAE1α, -β, and -γ; whether RAE1δ is a distinct locus or an allele of the first three requires genomic analysis) GPI-linked glycoproteins devoid of the α3 exon on a segment of mouse chromosome 10 (band A4) syntenic to human chromosome 6q [17,18]. Mouse RAET1 genes seem to be under tight spatiotemporal transcriptional control with a rather narrow expression window during days 10–14 of mouse embryonic development. Sequence comparisons show, albeit weakly (≅ 20–30%), that the best human matches to the mouse Raet1 genes are indeed the human RAET1 genes (Fig. 2). The fact that mouse RAET1 proteins have been identified as ligands for mouse NKG2D [19,20] reinforces the contention that the evolutionarily preserved RAET1 system might indeed define the premier NKG2D ligand, with MIC evolving as structural surrogates. Deciphering the functional hierarchy of each ligand requires further studies in defined in vivo settings, for example in humans devoid of MIC or in mice carrying a deletion of the entire RAET1 cluster.

The identification of RAET1 proteins as ligands for NKG2D brought to light an analogous function for H60, a mouse (no human equivalent identified thus far) polypeptide of unknown function originally recognized as a minor histocompatibility antigen [19,20,30]. Another remote member of the MHC-I superfamily, H60 shares with RAET1 genes the absence of the α3 domain as well as a similar chromosomal location, but is anchored to the membrane by a genuine transmembrane segment. H60 shows the highest degree of sequence homology with RAET1E (Fig. 2), a unique member of the RAET1 gene family by virtue of sequence divergence, genomic location, and possession of a transmembrane segment (the latter is also shared by RAET1G; the occurrence of the transmembrane segment there, however, is due to an out-of-frame insertion of 79 bp within the fourth exon). RAET1E also seems to represent, thus far, the most diverse RAET1 locus identified, foretelling perhaps its function as a minor histocompatibility antigen in human.

The physical location of RAET1 on chromosome 6q might not be fortuitous. In fact, human RAET1 genes are less than 20 Mb away from the human equivalent of the mouse t-complex, a group of genes critical for embryonic development and leading, when mutated, to multiple defects including the brachyury and fused tail mutations indirectly instrumental in the discovery of the MHC itself [31–33]. Occupying a segment of 12–15 cM on the proximal portion of mouse chromosome 17 including the mouse MHC (H2 complex) itself, the t-complex was identified by the natural occurrence of a genetic variant, the t-haplotype, defined by four nonoverlapping inversions leading to transmission ratio distortion, male sterility, and recombination suppression. In contrast to the situation in mouse, the genes defining the t-complex in human are split between two genomic locations, one on each arm of chromosome 6 (6p21.3 and 6q27), in close vicinity to the MHC and RAET1 genes, respectively [34–36]. This peculiar human-mouse orthology might reflect the original MHC genomic arrangement, perhaps a unique block containing MHC–RAET1–t genes. In the course of speciation the t-complex was split in human, moving RAET1 to the other end of chromosome 6 by a yet to be defined chromosomal inversion or transposition event. Finally, the lack of the α3 domain clearly supports the α1-α2 platform as defining the primordial MHC class


FIG. 5. RAET1 expression in various normal tissues (A, B) and transformed cell lines (C). Lanes in (A) and (B) contain approximately 1 µg, whereas those in (C) carry 2 µg per lane of respective human poly(A)+ RNA. The histological origins of the cell lines used in (C) are as follows: HL-60, promyelocytic leukemia; HeLa, cervix carcinoma; K-562, chronic myelogenous leukemia; MOLT-4, lymphoblastic leukemia; Raji, Burkitt’s lymphoma; SW480, colorectal adenocarcinoma; A549, lung carcinoma; and G-361, melanoma. Size markers on the left: 1.35, 2.4, and 4.4 kb or 2.4 and 4.4 kb. Given the high degree of sequence identity among RAET1G, ULBP2, and RAET1L, blots probed with RAET1L could reveal transcripts originating from any of the three genes. For expected length of transcripts, including rather large 3’-untranslated segments, see Table 1.

I structure, with the addition of the third domain perhaps a later incident in evolution. Thus, the original structure would have interacted with the evolutionarily more ancient innate arm of the immune system, whereas addition of the third domain allowed engagement of the CD8 co-receptor, a hallmark of antigen-driven MHC restriction.

A rapid survey of RAET1 diversity in 50 individuals, most of Caucasoid origin, revealed 4 of 6 (non-synonymous/total) substitutions in RAET1E, 0 of 1 in ULBP2, 1 of 1 in ULBP1, 3 of 3 in RAET1L, and 1 of 3 in ULBP3 (Table 2). A first glimpse of the tissue-wide transcription of RAET1 genes in adult tissues unveils a rather erratic expression pattern: highly tissue-selective for some (ULBP1 and ULBP3), but somewhat more relaxed for others (RAET1G, ULBP2, and RAET1L) (Fig. 5). Within these “unchallenged” tissues, RAET1 genes are collectively expressed at rather low levels, an incentive to uncover the various stimuli that might trigger their overexpression. Equally clear are the relative absence of RAET1 transcripts from hematopoietic tissues, the relative specificity of ULBP1 and ULBP3 for endocrine glands (thyroid, adrenal gland) or organs with endocrine activity (kidney, brain, prostate), and a clear preponderance in various tumor cell lines, reminiscent perhaps of MIC and mouse Raet1 expression. In contrast to the latter, however, no clear retinoic acid induction of the human RAET1 genes was observed (within the small panel of cell lines tested), a result reflected by the absence of any correct retinoic acid responsive elements, that is, direct repeats of the sequence PuG[G/T]TCA spaced by 1, 2, or 5 nucleotides, within reasonable physical distance from the putative RAET1 transcription initiation sites (M.R. and O.C., unpublished data) [37].

While we were in the final stages of preparing this paper, Cosman and colleagues reported the functional identification, via interaction with a previously uncharacterized human cytomegalovirus glycoprotein, UL16, of the so-called UL16-binding proteins (ULBP)-1, -2, and -3 [38], which upon sequence comparison happen to correspond to RAET1I, RAET1H, and RAET1N, respectively (given the prior registration, by HGNC, of the ULBP symbol for these three RAET1 genes, we have used the former name instead throughout the manuscript). The elegant demonstration of the interaction between ULBPs and NKG2D strengthens the argument that RAET1s and MICs are perhaps functionally redundant.

In summary, we have revealed the existence of a novel family of MHC-I loci on human chromosome 6q24.2–q25.3. These RAET1 genes define the only class I multigene family, besides CD1, that resides outside the MHC proper. Among them, six (RAET1E, RAET1G, ULBP2, ULBP1, RAET1L, and ULBP3) have the potential to be expressed at the protein level. They are devoid of the membrane-proximal Ig-like α3 domain and


some appear to be membrane-anchored by a GPI moiety, therefore representing a “minimalist” class I structure. Detailed analysis of RAET1 biology in MIC-deficient individuals should help clarify their physiological relevance in the immune response, as will cluster deletion of their mouse orthologs.

TABLE 3: Oligonucleotide pairs used in genomic amplification of RAET1 genes

Locus    Primers                                                 Amplicon
RAET1E   5’-GTCAGGGAGAGATGGGAACA-3’ (intron 2, 505-524)          1026 bp
         3’-CCAGTTTTCGGAAGGGACAC-5’ (intron 4, 61-42)
ULBP1    5’-TCACCATAAGTGGGAGGAGG-3’ (intron 2, 4397-4416)        871 bp
         3’-CCTTCGACAGAGCTCAAGTC-5’ (intron 4, 41-22)
RAET1L   5’-CTTATTGACACAGCGTGGAG-3’ (intron 2, 3066-3085)        1502 bp
         3’-CAAGAACTCAAGTCGAAGTT-5’ (intron 4, 83-64)
ULBP3    5’-CACAGTGTTTGGGGTCTTTC-3’ (intron 2, 2755-2774)        891 bp
         3’-CTCAATGTGGGAACAGACAG-5’ (intron 4, 63-44)

MATERIALS AND METHODS

Database mining. The MICA polypeptide sequence (acc. no. AAB41060) was used as a “bait” to screen the NCBI human expressed sequence tags (human ESTs) database using the tBLASTn search program (http://www.ncbi.nlm.nih.gov/blast/). Within the advanced option, all parameters were left at their default values except “number of one-line descriptions” and “number of alignments,” which were both set at 2000 (-v 2000, -b 2000, respectively). All ESTs thus obtained were tested against the nonredundant database using a cocktail of BLAST family programs. Previously known sequences were discarded, whereas the others were run against the “high throughput genomic sequences” (htgs) using BLASTn.

Genomic and cDNA analysis. The gene content of the target region of human chromosome 6q24.2–q25.3 was assessed using the Genescan (http://genes.mit.edu/GENSCAN.html) and Pipeline (http://grail.lsd.ornl.gov/GP/) gene prediction programs. The genomic organization of loci of interest was resolved via alignment of consensus EST sequences (generated through clustering of all available sequences) to the relevant genomic contig using the Clustal W algorithm provided in the OMIGA 2.0 software package (Genetics Computer Group software, WI). Open reading frames and amino acid sequences were predicted using translation tools in OMIGA 2.0. The putative polypeptides were further tested by BLAST to ascertain homology to MICA and to potentially find orthologs in other species. The identification of the GPI-modification motif within proprotein sequences was carried out online (http://mendel.imp.univie.ac.at/gpi/). Finally, hydropathy plots and dot-matrix analysis used the respective algorithms within OMIGA 2.0.
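The hydropathy plots mentioned above rest on the Kyte-Doolittle scale, which assigns each residue a hydropathy index and averages it over a sliding window. The sketch below shows the principle only: the window length of 9, the function name, and the toy peptide are assumptions, and OMIGA's own settings are not stated in the text.

```python
# Kyte-Doolittle hydropathy indices for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein, window=9):
    """Return the mean Kyte-Doolittle index of every full window along the protein."""
    protein = protein.upper()
    scores = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

# Hypothetical toy peptide: a hydrophobic stretch followed by a charged one.
toy = "LLLLLLLLLDDDDDDDDD"
profile = hydropathy_profile(toy)
print(round(profile[0], 2), round(profile[-1], 2))
```

In a plot of such a profile, a signal sequence or a membrane-anchoring segment appears as a sustained positive peak at the corresponding end of the polypeptide, which is the pattern described for the RAET1 proteins in Fig. 3.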

Allelic diversity. RAET1E, ULBP1, RAET1L, and ULBP3 variants were identified by direct sequence analysis of PCR amplicons derived from 50 unrelated individuals, most of Caucasian origin. The putative extracellular α1-2 exons were amplified using locus-specific primers (Table 3). All PCR reactions were run on either a GeneAmp PCR System 9600 or 9700 apparatus (Applied Biosystems, CA) in a total volume of 50 µl including 250 ng genomic DNA, 25 pmol of each primer, and one of the following enzymes: AmpliTaq DNA polymerase (Roche Diagnostic Systems, Germany) for RAET1E, RAET1L, and ULBP3, or FastStart Taq DNA polymerase (Roche Diagnostics, Germany) for ULBP1. Reactions followed the manufacturers’ recommendations, except for the annealing temperatures, which were 57°C for RAET1L and 60°C for RAET1E, ULBP1, and ULBP3. The PCR products were subsequently purified using “NucleoSpin Extract 2 in 1” (Macherey-Nagel, Germany) and sequenced using the ABI PRISM Big Dye Terminator Cycle Sequencing Ready Reaction kit with AmpliTaq DNA polymerase, FS (Applied Biosystems, Foster City, CA), according to the manufacturer’s protocol, except that 4 µl of Terminator Ready Reaction Mix and 6.4 pmol of primer were used. Table 4 documents the primers used for sequence analysis. All reactions were run on an ABI PRISM 310 Genetic Analyser (Applied Biosystems, Foster City, CA). All heterozygous positions identified on electropherograms were confirmed in at least two independent PCR reactions.
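As a quick check on the reaction composition given above, the stated amounts can be converted to final concentrations. The short Python sketch below assumes the PCR volume is 50 µl (microlitres, the usual scale for such reactions) and uses only figures quoted in the text; the function names are illustrative.

```python
# Minimal sketch: final concentrations implied by the stated PCR set-up.
# Assumption: the total reaction volume is 50 microlitres.

def molar_conc_uM(amount_pmol: float, volume_ul: float) -> float:
    """pmol per microlitre is numerically equal to micromolar (uM)."""
    return amount_pmol / volume_ul


def mass_conc_ng_per_ul(mass_ng: float, volume_ul: float) -> float:
    """Template concentration in ng per microlitre."""
    return mass_ng / volume_ul


primer_uM = molar_conc_uM(25, 50)             # each primer: 0.5 uM
template_ng_ul = mass_conc_ng_per_ul(250, 50)  # genomic DNA: 5 ng/ul
```

Under that assumption, each primer is present at 0.5 µM and the genomic template at 5 ng/µl. The sequencing reaction volume is not stated in the text, so no concentration is derived for the 6.4 pmol sequencing primer.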

TABLE 4: Primers used in sequence analysis of RAET1 genes

| Gene | Exon 2 (α1 domain) | Exon 3 (α2 domain) |
|---|---|---|
| RAET1E | 5’-GTCAGGGAGAGATGGGAACA-3’ (intron 2, 505-524) | 5’-TGGAGGATGATGGACTTCTC-3’ (intron 3, 204-223) |
| | 3’-GTCTCCGTCCGTCATTCTCT-5’ (intron 3, 45-26) | 3’-CACCGTCGAGGGAAATTTCC-5’ (intron 4, 43-24) |
| ULBP1 | 5’-TCACCATAAGTGGGAGGAGG-3’ (intron 2, 4147-4166) | 5’-AGCTGATCTCTTTGCAATGG-3’ (intron 3, 190-209) |
| | 3’-CCTCGTCTGTGTCATCATTG-5’ (intron 3, 40-21) | 3’-CCTTCGACAGAGCTCAAGTC-5’ (intron 4, 41-22) |
| RAET1L | 5’-CTTATTGACACAGCGTGGAG-3’ (intron 2, 3066-3085) | 5’-CAGGAAATTGTAAGGGGAAC-3’ (intron 3, 719-738) |
| | 3’-GTCACCATCCAATCTCCGTA-5’ (intron 3, 58-39) | 3’-CAAGAACTCAAGTCGAAGTT-5’ (intron 4, 83-64) |
| ULBP3 | 5’-CACAGTGTTTGGGGTCTTTC-3’ (intron 2, 2755-2774) | 5’-CATAGGAGGATGTGGGACAG-3’ (intron 3, 157-176) |
| | 3’-GAGTCCCCGACAAGACACAC-5’ (intron 3, 62-43) | 3’-CTCAATGTGGGAACAGACAG-5’ (intron 4, 63-44) |
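Table 4 lists the second primer of each pair in 3’ to 5’ orientation, with descending intron coordinates. The following Python sketch, offered only as an illustration, rewrites such a primer in the conventional 5’ to 3’ direction and checks that the length of each primer matches its coordinate span. The primer strings and coordinates are copied from the RAET1E exon 2 entries; the dictionary layout and helper names are hypothetical.

```python
# Minimal sketch: normalize primer orientation and sanity-check coordinates.
# Sequences and coordinates are the RAET1E exon 2 entries of Table 4;
# the data layout and function names are illustrative assumptions.
RAET1E_EXON2 = {
    "first": {"seq": "GTCAGGGAGAGATGGGAACA", "written": "5'->3'", "coords": (505, 524)},
    "second": {"seq": "GTCTCCGTCCGTCATTCTCT", "written": "3'->5'", "coords": (45, 26)},
}


def as_5_to_3(seq: str, written: str) -> str:
    """Reverse the string if it was written 3'->5'; no complementing is needed."""
    return seq[::-1] if written == "3'->5'" else seq


def span_length(coords: tuple) -> int:
    """Number of bases covered by an inclusive coordinate pair, either direction."""
    first, last = coords
    return abs(first - last) + 1


for name, primer in RAET1E_EXON2.items():
    oligo = as_5_to_3(primer["seq"], primer["written"])
    consistent = len(oligo) == span_length(primer["coords"])
    print(name, oligo, "length matches coordinates:", consistent)
```

Both RAET1E exon 2 primers are 20 nucleotides long, in agreement with the coordinate spans 505-524 and 45-26. Read 5’ to 3’, the second primer is TCTCTTACTGCCTGCCTCTG.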

GENOMICS Vol. 79, Number 1, January 2002 Copyright © 2002 by Academic Press. All rights of reproduction in any form reserved.


RT-PCR and northern blot analysis. RAET1 transcripts harboring all exons were cloned by RT-PCR from pooled total RNA of the following human cell lines: HT29, HT1080, Caco-2, and HeLa. First-strand cDNA synthesis used oligo(dT)17, AMV reverse transcriptase, and RNasin (both from Promega, WI) in a standard 20 µl reaction, a fraction of which (4 µl) was used in gene-specific PCR amplifications with AmpliTaq DNA polymerase (Roche Diagnostic Systems, CA). Multiple tissue and human cancer cell line northern blots (MTN, Clontech, Palo Alto, CA) were hybridized (ULTRAhyb, Ambion, Austin, TX) with random-primed ULBP1, RAET1L, and ULBP3 cDNA fragments spanning the signal through GPI domains. Equal loading was controlled by hybridization to a human β-actin cDNA fragment (Clontech). High-stringency washes were performed in 0.1× SSC, 0.1% SDS at 55–60°C, and films were developed after 5 to 7 days of exposure at –70°C, except for actin, which required only 2 to 4 hours of exposure.

ACKNOWLEDGMENTS

We thank the entire chromosome 6 project group at the Sanger Centre (http://www.sanger.ac.uk/HGP/Chr6/team.shtml), in particular Sarah Blakey, Roger Horton, Sarah Milne, Andrew Mungall, Sarah Sims, and Alan Tracey. This work was supported by the INSERM-Contrat de Recherche Stratégique (CReS), the Actions Concertées Incitatives Jeunes Chercheurs and Biologie du Développement et Physiologie Intégrative-Ministère de la Recherche, the Action Recherche Santé 2000-Fondation pour la Recherche Médicale, the Association pour la Recherche sur le Cancer, and the Ligues Départementales et Nationales contre le Cancer. Human sequencing at the Sanger Centre and J.T.’s laboratory is funded by the Wellcome Trust.

RECEIVED FOR PUBLICATION JULY 2; ACCEPTED OCTOBER 18, 2001.


Sequence data from this article have been deposited with the DDBJ/EMBL/GenBank Data Libraries under accession numbers AF359243 (RAET1E), AY026825 (RAET1H), AF346416 (RAET1I), AY039682 (RAET1L), and AY027538 (RAET1N).

