Isolation and characterization of an expressed hypervariable gene coding for a breast-cancer-associated antigen


Gene, 93 (1990) 313-318 Elsevier


GENE 03625

(Recombinant DNA; estrogen regulatory element; exon; intron; phage λgt11 expression library; tandem repeat array (TRA); signal peptide; promoter)

I. Tsarfaty a, M. Hareuveni a, J. Horev a, J. Zaretsky a, M. Weiss a, J.M. Jeltsch b, J.M. Garnier b, R. Lathe b, I. Keydar a and D.H. Wreschner a

a Department of Microbiology, G.S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv (Israel); b Laboratoire de Génétique Moléculaire des Eucaryotes du CNRS, Unité 184 de Biologie Moléculaire et de Génie Génétique de l'INSERM, Faculté de Médecine, 67085 Strasbourg Cedex (France)

Received by J.-P. Lecocq: 16 August 1989. Revised: 12 January 1990. Accepted: 14 April 1990

SUMMARY

A human gene and cDNA coding for a breast-cancer-associated antigen (H23Ag) were isolated and characterized. The gene contains two exons and one intron. Part of the second exon is a tandem repeat array (TRA) consisting of multiple 60-bp G+C-rich units. We report here the characterization of unique sequences that are found in the H23Ag gene and cDNA, in addition to the 60-bp repeats. Analysis of the cDNA sequences revealed a putative ATG start codon preceded by two overlapping initiation consensus sequences (CCACC). The open reading frame determines an amino acid (aa) sequence consisting of three regions. The first region contains an initiating methionine and a highly hydrophobic putative signal peptide. This is followed by a variable number of highly conserved 20-aa repeat units (TRA). The last region, C-terminal to the TRA, contains four potential N-linked glycosylation sites. The genomic nucleotide sequences demonstrate a putative promoter region that includes a 'TATA' box. A putative estrogen regulatory element is located 5' to the promoter region. The characterization of the gene and cDNA coding for the H23Ag presented here may help to elucidate its possible function in human breast cancer.

INTRODUCTION

A variety of immunogens have been used to elicit mAbs directed against breast-cancer-associated antigens. These mAbs recognize similar normal epithelial antigens which, in malignant tissues, undergo quantitative and/or qualitative changes (reviewed by Tjandra et al., 1988). The H23 mAb generated in our laboratory specifically detects a cytoplasmic antigen (H23Ag) in 90% of human breast-cancer tissues examined (Keydar et al., 1989). Secreted H23Ag can also be detected in body fluids of breast-cancer patients and its serum levels correlate with severity of disease (Tsarfaty et al., 1989). The T47D (breast-cancer cell line) cDNA λgt11 expression library was screened with H23 mAb and a cDNA insert composed of a 60-bp tandem repeat, designated 3b, was obtained. Other mAbs that are directed against breast-cancer-associated antigens were used to isolate cDNA inserts that are also composed of 60-bp tandem repeats and are expressed from a hypervariable gene (Siddiqui et al., 1988; Gendler et al., 1987). The aim of the present study was to clone, sequence and analyze the gene coding for H23Ag.

Correspondence to: Dr. D.H. Wreschner, Department of Microbiology, G.S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat-Aviv, Tel Aviv 69978 (Israel) Tel. (03)5459915; Fax (03)5413752.

Abbreviations: aa, amino acid(s); bp, base pair(s); ERE, estrogen-responsive element; EtdBr, ethidium bromide; H23Ag, a human breast-cancer-associated antigen; H23Ag, gene (DNA) coding for H23Ag; kb, kilobase(s) or 1000 bp; mAb, monoclonal antibody; nt, nucleotide(s); oligo, oligodeoxyribonucleotide; ORF, open reading frame; PAGE, polyacrylamide-gel electrophoresis; PolIk, Klenow (large) fragment of E. coli DNA polymerase I; PVP, polyvinylpyrrolidone; sa, splice acceptor; sd, splice donor; SDS, sodium dodecyl sulfate; SSC, 0.15 M NaCl/0.015 M Na3·citrate pH 7.6; TRA, tandem repeat array; tsp, transcription start point(s); UTR, untranslated region; VNTR, variable number of tandem repeats.

0378-1119/90/$03.50 © 1990 Elsevier Science Publishers B.V. (Biomedical Division)

EXPERIMENTAL AND DISCUSSION

(a) Isolation and characterization of the H23Ag gene

A molecular probe, 3b cDNA, coding for part of the H23Ag (Hareuveni et al., 1990) is 225 bp long and is composed of tandem 60-bp units (Siddiqui et al., 1988; Gendler et al., 1987) repeated 3.75 times and flanked on the 5' and 3' ends by synthetic EcoRI linkers. It was used to screen a human breast tumor cell line (MCF-7) genomic library. A positive recombinant, designated 17.5, contained a 7.5-kb insert and was selected for further investigation. The 17.5 restriction map (Fig. 1C) was deduced using partial and complete restriction enzyme digestions and Southern blotting. Southern blot hybridization of a SmaI digest of 17.5 with the 3b cDNA probe revealed bands at 1.3 kb and 1.15 kb, and a band corresponding to a size of approx. 60 bp. The 60-bp band gave an intense hybridization signal, whereas the two larger-sized bands showed weak signals (Fig. 1A). Analysis of the SmaI digest by PAGE shows a large diffuse band corresponding to about 60 bp, suggesting that multiple SmaI sites exist in the genomic fragment (Fig. 1A, II). Partial SmaI and DdeI digestion of an end-labelled 5.3-kb SacI-EcoRI 17.5 fragment revealed multiple

Fig. 1. Southern-blot analysis, partial enzyme digestion and restriction map of the 17.5 gene. (Panel A) Restriction enzyme digestion and Southern blotting of the 17.5 gene. 17.5 digested with SmaI was run on 6% PAGE and EtdBr-stained. The most intense band is approx. 60 bp, indicating that there are multiple SmaI sites in the gene (panel II). Restriction enzyme digestions (PstI, SmaI and SacI) of 17.5 were followed by 1% agarose-gel electrophoresis, Southern blotting and hybridization with 3b cDNA, a 225-bp probe that is composed of tandem 60-bp units (Siddiqui et al., 1988; Gendler et al., 1987) repeated 3.75 times and codes for part of the H23Ag (panel I, lanes 1, 2, 3, respectively). (Panel B) Partial digestion of a 5.3-kb SacI-EcoRI end-labelled fragment purified from 17.5. End-labelling consisted of filling in the EcoRI site with PolIk and [α-32P]dATP. Partial digestion of the purified end-labelled fragment was performed with SmaI (CCCGGG) and DdeI (CTNAG) and analyzed by 1% agarose-gel electrophoresis followed by autoradiography of the dried gel (right and left sections of this panel). Vertical lines denote SmaI sites and V symbols denote DdeI sites, which are depicted in the SacI-EcoRI box. The bottom line shows the distance (in kb) from the 3' EcoRI site. (Map C) Restriction map of the 17.5 gene. The map was deduced from restriction analysis, Southern (1975) blotting and partial restriction analysis of end-labelled α-32P fragments of the gene. The cleavage sites indicated are: BamHI (B), EcoRI (EcoRI), KpnI (K), NcoI (N), PstI (P), SmaI (S), SacI (Sac), PmlI (V2) and XmnI (X). The 2.3-kb fragment which contains 60-bp repeat units is denoted by a series of vertical lines, corresponding to SmaI sites.

SmaI sites in a 2.3-kb fragment, indicating the existence of a TRA. No DdeI sites were found in this fragment (Fig. 1, B and C). A detailed 17.5 restriction map was deduced by additional restriction enzyme digestions (Fig. 1C). Southern blots probed with either the 3b cDNA or a DdeI 2.3-kb genomic fragment containing the TRA (Fig. 1C) displayed identical hybridization patterns, and demonstrated one homozygous and three heterozygous individuals (Fig. 2, panel B1). Hybridization of the same blot with a 1-kb SmaI-PstI genomic 5' fragment that lacks the TRA (Fig. 1C) reveals a single band of approx. 5 kb in all samples analyzed (Fig. 2, panel B2). These results are consistent with a hypervariable gene whose allelic sizes vary due to variation in the number of repeating units. To confirm that genomic non-repeat sequences as well as the TRA are expressed, Northern blots were sequentially probed with (lane 1) 3b cDNA, (lane 2) a genomic 2.3-kb fragment containing the TRA and (lane 3) a 1-kb SmaI-PstI unique sequence genomic fragment located 5' to the TRA (Fig. 1C). As these probes detect the same mRNA species (Fig. 2A), we conclude that they are all part of the same gene and that, in addition to expression of the TRA, unique genomic sequences are also expressed. Expression of the same size mRNA was also observed following hybridization with a unique sequence genomic fragment located 3' to the TRA (data not shown). Our experiments with probes consisting of unique genomic sequences are conclusive and confirm the hypothesis (Swallow et al., 1987) that polymorphism is due to the presence of 60-bp tandem repeats constituting a genomic TRA.
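The SmaI pattern reported in this section (an intense band at about 60 bp, many evenly spaced SmaI sites and no DdeI site within the repeat region) is what a tandem array carrying one SmaI site per 60-bp unit would be expected to give. The following is a minimal sketch under stated assumptions: the repeat unit `UNIT`, the flanks and all function names are hypothetical placeholders and not the H23Ag sequence; only the recognition sequences (SmaI, CCCGGG; DdeI, CTNAG) and the 60-bp unit length are taken from the text.

```python
import re

SMAI_SITE = "CCCGGG"      # SmaI recognition sequence; blunt cut between CCC and GGG
DDEI_SITE = "CT[ACGT]AG"  # DdeI recognition sequence CTNAG (N = any nt)

# Hypothetical 60-bp G+C-rich unit with a single SmaI site (NOT the real repeat).
UNIT = "CCCGGG" + "GCCA" * 13 + "GC"


def build_allele(n_repeats, flank5="A" * 100, flank3="T" * 100):
    """Toy allele: unique 5' flank + n tandem units + unique 3' flank."""
    return flank5 + UNIT * n_repeats + flank3


def site_positions(pattern, seq):
    """0-based start positions of all (possibly overlapping) matches."""
    return [m.start() for m in re.finditer(f"(?={pattern})", seq)]


def smai_fragments(seq):
    """Fragment lengths after complete SmaI digestion of a linear molecule."""
    cuts = [p + 3 for p in site_positions(SMAI_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


short_allele = build_allele(5)   # assumed allele with 5 repeat units
long_allele = build_allele(8)    # assumed allele with 8 repeat units

print(smai_fragments(short_allele))              # [103, 60, 60, 60, 60, 157]
print(site_positions(DDEI_SITE, short_allele))   # []
print(len(long_allele) - len(short_allele))      # 180
```

In this toy model an array of n units yields n - 1 internal 60-bp fragments plus two junction fragments, so the 60-bp species dominates the digest, and two alleles that differ by three units differ in length by 180 bp. This is the kind of repeat-number-dependent size variation that the repeat-containing probes detect in Fig. 2, panel B1.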

(b) The nt sequence of the human H23Ag cDNA

Unique-sequence cDNA inserts located 5' and 3' to the repeat array were isolated following screening of the T47D cDNA library with 5' 1-kb SmaI-PstI and 3' 0.45-kb PstI genomic fragments (Fig. 1C). The nt sequence of the 5' cDNA fragment shows that the longest ORF extends from a Met start codon (ATG at nt 1) through the 3' sequences that are identical with the consensus 60-bp repeat unit. Furthermore, as this cDNA contains a single SmaI site (CCCGGG) proximal to its 3' terminus, the 3' part of the cDNA can be located within the repeat unit. Preceding the putative ATG start codon is the nt sequence CCACCACC, which represents two overlapping consensus (CCACC) initiation sequences (Kozak et al., 1986). Analysis of the aa sequence determined by the ORF shows a highly hydrophobic 13-aa peptide that includes five tandem Leu residues located 7 aa downstream from the start codon. It seems likely that this region is a putative signal peptide, given both its size and proximity to the N terminus. Of further note is the high preponderance of Ser residues; in one stretch of 20 aa (aa 43-62) the Ser content is 40%. All these features (i.e., longest ORF, consensus initiation sequences, signal peptide sequence and preferred codon usage by Leu residues) suggest that this should be the correct reading frame. The ORF extends into the 60-bp TRA and determines a Pro-rich 20-aa repeat motif that also contains three Thr, two Ser, two Gly and two to five Ala residues. Due to nt variation from the consensus sequence, the Pro content within the repeat motif can vary, in which case Ala replaces Pro. The 20-aa repeat also contains His,

Fig. 2. Northern- and Southern-blot analysis of human breast cancer nucleic acids using the 17.5 gene fragments. (Panel A) Northern-blot analysis of RNA extracted from frozen breast tumour tissue using the guanidinium thiocyanate/CsCl method (Wreschner et al., 1988). RNA was separated on 1.4% agarose gel under glyoxal/dimethyl sulfoxide denaturing conditions, transferred to a nylon membrane and hybridized using nick-translated probes. The following probes were used. Lanes: 1, 3b cDNA; 2, a 2.3-kb DdeI fragment of 17.5, the TRA; 3, 1-kb SmaI-PstI fragment 5' to the tandem array; this fragment does not contain 60-bp repeat sequences. Arrowhead indicates the 6.5-kb mRNA species hybridizing with the labelled probes. (Panel B) Southern-blot analysis of DNA extracted from breast tumour tissue (Gross-Bellard et al., 1973), digested to completion with PstI, separated on 0.8% agarose gel, transferred to a nylon membrane and hybridized with nick-translated probes. The probes used were: (panel 1) 3b cDNA; (panel 2) 1-kb SmaI-PstI fragment. Lanes T1 to T4 contain DNA samples isolated from breast tumour tissue of four breast-cancer patients. Arrowhead indicates a single 5-kb hybridizing band.

Val, Asp and Arg residues, whereas other aa are not represented. The 3' cDNA nt sequences extend the ORF 3' to the TRA for an additional 160 aa after which a TGA stop codon is encountered, followed by a 3'-UTR of 1188 nt. A discrepancy exists between our results and the limited nt sequences located 3' to the TRA previously described (Gendler et al., 1988). We note, however, that the cDNA sequences presented here are homologous to the corresponding genomic sequences. The aa sequence 3' to the TRA is also Ser-rich and contains four potential N-linked glycosylation sites (NXS/T). A hydrophobic region (VSFFFLSFHL) is located 48 aa upstream from the C terminus. Although a polyadenylation site has not been reached, the combined length of the sequences 5' and 3' to the TRA correlates well with previous estimates of the unique mRNA regions and the nt sequence is thus probably almost full-length cDNA.

(c) Nucleotide sequence of the human H23Ag gene

Genomic fragments located 5' and 3' to the TRA (Fig. 1C) were sequenced using the dideoxy method (Fig. 3). A comparison of the genomic (SacI-EcoRI 5.3-kb fragment) and cDNA nt sequences enabled us to identify different domains of the H23Ag gene.
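The ORF features reported for the cDNA in section (b) can be checked on the first coding nt. The sketch below is illustrative only: the helper names are ours, `ORF_START` is the ATG at nt 1 plus the following 18 codons as read from Fig. 3, and `UPSTREAM` is the 8-nt CCACCACC stretch that the text places 5' to the ATG. It translates the stretch with the standard genetic code, counts overlapping CCACC motifs and tallies Leu codon usage.

```python
import re
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

UPSTREAM = "CCACCACC"  # nt preceding the ATG, as quoted in the text
ORF_START = (
    "ATG ACA CCG GGC ACC CAG TCT CCT TTC TTC "
    "CTG CTG CTG CTC CTC ACA GTG CTT ACA"
).replace(" ", "")     # first 19 codons, read from Fig. 3


def codons(nt):
    """Split an in-frame nt string into complete codons."""
    return [nt[i:i + 3] for i in range(0, len(nt) - 2, 3)]


def translate(nt):
    """Translate up to (not including) the first stop codon."""
    protein = []
    for codon in codons(nt):
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def overlapping_sites(motif, seq):
    """0-based start positions of all overlapping motif occurrences."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]


protein = translate(ORF_START)
leu_codons = Counter(c for c in codons(ORF_START) if CODON_TABLE[c] == "L")

print(protein)                               # MTPGTQSPFFLLLLLTVLT
print(overlapping_sites("CCACC", UPSTREAM))  # [0, 3]
print(protein.find("LLLLL"))                 # 10
print(dict(leu_codons))                      # {'CTG': 3, 'CTC': 2, 'CTT': 1}
```

The two CCACC matches at offsets 0 and 3 correspond to the two overlapping initiation consensus sequences, and the run of five Leu residues lies inside the hydrophobic stretch proposed as the signal peptide. All six Leu codons in this short window are of the CTN type.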

(1) Coding regions of the gene

The genomic sequence contains the same consensus initiation sequences, putative signal peptide divided by an intron (see below), TRA and unique sequences 3' to the TRA.

Fig. 3. Sequence of the H23Ag gene. cDNA inserts and genomic fragments corresponding to regions 5' and 3' of the TRA were isolated and subcloned into M13mp10 and M13mp11 vectors. Sequencing was performed according to Sanger et al. (1977). M13 subclones were sequenced by priming with either the M13 universal primer or synthetic oligos prepared according to the known sequences. Sequence analysis was performed using the Beckman Micro-Genie program. The putative regulatory region of the gene is characterized by the regulatory motifs (boxed): two Sp1-binding sites, putative ERE and TATA box. The in-frame ATG (nt 1) and two overlapping initiation consensus sequences mark the initiation of translated sequences. The intron is indicated by dots and begins with the sd sequence and ends with an sa site. The putative enhancer, similar to the mouse cellular enhancer, is boxed. The region corresponding to the putative signal peptide is boxed, as is the representative 20-aa repeat unit bounded by SmaI sites. The actual TRA can span from 20-80 repeat units. Putative N-linked glycosylation sites are shown by asterisks. The nt numbering is indicated to the right of each column. The novel nt sequence data published here have been deposited with GenBank under accession No. M35093.
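The N-linked glycosylation sites marked by asterisks in Fig. 3 follow the N-X-S/T pattern. A minimal sketch of such a scan is given below. The function name is ours; the four 10-aa windows are short excerpts read from the Fig. 3 translation of the region 3' to the TRA and should be treated as illustrative rather than as the full sequence; and the exclusion of Pro at position X is the usual sequon convention, which is an assumption here because the text states only NXS/T.

```python
import re

# N-X-S/T sequon; X = any residue except Pro (assumed convention, see lead-in).
SEQUON = re.compile(r"(?=(N[^P][ST]))")

# Short excerpts from the region 3' to the TRA, read from Fig. 3 (illustrative).
WINDOWS = [
    "PPVHNVTSAS",
    "VHNGTSARAT",
    "LTSSNHSTSP",
    "QFNSSLEDPS",
]


def find_sequons(protein):
    """Return (0-based position, tripeptide) for every sequon, overlaps included."""
    return [(m.start(), m.group(1)) for m in SEQUON.finditer(protein)]


hits = [hit for window in WINDOWS for hit in find_sequons(window)]
motifs = [tripeptide for _, tripeptide in hits]

print(motifs)       # ['NVT', 'NGT', 'NHS', 'NSS']
print(len(motifs))  # 4
```

Each window contributes one site, giving the four potential sites mentioned in the text. The lookahead form of the pattern is used so that overlapping sequons in a longer sequence would not be missed.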

TRAs in hypervariable genes may evolve by duplication of an essential basic unit (Jeffreys et al., 1988; Teumer et al., 1989). Sequencing of regions flanking the TRA in 17.5 shows a gradual deviation from the basic repeating unit, both on the 5' and 3' sides (Fig. 3). This unusual gene structure may lead to gene instability resulting in allelic loss often observed in breast cancer tissue (Merlo et al., 1989).

(2) Intron sequences

Comparison of the cDNA and gene sequences revealed one 499-bp intron located at nt 59, 5' to the TRA, that divides the Thr and Val residues towards the end of the signal peptide. The 5' and 3' intron/exon boundaries conform to consensus sd and sa sites. The intron contains a highly Pu-rich region (nt 59-478: 78% A+G) followed by a Py-rich sequence (nt 478-557: 75% C+T). An nt homology search against the NIH data bank revealed a putative enhancer sequence (Taketo et al., 1987) situated within the Pu-rich region of the intron (nt 278-306). This showed 86% identity with the murine cellular enhancer of retroviral gene expression (Fig. 4).

(3) Regulatory sequences

Analysis of the genomic sequence 5' upstream from the cDNA sequences revealed a putative promoter region consisting of: (i) 'TATA' sequences (Hogness boxes) located at nt -96 to -93, flanked on the 5' and 3' sides by (ii) G+C-rich regions that include several Sp1-binding sites which in turn are preceded by (iii) two CACCT motifs (at nt -152 and -145). A putative ERE (Green et al., 1988; Beato et al., 1989) is located 300 bp upstream from the TATA box (nt -401 to -388) and is partially palindromic (Fig. 4). The 5' tetranucleotide, AGGA, and the 3' pentanucleotide, GACCT, within the putative ERE are identical to the ERE consensus (Fig. 4).

Fig. 4. Primary structure of the H23Ag gene. The general structures of the gene and cDNA are represented in (D) and (E), respectively. In (D), SacI and EcoRI sites are designated by S and E; in (E), aa are indicated by the one-letter code. Putative regulatory sequences in the regulatory region, demonstrated in (C), refer to Sp1-binding sites (Sp1), estrogen-responsive element (ERE) and the TATA site. A comparison of the available hormone-responsive elements including vitellogenin A2 responsive element (vit ERE), the rat prolactin estrogen response element (rPRL ERE), the metallothionein IIa glucocorticoid response element (MTIIa GRE), the pS2 estrogen response element (pS2) and the rat growth hormone triiodothyronine response element (rGH TRE) yields a 13-mer consensus sequence for the ERE, illustrated in (A) (Green et al., 1988; Beato et al., 1989). In (A), this sequence (CONSENSUS) is compared to the H23Ag putative ERE (H23Ag); plus and minus symbols represent identity and gap, respectively. Significant homology (24/28 nt matches or 86% homology) is also observed between a mouse cellular enhancer and 28 bp of the H23Ag intron and this is illustrated in (B) (asterisks represent four mismatches).
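The 24/28 figure quoted for panel (B) is an ungapped, position-by-position identity. As a hedged illustration, the sketch below recomputes it from the two 28-nt sequences printed in panel (B) of Fig. 4. The function and variable names are ours, and the final base of the murine sequence is illegible in the available copy; it is assumed here to be G, the only choice consistent with the reported four mismatches.

```python
def percent_identity(a, b):
    """Ungapped identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, 100.0 * matches / len(a)


# 28 nt of the H23Ag intron, as printed in Fig. 4, panel (B).
H23AG_INTRON = "AGAAGTCTAGGGGGAAGAGAGTAGGGAG"
# Murine cellular enhancer from the same panel; last base ASSUMED to be G.
MURINE_ENHANCER = "AGAAGTCTAAGGGGAAAAGCTTAGGGAG"

matches, pct = percent_identity(H23AG_INTRON, MURINE_ENHANCER)
mismatch_positions = [
    i + 1  # 1-based position within the 28-nt window
    for i, (x, y) in enumerate(zip(H23AG_INTRON, MURINE_ENHANCER))
    if x != y
]

print(matches, len(H23AG_INTRON), round(pct))  # 24 28 86
print(mismatch_positions)                      # [10, 17, 20, 21]
```

Under that assumption the four mismatches fall at window positions 10, 17, 20 and 21, and 24 matches out of 28 rounds to the 86% quoted in the text.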

318

(d) Conclusions

We have cloned and sequenced the H23Ag hypervariable gene and cDNA that code for a breast-cancer-associated antigen.

(1) The H23Ag gene is polymorphic and is composed of unique sequences and a TRA of 60-bp units.

(2) An analysis of the genomic sequences upstream from the 5' cDNA terminus identifies a 'TATA' sequence, two Sp1 (GGGCGG) motifs (Dynan et al., 1983) and a G+C-rich region. These features suggest that this is the promoter site, although definitive proof must await demonstration of functionality.

(3) An ERE-like sequence appears in the H23Ag gene approx. 280 bp upstream from the putative TATA box, indicating that antigen expression may be hormonally modulated.

(4) A 28-bp sequence located within the H23Ag intron shows extensive similarity with essential enhancer sequences of retroviral gene expression, suggesting that this region may also regulate transcriptional activity.

(5) Analysis of the cDNA nt sequence demonstrates an ORF of 127 aa that starts with an in-frame ATG codon and a highly hydrophobic signal sequence. This is followed by the 20-aa TRA after which the ORF continues for another 160 aa. A TGA codon is then encountered and is followed by a long 3'-UTR. Characterization of the gene and cDNA coding for the breast-tumour-associated antigen may help elucidate its function as well as the mechanisms leading to overexpression in human breast cancer.

ACKNOWLEDGEMENTS

This work was supported by Simko Chair for Breast Cancer Research, Frederico Fund for Tel Aviv University, Ms. Toby Green, London, and grants from CNRS/INSERM, Israel Cancer Association, Moise and Frida Eskenasy Institute Cancer Research, and Friedman Fund for Breast Cancer Research. I.T., M.H. and D.H.W. were the recipients of EMBO short-term and long-term (M.H.) Fellowships and D.H.W. is the recipient of a Koret Foundation Fellowship, San Francisco, CA. We are very grateful to Prof. Pierre Chambon for his continued support and fruitful discussion.

REFERENCES

Beato, M.: Gene regulation by steroid hormones. Cell 56 (1989) 335-344.

Dynan, W.S. and Tjian, R.: The promoter-specific transcription factor Sp1 binds to upstream sequences in the SV40 early promoter. Cell 35 (1983) 79-87.

Gendler, S.J., Burchell, J.M., Duhig, T., Lamport, D., White, R., Parker, M. and Taylor-Papadimitriou, J.: Cloning of partial cDNA encoding differentiation and tumor associated mucin glycoproteins expressed by human mammary epithelium. Proc. Natl. Acad. Sci. USA 84 (1987) 6060-6064.

Gendler, S., Taylor-Papadimitriou, J., Duhig, T., Rothbard, J. and Burchell, J.: A highly immunogenic region of a human polymorphic epithelial mucin expressed by carcinomas is made up of tandem repeats. J. Biol. Chem. 263 (1988) 12820-12823.

Green, S. and Chambon, P.: Nuclear receptors enhance our understanding of transcription regulation. Trends Genet. 11 (1988) 309-313.

Gross-Bellard, M., Oudet, P. and Chambon, P.: Isolation of high-molecular-weight DNA from mammalian cells. Eur. J. Biochem. 36 (1973) 32-38.

Hareuveni, M., Tsarfaty, I., Zaretsky, J., Kotkes, P., Zrihan, S., Weiss, M., Green, S., Lathe, R., Keydar, I. and Wreschner, D.H.: A transcribed VNTR gene codes for a human epithelial tumor antigen. cDNA cloning, expression of the transfected gene and overexpression in breast cancer. Eur. J. Biochem. (1990) in press.

Jeffreys, A.J., Royle, N.J., Wilson, V. and Wong, Z.: Spontaneous mutation rates to new length alleles: tandem-repetitive hypervariable loci in human DNA. Nature 332 (1988) 278-281.

Keydar, I., Chou, C.S., Hareuveni, M., Tsarfaty, I., Sahar, E., Selzer, G., Chaitchik, S. and Hizi, A.: Production and characterization of monoclonal antibodies identifying breast tumor associated antigens. Proc. Natl. Acad. Sci. USA 86(4) (1989) 1362-1366.

Kozak, M.: Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44 (1986) 283-292.

Merlo, G.R., Siddiqui, J., Croop, C.S., Liscia, D.S., Lidereau, R., Callahan, R. and Kufe, D.W.: Frequent alteration of the DF3 tumor associated antigen gene in primary human breast carcinoma. Cancer Res. 49 (1989) 6966.

Rigby, P.W.J., Dieckman, M., Rhodes, C. and Berg, P.: Labelling deoxyribonucleic acid to high specific activity in vitro by nick translation with DNA polymerase I. J. Mol. Biol. 113 (1977) 237-251.

Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain-termination inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5467.

Siddiqui, J., Abe, M., Hayes, D., Shani, E., Yunis, E. and Kufe, D.: Isolation and sequencing of a cDNA coding for a human DF3 carcinoma-associated antigen. Proc. Natl. Acad. Sci. USA 85 (1988) 2320-2323.

Southern, E.M.: Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98 (1975) 503-517.

Swallow, D.M., Gendler, S., Griffiths, B., Corney, G., Taylor-Papadimitriou, J. and Bramwell, E.: The human tumor-associated epithelial mucins are coded by an expressed hypervariable gene locus PUM. Nature 328 (1987) 82-84.

Taketo, M. and Tanaka, M.: A cellular enhancer of retrovirus gene expression in embryonal carcinoma cells. Proc. Natl. Acad. Sci. USA 84 (1987) 3748-3752.

Teumer, J. and Green, H.: Divergent evolution of part of the involucrin gene in the hominoids: unique intragenic duplication in the gorilla and human. Proc. Natl. Acad. Sci. USA 86 (1989) 1283-1286.

Tjandra, J.J. and McKenzie, I.F.: Murine monoclonal antibodies in breast cancer: an overview. Br. J. Surg. 75 (1988) 1067-1077.

Tsarfaty, I., Chaitchick, S., Hareuveni, M., Horev, J., Hizi, A., Wreschner, D.H. and Keydar, I.: H23 monoclonal antibodies recognize a breast cancer tumor associated antigen: clinical and molecular studies. In Ceriani, R.L. (Ed.), Breast Cancer Immunodiagnosis and Immunotherapy. Plenum, New York, 1988, pp. 161-169.

Wreschner, D.H. and Rechavi, G.: Differential mRNA stability to reticulocyte ribonucleases correlates with 3' non-coding (U)nA sequences. Eur. J. Biochem. 172 (1988) 333-340.