Expression, alternative splicing and haplotype analysis of transcribed testis specific protein (TSPY) genes


Gene 302 (2003) 11–19 www.elsevier.com/locate/gene

Roswitha Krick a, Sibylle Jakubiczka b, Joachim Arnemann a,*

a Institute of Human Genetics, Johann Wolfgang Goethe University Hospital, Theodor-Stern-Kai 7/Haus 9, D-60590 Frankfurt am Main, Germany
b Institute of Human Genetics, Otto-von-Guericke University, Magdeburg, Germany

Received 19 July 2002; received in revised form 17 October 2002; accepted 28 October 2002
Received by M. D'Urso

Abstract

Testis specific protein (TSPY) is a human Y-chromosome derived gene with numerous functional and non-functional copies. Specific expression patterns in testis and testicular tumors, as well as in prostate cancer samples and cell lines, led to the postulation of a potential role in cell proliferation, supported by the presence of a suppressor of variegation, enhancer of zeste and Trithorax/nucleosome assembly protein (SET/NAP) domain in the mature protein. Expression studies have now identified two transcripts of variable length, termed TSPY-S and -L, which differ in their 3′-translated region due to alternative splicing, and in the quantitative level of transcripts, with TSPY-S being at least 3–4-fold more abundant. In immunoblot experiments on human testis and LNCaP protein extracts using an anti-peptide antiserum against the TSPY-L specific C-terminus, TSPY-L was characterized as a functional variant on the protein level. As there are at least three intragenic positions differing between various TSPY genes and thus defining certain haplotypes, the alternatively spliced TSPY transcripts were analysed for their haplotypes in order to link them to well defined TSPY loci. Surprisingly, no evidence of a G-G-18 haplotype was found for the TSPY-L transcript, while this haplotype makes up almost 50% of all TSPY-S transcripts. This excludes the corresponding TSPY-1 locus from alternative splicing. The only significant differences between the TSPY-1 locus and eight other loci were identified in the promoter region as revealed by detailed sequence comparisons. Thus one might speculate that the alternative splicing could be influenced by elements binding to the promoter region.

© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Testis specific protein (TSPY) variants; TSPY-L; TSPY-S; Alternative splicing; Variable C-terminus; LNCaP prostate carcinoma cell line

Abbreviations: A, adenosine; aa, amino acid(s); BCIP, 5-bromo-4-chloro-3-indolyl phosphate; bp, base pairs; BPS, branch point site; C, cytosine; G, guanosine; GST, glutathione S-transferase; HRP, horseradish peroxidase; IPTG, isopropyl β-D-thiogalactopyranoside; kDa, kilodaltons; MB, mega base; N, any nucleoside; NBT, nitroblue tetrazolium chloride; ORF, open reading frame; PAGE, polyacrylamide gel electrophoresis; SDS, sodium dodecyl sulfate; T, thymidine; TSPY, testis specific protein, Y-encoded; UTR, untranslated region; Y, pyrimidine; YRAS, pyrimidine-rich acceptor site; FISH, fluorescence in situ hybridization; YAC, yeast artificial chromosome; LB, Luria-Bertani medium; MMLV, Moloney murine leukemia virus; PBST, phosphate-buffered saline - Tween-20; TBST, Tris-buffered saline - Tween-20.

* Corresponding author. Tel.: +49-69-6301-7808; fax: +49-69-63016002. E-mail address: [email protected] (J. Arnemann).

0378-1119/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. doi:10.1016/S0378-1119(02)01104-6

1. Introduction

Testis specific protein (TSPY, Y-encoded) was described as one of the first human Y chromosome derived genes, with specific expression patterns in testis and testicular tumors, as well as in prostate cancer samples and cell lines (Arnemann et al., 1991; Zhang et al., 1992; Schnieders et al., 1996; Lau and Zhang, 2000; Dasari et al., 2001). TSPY is part of a heterogeneous repetitive family with a previously estimated 20–40 copies of related sequences (Vogel and Schmidtke, 1998). Each TSPY transcription unit has a coding sequence of 924 bp (Schnieders et al., 1996; Acc. No. U58096) and covers approx. 2.8 kb of genomic DNA, which itself is embedded in the 20 kb DYZ5 repeat unit (Manz et al., 1993). DYZ5 initially served as a hybridization probe for Southern blot and FISH experiments to map the various TSPY/DYZ5 units on the Y chromosome. However, due to the connection between TSPY and DYZ5 (Manz et al., 1993) and due to misleading hybridization data, the exact number of functional TSPY genes has so far not been quantified. A larger and a smaller cluster of TSPY genes/pseudogenes thus have been localized on Yp outside the


pseudoautosomal region (Tyler-Smith et al., 1988), while additional copies were mapped to the long arm, Yq11.23, most likely due to a pericentric inversion during Y chromosome evolution (Tyler-Smith et al., 1988; Schempp et al., 1995; Ratti et al., 2000). The so-called TSPY-major block is located at interval 4A on Yp11.2, while TSPY-minor maps to interval 3C on Yp (Lahn and Page, 1997). A screening for TSPY sequence variations on isolated YAC clones led to the postulation of probably six loci with variable copy numbers (Dechend et al., 2000). The increased copy number can be explained by various duplication events which took place during evolution of the Y chromosome. This process could have been reinforced by the lack of meiotic recombination and the lack of control/proofreading mechanisms, which favors the accumulation of replication errors, such as nucleotide exchanges or even gross changes like duplications or inversions.

On the biochemical and cellular level the function of the TSPY gene product is so far unknown. Due to its limited expression pattern in germ cells a putative role in spermatogenesis had been postulated (Arnemann et al., 1991; Zhang et al., 1992). However, as there is no convenient cell culture system available for biological tests of the spermatogenic cycle, experimental analysis has been restricted so far. From sequence comparisons with the database, the observed homology with the SET/NAP gene family (SET = suppressor of variegation, enhancer of zeste and Trithorax; NAP = nucleosome assembly protein) led to the postulation of a possible mitotic activator function for TSPY (Schnieders et al., 1996). Other members of the SET/NAP family of proteins have been shown to act on chromatin structure and transcription factor binding (Shikama et al., 2000), regulating DNA replication (Nagata et al., 1995) and cell proliferation (von Lindern et al., 1992; Li et al., 1996; Estanyol et al., 1999; Chai et al., 2001) via histone accessibility.

The observation of strong TSPY expression in gonadoblastoma (Schnieders et al., 1996; Hildenbrand et al., 1999; Lau et al., 2000) and testicular carcinoma in situ supported the hypothesis of a factor involved in cell proliferation and of an oncogene with malignant potential (Lau, 1999). The recent observation of aberrant TSPY expression in prostate cancer specimens and prostate cancer cell lines (Lau and Zhang, 2000; Dasari et al., 2001) focused interest on a potential role in prostate carcinogenesis as well. The availability of an in vitro test system now allows a molecular approach to study expression and cellular function of TSPY in more detail.

2. Materials and methods

2.1. Cell culture and tissue samples

Human prostate carcinoma cell line LNCaP (DSMZ, # ACC 256; characterized by a hypotetraploid karyotype with 6% polyploidy; 88(84–93)<4n>XXYY) was grown to subconfluence in T-25 flasks in 5 ml of RPMI medium containing antibiotics and 10% fetal calf serum at 37 °C in 5% CO2. Protein lysates and a total RNA sample from normal human testis and prostate tissue, respectively, were obtained from two commercial suppliers (Clontech; Ambion).

2.2. Reverse transcription and polymerase chain reaction (PCR) amplification

Total RNA from LNCaP cells was isolated according to standard procedures (Chomczynski and Sacchi, 1987). Poly(A) RNA was purified by use of the Micro-mRNA Purification Kit® [Pharmacia] and subsequently reverse transcribed into cDNA using MMLV reverse transcriptase Superscript RNase H− [GIBCO BRL] and an oligo(dT) primer. For the amplification of TSPY cDNA we used two different sets of primers. To amplify exons 4–6 the following primers were used: 5′-CAGGGCTTCTCATTCCACTC-3′ (JA1005, forward primer complementary to pos. 681–700 in exon 4) and 5′-CCATCATATTCAACTCAACAACTGG-3′ (JA1006, reverse primer complementary to pos. 936–912 in exon 6). The amplification conditions were 94 °C for 1 min and 57 °C for 30 s followed by an extension step of 1.5 min at 72 °C for 35 cycles. PCR products were separated on 15% polyacrylamide gel electrophoresis (PAGE) gels. To amplify the complete coding region the following primers were used: 5′-CGCATGCGCCCTGAGGGCTC-3′ (JA1091, forward primer complementary to pos. −1 in exon 1) and JA1006. The amplification conditions were 94 °C for 1 min and 55 °C for 50 s followed by an extension step of 1 min at 72 °C for 30 cycles.

2.3. Haplotype analysis

Haplotype analysis was done as defined and described by Dechend et al. (2000). Here, the term haplotype is used for different genes located in cis on the same chromosome. Briefly, for the 135 G→A haplotype analysis cDNAs were amplified with primer pair FJD1/LEE3, RsaI digested and electrophoretically separated for the presence or absence of a 486 (A) or 351 bp (G) fragment. For the 584 C→G haplotype cDNAs were amplified with primer pair FJD2/TaqIEX3, TaqI digested and electrophoretically separated for the presence or absence of a 120 (C) or 97 bp (G) fragment. To test for the 18-bp tandem duplication, PCR was carried out using primers JA56/LEE3 and the products were subsequently separated on a 15% PAGE gel, displaying bands of 286 and 304 bp, respectively. To test for the splice variant (S/L) and the G/C sequence variation at pos. 584 simultaneously, an assay based on a BamHI/HinfI double digest was set up. At the 3′-end of primer JA1252 (5′-CCTGGAGGTGGAAGAAGAGAAGGATC-3′), which is used in combination with primer JA1006 for PCR analysis, an incomplete BamHI restriction


site was created by replacing the C nucleotide (pos. 580) by a G; the product is cut by BamHI if the site is completed by a C nucleotide at pos. 584. The TSPY-L and -S splice variants can furthermore be discriminated by HinfI digest.

2.4. Cloning and sequencing

Reverse transcription (RT)-PCR products were cloned into the pGEM-T vector (Promega) and selected for L- or S-type by PCR. Selected clones were cycle-sequenced on an ALFexpress sequencer (Amersham Biosciences, Germany) using Cy5-labelled universal forward and reverse sequencing primers. Sequence comparison was done with BLAST (http://www.ncbi.nlm.nih.gov/blast/).

2.5. Expression of glutathione S-transferase (GST)-fusion proteins

For the expression of fusion proteins the TSPY cDNAs with the haplotype of choice were cloned in-frame into the BamHI/EcoRI site of the pGEX-2T vector (Amersham Biosciences, Germany) and expressed in JM109 bacterial cells according to standard protocols. Briefly, an overnight culture of the recombinant clones was diluted 1:10 in LB medium and incubated at 37 °C for 1 h, before protein expression was induced by the addition of isopropyl β-D-thiogalactopyranoside (IPTG) [0.1 mM] and subsequent incubation for another 6 h. The bacterial cells were pelleted and lysed in cracking buffer [10 mM Na-phosphate pH 7.2; 6 M urea; 1% sodium dodecyl sulfate (SDS); 1% β-mercaptoethanol] for at least 6 h at 37 °C. For analysis, fusion proteins were separated on 12% SDS-PAGE and stained with Coomassie Blue. Non-induced recombinant clones and clones with the pGEX-2T vector on its own were used as controls.

2.6. Antibodies

The rabbit polyclonal antibody α-1026 was raised against a C-terminal peptide (SPDRSYVRTCGAIPCN) of the TSPY-L variant corresponding to amino acids 274–290 (TSPY-L). The generation of antiserum α-839 has been described elsewhere (Schnieders et al., 1996).

2.7. Western blot analysis

For Western blot experiments proteins were extracted from cells and boiled in Laemmli buffer before separation on a 12% acrylamide SDS-PAGE gel under non-reducing conditions. Proteins were electroblotted onto nitrocellulose (Hybond C; Amersham Pharmacia Biotech). Nitrocellulose membranes were blocked with non-fat milk (10%) in PBST (phosphate buffered saline, 0.5% Tween-20) and subsequently incubated for 1 h at RT with α-1026 antibody in PBST. After washing, membranes were incubated with


protein A-horseradish peroxidase for 1 h at RT. After washing in PBST, signals were detected using Supersignal chemiluminescent substrate® (Pierce). Alternatively, an alkaline phosphatase-labelled anti-rabbit IgG antibody combined with the nitroblue tetrazolium chloride/5-bromo-4-chloro-3-indolyl phosphate system was used for signal detection. In this case, however, PBST buffer was replaced by TBST (50 mM Tris–HCl, pH 7.4; 0.9% NaCl; 0.5% Tween-20).

2.8. Database search

In order to obtain sequence information about the gene structure and the copy number, the complete cDNA sequence of TSPY was searched against the Human Genome Sequencing Database and aligned to strings of genomic DNA using the BLAST programs (http://www.ncbi.nlm.nih.gov/blast/).
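
As an illustration only, the fragment-size rules of Section 2.3 can be written as a small lookup. The sketch below is not part of the original protocol: the function name, the dictionary layout and the ±3 bp sizing tolerance are assumptions, and the assignment of the 286 bp band to the single 18 bp element (304 bp to the duplicated, 36 bp form) is inferred from the 18 bp size difference.

```python
# Diagnostic fragment sizes (bp) as given in Section 2.3.
# The data layout and names below are illustrative assumptions.
MARKERS = {
    "135": {486: "A", 351: "G"},       # FJD1/LEE3 product, RsaI digest
    "584": {120: "C", 97: "G"},        # FJD2/TaqIEX3 product, TaqI digest
    "dup18": {286: "18", 304: "36"},   # JA56/LEE3 product, undigested (assignment inferred)
}


def call_haplotype(observed, tolerance=3):
    """Return a haplotype string such as 'G-C-18' from diagnostic band sizes.

    `observed` maps each marker name to the diagnostic band size in bp.
    `tolerance` is a hypothetical allowance (bp) for gel sizing error.
    """
    alleles = []
    for marker in ("135", "584", "dup18"):
        size = observed[marker]
        matches = [allele for expected, allele in MARKERS[marker].items()
                   if abs(size - expected) <= tolerance]
        if len(matches) != 1:
            raise ValueError(f"marker {marker}: cannot assign a band of {size} bp")
        alleles.append(matches[0])
    return "-".join(alleles)


print(call_haplotype({"135": 351, "584": 97, "dup18": 286}))
```

A band that fits neither expected size raises an error instead of being forced into one allele, which seems the safer behaviour for gel-derived sizes.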

3. Results

3.1. Evidence of alternative TSPY transcripts with variable 3′ coding regions

To analyse TSPY expression in more detail we performed RT-PCR amplification experiments with different sets of primers covering the whole coding region, using poly(A) RNA from the prostate cancer cell line LNCaP and from human testis for comparison. When using primers complementary to exon 4 (primer JA1005) and exon 6 (primer JA1006) on poly(A) RNA of LNCaP cells, a double band was visible with a major lower band and a fainter upper band (Fig. 1a), which was reproduced in different assays. To characterize the differences in more detail, poly(A) RNA was reverse transcribed and amplified by use of TSPY-specific primers JA1091 (exon 1) and JA1006, and the cloned fragments were subsequently sequenced. As expected, two types of clones of different length were obtained and subsequently termed TSPY-L (long variant) and TSPY-S (short variant). The strong mRNA signal is coded by TSPY-S and the fainter one by TSPY-L. Sequence analysis revealed that TSPY-S corresponds to the TSPY cDNA sequence described earlier by Schnieders et al. (1996; Acc. No. U58096), while TSPY-L corresponds to another published partial TSPY cDNA sequence (Zhang et al., 1992; Acc. No. M98525), which has now been extended at the 5′-end (Acc. No. AY130858).

3.2. Analysis of the ratio between TSPY-S and TSPY-L transcripts

As there is a major difference in the transcriptional level of the two variants we were interested in getting a rough estimate of the ratio. The sequence analysis has helped to identify a unique HinfI restriction site within the TSPY-L


Fig. 1. Detection of the alternative splice variants TSPY-S and -L after PCR assays using primer set JA1005/JA1006. (a) PCR analysis of cloned LNCaP cDNA sequences of the TSPY-L (1 = pRo9) and TSPY-S (2 = pRo36) variants and of reverse transcribed LNCaP poly(A) RNA (3); M = marker. (b) Detection of TSPY-L specific fragments in cloned LNCaP cDNA sequences of the TSPY-L (1 = pRo9) and TSPY-S (2 = pRo36) variants, reverse transcribed poly(A) RNA of LNCaP (3), human prostate (4) and testis (5) after HinfI restriction digest. The corresponding fragments are labelled with S (TSPY-S) and L (TSPY-L). M = marker.

variant which improves the discrimination between the two variants. We thus repeated the above mentioned RT-PCR analysis with primers JA1005/JA1006 on poly(A) RNA samples of human testis, prostate and LNCaP cells followed by a HinfI restriction digest (Fig. 1b). A similar distribution of the TSPY variants was detectable in all samples tested. However, after RT-PCR the signal from the prostate sample was quite weak, which could represent a low basic transcription rate. For the variant analysis we thus had to use an increased amount of the reverse transcribed RNA for the PCR assay. In a first, densitometric approach the scanning of the above mentioned RT-PCR gels gave a ratio of roughly up to 9:1 in favour of TSPY-S, which was almost the same for assays of LNCaP, prostate and human testis cDNA. As densitometry is not always exact for major quantitative differences due to the non-linear distribution of signals, full-length RT-PCR products from LNCaP cells were cloned to analyse the distribution of the variants in a second approach. A total of 34 clones picked at random were screened by PCR using primers JA1005/JA1006 and subsequent digestion of the PCR products with HinfI. The ratio of the analysed cloned sequences was 25:9 in favour of TSPY-S, suggesting from these data that the amount of TSPY-S mRNA molecules should be at least 3-fold that of TSPY-L.

3.3. Sequence of splice sites of introns/exons

When TSPY-L was aligned to genomic TSPY sequences it became evident that it uses a different splice site within intron 4, just 11 bp upstream of the TSPY-S splice site (Fig. 2). For a closer insight into the process of alternative splicing we computer-analysed in more detail the intron 4 sequences of the known TSPY genes and in particular the 3′ splice acceptor site. GenBank analysis revealed that intron 4 is 100% identical in all nine TSPY genes which we have identified and tested so far, including the gene with the G-G-18 haplotype (see Section 3.4). There is no hint of a sequence or possible regulatory element within intron 4 differing between, e.g., G-C-18 and G-G-18 haplotypes. The consensus sequence of the acceptor site (NYAG) is

Fig. 2. Comparison of the intron 4 nucleotide sequence of TSPY with the consensus sequences of the pyrimidine-rich acceptor site region (YRAS; boxed) and the branch point sites (BPS; shaded box). Altogether three conserved BPS (BPS-1 to -3) and four potential YRAS regions (YRAS-1 to -4) have been identified. While YRAS-1 gives rise to TSPY-S, YRAS-2 gives rise to the TSPY-L splice variant, as indicated. Exonic nucleotide sequences are written in capital letters, intronic sequences in small letters. Amino acids are written in italics.


usually flanked upstream by a pyrimidine-rich tract of 11 consecutive pyrimidine (Y) nucleotides (thymidine and/or cytosine), giving a pyrimidine-rich region acceptor site (YRAS) consensus sequence of the order YYYYYYYYYYYNYAG (Moore, 2000). The pyrimidine residue immediately 5′ to the AG motif enhances splice site recognition. Sequence alignment of the YRAS consensus sequence to TSPY intron 4 sequences revealed three weak binding sites (YRAS-2, -3, -4) and one strong binding site (YRAS-1; Fig. 2), as the pyrimidine motif is only present in YRAS-1. Thus YRAS-1 is the preferential splice site giving rise to TSPY-S, while the weaker YRAS-2 is the splice site for TSPY-L. So far no evidence has been found that YRAS-3 and -4 are used for splicing; their use would, however, lead to an immediate stop codon within the following base pairs. Immediately upstream of YRAS-1 and -2 the corresponding branch point sites BPS-2 and BPS-3, respectively, are present. An additional potential BPS-1 has been located within YRAS-1.

3.4. Haplotype analysis of the expressed TSPY transcripts in LNCaP cells

As different loci for TSPY have been described (Dechend et al., 2000) we wondered whether the transcription pattern could be linked to certain haplotypes or defined loci. Variations have been described for three different positions within the gene, namely two nucleotide substitutions at pos. 135 (G→A) and pos. 584 (C→G) and the duplication of an 18 bp element (dup18/36; pos. 221–238). We analysed the 34 cDNA clones obtained for their haplotypes and compared those data with the presence of an S- or L-splice variant (Table 1). While the majority of TSPY-S cDNAs were of the G-C-18 (32%) or of the G-G-18 (44%) haplotype, surprisingly, no G-G-18 haplotype was found among the TSPY-L cDNAs in LNCaP cells, where the G-C-18 type (89%) was the dominant one.
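
The percentages quoted above follow directly from the clone counts listed in Table 1. As a minimal sketch (the variable and function names are illustrative, and percentages are rounded to whole numbers as in the table), they can be recomputed as follows:

```python
# Clone counts as listed in Table 1.
# Haplotype notation: pos. 135 - pos. 584 - 18/36 element.
clones = {
    "S": {"G-C-18": 8, "A-C-18": 3, "G-G-18": 11, "A-C-36": 2, "G-G-36": 1},
    "L": {"G-C-18": 8, "A-C-36": 1},
}


def haplotype_percentages(counts):
    """Share of each haplotype within one splice variant, in whole percent."""
    total = sum(counts.values())
    return {hap: round(100 * n / total) for hap, n in counts.items()}


percentages = {variant: haplotype_percentages(c) for variant, c in clones.items()}
totals = {variant: sum(c.values()) for variant, c in clones.items()}
s_to_l_ratio = totals["S"] / totals["L"]

for variant, shares in percentages.items():
    print(variant, totals[variant], shares)
print(f"S:L clone ratio = {totals['S']}:{totals['L']} (about {s_to_l_ratio:.1f})")
```

Percentages are calculated within each splice variant (25 TSPY-S and 9 TSPY-L clones), not across all 34 clones, which is why the two groups each sum to roughly 100%.
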
To test for the absence of a TSPY-L transcript of the G-G-18 haplotype in an independent approach we established an assay to test simultaneously for the splice variant and the absence or presence of the G nucleotide at pos. 584. By replacing the C nucleotide (pos. 580) with a G, an imperfect BamHI restriction site is created at the 3′-end of primer JA1252, which is used in


combination with primer JA1006 for PCR analysis. If there is a C nucleotide at pos. 584 the RT-PCR product can be cut by BamHI. In addition, the TSPY-L and -S splice variants can be discriminated by HinfI digest of this particular RT-PCR product. Thus the BamHI/HinfI double digestion shows whether there is a TSPY-L variant with a G at pos. 584, with a predicted fragment of 263 bp in case of G and 241 bp in case of C (Fig. 3a). Again this experiment detected no BamHI-undigested TSPY-L fragment of 263 bp, neither in LNCaP nor in human testis mRNA, supporting the previous result that there is no evidence of an alternatively spliced TSPY-L variant of the G-G-18 haplotype (Fig. 3b). Cloned and sequenced TSPY variants of the 584-C type (TSPY-L = pRo9) and of the 584-G type (TSPY-S = pRo36) served as controls. Due to limited amounts and the very low transcription rate, prostate mRNA was not analysed any further.

To get a closer insight into this process of transcription and alternative splicing of various TSPY genes we did an extensive in silico search of the human genome sequence and were able to map different TSPY genes and pseudogenes and their corresponding haplotypes on the Y chromosome (Table 2). TSPY genes and pseudogenes were mapped to three different Y chromosomal contigs. Altogether six genes (TSPY-1–6) of the classical type, three genes with a duplication of the 18 bp element (TSPY-D1–D3) and five pseudogenes (TSPY-PS1–PS5) with a homology > 85% were assigned. Thus TSPY-1, -PS1 and -PS5 on contig NT_011896 define the earlier postulated (Lahn and Page, 1997) minor block of TSPY/DYZ5 units on the short arm, Yp, approx. 3.5 mega base (MB) centromeric to the pseudoautosomal boundary, while TSPY-2–6, TSPY-D1–D3, TSPY-PS2 and TSPY-PS3 on contig NT_011878 define the major block of TSPY/DYZ5 units on Yp, approx. 6.5 MB centromeric to the pseudoautosomal boundary. These units have been postulated to be approx. 20 kb in size and present as arrays, which is, with the exception of TSPY-PS2 and TSPY-PS5, confirmed by the human genome sequencing data. Another pseudogene (TSPY-PS4) maps to contig NT_011875 on the long arm, Yq. However, the most interesting result of this computer analysis is that TSPY-1, which characterizes the unique G-G-18 haplotype,

Table 1
Haplotype analysis of TSPY-S and -L cDNA clones expressed from LNCaP cells (a)

Number of clones   135 G→A   584 G→C   18/36 allele   TSPY-S/-L   Percentage
8                  G         C         18             S           32
3                  A         C         18             S           12
11                 G         G         18             S           44
2                  A         C         36             S           8
1                  G         G         36             S           4
8                  G         C         18             L           89
1                  A         C         36             L           11

(a) For a short formula the haplotypes are abbreviated in order of the sequence variants. Haplotype G-G-18, which is predominant in TSPY-S, could not be detected in TSPY-L.


Fig. 3. Haplotype analysis at pos. 584 (C/G) for TSPY-S and -L variants after PCR analysis using primers JA1252/JA1006 and subsequent BamHI/HinfI double restriction digestion of the obtained PCR fragments. (a) Theoretically deduced size distribution of the different haplotypes at pos. 584 (G/C) for the TSPY-S and -L variant. (b) PCR analysis of defined LNCaP cDNA sequences of the cloned TSPY-L (1 = pRo9) and TSPY-S (2 = pRo36) variants, reverse transcribed poly(A) RNA of LNCaP cells (3) and human testis; M = marker. Sizes of the detectable fragments are given in base pairs (bp).

is the only functional TSPY copy on contig NT_011896 and is at least 3 MB apart from the next functional copy. This leaves the question of a locus-dependent splice and/or expression mechanism open. Sequence alignments of the genomic sequences of all identified TSPY genes including 1.5 kb of promoter sequence and 3′-untranslated region revealed differences only in the promoter region of TSPY-1 compared to the rest of the genes, while all the intron sequences and at least 500 bp of the 3′-untranslated region showed almost identical sequences (99%) with the exception of a few single nucleotide exchanges.

Table 2
TSPY genes and pseudogenes on the human Y chromosome, their distribution on sequenced contigs and their haplotypes (a)

Contig      Code       Pos.       135 G/A   584 G/C   18/36
NT_011896   TSPY-1     3.579887   G         G         18
NT_011896   TSPY-PS1   3.600211   A         C         18
NT_011896   TSPY-PS2   5.023166   A         C         18
NT_011878   TSPY-D1    210164     A         C         36
NT_011878   TSPY-2     230497     A         C         18
NT_011878   TSPY-3     250776     G         C         18
NT_011878   TSPY-4     271121     A         C         18
NT_011878   TSPY-5     289655     G         C         18
NT_011878   TSPY-D2    309967     A         C         36
NT_011878   TSPY-D3    330250     A         C         36
NT_011878   TSPY-6     350580     A         C         18
NT_011878   TSPY-PS3   370767     A         C         18
NT_011878   TSPY-PS4   730793     A         C         18
NT_011875   TSPY-PS5   9.675439   C         C         18

(a) For the haplotype analysis the possibly duplicated 18 bp element (18/36) at pos. 221–238 and the sites of potential nucleotide substitutions at pos. 135 (135 G/A) and pos. 584 (584 G/C) were evaluated. The TSPY genes are numbered TSPY-1–6, while TSPY genes carrying a duplicated 18 bp segment are numbered TSPY-D1–D3; pseudogenes are labelled with the abbreviation PS (TSPY-PS1–PS5). The GenBank accession numbers of the contigs and the position of the corresponding TSPY start codons are given.

3.5. Protein translation of the TSPY-L mRNA

As mentioned before, the two TSPY splice variants differ in their 3′ coding region. As a consequence the open reading frame is altered, leaving different C-termini of the TSPY protein with no homology to each other. The TSPY-L protein has a C-terminus of 20 amino acids (aa) with a total length of 294 aa, while TSPY-S has a longer C-terminus of 34 aa with a total length of 308 aa (Fig. 4). There is a predicted difference in molecular weight of almost 2 kD between TSPY-S (35 kD) and TSPY-L (33 kD). When cloned into the GST vector (pGEX-2T) and subsequently expressed as fusion proteins, the size difference between the TSPY-S and -L proteins became evident (Fig. 5a). A database search for potential functional sites within the C-terminus did not give any relevant information.

To test whether TSPY-L is functional and not a transcribed pseudogene we raised an anti-peptide antiserum in rabbit using a peptide of 16 aa from the TSPY-L C-terminus for immunization. The specificity of this anti-peptide antiserum (α-1026) was tested in immunoblot experiments against GST-fusion proteins of both TSPY-L (clone pRo-Ex1) and TSPY-S (clone pRo-Ex2). These clones cover the complete coding region of each


Fig. 4. Variable C-terminus of the TSPY-S and -L proteins due to use of different splice sites. TSPY-S has a unique C-terminus of 34 aa with a total length of 308 aa, while TSPY-L has a unique C-terminus of 20 aa with a total length of 294 aa. The TSPY-L peptide sequence used for the generation of antisera is underlined.

Fig. 5. Analysis of the uninduced and IPTG-induced GST-fusion proteins of TSPY-S (clone pRo-Ex2; lanes 4 and 5) and TSPY-L (clone pRo-Ex1; lanes 2 and 3). The GST expressing vector on its own served as a control (lane 1). The corresponding sizes are marked by arrows. (a) Coomassie-stained 10% SDS-PAGE of uninduced and IPTG-induced GST-fusion proteins for TSPY-L and pGEX-vector; (b) corresponding Western blot probed with antibody α-839 detecting both variants; and (c) corresponding Western blot probed with antibody α-1026 detecting only TSPY-L.

variant. As a positive control for the detection of both GST-fusion proteins, TSPY antibody α-839 (Schnieders et al., 1996), which was raised against an N-terminal fusion protein common to both variants, was applied (Fig. 5b). Using anti-peptide antiserum α-1026, a specific signal of the predicted size was obtained only for the TSPY-L type, thus confirming the specificity of the antiserum (Fig. 5c). When tested on immunoblots of protein lysates from LNCaP cells or human testis a specific band was obtained, which confirmed that TSPY-L is translated and expressed as a protein variant (Fig. 6). It is interesting to note that, due to post-translational modifications, the molecular weight of the variants as calculated from the gels is slightly higher than predicted, with TSPY-L (approx. 40 kD) being larger than TSPY-S (approx. 38.5 kD). This suggests that post-translational modifications of TSPY-L should be different and have a greater effect on the actual size of the protein.

Fig. 6. Western blot analysis of protein lysates from LNCaP cells (1) and human testis (2) separated on a 12% SDS-PAGE and probed with antibodies α-839 (a) and α-1026 (b). The corresponding sizes of the S- and L-variant are marked. M = molecular weight marker.
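
For orientation, the predicted sizes quoted in Section 3.5 can be compared with a back-of-the-envelope estimate from the protein lengths alone. The sketch below assumes a rule-of-thumb average residue mass of 110 Da; this value and all names in the code are assumptions and do not come from the paper, whose figures were presumably calculated from the actual amino acid sequences.

```python
# Rule-of-thumb average mass per amino acid residue (assumption, not from the paper).
AVERAGE_RESIDUE_MASS_DA = 110

# Protein lengths (aa) as stated in Section 3.5.
lengths = {"TSPY-S": 308, "TSPY-L": 294}


def approx_mass_kd(n_residues, residue_mass=AVERAGE_RESIDUE_MASS_DA):
    """Rough molecular mass in kD estimated from the residue count only."""
    return n_residues * residue_mass / 1000


estimates = {name: approx_mass_kd(n) for name, n in lengths.items()}
difference = estimates["TSPY-S"] - estimates["TSPY-L"]

for name, mass in estimates.items():
    print(f"{name}: {lengths[name]} aa, roughly {mass:.1f} kD")
print(f"Estimated difference: roughly {difference:.1f} kD")
```

The estimate comes out slightly below the sequence-based values given in the text, but it shows the same direction and a similar order of magnitude for the size difference between the two variants.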

4. Discussion

In this study we report on the identification and characterization of an alternative splice variant of TSPY,


called TSPY-L, in the human prostate cancer cell line LNCaP, in human testis and, at a weak level, in human prostate tissue. The variant transcript, TSPY-L, described in this manuscript was detected as a fainter band of slightly lower mobility after RT-PCR using primers (JA1005/JA1006) specific for the 3′ end of the common, now renamed TSPY-S sequence (Schnieders et al., 1996; Acc. No. U58096). The TSPY-L transcripts described here are of low abundance compared to TSPY-S, whose transcription rate should be between 3- and 9-fold higher. Detailed sequence analysis revealed that TSPY-L corresponds to a previously published sequence (Zhang et al., 1992), but extended further in the 5′ direction. Sequence alignments of TSPY-S and -L cDNAs with genomic TSPY sequences demonstrated that TSPY-L should be generated by alternative splicing within intron 4. This is strongly supported by the findings that: (a) a search of the genome database does not detect genomic TSPY sequences which lack this insertion of the TSPY-L type; and that (b) at a minimum the first 200 nucleotides of the 3′-untranslated region of both variants are identical and present in at least nine different genomic TSPY copies analysed. Usually the 3′-untranslated regions are variable and represent unique 'fingerprints' of a gene which help, e.g., to discriminate between different members of a gene family and to analyse the expression pattern of a specific gene. However, it should be mentioned that from a very theoretical point of view one cannot exclude the existence of a so far unknown genomic TSPY copy with altered intron 4 sequences, despite the lack of any evidence from experiments and the human genome database. Using a different splice acceptor site alters the open reading frame and thus leads to a different C-terminus.
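
Why moving the acceptor site by 11 bp changes the reading frame can be shown with simple arithmetic: 11 is not a multiple of three, so every codon downstream of the splice junction is read in a shifted frame. The sketch below uses short, purely hypothetical sequences (they are not TSPY sequences) and illustrative function names.

```python
def codons(seq):
    """Split a sequence into complete codons, dropping a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical toy sequences, for illustration only.
upstream_exon = "ATGGCC"        # two complete codons, in frame
retained_11_nt = "TTTCTCTGCAG"  # stands in for the 11 nt included in the long variant
downstream_exon = "GATTACAAAGGG"

short_transcript = upstream_exon + downstream_exon
long_transcript = upstream_exon + retained_11_nt + downstream_exon

frame_shift = len(retained_11_nt) % 3  # 11 % 3 == 2, so the frame moves by two positions

print("frame shift:", frame_shift)
print("short:", codons(short_transcript))
print("long: ", codons(long_transcript))
```

In the short transcript the downstream exon is read as GAT TAC AAA GGG, whereas in the long transcript the same nucleotides are read in a different frame (AGG ATT ACA AAG), which is how a longer transcript can still encode a different and shorter C-terminus once a new stop codon is reached.
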
TSPY-L is a slightly longer (11 bp) transcript with a C-terminus of only 20 aa (total size of 294 aa), while the shorter transcript of TSPY-S codes for a 14 aa longer C-terminus of 34 aa (total size of 308 aa). The calculated molecular weights of the deduced proteins vary by almost 2 kD (TSPY-S: 35.1 kD; TSPY-L: 33.3 kD) in their native form, as can be shown for the GST-fusion proteins (Fig. 5), which do not undergo any post-translational modification. As the C-terminal amino acid sequences differ significantly one has to expect different properties of those two isoforms. The remaining question concerned the functional expression of TSPY-L on the protein level. In the light of the accumulation and existence of transcribed TSPY pseudogenes, a TSPY-L specific antiserum (α-1026) was used to prove that TSPY-L is translated into an amino acid sequence. The TSPY-L specific expression pattern was characterized as that of a functional isoform by various immunoblot experiments, both on GST fusion proteins of the TSPY-L and -S type and on cellular protein lysates of LNCaP cells and human testis, and subsequently compared with the patterns of antiserum α-839 (Schnieders et al., 1996) directed against the TSPY-S/-L common N-terminal part

of the protein. However, the observation of an inversion in the expected size in cellular protein extracts, namely that TSPY-L is larger than TSPY-S, suggests that the pattern of post-translational modifications should have different effects on the TSPY proteins, which will have to be answered in future experiments concerning their cellular processing. So far, hard data on the nature of posttranslational modifications and the detailed function of TSPY have not been available at all. Concerning the molecular aspects of this alternative splice process it is interesting to note, that a detailed analysis identified four potential YRAS (pyrimidine-rich region acceptor site) motifs (Moore, 2000) within intron 4 (Fig. 2). While the YRAS-2 motif, which gives rise to TSPY-L transcripts, and the YRAS1 motif, which gives rise to the more frequently transcribed TSPY-S variant, show the same degree of homology to the consensus sequence (11/15 nucleotides), the YRAS-1 motif (11/15) is the stronger binding site, which should be due to the immediately 50 located pyrimidine residue of the classical AG motif, which is of some importance for splice site recognition and only present in this motif (Chen et al., 2000). Other studies, e.g. on transcriptional regulation and alternative splicing were able to demonstrate that potent repressor or activator regions within the C-terminus (Norris and Kern, 2001) or within the promotor region (Pecci et al., 2001) might account for significant differences in the expression of alternative splice products. Alternative splicing is not a rare event and has been described for various genes and cell types. However, on this occasion we do not just have two different transcripts and one gene, but two different transcripts (TSPY-S, -L) and nine almost identical (approx. 98%) genes. The remaining question is whether a certain member of this set of genes could be responsible for this alternative transcript, TSPY-L. 
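The isoform sizes quoted earlier in this section can be cross-checked with simple arithmetic. The sketch below uses only the figures given in the text; the function and variable names are illustrative and not part of the original work. It assumes that the 11 bp length difference lies within the coding part of the transcript, which the text implies but does not state in these terms. Subtracting each isoform-specific C-terminus from the total length should leave the same shared N-terminal part, and an 11 bp difference, not being a multiple of three, is consistent with a change of reading frame.

```python
# Figures quoted in the text above; nothing here is measured or new.
ISOFORMS = {
    "TSPY-S": {"total_aa": 308, "c_term_aa": 34, "calc_kd": 35.1},
    "TSPY-L": {"total_aa": 294, "c_term_aa": 20, "calc_kd": 33.3},
}
EXTRA_BP_IN_TSPY_L = 11  # length difference between the two transcripts


def shared_n_terminal_aa(isoform):
    """Residues left once the isoform-specific C-terminus is subtracted."""
    return isoform["total_aa"] - isoform["c_term_aa"]


def shifts_reading_frame(extra_bp):
    """An insertion changes the frame unless its length is a multiple of 3.

    Assumption: the extra nucleotides sit inside the coding sequence.
    """
    return extra_bp % 3 != 0


if __name__ == "__main__":
    for name, iso in ISOFORMS.items():
        print(name, "shared N-terminal part:", shared_n_terminal_aa(iso), "aa")
    kd_gap = ISOFORMS["TSPY-S"]["calc_kd"] - ISOFORMS["TSPY-L"]["calc_kd"]
    print("calculated mass difference:", round(kd_gap, 1), "kD")
    print("11 bp shifts the frame:", shifts_reading_frame(EXTRA_BP_IN_TSPY_L))
```

Both isoforms give the same shared length, which fits the use of a single antiserum against the common N-terminal part alongside an isoform-specific one.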
Using the haplotype approach, a major discrepancy between the haplotypes of the cDNAs of the S- and L-variants could be shown. While the G-G-18 haplotype is the major one for TSPY-S transcripts (44%), it could not be detected in any TSPY-L transcript; these almost exclusively (89%) show a G-C-18 haplotype. The lack of TSPY-L transcripts with a G-G-18 haplotype was confirmed in another independent experimental approach. Thus, we postulate that the two identified TSPY genes with a G-C-18 haplotype (TSPY-3, -5; Table 2) are the major source of TSPY-L transcripts. On the other hand, these data also indicate that the TSPY gene with a G-G-18 haplotype (TSPY-1, contig NT_011896) should account for almost one third of all transcripts. This is quite surprising given the estimated number of copies: nine functional genes and at least five additional pseudogenes as deduced from the currently available genomic sequence contigs of the human genome sequencing project (Table 2), and between 20 and 40 copies of TSPY genes and pseudogenes as derived from hybridization data (reviewed by Vogel and Schmidtke, 1998). TSPY-1 is the only functional gene within its cluster and maps at least 3 Mb apart from the next cluster of functional TSPY genes with the remaining haplotypes. Sequence alignments of the genomic sequences of all identified TSPY genes revealed variation only in the promoter region of TSPY-1, while all of the intron sequences and at least 500 bp of the 3′-untranslated region were almost identical, suggesting that regulatory elements in the promoter region might somehow influence the alternative splicing pattern. Future work will have to compare and analyse in detail the promoter sequences of the various TSPY copies and, above all, test for the cellular function of the TSPY-L protein.
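One way to read the estimate that the G-G-18 gene accounts for almost one third of all transcripts is as a weighted share. The sketch below is an illustration under stated assumptions, not the authors' calculation: it takes the 44% G-G-18 fraction among TSPY-S transcripts, assumes no G-G-18 among TSPY-L transcripts, and weights by the 3- to 9-fold excess of TSPY-S over TSPY-L quoted in this section. The function name and parameters are hypothetical.

```python
def share_of_all_transcripts(fraction_in_s, s_to_l_ratio, fraction_in_l=0.0):
    """Fraction of the combined TSPY-S + TSPY-L pool carrying a haplotype.

    fraction_in_s -- haplotype fraction among TSPY-S transcripts
    s_to_l_ratio  -- how many times more abundant TSPY-S is than TSPY-L
    fraction_in_l -- haplotype fraction among TSPY-L transcripts
                     (assumed 0 for G-G-18, since none was detected)
    """
    s_share = s_to_l_ratio / (s_to_l_ratio + 1.0)  # TSPY-S part of the pool
    l_share = 1.0 - s_share                        # TSPY-L part of the pool
    return fraction_in_s * s_share + fraction_in_l * l_share


if __name__ == "__main__":
    for ratio in (3, 9):
        share = share_of_all_transcripts(0.44, ratio)
        print(f"TSPY-S {ratio}-fold over TSPY-L: G-G-18 share = {share:.3f}")
```

Under these assumptions the share lies between 0.33 and roughly 0.40 of all transcripts, with the lower bound matching the "almost one third" stated in the text.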

Acknowledgements

The authors would like to thank Prof. U. Langenbeck for his support and the ‘Deutsche Forschungsgemeinschaft’ for their financial contribution (DFG-AR224/4-1).

References

Arnemann, J., Jakubiczka, S., Thüring, S., Schmidtke, J., 1991. Cloning and sequence analysis of a human Y chromosome-derived testicular cDNA, TSPY. Genomics 11, 108–114.
Chai, Z., Sarcevic, B., Mawson, A., Toh, B.-H., 2001. SET-related cell division autoantigen-1 (CDA1) arrests cell growth. J. Biol. Chem. 276, 33665–33674.
Chen, S., Anderson, K., Moore, M.J., 2000. Evidence for a linear search in bimolecular 3′ splice site AG selection. Proc. Natl. Acad. Sci. USA 97, 593–598.
Chomczynski, P., Sacchi, N., 1987. Single-step method of RNA isolation by guanidinium thiocyanate-phenol chloroform extraction. Anal. Biochem. 162, 156–159.
Dasari, V.K., Goharderakhshan, R.Z., Perinchery, G., Li, L.C., Tanaka, Y., Alonzo, J., Dahiya, R., 2001. Expression analysis of Y chromosome genes in human prostate cancer. J. Urol. 165, 1335–1341.
Dechend, F., Williams, G., Skawran, B., Schubert, S., Krawczak, M., Tyler-Smith, C., Schmidtke, J., 2000. TSPY variants in six loci on the human Y chromosome. Cytogenet. Cell Genet. 91, 67–71.
Estanyol, J.M., Jaumot, M., Casanovas, O., Rodriguez-Vilarrupla, A., Agell, N., Bachs, O., 1999. The protein SET regulates the inhibitory effect of p21Cip1 on cyclin E-cyclin-dependent kinase 2 activity. J. Biol. Chem. 274, 33161–33165.
Hildenbrand, R., Schröder, W., Brude, E., Schepler, A., König, R., Stutte, H.J., Arnemann, J., 1999. Detection of TSPY-protein in unilateral microscopic gonadoblastoma of a Turner mosaic patient with a Y-derived marker chromosome. J. Pathol. 189, 623–626.
Lahn, B.T., Page, D.C., 1997. Functional coherence of the human Y chromosome. Science 278, 675–680.

Lau, Y.F., 1999. Gonadoblastoma, testicular and prostate cancers and the TSPY gene. Am. J. Hum. Genet. 64, 921–927.
Lau, Y.F., Zhang, J., 2000. Expression analysis of 31 Y chromosome genes in human prostate cancer. Mol. Carcinog. 27, 308–321.
Lau, Y.F., Chou, P.M., Iezzoni, J.C., Alonzo, J.A., Kömüves, L.G., 2000. Expression of a candidate gene for the gonadoblastoma locus in gonadoblastoma and testicular seminoma. Cytogenet. Cell Genet. 91, 160–164.
Li, M., Makkinje, A., Damuni, Z., 1996. The myeloid leukemia-associated protein SET is a potent inhibitor of protein phosphatase 2A. J. Biol. Chem. 271, 11059–11062.
Manz, E., Schnieders, F., Brechlin, A.M., Schmidtke, J., 1993. TSPY-related sequences represent a microheterogeneous gene family organized as constitutive elements in DYZ5 tandem repeat units on the human Y chromosome. Genomics 17, 726–731.
Moore, M.J., 2000. Intron recognition comes of AGe. Nat. Struct. Biol. 7, 14–16.
Nagata, K., Kawase, H., Handa, H., Yano, K., Yamasaki, M., Ishimi, Y., Okuda, A., Kikuchi, A., Matsumoto, K., 1995. Replication factor encoded by a putative oncogene, set, associated with myeloid leukemogenesis. Proc. Natl. Acad. Sci. USA 92, 4279–4283.
Norris, R.A., Kern, M.J., 2001. The identification of Prx1 transcription regulatory domains provides a mechanism for unequal compensation by the Prx1 and Prx2 loci. J. Biol. Chem. 276, 26829–26837.
Pecci, A., Viegas, L.R., Baranao, J.L., Beato, M., 2001. Promoter choice influences alternative splicing and determines the balance of isoforms expressed from the mouse bcl-X gene. J. Biol. Chem. 276, 21062–21069.
Ratti, A., Stuppia, L., Gatta, V., Fogh, I., Calabrese, G., Pizzuti, A., Palka, G., 2000. Characterization of a new TSPY gene family member in Yq (TSPYq1). Cytogenet. Cell Genet. 88, 159–162.
Schempp, W., Binkele, A., Arnemann, J., Gläser, B., Ma, K., Taylor, K., Toder, R., Wolfe, J., Zeitler, S., Chandley, A.C., 1995. Comparative mapping of YRRM and TSPY sequences in man and hominoid apes. Chromosome Res. 3, 227–234.
Schnieders, F., Dörk, T., Arnemann, J., Vogel, T., Werner, M., Schmidtke, J., 1996. Testis-specific protein, Y-encoded (TSPY) expression in testicular tissues. Hum. Mol. Genet. 5, 1801–1807.
Shikama, N., Chan, H.M., Krstic-Demonacos, M., Smith, L., Lee, C.W., Cairns, W., La Thangue, N.B., 2000. Functional interaction between nucleosome assembly proteins and p300/CREB-binding protein family coactivators. Mol. Cell Biol. 20, 8933–8943.
Tyler-Smith, C., Taylor, L., Müller, U., 1988. Structure of a hypervariable tandemly repeated DNA sequence in the short arm of the human Y chromosome. J. Mol. Biol. 103, 837–848.
Vogel, T., Schmidtke, J., 1998. Structure and function of TSPY, the Y-chromosome gene coding for the ‘testis-specific protein’. Cytogenet. Cell Genet. 80, 209–213.
von Lindern, M., van Baal, S., Wiegant, J., Raap, A., Hagemeijer, A., Grosveld, G., 1992. CAN, a putative oncogene associated with myeloid leukemogenesis, may be activated by fusion of its 3′ half to different genes: characterization of the set gene. Mol. Cell Biol. 12, 3346–3355.
Zhang, J.S., Yang-Feng, T.L., Müller, U., Mohandas, T.K., de Jong, P.J., Lau, Y.F., 1992. Molecular isolation and characterization of an expressed gene from the human Y chromosome. Hum. Mol. Genet. 1, 717–726.