Gene 324 (2004) 117 – 127 www.elsevier.com/locate/gene
TDPOZ, a family of bipartite animal and plant proteins that contain the TRAF (TD) and POZ/BTB domains
Chiu-Jung Huang a, Chung-Yung Chen b, Huang-Hui Chen a, Shih-Feng Tsai b,c, Kong-Bung Choo a,*
a Department of Medical Research and Education, Taipei Veterans General Hospital, 201 Shih Pai Road, Section 2, Shih Pai, Taipei 11217, Taiwan
b Division of Molecular and Genomic Medicine, National Health Research Institutes, Nankang, Taipei 115, Taiwan
c Institute of Genetics and Genome Research Center, National Yang Ming University, Taipei 112, Taiwan
Received 10 June 2003; received in revised form 1 September 2003; accepted 16 September 2003
Received by G. Pesole
Abstract
We have previously reported a gene, Tdpoz1 (previously called 2cpoz56), that is temporally expressed in unfertilized eggs and in early embryos of the mouse. The putative TDPOZ1 protein carries a tumor necrosis factor receptor-associated factor (TRAF) domain (TD) and a POZ/BTB domain. By analyzing nine bacterial artificial chromosome (BAC) clones, we have uncovered four more Tdpoz1 homologs in the mouse genome, designated Tdpoz2 through Tdpoz5. Tdpoz1 and Tdpoz2 are found 30 kb apart in a fully sequenced BAC clone (GenBank accession number AF545858). The genes are intronless in the coding region and each carries an intron in the 5′-untranslated region, as in other early embryonic genes. The Tdpoz gene cluster is mapped on chromosome 3 at 3F2.1–2.2. RT-PCR experiments and a search of expressed sequence tag (EST) databases show that the Tdpoz1–5 genes are transcribed in early embryos, particularly at the two-cell stage. Exhaustive database searches have further uncovered three more mouse Tdpoz homologs in chromosomes 3 and 11 and 25 other Tdpoz-like orthologs in the genomes of other animal and plant species including human, rat, C. elegans, Drosophila, Arabidopsis and rice. In the rat genome, eight rat Tdpoz genes are found as a cluster in chromosome 2. Hence, TDPOZ proteins form a new protein family on the basis of similar protein domain organization. Based on reported characteristics of known TD- and POZ-bearing proteins, we speculate that TDPOZ proteins may be nuclear scaffold proteins, probably involved in transcription regulation in early development and other cellular processes.
© 2003 Elsevier B.V. All rights reserved.
Keywords: TRAF domain; POZ/BTB domain; TDPOZ protein family; Pre-implantation embryos
Abbreviations: BAC, bacterial artificial chromosome; BLAST, Basic Local Alignment Search Tool; BLAT, BLAST-like Alignment Tool; EST, expressed sequence tag; MATH, meprin and TRAF homology domain; SMART, Simple Modular Architecture Research Tool; TD, TRAF domain; TDPOZ, TD and POZ domain containing protein; TEFs, TD-encompassing factors; TNF, tumor necrosis factor; TRAF, TNF receptor-associated factor; RTDPOZ, rat TDPOZ; UTR, untranslated region.
* Corresponding author. Tel.: +886-2-2875-7400; fax: +886-2-28721312. E-mail address: [email protected] (K.-B. Choo).
0378-1119/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2003.09.022

1. Introduction
We have previously described a gene called Tdpoz1 (previously designated as 2cpoz56) that is temporally transcribed in unfertilized eggs and in pre-implantation embryos (Choo et al., 2001). Tdpoz1 mRNA is undetectable in later-stage fetuses or in any of the adult tissues analyzed. Conceptual translation of the Tdpoz1 cDNA sequence (GenBank accession number AF290198) indicates that the putative TDPOZ1 protein is 365 residues long with a tumor necrosis factor (TNF) receptor-associated factor (TRAF) domain (TD) and a POZ/BTB domain in the amino and carboxyl portions of the protein, respectively (Fig. 1). TD and POZ domains are frequently found in zinc finger proteins but rarely associate with each other. To date, the only reported protein with a similar domain organization is the nuclear speckle-type protein, SPOP (GenBank accession number BAB68542). SPOP is a ubiquitous protein that exhibits a speckled staining pattern in the nuclei of transfected cells; the protein colocalizes with the spliceosomal protein snRNP B′/B (Nagai et al., 1997). SPOP is evolutionarily conserved and homologs have been found in mouse, human, C. elegans
C.-J. Huang et al. / Gene 324 (2004) 117–127
Fig. 1. The cDNA sequence of the mouse Tdpoz1/2cpoz56 gene (GenBank accession number AF290198). Conceptual translation of the coding sequence is shown below the nucleotide sequence. Other features shown include demarcation of exons 1 and 2, two potential initiation codons (boxed), the TD and POZ domains and the putative polyadenylation signal (underlined).
and Drosophila (GenBank accession nos. BAB68542, CAA04199, P34568 and AAF55007, respectively).

TRAFs form a family of proteins that bind to members of the TNF receptor protein family, and each carries a conserved domain called the TRAF domain (TD) (Rothe et al., 1994). Diversity of TD-encompassing factors (TEFs) (Zapata et al., 2001) is achieved by combining TD with other functional domains such as the C2H2 and RBCC-type zinc fingers, POZ and ubiquitin-specific protease domains. There are two major phylogenetic groups of TEF members: Group I includes the classical TRAFs with the TD located at the carboxyl terminus; Group II TEFs carry TD internally or at the amino terminus, as in TDPOZ1 and SPOP. Phylogenetic analysis has clearly indicated a long evolutionary history for the TRAF domain. Meprins, the extracellular metalloproteinases, also contain a TD-like domain. In its first report, the TD-like domain in meprin was called MATH (for meprin and TRAF homology) (Uren and Vaux, 1996).

POZ/BTB is a hydrophobic domain originally found at the amino end of many C2H2-type zinc finger proteins (Bardwell and Treisman, 1994; Zollman et al., 1994). It has been estimated that 5–10% of human zinc finger proteins are POZ-bearing proteins (Collins et al., 2001). The POZ domain is structurally folded into alternating α-helices with intervening β-sheets (Bardwell and Treisman, 1994; Collins et al., 2001). Most of the conserved residues in the POZ domain are hydrophobic, reflecting the importance of conserving the POZ conformation. POZ domains are involved in oligomerization (Bardwell and Treisman, 1994; Hoatlin et al., 1999); dimerization of POZ-containing zinc finger proteins is thought to be essential for correct protein conformation (Melnick et al., 2000). The POZ domain in PLZF inhibits DNA-binding in vitro and thus acts as a transcriptional repressor (Melnick et al., 2000). Chromosomal translocations of the POZ-containing PLZF and LAZ/Bcl6 genes (Kerckaert et al., 1993; Ye et al., 1993) result in novel POZ fusion proteins that act in a dominant negative way by sequestering their fusion partners into inactive multimeric complexes (Bardwell and Treisman, 1994). The full potential of the regulatory role of the POZ domain in different proteins remains to be realized.

In this work, we report the discovery of seven other Tdpoz1 homologs in the mouse genome and show that Tdpoz-like orthologs exist in other species in the animal and the plant kingdoms. We now call this novel protein family TDPOZ, for TD and POZ domain containing proteins.
2. Materials and methods
2.1. Screening of the mouse BAC library
For the derivation of Tdpoz1-containing genomic clones, we screened the mouse bacterial artificial chromosome (BAC) DNA Pools (Release II) and associated superpools obtained from Research Genetics (Huntsville, AL) by PCR using the primer pair CZ56-6 and CZ56-R6 (Table 1) that
covered the coding region of Tdpoz1 as previously described (Chen et al., 2002a,b; Choo et al., 2002). Consistently positive clones were located and were purchased from Research Genetics.

2.2. Southern blot analysis
Mouse genomic DNA was prepared from tail biopsies using the standard phenol–chloroform extraction protocol (Sambrook et al., 1989). For each digest, 15 µg of genomic DNA or 0.5 µg of BAC DNA was used. Hybridization was performed in the ExpressHyb hybridization solution (Clontech, Palo Alto, CA). Hybridization probes were 32P-labeled using the Rediprime II random primed labeling system (Amersham Pharmacia Biotech, Piscataway, NJ) (Sambrook et al., 1989).

2.3. Subcloning, sequence analysis and database search
Subcloning of BAC clones was performed using the pGEM7 or pGEM-TEasy vector. Sequencing reactions were performed using the BigDye terminator cycle sequencing kit (ABI, Foster City, CA) according to the user's manual; the reaction products were analyzed in an ABI377 automatic sequencer. Tdpoz homologs were derived by alignment of the Tdpoz1 coding sequence against GenBank databases (http://www.ncbi.nlm.nih.gov) using the BLASTN and BLASTX algorithms, retaining matches with E < 10⁻⁵ over 60% of the query length. Homologs of potential interest were further identified using the BLASTP search program. TD/MATH and POZ/BTB domains were identified and verified using the SMART algorithms (http://smart.embl-heidelberg.de). For in silico expression profiling, the GenBank mouse expressed sequence tag (EST) database and the Riken FANTOM2 database (http://fantom2.gsc.riken.go.jp) were accessed. Multiple alignments for the TDPOZ proteins were generated with the CLUSTALW program with default settings using the Vector NTI software (Informax, Frederick, MD). Phylogenetic trees were generated from alignments using the CLUSTAL program included in the DNAStar (Madison, WI) packaged software.

Table 1
Oligonucleotides used in this study
Oligonucleotide   Sequence
CZ56-6            5′-CCAAGAGCAACAAACAACAACGTC-3′
CZ56-R6           5′-TCCTTTCCTCTGACCAGTTATCTG-3′
Tdpoz1            5′-CCTAGTTTTGCTCAGCCATCTG-3′
Tdpoz1R           5′-ATAGTAGGCAATGAAATCAAGG-3′
Tdpoz2            5′-GAGAAAGCAAAACTGGGGATTCAG-3′
Tdpoz2R           5′-GACTTCCACCCAGAGCTTTT-3′
Tdpoz3            5′-GGGCACATCAGAAAAGTATTACC-3′
Tdpoz3R           5′-GGCATGTCAAATACGGTTCCA-3′
Tdpoz4            5′-TTGTTGAGGAAACGGAGGAAT-3′
Tdpoz4R           5′-AATGCTGAAGGAGTCTTGG-3′
Tdpoz5            5′-TGTGTTAAGGAATTCTGCTATGTG-3′
Tdpoz5R           5′-GAGAAACCGACGTGGATGAGAGAT-3′
mG6pdF1           5′-TGAGGGTCGTGGGGGCTATTTTGA-3′
mG6pdR1           5′-GCATCAGGGAGCTTCACATTCTTG-3′
Hsp70.1F          5′-GAAGGTGCTGGACAAGTGC-3′
Hsp70.1R          5′-GCCAGCAGAGGCCTCTAATC-3′
mβ-actin          5′-CCCTAAGGCCAACCGTGAAAAGAT-3′
mβ-actinR         3′-ACCGCTCGTTGCCAATAGTGATGA-3′
PCR product sizes (bp): 343, 730, 648, 376, 374, 369, 563, 293, 431
2.4. BAC shotgun sequencing
BAC DNA was prepared from cell culture using standard alkaline lysis procedures (Sambrook et al., 1989). BAC shotgun libraries were constructed as previously described (Boysen et al., 1997). Approximately 10-fold sequencing coverage was achieved using the BigDye terminator cycle sequencing kit for end sequences of a shotgun library with 2.5–3.5-kb inserts. Sequences were jointly assembled using the Phred/Phrap/Consed software (obtained from University of Washington, Seattle) as described (Ewing et al., 1998; Gordon et al., 1998). Accuracy, order and orientation of contigs were examined on the basis of linking information from forward and reverse sequence ends of each clone. Sequence gaps were closed by editing the end sequences of each contig, by primer walking on linking clones and by sequencing PCR products from the BAC DNA. Restriction mapping was used to confirm the accuracy of the assembled BAC sequences.

2.5. Embryo collection, RNA preparation, RT-PCR and α-amanitin inhibition analysis
Collection of unfertilized eggs and pre-implantation embryos from hormone-induced superovulating female mice, RNA preparation and RT-PCR were carried out as described (Chen et al., 2002b; Choo et al., 2001, 2002). Primers specific for each of the Tdpoz1–5 genes and the genes used as RT-PCR controls are listed in Table 1. For the α-amanitin inhibition experiment, 50 pronuclear one-cell zygotes were cultured overnight in the absence or presence of 50 µg/ml α-amanitin to the two-cell stage (Worrad et al., 1994). RNA was then prepared and RT-PCR detection of transcripts of the target genes was performed.
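The homolog-search criterion of Section 2.3 (E < 10⁻⁵ over 60% of the query length) can be illustrated with a minimal Python sketch. It is not the procedure used in this study: it assumes hits are available as 12-column tab-separated BLAST output, applies the coverage test to each hit separately, and uses hypothetical subject names (hypA, hypB, hypC) with made-up values. The query length of 1098 nt is the span of the Tdpoz1 coding sequence (nt 23,383–24,480 of 56BAC9).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str
    q_start: int
    q_end: int
    evalue: float


def parse_tabular(text: str) -> list[Hit]:
    """Parse 12-column tab-separated BLAST output (column order is an assumption:
    query, subject, %identity, length, mismatches, gaps, qstart, qend, sstart,
    send, evalue, bitscore)."""
    hits = []
    for line in text.strip().splitlines():
        f = line.split("\t")
        hits.append(Hit(f[1], int(f[6]), int(f[7]), float(f[10])))
    return hits


def passes(hit: Hit, query_len: int, max_e: float = 1e-5, min_cov: float = 0.60) -> bool:
    """Keep a hit if E < max_e and the aligned query span covers >= min_cov of the query."""
    covered = abs(hit.q_end - hit.q_start) + 1
    return hit.evalue < max_e and covered / query_len >= min_cov


QUERY_LEN = 1098  # Tdpoz1 coding sequence, nt 23,383-24,480 of 56BAC9

# Hypothetical hits, for illustration only.
rows = [
    ("Tdpoz1", "hypA", "79.2", "1095", "228", "0", "1", "1095", "1", "1095", "1e-120", "450"),
    ("Tdpoz1", "hypB", "85.0", "300", "45", "0", "1", "300", "1", "300", "1e-40", "160"),
    ("Tdpoz1", "hypC", "60.0", "900", "360", "0", "100", "999", "1", "900", "0.002", "40"),
]
EXAMPLE = "\n".join("\t".join(r) for r in rows)

kept = [h.subject for h in parse_tabular(EXAMPLE) if passes(h, QUERY_LEN)]
print(kept)  # hypB fails on coverage, hypC fails on E value
```

A fuller treatment would merge several alignments to the same subject before computing coverage; the per-hit test here is a simplification.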
3. Results
3.1. Derivation and sequence analysis of the Tdpoz1/2cpoz56 and Tdpoz2 genes
The previously derived cDNA of the Tdpoz1/2cpoz56 gene is 1885 bp long with an open reading frame (ORF) encoding a putative protein of 365 residues (Fig. 1). Two adjacent potential initiation codons are noted. The cDNA carries a 144-bp 5′-untranslated region (5′-UTR) and a 631-bp 3′-UTR. When the putative TDPOZ1 protein sequence
Fig. 2. Southern blot analysis of the mouse Tdpoz genes. (a) Mouse genomic DNA was digested with different restriction enzymes and was probed with a 399-bp 3′-UTR fragment of the Tdpoz1 cDNA. (b) Hybridization analysis of the nine Tdpoz BAC clones, digested with HindIII, using a probe that covered the entire Tdpoz1 coding region. Numerical designation of the BAC clones is shown above the autoradiogram. The two proposed overlapping contigs, A and B, are also shown.
is submitted to a SMART algorithm search to identify conserved protein domains, the TD/MATH and POZ/BTB domains are found (Fig. 1). To determine the copy number of the Tdpoz1 gene, mouse genomic DNA digested with different restriction enzymes was used in Southern blot hybridization using the Tdpoz1 sequence as a probe. In all cases of restriction digestion, multiple bands were observed (Fig. 2a) suggesting the existence of multiple Tdpoz1 homologs. A mouse BAC genomic library was next screened for the Tdpoz1 gene by PCR using primers that covered the coding sequence of Tdpoz1. A total of nine positive clones were obtained. Southern blot analysis of the BAC clones using a probe that embraced the entire Tdpoz1 coding region
confirmed that all the BAC clones carried Tdpoz1-like sequences (Fig. 2b). It could be deduced from the Southern blots that BAC clones 8 and 9 are almost identical, and that BAC clones 1, 2, 4, 8 and 9 form an overlapping contig A and BAC clones 3, 5, 6 and 7 probably constitute another contig B. Computer-assisted analysis of the autoradiogram displayed in Fig. 2b and those obtained with different restriction digestions indicates that the two contigs are probably linked (data not shown). A battery of primers based on different segments of the Tdpoz1 cDNA sequence was further used to probe the BAC clones by PCR. BAC clones 8 and 9 were consistently found to be positive for all the Tdpoz1 primers tested.

The entire BAC clone 9 was subjected to sequence analysis; the sequence obtained is designated as 56BAC9 (GenBank accession number AF545858). 56BAC9 is 124,933 bp long. The uninterrupted ORF of the Tdpoz1 gene is located at nucleotides (nt) 23,383–24,480 of the 56BAC9 sequence (Fig. 3). The uninterrupted ORF of a second Tdpoz1-like gene, designated as Tdpoz2, is found 30 kb downstream of Tdpoz1, at nt 54,975–56,069 (Fig. 3). The Tdpoz1 and Tdpoz2 genes share 79.2% nucleotide sequence identity in the coding region. The discovery of the homologous Tdpoz2 gene in the 56BAC9 sequence is consistent with the observation of multiple Tdpoz1-like bands in the Southern blots (Fig. 2) described above. Using 56BAC9 as a probe, we have mapped the Tdpoz1/Tdpoz2 gene pair on chromosome 3 at 3F2.1–2.2 (data not shown).

When the previously derived Tdpoz1 cDNA sequence is aligned against the 56BAC9 sequence, we find that the cDNA sequence is composed of two exons disrupted only once by a 4673-bp intron in the 5′-UTR close to the 5′-terminus of the cDNA (Figs. 1 and 3). The 74-bp exon 1 sequence is located between nt 18,580 and 18,653 of the 56BAC9 sequence (Fig. 3). Exon 2 of the Tdpoz1 gene carries the rest of the 5′-UTR, the coding sequence and the entire 3′-UTR (Fig. 3). It is noteworthy that in the course of examination of the Tdpoz1 genomic sequence, we have identified three polymorphic alleles of Tdpoz1 in the mouse strain 129sv (from which the BAC clones were derived), the C57BL6/DBA2 hybrid (from which the Tdpoz1/2cpoz56 cDNA originated) and in the outbred strain ICR (data not shown).

The cDNA of Tdpoz2 has not been derived. However, it is deduced from an alignment against the homologous Tdpoz1 cDNA sequence that the same 74-bp exon 1 sequence is also found in the genomic sequence of Tdpoz2, located between nt 48,524 and 48,597 of the 56BAC9 sequence (Fig. 3). The exact 5′-end of Tdpoz2 is not known. In Tdpoz2, the intron is 6324 bp long and the coding sequence is also uninterrupted. However, the 3′-UTR sequences of the Tdpoz1 and Tdpoz2 genes are dissimilar. In Tdpoz1, an AATAAA polyadenylation signal is located 631 bp downstream of the termination codon. In Tdpoz2, the first discernible AATAAA signal is located 1.78 kb downstream of the coding sequence.

Fig. 3. Genomic structure of Tdpoz1 and Tdpoz2 derived from sequencing of the 56BAC9 clone (Section 3.1). Thick lines indicate the 5′- or 3′-UTR sequences; slanting dashed lines denote intron splicing events; white boxes house the coding sequences (cds). Nucleotide positions of exons 1 and 2 and the intron in each of the genes are indicated based on the sequence for the entire 56BAC9 clone (GenBank accession number AF545858). pA denotes the putative AATAAA polyadenylation signal.

Table 2
The mouse Tdpoz genes derived by subcloning and sequence analysis of mouse BAC clones
Gene     BAC clone   GenBank accession number (a)   Size (bp)
Tdpoz1   9           AF290198 (b)                   1885
                     AF545858                       124,933
Tdpoz2   9           AF545858                       124,933
Tdpoz3   3           AF545857                       1713
Tdpoz4   7           AY159314                       1623
Tdpoz5   6           AY159315                       1659
(a) Genomic sequences, unless stated, based on the genome of the mouse strain 129sv.
(b) cDNA sequence derived from the mouse strain B6/DBA2 (Choo et al., 2001).
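The coordinates reported for the 56BAC9 sequence in Section 3.1 can be cross-checked with simple arithmetic. The Python sketch below is only a consistency check and assumes that the quoted ORF coordinates include the termination codon; under that assumption the Tdpoz1 ORF reproduces the 365 residues stated for TDPOZ1. The helper names are illustrative.

```python
def span_length(start: int, end: int) -> int:
    """Length of an inclusive, 1-based coordinate range."""
    return end - start + 1


def residues_encoded(start: int, end: int) -> int:
    """Residues encoded by an ORF, assuming the range includes the stop codon."""
    n = span_length(start, end)
    if n % 3:
        raise ValueError("ORF length is not a multiple of three")
    return n // 3 - 1


# Coordinates in the 56BAC9 sequence (GenBank AF545858), as given in the text.
TDPOZ1_ORF = (23_383, 24_480)
TDPOZ2_ORF = (54_975, 56_069)
TDPOZ1_EXON1 = (18_580, 18_653)
TDPOZ2_EXON1 = (48_524, 48_597)
TDPOZ1_INTRON = 4673
TDPOZ2_INTRON = 6324

print(span_length(*TDPOZ1_EXON1), span_length(*TDPOZ2_EXON1))        # 74 74
print(residues_encoded(*TDPOZ1_ORF), residues_encoded(*TDPOZ2_ORF))  # 365 364

# Spacing between the two coding regions (the "30 kb" quoted in the text).
print(TDPOZ2_ORF[0] - TDPOZ1_ORF[1] - 1)  # 30494

# First nucleotide of exon 2 = last nucleotide of exon 1 + intron length + 1.
print(TDPOZ1_EXON1[1] + TDPOZ1_INTRON + 1)  # 23327
print(TDPOZ2_EXON1[1] + TDPOZ2_INTRON + 1)  # 54922
```

Both exon 2 start positions fall upstream of the respective ORF starts, in line with the intron lying within the 5′-UTR.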
3.2. Derivation of other Tdpoz1 homologs in the mouse genome
To search for other Tdpoz1 homologs, plasmid sublibraries were generated from the BAC clones and were screened by low-stringency hybridization using Tdpoz1 as a probe. Three more Tdpoz homologs, designated Tdpoz3, Tdpoz4 and Tdpoz5, were derived from BAC clones 3, 7 and 6, respectively, in the form of short plasmid DNA fragments (Table 2). We note that the Tdpoz3–5 genes are derived from the BAC clones that seem to form the same overlapping contig B (Fig. 2b), suggesting that they probably form a tight cluster. Sequencing data show that the coding regions of the Tdpoz3–5 genes are also uninterrupted by introns, as in Tdpoz1 and Tdpoz2. It remains to be determined, when full genomic sequences for these genes are available, whether the Tdpoz3–5 genes carry intron sequences in the 5′- or 3′-UTR.

To integrate our experimental data with the current assembled sequences of the mouse genome and to search for other potential mouse Tdpoz homologs, we used the TDPOZ1 protein sequence to perform a BLAST-like Alignment Tool (BLAT) search against the February 2003 Mouse Sequence Assembly using the UCSC Genome Browser (http://genome.ucsc.edu). Nine hits were obtained, each of which was further subjected to SMART algorithm
Table 3
Putative TDPOZ proteins encoded in the mouse genome derived by BLAT algorithm searches
Protein designation   Size (number of residues)   Domain(s)   Chromosome   Start                     End
POZ-A                 188                         POZ         3            93,896,905                93,897,468
POZ-B                 212                         POZ         3            94,432,331                94,432,984
TDPOZ2                328                         TD/POZ      3            94,454,881 (56,550)(a)    94,460,880 (50,574)(a)
TDPOZ1                347                         TD/POZ      3            94,470,940 (27,712)(a)    94,475,082 (23,580)(a)
TDPOZ6                256                         TD/POZ      3            94,547,807                94,548,644
TDPOZ7                331                         TD/POZ      3            94,571,012                94,572,064
TDPOZ8                322                         TD/POZ      14           102,987,856               102,988,881
SPOP                  407                         TD/POZ      11           96,310,465                96,331,817
Strand (entries shown): +, +, +, +
BLAT searches were conducted using the UCSC Genome Browser using the TDPOZ and SPOP protein sequences as queries against the February 2003 Assembly of the mouse genome sequences. The data tabulated are based on Genscan Gene Predictions of the individual genes revealed by the search. Sequences that did not show discernible protein domains in SMART searches are not included.
(a) Nucleotide position in the 56BAC9 sequence (GenBank accession number AF545858).
Table 4
Percentage identity of the mouse SPOP and TDPOZ protein sequences
          TDPOZ1   TDPOZ2   TDPOZ3   TDPOZ4   TDPOZ5   TDPOZ6   TDPOZ7   TDPOZ8
SPOP      55.2     56.4     53.3     56.6     56.0     57.0     58.5     50.9
TDPOZ1    -        70.4     77.6     60.7     71.3     97.7     98.2     95.3
TDPOZ2    -        -        69.6     65.5     73.3     75.0     74.2     64.6
TDPOZ3    -        -        -        59.8     71.6     79.7     78.5     77.3
TDPOZ4    -        -        -        -        60.7     63.3     63.9     58.7
TDPOZ5    -        -        -        -        -        73.4     70.3     64.0
TDPOZ6    -        -        -        -        -        -        98.8     88.7
TDPOZ7    -        -        -        -        -        -        -        88.2

analysis to identify potential domains. Two of the hits carried only the POZ domain, five hits were bipartite proteins carrying the TD and the POZ domains (Table 3) and the two remaining hits did not reveal any discernible domains and were not further analyzed. The same set of hits consistently appeared when TDPOZ2–5 sequences and the five BLAT-derived TD/POZ protein hits were used as
queries. Subsequent nucleotide alignments suggest that two of the BLAT-derived TD/POZ hits probably correspond to the Tdpoz1 and Tdpoz2 genes described in Section 3.1 above. In each case, a 6-kb block of the 56BAC9 sequence including the intronless coding sequence aligned with >99% homology without interruption with the BLAT-derived data. It is evident that the remaining three
Table 5
Putative TDPOZ proteins in animals and plants derived by BLAST and BLAT algorithm searches
Protein       Accession number(s)   Organism (a)   Size   Location of TD     Location of POZ     Chromosome
TDPOZ1        AF545858, AF290198    Mm             365    24–130             188–287             3
TDPOZ2        AF545858              Mm             364    24–130             188–287             3
TDPOZ3        AF545857              Mm             365    24–130             188–287             3
TDPOZ4        AY159314              Mm             370    24–130             188–287             3
TDPOZ5        AY159315              Mm             340    24–130             188–287             3
TDPOZ6        Chr3_19.29 (b)        Mm             256    5–85               143–231             3
TDPOZ7        Chr3_19.30 (b)        Mm             330    24–118             176–267             3
TDPOZ8        Chr14_20.8 (b)        Mm             322    14–105             152–248             14
MSPOP         BAB68542              Mm             373    35–152             188–285             11
RTDPOZ1       XP227344              Rn             364    24–130             188–287             2
RTDPOZ2       XP227346              Rn             364    32–130             188–187             2
RTDPOZ3       XP227347              Rn             577    24–130             188–287             2
RTDPOZ4       XP227350              Rn             711    24–130, 409–515    188–287, 512–643    2
RTDPOZ5       XP227351              Rn             564    228–334            392–491             2
RTDPOZ6       XP227352              Rn             358    24–130             188–287             2
RTDPOZ7       XP227353              Rn             314    24–130             188–287             2
RTDPOZ8       XP227355              Rn             359    24–131             189–288             2
BAB13937      AK021919              Hs             391    31–164             200–297             2
SPOP          CAA04199              Hs             374    31–164             200–297             17
DSPOP         AAF55007              Dm             377    34–167             203–300             3R
MEL26         AAC63596              Ce             395    41–165             195–296             1
F52H3.3       CAA91323              Ce             387    84–206             227–325             2
C07D10.2      AAC46650              Ce             410    84–185             223–320             2
R52.1         AAB71059              Ce             326    20–127             148–242             2
CESPOP        P34568                Ce             409    53–186             223–326             3
F37A4.9       P41886                Ce             418    46–172             231–332             3
T517.6        AAB87125              At             408    24–161             193–303             2
F28L1.13      AAK76565              At             406    32–169             201–311             3
F20H23.23     AAK32896              At             465    46–183             215–328             3
At3g43700     NP_189956             At             411    34–168             201–317             3
At5g19000     AAM97116              At             407    36–170             203–309             5
F22D1         AAK68819              At             429    28–165             197–312             5
OJ1221_H04    BAC21424              Os             364    53–159             192–309             8
AAD27629.1    AF128457              Os             445    110–245            276–382             10
(a) Organisms listed: Mm, Mus musculus; Rn, Rattus norvegicus; Hs, Homo sapiens; Dm, Drosophila melanogaster; Ce, Caenorhabditis elegans; At, Arabidopsis thaliana; Os, Oryza sativa.
(b) Sequences derived by a BLAT search against the February 2003 Assembly of the mouse genome using TDPOZ1 protein as the query.
Tdpoz homologs from the BLAT search, designated as Tdpoz6 through to Tdpoz8 (Table 3), had escaped our BAC screening (Section 3.1). When the Tdpoz3–5 genes were subjected to the BLAT search, no matches with >95% homology appeared, indicating that the sequences of these genes are not yet in the current mouse sequence assembly. They may lie within the numerous sequencing gaps that are found in the chromosome 3 segment that carries the other Tdpoz genes. Hence, we have shown in this work that there are at least eight Tdpoz homologs in the mouse genome. With the exception of Tdpoz8, which is mapped in chromosome 14, all other Tdpoz genes are found in chromosome 3 in both strands, indicating reversed insertion during the evolution process.

When the SPOP sequence was used as a query in the BLAT search, two hits were obtained, but TD and POZ domains could be identified in only one of the hits, mapping in chromosome 11, identifying that to be the Spop gene (Table 3). It is noted here that when TDPOZ protein sequences were used as queries, SPOP was not among the hits obtained, and vice versa, probably because SPOP and TDPOZ proteins share <60% identity (Table 4). On the other hand, some of the TDPOZs share a sequence identity greater than 95% (Table 4). Despite the fact that the Spop gene is mapped on a different chromosome, chromosome 11, SPOP and the TDPOZ proteins are most likely related in evolutionary terms. Conceptual translation of the discernible ORFs of the Tdpoz and Spop genes indicates that, with the exception of TDPOZ6, which appears to have a very short leader sequence preceding the TD domain, the putative TDPOZ proteins and SPOP are relatively constant in size (320–373 residues), and all the proteins carry the TD and POZ domains in approximately the same relative locations (Table 5, top nine lines). Except for TDPOZ6, the differences in size are found mainly in the carboxyl end of the proteins.

Fig. 4. Transcription profiling of the Tdpoz genes by RT-PCR. (a) Detection of the Tdpoz1–5 transcripts in unfertilized eggs (Eg), two- and eight-cell embryos (2C and 8C), blastocysts (Bl) and testis (Te). The β-actin gene was used as a control. (b) Transcription of the Tdpoz1 gene in the one- to two-cell embryonic stages in the presence (+) or absence (−) of α-amanitin (αAm). The G6pd (maternally transcribed) and Hsp70.1 (zygotically transcribed) genes were included as controls (see text).

3.3. Analysis of transcription of the Tdpoz genes
To determine if the mouse Tdpoz genes are functional, we focused on Tdpoz1–5, the sequences of which were experimentally derived in our laboratory. Since Tdpoz1 has been shown to be expressed in pre-implantation embryos (Choo et al., 2001), expression of Tdpoz2–5 in different stages of pre-implantation development was similarly determined by RT-PCR using Tdpoz gene-specific primers (Table 1). We show that Tdpoz1 mRNA was detected in unfertilized eggs and in early embryonic stages up to the blastocyst stage (Fig. 4a). Tdpoz3–5 expression was noted clearly in the two-cell embryo, but weak signals were also discernible in other stages for Tdpoz3 and 5. Transcription of Tdpoz2 was, however, not evident in any of the embryonic stages analyzed. When the expression profiling was extended to later stages of development and to the major organs of adult mice, weak but consistent RT-PCR signals were detected in the testis using primers for Tdpoz2, 3 and 5 (Fig. 4a). The results indicate that Tdpoz2 may be testis-specific and that other Tdpoz genes are temporally expressed in early embryos, particularly at the two-cell stage.

Table 6
Expressed sequence tags (ESTs) with significant homology with the mouse Tdpoz genes
Expression site           GenBank accession numbers (a)
Two-cell embryos          AA413548, AA545069, AA547104, AA547616, AA619934, AA647935, AA666887, AA794127, AI505909, AI593756, AI645740, BB591899, BB624829, BB709752, BY725364, BY725651
Day 8 whole fetus         AK030746 (b)
Day 11 whole fetus        AK028201 (b)
Day 14–14.5 fetal liver   BQ830220, BQ828064
Colon                     AK033449 (b), AV370075, BB624829
Lung                      AK004669 (b)
Testis                    AK014913 (b)
Skin                      AI561653
Mammary carcinoma         BI685210
(a) Sequences were derived from the GenBank mouse EST database by BLASTN and TBLASTN searches, unless stated otherwise, using the Tdpoz1–5 sequences as queries. Only sequences meeting the E-value threshold of e−50 are listed.
(b) Sequences derived from BLASTN and TBLASTN searches of the RIKEN FANTOM2 database.

Previous studies have established that activation of the zygotic genome is initiated in the late one-cell zygotic stage; transcription of the zygotic genes is in full swing in the two-cell embryo (Bouniol et al., 1995; Schultz, 1993). To elucidate the mode of transcription of the Tdpoz genes using Tdpoz1 as a model, one-cell zygotes were cultured in the presence of the transcription inhibitor, α-amanitin (Worrad et al., 1994; Bevilacqua et al., 2000). On culturing the drug-treated one-cell zygotes to the two-cell stage, transcription of Tdpoz1 and the control G6pd and Hsp70.1 genes was assayed by RT-PCR. G6pd is known to be actively transcribed only in the unfertilized egg, but the highly stable G6pd mRNA persists in the two-cell embryo; initiation of transcription of Hsp70.1, on the other hand, occurs only in the early zygote (Bevilacqua et al., 2000; Wang et al., 2001). Our results show that in the presence of α-amanitin, G6pd mRNA was indeed still detectable in the two-cell embryo (Fig. 4b). As anticipated, Hsp70.1 mRNA was undetectable in the presence of
α-amanitin, confirming zygotic activation of the gene. We also did not detect Tdpoz1 mRNA when the embryos were cultured in the presence of the drug (Fig. 4b), clearly indicating that Tdpoz1 transcription is also zygotically activated. Furthermore, the data also signify that, unlike G6pd, the maternally derived Tdpoz1 mRNA is highly unstable and is degraded by the two-cell stage, as is true for >90% of other maternally derived transcripts (Bachvarova and De Leon, 1980; Paynton et al., 1988; Paynton and Bachvarova, 1994).

To further investigate the expression of the Tdpoz genes, the Tdpoz1–5 sequences were used as queries in exploring the GenBank mouse Expressed Sequence Tag (EST) and the RIKEN FANTOM2 databases using the BLASTN and TBLASTN algorithms. Using an E value of e−50 as the threshold, 16 EST sequences of significant homology with the Tdpoz1–5 genes were found (as of July 31, 2003) in cDNA libraries derived from two-cell mouse embryos (Table 6), consistent with our RT-PCR profiling results that Tdpoz1 and Tdpoz3–5 are specifically expressed in two-cell embryos. One to three Tdpoz sequences were also identified in later stages of development and in the colon, lung, testis and skin of adult mice, and in a case of a
Fig. 5. Phylogenetic tree of the putative TDPOZ proteins described in this work. The length of each pair of branches represents the distance between the sequence pairs; dotted lines indicate a negative branch length from averaging. Details of the proteins and the abbreviations used are as shown in Table 5. The proteins in boxes are those selected for sequence alignment in Fig. 6.
mouse mammary carcinoma-derived cell line. Taken together, we have demonstrated that the Tdpoz genes are functional in early embryo development and are transcribed, albeit at much lower levels, in other developmental stages and in some adult organs.

3.4. Uncovering of Tdpoz-like orthologs in the genomes of other animal and plant species
In an iterated BLAST search of the GenBank database, we have further uncovered 25 other Tdpoz-like genes in the genomes of rat, human, C. elegans, Drosophila, the small flowering dicot plant A. thaliana and the monocot Oryza sativa (rice) (Table 5). In the rat genome, a cluster of eight Tdpoz orthologs is found in the supercontig NW_43535 derived from chromosome 2. Here, we call the rat genes Rtdpoz1–8, and the putative proteins RTDPOZs (Table 5). A close examination of the sequences indicates that the Rtdpoz genes also appear to be intronless in the coding sequences. Of the eight Rtdpoz genes, Rtdpoz4 is unusual in that it would encode a protein with two tandem copies of the bipartite TDPOZ. Unlike all other TDPOZ proteins, each of
which would carry a linking peptide segment between the TD and POZ domains, the second TDPOZ copy in RTDPOZ4 would not have such a linkage. Intriguingly, we have located at least six discernible segments that carry only POZ-coding sequences dispersed among the eight Rtdpoz genes in the same supercontig (data not shown). The Rtdpoz orthologs have most likely been through a complex evolutionary process of retrotransposition, duplication, fusion and possibly truncation. As a result, some of the Rtdpoz genes may be dysfunctional pseudogenes. Our analysis also detected the presence of a single TD-containing segment in another supercontig, NW_042670, derived from chromosome 10 of the rat genome. It remains to be elucidated if this solitary TD segment is evolutionarily related to the RTDPOZ proteins.

It comes as a surprise that despite repeated BLAST and BLAT searches using the latest human genome sequence assembly (as of July 31, 2003), only two apparent human Tdpoz orthologs (domains verifiable by SMART) are found in the human genome. One of the genes is the known SPOP; the other human Tdpoz ortholog has not been described or studied before (Table 5). In the Drosophila genome, the only Tdpoz ortholog
Fig. 6. Alignment of selected TDPOZ proteins (boxed in Fig. 5). Alignment was performed using the MegAlign program of the Lasergene Sequence Analysis Software (DNAStar). Shading of residues was performed using the Quantify Mode of the publicly available GeneDoc software (Nicholas et al., 1997). Darker shading indicates residues that are identical in all the proteins; lighter shading indicates residues that are identical in 80% or more of the proteins aligned.
detected is the putative Spop gene. Half a dozen Tdpoz-like orthologs have been found in the genomes of C. elegans and Arabidopsis, dispersed over different chromosomes. In rice, only two orthologs of Tdpoz are found, but other orthologs may be anticipated when more sequences of the rice genome are made available. A detailed phylogenetic analysis of the TDPOZ proteins will be meaningful only when a wider spectrum of other animal and plant TDPOZ proteins becomes available. Nonetheless, for the purpose of taking a first glimpse of what is to come, a simple phylogenetic tree is generated from an alignment of the TDPOZ protein sequences (Fig. 5). The TDPOZ proteins are sorted into subgroups according to species. Nine representative TDPOZ proteins, including at least one member from each species (boxed in Fig. 5), are randomly selected for a simplified sequence alignment (Fig. 6). We observe that homologous regions are found not only in the TD and POZ domains but throughout the proteins. We further note that the peptide segment linking the TD and the POZ domains in the animal TDPOZ proteins is rather constant, suggesting a relatively conserved POZ domain in the TDPOZ proteins.
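The two-level shading rule stated in the legend of Fig. 6 (darker for residues identical in all aligned proteins, lighter for residues identical in 80% or more) can be illustrated with a minimal Python sketch. This is not the GeneDoc implementation: the function name `shade_columns`, the decision to count gaps against identity, and the toy sequences are all assumptions made here for illustration and are not taken from the TDPOZ alignment.

```python
from collections import Counter


def shade_columns(aligned, partial=0.8, gap="-"):
    """Classify each alignment column as 'dark', 'light' or 'none'.

    'dark'  : one residue is present in every sequence.
    'light' : the most common residue is present in at least `partial`
              (default 80%) of the sequences.
    Gaps are never counted as residues, so they lower a column's score
    (an assumption of this sketch, not a documented GeneDoc rule).
    """
    if not aligned:
        raise ValueError("at least one aligned sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    n = len(aligned)
    shades = []
    for column in zip(*aligned):
        residues = [r for r in column if r != gap]
        if not residues:
            shades.append("none")
            continue
        # Count of the most frequent residue in this column.
        top = Counter(residues).most_common(1)[0][1]
        if top == n:
            shades.append("dark")
        elif top / n >= partial:
            shades.append("light")
        else:
            shades.append("none")
    return shades


if __name__ == "__main__":
    # Hypothetical toy alignment, not real TDPOZ sequences.
    toy = ["MKLA", "MKLA", "MKIA", "MKLG", "MRLC"]
    print(shade_columns(toy))
```

With the five toy sequences, the first column is identical in all sequences, the second and third are identical in four of five (exactly 80%), and the last falls below the threshold, so the columns are classified as dark, light, light and none.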
4. Discussion

In this work, we describe the uncovering of 34 Tdpoz homologs and orthologs in animal and plant species. In the mouse genome, there are at least eight Tdpoz homologs. We show that the Tdpoz genes are functional and are transcribed in early embryos, particularly at the two-cell stage when the zygotic genome is transcriptionally active. Our ongoing work has further revealed that some of the rat Rtdpoz genes are also expressed in specific adult organs and tissues (Huang and Choo, unpublished data). Genomic sequence analysis shows that Tdpoz1 and Tdpoz2 are intronless in the coding region and carry a short exon 1 and a solitary 5′-UTR intron, strikingly similar in structure to three other early embryonic zinc finger protein genes, Rnf33, Rnf35 and Zfp352, that we have previously reported (Choo et al., 2001, 2002; Chen et al., 2002a,b). Tdpoz1 and Tdpoz2 are additions to a rapidly growing repertoire of early embryonic genes that possess this unique gene structure (Choo et al., 2002). We have previously proposed that such a structure is derived from retrotransposition events in the course of evolution (Chen et al., 2002b; Choo et al., 2002). The function of TDPOZ proteins is unknown. However, conservation of the TD and POZ domains strongly suggests that these domains are critical in some cellular functions. Through interactions with other TRAFs and also other factors, TRAFs are known to act as protein scaffolds, linking the TD-bearing TRAFs to the IL-1 receptor/Toll pathways and, thus, modulating apoptosis and cytokine signaling (reviewed by Bradley and Pober, 2001; Wajant et al., 2001). However, the TD domain of SPOP has been shown to interact only weakly with some TRAF members, and such an interaction fails to elicit inhibition of NF-κB induction as other TRAFs do (Zapata et al., 2001). Thus, it appears that SPOP, and possibly other similarly structured TDPOZ proteins, do not function as TRAFs in apoptosis and signaling despite the presence of the TD domain. An important characteristic of the POZ domain is its ability to associate with transcription factors in inducing transcription repression (Section 1). In PLZF and BCL6, the POZ domains interact with the co-repressors SMRT/N-CoR, mSin3A and N-CoR (Dhordain et al., 1998; Nagy et al., 1997). Recruitment of histone deacetylase by the POZ domain is also an important component of a mechanism that contributes to transcriptional repression (Heinzel et al., 1997; Nagy et al., 1997; Dhordain et al., 1998). On the other hand, since the majority of POZ proteins are also DNA-binding proteins, it will be important to identify putative DNA-binding motifs in the TDPOZ proteins. Interestingly, the POZ domain of PLZF and BCL6 interacts directly with Sp1 (Lee et al., 2002), a ubiquitous factor that is known to contribute to oocyte maturation and the development of early embryos in the mouse (Wang and Latham, 2000). We speculate here that the POZ domain of some TDPOZ proteins may similarly interact with Sp1 and hence contribute to transcription regulation in early development and other cellular processes.
Acknowledgements

This work was supported by the Taipei Veterans General Hospital research grants VGH91-343 and VGH92-346. We thank S.-Y. Hong and S.-M. Lin for contributing to this work.
References

Bachvarova, R., De Leon, V., 1980. Polyadenylated RNA of mouse ova and loss of maternal RNA in early development. Dev. Biol. 74, 1–8.
Bardwell, V.J., Treisman, R., 1994. The POZ domain: a conserved protein–protein interaction motif. Genes Dev. 8, 1664–1677.
Bevilacqua, A., Fiorenza, M.T., Mangia, F., 2000. A developmentally regulated GAGA box-binding factor and Sp1 are required for transcription of the hsp70.1 gene at the onset of mouse zygotic genome activation. Development 127, 1541–1551.
Bouniol, C., Nguyen, E., Debey, P., 1995. Endogenous transcription occurs at the 1-cell stage in the mouse embryo. Exp. Cell Res. 218, 57–62.
Boysen, C., Simon, M.I., Hood, L., 1997. Analysis of the 1.1-Mb human alpha/delta T-cell receptor locus with bacterial artificial chromosome clones. Genome Res. 7, 330–338.
Bradley, J.R., Pober, J.S., 2001. Tumor necrosis factor receptor-associated factors (TRAFs). Oncogene 20, 6482–6491.
Chen, H.-H., Liu, T.Y.-C., Choo, K.-B., 2002a. Use of a common promoter by two juxtaposed and intronless mouse early embryonic genes Rnf33/2czf45 and Rnf35: implications in zygotic gene expression. Genomics 80, 140–143.
Chen, H.-H., Liu, T.Y.-C., Huang, C.-J., Choo, K.-B., 2002b. Generation of two homologous and intronless zinc-finger protein genes, Zfp352 and Zfp353, with different expression patterns by retrotransposition. Genomics 79, 18–23.
Choo, K.-B., Chen, H.-H., Cheng, W.T.-K., Chang, H.-S., Wang, M., 2001. In silico mining of EST databases for novel pre-implantation embryo-specific zinc finger protein genes. Mol. Reprod. Dev. 59, 249–255.
Choo, K.-B., Chen, H.-J., Liu, T.Y.-C., Chang, C.-P., 2002. Different modes of regulation of transcription and pre-mRNA processing of the structurally juxtaposed homologs, Rnf33 and Rnf35, in eggs and in preimplantation embryos. Nucleic Acids Res. 30, 4836–4844.
Collins, T., Stone, J.R., Williams, A.J., 2001. All in the family: the BTB/POZ, KRAB, and SCAN domains. Mol. Cell. Biol. 21, 3609–3615.
Dhordain, P., Lin, R.J., Quief, S., Lantoine, D., Kerckaert, J.P., Evans, R.M., Albagli, O., 1998. The LAZ3(BCL-6) oncoprotein recruits a SMRT/mSIN3A/histone deacetylase containing complex to mediate transcriptional repression. Nucleic Acids Res. 26, 4645–4651.
Ewing, B., Hillier, L., Wendl, M.C., Green, P., 1998. Base-calling of automated sequencer traces using phred: I. Accuracy assessment. Genome Res. 8, 175–185.
Gordon, D., Abajian, C., Green, P., 1998. Consed: a graphic tool for sequence finishing. Genome Res. 8, 195–202.
Heinzel, T., et al., 1997. A complex containing N-CoR, mSin3 and histone deacetylase mediates transcriptional repression. Nature 387, 43–48.
Hoatlin, M.E., et al., 1999. A novel BTB/POZ transcriptional repressor protein interacts with the Fanconi anemia group C protein and PLZF. Blood 94, 3737–3747.
Kerckaert, J.P., Deweindt, C., Tilly, H., Quief, S., Lecocq, G., Bastard, C., 1993. LAZ3, a novel zinc-finger encoding gene, is disrupted by recurring chromosome 3q27 translocations in human lymphomas. Nat. Genet. 5, 66–70.
Lee, D.K., Suh, D., Edenberg, H.J., Hur, M.W., 2002. POZ domain transcription factor, FBI-1, represses transcription of ADH5/FDH by interacting with the zinc finger and interfering with DNA binding activity of Sp1. J. Biol. Chem. 277, 26761–26768.
Melnick, A., et al., 2000. In-depth mutational analysis of the promyelocytic leukemia zinc finger BTB/POZ domain reveals motifs and residues required for biological and transcriptional functions. Mol. Cell. Biol. 20, 6550–6567.
Nagai, Y., Kojima, T., Muro, Y., Hachiya, T., Nishizawa, Y., Wakabayashi, T., Hagiwara, M., 1997. Identification of a novel nuclear speckle-type protein, SPOP. FEBS Lett. 418, 23–26.
Nagy, L., et al., 1997. Nuclear receptor repression mediated by a complex containing SMRT, mSin3A, and histone deacetylase. Cell 89, 373–380.
Nicholas, K.B., Nicholas Jr., H.B., Deerfield, D.W., 1997. GeneDoc: analysis and visualization of genetic variation. EMBNEW News 4, 14.
Paynton, B.V., Rempel, R., Bachvarova, R., 1988. Changes in state of adenylation and time course of degradation of maternal mRNAs during oocyte maturation and early embryonic development in the mouse. Dev. Biol. 129, 304–314.
Paynton, B.V., Bachvarova, R., 1994. Polyadenylation and deadenylation of maternal mRNAs during oocyte growth and maturation in the mouse. Mol. Reprod. Dev. 37, 172–180.
Rothe, M., Wong, S.C., Henzel, W.J., Goeddel, D.V., 1994. A novel family of putative signal transducers associated with the cytoplasmic domain of the 75 kDa tumor necrosis factor receptor. Cell 78, 681–692.
Sambrook, J., Fritsch, E.F., Maniatis, T., 1989. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Schultz, R.M., 1993. Regulation of zygotic gene activation in the mouse. BioEssays 15, 531–538.
Uren, A.G., Vaux, D.L., 1996. TRAF proteins and meprins share a conserved domain. Trends Biochem. Sci. 21, 244–245.
Wajant, H., Henkler, F., Scheurich, P., 2001. The TNF-receptor-associated factor family: scaffold molecules for cytokine receptors, kinases and their regulators. Cell Signal. 13, 389–400.
Wang, Q., Latham, K.E., 2000. Translation of maternal messenger ribonucleic acids encoding transcription factors during genome activation in early mouse embryos. Biol. Reprod. 62, 969–978.
Wang, Q., Chung, Y.G., deVries, W.N., Struwe, M., Latham, K.E., 2001. Role of protein synthesis in the development of a transcriptionally permissive state in one-cell stage mouse embryos. Biol. Reprod. 65, 748–754.
Worrad, D.M., Ram, P.T., Schultz, R.M., 1994. Regulation of gene expression in the mouse oocyte and early preimplantation embryo: developmental changes in Sp1 and TATA box-binding protein, TBP. Development 120, 2347–2357.
Ye, B.H., Lista, F., Lo Coco, F., Knowles, D.M., Offit, K., Chaganti, R.S., Dalla-Favera, R., 1993. Alterations of a zinc finger-encoding gene, BCL-6, in diffuse large-cell lymphoma. Science 262, 747–750.
Zapata, J.M., Pawlowski, K., Haas, E., Ware, C.F., Godzik, A., Reed, J.C., 2001. A diverse family of proteins containing tumor necrosis factor receptor-associated factor domains. J. Biol. Chem. 276, 24242–24252.
Zollman, S., Godt, D., Prive, G.G., Couderc, J.L., Laski, F.A., 1994. The BTB domain, found primarily in zinc finger proteins, defines an evolutionarily conserved family that includes several developmentally regulated genes in Drosophila. Proc. Natl. Acad. Sci. U. S. A. 91, 10717–10721.