Biochemical and Biophysical Research Communications 293 (2002) 816–826 www.academicpress.com
Identification and characterization of a novel murine multigene family containing a PHD-finger-like motif
R. Trappe,a,* M. Ahmed,a B. Gläser,b C. Vogel,a S. Tascou,a P. Burfeind,a and W. Engel a
a Institute of Human Genetics, Georg-August University Göttingen, Heinrich-Düker-Weg 12, Göttingen D-37073, Germany
b Department of Human Genetics, University of Ulm, Parkstraße 11, Ulm D-89073, Germany
Received 18 March 2002
Abstract
The genes Phf5a and Phf5b-ps are the first two members of a novel murine multigene family that is highly conserved during evolution and belongs to the superfamily of PHD-finger genes. The Phf5 gene family comprises an active locus on mouse chromosome 15, region E, and several processed pseudogenes on different chromosomes. The active locus, Phf5a, is expressed ubiquitously in pre- and postnatal murine tissues and encodes a protein of 110 amino acids. The protein is localized in the nucleus in a non-homogeneous pattern, the nucleolar subcompartment being almost free of Phf5a. The molecular and biological functions of Phf5a are still unknown, but systematic deletion of its yeast homolog is lethal, indicating that the protein is required for cell viability. Our data and a review of the literature suggest that the Phf5a protein has basic and essential cellular functions, possibly acting as a chromatin-associated protein.
© 2002 Elsevier Science (USA). All rights reserved.
Keywords: PHD finger; Zinc finger; Multigene family; Pseudogene; Retrotransposition; Evolutionary conservation; Chromatin
The PHD-finger domain was first described by Schindler et al. [35] in the homeobox protein HAT3.1 from Arabidopsis thaliana. It is defined by eight conserved zinc-coordinating residues, Cys4–His–Cys3, with regular spacing. Proteins containing PHD-finger domains are known in yeast, plants, and mammals. Usually a protein carries a single PHD-finger motif, often accompanied by other protein domains, although some exceptions are known [1,2]. To date, PHD fingers have been found in two major groups of proteins: (1) transcriptional activators, repressors, or cofactors [1,2] and (2) proteins of chromatin-modulating complexes, such as acetyltransferases or complexes containing acetyltransferases, e.g., p300 [3] or CBP [4]. Lyngso et al. [38] showed that the PHD finger of the transcription factor SPBP is involved in chromatin-mediated transcriptional regulation, acting as a protein–protein interaction domain.
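As a rough illustration of what "regular spacing" of the eight zinc-coordinating residues means in practice, the Python sketch below scans a protein sequence for a Cys4–His–Cys3 pattern with a regular expression. The spacer ranges are assumptions loosely following published PHD consensus descriptions, not the curated Pfam or SMART model, and the function name `find_phd_like` is hypothetical; profile-HMM searches such as the Pfam search used in this study are far more sensitive than a fixed pattern.

```python
import re

# Cys4-His-Cys3 pattern: eight zinc-ligand residues separated by spacers.
# The spacer ranges are illustrative assumptions, not a curated definition.
PHD_LIKE = re.compile(
    r"C.{1,2}C"    # ligands 1-2 (Cys, Cys)
    r".{9,21}C"    # ligand 3 (Cys)
    r".{2,4}C"     # ligand 4 (Cys)
    r".{4,5}H"     # ligand 5 (His)
    r".{2}C"       # ligand 6 (Cys)
    r".{12,46}C"   # ligand 7 (Cys)
    r".{2}C"       # ligand 8 (Cys)
)


def find_phd_like(protein):
    """Return 0-based (start, end) spans of non-overlapping PHD-like matches."""
    return [m.span() for m in PHD_LIKE.finditer(protein.upper())]


# Synthetic test sequence: alanine spacers around the eight ligand residues.
toy = ("AA" + "CAC" + "A" * 10 + "C" + "AA" + "C" + "AAAA" + "H"
       + "AA" + "C" + "A" * 12 + "C" + "AA" + "C" + "AA")
print(find_phd_like(toy))
```

A pattern like this is only a coarse filter: it says nothing about fold or zinc binding, and real PHD fingers may fall outside the assumed spacer ranges.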
* Corresponding author. Fax: +49-551-399303. E-mail address: [email protected] (R. Trappe).
In the present work we characterize a novel multigene family and demonstrate that its active gene, Phf5a, on chromosome 15E encodes a small, evolutionarily highly conserved protein of 110 amino acids containing a PHD-finger-like domain. All other Phf5-related sequences found in the murine genome show features of processed pseudogenes, and a similar situation was observed in the human genome. The Phf5a gene is ubiquitously expressed and the protein is located in the nucleus. We hypothesize that the Phf5a protein acts as a chromatin-associated protein.
0006-291X/02/$ - see front matter © 2002 Elsevier Science (USA). All rights reserved. PII: S0006-291X(02)00277-2
Materials and methods
Database analysis of DNA and protein sequences. Nucleotide and deduced protein sequences of mouse Phf5a and Phf5b-ps were subjected to homology searches using the BLAST program in the public database (http://ncbi.nlm.nih.gov). Electronic PCR, to test a DNA sequence for the presence of sequence-tagged sites, was carried out using the following program at NCBI: (http://www.ncbi.nlm.gov/genome/sts/epcr.cgi) [5]. Repeat sequences were identified using the program Repeat Masker2 (http://ftp.genome.washington.edu/). Putative structural and functional motifs were analyzed by Pfam (http://pfam.wustl.edu/hmmsearch.shtml), PredictProtein (http://www.embl-heidelberg.de/predictprotein/submit_def.html), and PredictNLS (http://cubic.bioc.columbia.edu/cgi/var/nair/resonline.pl). Multiple sequence alignment was performed using the ClustalW program (http://www.ebi.ac.uk/clustalw). The following further databases were used for analyzing homologous genes in different species: Saccharomyces cerevisiae (http://genome-www.stanford.edu/Saccharomyces/), Schizosaccharomyces pombe (http://www.sanger.ac.uk/Projects/S_pombe/), Arabidopsis thaliana (http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/ara.html), and Drosophila melanogaster (http://www.flybase.org).
Gene symbols and accession numbers. The gene symbol PHF5a (alias MGC1346) has been accepted by the HUGO Gene Nomenclature Committee (HGNC). The gene symbols Phf5a and Phf5b-ps for the corresponding murine genes have been accepted by the Mouse Genome Informatics (MGI) Nomenclature Committee, Jackson Laboratory. The mouse Phf5a cDNA and the mouse genomic Phf5a and Phf5b-ps sequences have been deposited in the GenBank database under Accession Nos. AF479288, AF479286, and AF479287, respectively.
Cell lines. The cell lines GC-1spg and 15P-1 were purchased from the American Type Culture Collection (ATCC). The spermatocyte-derived cell line GC-4spc and the Leydig tumor cell line MA-10 were established as previously described [6,7]. GC-1spg and GC-4spc cells were maintained in DMEM (Gibco/BRL) supplemented with 1% nonessential amino acids, 10% FCS, and 1.4% penicillin/streptomycin solution (Gibco/BRL and Sigma) and grown at 37 °C in air containing 5% CO2. MA-10 cells were maintained in Waymouth medium (Gibco/BRL) supplemented with 14.5% horse serum (Gibco/BRL) and 0.5% gentamicin solution (Gibco/BRL and Sigma) and grown at 37 °C in air containing 5% CO2.
15P-1 cells were maintained in DMEM (Gibco/BRL) supplemented with 4 mM L-glutamine, 1 mM pyruvate, 10% FCS, and 1% penicillin/streptomycin solution (Gibco/BRL and Sigma) and grown at 32 °C in air containing 5% CO2.
Cloning of the murine Phf5a cDNA and genomic library screening. The mouse Phf5a cDNA was isolated by suppression subtractive hybridization (SSH) performed on RNAs derived from the spermatogonia-derived cell line GC-1spg and the spermatocyte-derived cell line GC-4spc [8]. The cDNA library of GC-1spg cells, containing the differentially expressed transcripts, was used as the "tester"; the cDNA library of GC-4spc cells, which served as the reference, was used as the "driver". The positively subtracted cDNA library was then ligated into the pGEM-T vector (Promega), and plasmids containing cloned Phf5a cDNA were identified by sequencing. A mouse genomic cosmid library (129/ola mouse cosmid, 121) from the Resource Center of the German Human Genome Project (RZPD) was subsequently screened using the [α-32P]dCTP-labeled mouse Phf5a cDNA as a probe. Hybridization was carried out overnight at 65 °C in 50% formamide, 4× SSC, 50 mM sodium phosphate (pH 6.8–7.2), 1 mM EDTA (pH 8.0), 10% (w/v) dextran sulfate, 1% (w/v) SDS, 50 µg/ml denatured salmon sperm DNA, and 10× Denhardt's solution. Filters were washed at room temperature for 15 min in 2× SSC and subsequently in 0.2× SSC, 0.1% SDS at 65 °C for 10–30 min. The filters were dried and exposed to X-ray films with an intensifying screen at −70 °C.
Mapping, subcloning, and sequencing of the cosmid inserts. Recombinant cosmid DNA was purified and mapped by restriction endonuclease cleavage and Southern blot hybridization [9]. Phf5 genes were detected using a Phf5a cDNA probe. Hybridizing fragments were subcloned into the plasmid vector pBluescript KS. Plasmid clones hybridizing with the Phf5a cDNA were analyzed and sequenced using vector- and sequence-specific primers.
Insert sequences were aligned to assemble continuous sequences for Phf5a and Phf5b-ps, respectively.
Northern blot analysis and RNase protection assay. Total RNA was prepared from testicular cell lines and mouse tissues using the RNeasy Kit from Qiagen or the RNA Isolation Kit from BIOMOL according to the manufacturers' instructions. For Northern blot analysis, total RNA samples (20 µg) were electrophoresed on 1% denaturing agarose gels containing formaldehyde (5%) and transferred onto Hybond N nylon membranes (Amersham). Filters were hybridized with an [α-32P]dCTP (3000 Ci/mmol)-labeled mouse Phf5a cDNA probe at 65 °C in Rapid-hyb buffer (Amersham). After hybridization, filters were washed, dried, and exposed to X-ray films as described above. To check RNA integrity, the membranes were rehybridized with a human elongation factor-2 (hEF) cDNA probe [10].
For the RNase protection assay (RPA), the complete coding region of Phf5a, as shown in Fig. 3B, was cloned into the multiple cloning site (MCS) of pBluescript KS. For in vitro transcription, the insert was amplified with a proofreading polymerase (PfuTurbo, Stratagene) using the T3 and T7 primers of pBluescript and 50 ng plasmid as template in a total volume of 50 µl. The PCR product was gel-purified, and 3–5 µl of the 50 µl was used as template for in vitro transcription with the Maxiscript Kit (Ambion). The antisense transcript was synthesized using T3 RNA polymerase and labeled by incorporation of [α-32P]UTP (3000 Ci/mmol). The ratio of labeled to unlabeled UTP was adjusted to 1:3 for highly sensitive detection of specific Phf5a mRNA transcripts. The antisense mouse β-actin probe was synthesized by in vitro transcription of the linearized pTRI-β-Actin-Mouse template (Ambion) using T3 RNA polymerase and labeled by incorporation of [α-32P]UTP (3000 Ci/mmol) at a labeled-to-unlabeled UTP ratio of 1:600 for the detection of moderately abundant targets. After purification of the full-length transcripts on a denaturing 8 M urea/5% (v/v) polyacrylamide gel, the labeled probes were used for the RNase protection assay. For the multiprobe RNase protection assay, using both an internal control probe (β-actin) and a probe specific for Phf5a, the RPA III Kit from Ambion was used.
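The 1:3 and 1:600 labeled-to-unlabeled UTP ratios above translate into very different probe specific activities. A minimal sketch, assuming the ratios are molar, that the [α-32P]UTP stock is at its nominal 3000 Ci/mmol, and that labeled and unlabeled UTP are incorporated equally well (the function name is hypothetical):

```python
def effective_specific_activity(stock_ci_per_mmol, labeled_parts, unlabeled_parts):
    """Specific activity (Ci/mmol) of a UTP pool after mixing labeled and unlabeled UTP.

    Assumes the ratio is molar, the stock is at its nominal specific activity,
    and both forms of UTP are incorporated with equal efficiency.
    """
    labeled_fraction = labeled_parts / (labeled_parts + unlabeled_parts)
    return stock_ci_per_mmol * labeled_fraction


phf5a_probe = effective_specific_activity(3000, 1, 3)    # 1:3 ratio
actin_probe = effective_specific_activity(3000, 1, 600)  # 1:600 ratio
print(f"Phf5a probe: {phf5a_probe:.1f} Ci/mmol UTP")
print(f"beta-actin probe: {actin_probe:.2f} Ci/mmol UTP")
print(f"ratio: {phf5a_probe / actin_probe:.2f}")
```

Under these assumptions the Phf5a probe carries roughly 150 times more label per mole of incorporated UTP than the β-actin control probe, in line with the stated aim of sensitive Phf5a detection alongside a moderately abundant internal control.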
Hybridization of 20 µg total RNA with the labeled probes was performed at 45 °C overnight according to the manufacturer's manual. After digestion with the RNase A/T1 mixture (1:100 dilution), the protected fragments were separated on a denaturing 8 M urea/5% (v/v) polyacrylamide gel and detected by autoradiography.
Chromosomal localization. DNAs of the mouse Phf5a- and Phf5b-ps-specific cosmid clones (MPMGc121E24208Q4 and MPMGc121P06611Q2, respectively) were labeled with biotin-16-dUTP (Boehringer, Mannheim) by nick translation and hybridized in situ to metaphases of the WMP-1 cell line from newborn mice carrying Robertsonian translocation chromosomes [11]. Signal detection via fluoresceinated avidin (FITC-avidin) was performed as described [12]. Chromosomes were counterstained with 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI, Boehringer, Mannheim). Images of emitted light were captured separately using the DAPI and FITC filter sets and were subsequently merged and aligned.
Subcellular localization. A fusion construct was generated by cloning the entire ORF of murine Phf5a in frame into the pEGFP-N1 expression vector (Clontech) upstream of the EGFP-coding sequence. A second construct (Phf5a-del) was generated by cloning the region coding for amino acids 1–90 of murine Phf5a in frame into pEGFP-N1 using a PCR-based cloning strategy. pEGFP-N1 encodes an enhanced fluorescent variant of the Aequorea victoria green fluorescent protein (GFP). Mouse NIH3T3 fibroblast cells (2 × 10⁵) were plated in single-well chamber slides (Nunc) 24 h before transfection. The construct DNA (2 µg per chamber slide) was introduced into the cells using the Superfect transfection reagent (Qiagen) according to the manufacturer's instructions. After 48 h, the transiently transfected NIH3T3 cells were fixed on the chamber slides with 100% methanol (−20 °C) for 10 min and subsequently washed with PBS. Cell nuclei were counterstained with DAPI (Boehringer, Mannheim) and observed under a fluorescence microscope. Images of emitted light were captured separately using the DAPI and GFP filter sets and were subsequently merged and aligned.
Results
The mouse Phf5a cDNA encodes a small PHD-finger-like protein
Using suppression subtractive hybridization (SSH) on RNAs from the spermatogonia-derived cell line GC-1spg and the spermatocyte-derived cell line GC-4spc, we isolated the complete 859-bp cDNA of mouse Phf5a. The murine Phf5a cDNA consists of a 5′ UTR of 45 bp and an open reading frame (ORF) of 330 bp (plus the 3-bp stop codon) encoding a protein of 110 amino acids (aa). The predicted protein has a molecular weight of 12.4 kDa and a theoretical pI of 8.413. Both the N- and C-terminal parts of the putative protein are strongly basic: the pI is 10.02 for amino acids 1–21 and 10.24 for the last 25 amino acids 85–110. The mid-part of the protein (amino acids 22–86) contains 9 strongly basic (K, R), 10 strongly acidic (D, E), 12 hydrophobic (A, I, L, F, W, V), and 25 polar (N, C, Q, S, T, Y) amino acids, giving a pI of 5.076. The 3′ UTR consists of 481 bp and contains a putative polyadenylation signal (AAUAAA) at nucleotide positions 833–838 (Fig. 1).
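The residue classes and pI values above can be recomputed from a sequence with a few lines of code. The sketch below is illustrative only: it uses the residue classes defined in the text, but the pKa set is an assumption (one common textbook-style choice), so it will not necessarily reproduce the exact pI values reported here, which depend on the program and pKa table used. The Phf5a sequence itself (Fig. 1) is not reproduced, and the function names are hypothetical.

```python
# Residue classes as defined in the text for the Phf5a mid-part.
RESIDUE_CLASSES = {
    "strongly basic": set("KR"),
    "strongly acidic": set("DE"),
    "hydrophobic": set("AILFWV"),
    "polar": set("NCQSTY"),
}

# Assumed pKa values (one common textbook-style set). Other tools use
# different tables, so absolute pI values will differ somewhat.
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def composition(seq):
    """Count residues per class; anything unlisted (G, P, H, M, ...) is 'other'."""
    counts = {name: 0 for name in RESIDUE_CLASSES}
    counts["other"] = 0
    for residue in seq.upper():
        for name, members in RESIDUE_CLASSES.items():
            if residue in members:
                counts[name] += 1
                break
        else:
            counts["other"] += 1
    return counts


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a linear peptide at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue in seq.upper():
        if residue in PKA_BASIC:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[residue]))
        elif residue in PKA_ACIDIC:
            charge -= 1.0 / (1.0 + 10 ** (PKA_ACIDIC[residue] - ph))
    return charge


def isoelectric_point(seq, tol=1e-4):
    """Bisect for the pH at which net charge is zero (charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


print(composition("KRDEACG"))
print(round(isoelectric_point("KKKKKKKKKK"), 2), round(isoelectric_point("DDDDDDDDDD"), 2))
```

Applying `isoelectric_point` separately to residues 1–21, the mid-part, and the C-terminal segment is one way to reproduce the kind of regional comparison made above, with the caveat about pKa tables.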
Fig. 1. Nucleotide and deduced amino acid sequences of murine Phf5a. The putative polyadenylation signal (AATAAA) is underlined. The translation initiation codon ATG is in bold, and the stop codon is marked by an asterisk. The putative nuclear localization signal (NLS) is underlined; basic NLS residues are in bold. Amino acid residues of the PHD-finger domain are shown in inverted contrast. Exon–intron boundaries deduced from the Phf5a genomic clone are shown in medium gray. Nucleotide residue numbers are shown to the left of each line, amino acid residue numbers to the right.
Database alignment showed that, within its ORF, the Phf5a cDNA is 91% identical at the nucleotide level to the corresponding human full-length cDNA MGC1346 (Accession No. BC007321; corresponding gene bK223H9.2, Accession No. AL008582), which codes for a protein of 110 amino acids that is 100% identical to murine Phf5a. A search for structural and functional motifs within the human PHF5a and murine Phf5a proteins revealed a PHD-finger domain in the mid-part of the protein (amino acids 29–78) with a significant expectation value (E-value) of 0.033.
Phf5a is a structurally intact gene while Phf5b-ps is a pseudogene
Murine Phf5a and Phf5b-ps genes were isolated by screening a genomic mouse cosmid library from the Resource Center of the German Human Genome Project (RZPD) with the mouse Phf5a cDNA as a probe. In this screen, eight clones containing Phf5 genes were isolated. The Phf5 genes described here were isolated from cosmid clones MPMGc121E24208Q4 and MPMGc121H15709Q2 (containing the Phf5a gene) and MPMGc121P06611Q2 (containing the Phf5b-ps pseudogene) and analyzed as described in Materials and methods. The insert sequences of plasmid subclones pRT10, pRT19, pRT11, and pRT6 of cosmid MPMGc121H15709Q2 were aligned to build a continuous sequence of 8834 bp (Accession No. AF479286). Within this sequence, the murine Phf5a gene is arranged in four exons of 97, 25, 168, and 573 bp, spanning a genomic region of 6.608 kb (Fig. 2B). The insert sequences of the plasmid subclones pRT9, pRT13, and pRT8 of cosmid MPMGc121P06611Q2 were assembled into a continuous sequence of 5285 bp (Accession No. AF479287) containing the entire Phf5b-ps gene.
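Table 1 gives the coding-sequence coordinates of each Phf5 family member in GenBank-style "start..end" notation. A minimal sketch (all helper names hypothetical) that turns such coordinates into segment lengths and checks whether the total stays in a single reading frame, plus helpers illustrating why a 1-bp deletion or a 2-bp insertion, such as those reported for Phf5b-ps, shifts the reading frame:

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def segment_lengths(location):
    """Lengths of 'start..end' segments (1-based, inclusive, either orientation)."""
    return [abs(int(b) - int(a)) + 1
            for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def reading_frame_check(location):
    """Return (total length, complete codons, leftover bases)."""
    total = sum(segment_lengths(location))
    codons, leftover = divmod(total, 3)
    return total, codons, leftover


def is_frameshift(indel_length):
    """An insertion or deletion shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


def delete_base(cds, position):
    """Delete one base at a 1-based coding position (c.<position>del)."""
    return cds[:position - 1] + cds[position:]


def first_stop_codon(cds):
    """0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


mouse_phf5a = "1307..1358, 1982..2005, 2672..2838, 7298..7387"  # from Table 1
print(segment_lengths(mouse_phf5a))
print(reading_frame_check(mouse_phf5a))

# Hypothetical toy ORF: deleting base 4 brings a TAA into frame.
toy_cds = "ATGCTAAGGGCCTGA"
print(first_stop_codon(toy_cds), first_stop_codon(delete_base(toy_cds, 4)))
```

For mouse Phf5a the four coding segments sum to 333 bp, i.e., 111 codons: 110 amino acids plus the stop codon. The human PHF5a coordinates give the same total, whereas the 328-bp coding sequence listed for human Ψ1 is not a multiple of three.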
Comparison of the 5285-bp sequence with the Phf5a cDNA and database analysis using the TBlastN procedure at NCBI revealed that Phf5b-ps consists of two exons of 171 and 656 bp, spanning a genomic sequence of 1.288 kb (Fig. 2D). A putative polyadenylation signal at positions 2940–2945 is altered (AGTAAA), and the coding sequence of Phf5b-ps contains two frame-shift mutations (c.75delA, c.307insAA) and a nonsense mutation (c.226C>A) that lead to premature termination of the gene product, indicating that Phf5b-ps is a pseudogene.
Fig. 2. Chromosomal localization of Phf5a (A) and Phf5b-ps (C) by fluorescence in situ hybridization (FISH) on metaphase chromosomes from WMP-1 murine cells with a cosmid DNA probe containing Phf5a and Phf5b-ps, respectively. Arrows point to the specific signals of the mouse Phf5a gene on chromosome 15, region E, and of the mouse Phf5b-ps gene on chromosome 7, region A. Open arrows point to a second hybridization signal on chromosome 8, region A, obtained with the Phf5b-ps cosmid probe. (B) Genomic organization of Phf5a. Exons are shown as black boxes and introns as horizontal lines. Numbers directly below the exons and above the introns give their exact lengths in base pairs. The positions of the translational start and stop codons are indicated. The murine Phf5a gene contains four exons and spans approximately 6.6 kb of genomic DNA. (D) Genomic organization of Phf5b-ps, drawn and numbered as in (B). The murine Phf5b-ps pseudogene contains two exons and spans approximately 1.3 kb of genomic DNA. The coding sequence of Phf5b-ps contains two frame-shift mutations (c.75delA, c.307insAA) and one nonsense mutation (c.226C>A), indicated by vertical arrows. The intron is composed of two LTR/ERV1 elements.
Phf5a and Phf5b-ps genes are located on chromosomes 15E and 7A, respectively
The genomic clones MPMGc121E24208Q4, containing the entire mouse Phf5a gene, and MPMGc121P06611Q2, containing the entire Phf5b-ps gene, were used to determine the chromosomal locations of Phf5a and Phf5b-ps in the murine genome. Fluorescence in situ hybridization of the respective genomic cosmid clones to WMP-1 metaphase chromosomes localized Phf5a to chromosome 15, region E (Fig. 2A). By electronic PCR on the genomic clone containing the human PHF5a gene (AL008582), we detected a sequence-tagged site (STS), D22S678, 11.5 kb 5′ of PHF5a, located at nucleotide positions 38590615–38590819 of human chromosome 22 (cytogenetic position 22q13.2). As mentioned
above, the orthologous mouse gene Phf5a was localized to mouse chromosome 15, region E, which is syntenic to human chromosome 22q13 [13]. In the hybridization pattern of cosmid MPMGc121P06611Q2, which contains the murine Phf5b-ps gene, we detected specific signals on chromosome 7, region A, and on chromosome 8, region A (Fig. 2C). To further determine the chromosomal localization of Phf5b-ps, the complete sequenced part (5.5 kb) of the cosmid clone was aligned to the Celera assembled mouse genome database (http://.cds.celera.com); this gave a perfect alignment to the sequence of cluster GA_x5J8B7W4436, located on mouse chromosome 7A1 (Mb positions 13–13.5). No alignment of the cosmid sequence to a chromosome 8 cluster was observed. The hybridization signal on chromosome 8 is therefore best interpreted as cross-hybridization, while the Phf5b-ps gene is localized on chromosome 7A.
Fig. 3. Expression analysis of mouse Phf5a in murine cell lines and postnatal tissues. (A) Northern blot analysis with 20 µg total RNA prepared from the mouse spermatogonia-derived cell line GC-1spg, the mouse spermatocyte-derived cell line GC-4spc, and mouse testis. A Phf5 transcript of approximately 1.2 kb was detected. To control RNA integrity, the Northern blot was rehybridized with a human elongation factor-2 (hEF) cDNA probe. (B) Phf5a subtype-specific probe and β-actin control probe used for RNase protection analysis: map of subcloned cDNA fragments, antisense RNA probes, and protected RNA fragments. In the upper part of each map, specific cDNA fragments are shown as boxes and translation start and stop sites are indicated. Restriction sites used for subcloning of DNA fragments and promoter sites for in vitro transcription are indicated. Numbering with respect to the corresponding position of the expression constructs is in base pairs. The shaded boxes below the maps indicate the lengths of the transcribed antisense RNA probes, and the black boxes indicate the lengths of protected fragments in the RNase protection assay; numbering with respect to the antisense probes is in nucleotides. (C) RNase protection assay of specific transcripts of the murine Phf5a gene and of the murine β-actin gene as internal control. Total RNA (20 µg each) from murine testicular cell lines and a broad range of postnatal murine tissues was used. Within the series, Phf5a hybridization intensities were compared between RNA preparations relative to the intensity of the corresponding β-actin hybridization. Phf5a is expressed ubiquitously in these testicular cell lines and postnatal murine tissues. The RNase protection assay further disclosed ubiquitous expression of a second, closely Phf5a-related sequence, giving a second protected fragment of about 300 nucleotides. Marker: [γ-32P]dATP end-labeled fragments of 527, 489, 404, 360, and 242 nucleotides.
The murine Phf5a gene is ubiquitously expressed in pre- and postnatal tissues
Northern blot analysis showed strong expression of mouse Phf5 genes, with a transcript size of 1.2 kb, in the spermatogonia-derived cell line GC-1spg [14], while weaker expression was present in the GC-4spc cell line and murine testis (Fig. 3A). The Northern blot was rehybridized with a human elongation factor-2 cDNA probe to verify the integrity and equal loading of the RNA [10]. To examine the expression of Phf5a separately from the other members of the gene family, Phf5a mRNA levels were determined by RNase protection assay (RPA). Because not all other members of this gene family are known, we cloned the complete coding region of Phf5a into a pBluescript expression vector. The exact fragment used for RPA, as well as the fragment used for quantification and control of RNA integrity, is shown in Fig. 3B. The RPA approach excluded potential cross-hybridization of similar sequences from the different subtype genes, which would lead to false-positive signals: only bands from protected fragments corresponding to the whole coding region of Phf5a were used to interpret the expression data. Potential false-positive signals resulting from partial RNase digestion, or from secondary structures of the in vitro transcribed antisense RNA that may be insensitive to RNase cleavage, were identified by control reactions with yeast RNA instead of mouse RNA. Fig. 3C shows the expression pattern of the murine Phf5a gene. The expression of Phf5a is ubiquitous.
Phf5a is strongly expressed in the spermatogonia-derived cell line GC-1spg, the spermatocyte-derived cell line GC-4spc, and the Leydig cell-derived cell line MA-10, but only weakly in the Sertoli cell-derived cell line 15P-1. Phf5a expression is at almost the same level in all postnatal murine tissues examined. Furthermore, we identified several EST clones from prenatal murine tissues that show 100% identity to Phf5a, namely from a trophoblast stem cell cDNA library, a mouse 3.5-day fetus cDNA library, and a mouse 9-day fetus cDNA library (Accession Nos. BM210174, C77060, and BF607097, respectively). Additionally, the RNase protection assay disclosed the expression of a second, unknown, closely Phf5a-related sequence, giving a protected fragment of about 300 nucleotides. The expression pattern of this family member is similar to that of Phf5a.
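Fig. 3C compares Phf5a signals between RNA preparations relative to the corresponding β-actin signal. If band intensities were quantified by densitometry (the text does not state whether this was done), the normalization would look roughly like the sketch below. The sample values are invented placeholders, not data from this study, and the function name is hypothetical.

```python
def normalize_to_reference(target, reference, calibrator):
    """Express target/reference intensity ratios relative to one calibrator sample.

    target, reference: dicts mapping sample name -> band intensity (arbitrary units).
    calibrator: sample whose normalized value is set to 1.0.
    """
    ratios = {sample: target[sample] / reference[sample] for sample in target}
    base = ratios[calibrator]
    return {sample: ratio / base for sample, ratio in ratios.items()}


# Invented placeholder intensities for illustration only.
phf5a_bands = {"GC-1spg": 1200.0, "15P-1": 300.0, "liver": 800.0}
actin_bands = {"GC-1spg": 1000.0, "15P-1": 1000.0, "liver": 800.0}
print(normalize_to_reference(phf5a_bands, actin_bands, calibrator="liver"))
```

Dividing by the β-actin signal first removes differences in RNA loading between lanes; choosing a calibrator sample then puts all lanes on a common relative scale.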
The Phf5a protein is localized in the nucleus but not in the nucleolar subcompartment
Mouse NIH3T3 cells were transiently transfected with a fusion construct in which the complete ORF of murine Phf5a was fused to the N-terminus of GFP (green fluorescent protein). When observed under a fluorescence microscope 48 h after transfection, the Phf5a–GFP protein was detected almost exclusively in the nucleus, in agreement with the nuclear localization predicted by the program PredictNLS. Interestingly, the Phf5a–GFP fusion protein further showed that the nuclear distribution of Phf5a is not homogeneous: while strong GFP fluorescence was observed in the nuclear matrix, the nucleolar subcompartment of transfected NIH3T3 cells was almost free of GFP fluorescence. In addition, a second fusion construct lacking the last 20 amino acids at the C-terminus of Phf5a was generated. The deleted sequence contains several strongly basic amino acids that may represent an NLS (ERKKYGFKKR). Transfection of NIH3T3 cells with this second construct resulted in a predominantly cytoplasmic distribution of the fusion protein, while GFP fluorescence in the nucleus was strongly reduced. Again, the nucleolar subcompartment of transfected NIH3T3 cells was almost free of GFP fluorescence (Fig. 4).
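The candidate NLS named above (ERKKYGFKKR) stands out as a dense cluster of lysine and arginine residues. The sketch below flags such clusters with a sliding window. The window size and threshold are arbitrary assumptions, the sequence is a synthetic placeholder rather than Phf5a, the function name is hypothetical, and the approach is far cruder than PredictNLS, which compares sequences against a curated collection of known NLS motifs.

```python
def basic_clusters(protein, window=10, min_basic=4):
    """Return (start, segment) for every window with at least `min_basic` K/R residues.

    Starts are 0-based; overlapping windows are all reported.
    """
    protein = protein.upper()
    hits = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        if sum(residue in "KR" for residue in segment) >= min_basic:
            hits.append((start, segment))
    return hits


# Synthetic placeholder: 10 arbitrary residues followed by the motif from the text.
toy = "MSAGDSLEVP" + "ERKKYGFKKR"
print(basic_clusters(toy))
# Removing the C-terminal motif, loosely analogous to the Phf5a-del construct:
print(basic_clusters(toy[:10]))
```

A basic cluster is neither necessary nor sufficient for nuclear import, which is why the deletion construct, rather than sequence inspection alone, is the informative experiment here.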
Fig. 4. Subcellular localization of Phf5a in mammalian cells. Mouse NIH3T3 cells were transiently transfected with a fusion construct containing the complete ORF of murine Phf5a fused to GFP (a–c) or with a construct containing a truncated ORF of murine Phf5a fused to GFP (Phf5a-del-GFP) (d–f). The latter construct served as an internal experimental control for the subcellular localization of the Phf5a protein and for identification of its NLS. (a) Fluorescence excitation revealed an almost exclusively nuclear localization of Phf5a, sparing the nucleolar subcompartment. (d) The fusion protein Phf5a-del-GFP shows a predominantly cytoplasmic distribution, while GFP fluorescence in the nucleus is strongly reduced. Again, the nucleolar subcompartment of transfected NIH3T3 cells is almost free of GFP fluorescence. (b, e) Cell nuclei were counterstained with DAPI. (c, f) Overlay of GFP and DAPI fluorescence signals.
Discussion
The Phf5a cDNA was initially isolated by SSH between the spermatogonia-derived cell line GC-1spg and the spermatocyte-derived cell line GC-4spc. Phf5a and Phf5b-ps genomic clones were isolated from a 129/ola mouse cosmid library, mapped, subcloned, and sequenced. Genomic Southern blot hybridization with a Phf5a-specific cDNA probe confirmed a pattern typical of a multigene family (data not shown). The two novel genes Phf5a and Phf5b-ps were named Phf5 (PHD-finger protein 5) after the PHD-finger-like domain in their deduced protein sequences, following the established root for PHD-finger proteins (PHF1 [15]; PHF2 [16]). The murine gene Phf5a was mapped by FISH to chromosome 15, region E, a region of conserved synteny with human chromosome 22q13.2 [13], which was found to contain the corresponding human PHF5a gene. The murine pseudogene Phf5b-ps was mapped to chromosome 7, region A, by FISH analysis and electronic database comparison. Furthermore, alignment of the human PHF5a protein to the human and mouse genome databases at NCBI and Celera, respectively, disclosed at least six closely related sequences in the human genome and at least four closely related sequences in the mouse genome, all located on different chromosomes. Their amino acid identity to human and mouse Phf5a, determined by TBlastN, ranges from 68% to 93% (Table 1). The homologous sequences differ from the cDNA sequences of Phf5a and PHF5a, respectively, by several missense mutations, nonsense mutations, deletions, and insertions. They all lack an exon/intron structure, and most of them possess a homopolymeric purine repeat sequence. For the sequences on human chromosomes 19p13.2, 3q13.2, 3q23, Xq24, and 2q24 and for murine Phf5b-ps, flanking direct repeats representing integration sites were identified (Table 1). Taken together, the six human PHF5a-like sequences as well as the four murine Phf5a-related sequences display the hallmarks of processed pseudogenes, which are generated by transposition in the genome via an RNA intermediate [17–19]. The intron of murine Phf5b-ps most probably arose from a second integration event of a retrotransposon into a formerly intronless structure (Fig. 2) [20]. We therefore consider all these sequences processed pseudogenes, designated Ψ1–Ψ6 (human) and Phf5b-ps and Ψ1–Ψ3 (mouse), respectively. Processed pseudogenes are generally not transcribed, because they do not include
the transcriptional control elements present in the functional gene, although the possibility that they come under the influence of control elements of other genes cannot be excluded. Given the results of the RNase protection assay for murine Phf5a, a second ubiquitously expressed family member, not identified so far, most likely exists in the mouse genome. Its sequence identity to Phf5a must be very high to ensure protection over a length of 300 bp with a probe specific for Phf5a (Fig. 3C). We therefore speculate that this sequence may be a further active member of the Phf5 multigene family. The fact that in A. thaliana two nonallelic genes code for an identical protein (Fig. 5) also points to the possible existence of further active Phf5 loci in both the mouse and human genomes. To examine the evolutionary conservation of Phf5 genes, a database alignment was performed; it revealed that the human PHF5a and mouse Phf5a proteins are highly homologous to proteins of other species throughout evolution. The strongest homology of mouse Phf5a and human PHF5a is to the product of the Drosophila gene CG9548 (Accession No. AAF52393), a 110-amino-acid protein with 97% identity, and to the proteins coded by two
A-rich region at the 3′ end
Flanking direct repeats
Flanking repetitive sequence (5′, 3′)
Name
Accession no.
Chromosomal localization
Exon/intron structure
Open reading frame
Human PHF5a
AL008582
22q13.2
+
+
Ψ1 Ψ2 Ψ3 Ψ4 Ψ5 Ψ6
NT_011295.4 NT_005795.7 NT_005832.6 NT_028405.3 NT_005343.7 NT_010274.8a
19p13.2 3q13.2 3q23 Xq24 2q24 15q26ter
– – – – – –
– – – – – –
Mouse Phf5a
AF479286b
15E
+
+
Phf5b-ps
AF479287b
7A
+
–
+
+
Ψ1 Ψ2 Ψ3
5JB87TQU7Q 5J8B7W3ETRa 5J8B7W5P47
Unknown 17E1 7E1
– – –
Truncatedc – –
+ ?
– ?
– + – – + +
+ + + + + –
SINE/Alu
SINE/Alu
LTR/MaLR LTR/MaLR AT_rich
SINE/Alu SINE/Alu SINE/Alu SINE/Alu DNA/Mariner SINE/Alu
LTR/MaLR ?
Coding sequence
Identity to PHF5a by TBlastN (%)
61925..62014, 68975..69141, 69651..69674, 70129..70180
100
137642..137969 2229275..2229606 521795..521464 1904779..1905092 1164901..1165227 1182287..1182453
93 88 74 68 85 77
1307..1358, 1982..2005, 2672..2838, 7298..7387 1703..1845, 2308..2498 974..1303 984768..984771 105440..105174
100
83 85 86 73
Protein sequences were aligned using the TBlastN routine at NCBI and at Celera. Flanking repetitive sequences were identified using Repeat Masker2. Chromosomal localizations are given as far as annotated in the databases. a Complete sequence not given in the database. b Sequences from the cosmid clones MPMGc121H15709Q2 and MPMGc121P06611Q2, respectively, as submitted to GenBank. c Open reading frame for aa 1 to aa 109.
Table 1 Organization of the human PHF5 and murine Phf5 multigene family
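The coding-sequence coordinates in Table 1 can be checked arithmetically. With 1-based, inclusive ranges, the four human PHF5a exon segments span 90 + 167 + 24 + 52 = 333 bp, i.e., 111 codons, which matches a 110-amino-acid protein if, as we assume here, the annotated ranges include the stop codon. The following Python sketch automates this check; the helper names (parse_ranges, coding_length, codon_summary) are hypothetical and not part of any published tool, and ranges written high..low (e.g., 521795..521464) are assumed to lie on the reverse strand.

```python
from __future__ import annotations


def parse_ranges(spec: str) -> list[tuple[int, int]]:
    """Parse GenBank-style ranges such as '61925..62014, 68975..69141'."""
    ranges = []
    for part in spec.split(","):
        start, end = part.strip().split("..")
        ranges.append((int(start), int(end)))
    return ranges


def coding_length(spec: str) -> int:
    """Total length in bp for 1-based, inclusive ranges.

    A range written high..low (e.g. '521795..521464') is assumed to lie on
    the reverse strand; its length is computed the same way.
    """
    return sum(abs(end - start) + 1 for start, end in parse_ranges(spec))


def codon_summary(spec: str) -> tuple[int, int, int]:
    """Return (length in bp, complete codons, leftover bases)."""
    length = coding_length(spec)
    return length, length // 3, length % 3


if __name__ == "__main__":
    table1 = {
        "human PHF5a": "61925..62014, 68975..69141, 69651..69674, 70129..70180",
        "mouse Phf5a": "1307..1358, 1982..2005, 2672..2838, 7298..7387",
    }
    for name, spec in table1.items():
        length, codons, rest = codon_summary(spec)
        # Assuming the ranges include the stop codon: 111 codons = 110 aa + stop.
        print(f"{name}: {length} bp = {codons} codons (+{rest} bases)")
```

Applied to the pseudogene entries, the same check gives, for example, 328 bp for human W1 (137642..137969), which is not a multiple of three; this is consistent with, though not proof of, the absence of an intact open reading frame.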
distinct genes from A. thaliana encoding the same 110-amino-acid protein with 94% identity. Further homologs from yeast are the hypothetical proteins YPR094w from S. cerevisiae (Accession No. S69077) and SPAC23H3.02c from S. pombe (Accession No. CAB40798), which show 55% and 68% identity, respectively (Fig. 5). Proteins from lower organisms such as bacteria showed no significant similarity to these highly conserved proteins, apart from weak homology within the polycysteine PHD-finger region. These data suggest a novel eukaryotic protein family of as yet unknown function. This extensive evolutionary conservation at the protein level suggests that Phf5 proteins contribute to basic cellular functions [21]. The S. cerevisiae homolog of murine Phf5a and human PHF5a, YPR094w, is encoded by an intronless, single-copy gene with an ORF of 324 bp, located on the Watson strand within the intron of SYT1 on chromosome XVI [22]. To date, the molecular and biological functions of YPR094w are unknown, but its systematic deletion is lethal [23], indicating that the protein is required for cell viability. In a global gene expression profiling study in yeast, YPR094w was found to be differentially expressed at different α-factor concentrations and time points [24]. Clustering of genes
Fig. 5. Multiple amino acid sequence alignment of human PHF5a and mouse Phf5a with homologous proteins, i.e., the protein encoded by CG9548 from Drosophila melanogaster (Accession No. AAF52393), At2g30000 from A. thaliana chromosome II (Accession No. AC004680, reverse complement translation of bases 50813..50484), the homolog of At2g30000 from A. thaliana chromosome I (Accession No. AC067971.5, reverse complement translation of bases 40874..40533), SPAC23H3.02c from S. pombe (Accession No. CAB40798), and YPR094w from S. cerevisiae (Accession No. S69077). Amino acids identical to the mouse and human sequences are shown in inverted contrast; amino acids functionally similar to the mouse and human residues are shaded medium gray. Dashed lines indicate gaps introduced to maximize the alignment.
showing similar expression patterns in these experiments placed YPR094w in a cluster together with genes encoding components of RNA polymerases, genes involved in mitotic spindle assembly, and genes responsible for chromosome condensation [24]. In this study, we have shown that Phf5a is localized in the nucleus of NIH3T3 cells in a nonhomogeneous pattern, the nucleolar compartment being almost free of Phf5a. This compartment is not separated from the nuclear matrix by delineating membranes [25], and many nuclear proteins are known to be highly mobile, with diffusion providing an efficient and rapid mode of transport within the nucleus [26]. The exclusion of Phf5a from the nucleoli therefore argues for an at least transient association of Phf5a with a distinct matrix component, comparable to the roaming behavior of a protein whose target is present throughout interphase nuclei but absent from the nucleoli [27]. Analysis of the primary structure of this small protein of 110 amino acids revealed a PHD-finger-like domain in its central part, flanked by N- and C-terminal regions of highly basic composition. All cysteine residues within the PHD-finger motif are evolutionarily conserved (Fig. 5). Alignment of the PHD-finger domain of human PHF5a and murine Phf5a with a set of 40 different PHD-finger motifs [28] grouped the PHD fingers of murine Phf5a, human PHF5a, and Drosophila Polycomblike (Pcl) together, sharing a common ancestor with RiR1357. Alignments were performed with ClustalW on the basis of the sequences published by Aasland et al. [28]. Pcl encodes a protein with two typical PHD-finger domains and a Tudor domain [29]; it regulates the expression of different homeotic genes through a mechanism thought to involve some aspect of chromatin structure [30].
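The Cys(4)–His–Cys(3) arrangement of the PHD finger can be illustrated with a simple pattern scan. The Python sketch below searches a protein sequence for four cysteines, a histidine, and three cysteines separated by variable spacers. The spacer ranges are an assumption chosen for illustration, loosely modeled on published PHD consensus patterns; they are not the criteria used by Aasland et al. [28] or in our ClustalW analysis. A strict pattern of this kind may miss divergent or imperfect fingers such as that of RiR1357, and possibly the PHD-finger-like motif of Phf5a itself. The test sequence EXAMPLE is synthetic.

```python
from __future__ import annotations

import re

# Illustrative C4HC3 pattern; the spacer ranges are an ASSUMPTION of this sketch:
# C-x(1,2)-C-x(8,21)-C-x(2,4)-C-x(4,5)-H-x(2)-C-x(12,46)-C-x(2)-C
PHD_LIKE = re.compile(r"C.{1,2}C.{8,21}C.{2,4}C.{4,5}H.{2}C.{12,46}C.{2}C")


def find_phd_like_motifs(protein: str) -> list[tuple[int, int, str]]:
    """Return (start, end, motif) for each non-overlapping match.

    Positions are 1-based and inclusive, as in sequence annotations.
    """
    seq = protein.upper()
    return [(m.start() + 1, m.end(), m.group()) for m in PHD_LIKE.finditer(seq)]


# Synthetic sequence (not a real protein): the eight putative zinc ligands
# are separated by alanine spacers that satisfy the assumed ranges above.
EXAMPLE = (
    "MK"
    + "C" + "A" * 2 + "C" + "A" * 10 + "C" + "A" * 2 + "C"
    + "A" * 4 + "H" + "A" * 2 + "C" + "A" * 20 + "C" + "A" * 2 + "C"
    + "KK"
)

if __name__ == "__main__":
    for start, end, motif in find_phd_like_motifs(EXAMPLE):
        print(f"C4HC3-like motif at {start}-{end}: {motif}")
```

Replacing the single histidine in EXAMPLE abolishes the match, which reflects the fixed position of the His residue within the motif.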
The PHD-finger domain of RiR1357 is classified as an imperfect PHD finger within this family because it does not display all the characteristic features of the motif [28]. To date, PHD fingers have been found in two major groups of proteins: (1) transcriptional activators, repressors, or cofactors [31–33] and (2) proteins of chromatin-modulating complexes [3,4,34]. Regarding the function of PHD fingers, it is noteworthy that the PHD finger can bind DNA in vitro [35], although Lyngso et al. [38] showed that the PHD finger of the transcription factor SPBP is involved in chromatin-mediated transcriptional regulation by acting as a protein–protein interaction domain, as has been shown for many other PHD-finger proteins. As for the functional organization of Phf5a, the highly basic N- and C-terminal domains may interact directly with DNA through electrostatic interactions [36,37], exposing the central, PHD-finger-containing part of the protein for specific interactions with other proteins or specific DNA sequences.
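The argument about the basic N- and C-terminal regions rests on a simple compositional measure. As a rough illustration, the Python sketch below splits a sequence into N-terminal, central, and C-terminal segments of user-chosen (hypothetical) lengths and reports, for each, an approximate net charge at neutral pH together with the fraction of Lys and Arg residues. The charge model (Lys and Arg +1, Asp and Glu −1; His, terminal groups, and pKa shifts ignored) is an assumption of this sketch; it is not the analysis performed for Phf5a, and the Phf5a sequence is not reproduced here.

```python
from __future__ import annotations

# Approximate side-chain charges at neutral pH (ASSUMPTION: His neutral,
# terminal amino/carboxyl groups and local pKa shifts ignored).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(segment: str) -> int:
    """Sum of approximate side-chain charges in a segment."""
    return sum(CHARGE.get(aa, 0) for aa in segment.upper())


def basic_fraction(segment: str) -> float:
    """Fraction of Lys + Arg residues (0.0 for an empty segment)."""
    segment = segment.upper()
    if not segment:
        return 0.0
    return sum(aa in "KR" for aa in segment) / len(segment)


def terminal_profile(protein: str, n_len: int, c_len: int) -> dict[str, tuple[int, float]]:
    """Net charge and K+R fraction of the N-terminal, central, and C-terminal parts.

    n_len and c_len are hypothetical segment lengths chosen by the user.
    """
    if n_len < 0 or c_len < 0 or n_len + c_len > len(protein):
        raise ValueError("segment lengths must be non-negative and fit the sequence")
    parts = {
        "N-terminal": protein[:n_len],
        "central": protein[n_len:len(protein) - c_len],
        "C-terminal": protein[len(protein) - c_len:],
    }
    return {name: (net_charge(p), basic_fraction(p)) for name, p in parts.items()}


if __name__ == "__main__":
    # Synthetic sequence with basic ends and an acidic middle (not Phf5a).
    demo = "KKRR" + "DDEE" + "RRKK"
    for name, (charge, frac) in terminal_profile(demo, 4, 4).items():
        print(f"{name}: net charge {charge:+d}, K+R fraction {frac:.2f}")
```

A fixed-length split is the simplest choice; a sliding-window profile would avoid committing to segment boundaries in advance.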
Our results provide evidence for a novel multisequence family that is highly conserved during evolution and belongs to the superfamily of PHD-finger genes. We have shown the genomic organization of the first two members of this gene family in mouse (Phf5a and Phf5b-ps) and have provided evidence for several further members in the murine and human genomes, most of them processed pseudogenes. We have analyzed the expression pattern of Phf5a and the subcellular localization of its encoded protein. Interpretation of our data and review of the literature suggest that the Phf5a protein has basic and essential cellular functions, possibly acting as a chromatin-associated protein.
Acknowledgments
The authors thank Ute Teske and Astrid Herwig for technical assistance and Dr. Gregor Schlüter for performing analyses at the Celera database. This work was supported by a grant from the Medical Faculty, Georg-August-University, Göttingen (Forschungsförderprogramm 2001 to RT) and by the Deutsche Forschungsgemeinschaft (SFB 271 to WE).

References
[1] A.L. Adamson, A. Shearn, Molecular genetic analysis of Drosophila ash2, a member of the trithorax group required for imaginal disc pattern formation, Genetics 144 (1996) 621–633.
[2] S. Ikegawa, M. Isomura, Y. Koshizuka, Y. Nakamura, Cloning and characterization of ASH2L and Ash2l, human and mouse homologs of the Drosophila ash2 gene, Cytogenet. Cell Genet. 84 (1999) 167–172.
[3] L. Bordoli, S. Husser, U. Luthi, M. Netsch, H. Osmani, R. Eckner, Functional analysis of the p300 acetyltransferase domain: the PHD finger of p300 but not of CBP is dispensable for enzymatic activity, Nucleic Acids Res. 29 (2001) 4462–4471.
[4] A.L. Newton, B.K. Sharpe, A. Kwan, J.P. Mackay, M. Crossley, The transactivation domain within cysteine/histidine-rich region 1 of CBP comprises two novel zinc-binding modules, J. Biol. Chem. 275 (2000) 15128–15134.
[5] G.D. Schuler, Sequence mapping by electronic PCR, Genome Res. 7 (1997) 541–550.
[6] S. Tascou, K. Nayernia, A. Samani, J. Schmidtke, T. Vogel, W. Engel, P. Burfeind, Immortalization of murine male germ cells at a discrete stage of differentiation by a novel directed promoter-based selection strategy, Biol. Reprod. 63 (2000) 1555–1561.
[7] M. Ascoli, Characterization of several clonal lines of cultured Leydig tumor cells: gonadotropin receptors and steroidogenic responses, Endocrinology 108 (1981) 88–95.
[8] S. Tascou, K. Nayernia, J. Uedelhoven, D. Bohm, R. Jalal, M. Ahmed, W. Engel, P. Burfeind, Isolation and characterization of differentially expressed genes in invasive and non-invasive immortalized murine male germ cells in vitro, Int. J. Oncol. 18 (2001) 567–574.
[9] E.M. Southern, Detection of specific sequences among DNA fragments separated by gel electrophoresis, J. Mol. Biol. 98 (1975) 503–517.
[10] J. Hanes, J. Freudenstein, G. Rapp, K.H. Scheit, Construction of a plasmid containing the complete coding region of human elongation factor 2, Biol. Chem. Hoppe Seyler 373 (1992) 201–204.
[11] M. Zornig, C. Klett, H. Lovec, H. Hameister, H. Winking, S. Adolph, T. Moroy, Establishment of permanent wild-mouse cell lines with readily identifiable marker chromosomes, Cytogenet. Cell Genet. 71 (1995) 37–40.
[12] P. Lichter, T. Cremer, J. Borden, L. Manuelidis, D.C. Ward, Delineation of individual human chromosomes in metaphase and interphase cells by in situ suppression hybridization using recombinant DNA libraries, Hum. Genet. 80 (1988) 224–234.
[13] M. Bucan, B. Gatalica, P. Nolan, A. Chung, A. Leroux, M.H. Grossman, J.H. Nadeau, B.S. Emanuel, M. Budarf, Comparative mapping of 9 human chromosome 22q loci in the laboratory mouse, Hum. Mol. Genet. 2 (1993) 1245–1252.
[14] M.C. Hofmann, S. Narisawa, R.A. Hess, J.L. Millan, Immortalization of germ cells and somatic testicular cells using the SV40 large T antigen, Exp. Cell Res. 201 (1992) 417–435.
[15] M. Coulson, S. Robert, H.J. Eyre, R. Saint, The identification and localization of a human gene with sequence similarity to Polycomblike of Drosophila melanogaster, Genomics 48 (1998) 381–383.
[16] K. Hasenpusch-Theil, B.P. Chadwick, T. Theil, S.K. Heath, D.G. Wilkinson, A.M. Frischauf, PHF2, a novel PHD finger gene located on human chromosome 9q22, Mamm. Genome 10 (1999) 294–298.
[17] A.J. Mighell, N.R. Smith, P.A. Robinson, A.F. Markham, Vertebrate pseudogenes, FEBS Lett. 468 (2000) 109–114.
[18] J.H. Rogers, The origin and evolution of retroposons, Int. Rev. Cytol. 93 (1985) 187–279.
[19] E.F. Vanin, Processed pseudogenes: characteristics and evolution, Annu. Rev. Genet. 19 (1985) 253–272.
[20] Y.M. Man, H. Delius, D.P. Leader, Molecular analysis of elements inserted into mouse gamma-actin processed pseudogenes, Nucleic Acids Res. 15 (1987) 3291–3304.
[21] F. Stanchi, E. Bertocco, S. Toppo, R. Dioguardi, B. Simionati, N. Cannata, R. Zimbello, G. Lanfranchi, G. Valle, Characterization of 16 novel human genes showing high similarity to yeast sequences, Yeast 18 (2001) 69–80.
[22] J.M. Cherry, C. Adler, C. Ball, S.A. Chervitz, S.S. Dwight, E.T. Hester, Y. Jia, G. Juvik, T. Roe, M. Schroeder, S. Weng, D. Botstein, SGD: Saccharomyces Genome Database, Nucleic Acids Res. 26 (1998) 73–79.
[23] E.A. Winzeler, D.D. Shoemaker, A. Astromoff, H. Liang, K. Anderson, B. Andre, R. Bangham, R. Benito, J.D. Boeke, H. Bussey, A.M. Chu, C. Connelly, K. Davis, F. Dietrich, S.W. Dow, M. El Bakkoury, F. Foury, S.H. Friend, E. Gentalen, G. Giaever, J.H. Hegemann, T. Jones, M. Laub, H. Liao, R.W. Davis, et al., Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis, Science 285 (1999) 901–906.
[24] C.J. Roberts, B. Nelson, M.J. Marton, R. Stoughton, M.R. Meyer, H.A. Bennett, Y.D. He, H. Dai, W.L. Walker, T.R. Hughes, M. Tyers, C. Boone, S.H. Friend, Signaling and circuitry of multiple MAPK pathways revealed by a matrix of global gene expression profiles, Science 287 (2000) 873–880.
[25] U. Scheer, R. Hock, Structure and function of the nucleolus, Curr. Opin. Cell Biol. 11 (1999) 385–390.
[26] M. Dundr, T. Misteli, Functional architecture in the cell nucleus, Biochem. J. 356 (2001) 297–310.
[27] A.B. Houtsmuller, S. Rademakers, A.L. Nigg, D. Hoogstraten, J.H. Hoeijmakers, W. Vermeulen, Action of DNA repair endonuclease ERCC1/XPF in living cells, Science 284 (1999) 958–961.
[28] R. Aasland, T.J. Gibson, A.F. Stewart, The PHD finger: implications for chromatin-mediated transcriptional regulation, Trends Biochem. Sci. 20 (1995) 56–59.
[29] S. O'Connell, L. Wang, S. Robert, C.A. Jones, R. Saint, R.S. Jones, Polycomblike PHD fingers mediate conserved interaction with enhancer of zeste protein, J. Biol. Chem. 276 (2001) 43065–43073.
[30] A. Lonie, R. D'Andrea, R. Paro, R. Saint, Molecular characterisation of the Polycomblike gene of Drosophila melanogaster, a trans-acting negative regulator of homeotic gene expression, Development 120 (1994) 2629–2636.
[31] N.E. Ladendorff, S. Wu, J.S. Lipsick, BS69, an adenovirus E1A-associated protein, inhibits the transcriptional activity of c-Myb, Oncogene 20 (2001) 125–132.
[32] X. Wang, S. Yeh, G. Wu, C.L. Hsu, L. Wang, T. Chiang, Y. Yang, Y. Guo, C. Chang, Identification and characterization of a novel androgen receptor coregulator ARA267-alpha in prostate cancer cells, J. Biol. Chem. 276 (2001) 40417–40423.
[33] Y.G. Gangloff, J.C. Pointud, S. Thuault, L. Carre, C. Romier, S. Muratoglu, M. Brand, L. Tora, J.L. Couderc, I. Davidson, The TFIID components human TAF(II)140 and Drosophila BIP2 (TAF(II)155) are novel metazoan homologues of yeast TAF(II)47 containing a histone fold and a PHD finger, Mol. Cell Biol. 21 (2001) 5109–5121.
[34] D.A. Bochar, J. Savard, W. Wang, D.W. Lafleur, P. Moore, J. Cote, R. Shiekhattar, A family of chromatin remodeling factors related to Williams syndrome transcription factor, Proc. Natl. Acad. Sci. USA 97 (2000) 1038–1043.
[35] U. Schindler, H. Beckmann, A.R. Cashmore, HAT3.1, a novel Arabidopsis homeodomain protein containing a conserved cysteine-rich region, Plant J. 4 (1993) 137–150.
[36] V. Brendel, S. Karlin, Association of charge clusters with functional domains of cellular transcription factors, Proc. Natl. Acad. Sci. USA 86 (1989) 5698–5702.
[37] R. Wintjens, J. Lievin, M. Rooman, E. Buisine, Contribution of cation-pi interactions to the stability of protein–DNA complexes, J. Mol. Biol. 302 (2000) 395–410.
[38] C. Lyngso, G. Bouteiller, C.K. Damgaard, D. Ryom, S. Sanchez-Munoz, P.L. Norby, B.J. Bonven, P. Jorgensen, Interaction between the transcription factor SPBP and the positive cofactor RNF4: an interplay between protein binding zinc fingers, J. Biol. Chem. 275 (2000) 26144–26149.