The zebrafish genes encoding the Polycomb repressive complex (PRC) 1


Gene 475 (2011) 10–21


Gene journal homepage: www.elsevier.com/locate/gene

Perrine Le Faou, Pamela Völkel, Pierre-Olivier Angrand ⁎

Chromatinomics, Interdisciplinary Research Institute, University Lille Nord de France, Université de Lille 1 Sciences et Technologies/CNRS USR 3078, 50 Avenue Halley, Parc Scientifique de la Haute Borne, F-59658 Villeneuve d'Ascq Cedex, France

Article info

Article history:
Accepted 23 December 2010
Available online 30 December 2010
Received by A.J. van Wijnen

Keywords: Phylogeny; Chromatin

Abstract

Polycomb repression controls the regulation of hundreds of genes involved in development, signalling or cancer and is mediated by essentially two classes of chromatin-associated protein complexes, the Polycomb repressive complexes 1 and 2 (PRC1 and PRC2). PRC2 trimethylates histone H3 at lysine 27, and this H3K27me3 epigenetic mark serves as a docking site for the PRC1 protein complex. Drosophila core PRC1 is composed of four subunits, Polycomb (Pc), Posterior sex combs (Psc), Polyhomeotic (Ph), and Sex combs extra (Sce). Each of these proteins has multiple orthologs in vertebrates. In particular, mammalian genomes encode five Pc family members (CBX2, 4, 6, 7 and 8), six Psc family members (BMI1, PCGF1, 2, 3, 5, and 6), three Ph family members (PHC1, 2 and 3) and two Sce family members (RING1 and RNF2), generating an enormous scope for potential combinatorial diversity. To identify the corresponding PRC1 genes in zebrafish, homology searches were undertaken and identified a total of 19 genes. These genes were classified using phylogenetic, gene organization and gene location analyses. The zebrafish genes encoding the PRC1 protein complex include 8 Pc orthologs (cbx2, cbx4, cbx6a, cbx6b, cbx7a, cbx7b, cbx8a and cbx8b), 6 Psc orthologs (bmi1a, bmi1b, pcgf1, pcgf5a, pcgf5b and pcgf6), 4 Ph orthologs (phc1, phc2a, phc2b and phc3) and a single Sce ortholog (rnf2). Our results indicate that the potentially high number of distinct PRC1 protein complexes generated by the combinatorial assembly of these components appeared early in vertebrate evolution. In addition to conserved gene organization and syntenies, transcript analyses revealed that the transcriptional regulation leading to the synthesis of various isoforms is also conserved at genes encoding the PRC1 components, highlighting a possibly important biological role for these isoforms. © 2010 Elsevier B.V. All rights reserved.

Abbreviations: PcG, Polycomb group; PRC, Polycomb repressive complex; Pc box, Polycomb box; Pc, Polycomb; Sce, Sex combs extra; Ph, Polyhomeotic; Psc, Posterior sex combs; SDS-PAGE, Sodium dodecyl sulfate-polyacrylamide gel electrophoresis; ORF, Open reading frame; EST, Expressed sequence tag; shRNA, short hairpin RNA; PBS, Phosphate-buffered saline; GST, Glutathione S-transferase; GFP, Green fluorescent protein; RT-PCR, Reverse transcription-polymerase chain reaction; CBX, Chromobox homolog; PCGF, Polycomb group ring finger; BMI, B lymphoma Mo-MLV insertion; RNF, Ring Finger protein; PHC, Polyhomeotic homolog; bp, base pair; dpf, days postfertilization; ES cell, Embryonic stem cell; MEF, Mouse embryonic fibroblast.

⁎ Corresponding author. Tel.: +33 3 62 53 17 16; fax: +33 3 62 53 17 01. E-mail address: [email protected] (P.-O. Angrand).

0378-1119/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2010.12.012

1. Introduction

In metazoans, the anterior–posterior axis is specified through defined expression patterns of homeotic genes. In Drosophila, the activity of maternally produced and early embryonic expressed transcription factors establishes a specific combination of homeotic gene regulation required at each segment. Although the expression of these sequence-specific activators and repressors is transient, the defined homeotic gene expression profile is maintained throughout the development of the fly. The Polycomb (PcG) and trithorax (trxG) groups of proteins, respectively, act to maintain these repressed or active transcription states through many rounds of cell division (Kennison, 1995; Schuettengruber et al., 2007). These proteins function at the level of chromatin organization, maintaining expression patterns either by establishing a permissive environment for transcription or by sequestering genes in a silent chromatin state (Ringrose and Paro, 2004; Pirrotta, 1998). Although primarily known for their role in cell identity maintenance during the establishment of the Drosophila body plan, several PcG and trxG members have now been implicated in the control of various cellular processes including chromosome X-inactivation in mammals (Wang et al., 2001; de Napoles et al., 2004), cell cycle control (Oktaba et al., 2008), cell fate decisions (Sparmann and van Lohuizen, 2006), stem cell differentiation (Pasini et al., 2007), senescence (Bracken et al., 2007), and tumourigenesis (Sparmann and van Lohuizen, 2006; Leung et al., 2004; Lessard and Sauvageau, 2003).

PcG proteins interact with each other to form multimeric, chromatin-associated protein complexes of two general types: the Polycomb repressive complex 1 (PRC1) and PRC2 (Shao et al., 1999; Kuzmichev et al., 2002; Cao et al., 2002; Ringrose and Paro, 2004; Völkel and Angrand, 2007). In Drosophila, the core of PRC2 consists of Enhancer of zeste [E(z)], Suppressor of zeste 12 [Su(z)12] and Extra sex combs (Esc) and is involved in the initiation of gene repression. E(z), the catalytic subunit of PRC2, trimethylates lysine 27 of histone H3 (H3K27me3) (Müller et al., 2002; Nekrasov et al., 2007). The PRC1

P. Le Faou et al. / Gene 475 (2011) 10–21

protein complex, which contains the core proteins Polycomb (Pc), Polyhomeotic (Ph), Posterior sex combs (Psc) and Sex combs extra (Sce), recognizes the H3K27me3 mark through the chromodomain of the Pc protein (Fischle et al., 2003; Min et al., 2003).

In mammals, Polycomb-mediated gene silencing is more complex than in Drosophila since numerous PcG orthologs exist (Levine et al., 2002). Concerning the PRC1 protein complex, mammalian genomes encode five orthologs for Pc (CBX2, 4, 6, 7 and 8), six Psc orthologs (BMI1, PCGF1, 2, 3, 5, and 6), three Ph orthologs (PHC1, 2 and 3) and two Sce orthologs (RING1 and RNF2). There is evidence that different PRC1 protein complexes exist in cells (Maertens et al., 2009, unpublished observation). Moreover, mice deficient for individual PRC1 components share homeotic defects, but harbour distinct phenotypes (van der Lugt et al., 1994; Akasaka et al., 1996; Coré et al., 1997; Takihara et al., 1997; del Mar Lorente et al., 2000; Suzuki et al., 2002; Voncken et al., 2003; Isono et al., 2005), indicating that different PRC1 complexes might have at least some non-redundant target genes. Thus, understanding the individual biological role of each PcG paralogous gene remains a main objective towards deciphering PcG-mediated gene repression in vertebrates.

The zebrafish (Danio rerio) is a widely used vertebrate model for studying development and morphogenesis (King, 2009). Indeed, the zebrafish embryo has many characteristics that make it ideal for developmental studies (Kimmel et al., 1995). Fertilization and subsequent embryonic development are external and occur synchronously in large clutches. The embryos are relatively large and their development can be observed easily through the chorion. During the first 24 h of development, the embryos are completely transparent, allowing the observation of developing organs, even deep inside the living organism. Moreover, embryonic development is rapid; after about 2 days, all common vertebrate-specific body features can be seen, including a compartmentalized brain, eyes, ears, and all the internal organs. Furthermore, D. rerio has become a valued model in cancer research, not only for the development of zebrafish models of human tumours, but also because of its use to study a broad array of cancer-related phenomena, including genomic instability, cancer epigenetics, tumour angiogenesis, tumour invasion, metastasis or tumour immunology (Feitsma and Cuppen, 2008; Leach, 2009). Finally, the recent development of target-selected gene inactivation methods in zebrafish has allowed the application of powerful reverse genetics tools to this vertebrate model (Wienholds and Plasterk, 2004; Amacher, 2008; Ekker, 2008).

Here, we report on the zebrafish PcG genes that encode components of the PRC1 protein complex. We describe 19 zebrafish PcG genes and their relationships with their human counterparts using phylogenetic, gene organization and gene location analyses.

2. Materials and methods

2.1. Data sources of genomic and cDNA sequences

Zebrafish (D. rerio) as well as medaka (Oryzias latipes), fugu blowfish (Takifugu rubripes) and spotted green pufferfish (Tetraodon nigroviridis) shotgun traces and assemblies were obtained from the Ensembl Genome Server (http://www.ensembl.org/index.html). The cDNA and EST sequence data of zebrafish were obtained from the National Center for Biotechnology Information (NCBI) (Sayers et al., 2010; http://www.ncbi.nlm.nih.gov/).

2.2. Identification of genes encoding the PRC1

Zebrafish orthologs of Pc, Psc, Ph and Sce were identified in a homology-based TBLASTN search (Altschul et al., 1990) of genomic DNA and cDNA nucleotide sequences, using as queries the amino acid sequences of full-length human proteins or conserved domains from CBX, BMI1/PCGF, PHC and RING1 family members. BLASTP


was used to calculate percentages of identity between the zebrafish proteins and their human counterparts at the level of full-length proteins or conserved domains.

2.3. Multiple sequence alignments and phylogenetic analysis

The multiple sequence alignments were done using ClustalW (Thompson et al., 1994). Based on the alignments, phylogenetic trees were constructed using the neighbour-joining method.

2.4. Gene location analysis

Chromosomal locations and gene orders of zebrafish genes were obtained from the latest zebrafish whole-genome assembly (Zv8), from zebrafish genome mapping information on the Zebrafish Information Network (ZFIN) website (Bradford et al., 2011; http://zfin.org) and from the NCBI Map Viewer (Sayers et al., 2010; http://www.ncbi.nlm.nih.gov/mapview/). To analyze syntenies, zebrafish genes surrounding the genes of interest were identified and subjected to ortholog identification against the human non-redundant protein database, and the identified proteins were linked to the NCBI Map Viewer. The chromosomal locations of these ortholog pairs were drawn along human and zebrafish chromosomes, thus revealing conserved syntenies between human and zebrafish.

2.5. Nomenclature

Zebrafish genes were named in agreement with the ZFIN nomenclature guidelines (http://zfin.org/zf_info/nomen.html), and the Human Genome Organization (HUGO, http://www.hugo-international.org/) gene symbols were used to describe human genes.

2.6. Zebrafish maintenance

Zebrafish (TU strain) were maintained at 27.5 °C in a 14/10 h light/dark cycle. The evening before spawning, males and females were separated into individual tanks. Spontaneous spawning occurred when the light turned on, and larvae were collected 5 days later for total RNA extraction.

2.7. RT-PCR

Total RNA was extracted from 50 larvae at 5 dpf using the RNeasy Mini Kit (Qiagen), and DNA was removed using an RNase-Free DNase Set (Qiagen). One microgram of total RNA was used to synthesize cDNAs using the Superscript III kit (Invitrogen) according to the manufacturer's instructions, and the resulting cDNAs were subjected to PCR. The primer sequences were as follows:

cbx6a1_1438F: CAGCACAAGATGCTTCACTCCTGC and cbx6a1_1572R: GATCAATGCAAGGTTCGGATGCCT
cbx6a2_1295F: ACCGATTGGCACCCTGAAATGGC and cbx6a2_1461R: TGCAGTTGTGGTGGGAGAGTTTGG
cbx7_1691F: CCTCACCTGCTTGTCCTGTAACCA and cbx7_1933R: AGGCTCGGCTTTGATTTCTGCC
cbx7b_304F: GCTCATCTGCCACTCTCGCTGG and cbx7b_428R: TCCCTTCTGTCCCACTCTTCCTCA

These primer pairs identify the cbx6a1, cbx6a2, cbx7a and cbx7b transcripts, respectively.

2.8. Constructs and vectors engineering

Expression vectors were generated by site-specific recombination, using the GATEWAY system (Invitrogen), of PCR-amplified ORFs into the TAP epitope-tagged Moloney murine leukemia virus-based vector pRP-NTAP-Gw, previously described (Souza et al., 2009), for in vitro expression. Glutathione S-transferase (GST)-tagged E. coli expression vectors were derived from pGEX-4T1 (Amersham) (Souza et al., 2009). Full-length open reading frames of all cDNAs were PCR amplified from IMAGE cDNA clones purchased from imaGenes GmbH,


cloned into the GATEWAY entry vector pDONR201 (Invitrogen) and sequence verified. Entry clones were then recombined into suitable expression vectors by GATEWAY LR reactions. The inserted ORFs were subcloned from the following MGC cDNA clones: cbx4a1, cDNA clone MGC:198077 IMAGE:9039066; cbx4a2, cDNA clone MGC:158588 IMAGE:7118352; cbx6a1, cDNA clone MGC:101049 IMAGE:7152136; cbx7a, cDNA clone MGC:110152 IMAGE:7289412; cbx8a, cDNA clone MGC:153854 IMAGE:8001741; cbx8b, cDNA clone MGC:111978 IMAGE:7399500; bmi1a, cDNA clone MGC:56403 IMAGE:5605189; bmi1b, cDNA clone MGC:63927 IMAGE:6790897; phc2a2, cDNA clone IMAGE:6969670; and rnf2, cDNA clone MGC:55753 IMAGE:3816506.

2.9. In vitro GST protein binding assays

GST fusion proteins were expressed in E. coli BL21 (DE3) and purified on glutathione-Sepharose 4B (GE Healthcare) according to the manufacturer's instructions. GST-proteins were then fixed on glutathione-Sepharose 4B and stored in STE buffer (10 mM Tris–HCl pH 8, 150 mM NaCl, 1 mM EDTA and complete protease inhibitors [Roche]). In vitro translated proteins used for GST pull-downs were produced with the TnT7 Quick Coupled Transcription/Translation System (Promega). Immobilized GST-fusion proteins were incubated with in vitro translated proteins overnight on rotating wheels at 4 °C in Binding buffer (50 mM Tris–HCl pH 7.5, 500 mM NaCl, 1 mM EDTA, 0.5% NP-40, 10% glycerol and complete protease inhibitors [Roche]). The beads were washed four times with Binding buffer and resuspended in loading buffer (12 mM Tris–HCl pH 6.8, 10% glycerol, 0.4% SDS, 80 mM DTT and 0.025% bromophenol blue). Bound proteins were resolved by SDS-PAGE and visualized by Western blotting. Input material corresponds to 17–20% of the material used in the binding assays.

2.10. Antibodies and western blotting

The protein A moiety of the TAP tag was revealed with a rabbit peroxidase anti-peroxidase (PAP) antibody (P1291, Sigma; used at a dilution of 1:10,000). For Western blotting, protein samples in SDS loading buffer were electrophoresed on 4–12% Bis–Tris gels (Invitrogen) and transferred to nitrocellulose membranes (Schleicher & Schuell). The membranes were blocked in 10% milk powder in PBS-T (1X PBS with 0.1% Tween20) overnight at 4 °C, incubated for 1 h at room temperature with the PAP antibody in PBS-T, and washed three times for 10 min in PBS-T. The signal was detected using chemiluminescence reagent (ECL, Amersham) on imaging film (GE Healthcare).

3. Results and discussion

3.1. Identification of zebrafish PcG genes that encode components of the PRC1 protein complex

To gain insights into the mechanisms of Polycomb-mediated gene repression, we investigated the phylogenetic conservation of PcG genes encoding components of the PRC1 protein complex. The Drosophila core PRC1 complex is composed of four proteins, Pc, Psc, Ph and Sce, whereas in mammals each of these proteins possesses several orthologs. With five Pc proteins (CBX2, 4, 6, 7 and 8), six Psc orthologs (BMI1, PCGF1, 2, 3, 5, and 6), three Ph orthologs (PHC1, 2 and 3) and two Sce orthologs (RING1 and RNF2), there is an enormous scope of combinatorial diversity in mammals (Whitcomb et al., 2007; Simon and Kingston, 2009). In order to identify ortholog sequences in zebrafish, BLASTN (Basic Local Alignment Search Tool, Altschul et al., 1990) analyses of NCBI (National Center for Biotechnology Information) non-redundant cDNA, EST (Expressed Sequence Tag) and genomic (Whole Genome Shotgun assembly of the zebrafish genome version Zv8, Wellcome Trust Sanger Institute) databases were performed using the human PcG sequences as queries. A total of 19 zebrafish genes were identified and classified based on homology with the corresponding human sequences (Table 1 and Supplementary File 1).

Table 1. Human PcG genes that encode PRC1 components and their zebrafish orthologous and paralogous genes.

Family                      Human           Zebrafish ortholog   Zebrafish paralog
Polycomb (Pc)               CBX2            cbx2 (42%)
                            CBX4            cbx4 (41%)
                            CBX6            cbx6a (43%)          cbx6b (42%)
                            CBX7            cbx7b (42%)          cbx7a (29%)
                            CBX8            cbx8a (44%)          cbx8b (42%)
Posterior sex combs (Psc)   BMI1            bmi1a (80%)          bmi1b (76%)
                            PCGF1           pcgf1 (77%)
                            PCGF2 (MEL18)   –
                            PCGF3           –
                            PCGF5           pcgf5a (72%)         pcgf5b (73%)
                            PCGF6 (MBRL)    pcgf6 (44%)
Polyhomeotic (Ph)           PHC1            phc1 (46%)
                            PHC2            phc2b (47%)          phc2a (35%)
                            PHC3            phc3 (38%)
Sex combs extra (Sce)       RNF2            rnf2 (89%)
                            RING1           –

Numbers in parentheses represent percentages of amino acid identities with the corresponding human orthologs.
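The combinatorial scope mentioned in Section 3.1 can be made concrete with a small count. The sketch below is illustrative only: it assumes, as a simplification not stated in the text, that a core complex contains exactly one member of each of the four families, and it uses the family sizes reported for human and zebrafish. The function name `core_combinations` is hypothetical.

```python
from math import prod

# Family sizes as reported in the text (human: 5 Pc, 6 Psc, 3 Ph, 2 Sce;
# zebrafish: 8 Pc, 6 Psc, 4 Ph, 1 Sce).
HUMAN = {"Pc": 5, "Psc": 6, "Ph": 3, "Sce": 2}
ZEBRAFISH = {"Pc": 8, "Psc": 6, "Ph": 4, "Sce": 1}


def core_combinations(family_sizes):
    """Count the ways to pick one member per family.

    Simplifying assumption: one subunit from each family per complex,
    with every combination allowed.
    """
    return prod(family_sizes.values())


print("zebrafish PRC1 genes:", sum(ZEBRAFISH.values()))          # 19
print("human combinations:", core_combinations(HUMAN))           # 180
print("zebrafish combinations:", core_combinations(ZEBRAFISH))   # 192
```

Under this assumption the two species offer a similar theoretical number of core combinations, even though the distribution of paralogs across the four families differs.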

3.2. The zebrafish Pc family

All Pc protein family members have an N-terminal chromodomain and a C-terminal Pc box (Müller, 1995; Schoorlemmer et al., 1997). The chromodomain specifically recognizes the H3K27me3 mark and is thus involved in targeting the PRC1 protein complex to chromatin (Fischle et al., 2003; Min et al., 2003; Bernstein et al., 2006). The Pc box is a motif of around 15 amino acids (XTXXTXNXLTVTXKE, where X is a non-conserved amino acid) required for the interaction of Pc family members with the PRC1 components of the Sce family, and for transcriptional repression of PRC1 target genes (Bárdos et al., 2000; Satijn and Otte, 1999; Schoorlemmer et al., 1997).

In their recent evolutionary study of PcG proteins, Whitcomb et al. (2007) identified 3 to 5 Pc family members, termed Cbx proteins, in vertebrates, including 4 Pc protein members in zebrafish. However, our database search identified 8 Pc homologs in zebrafish (Table 1 and Supplementary File 1). Sequence alignments, phylogenetic analyses, gene organization and location studies (Fig. 1 and Supplementary File 2) indicate that every human CBX protein has at least one orthologous counterpart in zebrafish. Moreover, human CBX6, CBX7 and CBX8 are each represented by two paralogs in the zebrafish genome. We identified 3 members not

Fig. 1. The zebrafish Pc family members. (A) Phylogenetic tree of human (Hs) and zebrafish (Dr) Pc family members based on the alignment of the amino acid sequences of their full length proteins. The Drosophila Pc protein (Dm_Pc) was used to root the tree. (B) Exon–intron structures of zebrafish cbx2, cbx4, cbx6a, cbx6b, cbx7, cbx8a and cbx8b genes and human CBX2, CBX4, CBX6, CBX7 and CBX8 genes. Position of the Stop codon is indicated as a red bar. White parts indicate untranslated regions whereas dashed parts show Pc box locations. Exon and intron sizes are not shown at the same scale. (C) The comparison of Pc gene family loci in zebrafish and human reveals conserved syntenies. Pc gene family members are shown in red while neighbouring genes are in black. Chromosome numbers of human (Hs) and zebrafish (Dr) are shown. Lines between the compared chromosomes connect relative positions of orthologous gene pairs in the two species. The bar lengths are not proportional to the distances between genes.
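As an illustration of how the Pc box consensus given in Section 3.2 (XTXXTXNXLTVTXKE, X being any residue) could be searched for in a protein sequence, here is a minimal sketch using a regular expression. It is not the authors' method, the function names are hypothetical, and the test peptide is invented for the example rather than taken from a real Cbx protein. A strict consensus match is also an assumption: real Pc boxes may deviate from the consensus at some positions.

```python
import re

# Consensus quoted in the text; X marks a non-conserved position.
PC_BOX_CONSENSUS = "XTXXTXNXLTVTXKE"


def consensus_to_regex(consensus):
    """Turn X into 'any uppercase letter'; all other residues must match exactly."""
    return re.compile("".join("[A-Z]" if c == "X" else c for c in consensus))


def find_pc_box(protein):
    """Return (0-based start, matched motif) for each strict consensus match."""
    pattern = consensus_to_regex(PC_BOX_CONSENSUS)
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


# Hypothetical peptide built for illustration; not a real Cbx sequence.
toy_protein = "MSAAG" + "ATGGTGNALTVTDKE" + "KRS"
print(find_pc_box(toy_protein))  # [(5, 'ATGGTGNALTVTDKE')]
```

A position-specific scoring approach would tolerate mismatches better than this strict pattern, at the cost of needing a set of aligned Pc box sequences.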


previously annotated as cbx genes in the databases. These genes are cbx6b (NCBI GeneID: 556231), a novel CBX6 ortholog, as well as cbx7a (NCBI GeneID: 550551) and cbx7b, two novel CBX7 orthologs. The identification of a second CBX6 ortholog led us to rename the zebrafish cbx6 gene (NCBI GeneID: 799294) as cbx6a. In addition to the conserved chromodomain and Pc box, both Cbx6a and Cbx6b possess the three CBX6-specific motifs described by Senthilkumar and Mishra (2009) (see Supplementary File 3). Cx6.1 is a highly conserved bipartite region specifically associated with CBX6 proteins and rich in serine, threonine, asparagine, proline and basic amino acids. Cx6.2 is rich in basic amino acids, whereas Cx6.3 is rich in proline and acidic amino acids and located closer to the Pc box (Senthilkumar and Mishra, 2009). Moreover, gene locus maps for cbx6a, cbx6b and CBX6 on zebrafish and human chromosomes, respectively, reveal conserved syntenies for the three loci (Fig. 1). Thus, the conserved chromosomal organization, together with the presence of conserved specific protein motifs, demonstrates that cbx6a and cbx6b are indeed two CBX6 orthologs.

Human CBX gene organization studies reveal that all human CBX genes except CBX7 are composed of 5 coding exons (Fig. 1). In contrast, the human CBX7 gene contains 6 exons. Similarly, the zebrafish cbx2, cbx4, cbx6a, cbx6b, cbx8a and cbx8b genes are all organized in 5 exons, whereas both cbx7a and cbx7b, the zebrafish genes coding for the novel CBX7 orthologs, are composed of 6 coding exons. Furthermore, conserved syntenies are observed at the zebrafish cbx7a locus on chromosome 3, at the zebrafish cbx7b locus on chromosome 12, and at the CBX7 locus on human chromosome 22 (Fig. 1). A number of cbx7a transcripts are described in the cDNA and EST databases. In contrast, the cbx7b transcript and protein are deduced from genomic sequences, but not supported by experimental data.
Indeed, BLAST searches in the zebrafish cDNA and EST databases failed to identify cbx7b transcripts. To determine whether the cbx7b gene is expressed in zebrafish, we therefore performed RT-PCR experiments (Fig. 2). Total RNAs from 5 dpf zebrafish larvae were subjected to reverse transcription and PCR amplification using primers specific for the cbx7b transcripts, as well as for cbx7a as a control. Fig. 2B shows that the expected 243 bp and 125 bp PCR products were detected, indicating that both cbx7a and cbx7b are expressed in zebrafish larvae. The presence of functional CBX7 orthologs in the zebrafish genome therefore invalidates the hypothesis that CBX7 appeared within the mammalian lineage via a gene expansion event, as proposed by Whitcomb et al. (2007).

Thus, the zebrafish Pc gene family comprises 8 members, cbx2, cbx4, cbx6a, cbx6b, cbx7a, cbx7b, cbx8a and cbx8b. Zebrafish orthologs can be identified for all human CBX family members. The phylogenetic analysis of Pc family members indicates that each human CBX protein shows more similarity with its zebrafish orthologs than with the other human CBX paralogs (Fig. 1A). CBX6, CBX7 and CBX8 are duplicated in the zebrafish genome, whereas CBX4 is duplicated in the fugu blowfish (T. rubripes), and CBX2 and CBX8 are duplicated in the spotted green pufferfish (T. nigroviridis) genomes (Senthilkumar and Mishra, 2009). Comparisons of mammalian and teleost fish genomes have shown that in teleosts an additional genome duplication occurred shortly after the teleost radiation. This partial genome duplication, or whole genome duplication followed by rapid gene loss, accounts for about 20% of the zebrafish genes (Meyer and Schartl, 1999; Meyer and Van de Peer, 2005). Interestingly, this increase in the number of copies of CBX genes correlates with an increase of Hox clusters in zebrafish (Amores et al., 1998; Crow et al., 2006).

3.3. Zebrafish cbx4 and cbx6a genes code for both Cbx isoforms containing or lacking the Pc box

In the course of our analyses, we noticed that the zebrafish cbx4 gene gives rise to two transcripts by alternative splicing-polyadenylation events (Fig. 3A). These transcripts were named cbx4a1 (NM_205749) and cbx4a2 (NM_001081694) (see Supplementary File 1). The cbx4a1 transcript is composed of 5 exons and codes for a protein of 477 amino acids. In contrast, cbx4a2 contains 6 exons coding for a protein of 145 amino acids lacking the Pc box, but still having a chromodomain.

A similar feature occurs at the zebrafish cbx6a gene (Fig. 3A). A cbx6a transcript is described in the databases (NM_001003768). Like cbx4a2, this transcript, named cbx6a1, contains 6 exons and codes for a protein of 411 amino acids lacking the Pc box. However, this Cbx6a1 isoform possesses a chromodomain and the CBX6-specific motifs Cx6.1 and Cx6.2. Surprisingly, the last exon consists only of the last three coding nucleotides (TAA), corresponding to the Stop codon, and untranslated sequences. However, the analysis of the cbx6a genomic sequence (clone BX571961, nucleotides 104,541–108,495) allowed us to predict a 5-exon transcript, named cbx6a2, encoding a protein of 487 amino acids that would contain a Pc box and the CBX6-specific motif Cx6.3. Since cbx6a2 transcripts are not described in the cDNA and EST databases, we checked for cbx6a2 expression using RT-PCR. Total RNAs were extracted from 5 dpf zebrafish larvae and subjected to reverse transcription and PCR amplification using primers specific for the cbx6a transcripts cbx6a1 and cbx6a2. Fig. 3B shows that the expected 135 bp and 167 bp PCR products were detected, indicating that both cbx6a1 and cbx6a2 are expressed in zebrafish larvae. Thus, with the same architecture as the cbx4 gene, the zebrafish cbx6a locus expresses two different transcripts by alternative splicing-polyadenylation events: a transcript of 5 exons coding for a Pc box-containing isoform, and another transcript composed of 6 exons encoding a Cbx6a isoform lacking the Pc box.
Remarkably, a similar expression mechanism based on alternative splicing-polyadenylation events occurs at the human CBX2 locus (Fig. 3C). A transcript composed of 5 exons encodes a CBX2 isoform of 532 amino acids (NP_005180) that contains a Pc box, whereas a second transcript of 4 exons codes for a second CBX2 isoform (NP_116036) of 211 amino acids lacking the Pc box. The Pc box-containing isoforms, Cbx4, Cbx6a and CBX2, are generated from transcripts composed of 5 exons. The zebrafish isoforms lacking the Pc box are encoded by transcripts that contain 6 exons and result from delayed polyadenylation. In contrast, the human CBX2 isoform lacking the Pc box is translated from a 4-exon transcript which is generated by premature polyadenylation. Nevertheless, the conservation of CBX isoforms lacking the Pc box in both human and zebrafish highlights a possibly important biological function for these isoforms.

3.4. The zebrafish Psc family

Drosophila Psc is a RING finger-containing protein involved in PRC1 complex formation and the inhibition of remodelling and transcription (King et al., 2005). Mammalian genomes contain 6 Psc orthologs. In humans, these genes are BMI1 (PCGF4), PCGF1, PCGF2 (MEL18), PCGF3, PCGF5 and PCGF6 (MBRL). A database search also identifies 6 Psc family members in zebrafish (see Supplementary File 1). These Psc homologs all have a similar protein architecture including a conserved RING-finger domain (see Supplementary File 4). However, phylogenetic and gene loci (Fig. 4) analyses indicate that not all of the human BMI1/PCGF genes possess an orthologous counterpart in zebrafish. In particular, we failed to identify orthologs for PCGF2 and PCGF3, whereas PCGF1 and PCGF6 each have one ortholog, and both BMI1 and PCGF5 have two orthologs in zebrafish (Table 1). Thus, the zebrafish Psc family comprises the bmi1a, bmi1b, pcgf1, pcgf5a, pcgf5b and pcgf6 genes.

The absence of orthologs for PCGF2 and PCGF3 in zebrafish is somewhat surprising since the BMI1/PCGF paralogs seem to have non-redundant functions in mammals. For instance, mice deficient for


Fig. 2. cbx7a and cbx7b are expressed in zebrafish larvae. (A) Schematic representation of zebrafish cbx7a and cbx7b genes and proteins. On the gene organization cartoons, white parts indicate untranslated regions whereas red parts correspond to regions coding for the Pc boxes. On the protein drawings, the chromodomain is indicated as a blue triangle and the Pc box as a red circle. Amino acid length of the proteins is indicated. Position of primers used for the RT-PCR is shown as green arrows. (B) Detection of cbx7a and cbx7b transcripts in zebrafish larvae by RT-PCR. Total RNAs were extracted from 5 dpf zebrafish larvae and subjected to reverse transcription and PCR amplification using primers specific for cbx7a and cbx7b, as indicated in (A), in presence (+RT) or absence (− RT) of reverse transcriptase. The expected size of the PCR products is 243 and 125 bp for cbx7a and cbx7b, respectively.

Fig. 3. CBX isoforms lacking the Pc box. (A) Schematic representation of zebrafish cbx4 and cbx6a transcripts and proteins. On the transcript cartoons, white parts indicate untranslated regions whereas red parts correspond to regions coding for the Pc boxes. On the protein drawings, the chromodomain is indicated as a blue triangle and the Pc box as a red circle. Amino acid length of the proteins is indicated. Primers used for the RT-PCR are shown as green arrows. (B) Detection of cbx6a1 and cbx6a2 transcripts in zebrafish larvae by RT-PCR. Total RNAs were extracted from 5 dpf zebrafish larvae and subjected to reverse transcription and PCR amplification using primers specific for cbx6a1 and cbx6a2, as indicated in (A), in presence (+RT) or absence (− RT) of reverse transcriptase. The expected size of the PCR products is 135 and 167 bp for cbx6a1 and cbx6a2, respectively. (C) Schematic representation of human CBX2 transcripts and proteins. White parts indicate untranslated regions whereas the Pc box coding region is shown in red on the CBX2 isoform 1 transcript. On the protein drawings, the chromodomain is indicated as a blue triangle and the Pc box as a red circle. Amino acid length of the proteins is indicated.
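The expected product sizes quoted in the legends of Figs. 2 and 3 can be cross-checked against the primer names listed in Section 2.7. The sketch below assumes, without confirmation from the text, that the number in each primer name is the transcript coordinate of the outermost base of that primer, so that the amplicon length is reverse minus forward plus one. It also takes the cbx6a2 reverse primer to be named cbx6a2_1461R. The helper names are hypothetical.

```python
import re

# Primer names from Section 2.7, keyed by the transcript each pair detects.
PRIMER_PAIRS = {
    "cbx6a1": ("cbx6a1_1438F", "cbx6a1_1572R"),
    "cbx6a2": ("cbx6a2_1295F", "cbx6a2_1461R"),
    "cbx7a": ("cbx7_1691F", "cbx7_1933R"),
    "cbx7b": ("cbx7b_304F", "cbx7b_428R"),
}


def primer_position(name):
    """Parse '<gene>_<position><F|R>' into (position, orientation)."""
    match = re.search(r"_(\d+)([FR])$", name)
    if match is None:
        raise ValueError(f"unrecognised primer name: {name}")
    return int(match.group(1)), match.group(2)


def amplicon_size(forward, reverse):
    """Amplicon length, assuming both numbers are inclusive outer coordinates."""
    f_pos, f_dir = primer_position(forward)
    r_pos, r_dir = primer_position(reverse)
    if (f_dir, r_dir) != ("F", "R"):
        raise ValueError("expected a forward primer followed by a reverse primer")
    return r_pos - f_pos + 1


for transcript, (fwd, rev) in PRIMER_PAIRS.items():
    print(transcript, amplicon_size(fwd, rev), "bp")
# cbx6a1 135 bp, cbx6a2 167 bp, cbx7a 243 bp, cbx7b 125 bp
```

Under the stated coordinate assumption, the four computed sizes agree with the 135, 167, 243 and 125 bp products reported in the figure legends.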


Fig. 4. The zebrafish Psc family members. (A) Phylogenetic tree of human (Hs) and zebrafish (Dr) Psc family members based on the alignment of the amino acid sequences of their full length proteins. The Drosophila Psc protein (Dm_Psc) was used to root the tree. (B) Exon–intron structures of zebrafish pcgf1, bmi1a, bmi1b, pcgf5a, pcgf5b and pcgf6 genes and human PCGF1, BMI1, PCGF5 and PCGF6 genes. Position of the Stop codon is indicated as a red bar. White parts indicate untranslated regions. Exon and intron sizes are not shown at the same scale. (C) The comparison of Psc gene family loci in zebrafish and human reveals conserved syntenies. Psc gene family members are shown in red while neighbouring genes are in black. Chromosome numbers of human (Hs) and zebrafish (Dr) are indicated. Lines between the compared chromosomes connect relative positions of orthologous gene pairs in the two species. The bar lengths are not proportional to the distances between genes.

Bmi1 or Pcgf2 (Mel18) share homeotic defects, but also harbour distinct phenotypes (van der Lugt et al., 1994; Akasaka et al., 1996). In addition, Bmi1- and Pcgf2-deficient mice survive to birth but die at an age varying from around 3 to 20 weeks. Moreover, transcriptional profiling in DAOY human medulloblastoma cells using short hairpin RNA (shRNA)-mediated expression knock down revealed that around 70% of the genes regulated by BMI1 or PCGF2 are controlled by only one of the two Psc family members but not the other one (Wiederschain et al., 2007). To investigate whether PCGF2 and PCGF3 were specifically lost in the zebrafish genome or whether

these members appeared later in the tetrapod clade, we searched for Psc family members in other fish species such as medaka (O. latipes), fugu blowfish (T. rubripes) and spotted green pufferfish (T. nigroviridis), for which genomic resources are available (Roest Crollius and Weissenbach, 2005). BMI1, PCGF1 and PCGF5 were identified in all three fish genomes. In contrast, PCGF2 was not identified in medaka, whereas PCGF3 and PCGF6 were not found in the fugu and spotted green pufferfish genomes (see Supplementary File 5). We therefore conclude that all BMI1/PCGF family members were present before the teleost radiation and that some of them were later lost, including PCGF2 and PCGF3 in

P. Le Faou et al. / Gene 475 (2011) 10–21

zebrafish. Such a loss of genes has also been observed for other gene families in zebrafish (Itoh and Konishi, 2007; Blomme et al., 2006; Sémon and Wolfe, 2007). In this context, it is worth noting that, in contrast to CBX family members for which paralogs differ greatly in length and in the presence of various motifs (Senthilkumar and Mishra, 2009), BMI1/PCGF paralogs present extended motif similarities over the protein length (see Supplementary File 6).
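The reasoning used above to distinguish a zebrafish-specific loss from a later appearance in tetrapods can be written down as a small presence/absence check. The sketch below is only an illustration of that logic: the names `found_in_other_fish`, `found_in_zebrafish` and `classify` are hypothetical, and the presence entries for PCGF2, PCGF3 and PCGF6 are inferred from the absences reported in the text (the authoritative data are in Supplementary File 5).

```python
# Fish genomes surveyed in addition to zebrafish (from the text).
OTHER_FISH = {"medaka", "fugu", "tetraodon"}

# Genomes in which an ortholog of each human Psc family member was found.
# BMI1, PCGF1 and PCGF5 are stated to be present in all three genomes.
# The other entries are ASSUMED from the reported absences.
found_in_other_fish = {
    "BMI1": {"medaka", "fugu", "tetraodon"},
    "PCGF1": {"medaka", "fugu", "tetraodon"},
    "PCGF2": {"fugu", "tetraodon"},   # assumed: only medaka reported as lacking it
    "PCGF3": {"medaka"},              # assumed: fugu and tetraodon reported as lacking it
    "PCGF5": {"medaka", "fugu", "tetraodon"},
    "PCGF6": {"medaka"},              # assumed: fugu and tetraodon reported as lacking it
}

# Human Psc family members with at least one zebrafish ortholog (from the text).
found_in_zebrafish = {"BMI1", "PCGF1", "PCGF5", "PCGF6"}


def classify(gene: str) -> str:
    """Apply the parsimony argument: a gene seen in any teleost predates the radiation."""
    if gene in found_in_zebrafish:
        return "retained in zebrafish"
    if found_in_other_fish[gene] & OTHER_FISH:
        return "present before the teleost radiation, lost in zebrafish"
    return "no fish ortholog found, possibly tetrapod-specific"


summary = {gene: classify(gene) for gene in found_in_other_fish}
for gene, verdict in sorted(summary.items()):
    print(f"{gene}: {verdict}")
```

Under these assumptions, PCGF2 and PCGF3 fall into the "lost in zebrafish" class, which matches the conclusion drawn in the text.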


3.5. The zebrafish Ph family

The Drosophila polyhomeotic (ph) locus has been duplicated and contains two related proximal and distal transcription units encoding the Ph proteins Ph-p and Ph-d, respectively (Hodgson et al., 1997). Genetic studies showed that both proteins contribute to Ph function in Drosophila (Dura et al., 1987). Mammals do not have such a duplicated locus but possess 3 Ph homologs, PHC1, PHC2 and PHC3, located on different chromosomes. Ph family proteins contain a number of conserved domains and motifs such as a SAM domain and an FCS motif (Hodgson et al., 1997; Tonkin et al., 2002; Zhang et al., 2004). The SAM domain is about 80 amino acids long, is involved in protein–protein interactions and is able to form homo- and hetero-oligomers (Kim et al., 2002; 2005), whereas the FCS motif is a putative RNA-binding zinc finger (Zhang et al., 2004). A database search identified 4 Ph homologs in zebrafish, named phc1, phc2a, phc2b and phc3 (Table 1 and see Supplementary File 1). Zebrafish phc1, also named ph1, is annotated in the databases, and phc2a, also named ph2, has been the subject of previous studies (Kawamura et al., 2002a; Komoike et al., 2005). In addition, our search identifies phc2b (GeneID: 394177, zgc:56685) as a novel PHC2 ortholog and phc3 (GeneID: 558094) as a PHC3 ortholog. These Ph homologs all have a similar protein architecture including conserved SAM domains and FCS motifs (see Supplementary File 7). Phylogenetic, gene loci and synteny (Fig. 5) analyses show that the human PHC1 and PHC3 genes each possess one orthologous counterpart in zebrafish, whereas PHC2 has two zebrafish orthologs (Table 1). Kawamura et al. (2002a) showed that the phc2a gene encodes two transcripts, phc2a1 (ph2α) and phc2a2 (ph2β), using alternative promoters (Fig. 5B). These alternative transcripts have distinct spatiotemporal expression patterns along developing somites in zebrafish embryos. Interestingly, this gene organization is conserved in humans (Fig. 5B). In mammals, PHC2 is expressed as two isoforms, PHC2a (NP_93157.1) and PHC2b (NP_004418.2), of 90 kDa and 36 kDa, respectively (Yamaki et al., 2002). The short isoform corresponds to the last 323 C-terminal amino acids of the longer one.

Fig. 5. The zebrafish Ph family members. (A) Phylogenetic tree of human (Hs) and zebrafish (Dr) Ph family members based on the alignment of the amino acid sequences of their full length proteins. The Drosophila Ph protein (Dm_Ph) was used to root the tree. (B) Exon–intron structures of zebrafish phc1, phc2a, phc2b and phc3 genes and human PHC1, PHC2 and PHC3 genes. Position of the Stop codon is indicated as a red bar. White parts indicate untranslated regions. Exon and intron sizes are not shown at the same scale. (C) The comparison of Ph gene family loci in zebrafish and human reveals conserved syntenies. Ph gene family members are shown in red while neighbouring genes are in black. Chromosome numbers of human (Hs) and zebrafish (Dr) are shown. Lines between the compared chromosomes connect relative positions of orthologous gene pairs in the two species. The bar lengths are not proportional to the distances between genes.
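As a rough consistency check on the PHC2 isoform sizes quoted in the text, the length of the short isoform (323 amino acids) can be converted to an approximate mass. The sketch below assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a value taken from the paper; the names used are hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue, in kDa (ASSUMPTION,
# not a value from the paper).
AVG_RESIDUE_MASS_KDA = 0.110

# Values quoted in the text for the human PHC2 isoforms.
SHORT_ISOFORM_RESIDUES = 323   # last 323 C-terminal amino acids of the long isoform
REPORTED_SHORT_KDA = 36
REPORTED_LONG_KDA = 90


def approx_mass_kda(n_residues: int) -> float:
    """Estimate a protein mass from its length using the average residue mass."""
    return n_residues * AVG_RESIDUE_MASS_KDA


estimated_short_kda = approx_mass_kda(SHORT_ISOFORM_RESIDUES)
mass_fraction = REPORTED_SHORT_KDA / REPORTED_LONG_KDA

print(f"Estimated mass of the 323-residue isoform: {estimated_short_kda:.1f} kDa")
print(f"Short isoform as a fraction of the long isoform mass: {mass_fraction:.2f}")
```

The estimate lands close to the reported 36 kDa, so the quoted length and mass of the short isoform are mutually consistent under this approximation.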
To determine whether the same gene organization also occurs at the zebrafish phc2b locus, we aligned all the available complete or partial phc2b EST sequences to the corresponding genomic sequence. Our analysis reveals that the phc2b locus encodes at least three different transcripts based on alternative transcriptional initiation (see Supplementary File 8). Interestingly, the isoform we named phc2b2 is initiated within intron 9, which is reminiscent of what occurs at the phc2a and PHC2 gene loci. This indicates that alternative transcriptional regulation is conserved in all three homologs and is likely to have a biological significance.

3.6. The zebrafish Sce family is composed of a unique member, rnf2

The Drosophila RING finger-containing protein Sce, also known as dRing, possesses a histone H2A ubiquitin ligase activity and is the catalytic subunit of the PRC1 protein complex (Wang et al., 2004). The RING finger is required for Sce H2A ubiquitin ligase activity since a single point mutation in this domain (R65C) abrogates the activity. Mammalian genomes code for two Sce orthologs, RING1 and RNF2 (RING1B). In spite of their high degree of amino acid similarity, RING1 and RNF2 exhibit functional differences. Recombinant RNF2, but not RING1, exhibits a histone H2A ubiquitin ligase activity and mono-ubiquitinylates lysine 119 of H2A (H2AK119ub1) (Wang et al., 2004; Buchwald et al., 2006). RNF2 ubiquitin ligase activity is dependent upon its RING finger, and both RING1 and BMI1 stimulate the RNF2 enzymatic activity (Cao et al., 2005). In particular, mouse embryonic fibroblasts (MEFs) derived from mice inactivated for the function of both Bmi1 and Rnf2 show a global decrease of H2AK119ub1 leading to derepression of some Hox genes (Cao et al., 2005). Moreover, Ring1- and Rnf2-deficient mice exhibit completely different phenotypes. Mice heterozygous for Ring1 present homeotic transformations and skeletal defects (del Mar Lorente et al., 2000), whereas mice heterozygous for Rnf2 show no skeletal phenotype (Voncken et al., 2003). However, Rnf2 is essential for gastrulation since homozygous Rnf2-deficient mice die at embryonic day 10.5 (Voncken et al., 2003). The severity of such an early phenotype is not observed for any other PcG gene encoding the PRC1 protein complex. With the exception of Rnf2, mice lacking expression of PRC1 proteins do not die during early embryogenesis, but instead exhibit more restricted phenotypes later in development. The severity of the Rnf2 phenotype also correlates with a dramatic reduction of H2AK119ub1 in ES cells, whereas Ring1-deficient ES cells do not show such a global H2AK119ub1 decrease (de Napoles et al., 2004). Interestingly, our in silico screen of the zebrafish genome identifies only one Sce homolog, rnf2 (Table 1 and Supplementary File 1). Phylogenetic analysis indicates that human RNF2 is more closely related to its zebrafish ortholog Rnf2 than to its paralog RING1 (Fig. 6 and Supplementary File 9). Moreover, a conserved synteny is observed for rnf2 on zebrafish chromosome 2 and RNF2 on human chromosome 1 (Fig. 6). We failed to identify a RING1 ortholog in the zebrafish genome. To test whether RING1 has been specifically lost in the zebrafish genome or whether it appeared later in the tetrapod clade, we searched for Sce family members in the other fish species medaka (O. latipes), fugu blowfish (T. rubripes) and spotted green pufferfish (T. nigroviridis). We failed to identify RING1 orthologs in fugu and tetraodon, but a RING1 orthologous gene could be identified in the medaka genome (see Supplementary File 10). Thus, we conclude that both RING1 and RNF2 were present before the teleost radiation and that RING1 was later lost from the zebrafish genome, as is the case for some of the BMI1/PCGF family members.

Fig. 6. The zebrafish Sce family is composed of a unique member, rnf2. (A) Phylogenetic tree of the human (Hs) and zebrafish (Dr) Sce family members based on the alignment of the amino acid sequences of their full length proteins. The Drosophila Sce protein (Dm_Sce) was used to root the tree. (B) Exon–intron structures of zebrafish rnf2 and human RNF2 genes. Position of the Stop codon is indicated as a red bar. White parts indicate untranslated regions. Exon and intron sizes are not shown at the same scale. (C) The comparison of Sce loci in zebrafish and human reveals a conserved synteny. RNF2 and rnf2 are shown in red while neighbouring genes are in black. Chromosome numbers of human (Hs) and zebrafish (Dr) are indicated. Lines between the compared chromosomes connect relative positions of orthologous gene pairs in the two species. The bar lengths are not proportional to the distances between genes.

3.7. Protein–protein interactions between zebrafish PRC1 components

Using GST pull down and yeast two hybrid assays, Kawamura et al. (2002b) showed that zebrafish Cbx2 (Pc1) and Bmi1a (Psc1) interact


Fig. 7. Interactions between zebrafish PRC1 components. (A) GST pull down experiments showing that Bmi1a and Bmi1b directly interact with Rnf2 and the isoforms Cbx4a1, Cbx4a2 and Phc2a2, and showing that Rnf2 directly interacts with Cbx4a1, Phc2a2 and Bmi1a, but not with the Cbx4 isoform lacking the Pc box, Cbx4a2. GST-tagged Bmi1a, Bmi1b, Rnf2 and GFP as a control were expressed in E. coli, bound to glutathione-Sepharose, and incubated with in vitro expressed TAP-tagged Cbx4a1, Cbx4a2, Phc2a2, Rnf2 or Bmi1a proteins, as indicated. After washing, proteins were separated by SDS-PAGE and Western blots probed with peroxidase anti-peroxidase (αPAP) antibodies to reveal TAP-tagged proteins. (B) The Cbx Pc box is required for direct interaction with Rnf2, but not required for Cbx interactions with Phc2a2 and Bmi1a. GST-tagged Cbx4a1, Cbx4a2, Cbx8a, Cbx8b, Cbx6a1, Cbx7a and GFP as a control were expressed in E. coli, bound to glutathione-Sepharose, and incubated with in vitro expressed TAP-tagged Bmi1a, Phc2a2 or Rnf2 proteins, as indicated and Western blots performed as indicated in (A). (C) Schematic representation of interactions between zebrafish PRC1 components in vitro. The dashed line indicates that the interaction occurs with Pc box-containing Cbx isoforms, but not with Cbx isoforms lacking the Pc box.
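The pull-down outcomes reported for Fig. 7 can be encoded as a small bait/prey table, which makes the Pc box rule easy to check mechanically. This is only an illustrative sketch: the dictionary layout and the names `HAS_PC_BOX`, `pulldown` and `pc_box_rule_holds` are hypothetical, and the entries record only what the text and caption state (True for a detected interaction, False for none).

```python
# Pc box status of the Cbx proteins tested (Cbx4a2 and Cbx6a1 lack the Pc box).
HAS_PC_BOX = {
    "Cbx4a1": True, "Cbx4a2": False, "Cbx6a1": False,
    "Cbx7a": True, "Cbx8a": True, "Cbx8b": True,
}

# (GST-tagged bait, TAP-tagged prey) -> was an interaction detected?
pulldown = {}

# Fig. 7A: Bmi1a and Bmi1b baits bind all four preys listed in the text.
for bait in ("Bmi1a", "Bmi1b"):
    for prey in ("Cbx4a1", "Cbx4a2", "Phc2a2", "Rnf2"):
        pulldown[(bait, prey)] = True

# Fig. 7A: Rnf2 bait binds Cbx4a1, Phc2a2 and Bmi1a, but not Cbx4a2.
pulldown.update({
    ("Rnf2", "Cbx4a1"): True,
    ("Rnf2", "Phc2a2"): True,
    ("Rnf2", "Bmi1a"): True,
    ("Rnf2", "Cbx4a2"): False,
})

# Fig. 7B: every Cbx bait binds Bmi1a and Phc2a2; only four of them bind Rnf2.
RNF2_BINDERS = {"Cbx4a1", "Cbx7a", "Cbx8a", "Cbx8b"}
for cbx in HAS_PC_BOX:
    pulldown[(cbx, "Bmi1a")] = True
    pulldown[(cbx, "Phc2a2")] = True
    pulldown[(cbx, "Rnf2")] = cbx in RNF2_BINDERS


def pc_box_rule_holds() -> bool:
    """Check that a Cbx protein binds Rnf2 exactly when it carries a Pc box."""
    for (bait, prey), bound in pulldown.items():
        pair = {bait, prey}
        if "Rnf2" not in pair:
            continue
        partner = (pair - {"Rnf2"}).pop()
        if partner in HAS_PC_BOX and bound != HAS_PC_BOX[partner]:
            return False
    return True


print("Pc box rule consistent with the recorded pull-downs:", pc_box_rule_holds())
```

Keeping the observed Rnf2 binders in a separate set from the Pc box annotation means the check compares two independent pieces of information instead of restating one of them.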

together. To extend these data and demonstrate that other zebrafish PcG members show the multiple interactions expected for components of the PRC1 protein complex (Kyba and Brock, 1998), we performed a number of GST pull down experiments (Fig. 7). First, zebrafish Bmi1a, Bmi1b and Rnf2 proteins, together with a GFP control, were expressed in E. coli as recombinant GST fusion proteins, bound to glutathione-Sepharose beads and incubated with in vitro expressed TAP-tagged PRC1 components, including Cbx4a1, Cbx4a2, Phc2a2, Rnf2 and Bmi1a. In these experiments, we used both a Pc box-containing Cbx4 isoform (Cbx4a1) and the Cbx4 isoform lacking the Pc box (Cbx4a2) in order to determine whether the two isoforms have the same binding specificities. A short Phc2a isoform (Phc2a2) was also used, since the short PHC2b isoform is the main isoform present in human PRC1 protein complexes (unpublished observations). After extensive washes, bound proteins were eluted and separated by SDS-PAGE, and Western blots were probed with peroxidase anti-peroxidase (PAP) antibodies for detection of TAP-tagged PcG proteins. Fig. 7A shows that both Bmi1a and Bmi1b directly interact with Cbx4a1, Cbx4a2, Phc2a2 and Rnf2, whereas Rnf2 directly interacts with Cbx4a1, Phc2a2 and Bmi1a, but not with Cbx4a2, the Cbx4 isoform lacking the Pc box. To further investigate the role of the Pc box in Cbx interaction with the other PRC1 components, zebrafish Cbx4a1, Cbx4a2, Cbx8a, Cbx8b, Cbx6a1 and Cbx7a proteins, together with a GFP control, were expressed in E. coli as recombinant GST fusion proteins, bound to glutathione-Sepharose beads and incubated with in vitro expressed TAP-tagged Bmi1a, Phc2a2 or Rnf2. Fig. 7B shows that all Cbx proteins tested directly interact with Bmi1a and Phc2a2 in a similar manner. In contrast, Cbx4a1, Cbx8a, Cbx8b and Cbx7a proteins all interact with Rnf2, whereas the two zebrafish Cbx isoforms lacking the Pc box, Cbx4a2 and Cbx6a1, do not. Several studies using truncated proteins showed that deletion of the Pc box leads to the loss of the CBX-RNF2 interaction (Kyba and Brock, 1998; Schoorlemmer et al., 1997; Satijn and Otte, 1999; Bárdos et al., 2000); here, we show that the same phenomenon occurs with the naturally expressed zebrafish isoforms Cbx4a2 and Cbx6a1. Taken together, our data indicate that the zebrafish components of the PRC1 protein complex are able to interact directly with each other in vitro, and that the Cbx Pc box is required for Cbx-Rnf2 interactions (Fig. 7C).

4. Conclusion

Chromatin-associated Polycomb group (PcG) proteins maintain transcriptional repression of hundreds of genes involved in development, signalling or cancer. Biochemical studies in Drosophila as well as in mammalian cells have revealed that PcG proteins form at least two classes of protein complexes named Polycomb repressive complexes 1 and 2 (PRC1 and PRC2). The PRC2 protein complex trimethylates histone H3 on Lysine 27. This H3K27me3 epigenetic mark is central to PcG-silenced chromatin. Drosophila core PRC1 is composed of four subunits, Polycomb (Pc), Posterior sex combs (Psc), Polyhomeotic (Ph), and Sex combs extra (Sce). Each of these proteins has multiple orthologs in vertebrates. In particular, mammalian genomes encode five Pc family members (CBX2, 4, 6, 7 and 8), six Psc family members (BMI1, PCGF1, 2, 3, 5, and 6), three Ph family members (PHC1, 2 and 3) and two Sce family members (RING1 and RNF2). Consequently, there is enormous scope for potential combinatorial diversity within the mammalian PRC1 protein complex. Indeed, there is evidence that distinct PRC1 complexes with different protein composition exist in cells (Maertens et al., 2009). Moreover, mice deficient for individual PRC1 components share homeotic defects, but harbour distinct phenotypes, indicating that different PRC1 complexes might control at least some non-redundant target genes. To gain insights into the evolution of Polycomb-mediated repression in vertebrates, and since zebrafish is a powerful model organism for studies in developmental genetics, we have undertaken a search for the PcG genes that encode the PRC1 protein complex in this fish. Our database search indicated that a total of 19 zebrafish genes code for the PRC1 core components. These genes include 8 Pc orthologs, 6 Psc orthologs, 4 Ph orthologs and a single Sce ortholog. The zebrafish genome sequence is almost complete, and we anticipate that we have identified all the PRC1 orthologous genes. Based on phylogenetic, gene organization and synteny analyses, the zebrafish Pc family gene members identified were cbx2, cbx4, cbx6a, cbx6b, cbx7a, cbx7b, cbx8a and cbx8b. Moreover, CBX6, CBX7 and CBX8 have been maintained in zebrafish after the whole genome duplication that occurred shortly after the teleost radiation and are present as two paralogs each. A similar situation concerns the Ph gene family. Both PHC1 and PHC3 have one orthologous counterpart, phc1 and phc3, respectively, whereas two zebrafish orthologs of PHC2, phc2a and phc2b, were identified. In contrast, phylogenetic and gene loci analyses indicate that not all of the human BMI1/PCGF genes possess an orthologous counterpart in zebrafish. We identified 6 Psc family members in zebrafish, including one ortholog each for PCGF1 and PCGF6 (pcgf1 and pcgf6) and two orthologs each for BMI1 and PCGF5 (bmi1a, bmi1b, pcgf5a and pcgf5b), but we failed to identify zebrafish orthologous counterparts for PCGF2 and PCGF3. RING1 is also absent from the zebrafish genome since we identified only one Sce homolog, rnf2.
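The gene inventory summarized above lends itself to a short tally. The sketch below records the human-to-zebrafish ortholog assignments given in the text, recomputes the per-family counts and the total of 19 genes, and then estimates the number of subunit combinations. The combinatorial figure rests on a simplifying assumption that is not made in the paper: each core complex is taken to contain exactly one member of each family, ignoring isoforms, stoichiometry and whether every combination actually assembles. All variable names are hypothetical.

```python
from math import prod

# Human gene -> zebrafish orthologs, per PRC1 family, as listed in the text.
orthologs = {
    "Pc": {
        "CBX2": ["cbx2"], "CBX4": ["cbx4"], "CBX6": ["cbx6a", "cbx6b"],
        "CBX7": ["cbx7a", "cbx7b"], "CBX8": ["cbx8a", "cbx8b"],
    },
    "Psc": {
        "BMI1": ["bmi1a", "bmi1b"], "PCGF1": ["pcgf1"], "PCGF2": [],
        "PCGF3": [], "PCGF5": ["pcgf5a", "pcgf5b"], "PCGF6": ["pcgf6"],
    },
    "Ph": {"PHC1": ["phc1"], "PHC2": ["phc2a", "phc2b"], "PHC3": ["phc3"]},
    "Sce": {"RING1": [], "RNF2": ["rnf2"]},
}

# Number of genes per family in each species.
human_counts = {family: len(genes) for family, genes in orthologs.items()}
zebrafish_counts = {
    family: sum(len(fish_genes) for fish_genes in genes.values())
    for family, genes in orthologs.items()
}
total_zebrafish_genes = sum(zebrafish_counts.values())

# Human genes without a zebrafish ortholog, and those retained as two paralogs.
lost = [h for genes in orthologs.values() for h, z in genes.items() if not z]
duplicated = [h for genes in orthologs.values() for h, z in genes.items() if len(z) == 2]

# ASSUMPTION: one subunit from each of the four families per complex.
human_combinations = prod(human_counts.values())
zebrafish_combinations = prod(zebrafish_counts.values())

print("Zebrafish genes per family:", zebrafish_counts)
print("Total zebrafish PRC1 genes:", total_zebrafish_genes)
print("Lost in zebrafish:", lost)
print("Present as two paralogs:", duplicated)
print("Subunit combinations (human, zebrafish):", human_combinations, zebrafish_combinations)
```

Under this one-subunit-per-family assumption, the two species end up with a similar number of possible combinations even though the individual family sizes differ.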
Because orthologs for PCGF2, PCGF3 and RING1 can be found in the genomes of other fish species, including medaka, fugu or tetraodon, we conclude that these genes were present in the genome of the common ancestor of tetrapods and fishes, but were lost after the teleost radiation in some fish species such as zebrafish. This indicates that the diversity in PRC1-like protein complexes appeared early in vertebrate evolution. In addition, the number of genes for several PRC1 components, such as the Pc and Ph family members, is even higher in zebrafish, in correlation with the increased number of genomic targets such as the Hox clusters. In addition to conserved gene organization and syntenies, transcript analyses revealed that transcriptional regulation is also conserved at genes encoding PRC1 components. This phenomenon is exemplified by the regulation occurring at the PHC2 loci. The human PHC2 gene encodes two transcripts from alternative promoters, and consequently two protein isoforms of 90 kDa and 36 kDa. The short isoform, PHC2b, consists of the last 323 C-terminal amino acids of the longer isoform, PHC2a, but still contains the conserved FCS motif and the SAM domain. Interestingly, the two zebrafish orthologs of PHC2, phc2a and phc2b, present the same gene organization and transcriptional regulation based on alternative transcriptional initiation, and each also encodes a short transcript in addition to the long one. The conservation of alternative promoter usage at the human and zebrafish PHC2/phc2 loci underlines the potential biological role of the two isoforms produced at these loci. Similarly, based on alternative splicing-polyadenylation events, the human CBX2 locus generates two transcripts. A transcript composed of 5 exons encodes a full length CBX2 isoform of 532 amino acids, whereas a second transcript made of 4 exons codes for a second CBX2 isoform of 211 amino acids, lacking the Pc box.
Remarkably, we identify similar transcriptional regulation events based on alternative splicing-polyadenylation occurring at the zebrafish genes cbx4 and cbx6a. However, the zebrafish isoforms lacking the Pc box (Cbx4a2 and Cbx6a1) are encoded by transcripts composed of 6 exons, whereas the full length isoforms (Cbx4a1 and Cbx6a2) are encoded by 5-exon transcripts. Nevertheless, the conservation of CBX isoforms lacking the Pc box in both human and zebrafish highlights a possible important biological function for these Pc isoforms. Since the Pc box is required for interaction with the other PRC1 component RNF2/Rnf2, we propose that the human CBX2 and zebrafish cbx4 and cbx6a genes might encode full length Cbx proteins involved in PRC1 function, as well as isoforms lacking the Pc box that would be part of distinct protein complexes lacking the Sce ortholog RNF2/Rnf2.

Acknowledgments

We thank Claire Rosnoblet for advice on GST pull down experiments. This work was supported by the CNRS, l'Université de Lille 1 Sciences et Technologies, l'Université de Lille 2 and by the Ministère de la Recherche et de l'Enseignement Supérieur, the Région Nord-Pas de Calais and the European Regional Developmental Funds through the "Contrat de Projets Etat-Région (CPER) 2007–2013".

Appendix A. Supplementary data

Supplementary data to this article can be found online at doi:10.1016/j.gene.2010.12.012.

References

Akasaka, T., Kanno, M., Balling, R., Mieza, M.A., Taniguchi, M., Koseki, H., 1996. A role for mel-18, a Polycomb group-related vertebrate gene, during the anteroposterior specification of the axial skeleton. Development 122, 1513–1522.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Amacher, S.L., 2008. Emerging gene knockout technology in zebrafish: zinc-finger nucleases. Brief. Funct. Genomic. Proteomic. 7, 460–464.
Amores, A., Force, A., Yan, Y.L., Joly, L., Amemiya, C., Fritz, A., et al., 1998. Zebrafish hox clusters and vertebrate genome evolution. Science 282, 1711–1714.
Bárdos, J.I., Saurin, A.J., Tissot, C., Duprez, E., Freemont, P.S., 2000. HPC3 is a new human orthologue that interacts and associates with RING1 and Bmi1 and has transcriptional repression properties. J. Biol. Chem. 275, 28785–28792.
Bernstein, E., Duncan, E.M., Masui, O., Gill, J., Heard, E., Allis, C.D., 2006. Mouse Polycomb proteins bind differently to methylated histone H3 and RNA and are enriched in facultative heterochromatin. Mol. Cell. Biol. 26, 2560–2569.
Blomme, T., Vandepoele, K., De Bodt, S., Simillion, C., Maere, S., Van de Peer, Y., 2006. The gain and loss of genes during 600 million years of vertebrate evolution. Genome Biol. 7, R43.
Bracken, A.P., Kleine-Kohlbrecher, D., Dietrich, N., Pasini, D., Gargiulo, G., Beekman, C., et al., 2007. The Polycomb group proteins bind throughout the INK4A-ARF locus and are disassociated in senescent cells. Genes Dev. 21, 525–530.
Bradford, Y., Conlin, T., Dunn, N., Fashena, D., Frazer, K., Howe, D.G., et al., 2011. ZFIN: enhancements and updates to the zebrafish model organism database. Nucleic Acids Res. 39, D822–D829.
Buchwald, G., van der Stoop, P., Weichenrieder, O., Perrakis, A., van Lohuizen, M., Sixma, T.K., 2006. Structure and E3-ligase activity of the ring–ring complex of Polycomb proteins Bmi1 and Ring1b. EMBO J. 25, 2465–2474.
Cao, R., Tsukada, Y., Zhang, Y., 2005. Role of Bmi-1 and Ring1A in H2A ubiquitylation and Hox gene silencing. Mol. Cell 20, 845–854.
Cao, R., Wang, L., Wang, H., Xia, L., Erdjument-Bromage, H., Tempst, P., et al., 2002. Role of histone H3 lysine 27 methylation in Polycomb-group silencing. Science 298, 1039–1043.
Coré, N., Bel, S., Gaunt, S.J., Aurrand-Lions, M., Pearce, J., Fisher, A., et al., 1997. Altered cellular proliferation and mesoderm patterning in Polycomb-M33-deficient mice. Development 124, 721–729.
Crow, K.D., Stadler, P.F., Lynch, V.F., Amemiya, C., Wagner, G.P., 2006. The "fish-specific" Hox cluster duplication is coincident with the origin of teleosts. Mol. Biol. Evol. 23, 121–136.
de Napoles, M., Mermoud, J.E., Wakao, R., Tang, Y.A., Endoh, M., Appanah, R., et al., 2004. Polycomb group proteins Ring1A/B link ubiquitylation of histone H2A to heritable gene silencing and X inactivation. Dev. Cell 7, 663–676.
del Mar Lorente, M., Marcos-Gutierrez, C., Perez, C., Schoorlemmer, J., Ramirez, A., Magin, T., et al., 2000. Loss- and gain-of-function mutations show a Polycomb group function for Ring1A in mice. Development 127, 5093–5100.
Dura, J.M., Randsholt, N.B., Deatrick, J., Erk, I., Santamaria, P., Freeman, J.D., et al., 1987. A complex genetic locus, polyhomeotic, is required for segmental specification and epidermal development in D. melanogaster. Cell 51, 829–839.
Ekker, S.C., 2008. Zinc finger-based knockout punches for zebrafish genes. Zebrafish 5, 121–123.
Feitsma, H., Cuppen, E., 2008. Zebrafish as a cancer model. Mol. Cancer Res. 6, 685–694.
Fischle, W., Wang, Y., Jacobs, S.A., Kim, Y., Allis, C.D., Khorasanizadeh, S., 2003. Molecular basis for the discrimination of repressive methyl-lysine marks in histone H3 by polycomb and HP1 chromodomains. Genes Dev. 17, 1870–1881.
Hodgson, J.W., Cheng, N.N., Sinclair, D.A.R., Kyba, M., Randsholt, N.B., Brock, H.W., 1997. The polyhomeotic locus of Drosophila melanogaster is transcriptionally and posttranscriptionally regulated during embryogenesis. Mech. Dev. 66, 69–81.

Isono, K., Fujimura, Y., Shinga, J., Yamaki, M., O-Wang, J., Takihara, Y., et al., 2005. Mammalian polyhomeotic homologues Phc2 and Phc1 act in synergy to mediate Polycomb repression of Hox genes. Mol. Cell. Biol. 25, 6694–6706.
Itoh, N., Konishi, M., 2007. The zebrafish fgf family. Zebrafish 4, 179–186.
Kawamura, A., Yamada, K., Fujimori, K., Higashinakagawa, T., 2002a. Alternative transcripts of a polyhomeotic gene homolog are expressed in distinct regions of somites during segmentation of zebrafish embryos. Biochem. Biophys. Res. Commun. 291, 245–254.
Kawamura, A., Yokota, S., Yamada, K., Inoue, H., Inohaya, K., Yamazaki, K., et al., 2002b. pc1 and psc1, zebrafish homologs of Drosophila Polycomb and Posterior sex combs, encode nuclear proteins capable of complex interactions. Biochem. Biophys. Res. Commun. 294, 456–463.
Kennison, J.A., 1995. The Polycomb and trithorax group proteins of Drosophila: transregulators of homeotic gene function. Annu. Rev. Genet. 29, 289–303.
Kim, C.A., Gingery, M., Pilpa, R.M., Bowie, J.U., 2002. The SAM domain of polyhomeotic forms a helical polymer. Nat. Struct. Mol. Biol. 9, 453–457.
Kim, C.A., Sawaya, M.R., Cascio, D., Kim, W., Bowie, J.U., 2005. Structural organization of a sex-comb-on-midleg/polyhomeotic copolymer. J. Biol. Chem. 280, 27769–27775.
Kimmel, C.B., Ballard, W.W., Kimmel, S.R., Ullmann, B., Schilling, T.F., 1995. Stages of embryonic development of the zebrafish. Dev. Dyn. 203, 253–310.
King, A., 2009. Researchers find their Nemo. Cell 139, 843–846.
King, I.F., Emmons, R.B., Francis, N.J., Wild, B., Müller, J., Kingston, R.E., et al., 2005. Analysis of a polycomb group protein defines regions that link repressive activity on nucleosomal templates to in vivo function. Mol. Cell. Biol. 25, 6578–6591.
Komoike, Y., Kawamura, A., Shindo, N., Sato, C., Satoh, J., Shiurba, R., et al., 2005. Zebrafish Polycomb group gene ph2α is required for epiboly and tailbud formation acting downstream of FGF signaling. Biochem. Biophys. Res. Commun. 328, 858–866.
Kuzmichev, A., Nishioka, K., Erdjument-Bromage, H., Tempst, P., Reinberg, D., 2002. Histone methyltransferase activity associated with a human multiprotein complex containing the enhancer of Zeste protein. Genes Dev. 16, 2893–2905.
Kyba, M., Brock, H.W., 1998. The Drosophila Polycomb group protein Psc contacts ph and Pc through specific conserved domains. Mol. Cell. Biol. 18, 2712–2720.
Leach, S.D., 2009. Pisces and cancer: the stars align. Zebrafish 6, 317.
Lessard, J., Sauvageau, G., 2003. Bmi-1 determines the proliferative capacity of normal and leukaemic stem cells. Nature 423, 255–260.
Leung, C., Lingbeek, M., Shakhova, O., Liu, J., Tanger, E., Saremaslani, P., et al., 2004. Bmi1 is essential for cerebellar development and is overexpressed in human medulloblastomas. Nature 428, 337–341.
Levine, S.S., Weiss, A., Erdjument-Bromage, H., Shao, Z., Tempst, P., Kingston, R.E., 2002. The core of the Polycomb repressive complex is compositionally and functionally conserved in flies and human. Mol. Cell. Biol. 22, 6070–6078.
Maertens, G.N., El Messaoudi-Aubert, S., Racek, T., Stock, J.K., Nicholls, J., Rodriguez-Niedenführ, M., et al., 2009. Several distinct Polycomb complexes regulate and colocalize on the INK4a tumor suppressor locus. PLoS ONE 4, e6380.
Meyer, A., Schartl, M., 1999. Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Curr. Opin. Cell Biol. 11, 699–704.
Meyer, A., Van de Peer, Y., 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27, 937–945.
Min, J., Zhang, Y., Xu, R.M., 2003. Structural basis for specific binding of polycomb chromodomain to histone H3 methylated at lys 27. Genes Dev. 17, 1823–1828.
Müller, J., 1995. Transcriptional silencing by the Polycomb protein in Drosophila embryos. EMBO J. 14, 1209–1220.
Müller, J., Hart, C.M., Francis, N.J., Vargas, M.L., Sengupta, A., Wild, B., et al., 2002. Histone methyltransferase activity of the Drosophila Polycomb group repressor complex. Cell 111, 197–208.
Nekrasov, M., Klymenko, T., Fraterman, S., Papp, B., Oktaba, K., Köcher, T., et al., 2007. Pcl-PRC2 is needed to generate high levels of H3-K27 trimethylation at Polycomb target genes. EMBO J. 26, 4078–4088.
Oktaba, K., Gutiérrez, L., Gagneur, J., Girardot, C., Sengupta, A.K., Furlong, E.E., et al., 2008. Dynamic regulation by polycomb group protein complexes controls pattern formation and the cell cycle in Drosophila. Dev. Cell 15, 877–889.
Pasini, D., Bracken, A.P., Hansen, J.B., Capillo, M., Helin, K., 2007. The Polycomb group protein Suz12 is required for embryonic stem cell differentiation. Mol. Cell. Biol. 27, 3769–3779.
Pirrotta, V., 1998. Polycombing the genome: PcG, trxG, and chromatin silencing. Cell 93, 333–336.
Ringrose, L., Paro, R., 2004. Epigenetic regulation of cellular memory by the Polycomb and Trithorax group proteins. Annu. Rev. Genet. 38, 413–443.


Roest Crollius, H., Weissenbach, J., 2005. Fish genomics and biology. Genome Res. 15, 1675–1682.
Satijn, D.P., Otte, A.P., 1999. RING1 interacts with multiple Polycomb-group proteins and displays tumorigenic activity. Mol. Cell. Biol. 19, 57–68.
Sayers, E.W., Barrett, T., Benson, D.A., Bolton, E., Bryant, S.H., Canese, K., et al., 2010. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 38, D5–16.
Schoorlemmer, J., Marcos-Gutierrez, C., Were, F., Martinez, R., Garcia, E., Satijn, D.P., et al., 1997. Ring1A is a transcriptional repressor that interacts with the Polycomb-M33 protein and is repressed at rhombomere boundaries in the mouse hindbrain. EMBO J. 16, 5930–5942.
Schuettengruber, B., Chourrout, D., Vervoort, M., Leblanc, B., Cavalli, G., 2007. Genome regulation by Polycomb and Trithorax proteins. Cell 128, 735–745.
Sémon, M., Wolfe, K.H., 2007. Reciprocal gene loss between Tetraodon and zebrafish after whole genome duplication in their ancestor. Trends Genet. 23, 108–112.
Senthilkumar, R., Mishra, R.K., 2009. Novel motifs distinguish multiple homologues of Polycomb in vertebrates: expansion and diversification of the epigenetic toolkit. BMC Genomics 10, 549.
Shao, Z., Raible, F., Mollaaghababa, R., Guyon, J.R., Wu, C.T., Bender, W., et al., 1999. Stabilization of chromatin structure by PRC1, a Polycomb complex. Cell 98, 37–46.
Simon, J.A., Kingston, R.E., 2009. Mechanisms of Polycomb gene silencing: knowns and unknowns. Nat. Rev. Mol. Cell Biol. 10, 697–708.
Souza, P.P., Völkel, P., Trinel, D., Vandamme, J., Rosnoblet, C., Héliot, L., et al., 2009. The histone methyltransferase SUV420H2 and heterochromatin proteins HP1 interact but show different dynamic behaviours. BMC Cell Biol. 10, 41.
Sparmann, A., van Lohuizen, M., 2006. Polycomb silencers control cell fate, development and cancer. Nat. Rev. Cancer 6, 846–856.
Suzuki, M., Mizutani-Koseki, Y., Fujimura, Y., Miyagishima, H., Kaneko, T., Takada, Y., et al., 2002. Involvement of the Polycomb-group gene Ring1B in the specification of the anterior-posterior axis in mice. Development 129, 4171–4183.
Takihara, Y., Tomotsune, D., Shirai, M., Katoh-Fukui, Y., Nishii, K., Motaleb, M.A., et al., 1997. Targeted disruption of the mouse homologue of the Drosophila polyhomeotic gene leads to altered anteroposterior patterning and neural crest defects. Development 124, 3673–3682.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673.
Tonkin, E., Hagan, D.M., Li, W., Strachan, T., 2002. Identification and characterisation of novel mammalian homologues of Drosophila polyhomeotic permits new insights into relationships between members of the polyhomeotic family. Hum. Genet. 111, 435–442.
van der Lugt, N.M., Domen, J., Linders, K., van Roon, M., Robanus-Maandag, E., Riele, H., et al., 1994. Posterior transformation, neurological abnormalities, and severe hematopoietic defects in mice with a targeted deletion of the bmi-1 proto-oncogene. Genes Dev. 8, 757–769.
Völkel, P., Angrand, P.O., 2007. The control of histone lysine methylation in epigenetic regulation. Biochimie 89, 1–20.
Voncken, J.W., Roelen, B.A., Roefs, M., de Vries, S., Verhoeven, E., Marino, S., et al., 2003. Rnf2 (Ring1b) deficiency causes gastrulation arrest and cell cycle inhibition. Proc. Natl Acad. Sci. USA 100, 2468–2473.
Wang, H., Wang, L.J., Erdjument-Bromage, H., Vidal, M., Tempst, P., Jones, R.S., et al., 2004. Role of histone H2A ubiquitination in polycomb silencing. Nature 431, 873–878.
Wang, J., Mager, J., Chen, Y., Schneider, E., Cross, J.C., Nagy, A., et al., 2001. Imprinted X inactivation maintained by a mouse Polycomb group gene. Nat. Genet. 28, 371–375.
Whitcomb, S.J., Basu, A., Allis, C.D., Bernstein, E., 2007. Polycomb group proteins: an evolutionary perspective. Trends Genet. 23, 494–502.
Wiederschain, D., Chen, L., Johnson, B., Bettano, K., Jackson, D., Taraszka, J., et al., 2007. Contribution of polycomb homologues Bmi-1 and Mel-18 to medulloblastoma pathogenesis. Mol. Cell. Biol. 27, 4968–4979.
Wienholds, E., Plasterk, R.H.A., 2004. Target-selected gene inactivation in zebrafish. Meth. Cell Biol. 77, 69–90.
Yamaki, M., Isono, K., Takada, Y., Abe, K., Akasaka, T., Tanzawa, H., et al., 2002. The mouse edr2 (mph2) gene has two forms of mRNA encoding 90- and 36-kDa polypeptides. Gene 288, 103–110.
Zhang, H., Christoforou, A., Aravind, L., Emmons, S.W., van den Heuvel, S., Haber, D.A., 2004. The C. elegans Polycomb gene sop-2 encodes an RNA binding protein. Mol. Cell 14, 841–847.