The SCAN domain family of zinc finger transcription factors


Gene 359 (2005) 1 – 17 www.elsevier.com/locate/gene

Review

Leonard C. Edelstein, Tucker Collins *

Department of Pathology, Children’s Hospital Boston and Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA

Received 31 January 2005; received in revised form 26 May 2005; accepted 3 June 2005
Available online 1 September 2005
Received by A.J. van Wijnen

Abstract

Zinc finger transcription factor genes represent a significant portion of the genes in the vertebrate genome. Some Cys2His2 type zinc fingers are associated with conserved protein domains that help to define these regulators. A novel domain of this type, the SCAN domain, is a highly conserved 84-residue motif that is found near the N-terminus of a subfamily of C2H2 zinc finger proteins. The SCAN domain, which is also known as the leucine rich region, functions as a protein interaction domain, mediating self-association or selective association with other proteins. Here we define the mouse SCAN domain and annotate the mouse SCAN family members. In addition to a single SCAN domain, some members of the mouse SCAN family have a conserved N-terminal motif, a KRAB domain, SANT domains and a variable number of C2H2 type zinc fingers (3–14). The genes encoding mouse SCAN domains are clustered, often in tandem arrays, and are capable of generating isoforms that may affect the function of family members. Although the function of most of the family members is not known, an overview of selected members of this group of transcription factors suggests that some of the mouse SCAN domain family members play roles in cell survival and differentiation. © 2005 Elsevier B.V. All rights reserved.

Keywords: SCAN domain; KRAB domain; SANT domain; Zinc finger

1. Zinc finger transcription factors

Zinc fingers are versatile DNA-recognition elements. These domains bind zinc ions through various combinations of cysteine and/or histidine residues (Klug and Schwabe, 1995). One type of zinc finger contains a C2H2 motif (Pavletich and Pabo, 1991). This motif characterizes one of the most abundant eukaryotic protein families, being found in 2% of all human genes (Lander et al., 2001; Tupler et al., 2001). The prevalence of the C2H2 motif reflects its versatility for recognizing different sequences of DNA. Each finger is a short stretch of 30 amino acids, containing two conserved cysteines and two conserved histidines. The conserved residues coordinate a zinc ion that allows the finger to fold into a compact structure containing a β-turn that includes the conserved cysteines and an α-helix that includes the conserved histidines (Pavletich and Pabo, 1991). The C2H2 zinc fingers frequently occur in tandem arrays, and most transcription factors have three or more fingers working collectively to specifically recognize DNA or RNA (Lu et al., 2003).

Abbreviations: KRAB, Krüppel-associated box; SANT, switching-defective protein 3 (Swi3), adapter 2 (Ada2), nuclear receptor co-repressor (N-CoR), transcription factor (TF)IIIB domain.
* Corresponding author. Tel.: +1 617 355 5806; fax: +1 617 730 0168. E-mail address: [email protected] (T. Collins).
0378-1119/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2005.06.022

2. Zinc finger associated domains

The C2H2 zinc finger proteins frequently contain additional protein domains located at their N-termini. These structural modules regulate subcellular localization and gene expression by controlling selective association of the transcription factors with each other, or with other cellular components. In the C2H2 class of zinc fingers, these associated modules include the Krüppel-associated box (KRAB) (Bellefroid et al., 1991), the poxvirus and zinc finger (POZ) domain (Bardwell and Treisman, 1994), which is also known as the BTB domain (Broad-Complex, Tramtrack, and Bric-a-brac) (Albagli et al., 1995), the insect zinc finger associated domain (ZAD) (Chung et al., 2002) and the SCAN domain (Williams et al., 1995), which is sometimes called the leucine rich region (LeR) (Pengue et al., 1994). These domains define subgroups within the C2H2 family and may provide insights into the functions of the members of this large family of zinc finger transcription factors.

3. SCAN domain

The SCAN domain is an extended sequence motif found in a family of zinc finger transcription factors that constitutes approximately 10% of the estimated 700 C2H2 zinc finger genes present in the human genome (Lander et al., 2001; Venter et al., 2001).

3.1. Definitions of the SCAN domain

The SCAN domain is a highly conserved 84-residue motif that is found near the N-terminus of a subfamily of C2H2 zinc finger proteins. The SCAN domain was originally identified in the human C2H2 zinc finger transcription factor, ZNF174 (Williams et al., 1995). The name was based on the first letters found in some of the founding members of the family (SRE-ZBP, Ctfin51, AW-1 (ZNF174), and Number 18). An alignment of the highly homologous human SCAN domains can be found elsewhere (Stone et al., 2002; Sander et al., 2003; Collins and Sander, 2005) and online at the Project Website (http://www.scanfamily.org). Based on the sequence alignment, as well as the biochemical properties of the recombinant protein detailed below, the human SCAN domain is defined as 84 residues, beginning with E-43 and ending with R-126 of ZNF174. In many sequences, there are proline residues both before and after the domain, helping to delineate the boundaries of the predicted secondary structural elements.

The definition of the zinc finger associated domains in protein sequence databases is key to annotating family members in the SCAN family. Unfortunately, the SCAN domain is defined differently in some of these classification schemes (Table 1). For example, the PROSITE database is one of the oldest curated domain databases and is part of the ExPASy (Expert Protein Analysis System) proteomics project operated by the Swiss Institute for Bioinformatics. The PROSITE web server uses protein-sequence-profile matching to identify a wide variety of regions in protein sequences. In PROSITE the SCAN domain is defined as ZNF174 residues 46–128. The Pfam database is a curated collection of multiple sequence alignments maintained by the Wellcome Trust Sanger Institute. In Pfam the SCAN domain is defined as ZNF174 residues 40–135. The InterPro database is a consortium of different database projects for annotating protein families, domains and motifs. It is maintained by the European Bioinformatics Institute (EBI), which is part of the European Molecular Biology Laboratory (EMBL). InterPro has combined data from seven member databases to form a much larger database. Redundant overlapping entries are merged into single unique InterPro entries. In InterPro the SCAN domain is also defined in the context of ZNF174.

A fourth major online domain database, SMART (Simple Modular Architecture Research Tool), also defines the domain in the context of ZNF174 (residues 42–154) but refers to it as a “leucine rich region” or LeR. This is an unfortunate designation because the similar term “leucine rich repeat” (LRR) is used to describe a different protein-protein interaction domain found in a large (over 2500 proteins in the SMART database), functionally distinct group of proteins involved in inflammation and cell death. These LRR domains consist of repeats of 20 to 30 amino acids with an 11-residue leucine rich repeat consensus sequence (Leu-X-X-Leu-X-Leu-X-X-(Asn/Cys)-X-Leu, where X represents any residue and some Leu positions can be occupied by Val, Ile, or Phe). The use of the term “leucine rich region” to describe the SCAN domain has resulted in confusion in both the databases and the literature because of its similarity to “leucine rich repeat”.

3.2. Function of the SCAN domain

The SCAN domain is capable of functioning as a protein-protein interaction motif.
Findings using either the mammalian or the yeast two-hybrid systems demonstrate that the SCAN domain is an interaction motif (Williams et al., 1999; Sander et al., 2000; Schumacher et al., 2000). In one study a mammalian two-hybrid assay was used to show that the SCAN domain mediates protein-protein interactions (Williams et al., 1999). To demonstrate that the SCAN domain can self-associate, the ability of a ZNF174 SCAN-GAL4 construct to activate a reporter was tested in the presence of a ZNF174 SCAN-VP16 fusion construct. Coexpression of both fusion constructs markedly activated transcription of a GAL4-dependent reporter when compared with that of empty GAL4 or VP16 vectors, or with each of the SCAN domain fusion constructs alone. The SCAN domain did not interact with a leucine zipper motif, demonstrating that the SCAN domain did not interact nonspecifically with other transcription factors that contain amphipathic α-helices mediating oligomerization (Williams et al., 1999).

The SCAN domain is capable of mediating protein-protein interactions in the context of an intact member of the family. For example, studies were done in which a tagged full-length form of ZNF174 was cotranslated with another SCAN protein, and associations between the two SCAN protein forms were then examined by immunoprecipitation with an antibody that recognized the tag (Williams et al., 1999). These co-immunoprecipitation studies confirmed that the SCAN domain is responsible for self-association in the context of an intact SCAN family member and that the α-helical character of the domain is required for the interaction. Additionally, these in vitro studies demonstrate that intact ZNF174 can selectively bind other members of the SCAN family. Generalizing from these observations, members of the SCAN family can both self-associate and form heterodimers using the SCAN domain as an interaction motif.

Table 1

Database    Designation    Identifier    Website
Prosite     SCAN           PDOC50804     www.expasy.ch/cgi-bin/nicesite.pl?PS50804
Pfam        SCAN           PF02023       www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF02023
INTERPRO    SCAN           IPR003309     www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR003309
SMART       LER            SM00431       http://smart.embl-heidelberg.de/
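The competing definitions quoted in Section 3.1 are easier to compare side by side. The following is a minimal sketch that uses only the ZNF174 residue ranges given in the text; InterPro is omitted because no range is quoted for it, and the helper names (`span_length`, `shared_core`) are illustrative rather than part of any database tool.

```python
# ZNF174 residue ranges for the SCAN domain, as quoted in Section 3.1.
# InterPro is left out because the text gives no explicit range for it.
DEFINITIONS = {
    "This review": (43, 126),
    "PROSITE": (46, 128),
    "Pfam": (40, 135),
    "SMART": (42, 154),
}


def span_length(span):
    """Number of residues in an inclusive (start, end) range."""
    start, end = span
    return end - start + 1


def shared_core(spans):
    """Residues covered by every definition, or None if they do not overlap."""
    spans = list(spans)
    start = max(s for s, _ in spans)
    end = min(e for _, e in spans)
    return (start, end) if start <= end else None


if __name__ == "__main__":
    for name, span in DEFINITIONS.items():
        print(f"{name:12s} {span[0]:>3}-{span[1]:<3} {span_length(span):>3} residues")
    print("Shared core:", shared_core(DEFINITIONS.values()))
```

Under these ranges the definition used in this review spans 84 residues, in agreement with the text, while the region common to all four definitions is slightly shorter.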

4. Human SCAN domain family members

With the release of the draft of the human genome it became possible to use bioinformatic approaches to identify all of the SCAN domains in the human genome and to annotate the predicted structures of the family members defined by the presence of the domain. To determine the total number of genes predicted to encode SCAN domains in the human genome, human genome databases were screened with a representative human SCAN domain (ZNF174). Fifty-nine SCAN domains were found in complete open reading frames, suggesting that they are contained within functional genes (Sander et al., 2003). The DNA sequences adjacent to the 59 SCAN domains were computationally analyzed and manually annotated to predict the cDNA structure and protein-coding potential for each member of the human SCAN family. This approach successfully predicted the structures for all of the 24 previously reported members of the human SCAN family and identified 39 new SCAN-containing genes. Analysis of expressed sequence tag (EST) databases and serial analysis of gene expression (SAGE) libraries confirmed that most of the predicted SCAN family members are actively expressed. An updated schematic diagram of the predicted structural features of the human SCAN family is presented in Fig. 1.

Fig. 1. Model of the structural features of the human SCAN domain family. The names and predicted structures of the 63 members of the human SCAN domain family are schematically shown in order of their gene locations, starting with chromosome 1. The conserved domains are indicated with a shaded box: novel region (red), SCAN (orange), KRAB-A (green), KRAB-B (yellow), SANT domain (red), integrase core domain (black), and C2H2 zinc finger domain (blue). Zinc fingers displayed as an open white box are atypical and deviate from the C2H2 consensus. Many of the original SCAN family names have changed; they are listed below in the format new name = old name. ZNF31 = MGC9923, ZNF496 = MGC15548, ZNF445 = TRFA, ZNF167 = ZFP, ZNF435 = FLJ22191, ZNF307 = P1 p373c6, ZNF187 = SRE-ZBP, PGBD1 = HUCEP-4, ZNF305 = KIAA0426, ZNF390 = LOC222696, ZNF452 = KIAA1925, ZNF498 = FLJ32468, ZKSCAN1 = ZNF36, ZNF483 = KIAA1962, FLJ35867 = LOC255333, ZSCAN2 = FLJ20595, ZNF206 = FLJ14549, ZNF75A = FLJ36660, ZNF434 = MGC4179, ZNF500 = KIAA0557, FLJ23199 = LOC124171, AY234408 = RP11-158H5, ZNF444 = EZF-2, LOC342933 = LOC126209, LOC342934 = LOC126210, ZSCAN5 = MGC4161, hmm28055 = LOC126211, PEG3 = ZIM2, ZSCAN4 = FLJ35105, ZSCAN1 = LOC284312, HKR2 = LOC162969, and ZNF449 = LOC203523. Names were given based on the “Official Symbol” field in the Entrez Gene entry where available; otherwise the LOC number, transcript accession number, or gnomon model number is given. Additional changes to the family are: SANT domains added to ZNF31 and LOC342357; KRAB domains added to ZNF31, ZNF75A, and ZNF434; complete coding sequence evidence for ZNF445 and ZNF434; removal of RP1-29K1, LOC390587, ZIM2, LOC162968 and RP11-452H21. Modified and updated from Sander et al. (2003).

5. Mouse SCAN family members

To determine the total number of SCAN domains in the mouse genome, both public (National Center for Biotechnology Information or NCBI; http://www.ncbi.nlm.nih.gov/) and private (Celera Genomics, http://www.Celera.com) mouse genome databases were screened with a representative mouse SCAN domain. The ZNF174 SCAN domain was used as a protein query sequence in TBLASTN searches against protein databases and in translations of mouse nucleotide transcript and genome data sets. The mouse SCAN domains from both NCBI and Celera databases were manually compared to determine overlap and to eliminate redundancy. Database searches were completed on or before December 1, 2004.
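The comparison of the NCBI and Celera hits was done manually. Purely as an illustration of the bookkeeping involved, the sketch below merges two hit lists and collapses entries whose domain sequences are identical. Exact-sequence matching is an assumption made here for simplicity, not the criterion used in the study, and the identifiers and short sequences are hypothetical placeholders rather than real SCAN domains.

```python
def normalise(seq):
    """Strip whitespace and upper-case a sequence so trivial differences are ignored."""
    return "".join(seq.split()).upper()


def merge_hits(*hit_sets):
    """Merge {identifier: sequence} mappings into {sequence: [identifiers]}.

    Hits from different databases that carry the same domain sequence
    end up under one key, which removes the redundancy between sources.
    """
    unique = {}
    for hits in hit_sets:
        for ident, seq in hits.items():
            unique.setdefault(normalise(seq), []).append(ident)
    return unique


# Hypothetical placeholder hits; real entries would be full-length domains.
ncbi_hits = {"ncbi_hit_1": "EEALRLRFRQ", "ncbi_hit_2": "PEASRQRFRQ"}
celera_hits = {"celera_hit_1": "eealrlrfrq", "celera_hit_2": "PETSRLRFRH"}

merged = merge_hits(ncbi_hits, celera_hits)

if __name__ == "__main__":
    for seq, idents in merged.items():
        print(seq, "<-", ", ".join(idents))
```

In practice near-identical sequences (alleles, sequencing errors, partial hits) would still need the kind of manual review described in the text.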

5.1. Mouse SCAN domains

The screen revealed the presence of 40 mouse SCAN domains in the NCBI database and 39 in the Celera database, for a total of 40 unique expressed SCAN domain proteins. An alignment for the 40 expressed mouse SCAN domains is presented in Figs. 2 and 3. This alignment shows the extensive homology between SCAN domains in the mouse family members. Based on size, the mouse SCAN domains could be placed into three general groups: SCAN A1, SCAN A2, and a smaller, more distinct group designated SCAN B.

5.1.1. Mouse SCAN A1 and A2

Twenty-one members of the mouse SCAN family have an alignment that is consistent with the previous definition of the human SCAN domain, a highly conserved region of 84 residues extending from E-43 of human ZNF174 to R-126. This group of SCAN domains is designated SCAN A1. Additionally, we have identified a second group of 17 SCAN domains that has a longer C-terminal extension of 21 residues and is designated SCAN A2 (Fig. 2A). The 21 amino acid sequence is uniquely associated with SCAN domains and is less conserved than the core of the SCAN domain. No defined secondary structure was predicted for the C-terminal extension when the sequence was analyzed using multiple secondary structure prediction programs (SSPro, PROF, PHD, T-99, PSIPRED, and JPRED, accessed from http://cubic.bioc.columbia.edu/predictprotein/predictprotein.html).

5.1.2. Mouse SCAN B

The motif discovery program MEME (meme.sdsc.edu) revealed two mouse SCAN-like domains that were similar to human ZSCAN4 (Fig. 2B). Although these were identified as SCAN domains by the motif-identifying programs Pfam, SMART and Prosite because of a 35-residue core region of homology, the SCAN B domains have divergent N- and C-terminal ends (Fig. 2B). The 44 N-terminal and 34 C-terminal sequences are uniquely associated with the core SCAN domain. Although no secondary structure was predicted in the N-terminal region, three α-helices were predicted in the C-terminal consensus sequence after analysis with multiple secondary structure algorithms.

Fig. 2. SCAN domains in the mouse genome. To illustrate the homology of the SCAN domains, the amino acid sequences of the mouse SCAN domains were grouped into (A) SCAN A1, A2, and (B) SCAN B subdivisions and then aligned by consensus ClustalW sequence alignment using Vector NTI 8.0 (Invitrogen) with an open gap penalty setting of 10.0. Residues that share >51% identity are outlined with dark shading; conserved amino acid differences are indicated by lighter shading.

Fig. 3. Model of the structural features of the mouse SCAN domain family. The names and predicted structures of the 40 expressed members of the mouse SCAN domain family are schematically shown in order of their gene locations. The conserved domains are indicated with a shaded box: SCAN A1 and A2 (orange), SCAN B (light orange), KRAB (green), SANT (purple) and C2H2 zinc finger domain (blue). The blue bar adjacent to selected SCAN members indicates the longer SCAN A2 form. The red bar indicates a novel motif containing a consensus sumoylation site. The outlined domains indicate regions that were not identified by all the domain annotating programs. Names were assigned based on the NCBI Entrez Gene name. Where the complete mRNA was only in the Celera database, Celera identifiers (mCG) are given. Where the gene only exists as a model, with no Entrez Gene entry, the hmm identification number is given.

5.2. Mouse SCAN family members

Once the SCAN domains were identified in the mouse genome, DNA sequences adjacent to the motif were manually annotated to predict the cDNA structures. The full-length predicted protein sequences were obtained from each database and conserved motifs were identified using Prosite (us.expasy.org/prosite), SMART (smart.embl-heidelberg.de/) and Pfam (pfam.wustl.edu). Each of the predicted motifs was independently confirmed by consensus ClustalW sequence alignment using Vector NTI 8.0 (Invitrogen). The predicted amino acid sequence for each mouse SCAN member was used in a TBLASTN search of the NCBI expressed sequence tag (EST) mouse database. All of the mouse SCAN family members were represented by high probability sequence matches in the EST databases (>99% nucleotide identity in >100 bp). A schematic diagram of the structural features of the expressed mouse SCAN family members is presented in Fig. 3.

This screen revealed the presence of “only” 40 SCAN domain-containing expressed genes in the mouse (relative to 59 in the human). This approach successfully predicted the structures of the previously named mouse family members and identified 20 new SCAN domain-containing genes. Of the 40 mouse SCAN family members, 36 (90%) are predicted to encode a single SCAN domain and a variable number of C2H2 zinc fingers (ranging from 3 to 14), while 4 (10%) contain only a SCAN domain, or a partial SCAN domain. As in the human SCAN family, protein isoforms can be generated from some of the mouse family members that include only the SCAN domain (e.g., Zfp202) or just the zinc fingers (e.g., Zfp98). In the next several sections, we will review the other domains that are found in some of the mouse SCAN family members.

5.3. A conserved N-terminal motif in some mouse SCAN family members

Based on the predicted protein structures, several conserved motifs were found outside of the SCAN domain. As with the human genes, some of the mouse SCAN family members contain a region of homology usually located at the N-terminus of the predicted protein. This region of homology is found in single copy and is present in 25 of the 40 mouse SCAN family members. The region is contained in family members that have just a SCAN domain as well as those that also contain a KRAB domain. In 24 of 25 members this domain is located at the very N-terminus of the predicted protein, while in one instance, Zfp287, it is located between a KRAB domain and the zinc fingers. This region is 15 residues in length and has a consensus EXEGLLIVVK(V/L)EE(D/E) sequence (Fig. 3). Contained within the core of the homologous N-terminal region is the sequence VKxE, which fits the consensus for a SUMO acceptor site (ΨKXE, where Ψ is a large hydrophobic amino acid and K is the site of SUMO conjugation; Rodriguez et al., 2001). Additionally, a glycine residue is located five amino acids upstream from the acceptor lysine, consistent with the region of homology containing a sumoylation site. Although there is no direct evidence that this motif is sumoylated, it is tempting to speculate that SCAN family members are modified by this process. SUMO family proteins are small ubiquitin-related proteins that become covalently attached to lysine residues in substrate proteins (reviewed in Gill, 2003).
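The match between the conserved N-terminal motif and the SUMO acceptor consensus can be checked with a simple pattern search. The sketch below is illustrative only: the set of residues treated as “large hydrophobic” (I, L, V, M, F) is an assumption, since the text does not enumerate them, and the test sequence is a hypothetical instantiation of the consensus EXEGLLIVVK(V/L)EE(D/E) with arbitrary choices at the variable positions.

```python
import re

# Assumption: residues counted as "large hydrophobic" at the first position.
HYDROPHOBIC = "ILVMF"

# Lookahead so that overlapping candidate sites are all reported.
SUMO_SITE = re.compile(rf"(?=([{HYDROPHOBIC}]K.E))")


def find_sumo_sites(seq):
    """Return 1-based positions of lysines in a hydrophobic-K-X-E context."""
    # m.start() is the 0-based index of the hydrophobic residue; the lysine
    # follows it, so its 1-based position is m.start() + 2.
    return [m.start() + 2 for m in SUMO_SITE.finditer(seq.upper())]


if __name__ == "__main__":
    # Hypothetical instance of the consensus (X = A, V/L = V, D/E = E).
    motif = "EAEGLLIVVKVEEE"
    print(find_sumo_sites(motif))
```

A hit of this kind only indicates that the sequence fits the consensus; as the text notes, there is no direct evidence that the motif is actually sumoylated.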
The SUMO pathway parallels the classical ubiquitinylation pathway with three distinct steps: activation involving the enzyme E1, conjugation involving the E2 enzyme UBC9, and substrate modification through the cooperative association of UBC9 and E3 ligases. Conjugation to SUMO-1 has been shown to increase protein stability, to alter protein-protein interactions, or to modify the subcellular localization of target proteins (reviewed in Melchior, 2000 and Seeler and Dejean, 2003). Importantly, recent evidence suggests that sumoylation plays a role in regulating transcription (reviewed in Gill, 2003). For example, the activity of SP3, Elk-1, c-Myb and the coactivator p300 is inhibited following sumoylation, and mutation of the SUMO acceptor lysine or cotransfection with SUMO protease leads to a dramatic increase in the transcriptional activity of these proteins (e.g., Ross et al., 2002). Although there is no direct evidence that this conserved N-terminal motif is functional, sumoylation of either the mouse or the human SCAN family members may have a substantial effect on transcription factor activity, or on the intracellular localization of the proteins, as seen with some other zinc finger transcription factors.

5.4. KRAB domains in the mouse SCAN family members

Some of the mouse SCAN family members contain a KRAB domain. The KRAB domain is found at the N-terminus of approximately one third of all of the C2H2 zinc finger proteins (Lander et al., 2001; Venter et al., 2001). The KRAB domain spans about 75 amino acids and has been divided into two subregions, designated A and B (Bellefroid et al., 1991). It has been demonstrated that the KRAB-A domain contains transcriptional repression activity (Margolin et al., 1994; Witzgall et al., 1994). The KRAB-A domain interacts with KAP1 (KRAB-associated protein-1, also known as TIF-1β, transcription intermediary factor-1β), a member of the “tripartite motif” family with the approved symbol TRIM28 (Friedman et al., 1996; Kim et al., 1996; Moosmann et al., 1996; Ryan et al., 1999; Lechner et al., 2000; Schultz et al., 2001, 2002). KAP1 can enhance KRAB-A-mediated repression by interacting with heterochromatin protein 1 (HP1) (Ryan et al., 1999) and may mediate transcriptional silencing through the recruitment of histone modifying proteins such as the histone deacetylases (Underhill et al., 2000; Schultz et al., 2001) and a histone 3, lysine 9-specific methyltransferase (Schultz et al., 2002).

Seventeen of the 40 mouse SCAN family members have a KRAB domain (42.5%), which is nearly identical to the percentage of human SCAN family members that contain KRAB domains (42.4%, or 25 of 59 family members). An alignment of the mouse SCAN family member KRAB domains is shown in Fig. 4B. The KRAB domain is located at the N-terminus of the mouse SCAN family member, usually C-terminal to the SCAN domain.
Although typically found in single copy, two SCAN family members, Zfp369 or NRIF (neurotrophin receptor interacting factor) and Zfp110, have two KRAB domains that flank the SCAN domain (Casademunt et al., 1999). Additionally, isoforms can be generated from some of the mouse SCAN family members that contain just the SCAN and KRAB domains (e.g., Zfp496). KRAB-containing proteins can be classified into three types (Mark et al., 1999). These include proteins that contain a KRAB-A domain alone, those with both -A and -B domains, and those that have an -A domain and a divergent -B domain. All KRAB-containing mouse SCAN proteins possess KRAB-A domains. However, while certain conserved residues are present in some members, no definitive KRAB-B domain is observed in the mouse SCAN proteins. It is not possible to distinguish the KRAB domains found in either the mouse or the human SCAN family members from those in transcription factors without a SCAN domain. It is likely that the SCAN domain will influence the repression function of the KRAB domain, although this has not been extensively studied.

5.5. SANT domains

Two of the mouse SCAN family members, Zfp31 and Zfp694, have SANT (switching-defective protein 3 (Swi3), adapter 2 (Ada2), nuclear receptor co-repressor (N-CoR), transcription factor (TF)IIIB) domains. SANT domains were initially identified in nuclear receptor co-repressors as a 50-amino-acid motif (Aasland et al., 1996), although the domain was subsequently found in the subunits of many chromatin-remodeling complexes. SANT domains have been linked to chromatin remodeling. An essential role for the SANT domain in histone acetylation and histone deacetylation has been demonstrated, and SANT domains have also been linked to ATP-dependent chromatin remodeling (reviewed in Boyer et al., 2004). Additionally, biochemical analysis of SANT domains suggests that the domain plays a role in histone tail recognition (Yu et al., 2003). One general model for the function of a SANT domain suggests that it has a central role in chromatin remodeling by functioning as a histone interaction module, coupling histone binding to enzyme catalysis by histone acetylation or ATP-dependent chromatin remodeling (Boyer et al., 2004).


The presence of the SANT domains in two members of the SCAN family suggests that at least these members play a role in regulating chromatin accessibility. In both SCAN family members, the SANT domains are quite similar to each other (82–84% similar) (Fig. 4C). This is in contrast to the situation with N-CoR and SMRT (46–48%). Like the SCAN family members, these nuclear receptor corepressors both contain a pair of closely spaced SANT domains (designated SANT1 and SANT2). Recent work has highlighted distinct roles for each SANT domain in the function of SMRT/N-CoR-HDAC complexes. The binding of SMRT and N-CoR to the catalytic HDAC subunit is necessary and sufficient to form an active deacetylase. This activating role for SMRT and N-CoR requires the amino-terminal SANT1 domain (Yu et al., 2003). The SANT2 domain cannot substitute for SANT1 in terms of HDAC binding and activation. Recent studies demonstrate that the SANT2 domain in SMRT functions as a histone-tail-interaction domain (Yu et al., 2003). Specifically, the isolated SANT2 domain binds to unacetylated histone H4 and this interaction is disrupted by tetraacetylation of H4. Additionally, the presence of SANT2 enhances the HDAC activity of SMRT-HDAC3 by increasing the affinity of the complex for histone tails. The similarity of the SCAN family member SANT domains to SANT2 suggests that they may function as histone-tail-interaction domains.

Fig. 4. Conserved domains in some of the SCAN domain family members. The amino acid sequences of (A) the N-terminal novel region; (B) KRAB domain; and (C) SANT domain from selected members of the mouse SCAN family were aligned by consensus ClustalW sequence alignment using Vector NTI 8.0 (Invitrogen) with a gap penalty setting of 10.0. The consensus sequences are shown below the alignment. Bold type indicates 100% conservation. Single or multiple capital letters indicate similar residues that are present in >50% of the sequences. Lower case letters indicate residues that are present more often than other residues. Residues that share >50% identity are outlined with dark shading; conserved amino acid differences are indicated by lighter shading. Sequences for a prototypical KRAB domain (Zfp119) and SANT (SMRT and N-CoR) are also included in the alignments.

5.6. C2H2 zinc fingers in the mouse SCAN family members

The majority of SCAN domains in the mouse are associated with C2H2 zinc finger domains (88%). These 37 members contain a variable number (3–14) of C2H2 zinc fingers, defined by the consensus sequence C-X(2–4)-C-X(12)-H-X(2–6)-H (with X being any amino acid). It is interesting to note that the range in the number of zinc fingers in the mouse SCAN family, from 3 (i.e., GM1754, A630035I11RIK and LOC434551) to 14 zinc fingers (i.e., Zfp206 and Zfp287), is smaller than in the human SCAN family (2–22). As with the human family members, all of the C2H2 zinc finger motifs are located in the C-terminal half of the SCAN proteins (Figs. 1 and 3). Most of these C2H2 zinc fingers are Krüppel-type, as defined by the conserved link (TGEKP(Y/F)X) between the histidine of the preceding finger and the cysteine of the next finger (H-C link). Some of the mouse family members contain multiple adjacent fingers that are arranged in clusters of 3 or more, while other members contain pairs of zinc fingers. Since Krüppel-type factors are frequently involved in DNA binding, this variation in zinc finger number and spacing may have an effect on nucleotide base pair recognition (Nagaoka et al., 2001). Notably, members of the mouse SCAN family with multiple adjacent C2H2 zinc fingers may have more than one DNA binding activity, as is seen with other proteins with more than four zinc fingers (reviewed in Iuchi, 2001). C2H2 fingers have been implicated in protein-protein interactions (reviewed in Mackay and Crossley, 1998), and it is clear that some zinc fingers can interact with other proteins. For example, consensus C2H2 zinc fingers have been shown to be responsible for selective homodimerization of Ikaros family members (McCarty et al., 2003).
As will be discussed in the next section, Zfp496 (Nizp1) is a member of the SCAN family and contains four consensus C2H2 zinc fingers preceded by a unique finger-like motif designated C2HR. This unique motif of Nizp1 functions to mediate protein-protein interactions with the chromatin-associated protein NSD1. This raises the provocative possibility that the zinc finger-like elements in some of the other SCAN family members (Figs. 1 and 3) are actually interaction domains capable of recruiting proteins that are important for the function of the SCAN family member.
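The C2H2 consensus and the Krüppel-type H-C link described in Section 5.6 translate directly into regular expressions. The following is a minimal sketch of how fingers and links might be counted in a protein sequence; it is not the annotation procedure used for the family (which relied on Prosite, SMART and Pfam), and the two-finger test sequence is a hypothetical construct built to fit the consensus, not a real SCAN family member.

```python
import re

# C-X(2-4)-C-X(12)-H-X(2-6)-H, with X being any amino acid.
C2H2_FINGER = re.compile(r"C.{2,4}C.{12}H.{2,6}H")

# TGEKP(Y/F)X lying between the last histidine of one finger
# and the first cysteine of the next (the H-C link).
HC_LINK = re.compile(r"(?<=H)TGEKP[YF].(?=C)")


def count_fingers(seq):
    """Count non-overlapping matches to the C2H2 consensus."""
    return len(C2H2_FINGER.findall(seq.upper()))


def count_hc_links(seq):
    """Count Krüppel-type H-C links joining adjacent fingers."""
    return len(HC_LINK.findall(seq.upper()))


# Hypothetical two-finger sequence used only to exercise the patterns.
FINGER_1 = "YKCPECGKSFSQKSNLQKHQRTH"
FINGER_2 = "YKCPECGKSFSRSDHLTRHQRTH"
DEMO = FINGER_1 + "TGEKP" + FINGER_2

if __name__ == "__main__":
    print("fingers:", count_fingers(DEMO))
    print("H-C links:", count_hc_links(DEMO))
```

A pattern match of this kind would not detect atypical or finger-like elements such as the C2HR motif of Nizp1, which deviate from the consensus by definition.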

6. Overview of selected mouse SCAN family members

Although the identification of expressed sequence tags ensured that the majority of SCAN domain-encoding genes are transcribed, only a few of the family members have been examined in any detail. Based on the study of human SCAN family members, it is likely that ZNF202 regulates the expression of genes involved in lipid metabolism (Kort et al., 2000; Wagner et al., 2000; Porsch-Ozcurumez et al., 2001; Langmann et al., 2003); ZNF197 is a negative regulator of hypoxia inducible factor-1α (Li et al., 2003); and ZNF444 is a positive regulator of the scavenger receptor expressed by endothelial cells (Adachi and Tsujimoto, 2002). Here we will focus on a few of the mouse SCAN family members that play a role in development or differentiation.

6.1. Neurotrophin receptor interacting factor (NRIF, or Zfp369)

Zfp369 or its human homologue ZNF274 may provide new insights into the function of this class of SCAN domain-containing proteins. The zinc finger protein designated neurotrophin receptor interacting factor (NRIF) binds to the cytoplasmic domain of the neurotrophin receptor p75NTR, a member of the tumor necrosis factor receptor family. The retinae of mice with a targeted mutation in NRIF show reduced cell death, and this reduction is similar to that seen in both p75 and nerve growth factor (NGF) null mice (Casademunt et al., 1999). These findings suggest that NRIF is a SCAN family member that is an essential component downstream of NGF and p75 in the cell death pathway. NRIF contains five zinc fingers, as well as a SCAN domain that is positioned between a KRAB-A and -B domain and a KRAB-A domain. Although NRIF localizes to the nucleus when overexpressed alone, co-expression of p75 and NRIF changes the pattern such that NRIF localizes to both the cytoplasm and the nucleus. The SCAN motif does not appear to participate in the interaction between the transcription factor and the cell surface receptor. A direct functional interaction was reported between NRIF and TRAF6, another protein known to associate with the intracellular domain of the p75 neurotrophin receptor (Gentry et al., 2004).
The association of NRIF with TRAF6 is mediated by the amino-terminal KRAB domain of NRIF and the TRAF-C domain of TRAF6 (Gentry et al., 2004). NRIF enhanced TRAF6-mediated activation of the c-Jun NH2-terminal kinase (JNK). The expression of both intact NRIF and TRAF6 was required to reconstitute p75 activation of JNK, suggesting that NRIF and TRAF6 functionally interact to facilitate neurotrophin signaling through the p75 receptor. A provocative model suggests that upon receptor activation, NRIF interacts with TRAF6 and then translocates to the nucleus and regulates the expression of genes related to cell viability in the developing nervous system. This raises the exciting general possibility that upon ligand-induced activation of some cell surface receptors, SCAN box containing transcription factors are released from a cell surface receptor and translocate to the nucleus where they interact with themselves, or with other members of the SCAN family, and regulate transcription.


6.2. Zfp496 (NSD1-interacting zinc finger protein 1, Nizp1)

Sotos syndrome is a neurologic disorder characterized by overgrowth from the prenatal stage through childhood and is associated with advanced bone age, a dysmorphic face with macrocephaly and a pointed chin, mental retardation and a possible susceptibility to tumors (Opitz et al., 1998). The most frequent cause of Sotos syndrome is haploinsufficiency caused by point mutations and microdeletions within the NSD1 gene (Kurotaki et al., 2002). NSD1 contains a SET [Su(var)3–9, Enhancer of zeste, Trithorax] domain, which is found in a variety of chromatin-associated proteins that function as histone lysine methyltransferases (Huang et al., 1998). It was reported that NSD1 has a catalytically active SET domain that specifically methylates recombinant histone H3 at lysine 36 and histone H4 at lysine 20 and is essential for early development (Rayasam et al., 2003). In addition to the SET domain, NSD1 contains five plant homeodomain (PHD) modules and a PWWP (proline–tryptophan–tryptophan–proline) motif, which are seen in various nuclear proteins involved in cell growth and differentiation, as well as two nuclear receptor interaction domains. In view of the potential importance of NSD1 in controlling transcription, a search was done for proteins that bind to NSD1. Using NSD1 as bait in a yeast two-hybrid screen, a member of the SCAN family was isolated that was designated Nizp1 (Nielsen et al., 2004). This protein contains a SCAN domain, a KRAB-A domain and four consensus C2H2 type zinc fingers, preceded by a unique finger derivative, referred to as a C2HR motif. The C2HR motif of Nizp1 mediates protein–protein interactions with the cysteine-rich C5HCH domain of NSD1 and, when targeted to a promoter, represses transcription. Interestingly, mutation of the C2HR motif, or converting it to a consensus C2H2 zinc finger, abolishes the interaction of Nizp1 with NSD1 and compromises the ability of Nizp1 to repress transcription. Although the C2HR motif is found in other zinc finger-containing proteins, none of these is a member of the SCAN family. Collectively these findings suggest that Nizp1 contains a novel type of zinc finger motif that functions as a docking site for NSD1 and is more than just an evolutionary remnant of a C2H2 finger (Nielsen et al., 2004). The recruitment of NSD1 may not be sufficient to account for all of the repressive properties of Nizp1. By also recruiting the corepressor KAP-1 to the KRAB-A domain, Nizp1 may utilize several activities that lead to changes in chromatin structure and decrease transcription.

6.3. Zfp263 (NT2)

SCAN family members play roles in the regulation of genes that are involved in the production of cartilage and in skeletal morphogenesis. Cartilage contains an extracellular matrix that includes type II, IX and XI collagens and provides mechanical strength to joints. Type XI collagen is composed of three chains, α1(XI), α2(XI) and α3(XI) and plays a critical role in the formation of cartilage fibrils and in the development of bones. A specific 24-bp region of the α2(XI) collagen gene (Col11a2) promoter was sufficient to drive cartilage-specific expression of a heterologous gene in transgenic animals (Tsumaki et al., 1998). To identify proteins that bound to this region of the Col11a2 promoter, a mouse limb bud expression library was screened using a yeast one-hybrid approach and identified NT2, a member of the SCAN family (Tanaka et al., 2002). NT2 contains a SCAN domain, a KRAB domain and nine consensus C2H2 type zinc fingers. NT2 expression was inversely correlated with expression of Col11a2 in that it is highly expressed by hypertrophic chondrocytes, but is minimally expressed by resting or proliferating chondrocytes. NT2 bound to and inhibited Col11a2 promoter activity through the 24-bp site in a KRAB dependent mechanism. Collectively, the findings suggest that the cartilage-specific expression of this collagen gene is negatively regulated by this SCAN family member during embryonic development and chondrocyte differentiation.

7. Genomics of the mouse SCAN family members

The chromosomal map position of each predicted mouse SCAN family member was determined using chromosome mapping databases available at NCBI and Celera. The genomic distribution of the mouse SCAN domain family members is shown in Fig. 5.

Fig. 5. Genomic distribution of mouse SCAN domain family members. The chromosomal loci of the 40 expressed mouse SCAN genes are shown on an idiogram of the mouse karyotype (David Adler, Department of Pathology, University of Washington, Seattle). In instances where an official gene symbol was not available, the NCBI (LOC), Celera Genomics (mCG) or Riken (Rik) designations are given. Pseudogenes are indicated by an asterisk. # indicates genes not precisely positioned on the chromosome. Genes listed under "Not Placed" have no chromosomal location information.

7.1. Clustering of the expressed mouse SCAN family genes

The genes for the mouse SCAN domain family have been mapped to specific chromosome locations. Of the expressed SCAN-containing genes, 12 are isolated single genes. The majority of mouse genes (62%) are found in clusters of more than two genes on mouse chromosomes 5, 7, 13 and 18, and some of these genes are arrayed in tandem. In at least one instance on mouse chromosome 7, the SCAN genes are present in a subtelomeric cluster, which, by analogy with other genes in this position, may be associated with rapid evolution (Mefford and Trask, 2002). It appears likely that in-situ duplication of the mouse SCAN family members occurred since the mouse SCAN domains in a cluster share high sequence similarity (data not shown; Sander et al., 2003). Following gene duplication, adaptive evolution of selected mouse SCAN family members would have generated diversification of function within the family.

7.2. SCAN-like sequences in the mouse genome

In addition to finding 40 SCAN-containing coding genes and 5 SCAN-containing pseudogenes with a TBLASTX search, 8 SCAN-like sequences (SLSs) were found in the mouse genome (Fig. 6). SLSs are DNA sequences that, if translated, resemble SCAN domains. However, these sequences have no EST data, nor are they in any annotated gene or gene model in either the NCBI or Celera databases. In some cases, these sequences are interrupted by a stop codon. These sequences are found in genomic locations where there are no coding SCAN genes, such as on the mouse Y chromosome. These sequences may represent severely degraded pseudogenes, or may provide insight into the origin of the SCAN-domain. In some instances the SCAN-like sequences are found adjacent to retroviral like elements, such as long terminal repeat (LTR) sequences (data not shown). It is possible that these SCAN-like sequences might have remained associated with the retroviral elements that died out during vertebrate evolution.
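The distinction drawn in Section 7.1 between isolated single genes and clusters of more than two genes can be illustrated with a minimal sketch. The review does not state the distance criterion used to call a cluster, so the 500 kb gap threshold (`max_gap_bp`), the gene names and the coordinates below are all hypothetical placeholders, not values from the mouse genome.

```python
from collections import defaultdict


def group_into_clusters(genes, max_gap_bp=500_000):
    """Group genes on one chromosome whose neighbours lie within max_gap_bp.

    genes: iterable of (symbol, chromosome, start_bp) tuples.
    max_gap_bp is an assumed threshold; the review gives no such value.
    Returns a list of clusters, each a list of gene symbols in map order.
    """
    by_chrom = defaultdict(list)
    for symbol, chrom, start in genes:
        by_chrom[chrom].append((start, symbol))

    clusters = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        current = [positions[0][1]]
        # Walk along the chromosome and split whenever the gap is too large.
        for (prev_start, _), (start, symbol) in zip(positions, positions[1:]):
            if start - prev_start <= max_gap_bp:
                current.append(symbol)
            else:
                clusters.append(current)
                current = [symbol]
        clusters.append(current)
    return clusters


# Hypothetical loci, for illustration only.
toy_genes = [
    ("geneA", "7", 1_000_000),
    ("geneB", "7", 1_200_000),
    ("geneC", "7", 1_350_000),
    ("geneD", "13", 5_000_000),
    ("geneE", "13", 9_000_000),
]

clusters = group_into_clusters(toy_genes)
isolated = [c[0] for c in clusters if len(c) == 1]
clustered = [c for c in clusters if len(c) > 2]  # "more than two genes"
fraction_in_clusters = sum(len(c) for c in clustered) / len(toy_genes)
```

With real map positions in place of the toy list, `fraction_in_clusters` would correspond to the kind of percentage quoted in Section 7.1; the result depends directly on the gap threshold chosen.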

8. Mouse–human SCAN family member orthologues

Fig. 6. Comparison of SCAN gene family members in syntenic regions of mouse and human chromosomes. (A) A SCAN cluster that is well conserved between mice and humans. (B) Human specific expansion of SCAN family members and the splitting of a human cluster to two mouse clusters. (C) A syntenic region that demonstrates both mouse-specific expansion (human SCAN B gene ZNF494 is represented by three mouse genes LOC232867, LOC434555, mCG113343), and human specific expansion (human genes LOC342933, LOC342934, ZNF495, LOC39077 and BC330812 have no apparent mouse orthologues).

A reciprocal comparison of the human and mouse SCAN-containing genes within regions of conserved synteny identified likely orthologues. In some instances, conserved reference sets of genes that flank the genomic segment of the human SCAN gene(s) were located in the mouse genome. To identify more orthologous pairs, each mouse SCAN protein sequence was used in a blastp search of the human genome RefSeq protein database. Additionally, NCBI's automatic orthologue assignment database, HomoloGene, was consulted. The results presented in Table 2 indicate orthologous pairs, how the assignment was made, the degree of conservation between the pairs, and whether domain identification programs such as Prosite and SMART recognized the same number and types of domains in each. The mouse–human SCAN family orthologous pairs exhibit a range of homology from 47 to 99 percent similarity, with a mean value of 74% (Table 2). This is somewhat lower than the 87% sequence conservation typically seen with rodent–human orthologue pairs (Zhu et al., 2003; Gibbs et al., 2004). This suggests that the SCAN genes are evolving faster than average, which corresponds with the finding that SCAN genes have expanded rapidly (Sander et al., 2003). Most of these mouse SCAN family members are represented by putative orthologues on conserved segments of the syntenic human chromosomes. In several cases, homologous SCAN family members within the human clusters were indistinguishable from each other when compared to the mouse and, as a result, an orthologue assignment was not possible. Analysis of the syntenic regions of SCAN genes in the mouse and human genome revealed several interesting ways in which SCAN clusters have diverged in the two species. The human cluster at chromosome 7q22 is completely conserved on the mouse chromosome 5G2 (Fig. 7A). Interestingly, we found human SCAN clusters that are represented by a larger number of SCAN genes than in the conserved syntenic regions of the mouse.
For example, the six clustered SCAN genes on human chromosome 16p13.3 are represented by two clusters of only four SCAN genes in the conserved segments of mouse chromosomes 16A and 17A3 (Fig. 7B). These findings provide evidence for human-specific cluster expansions of SCAN family members. Interestingly, mouse-specific expansion of SCAN family members is also seen. ZNF494 is a SCAN B domain-containing gene and is present in one copy in the human genome. The presence of four SCAN B domains in the mouse, including three human ZNF494 homologues on mouse chromosome 7 (LOC232867, LOC434555, and mCG113343 (Fig. 7C)), indicates the presence of a mouse-specific expansion. Collectively these observations suggest that some genes within the SCAN family are lineage-specific and may have been selected independently since the divergence of primate and rodent lineages (Sander et al., 2003). Pragmatically, this means that not every human SCAN family member will have a mouse homologue. However, the identification of mouse homologues of some of the human SCAN genes has allowed provisional human models to be replaced with more definitive structures (Fig. 1).

Table 2

Mouse gene   Human gene   Method of determination   % Identity   % Similarity   Domains match?
Zfp694       LOC342357    b, h                      82           88             Y
Hkr2         Hkr2         b, h                      71           77             Y
Scand1       Scand1       b, h, s                   63           65             Y
Scand5       LOC342933    b, h                      41           56             N
Zfp110       ZNF274       b, h                      41           51             Y
Zfp167       ZNF167       b, h, s                   60           68             N
Zfp174       ZNF174       b                         63           68             Y
Zfp187       ZNF323       b                         54           68             N
Zfp191       ZNF24        b, h, s                   94           99             Y
Zfp192       ZNF192       b, h, s                   85           89             Y
Zfp202       ZNF202       b, h, s                   76           83             Y
Zfp206       ZNF206       b, h, s                   61           68             Y
Zfp213       ZNF213       b                         66           73             Y
Zfp24        ZNF24        b                         75           82             N
Zfp263       ZNF263       b, h, s                   80           84             Y
Zfp287       ZNF287       b, h, s                   82           88             Y
Zfp29        ZFP29        b, h                      88           91             Y
Zfp305       ZNF305       b                         42           57             N
Zfp306       ZNF306       b, h, s                   70           78             Y
Zfp307       ZNF307       b, h, s                   60           67             N
Zfp31        ZNF31        b                         74           81             Y
Zfp369       ZNF274       b                         36           47             Y
Zfp397       ZNF397       b, s                      45           61             N
Zfp444       ZNF444       b, h                      81           84             Y
Zfp445       ZNF445       b, h, s                   58           70             N
Zfp446       ZNF446       b, h                      57           62             Y
Zfp449       ZNF449       b, h, s                   87           91             Y
Zfp483       ZNF483       b, h, s                   56           68             N
Zfp496       ZNF496       b, h, s                   77           82             Y
Zfp498       ZNF498       b                         84           86             Y
Zfp535       ZNF18        b, h                      77           83             Y
Zfp685       ZNF494       b                         36           53             Y
Zfp686       ZNF494       b, h                      37           52             Y
Zfp690       FLJ35867     b, h, s                   77           83             N
Zfp95        ZNF95        b, h, s                   77           83             Y
Zfp96        ZNF305       b, h, s                   65           75             N
Zfp98        ZNF42        b, h                      82           86             Y
Zfp99        ZNF394       b, h, s                   64           73             Y
Zipro1       ZNF38        b, h                      78           84             Y
Zkscan1      ZNF36        b, h                      89           92             Y

b = BLAST search, h = NCBI homologene entry, s = synteny.

Fig. 7. SCAN-like sequences (SLS) in the mouse genome. SLS were found using a tblastn search of the mouse genome with a prototype SCAN sequence. The sequences were not located in either defined genes or models in either the NCBI or Celera databases. The amino acid sequences were aligned using a ClustalW algorithm in Vector NTI 8.0 (Invitrogen). Also shown are the SCAN domains of coding genes Zfp99, Zfp535, Gm754 and LOC381986, for comparison.
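The summary statistics quoted for Table 2 (a similarity range of 47 to 99 percent and a mean of 74%) can be checked with a short calculation. The sketch below assumes the 40 values are read in row order from the % Similarity column of Table 2; the variable names are illustrative and the 87% figure is the typical rodent–human conservation value cited in the text.

```python
from statistics import mean

# % Similarity column of Table 2, in row order (40 mouse-human pairs).
similarity = [
    88, 77, 65, 56, 51, 68, 68, 68, 99, 89,
    83, 68, 73, 82, 84, 88, 91, 57, 78, 67,
    81, 47, 61, 84, 70, 62, 91, 68, 82, 86,
    83, 53, 52, 83, 83, 75, 86, 73, 84, 92,
]

lowest = min(similarity)    # least conserved pair
highest = max(similarity)   # most conserved pair
average = mean(similarity)  # arithmetic mean over all pairs

# Typical rodent-human orthologue conservation cited in the text.
typical_rodent_human = 87
shortfall = typical_rodent_human - average

print(f"range {lowest}-{highest}%, mean {average:.1f}%, "
      f"{shortfall:.1f} points below the typical {typical_rodent_human}%")
```

Under these assumptions the range is 47 to 99 and the mean is 74.9, which agrees with the reported 74% if that figure was truncated rather than rounded.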

9. SCAN family members are vertebrate specific

Initial reports of the human genome suggested that SCAN domain-containing C2H2 zinc finger proteins (SCAN-ZFP) are unique to vertebrates (Lander et al., 2001; Venter et al., 2001). As expected, the genomes of non-vertebrate species, such as the fruit fly (D. melanogaster), nematode (C. elegans), yeast (Saccharomyces cerevisiae) and green plant (A. thaliana), do not contain SCAN domains. Databases were searched for predicted SCAN domains outside of the human and mouse genomes. SCAN domains were found in the genomes of a number of vertebrate organisms including monkey, cow, pig, mouse, rat, frog and several types of fish. Representative examples of SCAN domains from each of these species are provided in Sander et al. (2003). Interestingly, recent sequence analysis of the chicken genome revealed that the entire SCAN domain family seems to be absent from the chicken gene set (Hillier et al., 2004). Importantly, the SCAN domains found in lower vertebrates are not associated with C2H2 zinc finger genes, but are contained in large retrovirus-like polyproteins (Sander et al., 2003). The absence of the SCAN domain in invertebrates and birds and the presence of SCAN domains in retroviral-like polyproteins in lower vertebrates suggest that these genes originated and rapidly expanded during recent vertebrate evolution (Fig. 8). This hypothesis is supported by the recently reported solution structure of the SCAN dimer, which is remarkably similar to the dimerization domain found in the human immunodeficiency virus-1 capsid C-terminal domain (Ivanov et al., 2005). This phyletic distribution suggests a possible acquisition of new regulatory function in the mammalian lineages, whereas in the avian lineage the domain might have remained associated with retroviral sequences that died out subsequently (Hillier et al., 2004). Thus the SCAN family members (Sander et al., 2003), like the members of the KRAB family (Looman et al., 2002) in higher eukaryotes, and the ZAD-containing (Chung et al., 2002; Lespinet et al., 2002) C2H2 zinc finger genes in Drosophila, are characterized by lineage-specific expansions in their respective genomes (reviewed in Lespinet et al., 2002).

Fig. 8. Evolutionary adaptation of SCAN domain proteins in vertebrates. In fish (Osteichthyes), SCAN domains are associated with retroviral elements like zinc knuckles, integrase, and reverse transcriptase polyprotein domains. ND indicates there is no genomic data for reptiles (Diapsida), and there is a lack of SCAN domains in birds (Aves). In mammals, SCAN domains are usually associated with zinc fingers.

10. The SCAN family regulatory network

Generating complexity is easy. Controlling it is not. As biological systems become more complex, the amount of information dedicated to regulation must increase. In evolutionary terms, the recent addition and rapid expansion of the SCAN family may accompany the rapid diversification associated with vertebrate complexity. How a collection of regulatory proteins accomplishes the task of controlling these target genes can be described as a regulatory network. Studies in the model eukaryote S. cerevisiae have provided new insights into how to decipher these regulatory networks (reviewed in Wyrick and Young, 2002 and Zhu and Snyder, 2002). The development of a number of genetic tools, including genome-wide expression analysis, has driven this progress (Ren et al., 2000; Simon et al., 2001). DNA microarrays have been used to profile the genomic binding sites of specific transcription factors. Computational algorithms have been developed that identify potential regulatory sequences in promoter regions throughout the yeast genome (reviewed in Wyrick and Young, 2002). These new approaches have provided critical insights into the components of different signaling pathways and the molecular and cellular responses of these pathways. We predict that some of the SCAN family members will assemble into a signaling network. To understand the structure of this regulatory network, a variety of issues will have to be addressed: first, what is the diversity of SCAN family members? It will be important to determine which SCAN family members form homodimers and which members can interact to form heterodimers. Additionally, it will be important to determine what other proteins interact with the SCAN family members. Second, using the framework for networks described by Babu et al. (2004), the basic units of the SCAN network should be defined. The basic unit comprises the transcription factor and its target gene with a specific DNA recognition site. It will be important to determine if SCAN family members regulate a select few genes or participate in the regulation of a disproportionately large number of target genes. Third, basic units can be organized into motifs, which comprise specific patterns of inter-regulation that are overrepresented in networks. Results from the yeast model system suggest that transcription factors form a limited number of regulatory networks. Lee et al. (2002) cross-linked tagged transcription factors to target promoters, recovered the DNA and then identified the target genes. They found that many yeast promoters were bound by more than two transcription factors. Additionally, they used this information to identify six basic regulatory motifs, designated: single-input motif, autoregulation, multi-component loops, feedforward loops, multiple input motifs and regulatory chains. It will be important to determine if SCAN family members regulate the expression of themselves, or other SCAN family members, as well as other transcription factors and target genes. Fourth, these network motifs can be interconnected to form semi-independent modules. It is possible that distinct cellular processes regulated by SCAN family members might be controlled by discrete modules. Fifth, the entire assembly of SCAN regulatory interactions would constitute a subset of the cell's larger transcriptional network and integrate into the cell's global network organization.

10.1. Selective SCAN–SCAN interactions

Based on the ability of the SCAN domain to function as a dimerization domain, it is possible that the domain participates in the assembly of a network of interacting SCAN family members. To determine which SCAN domains have the ability to interact with one another, pairwise combinations of nine isolated SCAN motifs were tested in the mammalian two-hybrid system (Williams et al., 1999). In addition to the findings from the mammalian two-hybrid system, various SCAN domains have been used as bait in the yeast two-hybrid system, revealing new interactions among family members (Sander et al., 2000; Schumacher et al., 2000; Porsch-Ozcurumez et al., 2001). Collectively, a few general features of SCAN domain interactions emerged from the studies: first, not all SCAN domains are able to self-associate, or homodimerize. Second, interactions between different SCAN domains are selective. For example, ZNF174 can interact with some, but not all, SCAN domains. Third, there is significant variation in the relative affinities of the SCAN domains. It is important to note, however, that most of the observations outlined above were based on studying isolated SCAN–SCAN associations. Whether most of these interactions occur in the context of full-length proteins is uncertain. Nevertheless, these studies with a small subset of the SCAN family may provide some initial guidelines for the assembly of a SCAN protein network.

10.2. SCAN family member target genes: defining the basic units of the network

Several experimental approaches have been used to identify candidate target genes for SCAN family members: first, perhaps most compelling are results from groups that have functionally characterized a specific regulatory element in a gene of interest and then used that site in a "one-hybrid" screen to identify the cognate regulatory protein. The yeast one-hybrid system is used for isolating novel genes encoding proteins that bind to a target, cis-acting regulatory element (Wang and Reed, 1993). Several members of the SCAN family have been identified by this approach. As discussed above, Zfp263 (NT2) has nine zinc fingers and a KRAB domain and interacts with an element in the promoter of the α2(XI) collagen gene (Tanaka et al., 2002). EZF-2 (endothelial zinc finger protein-2 or ZNF444) is a member of the family with four zinc fingers and specifically targets the scavenger receptor expressed by endothelial cells (Adachi and Tsujimoto, 2002). ZNF24 (ZNF191) interacts with an intronic polymorphic TCAT repeat in the gene encoding tyrosine hydroxylase, the rate-limiting enzyme in the synthesis of catecholamines (Albanese et al., 2001). A second experimental approach to determine the genes regulated by a SCAN family member is to determine its DNA binding site and then utilize this information to identify candidate target genes.
Based on the binding site determination, ZNF202 has been shown to interact with and control the expression of a series of genes involved in lipid metabolism, including the ATP binding cassette transporter A1 (Porsch-Ozcurumez et al., 2001). Using similar approaches, binding sites for MZF1 (ZNF42) were found in CD34, c-myb and myeloperoxidase, genes that are expressed in hematopoietic cells. With the annotation of the zinc finger domains in the human (Fig. 1) and the mouse (Fig. 2) SCAN families, this approach could be automated to rapidly identify binding sites for family members (Bulyk et al., 2001). These sites could be used, in turn, to identify candidate target genes for subsequent validation. A third experimental approach involves the yeast two-hybrid system. This approach has been useful in suggesting possible functions for SCAN family members. SCAN family members have been identified as targets in two-hybrid screens. For example, two-hybrid screens for proteins that interact with NSD1 or with the human von Hippel-Lindau tumor suppressor identified the SCAN family members Zfp496 (Nizp1, as discussed above) and ZNF197, respectively (Nielsen et al., 2004; Li et al., 2003). In addition, SCAN family members have been used as bait in the yeast two-hybrid system to identify interacting proteins. For example, to understand the mechanism of transcriptional activation of the myeloid zinc finger protein (ZNF42 or MZF1), it was used in a two-hybrid interaction trap. A novel SWI/SNF related protein, termed mammalian Domino (mDomino), was identified as a candidate MZF-2A interacting partner (Ogawa et al., 2003). Notably, mDomino contains an SWI/SNF-type ATPase/helicase domain, a SANT domain, and a glutamine-rich domain. Overexpression of mDomino enhanced MZF-2A-mediated activation of a reporter gene. In similar recent two-hybrid studies, MZF-1 was also shown to interact with LDOC1, a gene encoding a leucine-zipper protein whose expression is decreased in some pancreatic and gastric cancer cell lines (Inoue et al., 2005). LDOC1 induces apoptosis in some human cell lines, at least in part, by cooperating with MZF-1. The selective interactions between SCAN domains raise the possibility that SCAN family members may regulate gene expression in more complex ways. Unraveling a SCAN signaling network will require a combination of recently developed approaches. DNA microarray expression profiling is one of the most powerful methods to analyze global gene expression. Large-scale microarray analyses reveal that transcriptional coregulation patterns can be remarkably helpful in predicting the function of novel mouse genes (Zhang et al., 2004). The ability to analyze transcripts simultaneously provides a detailed molecular phenotype that could be used to infer the role of SCAN family members. Expression microarrays have been quite useful in dissecting signaling pathways. However, these arrays do not reveal the direct downstream targets of transcription factors. Recently, an approach, designated chromatin immunoprecipitation (ChIP) chip, has been described that permits a comprehensive identification of targets of transcription factors. This ChIP-chip approach may be useful in defining the functions of the SCAN family members.
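Two of the regulatory motifs named in Section 10, autoregulation and feedforward loops, can be made concrete with a small sketch. It assumes that regulator-to-target relationships (for example, from ChIP-chip data) are available as directed edges; the node names `TF_A`, `TF_B`, `TF_C`, `gene_X` and `gene_Y` are hypothetical and do not refer to real SCAN family members, and the brute-force search is only meant for small illustrative networks.

```python
from itertools import permutations


def find_autoregulation(edges):
    """Return regulators that target their own gene (self-edges)."""
    return sorted(src for src, dst in edges if src == dst)


def find_feedforward_loops(edges):
    """Return (x, y, z) triples where x regulates y, y regulates z,
    and x also regulates z directly, with x, y and z all distinct."""
    edge_set = set(edges)
    nodes = sorted({node for edge in edge_set for node in edge})
    return [
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
    ]


# Hypothetical regulator -> target edges, for illustration only.
toy_edges = [
    ("TF_A", "TF_A"),    # TF_A regulates its own gene
    ("TF_A", "TF_B"),
    ("TF_B", "gene_X"),
    ("TF_A", "gene_X"),  # closes a feedforward loop via TF_B
    ("TF_C", "gene_Y"),
]

autoregulated = find_autoregulation(toy_edges)
feedforward = find_feedforward_loops(toy_edges)
```

On the toy network, `autoregulated` is `['TF_A']` and `feedforward` is `[('TF_A', 'TF_B', 'gene_X')]`. Applied to experimentally determined SCAN family targets, the same tests would indicate whether family members regulate themselves or each other.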

11. Concluding remarks

Here we have used bioinformatic approaches to identify the SCAN domains in the mouse genome and to define the structures of the 40 expressed members in the mouse SCAN family. In addition to a single SCAN domain, the members of the family can have an N-terminal motif, a KRAB domain, a SANT domain and a variable number of zinc fingers. The mouse genes encoding the SCAN family members are often clustered and are capable of generating isoforms that probably affect the function of the family members. Analysis of the SCAN domain family using comparative genomic approaches reveals that the SCAN family has rapidly expanded and is vertebrate-specific, and possibly mammalian-specific. As a result of the ability of the SCAN domain to mediate dimerization, a diverse network of transcription factor dimers could be generated that may play a key role in the uniqueness of higher vertebrates. The identification of interacting proteins and target genes regulated by the family members should provide more information about this remarkable family of proteins.

Acknowledgements

This work was supported by NIH grants (RO1 GM 66516). We are indebted to former members of the laboratory who worked on the SCAN domain, including Drs. T. Sander (Medical College of Wisconsin), J. Stone (Massachusetts General Hospital), K. Stringer (University of Cincinnati) and A. Williams (Millennium), as well as J. Maki (University of Massachusetts).

References Aasland, R., Stewart, A.F., Gibson, T., 1996. The SANT domain: a putative DNA-binding domain in the SWI-SNF and ADA complexes, the transcriptional co-repressor N-CoR and TFIIIB. Trends Biochem. Sci. 21, 87 – 88. Adachi, H., Tsujimoto, M., 2002. Characterization of the human gene encoding the scavenger receptor expressed by endothelial cell and its regulation by a novel transcription factor, endothelial zinc finger protein-2. J. Biol. Chem. 277, 24014 – 24021. Albagli, O., Dhordain, P., Deweindt, C., Lecocq, G., Leprince, D., 1995. The BTB/POZ domain: a new protein – protein interaction motif common to DNA- and actin-binding proteins. Cell Growth Differ. 6, 1193 – 1198. Albanese, V., Biguet, N.F., Kiefer, H., Bayard, E., Mallet, J., Meloni, R., 2001. Quantitative effects on gene silencing by allelic variation at a tetranucleotide microsatellite. Hum. Mol. Genet. 10, 1785 – 1792. Babu, M.M., Luscombe, N.M., Aravind, L., Gerstein, M., Teichmann, S.A., 2004. Structure and evolution of transcriptional regulatory networks. Curr. Opin. Struct. Biol. 14, 283 – 291. Bardwell, V.J., Treisman, R., 1994. The POZ domain: a conserved protein – protein interaction motif. Genes Dev. 8, 1664 – 1677. Bellefroid, E.J., Poncelet, D.A., Lecocq, P.J., Revelant, O., Martial, J.A., 1991. The evolutionarily conserved Kruppel-associated box domain defines a subfamily of eukaryotic multifingered proteins. Proc. Natl. Acad. Sci. U. S. A. 88, 3608 – 3612. Boyer, L.A., Latek, R.R., Peterson, C.L., 2004. The SANT domain: a unique histone-tail-binding module? Nat. Rev., Mol. Cell Biol. 5, 158 – 163. Bulyk, M.L., Huang, X., Choo, Y., Church, G.M., 2001. Exploring the DNA-binding specificities of zinc fingers with DNA microarrays. Proc. Natl. Acad. Sci. U. S. A. 98, 7158 – 7163. Casademunt, E., Carter, B.D., Benzel, I., Frade, J.M., Dechant, G., Barde, Y.A., 1999. The zinc finger protein NRIF interacts with the neurotrophin receptor p75(NTR) and participates in programmed cell death. 
EMBO J. 18, 6050 – 6061. Chung, H.R., Schafer, U., Jackle, H., Bohm, S., 2002. Genomic expansion and clustering of ZAD-containing C2H2 zinc-finger genes in Drosophila. EMBO Rep. 3, 1158 – 1162. Collins, T., Sander, T.L., 2005. The superfamily of SCAN domain containing zinc finger transcription factors. In: Iuchi, S., Kuldell, N. (Eds.), Zinc Finger Proteins: From Atomic Contact to Cellular Function. Landes Bioscience/Eurekah.com. Kluwer Academic/Plenum Publishers. Friedman, J.R., et al., 1996. KAP-1, a novel corepressor for the highly conserved KRAB repression domain. Genes Dev. 10, 2067 – 2078.


Gentry, J.J., Rutkoski, N.J., Burke, T.L., Carter, B.D., 2004. A functional interaction between the p75 neurotrophin receptor interacting factors, TRAF6 and NRIF. J. Biol. Chem. 279, 16646 – 16656. Gibbs, R.A., et al., 2004. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428, 493 – 521. Gill, G., 2003. Post-translational modification by the small ubiquitin-related modifier SUMO has big effects on transcription factor activity. Curr. Opin. Genet. Dev. 13, 108 – 113. Hillier, L.W., et al., 2004. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695 – 716. Huang, N., et al., 1998. Two distinct nuclear receptor interaction domains in NSD1, a novel SET protein that exhibits characteristics of both corepressors and coactivators. EMBO J. 17, 3398 – 3412. Inoue, M., Takahashi, K., Niide, O., Shibata, M., Fukuzawa, M., Ra, C., 2005. LDOC1, a novel MZF-1-interacting protein, induces apoptosis. FEBS Lett. 579, 604 – 608. Iuchi, S., 2001. Three classes of C2H2 zinc finger proteins. Cell. Mol. Life Sci. 58, 625 – 635. Ivanov, D., Stone, J.R., Maki, J.L., Collins, T., Wagner, G., 2005. Mammalian SCAN domain dimer is a domain-swapped homolog of the HIV capsid C-terminal domain. Mol. Cell 17, 137 – 143. Kim, S.S., Chen, Y.M., O’Leary, E., Witzgall, R., Vidal, M., Bonventre, J.V., 1996. A novel member of the RING finger family, KRIP-1, associates with the KRAB-A transcriptional repressor domain of zinc finger proteins. Proc. Natl. Acad. Sci. U. S. A. 93, 15299 – 15304. Klug, A., Schwabe, J.W., 1995. Protein motifs 5. Zinc fingers. FASEB J. 9, 597 – 604. Kort, E.N., et al., 2000. Evidence of linkage of familial hypoalphalipoproteinemia to a novel locus on chromosome 11q23. Am. J. Hum. Genet. 66, 1845 – 1856. Kurotaki, N., et al., 2002. Haploinsufficiency of NSD1 causes Sotos syndrome. Nat. Genet. 30, 365 – 366. Lander, E.S., et al., 2001. 
Initial sequencing and analysis of the human genome. Nature 409, 860 – 921. Langmann, T., et al., 2003. ZNF202 is inversely regulated with its target genes ABCA1 and apoE during macrophage differentiation and foam cell formation. J. Lipid Res. 44, 968 – 977. Lechner, M.S., Begg, G.E., Speicher, D.W., Rauscher III, F.J., 2000. Molecular determinants for targeting heterochromatin protein 1-mediated gene silencing: direct chromoshadow domain-KAP-1 corepressor interaction is essential. Mol. Cell. Biol. 20, 6449 – 6465. Lee, T.I., et al., 2002. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799 – 804. Lespinet, O., Wolf, Y.I., Koonin, E.V., Aravind, L., 2002. The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res. 12, 1048 – 1059. Li, Z., Wang, D., Na, X., Schoen, S.R., Messing, E.M., Wu, G., 2003. The VHL protein recruits a novel KRAB-A domain protein to repress HIF1alpha transcriptional activity. EMBO J. 22, 1857 – 1867. Looman, C., Abrink, M., Mark, C., Hellman, L., 2002. KRAB zinc finger proteins: an analysis of the molecular mechanisms governing their increase in numbers and complexity during evolution. Mol. Biol. Evol. 19, 2118 – 2130. Lu, D., Searles, M.A., Klug, A., 2003. Crystal structure of a zinc-finger – RNA complex reveals two modes of molecular recognition. Nature 426, 96 – 100. Mackay, J.P., Crossley, M., 1998. Zinc fingers are sticking together. Trends Biochem. Sci. 23, 1 – 4. Margolin, J.F., Friedman, J.R., Meyer, W.K., Vissing, H., Thiesen, H.J., Rauscher, F.J. III, 1994. Kruppel-associated boxes are potent transcriptional repression domains. Proc. Natl. Acad. Sci. U. S. A. 91, 4509 – 4513. Mark, C., Abrink, M., Hellman, L., 1999. Comparative analysis of KRAB zinc finger proteins in rodents and man: evidence for several evolutionarily distinct subfamilies of KRAB zinc finger genes. DNA Cell Biol. 18, 381 – 396.

McCarty, A.S., Kleiger, G., Eisenberg, D., Smale, S.T., 2003. Selective dimerization of a C2H2 zinc finger subfamily. Mol. Cell 11, 459 – 470. Mefford, H.C., Trask, B.J., 2002. The complex structure and dynamic evolution of human subtelomeres. Nat. Rev., Genet. 3, 91 – 102. Melchior, F., 2000. SUMO-nonclassical ubiquitin. Annu. Rev. Cell Dev. Biol. 16, 591 – 626. Moosmann, P., Georgiev, O., Le Douarin, B., Bourquin, J.P., Schaffner, W., 1996. Transcriptional repression by RING finger protein TIF1 beta that interacts with the KRAB repressor domain of KOX1. Nucleic Acids Res. 24, 4859 – 4867. Nagaoka, M., Nomura, W., Shiraishi, Y., Sugiura, Y., 2001. Significant effect of linker sequence on DNA recognition by multi-zinc finger protein. Biochem. Biophys. Res. Commun. 282, 1001 – 1007. Nielsen, A.L., Jorgensen, P., Lerouge, T., Cervino, M., Chambon, P., Losson, R., 2004. Nizp1, a novel multitype zinc finger protein that interacts with the NSD1 histone lysine methyltransferase through a unique C2HR motif. Mol. Cell. Biol. 24, 5184 – 5196. Ogawa, H., Ueda, T., Aoyama, T., Aronheim, A., Nagata, S., Fukunaga, R., 2003. A SWI2/SNF2-type ATPase/helicase protein, mDomino, interacts with myeloid zinc finger protein 2A (MZF-2A) to regulate its transcriptional activity. Genes Cells 8, 325 – 339. Opitz, J.M., Weaver, D.W., Reynolds Jr., J.F., 1998. The syndromes of Sotos and Weaver: reports and review. Am. J. Med. Genet. 79, 294 – 304. Pavletich, N.P., Pabo, C.O., 1991. Zinc finger-DNA recognition: crystal structure of a Zif268 – DNA complex at 2.1 Å. Science 252, 809 – 817. Pengue, G., Calabro, V., Bartoli, P.C., Pagliuca, A., Lania, L., 1994. Repression of transcriptional activity at a distance by the evolutionarily conserved KRAB domain present in a subfamily of zinc finger proteins. Nucleic Acids Res. 22, 2908 – 2914. Porsch-Ozcurumez, M., et al., 2001. 
The zinc finger protein 202 (ZNF202) is a transcriptional repressor of ATP binding cassette transporter A1 (ABCA1) and ABCG1 gene expression and a modulator of cellular lipid efflux. J. Biol. Chem. 276, 12427 – 12433. Rayasam, G.V., et al., 2003. NSD1 is essential for early post-implantation development and has a catalytically active SET domain. EMBO J. 22, 3153 – 3163. Ren, B., et al., 2000. Genome-wide location and function of DNA binding proteins. Science 290, 2306 – 2309. Rodriguez, M.S., Dargemont, C., Hay, R.T., 2001. SUMO-1 conjugation requires both a consensus modification motif and nuclear targeting. J. Biol. Chem. 276, 12654 – 12659. Ross, S., Best, J.L., Zon, L.I., Gill, G., 2002. SUMO-1 modification represses Sp3 transcriptional activation and modulates its subnuclear localization. Mol. Cell 10, 831 – 842. Ryan, R.F., et al., 1999. KAP-1 corepressor protein interacts and colocalizes with heterochromatic and euchromatic HP1 proteins: a potential role for Kruppel-associated box-zinc finger proteins in heterochromatin-mediated gene silencing. Mol. Cell. Biol. 19, 4366 – 4378. Sander, T.L., Haas, A.L., Peterson, M.J., Morris, J.F., 2000. Identification of a novel SCAN box-related protein that interacts with MZF1B. The leucine-rich SCAN box mediates hetero- and homoprotein associations. J. Biol. Chem. 275, 12857 – 12867. Sander, T.L., Stringer, K.F., Maki, J.L., Szauter, P., Stone, J.R., Collins, T., 2003. The SCAN domain defines a large family of zinc finger transcription factors. Gene 310, 29 – 38. Schultz, D.C., Friedman, J.R., Rauscher III, F.J., 2001. Targeting histone deacetylase complexes via KRAB-zinc finger proteins: the PHD and bromodomains of KAP-1 form a cooperative unit that recruits a novel isoform of the Mi-2alpha subunit of NuRD. Genes Dev. 15, 428 – 443. Schultz, D.C., Ayyanathan, K., Negorev, D., Maul, G.G., Rauscher III, F.J., 2002. 
SETDB1: a novel KAP-1-associated histone H3, lysine 9-specific methyltransferase that contributes to HP1-mediated silencing of euchromatic genes by KRAB zinc-finger proteins. Genes Dev. 16, 919 – 932. Schumacher, C., et al., 2000. The SCAN domain mediates selective oligomerization. J. Biol. Chem. 275, 17173 – 17179.

Seeler, J.S., Dejean, A., 2003. Nuclear and unclear functions of SUMO. Nat. Rev., Mol. Cell Biol. 4, 690 – 699. Simon, I., et al., 2001. Serial regulation of transcriptional regulators in the yeast cell cycle. Cell 106, 697 – 708. Stone, J.R., Maki, J.L., Blacklow, S.C., Collins, T., 2002. The SCAN domain of ZNF174 is a dimer. J. Biol. Chem. 277, 5448 – 5452. Tanaka, K., et al., 2002. A Kruppel-associated box-zinc finger protein, NT2, represses cell-type-specific promoter activity of the alpha 2(XI) collagen gene. Mol. Cell. Biol. 22, 4256 – 4267. Tsumaki, N., Kimura, T., Tanaka, K., Kimura, J.H., Ochi, T., Yamada, Y., 1998. Modular arrangement of cartilage- and neural tissue-specific cis-elements in the mouse alpha2(XI) collagen promoter. J. Biol. Chem. 273, 22861 – 22864. Tupler, R., Perini, G., Green, M.R., 2001. Expressing the human genome. Nature 409, 832 – 833. Underhill, C., Qutob, M.S., Yee, S.P., Torchia, J., 2000. A novel nuclear receptor corepressor complex, N-CoR, contains components of the mammalian SWI/SNF complex and the corepressor KAP-1. J. Biol. Chem. 275, 40463 – 40470. Venter, J.C., et al., 2001. The sequence of the human genome. Science 291, 1304 – 1351. Wagner, S., et al., 2000. A broad role for the zinc finger protein ZNF202 in human lipid metabolism. J. Biol. Chem. 275, 15685 – 15690. Wang, M.M., Reed, R.R., 1993. Molecular mechanisms of olfactory neuronal gene regulation. Ciba Found. Symp. 179, 68 – 72 (discussion 73-5, 88 – 96).


Williams, A.J., Khachigian, L.M., Shows, T., Collins, T., 1995. Isolation and characterization of a novel zinc-finger protein with transcription repressor activity. J. Biol. Chem. 270, 22143 – 22152. Williams, A.J., Blacklow, S.C., Collins, T., 1999. The zinc finger-associated SCAN box is a conserved oligomerization domain. Mol. Cell. Biol. 19, 8526 – 8535. Witzgall, R., O’Leary, E., Leaf, A., Onaldi, D., Bonventre, J.V., 1994. The Kruppel-associated box-A (KRAB-A) domain of zinc finger proteins mediates transcriptional repression. Proc. Natl. Acad. Sci. U. S. A. 91, 4514 – 4518. Wyrick, J.J., Young, R.A., 2002. Deciphering gene expression regulatory networks. Curr. Opin. Genet. Dev. 12, 130 – 136. Yu, J., Li, Y., Ishizuka, T., Guenther, M.G., Lazar, M.A., 2003. A SANT motif in the SMRT corepressor interprets the histone code and promotes histone deacetylation. EMBO J. 22, 3403 – 3410. Zhang, W., et al., 2004. The functional landscape of mouse gene expression. J. Biol. 3, 21. Zhu, H., Snyder, M., 2002. ‘‘Omic’’ approaches for unraveling signaling networks. Curr. Opin. Cell Biol. 14, 173 – 179. Zhu, L., Swergold, G.D., Seldin, M.F., 2003. Examination of sequence homology between human chromosome 20 and the mouse genome: intense conservation of many genomic elements. Hum. Genet. 113, 60 – 70.