PROTEIN SEQUENCE MOTIFS
TIBS 24 – JULY 1999
The FF domain: a novel motif that often accompanies WW domains
Splicing requires the assembly of elaborate multiprotein complexes that span introns and exons [1]. Different sets of proteins associate with nascent RNA in a temporally specific fashion, creating at least four distinguishable complexes: E, A, B and C. Spliceosome-associated proteins therefore often harbor protein–protein-interaction motifs, for example SR domains, Zn-finger motifs, proline-rich regions, KH domains and WW domains [1–3]. WW-domain-containing proteins are components of both the yeast and the mammalian spliceosome. Two WW-domain-containing proteins, FBP21 (originally identified as a formin-binding protein, FBP [4]) and CA150, associate with the A splicing complex [2,5]. Similarly, the Saccharomyces cerevisiae protein Prp40p, which contains two WW domains, is a splicing factor involved in intron bridging [6,7]. Further, it has been proposed that FBP11 and FBP21 are Prp40-like splicing factors [7]. CA150 associates with the RNA polymerase II holoenzyme [8]; this interaction might play a role in splicing, given that transcription and pre-mRNA processing might be coupled in vivo [9]. Recently, FBP11 was found to bind to huntingtin [10], the protein encoded by the gene that is mutated in Huntington disease [11]. When we searched sequence databases with the partial sequence of FBP11 using the BLASTP program [12], we observed that a portion of this sequence outside the WW domains is homologous to five distinct regions in the C-terminal half of CA150. The repeated occurrence of this domain suggests that it has a conserved function. Indeed, when we searched the databases with a region of FBP11 that contains only a single copy of this repeated motif, we identified several additional proteins that contain one or more copies of this novel domain (Fig. 1). We refer to this new motif as the FF domain because it harbors two strictly conserved phenylalanine residues (Fig. 1). The FF domain is ~50 residues in length. Secondary-structure predictions and the
pattern of residue conservation suggest that the domain contains three α-helices. FF domains are present in eight WW-domain-containing proteins, three of which are putative splicing factors (Fig. 1a). In every case, the FF domains lie C-terminal to the WW domains, and the distance between the last WW domain and the first FF domain ranges from 64 to 187 residues (Fig. 2). The frequent co-occurrence of FF and WW domains in the same proteins is intriguing and raises the possibility that the FF domain is a signature motif of the WW-domain-containing proteins that are involved in splicing. Repeated FF domains are also present in the p190 family of GTPases [13,14] (Fig. 1b), molecules that play important roles in the signal-transduction pathways [15] that regulate cytoskeletal organization [16]. Within the first putative α-helix of the FF domain there is a highly conserved FXXLL motif, which is a potential nuclear receptor (NR) box [17]; canonical NR boxes have the consensus LXXLL. NR boxes are components of transcriptional coactivators and are bound by nuclear receptors in a ligand-dependent fashion [17–19]. NR boxes lie close to the C-termini of α-helical regions; this is also true of the FXXLL motif in FF domains. Because FF domains often occur as tandem repeats, several proteins contain multiple copies of the FXXLL motif (Fig. 2). Five members of the FF-domain-containing family of proteins have been characterized: two are the p190 GTPases [13,14] and three are linked to splicing [5–7]. These findings indicate that the FF domain is a mobile domain that might be involved in protein–protein interactions.

[Figure 1 alignment not reproduced. (a) FF domains of WW-domain-containing proteins: FBP11 (Homo sapiens, AF049523); CA150-1 to CA150-5 (H. sapiens, AF017789); PRP40-1 to PRP40-4 (Saccharomyces cerevisiae, P33203); YPR152c (S. cerevisiae, S69038); SPAC4D7.04c-1 to SPAC4D7.04c-4 (Schizosaccharomyces pombe, Z98602); C13C5.02 (S. pombe, Q09685); ZK1098.1-1 to ZK1098.1-3 (Caenorhabditis elegans, P34600); ZK1127.9-1 to ZK1127.9-5 (C. elegans, U58758); with consensus and predicted secondary structure. (b) FF domains of the p190 family: p190-1 to p190-4 (Rattus norvegicus, A38218); p190B-1 to p190B-4 (H. sapiens, U17032); with consensus.]

Figure 1 Clustal alignments of FF domains. Protein names, species and accession numbers (Acc #) are indicated. (a) Alignment of FF domains found in WW-domain-containing proteins. Amino acids conserved in at least 46% of the sequences are highlighted. Secondary structure was predicted by the program PHD (Ref. 20): H indicates predicted α-helices. When the FF domain of FBP11 was used in BLASTP [12] sequence-database searches, the best E values were obtained for ZK1098.1 (1 × 10⁻⁷ and 5 × 10⁻² for two of its FF domains) and CA150 (1 × 10⁻³ to 6.7 for its five FF domains). (b) Alignment of FF domains found in the p190 family of GTPases. Amino acids conserved in at least 50% of the sequences are highlighted.

0968-0004/99/$ – See front matter © 1999, Elsevier Science. All rights reserved.
PII: S0968-0004(99)01417-6

[Figure 2 schematic not reproduced: domain organization of FBP11 (partial sequence), CA150, PRP40, YPR152c, SPAC4D7.04c, C13C5.02, ZK1098.1, ZK1127.9, p190 GTPase and p190B GTPase. Key: PGM domain, WW domain, FF domain, leucine zipper, GTPase motifs, Rho GAP domain.]

Figure 2 The modular structure of FF-domain proteins. WW domains are found in many proteins that contain FF domains; the PGM domain is common to FBP11, CA150 and ZK1127.9; CA150 contains a leucine zipper at its C-terminus. The RhoGAP-family members p190 and p190B contain N-terminal GTPase motifs and C-terminal RhoGAP domains. Residue numbers are indicated in black; the distance between the end of the last WW domain and the start of the first FF domain is indicated in red.
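The FXXLL motif discussed in the text can be located in a protein sequence with a simple pattern scan. The sketch below is an illustration only, not the authors' method: it treats X as any residue, reports every (possibly overlapping) match of F-x-x-L-L and, for comparison, of the canonical NR-box consensus L-x-x-L-L, and returns 1-based positions. The function name `find_motif` and the toy sequence are hypothetical and are not taken from Figure 1.

```python
import re

# "." stands for any residue (the X in FXXLL). The lookahead (?=...) is
# zero-width, so overlapping occurrences are all reported.
FXXLL = re.compile(r"(?=(F..LL))")
LXXLL = re.compile(r"(?=(L..LL))")


def find_motif(seq: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every hit of `pattern`.

    Whitespace and line breaks (e.g. from FASTA-style input) are removed
    first, so positions refer to the cleaned, upper-cased sequence.
    """
    clean = "".join(seq.split()).upper()
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(clean)]


if __name__ == "__main__":
    toy = "MSEAKQAFEELLKSNPFTDWKALLERLL"  # hypothetical sequence
    print(find_motif(toy, FXXLL))
    print(find_motif(toy, LXXLL))
```

A scan like this only flags candidate segments; whether a given FXXLL lies near the C-terminus of a helix and is accessible to a binding partner would still need structural or experimental support.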
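The highlighting rule in the Figure 1 caption (residues conserved in at least 46% of the sequences in panel a, and 50% in panel b) can be sketched as a per-column frequency test. This is a minimal sketch under stated assumptions, since the exact shading rule used to draw the figure is not described: a column is flagged when its single most common residue reaches the threshold, gaps ('-') count toward the number of sequences but never as a conserved residue, and groupings of similar residues are ignored. The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def conserved_columns(
    alignment: list[str], threshold: float = 0.46
) -> list[tuple[int, str, float]]:
    """Flag alignment columns whose most common residue reaches `threshold`.

    Returns (1-based column, residue, fraction of all sequences) per flagged
    column. Gaps are excluded from the residue counts but included in the
    denominator, so gappy columns are harder to flag.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    flagged = []
    for col in range(length):
        counts = Counter(row[col].upper() for row in alignment if row[col] != "-")
        if not counts:
            continue  # all-gap column
        residue, count = counts.most_common(1)[0]
        fraction = count / n
        if fraction >= threshold:
            flagged.append((col + 1, residue, fraction))
    return flagged


if __name__ == "__main__":
    toy = ["AFKELL", "SFREL-", "TYQDLL", "GFHEIL"]  # hypothetical alignment
    print(conserved_columns(toy))        # 46% threshold, as in panel (a)
    print(conserved_columns(toy, 0.50))  # 50% threshold, as in panel (b)
```

Counting gaps in the denominator is one of several reasonable conventions; a tool that ignored gaps entirely would flag more columns in gap-rich regions.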
References
1 Reed, R. (1996) Curr. Opin. Genet. Dev. 6, 215–220
2 Bedford, M. T. et al. (1998) Proc. Natl. Acad. Sci. U. S. A. 95, 10602–10607
3 Rain, J. C. et al. (1998) RNA 4, 551–565
4 Chan, D. C. et al. (1996) EMBO J. 15, 1045–1054
5 Neubauer, G. et al. (1998) Nat. Genet. 20, 46–50
6 Kao, H. et al. (1996) Mol. Cell. Biol. 16, 960–967
7 Abovich, N. and Rosbash, M. (1997) Cell 89, 403–412
8 Sune, C. et al. (1997) Mol. Cell. Biol. 17, 6029–6039
9 Steinmetz, E. J. (1997) Cell 89, 491–494
10 Faber, P. W. et al. (1998) Hum. Mol. Genet. 7, 1463–1474
11 The Huntington's Disease Collaborative Research Group (1993) Cell 72, 971–983
12 Altschul, S. F. et al. (1990) J. Mol. Biol. 215, 403–410
13 Settleman, J. et al. (1992) Cell 69, 539–549
14 Burbelo, P. D. et al. (1995) J. Biol. Chem. 270, 30919–30926
15 Hall, A. (1994) Annu. Rev. Cell Biol. 10, 31–54
16 Lim, L. (1996) Eur. J. Biochem. 242, 171–184
17 Voegel, J. J. et al. (1998) EMBO J. 17, 507–519
18 Darimont, B. D. et al. (1998) Genes Dev. 12, 3343–3356
19 McInerney, E. M. et al. (1998) Genes Dev. 12, 3357–3368
20 Rost, B. and Sander, C. (1994) Proteins 19, 55–72
Acknowledgements
M. T. B. thanks the Cancer Research Fund of the Damon Runyon–Walter Winchell Foundation for support.

MARK T. BEDFORD AND PHILLIP LEDER
Dept of Genetics, Harvard Medical School and Howard Hughes Medical Institute, 200 Longwood Ave, Boston, MA 02115, USA.
Guidelines for the submission and peer review of Protein Sequence Motifs
Protein Sequence Motifs (PSM) is a regular column for brief reports of new motifs and sequence or structural homologies that have been recognized in published sequences. Contributions to this column should be short (fewer than 500 words plus one figure). All articles will be subject to peer review. PSM is not simply a review column for descriptions of sequence similarities. Preference will be given to reports of new motifs or sequence homologies that have profound biological significance (e.g. reports that provide direct insight into the functions of motifs/proteins or a new direction for further experiments). Manuscripts that report trivial findings or merely update well-known domains with some recent sequences will not be published. All sequences should have been published elsewhere in full and/or be freely available in the appropriate databases (e.g. GenBank, SWISS-PROT, etc.). Adequate statistical evaluations (such as E values from BLAST and FASTA) and, where appropriate, structural correlations should be given (where PSI-BLAST E values are given, authors must state the iteration at which a sequence was added). If statistical support is weak, additional consistency checks must be made, for example reciprocal detection in database searches, congruent secondary-structure prediction and demonstration of a lack of conflict with existing domain definitions. Figures should be helpful to the reader. Alignments of representative sequence segments that comprise the motif should include accession numbers, sequence ranges and, where appropriate, secondary-structure assignments and/or predictions. Domain cartoons, especially for complex multidomain proteins, may be worth including but should not duplicate information given in the text.
Comprehensive sequence alignments and other relevant supplementary material can be placed on the Web; URLs for such material should be included in the article so that hypertext links can be made in the online version of TiBS. Authors should check online databases (e.g. SWISS-PROT, PFAM, PROSITE and SMART) for domain conflicts or previously published reports of the given motif. Reviewers are asked to confirm these checks, and manuscripts may be rejected if any conflicts or previous reports are found. Further guidance about preparation of PSM articles can be found in TiBS 20, 104 (March 1995). Once a manuscript is accepted for publication, multiple sequence alignments must be deposited electronically at the EBI, and the accession number must be included in the article (see http://www.ebi.ac.uk/ebi_docs/embl_db/ebi/authorinfo.html for submission details). It is hoped that making the submitted alignments publicly available in this way will expedite the creation of new entries in the alignment databases.
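The reciprocal-detection consistency check mentioned in these guidelines asks whether sequence A finds sequence B and B, used as a query in turn, also finds A. A minimal sketch, assuming the search results have already been parsed into a dictionary that maps each query to the best E value of every database entry it hit; the data layout, the function name `reciprocal_pairs`, the default cutoff and the identifiers in the example are all hypothetical and do not reproduce any real BLAST or FASTA output.

```python
# Hypothetical layout: hits[query][subject] = best E value reported for
# `subject` when `query` was used to search the database.
Hits = dict[str, dict[str, float]]


def reciprocal_pairs(hits: Hits, cutoff: float = 1e-3) -> set[frozenset[str]]:
    """Return unordered pairs {a, b} where a finds b and b finds a,
    both with E values at or below `cutoff`. Self-hits are ignored."""
    pairs: set[frozenset[str]] = set()
    for a, subjects in hits.items():
        for b, e_ab in subjects.items():
            if a == b or e_ab > cutoff:
                continue
            e_ba = hits.get(b, {}).get(a)  # None if b was never a query or missed a
            if e_ba is not None and e_ba <= cutoff:
                pairs.add(frozenset((a, b)))
    return pairs


if __name__ == "__main__":
    example = {
        "seqA": {"seqB": 1e-6, "seqC": 0.5},
        "seqB": {"seqA": 2e-5},
        "seqC": {"seqA": 1e-4},
    }
    # seqC finds seqA, but seqA finds seqC only above the cutoff,
    # so only the seqA/seqB pair is reciprocal at 1e-3.
    print(reciprocal_pairs(example))
```

The single cutoff is a simplification; in practice one might accept a weaker E value in one direction if the other direction is strong, which this sketch deliberately does not attempt.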