The FF domain: a novel motif that often accompanies WW domains


PROTEIN SEQUENCE MOTIFS

TIBS 24 – JULY 1999


Splicing requires the assembly of elaborate multiprotein complexes that span introns and exons1. Different sets of proteins associate with nascent RNA in a temporally specific fashion, creating at least four distinguishable complexes: E, A, B and C. Spliceosome-associated proteins therefore often harbor protein–protein-interaction motifs – for example, SR domains, Zn-finger motifs, proline-rich regions, KH domains and WW domains1–3. WW-domain-containing proteins are components of the yeast and mammalian spliceosome structures. Two WW-domain-containing proteins, FBP21 and CA150 [originally identified as a formin-binding protein (FBP)4], associate with the A splicing complex2,5. Similarly, the Saccharomyces cerevisiae protein Prp40p, which contains two WW domains, is a splicing factor involved in intron bridging6,7. Further, it has been proposed that FBP11 and FBP21 are Prp40-like splicing factors7. CA150 associates with the RNA polymerase II holoenzyme8; this interaction might play a role in splicing – given that transcription and pre-mRNA processing might be coupled in vivo9. Recently, FBP11 was found to bind to the protein huntingtin10, which is encoded by the gene mutated in Huntington disease11.

When the partial sequence of FBP11 was subjected to sequence-database searches (computed by the BLASTP program12), we observed that a portion of this sequence, which is outside the WW domains, is homologous to five distinct areas of the C-terminal half of CA150. The repetitive occurrence of this domain suggests that it has a conserved function. Indeed, when we performed database searches using a region of FBP11 that contains only an isolated copy of this repeated motif, we identified several additional proteins that contain single or multiple copies of this novel domain (Fig. 1). We refer to this new motif as an FF domain because it harbors two strictly conserved phenylalanine residues (Fig. 1). The FF domain is ~50 residues in length. Secondary-structure predictions and the pattern of residue conservation suggest that the domain contains three α-helices.

FF domains are present in eight WW-domain-containing molecules, three of which are putative splicing factors (Fig. 1a). In every case, the FF domains are C-terminal to the WW domains, and the distance between the last WW domain and the first FF domain varies between 64 and 187 residues (Fig. 2). The fact that FF domains are frequently found in proteins that contain WW domains is intriguing and raises the possibility that FF domains encode a signature motif for those WW-domain-containing proteins involved in splicing. A series of repeated FF domains is also present in the p190 family of GTPases13,14 (Fig. 1b), molecules that play important roles in signal transduction pathways15 that regulate cytoskeletal organization16.

Within the first putative α-helical region of the FF domain, there is a highly conserved FXXLL motif. The FXXLL motif is a potential nuclear receptor (NR) box17. NR boxes are bound by nuclear receptors in a ligand-dependent fashion and are components of transcription coactivators17–19. NR boxes lie close to the C-termini of separate α-helical regions; this

[Figure 1 alignment: the residue columns of panels (a) and (b) are not reproduced here; sequence labels, consensus and predicted secondary structure follow.]

(a) FF domains in WW-domain-containing proteins

Protein                           Species                      Acc #
FBP11                             Homo sapiens                 AF049523
CA150-1 to CA150-5                H. sapiens                   AF017789
PRP40-1 to PRP40-4                Saccharomyces cerevisiae     P33203
YPR152c                           S. cerevisiae                S69038
SPAC4D7.04c-1 to SPAC4D7.04c-4    Schizosaccharomyces pombe    Z98602
C13C5.02                          S. pombe                     Q09685
ZK1098.1-1 to ZK1098.1-3          Caenorhabditis elegans       P34600
ZK1127.9-1 to ZK1127.9-5          C. elegans                   U58758

Consensus (positions 1–60)        .....F..LL...----.......W......I..-DPRY..-----......R..LF...
Secondary structure               .HHHHHHHHHH..----......HHHHHHHHH..-......-----.....HHHHHHH..

(b) FF domains in the p190 family of GTPases

Protein                           Species                      Acc #
p190-1 to p190-4                  Rattus norvegicus            A38218
p190B-1 to p190B-4                H. sapiens                   U17032

Figure 1 Clustal alignments of FF domains. Protein names, species and accession numbers (Acc #) are indicated. (a) Alignment of FF domains found in WW-domain-containing proteins. Amino acids conserved in at least 46% of the sequences are highlighted. Secondary structure was predicted by the program PHD (Ref. 20): H indicates predicted α-helices. When the FF domain of FBP11 was subjected to BLASTP (Ref. 12) sequence-database searches, the best E values were seen for ZK1098.1 (E values are 1 × 10⁻⁷ and 5 × 10⁻² for two FF domains in ZK1098.1) and CA150 (E values range from 1 × 10⁻³ to 6.7 for the five FF domains present in CA150). (b) Alignment of FF domains found in the p190 family of GTPases. Amino acids conserved in at least 50% of the sequences are highlighted.
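The caption's criterion of "conserved in at least 46% of the sequences" can be read as a per-column frequency cutoff. The Python sketch below illustrates that reading only. The function name `consensus_symbol`, the identity-only counting (no grouping of chemically similar residues) and the choice to count gaps in the denominator are all assumptions, not the authors' stated procedure. The four column strings are transcribed from positions 1, 6, 9 and 10 of panel (a), one letter per aligned sequence.

```python
from collections import Counter


def consensus_symbol(column: str, threshold: float, gap: str = "-") -> str:
    """Return the most common residue of an alignment column if its
    frequency meets the threshold, '.' otherwise, or the gap character
    when the column holds only gaps."""
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return gap
    residue, count = Counter(residues).most_common(1)[0]
    # Assumption: frequency is taken over all rows, gapped rows included.
    return residue if count / len(column) >= threshold else "."


# Columns of panel (a), 24 sequences each (alignment positions 1, 6, 9, 10).
columns = {
    1: "EAKEQKEKIQDAKEKDRKREEKEK",
    6: "FFFFFFFFFFFFFFFFFYFFFFYF",
    9: "LMLLMLMMLLLFMLLLLLLMLLLV",
    10: "LLLLMLLLLLFLLLLLLLLLLLLL",
}

for position, column in columns.items():
    print(position, consensus_symbol(column, threshold=0.46))
```

For these four columns the sketch reports `.`, `F`, `L` and `L`, which agrees with the first conserved phenylalanine at position 6 and the leucine pair at positions 9 and 10 of the panel (a) consensus.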


0968 – 0004/99/$ – See front matter © 1999, Elsevier Science. All rights reserved.

PII: S0968-0004(99)01417-6

[Figure 2 diagram: domain maps of CA150, PRP40, FBP11 (partial sequence), YPR152c, SPAC4D7.04c, C13C5.02, ZK1098.1, ZK1127.9, p190 GTPase and p190B GTPase. Key: PGM domain, WW domain, FF domain, leucine zipper, GTPase motifs, Rho GAP domain.]

Figure 2 The modular structure of FF domain proteins. WW domains are found in many proteins that contain FF domains; the PGM domain is common to FBP11, CA150 and ZK1127.9; CA150 contains a leucine zipper at its C-terminus. The RhoGAP-family members (p190 and p190B) contain N-terminal GTPase motifs and C-terminal RhoGAP domains. Residue numbers are indicated in black; the distance between the end of the last WW domain and the start of the first FF domain is indicated in red.

is also true of FF domains. The repeated occurrence of FF domains would mean that the FXXLL motif is also repeated in several proteins (Fig. 2). Five members of the FF-domain-containing family of proteins have been characterized. Two members are Rho GTPases13,14, and three are linked to splicing5–7. These findings demonstrate that the FF domain is a mobile domain that might be involved in protein–protein interactions.
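The FXXLL motif discussed in the text (a phenylalanine, any two residues, then two leucines) is simple enough to locate with a regular expression. The following Python sketch shows one way to do so. The helper name `find_fxxll` and the test string are hypothetical illustrations and do not come from a database entry, and the sketch says nothing about whether a match is a functional NR box.

```python
import re

# F, any two residues, then LL. The lookahead makes overlapping hits visible.
FXXLL = re.compile(r"(?=(F[A-Z]{2}LL))")


def find_fxxll(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched five residues) for every
    F-x-x-L-L occurrence in an ungapped protein sequence."""
    return [(m.start() + 1, m.group(1)) for m in FXXLL.finditer(sequence.upper())]


# Hypothetical illustration string, not a verified database sequence.
toy_sequence = "EAKQAFKELLKE"
print(find_fxxll(toy_sequence))  # [(6, 'FKELL')]
```

Because the pattern accepts only letters between the F and the LL pair, gap characters from an alignment row would need to be stripped before scanning.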

References

1 Reed, R. (1996) Curr. Opin. Genet. Dev. 6, 215–220
2 Bedford, M. T. et al. (1998) Proc. Natl. Acad. Sci. U. S. A. 95, 10602–10607
3 Rain, J. C. et al. (1998) RNA 4, 551–565
4 Chan, D. C. et al. (1996) EMBO J. 15, 1045–1054
5 Neubauer, G. et al. (1998) Nat. Genet. 20, 46–50
6 Kao, H. et al. (1996) Mol. Cell. Biol. 16, 960–967
7 Abovich, N. and Rosbash, M. (1997) Cell 89, 403–412
8 Suñé, C. et al. (1997) Mol. Cell. Biol. 17, 6029–6039
9 Steinmetz, E. J. (1997) Cell 89, 491–494
10 Faber, P. W. et al. (1998) Hum. Mol. Genet. 7, 1463–1474
11 The Huntington's Disease Collaborative Research Group (1993) Cell 72, 971–983
12 Altschul, S. F. et al. (1990) J. Mol. Biol. 215, 403–410
13 Settleman, J. et al. (1992) Cell 69, 539–549
14 Burbelo, P. D. et al. (1995) J. Biol. Chem. 270, 30919–30926
15 Hall, A. (1994) Annu. Rev. Cell Biol. 10, 31–54
16 Lim, L. (1996) Eur. J. Biochem. 242, 171–184
17 Voegel, J. J. et al. (1998) EMBO J. 17, 507–519
18 Darimont, B. D. et al. (1998) Genes Dev. 12, 3343–3356
19 McInerney, E. M. et al. (1998) Genes Dev. 12, 3357–3368
20 Rost, B. and Sander, C. (1994) Proteins 19, 55–72

MARK T. BEDFORD AND PHILLIP LEDER

Dept of Genetics, Harvard Medical School and Howard Hughes Medical Institute, 200 Longwood Ave, Boston, MA 02115, USA.

Acknowledgements

M. T. B. thanks the Cancer Research Fund of the Damon Runyon–Walter Winchell Foundation for support.
