PROTEIN SEQUENCE MOTIFS
TIBS 19 - DECEMBER 1994
A TPR domain in the SNAP secretory proteins
Recent work on secretion has revealed a secretory complex composed of several conserved proteins, representatives of which appear to function at all sites of constitutive and regulated secretion in eukaryotes (reviewed in Refs 1, 2). Among these are two types of cytosolic proteins: NSF (N-ethylmaleimide-sensitive protein) and the SNAPs (α-, β- and γ-soluble NSF attachment proteins). Here we report that a repeating sequence motif in the SNAP proteins exhibits marked similarity to the tetratricopeptide repeats (TPRs) that define the TPR protein family, a family consisting largely of nuclear proteins, including several that function in the cell cycle (reviewed in Refs 3, 4). We also report that the TPRs in the SNAPs are composed of smaller subrepeats not previously recognized in TPRs. The TPR is thought to be a structural element that co-assembles with other TPRs in the same protein to form a distinct TPR domain, and it may also participate in protein-protein interactions with the TPRs of associated proteins (Ref. 3). Coincidentally, the TPR structural element has been termed the 'snap helix' in reference to the proposed mechanism by which amphipathic helices in the repeat mediate co-assembly of TPRs.
We have recently isolated Drosophila melanogaster homologs of the vertebrate and yeast NSF and SNAP genes to initiate a genetic analysis of their roles in neurotransmitter release (Ref. 5). In alignments of the Drosophila SNAP amino acid sequence to those of the bovine and yeast SNAPs, we recognized a striking pattern of repeating sequence elements within the amino-terminal half of the SNAP proteins. A continuous sequence
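[Figure 1(a): alignment graphic not reproduced. Each 34-residue TPR unit is marked as five subrepeats of 7, 7, 6, 7 and 7 residues. Sequence labels and residue ranges recoverable from the two alignment rows:]
Sequence         Row 1      Row 2
α-SNAP           41-119     120-193
β-SNAP           41-119     120-193
dSNAP            39-117     118-191
SEC17            36-115     116-189
γ-SNAP           50-129     130-201
nuc2+            430-497    498-565
CDC23            399-466    467-534
TPR consensus    (unnumbered)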
(b) Percent occurrence of prevalent residues at each position in the pooled seven- and six-residue SNAP subrepeat sequences. Symbols: * = the hydrophobic amino acids I, V, L, M, A; X = the polar amino acids K, R, Q, N, D, E. Position 4 is missing from the six-residue subrepeat.
Position in seven-residue subrepeat:   1  2  3  4  5  6  7
Position in six-residue subrepeat:     1  2  3  -  4  5  6
Consensus:                             A  A  D  *  Y  E  X
Alternative residues:                     G  E  C  F  K
                                          I  N     L
Position   Prevalent residues
1          A (66%)
2          A (35%) + G (16%) = 51%; I (15%)
3          D (23%) + E (16%) = 39%; N (19%)
4          * (51%); C (19%)
5          Y (38%) + F (20%) = 58%; L (15%)
6          E (25%); K (15%)
7          X (73%)
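The percentages in panel (b) are per-position residue frequencies pooled over subrepeats of two lengths. The following is a minimal sketch of how such a table could be tallied, not the authors' procedure. It assumes that six-residue subrepeats are placed on the seven-position frame with position 4 left empty, and that empty positions are excluded from the denominator. The function names and the toy sequences are hypothetical and are not taken from the SNAP alignment.

```python
from collections import Counter

GAP = "-"


def pad_six_to_seven(six, gap=GAP):
    """Place a six-residue subrepeat on the seven-position frame, leaving position 4 empty."""
    if len(six) != 6:
        raise ValueError("expected a six-residue subrepeat")
    return six[:3] + gap + six[3:]


def pooled_frequencies(sevens, sixes, gap=GAP):
    """Return one dict per position mapping residue -> percent occurrence.

    Assumption: gap characters are ignored, so position 4 is computed
    from the seven-residue subrepeats only.
    """
    if any(len(s) != 7 for s in sevens):
        raise ValueError("expected seven-residue subrepeats")
    frame = list(sevens) + [pad_six_to_seven(s, gap) for s in sixes]
    table = []
    for pos in range(7):
        residues = [s[pos] for s in frame if s[pos] != gap]
        counts = Counter(residues)
        table.append({res: 100.0 * n / len(residues) for res, n in counts.items()})
    return table


if __name__ == "__main__":
    # Toy subrepeats for illustration only (hypothetical, not real SNAP data).
    toy_sevens = ["AADLYEK", "AGEVFKR", "AADIYEQ", "SANCLEN"]
    toy_sixes = ["AADYEK", "AINFKD"]
    for position, freqs in enumerate(pooled_frequencies(toy_sevens, toy_sixes), start=1):
        ranked = sorted(freqs.items(), key=lambda item: -item[1])
        print(position, ", ".join(f"{res} ({pct:.0f}%)" for res, pct in ranked))
```

Collapsing residues into the * and X classes before counting would be a further step; it is left out here because the source does not state at which stage that grouping was applied.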
Figure 1 (a) Alignment of the soluble NSF-attachment protein (SNAP) secretory protein tetratricopeptide (TPR) domains with known TPR motifs. The alignment includes the bovine α-, β- and γ-SNAPs (Ref. 6); the yeast SNAP protein SEC17 (Ref. 7); the yeast TPR proteins nuc2+ and CDC23 (Refs 3, 4); and a published TPR consensus sequence (Ref. 8). Shaded residues are matches to a 'unified' SNAP subrepeat consensus sequence representing both the seven- and six-residue subrepeats. This unified subrepeat consensus sequence was generated by analysis of all 75 of the seven-residue subrepeats and all 20 of the six-residue subrepeats present in the aligned SNAP sequences. First, separate consensus sequences for the seven- and six-residue subrepeats were generated by determining the frequency at which each amino acid occurred at each position of the subrepeat. It was then found that when the seven-residue subrepeat consensus was shortened to six residues by removal of the fourth position, it was very similar to the consensus of the six-residue subrepeat. Thus the two sequences were considered to be related and were pooled to determine a unified consensus sequence based on the amino acid frequencies shown in (b). The SNAP sequence alignments were generated using the PILEUP program from the UW-GCG software package (Ref. 9) (University of Wisconsin-Madison). The gaps shown in the first (most amino-terminal) and fourth (most carboxy-terminal) TPR units of the SNAPs were introduced to improve the alignment with the SNAP subrepeat sequences. Owing to the six-residue linker regions between the SNAP repeats, alignment to the TPR protein domains was produced by introducing six-residue gaps between the TPRs of nuc2+ and CDC23. Database searches and sequence randomization were performed using the UW-GCG package programs FASTA and SHUFFLE, respectively.
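The legend refers to residues that match the unified subrepeat consensus and to a randomized control sequence. As a rough illustration only, the sketch below scores a 34-residue unit against a template built from the prevalent residues listed in panel (b), tiled in the 7, 7, 6, 7, 7 layout, and produces a composition-preserving shuffle. The matching rule (a residue counts as a match if it is among the listed residues for that position) is an assumption, since the exact scoring is not specified in the text. Python's `random` module stands in for the GCG SHUFFLE program, and the toy sequence is hypothetical.

```python
import random

HYDROPHOBIC = set("IVLMA")   # the '*' class in panel (b)
POLAR = set("KRQNDE")        # the 'X' class in panel (b)

# Residues listed in panel (b) for positions 1-7 of the seven-residue subrepeat.
SEVEN = [
    set("A"),
    set("AGI"),
    set("DEN"),
    HYDROPHOBIC | {"C"},
    set("YFL"),
    set("EK"),
    POLAR,
]
# The six-residue subrepeat lacks position 4.
SIX = SEVEN[:3] + SEVEN[4:]
# One 34-residue TPR unit: two seven-residue subrepeats, one six, two sevens.
TPR_TEMPLATE = SEVEN + SEVEN + SIX + SEVEN + SEVEN


def percent_identity(unit, template=TPR_TEMPLATE):
    """Percent of positions whose residue is among the template residues (assumed rule)."""
    if len(unit) != len(template):
        raise ValueError("unit and template must have the same length")
    matches = sum(res in allowed for res, allowed in zip(unit, template))
    return 100.0 * matches / len(unit)


def shuffled(seq, seed=0):
    """Return the same residues in random order, as a composition-matched control."""
    rng = random.Random(seed)
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical unit made of consensus residues, for illustration only.
    toy_unit = "AADLYEK" * 2 + "AADYEK" + "AADLYEK" * 2
    print("toy unit:", percent_identity(toy_unit))
    print("shuffled:", percent_identity(shuffled(toy_unit, seed=1)))
```

Because the shuffle keeps amino acid composition fixed, any drop in the score reflects loss of positional order rather than a change in which residues are present.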
© 1994, Elsevier Science Ltd 0968-0004/94/$07.00
beginning near the amino termini of the SNAPs was found to contain several 34-residue repeats separated by linker regions of six amino acids. As shown in the upper portion of Fig. 1, each 34-residue repeat can be divided into five subrepeats. These SNAP subrepeats all have related sequences but fall into two classes by length. Four copies of a seven-residue subrepeat are present within each 34-residue repeat, arranged in tandem pairs that are separated by a single six-residue subrepeat in the center. The repeat structure for the five SNAP sequences aligned in Fig. 1 is indicated by shading of residues that match a 'unified' consensus sequence representing both the seven-residue and six-residue subrepeats (Fig. 1). Despite substantial sequence differences among the SNAP proteins (ranging from 88% to 25% amino acid identity with α-SNAP in the aligned region), the percent identity of their 34-residue repeats to the unified consensus for the SNAP subrepeats falls within a narrow range: α-SNAP (65%), β-SNAP (64%), dSNAP (63%), SEC17 (60%) and γ-SNAP (63%). That these values reflect strict order within the SNAP sequences is shown by our finding that a randomized version of the same α-SNAP domain sequence exhibited only 22% identity to the SNAP subrepeat consensus. Database searches using the portions of the dSNAP and α-SNAP proteins shown in Fig. 1 identified the yeast nuclear protein nuc2+ as a high-ranking match and as the only protein matched by both query sequences. Another yeast nuclear protein, the cell-cycle protein CDC23, was identified as the closest match to a consensus sequence representing the SNAP seven-residue subrepeat. Both nuc2+ and CDC23 are members of the TPR protein family, and the 34-residue repeats present in these proteins were found to be similar to those in the SNAPs.
The similarity between the SNAP repeats and those in the TPR proteins is shown in Fig. 1. Here, two individual TPR proteins, the yeast nuclear proteins nuc2+ (Refs 3, 4) and CDC23 (Refs 3, 4), and a published consensus sequence for the TPR motif (Ref. 8) are aligned with the SNAP repeat domain. Again, highlighted residues match the unified consensus sequence for the seven- and six-residue SNAP subrepeats. The aligned sequences of the nuc2+ (TPRs 5-8) and CDC23 (TPRs 5-8) proteins are 43% and 40% identical, respectively, to the unified consensus for the SNAP subrepeats. By contrast, a randomized sequence corresponding to the same nuc2+ domain exhibited only 22% identity. The TPR consensus sequence exhibits 62% identity to the SNAP subrepeat consensus, a value similar to those observed with the repeat domains of individual SNAP proteins. Thus, a clear relationship exists between the 34-residue repeats in the SNAPs and those in established TPR proteins. Despite this, the SNAP TPRs are unusual in that they are separated by linker regions six amino acids in length, in contrast to the tandem arrangement prevalent in other TPR proteins (Refs 3, 4; see Fig. 1).
The findings reported here may have important implications. The subrepeat structure identified in the SNAP TPRs may lead to new general insights into the structure and interactions of TPRs in all proteins in which they appear. Furthermore, the relationship of the SNAPs to the TPR proteins adds to previous work identifying NSF-related proteins with nuclear and cell-cycle functions (Refs 10, 11). Taken together, these findings suggest that similar kinds of protein interactions participate in cell functions as diverse as cell-cycle regulation and neurotransmitter release.
The WW domain: a signalling site in dystrophin?
Duchenne and Becker muscular dystrophies are degenerative diseases caused by mutations of a single locus, the dystrophin gene (for a review see Ref. 1). The gene encodes a large molecule that belongs to a family of cytoskeletal proteins including α-actinin and β-spectrin. Alternatively spliced forms exist; the longest one consists of four domains: (1) an amino-terminal, globular actin-binding domain common to other cytoskeletal proteins, (2) 24 spectrin-like repeats forming a long rod in the middle of the molecule, (3) a cysteine-rich Ca2+-binding domain, and (4) a carboxy-terminal globular domain (Ref. 1; Fig. 1). Molecular analysis of the central rod-like portion of human dystrophin revealed two interruptions of the spectrin repeats and two flanking segments that appear to be hinge regions.
Since the flanking hinge regions are sufficiently long to form functionally independent domains, we have subjected them to sequence-database searches. Indeed, the segment following the spectrin repeats showed significant sequence similarity (probability of a chance match
sequence appears to be the result of a recent duplication and thus allows the length of the domain to be estimated at about 40 amino acids. The WW domain is often flanked by histidine-rich or cysteine-rich regions, which might bind metal ions, as in dystrophin (Ref. 1). The domain itself appears to contain β-strands (Fig. 2) grouped around four conserved aromatic positions. The presence of both a hydrophobic core and numerous charged residues (Fig. 2) is reminiscent of well-characterized protein modules involved in protein-protein interactions. Like the Src homology domains SH2 and SH3, and the pleckstrin homology (PH) domain, the WW domain occurs in a variety of molecules with no apparent common function. Despite their functional diversity, all of the proteins listed in Fig. 1 seem to be involved in cell signalling or regulation. Dystrophin and utrophin are more than 70% identical in sequence; they form tetramers via their spectrin-like repeats and are thought to have multiple
References
1 Südhof, T. C., De Camilli, P., Niemann, H. and Jahn, R. (1993) Cell 75, 1-4
2 Barinaga, M. (1993) Science 260, 488-489
3 Goebl, M. and Yanagida, M. (1991) Trends Biochem. Sci. 16, 173-177
4 Sikorski, R. S. et al. (1991) Cold Spring Harbor Symp. Quant. Biol. 56, 663-673
5 Ordway, R. W., Pallanck, L. and Ganetzky, B. (1994) Proc. Natl Acad. Sci. USA 91, 5715-5719
6 Whiteheart, S. W. et al. (1993) Nature 362, 353-355
7 Griff, I. C., Schekman, R., Rothman, J. E. and Kaiser, C. A. (1992) J. Biol. Chem. 267, 12106-12115
8 Sikorski, R. S., Boguski, M. S., Goebl, M. and Hieter, P. (1990) Cell 60, 307-317
9 Devereux, J., Haeberli, P. and Smithies, O. (1984) Nucleic Acids Res. 12, 387-395
10 Fröhlich, K. et al. (1991) J. Cell Biol. 114, 443-453
11 Thorsness, P. E., White, K. H. and Ong, W-C. (1993) Yeast 9, 1267-1271
RICHARD W. ORDWAY, LEO PALLANCK AND BARRY GANETZKY
Laboratory of Genetics, 445 Henry Mall, University of Wisconsin-Madison, Madison, WI 53706, USA.