OPUS: a growing family of gap junction proteins?


LETTERS

Recently, Krishnan et al. (Ref. 1) reported the cloning and sequencing of the Drosophila shaking-B (shakB; alias Passover, or Pas) gene, required for the jump response to an optical stimulus. The predicted gene product was similar to those of both the Drosophila gene lethal (1) optic ganglion reduced [l(1)ogre] (Ref. 2) and the Caenorhabditis elegans gene unc-7 (Ref. 3), which together define a new family of evolutionarily conserved proteins that may be membrane-associated (Refs 1-3). Below, I describe three additional members of this family, as identified by sequence homologies. An alignment of all these sequences permits a more informed prediction of the general structure of members of this family. The structure is that of a new type of multipass transmembrane protein. On the basis of the phenotypes of mutant organisms (Refs 3-7), I suggest that the encoded proteins may be members of a family of invertebrate gap junction proteins.

Krishnan et al. (Ref. 1) noted that three C. elegans cDNA clones (Refs 8, 9) had sequence similarity to this family. Here, I report three more, some of which may have been missed owing to their greater divergence (Fig. 1).

I propose that this new and growing protein family be named OPUS (from l(1)ogre, Pas, unc-7 and shakB). A comprehensive alignment of all known OPUS family members (Fig. 1) reveals the following general characteristics.

(1) The pattern of sequence conservation strongly suggests that all members have the same overall structure, notwithstanding the amino-terminal extension of 120 residues in Unc-7.

(2) There are three major and two minor hydrophobic regions. Most residues that are absolutely conserved are associated with the major hydrophobic domains, and the 'loops' between them vary in

[Figure 1: multiple sequence alignment of the OPUS family members (shakB, ogre, UNC-7 and the C. elegans ESTs); see caption.]

Figure 1. Alignment of the amino acid sequences, demonstrated or predicted, of the Drosophila proteins shakB (K) (Ref. 1), shakB (C) (Ref. 11), a variant cDNA from this locus (T. Barnes, unpublished), the l(1)ogre gene product (ogre) and the C. elegans protein Unc-7 (UNC-7) (Ref. 3), with previously identified C. elegans sequences (cmSd9, cm14e10 and wEST02207) and three new C. elegans expressed sequence tags (Refs 8, 9) (ESTs: cm10a8, cm18f10 and wEST01007). These cDNA sequences have BLAST scores/fortuitous match probabilities against unc-7 of 133/3 × 10^-19, 69/4 × 10^-2 and 287/2 × 10^-34, respectively (cm18f10 scores 80/3 × 10^-4 against wEST01007). The point at which shakB (C) becomes identical to shakB (K) is shown by the downward arrow within the box in the third row of sequences. A plus symbol at the end of a sequence indicates that the clone extends further; an ellipsis indicates that the frame is expected to extend beyond that shown. The three major hydrophobic regions are boxed; bold horizontal lines indicate the two minor hydrophobic domains. Asterisks indicate absolutely conserved positions and these residues are also shown in bold typeface; conserved cysteines are highlighted underneath the sequence alignment by a C. It is possible that cm14e10 is derived from the same gene as one of the other nematode ESTs. To achieve the alignment, certain obvious frameshift errors in EST sequences were corrected. These were the extra bases at positions 352 and 360-361 in wEST02207, a missing base at position 340 in cmSd9, and an extra base at position 9 in cm10a8. Ambiguous residues are shown as an X.

TIG SEPTEMBER 1994 VOL. 10 NO. 9
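The asterisk convention described in the Fig. 1 caption (a column is marked when every aligned sequence carries the same residue) can be illustrated with a minimal Python sketch. The three short sequences below are invented toy data, not rows of the actual alignment, and the function names are hypothetical. The sketch also assumes that gaps ('-') and ambiguous residues ('X') never count as conserved.

```python
def conserved_columns(aligned):
    """Return the indices of columns in which all sequences share one residue."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    columns = []
    for i in range(length):
        residues = {seq[i] for seq in aligned}
        # Assumption: gaps and ambiguous residues are never treated as conserved.
        if len(residues) == 1 and not residues & {"-", "X"}:
            columns.append(i)
    return columns


def marker_line(aligned):
    """Build an asterisk line like the one printed above an alignment."""
    if not aligned:
        return ""
    conserved = set(conserved_columns(aligned))
    return "".join("*" if i in conserved else " " for i in range(len(aligned[0])))


def longest_conserved_run(aligned):
    """Return the longest stretch of consecutive absolutely conserved residues."""
    if not aligned:
        return ""
    conserved = set(conserved_columns(aligned))
    best = current = ""
    for i, residue in enumerate(aligned[0]):
        current = current + residue if i in conserved else ""
        if len(current) > len(best):
            best = current
    return best


# Hypothetical toy alignment (not taken from Fig. 1).
toy = [
    "MKYYQWVAC",
    "MRYYQWVSC",
    "LKYYQWVTC",
]
print(marker_line(toy))
print(longest_conserved_run(toy))
```

On real data the same functions would be applied to the full rows of the alignment after any frameshift corrections; the longest conserved run is one simple way to look for a diagnostic motif.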


their length and composition. It has been proposed that the first (major) hydrophobic domain is a signal sequence for the shakB protein (Ref. 1); however, this domain is also present in Unc-7, which has a further amino-terminal extension of 120 residues, suggesting that it is not a signal sequence. Unc-7 is not predicted to have a signal sequence (Ref. 3).

(3) In protein families, cysteines that form disulfide bonds are usually conserved, while unpaired cysteines are not particularly constrained, unless part of an active site. OPUS proteins (Fig. 1) contain six absolutely conserved cysteines. This is consistent with the presence of three pairs of disulfide bonds, which would be expected to lie outside the cytosol. Both the presence of conserved pairs of cysteine residues and the abundance of hydrophobic domains suggest that OPUS proteins are membrane-associated, as previously suggested (Refs 1-3). The lack of a signal sequence might therefore imply that insertion of these proteins into the membrane does not occur co-translationally. However, it should be noted that Watanabe and Kankel (Ref. 10) have attempted to determine the subcellular location of the carboxyl terminus of the l(1)ogre gene product using confocal microscopy but found no evidence for a simple plasmalemmal location.

(4) The most conserved regions of the OPUS proteins are hydrophobic; if these regions span a membrane, they may therefore have additional functions.

(5) The longest absolutely conserved sequence is the pentapeptide YYQWV. This can be regarded as diagnostic for OPUS family members, as it is not found in any other open reading frame in the NCBI database.

(6) Finally, the worm and fly sequences are much more similar within species than between species, suggesting that there has been some independent expansion of the gene family in each evolutionary lineage.

These observations can be used to construct a topological model for an OPUS protein as an integral membrane protein (Fig. 2). In this model, the protein crosses the membrane four times, with both termini in the cytosol and most charged residues and the cysteines outside the cell.

What might be the function of this new family? Analysis of both unc-7 and shakB mutants reveals neuronal defects that affect connectivity rather than neurogenesis (Refs 3-6) and appear to involve electrical synapses. The defects in l(1)ogre mutants are also neuronal (Ref. 7), but the gene is expressed in a wider range of cell types (Ref. 10). Given the size of the OPUS gene family and the putative role of the unc-7 and shakB gene products at gap junctions, it is tempting to speculate that OPUS proteins are the gap junction proteins themselves: that is, the invertebrate equivalents of vertebrate connexins. Components of the invertebrate gap junction have not been molecularly defined to date.

The model proposed above has several features in common with the model proposed for connexin topology (Ref. 12), including: the presence of cytosolic amino and carboxyl termini, four transmembrane domains and six cysteines located within extracellular loops; the location of the majority of charged residues outside the cell; and the lack of a signal sequence. However, there is no significant extended similarity between the sequences of vertebrate connexins and the OPUS proteins.

Figure 2. A model for the topology of OPUS proteins in the membrane. Four transmembrane domains are predicted. In this model, all conserved cysteines (C) are outside the cytosol, and are predicted to form three disulfide bridges. The second major hydrophobic domain is predicted to cross the bilayer twice; there is an invariant basic residue (+) in the middle of the domain. The third major hydrophobic domain is long enough to cross the membrane twice, but has no charged residues in the middle to anchor the loop, and so is shown here as crossing only once. The two most conserved sequences (YYQWV and NEK) are indicated. The NEK motif is found within the third major hydrophobic domain, and may be either exposed or buried in the membrane. NH3+, amino terminus; COO-, carboxyl terminus.

THOMAS M. BARNES
Department of Biology, McGill University, Montreal, Canada H3A 1B1.

References
1 Krishnan, S.N., Frei, E., Swain, G.P. and Wyman, R.J. (1993) Cell 73, 967-977
2 Watanabe, T. and Kankel, D.R. (1990) Genetics 126, 1033-1044
3 Starich, T.A., Herman, R.K. and Shaw, J.E. (1993) Genetics 133, 527-541
4 Baird, D.H., Schalet, A.P. and Wyman, R.J. (1990) Genetics 126, 1045-1059
5 Baird, D.H., Koto, M. and Wyman, R.J. (1993) J. Neurobiol. 24, 971-984


6 Thomas, J.B. and Wyman, R.J. (1984) J. Neurosci. 4, 530-538
7 Lipshitz, H.D. and Kankel, D.R. (1985) Dev. Biol. 108, 56-77
8 Waterston, R. et al. (1992) Nature Genet. 1, 114-123
9 McCombie, W.R. et al. (1992) Nature Genet. 1, 124-131
10 Watanabe, T. and Kankel, D.R. (1992) Dev. Biol. 152, 172-183
11 Crompton, D.E., Griffin, A., Davies, J.A. and Miklos, G.L.G. (1992) Gene 122, 385-386
12 Zhang, J-T. and Nicholson, B.J. (1989) J. Cell Biol. 109, 3391-3401

Species-specific differences in tumorigenesis and senescence

Differences in the rates at which tumors arise in different species are often viewed as reflecting differences in the number of steps required in the process that leads to tumor formation. However, it is also possible that species have different intrinsic rates of mutation at particular steps in pathways of tumorigenesis. For the dominant oncogenes, the tendency toward mutation appears to differ between species (Refs 1, 2): for example, while human cancers are practically devoid of mutations in codon 12 of the H-RAS gene, such mutations are common in rodent cancers. Mutations in members of the Ras gene family are associated with tumors in both species; however, in rodents, the spectrum of mutation encompasses the whole family (i.e. H-, K- and N-Ras), while in humans, it involves N-RAS and K-RAS almost exclusively. While it is possible that the spectrum of mutations induced in rodents by chemical mutagenesis might differ from that in spontaneous human cancers, it might be that the intrinsic rate of mutation at the human H-RAS locus is much lower than that at its rodent counterpart.

It has long been recognized that the region near codon 12 of the rodent H-Ras gene has the potential to form a non-classical DNA structure, and this has been proposed to contribute to mutation at this site (Refs 3-5). Since the capacity for the formation of unusual DNA structures is sequence-dependent, the evolution of an essentially mutation-proof cellular oncogene like the human H-RAS gene could have been accomplished as a result of the degeneracy in the genetic code: sequences that have lost the capacity to form unusual DNA structures could have been selected for while preserving the amino acid sequence required for producing a functional protein. Self-complementary sequences that could fold back on each other, tracts of purines and pyrimidines that could form triplex or quadruplex DNAs, and GC-rich sequences that could form Z-DNA (Ref. 6) may hinder repair or promote the formation of site-specific damage in genes in which they occur (Refs 7-9).

The amino acid sequence of the region around codon 12 of the human H-RAS gene is identical to that of its rat counterpart, yet the DNA sequences are only 75% identical (Fig. 1). In the human, selection against sequences that could form unusual DNA structures could account for the lower mutation rate and for the interspecific divergence in the DNA sequence. A similar process may have occurred at certain tumor suppressor loci. The human and mouse retinoblastoma genes are 95% similar at the amino acid level, but their DNA sequences are only 84% identical (Refs 10, 11). Frequencies of mutation at these loci (Refs 12, 13) can be estimated as follows. Mutations that give rise to retinoblastoma in

[Figure 1: purine-rich and pyrimidine-rich strands of the human and rat H-RAS sequences around codons 11-13 (Ala-Gly-Gly), with the sites of mutation and numbered TGG triplets marked; see caption.]

Figure 1. Comparison of the DNA sequences of codon 12 of the rat and human H-RAS genes. Although both genes encode the same amino acid sequence, they have only 75% identity at the DNA level. Sequence divergence between human and rat diminishes the capacity of the human sequence to form triplex, quadruplex and non-Watson-Crick paired 'fold-backs' by decreasing the purine content of the purine-rich strand, and the pyrimidine content of the pyrimidine-rich strand (Ref. 6). Loss of the TGG triplet that overlaps with codon 12 dramatically reduces the number of potential cruciform structures that could be generated by slipped pairings (Ref. 9) between nucleotides at codon 12 and adjacent triplets; it also introduces an additional CG that, if methylated, would produce a more stable duplex. One would expect the net effect to be that the human codon 12 sequence would be less prone to the formation of unusual DNA structures and therefore less susceptible to damage, or more easily repaired. TGG triplets in the region near each mutation site are indicated by brackets and by numbers (top); amino acid residues affected by mutations are identified using the three-letter code and numbered according to their position in the protein (bottom).
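The caption relies on two simple quantities: the percent identity between two aligned DNA strands, and the number of TGG triplets (which may overlap) in a strand. The following minimal Python sketch shows how each could be computed. The two ten-base sequences are invented toy data, not the H-RAS sequences of the figure, and the function names are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def count_triplet(seq, triplet="TGG"):
    """Count occurrences of a triplet, allowing overlapping matches."""
    width = len(triplet)
    return sum(
        1 for i in range(len(seq) - width + 1) if seq[i:i + width] == triplet
    )


# Hypothetical toy strands (not the sequences shown in the figure).
strand_one = "GTGGTGGTGG"
strand_two = "GTGGTCGTAG"

print(percent_identity(strand_one, strand_two))
print(count_triplet(strand_one), count_triplet(strand_two))
```

In this toy case two substitutions remove two of the three TGG triplets while leaving most positions identical, which is the kind of effect the caption attributes to the divergence between the rat and human sequences.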
