Gene, 128 (1993) 189-195 © 1993 Elsevier Science Publishers B.V. All rights reserved. 0378-1119/93/$06.00
GENE 07149
Characterization of Xenopus laevis γ-crystallin-encoding genes
(Lens; gene family; sequence identity; genome duplication; promoter elements)
Beverly D. Smolich (a), Sharon K. Tarkington (b), Margaret S. Saha (b), Dean G. Stathakis (b) and Robert M. Grainger (b)
(a) Syntex Discovery Research, Palo Alto, CA 94304, USA. Tel. (415) 354-7107; and (b) Department of Biology, University of Virginia, Charlottesville, VA 22903, USA
Received by J. Piatigorsky: 12 October 1992; Accepted: 1 February 1993; Received at publishers: 11 March 1993
SUMMARY
In order to gain insight into crystallin (Cry)-encoding gene (cry) evolution and developmental function, we have determined the gene structure and sequence of several Xenopus laevis γ-cry. These encode the most abundant Cry in the embryonic lens. Four of the X. laevis γ-cry, which are part of a multigene family, were isolated from an X. laevis genomic library and demonstrated to have the same gene structure as γ-cry from other vertebrates, thereby providing further evidence that the split between β and γ members of the βγ cry family occurred relatively early in evolution. Sequence comparisons indicate that these X. laevis genes share 88-90% nucleotide sequence identity in the protein coding regions, which is slightly higher than the identity observed between γ-cry of other species. The 5' upstream regions of X. laevis γ-cry contain a few short stretches of homology and one putative promoter element conserved among all cry genes, but lack other regions common to γ-cry promoters from other organisms. The deduced amino acid sequences of all four genes and one cDNA suggest that the structure of X. laevis γ-Cry is highly conserved with that of other vertebrate γ-Cry, as deduced from the known three-dimensional structure of bovine γB Cry.
INTRODUCTION
Crystallins (Cry), the major structural proteins of the lens, are subdivided into distinct immunological classes in vertebrates. In all vertebrates, α-, β-, and γ-crystallins are produced; in addition, taxon-specific crystallins are present in many species, such as δ, which is restricted to birds and reptiles (Wistow and Piatigorsky, 1988). β- and γ-Cry form the evolutionarily related βγ superfamily, sharing a structural motif known as the greek key. These genes are hypothesized to have evolved from an ancestral gene with a single greek key motif to the present four-motif genes through a series of gene duplication and fusion events. The distinct classes of Cry proteins are highly conserved among different vertebrates and are developmentally regulated during lens development (Wistow and Piatigorsky, 1988). The carefully coordinated temporal and spatial expression of various Cry species during lens differentiation is crucial, since it is proposed that the short-range spatial order of Cry is responsible for maintaining the transparency of the lens, a condition necessary for proper lens function (Delaye and Tardieu, 1983).

As a prelude to studying lens determination and differentiation in Xenopus, we have cloned members of the γ-cry gene family, which encode the major proteins of the embryonic lens in Xenopus (McDevitt and Brahma, 1973). γ-Cry are among the first Cry to appear during Xenopus development and are the most abundant in the embryo, accounting for 75% of the Cry protein by the late tadpole stages (Brahma and Bours, 1972; McDevitt and Brahma, 1973).

In every organism in which they have been examined, γ-Cry are encoded by a multigene family that is highly conserved in sequence, sharing 60-90% nt sequence similarity in the protein coding regions and also sharing conserved promoter elements in the 5' upstream regions (Lok et al., 1989). The γ-cry structure is also conserved between species, each gene containing a small 5' exon which encodes the first 3 aa of the protein, followed by the two major protein coding exons, which correspond exactly to the borders of two protein domains; each domain in turn contains two greek key structural motifs (Breitman et al., 1984; Tomarev et al., 1984; Meakin et al., 1985; Den Dunnen et al., 1986). Although much is known about γ-cry structure and evolution in mammals, relatively little is known regarding this gene family in lower vertebrates such as Xenopus, the focus of the present study.

Correspondence to: Dr. R.M. Grainger, Biology Department, University of Virginia, Charlottesville, VA 22903, USA. Tel. (804) 982-5495; Fax (804) 982-5626; e-mail: rmg9p@virginia.edu

Abbreviations: aa, amino acid(s); bp, base pair(s); cDNA, DNA complementary to RNA; Cry, crystallin(s); cry, gene encoding Cry; kb, kilobase(s) or 1000 bp; Myr, million years; nt, nucleotide(s); RNase, ribonuclease; SDS, sodium dodecyl sulfate; X., Xenopus.
RESULTS AND DISCUSSION
(a) Isolation of a Xenopus γ-cry cDNA and estimation of γ-cry gene number in Xenopus
As a first step towards cloning Xenopus γ-cry genes, we isolated a Xenopus γ-cry cDNA (γ4-cry) based on homology to a Rana temporaria γ-cry cDNA (γ2; Tomarev et al., 1984). A cDNA library from stage-42 Xenopus tadpoles (stages according to Nieuwkoop and Faber, 1967) was screened at high stringency (50% formamide/1 M NaCl/1% SDS at 42°C) with the Rana cDNA. Of several hybridizing plaques, one was chosen for sequence analysis (see section c), which confirmed its identity as a γ-cry cDNA. To assess whether γ-Cry in Xenopus are encoded by a multigene family, as is the case in all organisms examined thus far (Wistow and Piatigorsky, 1988), γ4-cry was hybridized to Xenopus genomic DNA digested with EcoRI or BamHI (data not shown). A number of bands were detected following hybridization under the same conditions used to isolate γ4-cry, suggesting that multiple γ-cry genes exist in Xenopus.

(b) Isolation and organization of Xenopus γ-cry genes
A Xenopus genomic library (provided by D. Melton) was then screened to isolate members of the γ-cry gene family. Using γ4-cry as a probe, multiple plaques hybridized under the same stringency conditions used for genomic Southern blots. Additional positive plaques were detected using another Rana cDNA (γ4; Tomarev et al., 1984) as a probe, also under the same hybridization conditions. Four of the γ-cry-positive phage were chosen
Fig. 1. Partial restriction maps and schematic representation of γ4-cry and the λ phages containing γ1-, γ2-, γ3- and γ5-cry. Filled boxes denote coding regions, open boxes represent untranslated regions. The exons are numbered 1, 2, or 3. The distance between exons 2 and 3 in the gene linked 5' to γ1-cry has not been accurately mapped. Convergent arrows mark the boundaries of the region that was sequenced in each phage. The entire cDNA (γ4-cry) was sequenced. B, BamHI; Bg, BglII; E, EcoRI; H, HindIII; P, PstI; Pv, PvuII.
for further characterization and were designated λγ1-, γ2-, γ3-, γ4- and γ5-cry. Restriction maps of these phage are shown in Fig. 1, as well as gene structures inferred from the results described below. In addition to the four genes and one cDNA which we completely sequenced, we have identified at least four other Xenopus γ-cry genes by partial sequencing and restriction analysis of γ-cry-positive phage (data not shown). Sequences from the nonconserved untranslated regions and introns confirm that these are unique genes. Thus, it is estimated that the Xenopus γ-cry gene family contains at least nine genes. This is somewhat higher than the number of γ-cry genes previously reported in other organisms, six for rats and humans and at least four for mice (Meakin et al., 1985; Den Dunnen et al., 1987), but is within the range expected based on genomic Southern blot patterns. Xenopus may actually contain as many as two genes for every single-copy gene in other organisms due to a genome duplication event (Kobel and Du Pasquier, 1986); therefore, an estimate of nine or more γ-cry genes in Xenopus would be consistent with its proposed tetraploidy. Such a duplication event among γ-cry genes is suggested by the sequence similarity between γ3- and γ5-cry, described below.

The chromosomal organization of many γ-cry genes from other species indicates that members of this gene family are chromosomally linked. All six rat γ-cry genes are found on the same chromosome (Den Dunnen et al., 1987), and the six human genes are also present on a single chromosome (Meakin et al., 1985). In Xenopus, based on restriction mapping and partial sequence analysis, at least two γ-cry genes are also linked, as indicated in Fig. 1. Phage λγ1-cry, besides containing γ1-cry, also includes another partial γ-cry which is located upstream from γ1-cry and is truncated at the BamHI site in the second exon. The chromosomal organization of the remaining genes is unknown.

(c) Nucleotide sequence analysis
Prior to the cloning of Xenopus γ-cry genes, there was uncertainty as to whether the frog γ-cry genes would be β-like or γ-like in organization (Wistow and Piatigorsky, 1988), since the two classes of cry are believed to have arisen from the same ancestral gene and few nonmammalian γ-cry genes have previously been cloned. The nt sequences of the four γ-cry genes and one cDNA were determined, and from the consensus splice site junctions the intron-exon structure of the genes was deduced. Based on this analysis, the structure of the Xenopus γ-cry genes is conserved with that of mammalian γ-cry genes and differs from the structure of the related β-cry. Each Xenopus γ-cry consists of three exons, the first encoding the first 3 aa of the protein, exon 2 encoding protein motifs I and II, and exon 3 encoding motifs III and IV. The related β-cry genes contain four major exons, each one encoding a single structural motif of the protein (Den Dunnen et al., 1986). The structures of two other recently cloned nonmammalian γ-cry genes, from carp, are also similar to mammalian γ-cry (Chang et al., 1991).

While the positioning of the introns at the protein domain boundaries is conserved in Xenopus γ-cry genes, the intron lengths vary quite dramatically. The first introns are much longer (0.38-1.4 kb) than in mammalian γ-cry, in which they are uniformly 80-100 bp in length (Meakin et al., 1985; Den Dunnen et al., 1987). The bovine γs gene, which is a more distantly related member of the βγ superfamily, also has introns which differ in length from the mammalian γ-cry (Van Rens et al., 1989). The second introns of the Xenopus genes are also variable, although similar variation has been reported in previously sequenced γ-cry genes (Den Dunnen et al., 1986).
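The deduction of intron-exon structure from consensus splice junctions, mentioned in section (c), can be illustrated with a minimal sketch. It assumes a single intron, a cDNA that matches the genomic exons exactly, and the canonical GT...AG intron boundaries; the function name and the toy sequences are hypothetical and are not taken from Fig. 2.

```python
def find_single_intron(genomic: str, cdna: str):
    """Return (start, end) of one GT...AG intron such that removing
    genomic[start:end] reproduces cdna, or None if there is no such intron.

    Assumes exactly one intron and a perfect exon/cDNA match (illustrative only).
    """
    genomic, cdna = genomic.upper(), cdna.upper()
    intron_len = len(genomic) - len(cdna)
    if intron_len < 4:  # too short to hold both GT and AG
        return None
    for start in range(len(cdna) + 1):
        end = start + intron_len
        intron = genomic[start:end]
        # Several cut points may reproduce the cDNA when exon and intron ends
        # share bases; the GT/AG requirement selects the consensus junction.
        if (intron.startswith("GT") and intron.endswith("AG")
                and genomic[:start] + genomic[end:] == cdna):
            return start, end
    return None


# Toy sequences (hypothetical): a short 5' exon, an intron, and a second exon.
exon_a = "ATGGGGAAG"
toy_intron = "GTAAGTCTTTCAG"
exon_b = "ATCACCTTC"
genomic = exon_a + toy_intron + exon_b
cdna = exon_a + exon_b

print(find_single_intron(genomic, cdna))  # (9, 22)
```

In this toy case three different cut points would reproduce the cDNA, but only one of them leaves an intron that begins with GT and ends with AG, which is why the splice consensus fixes the junction position.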
The introns of the two carp γ-cry genes differ from both the mammalian and Xenopus genes, with the first introns 162 and 275 bp, while the second introns are only 140 bp (Chang et al., 1991).

The coding sequences of Xenopus γ1-, γ2-, γ3-, γ4- and γ5-cry, along with the 5' and 3' flanking sequences where known, are presented in Fig. 2. (γ4-cry lacks the first 6 nt of the coding sequence, and γ5-cry is truncated in exon 2.) Xenopus γ-cry genes share 85-90% nt sequence identity in their coding regions (Table I). The least similar gene is γ1-cry; in fact, it exhibits changes at several nt residues which are conserved in Xenopus and some mammalian γ-cry genes. The overall high degree of sequence conservation in the Xenopus genes is observed between a few γ-cry genes in mammals; however, the majority
TABLE I
Sequence comparison of Xenopus γ-cry
Rows, gene pairs compared (a): γ1 vs. γ2; γ1 vs. γ3; γ1 vs. γ4; γ1 vs. γ5; γ2 vs. γ3; γ2 vs. γ4; γ2 vs. γ5; γ3 vs. γ4; γ3 vs. γ5; γ4 vs. γ5; Average
Columns: Total coding region [nt(aa)] (b); Percentage identity of motifs (c) [nt(aa)] for motifs 1, 2, 3 and 4
8X(82) 88(86) 88(86) 88(86) 90(86) 94(93) 90(85) 91(90) 98(98) 91(90) 91(88)
WW
SS(82) 84(86) 86(86) 86(85) 88(86) 95(95) 90(85) 91(89) 98( 100) 89(88)
89(80) 89(U) 92(88) 90(85) 92(83) 96(90) 92(83) 94(93) 100(100) 94(93) 93(88)
88(85) SS(89)
87@S) 85(86) 91(87) 94(97)
88(86)
88(87)
89(88)
90(83) 88(87] 89(89) 93(89) 89(87) 92(91) 97(96) 91(93) 90(89)
(a) See Fig. 2. (b) Numbers represent percentages of identical nt and aa (aa in parentheses) in the indicated regions. (c) The four motifs of the protein (see Figs. 2 and 4) are named according to Wistow et al. (1983). A blank indicates a region where the sequence is not known. The sequence corresponding to the 4-aa connecting peptide between motifs II and III is not included in the analysis.
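The entries of Table I are pairwise percentages of identical residues. A minimal sketch of such a calculation is given here, assuming two sequences that have already been aligned to equal length and assuming that alignment columns containing a gap are excluded from the count (the treatment of gaps is not stated in the source). The function name and the toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percentage of identical residues between two pre-aligned sequences.

    Columns in which either sequence has a gap are skipped (an assumption).
    Works for nt or aa sequences alike.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical): one mismatch and one gap column.
seq_1 = "ATGGAGTTCT"
seq_2 = "ATGGAATT-T"
print(f"{percent_identity(seq_1, seq_2):.1f}")  # 88.9
```

Applying the same function to the nt and to the deduced aa alignments of one motif would give the two numbers of a single nt(aa) table entry.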
exhibits less similarity (75-80%) (Meakin et al., 1985). The γ-cry cDNAs from another frog, Rana temporaria, share only 70% identity with each other in the coding region (Tomarev et al., 1984). The homology between the Xenopus γ-cry sequences is uniform throughout the four protein coding motifs, as opposed to the situation in other organisms, where motif III is generally more diverged. Since we may have cloned only about one-half of the Xenopus γ-cry genes, we cannot be certain that the sequence divergence we have found is representative of the entire range of sequence divergence of Xenopus γ-cry genes, especially since all of the genes were isolated with Rana probes at high stringency. However, we estimate that sequences sharing as little as 75% identity could have been detected in the original screen of the library, based on the hybridization conditions used. There is no discernible homology in the 5' and 3' untranslated regions and introns (not shown).

The γ3- and γ5-cry exhibit even more striking identity, sharing 98% nt similarity in the coding region and differing in only the last 2 aa. This level of identity is much higher than that observed between other pairs of Xenopus γ-cry genes, indicating that γ3- and γ5-cry may be duplicated genes resulting from the tetraploidization of the Xenopus genome (Kobel and Du Pasquier, 1986). The two genes share only 70% identity in the 3' untranslated region, while the intron sequences show no significant homologies, demonstrating that they are different genes. Duplicated genes have been described for a number of other Xenopus genes, including those encoding integrins
Fig. 2. Partial nt sequences of Xenopus γ1-, γ2-, γ3-, γ4- and γ5-cry. Restriction fragments were subcloned into Bluescript vectors (Stratagene, La Jolla, CA), and templates for sequencing both strands were prepared by unidirectional deletions using exonuclease III digestion (Henikoff, 1984). Sequencing reactions were performed using Sequenase according to the supplier's instructions (US Biochemical, Cleveland, OH). Sequences are aligned relative to γ1-cry. Coding sequences are in capital letters, noncoding sequences are in lower case. A blank space indicates identity with γ1-cry. Dashes signify gaps introduced for alignment purposes. The sequences of the introns in the genomic sequences are not shown, but the lengths of the introns are indicated. The putative TATA boxes and polyadenylation addition sequences are boxed for each gene. The conserved 15-bp consensus sequence is labeled and bracketed in the 5' upstream region of each gene. The beginnings of the γ4- and γ5-cry sequences are indicated by an asterisk. The beginning nt of each protein coding motif is marked I, II, III, or IV, with the first nt of each motif overlined. A dot marks every tenth nt of the first 113 nt of γ1-cry. The GenBank accession Nos. for γ1-, γ2-, γ3-, γ4- and γ5-cry sequences are, respectively, M99579 through M99583.
(DeSimone and Hynes, 1988) and actin (Mohun et al., 1988). These genes are 95-99% similar in their coding regions at the nt level, while the untranslated regions share 70-80% identity, similar to the case with γ3- and γ5-cry.

A comparison between the nt sequences of γ-cry genes from Xenopus and from other species also reveals significant conservation. Xenopus γ-cry genes share 68-77% identity with those from Rana temporaria, and 67-70% identity with mammalian sequences. Interestingly, when mammalian γ-Cry and Xenopus γ-Cry are compared to each other, motif III is somewhat more divergent; e.g., proteins encoded by γ3-cry and the rat γE gene share 65% identity in motif III, as opposed to 71-73% identity in the other motifs. Given the fact that motif III is on the outer surface of the protein, it may play an important role in intermolecular interactions. Since it has been suggested that interactions between Cry are necessary for retaining lens transparency (Delaye and Tardieu, 1983), this motif may have been subjected to unique selective pressure related to specific interactions undertaken by each γ-Cry polypeptide (Wistow and Piatigorsky, 1988).

Orthologous relationships between mammalian γ-cry genes have been established, for example, between mice and rats (Aarts et al., 1988). However, despite similarities in the nt sequence of approximately 70%, orthologous relationships between Xenopus γ-cry genes and those from other species cannot be established, since each Xenopus gene is no more similar to one mammalian gene than to another. This suggests that the divergence between frogs and mammals occurred prior to the gene duplication event within the γ-cry family. Evolutionary relationships among γ-cry genes from different organisms are depicted schematically in the dendrogram in Fig. 3, which was generated by a multiple nt sequence alignment of the γ-cry listed and shows, for example, that mouse γE is more similar to rat γE than to mouse γD. As with the Xenopus family, the Rana γ-cry cDNAs which have been cloned cannot be paired with their mammalian counterparts, nor is it possible to align Xenopus and Rana sequences orthologously, although the Xenopus genes are slightly more similar to the Rana γ4 than to any other Rana gene (Fig. 3). Xenopus and Rana γ-cry genes are only slightly more similar to each other than Xenopus and mammalian genes are (i.e., approximately 73% versus 68%). This is probably a reflection of the fact that frogs are evolutionarily a diverse group, and Xenopus and Rana are members of different suborders established approximately 180 Myr ago, while amphibians and mammals diverged approximately 350 Myr ago (Vial, 1973). The dendrogram also reveals the difference between γ1-cry and the other Xenopus γ-cry genes.

(d) Amino acid sequence comparisons
The aa sequences deduced from the nt sequences of γ1-, γ2-, γ3-, γ4- and γ5-cry are shown in Fig. 4, aligned according to the four structural greek key motifs of bovine γB Cry, whose three-dimensional structure is
Fig. 3. Dendrogram depicting the relatedness between various γ-cry genes. The dendrogram was derived using the Pileup program in the Genetics Computer Group (Madison, WI) Sequence Analysis Software Package (Devereux et al., 1984). The dendrogram depicts the clustering relationships generated by progressive pairwise alignments of a group of related nt sequences. The distance along the horizontal axis is proportional to the difference between sequences; the distance on the vertical axis has no significance. The sequences listed were obtained from: Xen1, 2, 3, 4, and 5, Xenopus γ1-, γ2-, γ3-, γ4- and γ5-cry (this paper); BovS, bovine γs (Van Rens et al., 1989); CarpS, carp γs (Chang and Chang, 1987); MouseE, B, and D, mouse γE, γB, and γD (Breitman et al., 1984); RatE, rat γE (Den Dunnen et al., 1986); Rana1, 2, and 4, Rana γ1, γ2 and γ4 (Tomarev et al., 1984); Carp1, carp γ1 (Chang et al., 1991).
Fig. 4. Comparison of deduced aa sequences of Xenopus γ-Cry. Sequences are aligned according to γ2-cry. Dots indicate residues identical to those in γ2-cry. Dashes denote unknown aa. The numbering system corresponds to that of Wistow et al. (1983). Conserved residues required for protein folding are indicated by asterisks. Conserved Cys are underlined. Residues important for other structural properties are marked above with a large dot. In numbering the aa, the last digit of each numeral is aligned with the corresponding aa. Sequences other than Xenopus are derived from the following references: Rana γ2, Tomarev et al. (1984); rat γF, Den Dunnen et al. (1986); mouse γF, Breitman et al. (1984); human γD, Meakin et al. (1985).
known (Wistow et al., 1983). The Xenopus γ-Cry, like Rana, are 1 aa longer than most of the mammalian proteins, due to a 4 (rather than 3)-aa connecting peptide between motifs II and III, encoded by the third exon. The Xenopus sequences share significant homology with mammalian γ-Cry sequences, and the key residues required for folding into the structure characteristic of γ-Cry are conserved. These are Tyr6, Phe11, Gly13, and Ser34 in motif I, and equivalent residues in the other three motifs. Tyr is occasionally substituted for Phe11 and its equivalent residues in the other three motifs. Trp42 and Trp68 and equivalents in the second domain are important for the packing of hydrophobic side chains in the core and are also conserved. Interdomain contact residues including Ile43, Tyr56, Ile81, Met132, Tyr145, and Val171 are also identical or vary conservatively. The unusually high Met content of the carp γ-Cry is not observed in Xenopus γ-Cry (Chang et al., 1991).

The number of Cys present in γ-Cry is unusually high for an intracellular protein of its size (Fahey et al., 1977). Each of the Xenopus γ-Cry proteins contains at least five Cys, while on average, two would be predicted. In addition to the large number, four of the Cys are at positions conserved with mammalian γ-Cry (Den Dunnen et al., 1986). Various functions in the normal lens have been suggested for some of these conserved Cys, and it has been hypothesized that disulfide bond formation between these Cys could contribute to cataract formation (Duncan, 1981).

Compared with mammalian γ-Cry, two striking differences in the deduced aa sequences of Xenopus γ-Cry are apparent. In γ2-Cry the first two aa are changed, from the consensus Gly-Lys to Glu-Arg, and in γ4-Cry the first aa of the cloned cDNA is changed from Lys to Glu. The Lys to Arg substitution in γ2-Cry is a conservative change and is also seen in the carp γs-Cry (Chang and Chang, 1987). The two other changes are unique to these genes, however, and their effect on the structure and/or stability of the proteins is unknown. In particular, the substitution of Glu for Gly1 in γ2-Cry is unusual. While it is possible that these changes are due to a cloning artifact, base changes at two separate positions would be required to restore the conserved sequence and keep the GT splice junction in the correct position. It should be noted that it is not known whether γ2- or γ4-Cry are translated, although γ2-cry is transcribed into mRNA, based on RNase protection analysis (unpublished observation).

(e) Analysis of 5' upstream regions
The 5' upstream regions of three genomic sequences from γ1-, γ2- and γ3-cry were analyzed to compare at the nt level possible regulatory regions of Xenopus γ-cry genes. The only matches to the consensus TATA box sequence are located approximately 60 bp upstream from the putative start codon in each gene, with an additional TATA sequence located 30 bp farther upstream in γ3-cry (Fig. 2). Immediately 5' to the proximal putative TATA box is a region which matches a 15-bp consensus sequence found in all cry (Thompson et al., 1987). The location of this sequence varies between classes of cry, but it is generally located immediately 5' to the functional TATA box in γ-cry (Lok et al., 1989), suggesting that these sequences may be functional promoter elements in Xenopus γ-cry; however, the 15-bp sequence is not sufficient to confer tissue-specificity in other organisms (Lok et al., 1989). Matsuo and Yasuda (1992) recently showed that one motif of the chicken αA-cry enhancer encompasses this sequence. There is no second copy of this conserved element near the distal TATA sequence of γ3-cry.
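A search of an upstream region for TATA box matches, of the kind described in section (e), can be sketched as follows. The consensus pattern used here (TATA[A/T]A[A/T]) is a commonly quoted form and is an assumption, as the source does not state which consensus was applied; the function name and the toy sequence are hypothetical and do not reproduce any sequence from Fig. 2.

```python
import re

# Assumed TATA box consensus: TATA(A/T)A(A/T). The lookahead allows
# overlapping matches to be reported.
TATA_LOOKAHEAD = re.compile(r"(?=(TATA[AT]A[AT]))")


def tata_sites(upstream: str):
    """List (distance_bp, matched_sequence) for each TATA-like match.

    `upstream` is taken to end immediately before the start codon, so the
    distance is counted from the first base of the match to the start codon.
    """
    seq = upstream.upper()
    return [(len(seq) - m.start(), m.group(1))
            for m in TATA_LOOKAHEAD.finditer(seq)]


# Toy upstream region (hypothetical): one TATA-like element whose first base
# lies 60 bp upstream of the start codon.
toy_upstream = "GGCC" + "TATAAAA" + "C" * 53
print(tata_sites(toy_upstream))  # [(60, 'TATAAAA')]
```

The same scan could be repeated with the 15-bp cry consensus of Thompson et al. (1987) as the pattern, provided that consensus is supplied by the reader; it is not reproduced in the text here.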
Aside from this 15-bp consensus sequence, Xenopus γ-cry do not contain significant similarities to any of the sequences which are conserved in the promoter regions of mammalian γ-cry. This includes, for example, the G+C-rich region contained in the functionally defined proximal domain of the murine γF promoter, which is located upstream from the 15-bp consensus sequence in the murine γF gene and in many other mammalian γ-cry (Lok et al., 1989). In addition, Xenopus γ-cry do not display similarities to a recently defined activating sequence in the rat γD gene, which overlaps at its 3' end the first 4 nt of the 15-bp consensus sequence (Peek et al., 1992). The γ1-, γ2- and γ3-cry do share among themselves short stretches of homology present mainly within 250 bp of the start codon (as can be seen in Fig. 2). The transcriptional regulatory function of these conserved regions is unknown, but for many cry genes, sequences within this range are sufficient for conferring tissue-specific regulation, although a single sequence conserved among all cry which confers lens-specific expression has not been identified (Peek et al., 1990).

(f) Conclusions
(1) Xenopus γ-Cry are encoded by a multigene family, similar to γ-cry in other organisms. Xenopus may actually contain a greater number of γ-cry due to a genome duplication event.
(2) Xenopus γ-Cry exhibit overall strong evolutionary conservation with γ-Cry from other organisms, and the sequence similarity among Xenopus γ-Cry family members sequenced to date is considerably higher than among γ-Cry proteins in other organisms.
(3) The gene structure of Xenopus γ-cry resembles mammalian γ-cry and not the related β-cry genes, further demonstrating the occurrence of this gene structure prior to the divergence of mammals during vertebrate evolution.
(4) Promoter elements unique to Xenopus γ-cry may be required for spatial and temporal regulation, since many elements conserved in mammalian γ-cry are not present in Xenopus γ-cry.
ACKNOWLEDGEMENTS
We wish to thank Kay Gulding for technical assistance and Tom Duensing for help with the figures. This work was supported by a National Institutes of Health Traineeship (5T32HD07192) to B.D.S. and M.S.S. and NIH grants EY-06675 and EY-05542 and National Science Foundation grant D~B90~54~8 to R.M.G.
REFERENCES
Aarts, H.J.M., Den Dunnen, J.T., Leunissen, J., Lubsen, N.H. and Schoenmakers, J.G.G.: The γ-crystallin gene families: sequence and evolutionary patterns. J. Mol. Evol. 27 (1988) 163-172.
Brahma, S.K. and Bours, J.: Thin layer isoelectric focusing of the soluble lens extracts from larval stages and adult Xenopus laevis. Exp. Eye Res. 13 (1972) 309-314.
Breitman, M.L., Lok, S., Wistow, G., Piatigorsky, J., Treton, J.A., Gold, R.J.M. and Tsui, L.-C.: γ-Crystallin family of the mouse lens: structural and evolutionary relationships. Proc. Natl. Acad. Sci. USA 81 (1984) 7762-7766.
Chang, T. and Chang, W.-C.: Cloning and sequencing of a carp βs-crystallin cDNA. Biochim. Biophys. Acta 910 (1987) 89-92.
Chang, T., Lin, C.-L., Chen, P.-H. and Chang, W.-C.: γ-Crystallin genes in carp: cloning and characterization. Biochim. Biophys. Acta 1090 (1991) 261-264.
Delaye, M. and Tardieu, A.: Short-range order of crystallin proteins accounts for eye lens transparency. Nature 302 (1983) 415-417.
Den Dunnen, J.T., Moormann, R.J.M., Lubsen, N.H. and Schoenmakers, J.G.G.: Concerted and divergent evolution within the rat γ-crystallin gene family. J. Mol. Biol. 189 (1986) 37-46.
Den Dunnen, J.T., Szpirer, J., Levan, G., Islam, Q. and Schoenmakers, J.G.G.: All six rat γ-crystallin genes are located on chromosome 9. Exp. Eye Res. 45 (1987) 747-750.
DeSimone, D.W. and Hynes, R.O.: Xenopus laevis integrins: structural conservation and evolutionary divergence of integrin β subunits. J. Biol. Chem. 263 (1988) 5333-5340.
Devereux, J., Haeberli, P. and Smithies, O.: A comprehensive set of sequence analysis programs for VAX. Nucleic Acids Res. 12 (1984) 387-395.
Duncan, G.: Mechanisms of Cataract Formation in the Human Lens. Academic Press, New York, 1981.
Fahey, R.C., Hunt, J.S. and Windham, G.C.: On the cysteine and cystine content of proteins. J. Mol. Evol. 10 (1977) 155-160.
Henikoff, S.: Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28 (1984) 351-359.
Kobel, H.R. and Du Pasquier, L.: Genetics of polyploid Xenopus. Trends Genet. 2 (1986) 310-315.
Lok, S., Stevens, W., Breitman, M.L. and Tsui, L.-C.: Multiple regulatory elements of the murine γ2-crystallin promoter. Nucleic Acids Res. 17 (1989) 3563-3582.
Matsuo, I. and Yasuda, K.: The cooperative interaction between two motifs of an enhancer element of the chicken αA-crystallin gene, αCE1 and αCE2, confers lens-specific expression. Nucleic Acids Res. 20 (1992) 3701-3712.
McDevitt, D.S. and Brahma, S.K.: Ontogeny and localization of the crystallins during embryonic lens development in Xenopus laevis. J. Exp. Zool. 186 (1973) 127-140.
Meakin, S.O., Breitman, M.L. and Tsui, L.-C.: Structural and evolutionary relationships among five members of the human γ-crystallin gene family. Mol. Cell. Biol. 5 (1985) 1408-1414.
Mohun, T., Garrett, N., Stutz, F. and Spohr, G.: A third striated muscle actin gene is expressed during early development in the amphibian Xenopus laevis. J. Mol. Biol. 202 (1988) 67-76.
Nieuwkoop, P.D. and Faber, J.: Normal table of Xenopus laevis (Daudin). North-Holland, Amsterdam, 1967.
Peek, R., Van der Logt, P., Lubsen, N.H. and Schoenmakers, J.G.G.: Tissue- and species-specific promoter elements of rat γ-crystallin genes. Nucleic Acids Res. 18 (1990) 1189-1197.
Peek, R., Kraft, H.J., Klok, E.J., Lubsen, N.H. and Schoenmakers, J.G.G.: Activation and repression sequences determine the lens-specific expression of the rat γD-crystallin gene. Nucleic Acids Res. 20 (1992) 4865-4871.
Thompson, M.A., Hawkins, J.W. and Piatigorsky, J.: Complete nucleotide sequence of the chicken α-A crystallin gene and its 5' flanking region. Gene 56 (1987) 173-184.
Tomarev, S.I., Zinovieva, R.D., Chalovka, P., Krayev, A.S., Skryabin, K.G. and Gause Jr., G.G.: Multiple genes coding for the frog eye lens γ-crystallins. Gene 27 (1984) 301-308.
Van Rens, G.L.M., Raats, J.M.H., Driessen, H.P.C., Oldenburg, M., Wijnen, J.T., Khan, P.M., De Jong, W.W. and Bloemendal, H.: Structure of the bovine eye lens γs-crystallin gene (formerly βs). Gene 78 (1989) 225-233.
Vial, J.L.: Evolutionary Biology of the Anurans. University of Missouri Press, Columbia, 1973.
Wistow, G.J. and Piatigorsky, J.: Lens crystallins: the evolution and expression of proteins for a highly specialized tissue. Annu. Rev. Biochem. 57 (1988) 479-504.
Wistow, G., Turnell, B., Summer, L., Slingsby, C., Moss, D., Miller, L., Lindley, P. and Blundell, T.: X-ray analysis of the eye lens protein γ-II crystallin at 1.9 Å resolution. J. Mol. Biol. 170 (1983) 175-202.