J. Mol. Biol. (1991) 221, 175-191
Genomic Structure of Chlamydomonas Caltractin
Evidence for Intron Insertion Suggests a Probable Genealogy for the EF-hand Superfamily of Proteins
Vincent D. Lee, Mark Stapleton and Bessie Huang
Department of Molecular Biology, Research Institute of Scripps Clinic
10666 N. Torrey Pines Road, La Jolla, CA 92037, U.S.A.
(Received 1 February 1991; accepted 1 May 1991)
A clone containing the gene locus for Chlamydomonas caltractin, a 20,000 Mr calcium-binding protein that is a member of the EF-hand superfamily of calcium-modulated proteins, was isolated and the structural organization of the gene was determined. The intron-exon organization was resolved by direct comparison of the genomic sequence with a caltractin cDNA. The promoter region does not contain the typical TATA or CCAAT boxes, but the sequences at the splice junctions are similar to those of other eukaryotes. The positions of the six introns in the caltractin gene do not typically define unit structures, nor do they coincide with those in genes for other members of the EF-hand superfamily. An analysis of exon sequences at the splice junctions in the genes of this multigene family was undertaken; evidence was obtained that supports the hypothesis that introns arose at proto-splice sites. A probable evolutionary history for the EF-hand superfamily based on intron insertion is offered.
Keywords: caltractin; gene structure; introns; proto-splice site; evolutionary history.

1. Introduction
Caltractin, a 20,000 Mr calcium-binding protein, is a component of the basal body complex, which constitutes the major microtubule-organizing center in the unicellular green alga Chlamydomonas reinhardtii and the functional homolog of the centrosome in other eukaryotes (Huang et al., 1988a,b). It is the major component of a calcium-sensitive contractile fiber system that physically links the basal body complex to the underlying nucleus in Chlamydomonas, and it appears to play a substantial role in several critical aspects of cell activity (for a review, see Lee & Huang, 1990a). The deduced amino acid sequence of caltractin cDNA shares significant sequence relatedness with members of the EF-hand superfamily of calcium-modulated proteins, which include calmodulin and muscle troponin C (Huang et al., 1988a). The linear sequence identity of caltractin with calmodulin, for example, spans four well-described EF-hand calcium-binding domains. An EF-hand is a linear sequence of about 30 amino acid residues wherein two nearly perpendicular helices flank a calcium-binding loop (for a review, see Kretsinger, 1980). On the basis of the crystal structure of parvalbumin and the amino acid sequence homology to troponin C, the E-helix:loop:F-helix, or EF-hand, structure first observed in parvalbumin was predicted to be the conformation for the calcium-binding domains in troponin C and other members of the family of calcium-modulated proteins. Crystallographic data from calmodulin, troponin C and calbindin-D9k have since provided basic structural confirmation of this hypothesis (for a review, see Kretsinger, 1987). The evolutionary relationship among the members of the EF-hand superfamily of calcium-modulated proteins has been studied extensively (e.g. Goodman et al., 1979; Baba et al., 1984; Parmentier et al., 1987; Perret et al., 1988b). Most of these proteins possess four potential calcium-binding domains with homologous amino acid sequences, which can be numbered I through IV. For those proteins that harbor four domains, it has been observed that domain I is more closely related to domain III than to domains II or IV; similarly, domain II has a higher degree of sequence identity with domain IV than with domains I or III.
© 1991 Academic Press Limited
V. D. Lee et al.
A general model for the evolutionary history of this superfamily of proteins has been proposed, in which a common progenitor gene specifying a single calcium-binding domain was duplicated twice to give rise to an ancestral gene with four domains, the different lineages of this family having evolved by remodeling the structure of this primordial gene (e.g. Weeds & McLachlan, 1974). Correlating the structures of the genes coding for the members of the EF-hand superfamily with protein structural and evolutionary data can prove valuable, because knowledge of the intron-exon structure of the member genes, when introns are present, may provide insights into the evolution of the individual member genes and of the gene family as a whole. It was primarily for this reason that we determined the structure of the Chlamydomonas caltractin gene. Secondarily, we reasoned that the information obtained from the caltractin gene would add to our knowledge of gene structures in Chlamydomonas and to our understanding of how transcription of genes is regulated in this organism. Chlamydomonas has proven over the years to be a valuable experimental system for genetic and biochemical analysis of cellular activity because of its ease of manipulation and its well-established genetic tractability (for a review, see Harris, 1989); insights gained into the molecular mechanism of gene expression in this organism will only expand its usefulness as a model system for further experimentation. We report here the organization of the caltractin gene and the presence of six introns in the gene, whose positions bear no obvious relationship to the boundaries of recognized unit structures in the mature protein. We further provide evidence that supports an intron insertion model that may explain the evolutionary history of the EF-hand superfamily of proteins.
2. Materials and Methods
Restriction endonucleases, DNA-modifying enzymes and S1 nuclease were obtained from Boehringer Mannheim Biochemicals, Life Technologies, Inc., Pharmacia LKB Biotechnology, Inc., and United States Biochemical Corp. [α-35S]dATP (1000 Ci/mmol) was obtained from Amersham Corp. Sequencing nucleotides were obtained from Pharmacia. Oligonucleotide primers were prepared by the Core Facility at the Research Institute of Scripps Clinic with a model no. 380B synthesizer (Applied Biosystems, Inc., Foster City, CA). Poly(A)+ RNA was purified as previously described (Bolduc et al., 1986).

(b) Isolation and characterization of genomic clones
A genomic library was constructed in the EMBL4 lambda phage vector (J. Hicks & B. Huang, unpublished results) with EcoRI-digested, high molecular weight DNA from colR4, a colchicine-resistant mutant of C. reinhardtii wild-type strain 137c with a single-base substitution in the β2-tubulin gene (Lee & Huang, 1990b). The library was screened by plaque hybridization using as probe a 32P-labeled caltractin cDNA insert from pCaBP4 (Huang et al., 1988a). Of the approximately 90,000 plaques plated, 46 gave a hybridization signal. Two of these "positives" were plaque-purified. Restriction digest and Southern blot analysis revealed that the two clones contained identical inserts. On restriction digestion, the insert from one of the clones yielded a PvuII fragment of about 3 kb† that hybridized to two non-overlapping 5′ and 3′ cDNA probes. This 3 kb PvuII fragment was subcloned into pUC and M13 vectors.

The 3 kb PvuII genomic fragment was sequenced after subcloning into M13mp18 and M13mp19 vectors. DNA sequence data were obtained by the dideoxy chain-termination method with both the Klenow fragment of DNA polymerase I (Sanger et al., 1977) and a modified phage T7 DNA polymerase (Tabor & Richardson, 1987) in the presence of [α-35S]dATP (Biggin et al., 1983). Both the 17-mer universal primer and synthetic oligonucleotides prepared against previously sequenced segments were used. Both strands of the genomic clone were completely sequenced. Sequence data were compiled and analyzed using PC/Gene software (IntelliGenetics, Palo Alto, CA).

(d) Transcript mapping by an S1 nuclease protection assay
(1) To detect the 5′ limit of the caltractin transcript, a single-stranded template (sense strand, M13) was annealed to a synthetic oligonucleotide complementary to the 3′ end of the 1st exon at +147 to +166 (bp +1 being the subsequently defined initiation of transcription). To generate sizing ladders, a portion of the mixture of annealed primer and single-stranded template was used in a separate standard DNA sequencing reaction with [α-35S]dATP and the Klenow fragment of DNA polymerase I. To generate a labeled probe, the remainder of the annealed primer-template mixture was extended with the Klenow fragment of DNA polymerase I in the presence of [α-35S]dATP. The labeled probe-template DNA mixture was extracted with phenol/chloroform and then precipitated with ethanol. The recovered DNA was used for S1 nuclease digestion. For S1 nuclease digestion, the labeled probe, resuspended in water, was mixed with 1.50 μg of yeast tRNA and either 0.25 or 0.5 μg of poly(A)+ RNA from C. reinhardtii. The mixtures were dried under vacuum and then resuspended in 30 μl of hybridization buffer (40 mM-Pipes (pH 7), 0.4 M-NaCl, 1 mM-EDTA, and 80% (v/v) formamide). After 10 min of incubation at 85°C, the mixtures were immediately transferred to water baths for hybridization at either 50°C or 55°C for 3 h. A 300 μl volume of ice-cold S1 reaction mixture (40 mM-sodium acetate (pH 5.5), 250 mM-NaCl, 1 mM-ZnCl2, 20 μg denatured sonicated salmon sperm DNA/ml, 1.7 units S1 nuclease/μl) was then added to each hybridized reaction. After incubation at 37°C for 30 min, the reactions were adjusted to 0.2% SDS and 20 mM-EDTA, extracted with phenol/chloroform and precipitated with ethanol. The recovered DNA was then mixed with a formamide-tracking dye mixture and incubated at 90°C for 3 min prior to electrophoresis on a denaturing gel.

(2) To detect the 3′ limit of the caltractin transcript, a single-stranded template from phage M13 was similarly annealed to a synthetic oligonucleotide that primed 110 bp 3′ to where the longest caltractin cDNA (pCaBP4; Huang et al., 1988a) terminated (+2422 to +2441). Sizing ladders were generated as before by using a portion of the primer-template mixture in a separate standard DNA sequencing reaction in the presence of [α-35S]dATP and the Klenow fragment of DNA polymerase I. To generate a labeled probe for S1 nuclease digestion, the remainder of the primer-template mixture was extended as described above. The DNA solution was then adjusted to the proper conditions for digestion with KpnI (which recognizes a site at +2122), digested at 37°C for 1 h, extracted with phenol/chloroform and precipitated with ethanol. The recovered DNA, which contained a labeled fragment 319 bp long, was then hybridized to Chlamydomonas poly(A)+ RNA before digestion with S1 nuclease as described above.

† Abbreviations used: kb, 10³ base-pairs; bp, base-pair(s).

Chlamydomonas Caltractin Gene
(e) Analysis of sequences at introns and equivalent positions
A total of 25 genes in the EF-hand superfamily were analyzed in this compilation. The sequences were aligned with respect to their EF-hand structures. Because the amino and carboxyl termini and the regions joining the calcium-binding domains were very different among most of the proteins, the comparison was restricted mainly to the EF-hand structures. The EF-hand structures in the majority of the proteins were aligned as reported by Perret et al. (1988b). The remainder were aligned according to the positions of signature conserved amino acid residues in this protein family, as described by Perret et al. (1988b). These are the leucine residues at the carboxyl end of the calcium-binding loop in the 1st and 3rd EF-hand structures; phenylalanine residues occupy the equivalent positions in the 2nd and 4th domains. The positions of hydrophobic residues in the helices of the EF-hand structures were also noted. The EF-hand structures were numbered I through VI, depending on the number of domains in the individual sequences. To simplify the analysis, each sequence was divided into EF-hand domains of 30 amino acid residues each (a 12-residue calcium-binding loop flanked by two 9-residue helices) and regions between domains (e.g. 6 and 7 residues in caltractin). Three such aligned sequences are shown in Fig. 3. Intron positions were described by both codon and phase (Sharp, 1981). Six exon nucleotides on either side of an intron-exon junction, or of the equivalent position in non-intron-bearing genes, were tabulated.
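Describing an intron by codon and phase reduces to counting coding nucleotides. The sketch below (the helper name `intron_phases` is hypothetical) walks the exons in order, accumulates the coding nucleotides upstream of each intron, and takes the remainder modulo 3 as Sharp's class: 0, between codons; 1, after the 1st nucleotide of a codon; 2, after the 2nd. The caltractin junction coordinates are those listed in Table 1. Placing the A of the initiating ATG at +60 is an assumption inferred from the statement in Section 3(d) that the +1 T lies 59 nucleotides upstream of that codon.

```python
def intron_phases(cds_start, introns):
    """Return (codon, phase) for each intron, following Sharp (1981).

    cds_start -- coordinate of the A of the initiating ATG
    introns   -- (first intron nt, last intron nt) pairs, ordered 5' to 3',
                 all lying inside the coding region
    Phase 0: the intron lies between codons; `codon` is the codon before it.
    Phase 1 or 2: the intron follows the 1st or 2nd nucleotide of `codon`.
    """
    results = []
    coding_nt = 0            # coding nucleotides upstream of the current intron
    exon_start = cds_start   # first coding nucleotide of the current exon
    for first, last in introns:
        coding_nt += first - exon_start
        full_codons, phase = divmod(coding_nt, 3)
        codon = full_codons if phase == 0 else full_codons + 1
        results.append((codon, phase))
        exon_start = last + 1
    return results


# Junctions from Table 1 (intron 1 spans +167..+283, and so on).
# Assumption: the ATG starts at +60, i.e. 59 nt downstream of the +1 T.
CALTRACTIN_INTRONS = [(167, 283), (321, 551), (603, 1105),
                      (1274, 1384), (1428, 1572), (1649, 1803)]

if __name__ == "__main__":
    for i, (codon, phase) in enumerate(intron_phases(60, CALTRACTIN_INTRONS), 1):
        print(f"intron {i}: codon {codon}, phase {phase}")
```

Under these assumptions the six caltractin introns fall in phases 2, 0, 0, 0, 1 and 2. The same routine applies to any gene once its coding start and junctions are known.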
3. Results
(a) Isolation of the caltractin gene
We have previously reported the isolation from a cDNA library of a clone that appeared to contain the full-length coding sequence of Chlamydomonas caltractin (pCaBP4; Huang et al., 1988a). Using the full-length caltractin cDNA as a probe, we screened a Chlamydomonas genomic DNA library for clones carrying the caltractin gene. Two of the positive plaques identified in the initial screening were plaque-purified. In Southern blot analysis in which separate 5′ and 3′ portions of the caltractin cDNA were used as probes, both clones were found to contain identical restriction fragments that hybridized to the probes. Upon digestion with EcoRI, BamHI and PstI restriction endonucleases, both clones gave a hybridization signal in fragments of approximately 23, 7 and 3.5 kb, respectively (data not shown), thus corroborating the results obtained in previous genomic DNA blot analysis with caltractin cDNA (Huang et al., 1988a). Further restriction endonuclease analysis revealed that the 7 kb BamHI fragment harbored a PvuII fragment of approximately 3 kb that by itself could account for all the hybridization signal to both the 5′ and 3′ cDNA probes, indicating that it contained the entire caltractin gene. This 3 kb PvuII fragment was subcloned and sequenced.

[Figure 1 map not reproduced; scale bar, 100 bp.]
Figure 1. Schematic illustration of a partial restriction map and the sequencing strategy of the Chlamydomonas caltractin gene locus as obtained from the 3 kb PvuII genomic fragment. Filled boxes on the fragment represented by the line denote the transcribed regions of the gene. The direction and extent of sequence determination are depicted by horizontal arrows. Also shown is a cDNA clone for the mRNA encoding the Chlamydomonas caltractin. Stippled boxes represent the protein-encoding regions, while open boxes represent the 5′ and 3′ untranslated regions of the mRNA. The locations of an initiation codon (ATG), a termination codon (TAA), and a polyadenylation signal sequence (TGTAA) are marked by the vertical arrows.

(b) Structure of the caltractin gene
Figure 1 shows the physical map of the PvuII fragment that contains the Chlamydomonas caltractin gene locus. Upon comparison with the sequence of the cDNA in pCaBP4 (Huang et al., 1988a), the coding regions for caltractin were found to be organized in seven exons. The open reading frame of the caltractin cDNA encodes a polypeptide of 169 amino acid residues; the nucleotide sequence of the coding segments in the genomic clone is in complete agreement with that of the cDNA (Fig. 2). The locations of the six introns that interrupt the Chlamydomonas caltractin gene do not coincide with any of those determined for the calmodulin genes from Chlamydomonas or other eukaryotes, or for the gene of any other member of the EF-hand superfamily of calcium-modulated proteins. Figure 3 shows, as an example, the positions of the introns found in the Chlamydomonas caltractin and calmodulin genes and of those in the rat calmodulin gene. Like the calmodulin gene from Chlamydomonas, but in contrast to most of the intron-bearing genes for calmodulin and other calcium-modulated proteins from other eukaryotes sequenced to date, the caltractin gene has all six of its introns located within the sequences encoding the mature gene product. As in the genes coding for all the proteins of the superfamily characterized to date, the positions of the introns in the Chlamydomonas caltractin gene do not appear to correspond to the boundaries of the potential calcium-binding domains. Five of the six introns are located at positions in the gene that encode parts of the first, third and fourth EF-hand motifs, and two of these five interrupt the gene at positions that correspond to the first and third calcium-binding loops (Fig. 3). The remaining intron is situated in the region that links the third EF-hand structure to the fourth.

[Figure 2: annotated nucleotide sequence not reproduced.]
Figure 2. Nucleotide sequence of the Chlamydomonas caltractin gene. Also shown is the deduced amino acid sequence of the protein; the single-letter amino acid code is used. The non-coding strand of the gene is shown in the 5′ to 3′ direction. The 5′ and 3′ limits of the transcript, as determined by an S1 nuclease protection assay, are delineated by +1 and 3′, respectively. Vertical lines define the limits of the intervening sequences. Nucleotides adhering to the GT/AG rule at splice junctions are underlined. Nucleotide sequences matching the consensus for the putative branch point of lariat formation are indicated by dots underneath the lettering. The putative polyadenylation signal sequence TGTAA, as determined by an S1 nuclease protection assay, is underlined with carets. Also underlined with dashes is the sequence TGTGA upstream from the TGTAA sequence (see the text). Asterisks denote the 3′ limits of cDNAs. Direct repeats of 10 or more bases in the 5′ and 3′ flanking regions are denoted by horizontal arrows. The nucleotide sequence data reported here are available from EMBL/GenBank/DDBJ under the accession number X57973.

Figure 3. Locations of introns in the Chlamydomonas caltractin gene sequence in relation to protein domains. Also shown are those in Chlamydomonas and rat calmodulin. The amino acid sequences are aligned with respect to their predicted EF-hand structures. The boundaries of each EF-hand structure are marked by vertical lines. Regions constituting the putative calcium-binding loops of the calcium-binding domains are shown figuratively as incomplete circles, and the putative flanking helical structures as straight lines adjoining the loops. Arrowheads indicate the positions of introns. Three classes of introns as defined by Sharp (1981) are indicated by a number adjacent to each arrowhead: class 0 introns occur between codons, class I introns interrupt codons between the 1st and 2nd nucleotides, and class II introns interrupt codons between the 2nd and 3rd nucleotides. Numbers after the translation-initiating methionine residue (M) and at the end of the sequence indicate the number of amino acids not shown. The single-letter amino acid code is used.

Table 1
Nucleotide sequences of the splice sites and branch sites in Chlamydomonas genes

Chlamydomonas caltractin gene
Intron  Junctions†                5′ splice site   Branch site‡       3′ splice site
1       (166/167) (283/284)       CGACCT/GTAGGT    CTTGAC .. 23 bp    ATGCAG/CTTCGA
2       (320/321) (551/552)       GCCAAG/GTACGC    CCTGAC .. 26 bp    TAACAG/GAGCTG
3       (602/603) (1105/1106)     GAGGAG/GTGAGA    CCTAAC .. 29 bp    CCTCAG/ATCAAG
4       (1273/1274) (1384/1385)   ATCAAG/GTGAGC    AATAAC .. 25 bp    TCACAG/GACCTC
5       (1427/1428) (1572/1573)   TGACTG/GTGAGT    CCTGAC .. 24 bp    ACGCAG/AGGAGG
6       (1648/1649) (1803/1804)   CATCCG/GTGAGT    TTTGAC .. 23 bp    CAACAG/GATCAT

[Position-by-position counts and percentages for 56 introns from 14 Chlamydomonas nuclear genes§, and the Chlamydomonas and eukaryotic consensus rows|| ¶, not reproduced.]

† Numbers in parentheses denote the positions of the nucleotides at the exon/intron junctions, +1 being the putative transcription start site (see Fig. 2).
‡ Numbers are the numbers of bases the branch sites are from the 3′ splice junctions.
§ The nucleotide sequences at the splice sites and branch sites in 14 Chlamydomonas nuclear genes (56 introns) were tabulated. Each value in the Table represents the number or percentage occurrence of the nucleotide shown at the left in the position numbered at the top. At the splice sites, nucleotides denoted by + represent those in the exon; those denoted by - reside in the introns. The sequences were taken from de Hostos et al. (1989), Fukuzawa et al. (1990), Goldschmidt-Clermont & Rahire (1986), Schloss (1990), Silflow et al. (1985), Williams et al. (1989), Woessner & Goodenough (1989), Youngblom et al. (1984), Zimmer et al. (1989) and this work.
|| Consensus sequences for the 5′ splice junction (donor site) and the 3′ splice junction (acceptor site) were taken from Mount (1982).
¶ Consensus sequence for the putative branch site was taken from the data of Keller & Noon (1984, 1985) and Brown (1986).

[Figure 4 autoradiograms (a) and (b), with marker lanes A, C, G, T and sample lanes a to f, not reproduced.]
Figure 4. Localization of the transcription initiation and termination sites of the Chlamydomonas caltractin gene by an S1 nuclease protection assay. A representative autoradiogram from experiments designed to map (a) the mRNA cap site and (b) the polyadenylation site is shown. Labeled probes were hybridized with Chlamydomonas poly(A)+ RNA (160 ng, lanes a and c; 80 ng, lanes b and d) at 55°C (lanes a and b) or 50°C (lanes c and d) for 3 h. Controls with no added poly(A)+ RNA (lane e) and S1 nuclease (lane f) are also shown for comparison. In each case the size of the protected DNA fragment from an S1 nuclease protection assay was determined by comparing its migration with that of products generated by a standard DNA sequencing reaction using a single-stranded DNA of known sequence as a template (markers). In (a) the DNA and primer used were the same as in the S1 nuclease protection assay; the sequence of the protected fragment was obtained directly from the marker lanes that had been run alongside in the denaturing gel. The nucleotide sequence of the coding strand is shown at the left; the putative transcription start point is indicated by an arrowhead and the A+T-rich region (see the text) at around -39 is bracketed. The extent of the protected fragment is represented by the downward arrow.

(c) Intron structure
Nucleotide sequences within the six introns in the Chlamydomonas caltractin gene do not appear to exhibit significant identity with one another. The nucleotide sequences at the intron-exon boundaries (Table 1) display good agreement with the consensus sequences observed in the nuclear genes from Chlamydomonas (56 introns from 14 genes) and other eukaryotes characterized to date (Mount, 1982; Padgett et al., 1986; Oshima & Gotoh, 1987). They all carry the signature GT-AG, the virtually invariant dinucleotides found at the splice junctions of eukaryotic precursor mRNA. The 5′ donor and 3′ acceptor junctions by and large conform well to the consensus splice-site sequences. The introns in the caltractin gene also contain sequences that conform to a consensus internal splice signal. Furthermore, the nucleotide sequences around the acceptor sites in the caltractin gene resemble those observed in the genes of other eukaryotes in having extended pyrimidine-rich regions (Brown, 1986; Oshima & Gotoh, 1987). For example, from -20 to -5 (relative to the 3′ splice point) the six introns contain an average of about 73% pyrimidines, and AG is absent from this region in all the introns.
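The per-position tallies behind Table 1, and the pyrimidine measure quoted above, can be reproduced with a few lines of code. This is an illustrative sketch only: the helper names are hypothetical, the donor-site strings are the six caltractin 5′ splice sites from Table 1 with the junction slash removed (six exon nucleotides followed by six intron nucleotides), and treating position -1 as the last intron nucleotide is an assumed convention for the window.

```python
from collections import Counter

# Six caltractin 5' splice sites from Table 1: six exon nucleotides, then
# six intron nucleotides (the "/" junction marker removed).
CALTRACTIN_DONORS = ["CGACCTGTAGGT", "GCCAAGGTACGC", "GAGGAGGTGAGA",
                     "ATCAAGGTGAGC", "TGACTGGTGAGT", "CATCCGGTGAGT"]


def column_counts(sites):
    """Count each nucleotide in every column of equal-length aligned sites."""
    if len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be non-empty and all the same length")
    return [Counter(column) for column in zip(*sites)]


def column_percentages(sites):
    """Convert column counts to percentage occurrence, as in Table 1."""
    n = len(sites)
    return [{base: 100.0 * k / n for base, k in counts.items()}
            for counts in column_counts(sites)]


def pyrimidine_window(intron, start=-20, end=-5):
    """Return (pyrimidine fraction, whether 'AG' occurs) for start..end.

    Positions count back from the 3' splice point; -1 is assumed to be the
    last intron nucleotide (the G of the terminal AG).
    """
    if len(intron) < -start:
        raise ValueError("intron shorter than the requested window")
    window = intron[len(intron) + start: len(intron) + end + 1].upper()
    fraction = sum(base in "CT" for base in window) / len(window)
    return fraction, "AG" in window


if __name__ == "__main__":
    counts = column_counts(CALTRACTIN_DONORS)
    # Column 6 is the first intron nucleotide, column 7 the second.
    print(counts[6], counts[7])
```

Applied to the six caltractin donor sites, every sequence has G and then T at the first two intron positions, the invariant GT of the GT-AG rule.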
(d) 5′ Flanking region
The transcriptional start site of the caltractin gene was determined by an S1 nuclease protection assay with Chlamydomonas poly(A)+ RNA, using 35S-labeled DNA obtained by priming a single-stranded sense DNA with a synthetic oligonucleotide. As shown in Figure 4(a), lanes a to d, a fragment of 166 bases was protected from digestion by S1 nuclease when the poly(A)+ RNA was hybridized to the 35S-labeled probe that began at the 3′ end of the first exon. When compared with the products of a standard sequencing reaction obtained with the same single-stranded DNA and oligonucleotide, the 5′ end of the protected product was a T residue 59 nucleotides upstream from the translation-initiating ATG codon. A second synthetic oligonucleotide, complementary to a region within the first intron, was used in an identical reaction, and a protected fragment of 166 bases was again recovered (data not shown). These data identified the T residue, numbered +1 in the genomic sequence (Fig. 2), as the most probable transcription start site. Examination of the sequence immediately upstream from this putative mRNA cap site revealed several nucleotide sequences that may play a role in the expression of the caltractin gene. An A+T-rich region centered at around -39 is flanked by domains of high G+C content (10 of 14 nucleotides immediately upstream, and 20 of 28 nucleotides immediately downstream, from this A+T region are G or C residues). Such a low-stability stretch amid stable domains is analogous to the canonical Hogness-Goldberg "TATA box" (Bensimhon et al., 1983) that is usually found at around -30, and could act as a recognition signal for RNA polymerase II in Chlamydomonas. Analogous features have been detected in the promoter regions of several Chlamydomonas nuclear genes (Youngblom et al., 1984; Silflow et al., 1985; Goldschmidt-Clermont & Rahire, 1986; Schloss, 1990).
Indeed, a similar sequence of high G+C content abuts the 3′ side of the TATA homology in the Chlamydomonas α1-tubulin gene, and it has been shown to be a crucial transcriptional element when assayed after injection into Xenopus oocyte nuclei (Bandziulis & Rosenbaum, 1988). The transcription regulatory element "CCAAT box" (for a review, see Mitchell & Tjian, 1989), which usually occurs in the -90 to -70 region, is not found, nor is a sequence similar to this motif readily discernible. Figure 5 lists other sequence features that may be of importance in caltractin transcription. It has been suggested that the consensus sequence GCTC[G/C]AAGGC[G/T][G/C]B[C/A][C/A]G, where B is G, C or T, which appears several times 5′ to the transcription start site of each of the four tubulin genes in Chlamydomonas, may play a role in the coordinated regulation of tubulin gene expression (Brunke et al., 1984). Several nucleotide stretches that bear similarity to this consensus sequence occur upstream from the putative transcription start site in the caltractin gene. However, only three of these sequences have an identity to the consensus that exceeds 60% in the first ten nucleotides, which comprise a more conserved core element in the tubulin promoter regions. We also note the presence of three nucleotide sequences that bear identities of 70%, 90% and 90%, respectively, to the enhancer elements found in the simian virus (SV) 40 early and human metallothionein IIA genes that constitute the recognition sites for the transcription factor AP-2 (Imagawa et al., 1987).
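The 60% criterion mentioned above amounts to counting how many of the first ten bases of a candidate fall within the degenerate consensus. A minimal sketch, assuming the bracketed alternatives are rewritten as standard IUPAC ambiguity codes (S = G/C, K = G/T, M = A/C, B = C/G/T); the helper name `percent_identity` and the example sequences are hypothetical and are not taken from the caltractin gene.

```python
# IUPAC codes needed for the tubulin consensus; each maps to allowed bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "S": "CG", "K": "GT", "M": "AC", "B": "CGT", "N": "ACGT"}

# GCTC[G/C]AAGGC[G/T][G/C]B[C/A][C/A]G (Brunke et al., 1984) in IUPAC form.
TUBULIN_CONSENSUS = "GCTCSAAGGCKSBMMG"


def percent_identity(seq, consensus=TUBULIN_CONSENSUS, n=10):
    """Percentage of the first n positions of seq allowed by the consensus."""
    seq, consensus = seq.upper(), consensus.upper()
    if len(seq) < n or len(consensus) < n:
        raise ValueError("both sequences must cover the first n positions")
    matches = sum(base in IUPAC[code]
                  for base, code in zip(seq[:n], consensus[:n]))
    return 100.0 * matches / n


if __name__ == "__main__":
    for candidate in ["GCTCGAAGGC", "GCTCAAAGGA"]:  # hypothetical examples
        score = percent_identity(candidate)
        print(candidate, score, "above 60%" if score > 60 else "at or below 60%")
```

Encoding the alternatives as IUPAC letters keeps the consensus a plain 16-character string, so the same function can score the AP-2 comparison by passing a different consensus.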
(e) 3′ Flanking region
The 3′ limit of the caltractin transcript was determined by an S1 nuclease protection assay on Chlamydomonas poly(A)+ RNA that had been hybridized to a labeled probe complementary to the gene from +2122 to +2441. The results are shown in Figure 4(b), lanes a to d. A fragment of 184 bases was protected from digestion by S1 nuclease, indicating that transcription of the caltractin gene terminated at or around a T residue seven nucleotides downstream from the pentanucleotide TGTAA. This suggests that the sequence TGTAA at +2295 was the polyadenylation signal for the transcript, in agreement with previous results from sequencing caltractin cDNA (Huang et al., 1988a), in which the longest cDNA clone contained a poly(A) tract 13 nucleotides 3′ of this TGTAA sequence. The sequence TGTAA most likely represents the conserved polyadenylation signal in Chlamydomonas nuclear genes, as it has been detected consistently in the 3′ untranslated regions of genes characterized from this organism (references in Table 1; Mayfield et al., 1987; Simard et al., 1988; Yu & Selman, 1988; Fukuzawa et al., 1990). We reported previously that one of the caltractin cDNA clones characterized had a termination point 13 nucleotides downstream from the pentanucleotide TGTGA located at +2196 (Huang et al., 1988a), suggesting that the sequence TGTGA could also serve as a termination signal for a shorter transcript. However, no protected fragment of a size other than 184 bases was recovered in the S1 nuclease protection assay. It is possible that the transcripts that recognize the TGTGA sequence as a polyadenylation signal comprise only a very small percentage of the total caltractin transcripts, as indicated by its singular presence among the cDNA clones obtained, such that the weak signal from the protected fragment was not detected in the assay. Alternatively, the shorter cDNA was a cloning artifact,
[Figure 5 alignment panels: upper, caltractin 5’ flanking sequences aligned with the consensus repeated sequence of the Chlamydomonas tubulin genes; lower, caltractin 5’ flanking sequences aligned with the consensus AP-2 recognition sequence.]
Figure 5. Comparison of nucleotide sequences in the 5’ flanking region of the Chlamydomonas caltractin gene with those in other genes. Nucleotide sequences in the 5’ flanking region of the Chlamydomonas caltractin gene are compared with the repeated 16 bp consensus sequence found upstream from the putative “TATA box” in the four tubulin genes of Chlamydomonas (above) and with the AP-2 binding site (Imagawa et al., 1987) found in the enhancer regions of the SV40 early and human metallothionein IIA genes (below). Numbers in parentheses denote the positions of the starts and ends of the nucleotide sequences relative to the putative transcription initiation site. Nucleotides identical with those in the consensus sequences are denoted by dots.
(f) Evidence for a proto-splice site
Figure 6 shows the intron positions of the 21 intron-bearing genes among the 25 genes of the EF-hand superfamily surveyed. The intron positions are described both by codon within unit structures and by phase. There is a total of 43 different locations within the coding sequences for these calcium-binding proteins (38 in those with four-domain structures, plus 5 from the extra domains found in calbindin). Table 2 shows the results of a compilation of the nucleotide sequences flanking introns or at intron-equivalent sites. In all, 818 such sequences were tabulated. A consensus sequence of [A/G]AG[G/A] was obtained. Dibb & Newman (1989) have reported a similar consensus sequence, the proto-splice site, on examining the unrelated intron patterns in the actin and tubulin genes from different phyla. We note that by employing the process of Dibb & Newman, in which a consensus sequence was first derived at each intron or intron equivalence and all consensus
[Figure 6: table of intron positions (columns) in 25 genes of the EF-hand superfamily (rows), including calmodulins from several organisms, caltractin, the CDC31 gene product, TCBP-25, the large and small subunits of calpain, calcyclin, calbindins, parvalbumin, myosin light chains and Spec. Each entry is + (intron present) or - (intron absent), and each position is marked Y or N.]
Figure 6. Distribution of introns in the genes of the EF-hand superfamily. Sequences were aligned with respect to their predicted EF-hand structures, 3 of which are depicted in Fig. 3. Sequences were divided into EF-hand domains (I to VI) and interdomain regions. A schematic drawing of the various designated domains in the sequence is shown at the bottom of the Figure. The intron positions are described by both phase and codon. For example, N/I represents the region between the amino terminus of the sequence and the 1st EF-hand domain; I-8,2 denotes the intron that occurs between the 2nd and the 3rd nucleotides of the eighth codon in the 1st EF-hand domain. The presence of an intron in a sequence is denoted by “+”, its absence by “-”. Blanks represent positions where equivalent intron positions were not found, either because domains were missing (e.g. domain I in parvalbumin) or because they were of unequal sizes, making alignment inapplicable (e.g. interdomain region I/II). The distribution of introns at each position is described by Y or N, which denotes whether its presence can be explained by a single intron gain (see the text for details). Sequences were taken from the following references: 1. Nojima & Sokabe (1987); 2. Putkey et al. (1983); Epstein et al. (1987); 3. Smith et al. (1987); 4. Salvato et al. (1986); 5. Tschudi et al. (1985); 6. Kink et al. (1990); 7. Davis et al. (1986); 8. Takeda & Yamamoto (1987); 9. Zimmer et al. (1988); 10. Hardin et al. (1985); 11. Strehler et al. (1985); 12. Robert et al. (1984); 13. Nabeshima et al. (1984); 14. Falkenthal et al. (1984); 15. Berchtold et al. (1987); 16. Nudel et al. (1984); 17. Parker et al. (1985); 18. Wilson et al. (1988); 19. Perret et al. (1988a); 20. Ferrari et al. (1987); 21. Ohno et al. (1984); Emori et al. (1986); 22. Miyake et al. (1986); 23. Takemasa et al. (1989); 24. Baum et al. (1986); 25. this work. Abbreviations: CAM, calmodulin; MLC3, myosin alkali light chain; MLC2, myosin regulatory light chain; LCAP, calpain, large subunit; SCAP, calpain, small subunit; CDC31, CDC31 gene product.
Table 2
A tabulation of nucleotide sequences at actual and equivalent intron positions in the member genes of the EF-hand superfamily

A. Actual and equivalent introns†
Position        -6   -5   -4   -3   -2   -1   +1   +2   +3   +4   +5   +6
Number     A   224  264  181  312  351  207  242  246  214  215  184  203
           C   174  176  171  156  146   87  120  209  170  185  174  181
           G   262  148  251  260  139  412  359  125  218  210  202  260
           T   158  230  215   90  182  112   97  238  216  208  258  174
Percentage A    27   32   22   38   43   25   30   30   26   26   22   25
           C    21   22   21   19   18   11   15   26   21   23   21   22
           G    32   18   31   32   17   50   44   15   27   26   25   32
           T    19   28   26   11   22   14   12   29   26   25   32   21
Consensus        N    N    N  A/G    A    G  G/A    N    N    N    N    N

B. Actual introns‡
Position        -6   -5   -4   -3   -2   -1   +1   +2   +3   +4   +5   +6
Number     A    30   24   14   43   37    4   28   25   16   18   13   24
           C    17   19   20   20   12    3    9   28   17   15   23   20
           G    19   12   37   13    9   74   42   14   15   17   21   26
           T    19   30   14    9   27    4    6   18   37   35   28   15
Percentage A    35   28   16   51   44    5   33   29   19   21   15   28
           C    20   22   24   24   14    4   11   33   20   18   27   24
           G    22   14   44   15   11   87   49   16   18   20   25   31
           T    22   35   16   11   32    5    7   21   44   41   33   18
Consensus        N    N    G    A  A/T    G  G/A    N    T    T    N    N

C. Proto-splice site§
Position        -6   -5   -4   -3   -2   -1   +1   +2   +3   +4   +5   +6
Consensus        N    H    N  A/C    A    G  G/A    N    N    N    N    N

Six nucleotides on either side of positions at which introns occur, or at which potential introns could have occurred, in the genes of this superfamily (as listed in Fig. 6) were tabulated; the Table shows a summation of the data. Each value represents the total number or percentage occurrence of the nucleotide shown at the left at the position numbered at the top. Positions marked - lie on the 5’ side of the intron position, and positions marked + lie on the 3’ side.
† Summation of data from all actual and equivalent intron positions.
‡ Summation of data from actual intron positions only.
§ Taken from data of Dibb & Newman (1989). H: a guanine residue tends to be excluded at this position.
sequences were then tabulated, an identical consensus sequence of [A/G]AG[G/A] was again obtained (data not shown).

4. Discussion

(a) Chlamydomonas caltractin gene
The six introns that interrupt the caltractin gene in Chlamydomonas are generally small, ranging from 111 to 503 nucleotides in length; with the exception of the third intron, all are shorter than 240 bases. Relatively short introns appear to be a feature of Chlamydomonas nuclear genes (see Table 1 for references). This contrasts with the introns observed in most other eukaryotes, whose lengths range from approximately 50 bases to thousands of bases without apparent periodicity (Naora & Deacon, 1982). The exons in the caltractin gene have a narrow size distribution: all except the fourth are about 100 bases or fewer in length, and the coding regions in the first and last exons are 107 and 25 nucleotides long, respectively. A tendency toward uniformity in exon size has been observed to characterize many eukaryotic genes (Naora & Deacon, 1982).

As mentioned above, the sequences at the intron-exon junctions and at the branch point of the caltractin gene agree well with the consensus compiled from the analysis of sequences of Chlamydomonas nuclear genes, which in turn conform to those compiled from other eukaryotic plants and animals. In general, except for a lack of homology to the “CCAAT box” and “TATA box”, a different polyadenylation signal (TGTAA instead of AATAAA) and a biased codon usage, Chlamydomonas nuclear genes adhere closely to the structure of eukaryotic genes.
(b) Intron distribution in the EF-hand superfamily

Member proteins of the EF-hand superfamily have been proposed to have arisen as a result of two successive duplications of a gene encoding a primordial single domain. Indications of gene duplications and, therefore, hints at the genetic evolution of this superfamily of proteins may be found in the gene structures. Introns, if present in the early primordial domain, may have given rise to a recognizable pattern of distribution, a vestige of their concomitant duplication with the exons. The six introns in the caltractin gene do not typically fall into positions that correspond to the boundaries of the calcium-binding domains, nor do they coincide with those of the introns characterized in other members of the EF-hand superfamily. Indeed, when the sequences of all the members of this superfamily whose genomic organizations are known are aligned with respect to their EF-hand calcium-binding domains, as shown in Figure 6, a non-coherent distribution of introns is immediately apparent. There is no intron that is present in every gene; the introns do not fall systematically between domains, nor do they even share the same locations within different domains.
(c) Evolutionary significance of intron positions
The intron-exon structures of the genes in this superfamily have been studied to gain insights into their evolutionary significance; however, earlier analyses were hampered by the limited number of gene structures available. Hardin et al. (1985) and Smith et al. (1987) observed an apparent lack of general correlation of intron positions with domain boundaries and yet a high degree of similarity of intron positions among the genes that they examined. They proposed that the present-day genes of this superfamily arose relatively recently from a common, highly evolved progenitor in which four introns (N/I-1,0; I-1,1; II-12,1 (13,1); and IV-21,1 in Fig. 6) had already interrupted the four calcium-binding domains. Wilson et al. (1988) re-examined the apparently random placement of introns in this superfamily and proposed an evolutionary history for the genes based on intron insertion following the completion of a four-domain ancestral gene that harbored the same four introns. From a site-by-site comparison of the amino acid sequences of the EF-hand structure in many members of the superfamily and a positional analysis of introns in the genes available, Perret et al. (1988b) proposed that introns I-30,1 (found only in the rat myosin regulatory light-chain gene) and III-30,1 (found only in the myosin alkali light-chain and calpain genes), both phase 1 introns interrupting a conserved glycine codon at homologous positions, represent the vestigial introns from the first gene duplication event in the genetic evolution of the superfamily. By analogy, intron II/III-2,1 (detected only in the rat myosin regulatory light-chain gene), a phase 1 intron interrupting the codon for a non-conserved glycine in interdomain region II/III, was proposed to represent a relict of the second gene duplication. The relevant lineages within this superfamily would then have evolved by remodeling of the ancestral gene, accompanied by the loss and gain of introns.
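The position labels used above and in Figure 6 follow a region-codon,phase convention. The Figure 6 caption gives I-8,2 as the intron between the second and third nucleotides of the eighth codon of domain I. The minimal parser below makes that convention explicit. It rests on two assumptions: phase 0 is read as lying immediately before the named codon, and "homologous" is taken to mean the same codon and phase in two EF-hand domains, with interdomain regions excluded because their sizes differ between genes. The regular expression and function names are illustrative and are not from the paper.

```python
import re
from typing import NamedTuple


class IntronPos(NamedTuple):
    region: str  # EF-hand domain ("I".."VI") or interdomain region ("N/I", "II/III", ...)
    codon: int   # 1-based codon number within that region
    phase: int   # 0, 1 or 2


# Hypothetical grammar for labels such as "I-8,2", "N/I-1,0" or "II/III-2,1".
_LABEL = re.compile(r"^([IVNC/]+)-(\d+),([012])$")


def parse(label: str) -> IntronPos:
    m = _LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognized intron label: {label!r}")
    return IntronPos(m.group(1), int(m.group(2)), int(m.group(3)))


def offset_in_region(pos: IntronPos) -> int:
    """Number of the region's coding nucleotides lying 5' of the intron.

    Assumes phase p means the intron follows nucleotide p of the named codon,
    so phase 0 sits immediately before that codon."""
    return 3 * (pos.codon - 1) + pos.phase


def is_interdomain(pos: IntronPos) -> bool:
    return "/" in pos.region


def homologous(a: IntronPos, b: IntronPos) -> bool:
    """Same codon and phase in two EF-hand domains (interdomain regions excluded)."""
    if is_interdomain(a) or is_interdomain(b):
        return False
    return (a.codon, a.phase) == (b.codon, b.phase)


print(offset_in_region(parse("I-8,2")))               # between nt 2 and 3 of codon 8
print(homologous(parse("I-30,1"), parse("III-30,1")))  # the pair discussed by Perret et al.
```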
Such pathways of divergence, as proposed above, would have involved a relatively complicated remodeling of the ancestral gene when the structures of the member genes of this superfamily characterized to date are taken into account. To generate the calmodulin gene in Chlamydomonas (9 in Fig. 6), for example, the four-domain progenitor would have had to lose all of its early introns and then gain five new ones. To become the calmodulin gene in the fission yeast Schizosaccharomyces pombe (8 in Fig. 6), the same ancestral gene would have had to discard selectively all but one of the early introns it bore, the number of which would depend on the complexity of the scheme. Indeed, the non-coherent nature of the intron positions that characterizes the gene members of this superfamily cannot reasonably be attributed either to intron movement or to intron removal. Intron movement calls for many of the introns to have moved by a non-integral number of codons or across a conserved unit structure, the calcium-binding loop. Intron removal implies that the primordial gene harbored at least as many introns as the total number present in the superfamily, which would place some introns only one to a few nucleotides apart (for further discussion, see Rogers, 1986, 1989). Nojima (1987) described the detection of an eightfold intragenic homology in the rat and chicken calmodulin genes and proposed a model in which a primordial gene segment encoding an amino acid stretch with poor calcium-binding activity was duplicated threefold. In this model, exon fusion and truncation, together with intron sliding, led to the modern form of the calmodulin gene with five introns. However, questions arose upon close examination of the model. The conserved core sequences that encoded the amino acids that were to bind calcium inefficiently in the then primitive segments are not in the same phase of the reading frame in the modern calmodulin gene; this is further complicated by the presence of introns of different classes. Parts of the sequence of the modern calmodulin gene cannot be accounted for by the model: in particular, a portion of the first EF-hand structure that includes a segment of the critical calcium-binding loop and the adjacent helix, as well as the interdomain region that follows, would have had to arise from outside the duplication scheme. Moreover, an unlikely event of intron sliding was proposed to have occurred at the intron adjoining the fifth and sixth homologous primordial segments, across a conserved sequence that encodes the third calcium-binding loop. Remodeling of the gene structure, including substantial intron loss and addition, would still have had to take place to generate the modern calmodulin genes of other organisms.
(d) Evolutionary history of the EF-hand superfamily
The often intertwined questions concerning the origin of introns and their role in evolution have resolved into two concepts. One holds that most or all introns are as ancient as the genes themselves and that some have been moved or removed during the evolutionary history of the genes (e.g. Doolittle, 1978; Gilbert et al., 1986). In this view, the combination of exons and introns facilitates the generation of novel proteins with new unit structures or functional domains (see e.g. Gilbert, 1978; Go, 1981). The second view is that introns were inserted into genes during the course of evolution (e.g. Rogers, 1985; Cech, 1985), and that intron insertion has taken place since the divergence of the prokaryotic and eukaryotic lineages (see e.g. Cavalier-Smith, 1985). While a number of genes, such as the immunoglobulin genes (see e.g. Tonegawa, 1983), have structures that clearly support the first view, others, as represented by those of the serine proteases (see e.g. Rogers, 1985), provide strong evidence corroborating the second view. It is apparent that the EF-hand superfamily of proteins does not belong to those in which protein structural or functional units are encoded in separate exons; none the less, duplication of a common primordial gene harboring a calcium-binding domain was almost certainly responsible for the production of the modern lineages.

Examination of the intron distribution pattern presented in Figure 6 provides clues to the history of the lineages of this superfamily. Of the 43 intron positions occupied in the various member genes, 36 can be explained by a single intron gain by an ancestral gene (denoted by Y in Fig. 6). The distribution of the remaining seven cannot be accounted for by a single intron gain (N in Fig. 6). They can be explained, however, by independent intron acquisitions at identical positions in the ancestral genes of the lineages after they had diverged from each other. Dibb & Newman (1989) have used this line of reasoning to argue that introns were acquired by insertion in the actin and tubulin genes of eukaryotes. In both chicken and mouse, differential initiation of transcription and processing of transcripts give rise to two myosin alkali light chains with amino-terminal segments that differ in sequence and in length (Nabeshima et al., 1984; Robert et al., 1984).
This, combined with the observation that the sizes and nucleotide sequences of the 5’ and 3’ noncoding regions of the member genes of this superfamily known thus far are distinct from each other, strongly indicates that the addition of these segments was unrelated to the formation of the ancestral four-domain gene and was a relatively recent event. The emergence of intron N/I-1,0 in some of the member genes must then have come after the arrival of this progenitor. This would appear to constitute a strong case for independent insertions of introns at this position.

Our analysis of actual intron and equivalent locations in these genes allowed the detection of evidence for a once-preferred site for intron insertion or deletion. The sequence [A/G]AG[G/A] is nearly identical with the exon consensus sequence that flanks introns, which is in turn identical with the proto-splice site observed at actual introns and their equivalent positions in actin and tubulin genes (Dibb & Newman, 1989). Given that the nucleotide sequences of many of the members of this superfamily have diverged substantially during their evolution, it is improbable that such a high degree of similarity occurred by chance. As explained by Dibb & Newman (1989), while the presence of such putative proto-splice sites does not prove intron gain or loss (they could be vestiges of intron removal), it is at least as likely that they represent ancient preferred locations where introns could have been acquired. In combination with the intron distribution pattern observed in the genes of the EF-hand superfamily, the presence of this conserved sequence offers reasonable evidence for the hypothesis that introns were not present in a common ancestral gene but were inserted later, on average at the proto-splice sites, at times that depended on the divergent history of the genes.

The evolutionary history for the genes of the EF-hand superfamily proposed by Wilson et al. (1988), which is based on intron insertion, agrees with our findings in principle. However, with more gene structures now available from this superfamily, particularly those of Chlamydomonas calmodulin (Zimmer et al., 1988) and of caltractin, as reported here, the need for a refinement of their model is apparent. Figure 7 shows a probable history of the member genes of the EF-hand superfamily, modified from the theories of Wilson et al. (1988) and Perret et al. (1988b). Basically, the main branch of this family was derived from a four-domain primordial ancestor, which arose through two successive duplications of an ancient one-domain primordial ancestor. It was at the four-domain stage that the various lineages diverged from each other. The majority of these kept the four-domain structure and gained introns as their nucleotide sequences diverged; the remainder underwent a more severe remodeling of gene structure. The placement of the calcium-binding protein TCBP-25 from Tetrahymena was arbitrary and was based on its lack of conserved leucine and phenylalanine residues at positions within the EF-hand structures.
The placement of the calbindin lineage differs from that of Wilson et al. (1988) but is akin to that of Perret et al. (1988b), and it is based on the positions of conserved leucine residues in its sequence. In this lineage, it is not clear whether all ten introns were inserted into the six-domain ancestor, as discussed by Wilson et al. (1988). In the calmodulin lineage, the pattern of intron distribution does not allow a distinction between a gain of three introns by the ancestral gene before or after the separation of the vertebrates from the invertebrates. We note that the bifurcation involving caltractin and the yeast CDC31 gene product is arbitrary. The amino acid sequence of caltractin shares the highest sequence identity with that of the CDC31 gene product (Huang et al., 1988a). Moncrief et al. (1990) have recently studied the amino acid sequences of a large number of calcium-binding proteins using a maximum parsimony algorithm and have placed caltractin and the CDC31 gene product on a separate branch of a dendrogram that is considerably more complex than that depicted in Figure 7.
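Figure 6 marks each intron position Y or N according to whether a single intron gain can explain its distribution. The same reasoning underlies the ambiguity noted above for the calmodulin lineage. The sketch below implements that test as a clade check. It assumes a fixed rooted tree and no subsequent intron loss: one gain on one branch suffices exactly when the intron-bearing genes are the scored members of a single clade. The five-gene tree and the presence patterns are hypothetical placeholders, not data from Figure 6 or the topology of Figure 7. Genes whose position is blank in Figure 6 would simply be left out of the scored set.

```python
from typing import FrozenSet, Iterator, Tuple, Union

# A rooted tree is either a leaf name or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """All leaf names at or below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> Iterator[FrozenSet[str]]:
    """Leaf set of every node in the tree, root first."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def single_gain_explains(tree: Tree, present: set, scored: set) -> bool:
    """True if one gain on one branch (and no later loss) yields exactly the
    observed pattern. present: genes with '+'; scored: genes with '+' or '-'."""
    present_f, scored_f = frozenset(present), frozenset(scored)
    if not present_f or not present_f <= scored_f:
        raise ValueError("present must be a non-empty subset of scored")
    return any(clade & scored_f == present_f for clade in clades(tree))


def y_or_n(tree: Tree, present: set, scored: set) -> str:
    return "Y" if single_gain_explains(tree, present, scored) else "N"


# Hypothetical five-gene tree; not the topology of Figure 7.
TREE: Tree = (("g1", "g2"), ("g3", ("g4", "g5")))
ALL = {"g1", "g2", "g3", "g4", "g5"}

print(y_or_n(TREE, {"g4", "g5"}, ALL))           # one clade: Y
print(y_or_n(TREE, {"g2", "g4"}, ALL))           # two separate branches: N
print(y_or_n(TREE, {"g3", "g4"}, ALL - {"g5"}))  # g5 blank, so (g3,(g4,g5)) fits: Y
```

If losses were allowed as well as gains, the check would become a parsimony count over the tree rather than a yes/no clade test. The sketch keeps to the single-gain question posed in the text.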
[Figure 7: tree diagram leading from a one-domain primordial ancestor through 2-domain and 4-domain ancestors to the calbindin (6-domain ancestor; exon shuffling + 10 introns), calpain, Tetrahymena CBP-25, calbindin-D9k, calcyclin, parvalbumin, caltractin, CDC31 gene product, MLC2, Spec, MLC3 and calmodulin (including Chlamydomonas and nematode) lineages.]
Figure 7. Probable evolutionary history of the EF-hand superfamily of calcium-modulated proteins. The points at which calbindin and calpain departed from the main branch are placed according to the scheme of Perret et al. (1988b). Myosin regulatory light chain is grouped under the main branch according to the results of Goodman et al. (1979) and Baba et al. (1984), although the decision is relatively arbitrary (Perret et al., 1988b). All major divergent events would have taken place before the emergence of higher eukaryotes. The lineage for calmodulin is extended (right) to clarify how the intron insertion model could have operated in this subfamily. The pattern of intron distribution is also consistent with a gain of 3 introns by the calmodulin gene before the emergence of the vertebrates, as shown in parentheses.
In conclusion, it appears that the majority of the gene members of the EF-hand superfamily acquired introns independently after the genesis of a four-domain primordial ancestor from the original one-domain progenitor and after the divergence of this multigene family. The detection of a sequence that is identical with the proposed proto-splice site first observed in actin and tubulin genes strongly suggests that there were preferred sites at which introns once arose during eukaryotic evolution. Furthermore, it does not appear that intron distribution can serve as a gauge of the early evolution of this family, because intron positions do not typically and unambiguously define unit structures within the ancestral gene.
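The proto-splice-like block [A/G]AG[G/A] was read from the counts in Table 2. The text does not state the rule used to call that consensus, so this sketch substitutes a standard measure: per-position information content (2 bits minus the Shannon entropy of the base frequencies), applied to the counts for actual and equivalent intron positions. That choice of measure is an assumption of the sketch, not the method used in the paper. The T count at +5 is taken as 258, the value implied by the column total of 818 and the tabulated 32%.

```python
import math

# Counts from Table 2, section A (actual and equivalent intron positions).
POSITIONS = [-6, -5, -4, -3, -2, -1, 1, 2, 3, 4, 5, 6]
COUNTS = {  # base -> counts at each position in POSITIONS, in order
    "A": [224, 264, 181, 312, 351, 207, 242, 246, 214, 215, 184, 203],
    "C": [174, 176, 171, 156, 146, 87, 120, 209, 170, 185, 174, 181],
    "G": [262, 148, 251, 260, 139, 412, 359, 125, 218, 210, 202, 260],
    "T": [158, 230, 215, 90, 182, 112, 97, 238, 216, 208, 258, 174],
}


def information_content(column: dict) -> float:
    """2 bits minus the Shannon entropy of one position's base frequencies
    (no small-sample or base-composition correction)."""
    total = sum(column.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in column.values() if n)
    return 2.0 - entropy


# Position -> {base: count}
cols = {pos: {base: COUNTS[base][i] for base in "ACGT"}
        for i, pos in enumerate(POSITIONS)}
ic = {pos: information_content(col) for pos, col in cols.items()}

# The four most informative positions, reported 5' to 3'.
top = sorted(sorted(ic, key=ic.get, reverse=True)[:4])
for pos in top:
    best = max(cols[pos], key=cols[pos].get)
    print(f"{pos:+d}: most frequent base {best}, IC = {ic[pos]:.3f} bits")
```

Ranking by information content singles out positions -3 to +1, the same block as the published consensus. The most frequent base at each of these positions matches the first base listed in [A/G]AG[G/A]. The measure says nothing about the second base at the degenerate positions, which depends on the consensus-calling rule.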
We thank K. Sullivan for critical reading of the manuscript. This work was supported by grant GM-38113 from the National Institutes of Health to B.H.
References

Baba, M. L., Goodman, M., Berger-Cohn, J., Demaille, J. G. & Matsuda, G. (1984). The early adaptive evolution of calmodulin. Mol. Biol. Evol. 1, 442-455.
Bandziulis, R. J. & Rosenbaum, J. L. (1988). Novel control elements in the alpha-1 tubulin gene promoter from Chlamydomonas reinhardii. Mol. Gen. Genet. 214, 204-212.
Baum, P., Furlong, C. & Byers, B. (1986). Yeast gene required for spindle pole body duplication: homology of its product with Ca2+-binding proteins. Proc. Nat. Acad. Sci., U.S.A. 83, 5512-5516.
Bensimhon, M., Gabarro-Arpa, J., Ehrlich, R. & Reiss, C. (1983). Physical characteristics in eucaryotic promoters. Nucl. Acids Res. 11, 4521-4540.
Berchtold, M. W., Epstein, P., Beaudet, A. L., Payne, M. E., Heizman, C. W. & Means, A. R. (1987). Structural organization and chromosomal assignment of the parvalbumin gene. J. Biol. Chem. 262, 8696-8701.
Biggin, M. D., Gibson, T. J. & Hong, G. F. (1983). Buffer gradient gels and 35S label as an aid to rapid DNA sequence determination. Proc. Nat. Acad. Sci., U.S.A. 80, 3963-3965.
Bolduc, C., Lee, V. D. & Huang, B. (1988). β-Tubulin mutants of the unicellular green alga Chlamydomonas reinhardtii. Proc. Nat. Acad. Sci., U.S.A. 85, 131-135.
Brown, J. W. S. (1986). A catalogue of splice junction and putative branch point sequences from plant introns. Nucl. Acids Res. 14, 9549-9559.
Brunke, K. J., Anthony, J. G., Sternberg, E. J. & Weeks, D. P. (1984). Repeated consensus sequence and pseudopromoters in the four coordinately regulated tubulin genes of Chlamydomonas reinhardii. Mol. Cell. Biol. 4, 1115-1124.
Cavalier-Smith, T. (1985). Selfish DNA and the origin of introns. Nature (London), 315, 283-284.
Cech, T. R. (1985). Self-splicing RNA: implications for evolution. Int. Rev. Cytol. 93, 3-22.
Davis, T., Urdea, M. S., Masiarz, F. R. & Thorner, J. (1986). Isolation of the yeast calmodulin gene: calmodulin is an essential protein. Cell, 47, 423-431.
de Hostos, E. L., Schilling, J. & Grossman, A. R. (1989). Structure and expression of the gene encoding the periplasmic arylsulfatase of Chlamydomonas reinhardtii. Mol. Gen. Genet. 218, 229-239.
Dibb, N. J. & Newman, A. J. (1989). Evidence that introns arose at proto-splice sites. EMBO J. 8, 2015-2021.
Doolittle, W. F. (1978). Genes in pieces: were they ever together? Nature (London), 272, 581-582.
Emori, Y., Ohno, S., Tobita, M. & Suzuki, K. (1986). Gene structure of calcium-dependent protease retains the ancestral organization of the calcium-binding protein gene. FEBS Letters, 194, 249-252.
Epstein, P., Simmen, R. C. M., Tanaka, T. & Means, A. R. (1987). Isolation and structural analysis of the chromosomal gene for chicken calmodulin. Methods Enzymol. 139, 217-229.
Falkenthal, S., Parker, V. P., Mattox, W. W. & Davidson, N. (1984). Drosophila melanogaster has only one myosin alkali light-chain gene which encodes a protein with considerable amino acid sequence homology to chicken myosin alkali light chains. Mol. Cell. Biol. 4, 956-965.
Ferrari, S., Calabretta, B., deRiel, J. K., Battini, R., Ghezzo, F., Lauret, E., Griffin, C., Emanuel, B. S., Gurrier, F. & Baserga, R. (1987). Structural and functional analysis of a growth-regulated gene, the human calcyclin. J. Biol. Chem. 262, 8325-8332.
Fukuzawa, H., Fujiwara, S., Tachiki, A. & Miyachi, S. (1990). Nucleotide sequences of two genes CAH1 and CAH2 which encode carbonic anhydrase polypeptides in Chlamydomonas reinhardtii. Nucl. Acids Res. 18, 6441-6442.
Gilbert, W. (1978). Why genes in pieces? Nature (London), 271, 501.
Gilbert, W., Marchionni, M. & McKnight, G. (1986). On the antiquity of introns. Cell, 46, 151-154.
Go, M. (1981). Correlation of DNA exonic regions with protein structural units in haemoglobin. Nature (London), 291, 90-92.
Goldschmidt-Clermont, M. & Rahire, M. (1986). Sequence, evolution and differential expression of the two genes encoding variant small subunits of ribulose bisphosphate carboxylase/oxygenase in Chlamydomonas reinhardtii. J. Mol. Biol. 191, 421-432.
Goodman, M., Pechere, J.-F., Haiech, J. & Demaille, J. G. (1979). Evolutionary diversification of structure and function in the family of intracellular calcium-binding proteins. J. Mol. Evol. 13, 331-352.
Hardin, S. H., Carpenter, C. D., Hardin, P. E., Bruskin, A. M. & Klein, W. H. (1985). Structure of the Spec1 gene encoding a major calcium-binding protein in the embryonic ectoderm of the sea urchin, Strongylocentrotus purpuratus. J. Mol. Biol. 186, 243-355.
Harris, E. H. (1989). The Chlamydomonas Sourcebook. A Comprehensive Guide to Biology and Laboratory Use. Academic Press, San Diego.
Huang, B., Mengersen, A. & Lee, V. D. (1988a). Molecular cloning of cDNA for caltractin, a basal body-associated Ca2+-binding protein: homology in its protein sequence with calmodulin and the yeast CDC31 gene product. J. Cell Biol. 107, 133-140.
Huang, B., Watterson, D. M., Lee, V. D. & Schibler, M. J. (1988b). Purification and characterization of a basal body-associated Ca2+-binding protein. J. Cell Biol. 107, 121-131.
Imagawa, M., Chiu, R. & Karin, M. (1987). Transcription factor AP-2 mediates induction by two different signal-transduction pathways: protein kinase C and cAMP. Cell, 51, 251-260.
Keller, E. B. & Noon, W. A. (1984). Intron splicing: a conserved internal signal in introns of animal pre-mRNAs. Proc. Nat. Acad. Sci., U.S.A. 81, 7417-7420.
Keller, E. B. & Noon, W. A. (1985). Intron splicing: a conserved internal signal in introns of Drosophila pre-mRNAs. Nucl. Acids Res. 13, 4971-4981.
Kink, J. A., Maley, M. E., Preston, R. R., Ling, K.-Y., Wallen-Friedman, M. A., Saimi, Y. & Kung, C. (1990). Mutations in Paramecium calmodulin indicate functional differences between the C-terminal and N-terminal lobes in vivo. Cell, 63, 165-174.
Kretsinger, R. H. (1980). Structure and evolution of calcium-modulated proteins. CRC Crit. Rev. Biochem. 8, 119-174.
Kretsinger, R. H. (1987). Calcium coordination and the calmodulin fold: divergent versus convergent evolution. Cold Spring Harbor Symp. Quant. Biol. 52, 499-510.
Lee, V. D. & Huang, B. (1990a). Caltractin: a basal body-associated calcium-binding protein in Chlamydomonas. In Calcium as an Intracellular Messenger in Eucaryotic Microbes (O’Day, D. E., ed.), pp. 245-257, American Society for Microbiology, Washington, DC.
Lee, V. D. & Huang, B. (1990b). Missense mutations at lysine 350 in β2-tubulin confer altered sensitivity to microtubule inhibitors in Chlamydomonas. Plant Cell, 2, 1051-1057.
Mayfield, S. P., Rahire, M., Frank, G., Zuber, H. & Rochaix, J.-D. (1987). Expression of the nuclear gene encoding oxygen-evolving enhancer protein 2 is required for high levels of photosynthetic oxygen evolution in Chlamydomonas reinhardtii. Proc. Nat. Acad. Sci., U.S.A. 84, 749-753.
Mitchell, P. J. & Tjian, R. (1989). Transcriptional regulation in mammalian cells by sequence-specific DNA binding proteins. Science, 245, 371-378.
Miyake, S., Emori, Y. & Suzuki, K. (1986). Gene organization of a small subunit of human calcium-activated neutral protease. Nucl. Acids Res. 14, 8805-8817.
Moncrief, N. D., Kretsinger, R. H. & Goodman, M. (1990). Evolution of EF-hand calcium-modulated proteins. I. Relationships based on amino acid sequences. J. Mol. Evol. 30, 522-562.
Mount, S. M. (1982). A catalogue of splice junction sequences. Nucl. Acids Res. 10, 459-472.
Nabeshima, Y., Fujii-Kuriyama, Y., Muramatsu, M. & Ogata, K. (1984). Alternative transcription and two modes of splicing result in two myosin light chains from one gene. Nature (London), 308, 333-338.
Naora, H. & Deacon, N. J. (1982). Relationship between the total size of exons and introns in protein-coding genes of higher eukaryotes. Proc. Nat. Acad. Sci., U.S.A. 79, 6196-6200.
Nojima, H. (1987). Molecular evolution of the calmodulin gene. FEBS Letters, 217, 187-190.
Nojima, H. & Sokabe, H. (1987). Structure of a gene for rat calmodulin. J. Mol. Biol. 193, 439-445.
Nudel, U., Calvo, J. M., Shani, M. & Levy, Z. (1984). The nucleotide sequence of a rat myosin light chain 2 gene. Nucl. Acids Res. 12, 7175-7186.
Ohno, S., Emori, Y., Imajoh, S., Kawasaki, H., Kisaragi, M. & Suzuki, K. (1984). Evolutionary origin of a calcium-dependent protease by fusion of genes for a thiol protease and a calcium-binding protein? Nature (London), 312, 566-570.
Ohshima, Y. & Gotoh, Y. Signals for the selection of a splice site in pre-mRNA. Computer analysis of splice junction sequences and like sequences. J. Mol. Biol. 195, 247-259.
Padgett, R. A., Grabowski, P. J., Konarska, M. M., Seiler, S. & Sharp, P. A. (1986). Splicing of messenger RNA precursors. Annu. Rev. Biochem. 55, 1119-1150.
Parker, V. P., Falkenthal, S. & Davidson, N. (1985). Characterization of the myosin light-chain-2 gene of Drosophila melanogaster. Mol. Cell. Biol. 5, 3058-3068.
Parmentier, M., Lawson, D. E. M. & Vassart, G. (1987). Human 27-kDa calbindin complementary DNA sequence. Evolutionary and functional implications. Eur. J. Biochem. 170, 207-215.
Perret, C., Lomri, N., Gouhier, N., Auffray, C. & Thomasset, M. (1988a). The rat vitamin-D-dependent calcium-binding protein (9-kDa CaBP) gene. Complete nucleotide sequence and structural organization. Eur. J. Biochem. 172, 43-51.
Perret, C., Lomri, N. & Thomasset, M. (1988b). Evolution of the EF-hand calcium-binding protein family: evidence for exon shuffling and intron insertion. J. Mol. Evol. 27, 351-364.
Putkey, J. A., Ts’ui, K. F., Tanaka, T., Lagace, L., Stein, J. P., Lai, E. C. & Means, A. R. (1983). Chicken calmodulin genes. A species comparison of cDNA sequences and isolation of a genomic clone. J. Biol. Chem. 258, 11864-11870.
Robert, B., Daubas, P., Akimenko, M.-A., Cohen, A., Garner, I., Guenet, J.-L. & Buckingham, M. (1984). A single locus in the mouse encodes both myosin light chains 1 and 3, a second locus corresponds to a related pseudogene. Cell, 39, 129-140.
Rogers, J. (1985). Exon shuffling and intron insertion in serine protease genes. Nature (London), 315, 458-459.
Rogers, J. (1986). Introns between protein domains: selective insertion or frameshifting? Trends Genet. 2, 223.
Rogers, J. H. (1989). How were introns inserted into nuclear genes? Trends Genet. 5, 213-216.
Salvato, M., Sulston, J., Albertson, D. & Brenner, S. (1986). A novel calmodulin-like gene from the nematode Caenorhabditis elegans. J. Mol. Biol. 190.
281-290. Sanger. F.. Nicklen. S. &, Coulson, A. R. (1977). DNA sequencing with chain-termination inhibitors. Proc. Nat. ilcad. Sci., r?.S.il. 74. 5463-5467. Schloss. ,J. A. (1990). A Chlamydomonas gene encodes a G protein /l subunit-like polypeptide. Mol. Gen. Gene!. 221. ti3-452. Sharp. P. A. (1981). Speculations on RNA splicing. (‘~11. 23, 6433646. R. I,., (lonner. T. W’. & Silflow. C. I).. Chisholm. Rosenbaum. ,I. L. (1985). The two alpha-tubulin genes of Chlamydomonas reinhardii code for slightly different, proteins. Mol. Cell. Biol. 5, 2389-2398. Simard, C.. Lemieux. (1. & Bellemare. G. (1988). Cloning of sequencing of a cDNA encoding the small subunit precursor of ribulose- 15bisphosphate carboxylasr from Chlamydomonas mbewusii. Curr. Genet. 14. 461-470. Smith, V. L., Doyle, K. E., Maune, J. F., Munjaal. It. 1’. & Reckingham, K. (1987). Structure and sequence of the Drosophila melanogaster calmodulin gene. ,I. Mol. Biol. 196, 471-485. Strehler. E. E.. Periasamy. M., Strehler-Page, M.-A. & Nadal-Ginard, B. (1985). Myosin light-chain 1 and 3
Chlamydomonas
gene has two structurally distinct and differentially regulated promoters evolving at different rates. Mol. Cell. Biol. 5, 31683182. Tabor, S. & Richardson, C. C. (1987). DNA sequence analysis with a modified bacteriophage T7 DNA polymerase. Proc. Nat. Acad. Sci., U.S.A. 84, 4767-477 1. Takeda, T. t Yamamoto, M. (1987). Analysis and in vivo disruption of the gene coding for calmodulin in Schizosaccharomyces pombe. Proc. Nat. Acad. Sci., U.S.A. 84, 3580-3584. Takemasa, T., Ohnishi, K., Kobayashi, T., Takagi, T., Konishi, K. & Watanabe, Y. (1989). Cloning and sequencing of the gene for Tetrahymena calciumbinding 25-kDa protein (TCBP-25). J. Biol. Chem. 264, 19293-19301. Tonegawa, S. (1983). Somatic generation of antibody diversity. Nature (London), 302, 575-581. Tschudi, C., Young, A. S., Ruben, L., Patton, C. L. & Richards, F. F. (1985). Calmodulin genes in trypanosomes are tandemly repeated and produce multiple mRNAs with a common 5’ leader sequence. Proc. Nat. Acad. Sci., U.S.A. 82, 3998-4002. Weeds, A. G. & McLachlan, A. D. (1974). Structural homology of myosin alkali light chains, troponin C and carp calcium binding protein. Nature (London), 252, 646-649.
Williams, B. D., Velleca, M. A., Curry, A. M. & Rosenbaum, J. L. (1989). Molecular cloning and sequence analysis of the Chlamydomonas gene coding for radial spoke protein 3: flagellar mutation pf-14 is an ochre allele. J. Cell Biol. 109, 235-245.
Wilson, P. W., Rogers, J., Harding, M., Pohl, V., Pattyn, G. & Lawson, D. E. M. (1988). Structure of chick chromosomal genes for calbindin and calretinin. J. Mol. Biol. 200, 615-625.
Woessner, J. P. & Goodenough, U. W. (1989). Molecular characterization of zygote wall protein: an extensin-like molecule in Chlamydomonas reinhardtii. Plant Cell, 1, 901-911.
Youngblom, J., Schloss, J. A. & Silflow, C. D. (1984). The two β-tubulin genes of Chlamydomonas reinhardtii code for identical proteins. Mol. Cell. Biol. 4, 2686-2696.
Yu, L. M. & Selman, B. R. (1988). cDNA sequence and predicted primary structure of the γ subunit from the ATP synthase from Chlamydomonas reinhardtii. J. Biol. Chem. 263, 19342-19345.
Zimmer, W. E., Schloss, J. A., Silflow, C. D., Youngblom, J. & Watterson, D. M. (1988). Structural organization, DNA sequence, and expression of the calmodulin gene. J. Biol. Chem. 263, 19370-19383.
Edited by K. Yamamoto