Mutation Research, 215 (1989) 89-94
Elsevier MUT 04797
Comparison of the nucleotide sequences of a yeast gene family
II. Analysis of spontaneous deletions and insertions

Daniel Gozalbo * and Stefan Hohmann

Institut für Mikrobiologie, Technische Hochschule Darmstadt, Schnittspahnstrasse 10, D-6100 Darmstadt (F.R.G.)

(Received 12 April 1989)
(Accepted 8 June 1989)
Keywords: Spontaneous mutations; Nucleotide sequence comparison; Gene family; Deletions and insertions; Saccharomyces cerevisiae; Invertase genes
Summary

We compared the nucleotide sequences of 3 yeast invertase genes in regions where the homology is better than 90%. In the noncoding regions 40 gaps of 1-61 bases were found. This is about half the number of nucleotide substitutions found in the same sequences. We grouped the gaps into 5 categories by their length and the characteristics of their sequences. Group I gaps are about 20 nucleotides long and are flanked by a repeated sequence of 6 bases, which may trigger the deletion of one of the repeats together with the sequence between the repeats. Group II gaps are characterized by a small repeated sequence, one copy of which is missing in one of the invertase genes. Gaps which occur in sequences made up exclusively of one of the 4 bases are collected in Group III. The 4 gaps in Group IV do not show any of these sequence characteristics and are all just one base long. A 61-nucleotide sequence found in only one of the invertase genes seems to be of complex origin. We conclude that small repeated sequences or monotonous sequences are prone to deletion or insertion mutations.
* Present address: Departament de Microbiologia, Facultat de Farmacia, Universitat de Valencia, Avgda Blasco Ibanez 13, E-46010 Valencia (Spain).
Correspondence: Dr. Stefan Hohmann, Institut für Mikrobiologie, T.H. Darmstadt, Schnittspahnstr. 10, D-6100 Darmstadt (F.R.G.).
0027-5107/89/$03.50 © 1989 Elsevier Science Publishers B.V. (Biomedical Division)

The yeast invertase gene family consists of 6 unlinked genes called SUC1 through SUC5 and SUC7 (Carlson et al., 1980). The sequences of the invertase genes are very similar within the coding region and can be aligned without any gaps (Hohmann and Gozalbo, 1989). SUC3 through SUC5 and SUC7 are more than 99% identical even in the noncoding regions flanking the invertase genes (Sarokin and Carlson, 1985; Hohmann and Gozalbo, 1988). SUC2 and SUC1, however, have diverged slightly from these genes and from each other in the coding region (Hohmann and Gozalbo, 1989). In the noncoding region SUC1 is related to the group of the other 4 invertase genes, while SUC2 has diverged from these genes upstream of position -136 (Hohmann and Gozalbo, 1988). The sequences of the noncoding regions of SUC1, SUC2 and SUC4 (as a representative for the
genes SUC3 to SUC5 and SUC7) were aligned (Hohmann and Gozalbo, 1988) and a variety of gaps, which may be due either to spontaneous deletions or to insertions, were found. Since these mutation events were identified exclusively by sequence analysis rather than by selection for mutants defective in invertase synthesis, none of the deletions or insertions has a significant effect on the expression of the invertase genes. However, the different SUC genes are responsible for the production of different amounts of invertase (Hohmann and Zimmermann, 1986); the molecular basis for this phenomenon is unknown (Parets-Soler, 1989). The expression of all 6 genes is regulated by carbon catabolite repression (Grossmann and Zimmermann, 1979; Sarokin and Carlson, 1985; Hohmann and Zimmermann, 1986). The phylogenetic relationship of the different invertase genes is not certain. Thus, in contrast to the investigation of selected mutants, we do not know which copy of a gene is the "wild type" and which is the "mutant". Therefore, we have to refer to the observed genetic events as either insertions or deletions. In this report we present the sequences of the gaps identified and we discuss their properties and the possible mechanisms of origin of these mutations.

Materials and methods

All sequences mentioned in this paper are available from the EMBL data base. The sequences compared, the source of the sequences and the accession numbers of the EMBL data base are listed in the accompanying paper (Hohmann and Gozalbo, 1989). The sequences of SUC1 and SUC4 were aligned and screened for gaps and other possibly important sequences by visual inspection.

Results

The alignment of the noncoding regions of SUC1, SUC2 and SUC4 revealed a total of 40 gaps without any sequence homology. For this comparison we used only DNA sequences showing more than 90% identity. Thus, the sequences of SUC2 further upstream from position -136 and further downstream from position +1800 relative to the translational start site were not considered (Hohmann and Gozalbo, 1988).

TABLE 1
GROUP I GAPS

Genes   Position   Sequences
SUC1    -732       aaaAAAAGA [gap] catctaaaa
SUC4    -682       aaaAAAAGAggacacaacgatagtAAAAGAcatctaaaa

SUC4    -614       actTTGaTG [gap] tccactg
SUC1    -682       actTTGGTGcaaatttgttaTTGGTGtgtccagtg

SUC1    -239       aaaAAAAAAAA [gap] [illegible]ccaacagt
SUC4    -250       aaaAAAAAAAAgtgatattgcctgtcAAAgAAAActcaatagt

SUC4    -71        ttttttT [gap] tctttttctc
SUC1    -89        ttttttTcTTTT[illegible]TTTTaccatttatcttacTTTTTtttttttctc

The position is always the first nucleotide of the sequence shown and is numbered with respect to the translational start site (+1, first base of the coding region). The last nucleotide of the coding region is +1599. The repeated sequences are given in capital letters.

TABLE 2
GROUP II GAPS

Genes   Position   Sequences
SUC4    -719       ctaGAT [gap] [illegible]
SUC1    -772       ctaGATGAT[illegible]

SUC1    -529       [illegible]ACTT [gap] [illegible]
SUC4    -480       atA[illegible]TTtACTTaa[illegible]t

SUC4    -262       tacAGA[illegible]
SUC1    -257       tacAGAT[illegible]

SUC1    -130       agATAT[illegible]
SUC4    -116       [illegible]

SUC1    +1658      tgcTAA [gap] cgagtg
SUC4    +1658      tgcTAATAAcgagtg

SUC1    +1715      aaaa[illegible]gtATTTTTATTTTT[illegible]
SUC4    +1720      aaaat[illegible]ttATTTTTATTTTTATT[illegible]

SUC1    +2047      ataag[illegible]
SUC4    +2058      ataaga[illegible]agACAGACAG[illegible]

SUC1    +2160      acttAT[illegible]
SUC4    +2163      act[illegible]

The gaps can be divided into 5 groups, taking as criteria their size and the properties of the sequences surrounding them. The four Group I gaps are 19-24 nucleotides long. Table 1 gives the position of each gap, the sequence flanking the gap in the gene that carries it, and the corresponding ungapped sequence of the other SUC gene. The ungapped sequence exhibits a repeated sequence of at least 6 nucleotides. In the gapped version the DNA between the two copies of the repeat, together with one copy of the repeat, is missing. Group II contains eight gaps of 2-6 nucleotides (Table 2). The sequences are characterized by small repeated elements of 2-6 nucleotides
TABLE 3
GROUP III GAPS

Genes   Position   Sequences
SUC4    -632       tcAAAA [gap] cgtgat
SUC1    -703       tcAAAAAAAcgtgat

SUC1    -447       cTTTTT [gap] cctccg
SUC4    -453       TTTTTTTcctccg

SUC4    -440       cTTTTT [gap] ctcacg
SUC7    -441       cTTTTTTctcacg

SUC4    -423       cgg[illegible]AA [gap] taaaaa
SUC1    -418       cggAAAAtaaaaa

SUC1    -269       attcaG [gap] tacaga
SUC4    -275       attcaGGtagaga

SUC1    -184       cTTTTT [gap] aggctg
SUC4    -171       cTTTTTTgggctg

SUC1    -122       attaTT [gap] cttc [gap] a
SUC4    -106       attgTTTctttta

SUC2    -83        TTTTTTTT [gap] accata
SUC1    -86        TTTTTTTTTTaccatt

SUC2    -54        TTTTTTTTTT [gap] ctctca
SUC1    -56        TTTTTTTTTTTTctctca

SUC1    +1619      cTTTTT [gap] attttt
SUC4    +1619      cTTTTTTattttt

TABLE 3 (continued)

Genes   Position   Sequences
SUC1    +1665      gtgacG [gap] aatgta
SUC4    +1669      gtgacGGaatgta

SUC1    +1694      agaatA [gap] cctcca
SUC4    +1699      agaatAAcctcca

SUC1    +1901      AAAAAA [gap] tgtacg
SUC4    +1912      AAAAAAAtgtacg

SUC4    +2022      [illegible] [gap] gctctg
SUC1    +2010      gctctA[illegible]gctctg

SUC1    +2041      ggagaT [gap] ataaga
SUC4    +2051      ggaggTTataaga

SUC4    +2084      accgAA [gap] tttttt
SUC1    +2065      gccgAAAtttttt

SUC1    +2084      ctgAAA [gap] gttgtt
SUC4    +2102      ctaAAAAgttgtt

SUC1    +2110      ttttAA [gap] catgct
SUC4    +2127      ttttAAAcatgct

SUC4    +2138      ctctAA [gap] [illegible]
SUC1    +2120      ctctAAAtataac

SUC4    +2160      gtaacT [gap] atatct
SUC1    +2144      gtaacTTat[illegible]

SUC4    +2183      tacttG [gap] atcatg
SUC1    +2166      tacttGGatcatg
without any bases intervening in this repeat. In the gapped version one copy of the repeat is missing. Group III consists of 23 gaps of 1-3 nucleotides (Table 3). All these gaps occur in monotonous sequences. For example, one SUC gene has 6 consecutive adenines while another has only 3 adenines at this position. We found just 4 gaps of 1 nucleotide which did not show any of the features described above (Group IV, Table 4). Within the 5' noncoding region of SUC1 there is a 61-nucleotide sequence not found in the other invertase genes (Table 5). The most prominent
TABLE 4
GROUP IV GAPS

Genes   Position   Sequences
SUC1    -121       tt[illegible]cttc[illegible]aaa[illegible]ca
SUC4    -105       tttcttctaaaaca

SUC7    +1653      ctttcg [gap] taataa
SUC4    +1653      cttttgctaataa

SUC1    +1712      tgaaaa [gap] gtattt
SUC2    +1712      tgaaaatgtattt
SUC4    +1718      tgaaaatttattt

SUC4    +2154      tcaatc [gap] gtaact
SUC1    +2138      tcagtcagtaact

TABLE 5
GROUP V GAPS
Additional sequence in SUC1 and the flanking regions.

Genes   Position   Sequences
SUC4    -468       aagatg[illegible]aacag [gap] cc[illegible]ccgcatttt
SUC1    -522       aagatgaagtaacccgaaaaagattttttttttttttttttttttt
                   tccgtttttttcctccgcttttt
                   cCCTCCGCTTTTTcctccgctttttcctccgcttttt

In capital letters: one copy of the 12-nucleotide sequence element that is repeated 4 times.
features of this sequence are 23 consecutive thymidines and a sequence element of 12 nucleotides that is repeated 4 times.

Discussion
Frequency of gap formation

We found a total of 40 gaps in the noncoding region of the yeast invertase gene sequences and no gaps within the coding region. The numbers of base substitutions were 90 in the noncoding region and 132 in the coding region (Hohmann and Gozalbo, 1989). Thus, base substitutions seem to be the more frequent event even in the noncoding region, where the selective pressure to maintain the sequence should be much weaker than in the coding region. Molecular analysis of spontaneous mutations in the lacI gene from Escherichia coli (Farabaugh et al., 1978; Schaaper et al., 1986) or in the E. coli gpt gene stably transformed into hamster cells (Tindall and Stankowski, 1989) identified deletions or insertions as the predominant events. These authors, however, studied mutations inactivating the product of the respective gene, while the yeast invertase genes compared here are functional (Hohmann and Zimmermann, 1986). A significant portion of the spontaneous mutations that actually occur should be silent base substitutions, while most deletions or insertions affect the gene product. Therefore, screening for mutants will overestimate the frequency of deletions or insertions, while in our comparison of closely related active genes the number of nucleotide substitutions should be an overestimate relative to the number of deletions and insertions. Even in the noncoding region of the yeast SUC genes many possible deletions or insertions should be selected against during evolution if they affect the synthesis of the mRNA. We conclude that the proportion of about 1:2 for deletions and insertions relative to base substitutions (40:90) may therefore be a minimum estimate for the spontaneous mutations that actually occurred.
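The arithmetic behind this estimate can be checked directly. The following is a minimal sketch in Python that uses only the counts reported above (40 gaps and 90 base substitutions in the noncoding region, 132 substitutions and no gaps in the coding region); the variable names are illustrative and not part of the original analysis.

```python
from fractions import Fraction

# Counts reported in the text for the aligned invertase gene sequences.
gaps_noncoding = 40
subs_noncoding = 90
gaps_coding = 0
subs_coding = 132

# Ratio of length-changing events (gaps) to base substitutions
# in the noncoding region, reduced to lowest terms.
ratio = Fraction(gaps_noncoding, subs_noncoding)

# Share of gaps among all sequence changes seen in the noncoding region.
share_of_gaps = gaps_noncoding / (gaps_noncoding + subs_noncoding)

print(f"gaps : substitutions = {ratio.numerator}:{ratio.denominator}"
      f" (about {float(ratio):.2f})")
print(f"gaps as a share of all noncoding changes: {share_of_gaps:.0%}")
print(f"gaps in the coding region: {gaps_coding} (substitutions: {subs_coding})")
```

The reduced ratio of 4:9 is what the text rounds to "about 1:2".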
Group I gaps

Although we do not know which of the invertase genes should be regarded as "wild type" (or "ancestral type") and in which gene the genetic event changing the nucleotide sequence occurred (it may be SUC1 in one case and SUC4 in another), we interpret the Group I gaps as possible deletions. The approx. 20 missing nucleotides do not show any homology to the sequences flanking the gap, which would otherwise indicate a possible duplication. All 4 gaps are characterized by flanking repeated sequences of 6 bases. There is evidence from a variety of reports on the molecular basis of spontaneous mutations that such repeated sequences trigger deletions (Farabaugh et al., 1978; Albertini et al., 1982; de Zamaroczy et al., 1983; Nalbantoglu et al., 1986; Schaaper et al., 1986; Ahne et al., 1988). It has been proposed that slipped mispairing of these repeated sequences and looping out of the intervening sequence during DNA replication could be the basis for this phenomenon (Streisinger et al., 1966; Schaaper et al., 1986; Ahne et al., 1988). Palindromic sequences also seem to promote the formation of deletions (Glickman and Ripley, 1984; Golding and Glickman, 1985).

Only the first of the 4 Group I gaps shows a perfect repeat of 6 bases. The third and fourth gaps are flanked by monotonous sequences with one-nucleotide differences, which may be due to a substitution that occurred after the deletion. The second gap seems to be more complex. There is a perfect repeat of the sequence TTGGTG in SUC1 and the closely related sequence TTGaTG in SUC4. Furthermore, there is a second small gap in SUC4 missing one copy of the small repeat TGTG (a gap of the Group II type).
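The outcome of such a repeat-mediated deletion can be illustrated with the ungapped sequence of the first Group I gap (SUC4, Table 1). The sketch below is only a string-level illustration under the assumption that the deletion removes the intervening sequence together with the second copy of the repeat; the function name `delete_between_repeats` is hypothetical and the code makes no claim about the actual replication mechanism.

```python
def delete_between_repeats(seq, repeat):
    """Remove the sequence between two copies of a direct repeat,
    together with the second copy, leaving one copy in place.

    Returns the shortened sequence and the deleted segment.
    Matching ignores case, so the capitalised repeats of Table 1 work.
    """
    upper = seq.upper()
    unit = repeat.upper()
    first = upper.find(unit)
    if first == -1:
        raise ValueError("repeat not found")
    second = upper.find(unit, first + len(unit))
    if second == -1:
        raise ValueError("repeat occurs only once")
    cut_start = first + len(unit)    # just after the first copy
    cut_end = second + len(unit)     # just after the second copy
    return seq[:cut_start] + seq[cut_end:], seq[cut_start:cut_end]


# Ungapped sequence of the first Group I gap (SUC4, position -682).
ungapped = "aaaAAAAGAggacacaacgatagtAAAAGAcatctaaaa"
gapped, deleted = delete_between_repeats(ungapped, "AAAAGA")

print(gapped)         # flanks joined across one remaining repeat copy
print(len(deleted))   # number of nucleotides removed
```

For this example the result joins the two flanks shown for SUC1 in Table 1 (aaaAAAAGA and catctaaaa), and the removed segment is 21 nucleotides long, within the 19-24 range given for Group I.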
Group II gaps

The Group II gaps (Table 2) are much smaller, and most of them have shorter repeats, than the gaps of Group I. With just one exception, only one copy of the repeated sequence is missing and no bases between the repeats were deleted. The exception is the second gap, where one T between the repeats is missing. As there are 3 consecutive T residues at this position, this may be interpreted as an additional mutation similar to a Group III event (Table 3). The sequence properties make it impossible to classify the Group II gaps as deletions or small duplications. This is especially true for the fourth and sixth gaps (positions -130 and +1715 in SUC1), where the sequence motif is repeated 3 or 4 times, respectively. The seventh gap (position +2047 in SUC1) seems to be more complex and may be due to two different events.

Group III gaps

The Group III gaps (Table 3) are very short, between 1 and 3 bases, and they are missing bases in monotonous sequences, that is, in runs of 2 or more consecutive identical nucleotides. Therefore, it is again difficult to identify these gaps as deletions or insertions. If the monotonous sequence is long, slipped mispairing followed by deletion of the looped-out sequence could be possible (Streisinger et al., 1966). Furthermore, DNA polymerase could have problems in "counting" consecutive bases and could either add or leave out one or a few of these bases.

Only the seventh gap (position -122 in SUC1) seems to be more complex and could be due to two close but independent events. The second of the two one-base gaps in that sequence could be due to the loss or addition of one thymine, and in SUC1 one of the 3 thymine residues was then substituted by a cytosine.
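The defining feature of a Group III gap, a difference in the length of a monotonous run, can be sketched in a few lines of Python. The example uses the pair of sequences listed at position +1619 in Table 3; the function names `homopolymer_runs` and `longest_run` are illustrative assumptions, not part of the original analysis, which was done by visual inspection.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=2):
    """Return (base, start, length) for every run of identical bases
    that is at least min_len long. Case is ignored."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = len(list(group))
        if length >= min_len:
            runs.append((base, pos, length))
        pos += length
    return runs


def longest_run(seq):
    """Return the first longest run of identical bases in seq."""
    return max(homopolymer_runs(seq, min_len=1), key=lambda run: run[2])


shorter = "cTTTTTattttt"    # SUC1, position +1619 (gapped version)
longer = "cTTTTTTattttt"    # SUC4, position +1619

base_s, start_s, len_s = longest_run(shorter)
base_l, start_l, len_l = longest_run(longer)
print(base_s, len_s, base_l, len_l, len_l - len_s)
```

Here the two genes differ by a single T in a run of thymines, which is the kind of one-base length difference the text attributes to slipped mispairing or polymerase "miscounting".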
Group IV gaps

The 4 gaps listed in Table 4 do not show any of the characteristics of the gaps in Groups I-III. We could not find any repeated or monotonous sequences or palindromes. Either such sequence properties were masked by secondary mutations or these gaps were created by a different, independent mechanism.

Group V gap

The Group V gap is much more complex. It contains 23 consecutive thymines and a 12-nucleotide motif that is repeated 4 times. Because of these properties the gap could be interpreted either as a duplication or multiplication by overreplication, or as a deletion of the repeated sequences following slipped mispairing during replication. Interestingly, the sequence element GCTTTTT of the 12-base motif is complementary to CGAAAAA at the opposite end of the gap. A looping out of the entire sequence or of parts of it would be possible (because of the repeated sequence, alternative loops are possible). This loop could either be deleted or serve as a template for overreplication. It is possible that this gap is due to multiple genetic events.

Concluding remarks

36 out of the 40 gaps in the sequence alignment of the noncoding regions of the yeast invertase genes show repeated or monotonous sequences flanking the gap. As has been found by others (e.g. Farabaugh et al., 1978; Schaaper et al., 1986; Ahne et al., 1988), such sequences promote spontaneous deletion or insertion mutations. The comparison of the invertase genes revealed that spontaneous mutations changing the number of nucleotides in a DNA sequence are at least half as frequent as base substitutions.
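The figure of 36 out of 40 follows from the group sizes given in the Results. The short Python tally below assumes that the single Group V gap is counted among the gaps associated with repeated or monotonous sequences, which the text implies but does not state explicitly; the dictionary is an illustrative summary, not data taken from a table.

```python
# Number of gaps per group as reported in the Results.
group_sizes = {"I": 4, "II": 8, "III": 23, "IV": 4, "V": 1}

total = sum(group_sizes.values())

# Only the Group IV gaps lack repeated or monotonous flanking sequences.
with_repeats = total - group_sizes["IV"]
fraction = with_repeats / total

print(f"{with_repeats} of {total} gaps ({fraction:.0%}) are associated "
      "with repeated or monotonous sequences")
```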
Acknowledgment

We thank F.K. Zimmermann for reading the manuscript.
References

Ahne, A., J. Müller-Derlich, A.M. Merlos-Lange, F. Kanbay, K. Wolf and B.F. Lang (1988) Two distinct mechanisms for deletion in mitochondrial DNA of Schizosaccharomyces pombe mutator strains. Slipped mispairing mediated by direct repeats and erroneous intron splicing, J. Mol. Biol., 202, 725-734.
Albertini, A.M., M. Hofer, M.P. Calos and J.H. Miller (1982) On the formation of spontaneous deletions: The importance of short sequence homologies in the generation of large deletions, Cell, 29, 319-328.
Carlson, M., B.C. Osmond and D. Botstein (1980) SUC genes in yeast: A dispersed gene family, Cold Spring Harbor Symp. Quant. Biol., 45, 799-803.
Farabaugh, P.J., U. Schmeissner, M. Hofer and J.H. Miller (1978) Genetic studies of the lac repressor, VII. On the molecular nature of spontaneous hot spots in the lacI gene of Escherichia coli, J. Mol. Biol., 126, 847-863.
Glickman, B.W., and L.S. Ripley (1984) Structural intermediates of deletion mutagenesis: A role for palindromic DNA, Proc. Natl. Acad. Sci. (U.S.A.), 81, 512-516.
Golding, G.B., and B.W. Glickman (1985) Sequence-directed mutagenesis: Evidence from a phylogenetic history of human alpha-interferon genes, Proc. Natl. Acad. Sci. (U.S.A.), 82, 8577-8581.
Grossmann, M.K., and F.K. Zimmermann (1979) The structural genes of internal invertases in Saccharomyces cerevisiae, Mol. Gen. Genet., 175, 223-229.
Hohmann, S., and F.K. Zimmermann (1986) Cloning and expression on a multicopy vector of five invertase genes of Saccharomyces cerevisiae, Curr. Genet., 11, 217-225.
Hohmann, S., and D. Gozalbo (1988) Structural analysis of the 5' regions of yeast SUC genes revealed analogous palindromes in SUC, MAL and GAL, Mol. Gen. Genet., 211, 446-454.
Hohmann, S., and D. Gozalbo (1989) Comparison of the nucleotide sequences of a yeast gene family, I. Distribution and spectrum of spontaneous base substitutions, Mutation Res., in press.
Nalbantoglu, J., D. Hartley, G. Phear, G. Tear and M. Meuth (1986) Spontaneous deletion formation at the aprt locus of hamster cells: the presence of short sequence homologies and dyad symmetries at deletion termini, EMBO J., 5, 1199-1204.
Parets-Soler, A. (1989) Base substitutions in the 5' noncoding regions of two naturally occurring yeast invertase structural SUC genes cause strong differences in specific invertase activities, Curr. Genet., 15, 299-302.
Sarokin, L., and M. Carlson (1985) Comparison of 2 yeast invertase genes: conservation of the upstream regulation site, Nucleic Acids Res., 13, 6089-6113.
Schaaper, R.M., B.N. Danforth and B.W. Glickman (1986) Mechanism of spontaneous mutagenesis: An analysis of the spectrum of spontaneous mutations in the Escherichia coli lacI gene, J. Mol. Biol., 189, 273-284.
Streisinger, G., Y. Okada, J. Emrich, J. Newton, A. Tsugita, E. Terzaghi and M. Inouye (1966) Frameshift mutations and the genetic code, Cold Spring Harbor Symp. Quant. Biol., 31, 77-84.
Tindall, K.R., and L.F. Stankowski Jr. (1989) Molecular analysis of spontaneous mutations at the gpt locus in Chinese hamster ovary (AS52) cells, Mutation Res., 220, 241-253.
Zamaroczy de, M., G. Faugeron-Fonty and G. Bernardi (1983) Excision sequences in the mitochondrial genome of yeast, Gene, 21, 193-202.