Comparison of the nucleotide sequences of a yeast gene family


Mutation Research, 215 (1989) 89-94


Elsevier MUT 04797

Comparison of the nucleotide sequences of a yeast gene family
II. Analysis of spontaneous deletions and insertions

Daniel Gozalbo * and Stefan Hohmann

Institut für Mikrobiologie, Technische Hochschule Darmstadt, Schnittspahnstrasse 10, D-6100 Darmstadt (F.R.G.)

(Received 12 April 1989)
(Accepted 8 June 1989)

Keywords: Spontaneous mutations; Nucleotide sequence comparison; Gene family; Deletions and insertions; Saccharomyces cerevisiae; Invertase genes

Summary

We compared the nucleotide sequences of 3 yeast invertase genes in regions where the homology is better than 90%. In the noncoding region 40 gaps of 1-61 bases were found. This is about half the number of nucleotide substitutions in the same sequences. We grouped the gaps into 5 categories by their length and the characteristics of their sequences. Group I gaps are about 20 nucleotides long and are flanked by a repeated sequence of 6 bases, which may trigger the deletion of one of the repeats and of the sequence between the repeats. Group II gaps are characterized by a small repeated sequence, one copy of which is missing in one of the invertase genes. Gaps which occur in sequences made up exclusively of one of the 4 bases are summarized in Group III. The 4 gaps in Group IV do not show any of these sequence characteristics and are all just one base long. A 61-nucleotide sequence found in only one of the invertase genes seems to be of complex origin. We conclude that small repeated sequences or monotonous sequences are prone to deletion or insertion mutations.

* Present address: Departament de Microbiologia, Facultat de Farmacia, Universitat de Valencia, Avgda Blasco Ibanez 13, E-46010 Valencia (Spain).
Correspondence: Dr. Stefan Hohmann, Institut für Mikrobiologie, T.H. Darmstadt, Schnittspahnstr. 10, D-6100 Darmstadt (F.R.G.).

0027-5107/89/$03.50 © 1989 Elsevier Science Publishers B.V. (Biomedical Division)

The yeast invertase gene family consists of 6 unlinked genes called SUC1 through SUC5 and SUC7 (Carlson et al., 1980). The sequences of the invertase genes are very similar within the coding region and can be aligned without any gaps (Hohmann and Gozalbo, 1989). SUC3 through SUC5 and SUC7 are more than 99% identical even in the noncoding regions flanking the invertase genes (Sarokin and Carlson, 1985; Hohmann and Gozalbo, 1988). SUC2 and SUC1, however, have diverged slightly from these genes and from each other in the coding region (Hohmann and Gozalbo, 1989). In the noncoding region SUC1 is related to the group of the other 4 invertase genes while SUC2 has diverged from these genes upstream of position -136 (Hohmann and Gozalbo, 1988).

The sequences of the noncoding regions of SUC1, SUC2 and SUC4 (as a representative for the genes SUC3 to SUC5 and SUC7) were aligned (Hohmann and Gozalbo, 1988) and a variety of gaps, which may be due either to spontaneous deletions or insertions, were found. Since these mutation events were identified exclusively by sequence analysis rather than by selection for mutants defective in invertase synthesis, none of the deletions or insertions has a significant effect on the expression of the invertase genes. However, the different SUC genes are responsible for the production of different amounts of invertase (Hohmann and Zimmermann, 1986); the molecular basis for this phenomenon is unknown (Parets-Soler, 1989). The expression of all 6 genes is regulated by carbon catabolite repression (Grossmann and Zimmermann, 1979; Sarokin and Carlson, 1985; Hohmann and Zimmermann, 1986).

The phylogenetic relationship of the different invertase genes is not certain. Thus, in contrast to the investigation of selected mutants, we do not know the "wild-type" or the "mutant" copy of a gene. Therefore, we have to refer to the observed genetic events as either insertions or deletions. In this report we present the sequences of the gaps identified and we discuss their properties and the possible mechanisms of origin of these mutations.

Materials and methods

All sequences mentioned in this paper are available from the EMBL data base. The sequences compared, the source of the sequences and the accession numbers of the EMBL data base are listed in the accompanying paper (Hohmann and Gozalbo, 1989). The sequences of SUC1 and SUC4 were aligned and screened for gaps and other possibly important sequences by visual inspection.

Results

The alignment of the noncoding regions of SUC1, SUC2 and SUC4 revealed a total of 40 gaps without any sequence homology. For this comparison we used only DNA sequences showing more than 90% identity. Thus, the sequences of SUC2 further upstream from position -136 and further downstream from position +1800 relative to the translational start site were not considered (Hohmann and Gozalbo, 1988).

The gaps can be divided into 5 groups, taking as criteria their size and the properties of the sequences surrounding them. The four Group I gaps are 19-24 nucleotides long. Table 1 summarizes the positions of these gaps, the sequence of the invertase gene which has a sequence in the position where another SUC gene has a gap, and the nucleotides flanking the gap. The ungapped sequence exhibits a repeated sequence of at least 6 nucleotides. In the gapped version the DNA between the repeated sequences and one copy of the repeat are missing.

Group II contains eight gaps of 2-6 nucleotides (Table 2). The sequences are characterized by small repeated elements of 2-6 nucleotides without any bases intervening in this repeat. In the gapped version one copy of the repeat is missing.

Group III consists of 23 gaps of 1-3 nucleotides (Table 3). All these gaps showed up in monotonous sequences. For example, one SUC gene has 6 consecutive adenines while another one has only 3 adenines at this position.

We found just 4 gaps of 1 nucleotide which did not show any of the features described above (Group IV, Table 4).

Within the 5' noncoding region of SUC1 there is a 61-nucleotide sequence not found in the other invertase genes (Table 5). The most prominent features of this sequence are 23 consecutive thymidines and a 4 times repeated sequence element of 12 nucleotides.

TABLE 1
GROUP I GAPS

Genes  Position  Sequences
SUC1   -732      aaaAAAAGA                     catctaaaa
SUC4   -682      aaaAAAAGAggacacaacgatagtAAAAGAcatctaaaa
SUC4   -614      actTTGaTG                   tccactg
SUC1   -682      actTTGGTGcaaatttgttaTTGGTGtgtccagtg
SUC1   -239      aaaAAAAAAAA                       cccaacagt
SUC4   -250      aaaAAAAAAAAgtgatattgcctgtcAAAgAAAActcaatagt
SUC4   -71       ttttttTcTTTT                  tctttttctc
SUC1   -89       ttttttTTTTTaccatttatcttacTTTTTtttttttctc

The position is always that of the first nucleotide of the sequence shown and is numbered with respect to the translational start site (+1, first base of the coding region). The last nucleotide of the coding region is +1599. The repeated sequences are given in capital letters.

TABLE 2
GROUP II GAPS

Genes  Position  Sequences
SUC4   -719      ctaGAT[...]
SUC1   -772      ctaGATGAT[...]
SUC1   -529      [...]ACTT[...]
SUC4   -480      atACTTtACTTaa[...]
SUC4   -262      tacAGA[...]
SUC1   -257      tacAGATc[...]
SUC1   -130      agATAT  gtatta
SUC4   -116      agATATATgtattg
SUC1   +1658     tgcTAA   cgagtg
SUC4   +1658     tgcTAATAAcgagtg
SUC1   +1715     aaaa gtATTTTTATTTTT[...]
SUC4   +1720     aaaatttATTTTTATTTTTATT[...]
SUC1   +2047     ataag[...]
SUC4   +2058     ataaga[...]agACAGACAG[...]
SUC1   +2160     acttAT    ctcgta
SUC4   +2163     act[...]ATATctcgta

TABLE 3
GROUP III GAPS

Genes  Position  Sequences
SUC4   -632      tcAAAA   cgtgat
SUC1   -703      tcAAAAAAAcgtgat
SUC1   -447      cTTTTT  cctccg
SUC4   -453      TTTTTTTcctccg
SUC4   -440      cTTTTT ctcacg
SUC7   -441      cTTTTTTctcacg
SUC4   -423      cggAAA taaaaa
SUC1   -418      cggAAAAtaaaaa
SUC1   -269      attcaG tacaga
SUC4   -275      attcaGGtagaga
SUC1   -184      cTTTTT aggctg
SUC4   -171      cTTTTTTgggctg
SUC1   -122      attaTT cttc a
SUC4   -106      attgTTTctttta
SUC2   -83       TTTTTTTT  accata
SUC1   -86       TTTTTTTTTTaccatt
SUC2   -54       TTTTTTTTTT  ctctca
SUC1   -56       TTTTTTTTTTTTctctca
SUC1   +1619     cTTTTT attttt
SUC4   +1619     cTTTTTTattttt
SUC1   +1665     gtgacG aatgta
SUC4   +1669     gtgacGGaatgta
SUC1   +1694     agaatA cctcca
SUC4   +1699     agaatAAcctcca
SUC1   +1901     AAAAAA tgtacg
SUC4   +1912     AAAAAAAtgtacg
SUC4   +2022     gctctA gctctg
SUC1   +2010     gctctAAgctctg
SUC1   +2041     ggagaT ataaga
SUC4   +2051     ggaggTTataaga
SUC4   +2084     accgAA tttttt
SUC1   +2065     gccgAAAtttttt
SUC1   +2084     ctgAAA gttgtt
SUC4   +2102     ctaAAAAgttgtt
SUC1   +2110     ttttAA catgct
SUC4   +2127     ttttAAAcatgct
SUC4   +2138     ctctAA [...]
SUC1   +2120     ctctAAAtataac
SUC4   +2160     gtaacT atatct
SUC1   +2144     gtaacTTat[...]
SUC4   +2183     tacttG atcatg
SUC1   +2166     tacttGGatcatg

TABLE 4
GROUP IV GAPS

Genes  Position  Sequences
SUC1   -121      tt cttc aaaaca
SUC4   -105      tttcttctaaaaca
SUC7   +1653     ctttcg taataa
SUC4   +1653     cttttgctaataa
SUC1   +1712     tgaaaa gtattt
SUC2   +1712     tgaaaatgtattt
SUC4   +1718     tgaaaatttattt
SUC4   +2154     tcaatc gtaact
SUC1   +2138     tcagtcagtaact

TABLE 5
GROUP V GAP
Additional sequence in SUC1 and the flanking regions.

Genes  Position  Sequences
SUC4   -468      aagatg[...]
SUC1   -522      aagatgaagtaacccgaaaaagattttttttttttttttttttttt
                 tccgtttttttcctccgcttttt
                 cCCTCCGCTTTTTcctccgctttttcctccgcttttt
SUC4             cc[...]ccgcatttt

In capital letters: one copy of the 4 times repeated 12-nucleotide sequence element.

Discussion

Frequency of gap formation

We found a total of 40 gaps in the noncoding region of the yeast invertase gene sequences and no gaps within the coding region. The numbers of base substitutions were 90 in the noncoding region and 132 in the coding region (Hohmann and Gozalbo, 1989). Thus, base substitutions seem to be the relatively more frequent event even in the noncoding region, where the selective pressure to maintain the sequence should be much weaker than in the coding region. Molecular analysis of spontaneous mutations in the lacI gene from Escherichia coli (Farabaugh et al., 1978; Schaaper et al., 1986) or the E. coli gpt gene stably transformed into hamster cells (Tindall and Stankowski, 1989) identified deletions or insertions as the predominant events. These authors, however, studied mutations inactivating the product of the respective gene, while the yeast invertase genes compared here are functional (Hohmann and Zimmermann, 1986). A significant portion of the spontaneous mutations that actually occur should be silent base substitutions, while most deletions or insertions affect the gene product. Therefore, screening for mutants will overestimate the frequency of deletions or insertions, while in our comparison of closely related active genes the number of nucleotide substitutions should be an overestimate relative to the number of deletions and insertions. Even in the noncoding region of the yeast SUC genes many possible deletions or insertions should be selected against during evolution if they affect the synthesis of the mRNA. We conclude that the proportion of about 1:2 for deletions and insertions relative to base substitutions (40:90) may therefore be a minimum estimate for the spontaneous mutations that actually occurred.
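The arithmetic behind this estimate can be checked directly from the counts quoted above. The following is only a minimal sketch in Python; the function name `indel_to_substitution_ratio` and the constant names are illustrative assumptions and not part of the original analysis.

```python
from fractions import Fraction

# Counts quoted in the text for the noncoding and coding regions.
GAPS_NONCODING = 40
SUBSTITUTIONS_NONCODING = 90
GAPS_CODING = 0
SUBSTITUTIONS_CODING = 132


def indel_to_substitution_ratio(gaps: int, substitutions: int) -> Fraction:
    """Return gaps per base substitution as an exact fraction."""
    if substitutions <= 0:
        raise ValueError("the number of substitutions must be positive")
    return Fraction(gaps, substitutions)


noncoding = indel_to_substitution_ratio(GAPS_NONCODING, SUBSTITUTIONS_NONCODING)
coding = indel_to_substitution_ratio(GAPS_CODING, SUBSTITUTIONS_CODING)

print(f"noncoding: {noncoding} (about {float(noncoding):.2f})")
print(f"coding:    {coding}")
```

The noncoding ratio reduces to 4/9, or roughly 0.44, which corresponds to the proportion of about 1 to 2 quoted in the text.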

Group I gaps

Although we do not know which of the invertase genes should be regarded as "wild-type" (or "ancestral type") and in which gene the genetic event changing the nucleotide sequence occurred (it may actually be SUC1 in one case and SUC4 in another), we interpret the Group I gaps as possible deletions. The approx. 20 missing nucleotides do not show any homology to the sequences flanking the gap, which otherwise would indicate a possible duplication. All 4 gaps are characterized by flanking repeated sequences of 6 bases. There is evidence from a variety of reports on the molecular basis of spontaneous mutations that such repeated sequences trigger deletions (Farabaugh et al., 1978; Albertini et al., 1982; de Zamaroczy et al., 1983; Nalbantoglu et al., 1986; Schaaper et al., 1986; Ahne et al., 1988). It was thought that a slipped mispairing of these repeated sequences and the looping out of the intermediate sequences during DNA replication could be the basis for this phenomenon (Streisinger et al., 1966; Schaaper et al., 1986; Ahne et al., 1988). Palindromic sequences also seem to promote the creation of deletions (Glickman and Ripley, 1984; Golding and Glickman, 1985).

Only the first of the 4 Group I gaps shows a perfect repeat of 6 bases. The third and fourth gaps are flanked by monotonous sequences with single-nucleotide differences, which may be due to a substitution that occurred after the deletion. The second gap seems to be more complex. There is a perfect repeat of the sequence TTGGTG in SUC1 and the closely related sequence TTGaTG in SUC4. Furthermore, there is a second small gap in SUC4, which is missing one copy of the small repeat TGTG (a gap of the Group II type).
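The relation between a Group I gap and its flanking direct repeat can be illustrated with a short Python sketch. It assumes that the two sequences differ by exactly one gap; the function names `locate_gap` and `flanking_repeat` are hypothetical, and the example sequences are transcribed from the first entry of Table 1. The gap is placed as far to the right as the shared prefix allows, so any direct repeat ends up immediately in front of it.

```python
from typing import Tuple


def locate_gap(ungapped: str, gapped: str) -> Tuple[int, int]:
    """Return (start, length) of the segment of `ungapped` missing in `gapped`.

    Assumes the two sequences differ by a single gap only.
    """
    u, g = ungapped.upper(), gapped.upper()
    length = len(u) - len(g)
    if length <= 0:
        raise ValueError("the ungapped sequence must be the longer one")
    start = 0
    while start < len(g) and u[start] == g[start]:
        start += 1
    if u[start + length:] != g[start:]:
        raise ValueError("sequences differ by more than a single gap")
    return start, length


def flanking_repeat(ungapped: str, gapped: str) -> Tuple[str, int]:
    """Return (repeat, gap_length) for a gap flanked by a direct repeat.

    The repeat is the longest stretch directly in front of the gap that is
    identical to the end of the missing segment.
    """
    start, length = locate_gap(ungapped, gapped)
    u = ungapped.upper()
    k = 0
    while k < start and k < length and u[start - 1 - k] == u[start + length - 1 - k]:
        k += 1
    return u[start - k:start], length


# First Group I gap of Table 1 (SUC4 ungapped, SUC1 gapped).
suc4 = "aaaAAAAGAggacacaacgatagtAAAAGAcatctaaaa"
suc1 = "aaaAAAAGAcatctaaaa"
repeat, gap_length = flanking_repeat(suc4, suc1)
print(repeat, gap_length)
```

For this pair the sketch reports the 6-base repeat AAAAGA and a gap of 21 nucleotides, that is, the 15 bases between the repeats plus one copy of the repeat.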

Group II gaps

The Group II gaps (Table 2) are much smaller and most of them have shorter repeats than the gaps of Group I. With just one exception, only one copy of the repeated sequence is missing and no bases between the repeats were deleted. The exception is the second gap, where one T between the repeats is missing. As there are 3 consecutive T residues in this position, this may be interpreted as an additional mutation similar to a Group III event (Table 3). The sequence properties make it impossible to classify the Group II gaps as deletions or small duplications. This is true especially for the fourth and sixth gaps (positions -130 and +1715 for SUC1), where the sequence motif is repeated 3 or 4 times, respectively. The seventh gap (position +2047 for SUC1) seems to be more complex and may be due to two different events.

Group III gaps

The Group III gaps (Table 3) are very short, between 1 and 3 bases, and they are missing bases in monotonous sequences, that is, sequences of 2 or more consecutive identical nucleotides. Therefore, it is again difficult to identify these gaps as deletions or insertions. If the monotonous sequence is long, a slipped mispairing and the deletion of the looped-out sequence could be possible (Streisinger et al., 1966). Furthermore, DNA polymerase could have problems in "counting" consecutive bases and could either add or leave out one or a few of these bases.

Only the seventh gap (position -122 for SUC1) seems to be more complex and could be due to two close but independent events. The second of the two one-base gaps in that sequence could be due to the loss or addition of one thymine, and in SUC1 one of the 3 thymine residues was then substituted by a cytosine.
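The notion of a monotonous sequence used here (2 or more consecutive identical nucleotides) is easy to make concrete. The sketch below is an illustration only; the function name `homopolymer_runs` is an assumption, and the two example sequences are transcribed from the +1619 entry of Table 3.

```python
from itertools import groupby
from typing import List, Tuple


def homopolymer_runs(seq: str, min_len: int = 2) -> List[Tuple[int, str, int]]:
    """List (start, base, length) for every run of identical bases.

    Only runs of at least `min_len` bases are reported; case is ignored.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        n = len(list(group))
        if n >= min_len:
            runs.append((pos, base, n))
        pos += n
    return runs


# Group III example from Table 3: SUC1 and SUC4 at position +1619.
suc1_runs = homopolymer_runs("cTTTTTattttt")
suc4_runs = homopolymer_runs("cTTTTTTattttt")
print(suc1_runs)
print(suc4_runs)
```

Comparing the two lists shows that the first thymine run has 5 bases in SUC1 and 6 in SUC4, while the second run is unchanged, which is the kind of one-base length difference collected in Group III.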

Group IV gaps

The 4 gaps listed in Table 4 do not show any of the characteristics of the gaps in Groups I-III. We could not find any repeated or monotonous sequences or palindromes. Either such sequence properties were masked by secondary mutations or these gaps were created by a different, independent mechanism.

Group V gap

The Group V gap is much more complex. It contains 23 consecutive thymines and a 4 times repeated 12-nucleotide motif. Due to these properties the gap could be interpreted as the result either of duplications or multiplications by overreplication or of deletions of the repeated sequences following slipped mispairing during replication. Interestingly, the sequence element GCTTTTT of the 12-base motif is complementary to CGAAAAA at the opposite end of the gap. The entire sequence or parts of it could therefore loop out (because of the repeated sequence, alternative loops are possible). This loop could either be deleted or serve as a template for overreplication. It is possible that this gap is due to multiple genetic events.

Concluding remarks

36 out of 40 gaps in the sequence alignment of the noncoding regions of the yeast invertase genes show repeated or monotonous sequences flanking the gap. As has been found by others (e.g. Farabaugh et al., 1978; Schaaper et al., 1986; Ahne et al., 1988), such sequences promote spontaneous deletion or insertion mutations. The comparison of the invertase genes revealed that spontaneous mutations changing the number of nucleotides in a DNA sequence are at least half as frequent as base substitutions.
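The grouping criteria used in this comparison can be summarized as a small decision procedure. The Python sketch below is a simplified heuristic and not the procedure used for the paper, where the sequences were screened by visual inspection. It assumes that the two sequences differ by a single gap; the function name `classify_gap`, the order of the checks and the exact thresholds are assumptions, and complex cases such as the Group V gap are simply reported as unclassified. The first two example pairs are transcribed from Tables 1 and 3; the last two are hypothetical sequences made up for illustration.

```python
def classify_gap(ungapped: str, gapped: str) -> str:
    """Assign a single gap to Group I-IV using simplified criteria."""
    u, g = ungapped.upper(), gapped.upper()
    length = len(u) - len(g)
    if length <= 0:
        raise ValueError("the ungapped sequence must be the longer one")
    # Place the gap as far to the right as the shared prefix allows.
    start = 0
    while start < len(g) and u[start] == g[start]:
        start += 1
    if u[start + length:] != g[start:]:
        raise ValueError("sequences differ by more than a single gap")
    missing = u[start:start + length]
    # k = number of bases in front of the gap that repeat the end of the
    # missing segment (capped at the gap length).
    k = 0
    while k < start and k < length and u[start - 1 - k] == u[start + length - 1 - k]:
        k += 1
    if len(set(missing)) == 1 and k >= 1:
        return "III"  # bases missing from a monotonous run
    if k == length and 2 <= length <= 6:
        return "II"   # one copy of a small tandem repeat is missing
    if 6 <= k < length:
        return "I"    # spacer plus one copy of a flanking repeat is missing
    if length == 1:
        return "IV"   # single base, no repeat or monotonous context
    return "unclassified"


examples = [
    # (ungapped, gapped)
    ("aaaAAAAGAggacacaacgatagtAAAAGAcatctaaaa", "aaaAAAAGAcatctaaaa"),  # Table 1
    ("gtgacGGaatgta", "gtgacGaatgta"),                                    # Table 3
    ("ccaGATGATtcc", "ccaGATtcc"),                                        # hypothetical
    ("acgtcagtac", "acgtagtac"),                                          # hypothetical
]
groups = [classify_gap(u, g) for u, g in examples]
print(groups)
```

Checking the monotonous case before the tandem-repeat case is a design choice of this sketch: a missing base inside a run of identical bases would otherwise also qualify as a missing copy of a one-base repeat.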


Acknowledgment

We thank F.K. Zimmermann for reading the manuscript.

References

Ahne, A., J. Müller-Derlich, A.M. Merlos-Lange, F. Kanbay, K. Wolf and B.F. Lang (1988) Two distinct mechanisms for deletion in mitochondrial DNA of Schizosaccharomyces pombe mutator strains, Slipped mispairing mediated by direct repeats and erroneous intron splicing, J. Mol. Biol., 202, 725-734.

Albertini, A.M., M. Hofer, M.P. Calos and J.H. Miller (1982) On the formation of spontaneous deletions: The importance of short sequence homologies in the generation of large deletions, Cell, 29, 319-328.

Carlson, M., B.C. Osmond and D. Botstein (1980) SUC genes in yeast: A dispersed gene family, Cold Spring Harbor Symp. Quant. Biol., 45, 799-803.

Farabaugh, P.J., U. Schmeissner, M. Hofer and J.H. Miller (1978) Genetic studies of the lac repressor, VII. On the molecular nature of spontaneous hot spots in the lacI gene of Escherichia coli, J. Mol. Biol., 126, 847-863.

Glickman, B.W., and L.S. Ripley (1984) Structural intermediates of deletion mutagenesis: A role for palindromic DNA, Proc. Natl. Acad. Sci. (U.S.A.), 81, 512-516.

Golding, G.B., and B.W. Glickman (1985) Sequence-directed mutagenesis: Evidence from a phylogenetic history of human alpha-interferon genes, Proc. Natl. Acad. Sci. (U.S.A.), 82, 8577-8581.

Grossmann, M.K., and F.K. Zimmermann (1979) The structural genes of internal invertases in Saccharomyces cerevisiae, Mol. Gen. Genet., 175, 223-229.

Hohmann, S., and F.K. Zimmermann (1986) Cloning and expression on a multicopy vector of five invertase genes of Saccharomyces cerevisiae, Curr. Genet., 11, 217-225.

Hohmann, S., and D. Gozalbo (1988) Structural analysis of the 5' regions of yeast SUC genes revealed analogous palindromes in SUC, MAL and GAL, Mol. Gen. Genet., 211, 446-454.

Hohmann, S., and D. Gozalbo (1989) Comparison of the nucleotide sequences of a yeast gene family, I. Distribution and spectrum of spontaneous base substitutions, Mutation Res., in press.

Nalbantoglu, J., D. Hartley, G. Phear, G. Tear and M. Meuth (1986) Spontaneous deletion formation at the aprt locus of hamster cells: the presence of short sequence homologies and dyad symmetries at deletion termini, EMBO J., 5, 1199-1204.

Parets-Soler, A. (1989) Base substitutions in the 5' noncoding regions of two naturally occurring yeast invertase structural SUC genes cause strong differences in specific invertase activities, Curr. Genet., 15, 299-302.

Sarokin, L., and M. Carlson (1985) Comparison of 2 yeast invertase genes: conservation of the upstream regulation site, Nucleic Acids Res., 13, 6089-6113.

Schaaper, R.M., B.N. Danforth and B.W. Glickman (1986) Mechanism of spontaneous mutagenesis: An analysis of the spectrum of spontaneous mutations in the Escherichia coli lacI gene, J. Mol. Biol., 189, 273-284.

Streisinger, G., Y. Okada, J. Emrich, J. Newton, A. Tsugita, E. Terzaghi and M. Inouye (1966) Frameshift mutations and the genetic code, Cold Spring Harbor Symp. Quant. Biol., 31, 77-84.

Tindall, K.R., and L.F. Stankowski Jr. (1989) Molecular analysis of spontaneous mutations at the gpt locus in Chinese hamster ovary (AS52) cells, Mutation Res., 220, 241-253.

Zamaroczy de, M., G. Faugeron-Fonty and G. Bernardi (1983) Excision sequences in the mitochondrial genome of yeast, Gene, 21, 193-202.