Two intron sequences in yeast mitochondrial COX1 gene: Homology among URF-containing introns and strain-dependent variation in flanking exons


Cell, Vol. 32, 379-389, February 1983, Copyright © 1983 by MIT



Two Intron Sequences in Yeast Mitochondrial COX1 Gene: Homology among URF-Containing Introns and Strain-Dependent Variation in Flanking Exons

Lambert A. M. Hensgens, Linda Bonen, Muus de Haan, Gerda van der Horst and Leslie A. Grivell
Section for Molecular Biology, Laboratory of Biochemistry, University of Amsterdam, Kruislaan 318, 1098 SM Amsterdam, The Netherlands

Summary

The DNA sequences of two optional introns in the gene for subunit I of cytochrome c oxidase in yeast mitochondrial DNA have been determined. Both contain long unassigned reading frames (URFs). These display regions of amino acid homology with six other URFs, two of which encode proteins involved in mitochondrial RNA splicing. Such conserved regions may thus define functionally important domains of proteins involved in RNA processing. This homology also implies that these URFs had a common ancestral sequence, which has been duplicated and dispersed around the genome. Comparison of the flanking exons in the "long" strain KL14-4A with their unsplit counterpart in D273-10B reveals clustered sequence differences, which lead in D273-10B to codons rarely used in exons. These differences may be linked to the loss or absence of one of the optional introns.

Introduction

The yeast mitochondrial genome displays a number of unusual features, one of which is the occurrence of optional introns in the genes for the large ribosomal RNA (rRNA), apocytochrome b, and subunit I of cytochrome c oxidase (for reviews see Borst and Grivell, 1978; Dujon, 1981). Several of these introns contain long unassigned reading frames (URFs) (Nobrega and Tzagoloff, 1980; Lazowska et al., 1980; Bonitz et al., 1980a). For two of them, located in the apocytochrome b gene, convincing evidence has been given for the synthesis of the proteins they encode and for a role of these proteins in RNA processing (Lazowska et al., 1980; De La Salle et al., 1982; Weiss-Brummer et al., 1982; Van Ommen et al., 1980; Kreike et al., 1979; Alexander et al., 1980; Bechmann et al., 1981; Church et al., 1979; Mahler et al., 1982). Within the gene for subunit I of cytochrome c oxidase, several intronic URFs are present, and the fact that they have been conserved in evolution strongly suggests that they have a function. This function could be the catalysis of RNA processing (Grivell et al., 1982; Hensgens et al., 1982).
First, complete processing of the transcripts of the subunit I gene is dependent on active mitochondrial translation. Second, studies with splicing-deficient mit⁻ mutants have implicated intron sequences, or their translation products, in splicing events. However, direct evidence that any of these URFs is translated into protein is lacking, and the alternative suggestion has been made that both their evolutionary conservation and the involvement of intron sequences in splicing can be explained on the basis of a strict coupling between translation and splicing (Borst, 1981; Schmelzer and Schweyen, 1982; De La Salle et al., 1982). That is, movement of the ribosome may be required to induce an RNA secondary structure favorable to processing.

In an attempt to distinguish between these possibilities, we have carried out DNA sequence analysis of two introns in the gene for subunit I of cytochrome c oxidase, which are absent from the version of the gene sequenced previously (Bonitz et al., 1980a). Our results show that both introns contain open reading frames. They contain regions of homology with other mitochondrial URFs, and at least part of this homology may be attributable to constraints on protein structure. We suggest, therefore, that such regions of homology define functionally important domains of proteins involved in RNA processing. In addition, detailed analysis of both the introns and the sequences flanking them has uncovered several unexpected features that throw light on the evolution of mitochondrial introns and of the genes they interrupt.

Results

Structure of the Gene for Subunit I of Cytochrome c Oxidase in Two Related Wild-Type Strains

The structure of the gene for subunit I of cytochrome c oxidase in the Saccharomyces cerevisiae strain D273-10B, sequenced by Bonitz et al. (1980a), is shown in Figure 1. The gene in this strain contains 7-8 exons, identified by their homology to the human protein (Anderson et al., 1981). The first four introns contain long reading frames in phase with their upstream exons. In a previous study (Hensgens et al., 1982) we showed that in S. cerevisiae KL14-4A, the gene has a more complex organization in that it contains additional sequences (insertions IIA and IIB), which form two extra introns, splitting exon A5 into three parts (see Figure 1).

Sequencing Strategy

We have recently described the cloning and characterization of the subunit I gene in S. cerevisiae KL14-4A (Hensgens et al., 1982). Cloned Mbo I and Bcl I fragments were used to construct a detailed restriction map of the region containing the extra introns, and these cloned fragments were used as a source of Mbo I, Eco RI*, Alu I and Hinf I fragments for recloning in bacteriophage M13. The DNA sequence was determined by the chain-termination technique of Sanger et al. (1977; 1980) according to the strategy outlined in Figure 2.

Figure 1. Gene for Subunit I of Cytochrome c Oxidase in Yeast mtDNA
Solid bars: exon sequences. Open bars: URFs within introns. Structure of the gene in D273-10B (9979 bp) is based on the DNA sequence determined by Bonitz et al. (1980a); that for KL14-4A is based on data presented by Hensgens et al. (1982) and on the nucleotide sequence (this report). The predicted length of the gene in this strain is thus 12882 bp.

Figure 2. Strategy of DNA Sequence Analysis
Base of each arrow shows the point at which sequence analysis was initiated; its length indicates the number of bases read with confidence. Recognition sites for the restriction endonucleases Mbo I, Hinf I, Alu I and Eco RI* are marked.

The sequence is shown in Figure 3, and it confirms the fine restriction map of this region (Hensgens et al., 1982).

Structure of the Extra DNA Sequence in the "Long" Form of the Subunit I Gene

Comparison of our sequence (Figure 3) with that determined for S. cerevisiae D273-10B confirms that exon A5 is split into three parts in KL14-4A. We designate these A5α, A5β and A5γ, and they have lengths of 252, 135 and 24 bp, respectively. The two introns separating these exon segments are 1368 and 1535 bp long. Both contain long open reading frames. That in intron aI5α is in phase with exon A5α and is 822 bp long. The aI5β URF is 996 bp long and is separated from the upstream exon by a 340 bp region containing multiple translational stops in all three reading frames. Both of these long open reading frames resemble other mitochondrial URFs in that the proteins they potentially encode are both basic and hydrophobic in character. The GC contents are very low (15.6% and 16.8%, respectively), and as also noted previously for other URFs, about 61% of the codons used are pure AU, leading to an abundance of Phe, Leu, Ile, Tyr, Asn and Lys residues. The reading frame of aI5β is unusual in that it neither begins with nor contains an AUG codon.

A computer search of the newly sequenced introns shows that both contain elements of potential secondary structure, which may be part of more extensive structures similar to those described by Michel et al. (1982). Intron aI5β in particular is unusually rich in inverted repeats ranging in length from 10 to 21 nucleotides, and the positions of these are shown in Figure 3. As in several other introns (Michel et al., 1982), two such repeats (designated a in the figure) are located close to the ends of the reading frame, and these could potentially form a stem-and-loop structure in which the bulk of the URF lies within the loop.
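The base-composition figures quoted for the two URFs (GC content and the share of codons made up only of A and U) are simple to compute from a reading frame. The following is a minimal sketch in Python, not the authors' original computer analysis; the function names and the short example sequence are hypothetical and serve only as illustration. Sequences are written as DNA, so "pure AU" codons appear as codons containing only A and T.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def pure_at_codon_percentage(orf: str) -> float:
    """Return the percentage of codons consisting only of A and T.

    The reading frame is assumed to start at the first base; a trailing
    partial codon, if any, is ignored.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    pure = sum(1 for codon in codons if set(codon) <= {"A", "T"})
    return 100.0 * pure / len(codons)


# Hypothetical AT-rich example, not taken from the paper.
example = "ATTAATAAATTCTTAGGTATA"
print(f"GC content: {gc_content(example):.1f}%")
print(f"Pure A/T codons: {pure_at_codon_percentage(example):.1f}%")
```

Applied to the full aI5α and aI5β reading frames of Figure 3, the same two functions would be expected to give values close to those quoted in the text.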

How Is the Reading Frame in Intron aI5β Expressed?

Since the aI5β URF is not contiguous with an upstream exon and completely lacks AUG codons, it is not obvious how translation could be achieved. Three possibilities can be envisaged. First, translation may be initiated not at an AUG codon, but at either GUG or UUG, as sometimes observed in bacteria (Fiers et al., 1975; Young et al., 1981), or at AUA, AUU or AUC codons, as in animal mitochondria (Anderson et al., 1981; Bibb et al., 1981). Second, the URF could be opened up by a splicing step in which the untranslatable region between exon and URF is removed. Third, the same situation could apply, but analogous to the case in the first two introns of the apocytochrome b gene, a short exon could be embedded in the URF, necessitating a two-step splicing event. This proposal does not necessarily imply a difference in protein sequence between D273-10B and KL14-4A at this point. Examination of the DNA sequence (Figure 4) reveals homology between a region at the 5' end of the URF (positions 1902-1911) and the 5' portion of exon A5γ, which results in an identical stretch of three amino acids (Thr-Tyr-Tyr; TYY) in both the URF and exon A5γ. This could conceivably result in a new exon of a maximum of nine nucleotides, if the following splice is made to a correspondingly shorter exon A5γ.

The first of these alternatives is difficult to verify in the absence of protein sequence data. We therefore examined the possibility that splicing of intron aI5β occurs in two stages, by sequence analysis of the mRNA in this region. To do this, cDNA synthesis was primed from a point in exon A6, by use of a synthetic DNA oligonucleotide complementary to the mRNA, and the sequence was determined by a modification of the Sanger chain-termination technique (Sanger et al., 1977; Sures et al., 1980). Both strains gave identical sequences in the crucial region (data not shown), demonstrating that if two-step splicing does occur in KL14-4A, both splicing steps involve the same site within the URF, or lie within the region identical in nucleotide sequence to the 5' end of exon A5γ.

The Sequence of Exons A5α and A5β in KL14-4A Differs from That of Exon A5 in D273-10B

Comparison of exons A5α, A5β and A5γ in KL14-4A with their unsplit counterpart in D273-10B reveals a high level of sequence variation (Figure 5). Eleven differences are found in a stretch of 84 nucleotides. Most of these are only third-position changes, but two, at positions -72 and -71 in KL14-4A, are first- and second-site mutations, which change the threonine codon ACT to CTT. In yeast mitochondria that codon also specifies threonine. The single amino acid difference observed between the two strains converts the tyrosine at position 8469 in D273-10B to a histidine residue, which is also found at this position in the human, mouse and beef proteins (Bibb et al., 1981; Anderson et al., 1982).

This clustering of sequence differences is unexpected. Only five other differences have been observed in a total of 2344 bp of additional sequences common to the two strains (Figure 5). These sequences include segments of aI3, aI4, aI5γ (= aI5 in D273-10B), A4 and A6. First, as also reported by Netter et al. (1982) for S. cerevisiae 777-3A, a stretch of three As at position 7848 within the aI4 reading frame is changed into three Ts in KL14-4A. Second, a third-position C to A change is found at position 9558 (D273-10B numbering) within exon A6. Third, in aI5γ at position 3063 (= 8621 in D273-10B), three nucleotides from the exon border, an A is changed into T. This nucleotide change leads to the loss of the Taq I site in KL14-4A. It is also of interest that the two strains differ in their steady-state concentrations of the excised intron aI5γ, which accumulates as a circular 11S RNA (Arnberg et al., 1980; Hensgens et al., 1982). This nucleotide sequence may thus be important for circularization and/or stability of the spliced intron. The high level of sequence variation within exons A5α and A5β is unusual and may well be related to the gain or loss of intron aI5α (see Discussion).

Figure 3. Nucleotide Sequence of the Introns aI5α and aI5β, with Flanking Exons
The sequence shown is that of the nontranscribed strand, beginning from position 8169 in the sequence of D273-10B, which is just upstream of exon A5 (Bonitz et al., 1980a). Amino acid assignments have been made by the yeast mitochondrial genetic code (Bonitz et al., 1980b), with the modification that ATA is translated as methionine (see Hudspeth et al., 1982). Intron-coded protein sequences are lowercase. Underlined sequences indicate inverted repeats longer than 10 bp that occur within a single intron.
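The amino acid assignments used in this report depend on the yeast mitochondrial genetic code, which departs from the standard code: CTN codons specify threonine, TGA specifies tryptophan, and ATA is read as methionine. The sketch below shows one way to apply these rules in Python. It is an illustration only, not the software used in this study; the table-building helper and variable names are hypothetical, and the two test segments are the KL14-4A exon A5 sequences as printed in Figure 5.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
STANDARD_AA = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


def build_yeast_mito_table() -> dict:
    """Build a codon table with the yeast mitochondrial reassignments."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    table = dict(zip(codons, STANDARD_AA))
    table["TGA"] = "W"            # TGA codes for Trp instead of stop
    table["ATA"] = "M"            # ATA is read as Met
    for third in BASES:           # all four CTN codons code for Thr
        table["CT" + third] = "T"
    return table


YEAST_MITO = build_yeast_mito_table()


def translate(dna: str) -> str:
    """Translate a DNA sequence (spaces allowed) in frame 1."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(YEAST_MITO[dna[i:i + 3]] for i in range(0, usable, 3))


# KL14-4A exon A5 segments as printed in Figure 5.
segment_1 = ("GAT GCA GAT CTT AGA GCA TAT TTC CTA TCT "
             "GCA CTA ATG ATT ATT GCA ATT CCA ACA")
segment_2 = ("GGA ATT AAA ATT TTC TCA TGA TTA GCT CTA "
             "ATC CAT GGT GGT TCA")

print(translate(segment_1))
print(translate(segment_2))
```

With the standard code the same segments would contain leucine in place of several threonines and would terminate at the TGA codon, which is why the choice of table matters when judging whether a frame is open.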

Figure 4. Sequence Homology between Exon A5γ and the 5'-Terminal Region of the aI5β URF Creates the Possibility of a Two-Step Splice
See text for explanation.

Figure 5. Strain-Dependent Sequence Variation in Exons A5α and A5β

D A D T R A Y F T S A T M I I A I P T
GAT GCA GAT CTT AGA GCA TAT TTC CTA TCT GCA CTA ATG ATT ATT GCA ATT CCA ACA

G I K I F S W L A T I H G G S
GGA ATT AAA ATT TTC TCA TGA TTA GCT CTA ATC CAT GGT GGT TCA

The sequence shown is that of KL14-4A. Exon A5 in D273-10B is identical, with the exception of the changes noted. Additional sequences determined for KL14-4A were 6503-6625, 6645-7579, 7630-8353, 8640-8810, 8822-9018 and 9525-9724 (numbering according to Bonitz et al., 1980a).

Discussion

Features of the Newly Sequenced, URF-Containing Introns

The previously characterized URFs within the introns of the apocytochrome b and subunit I genes share several features (Nobrega and Tzagoloff, 1980; Lazowska et al., 1980; Bonitz et al., 1980a). All potentially code for proteins rich in basic and hydrophobic residues. Their coding sequences are contiguous and in phase with upstream exons, although they display somewhat different codon usage. Several show extensive sequence homology (Bonitz et al., 1980a), and computer analysis shows that this homology extends to rather striking similarities in potential RNA secondary structure (Michel et al., 1982). Finally, the intronic sequences following the URFs are not translatable, although the downstream exons are usually preceded by short open frames. The intronic URF in the 21S rRNA gene differs only in that it begins with AUG and is flanked by blocked frames extending in both directions to the rRNA exon sequences.

The newly sequenced introns aI5α and aI5β show all the characteristics cited above, with the exception that unless it is preceded by an extremely short exon, the aI5β URF is not continuous with an upstream exon. Furthermore, it completely lacks AUG codons. Expression of this URF may be achieved either by use of an initiation codon other than AUG, or by a two-step splicing event which links it to a preceding exon. RNA sequence analysis across the A5α/A5β splice junction shows, however, that if two-step splicing is involved, it can occur only within a stretch of five nucleotides held in common between the start of the URF and exon A5γ, so that the final mRNA carries no detectable trace of the intermediate splice.

Homologies between Introns

The LAGLI-DADG Region

We have compared the reading frames of aI5α and aI5β with previously sequenced URFs. These comprised the following groups:

- URFs having similar coding lengths of approximately 1 kb, that is, aI3 and aI4 of the subunit I gene (Bonitz et al., 1980a); bI2 and bI4 of the gene for cytochrome b (Lazowska et al., 1980; Nobrega and Tzagoloff, 1980); the 21S rRNA URF (rI) (Dujon, 1980) and the nonintronic URF downstream of the gene for subunit II of cytochrome c oxidase (Coruzzi et al., 1981). The structures of these URFs and their flanking regions are shown in Figure 6. Within the central region of each we found a stretch of approximately 115 amino acids (designated LAGLI-DADG), which shows considerable homology at both amino acid and nucleotide levels.
These sequences were aligned to maximize amino acid identities, or conservative substitutions, with a minimum number of insertions and deletions (Figure 7).

- aI1 and aI2 of the subunit I gene (Bonitz et al., 1980a). These lack analogous blocks of homology and have not been considered further.

- var1 (Hudspeth et al., 1982), which encodes a protein associated with mitochondrial ribosomes. This exhibits an internal stretch of homology (35% identity within a region of 40 amino acids) near the region designated block 5 in Figure 7, but since other blocks conserved among URFs in the first group described could not be detected, it was not included in the analysis.

- A series of short, nonintronic URFs, located by a computer search of published sequences of yeast mtDNA. Of these, an open frame following the gene for subunit III of cytochrome c oxidase (Thalenfeld and Tzagoloff, 1980) shows an overall homology of 30% with the LAGLI-DADG region of the URF downstream of the subunit II gene (Coruzzi et al., 1981); and in the vicinity of oli2, a short URF (Macino and Tzagoloff, 1980) displays homology with the LAGLI region of other URFs.

Figure 6. Conserved Sequence Elements in Yeast Mitochondrial URFs
The figure compares mitochondrial URFs with an approximate coding length of 1 kb, located within genes for apocytochrome b (bI2 and bI4), subunit I of cytochrome c oxidase (aI3, aI4, aI5α and aI5β), the 21S rRNA gene (rI) and the nonintronic URF following the gene for subunit II of cytochrome c oxidase. Exon sequences are indicated by solid bars; URF sequences by open bars. The crosshatched region represents a segment of approximately 300 bp which shows conserved features at the protein level. This stretch is flanked by distinctive nonapeptide motifs, which are designated in abbreviated form as LAGLI and DADG (one-letter code for amino acids). Triangle and asterisk indicate the positions of sequence elements identical with, or closely resembling, the cis-dominant box9 and box2 "signal" sequences described by De La Salle et al. (1982) and Weiss-Brummer et al. (1982).

Pairwise homologies of the amino acid sequences for the LAGLI-DADG regions of the eight URFs in the first group are tabulated in Table 1. The URFs in bI4 and aI4, whose relatedness was first observed by Bonitz et al. (1980a), show 80% homology at the amino acid level over this region, while aI5β and aI3 share 38% identity (with three insertions), and aI5α and bI2 have 23% homology (with five insertions). The nonintronic URF shows 26% homology with the intronic URF aI4. The two new URFs, aI5α and aI5β, do not share a detectable level of overall relatedness (11% homology). Although the homology between two random sequences is 10%-12% (allowing no insertions), amino acid identity between proteins known to have a common ancestor may also be extremely low (Paracoccus denitrificans cytochrome c550 versus Pseudomonas denitrificans cytochrome c551 = 18% homology with eight insertions; see Dickerson, 1980).

The conserved region is bounded in each case by a distinctive nine amino acid stretch of the following consensus composition (see Figure 7, blocks 1 and 6):

(apolar)2 Gly (apolar)2 (…) (…) Asp-Gly

For example, in aI4 these domains are LAGLIDGDG and FIGFFDADG.
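The pairwise percentages discussed above are identity scores over aligned sequences. As an illustration of the arithmetic, the following Python sketch computes percent identity between two pre-aligned sequences. It is not the procedure used to produce Table 1: the function name is hypothetical, and the treatment of gaps (columns containing a gap character are skipped) is an assumption, since the paper does not state how insertions were scored. The example uses the two aI4 nonapeptides quoted in the text.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: alignment columns in which either sequence carries a
    gap are excluded from both the numerator and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(seq_a, seq_b):
        if residue_a == gap or residue_b == gap:
            continue
        compared += 1
        if residue_a == residue_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# The two boundary nonapeptides of the aI4 URF quoted in the text.
lagli = "LAGLIDGDG"
dadg = "FIGFFDADG"
print(f"{percent_identity(lagli, dadg):.1f}% identity")
```

The two motifs share glycine at position 3 and the Asp, Asp-Gly positions near the C-terminal end, which is consistent with the consensus given above.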

Figure 7. Alignment of the LAGLI-DADG Region in Yeast Mitochondrial URFs
Amino acid sequences were aligned to maximize identities or conservative substitutions, with a minimum number of insertions/deletions. Blocks 2-5 indicate regions with distinctive characteristics, such as high densities of charged amino acids (see acidic block 3 and basic blocks 4 and 5) or residues disrupting helical structures (see blocks 4 and 5). Within blocks 1-6 circles denote acidic residues, hexagons denote basic residues and diamonds denote α-helix breakers. Start points for each sequence are as follows, with the numbering system of the original publications: aI3 5300; aI4 7459 (Bonitz et al., 1980a); aI5α 307, aI5β 2184 (this report); bI2 -594 (Lazowska et al., 1980); bI4 1222 (Nobrega and Tzagoloff, 1980); rI -797 (Dujon, 1980); oxi1 region URF 1001 (Coruzzi et al., 1981).

Table 1. Amino Acid Homologies in the LAGLI-DADG Region of Yeast Mitochondrial URFs

              aI4   bI4   aI3   aI5β   oxi1 region   bI2   rI
bI4           80
aI3           21    22
aI5β          25    26    38
oxi1 region   24    26    12    14
bI2           13    17    14    16     14
rI            14    18    13    12     8             29
aI5α          9     12    8     11     10            23    21

Numbers represent percentage of identity.

Figure 8. URF Families in Yeast mtDNA
The dendrogram is a schematic representation of the data from pairwise comparisons of mitochondrial URFs in group I (Table 1), constructed according to Anderberg (1973).

Several other regions (Figure 7, blocks 2-5) also show conserved characteristics; for example, high frequencies of charged amino acids, or residues disrupting helical secondary structure, presumably reflecting additional constraints imposed by the function(s) of the URF gene products. The highly charged nature of these sequences is suggestive of globular, non-membrane-associated protein products.

The strong homology at both the amino acid and secondary structural levels implies not only similar functional constraints but also a common origin. The specific relationships derived from pairwise homology comparisons are represented schematically in the dendrogram in Figure 8. They suggest the existence of two families: one including the aI4, bI4, aI3 and aI5β and the nonintronic URFs; and the other including the bI2, aI5α and the rI URFs. Since the bI2 and bI4 URFs, representing each of these families, have been shown to encode proteins with mRNA maturase activity (Lazowska et al., 1980; Mahler et al., 1982; De La Salle et al., 1982; Weiss-Brummer et al., 1982), we believe that all URFs may have the potential of encoding proteins which perform rather similar roles in RNA processing. In some cases these may have overlapping specificities (compare with the role of bI4 maturase in the processing of the pre-mRNA for oxidase subunit I). The regions showing conservation of amino acid sequence likely result from functional constraints, while other URF-distinctive domains may reflect specialization of the protein depending on its particular environment.


What the role in RNA processing might be in the case of the nonintronic URF downstream of the oxi1 gene and for the 21S rRNA URF is not clear. The former might be involved in the maturation of the mRNA for the oxi1 gene. The latter is not strictly essential for rRNA maturation, since petites lacking mitochondrial protein synthesis can still remove the 21S rRNA intron precisely (Tabak et al., 1981). Since the nuclear-encoded splicing machinery appears adequate, the contribution of the 21S rRNA URF gene product may be to increase the efficiency of the operation.

The Region Downstream from LAGLI-DADG

The region following the highly conserved LAGLI-DADG block (Figure 6) in the aI4 and bI4 URFs also shows considerable homology (approximately 47% at the amino acid level allowing two insertions). That this region of the protein has an important functional role is corroborated by genetic studies which demonstrated that some box7 mutations in bI4 (De La Salle et al., 1982; Mahler et al., 1982; Weiss-Brummer et al., 1982) and many box3 mutations in bI2 (Jacq et al., 1982) map here. A comparison of the aI3 and aI5β sequences reveals only a short stretch of high homology (9/11 identical amino acids) at a distance of approximately 35 residues from the end of the aI5β URF. Again, this might reflect a certain degree of specialization of these proteins.

Figure 9. Conservation of box9-Like and box2-Like Elements in URF-Containing Introns
Homologies with the bI4 box9 and box2 nucleotide sequences are boxed. Distances to the downstream exons for bI4, aI4, aI3, aI5α; to the downstream URF for aI5β; and to the stop codon for the oxi1 region URF are indicated.

The Region Upstream from LAGLI-DADG

In contrast, the region preceding the LAGLI-DADG block is much less conserved than the two regions described above, with radical differences in both sequence and length. Thus even between the aI4 and bI4 URFs there is only 37% homology, and this requires introduction of four insertions. In aI5β this region even contains a GC cluster coding for a stretch of 18 amino acid residues. However, one important element known to map within this region in bI4 is the cis-acting box9 sequence (De La Salle et al., 1982; Weiss-Brummer et al., 1982), which is believed to be a splicing signal (De La Salle et al., 1982). This sequence, also observed in aI4, is not found at the analogous position in aI3, aI5α or aI5β. It is, however, present in the blocked frame of aI3, downstream from a GC cluster, and there it can potentially be brought into close proximity with another potential cis-acting element, a box2-like sequence, by a hairpin structure (see Figure 9). The box2 sequence in bI4 is also thought to be a splicing signal (De La Salle et al., 1982). It is located in the blocked frame close to the downstream exons in bI4, aI4, aI3 and rI.

In the two newly sequenced introns, box2-like and box9-like sequences are also present in the same physical orientation as observed in aI3, but in the case of aI5α at a distance of approximately 220 bp further from the downstream exon. In aI5β they are located in the untranslatable region, just upstream of the open reading frame (see Figure 6). However, if the aI5β URF is brought to expression by two-step splicing of the intron, then the box2-like and box9-like elements are in effect at the 3' end of the intron portion removed first, and this is a situation which is more in line with that in other introns. The consequence of a two-step splice is that the remainder of the intron will contain no sequences identifiable as box2 or box9 elements, and in this respect, it resembles intron bI2 (L. Bonen, M. de Haan and L. A. Grivell, unpublished observations). It is possible that in both these introns, other cis-acting elements are operative.

The occurrence of box2-like and box9-like elements at different locations within different introns suggests that they may have undergone very recent translocation to new sites and/or that their function is relatively independent of position within the intron. It is also interesting that although the box9 sequence is identically conserved in aI4 and bI4, there appears to have been an insertion/deletion immediately preceding it, causing a frameshift (and thereby resulting in a completely unrelated protein sequence in this region). These points all argue against this region (that is, upstream to the box9 sequence in bI4) having an important role in the active URF maturase.
It may also be significant that the nonintronic URF begins immediately with the LAGLI-DADG block, lacking all upstream sequences.

Sequence Variation in Exons A5α and A5β

The large number of sequence differences found in exons A5α and A5β, relative to A5 in D273-10B, is clearly unusual, since there is at least 99.7% conservation of sequences between the two strains both in other regions of the subunit I gene (this report) and in other genes for which comparisons are available (ATPase subunit 9, Hensgens et al., 1979; segments of the cytochrome b gene, M. de Haan and L. A. Grivell, unpublished results). This high level of similarity applies to both exon and intron sequences, irrespective of whether the latter are translatable or not. Furthermore, the clustering of changes in sequences flanking an intron, as seen here, is not found elsewhere in the genome, even though the two strains differ by the presence or absence of several other introns.
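The contrast between the two regions can be put in numbers using the counts reported in Results: eleven differences in an 84-nucleotide stretch of exon A5, against five differences in 2344 bp of other sequence common to the two strains. The short Python sketch below works through that arithmetic; the function and variable names are hypothetical and are introduced only for this illustration.

```python
def percent_conserved(differences: int, length: int) -> float:
    """Percentage of positions that are identical between two strains."""
    return 100.0 * (1 - differences / length)


# Counts taken from the Results section of this report.
exon_a5_stretch = percent_conserved(11, 84)     # clustered region in exon A5
other_sequences = percent_conserved(5, 2344)    # remaining shared sequence

# Ratio of the difference densities in the two regions.
fold_enrichment = (11 / 84) / (5 / 2344)

print(f"Exon A5 stretch: {exon_a5_stretch:.1f}% conserved")
print(f"Other sequences: {other_sequences:.2f}% conserved")
print(f"Difference density is about {fold_enrichment:.0f}-fold higher in exon A5")
```

The second figure is in line with the "at least 99.7%" conservation cited for sequences outside the A5 region.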

Table 2. Strain-Dependent Sequence Variation in Exon A5 Related to Codon Usage

Amino Acid   D273-10B (A5)   KL14-4A (A5α/A5β)   Usage in Known Genes
Thr          ACT             CTT                 ACA 51, ACT 34, ACC 1, ACG 0
             CTG (2X)        CTA                 CTA 14, CTT 2, CTC 0, CTG 2
Ser          TCC             TCT                 TCA 71, TCT 34, TCC 1
             TCT             TCA                 AGT 15, AGC 0, TCG 0
Ile          ATC             ATT                 ATC 26, ATT 149
Ala          GCC             GCT                 GCA 53, GCC 5, GCT 66, GCG 2
Tyr-His      TAC             CAT                 TAT 73, TAC 13; CAT 47, CAC 3
Phe          TTT             TTC                 TTT 71, TTC 60

Sequence differences in exon A5 between D273-10B (Bonitz et al., 1980a) and KL14-4A (this report) are listed according to the codons affected. One difference between the two strains (a GTG codon at +8580 in D273-10B) has been disregarded because it was not confirmed by reverse transcription sequence analysis of mRNA from D273-10B (L. Bonen, unpublished observation). Overall codon usage in known yeast mitochondrial genes is for D273-10B and is taken from Bonitz et al. (1980a).
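How rare a given codon is can be expressed as its share among all codons for the same amino acid. The Python sketch below does this for several of the codon families in Table 2. It is an illustration only: the counts are transcribed from Table 2 as read here, the grouping of ACN and CTN codons into a single threonine family is a choice made for this example, and the dictionary and function names are hypothetical.

```python
# Codon counts in known yeast mitochondrial genes (D273-10B),
# transcribed from Table 2. Thr combines the ACN and CTN families.
USAGE = {
    "Thr": {"ACA": 51, "ACT": 34, "ACC": 1, "ACG": 0,
            "CTA": 14, "CTT": 2, "CTC": 0, "CTG": 2},
    "Ser": {"TCA": 71, "TCT": 34, "TCC": 1,
            "AGT": 15, "AGC": 0, "TCG": 0},
    "Ala": {"GCA": 53, "GCC": 5, "GCT": 66, "GCG": 2},
    "Tyr": {"TAT": 73, "TAC": 13},
    "Phe": {"TTT": 71, "TTC": 60},
}


def relative_usage(amino_acid: str, codon: str) -> float:
    """Share (in percent) of one codon among all codons for an amino acid."""
    family = USAGE[amino_acid]
    return 100.0 * family[codon] / sum(family.values())


# Codons found in exon A5 of D273-10B at the variable positions.
for amino_acid, codon in [("Thr", "CTG"), ("Ser", "TCC"),
                          ("Ala", "GCC"), ("Tyr", "TAC"), ("Phe", "TTT")]:
    share = relative_usage(amino_acid, codon)
    print(f"{amino_acid} {codon}: {share:.1f}% of {amino_acid} codons")
```

On these counts every listed D273-10B codon except TTT accounts for a small minority of the codons for its amino acid, while TTT is the majority phenylalanine codon.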

An additional unexpected feature is that with only one exception, the changes lead in D273-10B to the formation of codons rarely, if ever, used in yeast mitochondrial genes. Thus as Table 2 shows, two threonine residues at positions 8412 and 8458 are specified by CTG, a codon hitherto unused in known genes; similarly, the serine residue at position 8406 of D273-10B is specified by the only TCC codon ever seen in exon sequences; finally, ATC, TAC and GCC codons are used to specify isoleucine, tyrosine and alanine, respectively, and these are departures from normal usage. Taken together, these observations imply that it is the D273-10B sequence which is unusual, suggesting that the loss or absence of intron aI5α is linked with sequence variation in the flanking exons. This is without precedent, since wherever loss of introns from genes has been reported, this has occurred cleanly, without consequences for flanking sequences.

What is the origin of this sequence variation? One possibility is that the changes are directly related to a mechanism of intron loss involving internal recombination and error-prone repair. However, the sequence variation is apparently limited to the exons flanking intron aI5α, and, furthermore, it is difficult to think of a plausible mechanism for the concerted introduction of multiple mutations up to 75 nucleotides away from the splice junctions. Alternatively, the intron could have been lost and the unusual codons acquired as a result of gene conversion or recombination events between D273-10B and a distantly related mtDNA.


Such events may have been relatively recent, so that insufficient time has elapsed to permit back-mutation to more commonly used codons. This explanation accounts well for the multiplicity of the differences within exon 5 and their scattered nature. On the other hand, if new sequences were acquired by recombination, it is somewhat surprising that this did not also occur at other sites, in particular those flanking other optional introns. It is unclear why such an event should not have included that adjacent intron al5/3. Clearly, more extended comparisons of the sequences flanking other optional introns in the subunit I gene of both D273-10B and other mtDNAs will help resolve this dilemma. Mobility of URFs The homologies observed among the URFs discussed here imply that they have a common ancestor which has been duplicated and dispersed around the genome in the course of evolution, resulting in the present locations both within and outside genes. Such a common origin also implies that certain of the introns and their URFs may thus be ancient, while others could be relatively recent acquisitions. The idea that at least some introns are old is supported by the finding that the single intron in the apocytochrome b gene of Aspergillus nidulans mtDNA is homologous with b13 in S. cerevisiae and occurs at an identical position in the gene (Lazowska et al., 1981; Waring et al., 1981). Both introns could thus predate at least the common ancestor of the two fungi. An examination of the exon sequences flanking al4 and bl4, the two URFs with the highest homology, has led to observations suggesting that these introns are also very ancient, perhaps even existing in a common ancestor to the genes for apocytochrome b and cytochrome c oxidase subunit I (L. Bonen, unpublished observations). Regardless of the time at which the movement of the ancestral URF occurred, the mechanism by which it reached its present location is of interest. 
Although none of the present introns shows characteristics of typical insertion elements, it is possible that these have been lost, together with mobility, in the course of time. Various transposition-duplication or gene conversion events can be envisaged, but it is remarkable that many of the conserved features of the URFs are in locations consistent with a circular permutation (Figure 6). One interesting possibility is, therefore, that translocation of mitochondrial URFs may have occurred via a circular intermediate, according to a mechanism similar to that proposed for mobile eucaryotic nuclear (pseudo) genes (Nishioka et al., 1980; Jagadeeswaran et al., 1981). One could envisage a circular RNA intermediate, with subsequent cDNA synthesis and integration. This is especially attractive since stable circular RNAs, derived in some cases from URF-containing introns, have been observed in yeast mitochondria (Arnberg et al., 1980; Hensgens et al., 1982).

Experimental Procedures

Materials

Restriction endonucleases were from Bethesda Research Laboratories and from New England BioLabs. DNA polymerase I (large fragment), T4 polynucleotide kinase and DNA ligase were from Boehringer Mannheim or Bethesda Research Laboratories. Low-melting agarose was from Bethesda Research Laboratories.

DNA

Plasmid recombinants containing mtDNA from S. cerevisiae KL14-4A were used as source of DNA for recloning into bacteriophage M13. These recombinants, pKL106, 109, 111 and 114, have been described fully by Hensgens et al. (1982). They contain different Mbo I and Bcl I fragments ligated into the Bam HI site of pBR322. Mitochondrial DNA of the OXI3 petite mutant LH26-D7 was used as an additional source of DNA to generate Hinf I and Eco RI* fragments, which overlap the Mbo I fragments. Recombinant plasmid and mitochondrial DNAs were prepared as described previously (Hensgens et al., 1982).

Cloning in Bacteriophage M13

Plasmid recombinant DNAs were digested with Sau IIIA. The appropriate fragments of mtDNA were then isolated from low-melting agarose gels (1%-2%) and ligated into phage M13 mp7 cut with Bam HI (Messing et al., 1981). Large Sau IIIA fragments were additionally cut with Alu I or Hinf I to generate smaller fragments, which were then made blunt with DNA polymerase I (large fragment) and ligated into the Eco RI site of phage M13 mp2 with a 12-base Eco RI linker (CATGAATTCATG). The advantage of this linker is that its presence alone in the vector does not interfere with the synthesis of the aminoterminal fragment of β-galactosidase because it maintains the reading frame. Recombinants containing only linker, therefore, still generate blue plaques. Eco RI* fragments were generated by incubation of mtDNA from LH26-D7 in 2 mM MgCl2, 20 mM Tris-HCl (pH 8.5) and 7% glycerol using a 50-fold excess of Eco RI for 8-20 hr at 37°C. These fragments were then ligated into phage M13 mp2 cut with Eco RI.
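The reading-frame argument for the 12-base Eco RI linker can be illustrated with a short sketch. It is not part of the original procedures. The host sequence and the insertion position are hypothetical placeholders (not the real lacZ sequence of M13 mp2), and the helper names are introduced only for this illustration. The sketch checks three things: the linker length is a multiple of three, the linker contains the Eco RI recognition sequence and is self-complementary, and an insertion of that length leaves the downstream codons of a toy reading frame unchanged.

```python
# Illustrative sketch only: the host sequence and insertion position are hypothetical.
LINKER = "CATGAATTCATG"   # 12-base Eco RI linker quoted in the text
ECO_RI_SITE = "GAATTC"    # Eco RI recognition sequence

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))

def codons(seq):
    """Split a sequence into complete triplets, starting at position 0."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def downstream_frame_preserved(host, pos, insert):
    """True if the host codons lying wholly after `pos` are read unchanged
    (same triplets, same frame) once `insert` is placed at `pos`."""
    first_whole = -(-pos // 3) * 3           # start of first codon entirely after pos
    expected = codons(host[first_whole:])
    recombinant = host[:pos] + insert + host[pos:]
    shifted = first_whole + len(insert)      # where those codons now begin
    if shifted % 3 != 0:
        return False                         # frame has shifted
    return codons(recombinant)[shifted // 3:] == expected

# Hypothetical 18-base open reading frame and an arbitrary insertion point.
HOST = "ATGACCATGATTACGGAT"
INSERT_POS = 7

if __name__ == "__main__":
    print("linker length multiple of 3:", len(LINKER) % 3 == 0)
    print("contains Eco RI site:", ECO_RI_SITE in LINKER)
    print("self-complementary:", reverse_complement(LINKER) == LINKER)
    print("frame kept with 12-base linker:",
          downstream_frame_preserved(HOST, INSERT_POS, LINKER))
    print("frame kept with 11-base insert:",
          downstream_frame_preserved(HOST, INSERT_POS, LINKER[:-1]))
```

The sketch addresses only the length argument. Whether the inserted triplets themselves introduce a stop codon depends on the frame in which the vector reads them, which is not modeled here.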
Single-stranded DNA from each recombinant plaque was spotted onto nitrocellulose filters and hybridized with different 32P-labeled plasmid recombinants to detect overlapping sequences.

DNA Sequence Analysis

DNA sequence analysis was carried out by the dideoxynucleotide chain-termination technique according to the method of Sanger et al. (1977; 1980).

Miscellaneous

Labeling of DNA with 32P by nick translation was carried out as described by Jeffreys and Flavell (1977). All experiments with viable recombinants were performed in CI facilities in accordance with the guidelines laid down by the Dutch Advisory Commission on Recombinant DNA.

We thank Prof. P. Borst for his continuing interest and much constructive criticism, Dr. A. Tzagoloff for providing us with the sequence of the subunit I gene in D273-10B prior to publication and Dr. F. Sanger for giving L. A. M. H. and L. A. G. the opportunity to learn the phage M13 sequencing technique in his laboratory. We are also grateful to Dr. I. C. Eperon for his considerable help in the early phases of the work, Prof. J. H. Van Boom for synthetic primer and linker oligonucleotides and Mr. L. Posthumus for computing facilities. L. A. G. thanks EMBO for the award of the short-term fellowship, and L. B. is the holder of a Medical Research Council of Canada postdoctoral fellowship. This work was supported in part by a grant to P. Borst and L. A. Grivell from The Netherlands Foundation for Chemical Research (SON), with financial aid from The Netherlands Organization for the Advancement of Pure Research (ZWO).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received September 24, 1982; revised November 4, 1982

References

Alexander, N. J., Perlman, P. S., Hanson, D. K. and Mahler, H. R. (1980). Mosaic organisation of a mitochondrial gene: evidence from double mutants in the cytochrome b region of Saccharomyces cerevisiae. Cell 20, 199-206.

Anderberg, M. R. (1973). Cluster Analysis for Applications. (New York: Academic Press).

Anderson, S., Bankier, A. T., Barrell, B. G., De Bruijn, M. H. L., Coulson, A. R., Drouin, J., Eperon, I. C., Nierlich, D. P., Roe, B. A., Sanger, F., Schreier, P. H., Smith, A. J. H., Staden, R. and Young, I. G. (1981). Sequence and organization of the human mitochondrial genome. Nature 290, 457-465.

Anderson, S., De Bruijn, M. H. L., Coulson, A. R., Eperon, I. C., Sanger, F. and Young, I. G. (1982). Complete sequence of bovine mitochondrial DNA. Conserved features of the mammalian mitochondrial genome. J. Mol. Biol. 156, 683-717.

Arnberg, A. C., Van Ommen, G. J. B., Grivell, L. A., Van Bruggen, E. F. J. and Borst, P. (1980). Some yeast mitochondrial RNAs are circular. Cell 19, 313-319.

Bechmann, H., Haid, A., Schweyen, R. J., Mathews, S. and Kaudewitz, F. (1981). Expression of the split gene COB in yeast mtDNA. Translation of intervening sequences in mutant strains. J. Biol. Chem. 256, 3525-3531.

Bibb, M. J., Van Etten, R. A., Wright, C. T., Walberg, M. W. and Clayton, D. A. (1981). Sequence and gene organization of mouse mitochondrial DNA. Cell 26, 167-180.

Bonitz, S. G., Coruzzi, G., Thalenfeld, B. E., Tzagoloff, A. and Macino, G. (1980a). Assembly of the mitochondrial membrane system: structure and nucleotide sequence of the gene coding for subunit I of yeast cytochrome oxidase. J. Biol. Chem. 255, 11927-11941.

Bonitz, S. G., Berlani, R., Coruzzi, G., Li, M., Macino, G., Nobrega, F. G., Nobrega, M. P., Thalenfeld, B. E. and Tzagoloff, A. (1980b). Codon recognition rules in yeast mitochondria. Proc. Nat. Acad. Sci. USA 77, 3167-3170.

Borst, P. (1981). The biogenesis of mitochondria in yeast and other primitive eukaryotes. In International Cell Biology 1980-1981, H. G. Schweiger, ed. (Berlin: Springer-Verlag), pp. 239-249.

Borst, P. and Grivell, L. A. (1978). The mitochondrial genome of yeast. Cell 15, 705-723.

Church, G. M., Slonimski, P. P. and Gilbert, W. (1979). Pleiotropic mutations within two yeast mitochondrial cytochrome genes block mRNA processing. Cell 18, 1209-1215.

Coruzzi, G., Bonitz, S. G., Thalenfeld, B. E. and Tzagoloff, A. (1981). Assembly of the mitochondrial membrane system. Analysis of the nucleotide sequence and transcripts in the OXI1 region of yeast mitochondrial DNA. J. Biol. Chem. 256, 12780-12787.

De La Salle, H., Jacq, C. and Slonimski, P. P. (1982). Critical sequences within mitochondrial introns: pleiotropic mRNA maturase and cis-dominant signals of the box intron controlling reductase and oxidase. Cell 28, 721-732.

Dickerson, R. E. (1980). Cytochrome c and the evolution of energy metabolism. Sci. Am. 242, 98-110.

Dujon, B. (1980). Sequence of the intron and flanking exons of the mitochondrial 21S rRNA gene of yeast strains having different alleles at the omega and rib-1 genetic loci. Cell 20, 185-197.

Dujon, B. (1981). Mitochondrial genetics and functions. In Molecular Biology of the Yeast Saccharomyces: Life Cycle and Inheritance, J. N. Strathern, E. W. Jones and J. R. Broach, eds. (Cold Spring Harbor, New York: Cold Spring Harbor Laboratory), pp. 505-635.

Fiers, W., Contreras, R., Duerinck, F., Haegeman, G., Merregaert, J., Min Jou, W., Raeymakers, A., Volckaert, G., Ysebaert, M., Van de Kerckhove, J., Nolf, F. and Van Montagu, M. (1975). A protein gene of bacteriophage MS2. Nature 256, 273-278.

Grivell, L. A., Hensgens, L. A. M., Osinga, K. A., Tabak, H. F., Boer, P. H., Crusius, J. B. A., Van der Laan, J. C., De Haan, M., Van der Horst, G., Evers, R. F. and Arnberg, A. C. (1982). RNA processing in yeast mitochondria. In Mitochondrial Genes, P. Slonimski, P. Borst and G. Attardi, eds. (Cold Spring Harbor, New York: Cold Spring Harbor Laboratory), pp. 225-239.

Hensgens, L. A. M., Grivell, L. A., Borst, P. and Bos, J. L. (1979). Nucleotide sequence of the mitochondrial structural gene for subunit 9 of yeast ATPase complex. Proc. Nat. Acad. Sci. USA 76, 1663-1667.

Hensgens, L. A. M., Arnberg, A. C., Roosendaal, E., Van der Horst, G., Van der Veen, R., Van Ommen, G. J. B. and Grivell, L. A. (1982). Variation, transcription and circular RNAs of the mitochondrial gene for subunit I of cytochrome c oxidase. J. Mol. Biol., in press.

Hudspeth, M. E. S., Ainley, W. M., Shumard, D. S., Butow, R. A. and Grossman, L. I. (1982). Location and structure of the var1 gene on yeast mitochondrial DNA: nucleotide sequence of the 40.0 allele. Cell 30, 617-626.

Jacq, C., Pajot, P., Lazowska, J., Dujardin, G., Claisse, M., Groudinsky, O., De la Salle, H., Grandchamp, C., Labouesse, M., Gargouri, A., Guiard, B., Spyridakis, A., Dreyfus, M. and Slonimski, P. P. (1982). Role of introns in the yeast cytochrome b gene: cis- and trans-acting signals, intron manipulation, expression and intergenic communications. In Mitochondrial Genes, P. Slonimski, P. Borst and G. Attardi, eds. (Cold Spring Harbor, New York: Cold Spring Harbor Laboratory), pp. 155-183.

Jagadeeswaran, P., Forget, B. G. and Weissman, S. M. (1981). Short interspersed repetitive DNA elements in eucaryotes: transposable DNA elements generated by reverse transcription and RNA pol III transcripts? Cell 26, 141-142.

Jeffreys, A. J. and Flavell, R. A. (1977). A physical map of the DNA regions flanking the rabbit β-globin gene. Cell 12, 429-439.

Kreike, J., Bechmann, H., Van Hemert, F. J., Schweyen, R. J., Boer, P. H., Kaudewitz, F. and Groot, G. S. P. (1979). The identification of apocytochrome b as a mitochondrial gene product and immunological evidence for altered apocytochrome b in yeast strains having mutations in the cob region of mitochondrial DNA. Eur. J. Biochem. 101, 607-617.

Lazowska, J., Jacq, C. and Slonimski, P. P. (1980). Sequence of introns and flanking exons in wild-type and box3 mutants of cytochrome b reveals an interlaced splicing protein. Cell 22, 333-348.

Lazowska, J., Jacq, C. and Slonimski, P. P. (1981). Splice points of the third intron in the yeast mitochondrial cytochrome b gene. Cell 27, 12-14.

Macino, G. and Tzagoloff, A. (1980). Assembly of the mitochondrial membrane system: sequence analysis of a yeast mitochondrial ATPase gene containing the oli2 and oli4 loci. Cell 20, 507-517.

Mahler, H. R., Hanson, D. K., Lamb, M. R., Perlman, P. S., Anziano, P. Q., Glaus, K. R. and Haldi, M. L. (1982). Regulatory interactions between mitochondrial genes: expressed introns-their function and regulation. In Mitochondrial Genes, P. Slonimski, P. Borst and G. Attardi, eds. (Cold Spring Harbor, New York: Cold Spring Harbor Laboratory), pp. 185-199.

Messing, J., Crea, R. and Seeburg, P. H. (1981). A system for shotgun DNA sequencing. Nucl. Acids Res. 9, 309-321.

Michel, F., Jacquier, A. and Dujon, B. (1982). Comparison of fungal mitochondrial introns reveals extensive homologies in RNA secondary structure. Biochimie 64, 867-881.

Netter, P., Jacq, C., Carignani, G. and Slonimski, P. P. (1982). Critical sequences within mitochondrial introns: cis-dominant mutations of the "cytochrome-b-like" intron of the oxidase gene. Cell 28, 733-738.

Nishioka, Y., Leder, A. and Leder, P. (1980). Unusual α-globin-like gene that has cleanly lost both globin intervening sequences. Proc. Nat. Acad. Sci. USA 77, 2806-2809.

Nobrega, F. G. and Tzagoloff, A. (1980). Assembly of the mitochondrial membrane system. DNA sequence and organization of the cytochrome b gene in Saccharomyces cerevisiae. J. Biol. Chem. 255, 9828-9837.

Sanger, F., Nicklen, S. and Coulson, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Nat. Acad. Sci. USA 74, 5463-5467.

Sanger, F., Coulson, A. R., Barrell, B. G., Smith, A. J. H. and Roe, B. A. (1980). Cloning in single-stranded bacteriophage as an aid to rapid DNA sequencing. J. Mol. Biol. 143, 161-178.

Schmelzer, C. and Schweyen, R. J. (1982). Evidence for ribosomes involved in splicing of yeast mitochondrial transcripts. Nucl. Acids Res. 10, 513-524.

Sures, I., Levy, S. and Kedes, L. H. (1980). Leader sequences of Strongylocentrotus purpuratus histone mRNAs start at a unique heptanucleotide common to all five histone genes. Proc. Nat. Acad. Sci. USA 77, 1265-1269.

Tabak, H. F., Van der Laan, J., Osinga, K. A., Schouten, J. P., Van Boom, J. H. and Veeneman, G. H. (1981). Use of a synthetic DNA oligonucleotide to probe the precision of RNA splicing in a yeast mitochondrial petite mutant. Nucl. Acids Res. 9, 4475-4483.

Thalenfeld, B. E. and Tzagoloff, A. (1980). Assembly of the mitochondrial membrane system. Sequence of the oxi2 gene of yeast mitochondrial DNA. J. Biol. Chem. 255, 6173-6180.

Van Ommen, G. J. B., Boer, P. H., Groot, G. S. P., De Haan, M., Roosendaal, E., Grivell, L. A., Haid, A. and Schweyen, R. J. (1980). Mutations affecting RNA splicing and the interaction of gene expression of the yeast mitochondrial loci COB and OXI3. Cell 20, 173-183.

Waring, R. B., Davies, R. W., Lee, S., Grisi, E., McPhail Berks, M. and Scazzocchio, C. (1981). The mosaic organization of the apocytochrome b gene of Aspergillus nidulans revealed by DNA sequencing. Cell 27, 4-11.

Weiss-Brummer, B., Rödel, G., Schweyen, R. J. and Kaudewitz, F. (1982). Expression of the split gene cob in yeast: evidence for a precursor of a "maturase" protein translated from intron 4 and preceding exons. Cell 29, 527-536.

Young, I. G., Rogers, B. L., Campbell, H. D., Jaworowski, A. and Shaw, D. C. (1981). Nucleotide sequence coding for the respiratory NADH dehydrogenase of Escherichia coli. UUG initiation codon. Eur. J. Biochem. 116, 165-170.

Note Added in Proof

The work referred to throughout the text and in the list of references as Hensgens et al., 1982, should be Hensgens et al., 1983.