Sequence analysis of mitochondrial DNA from Podospora anserina


J. Mol. Biol. (1988) 204, 815-839

Sequence Analysis of Mitochondrial DNA from Podospora anserina

Pervasiveness of a Class I Intron in Three Separate Genes

Donald J. Cummings and Joanne M. Domenico

Department of Microbiology and Immunology
University of Colorado School of Medicine
Denver, CO 80262, U.S.A.

(Received 17 May 1988, and in revised form 10 August 1988)

A 48 kb region of the 95 kb mitochondrial genome of Podospora anserina has been mapped and sequenced (1 kb = 10³ base-pairs). The DNA sequences of the genes for ND2, 3, 4, ATPase 6 and URFC are presented here. As in Neurospora crassa, the ND2 and 3 genes consist of a unit separated by one TAA stop codon. ND3, 4 and ATPase 6 are interrupted by class I introns. All three introns are remarkably similar in the C-domain of their secondary structure, sufficiently so to designate them as a new subgroup, class IC introns. The open reading frames of the ND3 and 4 introns bear a high sequence similarity to the open reading frame of the class IB introns of ATPase 6 from N. crassa and ND1 from Neurospora intermedia Varkud. We also show that the tRNA Met-2 gene is duplicated and is involved in a recombinational event. The 5’ region of URFC is also duplicated but no involvement of this gene with recombination or formation of plasmids is known. The evolutionary significance of the similarities of intron secondary structures and open reading frames of the ND3, 4 and ATPase 6 genes is discussed, including the possible separate evolution of structural and coding sequences.

1. Introduction

The circular mitochondrial (mt†) genome of the filamentous ascomycete Podospora anserina varies in size between 93 and 102 kb depending on the race studied (Cummings et al., 1979a; Kück et al., 1985). Like the petite mutation in yeast (see Bernardi, 1979), the stopper mutants in Neurospora (Mannella et al., 1979; Bertrand et al., 1980) and the ragged phenotype in Aspergillus (Lazarus et al., 1980), mtDNA from P. anserina is highly plastic, undergoing excision-amplification of specific gene coding regions. In all the cases studied thus far in Podospora, excision-amplification occurs in senescent cultures (Stahl et al., 1978; Cummings et al., 1979b; Jamet-Vierny et al., 1985; Wright et al., 1982) or in long-lived phenotypes (Turker et al., 1987a,b). The most commonly occurring mt “plasmid” has been termed α senDNA or plDNA (Stahl et al., 1978; Cummings et al., 1979b; Jamet-Vierny et al., 1984; Wright et al., 1982). DNA sequence analysis has shown that α senDNA is a complete class II intron of the cytochrome oxidase subunit 1 gene (COI; Cummings & Wright, 1983; Osiewacz & Esser, 1984; Cummings et al., 1985). Mitochondrial introns have been classified into two major classes or groups, I and II (Michel et al., 1982; Davies et al., 1982), depending on consensus sequences and folding characteristics.

Other mt plasmids have also been shown to involve introns. For example, ε senDNA contains part of the ND1 gene (formerly termed URF1 but now known to be part of the NADH respiratory chain complex: Chomyn et al., 1985). This gene has four large class I introns (Cummings et al., 1988a), two of which are almost completely contained within ε senDNA (Cummings et al., 1985; Michel & Cummings, 1985). Both of these class I introns showed extraordinary similarity with regard to both primary and secondary structure to the self-splicing intron of the large ribosomal nuclear gene of Tetrahymena (Cech et al., 1981).

Thus far the complete DNA sequence of a fungal mt genome has not been reported. The presence of so many introns on the 100 kb mt genome of P. anserina, as well as the possible involvement of intronic sequences in the formation of mt plasmids as part of a senescent process, has prompted us to undertake the DNA sequence analysis of this mt genome. Here, we have concentrated on the linearized region depicted in Figure 1 (a complete circular map has been given earlier: Cummings et al., 1985). The identity of some of these genes has been or is being reported elsewhere (ND1, Cummings et al., 1985; Cummings et al., 1988a,b,c) while others will be presented here. As can be noted, several mitochondrial plasmids originate in this 50 kb region, including ε senDNA, β senDNA (Cummings et al., 1985) and plasmids associated with longevity phenotypes (Turker et al., 1987a,b,c): o senDNA, 4 senDNA, 9 senDNA, LMt-1 and the SMt1, 2 and 3 family.

In this paper, we present DNA sequence analyses on the ND2, 3 and 4 genes as well as ATPase 6. Physically, these genes are well separated, with ND4 being some 26 kb distant from the ND2, 3 unit and ATPase 6 being another 12 kb upstream. We present them together here because the class I introns contained within the ND4, ND3 and ATPase 6 genes are remarkably similar to each other with regard to both primary and secondary structure. In particular, one region of their secondary structure, the so-called C-domain (Michel et al., 1982), has sufficient structural resemblance to constitute a subclassification of these class I introns as class IC. There are (at least) two other mitochondrial introns with a similar structure, the two ND1 introns contained within ε senDNA (Michel & Cummings, 1985). Finally, the open reading frames (ORFs) of the class I introns in ND4 and ND3 exhibit high amino acid sequence similarity with each other as well as with the ORF of the class I intron of the ATPase 6 gene of N. crassa (Morelli & Macino, 1984).

† Abbreviations used: mt, mitochondrial; kb, 10³ base-pairs; ORF, open reading frame; bp, base-pair(s); URF, untranslated reading frame.

0022-2836/88/240815-25 $03.00/0 © 1988 Academic Press Limited

2. Materials and Methods

(a) DNA sequence analysis

P. anserina mitochondrial DNA clones for restriction fragments EcoRI-1, 3, 12 and 13 as well as PstI-4b, 4c, 6, 7, 9 and 10 from races s and A (+) mating types were prepared as described (Wright et al., 1982). The chemical sequencing method of Maxam & Gilbert (1980) was utilized throughout. Twice gel-purified DNA fragments were dephosphorylated with calf alkaline phosphatase (obtained from Boehringer-Mannheim) and then labeled at their 5’ ends using bacteriophage T4 polynucleotide kinase (Bethesda Research Laboratories) and [γ-32P]ATP (ICN). Uniquely labeled 5’ end fragments were obtained by a second restriction enzyme digestion followed by gel purification. When necessary, partially digested products were analyzed either before or after 32P-end labeling. In addition, 3’ labeling of some fragments (Hid, TaqI and MspI products, for example) was done using a fill-in reaction with Klenow fragment (Bethesda Research Laboratories) and the appropriate [α-32P]dNTP (ICN) (Seilhamer et al., 1984). The combination of 5’ and 3’ labeling as well as numerous partial products allowed us to overlap the DNA sequences of both strands thoroughly. Restriction endonucleases were obtained from Bethesda Research Laboratories and their reaction conditions were utilized.
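Reconciling reads from both strands, as described above, comes down to checking that a read from one strand equals the reverse complement of the read from the other. The following Python sketch illustrates that comparison only; the function names are hypothetical and are not part of the original analysis. The two 11-mers used as input are the consensus repeat and its complementary form quoted in the Results.

```python
# Illustrative sketch only: helper names are hypothetical, not from the paper.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def strands_agree(top: str, bottom: str) -> bool:
    """True if one read is the reverse complement of the other."""
    return reverse_complement(top.upper()) == bottom.upper()


# The 11 bp consensus repeat and the complementary form quoted in the Results.
consensus = "GGCGCAAGCTC"
complementary = "GAGCTTGCGCC"

print(reverse_complement(consensus))
print(strands_agree(consensus, complementary))
```

The sketch handles only the four unambiguous bases; ambiguity codes would need a larger translation table.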


(b) Computer analysis

DNA sequences were analyzed using the programs provided by the BIONET National Computer Resources for Molecular Biology (funding is provided by the Biomedical Research Technology Program, Division of Research Resources, NIH grant RR01685).
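The similarity values reported in the Results are percentages of identical residues between aligned sequences. As a rough illustration of how such a figure is obtained, and not a description of the BIONET programs themselves, the Python sketch below counts identical residues over the aligned columns. The function name, the choice to skip gapped columns, and the toy alignment are all assumptions made for illustration.

```python
# Illustrative sketch only; not the BIONET programs used in the paper.
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Per cent identical residues over columns where neither sequence has a gap.

    Both inputs must come from the same alignment and have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (an assumption; conventions differ)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical toy alignment, invented for illustration (not data from the paper).
toy_a = "MSLL-ALLII"
toy_b = "MSILGALL-I"
print(round(percent_identity(toy_a, toy_b), 1))
```

Whether gapped columns are counted in the denominator changes the result, so any comparison with published percentages should state which convention is used.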

3. Results

(a) DNA sequence of the ND4 gene

Cloned fragments from EcoRI-3 and 12 and PstI-6 and 9 were isolated and their DNA sequences determined using 5’-end-labeled BglII, DdeI, EcoRI, HaeIII, HindIII, HinfI, MspI, Sau3A and TaqI endonuclease fragments. The nucleotide sequence in the left-hand side of Figure 1 is presented in Figure 2, starting at an arbitrary position in the EcoRI-3 fragment. The start of the ND4 gene was determined using a computer comparison with the identified ND4 gene of Aspergillus nidulans (Brown et al., 1983). At nucleotide position 100, there begins an excellent alignment of the P. anserina open reading frame with the A. nidulans ND4 gene (Fig. 3). The P. anserina open reading frame continues upstream for another 93 nucleotides but the sequence similarity commences at position 100 here. The amino acid sequence similarity continues down to position 706 where the P. anserina sequence is interrupted by a 1399 bp class I intron. The A. nidulans sequence does not have an intron (Brown et al., 1983). After this intron, at position 2106 the amino acid sequence similarity is restored and proceeds down to the end of the ND4 gene at position 3055. The two ND4 genes have 70% amino acid residue identity and are about the same size, 488 for A. nidulans and 519 for P. anserina, with most of the difference occurring near the 5’ portion of the gene. With regard to Figure 1, the EcoRI site separating EcoRI-7 from 12 is at position 937 and the two EcoRI sites separating EcoRI-12 from EcoRI-8a are at positions 2946 and 3208, respectively.

Detailed analyses of the open reading frame and secondary structure for the ND4 intron will be presented later when the introns for ND3 and ATPase 6 are examined. Suffice it to state here that this intron has characteristics common to many class I introns. The last upstream exon base is a T and the last downstream intron base is a G; the f and f’ sequences necessary for maintaining the secondary core structure (see later) are boxed, as is the single rather than double dodecapeptide sequence common to maturase-type intronic sequences (Waring et al., 1982; Hensgens et al., 1983; Michel, 1984; Michel & Cummings, 1985). As we will see later when we analyze the intronic amino acid sequences of ND3 and ND4, it is possible that there is a second dodecapeptide sequence downstream and the nucleotide sequence is underlined starting at position 1601 (see Fig. 2).
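The ND4 coordinates quoted in this section can be cross-checked with simple arithmetic. The Python sketch below assumes that the positions are 1-based and inclusive and that position 706 is the last base of the first exon; both are assumptions about the numbering in Figure 2, not statements made in the text.

```python
# Illustrative check of the ND4 coordinates quoted in the text (Figure 2 numbering).
# Assumption: positions are 1-based and inclusive, and position 706 is the last
# base of exon 1, so the intron occupies 707..2105.
def span_length(start: int, end: int) -> int:
    """Length in bp of an inclusive 1-based interval."""
    return end - start + 1


exon1 = (100, 706)
exon2 = (2106, 3055)
intron = (exon1[1] + 1, exon2[0] - 1)

intron_bp = span_length(*intron)
coding_bp = span_length(*exon1) + span_length(*exon2)
codons, remainder = divmod(coding_bp, 3)

print(intron_bp, coding_bp, codons, remainder)
```

Under these assumptions the intron length comes to 1399 bp and the two exons total 1557 bp, that is 519 triplets with no remainder, in agreement with the intron size and the figure of 519 given for P. anserina above.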


Figure 2. DNA sequence of the ND4 gene. The non-coding strand sequence starts arbitrarily upstream from the 1st ND4 exon and continues to the EcoRI site of EcoRI-8a. Restriction sites are marked. Intron sequence elements are boxed or underlined.

Figure 3. Amino acid alignment of the ND4 exon sequence of P. anserina with that of A. nidulans (Netzger et al., 1982). The consensus sequence is given below each region. The alignment starts at position 100 and ends at 3055 in Fig. 2.

(b) DNA sequence of the ND2 and ND3 genes

The DNA sequences separating ND4 from ND2 and ND3 have been published elsewhere (EcoRI-8a and 9, Cummings et al., 1985; EcoRI-5 and 7, Turker et al., 1987a; Cummings et al., 1988a,c). To obtain the sequence for the ND2 and ND3 genes, only the EcoRI-1 fragment was required. As can be seen in Figure 1, EcoRI-1 also contains the genes for the small rRNA, COIII, several tRNAs, URFC and ATPase 6. The DNA sequence encompassing the ND2 and ND3 genes is given in Figure 4. The sequence starts within the 3’ end region of EcoRI-7 and continues through the contiguous EcoRI-1 fragment to include the complete sequence of tRNA Met-2. As can be noted in the map in Figure 1, tRNA Met-2 is duplicated at the 3’ end of EcoRI-1 (see later). Interestingly, II/ senDNA (Fig. 1) excises within this duplicated tRNA (Cummings et al., 1987; Turker et al., 1987a). $ senDNA is actually a chimaeric 2.5 kb plasmid in that it results from a double cross-over event that removes almost the entire EcoRI-1 sequence. In this respect it is quite similar to the results reported by Gross et al. (1984), where it was shown that the tRNA Met-2 gene was also repeated in the N. crassa

mt genome and was involved in recombination resulting in two smaller circular DNA molecules. In this case, however, the separating sequences were not excluded.

As determined by a computer sequence similarity search, the ND2 gene starts downstream from the tRNA Met-2 gene at position 342 and continues to the stop codon at position 2010 for a total of 1668 bp. Remarkably, the ND2 and ND3 genes are separated by just this TAA codon and this exact situation prevails in the N. crassa mt genome (deVries et al., 1986). The amino acid sequence similarity for the P. anserina ND2 gene with that gene from N. crassa is given in Figure 5, where it can be seen that the P. anserina gene contains 556 codons and the N. crassa gene 583, and 78% of the residues are identical. As indicated, for both N. crassa and P. anserina an open reading frame starts immediately following the end of the ND2 gene. This ORF was identified as the ND3 gene by virtue of its amino acid sequence similarity with the ND3 gene of liverwort chloroplasts (Fig. 6; Ohyama et al., 1986), where it can be seen that there is a 27% amino acid identity and 64% of the amino acid residues are conserved. deVries et al. (1986) reported some of the 5’ region exon sequences of the ND3 gene and of the 30 amino acid residues given in the N. crassa gene, 87% were identical with those shown here. The ND3 gene is quite short, being only 121 amino acid residues in liverwort chloroplast DNA and 130 in Podospora mitochondria.

Shortly after the start of the ND3 gene, the DNA sequence is interrupted by a 1271 bp class I intron at position 2103 (shown in Fig. 4). The ND3 gene in N. crassa is also interrupted by a class I intron at this same position (deVries et al., 1985, 1986). As with many introns, the ORF of the exon is continuous with the intron ORF. Of the 19 amino acid residues given for the N. crassa ND3 intron, 90% are identical with the P. anserina ND3 intron. The P. anserina intron is quite typical. The last upstream exon base is a T and the last downstream intron base is a G, and it has both dodecapeptide sequences expected for a maturase-type protein. The f and f’ sequences are boxed. Just before the end of the intron, the 3’ excision site of w senDNA is noted. As with many senDNAs, excision occurs within an 11 bp consensus repeat sequence GGCGCAAGCTC (Turker et al., 1987b). Finally, the DNA sequence in Figure 4 ends at an Alu-site


which marks the beginning of the contiguous sequence containing the small rRNA gene and tRNA genes reported elsewhere (Cummings et al., 1988b). The secondary structure and ORF analyses will be presented in a later section.

(c) DNA sequence of the ATPase 6 gene region

This region of the map shown in Figure 1 will not be presented as a contiguous sequence, since some of it has been published as part of β senDNA (Cummings et al., 1985). Instead we will start the sequence just upstream from URFC at a Sau3A site (see Fig. 6 of Cummings et al., 1985). When we reported URFC we indicated that the sequence likely contained a few errors, since this particular region had few restriction sites. We repeated this sequence analysis by both chemical and enzymatic methods (Maxam & Gilbert, 1980; Sanger et al., 1977) and that sequence starts Figure 7 here. Cloned EcoRI-1 and 13 and PstI-4b, 4c, 7 and 10 were isolated and their DNA sequences analyzed using 5’-end (or 3’-end)-labeled BglII, ClaI, DdeI, EcoRI, HaeIII, HindIII, HinfI, MspI, NcoI, Sau3A, TaqI and XbaI endonuclease fragments. The sequences containing URFC and ATPase 6 as well as other genes are given in Figure 7. The open reading frame identified with the URFC gene of A. nidulans (Netzger et al., 1982;


Cummings et al., 1985) starts at position 145 and ends at position 807 for a total of 221 amino acids. Sequences starting upstream from this sequence are underlined in Figure 7 to indicate correspondence with another ORF which begins well downstream at position 1864. This ORF contains 296 amino acids and is labeled ORF(C) because it begins with a 175 bp unit that has 94% nucleotide identity with that sequence underlined as part of URFC (Fig. 7, positions 76 to 249 and 1794 to 1968). Because of this identity, we did a complete amino acid comparison with the URFC sequence of A. nidulans and P. anserina and found that near the 5’ end, URFC and ORF(C) of P. anserina are, as expected, quite similar and bear good similarity with the A. nidulans gene. The two P. anserina genes, however, show less similarity for the rest of the sequence. Over the entire length, P. anserina URFC shows 47% sequence similarity with URFC from A. nidulans but ORF(C) has only 16%. URFC and ORF(C) of P. anserina display 35% amino acid sequence similarity over their entire length, with the major difference in codon usage (see later) being the preponderant use of a lysine codon in ORF(C). The ORF(C) product is not known. About 700 bp downstream from the ORF(C) sequence there begins a sequence identified as ATPase 6 by a computer similarity search. Just upstream from this start there is a 52 bp palindromic sequence (underlined in Fig. 7) which we


reported as part of the LMt-1 plasmid 3’ excision site (Turker et al., 1987b). This palindrome could just as well be involved as a recognition site for the start of ATPase 6. Parenthetically, the 3’ excision site utilizes the complementary 11 bp consensus sequence noted earlier, GAGCTTGCGCC.

Figure 4. DNA sequence of the ND2 and ND3 genes. The non-coding strand starts at the duplicated tRNA Met-2 gene spanning the EcoRI-7 and 1 junction and continues to the AluI site at position 3875, which starts the sequence upstream from tRNA Arg (Cummings et al., 1988b). ND3 intron sequences are marked and boxed.

The total length of the exon sequence for ATPase 6 is 264 codons and it has 94% amino acid sequence identity with the ATPase 6 gene from N. crassa (Morelli & Macino, 1984). This sequence similarity is illustrated in Figure 8. Like the ATPase 6 gene in N. crassa, the P. anserina gene is interrupted by a class I intron. On the basis of the computer-generated similarity with A. nidulans (Netzger et al., 1982), the P. anserina gene does not have the putative 93 bp intron (indicated by an arrow in Fig. 10) near the 5’ end of the N. crassa gene. Moreover, the 1694 bp intron for P. anserina is at a location different from that of the 1370 bp N. crassa intron. Like many other introns, both introns are inserted into highly conserved regions of the gene. The P. anserina intron differs from the N. crassa intron in other respects. First, it lacks the dodecapeptide sequences connected with maturase-like proteins. Second, its 280 amino acid ORF shares little sequence similarity with the N. crassa intron. Rather, it has its closest similarity with the N. crassa ND1 intron (see later). Third, as we will see, its secondary structure model is quite different. Downstream from the 3’ end of the ATPase 6 gene,

the duplicate tRNA Met-2 sequence is given (position 6478), which is involved in the excision of @ senDNA (Cummings et al., 1987; Turker et al., 1987b). The segment of the chimaeric $ senDNA excised from this region extends down to position 7486, where it can again be noted that the sequence GGCGCAAGCTC is part of the excision sequence. The sequence then extends down to the DdeI site, which appears just before the URFQ’ sequence (see Fig. 1), reported as part of β senDNA (see Fig. 6; Cummings et al., 1985). It should be noted that all of the sequences presented in this section complete the sequence of β senDNA. For completeness, the structures for the two tRNA genes identified here, tRNA Asn and tRNA Met-2, are given in Figure 9. These two tRNAs bring to 24 the total number of tRNAs identified in the P. anserina mt genome, two of which are duplicated, tRNA Met-2 and tRNA Val (see Fig. 1).

(d) Secondary structure models

Thus far, we have presented the DNA sequences for the ND2, 3, 4 and ATPase 6 genes without any attempt to connect these physically separated mitochondrial genes. What seems to group these genes is their introns, both with regard to their secondary structure and their open reading frames. The secondary structure models for the ND3, ND4 and ATPase 6 introns are given in Figure 10. All three introns lack the additional 6 helix inserted 3’ to the f-oligonucleotide characteristic of class IA but not IB introns (Michel et al., 1982). Even at a distance, these secondary structures for the three genes appear to be remarkably similar. All three have an extended region with the same shape below the C-helix, designated as the C-domain (Michel & Cummings, 1985). The term C-domain was initially used to describe this region in the first two introns of the ND1 gene in P. anserina (Michel & Cummings, 1985). We suggest that all these introns constitute a new subclass of class I introns, class IC. The C-domain differs in these introns in the so-called Cl helix, which is present in the ND1, intron 2, ATPase 6 and ND3 introns, but is absent in the ND4 intron and appears as a 17 base unstructured bulge in ND1, intron 1. The extensive sequence similarity of the ND3 and ND4 introns with ND1, intron 2 and the ATPase 6 intron with the ND4 intron is boxed in Figure 10, but other relationships

exist. For example, the sequences in the b, c and d helices are quite similar in the ATPase 6 and ND3 introns, as are the sequences in the a, b, c, d and e helices for the ND3 and ND4 introns. Moreover, the shape of the D region is quite similar for all three introns, even to the presence of the ORFs. The ND4 and ATPase 6 introns appear to have “exchanged” the ORF position between the D2 and D3 helix locations. The D region of ND1 intron 2 is quite similar to these three introns as well, but that of ND1 intron 1 is not. Its D1 helix extends into a highly structured domain and its open reading frame is in the a domain (Michel & Cummings, 1985). Except for ND1 intron 2, none of these introns appears to show base-pairing between a so-called internal guide sequence and the 3’ exon sequence (Davies et al., 1982). The ND4 intron has the sequence pair AAAUCC and UGUUAGG, which could provide obvious complementarity but not precisely at the splice site. Repositioning the splice site at either the 5’ or 3’ end is not useful, since this affects the exon sequence. It is possible that these sequences serve as an RNA guide but not in the precise manner proposed. We should emphasize that we have

Figure 5. Alignment of the ND2 amino acid sequence with that of the N. crassa gene (deVries et al., 1986).

Figure 6. Alignment of the ND3 amino acid sequence with that of the liverwort (M. polymorpha) chloroplast gene (Ohyama et al., 1986). A similar sequence similarity of part of the ND3 gene from N. crassa was obtained with human ND3 (deVries et al., 1985).

Fig. 7.

eliminated the possibility of a one-base error at the a helix. Finally, these introns appear to be related in the complementary pairings thought to be necessary for maintaining the core structure of class I introns, the R-S, f-f’ or P7 pairings (Davies et al., 1982; Michel et al., 1982; Burke et al., 1987). A comparison for the relevant introns is shown in Figure 11, where it can be seen that the introns that appear to be closest, i.e. ND1 intron 2, ND3, ND4 and ATPase 6, all have the GCA upstream sequence and the extended A-U pairing. ND1 intron 1, which has the extended C-domain, is different, however. This intron also differs in its D domain, so it is not certain whether it belongs in the same subgroup as the other introns. Also included for comparison is the ATPase 6 intron from N. crassa (Morelli & Macino, 1984). This intron not only differs in its f-f’ pairing but also lacks the extended C-domain, differs in its D-domain and has an ORF extending from the a helix region. As we will see, however, its ORF is closely related to the ORFs of the P. anserina ND3 and ND4 introns.

(e) Intronic open reading frame analyses

The pervasiveness of the open reading frame sequence of the N. crassa ATPase 6 intron has been suggested by its sequence similarity with the open reading frames of the ND1 intron 2 (Michel &

Fig. 7.

Cummings, 1985), and the r2 intron of the large rRNA gene from P. anserina (Cummings et al., 1988c). This pervasiveness is even more apparent when we compare this N. crassa intron with the ND3 and ND4 introns of P. anserina (Fig. 12) and the ND1 intron from N. intermedia Varkud (Mota & Collins, 1988). This ND1 intron is quite interesting in that it occupies the same position in the ND1 exon sequence as the N. crassa intron, but its ORF is different in sequence and location. As shown by Burger & Werner (1986), the ORF of the N. crassa ND1 intron is not continuous with the exon sequence, whereas that of the N. intermedia intron is (Mota & Collins, 1988). However, both introns have essentially the same DNA sequence utilized in forming the secondary structure (Mota & Collins, 1988). Only its relative position is altered: at the 5’ end for the N. crassa intron and near the 3’ end for N. intermedia. A remnant of the N. crassa ORF is present in the N. intermedia intron but it is downstream from the secondary structure region. Mota & Collins (1988) proposed that this unusual organizational exchange of structural and ORF sequences in two related Neurospora strains indicated independent evolution of intronic structural and coding regions. Introns of the ND3, ND4 and N. crassa ATPase 6 genes all have open reading frames continuous with the upstream exons (see Figs 2 and 4; Morelli & Macino, 1984). The sequence similarity shown in Figure 12 begins a few bases

Fig. 7.

downstream from the relevant ORF. We also include the ORF from the r2 intron of the large rRNA gene of P. anserina (Cummings et al., 1988c). This ORF is not the primary ORF of the r2 intron; rather, it is a 111-residue oligopeptide between the end of the main ORF and the 3’ end of the intron. As can be noted, the sequence similarity is best for the upstream dodecapeptide but it is quite extensive for much of the length of all the ORFs. In some respects, the ND4 intron ORF is similar to the ND1 intron 2 and the r2 ORF. For the ND1 intron 2, no lengthy ORF was detected; rather, several ORFs of about 160 amino acid residues were found which showed a discontinuous sequence similarity with the N. crassa ATPase 6 intron ORF (Michel & Cummings, 1985). Here, the ND4 intron has an ORF of 246 amino acids before a stop codon at position 1447 (shown in Fig. 2). After this stop codon, however, the sequence similarity continues in the same reading frame. There are numerous stop codons and a two amino acid residue change of reading frame before a putative downstream dodecapeptide (YAIGLTSGDGCF) starting at position 1601 in Figure 2, coinciding with the dodecapeptide positions in the ND3 and N. crassa ATPase 6 introns. After analyzing these perturbations, we repeated the sequence analysis of this region by both 5’ and 3’ end labeling and found no errors. Similarly, the sequence similarity of the r2 ORF appears to start at the end of a dodecapeptide

Fig. 7.

sequence. It is as if recombinational events have taken place between the N. crassa ATPase 6 intron and the ND4 and r2 introns of P. anserina that preserve part of the intronic peptide sequence. This notion of a recombinational event is reinforced by the sequence similarity of the two Neurospora intronic open reading frames. Here, the organization of the two introns is quite similar. Both have continuous exon-intron reading frames with the secondary structure sequences located near the 3’ end of the intron, and the sequence similarity extends for the entire ORF. The ND3, N. crassa and N. intermedia introns all show good sequence similarity throughout their entire length (greater than 40% identical residues), suggesting a similar function in all three introns.
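As an illustration of how an identity figure such as the "greater than 40% identical residues" quoted above can be derived from a pairwise alignment, the following Python sketch counts identical residues over the columns in which both sequences carry a residue. The function name `percent_identity`, the use of `-` as the gap character, the choice of denominator and the toy sequences are assumptions made for illustration only; they are not taken from the alignments of Figure 12, and other conventions for the denominator (for example, the full alignment length) would give somewhat different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over ungapped alignment columns.

    Both inputs must already be aligned, i.e. have the same length,
    with `gap` marking insertions/deletions (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns where both sequences have a residue
    identical = 0  # compared columns where the residues match
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from the paper.
    toy_a = "MKLV-TGA"
    toy_b = "MRLVSTG-"
    print(f"{percent_identity(toy_a, toy_b):.1f}% identical")
```

Because gapped columns are excluded, a long insertion in one ORF (such as those visible in the intronic ORF comparisons) does not lower the reported identity under this convention; whether that is desirable depends on the comparison being made.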

The open reading frames of the ATPase 6 introns from N. crassa and P. anserina displayed very low similarity. Curiously, the two introns are very different in sequence. As indicated, the N. crassa intron lacks the extended C-domain and has its ORF in a different position of the secondary structure. Moreover, the P. anserina ATPase 6 intron lacks the dodecapeptide sequences and does not (except for 4 amino acid residues) have a long ORF continuous with the upstream exon sequence. The long ORF for the P. anserina intron starts well downstream at position 4925 (Fig. 7). As shown in Figure 8, the ATPase 6 exon sequences are also interrupted at entirely different positions. The closest relative of the 280 amino acid residue ORF of the P. anserina ATPase 6 intron is the 305-residue ORF of the N. crassa ND1 gene

Fig. 7.

(Burger & Werner, 1985). This intron’s ORF lacks the two dodecapeptide sequences and commences well downstream of the 5’ end of the intron. The amino acid sequence similarity is shown in Figure 13, where it can be noted that over the region in which the sequences are most similar (196 amino acids) there is 65% conservation of residues. Previously, Burger & Werner (1985) and Michel & Dujon (1986) found short stretches of amino acid similarity of the N. crassa ND1 intron and the TMd intron with the ND1 introns 1 and 3 of P. anserina (Cummings et al., 1985), especially in a KGGIY . YIG motif as underlined in Figure 13, and proposed that these introns have a novel type of class I intronic ORF. The stronger similarity with the P. anserina ATPase 6 intron ORF suggests that this intron also belongs to that subclass. It should also be mentioned that ORF 2 of the P. anserina ND1 intron 4 is quite similar to the single intron of the ND1 gene of N. crassa (Cummings et al., 1988a).

(f) Codon usage

The codon usage of the ND2, ND3, ND4, ATPase 6, URFC and ORF(C) genes and their introns is typically mitochondrial (not shown here). For all, the third position is mostly A or U (85 to 89% for the exons and 77 to 84% for the introns) as compared with 30% in the nuclear genes of N. crassa (Nargang et al., 1984) and except for

Fig. 7.

the short ND3 gene, all use UGA to code for tryptophan. Utilizing the polar residues designated by Capaldi & Vanderkooi (1972) (Asp, Asn, Glu, Gln, Lys, Ser, Arg, Thr and His), in general, the intronic ORFs have a higher polar content than do the exons. The ND3, ND4 and ATPase 6 intron ORFs have 47, 48 and 50% polar residues, respectively, suggesting that they are not membrane proteins. The exon sequences of ND2, ND3, ND4, ATPase 6 and URFC have 35, 32, 32, 33 and 33%, respectively, indicating that these proteins are more likely to be membrane associated. The exception to this distinction between the introns and exons is the so-called ORF(C), which has 47% polar residues. This is due primarily to the high content of Lys (10%) and, to a lesser extent, of Asp, although this residue is also abundant. All three introns also have a high percentage of Lys (10% for each), whereas the exons range between 2 and 4% in their Lys content. Whether this implies that ORF(C) is an intronic open reading frame or an unusual exon is not clear. We were not able to detect intronic consensus sequences. Another example of an exon sequence whose codon usage resembles that of an intron is URFN (Burger & Werner, 1986). This is an unusual unidentified 633 amino acid residue reading frame of N. crassa mtDNA which more closely resembles introns than exons, since it has a high Lys and Asp content (11 and 5%, respectively) and utilizes CGN codons at a relatively high frequency. ORF(C) more closely

Figure 7. DNA sequence of the region encompassing the URFC, ORF(C), ATPase 6 and tRNA Met-2 genes. The noncoding strand sequence starts at a Sau3A site (given in Fig. 6 of Cummings et al. (1985)) and continues for 8227 bp to a DdeI site in that same Fig. 6. The arrows below the line marked 789 to 817 indicate that about 400 bp are not reported here since they were given in Fig. 6 of Cummings et al. (1985). This Figure completes 48 kb of contiguous sequences. tRNA Asn is bracketed and restriction sites are marked. The reading frame for the ATPase 6 intron at position 4937 starts 12 bp upstream but only the first M is given.

resembles the other exon sequences in its CGN usage. In Saccharomyces cerevisiae as well there are three long ORFs, termed RF1, RF2 (Michel, 1984) and RF3 (Seraphin et al., 1985), of unknown function whose codon usage resembles that of ORF(C), especially with regard to the AAA (Lys) codon. Little sequence similarity with ORF(C) was detectable, and ORF(C) lacks the GC clusters common to these ORFs.
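The quantities discussed in this section (use of UGA for tryptophan, third-position A or U content, and polar or Lys content of a reading frame) are simple to tabulate. The Python sketch below shows one way this might be done. It is a minimal illustration under stated assumptions, not the procedure used for the analysis: the codon table is assumed to be the NCBI mold/protozoan mitochondrial code (translation table 4), which differs from the standard code only in reading TGA as Trp; the polar set is the list of residues named in the text; and the function names and toy sequences are hypothetical.

```python
BASES = "TCAG"
# Assumption: NCBI translation table 4 (mold/protozoan mitochondrial code).
# It is the standard code except that TGA encodes W (Trp) instead of stop.
TABLE4_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: TABLE4_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Polar residues as listed in the text:
# Asp, Asn, Glu, Gln, Lys, Ser, Arg, Thr and His.
POLAR = frozenset("DNEQKSRTH")


def _codons(coding_dna: str):
    """Yield in-frame codons (DNA alphabet), ignoring a trailing partial codon."""
    seq = coding_dna.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    for i in range(0, usable, 3):
        yield seq[i:i + 3]


def translate_mito(coding_dna: str) -> str:
    """Translate from the first base until the first stop codon (TAA/TAG)."""
    protein = []
    for codon in _codons(coding_dna):
        aa = CODON_TABLE.get(codon, "X")  # X marks an unrecognised codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def third_position_at_fraction(coding_dna: str) -> float:
    """Fraction of codons whose third base is A or T (U in the RNA)."""
    thirds = [codon[2] for codon in _codons(coding_dna)]
    if not thirds:
        return 0.0
    return sum(base in "AT" for base in thirds) / len(thirds)


def residue_fraction(protein: str, residues) -> float:
    """Fraction of residues in `protein` that belong to `residues`."""
    protein = protein.upper()
    if not protein:
        return 0.0
    return sum(aa in residues for aa in protein) / len(protein)


if __name__ == "__main__":
    toy_orf = "ATGTTATTTGGCAAATGATAA"  # hypothetical, not from the paper
    peptide = translate_mito(toy_orf)
    print(peptide)                                # TGA is read as W here
    print(third_position_at_fraction(toy_orf))    # A/T at third positions
    print(residue_fraction(peptide, POLAR))       # polar content
    print(residue_fraction(peptide, {"K"}))       # Lys content
```

Applied separately to exon and intron reading frames, functions of this kind would give the sort of percentages compared above; the values naturally depend on exactly which codons (for example, whether the stop codon) are included in the count.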

4. Discussion

(a) Subgroup class IC introns

P. anserina mitochondrial DNA is highly mosaic (Jamet-Vierny et al., 1984; Cummings et al., 1988a,b,c). It is nevertheless unusual to find that the singular characteristic linking three physically well-separated genes is the type of class I intron interrupting each gene. The introns are similar in two respects. First, they have a specific shape to the C-domain (the so-called P5a, b, c region in the nomenclature of Burke et al., 1987), striking enough to warrant classification as a new subgroup, class IC introns. Thus far, five P. anserina introns have this trait: the first two introns in the ND1 gene (Michel & Cummings, 1985) and those reported here in the ND3, 4 and ATPase 6 genes. Except for the first intron in ND1, these introns also seem to be related in the similarity of the f-oligonucleotide (P7) with regard to the upstream GCA unit and the A-U

Figure 8. Amino acid alignment of the ATPase 6 gene of P. anserina with the same gene in A. nidulans (Netzger et al., 1982) and N. crassa (Morelli & Macino, 1984). Arrows point to sites of interruption by an intron.

extension in the f-f’ pairing. Whether this trait constitutes subgrouping is not clear, but these four introns also share the property that their open reading frames emerge from the D-region (P9) of the secondary structure. A sixth P. anserina intron has properties quite similar to these. While we have not completely determined its sequence, intron 5 of the cytochrome oxidase subunit 1 gene (COI; Matsuura et al., 1986) closely resembles the ND1 intron 2 and ND3 introns in both secondary and primary structure with regard to the C-domain and f-f’ pairing (not shown). Its open reading frame also appears to emerge from the D region but this is in the uncompleted part of the sequence. Other introns also have their open reading frame in this position. For example, intron 4 of P. anserina ND1 and its counterpart in the N. crassa mitochondrial genome have their ORFs in the D2 region (Cummings et al., 1988a). Due to this location of the open reading frames just upstream from the 3’ splice site, the possibility exists for the use of alternative splice sites in the expression of the intronic reading frame, and these have been found in both these introns in the ND1 gene (Cummings et al., 1988a). No such sites have been found in the introns of the ND3, 4 or ATPase 6 genes. With regard to other organisms, the introns of the ND4L and ND5 genes of N. crassa appear to be of the class IC type (Nelson & Macino, 1987). The single 1490 bp intron of the ND4L gene is quite similar in secondary structure and primary sequence to the self-splicing nuclear large rRNA intron (Cech et al., 1981) as well as to the first intron (1820 bp) of the ND1 gene of P. anserina (Michel & Cummings, 1985). Like the ND1 intron 1 of P. anserina, the ND4L intron also has its ORF emerging from the a domain rather than the D region and both reading frames are

Figure 9. tRNA structures from tRNA Met-2 (Kochel et al., 1981) and tRNA Asn (Netzger et al., 1982).

Figure 10. Secondary structure models for class IC introns of the P. anserina ND3, ND4 and ATPase 6 genes. Note the similarity of the C-domains and the position of the ORFs in the D region for each. Regions are marked by letters according to Michel et al. (1982) and Michel & Cummings (1985) and in the P-classification suggested by Burke et al. (1987).

Figure 11. Comparison of the f-f’ (P7) pairings of introns 1 and 2 from ND1 (Michel & Cummings, 1985), ND3, 4 and ATPase 6 of P. anserina and N. crassa ATPase 6 (Morelli & Macino, 1984). Base-paired bases are blocked. See Figs 2, 4 and 7 for the complete DNA sequence and Fig. 12 for the secondary structures.

continuous with the exon reading frame. The secondary structure of the first intron (1408 bp) of the N. crassa ND5 gene has as its closest relative the second intron (2641 bp) of the ND1 gene of P. anserina, especially with respect to the primary sequence of the C-domain. As in the other P. anserina class IC introns, the ORF of this intron is also contained within the D region. The reading frame of this N. crassa ND5 intron is very different from that of the P. anserina ND1 intron 2, however, being one long ORF (372 amino acid residues) compared with several discontinuous 100-residue ORFs. The second ND5 intron (1135 bp) has no known close relatives. While it has an extended C-domain, the position of the distal helices seems to be rotated 180° relative to the orientation shown in Figure 10. The position of its ORF differs as well in emerging from the e region. Although there is some similarity of the ORFs of the N. crassa ND4L and the ND5 intron 1 with the P. anserina ND1 intron 2, their closest relative appears to be the ORF of the second intron of the N. crassa oli2 gene (Nelson & Macino, 1987). It may be necessary to subclassify the class IC introns. While this paper was in its final stages, Collins (1988) catalogued some 40 known class I introns into two main groups, those with a short (21 to 38 bases) and those with a long (59 to 295 bases) C-domain. Of the 17 introns with a long C-domain, only the N. crassa ND5 intron 1 and the P. anserina ND1 introns 1 and 2 discussed here and, to a lesser extent, the Ai intron of the S. cerevisiae cytochrome oxidase subunit I gene (Bonitz et al., 1980) had the shape of the C-domain displayed in Figure 10. Collins analyzed all these introns with respect to specific bases in each of the secondary helices in the C-domain and determined principally that the structure was maintained by compensatory pairing and that consensus sequences were not apparent. The one major feature of all these extended C-domain introns was a characteristic adenine-rich bulge, located just below the C1 helix (shown in Fig. 10) (AUAAG for ND3, AAAAUC for ND4 and AUAUA for ATPase 6), which was proposed to be involved in some intron-mediated reaction. One

deficiency in Collins’ analysis was the exclusion of the role of the open reading frame in intron function. As we have intimated, it may be that ORFs of particular function might habitate only those introns that allow the exon helical interaction necessary for expression. Regardless, the work presented here certainly supports the view that the shape of the C-domain is a characteristic and possibly functional feature of many class I introns. Second, the class IC introns in the P. anserina ND3 and ND4 genes are also closely rela.ted with respect to the sequence similarity of their open reading frames. When Morelli & Macino (1984) reported the intronic ORF in the ATPase 6 gene of N. crassa, they were unable to find sequence similarity with any other intron. Now, we can group this intronic ORF with the N. intermedia Varkud ND1 intron and with three introns in P. anserina: ND3 and 4 and the r2 ORF (Fig. 12). In a sense, all these introns represent examples of possible horizontal intron transfer (see Burke & RajBhandary, 1982; Lang, 1984; Dujon et al., 1986). For the NDl, ND3, ND4 and ATPase 6 and the rl intron of introns of P. anserina S. cerevisiae (Dujon et al., 1986), the secondary structure sequences appear to have transferred independently of their open reading frames. For ND3, 4 and r2 of Podospora, ATPase 6 of .V. crassa, and ND1 of N. intermedia (Mota & Collins, 1988) the intronic ORFs were mobilized separate from their secondary structures. That such a recombinational event is possible is illustrated by the sequence similarity between the N. crassa ATPase 6 intron and the r2 ORF2 in P. anserina, where sequence similarity starts at the distal end of the first dodecapeptide (see Fig. 12). Both the secondary

structure

atid

open

reading

frames

of

intron 4 of the ND1 gene of P. anserina share sequence similarity with the single intron of ND1 from N. crassa (Burger 8: Werner, 1985). lror some of the P. anserina introns the possibility exists that horizontal transfer occurred within the same genome, followed by independent evolutionary changes. This may well be involved with the ND4 intron where so many differences are found downstream from its continuous long open reading frame. It is also possible that one, or both, of the stop codons in the ND4 intron are suppressed for expression. Similarly, for two related Seurospora strains, a recombinational event may have occurred between the ATPase 6 and ND1 introns resulting in a chimaeric ND1 intronic structure. It would be interesting to determine the DNA sequence of the N. intermedia Varkud ATPase 6 intron to det,ermine if it has the ORF of the N. crassa ATPase 6 intron or the ND1 intron. Separate from the possibility of intron horizontal transfer, the transfer of an entire gene unit may also have happened with the ND2 and 3 genes of P. anserina and N. crassa (Figs 4 and 5). In both organisms these two genes are separated I,y just a TAA codon and the ND3 intron is in the identical location. This organization is in contrast to the

Figure 12. Amino acid alignment of the open reading frame of introns from N. crassa ATPase 6 (Morelli & Macino, 1984), P. anserina ND3, ND4 and r2, ORF2 (Cummings et al., 1988c) and N. intermedia Varkud ND1 (Mota & Collins, 1988). See Figs 4 and 7 for the position of dodecapeptides in ND3 and 4 introns. Dots indicate stop codons both internally and upstream from the 3’ end of the intron. The consensus sequence is given beneath each segment; capital letters indicate majority agreement; lower case, at least 2 codons agree. The ND4 sequence has 2 stop codons (underlined) and 2 reading frame shifts (/).
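The consensus convention stated in the legend to Figure 12 can be illustrated with a minimal sketch. The function name `consensus`, the toy aligned rows and the tie-breaking behaviour are assumptions made for illustration only; they are not taken from the alignment itself, and "majority" is interpreted here as a strict majority of the aligned rows.

```python
from collections import Counter

def consensus(rows, gap="-"):
    """Column-wise consensus for equal-length aligned sequences.

    Upper case: a strict majority of rows share the residue.
    Lower case: at least two rows (but no majority) share it.
    Gap character: no residue is shared by two rows.
    Ties are broken in favour of the residue seen first in the column
    (an assumption of this sketch, not a rule from the figure legend).
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must have equal length")
    out = []
    for column in zip(*rows):
        residues = [c.upper() for c in column if c != gap]
        if not residues:
            out.append(gap)
            continue
        residue, count = Counter(residues).most_common(1)[0]
        if count * 2 > len(rows):
            out.append(residue)          # strict majority of rows
        elif count >= 2:
            out.append(residue.lower())  # at least two rows agree
        else:
            out.append(gap)              # no agreement
    return "".join(out)

# Hypothetical toy alignment (not sequences from Figure 12).
toy_rows = ["ACDE", "ACDF", "ACGH", "AKGI"]
print(consensus(toy_rows))
```

For the toy rows, the first two columns have a majority residue, the third has two rows agreeing without a majority, and the fourth has no agreement.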

Figure 13. Amino acid alignment of the N. crassa ND1 intron (Burger & Werner, 1985) with the P. anserina ATPase 6 intron. The ATPase 6 intron starts at position 4937 in Fig. 7 at its first M, but the ND1 intron sequence starts 70 residues downstream from its first M.

structure of the A. nidulans genome where ND2 is separated from ND3 by the ND5 and OxiB genes and ND3 lacks an intron (Brown et al., 1985). The ND2 gene is also separated from the ND3 gene in higher organisms, where instead it is punctuated by tRNA genes. Fujii et al. (1988) showed that the ND2 DNA sequence from the frog Rana catesbeiana had excellent amino acid sequence similarity with this gene from human, bovine, mouse and Xenopus laevis mitochondria. They noted six regions of high sequence similarity and suggested that these domains might be involved in either the assembly or the function of the NADH dehydrogenase complex. deVries et al. (1985) identified the ND2 gene in N. crassa by virtue of its poor but significant amino acid sequence similarity with that sequence in human mitochondrial DNA. Some of the regions of best similarity in the N. crassa (and now P. anserina) gene are part of these domains (underlined in Fig. 5). For example, the YFL in the third line, APFHFVV in the fourth line, LAYS in the fifth line and PPL-GF in the seventh line are all within the highly conserved domains of higher organisms. But there are also similar identities outside these six domains (also underlined in Fig. 5). All these segments in the N. crassa/P. anserina ND2 gene are much shorter than the conserved domains in R. catesbeiana, suggesting that if these regions are essential for function then only small areas are critical. The N. crassa/P. anserina ND2 gene is also different in its length (about 200 amino acid residues longer, with most of these at the beginning of the gene) and codon usage. C is the preferred third base in the higher organism codons whereas A or T is preferred in the fungi. It would be of interest to have a larger data base to enable us to construct a phylogenetic tree for just the ND2 gene, since its gene structure as well as the sequence has evolved so much.
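The third-base preference noted above (C in higher organisms, A or T in the fungi) is the kind of statistic that can be tallied directly from a coding sequence. The following is a minimal sketch under stated assumptions: the function name `third_base_usage` and the two short toy sequences are hypothetical and are not taken from the ND2 genes discussed here, and the input is assumed to be an in-frame coding sequence with introns already removed.

```python
from collections import Counter

def third_base_usage(cds):
    """Return the fraction of codons ending in each of A, C, G and T.

    `cds` is assumed to be in frame, starting at the first base of a codon.
    A trailing partial codon is ignored; U is treated as T.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop any incomplete final codon
    thirds = Counter(seq[i + 2] for i in range(0, usable, 3))
    total = sum(thirds.values())
    return {base: (thirds[base] / total if total else 0.0) for base in "ACGT"}

# Hypothetical toy sequences chosen only to show the two tendencies.
at_rich = third_base_usage("ATTGCAAAATTA")   # codons ATT GCA AAA TTA
c_rich = third_base_usage("CTCGCCAAC")       # codons CTC GCC AAC
print(at_rich["A"] + at_rich["T"], c_rich["C"])
```

Summing the A and T fractions gives a single number that can be compared between genes or organisms; a real comparison would of course use the full exon sequences.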

(b) Other organizational features of the P. anserina mitochondrial genome

The P. anserina mt genome has numerous repetitive sequences. Some of the short repeats, for example GGCGCAAGCTC, have been reported here and elsewhere (Turker et al., 1987b) to be involved in the excision of several mt plasmids which occur during senescence. Other short repeats may be involved in the long-range interactions necessary for bringing the ends of these plasmids into close proximity prior to excision (Turker et al., 1987b). Two duplications are the tRNA genes for Met-2 and Val. tRNA Met-2 is also duplicated in N. crassa and for each organism this gene is involved in intramolecular recombination (Gross et al., 1984). For P. anserina, a chimaeric mt plasmid of 2.5 kb results and in N. crassa the entire mt genome is divided into two circular units. As yet we have no information on whether the duplicate tRNA Val is involved in an excision event. Other tRNA genes can also be duplicated. In A. nidulans (Brown et al., 1985) tRNA Cys and tRNA Asn are duplicated. For these genes, the duplications are contained as part of larger duplications. The tRNA genes of N. crassa, A. nidulans and P. anserina are similar also in their clustering either upstream or downstream from genes and it has been suggested that this clustering, or even single tRNAs, serves as recognition signals for gene expression (Burke & RajBhandary, 1982; Brown et al., 1985).

No function can be ascribed to the apparent duplication of a 175 bp unit in the 5’ region of URFC. This element of the sequence has not been found associated with the ends of any mt plasmids and the duplicated region is not in such close proximity to ATPase 6 as to act as a putative signal. The codon usage of the unduplicated part of ORF(C) is different from other exons and introns. This suggests the possibility that ORF(C) is not of mitochondrial origin. It is also possible that ORF(C) is or is part of a mosaic gene but we were not able to detect sequences to substantiate this. All these direct repeat duplications may simply serve to generate “slipped structures” that have been proposed to be involved in the regulation of mouse collagen genes (McKeon et al., 1984).

Finally, we note that the sequences presented here complete the 48 kb region depicted in Figure 1, which constitutes roughly half the entire mitochondrial genome of P. anserina.

This work was supported in part by a grant from the National Institutes of Health AG06320. We are grateful to Francois Michel for the construction of the secondary structure diagrams for the introns of ND3, 4 and ATPase 6 as well as for his keen interest.

References

Bernardi, G. (1979). Trends Biochem. Sci. 4, 197-201.
Bertrand, H., Collins, R. A., Stohl, L. L., Goewert, R. R. & Lambowitz, A. M. (1980). Proc. Nat. Acad. Sci., U.S.A. 77, 6032-6036.
Bonitz, S. G., Coruzzi, G., Thalenfeld, B. E., Tzagoloff, A. & Macino, G. (1980). J. Biol. Chem. 255, 11927-11941.
Brown, T. A., Davies, R. W., Ray, J. A., Waring, R. B. & Scazzocchio, C. (1983). EMBO J. 2, 427-435.
Brown, T. A., Waring, R. B., Scazzocchio, C. & Davies, R. W. (1985). Curr. Genet. 9, 113-117.
Burger, G. & Werner, S. (1985). J. Mol. Biol. 186, 231-242.
Burger, G. & Werner, S. (1986). J. Mol. Biol. 191, 589-599.
Burke, J. M. & RajBhandary, U. L. (1982). Cell, 31, 509-520.
Burke, J. M., Belfort, M., Cech, T. R., Davies, R. W., Schweyen, R. J., Shub, D. A., Szostak, J. W. & Tabak, H. F. (1987). Nucl. Acids Res. 15, 7217-7221.
Capaldi, R. A. & Vanderkooi, G. (1972). Proc. Nat. Acad. Sci., U.S.A. 69, 930-932.
Cech, T. R., Zaug, A. J. & Grabowski, P. J. (1981). Cell, 27, 487-496.
Chomyn, A., Mariottini, P., Cleeter, M. W. J., Ragan, C. I., Matsuno-Yagi, A., Hatefi, Y., Doolittle, R. F. & Attardi, G. (1985). Nature (London), 314, 592-597.
Collins, R. A. (1988). Nucl. Acids Res. 16, 2705-2715.
Cummings, D. J. & Wright, R. M. (1983). Nucl. Acids Res. 11, 2111-2119.
Cummings, D. J., Belcour, L. & Grandchamp, C. (1979a). Mol. Gen. Genet. 171, 229-238.
Cummings, D. J., Belcour, L. & Grandchamp, C. (1979b). Mol. Gen. Genet. 171, 239-250.
Cummings, D. J., MacNeil, I. A., Domenico, J. & Matsuura, E. T. (1985). J. Mol. Biol. 185, 659-680.
Cummings, D. J., Domenico, J. M. & Turker, M. S. (1987). In Plant Senescence: Its Biochemistry and Physiology (Thomson, W. W., Nothnagel, E. A. & Huffaker, R. C., eds), pp. 31-42, The American Society of Plant Physiologists.
Cummings, D. J., Domenico, J. M. & Michel, F. (1988a). Curr. Genet. In the press.
Cummings, D. J., Domenico, J., Nelson, J. & Sogin, M. L. (1988b). J. Mol. Evol. In the press.
Cummings, D. J., Domenico, J. & Nelson, J. (1988c). J. Mol. Evol. In the press.
Davies, R. W., Waring, R. B., Ray, J. A., Brown, T. A. & Scazzocchio, C. (1982). Nature (London), 300, 719-724.
deVries, H., deJonge, J. C. & Schrage, C. (1985). In Research (Quagliarello, E., Slater, E. C., Palmieri, F., Kroon, A. M. & Saccone, C., eds), Biogenesis vol. 2, pp. 285-291, Bari, Italy.
deVries, H., Alzner-DeWeerd, B., Breitenberger, C. A., Chang, D. D., deJonge, J. C. & RajBhandary, U. L. (1986). EMBO J. 5, 779-785.
Dujon, B., Colleaux, L., Jacquier, A., Michel, F. & Monteilhet, C. (1986). In Extrachromosomal Elements in Eukaryotes (Wickner, R. B., Hinnebusch, A., Lambowitz, A. M., Gunsalus, I. C. & Hollaender, A., eds), pp. 5-27, Plenum Press, New York.
Fujii, H., Shimada, T., Goto, Y. & Okazaki, T. (1988). J. Biochem. 103, 474-481.
Gross, S. R., Hsieh, T. S. & Levine, P. H. (1984). Cell, 38, 233-239.
Hensgens, L. A. M., Bonen, L., DeHaan, J., Venderhorst, G. & Grivell, L. A. (1983). Cell, 32, 379-389.
10, 59-67.
Lang, B. F. (1984). EMBO J. 3, 2129-2136.
Lazarus, C. M., Earl, A. J., Turner, G. & Kuntzel, H. (1980). Eur. J. Biochem. 106, 633-641.
Mannella, C. A., Goewert, R. & Lambowitz, A. M. (1979). Cell, 18, 1197-1209.
Matsuura, E. T., Domenico, J. M. & Cummings, D. J. (1986). Curr. Genet. 10, 915-922.
Maxam, A. M. & Gilbert, W. (1980). Methods Enzymol. 65, 499-560.
McKeon, C., Schmidt, A. & deCrombrugghe, B. (1984). J. Biol. Chem. 259, 6636-6640.
Michel, F. (1984). Curr. Genet. 8, 307-317.
Michel, F. & Cummings, D. J. (1985). Curr. Genet. 10, 69-79.
Michel, F. & Dujon, B. (1986). Cell, 46, 323.
Michel, F., Jacquier, A. & Dujon, B. (1982). Biochimie, 64, 867-881.
Morelli, G. & Macino, G. (1984). J. Mol. Biol. 178, 491-507.
Mota, E. M. & Collins, R. A. (1988). Nature (London), 332, 654-656.
Nargang, F. E., Bell, J. B., Stohl, L. L. & Lambowitz, A. (1984). Cell, 38, 441-453.
Nelson, M. A. & Macino, G. (1987). Mol. Gen. Genet. 206, 318-325.
Netzger, R., Kochel, H. G., Basak, N. & Kuntzel, H. (1982). Nucl. Acids Res. 10, 4783-4794.
Ohyama, K., Fukuzawa, H., Kohchi, T., Shirai, H., Sano, T., Sano, S., Umesono, K., Shiki, Y., Takeuchi, M., Chang, Z., Aota, S., Inokuchi, H. & Ozeki, H. (1986). Nature (London), 322, 572-574.
Osiewacz, H. D. & Esser, K. (1984). Curr. Genet. 8, 299-305.
Sanger, F., Nicklen, S. & Coulson, A. R. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 5463-5467.
Seilhamer, J. J., Olsen, G. J. & Cummings, D. J. (1984). J. Biol. Chem. 259, 5167-5172.
Seraphin, B., Simon, M. & Faye, G. (1985). Nucl. Acids Res. 13, 300-314.
Stahl, U., Lemke, P. A., Tudzynski, P., Kück, U. & Esser, K. (1978). Mol. Gen. Genet. 162, 341-343.
Turker, M. S., Domenico, J. M. & Cummings, D. J. (1987a). J. Biol. Chem. 262, 2250-2256.
Turker, M. S., Domenico, J. & Cummings, D. J. (1987b). J. Mol. Biol. 198, 171-185.
Turker, M. S., Nelson, J. G. & Cummings, D. J. (1987c). Mol. Cell. Biol. 7, 3199-3204.
Waring, R. B., Davies, R. W., Scazzocchio, C. & Brown, T. A. (1982). Proc. Nat. Acad. Sci., U.S.A. 79, 6332-6336.
Wright, R. M., Horrum, M. A. & Cummings, D. J. (1982). Cell, 29, 505-515.

Edited by N. L. Sternberg