J. Mol. Biol. (1988) 204, 815-839
Sequence Analysis of Mitochondrial DNA from Podospora anserina
Pervasiveness of a Class I Intron in Three Separate Genes
Donald J. Cummings and Joanne M. Domenico
Department of Microbiology and Immunology, University of Colorado School of Medicine, Denver, CO 80262, U.S.A.
(Received 17 May 1988, and in revised form 10 August 1988)
A 48 kb region of the 95 kb mitochondrial genome of Podospora anserina has been mapped and sequenced (1 kb = 10³ base-pairs). The DNA sequences of the genes for ND2, 3, 4, ATPase 6 and URFC are presented here. As in Neurospora crassa, the ND2 and 3 genes consist of a unit separated by one TAA stop codon. ND3, 4 and ATPase 6 are interrupted by class I introns. All three introns are remarkably similar in the C-domain of their secondary structure, sufficiently so to designate them as a new subgroup, class IC introns. The open reading frames of the ND3 and 4 introns bear a high sequence similarity to the open reading frame of the class IB introns of ATPase 6 from N. crassa and ND1 from Neurospora intermedia Varkud. We also show that the tRNA Met-2 gene is duplicated and is involved in a recombinational event. The 5’ region of URFC is also duplicated but no involvement of this gene with recombination or formation of plasmids is known. The evolutionary significance of the similarities of intron secondary structures and open reading frames of the ND3, 4 and ATPase 6 genes is discussed, including the possible separate evolution of structural and coding sequences.
1. Introduction
The circular mitochondrial (mt†) genome of the filamentous ascomycete Podospora anserina varies in size between 93 and 102 kb depending on the race studied (Cummings et al., 1979a; Kück et al., 1985). Like the petite mutation in yeast (see Bernardi, 1979), the stopper mutants in Neurospora (Mannella et al., 1979; Bertrand et al., 1980) and the ragged phenotype in Aspergillus (Lazarus et al., 1980), mtDNA from P. anserina is highly plastic, undergoing excision-amplification of specific gene coding regions. In all the cases studied thus far in Podospora, excision-amplification occurs in senescent cultures (Stahl et al., 1978; Cummings et al., 1979b; Jamet-Vierny et al., 1985; Wright et al., 1982) or in long-lived phenotypes (Turker et al., 1987a,b). The most commonly occurring mt “plasmid” has been termed α senDNA or plDNA (Stahl et al., 1978; Cummings et al., 1979b; Jamet-Vierny et al., 1984; Wright et al., 1982). DNA sequence analysis has shown that α senDNA is a complete class II intron of the cytochrome oxidase subunit 1 gene (COI; Cummings & Wright, 1983; Osiewacz & Esser, 1984; Cummings et al., 1985). Mitochondrial introns have been classified into two major classes or groups, I and II (Michel et al., 1982; Davies et al., 1982), depending on consensus sequences and folding characteristics.
Other mt plasmids have also been shown to involve introns. For example, ε senDNA contains part of the ND1 gene (formerly termed URF1 but now known to be part of the NADH respiratory chain complex: Chomyn et al., 1985). This gene has four large class I introns (Cummings et al., 1988a), two of which are almost completely contained within ε senDNA (Cummings et al., 1985; Michel & Cummings, 1985). Both of these class I introns showed extraordinary similarity with regard to both primary and secondary structure to the self-splicing intron of the large ribosomal nuclear gene of Tetrahymena (Cech et al., 1981).
Thus far the complete DNA sequence of a fungal mt genome has not been reported. The presence of so many introns on the 100 kb mt genome of P. anserina, as well as the possible involvement of intronic sequences in the formation of mt plasmids as part of a senescent process, has prompted us to undertake the DNA sequence analysis of this mt genome. Here, we have concentrated on the linearized region depicted in Figure 1 (a complete circular map has been given earlier: Cummings et al., 1985). The identity of some of these genes has been or is being reported elsewhere (ND1, Cummings et al., 1985; Cummings et al., 1988a,b,c) while others will be presented here. As can be noted, several mitochondrial plasmids originate in this 50 kb region, including ε senDNA, β senDNA (Cummings et al., 1985) and plasmids associated with longevity phenotypes (Turker et al., 1987a,b,c): ω senDNA, ψ senDNA, θ senDNA, LMt-1 and the SMt-1, 2 and 3 family.
In this paper, we present DNA sequence analyses of the ND2, 3 and 4 genes as well as ATPase 6. Physically, these genes are well separated, with ND4 being some 26 kb distant from the ND2, 3 unit and ATPase 6 being another 12 kb upstream. We present them together here because the class I introns contained within the ND4, ND3 and ATPase 6 genes are remarkably similar to each other with regard to both primary and secondary structure. In particular, one region of their secondary structure, the so-called C-domain (Michel et al., 1982), has sufficient structural resemblance to constitute a subclassification of these class I introns as class IC. There are (at least) two other mitochondrial introns with a similar structure, the two ND1 introns contained within ε senDNA (Michel & Cummings, 1985). Finally, the open reading frames (ORFs) of the class I introns in ND4 and ND3 exhibit high amino acid sequence similarity with each other as well as with the ORF of the class I intron of the ATPase 6 gene of N. crassa (Morelli & Macino, 1984).
† Abbreviations used: mt, mitochondrial; kb, 10³ base-pairs; ORF, open reading frame; bp, base-pair(s); URF, untranslated reading frame.
0022-2836/88/240815-25 $03.00/0 © 1988 Academic Press Limited
2. Materials and Methods
(a) DNA sequence analysis
P. anserina mitochondrial DNA clones for restriction fragments EcoRI-1, 3, 12 and 13 as well as PstI-4b, 4c, 6, 7, 9 and 10 from races s and A (+) mating types were prepared as described (Wright et al., 1982). The chemical sequencing method of Maxam & Gilbert (1980) was utilized throughout. Twice gel-purified DNA fragments were dephosphorylated with calf alkaline phosphatase (obtained from Boehringer-Mannheim) and then labeled at their 5’ ends using bacteriophage T4 polynucleotide kinase (Bethesda Research Laboratories) and [γ-32P]ATP (ICN). Uniquely labeled 5’ end fragments were obtained by a second restriction enzyme digestion followed by gel purification. When necessary, partially digested products were analyzed either before or after 32P-end labeling. In addition, 3’ labeling of some fragments (HinfI, TaqI and MspI products, for example) was done using a fill-in reaction with Klenow fragment (Bethesda Research Laboratories) and the appropriate [α-32P]dNTP (ICN) (Seilhamer et al., 1984). The combination of 5’ and 3’ labeling, together with numerous partial products, allowed us to overlap thoroughly the DNA sequences of both strands. Restriction endonucleases were obtained from Bethesda Research Laboratories and were used under the supplier's reaction conditions.
(b) Computer analysis
DNA sequences were analyzed using the programs provided by the BIONET National Computer Resources for Molecular Biology (funding is provided by the Biomedical Research Technology Program, Division of Research Resources, NIH grant RR01685).
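The identity percentages quoted in the Results come from the BIONET programs, whose internals are not described in this paper. Purely as an illustration of what such a figure measures, the sketch below computes percent identity and a consensus line for two sequences that have already been aligned. The function names, the use of "-" as the gap character, the choice of denominator (columns in which neither sequence is gapped) and the two short peptides are all assumptions made for this example; none of them is taken from the paper or from BIONET.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither sequence is gapped.

    Assumes the two strings are already aligned (equal length, gaps as `gap`).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def consensus_line(seq_a: str, seq_b: str, gap: str = "-") -> str:
    """Residue where both sequences agree, '-' elsewhere (one possible convention)."""
    return "".join(
        a if (a == b and a != gap) else "-" for a, b in zip(seq_a, seq_b)
    )


# Hypothetical aligned peptides, invented for illustration only.
toy_a = "MKTLLV-AGF"
toy_b = "MKSLLVQAGY"

print(round(percent_identity(toy_a, toy_b), 1))  # 77.8 (7 of 9 ungapped columns)
print(consensus_line(toy_a, toy_b))              # MK-LLV-AG-
```

Other denominators (the full alignment length, or the length of the shorter sequence) give different percentages for the same alignment, so a comparison of identity values is only meaningful when the convention is the same.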
3. Results
(a) DNA sequence of the ND4 gene
Cloned fragments from EcoRI-3 and 12 and PstI-6 and 9 were isolated and their DNA sequences determined using 5’-end-labeled BglII, DdeI, EcoRI, HaeIII, HindIII, HinfI, MspI, Sau3A and TaqI endonuclease fragments. The nucleotide sequence in the left-hand side of Figure 1 is presented in Figure 2, starting at an arbitrary position in the EcoRI-3 fragment. The start of the ND4 gene was determined using a computer comparison with the identified ND4 gene of Aspergillus nidulans (Brown et al., 1983). At nucleotide position 100, there begins an excellent alignment of the P. anserina open reading frame with the A. nidulans ND4 gene (Fig. 3). The P. anserina open reading frame continues upstream for another 93 nucleotides but the sequence similarity commences at position 100 here. The amino acid sequence similarity continues down to position 706 where the P. anserina sequence is interrupted by a 1399 bp class I intron. The A. nidulans sequence does not
have an intron (Brown et al., 1983). After this intron, at position 2106, the amino acid sequence similarity is restored and proceeds down to the end of the ND4 gene at position 3055. The two ND4 genes have 70% amino acid residue identity and are about the same size, 488 for A. nidulans and 519 for P. anserina, with most of the difference occurring near the 5’ portion of the gene. With regard to Figure 1, the EcoRI site separating EcoRI-7 from 12 is at position 937 and the two EcoRI sites separating EcoRI-12 from EcoRI-8a are at positions 2946 and 3208, respectively. Detailed analyses of the open reading frame and secondary structure for the ND4 intron will be presented later when the introns for ND3 and ATPase 6 are examined. Suffice it to state here that this intron has characteristics common to many class I introns. The last upstream exon base is a T and the last downstream intron base is a G; the f and f’ sequences necessary for maintaining the secondary core structure (see later) are boxed, as is the single rather than double dodecapeptide sequence common to maturase-type intronic sequences (Waring et al., 1982; Hensgens et al., 1983; Michel, 1984; Michel & Cummings, 1985). As we will see later when we analyze the intronic amino acid sequences of ND3 and ND4, it is possible that there is a second dodecapeptide sequence downstream; its nucleotide sequence is underlined starting at position 1601 (see Fig. 2).
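The ND4 coordinates given above can be cross-checked with a little arithmetic. The sketch below is only a consistency check, not part of the original analysis: it assumes that the quoted positions are 1-based and inclusive, that the first exon runs from 100 to 706 and the second from 2106 to 3055, and the helper names are invented for this example. Under those assumptions the gap between the exons equals the stated 1399 bp intron, and the exons together hold 519 codons, matching the size given for P. anserina (whether that count includes the stop codon is not resolved here).

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), assumed 1-based and inclusive


def exon_lengths(exons: List[Interval]) -> List[int]:
    """Length in nucleotides of each exon interval."""
    return [end - start + 1 for start, end in exons]


def codon_count(exons: List[Interval]) -> int:
    """Number of codons in the joined exons; the total must be a multiple of 3."""
    total = sum(exon_lengths(exons))
    if total % 3 != 0:
        raise ValueError(f"joined exon length {total} is not a multiple of 3")
    return total // 3


def gap_length(upstream: Interval, downstream: Interval) -> int:
    """Nucleotides lying strictly between two intervals (the intron, here)."""
    return downstream[0] - upstream[1] - 1


# Positions as quoted in the text for the ND4 gene in Figure 2.
nd4_exons = [(100, 706), (2106, 3055)]

print(exon_lengths(nd4_exons))                 # [607, 950]
print(gap_length(nd4_exons[0], nd4_exons[1]))  # 1399
print(codon_count(nd4_exons))                  # 519
```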
Figure 2. DNA sequence of the ND4 gene. The non-coding strand sequence starts arbitrarily upstream from the 1st ND4 exon and continues to the EcoRI site of EcoRI-8a. Restriction sites are marked. Intron sequence elements are boxed or underlined.
(b) DNA sequence of the ND2 and ND3 genes
The DNA sequences separating ND4 and ND2 and ND3 have been published elsewhere (EcoRI-8a and 9, Cummings et al., 1985; EcoRI-5 and 7, Turker et al., 1987a; Cummings et al., 1988a,c). To obtain the sequence for the ND2 and ND3 genes, only the EcoRI-1 fragment was required. As can be seen in Figure 1, EcoRI-1 also contains the genes for the small rRNA, COIII, several tRNAs, URFC and ATPase 6. The DNA sequence encompassing the ND2 and ND3 genes is given in Figure 4. The sequence starts within the 3’ end region of EcoRI-7 and continues through the contiguous EcoRI-1 fragment to include the complete sequence of tRNA Met-2. As can be noted in the map in Figure 1, tRNA Met-2 is duplicated at the 3’ end of EcoRI-1 (see later). Interestingly, ψ senDNA (Fig. 1) excises within this duplicated tRNA (Cummings et al., 1987; Turker et al., 1987a). ψ senDNA is actually a chimaeric 2.5 kb plasmid in that it results from a double cross-over event that removes almost the entire EcoRI-1 sequence. In this respect it is quite similar to the results reported by Gross et al. (1984), where it was shown that the tRNA Met-2 gene was also repeated in the N. crassa
Figure 3. Amino acid alignment of the ND4 exon sequence of P. anserina with that of A. nidulans (Netzger et al., 1982). The consensus sequence is given below each region. The alignment starts at position 100 and ends at 3055 in Fig. 2.
mt genome and was involved in recombination resulting in two smaller circular DNA molecules. In this case, however, the separating sequences were not excluded. As determined by a computer sequence similarity search, the ND2 gene starts downstream from the tRNA Met-2 gene at position 342 and continues to the stop codon at position 2010 for a total of 1668 bp. Remarkably, the ND2 and ND3 genes are separated by just this TAA codon and this exact situation prevails in the N. crassa mt genome (deVries et al., 1986). The amino acid sequence similarity for the P. anserina ND2 gene with that gene from N. crassa is given in Figure 5, where it can be seen that the P. anserina gene contains 556 codons and the N. crassa gene 583, and 78% of the residues are identical. As indicated, for both N. crassa and P. anserina an open reading frame starts immediately following the end of the ND2 gene. This ORF was identified as the ND3 gene by virtue of its amino acid sequence similarity with the ND3 gene of liverwort chloroplasts (Fig. 6; Ohyama et al., 1986), where it can be seen that there is a 27% amino acid identity and 64% of the amino acid residues are conserved. deVries et al. (1986) reported some of the 5’ region exon sequences of the ND3 gene and, of the 30 amino acid residues given in the N. crassa gene, 87% were identical with those shown here. The ND3 gene is quite short, being only 121 amino acid residues in liverwort chloroplast DNA and 130 in Podospora mitochondria. Shortly after the start of the ND3 gene, the DNA sequence is interrupted by a 1271 bp class I intron at position 2103 (shown in Fig. 4). The ND3 gene in N. crassa is also interrupted by a class I intron at this same position (deVries et al., 1985, 1986). As with many introns, the ORF of the exon is continuous with the intron ORF. Of the 19 amino acid residues given for the N. crassa ND3 intron, 90% are identical with the P. anserina ND3 intron. The P. anserina intron is quite typical. The last upstream exon base is a T and the last downstream intron base is a G, and it has both dodecapeptide sequences expected for a maturase-type protein. The f and f’ sequences are boxed. Just before the end of the intron, the 3’ excision site of ω senDNA is noted. As with many senDNAs, excision occurs within an 11 bp consensus repeat sequence GGCGCAAGCTC (Turker et al., 1987b). Finally, the DNA sequence in Figure 4 ends at an AluI site
which marks the beginning of the contiguous sequence containing the small rRNA gene and tRNA genes reported elsewhere (Cummings et al., 1988b). The secondary structure and ORF analyses will be presented in a later section.
(c) DNA sequence of the ATPase 6 gene region
This region of the map shown in Figure 1 will not be presented as a contiguous sequence, since some of it has been published as part of β senDNA (Cummings et al., 1985). Instead we will start the sequence just upstream from URFC at a Sau3A site (see Fig. 6 of Cummings et al., 1985). When we reported URFC we indicated that the sequence likely contained a few errors, since this particular region had few restriction sites. We repeated this sequence analysis by both chemical and enzymatic methods (Maxam & Gilbert, 1980; Sanger et al., 1977) and that sequence starts Figure 7 here. Cloned EcoRI-1 and 13 and PstI-4b, 4c, 7 and 10 were isolated and their DNA sequences analyzed using 5’-end (or 3’-end)-labeled BglII, ClaI, DdeI, EcoRI, HaeIII, HindIII, HinfI, MspI, NcoI, Sau3A, TaqI and XbaI endonuclease fragments. The sequences containing URFC and ATPase 6 as well as other genes are given in Figure 7. The open reading frame identified with the URFC gene of A. nidulans (Netzger et al., 1982;
Cummings et al., 1985) starts at position 145 and ends at position 807 for a total of 221 amino acids. Sequences starting upstream from this sequence are underlined in Figure 7 to indicate correspondence with another ORF which begins well downstream at position 1864. This ORF contains 296 amino acids and is labeled ORF(C) because it begins with a 175 bp unit that has 94% nucleotide identity with that sequence underlined as part of URFC (Fig. 7, positions 76 to 249 and 1794 to 1968). Because of this identity, we did a complete amino acid comparison with the URFC sequence of A. nidulans and P. anserina and found that near the 5’ end, URFC and ORF(C) of P. anserina are, as expected, quite similar and bear good similarity with the A. nidulans gene. The two P. anserina genes, however, show less similarity for the rest of the sequence. Over the entire length, P. anserina URFC shows 47% sequence similarity with URFC from A. nidulans but ORF(C) has only 16%. URFC and ORF(C) of P. anserina display 35% amino acid sequence similarity over their entire length, with the major difference in codon usage (see later) being the preponderant use of a lysine codon in ORF(C). The ORF(C) product is not known. About 700 bp downstream from the ORF(C) sequence there begins a sequence identified as ATPase 6 by a computer similarity search. Just upstream from this start there is a 52 bp palindromic sequence (underlined in Fig. 7) which we
reported as part of the LMt-1 plasmid 3’ excision site (Turker et al., 1987b). This palindrome could just as well be involved as a recognition site for the start of ATPase 6. Parenthetically, the 3’ excision site utilizes GAGCTTGCGCC, the complement of the 11 bp consensus sequence noted earlier. The total length of the exon sequence for ATPase 6 is 264 codons and it has 94% amino acid sequence identity with the ATPase 6 gene from N. crassa (Morelli & Macino, 1984). This sequence similarity is illustrated in Figure 8. Like the ATPase 6 gene in N. crassa, the P. anserina gene is interrupted by a class I intron. On the basis of the computer-generated similarity with A. nidulans (Netzger et al., 1982), the P. anserina gene does not have the
putative 93 bp intron (indicated by an arrow in Fig. 10) near the 5’ end of the N. crassa gene. Moreover, the 1694 bp intron for P. anserina is at a location different from that of the 1370 bp N. crassa intron. Like many other introns, both introns are inserted into highly conserved regions of the gene. The P. anserina intron differs from the N. crassa intron in other respects. First, it lacks the dodecapeptide sequences connected with maturase-like proteins. Second, its 280 amino acid ORF shares little sequence similarity with the N. crassa intron. Rather, it has its closest similarity with the N. crassa ND1 intron (see later). Third, as we will see, its secondary structure model is quite different. Downstream from the 3’ end of the ATPase 6 gene,
Figure 4. The DNA sequence of the ND2 and ND3 genes’ non-coding strand starts at the duplicated tRNA Met-2 gene spanning the EcoRI-7 and 1 junction and continues to the AluI site at position 3875, which starts the sequence upstream from tRNA Arg (Cummings et al., 1988b). ND3 intron sequences are marked and boxed.
the duplicate tRNA Met-2 sequence is given (position 6478), which is involved in the excision of ψ senDNA (Cummings et al., 1987; Turker et al., 1987b). The segment of the chimaeric ψ senDNA excised from this region extends down to position 7486, where it can again be noted that the sequence GGCGCAAGCTC is part of the excision sequence. The sequence then extends down to the DdeI site, which appears just before the URFQ’ sequence (see Fig. 1), reported as part of β senDNA (see Fig. 6; Cummings et al., 1985). It should be noted that all of the sequences presented in this section complete the sequence of β senDNA. For completeness, the structures for the two tRNA genes identified here, tRNA Asn and tRNA Met-2, are given in Figure 9. These two tRNAs bring to 24 the total number of tRNAs identified in the P. anserina mt genome, two of which are duplicated, tRNA Met-2 and tRNA Val (see Fig. 1).
(d) Secondary structure models
Thus far, we have presented the DNA sequences for the ND2, 3, 4 and ATPase 6 genes without any attempt to connect these physically separated mitochondrial genes. What seems to group these genes is their introns, both with regard to their secondary structure and their open reading frames. The secondary structure models for the ND3, ND4 and ATPase 6 introns are given in Figure 10. All three introns lack the additional 6 helix inserted 3’ to the f-oligonucleotide characteristic of class IA but not IB introns (Michel et al., 1982). Even at a distance, these secondary structures for the three genes appear to be remarkably similar. All three have an extended region with the same shape below the C-helix, designated as the C-domain (Michel & Cummings, 1985). The term C-domain was initially used to describe this region in the first two introns of the ND1 gene in P. anserina (Michel & Cummings, 1985). We suggest that all these introns constitute a new subclass of class I introns, class IC. The C-domain differs in these introns in the so-called C1 helix, which is present in the ND1 intron 2, ATPase 6 and ND3 introns, but is absent in the ND4 intron and appears as a 17 base unstructured bulge in ND1 intron 1. The extensive sequence similarity of the ND3 and ND4 introns with ND1 intron 2, and of the ATPase 6 intron with the ND4 intron, is boxed in Figure 10, but other relationships
Figure 5. Alignment of the ND2 amino acid sequence with that of the N. crassa gene (de Vries et al., 1986).
exist. For example, the sequences in the b, c and d helices are quite similar in the ATPase 6 and ND3 introns, as are the sequences in the a, b, c, d and e helices for the ND3 and ND4 introns. Moreover, the shape of the D region is quite similar for all three introns even to the presence of the ORFs. The ND4 and ATPase 6 introns appear to have “exchanged” the ORF position between the D2 and D3 helix locations. The D region of ND1 intron 2 is quite similar to these three introns as well but ND1 intron 1 is not. Its D1 helix extends into a highly structured domain and its open reading frame is in
the a domain (Michel & Cummings, 1985). Except for the ND1 intron 2, none of these introns appears to show base-pairing between a so-called internal guide sequence and the 3’ exon sequence (Davies et al., 1982). The ND4 intron has the sequence pair AAAUCC and UGUUAGG which could provide obvious complementarity but not precisely at the splice site. Repositioning the splice site at either the 5’ or 3’ end is not useful, since this affects the exon sequence. It is possible that these sequences serve as an RNA guide but not in the precise manner proposed. We should emphasize that we have
Figure 6. Alignment of the ND3 amino acid sequence with that of the liverwort (M. polymorpha) chloroplast gene (Ohyama et al., 1986). A similar sequence similarity of part of the ND3 gene from N. crassa was obtained with human ND3 (de Vries et al., 1985).
Fig. 7.
eliminated the possibility of a one base error at the a helix. Finally, these introns appear to be related in the complementary pairings thought to be necessary for maintaining the core structure of class I introns, the R-S, f-f’ or P7 pairings (Davies et al., 1982; Michel et al., 1982; Burke et al., 1987). A comparison for the relevant introns is shown in Figure 11, where it can be seen that those introns which appear to be closest, i.e. introns ND1 intron 2, ND3, ND4 and ATPase 6, all have the GCA upstream sequence and the extended A-U pairing. The ND1 intron 1, which has the extended C-domain is different, however. This intron is also different in its D domain so it is not certain if it belongs in the same
subgroup as the other introns. Also included for comparison is the ATPase 6 intron from N. crassa (Morelli & Macino, 1984). This intron not only differs in its f-f’ pairing but also lacks the extended C-domain, differs in its D-domain and has an ORF extending from the a helix region. As we will see, however, its ORF is closely related to the ORFs of the P. anserina ND3 and ND4 introns.
(e) Intronic open reading frame analyses
The pervasiveness of the open reading frame sequence of the N. crassa ATPase 6 intron has been suggested by its sequence similarity with the open reading frames of the ND1 intron 2 (Michel &
Fig. 7.
Cummings, 1985), and the r2 intron of the large rRNA gene from P. anserina (Cummings et al., 1988c). This pervasiveness is even more apparent when we compare this N. crassa intron with the ND3 and ND4 introns of P. anserina (Fig. 12) and the ND1 intron from N. intermedia Varkud (Mota & Collins, 1988). This ND1 intron is quite interesting in that it is in the identical position of the ND1 exon sequence as the N. crassa intron but its ORF is different in sequence and location. As shown by Burger & Werner (1986), the ORF of the N. crassa ND1 intron is not continuous with the exon sequence, whereas the N. intermedia intron is (Mota & Collins, 1988). However, both introns have essentially the same DNA sequence utilized in
forming the secondary structure (Mota & Collins, 1988). Only its relative position is altered: at the 5’ end for the N. crassa intron and near the 3’ end for N. intermedia. A remnant of the N. crassa ORF is present in the N. intermedia intron but it is downstream from the secondary structure region. Mota & Collins (1988) proposed that this unusual organizational exchange of structural and ORF sequences in two related Neurospora strains indicated independent evolution of intronic structural and coding regions. Introns of the ND3, ND4 and N. crassa ATPase 6 genes all have open reading frames continuous with the upstream exons (see Figs 2 and 4; Morelli & Macino, 1984). The sequence similarity shown in Figure 12 begins a few bases
Fig. 7.
downstream from the relevant ORF. We also include the ORF from the r2 intron of the large rRNA gene of P. anserina (Cummings et al., 1988c). This ORF is not the primary ORF of the r2 intron; rather, it is a 111 residue oligopeptide between the end of the main ORF and the 3’ end of the intron. As can be noted, the sequence similarity is best for the upstream dodecapeptide but it is quite extensive for much of the length of all the ORFs. In some respects, the ND4 intron ORF is similar to the ND1 intron 2 and the r2 ORF. For the ND1 intron 2, no lengthy ORF was detected; rather, several ORFs of about 160 amino acid residues were found which showed a discontinuous sequence similarity with the N. crassa ATPase 6 intron ORF
(Michel & Cummings, 1985). Here, the ND4 intron has an ORF of 246 amino acids before a stop codon at position 1447 (shown in Fig. 2). But after this stop codon, the sequence similarity continues in the same reading frame. There are numerous stop codons and a two amino acid residue change of reading frame before a putative downstream dodecapeptide (YAIGLTSGDGCF) starting at position 1601 in Figure 2, coinciding with the dodecapeptide positions in the ND3 and N. crassa ATPase 6 introns. After analyzing these perturbations, we repeated the sequence analysis of this region by both 5’ and 3’ end labeling and found no errors. Similarly, the sequence similarity of the r2 ORF appears to start at the end of a dodecapeptide
Fig. 7.
sequence. It is as if recombinational events have taken place between the N. crassa ATPase 6 intron and the ND4 and r2 introns of P. anserina which preserve part of the intronic peptide sequence. This notion of a recombinational event is reinforced by the sequence similarity of the two Neurospora intronic open reading frames. Here, the organization of the two introns is quite similar. Both have continuous exon-intron reading frames with the secondary structure sequences located near the 3’ end of the intron and the sequence similarity extends for the entire ORF. The ND3, N. crassa and N. intermedia introns all show good sequence similarity throughout their entire length (greater than 40% identical residues), suggesting a similar function in all three introns.
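As an illustrative aside, a figure such as "greater than 40% identical residues" can be obtained from any pairwise alignment along the following lines. This is only a minimal sketch, not the procedure used in this work. The function name `percent_identity`, the gap symbol and the decision to ignore columns containing a gap are all assumptions, and other denominators (for example the full alignment length) are also in common use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues between two pre-aligned sequences.

    Assumption: columns in which either sequence carries a gap are skipped,
    so the denominator is the number of gap-free columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # gap-free columns
    identical = 0  # gap-free columns with the same residue
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy sequences invented for illustration; they are not taken from Figure 12.
    print(percent_identity("ACDE-G", "ACDF-G"))
```

Applied to each pair of aligned intronic ORFs, a function of this kind would give the pairwise values against which a threshold such as 40% can be checked.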
The open reading frames of the ATPase 6 introns from N. crassa and P. anserina displayed very low similarity. Curiously, the two introns differ markedly in sequence. As indicated, the N. crassa intron lacks the extended C-domain and has its ORF in a different position of the secondary structure. Moreover, the P. anserina ATPase 6 intron lacks the dodecapeptide sequences and does not (except for 4 amino acid residues) have a long ORF continuous with the upstream exon sequence. The long ORF for the P. anserina intron starts well downstream at position 4925 (Fig. 7). And as shown in Figure 8, the ATPase 6 exon sequences are interrupted at entirely different positions. The closest relative of the P. anserina ATPase 6 280 amino acid residue ORF is the 305 residue ORF of the N. crassa ND1 gene
Fig. 7.
(Burger & Werner, 1985). This intron’s ORF lacks the two dodecapeptide sequences and its ORF commences well downstream of the 5’ end of the intron. The amino acid sequence similarity is shown in Figure 13, where it can be noted that where the sequences are most similar (196 amino acids) there is a 65% conservation of residues. Previously, Burger & Werner (1985) and Michel & Dujon (1986) found short stretches of amino acid similarity of the N. crassa ND1 intron and the T4 td intron with the ND1 introns 1 and 3 of P. anserina (Cummings et al., 1985), especially in a KGGIY . YIG motif as underlined in Figure 13, and proposed that these introns have a novel type of class I intronic ORFs. The stronger similarity with the P. anserina ATPase
6 intron ORF suggests that this intron also belongs to that subclass. It should also be mentioned that ORF 2 of the P. anserina ND1 intron 4 is quite similar to the single intron of the ND1 gene of N. crassa (Cummings et al., 1988a).
(f) Codon usage
The codon usage of the ND2, ND3, ND4, ATPase 6, URFC and ORF(C) genes and their introns is typically mitochondrial (not shown here). For all, the third position is mostly A or U (85 to 89% for the exons and 77 to 84% for the introns) as compared with 30% in the nuclear genes of N. crassa (Nargang et al., 1984) and except for
Fig. 7.
the short ND3 gene, all use UGA to code for tryptophan. Utilizing the polar residues designated by Capaldi & Vanderkooi (1972) (Asp, Asn, Glu, Gln, Lys, Ser, Arg, Thr and His), in general, the intronic ORF has a higher polar content than do the exons. The ND3, ND4 and ATPase 6 introns have 47, 48 and 50% polar residues, respectively, suggesting that they are not membrane proteins. The exon sequences of ND2, ND3, ND4, ATPase 6 and URFC have 35, 32, 32, 33 and 33%, indicating that these proteins are more likely to be membrane associated. The exception to this distinction between the introns and exons is the so-called ORF(C) which has 47% polar residues. Primarily this is due to the high content of Lys (10%) and
less so for Asp, although this residue is also high. All three introns also have a high percentage of Lys (10% for each) whereas the exons range between 2 and 4% for their Lys content. Whether this implies that ORF(C) is an intronic open reading frame or an unusual exon is not clear. We were not able to detect intronic consensus sequences. Another example of an exon sequence codon usage resembling an intron is that of URFN (Burger & Werner, 1986). This is an unusual unidentified 633 amino acid residue reading frame of N. crassa mtDNA which more closely resembles introns than exons, since it has a high Lys and Asp content (11 and 5%, respectively) and utilizes CGN codons at a relatively high frequency. ORF(C) more closely
Figure 7. DNA sequence of the region encompassing the URFC, ORF(C), ATPase 6 and tRNA Met-2 genes. The noncoding strand sequence starts at a Sau3A site (given in Fig. 6 of Cummings et al. (1985)) and continues for 8227 bp to a DdeI site in that same Fig. 6. The arrows below line marked 789 to 817 indicate that about 400 bp were not reported here since they were given in Fig. 6 of Cummings et al. (1985). This Figure completes 48 kb of contiguous sequences. tRNA Asn is bracketed and restriction sites are marked. The reading frame for the ATPase 6 intron at position 4937 starts 12 bp upstream but only the first M is given.
resembles the other exon sequences in its CGN usage. In Saccharomyces cerevisiae as well there are three long ORFs, termed RF1, RF2 (Michel, 1984) and RF3 (Seraphin et al., 1985), of unknown function whose codon usage resembles ORF(C), especially with regard to the AAA (Lys) codon. Little sequence similarity with ORF(C) was detectable, and ORF(C) lacks the GC clusters common to these ORFs.
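The quantities discussed in this section (third-position A or U content, in-frame UGA codons and the fraction of polar residues) are simple to tabulate from a sequence. The following is a minimal sketch under stated assumptions, not the analysis actually performed here. All function names are hypothetical, the reading frame is assumed to start at the first base, and the polar set is the one attributed above to Capaldi & Vanderkooi (1972) in one-letter code.

```python
# Polar residues as listed in the text: Asp, Asn, Glu, Gln, Lys, Ser, Arg, Thr, His.
POLAR = set("DNEQKSRTH")


def codons(seq: str) -> list:
    """Split a coding sequence into codons (frame assumed to start at base 1).

    T is converted to U; trailing bases that do not fill a codon are dropped.
    """
    rna = seq.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


def third_position_au_percent(seq: str) -> float:
    """Percentage of codons whose third base is A or U."""
    cs = codons(seq)
    if not cs:
        return 0.0
    hits = sum(1 for c in cs if c[2] in "AU")
    return 100.0 * hits / len(cs)


def count_in_frame_uga(seq: str) -> int:
    """Number of in-frame UGA codons (read as Trp in these mitochondrial genes)."""
    return sum(1 for c in codons(seq) if c == "UGA")


def polar_percent(protein: str) -> float:
    """Percentage of residues belonging to the POLAR set."""
    residues = [r for r in protein.upper() if r.isalpha()]
    if not residues:
        return 0.0
    return 100.0 * sum(1 for r in residues if r in POLAR) / len(residues)


if __name__ == "__main__":
    # Toy inputs invented for illustration; they are not taken from the figures.
    print(third_position_au_percent("ATGTTATGATAA"))
    print(count_in_frame_uga("ATGTTATGATAA"))
    print(polar_percent("MKDLLS"))
```

Running such functions separately on exon and intron reading frames would reproduce the kind of exon versus intron comparison made above, provided the same frame and residue set are used throughout.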
4. Discussion
(a) Subgroup class IC introns
P. anserina mitochondrial DNA is highly mosaic (Jamet-Vierny et al., 1984; Cummings et al., 1988a,b,c). It is nevertheless unusual to find that the singular characteristic linking three physically well-separated genes is the type of class I introns interrupting each gene. The introns are similar in two respects. First, they have a specific shape to the C-domain (the so-called P5a, b, c region in the nomenclature of Burke et al., 1987), striking enough to warrant classification as a new subgroup class IC intron. Thus far, five P. anserina introns have this trait: the first two introns in the ND1 gene (Michel & Cummings, 1985) and those reported here in the ND3, 4 and ATPase 6 genes. Except for the first intron in ND1, these introns also seem to be related in the similarity of the f-oligonucleotide (P7) with regard to the upstream GCA unit and the A-U
Figure 8. Amino acid alignment of the ATPase 6 gene of P. anserina with the same gene in A. nidulans (Netzger et al., 1982) and N. crassa (Morelli & Macino, 1984). Arrows point to sites of interruption by an intron.
extension in the f-f’ pairing. Whether this trait constitutes subgrouping is not clear, but these four introns also share the property that their open reading frames emerge from the D-region (P9) of the secondary structure. A sixth P. anserina intron has properties quite similar to these. While we have not completely determined its sequence, intron 5 of the cytochrome oxidase subunit 1 gene (COI; Matsuura et al., 1986) closely resembles the ND1 intron 2 and ND3 introns in both secondary and primary structure with regard to the C-domain and f-f’ pairing (not shown). Its open reading frame also appears to emerge from the D region but this is in the uncompleted part of the sequence. Other introns also have their open reading frame in this position. For example, intron 4 of P. anserina ND1 and its counterpart in the N. crassa mitochondrial genome have their ORFs in the D2 region (Cummings et al., 1988a). Due to this location of the open reading frames just upstream from the 3’ splice site, the possibility exists for the use of alternative splice sites in the expression of the intronic reading frame and these have been found in both these introns in the ND1 gene (Cummings et al., 1988a). No such sites have been found in the introns of the ND3, 4 or ATPase 6 genes. With regard to other organisms, the introns of the ND4L and ND5 genes of N. crassa appear to be of the class IC type (Nelson & Macino, 1987). The single 1490 bp intron of the ND4L gene is quite similar in secondary structure and primary sequence to the self-splicing nuclear large rRNA intron (Cech et al., 1981) as well as to the first intron (1820 bp) of the ND1 gene of P. anserina (Michel & Cummings, 1985). Like the ND1 intron 1 of P. anserina, the ND4L intron also has its ORF emerging from the a domain rather than the D region and both reading frames are
Figure 9. tRNA structures from tRNA Met-2 (Kochel et al., 1981) and tRNA Asn (Netzger et al., 1982).
Figure 10. Secondary structure models for class IC introns for P. anserina ND3, ND4 and ATPase 6 genes. Note the similarity of the C-domains and the position of the ORFs in the D region for each. Regions are marked by letters according to Michel et al. (1982) and Michel & Cummings (1985) and in the P-classification suggested by Burke et al. (1987).
Figure 11. Comparison of the f-f’ (P7) pairings of introns 1 and 2 from ND1 (Michel & Cummings, 1985), ND3, 4 and ATPase 6 of P. anserina and N. crassa ATPase 6 (Morelli & Macino, 1984). Base-paired bases are blocked. See Figs 2, 4 and 7 for the complete DNA sequence and Fig. 12 for the secondary structures.
continuous with the exon reading frame. The secondary structure of the first intron (1408 bp) of the N. crassa ND5 gene has as its closest relative the second intron (2641 bp) of the ND1 gene of P. anserina, especially with respect to the primary sequence of the C-domain. Like the other P. anserina class IC introns, this intron ORF is also contained within the D region. The reading frame of this ND5 N. crassa intron is very different from the P. anserina ND1 intron 2, however, being one long ORF (372 amino acid residues) compared with several discontinuous 100 residue ORFs. The second ND5 intron (1135 bp) has no known close relatives. While it has an extended C-domain, the position of the distal helices seems to be rotated 180° relative to the orientation shown in Figure 10. The position of its ORF differs as well in emerging from the e region. Although there is some similarity of the ORFs of the N. crassa ND4L and the ND5 intron 1 with the P. anserina ND1 intron 2, their closest relative appears to be the ORF of the second intron of the N. crassa oli2 gene (Nelson & Macino, 1987). It may be necessary to subclassify the class IC introns. While this paper was in its final stages, Collins (1988) catalogued some 40 known class I introns into two main groups, those with a short (21 to 38 bases) and those with a long (59 to 295 bases) C-domain. Of the 17 introns with a long C-domain, only the N. crassa ND5 intron 1 and the P. anserina ND1 introns 1 and 2 discussed here and, to a lesser extent, the Ai intron of the S. cerevisiae cytochrome oxidase subunit I gene (Bonitz et al., 1980) had the shape of the C-domain displayed in Figure 10. Collins analyzed all these introns with respect to specific bases in each of the secondary helices in the C-domain and determined principally that the structure was maintained by compensatory pairing and that consensus sequences were not apparent. The one major feature of all these extended C-domain introns was a characteristic adenine-rich bulge, located just below the C1 helix (shown in Fig. 10) (AUAAG for ND3, AAAAUC for ND4 and AUAUA for ATPase 6), which was proposed to be involved in some intron-mediated reaction. One
deficiency in Collins’ analysis was the exclusion of the role of the open reading frame in intron function. As we have intimated, it may be that ORFs of particular function might habitate only those introns that allow the exon helical interaction necessary for expression. Regardless, the work presented here certainly supports the view that the shape of the C-domain is a characteristic and possibly functional feature of many class I introns. Second, the class IC introns in the P. anserina ND3 and ND4 genes are also closely rela.ted with respect to the sequence similarity of their open reading frames. When Morelli & Macino (1984) reported the intronic ORF in the ATPase 6 gene of N. crassa, they were unable to find sequence similarity with any other intron. Now, we can group this intronic ORF with the N. intermedia Varkud ND1 intron and with three introns in P. anserina: ND3 and 4 and the r2 ORF (Fig. 12). In a sense, all these introns represent examples of possible horizontal intron transfer (see Burke & RajBhandary, 1982; Lang, 1984; Dujon et al., 1986). For the NDl, ND3, ND4 and ATPase 6 and the rl intron of introns of P. anserina S. cerevisiae (Dujon et al., 1986), the secondary structure sequences appear to have transferred independently of their open reading frames. For ND3, 4 and r2 of Podospora, ATPase 6 of .V. crassa, and ND1 of N. intermedia (Mota & Collins, 1988) the intronic ORFs were mobilized separate from their secondary structures. That such a recombinational event is possible is illustrated by the sequence similarity between the N. crassa ATPase 6 intron and the r2 ORF2 in P. anserina, where sequence similarity starts at the distal end of the first dodecapeptide (see Fig. 12). Both the secondary
structure and open reading frames of intron 4 of the ND1 gene of P. anserina share sequence similarity with the single intron of ND1 from N. crassa (Burger & Werner, 1985). For some of the P. anserina introns the possibility exists that horizontal transfer occurred within the same genome, followed by independent evolutionary changes. This may well have occurred with the ND4 intron, where so many differences are found downstream from its continuous long open reading frame. It is also possible that one, or both, of the stop codons in the ND4 intron are suppressed for expression. Similarly, for two related Neurospora strains, a recombinational event may have occurred between the ATPase 6 and ND1 introns resulting in a chimaeric ND1 intronic structure. It would be interesting to determine the DNA sequence of the N. intermedia Varkud ATPase 6 intron to determine if it has the ORF of the N. crassa ATPase 6 intron or the ND1 intron. Separate from the possibility of intron horizontal transfer, the transfer of an entire gene unit may also have happened with the ND2 and 3 genes of P. anserina and N. crassa (Figs 4 and 5). In both organisms these two genes are separated by just a TAA codon and the ND3 intron is in the identical location. This organization is in contrast to the
Figure 12. Amino acid alignment of the open reading frames of introns from N. crassa ATPase 6 (Morelli & Macino, 1984), P. anserina ND3, ND4 and r2 ORF2 (Cummings et al., 1988c) and N. intermedia Varkud ND1 (Mota & Collins, 1988). See Figs 4 and 7 for the position of dodecapeptides in ND3 and 4 introns. Dots indicate stop codons both internally and upstream from the 3’ end of the intron. The consensus sequence is given beneath each segment; capital letters indicate majority agreement; lower case letters indicate that at least 2 codons agree. The ND4 sequence has 2 stop codons (underlined) and 2 reading frame shifts (/).
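The consensus convention stated in the legend to Figure 12 (capital letter for majority agreement, lower case when at least 2 sequences agree) can be illustrated with a minimal sketch. The function name `consensus`, the use of "-" for both gaps and positions without agreement, and the toy sequences are assumptions made for illustration; they are not taken from the alignment itself, and the legend does not say how ties were resolved.

```python
from collections import Counter


def consensus(aligned, gap="-"):
    """Build a consensus line for equal-length aligned sequences.

    Assumed rule (paraphrasing the Figure 12 legend):
      - upper case if one residue occurs in more than half of the sequences;
      - lower case if the most common residue occurs at least twice;
      - the gap symbol otherwise.
    Gap characters in a column are ignored when counting residues.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    n = len(aligned)
    out = []
    for column in zip(*aligned):
        counts = Counter(r.upper() for r in column if r != gap)
        if not counts:
            out.append(gap)
            continue
        residue, count = counts.most_common(1)[0]
        if count * 2 > n:        # strict majority of all sequences
            out.append(residue)
        elif count >= 2:         # at least two sequences agree
            out.append(residue.lower())
        else:                    # no agreement at this position
            out.append(gap)
    return "".join(out)


# Hypothetical five-sequence toy alignment, not data from the paper.
toy = ["KDLSL", "KDYTL", "RDLKT", "KEIS-", "ADLAL"]
print(consensus(toy))
```

With five sequences, a residue needs three occurrences to be written in upper case, so a column in which only two sequences share a residue is reported in lower case.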
Figure 13. Amino acid alignment of the N. crassa ND1 intron (Burger & Werner, 1985) with the P. anserina ATPase 6 intron. The ATPase 6 intron starts at position 4937 in Fig. 7 at its first M, but the ND1 intron sequence starts 70 residues downstream from its first M.
structure of the A. nidulans genome where ND2 is separated from ND3 by the ND5 and OxiB genes and ND3 lacks an intron (Brown et al., 1985). The ND2 gene is also separated from the ND3 gene in higher organisms where instead it is punctuated by tRNA genes. Fujii et al. (1988) showed that the ND2 DNA sequence from the frog Rana catesbeiana had excellent amino acid sequence similarity with this gene from human, bovine, mouse and Xenopus laevis mitochondria. They noted six regions of high sequence similarity and suggested that these domains might be involved in either the assembly or the function of the NADH dehydrogenase complex. deVries et al. (1985) identified the ND2 gene in N. crassa by virtue of its poor but significant amino acid sequence similarity with that sequence in human mitochondrial DNA. Some of the regions of best similarity in the N. crassa (and now P. anserina) gene are part of these domains (underlined in Fig. 5). For example, the YFL in the third line, APFHFVV in the fourth line, LAYS in the fifth line and PPL-GF in the seventh line are all within the highly conserved domains of higher organisms. But there are also similar identities outside these six domains (also underlined in Fig. 5). All these segments in the N. crassa/P. anserina ND2 gene are much shorter than the conserved domains in R. catesbeiana, suggesting that if these regions are essential for function then only small areas are critical. The N. crassa/P. anserina ND2 gene also differs in its length (about 200 amino acid residues longer, with most of these at the beginning of the gene) and in codon usage. C is the preferred third base in the higher organism codons whereas A or T is preferred in the fungi. It would be of interest to have a larger data base to enable us to construct a phylogenetic tree for just the ND2 gene, since its gene structure as well as the sequence has evolved so much.
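The third-base preference noted above (C in higher organisms, A or T in the fungi) is the kind of quantity that can be tallied directly from an in-frame coding sequence. The following is a minimal sketch under stated assumptions: the function name `third_base_usage` is hypothetical, and the two short sequences are invented to show the contrast; neither is ND2 sequence from any organism.

```python
from collections import Counter


def third_base_usage(cds):
    """Return the fraction of A, C, G and T at codon position 3.

    Assumes `cds` is a DNA coding sequence that starts in frame.
    Trailing bases that do not complete a codon are ignored, as are
    third positions holding anything other than A, C, G or T.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    thirds = Counter(
        cds[i + 2] for i in range(0, usable, 3) if cds[i + 2] in "ACGT"
    )
    total = sum(thirds.values())
    if total == 0:
        return {}
    return {base: thirds.get(base, 0) / total for base in "ACGT"}


# Invented toy sequences (assumptions for illustration only).
toy_at_rich_third = "ATGTTAGCTATTTAA"   # codons end mostly in A or T
toy_c_rich_third = "ATGCTCGCCATCTAA"    # codons end mostly in C

for name, seq in [("A/T-rich", toy_at_rich_third), ("C-rich", toy_c_rich_third)]:
    usage = third_base_usage(seq)
    print(name, {base: round(frac, 2) for base, frac in usage.items()})
```

Only the third codon position is counted, so the tally does not depend on which genetic code is used to translate the sequence.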
(b) Other organizational features of the P. anserina mitochondrial genome

The P. anserina mt genome has numerous repetitive sequences. Some of the short repeats, for example GGCGCAAGCTC, have been reported here and elsewhere (Turker et al., 1987b) to be involved in the excision of several mt plasmids which occur during senescence. Other short repeats may be involved in the long-range interactions necessary for bringing the ends of these plasmids into close proximity prior to excision (Turker et al., 1987b). Two duplications are the tRNA genes for Met-2 and Val. tRNA Met-2 is also duplicated in N. crassa and for each organism this gene is involved in intramolecular recombination (Gross et al., 1984). For P. anserina, a chimaeric mt plasmid of 2.5 kb results and in N. crassa the entire mt genome is divided into two circular units. As yet we have no information on whether the duplicate tRNA Val is involved in an excision event. Other tRNA genes can also be duplicated. In A. nidulans (Brown et al., 1985) tRNA Cys and tRNA Asn are duplicated. For these genes, the duplications are contained as part of larger duplications. The tRNA genes of N. crassa, A. nidulans and P. anserina are similar also in their clustering either upstream or downstream from genes and it has been suggested that this clustering, or even single tRNAs, serves as recognition signals for gene expression (Burke & RajBhandary, 1982; Brown et al., 1985). No function can be ascribed to the apparent duplication of a 175 bp unit in the 5’ region of URFC. This element of the sequence has not been found associated with the ends of any mt plasmids and the duplicated region is not in such close proximity to ATPase 6 as to act as a putative signal. The codon usage of the unduplicated part of ORF(C) is different from other exons and introns. This suggests the possibility that ORF(C) is not of
mitochondrial origin. It is also possible that ORF(C) is or is part of a mosaic gene but we were not able to detect sequences to substantiate this. All these direct repeat duplications may simply serve to generate “slipped structures” that have been proposed to be involved in the regulation of mouse collagen genes (McKeon et al., 1984). Finally, we note that the sequences presented here complete the 48 kb region depicted in Figure 1, which constitutes roughly half the entire mitochondrial genome of P. anserina.

This work was supported in part by a grant from the National Institutes of Health AG06320. We are grateful to Francois Michel for the construction of the secondary structure diagrams for the introns of ND3, 4 and ATPase 6 as well as for his keen interest.
References

Bernardi, G. (1979). Trends Biochem. Sci. 4, 197-201.
Bertrand, H., Collins, R. A., Stohl, L. L., Goewert, R. R. & Lambowitz, A. M. (1980). Proc. Nat. Acad. Sci., U.S.A. 77, 6032-6036.
Bonitz, S. G., Coruzzi, G., Thalenfeld, B. E., Tzagoloff, A. & Macino, G. (1980). J. Biol. Chem. 255, 11927-11941.
Brown, T. A., Davies, R. W., Ray, J. A., Waring, R. B. & Scazzocchio, C. (1983). EMBO J. 2, 427-435.
Brown, T. A., Waring, R. B., Scazzocchio, C. & Davies, R. W. (1985). Curr. Genet. 9, 113-117.
Burger, G. & Werner, S. (1985). J. Mol. Biol. 186, 231-242.
Burger, G. & Werner, S. (1986). J. Mol. Biol. 91, 589-599.
Burke, J. M. & RajBhandary, U. L. (1982). Cell, 31, 509-520.
Burke, J. M., Belfort, M., Cech, T. R., Davies, R. W., Schweyen, R. J., Shub, D. A., Szostak, J. W. & Tabak, H. F. (1987). Nucl. Acids Res. 15, 7217-7221.
Capaldi, R. A. & Vanderkooi, G. (1972). Proc. Nat. Acad. Sci., U.S.A. 69, 930-932.
Cech, T. R., Zaug, A. J. & Grabowski, P. J. (1981). Cell, 27, 487-496.
Chomyn, A., Mariottini, P., Cleeter, M. W. J., Ragan, C. I., Matsuno-Yagi, A., Hatefi, Y., Doolittle, R. F. & Attardi, G. (1985). Nature (London), 314, 592-597.
Collins, R. A. (1988). Nucl. Acids Res. 16, 2705-2715.
Cummings, D. J. & Wright, R. M. (1983). Nucl. Acids Res. 11, 2111-2119.
Cummings, D. J., Belcour, L. & Grandchamp, C. (1979a). Mol. Gen. Genet. 171, 229-238.
Cummings, D. J., Belcour, L. & Grandchamp, C. (1979b). Mol. Gen. Genet. 171, 239-250.
Cummings, D. J., MacNeil, I. A., Domenico, J. & Matsuura, E. T. (1985). J. Mol. Biol. 185, 659-680.
Cummings, D. J., Domenico, J. M. & Turker, M. S. (1987). In Plant Senescence: Its Biochemistry and Physiology (Thomson, W. W., Nothnagel, E. A. & Huffaker, R. C., eds), pp. 31-42, The American Society of Plant Physiologists.
Cummings, D. J., Domenico, J. M. & Michel, F. (1988a). Curr. Genet. In the press.
Cummings, D. J., Domenico, J., Nelson, J. & Sogin, M. L. (1988b). J. Mol. Evol. In the press.
Cummings, D. J., Domenico, J. & Nelson, J. (1988c). J. Mol. Evol. In the press.
Davies, R. W., Waring, R. B., Ray, J. A., Brown, T. A. & Scazzocchio, C. (1982). Nature (London), 300, 719-724.
deVries, H., deJonge, J. C. & Schrage, C. (1985). In Research (Quagliarello, E., Slater, E. C., Palmieri, F., Kroon, A. M. & Saccone, C., eds), Biogenesis vol. 2, pp. 285-291, Bari, Italy.
deVries, H., Alzner-DeWeerd, B., Breitenberger, C. A., Chang, D. D., deJonge, J. C. & RajBhandary, U. L. (1986). EMBO J. 5, 779-785.
Dujon, B., Colleaux, L., Jacquier, A., Michel, F. & Monteilhet, C. (1986). In Extrachromosomal Elements in Eukaryotes (Wickner, R. B., Hinnebusch, A., Lambowitz, A. M., Gunsalus, I. C. & Hollaender, A., eds), pp. 5-27, Plenum Press, New York.
Fujii, H., Shimada, T., Goto, Y. & Okazaki, T. (1988). J. Biochem. 103, 474-481.
Gross, S. R., Hsieh, T. S. & Levine, P. H. (1984). Cell, 38, 233-239.
Hensgens, L. A. M., Bonen, L., DeHaan, J., Venderhorst, G. & Grivell, L. A. (1983). Cell, 32, 379-389.
10, 59-67.
Lang, B. F. (1984). EMBO J. 3, 2129-2136.
Lazarus, C. M., Earl, A. J., Turner, G. & Kuntzel, H. (1980). Eur. J. Biochem. 106, 633-641.
Mannella, C. A., Goewert, R. & Lambowitz, A. M. (1979). Cell, 18, 1197-1209.
Matsuura, E. T., Domenico, J. M. & Cummings, D. J. (1986). Curr. Genet. 10, 915-922.
Maxam, A. M. & Gilbert, W. (1980). Methods Enzymol. 65, 499-560.
McKeon, C., Schmidt, A. & deCrombrugghe, B. (1984). J. Biol. Chem. 259, 6636-6640.
Michel, F. (1984). Curr. Genet. 8, 307-317.
Michel, F. & Cummings, D. J. (1985). Curr. Genet. 10, 69-79.
Michel, F. & Dujon, B. (1986). Cell, 46, 323.
Michel, F., Jacquier, A. & Dujon, B. (1982). Biochimie, 64, 867-881.
Morelli, G. & Macino, G. (1984). J. Mol. Biol. 178, 491-507.
Mota, E. M. & Collins, R. A. (1988). Nature (London), 322, 654-656.
Nargang, F. E., Bell, J. B., Stohl, L. L. & Lambowitz, A. (1984). Cell, 38, 441-453.
Nelson, M. A. & Macino, G. (1987). Mol. Gen. Genet. 206, 318-325.
Netzker, R., Kochel, H. G., Basak, N. & Kuntzel, H. (1982). Nucl. Acids Res. 10, 4783-4794.
Ohyama, K., Fukuzawa, H., Kohchi, T., Shirai, H., Sano, T., Sano, S., Umesono, K., Shiki, Y., Takeuchi, M., Chang, Z., Aota, S., Inokuchi, H. & Ozeki, H. (1986). Nature (London), 322, 572-574.
Osiewacz, H. D. & Esser, K. (1984). Curr. Genet. 8, 299-305.
Sanger, F., Nicklen, S. & Coulson, A. R. (1977). Proc. Nat. Acad. Sci., U.S.A. 73, 5463-5467.
Seilhamer, J. J., Olsen, G. J. & Cummings, D. J. (1984). J. Biol. Chem. 259, 5167-5172.
Seraphin, B., Simon, M. & Faye, G. (1985). Nucl. Acids Res. 13, 300-314.
Stahl, U., Lemke, P. A., Tudzynski, P., Kück, U. & Esser, K. (1978). Mol. Gen. Genet. 162, 341-343.
Turker, M. S., Domenico, J. M. & Cummings, D. J. (1987a). J. Biol. Chem. 262, 2250-2256.
Turker, M. S., Domenico, J. & Cummings, D. J. (1987b). J. Mol. Biol. 198, 171-185.
Turker, M. S., Nelson, J. G. & Cummings, D. J. (1987c). Mol. Cell. Biol. 7, 3199-3204.
Waring, R. B., Davies, R. W., Scazzocchio, C. & Brown, T. A. (1982). Proc. Nat. Acad. Sci., U.S.A. 79, 6332-6336.
Wright, R. M., Horrum, M. A. & Cummings, D. J. (1982). Cell, 29, 505-515.
Edited by N. L. Sternberg