J. Mol. Biol. (1988) 203, 299-331
Structure and Organization of Marchantia polymorpha Chloroplast Genome
II. Gene Organization of the Large Single Copy Region from rps'12 to atpB
Kazuhiko Umesono¹†, Hachiro Inokuchi¹, Yasuhiko Shiki¹, Masayuki Takeuchi¹, Zhen Chang¹, Hideya Fukuzawa²‡, Takayuki Kohchi², Hiromasa Shirai¹§, Kanji Ohyama² and Haruo Ozeki¹‖
¹ Department of Biophysics, Faculty of Science and ² Research Center for Cell and Tissue Culture, Faculty of Agriculture, Kyoto University, Kyoto 606, Japan
(Received 24 June 1987, and in revised form 14 April 1988)
The nucleotide sequence (56,410 base-pairs) of the large single-copy region of chloroplast DNA from the liverwort Marchantia polymorpha has been determined. The sequence starts from one end (J_LA) of the large single-copy region and encompasses genes for 21 tRNAs, six ATPase subunits (atpA, atpB, atpE, atpF, atpH and atpI), two photosystem I polypeptides (psaA and psaB), four photosystem II polypeptides (psbA, psbC, psbD and psbG), five ribosomal proteins (rps2, rps4, rps7, rps'12 and rps14), and three RNA polymerase subunits (rpoB, rpoC1 and rpoC2). In addition, we detected 18 open reading frames ranging from 29 to 2136 amino acid residues long, four of which share significant amino acid sequence homology to those of an Escherichia coli malK protein (designated mbpX), human mitochondrial ND2 (ndh2) and ND3 (ndh3) of a respiratory chain NADH dehydrogenase, or a bacterial antenna protein of a light-harvesting complex (lhcA). Sequence analysis suggests that four tRNA genes and six protein genes might be split by introns; they are trnG(UCC), trnK(UUU), trnL(UAA), trnV(UAC), atpF, ndh2, rpoC1, rps'12, ORF135 and ORF167. In the large single-copy region described here, the gene organization deduced is highly conserved with respect to that of higher plants, but an inversion of some 30,000 base-pairs flanked by trnL(CAA) and trnD(GUC) was seen between the liverwort and tobacco chloroplast genomes.
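The effect of an inversion on a gene map can be pictured with a small sketch. The gene names and order below are hypothetical placeholders, not the liverwort or tobacco map; the sketch only illustrates that inverting a segment reverses the order of the genes inside it and flips their strands, while the flanking genes stay in place.

```python
from typing import List, Tuple

Gene = Tuple[str, str]  # (name, strand), where strand is "+" or "-"


def invert_segment(genes: List[Gene], first: int, last: int) -> List[Gene]:
    """Return a copy of `genes` with genes[first..last] (inclusive) inverted."""
    if not (0 <= first <= last < len(genes)):
        raise ValueError("segment indices out of range")
    flip = {"+": "-", "-": "+"}
    # Reverse the order of the segment and switch each gene to the other strand.
    inverted = [(name, flip[strand])
                for name, strand in reversed(genes[first:last + 1])]
    return genes[:first] + inverted + genes[last + 1:]


# Hypothetical gene order (placeholder names, for illustration only).
toy_map = [("flankL", "+"), ("geneA", "+"), ("geneB", "-"),
           ("geneC", "+"), ("flankR", "-")]
rearranged = invert_segment(toy_map, 1, 3)
print(rearranged)
```

Applying the same inversion a second time restores the original order, which is why two genomes related by a single inversion can be compared symmetrically.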
1. Introduction
In this paper, we discuss the gene organization in about 70% of the large single copy (LSC¶) region from the IR_A-LSC junction. Portions of the DNA sequence have been published (Umesono et al., 1984; Fukuzawa et al., 1986).
2. Materials and Methods
Sequencing of the cloned chloroplast DNA molecules and analysis of the nucleotide sequence are described in an accompanying paper (Ohyama et al., 1988).
† Present address: Gene Expression Laboratory, The Salk Institute, San Diego, CA 92138-9216, U.S.A.
‡ Present address: Institute of Applied Microbiology, University of Tokyo, Tokyo, 113, Japan.
§ Present address: R & D Center, Unitika Ltd., Uji, Kyoto, 611, Japan.
‖ Author to whom all correspondence should be addressed.
¶ Abbreviations used: LSC, large single-copy; IR, inverted repeat; bp, base-pair(s); kb, 10^3 base-pairs; ORF, open reading frame; SD, Shine & Dalgarno.
0022-2836/88/160299-33 $03.00/0 © 1988 Academic Press Limited
3. Results
The entire gene organization of this region is presented in Figure 1. The DNA sequences with deduced coding and amino acid sequences are shown in Figures 2 and 3, which correspond to the a and b regions, respectively, in Figure 1. Nucleotides in the LSC region are numbered with the end proximal to the IR_A region as 1 (Ohyama et al., 1988). Figure 2 shows the sequence corresponding to nucleotide numbers 1 to 42,240. Figure 3 shows the reverse strand from nucleotides 56,410 to 42,011, with an overlap of 230 nucleotides (42,011 to 42,240) with the sequence shown in Figure 2.
Figure 1. Gene organization in a region from J_LA (open triangle) to atpB of the LSC region. Genes identified are indicated in either the upper or the lower position of the linearized DNA strands. The letters (a) and (b) with arrows indicate the sequence file names described in the legend to Figs 2 and 3. Nomenclature of the genes is described in an accompanying paper (Ohyama et al., 1988). Coding or exon sequences are represented by filled boxes, and introns by hatched boxes. Numbers indicate the length from J_LA in base-pairs x 10^-3.
(a) Genes for transfer RNAs
We detected 21 possible tRNA genes in the region described here. None of them was duplicated in the entire liverwort chloroplast genome. Judging from the predicted anticodon sequences, they would encode three isoacceptors for serine, two for glycine, leucine, methionine (an initiator and an elongator) and threonine, and single species each for arginine, aspartic acid, cysteine, glutamic acid, glutamine, histidine, lysine, phenylalanine, tyrosine and valine. None of the tRNA genes encode the mature CCA sequence at their 3' termini, in common with the other tRNA genes detected in the liverwort chloroplast genome. The predicted nucleotide sequences of tRNA molecules are compiled in Table 5 of an accompanying paper (Ohyama et al., 1988).
Three serine tRNA genes, trnS(GCU) (22,892 to 22,979), trnS(UGA) (41,494 to 41,407) and trnS(GGA) (48,845 to 48,932), are the same length in their coding regions with long loop sequences of 19 nucleotides. They are scattered in the LSC region (Fig. 1), and no other serine tRNA gene was found throughout the chloroplast genome. Cloverleaf structures of their products seem to be normal except for a short D-stem (2 bp) in the trnS(GCU) product caused by C12-A24 and G13-A23 mispairings.
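The counts and coordinates quoted above can be cross-checked with a few lines of arithmetic. This is a minimal sketch: the helper names are illustrative, and it assumes that a coordinate pair written in descending order denotes a gene on the complementary strand.

```python
# Number of tRNA isoacceptors per amino acid, as listed in the text.
ISOACCEPTORS = {
    "Ser": 3,
    "Gly": 2, "Leu": 2, "Met": 2, "Thr": 2,
    "Arg": 1, "Asp": 1, "Cys": 1, "Glu": 1, "Gln": 1,
    "His": 1, "Lys": 1, "Phe": 1, "Tyr": 1, "Val": 1,
}

# Coordinates of the three serine tRNA genes, as quoted in the text.
SERINE_GENES = {
    "trnS(GCU)": (22892, 22979),
    "trnS(UGA)": (41494, 41407),
    "trnS(GGA)": (48845, 48932),
}


def span_length(start: int, end: int) -> int:
    """Length in nucleotides of an inclusive coordinate range."""
    return abs(end - start) + 1


def strand(start: int, end: int) -> str:
    """Assumption: descending coordinates mean the complementary strand."""
    return "+" if start <= end else "-"


total = sum(ISOACCEPTORS.values())
lengths = {name: span_length(*coords) for name, coords in SERINE_GENES.items()}

print("tRNA genes tallied:", total)
for name, coords in SERINE_GENES.items():
    print(name, lengths[name], "nt, strand", strand(*coords))
```

With the values quoted in the text, the tally comes to 21 genes and each serine tRNA gene spans 88 nucleotides. The same helpers apply to any other coordinate pair given in this section.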
Two of the three chloroplast leucine tRNA genes, trnL(CAA) (3679 to 3758) and trnL(UAA) (50,522 to 50,921), were located 46.8 kb apart in the LSC region, and the other, trnL(UAG), has been mapped in the SSC region (Kohchi et al., 1988). The trnL(UAA) is split by an intron of 315 nucleotides in the middle of the anticodon loop (T35-A36; Tables 2 and 5 of Ohyama et al., 1988). A split structure for trnL(UAA) has been reported for maize (Steinmetz et al., 1982), Vicia faba (Bonnard et al., 1984) and tobacco (Yamada et al., 1986); the length of the intron varies from 315 (liverwort) to 501 (tobacco) nucleotides. As was pointed out by Bonnard et al. (1984), the intron of the liverwort trnL(UAA) shares some sequences similar to those found in fungal mitochondrial group I introns (Davies et al., 1982; Michel et al., 1982; Michel & Dujon, 1983; and see Fig. 4). These sequences are essential for self-splicing in vitro (Burke et al., 1986). However, the liverwort trnL(UAA) intron did not undergo such a reaction using artificial precursors under conditions similar to those used for the self-splicing of Tetrahymena L-rRNA (Kruger et al., 1982) and yeast mitochondrial L-rRNA (van der Horst & Tabak, 1985; K. Umesono, unpublished results).
Genes encoding glycine tRNAs, trnG(GCC) (42,035 to 42,105) and trnG(UCC) (22,047 to 21,385), are 20 kb apart on different DNA strands. The former tRNA would contain a mismatched base-pairing between C26 and A42, making an anticodon stem of 4 bp (Umesono et al., 1984). The cloverleaf
Fig. 2. Nucleotide sequence with deduced amino acid sequences, nucleotides 1 to 3600 (panel heading: rps'12 to trnfM(CAU)). Labelled features: rps'12 (split by an intron), rps7 and ndh2 (split by an intron).
Fig. 2, cont. Nucleotides 3601 to 7080. Labelled features: trnL(CAA), ORF34, ORF135 (split by an intron) and trnC(GCA) (complementary strand).
Fig. 2, cont. Nucleotides 7081 to 10,680. Labelled features: rpoC1 (split by an intron).
Fig. 2, cont. Nucleotides 10,681 to 14,280. Labelled features: rpoC2.
Liverwort
Chloroplast
Genome.
305
II
14400
14520
TTTCAAAAAATGTTTTAAAnAAAAACTATinTGATCATTiATAATAT~TAATCAAAAiAATGGAATG~ SKNVLKKNYYDHFFSISKNELKNKKQGVIRIISNQNNGMQ
14760
15000
AATTGCTTTiCGAAAATTTiGTGATATCTAAATATAAAACTTATAAGTA~AAATATAAA~TArTTTATT~TTCGATTAG~TAAACCTTA~CTGGCGACT~ L L F E N F V I S K Y K T S Y P S G 0 I I S I N I N YF I I R
15120 L
A
K
P
Y
LA
T
G
GGGGGGCTAtTATTCATAAiAATTATGGT~AATTTATTA~AGAAGGAGAiACTTTAATA~CACTTATAT~TGAAAGATT~AAATCTGGT~ATATCATTC~AGGTCTTCC~AAAGTTGAG~ GATIHNNYGEFIKEGDTLITLIYERLKSGDIIQGLPKVEQ AATTGCTAGAGGCACGTCCAATAAATTCAGTTTCTATTAkTGATATGAT~AAATTTATC~GTAATCTTT~GGGTTTTTT~TTAAGTACA~ L L E A R P I N S V S I N L E N G F E D W N N D !I I K F I
15240
15360 G
N
L
W
G
F
F
L
8
T
AAATTAGTAiGGAACAAGGnCAAATAAACiTGGTTGATC~AATTCAAAA~GTATATCAAiCTCAAGGAG~ACAAATATC~AATAAACAT~TAGAAATCAiTGTACGTCA~ATGACTTC~~ I S M E Q G Q I N L V D Q I Q K V Y Q S Q G V Q IS 'd K H I E I I V R Q
M
T
SK
AAGTAATAAcTTTAGAAGAiGGAATGACTbnTGTTTTTTiAGTTCCTTA~AAACCCATA~ V I T L E D G M T N V F L P G E L I
K
P
I
K 15480
15600 E
F
S
R
T
0
K
lil
*j
R
A
L
E
E
A
VP
Y
L
TATTAGGAAiAACCAAAGC;TCTTTAAAT~CTCAAAGTTiTATTTCAGA~GCTAGTTTT~AAGAAACTA~AAGAGTTTT~GCAAAAGCT~CGTTAAAAG~CCGAATTGA~TGGTTAAAA~ L G I T K A S L N T Q S F I S E A S F 0 E T T R V LAYAALKGRIDWLKG
15720
GTTTAAAAG6AAATGTTATiCTTGGTGGA~TAGTTCCAG~GGGAACAGG~TCACAAGAA~TTATTTGGC~AATAACTTT~GAAAAAAAA~AAGAAATAT~TTTAAAAAA~AAAAAAGAA~ L K E N V I L G G L V PA G T G S Q E V 11;: Q! TLEiYKEIYLKKKKEF
15840
TTTTTACTA6AAAAATTAAEAATGTTTTTiTATATCAAG~CACATTTTC~ATTTTTCCT~CTACAGAAA~TATTCATAA~GTATTAAAA~AATCAATTT~TCAAAATAA~AAAAATAAT~ F T K K I N N V F L Y Q D T F S I F P T T E I :C )i V L i E S IS Q N N K
15960 N
N
F
TTTCTATTTAAAAAAAAAAiAGAAATTTAATGATATATAiGTAAAACCT~TTTTATATA~ACTTTTATT~TAAATAATA~TATAAAAAT~AAAAATGAA~CAAAAATCTiGGAATATTC~ s 1 === rp.72, MKQKSWNIH
16080
TTTAGAAGAAATGATGGAAGCAGGTGTTCj\TTTTGGTCA~CAAGCTCGG~AATGGAATC~AAAAATGGC~CCTTATATT~TTACAGAAA~AAAAGGTAT~CATATTA~,~~ATCTTACTC~ L E E MM E A G V H F G H Q A R K W N P K MA P i I r T E R KG I H I I N L
TQ
AACAGCTCGnTTTTTATCTGAAGCTTGTG~TTTAGTT~~AATGCGTCA~GTAAAGGAA~ACAATTTTT~ATTGTAGGA~CAAAATATC~AGCAGCTGA~TTAATTGAG~CATCTGCT,:~ TAR F L S E AC D L VAN AS SK G K Q F L ! V G T Y i 0 A AD L I E S S
A
16200
16320 1
AAAAGCTAGATGTCATTAT;TAAATCAAA~ATGGCTTGG~GGTATGTTA~CAAATTGGT~AACTATAGA~ACTCGTCTT~AAAAATTTA~AGATTTAGA~AATAAAAAA~AAACAGGAA~ K A R C H Y V N Q K W L G G M L TN W S T I E T R L 0 v FYDLENKKKTGT
16440
AATAAATCGACTTCCTAAAlAAGAAGCAG~AAATTTAAA~AGACAATTA~ATCATTTAC~AAAGTATTT~GGTGGTATT~AATATATGA~AAGTTTACC~GACATTGTT~TTATTATTG~ I N R L P K K E A A N L K R Q L D H L 0 K Y L G GI ' i 14 T S L P D I V I I
I
I)
TCAACAAAAkGAATTTACAGCTATTCAAG~ATGCATTAC~TTAGGAATT~CTACAATTT~TTTAGTTGA~ACAGATTGT~ATCCAGATA~GACAGATAT~CCAATTCCT~CCAACGATG~ QQKEFTAIQECITLGIPTICLVDTOCJP li M T D 1 P I P A N
D
D
TGCTAGAGCiTCAATTAGAiGGATTTTAA6TAAATTAAC/AAAAATAAT~ACTACTTTC~AATAAAAAA~ AR A S I R W I L N K L T L A I C E G R Y
Id
8
I
Y
11
===
16560
16680
+-------
TAGATTAAAiTAATAACAA6TCTTTTTTTaTTTATTCTTnTACGAAAAA~TGTCTTTTA~TATTTTTAT~TTATTTlT~~AGGAGTAAT~TGTCTCATA~TGCAAAAAT~GCTAGCAC~~ <------------+ > AI,GAG atpI, M S H T A K M A
~_
16800
16920 S
T
F
TTAATAATTiTTACGAAATkTCAAATGTCi;AAGTAGGTC~ACATTTTTA~TGGCAATTA~GTAGTTTTC~AGTTCA~~~~~AA~TACTA~TAACTTCAT~GATTGTAAT~GCTATTTT,~~ N Id F Y E I S N V E V G Q H F Y W Q L G S F 0 V 4 A C J 1 ITSWIVIAILL
17040
TAAGTTTGGETGTTTTAGCEACTCGAAATiTACAAACAAiTCCAATGGGiGGTCAAAATiTTGTCGAAT~TGrTTTAf~A~TTlATICGr~ATTTGACTA~AACACAAAT~GGAGAAGA,~~ S L A V L AT R N L Q T I PM G G Q N F V E I V L Cl IPDLTRTQIGEEE
17160
AATATCGTCCTTGGGTACCiTTTATAGGAj\CTATGTTTT~ATTTATTTT~GTTTCTAAT~GGTCTGGTG~TCTTTTTf,~~T~~(,GAGTT~TTGAACTCC~TAATGGAGA~CTTGCTGCA~ Y R P W V P F I G TM F L F I F V S N W S G A II P ~1 i, Vi ELPNGELAAP
17280
CAACAAATGl\TATCAATACiACTGTTGCA~TAGCTTTAC~TACATCTGT~GCATATTTTiATGCTGGTC~ACATAAAAA~~~~,ATrAAGT~ATTTTGGTA~ATATATTCA~CCAACCCCA~ TN D I NT T VA L A L L T S VA Y F Y A G L H Y Y I, L 5 Y F G K Y IQ P
17400 T
P
V
TACTTTTACEAATAAATATiTTAGAAGATiTTACTAAAC~TTTATCACT~AGTTTTCGA~TTTTTGGAA~TATTTTA~~(,~r,AC~~AATTA~TTGTTGCTG~ACTTATTTCiTTAGTACCT~ L L P I N I L E D F T K P L S L S F R L F G N II A IJLI VVAVLISLVPL
17520
TAGTAGTTCCTATACCTATEATGTTTTTAi;GATTATTTA~TAGTGCTATiCAAGCTTTA~TTTTTGCCA~ACTTf,(.Ar,CI1(,CTTACATAEGCGAATCTAjGGAAGGGCA~CATTAATAA~ V V P I PM M F L G L F T S A I Q A L I F A TLA A A f IG E S M E G H H =====z
17640
ATTTTTTTTkTAAAGAAAAiTAAGTATTA~AAAAAAAAT~TATATAATA~GATTGTTTT~ATTTGATTT~GAATTTTAT~ATATAGAGT~AGATATTTC~AATAAAGAT~TATATCTTT~ +---------, <-------+ +-- ----------)<-------. TTGTTT> lAIAAT>
17760
TTTTATTTAACATTTTTAT6AATATCGACATACTAACAA~AATTTtTAT~GGTAATTAG~TATTTCAAT~TTTTATTTA~AAATTTTTG~ATTTAAAATiTAATAAAAC~TTTAGTGAT~ +-----------> <- ----------+ +
17880
Fig. 2, cont.
ClAACACTTnAAAAAGAGAEACTTTGAGTinTTAACTGC;AAAAATTTT~GTAAGCAAC~AAATAGATT~TTAATAAAA~CTTTTTTCA~AAATTTAGT~
18000 atpH>
AAAGGAGAT;ATCATGAACECCTTGATTTtTGCTGCTTC~GTTATTGCT~CTGGATTAG~TGTGGGCCT~GCTTCTATT~GACCTGGAA~TGGTCAAGG~ACTGCAGCA~GTCAAGCTG~ AGGAG MNPLISAASVIAAGLAVGLASIGPGIGQGTAAGQAV
18120
AGAAGGTAT;GCAAGACAGECTGAAGCAG~AGGTAAAAT~CGAGGTACT~TACTTTTAA~TTTAGCTTT~ATGGAAGCT~TAACCATTT~TGGATTAGT~GTAGCTTTA~CACTTTTAT~ EGIARQPEAEGKIRGTLLLSLAFMEALTIYGLVVALALLF
18240
TGCAAATCCEiTTTGTTTAAiAATTTCAAT;TTTGAATAA~TAATTGTTA~TAATTTTTC~TTCTTGAAA~AAAAGAAAG~AAAATTAAT~ACAATTAAA~TATATTTAA~ATTTTTTTG~ +------------------------, <------------------------+ A N p F " ==z===
18360
GGTAAAAATnAAAAAAAGCAAAAATTTTTAAnTGAA~AAAACTGTT~AGTAAACAA~AAACTCCAT~ATTTTCAAT~ATATAATAA~GAAAAAAAG~GGACAGCAT~GAAAATGGG~ +----------) <-------+ GAGG M TTGAGT> TTCAAT> atpF>
18480 E
N
Gl
CTTATTTTA;TATTTCCTCAAATTTTTGGj\CTATAGCTG~AAGTTTTGG~TTAAATACA~ATTTATTAG~AACAAATTT~ATCAATTTA~GCGTAGTAC~TGGGTTGTT~GTGTATTTT~ Y F I I S 5 N F W T I A G S F G L N T N L L E T Nt I N L G V V L G L L V Y
F
18600 G
GAAAGGGAGiGTGTGCGGGiTGAATATTTGAATAAAAAA~TGGAATGAT~CAATAATAC~TAAATAAAA~AGTATATAA~CCTAACGAA~AACTTTTGG~TAAAAAACT~AAAAGAACA~ K G V L gugyg.......................................(Intron)........................................................
18720
TAGCATTTCETAAACTCAAkAAATTTATTiTGAGAAGGG6GGATTTTCT~TATCCCACC~AGCTTTTTG~ ........................................................(Intron)........................................................
18840
ATGGTTGAAbATATTTTTGATATATAACACTCATATCAAiCTATAAAAT~TATTAAATT~TGATAATCT~CCCTTAAAT~TTTTTAAGT~CTGAATTGA~ ........................................................(Intron)........................................................
18960
GACCTATTTACAATTTATAATTTATATAA~AACAATCTT~CTGACAAGT~TCAAIATTT;TGTCAAAAGAATCATCAACAATTATTTTACGTAAAAAAA~GAAAATAAA~AAAGAAGAT~ ........................................................(Intron)........................................................
19080
AGTTCAGTCAAATCATCAAAACTTTTTTGiAAAAAACTG~ATAAGAAAG~CGAATGAAT~GAAAAGTTC~TGTTCGGTT~GGGAAGAGA~TATAAAATA~ATATATAAT~TACTTTCAT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..ragccg-augaa--gaaa--""~~"g"-~gg~~y.............................cuayy-y-ay
19200
AAGTAATCTi\TTAAATAATCGTAAACTGACCATTCTAAA~ACTATTCAA~ATGCAGAAG~GCGATATAA~GAAGCTACT~ATAAGCTTA~TCAAGCTCG~ACTCGGTTA~AACAAGCAA~ S N L L N N R K L T I L N T I Q D A E E R Y K E A T 0 K L N Q A R T R L Q
Q
A
19320
ACAAAAAGCkGATGATATCCGAATAAATGGATTATCTCAAGATTCAAAA~ACGCTACTA~ Q K A D D I R I N G L 5 Q M E K E K 0
A
Tl
K 19440
D
L
I
N
A
A
D
E
D
S
K
R
L
E
D
8
K
N
TCGTTTTGAAAAACAGAGAGCTATTGAACAAGTTCGTCAACAAGTTTCT~GTCTGGCTT~AGAACGAGC~TTAGAAACA~TAAAAAGTC~TTTAAATAG~GAATTACAT~TACGTATGA~ RFEKQRAIEQVRQQVSRLALERALETLKSRLNSELHLRMI
19560
TGATTATCAiATTGGCCTACTTAGAGCCAiGGAAAGTACkATATTCGAC~TGATGAAAT~ D Y H I G L L RAM E S T I E ===
19680 atpA>
MVNIRPDEI
AGCAGTATT2\TCCGTAAACkAATAGAACAATATAATCAAGAAGTTAAAAiTGTCAATATjGGAACAGTACTTCAAGTTGGAGATGGTATiGCACGTATTiATGGTCTTGkTAAAGTTATG S S I I R K Q I E Q Y N Q E V K IV N IG T V L Q V G D G I AR I Y G L D K
V
M
19800
GCAGGTGAAiTAGTTGAATiTGAAGATGGiACAGTAGGA~TTGCTTTAA~TTTGGAATC~GATAATGTT~GTGCTGTTT~AATGGGTGA~GGATTAACT~TACAAGAAG~TAGTTCTGT~ A G E L V E F E D G T V G I A L N L E S 0 N V GA V L M G D G L TiQ E G S
5
V
AAAGCAACAEGTAAAATTGETCAAATACC6GTTAGTGATGCTTATTTAG~CCGTGTTGT~AATGCATTA~CTCAACCGA~TGACGGAAA~GGTCAAATA~CAGCATCTG~ATTTAGACT~ K A T G K I A Q I P V S DA Y L G R V V N A L A Q P I D G KG Q I PAS E F
R
L
ATTGAATCTECAGCTCCAGbTATTATATC~AGACGTTCT~TTTATGAAC~TATGCAAAC~GGACTTATT~CTATTGACT~TATGATTCC~ATTGGACGT~GTCAGCGAG~ATTAATTAT~ I E SPA P G I IS R R S V Y E PM Q T G L I A I D SM I P I G R G Q R E L
I
I
19920
20040
20160
GGAGACAGAEAAACAGGAA6AACAGCTGTkGCTATTGAT~CTATTTTAA~TCAAAAAGG~CAAAATGTA~TATGTGTTT~TGTAGCTAT~GGTCAAAAA~CCTCTTCTG~TGCTCAAGT~ IGQKASSVAQV GDRQTGKTAVAIDTILNQKGQNVVCVYVA
20280
GTTAATACAiTTGAAGATCtTGGTGCATT~GAATATACA~TTGTTGTTG~TGAAACTGC~AATTCGCCT~CTACATTGC~ATATCTTGC~CCTTATACT~GAGCTGCTT~AGCTGAATA~ VNTFEDRGALEYTIVVAETANSPATLQYLAPYTGAALAEY
20400
TTTATGTATEGTAAGCAACkTACTCTTAT~ATTTATGAT~ATCTTTCTA~ACAAGCTCA~GCTTATAGA~AAATGTCAC~TTTATTAAG~AGACCACCA~GAAGGGAAG~TTATCCTGG~ FMYRKQHTLI IYODLSKQAQAYRQMSLLLRRPPGREAYPG
20520
GATGTTTTTiACTTACATTCTCGTCTTTTkGAAAGAGCA~CTAAATTAA~CTCTAACTT~GGTGAAGGT~GTATGACTG~TTTACCTAT~GTTGAAACC~AAGC~GGTG~TGTTTCAGC~ DVFYLHSRLLFRAAKLSSNLGEGSMTALPIVETQAGDVSA
20640
TATATTCCAI\CAAATGTTAiTSCTATTACkGATGGACAA~TTTTCTTAT~AGCTGACTT~TTTAATGCA~GAATTCGTC~AGCAATTAA~GTAGGTATT~CTGTATCAA~AGTTGGTTC~ YIPTNVISITDGQIFLSADLFNAGIRPAINVGISVSRVGS
20760
GCTGCACAAATTAAAGCTAiGAAACAAGT6GCTGGTAAA;ATTAGCTCA~TTTGCAGAA~TGGAAGCTT~TG~TCAATT~GCTTCTGAT~TTGATAAGG~TACTCAAAA~ AAQIKAMKQVAGKL KLELAQFAELEAFAQFASDLDKATQN
20880
CAATTAGCAkGAGGTCAAAEATTACGTGA6TTACTTAAACAACAGATAG~TACTATTTA~ACTGGCGTT~ACGGTTACT~AGATGTATT~ QLARGQRLRFLLKQSQSAPLSVEEQIATIYTGVNGYLDVL
21000
GAAACAGGAcAAGTTAAAARATTTTTAAT~CAATTACGTGCAGAACAAG~AGAAAATCT~ ETGQVKKFL IQLREYLVTNKPQFAEIIRSTKVFTEQAENL
21120
TTAAAGGAAGCTATCACTGkACATATCGA6CTTTTCTTAiTGAATAAAA~TTTATTTAG~ L K E A IT t )I I tl F L F Q E E K ===
+-------
<---
TTAAATAATiAcGTCCAAT6GGATTCGAA~CTATACTGG~GGTTTAGAA~ACCTCTGTC~TATCCTTTA~ACGATGGAC~CTTAAAAAA~AATTTGAAA~GATAAACTC~AAAAAATTT~
21240 + <---
21360
TCATTTCAA;\TTCTTTTTTnGAAAAGCGGi;TACGGGAAT~GAACCCGCA~CGTTAGCTT~GAAGGCTAA~GGTTATAGT~GACATTTAT~AAAAATAAA~AATGCCTCT~ATTCAAAAC~ --------------m----+ 3'-UCGCCCAIJGCCCUUAGCUUGGGCGUAGCAAUCGAACCUUCCGAUUCCya-y-yyauc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..yuugg
21480
GAACGTGAAkGTTTTCTTTEATTCGGCTCCTTTATAAAAAATTCAAAAA~TACATGTAT~TTTTGAATT~CTTACCGAT~TTTATAACA~CTATGTTTA~TTTTTTTTT~ c-uguacuu---aaag-aagua-gccgar...........................(lntron)........................................................
21600
CATTTTTATbTAAAAAACAETTTATTAGAiGATCCTTTTAGAAAGATAA~ATAAACAAG~TTAAATTAA~AACTTTCTA~TTACTTCGT~CTTTATTTC~ATTTGTTTT~AATCTTTAG~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (lntron)........................................................
21720
AAATTTTTT;TCACCAAGTiTTTTGTAAT~TTAATAATT~TAGCAAACT~AAATTACTT~AATAATCAC~ATTTTTGCA~CAAAATTTT~AAGATTTAG~TATGATGAA~AAATTTATT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (lntron) . . . .. . .. . . . .. . .. . . . .. . . . .. . . .. . . . .. . . .. . . .. . . . .. . . .. . . ..
21840
TTGATTTCTiTCCATAGATiTTTAGCAAA~CAAAAATTTiTATTTTGAA~TATTCCGCT~TTTTTTAAG~AATAATGCT~TTATTAATA~GACATTAAT~TTAATAATT~AAAAATTTA~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..(1ntron)........................................................
21960
TAAGAAATAAAGTTATAAARAAGAAATTAiGATCCTTTA~CTTTTTTTT~TTTACGAAT~GCACTTTTA~CACTAAACT~TACCCGCTG~ATAATTATT~TATACTATA~ATTAGGTTT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..gygugAAAAUGGUGAUUUGAUAUGGGCG-5'
22080
TATCATATTiiTTTAACTCC6rCCCCGGAA~TAGATTATT~ACAAAATGC~TATTATTTC~GGAGATGGA~AATCAAAAA~ATTACAAAT~ACCTTTACG~GCTGCTAAT~AAGCAATAA~ +----------> <- ----------+ === L N G K R A A L LA
22200 I
V
TAATGGCCCiGATGCAACTkTTAAAGCCA~AACAGTAAG~TGTGCAATA~CTTCTAAAT~CATTCTGGT~TAACCTCTC~TTTTTTTTA~AATTCTAAA~TTTTTACAA~AATTTTTTT~ +-------L P G S A V I L A L V T L Q A I V CL14 M GGAG cORF33
22320
ATTAACGTA~ACTTAAAATkACAATAAAA6AAATCTATA~CGATAATAA~CTGAGATCA~TACAATAAT~ACAAGACGA~ATTCTTTAT~CAAAATTAA~TCCATACTA~CCAAAAAGA~ === F L L L F F R Y R Y Y V SIL V N N V L R Y E K N L I L V M cORF30
22440
AAATATAATiTTGGATTAAiTGCTAATTG~ATTATATAT~TTTTTTTAT~TTTTAACGA~TAAGGAGAA~AAATAATGA~CTCAATTTC~GATAGTCAA~TTATTGTAA~TCTTTTAAG~ TATATT> ORF32> TTGGAT> AGGAG MTSISDSOIIVILLS
22560
GTATTTATAnCTAGTATTTiAGCTTTAAG~CTAGGAAAA~AGTTATATC~ATAAATGAT~TAAATCATT~AATTTTCAA~TTTGAAAAA~AAAGTTTTG~CAAATTTAA~ATATTATTC~ f-----) <-----+ V f I T S I LA L R L G K E L Y Q ===
22680
AAATAATATAGAATATATAiATATATATAiATATATATA~ATATATATA~ATATATATA~TATAGTGTA~TCCATCAAA~TAAATAATT~GAATAAAAA~AATAAAAAA~ATTGAAATA~ +-----------------------)(------------------------+
22800
>
<----------+
TTGAAA>
ATATTTTTTiTATAGTATAnTGAATATAC~ATATGTTGT~TATATATAC~ATTTTTTGT~TTTATTCAA~AACAAAAAT~TTTTTATAT~TGGAGAGAT~GCCGAGTGG~CGAAAGCGG~ +---------> TATAAT> <---------+Ser-GCU> 5'-GGAGAGAUGGCCGAGUGGACGAAAGCGGC GGATTGCTAATCCGTTGTAEAAGCTTTTIGTACCGAGGG~TCGAATCCC~CTCTCTCCG~TTAAAAATT~AATGTTTTA~TCTTTACGT~CAGGATTAC~TCCTGGATC~TTAGATAAA~ GGAUUGCUAAUCCGUUGUACAAGCUUUUUGUACCGAGGGUUCGAAUCCCUCUC~JCUCCG-3' === E K R G P N R G P 0 N S
22920
23040 L
ATCCAAAAAEAAAAAGAGAAACAAAAAAA~TAACCACTG~ATAAACAAA~AGCTTAAGA~TAAGCATGA~ATAATCTCC~GGATCATTA~TAGAAATAG~ATAGAATAA~ATTCAAATT~ FGFVFLSVFFIVVTYVFLKLTLM GAGG
23160
23280
CTTCTATAA6TTAAAAAAAGTCTTATAGAiTTTGAATAA;GAAATTTAA~ATCCGCTTT~ f----------->
23400
<-----------~~
TAAACTTAAbTTACAATAA6AATTTTAAA~TGAGAATTT~TCGAAAACT~ACTGAAGCT~GCCATACAA~GGCTAAAAG~AAAAAAAAC~AAGGTATAA~TGGCATTAC~TCTACAATT~ -+ === R F S V S A Q W V F A L L F F F L P I I PM V
23520 D
V
I
GATCAAAAAiCGAATAAGCiTCTGGsnnTiTAGCAAAAG~GATACCATT~AAATAAAAC~CATTTTCTA~ATAAATATT~AACATAACT~AATTTTTAG~TCTCCAATA~ATATTATTA~ POFISYAEPLKAFTIGNLYFANELYINFM ClhcA GAGG AAAAGTGAAkTAGAAGAGA6AAAAACTTA;1IATATATAG~ATAAACTCT~AGGTACATC~TACTACAAG~TGTTTTAGC~TTAAAGAGG~TATAATTTT~TGATTTTTA~TAAATAGAT~
23640
23760 TT
GACAAAATG6AAAAAAAAAiGATTTACiAGAATCAAATAGATCCTTCCG~CCCAGACTT~ GACA> TTTACT> Gln-UUG> 5'-UGGGGCGUCGCCAAGUGGUAAGGCUGCAGGUUUUGGUCCUGUUAUUCGGAGGUUCGAAUCCUUCCGUCCCAG-3'
23880
TTATTAATTtTTTTATATT;TTCAATTTTiCTAAAAATThTATTTTTTA~TTATATTAT~ACAATAAGA~AAATATGGC~AGGGATTTT~
24000
CTATGTTATRAAATATAAAiATAGATTAT~GAAAGTAAG~AGATTTTAA~TTATGAAAT~AGCTTATTG~ATGTATGCT~GTCCTGCTC~TATTGGAAC~CTCCGAGTA~CTAGTTCTT~ ORF513> AGGAG MKLAYWMYAGPAHIGTLRVASSF
24120
24240
24360
24480
24600
24720
24840
TAATATATGGTCTCCTATTiTACTAGGAAAAAAATTTGA~TTTGAACCT~ATATTGACG~GCAAACTAG~TTTATTTCG~AAGCTGCTT~GTTTTCAAG~TCAATTGAT~GTCAAAATT~ N I W S P I L L G K K F D F E P V I D E Q T R F I S Q A A W F 8 R SID C Q
24960 N
L
AACAGGAAAAAAAGCTGTTbTTTTTGGTGj\TGCAACACA~GCTGCTTCA~TTACAAAAA~TCTTGCTTG~GAGATGGGA~TTCGTGTTA~TTGTA~TGG~ACTTATTGT~AACATGATG~ TGKKAVVFGDATHAASITKILACEMGIRVSCTGTVCKHDE
25080
AGAATGGTTiAGAGAACAAbTTCAAAATT~TTGTGATGAATACAGAAGT~GGGGACATG~TTGCTCGTA~AGAACCATC~GCTATTTTT~GTACTCAAA~ EWFREQVQNFCDEILITDDHTEVGDMIARIEPSAIFGTQM
25200
GGAACGTCAiATTGGTAAAEGTCTTGATA;TCCTGTGGAGTTATTTCC~CACCAGTTC~TATTCAAAA~TTT~~TTTA~GTTATAGAC~TTTTTTAGG~TATGAAGGT~CTAArCAAA~ ERHIGKRLDIPCGVISSPVHIQNFPLGYRPFLGVEGTNQI
25320
AGCAGATTTnGTTTATAATiCTTTTACTTiAGGGATGGA/GACCATCTT~TAGAAATTT~TGGTGGACA~GATACTAAA~AAGTTATTA~TAAATCTTT~TCTACAGAT~CAGATTTAA~ ADLVYNSFTLGMEDHLLEIFGGHDTKEVlTKSLSTDlDLT
25440
TTGGAATTCiGAAAGTCAAiTAGAGTTAAkTAAAATACCjAAATAACAT~ACAAAAATT~CTGTTGAGG~ WNSESQLELNKIPGFVRGKIKRNTEKFARQNNITKITVEV
25560
TATGTACGCkGCTAAAGAAGATTTAAGTGCATAAAAATT~TTAAGTCTC~GTTTTACTA~ATAATTATT~TTTTTATTT~CAAGCTTTT~TTACTTTAA~CAATTTGAA~TTGAAAGTG~ +-----------M Y A A K E D L S A ===
25680
ATAAATTTTiGGATTCAAA~AAATTAATTiTTTTTGAATECAAAAATTT~TGCACTTTC~ATTTATTAC~AAACTTAAG~AGATTTrAA~GAATCCATl~TTTTATTTT~TACAATTGT~ ____-------_-------_---><-----------------------------------+ ORF50> AGGAG M N P F F V F V
Q
L
F
25800
TTTTTATTAiCCATTTTTTATCATTTTTTiATATATTTAjAACAAATAA~ATAAATTTT~CAAATATTT~TCCACTTTT~TCAAAATGG~TCAAAAAAT~ F V V P F F I I F L V I Y L V F F I P K TN N I N F 5 u I F P
K
K
==
25920 L
F
S
K
W
I
AAGTAAATTiTTTAAATTGAATCTAATTAAAATAGAATTGACAATACTA~TTCACATAC~TATAATTGT~TTGATGTTT~TT,4TTTTl~~ATCCATTTT~AAATTCAAA~TTTATTTTA~ TTGACA> TATAAT> Lys-UUU>
26040 5'-G
GGTTGCTAAETCAATGGTAcAGTACTCGGCTTTTAAGTGCGACTTGGAT~TTTACACAT~TAGATGAAA~AAAAAATTC~TCCATACCG~TGACAAGGT~TGTAAAACT~CGACTAATC~ GGUUGCUAACUCAAUGGUAGAGUACUCGGCUUUUAAgugyg.....................(~ntron)..................................................
26160
TAAAAGGAAkCTTTACAGA6AAAATAGCA~GTCGTTTAT~TTTTTTCAT~CATTTTTTT~AACTAAATA~AATTTCAAT~A4aAATACA~TAAGTCAAT~AAAGTTAAT~GATAAAGCT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..(lntron)..................................................
26280
AATTGCTTAAATCATAGGTAAAAGAAGAA1CAGCTTCTG;TTCAAATTT~TGAAATATT~TCTTGATTT~TTTAAAAAA~TGTTAAAAG~TTTTTAAAC~GTCAATATA~GAGAAAAAT~
26400
CCTATTACTiTTTAGGTTTiTTACCAAAAATGAATCCTA~ACACTTTTT~AAATGTGTC~AGAAATAAC~AGCATGCTG~TTAATATAA~TTTTTATTA~GTCAGGAGA~CAATCAATT~ ..............................................................(Intron)..................................................
26520
AAAAAAATGiTTTTTTTTTiTTGTTTAAnkGnTTGTACT~TGTGTTTAT~CTTTTATAT~TTACAAAAT~ATTAAATGA~AACTATACT~AAAGTTTTG~CTACTTTTG~AAAACAAAC~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (lntron) . . .. . . . .. . . .. . . . .. . . . .. . . . .. . . . .. . .. . . . .. . . . .. . .. .
26640
AACAAAAAT;TTTGGAATA6ACCATATAT~TATATATTTiTCAGG~AAATTTTTA~GGAATTGCA~ATAATCG~~~-
26760
. . . .. . . .. . . . . .. . .. . . . . .. . . .. . . .. .. . .. . . . . .. . . .. . . .. . . . .. . . . . .. (1ntron). . . . .. . . . .. . .. . . .. . . . .. . . . . .. . . .. . . . .. . .. ...*.....
mATAAATAAhACCAATTAT6AAAATTTTTiTTTTTTGGTi
. . .. . . . .. . . . .. . .. . . . .. . . . .. . . . .. .. . .. .. . .. . . . .. . . .. . . . .. . . . ...i1ntrcn~..................................................
TAAAAAATAhTTTTAATTTiTTGAAAATAkAncGncTTA~AAAAAAAAT~CGTCAATTT~ATTTTTTAT~T~~~T~TTC~AAAAAAaAA~ATTTTGTAA~TATCAAAGA~GTTATAGTT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..(lntro~................................................... TTATTTTGGATAATATTTT~ACATTTGAAiCnnAAAAAAT~TTTTATTAA~AACAAAATA~AAAGTTATC~ATC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (1ntron).
26880
,ATiCA+TCTATATTTkTTTTATG&ACATAGAAT~TATAATTCAA . . . . . . . . . ..ORF370i>....M E H R I Y N
ATTATTTTTiAGATATTACAATACCTTATTTTTTTCACCCATTAGAATC~TTCGTCGGC~TATT~~~~~~~T~~CA--~~ Y FL D I T I P Y F F H P E I LIR I F R R n I C
;:
P
F
i
27000 5
N
TACATTTTT+GCGAACTCTiTTATATAAAi H F L R T L L Y K
M
ATAAATGTTiAAATATTTT;AATATAGAA~ATTTTTTTT~TTTGAAAAA~AATCAGTTT~TTTGTTTTT~~T~GAATTT~?~TfiT~iAC~AATTTGAAT~TCTTTTAAA~GATATATGG~ f E F E Y L L N 0 K C L N I L N I E N F F Y L K K N Q F F C F L ~ : F ' :
27240 I
hi
AAAAATTTTkTAAATTTGAbTCAGTATTT~TTTGGAATT;TATTGATAA~ACAAATTCT~TAAAAAAAA~AAAACATAT~TTAAAAAAA~CTAAAAAAC~GATTGAAAA~AAAATTGTA~ K F Y K F E 8 V F F W N F I D K T N SIK Y Iv H Ii k Y: Y K P I E K KIV AAAAAATAAbTTCCATTCAiTATATCCGA~ATAAAAATA~TTTGATTAT~ACTTTAAAT~ATAGAAATA~TTTG~TTTT~GAAAATTGG~AAGATTTTT~TCTTATTTT~TGGCAAAAA~ K IS S I H Y I R Y K N N L I I T L N 0 R II : L I L E II : Y D F F L I F W
27120
E 27360 K 27480
Q
K
Y
ATTTTAATGiTTGGTTTAAnTCTTCTAGAiiTTTTAATT~~AAATTTTTA~AAAAACTCA~TTT~TTTTT~AGGTTATAT~rTT~~T~~TT~AAAGTCAAA~TATTTTAAT~CAAATTCAA~ F N V ii F K 5 S R I L I Q N F Y K N S F S F L G f :li k I i 5 0 I ILIQIQI
27600
TAATAAATTiATTAAGAAAiGTTAATTTA~~TAAAAAAG~ATTTTGTAG~ATTATTCCA~TAATACCTT~AATTAGACT~TT~f~~TAAA~AAAAATTTT~TGATGTTTT~GGACGTCCA~ I N L L RN V N L I K K E F C S I I P V I P L I PI I F v FYFCDVLGRPL
27720
TTTGTAAATiATCTTGGACIACATTATCAGATAATGAAA~TTTTGAACG~TTTGATCAA~TAATAAAAC~TATTTrTA~~~T~~~ATAf~T~GATGTATTA~TAAAAAAGG~TTATATCAA~ rj L I: I N K K G L C KLS W T T L SD N E I F E R F D Q I I K H Ii : ' i
Y
Q
L
TACAATATAiTTTCCGATTiTCTTGTGCTAAnnCATTAG~ATGTAAACA~AAAAGCACA~TACGCACTG~TTGGAAAAA~TAT~r,TT~A~ATTTATTAA~AAGTTCTAT~TTTTTTAAT~ Q V I F R F S CA K T LA C K H K 5 T I R T V b: Y y / (, (I II I L T S SIF
F
N
K
L
L
K
AAACAAAATiAATTTCTTT6AATTTTTCT~ATAAAAATC~TTACAAAAA~AATTTTTGG~ATTTAAATA~TATT~AA~r~AATTA~TTA~CA~ATTCAT~ACAAAAAAG~AAATTATTA~ T K L I S L N F S N K N P Y K K N F W Y L N I ID VII i L AH 5 L Q K 5
27840
27960
28080 K
AAGAATAAAkAAACATAGAEAAAGCCGTA~GCAGTAAAA~TTGCAAGTA~GGTTTGGGA~GAGATGATT~TATTTTTA~~GAAAAAAAA~TTATTTATC~ACTTCATCC~ACGAGTTCC~ E ===.. . . . . . . . . . ..ragccg-augaa--gaaa--uucaugu-cgguuy...........................................cuayyy-ayCCGACGAGUUCCG
28200
GGTTCGAGCECCGGGCAACECATTTTTTTiATTTTAATA~AATTTCTTG~TTTTTTAGG~AATATTTGA~TATTAGTTG~~ATAAT~AT~TGTTATGTG~AATACTATA~GTTAACAAG~ TACTAT> GGUUCGAGCCCCGGGCAACCCA-3' TTGACA;
28320
TTAAATATTiGGGAAACTCiTAATTATTT~AAAAACCAA~TTTTACTAT~ACCGCTACT~TAGAAAGAC~CGAAAGCGC~AGCATTTGG~GTCGCTTCT~CGATTGGGT~ACTAGCACT~ MTATLERRE 8 A i I I ! G RF C 0 WV T psbA>
28440 ST
E
AAAACCGTTiATACATTGGiTGGTTTGGTi;TnTTGATGA~TCCTACTTT~TTAACAGCA~CTTCAGTAT~CATTATTGC~TTTATTGCA~CTCCTCCTG~AGATATTGA~GGTATCCGT~ NRLYIGWFGVLMIPTLLTATSVFIIAFIAAPPVDIDGIRE
28560
AACCTGTATETGGTTCTCTiCTTTACGGA;\ATAACATCA~TTCTGGTGC~ATTATTCCT~CCTCTGCAG~TATCGGTTT~CACTTCTAC~CTATTTGGG~AGCTGCTTC~GTTGATGAA~ P V S G S L L Y G N N I I S G A I I P T SAA I G L H F Y P I W E A A S V D
28680 E
W
GGTTATACAi\TGGTGGTCCiTACGAACTTATCGTTCTTC~TTTCTTACT~GGTGTAGCT~GCTACATGG~TCGTGAATG~GAACTTAGC~ATCGTTTAG~TATGCGTCC~TGGATTGCT~ LYNGGPYELIVLHFLLGVACYMGREWELSYRLGMRPWIAV
28800
TTGCATATTCAGCTCCAGTiGCTGCTGCTi\CTGCTGTTT~CTTGATCTA~CCTATTGGT~AAGGAAGTT~CTCAGACGG~ATGCCTTTA~GTATCTCTG~TACTTTCAA~TTCATGATT~ AYSAPVAAATAVFLIYPIGQGSFSDGMPLGISGTFNFMIV
28920
TATTCCAAGETGAACACAAtATCCTTATGCACCCATTCCATATGTTGGG~GTAGCTGGT~TATTCGGCG~TTCTCTATT~AGCGCTATG~ATGGTTCTT~GGTAACTTC~AGTTTAATC~ FQAEHNILMHPFHMLGVAGVFGGSLFSAMHGSLVTSSLIR
29040
GTGAAACTACTGAGAATGAETCTGCTAATGCAGGTTACA~GTTTGGTCA~GAAGAAGAA~CTTACAACA~CGTAGCTGC~CACGGTTAC~TTGGTAGAT~AATCTTCCA~TACGCTAGC~ ETTENESANAGYKFGQEEETYNIVAAHGYFGRLIFQYASF
29160
TTAACAACTCTCGTTCTTTACATTTCTTCjTGGCTGCTT~GCCAGTTGT~GGTATTTGG~TTACTGCTT~AGGTATCAG~ACTATGGCT~TCAACTTAA~TGGTTTTAA~TTTAACCAA~ NNSRSLHFFLAAWPVVGIWFTALGISTMAFNLNGFNFNQS
29280
CTGTTGTTG6CAGTCAAGG;CGTGTAATTAACACTTGGG~TGATATTAT~AACCGTGCT~ACCTTGGTA~GGAAGTTAT~CATGAACGT~ACGCTCACA~CTTCCCTCT~GACTTAGCT~ VVOSQGRVINTWADIINRANLGMEVMHERNAHNFPLDLAA
29400
CTGTTGAAGCTCCTGCTGT2AATGGTTAAiGTCCTATAA~AAGGTTACA~AAATAATAA~GAATATTTA~TATTTTAGT~AGAAATTAA~AAACTAAAA~TTTTTAAAG~AGGAAAAAA~ +- -------------) (-----------+ V E A P A V N G ===
29520
TAGAAAATAATGACCTTTGnGACTTGAAA~CTTAAAGGTETGGATTAAG~CAGTGGATT~TGGATCCTC~ +----------------, <----------------+ His-GUG>
29640 5'-GGCGGACGUAGCCAAGUGGAUUAAGGCAGUGGAUUGUGGAUCCUCu
ACGCGCGGGiTCAATTCCCtTCGTTCGCCEAnTAACAATiCTTTGTTTA~GTCATTATA~TGAAGATGA~ATAAAATAG~CTATTTTTC~ +--ACGCGCGGGUUCAAUUCCCGUCGUUCGCC-3'
29760 <----------
>
---++--------
TTTTTTAATiTTTTATAAA;TATCTACTTiAATATTAAT~ATTAAAAAA~AGAAAAATA~GAAATAAAA~TTTAGATAT~TTGTATTTT~ATATTTTTA~AAAAAAAAT~TATTTATTA~ <---------------+ >
29880
TAAAGTCTT6TGTAACTGTiGTAAAAAAA;GAAACAAAA~TTACCAAAA~AAAAATCTT~ATATAAAAA~TTAGATTTA~ATGAAATAC~AAAAATTCA~AATTTAGGA~ATCCATACA~ ORF2136> MKQKLPKKKSLYKNLDLDEIQKIQNLGNPYT
30000
AAAATGGAGiTTAATTAGAiTGTTAATTGCAATATTTTCEATTTTAGTA~TTTATTGGA~TTTCAAATT~TTACTTCAT~ATTTTTTCG~GATTTATAT~ATTCAAAAA~ KWSLIRLLIAIFSNKRNFSTLLDFQILTSLFFRDLYNSKK
30120
AAAAAAAAAGTTTTTACTTiATATTTTAG~TTTTTTAAC~TTACCTTTT~TTGTCTATA~ATTAATTGA~AAAAGTATT~TTGAACAAC~AAATTTTGA~TTTCTAAAA~TTCAAAAAC~ KKKFLLNILVFLTLPFFVYILIOKSIVEQQNFDFLKIQKQ
30240
AAATTTTATiGAAAAAAATnATAAAAGTA~TTTAAAAAAiAAAAAAAAA~GGTATAAAA~ N F I E K N N K 5 I L K N N F Y F L
30360 N
T
K
F
D
I
F
L
H
N
F
F
S
L
K
K
K
K
W
Y
K
N
TTCACTGTTl\AATTTAATTGnTTTTCGTTEGnTTTTAAA~AAAAAAGAA~TTTTAAATC~TC,4TTGGTG~AAATTTTTG~TTTTAGAAC~AATTCAATC~AATTGGAAA~TATCCGAAG~ SLLNLIDFRSILKKKEILNLHWWKFLVLEQIQSNWKISEE
30480
ATCTTTGTCiGAACTCAAAkTTGTATTAGAACAAAAAAA~ATAGATGAA~TAAAACATT~TTTTGAATT~TATATTAAT~AAAAAATAT~TCCTAATAA~AATTGGGAA~ACTATTTTT~ SLSELKIVLEQKNIDELKHFFEFYINQKIYPNNNWEYYFY
30600
TTCAATTTTiATAAACCAAiTAAAAATTGATATAAAAAAiATAAAAATA~TATTGGTTT~GAAGTTTTT~TGGCTTTTT~TGAAAAACT~TTATTTGAA~TTGAATTTT~ SIFINQLKIDIKNSKYNKNSIGFEVFLAFCEKLLFEVEFL
30720
ATCTAAGCCnAACAATAATkATTTACAAA~GAAACTAAA~TGTCTGGAA~ACTTTAGTT~TTTAGATAT~TTTTGCATA~TAAATAAAA~ACTTCCATG~GTTAACAAA~AAATATTTA~ SKPNNNNLQMKLNCLENFSFLDIFCLNKKLPWVNKKIFK
30840
AAATTTACAi\AATTTTAATEnnTCAGATAi\AAAACTTAT~GAATCGTTT~TTTTATTAA~AATAAAAGG~AATCTATAT~TTAAAAATT~TATTGAATT~GTTACTTGG~AATCATAT4~ N L Q N F N E S D K K L I E 5 F F L L K I KG N L Y F K N Y I E F VT W Q S
Y
K
30960
AAAGGATTGiTTGGATTTTAATAAGTTTAATGAATTAAAiTAAATTTTC~AAATATATT~TATATGAAG~ K DC L D F N K F N E L N N 5 E I Y I K I E
E
'G
31080 E
L
F
S
D
Y
I
Y
K
F
S
K
Y
I
L
Y
AAAAAAATCcAAAACCATAiTAAAACAAT~TTTTAATAA~AATATTTAT~ATAAAAAAT~GAATTCTAT~TTTAATTTC~ATACTATTT~TTATTTTGA~TCGAATAAT~TACTTTTT,~~ KKSKTIIKQSFNNNIYYKKLNSIFNFNTIFYFDSNNLLFD
31200
TTGGTTAAAkAAAAATTATiATATCAATAATAAACCATT~CTAAAATCA~TTTTAATTT~CTCAAGTAT~TCAAATCAG~TTATTTTAT~TTTTAAACA~AAAAATTCC~AATCTTTT4~ WLKKNYYINNKPFLKSFLIYSSISNQFILFFKQKNSKSFV
31320
TAAAAATTTnGTAAAAAAAiATAGTAAAG~TGTTATAAC~AATGTTTTT~CAAAAGAAA~TAAAATAGA~ATAAATAAC~TTTCAAAAT~CATTTATT.4~GCTTTTTTT~AGATATTAr~ KNLVKKNSKDVITNVFSKENKIEINNFSKSIYYAFFEILS
31440
AATAAATGA6ATTGATAATAAATTTGTTA~TAATAAGATiAAAAGATTT~ATTTAAACA~AATAAAAAG~TCTGATAAT~TTCGATTTA~ INEIDNKFVINKISLKNINKKKQKRFYLNKIKSSDNFRFI
31560
TAATTTATGEAAAATAAAA;\nTinTTCATEACAACAATT~GTATCAAAT~ATTCTTTTT~ATTAAATCC~GCATTTGAA~TACTTCAAC~AAATTATTA~TTGAAGAAA~AAAATATTI~ NLWKIKNYSSOQFVSNNSFLLNPAFEILQQNYYLKKKNI-
31680
GTTTTTTAAkAAACTAAACbAGGTATTTTCAAATTTTTT~TATTTTCAA~ATTACAAGT~TAAAAAATT~AATATTTTT~TGAAATTTG~TAGTTTAGA~AAAATTCTA~AAAAAAGAA~ FFKKLNEVFSNFFYFQYYKCKKLNIFLKFASLEKILKKRN
31800
TAAAAAATTiACTATATCAiTAAAACTTTiTnncnnnTT~TATAAAAAC~AATTAAATG~AAATGGTGA~TATAAAATT~AAAGTCAAA~TTTACAAAA~GAAAAAGAA~TAAACAAAA~ KKFTISIKLFKKFYKNKLNENGEYKIESQILQNEKELNK<
31920
AAGAAAAAA6AATTTTCAAiTTAATCCAAkcATAAAAAT~TTAAGTTTT~ATAATTCAA~TAAAAAAAA~ATTTATTTA~AAAATAAAT~TTTTTTTAA~AAAAACTTA~TAAATAACA~ R K K N F Q F N P N I K I L S F Y N S S K K N I Y L Q N K Y F F N K N L I
32040 N
N
K
TACTTTTTTiTTTAATAAAkAATCTTTTA6TATAATTACAAATTGTTTT~CTCTTTTTT~ TFFFNKKSFNIITVIFDKLKKIQLNFQEIQKILNCFSLFF
32280
TAATTCTAAAAATATAAAAkAAACTAAAAiTTTTAAAAA~TCTTATTTT~TTAATGAAA~TTTAACAAC~ACTTTTTCT~TTAATGATA~AGAATTTAA~ATTTTTTTT~TAGAGTTAT~ NSKNIKKTKIFKNSYFINENLTTTFSFNDKEFNOKEFNIFFLELF
32400
TATTTCTGAAATTAACAATGATTTTTTAA;GAGATTTTTiTCCTATAGA~AATAGGCAA~TATTACAAA~ ISEINNDFLMRFFKKYLYYRIYKDKEILFNPIENRQLLON
32520
TTTTTTTGAAAAAACAAAA;TTTTAACTTiTATAGATTT~TTACAGGAT~CTGAATTAA~TTATAATAA~CGATTTATT~TTCATTTAG~AAAAAAAAC~ATTAAAAAT~ATAATTTAT~ FFEKTKILTFIDFLQDPELNYNNRFIFHLEKKTIKNNNLL
32640
ATATTTACGATTATTGAAAnTTTTTCTAAAAGATAAAAGACTTATTTAT~AAATCTCAA~TATCTAATG~ YLRLLKIFLKDKRNFLLINEIKSFIEKKNNLFIKSQLSNV
32760
TTTATTAGT;\AAAAATTCA;nTAAATTTT~TGATAATAT~TTTAATTTT~ATTTTTTGA~ACAAAAAGA~AAAAACATT~AAATTATTT~AAATAACCA~AATTATTTT~AAAAAAGTT~ LLVKNSYKFFDNIFNFHFLKQKEKNIEIILNNQNYFEKSL
32880
ATTAAAAAA;\ACTTATTTA;\AAAATTTAA~CTTAAATAA~AGTTATAGT~AATTTTCTT~TAAAATATT~ATTTTTCAA~TATTAAACA~TTTAAATAA~AATAATTAC~AAACTTTTC~ LKKTYLKNLNLNNSYSKFSYKIFIFQLLNILNKNNYKTFQ
33000
GTGGATTAGiGAACTTATTiTTTATTCAAL\AAATTTAAA~TATAAAATT~AAAACAAAA~AGAAAAAAA~AATTATTGT~ATAATAAAA~TATTTCTTA~AAAAAAAAG~AAATAAAAA~ WISELIFYSKNLNYKIQNKIEKNNYCYNKNISYKKKKIKT
33120
AGTTAATTTiTTTGAAAAAnATAATTTAT~TCAGACTAA~AATTCATGG~TTTTTACTT~GGAATGGTG~GAATATAAT~CATATATAT~ATTACAAAT~ATTCAAGAA~CTTTTTTTC~ VNFFEKNNLFQTNNSWFFTLEWWEYNTYILLQIIQETFFQ
33240
AATTACCGAiGTTTTGGAA;ATTTCAAAAAAAAAAAAATRAACCTTATC~TTTCATAAT~TCAAATTGA~ ITDVLEYFKKKKIIEKNLKFFLKSKKISLKTLSFHNFKLK
33360
ATGGAATTTnCGTTTTTTTkATGAAATTAj\TTATAAAAA~AATTATTTA~TAAATTTTT~ATGGTCTGA~TTTAATTTA~TAAATAATT~TAATAATTT~TATTGGGTT~TTTTTAGTT~ WNLRFFNEINYKKNYLLNFLWSDFNLINNCNNLYWVIFSL
33480
AGTTATATTiATTTTTTTA;ATTATCAAAL\AATTTTTTC~ATTATTATA~GTTCTGATT~TTTTCATTT~TGGAAAAAT~TTGAAATAA~TCAATATTT~ACAGATCGT~CGCGAAGTC~ VIFIFLYYQKIFSIIIGSDCFHLWKNFEIIQYLTDRSRSL
33600
TTATTTTACnAAATTAACTEGTCGTAATAkAACAGCCTTnAAAATTTAT~AAGTTATTT~TTTCAAAAT~TAACACATT~TATTACAAA~ATTAAATTT~ATTTATTAA~ YFTKLTRRNKTALNKTENLLSYFFQNLTHYITNIKFYLLT
33720
AAAAAAAAAiTTAAAAAAAiGGTTAATTA~TAATAAAAC~TTAGATCTA~CTCGTAGAA~ACGTAAATT~TTAGTTCAA~CTTTAATTA~ACATAACAA~ATTCAAAAT~ATGGATTTG~ KKNLKKWLINNKTLDLSRRKRKLLVQSLITHNKIQNYGFE
33840
ATTAAATTCEAATAAACAAiTTTTTACTTCTTATTTTGG~TATCAGATA~CAAATCAAC~AGGACTTTT~TATTTTCAA~ATTTAGCTC~ATTTTTTCA~AAAAATTTA~TTAATAATT~ LNSNKQFFTSYFGYQITNQQGLLYFQYLAQFFQKNLINNS
33960
ATTAGATTTAGCCAATAAAiGGATTGTTT~TTCTTTTTG~CATAAAATT~TTTCTTCAC~AAAATTACG~CAAACAAAT~ATATTGAAT~AGGGTTTCA~AATATACCC~TTCCATTGC~ LDLANKWIVFSFWHKIFSSQKLRQTNNIELGFQNIPVPLQ
34080
ATTTGGATTATCTTATTCAAAAGGAATTTiATTAATAGGiTTATTCAAA~TTTCGATAA~ FGLSYSKGILLIGPIETGRSYLIKNLAAESYVPLFKISIN
34200
CAAACTATTi\TATAATAAA~CTGATGTTAinACAGAAAG~TGGATGAAC~TTTTAATTG~AAGTTTACG~AGGCTAAAT~TTACTTTAG~TTTTGCCAA~AAAATGTCA~CTTGTATAA~ KLLYNKPDVITESWMNILIESLRRLNLTLDFAKKMSPCII
34320
ATGGATTCAiiAATATTCATEAATTAAATGiAAATCGTTT~ACGCAAAAT~TAGAATCTG~TCCAACCTT~TTGCTTGGT~TTTTGTTAA~ATATTTTCA~ACAGATTTT~GTAAAACTA~ WIQNIHQLNVNRLTQNVESDPTFLLGILLKYFQTDFSKT‘K
34440
AAAAAATAAiATAATTGTTRTTGGGTCAA~TCATCTCCC~AAAAAAGTG~ATCCAGCTT~AATTTCTCC~AATAGATTA~ATAAAATAA~TAATGTTCG~TTATTTAAT~TTTCTCAAA~ KNNIIVIGSTHLPKKVDPALlSPNRLDKIINVRLDKIINVRLFNISQR
34560
AAAAAAACA6TTTCCCCTTETTTTAAAAAAAAAGAATTTiCAATTAAAA~AAAATCTGT~TTTTTTAAA~GAGTTTGGA~CACGAACTA~GGGCTATAA~TTAAGAGAT~TATCAGCAT~ KKQFPLLLKKKNFQLKENLFFLNEFGSRTMGYNLRDLSAL
34680
GACAAATGAAGTTTTATTA;TAAGTATTACAAAAAATAG~TCATTTATT~ATACTGATA~TTTAAAATT~GCTTTTCAT~GACAAATTT~TGGTTTAAC~TATACAAAT~ATAAATTAA~ TNEVLLISITKNRSFIDTDTLKLAFHRQIFGLTYTNNKLN
34800
TTTTGATAGi\ATATTTAAAkTAGTTATTTATAAAGTAGG~AAAACTATT~TACAAAATA~TTTAATTAA~AGCTCTAGT~TGAATTTGT~AAATATTGG~AATTTTTTA~GGAAAAAAA~ FDRIFKIVIYKVGKTIIQNILIKSSSMNLLNIGNFLWKKN
34920
TTTTTATTAETTATCTAAAiGGTATTTAG~ACCCTCTnTiTTTAGCTGG~ACAGCTGCT~GAGATTCAT~ FYYLSKWYLEPSIDESIIKELTILTtiILACLAGTAARDSW
35040
GTTTTTATTkGAAAAAAAAiCAGAAAGTTiACTTCCTATiGATAAGTTAGTTGAAAATGL\TTTTACTTTkGCCTTTAGTATTTTAGAAAGTTTTTTTTC~GAATTTCCAjGGTTAGAAAi FLLEKKAESLLPIDKLVENDFTLAFSILESFFSEFPWLEI
35160
ATGTCAAACiAATGTTGTTAATTCTAAAAj\AAATAAAAT~ATTGAATTT~CAACAAAAA~CTCTATGAA~ATTATGCAA~ATGGAATTT~TGCTATAGC~AATAAAAAA~TCATTTACA~ CQTNVVNSKKNKIIEFSTKNSMNIMQNGIFAIANKKFIYT
35280
TCAAAATCAiTTACAATAT6AATCGTCnCiTTCTCAACA~ATAAGTTTT~ATAAAAAAA~AAATTATGA~TTTAAAAAT~CTTCTTGGT~ACCTCGATT~TGGCGTTTG~GTTTTTTTC~ QNHLQYKSSLSQQISFNKKKNYEFKNTSWSPRFWRLSFFR
35400
TAGTAATTTATTTGATTGGATTAAAAGACCAAATGATTT~GAATTTTCT~ATAAATTTG~ATTTACAAA~AAAAAAGAA~ATCTTTTTT~TGCTAATTT~CAAAAAAAA~ATAATTATG~ SNLFDWIKRPNDFEFSYKFGFTKKKEYLFSANLQKKNNYG
35520
ACAATTTATAGAAAAGAAAiAAAAAGAACAACTTCTTTA~GAAAGAATT~TACCGAGAA~ACGAAGAAG~AATGTACAA~AGTTAGAAT~TCAATTTGA~GAAATATTA~TAGAAGAAC~ QFIEKKKKEQLLYERILPRIRRRNVQELESQFEEILLEEQ
35640
ATTTGAAAT;TTAGGTTTTiTTCGATTATCAGAACAATA~CCAATGGAA~ATCAATTAT~TAATAAGCC~AGATTATTT~TTGGAAAAC~AATTCTTTG~GATCCAATA~GTTTATTTT~ FEILGFFRLSEQYP.MEYQLYNKPRLFIGKRILWDPIGLFF
35760
TCAAATTCGiCATTTTGTGiTTTCACGTCi;AGAATTTTTiGTAGATGAAi;AAATGTTAAGAAGACTTTA;GTTACTTATGGAGCTCGAAGAGAAAGAGA6AGATCTCGT;CAAGTCAAA6 QIRHFVFSRREFFVDEEMLRRLYVTYGARRERERSRSSQK
35880
AATTAAACA;\TTTTTTCTTiGTCGTGGATATAATAAAGA~CTAATTAGT~AATTATCTA~TCGTTGGTG~AGTCAATTA~CTATTAATG~AAAAAAAAA~ATTGATACA~TAAAACGTA~ IKQFFLCRGYNKDLISKLSIRWWSQLPINEKKNIDTLKRI
36000
TGAACATATiAGTATTCAAiTAAAACGCCCTCAAATTTTiGTTTTTTCG~TTTGAGTTA~TAACTCATC~ EHISIQLKRPQIFTPVYLYQRWLIENSPEKFFRFELLTHR
36120
CAAAAAATGECTTAAAATAAATAGTTTAT~ATTAAATGA~TCTTTTATT~ACACAACAC~TTTAGAAAT~TATGAATAT~TATTGCATT~TTTTATTGC~AATAAAAAA~TACTAAATC~ KKWLKINSLLLNDSFIYTTLLEIYEYLLHFFIANKKLLNQ
36240
AATGACAAAhATTTTATTAkAAAAAGGGT~GCTTTTTGA~AATGAAATA~AAACTATTA~TAATGAAAC~AGACAATAA~CAAATTAGT~TTGAAGATA~ATAGGAATA~ATATAGATA~ +---------------------> MT K I L L K KG W L F EN E I E T I I N E T R Q ===
36360 <---
ATTCCTATAiATCTTCAAAiTTTAATTAACTTAGTTTTT~TTTTTAGAG~CGGGATTGA~GGGGCTCGA~CCCGCAACT~CCGTCTTGA~AGGGCGGTA~TCTAACCAA~TGAACTACA~ + 3'-GCCCUAACUGCCCCGAGCUUGGGCGUUGAAGGCAGAACUGUCCCGCCAUGAGAUUGGUUAACUUGAUGUU
36480
TCCCATTATATAAATGCAT;TTTTTTATAiTGTCAAAAA~GTTTGACAT~ATAAACGGC~ACTTTTTTC~ATAATTAAT~TTGGGTCGA~CTGGATTTG~ACCAGCGTA~GCATTGCCA~ AGGG-5'
36600
CGGATTTACAGTCCGTCCCcATTAACCAC~CGAGCATCG~CCCAGATAG~AAAATCTTT~TCTATCTGA~AAAAAGTAT~AAATATTAA~TAATTAATT~AAGGACTTT~TTATTACCC~ GCCUAAAUGUCAGGCAGGGGUAAUUGGUGAGCUCGUAGCUGGG-5'
36720
CAGGGGAATiCGAATCCCCETCGCCTCCTiGAAAGAGAG~TGTCCTAGG~CACTAGACG~TGGGGGCTT~ATCCCTTAA~CTTATCTTA~TCAATATAT~ATTTCCTGT~AATAGTTTT~ GUCCCCUUAAGCUUAGGGGCAGCGGAGGAACUUUCUCUCCACAGGAUCCGGUGAUCUGCUACCCCCG-5'
36840
f--------->
<-..-----+
+----
TTAGAAAAAAAATTGTTTCiAAAAAATCA~TAATTTATTbCAAATTGAA~ATTACTCAT~AATATTTTT~TGTTATAAA~CATATTGAA~TTAAATAGA~AAATTAAAC~TAAATAAAA~ (---------+ TTGAAT> TGTTAT> ATTTTTTATiTAAATATTG;TTAAAATAGiTTTAAGATAGTTTTAGATC~AGTTAGTTT~ mbpX> AGGAG MS I
36960
37080 L
I
Y
K
V
SK
S
L
G
N
L
K
I
L
D
R
V
S
L
TATGTACCTAAATTTTCTTiAATAGCACT~TTAGGTCCT~CTGGTTCGG~AAAATCCAG~TTATTACGA~TTATTGCAG~TCTTGACAA~TGTGATTAT~GAAATATAT~GTTACATGG~ YVPKFSLIALLGPSGSGKSSLLRIIAGLDNCDYGNIWLHG
37200
ATAGATGTT6CTAATATTTCTACACAATAiAGAAGAATG~GTTTTGTTT~TCAACATTA~GCACTTTTT~AACATATGA~TGTTTATGA~AATATTTCA~TTGGACTTC~ATTACGAGG~ IDVTNISTQYRRMSFVFQHYALFKHMTVYENISFGLRLRG
37320
TTTTCTGCTcAAAAAATAAECAATAAGGTEAnTGATTTA~TAAATTGTT~ACGAATTGC~GATATTTCT~TTGAATATC~TGCCCAACT~TCAGGAGGA~AGAAACAAC~TGTTGCTCT~ F S A Q K I TN I( V N D L L NC L R I AD I S F E Y PA Q L S G G Q K Q R V
A
i
37440
GCACGAAGTiTAGCAATTCnACCAGATTTiCTTTTATTAtTGGTTAAAA~GTTATTTGC~AGATAACAA~ AR S LA IQ P 0 F L L L D E P F GA L D G
D
N
K
ATTACAACAATTATGGTTAtACATGATCA~AAAGAAGCG~TTTCTATGG~TGATGAAAT~GTGATTTTA~AAGAAGGTC~TCTGTTACA~CAAGGAAAA~CTAAAAATT~ATATGACCA~ I T T I M V T H D 0 K E A I S M A D E I V I L K E G R L L Q 0 G K P K N L Y
D
3
37560 E
L
R
R
H
L
S
K
W
L
K
R
Y
L
Q
37680
CCAATTAATiTTTTTGTTGGTATTTTTTkGGATTACTTkAAAATTTAA~AAAATTTGC~ PINFFVGIFLGLLIEIPKLNESITLKNIPSKTPQNLKKFA
37800
TTTGATCCTi\TATGGGTGA6AATATTTGCiAATCGATCA~TAAACAAAT~TCGATTTTT~TTAAGACCT~ATGAATTTT~TATAAAATC~GAAATGGAT~TGGAAGCAA~ACCAGTTCA~ F D P I w V K I F AN R S I N K Y R F F L R P Y E F C I K S EM D L EAT P
37920 V
Q
ATTAAAACAnTAATTTATAlAAGAACTTTiGTTCAGTTG~ATCTTTTTG~AACTTCTTT~TTATGGAAT~TAACAATTC~AATAGGTTA~CAATCTTTC~GAAATTTAC~TATTGAATC~ IKTIIYKRTFVQLDLFVTSFLWNLTIPIGYQSFRNLHIES TTTATGCAA6CACTTTATAiAAAACCTAG~CTTCAAGTT~TTTTAAGAG~ATATCCTAT~TTAACAAAT~TCAAAAAAA~TTAATAATT~GTTTTTTTA~TACTAAATT~TTATTATAA~ FM Q T L Y I K P R L Q V F L RAY P I L TN I K K N ====== +------)
38040
38160 <-----
TTATCTTTA;ATATATATA;ATATTTATAiAATATATAAkTATGTTATA~ATCTTATTG~CAACTTAGT~ +--------, <-----------+ +
38280
TTTTTAATTliAGTTGACAAiTTTGTAACTiiTGTTACAC~ATATGTTGT~TTATTTATA~AAAAAAAAA~ATAGTTTCT~CTTATTGCC~TTTTAACTC~GTGGTAGAG~AACGCCATG~ TTGACA> CACAAT> Thr-GGU> 5'-GCCCUUUUAACUCAGUGGUAGAGUAACGCCAUGG
38400
TAGGGCGTAnGTCATCGGTiCAAATCTGAiAAAGGGCTT~TTTTTACAA~GTCAAATAA~GTTCACATT~TATTTAACG~AAAATTGAA~TTTATAACT~AATTATATA~TCTAATACT~ UAGGGCGUAAGUCAUCGGUUCAAAUCUGAUAAAGGGCU-3'
38520
AAGTTTTTTCTAATTTACAbAATTAAAAA;TTCTAGTAT~GGTTAAAAA~ATTTTGGAA~TTTCAACTT~CTATTTACT~TGATAAAAT~AATTAGTAA~TTTTGAAAG~TAATAGATT~ TTGAAA>
38640
TCTTTTATAcAATATAAAA;AACTATTAG~TAATGAATT~AAAATAAAA~AAGACATAA~AAATTATTA~AAAGAAAAG~AAAGTTTAC~TACTTTTTA~TTTTAATCG~TTTTAGAAA~ TACAAT>
38760
TATATTAAT6AAAAAAATTRAAATTGAAAi;GGATCATAA~AAAATTTTT~TAGGTACTT~AAAATGGTT~AAGTAACTT~ATAGGAGAA~CATTATGAC~ATAGCCATT~GAAAGTCTT~ psbD> AGGAG MTIAIGKSS
38880
CAAAGAACCi\AAAGGTTTAiTTGATAGCA;ccnrcnCTG~CTAAGGAGA~ACCGTTTTG~ATTTGTAGG~TGGTCTGGT~TATTGCTTT~TCCTTGTGC~TATTTTGCT~TAGGTGGAT~ K E P K G L f 0 S M D D W L R R D R F V F V G W S G L L L F P C A Y F A L G GTTTACAGGiACAACCTTTETAACTTCATi;GTATACTCA~GGATTAGCT~GTTCTTATT~AGAAGGTTG~AATTTTTTA~CTGCCGCTG~TTCTACCCC~GCAAATAGT~TAGCTCATT~ F T G T T F V T S w Y T H G L A S S Y L E G C N F L T A A V ST P AN S L AH
Fig. 2, cont.
39000 G
W 39120 S
TTTACTATTATTATGGGGACCTGAAGCACAAGGTGATTTiACTCGTTGG~GTCAATTAG~CGGTTTATG~ACTTTTGTA~CTCTTCATG~TGCATTTGG~TTAATAGGC~TTATGTT~~ LLLLWGPEAQGDFTRWCQLGGLWTFVALHGAFGLIGFMLR
39240
ACAATTTGAi\CTTGCTCGAiCTGTTCAATiACGTCCTTAiAATGCAATT~CGTTTTCTG~ACCTATTGCiGTTTTTGTAiCTGTTTTTC~TATTTATCCiTTAGGACAAiCAGGTTGGTi QFELARSVQLRPYNAIAFSGPIAVFVSVFLIYPLGQSGWF
39360
TTTTGCACCiAGTTTTGGTI;TAGCTGCAAiTTTTAGATTiATTCTTTTTiTTCAAGGCTiTCATAACTG~ACTTTAAAC~CATTTCATA~GATGGGTGT~GCTGGAGTTiTAGGGGCTG~ FAPSFGVAAIFRFILFFQGFHNWTLNPFHMMGVAGVLGAA
39480
39600
39720
TTTAGCTTTnAATTTACGTGCCTACGATT~TGTTTCCCAATCCAGAATTiGAAACTTTT;ATACAAAAA6TATTTTATTi\AATGAAGGT~TTAGAGCTT~ LALNLRAYDFVSQEIRAAEDPEFETFYTKNILLNEGIRAW
39840
GATGGCAGCiCAAGATCAGiCTCATGAAA~TCTTGTATT~CCAGAGGAG~TTCTACCCC~TGGAAACGCiCTTTAATGG~ACTTTAGCT~TAGGTGGTC~TGATCAAGA~ACCACAGGTi MA A Q D Q P H E N L V F P E E V L P R G N A L === MKILYSQRRFYP(M)ETLFNGTLALGGRDQETTGF W'
39960
TTGCTTGGTGGGCAGGTnniGCTAGACTTATTAATTTAT~TGGAAAGTT~CTTGGAGCT~ATGTAGCTC~TGCTGGATT~ATTGTTTTT~GGGCTGGAG~AATGAATTT~TTTGAAGTT~ AWWAGNARLINLSGKLLGAHVAHAGLIVFWAGAMNLFEVA
40080
CTCATTTTG;ACCAGAAnniCCTATGTAT~AACAAGGATiAATACTACTiCCTCATTTA~CTACTTTAG~TTGGGGAGT~GGACCTGGTGGAGAAATTGiTGATACTTTiCCATATTTTG H F V P E K PM Y E QG L I L L P H LA T L G :21 G V G P G GE I V DT F P
40200 V
F
V
TGTCTGGAGiTCTTCATTTAATTTCTTCT;;CAGTTTTAG~TTTTGGTGGiATTTATCAT~CACTTATTG~ACCAGAAACiTTAGAAGAA~CTTTTCCGTiTTTTGGTTA~GTTTGGAAA~ SGVLHLISSAVLGFGGIYHALIGPETLEESFPFFGYVWKD
40320
40440
40560
TCATTGGCGGGCATGTATGiTTAGGTTCC;\TTTGTATTTiTGGGGGAATCTGGCATATTiTAACAAAACETTTTGCATGi;GCTCGTCGT;;CTTTGGTATi;GTCTGGGGA~GCTTACTTAi 1 G G H V W L G 5 I C I F G G I W H I L T K P FAN AR R A L V W S G E A Y
L
40680
CTTATAGTTiAGGTGCTATiGCTGTTTTTGGTTTTATTGCATAATACAGCTTATCCGAGiGAATTTTATGGTCCTACCGGTCCAGAAGCATCTCAAGCTC Y S L GA I A V F G F I AC C F VW F N N T A 'I P S E F Y G P
AQ
S 40800
T
G
P
E
A
S
Q
AAGCTTTTACTTTTTTAGTiAGAGATCAAEGTCTTGGAG~TAATGTAGG~TCAGCTCAA~GACCTACTG~ATTAGGGAA~TATATTATG~GTTCGCCCA~TGGAGA~T~ATTTTTGGT~ AFT F L V R DQ R L G A N V G S AQ G P TG L G K Y I M R S P TG E I I F
40920 G
G 41040
CAGAATACAiGACTCATGCiCCATTAGGAiCATTAAATTiCAACATCiCATTTCGTTi E Y M T H A P L G S L N S V G G V
41160 A
T
E
I
HA
V
:;
Y
V
5
P
R
S
W
L
A
T
S
H
F
V
L
TAGGTTTCTiTTTCTTTGTkGGCACTTA~GGCATGCTG~AAGAGCACG~GCTGCTGCA~CTGGTTTTG~AAAAGGAAT;GATCGTGAT;TTGAACCAG;TCTTTCTAT;;ACACCTCTT~ ID E D F E P V L S M T GFFFFVGHLWHAGRARAAAAGFEKG
P
L
N
41280
41400
ATTAATTAAiTAATTAATTnATTATTAACiAAAnnnncniTTTTTiTTTTACTTGiTTTTTTAGTiAATAATTAA~TTGCTAATTiAACTATTTA~TTTTTGAAAi ------+ _== +--------------------------~~~~~~~~~~~---> <---------------------------------TAAAAAAGGiGAGAGAGGGnTTTGAACCCiCGATAATCTiAAAAACTAT~TCGGTTTTC~AGACCGACG~CATAAACCA~TCGGCCATC~CTCCTATAGiAAACATTTT~AATCTAATA~ aer-MiA 3'-UCCUCUCUCUCCCUAAACUUGGGAGCUAUUAGAAUUUUUGAUAUAGCCAAAAGUUCUGGCUGCGGUAlJlJ~JGGUGAGCCGGUAGAGAGG-5'
41640
TTTTTTTCAkAAAAATTATiAAAGTTTGAiCAAATCGAA~TTATAAGTA~TTTTTTGAT~ATTTTACAA~AACAGGATT~GATGGTAAT~TTTTCATATiTATTAAAAA~TTGGAGAAT~ > <-----------f -----_ cAATGTT TTTGAT> ACAACTATGACTATAGCTTicCAATTGGCiGTGTTTGCACTAnTTGCTAiTTCATTTCTi:CTAGTAATTEGTGTTCCCGiAGTACTAGCiTCTCCTGAAGGTTGGTCAAETAACAAAAAi MT I A F Q L A V F A L I A IS F L L V I G V P V V L A S P E G W S S 62>
41520
+-----
GGAG ORF 41760 N
K
N 41880
GTTGTTTTTiCAGGTGCTTCTTTATGGATiGGATTAGTTiTTTTAGTAG~TATTCTTAAiTCGTTTATA~CTTAAAATT~TATAGTAAT~TAAATTTTA~GAATTTA~~CTTCCTTGGi V V F s G A s L W , G L ,, F L V G , L N s F , 5 i=z +-----__------ ---_____________ TTACATTATI\TTATAAATT~TAAATGCAT~TGAAACAAGiGCTTTAGAT~~AAAAATGiTTCCAAGGA~GTTTAATTA~AAAAAAATT~TATAAATAT~TATA~TTA~ATATATATAi ---______ - __....._ - .____._ ___- -------+ ---------------> <-----------TTATAT> +---------------
42000 ----42120
ATATATATAiATAATATnnGTATAATTTninTACGCGGGiATAGTTTAA~GGTAAAATT~CTCCTTGCC~AGGAGAATA~GCGGGTTCG~TTCCCGCTA~CCGCCAATT~AAATAATTA~ TATAAT> 61y-GCC> 5'-GCGGGUAUAGUUUAAUGGUAAAAUUCCUGGUUGCCAAGGAGAAlJAlJGCGGGlJlJCGAlJUCCCGCUACCCGCC-3' ------><--------------------+ CTTAAAAAAnGAAAATAAGiAAAAATATTiTTTTTTGGC~GAGACAGGA~TTGAACCTA~GACCTCAAG~TTATGAGCC~TGCGAGCTA~CAGACTGCT~TACTCCGCGiTATAATTAA~ 3'-ACCGCCUGUGUCCUAAACUUGGAUACUGGAGUUCCAAlJAClJCGGAACGClJCGA~lGGlJCUGACGAGAUGAGGCGC-5' +---------> <---------+
42240 4AIJ-ftkt
Figure 2. Nucleotide and deduced amino acid sequences of a region from rps'12 to trnfM(CAU) (1 to 42,240). Only the plus strand DNA sequence is shown. The deduced amino acid sequences in the 1-letter code and tRNA-like RNA sequences are presented in the lower line. Genes are designated in bold characters at their 5' termini. Broken arrows indicate inverted repeated sequences. Possible promoter sequences (TTGACA at "-35" and TATAAT at "-10") and consensus sequences of the group II intron (gugyg . . . ragccg-augaa--gaaa--uucaugu-cgguuy . . . cuayy-y-ay) are shown. SD-like sequences and termination codons are indicated by capitals and double underlining, respectively. Note that ORF29, trnC(GCA), trnR(UCU), trnG(UCC), ORF33, ORF30, ORF36a, ORF55(lhcA), trnD(GUC), trnY(GUA), trnE(UUC) and trnS(UGA) are oriented in the reverse strand.
atpB
- tmG(GCC)
~CTTTAACAC~;AGCTTTGAA~CCAACAC~T~CTTTAGTCT~~GTTTGTGG~GA~ATAAGT~~~T~~~TAC~AAT~TAATA~TATT~TTGC~AGGATGAGG~CTG~TCGAT~AAAATTTTT GGAGG
56291
~TCTAATTTT~CTTTTGTAT~AAAAATTTT~TAAGTAAAA~TTTTTT~CA~T~ATAAAAC~TTATTATTG~ATATTGTTT~TTATATGTA~TGCAACCTA~CTATTGTAT~ATTAAATAA +---------> (---------+ <-------+
56171
RTTTTATTAT~TTTTTTTAT~GATACA~AT~GAC~TTAAC~ATTTTT~AG~ATATAGATT~AAATATATA~ATATATATA~ATATATATA~ATATGAGTA~TATATATCT~TATCTATAT +--------------------------------) <-------------------
56051
RTATATATATAGATATATATEAAATTTATA~ATTTTGT~A~TAATTTAGT~TCAAAATTT~AT~TATCTA~TTAACTT~A~AAATATTAA~AAAAAGTTT~AATATATAT~TTTTCTAAG + TTGTCA> TTTAAT>
55931
~;TAGTTTTTT;TATTATTAA~TGATTTATT~GATA~ACAA~ATTTTTTTT~TTATAATTT~ATTATTAAC~AACTTTTTA~TTTATGAAA~CAAATTTTT~AGCTTTTGG~ATGTCTA~A a@> M K T N F L A F G M S
55811 T 55691
~CTATGGCTGAGTATTTTCG~GATGTTAATAAA~AAGATG~ACTTTTATT~ATTGATAAT~TTTTTCGTT~TGTTCAAGC~GGTTCAGAA~TTTCTG~TT~ATTAGGTAG~ATGCCGTCT TM A E Y F R D V N K Q D V L L F I D N I F R F V 0 A G S E V S A L L G R M P
54971 S 54651
54731
54611
54491
ATAAAAGGAT~TCAAATGAT;CTTTCGGGA~AATTAGATA~~~TT~CTGA~~AAGCATTT~ATTTAGTAG~AAATATAGA~GAAGCTA~T~CAAAAGCAG~TACTTTACA~GTGGAGAGT IKGFQMILSGELOSLPEQAFYLVGNIDEATAKAATLQVES etpE> GGAG ~AAAAATTATECTAAAT~TTEGTATCATGGC~CCTAATCG~ATTGTTTGG~ATTCGGATA~TCAAGAAAT~ATTTTATCA~CGAATAGTG~GCAAATTGG~ATACTACCT~ACCATGCTT M L N L R I MA P N R I VW N S D I Q E I I L ST N S G Q I G I L P N HA
54371
54251 S
CAGTTTTAA~~G~TTTAGAT~\TAGGAATTG~CAAAATA~G~CTTAATGAT~AATGGT~TA~TATGG~ATT~ATGGGTGGT~TTGCTATGA~TGACAATAA~AATTTAACT~TTTTAGTTA VLTALDIGIVKIRLNDQWSTMALMGGFAMIDNNNLTILVN
54131
I\TGATG~TGAAAAAGCTAGTGAAATAGATT~TC~AGAAG~~CAAGAAA~T~TTCAAAAAG~TAAAACAAA~TTAGAAGAA~CAGAAGGTA~CAAAAAAAA~GAAATCGAA~~TCTATTAG DAEKASEIDYQEAQETFQKAKTNLEEAEGNKKKEIEALLV
54011
;TTTTAAAAG~G~TAAAG~A~GATTAGAAG~AAT~AATAT~G~AT~AAAG~TATAAATTA~ATAATTAAT~AATAATTAA~AATTTATAT~AGATGCCAC~TTTTCTGGC~T~TAATATA ++---> F K R A K A R L E A! N MA S KL=== <----+ f----------------, (--------------~~
53891
~ATTAGAAAA~\AAA~TTA~~~A~T~~~GGA~TTGAAC~AA~GA~TCTCGC~GTATGAAAG~GATACTCTA~AC~ACTGAG~TAAGTAGGT~TTTTATTTC~AATTTATTA~AACTATATA + 3'-AUGGAUGAUAACCUAAACUUGGUUACUGAGAGCGGCAUACUUUCGCUAUGAGAUUUGGUGACUCAAUUCAUCCA-5'
53771
~TAA~TTTTA~TG~AAGATA~GTAAAAAAA~TTGAAT~TA~ACCTTGA~A~TAAATAATA~AAAAGTATA~ATATATATT~TATAGAAAA~AATA~TAAT~TAA~TCGTC~GTATTAAAG TATAAC>Val-UAC> f-----.)<-------+
53651 5'-AG
~G~TATAG~TEAGCGGTAGA~CGCCTCGTT~ACACGTG~G~CAATGCTTA~CAAAAAT~T~TTTTCGATT~GTCGATTCA~AACTAAAAG~TTTCTAATT~TTGTGAAAT~GAAAATGT~ GGCUAUAGCUCAGCGGUAGAGCGCCUCGUUUACAC gugyg................(1ntron).... . . .. . . . .. . . .. . . . .. . .. . . . .. . .. . . . .. . . .. . . . .. . . . .. . .. .
53531
;TACTCTTTGATTTAAT~AT~GAGAAAAATAGCCTGA~A~~AATAATTTC~ATTATTTAT~TGAAATTA~~TTTTTAGTT~ATATGGTTA~TTTTTT~TT~TAATGTTAT~A~ATGATGA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (1ntron) . . . . .. . .. . . .. . . .. . . . .. . .. . . . .. . .. . . . .. . . . .. . . .. . .. . . . .. .
53411
~AATTACGGG~AACT~AAGA~ATT~TTTTT~TGCTTTATG~AATTTTAAG~TGTATAAAA~TT~ATATTA~TTTAGCAAG~GAAACT~TT~ATTGAGTAA~T~CATGTAA~AAACAAACC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..(lntron) . . . .. . .. . . . .. . . .. . .. . . .. . . .. . . . .. . . .. . . . .. . .. . . . .. . . .. . .
53291
~AAGTCAATA~TTGATAATT~TTGAAAAA~~TTGGGATTG~ATTAAAATT~TT~AGAATT~TAAG~AAA~~GAA~~AT~T~ATTATTAA~~AAAAAAAGG~TGGAAAATA~~TAAATTAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (1ntron).. .. . . .. . . .. . . . .. . .. . .. . . . .. . . .. . . . .. . . . .. . . . .. . .. . . .. . .
53171
EA~TTAGTTA~TAAA~GAGCEC~ATG~ATA~AAA~ATG~A~GTTGGGTT~~TAAAG~AGT~~TTAATTTA~AAGAA~TGT~TTA~~GAGA~TGTCTA~GG~TCAAATC~G~ATAG~C~TA . . . . . . . . . . . . . . .ragccg-augaa--gaa~~--Ilucaugllgll-cggul~y.. . . . (1ntron) . . . . . . . . . . ..cuayyy-ayCGAGAAUGUCUACGGUUCAAAUCCGUAUAGCCCUA-3'
53051
Fig. 3.
;\ATTTGTTTT;TTTTATAAA~TGAAAAAAG~;TATACTT~A~~ATAAAAGA~TAGTTAATA~AATTTAA~T~AAAAATCTA~AGTT~AAAT~AGTACAATT~TAATAATAA~TGACCAATT +-----------> <-----------+ +------, <------+ TTGACC> f-----> <-----+
52931
52811
52691
52571
~TTTAATTAT;GGTTTAGTA~ATG~ATGGCEAAAAGGAG~~~TAGAATGG~CTTAAATTT~AAATTTTTT~CTTGTGAAA~TAGTTTAGA~GATAACTCT~CAACTATGC~TAAAAATTC L I I G L V YAW R KG A L E W S === psbG> AGGAG M V L N F K F F T C E N 8 L E D N S T TM L K
52451 N
S
~ATAGAAT~T~~TTTTATTA~CAAAA~T~T~ACAAATTCA~TTATTTTAA~AACTTTTAA~GATTTTT~T~ATTGGGCTA~ACTTTCTAG~CTATGGCCA~TCCTTTATG~TACAAGTTG I E S 5 F I N K T L TN S I I L T T F N D F 5 NW AR L S S L W P L L Y G T
S
C
52331
52211
52091
51971
~GAAAAAAAAAT~~TT~~~A~~GGAA~~AG~TTTTTTA~T~TAA~T~AT~~ATT~A~TTT~TTTT~AAAT~T~GA~AAT~~~AA~~TAAC~T~~T~AAA~~AATTTTTCC~ATCTAAAAA EKKILKKGTRFFTLNHQFNFFSNLDNPKLTSSNQFFQSKK ;~A~TT~T~~~~TTTTATT~~A~~~AT~TTT~A~ATTTAAA~~~~~GGAAA~TTTATA~~T~T~A~~TTTT~~TTTTGA~T~AAAAAAAAT~AAAAAAGTA~AAATA~TAT~TTAAA~ATT +------> <------+ T S K V L L E T S L T F K E K E N L === ORF169> M L
51851
51731 N
I
~TAAAAAATAA~AATAATAA;\~T~CAAGGA~GTTTATCTA~TTGGTTAAT~AAGCATAAT~TAAAACA~A~ACCTTTGGG~TTTGATTAT~AAGGAATAG~AA~ATTACA~ATTAGATCT LKNNNNKIQGRLSIWLIKHNLKHRPLGFDYOGIETLQIRS
51611
51491
RTAA~~GATAATG~~GAT~~~~~TGAAGA~~TATG~ATAA~AATTTTT~T~TTA~G~AAA~ACCCAAAA~~~TCCATCTAT~TTTTG~GTC~GGAAAAGTG~AGATTTTCA~GAACGT~AA I T DNA D Q P E E I C I K I F I L R K N P K I P S I F bl V bJ K S A D F Q E
51371 R
E
~~TTA~GATA~GTTTGG~AT~TTTTATGAA~~~-~C~CCT~TT~~A~G~ATTTTAATG~~TGATAGTT~GCTA~GATG~CCTTTACGC~AAGATTATA~~~TA~CTAA~TTTTATGAA SYDMFGIFYENHPCLKRILMPDSWLGWPLRKDYIVPNFYE
51251
51131
~TA~AAGACG~TTATTAATT~TATAGAAATAAAAAAAATA~~TA~TATTT~AATAATATA~TAAAGTAAA~ATAAGTATA~ATATAAAAT~TATATAAAA~GGGTTTAAA~TATTCTAAA +--- .-----) <--------+ +--------L Q D A Y ===
>
~TTCAGATTT~TATTAGAAG;\TTTTAAG~~~TTTTTATAT~TTTATATAT~TTAGAA~T~~GC~AGAAAC~AGATTTGAA~T~GTGA~AC~AGGATTTT~~GTC~TCT~C~~TA~~AA~T (----___------------------------+ 3'-ACGGUCUUUGGUCUAAACUUGACCACUGUGCUCCUAAAAGUCAGGAGACGAGAUGGUUGA
51011
~AGCTAT~CCGGCAATTTTTATTTTTT~T~~TTTAA~AT~~ATAG~AA~~~T~TGT~AA~~TTTT~TTAA~ATTCAATAA~AGTTTAC~T~GGGGT~GA~~GA~TTGAA~~~T~A~GGT~ +- ---------, CUCGAUAGGGCCG-5'
50891
~TTAAAGCCA~CGGATTTTC~~TTTTA~T~~~ATTTG~ATT~TTGTT~~~A~T~~~TTGTA~AAA~GGA~T~TATCTTTAT~~TCGTCTAT~TTTAAAATG~A~~~~TGAT~TAAT~~TTT AAAUUUCGGUUGCCUAAAA......................................(lntron).......................................................
50771
~ATTATTATTAATTAATATA~TAAT~CAAT~TTTTATAAA~ATG~T~~~A~ATTT~GTTA~GATAGTTTT~TTTGAGTCT~TG~~~~TCT~TT~TTTTAG~AAA~AAAAT~TGG~T~~GG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (lntron).......................................................
50651
~TTACCTAAT~TTTATCTTT~TTTTGCAA~~TA~GTTT~~~TGAATTTGA~AA~~AT~AT~TA~TAAATT~CTCAACTAA~~~TCAATTA~ATTAA~T~~~~AG~GTCTA~~AATTT~G~ UUCAGGCGUCGCAGAUGGUUAAAGCG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (lntron).............................
50531
EATACCCCCT~CAGTGAAA~~~AAAAAATTT~TTTCTTGTA~~AAAT~TAA~TATAGTTTT~TTTTTTTTT~TATGCAATA~TTAAAAAAA~ATTAAAAAA~AATT~TTG~~TTTTTT~AA TTGCTT>
50411
~ATATTATAT~TAATATGTTGTTTTTAATA~AAAAGAAAA~AATGGTATA~ATTAT~~~T~TATTTTTTA~TTATTTGCC~GTTTAG~T~~~AGGT~AGA~~GT~G~A~T~GTAATG~GA Thr-lJGlJ> 5'-GCCIJGUUUAGCIJCAGAGGUCAGAGCGUCGCACUUGUAAUGCGA TATAAT>
50291
~GGTCATCGG~TCGACTC~G~\TAGCGGGCT~TTTTTCTAT~TTTTAAATA~CAATAAAAA~ATTTTTGAT~ATTTTTATT~CTATAGTAA~ATAATTTTT~TTTTATTTT~TTATTTGTT f-----, <-----+ TTGATT> TATAAT> TTGTT UGGUCAUCGGUUCGACUCCGAUAGCGGGCU-3'
50171
~GTTTA~TTT~TTATGCTAT~\GTTTTTTAACTATGAAAAA~AGTTTATAA~AATTGTAAA~TTTTTTATA~AA~GATTTT~ATATGTTAT~TATTTTATA~TATTTTTT~~TATTTTTGT T> TATAGT>
50051
~ACATAAAGGAGTTTTTATG~CCCGTTAT~~AGGACCTCG~GTAAAAATA~TA~GT~GT~~GGGGGCTTT~CCAGGTTTA~~TAATAAAA~ACTTAAATT~AAATCTGGT~ATATTAAT~ MS R Y R G P R V K I I RR L GA L P G L T IJ Y T L K L K S G Y I r&> AGGAG
49931 N
Q
~AT~AA~ATC~AATAAAAAAGTTT~T~AGTATCGTATT~G~TTAGAAGAA~AA~AAAAAT~A~GTTTT~A~TATGGATTA~~~GAAAGA~~ATTATTAAA~TATGTA~GT~TTG~TAGAA STSNKKVSQYRIRLEEKQKLRFHYGLTFRDLLKYVRIARK
49811
~AG~AAAAGG~T~AACAG~TC~GGTGTTGT~~C~~TTGTT~GAAATG~GT~TA~ATAAT~~TATTTTTCG~TTG~GTATG~CTCC~ACAA~TCCAGGAG~~AGA~A~TTA~TTAAT~AT~ PTIPGARQLVNHR AKGSTGQVLLQLLEMRLDNIIFRLGMA
49691
Fig. 3, cont.
49571
49451
CTTTTCAAAA~CAAAAAATACCAAATCACT~AACTTTTGA~TTAATGCAA~TTAAAGGAT~AGTTAATCA~ATTATTGAT~GTGAATGGA~TTATTTAAA~ATAAATGAG~TGCTAGTTG F Q K Q K I P N H L T F tl L M Q I KG L V N Q I I D R E W I Y L K I N E L L
V
V
~AGAATATTA;T~TCGT~AAGTTTAAAAAAACAGATAGAA~ATTAGAGTT~TTATCCTAT~TCT~ATTAA~ATGAGAAAT~GGATAATAA~T~TAATTTT~TATCTGTTT~TAATTTATC (----------------------------------------+ E y y s R Q ,, ===+----------------------------------------,
49331
~AAATTTTTGAAAATTTTTT~AAATGAATC;GCTATA~AA~ATATAAAAT~CTATTTGGT~GTAATTAAA~TTGAATTAT~ATATATAAT~TTATTTTGA~ATTGAAACT~TTGTTTAAT
49211
EAAAAAAACT~GTTTATTTT;TTTATTACTATTATTTTTA~TAATTGTAA~ATGTTTATT~ATTAAAAAT~CATTTTTTT~ATAACATAC~AACTAGGAT~AATA~TAGG~TTCATAAAA +-----> (-----+ +-----------------)<-----
49091
RAAAATGAAT;ATAAGTATT~~TITAA~TA~ATTATAATT~~A~TTTAAA~AAATAATAA~AACAATTAG~ATAAAAAAA~TTCAGTTAA~AATTAAACT~AAAAATGCT~ATAAGCATT f--------------->(----+-----, + +
48971 <--------
~ATTTTTTGA~AAAAAAAAA;\~C~CATAAATTTTCTTTCG~AAAGAGAGG~ATT~GAAC~~TCGATAGA~~AAAAGCCTA~ATAGCATTT~CAATGCTAC~CCTTAAA~~~CT~AACCAT 3'-GCCUUUCUCUCCCUAAGCUUGGGAGCUAUCUGGUUUUCGGAUGUAUCGUAAAGGUUACGAUGCGGAAUUUGGUGAGUUGGUA +
48851
ETTTCCAAGA~AAACTATTTGAAAATATTG~AATGGAAAA~TGTATATTA~AG~AATATT~TTTAATTTG~GTTATTTTA~TTAATATAT~AAATAAAAT~ACTTGAATT~AAAAAAAAT GAAAGG-5'
48731
~TATATTTCAI\TAAAAAGCT~TTACATATACATATGTAAA~TTTTTTTAT~ATA~AAATA~TTCAAAATT~TTTATGGAT~TATTTT~AA~AAAAAGAAA~TAAGATTAT~TGATAAAT~ TTCAAT> -,<----+ f---------------) <--------------+
48611
+---------
--- ------------,<------------
--- ---------++---
48491
~GTACTATAGAGATGGTGTG~\TTTGACTTT~AA~~~AAGA~TTTTAATAA~GTGGAATAA~TAA~ATTTT~AATAAATA~~TATG~TTAG~GAAGGTTA~~TTGACTTTT~TT~TAAA~C Y Y R D Ggugyg.....................................(~ntron).......................................................
48371
CTTCTATCGT~TACCCTCGA;TAATGTAGCCTTAAATA~T~ATTTA~AGT~TTAAACA~A~GGTTAAACT~ATGTAAAAA~GTTTAAACA~TTTTTATAA~TATACATAT~AAGAAAGC~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (lntron) . . .. . . .. . . .. . . . .. . .. . . . .. . . .. . . . .. . . . .. . . .. . . .. . . . .. . . .
42251
~TACATGTAGAAATTGACAA~ACAAAAGCT~CGTTATTTC~TTTTTAATT~AATTAAAAT~TATGATTGA~GAAA~AAAC~T~~AT~T~G~TTGTATTTT~GGTATCTAT~AAATCAT~G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (lntron).......................................................
48131
~AGATTGTCA~AAGAGCTAA~TCTTATACAATACAAAGCA~TAAGAA~AA~GAATTTTGT~AAAAATTTG~AAAAAAGGA~GGTTAGA~A~TTTTTAGAA~AA~A~TAA~~~ATATAGAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (lntron).......................................................
48011
~TTCATTTTG~TTACAATAA~~ATTGTATT~TTTTAAATA~~CTTTTTTT~ATTTACTCT~TATAAAGAG~AGCCGTATG~AGTTTAAAC~TCATGTACG~TTTTGAAAC~GAGTTTATT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (lntron:..... ragccg-a"gaa--gaaa--""~~"g~-~gg""y................
47891
~TAAATAAAT~AACAACCGTRACGAATGTCAGC~CAATCT~AAGGAGAAT~TGCAGAAGC~TTG~AAAAT~ATTATGAAG~TATGCGCTT~GAAATTGAT~CTTATGATC~AAGTTATAT M S A Q S E G E Y A E A L Q N Y Y E AM R L E I D P Y D R S . . . . ..*......cuayy-y-ay
47771 Y
I
ATTATATAATATAGGTCTTA~~~~TA~AAG~AATGGAGAA~ATG~TAAAG~TTTAGAATA~TATTTT~AA~~ATTAGAA~~AAAT~~ATC~TTGCCT~AA~~TTTTAATA~TATGGCTGT L Y N I G L I H T S N G E HA K A L E Y Y F Q A L E RN P S L P Q A F N N MA ~ATTTGCCAT~ATCGAGGAG~ACAAG~GAT~CAACAAGGA~ATC~AGAAG~TT~AGAAA~~TGGTTTGAT~AAG~GG~TG~GTATTGGAA~CAAGCTATT~TACTTGCTC~AAGTAATTA IC H Y R G E Q A IO Q G D P E A 5 E T W F D Q A A E Y W K Q A I L LA P S
47651 V 47531 N
Y
~ATTGAAGCACATAATTGGT~AAAAATGAC~GGACGTTTT~AATAAAAAT~TAAAATGAA~TTTAGTTAA~TTAGTTATT~GTTTTTTAT~ACTAATTCA~CTAAATTTC~TTTTAAAAT 1 E A H N W L K M T G R F === f-e ____-__-~---_--~~-_ -_--_-_ --) (- -----------_----_---_--__
_
47411
~TACATAAAA~TTATGGTCC~TTAAGCACC~TAAAATTGT~TCATAATAA~TTTGAAGAC~TG~~TAGGT~ATTT~TGTA~AATAGTTGT~CCAGTTTTA~ACTGAGAAA~AGATCTAAA -+ TTGAAG> TATAAT>
47291
~GTTATATATEGAT~TCTCAGAAGTTTACT~AITATTGTT~GTAGGTTTT~CCTATGCCT~GT~TGAAGA~AGGAGAAC~~~GATGACTA~TCGTTCACC~GAACCAGAA~TCAAGATTG psaA > AGGAG MTIRSPEPEVKIV
47171
~GGTGGAAAAEGATC~TGTAAAAAC~T~TT~TGAAAAATG~G~TAAACCT~GGCATTTTT~AAGGA~TCT~G~TAAGGGT~~TAGTA~TA~CACTTGGAT~TGGAATTTA~ATGCTGATG VEKDPVKTSFEKWAKPGHFSRTLAKGPSTTTWIWNLHADA
47051
46931 ~TCGTTTTTC~AATTATGAA~CATGGTTAA~TGATCCTAC~~ATATTAAG~~CAGTGCT~~AGTTGTTTG~CCTATAGTT~GT~AAGAAA~TTTAAATGG~GATGTTGGC~GGGGTTTTC R F S N Y E A W L S D P T H I K P S A Q V VW P I V G Q E I L N G D V G G G
46811 F
Q 46691
EAGGATGGTT~CATTATCA~~\AAGCTGCTCCAAAATTAG~~TGGTTT~AA~ATGTTGAAT~TATGCTAAA~CAT~ATTTA~~AGGT~TTT~AGGCTTAGG~TCTCTTTCT~GGGCTGGAC G W F H Y H K A A P K L A W F 0 D V E S M L N H H L A G L L G L G S L S W A
46571 G
H 46451
46331
Fig. 3, cont.
46211
46091
~TTATGATCCAACTACTCAA~ACAACAATT~GTTAGATCGCAT~TTAAT~GGGTATGCA~CTTTTTAGG~TTTCATAGT~TTGGATTAT Y D P T T Q Y N N L L 0 R V L R H R DA I I S H L NW V
45851 C
I
F
L
G
F
H
S
F
G
L
Y
~TATT~ATAA~GATA~GATGAGIGCITTTAG~ACGTCCTCACACATGCT~TAGCACCAA IHNDTMSALGRPQDMFSDTAIQLQPVFAQWIQNTHALAPN
45731
ACTTTACTGC~CCTAATGCT~TAGCAAGTACTAGTTTAACGGAA~AG~A~ATTTTTTAG FTAPNALASTSLTWGGGDVIAVGSKVALLPIPLGTADFLV
45611
45491
45371
45251
ETTCTCAAGT~\ATTCAATCT~ATGGTTCTTCCTTATCTG~~TATGGTCTT~TATTTTTAG~TG~T~A~TT~GTTTGGG~T~TTAGTTTAA~GTTCTTATT~AGTGGT~GT~GATATTGG~ S Q V I Q S V G S S L S A Y G L L F L GA H F ii i A F 5 L M F L F S G R G Y W
Q
45131
~AGAGCTTAT~GAATCCATT~;TTTGGGCTC~CAACAACAAATGTAGCTCAT~A~CTT~TAG E L I E S I VW AH N K L K V A P A I
G
45011 Q
P
R
A
L
S
I
T
Q
G
R
A
V
G
V
A
H
V
L
GTGGAATTGCEACAACATGGGCATTCTTT~~AGCAAGAAT~ATTGCAGTA~GATAATGG~~AAGGAGGAT~TGAAAAGCA~TATGG~AT~~AGATTTCCC~AATTTAG~C~GGGCCTATC G I ATT W A F F L AR I I A V G === AGGA p%dJ~ MA S R F P K F S Q G
L
44891 L
S
~CAAGACCCA~\CTACG~GTCET~~TTTGGTT~GGTATTGCG~~CGCACATG~~TTTGAAAG~~ATGATGAT~TGACTGAAG~A~GT~TTTA~~AAAAGATT~TTGCGTCAC~TTTTGGTCA QDPTTRRIWFGIATAHDFESH D D M T E E R L Y Q K I F A S H F G
Q
~TTAGCAATC;\TTTTTTTATGGACTTCTGG~AATTTATTT~A~GTTG~TT~G~AAGGTAA~TTTGAAG~A~GGGGA~AAG~CCCTTTACA~GT~AGA~CA~TTGCTCATG~AATTTGGGA LAIIFLWTSGNLFHVAWQGNFEA %! G Q D P L H V R P I AH A I
W
D
~CCGCATTTTGGTCAACCAGETGTTGAAGC~TTTACTCGA~GAGGAGCTT~TGGACCAGT~AATATAG~A~ATT~TGGTG~ATAC~AATG~TGGTATACA~TTGGTTTACGAACTAATCA PHFGQPAVEAFTRGGASGPVNIAYSGV V Q w W Y T I G L R T
N
Q
44771
44651
44531
44411
44291
~ACAAAATTA~CACAT~CGG~~GGACTAGG~CCATTTTTT~~AGGACAAT~GAATATTTA~G~T~AAAAT~TCGATTCAA~TAATCATG~~TTTGGAACA~~T~AAGGGG~TGGAACAG~ T K L P H P E G L G P F F A G Q W N I Y A Q El V D S 8 F ! HAFGTSQGAGTA
44171
~ATCTTAACT~TTATTGGTGGATTTCATCCACAAACACAA~GTTTATGGC~TA~TGATAT~G~TCACCAT~ATTTAG~TA~TGCAGTTGT~TTTATTATA~~TGGTCATA~GTATAGAA~ ILTFIGGFHPQTQSLWLTDIAHHHLAIAVVFIIAGHMYRT
44051
~AATTTTGGA;\TTGGTCATAGTATCAAAGAEIATTCTTGAA~~~CATACTC~T~~AGGAGG~CGTTTAGGT~GAGGA~ATA~AGGTCTTTA~GATACTATT~A~AATT~T~~T~ATTTTCA N FG IGHS I KE I LETHTPPGGRLGRGH KGL YDT I NFISLH.FQ
43931
43811
iCATCATCAAiATATTGCTG;;TTTTATTATi;ACAGGTGCi~TTGCTCATG~AGCTATTTT~TTTATTAGA~ATTATAATC~GGAACAAAA~AAAGATAAT~TATTAGCTA~AATGTTAGA F F I R D Y N P E 0 N K D N V L A R HHQYIAGFIMTGAFAHGAI
43691 M
L
E
ACATAAAGAA~;CTATAATAT~CCATTTAAG~TGGGCTAGT~TATTT~TAG~ATTTCATA~~TTAGGT~TT~ACGTT~ATA~TGATGTTAT~~TTG~TTTT~GTA~T~~TG~AAAACAAAT H K E A I I S H L SW A s L F L G F H T L G L Y V H PJ D V M LA F G T P E K
Q
I
~TTAATTGAAECTATTTTTGETCAATGGAT~\CAATCTGCT~ATGGTAAAG~TTTATATGG~TTTGATGTA~TTTTATCAT~AA~AAATAA~C~AGCATTT~ATG~TGGT~~AAG~ATATG N N PA F NAG Q S LIEPIFAQWIQSAHGKALYGFDVLLSST
I
W
42571
43451
ETTACCTGGT~GGTTAGATG~TATAAATAA~AATAGTAAT~CACTTTT~T~AA~AATTGG~~~TGGAGAC~TTTTAGTAC~T~A~G~TAT~G~TTTAGGT~TA~ATA~TA~TACATTAAT IALGLHTTTLI LPGWLDAINNNSNSLFCTIGPGDFLVHHA
43331
~TTAGTGAAAGGTGCTTTAG~TGCACGAGGATCTAAATTA~TG~~AGATA~AAAAGAATT~GGTTATAGT~TTC~TTGTG~TGGT~CTGG~~GAGGTGGT~CT~GTGATA~TTCTGCTTG LVKGALDARGSKLMPDKKEFGYSFPCDGPGRGGTCDISAW
43211
43091
~ACATATTTA~TGGGCTGGT~AAGAGATTA~TTATGGTTA~ATT~TTCAC~ATTGATTAA~GGATATAAT~~TTTTGGTA~GAATAGTCT~T~TGTTTGG~CATGGATGT~TTTATTTGG T Y L M G W L R D Y L W L N S S Q L I N G Y N P F G M N S L 5 VW A W M F L F ~CATTTAGTT~GGGCTACTGGATTTATGTT~CTGATATCG~GG~GTGGAT~TTGG~AAGA~~TTATTGAA~CTTTAGCTT~GG~T~A~GA~~GTA~T~CT~TAG~GAATT~AGTTCGCTG HLVWATGFMFLISWRGYWQELIETLAWAHERTPLANLVRW
Fig. 3, cont.
42971 G 42851
GAAAGATAAA~CAGTAGCTC~TTCTATTGT~CAAGCAAGA~TAGTTGGAT~AG~T~ATTT~TCTGTAGGT~ATATATTTA~TTATG~TG~~TTTTTAATT~~TT~TA~AT~TGGTAAATT KDKPVALSIVQARLVGLAHFSVGYIFTYAAFLIASTSGKF
42731
~GGTTAAATT~TATTTGTAT;AAATTT~~A~TTTTACTCA~AATATTTTT~AATAAAAAT~TATTGAAAA~GAATATCTT~~TAATTAAA~AATTATGG~~AAAAAGAGT~TTATT~AAA G === QXSl4~ MAKKSLIQR TTCCAA, TTTAAT, +----------) <---------+
42611
EAGAAAAAAA~AGACAAAAT~T~G~~~~~~A~~~~~~~AT~TTA~GTAAT~~TTTAAAAA~AAAAATTAC~GAAACCT~A~~ATTAGATG~AAAATGGGA~TTT~AAAAA~AATTA~AAT E K K R Q N L E K KY K I L RN S L K K K I T E T S S L D E K Id E F Q K
42491 K
L
0
S
ETTTACCACG~AATAGTGC~ECG~CTCGT~~TCATCGTCG~TGTTTTTTG~CTGGAAGA~~TAAAGCAAA~TATCGCGAT~TTGGTTTAT~TAGACATTT~~TT~GTGAA~TGGCT~ATG LPRNSAPTRLHRRCFLTGRPKANYRDFGLSRHLLREMAHA EATGTTTATTGCCTGGAGTAACCAAAT~TAGTTGGTAAAC~TTTTGGGTT~CTTTCCCG~~TCTTATAAG~GGCGGGGTT~TT~AAAAAA~ATCTTGTTT~ATT~CATTT~TATAGAGTA +---------><---------+ TTGTTT, C L L P G V T K S S W === ~ACTTTTTTC~TTAATTATA~CG~GGAGTAGAG~AGT~TG~TAGCTCGCA~GG~TCATAA~~TTGAGGT~~TAGGTT~AA~T~~TGT~T~~G~~AAAAAA~AATATTTTT~~TTATTTT~ TACT> f&t-CAU> 5'-CGCGGAGUAGAGCAGUCUGGUAGCUCGCAAGGCUCAUAACCUUGAGGUCAUAGGUUCAAAUCCUGUCUCCGCCA-3'
+--------->
~TTTTTTAAGETAATTATTT~~ATTGGCGGETAGCGGGAA~~GAA~~~G~~TATT~T~~T~GGCAAGGAG~AATTTTA~~~TTAAACTAT~CCCGCGTAT~TAAATTATA~TTATATTAT 3'-CCGCCCAUCGCCCUUAGCUUGGGCGUAUAAGAGGAACCGUUCCUCCUUAAAAUGGUAAUUUGAUAUGGGCG-S'
42371
42251 TA 42131
<------42011
Figure 3. Nucleotide and deduced sequences of a region from atpB to trnG(GCC) (56,410 to 42,011). As in Fig. 2, but the minus strand DNA sequence is shown. Genes trnM(CAU), trnF(GAA), trnL(UAA), trnS(GGA) and trnG(GCC) are located in the opposite direction.
structure of the latter is interrupted by a group II intron of 593 nucleotides at the 3' side of the D-stem (A23-C24; Table 5 of Ohyama et al., 1988), as seen in tobacco (Deno & Sugiura, 1983) and wheat (Quigley & Weil, 1985). Identification of the trnfM(CAU) gene (42,229 to 42,156), which encodes an initiator rather than an elongator tRNA, has been described (Umesono et al., 1984). We found two more tRNA genes containing CAT anticodon sequences, one for methionine and the other for isoleucine. By comparison with known chloroplast tRNA gene sequences, we identified the tRNA gene (53,801 to 53,874) in the middle of the LSC region as a methionine acceptor, trnM(CAU), and the other, near the distal end of the LSC region, as an isoleucine acceptor, trnI(CAU) (Fukuzawa et al., 1988). Two threonine tRNA genes, trnT(GGU) (38,367 to 38,438) and trnT(UGU) (50,333 to 50,261), are 11.8 kb apart on opposite DNA strands (Fig. 1). They do not contain any mismatched base-pairings or introns. One of three chloroplast arginine tRNA genes, trnR(UCU) (21,321 to 21,250), is downstream from trnG(UCC) with a spacer of 63 bp (Fig. 1). The other two arginine tRNA genes, trnR(CCG) (Fukuzawa et al., 1988) and trnR(ACG) (Kohchi et al., 1988), contain mismatched base-pairings in the aminoacyl stems and are highly conserved in their primary structures (80% identical). However, the trnR(UCU) gene predicts a normal aminoacyl stem of 7 bp and is divergent from the isoacceptor sequences (45 to 51% identical). Three tRNA genes, trnE(UUC) (36,787 to 36,715) for glutamic acid, trnY(GUA) (36,643 to 36,562) for tyrosine, and trnD(GUC) (36,484 to 36,411) for aspartic acid, are tandemly oriented with respective spacers of 71 bp and 77 bp in this order. No other tRNA genes encoding glutamic acid, tyrosine or aspartic acid were identified in the liverwort chloroplast genome. Barley chloroplast tRNAGlu(UUC) molecules are identical with RNADALA, which is an essential component for chlorophyll biosynthesis, as reported by Schön et al. (1986), who pointed out the presence of an A·U pairing instead of a conserved G·C pairing at the distal end of the TΨ stem of the barley tRNAGlu(UUC). This A·U pairing would be conserved in the liverwort tRNAGlu(UUC), because the cognate trnE(UUC) gene contains A53 and T61
Figure 4. Possible secondary structure of the trnL(UAA) intron. A secondary structure was constructed according to the proposal of Michel & Dujon (1983). Open boxes 9R', A, B, 9L, 9R and 2 indicate conserved sequence elements found in group I introns (Cech & Bass, 1986). Boxes 9L and 2 are complementary. Putative splicing points are marked with arrows.
(Table 5 of Ohyama et al., 1988). Products of trnD(GUC) and trnY(GUA) might contain mispairings in the anticodon stem; C31·A41 in trnD(GUC) and A28·C44 in trnY(GUA) were observed (Table 5 of Ohyama et al., 1988). One of two chloroplast valine tRNA genes, trnV(UAC) (53,652 to 53,051), and a lysine tRNA gene, trnK(UUU) (26,040 to 28,222), are split by long introns at the anticodon loop-anticodon stem junction: that is, A37-C38 of tRNALys(UUU) and C37-C38 of tRNAVal(UAC) (Tables 2 and 5 of Ohyama et al., 1988). The introns found in trnV(UAC) and trnK(UUU) are 530 and 2111 nucleotides long, respectively. The split form of trnV(UAC) seems to be conserved among chloroplast genomes in higher plants (Deno et al., 1982; Zurawski & Clegg, 1984; Krebbers et al., 1984), but not in Euglena gracilis (Hallick et al., 1984). Furthermore, the trnK(UUU) intron could encode a polypeptide of 370 amino acid residues (ORF370i) with a sequence 34.4% identical with that reported in the tobacco trnK(UUU) intron (509 amino acid residues long; Sugita et al., 1985). We have not detected any other copy of the tRNALys gene in the liverwort chloroplast genome. The spliced cloverleaf structure of tRNALys(UUU) contains a mispairing of C26·C42. The 5' end of a histidine tRNA gene, trnH(GUG) (29,595 to 29,669), was tentatively assigned according to the maize sequence (Schwarz et al., 1981). The deduced tRNAHis(GUG) would contain a mispairing in the anticodon stem, because U30 and U44 are predicted from the DNA sequence (Tables 2 and 5 of Ohyama et al., 1988). A phenylalanine tRNA gene, trnF(GAA) (50,998 to 51,070), is tightly linked to the trnL(UAA) gene; they are oriented as 5'-trnL(UAA)-76 bp-trnF(GAA)-3'. The genes trnH(GUG), trnF(GAA), trnC(GCA) (the gene for a cysteine tRNA, 5720 to 5650) and trnQ(UUG) (the gene for a glutamine tRNA, 23,804 to 23,875) are each present as a single copy among the chloroplast-encoded tRNA genes.
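The coordinate arithmetic behind the spacer and intron lengths quoted in this section can be sketched as follows. This is an illustrative check only: the helper names are hypothetical, and it assumes that the quoted coordinates are inclusive and that a start larger than the end simply marks the opposite strand.

```python
# Coordinates are (start, end) as quoted in the text. Helper names are
# illustrative assumptions, not part of the original analysis.

def span_length(start, end):
    """Number of nucleotides covered by inclusive coordinates on either strand."""
    return abs(end - start) + 1


def spacer(gene_a, gene_b):
    """Nucleotides lying strictly between two non-overlapping genes."""
    first, second = sorted([sorted(gene_a), sorted(gene_b)])
    return second[0] - first[1] - 1


# Tandem cluster trnE(UUC) - trnY(GUA) - trnD(GUC)
trnE = (36787, 36715)
trnY = (36643, 36562)
trnD = (36484, 36411)
spacers = [spacer(trnE, trnY), spacer(trnY, trnD)]

# Mature length of a split tRNA gene = genomic span minus intron length.
trnV_mature = span_length(53652, 53051) - 530
trnK_mature = span_length(26040, 28222) - 2111

# Gap between trnT(GGU) and trnT(UGU), in base-pairs.
trnT_gap = spacer((38367, 38438), (50333, 50261))

print(spacers, trnV_mature, trnK_mature, trnT_gap)
```

Under these assumptions the two spacers come out at 71 and 77 bp, both split genes leave 72 nucleotides of mature tRNA, and the two threonine genes lie about 11.8 kb apart, in agreement with the values given in the text.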
(b) Ribosomal protein genes
Five ribosomal protein genes were found in this region: rps2, rps4, rps7, rps'12 and rps14, encoding the counterparts of Escherichia coli S2, S4, S7, S12 and S14 ribosomal proteins, respectively (Fig. 1). The rps2 gene (16,055 to 16,762) encodes 235 amino acid residues, and the predicted polypeptide is 44.3% identical with the E. coli counterpart (241 amino acid residues; An et al., 1981) and 72.3% identical with that of pea chloroplasts (236 amino acid residues; Cozens & Walker, 1986; and see Fig. 5(a)). A putative protein predicted by the rps4 gene (50,033 to 49,425), consisting of 202 amino acid residues, shares 40.1% and 69.8% homology with the proteins of E. coli (205 amino acid residues; Bedwell et al., 1985) and maize chloroplasts (201 amino acid residues; Subramanian et al., 1983; and see Fig. 5(b)), respectively.
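Percentage identities such as those quoted above depend on how gapped positions are counted, a convention that is not stated here. The following minimal sketch assumes that columns containing a gap in either sequence are excluded from the denominator; the function name and the two short aligned sequences are hypothetical and serve only to illustrate the calculation.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over ungapped columns of a pairwise alignment.

    Assumption: columns with a gap in either sequence are skipped, so the
    denominator is the number of residue-to-residue comparisons.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical toy alignment, not taken from Fig. 5.
toy_a = "MSRYRGP-RVK"
toy_b = "MARYLGPKLK-"
toy_identity = percent_identity(toy_a, toy_b)
print(round(toy_identity, 1))
```

Choosing a different denominator (for example, the length of the shorter protein) would shift the figure by a few per cent, which is worth keeping in mind when comparing values from different reports.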
The rps7 (892 to 1359) and rps'12 (1 to 842) genes were first located by a heterologous Southern hybridization experiment using a cloned Eu. gracilis chloroplast DNA fragment as a probe (data not shown). Sequence analysis confirmed the presence of these two genes, but failed to detect an EF-Tu gene, which is tightly linked to the rps12 and rps7 genes in Eu. gracilis (Montandon & Stutz, 1983, 1984). A polypeptide predicted from the rps7 gene, starting 50 bp downstream from rps'12, consists of 155 amino acid residues and is 43.8% and 44.5% identical with those of E. coli (154 amino acid residues; Reinbolt et al., 1978) and Eu. gracilis (156 amino acid residues; Montandon & Stutz, 1984), respectively (Fig. 5(c)). The first exon (rps12') is found in the DNA strand opposite the second exon (rps'12), suggesting the possibility of trans-splicing in vivo (Fukuzawa et al., 1986). The second and third exons, encoded by the rps'12 gene, are also separated by a group II intron 500 nucleotides long. A similar organization of the rps12 gene has been suggested for tobacco chloroplasts (Fromm et al., 1986; Torazawa et al., 1986). The rps14 gene (42,635 to 42,333) has been identified (Umesono et al., 1984); it encodes 100 amino acid residues and is 45.0% and 71.0% homologous to the counterparts in E. coli (Cerretti et al., 1983) and spinach (Kirsch et al., 1986), respectively (Fig. 5(d)).
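The residue counts quoted for the unsplit ribosomal protein genes can be cross-checked against their coordinates. The sketch below assumes that each quoted span is inclusive and contains the termination codon; the function and dictionary names are illustrative. The split rps'12 gene is left out because its exons are not contiguous.

```python
def residues_from_span(start, end):
    """Amino acid residues encoded by an unsplit gene given inclusive coordinates.

    Assumption: the span includes the termination codon, which is not counted
    as a residue.
    """
    length = abs(end - start) + 1
    if length % 3 != 0:
        raise ValueError("span is not a whole number of codons")
    return length // 3 - 1


# (start, end, residues stated in the text)
ribosomal_genes = {
    "rps2": (16055, 16762, 235),
    "rps4": (50033, 49425, 202),
    "rps7": (892, 1359, 155),
    "rps14": (42635, 42333, 100),
}

computed = {name: residues_from_span(s, e) for name, (s, e, _) in ribosomal_genes.items()}
for name, (_, _, stated) in ribosomal_genes.items():
    print(name, computed[name], stated)
```

For all four genes the computed and stated residue counts agree, which supports reading the coordinates as inclusive of the stop codon.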
(c) RNA polymerase genes
Three large tandemly oriented open reading frames (ORFs) have amino acid sequence homologies to either the β or β′ subunits of E. coli RNA polymerase. We designated these chloroplast ORFs as rpoB, rpoC1 and rpoC2. The chloroplast rpoB gene (5859 to 9056) encodes 1065 amino acid residues. The amino acid sequence is 43.6% identical with that of the E. coli β subunit (1342 amino acid residues; Ovchinnikov et al., 1981), if large gaps are introduced in the chloroplast rpoB gene product. The amino acid sequence predicted from the liverwort rpoB gene is 68.5% identical with that of tobacco (1070 amino acid residues; Ohme et al., 1986; and see Fig. 6(a)). The chloroplast counterpart of the E. coli β′ subunit may be encoded by two separate genes, chloroplast rpoC1 and rpoC2, in the liverwort genome. The rpoC1 gene (9087 to 11,737) is located 31 bp downstream from rpoB and consists of 684 sense codons with a group II intron (596 nucleotides long) between the 124th and 125th codons (Fig. 2). The amino acid sequence predicted from the rpoC1 gene shares 41.7% homology with the N-terminal 580 amino acid sequence of the E. coli β′ subunit (1407 amino acid residues; Ovchinnikov et al., 1982; and see Fig. 6(b)). The C-terminal portion (amino acid residues 541 to 684) of the rpoC1 protein is quite different in sequence and size from the corresponding region (512 to 580) of the E. coli β′ subunit.
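The description of rpoC1 (684 sense codons, with a 596-nucleotide intron between codons 124 and 125) is internally consistent with the quoted span, as the following sketch shows. The exon and intron boundaries it derives are not stated in the text; they follow only from the assumptions that position 9087 is the first base of codon 1, that the intron begins immediately after codon 124, and that the span ends with the termination codon. The function name is hypothetical.

```python
def split_gene_layout(start, end, codons_before_intron, intron_length):
    """Derive exon/intron boundaries for a plus-strand gene with one intron.

    Assumptions: `start` is the first base of codon 1, the intron follows
    directly after `codons_before_intron` codons, and `end` is the last base
    of the termination codon.
    """
    exon1_end = start + 3 * codons_before_intron - 1
    intron_start = exon1_end + 1
    intron_end = intron_start + intron_length - 1
    exon2_start = intron_end + 1
    exon2_length = end - exon2_start + 1
    if exon2_length % 3 != 0:
        raise ValueError("second exon is not a whole number of codons")
    sense_codons = codons_before_intron + exon2_length // 3 - 1
    return {
        "exon1": (start, exon1_end),
        "intron": (intron_start, intron_end),
        "exon2": (exon2_start, end),
        "sense_codons": sense_codons,
    }


rpoC1 = split_gene_layout(9087, 11737, codons_before_intron=124, intron_length=596)
rpoB_residues = (9056 - 5859 + 1) // 3 - 1  # unsplit gene, stop codon excluded
print(rpoC1, rpoB_residues)
```

Under these assumptions the layout yields 684 sense codons for rpoC1 and 1065 residues for rpoB, matching the figures in the text; the derived boundaries themselves should be treated as provisional until checked against Fig. 2.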
[Figure 5: alignment panels. Number of amino acid residues and percent identity with the liverwort sequence:]
(a) rps2: liverwort 235; pea 236 (72.3%); E. coli 241 (44.3%)
(b) rps4: liverwort 202; maize 201 (69.8%); E. coli 205 (40.1%)
(c) rps7: liverwort 155; Eu. gracilis 156 (44.5%); E. coli 154 (43.8%)
(d) rps14: liverwort 100; spinach 100 (71.0%); E. coli 99 (45.0%)
Figure 5. Amino acid sequence comparison of ribosomal proteins. Amino acid sequences are translated from the respective DNA sequences except for E. coli S7 protein. Identical residues are marked with colons. The bar indicates artificial shifting to maximize homology. (a) E. coli S2 protein homologue. Pea (Cozens & Walker, 1986), and E. coli (An et al., 1981). (b) E. coli S4 homologue. Maize (Subramanian et al., 1983), and E. coli (Bedwell et al., 1985). (c) E. coli S7 homologue. Eu. gracilis (Montandon & Stutz, 1984), and E. coli (Reinbolt et al., 1978). (d) E. coli S14 homologue. Spinach (Kirsch et al., 1986) and E. coli (Cerretti et al., 1983).
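The colon convention used in the figure lends itself to a simple identity count. The sketch below shows one way to turn a colon-marked comparison row into a percent identity. The choice of denominator (the number of residues in the reference row) is an assumption, since the text does not state how the percentages were normalized, and the two rows are hypothetical toy data, not sequences from Figure 5.

```python
def percent_identity(reference: str, comparison: str) -> float:
    """Percent identity from a colon-marked comparison row.

    ':' in the comparison row means "same residue as the reference";
    '-' in either row is an alignment gap. The denominator used here is
    the number of reference residues, which is an assumption.
    """
    if len(reference) != len(comparison):
        raise ValueError("aligned rows must have equal length")
    identical = sum(
        1 for ref, cmp_ in zip(reference, comparison)
        if ref != "-" and cmp_ == ":"
    )
    reference_residues = sum(1 for ref in reference if ref != "-")
    return 100.0 * identical / reference_residues


# Hypothetical toy rows, not taken from Figure 5.
ref_row = "MSRK-SIAEK"
cmp_row = ":P:RR:::Q:"
print(round(percent_identity(ref_row, cmp_row), 1))
```

A different denominator, such as the full alignment length including gaps, would give slightly lower values for alignments that contain long insertions.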
The rpoC1 gene is followed by the rpoC2 gene (11,811 to 15,971) encoding 1386 amino acids. The initiation codon of rpoC2 was tentatively assigned to a position 74 bp downstream from rpoC1, because of the presence of an SD-like sequence upstream from the rpoC2 coding sequence. The product of the rpoC2 gene corresponds to the C-terminal half (581 to 1407) of the E. coli β' subunit, if large gaps are introduced in the E. coli sequence. According to the alignment in Figure 6(c), the N-terminal 349 and C-terminal 283 amino acid residues are 40.7% and 37.8% homologous to the corresponding portions (581 to 948 and 1103 to 1407) of the E. coli β' protein. The discrepancy is caused by the additional residues in the rpoC2 protein (amino acid residues 350 to 1103; filled arrows), a stretch much longer than the corresponding E. coli sequence (949 to 1102). The possibility cannot be ruled out that the central portion of the rpoC2 gene, as well as the region between the rpoC1 and rpoC2 genes, represents introns, although we have not detected the characteristic structures of either group I or group II introns. A rpoC2-like gene can be seen in pea chloroplast DNA, although its 5' part was not described (Cozens & Walker, 1986). This partial
ORF encoding 1163 amino acids is comparable in size to the corresponding region of the liverwort rpoC2 (amino acid residues 205 to 1386). Amino acid sequences are less homologous (32.9%) in the central regions (amino acid residues 476 to 1080 of liverwort rpoC2 and 276 to 876 of pea ORF) than in portions of N termini (65.3%, 205 to 475 of liverwort and 1 to 275 of pea; open arrows) and C termini (63.8%, 1081 to 1320 of liverwort and 877 to 1123 of pea).
(d) ATPase subunit genes
Six genes (atpA, atpB, atpE, atpF, atpH and atpI) for the chloroplast H+-ATPase, which consists of nine non-identical subunits, were identified by sequence comparison with genes in higher plant chloroplast genomes. In addition to the split structure of atpF (587 bp group II intron), the organization of the atp genes is also conserved in the liverwort chloroplast genome. These genes form two clusters: 5'-atpI-377 bp-atpH-208 bp-atpF-44 bp-atpA-3' and 5'-atpB-5 bp-atpE-3' (Fig. 1). An F0-IV subunit (248 amino acid residues) encoded by the atpI gene (16,890 to 17,636) is
[Figure 6: alignment panels. Number of amino acid residues and percent identity with the liverwort sequence:]
(a) rpoB (RNA polymerase β subunit): liverwort 1065; tobacco 1070 (68.5%); E. coli 1342 (43.6%)
(b) rpoC1 (RNA polymerase β' subunit): liverwort 684; E. coli rpoC, residues 1 to 580 (41.7%)
(c) rpoC2 (RNA polymerase β' subunit, continued): liverwort 1386; E. coli rpoC, residues 581 to 1407 (20.9%); pea, partial, more than 1163 (45.4%)
Figure 6. Amino acid sequences of RNA polymerase subunits deduced from the DNA sequences. (a) E. coli β subunit homologue. Tobacco (Ohme et al., 1986), and E. coli (Ovchinnikov et al., 1981). (b) and (c) E. coli β' subunit homologue. The liverwort rpoC1 sequence was obtained after removal of a putative intron between D124 and L125 (arrow). In (b), together with the rpoC1 product, only the N-terminal 580 amino acid residues of the E. coli β' subunit (Ovchinnikov et al., 1982) are shown. The following C-terminal portion (amino acid residues 581 to 1407) is compared with liverwort, and pea (partial) rpoC2 products (Cozens & Walker, 1986) in (c).
82.3% and 80.2% identical with the subunits of pea (247 amino acid residues; Cozens et al., 1986) and spinach (247 amino acid residues; Henning & Herrmann, 1986), respectively. They diverge from one another mainly in the N-terminal 16 amino acid residues (Fig. 7(a)). A relatively small F0-III subunit protein (81 amino acid residues) is the atpH (18,014 to 18,259) product. This protein seems to be highly conserved (97.5%) among plant species, because the liverwort protein differs in only two amino acids from both the wheat (Howe et al., 1982) and spinach (Alt et al., 1983) sequences (alanine to serine at the 6th residue and valine to isoleucine at the 26th); both changes can be explained by single-base substitutions (Ala to Ser, GCT to TCT; Val to Ile, GTT to ATT; Fig. 7(b)). A split gene, atpF (18,468 to 19,609), encodes an F0-I subunit (184 amino acid residues), to judge from the amino acid sequence similarities to the sequences of spinach (50.8%; Henning & Herrmann, 1986) and wheat (49.2%; Bird et al., 1985; and see Fig. 7(c)). The liverwort atpF gene diverges in the length of its intron: it consists of 587 nucleotides, whereas the spinach and wheat introns are 764 bp and 823 bp long, respectively (Bird et al., 1985; Henning & Herrmann, 1986). The nucleotide-binding subunit α of F1-ATPase (507 amino acid residues) is encoded by atpA (19,654 to 21,177). The amino acid sequence of the liverwort α subunit is highly homologous to the sequences of tobacco (87.6%; Deno et al., 1983) and wheat (82.8%; Howe et al., 1985). However, the N and C-terminal portions of the three proteins are less similar to the protein of E. coli (Fig. 7(d)). The atp gene cluster, atpBE, is about 30 kb from the atpIHFA cluster and is transcribed in the opposite direction. The nucleotide-binding subunit β coded for by the atpB gene (55,846 to 54,368; 492 amino acid residues) is 89.0%, 87.2% and 87.0%
[Figure 7: alignment panels. Number of amino acid residues and percent identity with the liverwort sequence:]
(a) atpI (H+-ATPase F0 subunit IV): liverwort 248; spinach 247 (80.2%); pea 247 (82.3%)
(b) atpH (H+-ATPase F0 subunit III): liverwort 81; spinach 81 (97.5%); wheat 81 (97.5%)
(c) atpF (H+-ATPase F0 subunit I): liverwort 184; spinach 184 (50.8%); wheat 183 (49.2%)
(d) atpA (H+-ATPase F1 subunit α): liverwort 507; tobacco 507 (87.6%); wheat 504 (82.8%); E. coli 513 (55.0%)
(e) atpB (H+-ATPase F1 subunit β): liverwort 492; spinach 498 (89.0%); maize 498 (87.2%); pea 491 (87.0%); E. coli 460 (63.13%)
(f) atpE (H+-ATPase F1 subunit ε): liverwort 135; spinach 134 (63.0%); maize 137 (45.9%); pea 137 (60.7%); E. coli 133 (23.0%)
Figure 7. Amino acid sequence homologies in each subunit of H+-ATPase. (a) ATPase F0 subunit IV or a. Spinach (Henning & Herrmann, 1986) and pea (Cozens et al., 1986). (b) ATPase F0 subunit III. Spinach (Alt et al., 1983) and wheat (Howe et al., 1982). (c) ATPase F0 subunit I. An intron is located at the 49th leucine codon (an arrow). Spinach (Henning & Herrmann, 1986) and wheat (Bird et al., 1985). (d) ATPase F1 subunit α. Tobacco (Deno et al., 1983), wheat (Howe et al., 1985), and E. coli (Kanazawa et al., 1981). (e) ATPase F1 subunit β. (f) ATPase F1 subunit ε. Subunits β and ε of spinach (Zurawski et al., 1982a), maize (Krebbers et al., 1982), E. coli (Saraste et al., 1981; Kanazawa et al., 1982), and pea (Zurawski et al., 1986).
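The atpH comparison described in the text (two differing residues out of 81, each attributed to a single-base change) can be checked with a few lines of Python. This is a sketch only: the codon table holds just the four codons named in the text, with their standard genetic code assignments, and the helper names are illustrative.

```python
# Minimal codon table covering only the four codons named in the text
# (standard genetic code assignments).
CODON_TO_RESIDUE = {"GCT": "Ala", "TCT": "Ser", "GTT": "Val", "ATT": "Ile"}


def base_differences(codon_a: str, codon_b: str) -> int:
    """Count the positions at which two codons differ."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must be three nucleotides long")
    return sum(1 for a, b in zip(codon_a, codon_b) if a != b)


def identity_percent(length: int, differing: int) -> float:
    """Percent identity for two equal-length, ungapped sequences."""
    return 100.0 * (length - differing) / length


for codon_a, codon_b in [("GCT", "TCT"), ("GTT", "ATT")]:
    print(
        CODON_TO_RESIDUE[codon_a], CODON_TO_RESIDUE[codon_b],
        base_differences(codon_a, codon_b),
    )

print(round(identity_percent(81, 2), 1))
```

Each codon pair differs at a single position, the first, and two differences in 81 residues correspond to the 97.5% identity quoted for atpH.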
identical with the β subunits of spinach (Zurawski et al., 1982a), maize (Krebbers et al., 1982), and pea (Zurawski et al., 1986), respectively. About 60 amino acid residues at the N terminus are rather divergent among these plant species (Fig. 7(e)). In contrast to the conserved β subunits, the liverwort ε subunit (135 amino acid residues), the product of the atpE gene (54,362 to 53,955), is less homologous to those of spinach (63.0%; Zurawski et al., 1982a), pea (60.7%; Zurawski et al., 1986) and maize (45.9%; Krebbers et al., 1982; and see Fig. 7(f)). Although the initiation codon of atpE overlaps the termination codon of atpB in most of the higher plant chloroplast genomes, the atpB and atpE genes in liverwort are separated by a spacer of five nucleotides.
(e) Genes for photosystem I polypeptides
There are two tightly linked genes encoding photosystem I P700 chlorophyll a apoproteins (psaA and psaB) that are partly homologous (Fig. 1). These genes were identified by comparison with genes in maize (Fish et al., 1985) and spinach (Kirsch et al., 1986). The psaA (47,207 to 44,955) and psaB (44,928 to 42,724) genes code for polypeptides 750 and 734 amino acid residues long, respectively. The liverwort psaA protein is highly homologous to the spinach (93.2%) and maize (91.2%) products (Fig. 8(a)). Likewise, the psaB protein is conserved (spinach, 92.3%; maize, 91.2%; see Fig. 8(b)). Spacers between the psaA and psaB genes are relatively short (maize and spinach, 25 bp; liverwort, 26 bp), and their sequences are nearly identical; 5’-TGGCTAAGGAGGATTTGAAAtGCATT-3’. There is a transition of A to G in maize (G in the preceding nucleotide sequence) and an insertion of A in liverwort. This spacer may represent a signal for translational control of the downstream psaB gene.
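The spacer lengths quoted for adjacent genes follow directly from the gene coordinates. The sketch below assumes inclusive coordinates, with pairs written high-to-low lying on the complementary strand. The function name `spacer_length` is illustrative, and the sequence string is the spacer as it appears in the text above.

```python
def spacer_length(upstream_gene, downstream_gene):
    """Nucleotides strictly between two adjacent genes (negative = overlap).

    Each gene is an inclusive (start, end) pair as quoted in the text;
    pairs written high-to-low are on the complementary strand.
    """
    up_lo, up_hi = sorted(upstream_gene)
    down_lo, down_hi = sorted(downstream_gene)
    if up_lo <= down_lo:
        return down_lo - up_hi - 1
    return up_lo - down_hi - 1


psaA = (47207, 44955)
psaB = (44928, 42724)
atpB = (55846, 54368)
atpE = (54362, 53955)

print(spacer_length(psaA, psaB))  # psaA-psaB spacer
print(spacer_length(atpB, atpE))  # atpB-atpE spacer

# Length of the psaA-psaB spacer sequence as printed in the text.
print(len("TGGCTAAGGAGGATTTGAAAtGCATT".upper()))
```

Under these assumptions the psaA-psaB spacer comes to 26 nucleotides, the same as the length of the printed sequence, and the atpB-atpE spacer comes to five nucleotides, as reported earlier in this section.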
(f) Genes for photosystem II polypeptides
Four (psbA, psbC, psbD and psbG) of the eight photosystem II polypeptide genes so far identified in higher plant chloroplast genomes are in the region. Two of them (5’-psbD-psbC-3’) are tandemly oriented, and no intron-like sequence was found in these genes (Fig. 1). A 32,000 Mr protein predicted from the psbA gene (28,368 to 29,429) consists of 353 amino acid residues and is highly conserved relative to the protein of higher plants: it is 96.9% identical with that of spinach (Zurawski et al., 1982b) and 96.6% with that of soybean (Spielmann & Stutz, 1983). The higher plant psbA genes do not contain any codons for lysine, but in liverwort there is a lysine codon at the 238th residue instead of the arginine observed in higher plants (Fig. 9(a)). Another protein, called D2, which is structurally related to the 32,000 Mr protein (Rochaix et al., 1984), is encoded by the psbD gene (38,855 to 39,916). The molecular size of the liverwort psbD protein (353 amino acid residues) is nearly identical with that of psbA, and the primary structure is highly conserved with the counterparts in spinach (96.3%; Alt et al., 1984; Holschuh et al., 1984) and pea (96.3%; Rasmussen et al., 1984; and see Fig. 9(b)). One of the photosystem II chlorophyll a binding proteins is coded for by the psbC gene (39,864 to 41,285), and the other is encoded by psbB (Fukuzawa et al., 1988), located about 28 kb apart in the liverwort chloroplast genome. We first identified an ORF (previously designated as ORF701) from the observation that the ORF701 product affected the antibiotic sensitivity of the host E. coli cells (Umesono et al., 1984). Sequence comparison between ORF701 and the spinach psbC confirmed the identity. The amino acid sequence deduced from ORF701 shares 94.7% homology with spinach psbC (Alt et al., 1984; Holschuh et al., 1984), if the initiation codon of the psbC gene is taken at the second ATG of ORF701. The liverwort psbC protein contains 473 amino acid residues (Fig. 9(c)), and the 5’-terminal portion of psbC overlaps the 3’ terminus of psbD by 53 bp (Fig. 2). This over-
[Figure 8(a): psaA (photosystem I P700 chlorophyll a apoprotein). Alignment of the liverwort, spinach and maize sequences.]
.. .. . . . . .. . . . . . .. .. . . :::::::::::::::::::R::D:::::::::::::::::::::::::::::::::::::::::::::::::A::::::I:::::::I::G::GV:::G: ~AST~LTWGG~DVIAVG~KV~LLPIPLGTA~FLVHHIHAF~IHVTVLILL~GVLFAR~~R~I~DKANLGF~FP~DGPGRG~TC~~~AWDH~FLGLF~~~~ T:::::::::S:LV:::G............,..................................................................... . . . .. . . . . . . . . .. . . . . . .. . . . . .. . . . . . .. . . . . . . . . .. . . . . .. . . . . . .. . . . . .. . . . . . . . .. . . . . . . . .. TT:::::::::ELV:I:G.............................................................::::::::::::::::::::: . . .. . . . . . . . .. . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . .. .. . . . ~ISVVIFHFS;KMPSDVWGTiSEQGVVTHIiGGNFADSAI~INGWLRDFL~AQASDVIQS~GSSLSAYGL~FLGAHFVWA~SLMFLFSGR~YWQELIESI :::::::::::::::::::S::D:::::::::::::::S:::::::::::::::::::::::::::::::F::::::::::::::::::::::::::::: ::::::::::::::::::::::O::I::::::::::::s:::::::::::::::::::::::::::::::F::::::::::::::::::::::::::::: ;WAHNKLKVAPAIOPRAL~I~PGRAVGVAH;LLGGIATTW~FFLARIIAV~ .. .. .. .. .. .. .. .. .. .. ..T.......V.......T...................... . . .,..... . . . . . .. . . . . .. . . . . .. . . . . . .. . . . .. .. .. .. .. .. .. .. .. .. ..T.......I....... . . . . . . . . . . . . . . . . T....,................. . . .. . . . . . . .. . . . .. .. . . .
(b)
ps.&
Liverwort Spinach Maize
(Photosystem
I P~DD chlorophyll
750
e apoprotein)
MASRFPKFS~GLSDDPTTRklWFGIATAH~FESHDDMTE~RLYDKIFAS~FGDLAIIFL~TSGNLFHVA~DGNFEAWGD~PLHVRPIAH~lWDPHFGDP~ ::L:::R:::::A.......................I:::::::N ::::::::::::::::::::::::::::::s:v:::::::::::::::::::::: . . . . . . .. .. . . . . . . .. . . . . . N..............................S:I:::::::::::::::::::::: :EL:::R:::::,.......................I....... . . . . . . . . . . . . . . . . . . . . . . . . ..*... . . . . . . . . . . . . . . . . . . . . . . ...*.... VEAFTRGGA~GPVNIAYSGtY~WWYTIGL~TNQDLyNGAiFLVlLSSIS~IAGWLHL~P~WKPKVSWFK~AESRLNHHL~GLFGVSSLA~TGHLVHVAI~ .. .. .. .. .. .. .. ..L......................E...T....:LF::V:::LG........... . . . . . .. . . . . . . . . .. . . . . . . . . . . . . . . . . . . s...............'.......... . . . . . . . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..A......................E...T....FLF..TL...G.::::::::::SL::::::::::::::::::::::::::::::::::: . . . .. .. . . . . .. .. . . . . . . .. . . . . .. . . E-SRGEHVRWbNFLTKLPHPiCLGPFFAG~~Nt"AQNVDS~NHAFGTSQG~GTAILTFIG~FHP~TQSLW~TDIAHHHLAiAVVFIIAGH~YRTNFG~GH G-::::Y:::N:::DV::::Q::::L:T::::L:::: P:::S:L:::::::::::::LL::::::::::::::M::::::::F::LV::::::::::::: . . . . . . . . . . . . . . . . . . . . . ..FI.L....~~~~~~~~~~ GS::::Y:::N:::DV::Y:O::::LLT::::L::::P:::::L:::T:::::::::LL....................... ~IKEILE~HTP~GGRLGRGH;GLYDTINNS~HFQLGLALA~LGVITSLVA~HMYSLPPYA~LA~DFTTQA~LYTHH~YIA~FIMTGAFAH~AIFFIRDYN :M:DL::A:I...............................................A... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I.*........................... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. .. .. ,. .. .. .. .. .. :::,L::A...................... . . . . . . . . . . . . . . . . . . . . . . I..........................A...I...................................... .. . . . . .. . . . . . .. . . . . . . .. . . . . .. . . . .. .. . . . .. . . . . . .. . . . . .. . . . . . . .. .. . . . 
~E~NKDNVLA~~~LEHKEAIISHLSWASLFLGFHTLGL~VH~DVMLAFGTP~K~ILIE~IF~~WI~~AHGK~L~GFDVLL~~TNN~AFNAG~SI~L~G~LD ::::E........D........................................................TS::::::::::SG::::::R........N . .._.... . . . . . .. . . . .. . . . . . . . .. . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . .. . .. .. ...E........D......................p . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..TT....I .. ::::::G:T::::RN:::::::N
.......,
6tNNNSNSLFiTIGPGDFLVkHAIALGLHT;TLILVKGAL~ARGSKLMPD~KEFGYSFPC~GPGRGGTCDiSAWDAFYLA~FWMLNTlGW~TFYWHWKHl ~V~~....................................".'....."' . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D....................................: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. :V:E....................................*.. . . . . D............................................... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. .. .. .. ...a.. . . .. . . . . . .. . . . . . .. . . . . .. . . . . . .. . . . . .. . . . . . . . . .. ~LW~GNAA~FNESSTYLMGW~RDYLWLNSS~)LINGYNPFG~N~L~VWAWM~LFGHLVWAT~F~FLI~WRG~W~ELIETLA~AHERT~~AN~~R~KDK~~A ~~~~~~vs................................'............ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. . . . . . . . . ..I..R..... ~~~~~~vs................,........................................" . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7:::::::::::::::::::::::I::R~~~~~ iSIVQARLVGiAHFSVGYIFivnnFLIAST:GKFG .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..*.......... .. . . . . . . . .. . . ::v::::::::::::::::::::::::::::::::
734 734 735
92.3% 91.2%
Figure 8. Amino acid sequences of photosystem I P700 chlorophyl a apoproteins. Liverwort (a) psaA and (b) psaB proteins share 45% homology, as do those of maize (Fish et al., 1985) and spinach (Kirsch et al., 1986).
lapping of p&D and psbC has been described for spinach (Alt et al., 1984; Holschuh et al., 1984). However, we cannot rule out the possibility of the presence of a conserved GTG codon 36 nucleotides downstream from the assigned ATG, because there is an SD sequence (AGGAGG at 39,885 to 39,890 in Fig. 2). Perhaps this GTG is the initiation codon for translation of the psbC mRNA. Steinmetz et al. (1986) identified a new protein (248 amino acid residues long) associated with the photosystem II complex, and analysed the fine structure of its gene, p&G, on the maize chloroplast genome. The liverwort counterpart of psbG (52,524
to 51,793) encoding 243 amino acids was identified and the predicted amino acid sequences were compared. Unlike other photosystem II polypeptides, the psbG proteins had significantly diverged in both the N and C-terminal portions; on the average, they are only 65*Oo/o homologous to proteins in the maize psbG product, although the central portions (maize, amino acid residues 36 to 182; liverwort, 37 to 183) are 91.8% identical (Fig. 9(d)). The liverwort psbG gene overlaps the last seven nucleotides of the preceding ndh3 gene (Fig. 3), and may correspond to tobacco ORF284 (bhpB) (Shinozaki et al., 1986).
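The residue counts and the psbD-psbC overlap quoted above follow from the coordinates by simple interval arithmetic. The Python sketch below is an illustration added for the reader, not part of the authors' analysis. It assumes that the quoted coordinates are 1-based, inclusive, and include the stop codon (an assumption with which the residue counts in the text are consistent); the function names are hypothetical.

```python
def span_length(start, end):
    """Inclusive length in bp of a feature; works for either strand."""
    return abs(end - start) + 1


def residues(start, end):
    """Amino acid count of an uninterrupted ORF, assuming the span includes the stop codon."""
    length = span_length(start, end)
    if length % 3 != 0:
        raise ValueError("span is not a whole number of codons")
    return length // 3 - 1


def overlap(a, b):
    """Number of positions shared by two inclusive intervals (0 if disjoint)."""
    lo = max(min(a), min(b))
    hi = min(max(a), max(b))
    return max(0, hi - lo + 1)


# Coordinates as quoted in the text; psbG is written high-to-low (opposite strand).
psbA = (28368, 29429)
psbD = (38855, 39916)
psbC = (39864, 41285)
psbG = (52524, 51793)

for name, gene in [("psbA", psbA), ("psbD", psbD), ("psbC", psbC), ("psbG", psbG)]:
    print(name, span_length(*gene), "bp,", residues(*gene), "residues")

print("psbD/psbC overlap:", overlap(psbD, psbC), "bp")

# Offset between the two reading frames: non-zero means the overlapping
# stretch is read in different frames for psbD and psbC.
print("frame offset:", (psbC[0] - psbD[0]) % 3)
```

Under these assumptions the sketch reproduces 353, 353, 473 and 243 residues for psbA, psbD, psbC and psbG, and the 53 bp overlap; the non-zero frame offset indicates that the shared stretch is translated in two different reading frames.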
[Figure 9: sequence alignments not recoverable from the extracted text. Panels, with residue counts and identity with the liverwort sequence as legible in the figure:
(a) psbA (photosystem II 32,000 Mr protein): liverwort, 353; spinach, 353 (96.9%); soybean, 353 (96.6%); C. reinhardtii, 352 (91.2%); Eu. gracilis, 345 (84.1%); Anabaena, 360 (87.5%).
(b) psbD (photosystem II D2 protein): liverwort, 353; spinach, 353 (96.3%); pea, 353 (96.3%); C. reinhardtii, 339 (82.4%).
(c) psbC (photosystem II P680 chlorophyll a apoprotein): 473 residues in each of four sequences; identities 94.7%, 95.3% and 94.7%.
(d) psbG (photosystem II G protein): liverwort, 243; maize, 248 (65.0%).]
Figure 9. Amino acid sequences of photosystem II proteins. (a) Herbicide-binding 32,000 Mr protein. Spinach (Zurawski et al., 1982a), soybean (Spielmann & Stutz, 1983), C. reinhardtii (Erickson et al., 1984), Eu. gracilis (Keller & Stutz, 1984), Anabaena 7120 (psbA1) (Curtis & Haselkorn, 1984). Note that spinach, soybean and C. reinhardtii psbA proteins do not contain a lysine residue, but that those of liverwort, Eu. gracilis and Anabaena do. (b) 32,000 Mr-like D2 protein. Spinach (Alt et al., 1984; Holschuh et al., 1984), pea (Rasmussen et al., 1984), and C. reinhardtii (Rochaix et al., 1984). (c) Chlorophyll a apoprotein or 44,000 Mr protein. Spinach (Alt et al., 1984; Holschuh et al., 1984). Pea and maize (Bookjans et al., 1986). (d) G protein. Maize (Steinmetz et al., 1986).
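The percentages attached to these alignments are identities relative to the liverwort sequence. As a hedged illustration of how such a value can be computed, the Python sketch below scores a pairwise alignment against the ungapped length of the reference. The choice of denominator and the treatment of gaps are assumptions (the text does not state them), the function name is hypothetical, and the two short sequences are invented for the example rather than taken from the figure.

```python
def percent_identity(reference, other, gap="-"):
    """Identity of `other` to `reference`, as a percentage of the
    ungapped reference length (an assumed convention)."""
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have equal length")
    ref_len = sum(1 for r in reference if r != gap)
    matches = sum(
        1 for r, o in zip(reference, other) if r != gap and r == o
    )
    return 100.0 * matches / ref_len


# Hypothetical aligned fragments; not real psb sequences.
reference = "MKT-AYLLG"
other = "MRTQAYL-G"
print(percent_identity(reference, other))  # 6 matches over 8 reference residues

# Arithmetic check against a value in the text: 65.0% of the 243 liverwort
# psbG residues corresponds to 158 matches. The count 158 is inferred from
# the reported percentage and is not stated in the source.
print(round(100 * 158 / 243, 1))
```

With this convention, the toy alignment scores 75.0%, and 158 matches over 243 residues round to the 65.0% quoted for psbG.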
[Figure 10: sequence alignment not recoverable from the extracted text. Sequences compared, with residue counts and identity with liverwort MbpX as legible in the figure: liverwort MbpX, 370; E. coli MalK, 370 (28.1%); S. typhimurium HisP, 258 (21.4%); E. coli PstB, 257 (18.4%).]
Figure 10. Structures conserved between liverwort mbpX and bacterial membrane subunits of transport complexes. The amino acid sequences G35 ... G40-K41-S42 and R145 ... F153-L154-L155-L156-D157 found in the liverwort mbpX product correspond to the adenine nucleotide binding consensus (Walker et al., 1982). Genes malK (Gilson et al., 1982), hisP (Higgins et al., 1982) and pstB (Surin et al., 1985).
(g) A bacterial permease-like gene
An ORF encoding a polypeptide of 370 amino acids was tentatively designated as mbpX (37,012 to 38,124), because its amino acid sequence shows homology, in size as well as in primary structure, to those of inner membrane components of bacterial permeases such as the hisP (Higgins et al., 1982), malK (Gilson et al., 1982), oppD (Higgins et al., 1985), pstB (Surin et al., 1985) and rbsA (Buckel et al., 1986) proteins of the histidine, maltose, oligopeptide, phosphate and ribose transport systems, respectively (e.g. 28.1% homology with malK; Fig. 10). The amino acid sequence homology shared by mbpX and the bacterial inner membrane subunits has also been detected in other bacterial proteins involved in cell division (FtsE), nodulation (NodI), haemolysin transport (HlyB), and DNA repair (UvrA; Higgins et al., 1986; Doolittle et al., 1986).
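The first nucleotide-binding element noted in the Figure 10 legend (G35 ... G40-K41-S42) has the spacing of the Walker A motif, commonly written G-x(4)-G-K-[ST]. The Python sketch below shows how such a site could be located with a regular expression. It is an illustration only: the pattern is the commonly quoted form rather than one given in the source, the function name is hypothetical, and the test sequence is an invented fragment, not the mbpX sequence.

```python
import re

# Walker A (P-loop) pattern in its commonly quoted form:
# G, any four residues, G, K, then S or T.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_walker_a(protein):
    """Return (1-based start position, matched text) for each Walker A-like site."""
    return [(m.start() + 1, m.group()) for m in WALKER_A.finditer(protein)]


# Hypothetical 26-residue fragment used only to exercise the pattern.
toy_protein = "MSILAAKVSLLGPSGSGKSSLLRIIA"
print(find_walker_a(toy_protein))
```

A match that starts at position p places the conserved lysine at p + 6, which is the relationship between G35 and K41 in the legend.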
(h) Genes for NADH dehydrogenase
The liverwort chloroplast genome contains a set of homologues of human mitochondrial “URF” or ND genes encoding components of a respiratory chain NADH dehydrogenase (Ohyama et al., 1986).
[Figure 11: sequence alignments not recoverable from the extracted text. (a) ndh2: liverwort, 501 residues; human URF2 (ND2), 347 residues; 22.7% identity. (b) ndh3: liverwort, 120 residues; human URF3 (ND3), 115 residues; 30.8% identity.]
Figure 11. Structural similarity between liverwort chloroplast ndh and mitochondrial URF or ND proteins. Human ND2 and ND3 (Anderson et al., 1981) are components of the respiratory chain NADH dehydrogenase complex (Chomyn et al., 1985). (a) Homology between ndh2 and human ND2 proteins. An arrow indicates the presence of an intron in the ndh2 gene. Possible initiation methionine codons are marked with asterisks. (b) Homology between ndh3 and human ND3 proteins.
[Figure 12: sequence alignments not recoverable from the extracted text. Panels, with residue counts and identity with the liverwort sequence as legible in the figure:
(a) ORF62: liverwort, spinach and wheat, 62 residues each; 82.3% and 82.3%.
(b) ORF55 (lhcA): liverwort ORF55, 86 or 55; tobacco ORF1, 90 (63.6%); R. rubrum, 69 (23.6%); R. capsulata, 49 (10.9%).
(c) ORF36a: liverwort ORF36a and tobacco ORF2; 88.9%.
(d) ORF370i: liverwort, 370; tobacco, 509 (34.3%).
(e) ORF167: liverwort, 167 (the first 134 residues are not shown); maize, length not given.]
Figure 12. Conserved amino acid sequences of ORFs detected in chloroplast DNA sequences. (a) ORF62 is highly conserved among liverwort, spinach (Holschuh et al., 1984) and wheat (Quigley & Weil, 1985). (b) ORF55 (lhcA) contains 86 or 55 amino acid residues. Sequence similarity to tobacco ORF1 (Deno & Sugiura, 1983) is localized downstream from the 2nd methionine codon. ORF55 (lhcA) also shares structural homology with the β chains of the light-harvesting complex from Rhodos. rubrum (Berard et al., 1985) and Rhodop. capsulata (Youvan et al., 1984). (c) ORF36a corresponds to the carboxy-terminal portion of tobacco ORF2 (Deno & Sugiura, 1983). (d) ORF370i is entirely within the trnK(UUU) intron. In tobacco, the corresponding intron also includes an ORF of 509 amino acid residues (Sugita et al., 1985). (e) The last 33 amino acid residues of ORF167 can be seen in maize chloroplast DNA (Fish et al., 1985).

Two genes in this region, ndh2 corresponding to ND2 and ndh3 corresponding to ND3, were identified by comparison of their amino acid sequences. The ndh2 gene (1514 to 3555) is split by a group II intron of 536 bp and specifies 501 amino acid residues, many more than the human mitochondrial ND2 gene product (347 amino acids; Anderson et al., 1981). The discrepancy in their lengths is caused principally by an additional stretch of N-terminal amino acid residues in the ndh2 product. This portion of the ndh2 polypeptide would be absent if either the third or fourth ATG codon, marked with asterisks in Figure 11(a), were used for the initiation of translation. Taking the first methionine codon tentatively as the initiation codon, the amino acid sequences of the ndh2 and human mitochondrial ND2 proteins are 22.7% identical. Northern blot analysis indicated that the ndh2 gene is actively transcribed in the chloroplasts (K. Umesono, unpublished results). The other gene, ndh3 (52,877 to 52,515), is in the upstream region of psbG with an overlap of ten nucleotides.
The ndh3 product (120 amino acids) is similar in size to the human mitochondrial ND3 protein (115 amino acids; Anderson et al., 1981), with which it shares 30.8% homology (Fig. 11(b)). The protein products of these chloroplast ndh genes have not been identified. Similar observations on a series of ndh genes have been reported for tobacco chloroplast DNA (Shinozaki et al., 1986).

(i) Other ORFs
There are at least 15 different ORFs in this region, with products that range in length from 29 (ORF29) to 2136 amino acid residues (ORF2136). Some of the ORFs appear to be phylogenetically conserved among the higher plant chloroplast genomes, suggesting that they are likely to be active genes. It has been shown that “URF-62” is a conserved unidentified ORF encoding 62 amino acids in the wheat, maize, spinach and tobacco chloroplast genomes (Quigley & Weil, 1985; Shinozaki et al., 1986). Accordingly, the liverwort counterpart of “URF-62” was identified as ORF62 (previously called ORF702; Umesono et al., 1984) by amino acid sequence comparison (Fig. 12(a)). The amino acid sequence of ORF62 is 82.3% homologous to those in wheat and spinach. There are two ORFs (ORF1 and ORF2) in the region between the tobacco chloroplast trnS(GCU) and trnQ(UUG) genes (Deno & Sugiura, 1983). In the corresponding region of the liverwort chloroplast DNA, we found two ORFs that we designated as ORF55 (23,605 to 23,438) and ORF36a (23,107 to 22,997), with sequences partially homologous to those of the tobacco ORF1 and ORF2, respectively. ORF55 encodes 86 or 55 amino acids, depending on the choice of start codon. Comparison with the tobacco ORF1 indicates that the amino acid sequence homology is only in their C-terminal portions, just after the second in-frame ATG codons (Fig. 12(b)). Therefore, we have tentatively assigned 55 amino acids to ORF55, over which the two sequences share 63.6% identical amino acids. A database search showed that ORF55 is partially homologous to light-harvesting polypeptides (β chain) of photosynthetic bacteria such as Rhodospirillum rubrum (Berard et al., 1985) and Rhodopseudomonas capsulata (Youvan et al., 1984; Fig. 12(b)). Therefore, we tentatively designated ORF55 as lhcA. Genes for the β and α chains of the bacterial light-harvesting complex are tandemly oriented as 5'-β chain-α chain-3' (Youvan et al., 1984; Berard et al., 1985). Conserved ORF36a is downstream from ORF55 (lhcA), but no significant homology was seen between ORF36a and bacterial α chains. Likewise, a small polypeptide predicted from ORF36a is 88.9% identical with the last 36 amino acids of the tobacco ORF2 (Fig. 12(c)). As described above, ORF370i (26,976 to 28,088) is located completely within a long trnK(UUU) intron. The fine structure of the tobacco counterpart has been reported; it consists of 509 amino acids (Sugita et al., 1985), containing 139 more amino acids at the N terminus than the liverwort ORF370i product (Fig. 12(d)).
These two ORFs have structural similarity in their amino acid sequences, although at a low value (34.3%). A split gene ORFl67 (48,599 to 47,488) is upstream from psaA. The maize chloroplast DNA sequence in the psaAB locus (Fish et al., 1985) may partially include the second exon of ORF167. The maize sequence positions from - 719 to - 612 reported by Fish et al. (1985) might encode 36 amino acids with a sequence 72.2% identical with the last 33 amino acids predicted from ORF167 (Fig. 12(e)). A counterpart of ORF167 can be seen in the corresponding region containing ORF82 (Shinozaki et al., 1986) of the tobacco chloroplast genome, but it seems to be split into three exons by our method for the prediction of group II introns (data not shown). Our preliminary analysis of the tobacco chloroplast DNA sequence (EMBL database ver. 12) indicated the presence of homologous sequences to the liverwort ORF29 (5257 to 5168), ORF34 (4001 to 4105), ORF169 (51,742 to 51,233) and ORF2136 (29,909 to 36,319). Counterparts of the
latter are designated as two ORFs, ORF158 and ORF1708, in tobacco (Shinozaki et al., 1986). We could not detect tobacco sequences sharing homology with the liverwort ORF30 (22,425 to 22,333), ORF32 (22,516 to 22,614), ORF33 (22,263 to 22,162), ORF50 (25,769 to 25,921), ORF135 (4236 to 5128) or ORF513 (24,053 to 25,594).
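The percentage identities quoted in this section can be illustrated with a minimal Python sketch. It assumes that identity is counted over aligned positions at which neither sequence has a gap; the text does not state how gaps were treated, so this denominator is our assumption. The toy alignment is hypothetical and is not taken from Fig. 12. The count of 35 identical residues in the last line is our back-calculation from the quoted 63.6% over 55 residues, not a figure given in the paper.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over ungapped aligned positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns in which neither sequence has a gap character.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if "-" not in (a, b)]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


# Hypothetical toy alignment, not taken from Fig. 12.
toy_a = "MKTAYIAK-L"
toy_b = "MKSAYLAKQL"
print(round(percent_identity(toy_a, toy_b), 1))

# Back-calculated illustration: 35 identical residues out of 55.
print(round(100.0 * 35 / 55, 1))
```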
4. Discussion
The gene organization of the liverwort chloroplast genome has several distinctive features. In the region described here, an inversion of about 30 kb could be seen between the liverwort and tobacco chloroplast genomes. The rearranged region is flanked by two tRNA genes, trnL(CAA) and trnD(GUC), and is located entirely within the LSC region in liverwort, covering at least the region from ORF34 to ORF2136. This region is partially included in the IR sequence in tobacco (Fig. 13). Within the rearranged region, however, the relative gene locations seem to be conserved, except for a redundant copy of trnI(CAU)-rpl23-rpl2 and a ribosomal protein S16 gene (rps16), both found in the tobacco genome (Shinozaki et al., 1986). In place of the tobacco rps16, we found ORF513 and ORF50, which are completely different from rps16, and we did not detect rps16 anywhere in the liverwort chloroplast genome (Ohyama et al., 1986, 1988). With the spinach-type chloroplast genome regarded as ancestral, large inversions in chloroplast DNA have frequently been observed (Palmer, 1985). One of the best-characterized inversions occurs in the wheat chloroplast genome, with endpoints associated with repeated sequences and flanked by tRNA genes (Howe, 1985). Unlike in wheat, there are no repeated sequences near the boundaries of the rearranged region in the liverwort. This rearrangement also differs from the others in that no other DNA rearrangement among plant chloroplast genomes includes a psbA gene that is located near the border of the inverted repeats (IR) and the LSC region (Palmer, 1985). The IR sequences in liverwort and fern chloroplast DNAs are very similar in size (Stein et al., 1986), but the fern Osmunda cinnamomea does not contain the liverwort-type DNA rearrangement in the LSC region (Palmer & Stein, 1986).
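The effect of such an inversion on a gene map can be sketched in Python as follows. This is only an illustration under stated assumptions: the flanking tRNA genes and the names ORF34 and ORF2136 come from the text, whereas "geneX", every strand assignment, and the choice of which flank lies next to which ORF are hypothetical placeholders. The sketch also assumes that the flanking genes themselves lie outside the inverted segment, which the text does not specify.

```python
from typing import List, Tuple

Gene = Tuple[str, str]  # (name, strand); strand is "+" or "-"


def invert_segment(genes: List[Gene], left_flank: str, right_flank: str) -> List[Gene]:
    """Invert the genes lying strictly between two flanking genes.

    An inversion reverses the gene order and moves every gene in the segment
    to the opposite strand; the flanking genes themselves are left unchanged.
    """
    names = [name for name, _ in genes]
    i, j = names.index(left_flank), names.index(right_flank)
    if i >= j:
        raise ValueError("left flank must precede right flank")
    flip = {"+": "-", "-": "+"}
    inverted = [(name, flip[strand]) for name, strand in reversed(genes[i + 1:j])]
    return genes[:i + 1] + inverted + genes[j:]


# Flanking tRNA genes and the ORF names come from the text; "geneX" and all
# strand assignments are hypothetical placeholders.
liverwort_like = [
    ("trnL(CAA)", "+"),
    ("ORF34", "+"),
    ("geneX", "-"),
    ("ORF2136", "+"),
    ("trnD(GUC)", "-"),
]
tobacco_like = invert_segment(liverwort_like, "trnL(CAA)", "trnD(GUC)")
print([name for name, _ in tobacco_like])
```

Applying the same operation twice restores the starting map, which is why the relative gene locations inside the segment remain conserved even though the segment as a whole is read in the opposite orientation.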
Detailed comparison of the chloroplast gene organization will help to elucidate evolutionary relationships among liverwort, fern and higher plant chloroplast genomes.
Our finding of putative genes for RNA polymerase subunits, rpoA (Fukuzawa et al., 1988), rpoB, rpoC1 and rpoC2, in the liverwort chloroplast genome may be controversial, because chloroplast RNA polymerase subunits have been shown to be synthesized from poly(A)+ RNA, indicating that they are nuclear-encoded gene products in spinach (Lerbs et al., 1985). However, it has been demonstrated that there are two distinct RNA polymerase activities in both spinach and
Figure 13. Inversion between the liverwort and tobacco chloroplast genomes. Only a portion of the chloroplast genomes is shown. The gene organization of the tobacco chloroplast genome is quoted from Shinozaki et al. (1986), except for ORF29 and ORF34, which were deduced from our analysis with the liverwort sequences. Genes for tRNAs are represented by the 1-letter amino acid code with their anticodon sequences in parentheses. Arrows indicate one of the IR and LSC junctions (JLA). A region from ORF34 to ORF2136 in the liverwort can be seen in a reverse orientation (ORF1708 to ORF34) in the tobacco genome, as indicated by filled triangles. The regions are flanked by 2 tRNA genes, trnL(CAA) and trnD(GUC), in both genomes.
Eu. gracilis chloroplasts (Greenberg et al., 1985). In Chlamydomonas reinhardtii, nuclear and chloroplast genomes share DNA sequences homologous to E. coli rpo genes (Watson & Surzycki, 1983). Preliminary studies indicated that the liverwort chloroplast rpoC1 gene was not a pseudogene but was transcriptionally active, and that the transcripts were correctly spliced in chloroplasts from cultured cells (K. Umesono & K. Nakahigashi, unpublished results). Therefore, the chloroplast rpo genes may be required to express a part of the genetic information in the course of chloroplast
development. Comparison with the E. coli RNA polymerase subunits encoded by four genes (rpoA, rpoB, rpoC and rpoD) showed that there is no rpoD-like ORF in the liverwort chloroplast genome. The rpoD gene encodes a sigma subunit necessary for accurate promoter recognition (Burton et al., 1981). We suspect either that the sigma subunit for the chloroplast-encoded RNA polymerase is synthesized in the cytoplasm and then transported into chloroplasts, or that we have missed the rpoD gene in the chloroplast genome because of its low level of homology to its prokaryotic counterparts. For the tobacco chloroplast genome, a similar organization of rpo genes has been reported, but the rpoC locus seems to be more complicated than in the liverwort (Shinozaki et al., 1986).
The psbA proteins are the most conserved, but the liverwort psbA gene contains a lysine codon at position 238 that is not found in higher plants. In Anabaena 7120, psbA1, a transcribable gene in a complement of two psbA genes, also contains a lysine residue (Curtis & Haselkorn, 1984). A strong preference for either A or T in synonymous codon choice can be seen at the third-letter position in identified and unidentified protein genes in the liverwort chloroplast genome (Ohyama et al., 1988). The codon usage pattern in psbA is particularly divergent in the choice of the third-letter pyrimidines (Y) for two-codon families such as asparagine (AAY), aspartic acid (GAY), cysteine (UGY), histidine (CAY), phenylalanine (UUY), serine (AGY) and tyrosine (UAY) codons. In these codons, the psbA gene appears to use C twice as often as T (Table 1). A preference for C was also observed in the choice of isoleucine codons. This peculiar codon usage pattern in the psbA gene may be correlated with physiological stability or translational efficiency of mRNA molecules.
As described above, a polypeptide predicted from
Table 1
Codon usage pattern in the liverwort psbA and other psb genes
[Table body not reproduced: for each codon, the Table lists the encoded amino acid and the number of occurrences in psbA and in the other psb genes (column psb). The codons marked with an asterisk are UUC, AUC, UAC, CAC, AAC, GAC, UGC and AGC.]
Columns of psb in the Table include psbB, psbC, psbD, psbE, psbF, psbG and psbH genes (this paper; Fukuzawa et al., 1988). Codons with asterisks are preferentially used in the psbA gene, but not in the other psb genes. The codon choice pattern of the psb genes in the Table can be observed in the other protein genes encoded by the chloroplast genome (Ohyama et al., 1988). Ter, termination codon.
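The kind of tally summarized in Table 1 for the two-codon NNY families can be sketched in Python as follows. This is an illustration only, not the procedure used for the Table: the function name third_letter_usage and the toy reading frame are hypothetical, and the input is assumed to be an in-frame coding sequence written as DNA or RNA.

```python
from collections import Counter
from typing import Dict

# Two-codon NNY families listed in the text, keyed by their first two letters
# (written as DNA, i.e. T in place of U).
NNY_FAMILIES = {
    "AA": "Asn", "GA": "Asp", "TG": "Cys", "CA": "His",
    "TT": "Phe", "AG": "Ser", "TA": "Tyr",
}


def third_letter_usage(cds: str) -> Dict[str, Counter]:
    """Count C versus T at the third position of NNY codons in a coding sequence."""
    cds = cds.upper().replace("U", "T")
    usage = {aa: Counter() for aa in NNY_FAMILIES.values()}
    # Walk the sequence codon by codon, ignoring any incomplete trailing codon.
    for k in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[k:k + 3]
        aa = NNY_FAMILIES.get(codon[:2])
        if aa is not None and codon[2] in "CT":
            usage[aa][codon[2]] += 1
    return usage


# Hypothetical toy reading frame, not the liverwort psbA sequence.
toy_cds = "ATGAACAACAATTTCTTCTTTTAA"
for aa, counts in third_letter_usage(toy_cds).items():
    if counts:
        print(aa, dict(counts))
```

Codons such as AAA or UAA share their first two letters with an NNY family but end in a purine, so they are skipped; only the pyrimidine-ending codons enter the C versus T comparison.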
the mbpX gene shares extensive homologies with inner membrane subunits of bacterial transport complexes such as HisP, MalK and PstB proteins (Fig. 10). In contrast with the conserved structure of the mbpX gene, the predicted amino acid sequence from the mbpY gene has less homology to the sequences of bacterial counterparts (Kohchi et al., 1988). It is therefore necessary to identify the products and functions of these mbp genes. It has been reported that the high-affinity histidine transport system of Salmonella typhimurium consists of a periplasmic histidine-binding protein (hisJ gene product) and three membrane-bound components, the hisQ, hisM and hisP gene products (Higgins et al., 1982). A chloroplast gene designated as mbpY is a candidate for another component of the putative complex containing the mbpX product (Kohchi et al., 1988). The physiological role of the mbpX protein is not understood, but it seems likely that this protein is a component of an unknown transport system in the chloroplasts associated with adenine nucleotide binding activity. No mbpX-like gene is reported for the tobacco chloroplast genome (Shinozaki et al., 1986).
In plant chloroplasts, the presence of clustered genes involved in protein biogenesis or photosynthesis implies that they represent regulatory units for gene expression equivalent to bacterial operons. The transcriptional promoter-like sequences (shown in Figs 2 and 3) were present upstream from the clusters of functionally related genes such as trnE(UUC)-trnY(GUA)-trnD(GUC), trnL(UAA)-trnF(GAA), trnG(UCC)-trnR(UCU), rps'12-rps7, rpoB-rpoC1-rpoC2-rps2, atpI-atpH-atpF-atpA, psbD-psbC, atpB-atpE and psaA-psaB, suggesting that they may be cotranscribed. The deduced gene organization also indicates that protein genes tend to be tandemly oriented in the same direction, although tRNA genes are frequently transcribed in the opposite direction (Fig. 1). The organization of these gene clusters appears to be phylogenetically conserved among higher plant chloroplast genomes (Palmer, 1985).
We thank Dr J. C. Gray for his critical reading of the manuscript and M. Toda for his assistance in DNA sequencing. This research was supported in part by a Grant-in-Aid for Special Research Projects from the Ministry of Education, Science, and Culture of Japan (to H.I., K.O. and H.O.) and in part by the Yamada Science Foundation (H.O.).
References
Alt, J., Winter, P., Sebald, W., Moser, J. G., Schedel, R., Westhoff, P. & Herrmann, R. G. (1983). Curr. Genet. 7, 129-138.
Alt, J., Morris, J., Westhoff, P. & Herrmann, R. G. (1984). Curr. Genet. 8, 597-606.
An, G., Bendiak, D. S., Mamelak, L. A. & Friesen, J. D. (1981). Nucl. Acids Res. 9, 4163-4172.
Anderson, S., Bankier, A. T., Barrell, B. G., de Bruijn, M. H. L., Coulson, A. R., Drouin, J., Eperon, I. C., Nierlich, D. P., Roe, B. A., Sanger, F., Schreier,
P. H., Smith, A. J. H., Staden, R. & Young, I. G. (1981). Nature (London), 290, 457-465.
Bedwell, D., Davis, G., Gosnik, M., Post, L., Nomura, M., Kestler, H., Zengel, J. M. & Landahl, L. (1985). Nucl. Acids Res. 13, 3891-3903.
Berard, J., Belanger, G., Corriveau, P. & Gingras, G. (1985). J. Biol. Chem. 261, 82-87.
Bird, C. R., Keller, B., Auffret, A. D., Huttley, A. K., Howe, C. J., Dyer, T. A. & Gray, J. C. (1985). EMBO J. 4, 1381-1388.
Bonnard, G., Michel, F., Weil, J.-H. & Steinmetz, A. (1984). Mol. Gen. Genet. 194, 330-336.
Bookjans, G., Stummann, B. M., Rasmussen, O. F. & Henningsen, K. W. (1986). Plant Mol. Biol. 6, 359-366.
Buckel, S. D., Bell, A. W., Rao, J. K. M. & Hermodson, M. A. (1986). J. Biol. Chem. 261, 7659-7662.
Burke, J. M., Irvine, K. D., Kaneko, K. J., Kerker, B. J., Oettgen, A. B., Tierney, W. M., Williamson, C. L., Zaug, A. J. & Cech, T. R. (1986). Cell, 45, 167-176.
Burton, Z., Burgess, R. R., Lin, J., Moore, D., Holder, S. & Gross, C. A. (1981). Nucl. Acids Res. 9, 2889-2903.
Cech, T. R. & Bass, B. L. (1986). Annu. Rev. Biochem. 55, 599-629.
Cerretti, D. P., Dean, D., Davis, G. R., Bedwell, D. M. & Nomura, M. (1983). Nucl. Acids Res. 11, 2599-2616.
Chomyn, A., Mariottini, P., Cleeter, M. W. J., Ragan, C. I., Matsuno-Yagi, A., Hatefi, Y., Doolittle, R. F. & Attardi, G. (1985). Nature (London), 314, 592-597.
Cozens, A. L. & Walker, J. E. (1986). Biochem. J. 236, 453-460.
Cozens, A. L., Walker, J. E., Phillips, A. L., Huttley, A. K. & Gray, J. C. (1986). EMBO J. 5, 217-222.
Curtis, S. E. & Haselkorn, R. (1984). Plant Mol. Biol. 3, 249-258.
Davies, R. W., Waring, R. B., Ray, J. A., Brown, T. A. & Scazzocchio, C. (1982). Nature (London), 300, 719-724.
Deno, H. & Sugiura, M. (1983). Nucl. Acids Res. 11, 8407-8414.
Deno, H., Kato, A., Shinozaki, K. & Sugiura, M. (1982). Nucl. Acids Res. 10, 7511-7520.
Deno, H., Shinozaki, K. & Sugiura, M. (1983). Nucl. Acids Res. 11, 2185-2191.
Doolittle, R. F., Johnson, M.
S., Husain, I., Van Houten, B., Thomas, D. C. & Sancar, A. (1986). Nature (London), 323, 451-453.
Erickson, J. M., Rahire, M. & Rochaix, J.-D. (1984). EMBO J. 3, 2753-2762.
Fish, L. E., Kuck, U. & Bogorad, L. (1985). J. Biol. Chem. 260, 1413-1421.
Fromm, H., Edelman, M., Koller, B., Goloubinoff, P. & Galun, E. (1986). Nucl. Acids Res. 14, 883-898.
Fukuzawa, H., Kohchi, T., Shirai, H., Ohyama, K., Umesono, K., Inokuchi, H. & Ozeki, H. (1986). FEBS Letters, 198, 11-15.
Fukuzawa, H., Kohchi, T., Sano, T., Shirai, H., Umesono, K., Ozeki, H. & Ohyama, K. (1988). J. Mol. Biol. 203, 333-357.
Gilson, E., Nikaido, K. & Hofnung, M. (1982). Nucl. Acids Res. 10, 7449-7458.
Greenberg, B. M., Narita, J. O., DeLuca-Flaherty, C. R. & Hallick, R. B. (1985). In Molecular Biology of the Photosynthetic Apparatus (Steinback, K. E., Bonitz, S., Arntzen, C. J. & Bogorad, L., eds), pp. 303-309, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Hallick, R. B., Hollingsworth, M. J. & Nickoloff, J. A. (1984). Plant Mol. Biol. 3, 169-175.
Henning, J. & Herrmann, R. G. (1986). Mol. Gen. Genet. 203, 117-128.
Higgins, C. F., Haag, P. D., Nikaido, K., Ardeshir, F., Garcia, G. & Ames, G. F.-L. (1982). Nature (London), 298, 723-727.
Higgins, C. F., Hiles, I. D., Whalley, K. & Jamieson, D. J. (1985). EMBO J. 4, 1033-1040.
Higgins, C. F., Hiles, I. D., Salmond, G. P. C., Gill, D. R., Downie, J. A., Evans, I. J., Holland, I. B., Gray, L., Buckel, S. D., Bell, A. W. & Hermodson, M. A. (1986). Nature (London), 323, 448-450.
Holschuh, K., Bottomley, W. & Whitfeld, P. R. (1984). Nucl. Acids Res. 12, 8819-8834.
Howe, C. J. (1985). Curr. Genet. 10, 139-145.
Howe, C. J., Auffret, A. D., Doherty, A., Bowman, C. M., Dyer, T. A. & Gray, J. C. (1982). Proc. Nat. Acad. Sci., U.S.A. 79, 6903-6907.
Howe, C. J., Fearnley, I. M., Walker, J. E., Dyer, T. A. & Gray, J. C. (1985). Plant Mol. Biol. 4, 333-345.
Kanazawa, H., Kayano, T., Mabuchi, K. & Futai, M. (1981). Biochem. Biophys. Res. Commun. 103, 604-612.
Kanazawa, H., Kayano, T., Kiyasu, T. & Futai, M. (1982). Biochem. Biophys. Res. Commun. 105, 1257-1264.
Keller, M. & Stutz, E. (1984). FEBS Letters, 175, 173-177.
Kirsch, W., Seyer, P. & Herrmann, R. G. (1986). Curr. Genet. 10, 843-855.
Kohchi, T., Shirai, H., Fukuzawa, H., Sano, T., Komano, T., Umesono, K., Inokuchi, H., Ozeki, H. & Ohyama, K. (1988). J. Mol. Biol. 203, 353-372.
Krebbers, E. T., Larrinua, I. M., McIntosh, L. & Bogorad, L. (1982). Nucl. Acids Res. 10, 4985-5002.
Krebbers, E., Steinmetz, A. & Bogorad, L. (1984). Plant Mol. Biol. 3, 13-20.
Kruger, K., Grabowski, P. J., Zaug, A. J., Sands, J., Gottschling, D. E. & Cech, T. R. (1982). Cell, 31, 147-157.
Lerbs, S., Bräutigam, E. & Parthier, B. (1985). EMBO J. 4, 1661-1666.
Michel, F. & Dujon, B. (1983). EMBO J. 2, 33-38.
Michel, F., Jacquier, A. & Dujon, B. (1982). Biochimie, 64, 867-881.
Montandon, P.-E. & Stutz, E. (1983). Nucl. Acids Res. 11, 5877-5892.
Montandon, P.-E. & Stutz, E. (1984). Nucl. Acids Res. 12, 2851-2859.
Ohme, M., Tanaka, M., Chunwongse, J., Shinozaki, K. & Sugiura, M. (1986). FEBS Letters, 200, 87-90.
Ohyama, K., Fukuzawa, H., Kohchi, T., Shirai, H., Sano, T., Sano, S., Umesono, K., Shiki, Y., Takeuchi, M., Chang, Z., Aota, S., Inokuchi, H. & Ozeki, H. (1986). Nature (London), 322, 572-574.
Ohyama, K., Fukuzawa, H., Kohchi, T., Sano, T., Sano, S., Shirai, H., Umesono, K., Shiki, Y., Takeuchi, M., Chang, Z., Aota, S., Inokuchi, H. & Ozeki, H. (1988). J. Mol. Biol. 203, 281-298.
Ovchinnikov, Y. A., Monastyrskaya, G. S., Gubanov, V. V., Guryev, S. O., Chertov, O. Y., Modyanov, N. N., Grinkevich, V. A., Makarova, I. A., Marchenko, T. V., Polovnikova, I. N., Lipkin, V. M. & Sverdlov, E. D. (1981). Eur. J. Biochem. 116, 621-629.
Ovchinnikov, Y. A., Monastyrskaya, G. S., Gubanov, V. V., Guryev, S. O., Salomatina, I. S., Shuvaeva, T. M., Lipkin, V. M. & Sverdlov, E. D. (1982). Nucl. Acids Res. 10, 4035-4044.
Palmer, J. D. (1985). Annu. Rev. Genet. 19, 325-354.
Palmer, J. D. & Stein, D. B. (1986). Curr. Genet. 10, 823-833.
Quigley, F. & Weil, J.-H. (1985). Curr. Genet. 9, 495-503.
Rasmussen, O. F., Bookjans, G., Stummann, B. M. & Henningsen, K. W. (1984). Plant Mol. Biol. 3, 191-199.
Reinbolt, J., Tritsch, D. & Wittmann-Liebold, B. (1978). FEBS Letters, 91, 297-301.
Rochaix, J.-D., Dron, M., Rahire, M. & Malnoe, P. (1984). Plant Mol. Biol. 3, 363-370.
Saraste, M., Gay, N. J., Eberle, A., Runswick, M. J. & Walker, J. E. (1981). Nucl. Acids Res. 9, 5287-5296.
Schon, A., Krupp, G., Gough, S., Berry-Lowe, S., Kannangara, C. G. & Söll, D. (1986). Nature (London), 322, 281-284.
Schwarz, Z., Jolly, S. O., Steinmetz, A. A. & Bogorad, L. (1981). Proc. Nat. Acad. Sci., U.S.A. 78, 3423-3427.
Shinozaki, K., Ohme, M., Tanaka, M., Wakasugi, T., Hayashida, N., Matsubayashi, T., Zaita, N., Chunwongse, J., Obokata, J., Yamaguchi-Shinozaki, K., Ohto, C., Torazawa, K., Meng, B. Y., Sugita, M., Deno, H., Kamogashira, T., Yamada, K., Kusuda, J., Takaiwa, F., Kato, A., Tohdoh, N., Shimada, H. & Sugiura, M. (1986). EMBO J. 5, 2043-2049.
Spielmann, A. & Stutz, E. (1983). Nucl. Acids Res. 11, 7157-7167.
Stein, D. B., Palmer, J. D. & Thompson, W. F. (1986). Curr. Genet. 10, 835-841.
Steinmetz, A., Gubbins, E. J. & Bogorad, L. (1982). Nucl. Acids Res. 10, 3027-3037.
Steinmetz, A. A., Castroviejo, M., Sayre, R. T. & Bogorad, L. (1986). J. Biol. Chem. 261, 2485-2488.
Subramanian, A. R., Steinmetz, A. & Bogorad, L. (1983). Nucl. Acids Res. 11, 5277-5286.
Sugita, M., Shinozaki, K. & Sugiura, M. (1985). Proc. Nat. Acad. Sci., U.S.A. 82, 3557-3561.
Surin, B. P., Rosenberg, H. & Cox, G. B. (1985). J. Bacteriol. 161, 189-198.
Torazawa, K., Hayashida, N., Obokata, J., Shinozaki, K. & Sugiura, M. (1986). Nucl. Acids Res. 14, 3143.
Umesono, K., Inokuchi, H., Ohyama, K. & Ozeki, H. (1984). Nucl. Acids Res. 12, 9551-9565.
van der Horst, G. & Tabak, H. F. (1985). Cell, 40, 759-766.
Walker, J. E., Saraste, M., Runswick, M. J. & Gay, N. J. (1982). EMBO J. 1, 945-951.
Watson, J. C. & Surzycki, S. J. (1983). Curr. Genet. 7, 201-210.
Yamada, K., Shinozaki, K. & Sugiura, M. (1986). Plant Mol. Biol. 6, 193-199.
Youvan, D. C., Alberti, M., Begusch, H., Bylina, E. J. & Hearst, J. E. (1984). Proc. Nat. Acad. Sci., U.S.A. 81, 189-192.
Zurawski, G. & Clegg, M. T. (1984). Nucl. Acids Res. 12, 2549-2559.
Zurawski, G., Bottomley, W. & Whitfeld, P. R. (1982a). Proc. Nat. Acad. Sci., U.S.A. 79, 6260-6264.
Zurawski, G., Bohnert, H. J., Whitfeld, P. R. & Bottomley, W. (1982b). Proc. Nat. Acad. Sci., U.S.A. 79, 7699-7703.
Zurawski, G., Bottomley, W. & Whitfeld, P. R. (1986). Nucl. Acids Res. 14, 3974.

Edited by S. Brenner