Structure and organization of Marchantia polymorpha chloroplast genome


J. Mol. Biol. (1988) 203, 294-331

Structure and Organization of Marchantia polymorpha Chloroplast Genome

II. Gene Organization of the Large Single Copy Region from rps'12 to atpB

Kazuhiko Umesono1†, Hachiro Inokuchi1, Yasuhiko Shiki1, Masayuki Takeuchi1, Zhen Chang1, Hideya Fukuzawa2‡, Takayuki Kohchi2, Hiromasa Shirai1§, Kanji Ohyama2 and Haruo Ozeki1‖

1 Department of Biophysics, Faculty of Science and 2 Research Center for Cell and Tissue Culture, Faculty of Agriculture, Kyoto University, Kyoto 606, Japan

(Received 24 June 1987, and in revised form 14 April 1988)

The nucleotide sequence (56,410 base-pairs) of the large single-copy region of chloroplast DNA from the liverwort Marchantia polymorpha has been determined. The sequence starts from one end (J_LA) of the large single-copy region and encompasses genes for 21 tRNAs, six ATPase subunits (atpA, atpB, atpE, atpF, atpH and atpI), two photosystem I polypeptides (psaA and psaB), four photosystem II polypeptides (psbA, psbC, psbD and psbG), five ribosomal proteins (rps2, rps4, rps7, rps'12 and rps14), and three RNA polymerase subunits (rpoB, rpoC1 and rpoC2). In addition, we detected 18 open reading frames ranging from 29 to 2136 amino acid residues long, four of which share significant amino acid sequence homology to those of an Escherichia coli malK protein (designated mbpX), human mitochondrial ND2 (ndh2) and ND3 (ndh3) of a respiratory chain NADH dehydrogenase, or a bacterial antenna protein of a light-harvesting complex (lhcA). Sequence analysis suggests that four tRNA genes and six protein genes might be split by introns; they are trnG(UCC), trnK(UUU), trnL(UAA), trnV(UAC), atpF, ndh2, rpoC1, rps'12, ORF135 and ORF167. In the large single-copy region described here, the gene organization deduced is highly conserved with respect to that of higher plants, but an inversion of some 30,000 base-pairs flanked by trnL(CAA) and trnD(GUC) was seen between the liverwort and tobacco chloroplast genomes.

1. Introduction

In this paper, we discuss the gene organization in about 70% of the large single-copy (LSC¶) region from the IR_A-LSC junction. Portions of the DNA sequence have been published (Umesono et al., 1984; Fukuzawa et al., 1986).

2. Materials and Methods

Sequencing of the cloned chloroplast DNA molecules and analysis of the nucleotide sequence are described in an accompanying paper (Ohyama et al., 1988).

† Present address: Gene Expression Laboratory, The Salk Institute, San Diego, CA 92138-9216, U.S.A.
‡ Present address: Institute of Applied Microbiology, University of Tokyo, Tokyo, 113, Japan.
§ Present address: R & D Center, Unitika Ltd., Uji, Kyoto, 611, Japan.
‖ Author to whom all correspondence should be addressed.
¶ Abbreviations used: LSC, large single-copy; IR, inverted repeat; bp, base-pair(s); kb, 10^3 base-pairs; ORF, open reading frame; SD, Shine & Dalgarno.

0022-2836/88/160299-33 $03.00/0 © 1988 Academic Press Limited

3. Results

The entire gene organization of this region is presented in Figure 1. The DNA sequences with deduced coding and amino acid sequences are shown in Figures 2 and 3, which correspond to the a and b regions, respectively, in Figure 1. Nucleotides in the LSC region are numbered with the end proximal to the IR_A region as 1 (Ohyama et al., 1988). Figure 2 shows the sequence corresponding to nucleotide numbers 1 to 42,240. Figure 3 shows the reverse strand from nucleotides 56,410 to 42,011, with an overlap of 230 nucleotides (42,011 to 42,240) with the sequence shown in Figure 2.

Figure 1. Gene organization in a region from J_LA (open triangle) to atpB of the LSC region. Genes identified are indicated in either the upper or the lower position of the linearized DNA strands. The letters (a) and (b) with arrows indicate the sequence file names described in the legend to Figs 2 and 3. Nomenclature of the genes is described in an accompanying paper (Ohyama et al., 1988). Coding or exon sequences are represented by filled boxes, and introns by hatched boxes. Numbers indicate the length from J_LA in base-pairs × 10^-3.

(a) Genes for transfer RNAs

We detected 21 possible tRNA genes in the region described here. None of them was duplicated in the entire liverwort chloroplast genome. Judging from the predicted anticodon sequences, they would encode three isoacceptors for serine, two each for glycine, leucine, methionine (an initiator and an elongator) and threonine, and a single species each for arginine, aspartic acid, cysteine, glutamic acid, glutamine, histidine, lysine, phenylalanine, tyrosine and valine. None of the tRNA genes encodes the mature CCA sequence at its 3' terminus, in common with the other tRNA genes detected in the liverwort chloroplast genome. The predicted nucleotide sequences of tRNA molecules are compiled in Table 5 of an accompanying paper (Ohyama et al., 1988).

Three serine tRNA genes, trnS(GCU) (22,892 to 22,979), trnS(UGA) (41,494 to 41,407) and trnS(GGA) (48,845 to 48,932), are the same length in their coding regions, with long loop sequences of 19 nucleotides. They are scattered in the LSC region (Fig. 1), and no other serine tRNA gene was found throughout the chloroplast genome. Cloverleaf structures of their products seem to be normal except for a short D-stem (2 bp) in the trnS(GCU) product caused by C12-A24 and G13-A23 mispairings.
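The coordinate arithmetic behind these statements can be checked with a short sketch. It assumes that coordinates are inclusive and that a start larger than the end marks a gene on the opposite strand, which is how the text quotes trnS(UGA). The helper names (`span_length`, `strand`) and the dictionary layout are illustrative choices, not part of the original analysis; the numbers are copied from the paragraphs above.

```python
from collections import Counter


def span_length(start: int, end: int) -> int:
    """Inclusive length of a feature, whichever strand it is quoted on."""
    return abs(end - start) + 1


def strand(start: int, end: int) -> str:
    """'+' if coordinates ascend, '-' if they descend (assumed convention)."""
    return "+" if start <= end else "-"


# Coordinates of the three serine tRNA genes as quoted in the text.
SERINE_TRNA_GENES = {
    "trnS(GCU)": (22892, 22979),
    "trnS(UGA)": (41494, 41407),
    "trnS(GGA)": (48845, 48932),
}

# Number of tRNA species per amino acid, as listed in the text.
ISOACCEPTORS = Counter({
    "Ser": 3,
    "Gly": 2, "Leu": 2, "Met": 2, "Thr": 2,
    "Arg": 1, "Asp": 1, "Cys": 1, "Glu": 1, "Gln": 1,
    "His": 1, "Lys": 1, "Phe": 1, "Tyr": 1, "Val": 1,
})

serine_lengths = {
    name: span_length(*coords) for name, coords in SERINE_TRNA_GENES.items()
}
total_trna_genes = sum(ISOACCEPTORS.values())

if __name__ == "__main__":
    for name, coords in SERINE_TRNA_GENES.items():
        print(name, strand(*coords), serine_lengths[name], "nt")
    print("tRNA genes tallied:", total_trna_genes)
```

Under these assumptions the three serine genes come out at equal length and the per-amino-acid tally agrees with the 21 genes reported. The same `span_length` helper applied to 42,011 and 42,240 gives the 230-nucleotide overlap between Figures 2 and 3.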

Two of the three chloroplast leucine tRNA genes, trnL(CAA) (3679 to 3758) and trnL(UAA) (50,522 to 50,921), were located 46.8 kb apart in the LSC region, and the other, trnL(UAG), has been mapped in the SSC region (Kohchi et al., 1988). The trnL(UAA) is split by an intron of 315 nucleotides in the middle of the anticodon loop (T35-A36; Tables 2 and 5 of Ohyama et al., 1988). A split structure for trnL(UAA) has been reported for maize (Steinmetz et al., 1982), Vicia faba (Bonnard et al., 1984) and tobacco (Yamada et al., 1986); the length of the intron varies from 315 (liverwort) to 501 (tobacco) nucleotides. As was pointed out by Bonnard et al. (1984), the intron of the liverwort trnL(UAA) shares some sequences similar to those found in fungal mitochondrial group I introns (Davies et al., 1982; Michel et al., 1982; Michel & Dujon, 1983; and see Fig. 4). These sequences are essential for self-splicing in vitro (Burke et al., 1986). However, the liverwort trnL(UAA) intron did not undergo such a reaction using artificial precursors under conditions similar to those used for the self-splicing of Tetrahymena L-rRNA (Kruger et al., 1982) and yeast mitochondrial L-rRNA (van der Horst & Tabak, 1985; K. Umesono, unpublished results).

Genes encoding glycine tRNAs, trnG(GCC) (42,035 to 42,105) and trnG(UCC) (22,047 to 21,385), are 20 kb apart on different DNA strands. The former tRNA would contain a mismatched base-pairing between C26 and A42, making an anticodon stem of 4 bp (Umesono et al., 1984). The cloverleaf

Liverwort

rps’l2 5'

Chloroplast

Genome.

301

II

- trnfM(CAlJ)

TTGAACGAGnAGCCGTATGnAATGAAAAT~TCAAGTACG~TTTTGT~A~TGACAATTT~GGTAACTTA~TTGTCAACT~TTCCACTACnACACCAAAAnAACCAAACTCTGCCTTACG;\ TTPKKPNSALR .rps'1z..ragccg-augaa--gaaa--uucaugu-cgguuy.......(lntron)................cuayy-y-ayT

120

AAAATAGCTi;GAGTTAGACiAACCTCTGGATTTGAAATTACTGCATATAiTCCAGGTATiGGCCATAAT~TGCAAGAACATTCAGTTGTiTTGGTAAGAGGAGGAAGGGiCAAAGATTTA \IARVRLTSGFEITAYIPGIGHNLQEHSVVLVRGGRVKDL

240

CCTGGTGTAkGATATCATAiTATTAGAGGAACACTGGAT~CTGTAGGAG~AAAAGATCG~CAACAAGGG~GTTCTAGTG~GTTGTATAT~ATAATCTAT~AAAATGTAT~ATTTTAGAT~ P G V R Y H I I R G T L D A V G V K D R Q Q G R S Kgugyg.......................................

360

CCTAATTTAiTGCTGATAAiATGTAAAAA~TAGCTAACC~GTGATTAAA~TTTACATTT~AAAACGGAA~AAAAGCAGG~TATATGTAT~TAAAATAAA~TAAAATATT~TCTATATTA~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (1ntron) .. . . . .. . . . .. . .. . . .. . . . .. . .. . . . .. . . . .. . . . .. . . . .. . . . .

480 ..

.

ATACTATACnATATCTAGGETTTTATTTA~AGTTAAAATAAAAATTTAA~TTTTCCCTT~CTTTTTAAT~CAAAATAAA~AAAATTTTA~TTTTTTAGA~CAAGTTAAA~TAAATAGCA~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (1ntron) . .. . . . .. . . .. . . .. . . . . .. . . . .. . . . .. . . .. . . . .. . . .. . . . .. . .. . .

600

AAATAAAAAkATTTATTTTiATACAATATiTTTATAAATkTAAAACACT~GAAACGGAT~CTCAATTAA~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ( 7" t ron).........................................................

720

AGTGAGTAAkCATCAATAAAATTAAACGA~GTAAAAAGC~GTATTCGTT~AAAATCGGA~GTACGGTTT~GAGGGAGAT~AAAAAATCC~CCCTACAAT~TGGAGTAAA~AAGTCAAAA~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..ragccg-a"g~~--g~~~--""~~"g"-~gg""y..................c"ayyy-ay YGVKKSK=

840

AAATTTAAAkTAACTCTTAkATAAAAAAA~TAACTTTAA~TATTTATTA~TATGTCACG~AAAAGTATT~CAGAAAAAC~AGTTGCAAA~CCTGATCCA~TATATCGGA~TCGATTAGT~ rps7> MS R K S I A E K Q V A K P D P I Y R N R

L

V

AATATGTTAcTTAATCGTA;TTTAAAAAA~GGAAAAAAA~CATTAGCTT~TCGGATTCT~TATAAAGCT~TGAAAAATA~AAAACAAAA~ACAAAAAAA~ATCCAT~AT~TGTATTACG~ :! M L V N R I L K N G K K S L A Y R I L Y K A M Y N I K Q K T K K N P L F

V

L

R

CAAGCAGTTcGAAAAGTAAcTCCTAACGT~ACAGTCAAA~CAAGACGCA~CGATGGATC~ACTTATCAA~TTCCACTAG~AATTAAATC~ACACAAGGA~AGGCATTAG~CATTC(~TTG~ Q A V R K V T P N V T V K A R R I D G S T Y Q V P L E I K S T Q G K A LA

I

R

W

960

1080

1200

CTATTAGGAGCCTCACGGAnACGCTCAGGiCAAAATATG~CTTTTAAAC~TAGTTATGA~TTAATTGAC~CAGCCAGAG~TAATGGAAT~GCTATTCGT~AAAAAGAAG~AACTCATAA~ LLGASRKRSGQNMAFKLSYELIDAARDNGIAIRKKEETHK

1320

ATGGCAGAAGCTAATAGAGCTTTTGCTCAiTTTCGTTA~T~ACGT~TAAATTATA~AAAACAAT~TTTATTGTA~TGAAATATG~TTTAATATT~TTTATTATT~CAAATATTT~ +---------------~---------> f---, <---+ MA E AN R A F A H F R ===

1440 <-~.-.---

AATACAATAnAAATTGTTTiAGTTTTTTTiTnTTATTTT~ATTGTA~A~AATTTATTT~TTGGAAAAT~TTTATGAAA~TAGAACTTG~TATGTTTTT~TTATATGGA~GTACTATTT~ TTGTTT> TTTAAT> ndhz, GGA MKLELDMFFLYGST:L + ACCAGAATGiATTTTAATTiTTAGTTTATjAATTATTTT~TAATTGAT~TAACATTTC~T~AAAAGA~ACAATTTGG~TATATTTCA~CTCCTTAAC~AGTTTATTA~TAAGCATAA~ PECILIFSLLIILIIDLTFPKKDTI W L Y F I S L T S L L I

1560

1680 S

i

I

AATATTGTTaTTTCAATACiAAACAGATCCTATTATTAGiTTT~AATAGAATT~TTCAGTCAT~TATAGTATT~TGTTCCATT~TATGCATTC~ ILLFQYKTDPIISFLGSFQTDSFNRIFOSFIVFCSILC'P TTTATCAAT;GAATATATTAAATGTGCAA;\AATGGCTAT~CCTGAATTT~TMTATTTA~ATTAACAGCiACTGTCGGAGGAATGTTTTiGTGTGGAGCiAATGATTTA;;TTACTATTTj L S I E Y I KC A KM A I P E F L I F I L TAT V G G M F L C G A N D L V

1800

1920 T‘

F

TGTTTCGTTi\GAATGCTTGnGTTTATGTTCTTATTTATT~TGCGGT~T~CAAA~GAG~TATTCGATCiAATGAAGCTGCTATTAAATATTTACTTAT;\GGTGGAACAilGTTCTTCGAT V S L E C L S L C S Y L L C G Y T K R D I R S N E A A I K Y L L I G G T S S

2040 i;

I 2160

TATTTGTATiCTCGTAGGnCTTGCATTTAi\ACTTTCTTT~GTTCCATTT~ATCAATGGA~TCCTGATAT;TATGAAGGAGTGCGATTCGiTAAAAAAATiATTTAATAAGTTTAATAAAA I C I L V G L A F K L S L V P F H Q !d T P D I Y E G gugyg....................................

2280

ACTCAATAGi\TATATATATi\TATAAATAT;\TTTTTTTTT~ACACAAATT~TAG~T~TTTACAAAA~AAAATAACG~AAAATTTCA~CAATTAATT~TTATTTTTT~ATAAAAATT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (1ntron) . . .. . . .. . . . .. . .. . . . .. . . . .. . . . .. . . .. . . . .. . . .. . . . .. . . .. . ..

2400

TTAMnnnGiTnnTAAATAiTACGGAGTniTTGM~TTiAACCTAAAAGTGTAAAAACATAAGAATAG;AATAATAATA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (intron)........................................................

2520

ATATT~TTCCTAAAAiAAATTGAATiAATAACTATiTTCGAAAAG~TTAAAAATT~ATTTAATTT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (intron)........................................................

2640

TAATTATACnTACACcAAG;\AncTTTTnnnnnnTTGATT~ATAMATTT~TTATTACTT~GGAGCCGTG~GAATTGAAA~TCTCATGCA~GGTTTTGAA~GAGAGAAAA~ATAATTTTT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..~~g~~g-~"g~~--g~~~--""~~"g"-~gg""y.........................

2760

TTTTTCGACiCTAACTCACCCACCCCAGTCGTTGCTTTTCTTTCTGTTA~TTCAAAAAT~GCTGGATTA~CTTTAGCTA~TAGAATTTT~AATATTTTA~TCTCTTTTT~ACCAAATGA~ . . . ..cuayy-y-ayS P T P V V A F L S V T S K I A G L A L A T R I L N I L F S F 8 P

2880 N

E

TGGA~TTiTTTTAGAAAiTTTAGCTATjTTAAGTATG~TTTTAGGAA~TCTAGTTGC~ATTACTCAA~CAAGTATGAAACGAATGCT;GCTTATTCTiCAATAAGTCAAATTGGATA; WKIFLEILAILSMILGNLVAITQTSMKRMLAYSSIS.QIGY

3000

ATTCTTATTGGATTAATAACAGGTGATCTj\AAAGGGTAC~CTAGTATGA~GATTTATGT~TTTTTCTAC~TTTTTATGA~TTTAGGAAC~TTTGCTTGT~TTATATTAT~TAGTTTACG~ ILIGLITGDLKGYTSMTIYVFFYIFMNLGTFACIILYSLR

3120

ACAGGAACAi;ATAATATTCETGATTATGCAGGTTTGTAT~T~GATC~TTTATTAAG~TTTTCCTTA~CATTATGTTiATTATCTTTAGGAGGACTTCCTCCTTTAACTGGCTTTTTi TGTDNIRDYAGLYIKDPLLSFSLTLCLLSLGGLPPLTGFF

3240

G~TTAiATTTATTTTi;GTGTGW\TGGCAATCAGGT~TTTATTTAT~AGTTTTTAT~GCATT~TT~CAAGTGTAA~TTCACTTTA~TATTATTTA~AAATTATTA~ATTAATTTT~ GKLYLFWCGWQSGFYLLVFIALITSVISLYYYLKIIKLll

3360

ACTAAAAAA6nTMT~iAAATCCTTAiATTCAAGCTiATATTATTACATCACCAACiTTTTTTTCT~MAATCCTA~TGAATTTGT~ATGATTTTT~GTGTATTAGGATCTACTTTi TKKNNEINPYIQAYIITSPTFFSKNPIEFVMIFCVLGSTF

3480

TTAGGCATTATTATAAACCCTATTTTTTCiTTTTTTCM~TAGTTTAT~TTT~GTGT~TTTTTTATT~AATAGAAAT~TTTGTTTTT~TTAAGGGTA~TAAAACTTT~TATATATAT~ L G I I I N P I F S F F QD S L S L S V F F I K === +-------

Fig. 2.

----

3600

302

K. Umesonoet al.

TATATATATATATATATATnTATATATATATATATATATACACGCGAGA~TCAAAATTT~ -------------------)<------------------~-----------+

3720

LeU-cAA>

5'-GCCUUGAUGGUGAAAUGGIJAGACACGCGAGAUUCAAAAUUUC

GTGCTTAAAGCATGGAGGTiCGAGTCCTCiTCAAGGCAA~AAAATAAAA~TATTTAGTT~AATTTTTTA~ATAAATATT~TTTATGTTA~ACTATTCTA~TGATAGTAA~GAAATTTAT~ GUGCUUAAAGCAUGGAGGUUCGAGUCCUCUUCAAGGCA-3'

3840

TATGTAAAAEATATATTTTiTATGAACATiGATTAAATTGAAAAACAGA~TCATATTTT~ATCAGATGA~

3960

ATTTTTAATkTTTTTAAATAACCCTTAAAATAATATAATiATATTTTAG~ATTTATTGC~ACGGCTTTG~TTATTTTAA~TCCTACTGC~TTTTTACTT~TTCTTTATG~ ORF34> MEVNILAFIATALFILIPTAFLLILYV

4080

ACAAACAGCiAGTCAAAAT;GTTAAATTT~TCAAAATTT~TTClTAGAA~ATTTATATT~TTTAAATAT~TATAAAAAG~AGAAAAAAA~ATAAAATTT~TATTTTTTT~TTCTTTTTT~ +--------------------, (-------------------$ Q T A S Q N S === TTCAAA> TATATT>

4200

ATTACTATT;CAATTATTA;\nTTATTAAAiAAGAAATGA~TCATATGGA~CTAGGACCT~GTACAATAC~AGGCGTTGG~TTAATTATA~TAGGTCTAT~TTTATATGC~CTTAAATTA~ ORFl35 MNHMELGPSTILGVGLIIIGLFLYALKLR

4320

GGGAACCTTi\TGTTTCTAGnGGTGTGATTiAAAATGTAC~TAAAAAATT~TCAACATAT~TTGAATATA~AATTAATAT~AAACTTGAA~ATTTTTTAA~AAATTAGAA~AACTTTATT~ E P Y V 5 R Dgugyg..............................(~ntron)........................................................

4440

AAAAGTTTTI\TAAAACCTAiGGTTAAAAT;\TTTAAACAT~TTTATGTTA~TTCGAAAAT~TACTCTTTA~ATGTAAAAA~TAGGATTTT~GTTAAAATT~TTTTTTCTC~AGAGAAAAA~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (lntron) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .._...

4560

CAACAACAAEAACAACAACEAAATTTAAT~TGAACCTAA~GATTTTTTA~AATAGTACA~AAAATATAA~TTTTTATAA~TAACAAATT~TTAATTTTT~TTTGTTAAT~ATAAAAAAA~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..(1ntron)........................................................

4680

GACAATCCAl\GACTTAAAAAnnnTTACTTGGnsTTTAGAG~AAAAACCAT~TTTTATGGT~TTTGTTTAA~CCATACAGA~TTGAAAATA~CATATATGG~TTTCAAGGG~GGAAAAAGA~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .(lntron) . ..ragccg-augaa---gaaa--uucaugu-cgguuy...................

4800

TTATAAAATtTAAAACCTAiCCTAATATT~TGATTTTTT~TTTTCATGT~TTGGATTAT~ATGTGGAGG~ATTCTTTTT~TTCAGGGTT~GCGCTTAGA~CCTATTCTT~TATTATCAC~ YDFFFSCIGLLCGGILFFQGWRLDPILLLSQ . . . . . . . . . . . . . . ..cuayy-y-ay

4920

AATTTTATTitAGTGGAACAi\CTATTTTTTiTATTGCAGA~AGTCTTTAT~TAAGAAAAA~TCTCAATTT~GTAAAATCT~AAAAAAAAT~CATAAATTT~GCAAAAAAA~ATATATATA~ I L L 5 G T T I F FIA E S L Y L R K N L N F V K S K K K Y I N L AK K N

5040 I

Y

K

ATACATTTAiGAAAATTTTkAATTAAAAA6AAAATGGAAjGAATTAAAT~ATACAAGAC~CATTTTTTA~AAAAAAAAA~AGCATTGAA~TTTTTGAAT~CAATGCTTT~TTTAAAAAA~ Y I Y E N F K L K K K W N E L N Y T R H I F Y K K K K H === +---------------, <---------------+ CTTATATTTnTAAACCACTiCGTCCCCAT~CAACAAGTG~AAGAGAAAA~GTAAAAATA~CCATTAAAG~AGCCCAGGC~ATATTAATA~TATCCATTT~AAAATCCTT~ATTTTGCAT~ === L G S R G W V V L S L S F T F I V M L A A d A I N I I D M AGGA TTTTTTTTAkCTATCAAAAkTTTTTTCTTiCTATATAATkTATATATAT~TATATATAT~ +----. -----------------,c~-----------------

-----+

5160

5280


+--~------------,<----

TATATATATiiTAAATATAGbTnGAAATATTTTTTTTTTA~ATAATTTTT~ATATTTTTT~GCTAATTAA~AAAATGAAA~ATTAACTCG~TTTTTTTTT~TTATTGTTT~AATCTTATG~ +------<--------------+ > +

5520

TGGGTTGTCiATAATATAAiAAATTAAAAAACATTTAAA~ACGTATCAG~CTATAATAC~TAGAGCAGT~CCAATTTGT~TTTTTAAAT~ACATAATAG~CAATCCACA~TTTCTATAT~

5640

TTATTAATAiiAGCGACACCCAGATTTGAACTGGGGATAA~GGATTTGCA~TCCTCTGCC~TACCACTTG~CCATGTCGC~TTTATTTTA~TTAATTGTA~AATACAATG~ATTATTTTA~ 3'-UUCGCUGUGGGUCUAAACUUGACCCCUAUUUCCUAAACGUCAGGAGACGGAAUGGUGAACCGGUACAGCGG-5' < Cys4CA
5760

AATTCAAGGEATATTCAATkTTTTTTATA;\CAAAATATA~TCAATTTTT~TTTCTAATA~TTAATAAAA~ACACTCCAA~AATAATTTA~AGGAAAATA~GGAGATTTT~ATACTCCCT~ TATAAC>

5880

AATTTGGTAiiAATACAATTiGAAGGATTTAATCGTTTTA~AAATCAAGG~TTGAGTGAA~AACTTAGTA~TTTTCCAAT~ATTGAAGAT~TAGATCAAG~ATTCGAGTT~CAAATATTT~ FGKIQFEGFNRFINOGLSEELSNFPIIEDIDQEFEFQIFG

6000

GTGAACAATi\TAAATTAGCkGnnccnTTAiTAAAAGAAA~AGATGCCGT~TATCAATCT~TTACCTATT~ATCCGACGT~TACGTACCA~CTCAATTAA~ACAAAAAAA~AAAGGAAAA~ EQYKLAEPLLKERDAVYQSITYSSDVYVPAQLTQKKKGKI

6120

TACAAAAAC2\AATAGTTTTiCTTGGAAGTATiCCTTTAA~GAATTCTCA~GGTACTTTT~TTGTTAATG~AGTAGCTCG~GTTATAATT~ATCAAATTT~ACGAAGTCC~GGAATTTAT~ Q K Q I V F L G S I PLM N S 0 G T F V V M G V A R V I I N Q I L R 5 P G

I

Y

Y

6240

ATAATTCAGAATTAGATCAiAACGGAATTCCTATATATACAATATGGGC~CGTATAAGT~ N S E L D H N G I P I Y T G T L I S id

I

S

K

6360 W

G

G

R

L

K

L

E

I

D

G

K

T

R

I

WAR

AAAAAAGAA6AGTTTCTAT;TTAGTTTTA~TATTAGCTA~GGGTTTAAA~TTACAAAAT~TTTTAGACA~TGTTTGTTA~CCTAAAATT~TTTTAGAGT~TATAAAAAA~AACACAAAA~ KRKVSILVLLLAMGLNLQNILDSVCYPKIFLEFIKKNTKK

6480

AAGAATATCEGAATTCAACiGAAGACGCTATncTGGAAC~TTATAAACA~CTATATTGC~TAGGTGGAG~TCTTTTTTT~TCTGAATCG~TACGCAAAG~ATTACAAAA~AAATTTTTT~ EYPNSTEDAIVELYKHLYCIGGDLFFSESIRKELQKKFFQ

6600

AACAGAGATGTGAGTTAGGcAAAATTGGA~GATTAAATT;A.4ATGAAAT~TTTGTATTA~CACAAGATA~TTTAGCAGC~GTTGATTAT~ QRCELGKIGRLNLNKKLNLNVPENEIFVLPQDILAAVDYL

6720

TAATCAAATiAAAATTTGGiATAGGTACA~TTGATGATA~AGATCACTT~AAAAATCGA~GTGTTTGTT~TGTAGCAGA~TTATTACAA~ATCAATTAA~ATTAGCATT~AATCGTTTA~ IKLKFGIGTIDDIDHLKNRRVCSVADLLQDQLKLALNRLE

6840

AAAATTCAGiTCTTTTTTTiTTTCGAGGAEiCCACAAAAC~AAAACGATT~CCGACTCCA~AAAGTTTAG~AACTTCAAC~CCATTAATA~TGACTTTTA~AGAATTTTT~GGTTCACAT~ NSVLFFFRGATKRKRLPTPKSLVTSTPLIMTFKEFFGSHP

6960

CATTGTCTCAATTTTTAGAiCAAACAAATCCATTAACTG6GAAGAACfG~AAGTTTTCA~GTACGTGAT~ LSQFLDQTNPLTEIVHKRRLSSLGPGGLTRRTASFQVRDI

7080

Fig. 2,

cont.

Liverwort

Chloroplast

Genome.

II

303

TTCACGCTAETCATTATGG;AGAATTTGTCCTATAGAAA~ATCTGAAGG~ATGAATGCT~GACTAATAG~TTCATTAGC~ATTCATGCA~AAATAAGTA~TTTAGGGTG~TTAGAAAGT~ HASHYGRICPIETSEGMNAGLIASLAIHAKISILGCLESP

7200

CATTTTATAkAATATCTAAkTTATCGAATiTAGAAGAAA~TATTAACTT~TCTGCTGCT~AAGATGAAT~CTATCGAAT~GCTACTGGC~ATTGTTTAG~ATTAGATCA~AATAGTCAA~ FYKISKLSNLEEIINLSAAEDEYYRIATGNCLALDQNSQE

7320

AAGAACAAAiTACTCCTGCGCGCTATCGA~AAGATTTTGiAAGTATTTT~CCTTTACAA~ATTTCTCCG~TGGAGCATC~CTTATTCC-~ EQITPARYRQDFVAIAWEQVHLRSIFPLQYFSVGASLIPF

7440

TTCTTGAACRTAACGATGCnAATAGAGCTiTAATGGGCTCTATAGAAAG~CAAACAGCG~ LEHNDANRALMGSNMQRQAVPLLKPEkCIVGTGIESQTAL

7560

TAGATTCGGGAAGTGTTACiGTCTCATCGCATGGAGGAAjATCAAATTA~TTTATCCTT~AAAAAAAAA~AAATTGATA~AAATTTAAT~ATATATCAA~ DSGSVTVSSHGGKIEYLDGNQIILSLKKKKIDKNLIIYQR

7680

GTTCTAATAkTAGTACGTGiATGCATCAA;\AACCTAAAG~AGAAAAACA~AAATATATA~AAAAAGGAC~AATTTTAGC~GACGGAGCT~CTACTGCAA~TGGCGAATT~GCTTTAGGT~ 8 '/ 'G STCMHQKPKVEKQKY!KKGQILADGAATANGELALGK

7800

AAdATATTTiAGTAGCTTAiATGCCTTGG~AAGGTTACA~TTTTGAAGA~GCAATTTTA~TTAACGAAC~TCTAATTTA~GAAGATATT~ATACTTCAA~TCATATTGA~AGATATGAA~ II ! L VA Y M P W E G Y N F E 0 A I L I N E R L I Y E 0 I Y T S I H I E R

7920 Y

TTGA~GCTCbTGTAACAAGiCAAGGTCCT~AAAAATTTA~TAATGAAAT~CCCCATTTA~ATGATTACT~ACTTCGTCA~TTAGATCAA~ATGGCATTG~ATTAACAGG~TCTTGGGTr~ E A R 1 T S Q G P E K F TN E I P H L 0 D Y L L R H L D Q N G I V L T G SW

E

I

V

E

8040

IGlC~GGAG6TGTTTTAGTEGGAAAATTA6CACCTCAAGjAGTAGCAAC~TCGAAAGAA~ T G DlLVGKLTPQETEENLRAPEGKLLQAlFGIQVATSKET

8160

C-TGT:TTAkAGTCCCTCCkGGAGGTAGGi;GTCGAGTTA~TGATATTCG~TTAATCTCT~AAGAAGACA~TTCTGCTAA~ACAGCACAA~TTATTCATA~TTATATTTT~CAAAAACG.r~ C L h : P P G G R G R V I II I R L I S Q E D N S A il T A Q I I H I Y I L Q

8280 K

R

K

'I:~-~~AA4TI\GGTGATAAAGTTGCTGGAAGACATGGAAATAAAGGTATTATTTCAAAAATATTACCAAGACAAGATAT~CCTTTTTTA~AAGATGGTA~ACCAATAGAiATGATATTA~ 1 G D K V A G R H G N KG I I S K I L P R Q 0 iI PFLQDGTPIDMILS

8400

;-CC?TTAG6CGTACCTTC;CGAATGAATGTAGGACAAA~TTTTGAATG~TTGTTGGGT~TAGCAGGAA~TTTTCTTCA~AAAAATTAT~GAATAATTC~TTTTGACGA~CGATATGAA~ 3 L G /I P 8 R M N V G Q I F E C L L G LAG S F L ti K F? i'R I I P F D E R

Y

E

R

i;~~liCCTCAAGAAAGCTAGTCTTTTCTGAACTTTATAAAGCAAGTAAAAAAACAACA~ATCCATGGT~ATTTGAACC~GATAATCCC~GAAAAAATC~ACTAATCGA~GGAAGAACA~ :. 8 R K L V F S E L Y K A 5 K K T T N F > ! L F E p D: P G K N R L ID

G

R

T

G

T

Q

Q

P

C?C-TCGAGcAAGATCTAGAAGAGGAGGTCnAAAGAGTTG~TGAAATGGA~GTGTGGGCT~TAGAAGGCT~TGGTGTAGC~TATATTTTA~AAGAAATGT~AACTATAAA~TCTGACCAT~ R G P S R R G G Q R V G EM E VW A L E G F G VA Y : L Q E M L T I K 8 D

H

I

--CGdGCTCsTTATGAAGTiCTTGGTGCTjlTTGTTACTG~AGAACCTAT~CCTAAACCA~ATACTGCTC~GGAATCATT~AAATTACTT~TAAGAGAAT~ACGATCTTT~GCTTTAGAA~ R : R \- E V L G A I V T G E P I P K P II T A P E 8 F K L L V R E L R S LA

L

E

I

L

R

I

;-sd~ATTTiTGAACAACCiATAACTATTGGAAAAGCTT6TGCATTAGT~ACACAACAA~ : F E Q P I T I G K A Y M L K L I

8520

8640

8760 H

Q

V

D

D

K

I

H

A

R

S

TTaATCATGjTAiTATATGiGAAAAAAATiTGAAATTAAj~TTGGAATTT~TATGTTATG~CTTATCAAA~AAAACATCA~CATCTTCGA~ ', I. 'J : ! C E K N L K L K L K E I === GGA rpoCl>

S

G

P

Y

A

L

V

8880

9000

9120 M

T

Y

Q

K

K

H

Q

H

--~AATTAGCCTCACCTGAACAAATACGTAATTGGGCCGAAAGAGTGTTACCAAATGGT~AAATTGTTG~TCAAGTAAC~AAACCTTAT~CATTACACT~TAAAACACA~AAACCAGAA~ .ASPEQIRNWAERVLPNGE:VGQJT'PfTLHYKTHKPEK

9240

P~GPTGG:TTATTTTGCGAAAAAATTTTCGGACCTATTAAAAGTGGAAT~TGTGCATGT~GAAAATATC~AGGTATTGA~AAGAAAAAA~AAAATATAA~ATTTTGTGA~CAATGCGGA~ ;m 1: L : CEKIFGPIKSGICACG
9360 G

V

TGGAATTTP;TGAATCTCGnATTCGAAGA~ATCGAATGGEATATATTAA~TTAGCATGT~CTGTAACTC~TGTTTGGTA~T?AAAACGT~TACCTAGTT~TATTGCAAA~CTTTTGGCr~ E F : E S R I R R Y R M G Y I K LA C S d T H V: i L" R L P 8 Y I AN L LA

9480 K

APCCTCTTA6AGAGTTAGA6ncTCTAGTTinCTGCGATG~GTGACTTGA~CATATTGAT~TTGATTTTA~CAGATATAA~TAACTGTTA~TTTAATTTA~TTTTAATAA~TACCTCAAA~ ' L i E L E S L V Y C D gugyg.............(lntron)........................................................

9600

-TTG~CATTnCTTAATAAARATATGAAAT~AAGCTCAAA~AAAACAATT~TCATATTAT~TTTTGTATT~GTTTAGAGG~TTTAATCrA~GGTTATAAA~AAAATTATT~GCTATTTAG~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..(lntron)........................................................

9720

TTTCGTAAAAAAAATAATGAAAAAACTATTAAATTTTTTAATGGTAAATTTAAAGTAAA~TCATCGCAA~TATACTAAA~AAAAAACCA~CGAAATAGA~CAAAAACCT~AAGGATAAA~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..(1ntron)........................................................

9840

DKAAATCAA6ATACAGACA6GAAAAATCTjTATTAAATA~TATAATAAA~AATATATAG~AATAAACAT~TATAAACAA~TPTTTTTAA~GAAATAAGA~AACTCAATA~TAAAAAAAT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (1ntron) . .. . .. . . . .. . . .. . . . .. . .. . . . .. . . . .. . . . .. . .. . . . .. . . . .. . . .. .

9960

ACTTGAGTCnTGAGTAGCA;TTTTTTTTiTTTTGATTT~TTATATTTA~AATCAAAGT~TTTAATAAA~AAATTATAA~TPTlT~~AG~~GGATGACGG~AAACTTTCA~GTCCGATT'~~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (1ntron). .. . . . .. . . .. . . . . . . ro','.'.'l-d"',,,a-gdaa---uucdugu-cgguuy.

10080

TAGGGGGGGAATTCTATAA6TAACCTATCCCAATCTCTT~CTTGCTAGA~CTATAACTA~AAAACCCAC~TTATTAAAA~TA~AA~,f,~TiATTTAAATA~GAAGATCAA~CTTGGAAAG~ cuayy-y-ayL F L A R P I T K K P T L L Y .*...................... 0 (> I t?YEDQSWKa

10200

TATTTTTCCiCGCTTTTTTiCTCCTAGAG~TTTTGAAGTiTATTiAAA~~A~.AAl;AA(.TAATTT6AACTTACAAkATGTAATAA~ : F P R F F S P R G F E V F Q N R E I A T G G Ii Al

10320 0

Y

Q

N

V

I

TCTTGCACAcTTAGAATGGiAAGAGTTTG~TGAACAAAA~TCAACTGGA~ATGAATGGG~AGATAGAAA~ATT~Af,~f,A~r,AAAA~,Ali.;ITlAf;TTAG/\CGAATAAAACTAGCTAAAC;\ L A H L E 'vl K E F A C Q K S T G N E W E D R Y I II // ' Y II I I V I( RIK

L

A

Ki

TTTTATTCA~ACAAATATA6AACCAGAAT~GATGGTTTT~TCATTATTA~CAGTGCTTC~CCCGGAATT~Cf~T~~AAi~,~II~,AA~.lAf,~,f.f,AAf,f;Tf~AnrrnnrnnCnjcTGATTTAA~ F I Q T N I K P E W M V L 5 L L P V L P P E L P /1 M II ! f, I (, I I IT

S

D

L

TGAACTTTA~AGAAGAGTT~\TTTATAGAA~TAATACTCT~CTTGATTTT~TGGCACGAA~TGGTTCTA(.~I.~.A~,~,A~,~,I~IA~,II~,~I Et Y R R V I Y R N NT L L D F LA R S G 8 T II f, I, \,

Fig. 2, cont.

Cl

i

I

I

/j

L

N

L

V 10440

10560

I~,(.~AAAA~,C~TTAGTTCAA~AAGCTGTTG~ I. II Y I( i VQEAV?

Y 10680

304

K. Umesono et al.

10800

10920 TCTTATTGG;\CGAAATTTTi;CTCCTAATCiGAGAGCAGC~AAAACTATG~TTCAAAATA~AGAACCTAT~ATTTGGAAA~TACTTCAAG~AGTTATGCA~GGACATCCT~TTTTATTAA~ L I G R N F A P N L R A A K T MIQ N K E P I I W K V L 0 E V M Q G H P I

11040 L

L

N

TAGAGCACC;mCATTACATi\cATTAGGAA~ACAAGCATT~CAACCAATT~TAGTAAATG~ACGAGCTAT~CATTTACAT~CGTTAGTTT~TGGTGGTTT~AATGCTGAT~TTGATGGAG~ R A P T L H R L G I Q A F 0 P I L V N G R A I H L H P L V C G G F N A D F D

G

D

TCAAATGGCiGTTCACATAECTTTATCATiAGAAGCTCA~GCAGAAGCT~GTTTACTTA~GCTTTCTCA~AAAAATTTA~TATCTCCAG~TACAGGAGA~CCTATTTCT~TGCCAAGTC~ QMA V H I P L 5 L E A Q A E A R L L M L 5 H K M L L S PAT G E P I S V

P

8

Q

AGATATGCTiCTTGGACTTiATATTTTAACAATTGAAAAiTAAAAAAAA~TTTTCTCAA~TACCTTATT~ D M L L G L Y I L T I E N N Q G I Y G N K Y

P

Y

F

11160

11280

11400 N

P

S

K

K

Y

D

S

K

K

K

F

S

Q

I

11520 TATTGAAATiCAATACAAAiCTTTTGGnAhTTCTTTTCA~ATTTATGAA~ATTACCAAC~TAGAAAAAA~AAAAACCAA~AAATTATTA~TACTTATAT~TGTACAACA~CTGGACGJA~ I E I Q Y K S F G N S F Q I Y E H Y Q L R K N K N Q E I I S T Y I C T T TCTTTTTAAiCAACAAATTbAAGAAGCTAiACAAGGTACjTGGATAAAA~AGGAGCCAT~CATTTCAAT~ L F N Q Q I E E A I Q G T Y K A S L K 0 K

T

F

V

4

K

I

E

K

N

G

===

A

G

R

+-----,

11640

I

11760

<---

GCTTAAATTCATTGTTTATiTATGTTTAT~AAAACAAGAGGTTTTTTAT~ATGGCAGAA~CAGTCAATT~GATATTTTA~AATAAAGTT~TGGATCGAA~TGCCATAAA~CAACTTATA~ + rpCi!> GAGG i.lA E P V Nt I F Y N K V M D R T A I K Q L

1188G J

S 1200G

CAGCACCTTCTAAAAGTTGGCTTATTGAAi;AIGCAGAAC~ATATGGTAA~CTTTCAGAA~AACACCATA~TTATGGGAG~TTACACGCA~TAGAAAAAT~GCGTCAACT~ATAGAAACA~ A P S K S W L I E D A E Q Y G N L 8 E K H H ni Y G SLH A :! E K L R Q L I

E

T

14

GGTATGCTAEAAGTGAATAiTTAAAACAGGAAATGAATC~TAATTTTCG~ATAACAGAT~CGTTAAATC~AGTTCATAT~ATGTCTTTT~CCGGAGCTC~AGGCAGCAC~TCTCAAGTT~ Y A T 5 E Y L K 0 E M 1‘1 P N F R I T 0 P L M P V H MM S F 8 G A R G S T S

Q

V

H

Y

G

ATCAATTAGiAGGTATGAGkGGATTAATGiCAGATCCTCATAATTTTAG~GAAGGTTTA~CTTTAACAG~ATACATAAT~TCCTGCTAT~ Q L V G M R G L M S D P Q G 0 I I D L P I Q S N F R E G

12120

12240

12360 L

S

L

T

E

Y

I

I

SC

GAGCACGGA;\AGGAGTAGTnGATACTGCAGiACGTACCT~TGATGCAGG~TATCTTACT~GAAGACTTG~TGAAGTAGT~CAACATATT~TTGTCCGAA~AGTAGATTG~GGTACTCTT~ ARKGVVDTAVRTSDAGYLTRRLVEVVQHIVVRKVDCGTLY

12480

ATGGTATAAl\TGTAAATAAiTTATCAGAA~AAAAAAATA~TTTTCAACA~AAATTAATC~GACGTGTGA~TGCAGAAAA~AT~TATATA~ATCATAGAT~TATTGCTCC~CGAAATCAA~ G I N V N N L S E K K N N F Q Q K L I G R V I A E N I Y I D H R C I A P R

N

Q

D

ATATCGGCGEACTTTTAGCEAATAGATTAATAACATTAA6ATTATGCTA~GGTTGGAGT~ I G A L L A N R L IT L K T K 0 I F L

li

S

L

12600

127?3 R

S

P

L

T

C

K

S

M

N

W

I

C

Q

L

C

Y

G

1284C

GAGATATTGCAGAGCATGTACGAACCCCTiTiAATGGAAiTATTGAATT~AATGAAAAT~TTGTATATC~AACACGAAC~AGACATGGA~ATCCTGCAT~GATGTGTCA~ACTAATTTA~ D I A E H V R T P F bJ G I I E F 1‘1 E M F V Y P T R T R H G H PAW M C H TN

12960 L

F

TTTTAGTAAjTAAAAGTAA;AATAAAGTACATAATTTAA~TATTCCACC~AAAA~TTTA~TATTAGTTC~AAATAATCA~TACGTGGAA~CCAAACAAG~TATTGCCGA~~TTCGGGCT~ L V I K S K N K V H N L TIP P K 8 L L L V Q N N Q Y V E S K Q V I A E I

R

A

K

AAACATCAClTTTTAAAGAkAAAGTTCAAkAATATATTT~TTCTAATTT~GAAGGCGAA~TGCATTGGA~TACAAAAGT~CGTCACGCT~CTGAATATA~ACATAGTAA~ATTCACCTT~ T S P F K E K V Q K Y IY S U L E G E 14 H W ST K V R HA 8 E Y I H S N I

H

L

I

TACTTAAAAEGTGTCATAT;TGGATATTAiCAGGAAATT~TCATAAAAA~AACAATGAT~TATCTGTAT~ATTTTATAA~AACCAAGAT~AAATTGATT~TCCAATTTC~CTTACAAAA~ L K T C H I W I L 8 G N F H K K N N D L S V L F Y K N Q D K I D F P I S L

T

K

E

AAAAAAATGiATTTTCTJTiGTAAAAAATkAAACTCAAT~AAATCTTTT~CTJTTTCAT~TTTATCTTT~TAAAAAGAA~AAAATTTTT~~TAAATCCC~ATTAACAAA~AATATATTA~ K N E F 8 F V K N K T Q L N L F L F H F Y L Y K K N K I F I K S Q L TN

I

L

N

133SQ

13200

i332C

13440 N

13562

1366:

TTTTTATAA6AAATAATAAkTTTATTCAA~CAGGTACGC~TATTACTTr~AATATAAGG~GTAATACCA~TGGATTAGT~AAAATTCAA~AAAAAGGAA~TAATAATTA~GAGTTAAAA~ FIKNNKFIQAGTLI TSNIRSNTNGLVKIQKKGNNNYELKI

13921‘

TATTACCTGcAACTATATAiTATCCAAAT~AAACATATA~AATTTCAAA~CAAATAAGT~TTTTAATAC~ACCAGGAAA~AAACTTTTT~ATGAATTTG~ATGCAAAAA~TGGACATAT~ LPGTIYYPNFTYhlSKQI SILIPPGKKLFNEFECKNWTYL

14040

TTCAATGGAiTATGCCTTCiAAAGAAAAACCGTTCGTTT;TTTATTAAA~AAAAATAAA~ QWIMPSKEKPIV~IRPAVEYKISKKLNKSTLFDLLKKNKK

14161'

AAGTAGAAAiTAAAACTATRAATTATCTT~TTTAC~AAG~TGAC~AACA~ATTCAAATA~TAAATGAAA~AAACATTCA~TTAATTCAA~CTT~Tr~-AC~TGTACATTG~AAAAAAAAA~ VEIKTINYII YLII II F 0 [ 0 I I N E K N I Q L I Q T C L L V H W

Fig. 2, cont.

142X K

K

K

Y

Liverwort

Chloroplast

Genome.

305

II

14400

14520

TTTCAAAAAATGTTTTAAAnAAAAACTATinTGATCATTiATAATAT~TAATCAAAAiAATGGAATG~ SKNVLKKNYYDHFFSISKNELKNKKQGVIRIISNQNNGMQ

14760

15000

AATTGCTTTiCGAAAATTTiGTGATATCTAAATATAAAACTTATAAGTA~AAATATAAA~TArTTTATT~TTCGATTAG~TAAACCTTA~CTGGCGACT~ L L F E N F V I S K Y K T S Y P S G 0 I I S I N I N YF I I R

15120 L

A

K

P

Y

LA

T

G

GGGGGGCTAtTATTCATAAiAATTATGGT~AATTTATTA~AGAAGGAGAiACTTTAATA~CACTTATAT~TGAAAGATT~AAATCTGGT~ATATCATTC~AGGTCTTCC~AAAGTTGAG~ GATIHNNYGEFIKEGDTLITLIYERLKSGDIIQGLPKVEQ AATTGCTAGAGGCACGTCCAATAAATTCAGTTTCTATTAkTGATATGAT~AAATTTATC~GTAATCTTT~GGGTTTTTT~TTAAGTACA~ L L E A R P I N S V S I N L E N G F E D W N N D !I I K F I

15240

15360 G

N

L

W

G

F

F

L

8

T

AAATTAGTAiGGAACAAGGnCAAATAAACiTGGTTGATC~AATTCAAAA~GTATATCAAiCTCAAGGAG~ACAAATATC~AATAAACAT~TAGAAATCAiTGTACGTCA~ATGACTTC~~ I S M E Q G Q I N L V D Q I Q K V Y Q S Q G V Q IS 'd K H I E I I V R Q

M

T

SK

AAGTAATAAcTTTAGAAGAiGGAATGACTbnTGTTTTTTiAGTTCCTTA~AAACCCATA~ V I T L E D G M T N V F L P G E L I

K

P

I

K 15480

15600 E

F

S

R

T

0

K

lil

*j

R

A

L

E

E

A

VP

Y

L

TATTAGGAAiAACCAAAGC;TCTTTAAAT~CTCAAAGTTiTATTTCAGA~GCTAGTTTT~AAGAAACTA~AAGAGTTTT~GCAAAAGCT~CGTTAAAAG~CCGAATTGA~TGGTTAAAA~ L G I T K A S L N T Q S F I S E A S F 0 E T T R V LAYAALKGRIDWLKG

15720

GTTTAAAAG6AAATGTTATiCTTGGTGGA~TAGTTCCAG~GGGAACAGG~TCACAAGAA~TTATTTGGC~AATAACTTT~GAAAAAAAA~AAGAAATAT~TTTAAAAAA~AAAAAAGAA~ L K E N V I L G G L V PA G T G S Q E V 11;: Q! TLEiYKEIYLKKKKEF

15840

TTTTTACTA6AAAAATTAAEAATGTTTTTiTATATCAAG~CACATTTTC~ATTTTTCCT~CTACAGAAA~TATTCATAA~GTATTAAAA~AATCAATTT~TCAAAATAA~AAAAATAAT~ F T K K I N N V F L Y Q D T F S I F P T T E I :C )i V L i E S IS Q N N K

15960 N

N

F

TTTCTATTTAAAAAAAAAAiAGAAATTTAATGATATATAiGTAAAACCT~TTTTATATA~ACTTTTATT~TAAATAATA~TATAAAAAT~AAAAATGAA~CAAAAATCTiGGAATATTC~ s 1 === rp.72, MKQKSWNIH

16080

TTTAGAAGAAATGATGGAAGCAGGTGTTCj\TTTTGGTCA~CAAGCTCGG~AATGGAATC~AAAAATGGC~CCTTATATT~TTACAGAAA~AAAAGGTAT~CATATTA~,~~ATCTTACTC~ L E E MM E A G V H F G H Q A R K W N P K MA P i I r T E R KG I H I I N L

TQ

AACAGCTCGnTTTTTATCTGAAGCTTGTG~TTTAGTT~~AATGCGTCA~GTAAAGGAA~ACAATTTTT~ATTGTAGGA~CAAAATATC~AGCAGCTGA~TTAATTGAG~CATCTGCT,:~ TAR F L S E AC D L VAN AS SK G K Q F L ! V G T Y i 0 A AD L I E S S

A

16200

16320 1

AAAAGCTAGATGTCATTAT;TAAATCAAA~ATGGCTTGG~GGTATGTTA~CAAATTGGT~AACTATAGA~ACTCGTCTT~AAAAATTTA~AGATTTAGA~AATAAAAAA~AAACAGGAA~ K A R C H Y V N Q K W L G G M L TN W S T I E T R L 0 v FYDLENKKKTGT

16440

AATAAATCGACTTCCTAAAlAAGAAGCAG~AAATTTAAA~AGACAATTA~ATCATTTAC~AAAGTATTT~GGTGGTATT~AATATATGA~AAGTTTACC~GACATTGTT~TTATTATTG~ I N R L P K K E A A N L K R Q L D H L 0 K Y L G GI ' i 14 T S L P D I V I I

I

I)

TCAACAAAAkGAATTTACAGCTATTCAAG~ATGCATTAC~TTAGGAATT~CTACAATTT~TTTAGTTGA~ACAGATTGT~ATCCAGATA~GACAGATAT~CCAATTCCT~CCAACGATG~ QQKEFTAIQECITLGIPTICLVDTOCJP li M T D 1 P I P A N

D

D

TGCTAGAGCiTCAATTAGAiGGATTTTAA6TAAATTAAC/AAAAATAAT~ACTACTTTC~AATAAAAAA~ AR A S I R W I L N K L T L A I C E G R Y

Id 8 I Y 11 ===

16560

16680

+-------

TAGATTAAAiTAATAACAA6TCTTTTTTTaTTTATTCTTnTACGAAAAA~TGTCTTTTA~TATTTTTAT~TTATTTlT~~AGGAGTAAT~TGTCTCATA~TGCAAAAAT~GCTAGCAC~~ <------------+ > AI,GAG atpI, M S H T A K M A

~_

16800

16920 S T F

TTAATAATTiTTACGAAATkTCAAATGTCi;AAGTAGGTC~ACATTTTTA~TGGCAATTA~GTAGTTTTC~AGTTCA~~~~~AA~TACTA~TAACTTCAT~GATTGTAAT~GCTATTTT,~~ N Id F Y E I S N V E V G Q H F Y W Q L G S F 0 V 4 A C J 1 ITSWIVIAILL

17040

TAAGTTTGGETGTTTTAGCEACTCGAAATiTACAAACAAiTCCAATGGGiGGTCAAAATiTTGTCGAAT~TGrTTTAf~A~TTlATICGr~ATTTGACTA~AACACAAAT~GGAGAAGA,~~ S L A V L AT R N L Q T I PM G G Q N F V E I V L Cl IPDLTRTQIGEEE

17160

AATATCGTCCTTGGGTACCiTTTATAGGAj\CTATGTTTT~ATTTATTTT~GTTTCTAAT~GGTCTGGTG~TCTTTTTf,~~T~~(,GAGTT~TTGAACTCC~TAATGGAGA~CTTGCTGCA~ Y R P W V P F I G TM F L F I F V S N W S G A II P ~1 i, Vi ELPNGELAAP

17280

CAACAAATGl\TATCAATACiACTGTTGCA~TAGCTTTAC~TACATCTGT~GCATATTTTiATGCTGGTC~ACATAAAAA~~~~,ATrAAGT~ATTTTGGTA~ATATATTCA~CCAACCCCA~ TN D I NT T VA L A L L T S VA Y F Y A G L H Y Y I, L 5 Y F G K Y IQ P

17400 T P V

TACTTTTACEAATAAATATiTTAGAAGATiTTACTAAAC~TTTATCACT~AGTTTTCGA~TTTTTGGAA~TATTTTA~~(,~r,AC~~AATTA~TTGTTGCTG~ACTTATTTCiTTAGTACCT~ L L P I N I L E D F T K P L S L S F R L F G N II A IJLI VVAVLISLVPL

17520

TAGTAGTTCCTATACCTATEATGTTTTTAi;GATTATTTA~TAGTGCTATiCAAGCTTTA~TTTTTGCCA~ACTTf,(.Ar,CI1(,CTTACATAEGCGAATCTAjGGAAGGGCA~CATTAATAA~ V V P I PM M F L G L F T S A I Q A L I F A TLA A A f IG E S M E G H H =====z

17640

ATTTTTTTTkTAAAGAAAAiTAAGTATTA~AAAAAAAAT~TATATAATA~GATTGTTTT~ATTTGATTT~GAATTTTAT~ATATAGAGT~AGATATTTC~AATAAAGAT~TATATCTTT~ +---------, <-------+ +-- ----------)<-------. TTGTTT> lAIAAT>

17760

TTTTATTTAACATTTTTAT6AATATCGACATACTAACAA~AATTTtTAT~GGTAATTAG~TATTTCAAT~TTTTATTTA~AAATTTTTG~ATTTAAAATiTAATAAAAC~TTTAGTGAT~ +-----------> <- ----------+ +

17880

Fig. 2, cont.


ClAACACTTnAAAAAGAGAEACTTTGAGTinTTAACTGC;AAAAATTTT~GTAAGCAAC~AAATAGATT~TTAATAAAA~CTTTTTTCA~AAATTTAGT~

18000 atpH>

AAAGGAGAT;ATCATGAACECCTTGATTTtTGCTGCTTC~GTTATTGCT~CTGGATTAG~TGTGGGCCT~GCTTCTATT~GACCTGGAA~TGGTCAAGG~ACTGCAGCA~GTCAAGCTG~ AGGAG MNPLISAASVIAAGLAVGLASIGPGIGQGTAAGQAV

18120

AGAAGGTAT;GCAAGACAGECTGAAGCAG~AGGTAAAAT~CGAGGTACT~TACTTTTAA~TTTAGCTTT~ATGGAAGCT~TAACCATTT~TGGATTAGT~GTAGCTTTA~CACTTTTAT~ EGIARQPEAEGKIRGTLLLSLAFMEALTIYGLVVALALLF

18240

TGCAAATCCEiTTTGTTTAAiAATTTCAAT;TTTGAATAA~TAATTGTTA~TAATTTTTC~TTCTTGAAA~AAAAGAAAG~AAAATTAAT~ACAATTAAA~TATATTTAA~ATTTTTTTG~ +------------------------, <------------------------+ A N p F " ==z===

18360

GGTAAAAATnAAAAAAAGCAAAAATTTTTAAnTGAA~AAAACTGTT~AGTAAACAA~AAACTCCAT~ATTTTCAAT~ATATAATAA~GAAAAAAAG~GGACAGCAT~GAAAATGGG~ +----------) <-------+ GAGG M TTGAGT> TTCAAT, atpl+

18480 E N Gl

CTTATTTTA;TATTTCCTCAAATTTTTGGj\CTATAGCTG~AAGTTTTGG~TTAAATACA~ATTTATTAG~AACAAATTT~ATCAATTTA~GCGTAGTAC~TGGGTTGTT~GTGTATTTT~ Y F I I S 5 N F W T I A G S F G L N T N L L E T Nt I N L G V V L G L L V Y

F

18600 G

GAAAGGGAGiGTGTGCGGGiTGAATATTTGAATAAAAAA~TGGAATGAT~CAATAATAC~TAAATAAAA~AGTATATAA~CCTAACGAA~AACTTTTGG~TAAAAAACT~AAAAGAACA~ K G V Lgugyg.......................................(lntron)........................................................

18720

TAGCATTTCETAAACTCAAkAAATTTATTiTGAGAAGGG6GGATTTTCT~TATCCCACC~AGCTTTTTG~ (lntron)........................................................ ,.......................................................

18840

ATGGTTGAAbATATTTTTGATATATAACACTCATATCAAiCTATAAAAT~TATTAAATT~TGATAATCT~CCCTTAAAT~TTTTTAAGT~CTGAATTGA~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (1ntron)........................................................

18960

GACCTATTTACAATTTATAATTTATATAA~AACAATCTT~CTGACAAGT~TCAAIATTT;TGTCAAAAGAATCATCAACAATTATTTTACGTAAAAAAA~GAAAATAAA~AAAGAAGAT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (1ntron)........................................................

19080

AGTTCAGTCAAATCATCAAAACTTTTTTGiAAAAAACTG~ATAAGAAAG~CGAATGAAT~GAAAAGTTC~TGTTCGGTT~GGGAAGAGA~TATAAAATA~ATATATAAT~TACTTTCAT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..ragccg-augaa--gaaa--""~~"g"-~gg~~y.............................cuayy-y-ay

19200

AAGTAATCTi\TTAAATAATCGTAAACTGACCATTCTAAA~ACTATTCAA~ATGCAGAAG~GCGATATAA~GAAGCTACT~ATAAGCTTA~TCAAGCTCG~ACTCGGTTA~AACAAGCAA~ S N L L N N R K L T I L N T I Q D A E E R Y K E A T 0 K L N Q A R T R L Q

Q

A

19320

ACAAAAAGCkGATGATATCCGAATAAATGGATTATCTCAAGATTCAAAA~ACGCTACTA~ Q K A D D I R I N G L 5 Q M E K E K 0

A Tl K 19440 D L I N A A D E D S K R L E D 8 K N

TCGTTTTGAAAAACAGAGAGCTATTGAACAAGTTCGTCAACAAGTTTCT~GTCTGGCTT~AGAACGAGC~TTAGAAACA~TAAAAAGTC~TTTAAATAG~GAATTACAT~TACGTATGA~ RFEKQRAIEQVRQQVSRLALERALETLKSRLNSELHLRMI

19560

TGATTATCAiATTGGCCTACTTAGAGCCAiGGAAAGTACkATATTCGAC~TGATGAAAT~ D Y H I G L L RAM E S T I E ===

19680 atpA>

MVNIRPDEI

AGCAGTATT2\TCCGTAAACkAATAGAACAATATAATCAAGAAGTTAAAAiTGTCAATATjGGAACAGTACTTCAAGTTGGAGATGGTATiGCACGTATTiATGGTCTTGkTAAAGTTATG S S I I R K Q I E Q Y N Q E V K IV N IG T V L Q V G D G I AR I Y G L D K

V

M

19800

GCAGGTGAAiTAGTTGAATiTGAAGATGGiACAGTAGGA~TTGCTTTAA~TTTGGAATC~GATAATGTT~GTGCTGTTT~AATGGGTGA~GGATTAACT~TACAAGAAG~TAGTTCTGT~ A G E L V E F E D G T V G I A L N L E S 0 N V GA V L M G D G L TiQ E G S

5

V

AAAGCAACAEGTAAAATTGETCAAATACC6GTTAGTGATGCTTATTTAG~CCGTGTTGT~AATGCATTA~CTCAACCGA~TGACGGAAA~GGTCAAATA~CAGCATCTG~ATTTAGACT~ K A T G K I A Q I P V S DA Y L G R V V N A L A Q P I D G KG Q I PAS E F

R

L

ATTGAATCTECAGCTCCAGbTATTATATC~AGACGTTCT~TTTATGAAC~TATGCAAAC~GGACTTATT~CTATTGACT~TATGATTCC~ATTGGACGT~GTCAGCGAG~ATTAATTAT~ I E SPA P G I IS R R S V Y E PM Q T G L I A I D SM I P I G R G Q R E L

I

I

19920

20040

20160

GGAGACAGAEAAACAGGAA6AACAGCTGTkGCTATTGAT~CTATTTTAA~TCAAAAAGG~CAAAATGTA~TATGTGTTT~TGTAGCTAT~GGTCAAAAA~CCTCTTCTG~TGCTCAAGT~ IGQKASSVAQV GDRQTGKTAVAIDTILNQKGQNVVCVYVA

20280

GTTAATACAiTTGAAGATCtTGGTGCATT~GAATATACA~TTGTTGTTG~TGAAACTGC~AATTCGCCT~CTACATTGC~ATATCTTGC~CCTTATACT~GAGCTGCTT~AGCTGAATA~ VNTFEDRGALEYTIVVAETANSPATLQYLAPYTGAALAEY

20400

TTTATGTATEGTAAGCAACkTACTCTTAT~ATTTATGAT~ATCTTTCTA~ACAAGCTCA~GCTTATAGA~AAATGTCAC~TTTATTAAG~AGACCACCA~GAAGGGAAG~TTATCCTGG~ FMYRKQHTLI IYODLSKQAQAYRQMSLLLRRPPGREAYPG

20520

GATGTTTTTiACTTACATTCTCGTCTTTTkGAAAGAGCA~CTAAATTAA~CTCTAACTT~GGTGAAGGT~GTATGACTG~TTTACCTAT~GTTGAAACC~AAGC~GGTG~TGTTTCAGC~ DVFYLHSRLLFRAAKLSSNLGEGSMTALPIVETQAGDVSA

20640

TATATTCCAI\CAAATGTTAiTSCTATTACkGATGGACAA~TTTTCTTAT~AGCTGACTT~TTTAATGCA~GAATTCGTC~AGCAATTAA~GTAGGTATT~CTGTATCAA~AGTTGGTTC~ YIPTNVISITDGQIFLSADLFNAGIRPAINVGISVSRVGS

20760

GCTGCACAAATTAAAGCTAiGAAACAAGT6GCTGGTAAA;ATTAGCTCA~TTTGCAGAA~TGGAAGCTT~TG~TCAATT~GCTTCTGAT~TTGATAAGG~TACTCAAAA~ AAQIKAMKQVAGKL KLELAQFAELEAFAQFASDLDKATQN

20880

CAATTAGCAkGAGGTCAAAEATTACGTGA6TTACTTAAACAACAGATAG~TACTATTTA~ACTGGCGTT~ACGGTTACT~AGATGTATT~ QLARGQRLRFLLKQSQSAPLSVEEQIATIYTGVNGYLDVL

21000

GAAACAGGAcAAGTTAAAARATTTTTAAT~CAATTACGTGCAGAACAAG~AGAAAATCT~ ETGQVKKFL IQLREYLVTNKPQFAEIIRSTKVFTEQAENL

21120

TTAAAGGAAGCTATCACTGkACATATCGA6CTTTTCTTAiTGAATAAAA~TTTATTTAG~ L K E A IT t )I I tl F L F Q E E K ===

+-------

<---

TTAAATAATiAcGTCCAAT6GGATTCGAA~CTATACTGG~GGTTTAGAA~ACCTCTGTC~TATCCTTTA~ACGATGGAC~CTTAAAAAA~AATTTGAAA~GATAAACTC~AAAAAATTT~
Fig. 2, cont.

21240 + <---

21360


TCATTTCAA;\TTCTTTTTTnGAAAAGCGGi;TACGGGAAT~GAACCCGCA~CGTTAGCTT~GAAGGCTAA~GGTTATAGT~GACATTTAT~AAAAATAAA~AATGCCTCT~ATTCAAAAC~ --------------m----+ 3'-UCGCCCAIJGCCCUUAGCUUGGGCGUAGCAAUCGAACCUUCCGAUUCCya-y-yyauc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..yuugg

21480

GAACGTGAAkGTTTTCTTTEATTCGGCTCCTTTATAAAAAATTCAAAAA~TACATGTAT~TTTTGAATT~CTTACCGAT~TTTATAACA~CTATGTTTA~TTTTTTTTT~ c-uguacuu---aaag-aagua-gccgar...........................(lntron)........................................................

21600

CATTTTTATbTAAAAAACAETTTATTAGAiGATCCTTTTAGAAAGATAA~ATAAACAAG~TTAAATTAA~AACTTTCTA~TTACTTCGT~CTTTATTTC~ATTTGTTTT~AATCTTTAG~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (lntron)........................................................

21720

AAATTTTTT;TCACCAAGTiTTTTGTAAT~TTAATAATT~TAGCAAACT~AAATTACTT~AATAATCAC~ATTTTTGCA~CAAAATTTT~AAGATTTAG~TATGATGAA~AAATTTATT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (lntron) . . . .. . .. . . . .. . .. . . . .. . . . .. . . .. . . . .. . . .. . . .. . . . .. . . .. . . ..

21840

TTGATTTCTiTCCATAGATiTTTAGCAAA~CAAAAATTTiTATTTTGAA~TATTCCGCT~TTTTTTAAG~AATAATGCT~TTATTAATA~GACATTAAT~TTAATAATT~AAAAATTTA~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..(1ntron)........................................................

21960

TAAGAAATAAAGTTATAAARAAGAAATTAiGATCCTTTA~CTTTTTTTT~TTTACGAAT~GCACTTTTA~CACTAAACT~TACCCGCTG~ATAATTATT~TATACTATA~ATTAGGTTT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..gygugAAAAUGGUGAUUUGAUAUGGGCG-5'
22080

TATCATATTiiTTTAACTCC6rCCCCGGAA~TAGATTATT~ACAAAATGC~TATTATTTC~GGAGATGGA~AATCAAAAA~ATTACAAAT~ACCTTTACG~GCTGCTAAT~AAGCAATAA~ +----------> <- ----------+ === L N G K R A A L LA

22200 I

V

TAATGGCCCiGATGCAACTkTTAAAGCCA~AACAGTAAG~TGTGCAATA~CTTCTAAAT~CATTCTGGT~TAACCTCTC~TTTTTTTTA~AATTCTAAA~TTTTTACAA~AATTTTTTT~ +-------L P G S A V I L A L V T L Q A I V CL14 M GGAG cORF33

22320

ATTAACGTA~ACTTAAAATkACAATAAAA6AAATCTATA~CGATAATAA~CTGAGATCA~TACAATAAT~ACAAGACGA~ATTCTTTAT~CAAAATTAA~TCCATACTA~CCAAAAAGA~ === F L L L F F R Y R Y Y V SIL V N N V L R Y E K N L I L V M cORF30

22440

AAATATAATiTTGGATTAAiTGCTAATTG~ATTATATAT~TTTTTTTAT~TTTTAACGA~TAAGGAGAA~AAATAATGA~CTCAATTTC~GATAGTCAA~TTATTGTAA~TCTTTTAAG~ TATATT> ORF32> TTGGAT> AGGAG MTSISDSOIIVILLS

22560

GTATTTATAnCTAGTATTTiAGCTTTAAG~CTAGGAAAA~AGTTATATC~ATAAATGAT~TAAATCATT~AATTTTCAA~TTTGAAAAA~AAAGTTTTG~CAAATTTAA~ATATTATTC~ f-----) <-----+ V f I T S I LA L R L G K E L Y Q ===

22680

AAATAATATAGAATATATAiATATATATAiATATATATA~ATATATATA~ATATATATA~TATAGTGTA~TCCATCAAA~TAAATAATT~GAATAAAAA~AATAAAAAA~ATTGAAATA~ +-----------------------)(------------------------+

22800

>

<----------+

TTGAAA>

ATATTTTTTiTATAGTATAnTGAATATAC~ATATGTTGT~TATATATAC~ATTTTTTGT~TTTATTCAA~AACAAAAAT~TTTTTATAT~TGGAGAGAT~GCCGAGTGG~CGAAAGCGG~ +---------> TATAAT> <---------+Ser-GCU> 5'-GGAGAGAUGGCCGAGUGGACGAAAGCGGC GGATTGCTAATCCGTTGTAEAAGCTTTTIGTACCGAGGG~TCGAATCCC~CTCTCTCCG~TTAAAAATT~AATGTTTTA~TCTTTACGT~CAGGATTAC~TCCTGGATC~TTAGATAAA~ GGAUUGCUAAUCCGUUGUACAAGCUUUUUGUACCGAGGGUUCGAAUCCCUCUC~JCUCCG-3' === E K R G P N R G P 0 N S

22920

23040 L

ATCCAAAAAEAAAAAGAGAAACAAAAAAA~TAACCACTG~ATAAACAAA~AGCTTAAGA~TAAGCATGA~ATAATCTCC~GGATCATTA~TAGAAATAG~ATAGAATAA~ATTCAAATT~ FGFVFLSVFFIVVTYVFLKLTLM GAGG
23160

23280

CTTCTATAA6TTAAAAAAAGTCTTATAGAiTTTGAATAA;GAAATTTAA~ATCCGCTTT~ f----------->

23400

<-----------~~

TAAACTTAAbTTACAATAA6AATTTTAAA~TGAGAATTT~TCGAAAACT~ACTGAAGCT~GCCATACAA~GGCTAAAAG~AAAAAAAAC~AAGGTATAA~TGGCATTAC~TCTACAATT~ -+ === R F S V S A Q W V F A L L F F F L P I I PM V

23520 D V I

GATCAAAAAiCGAATAAGCiTCTGGsnnTiTAGCAAAAG~GATACCATT~AAATAAAAC~CATTTTCTA~ATAAATATT~AACATAACT~AATTTTTAG~TCTCCAATA~ATATTATTA~ POFISYAEPLKAFTIGNLYFANELYINFM ClhcA GAGG AAAAGTGAAkTAGAAGAGA6AAAAACTTA;1IATATATAG~ATAAACTCT~AGGTACATC~TACTACAAG~TGTTTTAGC~TTAAAGAGG~TATAATTTT~TGATTTTTA~TAAATAGAT~
23640

23760 TT

GACAAAATG6AAAAAAAAAiGATTTACiAGAATCAAATAGATCCTTCCG~CCCAGACTT~ GACA> TTTACT> Gln-UUG> 5'-UGGGGCGUCGCCAAGUGGUAAGGCUGCAGGUUUUGGUCCUGUUAUUCGGAGGUUCGAAUCCUUCCGUCCCAG-3'

23880

TTATTAATTtTTTTATATT;TTCAATTTTiCTAAAAATThTATTTTTTA~TTATATTAT~ACAATAAGA~AAATATGGC~AGGGATTTT~

24000

CTATGTTATRAAATATAAAiATAGATTAT~GAAAGTAAG~AGATTTTAA~TTATGAAAT~AGCTTATTG~ATGTATGCT~GTCCTGCTC~TATTGGAAC~CTCCGAGTA~CTAGTTCTT~ ORF513> AGGAG MKLAYWMYAGPAHIGTLRVASSF

24120

24240

24360

24480

24600

24720

24840

Fig. 2, cont.


TAATATATGGTCTCCTATTiTACTAGGAAAAAAATTTGA~TTTGAACCT~ATATTGACG~GCAAACTAG~TTTATTTCG~AAGCTGCTT~GTTTTCAAG~TCAATTGAT~GTCAAAATT~ N I W S P I L L G K K F D F E P V I D E Q T R F I S Q A A W F 8 R SID C Q

24960 N

L

AACAGGAAAAAAAGCTGTTbTTTTTGGTGj\TGCAACACA~GCTGCTTCA~TTACAAAAA~TCTTGCTTG~GAGATGGGA~TTCGTGTTA~TTGTA~TGG~ACTTATTGT~AACATGATG~ TGKKAVVFGDATHAASITKILACEMGIRVSCTGTVCKHDE

25080

AGAATGGTTiAGAGAACAAbTTCAAAATT~TTGTGATGAATACAGAAGT~GGGGACATG~TTGCTCGTA~AGAACCATC~GCTATTTTT~GTACTCAAA~ EWFREQVQNFCDEILITDDHTEVGDMIARIEPSAIFGTQM

25200

GGAACGTCAiATTGGTAAAEGTCTTGATA;TCCTGTGGAGTTATTTCC~CACCAGTTC~TATTCAAAA~TTT~~TTTA~GTTATAGAC~TTTTTTAGG~TATGAAGGT~CTAArCAAA~ ERHIGKRLDIPCGVISSPVHIQNFPLGYRPFLGVEGTNQI

25320

AGCAGATTTnGTTTATAATiCTTTTACTTiAGGGATGGA/GACCATCTT~TAGAAATTT~TGGTGGACA~GATACTAAA~AAGTTATTA~TAAATCTTT~TCTACAGAT~CAGATTTAA~ ADLVYNSFTLGMEDHLLEIFGGHDTKEVlTKSLSTDlDLT

25440

TTGGAATTCiGAAAGTCAAiTAGAGTTAAkTAAAATACCjAAATAACAT~ACAAAAATT~CTGTTGAGG~ WNSESQLELNKIPGFVRGKIKRNTEKFARQNNITKITVEV

25560

TATGTACGCkGCTAAAGAAGATTTAAGTGCATAAAAATT~TTAAGTCTC~GTTTTACTA~ATAATTATT~TTTTTATTT~CAAGCTTTT~TTACTTTAA~CAATTTGAA~TTGAAAGTG~ +-----------M Y A A K E D L S A ===

25680

ATAAATTTTiGGATTCAAA~AAATTAATTiTTTTTGAATECAAAAATTT~TGCACTTTC~ATTTATTAC~AAACTTAAG~AGATTTrAA~GAATCCATl~TTTTATTTT~TACAATTGT~ ____-------_-------_---><-----------------------------------+ ORFSO> AGGAG M N P F F V F V

Q L F

25800

TTTTTATTAiCCATTTTTTATCATTTTTTiATATATTTAjAACAAATAA~ATAAATTTT~CAAATATTT~TCCACTTTT~TCAAAATGG~TCAAAAAAT~ F V V P F F I I F L V I Y L V F F I P K TN N I N F 5 u I F P

K K == 25920 L F S K W I

AAGTAAATTiTTTAAATTGAATCTAATTAAAATAGAATTGACAATACTA~TTCACATAC~TATAATTGT~TTGATGTTT~TT,4TTTTl~~ATCCATTTT~AAATTCAAA~TTTATTTTA~ TTGACA> TATAAT> Lys-UUb

26040 S'-G

GGTTGCTAAETCAATGGTAcAGTACTCGGCTTTTAAGTGCGACTTGGAT~TTTACACAT~TAGATGAAA~AAAAAATTC~TCCATACCG~TGACAAGGT~TGTAAAACT~CGACTAATC~ GGUUGCUAACUCAAUGGUAGAGUACUCGGCUUUUAAgugyg.....................(~ntron)..................................................

26160

TAAAAGGAAkCTTTACAGA6AAAATAGCA~GTCGTTTAT~TTTTTTCAT~CATTTTTTT~AACTAAATA~AATTTCAAT~A4aAATACA~TAAGTCAAT~AAAGTTAAT~GATAAAGCT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..(lntron)..................................................

26280

AATTGCTTAAATCATAGGTAAAAGAAGAA1CAGCTTCTG;TTCAAATTT~TGAAATATT~TCTTGATTT~TTTAAAAAA~TGTTAAAAG~TTTTTAAAC~GTCAATATA~GAGAAAAAT~

26400

CCTATTACTiTTTAGGTTTiTTACCAAAAATGAATCCTA~ACACTTTTT~AAATGTGTC~AGAAATAAC~AGCATGCTG~TTAATATAA~TTTTTATTA~GTCAGGAGA~CAATCAATT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (intim) . . . . .. . .. . . . .. . . . .. . . .. . . . .. . . . .. . . . .. . . . .. . .. . . . .

26520

AAAAAAATGiTTTTTTTTTiTTGTTTAAnkGnTTGTACT~TGTGTTTAT~CTTTTATAT~TTACAAAAT~ATTAAATGA~AACTATACT~AAAGTTTTG~CTACTTTTG~AAAACAAAC~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (lntron) . . .. . . . .. . . .. . . . .. . . . .. . . . .. . . . .. . .. . . . .. . . . .. . .. .

26640

AACAAAAAT;TTTGGAATA6ACCATATAT~TATATATTTiTCAGG~AAATTTTTA~GGAATTGCA~ATAATCG~~~-

26760

. . . (Intron) . . .

mATAAATAAhACCAATTAT6AAAATTTTTiTTTTTTGGTi

. . . (Intron) . . .

TAAAAAATAhTTTTAATTTiTTGAAAATAkAncGncTTA~AAAAAAAAT~CGTCAATTT~ATTTTTTAT~T~~~T~TTC~AAAAAAaAA~ATTTTGTAA~TATCAAAGA~GTTATAGTT~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..(lntro~................................................... TTATTTTGGATAATATTTT~ACATTTGAAiCnnAAAAAAT~TTTTATTAA~AACAAAATA~AAAGTTATC~ATC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (1ntron).

26880

,ATiCA+TCTATATTTkTTTTATG&ACATAGAAT~TATAATTCAA . . . . . . . . . ..ORF370i>....M E H R I Y N

ATTATTTTTiAGATATTACAATACCTTATTTTTTTCACCCATTAGAATC~TTCGTCGGC~TATT~~~~~~~T~~CA--~~ Y FL D I T I P Y F F H P E I LIR I F R R n I C

;:

P

F

i

27000 5

N

TACATTTTT+GCGAACTCTiTTATATAAAi H F L R T L L Y K

M

ATAAATGTTiAAATATTTT;AATATAGAA~ATTTTTTTT~TTTGAAAAA~AATCAGTTT~TTTGTTTTT~~T~GAATTT~?~TfiT~iAC~AATTTGAAT~TCTTTTAAA~GATATATGG~ f E F E Y L L N 0 K C L N I L N I E N F F Y L K K N Q F F C F L ~ : F ' :

27240 I

hi

AAAAATTTTkTAAATTTGAbTCAGTATTT~TTTGGAATT;TATTGATAA~ACAAATTCT~TAAAAAAAA~AAAACATAT~TTAAAAAAA~CTAAAAAAC~GATTGAAAA~AAAATTGTA~ K F Y K F E 8 V F F W N F I D K T N SIK Y Iv H Ii k Y: Y K P I E K KIV AAAAAATAAbTTCCATTCAiTATATCCGA~ATAAAAATA~TTTGATTAT~ACTTTAAAT~ATAGAAATA~TTTG~TTTT~GAAAATTGG~AAGATTTTT~TCTTATTTT~TGGCAAAAA~ K IS S I H Y I R Y K N N L I I T L N 0 R II : L I L E II : Y D F F L I F W

27120

E 27360 K 27480

Q

K

Y

ATTTTAATGiTTGGTTTAAnTCTTCTAGAiiTTTTAATT~~AAATTTTTA~AAAAACTCA~TTT~TTTTT~AGGTTATAT~rTT~~T~~TT~AAAGTCAAA~TATTTTAAT~CAAATTCAA~ F N V ii F K 5 S R I L I Q N F Y K N S F S F L G f :li k I i 5 0 I ILIQIQI

27600

TAATAAATTiATTAAGAAAiGTTAATTTA~~TAAAAAAG~ATTTTGTAG~ATTATTCCA~TAATACCTT~AATTAGACT~TT~f~~TAAA~AAAAATTTT~TGATGTTTT~GGACGTCCA~ I N L L RN V N L I K K E F C S I I P V I P L I PI I F v FYFCDVLGRPL

27720

TTTGTAAATiATCTTGGACIACATTATCAGATAATGAAA~TTTTGAACG~TTTGATCAA~TAATAAAAC~TATTTrTA~~~T~~~ATAf~T~GATGTATTA~TAAAAAAGG~TTATATCAA~ rj L I: I N K K G L C KLS W T T L SD N E I F E R F D Q I I K H Ii : ' i

Y Q L

TACAATATAiTTTCCGATTiTCTTGTGCTAAnnCATTAG~ATGTAAACA~AAAAGCACA~TACGCACTG~TTGGAAAAA~TAT~r,TT~A~ATTTATTAA~AAGTTCTAT~TTTTTTAAT~ Q V I F R F S CA K T LA C K H K 5 T I R T V b: Y y / (, (I II I L T S SIF

F N K L L K

AAACAAAATiAATTTCTTT6AATTTTTCT~ATAAAAATC~TTACAAAAA~AATTTTTGG~ATTTAAATA~TATT~AA~r~AATTA~TTA~CA~ATTCAT~ACAAAAAAG~AAATTATTA~ T K L I S L N F S N K N P Y K K N F W Y L N I ID VII i L AH 5 L Q K 5

27840

27960

28080 K

AAGAATAAAkAAACATAGAEAAAGCCGTA~GCAGTAAAA~TTGCAAGTA~GGTTTGGGA~GAGATGATT~TATTTTTA~~GAAAAAAAA~TTATTTATC~ACTTCATCC~ACGAGTTCC~ E ===.. . . . . . . . . . ..ragccg-augaa--gaaa--uucaugu-cgguuy...........................................cuayyy-ayCCGACGAGUUCCG

28200

GGTTCGAGCECCGGGCAACECATTTTTTTiATTTTAATA~AATTTCTTG~TTTTTTAGG~AATATTTGA~TATTAGTTG~~ATAAT~AT~TGTTATGTG~AATACTATA~GTTAACAAG~ TACTAT> GGUUCGAGCCCCGGGCAACCCA-3' TTGACA;

28320

TTAAATATTiGGGAAACTCiTAATTATTT~AAAAACCAA~TTTTACTAT~ACCGCTACT~TAGAAAGAC~CGAAAGCGC~AGCATTTGG~GTCGCTTCT~CGATTGGGT~ACTAGCACT~ MTATLERRE 8 A i I I ! G RF C 0 WV T psbA>

Fig. 2, cont.

28440 ST

E


AAAACCGTTiATACATTGGiTGGTTTGGTi;TnTTGATGA~TCCTACTTT~TTAACAGCA~CTTCAGTAT~CATTATTGC~TTTATTGCA~CTCCTCCTG~AGATATTGA~GGTATCCGT~ NRLYIGWFGVLMIPTLLTATSVFIIAFIAAPPVDIDGIRE

28560

AACCTGTATETGGTTCTCTiCTTTACGGA;\ATAACATCA~TTCTGGTGC~ATTATTCCT~CCTCTGCAG~TATCGGTTT~CACTTCTAC~CTATTTGGG~AGCTGCTTC~GTTGATGAA~ P V S G S L L Y G N N I I S G A I I P T SAA I G L H F Y P I W E A A S V D

28680 E

W

GGTTATACAi\TGGTGGTCCiTACGAACTTATCGTTCTTC~TTTCTTACT~GGTGTAGCT~GCTACATGG~TCGTGAATG~GAACTTAGC~ATCGTTTAG~TATGCGTCC~TGGATTGCT~ LYNGGPYELIVLHFLLGVACYMGREWELSYRLGMRPWIAV

28800

TTGCATATTCAGCTCCAGTiGCTGCTGCTi\CTGCTGTTT~CTTGATCTA~CCTATTGGT~AAGGAAGTT~CTCAGACGG~ATGCCTTTA~GTATCTCTG~TACTTTCAA~TTCATGATT~ AYSAPVAAATAVFLIYPIGQGSFSDGMPLGISGTFNFMIV

28920

TATTCCAAGETGAACACAAtATCCTTATGCACCCATTCCATATGTTGGG~GTAGCTGGT~TATTCGGCG~TTCTCTATT~AGCGCTATG~ATGGTTCTT~GGTAACTTC~AGTTTAATC~ FQAEHNILMHPFHMLGVAGVFGGSLFSAMHGSLVTSSLIR

29040

GTGAAACTACTGAGAATGAETCTGCTAATGCAGGTTACA~GTTTGGTCA~GAAGAAGAA~CTTACAACA~CGTAGCTGC~CACGGTTAC~TTGGTAGAT~AATCTTCCA~TACGCTAGC~ ETTENESANAGYKFGQEEETYNIVAAHGYFGRLIFQYASF

29160

TTAACAACTCTCGTTCTTTACATTTCTTCjTGGCTGCTT~GCCAGTTGT~GGTATTTGG~TTACTGCTT~AGGTATCAG~ACTATGGCT~TCAACTTAA~TGGTTTTAA~TTTAACCAA~ NNSRSLHFFLAAWPVVGIWFTALGISTMAFNLNGFNFNQS

29280

CTGTTGTTG6CAGTCAAGG;CGTGTAATTAACACTTGGG~TGATATTAT~AACCGTGCT~ACCTTGGTA~GGAAGTTAT~CATGAACGT~ACGCTCACA~CTTCCCTCT~GACTTAGCT~ VVOSQGRVINTWADIINRANLGMEVMHERNAHNFPLDLAA

29400

CTGTTGAAGCTCCTGCTGT2AATGGTTAAiGTCCTATAA~AAGGTTACA~AAATAATAA~GAATATTTA~TATTTTAGT~AGAAATTAA~AAACTAAAA~TTTTTAAAG~AGGAAAAAA~ +- -------------) (-----------+ V E A P A V N G ===

29520

TAGAAAATAATGACCTTTGnGACTTGAAA~CTTAAAGGTETGGATTAAG~CAGTGGATT~TGGATCCTC~ +----------------, <----------------+ His-GUG>

29640 5'-GGCGGACGUAGCCAAGUGGAUUAAGGCAGUGGAUUGUGGAUCCUCu

ACGCGCGGGiTCAATTCCCtTCGTTCGCCEAnTAACAATiCTTTGTTTA~GTCATTATA~TGAAGATGA~ATAAAATAG~CTATTTTTC~ +--ACGCGCGGGUUCAAUUCCCGUCGUUCGCC-3'

29760 <----------

>

---++--------

TTTTTTAATiTTTTATAAA;TATCTACTTiAATATTAAT~ATTAAAAAA~AGAAAAATA~GAAATAAAA~TTTAGATAT~TTGTATTTT~ATATTTTTA~AAAAAAAAT~TATTTATTA~ <---------------+ >

29880

TAAAGTCTT6TGTAACTGTiGTAAAAAAA;GAAACAAAA~TTACCAAAA~AAAAATCTT~ATATAAAAA~TTAGATTTA~ATGAAATAC~AAAAATTCA~AATTTAGGA~ATCCATACA~ ORF2136> MKQKLPKKKSLYKNLDLDEIQKIQNLGNPYT

30000

AAAATGGAGiTTAATTAGAiTGTTAATTGCAATATTTTCEATTTTAGTA~TTTATTGGA~TTTCAAATT~TTACTTCAT~ATTTTTTCG~GATTTATAT~ATTCAAAAA~ KWSLIRLLIAIFSNKRNFSTLLDFQILTSLFFRDLYNSKK

30120

AAAAAAAAAGTTTTTACTTiATATTTTAG~TTTTTTAAC~TTACCTTTT~TTGTCTATA~ATTAATTGA~AAAAGTATT~TTGAACAAC~AAATTTTGA~TTTCTAAAA~TTCAAAAAC~ KKKFLLNILVFLTLPFFVYILIOKSIVEQQNFDFLKIQKQ

30240

AAATTTTATiGAAAAAAATnATAAAAGTA~TTTAAAAAAiAAAAAAAAA~GGTATAAAA~ N F I E K N N K 5 I L K N N F Y F L

30360 N T K F D I F L H N F F S L K K K K W Y K N

TTCACTGTTl\AATTTAATTGnTTTTCGTTEGnTTTTAAA~AAAAAAGAA~TTTTAAATC~TC,4TTGGTG~AAATTTTTG~TTTTAGAAC~AATTCAATC~AATTGGAAA~TATCCGAAG~ SLLNLIDFRSILKKKEILNLHWWKFLVLEQIQSNWKISEE

30480

ATCTTTGTCiGAACTCAAAkTTGTATTAGAACAAAAAAA~ATAGATGAA~TAAAACATT~TTTTGAATT~TATATTAAT~AAAAAATAT~TCCTAATAA~AATTGGGAA~ACTATTTTT~ SLSELKIVLEQKNIDELKHFFEFYINQKIYPNNNWEYYFY

30600

TTCAATTTTiATAAACCAAiTAAAAATTGATATAAAAAAiATAAAAATA~TATTGGTTT~GAAGTTTTT~TGGCTTTTT~TGAAAAACT~TTATTTGAA~TTGAATTTT~ SIFINQLKIDIKNSKYNKNSIGFEVFLAFCEKLLFEVEFL

30720

ATCTAAGCCnAACAATAATkATTTACAAA~GAAACTAAA~TGTCTGGAA~ACTTTAGTT~TTTAGATAT~TTTTGCATA~TAAATAAAA~ACTTCCATG~GTTAACAAA~AAATATTTA~ SKPNNNNLQMKLNCLENFSFLDIFCLNKKLPWVNKKIFK

30840

AAATTTACAi\AATTTTAATEnnTCAGATAi\AAAACTTAT~GAATCGTTT~TTTTATTAA~AATAAAAGG~AATCTATAT~TTAAAAATT~TATTGAATT~GTTACTTGG~AATCATAT4~ N L Q N F N E S D K K L I E 5 F F L L K I KG N L Y F K N Y I E F VT W Q S

Y

K

30960

AAAGGATTGiTTGGATTTTAATAAGTTTAATGAATTAAAiTAAATTTTC~AAATATATT~TATATGAAG~ K DC L D F N K F N E L N N 5 E I Y I K I E

E 'G 31080 E L F S D Y I Y K F S K Y I L Y

AAAAAAATCcAAAACCATAiTAAAACAAT~TTTTAATAA~AATATTTAT~ATAAAAAAT~GAATTCTAT~TTTAATTTC~ATACTATTT~TTATTTTGA~TCGAATAAT~TACTTTTT,~~ KKSKTIIKQSFNNNIYYKKLNSIFNFNTIFYFDSNNLLFD

31200

TTGGTTAAAkAAAAATTATiATATCAATAATAAACCATT~CTAAAATCA~TTTTAATTT~CTCAAGTAT~TCAAATCAG~TTATTTTAT~TTTTAAACA~AAAAATTCC~AATCTTTT4~ WLKKNYYINNKPFLKSFLIYSSISNQFILFFKQKNSKSFV

31320

TAAAAATTTnGTAAAAAAAiATAGTAAAG~TGTTATAAC~AATGTTTTT~CAAAAGAAA~TAAAATAGA~ATAAATAAC~TTTCAAAAT~CATTTATT.4~GCTTTTTTT~AGATATTAr~ KNLVKKNSKDVITNVFSKENKIEINNFSKSIYYAFFEILS

31440

AATAAATGA6ATTGATAATAAATTTGTTA~TAATAAGATiAAAAGATTT~ATTTAAACA~AATAAAAAG~TCTGATAAT~TTCGATTTA~ INEIDNKFVINKISLKNINKKKQKRFYLNKIKSSDNFRFI

31560

TAATTTATGEAAAATAAAA;\nTinTTCATEACAACAATT~GTATCAAAT~ATTCTTTTT~ATTAAATCC~GCATTTGAA~TACTTCAAC~AAATTATTA~TTGAAGAAA~AAAATATTI~ NLWKIKNYSSOQFVSNNSFLLNPAFEILQQNYYLKKKNI-

31680

GTTTTTTAAkAAACTAAACbAGGTATTTTCAAATTTTTT~TATTTTCAA~ATTACAAGT~TAAAAAATT~AATATTTTT~TGAAATTTG~TAGTTTAGA~AAAATTCTA~AAAAAAGAA~ FFKKLNEVFSNFFYFQYYKCKKLNIFLKFASLEKILKKRN

31800

TAAAAAATTiACTATATCAiTAAAACTTTiTnncnnnTT~TATAAAAAC~AATTAAATG~AAATGGTGA~TATAAAATT~AAAGTCAAA~TTTACAAAA~GAAAAAGAA~TAAACAAAA~ KKFTISIKLFKKFYKNKLNENGEYKIESQILQNEKELNK<

31920

AAGAAAAAA6AATTTTCAAiTTAATCCAAkcATAAAAAT~TTAAGTTTT~ATAATTCAA~TAAAAAAAA~ATTTATTTA~AAAATAAAT~TTTTTTTAA~AAAAACTTA~TAAATAACA~ R K K N F Q F N P N I K I L S F Y N S S K K N I Y L Q N K Y F F N K N L I

Fig. 2, cont.

32040 N N K


TACTTTTTTiTTTAATAAAkAATCTTTTA6TATAATTACAAATTGTTTT~CTCTTTTTT~ TFFFNKKSFNIITVIFDKLKKIQLNFQEIQKILNCFSLFF

32280

TAATTCTAAAAATATAAAAkAAACTAAAAiTTTTAAAAA~TCTTATTTT~TTAATGAAA~TTTAACAAC~ACTTTTTCT~TTAATGATA~AGAATTTAA~ATTTTTTTT~TAGAGTTAT~ NSKNIKKTKIFKNSYFINENLTTTFSFNDKEFNOKEFNIFFLELF

32400

TATTTCTGAAATTAACAATGATTTTTTAA;GAGATTTTTiTCCTATAGA~AATAGGCAA~TATTACAAA~ ISEINNDFLMRFFKKYLYYRIYKDKEILFNPIENRQLLON

32520

TTTTTTTGAAAAAACAAAA;TTTTAACTTiTATAGATTT~TTACAGGAT~CTGAATTAA~TTATAATAA~CGATTTATT~TTCATTTAG~AAAAAAAAC~ATTAAAAAT~ATAATTTAT~ FFEKTKILTFIDFLQDPELNYNNRFIFHLEKKTIKNNNLL

32640

ATATTTACGATTATTGAAAnTTTTTCTAAAAGATAAAAGACTTATTTAT~AAATCTCAA~TATCTAATG~ YLRLLKIFLKDKRNFLLINEIKSFIEKKNNLFIKSQLSNV

32760

TTTATTAGT;\AAAAATTCA;nTAAATTTT~TGATAATAT~TTTAATTTT~ATTTTTTGA~ACAAAAAGA~AAAAACATT~AAATTATTT~AAATAACCA~AATTATTTT~AAAAAAGTT~ LLVKNSYKFFDNIFNFHFLKQKEKNIEIILNNQNYFEKSL

32880

ATTAAAAAA;\ACTTATTTA;\AAAATTTAA~CTTAAATAA~AGTTATAGT~AATTTTCTT~TAAAATATT~ATTTTTCAA~TATTAAACA~TTTAAATAA~AATAATTAC~AAACTTTTC~ LKKTYLKNLNLNNSYSKFSYKIFIFQLLNILNKNNYKTFQ

33000

GTGGATTAGiGAACTTATTiTTTATTCAAL\AAATTTAAA~TATAAAATT~AAAACAAAA~AGAAAAAAA~AATTATTGT~ATAATAAAA~TATTTCTTA~AAAAAAAAG~AAATAAAAA~ WISELIFYSKNLNYKIQNKIEKNNYCYNKNISYKKKKIKT

33120

AGTTAATTTiTTTGAAAAAnATAATTTAT~TCAGACTAA~AATTCATGG~TTTTTACTT~GGAATGGTG~GAATATAAT~CATATATAT~ATTACAAAT~ATTCAAGAA~CTTTTTTTC~ VNFFEKNNLFQTNNSWFFTLEWWEYNTYILLQIIQETFFQ

33240

AATTACCGAiGTTTTGGAA;ATTTCAAAAAAAAAAAAATRAACCTTATC~TTTCATAAT~TCAAATTGA~ ITDVLEYFKKKKIIEKNLKFFLKSKKISLKTLSFHNFKLK

33360

ATGGAATTTnCGTTTTTTTkATGAAATTAj\TTATAAAAA~AATTATTTA~TAAATTTTT~ATGGTCTGA~TTTAATTTA~TAAATAATT~TAATAATTT~TATTGGGTT~TTTTTAGTT~ WNLRFFNEINYKKNYLLNFLWSDFNLINNCNNLYWVIFSL

33480

AGTTATATTiATTTTTTTA;ATTATCAAAL\AATTTTTTC~ATTATTATA~GTTCTGATT~TTTTCATTT~TGGAAAAAT~TTGAAATAA~TCAATATTT~ACAGATCGT~CGCGAAGTC~ VIFIFLYYQKIFSIIIGSDCFHLWKNFEIIQYLTDRSRSL

33600

TTATTTTACnAAATTAACTEGTCGTAATAkAACAGCCTTnAAAATTTAT~AAGTTATTT~TTTCAAAAT~TAACACATT~TATTACAAA~ATTAAATTT~ATTTATTAA~ YFTKLTRRNKTALNKTENLLSYFFQNLTHYITNIKFYLLT

33720

AAAAAAAAAiTTAAAAAAAiGGTTAATTA~TAATAAAAC~TTAGATCTA~CTCGTAGAA~ACGTAAATT~TTAGTTCAA~CTTTAATTA~ACATAACAA~ATTCAAAAT~ATGGATTTG~ KKNLKKWLINNKTLDLSRRKRKLLVQSLITHNKIQNYGFE

33840

ATTAAATTCEAATAAACAAiTTTTTACTTCTTATTTTGG~TATCAGATA~CAAATCAAC~AGGACTTTT~TATTTTCAA~ATTTAGCTC~ATTTTTTCA~AAAAATTTA~TTAATAATT~ LNSNKQFFTSYFGYQITNQQGLLYFQYLAQFFQKNLINNS

33960

ATTAGATTTAGCCAATAAAiGGATTGTTT~TTCTTTTTG~CATAAAATT~TTTCTTCAC~AAAATTACG~CAAACAAAT~ATATTGAAT~AGGGTTTCA~AATATACCC~TTCCATTGC~ LDLANKWIVFSFWHKIFSSQKLRQTNNIELGFQNIPVPLQ

34080

ATTTGGATTATCTTATTCAAAAGGAATTTiATTAATAGGiTTATTCAAA~TTTCGATAA~ FGLSYSKGILLIGPIETGRSYLIKNLAAESYVPLFKISIN

34200

CAAACTATTi\TATAATAAA~CTGATGTTAinACAGAAAG~TGGATGAAC~TTTTAATTG~AAGTTTACG~AGGCTAAAT~TTACTTTAG~TTTTGCCAA~AAAATGTCA~CTTGTATAA~ KLLYNKPDVITESWMNILIESLRRLNLTLDFAKKMSPCII

34320

ATGGATTCAiiAATATTCATEAATTAAATGiAAATCGTTT~ACGCAAAAT~TAGAATCTG~TCCAACCTT~TTGCTTGGT~TTTTGTTAA~ATATTTTCA~ACAGATTTT~GTAAAACTA~ WIQNIHQLNVNRLTQNVESDPTFLLGILLKYFQTDFSKT‘K

34440

AAAAAATAAiATAATTGTTRTTGGGTCAA~TCATCTCCC~AAAAAAGTG~ATCCAGCTT~AATTTCTCC~AATAGATTA~ATAAAATAA~TAATGTTCG~TTATTTAAT~TTTCTCAAA~ KNNIIVIGSTHLPKKVDPALlSPNRLDKIINVRLDKIINVRLFNISQR

34560

AAAAAAACA6TTTCCCCTTETTTTAAAAAAAAAGAATTTiCAATTAAAA~AAAATCTGT~TTTTTTAAA~GAGTTTGGA~CACGAACTA~GGGCTATAA~TTAAGAGAT~TATCAGCAT~ KKQFPLLLKKKNFQLKENLFFLNEFGSRTMGYNLRDLSAL

34680

GACAAATGAAGTTTTATTA;TAAGTATTACAAAAAATAG~TCATTTATT~ATACTGATA~TTTAAAATT~GCTTTTCAT~GACAAATTT~TGGTTTAAC~TATACAAAT~ATAAATTAA~ TNEVLLISITKNRSFIDTDTLKLAFHRQIFGLTYTNNKLN

34800

TTTTGATAGi\ATATTTAAAkTAGTTATTTATAAAGTAGG~AAAACTATT~TACAAAATA~TTTAATTAA~AGCTCTAGT~TGAATTTGT~AAATATTGG~AATTTTTTA~GGAAAAAAA~ FDRIFKIVIYKVGKTIIQNILIKSSSMNLLNIGNFLWKKN

34920

TTTTTATTAETTATCTAAAiGGTATTTAG~ACCCTCTnTiTTTAGCTGG~ACAGCTGCT~GAGATTCAT~ FYYLSKWYLEPSIDESIIKELTILTtiILACLAGTAARDSW

35040

GTTTTTATTkGAAAAAAAAiCAGAAAGTTiACTTCCTATiGATAAGTTAGTTGAAAATGL\TTTTACTTTkGCCTTTAGTATTTTAGAAAGTTTTTTTTC~GAATTTCCAjGGTTAGAAAi FLLEKKAESLLPIDKLVENDFTLAFSILESFFSEFPWLEI

35160

ATGTCAAACiAATGTTGTTAATTCTAAAAj\AAATAAAAT~ATTGAATTT~CAACAAAAA~CTCTATGAA~ATTATGCAA~ATGGAATTT~TGCTATAGC~AATAAAAAA~TCATTTACA~ CQTNVVNSKKNKIIEFSTKNSMNIMQNGIFAIANKKFIYT

35280

TCAAAATCAiTTACAATAT6AATCGTCnCiTTCTCAACA~ATAAGTTTT~ATAAAAAAA~AAATTATGA~TTTAAAAAT~CTTCTTGGT~ACCTCGATT~TGGCGTTTG~GTTTTTTTC~ QNHLQYKSSLSQQISFNKKKNYEFKNTSWSPRFWRLSFFR

35400

TAGTAATTTATTTGATTGGATTAAAAGACCAAATGATTT~GAATTTTCT~ATAAATTTG~ATTTACAAA~AAAAAAGAA~ATCTTTTTT~TGCTAATTT~CAAAAAAAA~ATAATTATG~ SNLFDWIKRPNDFEFSYKFGFTKKKEYLFSANLQKKNNYG

35520

ACAATTTATAGAAAAGAAAiAAAAAGAACAACTTCTTTA~GAAAGAATT~TACCGAGAA~ACGAAGAAG~AATGTACAA~AGTTAGAAT~TCAATTTGA~GAAATATTA~TAGAAGAAC~ QFIEKKKKEQLLYERILPRIRRRNVQELESQFEEILLEEQ

35640

Fig. 2, cont.


ATTTGAAAT;TTAGGTTTTiTTCGATTATCAGAACAATA~CCAATGGAA~ATCAATTAT~TAATAAGCC~AGATTATTT~TTGGAAAAC~AATTCTTTG~GATCCAATA~GTTTATTTT~ FEILGFFRLSEQYP.MEYQLYNKPRLFIGKRILWDPIGLFF

35760

TCAAATTCGiCATTTTGTGiTTTCACGTCi;AGAATTTTTiGTAGATGAAi;AAATGTTAAGAAGACTTTA;GTTACTTATGGAGCTCGAAGAGAAAGAGA6AGATCTCGT;CAAGTCAAA6 QIRHFVFSRREFFVDEEMLRRLYVTYGARRERERSRSSQK

35880

AATTAAACA;\TTTTTTCTTiGTCGTGGATATAATAAAGA~CTAATTAGT~AATTATCTA~TCGTTGGTG~AGTCAATTA~CTATTAATG~AAAAAAAAA~ATTGATACA~TAAAACGTA~ IKQFFLCRGYNKDLISKLSIRWWSQLPINEKKNIDTLKRI

36000

TGAACATATiAGTATTCAAiTAAAACGCCCTCAAATTTTiGTTTTTTCG~TTTGAGTTA~TAACTCATC~ EHISIQLKRPQIFTPVYLYQRWLIENSPEKFFRFELLTHR

36120

CAAAAAATGECTTAAAATAAATAGTTTAT~ATTAAATGA~TCTTTTATT~ACACAACAC~TTTAGAAAT~TATGAATAT~TATTGCATT~TTTTATTGC~AATAAAAAA~TACTAAATC~ KKWLKINSLLLNDSFIYTTLLEIYEYLLHFFIANKKLLNQ

36240

AATGACAAAhATTTTATTAkAAAAAGGGT~GCTTTTTGA~AATGAAATA~AAACTATTA~TAATGAAAC~AGACAATAA~CAAATTAGT~TTGAAGATA~ATAGGAATA~ATATAGATA~ +---------------------> MT K I L L K KG W L F EN E I E T I I N E T R Q ===

36360 <---

ATTCCTATAiATCTTCAAAiTTTAATTAACTTAGTTTTT~TTTTTAGAG~CGGGATTGA~GGGGCTCGA~CCCGCAACT~CCGTCTTGA~AGGGCGGTA~TCTAACCAA~TGAACTACA~ + 3'-GCCCUAACUGCCCCGAGCUUGGGCGUUGAAGGCAGAACUGUCCCGCCAUGAGAUUGGUUAACUUGAUGUU

36480

TCCCATTATATAAATGCAT;TTTTTTATAiTGTCAAAAA~GTTTGACAT~ATAAACGGC~ACTTTTTTC~ATAATTAAT~TTGGGTCGA~CTGGATTTG~ACCAGCGTA~GCATTGCCA~ AGGG-5'
36600

CGGATTTACAGTCCGTCCCcATTAACCAC~CGAGCATCG~CCCAGATAG~AAAATCTTT~TCTATCTGA~AAAAAGTAT~AAATATTAA~TAATTAATT~AAGGACTTT~TTATTACCC~ GCCUAAAUGUCAGGCAGGGGUAAUUGGUGAGCUCGUAGCUGGG-5'
36720

CAGGGGAATiCGAATCCCCETCGCCTCCTiGAAAGAGAG~TGTCCTAGG~CACTAGACG~TGGGGGCTT~ATCCCTTAA~CTTATCTTA~TCAATATAT~ATTTCCTGT~AATAGTTTT~ GUCCCCUUAAGCUUAGGGGCAGCGGAGGAACUUUCUCUCCACAGGAUCCGGUGAUCUGCUACCCCCG-5'
36840

f--------->

<-..-----+

+----

TTAGAAAAAAAATTGTTTCiAAAAAATCA~TAATTTATTbCAAATTGAA~ATTACTCAT~AATATTTTT~TGTTATAAA~CATATTGAA~TTAAATAGA~AAATTAAAC~TAAATAAAA~ (---------+ TTGAAT> TGTTAT> ATTTTTTATiTAAATATTG;TTAAAATAGiTTTAAGATAGTTTTAGATC~AGTTAGTTT~ mbpX> AGGAG MS I

36960

37080 L I Y K V SK S L G N L K I L D R V S L

TATGTACCTAAATTTTCTTiAATAGCACT~TTAGGTCCT~CTGGTTCGG~AAAATCCAG~TTATTACGA~TTATTGCAG~TCTTGACAA~TGTGATTAT~GAAATATAT~GTTACATGG~ YVPKFSLIALLGPSGSGKSSLLRIIAGLDNCDYGNIWLHG

37200

ATAGATGTT6CTAATATTTCTACACAATAiAGAAGAATG~GTTTTGTTT~TCAACATTA~GCACTTTTT~AACATATGA~TGTTTATGA~AATATTTCA~TTGGACTTC~ATTACGAGG~ IDVTNISTQYRRMSFVFQHYALFKHMTVYENISFGLRLRG

37320

TTTTCTGCTcAAAAAATAAECAATAAGGTEAnTGATTTA~TAAATTGTT~ACGAATTGC~GATATTTCT~TTGAATATC~TGCCCAACT~TCAGGAGGA~AGAAACAAC~TGTTGCTCT~ F S A Q K I TN I( V N D L L NC L R I AD I S F E Y PA Q L S G G Q K Q R V

A

i

37440

GCACGAAGTiTAGCAATTCnACCAGATTTiCTTTTATTAtTGGTTAAAA~GTTATTTGC~AGATAACAA~ AR S LA IQ P 0 F L L L D E P F GA L D G

D N K

ATTACAACAATTATGGTTAtACATGATCA~AAAGAAGCG~TTTCTATGG~TGATGAAAT~GTGATTTTA~AAGAAGGTC~TCTGTTACA~CAAGGAAAA~CTAAAAATT~ATATGACCA~ I T T I M V T H D 0 K E A I S M A D E I V I L K E G R L L Q 0 G K P K N L Y

D

3

37560 E

L

R

R

H

L

S

K

W

L

K

R

Y

L

Q

37680

CCAATTAATiTTTTTGTTGGTATTTTTTkGGATTACTTkAAAATTTAA~AAAATTTGC~ PINFFVGIFLGLLIEIPKLNESITLKNIPSKTPQNLKKFA

37800

TTTGATCCTi\TATGGGTGA6AATATTTGCiAATCGATCA~TAAACAAAT~TCGATTTTT~TTAAGACCT~ATGAATTTT~TATAAAATC~GAAATGGAT~TGGAAGCAA~ACCAGTTCA~ F D P I w V K I F AN R S I N K Y R F F L R P Y E F C I K S EM D L EAT P

37920 V

Q

ATTAAAACAnTAATTTATAlAAGAACTTTiGTTCAGTTG~ATCTTTTTG~AACTTCTTT~TTATGGAAT~TAACAATTC~AATAGGTTA~CAATCTTTC~GAAATTTAC~TATTGAATC~ IKTIIYKRTFVQLDLFVTSFLWNLTIPIGYQSFRNLHIES TTTATGCAA6CACTTTATAiAAAACCTAG~CTTCAAGTT~TTTTAAGAG~ATATCCTAT~TTAACAAAT~TCAAAAAAA~TTAATAATT~GTTTTTTTA~TACTAAATT~TTATTATAA~ FM Q T L Y I K P R L Q V F L RAY P I L TN I K K N ====== +------)

38040

38160 <-----

TTATCTTTA;ATATATATA;ATATTTATAiAATATATAAkTATGTTATA~ATCTTATTG~CAACTTAGT~ +--------, <-----------+ +

38280

TTTTTAATTliAGTTGACAAiTTTGTAACTiiTGTTACAC~ATATGTTGT~TTATTTATA~AAAAAAAAA~ATAGTTTCT~CTTATTGCC~TTTTAACTC~GTGGTAGAG~AACGCCATG~ TTGACA> CACAAT> Thr-GGU> 5'-GCCCUUUUAACUCAGUGGUAGAGUAACGCCAUGG

38400

TAGGGCGTAnGTCATCGGTiCAAATCTGAiAAAGGGCTT~TTTTTACAA~GTCAAATAA~GTTCACATT~TATTTAACG~AAAATTGAA~TTTATAACT~AATTATATA~TCTAATACT~ UAGGGCGUAAGUCAUCGGUUCAAAllCllGAllAAAGGGCLl-3'

38520

AAGTTTTTTCTAATTTACAbAATTAAAAA;TTCTAGTAT~GGTTAAAAA~ATTTTGGAA~TTTCAACTT~CTATTTACT~TGATAAAAT~AATTAGTAA~TTTTGAAAG~TAATAGATT~ TTGAAA>

38640

TCTTTTATAcAATATAAAA;AACTATTAG~TAATGAATT~AAAATAAAA~AAGACATAA~AAATTATTA~AAAGAAAAG~AAAGTTTAC~TACTTTTTA~TTTTAATCG~TTTTAGAAA~ TACAAT>

38760

TATATTAAT6AAAAAAATTRAAATTGAAAi;GGATCATAA~AAAATTTTT~TAGGTACTT~AAAATGGTT~AAGTAACTT~ATAGGAGAA~CATTATGAC~ATAGCCATT~GAAAGTCTT~ psbD> AGGAG MTIAIGKSS

38880

CAAAGAACCi\AAAGGTTTAiTTGATAGCA;ccnrcnCTG~CTAAGGAGA~ACCGTTTTG~ATTTGTAGG~TGGTCTGGT~TATTGCTTT~TCCTTGTGC~TATTTTGCT~TAGGTGGAT~ K E P K G L f 0 S M D D W L R R D R F V F V G W S G L L L F P C A Y F A L G GTTTACAGGiACAACCTTTETAACTTCATi;GTATACTCA~GGATTAGCT~GTTCTTATT~AGAAGGTTG~AATTTTTTA~CTGCCGCTG~TTCTACCCC~GCAAATAGT~TAGCTCATT~ F T G T T F V T S w Y T H G L A S S Y L E G C N F L T A A V ST P AN S L AH

Fig. 2, cont.

39000 G

W 39120 S

312

K. Umesm

et al.

TTTACTATTATTATGGGGACCTGAAGCACAAGGTGATTTiACTCGTTGG~GTCAATTAG~CGGTTTATG~ACTTTTGTA~CTCTTCATG~TGCATTTGG~TTAATAGGC~TTATGTT~~ LLLLWGPEAQGDFTRWCQLGGLWTFVALHGAFGLIGFMLR

39240

ACAATTTGAi\CTTGCTCGAiCTGTTCAATiACGTCCTTAiAATGCAATT~CGTTTTCTG~ACCTATTGCiGTTTTTGTAiCTGTTTTTC~TATTTATCCiTTAGGACAAiCAGGTTGGTi QFELARSVQLRPYNAIAFSGPIAVFVSVFLIYPLGQSGWF

39360

TTTTGCACCiAGTTTTGGTI;TAGCTGCAAiTTTTAGATTiATTCTTTTTiTTCAAGGCTiTCATAACTG~ACTTTAAAC~CATTTCATA~GATGGGTGT~GCTGGAGTTiTAGGGGCTG~ FAPSFGVAAIFRFILFFQGFHNWTLNPFHMMGVAGVLGAA

39480

39600

39720

TTTAGCTTTnAATTTACGTGCCTACGATT~TGTTTCCCAATCCAGAATTiGAAACTTTT;ATACAAAAA6TATTTTATTi\AATGAAGGT~TTAGAGCTT~ LALNLRAYDFVSQEIRAAEDPEFETFYTKNILLNEGIRAW

39840

GATGGCAGCiCAAGATCAGiCTCATGAAA~TCTTGTATT~CCAGAGGAG~TTCTACCCC~TGGAAACGCiCTTTAATGG~ACTTTAGCT~TAGGTGGTC~TGATCAAGA~ACCACAGGTi MA A Q D Q P H E N L V F P E E V L P R G N A L === MKILYSQRRFYP(M)ETLFNGTLALGGRDQETTGF W'

39960

TTGCTTGGTGGGCAGGTnniGCTAGACTTATTAATTTAT~TGGAAAGTT~CTTGGAGCT~ATGTAGCTC~TGCTGGATT~ATTGTTTTT~GGGCTGGAG~AATGAATTT~TTTGAAGTT~ AWWAGNARLINLSGKLLGAHVAHAGLIVFWAGAMNLFEVA

40080

CTCATTTTG;ACCAGAAnniCCTATGTAT~AACAAGGATiAATACTACTiCCTCATTTA~CTACTTTAG~TTGGGGAGT~GGACCTGGTGGAGAAATTGiTGATACTTTiCCATATTTTG H F V P E K PM Y E QG L I L L P H LA T L G :21 G V G P G GE I V DT F P

40200 V

F

V

TGTCTGGAGiTCTTCATTTAATTTCTTCT;;CAGTTTTAG~TTTTGGTGGiATTTATCAT~CACTTATTG~ACCAGAAACiTTAGAAGAA~CTTTTCCGTiTTTTGGTTA~GTTTGGAAA~ SGVLHLISSAVLGFGGIYHALIGPETLEESFPFFGYVWKD

40320

40440

40560

TCATTGGCGGGCATGTATGiTTAGGTTCC;\TTTGTATTTiTGGGGGAATCTGGCATATTiTAACAAAACETTTTGCATGi;GCTCGTCGT;;CTTTGGTATi;GTCTGGGGA~GCTTACTTAi 1 G G H V W L G 5 I C I F G G I W H I L T K P FAN AR R A L V W S G E A Y

L

40680

CTTATAGTTiAGGTGCTATiGCTGTTTTTGGTTTTATTGCATAATACAGCTTATCCGAGiGAATTTTATGGTCCTACCGGTCCAGAAGCATCTCAAGCTC Y S L GA I A V F G F I AC C F VW F N N T A 'I P S E F Y G P

AQ

S 40800

T

G

P

E

A

S

Q

AAGCTTTTACTTTTTTAGTiAGAGATCAAEGTCTTGGAG~TAATGTAGG~TCAGCTCAA~GACCTACTG~ATTAGGGAA~TATATTATG~GTTCGCCCA~TGGAGA~T~ATTTTTGGT~ AFT F L V R DQ R L G A N V G S AQ G P TG L G K Y I M R S P TG E I I F

40920 G

G 41040

CAGAATACAiGACTCATGCiCCATTAGGAiCATTAAATTiCAACATCiCATTTCGTTi E Y M T H A P L G S L N S V G G V

41160 A

T

E

I

HA

V

:;

Y

V

5

P

R

S

W

L

A

T

S

H

F

V

L

TAGGTTTCTiTTTCTTTGTkGGCACTTA~GGCATGCTG~AAGAGCACG~GCTGCTGCA~CTGGTTTTG~AAAAGGAAT;GATCGTGAT;TTGAACCAG;TCTTTCTAT;;ACACCTCTT~ ID E D F E P V L S M T GFFFFVGHLWHAGRARAAAAGFEKG

P

L

N

41280

41400

ATTAATTAAiTAATTAATTnATTATTAACiAAAnnnncniTTTTTiTTTTACTTGiTTTTTTAGTiAATAATTAA~TTGCTAATTiAACTATTTA~TTTTTGAAAi ------+ _== +--------------------------~~~~~~~~~~~---> <---------------------------------TAAAAAAGGiGAGAGAGGGnTTTGAACCCiCGATAATCTiAAAAACTAT~TCGGTTTTC~AGACCGACG~CATAAACCA~TCGGCCATC~CTCCTATAGiAAACATTTT~AATCTAATA~ aer-MiA 3'-UCCUCUCUCUCCCUAAACUUGGGAGCUAUUAGAAUUUUUGAUAUAGCCAAAAGUUCUGGCUGCGGUAlJlJ~JGGUGAGCCGGUAGAGAGG-5'

41640

TTTTTTTCAkAAAAATTATiAAAGTTTGAiCAAATCGAA~TTATAAGTA~TTTTTTGAT~ATTTTACAA~AACAGGATT~GATGGTAAT~TTTTCATATiTATTAAAAA~TTGGAGAAT~ > <-----------f -----_ cAATGTT TTTGAT> ACAACTATGACTATAGCTTicCAATTGGCiGTGTTTGCACTAnTTGCTAiTTCATTTCTi:CTAGTAATTEGTGTTCCCGiAGTACTAGCiTCTCCTGAAGGTTGGTCAAETAACAAAAAi MT I A F Q L A V F A L I A IS F L L V I G V P V V L A S P E G W S S 62>

41520

+-----

GGAG ORF 41760 N

K

N 41880

GTTGTTTTTiCAGGTGCTTCTTTATGGATiGGATTAGTTiTTTTAGTAG~TATTCTTAAiTCGTTTATA~CTTAAAATT~TATAGTAAT~TAAATTTTA~GAATTTA~~CTTCCTTGGi V V F s G A s L W , G L ,, F L V G , L N s F , 5 i=z +-----__------ ---_____________ TTACATTATI\TTATAAATT~TAAATGCAT~TGAAACAAGiGCTTTAGAT~~AAAAATGiTTCCAAGGA~GTTTAATTA~AAAAAAATT~TATAAATAT~TATA~TTA~ATATATATAi ---______ - __....._ - .____._ ___- -------+ ---------------> <-----------TTATAT> +---------------

42000 ----42120

ATATATATAiATAATATnnGTATAATTTninTACGCGGGiATAGTTTAA~GGTAAAATT~CTCCTTGCC~AGGAGAATA~GCGGGTTCG~TTCCCGCTA~CCGCCAATT~AAATAATTA~ TATAAT> 61y-GCC> 5'-GCGGGUAUAGUUUAAUGGUAAAAUUCCUGGUUGCCAAGGAGAAlJAlJGCGGGlJlJCGAlJUCCCGCUACCCGCC-3' ------><--------------------+ CTTAAAAAAnGAAAATAAGiAAAAATATTiTTTTTTGGC~GAGACAGGA~TTGAACCTA~GACCTCAAG~TTATGAGCC~TGCGAGCTA~CAGACTGCT~TACTCCGCGiTATAATTAA~ 3'-ACCGCCUGUGUCCUAAACUUGGAUACUGGAGUUCCAAlJAClJCGGAACGClJCGA~lGGlJCUGACGAGAUGAGGCGC-5' +---------> <---------+

42240 4AIJ-ftkt

Figure 2. Nucleotide and deduced amino acid sequences of a region from rps'12 to trnfM(CAU) (1 to 42,240). Only the plus strand DNA sequence is shown. The deduced amino acid sequences in the 1-letter code and tRNA-like RNA sequences are presented in the lower line. Genes are designated in bold characters at their 5' termini. Broken arrows indicate inverted repeated sequences. Possible promoter sequences (TTGACA at "-35" and TATAAT at "-10") and consensus sequences of the group II intron (gugyg ... ragccg-augaa--gaaa--uucaugu-cgguuy ... cuayy-y-ay) are shown. SD-like sequences and termination codons are indicated by capitals and double underlining, respectively. Note that ORF29, trnC(GCA), trnR(UCU), trnG(UCC), ORF33, ORF30, ORF36a, ORF55(lhcA), trnD(GUC), trnY(GUA), trnE(UUC) and trnS(UGA) are oriented in the reverse strand.
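The promoter-like hexamers marked in the figure (TTGACA at "-35", TATAAT at "-10") can be located with a simple consensus scan. The following is only a minimal sketch of such a scan, not the procedure used by the authors: the function names, the mismatch threshold and the toy sequence are assumptions introduced for illustration.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_motif(seq, consensus, max_mismatches=1):
    """Return (1-based position, window, mismatches) for every window of
    seq that differs from consensus at no more than max_mismatches sites."""
    seq = seq.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        d = hamming(window, consensus)
        if d <= max_mismatches:
            hits.append((i + 1, window, d))
    return hits


# Toy sequence (hypothetical, not taken from the liverwort genome):
# a -35-like box and a -10-like box separated by 17 filler bases.
toy = "AAA" + "TTGACA" + "G" * 17 + "TATAAT" + "CC"

minus35 = find_motif(toy, "TTGACA", max_mismatches=0)
minus10 = find_motif(toy, "TATAAT", max_mismatches=0)
print(minus35, minus10)
```

Allowing one mismatch would also report hexamers such as TTGAAA, at the cost of more chance matches in an AT-rich genome.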

[Figure 3 (sequence panels, positions 56,410 to 42,011): minus-strand nucleotide sequence with deduced amino acid and tRNA sequences; labeled features include atpB, atpE, trnV(UAC), psbG, ORF169, trnT(UGU), rps4, psaA, rps14, trnfM(CAU) and trnG(GCC).]

Figure 3. Nucleotide and deduced amino acid sequences of a region from atpB to trnG(GCC) (56,410 to 42,011). As in Fig. 2, but the minus strand DNA sequence is shown. Genes trnM(CAU), trnF(GAA), trnL(UAA), trnS(GGA) and trnG(GCC) are located in the opposite direction.

structure of the latter is interrupted by a group II intron of 593 nucleotides at the 3' side of the D-stem (A23-C24; Table 5 of Ohyama et al., 1988), as seen in tobacco (Deno & Sugiura, 1983) and wheat (Quigley & Weil, 1985). Identification of the trnfM(CAU) gene (42,229 to 42,156) that encodes an initiator, not an elongator, has been described (Umesono et al., 1984). We found two more tRNA genes containing CAT anticodon sequences, one for methionine and the other for isoleucine. By comparison with known chloroplast tRNA gene sequences, we identified the tRNA gene (53,801 to 53,874) in the middle of the LSC region as a methionine acceptor trnM(CAU), and the other near the distal end of the LSC region as an isoleucine acceptor trnI(CAU) (Fukuzawa et al., 1988). Two threonine tRNA genes, trnT(GGU) (38,367 to 38,438) and trnT(UGU) (50,333 to 50,261), are 11.8 kb apart on opposite DNA strands (Fig. 1). They do not contain any mismatched base-pairings or introns. One of three chloroplast arginine tRNA genes, trnR(UCU) (21,321 to 21,250), is downstream from trnG(UCC) with a spacer of 63 bp (Fig. 1). The other two arginine tRNA genes, trnR(CCG) (Fukuzawa et al., 1988) and trnR(ACG) (Kohchi et al., 1988), contain mismatched base-pairings in the aminoacyl stems and are highly conserved in their primary structures (80% identical). However, the trnR(UCU) gene predicted a normal aminoacyl stem of 7 bp and is divergent from the isoacceptor sequences (45 to 51% identical). Three tRNA genes, trnE(UUC) (36,787 to 36,715) for glutamic acid, trnY(GUA) (36,643 to 36,562) for tyrosine, and trnD(GUC) (36,484 to 36,411) for aspartic acid, are tandemly oriented with respective spacers of 71 bp and 77 bp in this order. No other tRNA genes encoding glutamic acid, tyrosine or aspartic acid were identified in the liverwort chloroplast genome. Barley chloroplast tRNAGlu(UUC) molecules are identical with RNADALA, which is an essential component for chlorophyll biosynthesis, as reported by Schön et al. (1986), who pointed out the presence of an A·U pairing instead of a conserved G·C pairing at the distal end of the TΨ stem of the barley tRNAGlu(UUC). This A·U pairing would be conserved in the liverwort tRNAGlu(UUC), because the cognate trnE(UUC) gene contains A53 and T61


Figure 4. Possible secondary structure of the trnL(UAA) intron. A secondary structure was constructed according to the proposal of Michel & Dujon (1983). Open boxes 9R', A, B, 9L, 9R and 2 indicate conserved sequence elements found in group I introns (Cech & Bass, 1986). Boxes 9L and 2 are complementary. Putative splicing points are marked with arrows.


(Table 5 of Ohyama et al., 1988). Products of trnD(GUC) and trnY(GUA) might contain mispairings in the anticodon stem; C31·A41 in trnD(GUC) and A28·C44 in trnY(GUA) were observed (Table 5 of Ohyama et al., 1988). One of two chloroplast valine tRNA genes, trnV(UAC) (53,652 to 53,051), and a lysine tRNA gene, trnK(UUU) (26,040 to 28,222), are split by long introns at the anticodon loop-anticodon stem junction: that is, A37-C38 of tRNALys(UUU) and C37-C38 of tRNAVal(UAC) (Tables 2 and 5 of Ohyama et al., 1988). The introns found in trnV(UAC) and trnK(UUU) are 530 and 2111 nucleotides long, respectively. The split form of trnV(UAC) seems to be conserved among chloroplast genomes in higher plants (Deno et al., 1982; Zurawski & Clegg, 1984; Krebbers et al., 1984), but not in Euglena gracilis (Hallick et al., 1984). Furthermore, the trnK(UUU) intron could encode a polypeptide of 370 amino acid residues (ORF370i) with a sequence 34.4% identical with that reported in the tobacco trnK(UUU) intron (509 amino acid residues long; Sugita et al., 1985). We have not detected any other copy of the tRNALys gene in the liverwort chloroplast genome. The spliced cloverleaf structure of tRNALys(UUU) contains a mispairing of C26·C42. The 5' end of a histidine tRNA gene, trnH(GUG) (29,595 to 29,669), was tentatively assigned according to the maize sequence (Schwarz et al., 1981). The deduced tRNAHis(GUG) would contain a mispairing in the anticodon stem, because U30 and U44 are predicted from the DNA sequence (Tables 2 and 5 of Ohyama et al., 1988). A phenylalanine tRNA gene, trnF(GAA) (50,998 to 51,070), is tightly linked to the trnL(UAA) gene; they are oriented as 5'-trnL(UAA)-76 bp-trnF(GAA)-3'. The genes trnH(GUG), trnF(GAA), trnC(GCA) (the gene for a cysteine tRNA, 5720 to 5650) and trnQ(UUG) (the gene for a glutamine tRNA, 23,804 to 23,875) are each present as a single copy among the chloroplast-encoded tRNA genes.
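The coordinates quoted for the tRNA genes can be cross-checked with simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that coordinates are 1-based and inclusive and that a spacer counts only the bases strictly between two genes. Under those assumptions, the two split genes leave 72 nucleotides of exon sequence each, and the trnE-trnY-trnD cluster gives the 71 bp and 77 bp spacers stated in the text.

```python
def span_length(start, end):
    """Length of a feature given 1-based inclusive coordinates.
    Works for either strand (start may be larger than end)."""
    return abs(end - start) + 1


def exon_total(start, end, intron_length):
    """Nucleotides left in the exons after removing a single intron."""
    return span_length(start, end) - intron_length


def spacer(end_of_first, start_of_second):
    """Bases strictly between two neighbouring features."""
    return abs(start_of_second - end_of_first) - 1


# Split tRNA genes: (start, end, intron length), values as given in the text.
split_genes = {
    "trnV(UAC)": (53652, 53051, 530),
    "trnK(UUU)": (26040, 28222, 2111),
}
for name, (start, end, intron) in split_genes.items():
    print(name, span_length(start, end), exon_total(start, end, intron))

# Tandem cluster trnE(UUC) -> trnY(GUA) -> trnD(GUC) on the minus strand.
print(spacer(36715, 36643), spacer(36562, 36484))
```

The same `span_length` helper gives 75, 73, 71 and 72 nucleotides for trnH(GUG), trnF(GAA), trnC(GCA) and trnQ(UUG), respectively, from the coordinates listed above.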

(b) Ribosomal protein genes

Five ribosomal protein genes were found in this region: rps2, rps4, rps7, rps'12 and rps14, encoding the counterparts of Escherichia coli S2, S4, S7, S12 and S14 ribosomal proteins, respectively (Fig. 1). The rps2 gene (16,055 to 16,762) encodes 235 amino acid residues, and the predicted polypeptide is 44.3% identical with the E. coli counterpart (241 amino acid residues; An et al., 1981) and 72.3% identical with that of pea chloroplasts (236 amino acid residues; Cozens & Walker, 1986; and see Fig. 5(a)). A putative protein predicted by the rps4 gene (50,033 to 49,425), consisting of 202 amino acid residues, shares 40.1% and 69.8% homology with proteins in E. coli (205 amino acid residues; Bedwell et al., 1985) and maize chloroplasts (201 amino acid residues; Subramanian et al., 1983; and see Fig. 5(b)), respectively.

The rps7 (892 to 1359) and rps'12 (1 to 842) genes were first located by a heterologous Southern hybridization experiment using a cloned Eu. gracilis chloroplast DNA fragment as a probe (data not shown). Sequence analysis confirmed the presence of these two genes, but failed to detect an EF-Tu gene that is tightly linked to the rps12 and rps7 genes in Eu. gracilis (Montandon & Stutz, 1983, 1984). A polypeptide predicted from the rps7 gene, starting 50 bp downstream from rps'12, consists of 155 amino acid residues and is 43.8% and 44.5% identical with those of E. coli (154 amino acid residues; Reinbolt et al., 1978) and Eu. gracilis (156 amino acid residues; Montandon & Stutz, 1984), respectively (Fig. 5(c)). The first exon (rps12') is found in the DNA strand opposite the second exon (rps'12), suggesting the possibility of trans-splicing in vivo (Fukuzawa et al., 1986). The second and third exons encoded by the rps'12 gene are also separated by a group II intron of 500 nucleotides. A similar organization of the rps12 gene has been suggested for tobacco chloroplasts (Fromm et al., 1986; Torazawa et al., 1986). The rps14 gene (42,635 to 42,333) has been identified (Umesono et al., 1984); it encodes 100 amino acid residues and is 45.0% and 71.0% homologous to the counterparts in E. coli (Cerretti et al., 1983) and spinach (Kirsch et al., 1986), respectively (Fig. 5(d)).
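The stated polypeptide lengths follow from the gene coordinates. As a rough check, and assuming (this is not stated explicitly in the text) that each coordinate pair is 1-based, inclusive, and includes the termination codon, the number of amino acid residues is the number of codons minus one. The function name below is hypothetical.

```python
def sense_codons(start, end, intron_lengths=()):
    """Number of amino-acid-encoding codons in a gene.

    Assumes 1-based inclusive coordinates that include the stop codon;
    any intron lengths are subtracted before dividing by three.
    """
    coding = abs(end - start) + 1 - sum(intron_lengths)
    if coding % 3 != 0:
        raise ValueError("coding length is not a multiple of three")
    return coding // 3 - 1  # drop the termination codon


# Coordinates as quoted in the text for the uninterrupted rps genes.
genes = {
    "rps2": (16055, 16762),
    "rps4": (50033, 49425),   # minus strand, so start > end
    "rps7": (892, 1359),
    "rps14": (42635, 42333),  # minus strand
}
for name, (start, end) in genes.items():
    print(name, sense_codons(start, end))
```

With these inputs the sketch reproduces the 235, 202, 155 and 100 residues given for rps2, rps4, rps7 and rps14.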

(c) RNA polymerase genes

Three large tandemly oriented open reading frames (ORFs) have amino acid sequence homologies to either the β or β' subunits of E. coli RNA polymerase. We designated these chloroplast ORFs as rpoB, rpoC1 and rpoC2. The chloroplast rpoB gene (5859 to 9056) encodes 1065 amino acid residues. The amino acid sequence is 43.6% identical with that of the E. coli β subunit (1342 amino acid residues; Ovchinnikov et al., 1981), if large gaps are introduced in the chloroplast rpoB gene product. The amino acid sequence predicted from the liverwort rpoB gene is 68.5% identical with that of tobacco (1070 amino acid residues; Ohme et al., 1986; and see Fig. 6(a)). The chloroplast counterpart of the E. coli β' subunit may be encoded by two separate genes, chloroplast rpoC1 and rpoC2, in the liverwort genome. The rpoC1 gene (9087 to 11,737) is located 31 bp downstream from rpoB and consists of 684 sense codons with a group II intron (596 nucleotides long) between the 124th and 125th codons (Fig. 2). The amino acid sequence predicted from the rpoC1 gene shares 41.7% homology with the N-terminal 580 amino acid sequence of the E. coli β' subunit (1407 amino acid residues; Ovchinnikov et al., 1982; and see Fig. 6(b)). The C-terminal portion (amino acid residues 541 to 684) of the rpoC1 protein is quite different in sequence and size from the corresponding region (512 to 580) of the E. coli β' subunit.
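For a split gene such as rpoC1, the codon count follows from the coordinate span only after the intron is subtracted. The sketch below illustrates that arithmetic with the figures quoted above; it assumes inclusive coordinates and one stop codon inside each span, and the helper names are hypothetical rather than taken from the paper.

```python
def exon_nucleotides(start, end, intron_lengths=()):
    """Nucleotides left after removing introns from an inclusive span."""
    return abs(end - start) + 1 - sum(intron_lengths)


def sense_codons(start, end, intron_lengths=()):
    """Codon count excluding one stop codon (assumed to lie in the span)."""
    codons, remainder = divmod(exon_nucleotides(start, end, intron_lengths), 3)
    if remainder:
        raise ValueError("spliced length is not a multiple of three")
    return codons - 1


def downstream_offset(upstream_end, downstream_start):
    """Distance from the last base of one gene to the first base of the next."""
    return abs(downstream_start - upstream_end)


rpoB_codons = sense_codons(5859, 9056)
rpoC1_codons = sense_codons(9087, 11737, intron_lengths=(596,))
rpoB_to_rpoC1 = downstream_offset(9056, 9087)
print(rpoB_codons, rpoC1_codons, rpoB_to_rpoC1)
```

With these assumptions the spans reproduce the 1065 residues of rpoB, the 684 sense codons of rpoC1 and the 31 bp offset between the two genes.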

[Figure 5 sequence alignments not reproduced. Lengths in amino acid residues, with percentage identity to the liverwort protein:
(a) rps2: liverwort 235; pea 236, 72.3%; E. coli 241, 44.3%.
(b) rps4: liverwort 202; maize 201, 69.8%; E. coli 205, 40.1%.
(c) rps7: liverwort 155; Eu. gracilis 156, 44.5%; E. coli 154, 43.8%.
(d) rps14: liverwort 100; spinach 100, 71.0%; E. coli 99, 45.0%.]

Figure 5. Amino acid sequence comparison of ribosomal proteins. Amino acid sequences are translated from the respective DNA sequences except for E. coli S7 protein. Identical residues are marked with colons. The bar indicates artificial shifting to maximize homology. (a) E. coli S2 protein homologue. Pea (Cozens & Walker, 1986), and E. coli (An et al., 1981). (b) E. coli S4 homologue. Maize (Subramanian et al., 1983), and E. coli (Bedwell et al., 1985). (c) E. coli S7 homologue. Eu. gracilis (Montandon & Stutz, 1984), and E. coli (Reinbolt et al., 1978). (d) E. coli S14 homologue. Spinach (Kirsch et al., 1986) and E. coli (Cerretti et al., 1983).
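The colon notation and the percentage figures in such comparisons can be illustrated with a short sketch. It assumes that identity is counted over aligned positions and expressed relative to the ungapped length of the reference (first) sequence; the paper does not state its denominator, so this convention, the function names and the toy sequences are all assumptions made for illustration.

```python
def marker_row(reference, other):
    """Render `other` against `reference`, writing ':' at identical residues."""
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have equal length")
    return "".join(":" if a == b and a != "-" else b
                   for a, b in zip(reference, other))


def percent_identity(reference, other):
    """Identical aligned residues as a percentage of the reference length."""
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for a, b in zip(reference, other) if a == b and a != "-")
    reference_length = sum(1 for a in reference if a != "-")
    return round(100.0 * identical / reference_length, 1)


# Toy alignment (hypothetical sequences, '-' marks a gap).
toy_reference = "MKTAY-LV"
toy_other = "MRTA-QLV"
toy_markers = marker_row(toy_reference, toy_other)
toy_identity = percent_identity(toy_reference, toy_other)
print(toy_markers, toy_identity)
```

In the toy case five of the seven reference residues are matched, so the second row is printed as `:R::-Q::` and the identity is 71.4%.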

The rpoC1 gene is followed by the rpoC2 gene (11,811 to 15,971) encoding 1386 amino acids. The initiation codon of rpoC2 was tentatively assigned to 74 bp downstream from rpoC1, because of the presence of an SD-like sequence upstream from the rpoC2 coding sequence. The product of the rpoC2 gene corresponds to the C-terminal half (581 to 1407) of the E. coli β' subunit, if large gaps are introduced in the E. coli sequence. According to the alignment in Figure 6(c), the N-terminal 349 and C-terminal 283 amino acid residues are 40.7% and 37.8% homologous to the corresponding portions (581 to 948 and 1103 to 1407) of the E. coli β' protein. The discrepancy is caused by the additional residues in the rpoC2 protein (amino acid residues 350 to 1103; filled arrows), which is much longer than the corresponding E. coli sequence (949 to 1102). The possibility cannot be ruled out that the central portion of the rpoC2 gene, as well as the region between the rpoC1 and rpoC2 genes, represents introns, although we have not detected the characteristic structures of either group I or group II introns. An rpoC2-like gene can be seen in pea chloroplast DNA, although its 5' part was not described (Cozens & Walker, 1986). This partial ORF encoding 1163 amino acids is comparable in size to the corresponding region of the liverwort rpoC2 (amino acid residues 205 to 1386). Amino acid sequences are less homologous (32.9%) in the central regions (amino acid residues 476 to 1080 of liverwort rpoC2 and 276 to 876 of pea ORF) than in portions of N termini (65.3%, 205 to 475 of liverwort and 1 to 275 of pea; open arrows) and C termini (63.8%, 1081 to 1320 of liverwort and 877 to 1123 of pea).

(d) ATPase subunit genes

Six genes (atpA, atpB, atpE, atpF, atpH and atpI) for the chloroplast H+-ATPase, which consists of nine non-identical subunits, were identified by sequence comparison with genes in higher plant chloroplast genomes. In addition to the split structure of atpF (587 bp group II intron), the organization of the atp genes is also conserved in the liverwort chloroplast genome. These genes form two clusters: 5'-atpI-… bp-atpH-208 bp-atpF-44 bp-atpA-3' and 5'-atpB-5 bp-atpE-3' (Fig. 1). An F0-IV subunit (248 amino acid residues) encoded by the atpI gene (16,890 to 17,636) is

[Figure 6 sequence alignments not reproduced. Lengths in amino acid residues, with percentage identity to the liverwort protein:
(a) rpoB (RNA polymerase β subunit): liverwort 1065; tobacco 1070, 68.5%; E. coli 1342, 43.6%.
(b) rpoC1 (RNA polymerase β' subunit): liverwort 684; E. coli rpoC (residues 1 to 580), 41.7%.
(c) rpoC2 (RNA polymerase β' subunit, continued): liverwort 1386; E. coli rpoC (residues 581 to 1407 of 1407), 20.9%; pea (partial, more than 1163), 45.4%.]

Figure 6. Amino acid sequences of RNA polymerase subunits deduced from the DNA sequences. (a) E. coli β subunit homologue. Tobacco (Ohme et al., 1986), and E. coli (Ovchinnikov et al., 1981). (b) and (c) E. coli β' subunit homologue. The liverwort rpoC1 sequence was obtained after removal of a putative intron between D124 and L125 (arrow). In (b), together with the rpoC1 product, only the N-terminal 580 amino acid residues of the E. coli β' subunit (Ovchinnikov et al., 1982) are shown. The following C-terminal portion (amino acid residues 581 to 1407) is compared with liverwort, and pea (partial) rpoC2 products (Cozens & Walker, 1986) in (c).

82.3% and 80.2% identical with the subunits of pea (247 amino acid residues; Cozens et al., 1986) and spinach (247 amino acid residues; Henning & Herrmann, 1986), respectively. They diverge from one another mainly in the N-terminal 16 amino acid residues (Fig. 7(a)). A relatively small F0-III subunit protein (81 amino acid residues) is the atpH (18,014 to 18,259) product. This protein seems to be highly conserved (97.5%) among plant species: the liverwort protein differs from both the wheat (Howe et al., 1982) and spinach (Alt et al., 1983) sequences in only two amino acids (alanine to serine at the 6th and valine to isoleucine at the 26th), and both changes can be explained by single-base substitutions (Ala to Ser, GCT to TCT; Val to Ile, GTT to ATT; Fig. 7(b)). A split gene, atpF (18,468 to 19,609), encodes an F0-I subunit (184 amino acid residues), to judge from the amino acid sequence similarities to the sequences of spinach (50.8%; Henning & Herrmann, 1986) and wheat (49.2%; Bird et al., 1985; and see Fig. 7(c)). Divergence of the liverwort atpF gene was observed in the length of the intron. It consists of 587 nucleotides, but the spinach and wheat introns have 764 bp and 823 bp, respectively (Bird et al., 1985; Henning & Herrmann, 1986). The nucleotide-binding subunit α of F1-ATPase (507 amino acid residues) is encoded by atpA (19,654 to 21,177). The amino acid sequence of the liverwort α subunit is highly homologous to the sequences of tobacco (87.6%; Deno et al., 1983) and wheat (82.8%; Howe et al., 1985). However, the N- and C-terminal portions of the three plant proteins are less similar to the E. coli protein (Fig. 7(d)). The atp gene cluster, atpBE, is about 30 kb from the atpIHFA cluster and is transcribed in the opposite direction. The nucleotide-binding subunit β coded for by the atpB gene (55,846 to 54,368; 492 amino acid residues) is 89.0%, 87.2% and 87.0%

[Figure 7 sequence alignments not reproduced. Lengths in amino acid residues, with percentage identity to the liverwort protein where legible:
(a) atpI (H+-ATPase F0 subunit IV): liverwort 248; spinach 247, 80.2%; pea 247, 82.3%.
(b) atpH (H+-ATPase F0 subunit III): liverwort 81; spinach 81, 97.5%; wheat 81, 97.5%.
(c) atpF (H+-ATPase F0 subunit I): liverwort 184; spinach 184, 50.8%; wheat 183, 49.2%.
(d) atpA (H+-ATPase F1 subunit α): liverwort 507; tobacco 507, 87.6%; wheat 504, 82.8%; E. coli 513, 55.0%.
(e) atpB (H+-ATPase F1 subunit β): liverwort 492; spinach 498, 89.0%; maize 498, 87.2%; pea 491, 87.0%; E. coli 460.
(f) atpE (H+-ATPase F1 subunit ε): liverwort 135; spinach 134, 63.0%; maize 137, 45.9%; pea 137, 60.7%; E. coli 133, 23.0%.]

Figure 7. Amino acid sequence homologies in each subunit of H+-ATPase. (a) ATPase F0 subunit IV or a. Spinach (Henning & Herrmann, 1986) and pea (Cozens et al., 1986). (b) ATPase F0 subunit III. Spinach (Alt et al., 1983) and wheat (Howe et al., 1982). (c) ATPase F0 subunit I. An intron is located at the 49th leucine codon (an arrow). Spinach (Henning & Herrmann, 1986) and wheat (Bird et al., 1985). (d) ATPase F1 subunit α. Tobacco (Deno et al., 1983), wheat (Howe et al., 1985), and E. coli (Kanazawa et al., 1981). (e) ATPase F1 subunit β. (f) ATPase F1 subunit ε. Subunits β and ε of spinach (Zurawski et al., 1982a), maize (Krebbers et al., 1982), E. coli (Saraste et al., 1981; Kanazawa et al., 1982), and pea (Zurawski et al., 1986).

identical with the β subunits of spinach (Zurawski et al., 1982a), maize (Krebbers et al., 1982), and pea (Zurawski et al., 1986), respectively. About 60 amino acid residues at the N terminus are rather divergent among these plant species (Fig. 7(e)). In contrast to the conserved β subunits, the liverwort ε subunit (135 amino acid residues), the product of the atpE gene (54,362 to 53,955), is less homologous to those of spinach (63.0%; Zurawski et al., 1982a), pea (60.7%; Zurawski et al., 1986) and maize (45.9%; Krebbers et al., 1982; and see Fig. 7(f)). Although the initiation codon of atpE overlaps the termination codon of atpB in most of the higher plant chloroplast genomes, the atpB and atpE genes in liverwort are separated by a spacer of five nucleotides.

(e) Genes for photosystem I polypeptides

There are two tightly linked genes encoding photosystem I P700 chlorophyll a apoproteins (psaA and psaB) that are partly homologous (Fig. 1). These genes were identified by comparison with genes in maize (Fish et al., 1985) and spinach (Kirsch et al., 1986). The psaA (47,207 to 44,955) and psaB (44,928 to 42,724) genes code for polypeptides 750 and 734 amino acid residues long, respectively. The liverwort psaA protein is highly homologous to the spinach (93.2%) and maize (91.2%) products (Fig. 8(a)). Likewise, the psaB protein is conserved (spinach, 92.3%; maize, 91.2%; see Fig. 8(b)). Spacers between the psaA and psaB genes are relatively short (maize and spinach, 25 bp; liverwort, 26 bp), and their sequences are nearly identical: 5'-TGGCTAAGGAGGATTTGAAAtGCATT-3'. There is a transition of A to G in maize (G in the preceding nucleotide sequence) and an insertion of A in liverwort. This spacer may represent a signal for translational control of the downstream psaB gene.
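The spacer lengths quoted for the atpB-atpE and psaA-psaB pairs follow directly from the gene coordinates. A minimal sketch of that arithmetic is given below, assuming inclusive coordinates that cover each coding region from initiation codon to stop codon; the function name `spacer_length` is illustrative only.

```python
def spacer_length(gene_a, gene_b):
    """Nucleotides lying strictly between two non-overlapping genes.

    Each gene is a (start, end) pair of inclusive coordinates; genes on
    the opposite strand may be written with the larger coordinate first.
    """
    a_low, a_high = sorted(gene_a)
    b_low, b_high = sorted(gene_b)
    if a_high < b_low:
        return b_low - a_high - 1
    if b_high < a_low:
        return a_low - b_high - 1
    raise ValueError("genes overlap; no spacer")


# Coordinates as quoted in the text.
atpB, atpE = (55846, 54368), (54362, 53955)
psaA, psaB = (47207, 44955), (44928, 42724)

atpB_atpE_spacer = spacer_length(atpB, atpE)
psaA_psaB_spacer = spacer_length(psaA, psaB)
print(atpB_atpE_spacer, psaA_psaB_spacer)
```

Under these assumptions the coordinates give 5 and 26 nucleotides, matching the spacers described in the text.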

(f) Genes for photosystem II polypeptides

Four (psbA, psbC, psbD and psbG) of the eight photosystem II polypeptide genes so far identified in higher plant chloroplast genomes are in the region. Two of them (5'-psbD-psbC-3') are tandemly oriented, and no intron-like sequence was found in these genes (Fig. 1). A 32,000 Mr protein predicted from the psbA gene (28,368 to 29,429) consists of 353 amino acid residues and is highly conserved relative to the protein of higher plants; it is 96.9% identical with that of spinach (Zurawski et al., 1982b) and 96.6% with that of soybean (Spielmann & Stutz, 1983). The higher plant psbA genes do not contain any codons for lysine, but in liverwort there is a lysine codon at the 238th residue instead of the arginine observed in higher plants (Fig. 9(a)). Another protein called D2 is encoded by the psbD gene (38,855 to 39,916); it is structurally related to the 32,000 Mr protein, as reported by Rochaix et al. (1984). The molecular size of the liverwort psbD protein (353 amino acid residues) is nearly identical with that of the psbA protein, and the primary structure is highly conserved with the counterparts in spinach (96.3%; Alt et al., 1984; Holschuh et al., 1984) and pea (96.3%; Rasmussen et al., 1984; and see Fig. 9(b)). One of the photosystem II chlorophyll a binding proteins is coded for by the psbC gene (39,864 to 41,285), and the other is encoded by psbB (Fukuzawa et al., 1988), located about 28 kb apart in the liverwort chloroplast genome. We first identified an ORF (previously designated as ORF701) from the observation that the ORF701 product affected the antibiotic sensitivity of the host E. coli cells (Umesono et al., 1984). Sequence comparison between ORF701 and the spinach psbC confirmed the identity. The amino acid sequence deduced from ORF701 shares 94.7% homology with spinach psbC (Alt et al., 1984; Holschuh et al., 1984), if the initiation codon of the psbC gene is taken at the second ATG of ORF701. The liverwort psbC protein contains 473 amino acid residues (Fig. 9(c)), and the 5'-terminal portion of psbC overlaps the 3' terminus of psbD by 53 bp (Fig. 2). This over-

324

K. Umesono (a)

ps.s4

Liverwort Spinach Maize

(Photosystk

I P7DD

chlorophyll

et al.

a apopmtein)

MTIRSPEPEiKIVVEKDPViTsFEKWAKPi;HFSRTLAKG-~slTTWIWNL~ADAHDFDSH~NDLEEISRK~FSAHFGQLAiIFIWLSGMY~HGARFSNYE :I::::::::::L:DR::::::::A::::::::::I:::-:E:::::::::::::::::::S::::::::I::::::::S:::L:::::::::::::::: :I:::S::::::A:DR::I:::::E::R:::::::I:::N:D::::::::::::::::::: G:::::::::::::::::S:::L:::::::::::::::: ;WLSDPTHIKPSA~VVWPIV~QEILNGDVG~GF~GIQIT~FFQLWRASGiTSELQLYST~IGGLVFAAL~LFAGWFHYH~AAPKLAWFQ~VESMLNHHL .. .. .. .. .. .. .. .. .. G.......................R..........I.............t.... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A.................................... . . . . . .. .. . . . . .. . . . . . .. . . . . .. .. .. . . . . .. .. .. .. .. .. .. ..G.......................R..........I.............C....A.I..S............................... . . .. .. . . . . . . .. . . . . . . . . . .. . . .. .. . . . . . . . . . . .. .. . . . .. . . . . . .. . . . . .. .. . . . . .. . . . . . . . . ~GLLGLG~L~~AGH~~HV~L~IN~LLDAGV~PKEIPLPHE~ILNRDLLAE~Y~~FAKGLT~FFTLNWS~Y~DFLTFRGGL~~~TGGLWLT~TAHHHLAIA .. .. .. .. .. .. .. .. .. .. .. .. .. ..I........F.N.............L........~...... .. ........ . . . . . . . . . . . . . . . . . . . . . . . . . . . E:A:::::::::K:A:::::::::D::::::::::::::::::: .. .. .. .. .. .. .. .. .. .. .. .. .. ..I........F........................o...... .. . . . . . .. . . . . . . . .. . . . . . . .. . . . . .. . . E:A:::::::::K:AE::S::::: D:I::::::S:l:::::::: CLFLVAGHMY~TNWGIGHSFKEILEAHKGP~TGEGHKGLY~ILTTSWHA~~ALNLAMLG~~TIIVAHHMY~MPPYPYLAT~YGT~L~LFT~HMWIGGFLI 1:::I:::::::::::::GL:D:::::::::::Q:::::::::::::::::::::::::::::V:::::::::::::::::::::::::::::::::::: I:::I:::::::::::::GL:D:::::::::::9:::::::::::::::::S::::::::T::V::::::S::::::::::::::::::::::::::::: CGAAAHAAIF~~RD~DPTT~CNNLLDRVLR~RDAIISHLN~V~IFLGFH~~GLYIHNDTM~ALGRP~DMF~DTAI~L~P~~ADWIQNT~A~APNFTAPNA A...................................................SA:::G: :::::::::::::::::::R::D:::::::::::::::::: . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . 
.. .. . . . . .. . . . . . .. .. . . :::::::::::::::::::R::D:::::::::::::::::::::::::::::::::::::::::::::::::A::::::I:::::::I::G::GV:::G: ~AST~LTWGG~DVIAVG~KV~LLPIPLGTA~FLVHHIHAF~IHVTVLILL~GVLFAR~~R~I~DKANLGF~FP~DGPGRG~TC~~~AWDH~FLGLF~~~~ T:::::::::S:LV:::G............,..................................................................... . . . .. . . . . . . . . .. . . . . . .. . . . . .. . . . . . .. . . . . . . . . .. . . . . .. . . . . . .. . . . . .. . . . . . . . .. . . . . . . . .. TT:::::::::ELV:I:G.............................................................::::::::::::::::::::: . . .. . . . . . . . .. . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . .. .. . . . ~ISVVIFHFS;KMPSDVWGTiSEQGVVTHIiGGNFADSAI~INGWLRDFL~AQASDVIQS~GSSLSAYGL~FLGAHFVWA~SLMFLFSGR~YWQELIESI :::::::::::::::::::S::D:::::::::::::::S:::::::::::::::::::::::::::::::F::::::::::::::::::::::::::::: ::::::::::::::::::::::O::I::::::::::::s:::::::::::::::::::::::::::::::F::::::::::::::::::::::::::::: ;WAHNKLKVAPAIOPRAL~I~PGRAVGVAH;LLGGIATTW~FFLARIIAV~ .. .. .. .. .. .. .. .. .. .. ..T.......V.......T...................... . . .,..... . . . . . .. . . . . .. . . . . .. . . . . . .. . . . .. .. .. .. .. .. .. .. .. .. ..T.......I....... . . . . . . . . . . . . . . . . T....,................. . . .. . . . . . . .. . . . .. .. . . .

(b)

ps.&

Liverwort Spinach Maize

(Photosystem

I P~DD chlorophyll

750

e apoprotein)

MASRFPKFS~GLSDDPTTRklWFGIATAH~FESHDDMTE~RLYDKIFAS~FGDLAIIFL~TSGNLFHVA~DGNFEAWGD~PLHVRPIAH~lWDPHFGDP~ ::L:::R:::::A.......................I:::::::N ::::::::::::::::::::::::::::::s:v:::::::::::::::::::::: . . . . . . .. .. . . . . . . .. . . . . . N..............................S:I:::::::::::::::::::::: :EL:::R:::::,.......................I....... . . . . . . . . . . . . . . . . . . . . . . . . ..*... . . . . . . . . . . . . . . . . . . . . . . ...*.... VEAFTRGGA~GPVNIAYSGtY~WWYTIGL~TNQDLyNGAiFLVlLSSIS~IAGWLHL~P~WKPKVSWFK~AESRLNHHL~GLFGVSSLA~TGHLVHVAI~ .. .. .. .. .. .. .. ..L......................E...T....:LF::V:::LG........... . . . . . .. . . . . . . . . .. . . . . . . . . . . . . . . . . . . s...............'.......... . . . . . . . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..A......................E...T....FLF..TL...G.::::::::::SL::::::::::::::::::::::::::::::::::: . . . .. .. . . . . .. .. . . . . . . .. . . . . .. . . E-SRGEHVRWbNFLTKLPHPiCLGPFFAG~~Nt"AQNVDS~NHAFGTSQG~GTAILTFIG~FHP~TQSLW~TDIAHHHLAiAVVFIIAGH~YRTNFG~GH G-::::Y:::N:::DV::::Q::::L:T::::L:::: P:::S:L:::::::::::::LL::::::::::::::M::::::::F::LV::::::::::::: . . . . . . . . . . . . . . . . . . . . . ..FI.L....~~~~~~~~~~ GS::::Y:::N:::DV::Y:O::::LLT::::L::::P:::::L:::T:::::::::LL....................... ~IKEILE~HTP~GGRLGRGH;GLYDTINNS~HFQLGLALA~LGVITSLVA~HMYSLPPYA~LA~DFTTQA~LYTHH~YIA~FIMTGAFAH~AIFFIRDYN :M:DL::A:I...............................................A... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I.*........................... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. .. .. ,. .. .. .. .. .. :::,L::A...................... . . . . . . . . . . . . . . . . . . . . . . I..........................A...I...................................... .. . . . . .. . . . . . .. . . . . . . .. . . . . .. . . . .. .. . . . .. . . . . . .. . . . . .. . . . . . . .. .. . . . 
~E~NKDNVLA~~~LEHKEAIISHLSWASLFLGFHTLGL~VH~DVMLAFGTP~K~ILIE~IF~~WI~~AHGK~L~GFDVLL~~TNN~AFNAG~SI~L~G~LD ::::E........D........................................................TS::::::::::SG::::::R........N . .._.... . . . . . .. . . . .. . . . . . . . .. . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . .. . .. .. ...E........D......................p . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..TT....I .. ::::::G:T::::RN:::::::N

.......,

6tNNNSNSLFiTIGPGDFLVkHAIALGLHT;TLILVKGAL~ARGSKLMPD~KEFGYSFPC~GPGRGGTCDiSAWDAFYLA~FWMLNTlGW~TFYWHWKHl ~V~~....................................".'....."' . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D....................................: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. :V:E....................................*.. . . . . D............................................... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. .. .. .. ...a.. . . .. . . . . . .. . . . . . .. . . . . .. . . . . . .. . . . . .. . . . . . . . . .. ~LW~GNAA~FNESSTYLMGW~RDYLWLNSS~)LINGYNPFG~N~L~VWAWM~LFGHLVWAT~F~FLI~WRG~W~ELIETLA~AHERT~~AN~~R~KDK~~A ~~~~~~vs................................'............ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. . . . . . . . . ..I..R..... ~~~~~~vs................,........................................" . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7:::::::::::::::::::::::I::R~~~~~ iSIVQARLVGiAHFSVGYIFivnnFLIAST:GKFG .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..*.......... .. . . . . . . . .. . . ::v::::::::::::::::::::::::::::::::

734 734 735

92.3% 91.2%

Figure 8. Amino acid sequences of photosystem I P700 chlorophyl a apoproteins. Liverwort (a) psaA and (b) psaB proteins share 45% homology, as do those of maize (Fish et al., 1985) and spinach (Kirsch et al., 1986).

lapping of p&D and psbC has been described for spinach (Alt et al., 1984; Holschuh et al., 1984). However, we cannot rule out the possibility of the presence of a conserved GTG codon 36 nucleotides downstream from the assigned ATG, because there is an SD sequence (AGGAGG at 39,885 to 39,890 in Fig. 2). Perhaps this GTG is the initiation codon for translation of the psbC mRNA. Steinmetz et al. (1986) identified a new protein (248 amino acid residues long) associated with the photosystem II complex, and analysed the fine structure of its gene, p&G, on the maize chloroplast genome. The liverwort counterpart of psbG (52,524

to 51,793) encoding 243 amino acids was identified and the predicted amino acid sequences were compared. Unlike other photosystem II polypeptides, the psbG proteins had significantly diverged in both the N and C-terminal portions; on the average, they are only 65*Oo/o homologous to proteins in the maize psbG product, although the central portions (maize, amino acid residues 36 to 182; liverwort, 37 to 183) are 91.8% identical (Fig. 9(d)). The liverwort psbG gene overlaps the last seven nucleotides of the preceding ndh3 gene (Fig. 3), and may correspond to tobacco ORF284 (bhpB) (Shinozaki et al., 1986).
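The coordinates and residue counts quoted above for psbA, psbD, psbC and psbG can be cross-checked with simple arithmetic. The following is a minimal sketch, not the authors' procedure: the function names are hypothetical, and it assumes that each coordinate pair is inclusive, that the stop codon lies inside the quoted range, and that a start larger than the end denotes the complementary strand.

```python
def protein_length(start, end, intron_bp=0):
    """Residues encoded by a gene given inclusive coordinates.

    Assumes the stop codon is inside the coordinate range and that
    any intron length (in bp) is passed separately.
    """
    span = abs(end - start) + 1            # inclusive, either strand
    coding = span - intron_bp
    if coding % 3:
        raise ValueError("coding length is not a multiple of three")
    return coding // 3 - 1                 # drop the stop codon


def overlap_bp(a, b):
    """Number of positions shared by two inclusive coordinate pairs."""
    (a_lo, a_hi), (b_lo, b_hi) = sorted(a), sorted(b)
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)


# Coordinates as quoted in the text.
genes = {
    "psbA": (28368, 29429),
    "psbD": (38855, 39916),
    "psbC": (39864, 41285),
    "psbG": (52524, 51793),   # start > end: complementary strand
}

lengths = {name: protein_length(*coords) for name, coords in genes.items()}
psbd_psbc_overlap = overlap_bp(genes["psbD"], genes["psbC"])

print(lengths)
print(psbd_psbc_overlap)
```

Under these assumptions the computed lengths agree with the 353, 353, 473 and 243 residues given in the text, and the psbD/psbC overlap comes out at 53 bp.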

Figure 9. Amino acid sequences of photosystem II proteins. (a) Herbicide-binding 32,000 Mr protein. Spinach (Zurawski et al., 1982a), soybean (Spielmann & Stutz, 1983), C. reinhardtii (Erickson et al., 1984), Eu. gracilis (Keller & Stutz, 1984), Anabaena 7120 (psbA1) (Curtis & Haselkorn, 1984). Note that spinach, soybean and C. reinhardtii psbA proteins do not contain a lysine residue, but that those of liverwort, Eu. gracilis and Anabaena do. (b) 32,000 Mr-like D2 protein. Spinach (Alt et al., 1984; Holschuh et al., 1984), pea (Rasmussen et al., 1984), and C. reinhardtii (Rochaix et al., 1984). (c) Chlorophyll a apoprotein or 44,000 Mr protein. Spinach (Alt et al., 1984; Holschuh et al., 1984). Pea and maize (Bookjans et al., 1986). (d) G protein. Maize (Steinmetz et al., 1986).

[Sequence alignments not reproduced. Lengths in amino acid residues, with identity to the liverwort sequence in parentheses, as printed beside each panel:
(a) psbA: liverwort 353; spinach 353 (96.9%); soybean 353 (96.6%); C. reinhardtii 352 (91.2%); Eu. gracilis 345 (84.1%); Anabaena 360 (87.5%).
(b) psbD: liverwort 353; spinach 353 (96.3%); pea 353 (96.3%); C. reinhardtii 339 (82.4%).
(c) psbC: liverwort, spinach, pea and maize, 473 each; identities 94.7%, 95.3% and 94.7%, respectively.
(d) psbG: liverwort 243; maize 248 (65.0%).]

Figure 10. Structures conserved between liverwort mbpX and bacterial membrane subunits of transport complexes. Amino acid sequences G35...G40-K41-S42 and R145...F153-L154-L155-L156-D157 found in the liverwort mbpX product correspond to the adenine nucleotide binding consensus (Walker et al., 1982). Genes malK (Gilson et al., 1982), hisP (Higgins et al., 1982) and pstB (Surin et al., 1985).

[Sequence alignment not reproduced. Lengths in amino acid residues, with identity to liverwort MbpX in parentheses: liverwort MbpX 370; E. coli MalK 370 (28.1%); S. typhimurium HisP 258 (21.4%); E. coli PstB 257 (18.4%).]

(g) A bacterial permease-like gene

An ORF encoding a polypeptide of 370 amino acids was tentatively designated as mbpX (37,012 to 38,124), because its amino acid sequence is similar, in size as well as in primary structure, to those of inner membrane components of bacterial permeases such as the hisP (Higgins et al., 1982), malK (Gilson et al., 1982), oppD (Higgins et al., 1985), pstB (Surin et al., 1985) and rbsA (Buckel et al., 1986) proteins of the histidine, maltose, oligopeptide, phosphate and ribose transport systems, respectively (e.g. 28.1% homology with malK; Fig. 10). Amino acid sequence homology of the kind seen between mbpX and the bacterial inner membrane subunits has also been detected in other bacterial proteins involved in cell division (FtsE), nodulation (NodI), haemolysin transport (HlyB) and DNA repair (UvrA; Higgins et al., 1986; Doolittle et al., 1986).
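The first nucleotide-binding motif noted in the legend to Figure 10 (G35...G40-K41-S42) has the form of the Walker A consensus, G-x(4)-G-K-[ST]. As a rough illustration of how such a motif can be located, the sketch below scans a short fragment with a regular expression. The fragment is our transcription of residues 31 to 49 of the liverwort mbpX row of Figure 10, with residue 40 written as G on the basis of the legend; it and the helper name are assumptions, not data checked against the deposited sequence.

```python
import re

# Walker A (P-loop) consensus: G, any four residues, G, K, then S or T.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_walker_a(fragment, first_residue=1):
    """Return (start, end, motif) in 1-based, inclusive protein numbering.

    `first_residue` is the protein position of fragment[0].
    """
    return [
        (m.start() + first_residue, m.end() + first_residue - 1, m.group())
        for m in WALKER_A.finditer(fragment)
    ]


# Assumed transcription of mbpX residues 31-49 (see the lead-in above).
MBPX_31_49 = "IALLGPSGSGKSSLLRIIA"

hits = find_walker_a(MBPX_31_49, first_residue=31)
print(hits)
```

With this fragment the single match spans residues 35 to 42, which is consistent with the positions G35, G40, K41 and S42 named in the figure legend.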

(h) Genes for NADH dehydrogenase

The liverwort chloroplast genome contains a set of homologues of human mitochondrial "URF" or ND genes encoding components of a respiratory chain NADH dehydrogenase (Ohyama et al., 1986).

Figure 11. Structural similarity between liverwort chloroplast ndh and mitochondrial URF or ND proteins. Human ND2 and ND3 (Anderson et al., 1981) are components of the respiratory chain NADH dehydrogenase complex (Chomyn et al., 1985). (a) Homology between ndh2 and human ND2 proteins. An arrow indicates the presence of an intron in the ndh2 gene. Possible initiation methionine codons are marked with asterisks. (b) Homology between ndh3 and human ND3 proteins.

[Sequence alignments not reproduced. Lengths in amino acid residues, with identity to the liverwort sequence in parentheses: (a) liverwort ndh2 501; human URF2 347 (22.7%). (b) liverwort ndh3 120; human URF3 115 (30.8%).]

Figure 12. Conserved amino acid sequences of ORFs detected in chloroplast DNA sequences. (a) ORF62 is highly conserved among liverwort, spinach (Holschuh et al., 1984) and wheat (Quigley & Weil, 1985). (b) ORF55 (lhcA) contains 86 or 55 amino acid residues. Sequence similarity to tobacco ORF1 (Deno & Sugiura, 1983) is localized downstream from the 2nd methionine codon. ORF55 (lhcA) also shares structural homology with the β chains of the light-harvesting complex from Rhodos. rubrum (Berard et al., 1985) and Rhodop. capsulata (Youvan et al., 1984). (c) ORF36a corresponds to the carboxy-terminal portion of tobacco ORF2 (Deno & Sugiura, 1983). (d) ORF370i is entirely within the trnK(UUU) intron. In tobacco, the corresponding intron also includes an ORF of 509 amino acid residues (Sugita et al., 1985). (e) The last 33 amino acid residues of ORF167 can be seen in maize chloroplast DNA (Fish et al., 1985).

[Sequence alignments not reproduced. Lengths in amino acid residues, with identity to the liverwort sequence in parentheses, as printed beside each panel:
(a) ORF62: liverwort 62; spinach 62 (82.3%); wheat 62 (82.3%).
(b) ORF55 (lhcA): liverwort ORF55 86 or 55; tobacco ORF1 90 (63.6%); R. rubrum 69 (23.6%); R. capsulata 49 (10.9%).
(c) ORF36a: liverwort ORF36a and tobacco ORF2 (88.9%); lengths illegible.
(d) ORF370i: liverwort 370; tobacco 509 (34.3%).
(e) ORF167: liverwort 167; maize length not given.]

Two genes in this region, ndh2 corresponding to ND2 and ndh3 corresponding to ND3, were identified by comparison of their amino acid sequences. The ndh2 gene (1514 to 3555) is split by a group II intron of 536 bp and specifies 501 amino acid residues, many more than the human mitochondrial ND2 gene product (347 amino acids; Anderson et al., 1981). The discrepancy in length is caused principally by an additional stretch of N-terminal amino acid residues in the ndh2 product. This portion of the ndh2 polypeptide would be absent if either the third or fourth ATG codon, marked with asterisks in Figure 11(a), were used for the initiation of translation. Taking the first methionine codon tentatively as the initiation codon, the amino acid sequences of ndh2 and the human mitochondrial ND2 proteins are 22.7% identical. Northern blot analysis indicated that the ndh2 gene is actively transcribed in the chloroplasts (K. Umesono, unpublished results). The other gene, ndh3 (52,877 to 52,515), is in the upstream region of psbG with an overlap of ten nucleotides.
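Two points in the paragraph above lend themselves to a quick check: the residue count of the split ndh2 gene, and the effect of choosing a later in-frame ATG as the initiation codon. The sketch below is illustrative only. The helper name and the toy sequence are hypothetical (the toy sequence is not taken from ndh2), and the length calculation assumes inclusive coordinates that contain the stop codon.

```python
def in_frame_starts(orf, start_codon="ATG"):
    """List (codon_number, residues) for every in-frame start codon.

    `orf` must begin at the first candidate start codon and end with a
    stop codon; `residues` is the protein length obtained if translation
    begins at that codon.
    """
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    total = len(codons) - 1                      # exclude the stop codon
    return [
        (i + 1, total - i)
        for i, codon in enumerate(codons[:-1])
        if codon == start_codon
    ]


# ndh2: coordinates 1514 to 3555, minus the 536-bp intron, minus the stop codon.
ndh2_residues = (3555 - 1514 + 1 - 536) // 3 - 1

# Hypothetical toy ORF with three in-frame ATG codons.
TOY_ORF = "ATGAAAATGCTGGAAATGTTTTAA"
toy_starts = in_frame_starts(TOY_ORF)

print(ndh2_residues)
print(toy_starts)
```

Each later start codon shortens the predicted product by the number of codons skipped, which is the sense in which the N-terminal extension of ndh2 depends on the choice of ATG.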
The ndh3 product (120 amino acids) is similar in size to a human

mitochondrial ND3 protein (115 amino acids; Anderson et al., 1981) sharing 30+3% homology (Fig. 11(b)). The protein products of these chloroplast ndh genes have not been identified. Similar observations on a series of ndh genes have been reported for tobacco chloroplast DNA (Shinozaki et al., 1986). (i) Other ORFs There are at least 15 different ORFs in this region, with products that range in length from 29 (ORF29) to 2136 amino acid residues (ORF2136). Some of the ORFs appeared to be phylogenetically conserved among the higher plant chloroplast genomes, suggesting that they are likely to be active genes. It has been shown that “URF-62” is a conserved unidentified ORF encoding 62 amino acids in the wheat, maize, spinach and tobacco chloroplast genomes (Quigley & Weil, 1985; Shinozaki et al., 1986). Accordingly, the liverwort counterpart of “URF-62” was identified as ORF62 (previously called ORF702; Umesono et al., 1984) by amino

328

K. Umesono et al.

acid sequence comparison (Fig. 12(a)). The amino acid sequence of ORF62 is 82.3% homologous to that in wheat and spinach.

There are two ORFs (ORF1 and ORF2) in the region between the tobacco chloroplast trnS(GCU) and trnQ(UUG) genes (Deno & Sugiura, 1983). In the corresponding region of the liverwort chloroplast DNA, we found two ORFs, which we designated ORF55 (23,605 to 23,438) and ORF36a (23,107 to 22,997), with sequences partially homologous to those of the tobacco ORF1 and ORF2, respectively. ORF55 encodes 86 or 55 amino acids, depending on the choice of start codon. Comparison with the tobacco ORF1 indicates that the amino acid sequence homology is confined to the C-terminal portions, just after the second in-frame ATG codon (Fig. 12(b)). Therefore, we have tentatively assigned 55 amino acids to ORF55, of which 63.6% are identical with those of the tobacco ORF1. A database search showed that ORF55 is partially homologous to light-harvesting polypeptides (β chain) of photosynthetic bacteria such as Rhodospirillum rubrum (Berard et al., 1985) and Rhodopseudomonas capsulata (Youvan et al., 1984; Fig. 12(b)). Therefore, we tentatively designated ORF55 as lhcA. Genes for the β and α chains of the bacterial light-harvesting complex are tandemly oriented as 5'-β chain-α chain-3' (Youvan et al., 1984; Berard et al., 1985). The conserved ORF36a is downstream from ORF55 (lhcA), but no significant homology was seen between ORF36a and bacterial α chains. The small polypeptide predicted from ORF36a is nevertheless 88.9% identical with the last 36 amino acids of the tobacco ORF2 (Fig. 12(c)).

As described above, ORF370i (26,976 to 28,088) is located completely within the long trnK(UUU) intron. The fine structure of the tobacco counterpart has been reported; it consists of 509 amino acids (Sugita et al., 1985), with 139 more amino acids at the N terminus than the liverwort ORF370i product (Fig. 12(d)). The amino acid sequences of these two ORFs are structurally similar, although the identity is low (34.3%).

A split gene, ORF167 (48,599 to 47,488), is upstream from psaA. The maize chloroplast DNA sequence in the psaAB locus (Fish et al., 1985) may partially include the second exon of ORF167. The maize sequence from positions -719 to -612 reported by Fish et al. (1985) might encode 36 amino acids with a sequence 72.2% identical with the last 33 amino acids predicted from ORF167 (Fig. 12(e)). A counterpart of ORF167 can be seen in the corresponding region of the tobacco chloroplast genome, which contains ORF82 (Shinozaki et al., 1986), but it seems to be split into three exons according to our method for the prediction of group II introns (data not shown).

Our preliminary analysis of the tobacco chloroplast DNA sequence (EMBL database ver. 12) indicated the presence of sequences homologous to the liverwort ORF29 (5257 to 5168), ORF34 (4001 to 4105), ORF169 (51,742 to 51,233) and ORF2136 (29,909 to 36,319). Counterparts of the latter are designated as two ORFs, ORF158 and ORF1708, in tobacco (Shinozaki et al., 1986). We could not detect tobacco sequences sharing homology with the liverwort ORF30 (22,425 to 22,333), ORF32 (22,516 to 22,614), ORF33 (22,263 to 22,162), ORF50 (25,769 to 25,921), ORF135 (4236 to 5128) or ORF513 (24,053 to 25,594).
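The coordinates quoted for the ORFs above can be cross-checked against the residue counts embedded in the ORF names. The following is a minimal sketch, assuming that the positions are 1-based and inclusive, that the span of an unsplit ORF includes its termination codon, and that a start position larger than the end position denotes the complementary strand. These conventions are inferred from the numbers rather than stated in the text, and the function names are hypothetical.

```python
from typing import Optional, Tuple


def orf_span(start: int, end: int) -> Tuple[int, str]:
    """Return (length in bp, strand) for 1-based, inclusive coordinates.

    Assumption: start > end means the ORF lies on the complementary strand.
    """
    strand = "+" if end >= start else "-"
    return abs(end - start) + 1, strand


def residues_if_unsplit(start: int, end: int) -> Optional[int]:
    """Residue count for an uninterrupted ORF whose span includes the stop codon.

    Returns None when the span is not a whole number of codons, as expected
    for a gene interrupted by an intron (assumption, see lead-in).
    """
    length, _ = orf_span(start, end)
    if length % 3 != 0:
        return None
    return length // 3 - 1  # subtract the termination codon


# Coordinates copied from the text above.
ORFS = {
    "ORF55": (23605, 23438),
    "ORF36a": (23107, 22997),
    "ORF370i": (26976, 28088),
    "ORF2136": (29909, 36319),
    "ORF513": (24053, 25594),
    "ORF167": (48599, 47488),  # described in the text as a split gene
}

for name, (start, end) in ORFS.items():
    length, strand = orf_span(start, end)
    print(name, length, strand, residues_if_unsplit(start, end))
```

Under these assumptions the unsplit ORFs return the residue count given in their names, whereas the span of ORF167 is not a multiple of three, which is consistent with its description as a split gene.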

4. Discussion

The gene organization of the liverwort chloroplast genome has several distinctive features. In the region described here, an inversion of about 30 kb could be seen between the liverwort and tobacco chloroplast genomes. The rearranged region is flanked by two tRNA genes, trnL(CAA) and trnD(GUC), and is located entirely within the LSC region in liverwort, covering at least the region from ORF34 to ORF2136. This region is partially included in an IR sequence in tobacco (Fig. 13). Within the rearranged region, however, the relative gene locations seem to be conserved, except for a redundant copy of trnI(CAU)-rpl23-rpl2 and a ribosomal protein S16 gene (rps16), both found in the tobacco genome (Shinozaki et al., 1986). Instead of the tobacco rps16, we found ORF513 and ORF50, which are completely different from rps16, and we did not detect rps16 anywhere in the liverwort chloroplast genome (Ohyama et al., 1986, 1988). With the spinach-type chloroplast genome regarded as ancestral, large inversions in chloroplast DNA have been observed frequently (Palmer, 1985). One of the best-characterized inversions occurs in the wheat chloroplast genome, with endpoints associated with repeated sequences and flanked by tRNA genes (Howe, 1985). Unlike in wheat, there are no repeated sequences near the boundaries of the rearranged region in the liverwort. This rearrangement is, however, different from the others, because no other DNA rearrangement among plant chloroplast genomes includes a psbA gene located near the border between the inverted repeats (IR) and the LSC region (Palmer, 1985). The IR sequences in liverwort and fern chloroplast DNAs are very similar in size (Stein et al., 1986), but the fern Osmunda cinnamomea does not contain the liverwort-type DNA rearrangement in the LSC region (Palmer & Stein, 1986). Detailed comparison of the chloroplast gene organization will help to elucidate evolutionary relationships among liverwort, fern and higher plant chloroplast genomes.

Figure 13. Inversion between the liverwort and tobacco chloroplast genomes. Only a portion of the chloroplast genomes is shown. The gene organization of the tobacco chloroplast genome is quoted from Shinozaki et al. (1986), except for ORF29 and ORF34, which were deduced from our analysis with the liverwort sequences. Genes for tRNAs are represented by the 1-letter amino acid code with their anticodon sequences in parentheses. Arrows indicate one of the IR and LSC junctions (JLA). A region from ORF34 to ORF2136 in the liverwort can be seen in a reverse orientation (ORF1708 to ORF34) in the tobacco genome, as indicated by filled triangles. The regions are flanked by 2 tRNA genes, trnL(CAA) and trnD(GUC), in both genomes.

Our finding of putative genes for RNA polymerase subunits, rpoA (Fukuzawa et al., 1988), rpoB, rpoC1 and rpoC2, in the liverwort chloroplast genome may be controversial, because chloroplast RNA polymerase subunits have been shown to be synthesized from poly(A)+ RNA, indicating that they are nuclear-encoded gene products in spinach (Lerbs et al., 1985). However, it has been demonstrated that there are two distinct RNA polymerase activities in both spinach and Eu. gracilis chloroplasts (Greenberg et al., 1985). In Chlamydomonas reinhardtii, nuclear and chloroplast genomes share DNA sequences homologous to E. coli rpo genes (Watson & Surzycki, 1983). Preliminary studies indicated that the liverwort chloroplast rpoC1 gene was not a pseudogene but was transcriptionally active, and that the transcripts were correctly spliced in chloroplasts from cultured cells (K. Umesono & K. Nakahigashi, unpublished results). Therefore, the chloroplast rpo genes may be required to express a part of the genetic information in the course of chloroplast development. Comparison with the E. coli RNA polymerase subunits encoded by four genes (rpoA, rpoB, rpoC and rpoD) showed that there is no rpoD-like ORF in the liverwort chloroplast genome. The rpoD gene encodes a sigma subunit necessary for accurate promoter recognition (Burton et al., 1981). We suspect that the sigma subunit for the chloroplast-encoded RNA polymerase is synthesized in the cytoplasm and then transported into chloroplasts, or that we have missed the rpoD gene in the chloroplast genome because of its low level of homology to its prokaryotic counterparts. For the tobacco chloroplast genome, a similar organization of rpo genes has been reported, but the rpoC locus seems to be more complicated than in the liverwort (Shinozaki et al., 1986).

The psbA proteins are the most conserved, but the liverwort psbA gene contains a lysine codon at position 238 not found in higher plants. In Anabaena 7120, psbA1, a transcribable gene in a complement of two psbA genes, also contains a lysine residue (Curtis & Haselkorn, 1984). A strong preference for either A or T in synonymous codon choice can be seen at the third-letter position in identified and unidentified protein genes in the liverwort chloroplast genome (Ohyama et al., 1988). The codon usage pattern in psbA is particularly divergent in the choice of the third-letter pyrimidines (Y) for two-codon families such as asparagine (AAY), aspartic acid (GAY), cysteine (UGY), histidine (CAY), phenylalanine (UUY), serine (AGY) and tyrosine (UAY) codons. In these codons, the psbA gene appears to use C twice as often as T (Table 1). A preference for C was also observed in the choice of isoleucine codons. This peculiar codon usage pattern in the psbA gene may be correlated with physiological stability or translational efficiency of mRNA molecules.

As described above, a polypeptide predicted from

Table 1. Codon usage pattern in the liverwort psbA and other psb genes

(Table body not reproduced: for each codon, the encoded amino acid and the number of occurrences in psbA and in the other psb genes. Codons marked with an asterisk: UUC, AUC, UAC, CAC, AAC, GAC, UGC and AGC.)

Columns of psb in the Table include psbB, psbC, psbD, psbE, psbF, psbG and psbH genes (this paper; Fukuzawa et al., 1988). Codons with asterisks are preferentially used in the psbA gene, but not in the other psb genes. The codon choice pattern of the psb genes in the Table can be observed in the other protein genes encoded by the chloroplast genome (Ohyama et al., 1988). Ter, termination codon.
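The C-versus-T comparison described for the two-codon NNY families can be tallied directly from a coding sequence. The sketch below is illustrative only: the function names are hypothetical, the input is assumed to be an in-frame coding sequence, and the short test sequence is invented for demonstration and is not taken from psbA.

```python
from collections import Counter
from typing import Dict, Optional

# First two letters of the two-codon pyrimidine-ending (NNY) families
# named in the text, written in RNA letters.
NNY_FAMILIES = {
    "AA": "Asn", "GA": "Asp", "UG": "Cys", "CA": "His",
    "UU": "Phe", "AG": "Ser", "UA": "Tyr",
}


def third_position_pyrimidines(cds: str) -> Dict[str, Counter]:
    """Count C and U at the third position of NNY codons in an in-frame CDS."""
    rna = cds.upper().replace("T", "U")
    if len(rna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    counts = {aa: Counter() for aa in NNY_FAMILIES.values()}
    for i in range(0, len(rna), 3):
        codon = rna[i:i + 3]
        aa = NNY_FAMILIES.get(codon[:2])
        # Purine-ending codons with the same prefix (e.g. UAA, AGA) are skipped.
        if aa is not None and codon[2] in "CU":
            counts[aa][codon[2]] += 1
    return counts


def c_to_u_ratio(counts: Dict[str, Counter]) -> Optional[float]:
    """Overall C:U ratio across the families; None if no U-ending codon occurs."""
    c_total = sum(c["C"] for c in counts.values())
    u_total = sum(c["U"] for c in counts.values())
    return c_total / u_total if u_total else None


# Invented toy sequence: AAC AAC AAT TAC TAC TAT
toy = third_position_pyrimidines("AACAACAATTACTACTAT")
print(dict(toy["Asn"]), dict(toy["Tyr"]), c_to_u_ratio(toy))
```

A ratio near 2 for a real psbA coding sequence would correspond to the pattern reported in the text, while the other psb genes would be expected to give a ratio well below 1.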


the mbpX gene shares extensive homologies with inner membrane subunits of bacterial transport complexes such as HisP, MalK and PstB proteins (Fig. 10). In contrast with the conserved structure of the mbpX gene, the amino acid sequence predicted from the mbpY gene has less homology to the sequences of bacterial counterparts (Kohchi et al., 1988). It is therefore necessary to identify the products and functions of these mbp genes. It has been reported that the high-affinity histidine transport system of Salmonella typhimurium consists of a periplasmic histidine-binding protein (hisJ gene product) and three membrane-bound components, the hisQ, hisM and hisP gene products (Higgins et al., 1982). A chloroplast gene designated as mbpY is a candidate for another component of the putative complex containing the mbpX product (Kohchi et al., 1988). The physiological role of the mbpX protein is not understood, but it seems likely that this protein is a component of an unknown transport system in the chloroplasts associated with adenine nucleotide-binding activity. No mbpX-like gene is reported for the tobacco chloroplast genome (Shinozaki et al., 1986).

In plant chloroplasts, the presence of the clustered genes involved in protein biogenesis or photosynthesis implies that they represent regulatory units for gene expression equivalent to bacterial operons. The transcriptional promoter-like sequences (shown in Figs 2 and 3) were present upstream from the clusters of functionally related genes such as trnE(UUC)-trnY(GUA)-trnD(GUC), trnL(UAA)-trnF(GAA), trnG(UCC)-trnR(UCU), rps'12-rps7, rpoB-rpoC1-rpoC2-rps2, atpI-atpH-atpF-atpA, atpB-atpE, psbD-psbC and psaA-psaB, suggesting that they may be cotranscribed. The deduced gene organization also indicates that protein genes tend to be tandemly oriented in the same direction, although tRNA genes are frequently transcribed in the opposite direction (Fig. 1). The organization of these gene clusters appears to be phylogenetically conserved among higher plant chloroplast genomes (Palmer, 1985).

We thank Dr J. C. Gray for his critical reading of the manuscript and M. Toda for his assistance in DNA sequencing. This research was supported in part by a Grant-in-Aid for Special Research Projects from the Ministry of Education, Science, and Culture of Japan (to H.I., K.O. and H.O.) and in part by the Yamada Science Foundation (H.O.).

References

Alt, J., Winter, P., Sebald, W., Moser, J. G., Schedel, R., Westhoff, P. & Herrmann, R. G. (1983). Curr. Genet. 7, 129-138.
Alt, J., Morris, J., Westhoff, P. & Herrmann, R. G. (1984). Curr. Genet. 8, 597-606.
An, G., Bendiak, D. S., Mamelak, L. A. & Friesen, J. D. (1981). Nucl. Acids Res. 9, 4163-4172.
Anderson, S., Bankier, A. T., Barrell, B. G., de Bruijn, M. H. L., Coulson, A. R., Drouin, J., Eperon, I. C., Nierlich, D. P., Roe, B. A., Sanger, F., Schreier,

P. H., Smith, A. J. H., Staden, R. & Young, I. G. (1981). Nature (London), 290, 457-465.
Bedwell, D., Davis, G., Gosnik, M., Post, L., Nomura, M., Kestler, H., Zengel, J. M. & Lindahl, L. (1985). Nucl. Acids Res. 13, 3891-3903.
Berard, J., Belanger, G., Corriveau, P. & Gingras, G. (1985). J. Biol. Chem. 261, 82-87.
Bird, C. R., Keller, B., Auffret, A. D., Huttley, A. K., Howe, C. J., Dyer, T. A. & Gray, J. C. (1985). EMBO J. 4, 1381-1388.
Bonnard, G., Michel, F., Weil, J.-H. & Steinmetz, A. (1984). Mol. Gen. Genet. 194, 330-336.
Bookjans, G., Stummann, B. M., Rasmussen, O. F. & Henningsen, K. W. (1986). Plant Mol. Biol. 6, 359-366.
Buckel, S. D., Bell, A. W., Rao, J. K. M. & Hermodson, M. A. (1986). J. Biol. Chem. 261, 7659-7662.
Burke, J. M., Irvine, K. D., Kaneko, K. J., Kerker, B. J., Oettgen, A. B., Tierney, W. M., Williamson, C. L., Zaug, A. J. & Cech, T. R. (1986). Cell, 45, 167-176.
Burton, Z., Burgess, R. R., Lin, J., Moore, D., Holder, S. & Gross, C. A. (1981). Nucl. Acids Res. 9, 2889-2903.
Cech, T. R. & Bass, B. L. (1986). Annu. Rev. Biochem. 55, 599-629.
Cerretti, D. P., Dean, D., Davis, G. R., Bedwell, D. M. & Nomura, M. (1983). Nucl. Acids Res. 11, 2599-2616.
Chomyn, A., Mariottini, P., Cleeter, M. W. J., Ragan, C. I., Matsuno-Yagi, A., Hatefi, Y., Doolittle, R. F. & Attardi, G. (1985). Nature (London), 314, 592-597.
Cozens, A. L. & Walker, J. E. (1986). Biochem. J. 236, 453-460.
Cozens, A. L., Walker, J. E., Phillips, A. L., Huttley, A. K. & Gray, J. C. (1986). EMBO J. 5, 217-222.
Curtis, S. E. & Haselkorn, R. (1984). Plant Mol. Biol. 3, 249-258.
Davies, R. W., Waring, R. B., Ray, J. A., Brown, T. A. & Scazzocchio, C. (1982). Nature (London), 300, 719-724.
Deno, H. & Sugiura, M. (1983). Nucl. Acids Res. 11, 8407-8414.
Deno, H., Kato, A., Shinozaki, K. & Sugiura, M. (1982). Nucl. Acids Res. 10, 7511-7520.
Deno, H., Shinozaki, K. & Sugiura, M. (1983). Nucl. Acids Res. 11, 2185-2191.
Doolittle, R. F., Johnson, M. S., Husain, I., Van Houten, B., Thomas, D. C. & Sancar, A. (1986). Nature (London), 323, 451-453.
Erickson, J. M., Rahire, M. & Rochaix, J.-D. (1984). EMBO J. 3, 2753-2762.
Fish, L. E., Kuck, U. & Bogorad, L. (1985). J. Biol. Chem. 260, 1413-1421.
Fromm, H., Edelman, M., Koller, B., Goloubinoff, P. & Galun, E. (1986). Nucl. Acids Res. 14, 883-898.
Fukuzawa, H., Kohchi, T., Shirai, H., Ohyama, K., Umesono, K., Inokuchi, H. & Ozeki, H. (1986). FEBS Letters, 198, 11-15.
Fukuzawa, H., Kohchi, T., Sano, T., Shirai, H., Umesono, K., Ozeki, H. & Ohyama, K. (1988). J. Mol. Biol. 203, 333-357.
Gilson, E., Nikaido, K. & Hofnung, M. (1982). Nucl. Acids Res. 10, 7449-7458.
Greenberg, B. M., Narita, J. O., DeLuca-Flaherty, C. R. & Hallick, R. B. (1985). In Molecular Biology of the Photosynthetic Apparatus (Steinback, K. E., Bonitz, S., Arntzen, C. J. & Bogorad, L., eds), pp. 303-309, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Hallick, R. B., Hollingsworth, M. J. & Nickoloff, J. A. (1984). Plant Mol. Biol. 3, 169-175.


Henning, J. & Herrmann, R. G. (1986). Mol. Gen. Genet. 203, 117-128.
Higgins, C. F., Haag, P. D., Nikaido, K., Ardeshir, F., Garcia, G. & Ames, G. F.-L. (1982). Nature (London), 298, 723-727.
Higgins, C. F., Hiles, I. D., Whalley, K. & Jamieson, D. J. (1985). EMBO J. 4, 1033-1040.
Higgins, C. F., Hiles, I. D., Salmond, G. P. C., Gill, D. R., Downie, J. A., Evans, I. J., Holland, I. B., Gray, L., Buckel, S. D., Bell, A. W. & Hermodson, M. A. (1986). Nature (London), 323, 448-450.
Holschuh, K., Bottomley, W. & Whitfeld, P. R. (1984). Nucl. Acids Res. 12, 8819-8834.
Howe, C. J. (1985). Curr. Genet. 10, 139-145.
Howe, C. J., Auffret, A. D., Doherty, A., Bowman, C. M., Dyer, T. A. & Gray, J. C. (1982). Proc. Nat. Acad. Sci., U.S.A. 79, 6903-6907.
Howe, C. J., Fearnley, I. M., Walker, J. E., Dyer, T. A. & Gray, J. C. (1985). Plant Mol. Biol. 4, 333-345.
Kanazawa, H., Kayano, T., Mabuchi, K. & Futai, M. (1981). Biochem. Biophys. Res. Commun. 103, 604-612.
Kanazawa, H., Kayano, T., Kiyasu, T. & Futai, M. (1982). Biochem. Biophys. Res. Commun. 105, 1257-1264.
Keller, M. & Stutz, E. (1984). FEBS Letters, 175, 173-177.
Kirsch, W., Seyer, P. & Herrmann, R. G. (1986). Curr. Genet. 10, 843-855.
Kohchi, T., Shirai, H., Fukuzawa, H., Sano, T., Komano, T., Umesono, K., Inokuchi, H., Ozeki, H. & Ohyama, K. (1988). J. Mol. Biol. 203, 353-372.
Krebbers, E. T., Larrinua, I. M., McIntosh, L. & Bogorad, L. (1982). Nucl. Acids Res. 10, 4985-5002.
Krebbers, E., Steinmetz, A. & Bogorad, L. (1984). Plant Mol. Biol. 3, 13-20.
Kruger, K., Grabowski, P. J., Zaug, A. J., Sands, J., Gottschling, D. E. & Cech, T. R. (1982). Cell, 31, 147-157.
Lerbs, S., Bräutigam, E. & Parthier, B. (1985). EMBO J. 4, 1661-1666.
Michel, F. & Dujon, B. (1983). EMBO J. 2, 33-38.
Michel, F., Jacquier, A. & Dujon, B. (1982). Biochimie, 64, 867-881.
Montandon, P.-E. & Stutz, E. (1983). Nucl. Acids Res. 11, 5877-5892.
Montandon, P.-E. & Stutz, E. (1984). Nucl. Acids Res. 12, 2851-2859.
Ohme, M., Tanaka, M., Chunwongse, J., Shinozaki, K. & Sugiura, M. (1986). FEBS Letters, 200, 87-90.
Ohyama, K., Fukuzawa, H., Kohchi, T., Shirai, H., Sano, T., Sano, S., Umesono, K., Shiki, Y., Takeuchi, M., Chang, Z., Aota, S., Inokuchi, H. & Ozeki, H. (1986). Nature (London), 322, 572-574.
Ohyama, K., Fukuzawa, H., Kohchi, T., Sano, T., Sano, S., Shirai, H., Umesono, K., Shiki, Y., Takeuchi, M., Chang, Z., Aota, S., Inokuchi, H. & Ozeki, H. (1988). J. Mol. Biol. 203, 281-298.
Ovchinnikov, Y. A., Monastyrskaya, G. S., Gubanov, V. V., Guryev, S. O., Chertov, O. Y., Modyanov, N. N., Grinkevich, V. A., Makarova, I. A., Marchenko, T. V., Polovnikova, I. N., Lipkin, V. M. & Sverdlov, E. D. (1981). Eur. J. Biochem. 116, 621-629.
Ovchinnikov, Y. A., Monastyrskaya, G. S., Gubanov, V. V., Guryev, S. O., Salomatina, I. S., Shuvaeva, T. M., Lipkin, V. M. & Sverdlov, E. D. (1982). Nucl. Acids Res. 10, 4035-4044.
Palmer, J. D. (1985). Annu. Rev. Genet. 19, 325-354.
Palmer, J. D. & Stein, D. B. (1986). Curr. Genet. 10, 823-833.
Quigley, F. & Weil, J.-H. (1985). Curr. Genet. 9, 495-503.
Rasmussen, O. F., Bookjans, G., Stummann, B. M. & Henningsen, K. W. (1984). Plant Mol. Biol. 3, 191-199.
Reinbolt, J., Tritsch, D. & Wittmann-Liebold, B. (1978). FEBS Letters, 91, 297-301.
Rochaix, J.-D., Dron, M., Rahire, M. & Malnoe, P. (1984). Plant Mol. Biol. 3, 363-370.
Saraste, M., Gay, N. J., Eberle, A., Runswick, M. J. & Walker, J. E. (1981). Nucl. Acids Res. 9, 5287-5296.
Schon, A., Krupp, G., Gough, S., Berry-Lowe, S., Kannangara, C. G. & Söll, D. (1986). Nature (London), 322, 281-284.
Schwarz, Z., Jolly, S. O., Steinmetz, A. A. & Bogorad, L. (1981). Proc. Nat. Acad. Sci., U.S.A. 78, 3423-3427.
Shinozaki, K., Ohme, M., Tanaka, M., Wakasugi, T., Hayashida, N., Matsubayashi, T., Zaita, N., Chunwongse, J., Obokata, J., Yamaguchi-Shinozaki, K., Ohto, C., Torazawa, K., Meng, B. Y., Sugita, M., Deno, H., Kamogashira, T., Yamada, K., Kusuda, J., Takaiwa, F., Kato, A., Tohdoh, N., Shimada, H. & Sugiura, M. (1986). EMBO J. 5, 2043-2049.
Spielmann, A. & Stutz, E. (1983). Nucl. Acids Res. 11, 7157-7167.
Stein, D. B., Palmer, J. D. & Thompson, W. F. (1986). Curr. Genet. 10, 835-841.
Steinmetz, A., Gubbins, E. J. & Bogorad, L. (1982). Nucl. Acids Res. 10, 3027-3037.
Steinmetz, A. A., Castroviejo, M., Sayre, R. T. & Bogorad, L. (1986). J. Biol. Chem. 261, 2485-2488.
Subramanian, A. R., Steinmetz, A. & Bogorad, L. (1983). Nucl. Acids Res. 11, 5277-5286.
Sugita, M., Shinozaki, K. & Sugiura, M. (1985). Proc. Nat. Acad. Sci., U.S.A. 82, 3557-3561.
Surin, B. P., Rosenberg, H. & Cox, G. B. (1985). J. Bacteriol. 161, 189-198.
Torazawa, K., Hayashida, N., Obokata, J., Shinozaki, K. & Sugiura, M. (1986). Nucl. Acids Res. 14, 3143.
Umesono, K., Inokuchi, H., Ohyama, K. & Ozeki, H. (1984). Nucl. Acids Res. 12, 9551-9565.
van der Horst, G. & Tabak, H. F. (1985). Cell, 40, 759-766.
Walker, J. E., Saraste, M., Runswick, M. J. & Gay, N. J. (1982). EMBO J. 1, 945-951.
Watson, J. C. & Surzycki, S. J. (1983). Curr. Genet. 7, 201-210.
Yamada, K., Shinozaki, K. & Sugiura, M. (1986). Plant Mol. Biol. 6, 193-199.
Youvan, D. C., Alberti, M., Begusch, H., Bylina, E. J. & Hearst, J. E. (1984). Proc. Nat. Acad. Sci., U.S.A. 81, 189-192.
Zurawski, G. & Clegg, M. T. (1984). Nucl. Acids Res. 12, 2549-2559.
Zurawski, G., Bottomley, W. & Whitfeld, P. R. (1982a). Proc. Nat. Acad. Sci., U.S.A. 79, 6260-6264.
Zurawski, G., Bohnert, H. J., Whitfeld, P. R. & Bottomley, W. (1982b). Proc. Nat. Acad. Sci., U.S.A. 79, 7699-7703.
Zurawski, G., Bottomley, W. & Whitfeld, P. R. (1986). Nucl. Acids Res. 14, 3974.

Edited by S. Brenner