Biochimica et Biophysica Acta, 1173 (1993) 333-336 © 1993 Elsevier Science Publishers B.V. All rights reserved 0167-4781/93/$06.00
BBAEXP 90519
Short Sequence-Paper
Sequence of Lhcb3*1, a gene encoding a Photosystem II chlorophyll a/b-binding protein in Pisum
Denis Falconet a, Christian Godon a,1, Michael J. White b,2 and William F. Thompson b,c
a Laboratoire de Biologie Moléculaire Végétale, URA 1128, Université Paris Sud, Orsay (France), b Department of Botany and c Department of Genetics, North Carolina State University, Raleigh, NC (USA)
(Received 19 February 1993)
Key words: Lhcb gene; Chlorophyll a/b-binding protein; Nucleotide sequence; (Pisum)
We have cloned and sequenced a pea Lhcb3 gene, encoding a Photosystem II chlorophyll a/b-binding protein. Sequence analysis indicates that the gene contains two introns and predicts a polypeptide of 265 amino acids. The predicted polypeptide sequence is highly homologous to the polypeptide sequences deduced from Lhcb3 genes previously characterized in tomato and barley.
The chlorophyll a/b-binding proteins of the Photosystem II light harvesting complex (LHCP II) are encoded by the major Lhcb multigene family (according to the nomenclature proposed by Jansson et al. [1]). This gene family consists of 3 to 16 members in different species, and contains three clearly defined subfamilies: Lhcb1 genes lack introns while Lhcb2 and Lhcb3 genes contain one and two introns, respectively, within the coding sequence [2-4]. Only Lhcb1 and Lhcb2 genes have previously been described for pea, but we now report the sequence of a pea Lhcb3 gene designated Lhcb3*1. A clone containing Lhcb3*1 was isolated from a genomic library [5] of Pisum sativum L. (cv. Alaska) by probing with the cDNA clone pEA315 [6]. It was previously shown that this 704 bp cDNA, whose abundance is controlled by phytochrome [7,8], belongs to the Lhcb multigene family (D. Falconet, unpublished data).
Correspondence to: D. Falconet, Laboratoire de Biologie Moléculaire Végétale, URA 1178, BP 53X, Université Joseph Fourier, 38041 Grenoble, France. 1 Present address: Laboratoire de Biologie Cellulaire, INRA, Versailles, France. 2 Present address: Biology Department, Saint Mary's University, Halifax, Nova Scotia, Canada. The sequence data of clone Lhcb3*1 have been submitted to the EMBL/Genbank Data Libraries under the accession number X69215.
We report here the nucleotide sequence of the Lhcb3*1 gene and its 5' and 3' flanking regions. As shown in Fig. 1, the gene sequence includes three open reading frames separated by two introns, and predicts a polypeptide of 265 amino acids. The 105 bp and 191 bp introns of pea Lhcb3*1 are found at positions equivalent to those of the introns in the tomato Lhcb3*1 gene [4], the only other genomic sequence known so far. Typical GT/AG splice donor and acceptor sites are present, and the sequences surrounding these sites are 75% homologous with the consensus sequence for splice sites of dicots [9]. Two TATA sequences and a CAAT sequence are indicated by boxes in Fig. 1. Between these boxes are three GATA motifs (overlined in Fig. 1). This motif is commonly found in tandem in a region 5' proximal to the TATA box of most Lhcb1 genes [10,11]. It is also present, designated as the 'I-box' [12] or 'sequence 2' [13], in most but not all ribulose 1,5-bisphosphate carboxylase small subunit (rbcS) genes, although in these genes the motif resides further upstream than it does in Lhcb1 genes. It has been shown that a common protein factor, designated GA-1, binds to these sequences in both rbcS and Lhcb1 genes [14], and functional analyses indicate that the GATA elements are important for high level expression in illuminated plants [11,15].
Fig. 2 shows a comparison of the amino acid sequences predicted from the DNA sequence of the pea Lhcb3*1 with those deduced from the tomato [4] and barley [16] Lhcb3 genes, other pea Lhcb1 and Lhcb2
[Fig. 1: sequence listing not reproduced. The nucleotide sequence of clone Lhcb3*1 is available from the EMBL/Genbank Data Libraries under accession number X69215.]
Fig. 1. Nucleotide and deduced amino-acid sequences for the pea Lhcb3*1 gene. Boxes indicate possible CAAT and TATA motifs and a putative signal for processing the 3' end of the mRNA. Three GATA motifs between the CAAT and TATA boxes are overlined. The terminal GT and AG dinucleotides of the introns are underlined. The two halves of a direct repeat in the 3' flanking region are indicated by horizontal arrows.
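The features annotated in Fig. 1 (CAAT and TATA boxes, GATA motifs, GT/AG intron termini) are short string patterns, so they can be located with a small script. The following Python sketch is illustrative only: the function names and the toy sequences are assumptions invented here, and they are neither the Lhcb3*1 sequence nor the authors' procedure. It checks only the terminal GT and AG dinucleotides of an intron and does not score the surrounding splice-site consensus.

```python
import re


def find_motif(seq, motif):
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = "(?=" + re.escape(motif.upper()) + ")"
    return [m.start() for m in re.finditer(pattern, seq.upper())]


def motifs_between(seq, motif, left, right):
    """Start positions of `motif` lying after the first `left` hit and before the first `right` hit."""
    left_hits = find_motif(seq, left)
    right_hits = find_motif(seq, right)
    if not left_hits or not right_hits:
        return []
    lo = left_hits[0] + len(left)
    hi = right_hits[0]
    return [p for p in find_motif(seq, motif) if lo <= p and p + len(motif) <= hi]


def has_canonical_splice_sites(intron):
    """True if the intron begins with GT and ends with AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Toy sequences, invented for illustration (hypothetical, NOT taken from Lhcb3*1).
toy_upstream = "ttCAATggGATAaaGATAcgGATAtcTATAAAtg"
toy_intron = "GTAAGTTTCTAATTGTTGCAG"

print(motifs_between(toy_upstream, "GATA", "CAAT", "TATA"))
print(has_canonical_splice_sites(toy_intron))
```

To apply the same checks to the real gene, the toy strings would be replaced by the 5' flanking region and the intron sequences taken from the deposited record (X69215).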
sequences, and consensus sequences for Lhcb1 and Lhcb2 genes. The arrows mark putative transit peptide processing sites within the precursor protein. N-terminal sequencing of the tomato [4] and barley [17] LHCB3 polypeptides indicates that the Ser and Gly residues, respectively, are the first amino acids of the mature polypeptide in these species. In pea, the 265 amino acid LHCB3 precursor is 3 or 4 residues shorter than LHCB1 and the same length as LHCB2. The mature
LHCB3 polypeptide (223 residues, as for the Type 3 mature polypeptides in tomato and barley) is shorter than LHCB1 (234 residues) or LHCB2 (230 residues), with most of this difference in length residing in the N-terminal portion of the sequence. The pea amino acid sequence of the mature peptide is approx. 96% identical to those predicted for the tomato and barley Type 3 polypeptides, and approx. 75% identical to the predicted sequences of either LHCB1 or LHCB2. The transit
Fig. 2. Comparison of amino acid sequences deduced from the pea Lhcb3*1 gene with Lhcb3 sequences from tomato [4] and barley [16], with Lhcb1 [18-21] and Lhcb2 [22] sequences from pea, and with consensus sequences for Lhcb1 [21] and Lhcb2 [22]. Asterisks indicate identical amino acids, and a dot represents a gap introduced to maximize similarity. The vertical arrows mark the putative transit peptide processing site within the precursor protein.
[Fig. 2: alignment not reproduced. Rows, top to bottom: pea Lhcb3*1, tomato Lhcb3*1, barley Lhcb3*1, pea Lhcb1*4, pea Lhcb1*3, pea Lhcb1*2, pea Lhcb1*1, pea Lhcb1*5, pea Lhcb2*1, consensus Lhcb1, consensus Lhcb2. The reference row (pea Lhcb3*1, with alignment gaps shown as dots) reads:]
MAL..MAATASSATVVKQTPFL.G..QRKSANPLRDVVAM.GTSKFTMGNDL
WYGPDRVKYLGPFSAQTPSYLTGEFPGDYGWDTAGLSADPEAFAKNRALEVIHGRWAMLGAL
GCITPEVLQKWVRVDFKEPVWFKAGSQIFSEGGLDYLGNPNLVHAQSILAVLGFQIVLMGLV
EGFRINGLPDVGEGND.LYPGGQYFDPLGLADDPVTFAELKVKEIKNGRLAMFSMFGFFVQA
IVTGKGPLENLLDHLDNPVANNAWVYATKFVPGA
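The length and identity figures quoted in the text can be checked with a few lines of code. The sketch below is a minimal Python illustration under stated assumptions: the pea rows are transcribed from the top row of Fig. 2 (the deposited record X69215 remains the authoritative source), the split after residue 42 follows the transit peptide length given in the text, and `percent_identity` uses one common convention (identical residues divided by the number of columns in which neither sequence has a gap), which may differ from the convention the authors used. The aligned pair at the end is a toy example invented for illustration, so the script is not expected to reproduce the published percentages.

```python
GAP = "."

# Pea Lhcb3*1 rows as transcribed from the top line of Fig. 2 (dots are alignment gaps).
PEA_LHCB3_ROWS = [
    "MAL..MAATASSATVVKQTPFL.G..QRKSANPLRDVVAM.GTSKFTMGNDL",
    "WYGPDRVKYLGPFSAQTPSYLTGEFPGDYGWDTAGLSADPEAFAKNRALEVIHGRWAMLGAL",
    "GCITPEVLQKWVRVDFKEPVWFKAGSQIFSEGGLDYLGNPNLVHAQSILAVLGFQIVLMGLV",
    "EGFRINGLPDVGEGND.LYPGGQYFDPLGLADDPVTFAELKVKEIKNGRLAMFSMFGFFVQA",
    "IVTGKGPLENLLDHLDNPVANNAWVYATKFVPGA",
]


def degap(rows, gap=GAP):
    """Join alignment rows and drop gap characters to recover the plain sequence."""
    return "".join(rows).replace(gap, "")


def percent_identity(a, b, gap=GAP):
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


precursor = degap(PEA_LHCB3_ROWS)
TRANSIT_LENGTH = 42  # transit peptide length stated in the text
transit, mature = precursor[:TRANSIT_LENGTH], precursor[TRANSIT_LENGTH:]
print(len(precursor), len(transit), len(mature))

# Toy aligned pair, invented for illustration (hypothetical data).
toy_a = "MAL..MAATA"
toy_b = "MALSSMATTA"
print(percent_identity(toy_a, toy_b))
```

Excluding gapped columns from the denominator is a design choice; counting them as mismatches would give lower values, so the convention should be stated whenever identities are compared across papers.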
peptide consists of 42 residues, and is 80% and 53% identical to the tomato (42 residues) and barley (45 residues) transit sequences, respectively. Much less similarity is observed in comparisons with the LHCB1 and LHCB2 transit sequences, which are also slightly shorter than the Type 3 transit sequences.
It has been recently shown [18] that Lhcb3*1 belongs to a group of genes whose members exhibit relatively strong expression in response to red light treatments of etiolated pea seedlings and accumulate transcript levels in greening buds which are similar to or slightly higher than those achieved in fully expanded leaves. This group consists of Cab-8 (Lhcb1*4), AB96 (Lhcb1*1), Cab-215 (Lhcb2*1) and Cab-315 (Lhcb3*1). In contrast, a second group of Lhcb genes, consisting of the type I genes Cab-9 (Lhcb1*5), AB80 (Lhcb1*2) and AB66 (Lhcb1*3), shows little or no response to red light and accumulates higher transcript levels in leaves than in buds.
This research was supported by a grant from the National Science Foundation to W.F.T. and a National Science Foundation-Centre National de la Recherche Scientifique grant to D.F. D.F. acknowledges the French Ministère des Affaires Etrangères and C.G. the Conseil Général de l'Essonne.
References
1 Jansson, S., Pichersky, E., Bassi, B., Green, B.R., Ikeuchi, M., Melis, A., Simpson, D.J., Spangfort, M., Staehelin, L.A. and Thornber, J.P. (1992) Plant Mol. Biol. Rep. 10, 242-253.
2 Buetow, D.F., Chen, H., Erdos, G. and Yi, L.S.H. (1988) Photosynth. Res. 18, 61-97.
3 Demmin, D.S., Stockinger, E.J., Chang, Y.C. and Walling, L.L. (1989) J. Mol. Evol. 29, 266-279.
4 Schwartz, E., Stasys, R., Aebersold, R., MacGrath, J.M. and Green, B.R. (1991) Plant Mol. Biol. 17, 923-925.
5 Dobres, M.S., Elliot, R.C., Watson, J.C. and Thompson, W.F. (1987) Plant Mol. Biol. 8, 53-59.
6 Thompson, W.F., Everett, M., Polans, N.O., Jorgensen, R.A. and Palmer, J.D. (1983) Planta 158, 487-500.
7 Kaufman, L.S., Briggs, W.R. and Thompson, W.F. (1985) Plant Physiol. 78, 388-393.
8 Kaufman, L.S., Roberts, L.L., Briggs, W.R. and Thompson, W.F. (1986) Plant Physiol. 81, 1033-1038.
9 Hanley, B.A. and Schuler, M.A. (1988) Nucleic Acids Res. 16, 7159-7176.
10 Castresana, C., Staneloni, R., Malik, V.S. and Cashmore, A.R. (1987) Plant Mol. Biol. 10, 117-126.
11 Gidoni, D., Brosio, P., Bond-Nutter, D., Bedbrook, J. and Dunsmuir, P. (1989) Mol. Gen. Genet. 215, 337-344.
12 Giuliano, G., Pichersky, E., Malik, V.S., Timko, M.P., Scolnik, P.A. and Cashmore, A.R. (1988) Proc. Natl. Acad. Sci. USA 85, 7089-7093.
13 Manzara, T., Carrasco, P. and Gruissem, W. (1991) Plant Cell 3, 1305-1316.
14 Schindler, U. and Cashmore, A.R. (1990) EMBO J. 9, 3415-3427.
15 Castresana, C., Garcia-Luque, I., Alonso, E., Malik, V.S. and Cashmore, A.R. (1988) EMBO J. 7, 1929-1936.
16 Brandt, J., Nielsen, V.S., Thordal-Christensen, H., Simpson, D.J. and Okkels, J.S. (1992) Plant Mol. Biol. 19, 699-703.
17 Morishige, D.T. and Thornber, J.P. (1990) in Current Research in Photosynthesis (Baltscheffsky, M., ed.), Vol. II, pp. 261-264, Kluwer, Dordrecht.
18 White, M.J., Fristensky, B.W., Falconet, D., Childs, L.C., Watson, J.C., Alexander, L., Roe, B.A. and Thompson, W.F. (1992) Planta 188, 190-198.
19 Coruzzi, G., Broglie, R., Cashmore, A.R. and Chua, N.-H. (1983) J. Biol. Chem. 258, 1399-1402.
20 Timko, M.P., Kausch, A.P., Hand, J.M., Cashmore, A.R., Herrera-Estrella, L., Van den Broeck, G. and Van Montagu, M. (1985) in Molecular Biology of the Photosynthetic Apparatus (Arntzen, C., Bogorad, L., Bonitz, S. and Steinback, K., eds.), pp. 381-396, Cold Spring Harbor Laboratory, Cold Spring Harbor.
21 Alexander, L., Falconet, D., Fristensky, B.W., White, M.J., Watson, J.C., Roe, B.A. and Thompson, W.F. (1991) Plant Mol. Biol. 17, 523-526.
22 Falconet, D., White, M.J., Fristensky, B.W., Dobres, M.S. and Thompson, W.F. (1991) Plant Mol. Biol. 17, 135-139.