Gene, 74 (1988) 433-443 Elsevier GEN 02780
Organization of the sunflower 11S storage protein gene family
(Legumin/globulin seed proteins; nucleotide sequence; divergent gene families; Helianthus annuus; helianthinin)
Raymond A. Vonder Haar, Randy D. Allen*, Elizabeth A. Cohen*, Craig L. Nessler and Terry L. Thomas
Biology Department, Texas A&M University, College Station, TX 77843 (U.S.A.)
Received 13 April 1988
Accepted 13 August 1988
Received by publisher 19 September 1988
SUMMARY
We have isolated and characterized genes encoding the sunflower 11S globulin seed storage proteins, collectively termed helianthinin. One gene, designated HaG3, has a primary transcription unit of about 1750 nucleotides, including two short intervening sequences. The predicted precursor polypeptide from HaG3 is 493 amino acids long, is rich in glutamine and other nitrogen-rich amino acids and includes the amino acid sequence NGVEETICS. This sequence is highly conserved among 11S seed storage proteins and is involved in the proteolytic processing of these polypeptides. Additional helianthinin sequences are conserved among other seed storage protein genes. Analysis of various cDNA and genomic sequences indicates that helianthinins are encoded by a small gene family that includes a minimum of two divergent subfamilies.
Correspondence to: Dr. T.L. Thomas, Biology Department, Texas A&M University, College Station, TX 77843 (U.S.A.), Tel. (409) 845-0184; Fax (409) 845-2891.
* Present addresses: (R.D.A.) Department of Biology, Washington University, St. Louis, MO 63130 (U.S.A.), Tel. (314) 889-6883; (E.A.C.) Department of Genetics, University of Georgia, Athens, GA 30602 (U.S.A.), Tel. (404) 542-1444.
Abbreviations: aa, amino acid(s); bp, base pair(s); 2D, two dimensional; DPF, days post-flowering; Denhardt's solution, 0.02% bovine serum albumin, 0.02% Ficoll and 0.02% polyvinylpyrrolidone; ER, endoplasmic reticulum; nt, nucleotide(s); ORF, open reading frame; PAGE, polyacrylamide gel electrophoresis; SDS, sodium dodecylsulfate; SET, 0.15 M NaCl, 0.02 M Tris-HCl, 0.002 M EDTA (pH 8.0). For nucleotide sequences, H = A, C or T; M = A or C.
0378-1119/88/$03.50 © 1988 Elsevier Science Publishers B.V. (Biomedical Division)

INTRODUCTION

Like embryos of other oilseed plants, sunflower embryos accumulate and store large quantities of lipid and protein. These stored materials are utilized by the seedling following germination and, in addition, are of immense agronomic importance. The organization and expression of genes encoding seed storage proteins have been investigated in a number of plant species, including both dicots and monocots (reviewed by Shotwell and Larkins, 1988). In all cases, the accumulation of storage proteins during seed development and maturation requires the highly regulated expression of genes encoding these proteins. Substantial post-translational modifications and targeting to appropriate subcellular compartments are also necessary. Consequently, these genes provide an excellent opportunity for analysis of the molecular mechanisms controlling many aspects of ontogenic gene expression in plants.

Sunflower seed proteins include the water-soluble 2S albumins and the salt-soluble 11S globulins. The sequence and expression of albumin structural genes have been described (Allen et al., 1987a, 1987b). The sunflower 11S storage protein, designated helianthinin, is structurally similar to legumin-like seed proteins of other plant species and is represented in planta by an approximately 300-kDa hexameric holoprotein (reviewed by Shotwell and Larkins, 1988). Each subunit of the holoprotein consists of a larger, acidic (α) polypeptide (30-40 kDa) and a smaller, basic (β) polypeptide (23-27 kDa) linked by disulfide bonds.

The α and β polypeptides of legumin-like proteins such as the helianthinins are generated proteolytically from larger precursor polypeptides that are synthesized on the rough ER. An NH2-terminal signal peptide targets the nascent polypeptide to the lumen of the rough ER, where it is removed. The 11S precursors assemble into trimers in the ER and are then transported to the vacuole through the Golgi. Once in the vacuole, 11S precursors are cleaved into disulfide-linked α and β polypeptides. The trimers then assemble into hexamers and, following additional protein accumulation, the vacuole subdivides to form the protein bodies characteristic of many plant seeds (Higgins, 1984).

The cloning and expression of helianthinin mRNAs have been described (Allen et al., 1985; Allen et al., 1987b). Synthesis of helianthinin mRNAs and precursor polypeptides is tightly regulated during sunflower embryogenesis. Helianthinin α and β subunits first appear about 7 DPF, two days after the albumin seed proteins appear (Cohen, 1986), and, like the albumins, these polypeptides continue to accumulate through much of sunflower seed development. Helianthinin mRNAs are also detected at 7 DPF; these transcripts accumulate and disappear with kinetics similar to those observed for albumin mRNA (Allen et al., 1987b).

In this paper, we describe the isolation and characterization of genes encoding helianthinin in sunflower. Sequence and S1 nuclease analysis of one gene, designated HaG3, defined a primary transcription unit of about 1750 nt, including two short intervening sequences. The helianthinin polypeptide predicted from the nucleotide sequence of HaG3 shares significant, functional sequence homologies with other 11S seed storage proteins. Analysis of cDNA and genomic DNA sequences indicates that helianthinins are encoded by a small gene family that includes at least two divergent subfamilies. Sequences located 5′ of the HaG3 transcription unit are conserved among other seed storage protein genes.
MATERIALS AND METHODS

(a) Materials
Sunflower seeds (Helianthus annuus L. cv. Giant Grey Stripe, Northrup King Seed Co., Minneapolis, MN) were obtained commercially. Embryos from field-grown plants were dissected from achenes at the indicated times, frozen in liquid nitrogen and stored at -80°C.

(b) Isolation and labeling of nucleic acids
Bacteriophage and plasmid DNAs were prepared by standard methods (Maniatis et al., 1982). Total and poly(A)+ RNA from leaves and staged sunflower embryos were prepared as described by Allen et al. (1985). Radiolabeled hybridization probes for genomic library screening, phage recombinant mapping and genomic DNA blots were prepared by nick-translating a 1.1-kb EcoRI fragment prepared from the cDNA recombinant Ha2 (Allen et al., 1987a; Allen, 1986).

(c) Plaque hybridization
Construction of a sunflower genomic library in the bacteriophage λ vector EMBL3 (Frishauf et al., 1983) has been described (Allen et al., 1987a). This library was screened for helianthinin phage recombinants by hybridization with nick-translated Ha2 cDNA probes (Benton and Davis, 1977). Filters were prehybridized for 4 h and hybridized for 15-18 h at 67°C in 4 × SET, 5 × Denhardt's solution, 0.2% SDS, 100 µg/ml denatured calf thymus DNA, 50 µg/ml poly(A) and 10 µg/ml poly(C). Filters were washed successively at 60°C in 4 ×, 2 × and 1 × SET containing 0.025 M phosphate buffer and 0.2% SDS for 1 h each, air-dried and autoradiographed. Positive recombinants were plaque-purified and restriction-mapped by standard procedures (Maniatis et al., 1982).

(d) Nucleotide sequence analysis
HaG3 DNA was sequenced by the dideoxynucleotide chain termination method (Sanger et al., 1977) after ligation into M13mp18 and M13mp19 and transfection into JM101 (Messing et al., 1983). Single-stranded recombinant phage DNA was processed and sequenced as described (Sanger et al., 1980). Additional overlapping T4 polymerase deletions of selected recombinants were prepared and sequenced as described by Dale et al. (1985). The complete sequence of HaG3 was assembled from these overlapping clones. Computer analyses were done on a DEC MicroVax using the University of Wisconsin Genetics Computer Group (UWGCG) Sequence Analysis Software (Version 5.0; Devereux et al., 1984).

(e) Transcription analysis
Nuclease mapping of the transcriptional start point of HaG3 was done as described by Favaloro et al. (1980) using a 446-bp XhoI-DraI fragment (see Fig. 2, nt position 433 to 879) asymmetrically labeled at the 5′ terminus of the XhoI site. Total embryo RNA was used. The only differences in method were that the hybridizations were carried out for 6-8 h and 10 units of S1 nuclease were used per reaction. Reaction products were analyzed on polyacrylamide-urea gels. The 446-bp XhoI-DraI fragment was subjected to Maxam-Gilbert sequencing reactions (Maxam and Gilbert, 1980), which were then used as length markers in the S1 protection experiments.

Fig. 1. Physical maps of sunflower helianthinin genes. Panels A and B: Restriction maps of HaG10-B1, HaG10-C1, HaG3-D1 and HaG3-A1. Shaded areas indicate regions which hybridize with helianthinin cDNAs, Ha2 and Ha10. Panel C: Detailed map of the 2.8-kb SalI-BglII fragment of HaG3-D1 that was sequenced. Organization of the helianthinin transcription unit in HaG3-D1 is shown. Dark boxes indicate exons; open boxes indicate introns. B, BglII; E, EcoRI; G, BglI; H, HindIII; I, BamHI; N, NcoI; P, PstI; S, SalI; X, XhoI.

Fig. 2. Nucleotide sequence of the HaG3-D1 transcription unit and flanking sequences. CAAT and TATA sequences, splice junctions and polyadenylation signal are underlined. The transcription initiation site at nt position 726 is indicated (▼). Additional upstream sequences shared with other seed protein genes are underlined and indicated by the letters a and b (see section b of RESULTS). The predicted aa sequence of the helianthinin precursor, with the N-terminal signal sequence underlined, is shown under the nucleotide sequence; the α/β cleavage site is boxed. Introns 1 and 2 are indicated.
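As a rough aid to reading the hybridization and wash conditions in section (c) of MATERIALS AND METHODS, the following sketch scales the 1 × SET composition given in the abbreviations (0.15 M NaCl, 0.02 M Tris-HCl, 0.002 M EDTA) to the 6 ×, 4 ×, 2 × and 1 × strengths mentioned in this paper. The function name and the dictionary layout are illustrative assumptions and are not part of the original methods.

```python
# 1 x SET composition as defined in the abbreviations (molar concentrations).
SET_1X = {"NaCl": 0.15, "Tris-HCl": 0.02, "EDTA": 0.002}


def set_buffer(strength):
    """Return component molarities for SET at the given strength (e.g. 4 for 4 x SET)."""
    return {name: round(molar * strength, 4) for name, molar in SET_1X.items()}


if __name__ == "__main__":
    for strength in (6, 4, 2, 1):
        conc = set_buffer(strength)
        parts = ", ".join(f"{molar:g} M {name}" for name, molar in conc.items())
        print(f"{strength} x SET: {parts}")
```

Because the washes step down from 4 × to 1 × SET at a fixed temperature, the salt concentration falls fourfold over the series, which corresponds to progressively more stringent washing.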
RESULTS AND DISCUSSION
(a) Isolation and characterization of helianthinin genes
A cDNA recombinant representing helianthinin mRNA was used to screen a sunflower genomic DNA library constructed in the bacteriophage λ vector EMBL3 (Frishauf et al., 1983). Multiple bacteriophage λ recombinants representing the helianthinin gene family were recovered in these screens. Further analysis of these recombinants by hybridization with the divergent helianthinin cDNAs, Ha2 and Ha10 (Allen et al., 1987b), defined two divergent subfamilies that encode helianthinin in sunflower embryos (Fig. 1, A and B). Two bacteriophage λ recombinants, HaG10-B1 and HaG10-C1, hybridized primarily to Ha10; under less stringent hybridization criteria (6 × SET, 55°C), these recombinants cross-hybridized weakly with Ha2 (data not shown). Conversely, HaG3-D1 and HaG3-A1 were more similar to Ha2 than to Ha10. Additional sequence data presented below confirm these sequence relationships.

(b) Sequence of the HaG3 helianthinin gene
A 2.8-kb region of the genomic recombinant HaG3-D1, bounded by BglII and SalI sites (Fig. 1C), was sequenced to determine the precise organization of a representative sunflower legumin-like seed storage protein gene (Fig. 2). Three exons separated by two very short introns (99 and 79 bp) were identified by comparison to the amino acid sequence predicted from the helianthinin cDNA, Ha2 (Allen, 1986). Intron/exon boundaries were assigned based on ORF discontinuities at each junction, on the colinearity of HaG3 and Ha2 on either side of each intron and on the presence of consensus splice junctions (Mount, 1982). The locations of the three exons and two introns in HaG3-D1 are shown schematically in Fig. 1C; the precise sequence locations are displayed in Fig. 2.

The introns in the HaG3 transcription unit differ in number and location from those observed for the prototypical legA gene of pea (Lycett et al., 1984). The legA gene has three introns at aa positions 95, 179 and 388 (henceforth referred to as I1, I2 and I3). The HaG3 legumin gene has two introns at approximately the same positions as I1 and I2; I3, however, is missing from the sunflower gene. The pea legJ/K genes (Gatehouse et al., 1988) and the Vicia faba LeB4 gene (Bäumlein et al., 1986) each contain two introns; in these genes, however, I2 and I3 remain and I1 is absent. Interestingly, two divergent Arabidopsis legumin genes contain all three introns in approximately the same relative positions as previously noted for the pea legA gene (Pang et al., 1988).

The HaG3 transcription unit was mapped by S1 nuclease protection (data not shown). The transcriptional start point is located at nt position 726 (Fig. 2), 32 nt upstream from the translational initiation site. Consensus sequence elements typical of RNA polymerase II transcription units in the regions surrounding the legumin transcription unit are underlined in Fig. 2. These include a CAAT homology at nt position 635 and a TATA homology at position 699, both 5′ of the transcriptional start point. A consensus polyadenylation signal, AATAAA, is located 37 nt 3′ of the stop codon. Sequence elements located 5′ of the HaG3 transcription unit are shared with upstream sequence elements associated with other storage protein genes.
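Upstream elements of the kind compared in this section are conventionally located by scanning a sequence against a degenerate consensus, using the ambiguity codes given in the abbreviations (H = A, C or T; M = A or C), or against an exact motif while tolerating a fixed number of mismatches. The sketch below illustrates both approaches with the HAACACAMH and AGAATGTC motifs quoted in this section. The function names and the short test sequence are hypothetical and are not taken from Fig. 2, and the code is not a reconstruction of the UWGCG programs used for the analyses reported here.

```python
# Ambiguity codes; H and M follow the definitions in the abbreviations.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "H": "ACT", "M": "AC"}


def find_consensus(seq, consensus):
    """Return 1-based start positions where seq matches a degenerate consensus."""
    seq = seq.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if all(base in IUPAC[code] for base, code in zip(window, consensus)):
            hits.append(i + 1)
    return hits


def find_near_matches(seq, motif, max_mismatches=1):
    """Return (1-based start, mismatch count) for windows within max_mismatches of motif."""
    seq = seq.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + k], motif))
        if mismatches <= max_mismatches:
            hits.append((i + 1, mismatches))
    return hits


if __name__ == "__main__":
    # Made-up sequence for illustration only; NOT part of the HaG3 sequence.
    toy = "GGTAACACAAATTTAGAATGACGG"
    print("HAACACAMH matches at:", find_consensus(toy, "HAACACAMH"))
    print("AGAATGTC near-matches:", find_near_matches(toy, "AGAATGTC"))
```

Applied to a real upstream region, the positions returned would still need to be converted to distances from the cap site before they could be compared with the locations quoted in the text.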
Particularly noteworthy is the conservation of an element of the legumin (leg) box, a phylogenetically conserved sequence located approximately 100 nt upstream from several genes encoding legumin and legumin-like seed proteins (Bäumlein et al., 1986). Although the complete leg box is not conserved in HaG3, three elements that differ from the sequence AGAATGTC by only one nt are located between 50 and 210 nt upstream of the HaG3 cap site (indicated by a in Fig. 2). In addition to elements of the leg box, the consensus sequence HAACACAMH, characteristic of most seed protein genes (Goldberg, 1986), is present at position 598 in Fig. 2 (indicated by b). Despite the conservation of sequence and location of the legumin box elements and the CACA motif in HaG3, the functional significance of these conserved sequences remains to be determined.

(c) Molecular characteristics of helianthinin and its precursors
The precursor polypeptide predicted from the HaG3 sequence is 493 aa and has an Mr of 64.5 kDa. As with most legumin-like seed proteins, the HaG3 gene product is rich in amide amino acids, e.g., glutamine and asparagine, and is relatively deficient in methionine and cysteine (Table I). As expected from previous 2D PAGE analyses (Allen, 1986), charged amino acids are distributed within the precursor polypeptide so that the α polypeptide has a net negative charge at neutral pH, whereas the β polypeptide is positively charged under the same conditions. The mechanism of post-translational processing and targeting of 11S globulins to protein bodies is complex, and although in some cases sequences required for these events are phylogenetically conserved (Borroto and Dure, 1987), the molecular basis of these events remains to be elucidated. The initial processing event, cleavage of the signal peptide, occurs co-translationally and results in the transport of the cleaved polypeptide into the lumen of the ER.

TABLE I
Amino acid composition of HaG3 precursor polypeptide as predicted from the sequence in Fig. 2

Amino acid   Number (%)     Amino acid   Number (%)
Ala          38 (7.71)      Met           3 (0.60)
Cys           6 (1.22)      Asn          34 (6.90)
Asp          16 (3.25)      Pro          22 (4.46)
Glu          31 (6.29)      Gln          69 (14.0)
Phe          25 (5.07)      Arg          39 (7.91)
Gly          35 (7.10)      Ser          31 (6.29)
His           8 (1.62)      Thr          23 (4.66)
Ile          24 (4.87)      Val          29 (5.88)
Lys          10 (2.02)      Tyr           5 (1.01)
Leu          37 (7.51)      Trp           8 (1.62)
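Each percentage in Table I is the residue count divided by the 493 residues of the precursor. As a quick consistency check, and to make the statement about amide amino acids concrete, the sketch below recomputes the percentages from the counts in the table. The variable and function names are illustrative assumptions; values computed here may differ from the printed ones in the last decimal place, depending on how the originals were rounded.

```python
# Residue counts for the HaG3 precursor, transcribed from Table I.
COUNTS = {
    "Ala": 38, "Cys": 6, "Asp": 16, "Glu": 31, "Phe": 25,
    "Gly": 35, "His": 8, "Ile": 24, "Lys": 10, "Leu": 37,
    "Met": 3, "Asn": 34, "Pro": 22, "Gln": 69, "Arg": 39,
    "Ser": 31, "Thr": 23, "Val": 29, "Tyr": 5, "Trp": 8,
}


def percent(residues, counts=COUNTS):
    """Percentage of the polypeptide made up by the named residues."""
    total = sum(counts.values())
    return 100.0 * sum(counts[r] for r in residues) / total


if __name__ == "__main__":
    print("total residues:", sum(COUNTS.values()))
    for name in COUNTS:
        print(f"{name}: {COUNTS[name]:3d} ({percent([name]):.2f}%)")
    print(f"amide (Gln + Asn): {percent(['Gln', 'Asn']):.1f}%")
    print(f"sulfur-containing (Met + Cys): {percent(['Met', 'Cys']):.1f}%")
```

With these counts, Gln and Asn together make up roughly a fifth of the residues, while Met and Cys together account for under 2%.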
The probable NH2-terminal leader sequence of the HaG3 precursor is indicated in Fig. 2; this site was selected using the -1, -3 rules defined by von Heijne (1986) for signal sequence cleavage site selection. The location of the predicted α/β cleavage site is boxed in Fig. 2 (see below).

(d) Divergent subfamilies encode helianthinin
Hybridization and restriction analysis of two nearly full-length cDNA recombinants, Ha2 and Ha10, suggested that sunflower 11S seed proteins were encoded by two divergent subfamilies (Allen, 1986; Allen et al., 1987b). These subfamilies are designated Ha2 and Ha10, corresponding to the cDNAs that distinguish each subfamily. Genomic blot analyses (Allen, 1986; Allen et al., 1987b) revealed that the Ha2 subfamily includes at least three members and the Ha10 subfamily includes two or more members. Genomic sequences representative of each subfamily were isolated from a sunflower genomic DNA library; restriction maps of these recombinants are shown in Fig. 1. Regions that are complementary to either Ha2 or Ha10 are indicated. Even at relaxed hybridization criteria (6 × SET, 55°C), Ha2 and Ha10, or their genomic homologues, cross-hybridize very poorly (data not shown). Based on the intrafamilial sequence variation reflected in restriction site locations in regions flanking helianthinin coding sequences (Fig. 1), we conclude that HaG10-B1 and HaG10-C1 are non-allelic members of the Ha10 subfamily; similarly, HaG3-A1 and HaG3-D1 are non-allelic members of the Ha2 subfamily. Based on genomic blot analysis (Allen, 1986; Allen et al., 1987b), the helianthinin genes shown in Fig. 1 cannot represent all members of each subfamily. In the Ha2 subfamily, at least two additional members remain uncharacterized, and in the Ha10 subfamily, there is at least one additional member.

The extent of divergence between the Ha2 and Ha10 subfamilies is illustrated in Fig. 3, where the DNA sequence from a region of HaG3, including the α/β cleavage site (Fig. 2), is compared to a similar region of the cDNA Ha10. Overall, these nucleotide sequences share only 50% sequence similarity. The predicted Ha10 and HaG3 aa sequences share 43% similarity (data not shown). This latter observation suggests that the majority of the helianthinin coding sequence has diverged significantly, so much so that second-hit mutations contribute significantly to the overall divergence of the two helianthinin subfamilies. Although the two sequences are highly divergent throughout most of the protein coding sequence, the similarity of the DNA sequences encoding the α/β cleavage site approaches 90%; in this region, 24 of 27 nt are identical. The three nt differences in Fig. 3 are third-base changes; consequently, the predicted α/β cleavage sites of HaG3 and Ha10 are identical.
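The 24-of-27 comparison can be reproduced from the 27-nt cleavage-site sequences listed for sunflower in Fig. 4. The sketch below assumes that the first sunflower sequence in that figure is HaG3 and that the one differing from it at three positions is Ha10; this agrees with the statement above but is an inference from the figure layout. The function names and the partial codon table (only the codons that occur in these two sequences) are illustrative.

```python
# 27-nt sequences encoding the alpha/beta cleavage site, transcribed from Fig. 4.
# Assumption: which sunflower entry is Ha10 is inferred from the 24/27 statement.
HAG3 = "AACGGTGTGGAAGAAACCATCTGCAGC"
HA10 = "AACGGTGTGGAAGAAACAATATGCAGT"

# Partial standard genetic code: only the codons that occur in the two sequences.
CODONS = {
    "AAC": "N", "GGT": "G", "GTG": "V", "GAA": "E", "ACC": "T", "ACA": "T",
    "ATC": "I", "ATA": "I", "TGC": "C", "AGC": "S", "AGT": "S",
}


def translate(nt):
    """Translate an in-frame nucleotide string using the partial codon table."""
    return "".join(CODONS[nt[i:i + 3]] for i in range(0, len(nt), 3))


def differences(a, b):
    """Return (1-based nt position, codon position 1-3) for each mismatch."""
    return [(i + 1, i % 3 + 1) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    diffs = differences(HAG3, HA10)
    identical = len(HAG3) - len(diffs)
    print(f"{identical} of {len(HAG3)} nt identical "
          f"({100.0 * identical / len(HAG3):.1f}%)")
    print("mismatches (nt position, codon position):", diffs)
    print("HaG3:", translate(HAG3))
    print("Ha10:", translate(HA10))
```

All three mismatches fall at the third position of a codon and leave the encoded residue unchanged, so both sequences translate to NGVEETICS.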
(e) The α/β cleavage site is phylogenetically conserved
Comparison of the predicted aa sequences for sunflower helianthinin and Brassica cruciferin (Simon et al., 1985) revealed an overall similarity of 46%, including conservative aa differences (data not shown). However, the regions encoding the α/β cleavage site in helianthinin and cruciferin are nearly identical, differing by a conservative change from valine to leucine. The phylogenetic conservation at the α/β cleavage site is illustrated in Fig. 4. The α/β cleavage site and the nt sequence encoding each site for HaG3, Ha10 and Ha2 (Allen, 1986) are included. The α/β cleavage sequences for legumin-like seed proteins in seven other species, including both monocots and dicots, are also summarized in Fig. 4. At the aa level, the sequence conservation is striking. The presence of a serine in the last position appears to be characteristic of legB-type genes (Wobus et al., 1986); threonine at this position is indicative of legA-type genes. Based on these data and the intron/exon structure, we conclude that the HaG3 helianthinin gene is a B-type gene and is more similar to the LeB4 gene of V. faba (Bäumlein et al., 1986) and the legJ/K genes of pea (Gatehouse et al., 1988) than to the prototypical legA gene of pea (Lycett et al., 1984). Thus far, all helianthinin genes or cDNAs analyzed are of the legB type.

Fig. 3. Comparison of HaG3 and Ha10 sequences. Nucleotide sequences of Ha10 (upper sequence) and HaG3 (lower sequence) spanning the region encoding the α/β cleavage site were compared. The aa sequence of the α/β cleavage site (boxed in Fig. 2) is shown below the nucleotide sequence. Solid bars indicate identical nucleotides. Dots indicate gaps inserted to maximize sequence homology. The first 1479 nt of the HaG3 sequence are not shown.

Species (gene)                      aa            nt sequence
Helianthus annuus (HaG3)            NGVEETICS     AACGGTGTGGAAGAAACCATCTGCAGC
Helianthus annuus (Ha2)                           AACGGTGTGGAAGAAACCATtTGCAGt
Helianthus annuus (Ha10)                          AACGGTGTGGAAGAAACaATaTGCAGt
Vicia faba (A)                      L V T         AAtGGgcTtGAgGAAACCgTtTGCAct
Vicia faba (LeB4)                   L             AAtGGTtTGGAAGAAACCATCTGtAGt
Pisum sativum (legA)                L V T         AAtGGgcTtGAgGAAACagTtTGCAct
Pisum sativum                       L             AAtGGTtTGGAAGAAACtATCTGtAGt
Brassica napus                      L             AACGGTtTaGAAGAgACCATaTGCAGC
Arabidopsis thaliana (CRA-1)        L             AAtGGctTaGAgGAgACCATCTGCAGC
Arabidopsis thaliana (CRB)          L L T         AAtGGTtTaGAgGAgACttTgTGCAcC
Gossypium hirsutum (subfamily A)    L F           AAtGGccTcGAgGAAACttTCTGCtcC
Gossypium hirsutum (subfamily B)    L F           AACGGctTaGAAGAAACatTCTGCtca
Avena sativa                        L N F         AAtGGTtTGGAgGAgAtttTCTGttca
Oryza sativa                        L D F T       AACGGTtTGGAtGAgACCtTtTGCAcC
CONSENSUS                           NGLEETICS (alternatives: F, T)

Fig. 4. Phylogenetic comparison of legumin α/β cleavage sequences. The complete aa sequence for the sunflower α/β cleavage site is shown; downward arrow indicates cleavage site. Only aa that differ from the sunflower sequence are shown for the other plant species. The nucleotide sequence encoding each α/β cleavage site is displayed with its corresponding complete or partial aa sequence. Nucleotides that differ from the HaG3 sequence are shown in lower case letters. A consensus α/β cleavage site is indicated at the bottom of the figure. Data sources include Helianthus annuus: this work; Allen et al. (1987b); Allen (1986); Vicia faba: Wobus et al. (1986); Pisum sativum: Lycett et al. (1984), Gatehouse et al. (1988); Brassica napus: Simon et al. (1985); Arabidopsis thaliana: Pang et al. (1988); Gossypium hirsutum: Chlan et al. (1986); Avena sativa: Walburg and Larkins (1986); Oryza sativa: Takaiwa et al. (1987).

(f) Conclusions
(1) The sunflower helianthinins are legumin-like seed proteins and are encoded by at least two divergent gene families defined by the sequences of HaG3 and Ha10.
(2) Among legumin-like seed proteins of diverse plant species, the most conserved aa sequences are those required for appropriate post-translational processing and intracellular trafficking. With these constraints, the bulk of the aa sequence of the storage proteins is apparently free to diverge. Additional cis-acting regulatory sequences must also be 'functionally conserved' to ensure appropriate transcriptional regulation of legumin-like genes.
(3) Based on the aa sequence of the α/β cleavage site, the seed protein encoded by HaG3 is most similar to those encoded by the V. faba LeB4 gene (Bäumlein et al., 1986) and the pea legJ/K genes (Gatehouse et al., 1988). Furthermore, based on the diversity of intron number and location among various legumin-like storage protein genes, it is likely that the progenitor gene for the B-type legumin genes was an A-type legumin gene (Lycett et al., 1984) containing three introns.
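The argument in conclusion (3) can be laid out as a small presence/absence comparison. The sketch below records which of the three legA intron positions (I1, I2, I3) each gene retains, as stated in section (b) of RESULTS AND DISCUSSION, and reports which positions each gene lacks relative to a three-intron arrangement. The dictionary keys and function name are illustrative assumptions; the code only restates the comparison and does not test the evolutionary inference.

```python
# Intron positions retained by each gene, relative to pea legA (I1, I2, I3),
# as described in the text. Labels are shorthand chosen for this sketch.
INTRONS = {
    "pea legA": {"I1", "I2", "I3"},
    "Arabidopsis legumin genes": {"I1", "I2", "I3"},
    "sunflower HaG3": {"I1", "I2"},
    "pea legJ/K": {"I2", "I3"},
    "V. faba LeB4": {"I2", "I3"},
}


def missing_introns(introns=INTRONS):
    """Map each gene to the intron positions it lacks relative to the union of all genes."""
    all_positions = set().union(*introns.values())
    return {gene: sorted(all_positions - found) for gene, found in introns.items()}


if __name__ == "__main__":
    for gene, lost in missing_introns().items():
        status = ", ".join(lost) if lost else "none"
        print(f"{gene}: lacks {status}")
    shared = set.intersection(*INTRONS.values())
    print("present in every gene:", sorted(shared))
```

In this layout, each two-intron gene differs from the three-intron arrangement by a single missing position, and I2 is the only position present in all of the genes compared.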
ACKNOWLEDGEMENTS
This research was supported by grants from the Texas Advanced Technology Research Program and Rhône-Poulenc Agrochimie. R.D.A. was a recipient of a W.R. Grace predoctoral fellowship. We thank Drs. Carl Adams and Juan Jordano for their critical review of this manuscript and also Conception Almoguera for critical technical assistance. We thank P. Pang, R. Pruitt and E. Meyerowitz for sharing their results prior to publication.
REFERENCES

Allen, R.D.: Expression of 11S Seed Storage Protein Genes of Helianthus annuus L. Ph.D. Dissertation, Texas A&M University, 1986.
Allen, R.D., Nessler, C.L. and Thomas, T.L.: Developmental expression of sunflower 11S storage protein genes. Plant Mol. Biol. 5 (1985) 165-173.
Allen, R.D., Cohen, E.A., Vonder Haar, R.A., Adams, C.A., Ma, D.P., Nessler, C.L. and Thomas, T.L.: Sequence and expression of an albumin storage protein in sunflower. Mol. Gen. Genet. 210 (1987a) 211-218.
Allen, R.D., Cohen, E.A., Vonder Haar, R.A., Orth, K.A., Ma, D.P., Nessler, C.L. and Thomas, T.L.: Expression of embryo specific genes in sunflower. In Davidson, E.H. and Firtel, R. (Eds.), Molecular Approaches to Developmental Biology, Alan R. Liss, New York, (1987b) pp. 415-424.
Benton, W.D. and Davis, R.W.: Screening λgt recombinant clones by hybridization to single plaques in situ. Science 196 (1977) 180-182.
Bäumlein, H., Wobus, U., Pustell, J. and Kafatos, F.: The legumin gene family: Structure of a B type gene of Vicia faba and a possible legumin gene specific regulatory element. Nucleic Acids Res. 14 (1986) 2707-2720.
Borroto, K. and Dure III, L.: The globulin seed storage proteins of flowering plants are derived from two ancestral genes. Plant Mol. Biol. 8 (1987) 113-131.
Chlan, C.A., Pyle, J.B., Legocki, A.B. and Dure III, L.: Developmental biochemistry of cottonseed embryogenesis and germination, XVIII. cDNA and amino acid sequences of members of the storage protein families. Plant Mol. Biol. 7 (1986) 475-489.
Cohen, E.A.: Analysis of Sunflower 2S Seed Storage Protein Genes. M.S. Thesis, Texas A&M University, 1986.
Dale, R.M.K., McClure, B.A. and Houchins, J.P.: A rapid single-stranded cloning strategy for producing a sequential series of overlapping clones for use in DNA sequencing: applications to sequencing the corn 18S rDNA. Plasmid 13 (1985) 31-40.
Devereux, J., Haeberli, P. and Smithies, O.: A comprehensive set of sequences and analysis programs for the VAX. Nucleic Acids Res. 12 (1984) 387-395.
Favaloro, J., Treisman, R. and Kamen, R.: Transcription maps of polyoma virus specific RNA: Analysis by two dimensional nuclease S1 mapping. Methods Enzymol. 65 (1980) 718-749.
Frishauf, A.M., Lehrac, H., Poustka, A. and Murray, N.: Lambda replacement vectors carrying polylinker sequences. J. Mol. Biol. 170 (1983) 827-842.
Gatehouse, J.A., Bown, D., Gilroy, J., Levasseur, M., Castleton, J. and Ellis, T.H.N.: Two genes encoding 'minor' legumin polypeptides in pea (Pisum sativum L.). Biochem. J. 250 (1988) 15-24.
Goldberg, R.B.: Regulation of plant gene expression. Phil. Trans. R. Soc. B 314 (1986) 343-353.
Higgins, T.J.V.: Synthesis and regulation of major proteins in seeds. Annu. Rev. Plant Physiol. 35 (1984) 191-221.
Lycett, G.W., Croy, R.R.D., Shirsat, A.G. and Boulter, D.: The complete nucleotide sequence of a legumin gene from pea (Pisum sativum L.). Nucleic Acids Res. 12 (1984) 4493-4506.
Maniatis, T., Fritsch, E.F. and Sambrook, J.: Molecular Cloning. A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1982.
Maxam, A. and Gilbert, W.: Sequencing end-labeled DNA. Methods Enzymol. 65 (1980) 499-560.
Messing, J.: New M13 vectors for cloning. Methods Enzymol. 101 (1983) 20-78.
Mount, S.: A catalogue of splice junction sequences. Nucleic Acids Res. 10 (1982) 459-472.
Pang, P.P., Pruitt, R.E. and Meyerowitz, E.M.: Molecular cloning, genomic organization, expression and evolution of 12S seed storage protein genes of Arabidopsis thaliana. Submitted.
Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5467.
Sanger, F., Coulson, A.R., Barrell, B.G., Smith, A.J.H. and Roe, B.A.: Cloning in single-stranded bacteriophage as an aid to rapid DNA sequencing. J. Mol. Biol. 143 (1980) 161-178.
Shotwell, M.A. and Larkins, B.A.: The biochemistry and molecular biology of seed storage proteins. In Marcus, A. (Ed.), The Biochemistry of Plants: A Comprehensive Treatise, Vol. 15, Plant Molecular Biology, Academic Press, New York, in press.
Simon, A.E., Tenbarge, K.M., Scofield, S.R., Finkelstein, R.R. and Crouch, M.L.: Nucleotide sequence of a cDNA clone of Brassica napus 12S storage protein shows homology with legumin from Pisum sativum. Plant Mol. Biol. 5 (1985) 191-201.
Takaiwa, F., Kikuchi, S. and Oono, K.: A rice glutelin gene family: a major type of glutelin mRNA can be divided into two classes. Mol. Gen. Genet. 208 (1987) 15-22.
von Heijne, G.: A new method for predicting signal sequence cleavage sites. Nucleic Acids Res. 14 (1986) 4683-4690.
Walburg, G. and Larkins, B.A.: Isolation and characterization of cDNAs encoding oat 12S globulin mRNAs. Plant Mol. Biol. 6 (1986) 161-169.
Wobus, U., Bäumlein, H., Bassüner, R., Heim, U., Jung, R., Müntz, K., Saalbach, G. and Weschke, W.: Characteristics of two types of legumin genes in the field bean (Vicia faba L. var. minor) genome as revealed by cDNA analysis. FEBS Lett. 201 (1986) 74-80.

Communicated by T.D. McKnight.