Gene, 67 (1988) 85-96
Elsevier
GEN 02439

Nucleotide sequence and transcript analysis of three photosystem II genes from the cyanobacterium Synechococcus sp. PCC7942

(Recombinant DNA; psbD; psbC; Anacystis nidulans R2; prokaryotic gene family; operon; translational start codon)

Susan S. Golden and George W. Stearns*

Department of Biology, Texas A & M University, College Station, TX 77843-3258 (U.S.A.)

Received 15 January 1988
Revised 29 February 1988
Accepted 3 March 1988
Received by publisher 25 March 1988
SUMMARY
The genome of the cyanobacterium Synechococcus sp. PCC7942 contains two genes encoding the D2 polypeptide of photosystem II (PSII), which are designated here as psbDI and psbDII. The psbDI gene, like the psbD gene of plant chloroplasts, is cotranscribed with and overlaps the open reading frame of the psbC gene, encoding the PSII protein CP43. The psbDII gene is not linked to psbC, and appears to be transcribed as a monocistronic message. The two psbD genes encode identical polypeptides of 352 amino acids, which are 86% conserved with the D2 polypeptide of spinach. In plants, the translational start codon of the psbC gene has been reported to be an ATG codon 50 bp upstream from the end of the psbD gene. This triplet is not present in the psbDI sequence of Synechococcus sp., but is replaced by ACG, a codon which is very unlikely to initiate translation. Translation of the psbC gene may begin at a GTG codon which overlaps the psbDI open reading frame by 14 bp and is preceded by a block of homology to the 3' end of the 16S ribosomal RNA, a potential ribosome-binding site. There are only two bp differences between the sequences of the two psbD genes; one of these results in substitution in psbDII of GCG for the presumed GTG start codon in psbDI.
Correspondence to: Dr. S.S. Golden, Department of Biology, Texas A & M University, College Station, TX 77843-3258 (U.S.A.) Tel. (409) 845-9824.
* Present address: AMGen, 1900 Oak Terrace Lane, Thousand Oaks, CA 91320 (U.S.A.) Tel. (805) 499-5725 Ext. 340.

Abbreviations: aa, amino acid(s); bp, base pair(s); fmet, formyl methionine; kb, kilobase or 1000 bp; nt, nucleotide(s); ORF, open reading frame; PIPES, 1,4-piperazine-diethanesulfonic acid; PSII, photosystem II; P680, reaction center of PSII; rRNA, ribosomal RNA; S, sedimentation constant; SDS, sodium dodecyl sulfate; SSPE, 0.1 M NaCl, 0.01 M NaH2PO4, 0.001 M EDTA, pH 7.6; tRNA, transfer RNA.

0378-1119/88/$03.50 © 1988 Elsevier Science Publishers B.V. (Biomedical Division)

INTRODUCTION

The major membrane protein complexes of photosynthetic thylakoids from the oxygenic cyanobacteria are very similar to those from plant chloroplasts (Ho and Krogmann, 1982). All of the core polypeptides of PSII are conserved between these groups of organisms. These include two chlorophyll-a-binding polypeptides, CP47 and CP43, two proteins, D1 and D2, which may house all of the cofactors of the reaction center, and the two subunits of cytochrome b-559 (reviewed by Satoh, 1985). In higher plants each of these proteins is encoded by a unique gene in the chloroplast genome, whereas in cyanobacteria D1 and D2 are encoded by more than one psbA and psbD gene, respectively (Curtis and Haselkorn, 1984; Golden et al., 1986; Williams and Chisholm, 1987). D1 and D2 are thought to form a heterodimer which binds the P680 reaction center chlorophyll molecule, the primary electron acceptor, and the first and second stable electron accepting quinones, QA and QB. This model was first proposed by Trebst (1986) based on homology of these polypeptides to the crystallographically-defined L and M subunits of the reaction centers of purple bacteria (Hearst and Sauer, 1984; Deisenhofer et al., 1985). This idea was supported when Nanba and Satoh (1987) isolated PSII particles containing only D1, D2, and cytochrome b-559 which exhibited photoaccumulation of reduced pheophytin.

The cyanobacterium Synechococcus sp. PCC7942 (also called Anacystis nidulans R2) contains three distinct, transcriptionally active psbA genes encoding two forms of the D1 protein (Golden et al., 1986), suggesting that thylakoids from this organism may contain a mixed population of D1-D2 heterodimers. We report here the nt sequence and transcript analysis of the two D2-encoding and the single CP43-encoding ORFs from this organism.
MATERIALS AND METHODS

(a) Bacterial strains

The cyanobacterium Synechococcus sp. (Pasteur Culture Collection #7942) was grown in liquid BG-11 medium (Allen, 1968) with constant shaking at 30°C under fluorescent illumination at a photosynthetic photon flux density of 100 µEinsteins × s⁻¹ × m⁻². Alternatively, cells were plated on BG-11 solidified with 1.5% Difco Bacto-Agar and supplemented with 1 mM Na2S2O3 as previously described (Golden et al., 1987). A detailed protocol for transformation and gene inactivation in this organism is described by Golden et al. (1987). The psbDII-truncated mutant R2S2.2 was grown in the presence of 40 µg spectinomycin/ml.
(b) Screening of the bacteriophage library and Southern analysis

The psbD and psbC genes were identified from a library of Synechococcus sp. DNA cloned in the bacteriophage λ vector Charon 30 (Rimm et al., 1980). This library was a gift from C. Vann and L. Sherman. Plasmids containing portions of the psbD gene from Chlamydomonas reinhardtii and the psbC gene from Synechocystis sp. PCC6803 were a gift from J. Williams. Plaques were lifted from agar plates onto Colony/Plaque Screen filters (NEN Research Products), probed with a fragment of the C. reinhardtii psbD gene which had been 32P-labeled by nick translation, and washed in 2 × SSPE, 0.2% SDS at 65°C (Maniatis et al., 1982). Southern analysis of phage DNA to identify the psbC gene was performed by alkaline capillary transfer of DNA from 0.7% agarose gels to Gene Screen Plus (Reed and Mann, 1985). Unhybridized nick-translated psbC probe sequences were washed from the blot as for the Plaque Screen filters.

(c) Nucleotide sequence analysis

Restriction fragments of the psbD and psbC genes were cloned into M13 mp18 and mp19 (Norrander et al., 1983), or the Bluescript (Stratagene) and pUC18/19 (Norrander et al., 1983) plasmid vectors for nt sequence analysis. Escherichia coli strain DH5α (BRL) was used as a host for all plasmids, and the M13 vectors were propagated in JM101 (Messing, 1983). A combination of dideoxynucleotide (from single- or double-stranded templates) and chemical cleavage methods was used to determine the complete nt sequence of each of the three genes on both DNA strands (Sanger et al., 1977; Chen and Seeburg, 1985; Maxam and Gilbert, 1980).

(d) Transcript analysis

Total RNA was isolated from Synechococcus sp. cells as described elsewhere (Golden et al., 1987). Northern analysis was performed by capillary transfer of glyoxal-treated RNA samples from agarose gels to Gene Screen Plus membranes as described in the legend to Fig. 4. Blots were probed with 32P-labeled restriction fragments produced by nick translation with a BRL labeling kit or anti-sense RNA transcribed from Bluescript vector-based plasmids (Stratagene). The 5' ends of transcripts were identified using nuclease-protection and primer-extension methods as described for Fig. 5. The probes for mapping the 5' end of the psbDII transcript are described here. For primer extension, a 17-mer was synthesized which was complementary to the sequence shown in Fig. 2 at positions -2 to -18 of the psbDII 5' flanking region. The extended product was measured against a sequencing ladder generated by dideoxynucleotide chain termination using the same primer on a DNA template containing the psbDII 5' flanking region. The DNA fragment used for nuclease protection experiments was 5' end-labeled at the HincII site inside the psbDII ORF (nt 146), and extended to a PvuI site 173 bp upstream from the start codon. The same labeled fragment was subjected to chemical cleavage reactions (Maxam and Gilbert, 1980) to generate a sequencing ladder by which to measure the protected fragment.

RESULTS AND DISCUSSION

(a) Identification of the psbD genes from Synechococcus sp.

Recombinant Charon bacteriophage containing two different EcoRI fragment inserts of Synechococcus sp. DNA were identified by hybridization with a C. reinhardtii psbD probe which had been 32P-labeled by nick translation. The inserts of 11.5 kb and 10 kb corresponded to two bands in genomic EcoRI digests which hybridized to the probe in Southern analysis (data not shown). Subsequent restriction mapping and Southern analysis indicated that the region of psbD homology in each of these clones was completely internal to the EcoRI fragments, suggesting that two distinct psbD genes were present in this organism, as has been reported for the cyanobacterium Synechocystis sp. PCC6803 (Williams and Chisholm, 1987). The nt sequence of both regions of homology confirmed the presence of two complete psbD ORFs. These genes were named psbDI and psbDII, in keeping with the nomenclature of the cyanobacterial psbA multigene families (Curtis and Haselkorn, 1984; Golden et al., 1986). Fig. 1 sche-
[Figure: restriction maps of the psbDI/psbC and psbDII regions with ORF boxes and transcript arrows; scale bar, 100 bp.]

Fig. 1. Schematic summary of the psbD physical maps. Horizontal heavy lines represent the region of the chromosome which contains each of the psbD genes and psbC. Restriction enzyme cleavage sites are denoted by downward arrows and the following abbreviations identify each enzyme: A, ApaI; B, BamHI; H, HindIII; Hc, HincII; P, PstI; Pv, PvuII; R, EcoRV; S, SstI; Sp, SphI. The heavy lines representing psbDI and psbDII are aligned with respect to conserved restriction sites and ORFs. The position of each of the ORFs is represented by an open box labeled with the respective gene name; the orientation of each ORF is, left-to-right, N to C terminus. The direction, size, 5'-end position and estimated 3'-end position of each transcript is denoted by a hatched arrow below the respective restriction fragment.
-170 . RSU GCTTCCAGTA TCTCAGATCA ATATCCCTCC CCGATGGGAG GCAGCCTGCC -GAGA CGACAGTAAC GAAACGTTAA GCTGCGACCT CTGAGAGGGG CAGTTTCTGC GATTTCAATG AGTAGATCTG CTCAAAGGCT TGCCCCCAAA GGTCTCTGAG CAATTCGTGT CGGAm
-80
(-81)
Q&L
CGGTTGCCAT AATCATCCAT GTTTGTCGCA AACTGCGTTT CTCTGAACTG ATATTGCAAA TATCTocAc;A TTGCTAAGCA +l AAATCCTTGG AGCCGAGGGGTGGA CAACTTGCTG AAACGTTCTG ATTTCAGTAC GGCRAGAGOT TTTAGATCCA ATG ACG ATT (9) AAA M T I r31 GCA GTA GGG CGA CXG CCA GCG GAG CGG GGA TGG TTT GAC GTC CTC GAC GAC TGG CTG AAG CGC GAC CGA TTT GTA (84) AVGRAPAERGWFDVLDDWLKRDRF V WI
TTT GTG GGT TGG TCA GGG TTG CTG CTG TTT CCC TGT GCG TAT TTA GCA CTG GGC GGG TGG TTG A& FVGWSGLLLFPCAYLALGGWLTGT TTT GTG ACG TCG TGG TAC ACC CAC GGC ATC GCG TCT TCG TAC TTA GAA GGC ‘XC FVTSWYTHGIASSYLEGGNFLTVA
GGG ACC AGC (159) S
[531
AAC Tl!T TTG ACC GTA GCA GTG (234) V
[781
AGC ACC CCA GCG GAT GCG TTT GGG CAT TCG TTG ATG CTG CTG TGG GGC CCC GAG GCA CAA GGG AAC TTC GTG CGT (309) STPADAFGHSLMLLWGPEAQGNFV R [103] TGG TGC CAG TTG GGT GGC TTG TGG AAC TTC GTA GCA CTG CAC GGC GCC TIC GCG CTG ATT GGG TTC ATG CTG CGT (384) WCQLGGLWNFVALHGAFALIGFML R (1281 CAA TTT GAG ATT GCG CGG TTG GTG GGC GTC CGT CCG TAC AAC GCG ATC GCC TTT TCG GGT CC% ATC GCA GTG TTC (459) QFEIARLVGVRPYNAIAFSGPIAV F [153]
GTG TCG GTG TTC TTG ATG TAC CCG TTG GGT CAA TCG AGC TGG TTC TTC GCT CCG AGC TTT GGC GTG GCA GCG ATT (534) VSVFLMYPLGQSSWFFAPSFGVAA I (1781 TTC CGG !tTT TTG TTG TTC CTG CAA GGG TIC CAC AX FRFLLFLQGFHNWTLNPFHMMGVA
TGG ACC TTG AAC CCA TTC CAC ATG ATG GGC GTG GX
GGG (609) G [203]
ATT TTG GGT GGG GCA TTG CTG TGC GCC ATT CAC GGT GCG ACG GTG GAG AAC ACC CTG TTC GAG GAT TCA GAG CAA (684) ILGGALLCAIHGATVENTLFEDSE Q (2281 TCG AAC ACC TTC CGG GCA TTT GAG CCG ACG CAG GCC GAA GAG ACG TAC TCG ATG GTG ACG GCG AAC CGT TTT TGG (759) SNTFRAFEPTQAEETYSMVTANRF W [253] AGC CAG ATT TTC GGG ATT GCG TTT TCG AAC AAG CGG TGG CTG CAC TTT !M’C ATG CTG TTC GTG CCG GTG ACG GGC (834) SQIFGIAFSNKRWLHFFMLFVPVT G (2781 TTG TGG ATG AGC TCG ATC GGG ATT GTA GGT TTG GCG TTG AAC CTG CGG GCG TAC GAC TPC GTG TCG CAG GAG CTG (909) L (3031 LWMSSIGIVGLALNLRAYDFVSQE CGG GCC GCT GAG GAT CCG GAA TTT GAG ACG TTC TAC ACG AAG AAC ATC TI’G TTG AAC GAA GGG ATT CGG GCC TGG (984) W (3281 RAAEDPEFETFYTKNILLNEGIRA T C ATG GCA CCG CAA GAC CAA CCG CAC GAA AAA TTC GTC TTC CCC GAA GAG GTT CTG CCC CGT GGT AAC GCT CTC TAG (1059) * [352*] MAP QD QPHEKFVFPEEVLPRGNAL acg aaa aat tcg tct tee ccg sag agg ttc tgc ccc GTG GTA ACG CTC TCT AGT(1060) v/M V T L S tknssspk rfcp S (61 (end PsbDl-I homology) CCT TCC GTG ATC GCA GGC GGC CGG GAT ATT GAC TCC ACC GGT TAC GCT TGG TGG TCC GGC AAT GCC CGT TTG ATC (1135) I (311 PSVIAGGRDIDSTGYAWWSGNARL
Fig. 2. Nucleotide and deduced amino acid sequences of the psbDI, psbDII, and psbC genes. The upper line in the upstream flanking region represents the nucleotide sequence of the psbDII gene and the lower line is psbDI. The first nt of the psbD ORF is designated as +1 and subsequent nt numbers are shown in parentheses at the end of each line of nt sequence; 5' flanking sequences are numbered -1 to -170. Within the ORF, a single sequence shown represents both genes, except at nt 1041 and 1044, where the psbDII nt differences are shown above the line. Representation of psbDII ends after nt 1059; the downstream flanking region can be seen in Fig. 3. The psbC ORF overlaps the end of the psbDI ORF. The ACG that corresponds to the position of the previously reported translational start codon for psbC is shown in lower case letters, as is the translation of this region. The upper case indicates our suggested translational start site at nt 1043. Amino acids are specified by the single letter code shown below the nt sequence. Amino acid numbers are given in brackets
AAC CTG TCC GGT AAG CTG CTG GGC GCT CAC GTC GCT CAT GCT GGC TTG ATC GTC TTC TGG GCT GGT GCG ATG ACG (1210) NLSGKLLGAHVAHAGLIVFWAGAM T [561 CTG TTT GAA GTC GCG CAC TTT GTC CCC GA?+ AAA CCG ATG TAC GAG CAA GGC ATC ATC CTG CTC TCG CXC TTG GCG (1285) LFEVAHFVPEKPMYEQGIILLSHL A [811 ACC CTC GGC TGG GGC GTT GGC CCT GGT GGC GAA GTC OTC GAT ACC TTC CCC TAC TTT GTG G'IT GGG GTT CTG CAC (1360) TLGWGVGPGGEVVDTFPYFVVGVL H [106] CTC ATT TCT TCC GCC GTT CTG GGT TTG GGT GGG ATC TAC CAC GCC CTG CGC GGC CCT GAG TCG CTG GAA GAG TAC (1435) LISSAVLGLGGIYHALRGPESLEE Y [131] ACX ACC TTC TTC AGC CAA GAC TGG AAA GAC AAG AAT CAG ATG ACC AAC ATC ATT GGT TAT CAC CTG ATT CTG CTG (1510) STFFSQDWKDKNQMTNIIGYHLIL L [156] GGC TTA GGT GCC TTC TTG CTG GTC TTT AAG GCC ATG TTC TTC GGC GGT GTC TAT GAC ACC TGG GCG CCG GGT GGT (1585) GLGAFLLVFKAMFFGGVYDTWAPG G [181] GGC GAT GTC CGC ATC ATC TCC AAC CCA ACC CTC AAC CCG GCT GTG ATC 'iTC GGC TAC CTG CTG AAA TCA CCC TTT (1660) GDVRIISNPTLNPAVIFGYLLKSP F [206] GGT GGC GAC GGC TGG ATT GTC AGC GTC GAC AAC CTT GAA GAC GTG ATT GGC GGC CAT ATC TGG ATT GGT CTG ATC (1735) GGDGWIVSVDNLEDVIGGHIWIGL I [231] TGC ATT TCG GGT GGT ATC TGG CAC ATC CXG ACC AAG CCT TTT GGC TGG GTC GGT CGC GCC TTC ATC TGG AAT GGC (1810) CISGGIWHILTKPFGWVGRAFIWN G [256] GAA GCT TAC CTC TCC TAC AGC TTG GGT GCC CTG TCG TTG ATG GGC T'PC ATT GCC TCG ACG ATG GTT TGG TAC AAC (1885) EAYLSYSLGALSLMGFIASTMVWY N [281] HindIII AAC ACC GTC TAT CCT TCC GAG TTC TTT GGC CCG ACC GCT GCT GAA GCT TCG CAA TCG CAA GCC TTC ACC TTC TTG (1960) NTVYPSEFFGPTAAEASQSQAFTF L [306] GTG CGT GAC CAA CGC CTC GOP GCC AAC ATC GGT TCA GCT CAA GGC CCG ACC GGT CTG GGT AAA TAC CTG ATG CGC (2035) VRDQRLGANIGSAQGPTGLGKYLM R [331] TCT CCT ACC GGC GAG ATC ATfZ TTC GGT GGC GAA ACC ATG CGC TTC TGG GAC TTC CGT GGC CCJ! 
TGC GTG GAG CCC (2110) SPTGEIIFGGETMRFWDFRGPCVE P [356] CTG CGT GGA CCG AAT GGT CTG GAT CTC GAC AAG CTG ACC AAT GAC ATT CAG CCT TGG CAA GCC CGT CGT GCG GCT (2185) LRGPNGLDLDKLTNDIQPWQARRA A [381] GAG TAC ATG ACC CAC GCA CCG CTG GGT TCG CTG AAC TCT GTG GGT GGT GTG GCA ACG GAA ATC AX EYMTHAPLGSLNSVGGVATEINSV
TCG GTG AAC (2260) N [406]
TTC GTG TCT CCC CGT GCT TGG TTG GCG ACC AGC CCA TJ?C GTC TTG GCC TPC TTC TTC TTG GTC GGT CAC CTC TGG (2335) FVSPRAWLATSPFVLAFFFLVGHL w [431] k&L CAT GCA GGC CGC GCT CGT GCA GCT GCT GCA GGC TTT GAG AAA GGT ATC GAT CGC GCG ACC GAA CCC GTG CTC GCA (2410) HAGRARAAAAGFEKGIDRATEPVL A [456] ATG AGA GAC CTC GAC TAA TTCCA?iCTGC AGGACATTAG CCTCAAAGIC TGAAAAGCCC TTGCCTCGGC AGGCGGTTTT TCGTATCTCT(2498) [461*] MRDLD* GGGGGAATGA CAACGCCTGC GAGCAGCTGG GGTGCTCTTA CCGACAGTGG TTTGGGAGAA GTCACTGAGC GGCTCTAGTT TTCTGGAATC
(2588)
TTGCGGTGGA ATAGTCCCAG
(2678)
CTCGCAGCCC TCTCGTCGCA AAGTCTCAGT TAGACGCTGC CAGTTCGCAG CATCGCAAG?i GCATTTCTCC
at the end of each line; numbering of the psbD aa sequence begins at the M (ATG, nt 1-3) and of psbC at the V/M (GTG, nt 1043-1045). Possible ribosome-binding sites are shown in bold face type. Those nt that represent the 5' end (start points) of each of the messages are marked by arrowheads (one for the upper line and three for the lower line). Blocks of nt having sequence and spacing characteristics of prokaryotic '-10' elements are underlined. The '-35' element upstream from the psbDI/psbC message is marked with a zig-zag underline. No such element was identifiable upstream from psbDII. Only those restriction sites mentioned in the text are shown above the nt sequence; there are other sites for some of these enzymes that are not shown.
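Because Fig. 2 numbers the 5' flanking region from -1 to -170 and the ORF from +1, with no position 0, converting figure coordinates to string offsets is an easy place to make off-by-one errors. The following is a minimal Python sketch and not part of the original analysis: the names `figure_to_index`, `span_length` and `UPSTREAM_LENGTH` are illustrative assumptions, and the sketch presumes that a sequence is stored as a single string whose first character is nt -170.

```python
# Hypothetical helper for the numbering convention of Fig. 2:
# 5' flanking nt are -170 ... -1, ORF nt are +1, +2, ...; there is no nt 0.
UPSTREAM_LENGTH = 170  # assumption: the stored string starts at nt -170


def figure_to_index(nt, upstream_length=UPSTREAM_LENGTH):
    """Convert a figure coordinate to a 0-based string index."""
    if nt == 0 or nt < -upstream_length:
        raise ValueError(f"nt {nt} is outside the numbering scheme")
    if nt < 0:
        return nt + upstream_length       # -170 -> 0, -1 -> 169
    return nt - 1 + upstream_length       # +1 -> 170


def span_length(first_nt, last_nt):
    """Number of nt in an inclusive span given in figure coordinates."""
    return abs(figure_to_index(last_nt) - figure_to_index(first_nt)) + 1


if __name__ == "__main__":
    # The primer-extension oligonucleotide covers nt -2 to -18.
    print("primer length:", span_length(-18, -2))
    # The proposed psbC start codon occupies nt 1043-1045.
    print("codon length:", span_length(1043, 1045))
```

With this convention the span from nt -18 to nt -2 contains 17 positions, which agrees with the 17-mer described in MATERIALS AND METHODS, section d.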
nt 1039-1059 (last six D2 codons and stop codon):
D2       P   R   G   N   A   L   *
psbDI    CCC CGT GGT AAC GCT CTC TAG
psbDII   CCT CGC GGT AAC GCT CTC TAG

nt 1060 onward (downstream flanking region):
psbDI    TCCTTCCGTG ATCGCAGGCG GCCGGGATAT TGACTCCACC GGTTACGCTT GGTGGTCCGG CAATGCCCGT TTGATCAACC TGTCCGGTAA
psbDII   GCATTTTTCT CTAACAGGAA AGATTTTTGC TCAACGGCCT CCCCAAAGGA GTCTTGCCGT TAGATAGGGG GCGTAAACCT GTTGTAGTTG

Fig. 3. Divergence of the psbD genes downstream of the ORF. The nt sequences encoding the last six aa of the D2 C-terminus and downstream flanking region are compared for psbDI and psbDII. Numbers refer to the numbering system used for Fig. 2. Two nt that differ between the ORFs of the two genes are shown in bold type. There is no apparent conservation between the two sequences following the TAG stop codons.
matically summarizes the physical map of the genes as derived from restriction mapping, nt sequence, and transcript analyses.

(b) Nucleotide sequence analysis of the psbD genes

The psbDI and psbDII genes have ORFs encoding 352 aa (Fig. 2). Only two bp differences were detected within the coding regions of the two genes; both of these occur in third positions of codons, and do not affect the specified aa. The overall conservation between the aa sequence of this D2 polypeptide and that reported for spinach is 86%, although an additional aa is encoded within the N-terminus of the spinach psbD gene (Holschuh et al., 1984). Since the two genes predict an identical D2 polypeptide, and since the three psbA genes encode two distinct forms of the D1 polypeptide (Golden et al., 1986), there are two possible D1-D2 heterodimers in the PSII reaction centers.

The flanking regions downstream from the ORFs diverge immediately following the stop codons (Fig. 3). This is not surprising, since psbDI overlaps a second ORF, and no equivalent ORF is present adjacent to psbDII (see RESULTS AND DISCUSSION, section c). Except for the 2 nt immediately before the start codons, the upstream regions of psbDI and psbDII are divergent. The start codons of both genes are preceded by a Shine and Dalgarno (1974) consensus site which is complementary to the 3' end of one of the Synechococcus sp. 16S rRNA genes (S.S.G., unpublished). The spacing between the putative ribosome-binding sites and the ATG start codons is 10 bp for psbDI and 11 bp for psbDII. This is within the range of spacing observed in E. coli; however, the usual distance between these elements in E. coli is 7 ± 2 nt (Stormo, 1986; Gold, 1988).
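The statement that both substitutions are silent can be checked against the sequences in Fig. 3. The following is a minimal Python sketch rather than part of the original analysis: the function names (`translate`, `mismatches`, `orf_length_nt`) and the hard-coded 21-nt segments (nt 1039-1059 in the numbering of Fig. 2) are illustrative assumptions, and the standard genetic code is assumed to apply.

```python
from itertools import product

# Standard genetic code; codons are enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Illustrative constants: nt 1039-1059 in the numbering of Fig. 2,
# i.e. the last six D2 codons plus the TAG stop codon (see Fig. 3).
FIRST_NT = 1039
PSBD_I = "CCCCGTGGTAACGCTCTCTAG"
PSBD_II = "CCTCGCGGTAACGCTCTCTAG"


def translate(seq):
    """Translate seq in frame 0, ignoring any incomplete trailing codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def mismatches(a, b, first_nt):
    """Return the figure coordinates at which two aligned sequences differ."""
    return [first_nt + i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def orf_length_nt(n_codons):
    """Length in nt of an ORF with n_codons sense codons plus one stop codon."""
    return 3 * (n_codons + 1)


if __name__ == "__main__":
    print("differences at nt:", mismatches(PSBD_I, PSBD_II, FIRST_NT))
    print("psbDI :", translate(PSBD_I))
    print("psbDII:", translate(PSBD_II))
    print("352-aa ORF incl. stop:", orf_length_nt(352), "nt")
```

Run as a script, the sketch reports differences at nt 1041 and 1044 and the same C-terminal residues (PRGNAL followed by a stop) for both genes. The ORF length of 3 × (352 + 1) = 1059 nt agrees with the position of the last nt of the stop codon in Fig. 2.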
(c) Identification and nucleotide sequence of the psbC gene

Previous reports indicated that the psbC gene in plants and in another cyanobacterium overlaps the 3' end of the psbD gene, potentially by 50 bp (Holschuh et al., 1984; Bookjans et al., 1986; Williams and Chisholm, 1987). Examination of the nt sequences of the C-terminal regions of the two psbD genes did not reveal the putative ATG start codon for the psbC gene that is present in the other organisms. Southern analysis of DNA from each of the Charon bacteriophage clones indicated that homology to psbC lay downstream from the psbDI gene, but no psbC homology was present adjacent to psbDII (data not shown). The sequence of the DNA fragment downstream from psbDI confirmed an ORF that was highly homologous to the spinach psbC sequence, extending from nt 1007 to nt 2425 (Fig. 2); however, the putative ATG start codon was replaced by the triplet ACG in the Synechococcus sp. sequence. Since other PSII genes from this organism appear to be lethal in E. coli when cloned on high-copy-number plasmids (Golden et al., 1986), we wanted to determine whether a T → C transition had occurred during subcloning. Therefore we determined the sequence at this position by chemical cleavage directly from the recombinant bacteriophage DNA, and found that ACG was present in the original clone. A report by J. Gingrich (Workshop on Molecular Biology of Cyanobacteria, July 1987, St. Louis, MO) that the marine cyanobacterium Synechococcus sp. PCC7002 also lacked the ATG triplet supports our sequence.

Gingrich suggested that translation from the psbC transcript might begin at a conserved GTG codon nearer the end of psbD in chloroplasts and in cyanobacteria (Table I). GTG initiates translation in 8% of the known E. coli genes (reviewed by Stormo, 1986), and fmet-tRNA will form a ternary complex with a 30S ribosomal subunit and either a GTG or ATG start codon in vitro (reviewed by Gold et al., 1981). At least one cyanobacterial gene is thought to initiate translation at a GTG (Lomax et al., 1987). Overall aa homology among proteins specified by the psbC genes of Synechococcus sp. and spinach is 79-80%, depending on whether the comparison is made beginning at nt 1007 or 1043, respectively.

A number of features in the sequence (Fig. 2) support Gingrich's hypothesis. Preceding the GTG codon (nt 1043-1045), with a 9-bp spacer, is the sequence 'AAGAGG', a perfect complement to the 3' end of one of the 16S rRNA genes (S.S.G., unpublished results). The best match to a putative ribosome-binding site upstream from the ACG codon is 'AAGA' (with an 8-nt spacer). The sequences from spinach (Holschuh et al., 1984), pea (Bookjans et al., 1986), Synechocystis sp. (Chisholm and Williams, 1988), wheat (J. Gray, personal communication) and barley (E. Neumann, personal communication), all of which have an ATG at the site previously reported as the psbC start codon, have a higher degree of similarity to the Shine-Dalgarno sequence upstream from the GTG codon than upstream from the ATG (Table I). It should be noted, however, that the degree of homology to the 16S ribosomal RNA is not sufficient to predict the strength of ribosome binding (Gold et al., 1981).

TABLE I
Nucleotide sequence comparison ᵃ of the psbD/psbC overlap region from four species

GACTCAAGATCAGCCTC~AAAACCTTATATTCCCCTGAGGAGGTTCTACCAC~G~GGCTCAAGATCAGCCTC~AAAACCTTATATTCCCTGAGGAGGTTCTACCCC~GAATCCCCAAGATCAACCCC~AAAACTTTATCTTTATCTTCCCTGAGGA~TTCTCCCCC~GTAACCGCAAGACCAACCGCACGAAAAATTCGTCTTCCCCGAAGA~TTCTGCCCC~GTA-
Pea ᵇ; Spinach ᶜ; Synechocystis ᵈ; Synechococcus ᵉ

ᵃ Corresponds to the region represented as nt 990-1048 in Fig. 2. Bold type indicates possible ribosome-binding sites; potential start codons are underlined.
ᵇ Bookjans et al. (1986).
ᶜ Holschuh et al. (1984).
ᵈ Chisholm and Williams (1988).
ᵉ Fig. 2, nt 990-1048.

There are only two bp differences between the psbDI and psbDII sequences, neither of which changes an aa in the D2 polypeptide. However, these two nt differences would alter the aa predictions for an overlapping psbC gene in another reading frame.
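The spacer lengths and overlap sizes quoted for the two candidate start codons can be recomputed from the Synechococcus sp. sequence of Fig. 2, nt 990-1048. The sketch below is illustrative only and is not the procedure used in this work: the helper names, the constants, and the choice to take the last occurrence of each motif upstream of the codon are assumptions made for this example.

```python
# Synechococcus sp. sequence, nt 990-1048 of Fig. 2 (psbD/psbC overlap region).
REGION_START = 990
REGION = "ACCGCAAGACCAACCGCACGAAAAATTCGTCTTCCCCGAAGAGGTTCTGCCCCGTGGTA"

PSBD_LAST_SENSE_NT = 1056  # last nt before the psbDI TAG stop codon (nt 1057-1059)
PSBC_LAST_SENSE_NT = 2425  # last nt of the psbC ORF as given in the text


def nt_of(index):
    """Convert a 0-based index in REGION to a Fig. 2 coordinate."""
    return REGION_START + index


def spacer_length(seq, motif, codon_index):
    """nt between the last upstream occurrence of motif and the codon start."""
    motif_index = seq.rfind(motif, 0, codon_index)
    if motif_index < 0:
        return None
    return codon_index - (motif_index + len(motif))


def overlap_with_psbd(codon_nt):
    """nt shared with psbDI sense codons, counted from the first nt of the codon."""
    return PSBD_LAST_SENSE_NT - codon_nt + 1


def codons_in_orf(first_nt, last_nt):
    """Number of complete codons in an inclusive coordinate span."""
    return (last_nt - first_nt + 1) // 3


GTG_INDEX = REGION.find("GTG")  # proposed psbC start codon
ACG_INDEX = REGION.find("ACG")  # triplet at the position of the plant ATG

if __name__ == "__main__":
    for name, idx, motif in (("GTG", GTG_INDEX, "AAGAGG"), ("ACG", ACG_INDEX, "AAGA")):
        nt = nt_of(idx)
        print(name, "at nt", nt,
              "| spacer after", motif, "=", spacer_length(REGION, motif, idx),
              "| overlap with psbDI =", overlap_with_psbd(nt), "nt",
              "| codons to nt 2425 =", codons_in_orf(nt, PSBC_LAST_SENSE_NT))
```

On this input the GTG lies at nt 1043 with a 9-nt spacer after AAGAGG and a 14-nt overlap with psbDI, and the ACG lies at nt 1007 with an 8-nt spacer after AAGA and a 50-nt overlap. The two candidate start positions give ORFs of 461 and 473 codons, a difference of 12 codons.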
One difference changes the putative GTG psbC start codon in psbDI to a GCG triplet in psbDII. One might expect the psbC start codon to be missing from psbDII since there is no psbC ORF on its message. This nt change may block ribosomes from abortive translation initiation at the end of psbDII. The ACG triplet, however, is present in both psbDI and psbDII. It is unlikely that the Synechococcus sp. translational start site differs from that used by Synechocystis sp. and chloroplasts. The similarity of the genes between species, including the positioning of Shine-Dalgarno sequences before each of the potential start codons, suggests that the initiation signals are conserved. Therefore these data suggest that GTG is the start codon for psbC in all of these species. Protein sequence analysis of the mature CP43 polypeptide from spinach (Michel et al., 1988) does not clarify the translational start site, as there is evidence of protein processing during maturation. The mature N-terminal residue is N-acetyl-O-phosphothreonine, which represents the third aa if translation begins at the conserved GTG, and the fifteenth residue if translation begins at the first ATG in the spinach sequence. We have not yet determined whether the CP43 polypeptide or three other PSII phosphoproteins are modified in the same way in cyanobacteria.

(d) Analysis of transcripts from the psbD genes

Northern analysis of total RNA probed with a restriction fragment containing most of the psbDI ORF detected a 2.5-kb RNA (Fig. 4, lanes 1 and 2). A probe from a region of the psbC gene that did not overlap psbDI recognized an RNA of the same size (Fig. 4, lanes 3 and 4). This result suggests that the 2.5-kb species is the transcript from an operon that includes psbDI and psbC. To determine whether psbDI and psbC are cotranscribed, the 5' end of the psbDI message was mapped using S1 and mung bean nuclease-protection and primer-extension reactions (Fig. 5). These analyses identified a single start point with terminal nt corresponding to nt -53 to -51 in Fig. 2. This 5' end was very abundant in the RNA population, indicating that it corresponded to the message detected by Northern analysis. A 2.5-kb message that begins approximately 50 nt upstream from psbDI would be expected to terminate shortly after the translational stop codon of psbC. No mRNA bands other than the 2.5-kb species were detected using a probe that should be very highly homologous to a message arising from psbDII.

Fig. 4. Northern transfers of psbDI, psbC, and psbDII. Total RNA from wild-type cells (lanes 1, 3, and 5) and the psbDII-truncated mutant R2S2.2 (lanes 2, 4, and 6) was treated with glyoxal and separated by 1.2% agarose gel electrophoresis, according to Thomas (1983). The RNA was transferred from the gel to a positively-charged nylon membrane by capillary transfer using electrophoresis buffer. The blot was sliced into strips and hybridized with the following probes: lanes 1 and 2, a 1.2-kb BamHI fragment, 32P-labeled by nick translation, ending at nt 921 (Fig. 2) and carrying most of psbDI and none of psbC; lanes 3 and 4, an antisense RNA transcribed from a template consisting of a 435-bp HindIII-PstI fragment internal to psbC (Fig. 2, nt 1930 to 2365); lanes 5 and 6, an antisense RNA transcribed from a template consisting of a 154-bp DdeI fragment immediately upstream from the psbDII ORF (Fig. 2, nt -4 to -158). Unhybridized probe was washed from the blots in 0.1 × SSPE, 0.2% SDS at 70°C (Maniatis et al., 1982). Arrows mark the hybridizing transcripts, whose sizes are shown (2.5 kb, 1.2 kb and 1.0 kb). Black dots indicate the positions of the rRNAs; diffuse bands at the leading edges of the rRNAs are artifacts caused by these abundant RNA species.

To determine whether a psbDII transcript was present, we constructed a sensitive probe that would detect a rare psbDII message, but would not hybridize to the psbDI message. This probe was transcribed as a 32P-labeled antisense RNA from a DdeI fragment (Fig. 2, nt -4 to -158) that would recognize the 5'-untranslated leader sequence, but none of the protein-coding region, of a psbDII message. A Northern blot probed with this labeled RNA identified an approx. 1.2-kb message from psbDII (Fig. 4, lane 5). This species was not detected by any nick-translated probes we used, including the DdeI fragment, indicating that the psbDII message is much less abundant than the psbDI message in cells grown under our standard laboratory conditions. S1 nuclease mapping of the psbDII transcript using a probe that originated at a HincII site inside the psbDII ORF (Fig. 2, nt 146) and extended to nt -173 identified a cluster of protected bands approx. 100 nt upstream from the ORF (data not shown). This S1-resistant signal had less than one-tenth the intensity of a protected band extending a few nt upstream from the start codon. The more intense signal probably originates from protection of the psbDII-derived DNA fragment by the abundant psbDI message, which is complementary to psbDII within the ORF and 2 nt upstream from the start codon (Fig. 2). The 5' end of the psbDII transcript was determined more precisely by primer extension using a synthetic oligodeoxynucleotide that was complementary to the 5'-untranslated region of the psbDII message (nt -2 to -18). To assign the position of the extended product, the same primer was used for dideoxynucleotide sequencing from a DNA template of the psbDII upstream region and the products of this reaction were run as a standard (data not shown). Reverse transcription with this primer yielded a single band corresponding to the nt at -108 indicated in Fig. 2. The sequences upstream from the 5' end of each message were examined for putative promoter sequences centered around 10 and 35 nt from the
transcriptional start points. The psbDI gene has appropriately spaced regions of homology to the '-10' and '-35' elements of E. coli promoters. The psbDII transcript is preceded by a perfect E. coli consensus '-10' element, but no homology to a '-35' element is observed. It should be noted that in E. coli, genes under positive regulation often lack homology to the '-35' element (reviewed by Raibaud and Schwartz, 1984). There is insufficient data to define a consensus promoter in Synechococcus sp. or in any other cyanobacterium; however, all five Synechococcus sp. PSII genes for which transcripts have been examined are preceded by at least one of the characteristic E. coli promoter elements at the appropriate distance from the transcriptional start point (Golden et al., 1986). Thus, some characteristics of transcriptional initiation signals seem to be conserved between Synechococcus sp. and E. coli.

Fig. 5. Determination of the 5' end (start point) of the psbDI message. Total RNA from wild-type cells was used for nuclease-protection and primer-extension analyses. Each reaction contained 10 µg of total RNA and approx. 21000 cpm of 5'-end-labeled DNA. For nuclease protection experiments, the DNA was a 440-bp double-stranded restriction fragment labeled at only one 5' end, corresponding to the HincII site within the psbDI ORF (nt 146), and extending to the upstream BamHI site (see Fig. 1). The reverse-transcription primer consisted of the same end-labeled fragment, shortened by digestion with HaeII to generate a 125-bp fragment that terminated at nt +22 of the ORF (numbering as in Fig. 2). Primer extension was performed as described by Kassavetis and Geiduschek (1982). S1 nuclease protection followed the procedure of Turner et al. (1983). The hybridization step of the mung bean nuclease protection experiments was done as described for S1 protection. Mung bean nuclease (4 units, Pharmacia) was added to the sample in a 300 µl aliquot of MBN buffer (30 mM sodium acetate, 50 mM NaCl, 1 mM ZnCl2). Incubation at 37°C continued for 1 h, after which the sample was precipitated with ethanol and dried. The products of the primer extension and nuclease protection reactions were separated on 6% polyacrylamide, 7 M urea sequencing gels (Maniatis et al., 1982) along with sequencing ladders generated by chemical cleavage (Maxam and Gilbert, 1980) of the same fragment which was used in the nuclease protection experiments. The following samples are shown: lane 1, A + G sequencing ladder; lane 2, C + T sequencing ladder; lane 3, primer-extension reaction; lane 4, mung bean nuclease-protection reaction; lane 5, S1 nuclease-protection reaction. The nucleotide sequence of the ladder is given to the left, with the nt corresponding to the bands in lanes 3-5 marked by asterisks. Note that the sequence shown is the complement of the sequence presented in Fig. 2, and that assignment of the 5' ends took into account the 1.5-nt migration difference between the chemically cleaved sequencing ladders and the nuclease-treated and reverse-transcribed products. The heavy band in lane 3 marked by an arrow is the unextended primer.

(e) Truncation of the psbDII gene

A mutant having a truncated psbDII gene was constructed by recombination of a cloned, altered allele with the chromosome of Synechococcus sp. To
94
construct this mutant, a plasmid clone carrying most of the psbDII gene, and unable to replicate in
mechanism
Synechococcus sp., was cleaved at a unique SstI site within the psbDII ORF and ligated to the B fragment from pHP4552 (Prentki and Krisch, 1984). The
psbC. We have not detected a monocistronic message for psbC; however, our analyses may not have been sufficiently sensitive to detect one which
52fragment encodes resistance to spectinomycin and streptomycin and is flanked by inverted repeats that
might be present at a much lower abundance dicistronic message. As the Northern
carry signals.
transcription
and
Transformation
translation
termination
of Synechococcus with this plasmid
sp. to
spectinomycin
resistance
resulted
in replacement
of the psbDII gene with the mutant
change the ratio of D2 : CP43, which might favor a to uncouple
transcription
shown in Fig. 4 demonstrate, was detectable
than the analyses
the psbDII message
only when an antisense
specific for the transcript
of psbD and
RNA probe
from psbDII was used.
allele and loss of the non-replicating vector sequences, as has been previously reported for gene
(f) Conclusions
inactivation in this organism (Golden et al., 1986; Golden et al., 1987). Fig. 4 lane 6 shows that the psbDII message from this mutant is truncated by approximately 200 nt, whereas the psbDI/C message is unperturbed (lanes 2 and 4). The location of the spectinomycin insertion is 213 bp from the end of the ORF; the size of the mutant message suggests that termination is occurring near the junction of psbDII and the spectinomycin-resistance cassette. Viability of the strain carrying a truncated psbDII gene suggests that either psbDII is not essential, or that the mutation did not adversely affect the protein. Given that the mutation removes 71 aa from the C terminus, including all of the hydrophilic lumenal tail and a portion of a putative membrane-spanning domain (Trebst, 1986), it is unlikely that the defective protein would integrate into the membrane and function properly. An indication that psbDII is not essential under normal growth conditions is that a second mutant, carrying a kanamycin-resistance
(1) The genome of the cyanobacterium Synechococcus sp. PCC7942 (A. nidulans R2) contains two psbD genes that encode the PSI1 polypeptide D2. This organism also has been shown to have three functional psbA genes encoding the D 1 polypeptide. Dl is another PSI1 protein that probably forms a heterodimer with D2 to house the cofactors that carry out the primary photochemical reactions. Unlike the psbA genes, which encode two distinct primary aa sequences for Dl, the two psbD genes encode identical D2 proteins. This indicates that only two forms of the PSI1 reaction center are possible in this organism: those having heterodimers consisting of a single D2 polypeptide and one of two forms of the Dl protein. (2) One of the psbD genes, psbDI, is transcribed as a dicistronic message with the unique psbC gene, whose ORF appears to overlap that of psbDI. This is the same configuration that has been reported in higher plant chloroplast genomes (Holschuh et al.,
cassette inserted into psbDI1 after only 93 aa of the ORF, is also viable (data not shown). The presence of a duplication of the psbD locus (but not of psbC) in cyanobacteria presents an evolutionary puzzle. Chloroplast genomes have a psbDC operon analogous to the psbDI configuration, but no examples are known for a second copy of psbD in plants. In the case of the multiple psbA genes in Synechococcus sp. two distinct forms of the D 1 protein are produced, which may suggest a regulatory role if the two products have different functions. The products of the two psbD genes are identical, so any regulatory advantage in maintaining two independently transcribed genes is likely to be of a quantitative, rather than a qualitative nature. It may be advantageous under some conditions to
1984; Bookjans et al., 1986) and in the distantly related cyanobacterium Synechocystis sp. PCC6803 (Chisholm and Williams, 1988). The psbDII gene appears to be transcribed as a monocistronic message. The steady-state level of this message is less than that of the psbDI/Ctranscript by at least lo-fold under our standard growth conditions. (3) Both psbD genes have upstream sequence elements that are similar to transcription and translation initiation signals in E. coli. (4) The psbC gene of cyanobacteria and chloroplasts appears to begin with a GTG codon 14 nt before the psbD stop codon, rather than at a triplet overlapping psbD by 50 nt as previously reported for chloroplast sequences.
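The codon arithmetic behind two of the statements above can be checked with a short sketch. It is only an illustration: the constant and function names are hypothetical, and it assumes that the 213-bp insertion distance and the 14-nt and 50-nt overlaps are all measured to the same end point of the psbD ORF, which the text does not state explicitly.

```python
# Figures quoted in the text; the way they are combined here is an assumption.
D2_LENGTH_AA = 352            # length of the D2 polypeptide (aa)
INSERTION_BP_FROM_END = 213   # spectinomycin cassette position, bp from ORF end
GTG_OVERLAP_NT = 14           # overlap of the proposed GTG start with psbD
ATG_OVERLAP_NT = 50           # overlap of the previously reported start triplet


def residues_removed(bp_from_orf_end: int) -> int:
    """Number of whole codons between the insertion site and the ORF end."""
    if bp_from_orf_end % 3 != 0:
        raise ValueError("insertion does not fall on a codon boundary")
    return bp_from_orf_end // 3


def same_reading_frame(overlap_a_nt: int, overlap_b_nt: int) -> bool:
    """True if two candidate start sites are a whole number of codons apart."""
    return (overlap_a_nt - overlap_b_nt) % 3 == 0


removed = residues_removed(INSERTION_BP_FROM_END)
remaining = D2_LENGTH_AA - removed
in_frame = same_reading_frame(ATG_OVERLAP_NT, GTG_OVERLAP_NT)
codons_apart = (ATG_OVERLAP_NT - GTG_OVERLAP_NT) // 3

print(f"aa removed by truncation: {removed}")
print(f"aa remaining in truncated D2: {remaining}")
print(f"GTG and upstream triplet in same frame: {in_frame} ({codons_apart} codons apart)")
```

Under these assumptions, 213 bp corresponds to the 71 aa quoted for the truncation, and the two candidate psbC start sites lie in the same reading frame, so choosing the GTG codon shortens the predicted CP43 N terminus without changing the rest of the protein.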
ACKNOWLEDGEMENTS

We gratefully acknowledge the expert technical assistance of M. Nalty, and the contributions of Drs. D.-P. Ma and R. VonderHaar, past and present directors of the Department of Biology, DNA Sequencing Support Facility. We are indebted to Dr. J. Williams, who made sequences and probes available to us prior to publication. This work was supported by a grant from the NIH to S.S.G. (R01 GMS 37040). Some of the equipment used in this research was provided by an NSF Biological Instrumentation Program Grant to S.S.G. and other investigators (BBS-8703784).

REFERENCES

Allen, M.M.: Simple conditions for growth of unicellular blue-green algae on plates. J. Phycol. 4 (1968) 1-4.
Bookjans, G., Stummann, B.M., Rasmussen, O.F. and Henningsen, K.W.: Structure of a 3.2-kb region of pea chloroplast DNA containing the gene for the 44-kd photosystem II polypeptide. Plant Mol. Biol. 6 (1986) 359-366.
Chen, E.Y. and Seeburg, P.H.: Supercoil sequencing: a fast and simple method for sequencing plasmid DNA. DNA 4 (1985) 165-170.
Chisholm, D. and Williams, J.G.K.: Nucleotide sequence of psbC, the gene encoding the CP43 chlorophyll a-binding protein of photosystem II, in the cyanobacterium Synechocystis 6803. Plant Mol. Biol. 10 (1988) 293-301.
Curtis, S.E. and Haselkorn, R.: Isolation, sequence and expression of two members of the 32-kd thylakoid membrane protein gene family from the cyanobacterium Anabaena 7120. Plant Mol. Biol. 3 (1984) 249-258.
Deisenhofer, J., Epp, O., Miki, K., Huber, R. and Michel, H.: Structure of the protein subunits in the photosynthetic reaction centre of Rhodopseudomonas viridis at 3 Å resolution. Nature 318 (1985) 618-624.
Gold, L.: Post-transcriptional regulatory mechanisms in E. coli. Annu. Rev. Biochem. 57 (1988) in press.
Gold, L., Pribnow, D., Schneider, T., Shinedling, S., Singer, B.S. and Stormo, G.: Translation initiation in prokaryotes. Annu. Rev. Microbiol. 35 (1981) 365-403.
Golden, S.S., Brusslan, J. and Haselkorn, R.: Expression of a family of psbA genes encoding a photosystem II polypeptide in the cyanobacterium Anacystis nidulans R2. EMBO J. 5 (1986) 2789-2798.
Golden, S.S., Brusslan, J. and Haselkorn, R.: Genetic engineering of the cyanobacterial chromosome. Methods Enzymol. 153 (1987) 215-231.
Hearst, J.E. and Sauer, K.: Protein sequence homologies between portions of the L and M subunits of reaction centers of Rhodopseudomonas capsulata and the QB-protein of chloroplast thylakoid membranes: a proposed relation to quinone-binding sites. Z. Naturforsch. 39c (1984) 421-424.
Ho, K.K. and Krogmann, D.W.: Photosynthesis. In Carr, N.G. and Whitton, B.A. (Eds.), The Biology of Cyanobacteria. Blackwell, Oxford, 1982, pp. 191-214.
Holschuh, K., Bottomley, W. and Whitfeld, P.R.: Structure of the spinach chloroplast genes for the D2 and 44-kd reaction-centre proteins of photosystem II and for tRNASer (UGA). Nucleic Acids Res. 12 (1984) 8819-8834.
Kassavetis, G.A. and Geiduschek, E.P.: Bacteriophage T4 late promoters: mapping 5’ ends of T4 gene 23 mRNAs. EMBO J. 1 (1982) 107-114.
Lomax, T.L., Conley, P.B., Schilling, J. and Grossman, A.R.: Isolation and characterization of light-regulated phycobilisome linker polypeptide genes and their transcription as a polycistronic mRNA. J. Bacteriol. 169 (1987) 2675-2684.
Maniatis, T., Fritsch, E.F. and Sambrook, J.: Molecular Cloning. A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1982.
Maxam, A.M. and Gilbert, W.: Sequencing end-labeled DNA with base-specific chemical cleavages. Methods Enzymol. 65 (1980) 499-559.
Messing, J.: New M13 vectors for cloning. Methods Enzymol. 101 (1983) 20-78.
Michel, H., Hunt, D.S., Shabanowitz, J. and Bennett, J.: Tandem mass spectrometry reveals that three photosystem II proteins of spinach chloroplasts contain N-acetyl-O-phosphothreonine at their N-termini. J. Biol. Chem. 263 (1988) 1123-1130.
Nanba, O. and Satoh, K.: Isolation of a photosystem II reaction center consisting of D-1 and D-2 polypeptides and cytochrome b-559. Proc. Natl. Acad. Sci. USA 84 (1987) 109-112.
Norrander, J., Kempe, T. and Messing, J.: Construction of improved M13 vectors using oligodeoxynucleotide-directed mutagenesis. Gene 26 (1983) 101-106.
Prentki, P. and Krisch, H.M.: In vitro insertional mutagenesis with a selectable DNA fragment. Gene 29 (1984) 303-313.
Raibaud, O. and Schwartz, M.: Positive control of transcription initiation in bacteria. Ann. Rev. Genet. 18 (1984) 173-206.
Reed, K.C. and Mann, D.A.: Rapid transfer of DNA from agarose gels to nylon membranes. Nucleic Acids Res. 13 (1985) 7207-7221.
Rimm, D.L., Horness, D., Kucera, J. and Blattner, F.R.: Construction of coliphage lambda Charon vectors with BamHI sites. Gene 12 (1980) 301-309.
Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5468.
Satoh, K.: Protein-pigments and photosystem II reaction center. Photochem. Photobiol. 42 (1985) 845-853.
Shine, J. and Dalgarno, L.: The 3’-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. USA 71 (1974) 1342-1346.
Sollner-Webb, B. and Reeder, R.H.: The nucleotide sequence of the initiation and termination sites for ribosomal RNA transcription in X. laevis. Cell 18 (1979) 485-499.
Stormo, G.D.: Translation initiation. In Reznikoff, W. and Gold, L. (Eds.), Maximizing Gene Expression. Butterworths, Boston, 1986, pp. 195-224.
Thomas, P.S.: Hybridization of denatured RNA transferred or dotted to nitrocellulose paper. Methods Enzymol. 100 (1983) 255-266.
Trebst, A.: The topology of the plastoquinone and herbicide binding peptides of photosystem II in the thylakoid membrane. Z. Naturforsch. 41c (1986) 240-245.
Turner, N.E., Robinson, S.J. and Haselkorn, R.: Different promoters for the Anabaena glutamine synthetase gene during growth using molecular or fixed nitrogen. Nature 306 (1983) 331-342.
Williams, J.G.K. and Chisholm, D.A.: Nucleotide sequences of both psbD genes from the cyanobacterium Synechocystis 6803. In Biggins, J. (Ed.), Progress in Photosynthesis Research. Martinus Nijhoff, Dordrecht, 1987, pp. 809-812.

Communicated by T.D. McKnight.