The 2 Å resolution crystal structure of HetL, a pentapeptide repeat protein involved in regulation of heterocyst differentiation in the cyanobacterium Nostoc sp. strain PCC 7120



Journal of Structural Biology 165 (2009) 47–52


Journal of Structural Biology journal homepage: www.elsevier.com/locate/yjsbi

Structure Report

Shuisong Ni a, George M. Sheldrick b, Matthew M. Benning c, Michael A. Kennedy a,*

a Department of Chemistry and Biochemistry, Miami University, 701 East High Street, Oxford, OH 45056, USA
b Department of Structural Chemistry, University of Goettingen, D37077 Goettingen, Germany
c Bruker AXS Inc., 5465 East Cheryl Parkway, Madison, WI 53711, USA

Article info

Article history:
Received 12 August 2008
Received in revised form 18 September 2008
Accepted 29 September 2008
Available online 11 October 2008

Keywords:
Protein crystal structure
Pentapeptide repeat protein
Repeated-five-residue fold
Right-handed quadrilateral beta helix
Cyanobacterium
Anabaena sp. PCC 7120
Nostoc sp. PCC 7120
Sulfur anomalous scattering

Abstract

The hetL gene from the cyanobacterium Nostoc sp. PCC 7120 encodes a 237 amino acid protein (25.6 kDa) containing 40 predicted tandem pentapeptide repeats. Nostoc sp. PCC 7120 is a filamentous cyanobacterium that forms heterocysts, specialized cells capable of fixing atmospheric N2 during nitrogen starvation in its aqueous environment. Under these conditions, heterocysts occur in a regular pattern of approximately one out of every 10–15 vegetative cells. Heterocyst differentiation is highly regulated and involves hundreds of genes, one of which encodes PatS, thought to be an intercellular peptide signal made by developing heterocysts to inhibit heterocyst differentiation in neighboring vegetative cells, thus contributing to pattern formation and spacing of heterocysts along the filament. While overexpression of PatS suppresses heterocyst differentiation in Nostoc sp. PCC 7120, overexpression of HetL produces a multiple contiguous heterocyst phenotype with loss of the wild-type heterocyst pattern, and strains containing extra copies of hetL allow heterocyst formation even in cells overexpressing PatS. Thus, HetL appears to interfere with the inhibition of heterocyst differentiation by PatS; however, the mechanism of HetL function remains unknown. As a first step towards exploring the mechanism for its biochemical function, the crystal structure of HetL has been solved at 2.0 Å resolution using sulfur anomalous scattering.

© 2008 Published by Elsevier Inc.

1. Introduction

Pentapeptide repeat proteins (PRPs)¹, defined as proteins containing at least eight tandem repeating sequences of five amino acids with an approximate consensus [STAV][D/N][L/F][S/T/R][X] (Bateman et al., 1998; Vetting et al., 2006), are disproportionately abundant in ancient photosynthetic cyanobacteria, with some species containing nearly 40 PRPs in their genomes (Chandler et al., 2003). PRPs adopt a highly regular four-sided right-handed beta helical structure that can be described as a collection of type II and IV beta turns (Buchko et al., 2006a, 2008; Vetting et al., 2006, 2007). While more than 3500 PRPs (Pfam00805) have been identified according to the Pfam database (Bateman et al., 2000; Buchko et al., 2008), the biochemical function of PRPs remains largely unknown.
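The approximate consensus and the "at least eight tandem repeats" criterion quoted above can be illustrated with a short sequence scan. The Python sketch below is only an illustration under stated assumptions: it is not the procedure used by Pfam or by the authors, the scoring threshold `min_score` and the helper names are hypothetical, and the test sequence is a synthetic toy, not HetL.

```python
# Approximate consensus from the text: [STAV][D/N][L/F][S/T/R][X],
# where X (the fifth position) is unconstrained.
CONSENSUS = ("STAV", "DN", "LF", "STR", None)


def repeat_score(pentapeptide):
    """Count how many of the four constrained positions match the consensus."""
    return sum(
        1
        for residue, allowed in zip(pentapeptide, CONSENSUS)
        if allowed is not None and residue in allowed
    )


def longest_tandem_run(sequence, min_score=3):
    """Return (start_index, n_repeats) for the longest run of consecutive
    pentapeptides scoring at least min_score, over all five reading frames.
    min_score is an illustrative (hypothetical) tolerance for the
    'approximate' nature of the consensus."""
    best = (0, 0)
    for frame in range(5):
        run_start, run_len = frame, 0
        for pos in range(frame, len(sequence) - 4, 5):
            if repeat_score(sequence[pos:pos + 5]) >= min_score:
                if run_len == 0:
                    run_start = pos
                run_len += 1
                if run_len > best[1]:
                    best = (run_start, run_len)
            else:
                run_len = 0
    return best


# Synthetic toy sequence (hypothetical): eight perfect repeats with short flanks.
toy_sequence = "MK" + "ADLSG" * 8 + "PQ"
start, n_repeats = longest_tandem_run(toy_sequence)
print(start, n_repeats, "PRP-like" if n_repeats >= 8 else "not PRP-like")
```

Real PRPs match the consensus only loosely, so a fixed threshold of this kind is a crude filter; profile-based searches are better suited to genome-wide identification.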

* Corresponding author. Fax: +1 513 593 5715. E-mail address: [email protected] (M.A. Kennedy).
¹ Abbreviations used: PRP, pentapeptide repeat protein; PR, pentapeptide repeat; Rfr, repeated five residues; SAS, sulfur anomalous scattering; SAD, single wavelength anomalous diffraction; S-SAD, sulfur-single wavelength anomalous diffraction.
1047-8477/$ - see front matter © 2008 Published by Elsevier Inc. doi:10.1016/j.jsb.2008.09.010

HglK (all0813) from the filamentous cyanobacterium Nostoc sp. PCC 7120 was the first PRP discovered and characterized (Black et al., 1995). HglK was described as an integral membrane protein with its C-terminal domain predicted to contain 36 tandem pentapeptide repeats (Black et al., 1995). While the precise biochemical function of HglK remains elusive, insight into its cellular function was derived from genetic analysis, which showed that its C-terminal pentapeptide repeat domain was essential for localizing or transporting glycolipid components required for the maturation of heterocysts. Heterocysts are specialized cells in cyanobacteria that terminally differentiate to carry out fixation of atmospheric N2 when there are insufficient sources of combined nitrogen-containing compounds in their growth environment (Black et al., 1995). Another PRP, RfrA from Synechocystis sp. PCC 6803, so named because it contains a "repeated five-residues" domain, or pentapeptide repeat (PR) domain, in the amino terminus of the protein, was found to play a role in manganese uptake (Chandler et al., 2003); however, no structure for RfrA exists and its biochemical function remains unknown. Perhaps the most intriguing physiological function of a PRP was elucidated from the crystal structure of MfpA from Mycobacterium tuberculosis (Hegde et al., 2005), which not only established the precise nature


of the Rfr-fold found in PRPs as a right-handed quadrilateral beta helix, correcting the original prediction that this family of proteins would adopt a right-handed beta helical structure with a triangular-shaped cross-section (Bateman et al., 1998; Kajava, 2001), but also provided direct evidence that explained its important biochemical function in vivo, namely that MfpA's DNA gyrase inhibition resulted from its acting as a DNA mimic that binds to DNA gyrase, and thus conferred resistance against the fluoroquinolone class of antibiotics that are normally quite effective bactericidal agents. The authors concluded that MfpA was an effective DNA mimic because its right-handed quadrilateral beta helical structure exhibited a size, shape and electrostatic surface charge similar to that of right-handed DNA (Hegde et al., 2005). The X-ray crystal structures of two pentapeptide proteins from the cyanobacterium Cyanothece 51142 have also been reported (Buchko et al., 2006a,b, 2008), but the structures of these PRPs failed to yield insight into their biochemical function. Knowledge of these structures, however, extended the number of known PRP structures to four, making it possible to refine our understanding of the precise nature of the Rfr-fold, as well as the types of sequence and structural variations that occur in PRPs.

HetL (all3740), one of more than 30 PRPs from Nostoc sp. PCC 7120, has been shown to be involved in regulation of heterocyst differentiation (Liu and Golden, 2002). Heterocysts provide a microaerobic environment necessary for fixing atmospheric N2 since oxygen irreversibly inhibits the nitrogenase that carries out the critical step in nitrogen fixation (Golden and Yoon, 2003). The process of heterocyst differentiation is a hallmark of many filamentous photosynthetic cyanobacteria that adapt to grow when atmospheric N2 is the only available nitrogen source (Zhang et al., 2006). This developmental process is controlled not only temporally (it is initiated within a few hours after nitrogen limitation), but also spatially, with less than 10% of vegetative cells undergoing differentiation in a pattern that is spaced regularly along the filament during heterocyst development (Zhang et al., 2006). It has been estimated that this complex process involves anywhere from 600 to 1000 genes (Lynn et al., 1986), and genetic studies have started to reveal complex regulatory pathways of those interacting gene products (Zhang et al., 2006). One of these gene products is HetL, a single domain protein with 237 amino acids composed of 40 predicted pentapeptide repeats. The biochemical function of HetL remains unknown; however, it has been shown that overexpression of HetL interferes with the action of PatS, a small intercellular signaling peptide produced by developing heterocysts that inhibits heterocyst differentiation among neighboring noncommitted vegetative cells. It has been shown that PatS is required to maintain the regular spacing of heterocysts along the filament during nitrogen starvation (Liu and Golden, 2002). As a first step towards exploring the biochemical function of HetL, and the role it plays in heterocyst development, we have initiated structural studies of HetL. Here we report the 2.0 Å resolution crystal structure of HetL and discuss the characteristics of its Rfr-fold in the context of the four other members of the PRP family with known structures.

2. Materials and methods

2.1. Protein cloning, expression and purification

The HetL (all3740) gene was amplified from the genomic DNA of Nostoc sp. PCC 7120 by using a standard PCR protocol. The PCR primer sequences are 5′-TCGATCGCATATGAATGTGGGTGAAATTCTGAG and 5′-TGACTCTCGAGCTAATCATGAATTGAACCATCAGGC, where CATATG and CTCGAG are the recognition sites for the restriction endonucleases NdeI and XhoI, respectively. The resulting PCR product was digested with NdeI and XhoI and cloned

into the expression vector pET28b (Novagen). This construct allows the expression of HetL with an N-terminal 6-histidine tag cleavable by thrombin. The plasmid containing hetL was transformed into the expression host Escherichia coli BL21(DE3) (Novagen) and the cells were grown at 37 °C with vigorous shaking to an OD600 of 0.8 in 1 L of the auto-induction medium ZYP-5052 supplemented with 30 µg/mL kanamycin. Protein expression was spontaneously induced at 28 °C overnight. Cells were then harvested and stored at −80 °C. Thawed cells were resuspended in 25 mL of a lysis buffer (0.25 M NaCl, 20 mM Tris, pH 7.8). Phenylmethylsulfonyl fluoride was added to the cell suspension to a final concentration of about 0.2 µM immediately prior to cell lysis by three passes through a French Press (SLM Instruments Inc.). The cell lysate was spun at 24,000 × g for 60 min. The supernatant was loaded onto a 10 mL Ni-NTA affinity column (Qiagen) and washed step-wise with 50 mL of buffer (0.25 M NaCl, 20 mM Tris, pH 7.8) with 30 mM imidazole. HetL protein was eluted from the Ni-NTA column with 300 mM imidazole in the starting buffer. Purified HetL was concentrated with an Amicon Centriprep-10 to about 10 mg/mL and further purified on a Pharmacia Superdex200 HiLoad size exclusion column (SEC) equilibrated with a buffer containing 0.25 M NaCl, 20 mM sodium HEPES, pH 7.0. HetL fractions were combined and concentrated to 5.0, 7.5, 10, and 15 mg/mL for crystallization screening.

2.2. Crystallization

Crystallization screening of HetL was carried out at 22 °C using the vapor diffusion hanging drop method. Drops were set up by mixing 1 µL of purified HetL with 1 µL of each reservoir buffer from the Hampton Research screen kits. Crystals appeared in a buffer containing 0.1 M sodium HEPES (pH 7.5), 0.8 M NaH2PO4, and 0.8 M KH2PO4, after 2–3 days. Crystallization was optimized around this buffer by varying protein concentrations and by adding glycerol at various concentrations. Good-quality diffracting crystals were obtained at a protein concentration of 7.5 mg/mL in the buffer containing 0.1 M sodium HEPES (pH 7.5), 0.7 M NaH2PO4, 0.7 M KH2PO4, and 12.5% glycerol.

2.3. Data collection, structure solution, phasing, and refinement

HetL crystals were screened and characterized using Cu Kα radiation from a Bruker Microstar rotating anode generator equipped with Montel optics and a SMART 6000 CCD. A sulfur SAD data set was collected on the same system using a flash-cooled crystal at 100 K with an exposure time of 5 s and an oscillation angle of 0.2° per image. Full data collection statistics are listed in Table 1. The data were integrated using the Proteum software package (Bruker AXS, 2006) and scaled using SADABS (Bruker AXS, 2007a). XPREP (Bruker AXS, 2007b) was used to calculate data statistics and output anomalous differences. Anomalous scatterers were found using SHELXD (Schneider and Sheldrick, 2002). All six sulfur atoms, including that of the N-terminal methionine, were located using data to 2.7 Å resolution in a run of 300 trials. The sub-structure was input into SHELXE (Sheldrick, 2002) for phase determination and improvement to 1.8 Å without further refinement of the atomic positions or occupancies. A new auto-tracing feature under development in SHELXE was used to determine an initial model for the protein backbone. The program uses an iterative method combining density modification with partial structure information to improve phases and produces a model containing the positions of the main-chain atoms. Using this feature, SHELXE was able to determine the positions of all 237 residues in approximately 20 min. The densities for most of the side-chain atoms were visible in the corresponding electron

density map and were built in prior to refinement. Remaining corrections to the structure were done manually using Xtalview/Xfit (McRee, 1999) and refined using CNS (Brunger et al., 1998) and REFMAC (5.2.0019) (Murshudov et al., 1997). The resolution cutoff for final structure refinement was fixed at 2.0 Å, where the completeness in the highest resolution shell exceeded 90%. The positions of the water molecules were determined using the water_pick.inp input file and the CNS software (Brunger et al., 1998) using npeaks = 400 as the search parameter for the number of water molecules and peak = 3.0 as the signal-to-noise cutoff. Structure quality was assessed using Procheck (Laskowski et al., 1993) and Molprobity (Davis et al., 2004). The Molprobity analysis output was included in the final structure file submitted to the Protein Data Bank.

2.4. Nostoc sp. PCC 7120 PRP sequence analysis and cellular location prediction

The KEGG database (Kanehisa and Goto, 2000) was used to query the Nostoc sp. PCC 7120 genome to identify PRPs. The KEGG paralog search utility was used to identify HetL paralogs. PR domain boundaries identified by KEGG were confirmed and refined by manual inspection. The prediction of the cellular location for each PRP was based on SOSUI (Gomi et al., 2004) and SignalP (Emanuelsson et al., 2007).

Table 1
Summary of data collection and structure refinement statistics for HetL. Values in parentheses are for the highest resolution shell.

Data collection
  Data set                                              SAS
  Space group                                           I222
  Unit-cell parameters (Å, °)                           a = 68.20, b = 93.76, c = 101.31, α = β = γ = 90
  Matthews coefficient                                  3.2
  Percent solvent (%)                                   61.2
  X-ray source                                          Cu anode
  Temperature (K)                                       100
  Resolution limits (Å)                                 68.8–2.0 (2.05–2.00)
  Detector distance (mm)                                80
  Exposure time (s)                                     5
  Oscillation angle (°)                                 0.2
  No. images                                            21,844
  Mosaicity (°)                                         0.65
  Wavelength (Å)                                        1.5415
  No. unique reflections                                20,737 (981)
  Redundancy                                            50.3 (16.0)
  Rsym (%) a                                            4.0 (16.8)
  Rpim (%) b                                            0.7 (4.6)
  Completeness (%)                                      97.8 (90.6)
  I/σ(I)                                                75.6 (14.6)

Structure refinement
  Roverall (%) c                                        17.7
  Rfree (%) d                                           20.3
  Correlation coefficient (Fo–Fc)                       0.96
  Correlation coefficient (Fo–Fc) free                  0.94
  Protein atoms                                         1800
  Solvent atoms                                         237
  Protein residues                                      237
  Average B value, main chain protein atoms (Å²)        13.8
  Average B value, side chain, and solvent atoms (Å²)   21.2
  Average B value, total (Å²)                           17.8
  R.m.s. deviations from ideal bond lengths (Å)         0.014
  R.m.s. deviations from ideal bond angles (°)          1.33

a Rsym is defined as $\sum_h |I_h - \langle I_h \rangle| / \sum_h \langle I_h \rangle$ and was calculated using XPREP.
b Rpim is defined as $\sum_h [1/(N-1)]^{1/2} \sum_i |I_i(h) - \langle I(h) \rangle| / \sum_h \sum_i I_i(h)$, where N refers to the redundancy of the measurement of reflection $I_h$, $I_i(h)$ refers to the ith measurement of reflection $I_h$, and $\langle I(h) \rangle$ refers to the average value of the measurements of the $I_h$ reflection.
c Roverall is defined as $\sum ||F_o| - |F_c|| / \sum |F_o|$, summed over all reflections.
d Rfree is calculated in the same way as Roverall except that only the reflections marked for cross-validation are used in the calculation.

3. Results and discussion

3.1. Crystallization, phasing, and structure determination of HetL

Native HetL formed diamond-shaped crystals (dimensions on the order of 70 × 180 × 240 µm) that diffracted to about 1.8 Å resolution. Indexing showed that the crystals were orthorhombic (space group I222, unit-cell dimensions a = 68.66 Å, b = 94.43 Å, c = 102.07 Å and α = β = γ = 90°) and contained one molecule in the asymmetric unit with a corresponding Matthews coefficient of 3.2 Å³/Da and a solvent content of 61.2%. Diffraction data statistics are summarized in Table 1. Se-Met derivatives of HetL, however, exhibited poor solubility and could not be crystallized. Therefore, diffraction measurements were made using native protein crystals and phased from highly redundant sulfur anomalous scattering data using the program SHELXD (Schneider and Sheldrick, 2002), as described in Section 2. SHELXD found six distinct sulfur sites with correlation coefficients CC(all) = 30.31 and CC(weak) = 17.27. The initial structure was generated by a new automatic tracing feature under development in SHELXE (Sheldrick, 2002) and further refined manually, as described in Section 2. The CNS program was used to identify the positions of 237 water molecules using the search parameters discussed in Section 2. The structure quality was evaluated using Procheck (Laskowski et al., 1993) and Molprobity (Davis et al., 2004). Procheck analysis indicated 68.4% of residues in the most favored region and 31.6% of residues in the additional allowed region, typical for PRPs (Buchko et al., 2006a,b), with no residues in the generously allowed or disallowed regions, and G-factors of 0.28, 0.44, and 0.0 for dihedral angles, main chain bond lengths and angles, and overall average, respectively. Molprobity analysis of all atom contacts yielded a clash score of 4.44 (98th percentile) and analysis of protein geometry yielded a Molprobity score of 1.25 (99th percentile).
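As a consistency check on the Matthews coefficient and solvent content quoted above, the calculation can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' procedure: it assumes eight asymmetric units per I222 unit cell, uses the 25.6 kDa mass from the abstract (ignoring any contribution from the affinity tag), and applies the common approximation that the solvent fraction is 1 − 1.23/V_M. The function names are illustrative.

```python
def matthews_coefficient(a, b, c, n_asu, mw_da):
    """V_M in Å^3/Da for a cell with all angles equal to 90 degrees.

    a, b, c : unit-cell edges in Å
    n_asu   : number of asymmetric units per unit cell
    mw_da   : molecular mass per asymmetric unit in Da
    """
    cell_volume = a * b * c
    return cell_volume / (n_asu * mw_da)


def solvent_fraction(v_m, protein_term=1.23):
    """Approximate solvent fraction; 1.23 is the usual protein constant (assumed)."""
    return 1.0 - protein_term / v_m


# Native cell dimensions from the text; one molecule per asymmetric unit.
# Assumption: I222 has 8 asymmetric units per cell; mass taken as 25.6 kDa.
v_m = matthews_coefficient(68.66, 94.43, 102.07, n_asu=8, mw_da=25600.0)
solvent = solvent_fraction(v_m)
print(f"V_M = {v_m:.2f} Å^3/Da, solvent = {100 * solvent:.1f}%")
```

Under these assumptions the result is close to the reported 3.2 Å³/Da and roughly 61% solvent; small differences are expected from the exact molecular mass and protein density used.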
Correlation coefficients for Fo–Fc and Fo–Fc free, calculated after refinement using REFMAC (v5.2.0019) (Murshudov et al., 1997), were 0.96 and 0.94, respectively. Structure refinement statistics are summarized in Table 1.

3.2. Structure of HetL

HetL adopts a right-handed quadrilateral beta helix (Fig. 1A), typical of the Rfr-fold common to all PRPs. HetL is composed of ten complete coils (R15-G227) with a ten-residue α-helix (V3-A12) capping the N-terminus, a two-stranded anti-parallel β-sheet (A228-D237) that sits on the C-terminus of Face 1, a six-residue loop insertion (P129-R134) protruding from the corner joining Face 3 and 4 near the middle of the protein sequence, and a nine-residue loop insertion (R174-A182) protruding from the corner joining Face 3 and 4 in the C-terminal half of the protein sequence (Fig. 2). The Rfr-fold of HetL is composed completely of type II turns (Vetting et al., 2006), as found both in Rfr23 from Cyanothece 51142 (Buchko et al., 2008) and the fusion product of two PRPs from Nostoc punctiforme, Np275/Np276 (Vetting et al., 2007). The average φ/ψ backbone angles for the Rfr-fold of HetL for the i − 1, i, i + 1, i + 2, and i − 2 (next PR) positions are 102.3 ± 8.5/109.7 ± 6.0, 117.3 ± 8.3/28.6 ± 6.3, 58.2 ± 5.6/129.6 ± 5.9, 63.2 ± 6.6/14.6 ± 8.8, and 73.3 ± 6.0/152.7 ± 6.1, respectively. Based on previous structures, it is known that the side chains of amino acids in the i − 1, i + 1, and i + 2 positions always point away from the interior of the Rfr fold. Analysis of the HetL structure and sequence indicates several surface-exposed hydrophobic residues that might be important to its biochemical function including I66, I81, I103, L121, F123, L141, I204, and I219, with several of these located in


the nine-residue loop insertion in the C-terminal half of the protein and in the C-terminal capping β-sheet. The majority of residues in the i position in HetL are leucine (37 out of 40 positions or 92.5%), while three i positions are occupied by phenylalanine (F17, F52, and F172, see Fig. 2). Even the PRs containing phenylalanine in HetL adopt a type II turn, suggesting a refinement of the rules proposed by Vetting et al. (2006), specifically that the i position on at least two consecutive coils on the same face must contain a large hydrophobic side chain before the transition from a type II to a type IV turn will occur, since F17, F52, and F172 are all accommodated by type II turns in HetL.

3.3. Electrostatic potential surface of HetL

An intriguing structural feature of MfpA related to its function is its distinctive distribution of electrostatic surface potential, which contains large contiguous patches of negative electrostatic potential resembling that of DNA. Since the biochemical function of HetL remains unknown, the electrostatic potential on the surface of HetL was analyzed to determine if it might also resemble that of DNA, suggesting a possible role as a DNA mimic. Fig. 1B shows the electrostatic surface potential for the four distinct faces of HetL. Interestingly, Faces 1, 2, and 3 all exhibit large patches of negatively charged surface area, similar to that observed in MfpA; however, Face 4 is rather evenly distributed with both positively and negatively charged residues. The N-terminal α-helix in HetL presents a highly contiguous positively charged surface that caps the N-terminal base of the beta helix, a feature that stands out against the mostly negative charge across the overall surface of HetL. While it is impossible to predict any binding partners for HetL without additional biochemical studies, the presence of large contiguous patches of positively and negatively charged surfaces suggests potential binding sites that can be examined in future studies.

Fig. 1. (A, top) Ribbon diagram of HetL shown looking directly at Face 4 with the coil numbers at the right, consistent with the labeling in Fig. 2. The N-terminal α-helix is shown in red at the bottom and the C-terminal β-sheet at the top. The two loop insertions are colored blue and labeled according to Fig. 2. (A, bottom) HetL rotated 90°, looking along the long axis of the Rfr fold. (B) The electrostatic potential looking at the four distinct faces of HetL. The electrostatic surface potential is colored using a range from +5 to −5 kT. The figure was generated using the APBS plugin of PyMOL. (C) Ribbon diagram of Rfr32 (PDB ID 2F3L). (D) Ribbon diagram of MfpA (PDB ID 2BM5). (E) Ribbon diagram of Np275/Np276 (PDB ID 2J8I). (F) Ribbon diagram of Rfr23 (PDB ID 2O6W). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. Structure-based sequence alignment of the 40 tandem pentapeptide repeats in the Rfr fold (R15-G227) in HetL. The residue position in the pentapeptide repeat, relative to the central residue i, is labeled on the bottom. The side chains of the i and i − 2 residues are in the interior of the Rfr fold while the side chains of the i − 1, i + 1, and i + 2 residues are on the exterior of the Rfr fold. Residues in bold blue indicate hydrophobic side chains that point away from the exterior surface of the Rfr fold. Residues in bold red indicate phenylalanines that occupy the i position in the pentapeptide repeat. Two sequence insertions that form loops that protrude on the exterior of the Rfr fold are indicated by numbers (1 and 2) with their sequences indicated below. The sequences corresponding to α1 and β1 are also shown at the bottom. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

3.4. Comparison of the structure of HetL with other PRPs

HetL is the fifth PRP structure solved by X-ray crystallography. A Dali search (Holm and Sander, 1996) against HetL indicated, as expected, high Z-scores for the other four PRPs with known structures, with the resurrected fusion protein from Nostoc punctiforme Np275/Np276 having the highest Z-score of 26.1, followed by MfpA (Z = 21.2), Rfr23 (Z = 16.1), and Rfr32 (Z = 14.4). Fig. 1C–F show Rfr32, MfpA, Np275/Np276, and Rfr23, respectively, for comparison with HetL (Fig. 1A). Comparison of these five PRPs reveals some interesting trends. Three of the five structures are composed entirely of type II turns: HetL (Fig. 1A), Np275/Np276 (Fig. 1E) and Rfr23 (Fig. 1F). All three of these PRPs are also capped by an N-terminal α-helix oriented in the same direction (Fig. 1A, bottom and Fig. 1E and F, bottom). Among the three PRP structures composed purely of type II turns, two have loop insertions. Rfr23 contains a 24-residue loop insertion and HetL contains two loop insertions (a six-residue insertion, P129-R134, and a nine-residue insertion, R174-A182), both of which begin at an i + 2 position on the same face, meaning that both loops protrude from the same corner at the turn between Face 3 and 4. HetL is also terminated by a two-stranded anti-parallel C-terminal β-sheet sitting on the edge of Face 1 that appears similar to the termination of Np275/Np276. In contrast, the two PRPs that contain mixtures of type II and IV turns, MfpA (Fig. 1D) and Rfr32 (Fig. 1C), are both capped by C-terminal two α-helix bundles and contain no N-terminal capping structural features. MfpA stands out among the five PRP structures in that it contains an interesting sequence variation, a proline at the i + 2 position between the 4th and 5th coils, that leads to a kink in the direction of the beta helix axis, which appears critical to its biochemical function, resulting in MfpA adopting a shape that more closely resembles DNA.

3.5. Analysis of HetL in the context of other PRP sequences in Nostoc sp. PCC 7120

Fig. 3 summarizes information about the PRPs from Nostoc sp. PCC 7120. There are 32 PRPs in Nostoc sp. PCC 7120 ranging in size from 125 to 1010 amino acids. Among these, 28 originate on the chromosomal DNA with four originating on one of six extra-chromosomal plasmids (α–ζ) (Kaneko et al., 2001). The PR domains often span the entire length of the smaller PRPs, but can occur anywhere in the larger PRPs, including at the N-terminus, C-terminus, or as disrupted domains in the middle of the protein. Almost two-thirds of the PRPs (19/32) are predicted to be cytoplasmic, with eight predicted to be membrane associated and five predicted to exist either in the periplasmic or thylakoid lumenal spaces. The nature of the PR consensus sequence makes it difficult to assess homologs across multiple cyanobacterial species, with the 32 PRPs from Nostoc sp. PCC 7120 sharing an average sequence identity of 37.6%, notwithstanding the variability in PRP domain length and predicted location within the cell. Not only does the high background sequence similarity among PRPs complicate identification of conserved regions that might be of biological significance, but it might also complicate molecular biology studies designed to gain biochemical understanding. For example, a hetL knock-out was found to have no effect on phenotype; however, overexpression of HetL had a significant phenotype, specifically, stimulation of heterocyst differentiation (Liu and Golden, 2002). A possible explanation for the knock-out behavior is that, due to the similarity in their structures, other PRPs might complement the function of one whose function is disrupted, which could be the case in the HetL knock-out strain. Clearly, cross-complementation among PRPs could complicate interpretation of PRP knock-out data.

Fig. 3. A bar graph representation of the 32 PRPs in Nostoc sp. PCC 7120. Each bar represents one PRP with the PR domain colored red, the N-terminal region colored blue, and the C-terminal domain, or peptide segments separating PR domains, colored yellow. The gene name for each PRP is indicated above each bar. A Greek letter preceding the gene name indicates the plasmid of origin for each gene not residing on the chromosomal DNA. The length of each protein is indicated in parentheses. The predicted cellular location for each PRP is indicated below each bar (C, cytoplasmic; P/L, periplasm, thylakoid lumen; M, membrane). The gene corresponding to HetL is indicated by an arrow. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


3.6. Discussion

At first glance, it might appear that PRPs would have limited structural variability since they all adopt a common Rfr fold. However, as is evident from this comparison, PRPs can achieve significant structural variability, which might lead to considerable diversity in biochemical functions, through variation in the composition and placement of type II and IV turns, the number, position, and size of loop insertions, the nature of capping motifs that occur at either the N- or C-termini, the distribution of surface-exposed hydrophobic side chains, the nature of electrostatic potential surface patches, and kinks in the beta helix axis introduced by sequence departures from the PR pattern. While the HetL structure adds to our knowledge about PRP structures, we still have only a small sampling of the anticipated structural diversity of PRPs. HetL is predicted to be a cytoplasmic protein; however, cyanobacterial PRPs are predicted to exist in all cellular compartments including the membranes, the thylakoid lumen and periplasmic spaces, as well as the cytoplasm, and structural coverage of PRPs from each compartment type is still sparse or absent. Furthermore, all five known PRP structures come from proteins where the PRs span the entire protein sequence. Many predicted PRPs, however, contain PR domains alongside other functional domains (Buchko et al., 2006a,b), and knowledge of these mixed domain structures could provide clues about PRP domain functions, but such structures do not yet exist. With thousands of PRPs already identified as a result of genome sequencing efforts, and with so little known about their biochemical and cellular functions, each new PRP structure significantly expands our understanding of the nature and variability of PRPs, and provides a foundation for exploring the function of this intriguing family of proteins that presumably played an important role in the evolution and life of ancient cyanobacteria.
Atomic coordinates

The atomic coordinates and structure factors have been deposited at the Research Collaboratory for Structural Bioinformatics under PDB ID 3DU1.

Acknowledgments

This work was supported by start-up funds provided to M.A.K. at Miami University. We gratefully acknowledge Professor Susan Barnum for providing the genomic DNA of Nostoc sp. PCC 7120.

References

Bateman, A., Birney, E., Durbin, R., Eddy, S.R., Howe, K.L., Sonnhammer, E.L., 2000. The Pfam protein families database. Nucleic Acids Res. 28, 263–266.
Bateman, A., Murzin, A.G., Teichmann, S.A., 1998. Structure and distribution of pentapeptide repeats in bacteria. Protein Sci. 7, 1477–1480.
Black, K., Buikema, W.J., Haselkorn, R., 1995. The hglK gene is required for localization of heterocyst-specific glycolipids in the cyanobacterium Anabaena sp. strain PCC 7120. J. Bacteriol. 177, 6440–6448.
Bruker AXS, 2007a. SADABS.
Bruker AXS, 2007b. XPREP.
Bruker AXS, 2006. Proteum2.
Brunger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Grosse-Kunstleve, R.W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N.S., Read, R.J., Rice, L.M., Simonson, T., Warren, G.L., 1998. Crystallography and NMR system (CNS): a new software system for macromolecular structure determination. Acta Cryst. D 54, 905–921.
Buchko, G.W., Robinson, H., Pakrasi, H.B., Kennedy, M.A., 2008. Insights into the structural variation between pentapeptide repeat proteins – crystal structure of Rfr23 from Cyanothece 51142. J. Struct. Biol. 162, 184–192.
Buchko, G.W., Ni, S., Robinson, H., Welsh, E.A., Pakrasi, H.B., Kennedy, M.A., 2006a. Characterization of two potentially universal turn motifs that shape the repeated five-residues fold – crystal structure of a lumenal pentapeptide repeat protein from Cyanothece 51142. Protein Sci. 15, 2579–2595.
Buchko, G.W., Robinson, H., Ni, S., Pakrasi, H.B., Kennedy, M.A., 2006b. Cloning, expression, crystallization and preliminary crystallographic analysis of a pentapeptide-repeat protein (Rfr23) from the bacterium Cyanothece 51142. Acta Crystallogr. Sect. F Struct. Biol. Cryst. Commun. 62, 1251–1254.
Chandler, L.E., Bartsevich, V.V., Pakrasi, H.B., 2003. Regulation of manganese uptake in Synechocystis 6803 by RfrA, a member of a novel family of proteins containing a repeated five-residues domain. Biochemistry 42, 5508–5514.
Davis, I.W., Murray, L.W., Richardson, J.S., Richardson, D.C., 2004. MOLPROBITY: structure validation and all-atom contact analysis for nucleic acids and their complexes. Nucleic Acids Res. 32, 615–619.
Emanuelsson, O., Brunak, S., von Heijne, G., Nielsen, H., 2007. Locating proteins in the cell using TargetP, SignalP and related tools. Nat. Protoc. 2, 953–971.
Golden, J.W., Yoon, H.S., 2003. Heterocyst development in Anabaena. Curr. Opin. Microbiol. 6, 557–563.
Gomi, M., Sonoyama, M., Mitaku, S., 2004. High performance system for signal peptide prediction: SOSUIsignal. Chem.-Biol. Inform. J. 4, 142–147.
Hegde, S.S., Vetting, M.W., Roderick, S.L., Mitchenall, L.A., Maxwell, A., Takiff, H.E., Blanchard, J.S., 2005. A fluoroquinolone resistance protein from Mycobacterium tuberculosis that mimics DNA. Science 308, 1480–1483.
Holm, L., Sander, C., 1996. Mapping the protein universe. Science 273, 595–603.
Kajava, A.V., 2001. Review: Proteins with Repeated Sequence – Structural Prediction and Modeling. J. Struct. Biol. 134, 132–144.
Kanehisa, M., Goto, S., 2000. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30.
Kaneko, T., Nakamura, Y., Wolk, C.P., Kuritz, T., Sasamoto, S., Watanabe, A., Iriguchi, M., Ishikawa, A., Kawashima, K., Kimura, T., Kishida, Y., Kohara, M., Matsumoto, M., Matsuno, A., Muraki, A., Nakazaki, N., Shimpo, S., Sugimoto, M., Takazawa, M., Yamada, M., Yasuda, M., Tabata, S., 2001. Complete genomic sequence of the filamentous nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120. DNA Res. 8, 205–213, 227–253.
Laskowski, R.A., MacArthur, M.W., Moss, D.S., Thornton, J.M., 1993. PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26, 283–291.
Liu, D., Golden, J.W., 2002. hetL overexpression stimulates heterocyst formation in Anabaena sp. strain PCC 7120. J. Bacteriol. 184, 6873–6881.
Lynn, M.E., Bantle, J.A., Ownby, J.D., 1986. Estimation of gene expression in heterocysts of Anabaena variabilis by using DNA–RNA hybridization. J. Bacteriol. 167, 940–946.
McRee, D.E., 1999. XtalView/Xfit – A versatile program for manipulating atomic coordinates and electron density. J. Struct. Biol. 125, 156–165.
Murshudov, G.N., Vagin, A.A., Dodson, E.J., 1997. Refinement of Macromolecular Structures by the Maximum-Likelihood Method. Acta Cryst. D 53, 240–255.
Schneider, T.R., Sheldrick, G.M., 2002. Substructure solution with SHELXD. Acta Crystallogr. D Biol. Crystallogr. 58, 1772–1779.
Sheldrick, G., 2002. Macromolecular Phasing with SHELXE. Z. Kristallogr. 217, 644–650.
Vetting, M.W., Hegde, S.S., Hazleton, K.Z., Blanchard, J.S., 2007. Structural characterization of the fusion of two pentapeptide repeat proteins, Np275 and Np276, from Nostoc punctiforme: resurrection of an ancestral protein. Protein Sci. 16, 755–760.
Vetting, M.W., Hegde, S.S., Fajardo, J.E., Fiser, A., Roderick, S.L., Takiff, H.E., Blanchard, J.S., 2006. Pentapeptide repeat proteins. Biochemistry 45, 1–10.
Zhang, C.C., Laurent, S., Sakr, S., Peng, L., Bédu, S., 2006. Heterocyst differentiation and pattern formation in cyanobacteria: a chorus of signals. Mol. Microbiol. 59, 367–375.