
Gene 284 (2002) 63–71 www.elsevier.com/locate/gene

Allelic variation in the highly polymorphic locus pspC of Streptococcus pneumoniae

Francesco Iannelli, Marco R. Oggioni, Gianni Pozzi*

Laboratory of Molecular Microbiology and Biotechnology, Sezione di Microbiologia, Dipartimento di Biologia Molecolare, Università di Siena, 53100 Siena, Italy

Received 17 October 2001; received in revised form 30 November 2001; accepted 12 December 2001
Received by R. Di Lauro

Abstract

PspC, also called SpsA, CbpA, PbcA, and Hic, is a surface protein of Streptococcus pneumoniae studied for its antigenic properties, its capability to bind secretory IgA, C3 and complement factor H, and its activity as an adhesin. In this work we characterized the pspC locus of 43 pneumococcal strains by DNA sequencing of PCR fragments. Using PCR primers designed on two unrelated open reading frames flanking the pspC locus, it was possible to amplify the pspC locus of each of the 43 strains of S. pneumoniae. In 37 out of 43 strains there was a single copy of the pspC gene, while two tandem copies of pspC were found in the other six strains. The sequence of the pspC locus was different in each of the 43 strains. Insertion sequences were found in the pspC locus of 11 out of 43 strains. Analysis of the deduced amino acid sequence of the PspC variants showed a common organization of the molecules: (i) a 37 amino acid leader peptide which is conserved in all proteins, (ii) an N-terminal portion which is essentially alpha-helical and is the result of assembly of eight major sequence blocks, (iii) a proline-rich region, and (iv) a C-terminal anchor responsible for the cell surface attachment. By sequence comparison we identified 11 major groups of PspC proteins. Proteins within one group displayed only minor variations of the amino acid sequence. An unexpected finding was that PspC variants could differ in the anchor sequence. While 32 of the PspC proteins displayed the typical choline-binding domain of pneumococcal surface proteins, 17 other PspCs showed the LPXTG motif, which is typical of surface proteins of other gram-positive bacteria. This major difference in the anchor region was also observed in the adjacent proline-rich regions, which differed considerably in size and composition.

© 2002 Published by Elsevier Science B.V.

Keywords: Pathogenicity; Virulence; Surface protein; Vaccine

Abbreviations: aa, amino acid(s); bp, base pair(s); IgA, immunoglobulin A; IS, insertion sequence(s); ORF, open reading frame(s); PCR, polymerase chain reaction; TIGR, The Institute for Genomic Research
* Corresponding author. LAMMB, Biologia Molecolare/Università, Policlinico Le Scotte, Viale Bracci, 53100 Siena, Italy. Tel.: +39-0577-233430; fax: +39-0577-233334. E-mail address: [email protected] (G. Pozzi).
0378-1119/02/$ - see front matter © 2002 Published by Elsevier Science B.V. PII: S0378-1119(01)00896-4

1. Introduction

Surface proteins of Streptococcus pneumoniae (pneumococcus) have been investigated for their role in pneumococcal pathogenicity and as candidate antigens for protein-based vaccines. The most extensively studied molecule is pneumococcal surface protein A (PspA), which was shown to be essential for virulence and to elicit protection against pneumococcal infection (McDaniel et al., 1987; Briles et al., 1998, 2000; Ogunniyi et al., 2000). While typical surface proteins of gram-positive bacteria are covalently linked to the cell wall via the LPXTG motif at their C-terminal end (Navarre and Schneewind, 1999), PspA was found to be attached by a novel mechanism by which its C-terminal repeat region binds to the choline residues of the lipoteichoic acid of S. pneumoniae (Yother and White, 1994). By searching the pneumococcal genome data (Tettelin et al., 2001; Dopazo et al., 2001; Hoskins et al., 2001; Oggioni and Pozzi, 2001) it is possible to identify a variety of C-terminally anchored surface proteins, which contain the LPXTG motif or the choline-binding domain. Different biological functions have been associated with pneumococcal surface protein C (PspC), and different allelic forms of PspC have been referred to by different names, including choline-binding protein A (CbpA) (Rosenow et al., 1997), S. pneumoniae secretory IgA binding protein (SpsA) (Hammerschmidt et al., 1997), C3-binding protein A (PbcA) (Cheng et al., 2000), and factor H-binding inhibitor of complement (Hic) (Janulczyk et al., 2000). PspC plays an important role in pneumococcal pathogenesis by


functioning as an adhesin (Rosenow et al., 1997), by promoting invasion of epithelial cells (Zhang et al., 2000), and by binding soluble host factors such as the IgA secretory fragment, C3 and complement factor H (Hammerschmidt et al., 1997; Cheng et al., 2000; Janulczyk et al., 2000; Dave et al., 2001). Used as an immunogen in the mouse model, PspC proved to be a good candidate for anti-pneumococcal vaccines (Brooks-Walter et al., 1999; Briles et al., 2000). Given this multiplicity of phenotypes, it is important to investigate the variability of the pspC locus among different isolates, to gain a better understanding of PspC as a virulence factor and to facilitate design of PspC-based vaccines. In this work we characterized the pspC locus of 43 clinical isolates of S. pneumoniae.

2. Materials and methods

2.1. Bacterial strains, media and growth conditions

The 43 pneumococcal strains characterized in this study include standard strains D39 (Avery et al., 1944; Iannelli et al., 1999) and 8R1 (Bernheimer, 1979), four strains from the American Type Culture Collection, and 37 recent clinical isolates randomly chosen from an Italian collection (including the type 19F strain G54, whose draft genome sequence is annotated) (Pozzi et al., 1996; Dopazo et al., 2001). Bacteria were grown at 37 °C in tryptic soy broth or tryptic soy agar (Difco) supplemented with 3% horse blood.

2.2. PCR primers for the pspC locus

According to the annotated type 4 genome (Tettelin et al., 2001), available for search and download at The Institute for Genomic Research (TIGR) website (http://www.tigr.org), the pspC gene is flanked by sp2191 and sp2189. We used the sequences corresponding to sp2191 and sp2189 to design primers for PCR amplification of the pspC locus.
The primer IF43 (5′-AAT GAG AAA CGA ATC CTT AGC AAT G-3′) is complementary to nucleotides 3347 through 3370 and the primer IF30 (5′-AAG ATG AAG ATC GCC TAC GAA CAC-3′) corresponds to nucleotides 6368 through 6392 on section 190 of the type 4 genome (GenBank Accession number: AE007507).

2.3. PCR amplification

The Expand High Fidelity PCR Kit (Roche) was used following essentially the protocol suggested by the manufacturer. Briefly, the 25 µl reaction mixture was in '1× Expand High Fidelity buffer 1' and contained: (i) 200 µM dNTPs, (ii) 10 pmol of both primers, and (iii) 0.2 units of 'Expand High Fidelity DNA polymerase mixture'. Template pneumococcal DNA was obtained by adding 1 µl of bacterial culture (about 10^5 cells) directly to the PCR reaction mixture, as previously described (Iannelli et al., 1998). PCR experiments were performed with an Omnigene thermal cycler (Hybaid) using the following thermal profile: one cycle at 94 °C for 4 min, then 30 cycles at 50 °C for 30 s, 68 °C for 10 min, 92 °C for 30 s, and one cycle at 50 °C for 2 min and 68 °C for 20 min.

2.4. DNA sequencing

The direct automated sequencing of the PCR fragments containing the pspC locus was performed by using a primer walking strategy. A total of 33 sequencing primers, including the primers IF43 and IF30, were utilized. The sequences of both DNA strands were determined, each using as a template the product of a different PCR reaction. Cycle sequencing reactions were performed with the Thermo Sequenase Fluorescent Labeled Primer Cycle Sequencing Kit (Amersham) with a modification of the protocol already described (Iannelli et al., 1998). As a starting sequencing template we used 50 ng of crude PCR product and 2 pmol of each IRD800 infrared labeled sequence primer (MWG-Biotech). The cycle sequencing conditions were as follows: one cycle at 92 °C for 2 min, then 30 cycles at 60 °C for 30 s, 72 °C for 30 s, and 92 °C for 30 s. The sequence products were analyzed on a LICOR 4000L automated infrared DNA sequencer apparatus (LICOR) (Iannelli et al., 1998).

2.5. DNA sequence analysis

The software BLAST was used to conduct homology searches of the GenBank database and the microbial genome databases available at the National Center for Biotechnology Information website (http://www.ncbi.nlm.nih.gov/), at the TIGR website (http://www.tigr.org/), at the S. pneumoniae Genome Diversity Project website (http://genome.microbio.uab.edu/strep/), and at the WIT website (http://wit.mcs.anl.gov./WIT2/). Sequence analysis was carried out using various tools available at the Baylor College of Medicine Search Launcher website (http://dot.imgen.bcm.tmc.edu:9331/index.htlm). RNA structure prediction was done using RNAstructure 2.5.
Protein analysis was made using tools available at the Expasy Molecular Biology Server website (http://www.expasy.ch/). The CLUSTALW program was used to compare the allelic variants of PspC and to group them. Proof-reading of the sequences obtained was performed by BLAST analysis on a database containing all raw sequences using BioEdit, as already described (Oggioni and Pozzi, 2001).

2.6. Nucleotide sequence Accession numbers

The nucleotide sequences of the pspC locus of the 43 strains analyzed are assigned GenBank Accession numbers AF154006 to AF154045, and AF276620 to AF276622 (see Table 2).
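As a rough illustration of the primer information given in Section 2.2, the following Python sketch computes basic primer properties and the span between the outermost primer coordinates quoted for the type 4 genome. It is not part of the original methods. The helper names are hypothetical, and treating coordinates 3347 and 6392 as the two ends of the amplified fragment is an assumption made only for this example.

```python
# Primer sequences as printed in Section 2.2, with the spacing removed.
PRIMERS = {
    "IF43": "AATGAGAAACGAATCCTTAGCAATG",
    "IF30": "AAGATGAAGATCGCCTACGAACAC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def span_length(first: int, last: int) -> int:
    """Length in bp of an inclusive coordinate range."""
    return last - first + 1


# Assumption: the fragment runs from the first nucleotide quoted for IF43
# to the last nucleotide quoted for IF30 on section 190 (AE007507).
TIGR4_SPAN_BP = span_length(3347, 6392)

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), round(gc_fraction(seq), 2), reverse_complement(seq))
    print("Span between outer primer coordinates (bp):", TIGR4_SPAN_BP)
```

Under these assumptions the span is about 3 kb for the type 4 strain, which is in the same range as the single-copy fragment sizes listed in Table 2.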


Fig. 1. Groups of PspC proteins. Eleven groups are represented. For each group, a representative protein was selected and depicted here: PspC1.1, PspC2.2, PspC3.1, PspC4.2, PspC5.1, PspC6.5, PspC7.1, PspC8.1, PspC9.2, PspC10.1, and PspC11.1 (Table 2). The number of strains belonging to each group is reported, together with the already described proteins. Homologous sequence blocks are drawn as boxes of the same color and aligned. All boxes are drawn to scale except choline-binding domains, which have been drawn at a uniform length (160 amino acids, as in PspC5.1) to obtain a better alignment of sequences. Choline-binding domains in fact vary between 141 amino acids (PspC2.5) and 261 amino acids (PspC3.5). Leader sequences are in black, random coil in gray and anchor domains in blue. The proline-rich regions are represented by green boxes; the different shadings and tones reflect the different sequence compositions. All other colors and shadings refer to the alpha-helical domains. The yellow alpha-helical domain is repeated twice in strains of group 5 (top line), which serves as a standard for alignment of other groups. Protein groups are separated into two blocks depending on the anchor type: groups having a choline-binding anchor (dark blue; groups 1–6) are shown above, and groups having an LPXTG anchor (light blue) are shown below.

3. Results

3.1. PspC nomenclature

The term PspC is preferred over other designations for two reasons: (i) it was used for the first sequence deposited in GenBank (Accession number: U72655), and (ii) it is a generic name referring only to the surface location of the molecule. Based on deduced amino acid sequence analysis we identified 11 major groups of PspC proteins, and used numbers from 1 to 11 to designate proteins belonging to each group (Fig. 1, see below). Single proteins within each group which display minor variations are identified by sequential numbers preceded by a dot. Priority in number assignment was given by the date of GenBank submission. Table 1 indicates the new names of allelic variants of PspC whose sequence was already available in GenBank, including CbpA, SpsA, PbcA, and Hic (Rosenow et al., 1997; Hammerschmidt et al., 1997; Cheng et al., 2000; Janulczyk et al., 2000).

3.2. The pspC locus

The coding sequence of the pspC gene of type 4 strain TIGR4 (Tettelin et al., 2001) is 2082 bp in size. Using the IF43/IF30 PCR primer pair, directed at the ORFs adjacent to pspC, it was possible to amplify and analyze the pspC locus of each of the 43 strains of S. pneumoniae. In 37 out of 43 strains there was a single copy of the pspC gene, while two different tandem alleles of pspC were found in six strains. The sequence of the pspC locus was different in each strain, so that 43 different DNA sequences (for a total of 169,214 bp) were deposited in GenBank as a product of this study (Table 2). Insertion sequences (ISs) were found in the pspC locus of 11 out of 43 strains, including all six strains with two pspC alleles.
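The naming scheme described under "PspC nomenclature" (group number from 1 to 11, a dot, then a sequential variant number, with the two alleles of a tandem locus joined by a slash in Table 2) is regular enough to parse mechanically. The following is a minimal Python sketch, not something provided by the original study; the type and function names are hypothetical.

```python
import re
from typing import List, NamedTuple


class PspCName(NamedTuple):
    group: int    # major group, 1 to 11
    variant: int  # sequential number within the group


_NAME = re.compile(r"^PspC(\d+)\.(\d+)$")


def parse_pspc_name(name: str) -> PspCName:
    """Parse a single allele name such as 'PspC6.5'."""
    match = _NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a PspC allele name: {name!r}")
    group, variant = int(match.group(1)), int(match.group(2))
    if not 1 <= group <= 11:
        raise ValueError(f"group outside 1-11: {name!r}")
    return PspCName(group, variant)


def parse_locus(label: str) -> List[PspCName]:
    """Parse a locus label; tandem alleles are separated by '/'."""
    return [parse_pspc_name(part) for part in label.split("/")]


if __name__ == "__main__":
    print(parse_pspc_name("PspC3.1"))
    print(parse_locus("PspC4.2/PspC10.1"))
```

Keeping group and variant as separate integers makes it straightforward to sort alleles numerically, so that PspC6.10 follows PspC6.9 rather than PspC6.1.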

3.3. PspC allelic variants

Analysis of the deduced amino acid sequence of all PspC variants (Tables 1 and 2) showed a common organization of


Table 1
Proposed nomenclature for the allelic variants of PspC in S. pneumoniae

| Protein | Proposed new name | Capsule type | Strain | Proposed function | GenBank Accession number | Submission date | Reference |
|---|---|---|---|---|---|---|---|
| PspC | PspC1.1 | 6A | EF6796 | Surface protein | U72655 | 26-Sept-96 | Brooks-Walter et al. (1999) |
| PspC | PspC1.2 (a) | 6B | BG9163 | Surface protein | AF068650 | 28-May-98 | Brooks-Walter et al. (1999) |
| SpsA | PspC2.1 | 1 | ATCC33400 (SV1) | SIgA-binding protein | Y10818 | 27-Jan-97 | Hammerschmidt et al. (1997) |
| CbpA | PspC3.1 | Rough (2) (b) | R6x (c) | Adhesin | AF019904 | 18-Aug-97 | Rosenow et al. (1997) |
| SpsA2 | PspC3.2 | Rough (2) (b) | ATCC11733 (R36A) | SIgA-binding protein | AJ002054 | 13-Oct-97 | Hammerschmidt et al. (1997) |
| SpsA47 | PspC3.3 | 47 | NCTC10319 | SIgA-binding protein | AJ002055 | 13-Oct-97 | Hammerschmidt et al. (1997) |
| PbcA | PspC3.1 | Rough (2) (b) | CP1200 (RX1) | C3-binding protein | AF067128 | 18-May-98 | Cheng et al. (2000) |
| PspC | PspC3.1 (a) | 2 | D39 | Surface protein | AF068646 | 28-May-98 | Brooks-Walter et al. (1999) |
| PspC | PspC3.5 (a) | 4 | L81905 | Surface protein | AF068649 | 28-May-98 | Brooks-Walter et al. (1999) |
| SP2190 | PspC3.4 | 4 | TIGR4 | Choline binding protein A | AE007507 | 29-Jun-2001 | Tettelin et al. (2001) |
| PspC | PspC4.1 | NS (d) | V26 | Surface protein | AF145055 | 22-April-99 | Brooks-Walter et al. (1999) |
| PspC | PspC5.3 (a) | 23 | E134 | Surface protein | AF068647 | 28-May-98 | Brooks-Walter et al. (1999) |
| PspC | PspC6.13 (a) | 6A | DBL6A | Surface protein | AF068645 | 28-May-98 | Brooks-Walter et al. (1999) |
| PspC | PspC6.14 (a) | 19 | BG8090 | Surface protein | AF068648 | 28-May-98 | Brooks-Walter et al. (1999) |
| Hic | PspC11.4 | 3 | A66 | FH-binding inhibitor of complement | AF252857 | 5-April-2000 | Janulczyk et al. (2000) |

(a) This sequence was released after our manuscript preparation and the progressive allele name refers to our allele classification (see Table 2).
(b) This strain is an unencapsulated derivative of type 2 strain D39 (Avery et al., 1944; Iannelli et al., 1999).
(c) The complete genome sequence of the R6 strain is deposited in GenBank under the Accession number AE007317 (Hoskins et al., 2001).
(d) NS, serotype not specified in the submitted sequence.


Table 2
PspC allelic variants in S. pneumoniae

| Protein(s) | GenBank Accession number | Sequence size (bp) (a) | Strain (b) (reference) | Capsule type or group |
|---|---|---|---|---|
| PspC1.1 | AF154037 | 3846 | SRF10 | ND (c) |
| PspC2.2 | AF154035 | 3046 | G9 | 19 |
| PspC2.3 | AF154023 | 3052 | G394 | 6 |
| PspC2.4 | AF154027 | 3055 | G402 | 6 |
| PspC2.5 | AF154039 | 3019 | SRF22 | ND |
| PspC3.1 | AF154012 | 3070 | D39 | 2 |
| PspC3.5 | AF154026 | 3231 | G40 | 19 |
| PspC3.6 | AF154013 | 3157 | G363 | 10 |
| PspC3.7 | AF154021 | 3094 | G387 | 24/31/40 |
| PspC3.8 | AF154028 | 3157 | G403 | 10 |
| PspC3.9 | AF154038 | 2947 | SRF2 | ND |
| PspC3.10 | AF154040 | 3010 | SRF25 | ND |
| PspC3.11 | AF154006 | 3001 | ATCC6302 | 2 |
| PspC3.12 | AF154008 | 2494 | ATCC6305 | 5 |
| PspC3.13 | AF154009 | 3052 | ATCC6307 | 7 |
| PspC4.2/PspC10.1 | AF154033 | 7656 | G100 | 6 |
| PspC5.1 | AF154032 | 3574 | G5 | 29/34/35/42/47 |
| PspC5.2 | AF154031 | 2968 | G48s | ND |
| PspC6.1/PspC9.1 | AF154044 | 9015 | G31 | 6 |
| PspC6.2 | AF154018 | 2947 | G38 | 18 |
| PspC6.3 | AF154030 | 2938 | G46 | 19 |
| PspC6.4 | AF154036 | 2524 | G99 | 19 |
| PspC6.5/PspC9.2 | AF154014 | 6581 | G374 | 17 |
| PspC6.6 | AF154016 | 3007 | G376 | 19 |
| PspC6.7 | AF154017 | 3064 | G378 | 9 |
| PspC6.8/PspC9.3 | AF154045 | 10511 | G383 | 6 |
| PspC6.9/PspC9.4 | AF154020 | 6917 | G386 | 6 |
| PspC6.10 | AF154029 | 3223 | G408 | 24/31/40 |
| PspC6.11 | AF154010 | 2932 | 8R1 | Rough (8) |
| PspC6.12 | AF154011 | 3001 | A60 | 6 |
| PspC6.13/PspC9.4 | AF154042 | 6986 | SRF9 | ND |
| PspC6.14 | AF154041 | 2861 | SRF30 | ND |
| PspC7.1 | AF154034 | 6108 | G54 | 19F |
| PspC7.2 | AF154022 | 6229 | G389 | 9 |
| PspC7.3 | AF154019 | 6099 | G385 | 29/34/35/42/47 |
| PspC7.4 | AF154043 | 6097 | SRF15 | ND |
| PspC8.1 | AF154015 | 2466 | G375 | 3 |
| PspC8.2 | AF154024 | 2466 | G396 | 3 |
| PspC8.3 | AF154025 | 2457 | G398 | 3 |
| PspC8.4 | AF154007 | 2565 | ATCC6303 | 3 |
| PspC11.1 | AF276620 | 2793 | G48 | 3 |
| PspC11.2 | AF276621 | 2733 | G60 | 3 |
| PspC11.3 | AF276622 | 2265 | 3496 | 3 |

(a) Size refers to the length of the whole DNA fragment sequenced.
(b) 'SRF' strains are clinical isolates from this work; 'G' strains, 8R1, and A60 have been described before (Pozzi et al., 1996; Oggioni et al., 1999; Bernheimer, 1979); 'ATCC' strains are from the American Type Culture Collection.
(c) ND, serotype not determined.
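The totals quoted in the text can be cross-checked against Table 2. The following Python sketch is an illustrative consistency check, not an analysis performed in the original work. The protein labels and fragment sizes are transcribed from Table 2; the assignment of groups 1–6 to the choline-binding anchor and groups 7–11 to the LPXTG anchor follows the Fig. 1 legend. The helper names are hypothetical.

```python
from collections import Counter

# (protein label, sequenced fragment size in bp), transcribed from Table 2.
TABLE2 = [
    ("PspC1.1", 3846),
    ("PspC2.2", 3046), ("PspC2.3", 3052), ("PspC2.4", 3055), ("PspC2.5", 3019),
    ("PspC3.1", 3070), ("PspC3.5", 3231), ("PspC3.6", 3157), ("PspC3.7", 3094),
    ("PspC3.8", 3157), ("PspC3.9", 2947), ("PspC3.10", 3010), ("PspC3.11", 3001),
    ("PspC3.12", 2494), ("PspC3.13", 3052),
    ("PspC4.2/PspC10.1", 7656),
    ("PspC5.1", 3574), ("PspC5.2", 2968),
    ("PspC6.1/PspC9.1", 9015), ("PspC6.2", 2947), ("PspC6.3", 2938),
    ("PspC6.4", 2524), ("PspC6.5/PspC9.2", 6581), ("PspC6.6", 3007),
    ("PspC6.7", 3064), ("PspC6.8/PspC9.3", 10511), ("PspC6.9/PspC9.4", 6917),
    ("PspC6.10", 3223), ("PspC6.11", 2932), ("PspC6.12", 3001),
    ("PspC6.13/PspC9.4", 6986), ("PspC6.14", 2861),
    ("PspC7.1", 6108), ("PspC7.2", 6229), ("PspC7.3", 6099), ("PspC7.4", 6097),
    ("PspC8.1", 2466), ("PspC8.2", 2466), ("PspC8.3", 2457), ("PspC8.4", 2565),
    ("PspC11.1", 2793), ("PspC11.2", 2733), ("PspC11.3", 2265),
]


def groups_in(label: str) -> list:
    """Group numbers of all alleles in a locus label ('/' separates tandem alleles)."""
    return [int(part[len("PspC"):].split(".")[0]) for part in label.split("/")]


def anchor_type(group: int) -> str:
    """Anchor type per the Fig. 1 legend: groups 1-6 choline-binding, 7-11 LPXTG."""
    return "choline-binding" if group <= 6 else "LPXTG"


total_bp = sum(size for _, size in TABLE2)
tandem_loci = [label for label, _ in TABLE2 if "/" in label]
anchors = Counter(anchor_type(g) for label, _ in TABLE2 for g in groups_in(label))

if __name__ == "__main__":
    print("strains:", len(TABLE2))
    print("total bp:", total_bp)
    print("loci with two pspC alleles:", len(tandem_loci))
    print("anchor types:", dict(anchors))
```

With the values as transcribed, the 43 fragments sum to 169,214 bp, six loci carry two alleles, and the 49 proteins split into 32 with a choline-binding anchor and 17 with an LPXTG anchor, in agreement with the figures given in the Abstract and in Section 3.2.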

the molecules: (i) a 37 amino acid leader peptide, (ii) an N-terminal segment which is essentially alpha-helical, (iii) a proline-rich region, and (iv) a C-terminal anchor, responsible for the cell surface attachment (Fig. 1). However, the

mechanism for surface attachment was not uniform, since in 17 cases the PspC allelic variants did not show the choline binding domain, but rather the LPXTG motif. This major difference in the anchor sequence extended through the


adjacent proline-rich regions, which also differed considerably in size and composition (Fig. 1, see below).

3.4. PspC groups

Multiple sequence alignments clearly divided the PspC proteins into different groups. Based on identification of unique sequence blocks, we placed the PspC alleles into 11 groups (Fig. 1). The grouping and respective numbering obtained are not based on evolutionary relationships, but on clusters of sequence homogeneity, which may have functional implications.

3.5. Intergenic regions

A 78 bp DNA sequence essentially identical in all pneumococcal strains was present between the stop codon of sp2191 and the start codon of pspC (Fig. 2). In strains carrying PspC proteins of groups 1–6, the 229 bp DNA intergenic region located between pspC and sp2189 was essentially identical (Fig. 2). A totally different 219 bp sequence was found downstream of pspC8, pspC9, pspC10, and pspC11 genes, while two ISs were found downstream of pspC7 (Fig. 2, see below). Upstream of pspC genes in groups 9 and 10, which also contain a second copy of pspC (Fig. 2), ISs and, in some cases, additional ORFs were observed (Fig. 2, see below). The DNA segment spanning the pspC stop codon and the insertion sequence IS1167 (Zhou et al., 1995) is 177 bp long and is highly conserved except in PspC7, where a

completely different sequence was found. This sequence is 90% identical to the sequence between the bag stop codon and IS1167 in Streptococcus agalactiae (Fig. 3) (Jerlstrom et al., 1991).

3.6. ISs and additional ORFs

ISs were found in 11 out of 43 strains: six strains with two tandem copies of pspC, four strains with the pspC7 allele, and strain SRF30 (pspC6.14). Upstream of pspC6.14 there is a truncated form of IS1381 (Sanchez-Beato et al., 1997). In all strains with two pspC genes, IS1167 (Zhou et al., 1995), or parts thereof, was found downstream of the first pspC gene (Fig. 2). In two strains (G386, SRF9) IS1515 (Munoz et al., 1998) was found downstream of the second copy of pspC. Two new putative ISs were found and named according to the suggested criteria (Mahillon and Chandler, 1998): IS3-SpnI was found in strains carrying pspC7 and is similar to IS861 of S. agalactiae (Mahillon and Chandler, 1998), and IS1380-SpnI was found inserted into IS1167 in strain G383 (Fig. 2). Three additional ORFs are present between the two copies of pspC of strains G31 and G383, which show essentially identical DNA sequence in the pspC locus except for the insertion of IS1380-SpnI in IS1167 (Fig. 2). orf1 and orf2 are similar to the nadR and pnuC genes involved in nicotinamide mononucleotide transport in Salmonella typhimurium (Foster et al., 1990). The orf3 gene product is similar to the Methanococcus jannaschii MutT protein (Bult et al., 1996).

Fig. 2. Organization of pspC loci of S. pneumoniae containing ISs. The pspC locus in type 4 S. pneumoniae TIGR4 (Tettelin et al., 2001) is located between two ORFs annotated as sp2191 and sp2189. pspC is 2082 bp, sp2191 is 528 bp, and sp2189 is 957 bp (values of the type 4 S. pneumoniae strain TIGR4) (Tettelin et al., 2001). Start codons of sp2189 and sp2191 are at positions 2595 and 6458, respectively, on section 190 of the type 4 genome (GenBank Accession number: AE007507). Sequenced DNA fragments are represented by an open rectangle. Within rectangles, ORFs and their direction of transcription are represented by arrows, and ISs are represented by boxes. Names of strains containing the depicted structures are reported. The scale is in kilobases.


Fig. 3. Nucleotide identity between the beta antigen (bag) gene of S. agalactiae and pspC7. Nucleotide identity extends to DNA segments adjacent to the pspC coding sequence. The scale is in kilobases.
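Identity percentages are quoted throughout the Results. The paper does not state how they were computed, so the following Python sketch shows only one common convention: the percentage of matching positions among aligned columns in which neither sequence has a gap. The function name and the toy sequences are hypothetical and are not pspC or bag data.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical positions over columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment, so they have
    equal length. Other conventions (for example, dividing by the full
    alignment length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment: 5 ungapped columns, 4 of them identical.
    print(percent_identity("ACGT-A", "ACCTTA"))
```

Because the choice of denominator changes the result, a reported identity value is only comparable with another when both use the same convention.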

3.7. Signal peptide

A 37 aa signal peptide was found in the sequence of all PspC proteins. The leader peptide was essentially identical in all strains, with very limited variability at positions 5, 25, 30, 33, and 35. More variability was encountered in the leader peptides of proteins of groups 7, 9, and 10, which are essentially identical to the leader peptide of the IgA-binding Bag protein of S. agalactiae (Jerlstrom et al., 1991) (Fig. 3). The PspC leader peptide showed only limited homology to leader peptides of other pneumococcal proteins.

3.8. N-terminal alpha-helical region

Immediately downstream of the signal sequence cleavage site, all PspC proteins have a short stretch (9–12 amino acids) which is predicted to form a random coil and is followed by an alpha helix. The N-terminal part of PspC is predicted to have a charged alpha-helical region interrupted by a few random coils. This helical region showed a high degree of polymorphism, as already described (Brooks-Walter et al., 1999), and its size varied from 118 to 589 amino acids. By amino acid sequence comparison, conserved and unique domains were identified (Fig. 1). One of these domains, or part thereof, was present in most variants (yellow box in Fig. 1), with the exception of PspC8, PspC9, and PspC10. This 'yellow' domain, or part thereof, is repeated twice in PspC1, PspC2, PspC3, and PspC6, while two complete copies were found only in PspC5 proteins (Fig. 1). Other domains composing the helical part of the various proteins are also shared by different PspC allelic groups, while unique domains were found only in PspC4 and PspC10. All helical regions are interrupted by short random coils, one of which is 56 amino acids in size and separates the two yellow domains in PspC5 (Fig. 1). The domains depicted in fuchsia and in pink are followed by a short serine-rich domain at their C-terminal end, whereas short proline-rich domains (random coil) are associated with the 'brown' domain (Fig. 1). Significant homology to known proteins was found only for two

helical domains. The red domain of PspC1 and PspC5 shows 77 and 83% amino acid identity, respectively, with a region of PspA (GenBank Accession number: M74122) (Yother and Briles, 1992), while the brown domain of PspC7, PspC9, and PspC10 is about 50% identical to a domain of the IgA-binding Bag protein of S. agalactiae (Jerlstrom et al., 1991) (Fig. 3).

3.9. Proline-rich region

The C-terminal anchor of all PspC proteins is preceded by a proline-rich repeat region, which varies considerably among groups (green boxes, Fig. 1). This region is essentially identical in PspC1 and PspC5 (90% amino acid identity). The proline-rich regions are also highly homologous among PspC2, PspC3, and PspC6 (80–100% amino acid identity), showing 70–80% identity to the proline-rich region of PspA (Yother and Briles, 1992; Brooks-Walter et al., 1999). In groups 8–10 the proline-rich region is characterized by the presence of a variable number (from 7 to 27) of copies of an 11 amino acid repeat, which was essentially identical in all cases (Fig. 4). Within one molecule, different repeats could have either proline or leucine in position one, whereas PspC9 and PspC10 carried glutamine in position 11 instead of glutamic acid (Fig. 4). In PspC7 proteins there are six complete repeats of 31 amino acids (Fig. 4).

Fig. 4. Amino acid repeats of the proline-rich regions of PspC proteins with an LPXTG anchor.

3.10. Choline-binding domain

Similar to PspA and other pneumococcal surface proteins, the C-termini of PspC proteins of groups 1–6 are characterized by a choline-binding domain followed by a weakly hydrophobic stretch of 18 amino acids. Depending on the PspC protein, this domain is formed by a number of repeats which varies between seven and 13. The repeats are 20 amino acids in size and are highly conserved among PspC proteins, with only a limited number of point mutations occurring mainly at positions 5, 15, and 18. The 18 amino acid, weakly hydrophobic C-termini were essentially identical in all PspC proteins of groups 1–6, and were not present in PspC4.2.

3.11. LPXTG-based cell wall sorting signal

The C-terminus of PspC proteins of groups 7–11 displays the typical cell wall sorting signal of gram-positive bacteria, which is formed by the C-terminal residues of the protein, comprising the LPXTG motif, the hydrophobic domain, and the charged tail (Navarre and Schneewind, 1999). In all cases, the cell wall sorting signal of PspC is 32 amino acids long and starts with the LPSTG motif. The other 27 residues are identical among PspC proteins of groups 8–11, while in PspC7 they are essentially identical to the 27 C-terminal residues of the Bag protein of S. agalactiae (Jerlstrom et al., 1991) (Fig. 3).

4. Discussion

In this work we characterized the highly polymorphic pspC locus of S. pneumoniae. Sequencing of the pspC locus in 43 pneumococcal isolates allowed comparison with already available pspC sequences, and provided insights into the nature of the pspC gene product, a surface protein important for pathogenesis and vaccine development. Results indicate that: (i) each pneumococcal strain contains a pspC gene at the same chromosomal location, (ii) each pneumococcal strain has a unique DNA sequence at the pspC locus, (iii) DNA encoding the signal peptide is the only part of the pspC coding sequence conserved in all strains, (iv) pspC gene products (i.e.
PspC proteins) share a common molecular organization, (v) 11 major groups of PspC proteins were identified by sequence comparison, (vi) the N-terminal helical portion of different molecules is the result of assembly of eight major sequence blocks, which (with two exceptions) have no homologs in sequence databases, but recur in different PspC proteins, and (vii) the C-terminal anchor region of different molecules contains either the choline-binding domain or the LPXTG motif.

Of the many phenotypes associated with PspC, only the binding to the secretory component of human IgA has been mapped at the peptide level (Hammerschmidt et al., 2000). The hexapeptide YRNYPT, present in the N-terminal portion of PspC (Fig. 1, asterisk in the yellow domain), was identified as responsible for this phenotype (Hammerschmidt et al., 2000). This binding was shown to occur in 73% of strains in a sample of 52 pneumococcal strains of different serotypes (Hammerschmidt et al., 1997). In our sample of 43 pneumococcal isolates, the YRNYPT motif was present in all PspC proteins of groups 1–7, while it was not carried by proteins present in type 3 strains (PspC8 and PspC11), or by the second protein of strains with two PspC alleles (PspC9 and PspC10) (Fig. 1). In conclusion, in our sample only serotype 3 strains lack the potential to bind human secretory IgA and human secretory component, whereas this property is highly conserved among the other serotypes.

On the basis of multiple sequence alignments we classified the PspC proteins into 11 groups, identified by the presence of sequence blocks. The 11 different groups have no obvious evolutionary relationship to each other. PspC loci appear to evolve like other loci (capsule locus, antibiotic resistance genes) of transformable species which are under selective pressure. Divergence results from horizontal gene transfer and homologous recombination, which produce mosaic genes. The presence of different anchor regions in different allelic variants is a unique trait of PspC. The LPXTG-based cell wall sorting signal was found in PspC of type 3 pneumococci (groups 8 and 11), in the second protein of strains with two alleles (groups 9 and 10), and in PspC7. pspC7 and adjacent DNA sequences appear more closely related to the beta antigen gene (bag) of S. agalactiae than to other pspC alleles (Fig. 3). Given the pspC variability, it is possible that some PspC variants lack some of the virulence phenotypes associated with this surface protein; however, a definitive picture will be obtained only when all PspC properties have been clearly mapped at the peptide level.

Considering the use of PspC from a vaccine perspective, it should be noted that most variability is generated by a limited number of domains (eight major sequence blocks). Therefore, it is possible to design and produce recombinant mosaic molecules capable of covering all (or most) of the relevant epitopes of PspC. These mosaic molecules could be valuable candidates for protein-based vaccines against S. pneumoniae. Determining whether the eight major helical sequence blocks identified in this work are cross-protective toward all the strains within a given PspC group should be one of the primary aims of any initiative to develop a PspC-based vaccine designed in this way.
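The two sequence features that carry most of the functional argument above, the YRNYPT hexapeptide and the LPXTG anchor motif, can be searched for with simple pattern matching. The following Python sketch is a hedged illustration only: the function names are hypothetical, the two example sequences are invented toy strings rather than real PspC sequences, and matching LPXTG anywhere in the protein is a simplification of the full sorting signal described in Section 3.11.

```python
import re
from typing import Optional

SIGA_MOTIF = "YRNYPT"             # hexapeptide named in the Discussion
LPXTG_PATTERN = re.compile(r"LP.TG")  # X stands for any residue


def has_siga_motif(protein: str) -> bool:
    """True if the YRNYPT hexapeptide occurs anywhere in the sequence."""
    return SIGA_MOTIF in protein


def last_lpxtg(protein: str) -> Optional[int]:
    """0-based start of the C-terminal-most LPXTG match, or None if absent."""
    starts = [m.start() for m in LPXTG_PATTERN.finditer(protein)]
    return starts[-1] if starts else None


# Hypothetical toy sequences for illustration; NOT real PspC proteins.
TOY_WITH_HEXAPEPTIDE = "MSTAYRNYPTEEKGWQ"
TOY_WITH_LPXTG = "MSTAEEKPEPLPSTGSVVA"

if __name__ == "__main__":
    for name, seq in [("toy 1", TOY_WITH_HEXAPEPTIDE), ("toy 2", TOY_WITH_LPXTG)]:
        print(name, has_siga_motif(seq), last_lpxtg(seq))
```

A real classification would also need to check that the motif sits at the expected distance from the C-terminus and is followed by the hydrophobic domain and charged tail; that is left out here to keep the sketch short.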

Acknowledgements

We thank Pat Cleary for critically reading the manuscript. The authors also thank The Institute for Genomic Research (TIGR) for preliminary sequence data of the type 4 S. pneumoniae TIGR4. This work was supported in part by grants from MURST (Cofinanziamento 1999), CNR (P. F. Biotecnologie), and the Commission of the European Union (QLK2-2000-00543).

References Avery, O.T., MacLeod, C.M., McCarty, M., 1944. Studies on the chemical nature of the substance inducing transformation of pneumococcal types.

F. Iannelli et al. / Gene 284 (2002) 63–71 Induction of transformation by a desoxyribonucleic acid fraction isolated from Pneumococcus type III. J. Exp. Med. 79, 137–158. Bernheimer, H.P., 1979. Lysogenic pneumococci and their bacteriophages. J. Bacteriol. 138, 618–624. Briles, D.E., Tart, R.C., Swiatlo, E., Dillard, J.P., Smith, P., Benton, K.A., Ralph, B.A., Brooks-Walter, A., Crain, M.J., Hollingshead, S.K., McDaniel, L.S., 1998. Pneumococcal diversity: considerations for new vaccine strategies with emphasis on pneumococcal surface protein A (PspA). Clin. Microbiol. Rev. 11, 645–657. Briles, D.E., Hollingshead, S.K., Brooks-Walter, A., Nabors, G.S., Ferguson, L., Schilling, M., Gravestein, S., Braun, P., King, J., Swift, A., 2000. The potential to use PspA and other pneumococcal proteins to elicit protection against pneumococcal infection. Vaccine 18, 1707– 1711. Brooks-Walter, A., Briles, D.E., Hollingshead, S.K., 1999. The pspC gene of Streptococcus pneumoniae encodes a polymorphic protein, PspC, which elicits cross-reactive antibodies to PspA and provides immunity to pneumococcal bacteremia. Infect. Immun. 67, 6533–6542. Bult, C.J., White, O., Zhou, L., Fleischmann, R.D., Sutton, G.G., Blake, J.A., Fitzgerald, L.M., Clayton, R.A., Overbeek, R., Kirkness, E.F., Weinstock, K.G., Merrick, J.M., Glodek, A., Scott, J.L., Geoghagen, N.S.M., Venter, J.C., 1996. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273, 1058– 1073. Cheng, Q., Finkel, D., Hostetter, M.K., 2000. Novel purification scheme and functions for a C3-binding protein from Streptococcus pneumoniae. Biochemistry 39, 5450–5457. Dave, S., Brooks-Walter, A., Pangburn, M.K., McDaniel, L.S., 2001. PspC, a pneumococcal surface protein, binds human factor H. Infect. Immun. 69, 3435–3437. 
Dopazo, J., Mendoza, A., Herrero, J., Caldara, F., Humbert, Y., Friedli, L., Guerrier, M., Grand-Schenk, E., Gandin, C., de Francesco, M., Polissi, A., Buell, G., Feger, G., Garcia, E., Peitsch, M., Garcia-Bustos, J.F., 2001. Annotated draft genomic sequence from Streptococcus pneumoniae type 19F clinical isolate. Microb. Drug Resist. 7, 99–125.
Foster, J.W., Park, Y.K., Penfound, T., Fenger, T., Spector, M.P., 1990. Regulation of NAD metabolism in Salmonella typhimurium: molecular sequence analysis of the bifunctional nadR regulator and the nadA-pnuC operon. J. Bacteriol. 172, 4187–4196.
Hammerschmidt, S., Talay, S.R., Brandtzaeg, P., Chhatwal, G.S., 1997. SpsA, a novel pneumococcal surface protein with specific binding to secretory immunoglobulin A and secretory component. Mol. Microbiol. 25, 1113–1124.
Hammerschmidt, S., Tilling, M.P., Wolff, S., Vaerman, J.P., Chhatwal, G.S., 2000. Species-specific binding of human secretory component to SpsA protein of Streptococcus pneumoniae via a hexapeptide motif. Mol. Microbiol. 36, 726–736.
Hoskins, J., et al., 2001. Genome of the bacterium Streptococcus pneumoniae strain R6. J. Bacteriol. 183, 5709–5717.
Iannelli, F., Giunti, L., Pozzi, G., 1998. Direct sequencing of long PCR fragments. Mol. Biotechnol. 10, 183–185.
Iannelli, F., Pearce, B.J., Pozzi, G., 1999. The type 2 capsule locus of Streptococcus pneumoniae. J. Bacteriol. 181, 2652–2654.
Janulczyk, R., Iannelli, F., Sjoholm, A.G., Pozzi, G., Bjorck, L., 2000. Hic, a novel surface protein of Streptococcus pneumoniae that interferes with complement function. J. Biol. Chem. 275, 37257–37263.
Jerlstrom, P.G., Chhatwal, G.S., Timmis, K.N., 1991. The IgA-binding beta antigen of the c protein complex of Group B streptococci: sequence determination of its gene and detection of two binding regions. Mol. Microbiol. 5, 843–849.
Mahillon, J., Chandler, M., 1998. Insertion sequences. Microbiol. Mol. Biol. Rev. 62, 725–774.
McDaniel, L.S., Yother, J., Vijayakumar, M.N., McGarry, L., Guild, W.R., Briles, D.E., 1987. Use of insertional inactivation to facilitate studies of biological properties of pneumococcal surface protein A (PspA). J. Exp. Med. 165, 381–394.
Munoz, R., Lopez, R., Garcia, E., 1998. Characterization of IS1515, a functional insertion sequence in Streptococcus pneumoniae. J. Bacteriol. 180, 1381–1388.
Navarre, W.W., Schneewind, O., 1999. Surface proteins of gram-positive bacteria and mechanisms of their targeting to the cell wall envelope. Microbiol. Mol. Biol. Rev. 63, 174–229.
Oggioni, M.R., Pozzi, G., 2001. Comparative genomics for identification of clone-specific sequence blocks in Streptococcus pneumoniae. FEMS Microbiol. Lett. 200, 137–143.
Oggioni, M.R., Iannelli, F., Pozzi, G., 1999. Characterization of cryptic plasmids pDP1 and pSMB1 of Streptococcus pneumoniae. Plasmid 41, 70–72.
Ogunniyi, D.A., Folland, R.L., Briles, D.E., Hollingshead, S.K., Paton, J.C., 2000. Immunization of mice with combinations of pneumococcal virulence proteins elicits enhanced protection against challenge with Streptococcus pneumoniae. Infect. Immun. 68, 3028–3033.
Pozzi, G., Masala, L., Iannelli, F., Manganelli, R., Havarstein, L.S., Piccoli, L., Simon, D., Morrison, D.A., 1996. Competence for genetic transformation in encapsulated strains of Streptococcus pneumoniae: two allelic variants of the peptide pheromone. J. Bacteriol. 178, 6087–6090.
Rosenow, C., Ryan, P., Weiser, J.N., Johnson, S., Fontan, P., Ortqvist, A., Masure, H.R., 1997. Contribution of novel choline-binding proteins to adherence, colonization and immunogenicity of Streptococcus pneumoniae. Mol. Microbiol. 25, 819–829.
Sanchez-Beato, A., Garcia, E., Lopez, R., Garcia, J.L., 1997. Identification and characterization of IS1381, a new insertion sequence in Streptococcus pneumoniae. J. Bacteriol. 179, 2459–2463.
Tettelin, H., et al., 2001. Complete genome sequence of a virulent isolate of Streptococcus pneumoniae. Science 293, 498–506.
Yother, J., Briles, D.E., 1992. Structural properties and evolutionary relationships of pspA, a surface protein of Streptococcus pneumoniae, as revealed by sequence analysis. J. Bacteriol. 174, 601–609.
Yother, J., White, J.M., 1994. Novel surface attachment mechanism of Streptococcus pneumoniae protein PspA. J. Bacteriol. 176, 2976–2985.
Zhang, J.R., Mostov, K.E., Nanno, M., Shimida, S., Ohwaki, M., Tuomanen, E., 2000. The polymeric immunoglobulin receptor translocates pneumococci across human nasopharyngeal epithelial cells. Cell 102, 827–837.
Zhou, L., Hui, F.M., Morrison, D.A., 1995. Characterization of IS1167, a new insertion sequence in Streptococcus pneumoniae. Plasmid 33, 127–138.