Comparative analysis of the complete cag pathogenicity island sequence in four Helicobacter pylori isolates


Gene 328 (2004) 85 – 93 www.elsevier.com/locate/gene

Anna Blomstergren a, Annelie Lundin b,c, Christina Nilsson b,c, Lars Engstrand b,c, Joakim Lundeberg a,*

a Department of Biotechnology, Royal Institute of Technology, Alba Nova University Center, Roslagstullsbacken 21, S-106 91 Stockholm, Sweden
b Department of Bacteriology, Swedish Institute for Infectious Disease Control, Solna, Sweden
c Microbiology and Tumor Biology Center, Karolinska Institutet, Stockholm, Sweden

Received 7 July 2003; received in revised form 6 October 2003; accepted 24 November 2003
Received by T. Sekiya

Abstract

The cytotoxin-associated gene (cag) pathogenicity island (PAI) is important for the virulence of Helicobacter pylori. In this study, we have compared the complete nucleotide sequence of the cag PAI in four clinical isolates. These isolates were selected from patients matched for age and sex from the same geographical region. The patients suffered from either gastric cancer (Ca52 and Ca73) or duodenal ulcer (Du23:2 and Du52:2). All four strains induced an interleukin (IL)-8 response in AGS cells and translocated CagA into host cells, where the protein was tyrosine phosphorylated, and thus harboured a functional type IV secretion system encoded by the cag PAI. The cagA gene contains a variable region close to its 3′ end. Different compositions of this region have been proposed to induce various degrees of morphological change in cultured gastric epithelial cells, and there are indications that the structure of the repetitive region is connected to the severity of disease. One of the studied strains (Du52:2) possessed five Western-type repeat regions while the other three strains harboured one Western-type repeat. Strain Du52:2 also contained a major rearrangement or large insertion/duplication in the intergenic region between HP0546 and HP0547 (cagA). Sequence similar to that of genes HP0510 and HP0509 was found in the 5′ end of this region. The 3′ end was similar to the corresponding region of strain ATCC 43504, including a mini IS605 element and a duplication of the 3′ end of the cag PAI. Finally, a novel gene was identified in the cag PAI in three of the sequenced strains at the position of HP0521. This gene, HP0521B, is present in approximately half of Swedish H. pylori isolates.

© 2004 Elsevier B.V. All rights reserved.

Keywords: HP0521B; Diversity; Type IV secretion system

Abbreviations: bp, base pair(s); cag, cytotoxin-associated gene; IL, interleukin; IS, insertion sequence; kb, kilobase pair(s); ORF, open reading frame; PAGE, polyacrylamide gel electrophoresis; PAI, pathogenicity island; SDS, sodium dodecyl sulphate; SHP-2, SRC homology 2 domain containing tyrosine phosphatase; VacA, vacuolating cytotoxin.
* Corresponding author. Tel.: +46-8-5537-83-27; fax: +46-8-55-3784-81. E-mail address: [email protected] (J. Lundeberg).
0378-1119/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2003.11.029

1. Introduction

Helicobacter pylori causes gastritis, gastric and duodenal ulcer and is associated with gastric cancer (Peek and Blaser, 2002). About half of the world’s population is chronically infected with H. pylori, but only 10% of infected individuals develop disease during their lifetime. A focus of the H. pylori research community is to understand why only a portion of infected individuals develops gastroduodenal disease and to determine the bacterial and host factors that contribute to disease progression. A known bacterial virulence trait is the cag PAI, the cytotoxin-associated gene pathogenicity island (Censini et al., 1996), a 40-kb gene fragment containing 27 genes that encodes a functional type IV secretion system. CagA is one of the most studied proteins in the cag PAI. In addition, type I strains that contain cagA and a toxic form of VacA, the vacuolating cytotoxin, are considered to be more virulent than type II strains that lack cagA and contain a nontoxic form of VacA. Parts of or the complete cag PAI can be excised from the H. pylori genome, and such strains are less virulent than strains with an intact cag PAI (Censini et al., 1996; Björkholm et al., 2001; Nilsson et al., 2003).

Several of the genes in the cag PAI are involved in the induction of the proinflammatory cytokine interleukin (IL)-8, which contributes to the chronic inflammation in the stomach of infected individuals. H. pylori uses the type IV secretion system to inject CagA into epithelial cells, where it is tyrosine-phosphorylated by members of the Src family of tyrosine kinases (Selbach et al., 2002a, 2003). Phosphorylated CagA forms a complex with, and activates, the phosphatase SHP-2 (Higashi et al., 2002b), resulting in a signalling cascade that is further associated with cytoskeletal rearrangements and elongation of cultured epithelial cells (Segal et al., 1997, 2003). However, the role of CagA phosphorylation has been disputed, and unphosphorylated CagA interacts with growth factor receptor bound 2 (Grb-2) and induces a similar response in host cells (Mimuro et al., 2002). Recently, Amieva et al. (2003) described an alternative function of CagA using polarized cells. They showed that CagA associates with the tight-junction protein ZO-1 and thereby alters the composition and function of the epithelial apical–junctional complex. The carboxy (C)-terminal region of the CagA protein includes a variable number of repeats of the tyrosine phosphorylation motif EPIYA that may be important for the degree of morphological changes caused by CagA (Azuma et al., 2002).

Comparative genomics is a rapidly growing field. H. pylori was the first organism from which two complete genomes were sequenced, and extensive comparisons of these have been made (Tomb et al., 1997; Alm et al., 1999). Genome comparisons of H. pylori show that most of the variation at the nucleotide level does not confer amino acid differences because it frequently occurs at the third position of the codons (Alm et al., 1999). Comparisons of the two genomes also show that approximately 7% of the genes are strain specific (Alm et al., 1999).

In this study, the complete cag PAI sequences from four clinical isolates, two from cancer patients and two from duodenal ulcer patients, were compared. This comparison confirmed interstrain differences in known variable regions and revealed previously unknown diversity.
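The remark above that most nucleotide variation falls on third codon positions, and is therefore silent, can be illustrated with the genetic code itself. The following is a minimal sketch and not part of the original analysis: the function names are illustrative, and the table is the standard genetic code, whose codon-to-amino-acid assignments are shared by the bacterial translation table used for H. pylori. It counts, for each codon position, the fraction of single-base changes between sense codons that leave the amino acid unchanged.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def is_synonymous(codon_a, codon_b):
    """Return True if both codons encode the same amino acid."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


def synonymous_fraction_by_position():
    """Fraction of single-base changes that are synonymous, per codon position.

    Changes from or to a stop codon are ignored, so only substitutions
    between sense codons are counted.
    """
    synonymous = [0, 0, 0]
    total = [0, 0, 0]
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == "*":
                    continue
                total[pos] += 1
                if CODON_TABLE[mutant] == amino_acid:
                    synonymous[pos] += 1
    return [s / t for s, t in zip(synonymous, total)]


if __name__ == "__main__":
    for pos, fraction in enumerate(synonymous_fraction_by_position(), start=1):
        print(f"codon position {pos}: {fraction:.1%} of changes are synonymous")
```

In this count, second-position changes between sense codons are never synonymous, while the majority of third-position changes are, which is consistent with the pattern described by Alm et al. (1999).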

2. Materials and methods

2.1. Bacterial strains and culture conditions

The H. pylori strains, Ca52, Ca73, Du23:2 and Du52:2, used for sequencing, have previously been characterized in our laboratory (Enroth et al., 2000). Additional H. pylori strains used are 26695 (Tomb et al., 1997), J99 (Alm et al., 1999), 67:20 and 67:21 (Björkholm et al., 2001). All strains were grown on GC agar plates as previously described (Björkholm et al., 2001).

2.2. Preparation of genomic DNA

Genomic DNA was prepared using the Qiagen DNeasy preparation kit (Qiagen, Hilden, Germany). DNA concentrations were determined by measuring the absorbance at 260 nm.

2.3. Primer design

The initial primer set was designed to produce PCR products of approximately 800 base pairs (bp) with roughly 100 bp overlap of adjacent PCR products, using the two complete H. pylori genomes (26695 and J99) as templates. All primers had a Tm of 58–62 °C and were supplied by MWG Biotech AG (Ebersberg, Germany). The primer analysis software Oligo® 4.05 (National Biosciences, Plymouth, MN, USA) was used to search for secondary structures or primer–dimer formation. Additional primers were designed to close the gaps and achieve high quality sequence using the obtained strain-specific sequence data. All primer sequences can be found at http://biobase.biotech.kth.se/hpylori.

2.4. PCR amplification

The designed primers were used in 50 µl PCR reactions containing 1× PCR buffer II (10 mM Tris–HCl pH 8.3, 50 mM KCl), 2 mM MgCl2 and 0.25 U AmpliTaq Gold® from Applied Biosystems (Foster City, CA, USA), 0.2 mM of each nucleotide (Amersham Pharmacia Biotech, Piscataway, NJ, USA), 10 pmol of each primer and 5 ng genomic DNA. Samples were denatured at 95 °C for 10 min and then cycled 35 times (96 °C for 30 s, 51 °C for 30 s and 72 °C for 2 min), followed by a final elongation step at 72 °C for 10 min. All PCR products were visualized by agarose gel electrophoresis.

2.5. Sequencing on MegaBACE 1000

Cycle sequencing reactions were performed using the DYEnamic ET terminator Cycle Sequencing kit (Amersham Pharmacia Biotech). In a 20 µl reaction, 8 µl of sequencing reagent premix was used with 1.5 µl PCR product and 5 pmol primer. The cycling was performed according to the manufacturer’s instructions. The cycle sequencing products were ethanol precipitated in 96-well plates using 2 µl 7.5 M NH4Ac and 60 µl 96% ethanol per reaction, washed in 70% ethanol and dissolved in 20 µl water prior to sequencing on a MegaBACE 1000 DNA sequencer (Amersham Pharmacia Biotech).

2.6. Sequencing on ABI3700

Cycle sequencing was performed in 10 µl reactions using 1 µl Big Dye version 2.0 mix (Applied Biosystems), 1 µl PCR product and 5 pmol primer in 26 mM Tris–HCl pH 9 and 6.5 mM MgCl2. Cycling was performed according to the manufacturer’s instructions. The products were ethanol precipitated using 1 µl 3 M NaAc and 25 µl 96% ethanol, washed in 70% ethanol and dissolved in 10 µl water prior to sequencing on an ABI3700 DNA sequencer (Applied Biosystems).

2.7. Cycle sequencing on genomic material

Regions where no PCR products were obtained were sequenced directly from genomic material. Reactions with a total volume of 40 µl containing 16 µl Big Dye mix version 2.0, 10 pmol primer and 3 µg genomic DNA were cycled as follows: 94 °C for 5 min, then 70 cycles of 96 °C for 20 s, 50 °C for 30 s and 60 °C for 4 min. The reactions were precipitated with 4 µl 3 M NaAc and 100 µl 96% ethanol, washed in 70% ethanol and dissolved in 10 µl of water prior to sequencing on an ABI3700 DNA sequencer.

2.8. Sequence assembly

Each strain was assembled individually using the Staden package (Staden et al., 2000), which includes Pregap4 and Gap4. The sequences were assigned Phred scores (Ewing et al., 1998) and screened for poor quality in Pregap4, then aligned and edited with Gap4.

2.9. Sequence comparisons

MultiPipMaker (Schwartz et al., 2000) was used to make a full alignment of the nucleotide sequence of the four strains. Some regions were aligned separately to get an optimal alignment. The nucleotide sequences were translated using the ExPASy Translate tool and the amino acid sequences of the open reading frames (ORFs) were aligned using CLUSTALW; both programs are from the Swiss Institute of Bioinformatics’ ExPASy server. Overall average KS and KA values were calculated for each gene with the Kumar et al. (2001) method using the MEGA2.1 software.

2.10. Infection assay

AGS cells (ATCC CRL-1739) were routinely maintained in Nutrient Mixture Ham’s F-12 (Life Technologies, Paisley, UK), supplemented with 10% fetal bovine serum (Sigma, St Louis, MO, USA) and cultured at 37 °C in 5% CO2/95% air. Cells were seeded in 24-well tissue culture plates and infected with 2 × 10^7 bacterial cells. After coincubation for 6 h, the cell culture supernatant was collected and stored at -20 °C. Washed cells were lysed in 1× sample buffer (80 mM Tris–HCl, pH 6.8, 2% SDS, 10% glycerol, 5% mercaptoethanol) and stored at -20 °C. The concentration of IL-8 in the supernatant was analyzed by ELISA, using IL-8 Eli-pair (Diaclone, Besançon, France), according to the manufacturer’s manual.
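As background to the Phred scores mentioned in Section 2.8, the standard definition relates a quality score Q to the base-calling error probability P by Q = -10 log10(P), so that one error per 10,000 bases corresponds to Q = 40. The short sketch below only illustrates this conversion; the function names are illustrative and it does not reproduce the Pregap4/Gap4 consensus calculation used in this study.

```python
import math


def phred_to_error_prob(q):
    """Error probability P for a Phred quality score Q, from Q = -10*log10(P)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score Q for an error probability P (0 < P <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def expected_errors(qualities):
    """Expected number of miscalled bases in a read with the given scores."""
    return sum(phred_to_error_prob(q) for q in qualities)


if __name__ == "__main__":
    for q in (20, 30, 40):
        print(f"Q{q}: about 1 error in {round(1 / phred_to_error_prob(q))} bases")
```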


2.11. Immunoblotting experiments

Cell lysates were subjected to sodium dodecyl sulphate–polyacrylamide gel electrophoresis (SDS-PAGE) and transferred to PVDF membranes using standard procedures. Primary monoclonal antibodies against phosphotyrosine (PY-99, 1:10,000) and anti-CagA antibodies (1:5,000) were kindly provided by Steffen Backert (Selbach et al., 2003). Horseradish peroxidase-conjugated rabbit antimouse and goat antirabbit antibodies (DAKO, Glostrup, Denmark) were used and signals were visualized using the ECL detection system (Amersham Biosciences, Uppsala, Sweden).

2.12. Accession numbers

The nucleotide sequences of the four strains were deposited in GenBank under the following accession numbers: Ca52, AY330636 and AY330637; Ca73, AY330638 and AY330639; Du23:2, AY330643 and AY330644; and Du52:2, AY330640, AY330641 and AY330642.
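The amplicon layout described in Section 2.3 (products of roughly 800 bp with roughly 100 bp overlap) can be sketched as simple interval arithmetic. This is an illustration under stated assumptions, not the procedure used in the study: the actual primers were placed in conserved regions and selected for Tm and secondary structure, which this sketch ignores, and the function name and the 40,000 bp region length (the approximate size of the cag PAI) are chosen for the example.

```python
def tile_amplicons(region_length, amplicon_length=800, overlap=100):
    """Return (start, end) windows that tile a region with overlapping amplicons.

    Coordinates are 0-based and half-open. Adjacent windows share `overlap`
    bases; the last window is clipped to the end of the region.
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    if not 0 <= overlap < amplicon_length:
        raise ValueError("overlap must be non-negative and shorter than the amplicon")
    step = amplicon_length - overlap
    windows = []
    start = 0
    while True:
        end = min(start + amplicon_length, region_length)
        windows.append((start, end))
        if end == region_length:
            break
        start += step
    return windows


if __name__ == "__main__":
    windows = tile_amplicons(40_000)  # assumed region length of about 40 kb
    print(len(windows), "amplicons; first:", windows[0], "last:", windows[-1])
```

Under these assumptions, a 40-kb region needs a few dozen primer pairs for the initial pass, before any strain-specific gap-closing primers are added.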

3. Results and discussion

3.1. Overview of the cag PAI

We have sequenced the complete cag PAI in four H. pylori strains selected from patients matched for age and sex from the same geographical region in Sweden. Two isolates were obtained from cancer patients (Ca52 and Ca73) and two from duodenal ulcer patients (Du23:2 and Du52:2). Primers for PCR and sequencing were initially designed using conserved regions in the two completely sequenced genomes, 26695 (Tomb et al., 1997) and J99 (Alm et al., 1999), and the nomenclature used in this paper follows strain 26695. Additional strain-specific primers were designed in order to close the remaining gaps and to obtain the high quality sequence necessary for comparisons. The calculated error rate was less than 1 error per 10,000 bases with approximately sixfold coverage. The G + C content of the cag PAI was 35.8% in Du23:2, Du52:2 and Ca73, and 36.0% in Ca52. The cag PAI is approximately 40 kb long and flanked by two 31 bp repeats. An overview of the cag PAI in the four isolates together with the two completely sequenced strains, 26695 and J99, is shown in Fig. 1. The four newly sequenced strains followed the same basic structure of the cag PAI as previously reported, with some interesting exceptions. Strain Ca52 lacked the complete HP0521 gene and harboured an IS606 insertion element between HP0523 and HP0524. The other three strains all had a new gene at the position of HP0521. Finally, Du52:2 contained a large insertion or rearrangement in the intergenic region between HP0546 and HP0547. From the alignment in Fig. 1, it is apparent that the majority of insertions and deletions are located in the


Fig. 1. Alignment and comparison of the complete cag PAI sequences of strains Ca52, Ca73, Du23:2 and Du52:2 and the two fully sequenced strains 26695 and J99. Green areas represent aligned sequences, while grey areas and black bars mark deletions. Yellow areas show regions where more than one allele is present, making alignment impossible. Three strains contained a new gene, HP0521B, at the position of HP0521. The intergenic region between HP0547 and HP0549 is categorized according to Kersulyte et al. (2000b). Gene HP0527 contains a large repetitive region, which is unresolved in the four newly sequenced strains (dark orange areas). Strain Du52:2 contained a large insertion or rearrangement in the intergenic region between HP0546 and HP0547, labelled red in the figure. Dark blue arrows denote the location and direction of open reading frames, and the light blue arrow shows the location of an insertion element in strain Ca52. The repeats flanking the cag PAI are marked with blue bars.

intergenic regions. Most intragenic insertions and deletions are in multiples of three, and thus do not alter the reading frame. However, in Ca73 one base is inserted close to the 5′ end in the ORF of HP0536, resulting in a frameshift that either disrupts the translation of the entire protein or forces the strain to use an alternative starting point. The first available starting point is a GTG that will result in a protein lacking 52 amino acids at the amino-terminal end. Because the HP0536 protein in the reference strain 26695 is predicted to consist of 114 amino acids, the corresponding protein in Ca73 would have a dramatically reduced size.

The sequence variation of the cag PAI genes is presented in Table 1. Some genes harboured a very conserved sequence, especially when the amino acid sequences were compared. Other genes (e.g., HP0547) showed larger sequence diversity. The number of nucleotide substitutions that actually resulted in an amino acid substitution also varied between the genes, as shown in Table 1. KA values are defined as the number of nonsynonymous mutations per nonsynonymous site of the coding sequence, while KS values are defined as the number of synonymous mutations per synonymous site. For a coding sequence under purifying selection pressure, the ratio between KA and KS (KA/KS) is generally lower than 1 (Kumar et al., 2001), which was also the case for all cag PAI genes when the four strains in this study were compared.

3.2. Identification of a novel gene (HP0521B)

When gene HP0521 in strain 26695 (type HP0521:3 in Fig. 2) is compared to the corresponding gene in J99 (jhp0470 or type HP0521:1 in Fig. 2), two major gaps are found. In addition, there is a frameshift in 26695, which is likely to disrupt the translation of the protein. Previous microarray analyses also confirm divergence of this gene (Björkholm et al., 2001; Kim et al., 2002). The complete gene (together with the 3′ region of HP0520) was absent in strain Ca52. The three remaining strains (Ca73, Du23:2 and Du52:2) contained a novel hypothetical gene, HP0521B, that differed substantially from J99 in the centre of the gene (type HP0521B:1 in Fig. 2). HP0521B consisted of an ORF of approximately 750 bp with start and stop in a different reading frame as compared to HP0521 in J99. When a nucleotide BLAST search was performed on HP0521B, no significant matches were found. Twenty-one additional strains were sequenced over this region for further analyses. Thirteen of the strains clustered with 26695, but did not contain the frameshift, while eight strains clustered with the novel


Table 1
Average difference between the four sequenced strains (Ca52, Ca73, Du23:2 and Du52:2) together with the fraction of nucleotide differences that result in amino acid changes, KA and KS values

Gene designation   Alternative names   Nucleotide        Amino acid        Nonsynonymous   KA(b)   KS(b)
                                       differences(a)    differences(a)    mutations
                                       (%)               (%)               (%)
HP0520             cag1    cagζ        3.3               4.8               57              0.034   0.050
HP0521B                                2.3               3.9               52              0.039   0.041
HP0522             cag3    cagδ        3.7               2.5               27              0.013   0.090
HP0523             cag4    cagγ        7.8               6.0               30              0.034   0.216
HP0524             cag5    cagβ        3.2               1.0               15              0.007   0.099
HP0525                     cagα        1.7               0.3               6               0.001   0.048
HP0526             cag6    cagZ        1.6               0.9               18              0.005   0.036
HP0527             cag7    cagY        2.3               2.7               46              0.014   0.038
HP0528             cag8    cagX        2.0               1.2               21              0.007   0.047
HP0529             cag9    cagW        1.6               0.9               16              0.003   0.033
HP0530             cag10   cagV        2.0               1.0               17              0.004   0.051
HP0531             cag11   cagU        2.1               2.4               37              0.012   0.037
HP0532             cag12   cagT        1.3               0.8               20              0.004   0.030
HP0534             cag13   cagS        1.8               2.7               50              0.013   0.035
HP0535             cag14   cagQ        1.7               2.0               38              0.009   0.036
HP0536             cag15   cagP        2.9               3.8               28              0.016   0.055
HP0537             cag16   cagM        1.5               1.3               27              0.007   0.036
HP0538             cag17   cagN        2.2               3.8               55              0.019   0.032
HP0539             cag18   cagL        2.4               3.2               45              0.015   0.043
HP0540             cag19   cagI        2.1               2.0               33              0.010   0.045
HP0541             cag20   cagH        1.3               0.5               15              0.003   0.037
HP0542             cag21   cagG        2.1               1.9               29              0.009   0.053
HP0543             cag22   cagF        2.3               3.0               44              0.014   0.049
HP0544             cag23   cagE        1.8               0.6               11              0.003   0.052
HP0545             cag24   cagD        2.5               1.9               27              0.007   0.053
HP0546             cag25   cagC        3.6               3.9               45              0.016   0.073
HP0547             cag26   cagA        6.5               10.3              64              0.058   0.071

(a) Percentage differences were calculated for all possible pairwise comparisons between the four strains and the average is presented. Large insertions or deletions (more than 15 base pairs) have not been included because they are discussed elsewhere in this article. Strain Ca52 lacks the HP0521B gene and is therefore omitted from that comparison.
(b) KA and KS values were calculated by the MEGA 2.1 software, using the Kumar et al. (2001) method.
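The KA/KS comparison discussed in Section 3.1 can be checked directly against the values in Table 1. The sketch below simply divides the tabulated KA by KS for each gene; it does not recompute either quantity from sequence data (that was done with MEGA 2.1), and the variable and function names are illustrative.

```python
# (KA, KS) per gene, transcribed from Table 1.
TABLE1 = {
    "HP0520": (0.034, 0.050), "HP0521B": (0.039, 0.041), "HP0522": (0.013, 0.090),
    "HP0523": (0.034, 0.216), "HP0524": (0.007, 0.099), "HP0525": (0.001, 0.048),
    "HP0526": (0.005, 0.036), "HP0527": (0.014, 0.038), "HP0528": (0.007, 0.047),
    "HP0529": (0.003, 0.033), "HP0530": (0.004, 0.051), "HP0531": (0.012, 0.037),
    "HP0532": (0.004, 0.030), "HP0534": (0.013, 0.035), "HP0535": (0.009, 0.036),
    "HP0536": (0.016, 0.055), "HP0537": (0.007, 0.036), "HP0538": (0.019, 0.032),
    "HP0539": (0.015, 0.043), "HP0540": (0.010, 0.045), "HP0541": (0.003, 0.037),
    "HP0542": (0.009, 0.053), "HP0543": (0.014, 0.049), "HP0544": (0.003, 0.052),
    "HP0545": (0.007, 0.053), "HP0546": (0.016, 0.073), "HP0547": (0.058, 0.071),
}


def ka_ks_ratio(ka, ks):
    """Ratio of nonsynonymous to synonymous substitution rates."""
    if ks <= 0:
        raise ValueError("KS must be positive to form a ratio")
    return ka / ks


RATIOS = {gene: ka_ks_ratio(ka, ks) for gene, (ka, ks) in TABLE1.items()}

if __name__ == "__main__":
    for gene, ratio in sorted(RATIOS.items(), key=lambda item: item[1]):
        print(f"{gene:8s} KA/KS = {ratio:.2f}")
```

Sorting by the ratio makes it easy to see which genes sit closest to 1 (HP0521B and HP0547 in this table) and which are most strongly constrained at the amino acid level (HP0525).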

sequence HP0521B (Fig. 2). Additional strains were analysed by PCR, showing that 34/63 (54%) of Swedish strains contained the HP0521B gene. The 3′ ends of both HP0521 and HP0521B were highly variable in the studied strains, with insertions or deletions of one or two bases in several strains. Two strains contained a frameshift that resulted in cotranslation with HP0522, while all the other strains terminated in approximately the same region, although the amino acid sequences differed at the C-terminal end. Cotranslation of HP0521 and HP0522 has previously been described (Azuma et al., 2002).

3.3. Insertion elements in the cag PAI

Four different insertion sequence (IS) elements (IS605–IS608) have been described in H. pylori (Censini et al., 1996; Kersulyte et al., 1998, 2000a, 2002). The IS elements encode two transposases, orfA and orfB, where orfB is homologous in all four IS elements and orfA is homologous in three of them (IS605, IS606 and IS608). In this study, we found an IS606 element in the cag PAI of strain Ca52 that was located between genes HP0523 and HP0524. Because HP0523 and HP0524 are transcribed in opposite directions, both toward the IS element, this insertion most probably does not affect the transcription of the genes. In strain 84-183, an IS606 element is located at a different position in the cag PAI, approximately 700 bp downstream of the 3′ end of the cagA gene (Kersulyte et al., 1998). IS elements are often found close to specific target sequences. The 5′ end of IS606 in strain 84-183 is inserted next to a TTAT motif, while there is no specific target sequence for the 3′ end. The same target sequence was also present in strain Ca52. Both transposases in IS606 showed 96% identity between strain 84-183 and Ca52 on the nucleotide level. On the amino acid level, the identities of orfA and orfB were 97% and 95%, respectively, between the two strains.

3.4. The highly repetitive HP0527 (cag7) gene

The HP0527 protein, a VirB10 homologue, is an outer membrane protein that forms part of the core pilus-like structure of the type IV secretion system (Selbach et al., 2002b). Recently, a model was proposed in which HP0527 serves as a variable antigenic protein located on a novel filamentous surface organelle (Rohde et al., 2003). The HP0527 gene is relatively large (5–6 kb) and includes two repetitive regions (Liu et al., 1999). The first region


Fig. 2. Diversity of the HP0521 locus in a panel of Swedish H. pylori strains. The nucleotide sequences of the different alleles are presented as thick lines, where black regions represent sequences identical or highly similar to jhp0470 (HP0521) in strain J99 while striped regions are completely different from J99 and thus represent a novel gene (HP0521B). Thin arrows mark predicted translated regions. The previously described gene HP0521 can be divided into two variants depending on the absence or presence of two major deletions in the nucleotide sequence compared to strain J99 (type HP0521:1 vs. HP0521:2–4). Furthermore, the gene products of HP0521 or HP0521B can be subdivided into a number of categories (HP0521:1–4, HP0521B:1–2) based on the presence of frameshifts in the genes. Some strains lacked a stop codon between HP0521 and HP0522 (type HP0521:4), and because the ORFs were in the same reading frame, this probably results in the cotranslation of the adjacent genes. The new allele, HP0521B, was discovered in half of the analyzed strains and the proportion of strains that falls in each category is shown at the far right (total number of strains is 26).

Fig. 3. The two ends flanking the unresolved intergenic region between HP0546 and HP0547 in strain Du52:2. This complex insertion probably consists partly of duplications of regions from the cag PAI and the surrounding genes, as well as of rearrangements of chromosomal regions without duplication. Boxes correspond to ORFs or other motifs while narrow lines correspond to connecting sequences. The junctions between the cag PAI sequence and sequence corresponding to other regions of the H. pylori genome are located at the arrows. (A) The 5′ end was similar to HP0510 and HP0509. (B) The 3′ end showed similarity to the corresponding region of strain ATCC 43504 (bold line) and contained gene HP0511, a duplication of motif 1a and a mini IS605.


consists of one and a half or two and a half units of 130 bp each. Both strains from the cancer patients (Ca52 and Ca73) together with strain 26695 have two and a half copies, while the strains from the duodenal ulcer patients (Du23:2 and Du52:2) and J99 have one and a half copies. The second repetitive region is approximately 2.7 kb in strain 26695 and consists of 74 segments composed of six different elements (a, h, E, A, y, q), 15–42 bp long. This complex structure makes sequencing with specific primers impossible because the repeat region is too long to sequence through in one reaction, and multiple priming sites cannot be avoided within the repetitive region. All our strains are similar to J99 and 26695 in this region when sequenced as far as possible directly on genomic material.

3.5. Major insertion/duplication in Du52:2

In the region between HP0546 and HP0547, an extensive insertion/duplication was found in strain Du52:2. The insertion was larger than 4 kb and no PCR reaction was successful in spanning it. The 5′ region of this insertion was similar to genes HP0509 and HP0510 in strain 26695 (Fig. 3A). The 3′ region was even more complex (Fig. 3B). Apart from 158 bp, it was similar to strain ATCC 43504 (McGee et al., 1999) and included a mini IS605, a partial duplication of motif 1a in the variable region downstream of HP0547 (see Section 3.7) and a section corresponding to gene HP0511 in strain 26695. The duplicated motif 1a had truncations of 64 bp at the 5′ end and 105 bp at the 3′ end when compared to the corresponding region between HP0547 and HP0549 of the same strain (Fig. 1), but the duplicated 633 bp showed 100% identity. Collectively, these duplication and insertion events showed that a major rearrangement had occurred, and remnants of both IS605 and IS606 (in motif 1a) were found. This suggests that the cag PAI was split in two. However, strain Du52:2 still translocated CagA as well as induced an IL-8 response in host cells (Fig. 4C and D).

Fig. 4. (A) Structure of the variable 3′ region of CagA. The EPIYA motif is marked by a broad black band and the Western repeat region is grey. Strains Ca52, Ca73 and Du23:2 have one repeat unit while Du52:2 has five repeat units. (B) PCR products over the variable region. Lane 1: Ca52, lane 2: Ca73, lane 3: Du23:2 and lane 4: Du52:2. (C) Western blot analysis of CagA translocation into AGS cells. Upper panel: anti-CagA and lower panel: antiphosphotyrosine. Lane 1: 26695, lane 2: J99, lane 3: Ca52, lane 4: Ca73, lane 5: Du23:2, lane 6: Du52:2, lane 7: 67:20 (cag PAI negative), lane 8: 67:21 (cag PAI positive) and lane 9: cells-only control. (D) Induction of IL-8 secretion in AGS cells incubated with the indicated strains for 6 h. Samples were run in triplicate and the results are presented as the average of four independent experiments. Error bars show standard errors.


3.6. Diversity of the repetitive region in the HP0547 (cagA) gene

The most studied gene in the cag PAI is cagA or HP0547. CagA is translocated into host cells via the type IV secretion system, encoded by the cag PAI (Stein et al., 2000). Inside the host cells, CagA is phosphorylated on one or several tyrosine residues located in EPIYA motifs (Higashi et al., 2002a). A repetitive region is located close to the 3′ end of cagA (Evans et al., 1998) and the structure of this region of the gene is distinct between Western and East Asian strains (Yamaoka et al., 1999; Evans and Evans, 2001). In Western strains, two EPIYA motifs are found outside of the repetitive region and one or more EPIYA motifs are located within the repetitive region. All of these motifs are possible targets for host cell kinases. Most strains carry one copy of the repetitive unit, but strains with up to seven units have been reported (Evans et al., 1998; Yamaoka et al., 1999; Higashi et al., 2002a; Stein et al., 2002). There is also evidence that the number of repetitive units can vary between subclones of specific strains (Yamaoka et al., 1999).

The four strains in this study all harboured the Western repeat region. One strain, Du52:2, possessed five Western repeat units, while the other three contained one Western repeat unit (Fig. 4A and B). The number of repeats may have an effect on the biological activity of the protein because EPIYA motifs are located in this region. It has previously been shown that a greater number of repeats gives a higher degree of morphological changes in host cells (Stein et al., 2002). The five amino acids following the tyrosine have an effect on the phosphorylation, with the preferred amino acid sequence pY-(S/T/A/V/I)-X-(V/I/L)-X-(W/F) (Higashi et al., 2002a). In Ca73, a glutamic acid was located at the fifth position after the tyrosine, where aspartic acid normally resides in Western strains. The effect of this substitution is unclear because neither of these amino acids is among the preferred residues at this position.

The four strains harboured a functional cag PAI because they all expressed and translocated the CagA protein into AGS cells (Fig. 4C, upper), where the protein was subsequently tyrosine phosphorylated, as determined by Western blot analysis (Fig. 4C, lower). Furthermore, all four strains induced IL-8 secretion in AGS cells (Fig. 4D).

3.7. Variable 3′ region of the cag PAI

The 3′ end of the cag PAI has previously been reported to be highly variable (Kersulyte et al., 2000b). The different motifs (Ia–c, II, IIIa–b, IV and V) consist of truncated parts of IS606, mini IS605 and a homologue to a helicase gene. All our strains contained different motifs (marked yellow in Fig. 1) and can be assigned to the motifs described previously (Kersulyte et al., 2000b).
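The EPIYA analysis in Section 3.6 amounts to locating each motif in the CagA amino acid sequence and inspecting the five residues that follow the tyrosine. A minimal sketch of that bookkeeping is given below. It is not the procedure used in this study; the function name is illustrative and the example sequence is a made-up toy fragment, not sequence data from any of the strains.

```python
import re


def find_epiya_sites(protein):
    """Locate EPIYA motifs in a protein sequence.

    Returns a list of (motif_start, downstream) tuples, where motif_start is
    the 0-based index of the motif's first residue and downstream holds up to
    five residues following the tyrosine (the A of EPIYA is the first).
    """
    sites = []
    for match in re.finditer("EPIYA", protein):
        tyrosine_index = match.start() + 3
        downstream = protein[tyrosine_index + 1:tyrosine_index + 6]
        sites.append((match.start(), downstream))
    return sites


if __name__ == "__main__":
    # Hypothetical toy fragment for illustration only.
    toy_fragment = "MKEPIYAQVAKKEPIYATIDDLGGEPIYAKV"
    for start, downstream in find_epiya_sites(toy_fragment):
        print(f"EPIYA at position {start}, residues after Y: {downstream}")
```

With real sequences, the fifth residue in each downstream window is the position at which Ca73 carries glutamic acid instead of the aspartic acid usually found in Western strains.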

4. Conclusions

We have shown that even in a comparatively well-studied region of the H. pylori genome, like the cag PAI, there are still issues to consider in terms of gene variability. A novel gene present in approximately half of the Swedish strains has been presented. Further studies are needed to unravel the function of this gene. We have also detected an IS606 element at a new location in the cag PAI in a strain from a cancer patient, as well as a major rearrangement containing insertions and/or duplications in a strain from a duodenal ulcer patient. The increased amount of information regarding the variability of different genes in the cag PAI gives new insights concerning the differences in evolutionary pressure on these genes.

Acknowledgements

This work was supported by the Foundation for Strategic Research and the Swedish Research Council.

References Alm, R.A., et al., 1999. Genomic-sequence comparison of two unrelated isolates of the human gastric pathogen Helicobacter pylori. Nature 397, 176 – 180. Amieva, M.R., Vogelmann, R., Covacci, A., Tompkins, L.S., Nelson, W.J., Falkow, S., 2003. Disruption of the epithelial apical – junctional complex by Helicobacter pylori CagA. Science 300, 1430 – 1434. Azuma, T., Yamakawa, A., Yamazaki, S., Fukuta, K., Ohtani, M., Ito, Y., Dojo, M., Yamazaki, Y., Kuriyama, M., 2002. Correlation between variation of the 3V region of the cagA gene in Helicobacter pylori and disease outcome in Japan. J. Infect. Dis. 186, 1621 – 1630. Bjo¨rkholm, B., Lundin, A., Sille´n, A., Guillemin, K., Salama, N., Rubio, C., Gordon, J.I., Falk, P., Engstrand, L., 2001. Comparison of genetic divergence and fitness between two subclones of Helicobacter pylori. Infect. Immun. 69, 7832 – 7838. Censini, S., Lange, C., Xiang, Z., Crabtree, J.E., Ghiara, P., Borodovsky, M., Rappuoli, R., Covacci, A., 1996. cag, a pathogenicity island of Helicobacter pylori, encodes type I-specific and disease-associated virulence factors. Proc. Natl. Acad. Sci. U. S. A. 93, 14648 – 14653. ˚ kerlund, T., Sille´n, A., Engstrand, L., 2000. Clustering of Enroth, H., A clinical strains of Helicobacter pylori analyzed by two-dimensional gel electrophoresis. Clin. Diagn. Lab. Immunol. 7, 301 – 306. Evans Jr., D.J., Evans, D.G., 2001. Helicobacter pylori CagA: analysis of sequence diversity in relation to phosphorylation motifs and implications for the role of CagA as a virulence factor. Helicobacter 6, 187 – 198. Evans Jr., D.J., Queiroz, D.M., Mendes, E.N., Evans, D.G., 1998. Diversity in the variable region of Helicobacter pylori cagA gene involves more than simple repetition of a 102-nucleotide sequence. Biochem. Biophys. Res. Commun. 245, 780 – 784. Ewing, B., Hillier, L., Wendl, M.C., Green, P., 1998. Base-calling of automated sequencer traces using phred: I. Accuracy assessment. Genome Res. 8, 175 – 185. 
Higashi, H., Tsutsumi, R., Fujita, A., Yamazaki, S., Asaka, M., Azuma, T., Hatakeyama, M., 2002a. Biological activity of the Helicobacter pylori virulence factor CagA is determined by variation in the tyrosine phosphorylation sites. Proc. Natl. Acad. Sci. U. S. A. 99, 14428–14433.
Higashi, H., Tsutsumi, R., Muto, S., Sugiyama, T., Azuma, T., Asaka, M., Hatakeyama, M., 2002b. SHP-2 tyrosine phosphatase as an intracellular target of Helicobacter pylori CagA protein. Science 295, 683–686.
Kersulyte, D., Akopyants, N.S., Clifton, S.W., Roe, B.A., Berg, D.E., 1998. Novel sequence organization and insertion specificity of IS605 and IS606: chimaeric transposable elements of Helicobacter pylori. Gene 223, 175–186.
Kersulyte, D., Mukhopadhyay, A.K., Shirai, M., Nakazawa, T., Berg, D.E., 2000a. Functional organization and insertion specificity of IS607, a chimeric element of Helicobacter pylori. J. Bacteriol. 182, 5300–5308.
Kersulyte, D., et al., 2000b. Differences in genotypes of Helicobacter pylori from different human populations. J. Bacteriol. 182, 3210–3218.
Kersulyte, D., Velapatino, B., Dailide, G., Mukhopadhyay, A.K., Ito, Y., Cahuayme, L., Parkinson, A.J., Gilman, R.H., Berg, D.E., 2002. Transposable element ISHp608 of Helicobacter pylori: nonrandom geographic distribution, functional organization, and insertion specificity. J. Bacteriol. 184, 992–1002.
Kim, C.C., Joyce, E.A., Chan, K., Falkow, S., 2002. Improved analytical methods for microarray-based genome-composition analysis. Genome Biol. 3, research0065.1–0065.17.
Kumar, S., Tamura, K., Jakobsen, I.B., Nei, M., 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17, 1244–1245.
Liu, G., McDaniel, T.K., Falkow, S., Karlin, S., 1999. Sequence anomalies in the Cag7 gene of the Helicobacter pylori pathogenicity island. Proc. Natl. Acad. Sci. U. S. A. 96, 7011–7016.
McGee, D.J., May, C.A., Garner, R.M., Himpsl, J.M., Mobley, H.L., 1999. Isolation of Helicobacter pylori genes that modulate urease activity. J. Bacteriol. 181, 2477–2484.
Mimuro, H., Suzuki, T., Tanaka, J., Asahi, M., Haas, R., Sasakawa, C., 2002. Grb2 is a key mediator of Helicobacter pylori CagA protein activities. Mol. Cell 10, 745–755.
Nilsson, C., Sillén, A., Eriksson, L., Strand, M.L., Enroth, H., Normark, S., Falk, P., Engstrand, L., 2003. Correlation between cag pathogenicity island composition and Helicobacter pylori-associated gastroduodenal disease. Infect. Immun. 71, 6573–6581.
Peek Jr., R.M., Blaser, M.J., 2002. Helicobacter pylori and gastrointestinal tract adenocarcinomas. Nat. Rev. Cancer 2, 28–37.


Rohde, M., Püls, J., Buhrdorf, R., Fischer, W., Haas, R., 2003. A novel sheathed surface organelle of the Helicobacter pylori cag type IV secretion system. Mol. Microbiol. 49, 219–234.
Schwartz, S., Zhang, Z., Frazer, K.A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R., Miller, W., 2000. PipMaker – a web server for aligning two genomic DNA sequences. Genome Res. 10, 577–586.
Segal, E.D., Lange, C., Covacci, A., Tompkins, L.S., Falkow, S., 1997. Induction of host signal transduction pathways by Helicobacter pylori. Proc. Natl. Acad. Sci. U. S. A. 94, 7595–7599.
Selbach, M., Moese, S., Hauck, C.R., Meyer, T.F., Backert, S., 2002a. Src is the kinase of the Helicobacter pylori CagA protein in vitro and in vivo. J. Biol. Chem. 277, 6775–6778.
Selbach, M., Moese, S., Meyer, T.F., Backert, S., 2002b. Functional analysis of the Helicobacter pylori cag pathogenicity island reveals both VirD4-CagA-dependent and VirD4-CagA-independent mechanisms. Infect. Immun. 70, 665–671.
Selbach, M., Moese, S., Hurwitz, R., Hauck, C.R., Meyer, T.F., Backert, S., 2003. The Helicobacter pylori CagA protein induces cortactin dephosphorylation and actin rearrangement by c-Src inactivation. EMBO J. 22, 515–528.
Staden, R., Beal, K.F., Bonfield, J.K., 2000. The Staden package, 1998. Methods Mol. Biol. 132, 115–130.
Stein, M., Rappuoli, R., Covacci, A., 2000. Tyrosine phosphorylation of the Helicobacter pylori CagA antigen after cag-driven host cell translocation. Proc. Natl. Acad. Sci. U. S. A. 97, 1263–1268.
Stein, M., Bagnoli, F., Halenbeck, R., Rappuoli, R., Fantl, W.J., Covacci, A., 2002. c-Src/Lyn kinases activate Helicobacter pylori CagA through tyrosine phosphorylation of the EPIYA motifs. Mol. Microbiol. 43, 971–980.
Tomb, J.F., et al., 1997. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388, 539–547.
Yamaoka, Y., El-Zimaity, H.M., Gutierrez, O., Figura, N., Kim, J.G., Kodama, T., Kashima, K., Graham, D.Y., Kim, J.K., 1999. Relationship between the cagA 3′ repeat region of Helicobacter pylori, gastric histology, and susceptibility to low pH. Gastroenterology 117, 342–349.