International Journal of Food Microbiology 74 (2002) 217–227 www.elsevier.com/locate/ijfoodmicro
A bacteriophage reagent for Salmonella: molecular studies on Felix 01
Jonathan Kuhn a,*, Mordechai Suissa b, David Chiswell c, Aviva Azriel d, Bluma Berman a, Dina Shahar d, Sarah Reznick b, Rekefet Sharf d, Joseph Wyse b, Tamar Bar-On b, Ilana Cohen a, Rachel Giles b, Irit Weiser b, Sharon Lubinsky-Mink a, Shimon Ulitzur d
a Department of Biology, Technion, Israel Institute of Technology, 32000 Haifa, Israel
b Biolume, Haifa, Israel
c Amersham International, Little Chalfont, UK
d Department of Food Engineering and Biotechnology, Technion, Israel Institute of Technology, 32000 Haifa, Israel
Accepted 1 July 2001
Abstract
Felix 01 (F01) is a bacteriophage, originally isolated by Felix and Callow, that lyses almost all Salmonella strains and has been widely used as a diagnostic test for this genus. Molecular information about this phage is entirely lacking. In the present study, the DNA of the phage was found to be a double-stranded linear molecule of about 80 kb. A segment of 11.5 kb has been sequenced, and in this region the A + T content is 60%. There are relatively few restriction endonuclease cleavage sites in the native genome, and clones show that this is due to the absence of the sites rather than to modification. A restriction map of the genome has been constructed. The ends of the molecule cannot be ligated, although they carry 5′ phosphates. At least 60% of the genome must encode proteins. In the sequenced portion, many open reading frames exist and these are tightly packed together. These have been examined for homology to published proteins, but only one of the 17 shows similarity to known proteins. F01 is therefore the prototype of a new phage family. On the basis of restriction sites, codon usage and the distribution of nonsense codons in the unused reading frames, a strong case can be made for natural selection acting on mRNA structure and function.
© 2002 Elsevier Science B.V. All rights reserved.
Keywords: Salmonella detection; Bacteriophage Felix 01; Codon usage; Out-of-frame nonsense codons; Felix 01 DNA
1. Introduction
Felix 01 is a virulent bacteriophage that was first isolated by Felix and Callow (1943). At high titer, the phage lyses a very high percentage of Salmonellae and a few other enteric bacterial strains (Cherry et al., 1954). Since the most important characteristic in the development of a phage reagent for bacterial detection is the host range of the phage, Felix 01 is the obvious choice with regard to Salmonella. The phage, hereafter referred to as F01, has in fact found use as a diagnostic agent.
* Corresponding author. Tel.: +972-4-829-3455; fax: +972-4831-1284. E-mail address: [email protected] (J. Kuhn).
0168-1605/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0168-1605(01)00682-1
J. Kuhn et al. / International Journal of Food Microbiology 74 (2002) 217–227
The creation of a phage reagent requires some basic molecular information about the phage (or phages) that compose it. The type of nucleic acid (RNA or DNA), whether it is single stranded or double stranded, the presence and amount of unusual bases, and glycosylation, if any, are of obvious importance in this regard. The ends of phage DNA can be either constant or variable, which usually depends on the packaging mechanism of the particular phage type; variable ends are often the result of headful packaging. The existence of constant ends is advantageous both for making a restriction enzyme cleavage map and for isolating deletions. Given that the genome is double-stranded DNA, cloning of some segments can be followed by sequence determination. When coupled with analysis of the synthesis of phage proteins, this gives an adequate picture of some of the genes. The production of a recombinant phage reagent requires neither the elucidation of all the phage functions nor the sequence of the entire genome, since the method involves substituting reporter genes for a small region of the phage DNA. Some studies on the dimensions of the phage particle and on the host range of this phage have been carried out. In addition, investigations of the structure and chemistry of the lipopolysaccharide (LPS) complex of Salmonella have used Felix 01 as a probe. However, very little is known about the phage itself. F01 has been viewed in the electron microscope by Ackermann and Nguyen (1983) and by Lindberg (1967). The micrographs show a hexagonal, polyhedral head whose diameter is reported to be 72 nm (Ackermann and Nguyen, 1983) or 60 nm (Lindberg, 1967). The capsid is connected to a contractile sheath whose dimensions are 113 × 16 nm (Ackermann and Nguyen, 1983) or 90–100 nm (Lindberg, 1967). Sheath length after contraction is 50 nm.
The virion contains a base plate with six tail fibers and adsorbs to isolated LPS units of 200–300 Å length whose molecular weight is 500,000–1,000,000 Da (Lindberg, 1967). Many laboratories have elucidated the structure of the core polysaccharide of the LPS complex and its attached O chains. Since F01 attaches to the core polysaccharide, this phage has been invaluable in such studies (Wilkinson and Stocker, 1968; Lindberg and Holme, 1969; Wright, 1971). Lindberg (1967) isolated rough mutants of S. minnesota. One strain adsorbed F01 while another did not. Both have only the core polysaccharide, and the difference between them is that the adsorbing strain contains a terminal N-acetylglucosamine moiety. LPS was isolated from these strains, and the material from the adsorbing strain inactivated the phage, in contrast to that from the non-adsorbing strain. The LPS from the former strain could be dissociated with sodium deoxycholate and then reconstituted after its removal. These experiments demonstrated that the core polysaccharide is sufficient for adsorption and that the N-acetylglucosamine is a critical component of the F01 receptor. These studies were extended to the other Salmonellae (Lindberg and Holme, 1969; Lindberg and Hellerqvist, 1971), and it was found that the O side chains sometimes sterically interfere with the adsorption of the phage. For example, S. anatum is normally resistant to F01, but McConnell and Schoelz (1983) were able to isolate a variant of this strain that had become sensitive because its O chains had become shorter. In a further communication (McConnell and Wright, 1979), it was found that growth at lower temperatures greatly increased the sensitivity of wild-type S. anatum to F01 and changed its LPS profile. Wright (1971) found that host range mutants of F01 could be isolated on S. anatum and used these to investigate O antigen conversion by the ε15 and ε34 temperate phages. The present communication reports on the characterization of F01 at the molecular level and, in particular, on those aspects that are important for developing a phage reagent as described in the accompanying article. Although characterization is by no means complete, it is clear that F01 represents a new phage type with respect to both its DNA and protein sequences.
2. Methods
2.1. Cloning and DNA sequencing
The DNA from the mature virion was isolated (see accompanying article) and cleaved with various enzymes, and the fragments were ligated to a suitable vector. The clones are shown in Fig. 1, which gives their approximate map positions within the chromosome. For DNA sequencing, the cloned fragments were sheared, filled in with the Klenow fragment of DNA polymerase and subcloned into M13 vectors. Sequencing was performed by the dideoxy method. Most regions have been sequenced multiple times, as the library of subclones contains random, overlapping fragments.
Fig. 1. Clones of F01. The enzymes used in cloning various segments are given. A number of clones containing small segments are not included. The sequenced segment is in the 55 to 70 kb region.
2.2. Labeling and analysis of F01-encoded proteins
The method was a modification of one developed by Studier for phage T7. Modification was necessary because F01 adsorbs rather poorly in his medium. The following medium was found to be suitable for F01: (A) 10× MOPS: 40 ml of 1 M KMOPS pH 7.4; 4 ml of 1 M tricine-KOH pH 7.4; 1 ml of ferric sulfate n-hydrate (Baker), 5.6 mg/20 ml; 5 ml of 1.9 M NH4Cl; 1 ml of K2SO4, 4.8 g/100 ml; 10 ml of 5 M NaCl; 10 ml of MgCl2, 10.7 g/100 ml; 10 ml of 0.5 mM CaCl2; 1 ml of micronutrients; and H2O to 100 ml. Micronutrients contained per 100 ml: 3.7 mg ammonium molybdate tetrahydrate, 2.5 mg cupric sulfate pentahydrate, 1.86 mg boric acid, 1.58 mg manganese chloride tetrahydrate, 0.72 mg cobalt chloride hexahydrate and 0.29 mg zinc sulfate heptahydrate. (B) Phosphate solution: 0.132 M K2HPO4. (C) Glucose, 40%. (D) Difco methionine assay medium. The labeling medium contained, per liter, 100 ml of 10× MOPS, 10 ml of phosphate solution, 10 ml of glucose and 10 g of Difco methionine assay medium. The host cells were grown to logarithmic phase at 37 °C and then infected. The background of cellular proteins was reduced by UV irradiating the cells prior to phage infection. At various times after infection, [35S]methionine was added for 2 min, and LB was then added to prevent further labeling. Four minutes later the cells were chilled and pelleted by centrifugation. The cell pellet was resuspended in sample preparation buffer (Bollag et al., 1996) and placed in boiling water for 3 min. Uninfected cells served as a control. Mature virions were labeled by the same procedure except that labeling was continued until lysis. Unlabeled carrier phage was added to the lysate, and the phage was precipitated with polyethylene glycol and then purified by CsCl equilibrium density gradient centrifugation.
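As a check on the recipe above, the final concentrations in the labeling medium follow from C1V1 = C2V2 dilution arithmetic. The Python sketch below assumes ideal, additive volumes, a 10× MOPS stock made up to exactly 100 ml and a 40% (w/v) glucose stock; MgCl2 and ferric sulfate are left out because their hydration state, and hence molar mass, is not specified. The helper names are illustrative only.

```python
"""Final concentrations in the F01 labeling medium (Section 2.2 recipe).

Sketch only: assumes additive volumes, a 10x MOPS stock made up to
exactly 100 ml, and 40% (w/v) glucose.
"""


def diluted(stock_conc, volume_added_ml, final_volume_ml):
    """C1*V1 = C2*V2: concentration after diluting a stock into a final volume."""
    return stock_conc * volume_added_ml / final_volume_ml


# Components of the 10x MOPS stock that are given as molar stocks:
# name -> (stock concentration in mM, ml added per 100 ml of 10x stock)
TEN_X_MOPS = {
    "KMOPS": (1000.0, 40.0),
    "tricine-KOH": (1000.0, 4.0),
    "NH4Cl": (1900.0, 5.0),
    "NaCl": (5000.0, 10.0),
    "CaCl2": (0.5, 10.0),
}


def labeling_medium_mM():
    """Return final mM concentrations in 1 liter of labeling medium."""
    final = {}
    for name, (stock_mM, ml) in TEN_X_MOPS.items():
        in_10x = diluted(stock_mM, ml, 100.0)          # concentration in the 10x stock
        final[name] = diluted(in_10x, 100.0, 1000.0)   # 100 ml of 10x stock per liter
    final["K2HPO4"] = diluted(132.0, 10.0, 1000.0)     # 10 ml of 0.132 M per liter
    return final


if __name__ == "__main__":
    for name, mM in labeling_medium_mM().items():
        print(f"{name:12s} {mM:8.3f} mM")
    print(f"{'glucose':12s} {diluted(40.0, 10.0, 1000.0):8.3f} % (w/v)")
```

Under these assumptions the medium contains 40 mM KMOPS, 4 mM tricine, 9.5 mM NH4Cl, 50 mM NaCl and 1.32 mM K2HPO4, with glucose at 0.4% (w/v).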
3. Results
3.1. Phage growth
A particularly good host for F01 is strain K772 of Salmonella. A one-step growth curve was obtained by infecting exponentially growing cells (7 × 10⁸/ml) in LB medium at 37 °C with the phage (10⁹ PFU/ml), incubating for 10 min for adsorption and then diluting the suspension 10,000-fold in the same pre-warmed medium. Aliquots were taken for plating every 5 min. The rise period began after 20 min, and the titer reached a maximum after 60 min. The average burst size was 56.
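The infection conditions in this section can be put in Poisson terms. The sketch below assumes that the quoted phage and cell concentrations (10⁹ PFU/ml and 7 × 10⁸ cells/ml) are both final concentrations in the adsorption mixture and that phage attach to cells independently; it then applies the same Poisson model to the single-burst experiment reported below (66 of 98 tubes without phage). It illustrates standard phage arithmetic and is not necessarily the calculation the authors performed.

```python
"""Poisson arithmetic for the F01 one-step and single-burst experiments.

Assumes both concentrations are final values in the adsorption mixture and
that infections are independent (Poisson-distributed).
"""
import math


def multiplicity_of_infection(pfu_per_ml, cells_per_ml):
    """Average number of phage per cell (MOI)."""
    return pfu_per_ml / cells_per_ml


def fraction_uninfected(moi):
    """Poisson P(0): fraction of cells that receive no phage at a given MOI."""
    return math.exp(-moi)


def single_burst_stats(empty_tubes, total_tubes):
    """Estimate infected cells per tube (m) from the fraction of empty tubes.

    Also returns the fraction of phage-positive tubes expected to have
    received exactly one infected cell, i.e. P(1) / P(>=1).
    """
    p0 = empty_tubes / total_tubes
    m = -math.log(p0)
    p1 = m * math.exp(-m)
    return m, p1 / (1.0 - p0)


if __name__ == "__main__":
    moi = multiplicity_of_infection(1e9, 7e8)
    print(f"MOI = {moi:.2f}, uninfected fraction = {fraction_uninfected(moi):.2f}")
    m, single = single_burst_stats(66, 98)
    print(f"infected cells per tube = {m:.2f}; positive tubes with one burst = {single:.2f}")
```

With these inputs the MOI is about 1.4, leaving roughly a quarter of the cells uninfected under the stated assumption, and the single-burst tubes received on average about 0.40 infected cells, so roughly 82% of the 32 phage-positive tubes would be expected to contain exactly one burst.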
Chloroformed aliquots showed that the eclipse period is 18 min under these conditions. A single-burst analysis, in which 66 of 98 tubes did not yield phage, gave an average burst size of 39. F01 failed to exclude phage P22, which had been induced before or after F01 infection. However, F01 does exclude another Salmonella phage isolated by us, named LOL1. In addition, when LT2 was made Lac+ by introduction of the transferable plasmid F′ lac-pro kanR (which carries the lac operon and confers resistance to kanamycin at 30 µg/ml), F01 was found to depress the induction of β-galactosidase during its infective cycle.
3.2. Isolates of F01 from nature
Among a number of our isolates of phages that grow on Salmonella, two were found to be related to F01. After cleavage with a number of enzymes, one of these (F44) showed restriction fragment patterns identical to those of F01. The second (F41) was similar to F01, but some digestions (EcoRV, HpaI, ScaI and SspI) revealed differences. Two radioactive nucleic acid probes were prepared: a 3.5-kb BstXI fragment from F41 and a 1.3-kb HpaI fragment from F44. After labeling by nick translation, these were used in hybridization experiments with DNA from the two phages cleaved with BstEII, BstXI, EcoRV, HpaI, ScaI or XmnI. The two probes were found to come from the same region of the phage chromosome, and differences between the two strains were found with EcoRV, HpaI and XmnI. These results indicate that phage F41 is closely related to F01 and can be considered a variant rather than a discrete type. It therefore seems that F01 is not a particularly rare type among natural phage isolates on Salmonella.
3.3. Characterization of F01 DNA
The genome of F01 is a single molecule with a length of about 80 kb. As expected for such a large phage genome, the chromosome is double-stranded DNA.
The length estimate comes both from summation of the fragment lengths generated by digestion with various restriction endonucleases and from electron microscopic measurement. Using ΦX174 DNA as a reference, electron micrographs showed the molecule to be linear and approximately 80 kb long. While less accurate, a summation of the lengths of fragments generated by restriction enzymes gave a similar estimate of genome length. The DNA either does not contain unusual bases or they are very rare, because restriction enzyme cleavage of DNA extracted from the virions was identical to that of the corresponding cloned segments isolated from Escherichia coli. In particular, HindIII, whose recognition site (AAGCTT) contains all four bases, cleaves DNA isolated from virions, and no additional HindIII sites were found in cloned segments. This was also true for the many other enzymes examined. It is therefore unlikely that an unusual base exists at a moderate frequency. The 11.5-kb section that was sequenced (Kuhn et al., 1998) had an A + T content of 60%. We have not determined whether the rest of the genome has a similar composition.
3.4. Restriction enzyme analysis
Purified virion DNA was cleaved with many different restriction enzymes. The results of these digestions are given in Table 1. A restriction map for some of the enzyme sites is presented in Fig. 2. Restriction enzymes that cut many times are not included; the enzymes chosen for mapping are those that cleave fewer than 12 times. The map was constructed using both single and double digestions. Some of the digestions were subsequently hybridized to isolated, end-labeled restriction fragments of the phage DNA. The ClaI end fragments were used for final orientation and for preparing a detailed map of the ends of the phage.
Table 1 Restriction endonucleases and cleavage of F01
Enzymes that do not cleave: ApaI, ApaLI, AvaI, BalI, BamHI, BanI, BanII, BglI, BglII, BssHII, EagI, EcoRI, HphI, KpnI, MboI, MluI, NaeI, NarI, NcoI, NheI, NotI, NruI, NsiI, PstI, PvuI, PvuII, SalI, Sau3AI, SfiI, SmaI, SphI, SstI, SstII, StuI, StyI, XhoI
Enzymes that cleave: AatII, AccI, AluI, AvaII, BsmI, BstEII, BstNI, BstXI, ClaI, DdeI, DraI, EcoRV, FnuDII, Fnu4HI, FspI, HaeIII, HgaI, HhaI, HincII, HindIII, HinfI, HpaI, HpaII, MaeI, NciI, NdeI, RsaI, ScaI, ScrFI, SfaNI, SnaBI, SpeI, SspI, TaqI, XbaI, XmnI
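The expectation of 288 Sau3AI sites discussed in this section follows from treating the genome as a random sequence with 60% A + T, so that each A or T occurs with probability 0.30 and each G or C with probability 0.20. The sketch below applies that independent-base model to a few recognition sequences of enzymes in Table 1; the model ignores any sequence structure beyond base composition, and the function names are illustrative.

```python
"""Expected recognition-site counts in random DNA of given A+T content.

Independent-base model: P(A) = P(T) = AT/2 and P(G) = P(C) = (1 - AT)/2.
For a palindromic site, the count on one strand equals the number of
double-stranded cleavage sites.
"""
import math


def base_probabilities(at_fraction):
    """Per-base probabilities for a given A+T fraction."""
    at, gc = at_fraction / 2.0, (1.0 - at_fraction) / 2.0
    return {"A": at, "T": at, "G": gc, "C": gc}


def site_probability(site, at_fraction):
    """Probability that a given window matches the site exactly."""
    probs = base_probabilities(at_fraction)
    return math.prod(probs[base] for base in site)


def expected_sites(site, length_bp, at_fraction):
    """Expected number of occurrences on one strand of a linear molecule."""
    return site_probability(site, at_fraction) * (length_bp - len(site) + 1)


# Recognition sequences of some enzymes listed in Table 1.
SITES = {
    "Sau3AI/MboI": "GATC",
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "EcoRV": "GATATC",
    "DraI": "TTTAAA",
}

if __name__ == "__main__":
    for enzyme, site in SITES.items():
        whole = expected_sites(site, 80_000, 0.60)
        sequenced = expected_sites(site, 11_500, 0.60)
        print(f"{enzyme:12s} {site:7s} 80 kb: {whole:6.1f}   11.5 kb: {sequenced:5.1f}")
```

For GATC the model gives about 288 sites in 80 kb and about 41 in 11.5 kb, matching the figures in the text; Table 1 nevertheless lists both Sau3AI and MboI, which recognize this site, among the enzymes that do not cleave F01 DNA.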
To prepare the detailed map of the ends, the DNA ends were labeled and then cleaved with various frequently cutting restriction endonucleases, and the lengths of the resulting bands were determined. Assignment to the respective ends of F01 was carried out using isolated ClaI fragments.
Table 1 shows that a surprising number of enzymes do not cleave this DNA. The enzymes that do not cut are mostly those whose recognition sites consist mainly of G and C. A number of restriction enzymes whose recognition site is in effect five bases long also did not cut the F01 chromosome, and all of them have a majority of G and C. This suggests that the G + C content of the entire genome is not much different from that of the sequenced region. Table 2 presents the observed four- and six-base palindromes in the region whose sequence was determined. While there are fewer palindromes than expected overall, the relative lack of GC-rich palindromes is striking.
In Table 1, it is particularly striking that Sau3AI does not cleave F01 DNA, since 288 cleavages are expected for a DNA of this length and A + T content. There are several possible explanations for the lack of Sau3AI cleavage. The first is that the mature DNA is methylated at the cytosine of the GATC sequence, which is known to prevent cleavage by Sau3AI. However, cloning and sequencing of 11.5 kb of F01 DNA did not reveal any Sau3AI (GATC) sequences, even though about 40 such sites would be expected. A second possibility is that the entire 80 kb of F01 DNA simply does not contain this sequence. This might result from the phage eliciting, during infection, a restriction endonuclease specific for this sequence, which could be used to destroy the host DNA. However, through cloning and recombination, we have created a phage that carries BamHI (GGATCC) and Sau3AI sites, and this phage shows no ill effects. The phage DNA containing these sites is cut by both BamHI and Sau3AI. It must be concluded that F01 simply has no Sau3AI sites, and one must look elsewhere for the biological basis of their absence.
Fig. 2. A partial restriction map of F01. The sites of some restriction endonucleases are presented. For the enzymes in the lower part of the figure (BstEII, EcoRV, HaeIII, HindIII, HpaI, ScaI), only the cleavage sites nearest the phage ends are given.
3.5. Analysis of the ends of F01 DNA
As mentioned previously, the ends of F01 can be labeled with [γ-32P]ATP and polynucleotide kinase after prior dephosphorylation with alkaline phosphatase. Cleavage of the labeled DNA with a number of enzymes gave two discrete, sharp bands, which indicates that terminal redundancy is unlikely. Phages such as λ have DNA with discrete ends bearing complementary single-strand extensions; these cohesive ends promote ligation and the formation of circular molecules after injection. To examine this possibility in F01, the labeled HindIII end fragments were isolated. As shown in Fig. 3, the ends of F01 are not substrates for T4 DNA ligase: after ligation, all the ligated molecules could be recleaved by HindIII, so joining had occurred only through the HindIII ends and not between the natural F01 termini. Attempts to clone the end fragments after digestion with different enzymes were also unsuccessful.
Table 2 Perfect palindromes in the sequenced region
Length   AT  GC   Exp   Obs   Obs/Exp
4        4   0    372   251   0.67
4        2   2    331   258   0.78
4        0   4     74    22   0.30
Total 4           777   531   0.68
6        6   0     67    40   0.60
6        4   2     89    66   0.74
6        2   4     39     1   0.03
6        0   6      6     0   0.00
Total 6           201   107   0.53
AT and GC give the numbers of A/T and G/C bases in each palindrome. The expected frequencies are calculated on the basis of an A + T content of 60%, sequence randomness and a segment length of 11.5 kb.
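The expected values in Table 2 can be reproduced by enumerating every perfect palindrome (a sequence equal to its own reverse complement) of a given length and weighting it by its probability under the 60% A + T random-sequence model named in the footnote. A minimal Python sketch follows. It also includes a helper for counting observed palindromes, which would need the actual F01 sequence (GenBank AF071201) as input and is shown here only on a toy string.

```python
"""Expected and observed perfect palindromes, in the style of Table 2.

Expected counts assume a random sequence with the given A+T fraction and
multiply the per-window probability by the segment length, as in the footnote.
"""
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_perfect_palindrome(seq):
    """True if seq equals its reverse complement, e.g. GAATTC."""
    return seq == "".join(COMPLEMENT[b] for b in reversed(seq))


def gc_count(seq):
    """Number of G or C bases in seq."""
    return sum(base in "GC" for base in seq)


def expected_palindromes(length, segment_bp=11_500, at_fraction=0.60):
    """Expected palindromes of a given length, keyed by number of G/C bases."""
    p_at, p_gc = at_fraction / 2.0, (1.0 - at_fraction) / 2.0
    expected = {}
    for letters in product("ACGT", repeat=length):
        seq = "".join(letters)
        if is_perfect_palindrome(seq):
            n_gc = gc_count(seq)
            p = p_at ** (length - n_gc) * p_gc ** n_gc
            expected[n_gc] = expected.get(n_gc, 0.0) + p * segment_bp
    return dict(sorted(expected.items()))


def observed_palindromes(seq, length):
    """Count perfect palindromes of a given length in seq, keyed by G/C count."""
    counts = {}
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        if is_perfect_palindrome(window):
            key = gc_count(window)
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    for n in (4, 6):
        exp = expected_palindromes(n)
        rounded = {gc: round(v, 1) for gc, v in exp.items()}
        print(n, rounded, "total", round(sum(exp.values()), 1))
    print(observed_palindromes("GAATTCAAGCTT", 6))  # toy input, not F01 data
```

For length 4 this gives 372.6, 331.2 and 73.6 expected palindromes with 0, 2 and 4 G/C bases, and for length 6 it gives 67.1, 89.4, 39.7 and 5.9, in line with Table 2 up to rounding.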
Fig. 3. Ligation and recleavage of the ends of F01. The ends were labeled by removing the terminal phosphates with alkaline phosphatase and then transferring a phosphate from [γ-32P]ATP using polynucleotide kinase. The labeled DNA was cleaved with HindIII and the ends were isolated from an agarose gel. Lane 1 (StRL) shows the two HindIII ends of 0.9 and 2.3 kb. The next two lanes show the right end (2.3 kb) after ligation (R1) and after subsequent HindIII digestion (R2); the ligated product is 4.6 kb, but all of it can be cleaved by HindIII. Lanes L1 and L2 show the left end after ligation (L1) and HindIII digestion (L2); this end behaves like the right end. Lane RL1 shows the ligation of a mixture of the two ends. Three products are found: a 4.6-kb fragment from the ligation of two right ends, a 3.2-kb fragment from one right and one left end, and a 1.8-kb fragment from two left ends. All the ligation products can be cleaved by HindIII (RL2). The molecular weight (Mw) marker is λ DNA cut with BstEII; the 14,140-bp band of λ is formed by the two cos-containing end fragments annealed through their cohesive ends.
3.6. Clones of F01
A large number of segments of DNA have been cloned, albeit with difficulty. These are shown in Fig. 1. Some regions of the phage have not yielded any clones; these may lack sites for the enzymes used or contain genes encoding lethal functions. Further sequencing work may require direct shotgun cloning into M13 vectors.
3.7. Proteins synthesized during infection
During infection, 33 proteins encoded by F01 could be detected by SDS-PAGE. Five of these appear very early and have approximate molecular weights of 10, 11, 12, 14 and 20 kDa. Synthesis of the 14- and 20-kDa species ceases before 15 min, while the other three are synthesized throughout the life cycle, albeit at a reduced rate. The detectable polypeptides range from 10 to about 110 kDa, and the particularly prominent species are 10, 11, 12, 16, 27, 28, 32, 43 and 50 kDa. The mature virion contains at least 22 different protein types. Because several of these (12, 16, 43 and 50 kDa) are present in such large amounts, other species may be masked by them. We have no information as to whether some of the polypeptides in the virion result from proteolytic digestion. Summing the molecular weights of the proteins detected during infection gives a combined estimate of 1500 kDa. Slightly more than 50% of the F01 genome is required to encode the visualized proteins.
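The estimate in Section 3.7 that slightly more than half of the genome is needed for the visualized proteins follows from a short calculation. The sketch assumes an average amino acid residue mass of about 110 Da (a common rule of thumb, not a value given in the text) and three nucleotides per residue, and it ignores stop codons and non-coding signals; the function name is illustrative.

```python
"""Rough coding-capacity estimate from the summed mass of observed proteins."""

AVERAGE_RESIDUE_DA = 110.0  # assumed rule-of-thumb mass per amino acid residue


def coding_fraction(total_protein_kda, genome_bp, residue_da=AVERAGE_RESIDUE_DA):
    """Fraction of the genome needed to encode proteins of the given total mass."""
    residues = total_protein_kda * 1000.0 / residue_da
    coding_bp = 3.0 * residues  # three bases per codon; stop codons ignored
    return coding_bp / genome_bp


if __name__ == "__main__":
    frac = coding_fraction(1500.0, 80_000)
    print(f"~{frac:.0%} of an 80-kb genome")  # about 51%
```

About 13,600 residues, or roughly 41 kb of coding sequence, are needed for 1500 kDa of protein. Because some proteins may be masked on the gels, the figure is best read as a lower bound.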
3.8. The open reading frames of the sequenced region
An 11.5-kb contiguous region of F01 DNA has been sequenced. Analysis of the potential open reading frames (ORFs) shows that 17 of these are encoded by one strand and several by the other. It is very likely that only one strand is used for coding in this region, since the ORFs on the other strand extensively overlap those of the coding strand. These ORFs are given in Table 3. The genes are tightly packed together in this region, and in most cases the start codon is AUG. Among the ORFs with an identified stop codon, most terminate with UAA or UGA, and three (ORFs 3, 8 and 15) end with UAG. At present very little is known about these ORFs. ORF1 has been named red because it complements λ red mutants in recA hosts. The complementing clone shows that this gene does not have an adjacent promoter, since its expression depends on the lac promoter of the plasmid cloning vector, pHG165 (Stewart et al., 1986). Other clones show that the red gene is transcribed from a promoter about 500 bp upstream of this gene. One of the amber mutations isolated is complemented by clones carrying ORF6, which is thereby an essential gene. ORF9 and its product are the only sequences in this region that show homology to any known DNA or protein sequence. The protein encoded by ORF9 shows similarity to several proteins. The highest scoring segments in a BLAST search were the head–tail connector of phage 21 (VG05 BPP21) and the minor capsid protein C of λ (VCAC LAMBD). Other proteins with good, but lesser, similarity are proteases from bacterial species: protease IV of Methanobacter, PspA of Campylobacter jejuni, and protease IV of Helicobacter. However, the physiological function of ORF9 in F01 has not been determined.
Table 3 Open reading frames in the sequenced region of Felix 01
ORF  Base pairs        Amino acids  Frame  Start codon  RBS  Termination codon
1    171–620           150          3      ATG          +    TAA
2    746–901           52           2      ATG          +    TAA
3    947–1192          82           2      ATG          +    TAG
4    1302–1499         66           3      ATG          +    TAA
5    1524–3122         533          3      ATG          +    TGA
6    3142–4605         488          1      ATG          +    TAA
7    4608–5036         143          3      ATG          +    TGA
8    5107–5436         110          1      ATG          +    TAG
9    5451–6776         442          3      ATG          +    TAA
10   6867–7181         105          3      TTG          +    TAA
11   7218–8321         368          3      ATG          +    TAA
12   8345–8791         149          2      ATG          +    TAA
13   8949–9275         109          3      TTG          +    TGA
14   9275–9673         133          2      ATG          +    TGA
15   9953–10,219       89           2      ATG          +    TAG
16   10,249–10,710     154          1      ATG          +    TGA
17   10,710–11,495     262+         3      ATG          +    –
18   7329–8093         255          4      ATG          +    TGA
3.9. Codon usage of the apparent coding regions
One of the hosts of F01, E. coli, has a G + C content of about 50%. Phage λ is similar to E. coli in this respect, but F01 differs greatly from this host. It might be expected that the high A + T content of F01 would be reflected in the third base of codons within coding regions. For example, in Mycobacterium, which has a very high G + C content (about 72%), this is achieved by almost exclusively using G or C as the third base of codons, both in the bacterium and in its phages. The codon usage of F01 for the open reading frames of the sequenced region is given in Table 4, together with the data for E. coli. The data show
that this phage differs from its host in many respects and that these differences are quite specific. Some amino acids (C, D, E, F, H, N and Y) show codon usage similar to that of E. coli. All of these are amino acids that have only two codons and whose E. coli codons have A or U as the third base more than 50% of the time (with the exception of asparagine). The rest of the F01 amino acids have codon usage that differs from that of the host. Lysine is encoded by AAA less often than in E. coli, while glutamine is encoded by CAA more often than in E. coli. Apart from methionine and tryptophan with their single codons, the remaining amino acids, which have three or more codons, are very preferentially encoded by codons ending in A or U (Table 5). In E. coli, alanine codons have G or C as the last base 61% of the time, whereas in F01 the figure is only 10% and GCU is the preferred codon. Glycine codons show a pattern similar to alanine: GGU is the dominant codon (70%), while GGC is rarely used. In E. coli, AUA is rarely used for isoleucine, and this holds true for F01 as well; half of the use of AUA in F01 occurs in one small ORF, which may not encode a protein. The predominant leucine codon of E. coli, CUG (50%), is still used by F01 but is no longer very prevalent (18%). With regard to proline, the E. coli CCG codon (52%) has become rare (5%), while CCU and CCA are used 91% of the time. Of considerable interest are AGA and AGG, the rare codons for arginine in E. coli. Usage of AGG is little changed, but AGA (48%) is extensively used. The arginine codon CGG is absent in this region of F01. The serine codon UCG of E. coli (15%) has also become rare in F01 (1%).
Table 4 Codon usage in F01 (each entry gives codon, % F01 / % E. coli)
Amino acid (a)  Codons
A (296)   GCU 52.4/17.0; GCC 7.4/26.7; GCA 37.5/21.6; GCG 2.7/34.7
C (31)    UGU 54.8/56.8; UGC 45.2/43.2
D (225)   GAU 57.3/62.2; GAC 42.7/37.8
E (232)   GAA 73.7/68.4; GAG 26.3/31.6
F (131)   UUU 50.4/56.5; UUC 49.6/43.5
G (227)   GGU 69.6/34.3; GGC 8.8/39.6; GGA 15.0/11.2; GGG 6.6/14.9
H (38)    CAU 52.6/56.4; CAC 47.4/43.6
I (201)   AUU 71.1/49.9; AUC 25.9/41.9; AUA 3.0/8.1
K (232)   AAA 51.7/75.5; AAG 48.3/24.5
L (282)   UUA 26.6/12.9; UUG 11.7/12.7; CUU 27.0/10.7; CUC 7.1/10.2; CUA 9.6/3.7; CUG 18.1/49.8
M (94)    AUG 100.0/100.0
N (161)   AAU 55.3/45.3; AAC 44.7/54.7
P (128)   CCU 37.5/16.1; CCC 4.7/12.1; CCA 53.1/19.4; CCG 4.7/52.3
Q (127)   CAA 48.8/33.7; CAG 51.2/66.3
R (149)   CGU 37.6/37.7; CGC 2.7/38.8; CGA 6.7/6.4; CGG 0.0/9.9; AGA 48.3/4.5; AGG 4.7/2.7
S (239)   UCU 37.7/15.5; UCC 8.4/15.1; UCA 27.2/12.9; UCG 1.3/14.8; AGU 12.1/15.1; AGC 13.4/26.7
T (224)   ACU 47.8/17.3; ACC 13.4/42.7; ACA 34.8/14.0; ACG 4.0/25.9
V (226)   GUU 43.4/26.6; GUC 10.6/21.1; GUA 37.6/15.8; GUG 8.4/36.5
W (33)    UGG 100.0/100.0
Y (135)   UAU 59.3/56.8; UAC 40.7/43.2
(a) The total number of codons for each amino acid is given in parentheses.
Table 5 Percent A + T in the third base of F01 codons
R 93   P 91   A 90   G 85   T 83
V 81   S 77   I 74   E 74   L 63
Y 59   D 57   N 55   C 55   H 53
K 52   F 50   Q 49   M 0    W 0
As mentioned earlier, the sequence GATC does not exist in F01. However, the GAU codon for aspartic
acid occurs 127 times in the coding portion of the sequenced region, and in every case the succeeding codon does not start with C. Similarly, AUC (isoleucine) occurs 52 times, and no preceding codon ends with G. The 178 serine codons that begin with UC are not preceded by codons ending in GA, even though AGA is preferentially used for arginine (UGA, nonsense; CGA, 7%; AGA, 48%; GGA, 15%). That these situations occur by chance is extremely unlikely.
3.10. Nonsense codons in untranslated frames
In microorganisms, a nonsense codon marks the end of each translated reading frame. When G + C = 50%, it might be expected on a random basis that the other two transcribed but untranslated reading frames would contain the three nonsense codons in equal numbers. When G + C is only 40%, as for F01, there should be 1.5 times as many UAA codons as of either of the other two. Out-of-frame nonsense codons could reduce the cost of mistakes in transcription (slippage) and in translation by leading to termination of defective polypeptides. Table 6 gives the frequency and distribution of the three nonsense codons in both the transcribed but untranslated frames and the untranscribed frames. F01 DNA has an excess of nonsense codons in the transcribed frames, while the other frames are as expected in this respect. However, with regard to the type of nonsense codons present, λ, lacZ and F01 all have a great excess of UGA codons and a relative lack of UAG codons. It is unclear what biological advantage might ensue from the overuse of the UGA codon.
Table 6 Nonsense codons in Felix 01, λ and lacZ DNA
DNA       Strand (a)  Unused codons (b)  Nonsense codons (c)   Specific codons
                                         Obs      Exp          UAG    UAA    UGA
Felix 01  Sense       7859               714      368          123    279    312
Felix 01  Antisense   11,496             655      539          164    299    192
λ         Sense       24,458             1438     1146         121    418    899
λ         Antisense   36,687             1254     1754         189    475    590
lacZ      Sense       2048               103      96           6      27     70
lacZ      Antisense   3072               93       144          19     34     40
Only codons in regions that are translated to protein are considered. (a) The sense strand includes only codons in the two frames that are transcribed but untranslated, while the antisense strand includes the three reading frames that are not transcribed. (b) Unused codons are those that are not translated. (c) Nonsense refers to nonsense codons in E. coli; Obs, observed; Exp, expected. The expected values are calculated on the basis of G + C = 50%. For Felix 01, a composition of G + C = 40% would increase the expected number of nonsense codons by approximately 1.5 times. lacZ is the E. coli gene for β-galactosidase; λ is an E. coli bacteriophage.
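The counts behind Table 6, and the codon-junction argument about GATC at the start of this passage, both come down to scanning a coding sequence codon by codon. The Python sketch below assumes the input is an in-frame DNA coding sequence (T in place of U) beginning at the first base of the start codon. The authors' exact counting rules, such as how ORF ends and overlaps were treated, are not specified, so this illustrates the method on toy inputs rather than reproducing Table 6.

```python
"""Out-of-frame stop codons and GATC codon junctions in a coding sequence."""

STOP_CODONS = ("TAA", "TAG", "TGA")


def out_of_frame_stops(cds):
    """Count stop codons in the +1 and +2 frames of an in-frame coding sequence."""
    counts = dict.fromkeys(STOP_CODONS, 0)
    for offset in (1, 2):
        for i in range(offset, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in counts:
                counts[codon] += 1
    return counts


def stop_probability(gc_fraction):
    """P(random codon is TAA, TAG or TGA) under an independent-base model."""
    p_at, p_gc = (1.0 - gc_fraction) / 2.0, gc_fraction / 2.0
    return p_at ** 3 + 2 * p_at * p_at * p_gc  # TAA + TAG + TGA


def gatc_junctions(cds):
    """Positions of GATC with the codon position (0, 1 or 2) of its G.

    0: GAT (Asp) followed by a codon starting with C
    1: a codon ending in GA followed by a TC_ (Ser) codon
    2: a codon ending in G followed by ATC (Ile)
    """
    hits, start = [], cds.find("GATC")
    while start != -1:
        hits.append((start, start % 3))
        start = cds.find("GATC", start + 1)
    return hits


if __name__ == "__main__":
    print(out_of_frame_stops("GTAAGTGAC"))   # toy input
    print(gatc_junctions("ATGGATCAGTAA"))   # toy input: GAT followed by CAG
    ratio = stop_probability(0.40) / stop_probability(0.50)
    print(f"expected stops, 40% vs 50% G+C: x{ratio:.2f}")
```

With G + C = 50%, the model gives 3/64 stop codons per codon, so the 7859 unused codons on the F01 sense strand yield the expected value of 368 in Table 6. Lowering G + C to 40% makes UAA 1.5 times as frequent as UAG or UGA and raises the overall stop frequency by a factor of about 1.34 under this simple model, somewhat below the approximate factor of 1.5 quoted in the Table 6 footnote.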
4. Discussion
In order to construct a recombinant bacteriophage by in vitro DNA manipulations, it is necessary to have some basic information about the phage genome and some of its genes. In the present study, part of the F01 genome was cloned and sequenced. From these data, it was possible to assemble the correct DNA constructs that led to the isolation of an F01 carrying the lux reporter genes. F01 appears to represent a new phage prototype. It is likely that, as more information becomes available about various phage isolates, a family of F01-related phages will be discovered. While different criteria may be used to define a phage family, similarity of genetic organization seems to be the best of these. Evidence for similarity can come from a variety of data: genetic crosses that lead to hybrid phages, complementation, antigenicity and molecular characterization. The last of these can identify the genes present, their position within the genome, the regulation of gene expression and the sequence of their DNA and proteins. At present, only part of the F01 genome has been examined at the sequence level. The F01 genes sequenced to date are tightly packed. The phage shows no similarity at either the DNA or the protein level (with one exception) to any previously characterized phage. As the functions of the gene products remain unknown, comparison with other phages at the functional level is not yet possible. Although many Salmonella and E. coli strains contain restriction systems, no evidence has been reported that F01 is sensitive to these systems. As shown above, the number of palindromic sites in the F01 genome is lower than expected for a random sequence (Table 2), and this might be the basis for its resistance to the restriction systems of these genera. Many phages encode anti-restriction functions, and it would seem worthwhile to examine this question, perhaps by introducing a plasmid encoding EcoRV into Salmonella, because EcoRV is an enzyme that
cleaves F01 DNA. The dearth of restriction sites in F01 DNA is unlikely to stem from direct selection against particular sites, because most of the enzymes that fail to cleave F01 are not found in its hosts. A less specific type of selection may be at work here: selection against palindromic sequences in general (Table 2). Short palindromic sequences are not expected to affect the structure of double-stranded DNA. In contrast, because RNA is single stranded, such sequences can fold into hairpins within an mRNA, which makes RNA more sensitive to them. As discussed above, the lack of Sau3AI sites seems to have come into being, at least in part, through selection for mRNAs with particular codon biases. In addition, the excess of UGA codons in the untranslated frames supports the idea of selection for specific mRNA sequences. The main effect of these codons would be to lower the cost of translational errors, in particular those due to frameshifting (Manley, 1978; Kurland et al., 1996). While the ends of F01 can be labeled with polynucleotide kinase, they cannot be ligated to one another. Detailed studies of F01 during replication and molecular studies of the structure of the ends are clearly needed. Packaging of the DNA does not appear to be the result of a headful process, because the ends are discrete. The isolation of deletions would be helpful in this regard. However, it seems unlikely that deletions will be obtained using chelating agents, and their isolation will best be pursued by searching for lighter variants in CsCl equilibrium density gradients. In the section of F01 DNA examined, the composition is 60% A + T, and its codon usage reflects this. Various organisms differ greatly in the G + C content of their genomes: Mycobacteria have more than 70% G + C, while phage T4 has only 34%. However, the host of T4, E. coli, has 51% G + C. The advantage of these differences, if any, to a specific organism remains unknown. The same can be stated for codon usage.
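The codon-usage contrast discussed here is summarized in Table 5, whose third-position A + T values are simple sums over the codon percentages of Table 4. The sketch below encodes a few rows of Table 4 for F01 and E. coli and recomputes the corresponding Table 5 entries; it is a consistency check on the tabulated numbers, not an additional analysis, and the dictionary and function names are illustrative.

```python
"""Percent A+U at the third codon position, from Table 4 codon percentages."""

# Percent usage of each codon within its amino acid (Table 4).
F01 = {
    "R": {"CGU": 37.6, "CGC": 2.7, "CGA": 6.7, "CGG": 0.0, "AGA": 48.3, "AGG": 4.7},
    "P": {"CCU": 37.5, "CCC": 4.7, "CCA": 53.1, "CCG": 4.7},
    "A": {"GCU": 52.4, "GCC": 7.4, "GCA": 37.5, "GCG": 2.7},
    "G": {"GGU": 69.6, "GGC": 8.8, "GGA": 15.0, "GGG": 6.6},
    "L": {"UUA": 26.6, "UUG": 11.7, "CUU": 27.0, "CUC": 7.1, "CUA": 9.6, "CUG": 18.1},
}
E_COLI = {
    "R": {"CGU": 37.7, "CGC": 38.8, "CGA": 6.4, "CGG": 9.9, "AGA": 4.5, "AGG": 2.7},
    "P": {"CCU": 16.1, "CCC": 12.1, "CCA": 19.4, "CCG": 52.3},
    "A": {"GCU": 17.0, "GCC": 26.7, "GCA": 21.6, "GCG": 34.7},
    "G": {"GGU": 34.3, "GGC": 39.6, "GGA": 11.2, "GGG": 14.9},
    "L": {"UUA": 12.9, "UUG": 12.7, "CUU": 10.7, "CUC": 10.2, "CUA": 3.7, "CUG": 49.8},
}


def third_base_au(usage):
    """Percent of codons for one amino acid whose third base is A or U."""
    return sum(pct for codon, pct in usage.items() if codon[2] in "AU")


if __name__ == "__main__":
    for aa in F01:
        f01, eco = third_base_au(F01[aa]), third_base_au(E_COLI[aa])
        print(f"{aa}: F01 {f01:5.1f}%   E. coli {eco:5.1f}%")
```

For arginine, proline, alanine, glycine and leucine this gives 92.6, 90.6, 89.9, 84.6 and 63.2% for F01, which round to the Table 5 entries, against 48.6, 35.5, 38.6, 45.5 and 27.3% for the E. coli values of Table 4.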
For many amino acids, F01's codon usage differs markedly from that of one of its hosts. Assuming that most or all of the tRNAs and associated aminoacyl-tRNA synthetases used by F01 are furnished by the host, this finding is somewhat surprising; it does not, however, seem to impair F01's ability to survive. The codons GCU, GGU, AUU, CCA, CCU, ACU and AGA are used much more frequently by F01 than by its host. In F01, AGA encodes 48% of the arginine residues, whereas in E. coli the figure is less than 5%. A number of studies have indicated that the rate of translation slows when genes with too many AGA and AGG codons are expressed from multicopy plasmids (Makoff et al., 1989; Chen and Inouye, 1990; Makrides, 1996). It will be of interest to investigate whether F01 augments the cell's capacity to decode AGA codons by synthesizing its own tRNA specific for this codon.

References
Ackermann, H.W., Nguyen, T.M., 1983. Sewage coliphages studied by electron microscopy. Appl. Environ. Microbiol. 45, 1049–1059.
Bollag, D.M., Rozycki, M.D., Edelstein, S.J., 1996. Protein Methods, 2nd edn. Wiley-Liss, New York, NY, USA, pp. 107–154.
Chen, G.T., Inouye, M., 1990. Suppression of the negative effect of minor arginine codons on gene expression: preferential usage of minor codons within the first 25 codons of the Escherichia coli genes. Nucleic Acids Res. 18, 1465–1473.
Cherry, W.B., Davis, B.R., Edwards, P.R., Hogan, R.B., 1954. A simple procedure for the identification of the genus Salmonella by means of a specific bacteriophage. J. Lab. Clin. Med. 44, 51–55.
Felix, A., Callow, B.R., 1943. Typing of paratyphoid B bacilli by means of Vi bacteriophage. Br. Med. J. 2, 127–130.
Kuhn, J.C., Suissa, M., Chiswell, D., Ulitzur, S., Bar-On, T., Wyse, J., 1998. Bacteriophage Felix 01 and some of its genes. GenBank accession number AF071201.
Kurland, C.G., Hughes, D., Ehrenberg, M., 1996. Limitations of translational accuracy. In: Neidhardt, F., Curtiss, R., Ingraham, J.L., Lin, E.C.C., Low, K.B., Magasanik, B., Reznikoff, W.S., Riley, M., Schaechter, M., Umbarger, H.E. (Eds.), Escherichia coli and Salmonella. American Society for Microbiology Press, Washington, USA, pp. 979–1004.
Lindberg, A.A., 1967. Studies of a receptor for Felix 0-1 phage in Salmonella minnesota. J. Gen. Microbiol. 48, 225–233.
Lindberg, A.A., Hellerqvist, C.G., 1971. Bacteriophage attachment sites, serological specificity, and chemical composition of the lipopolysaccharides of semirough and rough mutants of Salmonella typhimurium. J. Bacteriol. 105, 57–64.
Lindberg, A.A., Holme, T., 1969. Influence of O side chains on the attachment of the Felix 0-1 bacteriophage to Salmonella bacteria. J. Bacteriol. 99, 513–519.
Makoff, A.J., Oxer, M.D., Fairweather, N.F., Ballantine, S., 1989. Expression of tetanus toxin fragment C in E. coli: high level expression by removing rare codons. Nucleic Acids Res. 17, 10191–10202.
Makrides, S.C., 1996. Strategies for achieving high-level expression of genes in Escherichia coli. Microbiol. Rev. 60, 2005–2011.
Manley, J.L., 1978. Synthesis and degradation of termination and premature-termination fragments of β-galactosidase in vitro and in vivo. J. Mol. Biol. 125, 406–432.
McConnell, M.R., Schoelz, J.E., 1983. Evidence for shorter average O-polysaccharide chain length in the lipopolysaccharide of a bacteriophage Felix 01-sensitive variant of Salmonella anatum A1. J. Gen. Microbiol. 129, 3177–3184.
McConnell, M., Wright, A., 1979. Variation in the structure and bacteriophage-inactivating capacity of Salmonella anatum lipopolysaccharide as a function of growth temperature. J. Bacteriol. 137, 746–751.
Stewart, G.S.A.B., Lubinsky-Mink, S., Jackson, C.G., Cassel, A., Kuhn, J., 1986. pHG165: A pBR322 copy number derivative of pUC8 for cloning and expression. Plasmid 15, 172–181.
Wilkinson, R.G., Stocker, B.A.D., 1968. Genetics and cultural properties of mutants of Salmonella typhimurium lacking glucosyl or galactosyl lipopolysaccharide transferases. Nature 217, 955–957.
Wright, A., 1971. Mechanism of conversion of the Salmonella O-antigen by bacteriophage ε34. J. Bacteriol. 105, 927–936.