Codon usage in Entamoeba histolytica, E. dispar and E. invadens

ELSEVIER

Parasitology International 46 (1997) 105-109

Tomoyoshi Nozaki*, Takashi Asai, Tsutomu Takeuchi

Department of Tropical Medicine and Parasitology, Keio University, School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, Japan

Received 10 February 1997; accepted 22 March 1997

Abstract

We analyzed codon usage frequencies in 68 non-redundant protein coding genes from the human-pathogenic E. histolytica (28 117 codons), 6 from the non-pathogenic E. dispar (1744 codons), and 4 from the reptilian E. invadens (933 codons). The A + U contents of the protein coding sequences from E. histolytica, E. dispar, and E. invadens were 67%, 66%, and 58%, respectively. The nucleotide frequency in the third codon position was strongly biased toward A + U in E. histolytica and E. dispar (85% and 82%, respectively), and the A + U bias was stronger in the third position than in the first or second position. In contrast, the third-position nucleotide frequency was less biased in E. invadens (60% A + U) than in E. histolytica and E. dispar. Codon usage in E. histolytica and E. dispar was biased in accordance with the A + U preference in the third position, and no apparent difference in codon usage was found between the two species. Codon usage in E. invadens was less biased: the third-position nucleotide biases seen in the synonymous codons for several amino acids, including leucine, tyrosine, cysteine, and histidine, in the E. histolytica and E. dispar genes were reversed or absent. Codon usage in the Entamoeba species differed significantly from that in the other amitochondrial protists Giardia lamblia and Trichomonas vaginalis. Two sequences encoding ribosomal proteins S10 and S27 showed significantly smaller codon biases than the rest of the E. histolytica sequences, suggesting that these ribosomal proteins might be subject to specific functional constraints on codon usage. The differences in the A + U content of the coding sequences and in codon usage between the mammalian E. histolytica and E. dispar and the reptilian E. invadens suggest that the reptilian Entamoeba species is distantly related to the mammalian species. These results may aid in elucidating the pressures that facilitate changes in codon usage patterns. © 1997 Elsevier Science Ireland Ltd.

Keywords: Entamoeba histolytica; E. dispar; E. invadens; Codon usage

*Corresponding author. Tel.: +81 3 33531211 ext. 2667; fax: +81 3 33535958; e-mail: [email protected]

1383-5769/97/$17.00 © 1997 Elsevier Science Ireland Ltd. All rights reserved.
PII S1383-5769(97)00016-0

T. Nozaki et al. /Parasitology International 46 (1997) 105-109

The frequencies of genetic codons have been shown to vary among species [1], among members of gene families [2,3] and with location within a genome [3,4]. Codon usage data have accordingly been used for a variety of purposes. Because codon usage changes during the evolution of proteins and organisms, codon bias information has been applied to estimate the evolutionary rates of proteins [5] and to correlate genetic and taxonomic distances between organisms [6-8]. In addition, codon usage information is of practical importance for deducing nucleotide sequences from amino acid sequence data [9] and for predicting the feasibility of heterologous gene expression [10]. The genus Entamoeba, a group of unicellular protozoan parasites, contains more than a dozen species [11,12]. The genus has been subdivided into groups of species based on the host from which the amoeba was isolated, the sizes of the trophozoites and cysts, and the morphology and number of nuclei in mature cysts. However, the validity of this morphology-based classification has not been examined at the molecular or genetic level, except by the riboprinting method [12]. Recent advances in molecular cloning and characterization have shown [13] that Entamoeba isolates from humans mostly consist of two distinct species that are indistinguishable by light microscopy: E. histolytica, originally described by Schaudinn in 1903, and E. dispar, described by Brumpt in 1925. E. histolytica is now considered the sole cause of human amebiasis and is medically important, whereas E. dispar is not believed to cause disease.

One of the reptilian species, E. invadens, has also been well studied, since it serves as a model for encystation in vitro [14-16]. In the present work, we collected 68 protein coding genes of the human-pathogenic E. histolytica, six of the non-pathogenic E. dispar and four of the reptilian E. invadens from the GenBank database (ver. 98.0) and analyzed their codon usage frequencies with DNASIS computer software (version 3.00, Hitachi Software Engineering, Yokohama, Japan). The aims of the present study were (1) to extend the previous analyses of codon usage in the pathogenic E. histolytica [17,18], which used a limited number of codons (4680 and 4455 codons in [17] and [18], respectively), and (2) to compare codon usage between the pathogenic E. histolytica and the non-pathogenic E. dispar, and between the mammalian Entamoeba species and a representative reptilian species. The sequence set was non-redundant: only representative members of each gene family were included. The total numbers of codons analyzed for E. histolytica, E. dispar and E. invadens genes were 28 117, 1744 and 933, respectively. The A + U contents of the protein coding sequences from E. histolytica, E. dispar and E. invadens were 67%, 66% and 58%, respectively (data not shown). The nucleotide frequency in the third position was strongly biased toward A + U in E. histolytica and E. dispar (85% and 82% A + U, respectively); the A + U bias was stronger in the third position than in the first (54% and 56% A + U, respectively) or second position (64% and 60% A + U, respectively) (Table 1). These data agree with the previous report on nucleotide frequency in E. histolytica [17].

Table 1
Nucleotide frequency (%) in the three codon positions in E. histolytica, E. dispar and E. invadens

     First position      Second position     Third position
     E.h.  E.d.  E.i.    E.h.  E.d.  E.i.    E.h.  E.d.  E.i.
T    20    24    25      28    26    23      42    41    28
C    13    11    13      21    23    25      7     10    18
A    34    32    30      36    34    36      43    35    32
G    33    33    33      15    17    16      8     8     22

Frequency shown in percentage.
Abbreviations: E.h., E. histolytica; E.d., E. dispar; E.i., E. invadens.

The nucleotide frequency in the third position was less biased toward A + U in E. invadens than in E. histolytica and E. dispar; A or U was found at this position in only 60% of all codons studied. The lack of an A + U bias in E. invadens was not an artifact of the small number of sequences analyzed (β-tubulin, chitinase, cysteine proteinase and histone H2B), since the corresponding E. histolytica and E. dispar sequences showed a strong A + U bias in the third position. The nucleotide usage patterns in these four sequences from E. histolytica and E. dispar were similar to those shown in Table 1 (data not shown).

Comparisons of codon frequencies (Table 2) showed that codon usage was strongly biased in E. histolytica and E. dispar, in accordance with the A + U preference in the third position. The results shown in Table 2 generally reinforce the previous reports on codon usage in the pathogenic E. histolytica [17,18]. However, CCG for proline, GCG for alanine, and AGG, CGC and CGG for arginine were in fact used, albeit rarely, although these codons were reported to be absent in the previous studies [17,18]. No marked difference in codon usage was found between E. histolytica and E. dispar. Codon usage in E. invadens was less biased than in E. histolytica and E. dispar; the third-position nucleotide biases observed in the synonymous codons for several amino acids, including leucine, tyrosine, cysteine and histidine, in the E. histolytica and E. dispar genes were reversed or absent in E. invadens. Among the E. histolytica sequences examined, two sequences, one encoding ribosomal protein S10 (GenBank accession number X86145) and the other encoding S27 (L36245), showed significantly smaller codon biases in the third position (69% and 67% A + U, respectively) than the rest of E.
histolytica sequences. These data suggest that these ribosomal proteins might be subject to specific functional constraints on codon usage, as has been shown for a plant chloroplast ribosomal protein, L12 [19].

The differences in the A + U content of the coding sequences and in codon usage between the mammalian E. histolytica and E. dispar and the reptilian E. invadens support the results of phylogenetic analyses using the riboprinting method [12], and indicate that this reptilian Entamoeba species is distantly related to the mammalian species that have four nuclei in mature cysts. In contrast, E. histolytica and E. dispar showed remarkable similarity in codon usage, indicating that the environmental factors, if any, that facilitate changes in codon usage were similar between these species, whereas the molecules and genes associated with virulence and pathogenesis evolved in distinct ways. Codon usage in the Entamoeba species differed significantly from that in other amitochondrial protozoa such as Giardia lamblia [20] and Trichomonas vaginalis (Nozaki and Takeuchi, unpublished). In G. lamblia, nucleotides at the third position were strongly biased toward C (42%), whereas in T. vaginalis nucleotides were significantly biased toward A + U only at the second position (64% A + U). Among the 55 organisms for which codon usage tables were available [1], we found none with a codon usage pattern similar to that of E. invadens. Further studies should determine what factors promoted the A + U bias in the mammalian Entamoeba species or, conversely, what factors maintained a balanced nucleotide composition in the reptilian species. Codon bias in other reptilian Entamoeba species, i.e. E. terrapinae, E. barreti and E. insolita [12], should be examined whenever sequences from these amoebae become available, since these species were shown to be more closely related to the mammalian species than to E. invadens [12]. Such analyses may aid in elucidating the determinants that facilitate changes in codon usage patterns.

Acknowledgements

This work was supported in part by the Oyama Health Foundation and the Kanehara Ichiro Memorial Foundation.
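The per-position A + U percentages reported in the text and in Table 1 (for example, 85% A + U at the third codon position in E. histolytica) are base counts taken over all codons analyzed. The authors computed these with DNASIS; the Python sketch below is not that software, only a minimal illustration under stated assumptions: sequences are in-frame coding sequences written with T rather than U, and counts are pooled over all codons of all genes, so longer genes weigh more. The function names `position_composition` and `a_plus_u` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

BASES = "TCAG"


def position_composition(cds_list: Iterable[str]) -> Dict[int, Dict[str, float]]:
    """Pooled percentage of T, C, A and G at codon positions 1, 2 and 3.

    Assumes each sequence is an in-frame coding sequence in DNA letters
    (T, not U) whose length is a multiple of 3. Counts are pooled over all
    codons of all sequences, so longer genes contribute more codons.
    Ambiguous bases such as N count toward the codon total but are not reported.
    """
    counts = {1: Counter(), 2: Counter(), 3: Counter()}
    n_codons = 0
    for cds in cds_list:
        cds = cds.upper()
        if len(cds) % 3 != 0:
            raise ValueError("sequence length is not a multiple of 3")
        n_codons += len(cds) // 3
        for pos in (1, 2, 3):
            # Every third base, starting at this codon position.
            counts[pos].update(cds[pos - 1::3])
    if n_codons == 0:
        raise ValueError("no codons supplied")
    return {pos: {b: 100.0 * counts[pos][b] / n_codons for b in BASES}
            for pos in (1, 2, 3)}


def a_plus_u(composition: Dict[str, float]) -> float:
    """A + U percentage at one codon position (U is written T in DNA)."""
    return composition["A"] + composition["T"]


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from this study.
    genes = ["ATGAAATTA", "GATTAA"]
    comp = position_composition(genes)
    for pos in (1, 2, 3):
        print(f"position {pos}: {a_plus_u(comp[pos]):.1f}% A + U")
```

For the two toy sequences the script prints 80.0, 100.0 and 80.0% A + U for positions 1, 2 and 3. Because every codon contributes exactly one base to each position, the overall A + U content is the mean of the three per-position values; with the rounded Table 1 values for E. invadens, (55 + 59 + 60)/3 = 58%, which matches the reported overall A + U content.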

Table 2
Codon frequency in E. histolytica, E. dispar and E. invadens

                  E. histolytica   E. dispar        E. invadens
Amino acid  Codon  Number  Freq.    Number  Freq.    Number  Freq.
Phe         TTT       692   24.6        44   25.2        20   21.4
Phe         TTC       409   14.5        26   14.9        15   16.1
Leu         TTA       931   33.1        45   25.8         8    8.6
Leu         TTG       218    7.8        18   10.3        27   28.9
Ser         TCT       401   14.3        67   38.4        26   27.9
Ser         TCC        52    1.8         4    2.3         1    1.1
Ser         TCA       821   29.2        48   27.5        33   35.4
Ser         TCG        15    0.5         0    0.0        16   17.1
Tyr         TAT       845   30.1        75   43.0        13   13.9
Tyr         TAC       140    5.0        18   10.3        26   27.9
End         TAA        56    2.0         4    2.3         1    1.1
End         TAG         2    0.1         0    0.0         1    1.1
Cys         TGT       569   20.2        38   21.8        12   12.9
Cys         TGC        80    2.8         9    5.2        13   13.9
End         TGA         3    0.1         0    0.0         0    0.0
Trp         TGG       261    9.3        26   14.9        18   19.3
Leu         CTT       871   31.0        44   25.2        12   12.9
Leu         CTC        65    2.3         6    3.4        11   11.8
Leu         CTA        59    2.1         8    4.6         3    3.2
Leu         CTG        21    0.7         3    1.7         5    5.4
Pro         CCT       155    5.5        10    5.7         5    5.4
Pro         CCC        12    0.4         1    0.6         3    3.2
Pro         CCA       904   32.2        50   28.7        32   34.3
Pro         CCG         2    0.1         0    0.0         1    1.1
His         CAT       393   14.0        21   12.0         3    3.2
His         CAC        69    2.5         5    2.9        10   10.7
Gln         CAA       963   34.2        39   22.4        17   18.2
Gln         CAG        30    1.1         4    2.3        13   13.9
Arg         CGT       103    3.7         3    1.7         2    2.1
Arg         CGC         2    0.1         1    0.6         1    1.1
Arg         CGA        61    2.2         1    0.6         1    1.1
Arg         CGG         5    0.2         0    0.0         0    0.0
Ile         ATT      1579   56.2        84   48.2        18   19.3
Ile         ATC       176    6.3        14    8.0        15   16.1
Ile         ATA       221    7.9         9    5.2         9    9.6
Met         ATG       706   25.1        37   21.2        24   25.7
Thr         ACT       735   26.1        59   33.8        19   20.4
Thr         ACC       108    3.8        11    6.3         8    8.6
Thr         ACA       735   26.1        46   26.4        23   24.7
Thr         ACG        30    1.1         2    1.1         5    5.4
Asn         AAT      1203   42.8        74   42.4        21   22.5
Asn         AAC       265    9.4        27   15.5        17   18.2
Lys         AAA      1723   61.3       106   60.8        44   47.2
Lys         AAG       663   23.6        21   12.0        35   37.5
Ser         AGT       394   14.0        28   16.1        11   11.8
Ser         AGC        54    1.9         3    1.7         8    8.6
Arg         AGA       855   30.4        34   19.5        17   18.2
Arg         AGG        40    1.4         3    1.7         2    2.1
Val         GTT      1261   44.8        80   45.9        20   21.4
Val         GTC       162    5.8        10    5.7         8    8.6
Val         GTA       416   14.8        16    9.2        10   10.7
Val         GTG        60    2.1         5    2.9        12   12.9
Ala         GCT       938   33.4        63   36.1        22   23.6
Ala         GCC       104    3.7        11    6.3         6    6.4
Ala         GCA       883   31.4        34   19.5        25   26.8
Ala         GCG        13    0.5         1    0.6        11   11.8
Asp         GAT      1351   48.0        90   51.6        40   42.9
Asp         GAC       192    6.8        19   10.9        26   27.9
Glu         GAA      1944   69.1        82   47.0        41   43.9
Glu         GAG       162    5.8        14    8.0        26   27.9
Gly         GGT       432   15.4        37   21.2        15   16.1
Gly         GGC        20    0.7         2    1.1         4    4.3
Gly         GGA      1421   50.5        97   55.6        33   35.4
Gly         GGG        61    2.2         7    4.0         9    9.6
Total       NNN    28 117   1000      1744   1000       933   1000

Note. Number, total number of codons; Freq., frequency per thousand.

References

[1] Wada, K., Wada, Y., Ishibashi, F., Gojobori, T. and Ikemura, T. (1992) Codon usage tabulated from the GenBank genetic sequence data. Nucleic Acids Res. 20, suppl., 2111-2118.
[2] Mita, K., Ichimura, S. and Nenoi, M. (1991) Essential factors determining codon usage in ubiquitin genes. J. Mol. Evol. 33, 216-225.
[3] Porter, T.D. (1995) Correlation between codon usage, regional genomic nucleotide composition, and amino acid composition in the cytochrome P-450 gene superfamily. Biochim. Biophys. Acta 1261, 394-400.
[4] Sharp, P.M. and Matassi, G. (1994) Codon usage and genome evolution. Curr. Opin. Genet. Dev. 4, 851-860.
[5] Barrai, I., Scapoli, C., Nesti, C., Poli, G., Gambari, R. and Beretta, M. (1994) Codon usage and evolutionary rates of proteins. J. Theor. Biol. 166, 331-337.
[6] Long, M. and Gillespie, J.H. (1991) Codon usage divergence of homologous vertebrate genes and codon usage clock. J. Mol. Evol. 32, 6-15.
[7] Morrison, D.A., Ellis, J. and Johnson, A.M. (1994) An empirical comparison of distance matrix techniques for estimating codon usage divergence. J. Mol. Evol. 39, 533-536.
[8] Nesti, C., Poli, G., Chicca, M., Ambrosino, P., Scapoli, C. and Barrai, I. (1995) Phylogeny inferred from codon usage pattern in 31 organisms. Comput. Appl. Biosci. 11, 167-171.
[9] Goeddel, D.V., Yelverton, E., Ullrich, A., Heynecker, H.L., Miozzari, G., Holmes, W., Seeburg, P.H., Dull, T., May, L., Stebbing, N., Crea, R., Maeda, S., MacCandliss, A., Sloma, A., Tabor, J.M., Gross, M., Failletti, P.C. and Pestka, S. (1980) Human leukocyte interferon produced by E. coli is biologically active. Nature 287, 411-416.
[10] Sean Ha, D., Schwarz, J.K., Salvatore, J.T. and Beverley, S.M. (1996) Use of the green fluorescent protein as a marker in transfected Leishmania. Mol. Biochem. Parasitol. 77, 57-64.
[11] Levine, N.D. (1973) Protozoan Parasites of Domestic Animals and of Man, Burgess Publishing Company, Minneapolis, MN.
[12] Clark, C.G. and Diamond, L.S. (1997) Molecular phylogeny of the genus Entamoeba as revealed by riboprinting. Arch. Med. Res. 28, suppl., S69-S70.
[13] Diamond, L.S. and Clark, C.G. (1993) A redescription of Entamoeba histolytica Schaudinn, 1903 (Emended Walker, 1911) separating it from Entamoeba dispar Brumpt, 1925. J. Euk. Microbiol. 40, 340-344.
[14] Sanchez, L., Enea, V. and Eichinger, D. (1994) Identification of a developmentally regulated transcript expressed during encystation of Entamoeba invadens. Mol. Biochem. Parasitol. 67, 125-135.
[15] Villagomez Castro, J.C., Calvo Mendez, C. and Lopez Romero, E. (1992) Chitinase activity in encysting Entamoeba invadens and its inhibition by allosamidin. Mol. Biochem. Parasitol. 52, 53-62.
[16] Das, S. and Gillin, F.D. (1991) Chitin synthase in encysting Entamoeba invadens. Biochem. J. 280, 641-647.
[17] Tannich, E. and Horstmann, R.D. (1992) Codon usage in pathogenic Entamoeba histolytica. J. Mol. Evol. 34, 272-273.
[18] Char, S. and Farthing, M.J. (1992) Codon usage in Entamoeba histolytica. Int. J. Parasitol. 22, 381-383.
[19] Schmidt, M., Pichl, L., Lepper, M. and Feierabend, J. (1993) Identification of the nuclear-encoded chloroplast ribosomal protein L12 of the monocotyledonous plant Secale cereale and sequencing of two different cDNAs with strong codon bias. Biochim. Biophys. Acta 1172, 349-352.
[20] Char, S. and Farthing, M.J. (1992) Codon usage in Giardia lamblia. J. Protozool. 39, 642-644.
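The entries in Table 2 are codon counts and frequencies per thousand codons, f = 1000 n / N, where n is the count for one codon and N is the species total (28 117, 1744 or 933). The Python sketch below, with hypothetical function names, recomputes a few E. histolytica entries from the Table 2 counts. It also adds one derived quantity that the paper does not tabulate, offered only as an illustration: the share of an amino acid's synonymous codons ending in A or U (T in DNA letters), which makes third-position contrasts such as the tyrosine reversal in E. invadens easy to see.

```python
from typing import Dict, Iterable, Optional

# Codon counts copied from Table 2 (Phe, Leu TTN and Tyr codons only).
EH_COUNTS = {"TTT": 692, "TTC": 409, "TTA": 931, "TTG": 218, "TAT": 845, "TAC": 140}
EI_COUNTS = {"TTT": 20, "TTC": 15, "TTA": 8, "TTG": 27, "TAT": 13, "TAC": 26}
EH_TOTAL = 28117  # all codons analyzed for E. histolytica


def per_thousand(counts: Dict[str, int], total: Optional[int] = None) -> Dict[str, float]:
    """Frequency per 1000 codons for each codon in `counts`.

    `total` defaults to the sum of `counts`; pass the species-wide codon
    total when `counts` covers only some of the 64 codons.
    """
    if total is None:
        total = sum(counts.values())
    if total <= 0:
        raise ValueError("total number of codons must be positive")
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


def au_ending_share(counts: Dict[str, int], synonymous: Iterable[str]) -> float:
    """Percentage of one amino acid's codons whose third base is A or T (U)."""
    codons = list(synonymous)
    n_all = sum(counts.get(c, 0) for c in codons)
    if n_all == 0:
        raise ValueError("no codons observed for this amino acid")
    n_au = sum(counts.get(c, 0) for c in codons if c[2] in "AT")
    return 100.0 * n_au / n_all


if __name__ == "__main__":
    freqs = per_thousand(EH_COUNTS, total=EH_TOTAL)
    print({codon: round(f, 1) for codon, f in freqs.items()})
    tyr = ["TAT", "TAC"]
    print(round(au_ending_share(EH_COUNTS, tyr), 1), round(au_ending_share(EI_COUNTS, tyr), 1))
```

Run as a script, it reproduces 24.6, 14.5, 33.1, 7.8, 30.1 and 5.0 per thousand for TTT, TTC, TTA, TTG, TAT and TAC, the values printed in Table 2. It also shows that about 86% of E. histolytica tyrosine codons end in U (TAT), versus about 33% in E. invadens.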