Molecular & Biochemical Parasitology 142 (2005) 149–157
Expressed sequence tags from the plant trypanosomatid Phytomonas serpens

Georgios J. Pappas Jr. (a), Karim Benabdellah (c), Bianca Zingales (b), Antonio González (c,*)

(a) Genomic Sciences and Biotechnology Program, Universidade Católica de Brasília, Brasília, DF, Brazil
(b) Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil
(c) Instituto de Parasitología y Biomedicina, CSIC, Parque Tecnológico de Ciencias de la Salud, Avenida del Conocimiento, s/n E-18100 Armilla, Granada, Spain

Received 21 October 2004; received in revised form 24 March 2005; accepted 31 March 2005
Available online 18 April 2005
Abstract

We have generated 2190 expressed sequence tags (ESTs) from a cDNA library of the plant trypanosomatid Phytomonas serpens. Upon processing and clustering, the set of 1893 accepted sequences was reduced to 697 clusters consisting of 452 singletons and 245 contigs. Functional categories were assigned based on BLAST searches against a database of the eukaryotic orthologous groups of proteins (KOG). Thirty-six percent of the generated sequences showed no hits against the KOG database and 39.6% presented similarity to the KOG classes corresponding to translation, ribosomal structure and biogenesis. The most populated cluster contained 45 ESTs homologous to members of the glucose transporter family. This fact can be immediately correlated with the reported dependence of Phytomonas on anaerobic glycolytic ATP production due to the lack of a cytochrome-mediated respiratory chain. In this context, we identified not only a number of enzymes of the glycolytic pathway but also enzymes of the Krebs cycle and specific components of the respiratory chain. The data reported here, including a few hundred unique sequences and the description of tandemly repeated motifs and putative transcript stability motifs at untranslated mRNA ends, represent an initial approach to overcome the lack of information on the molecular biology of this organism.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Phytomonas serpens; Expressed sequence tags; Hexose transporters; Microsatellites; Stability motifs
Abbreviations: ARE, AU-rich elements; EST, expressed sequence tag; GIPL, glycoinositol phospholipid; GPI, glycosyl phosphatidylinositol; ND, NADH dehydrogenase subunit; SL, spliced leader; SSR, simple sequence repeats; TAO, alternative oxidase; TTA2, subunit of transamidase complex.
Note: Nucleotide sequences reported in this paper have been submitted to the GenBank™ database with the accession numbers CO723750–CO724446.
* Corresponding author. Tel.: +34 958 181 657; fax: +34 958 181 632. E-mail address: [email protected] (A. González).
0166-6851/$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.molbiopara.2005.03.017

1. Introduction

In 1909, the genus Phytomonas was introduced to designate trypanosomatids originally found in latex from Euphorbiaceae and later in a wide variety of plant species. Awareness of the parasitological status of Phytomonas spp. grew when it was discovered that they cause devastating diseases in commercially important crops, such as coconut, coffee, cassava and oil palm [1,2]. Surprisingly, however, it was also found that many other infected plant species do not show apparent damage. This intriguing difference has been linked to the host-tissue distribution of the microorganism, which is restricted mainly to three environments: latex ducts, fruits/seeds and phloem. Only isolates from the latter group have been associated with severe pathogenesis [1]. Classification of isolates as Phytomonas is based on certain morphological, biochemical and molecular characteristics that distinguish them from other plant-harbored kinetoplastids, such as Leptomonas, Herpetomonas or Crithidia. Nonetheless, no objective criteria have been established for species determination within the genus or for population identification [3].
Some peculiarities of Phytomonas metabolism have been studied in more detail. Most remarkable was the observation that several genes of the respiratory complexes are absent from the kinetoplast genome [4,5]. In line with the absence of cytochromes and the lack of a functional Krebs cycle, evidence was provided that ATP production is largely driven by glycolysis [6]. This adaptation seems to be well suited for Phytomonas endurance in the carbohydrate-rich medium provided by the plant host and is further supported by a very high concentration of glycosomes in the cytoplasm in comparison to other trypanosomatids [1,6]. Little is known about Phytomonas metabolism in the insect vector stage, where it has been postulated that amino acids would be the main energy source [4]. Finally, knowledge of a number of facets of the organism's fundamental biology has been hampered by the difficulty in obtaining pure in vitro cultures [1]. Therefore, compared to other trypanosomatids, there is a vast information gap on several essential issues such as life cycle, transmission, pathogenicity, treatment and prophylaxis.

In recent years, a surge of genomic sequencing projects has emerged for several kinetoplastids, with the expectation that they will help to understand key aspects of their distinctive biology. Trypanosomatid expressed sequence tags (ESTs) have been populating public databases with valuable information and have been used to expand basic knowledge, expose novel gene products and provide useful molecular markers, among other benefits. The situation for Phytomonas contrasts sharply with this setting. Studies on the molecular biology of phytoflagellates have mostly focused on taxonomic or diagnostic purposes [7,8]. Therefore, mostly high copy number genes have been characterized: spliced leader [3,9], ribosomal RNA [10] and kinetoplast DNA minicircles [11]. In addition, very few Phytomonas spp. genes have been sequenced to date.
To overcome this scenario we undertook a moderate gene survey by means of the generation, sequencing and analysis of Phytomonas serpens ESTs aiming to unveil new features and shed more light on lingering questions related to metabolism, host interaction and molecular systematics of this genus.
2. Materials and methods

2.1. Organism

Phytomonas serpens, strain 10T, was isolated from tomato [12]. It was grown at 28 °C in Grace's medium (Sigma) supplemented with 1.2 g L⁻¹ CaCl2, 1 g L⁻¹ D-glucose, 56 mg L⁻¹ penicillin, 100 mg L⁻¹ streptomycin and 48 mg L⁻¹ gentamycin. pH was adjusted to 6.1 with NaOH. The medium was sterilized by filtration and inactivated fetal calf serum (GIBCO-BRL) was added (10%).

2.2. Library construction and sequencing

Log-phase cells were collected by centrifugation at 2500 rpm for 10 min and washed twice with PBS. Poly A(+) RNA was prepared from 10⁹ cells using the QuickPrep™ Micro mRNA Purification Kit (Amersham Pharmacia Biotech). cDNA first strand was synthesized with Superscript II reverse transcriptase (GIBCO-BRL) using an oligo-dT-Not primer (5′-CCTGCGGCCGC(T)18). Synthesis of the second strand was performed using the exo⁻ Klenow fragment of Escherichia coli DNA polymerase and a spliced leader primer (5′-GATACAGTTTCTGTA). After methylation with EcoRI methylase, phosphorylated EcoRI linkers (5′-ACGGAATTCGT) were ligated to the cDNA. The resulting cDNA mixture was then digested with NotI and EcoRI restriction endonucleases, subjected twice to size fractionation on SizeSep™ 400 Spun Columns (Pharmacia) and ligated to dephosphorylated, NotI/EcoRI-digested pBluescript SK+ vector (Invitrogen). Primary clones were obtained through transformation of competent E. coli XL1 Blue cells. Plasmid templates from randomly selected colonies were prepared using 96-Plasmid Purification System Kits for the Biomek 2000 Automation Workstation (Beckman). Sequencing was carried out with a T7 primer using an ABI 377 automated DNA sequencer (Applied Biosystems Inc.).

2.3. EST processing pipeline

Sequence analysis began with base calling using the program phred [13]. Sequence quality trimming was performed with the program lucy [14], and vector masking with the program cross_match (B. Ewing, unpublished). The processed sequences were clustered with the program tgicl [15], generating the sequence consensi that were used in sequence annotation.

2.4. Sequence annotation

Sequence similarity searches were performed with the BLAST version 2.2.6 suite of programs [16], against the non-redundant sequence database from the National Center for Biotechnology Information (NCBI) obtained in November 2004. A bit-score greater than 60 bits and a minimum alignment length of 70 amino acids were set as the threshold to consider a BLAST hit significant.
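The significance threshold stated in Section 2.4 can be written as a simple filter. The following is a minimal sketch, assuming BLAST hits have already been parsed into records carrying a bit-score and an alignment length in amino acids. The `BlastHit` type, its field names and the helper functions are hypothetical illustrations and are not taken from the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Thresholds as stated in Section 2.4.
MIN_BIT_SCORE = 60.0    # bit-score must be greater than this value
MIN_ALIGN_LENGTH = 70   # minimum alignment length, in amino acids


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical container for one parsed BLAST hit."""
    query_id: str
    subject_id: str
    bit_score: float
    align_length: int


def is_significant(hit: BlastHit) -> bool:
    """Return True if the hit satisfies both thresholds."""
    return hit.bit_score > MIN_BIT_SCORE and hit.align_length >= MIN_ALIGN_LENGTH


def best_significant_hits(hits: Iterable[BlastHit]) -> Dict[str, BlastHit]:
    """Keep, for each query, the significant hit with the highest bit-score."""
    best: Dict[str, BlastHit] = {}
    for hit in hits:
        if not is_significant(hit):
            continue
        current = best.get(hit.query_id)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.query_id] = hit
    return best
```

Queries that do not appear in the returned dictionary would correspond to sequences counted as having no significant hit.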
Functional categories were assigned based on BLAST searches against a specially formatted database of the eukaryotic orthologous groups of proteins (KOG; [17]). All steps from pre-processing to final annotation were executed, analyzed and consolidated automatically by several programs written in the Perl language, which are available upon request.

2.5. PCR amplification

The 5′ region of the hexose transporter genes was amplified by PCR from purified P. serpens DNA using standard conditions and the primers HTFW (5′-CCAGCGACACAACAT) and HTBW (5′-CGAATACACGCTCAC). Cycling conditions were: 1 min at 95 °C, 1.5 min at 42 °C and 40 s at 72 °C. Amplification products were separated by agarose gel electrophoresis and stained with ethidium bromide.
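The read-acceptance criterion used with the pipeline of Section 2.3 (at least 200 bases with a phred base-quality value of 20) rests on the standard phred relation Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The sketch below illustrates that relation and a simplified acceptance check. It is not the trimming algorithm implemented in lucy, and the function names and default parameters are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def phred_to_error_probability(quality: float) -> float:
    """Convert a phred quality value Q into an error probability P = 10^(-Q/10)."""
    return 10.0 ** (-quality / 10.0)


def error_probability_to_phred(probability: float) -> float:
    """Convert an error probability P into a phred quality value Q = -10 log10(P)."""
    return -10.0 * math.log10(probability)


def passes_quality_filter(qualities: Sequence[int],
                          min_quality: int = 20,
                          min_good_bases: int = 200) -> bool:
    """Simplified check: does the read have enough bases at or above min_quality?

    This only counts bases; a real trimmer also considers where the
    good-quality bases lie along the read.
    """
    good_bases = sum(1 for q in qualities if q >= min_quality)
    return good_bases >= min_good_bases
```

With this relation, a quality value of 20 corresponds to an error probability of 0.01, that is, the 1% probability of miscalling quoted in the text.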
3. Results and discussion

3.1. EST annotation

Sequencing of random cDNA clones yielded a total of 2190 raw electropherograms that were processed with the base-calling program phred. Sequences with their phred-generated base-quality values were then subjected to a series of processing steps: trimming of poor-quality regions with the program lucy, masking of vector and Phytomonas spliced leader [18] sequences with the program cross_match, and a final cleaning step consisting of the removal of terminal masked and poly A/T regions. After this preprocessing, 1893 sequences (86.4% of the total) were selected for further analysis, given the constraint of having at least 200 bases with a 1% probability of miscalling (phred base-quality value of 20). EST clustering was performed with the program tgicl in order to remove the intrinsic sequence redundancy present in the cDNA library. Clustering indicated the presence of 697 clusters, of which 452 contained only one EST (singletons), accounting for 64.85% of the clusters, whereas 245 clusters (35.15%) contained multiple ESTs (contigs). The 697 consensi correspond to 36.82% of the accepted sequences and were the working set of unique sequences subjected to annotation. These values reveal a moderate to high redundancy level but, since there are roughly twice as many singletons as contigs, it follows that few clusters are densely populated, thus facilitating the appearance of novel sequences. Therefore, the generated cDNA library is a useful source of new sequence information for Phytomonas. A more detailed overview of the distribution of redundant sequences is given by a classification of contigs based on the number of ESTs per cluster (Fig. 1A). As expected, there is an inverse relationship between the number of clusters and
Fig. 1. (A) Distribution of the number of ESTs per cluster, based on the clustering generated by the program tgicl. (B) Distribution of the 1893 ESTs according to functional classes based on the KOG classification. An EST was included in a KOG class if there was a BLAST hit against the KOG database with an E-value better than 10⁻¹⁰.
Table 1. List of best BLASTX hits against NCBI's non-redundant database, ranked by alignment bit-score

| EST | Description | NCBI accession | E-value | Bit-score |
|---|---|---|---|---|
| CL152 | Major paraflagellar rod component PAR 2 (Trypanosoma cruzi) | A45112 | 1 × 10^-160 | 1454 |
| CL41 | p36 LACK protein (Leishmania amazonensis) | AAK51530 | 1 × 10^-126 | 1161 |
| CL45 | Glycosomal glyceraldehyde-3-phosphate dehydrogenase (Phytomonas sp.) | AAD02468 | 1 × 10^-110 | 1028 |
| CL33 | Heat shock protein 70 (Leishmania braziliensis) | AAG01344 | 1 × 10^-108 | 1007 |
| CL139 | 2-Hydroxyacid dehydrogenase (Phytomonas sp. isolate Ech1) | AAG01145 | 1 × 10^-106 | 989 |
| CL121 | Aldolase (Leishmania mexicana) | 1EPX | 1 × 10^-104 | 968 |
| CL62 | Beta-tubulin (Leishmania mexicana) | AAK31149 | 1 × 10^-103 | 962 |
| P1031 | Proteasome alpha 4 subunit (Trypanosoma brucei) | Q9NDA2 | 4 × 10^-96 | 903 |
| P837 | Carbamoyl-phosphate synthetase (Leishmania mexicana amazonensis) | BAA94293 | 8 × 10^-96 | 899 |
| CL113 | Phosphoenolpyruvate carboxykinase glycosomal (Trypanosoma cruzi) | P51058 | 4 × 10^-95 | 893 |
| CL106 | Phosphoglycerate kinase glycosomal (Crithidia fasciculata) | P08967 | 2 × 10^-93 | 879 |
| CL140 | Paraflagellar rod protein 1D (Leishmania mexicana) | AAO25623 | 7 × 10^-92 | 865 |
| CL143 | S-adenosylmethionine synthetase (Leishmania donovani) | O43938 | 1 × 10^-89 | 846 |
| CL26 | Laminin receptor precursor-like protein (Trypanosoma cruzi) | AAD30064 | 5 × 10^-89 | 841 |
| CL64 | Paraflagellar rod protein 2C (Leishmania mexicana mexicana) | AAB17719 | 8 × 10^-88 | 831 |
| P2670 | ADP-ribosylation factor (Trypanosoma cruzi) | AAF82562 | 4 × 10^-86 | 816 |
| CL243 | Heat shock protein 90 homolog (Trypanosoma cruzi) | A26125 | 2 × 10^-84 | 802 |
| P2301 | GTP-binding protein rtb2 (Trypanosoma brucei) | AAA79869 | 2 × 10^-83 | 792 |
| P2442 | Dihydrolipoamide dehydrogenase (Trypanosoma cruzi) | CAA72132 | 8 × 10^-83 | 787 |
| CL175 | p45 (Leishmania major) | AAF04629 | 3 × 10^-80 | 765 |
| CL89 | Guanine nucleotide-binding protein beta subunit-like protein (Trypanosoma brucei) | Q94775 | 1 × 10^-79 | 760 |
| CL122 | Adenine phosphoribosyltransferase (Leishmania donovani) | AAC37295 | 5 × 10^-79 | 756 |
| P2750 | Calmodulin (Trypanosoma cruzi) | P18061 | 7 × 10^-79 | 754 |
| CL159 | MRP-family nucleotide-binding protein (Leishmania major) | CAC14524 | 1 × 10^-78 | 751 |
| CL226 | Cofactor-independent phosphoglycerate mutase (Leishmania mexicana) | CAD66620 | 4 × 10^-78 | 747 |
| P1441 | S-adenosyl-L-methionine-C-24-delta-sterol-methyltransferase A (Leishmania donovani) | AAR92098 | 5 × 10^-77 | 737 |
| P1222 | Succinyl-CoA ligase alpha-chain mitochondrial precursor (Homo sapiens) | P53597 | 5 × 10^-75 | 720 |
| P2090 | Arginine N-methyltransferase probable (Trypanosoma brucei) | CAB95620 | 4 × 10^-74 | 712 |
| P2870 | Cyclophilin (Leishmania major) | CAA73904 | 7 × 10^-74 | 711 |
| CL145 | Probable nucleolar protein involved in pre-rRNA processing (Leishmania major) | CAC37159 | 3 × 10^-73 | 704 |
| P169 | Rab1 (Trypanosoma brucei) | CAA68211 | 2 × 10^-72 | 696 |
| CL228 | BiP/GRP78 ER chaperone (Trypanosoma brucei) | AAC37174 | 5 × 10^-71 | 686 |
| P2493 | Acetyl-CoA synthetase (Leishmania major) | CAB55376 | 6 × 10^-71 | 685 |
| P754 | Squalene synthase (Leishmania major) | AAC17923 | 8 × 10^-71 | 684 |
| P1145 | Probable pseudouridylate synthetase (Leishmania major) | NP_047053 | 7 × 10^-71 | 682 |

cluster size in terms of EST counts. An important observation regarding the highly populated clusters is that the relative levels of gene expression can provide a snapshot of the physiological state of the cells. The functional identity of all sequences was inferred from similarity searches using the program BLASTX against NCBI's non-redundant database. A massive presence of sequences encoding ribosomal proteins and histones was found in the most populated clusters, which is indicative of the active engagement of the cells in protein synthesis and proliferation (annotation of the most populated EST clusters is reported in Table S1, supplementary data). Further, BLASTX searches revealed a high number of clusters with no significant hits (59.9%, 418/697). In order to improve functional assignment for these sequences, the latest draft sequence of the Trypanosoma brucei genome was obtained from TIGR (February 2005; http://www.tigr.org/tdb/e2k1/tba1/) as the basis for a trypanosomatid-specific comparative study. With the use of the T. brucei database, the number of clusters with no hits dropped from 418 to 314, or 45.05% of the clusters. Next, the annotations
from these 104 additional sequences from T. brucei were taken from GeneDB [19], which provides a curated information source for several protozoa. There was no significant improvement in the annotation, since the majority of cases corresponded to hypothetical conserved proteins (75/104; 72.11%). Nevertheless, the noticeable increase in the number of potentially homologous sequences coming from T. brucei and the availability, in the foreseeable future, of the complete genome sequences of three trypanosomatids (T. brucei, T. cruzi and Leishmania major) indicate that comparative genomics is a promising approach to support Phytomonas research. A general functional classification of the ESTs can be obtained by BLASTX similarity searches against the KOG database, a compendium of predicted orthologous groups of proteins from seven complete eukaryotic genomes that provides a classification of the groups into broader functional categories [17]. Unlike the previous analysis, where qualitative functional assignment of the unique sequences was sought for general gene discovery, here the redundant EST set (1893 sequences) is subjected to classification according
to KOG's general functional categories in order to estimate quantitatively the cellular processes in which the ESTs are engaged. An EST was included in a KOG class when the BLASTX hit had a bit-score higher than 60 (E-value better than 10⁻¹⁰ for this database). The results are shown in Fig. 1B. In agreement with the annotation of the most populated EST clusters (Table S1, supplementary data), the most prominent KOG classes correspond to the translation, ribosomal structure and biogenesis categories, comprising 750 ESTs or 39.6% of the total. This value is even higher than the proportion of ESTs left unassigned owing to the absence of BLAST hits at the chosen similarity threshold (Fig. 1B; no hits, 36%). To put this into perspective, ribosomal-related proteins accounted for 12.5% of T. brucei rhodesiense bloodstream-form ESTs [20] and 22.0% of T. carassii trypomastigote ESTs [21], obtained from non-normalized libraries with a number of sequenced clones similar to that of this study. A characteristic of the KOG classes is that they are highly biased toward housekeeping proteins and, consequently, an individual description of the ESTs is necessary to provide a better picture of the annotation. In Table 1, we show a list of 35 ESTs with bit-scores between 1454 and 682 and the description of the best BLASTX hits against NCBI's non-redundant database, after filtering out those hits previously identified as housekeeping genes. A list of 122 ESTs ranked from 1454 to 181 alignment bit-scores is presented in Table S1, supplementary data. The best hit of 100 of these ESTs (84%; 100/119) was observed with previously reported kinetoplastid proteins (see accession numbers in Table S1, supplementary data).
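The redundancy and classification percentages quoted in Section 3.1 follow directly from the reported counts. The short sketch below recomputes them as a consistency check. All counts are taken from the text, whereas the constant names, the `percent` helper and the `summary` dictionary are illustrative only.

```python
def percent(part: int, whole: int, digits: int = 2) -> float:
    """Return part/whole as a percentage rounded to the given number of digits."""
    return round(100.0 * part / whole, digits)


# Counts reported in the text.
RAW_READS = 2190             # raw electropherograms
ACCEPTED_ESTS = 1893         # sequences passing preprocessing
SINGLETONS = 452             # clusters with a single EST
CONTIGS = 245                # clusters with multiple ESTs
NO_HIT_NR = 418              # clusters without a significant hit in the nr database
NO_HIT_WITH_TBRUCEI = 314    # clusters still without hits after adding T. brucei data
TRANSLATION_KOG_ESTS = 750   # ESTs in the translation-related KOG classes

CLUSTERS = SINGLETONS + CONTIGS

summary = {
    "clusters": CLUSTERS,
    "accepted_pct": percent(ACCEPTED_ESTS, RAW_READS, 1),
    "singleton_pct": percent(SINGLETONS, CLUSTERS),
    "contig_pct": percent(CONTIGS, CLUSTERS),
    "unique_pct": percent(CLUSTERS, ACCEPTED_ESTS),
    "rescued_by_tbrucei": NO_HIT_NR - NO_HIT_WITH_TBRUCEI,
    "no_hit_after_tbrucei_pct": percent(NO_HIT_WITH_TBRUCEI, CLUSTERS),
    "translation_kog_pct": percent(TRANSLATION_KOG_ESTS, ACCEPTED_ESTS, 1),
}

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value}")
```

The ratio of unique consensi to accepted ESTs (`unique_pct`) is the quantity the text uses as its measure of library redundancy.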
Fig. 2. (A) Sequence alignment of the amino-terminal end of the two hexose transporter isoforms found in one of the most populated EST clusters. (B) Electrophoretic separation of PCR amplification products from genes encoding hexose transporters: 100-bp ladder molecular weight marker (1); PCR products in the presence (2) or absence (3) of P. serpens DNA template.
3.2. Hexose transporters

The most populated cluster contains 45 ESTs, or 2.38% of the total accepted sequences. Similarity searches indicate that the sequences of this cluster are very similar to well-known members of the glucose transporter family. This high redundancy can be correlated with the Phytomonas energy metabolism, which is almost entirely dependent on substrate-level phosphorylation. High levels of transporter mRNA expression seem to be essential to guarantee the supply of carbon sources needed to sustain such high glycolytic activity. The kinetoplastid glucose transporter family consists of integral membrane proteins that mediate facilitated diffusion of glucose across the plasma membrane [22]. These transporters also recognize, with reduced affinity, other monohexoses such as fructose, mannose and galactose [23,24]. Where known, this family consists of a single cluster of genes with a species-specific gene organization pattern. For instance, T. brucei possesses two distinct transporters, THT1 and THT2, with six tandemly repeated copies of the former followed by five copies of the latter [25]. In T. congolense, the isoforms TcoHT1 and TcoHT2 are 92.4% identical and there are three copies of TcoHT2 interspersed with two copies of TcoHT1 [24]. For T. cruzi, only one isoform (TcrHT1) in eight tandem copies was reported [26].
The extremely high relative number of glucose transporter ESTs in Phytomonas suggests the presence of multiple gene copies in the genome of this organism. Further analysis of the EST cluster revealed the presence of two highly similar glucose transporter sequences, differing in the N-terminal region of the predicted protein (Fig. 2A). An analogous observation was made earlier for L. enriettii, where the two transporter isoforms differed only at their N-terminus [27]. Later it was shown that this difference was linked to cellular targeting, with one isoform located primarily in the flagellar membrane and the other in the plasma membrane [28]. While further studies are required to verify whether this holds true for Phytomonas, the PCR result presented in Fig. 2B, obtained with primers common to both isoforms, shows that the band corresponding to the larger isoform (here named I1) is more intense than the band of the shorter isoform (I2). After counting isoform numbers in the hexose transporter cluster we found 23 copies of I1 and 9 of I2, in good agreement with the PCR amplification result. Both lines of evidence indicate (i) that Phytomonas also contains a glucose transporter gene family, which is repeated in the genome, the larger isoform being present in higher copy number, and (ii) that active monohexose import seems to play a key role in the survival of the organism. In this context, several studies point out that glucose transporters are ideal drug targets for trypanosomatid control [22].
3.3. Energy metabolism

The data in Fig. 1B show that 2% of the ESTs fall into the KOG functional class of energy production and conversion. Analysis of the annotated genes (Table 1 and Table S1, supplementary data) indicates the presence of several enzymes thought to be contained in glycosomes, a trypanosomatid-specific organelle where most of the glycolytic pathway enzymes are concentrated [6,29,30]. Among them, we identified glyceraldehyde-3-phosphate dehydrogenase (CL45), 2-hydroxyacid dehydrogenase (CL139), aldolase (CL121), phosphoenolpyruvate carboxykinase (CL113) and phosphoglycerate kinase (CL106) (Table 1); and glycerol kinase (P2892) and hexokinase (P1376) (Table S1, supplementary data). In the genus Phytomonas, production of ATP is mostly due to anaerobic glycolysis [6,29,30] and the mitochondrion is respiratory-deficient, with several components of the electron transport chain missing, most notably maxicircle-encoded components of the respiratory complexes III and IV, such as subunits I–III of cytochrome oxidase and apocytochrome b [5]. Nonetheless, we identified other components of the respiratory chain: the previously reported Phytomonas alternative oxidase, TAO ([31]; CL93) and NADH2 dehydrogenase chain 5 (ND5 gene; P175) (see Table S1, supplementary data). No cytochromes could be found, confirming earlier biochemical assays [6]. The presence of ESTs coding for TAO and the ND5 subunit in Phytomonas suggests a constitutive process analogous to that found in the bloodstream trypomastigotes of T. brucei, where oxidative phosphorylation and cytochrome-mediated electron transport are shut down and the NADH dehydrogenase complex works by supplying electrons, via ubiquinol, to TAO [30].
3.4. Other ESTs of potential interest

Several other identified ESTs can shed some light on physiological aspects of the parasite lifestyle. For instance, various isoforms of nucleoside transporters were found (P2657, P2932, P2491) (see Table S1, supplementary data). This is consistent with the observation of a family of clustered genes in T. brucei [32] and reflects the fundamental role of the purine salvage pathway observed in kinetoplastids [33]. Amino acid metabolism is thought to be an essential source of energy for the parasite while inside the insect vector. It was shown that proline plays an important role in trypanosomatids, being metabolized through the Krebs cycle, which seems not to be the case for Phytomonas [30]. This is a significant issue and therefore alternative candidates are needed. Some clues can be provided by the presence of a putative threonine aldolase (P1473) (see Table S1, supplementary data), which catalyzes the conversion of threonine to glycine and acetaldehyde. Several other ESTs related to proteins acting in the synthesis and processing of by-products of carbohydrate metabolism were identified. An EST similar to the phosphomannose isomerase of L. mexicana (P1860) was found; this enzyme isomerizes fructose-6-phosphate to mannose-6-phosphate [34], the starting compound for the synthesis of activated mannose derivatives used in glycoprotein and glycolipid biogenesis. These compounds, in the form of glycosyl phosphatidylinositol (GPI)-anchored proteins and glycoinositol phospholipids (GIPL), are particularly abundant in trypanosomatids, forming a dense glycocalyx. Another identified EST directly involved in this process corresponded to the TTA2 subunit of the GPI transamidase complex (P1412), responsible for linking a GPI anchor to the C-terminus of acceptor proteins [35]. In Phytomonas, the presence of four unique glycolipids has been reported [36] and the biosynthetic pathway of these molecules is a potential target for the development of parasite-specific drugs. Previous biochemical characterization of a Phytomonas isolate from the latex plant Euphorbia characias revealed the activity of several enzymes in culture supernatants [6], in particular invertase, amylase, carboxymethylcellulase and polygalacturonase. These enzymes are involved in the degradation of plant polysaccharides (starch, cellulose, pectin) and could constitute a parasite strategy to scavenge monosaccharides from the host. Since specific activities for these enzymes were high even in the case of short-term isotonic buffer incubations [6], we expected to detect them in the EST collection. However, this turned out not to be the case, perhaps due to our limited sample size. Nevertheless, the observation that fruit flagellates grow well on media containing monohexoses but fail to grow with disaccharides [1] may provide supporting evidence that the P. serpens strain isolated from tomato fruit indeed does not possess invertase activity, as opposed to the latex strains.

3.5. Tandemly repeated motifs
Microsatellites or simple sequence repeats (SSR) are regions of tandemly repeated DNA segments constituted by units of up to six bases. Length polymorphisms between alleles of the same locus are extensively used as genetic markers. Because the classification of Phytomonas is still an open debate, we utilized our EST dataset to survey the occurrence of SSRs in this sample as an initial step for development of new markers. The consensi sequences were scanned for SSRs of unit length from two to six using the program mreps [37]. In total, 128 (17.75%) consensi displayed an SSR with lengths of more than 10 bases, of which 98 came from singletons (13.59%) and 30 from populated clusters (4.16%). Analysis of 58 di-nucleotide repeats revealed a clear bias for TA and AT repeats, which accounted, respectively, for 36% (21/58) and 29% (17/58) of the observations. In the case of tri-nucleotides, there was not a strong occurrence of any particular type with probably the exception of TTA with 5 appearances in 33 scored regions. It should be pointed out that
Fig. 3. (A) Sequence logos representation of ESTs possessing A/T-rich motifs as identified by program meme. Numbering in the abscissa refers to position in the alignment block and does not relate to actual position in the sequences. The central upper line represents the region where motifs are located. Flanking sequence regions (positions 1–10 and 40–48) are shown as reference. (B) ESTs displaying A/T-rich segments. Annotations are based on BLASTX hits against non-redundant database.
the final counts for both classes were recorded from different clusters.

3.6. Motifs related to transcript stability

In trypanosomatids, transcription is polycistronic and control of gene expression is exercised mainly at the post-transcriptional level, primarily by processing of precursor RNA, modulation of mRNA stability and translation efficiency. Stage-specific destabilization of mRNAs that possess particular sequence patterns in their 3′-untranslated regions has been reported in trypanosomatids [38]. One such pattern, the AU-rich element (ARE), is widely present in eukaryotes and controls mRNA stability upon interaction with specific RNA-binding proteins [39]. The existence of ARE regions in Phytomonas ESTs was therefore probed by using the program meme [40], which
searches for motifs in the unaligned set of consensi sequences. Several ESTs with sizeable A/T stretches, which can potentially be assigned as AREs, were identified. The results are shown in Fig. 3A, represented as sequence logos [41], where the size of a nucleotide is scaled to its frequency in the particular alignment column. In order to verify whether sequences sharing similar regulatory motifs could be subjected to joint regulation mediated by the same cis-acting factors, we examined the putative function of the consensi depicted in Fig. 3A. The annotations shown in Fig. 3B do not disclose clear functional relationships that could suggest coordinated expression. On the other hand, some members of the list have poorly characterized functions, and more studies are needed to verify the biological relevance of sharing such motifs.

In conclusion, the EST sequencing and annotation approach used in this work has provided some new insights
about general aspects of the metabolism of the plant trypanosomatid P. serpens. Despite the intrinsic problems of this methodology, such as sequencing errors and incomplete sequences, it proved useful, mainly because of the scarcity of sequences in public databases for this organism. We identified several hundred unique sequences, which provide a starting point to tackle a number of intriguing questions that remain to be answered given the particularities of this organism's metabolism.
Acknowledgements

This work was supported by grant BIO2002-02228 from Ministerio de Ciencia y Tecnología (Spain) to A.G. and grants from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, Brazil) to G.P. and B.Z. and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, Brazil) to B.Z. We thank Dr. Fernán Agüero for critical reading of the manuscript, Antonio Lario for technical support, and Dr. Najib El-Sayed from TIGR for the authorization to access T. brucei genomic data.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.molbiopara.2005.03.017.
References

[1] Camargo EP. Phytomonas and other trypanosomatid parasites of plants and fruit. Adv Parasitol 1999;42:29–112.
[2] Sánchez-Moreno M, Fernández-Becerra C, Mascaró C, et al. Isolation, in vitro culture, ultrastructure study, and characterization by lectin-agglutination tests of Phytomonas isolated from tomatoes (Lycopersicon esculentum) and cherimoyas (Anona cherimolia) in southeastern Spain. Parasitol Res 1995;81:575–81.
[3] Dollet M, Sturm NR, Campbell DA. The spliced leader RNA gene array in phloem-restricted plant trypanosomatids (Phytomonas) partitions into two major groupings: epidemiological implications. Parasitology 2001;122:289–97.
[4] González-Halphen D, Maslov DA. NADH-ubiquinone oxidoreductase activity in the kinetoplasts of the plant trypanosomatid Phytomonas serpens. Parasitol Res 2004;92:341–6.
[5] Nawathean P, Maslov DA. The absence of genes for cytochrome c oxidase and reductase subunits in maxicircle kinetoplast DNA of the respiration-deficient plant trypanosomatid Phytomonas serpens. Curr Genet 2000;38:95–103.
[6] Sánchez-Moreno M, Lasztity D, Coppens I, Opperdoes FR. Characterization of carbohydrate metabolism and demonstration of glycosomes in a Phytomonas sp. isolated from Euphorbia characias. Mol Biochem Parasitol 1992;54:185–99.
[7] Hollar L, Maslov DA. A phylogenetic view on the genus Phytomonas. Mol Biochem Parasitol 1997;89:295–9.
[8] Serrano MG, Nunes LR, Campaner M, et al. Trypanosomatidae: Phytomonas detection in plants and phytophagous insects by PCR amplification of a genus-specific sequence of the spliced leader gene. Exp Parasitol 1999;91:268–79.
[9] Teixeira MM, Serrano MG, Nunes LR, et al. Trypanosomatidae: a spliced-leader-derived probe specific for the genus Phytomonas. Exp Parasitol 1996;84:311–9.
[10] Dollet M, Sturm NR, Sánchez-Moreno M, Campbell DA. 5S ribosomal RNA gene repeat sequences define at least eight groups of plant trypanosomatids (Phytomonas spp.): phloem-restricted pathogens form a distinct section. J Eukaryot Microbiol 2000;47:569–74.
[11] Dollet M, Sturm NR, Ahomadegbe JC, Campbell DA. Kinetoplast DNA minicircles of phloem-restricted Phytomonas associated with wilt diseases of coconut and oil palms have a two-domain structure. FEMS Microbiol Lett 2001;205:65–9.
[12] Jankevicius JV, Jankevicius SI, Campaner M, et al. Life cycle and culturing of Phytomonas serpens (Gibbs), a trypanosomatid parasite of tomatoes. J Protozool 1989;36:265–71.
[13] Ewing B, Green P. Base-calling of automated sequencer traces using phred II. Error probabilities. Genome Res 1998;8:186–94.
[14] Chou HH, Holmes MH. DNA sequence quality trimming and vector removal. Bioinformatics 2001;17:1093–104.
[15] Pertea G, Huang X, Liang F, et al. TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics 2003;19:651–2.
[16] Altschul SF, Madden TL, Schaffer AA, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389–402.
[17] Tatusov RL, Fedorova ND, Jackson JD, et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003;4:41.
[18] Nunes LR, Teixeira MM, Camargo EP, Buck GA. Sequence and structural characterization of the spliced leader genes and transcripts in Phytomonas. Mol Biochem Parasitol 1995;74:233–7.
[19] Hertz-Fowler C, Peacock CS, Wood V, et al. GeneDB: a resource for prokaryotic and eukaryotic organisms. Nucleic Acids Res 2004;32:D339–43.
[20] Djikeng A, Agufa C, Donelson JE, Majiwa PA.
Generation of expressed sequence tags as physical landmarks in the genome of Trypanosoma brucei. Gene 1998;221:93–106. Ag¨uero F, Campo V, Cremona L, et al. Gene discovery in the freshwater fish parasite Trypanosoma carassii: identification of trans-sialidase-like and mucin-like genes. Infect Immun 2002;70: 7140–4. Barrett MP, Tetaud E, Seyfang A, Bringaud F, Baltz T. Trypanosome glucose transporters. Mol Biochem Parasitol 1998;91:195– 205. Langford CK, Kavanaugh MP, Stenberg PE, et al. Functional expression and subcellular localization of a high-Km hexose transporter from Leishmania donovani. Biochemistry 1995;34:11814–21. Vedrenne C, Bringaud F, Barrett MP, Tetaud E, Baltz T. The structure-function relationship of functionally distinct but structurally similar hexose transporters from Trypanosoma congolense. Eur J Biochem 2000;267:4850–60. Bringaud F, Baltz T. A potential hexose transporter gene expressed predominantly in the bloodstream form of Trypanosoma brucei. Mol Biochem Parasitol 1992;52:111–21. Tetaud E, Bringaud F, Chabas S, Barrett MP, Baltz T. Characterization of glucose transport and cloning of a hexose transporter gene in Trypanosoma cruzi. Proc Natl Acad Sci USA 1994;91:8278–82. Stack SP, Stein DA, Landfear SM. Structural isoforms of a membrane transport protein from Leishmania enriettii. Mol Cell Biol 1990;10:6785–90. Piper RC, Xu X, Russell DG, Little BM, Landfear SM. Differential targeting of two glucose transporters from Leishmania enriettii is mediated by an NH2-terminal domain. J Cell Biol 1995;128:499–508. Chaumont F, Schanck AN, Blum JJ, Opperdoes FR. Aerobic and anaerobic glucose metabolism of Phytomonas sp. isolated from Euphorbia characias. Mol Biochem Parasitol 1994;67:321–31.
G.J. Pappas Jr. et al. / Molecular & Biochemical Parasitology 142 (2005) 149–157 [30] Hannaert V, Bringaud F, Opperdoes FR, Michels PA. Evolution of energy metabolism and its compartmentation in kinetoplastida. Kinetoplastid Biol Dis 2003;2:11. [31] Van Hellemond JJ, Simons B, Millenaar FF, Tielens AG. A gene encoding the plant-like alternative oxidase is present in Phytomonas but absent in Leishmania spp. J Eukaryot Microbiol 1998;45:426–30. [32] S´anchez MA, Tryon R, Green J, Boor I, Landfear SM. Six related nucleoside/nucleobase transporters from Trypanosoma brucei exhibit distinct biochemical functions. J Biol Chem 2002;277:21499–504. [33] Landfear SM. Molecular genetics of nucleoside transporters in Leishmania and African trypanosomes. Biochem Pharmacol 2001; 62:149–55. [34] Garami A, Ilg T. The role of phosphomannose isomerase in Leishmania mexicana glycoconjugate synthesis and virulence. J Biol Chem 2001;276:6566–75. [35] Nagamune K, Ohishi K, Ashida H, et al. GPI transamidase of Trypanosoma brucei has two previously uncharacterized (trypanoso-
[36]
[37]
[38] [39] [40]
[41]
157
matid transamidase 1 and 2) and three common subunits. Proc Natl Acad Sci USA 2003;100:10682–7. Redman CA, Schneider P, Mehlert A, Ferguson MA. The glycoinositol-phospholipids of Phytomonas. Biochem J 1995;311(Pt 2):495–503. Kolpakov R, Bana G, Kucherov G. mreps: efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res 2003; 31:3672–8. D’Orso I, De Gaudenzi JG, Frasch AC. RNA-binding proteins and mRNA turnover in trypanosomes. Trends Parasitol 2003;19:151–5. Wilusz CJ, Wormington M, Peltz SW. The cap-to-tail guide to mRNA turnover. Nat Rev Mol Cell Biol 2001;2:237–46. Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 1994;2:28–36. Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res 1990;18:6097–100.