Gene 221 (1998) 93–106
Generation of expressed sequence tags as physical landmarks in the genome of Trypanosoma brucei
Appolinaire Djikeng a, Caroline Agufa a,1, John E. Donelson b, Phelix A.O. Majiwa a,*
a International Livestock Research Institute (ILRI), PO Box 30709, Nairobi, Kenya
b Department of Biochemistry, University of Iowa and Howard Hughes Medical Institute, Iowa City, IA 52242, USA
Received 5 June 1998; accepted 5 August 1998; Received by T. Sekiya
Abstract
Previous molecular genetic studies on the African trypanosome have focused on only a few genes and gene products, the majority of which are concerned with surface antigenic variation; consequently, only an insignificant number of the genes of this organism have been characterized to date. In order to: (1) identify new genes and analyze their expression profile, (2) generate expressed sequence tags (ESTs) for derivation of a physical map of the trypanosome genome, and (3) make available the partial sequence information and the corresponding clones for general biomedical research on the parasite, we have performed single-pass sequencing of random, directionally cloned cDNAs from a bloodstream form Trypanosoma brucei rhodesiense library. Analysis of the 2128 ESTs sequenced so far in this study showed significant similarities [BLASTX P(n)-value < 10^-4, and a match > 10 amino acid residues] with proteins whose genes have been described in diverse organisms including man, rodents, kinetoplastids, yeasts and plants. A number of the ESTs encode homologues of proteins involved in various functions including signal reception and transduction, cell division, gene regulation, DNA repair and replication, general metabolism, and structural integrity. Although some of these genes may have been expected to be present in the African trypanosomes, the majority of them had not previously been described in these organisms. A large proportion, 768 individual ESTs (36%, representing 385 different transcripts), had a significant homology with genes described in organisms other than the African trypanosomes; however, 15% of the ESTs were from genes already described in trypanosomes. Among the ESTs analysed were 462 distinct known genes, only 77 of which have been described in T. brucei. Approximately 52% of the ESTs did not show any significant homology with the sequences in any of the public domain databases. © 1998 Elsevier Science B.V. All rights reserved.
Keywords: Recombinant DNA; EST; Genome analysis; Single-pass random cDNA sequencing; Trypanosoma
* Corresponding author. Tel: +254 2 630 743; Fax: +254 2 631 499; e-mail: [email protected]
1 Present address: International Centre for Research in Agroforestry (ICRAF), United Nations Avenue, PO Box 30677, Nairobi, Kenya.
Abbreviations: EST, expressed sequence tag; VSG, variant surface glycoprotein.
0378-1119/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII: S0378-1119(98)00427-2

1. Introduction
Many aspects of the basic biology of the African trypanosomes are difficult to approach and thus remain unknown, partly because most previous studies have focused on only a few genes or gene products of this parasite. To identify a new set of trypanosome genes for biomedical studies and to facilitate the derivation of a physical map of the genome of this parasite, we have generated expressed sequence tags (ESTs) from transcripts of the human-infective Trypanosoma brucei rhodesiense. These ESTs will facilitate detailed analyses of genes and loci that have roles in phenotypes of importance, thus providing a convenient starting point for the study of those phenotypes. Besides serving as useful molecular landmarks in large fragments of chromosomal DNA cloned in big-DNA vectors such as P1, YAC or cosmid genomic libraries, the ESTs are crucial for establishing a transcript map, which is a necessary prelude to complete sequence determination of the genome of the parasite. The EST approach has been successfully applied to the study of gene expression and the discovery of genes in several parasitic organisms, including Plasmodium falciparum (Chakrabarti et al., 1994), Toxoplasma gondii (Wan et al., 1996), and Schistosoma mansoni (Franco et al., 1995). The potential of ESTs in the discovery of new trypanosome genes has been shown in the analysis of a relatively small number of T. brucei transcripts (el-Sayed et al., 1995). In the present study, the analysis has been extended to a much larger number of cDNA clones. Among the 2128 individual ESTs analysed, 462 distinct genes were represented that encode homologues of proteins involved in different functions including signal reception and transduction, cell division, gene regulation, DNA repair and replication, general metabolism, and cell structural integrity. Only 77 of these were homologues of genes already described in trypanosomes. The T. brucei haploid genome is nearly 40 Mb in size (Swindle and Tait, 1996). The yeast Saccharomyces cerevisiae, with a 12 Mb genome, has 6000 distinct transcripts (Goffeau et al., 1996); therefore, the trypanosome could have 12 000–20 000 unique transcripts. Thus, the 462 distinct transcripts described here, which are homologues of known genes, constitute at most approximately 4% of the potential total number of unique gene transcripts of the trypanosome. Among the relatively small number of transcripts surveyed, we have discovered many trypanosome homologues of genes that could not have been found as efficiently by other approaches. The study of some of these will most likely lead to a better understanding of the basic biology of this parasite. The trypanosome EST data reported here and those available so far (el-Sayed et al., 1995) will facilitate the elucidation of functions of the genes not only in the trypanosomes but also in other organisms. The cDNA clones from which the data have been derived are a useful resource for research that could lead to the identification of new targets for drug design and vaccine development, proteins with potential diagnostic or therapeutic value, and genes whose mutation may lead to phenotypes of important consequences.
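The transcript estimate in the Introduction can be checked with a short calculation. The sketch below assumes that the upper bound comes from scaling the yeast transcript count linearly with genome size; the lower bound of 12 000 is simply taken from the text and is not derived here. All variable and function names are illustrative.

```python
# Back-of-envelope check of the transcript estimate quoted in the Introduction.
# Assumption: the upper bound is a linear scaling of the yeast transcript count
# by genome size; the lower bound (12 000) is taken from the text as given.

YEAST_GENOME_MB = 12        # S. cerevisiae genome size (Goffeau et al., 1996)
YEAST_TRANSCRIPTS = 6000    # distinct yeast transcripts quoted in the text
TRYP_GENOME_MB = 40         # approximate T. brucei haploid genome size
KNOWN_HOMOLOGUES = 462      # distinct transcripts with known homologues


def scaled_transcripts(genome_mb: float) -> float:
    """Scale the yeast transcript count linearly with genome size."""
    return genome_mb * YEAST_TRANSCRIPTS / YEAST_GENOME_MB


def percent_of(part: float, total: float) -> float:
    """Return part as a percentage of total."""
    return 100.0 * part / total


upper_bound = scaled_transcripts(TRYP_GENOME_MB)   # linear scaling
lower_bound = 12_000                               # value quoted in the text

coverage_high = percent_of(KNOWN_HOMOLOGUES, lower_bound)
coverage_low = percent_of(KNOWN_HOMOLOGUES, upper_bound)

print(f"Estimated transcripts: {lower_bound}-{upper_bound:.0f}")
print(f"Coverage by 462 homologues: {coverage_low:.1f}%-{coverage_high:.1f}%")
```

Under these assumptions the 462 homologues cover roughly 2–4% of the predicted transcripts, which is consistent with the figure quoted above being an upper limit.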
2. Materials and methods
Nucleic acids were prepared from trypanosomes following established procedures (Sambrook et al., 1989). Messenger RNA purified from a cloned population of bloodstream T. b. rhodesiense WRaTat1.24, re-expressing the metacyclic variant antigenic type (MVAT) 4 (Alarcon et al., 1994), was used in the construction of a unidirectional oligo dT-primed EcoRI/XhoI cDNA library in λZAPII (Stratagene, La Jolla, CA). Inserts in an aliquot of the library were excised and rescued in pBluescript using the ExAssist helper phage in-vivo excision system (Short et al., 1988), then transformed into E. coli SOLR following a protocol supplied by the manufacturer (Stratagene). The transformants were selected on 2×YT plates containing ampicillin at a concentration of 75 µg/ml. The sizes of the cDNA inserts in random bacterial colonies were determined by the polymerase chain reaction (PCR; Saiki et al., 1985) using T3 and T7 primers. Only inserts larger than 500 bp were subjected to single-pass sequence determination from the 5′ end, using the T3 primer. The dideoxy-chain termination sequencing reactions (Sanger et al., 1977) were carried out on individual plasmid cDNA clones, either manually or using an automated sequencer and appropriate chemistries. On average, 400 nt of reliable, >90% accurate, sequence was obtained from each clone. The sequence data were edited to remove nucleotides from the vector, the trypanosome spliced leader (Boothroyd and Cross, 1982) or any other unreliable readings before use as queries to search GenBank using the Basic Local Alignment Search Tool (BLAST; Altschul et al., 1990) on the National Center for Biotechnology Information (NCBI) network e-mail server. Each alignment was examined to determine the significance, if any, of a match. The ESTs without homologues in the databases were periodically re-submitted for repeated searches to identify potential matches that may be present among the new sequences deposited in the databases.
For hybridization experiments, a purified insert from the desired EST was radiolabeled (Sambrook et al., 1989) to a specific activity of >1×10^9 cpm/µg and then used at 10^6 cpm/ml of hybridization solution. Hybridization of blots with probes was performed exactly as described by Sambrook et al. (1989). High-stringency washes were carried out twice for 45 min to remove non-specifically bound probes, each time using 0.1× SSC, 0.1% SDS at 65°C unless indicated otherwise. The blots were subsequently exposed to film with an intensifying screen at −70°C for at least 12 h.

3. Results and discussion
3.1. General features of trypanosome ESTs
The partial sequencing of anonymous cDNA clones to identify ESTs is an effective method by which to survey the repertoire of expressed genes of any organism (Adams et al., 1993). The partial nucleotide sequence of 2128 independent randomly selected directional cDNA clones was analysed by sequence-similarity searches against the GenBank databases, most recently in March 1998, using the BLAST program (Altschul et al., 1990). Of these, 834 were homologues of proteins whose functions are known, whereas 192 were homologous to proteins or open reading frames (ORFs) of unknown function. Proteins that have been described in at least one species of the African trypanosomes were identified by 264 (12.4%) individual ESTs; among these, one was a homologue of a gene described in T. congolense, four described in T. equiperdum and 259 in T. brucei. Thirty-eight of the ESTs identified were homologous to genes described in T. cruzi. The ESTs in the present study that are homologous to genes or gene
products already known in trypanosomes represented 77 distinct genes. The ESTs known to be associated with variant surface glycoprotein (VSG) gene expression, i.e. ESAGs, GRESAGs and VSGs, numbered 39, or 1.8%. This is in contrast with an earlier study (el-Sayed et al., 1995) in which such ESTs were 3% of the total analysed. The VSG-encoding transcripts numbered only 29, or approximately 1.4%, a surprising result considering that this protein is thought to comprise 10% of trypanosome cellular protein (Cross, 1995). Among these, the ESTs that encode MVAT4, the VSG expressed by the trypanosome from which the cDNA library was made, were only 0.87% of the total analysed so far. This finding contrasts with that made from the sequencing of random clones of genomic DNA (el-Sayed and Donelson, 1997). In that study, the proportion of genomic sequences associated with antigenic variation was 9.6%, nearly the same as the proportion of VSG among cellular proteins (Cross, 1995). The total number of individual ESTs that identified genes described in trypanosomatids other than the African trypanosomes was 303, or approximately 14% of the total; 26 (1.2%) of these were homologues of genes described in different species of Leishmania. The deduced amino acid sequences of 834 (39%) ESTs showed significant homologies [BLASTX P(n) < 10^-4, and match length ≥10 amino acid residues] with proteins whose genes have been described in diverse organisms, including man, rodents, yeasts and plants. A highly significant BLAST P(n) value provides a strong indication for the presence of a functional counterpart of the protein in the particular organism from which the query sequence has been obtained. However, the significance ascribed to the P(n) value must take into account the accuracy of the EST sequence, and the evolutionary relationship between the organisms in which the gene homologues have been described.
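The significance criterion used above can be expressed as a simple filter. The following is a minimal sketch, not the procedure actually used in the study (the searches were run through the NCBI e-mail server and each alignment was examined individually). The `Hit` record and its field names are hypothetical, the toy hits are invented for illustration, and the length test uses the "at least 10 residues" reading of the criterion.

```python
from dataclasses import dataclass

P_CUTOFF = 1e-4      # BLASTX P(n) threshold stated in the text
MIN_MATCH_AA = 10    # minimum match length in amino acid residues


@dataclass
class Hit:
    """Hypothetical record for the best BLASTX hit of one EST."""
    est_id: str
    p_value: float
    match_length_aa: int


def is_significant(hit: Hit) -> bool:
    """Apply both conditions: P(n) below the cutoff and a long enough match."""
    return hit.p_value < P_CUTOFF and hit.match_length_aa >= MIN_MATCH_AA


def split_hits(hits):
    """Partition hits into (significant, rejected) lists, preserving order."""
    significant = [h for h in hits if is_significant(h)]
    rejected = [h for h in hits if not is_significant(h)]
    return significant, rejected


# Toy values for illustration only; they are not taken from the study.
toy_hits = [
    Hit("est_A", 3.4e-7, 85),   # passes both conditions
    Hit("est_B", 5.6e-4, 60),   # P(n) too large
    Hit("est_C", 1.0e-9, 8),    # match too short
]

significant, rejected = split_hits(toy_hits)
print([h.est_id for h in significant])
print([h.est_id for h in rejected])
```

ESTs falling in the rejected group would correspond to those that the text describes as being re-submitted periodically for repeated searches.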
Based upon protein homology and/or structural motifs, the EST homologues were classified into broad functional categories according to the criteria proposed by Adams et al. (1993). The number of individual ESTs in each of the six functional categories (gene expression, regulation and protein synthesis; internal/external structure and motility; metabolism; defence and homeostasis; signalling and communication; and cell division/DNA synthesis, repair and replication) is summarized in Table 1. The volume of the data precludes its presentation entirely in the printed format; instead, a partial, non-redundant list of the ESTs analysed in the present study, grouped by functional category, is given in Tables 2–11. Although any one EST can be involved in processes that span more than one functional group, for simplicity, each EST was assigned only to the group that best describes its most commonly known function. The assignment of a putative function to each EST was based upon significant [BLAST statistic P(n) < 10^-4] matches to sequences in the public databases. The significant matches may predict a biochemical and physiological role for a novel gene or detect a homologue of a gene whose function is well defined in another organism. Among the 1026 ESTs with homologues, 462 unique gene transcripts were represented. Nearly 10% of the ESTs that had significant open reading frames encoded homologues of proteins to which no functions have been assigned to date. A significant proportion (34%, or 729) of the ESTs with known homologues identifies genes that had not previously been described in trypanosomes. Thus, considering only the ESTs that are homologous to genes of known functions, it appears that one of every three ESTs analysed at this stage of the study encodes a homologue of a gene not described previously in trypanosomes. Using yeast (Goffeau et al., 1996) as a reference eukaryote, the approximately 942 unique transcripts identified in the present study represent approximately 5–8% of the predicted distinct transcripts of the trypanosome. The proportion of trypanosome transcripts that are not translated into proteins is unknown. Transcripts that encode different proteins in the trypanosome were encountered with varying frequencies among the ESTs analysed. Those that encode different types of ribosomal proteins were encountered at the highest frequency (8.4%), followed by those encoding tubulins (α and β), which were encountered 3.6% of the time. To make the EST data available to the wider research community, relevant information on all the ESTs that we have sequenced to date has been deposited in the dbEST section of GenBank.

3.2. ESTs from developmentally transcribed genes
Success of the trypanosome as a parasite depends in part on its ability to adapt to changes in its microenvironment. This adaptation is probably effected through elaborate differential gene activation and repression.
Sequencing of transcripts from a particular life-cycle stage of the trypanosome could provide a useful catalogue of genes that are activated at that stage. Establishing the identity of the proteins encoded by such transcripts and elucidating their functions is a first step to a better understanding of how the trypanosome responds to the various environmental cues. Scanning of the trypanosome ESTs by differential hybridization could reveal genes that are transcribed in a stage-specific manner. To investigate this proposition, some of the ESTs homologous to known genes were tested for the possibility that they are also developmentally transcribed in T. brucei. Equal quantities of inserts from each EST were slot-blotted on to membranes and then hybridized under identical conditions with radiolabeled total cDNA prepared from procyclic culture forms, long slender forms and short stumpy forms of the parasite. Fig. 1 shows a typical hybridization profile obtained after a high-stringency, post-hybridization wash. Tentative identities of the ESTs placed in each of the slots are listed in Table 14. Assuming equal efficiency in the labeling of transcripts at each of the developmental stages, it is apparent from the hybridization signal intensities that a number of the ESTs, some indicated with arrows in the figure, hybridized more strongly with the probe from one developmental stage than with the probes from the others. This indicates that the corresponding transcripts may be more abundant in the trypanosomes at one specific life-cycle stage than at the others. One such EST, which hybridized in a stage-specific manner, encodes a putative homologue of variant surface glycoprotein 222 (17.II), whose transcript is present in both the long slender and short stumpy forms but is undetectable in the RNA of procyclic forms. A surprising finding was a strong hybridization of the homologue of AnTat1.8 (in slot 23.II) to the probe made from procyclic form mRNA (Fig. 1A, 23.II). Since procyclic forms do not express VSG, the hybridization could be from residual, attenuated or leaky transcription of a homologue of this VSG gene. Alternatively, it could be from a transcript that shares homology with a VSG transcript, but does not itself encode a VSG.

In order to investigate these preliminary observations further, some ESTs were separately hybridized with Northern blots (Sambrook et al., 1989) of RNA from the different developmental forms of T. brucei. As shown in Fig. 2, transcripts from genes subject to different levels of developmental stage-specific transcription were among the ESTs. The transcript detected by T1310, which contains the trypanosome homologue of the putative cell surface protein Ldp23 of Leishmania (Campos-Neto et al., 1995), appears to be more abundant in short stumpy forms (Fig. 2A, lane 2) and procyclic forms (Fig. 2A, lane 3) than in the long slender forms (Fig. 2A, lane 1); the homologue of the major surface glycoprotein, gp63, of Leishmania (T1832) used in Fig. 2C appears to be most abundant in the short stumpy forms (lane 2) or slightly more abundant in procyclic forms (lane 3) and least abundant in the long slender forms (lane 1). As could be predicted from the data shown in Fig. 1, the transcript encoding the putative homologue of variant surface glycoprotein 222 (used in Fig. 2D) is present in both long slender and short stumpy forms (lanes 1 and 2, respectively) but is undetectable in the procyclic forms (lane 3). Curiously, the transcripts detected by this EST (T1707) differ in size in the two developmental forms: on average, they are 2.1 kb in the RNA of long slender forms (lane 1) and 2.2 kb in the short stumpy forms (lane 2). The significance of this is yet to be determined, but the difference may be due to differential polyadenylation or addition of the spliced leader. There were, however, a number of ESTs, such as the homologue of the receptor for activated protein kinase C (used in Fig. 2B), whose levels of transcription remained nearly constant irrespective of the developmental stage of the trypanosome from which the RNA was prepared.

Table 1. ESTs in each functional category

| Functional category | Number in category | Percentage of ESTs analysed |
|---|---|---|
| Gene expression, regulation and protein synthesis | 268 | 12.5 |
| Internal/external structure and motility | 213 | 10 |
| Metabolism | 158 | 7.4 |
| Defence and homeostasis | 102 | 4.8 |
| Signalling and communication | 58 | 2.7 |
| Cell division/DNA synthesis, repair and replication | 35 | 1.6 |
| ESTs with putative known function | 834 | 39 |
| Trypanosome gene homologues | 264 | 12.4 |
| Non-trypanosome gene homologues | 570 | 26.6 |
| Proteins or ORFs of unknown function | 192 | 9 |
| No homologues | 1102 | 52 |
| Total ESTs (a) | 2128 | 100 |

(a) The ESTs referred to in this paper have the following Accession Nos: 506048–506103, 506959–507089, 507671–507688, 510940–511086, 513429–513679, 522043–522091, 523427–523433, 549851–549853, 547877–547960, 576937–577136, 592257–592314, 607522–607567, 612754–612822, 637830–637876, 594603–594608, 689554–689632, 736480–736621, 825247–825380, 847743–847771, 913980–914065, 1059180–1059205, 1149831–1149875, 1152865–1152898, 1296434–1296530, 1352808–1352914, 1420090–1420188, 1568443–1568594.

Table 2. T. b. rhodesiense EST homologues: signalling and communication

| Clone number | dbEST number | Size (kb) | Sequence length (nt) | Putative identity | Accession No | Organism | P(n) |
|---|---|---|---|---|---|---|---|
| T1118 | 506065 | 1 | 341 | α ryanodine binding protein | GP:D21070 | Rana catesbeiana, bull frog | 3.4×10^-7 |
| T2662 | 547921 | 1 | 367 | Adenylate cyclase, aggregation specific (ATP pyrophosphate-lyase) | SP:Q03100 | Dictyostelium discoideum | 4.2×10^-9 |
| T3836 | 825333 | 0.6 | 272 | ADP-ribosylation factor protein | U32305 | Caenorhabditis elegans | 8.2×10^-53 |
| T1854 | 511011 | 1 | 309 | AM2 receptor | GP:X67469 | Mus musculus | 4.9×10^-5 |
| T1284 | 506967 | 2 | 355 | Arginine kinase | GP:U09809 | Limulus polyphemus | 1.1×10^-26 |
| T3401 | 689623 | 0.6 | 394 | cGMP-specific phosphodiesterase | L16545 | Bos taurus | 4.2×10^-5 |
| T1184 | 506100 | 1.3 | 311 | CDP-diacylglycerol inositol 3-phosphatidyltransferase | SP:P06197 | Saccharomyces cerevisiae | 7.9×10^-8 |
| T3369 | 689601 | 2.1 | 364 | EF hand Ca-binding protein | PRF:2104200A | Orconectes limosus | 9.2×10^-5 |
| T1260 | 507051 | 1.2 | 346 | Extracellular signal-regulated kinase | SP:P42525 | Dictyostelium discoideum | 6.1×10^-26 |
| T4939 | 1352887 | 0.9 | 341 | Flagellar calcium binding protein (24-kDa calflagin) | U06644 | T. brucei | 1.2×10^-59 |
| T1707 | 522060 | 1.2 | 267 | Variant surface glycoprotein 222 | AJ007019 | T. brucei | 5.6×10^-4 |
| T3072 | 825295 | 1.3 | 335 | Integrin-like protein α | U35070 | Candida albicans | 7.8×10^-5 |
| T1758 | 522089 | 0.6 | 288 | Integrin β-4 chain precursor | PIR:S08465 | Homo sapiens | 4.5×10^-7 |
| T1185 | 506101 | 1.3 | 314 | LACK | GP:U27568 | Leishmania major | 2.7×10^-28 |
| T1243 | 510940 | 0.6 | 319 | Laminin receptor | GP:M77133 | Drosophila melanogaster | 4.2×10^-44 |
| T2248 | 513564 | 2.8 | 356 | LDL receptor protein | PIR:S02392 | Homo sapiens | 8.5×10^-10 |
| T2884 | 577071 | 1.4 | 426 | Potential laminin-binding protein | F14824 | Sus scrofa | 2.3×10^-34 |
| T2290 | 513590 | 0.9 | 366 | Protein kinase C delta-type | GP:D10495 | Homo sapiens | 3.2×10^-19 |
| T1980 | 511053 | 0.6 | 257 | Protein kinase MKK1/SSP32 | SP:P32490 | Saccharomyces cerevisiae | 4.0×10^-22 |
| T3093 | 607553 | 0.8 | 408 | Putative flagellar calcium-binding protein | SP:P17882 | T. brucei | 1.6×10^-84 |
| T1103 | 507676 | 1.3 | 301 | Sensor protein Kdp D | SP:P21865 | E. coli | 4.4×10^-65 |
| T2891 | 577075 | 0.8 | 403 | Serine–threonine protein kinase | P34099 | Dictyostelium discoideum | 1.5×10^-14 |
| T2788 | 577016 | 1.6 | 317 | Signal recognition particle receptor α subunit | SP:P08240 | Homo sapiens | 3.4×10^-16 |
| T3159 | 612779 | 1.2 | 391 | Vesicular-fusion protein NSF | SP:P46461 | Drosophila melanogaster | 9.7×10^-21 |

Table 3. T. b. rhodesiense EST homologues: cell division/DNA synthesis, repair and replication

| Clone number | dbEST number | Size (kb) | Sequence length (nt) | Putative identity | Accession number | Organism | P(n) |
|---|---|---|---|---|---|---|---|
| T2801 | 577029 | 0.6 | 338 | Activated Int-3 mammary gene product, notch4 | M80456 | Mus musculus | 3.8×10^-5 |
| T3743 | 752219 | 2 | 352 | Cell division cycle 2-like kinase | SP:Q03114 | Rattus norvegicus | 1.1×10^-8 |
| T1864 | 511014 | 0.8 | 279 | Cell division control protein 40 | SP:P40968 | Saccharomyces cerevisiae | 1.0×10^-13 |
| T2038 | 513431 | 0.6 | 241 | DNA polymerase | PIR:S41649 | Plasmodium falciparum | 6.7×10^-10 |
| T3975 | 736561 | 2.8 | 362 | DNA topoisomerase II | P06786 | Saccharomyces cerevisiae | 9.6×10^-41 |
| T2187 | 513521 | 1 | 411 | H Beta 58 protein | SP:P40336 | Mus musculus | 1.6×10^-23 |
| T2128 | 513481 | 0.6 | 379 | Histone 2A | GP:X83272 | T. cruzi | 1.7×10^-55 |
| T3253 | 637849 | 0.7 | 340 | Histone 2B | SP:P27795 | T. cruzi | 5.8×10^-38 |
| T1880 | 511023 | 1.4 | 224 | Histone H3 | SP:P40285 | Leishmania donovani | 4.2×10^-20 |
| T1334 | 506998 | 0.6 | 296 | Histone H4 | PIR:A25642 | Maize | 1.6×10^-37 |
| T6589 | 1152887 | 1.1 | 353 | Meiotic spindle formation protein, MEI-1 | SP:P34808 | C. elegans | 9×10^-33 |
| T2024 | 511077 | 0.7 | 254 | Pac 10p | GP:U29137 | Saccharomyces cerevisiae | 4.6×10^-14 |
| T3228 | 637834 | 1.9 | 369 | Probable cell control protein crn | PIR:A39634 | Drosophila melanogaster | 2.6×10^-29 |
| T4019 | 847768 | 0.7 | 345 | RAD3 | Y09076 | Schizosaccharomyces pombe | 9.5×10^-16 |
| T1773 | 510959 | 0.6 | 248 | RAD8 | GP:Z49811 | Schizosaccharomyces pombe | 8.8×10^-24 |
| T4188 | 914032 | 1.7 | 396 | RAD51 | SP:P37383 | Gallus gallus | 8×10^-60 |
| T4379 | 1352900 | 1.9 | 329 | UV protection protein mucB homolog | PIR:H64239 | M. genitalium | 4.1×10^-9 |

Table 4. T. b. rhodesiense EST homologues: gene expression, regulation and protein synthesis

| Clone number | dbEST number | Size (kb) | Sequence length (nt) | Putative identity | Accession number | Organism | P(n) |
|---|---|---|---|---|---|---|---|
| T3995 | 736579 | 1.8 | 367 | 26S protease regulator subunit 4 | SP:P49014 | Rattus norvegicus | 8.8×10^-20 |
| T1601 | 507066 | | 280 | 25 kDa elongation factor | SP:P34827 | T. cruzi | 1.1×10^-10 |
| T2973 | 492290 | | 356 | 40S ribosomal protein | SP:P49206 | Arabidopsis thaliana | 1.1×10^-31 |
| T4648 | 1296507 | 1.2 | 333 | 40S ribosomal protein S4 (scar protein) | 22146 | Homo sapiens | 2.9×10^-16 |
| T4058 | 736612 | 1.4 | 341 | 40S ribosomal protein S11 | P35687 | Oryza sativa | 8.1×10^-24 |
| T1602 | 507067 | 0.9 | 251 | 40S ribosomal protein S12.e | PIR:A48574 | T. brucei | 2.3×10^-25 |
| T2563 | 577108 | 0.7 | 390 | 40S ribosomal protein S14 | SP:P19800 | T. brucei | 4.3×10^-80 |
| T6595 | 1152890 | 0.7 | 363 | 40S ribosomal protein S15a | SP:P39027 | Rattus norvegicus | 2.1×10^-24 |
| T2929 | 577120 | 1.4 | 367 | 40S ribosomal protein S16 | SP:P17008 | Homo sapiens | 1.6×10^-33 |
| T4860 | 1352829 | 2 | 247 | 40S ribosomal protein S17 | P:P27770 | Neurospora crassa | 1.3×10^-23 |
| T1984 | 511056 | 2.1 | 251 | 40S ribosomal protein S18 | GP:L34567 | Entamoeba histolytica | 1.4×10^-21 |
| T1789 | 510969 | 0.8 | 275 | 40S ribosomal protein S2 | GP:P27952 | Rattus norvegicus | 2.8×10^-21 |
| T2488 | 547894 | 0.6 | 414 | 40S ribosomal protein S21 | SP:P35687 | Oryza sativa | 1.4×10^-23 |
| T3177 | 612787 | 0.6 | 390 | 40S ribosomal protein S24 | SP:P16632 | Homo sapiens | 3.1×10^-38 |
| T3267 | 637859 | 0.6 | 390 | 40S ribosomal protein S25 | SP:P46301 | Lycopersicon esculentum | 1.8×10^-24 |
| T3885 | 825344 | 0.6 | 323 | 40S ribosomal protein S26 | X82365 | Brugia pahangi | 1.2×10^-26 |
| T2156 | 513500 | 0.6 | 334 | 40S ribosomal protein S27 | PIR:S35758 | Xenopus laevis | 2.1×10^-18 |
| T3167 | 612781 | 0.8 | 433 | 40S ribosomal protein S3A | SP:P49242 | Rattus norvegicus | 9.2×10^-39 |
| T1607 | 507071 | 1.2 | 326 | 40S ribosomal protein S4 X isoform | SP:P12750 | Homo sapiens | 2.3×10^-20 |
| T1695 | 522052 | 0.6 | 284 | 40S ribosomal protein S5 | SP:P26783 | Podocoryne carnea | 7.5×10^-28 |
| T3260 | 637842 | 0.9 | 365 | 40S ribosomal protein S6 | SP:P41798 | Kluyveromyces marxianus | 7.8×10^-37 |
| T2554 | 577103 | 0.8 | 227 | 40S ribosomal protein S8 | SP:P25204 | Leishmania major | 4.2×10^-31 |
| T2937 | 577123 | 0.8 | 410 | 40S ribosomal protein S9 | SP:P17959 | T. brucei | 1.4×10^-61 |
| T1807 | 510983 | 1.3 | 307 | 60S acidic ribosomal protein P1 | SP:P26643 | T. cruzi | 3.9×10^-28 |
| T1833 | 510946 | 1 | 282 | 60S acidic ribosomal protein P0 | SP:P26796 | T. cruzi | 6.8×10^-45 |
| T1219 | 507030 | 0.6 | 239 | 60S ribosomal protein L7 | PIR:S30214 | Mus musculus | 1.9×10^-8 |
| T1237 | 507037 | 0.8 | 219 | 60S ribosomal protein L7A | SP:P32429 | Gallus gallus | 6.7×10^-8 |
| T2710 | 576972 | 0.6 | 313 | 60S ribosomal protein L12 | SP:P17079 | Saccharomyces cerevisiae | 1.7×10^-38 |

Table 5. T. b. rhodesiense EST homologues: gene expression, regulation and protein synthesis

| Clone number | dbEST number | Size (kb) | Sequence length (nt) | Putative identity | Accession number | Organism | P(n) |
|---|---|---|---|---|---|---|---|
| T4807 | 1296481 | 1.1 | 278 | 60S ribosomal protein L19 | P:P36241 | Drosophila melanogaster | 7×10^-24 |
| T1089 | 506051 | 0.6 | 335 | 60S ribosomal protein 26 | SP:Q02877 | Homo sapiens | 3.5×10^-41 |
| T1283 | 506966 | 0.7 | 320 | 60S ribosomal protein HG12 | SP:P28751 | Homo sapiens | 1.8×10^-7 |
| T2568 | 577113 | 1.4 | 409 | 60S ribosomal protein L1 | SP:P49669 | T. brucei | 1.1×10^-74 |
| T2098 | 513466 | 0.8 | 313 | 60S ribosomal protein L10 | GP:L25899 | Homo sapiens | 2.4×10^-37 |
| T3966 | 736550 | 0.8 | 374 | 60S ribosomal protein L11 | P42922 | Leishmania chagasi | 1.3×10^-43 |
| T3189 | 612795 | 0.6 | 402 | 60S ribosomal protein L12 | SP:P23358 | Rattus norvegicus | 1.8×10^-30 |
| T1262 | 507053 | 0.8 | 233 | 60S ribosomal protein L13.1 | SP:P41128 | Brassica napus | 5.7×10^-12 |
| T1208 | 507022 | 0.6 | 269 | 60S ribosomal protein L13A | SP:P35427 | Rattus norvegicus | 1.1×10^-24 |
| T1304 | 506979 | 0.7 | 271 | 60S ribosomal protein L15 | PIR:S48502 | Saccharomyces cerevisiae | 7.0×10^-32 |
| T2789 | 547955 | 0.9 | 396 | 60S ribosomal protein L17A | M85295 | Drosophila melanogaster | 2.8×10^-19 |
| T2212 | 513537 | 1 | 348 | 60S ribosomal protein L18A | SP:P41093 | Drosophila melanogaster | 1.6×10^-21 |
| T3067 | 847744 | 0.7 | 283 | 60S ribosomal protein L19 | P49693 | Arabidopsis thaliana | 1.1×10^-23 |
| T3270 | 637861 | 1.4 | 369 | 60S ribosomal protein L2 | SP:P29766 | Lycopersicon esculentum | 2.5×10^-44 |
| T3870 | 825259 | 0.6 | 238 | 60S ribosomal protein L23 | P24049 | Rattus norvegicus | 1.0×10^-11 |
| T1099 | 506058 | 0.6 | 230 | 60S ribosomal protein L24 | SP:P38663 | Homo sapiens | 3.3×10^-11 |
| T2011 | 511070 | 0.6 | 254 | 60S ribosomal protein L26 | PIR:S51347 | Saccharomyces cerevisiae | 4.2×10^-26 |
| T1126 | 506072 | 0.5 | 348 | 60S ribosomal protein L27 | PIR:S00401 | Rattus norvegicus | 1.5×10^-32 |
| T3272 | 637863 | 1.4 | 414 | 60S ribosomal protein L3 | SP:P39023 | Homo sapiens | 2.6×10^-58 |
| T1302 | 506978 | 0.8 | 239 | 60S ribosomal protein L30 | SP:Q10353 | Schizosaccharomyces pombe | 9.1×10^-7 |
| T3280 | 637869 | 0.6 | 395 | 60S ribosomal protein L31 | SP:P46290 | Nicotiana glutinosa | 3.3×10^-8 |
| T1893 | 511029 | 0.7 | 244 | 60S ribosomal protein L32 | PIR:S11393 | Homo sapiens | 1.6×10^-26 |
| T1182 | 507087 | 0.7 | 336 | 60S ribosomal protein L34 | SP:P40590 | Pisum sativum | 3.9×10^-19 |
| T3982 | 736568 | 0.6 | 358 | 60S ribosomal protein L35 | F14766 | Sus scrofa | 1.8×10^-13 |
| T3216 | 612816 | 0.6 | 383 | 60S ribosomal protein L36 | SP:P49181 | C. elegans | 2.6×10^-18 |
| T2600 | 547912 | 0.6 | 301 | 60S ribosomal protein L37E | SP:P39094 | Leishmania infantum | 3.7×10^-41 |
| T2160 | 513504 | 0.6 | 379 | 60S ribosomal protein L39 | GP:X95458 | Zea mays | 1.8×10^-20 |
| T2050 | 513440 | 0.6 | 248 | 60S ribosomal protein L3K | SP:P40590 | Pisum sativum | 2.5×10^-12 |
| T4047 | 752250 | 1 | 336 | 60S ribosomal protein L5A | Z81207 | Xenopus laevis | 1.5×10^-36 |
| T3081 | 607547 | 0.7 | 361 | 60S ribosomal protein L6 | SP:P34091 | Mesembryanthemum crystallinum | 3.4×10^-24 |
| T2882 | 577070 | 1.7 | 388 | 60S ribosomal protein L7 | P25457 | Schizosaccharomyces pombe | 6.5×10^-26 |
99
A. Djikeng et al. / Gene 221 (1998) 93–106 Table 6 T. b. rhodesiense ESTs homologues: gene expression, regulation and protein synthesis Clone number
dbEST number
Size (bp)
Sequence length
Putative identity
Accession number
Organism
P(n)
T2691 T4272 T4031 T3151 T1698 T3178
576975 914039 752241 612774 522054 612788 577106
0.8 2.2 0.7 0.9 1.1 1.1 0.7
230 386 351 386 350 365 357
SP:P11518 P:P29763 Q02326 SP:P05738 GP:U24704 SP:P28656
Homo sapiens Chlamydomonas reinhardtii Saccharomyces cerevisiae Saccharomyces cerevisiae Homo sapiens Mus musculus
12626
Zea mays
2.2×10−11 1.3×10−18 1.0×10−21 2.5×10−8 1.4×10−14 1.1×10−15 7.9×10−5
825276 507061 637844 592274 507012 511062
2 1.3 2 0.7 0.6 0.6
208 263 356 399 369 265
Y08614 GP:X63071 PIR:S28030 SP:P28365 PIR:S35950
Homo sapiens Homo sapiens rice Euplotes octacarinatus Tobacco
SP:P07261
Saccharomyces cerevisiae
577097
1.3
332
GP:L22453
Homo sapiens
513551 577118 914008 513479 612792 576976
1.5 2 1.2 1.8 1.9 1.5
375 399 357 309 433 429
SP:P18858 PIR:S28721 18466 SP:P33479 SP:P40796
Homo sapiens T. brucei African swine fever virus Suid herpes virus Drosophila melanogaster
M94364
Crithidia fasciculata
506093 513493 506069 513519 511076 506991 547950 513490
1.1 0.7 1.3 0.7 0.7 2 0.6 1.1
322 273 285 383 265 414 406 337
SP:P14425 GP:X86691 GP:D14540 SP:P48415 PIR:S49326 GP:D17532 PIR:S55383
Stripped dolphin (Steco) Homo sapiens Homo sapiens Saccharomyces cerevisiae Homo sapiens Homo sapiens Triticum aestivum
SP:P15625
Saccharomyces cerevisiae
506077 689554
0.8 0.8
342 377
60S ribosomal protein L7A 60S ribosomal protein P1 60S ribosomal protein YL16A 60S ribosomal protein L9 Antisecretory factor-1 Brain protein DN38 Copia-like retrotransposon Hopscotch polyprotein CRM1 DBP-5 protein DNA-binding protein Gt-2—rice DNA-directed RNA polymerase edeB protein Glycolytic genes transcriptional activator GCR1 HIV-1 TAR RNA binding protein (TARB) Human DNA ligase I Hypothetical protein 1 Helicase Immediate-early protein IE 180 LA protein homolog Metalloproteinase (=gp63 of Leishmania) Metallothionein Mi-2 protein, helicase MLL, ALL-1 Multidomain vesicle coat protein Nascent polypeptide associated complex a P54,ATP-dependent RNA helicase Peptidylpropyl isomerase Phenylalanyl-tRNA synthetase b chain cytoplasmic Phosphotidylinositol 4.5 diP 3-kinase PMS1gene product
GP:U23476 X96581
Dictyostelium discoideum Schizosaccharomyces pombe
T2651 T3755 T1593 T3245 T2992 T1195 T1992 T2547 T2229 T2577 T4122 T2125 T3185 T2692 T1171 T2148 T1122 T2185 T2023 T1323 T2685 T2144 T1135 T3302
3.3. Potentially interesting genes in trypanosomes Only 12% of the ESTs in our study identify genes already described in African trypanosomes; thus, a majority are homologues of known genes involved in some processes that have not been studied in these parasites. In each of the six functional categories, Table 1, there are ESTs that identify genes for the first time in the trypanosomes. For example, homologues of proteins involved in the cellular protein degradation pathway were identified: ubiquitin-conjugating enzyme ( T2097), proteasome ( T3176), polyubiquitin ( T1165), putative 26S proteasome regulator ( T3995), a-proteasome subunit ( T3176) and putative b-proteasome subunit ( T3128). Similarly, homologues of several proteins, for example, extracellular signal-regulated kinase, and the receptor for activated protein kinase C ( RACK ), both of which participate in signal reception and transduction pathways, were identified. The availability of
2.6×10−11 2.4×10−6 3×10−5 1.2×10−20 2.4×10−38 7.9×10−5 4.9×10−50 6.4×10−15 9.7×10−16 1.3×10−22 7.8×10−6 1.5×10−16 2.0×10−21 8.4×10−6 2.7×10−5 9.9×10−5 7.5×10−8 1.9×10−9 5.6×10−13 4.5×10−18 3.0×10−26 2.3×10−5 9.6×10−17
these ESTs makes it possible to investigate some of these processes in the trypanosome. Of particular interest is an EST (T2665) that encodes a homologue of TRAP-1, the tumor necrosis factor (TNF) type 1 receptor-associated protein. TRAP-1 has been shown to interact with a TNF receptor by associating with its intracellular domain (Song et al., 1995). This interaction allows TNF to bind its cognate receptor and, thus, initiate signal transduction (Song et al., 1995). The finding of a trypanosome homologue of TRAP-1 is potentially important in view of a recent observation indicating that the specific uptake and internalization of TNF by trypanosomes modulate their growth and proliferation in vivo (Magez et al., 1997). Using the TRAP-1 homologue in the yeast two-hybrid system (Fields and Sternglanz, 1994), it should be possible to identify the receptor for TNF in trypanosomes. This, in turn, can be used to determine how these parasites internalize TNF, the signal transduction pathways in which this
Table 7. T. b. rhodesiense EST homologues: gene expression, regulation and protein synthesis

| Clone number | dbEST number | Size (kb) | Sequence length (bp) | Putative identity | Accession number | Organism | P(n) |
|---|---|---|---|---|---|---|---|
| T1165 | 510945 | 0.8 | 323 | Polyubiquitin | PIR:A31115 | T. cruzi | 4.7×10⁻⁶⁵ |
| T3104 | 607558 | 0.9 | 348 | Potential laminin-binding protein | SP:P38980 | Sus scrofa | 7.9×10⁻²⁴ |
| T3176 | 612786 | 1.2 | 391 | Proteasome | SP:P34066 | Arabidopsis thaliana | 1.3×10⁻¹⁵ |
| T2104 | 513470 | 0.9 | 305 | Proteasome component MECL-1 precursor | GP:P40306 | Homo sapiens | 2.3×10⁻²⁹ |
| T3128 | 607563 | 1.2 | 381 | Proteasome component PRE30 precursor | SP:P38624 | Saccharomyces cerevisiae | 1.0×10⁻²³ |
| T2916 | 592280 | 0.8 | 359 | Putative ATP-dependent RNA helicase | SP:P34498 | C. elegans | 6.0×10⁻²⁷ |
| T2163 | 513505 | 1.1 | 397 | Putative proteasome component | SP:Q09841 | Schizosaccharomyces pombe | 2.9×10⁻⁴³ |
| T2214 | 513539 | 1.9 | 386 | Replicase | GP:Z68502 | Garlic latent virus | 1.5×10⁻⁶ |
| T2086 | 513458 | 1.3 | 288 | Ribonucleotide reductase small unit | GP:U30494 | Urechis caupo | 1.6×10⁻³² |
| T2040 | 513432 | 0.7 | 244 | RING-finger protein | PIR:S49446 | Lotus japonicus | 2.1×10⁻⁸ |
| T2152 | 513497 | 0.9 | 405 | RNA-binding protein | PIR:S44024 | Anabaena sp. | 2.9×10⁻¹⁴ |
| T4281 | 1149853 | 1.6 | 333 | Serine-type carboxypeptidase | IR:S46008 | Saccharomyces cerevisiae | 6.1×10⁻¹⁴ |
| T1265 | 507055 | 0.9 | 295 | Sialidase | PIR:A49227 | Actinomyces viscosus | 4.5×10⁻⁶ |
| T2085 | 513457 | 1.2 | 347 | TAR RNA-binding protein | GP:L22453 | HIV | 1.0×10⁻⁵² |
| T3436 | 736494 | 0.6 | 397 | Transcription factor ATBF1 | D26046 | Mus musculus | 4.4×10⁻⁷ |
| T4130 | 914013 | 0.75 | 217 | Transcription factor IIIA (TFIIIA) | P:P79797 | Ictalurus punctatus | 6.8×10⁻⁸ |
| T2130 | 513483 | 1.2 | 366 | Trithorax protein trxll | GP:Z50152 | Drosophila melanogaster | 3.1×10⁻⁵ |
| T2097 | 513465 | 0.7 | 363 | Ubiquitin-conjugating enzyme E2 | SP:P35132 | C. elegans | 3.0×10⁻³⁶ |
| T4624 | 1420180 | 0.7 | 435 | Ubiquitin 81-aa extension protein 2 | 05540 | Arabidopsis thaliana | 7.5×10⁻³¹ |
| T3327 | 689572 | 0.6 | 347 | Vacuolar protein sorting homolog | U35246 | Homo sapiens | 5.5×10⁻¹³ |
| T2912 | 825292 | 2.5 | 300 | Yeast clathrin coat assembly protein | P35181 | Saccharomyces cerevisiae | 1.2×10⁻¹⁷ |
Table 8. T. b. rhodesiense EST homologues: defence and homeostasis

| Clone number | dbEST number | Size (kb) | Sequence length (bp) | Putative identity | Accession number | Organism | P(n) |
|---|---|---|---|---|---|---|---|
| T3058 | 607536 | 0.9 | 390 | Calcium-dependent protein kinase | PIR:S56717 | Zea mays | 2.7×10⁻⁷ |
| T2983 | 592266 | 1.1 | 389 | Complement receptor 1 | L17418 | Homo sapiens | 3.4×10⁻⁵ |
| T2000 | 511065 | 1.6 | 252 | Copper sensor protein pcoS (E. coli) | PIR:S52258 | E. coli | 7.5×10⁻²¹ |
| T2847 | 577044 | 1.1 | 305 | Cyclophilin-related protein | L04289 | Mus musculus | 5.3×10⁻⁵ |
| T3126 | 607561 | 1.2 | 389 | Cyclosporin A-binding protein, Cyclophilin A | SP:P18253 | Schizosaccharomyces pombe | 4.4×10⁻³⁸ |
| T1610 | 507072 | 2.4 | 331 | Cyclosporine synthetase | PIR:S41309 | Tolypocladium inflatum | 4.5×10⁻⁶ |
| T1761 | 510954 | 1.1 | 258 | Extensin precursor-cell wall hydroxyproline-rich glycoprotein | SP:P13983 | Nicotiana tabacum | 6.1×10⁻⁶ |
| T1786 | 510967 | 0.6 | 248 | Fos-related antigen 2 | GP:U18982 | Rattus norvegicus | 3.7×10⁻⁶ |
| T2838 | 577042 | 0.8 | 343 | Granule cell marker protein, Gcap1 gene product | GP:L10908 | Mus musculus | 1.4×10⁻¹¹ |
| T2802 | 576942 | 2 | 258 | HSP 70 | PIR:S05438 | Leishmania major | 3.9×10⁻⁵⁴ |
| T4013 | 736589 | 1.2 | 321 | HSP 83 | P12861 | T. brucei | 1.0×10⁻⁵³ |
| T1169 | 507085 | 0.9 | 321 | Humoral lectin prepropeptide, hemocytin | GP:D29738 | Bombyx mori | 1.6×10⁻⁷ |
| T2202 | 513531 | 0.6 | 333 | Large proline-rich protein Bat2 (HLA-B-associated transcript 2) | SP:P48634 | Homo sapiens | 5.9×10⁻⁷ |
| T2241 | 513560 | 1.4 | 287 | Lysosomal protective precursor | SP:P16675 | Mus musculus | 3.3×10⁻²⁶ |
| T3757 | 847752 | 2.8 | 215 | Nucleophosmin/nucleoplasmin-3 | U64450 | Mus musculus | 2.2×10⁻⁵ |
| T2453 | 513664 | 1.1 | 344 | P-glycoprotein | GP:L29484 | Leishmania tarentolae | 1.7×10⁻¹¹ |
| T2159 | 513503 | 0.9 | 387 | STI (stress inducible protein) | GP:X79770 | Glycine max | 3.4×10⁻³⁶ |
| T2216 | 513541 | 1.2 | 304 | Suppressor protein SRP40 | SP:P32583 | Saccharomyces cerevisiae | 3.4×10⁻¹² |
| T3238 | 637839 | 1.8 | 380 | T-complex protein 1, theta subunit | SP:P42932 | Mus musculus | 1.5×10⁻²⁸ |
| T4480 | 1420124 | 2.1 | 377 | T complex protein | PRF:2206327A | Cucumis sativus | 6.3×10⁻³⁷ |
| T2771 | 577010 | 1 | 316 | TCJ2 gene product | L42549 | T. cruzi | 1.2×10⁻⁴⁸ |
| T1147 | 506080 | 0.5 | 347 | Transplantation antigen | GP:U18488 | Salmo trutta, brown trout | 2.6×10⁻³² |
| T2665 | 547923 | 1.4 | 401 | Tumor necrosis factor type 1 receptor associated protein | PIR:A55877 | Homo sapiens | 1.3×10⁻¹³ |
| T1178 | 506098 | 0.5 | 318 | Wiskott–Aldrich syndrome protein | GP:U12707 | Homo sapiens | 4.2×10⁻⁹ |
| T2627 | 825287 | 0.8 | 291 | Yeast putative mitochondrial carrier YEL006W | SP:P39953 | Saccharomyces cerevisiae | 2.4×10⁻⁹ |
Table 9. T. b. rhodesiense EST homologues: metabolism

| Clone number | dbEST number | Size (kb) | Sequence length (bp) | Putative identity | Accession number | Organism | P(n) |
|---|---|---|---|---|---|---|---|
| T2928 | 592287 | 0.6 | 317 | 3′,5′ cyclic nucleotide phosphodiesterase | U40371 | Homo sapiens | 2.4×10⁻¹⁹ |
| T2566 | 577111 | 0.8 | 389 | Acetyl-CoA synthase | PIR:A52154 | Drosophila melanogaster | 3.0×10⁻²³ |
| T2612 | 547914 | 1.8 | 402 | Acyl-CoA binding protein | SP:P07108 | Homo sapiens | 5.0×10⁻⁶ |
| T2367 | 513635 | 0.7 | 300 | Acyl-CoA synthetase II | U70253 | Leishmania major | 6.9×10⁻¹⁰ |
| T2483 | 549851 | 1.1 | 400 | Adenine phosphoribosyltransferase | U28722 | Mastomys hildebrantii | 1.7×10⁻⁶ |
| T4048 | 736604 | 1.2 | 358 | Adenosine phosphoribosyltransferase | L2541 | Leishmania donovani | 2.8×10⁻¹⁴ |
| T2423 | 547880 | 0.6 | 370 | Adenylate cyclase | SP:P26338 | T. equiperdum | 8.2×10⁻²⁶ |
| T1877 | 511022 | 0.8 | 229 | ADP, ATP carrier protein | PIR:C28116 | Homo sapiens | 9.6×10⁻⁴² |
| T4479 | 1420126 | 2 | 325 | Alkyl hydroperoxide reductase/thiol-specific antioxidant | U26666 | T. b. rhodesiense | 8.3×10⁻¹¹ |
| T2166 | 513506 | 1.4 | 341 | Alternative oxidase | GP:L46869 | Neurospora crassa | 5.1×10⁻³⁰ |
| T3465 | 752182 | 0.6 | 289 | Apocytochrome b | PIR:S52054 | Trypanoplasma borreli | 3.9×10⁻⁵ |
| T1256 | 507048 | 1 | 384 | Arginine kinase | PIR:A48590 | Homarus vulgaris | 2.0×10⁻⁷¹ |
| T3273 | 637864 | 2 | 434 | ATP-stimulated glucocorticoid-receptor translocation promoter protein | PIR:JN0606 | Rattus norvegicus | 2.8×10⁻³⁹ |
| T1254 | 507046 | 0.5 | 273 | ATP synthase A chain precursor | SP:P21535 | Schizosaccharomyces pombe | 1.2×10⁻⁵ |
| T2705 | 576969 | 1.6 | 282 | ATPase 6 | PIR:A45612 | Sauroleishmania tarentolae | 6.4×10⁻⁸ |
| T3871 | 825337 | 1.4 | 311 | ATPase a subunit | X66103 | Propionigenium modestum | 5.4×10⁻³⁴ |
| T1240 | 507039 | 1.2 | 340 | ATPase I | GP:X65738 | Plasmodium falciparum | 7.8×10⁻⁷ |
| T2828 | 576948 | 1 | 416 | Carboxylesterase | PIR:JU0277 | Pseudomonas fluorescens | 6.1×10⁻¹⁰ |
| T3098 | 607555 | 0.9 | 372 | Carboxypeptidase | Z25955 | Arabidopsis thaliana | 6.9×10⁻²⁶ |
| T1804 | 510980 | 1 | 251 | CMP-N-acetylneuraminate-β-galactosamide-α-2,3-sialyltransferase | SP:Q02745 | Sus scrofa | 3.6×10⁻⁶ |
| T1120 | 506067 | 1.8 | 297 | Corkscrew protein 4A | 19909 | Drosophila melanogaster | 4.4×10⁻⁵ |
| T3895 | 825350 | 1.8 | 317 | Cystathionine gamma-synthase | P46807 | Mycobacterium leprae | 4.7×10⁻¹⁰ |
| T4080 | 752190 | 0.8 | 391 | Deoxyhypusine synthase | P49366 | Homo sapiens | 1.2×10⁻²¹ |
| T2016 | 511073 | 0.6 | 244 | Dodecenoyl-CoA delta isomerase | PIR:A40517 | Rattus norvegicus | 1.3×10⁻⁸ |

cytokine participates in the trypanosomes and ultimately how the cytokine modulates the growth of these parasites. Equally intriguing was the finding of homologues of two proteins that have significant roles in the cellular immune responses to Leishmania infection. One of the proteins, Ldp23, induces a Th1 cytokine response that correlates with resistance of mice to infection with Leishmania (Campos-Neto et al., 1995). The other protein, a Leishmania homologue of the receptor for activated protein kinase C (LACK), induces nearly complete protection of mice against infection with Leishmania (Julia et al., 1996). The finding of homologues of these proteins in the trypanosome, an extracellular parasite, warrants investigation of their cellular localization and the role that they might play in cell-mediated immune responses during trypanosome infection. There are other ESTs whose encoded products show a significant homology to proteins unexpected in the African trypanosomes. One of these is the ede protein (T1195), the expression of which is crucial for de-differentiation of tobacco pith tissue upon transfer to synthetic medium (Cecchini et al., 1993). The level of homology for the ede protein is highly significant (P(n) = 2.4×10⁻³⁸), suggesting strongly that a homologue does indeed exist in the trypanosome. The exact
function of ede is unknown; however, it is thought to be involved in some fundamental processes, including cell division. Obviously, it is one of the proteins that mediate processes conserved in both trypanosomes and plant cells. Unlike in the tobacco pith where ede gene transcription is developmentally modulated, the transcripts encoding the trypanosome homologue of ede were detected in the long-slender forms, short stumpy forms and procyclic forms of T. brucei (data not shown).

3.4. Trypanosome-specific gene transcripts

In a previous effort directed at trypanosome gene discovery by single-pass sequence determination of directionally cloned cDNA (el-Sayed et al., 1995), only approximately 39% of the cDNA sequences could be identified on the basis of matches to known genes in the public domain databases; the remaining 59% had no matches to any known gene. In a parallel study that explored the efficiency of gene discovery through the sequencing of random clones of trypanosome genomic DNA (el-Sayed and Donelson, 1997), only 33% of the sequences had matches in the public domain databases, whereas 67% had no matches at all. In extending these efforts by surveying a greater number of cDNA clones, we found that 39% of the ESTs had a significant homology to genes with known function, whereas 9%
were homologous to genes or open reading frames (ORFs) that most probably perform some function yet to be determined (Table 1). Thus, 52% of the trypanosome ESTs in the present study do not match any known genes. These represent potential trypanosome-specific genes. Within this group of ESTs, transcripts are likely to be found from genes that serve complex functions such as: (1) parasite-specific survival in unusual environments within the insect vector or mammalian tissues, (2) integration or co-ordination of unknown

Table 10. T. b. rhodesiense EST homologues: metabolism

| Clone number | dbEST number | Size (kb) | Sequence length (bp) | Putative identity | Accession number | Organism | P(n) |
|---|---|---|---|---|---|---|---|
| T3750 | 752224 | 0.9 | 287 | Endozepine-related protein precursor (membrane-associated diazepam binding inhibitor) | P07106 | Bos taurus | 4.4×10⁻⁸ |
| T3246 | 637845 | 1 | 369 | Enolase | SP:P42894 | Neocallimastix frontalis | 3.3×10⁻⁴⁷ |
| T3269 | 637860 | 1.4 | 386 | Fructose-bisphosphate aldolase | PIR:A54500 | T. brucei | 6.9×10⁻⁸¹ |
| T1093 | 506054 | 0.7 | 237 | Glucoamylase l precursor | SP:P23176 | Aspergillus awamori | 4.9×10⁻⁵ |
| T3494 | 736509 | 1.8 | 322 | Glutamate decarboxylase | U51031 | Saccharomyces cerevisiae | 1.3×10⁻¹² |
| T1348 | 507059 | 1 | 410 | Glycerol kinase | GP:U32752 | Hemophilus influenzae | 2.9×10⁻³² |
| T3799 | 752230 | 1 | 398 | GTPase | X91504 | Homo sapiens | 2.5×10⁻¹⁶ |
| T4645 | 1420185 | 0.9 | 397 | GTP-binding protein rtb2 | 17085 | T. brucei | 2.6×10⁻²² |
| T1704 | 522057 | 0.6 | 250 | Guanine nucleotide regulatory protein | GP:U03624 | Paramecium tetraurelia | 7.9×10⁻¹³ |
| T2793 | 577024 | 1.1 | 349 | H+ transporting ATPase | PIR:S37050 | T. congolense | 4.1×10⁻⁶³ |
| T4519 | 1296500 | 1.2 | 361 | H+-pumping ATPase 16-kDa proteolipid | SP:P31413 | Arabidopsis thaliana | 1.7×10⁻³³ |
| T1745 | 522082 | 1.1 | 217 | Hypothetical 71.5 kDa protein in cox1 3′ region | SP:P04540 | T. brucei | 3.9×10⁻⁷ |
| T3318 | 689566 | 1.2 | 353 | Iron superoxide dismutase precursor | P09157 | E. coli | 2.6×10⁻²³ |
| T2366 | 577021 | 0.9 | 335 | Lysosomal acid phosphatase | Z46971 | Leishmania mexicana | 1.6×10⁻¹⁵ |
| T2730 | 576990 | 0.6 | 331 | Malate dehydrogenase | SP:P46487 | Eucalyptus gunnii | 8.6×10⁻¹³ |
| T3175 | 612785 | 1 | 367 | Mevalonate pyrophosphate | U49260 | Homo sapiens | 2.2×10⁻¹⁶ |
| T2571 | 577115 | 1.9 | 416 | Mitochondrial processing peptidase | SP:P11914 | Saccharomyces cerevisiae | 2.8×10⁻⁷ |
| T1328 | 506993 | 0.9 | 244 | Mitochondrial phosphate carrier protein precursor | SP.P16036 | Rattus norvegicus | 7.3×10⁻¹⁹ |
| T2065 | 513449 | 0.8 | 257 | MURF4 protein, ATPase subunit 6 | PIR:S43288 | Herpetomonas muscarum | 1.8×10⁻⁹ |
| T2874 | 592257 | 1.2 | 425 | NADH-ubiquinone oxidoreductase chain 5 | SP:P24884 | Ascaris suum | 9.5×10⁻⁵ |
| T1146 | 506079 | 0.8 | 371 | NADH dehydrogenase (Ubiquinone) | PIR:L00585 | T. brucei | 1.8×10⁻⁶⁶ |
| T3293 | 637874 | 1.2 | 375 | NADPH-cytochrome P450 reductase | SP:Q07994 | Musca domestica | 2.7×10⁻¹¹ |
| T2541 | 577094 | 2.8 | 361 | Phosphotransferase enzyme | SP:P32673 | E. coli | 2.9×10⁻³⁸ |
| T6598 | 1152893 | 1.2 | 384 | Phosphoribosylpyrophosphate synthetase | 76553 | L. donovani | 3.5×10⁻³⁸ |
| T2853 | 577046 | 2.3 | 346 | PPi-dependent phosphofructo-1-kinase | PIR:S52082 | Entamoeba histolytica | 5.9×10⁻¹⁵ |
| T4006 | 847766 | 1.6 | 300 | Protein disulfide-isomerase homolog precursor | P12865 | T. brucei | 1.0×10⁻⁴⁰ |
| T3091 | 607551 | 0.9 | 376 | Protein phosphatase pp2A | SP:Q06189 | Homo sapiens | 5.4×10⁻³⁵ |
| T2326 | 513613 | 0.8 | 332 | Putative spermidine synthase | SP:Q09741 | Schizosaccharomyces pombe | 1.3×10⁻²¹ |
| T1975 | 511050 | 0.7 | 265 | Pyrroline 5 Carboxylate reductase | SP:P00373 | E. coli | 4.9×10⁻¹¹ |
| T3728 | 825268 | 1.6 | 245 | Pyruvate dehydrogenase E1 component | Z25976 | Arabidopsis thaliana | 1.7×10⁻¹⁸ |
| T3276 | 637866 | 2.1 | 350 | Pyruvate kinase | SP:P30615 | T. brucei | 5.5×10⁻⁷⁶ |
| T1216 | 507028 | 0.8 | 352 | Rab23 protein, GTPase | PIR:S40244 | Mus musculus | 3.8×10⁻²⁵ |
| T3411 | 689628 | 1.2 | 348 | S-adenosylhomocysteine hydrolase | M76556 | Leishmania donovani | 2.7×10⁻⁶⁶ |
| T1954 | 511042 | 1 | 308 | Sedoheptulose-1,7-bisphosphatase precursor | SP:P46283 | Arabidopsis thaliana | 8.9×10⁻¹⁵ |
| T1174 | 506096 | 0.5 | 338 | Serine carboxypeptidase III precursor | SP:P37891 | Oryza sativa | 2.6×10⁻³¹ |
| T3915 | 825362 | 1.5 | 340 | Similar to Ser/Thr protein kinase | U28929 | C. elegans | 1.5×10⁻¹² |
| T4306 | 914053 | 0.85 | 272 | Thioredoxin peroxidase | P:Q26695 | T. b. rhodesiense | 8.1×10⁻¹¹ |
| T3998 | 736581 | 2.7 | 356 | Thymidine kinase, cytosolic | P27158 | Rattus norvegicus | 2.7×10⁻²¹ |
| T2459 | 513666 | 0.7 | 336 | Ubiquinol-cytochrome C reductase | SP:P45634 | Mus musculus | 2.4×10⁻⁵¹ |
| T4057 | 847769 | 1 | 361 | Ubiquitin-conjugating enzyme | Z50111 | Saccharomyces cerevisiae | 1.4×10⁻²⁶ |
| T1790 | 510970 | 0.7 | 301 | Vacuolar ATP synthase 16 kDa proteolipid subunit | SP:P31413 | Neurospora crassa | 2.0×10⁻²⁷ |
| T4251 | 1059184 | 1.2 | 409 | Vacuolar H+-pumping ATPase 16-kDa proteolipid | L44582 | Arabidopsis thaliana | 1×10⁻²⁴ |
| T4354 | 1352808 | 1.6 | 321 | Vacuolar type H+-ATPase proteolipid subunit | AB003937 | Acetabularia acetabulum | 6.6×10⁻¹⁶ |
| T4564 | 1296495 | 1.1 | 249 | Vacuolar-type H(+)-ATPase | U81519 | Pleurochrysis carterae | 2×10⁻¹⁷ |
| T4222 | 914037 | 1.5 | 424 | V-type ATPase 16-kDa proteolipid subunit | U48365 | Pleurochrysis carterae | 1.8×10⁻²⁹ |
Table 11. T. b. rhodesiense EST homologues: internal/external structures and motility

| Clone number | dbEST number | Size (kb) | Sequence length (bp) | Putative identity | Accession number | Organism | P(n) |
|---|---|---|---|---|---|---|---|
| T4907 | 1352833 | 1.6 | 281 | α-cardiac actin | IR:I49465 | Mouse | 2.9×10⁻³⁸ |
| T4030 | 752240 | 0.6 | 375 | Actin-like protein | JC4039 | Gallus gallus | 3.9×10⁻⁴⁸ |
| T1764 | 522090 | 1 | 250 | Aga, v-erb-A, V-erb-B gene product | GP:X52209 | Avian erythroblastosis virus | 2.5×10⁻⁵ |
| T3227 | 637833 | 1.5 | 354 | b-type nuclear lamin | gi\|998562 | Strongylocentrotus purpuratus | 1.2×10⁻⁵ |
| T2564 | 577109 | 2.8 | 389 | β-adaptin | L48038 | Homo sapiens | 8.8×10⁻¹⁴ |
| T2506 | 457898 | 1 | 339 | Centromere/microtubule binding protein | SP:P33322 | Saccharomyces cerevisiae | 7.1×10⁻⁷ |
| T2096 | 513464 | 0.8 | 371 | Clathrin coat assembly protein | SP:Q00382 | Mus musculus | 1.4×10⁻³⁰ |
| T1156 | 506083 | 1 | 277 | Coat protein VP1 | PIR:B23008 | Minute virus | 2.4×10⁻⁵ |
| T2682 | 576961 | 0.6 | 242 | Cytoplasmic dynein light chain | U32944 | Homo sapiens | 2.3×10⁻²⁸ |
| T2584 | 547900 | 2.1 | 344 | Dynein β chain | SP:P39057 | Anthocidaris crassispina | 1.1×10⁻²² |
| T1296 | 506973 | 1.2 | 270 | Dystrophin | GP:M68859 | Mus musculus | 4.5×10⁻⁶ |
| T1341 | 507003 | 0.7 | 339 | Fibrillarin | SP:P22087 | Homo sapiens | 3.4×10⁻²⁴ |
| T2177 | 513515 | 0.6 | 263 | Fibrillin-1 | GP:U19972 | Mus musculus | 4.2×10⁻⁵ |
| T2873 | 577065 | 1.8 | 378 | Glycoprotein CD44s | U46957 | Rattus norvegicus | 2.3×10⁻⁶⁷ |
| T2974 | 592292 | 1.6 | 389 | Heparan sulfate proteoglycan | PIR:A38096 | Homo sapiens | 5.7×10⁻⁶ |
| T2300 | 513596 | 0.9 | 350 | Hypothetical 52.3 kDa protein in CAP2–ATP2 intergenic region | SP:P47154 | Saccharomyces cerevisiae | 1.6×10⁻¹² |
| T2282 | 513586 | 1.6 | 163 | Internalin B precursor | SP:P25147 | Listeria monocytogenes | 6.2×10⁻⁶ |
| T2203 | 513532 | 1.8 | 348 | Kinectin | GP:U15617 | Gallus gallus | 1.3×10⁻⁶ |
| T4523 | 1296493 | 1.6 | 269 | Kinetoplastid membrane protein-11 (kmp-11) | X96439 | T. brucei | 1.4×10⁻⁴¹ |
| T1292 | 506970 | 0.9 | 270 | Laminin-2 α-2 chain precursor | GP:U12147 | Mus musculus | 1.2×10⁻⁵ |
| T1151 | 506082 | 1 | 298 | Laminin A | GP:L07288 | Drosophila melanogaster | 9.9×10⁻⁵ |
| T1823 | 510991 | 0.6 | 309 | Major surface glycoprotein | SP:P33495 | Turkey virus | 2.4×10⁻⁹ |
| T1832 | 510998 | 1.4 | 295 | Major surface glycoprotein | GP:L16779 | Leishmania guyanensis | 5.0×10⁻¹² |
| T1728 | 522075 | 1 | 280 | Major surface glycoprotein | GI:643438 | Pneumocystis carinii | 3.3×10⁻⁵ |
| T1111 | 506059 | 1.4 | 312 | Mucin | SP:Q02817 | Homo sapiens | 9.0×10⁻⁶ |
| T1181 | 506099 | 1.8 | 298 | Myosin-like antigen | PIR:A44939 | Onchocerca volvulus | 5.0×10⁻⁵ |
| T2041 | 513433 | 0.6 | 249 | Nuclear envelope protein POM 12 | PIR:A40670 | Rattus norvegicus | 5.2×10⁻⁵ |
| T2794 | 577025 | 2.5 | 338 | Outer arm dynein β | U19464 | Paramecium tetraurelia | 1.6×10⁻¹⁸ |
| T3088 | 607550 | 0.6 | 371 | Outer cell wall protein precursor | SP:P09333 | Bacillus brevis | 5.9×10⁻⁵ |
| T4928 | 1352874 | 1 | 348 | Paraflagellar rod protein | L30155 | T. brucei | 5×10⁻⁶⁶ |
| T1191 | 510943 | 0.5 | 343 | Polyprotein | GP:D30613 | HCV | 6.4×10⁻¹¹ |
| T2611 | 547929 | 0.7 | 333 | Probable cell surface protein | PIR:S54162 | Leishmania donovani | 1.0×10⁻⁵³ |
| T3912 | 825360 | 2.8 | 310 | Protein X92 | P12304 | T. brucei | 5.0×10⁻⁶² |
| T3501 | 736513 | 1.2 | 353 | Surface protein type 51B | PIR:S50820 | Paramecium tetraurelia | 7.8×10⁻⁶ |
| T3137 | 612768 | 2.6 | 396 | Transport protein USO1 | PIR:A38455 | Saccharomyces cerevisiae | 2.1×10⁻⁷ |
| T1245 | 507042 | 1.3 | 271 | Trichohyalin | SP:Q07283 | Homo sapiens | 1.0×10⁻⁶ |
| T3165 | 847747 | 2.2 | 397 | Tubulin α chain | SP:P04106 | T. brucei | 4.7×10⁻⁵⁹ |
| T3951 | 736531 | 1.8 | 289 | Tubulin β chain | SP:P04107 | T. brucei | 8.0×10⁻⁴⁹ |
| T2922 | 607523 | 0.6 | 429 | Vitellogenin, phosvitin gene | K02113 | Gallus gallus | 4.0×10⁻⁷⁵ |

pathways and (3) creation of a local micro-environment suitable for parasite survival. It is not possible at this stage to determine the nature of new trypanosome-specific genes. However, given the redundancy of known genes among the ESTs reported here, we expect that there are at least 500 putative new trypanosome genes of unknown identity among the ESTs that have no homologues in the public databases. As the genomes of more organisms are analysed and the functions of more genes are determined, the number of trypanosomal ESTs without homologues will decrease, leaving only those unique to the trypanosome. Determination of the functions performed by such genes or their products must await the availability of robust assay systems for the
analysis of gene function in lower eukaryotes like the African trypanosomes.

3.5. Genomic or cDNA sequencing for gene discovery

It is possible that information comparable to that presented here could be obtained by the sequencing of random clones of trypanosome genomic DNA. However, the trypanosome genome is rather plastic, with different genes being affected to varying degrees by the genome mobility (Donelson, 1989). Thus, there may exist gene families, duplicates or orphons, the majority of whose members are transcriptionally silent. Similarly, there may be large open reading frames or pseudogenes that are normally not transcribed. The sequencing of genomic DNA does not allow a distinction to be made between transcriptionally silent open reading frames (ORFs) or pseudogenes and transcriptionally active genes. Furthermore, although both genomic sequencing and cDNA sequencing appear to yield similar returns in terms of new genes (el-Sayed and Donelson, 1997), the cDNAs represent products from transcriptionally active genes that give rise to steady state mRNA. Therefore, sequencing of genomic DNA is likely to overestimate the number of actual genes; this contrasts with the sequencing of cDNA, which directly accesses the protein-coding regions of the genome. However, the cDNA approach used here scanned only the transcripts of genes expressed in the bloodstream forms. In contrast, the genomic DNA sequencing would reveal all the genes, irrespective of the developmental stage at which they are expressed. The true benefits of each approach will be evaluated more accurately when sufficient numbers of clones of either type have been sequenced. The proportion of genes found by genomic DNA sequencing to be associated with surface antigenic variation comprised approximately 10% of the random clones (el-Sayed and Donelson, 1997); in the sequencing of cDNAs, the number of individual clones associated with surface antigenic variation was only about 2%. The differences in these estimates may be due to the differences in target molecules sequenced in either approach. In both the genomic sequencing and the sequencing of cDNA, sufficient numbers of individual clones are yet to be analysed in order to obtain representative sampling of the target molecules. Although the T. brucei genome has many copies of transcribed repetitive elements RIME and ingi (Hasan et al., 1984; Kimmel et al., 1987), no homologues of these elements were found among the cDNA clones sequenced so far.

Fig. 1. Tentative identification of transcripts from developmentally regulated genes. The polymerase chain reaction amplification was performed on selected recombinant cDNA clones to release the inserts. The released inserts were purified from PCR components using Wizard® resin (Promega), then quantified spectrophotometrically. Equal quantities (20 ng) of each of the inserts were slot-blotted on to a nylon membrane, denatured, then fixed on to the membrane. After 2 h pre-hybridization, each of the filters was hybridized for 12 h at 68°C with 1×10⁶ cpm of radioactive cDNA per milliliter of hybridization solution. The cDNA probes used were labeled to the same specific activity with [α-³²P]dCTP. The filters were washed with 0.1× SSC, 0.1% SDS, twice for 30 min each time at 65°C, then exposed to film with intensifying screens at −70°C for 12 h. The panels show slot blots hybridized with cDNA from: (A) procyclic forms, (B) long-slender forms and (C) short-stumpy forms of T. brucei ILTat 1.1. Arrows point to some of the ESTs that hybridized more strongly with at least one of the probes used. In each of the panels, the slots contain inserts from individual ESTs indicated in Table 12.

3.6. Conclusions

(1) By a modest investment in cDNA sequencing, it has been possible to exploit the availability of huge amounts of sequence data in the public-domain databases to find genes with potential roles in some of the processes that could not hitherto be approached easily in the trypanosome. The sequence information provided here forms a convenient starting point for molecular investigations into a number of these processes.
(2) A majority of the novel genes discovered through the sequencing of random cDNAs are likely to be involved in functions or levels of regulation not yet studied in the trypanosome.
(3) Although the likely functions of proteins encoded by the ESTs were assigned according to significant matches with proteins of known functions in other organisms, some aspects of the processes in which the gene products participate are likely to be unique to the trypanosome.
(4) The ESTs provide a useful pool from which to select objectively the best candidates for potential vaccines, drug targets and diagnostic antigens.
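Section 3.4 gives an expectation of at least 500 putative new trypanosome genes without showing the arithmetic. One way to arrive at a figure of that size, offered only as an illustrative sketch, is to assume that the unmatched ESTs are as redundant as the matched ones. The counts come from the abstract (2128 ESTs; 768 ESTs representing 385 different transcripts) and from Section 3.4 (52% unmatched). The function name `estimate_novel_genes` and the equal-redundancy assumption are introduced here for illustration and are not stated by the authors.

```python
# Counts quoted in the paper (abstract and Section 3.4).
TOTAL_ESTS = 2128            # ESTs analysed
UNMATCHED_FRACTION = 0.52    # share of ESTs with no database match
MATCHED_ESTS = 768           # ESTs matching genes from other organisms
MATCHED_TRANSCRIPTS = 385    # distinct transcripts those 768 ESTs represent


def estimate_novel_genes(total_ests, unmatched_fraction,
                         matched_ests, matched_transcripts):
    """Scale the unmatched ESTs by the transcripts-per-EST ratio of matched ESTs.

    Assumption (not stated in the paper): unmatched ESTs are exactly as
    redundant as the matched ones.
    """
    unmatched_ests = total_ests * unmatched_fraction
    transcripts_per_est = matched_transcripts / matched_ests
    return unmatched_ests * transcripts_per_est


estimate = estimate_novel_genes(TOTAL_ESTS, UNMATCHED_FRACTION,
                                MATCHED_ESTS, MATCHED_TRANSCRIPTS)
print(f"Estimated distinct unmatched transcripts: {estimate:.0f}")
```

Under these assumptions the estimate comes to roughly 550 distinct transcripts, which is in line with the "at least 500" quoted in the text.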
Acknowledgement We thank Dr Vish Nene for useful comments on the manuscript. We thank also Antony Muthiani and John Wando for providing procyclic form trypanosomes. Technical assistance of Rodney Morgan, Loren Donelson, Samuel Oyola, George Ochieng’ Odero and Lorraine Nyaoke is gratefully acknowledged. This inves-
Table 12 Tentative identification of transcripts from developmentally regulated genes Number
I
II
III
1 2 3 4 5 6 7 8
β-tubulin edeB protein Human herpes virus 7, U88 Extracellular signal-regulated kinase Type 2 inositol 1,4,5 triphosphate receptor Mitochondrial phosphate carrier protein Adenylosuccinate lyase Histone H4
α-tubulin DNA topoisomerase II Protease regulator subunit 4 Thymidine kinase, cytosolic RAD3 Ubiquitin-conjugating enzyme Deoxyhypusine synthase β-adaptin
9
Antisecretory factor-1
Ubiquitin-conjugating enzyme E2
10
Guanine nucleotide regulatory protein
11 12 13 14 15 16 17 18
RAD8 Major surface glycoprotein ADP, ATP carrier protein: STI (stress-inducible protein) Putative proteasome component Protein kinase C delta-type Phosphotransferase enzyme HIV-1 TAR RNA-binding protein ( TARB)
19 20
Tumor necrosis factor type 1 receptor-associated protein Metalloproteinase (=gp63 of Leishmania)
ATP-stimulated glucocorticoid-receptor translocation promoter protein Proteasome Cystathionine gamma-synthase 60S ribosomal protein L13.1 40S ribosomal protein S12.e HSP 108 No homologue VSG 222 Hypothetical 22 kDa protein in Aldolase locus Genome polyprotein
MVAT4 Cyclophilin A Novel brain-specific protein Arginine kinase Hypothetical protein YCR072C T. brucei, Tm7T.b.r Tubulin α chain Extensin precursor-cell wall hydroxyproline-rich glycoprotein Hypothetical 71.5 kDa protein in cox1 3′ region 60S ribosomal protein L1
21 22 23 24
Signal recognition particle receptor a subunit Probable cell control protein crn T-complex protein 1, theta subunit S-adenosylhomocysteine hydrolase
NADH dehydrogenase (ubiquinone) Cysteine protease VSG ILTat1.21 VSG AnTat1.8 Humoral lectin prepropeptide
Penicillin acylase I precursor Lambda clone of E. coli K-12 genome LACK Putative cell surface protein Yeast clathrin coat assembly protein Outer arm dynein β PPi-dependent phosphofructo-1-kinase 3′,5′ cyclic nucleotide phosphodiesterase MVAT4 Yeast hypothetical 81.7 kDa protein in MOL1-NAT2 intergenic region Tubulin α chain No homologue Putative ATP-dependent RNA helicase Vitellogenin, phosvitin gene
Fig. 2. ESTs from developmentally regulated genes. Northern blots of total RNA (5 µg/lane) from different developmental stages of T. brucei ILTat1.1 (long-slender forms in lane 1; short-stumpy forms in lane 2; procyclic culture forms in lane 3) were hybridized with radio-labeled inserts from ESTs encoding trypanosome homologues of: (A) Leishmania putative surface protein, Ldp23 (T1310); (B) receptor for activated protein kinase C (T1185); (C) major surface glycoprotein gp63 of L. guyanensis (T1832); and (D) variant surface glycoprotein 222 (T1707). Post-hybridization washes were carried out under stringent conditions (0.1× SSC, 0.1% SDS, at 65°C, twice, 30 min each time) before the filters were exposed to film with intensifying screens at −70°C to obtain the autoradiogram shown. The sizes of the transcripts are indicated in kilobases.
tigation received financial support from the UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases (TDR). This is ILRI publication number 98045.
References

Adams, M.D., Kerlavage, A.R., Fields, C., Venter, J.C., 1993. 3,400 new expressed sequence tags identify diversity of transcripts in human brain. Nature Genet. 4, 256–267.
Alarcon, C.M., Son, H.J., Hall, T., Donelson, J.E., 1994. A monocistronic transcript for a trypanosome variant surface glycoprotein. Mol. Cell. Biol. 14, 5579–5591.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Boothroyd, J.C., Cross, G.A., 1982. Transcripts coding for variant surface glycoproteins of Trypanosoma brucei have a short, identical exon at their 5′ end. Gene 20, 281–289.
Campos-Neto, A., Soong, L., Cordova, J.L., Sant’Angelo, D., Skeiky, Y.A., Ruddle, N.H., Reed, S.G., Janeway, C., Jr., McMahon-Pratt, D., 1995. Cloning and expression of a Leishmania donovani gene instructed by a peptide isolated from major histocompatibility complex class II molecules of infected macrophages. J. Exp. Med. 182, 1423–1433.
Cecchini, E., Dominy, P.J., Geri, C., Kaiser, K., Sentry, J., Milner, J.J., 1993. Identification of genes up-regulated in dedifferentiating Nicotiana glauca pith tissue, using an improved method for constructing a subtractive cDNA library. Nucleic Acids Res. 21, 5742–5747.
Chakrabarti, D., Reddy, G.R., Dame, J.B., Almira, E.C., Laipis, P.J., Ferl, R.J., Yang, T.P., Rowe, T.C., Schuster, S.M., 1994. Analysis of expressed sequence tags from Plasmodium falciparum. Mol. Biochem. Parasitol. 66, 97–104.
Cross, G.A.M., 1995. Identification, purification and properties of clone specific glycoprotein antigens constituting the surface coat of Trypanosoma brucei. Parasitology 71, 393–417.
Donelson, J.E., 1989. DNA rearrangements and antigenic variation in African trypanosomes. In: Berg, D.E., Howe, M.M. (Eds.), Mobile DNA. American Society for Microbiology, Washington, DC, pp. 763–781.
el-Sayed, N.M.A., Alarcon, C.M., Beck, J.C., Sheffield, V.C., Donelson, J.E., 1995. cDNA expressed sequence tags of Trypanosoma brucei rhodesiense provide new insights into the biology of the parasite. Mol. Biochem. Parasitol. 73, 75–90.
el-Sayed, N.M.A., Donelson, J.E., 1997. A survey of Trypanosoma brucei rhodesiense genome using shotgun sequencing. Mol. Biochem. Parasitol. 84, 167–178.
Fields, S., Sternglanz, R., 1994. The two-hybrid system: an assay for protein–protein interactions. Trends Genet. 10, 286–292.
Franco, G.R., Adams, M.D., Soares, M.B., Simpson, A.J., Venter, J.C., Pena, S.D., 1995. Identification of new Schistosoma mansoni genes by the EST strategy using a directional cDNA library. Gene 152, 141–147.
Goffeau, A., Barrell, B.G., Bussey, H., Davis, R.W., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J.D., Jacq, C., Johnston, M., Louis, E.J., Mewes, H.W., Murakami, Y., Philippsen, P., Tettelin, H., Oliver, S.G., 1996. Life with 6000 genes. Science 274, 546–567.
Hasan, G., Turner, M.J., Cordingley, J.S., 1984. Complete nucleotide sequence of an unusual mobile element from Trypanosoma brucei. Cell 37, 333–341.
Julia, V., Rassoulzadegan, M., Glaichenhaus, N., 1996. Resistance to Leishmania major induced by tolerance to a single antigen. Science 274, 421–423.
Kimmel, B.E., ole-MoiYoi, O.K., Young, J.R., 1987. Ingi, a 5.2-kb dispersed sequence element from Trypanosoma brucei that carries half of a smaller mobile element at either end and has homology with mammalian LINEs. Mol. Cell. Biol. 7, 1465–1475.
Magez, S., Geuskens, M., Beschin, A., del Favero, H., Verschueren, H., Lucas, R., Pays, E., de Baetselier, P., 1997. Specific uptake of tumor necrosis factor-alpha is involved in growth control of Trypanosoma brucei. J. Cell. Biol. 137, 715–727.
Saiki, R.K., Scharf, S., Faloona, F., Mullis, K.B., Horn, G.T., Erlich, H.A., Arnheim, N., 1985. Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science 230, 1350–1354.
Sambrook, J., Fritsch, E.F., Maniatis, T., 1989. Molecular Cloning: A Laboratory Manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Sanger, F., Nicklen, S., Coulson, A.R., 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463–5467.
Short, J.M., Fernandez, J.M., Sorge, J.A., Huse, W.D., 1988. Lambda ZAP: a bacteriophage lambda expression vector with in vivo excision properties. Nucleic Acids Res. 16, 7583–7600.
Song, H.Y., Dunbar, J.D., Zhang, Y.X., Guo, D., Donner, D.B., 1995. Identification of a protein with homology to hsp90 that binds the type 1 tumor necrosis factor receptor. J. Biol. Chem. 270, 3574–3581.
Swindle, J., Tait, A., 1996. Trypanosomatid genetics. In: Smith, D.F., Parsons, M. (Eds.), Molecular Biology of Parasitic Protozoa. IRL Press, Oxford, pp. 6–34.
Wan, K.L., Blackwell, J.M., Ajioka, J.W., 1996. Toxoplasma gondii expressed sequence tags: insight into tachyzoite gene expression. Mol. Biochem. Parasitol. 75, 179–186.