AvGI, an index of genes transcribed in the salivary glands of the ixodid tick Amblyomma variegatum


International Journal for Parasitology 32 (2002) 1447–1456 www.parasitology-online.com

Vishvanath Nene a,*, Dan Lee a, John Quackenbush a, Robert Skilton b, Stephen Mwaura b, Malcolm J. Gardner a, Richard Bishop b

a The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
b International Livestock Research Institute, P.O. Box 30709, Nairobi, Kenya

Received 12 June 2002; received in revised form 5 August 2002; accepted 6 August 2002

Abstract

Random clones from a cDNA library made from mRNA purified from dissected salivary glands of feeding female Amblyomma variegatum ticks were subjected to single-pass sequence analysis. A total of 3,992 sequences with an average read length of 580 nucleotides were used to construct a gene index, AvGI, which consists of 2,109 non-redundant sequences. A provisional gene identity was assigned to 39% of the database entries by sequence similarity searches against a non-redundant amino acid database and against a protein database annotated with gene ontology terms. Homologs of genes encoding basic cellular functions were found, including enzyme activities previously characterised in ixodid tick salivary glands, such as stearoyl-CoA desaturase and protein phosphatase. Several families of abundant cDNA sequences that may code for protein components of tick cement were also identified, together with A. variegatum proteins that may contribute to anti-haemostatic and anti-inflammatory responses and one with potential immunosuppressive activity. Interfering with the function of such proteins might disrupt the life cycle of A. variegatum and help to control this ectoparasite or reduce its ability to transmit disease-causing organisms. AvGI is an electronic knowledge base that can be used to launch investigations into the biology of the salivary glands of this tick species. The database may be accessed via the World Wide Web at http://www.tigr.org/tdb/tgi.shtml.

© 2002 Australian Society for Parasitology Inc. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Salivary glands; Expressed sequence tag; Gene index; Ticks; Amblyomma variegatum

* Corresponding author. Tel.: +1-301-610-5968; fax: +1-301-838-0208. E-mail address: [email protected] (V. Nene).

0020-7519/02/$20.00 © 2002 Australian Society for Parasitology Inc. Published by Elsevier Science Ltd. All rights reserved. PII: S0020-7519(02)00159-5

1. Introduction

Haematophagous arthropods have recently gained considerable public attention as vectors of emerging infectious diseases, and they are also recognised as vectors of a number of organisms classified as agents of bioterrorism (Rotz et al., 2002). Like mosquitoes, ticks are efficient vectors of a variety of viral, prokaryotic and eukaryotic organisms that cause disease in humans and animals (Wikel and Alarcon-Chaidez, 2001). In addition, ticks are ectoparasites that cause direct losses to the livestock industry through reduced productivity and skin damage, and it is estimated that the global tick and tick-borne disease problem, together with efforts to control it, costs about US$7 billion per year (McKosker, 1979). Pathogens capable of replicating in ticks are usually transmitted when they are released during tick feeding, and their infectivity is often enhanced by immunomodulatory factors present in saliva (Nuttall et al., 2000; Wikel, 1999). In contrast to blood-feeding insects, ticks in the family Ixodidae feed for many days, and the tick attachment site usually consists of a cement cone that protects the tick hypostome during feeding and seals the site of skin penetration (Kemp et al., 1982). The salivary glands of ticks are responsible for osmoregulation, and excess water derived from the blood meal is excreted via saliva (reviewed in Sauer et al., 2000). In addition to components of cement, ticks secrete a variety of pharmacologically active molecules that modulate both innate and acquired mammalian immune responses in their favour, allowing successful feeding over prolonged periods (Wikel, 1999). Some components of tick saliva are highly immunogenic, and vaccination with salivary gland proteins can interfere with the tick life cycle (Mulenga et al., 1999; Willadsen, 2001) and with pathogen transmission (Nuttall et al., 2000; Wikel, 1999). There is therefore considerable interest in characterising the components of tick saliva, as such information is likely to be valuable when designing improved methods for the control of ticks and tick-borne diseases.

With the advent of high-throughput DNA sequencing technologies and improvements in computational tools that facilitate the assignment of provisional function to gene sequences, expressed sequence tag (EST) projects have proved to be a valuable approach in gene discovery programmes for developing organism-specific electronic knowledge bases. The recent report of 1,738 ESTs from Amblyomma americanum, the lone star tick, which is found in parts of North America, has greatly added to genome-related information on ixodid ticks. It also exemplifies the importance of such reverse genetics approaches to arthropod biology, as the sequence tags were used to predict the presence of genes encoding a wide variety of functions (Hill and Gutierrez, 2000). In Africa, a related tick species, Amblyomma variegatum, is the primary vector of a rickettsial pathogen that causes heartwater, a fatal disease of ruminants (Uilenberg, 1983). In addition, aggravation of a skin disease of cattle caused by Dermatophilus congolensis, an actinomycete bacterium, is associated with infestation by A. variegatum, probably as a result of systemic suppression of bovine immune responses by feeding ticks (Ambrose et al., 1999). This tick species was inadvertently introduced into Guadeloupe about 150 years ago through the importation of infested cattle from West Africa (Camus and Barre, 1990). The ‘tropical bont tick’, as it is also known, is now present on several islands in the Caribbean, together with the associated diseases heartwater and dermatophilosis. The Caribbean Amblyomma Programme was initiated in 1995 with the aim of eradicating the tick from the region (Pegram et al., 2000) and reducing the considerable threat it poses to livestock agriculture on the North American mainland. Despite the economic importance of the problems posed by A. variegatum, little molecular information is available for this species.
To advance our knowledge of this tick species further, we have adopted an EST approach focused on the discovery of genes expressed in the salivary glands of feeding female ticks. This paper describes 3,992 single-pass cDNA sequences and the construction of a gene index, AvGI, using a gene indexing system developed in-house at The Institute for Genomic Research (TIGR) (Quackenbush et al., 2001). Six of the eight tentative consensus sequences with the most component ESTs have the potential to encode a protein containing glycine-rich repeat regions, a feature associated with protein components of tick cement. Potential homologs of genes associated either with defined functions in salivary gland biology or with modulation of mammalian host immune and haemostatic responses were also identified.

2. Material and methods

2.1. Construction of salivary gland cDNA library

Dissected salivary glands from 400 adult female ticks that had been allowed to feed on Boran cattle (Bos indicus) for 5 days were homogenised in RNA extraction buffer (Xie and Rothblum, 1991) containing phenol and guanidine isothiocyanate. The cleared lysate was extracted with chloroform, and total RNA was recovered from the aqueous phase by precipitation with isopropanol. This material was used to prepare a cDNA library (Invitrogen). Briefly, an oligo(dT) primer containing a NotI restriction enzyme site was used to prime first-strand cDNA synthesis. Double-stranded cDNA was digested with NotI, size-fractionated and directionally cloned into EcoRV–NotI double-digested pCMVSPORT6.ccdB. Ligation products were electroporated into Escherichia coli DH10B tonA cells.

2.2. Plasmid template preparation and sequencing

Processing of bacterial colonies and automated sequencing were carried out within the TIGR sequencing core facility, using robots to pick colonies and a workstation for DNA template preparation. Briefly, plasmid-containing cells were selected on growth medium containing ampicillin and grown in liquid culture in 96-well plates, and plasmid DNA was prepared using the Perfectprep Plasmid 96 Vac Direct Bind Purification System (Eppendorf). Purified DNA was sequenced using universal M13/pUC oligonucleotides, which prime sequencing reactions at sites flanking the multiple cloning site of the cloning vector, and Applied Biosystems PRISM Big Dye® terminator version 3.0 cycle sequencing ready reaction kits. Sequencing reactions were purified by precipitation and then analysed on ABI 3700 machines. Clones were assigned unique labels from AVAAA01 to AVABS96. Data were tracked in a relational database, and 3,992 quality EST sequences were submitted to GenBank (accession numbers BM289399 to BM293390; dbEST IDs EST574125 to EST578116).

2.3. Creation of a gene index, AvGI

The TIGR gene index system has been reviewed recently (Quackenbush et al., 2001) and is not discussed at length here, although slightly modified procedures were used.
Briefly, sequence data were trimmed to remove vector sequences and polyA/T tracts and then subjected to high-stringency pairwise comparison using megablast (Zhang et al., 2000) to identify sequence overlaps. Sequences sharing at least 95% identity over a region of 40 nucleotides or longer, with no more than 20 nucleotides of mismatched sequence at either end, were grouped into a cluster. Sequences within a cluster were then assembled into one or more tentative consensus sequences using the Paracel TranscriptAssembler (Paracel Inc.). All non-clustered, non-overlapping sequences are defined as single-copy singleton sequences. DNA sequences were loaded into a species-specific relational database for automated annotation. A provisional function was identified by searching public protein databases using the blastx program of the BLAST suite without the low-complexity filter (Altschul et al., 1990), and gene ontology terms were then assigned according to gene ontology categories (Ashburner et al., 2000) by searching a database of proteins that had been assigned gene ontology terms.
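The clustering rule in Section 2.3 can be sketched as follows. This is a minimal illustration, not the TIGR pipeline. It assumes that each pairwise megablast hit has already been summarised as an overlap length, a fractional identity and the number of unmatched bases at each end (the hypothetical `Overlap` record). It also assumes that qualifying overlaps are merged transitively (single linkage) with a union-find structure, which the text does not specify. ESTs left in groups of one are the singletons. In the real pipeline, each cluster would then be passed to an assembler, which may yield more than one tentative consensus sequence.

```python
from dataclasses import dataclass

# Thresholds stated in Section 2.3
MIN_IDENTITY = 0.95   # fraction of identical bases in the overlap
MIN_OVERLAP = 40      # nucleotides
MAX_OVERHANG = 20     # mismatched nucleotides tolerated at each end


@dataclass
class Overlap:
    """Hypothetical summary of one pairwise hit between two trimmed ESTs."""
    a: str
    b: str
    identity: float
    length: int
    overhang_5: int
    overhang_3: int


def qualifies(o: Overlap) -> bool:
    """Apply the identity, overlap-length and end-overhang rules."""
    return (o.identity >= MIN_IDENTITY and o.length >= MIN_OVERLAP
            and max(o.overhang_5, o.overhang_3) <= MAX_OVERHANG)


def cluster_ests(est_ids, overlaps):
    """Single-linkage grouping with union-find; returns (clusters, singletons)."""
    parent = {e: e for e in est_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for o in overlaps:
        if qualifies(o):
            parent[find(o.a)] = find(o.b)

    groups = {}
    for e in est_ids:
        groups.setdefault(find(e), []).append(e)
    clusters = [sorted(g) for g in groups.values() if len(g) > 1]
    singletons = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, singletons


ests = ["est1", "est2", "est3", "est4"]
hits = [
    Overlap("est1", "est2", 0.98, 120, 5, 0),   # passes
    Overlap("est2", "est3", 0.96, 45, 0, 12),   # passes: est3 joins transitively
    Overlap("est3", "est4", 0.90, 300, 0, 0),   # fails the 95% identity rule
]
clusters, singletons = cluster_ests(ests, hits)
print(clusters, singletons)  # [['est1', 'est2', 'est3']] ['est4']
```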

3. Results

Dissected salivary glands from 5-day-fed female A. variegatum ticks were flash-frozen in liquid nitrogen and then processed to purify total RNA. Approximately 2.52 mg of total RNA (A260/A280 ratio of 1.92) was recovered, and a portion of this RNA was used to construct a directional, non-normalised cDNA library. A non-amplified library of 2.9 × 10⁷ clones was made in the plasmid vector pCMVSPORT6.ccdB. As a result of the ccdB gene cloning strategy (Gabant et al., 1998), a very high cloning efficiency was achieved, and all of the clones analysed contained an insert. Initially, 480 clones were processed and sequenced using M13/pUC primers that flank the multiple cloning site in the plasmid. The sequencing success rate with the reverse primer, which primes sequencing at the 5′ end of the cDNA, was 93%, but that with the forward primer, which primes at the 3′ end of the cDNA insert, was only 43%. We presumed that the low success rate with the forward primer was due to poor sequence quality caused by the polyA tract present in almost all of the clones analysed, and we therefore elected to sequence additional clones with the reverse primer only. In total, 4,224 templates were prepared from 4,320 clones that were picked and processed, and these yielded 3,992 quality sequences ranging in length from 101 to 838 bases. Only 207 EST sequences were generated from the 3′ end of cDNAs, and the average read lengths of the 5′ and 3′ ESTs were 590 and 406 nucleotides, respectively. Clustering and assembly of the 3,992 quality EST sequences produced a dataset of 2,109 non-redundant sequences. These consisted of 478 tentative consensus sequences (TC1 to TC478) and 1,631 singleton sequences, which are referred to by their GenBank accession numbers. The tentative consensus sequences were derived by assembly of 2,361 EST sequences, giving a redundancy rate of 59% in the dataset.
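The counts reported in Sections 2.2 and 3 are mutually consistent, and the arithmetic linking them is easy to check. The sketch below uses only the reported figures, plus one loudly labelled assumption: that a clone label is an `AVA` prefix, a two-letter plate code running AA, AB, …, AZ, BA, … and a well number from 01 to 96. The text does not describe the label scheme. Under that assumption, the range AVAAA01 to AVABS96 spans 45 plates, which matches the 4,320 clones picked. Redundancy is taken, as the text implies, to be the fraction of ESTs assembled into tentative consensus sequences. Because the per-primer mean read lengths are themselves rounded, the recomputed overall mean is approximate.

```python
from itertools import product
from string import ascii_uppercase

# Hypothetical label scheme: "AVA" + two-letter plate code + well 01-96
plates = ["".join(p) for p in product(ascii_uppercase, repeat=2)]
plates = plates[: plates.index("BS") + 1]          # AA ... AZ, BA ... BS (45 plates)
labels = [f"AVA{p}{w:02d}" for p in plates for w in range(1, 97)]
print(labels[0], labels[-1], len(labels))          # AVAAA01 AVABS96 4320

# GenBank accession range BM289399-BM293390 (Section 2.2)
n_accessions = 293390 - 289399 + 1

# Read counts and (rounded) mean read lengths reported in Section 3
n_quality, n_3prime = 3992, 207
n_5prime = n_quality - n_3prime                    # reverse-primer (5') reads
mean_all = (n_5prime * 590 + n_3prime * 406) / n_quality

# Clustering outcome
n_tc, n_ests_in_tcs = 478, 2361
n_singletons = n_quality - n_ests_in_tcs
n_nonredundant = n_tc + n_singletons
redundancy = n_ests_in_tcs / n_quality             # fraction of ESTs placed in a TC

print(n_accessions, round(mean_all))               # 3992 580
print(n_singletons, n_nonredundant)                # 1631 2109
print(f"redundancy {redundancy:.0%}")              # redundancy 59%
```

The recomputed mean of about 580 nucleotides agrees with the average read length quoted in the Abstract.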
More than half of the tentative consensus sequences (264) were assembled from two ESTs; 181 contained three to 10 ESTs, 18 contained 11–20 ESTs, 10 contained 21–50 ESTs and five were assembled from 51 or more ESTs. A preliminary comparison of these data with 1,738 ESTs from unfed whole larval and adult stages of A. americanum (Hill and Gutierrez, 2000) indicated 20% sequence similarity between the datasets. Only 30 of 154 ESTs from the salivary glands of feeding ticks (Roe et al., http://www.genome.ou.edu/tick.html; data release 5 January 2001) were similar to sequences generated in this study (data not shown). The non-redundant A. variegatum EST sequence data, which represent approximately 1.2 Mbp of sequence information, were used to search the non-redundant amino acid sequence database using the blastx algorithm. At an E-value of 10⁻²⁵, 819 (39%) similarity matches were identified. The sequence data were also used to infer gene ontology categories and assignments automatically: 526 sequences were assigned to the organising principle of molecular function, 470 to biological process and 449 to cellular component (Table 1). Some weak sequence similarities will probably be overlooked at this E-value (see below), but provisional gene assignments made at this threshold can be held with greater confidence. The A. variegatum gene index, AvGI, can be accessed at the TIGR gene index website (www.tigr.org/tdb/tgi.shtml), where it can be viewed and searched by various parameters. For example, retrieving information by tentative consensus number shows the consensus sequence, its open reading frames, a map of the EST overlaps used to assemble the sequence, a list of the component ESTs and cDNA clone names, and the tentative gene assignment, including a summary of the similarity search data linked to public database accession numbers. Sequences can also be retrieved by EST ID, GenBank accession number or clone name. Retrieving information via the ‘EST annotator’ recalls all 3,992 ESTs in a table that lists clone names, GenBank accession numbers, tentative consensus numbers where appropriate, and provisional annotations; data on singletons are therefore most easily viewed in this display. ESTs classified according to the three major gene ontology assignments may be viewed, as can the components of individual categories (Table 1), for example the 163 entries in the enzyme category. The gene index can also be queried by keyword. A query with the word ‘proteinase’, for example, found seven entries (Table 2), and searching for ticks in the genera Boophilus, Rhipicephalus, Ixodes, Haemaphysalis and Dermacentor produced nine, six, 11, two and one entries, respectively (data not shown).
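A keyword query of the kind described above behaves like a case-insensitive substring search over provisional annotations. The sketch below is hypothetical and is not the TIGR web interface. It uses a toy in-memory index holding the seven entries of Table 2 plus two annotations from Table 3, and it reproduces the split of two consensus sequences and five singletons reported for ‘proteinase’. The `query_index` helper is assumed for illustration. It separates consensus sequences from singletons by the ‘TC’ prefix, following the convention that singletons are named by GenBank accession number.

```python
# Toy annotation table (identifier -> provisional identification), taken from
# Tables 2 and 3; the real AvGI holds 2,109 non-redundant entries.
TOY_INDEX = {
    "TC318": "Cathepsin L-like proteinase precursor, Boophilus microplus",
    "TC386": "Probable zinc proteinase (EC 3.4.24.-) F44E7.4, Caenorhabditis elegans",
    "BM289546": "Cathepsin L-like proteinase precursor, Boophilus microplus",
    "BM289731": "Serine proteinase inhibitor serpin-4, Rhipicephalus appendiculatus",
    "BM290207": "Serine proteinase inhibitor serpin-4, Rhipicephalus appendiculatus",
    "BM291444": "Serine proteinase 2, Haemaphysalis longicornis",
    "BM292795": "Serine proteinase inhibitor serpin-4, Rhipicephalus appendiculatus",
    "TC3": "Glycine-rich cell wall structural protein precursor",
    "TC314": "Tubulin alpha-1 subunit",
}


def query_index(index, keyword):
    """Return (consensus_hits, singleton_hits) whose annotation contains keyword."""
    kw = keyword.lower()
    hits = sorted(k for k, v in index.items() if kw in v.lower())
    consensus = [h for h in hits if h.startswith("TC")]
    singletons = [h for h in hits if not h.startswith("TC")]
    return consensus, singletons


tcs, singles = query_index(TOY_INDEX, "proteinase")
print(len(tcs), len(singles))                 # 2 5
print(query_index(TOY_INDEX, "Boophilus"))
```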
Finally, a query nucleotide or protein sequence can be searched against the gene index using the BLAST algorithm. The current gene index will continue to evolve as improvements are made to the TIGR gene indexing system, and provisional functional gene assignments may be updated as a result of periodic automated searches of the public databases.

Table 1. A. variegatum gene ontology assignments (a)

Molecular function | Singleton/TC | % of 526 TC/singletons with gene ontology assignments | % of all 2,109 TC/singletons
Ligand binding or carrier | 207 | 39.35 | 9.82
Enzyme | 163 | 30.99 | 7.73
Transporter | 41 | 7.79 | 1.94
Signal transducer | 27 | 5.13 | 1.28
Obsolete | 24 | 4.56 | 1.14
Structural molecule | 22 | 4.18 | 1.04
Chaperone | 19 | 3.61 | 0.90
Molecular function unknown | 9 | 1.71 | 0.43
Enzyme regulator | 7 | 1.33 | 0.33
Apoptosis regulator | 3 | 0.57 | 0.14
Cell adhesion molecule | 3 | 0.57 | 0.14
Motor | 1 | 0.19 | 0.05

Biological process | Singleton/TC | % of 470 TC/singletons with gene ontology assignments | % of all 2,109 TC/singletons
Cell growth and/or maintenance | 352 | 74.89 | 16.69
Cell communication | 46 | 9.79 | 2.18
Developmental processes | 27 | 5.74 | 1.28
Obsolete | 14 | 2.98 | 0.66
Biological process unknown | 11 | 2.34 | 0.52
Death | 9 | 1.91 | 0.43
Physiological processes | 7 | 1.49 | 0.33
Behaviour | 4 | 0.85 | 0.19

Cellular component | Singleton/TC | % of 449 TC/singletons with gene ontology assignments | % of all 2,109 TC/singletons
Cell | 414 | 92.20 | 19.63
Cellular component unknown | 13 | 2.90 | 0.62
Extracellular | 12 | 2.67 | 0.57
Obsolete | 8 | 1.78 | 0.38
Unlocalised | 2 | 0.45 | 0.09

(a) Gene ontology assignments (Ashburner et al., 2000) for the non-redundant EST sequence data. The number of sequences in each category is listed and is also expressed as a percentage of the sequences assigned to that organising principle and of the total number of non-redundant sequences.

Table 2. Query of the AvGI gene index with the keyword ‘proteinase’ (a)

Sequence | Provisional identification in gene index
TC318 | Cathepsin L-like proteinase precursor, Boophilus microplus
TC386 | Probable zinc proteinase (EC 3.4.24.-) F44E7.4, Caenorhabditis elegans
BM289546 | Cathepsin L-like proteinase precursor, Boophilus microplus
BM289731 | Serine proteinase inhibitor serpin-4, Rhipicephalus appendiculatus
BM290207 | Serine proteinase inhibitor serpin-4, Rhipicephalus appendiculatus
BM291444 | Serine proteinase 2, Haemaphysalis longicornis
BM292795 | Serine proteinase inhibitor serpin-4, Rhipicephalus appendiculatus

(a) Searching the gene index with the word ‘proteinase’ identified two consensus sequences and five singleton sequences.

The average G+C content of the EST dataset was 52%, but a number of sequences were A+T rich. For example, TC149 has a G+C content of only 19%. Except for TC149, which did not contain long open reading frames (ORFs), such A+T-rich ESTs contained an ORF for proteins normally encoded on mitochondrial DNA, such as subunits of NADH dehydrogenase and cytochrome oxidase. The 1,164-nucleotide DNA sequence of TC149 was used to search a nucleotide database and was found to contain a sequence nearly identical to the 3′ end of an A. variegatum mitochondrial 16S rDNA gene (Black and Piesman, 1994). TC149 therefore probably represents a near full-length sequence of mitochondrial 16S rRNA. TC149 was assembled from 26 ESTs, and cDNA synthesis appears to have been primed from a polyA tail. This suggests that, as in other organelle systems (Baserga et al., 1985), tick mitochondrial rRNAs also undergo polyadenylation. In support of this observation, analysis of ESTs from whole larvae of A. americanum also identified mitochondrial 16S rRNA sequences within that dataset (Hill and Gutierrez, 2000). None of the A. variegatum ESTs had sequence similarity with a partial sequence of A. variegatum mitochondrial 12S rDNA. TC3 contained the highest level of redundancy: 161 ESTs were assembled into a 1,201-nucleotide consensus sequence, including a polyA tail derived from 13 3′ ESTs.

Table 3. Glycine-rich coding ORFs in the top eight tentative consensus sequences (a)

TC | ESTs in TC | TC length (nt) | ORF length (aa) | G (%) | S (%) | P (%) | L (%) | Y (%) | Provisional gene assignment in AvGI
TC3 | 161 | 1,201 | 359 | 35 | 15 | 5 | 4 | 3 | Glycine-rich cell wall structural protein precursor
TC5 | 127 | 1,343 | 392 | 30 | 6 | 11 | 13 | 5 | Hydroxyproline-rich glycoprotein
TC1 | 86 | 1,205 | 360 | 35 | 15 | 4 | 4 | 3 | Glycine-rich cell wall structural protein precursor
TC310 | 72 | 1,049 | – | – | – | – | – | – | None
TC148 | 62 | 1,116 | 371 | 22 | 6 | 12 | 6 | 6 | None
TC4 | 38 | 1,057 | 352 | 35 | 16 | 3 | 5 | 3 | Glycine-rich protein
TC147 | 30 | 1,362 | 416 | 21 | 6 | 11 | 6 | 6 | None
TC314 | 27 | 1,029 | 319 | 8 | 5 | 5 | 8 | 5 | Tubulin α-1 subunit

(a) TC3, TC5, TC1, TC310 and TC147 consist of both 5′ and 3′ ESTs and contain polyA tails, which were trimmed when constructing the consensus sequences. TC5 contains a 5′ untranslated region and is likely to encode a full-length protein. TC314 represents a partial sequence of the α-1 subunit of tubulin, which usually consists of 450 amino acid residues. Amino acid composition is given to the nearest whole number; it is not given for TC310, because all of its ORFs were shorter than 50 amino acid residues. The average composition (%) of G, S, P, L and Y in proteins in the Swiss-Prot database is 6.84, 7.22, 4.92, 9.33 and 3.19, respectively.

The blastx algorithm identified TC3 as containing a sequence similar to plant glycine-rich cell wall structural proteins (Table 3). A high glycine content is often associated with tick cement and salivary gland proteins (Mulenga et al., 1999; Nuttall and Paesen, 1999; Godfroid et al., 2000; Tsuda et al., 2001; Bishop et al., 2002). Examination of the eight tentative consensus sequences with the highest number of component ESTs revealed that six of them contain an ORF with a high glycine content (Table 3). TC310 contained multiple small ORFs of unknown identity, and TC314 encodes a homolog of α-1 tubulin, an abundant conserved protein present in most eukaryotes. TC5 was the only consensus sequence that contained a 5′ untranslated region (UTR). A 5′ UTR was defined by the presence of a translation stop codon upstream of, and in the same frame as, an ORF with a potential translation initiation codon, so TC5 is likely to encode a full-length protein. A sequence alignment of the six predicted protein sequences showed that they fell into three distinct groups. TC3, TC1 and TC4 were related to each other (Fig. 1a), and TC147 was related to TC148 (Fig. 1b). TC5 was not related to either group but was similar to a less abundant sequence in AvGI, TC449 (Fig. 1c). These three protein families have a high glycine content and different proportions of serine, proline, leucine and tyrosine residues (Table 3). Because of the skewed glycine content, the protein families show weak sequence similarity to one another. Additional tentative consensus sequences and singletons containing glycine-rich repeat motifs were found by visual inspection of translated ORFs longer than 100 amino acids and by database searches. Alignment of these sequences revealed that some were similar to those already defined, for example TC2 (Fig. 1a).
Others, such as TC30, TC331 and TC381, were similar in sequence to glycine-rich salivary gland components of Ixodes ricinus that are up-regulated during feeding and are the subject of patent applications (Godfroid et al., 2000). In total, at least 11 distinct glycine-rich gene families are expressed in the salivary gland of A. variegatum (data not shown). The gene ontology assignments show that a number of cDNAs appear to encode proteins underpinning universal cellular processes. These include components of biosynthetic and catabolic pathways, membrane transporters, structural molecules, signal transduction, cell cycle control, and the constitutive and regulated secretory pathways (Table 1). A number of pharmacologically active molecules, and molecules important in the biology of ixodid salivary glands, have been described, and a few A. variegatum cDNAs encoding putative homologs of such proteins were identified. For example, TC24 codes for a stearoyl-CoA desaturase, an enzyme likely to be important in lipid metabolism during salivary gland morphogenesis (Luo et al., 1997). cDNAs were found that encode fatty acid binding protein, phospholipase C, protein kinases, protein phosphatase PP2A, cAMP-dependent kinase, cAMP-regulated phosphoproteins, subunits of vacuolar ATPase and a GABA receptor. These activities have previously been associated with salivary gland function (Wikel et al., 1994; Sauer et al., 1995, 2000). cDNAs coding for glutathione S-transferase, calreticulin, serpins, cathepsins, serine proteases and various peptidases were also identified. Several immunodominant salivary gland antigens of I. ricinus have been defined (Das et al., 2001), but we have not been able to identify similar sequences within the current dataset. TC183 may contain the only potential homolog of an ixodid tick molecule previously shown to modulate host immune responses. It exhibits 47% sequence similarity with an immunosuppressant protein isolated from Dermacentor andersoni (Bergman et al., 2000) (Fig. 2). This similarity was revealed by analysing the blastx data at a less stringent E-value threshold of 10⁻¹⁵. TC183 does not carry this provisional identification in the gene index. This highlights one of the problems of automated pipelines for annotating DNA sequences, namely their inability to detect weaker sequence similarities that may still be significant. Other examples are TC147 and TC148, which exhibit sequence similarity with the saliva proteins HL34 and HL35 of Haemaphysalis longicornis (Tsuda et al., 2001).
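As the TC183 example shows, the E-value cutoff decides whether a weak similarity becomes a provisional annotation. The sketch below uses invented best-hit E-values, which are not AvGI results, and a hypothetical `annotate` helper. It shows a hit that is accepted at 10⁻¹⁵ but rejected at the 10⁻²⁵ threshold used for automated annotation. The last line recomputes the 39% annotation rate from the reported 819 matches among 2,109 non-redundant sequences.

```python
# Invented best-hit descriptions and E-values, for illustration only.
BEST_HITS = {
    "seqA": ("strong match to a conserved enzyme", 1e-80),
    "seqB": ("moderate match", 3e-30),
    "seqC": ("weaker but possibly meaningful match", 2e-18),
    "seqD": ("marginal match", 0.5),
}


def annotate(hits, max_evalue):
    """Keep the provisional description only for hits with E-value <= max_evalue."""
    return {seq: desc for seq, (desc, evalue) in hits.items() if evalue <= max_evalue}


strict = annotate(BEST_HITS, 1e-25)    # cutoff used for automated annotation
relaxed = annotate(BEST_HITS, 1e-15)   # less stringent cutoff, as used for TC183
print(sorted(strict))                           # ['seqA', 'seqB']
print(sorted(relaxed.keys() - strict.keys()))   # ['seqC']

# Reported annotation rate: 819 matches among 2,109 non-redundant sequences
print(f"{819 / 2109:.0%}")                      # 39%
```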

4. Discussion

We describe the creation of a gene index, AvGI, from about 1.2 × 10⁶ bases of EST data derived from mRNA expressed in the salivary glands of female A. variegatum ticks 5 days after the initiation of feeding. Besides providing basic information on the genes expressed in tick salivary glands, we are interested in defining genes that encode components secreted in saliva, as these play a critical role in the biology of tick feeding (Sauer et al., 2000) and in pathogen transmission (Wikel, 1999; Nuttall et al., 2000). A total of 2,109 non-redundant sequences were generated and annotated using a pipeline of automated scripts to provide an electronic resource of cDNA data and provisional gene identifications. Most sequences were relatively G+C rich, but a few were A+T rich. The latter were found to encode mitochondrial genes, suggesting a marked difference in codon usage between the mitochondrial and nuclear protein-encoding genes of A. variegatum. This is supported by the high A+T composition of the mitochondrial genomes of two ixodid ticks, Rhipicephalus sanguineus and Ixodes hexagonus (Black and Roehrdanz, 1998). This dataset doubles the genome information available for tick species and represents a major increase over the nine GenBank accessions currently derived from A. variegatum. A tentative gene assignment was given to 39% of the non-redundant A. variegatum sequences, suggesting that most sequences represent hitherto undescribed genes, probably including some that encode novel secreted proteins. Proteins that access the secretory pathway can be identified bioinformatically by searching for a signal peptide at the N-terminus of conceptual translations of cDNA sequences (Nielsen et al., 1997). However, as we cannot verify that the current sequence data are derived from full-length cDNAs, we have not systematically searched the conceptual translations for signal peptides as a means of defining proteins that might access a secretory pathway.
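The G+C contrast noted above (a dataset average of 52%, against only 19% for the mitochondrial 16S rRNA consensus TC149) rests on a simple calculation. The sketch below computes G+C content over unambiguous bases and flags A+T-rich sequences. The 35% cutoff, the helper names and the demo sequences are arbitrary assumptions for illustration; the study did not use them.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among A, C, G and T; ambiguous bases such as N are ignored."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("no unambiguous bases")
    return sum(b in "GC" for b in counted) / len(counted)


def flag_at_rich(seqs: dict, cutoff: float = 0.35) -> list:
    """Return IDs whose G+C fraction falls below a (hypothetical) cutoff."""
    return sorted(name for name, s in seqs.items() if gc_content(s) < cutoff)


# Invented demo sequences
demo = {
    "nuclear_like": "ATGGCCGCTGGCGGCAGCTACGGA",
    "mito_like": "ATTTAAATATTATTTAAAGTTATA",
}
for name, s in demo.items():
    print(name, round(gc_content(s), 2))
print(flag_at_rich(demo))  # ['mito_like']
```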


Ixodid salivary glands undergo a large growth phase after the initiation of feeding, increasing in mass about 25-fold (reviewed in Sauer et al., 2000). Although there is little change in the total number of salivary gland cells, there is a large increase in mitochondrial number and cellular activity. The A. variegatum salivary gland EST data exhibited an overall sequence redundancy of 59%, with individual consensus sequences assembled from between two and 161 ESTs. Some genes in the salivary glands therefore appear to be highly transcribed, probably reflecting the high degree of functional specialisation of this organ during tick feeding. A number of cDNAs encoding proteins known to be involved in biosynthetic and catabolic pathways were identified, together with proteins that have been associated with specific tick salivary gland functions (see Section 3). However, cDNAs for some previously described components of Amblyomma salivary glands were not found. Examples include cDNAs encoding an A. variegatum histamine–serotonin binding protein (Nuttall and Paesen, 1999) and an A. americanum macrophage migration inhibitory factor (Jaworski et al., 2001). It is likely that additional sequencing of the current cDNA library will continue to be informative, and that the point at which EST redundancy limits the acquisition of novel sequences has not yet been reached. A few A. variegatum cDNA sequences that might influence vertebrate host immune, inflammatory and haemostatic responses were identified. TC183 (Fig. 2) may encode a protein similar to a 36 kDa protein from the salivary glands of feeding female D. andersoni, which has been shown to suppress murine T cell proliferative responses to concanavalin A in vitro (Bergman et al., 2000). Systemic suppression of bovine immune responses by infestation with adult A. variegatum ticks is associated with aggravation of dermatophilosis, an exudative dermatitis (Lloyd and Walker, 1993; Ambrose et al., 1999). It will therefore be of interest to determine the expression profile of TC183 in immature tick stages. TC107, BM28971, BM290207 and BM29275 appear to encode serpins, which are inhibitors of serine proteases and could therefore play a role in preventing haemostasis. Tick serpins have recently been proposed as potential vaccine candidates (Mulenga et al., 2001).

Fig. 1. The conceptual translation of each consensus sequence is shown in the single-letter amino acid code, and sequence identity between them is marked by an asterisk. TC5 was assembled from both 5′ and 3′ EST data; it has a 5′ untranslated region and is thus likely to encode a full-length protein. TC449 was assembled from 5′ ESTs only. The underlined sequence in TC5 marks the position of the predicted signal sequence, and shaded regions mark the positions of predicted transmembrane domains.

Fig. 2. The conceptual translation of consensus sequence TC183 aligned using BLAST with protein p36, an immunosuppressant protein of D. andersoni (Bergman et al., 2000), shown in the single-letter amino acid code; + marks the positions of conservative amino acid substitutions.
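Percent similarity values such as the 47% reported for TC183 against D. andersoni p36 come from a pairwise alignment. Identities count matching residues, and similarity (BLAST's ‘positives’) also counts conservative substitutions, which BLAST marks with ‘+’ on the line between the two aligned sequences. The sketch below computes both values from such a midline. The alignment is invented and the helper is hypothetical. Dividing by the full alignment length, including gaps, is assumed here to match BLAST's reporting convention.

```python
def identity_and_similarity(midline: str):
    """Return (% identity, % similarity) from a BLAST-style alignment midline.

    The midline holds the residue letter where the two sequences are identical,
    '+' for a conservative (positive-scoring) substitution, and a space for a
    mismatch or gap. Percentages are taken over the full alignment length.
    """
    length = len(midline)
    identical = sum(ch.isalpha() for ch in midline)
    positives = identical + midline.count("+")
    return 100 * identical / length, 100 * positives / length


# Invented alignment for illustration only
query   = "MKGLLVAG-SSW"
midline = "M+GL+V+G SS+"
subject = "MRGLIVSGASSF"
assert len(query) == len(midline) == len(subject)

ident, sim = identity_and_similarity(midline)
print(f"identity {ident:.0f}%, similarity {sim:.0f}%")  # identity 58%, similarity 92%
```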
TC317 is intriguing because it encodes full-length calreticulin. Calreticulin is a soluble protein usually found in the lumen of the endoplasmic reticulum and associated with chaperone function (Johnson et al., 2001), but it has been found to be an immunogenic component of A. americanum and Dermacentor variabilis saliva (Jaworski et al., 1995; Sanders et al., 1998). Antibody epitopes are shared between A. americanum and rabbit calreticulin, and it has been proposed that the tick protein enhances feeding by suppressing host immune or haemostatic responses (Jaworski et al., 1995). Calreticulin has also been found in human lymphocytic secretory granules (Andrin et al., 1998), suggesting a regulated mechanism by which tick calreticulin may be released in saliva. However, it has already been noted that A. americanum calreticulin, like A. variegatum calreticulin (this study), terminates in the sequence HEEL, which may be retrieved less efficiently by the KDEL receptor-mediated retrieval system (Jaworski et al., 1995). In humans, calreticulin is released from stressed cells and is an immunodominant antigen associated with certain autoimmune diseases (Eggleton and Llewellyn, 1999). Calreticulin can bind to C1q (Kishore et al., 1997; Stuart et al., 1997), the first subcomponent of the C1 complex of complement. Tick calreticulins exhibit greater than 70% sequence identity with human calreticulin, and it is tempting to speculate that they may interfere with the function of C1q and related collectins during tick feeding and also play a role in priming autoimmune responses. A striking feature of the A. variegatum sequence data is the high redundancy of ESTs coding for glycine-rich proteins (Table 3). Several major protein components of A. variegatum saliva have an estimated size of 32–42 kDa (Lloyd and Walker, 1995), and the abundant cDNAs identified in this study may encode some of these proteins. The amino acid composition of Boophilus microplus tick cement, which consists primarily, but not exclusively, of protein, is above average for glycine, serine, leucine and tyrosine (Kemp et al., 1982). A collagen-like, glycine-rich, extracellular matrix-like protein present in the salivary gland of Haemaphysalis longicornis has been described (Mulenga et al., 1999).
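The residue percentages in Table 3, and the comparison with cement composition above, are simple counts over a conceptual translation. The sketch below computes the percentages of G, S, P, L and Y and their fold difference from the Swiss-Prot averages quoted in the Table 3 footnote. The peptide is an invented glycine-rich repeat, not an AvGI sequence, and the helper names are hypothetical.

```python
from collections import Counter

# Average residue composition (%) of Swiss-Prot proteins, from the Table 3 footnote
SWISSPROT_AVG = {"G": 6.84, "S": 7.22, "P": 4.92, "L": 9.33, "Y": 3.19}


def composition(protein: str) -> dict:
    """Percentage of G, S, P, L and Y in a protein sequence (one-letter code)."""
    protein = protein.upper()
    counts = Counter(protein)
    return {aa: 100 * counts[aa] / len(protein) for aa in SWISSPROT_AVG}


def enrichment(protein: str) -> dict:
    """Fold difference of each residue's percentage from the Swiss-Prot average."""
    return {aa: pct / SWISSPROT_AVG[aa] for aa, pct in composition(protein).items()}


peptide = "GGYGGSGLGGSPGGYG" * 5   # invented glycine-rich repeat
comp = composition(peptide)
print(comp)  # {'G': 62.5, 'S': 12.5, 'P': 6.25, 'L': 6.25, 'Y': 12.5}
print({aa: round(f, 1) for aa, f in enrichment(peptide).items()})
```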
Several glycine-rich proteins have been described from the salivary glands of Ixodes ricinus (Godfroid et al., 2000) and Rhipicephalus appendiculatus (Nuttall and Paesen, 1999). We have recently characterised the gene encoding RIM36, a 36 kDa glycine-rich cement protein of R. appendiculatus that is expressed in secretory granules of e cells of type III salivary gland acini (Bishop et al., 2002). There appear to be at least 11 distinct glycine-rich protein families encoded in the A. variegatum salivary gland (data not shown), three of which are depicted in Fig. 1. The sequence alignments reveal polymorphism among members of each protein family, including deletions of stretches of amino acid residues, an observation also made during the analysis of RIM36 variants in R. appendiculatus (Bishop et al., 2002). Notably, TC2 and TC449, which appear to be deletion variants of the sequences shown in Fig. 1, were relatively less abundant: each consensus was assembled from only two ESTs. TC5 appears to encode a 38 kDa protein with five predicted transmembrane domains (Krogh et al., 2001) and an N-terminal signal anchor sequence (Nielsen et al., 1997), suggesting that it is an integral membrane protein. All the ORFs in Fig. 1a and b have a potential translation initiation codon close to their start, so these TCs may also encode full-length proteins similar in size to that encoded by TC5.

The TIGR gene index system (Quackenbush et al., 2001) has enabled rapid organisation of the A. variegatum EST data into a structured database that can be accessed and queried by the scientific community. The AvGI gene index provides a catalogue of genes expressed in the tick salivary glands, together with provisional functional assignments, that can serve as a starting point for studying the role of tick proteins in salivary gland physiology, suppression of vertebrate immune responses or modulation of inflammatory responses. Some of these sequences may also prove useful in developing novel anti-tick chemicals and molecular markers for differentiating tick populations.
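The ORF reasoning for TC5 and the Fig. 1 TCs earlier in this section (an initiation codon near the start of the ORF implying a full-length product of similar size) can be checked mechanically on a consensus sequence. The Python sketch below is illustrative only: it scans the three forward frames of a TC for the longest ATG-initiated ORF that ends in a stop codon of the standard genetic code, and converts its length to an approximate mass using a rule-of-thumb average of about 110 Da per residue. Scanning only the forward strand assumes directionally cloned cDNA, ORFs that run off the end of a truncated consensus are ignored, and the function names are hypothetical.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Rule-of-thumb average mass of an amino acid residue (assumption, ~110 Da).
AVG_RESIDUE_MASS_DA = 110.0


def longest_orf(seq: str):
    """Return (frame, atg_start, stop_end, n_codons) for the longest
    ATG...stop ORF on the forward strand, or None if there is none.

    n_codons counts sense codons including the initiating ATG but not the
    stop codon, i.e. the length of the encoded protein in residues."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        atg = None  # position of the first ATG in the current open segment
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if atg is None and codon == "ATG":
                atg = i
            elif atg is not None and codon in STOP_CODONS:
                n_codons = (i - atg) // 3
                if best is None or n_codons > best[3]:
                    best = (frame, atg, i + 3, n_codons)
                atg = None  # look for the next ATG downstream of this stop
    return best


def approx_mass_kda(n_residues: int) -> float:
    """Approximate protein mass in kDa from its length in residues."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    toy = "CCATGAAAGGGTAACC"  # toy sequence: ATG AAA GGG TAA in frame 2
    print(longest_orf(toy))      # (2, 2, 14, 3)
    print(approx_mass_kda(345))  # 37.95
```

By this rough estimate, a 38 kDa product such as that predicted for TC5 corresponds to an ORF of roughly 345 codons; dedicated tools for transmembrane and signal-sequence prediction (Krogh et al., 2001; Nielsen et al., 1997) are still needed for the topology claims.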
The biochemical processes involved in cement hardening and dissolution are currently unknown, and members of the glycine-rich protein families identified here may be important components of tick cement. Some of the proteins identified in this study are also of interest as potential vaccine candidates. Experimental vaccination with salivary gland molecules has been shown to interfere with both tick feeding (Willadsen, 2001; Mulenga et al., 1999) and pathogen transmission (Wikel, 1999; Nuttall et al., 2000), and may therefore contribute to the future control of ticks and tick-borne diseases.

Note: Nucleotide sequence data reported in this paper are available in the GenBank database under the accession numbers BM289399 to BM293390.

Acknowledgements
We would like to thank the ILRI tick unit for provision of material, the TIGR seqcore facility for processing the cDNAs, Jennifer Tsai for submission of EST data and Geo Pertea for helpful discussions. EST data on the salivary glands of Amblyomma americanum were provided by the A. americanum cDNA Sequencing Project (Bruce A. Roe, Doris Kupfer, Sara Downard, Laura Hern, Majed Aljamali and Richard Essenberg). This work was funded by a grant from USAID. This is ILRI publication number 200237.

References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W.J., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–10.
Ambrose, N., Lloyd, D., Maillard, J.C., 1999. Immune responses to Dermatophilus congolensis infections. Parasitol. Today 15, 295–300.
Andrin, C., Pinkoski, M.J., Burns, K., Atkinson, E.A., Krahenbuhl, O., Hudig, D., Fraser, S.A., Winkler, U., Tschopp, J., Opas, M., Bleackley, R.C., Michalak, M., 1998. Interaction between a Ca2+-binding protein calreticulin and perforin, a component of the cytotoxic T-cell granules. Biochemistry 37, 10386–94.
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G., 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29.
Baserga, S.J., Linnenbach, A.J., Malcolm, S., Ghosh, P., Malcolm, A.D., Takeshita, K., Forget, B.G., Benz, E.J.J., 1985. Polyadenylation of a human mitochondrial ribosomal RNA transcript detected by molecular cloning. Gene 35, 305–12.
Bergman, D.K., Palmer, M.J., Caimano, M.J., Radolf, J.D., Wikel, S.K., 2000. Isolation and molecular cloning of a secreted immunosuppressant protein from Dermacentor andersoni salivary gland. J. Parasitol. 86, 516–25.
Bishop, R., Lambson, B., Wells, C., Pandit, P., Osaso, J., Nkonge, C., Morzaria, S.P., Musoke, A., Nene, V., 2002. A cement protein of the tick Rhipicephalus appendiculatus, located in the secretory e cell granules of the type III salivary gland acini, induces strong antibody responses in cattle. Int. J. Parasitol. (in press).
Black IV, W.C., Piesman, J., 1994. Phylogeny of hard- and soft-tick taxa (Acari: Ixodida) based on mitochondrial 16S rDNA sequences. Proc. Natl Acad. Sci. USA 91, 10034–8.
Black IV, W.C., Roehrdanz, R.L., 1998. Mitochondrial gene order is not conserved in arthropods: prostriate and metastriate tick mitochondrial genomes. Mol. Biol. Evol. 15, 1772–85.
Camus, E., Barre, N., 1990. Amblyomma variegatum and associated diseases in the Caribbean: strategies for control and eradication in Guadeloupe. Parassitologia 32, 185–93.
Das, S., Banerjee, G., DePonte, N.M., Kantor, F.S., Fikrig, E., 2001. Salp25D, an Ixodes scapularis antioxidant, is 1 of 14 immunodominant antigens in engorged tick salivary glands. J. Infect. Dis. 184, 1056–64.
Eggleton, P., Llewellyn, D.H., 1999. Pathophysiological roles of calreticulin in autoimmune disease. Scand. J. Immunol. 49, 466–73.
Gabant, P., Szpirer, C.Y., Couturier, M., Faelen, M., 1998. Direct selection cloning vectors adapted to the genetic analysis of gram-negative bacteria and their plasmids. Gene, 87–92.
Godfroid, E., Bollen, A., Leboulle, G., 2000. Identification and molecular characterisation of proteins expressed in the tick salivary gland. Patent number WO0077198.
Hill, C.A., Gutierrez, J.A., 2000. Analysis of the expressed genome of the lone star tick, Amblyomma americanum (Acari: Ixodidae) using an expressed sequence tag approach. Microb. Comp. Genomics 5, 89–101.
Jaworski, D.C., Simmen, F.A., Lamoreaux, W., Coons, L.B., Muller, M.T., Needham, G.R., 1995. A secreted calreticulin in ixodid tick saliva. J. Insect Physiol. 41, 369–75.
Jaworski, D.C., Jasinskas, A., Metz, C.N., Bucala, R., Barbour, A.G., 2001. Identification and characterization of a homologue of the pro-inflammatory cytokine macrophage migration inhibitory factor in the tick, Amblyomma americanum. Insect Mol. Biol. 10, 323–31.
Johnson, S., Michalak, M., Opas, M., Eggleton, P., 2001. The ins and outs of calreticulin: from the ER lumen to the extracellular space. Trends Cell Biol. 11, 122–9.
Kemp, D.H., Stone, B.F., Binnington, K.C., 1982. Tick attachment and feeding: role of the mouthparts, feeding apparatus, salivary gland secretions and the host response. In: Obenchain, F.D., Galun, R. (Eds.), Physiology of Ticks. Pergamon Press, New York, NY, pp. 119–68.
Kishore, U., Sontheimer, R.D., Sastry, K.N., Zaner, K.S., Zappi, E.G., Hughes, G.R., Khamashta, M.A., Strong, P., Reid, K.B., Eggleton, P., 1997. Release of calreticulin from neutrophils may alter C1q-mediated immune functions. Biochem. J. 322, 543–50.
Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L.L., 2001. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–80.
Lloyd, C.M., Walker, A.R., 1993. The systemic effect of adult and immature Amblyomma variegatum ticks on the pathogenesis of dermatophilosis. Rev. Elev. Med. Vet. Pays. Trop. 46, 313–6.
Lloyd, C.M., Walker, A.R., 1995. Salivary glands and saliva of Amblyomma variegatum ticks: comparison of immatures and adults in relation to the pathogenesis of dermatophilosis. Vet. Parasitol. 59, 59–67.
Luo, C., McSwain, J.L., Tucker, J.S., Sauer, J.R., Essenberg, R.C., 1997. Cloning and sequence of a gene for the homologue of the stearoyl CoA desaturase from salivary glands of the tick Amblyomma americanum. Insect Mol. Biol. 6, 267–71.
McCosker, P.J., 1979. Global aspects of the management and control of ticks of veterinary importance. In: Rodriguez, J. (Ed.), Recent Advances in Acarology, vol. 2. Academic Press, London, pp. 45–53.
Mulenga, A., Sugimoto, C., Sako, Y., Ohashi, K., Musoke, A., Shubash, M., Onuma, M., 1999. Molecular characterization of a Haemaphysalis longicornis tick salivary gland-associated 29 kDa protein and its effect as a vaccine against tick infestation in rabbits. Infect. Immun. 67, 1652–8.
Mulenga, A., Sugino, M., Nakajima, M., Sugimoto, C., Onuma, M., 2001. Tick-encoded serine proteinase inhibitors (serpins); potential target antigens for tick vaccine development. J. Vet. Med. Sci. 63, 1063–9.
Nielsen, H., Engelbrecht, J., Brunak, S., von Heijne, G., 1997. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10, 1–6.
Nuttall, P.A., Paesen, G.C., 1999. Tissue cement protein from Rhipicephalus appendiculatus. Patent number WO9924567.
Nuttall, P.A., Paesen, G.C., Lawrie, C.H., Wang, H., 2000. Vector–host interactions in disease transmission. J. Mol. Microbiol. Biotechnol. 2, 381–6.
Pegram, R.G., Hansen, J.W., Wilson, D.D., 2000. Eradication and surveillance of the tropical bont tick in the Caribbean. An international approach. Ann. N. Y. Acad. Sci. 916, 179–85.
Quackenbush, J., Cho, J., Lee, D., Liang, F., Holt, I., Karamycheva, S., Parvizi, B., Pertea, G., Sultana, R., White, J., 2001. The TIGR gene indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res. 29, 159–64.
Rotz, L.D., Khan, A.S., Lillibridge, S.R., Ostroff, S.M., Hughes, J.M., 2002. Public health assessment of potential biological terrorism agents. Emerg. Infect. Dis. 8, 225–30.
Sanders, M.L., Jaworski, D.C., Sanchez, J.L., DeFraites, R.F., Glass, G.E., Scott, A.L., Raha, S., Ritchie, B.C., Needham, G.R., Schwartz, B.S., 1998. Antibody to a cDNA-derived calreticulin protein from Amblyomma americanum as a biomarker of tick exposure in humans. Am. J. Trop. Med. Hyg. 59, 279–85.
Sauer, J.R., McSwain, J.L., Bowman, A.S., Essenberg, R.C., 1995. Tick salivary gland physiology. Annu. Rev. Entomol. 40, 245–67.
Sauer, J.R., Essenberg, R.C., Bowman, A.S., 2000. Salivary glands in ixodid ticks: control and mechanism of secretion. J. Insect Physiol. 46, 1069–78.
Stuart, G.R., Lynch, N.J., Day, A.J., Schwaeble, W.J., Sim, R.B., 1997. The C1q and collectin binding site within C1q receptor (cell surface calreticulin). Immunopharmacology 38, 73–80.
Tsuda, A., Mulenga, A., Sugimoto, C., Nakajima, M., Ohashi, K., Onuma, M., 2001. cDNA cloning, characterization and vaccine effect analysis of Haemaphysalis longicornis tick saliva proteins. Vaccine 19, 4287–96.
Uilenberg, G., 1983. Recent advances in the study of the vector role of ticks of the genus Amblyomma (Ixodidae). Rev. Elev. Med. Vet. Pays. Trop. 36, 61–66.
Wikel, S.K., Ramachandra, R.N., Bergman, D.K., 1994. Tick-induced modulation of the host immune response. Int. J. Parasitol. 24, 59–66.
Wikel, S.K., 1999. Tick modulation of immunity: an important factor in pathogen transmission. Int. J. Parasitol. 29, 851–9.
Wikel, S.K., Alarcon-Chaidez, F.J., 2001. Progress towards molecular characterization of ectoparasite modulation of host immunity. Vet. Parasitol. 101, 275–87.
Willadsen, P., 2001. The molecular revolution in the development of vaccines against ectoparasites. Vet. Parasitol. 101, 353–67.
Xie, W.Q., Rothblum, L.I., 1991. Rapid, small scale RNA isolation from tissue culture cells. Biotechniques 11, 326–7.
Zhang, Z., Schwartz, S., Wagner, L., Miller, W., 2000. A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7, 203–14.