Toxicon 52 (2008) 794–806
Transcriptome analysis revealed novel possible venom components and cellular processes of the tarantula Chilobrachys jingzhao venom gland
Jinjun Chen a, Liqun Zhao a, Liping Jiang a,b, Er Meng a, Yongqun Zhang a, Xia Xiong a, Songping Liang a,*
a The Key Laboratory of Protein Chemistry and Developmental Biology of Ministry of Education, College of Life Sciences, Hunan Normal University, Changsha 410081, PR China
b Xiangya School of Medicine, Central South University, PR China
Article info
Article history: Received 12 May 2008; received in revised form 3 August 2008; accepted 12 August 2008; available online 19 August 2008
Keywords: Spider; Annotation; Gene ontology (GO); Eukaryotic orthologous groups (KOG)
Abstract
The tarantula Chilobrachys jingzhao is one of the most venomous spiders. It has a specialized organ, the venom gland, which synthesizes and secretes complex and abundant toxin proteins. The components of the venom have been extensively studied, but little is known about the molecular mechanisms of toxin secretion and metabolism. To obtain a comprehensive view of the function of the venom gland, we constructed a non-normalized cDNA library of the gland and generated 788 expressed sequence tags (ESTs). All ESTs were assembled into 356 non-redundant sequences comprising 85 clusters and 271 singlets. Of the total unique sequences, 31.4% are secretion protein coding sequences, including cystine knot toxins (29.1%) and other secretion proteins (2.3%); 54.0% are similar to common cellular transcripts; and 14.6% have no significant similarity to any known sequences. Annotation of these ESTs revealed some novel possible venom components and cellular processes important for venom gland function, including protein posttranslational processing, cell motility, protein synthesis and energy supply. This study presents a transcriptome analysis of a spider venom gland focusing on its cellular structural and functional aspects, and confirms the highly specialized nature of the spider venom gland as a toxin producer. © 2008 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel./fax: +86 731 8861304. E-mail address: [email protected] (S. Liang).
0041-0101/$ – see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.toxicon.2008.08.003
1. Introduction
A large range of poisonous animals have been extensively studied because their venoms are a potential source of pharmacological agents and physiological tools. Tarantulas, which comprise more than 860 species, are, like all other spiders, predators that feed on a variety of vertebrate and invertebrate prey. Tarantulas possess a variety of toxins that target cell receptors and ion channels in the nervous system (King, 2004; Escoubas, 2006). The molecular diversity of spider toxins has been reviewed (Escoubas, 2006) and appears to be based on a limited set of structural scaffolds such as ICK (inhibitor cystine knot) and DDH (disulfide-directed β-hairpin). Most studies have focused almost exclusively on small, compact molecules cross-linked by 3–5 disulfide bonds, with molecular masses ranging from 3.5 to 7 kDa. However, very few high molecular mass proteins have been obtained and identified by biochemical means, for example by liquid chromatographic separation of crude venom (Escoubas et al., 2006). Furthermore, fundamental questions remain: how have spider venom glands been able to generate such diversity, and what are the underlying genetic mechanisms? The tarantula Chilobrachys jingzhao is one of the most venomous spiders living in southern China, and many toxin components from its venom gland have been purified and characterized (Xiao et al., 2004, 2005; Liao et al., 2006; Yuan et al., 2007a,b; Zeng et al., 2007). However, there is
J. Chen et al. / Toxicon 52 (2008) 794–806
little information about the cellular processes that normally take place inside the glands to produce the venom mixture. Analysis of expressed sequence tags is an efficient approach for gene discovery, expression profiling (Adams et al., 1991; Takasuga et al., 2001) and functional genomics studies. Sequence annotation provides the basis for inferring phylogenetic relationships and for functional studies (Schwartz et al., 2007). Transcriptome analysis of the venom gland should therefore give clues for further research on this “pharmaceutical factory”. In the present work we randomly generated and analyzed 788 ESTs from a cDNA library of the venom gland of C. jingzhao. After clustering the resulting dataset, we identified transcripts possibly associated with different cellular functions. The possible roles of some of the transcripts are discussed, although many have unknown functions. Gene cataloguing and profiling of the venom gland of C. jingzhao offers a potential resource to guide future investigations relevant to venom gland biology and toxicology. Moreover, it will allow the identification of cellular functional transcripts that may give a general view of the physiological and biochemical processes of this highly specialized secretory tissue.
2. Materials and methods
2.1. cDNA library construction and EST sequencing
Specimens of the tarantula C. jingzhao were collected in Hainan Province, China. Total RNA was prepared as described previously (Diao et al., 2003). The venom glands of eight female spiders of the same age and from the same region were harvested and homogenized in liquid nitrogen, followed by cell lysis in the presence of TRIzol reagent (Invitrogen). Poly(A)+ RNA was purified from the total RNA on an oligo(dT)-cellulose affinity column using the mRNA Purification Kit (Promega) according to the manufacturer’s protocol.
The full-length cDNA library was constructed according to the instructions provided with the Creator™ SMART™ cDNA Library Construction Kit (Clontech). Randomly selected clones were grown in LB medium containing chloramphenicol (30 μg/mL) in 96-well plates for 16 h. Plasmids were extracted by alkaline lysis and single-pass sequenced from the 5′-end on an automated ABI PRISM 3700 sequencer (Perkin Elmer), using the T7 promoter primer and the ABI PRISM® BigDye™ terminator v3.1 ready reaction cycle sequencing kit (Applied Biosystems).
2.2. Data processing and bioinformatics analysis
Prior to further analysis, cDNA sequencing outputs were trimmed by removal of vector sequences, primer sequences and poly(A) tails with ABI PRISM® DNA Sequencing Analysis Software v3.3 (Chou and Holmes, 2001). High-quality ESTs were assembled into contigs using Phrap. Default settings were used except for a 40 bp minimum overlap and 99% identity. To provide more useful sequence annotations for comparative studies, we chose to identify the eukaryotic orthologous groups (KOG) of each sequence using a web server (http://oxytricha.princeton.edu/BlastO/) (Tatusov et al., 2003; Zhou and Landweber, 2007). Furthermore, to improve sequence annotation and to gain insight into the cellular processes in which each sequence could be involved, the ESTs were submitted to the GOblet (Groth et al., 2004) and AmiGO (Ashburner et al., 2000) web servers in order to identify relevant Gene Ontology (GO) terms. Clusters without significant matches to known orthologues were then searched against public databases (NCBI, Swiss-Prot + TrEMBL/EMBL) using the BLASTx or BLASTn programs to identify putative functions of the new expressed sequence tags (ESTs). Signal peptides were predicted with the SignalP 3.0 program (http://www.cbs.dtu.dk/services/SignalP/). Sequence similarity was computed with the EMBOSS program (http://www.ebi.ac.uk/Tools/emboss/align/).
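To make the two pre-processing criteria in Section 2.2 concrete, the following is a minimal Python sketch of poly(A) tail trimming and of an overlap test using the stated thresholds (40 bp minimum overlap, 99% identity). It is an illustration only and does not reproduce Phrap or the ABI software: the function names, the `min_run` cut-off for a poly(A) tail and the ungapped suffix/prefix comparison are all assumptions introduced here.

```python
import re

MIN_OVERLAP = 40     # bp, threshold stated in Section 2.2
MIN_IDENTITY = 0.99  # fraction, threshold stated in Section 2.2


def trim_polya(seq, min_run=10):
    """Strip a trailing run of at least min_run A's (min_run is an assumed cut-off)."""
    return re.sub(r"A{%d,}$" % min_run, "", seq.upper())


def suffix_prefix_overlap(a, b, min_overlap=MIN_OVERLAP, min_identity=MIN_IDENTITY):
    """Longest ungapped overlap between the 3' end of a and the 5' end of b.

    Returns the overlap length if it is at least min_overlap bases long and
    the fraction of identical positions is at least min_identity, else 0.
    Real assemblers use gapped alignment; this simplification is hypothetical.
    """
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        tail, head = a[-length:], b[:length]
        matches = sum(x == y for x, y in zip(tail, head))
        if matches / length >= min_identity:
            return length
    return 0


if __name__ == "__main__":
    core = "ACGTTGCAAC" * 5                      # 50 bp shared region
    read_a = "G" * 30 + core
    read_b = core + "T" * 30
    print(trim_polya("ACGTACGT" + "A" * 15))     # poly(A) tail removed
    print(suffix_prefix_overlap(read_a, read_b))  # 50 bp overlap passes both thresholds
```

Two reads would be merged into one contig under this toy criterion only when the function returns a non-zero length; an overlap shorter than 40 bp is rejected regardless of its identity.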
3. Results
3.1. cDNA library and EST analysis
The constructed C. jingzhao venom gland library was non-normalized, and the titer was 2.0 × 10^6 cfu/mL with 99% recombinant clones. After discarding poor-quality sequences, 788 high-quality ESTs were used to investigate the gene expression profile of the C. jingzhao venom glands. The mean read length of the ESTs was 494 nucleotides (range 140–1137 nucleotides). The initial sequences were grouped into 356 non-redundant sequences, of which 85 were clusters containing more than one EST and 271 were singlets. The identified sequences were sorted into three main groups: (1) putative precursors including a signal peptide, accounting for 31.4% of total unique sequences (58 clusters from 358 ESTs, and 54 singlets). Among them, 104 cystine knot toxins (CKTs) have been reported elsewhere (Chen et al., 2008). The cDNA sequences of the CKTs are publicly available in the animal toxin database (http://protchem.hunnu.edu.cn/toxin) (He et al., 2008) and in the GenBank database of NCBI (http://www.ncbi.nlm.nih.gov/entrez, accession numbers: EU233831–EU233934). The molecular diversity, classification and characteristics of the toxins with regard to modification, structure and evolution were explored and discussed in the previous paper (Chen et al., 2008); (2) putative proteins similar to gene products implicated in common cellular processes, representing 54.0% of total unique sequences (27 contigs from 147 ESTs, and 165 singlets). These ESTs and eight non-CKT secretion protein genes are the main subject of this report; and (3) sequences with no significant similarity to any known sequences, accounting for 14.6% of total unique sequences (Fig. 1). All sequence data reported in this paper have been submitted to GenBank (accession nos. FE530957–FE531203, EU233924, EU233927–EU233929, EU233931–EU233934). The abundance distribution of all ESTs (Fig. 2) was cataloged as follows:
(1) Four clusters contained more than 20 ESTs each. These most abundant transcripts constituted 1.12% of the total clusters (4 of 356 clusters) and 19.16% of the total ESTs (151 of 788 ESTs). Of these four clusters, two were common cellular protein genes and two were toxin genes.
(2) Six clusters consisted of 11–19 ESTs each and represented 1.69% of the total clusters (6 of 356 clusters) and 11.68% of the total ESTs (92 of 788 ESTs). All six clusters encoded toxin proteins and were the second most abundant mRNA transcripts in the venom gland of C. jingzhao.
(3) Twelve clusters contained 6–10 ESTs each and represented 3.37% of the total clusters (12 of 356 clusters) and 10.53% of the total ESTs (83 of 788 ESTs), all of which encoded toxin proteins.
(4) The 63 low-abundance clusters, each with 2–5 ESTs, constituted 17.70% of the total clusters (63 of 356 clusters) and 22.97% of the total ESTs (181 of 788 ESTs). They included 36 toxin coding genes, 2 other secretion protein coding genes, and 25 common cellular functional transcripts and unknown protein coding genes.
(5) There were 271 unique sequences (singlets), representing 76.12% of the total clusters (271 of 356 clusters) and 34.39% of the total ESTs (271 of 788 ESTs). Each of these sequences occurred only once among the sequenced ESTs. They included 48 toxin coding genes, 6 other secretion protein coding genes, and 217 cellular functional transcripts and unmatched sequences.
Fig. 1. Relative proportion of each category of the unique sequences from the C. jingzhao venom gland library. “Cystine knot toxins” include ESTs coding for small, compact molecules cross-linked by disulfide bonds. “GO-sorted” includes transcripts classified by gene ontology function. “BLAST-matched” includes ESTs that have homologues found with the BLASTn or BLASTx programs but no significant matches to known orthologous genes. “Other secretion proteins” include precursors that are not typical spider toxins but have a putative signal peptide at the N-terminus. “No match” includes ESTs that did not match any currently known sequences.
Fig. 2. Prevalence distribution of the cluster size. The initial 788 ESTs were grouped into 356 clusters, of which 4 clusters (secretion protein: 2 clusters/54 ESTs; common cellular protein: 2 clusters/97 ESTs) comprised more than 20 ESTs each; 6 clusters contained 11–19 ESTs each (secretion protein: 2 clusters/92 ESTs); 12 clusters comprised 6–10 ESTs each (secretion protein: 12 clusters/83 ESTs); 63 clusters comprised 2–5 ESTs each (secretion protein: 38 clusters/129 ESTs; common cellular protein: 25 clusters/62 ESTs) and 271 unique sequences (secretion protein: 54; common cellular protein: 217).
3.2. Novel secretion proteins
Eight putative proteins (named JZTX-75, JZTX-77, JZTX-78, JZTX-79, JZTX-81, JZTX-82, JZTX-83 and JZTX-84) were predicted to have a signal peptide at the N-terminus, which implies that they might be toxin precursors. However, they differ from typical spider polypeptide toxins (Escoubas et al., 2006) and are therefore referred to here as other secretion proteins. Among them, JZTX-75 is similar to cystatin precursors (Fig. 3A) and JZTX-84 is similar to some lectins with fibrinogen-related domains (FReDs) (Fig. 3B). The database matches for JZTX-81, JZTX-82 and JZTX-83 are unlikely to be meaningful because of their high e-values, i.e. low statistical significance (Table 1). Intriguingly, these three proteins are rich in cysteines. Moreover, JZTX-81 and JZTX-82 are rich in lysine and arginine residues (Fig. 3C). The other three precursors (JZTX-77, JZTX-78 and JZTX-79) had no significant matches to known genes. Although their cDNA sequences are reasonably long, their open reading frames (ORFs) are short (Table 1).
Fig. 3. Five novel spider secretion proteins in the C. jingzhao venom-gland cDNA library. The signal peptide sequences are underlined. A) The predicted JZTX-75 sequence is aligned with five cystatin precursors. B) The predicted JZTX-84 sequence is aligned with techylectin-5 precursors. Identical residues and strongly similar residues are highlighted in dark and grey, respectively. Gaps are introduced to maximize the sequence similarities. The similarity is annotated at the end of each sequence. C) Predicted amino acid sequences of JZTX-81, JZTX-82 and JZTX-83. Cysteine residues are highlighted in grey and basic residues of JZTX-81 and JZTX-82 are in light grey.
3.3. KOG annotation of the ESTs implicated in common cellular processes
We used the BLAST on Orthologous groups (BLASTO) web server, a modified BLAST tool that searches orthologous group data, to identify groups of sequences putatively orthologous to each query sequence. In total, 115 putative sequences corresponded to eukaryotic orthologous groups (KOG) of transcripts predicted to be involved in common cellular processes, such as transcription and translation, energy supply, cytoskeleton, cell mobility, metabolism, protein processing, redox-related enzymes, protein degradation and cell defense (Table 2). Notably, these putative gene products provide a source of new information about spider venom gland-specific genes. However, JZ0064 (KOG3965) and JZ0107 (KOG1293) showed identity to sequences that have already been described but not functionally characterized.
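The percentages in the cluster-size breakdown of Section 3.1 follow directly from the reported counts, and a short script makes the arithmetic easy to re-check. The sketch below uses only the numbers printed in the text (356 unique sequences, 788 ESTs, and the five size bins); the variable and function names are illustrative. One observation, offered with caution: the EST counts of the five bins as printed add up to 778 rather than 788. The Fig. 2 caption lists 129 + 62 = 191 ESTs for the 2–5 EST bin, which would account for the difference, but the source does not confirm which figure is intended.

```python
# (bin label, number of clusters, number of ESTs) as printed in Section 3.1
BINS = [
    (">20 ESTs", 4, 151),
    ("11-19 ESTs", 6, 92),
    ("6-10 ESTs", 12, 83),
    ("2-5 ESTs", 63, 181),
    ("singlets", 271, 271),
]
TOTAL_CLUSTERS = 356
TOTAL_ESTS = 788


def percentages(bins, total_clusters=TOTAL_CLUSTERS, total_ests=TOTAL_ESTS):
    """Return (label, % of clusters, % of ESTs) for each bin, rounded to 2 decimals."""
    return [
        (label,
         round(100 * n_clusters / total_clusters, 2),
         round(100 * n_ests / total_ests, 2))
        for label, n_clusters, n_ests in bins
    ]


if __name__ == "__main__":
    for label, pct_clusters, pct_ests in percentages(BINS):
        print(f"{label:>10}: {pct_clusters:6.2f}% of clusters, {pct_ests:6.2f}% of ESTs")
    print("clusters summed:", sum(c for _, c, _ in BINS))
    print("ESTs summed:    ", sum(e for _, _, e in BINS))
```

Each computed percentage matches the value given in the text for its bin; only the column total of ESTs differs from the stated 788.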
3.4. GO-sorted annotated sequences
All non-toxin genes were submitted to AmiGO, and 144 unique sequences (212 ESTs) were annotated in each of the three ontologies of GO: cellular component (CC), molecular function (MF) and biological process (BP). Within these ontologies the categories were: four types of components within CC, six different functions within MF and ten varied processes within BP (Fig. 4). The remaining sequences without GO annotation were searched against public databases (NCBI, Swiss-Prot + TrEMBL/EMBL) using the BLASTx or BLASTn programs to identify putative functions of the new expressed sequence tags. There were 48 unique sequences (100 ESTs) with homologues in the databases (Fig. 1).
Table 1 C. jingzhao venom-gland cDNA clusters of secretion proteins non-rich in cysteine
| Cluster no. | cDNA (bp) | Protein | Best match to NR protein database | e-Value |
| --- | --- | --- | --- | --- |
| JZTX-75 | 660 | 134aa | L-Cystatin precursor (Tachypleus tridentatus) | 7e-22 |
| JZTX-77 | 637 | 48aa | No match | NA |
| JZTX-78 | 841 | 57aa | No match | NA |
| JZTX-79 | 340 | 56aa | No match | NA |
| JZTX-81 | 776 | 161aa | Zinc finger protein 462 (Homo sapiens)* | 0.63 |
| JZTX-82 | 864 | 212aa | AT-rich interactive domain-containing protein 4B (Mus musculus)* | 0.44 |
| JZTX-83 | 820 | 208aa | Protein QN1 homologue (H. sapiens)* | 2.7 |
| JZTX-84 | 854 | 222aa | Techylectin-5A precursor (T. tridentatus) | 3e-42 |
The symbol ‘*’ marks best matches whose high e-values (low significance) indicate that no meaningful homologue was found in the NCBI Nr database.
4. Discussion
4.1. Combine transcriptome and peptidome to mine the complexity of spider venom
The cDNA library constructed is a non-amplified primary library, so the clone abundance or cluster size reflects the relative mRNA population of a given transcript (Liu and Graber, 2006). There are 104 identified cystine knot toxins (CKTs), 115 identified eukaryotic orthologous
Table 2 The identifications correspond to the eukaryotic orthologous groups (KOG) of the transcripts predicted to be involved in common cellular processes
| Cluster no. | GenBank no. | KOG | e-Value | Description |
| --- | --- | --- | --- | --- |
| Energy supply | | | | |
| JZ0142 | FE531098 | KOG0750 | 1e-18 | Mitochondrial solute carrier protein |
| JZ0001 | FE530957 | KOG3422 | 1e-08 | Mitochondrial ribosomal protein L16 |
| JZ0002 | FE530958 | KOG3441 | 4e-37 | Mitochondrial ribosomal protein L14 |
| JZ0003 | FE530959 | KOG1662 | 2e-51 | Mitochondrial F1F0-ATP synthase, subunit OSCP/ATP5 |
| JZ0080 | FE531036 | KOG4110 | 1e-12 | NADH:ubiquinone oxidoreductase, NDUFS5/15 kDa |
| JZ0095 | FE531051 | KOG4767 | 1e-48 | Cytochrome c oxidase, subunit II, and related proteins |
| JZ0097 | FE531053 | KOG4664 | 2e-41 | Cytochrome oxidase subunit III and related proteins |
| JZ0098 | FE531054 | KOG4769 | 2e-30 | Cytochrome c oxidase, subunit I |
| JZ0114 | FE531070 | KOG0749 | 5e-07 | Mitochondrial ADP/ATP carrier proteins |
| JZ0118 | FE531074 | KOG3440 | 9e-29 | Ubiquinol cytochrome c reductase, subunit QCR7 |
| JZ0131 | FE531087 | KOG0888 | 1e-38 | Nucleoside diphosphate kinase |
| JZ0172 | FE531128 | KOG3453 | 1e-17 | Cytochrome c |
| JZ0140 | FE531096 | KOG4663 | 7e-45 | Cytochrome b |
| JZ0065 | FE531021 | KOG4665 | 3e-16 | ATP synthase F0 subunit 6 and related proteins |
| JZ0190 | FE531146 | KOG0158 | 5e-24 | Cytochrome P450 CYP3/CYP5/CYP6/CYP9 subfamilies |
| Transcription and translation | | | | |
| JZ0152 | FE531108 | KOG2777 | 6e-07 | tRNA-specific adenosine deaminase 1 |
| JZ0060 | FE531016 | KOG2869 | 5e-38 | Meiotic cell division protein Pelota/DOM34 |
| JZ0007 | FE530963 | KOG0052 | 3e-27 | Translation elongation factor EF-1 alpha/Tu |
| JZ0010 | FE530966 | KOG0155 | 9e-73 | Transcription factor CA150 |
| JZ0012 | FE530968 | KOG2351 | 2e-37 | RNA polymerase II, fourth largest subunit |
| JZ0013 | FE530969 | KOG2039 | 1e-28 | Transcriptional coactivator p100 |
| JZ0014 | FE530970 | KOG3169 | 2e-41 | RNA polymerase II transcriptional regulation mediator |
| JZ0015 | FE530971 | KOG2240 | 5e-40 | RNA polymerase II general transcription factor BTF3 and related proteins |
| JZ0016 | FE530972 | KOG0328 | 4e-09 | Translation initiation factor 4F, helicase subunit (eIF-4A) and related helicases |
| JZ0017 | FE530973 | KOG1462 | 3e-08 | Translation initiation factor 2B, gamma subunit (eIF-2Bgamma/GCD1) |
| JZ0018 | FE530974 | KOG0857 | 3e-92 | 60S ribosomal protein L10 |
| JZ0019 | FE530975 | KOG0466 | e-102 | Translation initiation factor 2, gamma subunit (eIF-2gamma; GTPase) |
| JZ0020 | FE530976 | KOG1770 | 2e-45 | Translation initiation factor 1 (eIF-1/SUI1) |
| JZ0021 | FE530977 | KOG3291 | e-100 | Ribosomal protein S7 |
| JZ0070 | FE531026 | KOG3418 | 3e-50 | 60S ribosomal protein L27 |
| JZ0022 | FE530978 | KOG1751 | 5e-48 | 60S ribosomal protein L23 |
| JZ0023 | FE530979 | KOG1570 | 1e-87 | 60S ribosomal protein L10A |
| JZ0024 | FE530980 | KOG1646 | 1e-73 | 40S ribosomal protein S6 |
| JZ0025 | FE530981 | KOG0397 | 1e-65 | 60S ribosomal protein |
| JZ0027 | FE530983 | KOG0900 | 3e-34 | 40S ribosomal protein S20 |
| JZ0028 | FE530984 | KOG3311 | 9e-33 | Ribosomal protein S18 |
| JZ0029 | FE530985 | KOG0900 | 7e-35 | 40S ribosomal protein S20 |
| JZ0030 | FE530986 | KOG0900 | 2e-48 | 40S ribosomal protein S20 |
| JZ0031 | FE530987 | KOG3031 | 1e-22 | Protein required for biogenesis of the ribosomal 60S subunit |
| JZ0032 | FE530988 | KOG1754 | 3e-26 | 40S ribosomal protein S15/S22 |
| JZ0033 | FE530989 | KOG1678 | 2e-71 | 60S ribosomal protein L15 |
| JZ0035 | FE530991 | KOG1696 | 1e-66 | 60S ribosomal protein L19 |
| JZ0037 | FE530993 | KOG3412 | 3e-26 | 60S ribosomal protein L28 |
| JZ0038 | FE530994 | KOG0378 | 3e-77 | 40S ribosomal protein S4 |
| JZ0041 | FE530997 | KOG0857 | 2e-20 | 60S ribosomal protein L10 |
| JZ0042 | FE530998 | KOG3301 | 2e-88 | Ribosomal protein S4 |
| JZ0055 | FE531011 | KOG4211 | 3e-30 | Splicing factor hnRNP-F and related RNA-binding proteins |
| JZ0057 | FE531012 | KOG1456 | 1e-51 | Heterogeneous nuclear ribonucleoprotein L (contains RRM repeats) |
| JZ0128 | FE531084 | KOG0120 | 3e-37 | Splicing factor U2AF, large subunit (RRM superfamily) |
| JZ0170 | FE531126 | KOG4205 | 6e-51 | RNA-binding protein musashi/mRNA cleavage and polyadenylation factor I complex, subunit HRP1 |
| JZ0185 | FE531141 | KOG1365 | 4e-14 | RNA-binding protein Fusilli, contains RRM domain |
| JZ0102 | FE531058 | KOG3467 | 8e-40 | Histone H4 |
| JZ0063 | FE531019 | KOG3450 | 2e-33 | Huntingtin interacting protein HYPK |
| JZ0061 | FE531017 | KOG1775 | 2e-28 | U6 snRNA-associated Sm-like protein |
| Cytoskeleton and cell mobility | | | | |
| JZ0066 | FE531022 | KOG1376 | e-131 | Alpha tubulin |
| JZ0068 | FE531024 | KOG2512 | 1e-07 | Beta-tubulin folding cofactor C |
| JZ0067 | FE531023 | KOG1375 | 2e-23 | Beta tubulin |
| JZ0072 | FE531028 | KOG1727 | 2e-22 | Microtubule-binding protein (translationally controlled tumor protein) |
| JZ0073 | FE531029 | KOG0676 | 7e-48 | Actin and related proteins |
| JZ0074 | FE531030 | KOG3155 | 3e-68 | Actin-related protein Arp2/3 complex, subunit ARPC3 |
| JZ0075 | FE531031 | KOG1735 | 2e-07 | Actin depolymerizing factor |
| JZ0077 | FE531033 | KOG0676 | 1e-04 | Actin and related proteins |
| JZ0078 | FE531034 | KOG0676 | e-161 | Actin and related proteins |
| JZ0160 | FE531116 | KOG0161 | 2e-15 | Myosin class II heavy chain |
| JZ0056 | FE531012 | KOG2659 | 3e-30 | LisH motif-containing protein |
| JZ0192 | FE531148 | KOG2437 | 1e-68 | Muskelin |
| Lipid transport | | | | |
| JZ0044 | FE531000 | KOG3668 | 3e-37 | Phosphatidylinositol transfer protein |
| JZ0045 | FE531001 | KOG1471 | 2e-08 | Phosphatidylinositol transfer protein SEC14 and related proteins |
| JZ0122 | FE531078 | KOG4824 | 1e-05 | Apolipoprotein D/Lipocalin |
| JZ0155 | FE531111 | KOG4015 | 3e-11 | Fatty acid-binding protein FABP |
| Protein secretion and processing | | | | |
| JZ0090 | FE531046 | KOG1986 | 4e-71 | Vesicle coat complex COPII, subunit SEC23 |
| JZ0133 | FE531089 | KOG1631 | 2e-47 | Translocon-associated complex TRAP, alpha subunit |
| JZ0069 | FE531025 | KOG0357 | 7e-17 | Chaperonin complex component, TCP-1 epsilon subunit (CCT5) |
| JZ0011 | FE530967 | KOG4117 | 1e-20 | Heat shock factor binding protein |
| JZ0125 | FE531081 | KOG0101 | 8e-20 | Molecular chaperones HSP70/HSC70, HSP70 superfamily |
| JZ0126 | FE531082 | KOG0548 | 2e-77 | Molecular co-chaperone STI1 |
| JZ0184 | FE531140 | KOG0019 | 3e-33 | Molecular chaperone (HSP90 family) |
| JZ0109 | FE531065 | KOG3591 | 4e-35 | Alpha crystallins |
| JZ0111 | FE531067 | KOG0544 | 1e-32 | FKBP-type peptidyl-prolyl cis–trans isomerase |
| JZ0121 | FE531077 | KOG3498 | 2e-30 | Preprotein translocase, gamma subunit |
| JZ0178 | FE531134 | KOG2524 | 1e-50 | Cobyrinic acid a,c-diamide synthase |
| JZ0123 | FE531079 | KOG3627 | 1e-20 | Trypsin |
| Metabolism | | | | |
| JZ0005 | FE530961 | KOG2631 | 2e-71 | Class II aldolase/adducin N-terminal domain protein |
| JZ0006 | FE530962 | KOG1557 | e-116 | Fructose-bisphosphate aldolase |
| JZ0103 | FE531059 | KOG1175 | 4e-56 | Acyl-CoA synthetase |
| JZ0106 | FE531061 | KOG2517 | 1e-17 | Ribulose kinase and related carbohydrate kinases |
| JZ0141 | FE531097 | KOG2450 | 7e-45 | Aldehyde dehydrogenase |
| JZ0132 | FE531088 | KOG2772 | 1e-47 | Transaldolase |
| JZ0151 | FE531107 | KOG1584 | 2e-15 | Sulfotransferase |
| JZ0004 | FE530960 | KOG0622 | 3e-60 | Ornithine decarboxylase |
| JZ0059 | FE531015 | KOG4230 | 5e-29 | C1-tetrahydrofolate synthase |
| JZ0158 | FE531114 | KOG3367 | 3e-18 | Hypoxanthine-guanine phosphoribosyltransferase |
| JZ0099 | FE531055 | KOG2586 | 3e-41 | Pyridoxamine-phosphate oxidase |
| Protein degradation | | | | |
| JZ0052 | FE531008 | KOG2391 | 3e-27 | Vacuolar sorting protein/ubiquitin receptor VPS23 |
| JZ0039 | FE530995 | KOG0733 | 4e-15 | Nuclear AAA ATPase (VCP subfamily) |
| JZ0046 | FE531002 | KOG2930 | 4e-54 | SCF ubiquitin ligase, Rbx1 component |
| JZ0047 | FE531003 | KOG0420 | 2e-04 | Ubiquitin-protein ligase |
| JZ0048 | FE531004 | KOG0895 | 6e-62 | Ubiquitin-conjugating enzyme |
| JZ0050 | FE531006 | KOG0003 | 1e-33 | Ubiquitin/60S ribosomal protein L40 fusion |
| JZ0194 | FE531150 | KOG1542 | 5e-11 | Cysteine proteinase Cathepsin F |
| Redox-related enzymes | | | | |
| JZ0008 | FE530964 | KOG0867 | 6e-67 | Glutathione S-transferase |
| JZ0104 | FE531060 | KOG0852 | 2e-59 | Alkyl hydroperoxide reductase, thiol specific antioxidant and related enzymes |
| JZ0129 | FE531085 | KOG0441 | 4e-30 | Cu2+/Zn2+ superoxide dismutase SOD1 |
| Cell defense | | | | |
| JZ0091 | FE531047 | KOG2806 | 2e-04 | Chitinase |
| JZ0198 | FE531154 | KOG3017 | 1e-29 | Defense-related protein containing SCP domain |
| JZ0143 | FE531099 | KOG1746 | 3e-43 | Defender against cell death protein/oligosaccharyltransferase, epsilon subunit |
| JZ0105 | FE531061 | KOG2332 | 3e-04 | Ferritin |
| JZ0115 | FE531071 | KOG2332 | 2e-58 | Ferritin |
| JZTX-84 | EU233934 | KOG2579 | 2e-27 | Ficolin and related extracellular proteins |
| Cell regulation and others | | | | |
| JZ0093 | FE531049 | KOG0515 | 3e-26 | p53-interacting protein 53BP/ASPP, contains ankyrin and SH3 domains |
| JZ0085 | FE531041 | KOG4228 | 4e-47 | Protein tyrosine phosphatase |
| JZ0058 | FE531014 | KOG2269 | 3e-08 | Serine/threonine protein kinase |
| JZ0119 | FE531075 | KOG0699 | 4e-15 | Serine/threonine protein phosphatase |
| JZ0107 | FE531063 | KOG1293 | 1e-13 | Proteins containing armadillo/beta-catenin-like repeat |
| JZ0064 | FE531020 | KOG3965 | 2e-13 | Uncharacterized conserved protein |
groups and several novel secretion proteins. Compared with the proteomic and peptidomic analysis of C. jingzhao venom, in which 60 peptide toxins and 47 cellular proteins were identified (Liao et al., 2007), the transcriptome analysis revealed more information. However, only 34 peptide toxins were identified by both approaches (Table 3). This partial overlap is not surprising. On the one hand, the proteomic and peptidomic analysis used pooled venom of C. jingzhao, whereas the transcriptome study used the venom glands of eight female C. jingzhao of the same age and from the same region, and thus reflects the transcripts of that particular sample. On the other hand, some low-abundance toxins, especially the numerous toxin isoforms, are difficult to identify by Edman sequencing coupled with chromatographic separation. Another important issue is the maturation of toxin precursors, which needs to be characterized by combining transcriptome and peptidome data. A full-length CKT precursor obtained by transcriptome analysis usually contains a signal peptide, a propeptide and a mature
Fig. 4. Gene ontology-sorted sequence annotation. The non-toxin genes of C. jingzhao venom gland were submitted to AmiGO, and 144 unique sequences were annotated in each of the three ontologies of GO: cellular component (CC), molecular function (MF) and biological processes (BP).
peptide; moreover, some precursors also contain an additional tail region (Chen et al., 2008). Most of the precursors are proteolytically processed and modified during posttranslational processing, as has also been found for some other toxins, and this appears to be important for transport, biological activity and protection from degradation (Leisy et al., 1996). By combining these data with the previous peptidomic work, we determined the endoproteolytic sites and amidation of most of these precursors with high confidence (Chen et al., 2008). It has also been reported that MALDI-TOF mass spectrometry coupled with chromatographic separation, combined with the analysis of venom-gland cDNA libraries, makes it possible to mine the complexity of spider venom (Escoubas et al., 2006).
4.2. Novel types of possible venom components
Spider venoms are incredibly complex chemical cocktails. They comprise a heterogeneous mixture of inorganic salts, low molecular mass organic molecules, small polypeptides (typically 3–6 kDa with 3–5 disulfide bonds), and high molecular mass proteins (King, 2004). To date, the small polypeptides have been extensively studied. However, very few high molecular mass proteins have been obtained and identified by biochemical means. In the present cDNA library, transcripts of some novel possible high molecular mass proteins were identified, in addition to 104 CKTs and a considerable number of cellular proteins. Of the secretory non-CKT proteins, JZTX-75 and JZTX-84 show high similarity to the L-cystatin precursor (Swiss-Prot accession Q7M429) and the techylectin-5 precursors (Swiss-Prot accessions Q9U8W8 and Q9U8W7) from Tachypleus tridentatus, respectively. L-Cystatin plays an important role in the protection of cells, resistance against Gram-negative bacteria, and response to external stimuli (Agarwala et al., 1996), and was also detected in Loxosceles laeta (Fernandes-Pedrosa et al., 2008).
Techylectin-5A is involved in innate immunity, and agglutinates all types of human erythrocytes, Gram-positive and Gram-negative bacteria (Kairies et al., 2001).
Lectins are also present in the venoms of other poisonous animals (Ogawa et al., 2005). BLAST searches of JZTX-81, JZTX-82 and JZTX-83 against the non-redundant protein sequence database found no significant homologues. Remarkably, these proteins are rich in cysteines. Moreover, JZTX-81 and JZTX-82 are rich in lysine and arginine residues, which is a characteristic of antibacterial peptides. In addition, there were several 5′-incomplete ESTs with homology to genes involved in the cell defense system, such as chitinase, a membrane anchoring subunit of the α-agglutinin heterodimer, ferritin and defense-related protein containing a sperm-coating glycoprotein (SCP) domain. Moreover, the precursors of their homologues each have a signal peptide. These putative secretory proteins have never been isolated or characterized in spider venom; however, some homologous transcripts were also detected in a cDNA library of the L. laeta venom gland and denominated ‘other venom activities’ (Fernandes-Pedrosa et al., 2008). We also identified transcripts of ornithine decarboxylase, the rate-controlling enzyme of polyamine biosynthesis, which may imply the presence of polyamines in the venom gland of C. jingzhao.
4.3. ESTs relevant to peptide toxin posttranslation processing
Maturation of toxin precursors is a multi-step process that involves a set of molecular chaperones and enzymes operating successively during their translocation across the endoplasmic reticulum membrane (Kozlov and Grishin, 2007). Some posttranslational processes in animal peptide toxins are found in a broad variety of secreted peptides (such as proteolytic processing, C-terminal amidation and disulfide bond formation) (Buczek et al., 2005). The importance of protein posttranslational processing in the spider venom gland was emphasized by the presence of transcripts encoding proteins involved in the secretory pathway and in posttranslational processing. ESTs relevant to molecular chaperones were identified in this
Table 3 Comparison of peptide toxins identified in transcriptome and proteome of C. jingzhao. The toxins represented in both datasets are highlighted. The signal sequences are in italics. The symbol ‘*’ indicates the 5′-incomplete cDNA sequences of JZTX-32 and JZTX-33.
library, which contained clusters matching heat shock protein (Hsp) 70, Hsp90, heat shock factor binding protein, chaperonin complex component, molecular co-chaperone stress-induced-phosphoprotein (STI) 1 and α-crystallins. Heat shock proteins and other chaperonins are a subset of a ubiquitous group of proteins that direct the folding and assembly of cellular proteins (Welch, 1991; Gething and Sambrook, 1992), and are possibly important for the secretion and regeneration of toxin proteins. Redox-related enzymes among the sequenced transcripts included clusters that matched glutathione S-transferase, alkyl hydroperoxide reductase, Cu2+/Zn2+ superoxide dismutase (SOD1), thiol specific antioxidant and related enzymes. These enzymes help to maintain an oxidizing environment that can promote the formation of disulfide bonds and allow the toxins to attain their pattern of disulfide bonds (Chae et al., 1994; Koningsberger et al., 1994). Several clusters matched preprotein translocase, translocon-associated complex (also named signal sequence receptor) and vesicle coat complex (COP) II, which are necessary for preprotein transport and secretion through the endoplasmic reticulum and the Golgi apparatus (Mossessova et al., 2003). Peptidyl-prolyl cis–trans isomerase can catalyze the cis–trans isomerization of proline imidic peptide bonds in oligopeptides and accelerate the folding of proteins. Cobyrinic acid a,c-diamide synthase, a glutamine amidotransferase, may play an important role in toxin amidation.
4.4. ESTs relevant to structure and cell motility
Cytoskeleton and structural transcripts were abundantly expressed in the venom gland of C. jingzhao, encoding tubulin, actin, actin-related proteins, myosin and a LIS1 homology (LisH) motif-containing protein, which contributes to the regulation of microtubule dynamics (Emes and Ponting, 2001). Interestingly, these cellular structural components from the venom gland, which have also been identified in the venoms of other poisonous animals (Zhang et al., 2006), are similar to those from mammalian muscle tissue, which suggests that the contractile activity of the venom gland cavity may be similar to that of muscle tissue. Sequences matching cell motility proteins involved in transport, such as muskelin, phosphatidylinositol transfer protein, fatty acid-binding protein, an integral membrane protein of the tetraspanin family (Hu et al., 2005) and apolipoprotein D precursor, were also identified.
4.5. ESTs relevant to transcription and translation
Proteins involved in transcription and translation processes were abundant in this library, which contributed
to the high level of protein synthesis needed to produce large amounts of secreted, renewable venom proteins. Thirty-nine clusters (10.68% of unique sequences) relevant to protein synthesis were identified in this library, including ribosomal proteins, translation initiation and elongation factors, splicing factors, etc.

4.6. ESTs relevant to metabolic pathways

Several enzymes of metabolic pathways were found in this library. In glucose metabolism, six cluster sequences were identified. In protein metabolism, several enzymes and related chaperones of ubiquitin-dependent protein degradation were found, such as Skp1–Cul1–F-box-protein (SCF) ubiquitin ligase, ubiquitin-protein ligase, ubiquitin-conjugating enzyme, ubiquitin/60S ribosomal protein L40 fusion, the nuclear valosin-containing protein (VCP) subfamily of ATPases associated with various cellular activities (AAA ATPases) and vacuolar sorting protein/ubiquitin receptor. Another interesting finding in our database was a transcript encoding the cysteine proteinase cathepsin F, which is predicted to contain a signal peptide and plays an important role in extracellular matrix degradation (Mai et al., 2002). The transcripts of C1-tetrahydrofolate synthase (JZ0059, KOG4230) and hypoxanthine–guanine phosphoribosyltransferase (JZ0158, KOG3367) suggest that C. jingzhao may possess a functional pathway for purine metabolism (Song and Rabinowitz, 1993). Pyridoxamine-phosphate oxidase (JZ0099, KOG2586) is required for the production of pyridoxal 5′-phosphate, an important coenzyme in amino acid metabolism.

In energy metabolism, six clusters relevant to the energy-producing organelle, the mitochondrion, were found in this library, such as mitochondrial ribosomal proteins L14 and L16, mitochondrial solute carrier protein, mitochondrial ADP/ATP carrier proteins, mitochondrial F1F0-ATP synthase, ATP synthase F0 subunit 6 and related proteins.
Furthermore, abundant transcripts in the current library code for NADH:ubiquinone oxidoreductase, cytochrome b, cytochrome c, cytochrome oxidase, ubiquinol cytochrome c reductase and nucleoside diphosphate kinase, which are also needed to meet the energy demands of toxin protein synthesis and other cellular events, for instance gland contraction and cell motility.

5. Conclusion

In this study, we generated and analyzed ESTs from the C. jingzhao venom gland to produce a general overview of the venom gland transcriptome. We have described a number of recognized molecules not previously known to be expressed in the venom gland of C. jingzhao, including L-cystatin, lectin, chitinase, ferritin, etc. In addition, the cellular functional transcripts were identified and characterized extensively, which confirmed the highly specialized nature of spider venom glands as toxin producers and allowed the description of putative proteins that are likely involved in cellular processes relevant to venom gland function. This study thus provides a global view of the genetic programs of the venom gland of C. jingzhao
described so far and an insight into the molecular mechanism of the “pharmaceutical factory” of C. jingzhao.

Acknowledgments

This project was supported by grants from the National Natural Science Foundation of China (Nos. 30430170, 30500146 and 30670640) and the National 973 Project of China (No. 2006CB708508). We thank Professor Xianchun Wang, Xuanwen Li and Yuanquan He for critical reading of the manuscript.

Conflict of interest

There are no conflicts of interest.

References

Adams, M.D., Kelley, J.M., Gocayne, J.D., et al., 1991. Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252 (5013), 1651–1656.
Agarwala, K.L., Kawabata, S., Hirata, M., Miyagi, M., Tsunasawa, S., Iwanaga, S., 1996. A cysteine protease inhibitor stored in the large granules of horseshoe crab hemocytes: purification, characterization, cDNA cloning and tissue localization. J. Biochem. 119 (1), 85–94.
Ashburner, M., Ball, C.A., Blake, J.A., et al., 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25 (1), 25–29.
Buczek, O., Bulaj, G., Olivera, B.M., 2005. Conotoxins and the posttranslational modification of secreted gene products. Cell. Mol. Life Sci. 62 (24), 3067–3079.
Chae, H.Z., Robison, K., Poole, L.B., Church, G., Storz, G., Rhee, S.G., 1994. Cloning and sequencing of thiol-specific antioxidant from mammalian brain: alkyl hydroperoxide reductase and thiol-specific antioxidant define a large family of antioxidant enzymes. Proc. Natl. Acad. Sci. U.S.A. 91 (15), 7017–7021.
Chen, J., Deng, M., et al., 2008. Molecular diversity and evolution of cystine knot toxins of the tarantula Chilobrachys jingzhao. Cell. Mol. Life Sci. 65 (15), 2431–2444.
Chou, H.H., Holmes, M.H., 2001. DNA sequence quality trimming and vector removal. Bioinformatics 17 (12), 1093–1104.
Diao, J., Lin, Y., Tang, J., Liang, S., 2003. cDNA sequence analysis of seven peptide toxins from the spider Selenocosmia huwena. Toxicon 42 (7), 715–723.
Emes, R.D., Ponting, C.P., 2001. A new sequence motif linking lissencephaly, Treacher Collins and oral–facial–digital type 1 syndromes, microtubule dynamics and cell migration. Hum. Mol. Genet. 10 (24), 2813–2820.
Escoubas, P., 2006. Molecular diversification in spider venoms: a web of combinatorial peptide libraries. Mol. Divers. 10 (4), 545–554.
Escoubas, P., Sollod, B., King, G.F., 2006. Venom landscapes: mining the complexity of spider venoms via a combined cDNA and mass spectrometric approach. Toxicon 47 (6), 650–663.
Fernandes-Pedrosa, M.F., Junqueira-de-Azevedo, I.L., Goncalves-de-Andrade, R.M., Kobashi, L.S., Almeida, D.D., Ho, P.L., Tambourgi, D.V., 2008. Transcriptome analysis of Loxosceles laeta (Araneae, Sicariidae) spider venomous gland using expressed sequence tags. BMC Genomics 9 (1), 279.
Gething, M.J., Sambrook, J., 1992. Protein folding in the cell. Nature 355 (6355), 33–45.
Groth, D., Lehrach, H., Hennig, S., 2004. GOblet: a platform for Gene Ontology annotation of anonymous sequence data. Nucleic Acids Res. 32 (Web Server issue), W313–W317.
He, Q.Y., He, Q.Z., Deng, X.C., Yao, L., Meng, E., Liu, Z.H., Liang, S.P., 2008. ATDB: a uni-database platform for animal toxins. Nucleic Acids Res. 36 (Database issue), D293–D297.
Hu, C.C., Liang, F.X., Zhou, G., Tu, L., Tang, C.H., Zhou, J., Kreibich, G., Sun, T.T., 2005. Assembly of urothelial plaques: tetraspanin function in membrane protein trafficking. Mol. Biol. Cell 16 (9), 3937–3950.
Kairies, N., Beisel, H.G., Fuentes-Prior, P., Tsuda, R., Muta, T., Iwanaga, S., Bode, W., Huber, R., Kawabata, S., 2001. The 2.0-Å crystal structure of tachylectin 5A provides evidence for the common origin of the innate immunity and the blood coagulation systems. Proc. Natl. Acad. Sci. U.S.A. 98 (24), 13519–13524.
King, G.F., 2004. The wonderful world of spiders: preface to the special Toxicon issue on spider venoms. Toxicon 43 (5), 471–475.
Koningsberger, J.C., van Asbeck, B.S., van Faassen, E., Wiegman, L.J., van Hattum, J., van Berge Henegouwen, G.P., Marx, J.J., 1994. Copper, zinc-superoxide dismutase and hydrogen peroxide: a hydroxyl radical generating system. Clin. Chim. Acta 230 (1), 51–61.
Kozlov, S.A., Grishin, E.V., 2007. The universal algorithm of maturation for secretory and excretory protein precursors. Toxicon 49 (5), 721–726.
Leisy, D.J., Mattson, J.D., Quistad, G.B., Kramer, S.J., Van Beek, N., Tsai, L.W., Enderlin, F.E., Woodworth, A.R., Digan, M.E., 1996. Molecular cloning and sequencing of cDNAs encoding insecticidal peptides from the primitive hunting spider, Plectreurys tristis (Simon). Insect Biochem. Mol. Biol. 26 (5), 411–417.
Liao, Z., Cao, J., Li, S., Yan, X., Hu, W., He, Q., Chen, J., Tang, J., Xie, J., Liang, S., 2007. Proteomic and peptidomic analysis of the venom from Chinese tarantula Chilobrachys jingzhao. Proteomics 7 (11), 1892–1907.
Liao, Z., Yuan, C., Deng, M., Li, J., Chen, J., Yang, Y., Hu, W., Liang, S., 2006. Solution structure and functional characterization of Jingzhaotoxin-XI: a novel gating modifier of both potassium and sodium channels. Biochemistry 45 (51), 15591–15600.
Liu, D., Graber, J.H., 2006. Quantitative comparison of EST libraries requires compensation for systematic biases in cDNA generation. BMC Bioinformatics 7, 77.
Mai, J., Sameni, M., Mikkelsen, T., Sloane, B.F., 2002. Degradation of extracellular matrix protein tenascin-C by cathepsin B: an interaction involved in the progression of gliomas. Biol. Chem. 383 (9), 1407–1413.
Mossessova, E., Bickford, L.C., Goldberg, J., 2003. SNARE selectivity of the COPII coat. Cell 114 (4), 483–495.
Ogawa, T., Chijiwa, T., Oda-Ueda, N., Ohno, M., 2005. Molecular diversity and accelerated evolution of C-type lectin-like proteins from snake venom. Toxicon 45 (1), 1–14.
Schwartz, E.F., Diego-Garcia, E., Rodriguez de la Vega, R.C., Possani, L.D., 2007. Transcriptome analysis of the venom gland of the Mexican scorpion Hadrurus gertschi (Arachnida: Scorpiones). BMC Genomics 8, 119.
Song, J.M., Rabinowitz, J.C., 1993. Function of yeast cytoplasmic C1-tetrahydrofolate synthase. Proc. Natl. Acad. Sci. U.S.A. 90 (7), 2636–2640.
Takasuga, A., Hirotsune, S., Itoh, R., Jitohzono, A., Suzuki, H., Aso, H., Sugimoto, Y., 2001. Establishment of a high throughput EST sequencing system using poly(A) tail-removed cDNA libraries and determination of 36,000 bovine ESTs. Nucleic Acids Res. 29 (22), E108.
Tatusov, R.L., Fedorova, N.D., Jackson, J.D., et al., 2003. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41.
Welch, W.J., 1991. The role of heat-shock proteins as molecular chaperones. Curr. Opin. Cell Biol. 3 (6), 1033–1038.
Xiao, Y., Tang, J., Hu, W., Xie, J., Maertens, C., Tytgat, J., Liang, S., 2005. Jingzhaotoxin-I, a novel spider neurotoxin preferentially inhibiting cardiac sodium channel inactivation. J. Biol. Chem. 280 (13), 12069–12076.
Xiao, Y., Tang, J., Yang, Y., Wang, M., Hu, W., Xie, J., Zeng, X., Liang, S., 2004. Jingzhaotoxin-III, a novel spider toxin inhibiting activation of voltage-gated sodium channel in rat cardiac myocytes. J. Biol. Chem. 279 (25), 26220–26226.
Yuan, C., Liao, Z., Zeng, X., Dai, L., Kuang, F., Liang, S., 2007a. Jingzhaotoxin-XII, a gating modifier specific for Kv4.1 channels. Toxicon 50 (5), 646–652.
Yuan, C., Yang, S., Liao, Z., Liang, S., 2007b. Effects and mechanism of Chinese tarantula toxins on the Kv2.1 potassium channels. Biochem. Biophys. Res. Commun. 352 (3), 799–804.
Zeng, X., Deng, M., Lin, Y., Yuan, C., Pi, J., Liang, S., 2007. Isolation and characterization of Jingzhaotoxin-V, a novel neurotoxin from the venom of the spider Chilobrachys jingzhao. Toxicon 49 (3), 388–399.
Zhang, B., Liu, Q., Yin, W., et al., 2006. Transcriptome analysis of Deinagkistrodon acutus venomous gland focusing on cellular structure and functional aspects using expressed sequence tags. BMC Genomics 7, 152.
Zhou, Y., Landweber, L.F., 2007. BLASTO: a tool for searching orthologous groups. Nucleic Acids Res. 35 (Web Server issue), W678–W682.