Analysis of the transcriptome of the root lesion nematode Pratylenchus coffeae generated by 454 sequencing technology


Molecular & Biochemical Parasitology 178 (2011) 7–14


Annelies Haegeman, Soumi Joseph, Godelieve Gheysen ∗

Ghent University, Dpt. Molecular Biotechnology, Coupure links 653, B-9000 Ghent, Belgium

Article info

Article history: Received 25 January 2011; received in revised form 30 March 2011; accepted 4 April 2011; available online 12 April 2011

Keywords: Root lesion nematode; Pratylenchus; Expressed sequence tags; Transcriptome; 454 sequencing

Abstract

To study interactions between plants and plant-parasitic nematodes, several omics studies have nowadays become extremely useful. Since most data available so far is derived from sedentary nematodes, we decided to improve the knowledge on migratory nematodes by studying the transcriptome of the nematode Pratylenchus coffeae through generating expressed sequence tags (ESTs) on a 454 sequencing platform. In this manuscript we present the generation, assembly and annotation of over 325,000 reads from P. coffeae. After assembling these reads, 56,325 contigs and singletons with an average length of 353 bp were selected for further analyses. Homology searches revealed that 25% of these sequences had significant matches to the Swiss-prot/trEMBL database and 29% had significant matches in nematode ESTs. Over 10,000 sequences were successfully annotated, corresponding to over 6000 unique Gene Ontology identifiers and 5000 KEGG orthologues. Different approaches led to the identification of different sequences putatively involved in the parasitism process. Several plant cell wall modifying enzymes were identified, including an arabinogalactan galactosidase, so far identified in cyst nematodes only. Additionally, some new putative cell wall modifying enzymes are present belonging to GHF5 and GHF16, although further functional studies are needed to determine the true role of these proteins. Furthermore, a homologue to a chorismate mutase was found, suggesting that this parasitism gene has a wider occurrence in plant-parasitic nematodes than previously assumed. Finally, the dataset was searched for orthologues against the Meloidogyne genomes and genes involved in the RNAi pathway. In conclusion, the generated transcriptome data of P. coffeae will be very useful in the future for several projects: (1) evolutionary studies of specific gene families, such as the plant cell wall modifying enzymes, (2) the identification and functional analysis of candidate effector genes, (3) the development of new control strategies, e.g. by finding new targets for RNAi and (4) the annotation of the upcoming genome sequence.

© 2011 Elsevier B.V. All rights reserved.

∗ Corresponding author. Tel.: +32 92645888; fax: +32 92646219. E-mail address: [email protected] (G. Gheysen).

0166-6851/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.molbiopara.2011.04.001

1. Introduction

The root lesion nematode Pratylenchus coffeae is a parasite of banana causing lesions, necrosis and toppling of the plants. It has a worldwide distribution, probably due to the spread of infected banana planting material. It also causes damage to various other crops such as yam, ginger, turmeric, abaca and coffee. It is the most important nematode causing damage to banana in South East Asia, Central and South America, and the Pacific. In Africa, its distribution is more localized, and it generally occurs in mixed populations with other nematodes such as Radopholus similis, Helicotylenchus multicinctus and Meloidogyne species [1].

In recent years, several plant-parasitic nematodes have been subjected to molecular analyses, especially transcriptome analyses by means of expressed sequence tags (ESTs). Currently, approximately 175,000 ESTs from plant-parasitic nematodes have been submitted to the NCBI database, all derived from traditional Sanger sequencing (December 2010). The sedentary root-knot nematodes (Meloidogynidae) have been extensively studied with the generation of over 70,000 ESTs and the sequencing of two complete genomes, those of Meloidogyne incognita [2] and Meloidogyne hapla [3]. The Pratylenchidae, a family of migratory nematodes, is the family most closely related to the Meloidogynidae [4]. It currently accounts for approximately 15,000 ESTs, of which fewer than 8000 are derived from the genus Pratylenchus. A molecular comparison of members of the Meloidogynidae and Pratylenchidae could provide insights into the differences and similarities between sedentary and migratory nematodes. Keeping this in mind, we decided to characterize the transcriptome of a mixed-stage P. coffeae population by 454 pyrosequencing. The latter technique has become a relatively rapid and cost-effective method for high-throughput sequencing of ESTs of non-model organisms. So far, 454 sequencing has only been used to generate ESTs from animal-parasitic nematodes [5–7]; for plant-parasitic nematodes, no reports on 454 sequencing studies are available yet.

One of the goals of this project was to identify putative effector genes by several approaches. More specifically, we wanted to examine whether P. coffeae possesses a similar arsenal of plant cell wall modifying enzymes as the Meloidogyne species. The latter enzymes are necessary for plant-parasitic nematodes to penetrate the rigid plant cell wall. Numerous enzymes have been identified in different families of plant-parasitic nematodes, such as endo-1,4-beta-glucanase, xylanase, pectate lyase, polygalacturonase, arabinogalactan galactosidase, and arabinase [8]. These genes may have been acquired by horizontal gene transfer from bacteria and fungi [9]. In Pratylenchus species, so far only an endo-1,4-beta-glucanase has been identified [10], although there is also EST evidence for an expansin, a protein known to loosen the cell wall non-enzymatically [11].

2. Materials and methods

2.1. RNA extraction, cDNA synthesis and sequencing

P. coffeae was cultured on carrot discs at a constant temperature of 25 °C. RNA was isolated from mixed stages using the TRI reagent (Sigma) according to the manufacturer's instructions. First strand cDNA synthesis was done with the Super SMART PCR cDNA synthesis kit (Clontech, CA, USA) including an amplification step of 20 cycles as described in the manual. Subsequently, the amplified cDNA was purified using the Qiaquick PCR purification kit (Qiagen, Germany) and normalized using the TRIMMER kit (Evrogen). The normalized cDNA sample was sent to LGC Genomics (Berlin, Germany), where it was sequenced in two separate runs of ¼ of a picotiter plate on a 454 FLX Titanium platform (Roche, Branford, CT, USA) by a shotgun approach. The data were submitted to the NCBI Sequence Read Archive (SRA) with accession number SRA028814.

2.2. Cleaning and assembly

The resulting reads were processed with the CLC Genomics Workbench 4.0.2 software. SMART adapter sequences and 454 sequencing primers were trimmed from all reads. Additionally, low quality reads (<99.5% accuracy) and short reads (<50 bp) were discarded. The assembly was done using standard settings.

2.3. Homology searches

All contigs and singletons longer than 150 bp were blasted locally (blastx) against Swiss-prot and trEMBL (October 2010) with a bit score cut-off of 50. Additionally, all nematode ESTs were downloaded from the EST division of Genbank, and split into three different datasets according to the nematode's lifestyle: animal-parasitic nematodes, plant-parasitic nematodes and free-living nematodes. A local tblastx search (bit score > 50) was used to look for homologues of all sequences in these datasets.

Since many plant cell wall degrading enzymes in nematodes are thought to originate through horizontal gene transfer (HGT), we tried to identify HGT candidates. Therefore we did blast searches (bit score > 50) against different datasets of plant, nematode and bacterial sequences. We downloaded the coding sequences of the following genomes from the RefSeq database of NCBI: Caenorhabditis elegans (NC_003279–NC_003284), Brugia malayi (NZ_AAQA00000000), Arabidopsis thaliana (NC_003070–NC_003071, NC_003074–NC_003076) and all completed genomes of plant pathogenic bacteria and an oomycete (Pectobacterium atrosepticum, NC_004547; Ralstonia solanacearum, NC_003295; Xylella fastidiosa 9a5c, NC_002488; Agrobacterium tumefaciens, NC_003062; Xanthomonas campestris pv. campestris, NC_003902; Xanthomonas axonopodis pv. citri, NC_003919; Pseudomonas syringae pv. syringae, NC_007005; Xylella fastidiosa Temecula 1, NC_004556; Pseudomonas syringae pv. tomato, NC_004578; Leifsonia xyli subsp. xyli, NC_006087; Pseudomonas syringae pv. phaseolicola, NC_005773; Xanthomonas campestris pv. campestris, NC_007086; Xanthomonas campestris pv. vesicatoria, NC_007508; Aster yellows witches broom Phytoplasma, NC_007716; Clavibacter michiganensis subsp. sepedonicus, NC_010407; Candidatus Phytoplasma mali, NC_011047; Dickeya dadantii, NC_012880; Dickeya zeae, NC_012912; Phytophthora infestans T30-4, NZ_AATU00000000).

All putative proteins from the genomes of the root-knot nematodes M. incognita and M. hapla were downloaded from the projects' websites [2,3]. To look for orthologues, a reciprocal blast strategy was used: the Pratylenchus sequences longer than 150 bp were blasted (blastx, bit score > 50) against the Meloidogyne proteins, and the Meloidogyne proteins were blasted against the Pratylenchus sequences (tblastn, bit score > 50). Only when both searches returned the same pair of sequences were they considered true orthologues.

2.4. Annotation

All Pratylenchus sequences longer than 150 bp were annotated based on the blastx results against Swiss-prot and trEMBL. A sequence was annotated based on the top hit information only if the bit score was > 50 and if the description of the top hit did not contain any terms that would suggest it is a hypothetical or unknown protein ("unknown", "putative", "uncharacterized", "hypothetical", "similar", "predicted", "probable"). Gene Ontology terms were retrieved for all unique protein identifiers from annotated sequences using QuickGO from the EBI website (http://www.ebi.ac.uk/QuickGO/GAnnotation). KEGG orthologues were identified using the KEGG Automatic Annotation Server (KAAS) with default parameters [12]. Subsequently, KEGG BRITE mapping was applied to find the most common classifications.

2.5. Translation into putative proteins

To predict putative proteins, the sequences longer than 150 bp were translated using OrfPredictor [13]. To look for putative parasitism genes, the presence of signal peptides was predicted with SignalP 3.0 [14] and transmembrane domains were predicted using TMHMM 2.0 [15].

2.6. Searching for specific genes

As described above, three sequence sets putatively related to parasitism were retained: the first dataset was derived from homology to plant pathogenic bacteria and/or plants only, the second one was derived from homology to parasitic nematode ESTs exclusively, and the third one was derived from putative proteins with a signal peptide. The blastx hits of these datasets were retrieved and manually searched for the presence of putative plant cell wall modifying enzymes. The sequences that showed similarity to these genes were locally blasted (tblastx, bit score > 50) against all Pratylenchus sequences longer than 150 bp to identify any additional family members.

The following putative effector genes were retrieved from Genbank and used for homology searches: 10A06, 14-3-3b, 16D10, 19C07, 7E12, acid phosphatase, annexin, calreticulin, chitinase, chorismate mutase, CLE peptide, ERp99, galectin, glutathione peroxidase, glutathione-S-transferase, map-1, nodL factor, peroxiredoxin, SPRYSEC RBP-1, RING-H2 zinc finger protein, fatty acid and retinol binding protein or SEC-2, SKP1-like protein, SXP/RAL-2, transthyretin-like protein, ubiquitin extension protein and venom allergen protein. Accession numbers and references can be found in Table 4. A tblastn search (bit score > 50) was used to identify possible homologues in the Pratylenchus sequences. Resulting hits were subsequently blasted (bit score > 50) against the Genbank nr database to make a distinction between homologues to secreted proteins and homologues to endogenous nematode proteins without function in parasitism.

Since recently proposed strategies to combat plant-parasitic nematodes involve RNAi based methods, it is interesting to see whether genes involved in the RNAi pathway can be found in the Pratylenchus dataset. C. elegans protein sequences involved in the RNAi pathway were downloaded from Wormbase (www.wormbase.org) and locally blasted (bit score > 50) against all Pratylenchus sequences longer than 150 bp. Subsequently, the top hits were validated to be true orthologues by blasting these against the NCBI protein database (blastx). This reciprocal blast strategy only considers homologues to be true orthologues when the retained Pratylenchus top hits gave the same original C. elegans protein (used as query in the first blast search) as top hit in the new blast search against the non-redundant database.

3. Results

3.1. Sequencing, cleaning and assembly

The 454 sequencing run resulted in a total number of 326,950 reads with an average sequence length of 252 nucleotides. After adapter and quality trimming, 320,703 reads remained. Assembly resulted in a total of 25,987 contigs and 53,179 singletons. An overview of the sequencing and assembly is presented in Table 1, including the average sizes of the sequences. The number of reads included in the contigs ranges from 1 to 651 (Fig. 1). The length of all sequences ranges from 50 to 3343 bp with an average of 271 bp (Fig. 1).

Fig. 1. Visualization of contigs and singletons >150 bp. (A) Contig size distribution showing the number of contigs with a particular number of reads (x-axis: number of reads per contig; y-axis: number of contigs). (B) Frequency plot of the length distribution of the contigs and singletons (x-axis: length in bp; y-axis: frequency).

Table 1 Overview of sequencing run and assembly.

                              Sequences   Total bases   Average length (bp)
Total number of reads         326,971     82,445,359    252
High quality trimmed reads    320,703     70,752,552    221
Contigs                       25,987      11,903,837    458
Singletons                    53,179      9,559,767     180
Total                         79,166      21,463,604    271
Total > 150 bp                56,325      19,908,855    353
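As a quick consistency check on Table 1, the reported average lengths can be recomputed from the sequence counts and total bases. The sketch below is illustrative only and is not part of the original analysis; the function names are hypothetical, and the values are copied from Table 1.

```python
# Values copied from Table 1:
# label -> (number of sequences, total bases, reported average length in bp)
TABLE_1 = {
    "Total number of reads": (326_971, 82_445_359, 252),
    "High quality trimmed reads": (320_703, 70_752_552, 221),
    "Contigs": (25_987, 11_903_837, 458),
    "Singletons": (53_179, 9_559_767, 180),
    "Total": (79_166, 21_463_604, 271),
    "Total > 150 bp": (56_325, 19_908_855, 353),
}


def average_length(n_sequences, total_bases):
    """Average sequence length in bp, rounded to the nearest integer."""
    return round(total_bases / n_sequences)


def check_table(table):
    """Map each row label to True if the recomputed average matches the reported one."""
    return {
        label: average_length(n, bases) == reported
        for label, (n, bases, reported) in table.items()
    }


if __name__ == "__main__":
    for label, ok in check_table(TABLE_1).items():
        print(f"{label}: {'consistent' if ok else 'MISMATCH'}")
    # Contigs plus singletons should add up to the "Total" row.
    contigs, singletons, total = (TABLE_1[k][0] for k in ("Contigs", "Singletons", "Total"))
    print("contigs + singletons == total:", contigs + singletons == total)
```

All six rows are consistent under this check, and the contig and singleton counts sum to the reported total of 79,166.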

3.2. Homology searches

Preliminary analyses showed that for sequences shorter than 150 bp it is difficult to find homology. Therefore, we decided to continue the analysis with contigs and singletons longer than 150 bp, further referred to as the Pratylenchus sequences. A blastx search against the Swiss-prot and trEMBL database resulted in 14,046 sequences with a significant hit (24.9%), of which 9187 were derived from contigs, while 4859 were from singletons. Of the 14,046 best matches, 10,270 were unique. Of all Pratylenchus sequences, 16,329 had significant hits in one or more of the nematode EST databases (29.0%) while 39,996 did not have any hits to nematode ESTs at all. The sequences which showed homology were classified according to the number of hits in each nematode EST database, as shown in Fig. 2. Approximately half (50.7%) of the sequences with homology to nematode ESTs had hits occurring in all three nematode EST databases, while one quarter (25.0%) was specific to plant-parasitic nematode species.

Further blast searches revealed that 10,917 (19.4%) sequences have a hit against the C. elegans putative proteins, 10,143 (18.0%) against B. malayi, 4195 (7.4%) against Arabidopsis and 2982 (5.3%) against the plant pathogenic bacteria database. Of these hits, 189 had hits in plant pathogenic bacteria or the oomycete exclusively, 62 in Arabidopsis exclusively and 57 in both the bacteria and Arabidopsis.

Orthologues were identified in the genomes of M. incognita and M. hapla by a reciprocal blast strategy. The P. coffeae sequences have 5955 true orthologues with M. hapla, and 4939 with M. incognita. To compare, M. hapla and M. incognita have 6547 orthologues according to this strategy, of which 2159 do not occur in P. coffeae. In total, 2712 orthologues occurred in all three datasets.

Fig. 2. Classification of sequences with significant homology to nematode ESTs (APN: animal-parasitic nematodes; PPN: plant-parasitic nematodes; FLN: free-living nematodes). Venn diagram; set totals: APN 10,843 (66.4%), PPN 14,088 (86.3%), FLN 10,796 (66.1%). Regions (assignment recovered from the set totals): all three 8276 (50.7%); PPN only 4075 (25.0%); APN and FLN 1109 (6.8%); APN and PPN 950 (5.8%); PPN and FLN 787 (4.8%); FLN only 624 (3.8%); APN only 508 (3.1%).

3.3. Annotation

The annotation resulted in 10,266 annotated sequences, of which 7462 were unique. Gene Ontology identifiers were searched for these unique sequences. In total, 115,454 GO terms were retrieved (of which 6235 different ones), coupled to 6937 different protein identifiers (on average 17 GO terms per protein identifier). The most abundantly present GO terms in the dataset are shown in Table 2. Using the KEGG Automatic Annotation Server, 5267 KEGG orthologues were identified, of which 2317 are unique. KEGG BRITE mapping revealed the most common classifications in the dataset (Fig. 3).

Table 2 The ten most abundant Gene Ontology terms present in the dataset for the cellular component, molecular function and biological process categories.

Cellular component                                      %
Nucleus                                                 14.0
Cytoplasm                                               12.5
Membrane                                                11.7
Integral to membrane                                    6.9
Plasma membrane                                         4.4
Mitochondrion                                           4.0
Intracellular                                           2.9
Nucleoplasm                                             2.0
Endoplasmic reticulum                                   1.7
Cytosol                                                 1.7

Molecular function                                      %
ATP binding                                             9.4
Protein binding                                         7.9
Nucleotide binding                                      4.3
DNA binding                                             3.3
Metal ion binding                                       2.8
Catalytic activity                                      2.7
Hydrolase activity                                      2.5
Oxidoreductase activity                                 2.2
Zinc ion binding                                        2.1
Protein kinase activity                                 2.1

Biological process                                      %
Embryo development ending in birth or egg hatching      4.0
Protein phosphorylation                                 3.7
Transport                                               3.3
Oxidation–reduction process                             3.1
Metabolic process                                       2.7
Reproduction                                            2.3
Nematode larval development                             2.0
Translation                                             1.8
Transcription                                           1.5
Growth                                                  1.4

3.4. Translation and signal peptide prediction

When translated into putative proteins, 805 of the sequences were predicted not to have an ORF. Of the 55,520 sequences with an ORF, 40,100 coded for a putative protein longer than 50 amino acids. Only the proteins with a putative start methionine were included in signal peptide prediction. In total, 2697 putative proteins were predicted to have a signal peptide, of which 1004 lacked a transmembrane domain.

3.5. Plant cell wall modifying enzymes

In total, 43 sequences consisting of 674 reads have homology to putative plant cell wall modifying enzymes (Table 3). The most abundantly present transcripts are from endo-1,4-beta-glucanases or cellulases. The different contigs all show great similarity to other known GHF5 endoglucanases from nematodes, and one of the contigs appears to be the previously described gene Pc-eng1 [10]. Several contigs include, besides the GHF5 catalytic domain, also a carbohydrate binding module (CBM). The second most abundant plant cell wall modifying protein is expansin. All the identified sequences consist of a signal peptide coupled to an expansin-like domain. Apparently, a CBM is not present in any of the contigs. All putative proteins show the highest homology to an expansin-like protein from Globodera rostochiensis.

Four contigs resemble pectate lyases. Three of them show greatest similarity to cyst nematode pectate lyases, while one has a higher homology to root-knot nematode pectate lyases. One contig shows high homology to a xylanase from R. similis. It contains part of the catalytic domain and part of the CBM. Another contig and one singleton are similar to Heterodera arabinogalactan galactosidases. Two contigs do not have similarity to any nematode genes, but show similarity to some bacterial GHF5 proteins and to GHF16 beta-1,3-endoglucanases from various organisms, respectively. Finally, five contigs and three singletons show similarity to GHF32 invertases, which were also found in the M. incognita genome. The latter proteins are not truly plant cell wall modifying proteins, as they probably catalyze the conversion of sucrose into glucose and fructose to be used by the nematode as a carbon source [2].

3.6. Plant-parasitic nematode proteins known to be secreted

A selection of putative secreted nematode proteins was used for homology searches against the Pratylenchus sequences (Table 4).

Fig. 3. The ten most common KEGG BRITE hierarchies identified in the Pratylenchus sequences.


Table 3 Overview of putative plant cell wall modifying proteins identified in the P. coffeae EST dataset. The table contains for each protein the family it belongs to, whether it has been found before in the Pratylenchidae family, and the number of contigs, singletons and raw reads in the EST dataset.

Enzyme                          Enzyme family   Previously found in Pratylenchidae?        # Contigs   # Singletons   # Reads
Endo-1,4-beta-glucanase         GHF5            P. coffeae, R. similis, P. vulnus (ESTs)   14          5              246
Expansin-like protein                           P. vulnus (ESTs)                           10          3              184
Pectate lyase                   PL3             –                                          4           0              120
Xylanase                        GHF5–GHF30      R. similis                                 1           0              54
Arabinogalactan galactosidase   GHF53           –                                          1           1              10
Polygalacturonase               GHF28           –                                          2           0              7
Endo-1,3-beta-glucanase         GHF16           –                                          1           0              2
Putative protein                GHF5            –                                          1           0              51
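The totals quoted in Section 3.5 (43 sequences consisting of 674 reads) can be recomputed from the per-enzyme counts in Table 3. The following is only a bookkeeping sketch, assuming that the contig, singleton and read columns of Table 3 are read row by row in the order in which the enzymes are listed; the helper name is hypothetical.

```python
# Per-enzyme counts copied from Table 3: (contigs, singletons, reads).
TABLE_3 = {
    "Endo-1,4-beta-glucanase": (14, 5, 246),
    "Expansin-like protein": (10, 3, 184),
    "Pectate lyase": (4, 0, 120),
    "Xylanase": (1, 0, 54),
    "Arabinogalactan galactosidase": (1, 1, 10),
    "Polygalacturonase": (2, 0, 7),
    "Endo-1,3-beta-glucanase": (1, 0, 2),
    "Putative protein (GHF5)": (1, 0, 51),
}


def table_totals(table):
    """Sum contigs, singletons and reads over all rows of the table."""
    contigs = sum(row[0] for row in table.values())
    singletons = sum(row[1] for row in table.values())
    reads = sum(row[2] for row in table.values())
    return {
        "contigs": contigs,
        "singletons": singletons,
        "sequences": contigs + singletons,
        "reads": reads,
    }


if __name__ == "__main__":
    print(table_totals(TABLE_3))
    # {'contigs': 34, 'singletons': 9, 'sequences': 43, 'reads': 674}
```

The sums agree with the text: 34 contigs and 9 singletons give 43 sequences, built from 674 reads.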

The numbers of sequences identified as possible homologues are listed in Table 4. Of the 26 secreted proteins, 15 have putative homologues in the P. coffeae dataset. Two of them, chorismate mutase and SPRYSEC RBP-1, were previously only found in sedentary nematodes. The putative P. coffeae chorismate mutase protein shows highest similarity to chorismate mutases from bacteria from the genus Burkholderia (49% identity over 85% of the query), and the second highest similarity to nematode chorismate mutases (50% identity over 77% of the query). It has a significant match with a bit score of 48 to the PFAM family chorismate mutase type 2 (PF01817). Four possible SPRYSEC homologues were identified in the P. coffeae contigs. Two additional contigs with similarity to SPRYSECs were retrieved, but these show the highest similarity to SPRY domain containing proteins from other organisms, and are probably not involved in parasitism. No signal peptide could be detected in the putative chorismate mutase and the putative SPRYSECs because the 5′ parts of all sequences are lacking.

Table 4 Results from a tblastn search of selected secreted nematode proteins against the P. coffeae sequences. The accession numbers of the sequences used as query are from the Protein division of Genbank, except for nodL, which is a contig derived from EST data on www.nematode.net. The number of contigs and singletons of probable homologues are given, including the total number of reads and the best bit score.

Secreted protein              Ref.   Accession nr   # Contigs            # Singletons   # Reads   Bit score
10A06                         [40]   ACU12489       No true homologues
14-3-3b                       [41]   AAL40719       2                    0              80        301
16D10                         [42]   Q06JG6         No true homologues
19C07                         [43]   AAO85458       No true homologues
7E12                          [44]   AAQ10021       No true homologues
Acid phosphatase              [23]   AAN08587       3                    0              61        199
Annexin                       [45]   AAN32888       No true homologues
Calreticulin                  [41]   AAL40720       2                    0              38        290
Chitinase                     [46]   AAN14978       No true homologues
Chorismate mutase             [47]   ABB02655       1                    0              9         54.3
CLE peptide                   [48]   AAO33474       No true homologues
ERp99                         [49]   AAG21337       0                    1              1         53.1
Galectin                      [24]   AAB61596       1                    0              6         103
Glutathione peroxidase        [50]   CAD38523       6                    2              143       243
Glutathione-S-transferase     [24]   ABN64198       8                    0              155       196
Map-1                         [51]   CAC27774       No true homologues
NodL factor                   [52]   MI01045        No true homologues
Peroxiredoxin                 [53]   CAB48391       2                    0              52        263
SPRYSEC RBP-1                 [54]   CAM33004       2                    0              18        64.7
RING-H2 zinc finger protein   [48]   AAP30834       No true homologues
SEC-2                         [55]   CAA70477       2                    3              89        234
SKP1-like protein             [48]   AAP30763       No true homologues
SXP/RAL-2                     [56]   CAB75701       1                    0              8         75.5
Transthyretin-like protein    [57]   CAM84510       9                    3              233       177
Ubiquitin extension protein   [48]   AAO33478       0                    1              1         85.1
Venom allergen protein        [58]   AAD01511       2                    0              83        103
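The "Bit score" column of Table 4 reports the best hit per query after applying the bit score > 50 cut-off. How the blast output was parsed is not described here, so the following is a minimal sketch under an assumption: that hits are available in the standard 12-column tab-separated BLAST tabular layout, in which the query identifier is the first field, the subject identifier the second, and the bit score the last. The function name and the example identifiers are hypothetical.

```python
import csv
import io

BIT_SCORE_CUTOFF = 50.0  # threshold used throughout the text (bit score > 50)


def best_hits(tabular_text, cutoff=BIT_SCORE_CUTOFF):
    """Return {query: (subject, bit_score)} with the best hit per query above the cutoff.

    Assumes the standard 12-column tabular BLAST layout:
    query, subject, % identity, alignment length, mismatches, gap opens,
    query start, query end, subject start, subject end, e-value, bit score.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bit_score = row[0], row[1], float(row[11])
        if bit_score <= cutoff:
            continue  # keep only hits with bit score strictly above the cutoff
        if query not in best or bit_score > best[query][1]:
            best[query] = (subject, bit_score)
    return best


# Hypothetical example data, not taken from the study.
EXAMPLE = "\n".join([
    "queryA\tcontig_1\t60.0\t100\t40\t0\t1\t100\t1\t300\t1e-20\t103",
    "queryA\tcontig_2\t45.0\t90\t50\t1\t5\t94\t10\t280\t1e-8\t54.3",
    "queryB\tcontig_3\t30.0\t80\t56\t2\t1\t80\t1\t240\t0.5\t31.2",
])

if __name__ == "__main__":
    print(best_hits(EXAMPLE))
    # {'queryA': ('contig_1', 103.0)}
```

In this example `queryB` is dropped because its only hit falls below the cut-off, which corresponds to the "No true homologues" situation only in part: in the study, hits above the cut-off were additionally checked against the Genbank nr database before being accepted.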

3.7. Novel candidate parasitism/effector genes

To search for novel putative nematode parasitism genes or effectors, three different database searching strategies were applied as shown in Fig. 4. When we examine the sequences retained in combined approaches, several genes known to be important in parasitism are present in the dataset, such as plant cell wall modifying enzymes, oxidoreductases and ubiquitin-like proteins. However, many other genes not known to be involved in parasitism are found (e.g. transport proteins, ethylene forming enzyme), as well as sequences with no significant homology and therefore unknown function. The latter sequences are interesting for further studies. If all three approaches are combined, only 1 candidate remains: a contig with homology to an ethylene/succinate forming enzyme from Ralstonia solanacearum.

Fig. 4. Three approaches to identify putative effectors/parasitism genes. The number of retained sequences is indicated in each step, and the final numbers are the number of sequences retained by two or three approaches combined. Starting set: 56,325 contigs and singletons. (i) OrfPredictor, ORFs > 50 AA: 40,100; SignalP and TMHMM, with signal peptide and without transmembrane domain: 1004. (ii) tblastx against nematode ESTs: 16,329; hits in parasitic species only: 5533. (iii) blastx against C. elegans, B. malayi, A. thaliana and plant pathogenic bacteria: 12,441; hits in A. thaliana and plant pathogenic bacteria only: 308. Retained by two approaches combined: 136 and 82; retained by all three approaches: 1 candidate.

3.8. Presence of genes involved in the RNAi pathway

In Table 5 the identified orthologues for C. elegans genes involved in the RNAi pathway are displayed, together with the presence of these genes in the M. incognita genome, as described by Rosso et al. [16]. Overall, most classes of proteins present in the M. incognita genome can be found in the P. coffeae sequences as well. Only for the proteins involved in uptake, no homologue could be identified. For the argonaute proteins, only one orthologue could be identified with certainty. Several other argonaute-like proteins were found, but it was not clear which C. elegans gene they were orthologous to. Therefore these were not included in Table 5.

Table 5 Genes from C. elegans involved in the RNAi pathway, their presence in the M. incognita genome [16] and in the P. coffeae ESTs.

Function in RNAi: Exo-RNAi
  C. elegans genome: drh-1, drh-2, smg-2, rde-2, rde-3, rde-4, smg-5, zfp-1, mut-16, mut-7
  M. incognita genome: drh-1, drh-2, smg-2, rde-3
  P. coffeae ESTs: drh-1, smg-2

Function in RNAi: Dicer
  C. elegans genome: dcr-1
  M. incognita genome: dcr-1
  P. coffeae ESTs: dcr-1

Function in RNAi: Amplification
  C. elegans genome: rrf-1, rrf-2, rrf-3, ego-1
  M. incognita genome: rrf-1, rrf-2, rrf-3, ego-1
  P. coffeae ESTs: ego-1

Function in RNAi: Argonautes
  C. elegans genome: rde-1, SAGO-1, SAGO-2, PPW-1, PPW-2
  M. incognita genome: rde-1, SAGO-1, SAGO-2, PPW-1, PPW-2
  P. coffeae ESTs: PPW-2

Function in RNAi: RNAi suppressor
  C. elegans genome: eri-1, eri-3, eri-5
  M. incognita genome: eri-1
  P. coffeae ESTs: eri-1

Function in RNAi: RNAi enhancer
  C. elegans genome: gfl-1
  M. incognita genome: gfl-1
  P. coffeae ESTs: gfl-1

Function in RNAi: Uptake
  C. elegans genome: sid-1, sid-2, rsd-2, rsd-3, rsd-6
  M. incognita genome: rsd-3
  P. coffeae ESTs: none identified

Function in RNAi: miRNA or endo RNAi
  C. elegans genome: drsh-1, alg-1, alg-2, PRG-1, PRG-2, ERGO-1, vig-1, tsn-1
  M. incognita genome: drsh-1, alg-1, alg-2, tsn-1
  P. coffeae ESTs: drsh-1, alg-1, alg-2, tsn-1

4. Discussion

Since the emergence of 454 sequencing technology, transcriptome analysis of non-model organisms by EST sequencing has become increasingly popular. To date, several EST studies of plant-parasitic nematodes have been published, all of them using the classic Sanger method [17–29]. In animal-parasitic nematodes, three recent studies used 454 sequencing to study the transcriptomes of Trichostrongylus colubriformis, Necator americanus and Haemonchus contortus [5–7]. In this study, we produced over 325,000 ESTs derived from the migratory plant-parasitic nematode P. coffeae. To our knowledge, this is the first plant-parasitic nematode subjected to an EST analysis using 454 technology, increasing the amount of transcript data for the Pratylenchidae considerably. The average read length was 252 bp, which is similar to or slightly lower than that reported in other studies using 454 technology [5,6,30–32].
After the assembly, 56,325 sequences longer than 150 bp remained with an average length of 353 bp. In a blastx search against Swiss-prot and trEMBL, 25% of these sequences gave a significant hit and in total 18% of the sequences were successfully annotated. This percentage is relatively low, but due to the high amount of data generated, over 7000 sequences were annotated with a unique identifier. Almost half of these annotated genes were classified as “enzymes” according to KEGG. Gene Ontology mapping revealed that most genes have basic functions such as transport, transcription, protein synthesis or modification and developmental and metabolic processes. Potential orthologues to root-knot nematode genes were searched by comparing the Pratylenchus sequences to the putative protein sequences derived from the genomes of M. incognita and M. hapla. This revealed that 33% of the orthologues in common for M. incognita and M. hapla do not occur in the P. coffeae ESTs. These orthologues are potentially interesting, since they can be specific to root-knot nematodes and can therefore be involved in gall formation. However, when looking through the putative function of

these genes inferred from homology searches, most genes seem to be general metabolism genes also identified in C. elegans. This suggests that the P. coffeae dataset is far from complete, and that the orthology approach is probably only useful when using full genome data. Several plant cell wall modifying proteins were identified in the Pratylenchus sequences. Previously, only an endoglucanase and some ESTs from an expansin-like protein were known in Pratylenchus species [10,11]. Our dataset extends the arsenal of enzymes in P. coffeae with xylanase, pectate lyase, polygalacturonase and arabinogalactan galactosidase. The presence of an arabinogalactan galactosidase is remarkable, since it has only been found in cyst nematodes so far, and it is not present in the available Meloidogyne genomes [8,33]. Because the Pratylenchidae are more closely related to the Meloidogynidae than to the Heteroderidae, the most probable evolutionary explanation is that there must have been a HGT in the common ancestor of Heteroderidae and Pratylenchidae. According to the phylogeny of van Megen et al. [4], this common ancestry group contains Meloidogynidae, Pratylenchidae, part of the Telotylenchidae, Heteroderidae, Rotylenchulidae, Hoplolaimidae, Dolichodoridae and Belonolaimidae. Probably several lineages, including the Meloidogynidae, have subsequently lost this gene during further evolution. One contig showed significant similarity to bacterial and eukaryote beta-1,3-endoglucanases belonging to GHF16. This enzyme type has been previously identified in Bursaphelenchus species, where it is important in the breakdown of fungal cell walls. Interestingly, the P. coffeae GHF16 enzyme shows only limited similarity to the Bursaphelenchus GHF16 enzymes, indicating that both enzymes might have a different evolutionary origin. In P. 
coffeae, the putative new beta-1,3-endoglucanase could play a role in the degradation of callose (beta-1,3-glucan), which is deposited by plants between the plasma membrane and the cell wall under stress conditions [34]. Callose thus forms an extra barrier to the movement of the nematode. Other plant pathogens are able to decrease the plant’s callose content: Pseudomonas syringae, for example, produces effectors that suppress callose deposition [35], and Gaeumannomyces graminis secretes beta-1,3-glucanases that probably break down callose [36]. Further investigation is required to determine whether the putative GHF16 enzyme is truly involved in parasitism.

Finally, one sequence was identified with similarity to bacterial GHF5 proteins, but without significant similarity to known nematode GHF5 endoglucanases. One nematode EST originating from Xiphinema index showed significant similarity to this putative new type of GHF5 protein. These two nematode sequences resemble putative proteins from two extremely halophilic archaea and two anaerobic bacteria. However, no signal peptide is predicted for these prokaryotic proteins, and none of these organisms is involved in plant parasitism. Therefore, these new putative GHF5 proteins might not play a role in nematode parasitism.

Next to the plant cell wall modifying enzymes, other nematode secreted proteins with putative functions in the plant have been described (Table 4). Several genes with similarity to these putative effectors were also identified in the P. coffeae dataset: 14-3-3b protein, acid phosphatase, calreticulin, chorismate mutase, ERp99, galectin, glutathione peroxidase, glutathione-S-transferase, peroxiredoxin, RBP-1, SEC-2, SXP/RAL-2, transthyretin-like protein, ubiquitin extension protein and venom allergen protein. For most of these secreted proteins, no clear functional data are available yet. Interestingly, sequences similar to chorismate mutase and RBP-1 (SPRYSEC) were found, although both were thought to occur in sedentary nematodes only, and in the latter case even in cyst nematodes only. The presence of a chorismate mutase in a migratory nematode supports the hypothesis that this gene has a general role in modulating the plant’s defense process rather than a role in nematode feeding site formation [37,38]. The SPRYSECs are probably also involved in reducing the plant’s defense responses, as one of the SPRYSECs has been shown to change the turnover rate of plant defense proteins [39]. Four contigs corresponding to 38 reads were identified in the P. coffeae dataset with best similarity to cyst nematode SPRYSECs. Other SPRY domain containing contigs are present in the dataset, but these do not show the highest similarity to cyst nematode SPRYSECs. Therefore, further analysis of the identified SPRYSEC homologues is necessary to determine their evolutionary origin and whether they are truly involved in plant parasitism.

Potential candidate parasitism genes or effector genes were identified by three database searching strategies. These genes are interesting candidates for future functional studies, or for use as targets in nematode control. One promising control strategy is to disrupt the function of specific genes by RNAi, which has proven to be effective against nematodes [16]. Searching the ESTs for genes involved in the RNAi pathway revealed that P. coffeae has more or less the same RNAi machinery as the Meloidogyne species.
This suggests that potential RNAi strategies useful against root-knot nematodes could also be effective against these migratory nematodes.

In conclusion, the transcriptome of P. coffeae is a useful resource for understanding more about the biology of endoparasitic nematodes. It can be used for comparative and evolutionary studies as well as to select interesting new genes for functional studies. Moreover, the data will be valuable for the annotation of the upcoming genome (C. Opperman, personal communication).

Acknowledgements

AH is a post-doctoral fellow of the Research Foundation Flanders (FWO-Vlaanderen). SJ is a PhD student supported by VLIR-UOS.

References

[1] Bridge J, Fogain R, Speijer P. The root lesion nematodes of banana. Musa Pest Fact Sheet 2. Montpellier, France: INIBAP; 1997.
[2] Abad P, Gouzy J, Aury JM, et al. Genome sequence of the metazoan plant-parasitic nematode Meloidogyne incognita. Nat Biotechnol 2008;26:909–15.
[3] Opperman CH, Bird DM, Williamson VM, et al. Sequence and genetic map of Meloidogyne hapla: a compact nematode genome for plant parasitism. PNAS 2008;105:14802–7.
[4] van Megen H, van den Elsen S, Holterman M, et al. A phylogenetic tree of nematodes based on about 1,200 full length small subunit ribosomal DNA sequences. Nematology 2009;11:927–50.
[5] Cantacessi C, Campbell BE, Young ND, et al. Differences in transcription between free-living and CO2-activated third-stage larvae of Haemonchus contortus. BMC Genomics 2010;11:266.
[6] Cantacessi C, Mitreva M, Campbell BE, et al. First transcriptomic analysis of the economically important parasitic nematode Trichostrongylus colubriformis, using a next-generation sequencing approach. Infect Genet Evol 2010;10:1199–207.
[7] Cantacessi C, Mitreva M, Jex AR, et al. Massively parallel sequencing and analysis of the Necator americanus transcriptome. PLoS Negl Trop Dis 2010;4:e684.
[8] Danchin EGJ, Rosso M-N, Vieira P, et al. Multiple lateral gene transfers and duplications have promoted plant parasitism ability in nematodes. PNAS 2010;107:17651–6.
[9] Jones JT, Furlanetto C, Kikuchi T. Horizontal gene transfer from bacteria and fungi as a driving force in the evolution of plant parasitism in nematodes. Nematology 2005;7:641–6.

[10] Kyndt T, Haegeman A, Gheysen G. Evolution of GHF5 endoglucanase gene structure in plant-parasitic nematodes: no evidence for an early domain shuffling event. BMC Evol Biol 2008;8:305.
[11] Haegeman A, Kyndt T, Gheysen G. The role of pseudo-endoglucanases in the evolution of nematode cell wall modifying proteins. J Mol Evol 2010;70:441–52.
[12] Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res 2007;35:W182–5.
[13] Min XJ, Butler G, Storms R, Tsang A. OrfPredictor: predicting protein-coding regions in EST-derived sequences. Nucleic Acids Res 2005;33:W677–80.
[14] Bendtsen JD, Nielsen H, von Heijne G, Brunak S. Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 2004;340:783–95.
[15] Krogh A, Larsson B, von Heijne G, Sonnhammer ELL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 2001;305:567–80.
[16] Rosso MN, Jones JT, Abad P. RNAi and functional genomics in plant parasitic nematodes. Annu Rev Phytopathol 2009;47:207–32.
[17] Popeijus M, Blok VC, Cardle L, et al. Analysis of genes expressed in second stage juveniles of the potato cyst nematodes Globodera rostochiensis and G. pallida using the expressed sequence tag approach. Nematology 2000;2:567–74.
[18] Dautova M, Rosso MN, Abad P, Gommers FJ, Bakker J, Smant G. Single pass cDNA sequencing - a powerful tool to analyse gene expression in preparasitic juveniles of the southern root-knot nematode Meloidogyne incognita. Nematology 2001;3:129–39.
[19] Jacob J, Mitreva M, Vanholme B, Gheysen G. Exploring the transcriptome of the burrowing nematode Radopholus similis. Mol Genet Genomics 2008;280:1–17.
[20] Haegeman A, Jacob J, Vanholme B, Kyndt T, Mitreva M, Gheysen G. Expressed sequence tags of the peanut pod nematode Ditylenchus africanus: the first transcriptome analysis of an Anguinid nematode. Mol Biochem Parasitol 2009;167:32–40.
[21] Kikuchi T, Aikawa T, Kosaka H, Pritchard L, Ogura N, Jones JT. Expressed sequence tag (EST) analysis of the pine wood nematode Bursaphelenchus xylophilus and B. mucronatus. Mol Biochem Parasitol 2007;155:9–17.
[22] McCarter JP, Mitreva MD, Martin J, et al. Analysis and functional classification of transcripts from the nematode Meloidogyne incognita. Genome Biol 2003;4:R26.
[23] Huang GZ, Gao BL, Maier T, et al. A profile of putative parasitism genes expressed in the esophageal gland cells of the root-knot nematode Meloidogyne incognita. Mol Plant Microbe Interact 2003;16:376–81.
[24] Dubreuil G, Magliano M, Deleury E, Abad P, Rosso MN. Transcriptome analysis of root-knot nematode functions induced in the early stages of parasitism. New Phytol 2007;176:426–36.
[25] Mitreva M, Elling AA, Dante M, et al. A survey of SL1-spliced transcripts from the root-lesion nematode Pratylenchus penetrans. Mol Genet Genomics 2004;272:138–48.
[26] Furlanetto C, Cardle L, Brown DJF, Jones JT. Analysis of expressed sequence tags from the ectoparasitic nematode Xiphinema index. Nematology 2005;7:95–104.
[27] Wubben MJ, Callahan FE, Scheffler BS. Transcript analysis of parasitic females of the sedentary semi-endoparasitic nematode Rotylenchulus reniformis. Mol Biochem Parasitol 2010;172:31–40.
[28] Karim N, Jones JT, Okada H, Kikuchi T. Analysis of expressed sequence tags and identification of genes encoding cell-wall-degrading enzymes from the fungivorous nematode Aphelenchus avenae. BMC Genomics 2009;10:525.
[29] Elling AA, Mitreva M, Gai XW, et al. Sequence mining and transcript profiling to explore cyst nematode parasitism. BMC Genomics 2009;10:58.
[30] Bettencourt R, Pinheiro M, Egas C, et al. High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent mussel Bathymodiolus azoricus. BMC Genomics 2010:11.
[31] Wang W, Wang YJ, Zhang Q, Qi Y, Guo DJ. Global characterization of Artemisia annua glandular trichome transcriptome using 454 pyrosequencing. BMC Genomics 2009;10:465.
[32] Parchman TL, Geist KS, Grahnen JA, Benkman CW, Buerkle CA. Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. BMC Genomics 2010;11:180.
[33] Vanholme B, Haegeman A, Jacob J, Cannoot B, Gheysen G. Arabinogalactan endo-1,4-beta-galactosidase: a putative plant cell wall-degrading enzyme of plant-parasitic nematodes. Nematology 2009;11:739–47.
[34] Flors V, Ton J, Jakab G, Mauch-Mani B. Abscisic acid and callose: team players in defence against pathogens? J Phytopathol 2005;153:377–83.
[35] Hückelhoven R. Cell wall-associated mechanisms of disease resistance and susceptibility. Annu Rev Phytopathol 2007;45:101–27.
[36] Yu YT, Kang ZS, Han QM, Buchenauer H, Huang LL. Immunolocalization of 1,3-beta-glucanases secreted by Gaeumannomyces graminis var. tritici in infected wheat roots. J Phytopathol 2010;158:344–50.
[37] Vanholme B, Kast P, Haegeman A, Jacob J, Grünewald W, Gheysen G. Structural and functional investigation of a secreted chorismate mutase from the plant-parasitic nematode Heterodera schachtii in the context of related enzymes from diverse origins. Mol Plant Pathol 2009;10:189–200.
[38] Jones JT, Furlanetto C, Phillips MS. The role of flavonoids produced in response to cyst nematode infection of Arabidopsis thaliana. Nematology 2007;9:671–7.
[39] Rehman S, Postma W, Tytgat T, et al. A secreted SPRY domain-containing protein (SPRYSEC) from the plant-parasitic nematode Globodera rostochiensis interacts with a CC-NB-LRR protein from a susceptible tomato. Mol Plant Microbe Interact 2009;22:330–40.

[40] Hewezi T, Howe PJ, Maier TR, et al. Arabidopsis spermidine synthase is targeted by an effector protein of the cyst nematode Heterodera schachtii. Plant Physiol 2010;152:968–84.
[41] Jaubert S, Ledger TN, Laffaire JB, Piotte C, Abad P, Rosso MN. Direct identification of stylet secreted proteins from root-knot nematodes by a proteomic approach. Mol Biochem Parasitol 2002;121:205–11.
[42] Huang GZ, Dong RH, Allen R, Davis EL, Baum TJ, Hussey RS. A root-knot nematode secretory peptide functions as a ligand for a plant transcription factor. Mol Plant Microbe Interact 2006;19:463–70.
[43] Lee C, Chronis D, Kenning C, et al. The novel cyst nematode effector protein 19C07 interacts with the Arabidopsis auxin influx transporter LAX3 to control feeding site development. Plant Physiol 2011;155:866–80.
[44] de Lima de Souza DS, de Souza JDA, Grossi-de-Sá M, et al. Ectopic expression of a Meloidogyne incognita dorsal gland protein in tobacco accelerates the formation of the nematode feeding site. Plant Sci 2011;180:276–82.
[45] Patel N, Hamamouch N, Li CY, et al. A nematode effector protein similar to annexins in host plants. J Exp Bot 2010;61:235–48.
[46] Gao BL, Allen R, Maier T, et al. Characterisation and developmental expression of a chitinase gene in Heterodera glycines. Int J Parasitol 2002;32:1293–300.
[47] Long H, Wang X, Xu J. Molecular cloning and life-stage expression pattern of a new chorismate mutase gene from the root-knot nematode Meloidogyne arenaria. Plant Pathol 2006;55:559–63.
[48] Gao BL, Allen R, Maier T, Davis EL, Baum TJ, Hussey RS. The parasitome of the phytonematode Heterodera glycines. Mol Plant Microbe Interact 2003;16:720–6.
[49] Wang XH, Allen R, Ding XF, et al. Signal peptide-selection of cDNA cloned directly from the esophageal gland cells of the soybean cyst nematode Heterodera glycines. Mol Plant Microbe Interact 2001;14:536–44.
[50] Jones JT, Reavy B, Smant G, Prior AE. Glutathione peroxidases of the potato cyst nematode Globodera rostochiensis. Gene 2004;324:47–54.
[51] Semblat JP, Rosso MN, Hussey RS, Abad P, Castagnone-Sereno P. Molecular cloning of a cDNA encoding an amphid-secreted putative avirulence protein from the root-knot nematode Meloidogyne incognita. Mol Plant Microbe Interact 2001;14:72–9.
[52] Scholl EH, Thorne JL, McCarter JP, Bird DM. Horizontally transferred genes in plant-parasitic nematodes: a high-throughput genomic approach. Genome Biol 2003;4:R39.
[53] Robertson L, Robertson WM, Sobczak M, et al. Cloning, expression and functional characterisation of a peroxiredoxin from the potato cyst nematode Globodera rostochiensis. Mol Biochem Parasitol 2000;111:41–9.
[54] Sacco MA, Koropacka K, Grenier E, et al. The cyst nematode SPRYSEC protein RBP-1 elicits Gpa2- and RanGAP2-dependent plant cell death. PLoS Pathog 2009;5:e1000564.
[55] Prior A, Jones JT, Blok VC, et al. A surface-associated retinol- and fatty acid-binding protein (Gp-FAR-1) from the potato cyst nematode Globodera pallida: lipid binding activities, structural analysis and expression pattern. Biochem J 2001;356:387–94.
[56] Jones JT, Smant G, Blok VC. SXP/RAL-2 proteins of the potato cyst nematode Globodera rostochiensis: secreted proteins of the hypodermis and amphids. Nematology 2000;2:887–93.
[57] Jacob J, Vanholme B, Haegeman A, Gheysen G. Four transthyretin-like genes of the migratory plant-parasitic nematode Radopholus similis: members of an extensive nematode-specific family. Gene 2007;402:9–19.
[58] Ding X, Shields J, Allen R, Hussey RS. Molecular cloning and characterisation of a venom allergen AG5-like cDNA from Meloidogyne incognita. Int J Parasitol 2000;30:77–81.