Gene 450 (2010) 76–84
Accumulation, functional annotation, and comparative analysis of expressed sequence tags in eggplant (Solanum melongena L.), the third pole of the genus Solanum species after tomato and potato

Hiroyuki Fukuoka ⁎, Hirotaka Yamaguchi, Tsukasa Nunome, Satomi Negoro, Koji Miyatake, Akio Ohyama

National Institute of Vegetable and Tea Science, NARO, 360 Kusawa, Ano, Tsu, Mie 514-2392, Japan
Article info

Article history: Received 11 August 2009; received in revised form 16 October 2009; accepted 16 October 2009; available online 24 October 2009. Received by C. Feschotte.

Keywords: Solanaceae; EST; Unigene; Transcriptome
Abstract

Eggplant (Solanum melongena L.) is a widely grown vegetable crop that belongs to the genus Solanum, which comprises more than 1000 species with wide genetic and phenotypic variation. Unlike tomato and potato, Solanum crops that belong to subgenus Potatoe and have been targets of comprehensive genomic studies, eggplant is endemic to the Old World and belongs to a different subgenus, Leptostemonum; it would therefore be a unique member for comparative molecular biology in Solanum. In this study, more than 60,000 eggplant cDNA clones from various tissues and treatments were sequenced from both the 5′- and 3′-ends, and a unigene set consisting of 16,245 unique sequences was constructed. Functional annotations based on sequence similarity to known plant reference datasets revealed a distribution of functional categories closely similar to that of tomato, while 1316 unigenes were suggested to be eggplant-specific. Sequence-based comparative analysis using putative orthologous gene groups set up by reciprocal sequence comparison among six solanaceous species suggested that eggplant and its wild ally Solanum torvum clustered separately from subgenus Potatoe species, and that all Solanum species clustered separately from the genus Capsicum. Microsatellite motif distribution differed among species and was likely to coincide with the phylogenetic relationships. Furthermore, the eggplant unigene dataset proved useful in transcriptome analysis by the SAGE strategy, in which a considerable number of short tag sequences of interest were successfully assigned to unigenes and their functional annotations. The eggplant ESTs and 16k unigene set developed in this study would be a useful resource not only for molecular genetics and breeding in eggplant itself, but also for expanding the scope of comparative biology in Solanum species. © 2009 Elsevier B.V. All rights reserved.
Abbreviations: BAC, bacterial artificial chromosome; CHX, cycloheximide; CPA, 4-chlorophenoxyacetic acid; DPA, days post anthesis; EST, expressed sequence tag; GO, gene ontology; PCR, polymerase chain reaction; QV, quality value; SAGE, serial analysis of gene expression; SSR, simple sequence repeat; UPGMA, unweighted pair group method with arithmetic mean.
⁎ Corresponding author (H. Fukuoka). Tel.: +81 59 268 4651; fax: +81 59 268 1339.
doi:10.1016/j.gene.2009.10.006

1. Introduction

The Solanaceae is one of the plant families most involved in our daily lives; it includes economically important crops such as tomato, potato, pepper, and eggplant. The genus Solanum is the largest of the solanaceous genera, comprising 1000–1100 species including eggplant (Solanum melongena L.), potato (Solanum tuberosum L.) (D'Arcy, 1991), and tomato, which has recently been reclassified into the genus as Solanum lycopersicum L. (formerly Lycopersicon esculentum Mill.) (reviewed by Asamizu and Ezura, 2009). Because tomato is a model plant for the Solanaceae, extensive efforts have been made to accumulate molecular genetic information about it. High-density molecular marker linkage maps have been constructed (Frary et al., 2005; Wu et al., 2006). These maps provide the basis for the ongoing tomato genome sequencing project, which is generating a high-quality euchromatic genome sequence for use as a reference genome for the Solanaceae (Mueller et al., 2005b, 2009). The saturated tomato linkage maps also provide a basis for comparative genomics among other solanaceous plants, including eggplant (Doganlar et al., 2002; Wu et al., 2009a) and pepper (Wu et al., 2009b).

Eggplant and the closely related Solanum species belonging to the subgenus Leptostemonum are some of the most important vegetable crops in Asia, the Middle and Near East, Southern Europe, and Africa (Daunay and Lester, 1988). World production of eggplant has been growing year by year during the last two decades, and reached 32 million tons in 2007, which was roughly one-fourth of the total tomato production (FAOSTAT, http://faostat.fao.org/). However, eggplant has been less recognized as a target for molecular genetics research than other solanaceous species. One reason for this may be that many of the agronomically important traits in eggplant are shared by tomato, potato, and pepper (Capsicum annuum L.), and in most cases, the genetics of these traits has been investigated in more detail in those species (Wu et al., 2009a). However, eggplant has many unique traits, including extra-large fruit size, high temperature-
and water-stress tolerance, parthenocarpy without any negative pleiotropic effects, and stable Verticillium and bacterial wilt resistance (Sakata et al., 1996; Saito et al., 2009). While most solanaceous crops are believed to have originated in the Americas, eggplant is endemic to the Old World (Daunay and Lester, 1988). Genetic divergence analysis of eggplant genetic resources revealed wide genetic variation (Polignano et al., 2009), which should give new insights into a variety of Solanum species, not only eggplant. Therefore, the accumulation of genomic information about eggplant will not only facilitate genetics and molecular breeding methodology in eggplant itself but also make it a valuable and unique member of the solanaceous plant group, useful in comparative biological studies of genetics, physiology, development, and evolution.

As a functional subset of whole genomic information, the accumulation of expressed sequence tag (EST) sequences is a promising strategy for plant studies (Rudd, 2003). In tomato, 239,593 ESTs were assembled into 34,829 unigenes (SOL Genomics Network, Mueller et al., 2005a), 280,913 ESTs were assembled into 45,729 unigenes (MiBASE, Yano et al., 2006), and 333,385 ESTs were assembled into 46,849 unigenes (DFCI Gene Index, Quackenbush et al., 2001). These EST/unigene datasets have been utilized for digital expression analysis in order to obtain an overview of the whole transcriptome under different spatial/temporal biological conditions in tomato, and furthermore, for comparative transcriptomics with unrelated plant species, Arabidopsis and grape (Fei et al., 2004). The datasets were also found to be useful targets for mining single nucleotide polymorphisms for DNA marker development (Yamamoto et al., 2005). For two other solanaceous crops, pepper and potato, 32,399 and 61,372 unigenes, respectively, have been released most recently from the DFCI Gene Index project.
Furthermore, a database of more than 120,000 ESTs has been published for pepper, which is designed to provide a workbench for expression pattern analysis under different biological conditions and for comparative analysis with the transcriptomes of other solanaceous plants (Kim et al., 2008). On the other hand, very limited information is available for eggplant at present; only 1841 unigenes, constructed using 3181 ESTs, have been released from the SOL Genomics Network (SGN).

In this study, more than 60,000 cDNA clones were sequenced, from which 16,245 unigenes were created. Sequence comparisons among unigenes of several solanaceous species, Arabidopsis, and other known plant reference sequences were performed to estimate the similarity and uniqueness of the eggplant dataset. Phylogenetic analysis using putative ortholog sets comprising six solanaceous species indicated that eggplant and Solanum torvum, one of the allied species belonging to the subgenus Leptostemonum, clustered separately from the subgenus Potatoe species potato, tomato, and their wild ally, Solanum pennellii. Preliminary results of EST-SSR characterization and gene expression comparison based on the SuperSAGE method are presented as examples of possible uses for the ESTs.

2. Material and methods

2.1. Library construction and sequencing sample preparation

Total RNA was isolated from various eggplant tissue samples (see Table 1) using TRIzol reagent (Invitrogen, Carlsbad, CA, USA), and used for cDNA library construction, mainly using a Creator-SMART cDNA library construction kit (Clontech Laboratories Inc., Mountain View, CA, USA). In addition, a full-length-enriched normalized library (SmFL) was constructed using a cDNA preparation made by the CAP trapping method described by Carninci et al. (2000). From each library, colonies were randomly selected, cultured in 96-well plates,
Table 1
Summary of cDNA libraries for EST sequencing.

Library  Cultivar         Tissue type                   Developmental stage/treatment                                       Clone number  High-quality sequences
AE0      AE-P03           Ovary                         Anthesis                                                            2688          4051
AE5      AE-P03           Ovary                         5 days post-anthesis                                                2688          3965
OVL      AE-P03           Ovary                         2 days pre-anthesis                                                 1536          2339
OVS      AE-P03           Ovary                         5 days pre-anthesis                                                 2304          3758
YFR      AE-P03           Fruit                         14 days after pollination                                           2304          3689
LS0      LS1934           Ovary                         Anthesis                                                            1536          2530
LS5      LS1934           Ovary                         5 days post-anthesis                                                1536          2353
PLA      Nakate-Shinkuro  Placenta and immature seeds   10 days post-anthesis                                               3072          5243
PER      Nakate-Shinkuro  Pericarp                      10 days post-anthesis                                               3072          4916
CPA      LS1934           Ovary                         24 h after 4-CPA and CHX treatment at anthesis                      1920          3038
CHX      LS1934           Ovary                         24 h after CHX treatment at anthesis                                1920          3324
CAL      Nakate-Shinkuro  Callus                        3 days after subculture                                             2304          3303
MLF      Nakate-Shinkuro  Mature leaf                   Fully expanded                                                      2688          4571
PED      AE-P03           Peduncle                      Anthesis                                                            2688          4365
PST      AE-P03           Petal and stamen              Anthesis                                                            1152          1842
ROT      Nakate-Shinkuro  Root                          Seedling                                                            1152          1838
SHT      Nakate-Shinkuro  Shoot                         Seedling                                                            2688          3891
SEP      Nakate-Shinkuro  Sepal                         10 days post-anthesis                                               3072          4996
ROL      LS1934           Root                          3 weeks after germination                                           2688          3196
RBW      LS1934           Root                          3 weeks after germination, 24 h after infection of R. solanacearum  2688          3509
SmFL(a)  AE-P03           Ovary                         Mixture of 2 days pre-anthesis and 2 days post-anthesis             15,360        27,369
         Anominori        Fruit                         14 days after pollination
         Daitaro          Leaf and root                 Mature plant, mixture of with/without cadmium
Total                                                                                                                       61,056        98,086

(a) Cap-trapped normalized library using a mixture of total RNA preparations.
and isolated plasmids were used for sequencing from both ends. Sequencing was performed using BigDye v3 sequencing premix and a 3730xl DNA sequencer (Applied Biosystems, Foster City, CA, USA).

2.2. Sequence data trimming, clustering, and assembling unigene sequences

Sequence data processing was performed using the phred/phrap/cross_match package (Ewing and Green, 1998; Ewing et al., 1998). Trace files obtained from the DNA sequencer were basecalled by phred, vector and adapter sequences were trimmed by cross_match, and the data with a high quality value (QV ≥ 20) were extracted and deposited in the DDBJ/EMBL/GenBank DNA databank. The forward and reverse read sequences of each clone were assembled by phrap and used for clustering. When the forward and reverse reads from a clone did not overlap, the reverse read sequence was reverse-complemented and joined to the end of the forward sequence, with 20 Ns inserted at the joint. All of the assembled/joined sequences were clustered using a global alignment-based VISUALBIO-Clustering system (NTT Software Co., Yokohama, Japan), and a representative clone was selected from each cluster by the following rules: first, the clone that gave the highest sum of Smith-Waterman scores among the cluster members, and then, the clone with the longest putative ORF. A consensus sequence was obtained by assembling all trace data belonging to each cluster using phrap, followed by vector trimming and extraction of the high-QV sequence. In the same way as described above, consensus sequences built from 5′- and 3′-reads were appropriately oriented and joined with 20 Ns if they did not overlap.

2.3. Database searches and functional annotation based on gene ontology (GO)

Nucleotide sequences of the unigenes were searched against amino acid sequences of plant RefSeq data (release 27, 92,764 records) (Pruitt et al., 2007) and the Arabidopsis proteome (TAIR8) by BLASTX (E-value ≤ 1E−5), and were assigned the annotations of the matching known sequences.
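As an illustration of the top-hit assignment under the stated cutoff, the following minimal sketch selects one best hit per unigene. It assumes that the BLASTX results are available as 12-column tab-separated text (the paper does not state the output format); the function name, the sequence identifiers, and the tie-breaking by bit score are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # cutoff stated in the text (E-value <= 1E-5)


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue, bitscore)} keeping one top hit per query.

    Assumption: 12 tab-separated columns per hit (query, subject, % identity,
    alignment length, mismatches, gap opens, q.start, q.end, s.start, s.end,
    E-value, bit score). The paper does not state which output format was used.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, comment, or malformed lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue  # not significant under the stated threshold
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits for three made-up unigene IDs (not data from the paper).
EXAMPLE = "\n".join([
    "Sm0001\tREF_A\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-90\t330.0",
    "Sm0001\tREF_B\t90.0\t210\t21\t0\t1\t630\t1\t210\t1e-110\t395.0",
    "Sm0002\tREF_C\t60.0\t80\t32\t0\t10\t249\t5\t84\t2e-08\t55.1",
    "Sm0003\tREF_D\t35.0\t40\t26\t0\t1\t120\t1\t40\t0.01\t30.2",
])
hits = best_hits(EXAMPLE)
annotated = sorted(hits)  # unigenes that received a significant top hit
```

In this toy input, the third query has only a hit above the cutoff and therefore stays unannotated, which mirrors how unigenes without significant homology are set aside for the later TBLASTX search.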
In order to classify the sequences into functional groups, the results of the BLASTX search against the KOG database (Tatusov et al., 1997, 2003) were utilized. Sequences with no homology to the known genes were searched using TBLASTX (E-value ≤ 1E−5) against the tomato and potato unigene sets built by the DFCI Gene Index project (Quackenbush et al., 2001), LeGI_v12 (46,849 sequences) and StGI_v11 (56,712 sequences), respectively. GO terms were assigned to the eggplant unigenes by adopting the terms that had been assigned to the top-hit sequences of the known datasets mentioned above, and then they were mapped to the higher level categories (plant GO Slim) using GOSlimViewer (McCarthy et al., 2007).

2.4. Construction of putative ortholog sets and phylogenetic analysis

As unigene datasets of tomato, potato, and pepper, LeGI_v12, StGI_v11, and CaGI_v4 (DFCI Pepper Gene Index v.4, 32,399 sequences) were used, respectively. The unigene set of S. pennellii, comprising 3219 sequences, was obtained from the FTP site at the SGN (ftp://ftp.sgn.cornell.edu). The unigene set of S. torvum comprised 6296 unigenes built from 28,379 ESTs obtained from leaf and root cDNA libraries (Yamaguchi et al., in press). Construction of putative ortholog sets of ESTs of the six Solanaceae species was performed as follows. First, the unigene datasets of the four crop species (eggplant, tomato, potato, and pepper) were reciprocally compared to each other by the Smith-Waterman algorithm using the SSEARCH program. If “reciprocal best hit” relationships, in which the first sequence finds the second sequence as its best hit in the second species and vice versa (Li et al., 2003), existed among all combinations of unigenes of the four species, the four unigenes (one unigene for each species) were presumed to comprise an ortholog group.
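The four-species “reciprocal best hit” rule can be sketched as follows. This is a minimal illustration that assumes the best hit of every unigene in every other species has already been tabulated from the SSEARCH output; the data layout, the function names, and the sequence identifiers are hypothetical, and the scoring step itself is not reproduced.

```python
from itertools import permutations


def is_reciprocal_group(group, best):
    """Check that every ordered species pair in the group is a best-hit pair.

    group: {species: unigene_id}
    best:  {(species_a, species_b): {id_in_a: best_hit_id_in_b}}
    """
    return all(
        best.get((a, b), {}).get(group[a]) == group[b]
        for a, b in permutations(group, 2)
    )


def ortholog_groups(species, best):
    """Seed a candidate group from each unigene of the first species."""
    seed, others = species[0], species[1:]
    groups = []
    for uid in best[(seed, others[0])]:
        group = {seed: uid}
        for sp in others:
            hit = best.get((seed, sp), {}).get(uid)
            if hit is None:
                break  # no best hit in this species, so no complete group
            group[sp] = hit
        else:
            if is_reciprocal_group(group, best):
                groups.append(group)
    return groups


# Toy best-hit tables built from two made-up candidate sets.
SPECIES = ["eggplant", "tomato", "potato", "pepper"]
best = {}
for members in (
    {"eggplant": "E1", "tomato": "T1", "potato": "P1", "pepper": "C1"},
    {"eggplant": "E2", "tomato": "T2", "potato": "P2", "pepper": "C2"},
):
    for (sp_a, id_a), (sp_b, id_b) in permutations(members.items(), 2):
        best.setdefault((sp_a, sp_b), {})[id_a] = id_b
best[("pepper", "tomato")]["C2"] = "T9"  # break reciprocity in the second set

groups = ortholog_groups(SPECIES, best)
```

Only the first candidate set survives, because a single non-reciprocal pair is enough to reject a group under the all-combinations requirement.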
Subsequently, “reciprocal best hit” sequence searches were performed separately between eggplant and S. torvum and between tomato and S. pennellii, and when such sequences existed, one putative orthologous unigene group of six species was completed. The nucleotide sequences of the six species were aligned using the ClustalW program, and if the longest aligned sequence block comprising all six sequences was longer than 150 bp and contained gaps less than 10 bp long, subsequences of the sequence block were used for sequence comparison analysis. If at least one sequence contained a gap, the nucleotide data at the gap position were ignored in all sequence combinations. The genetic distances were estimated for each ortholog group by Kimura's 2-parameter method using the ‘dnadist’ program in the PHYLIP v.3.6 package distributed by its author, Prof. Joseph Felsenstein of the University of Washington (http://evolution.genetics.washington.edu/phylip.html). A phylogenetic tree was constructed by the UPGMA method using the ‘neighbor’ program of the PHYLIP package, using a merged nucleotide sequence created by joining all of the subsequences used for calculation of the genetic distances.

2.5. Gene expression analysis

Total RNA samples were isolated from the ovaries of eggplant cv. AE-P03 at anthesis and 2 days post-anthesis (DPA) as described above. SuperSAGE libraries were constructed according to Matsumura et al. (2003), and from each library, 1152 clones were randomly isolated and sequenced from one end using the universal M13 forward primer. Sequencing and data trimming were done as described above. Ditag sequence extraction, tag number counting, and statistical analysis were performed using the SAGE2000 software kindly provided by Prof. Kenneth W. Kinzler at Johns Hopkins University (http://www.sagenet.org/protocol/index.htm).

3. Results and discussion

3.1. EST sequencing and clustering analysis for unigene set construction

In total, 21 eggplant cDNA libraries were constructed from the various samples listed in Table 1, and 61,056 cDNA clones were sequenced from both the 5′- and 3′-ends. After removing the unsuccessful reads, 98,086 high-quality sequences were obtained and registered in the DDBJ/EMBL/GenBank database (accession nos. FS000001–FS098086). The total number of nucleotides reached 50,438,137, which would rank within the top 200 organisms (DDBJ release 78, June 2009). The high-quality sequence data were clustered into 16,245 groups using VISUALBIO-Clustering software, a commercialized version of a program developed through a mouse full-length cDNA project (Osato et al., 2002). Out of the 16,245 clusters, 7680 comprised two or more clones, and 8565 were single clones. A consensus sequence was calculated for each cluster by phrap, and the resulting 16,245 sequences were used for further analysis as a 16k eggplant unigene set. Based on the top-hit RefSeq record of the BLASTX search, around 50% of the unigenes were assumed to be full-length, covering the translation initiation codon (data not shown).

Compared to the known unigene datasets in other plant species, the 16k unigene set clearly represents only a limited portion of the whole eggplant transcriptome. Based on the largest tomato unigene set reported (46,849 in LeGI_v12), the coverage of the 16k eggplant set could be estimated at about 35%. The unigene number could also be over- or underestimated depending on the procedure adopted for each species. When the TGI Clustering tools, which had been used for Gene Index construction, were applied to the eggplant EST data, 21,162 unigenes were generated, which accounted for 45% of the number of the tomato unigenes, whereas some duplications
Fig. 1. Classification of the 16k eggplant unigenes based on sequence similarity searches against known plant sequence datasets. Figures indicate the number of eggplant unigenes that show a significant sequence similarity (E-value ≤ 1.0E−5) by BLAST searches on deduced amino acid sequences. Figures in parentheses indicate the ratio of the unigene number in each class versus the total unigene number (16,245).
were detected in the LeGI unigenes (data not shown). Although such ambiguity and incompleteness are involved, the accumulation of the 98k EST data and the construction of the 16k unigene set in this study are valuable as the first nucleotide sequence dataset of considerable size in eggplant, which would cover roughly 40% of the whole eggplant transcriptome, allowing this species to appear on the stage of basic and applied molecular genetics.

3.2. Similarity search against reference databases and putative assignment of functional annotations

The eggplant 16k unigene set was subjected to a BLASTX search against the Arabidopsis predicted proteome (TAIR8) and the NCBI plant RefSeq, and to a TBLASTX search against the Gene Index unigene sets (Quackenbush et al., 2001) of tomato and potato. The results are summarized in Fig. 1. The homology search against Arabidopsis
revealed that 12,670 unigenes (78.0%) had significant homology (E-value ≤ 1E−5), while the other 3575 unigenes (22.0%) showed no significant correspondence. In the same way, 12,813 unigenes (78.9%) corresponded to plant genes in the RefSeq dataset. The proportion of eggplant unigenes with significant matches in the standard model datasets (TAIR and RefSeq) is comparable to the previous report on tomato by van der Hoeven et al. (2002), in which nearly 30% of the tomato unigenes had no significant matches at the amino acid level to the Arabidopsis genomic sequence. Out of the 3320 unigenes that showed no significant homology to RefSeq and/or TAIR8, 2004 (60.4%) hit tomato and/or potato unigenes in a TBLASTX search (E-value ≤ 1E−5); these would be common to Solanum and/or the Solanaceae and potentially absent in distant species. Finally, 1316 unigenes (8.1% of 16,245) were identified as eggplant-specific expressed sequences that had no significant homology at the amino acid level to any dataset examined. The function of the eggplant-specific unigenes was difficult to deduce, since no significant hits were observed even by an InterPro motif search (data not shown). As described later, however, developmental process-specific up-regulation was observed in some of these unigenes, which would suggest the existence of a biological function.

In order to assign functional categories to the 16k eggplant unigenes and to compare their distribution with that of tomato, a BLASTX search (E-value ≤ 1E−5) against the dataset of eukaryotic clusters of orthologous groups of proteins (KOGs) (Tatusov et al., 1997, 2003) was performed using both datasets as queries. As shown in Fig. 2, 8083 eggplant unigenes were assigned to KOG entries and their functional categories, showing a distribution comparable to that in tomato. This result suggested that the eggplant unigene set built in this study would represent the eggplant transcriptome without a significant bias.
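The counts and percentages quoted for the similarity classes can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers given in the text; the variable names and labels are illustrative.

```python
TOTAL_UNIGENES = 16245

counts = {
    "hit in Arabidopsis (TAIR8)": 12670,
    "no hit in Arabidopsis": 3575,
    "hit in plant RefSeq": 12813,
    "eggplant-specific": 1316,
}


def percent(part, whole):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100.0 * part / whole, 1)


# Share of the 16k unigene set in each class.
shares = {label: percent(n, TOTAL_UNIGENES) for label, n in counts.items()}

# Unigenes without a RefSeq/TAIR8 hit, split by the TBLASTX result.
no_reference_hit = 3320
solanaceae_hit = 2004  # hit tomato and/or potato unigenes
solanaceae_share = percent(solanaceae_hit, no_reference_hit)
eggplant_specific = no_reference_hit - solanaceae_hit
```

The remainder of the 3320 unigenes after removing the 2004 with tomato or potato hits reproduces the 1316 eggplant-specific sequences, so the reported classes are internally consistent.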
In order to compare the eggplant dataset to the other known unigene sets using a different platform, Gene Ontology (GO) annotation was performed. For each eggplant unigene, GO identifiers (IDs) were assigned by adopting the corresponding GO IDs of the tomato, potato, and Arabidopsis sequences that gave the maximum score for that eggplant unigene in the BLASTX (TAIR8) and TBLASTX (LeGI and StGI) searches. In order to standardize the possible discrepancies in the depth of the
Fig. 2. Functional annotations of the eggplant and tomato unigenes based on the KOG database. Inside, the four KOG subcategories are indicated: ‘information storage and processing’, ‘cellular processes and signaling’, ‘metabolism,’ and ‘poorly characterized’. Functional groups are indicated by a one-letter code: J, translation, ribosomal structure, and biogenesis; A, RNA processing and modification; K, transcription; L, replication, recombination, and repair; B, chromatin structure and dynamics; D, cell cycle control, cell division, chromosome partitioning; Y, nuclear structure; V, defense mechanisms; T, signal transduction mechanisms; M, cell wall/membrane/envelope biogenesis; N, cell motility; Z, cytoskeleton; W, extracellular structures; U, intracellular trafficking, secretion, and vesicular transport; O, posttranslational modification, protein turnover, chaperones; C, energy production and conversion; G, carbohydrate transport and metabolism; E, amino acid transport and metabolism; F, nucleotide transport and metabolism; H, coenzyme transport and metabolism; I, lipid transport and metabolism; P, inorganic ion transport and metabolism; Q, secondary metabolites biosynthesis, transport, and catabolism; R, general function prediction only; S, function unknown.
annotation of the different species made by the different parties, the reciprocal GO assignment procedure based on the BLAST results was also done among the three reference species. The GO IDs were mapped to the higher level categories (plant GO Slim) in order to obtain a comparative overview of the distribution of GO terms in the ‘Biological Process’ aspect of the unigenes. In general, the distribution of the GO-based annotation in the eggplant unigene set showed no remarkable differences from those of the reference species (Fig. 3), which reinforced the results of the KOG-based annotation mentioned above. In total, 37 GO Slim terms were identified to which at least 1% of the eggplant unigenes were assigned. The percentage of unigenes assigned to each GO Slim category was compared between eggplant and each of the three reference species (111 comparisons in total = 37 terms × 3 reference species), and the differences fell in the range between 0.5-fold and 2-fold, except for nine comparisons. These results suggested that the eggplant transcriptome of annotated genes was comparable to those of the known higher plants and that it was evenly, though incompletely, represented by the 16k unigene set. Although larger differences (more than 2-fold or less than 0.5-fold) were observed in nine cases, this would not necessarily mean that they were essential and species-specific differences between the transcriptomes of the two species being compared. Of the nine cases, six were found in the comparison with the potato unigene set. In potato, most of the EST data were collected from specific vegetative tissues such as stolon, tuber, and leaf/petiole, and therefore, the unigene set would be skewed to some extent, which is reflected in the biased representation of specific functional categories.
For instance, although the unigenes assigned to GO:0009790 (embryonic development) were 2.3-fold less represented in potato than in eggplant, this was likely attributable to the mainly tuber-focused EST sequencing. Lin et al. (2005) reported that in tomato and coffee, which belongs to the Rubiaceae, one of the families most closely related to the Solanaceae, the unigenes assigned to the GO Slim categories ‘carbohydrate metabolism’ or ‘biosynthesis’ were significantly over-represented compared to Arabidopsis, as was also the case with ‘other metabolism’, ‘catabolism’, ‘protein biosynthesis’, and so on. Also in our results, the percentage of unigenes assigned to the terms ‘biosynthesis’ or ‘carbohydrate metabolism’ in tomato was notably large (21.0% and 13.1%, respectively) compared to that of Arabidopsis (11.8% and 4.2%, respectively). It was found, however, that those unigenes in eggplant (13.9% and 6.0%, respectively) and potato (13.9% and 4.9%, respectively) were represented comparably to Arabidopsis and were remarkably less represented than those in tomato. The data suggested that the over-representation of ‘carbohydrate metabolism’- and ‘biosynthesis’-related genes found in tomato and coffee was not common to the Solanaceae and related plant families and that there would be species-specific genetic backgrounds. A number of factors such as the origin, coverage, assembly method, and annotation procedure of EST data would influence comprehension of the comparative status of the transcriptomes. In our data, and also in the previous report for tomato and coffee, the percentage of genes relative to the total dataset in most of the GO categories was obviously higher than for Arabidopsis, which might indicate that a certain artificial bias exists.
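The fold-difference criterion (between 0.5-fold and 2-fold) can be illustrated with the percentages quoted above for ‘biosynthesis’ and ‘carbohydrate metabolism’. The sketch below is a worked example under that criterion only; the function name and data layout are hypothetical, and whether a given pair flagged here is one of the nine comparisons marked in Fig. 3 is not stated in the text.

```python
# Percentages of unigenes per GO Slim term, as quoted in the text.
percent_by_species = {
    "eggplant": {"biosynthesis": 13.9, "carbohydrate metabolism": 6.0},
    "tomato": {"biosynthesis": 21.0, "carbohydrate metabolism": 13.1},
    "potato": {"biosynthesis": 13.9, "carbohydrate metabolism": 4.9},
    "Arabidopsis": {"biosynthesis": 11.8, "carbohydrate metabolism": 4.2},
}


def fold_differences(table, focal="eggplant", low=0.5, high=2.0):
    """Return (term, reference, fold, outside_range) for the focal species."""
    rows = []
    for reference, terms in table.items():
        if reference == focal:
            continue
        for term, ref_pct in terms.items():
            fold = table[focal][term] / ref_pct
            rows.append((term, reference, fold, not (low <= fold <= high)))
    return rows


rows = fold_differences(percent_by_species)
flagged = [(term, ref) for term, ref, fold, outside in rows if outside]
```

With these two terms, only the eggplant-to-tomato ratio for ‘carbohydrate metabolism’ (6.0/13.1, about 0.46-fold) falls outside the range, which agrees with the observation that these categories are less represented in eggplant than in tomato.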
Further sequence accumulation and more detailed and sophisticated bioinformatic data processing would be helpful in order to improve the perception of comparative genetics in the Solanum species and their relatives.

3.3. GO-based comparison of eggplant transcriptomes in fruit-related and non-fruit-related organs

EST datasets obtained from different tissues/organs or biological treatments have been used to obtain an overall perspective of differential gene expression (Ewing et al., 1999; Fei et al., 2004; Ogihara et al., 2003). In this study, the number of cDNA clones sequenced from
Fig. 3. Comparison of the GO-based functional distribution of unigenes in the eggplant dataset and three reference datasets of tomato, potato, and Arabidopsis. The horizontal axis indicates the percentage of unigenes to which the respective GO Slim term shown on the vertical axis is assigned. Combinations of a GO Slim category and a reference dataset for which eggplant differs by more than 2-fold or less than 0.5-fold are marked by asterisks.
each library was relatively small (1–3 × 10³ clones per library) compared to the abovementioned studies, and therefore, reliable and appropriate examination of differential expression of each unigene based on the number of assembled ESTs would be limited
to the genes that were relatively highly expressed. Therefore, we classified the cDNA libraries used in this study into two types, fruit-related (AE0, AE5, OVL, OVS, YFR, LS0, LS5, PLA, PER) and non-fruit-related (CAL, MLF, PED, PST, ROT, SHT, SEP, ROL), and examined the distribution of functional categories based on GO. The former (type F) and the latter (type NF) consisted of 17,985 clones and 15,413 clones, respectively. The ESTs obtained from the library prepared with normalized multiple samples (SmFL) and from samples subjected to specific chemical/biological treatments (CPA, CHX, and RBW) were excluded from the analysis. There were 39 biological process GO Slim categories to which more than 1% of the unigenes were assigned in at least one of the two types of library groups. As shown in Fig. 4, 27 out of the 39 GO Slim categories were found to be represented differently in the type F
transcriptome when compared to the type NF, with sufficient statistical significance (P < 0.01) when tested by the method described by Audic and Claverie (1997). Twenty-three GO Slim terms were over-represented in the type F transcriptome, including ‘catabolic process’ and ‘DNA metabolic process’ belonging to the higher category ‘metabolic process,’ ‘translation’ and ‘cell cycle’ of ‘cellular process,’ all GO Slim terms of ‘developmental process,’ and so on. On the other hand, ‘photosynthesis,’ ‘precursor and energy generation,’ and ‘response’ to ‘stress’ and ‘abiotic stimulus’ were significantly under-represented. These results suggested that the whole transcriptome in fruit-related tissues/organs shifted to a more consumptive, developmentally active status compared to the non-fruit-related ones. By the same procedure, the numbers of the assembled EST members of each unigene were compared between type F and type NF libraries for identification of genes specifically up- or down-regulated in the fruit (data not shown). In total, 47 unigenes, including putative homologues of SEPALLATA1 and SEPALLATA3, which are involved in floral organ development (Pelaz et al., 2000), several ribosomal protein subunit genes, and translation initiation factors, were identified as being significantly (P < 0.01) up-regulated in fruit, whereas 114 unigenes were down-regulated in fruit, including the genes corresponding to known photosystem I and II subunit genes, drought-induced RD22 and ERD10 (Yamaguchi-Shinozaki and Shinozaki, 1993; Kiyosue et al., 1994), and so on. These results coincided with the summarized fruit-specific gene expression profile based on GO annotation.

3.4. Sequence-based comparative analysis among solanaceous species
Fig. 4. Comparison of the GO-based functional distribution of unigenes in fruit-related and non-fruit-related cDNA libraries. The horizontal axis indicates the percentage of unigenes to which the respective GO Slim term shown on the vertical axis is assigned. Asterisks indicate statistically significant over-representation (P < 0.01); the color of each asterisk corresponds to the library type in which the term is over-represented.
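The test of Audic and Claverie (1997), cited for the comparison between type F and type NF libraries, gives the probability of observing y tags in one library given x tags in the other, for library sizes N1 and N2: p(y|x) = (N2/N1)^y (x+y)! / [x! y! (1 + N2/N1)^(x+y+1)]. The following minimal sketch implements this formula; the one-sided tail convention, the function names, and the example counts are assumptions made for illustration, and only the two library sizes are taken from the text.

```python
import math


def ac_log_pmf(y, x, n1, n2):
    """log p(y | x) for library sizes n1 (count x) and n2 (count y)."""
    ratio = n2 / n1
    return (
        y * math.log(ratio)
        + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
        - (x + y + 1) * math.log1p(ratio)
    )


def ac_p_value(x, y, n1, n2):
    """One-sided tail probability of a count at least as extreme as y.

    Assumption: the smaller of P(Y <= y) and P(Y >= y) is reported.
    """
    pmf_y = math.exp(ac_log_pmf(y, x, n1, n2))
    lower = sum(math.exp(ac_log_pmf(k, x, n1, n2)) for k in range(y + 1))
    upper = max(0.0, 1.0 - lower + pmf_y)
    return min(lower, upper, 1.0)


N_FRUIT, N_NONFRUIT = 17985, 15413  # type F and type NF sizes from the text

# Hypothetical unigene with 30 ESTs in type F and 5 ESTs in type NF libraries.
p_example = ac_p_value(30, 5, N_FRUIT, N_NONFRUIT)
significant = p_example < 0.01
```

Working on the log scale with `math.lgamma` avoids overflow of the factorial terms for highly expressed unigenes.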
Based on a reciprocal Smith-Waterman comparison using the 16k eggplant, 41k tomato, 57k potato, and 32k pepper unigenes, 1265 putative ortholog groups were constructed. Additional “reciprocal best hit” relationships were screened in two species combinations, eggplant versus S. torvum and tomato versus S. pennellii, and finally 65 putative ortholog groups, each consisting of unigene sequences of the six solanaceous species, were constructed (Table 1S). Since the unigene datasets of S. torvum and S. pennellii were relatively small, consisting of 6296 and 3219 sequences, respectively, such a drastic decrease in the number of ortholog groups is reasonable. It has been pointed out that the “reciprocal best hit” procedure will sometimes pair up paralogs when applied to incomplete genome datasets (Wu et al., 2006). In order to minimize the probability of grouping paralogs, complete “reciprocal best hit” relationships were determined among the four species, and then data of the two wild relatives, determined under more relaxed conditions, were added. In addition, the E-value ratio of the best hit versus the second hit was determined to be less than 1E−10 in all comparisons.

Using the 65 ortholog sets, molecular relationships among the solanaceous species were estimated. Sequence alignment using ClustalW revealed that 32 ortholog groups successfully generated commonly aligned sequence blocks that consisted of sequence data of all six species, were more than 150 bp long, and contained gaps not more than 10 bp long (Fig. 1S). Using the 32 sequence sets, phylogenetic distances were calculated. The average distance and standard deviation between each pair of species are shown in Table 2S. The phylogenetic tree shown in Fig. 5 was calculated using a merged sequence for each species; it coincided with the known relationships, in which eggplant is most closely related to S. torvum and pepper is most distant from the other species, which belong to the genus Solanum.
Tomato and S. pennellii were most closely related to each other, and potato was placed next to them. GC content and codon usage distribution were compared using the nucleotide and deduced amino acid sequences of the 32 ortholog sets, and it was found that there was no distinct difference in these features among the six species (data not shown).
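As an illustration of the distance step, the following minimal Python sketch computes Kimura's 2-parameter distance (the method named in the Fig. 5 caption) for one pair of pre-aligned sequences. The function name `k2p_distance` and the rule of skipping every column that contains a gap or an ambiguous base are assumptions made for this example, not details taken from the analysis itself. Tree construction is not covered.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Columns containing anything other than A, C, G, or T in either
    sequence (gaps, N, and so on) are skipped; this is an assumption
    of the sketch.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    # d = -1/2 * ln[(1 - 2P - Q) * sqrt(1 - 2Q)]
    # math.log raises ValueError if the sequences are too divergent.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For a six-species comparison, the function would be called once per species pair on the merged alignment, giving the pairwise distance matrix from which a tree can be built.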
Fig. 5. Phylogenetic relationships among six solanaceous species deduced from merged subsequences of 32 putative orthologous genes. The phylogenetic tree was constructed with the unweighted pair-group method with arithmetic mean (UPGMA), based on genetic distances calculated by Kimura's 2-parameter method.

3.5. Screening and characterization of EST-SSRs

Microsatellites, or SSRs, have been recognized as good targets for development of highly polymorphic DNA markers (Tautz, 1989), and EST datasets have been utilized as a good source for microsatellite screening in many plant species (Gupta et al., 2003; Caruso et al., 2008; Ohyama et al., 2009; Simko, 2009). In eggplant, construction of SSR-enriched genomic DNA libraries and large-scale development of genomic SSR markers have been reported (Nunome et al., 2009), but EST-based SSRs are limited. Using the 16k eggplant unigene set as a target, di-, tri-, tetra-, penta-, and hexa-nucleotide microsatellites that contained more than seven repeat units were screened using the srchssr program (Fukuoka et al., 2005). In order to compare the distribution of SSR motifs with that in other solanaceous species, the unigene datasets of tomato, potato, and pepper were subjected to microsatellite screening in the same way (Table 2).

From the 16k eggplant sequences, 255 unigenes containing microsatellites were identified. In total, 72.9% (186/255) of the microsatellites consisted of a dinucleotide repeat unit, and among them, GA repeats were the most frequent, accounting for 48.9% (91/186), followed by AT repeats (30.6%, 57/186). In tomato and potato, dinucleotide repeats were most frequent as well, but the distribution of repeat units differed from that of eggplant: AT repeats accounted for 67% (272/404) and 61% (368/606) of the total dinucleotide repeats, respectively. In pepper, AT repeats accounted for only 21.8% (198/909) of the total dinucleotide microsatellites, and GA repeats were highly dominant (72.7%, 661/909). These data indicated that the distribution of repeat unit types of EST-SSRs in eggplant was clearly different from that of the other examined Solanaceae species, and likely to be intermediate between pepper and tomato/potato.

From a practical viewpoint, we have found in tomato and eggplant genomic SSR genotyping that GA and GT microsatellites are more convenient than palindromic AT and GC microsatellites because of more stable PCR amplification and fewer stutter products (unpublished data). Ohyama et al. (2009) reported that tomato EST-SSRs tended to be mapped with a widespread distribution, whereas genomic SSRs derived from BAC-end sequences showed an uneven distribution biased toward pericentric heterochromatin regions. The eggplant unigene set would be a good source for SSR marker development aimed at linkage map construction and genetic diversity studies in eggplant and its wild relatives.
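Table 2
Number and motif distribution of EST-SSRs in solanaceous species.

| Motif | Eggplant n | Eggplant (%) | Tomato n | Tomato (%) | Potato n | Potato (%) | Pepper n | Pepper (%) |
|---|---|---|---|---|---|---|---|---|
| AT | 57 | 22.4 | 272 | 55.4 | 368 | 48.0 | 198 | 19.7 |
| CA | 38 | 14.9 | 19 | 3.9 | 56 | 7.3 | 50 | 5.0 |
| GA | 91 | 35.7 | 113 | 23.0 | 182 | 23.7 | 661 | 65.8 |
| AAC | 5 | 2.0 | 9 | 1.8 | 13 | 1.7 | 14 | 1.4 |
| AAG | 28 | 11.0 | 25 | 5.1 | 53 | 6.9 | 20 | 2.0 |
| AAT | 20 | 7.8 | 34 | 6.9 | 41 | 5.3 | 30 | 3.0 |
| ACC | 2 | 0.8 | 3 | 0.6 | 6 | 0.8 | 4 | 0.4 |
| ACG | 3 | 1.2 | 3 | 0.6 | 5 | 0.7 | 3 | 0.3 |
| ACT | 1 | 0.4 | 0 | 0.0 | 3 | 0.4 | 1 | 0.1 |
| AGC | 2 | 0.8 | 0 | 0.0 | 3 | 0.4 | 4 | 0.4 |
| AGG | 1 | 0.4 | 4 | 0.8 | 8 | 1.0 | 3 | 0.3 |
| ATC | 2 | 0.8 | 2 | 0.4 | 15 | 2.0 | 11 | 1.1 |
| CCG | 2 | 0.8 | 2 | 0.4 | 4 | 0.5 | 2 | 0.2 |
| >3-base | 3 | 1.2 | 5 | 1.0 | 10 | 1.3 | 3 | 0.3 |
| Content (%)^a | | 1.53 | | 1.05 | | 1.35 | | 3.10 |

^a Percentage of SSR-containing unigenes versus total unigenes.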
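The kind of screening summarized in Table 2 can be approximated with a short regular-expression search. The sketch below is not the srchssr program; all function names are hypothetical. It assumes that "more than seven repeat units" means at least eight tandem copies and that each motif is reported by a canonical name, the alphabetically smallest among its rotations and those of its reverse complement. Under that naming the GA and CA classes of Table 2 appear as AG and AC.

```python
import re
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_motif(motif):
    """Smallest rotation of the motif or of its reverse complement."""
    candidates = []
    for m in (motif, reverse_complement(motif)):
        candidates.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(candidates)


def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. ATAT)."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=8, unit_sizes=(2, 3, 4, 5, 6)):
    """Return (start, canonical motif, repeat count) for each SSR found."""
    seq = seq.upper()
    hits = []
    for k in unit_sizes:
        # One unit of length k followed by at least min_repeats - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), canonical_motif(motif), len(m.group(0)) // k))
    return hits


def motif_distribution(unigenes, min_repeats=8):
    """Count SSRs per canonical motif over a dict of {unigene ID: sequence}."""
    counts = Counter()
    for seq in unigenes.values():
        for _, motif, _ in find_ssrs(seq, min_repeats):
            counts[motif] += 1
    return counts
```

Running `motif_distribution` on each species' unigene set and dividing by the total number of hits would give percentages comparable in form to those in Table 2, although the exact counts depend on the screening rules assumed here.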
3.6. Utility of the unigene set for high throughput gene expression analysis

One of the most important uses of an EST dataset is its application to gene expression analysis. While ESTs are essential in designing microarrays, transcriptome analysis by SAGE and related technologies, which are based on direct counting of cDNAs by tag sequencing, also requires an accumulation of annotated EST data that allows the origin of the obtained tags to be estimated. To evaluate the utility of the eggplant unigene set for gene expression analysis, and as a preliminary experiment to understand the molecular mechanism of fruit development initiation, gene expression profiles in the ovary at the anthesis and 2-DPA stages were compared using the SuperSAGE method. From each library, 1152 clones were sequenced and approximately 13,000 tags were obtained. After removal of the tags that appeared only once, 2760 independent tags were identified in total. Table 3 shows the tags and corresponding unigenes that were significantly up-regulated (P < 0.025) at the 2-DPA stage compared to the anthesis stage. A late embryogenesis abundant protein (RBW04B18C), a CHY zinc finger protein (AE507H15A), and an unknown protein (AE503A09A) seemed to be subjected to the most drastic induction among them, although their biological functions were unclear. Expression of two genes related to metabolism and transport of phytohormones (cytokinin oxidase/dehydrogenase, AE506G11A; auxin efflux carrier protein, OVL03J06A) showed a considerable increase, which might be involved in the transition from the cell division stage to the cell elongation stage of fruit development. Although the experiment compared only two conditions and the accumulated data were not sufficient for detailed functional analysis, these genes might be regulated in a developmentally specific manner and be related to fruit setting and development. In particular, two unigenes classified as eggplant-specific (SmFL29I15C and RBW02F08F) were found to be significantly up-regulated, which suggested that these unique genes were also functionally associated with this specific stage of development.

Out of 2760 independent tag sequences, 2033 (74%) matched the eggplant unigene sequences perfectly, so that the corresponding unigenes and accompanying functional annotations, if any, could be assigned to the tags. As shown in Fig. 6, when tags accounted for more than 0.06% of the total count, more than 90% of them could be assigned to a member of the 16k unigene set. On the other hand, tags that accounted for not more than 0.06% of the total count, which made up almost 90% of the total tag species (2471/2760), were more difficult to assign to unigene sequences. Since more sequence information is needed to extend the coverage of the transcriptome and to increase the probability of assigning rare tag species to annotated unigenes, 454-based 3′-end EST sequencing is now underway. In addition, it is likely that some of the unassigned tags are artifacts of sequence mutations during PCR amplification and/or sequencing errors. Accumulation of more tag sequences would help to reduce the impact of tags with low reliability.
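The tag-to-unigene assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: tags are taken to be stored without the NlaIII recognition site (CATG), which is prepended before searching; only the given strand of each unigene is searched; and the function names and the demo sequences in the usage note are hypothetical.

```python
NLAIII_SITE = "CATG"  # NlaIII recognition sequence


def drop_singletons(tag_counts):
    """Keep only tags that were observed more than once."""
    return {tag: n for tag, n in tag_counts.items() if n > 1}


def assign_tags(tags, unigenes):
    """Map each tag to the unigene IDs containing CATG + tag exactly.

    tags: iterable of tag sequences (without the CATG site; assumption).
    unigenes: dict of {unigene ID: nucleotide sequence}.
    """
    upper = {uid: seq.upper() for uid, seq in unigenes.items()}
    assignments = {}
    for tag in tags:
        probe = NLAIII_SITE + tag.upper()
        assignments[tag] = [uid for uid, seq in upper.items() if probe in seq]
    return assignments


def assigned_fraction(assignments):
    """Fraction of tags that matched at least one unigene."""
    if not assignments:
        return 0.0
    matched = sum(1 for ids in assignments.values() if ids)
    return matched / len(assignments)
```

A typical call would first filter the raw counts with `drop_singletons`, then pass the surviving tags and a dictionary of unigene sequences to `assign_tags`. Grouping the tags by their share of the total count before calling `assigned_fraction` would give per-class assignment rates of the kind plotted in Fig. 6.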
Table 3
Tag sequences detected in a significantly larger number in 2-DPA ovary compared to the anthesis stage.

| Tag sequence | Unigene ID | Count^a (anthesis) | Count^a (2-DPA) | P_Chance | Description | RefSeq ID^b |
|---|---|---|---|---|---|---|
| GGTGCCGGATCCAGTTACCGGT | RBW04B18C | 0 | 54 | 0.E+00 | Late embryogenesis abundant protein | NP_567231 |
| TGTGATGTCTGTAGCTTGCTGC | AE507H15A | 3 | 32 | 0.E+00 | CHY zinc finger protein | NP_001044081 |
| ATGTAACATAGTGTTCAAGTGT | AE503A09A | 5 | 42 | 0.E+00 | Unknown protein | NP_565660 |
| TCCCTGGTTGAGAATGTTGTTT | CHX05H14A | 12 | 51 | 0.E+00 | Dormancy/auxin associated family protein | NP_564714 |
| GCTTTTTTATGTTTGGTTTGTA | YFR05C08F | 13 | 37 | 2.E-04 | Metallothionein | NP_001105465 |
| ATGGTCTCAAATGATCATCTTG | SEP06E10A | 8 | 28 | 3.E-04 | Ubiquitin | NP_849292 |
| CAACAGCAAATGCAGCATCCA | SmFL29I15C | 0 | 11 | 3.E-04 | Eggplant specific | – |
| AGGCTTTTGTTTCTTCCATTTG | LS501M13A | 1 | 13 | 6.E-04 | Unknown protein | NP_00105408 |
| TTCTATGTTTATGTAGTGTGGG | SmFL13J21A | 14 | 34 | 1.E-03 | Vitamin C defective | NP_567759 |
| ATATGTAATTGGTGGGCATAA | PED07K24A | 0 | 9 | 1.E-03 | Multiprotein bridging factor 1B | NP_191427 |
| TAATTGTATACAATTACCATAA | SmFL07L02 | 0 | 9 | 1.E-03 | beta-Galactosidase 12 | NP_194344 |
| TATAACTGACAAATCACTTGCG | no EST | 0 | 9 | 1.E-03 | – | – |
| AGCATCTCTGCGTCTGATAAAA | SmFL14E11 | 16 | 35 | 3.E-03 | Eggplant specific | – |
| CTTTTTACAGCTGCTGCTGGGA | SmFL14M08A | 0 | 8 | 3.E-03 | Haloacid dehalogenase-like hydrolase family protein | NP_850754 |
| TACTATTTTCTCTGCTGTGTTG | SmFL34D16A | 9 | 24 | 4.E-03 | Chloroplast thylakoidal processing peptidase | NP_180603 |
| TGTTTGTTGTATTCTCTTGCAA | SEP08F08A | 1 | 10 | 5.E-03 | Chitinase | NP_566426 |
| CAACGTAGGATTCATATACTGC | AE506G11A | 0 | 7 | 7.E-03 | Cytokinin oxidase/dehydrogenase | NP_181682 |
| TAATGTGCAATATGCACACCTA | LS502E19A | 0 | 7 | 7.E-03 | Cysteine-type endopeptidase | NP_195020 |
| GGAGCTTTTCTTTGGACCTATT | SHT04L02A | 1 | 9 | 8.E-03 | Hypothetical protein | NP_00105775 |
| TGGCCAGCTTAGGCTCTCCTTG | OVS01M07A | 1 | 9 | 8.E-03 | Zinc ion binding protein | NP_173164 |
| TTGCTAGTCTATATAGCAGAGT | OVS06D04F | 1 | 9 | 8.E-03 | DNA-directed RNA polymerases I, II, and III 7-kDa subunit | NP_198917 |
| TAATCAAATTACTGATATGCAG | OVL03J06A | 6 | 17 | 1.E-02 | Auxin efflux carrier family protein | NP_565011 |
| ATCAAGTTGTGCTATTTTATGA | no EST | 0 | 6 | 1.E-02 | – | – |
| CCACATCATCCTATTGGTTGTG | no EST | 0 | 6 | 1.E-02 | – | – |
| GAATAAAGGGAGAAAAATCAG | RBW02F08F | 0 | 6 | 1.E-02 | Eggplant specific | – |
| GATATGGTTGCACATATTGAGT | PST01H15A | 0 | 6 | 1.E-02 | Amino acid permease | NP_199774 |
| GCGCACAATTTTCTGATATTGG | SmFL25J20A | 0 | 6 | 1.E-02 | Ubiquitin-conjugating enzyme E2 | NP_00105419 |

^a Out of the total count, anthesis: 13,730, 2-DPA: 12,940.
^b RefSeq ID to which each unigene hit at maximum similarity.
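This section does not state how the P_Chance values in Table 3 were calculated. As one common way to compare tag counts between two libraries of different size, the sketch below implements the test of Audic and Claverie (1997), which appears in the reference list. Its use here is an assumption, so the values it returns should not be expected to reproduce the table exactly. The function names are hypothetical.

```python
import math


def audic_claverie_pmf(y, x, n1, n2):
    """Probability of observing y tags in library 2 given x tags in library 1.

    n1 and n2 are the total tag counts of libraries 1 and 2.
    p(y|x) = (n2/n1)^y * (x+y)! / (x! * y! * (1 + n2/n1)^(x+y+1))
    """
    ratio = n2 / n1
    log_p = (
        y * math.log(ratio)
        + math.lgamma(x + y + 1)
        - math.lgamma(x + 1)
        - math.lgamma(y + 1)
        - (x + y + 1) * math.log(1 + ratio)
    )
    return math.exp(log_p)


def upper_tail_probability(y, x, n1, n2):
    """P(Y >= y | x): chance of a count at least as high as y in library 2.

    Computed as 1 minus the lower tail, so very small values lose
    precision; adequate for a sketch.
    """
    lower = sum(audic_claverie_pmf(k, x, n1, n2) for k in range(y))
    return max(0.0, 1.0 - lower)
```

For a row of Table 3, `x` would be the anthesis count, `y` the 2-DPA count, and `n1` and `n2` the library totals given in footnote a (13,730 and 12,940).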
Further experiments are now underway, which will help us understand the molecular mechanism of fruit development initiation in eggplant; these experiments use samples from a broader range of developmental stages and genotypes.
3.7. Data availability

EST sequences will be released on the DDBJ/EMBL/GenBank databank with accession numbers FS000001–FS098086. The 16k unigene sequences (flat FASTA format) and spreadsheet files containing annotation information based on BLAST results and EST-SSR search results will be available from our Web-based database, VegMarks (http://vegmarks.nivot.affrc.go.jp/VegMarks/jsp/page.do?transition=link).

4. Conclusion

The extensive accumulation of eggplant ESTs resulted in 98,086 sequences that led to a unigene set comprised of 16,245 unique sequences. The dataset includes eggplant expressed sequences from various accessions, developmental stages, and treatments that would cover a respectable range of the whole transcriptome in eggplant, and among them, Solanum-specific and/or eggplant-specific transcripts were identified. The dataset would provide a useful resource for genetic marker development, functional genetics studies, and genetic divergence studies in eggplant, and moreover, it would make eggplant a novel and distinct member of the Solanum species for comparative genetics and phenomics.

Acknowledgments

The authors wish to thank Dr. Hideo Matsumura (Iwate Biotechnology Research Center) for his helpful advice and valuable suggestions for SuperSAGE analysis and Dr. Takeo Saito (NIVTS) for providing eggplant materials. We are also grateful to Ms. Yumika Kitamura (NIVTS) for skillful technical assistance. This work was supported by the research programs ‘Program for the Promotion of Basic Research Activities for Innovative Biosciences (PROBRAIN)’ and ‘Molecular Evaluation of Parthenocarpy on Horticultural Crops,’ funded by the National Agriculture and Food Research Organization.
Fig. 6. The relationship between the gene expression levels represented as tag counts and the probability of successful assignment to the eggplant unigenes. Gene expression levels shown on the horizontal axis are categorized into eight classes using the percentage of each tag count versus the total number of counted tags. The vertical axis indicates the percentage of tags that were assigned to the corresponding unigene sequences, with a perfect sequence match including an NlaIII recognition site.
Appendix A. Supplementary data Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.gene.2009.10.006.
References

Asamizu, E., Ezura, H., 2009. Inclusion of tomato in the genus Solanum as “Solanum lycopersicum” is evident from phylogenetic studies. J. Jpn. Soc. Hort. Sci. 78, 3–5.
Audic, S., Claverie, J.M., 1997. The significance of digital gene expression profiles. Genome Res. 7, 986–995.
Carninci, P., et al., 2000. Normalization and subtraction of cap-trapper-selected cDNAs to prepare full-length cDNA libraries for rapid discovery of new genes. Genome Res. 10, 1617–1630.
Caruso, M., Federici, C.T., Roose, M.L., 2008. EST-SSR markers for asparagus genetic diversity evaluation and cultivar identification. Mol. Breed. 21, 195–204.
D’Arcy, W.G., 1991. The Solanaceae since 1976 with a review of its biogeography. In: Hawkes, J.G., Lester, R.N., Nee, M., Estrada, N. (Eds.), Solanaceae III. Royal Botanic Gardens/Kew, London, pp. 75–138.
Daunay, M.C., Lester, R.N., 1988. The usefulness of taxonomy for Solanaceae breeders, with special reference to the genus Solanum and to Solanum melongena L. (eggplant). Capsicum Newsletter 7, 70–79.
Doganlar, S., Frary, A., Daunay, M.C., Lester, R.N., Tanksley, S.D., 2002. A comparative genetic linkage map of eggplant (Solanum melongena) and its implications for genome evolution in the Solanaceae. Genetics 161, 1697–1711.
Ewing, B., Green, P., 1998. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194.
Ewing, B., Hillier, L., Wendl, M.C., Green, P., 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185.
Ewing, R.M., Ben Kahla, A., Poirot, O., Lopez, F., Audic, S., Claverie, J.M., 1999. Large-scale statistical analyses of rice ESTs reveal correlated patterns of gene expression. Genome Res. 9, 950–959.
Fei, Z.J., et al., 2004. Comprehensive EST analysis of tomato and comparative genomics of fruit ripening. Plant J. 40, 47–59.
Frary, A., Xu, Y., Liu, J., Mitchell, S., Tedeschi, E., Tanksley, S., 2005. Development of a set of PCR-based anchor markers encompassing the tomato genome and evaluation of their usefulness for genetics and breeding experiments. Theor. Appl. Genet. 111, 291–312.
Fukuoka, H., Nunome, T., Minamiyama, Y., Kono, I., Namiki, N., Kojima, A., 2005. read2Marker: a data processing tool for microsatellite marker development from a large data set. Biotechniques 39, 472–476.
Gupta, P.K., Rustgi, S., Sharma, S., Singh, R., Kumar, N., Balyan, H.S., 2003. Transferable EST-SSR markers for the study of polymorphism and genetic diversity in bread wheat. Mol. Genet. Genomics 270, 315–323.
Kim, H.J., et al., 2008. Pepper EST database: comprehensive in silico tool for analyzing the chili pepper (Capsicum annuum) transcriptome. BMC Plant Biol. 8, doi:10.1186/1471-2229-8-101.
Kiyosue, T., Yamaguchi-Shinozaki, K., Shinozaki, K., 1994. Characterization of 2 cDNAs (ERD10 and ERD14) corresponding to genes that respond rapidly to dehydration stress in Arabidopsis thaliana. Plant Cell Physiol. 35, 225–231.
Li, L., Stoeckert Jr., C.J., Roos, D.S., 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189.
Lin, C.W., Mueller, L.A., McCarthy, J., Crouzillat, D., Petiard, V., Tanksley, S.D., 2005. Coffee and tomato share common gene repertoires as revealed by deep sequencing of seed and cherry transcripts. Theor. Appl. Genet. 112, 114–130.
Matsumura, H., et al., 2003. Gene expression analysis of plant host-pathogen interactions by SuperSAGE. Proc. Natl. Acad. Sci. U. S. A. 100, 15718–15723.
McCarthy, F.M., et al., 2007. AgBase: a unified resource for functional analysis in agriculture. Nucleic Acids Res. 35, D599–D603.
Mueller, L.A., et al., 2005a. The SOL Genomics Network. A comparative resource for Solanaceae biology and beyond. Plant Physiol. 138, 1310–1317.
Mueller, L.A., 2005b. The tomato sequencing project, the first cornerstone of the International Solanaceae Project (SOL). Comp. Func. Genomics 6, 153–158.
Mueller, L.A., et al., 2009. A snapshot of the emerging tomato genome sequence. Plant Genome 2, 78–92.
Nunome, T., et al., 2009. Development of SSR markers derived from SSR-enriched genomic library of eggplant (Solanum melongena L.). Theor. Appl. Genet. 119, 1143–1153.
Ogihara, Y., et al., 2003. Correlated clustering and virtual display of gene expression patterns in the wheat life cycle by large-scale statistical analyses of expressed sequence tags. Plant J. 33, 1001–1011.
Ohyama, A., et al., 2009. Characterization of tomato SSR markers developed using BAC-end and cDNA sequences from genome databases. Mol. Breed. 23, 685–691.
Osato, N., et al., 2002. A computer-based method of selecting clones for a full-length cDNA project: simultaneous collection of negligibly redundant and variant cDNAs. Genome Res. 12, 1127–1134.
Pelaz, S., Ditta, G.S., Baumann, E., Wisman, E., Yanofsky, M.F., 2000. B and C floral organ identity functions require SEPALLATA MADS-box genes. Nature 405, 200–203.
Polignano, G., Uggenti, P., Bisignano, V., Della Gatta, C., 2009. Genetic divergence analysis in eggplant (Solanum melongena L.) and allied species. Genet. Resour. Crop Evol. doi:10.1007/s10722-009-9459-6.
Pruitt, K.D., Tatusova, T., Maglott, D.R., 2007. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65.
Quackenbush, J., et al., 2001. The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res. 29, 159–164.
Rudd, S., 2003. Expressed sequence tags: alternative or complement to whole genome sequences? Trends Plant Sci. 8, 321–329.
Saito, T., et al., 2009. Development of the parthenocarpic eggplant cultivar ‘Anominori’. JARQ, Jpn. Agric. Res. Q. 43, 123–127.
Sakata, Y., Monma, S., Narikawa, T., Komochi, S., 1996. Evaluation of resistance to bacterial wilt and Verticillium wilt in eggplants (Solanum melongena L.) collected in Malaysia. J. Jpn. Soc. Hort. Sci. 65, 81–88.
Simko, I., 2009. Development of EST-SSR markers for the study of population structure in lettuce (Lactuca sativa L.). J. Heredity 100, 256–262.
Tatusov, R.L., Koonin, E.V., Lipman, D.J., 1997. A genomic perspective on protein families. Science 278, 631–637.
Tatusov, R.L., et al., 2003. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41.
Tautz, D., 1989. Hypervariability of simple sequences as a general source for polymorphic DNA markers. Nucleic Acids Res. 17, 6463–6471.
van der Hoeven, R., Ronning, C., Giovannoni, J., Martin, G., Tanksley, S., 2002. Deductions about the number, organization, and evolution of genes in the tomato genome based on analysis of a large expressed sequence tag collection and selective genomic sequencing. Plant Cell 14, 1441–1456.
Wu, F., Mueller, L.A., Crouzillat, D., Petiard, V., Tanksley, S.D., 2006. Combining bioinformatics and phylogenetics to identify large sets of single-copy orthologous genes (COSII) for comparative, evolutionary and systematic studies: a test case in the euasterid plant clade. Genetics 174, 1407–1420.
Wu, F., Eannetta, N.T., Xu, Y., Tanksley, S.D., 2009a. A detailed synteny map of the eggplant genome based on conserved ortholog set II (COSII) markers. Theor. Appl. Genet. 118, 927–935.
Wu, F., et al., 2009b. A COSII genetic map of the pepper genome provides a detailed picture of synteny with tomato and new insights into recent chromosome evolution in the genus Capsicum. Theor. Appl. Genet. 118, 1279–1293.
Yamaguchi, H., et al., in press. Gene expression analysis in cadmium-stressed roots of a low cadmium-accumulating solanaceous plant, Solanum torvum. J. Exp. Bot. doi:10.1093/jxb/erp313.
Yamaguchi-Shinozaki, K., Shinozaki, K., 1993. The plant hormone abscisic acid mediates the drought-induced expression but not the seed-specific expression of rd22, a gene responsive to dehydration stress in Arabidopsis thaliana. Mol. Gen. Genet. 238, 17–25.
Yamamoto, N., et al., 2005. Expressed sequence tags from the laboratory-grown miniature tomato (Lycopersicon esculentum) cultivar Micro-Tom and mining for single nucleotide polymorphisms and insertions/deletions in tomato cultivars. Gene 356, 127–134.
Yano, K., et al., 2006. MiBASE: a database of a miniature tomato cultivar Micro-Tom. Plant Biotechnol. 23, 195–198.