Mechanisms of Ageing and Development 124 (2003) 93 /102 www.elsevier.com/locate/mechagedev
RIKEN mouse genome encyclopedia Yoshihide Hayashizaki * Gene Exploration Research Group, Riken Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan Received 10 May 2002; received in revised form 16 September 2002; accepted 1 November 2002
Abstract We have been working to establish the comprehensive mouse full-length cDNA collection and sequence database to cover as many genes as we can, named Riken mouse genome encyclopedia. Recently we are constructing higher-level annotation (Functional ANnoTation Of Mouse cDNA; FANTOM) not only with homology search based annotation but also with expression data profile, mapping information and protein /protein database. More than 1,000,000 clones prepared from 163 tissues were end-sequenced to classify into 159,789 clusters and 60,770 representative clones were fully sequenced. As a conclusion, the 60,770 sequences contained 33,409 unique. The next generation of life science is clearly based on all of the genome information and resources. Based on our cDNA clones we developed the additional system to explore gene function. We developed cDNA microarray system to print all of these cDNA clones, protein /protein interaction screening system, protein /DNA interaction screening system and so on. The integrated database of all the information is very useful not only for analysis of gene transcriptional network and for the connection of gene to phenotype to facilitate positional candidate approach. In this talk, the prospect of the application of these genome resourced should be discussed. More information is available at the web page: http://genome.gsc.riken.go.jp/. # 2002 Elsevier Science Ireland Ltd. All rights reserved. Keywords: Mouse genome encyclopedia; Full-length cDNA; FANTOM (functional anoration of mouse cDNA); Integrated database
1. Goal of Riken mouse genome encyclopedia project The goal of our project is to establish a mouse genome encyclopedia and to develop a series of our original technologies to achieve it. Riken mouse encyclopedia is the platform for the second goal to draw genome-wide pictures of gene cascades that can account for the mechanism to connect from genes to phenotypes. The encyclopedia will consists of five components; the nonredundant mouse full-length cDNA clone bank, the mouse full-length cDNA sequence bank, the chromosomal locations, expression profiles and protein/protein interaction (PPI) covering as many genes as can be collected. We have been developing the full-length cDNA technologies and high-speed sequencing technologies to analyze these materials. The purpose of our project is not only the analysis of the sequence of fulllength cDNA but also the development of a new approach to research based on the encyclopedia. We * Tel.: /81-45-503-9222; fax: /81-45-503-9215. E-mail address:
[email protected] (Y. Hayashizaki).
would like to develop a new post-sequence technologies and systems which can maximize the usefulness of the encyclopedia by utilizing full-length cDNA clones.
2. History and current status of Riken mouse genome encyclopedia project Since 1995 Riken genome exploration research group has been developing a series of new systems to construct full-length cDNA libraries (Carninci et al., 1996, 1997, 1998), high-speed sequencing system named RISA (Shibata et al., 2000). Using this system, approximately 2,000,000 mouse full-length cDNA clones were isolated from 267 independent tissues at different stages from different organs, and 3?-end sequence data could classify these clones into 159,789 clusters. Full stretches of sequences of 60,770 representative clones were sequenced, resulting in 33,409 unique sequences. We are also producing the data of the expression profiles and PPI (Kadota et al., 2001; Miki et al., 2001; Suzuki et al., 2001; Bono et al., 2002b). The integrated database including, not only full-length cDNA sequences but
0047-6374/03/$ - see front matter # 2002 Elsevier Science Ireland Ltd. All rights reserved. doi:10.1016/S0047-6374(02)00173-2
94
Y. Hayashizaki / Mechanisms of Ageing and Development 124 (2003) 93 /102
also mapping database expression profile and PPI data of all these genes were very useful for the analysis of gene functions, positional candidate approach, and gene network analysis to connect gene to phenotype.
3. Achievement 3.1. Development of a system to prepare the mouse genome encyclopedia To establish the mouse genome encyclopedia, two series of technologies are needed: a method for constructing high quality of full-length cDNA and RISA (Riken Integrated Sequence Analysis) system (highspeed sequencing system). At present, our Riken full-length cDNA method consists of four key technologies: an elongation method for first strand cDNA synthesis (Carninci et al., 1998), a selection method for eliminating partial cDNA (Carninci et al., 1996, 1997), a normalization and subtraction method (Carninci et al., 2000) to avoid redundancy in subsequent sequencing efforts and new cloning vector (Carninci et al., 2001). Also, the following technologies play an important role: protocol for making library from small amount of tissues, removal of poly-stretch, mixed- and tagged-cDNA libraries. We also have developed a large-scale plasmid preparation system (Itoh et al., 1997, 1999), a transcriptional sequencing (TS) reaction system (Sasaki et al., 1998a,b; Izawa et al., 1998), and a high-speed 384capillary sequencer that should enable analysis of 40,000 samples/day (Shibata et al., 2000). The most important point is that all of these technologies will be incorporated into a single system to achieve our goal.
3.2.2. Selection method The selection method is named ‘the Cap trapper method’ (Carninci et al., 1996, 1997) (Fig. 1c). This method employs chemical modification of the Cap site to oxidize the diol structure, which is specific for nonredundant cDNA. The product dialdehyde structure is connected to the hydrazide. Thus, the diol group can be biotinylated by biotin hydrazide. After synthesis of the first strand synthesis, the Cap site is labeled by the above-mentioned chemical modification, and RNaseI is used to cleave the single strand RNA but not the DNARNA hybrid. In the mRNA with partial cDNA, the single-strand RNA is exposed and can be attached by RNaseI. Therefore, RNaseI treatment removes the biotin group from mRNA with partial cDNA, and only the biotin label at the Cap site of mRNA with the full-length first strand cDNA remains. Subsequently, full-length cDNA can be collected using avidin beads. 3.2.3. Normalization-subtraction method Highly expressed genes and already-collected genes reduce the efficiency of collection of novel cDNAs by one-pass sequencing. To eliminate them, we developed a reiterative normalization and subtraction method utilizing biotinylated RNAs as drivers (Carninci et al., 2000) (Fig. 1d). Currently an amplified library subtraction system is being introduced in order to rescue from cDNA libraries the cDNAs that have not yet been classified by one-pass sequencing from existing cDNA libraries (Fig. 2).
Our group is approaching this project with originally developed technologies. The Riken full-length cDNA method consists of four key technologies as described below.
3.2.4. Removal of poly-A tails from FL-cDNAs Poly-stretches of nucleotides, such as Poly (A) tail in cDNAs are known to interfere with the processibility of DNA/RNA polymerase in the sequencing reaction, resulting in reduced read-lengths and rates of successful sequences. To overcome this problem, we developed a new method to remove Poly (A) tails from cDNAs using Type II restriction enzymes (Shibata et al., 2001b). We also removed the G-stretch previously used for priming the second strand by a new linker adapter, strategy that induces with high efficiency a sequence to prime the second strand cDNA (Shibata et al., 2001a).
3.2.1. Elongation method The basic principle for the elongation method is that the first strand cDNA synthesis should be undertaken at high temperature because the main cause of partial cDNA synthesis is the secondary structure of the mRNA. The elongation method is based on trehalosemediated themostabilization, thermoactivation and thermoprotection of reverse transcriptase (Carninci et al., 1998) (Fig. 1a and b). In addition to this development, we also discovered that trehalose has some effect, which may be large, on half of the enzymes generally used.
3.2.5. Development of host/vectors for FL-cDNA libraries Two kinds of cloning vectors were developed (Carninci et al., 2001): lambda FLC1 andFLC2. Lambda FLC1 can clone a wide range of cDNAs from 0.5 to over 13 kb long and shows slight size preference for long cDNA clones (Carninci et al., in press). We could routinely prepare a cDNA library that once bulk-excised into plasmid, showed average sixes of 2/3 kbp. Lambda FLC2 is a modified lambda FLC1, which contains att sites flanking the cloning sites. The att sites allow easy transfer of a cDNA insert into other vectors for
3.2. Full-length cDNA library technology
Y. Hayashizaki / Mechanisms of Ageing and Development 124 (2003) 93 /102
95
Fig. 1. The full-length cDNA cloning is based on several developments to synthesize long cDNAs, select full-length cDNA, subtract and normalize the cDNA and finally cDNA in cloning into appropriate vectors. (a) Reverse transcriptase cannot extend efficiently cDNAs until the cap site due to strong RNA structure at 420C. However, if the reaction is performed at high temperature in presence of trehalose, high-rate of long full-length cDNA synthesis is achieved. (b) Left panel, trehalose thermoactivated reverse transcriptase is much very effective to synthesize a full-length cDNA (arrow) from a 5 Kb template and to decrease the truncated cDNAs (asterisks), while the reaction at 600C in absence of trehalose does not produce any fulllength cDNA. On the right panel, mouse brain full-length cDNA is longer (arrow) when prepared with trehalose at high temperature than the normal reaction at 420C (asterisk). (c) Left, the chemical biotinylation of the diol group Cap (and the end of the polyA tail). Right panel, flowchart of the cap-trapping procedure: biotinylated cap allows the recovery with streptavidin magnetic beads of only the full-length cDNA. In fact, RNAseI cleaves away the biotinylated cap from the truncated cDNA/mRNA hybrid, which is not recovered and therefore eliminated. (d) Schematic representation of subtraction and normalization. A cDNA tester contains abundant cDNA, cDNA that have already collected by one-pass sequencing and desirable rare, new cDNAs. After mixing with excess of a RNA biotinylated driver containing the unwanted sequences, deriving from cDNA that have already been collected in the project and equimolar amount of starting mRNA to normalize the highly expressed genes, nucleic acid hybridization is carried out at controlled conditions. At the end, the unwanted sequences (hybrids) are removed with magnetic beads and the remaining single-strand cDNA is converted to double-strand cDNA and cloned.
96
Y. Hayashizaki / Mechanisms of Ageing and Development 124 (2003) 93 /102
Fig. 1 (Continued)
expression and other functional studies with enzyme lambda recombinase. This system should facilitate the functional analyses of cDNAs in post-sequence research. 3.3. High-throughput sequencing system We established and expanded our large-scale sequencing system. It comprises of FL-cDNA library construction, an Escherichia coli picking system, a plasmid preparation system, a sequencing reaction system, a sequencing system (Fig. 3), and the management of samples and data. The current capacity of sequencing is 40,000 samples/day. All samples are well-tracked to avoid confusion of IDs (ID errors), and quality control checking is routinely done.
now being optimized to achieve constant yield and quality for sequencing templates. 3.5. Transcriptional sequencing To develop the TS system (Sasaki et al., 1998a,b; Izawa et al., 1998), we originally developed a mutated RNA polymerase, which can incorporate the 3? dNTP preferentially and uniformly, and fluorescent 3? dNTP dye terminator. 3.6. Development of 384-capillary sequencer (RISA sequencer) We completed the development of the first version of a 384-capillary sequencer (RISA 2) (Shibata et al., 2000) at the end of 1996. Shimadzu from November 1999 has commercially marketed it.
3.4. Plasmid preparation system 3.7. Data management system We have designed and introduced three instruments: an instrument for medium distribution and E. coli inoculation, a harvester of E. coli culture solution, and a plasmid extractor (Itoh et al., 1999). These instruments can process 40,000 plasmid extractions per day and are
We have developed many programs to analyze sequence data produced in our high-throughput sequencing system (Sugahara et al., 2001), such as a set of tools automatically classifying cDNA clones based on the 3?-
Y. Hayashizaki / Mechanisms of Ageing and Development 124 (2003) 93 /102
97
Fig. 1 (Continued)
end sequences and tools for automatically registering sequence data in the encyclopedia database (Konno et al., 2001). We have also established an assembly and primer design system. The assembly system can handle three kinds of sequence data produced by different kinds of sequencers in a uniform style. If a gap remains, primers for the primer walking sequencing can be easily designed. Public available sequences, such as EST data, can also be utilized to fill the gaps. We have a database system based on Sybase DBMS to manage most information derived from our sequencing system, tissue sources for FL-cDNA libraries, clone ID, 3?-end sequences, full-length sequences, and clustering information. This database is updated daily. The summarized information can be viewed with Web browsers in a user-friendly manner.
non-redundant full-length cDNA. In Phase II, the fullsequence of the rearrayed clones from the non-redundant cDNA library will be determined. In Phase III, the chromosomal location of all full-length cDNA will be identified by in silico hybridization with human and mouse genome sequencing (Kondo et al., 2001; Yamanaka et al., 2002) which is expected to be produced in the near future by mouse genome. Phase IV is producing a basic database of gene expression in the body and during the development from embryo to adult (Kadota et al., 2001; Miki et al., 2001; Bono et al., 2002b). Finally, Phase V is the step to produce the PPI, based on the biggest advantage of full-length cDNA which can express the whole structure of protein (Suzuki et al., 2001; Saito et al., 2002), although partial cDNA (expression sequence tag; EST) and genome cannot.
3.9. Data production 3.8. Data accumulation The mouse genome encyclopedia will be prepared in five phases. In Phase I, we will construct the mouse fulllength cDNA using as many tissues as can be collected and cluster these clones by end sequencing, to produce
3.9.1. cDNA libraries Almost all cDNAs were normalized and/or subtracted, constructed from over 267 tissues and cells of Riken Mouse Genome Encyclopedia.
98
Y. Hayashizaki / Mechanisms of Ageing and Development 124 (2003) 93 /102
Fig. 2. Caption of the gene discovery depending on the class of expression. mRNAs highly expressed (that appeared more than 10 times in the course of the project) were isolated early in the course of the project, similarly to mRNAs that appeared at intermediate frequency (6 to 10 times). However, to identify the rare and very rare ones (that appears 2-5 times and only once respectively), strong subtraction has been necessary. The arrows indicate the time when various sets of rearrayed clones were used to prepare subtracting drivers and how this influenced gene discovery in the various categories of expression.
3.9.2. 2,000,000 cDNA clones and those clones and those 3?-end sequences By clustering of the 3?-end sequences, a total of 159,789 clusters of cDNAs have been obtained. However, this classification includes some overlap or redundancy, because of various form of splicing, alternative poly adenylation sites, some internal priming, and clustering limitation due to sequencing fidelity and other factors.
3.9.3. Full sequences of cDNA clones derived from nonredundant FL-cDNA set Representative clones of clusters that are based on 3?end sequences of cDNAs were rearrayed and sent to Phase 2, the full-sequencing phase. Apparently novel genes estimated by comparison of 3?-end sequences and public DNA databases were given high priorities for full-sequencing. So far, 60,770 full-sequences from 159,789 clusters were determined. The 60,770 sequences
still contained redundancy, resulting in being clustered into 33,409 completely independent unique sequences.
3.10. Functional annotation of FL-cDNAs In order to annotate the function of 60,770 full-length mouse cDNAs sequenced in RIKEN, we held the FANTOM (Functional Annotation Of Mouse) meeting through 29th April to the 5th of May to establish the international standard of annotations. This annotation activity covers not only functional information itself, but also many other informative data, such as supplemental descriptions of gene function (gene symbol and its synonym), the functional classification (Gene Ontology, chromosomal localization (from genetic mapping and physical mapping if available), expression specificity (organ localization and sub-cellular localization of cDNA) (Kawai et al., 2001, Okazaki et al., 2002, Carninci et al., in press and Bono et al., 2002a).
Y. Hayashizaki / Mechanisms of Ageing and Development 124 (2003) 93 /102
99
Fig. 3. Riken Integrated Sequence Analyzer (RISA) system. a) RISA Sequencer. This machine is a 384 capillary automated sequencer. Gel filled 384 capillaries, assembled as a cassette, is used for separation of sequencing reaction products. Samples are injected into capillaries electrically and electrophoreses are performed in a thermo-controlled camber. Fluorescences are detected by PMT, and acquired data are sent to a sewer for analysis and storage. b) RISA Automated PCR and thermal cycling reaction system. A newly developed thermal cycler (GS-384) has four 384-well thermal controllable blocks, and thermal cycling programs and operations are performed manually or automatically. The operation of lid can be controlled by computer. This integration of plate handling and reaction is performed automatically. c) RISA plasmid preparator. This machine can prepare 40,000 samples per day based on the filtration method. Each 96-well plate is manipulated sequentially. An automated inoculation machine and a filtration machine are also developed for this system.
At this moment, it is clear that Riken mouse genome encyclopedia is the largest transcriptome described in any organism to date. Analysis of these cDNAs extends known gene families and identifies new ones. 3.11. DNA microarray 3.11.1. Construction of high-throughput arrayer A new arrayer has been constructed, having two arms, each of which holds a pin head, and a large stage on which 96 microarrays can be prepared simultaneously. When a 16-pin head is adopted, 96 microarrays of 30 k cDNAs can be done in 100 h. A 48-pin head device allowing faster mode is also available. Maximum
performance is expected to be 96 microarrays of 30 k cDNAs in 16 h.
3.11.2. 19K RIKEN microarray to 40K RIKEN microarray We have established 19K microarrays of RIKEN cDNAs. Expression profiles of various tissues and several developmental stages have been investigated (Miki et al., 2001). Microarray data are analyzed and stored in our expression database, READ (Bono et al., 2002b). Nineteen thoudsand genes were analyzed in 20 tissues. This database serves as a fundamental resource for the functional researches of each cDNA and each
100
Y. Hayashizaki / Mechanisms of Ageing and Development 124 (2003) 93 /102
gene cascade. The size of the database is expanded to 40K /20 tissues matrix. 3.12. Development of PPI analysis system To uncover the function of each gene as a systematic genome-wide approach, the PPI panel covering all genes (Suzuki et al., 2001), is very important. PPIs play pivotal roles in the network of cellular biological processes and also they should be potential targets for drugs developments. However, it seems not to be so easy to establish entire PPI panel in mouse, because the estimated total number of mouse genes of 100,000 is far larger than those of budding yeast (/6000) and C. elegans (/ 20,000). To address this difficulty, we have developed a highthroughput PPI assay system that consists of a PCRmediated sample preparation and a modified mammalian two-hybrid method. In the pilot study, the system achieved the examination of more than 106 combinations per day. We have also developed a selection method of assay samples allowing us to find significantly interacting combination efficiently, which was based on the demonstration that two genes co-expressed in the same tissues at the same stages preferentially interact each other. These two key developments paved a way to enable us completion of a rough draft of entire PPI panel in mouse within a few years.
on all full-length cDNA, their primary structures, and expression sites. Also important is the chromosomal mapping of the cDNAs at sequencing level in order to connect the gene and the phenotype. We have started developing a system to establish such as encyclopedia using a full-length cDNA system and the RISA sequencing system. To overcome possible resource problems, we chose mouse materials from inbred, congenic and knockout systems, which are available with no limitation for the preparation of tissues such as samples at very early embryonic stages and fertilized eggs. Predictions of human full-length cDNA sequences in-silico can be done by homology search in comparison with our mouse full-length cDNA (Kondo et al., 2001; Lander et al., 2001). This is the significance of our strategy of choosing mouse cDNA as a target. We have begun collecting mouse full-length cDNA and are finding our approach an extremely powerful one for analyzing and explaining why certain genes cause a phenotype. To connect gene(s) and phenotype(s), our encyclopedia is very useful for identifying candidate gene(s) responsible for the phenotype(s) in the positional candidate approach. The cDNA microarrays are also useful for selecting a set of genes, which are transcriptionally regulated downstream of the target gene, using mutant and normal tissues. We plan to continue development of the mouse genome encyclopedia and the technologies to establish it and make it widely useful in order to enable the depiction of genome-wide maps from gene(s) to phenotype.
4. Application of cDNA system to other projects 4.1. Arabidopsis thaliana full-length cDNA project
Acknowledgements
The Arabidopsis thaliana Genome Project of Plant Molecular Biology Laboratory is to be finished in this academic year. To determine the chromosomal locations, and expression profiles of this plant-model organism our group is collaborating with the Plant Molecular Biology Laboratory with our cDNA cloning technique (Seki et al., 1998, 2001). At the moment 115,000 cDNA clones were constructed based on 3?-end sequences of cDNAs (Seki et al., 2002). Fifteen thousand full sequences at 99.99% accuracy will be finished sequences very soon as an international collaboration. All of these clones were mapped onto Arabidopsis thaliana genome sequence that was published at the end of the last year.
This study has been supported by Research Grant for the RIKEN Genome Exploration Research Project from the Ministry of Education, Culture, Sports, Science and Technology of the Japanese Government to Y.H.
5. Future plan Our final goal is to establish a system for genomewide understanding of biological phenomena at the molecular level, particularly in the medical field. In order to achieve this goal, the first step is to collect data
References Bono, H., Kasukawa, T., Furuno, M., Hayashizaki, Y., Okazaki, Y., 2002a. FANTOM DB: database of functional annotation of RIKEN mouse cDNA clones. Nucl. Acids Res. 30, 116 /118. Bono, H., Kasukawa, T., Furuno, M., Hayashizaki, Y., Okazaki, Y., 2002b. READ: RIKEN expression array database. Nucl. Acids Res. 30, 211 /213. Carninci, P., Kvam, C., Kitamura, A., Ohsumi, T., Okazaki, Y., Itoh, M., Kamiya, M., Shibata, K., Sasaki, N., Izawa, M., Muramatsu, M., Hayashizaki, Y., Schneider, C., 1996. High-efficiency fulllength cDNA cloning by biotinylated CAP trapper. Genomics 37, 327 /336. Carninci, P., Westover, A., Nishiyama, Y., Ohsumi, T., Itoh, M., Nagaoka, S., Sasaki, N., Okazaki, Y., Muramatsu, M., Schneider, C., Hayashizaki, Y., 1997. High efficiency selection of full-length cDNA by improved biotinylated cap trapper. DNA Res. 4, 61 /66.
Y. Hayashizaki / Mechanisms of Ageing and Development 124 (2003) 93 /102 Carninci, P., Nishiyama, Y., Westover, A., Itoh, M., Nagaoka, S., Sasaki, N., Okazaki, Y., Muramatsu, M., Hayashizaki, Y., 1998. Thermostabilization and thermoactivation of thermolabile enzymes by trehalose and its application for the synthesis of full length cDNA. Proc. Natl. Acad. Sci. USA 95, 520 /524. Carninci, P., Shibata, Y., Hayatsu, N., Sugahara, Y., Shibata, K., Itoh, M., Konno, H., Okazaki, Y., Muramatsu, M., Hayashizaki, Y., 2000. Normalization and subtraction of cap-trapper-selected cDNAs to prepare full-length cDNA libraries for rapid discovery of new genes. Genome Res. 10, 1617 /1630. Carninci, P., Shibata, Y., Hayatsu, N., Itoh, M., Shiraki, T., Hirozane, T., Watahiki, A., Shibata, K., Konno, H., Muramatsu, M., Hayashizaki, Y., 2001. Balanced-size and long-size cloning of full-length, cap-trapped cDNAs into vectors of the novel l-FLC family allows enhanced gene discovery rate and functional analysis. Genomics 77, 79 /90. Carninci, P., Shiraki, T., Mizuno, Y., Muramatsu, M., Hayashizaki, Y., 2002. Extra-long first-strand cDNA synthesis. Biotechniques, in press. Itoh, M., Carninci, P., Nagaoka, S., Sasaki, N., Okazaki, Y., Ohsumi, T., Muramatsu, M.and, Hayashizaki, Y., 1997. Simple and rapid preparation of plasmid template by a filtration method using microtiter filter plates. Nucl. Acids Res. 25, 1315 /1316. Itoh, M., Kitsunai, T., Akiyama, J., Shibata, K., Izawa, M., Kawai, J., Tomaru, Y., Carninci, P., Shibata, Y., Ozawa, Y., Muramatsu, M., Okazaki, Y., Hayashizaki, Y., 1999. Automated filtration-based high-throughput plasmid preparation system. Genome Res. 9, 463 /470. Izawa, M., Sasaki, N., Watahiki, M., Ohara, E., Yoneda, Y., Muramatsu, M., Okazaki, Y., Hayashizaki, Y., 1998. Recognition sites of 3?-OH group by T7 RNA polymerase and its application to transcriptional sequencing. J. Biol. Chem. 273, 14 242 /14 246. Kawai, J., Shinagawa, A., Shibata, K., Yoshino, M., Itoh, M., Ishii, Y., Arakawa, T., Hara, A., Fukunishi, Y., Konno, H., Adachi, J., Fukuda, S., Aizawa, K., Izawa, M., Nishi, K., Kiyosawa, H., Kondo, S., Yamanaka, I., Saito, T., Okazaki, Y., Gojobori, T., Bono, H., Kasukawa, T., Saito, R., Kadota, K., Matsuda, H., Ashburner, M., Batalov, S., Casavant, T., Fleischmann, W., Gaasterland, T., Gissi, C., King, B., Kochiwa, H., Kuehl, P., Lewis, S., Matsuo, Y., Nikaido, I., Pesole, G., Quackenbush, J., Schriml, L.M., Staubli, F., Suzuki, R., Tomita, M., Wagner, L., Washio, T., Sakai, K., Okido, T., Furuno, M., Aono, H., Baldarelli, R., Barsh, G., Blake, J., Boffelli, D., Bojunga, N., Carninci, P., de Bonaldo, M.F., Brownstein, M.J., Bult, C., Fletcher, C., Fujita, M., Gariboldi, M., Gustincich, S., Hill, D., Hofmann, M., Hume, D.A., Kamiya, M., Lee, N.H., Lyons, P., Marchionni, L., Mashima, J., Mazzarelli, J., Mombaerts, P., Nordone, P., Ring, B., Ringwald, M., Rodriguez, I., Sakamoto, N., Sasaki, H., Sato, K., Schonbach, C., Seya, T., Shibata, Y., Storch, K.F., Suzuki, H., Toyo-oka, K., Wang, K.H., Weitz, C., Whittaker, C., Wilming, L., Wynshaw-Boris, A., Yoshida, K., Hasegawa, Y., Kawaji, H., Kohtsuki, S., Hayashizaki, Y., 2001. Functional annotation of 21,076 sequenced mouse cDNAs prepared from full-length enriched libraries. Nature 409, 685 /690. Kadota, K., Miki, R., Bono, H., Shimizu, K., Okazaki, Y., Hayashizaki, Y., 2001. Preprocessing implementation for microarray (PIRM): an efficient method for processing cDNA microarray data. Physiol. Genomics 4, 183 /188. Kondo, S., Shinagawa, A., Saito, T., Kiyosawa, H., Yamanaka, I., Aizawa, K., Fukuda, S., Hara, A., Itoh, M., Kawai, J., Shibata, K., Hayashizaki, Y., 2001. Computational analysis of full-length mouse cDNA compared with human genome sequences. Mamm. Genome 12, 673 /677. Konno, H., Fukunishi, Y., Shibata, K., Itoh, M., Carninci, P., Sugahara, Y., Hayahizaki, Y., 2001. Computer-based methods for the mouse full-length cDNA encyclopedia: real-time sequence
101
clustering for construction of a nonredundant cDNA library. Genome Res. 11, 281 /289. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P., McKernan, K., Meldrim, J., Mesirov, J.P., Miranda, C., Morris, W., Naylor, J., Raymond, C., Rosetti, M., Santos, R., Sheridan, A., Sougnez, C., Stange-Thomann, N., Stojanovic, N., Subramanian, A., Wyman, D., Rogers, J., Sulston, J., Ainscough, R., Beck, S., Bentley, D., Burton, J., Clee, C., Carter, N., Coulson, A., Deadman, R., Deloukas, P., Dunham, A., Dunham, I., Durbin, R., French, L., Grafham, D., Gregory, S., Hubbard, T., Humphray, S., Hunt, A., Jones, M., Lloyd, C., McMurray, A., Matthews, L., Mercer, S., Milne, S., Mullikin, J.C., Mungall, A., Plumb, R., Ross, M., Shownkeen, R., Sims, S., Waterston, R.H., Wilson, R.K., Hillier, L.W., McPherson, J.D., Marra, M.A., Mardis, E.R., Fulton, L.A., Chinwalla, A.T., Pepin, K.H., Gish, W.R., Chissoe, S.L., Wendl, M.C., Delehaunty, K.D., Miner, T.L., Delehaunty, A., Kramer, J.B., Cook, L.L., Fulton, R.S., Johnson, D.L., Minx, P.J., Clifton, S.W., Hawkins, T., Branscomb, E., Predki, P., Richardson, P., Wenning, S., Slezak, T., Doggett, N., Cheng, J.F., Olsen, A., Lucas, S., Elkin, C., Uberbacher, E., Frazier, M., Gibbs, R.A., Muzny, D.M., Scherer, S.E., Bouck, J.B., Sodergren, E.J., Worley, K.C., Rives, C.M., Gorrell, J.H., Metzker, M.L., Naylor, S.L., Kucherlapati, R.S., Nelson, D.L., Weinstock, G.M., Sakaki, Y., Fujiyama, A., Hattori, M., Yada, T., Toyoda, A., Itoh, T., Kawagoe, C., Watanabe, H., Totoki, Y., Taylor, T., Weissenbach, J., Heilig, R., Saurin, W., Artiguenave, F., Brottier, P., Bruls, T., Pelletier, E., Robert, C., Wincker, P., Smith, D.R., DoucetteStamm, L., Rubenfield, M., Weinstock, K., Lee, H.M., Dubois, J., Rosenthal, A., Platzer, M., Nyakatura, G., Taudien, S., Rump, A., Yang, H., Yu, J., Wang, J., Huang, G., Gu, J., Hood, L., Rowen, L., Madan, A., Qin, S., Davis, R.W., Federspiel, N.A., Abola, A.P., Proctor, M.J., Myers, R.M., Schmutz, J., Dickson, M., Grimwood, J., Cox, D.R., Olson, M.V., Kaul, R., Raymond, C., Shimizu, N., Kawasaki, K., Minoshima, S., Evans, G.A., Athanasiou, M., Schultz, R., Roe, B.A., Chen, F., Pan, H., Ramser, J., Lehrach, H., Reinhardt, R., McCombie, W.R., de la Bastide, M., Dedhia, N., Blocker, H., Hornischer, K., Nordsiek, G., Agarwala, R., Aravind, L., Bailey, J.A., Bateman, A., Batzoglou, S., Birney, E., Bork, P., Brown, D.G., Burge, C.B., Cerutti, L., Chen, H.C., Church, D., Clamp, M., Copley, R.R., Doerks, T., Eddy, S.R., Eichler, E.E., Furey, T.S., Galagan, J., Gilbert, J.G., Harmon, C., Hayashizaki, Y., Haussler, D., Hermjakob, H., Hokamp, K., Jang, W., Johnson, L.S., Jones, T.A., Kasif, S., Kaspryzk, A., Kennedy, S., Kent, W.J., Kitts, P., Koonin, E.V., Korf, I., Kulp, D., Lancet, D., Lowe, T.M., McLysaght, A., Mikkelsen, T., Moran, J.V., Mulder, N., Pollara, V.J., Ponting, C.P., Schuler, G., Schultz, J., Slater, G., Smit, A.F., Stupka, E., Szustakowski, J., Thierry-Mieg, D., Thierry-Mieg, J., Wagner, L., Wallis, J., Wheeler, R., Williams, A., Wolf, Y.I., Wolfe, K.H., Yang, S.P., Yeh, R.F., Collins, F., Guyer, M.S., Peterson, J., Felsenfeld, A., Wetterstrand, K.A., Patrinos, A., Morgan, M.J., Szustakowki, J., de Jong, P., Catanese, J.J., Osoegawa, K., Shizuya, H., Choi, S., Chen, Y.J., 2001. Initial sequencing and analysis of the human genome. Nature 409, 860 / 921. Miki, R., Kadota, K., Bono, H., Mizuno, Y., Tomaru, Y., Carninci, P., Itoh, M., Shibata, K., Kawai, J., Konno, H., Watanabe, S., Sato, K., Tokusumi, Y., Kikuchi, N., Ishii, Y., Hamaguchi, Y., Nishizuka, I., Goto, H., Nitanda, H., Satomi, S., Yoshiki, A., Kusakabe, M., DeRisi, J.L., Eisen, M.B., Iyer, W.R., Brown, P.O., Muramatsu, M., Shimada, H., Okazaki, Y., Hayashizaki, Y., 2001. Delineating developmental and metabolic pathways in vivo by expression profiling using the RIKEN set of 18,816 full-length enriched mouse cDNA arrays. Proc. Natl. Acad. Sci. USA 98, 2199 /2204.
102
Y. Hayashizaki / Mechanisms of Ageing and Development 124 (2003) 93 /102
Saito, R., Suzuki, H., Hayashizaki, Y., 2002. Interaction generality, a measurement to assess reliability of protein /protein interaction. Nucl. Acids Res. 30, 1163 /1168. Sasaki, N., Izawa, M., Watahiki, M., Ozawa, K., Tanaka, T., Yoneda, Y., Matsuura, S., Carninci, P., Muramatsu, M., Okazaki, Y., Hayashizaki, Y., 1998a. Transcriptional sequencing: a method for DNA sequencing using RNA polymerase. Proc. Natl. Acad. Sci. USA 95, 3455 /3460. Sasaki, N., Izawa, M., Sugahara, Y., Tanaka, T., Watahiki, M., Ozawa, K., Ohrara, E., Funaki, H., Yoneda, Y., Matsuura, S., Muramatsu, M., Okazaki, Y., Hayashizaki, Y., 1998b. Identification of stable RNA hairpins causing band compression in transcriptional sequencing and their elimination by use of inosine triphosphate. Gene 222, 17 /24. Seki, M., Carninci, P., Nishiyama, Y., Hayashizaki, Y., Shinozaki, K., 1998. High-efficiency cloning of Arabidopsis full-length cDNA by biotinylated CAP trapper. Plant J. 15, 707 /720. Seki, M., Narusaka, M., Abe, H., Kasuga, M., Yamaguchi-Shinozaki, K., Carninci, P., Hayashizaki, Y., Shinozaki, K., 2001. Monitoring the expression pattern of 1300 Arabidopsis genes under drought and cold stresses using full-length cDNA microarray. Plant Cell 13, 61 /72. Seki, M., Naramura, M., Kamiya, A., Ishida, J., Satou, M., Sakurai, T., Nakajima, M., Enju, A., Akiyama, K., Oono, Y., Muramatsu, M., Hayashizaki, Y., Arakawa, T., Shibata, K., Shinagawa, A., Shinozaki, K., 2002. Functional annotation of a full-length Arabidopsis cDNA collection. Science 296, 141 /145. Shibata, K., Itoh, M., Aizawa, K., Nagaoka, S., Sasaki, N., Carninci, P., Konno, H., Akiyama, J., Nishi, K., Kitsunai, T., Tashiro, H., Itoh, M., Sumi-Kikuchi, N., Ishii, Y., Nakamura, S., Hazama, M., Nishine, T., Harada, A., Yamamoto, R., Matsumoto, H., Saka-
guchi, S., Ikegami, T., Kashiwagi, K., Fujiwake, S., Inoue, K., Togawa, Y., Izawa, M., Ohara, E., Watahiki, M., Yoneda, Y., Ishikawa, T., Ozawa, K., Tanaka, T., Matsuura, S., Kawai, J., Okazaki, Y., Muramatsu, M., Inoue, Y., Hayashizaki, Y., 2000. RIKEN integrated sequence analysis (RISA) system */384-format sequencing pipeline with 384 multicapillary sequencer. Genome Res. 10, 1757 /1771. Shibata, Y., Carninci, P., Watahiki, A., Shiraki, T., Konno, H., Muramatsu, M., Hayashizaki, Y., 2001a. Cloning full-length, captrapper-selected cDNAs by using the single-strand linker ligation method. Biotechniques 30, 1250 /1254. Shibata, Y., Carninci, P., Sato, K., Hayatsu, N., Shiraki, T., Ishii, Y., Arakawa, T., Hara, A., Ohsato, N., Izawa, M., Aizawa, K., Itoh, M., Shibata, K., Shinagawa, A., Kawai, J., Ota, Y., Kikuchi, S., Kishimoto, N., Muramatsu, M., Hayashizaki, Y., 2001b. Removal of polyA tails from full-length cDNA libraries for high efficiency sequencing. Biotechniques 31, 1042 /1049. Sugahara, Y., Carninci, P., Itoh, M., Shibata, K., Konno, H., Endo, T., Muramatsu, M., Hayashizaki, Y., 2001. Comparative evaluation of 5?-end-sequence quality of clones in CAP trapper and other full-length-cDNA libraries. Gene 263, 93 /102. Suzuki, H., Fukunishi, Y., Kagawa, I., Bono, H., Saito, R., Oda, H., Endo, T., Kondo, S., Okazaki, Y., Hayashizaki, Y., 2001. Protein / protein interaction panel using mouse full-length cDNAs. Genome Res. 11, 1758 /1765. Yamanaka, I., Kiyosawa, H., Kondo, S., Saito, T., Carninci, P., Shinagawa, A., Aizawa, K., Fukuda, S., Hara, A., Itoh, M., Kawai, J., Shibata, K., Arakawa, T., Ishii, Y., Hayashizaki, Y., 2002. Mapping of 19032 mouse cDNAs on the mouse chromosomes. J. Struct. Funct. Genomics 2, 23 /28.