Review
TRENDS in Biotechnology Vol.19 No.12 December 2001
Embryogenomics: developmental biology meets genomics
Minoru S.H. Ko
Fundamental questions in developmental biology are: what genes are expressed, where and when are they expressed, what is the level of expression, and how are these programs changed by the functional and structural alteration of genes? These questions have been addressed by studying one gene at a time, but a new research field that handles many genes in parallel is emerging. The methodology is at the interface of large-scale genomics approaches and developmental biology. Genomics needs developmental biology because one of the goals of genomics (the collection and analysis of all genes in an organism) cannot be completed without working on embryonic tissues, in which many genes are uniquely expressed. Conversely, developmental biology needs genomics: the high-throughput approaches of genomics generate information about genes and pathways that can give an integrated view of complex processes. This article discusses these new approaches and their applications to mammalian developmental biology.
Minoru S.H. Ko, Developmental Genomics & Aging Section, Laboratory of Genetics, National Institute on Aging, National Institutes of Health, Baltimore, MD 21224-6820, USA. e-mail: KoM@grc.nia.nih.gov
The Human Genome Project, which started in the late 1980s, has recently completed its primary mission by providing nearly complete sequences of the human genome1,2. The emergence and development of several new research fields are interwoven with the rapid progress of the program. The Genome Project has influenced the way that current biological research is conducted, by providing not only the materials and sequence information but also a novel conceptual framework. For example, consideration of the entire genome has become routine in laboratory life, and high-throughput technologies have been transforming entire fields of biology and medicine. Developmental biology could not sidestep the genomics revolution. Even scientists using traditional model organisms, such as the frog, have started to adopt genomic approaches (see http://www.nih.gov/science/models/xenopus/reports/xenopus_report.pdf). Other model organisms, such as the fly and worm, have been at the forefront of the genomics revolution because the complete genomic sequence information for those organisms became available earlier. This new research field, which can be called ‘developmental genomics’3 or ‘embryogenomics’, is emerging from the interface between genomics and developmental biology. Embryogenomics can be defined as the systematic analysis of cohorts of genes expressed during development, aided by large-scale genomic methods. In a broad sense, however, embryogenomics can include all high-throughput approaches such as large-scale mutagenesis using N-ethyl-N-nitrosourea4 (ENU), gene knockout5 and in situ hybridization programs6,7.
http://tibtech.trends.com
The association of developmental biology and genomics is natural. For most developmental biologists, the starting point or focus of research is genes that are differentially expressed among different cell types and tissues. Traditionally, such genes have been identified using a variety of elaborate molecular biological techniques, including subtraction cDNA libraries8, serial analysis of gene expression (SAGE; Ref. 9) and differential display10. The new cDNA microarray and chip technologies11,12 are the most promising, but even the prototype of these approaches was invented and used by developmental biologists nearly 20 years ago8. Early attempts to make a collection of all genes (whole cDNA catalog) were also carried out with a cDNA array application with developmental biology in mind13.
Embryonic materials are essential for the completion of a whole cDNA catalog
Collection of all genes in the form of cDNA clones is central to embryogenomic approaches (Fig. 1). The primary means of achieving this goal have been expressed sequence tag (EST) projects, which comprise single-pass sequencing of randomly picked cDNA clones14 (Fig. 1e). In this approach, genes that are not expressed in the cells and tissues sampled are not represented in cDNA libraries and are therefore not included in EST collections. Because it is reasonable to expect that there are genes that are only expressed during the embryonic and fetal stages, the importance of using embryonic and/or fetal materials should not be underestimated. This is particularly problematic for humans because it is difficult to obtain early embryos for ethical and technical reasons (Fig. 2). The difficulty of obtaining early human embryonic and fetal material is reflected in the current publicly available cDNA clone and EST sets (Table 1).
Ten years of research by many groups has accumulated 3.6 million human ESTs and 2 million mouse ESTs in the public database15 (as of June 2001). For human ESTs, the majority have been generated by the Washington University Genome Sequencing Center (St Louis, MO, USA)16 in a tight collaboration with the IMAGE consortium17, which supplies cDNA clones. Most of the 3.6 million human ESTs have been sampled from adult organs (Table 1). Although some human fetal libraries have been sequenced, the earliest stage of embryos used for extensive sequencing is 8–9 weeks post ovulation, which is not early enough to include crucial developmental events (Table 1, Fig. 2). However, for the mouse, recent efforts have improved the representation of embryonic tissue-derived ESTs in the public database; many mouse ESTs are derived from embryonic stages and ~70 000 ESTs, including our own contribution of 25 000 ESTs, are derived from preimplantation embryos18 (Table 1).
The lack of human embryonic and fetal ESTs might have contributed to the deduction of a low number of human genes (~30 000) in the nearly complete human genome sequence1,2, given that ESTs are an important criterion for gene prediction and verification from genome sequences. The situation could improve in the near future, as a few cDNA libraries have been reported from human preimplantation embryos19,20. However, it is still impossible to obtain human embryos at periimplantation, so there are no reports of libraries from this stage. Alternatively, human embryonic stem (ES) cells21 and embryonic germ (EG) cells22 could be used as substitutes. They represent, or at least resemble, early human embryonic cell types: ES cells represent inner cell mass (ICM) cells in the blastocyst and EG cells represent primordial germ cells (PGCs). Many publications have shown that these cells can differentiate into various embryonic and/or adult cell types in vitro, including neuronal and hematopoietic lineages23. Therefore, it is probable that many new embryo-specific human genes can be recovered by applying embryogenomic approaches to these cell types.
0167-7799/01/$ – see front matter © 2001 Published by Elsevier Science Ltd. PII: S0167-7799(01)01806-6
[Figure 1: flow diagram with panels (a)–(p). Recoverable panel labels: (d) cDNA library; (e) EST project; (f) EST clustering; (g) expression profiling by EST frequency; (h) whole cDNA catalog (~50,000 genes); (i) gene characterization and sequence annotation; (j) cDNA microarray; (k) conventional expression analysis (Northern blotting, RT-PCR); (l) in situ hybridization; (m) proteomics approaches; (n) EST mapping (1, backcross panels; 2, RH panels; 3, genome sequence); (o) complete genome sequence (human, mouse); (p) mutant mice (1, spontaneous; 2, ENU; 3, knockout; 4, transgenic; 5, insertional).]
Fig. 1. The embryogenomics strategy. The first step (d) is to construct high-quality cDNA libraries from various embryonic materials: early embryonic stages such as (a) the preimplantation embryos, (b) microdissected tissues and (c) flow-sorted cells. Then, large-scale single-pass sequencing of either the 3′-end or 5′-end of cDNA clones is performed to generate partial sequence information, that is, (e) ESTs. (f) ESTs are clustered into groups, representing individual genes. (h) A cDNA clone set that represents the minimum number of unique genes can be assembled. (i) These genes are then characterized by complete sequencing and annotation. (g) The frequency of ESTs representing individual genes can also be used as an approximate measure of the expression levels of the genes. (n) ESTs have been mapped on interspecific backcross mouse panels and later on the radiation hybrid panels. They are increasingly mapped by finding corresponding ‘locations’ in genomic sequences in the public sequence database. Readily available cDNA clones and sequences can be immediately used for conventional expression analysis such as Northern blots and (k) RT-PCR, but can also be used as probes for (l) large-scale in situ hybridization and (j) cDNA microarrays. (m) The clones can also be directly applied for proteomics approaches, such as preparation of mutant proteins and raising antibodies. (p) Various mutant mouse resources become available. Expression profiling of these mice and cells using cDNA microarrays provides a powerful route to analyze the function of genes and to untangle the complex genetic pathways.
cDNA libraries, EST project, EST clustering, and minimum cDNA clone sets
One of the challenges is how to construct high quality cDNA libraries from a small amount of material (Fig. 1d). The small size of early embryos and scarcity of material (e.g. only 1 ng of total RNA from one mouse blastocyst) makes it difficult to apply regular cDNA library construction methods, which usually require at least 1 µg of polyA+ RNA. State-of-the-art methods for making full-length cDNA libraries24,25 and normalized (equal representation) cDNA libraries26 require much more RNA (>5–10 µg of polyA+ RNAs) than standard cDNA library construction methods, and thus suffer even more acutely from the same limitation. The use of PCR to amplify cDNA mixtures27–29 before cloning into a vector alleviates this problem but the insert size is customarily short (~0.8–1.5 kb), and not long enough to recover full-length cDNAs of long transcripts. A new protocol can now differentially amplify long tracts (~3.0 kb with size ranges of 1–7 kb) or short DNAs (~1.5 kb with size ranges of 0.5–3 kb) from a complex mixture, with 70% of clones containing a complete open reading frame30. This could alleviate some of the problems of PCR-amplified cDNA libraries in the future. Three types of embryonic materials have been used for the construction of cDNA libraries and EST projects so far (Fig. 1). First, whole mouse embryos from early embryonic stages were used for library constructions and EST projects (Fig. 1a). For example, conventional cDNA libraries with relatively short inserts (~0.7–1.3 kb) were made from
[Figure 2: timeline of mouse development from 0 to 19 days post conception (dpc), divided into preimplantation development, gastrulation and early organogenesis, organogenesis, and fetal growth and development, aligned with the corresponding human developmental stages in days and weeks post ovulation (dpo, wpo): preimplantation period, embryonic period and fetal period.]
Fig. 2. Comparison of mouse and human development. Mouse development in utero lasts for 20 days, whereas human development lasts for 280 days, and the corresponding stages of development are not evenly distributed throughout the gestation period. The mouse spends the first two-thirds of the gestation period (up to 14 days post conception [dpc]) in crucial developmental events such as gastrulation and organogenesis. The remaining time is spent mainly on maturation and growth. By contrast, humans spend only the first one-fifth of gestation (the ‘embryonic period’, up to 8 weeks post ovulation [wpo]) on the comparable crucial developmental events and the remaining long period (the ‘fetal period’) mainly on maturation and growth. Human embryos of the early crucial period are difficult to obtain for both ethical and technical reasons. Although unused materials from IVF clinics have provided human embryos of preimplantation stages (0–6 days), it is almost impossible to obtain human embryos from stages immediately after implantation (6 days to 4 weeks). This makes mouse models invaluable for the study of mammalian development and also for the collection of many mammalian genes in the form of cDNAs. Drawings and staging of mouse embryos were adapted from the textbook by Hogan et al.64, and the comparative staging of human and mouse embryos was obtained from the textbook by Rugh65. The stages of rat development are similar to those of the mouse. Comparative staging of human, rat and mouse development can also be found at the websites http://anatomy.med.unsw.edu.au/cbl/embryo/Embryo.htm and http://www.ana.ed.ac.uk/anatomy/database/humat/. Digital reference images of mouse development are also being constructed66.
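The ‘two-thirds’ and ‘one-fifth’ proportions quoted in the Fig. 2 caption can be checked with a few lines of arithmetic. The sketch below uses only the durations given in the caption; the constant and function names are illustrative, and it assumes that days post conception and days post ovulation can be treated as the same clock for this rough comparison.

```python
# Durations quoted in the Fig. 2 caption.
MOUSE_GESTATION_DAYS = 20
MOUSE_CRUCIAL_END_DPC = 14       # gastrulation and organogenesis completed
HUMAN_GESTATION_DAYS = 280
HUMAN_EMBRYONIC_END_WPO = 8      # end of the 'embryonic period', in weeks


def crucial_fraction(crucial_days, gestation_days):
    """Fraction of gestation spent on the crucial early developmental events."""
    return crucial_days / gestation_days


mouse_fraction = crucial_fraction(MOUSE_CRUCIAL_END_DPC, MOUSE_GESTATION_DAYS)
# Assumption: weeks post ovulation are converted to days with 7 days per week.
human_fraction = crucial_fraction(HUMAN_EMBRYONIC_END_WPO * 7, HUMAN_GESTATION_DAYS)

print(f"mouse: {mouse_fraction:.2f} of gestation")
print(f"human: {human_fraction:.2f} of gestation")
```

With these inputs the mouse value is 14/20 = 0.70, close to two-thirds, and the human value is 56/280 = 0.20, that is, one-fifth.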
5000 unfertilized eggs, 13 500 two-cell embryos, 2778 eight-cell embryos and 600 blastocysts31; these libraries were extensively sequenced32. Another study reports the construction of a PCR-based cDNA library with ~1.35 kb inserts from 300 blastocysts, and the generation of 3995 ESTs (Ref. 32). A more recent study reports 25 438 ESTs generated from PCR-based cDNA libraries (with an average insert size of 1.5 kb) that were made from 1528 unfertilized eggs, 1137 fertilized eggs, 397 two-cell embryos, 32 four-cell embryos, 230 sixteen-cell embryos and 40 blastocysts18. Second, specific tissues were microdissected from mouse embryos and used for library construction and EST projects (Fig. 1b). For example, three distinct embryonic tissues (endoderm, mesoderm and ectoderm) and primitive streak regions were microdissected from 7.5 days post coitum (dpc) mouse embryos and used for the construction of conventional cDNA libraries34. Similarly, the extraembryonic portion of 7.5 dpc mouse embryos was dissected out and used for the construction of conventional cDNA libraries and EST projects35.
Third, by tagging specific embryonic cells with β-galactosidase or green fluorescent protein (GFP), these cells can be flow-sorted and highly purified for further analysis such as cDNA library construction and gene expression profiling36 (Fig. 1c). This can be achieved by making a transgenic mouse with β-galactosidase or GFP driven by a tissue- or cell-type-specific enhancer and promoter. Alternatively, a bacterial artificial chromosome (BAC) clone containing a whole genomic region can be engineered to replace the first exon of a gene with GFP or β-galactosidase37. Transgenic mouse lines carrying such engineered BACs usually show expression of GFP or β-galactosidase that is faithful to the original expression pattern of the gene37. One can also use a knock-in mouse in which GFP or β-galactosidase is inserted into the coding region of the gene to disrupt its function38.
Once high-quality cDNA libraries are made, large-scale single-pass sequencing of either the 3′- or 5′-ends of cDNA clones is performed to generate ESTs (Fig. 1e, Box 1). According to sequence similarity, ESTs are then clustered into groups that probably represent individual genes39–41 (Fig. 1f). Based on these EST clusters, a cDNA clone set that represents the minimum number of unique genes can be assembled (Fig. 1h, Box 1). These genes are then characterized by complete sequencing and annotation (e.g. protein structural motifs) with various informatics tools (Fig. 1i). Several cDNA clone sets have now become available (Boxes 1,2).
Gene expression profiling based on the frequency of EST appearance
It has been shown that the EST frequency of individual genes approximately corresponds to the expression levels of genes42. Also, a report on the cloning of a new gene is often accompanied by in silico expression data, which are obtained by simply counting the frequency of corresponding ESTs. Several websites include EST frequency analysis as a measure of expression levels of genes in tissues40–43 (e.g. SANBI, http://www.sanbi.ac.za; TIGR, http://www.tigr.org/tdb/mgi; BodyMap, http://bodymap.ims.u-tokyo.ac.jp and CGAP, http://cgap.nci.nih.gov/Tissues). However, the use of EST data in the public database for this purpose is limited because a large fraction of ESTs are derived from normalized cDNA libraries44 in which the ratio of mRNA species is not maintained in the EST frequencies. Another uncertainty arises because the number of ESTs in each cell type or tissue is much smaller than those customarily obtained in a SAGE project because each sequencing only provides one tag. Although statistically valid criteria for interpreting EST frequency data have been established45, the small number of gene tags makes it difficult to argue the significance of data, when there are statistical fluctuations in cDNA clone sampling. EST frequency
Table 1. Libraries and ESTs in current dbEST (as of 1 June 2001)
Species  Preimplantation  Embryo   Fetus    Neonate  Adult      Others and/or unclassified  Total
Human    0                38 781   484 618  83 583   2 393 541  510 372                     3 510 895
Mouse    71 927           310 454  154 604  157 458  1 177 986  107 979                     1 980 408
Rat      0                30 878   8852     0        126 499    119 110                     285 339
The table was compiled from the numbers of ESTs organized by cDNA library in the UniGene database of NCBI (‘Library Browser’, http://www.ncbi.nlm.nih.gov/UniGene/). Each cDNA library was assigned to a stage of development according to Figure 2. The ‘Others and/or unclassified’ category includes cultured cells.
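The imbalance that Table 1 documents is easier to see as proportions. The following is a minimal sketch that copies the counts from Table 1 and computes the share of ESTs per developmental stage for each species; the names `STAGES`, `TABLE_1` and `stage_shares` are illustrative and not part of any database interface.

```python
# EST counts per developmental stage, copied from Table 1 (dbEST, 1 June 2001).
STAGES = ["Preimplantation", "Embryo", "Fetus", "Neonate", "Adult",
          "Others/unclassified"]
TABLE_1 = {
    "Human": [0, 38_781, 484_618, 83_583, 2_393_541, 510_372],
    "Mouse": [71_927, 310_454, 154_604, 157_458, 1_177_986, 107_979],
    "Rat":   [0, 30_878, 8_852, 0, 126_499, 119_110],
}


def stage_shares(counts):
    """Return the fraction of a species' ESTs that falls in each stage."""
    total = sum(counts)
    return {stage: n / total for stage, n in zip(STAGES, counts)}


for species, counts in TABLE_1.items():
    shares = stage_shares(counts)
    early = shares["Preimplantation"] + shares["Embryo"]
    print(f"{species}: total={sum(counts)}, "
          f"preimplantation+embryo={early:.1%}, adult={shares['Adult']:.1%}")
```

On these numbers, preimplantation and embryonic libraries account for about 1% of human ESTs but about 19% of mouse ESTs, which is the gap discussed in the text.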
Box 1. Mouse EST projects and unique cDNA clone sets
Mouse EST projects
(1) Washington University School of Medicine, Genome Sequencing Center (St Louis, MO, USA, http://genome.wustl.edu/est/mouse_esthmpg.html): cDNA clones from the IMAGE Consortium (ref. a), Cancer Genome Anatomy Project (CGAP; ref. b) and Mammalian Gene Collection (MGC; focus on full-length cDNA clones; ref. c).
(2) RIKEN Genomic Sciences Center, Mouse Encyclopedia Project (Yokohama, Japan, http://genome.rtc.riken.go.jp/): focus on full-length cDNA clones.
(3) National Institute on Aging, NIH, Mouse cDNA Project (Baltimore, MD, USA, http://lgsun.grc.nia.nih.gov/cDNA/cDNA.html): focus on genes expressed in early embryonic tissues and stem cells.
Mouse cDNA clone sets
(1) Research Genetics (http://www.resgen.com/index.php3): IMAGE mouse clones (>10 000 clones); mouse brain (BMAP) clones (11 136 clones).
(2) Incyte/GenomeSystems (http://www.incyte.com/index.shtml): Mouse GEM 1 clones (8734 IMAGE clones); Mouse GEM 2 clones (9448 IMAGE clones).
(3) Lion Bioscience AG (http://www.lionbioscience.com): Mouse arrayTAG and arrayBASE (20 000 cDNA clones).
(4) RIKEN Mouse Encyclopedia Project (http://genome.rtc.riken.go.jp/): Mouse 21K cDNA clone set (ref. d).
(5) National Institute on Aging, NIH (http://lgsun.grc.nia.nih.gov/cDNA/cDNA.html): NIA mouse 15K cDNA clone set (ref. e; Box 2).
References
a Lennon, G. et al. (1996) The I.M.A.G.E. Consortium: an integrated molecular analysis of genomes and their expression. Genomics 33, 151–152
b Strausberg, R.L. et al. (1997) New opportunities for uncovering the molecular basis of cancer. Nat. Genet. 15, 415–416
c Strausberg, R.L. et al. (1999) The mammalian gene collection. Science 286, 455–457
d Kawai, J. et al. (2001) Functional annotation of a full-length mouse cDNA collection. Nature 409, 685–690
e Tanaka, T.S. et al. (2000) Genome-wide expression profiling of mid-gestation placenta and embryo using a 15,000 mouse developmental cDNA microarray. Proc. Natl. Acad. Sci. U. S. A. 97, 9127–9132
as a measure of gene expression levels will eventually be replaced by cDNA microarray data. However, if the amount of material is limiting for gene expression profiling by cDNA microarrays, this is still a first-order method to monitor global changes of gene expression. In one representative case, EST frequencies from a total of 25 000 ESTs
from all stages of preimplantation embryos have revealed that some genes show stage-specific expression patterns18.
Gene expression profiling based on cDNA microarray
Because several excellent reviews on cDNA microarrays are available11,12, only the application to mammalian developmental biology will be discussed here. First, cDNA microarrays can be used to identify genes that are differentially expressed between different cell types. To this end, the method of tagging specific cell types with GFP or β-galactosidase36 can be used to purify cells from developing embryos and subject them to cDNA microarray analyses. This approach has been used to analyze genes expressed during mouse PGC development36. Second, cDNA microarrays can be used to look at global differences between embryos at different developmental stages, or between different tissues from embryos. For example, differential gene expression among endoderm, mesoderm, ectoderm and primitive streak of 7.5 dpc embryos has been studied using high-density cDNA arrays34. Global gene expression profiling of the mid-gestation placenta and embryo has identified many novel placenta-specific genes46. Patterns of gene expression during the development of the central nervous system have also been studied47. Although this type of analysis cannot provide much insight into the function of genes or the mechanisms underlying developmental phenomena, it can suggest routes to identify cell-type-specific markers and provide the starting gene materials for more mechanism-oriented research. To understand gene function or dissect a genetic pathway underlying developmental phenomena, cDNA microarray studies require the manipulation of the experimental system (Fig. 1p). For example, by monitoring the expression levels of all genes after overexpressing or knocking out a specific gene in the cells or embryos48–50, the genes downstream of that gene can potentially be identified. These types of experiments might eventually delineate transcriptional cascades and networks during development and can cumulatively lead to fundamental principles.
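The perturbation experiment described above (compare expression of all genes before and after knocking out or overexpressing one gene) can be sketched as a simple log-ratio screen. This is an illustrative outline only: the gene names and intensities are invented, the twofold cutoff is an assumed threshold, and `downstream_candidates` is a hypothetical helper, not a method taken from the cited studies.

```python
import math


def log2_ratios(control, perturbed):
    """log2(perturbed / control) for every gene measured in both samples."""
    return {gene: math.log2(perturbed[gene] / control[gene])
            for gene in control if gene in perturbed}


def downstream_candidates(control, perturbed, perturbed_gene, min_abs_log2=1.0):
    """Genes (other than the manipulated one) whose expression changes at
    least 2**min_abs_log2-fold, strongest change first."""
    ratios = log2_ratios(control, perturbed)
    hits = [g for g, r in ratios.items()
            if g != perturbed_gene and abs(r) >= min_abs_log2]
    return sorted(hits, key=lambda g: -abs(ratios[g]))


# Hypothetical array intensities before and after knocking out geneA.
control = {"geneA": 1000.0, "geneB": 800.0, "geneC": 500.0, "geneD": 1200.0}
knockout = {"geneA": 50.0, "geneB": 190.0, "geneC": 510.0, "geneD": 2500.0}

print(downstream_candidates(control, knockout, perturbed_gene="geneA"))
```

In this toy data set geneB (about fourfold down) and geneD (about twofold up) are reported as downstream candidates, while geneC is unchanged. A single comparison like this cannot separate direct from indirect targets, which is why the text speaks of delineating cascades cumulatively.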
For example, it is reasonable to assume that if a gene ‘A’ functions upstream of a gene ‘B’ in the same regulatory pathway, the overexpression or disruption of gene
Box 2. National Institute on Aging mouse 15K cDNA clone set
To facilitate cDNA microarray work in mouse models, a set of 15 247 unique cDNA clones was assembled at the National Institute on Aging (Baltimore, MD, USA) based on 52 374 3′-expressed sequence tags (3′-ESTs) that are primarily derived from early embryonic cDNA libraries (ref. a). The sources include preimplantation embryos from unfertilized eggs to blastocyst (ref. b), embryonic day (E)7.5 embryos (ref. c), E12.5 female mesonephros and gonad, and newborn ovary. The average insert size of these oligo(dT)-primed cDNA clones is 1.5 kb. The clone set has been freely distributed since May 2000 and is in use at many research institutions (http://lgsun.grc.nia.nih.gov/cDNA/15k.html). The clone set was first distributed to nine academic centers (five in the USA, one in Canada, one in the UK, one in Japan and one in Italy) on the condition that these centers replicate and redistribute the clone set to at least ten places. All cDNA clones were resequenced from the 5′- and 3′-ends to verify the identity of cDNA clones and to obtain sequence information for protein-coding regions. Approximately 91% of clones were thus sequence-verified. In addition to many genes that are unique to early mouse development, nearly 7500 show significant homology to known genes (ref. d). Assembly of an additional set of 11 000 cDNA clones, which will make a total 26K set, is in progress.
References
a Tanaka, T.S. et al. (2000) Genome-wide expression profiling of mid-gestation placenta and embryo using a 15,000 mouse developmental cDNA microarray. Proc. Natl. Acad. Sci. U. S. A. 97, 9127–9132
b Ko, M.S.H. et al. (2000) Large-scale cDNA analysis reveals phased gene expression patterns during preimplantation mouse development. Development 127, 1737–1749
c Ko, M.S.H. et al. (1998) Genome-wide mapping of unselected transcripts from extraembryonic tissue of 7.5-day mouse embryos reveals enrichment in the t-complex and under-representation on the X chromosome. Hum. Mol. Genet. 7, 1967–1978
d Kargul, G.J. et al. (2001) Verification and initial annotation of the NIA mouse 15K cDNA clone set. Nat. Genet. 28, 17–18
‘A’ will cause more significant changes in overall gene expression patterns than the overexpression or disruption of gene ‘B’. By repeating this type of experiment for many genes, the place of genes in a regulatory pathway can be assessed. Global gene expression profiling using cDNA microarrays will also be useful for extracting characteristic features of cells and tissues that can then be used as unique fingerprints. This will be particularly useful for studies of cell differentiation in vivo and in vitro. For example, ES cells can now be differentiated into a variety of cell types, but the identification of specific differentiated cell types has been done using just a few markers. Microarrays will provide more discriminating fingerprints of each cell type. This feature might help to characterize the phenotype of mouse mutants generated by large-scale ENU mutagenesis programs4. One can even anticipate that the identity of a mutated gene might be deduced from differential expression profiles. One of the most attractive features of gene expression profiling based on cDNA microarrays is that it can provide a snapshot of cellular gene expression, and capture even transient changes in cells. This might provide, for the first time, a route
to recognize genes that show a phenotype only by acting as a group. One can recall, for example, how cellular oncogenes were discovered by simple gene transfection experiments because a single gene could transform cells into easily identifiable tumor cells. If an unknown combination of two genes had been required to achieve this overt phenotypic change in cells, they would not have been identified. With the capability to monitor expression of nearly all genes, the cDNA microarray might be able to identify such a combination of genes. Suppose that we want to find genes that convert cell type ‘A’ to cell type ‘B’: global gene expression patterns of these cell types can initially be defined by using cDNA microarray analysis. After overexpressing or knocking out a candidate gene in cell ‘A’, global gene expression patterns are monitored and compared with those of cell ‘A’ and cell ‘B’. It should be possible to identify a gene that can make cell ‘A’ look more similar to cell ‘B’ based on global gene expression patterns, even without any visible phenotypic change in the cells. Then, another gene can be overexpressed in the same cells that have been altered by the first gene, and the global gene expression monitored again. In this manner, multi-step transitions of cellular gene expression status can be followed using cDNA microarrays.
Problems of applying cDNA microarray technology to developmental biology
In spite of obvious applications of cDNA microarrays to mammalian developmental biology, only a small number of articles on this topic have been published. What are the obstacles to wider use of cDNA microarray technology in developmental biology? First, extensive and wide use of cDNA microarrays is currently impossible because of their limited supply and high cost. Therefore, many researchers choose to focus on specialized arrays containing only a small number of genes, which can be ‘home-made’, and are cheaper and available in large quantities. Although valuable information can still be obtained, it is obvious that a small array will miss the unexpected expression changes of other genes that could be more interesting than those of the few genes on the array. Ideally, all mammalian genes (currently estimated at ~30 000–50 000), should be on a cDNA microarray. Such a complete cDNA clone set or sequence information is not yet available but we can expect the situation to improve within a few years with the emergence of many new microarray facilities and the recruitment of commercial vendors who can provide high-quality standardized cDNA microarrays in large quantities at a lower price. Second, large amounts of RNA are required for cDNA microarray studies. Standard protocols of cDNA microarray hybridizations require at least
1 µg of mRNA (or an equivalent amount of total RNA). This is a difficulty for developmental biologists because it is practically impossible to obtain a large amount of cell and tissue material from developing mammalian embryos. Although methods to amplify mRNAs from single cells51,52 or small amounts of material53 have been reported, further development is required to make such methods routine and dependable. A third problem is the reproducibility and reliability of microarray results. Triplicate hybridizations are needed to obtain a statistically validated expression data set46. Student’s t-test has been used to identify groups of genes that show statistically significant differences in expression levels between two samples. Most genes identified in this manner have shown good correlations to data obtained by Northern blot hybridization46. Other groups have also shown the importance of repetitions in microarray data analysis54. However, these statistical analyses eliminate two groups of genes from further consideration: one is excluded by not having reproducible results in triplicate (poor data) and the other is excluded by not showing statistically significant differences (no change in expression). This is not desirable because genes that show no change in expression are equally important in understanding what is occurring in cells. Development of a new computational method to take these groups of genes into account will be important. A fourth problem is that, although the public database has begun to incorporate many cDNA microarray data (e.g. http://www.ncbi.nlm.nih.gov/geo/), gene expression data obtained by different groups using different cDNA microarrays or oligonucleotide chips are generally not comparable. The same gene will show different expression levels on different cDNA microarrays if the length and region of the gene covered by cDNA clones are different.
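The triplicate screening discussed under the third problem, and the three groups of genes it produces, can be illustrated with a small sketch. It assumes log2-scale intensities from exactly three hybridizations per sample, uses the pooled-variance Student's t statistic, and compares it with 2.776, the two-sided 5% critical value for 4 degrees of freedom. The reproducibility cutoff `max_sd`, the function names and the example values are assumptions made for illustration, not the criteria used in Ref. 46.

```python
from statistics import mean, stdev, variance

T_CRIT_DF4 = 2.776  # two-sided 5% critical value of Student's t, 4 d.f.


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if pooled == 0:
        # Identical replicates: any difference is infinitely significant.
        return 0.0 if diff == 0 else float("inf") * (1 if diff > 0 else -1)
    return diff / (pooled * (1 / na + 1 / nb)) ** 0.5


def classify(a, b, max_sd=0.5):
    """Sort one gene into 'poor data', 'changed' or 'no change'."""
    if len(a) != 3 or len(b) != 3:
        raise ValueError("this sketch assumes triplicate measurements")
    if stdev(a) > max_sd or stdev(b) > max_sd:   # assumed reproducibility rule
        return "poor data"
    return "changed" if abs(student_t(a, b)) > T_CRIT_DF4 else "no change"


# Hypothetical log2 intensities: (sample 1 triplicate, sample 2 triplicate).
genes = {
    "gene1": ([10.1, 10.0, 10.2], [12.0, 12.1, 11.9]),
    "gene2": ([8.0, 8.1, 7.9], [8.05, 7.95, 8.1]),
    "gene3": ([9.0, 11.5, 7.2], [9.1, 9.0, 9.2]),
}
for name, (s1, s2) in genes.items():
    print(name, classify(s1, s2))
```

Keeping ‘poor data’ and ‘no change’ as explicit labels, rather than discarding those genes, is one simple way to retain the two groups that the text argues should not be lost.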
Differential hybridization kinetics of oligonucleotides and cDNAs is a further source of discrepancy in expression measurements for a gene. It is also reasonable to expect that oligonucleotides designed from different regions of a gene can show different expression levels owing to alternative mRNA splicing. Because one of the main advantages of cDNA microarray analysis is the accumulated gene expression information of many different cell types and tissues (which can be superimposed on results of further experiments), the standardization of cDNA clone sets and microarrays or oligonucleotide chips will be desirable. Alternatively, statistical tests must be devised and applied to recognize and resolve the reasons for discrepancies between data sets. A fifth problem is the inherent difficulty of mining large quantities of data. Researchers usually spend many months making sense of the data. Although sophisticated tools are becoming
available55 (http://www.GenMAPP.org), further development in bioinformatics and statistical methods is indispensable.
Application of cDNA clone set: conventional methods, large-scale in situ hybridization and proteomics
Once a whole cDNA catalog is assembled (Fig. 1f), the total mouse gene set can be stored in a compact format, and could be maintained in a freezer in every laboratory. Such readily available cDNA clones have many applications. For example, cDNA clones can be immediately used as probes for Northern and Southern blot analyses and for in situ hybridization, which can be performed in a high-throughput manner6,7 (Fig. 1l). The cDNA clone set can also be directly applied to proteomics56 (Fig. 1m). For cDNA clones that contain complete open reading frames, protein-coding region cassettes can be cloned into a vector that allows automated in-frame shuttling of genes into any vector system, such as the Gateway system (Invitrogen/Life Technologies, Rockville, MD, USA). Once a cDNA collection is converted to such a format, it can be used for a variety of proteomics approaches, including the generation of antibodies for each protein (e.g. http://www.hip.harvard.edu/). The cDNAs can also be used for biological assays, such as overexpression of genes in cells and animals (Fig. 1p), as well as for protein–protein interaction assays, as in the yeast two-hybrid system57,58 (Fig. 1m).
Impact of human and mouse genome sequence on embryogenomics
Although embryogenomic approaches are independent of the Human Genome Project, genomic sequence information is essential for embryogenomics for several reasons. First, genes (ESTs and cDNAs) can be localized on the genome map by looking up the corresponding genomic DNA sequences (Fig. 1n). Previously, this was done by mapping ESTs on interspecific backcross mouse panels and radiation hybrid panels5 (Fig. 1n). Second, analysis of the enhancer or promoter sequences of co-expressed genes discovered by cDNA microarray analysis should identify common regulatory motifs for a transcription factor59. Such analyses will help to build a comprehensive picture of transcriptional regulatory networks. Third, the readily available genomic structures of genes, including exon/intron boundaries, facilitate the analysis of each gene. For example, the genomic structures make it easy to design gene-specific primers for RT-PCR experiments and to design and execute gene knockout experiments. Fourth, integration of many different types of information will provide new perspectives in developmental biology. For example, the integration of expression profiling and map location of genes has already provided the interesting observation that
co-expressed genes tend to cluster in the mouse18 and yeast60 genomes, although it remains to be seen whether these first glimpses hold true across mammalian genomes.

Acknowledgements
I thank all past and present members of the laboratory for their contributions toward the goal of establishing embryogenomic approaches. I particularly thank Tetsuya S. Tanaka and Saied A. Jaradat for discussions about cDNA microarray technology, and Dawood B. Dudekula for assisting with Table 1. I also thank David Schlessinger for his input and critical reading of the manuscript and Kevin G. Becker for discussions about cDNA microarrays.

Future perspectives
The collection of most genes in the form of cDNA clones will be complete within the next few years. Reasonably priced ready-made cDNA microarrays that contain all mammalian genes will also become available, and every laboratory will be able to perform large-scale global gene expression analyses in systems of choice. In other words, individual laboratories will have large-scale capacities that are presently only available to large laboratories. With high-throughput genomics approaches in hand, the traditional small laboratory focusing on individual subjects might prosper all the more.

Are the high-throughput embryogenomic approaches going to revolutionize mammalian developmental biology? It is foreseeable that research in many areas will be accelerated by the full implementation of the existing technologies described. Discovery of genes that are differentially expressed in specific cell types, or are downstream targets of a specific transcription factor, will become much easier. Functional analysis of such genes, which has to be done by manipulating the embryos (e.g. knocking out the gene), will still require formidable effort, but eventually it will be assisted by ready-made mutant ES cells61,62 and mice4. All of these will provide an unprecedented level of understanding of normal and abnormal animal development; the study of human diseases and the design of more effective medical interventions will also benefit.

However, this field is still a frontier. We still have no clue about how the global organization of transcription networks behaves, and whether or not the dynamics of such networks can reflect cellular differentiation status. With a tool for whole-genome expression profiling, we will finally have ways to measure RNA levels of all genes and to start understanding the behavior of complex networks. To this end, experimental tools such as cDNA microarrays and computer simulations of gene networks (e.g. E-cells)63, which are only in their infancy, will be used interactively.

References
1 Lander, E.S. et al. (2001) Initial sequencing and analysis of the human genome. Nature 409, 860–921
2 Venter, J.C. et al. (2001) The sequence of the human genome. Science 291, 1304–1351
3 Schlessinger, D. and Ko, M.S.H. (1998) Developmental genomics and its relation to aging. Genomics 52, 113–118
4 Nadeau, J.H. et al. (2001) Sequence interpretation. Functional annotation of mouse genome sequences. Science 291, 1251–1255
5 Denny, P. and Justice, M.J. (2000) Mouse as the measure of man? Trends Genet. 16, 283–287
6 Komiya, T. et al. (1997) A large-scale in situ hybridization system using an equalized cDNA library. Anal. Biochem. 254, 23–30
7 Neidhardt, L. et al. (2000) Large-scale screen for genes controlling mammalian embryogenesis, using high-throughput gene expression analysis in mouse embryos. Mech. Dev. 98, 77–94
8 Sargent, T.D. and Dawid, I.B. (1983) Differential gene expression in the gastrula of Xenopus laevis. Science 222, 135–139
9 Velculescu, V.E. et al. (1995) Serial analysis of gene expression. Science 270, 484–487
10 Liang, P. and Pardee, A.B. (1992) Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 257, 967–971
11 Lockhart, D.J. and Winzeler, E.A. (2000) Genomics, gene expression and DNA arrays. Nature 405, 827–836
12 Hughes, T.R. and Shoemaker, D.D. (2001) DNA microarrays for expression profiling. Curr. Opin. Chem. Biol. 5, 21–25
13 Takahashi, N. and Ko, M.S.H. (1994) Toward a whole cDNA catalog: construction of an equalized cDNA library from mouse embryos. Genomics 23, 202–210
14 Adams, M.D. et al. (1991) Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252, 1651–1656
15 Boguski, M.S. et al. (1993) dbEST-database for ‘expressed sequence tags’. Nat. Genet. 4, 332–333
16 Marra, M.A. et al. (1998) Expressed sequence tags – ESTablishing bridges between genomes. Trends Genet. 14, 4–7
17 Lennon, G. et al. (1996) The I.M.A.G.E. Consortium: an integrated molecular analysis of genomes and their expression. Genomics 33, 151–152
18 Ko, M.S.H. et al. (2000) Large-scale cDNA analysis reveals phased gene expression patterns during preimplantation mouse development. Development 127, 1737–1749
19 Adjaye, J. et al. (1998) The construction of cDNA libraries from human single preimplantation embryos and their use in the study of gene expression during development. J. Assist. Reprod. Genet. 15, 344–348
20 Morozov, G. et al. (1998) Sequence analysis of libraries from individual human blastocysts. J. Assist. Reprod. Genet. 15, 338–343
21 Thomson, J.A. et al. (1998) Embryonic stem cell lines derived from human blastocysts. Science 282, 1145–1147
22 Shamblott, M.J. et al. (1998) Derivation of pluripotent stem cells from cultured human primordial germ cells. Proc. Natl. Acad. Sci. U. S. A. 95, 13726–13731
23 Odorico, J.S. et al. (2001) Multilineage differentiation from human embryonic stem cell lines. Stem Cells 19, 193–204
24 Maruyama, K. and Sugano, S. (1994) Oligo-capping: a simple method to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides. Gene 138, 171–174
25 Carninci, P. et al. (1996) High-efficiency full-length cDNA cloning by biotinylated CAP trapper. Genomics 37, 327–336
26 Soares, M.B. et al. (1994) Construction and characterization of a normalized cDNA library. Proc. Natl. Acad. Sci. U. S. A. 91, 9228–9232
27 Belyavsky, A. et al. (1989) PCR-based cDNA library construction: general cDNA libraries at the level of a few cells. Nucleic Acids Res. 17, 2919–2932
28 Welsh, J. et al. (1990) Cloning of PCR-amplified total cDNA: construction of a mouse oocyte cDNA library. Genet. Anal. Technol. Appl. 7, 5–17
29 Ko, M.S.H. (1990) An ‘equalized cDNA library’ by the reassociation of short double-stranded cDNAs. Nucleic Acids Res. 18, 5705–5711
30 Piao, Y. et al. (2001) Construction of long-transcript enriched cDNA libraries from submicrogram amounts of total RNAs by a universal PCR amplification method. Genome Res. 1553–1558
31 Rothstein, J.L. et al. (1993) Construction of primary and subtracted cDNA libraries from early embryos. Methods Enzymol. 225, 587–610
32 Marra, M. et al. (1999) An encyclopedia of mouse genes. Nat. Genet. 21, 191–194
33 Sasaki, N. et al. (1998) Characterization of gene expression in mouse blastocyst using single-pass sequencing of 3995 clones. Genomics 49, 167–179
34 Harrison, S.M. et al. (1995) Isolation of novel tissue-specific genes from cDNA libraries representing the individual tissue constituents of the gastrulating mouse embryo. Development 121, 2479–2489
35 Ko, M.S.H. et al. (1998) Genome-wide mapping of unselected transcripts from extraembryonic tissue of 7.5-day mouse embryos reveals enrichment in the t-complex and underrepresentation on the X chromosome. Hum. Mol. Genet. 7, 1967–1978
36 Abe, K. et al. (1998) A systematic molecular genetic approach to study mammalian germline development. Int. J. Dev. Biol. 42, 1051–1065
37 Heintz, N. (2000) Analysis of mammalian central nervous system gene expression and function using bacterial artificial chromosome-mediated transgenesis. Hum. Mol. Genet. 9, 937–943
38 Abe, K. et al. (1996) Purification of primordial germ cells from TNAPbeta-geo mouse embryos using FACS-gal. Dev. Biol. 180, 468–472
39 Schuler, G.D. et al. (1996) A gene map of the human genome. Science 274, 540–546
40 Christoffels, A. et al. (2001) STACK: sequence tag alignment and consensus knowledgebase. Nucleic Acids Res. 29, 234–238
41 Quackenbush, J. et al. (2001) The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res. 29, 159–164
42 Kawamoto, S. et al. (2000) BodyMap: a collection of 3′ ESTs for analysis of human gene expression information. Genome Res. 10, 1817–1827
43 Strausberg, R.L. et al. (1997) New opportunities for uncovering the molecular basis of cancer. Nat. Genet. 15, 415–416
44 Bonaldo, M.F. et al. (1996) Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res. 6, 791–806
45 Claverie, J.M. (1999) Computational methods for the identification of differential and coordinated gene expression. Hum. Mol. Genet. 8, 1821–1832
46 Tanaka, T.S. et al. (2000) Genome-wide expression profiling of mid-gestation placenta and embryo using a 15,000 mouse developmental cDNA microarray. Proc. Natl. Acad. Sci. U. S. A. 97, 9127–9132
47 Miki, R. et al. (2001) Delineating developmental and metabolic pathways in vivo by expression profiling using the RIKEN set of 18,816 full-length enriched mouse cDNA arrays. Proc. Natl. Acad. Sci. U. S. A. 98, 2199–2204
48 Livesey, F.J. et al. (2000) Microarray analysis of the transcriptional network controlled by the photoreceptor homeobox gene Crx. Curr. Biol. 10, 301–310
49 Callow, M.J. et al. (2000) Microarray expression profiling identifies genes with altered expression in HDL-deficient mice. Genome Res. 10, 2022–2029
50 Maekawa, T. et al. (1999) Mouse ATF-2 null mutants display features of a severe type of meconium aspiration syndrome. J. Biol. Chem. 274, 17813–17819
51 Kacharmina, J.E. et al. (1999) Preparation of cDNA from single cells and subcellular regions. Methods Enzymol. 303, 3–18
52 Brady, G. (2000) Expression profiling of single mammalian cells – small is beautiful. Yeast 17, 211–217
53 Wang, E. et al. (2000) High-fidelity mRNA amplification for gene profiling. Nat. Biotechnol. 18, 457–459
54 Lee, M.L. et al. (2000) Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc. Natl. Acad. Sci. U. S. A. 97, 9834–9839
55 Quackenbush, J. (2001) Computational analysis of microarray data. Nat. Rev. Genet. 2, 418–427
56 Lee, K.H. (2001) Proteomics: a technology-driven and technology-limited discovery science. Trends Biotechnol. 19, 217–222
57 Fields, S. and Song, O. (1989) A novel genetic system to detect protein–protein interactions. Nature 340, 245–246
58 Finley, R.L. and Brent, R. (1994) Interaction mating reveals binary and ternary connections between Drosophila cell cycle regulators. Proc. Natl. Acad. Sci. U. S. A. 91, 12980–12984
59 Zhang, M.Q. (1999) Large-scale gene expression data analysis: a new challenge to computational biologists. Genome Res. 9, 681–688
60 Cohen, B.A. et al. (2000) A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat. Genet. 26, 183–186
61 Zambrowicz, B.P. et al. (1998) Disruption and sequence identification of 2,000 genes in mouse embryonic stem cells. Nature 392, 608–611
62 Wiles, M.V. et al. (2000) Establishment of a gene-trap sequence tag library to generate mutant mice from embryonic stem cells. Nat. Genet. 24, 13–14
63 Tomita, M. (2001) Whole-cell simulation: a grand challenge of the 21st century. Trends Biotechnol. 19, 205–210
64 Hogan, B. et al. (1994) Manipulating the Mouse Embryo: A Laboratory Manual, Cold Spring Harbor Laboratory Press
65 Rugh, R. (1990) The Mouse, its Reproduction and Development, Oxford University Press
66 Davidson, D. and Baldock, R. (2001) Bioinformatics beyond sequence: mapping gene function in the embryo. Nat. Rev. Genet. 2, 409–417
67 Strausberg, R.L. et al. (1999) The mammalian gene collection. Science 286, 455–457
68 Kawai, J. et al. (2001) Functional annotation of a full-length mouse cDNA collection. Nature 409, 685–690
69 Kargul, G.J. et al. (2001) Verification and initial annotation of the NIA mouse 15K cDNA clone set. Nat. Genet. 28, 17–18