Computers and Chemical Engineering 29 (2005) 589–596
Gene expression measurement technologies: innovations and ethical considerations
Jessica R. Goodkind a,∗, Jeremy S. Edwards b
a University of New Mexico Health Sciences Center, Center for Health Promotion and Disease Prevention, Albuquerque, NM 87131, USA
b Department of Chemical Engineering, University of Delaware, Newark, DE 19716, USA
Received 22 September 2003; received in revised form 20 January 2004
Available online 22 October 2004
Abstract Knowledge of the overall gene expression profile within a cell can provide key insights into cellular physiology. Therefore, it is not surprising that a great deal of effort has been devoted to measuring the expression level of all of the genes in a cell. With the advent of DNA microarray technology, researchers now have the ability to collect expression data on every gene in a cell simultaneously. The vast datasets created with this technology are providing valuable information that can be used to accurately diagnose, prevent, or cure a wide range of genetic and infectious diseases. In this article, we describe the key technologies for measuring gene expression and discuss how these approaches can be used to identify the correlation between the observed expression patterns and disease. Additionally, we will highlight the key ethical considerations and discuss how these tremendously powerful technologies may impact society. © 2004 Elsevier Ltd. All rights reserved.
Keywords: Gene expression; Microarray; SAGE; Polony; Ethics
∗ Corresponding author: J.R. Goodkind. 0098-1354/$ – see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.compchemeng.2004.08.033
1. Introduction Knowledge of the overall gene expression profile within a cell can provide key insights into cellular physiology. Therefore, it is not surprising that a great deal of effort has been devoted to measuring the expression level of all of the genes in a cell. With the advent of DNA microarray technology, researchers now have the ability to collect expression data on every gene in a cell simultaneously. The vast datasets created with this technology are providing valuable information that can be used to accurately diagnose, prevent, or cure a wide range of genetic and infectious diseases. Thus, an area of active research is the development of tools to accurately collect, analyze, and interpret massive quantities of data on gene expression levels in the cell. There are several examples in the literature where computational tools have
been applied to analyze gene expression data (Marcotte, Pellegrini, Thompson, Yeates, & Eisenberg, 1999a; Marcotte, Pellegrini, Yeates, & Eisenberg, 1999b; Marcotte et al., 1999c; Pellegrini, Marcotte, Thompson, Eisenberg, & Yeates, 1999; McGuire, Hughes, & Church, 2000; Roth, Hughes, Estep, & Church, 1998; Alizadeh, Eisen, Botstein, Brown, & Staudt, 1998; Alter, Brown, & Botstein, 2000; DeRisi et al., 2000; Eisen, Spellman, Brown, & Botstein, 1998; Spellman et al., 1998; Wilson et al., 1999; Sudarsanam, Iyer, Brown, & Winston, 2000; D’Haeseleer, Wen, Fuhrman, & Somogyi, 1999; Bassett, Eisen, & Boguski, 1999; Liang, Fuhrman, & Somogyi, 1998; Chen, He, & Church, 1999; Getz, Levine, & Domany, 2000; Tavazoie, Hughes, Campbell, Cho, & Church, 1999; Michaels et al., 1998; Tamayo et al., 1999), with the most common method being cluster analysis (Eisen et al., 1998). The technological developments in gene expression measurement are promising, but at the same time it is incumbent upon us to raise numerous ethical concerns. Scientists and engineers who are involved in developing and utilizing gene
expression technology have a responsibility to consider ethical issues and are in a key position to take an active role in policy development and other efforts to ensure that gene expression measurement technologies are used ethically. Four important considerations will be discussed in this manuscript: (1) insurance and employment discrimination and issues of individual privacy; (2) fair access to tests and treatments; (3) technological limitations such as accuracy of diagnosis, ability to relate genetic information to a disease, and diagnosis of “untreatable” conditions; and (4) genetic counseling and issues about release of information to patients. In this article, we will describe the key technologies for measuring gene expression and discuss how these approaches can be used to identify the correlation between the observed expression patterns and disease. Additionally, we will highlight the key ethical considerations and discuss how these tremendously powerful technologies may impact society.
2. Genome-scale gene expression monitoring technology A number of technologies have been developed to acquire gene expression information on a “whole transcriptome” level, where the transcriptome is defined as the complete set of all expressed genes (transcripts) in a given cell. The techniques currently in vogue are based on either direct sequence analysis (Weinstock, Kirkness, Lee, Earle-Hughes, & Venter, 1994; Adams, 1996) or specific hybridization of complex cDNA or mRNA probes to microarrays of oligonucleotides or cDNAs (Ramsay, 1998; Giordano, 2003).
2.1. Hybridization-based approaches All hybridization-based approaches for measuring gene expression are based on a single principle: DNA complementary to all genes (or a region of the genes) is synthesized and placed at high density in distinct points on a ‘chip’. Fluorescently labeled cDNA from a cell or tissue sample is hybridized to the chip, and the fluorescent intensity of the spots on the chip is directly related to the mRNA concentration in the cell or tissue sample. Currently there are primarily two alternative hybridization-based technologies for measuring gene expression on a whole transcriptome level: DNA microarrays and GeneChips. DNA microarrays (Fig. 1) are essentially glass microscope slides that have pools of DNA robotically printed on the surface. The DNA to be printed on the DNA microarray is amplified by polymerase chain reaction (PCR). For gene expression, full-length cDNA molecules or small fragments of the cDNA that can be used to uniquely identify the gene can be printed on the slide. Oligonucleotide microarrays are also available from companies such as Agilent, which produces arrays with 60-mer oligonucleotides (http://www.agilent.com/). GeneChips are an alternate technology to DNA microarrays. GeneChips are produced commercially by Affymetrix using photolithography to synthesize ∼25 bp oligonucleotides on the chip. Affymetrix GeneChips operate under the same principle as microarrays: the concentration of the mRNA molecules is proportional to the image signal intensity (Lipshutz, Fodor, Gingeras, & Lockhart, 1999). There are also other related methods for generating arrays for gene expression analysis. For example, NimbleGen arrays are very similar to Affymetrix arrays (http://www.nimblegen.com/);
Fig. 1. Hybridization-based approach for gene expression profiling.
Fig. 2. Sequence-based approach for gene expression profiling.
however, NimbleGen utilizes a micromirror system rather than photolithographic masks to synthesize the oligonucleotides (Nuwaysir et al., 2002).
2.2. Sequencing-based approaches In addition to hybridization-based technologies, there are several DNA sequencing-based approaches for measuring whole transcriptome gene expression (Fig. 2), with the most basic being expressed sequence tag (EST) sequencing (Carulli et al., 1998). Sequencing approaches have several advantages over hybridization-based approaches (i.e., sequencing approaches are more accurate and can provide absolute quantification); however, in the past, sequencing-based approaches have been more expensive and hence less used. Serial analysis of gene expression (SAGE) is the most promising whole transcriptome sequencing-based approach (Fig. 2) (Velculescu, Zhang, Vogelstein, & Kinzler, 1995). SAGE is based on the idea that only a short sequence tag is needed to uniquely identify a gene. Therefore, in SAGE, 10–20 bp are sequenced to uniquely identify a gene (Ye, Usher, & Zhang, 2002; Saha et al., 2002). Other DNA sequencing-based approaches include massively parallel signature sequencing (MPSS) (Brenner et al., 2000) and polymerase colony approaches (discussed below) (Merritt, DiTonno, Mitra, Church, & Edwards, 2003).
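The tag-length argument behind SAGE can be made concrete with a short calculation. The following Python sketch is illustrative only and is not part of any published SAGE pipeline: it computes how many distinct sequences a tag of a given length can encode and tallies the relative abundance of tags in a sequenced sample. The function names and the tag sequences are hypothetical placeholders.

```python
from collections import Counter


def tag_space(length_bp):
    """Number of distinct sequences of the given length over A, C, G, T."""
    return 4 ** length_bp


def tag_frequencies(tags):
    """Relative abundance of each tag among all sequenced tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical 10-bp tags, invented for illustration only.
example_tags = ["ACGTACGTAC", "ACGTACGTAC", "TTGACCGATA", "GGCATTACGA"]

space_10 = tag_space(10)   # 1,048,576 possible 10-bp tags
space_20 = tag_space(20)   # roughly 1.1e12 possible 20-bp tags
freqs = tag_frequencies(example_tags)
```

A 10-bp tag can in principle distinguish about one million sequences. In practice tags are not uniformly distributed across the transcriptome, so some tags match more than one transcript, which is one reason longer tags are also used.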
3. Polymerase colony approach A new technique that minimizes the limitations and maximizes the detection power of other sequencing approaches is polymerase colony (polony) technology. The Church laboratory at Harvard Medical School has demonstrated that in situ polymerase chain reaction can be used to generate clones of single nucleic acid molecules in a gel matrix (Merritt et al., 2003; Mitra & Church, 1999; Mikkilineni et al., 2004; Mitra et al., 2003; Zhu, Shendure, Mitra, & Church, 2003; Butz, Wickstrom, & Edwards, 2003; Butz, Yan, Mikkilineni, & Edwards, 2004). This method uses acrylamide polymerized in a solution containing standard PCR reagents and a very low concentration of linear DNA template. This gel is poured on a glass microscope slide and the DNA is amplified by PCR. As the products remain localized near their respective templates, a single template molecule gives rise to a PCR colony, or “polony,” consisting of as many as 10^8 identical DNA molecules. By including an acrydite modification (Kenney, Ray, & Boles, 1998) on the 5′ end of one of the PCR primers, the amplified DNA in each polony can be covalently attached by one of its ends to the polyacrylamide matrix. Polony radius decreases as template length increases and as the acrylamide percentage increases. Using a template of 1009 bp and an acrylamide concentration of 15%, Mitra and Church (1999) were able to produce polonies with an average radius of 6 μm. At this size, the authors estimated that five million distinguishable polonies could be generated on a single slide. We developed a novel assay based on polony technology that accurately quantifies the relative expression levels of two alleles in the same cell or tissue sample (Butz et al., 2004). Mechanistically, this was accomplished by PCR amplifying a gene in a cDNA library in a thin polyacrylamide gel. Once the polonies were amplified, the two alleles of the gene were differentially labeled by performing in situ sequencing with fluorescently labeled nucleotides (Merritt et al., 2003; Mitra et al., 2003). For these sets of experiments, silent single nucleotide polymorphisms (SNPs) were used to discriminate
the two alleles. Finally, a simple count was performed on the differently labeled polonies in order to determine the relative expression levels of the two alleles (Fig. 3). To validate this technique, the relative expression levels of PKD2 in a family of heterozygous patients bearing the 4208G/A SNP were examined and compared to a similar analysis by Yan, Yuan, Velculescu, Vogelstein, and Kinzler (2002). We were able to reproduce their results in a digital format and thus demonstrate the utility of this method in human gene expression analysis.
Fig. 3. Determining relative allelic expression using polony technology. (A) Initially, the gene of interest is polony amplified from cDNA. The primers are designed to amplify the region of the gene bearing a silent SNP, which is used to discriminate the two alleles. In this illustration, one allele bears an A–T pairing at a particular position whereas the other allele has a G–C pairing at the same position. Note: one of the primers has a 5′ acrydite modification (star), which is later required for distinguishing the two alleles. (B) Following polony amplification, the gel is stained with SybrGreenI and scanned with a microarray scanner in order to ensure that the polony amplification worked properly. Each black dot represents the polony amplification of one copy of the desired first-strand cDNA template. (C) Prior to hybridization of the sequencing primer, the polony is made single stranded by denaturing the double-stranded DNA in formamide, followed by removal of the non-acrydite strand using electrophoresis. (D) A sequencing primer is hybridized to the single-stranded polony, a single base extension is performed with Cy5-dATP, and then the gel is scanned. To label the other allele, the process of denaturation, hybridization, extension, and scanning is repeated, but the extension is performed with Cy5-dGTP the second time through. (E) To determine the relative expression levels of the two alleles, a composite image from the Cy5-dATP and Cy5-dGTP extensions is first generated and then the number of polonies (transcripts) for each of the two alleles is counted. In this particular example, the “G–C” SNP represents 62.5% of the population (i.e., five out of eight polonies). Figure adapted from Yan et al. (2002). (F) Representative polony gel for patient 50 following extensions with fluorescently labeled dATP (green) and dGTP (red). This is a composite image generated from the independent extensions with either Cy5-dATP or Cy5-dGTP as described in panels (A)–(E). The colors used in the image are artificial. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
3.1. Strengths and limitations of each approach The main advantage of DNA sequencing-based methods is that, unlike microarray-based technologies, they do not make any assumptions about which genes are expressed in a given tissue sample. This issue is of great importance when dealing with human samples due to the uncertainty in the gene number, with estimates ranging from 30,000 to 100,000 genes. Indeed, gene expression microarray technology will only measure the level of transcripts for genes spotted or synthesized on the glass surface. Although cDNA microarrays can, to some extent, side-step this limitation by spotting clones of uncharacterized genes on microscope slides, there is a limit on the number of spots that can be placed on one microscope slide. Affymetrix GeneChips clearly suffer from this limitation because they use DNA sequence information in the design of the tiling path of the GeneChip array. In addition to these innate limitations, there is the technical challenge of producing an unbiased fluorescent probe from complex nucleic acid mixtures. Therefore, whole transcriptome surveys are difficult to perform with DNA and oligonucleotide microarray-based technologies. In contrast, SAGE and MPSS technologies have been able to detect transcripts of genes that were not identified by genome sequence annotation or EST sequencing, which means that it is not necessary to know the particular gene involved in a disease a priori (Velculescu et al., 1995, 1997; Saha et al., 2002). The main disadvantage of sequencing-based methods is their cost. Indeed, although costs of DNA sequencing have greatly diminished with the advent of capillary array sequencers, the number of genes expressed in a typical eukaryotic cell is in the range of 10^4. SAGE technology reduces these costs by extracting small sequence tags out of the cDNAs (Velculescu et al., 1995); however, the cost of reaching whole-transcriptome coverage with DNA sequencing-based surveys is in the range of several thousand dollars. SAGE surveys restricted to ∼10,000 tags may not detect gene expression for as many as 30% of mammalian mRNAs. Indeed, it is estimated that 30% of the genes in the genome make up far less than a few tenths of a percent of the total transcriptome, and produce an average of less than 10 copies per cell (Ostrow, Woods, Vosika, & Faras, 1979). Assuming a transcriptome is composed of transcripts from ∼15,000 genes and that a given cell contains 350,000 mRNA molecules (numbers taken from a typical textbook example of the transcriptome of a mammalian cell), we calculate that 322,000 tags would be needed to get a 99% probability of detecting a tag from one of these weakly transcribed genes. Considering that SAGE tags present at less than 10 copies are usually not included in a census of the transcriptome, the number of tags that would be required to accurately measure the level of expression of weakly transcribed genes is in the range of 10^6, which is possible with MPSS.
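The sampling-depth estimate can be approximated with a simple binomial argument. If a transcript makes up a fraction p of the mRNA pool, the chance of missing it in n independently sampled tags is (1 − p)^n, so about n = ln(1 − P)/ln(1 − p) tags are needed to see it at least once with probability P. The Python sketch below is a hedged reconstruction, not the authors' calculation: the 350,000 mRNA molecules per cell and the 99% target come from the text, while the value of 5 copies per cell is an assumption made for this sketch (the text states only "less than 10"), chosen because it gives a result close to the quoted 322,000 tags. The function name is hypothetical.

```python
import math


def tags_needed(copies_per_cell, total_mrna, detection_prob):
    """Tags to sample so that at least one tag from the transcript is seen
    with probability detection_prob, assuming independent random sampling."""
    p = copies_per_cell / total_mrna          # fraction of the mRNA pool
    # Solve 1 - (1 - p)**n >= detection_prob for n.
    n = math.log(1.0 - detection_prob) / math.log1p(-p)
    return math.ceil(n)


# 350,000 mRNA molecules per cell is taken from the text;
# 5 copies per cell is an ASSUMED value for a weakly transcribed gene.
n_tags = tags_needed(copies_per_cell=5, total_mrna=350_000, detection_prob=0.99)
```

Under these assumptions `n_tags` comes out at roughly 322,000. Halving the assumed copy number roughly doubles the required depth, which illustrates why accurate quantification of rare transcripts pushes the requirement toward the 10^6 range.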
4. Gene expression analysis and cancer One potential application of the gene expression technologies discussed above is in detection and classification of cancer. There are hundreds of different types of cancer, and it has been estimated that at least five genes are mutated in a tumor (Hilgers & Kern, 1999). Based on these data, it was subsequently estimated that there are ∼200 different genes that are mutated in different cancers (Wooster, 2000). Furthermore, each gene can be mutated at different locations; therefore, every tumor will be unique. Using clinical symptoms to classify tumors therefore cannot provide all of the information needed for successful treatment. With high-throughput gene expression technologies, it is now possible to screen a sample and determine the genes that are expressed. Therefore, expression profiling has attracted great interest in cancer biology and it is likely to revolutionize cancer diagnosis (Alizadeh et al., 1998; Perou et al., 1999), by using cluster analysis to identify genes that characterize the cancer (Bittner et al., 2000; Clark, Golub, Lander, & Hynes, 2000). The importance of expression profiling and cancer diagnosis is illustrated by Alizadeh et al. (1998), who showed that the most common subtype of non-Hodgkin’s lymphoma, diffuse large B-cell lymphoma (DLBCL), can be grouped into distinct subgroups based on the gene expression profile. Their work was motivated by the observation that B-cell lymphoma patients receiving the same diagnosis often had quite different responses to therapy and long-term survival rates. They focused on DLBCL because in the past it has not been possible to define subgroups of this lymphoma based on morphology. To analyze the gene expression in the human tumors, Brown, Staudt, and co-workers constructed DNA microarrays. They included on the DNA microarrays genes that have been shown or suspected to be important in immunology or cancer, or are known to be expressed in lymphoid cells (Alizadeh et al., 1998). They included 17,856 cDNA clones on the microarray. In the supplementary information to the paper, the authors provide the datasets from the experiments. The data sets contain ∼4000 genes; therefore, many of the 17,856 cDNA clones are from the same gene. It should be noted that sequencing-based approaches could have achieved a larger coverage of the genome because they do not require a priori knowledge of the genes. Hence, the experiments by Brown, Staudt, and co-workers were unable to measure gene expression levels for many genes. Sequencing-based approaches will be able to simultaneously monitor the expression of every gene in the system, without biases due to our a priori knowledge of the tumor. Brown, Staudt, and co-workers used their DNA microarrays to study the gene expression patterns in three adult lymphoid malignancies: DLBCL, follicular lymphoma (FL), and chronic lymphocytic leukemia (CLL). In total, the authors analyzed 96 different normal and malignant samples, and used a hierarchical clustering algorithm to identify groups of genes based on similarity of expression profiles across the samples. Additionally, they clustered the samples based on the similarity of the expression profiles. The authors found that hierarchical clustering identified the malignancies based on the gene expression profiles. They identified two molecularly distinct forms of DLBCL: one characteristic of germinal centre B cells, and another that expressed genes typically expressed in activated peripheral blood B cells. The data indicated that patients with germinal centre B-like DLBCL had a significantly better overall survival.
This paper was significant because it indicated that the classification of tumors based on the gene expression patterns can identify previously undetected and clinically significant subtypes of cancer. Expression profiling has also been used to classify cutaneous malignant melanoma (Bittner et al., 2000), breast cancer tumors (Perou et al., 2000), and to identify genes that are important for metastasis (Clark et al., 2000).
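The idea of grouping genes by similarity of expression profile can be illustrated with a small, self-contained sketch. It is not the software used in the lymphoma study; it is a minimal agglomerative clustering that uses average linkage and a distance of one minus the Pearson correlation, both of which are assumptions made here for illustration. The gene names and expression values are invented.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """1 - Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(pearson_distance(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Hypothetical expression values for four genes across four samples.
toy_profiles = {
    "gene_A": [1.0, 2.0, 3.0, 4.0],
    "gene_B": [2.1, 4.0, 6.2, 8.1],
    "gene_C": [4.0, 3.1, 2.0, 1.0],
    "gene_D": [8.0, 6.0, 4.1, 2.0],
}
groups = average_linkage(toy_profiles, n_clusters=2)
```

In this toy example the genes that rise together across samples end up in one group and the genes that fall together in the other. Transposing the data (profiles per sample rather than per gene) gives the corresponding clustering of samples.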
5. Ethical considerations The innovative technologies that are rapidly emerging in the field of high-throughput gene expression analysis are exciting and promising. These technologies provide hope for earlier and more accurate diagnosis of diseases and for personal genetic profiling that can determine and predict differences in individuals’ responses to different treatments. However, our enthusiasm should be tempered by careful attention to numerous ethical considerations. Typically, the predominant emphasis of scientists is on scientific and technological developments, with the belief that ethical issues about how these technologies are used are not our primary concern. We believe that some of the ethical considerations
are so critical that it is the responsibility of all of us to consider these issues before and in conjunction with any research in the field. There are numerous ethical issues raised by using high-throughput gene expression profiling technologies. We focus on four that are particularly compelling and which we believe deserve immediate attention: (1) insurance and employment discrimination and issues of individual privacy; (2) fair access to tests and treatments; (3) technological limitations such as accuracy of diagnosis, ability to relate genetic information to a disease, and diagnosis of “untreatable” conditions; and (4) genetic counseling and issues about release of information to patients. Issues of individual privacy and insurance and employment discrimination are among the most common ethical concerns cited by researchers, doctors, and consumers (Friend & Stoughton, 2002; Zupke & Stephanopoulos, 1995). For example, one of the applications of Affymetrix GeneChip technology is to create personal gene expression profiles for individuals. Once these data are available, who has the right to this information? Will employers and health insurance companies be able to demand access to such information and refuse coverage or employment to individuals whose gene expression profiles reveal high risk for certain conditions? If individuals wish to protect their privacy by not revealing this information, insurance companies might refuse coverage or offer it only at higher rates. Although these questions are often raised and discussed, current technological advances in gene expression profiling highlight the importance of resolving these issues more immediately, explicitly, and uniformly through legislation and policy. It is interesting and important to note that issues about health insurance discrimination are likely not of great concern in countries with universal health care, such as Australia, Canada, Japan, England, and many other parts of Europe.
Adopting a similar system in the United States could undoubtedly alleviate or lessen certain discrimination concerns. A related issue involves access to testing and treatment. It is clear that these new technologies are only available to certain people, mostly because they are prohibitively expensive. For example, an Affymetrix GeneChip currently costs approximately $800, and SAGE experiments are significantly more expensive. However, even if individuals had equal access to testing, would the same diagnosis offer different options to different people, based upon their access to treatments? For instance, some individuals might have insurance or their own private means to treat a condition that another person’s insurance might refuse to cover. To complicate this issue further, an individual with a personal gene expression profile which suggests that he or she might not respond to a particular treatment (i.e., the DLBCL example described above) might be denied access by their insurance to the treatment. This could result in individuals either having to pay for their own treatment or forgo it if they could not afford it. Many geneticists argue that “knowledge is power,” that patients have the right to know their gene expression profiles
or predispositions to diseases or treatment. But in some cases, knowledge may actually hurt rather than help individuals (or might help certain people while hurting others). In the rush to develop new diagnostic tools, it is also important to keep in mind the technological limitations that we face. Three major issues are: accuracy of diagnosis, our ability to relate genetic information to a disease, and diagnosis of “untreatable” conditions. Our capability to accurately diagnose diseases and/or treatment responses is limited by our technology, and therefore we need to be cautious in how the test results are reported to the individual. For example, cDNA microarray technology typically is only able to identify genes that are up-regulated or down-regulated at least two-fold. In a related example, a protein profiling technique to detect ovarian cancer has a reported false positive rate of 5%. Since ovarian cancer is a rare disease that only occurs in 20 out of 100,000 women, if this test were used to screen the general population of women, it would falsely alarm 250 women for every real case of ovarian cancer it detected (Service, 2003). On the other hand, other genetic diagnostic techniques (although not gene expression profiling techniques), even well-established ones that routinely test for muscular dystrophy and cystic fibrosis, fail to identify mutations in 10–40% of patients tested (Sinclair, 2002). False negative rates are even worse for less well-characterized genes. These false negatives are misleading, and misinterpretations may result in doctors or patients ignoring other important indications of disease. Thus, both false positives and false negatives can have serious consequences. Furthermore, the problems with false negatives and false positives are amplified in prenatal genetic testing, in which parents may be making irreversible decisions about whether to continue or terminate a pregnancy.
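The ovarian cancer figure follows directly from the prevalence and the false positive rate, as the short Python sketch below shows. The prevalence (20 per 100,000) and the 5% false positive rate are taken from the text; the assumption that the test detects every true case (sensitivity of 1.0) is made here only to keep the arithmetic simple, and the function names are hypothetical.

```python
def false_alarms_per_true_case(prevalence, false_positive_rate, sensitivity=1.0):
    """Expected number of false positives for each true positive
    when an entire population is screened."""
    false_positives = (1.0 - prevalence) * false_positive_rate
    true_positives = prevalence * sensitivity
    return false_positives / true_positives


def positive_predictive_value(prevalence, false_positive_rate, sensitivity=1.0):
    """Probability that a person with a positive result actually has the disease."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Values from the text; sensitivity = 1.0 is an ASSUMPTION of this sketch.
ratio = false_alarms_per_true_case(prevalence=20 / 100_000, false_positive_rate=0.05)
ppv = positive_predictive_value(prevalence=20 / 100_000, false_positive_rate=0.05)
```

With these inputs `ratio` is approximately 250, matching the figure quoted above, and `ppv` is about 0.4%, meaning that only about one positive result in 250 would correspond to a real case. Any sensitivity below 1.0 would make the ratio of false alarms to detected cases even larger.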
A second technological limitation involves our ability to relate gene expression profiling information to the onset of disease. Most diseases are not simple single-gene diseases. Therefore, diagnostic screening typically delineates increased risk for a disease rather than providing an actual diagnosis. For example, in the work described above regarding DLBCL, hierarchical clustering was used to find changes in the overall pattern of gene expression and not simply changes in expression of a single gene. Improved statistical tools are needed to define the probability of an expression profile existing as part of a cluster. Additionally, inclusion in a cluster will not likely be 100% predictive. For instance, with respect to genotyping data, physicians are still unable to determine precisely the breast cancer risks due to mutations in the BRCA genes; lifetime risk estimates range from about 40% to 80% (compared to a 10% risk in the general population) (Couzin, 2003). As with the case of false positives, this may needlessly alarm many individuals and could result in other problems such as denial of health coverage or increased premiums, when in fact many individuals with the genetic mutation will never develop the specified disease. Our ability (or lack thereof) to treat diseases or conditions we diagnose with gene expression profiling technologies highlights a third technological limitation and concurrent
ethical concern. If we utilize one of the new gene expression profiling technologies to characterize a disease for which no treatment exists, what purpose does this information serve? Who should have access to the information? Related to this issue is the current screening that assesses differential responses to a particular treatment. What hope can we provide to individuals whose profiles suggest they will not respond to existing treatments? A final important ethical consideration involves genetic counseling. In some ways, this issue relates to all of the other ethical issues discussed. For instance, in terms of individual privacy, with whom can or should genetic counselors share information? How should information be imparted to individuals who do not have access to treatment or who have conditions for which no treatment exists? The precision and accuracy of gene expression profiling technologies affect how a certain result is explained to patients and what information should or should not be shared. If information is not released or shared appropriately, this can cause undue stress in patients. This is particularly problematic given the negative impact that stress has on immune system functioning. As “insiders,” we have the greatest ability to bring these issues to the forefront of discussion and ensure that ethical guidelines are created and observed. Engineers and scientists have a responsibility to society, and since we have power and influence in these decisions, we need to ensure that we are part of the decision process. As we saw in the Columbia Space Shuttle disaster, scientists and engineers who fail to voice their concerns may regret their silence later.
References Adams, M. D. (1996). Serial analysis of gene expression: ESTs get smaller. Bioessays, 18, 261–262. Alizadeh, A., Eisen, M., Botstein, D., Brown, P. O., & Staudt, L. M. (1998). Probing lymphocyte biology by genomic-scale gene expression analysis. J. Clin. Immunol., 18, 373–379. Alter, O., Brown, P. O., & Botstein, D. (2000). Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. U.S.A., 97, 10101–10106. Bassett, D. E., Jr., Eisen, M. B., & Boguski, M. S. (1999). Gene expression informatics—it’s all in your mine. Nat. Genet., 21, 51–55. Bittner, M., Meltzer, P., Chen, Y., Jiang, Y., Seftor, E., Hendrix, M., et al. (2000). Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature, 406(6795), 536–540. Brenner, S., Johnson, M., Bridgham, J., Golda, G., Lloyd, D. H., Johnson, D., et al. (2000). Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol., 18(6), 630–634. Butz, J., Wickstrom, E., & Edwards, J. S. (2003). Characterization of mutations and LOH of p53 and K-ras2 in pancreatic cancer cell lines by immobilized PCR. BMC Biotechnol., 3, 11. Butz, J., Yan, H., Mikkilineni, V., & Edwards, J. S. (2004). Detection of allelic variations of human gene expression by polymerase colonies. BMC Genomics, 5(3). Carulli, J. P., Artinger, M., Swain, P. M., Root, C. D., Chee, L., Tulig, C., et al. (1998). High throughput analysis of differential gene expression. J. Cell. Biochem. Suppl., 30–31, 286–296.
Chen, T., He, H. L., & Church, G. M. (1999). Modeling gene expression with differential equations. Pac. Symp. Biocomput., 29–40.
Clark, E. A., Golub, T. R., Lander, E. S., & Hynes, R. O. (2000). Genomic analysis of metastasis reveals an essential role for RhoC. Nature, 406, 532–535.
Couzin, J. (2003). Choices—and uncertainties—for women with BRCA mutations. Science, 302, 592.
DeRisi, J., van den Hazel, B., Marc, P., Balzi, E., Brown, P., Jacq, C., et al. (2000). Genome microarray analysis of transcriptional activation in multidrug resistance yeast mutants. FEBS Lett., 470(2), 156–160.
D’Haeseleer, P., Wen, X., Fuhrman, S., & Somogyi, R. (1999). Linear modeling of mRNA expression levels during CNS development and injury. Pac. Symp. Biocomput., 41–52.
Eisen, M. B., Spellman, P. T., Brown, P. O., & Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A., 95, 14863–14868.
Friend, S. H., & Stoughton, R. B. (2002). The magic of microarrays. Sci. Am., 286, 44–49, 53.
Getz, G., Levine, E., & Domany, E. (2000). Coupled two-way clustering analysis of gene microarray data. Proc. Natl. Acad. Sci. U.S.A., 97, 12079–12084.
Giordano, T. J. (2003). Gene expression profiling of endocrine tumors using DNA microarrays: progress and promise. Endocr. Pathol., 14, 107–116.
Hilgers, W., & Kern, S. E. (1999). Molecular genetic basis of pancreatic adenocarcinoma. Genes Chromosomes Cancer, 26, 1–12.
Kenney, M., Ray, S., & Boles, T. C. (1998). Mutation typing using electrophoresis and gel-immobilized acrydite probes. Biotechniques, 25, 516–521.
Liang, S., Fuhrman, S., & Somogyi, R. (1998). Reveal, a general reverse engineering algorithm for inference of genetic network architectures. Pac. Symp. Biocomput., 18–29.
Lipshutz, R. J., Fodor, S. P., Gingeras, T. R., & Lockhart, D. J. (1999). High density synthetic oligonucleotide arrays. Nat. Genet., 21, 20–24.
Marcotte, E. M., Pellegrini, M., Thompson, M. J., Yeates, T. O., & Eisenberg, D. (1999a). A combined algorithm for genome-wide prediction of protein function. Nature, 402, 83–86.
Marcotte, E. M., Pellegrini, M., Yeates, T. O., & Eisenberg, D. (1999b). A census of protein repeats. J. Mol. Biol., 293, 151–160.
Marcotte, E. M., Pellegrini, M., Ng, H. L., Rice, D. W., Yeates, T. O., & Eisenberg, D. (1999c). Detecting protein function and protein–protein interactions from genome sequences. Science, 285(5428), 751–753.
McGuire, A. M., Hughes, J. D., & Church, G. M. (2000). Conservation of DNA regulatory motifs and discovery of new motifs in microbial genomes. Genome Res., 10, 744–757.
Merritt, J., DiTonno, J. R., Mitra, R. D., Church, G. M., & Edwards, J. S. (2003). Functional characterization of mutant yeast PGK1 within the context of the whole cell. Nucleic Acids Res., 31, e84.
Michaels, G. S., Carr, D. B., Askenazi, M., Fuhrman, S., Wen, X., & Somogyi, R. (1998). Cluster analysis and data visualization of large-scale gene expression data. Pac. Symp. Biocomput., 42–53.
Mikkilineni, V., Mitra, R. D., DiTonno, J. R., Merritt, J., Church, G. M., Ogunnaike, B. A., et al. (2004). Digital quantitative measurements of gene expression. Biotechnol. Bioeng., 86(2), 117–124.
Mitra, R. D., & Church, G. M. (1999). In situ localized amplification and contact replication of many individual DNA molecules. Nucleic Acids Res., 27, e34.
Mitra, R. D., Butty, V. L., Shendure, J., Williams, B. R., Housman, D. E., & Church, G. M. (2003). Digital genotyping and haplotyping with polymerase colonies. Proc. Natl. Acad. Sci. U.S.A., 100, 5926–5931.
Nuwaysir, E. F., et al. (2002). Gene expression analysis using oligonucleotide arrays produced by maskless photolithography. Genome Res., 12, 1749–1755.
Ostrow, R. S., Woods, W. G., Vosika, G. J., & Faras, A. J. (1979). Analysis of the genetic complexity and abundance classes of messenger RNA
in human liver and leukemic cells. Biochim. Biophys. Acta, 562, 92–102.
Pellegrini, M., Marcotte, E. M., Thompson, M. J., Eisenberg, D., & Yeates, T. O. (1999). Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl. Acad. Sci. U.S.A., 96, 4285–4288.
Perou, C. M., Jeffrey, S. S., van de Rijn, M., Rees, C. A., Eisen, M. B., Ross, D. T., et al. (1999). Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc. Natl. Acad. Sci. U.S.A., 96(16), 9212–9217.
Perou, C. M., Sorlie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Rees, C. A., et al. (2000). Molecular portraits of human breast tumours. Nature, 406(6797), 747–752.
Ramsay, G. (1998). DNA chips: State-of-the art. Nat. Biotechnol., 16, 40–44.
Roth, F. P., Hughes, J. D., Estep, P. W., & Church, G. M. (1998). Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol., 16, 939–945.
Saha, S., Sparks, A. B., Rago, C., Akmaev, V., Wang, C. J., Vogelstein, B., et al. (2002). Using the transcriptome to annotate the genome. Nat. Biotechnol., 20(5), 508–512.
Service, R. F. (2003). Genetics and medicine. Recruiting genes, proteins for a revolution in diagnostics. Science, 300, 236–239.
Sinclair, A. (2002). Genetics 101: Polymerase chain reaction. CMAJ, 167, 1032–1033.
Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., et al. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 9(12), 3273–3297.
Sudarsanam, P., Iyer, V. R., Brown, P. O., & Winston, F. (2000). Whole-genome expression analysis of snf/swi mutants of Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. U.S.A., 97, 3364–3369.
Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., et al. (1999). Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. U.S.A., 96(6), 2907–2912.
Tavazoie, S., Hughes, J. D., Campbell, M. J., Cho, R. J., & Church, G. M. (1999). Systematic determination of genetic network architecture. Nat. Genet., 22, 281–285.
Velculescu, V. E., Zhang, L., Zhou, W., Vogelstein, J., Basrai, M. A., Bassett, D. E., Jr., et al. (1997). Characterization of the yeast transcriptome. Cell, 88(2), 243–251.
Velculescu, V. E., Zhang, L., Vogelstein, B., & Kinzler, K. W. (1995). Serial analysis of gene expression. Science, 270, 484–487.
Weinstock, K. G., Kirkness, E. F., Lee, N. H., Earle-Hughes, J. A., & Venter, J. C. (1994). cDNA sequencing: A means of understanding cellular physiology. Curr. Opin. Biotechnol., 5, 599–603.
Wilson, M., DeRisi, J., Kristensen, H. H., Imboden, P., Rane, S., Brown, P. O., et al. (1999). Exploring drug-induced alterations in gene expression in Mycobacterium tuberculosis by microarray hybridization. Proc. Natl. Acad. Sci. U.S.A., 96(22), 12833–12838.
Wooster, R. (2000). Cancer classification with DNA microarrays: is less more? Trends Genet., 16, 327–329.
Yan, H., Yuan, W., Velculescu, V. E., Vogelstein, B., & Kinzler, K. W. (2002). Allelic variation in human gene expression. Science, 297, 1143.
Ye, S. Q., Usher, D. C., & Zhang, L. Q. (2002). Gene expression profiling of human diseases by serial analysis of gene expression. J. Biomed. Sci., 9, 384–394.
Zhu, J., Shendure, J., Mitra, R. D., & Church, G. M. (2003). Single molecule profiling of alternative pre-mRNA splicing. Science, 301, 836–838.
Zupke, C., & Stephanopoulos, G. (1995). Intracellular flux analysis in hybridomas using mass balances and in vitro 13C NMR. Biotechnol. Bioeng., 45, 292–303.