From Library Screening to Microarray Technology: Strategies to Determine Gene Expression Profiles and to Identify Differentially Regulated Genes in Plants

From Library Screening to Microarray Technology: Strategies to Determine Gene Expression Profiles and to Identify Differentially Regulated Genes in Plants

Annals of Botany 87: 139±155, 2001 doi:10.1006/anbo.2000.1314, available online at http://www.idealibrary.com on R E V IE W From Library Screening t...

487KB Sizes 0 Downloads 72 Views

Annals of Botany 87: 139±155, 2001 doi:10.1006/anbo.2000.1314, available online at http://www.idealibrary.com on

R E V IE W

From Library Screening to Microarray Technology: Strategies to Determine Gene Expression Pro®les and to Identify Di€erentially Regulated Genes in Plants EK K E H A R D K UH N* Institute of Plant Physiology and Biotechnology, University of Hohenheim (260), 70593 Stuttgart, Germany Received: 28 July 2000

Returned for revision: 22 September 2000

Accepted: 10 October 2000

Published electronically: 18 December 2000

Hybridization to DNA microarrays and high density membrane ®lters, serial analysis of gene expression (SAGE), the cDNA-AFLP technique, restriction fragment-coupled di€erential display (RC4D), di€erential display reverse transcription PCR (DDRT-PCR) and also the di€erential screening of standard and subtracted cDNA libraries are techniques being used extensively to determine transcription patterns or to identify di€erentially regulated genes in plants and other organisms. In this review, commonly used display systems are evaluated and compared. The general principles on which the di€erent techniques are based and which determine their potential and their limitations are described. Performance aspects of each method are discussed, and existing applications of each method are brie¯y surveyed. Some typical examples are considered to illustrate how di€erential display systems have been applied to plants and to what extent these techniques have contributed to our understanding of plant gene # 2001 Annals of Botany Company expression. Key words: Review, gene expression, transcription pro®les, mRNA quantitation, microarrays, serial analysis of gene expression, SAGE, cDNA-AFLP, RFLP-coupled di€erential display, RC4D, di€erential display reverse transcription PCR, DDRT-PCR, di€erential screening.

I N T RO D U C T I O N There are several reasons why qualitative and quantitative determination of transcript patterns in plant cells is of importance to plant molecular biology. By comparing the concentrations of individual mRNAs present in samples originating from di€erent genotypes, developmental stages or growth conditions, genes can be identi®ed that are di€erentially expressed and hence may have speci®c metabolic or morphogenetic functions. As sequence information grows very rapidly, so does the number of protein coding sequences with unknown functions. When the yeast genome project was completed, only approx. 30 % of its approx. 6000 coding sequences corresponded to known yeast proteins. More than 30 % were orphans sharing no sequence homology to any known gene (Dujon, 1996). At present the function of about half of the yeast genome's open reading frames is still not known. Analysis of transcript patterns should be valuable in assessing roles of novel sequences in an organism, since the similarity of expression patterns of sequences of unknown function with those of known genes may indicate functional homology. For example, the ®rst large scale expression pro®les of light- and dark-grown Arabidopsis thaliana seedlings obtained by membrane and microarray hybridization revealed numerous genes that were highly regulated by light, but did not have a match in the nucleotide sequence databases. The fact that they are lightinducible might indicate that these genes have as yet unde®ned photomorphogenetic functions. Isolation of * Fax ‡49 711 459 3152, e-mail [email protected]

0305-7364/01/020139+17 $35.00/00

plant genes expressed under speci®c conditions allows their promoters to be sequenced and compared. Common cis-elements may be localized and their presence may be correlated with speci®c features of the expression pro®le of the corresponding genes. This allows analysis of how promoter activity is determined by hierarchical and combinatorial interactions of the signal elements embedded within the promoter sequence. By comparing transcript patterns with proteome data, it may be possible to determine whether the intracellular concentration of speci®c proteins is preferentially regulated at the level of transcription or by post-transcriptional mechanisms such as mRNA translation eciency or partitioning between subcellular compartments. For many years the isolation of genes for which products and mutants were not known was only possible by di€erential screening of cDNA libraries. The ®rst in vitro technique for the determination of transcript patterns was di€erential display reverse transcription PCR (DDRTPCR) developed by Liang and Pardee (1992). For the ®rst time it was possible to determine simultaneously a large part of the transcripts present in a eukaryotic cell within a single experiment with high sensitivity. The technique was applied widely, and for several years no other method was available by which comprehensive transcript patterns of eukaryotic cells could be obtained. Fischer et al. (1995) combined DDRT-PCR and ampli®ed fragment length polymorphism (AFLP), a method developed by Vos et al. (1995) for the characterization of genomic DNA. The new technique, termed restriction fragment length polymorphism-coupled domain directed di€erential display (RC4D), provided a # 2001 Annals of Botany Company

140

KuhnÐFrom Library Screening to Microarray Technology

useful tool to detect di€erentially expressed members of individual gene families. The cDNA-AFLP technique (Bachem et al., 1996) is based on the selective PCRampli®cation of adapter-ligated restriction fragments derived from cDNA. In SAGE (serial analysis of gene expression), developed by Velculescu et al. (1995), short fragments termed tags are extracted from cDNA molecules, concatenated, cloned and sequenced. The abundance of a tag in the sequence re¯ects the abundance of the corresponding mRNA in the tissue from which the cDNA was prepared. The era of microarray hybridization technology began with the simultaneous quantitative determination of mRNA concentrations of a small set of Arabidopsis genes by a cDNA microarray (Schena et al., 1995). DNA chip technology has now developed to a stage where the state of expression of complete genomes can be recorded with high accuracy on a single chip. Yet this technique did not foreclose the development of other quantitative large scale screening methods. The throughput of the most advanced AFLP and SAGE protocols is comparable to microarrays. This review summarizes new and old display techniques available to determine transcript patterns and to identify di€erentially expressed genes in plants. The molecular principles of each method are described, and the biological signi®cance of di€erential display data obtained from plant systems is discussed using a few typical examples. Performance criteria such as reproducibility, accuracy and throughput are compared to highlight the power of each approach and to delimit its range of application.

Applications The usefulness of cDNA macroarrays for monitoring gene expression in higher plants has been demonstrated by Desprez et al. (1998), who used a collection of 432 partially sequenced cDNAs arrayed on a nylon support. The ®lters were employed in hybridization studies with 33P labelled complex probes prepared from mRNA of light- and darkgrown Arabidopsis thaliana seedlings. Only a very small number of mRNAs, corresponding to 1 % of the cDNAs tested, was found to be represented at levels higher than 0.5 % of the total mRNA level. Comparison of light- and dark-grown seedlings showed signi®cant di€erences in the expression level for approx. 15 % of the genes investigated. For a number of genes previously reported to be regulated by light, di€erential expression in dark- and light-grown seedlings was observed. These included the chlorophyll a/b binding proteins, plastocyanin and ribulose-1,5-bisphosphate carboxylase. Quantitation of the hybridization pattern also revealed massive variations in expression of many genes for which no di€erential expression between light- and dark-grown seedlings had previously been observed. Large scale gene expression studies using high density membranes have been reported not only for plants but also for other organisms (Bernard et al., 1996; Pietu et al., 1996). c D N A M I C ROA R R AY HY B R I D I Z AT I O N A N A LY S I S Principle

M E M B R A N E A R R AY H Y B R I D I Z AT I O N A N A LY S I S Principle cDNA membrane arrays (macroarrays) are nylon membranes on which denatured double-stranded (ds) DNA fragments are deposited at indexed locations with high speed arraying machines. With the technology currently available it is feasible to array more than 6000 elements on a 12  18 cm nylon membrane. The basic principle of acquiring gene expression data using membrane arrays consists of hybridization of an unknown sample to the ordered set of immobilized molecules on the array surface. This produces a speci®c hybridization pattern that can be read and compared to the pattern produced by a di€erent sample or to a given standard. Probes are prepared by reverse transcription of Poly(A) ‡ -RNA in the presence of 33PdATP to measure transcript concentrations by hybridization to a macroarray. The membrane is then subjected to a standard hybridization procedure. After washing o€ the unbound probe the array is read by an imager. Identi®cation and quantitation of the hybridization signals, as well as the subtraction of local background values, is done by image processing software. The use of radioactive isotopes can be circumvented by employing a non-radioactive single or dual colour detection system in which gene expression levels are represented by colour intensities generated by colourimetric reactions (Chen et al., 1998).

The array elements of cDNA microarrays are singlestranded DNAs. These arrays are produced by robotically depositing denatured target DNAs, e.g. PCR-ampli®ed inserts from cDNA or EST clones, at de®ned locations onto microscope slides. The current density of robotic printing allows the design of arrays containing 20 000 cDNA targets (Schena et al., 1995; Kehoe et al., 1999). This density is much higher than in the case of nylon ®lters, which in theory makes it possible to display most or all transcripts present in a plant cell on a standard microscope slide. The basic steps of preparing and reading cDNA microarrays are depicted in Fig. 1. Applications cDNA microarrays were ®rst developed for Arabidopsis thaliana (Schena et al., 1995). Di€erential gene expression was measured in roots and shoots using a microarray of 48 duplicate cDNA elements simultaneously assayed with a mixture of two probes labelled with ¯uorescein and lissamine. Comparison of scans obtained with leaf and root mRNA revealed 26 genes di€ering more than ®ve-fold in expression between root and leaf tissue. mRNA from the light regulated CABI gene was approx. 500-fold more abundant in leaf than in root tissue. In a transgenic line overexpressing the HAT4 transcription factor, the HAT4 gene was expressed at a level 60-times that of the wild type. Most of the other genes tested gave expression levels within a factor of ®ve between the two samples.

KuhnÐFrom Library Screening to Microarray Technology

141

Preparation of target DNA robot cDNA-library isolate clones, prepare vector DNA

vector

cDNA insert

glass slide print DNA

vector Primer

PCR amplification

false colour image of scanned slide

Preparation of fluorescent DNA probes

mRNA probe 1

hybridize

mRNA probe 2 CY3-dCTP CY5-dCTP

Revertase

cDNA probe 1 cDNA probe 2 Fluorescent DNA synthesis F I G . 1. cDNA fragment-based microarray technology. Target-DNAs are prepared by PCR amplifying inserts from cDNA clones. The amplicons are spotted onto microscope slides at indexed locations using high speed arraying machines. Usually probes from two independent samples are prepared and labelled with ¯uorescent nucleotides. Two ¯uorescent tags with di€erent excitation and emission optima can be used to di€erently label the two probes. Both probes are mixed together and allowed to hybridize to the same microarray under a coverslip. After washing o€ the unbound probe, the ¯uorescent DNA molecules that have hybridized to DNA fragments on the microarray are excited by light and the ¯uorescence signal associated with each element of the microarray is read with an array scanner. The intensity of the ¯uorescence signal ranges from about 1000 (background) to 35 000 ¯uorescence intensity units. For each element of the microarray, the ratio of ¯uorescence emission at the two wavelengths re¯ects the ratio of the abundance of that sequence in the two probes. The result is often displayed as a false-colour image. Thus, in the diagram red may indicate that the corresponding RNAs were more abundant in probe 1 than in probe 2, yellow may indicate that they were present at similar concentrations in both probes and green that the RNAs associated with these array elements were more abundant in probe 2 than in probe 1 (modi®ed from Kehoe et al., 1999).

Ruan et al. (1998) analysed cDNA microarrays containing 1443 Arabidopsis thaliana genes for expression pro®les in di€erent organs and at di€erent developmental stages. More than 80 % of the DNA elements on the arrays generated detectable ¯uorescence signals. Among the detectable elements only 1.4±5 % represented highly expressed genes with abundance of more than 100±500 transcripts per cell. In this class, as expected, many members were well characterized housekeeping or tissue-speci®c genes. The percentage of genes expressed at modest levels (i.e. between ten±50 transcripts per cell) was between 11±16 %, depending on the organ tested. The majority of the expressed genes were low abundance with levels of less than ten±50 transcripts per cell. Hence the transcript complexity in di€erent tissues of Arabidopsis was found to be consistent with early studies in other plants using RNA/DNA reassociation kinetics (Kamalay and Goldberg, 1980).

Comparing leaf and root, 34 % of the transcripts were expressed at signi®cantly di€erent levels. The most di€erentially expressed sequences were those known to be involved in photosynthesis. In ¯owers, many genes were found to be downregulated when compared with their expression patterns in leaves. Conversely, over 200 genes were upregulated in ¯owers when compared with the leaf. Of the 1443 coding sequences used, 173 had no signi®cant homology to any known cDNA sequences in the current public databases. Two of these exhibited an expression pattern comprising higher transcript abundance in roots and developing ¯ower buds, but lower abundance in mature leaves and opened ¯owers. This expression pattern was similar to that of a known gene that encodes a G-protein involved in cell division. The proteins encoded by the two novel sequences may thus have functional roles related to cell division in meristematic tissues. This illustrates the

142

KuhnÐFrom Library Screening to Microarray Technology

point that functions of as yet uncharacterized genes may be inferred from the similarity of their expression patterns to the patterns of genes of known function. The state of DNA microarray technology and the technique's potential have been described in several recent reviews (Hoheisel, 1997; Lemieux et al., 1998; Ramsay, 1998; Kehoe et al., 1999; Khan et al., 1999). Detailed descriptions are also available dealing with technical aspects of producing and reading microarrays (Bowtell, 1999; Cheung et al., 1999; Eisen and Brown, 1999) and with the specialized database software required for storage and retrieval of complex gene expression data (Ermolaeva et al., 1998). O L I G O N U C L E OT I D E M I C ROA R R AY H Y B R ID I Z AT I O N A N A LY S I S Principle The basic principle of using oligonucleotide microarrays was ®rst proposed in the late 1980s, when several groups independently developed the concept of sequencing by hybridization, i.e. determining a DNA sequence by hybridization to a comprehensive set of oligonucleotides, such as all the possible 65 536 octamers (Chetverin and Kramer, 1994). The initial aim was the establishment of a faster, less costly and more e€ective sequencing approach. When it gradually became apparent that microarrays could also be used for large scale comparative measurements of mRNA levels of sets of genes in di€erent cell types or under di€erent conditions, the interest shifted to transcriptional pro®ling, which is now the most prominent application. By automatically printing presynthesized oligonucleotides onto microscope slides, oligonucleotide microarrays can be produced in the same way as cDNA microarrays at densities of up to 30 000 elements cm ÿ2 (Yershov et al., 1996). A di€erent way of producing oligonucleotide microarrays is semiconductor photolithography in combination with modi®ed phosporamidite-based oligonucleotide synthesis chemistry (Fodor et al., 1991; Pease et al., 1994). This technique generates high density microarrays using comparatively few synthesis steps, since the number of probes increases exponentially with a linear increase in the number of synthesis cycles. For example, a complete set of octanucleotides consisting of 65 536 probes can be produced in 32 chemical steps in 8 h (Ramsay, 1998). Photolithography has been used to produce arrays at densities of up to 106 probes per cm2. This corresponds to an element size of 5±10 mm, and it is expected that optimization will permit features of approx. 1 mm. The basic steps of probe labelling, hybridization, signal detection and data processing are the same regardless of array type. Applications Oligonucleotide microarrays have been used to measure expression levels of genes in mice (Lockhart et al., 1996), yeast (Wodicka et al., 1997) and bacteria (DeSaizieu et al., 1998). So far, oligonucleotide microarray analysis has not yet been reported for a plant. Reasons for this will be discussed later.

S E R I A L A N A LY S I S O F G E N E E X P R E S S I O N Principle Serial analysis of gene expression (SAGE) is another large scale method allowing quantitative and simultaneous analysis of many transcripts (Velculescu et al., 1995). From each species present in a cDNA population, a unique sequence tag (approx. 9 bp) is extracted by a series of enzymatic restriction and ligation steps. The cleavage reactions are catalysed by a pair of restriction enzymes, a tetracutter termed the anchoring enzyme and a type II S restriction endonuclease, which is termed the tagging enzyme. The tags are linked together, concatenates are inserted into a plasmid vector, cloned and sequenced. In the sequence, tags are easily distinguished, because they are separated by the recognition sequence of the anchoring enzyme which establishes the register and boundaries of each tag. Hence concatenation of short sequence tags allows the ecient analysis of transcripts in a serial manner by the sequencing of multiple tags within a single clone. A transcription pro®le is obtained by determining the total frequency of each tag in a SAGE clone library, which directly re¯ects the abundance of the corresponding mRNA in the tissue. Individual steps of SAGE library construction are shown in Fig. 2. Applications Serial analysis of gene expression was initially applied to analyse the expression of approx. 1000 pancreatic genes (Velculescu et al., 1995). Since then the power of SAGE for use in gene expression studies has been demonstrated in a number of cases. The ®rst SAGE analysis of a higher plant was completed by Matsumura et al. (1999). More than 10 000 tags derived from approx. 6000 genes expressed in rice were studied. SAGE showed that most of the highly expressed genes in rice seedlings are housekeeping genes. A remarkable exception is a metallothionein gene, whose expression counts for more than 1 % of total gene expression. Analysis of approx. 2000 tags obtained from anaerobically treated and untreated seedlings revealed that most genes were expressed at similar levels. Interestingly, the genes required for anaerobic metabolism were not identi®ed in the di€erentially expressed genes. Using the tag sequence as a primer, a 0.2 kbp cDNA fragment belonging to an elongation factor EF-1a gene could be successfully ampli®ed, thus demonstrating that the sequence of a 9 bp tag plus the adjacent 4 bp tetracutter recognition site can serve as a PCR primer for the recovery of longer cDNA fragments that can in turn be used for the isolation of the corresponding gene. T HE c DN A - A F L P T E C H N I Q U E Principle cDNA-AFLP with two restriction enzymes. cDNA-AFLP is an RNA ®ngerprinting technique that evolved from AFLP (ampli®ed fragment length polymorphism), a method described by Vos et al. (1995) for the ®ngerprinting of genomic DNA. The classical cDNA-AFLP procedure

KuhnÐFrom Library Screening to Microarray Technology NlaIII

143

NlaIII

cleave with NlaIII, isolate cDNA ends by binding to streptavidin beads

divide in half, ligate to linkers

FokI

NlaIII

FokI

9

NlaIII

9

13

13 FokI cleavage site

cleave with FokI

FokI cleavage site





filling in, blunt end ligation FokI

NlaIII

Tag 1

Ditag

Primer A

Primer B

Tag 2

NlaIII

FokI

amplify, cleave with NlaIII, ioslate ditags, concatenate and clone

Tag 1

Tag 2

NlaIII

Ditag

Tag 3

Tag 4

Ditag

F I G . 2. The basic steps of SAGE. Double-stranded DNA is synthesized from mRNA by means of a biotinylated oligo(dT) primer. The DNA is then cleaved with NlaIII, which recognizes 4 bp and cuts DNA frequently. The biotinylated 30 ends of the cleaved cDNA molecules are isolated by binding to streptavidin. The truncated cDNA is divided in half, and each half is ligated via the cohesive 30 end to one of two linkers containing a FokI restriction site. FokI is a type II S restriction endonuclease. These enzymes cleave at a de®ned sequence-independent distance away from their asymmetric recognition site. Cleavage of the ligation product with FokI results in release of the linker with a short piece of cDNA termed a tag. Cohesive ends are blunted, and the two pools of released tags are ligated to each other. Ligated tags then serve as templates for PCR ampli®cation with primers speci®c for each linker. The resulting amplicons contain two tags (one `ditag') linked tail to tail, ¯anked by the NlaIII recognition site and a linker sequence at each end. Clevage of the PCR product with NlaIII allows isolation of ditags that are concatenated, inserted into a plasmid vector and cloned. Clones containing ten±50 ditags are ampli®ed by PCR and sequenced. In the sequence, ditags are counted easily because they are separated by the tetracutter recognition sequence, which serves as punctuation signal. The frequency of each tag in a SAGE library directly re¯ects the abundance of the corresponding mRNA in the tissue (modi®ed from Velculescu et al., 1995).

(Bachem et al., 1996) uses the standard AFLP protocol on a cDNA template. The technique involves three steps: (1) restriction of cDNA and ligation of oligonucleotide adapters; (2) selective ampli®cation of sets of restriction fragments using PCR primers bearing selective nucleotides at the 30 end; and (3) gel analysis of the ampli®ed fragments (Fig. 3). Restriction of plant cDNA with a combination of two restriction enzymes, a tetracutter and a hexacutter,

allows a signi®cant fraction of the cDNA population to be cleaved and to be represented as a discrete banding pattern on a sequencing gel. In genomic AFLP with plant DNA, three selective bases on the end of each primer are required to give a useful banding pattern. The lower complexity of cDNA allows the use of two selective bases for each primer giving a total of 256 possible primer combinations. The largest cDNA-AFLP products visible on a polyacrylamide

144

KuhnÐFrom Library Screening to Microarray Technology mRNA cDNA synthesis

digest with AseI and TaqI 5’ end fragments

inner fragments

3’ end fragments

adapter ligation label

Ase

no amplification

rare

Ase

no amplification

Taq product unlabelled Taq Ase Taq

A

B

A

B

A

B

amplify with 256 primer combinations separate amplicons on sequencing gels

autoradiograph F I G . 3. The classical cDNA-AFLP protocol. cDNA is synthesized from total RNA or poly(A)‡ -RNA and is digested with TaqI and AseI, which recognize four and six bp, respectively. A complete digest of plant cDNA with these enzymes produces ®ve di€erent types of molecules: Ase/Ase fragments, Ase/Taq fragments, Taq/Taq fragments and two terminal fragments with only one cohesive end. TaqI, which cuts DNA frequently, generates small cDNA fragments (around 256 bp on average), which amplify well and lie in the optimal size range for separation on sequencing gels. AseI, which cuts only rarely due to its longer recognition sequence, reduces the number of fragments to a manageable size. Following digestion, double-stranded adapters are ligated to the restriction fragments to generate templates for ampli®cation. PCR ampli®cation is carried out in two steps. In the ®rst step, around 15 cycles of non-speci®c ampli®cation are carried out using primers without extensions. The products of this reaction are then subjected to a second round of PCR ampli®cation using primers bearing at their 30 end two additional nucleotides which extend into the sequence of the restriction fragments, allowing only a subpopulation to be ampli®ed. All the 256 possible primer combinations are necessary to amplify the whole cDNA population. The amplicons are separated on a polyacrylamide gel and visualized by autoradiography. Most of the bands represent Ase/Taq fragments because Ase/Ase fragments are rare and Taq/Taq fragments are not visible on the gel. RNA probes from di€erent sources (A, B) will produce di€erent cDNA-AFLP banding patterns, which allows di€erentially expressed cDNAs to be identi®ed (arrows).

sequencing gel are around 1000 bp in size, the lower end of the gel representing approx. 100 bp. In this size window an average of 40 bands can be observed for each primer combination, corresponding to a total of approx. 10 000 bands.

cDNA-AFLP with one restriction enzyme. A systematic comparison of known potato cDNA sequences showed that approx. 45 % are cleaved by the AseI/TaqI restriction enzyme combination. Thus, in so far as only one pair of

KuhnÐFrom Library Screening to Microarray Technology enzymes is applied, about half of the transcripts present in a cell will not be detected by the standard cDNA-AFLP technique. To obtain more comprehensive patterns, Habu et al. (1997) modi®ed the cDNA-AFLP protocol and showed that the rarely cutting enzyme can be omitted, and meaningful banding patterns can be produced using TaqI alone. Samples derived from buds of red and white ¯owers of the common morning glory (Ipomoea purpurea) were compared using 96 di€erent primer combinations, each of which gave approx. 50 bands, corresponding to a total of approx. 5000 bands. iAFLP. iAFLP (introduced AFLP) is a quantitative high throughput expression pro®ling method speci®cally designed to measure the concentrations of known transcripts in numerous di€erent probes (Kawamoto et al., 1999). cDNA from each probe is restricted with MboI and ligated to one of up to six adapters having short insertions of various lengths into a common sequence ( polymorphic adapters). Following ligation, the di€erentially adapted cDNAs are pooled and 30 end fragments are selectively ampli®ed with a gene speci®c primer and a ¯uorescently labelled adapter primer. The amplicon is then separated on an automatic sequencer. Due to length heterogeneity introduced by the polymorphic adapters, iAFLP fragments from di€erent probes will produce distinct peaks on the electropherogram. Transcript abundance is determined by evaluating peak areas relative to an internal standard.

Applications cDNA-AFLP and its application to plants was ®rst described by Bachem et al. (1996), who analysed di€erential gene expression in a synchronized potato in vitro tuberization system. During screening with di€erent primer combinations, two lipoxygenase cDNA fragments were isolated on the basis of their di€erential expression during potato tuber formation. Both transcripts are highly tuber-speci®c and are expressed strongly in 15-d-old tubers, but not in stolons, leaves or petioles and only at very low levels in stems. The dramatic induction of a lipoxygenase gene just after the start of tuberization led the authors to speculate that the expression of at least one of these enzymes might directly be linked to the tuber development process. Following this initial report, a small number of papers have described the use of cDNA-AFLP ®ngerprinting in plant and animal systems. Habu et al. (1997) compared mRNA samples obtained from the ¯ower buds of two lines of Ipomoea purpurea. Fourteen cDNA fragments (approx. 0.3 %) ampli®ed di€erently in the two samples. Two of these were shown to have been derived from a gene that was actively expressed in the buds of red ¯owers but not in those of white ¯owers. Sequence analysis showed that this cDNA carries a sequence highly homologous to the chalcone synthase gene, a key enzyme in the ¯avonoid biosynthetic pathway. cDNA-AFLP was also applied to identify di€erentially expressed genes in cold-tolerant and coldsensitive alfalfa genotypes and rice (Ivashuta et al., 1999).

145

D I F F E R E N T I A L D I S P L AY R E V E R S E TRANSCRIPTION PCR Principle DDRT-PCR, the di€erential display reverse transcription PCR method, was ®rst described by Liang and Pardee (1992). The method was intended to provide an e€ective tool to detect individual mRNA species that are di€erentially expressed in di€erent eukaryotic cell types and then to permit recovery and cloning of the corresponding cDNAs. The principle of the DDRT-PCR technique is to ampli®y speci®c subpopulations of mRNA, using reverse transcriptase and PCR to produce a population of PCR fragments of di€erent lengths. To do this, mRNA is reversetranscribed in subsets using 12 anchored oligo(dT) primers that recognize di€erent fractions of the total mRNA population. The resulting cDNA is then ampli®ed with the same anchored oligo(dT) primer and a short arbitrary primer. Labelled dATP is introduced in the reaction and the labelled products are separated on a DNA sequencing gel and visualized by autoradiography. Many re®nements of the original di€erential display technology have been described. Instead of decameric upstream primers, elongated arbitrary primers have been used. This opens the possibility of including common sequence motifs to target particular classes or families of genes (Donohue et al., 1995; Johnson et al., 1996; Gonsky et al., 1997). Methods employed for the separation of DDRT-PCR products include denaturing and non-denaturing polyacrylamide gel electrophoresis (Liang et al., 1995), capillary electrophoresis (George et al., 1997) and even electrophoresis on standard agarose gels (Rompf and Kahl, 1997). To label DDRT-PCR products, [33P]dATP is generally used (Simon and Oppenheimer, 1996). Alternatives to radioactive labelling are ¯uorescence detection (Bauer et al., 1993; Jones et al., 1997), silver staining (Simon and Oppenheimer, 1996; Gottschlich et al., 1997) and chemoluminescence detection (An et al., 1996). To circumvent subcloning, direct sequencing approaches for di€erential display products have been developed (Reeves et al., 1995; Wang and Feuerstein, 1995; Simon and Oppenheimer, 1996; Buess et al., 1997). Applications Due to its technical simplicity the DDRT-PCR method has become very popular. Since its introduction, hundreds of applications ranging from microorganisms to humans have been described. I will consider the identi®cation of an Arabidopsis gene (NAP) which is an immediate target of the ¯oral homoeotic genes APETALA and PISTILLATA (Sablowski and Meyerowitz, 1998). Homoeotic genes encode transcription factors, which are mostly MADS box proteins in plants. In Arabidopsis ¯owers both APETALA (AP3) and PISTILLATA (PI) are expressed throughout the development of petals and stamens. The AP3 and PI proteins bind DNA in vitro only as an AP3/PI heterodimer, which probably acts as a master regulator necessary and sucient to set in motion the programs of gene expression required for petal and stamen

146

KuhnÐFrom Library Screening to Microarray Technology

development. To identify AP3/PI target genes, transgenic Arabidopsis plants were constructed exhibiting constitutive expression of PI combined with ubiquitous expression of a steroid regulated version of AP3. In this arti®cial di€erential expression system the mechanism of AP3 activation requires no protein synthesis. Thus, the immediate e€ects of AP3/PI activation can be analysed by di€erential display while all downstream gene regulatory events are blocked by a protein synthesis inhibitor. To identify mRNAs a€ected by activation of the AP3 product, mutant ¯owers were treated with dexamethasone and mRNA populations were compared by DDRT-PCR. For one candidate (NAP), RNA blot analysis showed that the corresponding 1±3 kb mRNA was more abundant in wild type than in mutant ¯owers, suggesting activation by AP3. The expression pattern of NAP and the phenotypes caused by its mis-expression suggest that it functions in the transition between growth by cell division and cell expansion in stamens and petals. In Arabidopsis, DDRT-PCR thus served to detect an important genetic link between homoeotic gene activity and ¯ower morphogenesis. Further examples of the successful isolation of cDNAs associated with distinct stages of plant growth and development include genes expressed during the plant cell cycle (Callard and Mazzolini, 1997), embryogenesis (Heck et al., 1995), fruit ripening (Tieman and Handa, 1996), tuber formation (Monte et al., 1999) and the development of seeds (Nuccio and Thomas, 1999). The DDRT-PCR technique has also been applied to identify several circadian clock regulated genes in Arabidopsis (Kreps et al., 2000) and to isolate several cDNAs induced in plants under various stress conditions e.g. salt (Muramoto et al., 1999), heat shock (Visioli et al., 1997), ozone treatment (Kiiskinen et al., 1997), nutrient starvation (Petrucco et al., 1996), wounding (Titarenko et al., 1997) and pathogen attack (Truesdell and Dickman, 1997). R F L P - CO U P L E D D I F F E R E N T I A L D I S P L AY Principle Many genes and their protein products have a modular structure where the presence of certain domains ( family speci®c domains, FSDs) de®nes membership in di€erent gene families. This is well characterized for the chlorophyll a/b binding proteins and for many transcription factors. Restriction fragment length polymorphism-coupled domain directed di€erential display (RC4D) is a method speci®cally designed to analyse expression of multigene families at di€erent developmental stages, in diverse tissues or in di€erent organisms (Fischer et al., 1995). RC4D combines cDNA-AFLP technology with a gene family speci®c version of DDRT-PCR. In RC4D, instead of arbitrary decameric primers, longer primers directed against an FSD are used, allowing cDNAs belonging to the same gene family to be selectively ampli®ed. As the ampli®cation products are relatively uniform in length, restriction fragment length polymorphism (RFLP) is introduced by digestion with a frequently cutting restriction enzyme.

This reduces the amplicon size from approx. 1 kbp to several hundred base pairs, which is optimal for separation on acrylamide gels. Family members can thus easily be distinguished by size. The basic steps of RC4D are depicted in Fig. 4. Applications RC4D was ®rst used to analyse di€erential expression of MADS box genes in male and female in¯orescences of maize (Fischer et al., 1995; Huang et al., 1996). The name MADS was constructed from the initials of the ®rst four members of the gene family, which were MCM1 (yeast), AGAMOUS ( plants), DEFICIENS ( plants) and SRF (human). A small collection of MADS box primers was designed, directed against sequences encoding derivatives of a highly conserved amino acid motif which covered all its variations known from plants. RC4D yielded many fragments signi®cantly di€erent in size. Most of them were equally present in both sexes. Four already known and two new MADS box genes were identi®ed, being either speci®cally expressed in the female sex or preferentially expressed in male or female in¯orescences, respectively. The two new MADS box genes belong to a subfamily showing sequence similarity to ¯oral homoeotic and transcription factor genes (Huang et al., 1996). Another example of using RC4D was provided by Tahtiharju et al. (1997), who identi®ed several cDNAs coding for calcium dependent protein kinases involved in calcium signalling during cold induction of the kin genes of Arabidopsis thaliana.

DIFFERENTIAL SCREENING OF cDNA LIBRARIES Principle The oldest method of isolating di€erentially expressed cDNAs is the di€erential screening of plasmid or phage cDNA libraries. In principle, di€erential screening can be used to identify any di€erentially regulated gene. In practice, however, the method works well only when the mRNA of interest comprises more than 0.05 % of the total mRNA in one cell and less than 0.01 % in the other (Sambrook et al., 1989). The sensitivity with which cDNAs corresponding to relatively rare mRNAs are identi®ed is considerably improved if the concentration of speci®c sequences in a cDNA probe is increased by subtractive hybridization (SH). There are numerous protocols for SH but the principle remains the same. mRNA extracted from one cell type is used as a template to make radiolabelled cDNA which is then hybridized exhaustively with an approx. 20-fold excess of mRNA extracted from a second type of cell in which the gene of interest is not expressed. cDNAs corresponding to mRNAs expressed in both cell types will form DNA : RNA hybrids, which can be separated from single-stranded DNA by chromatography on hydroxylapatite columns or by streptavidin biotin interaction. The resulting subtracted single-stranded probe

KuhnÐFrom Library Screening to Microarray Technology

147

mRNA

FSD

CDS PCR primer binding sequence (PBS)

1. cDNA synthesis with cDNA synthesis primer (CDS) 2. PCR with FSD primer and PBS primer CDS mixture of amplified truncated family member cDNAs

1. Digestion with tetracutter (arrows) 2. Ligation to double-stranded linkers 3. PCR with linker primer and FSD primer FSD

Linker

truncated family member cDNA fragments of different length

linear PCR with labelled FSD primer Autoradiograph

Electrophoresis

A

B

A

B

F I G . 4. Schematic representation of the RC4D protocol. cDNA is synthesized from mRNA with an oligo(dT) primer bearing a PCR downstream primer binding sequence at its 50 end. PCR is performed with the downstream primer and an upstream primer speci®c for a family speci®c domain (FSD). This results in a mixture of truncated family member cDNAs. The amplicon is digested with a frequently cutting restriction enzyme, and double-stranded linkers are ligated to the cohesive ends. PCR with a linker primer and a FSD primer results in a population of family member cDNA fragments of di€erent lengths. To get rid of the unligated fragments, a further round of PCR is performed using the FSD primer and a primer directed against the linker. Ampli®cation products are then used as a template to extend a radiolabelled FSD primer, and extension products are separated on acrylamide gels. Di€erent probes (A, B) will produce di€erent RC4D banding patterns, which allows di€erentially expressed cDNAs to be identi®ed (arrows); modi®ed from Fischer et al., 1995.

is then used for library screening or for the construction of a subtracted cDNA library. SH detects rare mRNAs with prevalences ranging from 1/500 to 1/200 000 with the median being approx. 1/20 000 (Wan et al., 1996).

Applications A classical example of the identi®cation of a low abundance transcript from a plant by di€erential screening of cDNA libraries was provided by the isolation of the ®rst

148

KuhnÐFrom Library Screening to Microarray Technology

phytochrome cDNA clones from Avena sativa (Hershey et al., 1984). This opened the door for determination of the complete amino acid sequence of the phytochrome apoprotein and the isolation and characterization of phytochrome genes from Avena and other plant species (reviewed in Colbert, 1988). Di€erential screening has been widely applied in plants, including the isolation of cDNA sequences corresponding to genes expressed speci®cally during ¯ower development in tobacco (Goldman et al., 1992) and Sinapis alba (Staiger and Apel, 1993) and the characterization of male ¯ower cDNAs from maize (Wright et al., 1993). POT E N T I A L , L IM I TAT I O N S A N D A P P L I CAT I O N R A N G E O F C U R R E N T D I S P L AY T E C H NI Q U E S Quantitative high throughput display systems cDNA array hybridization, oligonucleotide array hybridization, SAGE and iAFLP represent quantitative high throughput display techniques allowing the intracellular concentrations of hundreds or thousands of transcripts to be determined simultaneously in a single experiment. The main characteristics of these techniques are compared in Table 1. cDNA arrays. Macroarray technology is relatively cheap. Since the ®lters can be rehybridized up to ten times, a small set of ®lters is sucient to monitor expression of most or all Arabidopsis genes in two di€erent tissues or under two di€erent conditions. In comparison to clones arrayed on membrane ®lters, microarrays are 50-times more sensitive and permit simultaneous analysis of two di€erent probes. A disadvantage of these arrays is that they are not reusable and the equipment needed is more complex. Sequence information is not absolutely required, since cDNA arrays can be made using anonymous cDNA or EST clones. Those elements associated with interesting transcription patterns can be sequenced after the hybridization experiment has been completed. However, the number of clones required to ensure a high probability that a low abundance mRNA will be present at least once in the cDNA library used to produce the array is very large. More than 20 standard size cDNA microarrays are required in order to have a 99 % probability of at least one array element hybridizing to a rare mRNA with prevalence 40.001 % (Table 1). Because of this, the usefulness of cDNA array technology for the detection of rare transcripts corresponding to unknown genes is limited, despite the technique's high potential for automation. Oligonucleotide microarrays. A distinct advantage of oligonucleotide microarrays over cDNA microarrays is that preparation and storage of many di€erent DNA species can be avoided by on chip oligonucleotide synthesis. To give a chance for every possible gene fragment derived from an unknown genome to form a perfect hybrid with at least one element of an oligonucleotide microarray, the array should display all possible oligonucleotide sequences

of the given length. The array density required to design such arrays is several orders of magnitude higher than the density at which oligonucleotide arrays can be produced with current chip technology. Hence, monitoring expression of most or all genes of an organism by oligonucleotide microarrays requires the genome sequence of that organism to be known. For most plant species extensive sequence data are not available. For the purpose of transcriptional pro®ling in plants this makes cDNA microarrays the array type of choice. For a given transcript there is a quite accurate relationship between ¯uorescence intensity and transcript concentration regardless of array type. Exact quantitative comparisons of di€erent transcripts, however, require that for all probe species hybridizing with the array this relationship is the same. This is questionable, since individual probe species have unique secondary structures, melting temperatures and reassociation kinetics (Southern et al., 1999), which makes hybridization of all probe species under optimum conditions impossible. Serial analysis of gene expression. The power of SAGE is comparable to microarray technology. Like microarrays, SAGE can quantify low abundance transcripts and detect small di€erences in transcript concentration. Two major advantages of SAGE over microarrays are that uncloned cDNA can be analysed and that no special device other than a sequencer is required. Since SAGE clones may contain 20±50 tags, a representative SAGE library is much smaller than the cDNA library required to build an equivalent microarray. A unique feature of SAGE is that the method has digital output, which makes data processing relatively simple. In order to be represented by a SAGE tag, a cDNA must have at least one recognition site for the anchoring enzyme. Since tetracutters cleave DNA frequently, this should be the case for almost all transcripts. The 9 bp tags generated by the original SAGE procedure together with the adjacent tetracutter sequence are also long enough to identify all transcripts almost unequivocally and provide sucient sequence information for PCR cloning and screening of cDNA libraries (Matsumura et al., 1999). The number of SAGE tags to be sequenced depends on mRNA abundance and the desired accuracy of quantitation. In principle the latter is not limited. If it is desired to count an average of ten tags per cDNA species, which would allow a rough quantitation, approx. 150 000 tags (approx. 1.65 Mb) have to be sequenced. This requires sequencing of 7500 clones, each clone containing 20 tags on average (Table 1). Because of several technical diculties and inherent drawbacks, SAGE has been used less frequently for transcript analysis than DNA microarray hybridization. Meanwhile these problems have largely been alleviated by various modi®cations and improvements of the original SAGE protocol (Powell, 1998; Datson et al., 1999; Kenzelmann and Muhlemann, 1999; Peters et al., 1999; Ryo et al., 2000). Introduced AFLP. Throughput and accuracy of iAFLP are comparable to microarrays. Three di€erent primer colours can be used, allowing up to 288 PCR reactions to be

+50 0.5  10 ÿ6 ±10 ÿ5

6000

460 000

+30

50.01

25

Reusable; many cDNA clones to be maintained

Desprez et al., 1998

Density

Elements/clones to analyse

Reproducibility ( %)

Sensitivity ( %)

Total RNA (mg)

Considerations

Reference

±

Sequencer

Extracting short sequence tags from cDNA; cloning, sequencing and counting tags

SAGE

Wodicka et al., 1997; de Saizieu et al., 1998

Requires extensive sequence information about target genome

25±150

2  10 ÿ4 ±2  10 ÿ3

+50

Velculescu et al., 1995; Datson et al., 1999

Generates sequence information; SAGE library construction dicult

100

Increases with amount of sequencing

Increases with amount of sequencing

4 Number of protein coding 7500 genes in target genome

250 000

Photolithography, array scanner, specialized data processing software

Nucleic acid hybridization

Oligonucleotide microarray

Kawamoto et al., 1999

cDNA or EST Sequence must be known

25±100

510 ÿ5

+4

ˆNumber of genes  probes

±

Sequencer

Selective ampli®cation of adapter ligated 30 cDNA ends

iAFLP

The number (N) of random clones necessary to obtain a near to complete array hybridization expression pro®le from one cell type grown under de®ned conditions is calculated by N ˆ ln…1 ÿ P†=ln…1 ÿ 1=n† using P ˆ 0.99, 1=n ˆ 10 ÿ5 ; P ˆ probability that a low abundance mRNA will be present at least once in the library used to produce the array; 1/n ˆ fraction of the total mRNA that is represented by a single type of rare mRNA (Sambrook et al., 1989). The number of SAGE clones required to obtain a rough SAGE pro®le is estimated as follows: N ˆ ZM=k, where Z is the average number of tags per mRNA species (10), M the number of mRNA species per cell (15 000) and k the number of tags per clone (20). Reproducibility: estimated con®dence interval for repeated measurements. Sensitivity is given in terms of mRNA abundance (mass of desired mRNA/mass of total mRNA in RNA probe). Total RNA required is for one experiment. Protocols employing RNA or cDNA ampli®cation may require 500±5000-fold less material.

Lockhart et al., 1996; Schena et al., 1996; Ruan et al., 1998

Many cDNA clones to be maintained

50±200

460 000

30 000

Advanced robotics, array scanner, specialized data processing software

Robot, 2D imager, standard image processing software

Equipment

Nucleic acid hybridization

Nucleic acid hybridization

cDNA microarray

Principle

Macroarray

T A B L E 1. Quantitative high throughput display systems.

KuhnÐFrom Library Screening to Microarray Technology 149

150

KuhnÐFrom Library Screening to Microarray Technology

analysed in a single sequencer run. Thus, if pro®les from six di€erent tissues are analysed, the throughput is 1728 (288  6) data points per run or 5184 data points per day (three runs). Control measurements performed with de®ned template mixes indicate that the method re¯ects the relations between the ratio of polymorphic products and the ratio of targets in the initial mRNA population quite accurately. The main limitation of iAFLP is that only known transcripts can be quanti®ed, since a primer binding sequence is required downstream of the MboI site closest to the cDNA's 30 end. Qualitative display systems The most characteristic features of cDNA-AFLP, DDRT-PCR, RC4D and di€erential screening are summarized in Table 2. With the exception of RC4D, which requires the nucleotide sequence of a gene family speci®c domain, these methods can be used for the identi®cation of any di€erentially expressed gene. They are, however, less accurate and allow transcription pro®les to be determined only on a qualitative or semiquantitative basis. cDNA-AFLP. In cDNA-AFLP, universal adapters are added to DNA fragments produced by restriction of cDNA with one or two restriction enzymes. The method needs only minute amounts of RNA due to the preampli®cation step performed with non-selective primers. Because stringent hybridization conditions are used in the ampli®cation reactions, mismatched priming events are observed only in cases where transcript levels are extremely high. This results in cDNA-AFLP banding patterns being highly reproducible and almost free of false positives. cDNA-AFLP provides a straightforward veri®cation mechanism of band identity and homogeneity. When using AFLP primers with additional base extensions it is possible to selectively amplify one cDNA fragment, thereby eliminating virtually all other fragments from being co-ampli®ed. Because band intensity is a direct function of template concentration, it is likely that transcript concentrations can be measured quite accurately by combining cDNA-AFLP with high resolution quantitative separation techniques. A signi®cant disadvantage of the cDNA-AFLP method is the requirement for appropriate restriction sites on the cDNA molecules. cDNA-AFLP experiments using di€erent enzymes are necessary to visualize every cDNA present in a plant cell with high probability. For several reasons, this is also true if the one enzyme variant of the cDNA-AFLP protocol is used (Habu et al., 1997). DDRT-PCR. The DDRT-PCR technique has been widely used to isolate di€erentially expressed plant genes due to its technical simplicity and the ease with which the primary data are obtained. However, the method su€ers from severe drawbacks. Even the most advanced procedures generate at least 30 % false positives that do not represent di€erentially expressed genes (Malhotra et al., 1998). This makes a downstream veri®cation process necessary, which is not only labour intensive, but also requires large amounts of RNA that often cannot be obtained without RNA

ampli®cation (Poirier et al., 1997). In theory, DDRT-PCR should detect more than 95 % of the transcripts present in a eukaryotic cell (Bauer et al., 1993). Practically, however, the sensitivity of the method depends on both primer sequence and concentrations of competing template binding sites (Bertioli et al., 1995; Wan et al., 1996). Thus it seems dicult to predict whether or not a given primer set will detect a speci®c transcript, even if abundant, in an uncharacterized cDNA population, where concentrations of competitors are not known. Because ampli®cation starts from the 30 end of the mRNA, amplicons are from the untranslated 30 -region of the gene and often do not contain the information required to perform successful similarity searches in sequence databases. Finally, in DDRT-PCR, there is no accurate relationship between signal strength and the initial concentration of the corresponding mRNA, which precludes quantitative measurement of transcript concentration. A few systematic comparisons of cDNAAFLP and DDRT-PCR have been published, indicating that cDNA-AFLP is superior to DDRT-PCR in several respects (Habu et al., 1997; Jones and Harrower, 1998). RC4D. The use of longer primers and stringent annealing temperatures makes the RC4D method more reproducible and far less parameter dependent than DDRT-PCR, allowing semiquantitative determination of transcript concentration. The main limitation of the RC4D method is that it requires a conserved domain to be situated upstream of the gene's 30 end, which only allows genes belonging to the same gene family to be analysed. Family members bearing a respective restriction site within the FSD will be lost because cleavage of the amplicon resulting from the initial PCR ampli®cation will destroy the domain speci®c primer binding site.

Di€erential screening Because of the many steps required to construct a representative library and the necessity to screen thousands or millions of clones, di€erential screening is generally considered to be extremely laborious and inecient. If one is looking for rare mRNAs, this is certainly true. However, in cases where a gene is strongly expressed under a given set of conditions and only weakly expressed under others, the number of clones that have to be screened to ®nd a certain cDNA may be quite small. For example, the probability of an mRNA with prevalence 1 % being represented at least once in a cDNA library of 500 clones is 499 % (Table 1). The usefulness of subtractive hybridization (SH) is often limited by the inability to obtain sucient mRNA to drive SH to completion and by ine€ective hybridization. As with DDRT-PCR, a signi®cant fraction of the candidate clones isolated by SH consists of false positives. There is also a correlation between clone redundancy and the fraction of false positives, indicating that SH identi®es abundant mRNAs more easily than rare mRNAs. SH works in only one direction, i.e. to identify mRNAs expressed only in those cells from which subtractor RNA has been prepared, a second experiment is required (Wan et al., 1996).

PAGE or sequencer or CE

10 000

100

2  10

2

Method may generate heterogeneous bands, but provides a simple mechanism of band veri®cation

Bachem et al., 1996; Habu et al., 1997

Equipment

Bands or clones to screen

Reproducibility ( %)

Sensitivity ( %)

Total RNA (mg)

Considerations

Reference

ÿ5

ÿ2

2

2  10

100 ÿ5

Few, depending on size of gene family

Bauer et al., 1993; Bertioli et al., 1995; Wan et al., 1996; Poirier et al., 1997

Fischer et al., 1995

Requires sequence of a family Generates at least 30 % false positives; short 30 end fragments speci®c domain prevail; biased towards abundant transcripts; bands usually heterogeneous

5

2  10 ±10 , depending on primer sequence and genetic background

42

38 000

PAGE

Combines cDNA-AFLP and DDRT-PCR

Selective ampli®cation of 30 cDNA ends PAGE or sequencer or CE

RC4D

DDRT-PCR

Sambrook et al., 1989

Generates false positives

Wan et al., 1996

Generates false positives; technically dicult

100

5  10 ÿ5 ±5  10 ÿ3

5 0.05

100

±

46 000

Standard

Nucleic acid hybridization

SH

±

460 000

Standard

Nucleic acid hybridization

DS

Reproducibility of cDNA-AFLP and DDRT-PCR: number of di€erential bands visible in di€erent experiments/number of bands examined ( %). Sensitivity is given in terms of mRNA abundance (mass of desired mRNA/mass of total mRNA in RNA probe). The size value for subtracted libraries assumes ten-fold enrichment of the di€erentially expressed cDNAs by the subtraction step. Total RNA is for one experiment assuming no ampli®cation. DS, Di€erential screening; SH, subtractive hybridization; PAGE, polyacrylamide gel electrophoresis; CE, capillary electrophoresis.

ÿ5

Selective ampli®cation of adapter-ligated cDNA

Principle

cDNA-AFLP

T A B L E 2. Qualitative display systems

KuhnÐFrom Library Screening to Microarray Technology 151

152

KuhnÐFrom Library Screening to Microarray Technology WHICH METHOD SHOULD BE USED?

The answer depends on several factors including the type of problem to be solved, the amount of sequence information available for the species under study and the abundance of the transcripts that one is looking for. In many cases choice of methods is likely to be dictated by cost. The investment in time and money needed to establish a quantitative high throughput display system is still beyond the scope of many laboratories. Following the branches of the binary tree (Fig. 5), one should consider that di€erent methods may have overlapping ®elds of application, and hence clear cut decisions may be dicult.

CO M M O N D R AW B AC K S O F AVA I L A B L E D I S P L AY M E T H O D S All the methods discussed so far produce a very large amount of redundant and/or insigni®cant data. In part this results from one mRNA species generating more than one signal. The number of clones or elements in a representative cDNA library or cDNA array produced by arraying anonymous cDNA clones exceeds by more than an order of magnitude the complexity of the mRNA population from which the cDNA was synthesized (Table 1). If cDNAAFLP is performed with one enzyme, each cDNA species should be represented by several bands (Habu et al., 1997).

Display System Selection Basics Intention to measure quantitative differences in gene expression Costs limiting 3’ end sequences of target transcripts known

iAFLP

no sequence information available anonymous clones available, mRNA abundance > 0.01%

Macroarray

No clones available or mRNA abundance £ 0.01%

SAGE

Costs not limiting Sequence information available Oligonucleotide microarray

Target genome sequenced Target genome not sequenced, sequenced cDNA clones available No sequence information available

cDNA microarray

cDNA microarray

Intention to detect merely qualitative differences in gene expression or to ioslate differentially expressed genes Sequence of a FSD known

RC4D

No sequence information available Sequencer or capillary electrophoresis available mRNA abundance < 1%

cDNA-PCR

mRNA abundance £ 1%

DDRT-PCR

No high resolution separation technique available mRNA abundance > 0.05%

DS

mRNA abundance £ 0.05%

SH

F I G . 5. Flow chart summarizing a few rules of thumb that may facilitate choice of the most appropriate display system for a particular application. DS, Di€erential screening; FSD, family speci®c domain; SH; subtractive hybridization.

KuhnÐFrom Library Screening to Microarray Technology In iAFLP the use of amino-blocked AFLP adapters results in only the 30 end fragments being ampli®ed. Hence each cDNA generates no more than one band. By combining this approach with recent cDNA ampli®cation protocols (Matz et al., 1999) it might be possible to completely remove redundancy from the traditional cDNA-AFLP procedures. DDRT-PCR is inherently redundant, since a given mRNA is usually recognized by more than one pair of primers and hence generates di€erent bands in di€erent lanes of the gel during separation. A more prominent source of meaningless data arises from the fact that usually only a few genes are expressed di€erentially under di€erent conditions. For example, cDNA microarray analysis of 673 genes from two Arabidopsis accessions revealed that expression levels of only two genes di€ered remarkably between the two strains (Kehoe et al., 1999). SAGE analysis of approx. 1500 genes from aerobically and anaerobically treated rice seedlings showed that only 24 transcripts were expressed at signi®cantly di€erent levels (Matsumura et al, 1999). Elimination of the common background produced by constitutively expressed genes constitutes a major problem. In principle this can be solved by algorithms designed to eliminate all insigni®cant di€erences in signal strength, thereby retaining only those exceeding a certain threshold. However, the necessity to generate and to process expression signals from hundreds or thousands of genes not directly involved in the biological process under study remains. Processing of gene expression data would be substantially simpli®ed if there were a method of biochemical subtraction allowing only those transcripts to be recorded that are di€erentially expressed. Subtractive hybridization is the only method developed so far to achieve this. The results, however, have been unsatisfactory. All di€erential display techniques provide information about steady state mRNA concentrations and cannot detect post-transcriptional events known to play a role in plant gene regulation such as changes in mRNA stability, translation eciency or protein (de)phosphorylation.

L I T E R AT U R E C I T E D An G, Luo G, Veltri RW, O'Hara SM. 1996. Sensitive, nonradioactive di€erential display method using chemiluminescent detection. Biotechniques 20: 342±346. Bachem CW, van der Hoeven RS, de Bruijn SM, Vreugdenhil D, Zabeau M, Visser RG. 1996. Visualization of di€erential gene expression using a novel method of RNA ®ngerprinting based on AFLP: analysis of gene expression during potato tuber development. Plant Journal 9: 745±753. Bauer D, MuÈller H, Reich J, Riedel H, Ahrenkiel V, Warthoe P, Strauss M. 1993. Identi®cation of di€erentially expressed mRNA species by an improved display technique (DDRT-PCR). Nucleic Acids Research 21: 4272±4280. Bernard K, Auphan N, Granjeaud S, Victorero G, Schmitt-Verhulst AM, Jordan BR, Nguyen C. 1996. Multiplex messenger assay: simultaneous, quantitative measurement of expression of many genes in the context of T cell activation. Nucleic Acids Research 24: 1435±1442. Bertioli DJ, Schlichter UH, Adams MJ, Burrows PR, Steinbiss HH, Antoniw JF. 1995. An analysis of di€erential display shows a strong bias towards high copy number mRNAs. Nucleic Acids Research 23: 4520±4523.

153

Bowtell DD. 1999. Options availableÐfrom start to ®nishÐfor obtaining expression data by microarray. Nature Genetics Supplement 21: 25±32. Buess M, Moroni C, Hirsch HH. 1997. Direct identi®cation of di€erentially expressed genes by cycle sequencing and cycle labelling using the di€erential display PCR primers. Nucleic Acids Research 25: 2233±2235. Callard D, Mazzolini L. 1997. Identi®cation of proliferation-induced genes in Arabidopsis thaliana. Characterization of a new member of the highly evolutionarily conserved histone H2A.F/Z variant subfamily. Plant Physiology 115: 1385±1395. Chen JJ, Wu R, Yang PC, Huang JY, Sher YP, Han MH, Kao WC, Lee PJ, Chiu TF, Chang F, Chu YW, Wu CW, Peck K. 1998. Pro®ling expression patterns and isolating di€erentially expressed genes by cDNA microarray system with colorimetry detection. Genomics 51: 313±324. Chetverin AB, Kramer FR. 1994. Oligonucleotide arrays: new concepts and possibilities. Biotechnology (N.Y.) 12: 1093±1099. Cheung VG, Morley M, Aguilar F, Massimi A, Kucherlapati R, Childs G. 1999. Making and reading microarrays. Nature Genetics Supplement 21: 15±19. Colbert JT. 1988. Molecular biology of phytochrome. Plant, Cell and Environment 11: 305±318. Datson NA, van der Perk-de Jong J, van den Berg MP, de Kloet ER, Vreugdenhil E. 1999. MicroSAGE: a modi®ed procedure for serial analysis of gene expression in limited amounts of tissue. Nucleic Acids Research 27: 1300±1307. DeSaizieu A, Certa U, Warrington J, Gray C, Keck W, Mous J. 1998. Bacterial transcript imaging by hybridization of total RNA to oligonucleotide arrays. Nature Biotechnology 16: 45±48. Desprez T, Amselem J, Caboche M, HoÈfte H. 1998. Di€erential gene expression in Arabidopsis monitored using cDNA arrays. Plant Journal 14: 643±652. Donohue PJ, Alberts GF, Guo Y, Winkles JA. 1995. Identi®cation by targeted di€erential display of an immediate early gene encoding a putative serine/threonine kinase. Journal of Biological Chemistry 270: 10 351±10 357. Dujon B. 1996. The yeast genome project: what did we learn?. Trends in Genetics 12: 263±270. Eisen MB, Brown PO. 1999. DNA arrays for analysis of gene expression. Methods in Enzymology 303: 179±205. Ermolaeva O, Rastogi M, Pruitt KD, Schuler GD, Bittner ML, Chen Y, Simon R, Meltzer P, Trent JM, Boguski MS. 1998. Data management and analysis for gene expression arrays. Nature Genetics 20: 19±23. Fischer A, Saedler H, Theissen G. 1995. Restriction fragment length polymorphism-coupled domain-directed di€erential display: a highly ecient technique for expression analysis of multigene families. Proceedings of the National Academy of Sciences of the United States of America 92: 5331±5335. Fodor SP, Read JL, Pirrung MC, Stryer L, Lu AT, Solas D. 1991. Light-directed, spatially addressable parallel chemical synthesis. Science 251: 767±773. George KS, Zhao X, Gallahan D, Shirkey A, Zareh A, Esmaeli-Azad B. 1997. Capillary electrophoresis methodology for identi®cation of cancer related gene expression patterns of ¯uorescent di€erential display polymerase chain reaction. Journal of Chromatography B 695: 93±102. Goldman MH, Pezzotti M, Seurinck J, Mariani C. 1992. Developmental expression of tobacco pistil-speci®c genes encoding novel extensin-like proteins. Plant Cell 4: 1041±1051. Gonsky R, Knauf JA, Elisei R, Wang JW, Su S, Fagin JA. 1997. Identi®cation of rapid turnover transcripts overexpressed in thyroid tumors and thyroid cancer cell lines: use of a targeted di€erential RNA display method to select for mRNA subsets. Nucleic Acids Research 25: 3823±3831. Gottschlich S, Goeroegh T, Folz BJ, Lippert BM, Werner JA. 1997. Optimized di€erential display and reampli®cation parameters for silver staining. Research Commununications in Molecular Pathology and Pharmacology 97: 237±240. Habu Y, Fukada-Tanaka S, Hisatomi Y, Iida S. 1997. Ampli®ed restriction fragment length polymorphism-based mRNA ®ngerprinting using a single restriction enzyme that recognizes a 4-bp

154

KuhnÐFrom Library Screening to Microarray Technology

sequence. Biochemical and Biophysical Research Communications 234: 516±521. Heck GR, Perry SE, Nichols KW, Fernandez DE. 1995. AGL15, a MADS domain protein expressed in developing embryos. Plant Cell 7: 1271±1282. Hershey HP, Colbert JT, Lissemore JL, Barker RF, Quail PH. 1984. Molecular cloning of cDNA for Avena phytochrome. Proceedings of the National Academy of Sciences of the United States of America 81: 2332±2336. Hoheisel JD. 1997. Oligomer-chip technology. Trends in Biotechnology 15: 465±469. Huang H, Tudor M, Su T, Zhang Y, Hu Y, Ma H. 1996. DNA binding properties of two Arabidopsis MADS domain proteins: binding consensus and dimer formation. Plant Cell 8: 81±94. Ivashuta S, Imai R, Uchiyama K, Gau M. 1999. The coupling of di€erential display and AFLP approaches for nonradioactive mRNA ®ngerprinting. Molecular Biotechnology 12: 137±141. Johnson SW, Lissy NA, Miller PD, Testa JR, Ozols RF, Hamilton TC. 1996. Identi®cation of zinc ®nger mRNAs using domain-speci®c di€erential display. Analytical Biochemistry 236: 348±352. Jones JT, Harrower BE. 1998. A comparison of the eciency of di€erential display and cDNA-AFLPs as tools for the isolation of di€erentially expressed parasite genes. Fundamental and Applied Nematology 21: 81±88. Jones SW, Cai D, Weislow OS, Esmaeli-Azad B. 1997. Generation of multiple mRNA ®ngerprints using ¯uorescence-based di€erential display and an automated DNA sequencer. Biotechniques 22: 536±543. Kamalay JC, Goldberg RB. 1980. Regulation of structural gene expression in tobacco. Cell 19: 935±946. Kawamoto S, Ohnishi T, Kita H, Chisaka O, Okubo K. 1999. Expression pro®ling by iAFLP: A PCR-based method for genome-wide gene expression pro®ling. Genome Research 9: 1305±1312. Kehoe DM, Villand P, Somerville S. 1999. DNA microarrays for studies of higher plants and other photosynthetic organisms. Trends in Plant Science 4: 38±41. Kenzelmann M, Muhlemann K. 1999. Substantially enhanced cloning eciency of SAGE (Serial Analysis of Gene Expression) by adding a heating step to the original protocol. Nucleic Acids Research 27: 917±918. Khan J, Saal LH, Bittner ML, Chen Y, Trent JM, Meltzer PS. 1999. Expression pro®ling in cancer using cDNA microarrays. Electrophoresis 20: 223±229. Kiiskinen M, Korhonen M, Kangasjarvi J. 1997. Isolation and characterization of cDNA for a plant mitochondrial phosphate translocator (Mpt1): ozone stress induces Mpt1 mRNA accumulation in birch (Betula pendula Roth). Plant Molecular Biology 35: 271±279. Kreps JA, Muramatsu T, Furuya M, Kay SA. 2000. Fluorescent di€erential display identi®es circadian clock-regulated genes in Arabidopsis thaliana. Journal of Biological Rhythms 15: 208±217. Lemieux B, Aharoni A, Schena M. 1998. Overview of DNA chip technology. Molecular Breeding 4: 277±289. Liang P, Pardee AB. 1992. Di€erential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 257: 967±971. Liang P, Bauer D, Averboukh L, Warthoe P, Rohrwild M, Muller H, Strauss M, Pardee AB. 1995. Analysis of altered gene expression by di€erential display. Methods in Enzymology 254: 304±321. Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, Brown EL. 1996. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nature Biotechnology 14: 1675±1680. Malhotra K, Foltz L, Mahoney WC, Schueler PA. 1998. Interaction and e€ect of annealing temperature on primers used in di€erential display RT-PCR. Nucleic Acids Research 26: 854±856. Matsumura H, Nirasawa S, Terauchi R. 1999. Technical Advance: Transcript pro®ling in rice (Oryza sativa L.) seedlings using serial analysis of gene expression (SAGE). Plant Journal 20: 719±726. Matz M, Shagin D, Bogdanova E, Britanova O, Lukyanov S, Diatchenko L, Chenchik A. 1999. Ampli®cation of cDNA ends

based on template-switching e€ect and step-out PCR. Nucleic Acids Research 27: 1558±1560. Monte E, Ludevid D, Prat S. 1999. Leaf C40.4: a carotenoid-associated protein involved in the modulation of photosynthetic eciency?. Plant Journal 19: 399±410. Muramoto Y, Watanabe A, Nakamura T, Takabe T. 1999. Enhanced expression of a nuclease gene in leaves of barley plants under salt stress. Gene 234: 315±321. Nuccio ML, Thomas TL. 1999. ATS1 and ATS3: two novel embryospeci®c genes in Arabidopsis thaliana. Plant Molecular Biology 39: 1153±1163. Pease AC, Solas D, Sullivan EJ, Cronin MT, Holmes CP, Fodor SP. 1994. Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proceedings of the National Academy of Sciences of the United States of America 91: 5022±5026. Peters DG, Kassam AB, Yonas H, O'Hare EH, Ferrell RE, Brufsky AM. 1999. Comprehensive transcript analysis in small quantities of mRNA by SAGE-lite. Nucleic Acids Research 27: e39. Petrucco S, Bolchi A, Foroni C, Percudani R, Rossi GL, Ottonello S. 1996. A maize gene encoding an NADPH binding enzyme highly homologous to iso¯avone reductases is activated in response to sulfur starvation. Plant Cell 8: 69±80. Pietu G, Alibert O, Guichard V, Lamy B, Bois F, Leroy E, MariageSampson R, Houlgatte R, Soularue P, Au€ray C. 1996. Novel gene transcripts preferentially expressed in human muscles revealed by quantitative hybridization of a high density cDNA array. Genome Research 6: 492±503. Poirier GM, Pyati J, Wan JS, Erlander MG. 1997. Screening di€erentially expressed cDNA clones obtained by di€erential display using ampli®ed RNA. Nucleic Acids Research 25: 913±914. Powell J. 1998. Enhanced concatemer cloning-a modi®cation to the SAGE (Serial Analysis of Gene Expression) technique. Nucleic Acids Research 26: 3445±3446. Ramsay G. 1998. DNA chips: state-of-the art. Nature Biotechnology 16: 40±44. Reeves SA, Rubio MP, Louis DN. 1995. General method for PCR ampli®cation and direct sequencing of mRNA di€erential display products. Biotechniques 18: 18±20. Rompf R, Kahl G. 1997. mRNA di€erential display in agarose gels. Biotechniques 23: 28±32. Ruan Y, Gilmore J, Conner T. 1998. Towards Arabidopsis genome analysis: monitoring expression pro®les of 1400 genes using cDNA microarrays. Plant Journal 15: 821±833. Ryo A, Kondoh N, Wakatsuki T, Hada A, Yamamoto N, Yamamoto M. 2000. A modi®ed serial analysis of gene expression that generates longer sequence tags by nonpalindromic cohesive linker ligation. Analytical Biochemistry 277: 160±162. Sablowski RW, Meyerowitz EM. 1998. A homolog of NO APICAL MERISTEM is an immediate target of the ¯oral homeotic genes APETALA3/PISTILLATA. Cell 92: 93±103. Sambrook J, Fritsch EF, Maniatis T. 1989. Molecular cloning: a laboratory manual. Cold Spring Harbor: Cold Spring Harbor Laboratory Press. Schena M, Shalon D, Davis RW, Brown PO. 1995. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270: 467±470. Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW. 1996. Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proceedings of the National Academy of Sciences of the United States of America 93: 10 614±10 619. Simon HG, Oppenheimer S. 1996. Advanced mRNA di€erential display: isolation of a new di€erentially regulated myosin heavy chain-encoding gene in amphibian limb regeneration. Gene 172: 175±181. Southern E, Mir K, Shchepinov M. 1999. Molecular interactions on microarrays. Nature Genetics Supplement 21: 5±9. Staiger D, Apel K. 1993. Molecular characterization of two cDNAs from Sinapis alba L. expressed speci®cally at an early stage of tapetum development. Plant Journal 4: 697±703. Tahtiharju S, Sangwan V, Monroy AF, Dhindsa RS, Borg M. 1997. The induction of kin genes in cold-acclimating Arabidopsis thaliana. Evidence of a role for calcium. Planta 203: 442±447.

KuhnÐFrom Library Screening to Microarray Technology Tieman DM, Handa AK. 1996. Molecular cloning and characterization of genes expressed during early tomato (Lycopersicon esculentum Mill.) fruit development by mRNA di€erential display. Journal of the American Society for Horticultural Science 121: 52±56. Titarenko E, Rojo E, Leon J, Sanchez-Serrano JJ. 1997. Jasmonic aciddependent and -independent signaling pathways control woundinduced gene activation in Arabidopsis thaliana. Plant Physiology 115: 817±826. Truesdell GM, Dickman MB. 1997. Isolation of pathogen/stressinducible cDNAs from alfalfa by mRNA di€erential display. Plant Molecular Biology 33: 737±743. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW. 1995. Serial analysis of gene expression. Science 270: 484±487. Visioli G, Maestri E, Marmiroli N. 1997. Di€erential display-mediated isolation of a genomic sequence for a putative mitochondrial LMW HSP speci®cally expressed in conditions of induced thermotolerance in Arabidopsis thaliana (L.) heynh. Plant Molecular Biology 34: 517±527. Vos P, Hogers R, Bleeker M, Reijans M, van de Lee T, Hornes M, Frijters A, Pot J, Peleman J, Kuiper M. 1995. AFLP: a new

155

technique for DNA ®ngerprinting. Nucleic Acids Research 23: 4407±4414. Wan JS, Sharp SJ, Poirier GM, Wagaman PC, Chambers J, Pyati J, Hom YL, Galindo JE, Huvar A, Peterson PA, Jackson MR, Erlander MG. 1996. Cloning di€erentially expressed mRNAs. Nature Biotechnology 14: 1685±1691. Wang X, Feuerstein GZ. 1995. Direct sequencing of DNA isolated from mRNA di€erential display. Biotechniques 18: 448±453. Wodicka L, Dong H, Mittmann M, Ho MH, Lockhart DJ. 1997. Genome-wide expression monitoring in Saccharomyces cerevisiae. Nature Biotechnology 15: 1359±1367. Wright SY, Suner MM, Bell PJ, Vaudin M, Greenland AJ. 1993. Isolation and characterization of male ¯ower cDNAs from maize. Plant Journal 3: 41±49. Yershov G, Barsky V, Belgovskiy A, Kirillov E, Kreindlin E, Ivanov I, Parinov S, Guschin D, Drobishev A, Dubiley S, Mirzabekov A. 1996. DNA analysis and diagnostics on oligonucleotide microchips. Proceedings of the National Academy of Sciences of the United States of America 93: 4913±4918.