Chemometrics and Intelligent Laboratory Systems 82 (2006) 31 – 36 www.elsevier.com/locate/chemolab
Use of three-color cDNA microarray experiments to assess the therapeutic and side effect of drugs
Hongya Zhao a, Ricky N.S. Wong b,c, Kai-Tai Fang a,*, Patrick Y.K. Yue c
a Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong
b Department of Biology, Hong Kong Baptist University, Kowloon Tong, Hong Kong
c Research and Development Division of School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Received 20 January 2004; received in revised form 20 June 2005; accepted 28 June 2005
Available online 30 September 2005
Abstract
A novel microarray strategy, the three-color cDNA microarray experiment, is applied for the first time to assess drug effects on the gene expression patterns of a "target disease." By adding Alexa 594 as a dye-label for a third target sample, it becomes possible to monitor changes in gene expression in response to disease, and to generate clues to gene function under therapeutic intervention, on a single array. A new kind of graph, the hexaMplot, is constructed to illustrate the meaningful normal-disease-drug expression patterns in a three-color experiment. The therapeutic and side effects of drugs correspond to different regions of the hexaMplot. A test on the correlation coefficient is explored as a statistical tool for the assessment. Such a methodology may prove useful in shortening the cycle, reducing the cost, and improving the efficiency of drug discovery and development.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Drug effect; HexaMplot; Gene expression; Three-color cDNA microarray experiments; Correlation coefficient
1. Introduction
DNA microarray technology is a high-throughput, parallel platform that can provide expression profiles of thousands of genes, thereby enabling the rapid and quantitative analysis of gene expression patterns, patient genotypes, drug mechanisms, and disease onset and progression on a genomic scale. The immediate benefits of progress in genomics are seen in the discovery and development of novel pharmaceutical products. Microarrays speed the identification of genes involved in the development of various diseases by enabling scientists to examine a much larger number of genes. The technology also aids the examination of the integration of gene expression and function at the cellular level, revealing how multiple gene products work together to produce physical and chemical responses to both static and changing cellular needs. With microarray analysis it is thus possible to find correlations between therapeutic responses to drugs and the genetic profiles of patients.
* Corresponding author. Tel.: +852 3411 7025; fax: +852 3411 5811. E-mail address: [email protected] (K.-T. Fang).
0169-7439/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.chemolab.2005.06.021
Human beings are gene machines, and the proper and harmonious expression of a large number of genes is a critical component of normal growth and development and of the maintenance of health. Disruptions or changes in gene expression are responsible for many diseases, because small errors in the genetic code can lead to the synthesis of a defective cellular protein, and proteins incapable of performing their normal cellular functions can cause human disease. The most attractive application of microarrays is the study of differential gene expression in disease. Knowing the molecular basis of diseases enhances our ability to understand genetic predisposition, onset, and regression, and expedites the development of safer and more effective treatments. Furthermore, drugs impart their therapeutic activities by acting on different pathways. It is possible in principle to use microarrays for drug discovery and clinical trials by generating gene expression profiles of patients undergoing disease progression or drug treatment. Medical practice has succeeded in identifying and curing disease using a host of clinical procedures, but many illnesses, therapies, and drugs are still poorly understood at the genomic level. The use of microarray technology in a
H. Zhao et al. / Chemometrics and Intelligent Laboratory Systems 82 (2006) 31 – 36
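[Figure 1: flow chart. The comparison of normal (red, R) vs. disease (green, G) branches into R < G, R = G and R > G; adding the drug sample (B) splits each branch further, for example R > G into R > G > B, R > G = B, R > B > G, B = R > G and B > R > G.]
Fig. 1. Flow chart of gene expression patterns of normal-disease-drug in three-color cDNA microarray experiment. There are 3 patterns in dual-color experiment and 13 in three-color experiment.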
clinical setting holds the promise of providing detailed molecular information to the physician to support better health care. Microarrays are therefore becoming a standard tool for drug discovery and development by systematically surveying variations in DNA and RNA [1,2].
Microarrays are a significant advance both because they contain a very large number of genes and because they require only a small sample. The technology is useful when one wants to survey a large number of genes simultaneously or when the sample to be studied is very limited. Microarray studies are costly in terms of equipment, consumables and time, so careful design and replication are particularly important if the resulting experiment is to be maximally informative. Commonly, cDNA microarray experiments using spotted arrays involve the hybridization of two differentially labelled targets to one slide, called dual-target or dual-dye arrays. In such an experiment, one sample is labelled with the fluorescent dye Cy5 (red) and the other with the dye Cy3 (green). After image analysis and data processing, R and G denote the measured relative gene expression levels. This experimental approach raises practical issues, such as constraints on the number of arrays that can be processed and on the amount of sample available. To ease these limitations, a third dye-label (Alexa 594) was introduced for a third target sample in microarray applications in Refs. [3,4]. Here the three-color cDNA microarray experiment is applied for the first time to investigate the changes in gene expression among normal, disease and drug samples. By comparing their expression levels, the experiment can be applied to assess the effect of a drug on a disease. With a suitable laboratory protocol, including the necessary replication, the three-color microarray experiment reduces the number of arrays, simplifies the experimental process, and saves rare samples in the study of drug treatment. Previous publications have investigated the use of three-color microarrays, but they mostly deal with quality control or the comparison of slides [5–7]. We open another door to the application of microarrays in drug evaluation. Furthermore, a novel visualization tool, the hexaMplot, is developed to elucidate the full meaning of drug effects in a three-color microarray experiment. All the cases arising in our drug research with three-color microarrays are discussed. The meaning of the hexaMplot is then explained step by step, and the results of its application are shown with our real data.
2. Methodology
In a dual-color cDNA microarray experiment, the measurement from one spot is a pair of intensities related to two different fluorescent dyes. There are three expression patterns: equivalent (G = R), up-regulated (G > R) and down-regulated (G < R) expression. It therefore suffices to analyze the data through the ratio G/R or the log-ratio log(G/R). The A–M plot [8], in which $A_{g/r} = \log_2 \sqrt{RG}$ and $M_{g/r} = \log_2 (G/R)$, provides an effective visualization tool in dual-color microarray analysis. In our three-color experiment, however, three samples labelled with different fluorescent dyes are hybridized to one array: the normal sample is labelled with Cy5 (red), the disease-treated sample with Cy3 (green), and the drug-treated sample with Alexa 594 (blue). Other labelling schemes are of course applicable for other experimental designs. After laser scanning and data processing, three values are obtained from every spot on the array, denoted R, G and B, to represent the corresponding expression levels.
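The dual-color quantities defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `am_values` and the intensity values are hypothetical, and the intensities are taken to be already background-corrected and normalized, which the text does not detail.

```python
import math

def am_values(r, g):
    """Return (A, M) for one spot from its red (R) and green (G) intensities."""
    if r <= 0 or g <= 0:
        raise ValueError("intensities must be positive")
    a = math.log2(math.sqrt(r * g))  # A_{g/r}: average log-intensity
    m = math.log2(g / r)             # M_{g/r}: log-ratio, positive when G > R
    return a, m

# Made-up (R, G) intensities for three spots, for illustration only.
spots = [(1000.0, 1000.0), (500.0, 2000.0), (4000.0, 1000.0)]
for r, g in spots:
    a, m = am_values(r, g)
    label = "G = R" if m == 0 else ("G > R" if m > 0 else "G < R")
    print(f"R={r:.0f} G={g:.0f}  A={a:.2f}  M={m:+.2f}  ({label})")
```

The sign of M alone distinguishes the three dual-color patterns; A only records overall spot brightness.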
With the drug-associated treatment added to the normal and disease-associated treatments in a three-color experiment, there are 13 possible orderings of the three expression levels. All cases are summarized in Fig. 1. To illustrate the 13 cases in an intuitive way, Figs. 2 and 3 are provided. In Fig. 2, we consider the possible drug effects on a gene that is equivalently expressed (EE) between normal and disease (R = G). Drug treatment may move the expression level up or down (R = G < B or R = G > B), or leave it the same as normal (R = G = B). The cases on the left and right of Fig. 2
[Figure 2: three panels, each plotting expression level against the measurement index of one spot for the normal, disease and drug samples.]
Fig. 2. Three possibilities of drug effect on EE genes between normal and disease.
[Figure 3: ten panels in two columns and five rows, each plotting expression level against the measurement index of one spot for the normal, disease and drug samples.]
Fig. 3. Ten possibilities of drug effect on disease-associated DE genes. The left column shows the effects of drugs on up-regulated genes (R < G); the right column shows the effects on down-regulated genes (R > G).
should be avoided in drug development, because there the drug shows a side effect: it makes abnormal the expression of genes that the disease had left at the normal level. In drug treatment, the goal is to remedy the disease-associated differentially expressed (DE) genes (R ≠ G). If a gene is up-regulated with disease (R < G), its drug-treated expression level may decrease as expected (R < B < G, B = R < G or B < R < G), stay abnormal (R < G = B), or unexpectedly deteriorate to an even higher level (R < G < B). The left column of Fig. 3 shows all possible effects
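[Figure 4: three panels, each drawn with $M_{g/r}$ as the horizontal axis and $M_{b/g}$ as the vertical axis; the right panel adds the slant axis $M_{b/r}$.]
Fig. 4. Implication of three axes ($M_{g/r}$, $M_{b/g}$ and $M_{b/r}$) to construct hexaMplot for drug evaluation with three-color microarray experiment.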
[Figure 5: the axes $M_{g/r}$, $M_{b/g}$ and $M_{b/r}$ divide the plane into six regions labelled B>G>R, G>B>R, G>R>B, R>G>B, R>B>G and B>R>G. Legend: R: normal, G: disease, B: drug.]
Fig. 5. Meanings of six regions divided by three axes in hexaMplot to assess drug effect with three-color cDNA microarray experiment.
of drugs on genes up-regulated with disease. Similarly, the right column of Fig. 3 illustrates the drug effects on a gene down-regulated with disease (R > G). In Fig. 3, the trends in the first three rows are preferred in drug treatment, the pattern in the 4th row shows that the drug has no effect on the disease-associated DE genes, and that in the 5th row should be avoided as much as possible.
All in all, analyzing drug effects on disease-associated gene expression in a three-color experiment is complicated and confusing. Therefore a new kind of graph, called the hexaMplot, is proposed to display the gene expression patterns in drug evaluation. The coordinates of the hexaMplot are $M_{g/r} = \log_2 (G/R)$ on the horizontal x-axis and $M_{b/g} = \log_2 (B/G)$ on the vertical y-axis. The difference in gene expression between the disease-treated and normal samples is read from the sign of $M_{g/r}$, as in the left panel of Fig. 4: points with $M_{g/r} > 0$ (the right half-plane) tend to be genes up-regulated with disease, points with $M_{g/r} < 0$ (the left half-plane) tend to be down-regulated, and points with $M_{g/r}$ near 0 are likely EE genes. The sign of $M_{b/g}$ shows the subsequent change of the disease-associated expression level under drug treatment, as in the middle panel of Fig. 4: the half-planes $M_{b/g} > 0$ and $M_{b/g} < 0$ have opposite meanings (expression raised or lowered by the drug relative to the disease-treated sample), and points with $M_{b/g}$ near 0 are genes whose disease-associated expression is not changed by the drug. With $x = M_{g/r}$ and $y = M_{b/g}$, a slant axis $y = -x$ is implied, along which $M_{b/r} = \log_2 (B/R) = M_{g/r} + M_{b/g} = 0$; $M_{b/r}$ measures the difference in expression level between the normal and drug-treated cell states. The right panel of Fig. 4 shows this case. Genes in the upper right half-plane, above the slant axis, tend to be up-regulated after drug treatment in comparison with normal expression, and vice versa in the other half-plane.
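The coordinates just defined, and the ordering labels used for the regions in Fig. 5, can be sketched as follows. This is a minimal illustration, not the authors' software: the names `hexam_point` and `pattern`, the tolerance `tol` for calling two levels equal, and the intensity values are all assumptions made for the example.

```python
import math
from itertools import product

def hexam_point(r, g, b):
    """Return (M_g/r, M_b/g, M_b/r) for one spot."""
    x = math.log2(g / r)   # M_{g/r}: disease vs. normal
    y = math.log2(b / g)   # M_{b/g}: drug vs. disease
    return x, y, x + y     # M_{b/r} = log2(B/R) = x + y; zero on the slant axis

def pattern(r, g, b, tol=0.0):
    """Return the ordering of the three levels, e.g. 'G>B>R' or 'R=G=B'.

    Two neighbouring levels are called equal when their log2 values differ
    by at most tol (a hypothetical threshold, not taken from the paper).
    """
    levels = sorted(
        [("R", math.log2(r)), ("G", math.log2(g)), ("B", math.log2(b))],
        key=lambda item: -item[1],   # largest level first; ties keep R, G, B order
    )
    text = levels[0][0]
    for (_, prev), (name, cur) in zip(levels, levels[1:]):
        text += ("=" if prev - cur <= tol else ">") + name
    return text

# Made-up intensities: a gene up-regulated by disease and partly restored by the drug.
print(hexam_point(1.0, 4.0, 2.0), pattern(1.0, 4.0, 2.0))

# Enumerating all orderings of three levels recovers the 13 patterns of Fig. 1.
all_patterns = {pattern(*levels) for levels in product((1.0, 2.0, 4.0), repeat=3)}
print(len(all_patterns))
```

Because $M_{b/r}$ is just the sum of the two plotted coordinates, the third comparison needs no extra axis, only the line x + y = 0.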
In view of drug evaluation, the desirable genes lie around the slant axis, because their expression is normal after drug treatment whether they are disease-associated EE or DE genes. A pattern in which most points lie around the slant axis therefore indicates a significant therapeutic effect and little side effect of the drug. Taking the hexaMplot as a whole, we now explore the meaning of the six regions divided by the three axes. As shown in Fig. 5, the hexaMplot encodes the relation among the normal, disease and drug expression levels and so allows the drug effect to be assessed. We discuss the six parts clockwise, numbering the quadrants clockwise from the first (upper right) to the fourth (upper left), so that the second quadrant is the lower right one. Points in the first quadrant satisfy B > G > R; that is, drug treatment makes the expression of genes up-regulated with disease keep increasing instead of regressing therapeutically to the normal level. This indicates a side effect of the drug treatment, which should be avoided in drug development. The symmetric pattern appears in the third quadrant, where gene expression satisfies R > G > B. In these two parts, drug treatment does not restore the disease-altered expression to normal but makes it even more extreme. In contrast, the second quadrant contains two regions whose expression patterns show the therapeutic effect of the drug on disease: the expression of genes up-regulated with disease decreases with drug treatment, as expected. Moving clockwise, the pattern changes from G > B > R through the slant axis (G > B = R) to G > R > B; that is, the drug-treated expression level goes from large to small. Relative to the disease up-regulated level, the drug-treated expression first decreases toward the normal level and then falls below it. Because the therapeutic tendency of the drug is realized in both regions, especially around the slant axis, the optimal effect may be obtained by controlling the drug dosage. Symmetrically, the drug also shows a therapeutic effect in the two regions of the fourth quadrant.
The three-color cDNA microarray experiment was conducted in our biomedical laboratory to study the effect of PW-1 (an extract of Chinese medicine) on TCDD-toxified HepG2 cells. The control sample is labelled with Cy5, the TCDD sample with Cy3 and the PW-1 sample with Alexa 594. Three comparisons can thus be made: control vs. TCDD-toxified (disease group), disease group vs. PW-1 treated (drug treatment group), and drug treatment group vs. control, which are equivalent to three dual-color microarray experiments. The three-color microarray experiment is therefore economical and has potential in application and research. After data processing, a hexaMplot of the real data is plotted in Fig. 6. The explanations of gene
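[Figure 6: scatter plot of the experimental data with $M_{g/r}$ on the horizontal axis and $M_{b/g}$ on the vertical axis, both ranging from -3 to 3, together with the slant axis $M_{b/r}$.]
Fig. 6. HexaMplot in our three-color cDNA microarray experiment to study the disease group (TCDD) and drug-treated group (PW-1) on a genomic scale.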
expressions may be given by the corresponding regions in the hexaMplot, and the desirable pattern along the slant axis is observed. The hexaMplot provides an intuitive, understandable, and meaningful tool for assessing the drug effect in a three-color cDNA microarray experiment. The drug shows a therapeutic effect on disease-associated DE genes and little side effect on EE genes in the four regions of the second and fourth quadrants; the side effect is assessed in the first and third quadrants, where the drug-treated expression is made worse instead of regressing to normal.
3. Hypothesis testing
In a hexaMplot, it is desirable to observe on one chip a gene expression pattern along the slant axis $M_{b/r}$ ($y = -x$). So we consider the correlation coefficient between $M_{g/r}$ and $M_{b/g}$ over all N genes,

$$c = \frac{\sum_{i=1}^{N} \left(M^{i}_{g/r} - \bar{M}_{g/r}\right)\left(M^{i}_{b/g} - \bar{M}_{b/g}\right)}{N S_{g/r} S_{b/g}}, \qquad (1)$$

where

$$S_{g/r} = \sqrt{\frac{\sum_{i=1}^{N} \left(M^{i}_{g/r} - \bar{M}_{g/r}\right)^{2}}{N}} \quad \text{and} \quad S_{b/g} = \sqrt{\frac{\sum_{i=1}^{N} \left(M^{i}_{b/g} - \bar{M}_{b/g}\right)^{2}}{N}}.$$

A high negative correlation is expected when the drug has a therapeutic effect. Since the population correlation coefficient $\rho$ cannot be expected to equal $-1$, a hypothesis test on $\rho$ is performed:

$$H_0: \rho \le \rho_0 \quad \text{and} \quad H_1: \rho > \rho_0, \qquad (2)$$

where $\rho_0$ is an empirical value used to assess the drug effect; for example, $\rho_0 = -0.8$ is set in the following analysis. According to Ref. [9], the z-transformation of c,

$$z = \ln \sqrt{\frac{1+c}{1-c}},$$

is approximately normally distributed as N increases, with variance $1/(N-3)$. Therefore, when the null hypothesis (2) is true, the statistic

$$\frac{\ln \sqrt{\frac{1+c}{1-c}} - \ln \sqrt{\frac{1+\rho_0}{1-\rho_0}}}{1/\sqrt{N-3}}$$

is approximately standard normal, and the interval with confidence degree $\alpha$ is calculated as

$$\rho_\alpha \le 1 - \frac{2}{k_0 \exp\left(-2 v_\alpha / \sqrt{N-3}\right) + 1}, \qquad (3)$$

where $k_0 = \frac{1+\rho_0}{1-\rho_0}$ and $v_\alpha$ is the critical value of the standard normal distribution at level $\alpha$, for example $v_{0.05} = 1.96$ or $v_{0.01} = 2.576$. In our three-color experiment there are N = 2024 genes on one chip; we estimate the correlation coefficient $c_{all} = -0.91$ and calculate the confidence interval $\rho_{0.05} \le -0.82$,
thus the null hypothesis cannot be rejected. However, it is known in microarray analysis that the expression of most genes is stable and is not altered by disease- or drug-associated treatment; that is, most points lie near the origin in Fig. 6. We are more concerned with the genes significantly altered by the treatments. It therefore seems reasonable to compute c only from the genes altered by the disease- and drug-associated treatments, ignoring the unchanged genes, so that c carries only the information on the side and therapeutic effects of the drug.
To identify the unchanged genes in the hexaMplot, the corresponding confidence region is computed with the tools of multivariate analysis. The procedure is similar to the "2-fold rule" in two-color microarray experiments. Let $z_i = (M^{i}_{g/r}, M^{i}_{b/g})'$ $(i = 1, \ldots, N)$ be the bivariate sample; the confidence ellipse with confidence degree $\alpha$ is the region

$$\left\{ z_i : (z_i - \bar{m})' S^{-1} (z_i - \bar{m}) \le \chi^{2}_{1,\alpha} \right\}, \qquad (4)$$

in which $\chi^{2}_{1,\alpha}$ is the critical value of the $\chi^2$-distribution with one degree of freedom at cumulative probability $1 - \alpha$, and

$$\bar{m} = \frac{1}{N} \sum_{i} z_i, \qquad S = \frac{1}{N-1} \sum_{i} (z_i - \bar{m})(z_i - \bar{m})'.$$
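The selection rule of Eq. (4), followed by the correlation coefficient of Eq. (1) on the points outside the ellipse, can be sketched in plain Python. This is an illustration under stated assumptions: the function names and the synthetic data are hypothetical, and the critical value is passed in explicitly. The text specifies one degree of freedom (3.841 at the 0.05 level); 5.991, the two-degree-of-freedom value commonly used for bivariate data, could be passed instead.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient c as written in Eq. (1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)

def outside_ellipse(points, crit):
    """Flag points whose quadratic form in Eq. (4) exceeds the critical value."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 sample covariance matrix S (divisor N - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    flags = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # (z - m)' S^{-1} (z - m), using the closed-form inverse of a 2x2 matrix.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        flags.append(d2 > crit)
    return flags

# Synthetic (M_g/r, M_b/g) pairs: unchanged genes near the origin plus two
# strongly altered genes lying on the slant axis y = -x.
near = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)] * 5
far = [(3.0, -3.0), (-3.0, 3.0)]
points = near + far

flags = outside_ellipse(points, crit=3.841)
altered = [p for p, out in zip(points, flags) if out]
c_out = correlation([x for x, _ in altered], [y for _, y in altered])
print(len(altered), c_out)
```

In this toy data only the two altered genes fall outside the ellipse, so the correlation computed from them reflects the drug response alone.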
The ellipse in Fig. 6 is obtained with Eq. (4), and 271 points (marked with + in Fig. 6) lie outside the ellipse. This sample is large enough for the z-transformed correlation coefficient to be approximately normal. With the 271 genes, we compute the correlation coefficient $c_{out} = -0.843$ and the interval $\rho_{0.05} \le -0.839$. Thus we infer that, on the whole, the drug has a therapeutic effect on the disease. On the other hand, with the genes inside the ellipse we calculate $c_{in} = -0.925$, which is larger in magnitude than $c_{all}$ and $c_{out}$. The equivalently expressed genes thus inflate the magnitude of the correlation coefficient while having little to do with the assessment of the drug, so we prefer to base the inference on $c_{out}$.
4. Discussion
Microarrays produce a large amount of information about genome profiles that provides insights into drug evaluation for diseases without defined molecular mechanisms or cellular assays. The main advantages of this technology are the availability of expressed genes in many organisms, the flexibility of arrayed genes, and the low cost of entry. We apply the three-color cDNA microarray experiment, for the first time, to assess the drug effect with a novel hexaMplot. The hexaMplot carries a great deal of meaning for explaining the different expression patterns in response to drug treatments. In drug discovery and development, it is necessary to ensure therapeutic effects and minimize side effects at the stage of animal and human testing. The methodology therefore seems useful for testing and predicting the possible effects of drug treatment on genes. On the other hand, Western medicine, with its glorious history, has developed out of classical physics' ability to dissect the material world, giving rise to clinical sciences such as biochemistry, physiology and pathology. As such, it is clear how to
explore the drug effect based on observation and description. However, it is ambiguous and difficult to explore the specific activities of Chinese herbal medicine because of the complexity of its components and mechanisms. With the recent completion of the human genome sequence and the rise of microarray analysis as a formidable research tool, there is an opportunity to revolutionize medical practice in a way that has not occurred since the genesis of the field. Genetic and biochemical data from microarrays, combined with the full spectrum of clinical tools, will enable precise and profound insights into the molecular basis of mental and physical disorders. Thus, there is a common platform for research on Western and Chinese medicine, including their comparison, evaluation and integration. Microarray technology is still considered to be in its infancy, so many applications remain to be developed. This work is also a first step toward assessing drug effects by hexaMplot with the three-color cDNA microarray experiment. As mentioned in Ref. [7], the inclusion of Alexa 594 as a third dye-label does not cause additional noise or unexpected results in the data. However, the method adds complexity to the data analysis. It is still possible to handle the data with statistical tools for the analysis of multiple experimental factors. Based on the patterns in the hexaMplot, we apply a hypothesis test on the correlation coefficient to assess the drug effect on one chip. The test requires an empirical threshold, and we set $\rho_0 = -0.8$ in this research on Chinese medicine. More methods for analyzing three-color microarray data are expected, so that wider applications of microarray technology can be developed.
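Since the test depends on the empirical threshold $\rho_0$ and on the number of genes N, a short sketch of the bound in Eq. (3) may help show how the two interact. It assumes the negative exponent, which reproduces the values reported in Section 3 (about -0.82 for N = 2024 and -0.839 for N = 271) when $\rho_0 = -0.8$ and $v_{0.05} = 1.96$; the function name `correlation_bound` is hypothetical.

```python
import math

def correlation_bound(rho0, n, v_alpha=1.96):
    """Upper end of the interval in Eq. (3) for threshold rho0 and n genes."""
    k0 = (1.0 + rho0) / (1.0 - rho0)
    return 1.0 - 2.0 / (k0 * math.exp(-2.0 * v_alpha / math.sqrt(n - 3)) + 1.0)

# The two sample sizes used in Section 3: all genes, and genes outside the ellipse.
for n in (2024, 271):
    print(n, round(correlation_bound(-0.8, n), 3))
```

The same bound can be written as tanh(atanh($\rho_0$) - $v_\alpha$/sqrt(N - 3)), which makes clear that it approaches $\rho_0$ from below as N grows.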
References
[1] M.J. Marton, J.L. DeRisi, H.A. Bennett, V.R. Iyer, M.R. Meyer, C.J. Roberts, R. Stoughton, J. Burchard, D. Slade, H. Dai, D.E. Bassett Jr., L.H. Hartwell, P.O. Brown, S.H. Friend, Drug target validation and identification of secondary drug target effects using DNA microarrays, Nature Medicine 4 (1998) 1293–1301.
[2] C. Debouck, P.N. Goodfellow, DNA microarrays in drug discovery and development, Nature Genetics 21 (1999) 48–50.
[3] Y.J. Cho, J.D. Meade, J.C. Walden, X. Chen, Z. Guo, P. Liang, Multicolor fluorescent differential display, Biotechniques 30 (2001) 562–572.
[4] G.T. Tsangaris, A. Botsonis, I. Politis, F.T. Stathopoulou, Evaluation of cadmium-induced transcriptome alterations by three color cDNA labeling microarray analysis on a T-cell line, Toxicology 178 (2002) 135–160.
[5] M.J. Hessner, X. Wang, K. Hulse, L. Meyer, Y. Wu, S. Nye, S.W. Guo, S. Ghosh, Three color cDNA microarrays: quantitative assessment through the use of fluorescein-labeled probes, Nucleic Acids Res. 31 (2003) e14.
[6] M.J. Hessner, X. Wang, S. Khan, L. Meyer, M. Schlicht, J. Tackes, M.W. Datta, H.J. Jacob, S. Ghosh, Use of a three-color cDNA microarray platform to measure and control support-bound probe for improved data quality and reproducibility, Nucleic Acids Res. 31 (2003) e60.
[7] T. Forster, Y. Costa, D. Roy, H.J. Cooke, K. Maratou, Triple-target microarray experiments: a novel experimental strategy, BMC Genomics 5 (13) (2004).
[8] Y.H. Yang, S. Dudoit, P. Luu, D.M. Lin, V. Peng, J. Ngai, T.P. Speed, Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation, Nucleic Acids Res. 30 (2002) e15.
[9] D.D. Mari, S. Kotz, Correlation and Dependence, Imperial College Press, London, 2001.