Clinical Immunology 112 (2004) 231 – 234 www.elsevier.com/locate/yclim
Rapid Communication
Gene expression profiling in peripheral blood mononuclear cells from lupus patients with active and inactive disease Violeta Rus, a,* Hegang Chen, b Valentina Zernetkina, a Laurence S. Magder, b Susan Mathai, a Marc C. Hochberg, a and Charles S. Via a a
Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA b Division of Biostatistics and Bioinformatics Department of Epidemiology and Preventive Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA Received 19 May 2004; accepted with revision 4 June 2004
Abstract Systemic lupus erythematosus (SLE) is characterized by periods of flare and remission. The search for parameters associated with disease activity has been an area of intense investigation. To identify genes that best differentiate patients with active from those with inactive disease, the expression pattern of 375 genes was analyzed in peripheral blood mononuclear cells (PBMC) from 12 patients with active and 14 patients with inactive disease. Using the ‘‘nearest shrunken centroids’’ method, 29 genes were found to best discriminate the two groups. Among these genes, 14 were upregulated and 15 were downregulated in patients with active compared to those with inactive disease. Fourteen of these genes also correlated with SELENA-SLEDAI with correlation coefficients ranging from 0.4 to 0.7. Most of these genes have not been previously associated with disease activity and belong to a variety of families such as adhesion molecules, proteases, TNF superfamily, and neurotrophic factors. Using a cross-validation method, the error rate for classifying samples in the two groups was 30%. These results highlight the potential use of microarray data in identifying genes associated with disease activity in SLE, which could become potential biomarkers or future therapeutic targets. D 2004 Elsevier Inc. All rights reserved. Keywords: Systemic lupus eythematosus; Disease activity; cDNA array
Introduction The search for parameters associated with disease activity in SLE has been an area of intense investigation. The value of testing one or more parameters at a time in predicting current or subsequent disease activity is still controversial [1]. Gene expression profiling may be more successful in this quest than previously established methods as simultaneous detection of expression levels of multiple genes may perform better than the behavior of one or a few biomarkers. A growing body of research has suggested the ability of gene expression profiling to distinguish known or previously unrecognized subclasses of various diseases or * Corresponding author. Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Maryland at Baltimore, MSTF Building, Room 8-34, 10 S Pine Street, Baltimore, MD 21201. Fax: +1-410-706-0231. E-mail address:
[email protected] (V. Rus). 1521-6616/$ - see front matter D 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.clim.2004.06.005
predict clinical outcomes such as survival and response to treatment. Using microarray methodology, several groups, including ours, have reported differentially expressed genes in PBMC from patients with SLE compared to controls (reviewed in Ref. [2]). In particular, the expression of type I and to a lesser extent of type II interferon-regulated genes was reported in samples from SLE patients [3– 5]. Furthermore, significant correlations between the expression of interferon-regulated genes and the number of SLE criteria [3] or disease activity [4] was reported. To assess the feasibility of microarray-based prediction of disease activity in SLE and identify genes that best discriminate between active and inactive disease, we analyzed the transcription profile of 375 known genes of potential relevance to the pathogenesis of SLE in PBMC from patients with active and inactive disease. We used the ‘‘nearest shrunken centroids’’ method, one of the approaches devised for class prediction from gene expression profiling data [6]. The ability of this method to select the smallest set of genes that best discrim-
232
V. Rus et al. / Clinical Immunology 112 (2004) 231–234
inates between classes might be an important advantage in the search for diagnostic or predictive markers.
Materials and methods Patients Twenty-six patients fulfilling the American College of Rheumatology classification criteria for SLE [7] were studied. Disease activity in SLE patients was determined using SELENA-SLEDAI score [8]. Twelve patients had clinically quiescent disease (SLEDAI < 4) and 14 had moderate – severe disease (SLEDAI z 4) at the time of the examination. The protocol for the study was approved by the Internal Review Board of the University of Maryland School of Medicine. Written informed consent was obtained from all participating patients and controls. PBMC preparation and cDNA hybridization PBMC were purified from heparinized venous blood by density-gradient centrifugation over Ficoll-Hipaque. Total RNA was generated from samples using RNA-Stat (TelTest, Friendswood, TX). RNA integrity was checked on an agarose gel. Two Ag of tRNA were reverse transcribed into cDNA using [32P] labelled dCTP (NEN Life Science Products Inc. of Boston, MA). Each cDNA probe was hybridized overnight with the Panorama Cytokine Gene Array membranes (Sigma Genosys, Inc, Woodland, TX) which are spotted with cDNAs for 375 known genes in duplicates. The complete list of genes with accession numbers is published at http://www.sigmagenosys.com. Developed membranes were washed and imaged on a Molecular Dynamics Storm Phosphoimager (Molecular Dynamics, Sunnyvale, CA) after overnight exposure to PhosphorImager screens. Digital images were aligned to a template provided by the manufacturer and hybridization intensities for each spot were measured using Image Quant software (Molecular Dynamics). Numeric spot density data were exported into Microsoft Excel for further processing. Data analysis Duplicate spots for each gene were averaged. To remove systematic effects due to nonbiological sources of variation between arrays, log transformed data were normalized using an ANOVA model [9]. Prediction Analysis for Microarrays (PAM) was used to perform the ‘‘nearest shrunken centroids’’ analysis [6]. The software and a full description of the methodology can be accessed at http:// www-stat.stanford.edu/~tibs/PAM/Rdist/. The misclassification rate of the gene classifier was estimated using the 10-fold resampling cross-validation method implemented in PAM. Briefly, samples were randomly divided in 10 approximately equal parts. The program then built a gene
classifier using training sets containing 90% of the samples and the error rates were computed on validation sets that contain 10% of the samples. A resampling procedure was repeated 10 times and error rates were averaged. The gene set and misclassification rate thus identified was compared with the set of differentially expressed genes selected by Wilcoxon rank-sum test (P < 0.05) and the error rate of discriminant analysis performed using Fisher’s linear discriminant function. In addition, for each gene, the Pearson correlation coefficient between SELENA-SLEDAI score and gene expression was calculated.
Results and discussion Demographic and clinical characteristics of the patients are presented in Table 1. Inactive patients (n = 12) had clinically quiescent disease with or without serological activity (low complement and/or increased anti ds-DNA antibody) and a mean of SELENA-SLEDAI of 1 (range 0 – 3). Patients with active disease (n = 14) had moderate to severe manifestations involving the renal (n = 7), musculoskeletal (n = 4), mucocutaneous system (n = 5), CNS (n = 1), and vasculitis (n = 1). Mean SELENA-SLEDAI for this group was 8 (range 4– 16). Using the ‘‘nearest shrunken centroids’’ method, a set of 29 genes was found to best discriminate active from inactive patients (Fig. 1). Comparing these genes to 37 differentially expressed genes that were identified by Wilcoxon rank-sum test, 24 out of 29 (83%) were identified by both methods. Furthermore, 14 of the 29 genes were also highly correlated with SELENA-SLEDAI with correlation coefficients ranging from 0.4 to 0.7. Table 1 Demographic characteristics and clinical manifestations of SLE patients with active and inactive disease Characteristic
Inactive SLE (n = 12)
Active SLE (n = 14)
No female/no male Age mean (range) years Race Caucasian African-American Asian SLEDAI mean (range) Disease manifestations None Visual disturbance Vasculitis Arthritis Hematuria Proteinuria Pyuria Alopecia Mucosal ulcers Skin rash Low complement Increased DNA Leukopenia
12/0 34.4 (21 – 53)
14/0 37.6 (22 – 53)
3 9 0 1 (0 – 3)
1 12 1 8 (4 – 17)
6 0 0 0 0 0 0 0 0 0 2 4 0
0 1 1 4 4 6 1 1 1 3 10 5 2
V. Rus et al. / Clinical Immunology 112 (2004) 231–234
233
Fig. 1. Genes identified by the ‘‘nearest shrunken centroids’’ method that best discriminate samples of active from inactive SLE patients. Data are presented in a matrix format with each row representing a single gene and each column representing a single sample. Genes are arranged according to hierarchical cluster analysis [16]. Red indicates transcript levels greater than the mean and green represents genes expressed at lower levels compared to the mean. Horizontal partition divides genes with higher expression from those with lower expression in patients with active versus inactive disease. * indicates genes that correlated with SELENA-SLEDAI by Pearson correlation (P < 0.05).
From the 29 identified genes, 14 were upregulated and 25 were downregulated in the group with active compared to inactive disease. The identified genes belong to multiple families but none of them was predominantly represented among the upregulated or downregulated genes. Not surprisingly, functionally related genes such as TIMP-3 and TIMP-4 from the protease family, P- and R-Cadherin from the adhesion molecule family, NT-4 and NT-3 from the neurotrophic factors family and the costimulatory molecules CD40 and CD27 were coexpressed across samples. Of particular interest is the coexpression of TNFa converting enzyme (TACE) and TNF RII. These genes were highly correlated with each other (r = 0.53) and with SELENA-SLEDAI (r = 0.68 for TACE and r = 0.4 for TNF RII). This observation is consistent with the known physiologic link between the two genes. TACE has been implicated in the cleavage of transmembrane pro-TNFa and shedding of several receptors and proteins such as TNF RI and II, IL-6R, L-selectin, and FasL [10,11]. Elevated levels of these proteins have been reported in serum of patients with active SLE [12 –14]. The cleavage of several receptors and/or ligands by metalloproteases might reflect a downregulatory pathway that reduces the number of surface receptors/ligand interactions in active disease by producing a pool of soluble receptors competing for ligand with membrane bound receptors. Given clinical consequences of misclassifications, providing reliable estimates of the accuracy of gene classifiers
is extremely important. Using a 10-fold cross-validation method implemented in PAM, the error rate for classifying samples in the two groups was 30%. Although the misclassification rate of Fisher’s linear discriminant function approach was 0%, the error rate obtained by the crossvalidation method represents a more reliable estimate of the generalization error (error rate of classification for a new sample). Performance assessments obtained using the same data set to build the gene classifier and to assess its performance, as in Fisher’s linear discriminant function approach, can grossly overestimate the classification accuracy [15]. The relatively high error rate might be due to the limited number of samples analyzed and the relatively small number of genes on the array used in this study that might not reflect the complexity of regulation or the predominant involvement of a specific pathway. In addition, heterogeneity of clinical manifestations and possibly subtle biological differences between the two subsets of patients may preclude an accurate distinction by microarray expression measures. In conclusion, using the ‘‘nearest shrunken centroids’’ approach, we identified a set of genes in PBMC from lupus patients that distinguishes patients with active from those with inactive disease with an accuracy of 70%. Identifying such genes is of considerable interest for understanding pathways relevant to disease activity in SLE. However, larger and more elaborate studies will be required to
234
V. Rus et al. / Clinical Immunology 112 (2004) 231–234
determine the optimal method for selecting a more accurate set of predictor genes that can provide clinically valuable input. Acknowledgments Supported by grants from the NIH (K23 AR02135-01) and by the Arthritis Foundation Investigator Award. References [1] J.M. Esdaile, M. Abrahamowicz, L. Joseph, T. MacKenzie, Y. Li, D. Danoff, Arthritis Rheum. 39 (1996) 370 – 378. [2] M.K. Crow, Arthritis Rheum. 48 (2003) 2396 – 2401. [3] E.C. Baechler, F.M. Batliwalla, G. Karypis, P.M. Gaffney, W.A. Ortmann, K.J. Espe, K.B. Shark, W.J. Grande, K.M. Hughes, V. Kapur, P.K. Gregersen, T.W. Behrens, Proc. Natl. Acad. Sci. U. S. A. 100 (2003) 2610 – 2615. [4] L. Bennett, A.K. Palucka, E. Arce, V. Cantrell, J. Borvak, J. Banchereau, V. Pascual, J. Exp. Med. 197 (2003) 711 – 723. [5] M.K. Crow, S. George, S.A. Paget, N. Ly, R. Woodwarf, K. Fry, A. Chan, J. Prentice, J. Wohlgemuth, Arthritis Rheum. 46 (2002) S281A.
[6] R. Tibshirani, T. Hastie, B. Narasimhan, G. Chu, Proc. Natl. Acad. Sci. U. S. A. 99 (2002) 6567 – 6572. [7] E.M. Tan, A.S. Cohen, J.F. Fries, A.T. Masi, D.J. McShane, N.F. Rothfield, J.G. Schaller, N. Talal, R.J. Winchester, Arthritis Rheum. 25 (1982) 1271 – 1277. [8] C. Bombardier, D.D. Gladman, M.B. Urowitz, D. Caron, C.H. Chang, Arthritis Rheum. 35 (1992) 630 – 640. [9] R.D. Wolfinger, G. Gibson, E.D. Wolfinger, L. Bennett, H. Hamadeh, P. Bushel, C. Afshari, R.S. Paules, J. Comput. Biol. 8 (2001) 625 – 637. [10] E. Rovida, A. Paccagnini, M. Del Rosso, J. Peschon, P. Dello Sbarba, J. Immunol. 166 (2001) 1583 – 1589. [11] R.C. Newton, K.A. Solomon, M.B. Covington, C.P. Decicco, P.J. Haley, S.M. Friedman, K. Vaddi, Ann. Rheum. Dis. 60 (Suppl. 3) (2001) iii25 – iii32. [12] A. Studnicka-Benke, G. Steiner, P. Petera, J.S. Smolen, Br. J. Rheumatol. 35 (1996) 1067 – 1074. [13] J. Font, P. Pizcueta, M. Ramos-Casals, R. Cervera, M. Garcia-Carrasco, M. Navarro, M. Ingelmo, P. Engel, Clin. Exp. Immunol. 119 (2000) 169 – 174. [14] K. Nozawa, N. Kayagaki, Y. Tokano, H. Yagita, K. Okumura, H. Hasimoto, Arthritis Rheum. 40 (1997) 1126 – 1129. [15] S. Dudoit, J. Fridlyand, in: D.P. Berrar, W. Dubitzky, M. Graznow (Eds.), Understanding and Using Microarrays Analysis Techniques: A Practical Guide, Kluwer Academic Press, Boston, 2002, pp. 93 – 158. [16] M.B. Eisen, P.T. Spellman, P.O. Brown, D. Botstein, Proc. Natl. Acad. Sci. U. S. A. 95 (1998) 14863 – 14888.