A toolbox of novel murine house-keeping genes identified by meta-analysis of large scale gene expression profiles


Biochimica et Biophysica Acta 1779 (2008) 830–837


Markus Frericks, Charlotte Esser ⁎

Institut für Umweltmedizinische Forschung (IUF) at the Heinrich Heine-University of Düsseldorf, Division of Molecular Immunology, Auf'm Hennekamp 50, 40225 Düsseldorf, Germany

Article info

Article history: Received 23 May 2008; received in revised form 12 August 2008; accepted 19 August 2008; available online 27 August 2008

Keywords: Genomics; House-keeping gene; Reference gene; Gapdh; Hprt; Mouse

Abstract

Quantitative and semi-quantitative analysis of gene transcripts requires normalization to RNA-input and/or invariantly expressed house-keeping genes. Currently, only a limited choice of reference genes exists, such as GAPDH¹, β-actin, or HPRT, whose transcription levels may be less stable than previously thought. We used the meta-database NC-GED, which we had derived from 1968 published murine expression profiles, to identify genes with (i) low inter-tissue expression variability and (ii) great stability over 312 conditions, such as experimental drug treatment, age or differentiation. We identified 276 novel genes with “house-keeping” characteristics, including many genes for ribosomal proteins, and aryl-hydrocarbon receptor-interacting protein. Most genes yielded medium to strong fluorescence intensity on the arrays, a relative measure for their cellular expression. We validated the invariant expression levels of eight of the house-keeper candidates in lymph nodes, thymus, liver, kidney and brain of four different mouse strains. In addition, comparative analysis showed the superiority of multiple over single standardization. Caution against established reference genes is justified. The new panel of reference genes is useful for a flexible selection of reference genes in gene expression studies. © 2008 Elsevier B.V. All rights reserved.

Abbreviations: HKG, house-keeping gene; GEO, gene expression omnibus; MSI, median signal intensity for all arrays; MG-U74av2, murine genome U74 a version 2 gene chip
⁎ Corresponding author. Tel.: +49 211 3389253; fax: +49 211 3190910. E-mail address: [email protected] (C. Esser).
¹ Abbreviations of genes are according to the official nomenclature for Mus musculus provided in Medline.
1874-9399/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.bbagrm.2008.08.007

1. Introduction

After completion of the Human and Murine Genome Projects, the next challenge is understanding the biological roles and inter-relations of genes [1,2]. Methods used include global transcription profiling by microarrays, and (semi-) quantitative mRNA studies. Analysis of global gene expression and of conditionally changed expression patterns can describe parameters of diseased versus physiological state and elucidate cause–effect contexts. All methods used to quantify transcript/protein abundance require correction for sample deviation [3]. In many cases, normalization is done against 18S ribosomal RNA or housekeeping genes (HKGs) [4–6]. By classical definition, HKGs are considered to be of essential importance for cellular metabolism, and are thus presumably always expressed. For practical use as reference genes, an additional requirement is a constant expression level, irrespective of (i) tissue/cell type, (ii) physiological stage, (iii) exogenous conditions, (iv) single gene deficiencies, e.g. by “knock-out”, or (v) disease status of the organism/cell. For most of the reference genes currently used, such ubiquitous and uniform expression has not been formally proven. Moreover, recent studies demonstrated high variability of the expression levels of HKGs under different experimental conditions [5,7–11]. Testing each reference gene in each given experimental setting is not feasible, and therefore employing several reference genes has been proposed [12].

Microarrays measure the transcript abundance for thousands of genes in parallel. Independent microarray analyses of tissue samples from normal healthy donors identified ubiquitously expressed genes, while showing expression variance of classical house-keeping genes. Linear regression analysis over large data sets identified genes with similar expression in various species [13]. However, global gene expression analysis under pathological conditions or various experimental settings is lacking. To identify novel HKGs, we data-mined 1968 expression profiles, originally available via public databases (GEO) [14–16], and transformed for inclusion into our database NC-GED. NC-GED contains more than 100 tissue/cell types and 312 conditions of differential gene expression, and permits direct comparability of transcript abundance [17].

2. Materials and methods

2.1. Animals

Two inbred mouse strains (C57BL/6J and Balb/cJ), one outbred line (SKH-1), and the gene-targeted mutant strain CSB were kept in the IUF's animal facility under standard conditions. Lymph nodes, thymus, liver, kidney and brain tissue from 8–12 week old female mice were used.

2.2. Gene expression analysis

Organs from three mice were isolated, and RNAs prepared separately for each mouse (n = 3). Total RNA was isolated with TRIzol™ (Invitrogen, Karlsruhe, Germany) and cDNA was synthesized. Selected stably expressed genes were measured by RT-PCR: Aip (aryl-hydrocarbon receptor-interacting protein); Cxxc1 (CXXC finger 1 PHD domain; alternatively called CpG binding protein); Efna3 (ephrin A3); Epn1 (epsin 1); Hgs (HGF-regulated tyrosine kinase substrate); Hprt (hypoxanthine guanine phosphoribosyl transferase); Mrpl48 (mitochondrial ribosomal protein L48); Mtcp1 (mature T-cell proliferation 1); Rps6 (ribosomal protein S6); Tsc1 (tuberous sclerosis 1). For primer sequences and their chromosomal locations see Supplementary Table S1. Primers were chosen to have versatile PCR temperatures between 54 and 62 °C. Quantitative PCR reactions were done with the Quantitect Sybr Green Kit (Qiagen, Hilden, Germany) in a Rotor-Gene 3000 thermo cycler (LTF Labortechnik, Wasserburg, Germany). PCR conditions were 15 min at 94 °C, followed by 45 cycles of 20 s at 94 °C, 15 s at 55 °C and 20 s at 72 °C, and a final 3 min at 72 °C.

2.3. Database NC-GED

Details of the generation of NC-GED were described previously [17]. Briefly, 1968 microarrays, all from the MG-U74aV2 Affymetrix™ platform, were downloaded from GEO, normalized and stored in tabular format [17]. For each array the MIAME description was stored independently. Data sets were normalized using a linear scaling procedure based on the median signal intensity. The processed data were used to calculate the median signal intensity for all arrays (MSI), and expression profiles were log2 transformed for further analysis. NC-GED contains the data from 312 conditions of differential gene expression by, e.g., drug treatments, sex, gene targeting, developmental stages or age differences. Note that in NC-GED expression levels are only directly comparable within a gene, but not between genes. This is due to the hybridization dynamics, which differ for each gene sequence spotted on the chip; thus, a low-abundance transcript may hybridize better, giving a stronger fluorescence signal, while a highly abundant transcript may hybridize less well, giving a low fluorescence signal. In general, a higher fluorescence signal points to higher abundance of the respective transcript. NC-GED was connected with the statistical data analysis tool R 2.2 for data mining.

2.4. Identification of house-keeping genes

For each gene, the standard deviation (SD) of the mean expression level over all 1968 normalized arrays in NC-GED and the coefficient of variation (CV) were calculated. Genes with SD < 1 and CV ≤ 0.1 were considered to be stably expressed across all tissues/cell types. As cut-off level for differential gene expression, a threshold of >2-fold was used.

2.5. Gene ontology analysis

Gene ontology analysis was performed using the GOToolBox (http://crf.univ-mrs.fr/GOToolBox/index.php) with default settings, including a hypergeometric test and Bonferroni post-test to correct for multiple testing.
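The stability filter of Section 2.4 can be illustrated with a minimal Python sketch. It is not the R code used with NC-GED; the function name, the toy gene names and the toy values are hypothetical, and the use of the sample standard deviation on log2 values is an assumption.

```python
import statistics

def stable_genes(log2_expr, sd_max=1.0, cv_max=0.1):
    """Return {gene: (mean, sd, cv)} for genes with SD < sd_max and CV <= cv_max.

    log2_expr maps a gene name to its log2 expression values across arrays.
    """
    selected = {}
    for gene, values in log2_expr.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)   # sample SD; an assumption of this sketch
        cv = sd / mean                  # coefficient of variation = SD / mean
        if sd < sd_max and cv <= cv_max:
            selected[gene] = (mean, sd, cv)
    return selected

# Hypothetical toy data: three genes measured on five arrays (log2 scale).
toy = {
    "gene_stable":   [9.0, 9.2, 8.9, 9.1, 9.0],
    "gene_variable": [6.0, 9.5, 12.0, 7.5, 10.0],
    "gene_low":      [2.0, 2.5, 1.8, 2.4, 2.1],
}
hits = stable_genes(toy)
print(sorted(hits))
```

In this toy set, gene_low has a small SD but a CV above 0.1 because its mean is low, which shows why the two criteria are applied together.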

Fig. 1. Heat map of the expression of 33 house-keeping genes in 312 conditions is shown. Data are derived from the meta-database NC-GED as described in Materials and methods. Conditions included drug regimens, pathological situations, influence of unrelated gene deficiencies in “knock-out” mice, gender, age etc. For each gene, the number of cases where the gene is either up- or down-regulated is shown as grey (up) or black (down) bars, and numbers (up, down, not, total).
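The per-gene tallies shown in Fig. 1 (up, down, not, total) could be derived as sketched below. This is an illustration only: it assumes that each condition is available as a log2 ratio of treated versus control expression, and the function name and example values are hypothetical.

```python
import math

def tally_conditions(log2_ratios, fold_cutoff=2.0):
    """Count conditions in which a gene is up-regulated, down-regulated or unchanged.

    A condition counts as differential when the change exceeds fold_cutoff,
    i.e. when |log2 ratio| > log2(fold_cutoff).
    """
    limit = math.log2(fold_cutoff)
    up = sum(1 for r in log2_ratios if r > limit)
    down = sum(1 for r in log2_ratios if r < -limit)
    total = len(log2_ratios)
    return {"up": up, "down": down, "not": total - up - down, "total": total}

# Hypothetical log2 ratios of one gene in six conditions.
counts = tally_conditions([0.2, 1.4, -1.1, 0.0, -0.3, 2.0])
print(counts)
```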


2.6. Statistical analysis

Statistical analysis was performed with GraphPad Prism 5.0. For the comparison between tissues within a single mouse strain, one-way ANOVA with Tukey's post hoc test was performed. Between mouse strains, two-way ANOVA followed by Tukey's post hoc test was performed.

Table 1 Gene ontology analysis of the putative 276 reference genes

Gene ontology term                 Number in setᵃ   p-value
Primary metabolic process          95               0.002
Macromolecule metabolic process    85               0.004
Cellular metabolic process         94               0.004
Metabolic process                  99               0.022
Transport                          44               0.036
Establishment of localization      44               (0.059)

The 276 transcripts with SD < 1 and CV ≤ 0.1 were chosen (see Fig. 2). This data set was analyzed for correspondence to biological processes using the GOToolBox in default settings with Bonferroni testing for false discoveries. p-values were calculated by comparing the average number of genes in the category among all known genes to that in the data set. A low p-value thus indicates significant over-representation of the genes in the data set in this category.
ᵃ Number of unique transcripts within the indicated gene ontology category.

3. Results

3.1. Analysis of “classical” house-keeping genes for condition dependent variation

Only a few reference genes (i.e. HKGs) are widely used in basic and clinical research [18,19]. For 33 HKGs taken from the literature, we performed a NC-GED database analysis of the median expression strength and differential expression under the 312 conditions contained in NC-GED (Fig. 1). Only five genes varied in less than 5% (15/312) of conditions; these were c-abl oncogene 1 (Abl1), tubulin, alpha 8 (Tuba8), aryl-hydrocarbon receptor-interacting protein (Aip), fibroblast growth factor (acidic) intracellular binding protein (Fibp), and ribosomal protein L19 (Rpl19). Twenty-five of the 33 genes were differentially expressed in up to 20% of conditions. Expression of β-actin, a widely used house-keeping gene, was even less consistent (68/312). Low SD roughly correlated with low variance by change of condition.

3.2. Identification of novel house-keeping gene candidates

We plotted expression strength versus standard deviation, and CV versus the mean of expression, for all 12,468 genes present on the 1968 arrays in NC-GED (see Fig. 2). By definition, HKG expression should be invariant across all conditions. As shown in Fig. 2a, expression strength and variance did not correlate uniformly. Rather, transcripts fell into several categories based on their expression strength, which we classified as low (<6), medium (6 < x < 12) and high (>12), and their variance, which we classified as low (SD < 1), medium (1 < SD < 2) and high (SD > 2). A majority showed medium to high variance across tissues, and medium expression levels (see the note of caution in Materials and methods regarding inter-gene comparison). Few transcripts had high variance or above average expression strengths. Because of the known strong correlation between the number of tissues in which a given gene is expressed and its average expression level [20], the SD may be higher in genes with higher average expression. The coefficient of variation (CV) can be used additionally to reveal genes with steady-across-tissue variation. We calculated the CV for each gene, and only considered genes with a CV ≤ 0.1 further. 276 transcripts were classified as being of low variance (SD ≤ 1 and CV ≤ 0.1); only four of these were very abundant (log2 of expression > 12): valosin containing protein (Vcp), SAR1 gene homolog A (Sara1), epsin 1 (Epn1), and nodal modulator 1 (Nomo1). A full list of these 276 genes and the number of conditions in which their expression varied is available as Supplementary Table S2. On average, the 276 genes were differentially expressed in only 10/312 conditions. A gene ontology analysis identified associated biological processes. The 276 genes encompass 274 non-redundant gene identification numbers, 158 annotated and 116 not annotated. Many of the 276 genes are involved in primary biological processes, such as primary metabolism or transport processes (Table 1). No unique and special biological functions were over-represented in the data set.

Fig. 2. Signal intensity and variance of all transcripts analyzed. (A) For each transcript on the array the median signal intensity over all transcription profiles and the corresponding standard deviation was calculated. The transcripts were grouped according to their expression level and variability. For further analysis two groups were chosen: (i) transcripts with a standard deviation < 1 (below black line), which have been classified as potential reference genes, and (ii) the 100 transcripts showing the strongest expression (see details in Table 3). (B) Coefficient of variation (i.e. the ratio of SD to mean), revealing potential “steady-across-tissues” expression levels.

3.3. RT-PCR analysis of potential new house-keeping genes

RT-PCR for a selection of eight of the 276 potential house-keeping genes validated their abundance and variance of expression in five different tissues (lymph nodes, thymus, liver, kidney and brain) of four genetically different mouse strains. Validated genes covered different biological processes and different levels of expression (8.7 to 11.84). Two “classical” reference genes were analyzed as well (Hprt and Rps6). Their SDs were 1.35 and 1.08, respectively. Hprt was differentially expressed in 18/312 of the conditions stored in NC-GED (Fig. 1), Rps6 in 12/312. Rps6 belongs to the 100 most highly expressed genes in NC-GED (log2 = 13.97; see Table 3). All sample genes were easily detectable in three independent RNA preparations. When gene expression was determined against the mean expression of Hprt only (as commonly done), standard deviations were higher than when it was determined against the mean of five reference genes (Cxxc1, Hprt, Mrpl48, Mtcp1 and Rps6) (Figs. 3a and b). For the latter analysis, RNA expression differences greater than two-fold were recorded and tested for significance. Table 2 summarizes the number of cases in which expression levels differed significantly between mouse strains or tissues for the eight genes. A small number indicates uniform expression across tissues and genetic background, i.e., suitability as a reference gene. Two of the proposed reference genes, Aip and Cxxc1, were uniformly expressed in all tissues and mouse strains. These were closely followed by Mrpl48, with changes in only two tested scenarios. Hgs and Efna3 expression differed too much between tissues and mouse strains for these genes to be considered reliable HKGs. All other tested transcripts showed some variability of less than 20%. Note that some genes are expressed very uniformly across tissues in a given mouse strain, but differ strongly in the same tissue isolated from two mouse genotypes (e.g. Mtcp1, Efna3).

Fig. 3. Validation of putative house-keeping genes by qRT-PCR. RNA from lymph nodes, thymus, liver, kidney and spleen from 6–8 week female mice of four different mouse strains, namely C57BL/6, BALB/c, SKH1 and CSB, was isolated and used for qRT-PCR. Parameters were chosen to reflect a high degree of genetic variability and expected gene expression differences. The expression strength was normalized against either (A) HPRT alone or (B) the mean expression level of five genes (Cxxc1, Hprt, Mrpl48, Mtcp1, and Rps6) to reduce the effect of outliers. Data were analyzed by one-way or two-way ANOVA; for a summary of results see Table 2.
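The effect of normalizing against a panel rather than a single reference gene can be illustrated with a small Python sketch. All crossing points below are invented for illustration, and averaging the crossing points of the reference genes is an assumption about how the "mean expression level" might be formed; the sketch is not the analysis behind Fig. 3.

```python
import statistics

def delta_ct(target_ct, reference_cts):
    """Crossing point of the target minus the mean crossing point of the references."""
    return target_ct - statistics.mean(reference_cts)

# Hypothetical crossing points of one target gene in four samples.
target = [24.0, 24.1, 23.9, 24.0]

# Hypothetical crossing points of five reference genes in the same samples.
refs = {
    "Hprt":   [22.0, 23.5, 21.8, 22.1],   # deviates in the second sample
    "Cxxc1":  [25.0, 25.1, 24.9, 25.0],
    "Mrpl48": [26.0, 26.0, 25.9, 26.1],
    "Mtcp1":  [27.0, 27.1, 27.0, 26.9],
    "Rps6":   [18.0, 18.1, 17.9, 18.0],
}

# Normalization against Hprt alone versus against the whole panel.
single = [delta_ct(t, [refs["Hprt"][i]]) for i, t in enumerate(target)]
panel = [delta_ct(t, [cts[i] for cts in refs.values()]) for i, t in enumerate(target)]

sd_single = statistics.stdev(single)
sd_panel = statistics.stdev(panel)
print(f"SD with Hprt only: {sd_single:.2f}; SD with five-gene panel: {sd_panel:.2f}")
```

With these toy values, the single deviating Hprt measurement inflates the spread of the normalized target, while the panel mean dampens it.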


Table 2 Experimental determination of RNA variance across tissues or mouse strains for putative house-keeping genes (selected from Fig. 2)

Left part. Constant: tissue; variable: genetic background. Number of combinations compared: 30 (5 tissues in 6 pairs of genotypes).

Geneᵃ     Skh1↔B6   Skh1↔CSB   Skh1↔Balb-c   B6↔CSB   B6↔Balb-c   CSB↔Balb-c   Total
Aipᵇ      0         0          0             0        0           0            0
Cxxc1     0         0          0             0        0           0            0
Mrpl48    1⁎ᶜ       0          0             0        1⁎          0            2⁎
Epn1      0         0          0             0        0           0            0
Rps6      0         0          0             0        0           0            0
Mtcp1     0         0          0             0        0           0            0
Tsc1      1⁎        1⁎         0             1⁎       2⁎          0            5⁎
Hprt      0         0          0             0        0           0            0
Efna3     2⁎        0          0             0        0           0            2⁎
Hgs       5⁎        5⁎         0             1⁎       5⁎          5⁎           21⁎

Right part. Constant: genetic background; variable: tissue. Number of combinations compared: 20 (4 genotypes for 5 tissues). Sum total: 50.

Gene      Skh1   C57BL/6J   CSB   Balb-c   Total   Sum total
Aip       0      0          0     0        0       0
Cxxc1     0      0          0     0        0       0
Mrpl48    0      0          0     0        0       2⁎
Epn1      0      4⁎         0     0        4⁎      4⁎
Rps6      2⁎     3⁎         0     0        5⁎      5⁎
Mtcp1     3⁎     0          3⁎    0        6⁎      6⁎
Tsc1      1⁎     0          0     0        1⁎      6⁎
Hprt      3⁎     0          0     5⁎       8⁎      8⁎
Efna3     4⁎     4⁎         5⁎    1⁎       14⁎     16⁎
Hgs       5⁎     0          0     3⁎       8⁎      29⁎

ᵃ Quantitative real-time RT-PCR for 10 genes was performed from isolated RNA of 5 tissues (liver, thymus, kidney, brain, lymph nodes) in 4 mouse strains (SKH1, C57BL/6, CSB, BALB/c). The expression level was normalized against the mean expression level of five house-keeping genes (Cxxc1, Hprt, Mrpl48, Mtcp1 and Rps6). Left part of table: differences in RNA expression levels between two mouse strains (all combinations) for a given tissue. The total number of possibilities is 30. Right part of table: differences in expression levels in five tissues of the same mouse strain. The number of possibilities is 20. The number of cases where crossing points differed significantly by at least 1 (i.e. an RNA expression difference of >2-fold) from the average was noted. Genes are listed in order of increasing differential expression across tissues/mouse strains (summarized in the last column).
ᵇ Gene symbols: Aip (aryl-hydrocarbon receptor-interacting protein); Cxxc1 (CXXC finger 1 PHD domain, other designation CpG binding protein); Efna3 (ephrin A3); Epn1 (epsin 1); Hgs (HGF-regulated tyrosine kinase substrate); Hprt (hypoxanthine guanine phosphoribosyl transferase); Mrpl48 (mitochondrial ribosomal protein L48); Mtcp1 (mature T-cell proliferation 1); Rps6 (ribosomal protein S6); Tsc1 (tuberous sclerosis 1).
ᶜ Significance was calculated by one-way ANOVA for the tissue comparison in equal genomic backgrounds, and two-way ANOVA for the comparison of tissues between genetic backgrounds (⁎p < 0.5).
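The counts of comparisons given for Table 2 (30 and 20), and the link between a crossing-point difference of 1 and a two-fold expression difference, can be checked with a short Python sketch. It assumes perfect doubling of product per PCR cycle; the function names are hypothetical, and the significance testing by ANOVA is not reproduced here.

```python
from itertools import combinations

STRAINS = ["SKH1", "C57BL/6", "CSB", "BALB/c"]
TISSUES = ["liver", "thymus", "kidney", "brain", "lymph nodes"]

def fold_change(delta_cp):
    """Fold difference implied by a crossing-point difference (assumes doubling per cycle)."""
    return 2.0 ** abs(delta_cp)

def exceeds_two_fold(cp_a, cp_b):
    """True when two crossing points differ by at least one cycle."""
    return abs(cp_a - cp_b) >= 1.0

# Left part of Table 2: every pair of strains, compared within each tissue.
strain_pairs = list(combinations(STRAINS, 2))
between_strains = [(tissue, a, b) for tissue in TISSUES for a, b in strain_pairs]

# Right part of Table 2: every tissue within each strain.
within_strain = [(strain, tissue) for strain in STRAINS for tissue in TISSUES]

print(len(strain_pairs), len(between_strains), len(within_strain))
```

Four strains give six unordered pairs, so five tissues yield 30 between-strain comparisons, and four strains with five tissues yield 20 within-strain cases, 50 in total.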

3.4. Analysis of genes with the strongest expression

As evident from Fig. 2, high expression and high variability were often correlated. Information regarding their biological function was available in NC-GED for 67 of the top 100 genes with strong signal intensity. Analysis of function with the GOToolBox showed that highly expressed genes code for proteins of indispensable physiological categories, e.g., protein metabolism or ribosome biogenesis (Table 3). In contrast, the 276 putative house-keeping genes (characterized by low variance and medium abundance) are more involved in cellular “administrative” functions, e.g., regulation of transcription, protein localization and modification.

4. Discussion

In gene expression studies, house-keeping genes are used for normalization of RNA-input. Triggered by evidence that widely used reference genes may not be as uniformly and constantly expressed as often assumed, we (i) identified novel genes with the characteristics of “house-keeping” in silico, (ii) tested their validity experimentally, and (iii) showed that up-grading normalization with even a small panel of house-keeping genes greatly improves data reliability. HKGs to be used as reference genes should meet the three criteria of ubiquitous expression, low variance, and a reasonable prospect of not being regulated themselves in the experimental condition under investigation. Practical criteria for the choice of a HKG are abundance of the transcript, or the fragment size and melting curve of the respective PCR product. For these reasons, a panel of house-keeping genes with different transcript abundances and product sizes to choose from would be a useful tool. It is becoming increasingly clear that the important criterion of low variance is often not met by many of the currently used “classical” HKGs, as demonstrated in recent studies [7–9,19,21–23]. In addition, some of these genes (Gapdh, Hprt or β-actin) code for cellular functions that might render them susceptible to experimental situations [24]. Using the murine database NC-GED, we explored differential gene expression in 312 different conditions, such as diseases, drug regimens, or gene targeting. 276 genes (of the approximately 12,000 represented in the database) changed only marginally, i.e. were robustly expressed independent of the condition.

Table 3 Gene ontology analysis of 100 most strongly expressed genes

Gene ontology termᵃ                               Number in setᵇ   p-value
Protein biosynthesis                              33               7.0E-27
Macromolecule biosynthesis                        34               8.0E-27
Cellular biosynthesis                             36               4.1E-23
Biosynthesis                                      37               1.7E-22
Ribosome biogenesis and assembly                  17               3.4E-19
Cytoplasm organization and biogenesis             17               2.1E-18
Ribosome biogenesis                               15               1.8E-16
Cellular protein metabolism                       40               1.7E-11
Protein metabolism                                40               2.2E-11
Cellular macromolecule metabolism                 40               2.6E-11
Macromolecule metabolism                          44               4.8E-10
Cellular physiological process                    65               2.3E-09
Cellular metabolism                               54               2.5E-08
Metabolism                                        55               6.5E-08
Organelle organization and biogenesis             21               8.8E-08
Cell organization and biogenesis                  24               2.7E-05
Primary metabolism                                48               3.4E-05
Cellular process                                  66               0.003
Physiological process                             66               0.004
Generation of precursor metabolites and energy    10               0.037

ᵃ The hundred transcripts showing the strongest fluorescence intensity in the data set were analyzed for their correspondence to biological processes using the GOToolBox in default settings with Bonferroni testing for false discoveries.
ᵇ Number of unique transcripts corresponding to the given gene ontology category.

On average, the newly identified genes were differentially expressed in 12 out of 312 conditions. Note that we could not identify a single gene with no observable expression changes. These data confirm and extend previous data from the literature. For instance, using a different approach (quantitative SAGE in 15 tissues) to determine transcript abundance, Kouadjo et al. showed that roughly 80% of ubiquitously expressed genes nonetheless exhibit expression variance, and that conditions (e.g. hormone status) could influence expression [24]. Hybridization and the resulting fluorescence intensity on Affymetrix™ arrays follow a Langmuir adsorption isotherm, yet it remains a challenge to accurately quantify absolute transcript concentration by fluorescence probe [25,26]. However, recent efforts by the microarray quality control consortium indicate that results achieved with high intensity probes are reproducible by alternative methods, at least for the array platform used to create the database NC-GED. All reference genes identified here fall into this category.

Congruent with cellular “house-keeping”, the 276 potential novel reference genes code for proteins involved in numerous basic biological processes, mostly of the upper level of the gene ontology hierarchy. Interestingly in this context, humans have a lower ratio of house-keeping genes, reflecting their higher level of evolutionary cell differentiation [27]. To validate our in silico approach, we measured expression levels of a small sample of the 276 genes in five tissues of four mouse strains. Two of those, BALB/c and C57BL/6, are inbred strains commonly used in many laboratories. SKH-1 is an outbred, hairless, euthymic and immunocompetent mouse strain on an albino background. CSB mice have a single gene deficiency in a DNA-repair enzyme and, albeit on a C57BL/6 background, display gross phenotype differences from this strain, such as weight loss due to their different feeding behaviour, indicating that this single gene deficiency has consequences on the organismal level. The four mouse strains differ in many aspects. For instance, while BALB/c mice tend to develop tumours, C57BL/6 mice do not. Also, single physiological parameters such as haemoglobin content, total cholesterol, plasma glucose, susceptibility to certain toxins, and many others differ significantly between the strains (see [28] and http://www.jax.org/phenome). Although these differences could conceivably influence gene expression directly or in a collateral fashion, expression remained quite stable across conditions (5 tissues in 4 genetic backgrounds) in three out of the ten genes: Aip, Cxxc1 and Mrpl48 scored excellently as HKGs, showing little or no variation between tissues or mouse strains. Aip is a chaperoning protein, first identified as complexed with the aryl-hydrocarbon receptor. Mrpl48 is a mitochondrial ribosomal protein.
Cxxc1 is involved in DNA methylation and epigenetic imprinting. Hprt is necessary for DNA synthesis, but it did not score best in our hands. Although Hprt is widely used as a HKG, its expression level is known to change in specific circumstances, such as a parasitic infection [29]. Thus, although the 276 genes reported here match house-keeping criteria very well, the caveat remains that presumably all genes are potentially subject to regulation. To circumvent this problem, the reference gene of choice could be tested for each experimental situation, which is obviously not practically feasible. Normalization against more than a single reference gene could help avoid the problem as well. Using the mean expression level of five reference genes, tissue-specific or strain-specific effects were effectively reduced. This approach resulted in lower standard deviations in the expression level of the gene of interest. In other words, a simple improvement of PCRs limits the risk of artefacts, and is of acceptable cost and labour. In agreement with reports from the literature, we propose to use genes representing different biological functions for the panel of references [12]. The genes we identified and validated here can be a good starting point.

The most abundant transcripts in a tissue often characterize its specific function [30]. Ranking genes according to their expression levels across the 312 conditions represented in NC-GED revealed that highly expressed “house-keeping” genes are involved in pivotal cellular processes, especially protein biosynthesis, organelle organization and ATP/GTP generation for energy. Numerous transcripts of unknown function were identified in global gene expression function studies, some of them meeting the double criteria “high abundance/ubiquitous expression”. By inference, such genes may well be relevant for basic cellular processes or tissue specialization. Inclusion of such ranking information should therefore be considered as relevant when analyzing genes of unknown function.

Acknowledgements

We thank Swantje Steinwachs and Babette Martiensen for their expert technical help. This work was supported through grant BMU.B2 of the German Bundesministerium für Umwelt.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.bbagrm.2008.08.007.

References

[1] E.S. Lander, L.M. Linton, B. Birren, C. Nusbaum, M.C. Zody, J. Baldwin, K. Devon, K. Dewar, M. Doyle, W. FitzHugh, R. Funke, D. Gage, K. Harris, A. Heaford, J. Howland, L. Kann, J. Lehoczky, R. LeVine, P. McEwan, K. McKernan, J. Meldrim, J.P. Mesirov, C. Miranda, W. Morris, J. Naylor, C. Raymond, M. Rosetti, R. Santos, A. Sheridan, C. Sougnez, N. Stange-Thomann, N. Stojanovic, A. Subramanian, D. Wyman, J. Rogers, J. Sulston, R. Ainscough, S. Beck, D. Bentley, J. Burton, C. Clee, N. Carter, A. Coulson, R. Deadman, P. Deloukas, A. Dunham, I. Dunham, R. Durbin, L. French, D. Grafham, S. Gregory, T. Hubbard, S. Humphray, A. Hunt, M. Jones, C. Lloyd, A. McMurray, L. Matthews, S. Mercer, S. Milne, J.C. Mullikin, A. Mungall, R. Plumb, M. Ross, R. Shownkeen, S. Sims, R.H. Waterston, R.K. Wilson, L.W. Hillier, J.D. McPherson, M.A. Marra, E.R. Mardis, L.A. Fulton, A.T. Chinwalla, K.H. Pepin, W.R. Gish, S.L. Chissoe, M.C. Wendl, K.D. Delehaunty, T.L. Miner, A. Delehaunty, J.B. Kramer, L.L. Cook, R.S. Fulton, D.L. Johnson, P.J. Minx, S.W. Clifton, T. Hawkins, E. Branscomb, P. Predki, P. Richardson, S. Wenning, T. Slezak, N. Doggett, J.F. Cheng, A. Olsen, S. Lucas, C. Elkin, E. Uberbacher, M. Frazier, R.A. Gibbs, D.M. Muzny, S.E. Scherer, J.B. Bouck, E.J. Sodergren, K.C. Worley, C.M. Rives, J.H. Gorrell, M.L. Metzker, S.L. Naylor, R.S. Kucherlapati, D.L. Nelson, G.M. Weinstock, Y. Sakaki, A. Fujiyama, M. Hattori, T. Yada, A. Toyoda, T. Itoh, C. Kawagoe, H. Watanabe, Y. Totoki, T. Taylor, J. Weissenbach, R. Heilig, W. Saurin, F. Artiguenave, P. Brottier, T. Bruls, E. Pelletier, C. Robert, P. Wincker, D.R. Smith, L. Doucette-Stamm, M. Rubenfield, K. Weinstock, H.M. Lee, J. Dubois, A. Rosenthal, M. Platzer, G. Nyakatura, S. Taudien, A. Rump, H. Yang, J. Yu, J. Wang, G. Huang, J. Gu, L. Hood, L. Rowen, A. Madan, S. Qin, R.W. Davis, N.A. Federspiel, A.P. Abola, M.J. Proctor, R.M. Myers, J. Schmutz, M. Dickson, J. Grimwood, D.R. Cox, M.V. Olson, R. Kaul, C. Raymond, N. Shimizu, K. Kawasaki, S. Minoshima, G.A. Evans, M. Athanasiou, R. Schultz, B.A. Roe, F. Chen, H. Pan, J. Ramser, H. Lehrach, R. Reinhardt, W.R. McCombie, B.M. de la, N. Dedhia, H. Blocker, K. Hornischer, G. Nordsiek, R. Agarwala, L. Aravind, J.A. Bailey, A. Bateman, S. Batzoglou, E. Birney, P. Bork, D.G. Brown, C.B. Burge, L. Cerutti, H.C. Chen, D. Church, M. Clamp, R.R. Copley, T. Doerks, S.R. Eddy, E.E. Eichler, T.S. Furey, J. Galagan, J.G. Gilbert, C. Harmon, Y. Hayashizaki, D. Haussler, H. Hermjakob, K. Hokamp, W. Jang, L.S. Johnson, T.A. Jones, S. Kasif, A. Kaspryzk, S. Kennedy, W.J. Kent, P. Kitts, E.V. Koonin, I. Korf, D. Kulp, D. Lancet, T.M. Lowe, A. McLysaght, T. Mikkelsen, J.V. Moran, N. Mulder, V.J. Pollara, C.P. Ponting, G. Schuler, J. Schultz, G. Slater, A.F. Smit, E. Stupka, J. Szustakowski, D. Thierry-Mieg, J. Thierry-Mieg, L. Wagner, J. Wallis, R. Wheeler, A. Williams, Y.I. Wolf, K.H. Wolfe, S.P. Yang, R.F. Yeh, F. Collins, M.S. Guyer, J. Peterson, A. Felsenfeld, K.A. Wetterstrand, A. Patrinos, M.J. Morgan, J.P. de, J.J. Catanese, K. Osoegawa, H. Shizuya, S. Choi, Y.J. Chen, Initial sequencing and analysis of the human genome, Nature 409 (2001) 860–921.
[2] R.H. Waterston, K. Lindblad-Toh, E. Birney, J. Rogers, J.F. Abril, P. Agarwal, R. Agarwala, R. Ainscough, M. Alexandersson, P. An, S.E. Antonarakis, J. Attwood, R. Baertsch, J. Bailey, K. Barlow, S. Beck, E. Berry, B. Birren, T. Bloom, P. Bork, M. Botcherby, N. Bray, M.R. Brent, D.G. Brown, S.D. Brown, C. Bult, J. Burton, J. Butler, R.D. Campbell, P. Carninci, S. Cawley, F. Chiaromonte, A.T. Chinwalla, D.M. Church, M. Clamp, C. Clee, F.S. Collins, L.L. Cook, R.R. Copley, A. Coulson, O. Couronne, J. Cuff, V. Curwen, T. Cutts, M. Daly, R. David, J. Davies, K.D. Delehaunty, J. Deri, E.T. Dermitzakis, C. Dewey, N.J. Dickens, M. Diekhans, S. Dodge, I. Dubchak, D.M. Dunn, S.R. Eddy, L. Elnitski, R.D. Emes, P. Eswara, E. Eyras, A. Felsenfeld, G.A. Fewell, P. Flicek, K. Foley, W.N. Frankel, L.A. Fulton, R.S. Fulton, T.S. Furey, D. Gage, R.A. Gibbs, G. Glusman, S. Gnerre, N. Goldman, L. Goodstadt, D. Grafham, T.A. Graves, E.D. Green, S. Gregory, R. Guigo, M. Guyer, R.C. Hardison, D. Haussler, Y. Hayashizaki, L.W. Hillier, A. Hinrichs, W. Hlavina, T. Holzer, F. Hsu, A. Hua, T. Hubbard, A. Hunt, I. Jackson, D.B. Jaffe, L.S. Johnson, M. Jones, T.A. Jones, A. Joy, M. Kamal, E.K. Karlsson, D. Karolchik, A. Kasprzyk, J. Kawai, E. Keibler, C. Kells, W.J. Kent, A. Kirby, D.L. Kolbe, I. Korf, R.S. Kucherlapati, E.J. Kulbokas, D. Kulp, T. Landers, J.P. Leger, S. Leonard, I. Letunic, R. LeVine, J. Li, M. Li, C. Lloyd, S. Lucas, B. Ma, D.R. Maglott, E.R. Mardis, L. Matthews, E. Mauceli, J.H. Mayer, M. McCarthy, W.R. McCombie, S. McLaren, K. McLay, J.D. McPherson, J. Meldrim, B. Meredith, J.P. Mesirov, W. Miller, T.L. Miner, E. Mongin, K.T. Montgomery, M. Morgan, R. Mott, J.C. Mullikin, D.M. Muzny, W.E. Nash, J.O. Nelson, M.N. Nhan, R. Nicol, Z. Ning, C. Nusbaum, M.J. O'Connor, Y. Okazaki, K. Oliver, E. Overton-Larty, L. Pachter, G. Parra, K.H. Pepin, J. Peterson, P. Pevzner, R. Plumb, C.S. Pohl, A. Poliakov, T.C. Ponce, C.P. Ponting, S. Potter, M. Quail, A. Reymond, B.A. Roe, K.M. Roskin, E.M. Rubin, A.G. Rust, R. Santos, V. Sapojnikov, B. Schultz, J. Schultz, M.S. Schwartz, S. Schwartz, C. Scott, S. Seaman, S. Searle, T. Sharpe, A. Sheridan, R. Shownkeen, S. Sims, J.B. Singer, G. Slater, A. Smit, D.R. Smith, B. Spencer, A. Stabenau, N. Stange-Thomann, C. Sugnet, M. Suyama, G. Tesler, J. Thompson, D. Torrents, E. Trevaskis, J. Tromp, C. Ucla, A. Ureta-Vidal, J.P. Vinson, A.C. Von Niederhausern, C.M. Wade, M. Wall, R.J. Weber, R.B. Weiss, M.C. Wendl, A.P. West, K. Wetterstrand, R. Wheeler, S. Whelan, J. Wierzbowski, D. Willey, S. Williams, R.K. Wilson, E. Winter, K.C. Worley, D. Wyman, S. Yang, S.P. Yang, E.M. Zdobnov, M.C. Zody, E.S. Lander, Initial sequencing and comparative analysis of the mouse genome, Nature 420 (2002) 520–562.
[3] R.E. Ferguson, H.P. Carroll, A. Harris, E.R. Maher, P.J. Selby, R.E. Banks, Housekeeping proteins: a preliminary study illustrating some limitations as useful references in protein expression studies, Proteomics 5 (2005) 566–571.
[4] O. Thellin, W. Zorzi, B. Lakaye, B.B. De, B. Coumans, G. Hennen, T. Grisar, A. Igout, E. Heinen, Housekeeping genes as internal standards: use and limits, J. Biotechnol. 75 (1999) 291–295.

[5] C. Tricarico, P. Pinzani, S. Bianchi, M. Paglierani, V. Distante, M. Pazzagli, S.A. Bustin, C. Orlando, Quantitative real-time reverse transcription polymerase chain reaction: normalization to rRNA or single housekeeping genes is inappropriate for human tissue biopsies, Anal. Biochem. 309 (2002) 293–300.
[6] D. Goidin, A. Mamessier, M.J. Staquet, D. Schmitt, O. Berthier-Vergnes, Ribosomal 18S RNA prevails over glyceraldehyde-3-phosphate dehydrogenase and beta-actin genes as internal standard for quantitative comparison of mRNA levels in invasive and noninvasive human melanoma cell subpopulations, Anal. Biochem. 295 (2001) 17–21.
[7] S. Selvey, E.W. Thompson, K. Matthaei, R.A. Lea, M.G. Irving, L.R. Griffiths, Beta-actin—an unsuitable internal control for RT-PCR, Mol. Cell. Probes 15 (2001) 307–311.
[8] E. Deindl, K. Boengler, R.N. van, W. Schaper, Differential expression of GAPDH and beta3-actin in growing collateral arteries, Mol. Cell Biochem. 236 (2002) 139–146.
[9] S. Bilodeau-Goeseels, G.A. Schultz, Changes in the relative abundance of various housekeeping gene transcripts in in vitro-produced early bovine embryos, Mol. Reprod. Dev. 47 (1997) 413–420.
[10] A. Nil, E. Firat, V. Sobek, K. Eichmann, G. Niedermann, Expression of housekeeping and immunoproteasome subunit genes is differentially regulated in positively and negatively selecting thymic stroma subsets, Eur. J. Immunol. 34 (2004) 2681–2689.
[11] C. Rubie, K. Kempf, J. Hans, T. Su, B. Tilton, T. Georg, B. Brittner, B. Ludwig, M. Schilling, Housekeeping gene variability in normal and cancerous colorectal, pancreatic, esophageal, gastric and hepatic tissues, Mol. Cell. Probes 19 (2005) 101–109.
[12] J. Vandesompele, P.K. De, F. Pattyn, B. Poppe, R.N. Van, P.A. De, F. Speleman, Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes, Genome Biol. 3 (2002) RESEARCH0034.
[13] W. Zhang, Q.D. Morris, R. Chang, O. Shai, M.A. Bakowski, N. Mitsakakis, N. Mohammad, M.D. Robinson, R. Zirngibl, E. Somogyi, N. Laurin, E. Eftekharpour, E. Sat, J. Grigull, Q. Pan, W.T. Peng, N. Krogan, J. Greenblatt, M. Fehlings, K.D. van der, J. Aubin, B.G. Bruneau, J. Rossant, B.J. Blencowe, B.J. Frey, T.R. Hughes, The functional landscape of mouse gene expression, J. Biol. 3 (2004) 21.
[14] A. Brazma, H. Parkinson, U. Sarkans, M. Shojatalab, J. Vilo, N. Abeygunawardena, E. Holloway, M. Kapushesky, P. Kemmeren, G.G. Lara, A. Oezcimen, P. Rocca-Serra, S.A. Sansone, ArrayExpress—a public repository for microarray gene expression data at the EBI, Nucleic Acids Res. 31 (2003) 68–71.
[15] K. Ikeo, J. Ishi-i, T. Tamura, T. Gojobori, Y. Tateno, CIBEX: center for information biology gene expression database, C. R. Biol. 326 (2003) 1079–1082.

[16] R. Edgar, M. Domrachev, A.E. Lash, Gene Expression Omnibus: NCBI gene expression and hybridization array data repository, Nucleic Acids Res. 30 (2002) 207–210.
[17] M. Frericks, M. Meissner, C. Esser, Microarray analysis of the AHR system: tissue-specific flexibility in signal and target genes, Toxicol. Appl. Pharmacol. 220 (2007) 320–332.
[18] M.D. Al-Bader, H.A. Al-Sarraf, Housekeeping gene expression during fetal brain development in the rat—validation by semi-quantitative RT-PCR, Brain Res. Dev. Brain Res. 156 (2005) 38–45.
[19] X. Zhang, L. Ding, A.J. Sandford, Selection of reference genes for gene expression studies in human neutrophils by real-time PCR, BMC Mol. Biol. 6 (2005) 4.
[20] A.E. Vinogradov, Dualism of gene GC content and CpG pattern in regard to expression in the human genome: magnitude versus breadth, Trends Genet. 21 (2005) 639–643.
[21] Y. Huang, J.C. Hsu, M. Peruggia, A.A. Scott, Statistical selection of maintenance genes for normalization of gene expressions, Stat. Appl. Genet. Mol. Biol. 5 (2006) Article4.
[22] M. Ganapathi, P. Srivastava, S.K. Das Sutar, K. Kumar, D. Dasgupta, S.G. Pal, V. Brahmachari, S.K. Brahmachari, Comparative analysis of chromatin landscape in regulatory regions of human housekeeping and tissue specific genes, BMC Bioinformatics 6 (2005) 126.
[23] E. Eisenberg, E.Y. Levanon, Human housekeeping genes are compact, Trends Genet. 19 (2003) 362–365.
[24] K.E. Kouadjo, Y. Nishida, J.F. Cadrin-Girard, M. Yoshioka, J. St-Amand, Housekeeping and tissue-specific genes in mouse tissues, BMC Genomics 8 (2007) 127.
[25] G.A. Held, G. Grinstein, Y. Tu, Relationship between gene expression and observed intensities in DNA microarrays—a modeling study, Nucleic Acids Res. 34 (2006) e70.
[26] D. Hekstra, A.R. Taussig, M. Magnasco, F. Naef, Absolute mRNA concentrations from sequence-specific calibration of oligonucleotide arrays, Nucleic Acids Res. 31 (2003) 1962–1968.
[27] A.E. Vinogradov, O.V. Anatskaya, Organismal complexity, cell differentiation and gene expression: human over mouse, Nucleic Acids Res. 35 (2007) 6350–6356.
[28] K. Paigen, J.T. Eppig, A mouse phenome project, Mamm. Genome 11 (2000) 715–717.
[29] T. Hoque, M. Bhogal, R.A. Webb, Validation of internal controls for gene expression analysis in the intestine of rats infected with Hymenolepis diminuta, Parasitol. Int. 56 (2007) 325–329.
[30] K.E. Kouadjo, M. Yoshioka, Y. Nishida, J. St-Amand, Most expressed transcripts in sexual organs and other tissues, Mol. Reprod. Dev. 75 (2008) 230–242.