Comparative Biochemistry and Physiology, Part D 8 (2013) 283–291
Contents lists available at ScienceDirect
Comparative Biochemistry and Physiology, Part D journal homepage: www.elsevier.com/locate/cbpd
Assessing gene network stability and individual variability in the fathead minnow (Pimephales promelas) transcriptome Christopher J. Martyniuk ⁎, Jeff Houlahan Canadian Rivers Institute and Department of Biology, University of New Brunswick, Saint John, New Brunswick E2L 4L5, Canada
a r t i c l e
i n f o
Article history: Received 10 June 2013 Received in revised form 14 August 2013 Accepted 15 August 2013 Available online 20 August 2013 Keywords: Molecular Individual variation Experimental power Sub-network enrichment analysis Ecotoxicogenomics
a b s t r a c t Transcriptomics is increasingly used to assess biological responses to environmental stimuli and stressors such as aquatic pollutants. However, fundamental studies characterizing individual variability in mRNA levels are lacking, which currently limits the use of transcriptomics in environmental monitoring assessments. To address individual variability in transcript abundance, we performed a meta-analysis on 231 microarrays that were conducted in the fathead minnow (FHM), a widely used toxicological model. The mean variability for gene probes was ranked from most to least variable based upon the coefficient of variation. Transcripts that were the most variable in individual tissues included NADH dehydrogenase flavoprotein 1, GTPase IMAP family member 7-like and v-set domain-containing T-cell activation inhibitor 1-like while genes encoding ribosomal proteins (rpl24 and rpl36), basic transcription factor 3, and nascent polypeptide-associated complex alpha subunit were the least variable in individuals across a range of microarray experiments. Gene networks that showed high variability (based upon the variation in expression of individual members within the network) included cell proliferation, metabolism (steroid, lipids, and glucose), cell adhesion, vascularization, and regeneration while those that showed low variability (more stability) included mRNA and rRNA processing, regulation of translational fidelity, RNA splicing, and ribosome biogenesis. Real-time PCR was conducted on a subset of genes for comparison of variability collected from the microarrays. There was a significant positive relationship between the two methods when measuring individual variability, suggesting that variability detected in microarray data can be used to guide decisions on sample sizes for measuring transcripts in real-time PCR experiments. A power analysis revealed that measuring estrogen receptor ba (esrba) requires fewer biological replicates than that of estrogen receptor bb (esrbb) in the gonad and samples sizes required to detect a 50% change for reproductive-related transcripts is between 12 and 20. Characterizing individual variability at the molecular level will prove necessary as efforts are made toward integrating molecular tools into environmental risk assessments. © 2013 Elsevier Inc. All rights reserved.
1. Introduction Quantifying individual variation is needed in order to generate better predictions about biological responses to environmental stimuli and stressors. In ecotoxicology, individual variation in small bodied fish species has been assessed for morphological and physiological endpoints, such as gonadosomatic index (GSI) and plasma sex steroids (Watanabe et al., 2007; Melvin et al., 2009). These studies have Abbreviations: CDC45, cell division cycle 45 homolog (S. cerevisiae); CDC6, cell division cycle 6 homolog (S. cerevisiae); CDC7, cell division cycle 7 homolog (S. cerevisiae); CDK, cyclin-dependent kinase; CDT1, chromatin licensing and DNA replication factor 1; CIZ1, CDKN1A interacting zinc finger protein 1; DBF4, DBF4 homolog (S. cerevisiae); DNA2, DNA replication helicase 2 homolog (yeast); ERH, enhancer of rudimentary homolog (Drosophila); FEN1, flap structure-specific endonuclease 1; GINS, GINS complex; GMNN, geminin, DNA replication inhibitor; MYST2, MYST histone acetyltransferase 2; PCNA, proliferating cell nuclear antigen; POLA1, polymerase (DNA directed), alpha 1, catalytic subunit; POLE, polymerase (DNA directed), epsilon; SNEA, sub-network enrichment analysis; WDHD1, WD repeat and HMG-box DNA binding protein 1. ⁎ Corresponding author. Tel.: +1 506 648 5506; fax: +1 506 648 5811. E-mail address:
[email protected] (C.J. Martyniuk). 1744-117X/$ – see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.cbd.2013.08.002
improved experimental sampling protocols and provide a baseline for the amount of variation one can expect under natural conditions. Individual variation has also been assessed for endpoints with high ecological relevance, such as fecundity, and recommendations have been postulated to improve the power of reproductive bioassays. For example, a pre-selection phase for breeding pairs can often improve the power of fish bioassays by removing those individuals who are not consistently spawning or those that exhibit higher than average egg production (Bosker et al., 2009). This reduces the effect of individual variability and increases the sensitivity of the bioassay as a measure of adverse reproductive effects due to chemical or sewage effluent exposures. Although higher level reproductive endpoints have been assessed for variability, the expected variation for molecular endpoints in individuals, such as mRNA abundance, has not been systematically explored in fish. This is a knowledge gap because variation in the transcriptome response is significantly associated with phenotypic differences within and among organisms (Vandersteen Tymchuk et al., 2010; Fan et al., 2012), and this can lead to adaptive responses. Exposure to environmental
284
C.J. Martyniuk, J. Houlahan / Comparative Biochemistry and Physiology, Part D 8 (2013) 283–291
contaminants such as polycyclic aromatic hydrocarbons (PAHs) and polychlorinated biphenyl (PCB) for example, results in molecular responses that are correlated with increased tolerance to the chemicals (Oleksiak et al., 2011). One explanation for the observation of increased tolerance is that variability in molecular response is necessary for acquired tolerance. That is, variability in the individual transcriptome may be a pre-requisite for acquired resistance to biotic and abiotic environmental stressors. Microarray-based studies can provide a wealth of information about individual variation in mRNA levels because the same microarray design is repeatedly used across experiments. Thus, there are multiple measures of expression levels for the same gene probe. There are now sufficiently large datasets that can be used to characterize variation in mRNA abundance due to sex, tissue, and chemical perturbations in aquatic toxicological models. There is a need to characterize variability in mRNA abundance as high throughput transcriptomics methods are becoming increasingly applied in studies to assess the impacts of aquatic pollutants (Mehinto et al., 2012; Whitehead et al., 2012) and quantifying variability is necessary in order to determine critical effect sizes for gene expression endpoints if they are to be useful in environmental monitoring programs. The objective of this study was to quantify transcriptome variation in the fathead minnow (FHM) (Pimephales promelas), a well-utilized model for ecotoxicology studies. Our meta-analysis included collected data from 231 FHM microarrays. The analysis of 231 FHM microarrays (8 × 15 K format) included data from three different tissues (liver, testis, and ovary) from both male and female fish that acted as control animals or those that were exposed to different treatments. The rationale for including gene expression values from across tissues and a wide variety of treatments was to obtain a representative sample of transcriptome variability. It may be that, if we had included additional tissues or treatments that we may have observed somewhat different measures of variability in gene expression. However, we have used both sexes and a wide variety of tissues and treatments and one can assume that these data have provided a reasonable estimate of gene variability. This was done to achieve a more holistic view of the transcriptome under different perturbations. A second objective was to identify gene networks with low variability based upon the variation in expression of individual genes within the network. This is important for two reasons. (1) Gene members of these biological processes are expected to be consistent normalizer genes for real-time PCR experiments. (2) More importantly, these biological functions may be more resistant at the transcript level to perturbations from environmental stressors (e.g. pollutants). Perhaps by identifying these processes, insight can be gained into the key physiological processes maintained under environmental stress. Perturbations by chemicals in stable pathways may be more important for longer term environmental monitoring compared to unstable pathways, as these pathways are expected to vary over a wider range of biotic and abiotic scenarios. However, those processes that show high variability may be those most sensitive to perturbation by chemical exposures. Lastly, we perform power analyses using measures of variability and critical effects sizes to provide recommendations for measuring specific transcripts, with special focus on those related to sex steroid signaling and steroidogenesis. 2. Methods 2.1. Source of microarray data, microarray standardization, and analysis In the following descriptions and discussion, we use the following terminology; “study” refers to the unique experiments within a laboratory (i.e. there were 8 studies used in the meta-analysis), “experimental groups” refers to the 19 different experimental data sets in which microarrays were organized for rankings, and “biological process” refers to the 19 biological processes defined by gene ontology that were examined using randomizations. Lastly, “percentage categories” of which
there were 5, describes the grouping of the genes based upon their ranked values. There were a total of 231 microarrays used in the analysis of gene variability. Gene expression data were obtained from NCBI Gene Expression Omnibus (GEO). The FHM 8 × 15 K Agilent microarrays were first generated in the laboratory of N. Denslow at University of Florida (GPL9248) and manufactured by Agilent (Palo Alto, CA, USA). Additional details for the FHM microarray are described in Villeneuve et al. (2010). The GEO accessions for the microarray series were; GSE29350, GSE28353, GSE26958, GSE23521, GSE18254, GSE38289, GSE38288, GSE44839, and GSE44840. These studies were conducted (1) in liver using sexually mature male FHMs exposed to municipal wastewater effluents (concentrations of 0.5% and 5%) and estrogen (E2) (Vidal-Dorsch et al., in press); (2) in ovary of FHMs after exposure to bisphenol A (0, 0.01, 0.1, 1.0, 10, or 100 μg BPA/L) for 96 h (Villeneuve et al., 2012); (3) in ovary of FHMs exposed to the fungicide prochloraz (0 or 300 μg/L) in a time-course of 6, 12 and 24 h exposure (Skolness et al., 2011); (4) in both testis and liver tissue of FHMs exposed to water collected from the two creeks with anthropogenic sources of pollution after 48-h exposures (Weil et al., 2012); (5) in ovaries exhibiting natural stage to stage changes in oocytes development (Villeneuve et al., 2010); (6) in ovary explants of FHMs exposed to dihydrotestosterone (DHT) (10−6 M) for 6, 9, and 12 h (Ornostay et al., unpubl.); (7) in testis explants of FHMs exposed to progesterone (10−6 M) for 12 h (Chishti et al., in press) and (8) in liver tissue of FHMs exposed to phenanthrene (measured 52, 558, and 1460 ng/L) in a 48 h waterborne exposure (Loughery et al., unpub.). Thus, the experiments represented here span different tissues, reproductive stages, experimental paradigms, and both sexes of FHMs. We recognize that this approach may result in an over-estimation of mRNA variability; however this provides a more global and perhaps more realistic snapshot of FHM transcript variation under multiple perturbations. Appendix 1 is a meta-table with additional information about these studies. Microarrays that were used in this analysis were hybridized according to the Agilent One-Color Microarray-Based Gene Expression Analysis protocol. After data retrieval, raw expression data from all 231 microarrays were imported into JMP® Genomics v5.1. Raw intensity data were normalized using Loess normalization (smoothing factor of 0.2) in order to standardize technical variation across experiments. A limitation of this analysis is that there were different scanners used to extract microarray data. For example, an Axon GenePix® 4000B Microarray Scanner (Molecular Devices Inc., Concord, ON, Canada) was used in some studies whereas an Agilent High Resolution DNA Microarray Scanner (Agilent, Palo Alto, CA, USA) was used in others. However, our assumption was that the performance of the scanners was equal for probe intensity measurements for the FHM microarrays. Despite the expected differences in experimental methods, loess normalization performed well based upon box plots and kernel plots (Fig. 1). There were no significant differences in intensity across microarrays after normalization (p b 0.05). 2.2. Assessing variation in gene expression We estimated the variability of mRNA abundance at 2 levels, withingroup and among-group. Within-group variation was required because we had 231 measures of expression for each probe but expression was measured in different tissues, in different sexes, and after exposure to different treatments. Thus, data were first organized into experimental groups based upon tissue, sex, and treatment, and the result was that there were 19 different combinations with sample sizes ranging from 4 to 31 fish in a group. Not all combinations of the three factors (tissue, sex, and treatment) were available (and some combinations were not possible such as female and testis). Within-group variation therefore examined the variability of gene expression within groups that were defined by a common tissue, sex and treatment exposure (n = 19). We then estimated the mean, standard deviation and coefficient of variation of the 19 experimental groups (i.e. among
C.J. Martyniuk, J. Houlahan / Comparative Biochemistry and Physiology, Part D 8 (2013) 283–291
285
Fig. 1. Loess normalization for 231 FHM microarrays. The kernel plots show the distribution of probe intensity pre and post-normalization.
group). The mean provides an estimate of the average within-group variability and the standard deviation and CV provide estimates of among-group variation. Prior to analyzing the dataset for gene stability, we ranked all the gene probes based upon the ‘mean CV’ estimate over the 19 experimental groups from the probe that was least variable (rank = 1) to most variable (rank = 15, 208). The ranking was based on the mean of the within-group variability in gene expression. That is, there were 19 experimental groups (as described above) based on the sex, tissue and treatment of the fish that were tested for gene variation and for each gene probe, we have estimated the variability (CV) across individuals within an experimental group. Thus, for each gene probe, we had 19 estimates of CV, one for each group. We then calculated the mean of those 19 CVs for an overall estimate of within-group variability of a particular gene probe. All rankings of data are provided in Appendix 2. 2.3. Bioinformatics To determine whether or not there were biological processes that showed significantly more variation in the expression of individual gene members within a given pathway than what would be expected by chance, we conducted an enrichment analysis on the ranked gene list. To identify which gene networks would be the most stable, two approaches were used. We first divided the ranked genes into 5 percentage categories from least to most variable, 0–20, 20–40, 40–60, 60–80, and 80–100%. The cut-off for each percentage category was ±0.1%, however for simplicity we refer to the five groups in increments of 20%. FHM gene probes within each group were mapped to human homologs using Entrez GeneID. The resulting lists were then analyzed for over-represented gene networks using a sub-network enrichment analysis (SNEA). Over-represented means that transcripts within a defined sub-network of biological interactions are found more often in a given list (“enriched”) compared to that expected by chance. SNEA was conducted using Pathway Studio 9.0 (Ariadne, Rockville, MD, USA) and ResNet 9.0 (Mammals). The analysis followed that as previously described in Martyniuk et al. (2012) and Langlois and Martyniuk (2013). Briefly, each gene list from each percentage category was compared to characterized networks in Pathway Studio (there are N 1500) to determine if there were networks that were over-represented within
each queried list (each of the 5 percentage categories). For example, if a network was identified as significant in the 0–20% category, it implied that the network is made up of genes that tend to show low variability in mRNA levels. Second, we conducted randomizations (Manly, 2006) of biological processes from gene ontology (e.g. translation, transport, protein modification) to test if sets of genes responsible for specific processes were more stable or more variable than that expected by chance. Most genes are involved in several biological processes; therefore genes could be assigned to several groups. The test statistic used in the randomizations was the summed ranks of all genes associated with a specific process. We then randomized the ranks (i.e. assigned each of the ranks from 1 to 15,208 randomly for each of the genes) and compared the observed sum of ranks to the distribution of 1000 randomized sums of ranks. As an example, if there were 4 genes associated with a specific biological process (e.g. RNA splicing) and they were the 4 least variable genes, their observed ranks would be 1, 2, 3, and 4 and the observed sum of those ranks would be 10. We would then randomly assign ranks from 1 to 15,208 to those 4 genes and sum the randomized ranks. We would do this 1000 times until we had a distribution of the sum of ranks under the null hypothesis that the 4 genes associated with this specific process did not contain genes that were more or less stable than would be expected by chance. We would then compare the observed sum of ranks (i.e. 10) to the distribution of randomized sum of ranks. Alpha was set at 0.05, thus if 50 or fewer randomized sums of ranks were smaller than 10, we would conclude that genes associated with that particular biological function (e.g. RNA splicing) were less variable than that expected by chance. If 50 or fewer randomized sums of ranks were larger than or equal to 10, we would conclude that genes associated with a particular process were more variable than expected by chance. We point out that the example of 4 was used to simplify the explanation of the test statistics for randomization, and is not typical. Many gene ontology terms contain significantly more transcripts (N 20) in the biological process. To test if genes responsible for specific biological processes were found in intermediate percentage categories (i.e. 20–40, 40–60, or 60–80) more than would be expected by chance, we re-ranked all 15,208 genes such that the gene found in the exact center of our target category was ranked 1st. For example, if 20–40% was the percentage
286
C.J. Martyniuk, J. Houlahan / Comparative Biochemistry and Physiology, Part D 8 (2013) 283–291
category being investigated, all genes would be re-ranked such that the gene midway in the 20–40 category (i.e. the gene at the 30th percentile) was ranked 1st. All genes moving away from that 1st ranked gene in either direction increased in rank. Thus, we could use these revised rankings to develop a test statistic that was identical to the test statistic used to test for most and least variable gene networks except that the sum of ranks would be calculated on a revised ranking that would allow us to test for associations with the three intermediate percentage categories. We tested 19 biological processes using the randomization approach outlined above. The biological processes that were selected for randomization were chosen based upon the cell processes identified in the SNEA analysis and randomizations acted as a statistical verification for pathway stability or instability. Biological processes investigated using randomizations included apoptosis (GO:0006915), brain development (GO:0007420), calcium channel activity (GO:0005262), calcium channel inhibitor activity (GO:0019855), calcium ion binding (GO:0005509), carbohydrate metabolic process (GO:0005975), cell cycle (GO:0007049), cellular protein metabolic process (GO:0044267), DNA metabolic process (GO:0006259), DNA repair (GO:0006281), lipid metabolic process (GO:0006629), neuron differentiation (GO:0030182), neuron migration (GO:0001764), protein folding (GO:0006457), response to hypoxia (GO:0001666), RNA metabolic process (GO:0016070), response to DNA damage stimulus (GO:0006974), structural constituent of ribosome (GO:0003735), and translation (GO:0006412). 2.4. Real-time PCR To test the hypothesis that variation exhibited by gene probes on the FHM microarray platform is comparable to variation quantified using real-time PCR, we generated a new expression dataset using samples from 6 different experiments conducted in FHM liver, testis, and ovary. These tissues included untreated testis (Chishti et al., in press) (n = 7), untreated ovaries (n = 6) and ovaries treated with flutamide (n = 6) (Ornostay et al., in press), untreated female livers (n = 5) and female livers treated with phenanthrene for 48 (n = 6) and 72 h (n = 4) (Loughery et al. unpubl.). RNA quantity and quality were assessed using the NanoDrop ND2000 and 2100 Bioanalyzer (Agilent). RNA samples were of high quality (average RIN = 8.44; SD = 0.15). Supplemental Table 1 provides all the primer sequences used in this experiment to assess transcript variability. Real-time PCR analysis was carried out using SsoFast™ EvaGreen® Supermix (BioRad, Mississauga, ON, CAN) and all genes were assayed using the CFX96™ Real-Time PCR Detection System (BioRad). The two-step thermal cycling parameters for the reactions were as follows: initial 1-cycle Taq polymerase activation at 95 °C for 3 min, followed by 40 cycles at 95 °C for 5 s, and annealing at the primer-specific temperature for 5 s. After 40 cycles, a DNA melt curve was generated, starting at 65.0 to 95.0 °C with increment of 0.5 °C for 5 s. Standard curves for each plate were generated with a mixed pool of cDNA (ovary, testis, and liver) and consisted of a 5 or 6 point standard curve. No reverse transcriptase controls (n = 5) and no template controls (n = 3) were performed on each plate. Normalized gene expression was extracted using CFX Manager™ software using the relative ΔΔCq method. The most stable transcripts were used to normalize all expression data (rps12 and rps18; M value = 0.78; CV = 0.27). Normalized expression levels for transcripts were grouped according to the different experimental paradigms; (1) testis from reproductive males (2) ovary from reproductive females (3) liver from nonreproductive FHMs (4) reproductive testis treated with P4 (5) reproductive FHM ovaries treated with flutamide and (6) liver from FHMs treated with phenanthrene in a waterborne exposure for 48 h (7) and 72 h (50 ng/L). The mean, SD, CVs, and average CV were calculated as outlined above for the microarray data. We then ranked the genes assayed with real-time PCR by their average CV and compared these ranks to those generated from the microarray data. Regression analysis
was conducted to determine if there was correspondence between microarray and real-time PCR data (Prism 6.0). 2.5. Power analyses for FHM transcripts The online program PS: Power and Sample Size Calculation (V3.0.43) (Dupont and Plummer, 1998) was used to calculate targeted sample sizes based upon variability in mRNA abundance. Based upon measured variability, our goal was to approximate the sample sizes required to detect a 1.5, 1.75, and 2-fold change in steady state mRNA abundance for transcripts commonly measured in studies focused on reproductive toxicology and physiology such as enzymes involved in steroidogenesis (e.g. steroid acute regulatory protein (star), p450 side chain cleavage, p450ssc) and steroid receptors (estrogen receptor (esr)1; esrba, esrbb, androgen receptor (ar)). Power analyses were based on the following criteria and assumptions; (1) an independent experimental design using a paired t-test to detect differences between groups, (2) a power of 0.80, as this is considered to be adequate for detecting a reasonable departure from the null hypothesis, (3) an alpha of 0.05 (4) a difference in population means (δ) of 0.5, 0.75, and 1 that corresponds to 1.5, 1.75, and 2-fold (50, 75, and 100%) change in a transcript respectively (5) a ratio of control to experimental samples of 1 (m) and (6) the coefficient of variation for each transcript generated from the microarray meta-analysis. Although power often uses SD (σ), here we use the mean corrected SD (or CV) instead because our interest was determining sample size based upon effect sizes that were fold changes (e.g. 1.5-fold) and not true means. Power analyses for select gene examples were conducted separately in each tissue. However, we point out that transcript variability and sample size requirements are expected to change based upon gonadstage, chemical treatment, and sex. 3. Results 3.1. Quantifying variation in FHM genes The mean coefficient of variation (CV) across all FHM gene probes was 0.56. All FHM gene probes along with their final rankings are provided in Appendix 2. The most stable transcript over all experimental groups was ribosomal protein L24, a gene / protein that has a role in the ribonucleoprotein complex. This transcript showed a mean CV of 0.014 and an SD of 0.005 across all microarray experiments. The transcript that showed the highest variability was PREDICTED: similar to gem (nuclear organelle) associated protein 6 which is a gene involved in spliceosomal biogenesis and RNA splicing. This gene showed a mean CV of 0.98 with a SD of 0.57 across all microarrays examined. There was ~200-fold range in variability for FHM transcripts. NADH dehydrogenase flavoprotein 1, synapsin II, similar to marapsin, and PREDICTED: v-set domain-containing T-cell activation inhibitor 1-like were genes that also showed high individual variation in mRNA levels while basic transcription factor 3, ribosomal protein L36, solute carrier family 25 alpha, member 5, nascent polypeptide-associated complex alpha subunit, and 60S ribosomal protein L17 were some of the most stable genes in FHM tissues (Supplemental Table 2). Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines suggest that multiple genes be assessed and used for normalization of qPCR experiments in order to obtain a more stable baseline expression level (Bustin et al., 2010; Taylor et al., 2010). The least variable genes identified in this study are top candidates for normalizer genes in real-time PCR experiments in small bodied fish. Approximately 9000 genes (60%) showed normalized intensity values between 3.0 and 9.0 (Fig. 2). The majority of FHM gene probes (~75%) showed a mean CV between 0.3 and 0.7 and CVs for probe intensity were normally distributed (Goodness of Fit test, Shapiro–Wilk W). A high mean CV represents a gene that shows high individual variability
C.J. Martyniuk, J. Houlahan / Comparative Biochemistry and Physiology, Part D 8 (2013) 283–291
287
Fig. 2. Normalized mean intensity levels of transcripts across FHM microarrays. Approximately 9000 genes (60%) showed intensity values between 3.0 and 9.0 (left graph). Also presented is the distribution of CVs for FHM transcripts (right graph). A high CV represents a gene that shows high variation in expression while a low CV indicates that the expression levels do not vary dramatically over experiments and conditions.
while a low CV indicates that the mRNA expression levels do not vary dramatically over experiments and conditions.
3.2. Stable and variable gene networks in the FHM From the sub-network enrichment analysis, 37 networks contained genes found preferentially in the top of the ranked list while 41 networks were represented by genes found preferentially in the bottom of the ranked list. A Venn diagram was used to graphically overlay subnetworks located within the five percentage categories (i.e. 0–20, 20–40, 40–60, 60–80, and 80–100%) (Fig. 3). Adjacent categories (e.g. 20–40% and 40–60%) showed more sub-networks in common than non-adjacent categories, which suggests that sub-networks show systematic variation from most to least stable. For example, the categories of 0–20% and 20–40% had 53 sub-networks in common while the categories of 0–20% and 80–100% had only 8 sub-networks in common (Fig. 3). We point out here that gene probes are unique to each of the percentage categories; however the biological processes related to those genes can appear in different percentage categories of sub-networks. Stable gene networks in FHM tissues included processes that are under the general description of transcription and translation (e.g.
regulation of translational fidelity, ribosome biogenesis and assembly, poly(A) + mRNA-nucleus export) as well as the processes of protein deubiquitination, endoplasmic reticulum to Golgi transport, and histone methylation (Table 1, Appendix 3). The processes shown in Table 1 are examples of processes that were found exclusively in only one of the 5 percentage categories. Some networks such as apoptosis and cell cycle had transcripts in all percentage categories and are not listed in the table, but are found in the appendix. A network depicting general transcriptional processes that include DNA unwinding, DNA replication initiation and DNA elongation is presented in Fig. 4 as transcripts found in this network are those that show the least amount of individual gene variability. Conversely, some of the most unstable gene networks included those involved in metabolism (e.g. lipids, glucose, fatty acids, steroids) transmission of nerve impulse, calcium transport and regulation, and blood flow/vascularization. Transcripts with a role in different metabolic processes such as glucose metabolism, cholesterol, lipid, steroid, fatty acid, and carbohydrate, were preferentially located in the bottom of the ranked list (60–100%) (Appendix 3). Thus, genes involved in these processes tend to show more variability across individuals. Randomizations were also conducted on selected biological processes to complement SNEA and showed correspondence for many cell processes (Supplemental Table 3). For example, translation, protein folding, DNA repair were more stable than expected by chance while calcium regulation, response to hypoxia, and neuron migration were more variable than expected by chance. Carbohydrate metabolic processes, lipid metabolic process showed moderate variability based upon randomizations of gene ontology terms. 3.3. Real-time PCR Thirteen genes (+ two reference genes) were assessed for individual variability using real-time PCR. Regression analysis was conducted on ranked CV scores for genes measured with both microarray and realtime PCR to determine if the microarray meta-analysis could be used to guide decisions on sample sizes for real-time PCR. Regression analysis showed that there was a significant correlation for variability measured with real-time PCR and variability measured with microarrays (d.f. = 1; R2 = 0.38; p = 0.03) (Fig. 5). The two genes that showed the largest deviation from the regression line were esrba and esrbb. 3.4. Power analyses and sample size variation for FHM target genes
Fig. 3. Venn diagram depicting the overlap in the sub-networks found within the top 20% (0–20%) and bottom 20% (80–100%) of the gene list ranked by overall CVs. Also included are gene probes that fall within the following categories: 20–40%, 40–60%, and 60–80%.
A power analysis for reproduction-related genes was performed to determine the sample sizes needed to detect significant changes in specific genes at a power = 0.8 (Table 2). The most stable gene, ribosomal protein L24 required a very small sample size (n = 2–4) to detect 2-fold change in expression. For genes of interest in reproductive biology and
288
C.J. Martyniuk, J. Houlahan / Comparative Biochemistry and Physiology, Part D 8 (2013) 283–291
Table 1 Some examples of cell processes enriched within each percent category. Provided are the number of entities within a network, the overlap of measured entities with a network, and the percent overlap of measured entities compared to total number of entities. A complete list is provided in Appendix 1. Category of ranks
Cell process
Total # of entities
Overlap
Percent overlap
p-value
0–20%
Regulation of translational fidelity Ribosome biogenesis and assembly Poly(A) + mRNA-nucleus export Nucleotide-excision repair RNA metabolism Spliceosome assembly Protein deubiquitination ER to Golgi transport histone methylation Transcription-coupled nucleotide-excision repair mRNA metabolism Nucleotide biosynthesis Response to oxidative stress mRNA degradation Mitotic checkpoint Intraflagellar transport Secretory pathway Strand invasion Non-selective vesicle fusion Vesicle-mediated transport Sister chromatid cohesion Plasma membrane fusion Vacuole organization and biogenesis Polymerase III transcription Heme biosynthesis Cholesterol export M phase Cilium biogenesis Synaptic vesicle exocytosis DNA degradation Epithelial cell differentiation Myogenesis Lipid storage Microtubule depolymerization Cation transport Granulosa cell function Iron ion homeostasis Triacylglycerols biosynthesis Photoreceptor cell differentiation Granulosa cell differentiation Hexose transport Hair follicle morphogenesis Glycogenesis Calcium ion homeostasis Luteolysis Endothelial cell function Ca++ export Blood-retinal barrier Baroreflex Luteinization Fluid transport Response to light Motor behavior Chloride ion homeostasis Hemostasis Osteocyte function Calcium mobilization Capillary morphogenesis Peripheral nerve function Male gonad development Parturition Habituation Artery remodeling
30 209 116 191 117 74 78 89 107 68 56 40 228 254 107 53 263 44 187 217 83 139 70 85 66 214 316 120 70 695 181 575 103 65 45 49 80 71 61 65 42 43 38 1223 135 467 805 62 129 193 104 41 57 151 217 162 789 67 225 138 226 67 103
19 51 35 42 28 20 19 20 22 16 14 10 34 37 27 14 42 12 31 34 16 23 14 16 13 35 46 21 14 84 27 98 22 14 10 11 19 17 15 17 11 11 10 147 25 60 87 13 21 28 18 10 12 23 30 24 84 13 30 20 29 12 16
61 24 29 21 23 26 24 22 20 23 24 24 14 14 25 25 15 26 16 15 19 16 19 18 19 16 14 17 19 12 14 17 21 21 21 22 23 23 24 25 25 25 25 12 18 12 10 20 16 14 17 23 20 15 13 14 10 19 13 14 12 17 15
8.81E−13 7.12E−11 1.67E−10 8.48E−08 2.20E−06 9.52E−06 7.48E−05 1.58E−04 2.89E−04 4.21E−04 5.11E−04 3.32E−03 3.57E−03 3.63E−03 1.22E−06 2.93E−04 3.43E−04 5.93E−04 1.07E−03 1.69E−03 3.96E−03 4.65E−03 4.94E−03 5.03E−03 7.61E−03 6.58E−04 1.37E−03 3.41E−03 4.88E−03 6.27E−03 9.19E−03 4.95E−11 6.83E−05 1.34E−03 5.23E−03 3.15E−03 4.80E−05 1.08E−04 2.03E−04 3.31E−05 8.50E−04 1.05E−03 1.42E−03 3.14E−08 4.03E−05 7.20E−05 8.93E−04 9.09E−04 9.78E−04 9.95E−04 1.09E−03 1.10E−03 1.38E−03 1.44E−03 1.46E−03 1.68E−03 1.68E−03 1.90E−03 2.55E−03 5.10E−03 5.18E−03 5.52E−03 6.08E−03
20–40%
40–60%
60–80%
80–100%
toxicology, smaller sample sizes (n = 9–10) are needed to detect 1.5fold differences in cytochrome P450 aromatase (cyp19a1), and esrba compared to steroidogenic acute regulatory protein (star), p450ssc, and esrbb (n = 15–50) depending on the tissue. In many cases, samples sizes over 10 were required to detect smaller differences at 1.5-fold compared to sample sizes of 4–5 for a 2-fold change in a transcript.
4. Discussion 4.1. Variability in the FHM transcriptome This is the first systematic assessment of variability in the FHM transcriptome on a global scale. Reproductive endpoints such as gonad
C.J. Martyniuk, J. Houlahan / Comparative Biochemistry and Physiology, Part D 8 (2013) 283–291
289
Fig. 4. A sub-network that contains members that shows low individual variation in mRNA abundance. This gene network includes DNA unwinding, DNA replication initiation, and DNA elongation.
weight, gonadosomatic index, vitellogenin, sex steroids (estrogen and testosterone) (E2 and T), fecundity, average spawn as well as liver weight and hepatosomatic index have been assessed previously in FHM for variability (Watanabe et al., 2007). Reproductive endpoints in females that showed higher variability included E2 (CV = 0.66), T (CV = 0.74), and fecundity (CV = 0.7) while those endpoints with lower variability included body weight, liver weight and liver somatic index (CV = 0.21, 0.31. 0.33 resp.). In male FHMs, CV for plasma steroid levels (E2, T, and 11KT) ranged from = 0.50–0.70 while GSI and nuptial tubercle scores showed lower variability (CV =~ 0.30). Both male and female FHMs showed comparable variability in GSI. Therefore, variability in FHM mRNA levels appears comparable to other endpoints measured in ecotoxicology (CVs in the 0.4–0.6 range). However, it is pointed out that each transcript will have its own inherent variability depending upon tissue-specific regulation and sex and caution must be taken when comparing absolute variability between morphometric and molecular data. Biales et al. (2007) investigated the biological variability in
Fig. 5. Regression analysis for the coefficient of variation for genes measured with both microarray and real-time PCR.
the transcriptional responses to genes after a 48 h dose–response exposure to the pharmaceutical estrogen, 17-α ethinylestradiol (EE2). The authors report that commonly utilized reference genes for real-time PCR that included ribosomal proteins, elongation factor 1 α (ef1a) and glyceraldehyde phosphate dehydrogenase (gapdh) did not have the same magnitude of individual variation in the liver (CV =15 to 50%) compared to other transcripts, such as those related to reproduction (vtg and esr1). Esr1 showed a variable response (CV = 50%) in individuals in response to 10 ng EE2/L exposure while vtg mRNA showed higher individual variation in response to EE2 treatments (CV = ~100%). In our analysis, vtg was located in the bottom 20% of the ranked gene list (rank = 13036) as well as the isoform vtg3 (rank = 13189). Interestingly, the variation in vtg mRNA levels appears similar to that which has been reported for female FHM plasma Vtg (CV = 0.5) and male FHM plasma Vtg (CV = 1.03) (Watanabe et al., 2007). However, this variability may fluctuate when individuals are exposed to estrogenic exposures or other endocrine disruptors. The least variable genes identified here (e.g. basic transcription factor 3, ribosomal protein L36, solute carrier family 25 α, member 5, nascent polypeptide-associated complex α subunit, and 60S ribosomal protein L17) are good candidates for housekeeping genes in real-time PCR experiments. Guidelines for minimum information for publication of quantitative real-time PCR experiments (MIQE) suggest that normalizer genes used in real-time PCR experiments must be properly assessed and show stability using a geometric mean to provide a more accurate, nonvariable baseline for the assessment of target genes (Bustin et al., 2010; Taylor et al., 2010). To achieve this, combinations of normalizers are now used to achieve a stable baseline and the transcripts identified in this study as having low variability, regardless of the tissue, can offer increased stability. Many studies have assessed the appropriateness of different genes in real-time PCR experiments in fish (Filby and Tyler, 2007; McCurley and Callard, 2008; Mitter et al., 2009; Su et al., 2011; Tang et al., 2012) and the consensus is that ribosomal genes, genes involved in the translational process (i.e. elongation factor, ef1a), and structural genes (i.e. beta-actin) can be good housekeeping genes. Not
290
C.J. Martyniuk, J. Houlahan / Comparative Biochemistry and Physiology, Part D 8 (2013) 283–291
Table 2 Some examples of recommended sample sizes in tissues (L = liver, O = ovary, T = testis) for genes related to reproduction. Ribosomal protein L24 ranked number 1 (least variable) in our meta-analysis and is listed here as a reference. Critical effects sizes are 50, 75, and 100% and this corresponds to a fold change of 1.5, 1.75, and 2-fold respectively. Probe ID
NR hit def
NR hit accession
Tissue
50%
75%
100%
UF_Ppr_AF_107061 UF_Ppr_AF_111986 UF_Ppr_AF_107598 UF_Ppr_AF_117841 UF_Ppr_AF_109144 UF_Ppr_AF_107281 UF_Ppr_AF_117096 UF_Ppr_AF_104102
Androgen receptor (ar) Cytochrome P450 aromatase (cyp19a1) Cytochrome P450 cholesterol side chain cleavage (p450ssc) Estrogen receptor alpha (esr1) Estrogen receptor beta 1 (esrba) Estrogen receptor beta 2 (esrbb) Ribosomal protein L24 (ribl24) Steroidogenic acute regulatory protein (star)
AAF88138 CAC38767 ABC87917 AAU87498 AAT45195 AAY21014 AAH59530 NP_571738+
L, O, T O, T O, T L, O, T O, T O, T L, O, T O, T
3, 14, 20 19, 4 9, 8 15, 10, 10 9, 4 12, 48 6, 13, 3 17, 15
3, 7, 10 9, 3 5, 4 7, 5, 5 5, 3 6, 22 4, 7, 3 8, 7
3, 4, 6 6, 3 3, 3 3, 3, 4 3, 3 4, 13 3, 4, 3 5, 5
surprisingly, microarray data suggest that ribosomal protein S18 (rank 15), ef1a (rank 55), and gapdh (rank 490) are transcripts that show low variability across tissues. Based on a meta-analysis of human clinical sample microarrays, the biological process of gene expression (GO:0010467) was determined to be the most stable gene ontology category over 13 organ/tissue types that included heart, muscle, kidney, breast, and gonad (Cheng et al., 2011). In the same study, biological processes that showed stability and contained genes frequently used for normalization included those involved in translation, RNA processing, transcription, and morphogenesis. Consistent with data presented here for FHMs, many genes (e.g. ribosomes, actins) were shown to be stable within and among physiological states and transcripts involved in protein translation are more likely to be stably expressed across physiological states and organ/tissue types in both mammals and fish.
4.2. Variable and stable genes and pathways in the FHM SNEA was performed on the 5 percentage categories of the ranked data (by CV) to determine if there were specific cell processes within a category that showed statistically significant over-representation. Randomizations of the gene ontology data confirmed the major network themes identified by SNEA, and a number of biological processes could be characterized as particularly stable or unstable based upon the overall variability of expression of individual genes within the network. Some of the major pathways that showed low variation in gene expression fell under the general biological processes of transcription and translation (e.g. mRNA processing, rRNA processing, regulation of translational fidelity, translation initiation), and cell cycle. Cell processes found to involve genes with high variation included those related to neural signaling, calcium regulation, and blood flow. SNEA and randomization of gene ontology terms suggested that genes involved in metabolism, for example, carbohydrate metabolic process, lipid metabolic process, and glucose metabolism showed a relatively high level of individual variability. Perhaps the most well studied process in terms of mRNA variability in teleost fish is metabolism. Oleksiak et al. (2005) investigated variation in cardiac gene expression among individuals in the mummichog and found that 94% of the genes involved in metabolism were determined to be significantly variable among individuals. Moreover, 84% of these genes were significantly different across populations of mummichog. Oleksiak et al. (2002) quantified variation in gene expression within and among natural populations of mummichog and plotted frequency distributions of range and mRNA levels among individuals within different populations. Distinct populations showed unique features of the frequency histograms and there were significant differences in mRNA levels in approximately 18% of the transcriptome. In another study in mummichog, Whitehead and Crawford (2005) investigated 192 genes involved in central metabolic pathways. The authors found that nearly half of the genes (approximately 48%) were differentially expressed among individuals and that there was a 1.2 to 5-fold difference in the mRNA levels of metabolic genes. Therefore, there can be substantial individual variation for transcripts
that are related to some ubiquitous functional processes such as metabolism. 4.3. Recommended samples sizes for reproductive genes Our power analyses of genes involved in reproduction and steroidogenesis suggests that a sample size of 12–20 individuals is required to detect a 50% change, depending on the tissue and gene, with a recommended power of 0.8. Overall, transcripts such as p450ssc, and esrba showed a lower amount of variability in the FHM compared to esr1, esrbb, star, and cyp19a1. Our analysis suggested that genes related to steroidogenesis and steroid signaling had CVs comparable to other reproductive endpoints used in ecotoxicology (Watanabe et al., 2007). Power analyses for other genes involved in reproduction have been previously conducted and it has been recommended that the required sample size to measure vtg mRNA was 24 individuals in order to discriminate between a control group and a low concentration of an estrogen (Biales et al., 2007). The authors point out that this is more than the samples typically processed in real-time PCR experiments investigating estrogens. Studies have previously investigated statistical power associated with reproductive toxicity testing in small-bodied fish. It has been reported that many toxicological reproductive bioassays do not have sufficient power to detect changes less than 40 to 50% in endpoints such as egg production (Melvin et al., 2009). This may also be true for select genes in reproduction and many studies have insufficient power to detect a 50–75% change in a transcript level. Many microarray and real-time PCR studies in ecotoxicology often use fewer than 12 individuals/treatment. Based upon our estimates, a 50% change in the expression of star or esrbb will not be detected with most experimental designs because of the high variability in expression. Moreover, many transcripts vary significantly with oocyte maturation in female teleosts (Sampath Kumar et al., 2000; Villeneuve et al., 2010; Martyniuk et al., 2013). Therefore, without precise histological characterization of individual female ovaries, it becomes increasingly difficult to detect small changes in mRNA abundance for reproductive genes due to a chemical treatment, unless chemicals that directly target the pathway are used in the experiments (e.g. fadrozole to inhibit Cyp19a1). To better address this, we suggest that researchers (1) perform rigorous histological analysis in gonads before conducting gene expression analysis and (2) consider the samples sizes required for specific reproductive-related genes in real-time PCR experiments. We observed a high correspondence for mRNA variability assessed with both microarrays and real-time PCR, thus variability characterized in this microarray meta-analysis can be useful for guiding decisions regarding sample sizes in real-time PCR experiments. An important last point to make is that microarrays will measure high abundant genes with more accuracy than low expressed genes (i.e. signal to noise ratio is higher for low abundant transcripts). Therefore, researchers should be aware of the relative abundance of transcripts and level of expression for a gene of interest when conducting microarray/RNA-seq and real-time PCR experiments. It is expected that low expressed genes will require increased sample sizes in order to detect smaller fold changes.
C.J. Martyniuk, J. Houlahan / Comparative Biochemistry and Physiology, Part D 8 (2013) 283–291
5. Conclusions This study provides fundamental information regarding gene variability that can be used to support ecotoxicology-based experiments. Genes that have low within treatment variability may be better candidates for bioindicators of environmental effects, as a perturbation may be interpreted as a serious problem. Further characterization of the variability in the teleost transcriptome will improve understanding regarding global gene expression changes to toxicants by separating out the noise from the biological response. The need to characterize individual variation in gene expression patterns and transcriptomes has been pointed out by others studying small bodied fish (Crawford and Oleksiak, 2007). Whitehead and Crawford (2005) also suggest in their investigations with mummichog that researchers must consider variability among individuals in order to accurately identify differentially expressed genes in response to chemical and abiotic stressors. These principles will also be the same for other omics approaches in ecotoxicology, such as environmental proteomics (Rees et al., 2011) and metabolomics. Characterizing individual variability at the molecular level will prove necessary as efforts are made toward integrating molecular tools into environmental risk assessments. Supplementary data to this article can be found online at http://dx. doi.org/10.1016/j.cbd.2013.08.002. Acknowledgments The authors thank Y. Chishti, J. Loughery, and A. Ornostay for sharing microarray data. This research was funded by a Canada Research Chair to CJM and NSERC Discovery Grant to CJM and JH. We have no conflict of interest to declare. References Biales, A.D., Bencic, D.C., Flick, R.W., Lazorchak, J., Lattier, D.L., 2007. Quantification and associated variability of induced vitellogenin gene transcripts in fathead minnow (Pimephales promelas) by quantitative real-time polymerase chain reaction assay. Environ. Toxicol. Chem. 26, 287–296. Bosker, T., Munkittrick, K.R., MacLatchy, D.L., 2009. Challenges in current adult fish laboratory reproductive tests: suggestions for refinement using a mummichog (Fundulus heteroclitus) case study. Environ. Toxicol. Chem. 28, 2386–2396. Bustin, S.A., Beaulieu, J.F., Huggett, J., Jaggi, R., Kibenge, F.S., Olsvik, P.A., Penning, L.C., Toegel, S., 2010. MIQE précis: practical implementation of minimum standard guidelines for fluorescence-based quantitative real-time PCR experiments. BMC Mol. Biol. 11, 74. Cheng, W.C., Chang, C.W., Chen, C.R., Tsai, M.L., Shu, W.Y., Li, C.Y., Hsu, I.C., 2011. Identification of reference genes across physiological states for qRT-PCR through microarray meta-analysis. PLoS One 6 (2), e17347. Chishti, Y.Z., Feswick, A., Munkittrick, K.R., Martyniuk, C.J., 2013. Transcriptomic profiling of progesterone action in the male fathead minnow (Pimephales promelas) testis. Gen. Comp. Endocrinol. (in press). Crawford, D.L., Oleksiak, M.F., 2007. The biological importance of measuring individual variation. J. Exp. Biol. 210, 1613–1621. Dupont, W.D., Plummer, W.D., 1998. Power and sample size calculations for studies involving linear regression. Control. Clin. Trials 1, 589–601. Fan, S., Elmer, K.R., Meyer, A., 2012. Genomics of adaptation and speciation in cichlid fishes: recent advances and analyses in African and neotropical lineages. Philos. Trans. R. Soc. Lond. B Biol. Sci. 367, 385–394. Filby, A.L., Tyler, C.R., 2007. Appropriate ‘housekeeping’ genes for use in expression profiling the effects of environmental estrogens in fish. BMC Mol. Biol. 8, 10. Langlois, V.S., Martyniuk, C.J., 2013. Genome wide analysis of Silurana (Xenopus) tropicalis development reveals dynamic expression using network enrichment analysis. Mech. Dev. 130, 304–322. Manly, B.F.J., 2006. Randomization, Bootstrap and Monte Carlo Methods, Third edition. Chapman and Hall/CRC, Florida USA.
291
Martyniuk, C.J., Doperalski, N.J., Kroll, K.J., Barber, D.S., Denslow, N.D., 2012. Sexually dimorphic transcriptomic responses in the teleostean hypothalamus: a case study with the organochlorine pesticide dieldrin. Neurotoxicology 34C, 105–117. Martyniuk, C.J., Prucha, M.S., Doperalski, N.J., Antczak, P., Kroll, K.J., Falciani, F., Barber, D.S., Denslow, N.D., 2013. Gene expression networks underlying ovarian development in wild largemouth bass (Micropterus salmoides). PLoS One 8, e59093. McCurley, A.T., Callard, G.V., 2008. Characterization of housekeeping genes in zebrafish: male–female differences and effects of tissue type, developmental stage and chemical treatment. BMC Mol. Biol. 9, 102. Mehinto, A.C., Martyniuk, C.J., Spade, D.J., Denslow, N.D., 2012. Applications for nextgeneration sequencing in fish ecotoxicogenomics. Front Genet. 3, 62. Melvin, S.D., Munkittrick, K.R., Bosker, T., MacLatchy, D.L., 2009. Detectable effect size and bioassay power of mummichog (Fundulus heteroclitus) and fathead minnow (Pimephales promelas) adult reproductive tests. Environ. Toxicol. Chem. 28, 2416–2425. Mitter, K., Kotoulas, G., Magoulas, A., Mulero, V., Sepulcre, P., Figueras, A., Novoa, B., Sarropoulou, E., 2009. Evaluation of candidate reference genes for QPCR during ontogenesis and of immune-relevant tissues of European seabass (Dicentrarchus labrax). Comp. Biochem. Physiol. B Biochem. Mol. Biol. 153, 340–347. Oleksiak, M.F., Churchill, G.A., Crawford, D.L., 2002. Variation in gene expression within and among natural populations. Nat. Genet. 32, 261–266. Oleksiak, M.F., Roach, J.L., Crawford, D.L., 2005. Natural variation in cardiac metabolism and gene expression in Fundulus heteroclitus. Nat. Genet. 37, 67–72. Oleksiak, M.F., Karchner, S.I., Jenny, M.J., Franks, D.G., Welch, D.B., Hahn, M.E., 2011. Transcriptomic assessment of resistance to effects of an aryl hydrocarbon receptor (AHR) agonist in embryos of Atlantic killifish (Fundulus heteroclitus) from a marine Superfund site. BMC Genomics 12, 263. Ornostay, A., Cowie, A.M., Hindle, M., Baker, C.J.O., Martyniuk, C.J., 2013. Classifying chemical mode of action using gene networks and machine learning: a case study with the herbicide linuron. Comp. Biochem. Physiol. Part D Genomics Proteomics (in press). Rees, B.B., Andacht, T., Skripnikova, E., Crawford, D.L., 2011. Population proteomics: quantitative variation within and among populations in cardiac protein expression. Mol. Biol. Evol. 28, 1271–1279. Sampath Kumar, R., Ijiri, S., Trant, J.M., 2000. Changes in the expression of genes encoding steroidogenic enzymes in the channel catfish (Ictalurus punctatus) ovary throughout a reproductive cycle. Biol. Reprod. 63, 1676–1682. Skolness, S.Y., Durhan, E.J., Garcia-Reyero, N., Jensen, K.M., Kahl, M.D., Makynen, E.A., Martinovic-Weigelt, D., Perkins, E., Villeneuve, D.L., Ankley, G.T., 2011. Effects of a short-term exposure to the fungicide prochloraz on endocrine function and gene expression in female fathead minnows (Pimephales promelas). Aquat. Toxicol. 103, 170–178. Su, J., Zhang, R., Dong, J., Yang, C., 2011. Evaluation of internal control genes for qRT-PCR normalization in tissues and cell culture for antiviral studies of grass carp (Ctenopharyngodon idella). Fish Shellfish Immunol. 30, 830–835. Tang, Y.K., Yu, J.H., Xu, P., Li, J.L., Li, H.X., Ren, H.T., 2012. Identification of housekeeping genes suitable for gene expression analysis in Jian carp (Cyprinus carpio var. Jian). Fish Shellfish Immunol. 33, 775–779. Taylor, S., Wakem, M., Dijkman, G., Alsarraj, M., Nguyen, M., 2010. A practical approach to RT-qPCR—publishing data that conform to the MIQE guidelines. Methods 50, S1–S5. Vandersteen Tymchuk, W., O'Reilly, P., Bittman, J., Macdonald, D., Schulte, P., 2010. Conservation genomics of Atlantic salmon: variation in gene expression between and within regions of the Bay of Fundy. Mol. Ecol. 19, 1842–1859. Vidal-Dorsch, D.E., Colli Dula, C., Bay, S., Greenstein, D.J., Wiborg, L., Petschauer, D., Denslow, N.D., 2013. Gene expression of fathead minnows exposed to two types of treated municipal wastewater effluents. Environ. Sci. Technol. (in press). Villeneuve, D.L., Garcia-Reyero, N., Martinović, D., Cavallin, J.E., Mueller, N.D., Wehmas, L.C., Kahl, M.D., Linnum, A.L., Perkins, E.J., Ankley, G.T., 2010. Influence of ovarian stage on transcript profiles in fathead minnow (Pimephales promelas) ovary tissue. Aquat. Toxicol. 98 (4), 354–366. Villeneuve, D.L., Garcia-Reyero, N., Escalon, B.L., Jensen, K.M., Cavallin, J.E., Makynen, E.A., Durhan, E.J., Kahl, M.D., Thomas, L.M., Perkins, E.J., Ankley, G.T., 2012. Ecotoxicogenomics to support ecological risk assessment: a case study with bisphenol A in fish. Environ. Sci. Technol. 46, 51–59. Watanabe, K.H., Jensen, K.M., Orlando, E.F., Ankley, G.T., 2007. What is normal? A characterization of the values and variability in reproductive endpoints of the fathead minnow, Pimephales promelas. Comp. Biochem. Physiol. C Toxicol. Pharmacol. 146, 348–356. Weil, R.E., Spade, D.J., Knoebl, I., Hemming, J.M., Tongue, M.L., Szabo, N.J., Kroll, K.J., Tate, W.B., Denslow, N.D., 2012. Evaluation of water quality threats to the endangered Okaloosa darter (Etheostoma okaloosae) in East Turkey Creek on Eglin Air Force Base. Aquat. Toxicol. 110–111, 177–186. Whitehead, A., Crawford, D.L., 2005. Variation in tissue-specific gene expression among natural populations. Genome Biol. 6, R13. Whitehead, A., Pilcher, W., Champlin, D., Nacci, D., 2012. Common mechanism underlies repeated evolution of extreme pollution tolerance. Proc. Biol. Sci. 279 (1728), 427–433.