Application of functional genomic technologies in a mouse model of retinal degeneration


Genomics 85 (2005) 309 – 321 www.elsevier.com/locate/ygeno

Jeffrey R. Shearstone a,*, Yang E. Wang b, Amanda Clement a, Normand E. Allaire a, Chunhua Yang c, Dane S. Worley c, John P. Carulli a, Steven Perrin a,*

a Research Molecular Discovery, Biogen Idec, Inc., 14 Cambridge Center, Cambridge, MA 02142, USA
b Research Computational Biology, Biogen Idec, Inc., 14 Cambridge Center, Cambridge, MA 02142, USA
c Research Neurodegeneration, Biogen Idec, Inc., 14 Cambridge Center, Cambridge, MA 02142, USA

Received 22 July 2004; accepted 1 November 2004 Available online 22 December 2004

Abstract

Generation of tissue-specific, normalized and subtracted cDNA libraries has the potential to characterize the expression of rare transcriptional units not represented on Affymetrix GeneChips. Initial sequence analysis of our murine cDNA clone collections showed that as many as 86, 45, and 30% of clones are not represented on the Affymetrix Mu11k, MG-U74, and MG-430 chip sets, respectively. A detailed study that compared EST sequences of a subtracted library generated from mouse retina to those of MG-430 consensus sequences was undertaken, using UniGene build 124 as the common reference. A set of 1111 nonredundant transcript regions, not represented on the commercial array, was identified. These clusters were used as the primary filter for analyzing a data set produced by assaying samples from the Pde6b rd1 mouse model of retinal degeneration on a 12,325-feature retinal cDNA microarray. QRT-PCR validated eight unique transcripts identified by microarray. Seven of the transcripts showed retina-specific expression. Full-length cloning strategies were applied to two of the ESTs. The genes discovered by this approach are the full-length mouse homologue of guanylate cyclase 2F (GUCY2F) and a carboxy-truncated splice variant of retinal S-antigen (SAG), known regulators of the visual phototransduction G-protein-coupled receptor-mediated signaling pathway. These sequences have been assigned GenBank Accession Nos. AY651761 and AY651760, respectively.

© 2004 Elsevier Inc. All rights reserved.

Keywords: Affymetrix GeneChip; Microarray; cDNA array; Subtracted library; Retinal degeneration; Photoreceptor; SAG; GUCY2F; Splice variant; Retinitis pigmentosa

$ Sequence data from this article have been deposited with the GenBank Data Library under Accession Nos. AY651760 and AY651761.
* Corresponding authors. Fax: +1 617 679 3208. E-mail addresses: [email protected] (J.R. Shearstone), [email protected] (S. Perrin).
0888-7543/$ - see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.ygeno.2004.11.001

The continuous maturation of commercially available transcription profiling technologies has brought forth dramatic increases in the number of messages monitored per assay, while simultaneously improving data reproducibility, platform sensitivity, and sample throughput. These increases have coincided with both a decrease in the amount of sample material needed per labeling reaction and a reduction of the total cost per assay. In addition, the completion of the mouse and human genomes has provided the necessary scaffolding upon which to align and orient EST information, gene prediction models, and known coding sequence, thereby facilitating array design and improving transcript representation. However, commercial technologies are not without their drawbacks, particularly when viewed in the context of gene content. The mouse UniGene build 107 contains 84,459 sequence clusters, of which only 34,323 meet the stringent design criteria of the most current Affymetrix MG-430 chip set, leaving over 50,000 potential transcripts or transcript variants without representation [1,2]. While some of these EST clusters may be merely artifacts of library construction, it is likely that many represent unique, tissue-specific, or low-abundance transcriptional units.

Microarrays constructed from tissue-specific cDNA libraries have demonstrated their utility in a number of organisms [3–12]. Similarly, tissue-specific in silico mining of the public EST database has proven fruitful when taken from computer to lab bench [13–19]. Therefore, proponents of custom microarrays would argue that generation of high-quality normalized and subtracted cDNA libraries from underrepresented tissues, when coupled to microarray technology, has the potential to characterize the expression of rare mRNA species, novel alternatively spliced transcripts, or novel genes not identified by current gene prediction programs or not represented on Affymetrix GeneChips [20–23]. However, custom microarray fabrication of this nature requires significant expertise to develop, a large number of personnel to implement, and constant effort to maintain if it is to produce data equal in quality to those of a GeneChip, which is simply purchased as a reagent.

The recent improvements in the transcript content of Affymetrix arrays and the known advantages of tissue-specific microarrays were sufficient to provoke discussions ultimately leading to experiments to investigate the value of fabricating custom-spotted cDNA arrays within our laboratory. We first compared a randomized selection of sequences from three murine cDNA libraries to the Affymetrix chip sets developed over the past 4 years, to quantify the improvements made with each new release. Next, a detailed study was undertaken, comparing EST sequences of a subtracted library generated from mouse retina to those of the Affymetrix MG-430 chip set consensus sequences, using UniGene build 124 as the common reference [18]. We identified a set of unique transcript regions, not represented on the commercial array, which was then applied as the primary filter for a custom retinal cDNA microarray experiment using the Pde6b rd1 mouse model of retinal degeneration. We identified eight transcripts that are unique to our microarray, are differentially expressed in the disease model, and were validated via quantitative real-time PCR. Full-length cDNA sequences were cloned from two of these ESTs. The genes discovered by this approach are the full-length mouse homologue of guanylate cyclase 2F (GUCY2F) and a carboxy-truncated variant of retinal S-antigen (SAG), known regulators of the visual phototransduction G-protein-coupled receptor-mediated signaling pathway.

Results and discussion

Preliminary BLAST alignments of custom cDNA libraries to Affymetrix GeneChips

We began our study by comparing a random sampling of three murine cDNA libraries, generated as part of the Brain Molecular Anatomy Project (BMAP), against the Affymetrix Mu11k (release date April 1999), MG-U74 (release date August 2001), and MG-430 (release date March 2003) chip sets [18,24]. Our three clone collections are referred to as the BMAP1, BMAP2, and retinal sets. The Mu11k chip set represented only 31, 14, and 25% of the content contained within the BMAP1, BMAP2, and retinal libraries, respectively. The MG-U74 chip set showed a marked improvement over its predecessor, representing 84, 55, and 62% of the BMAP1, BMAP2, and retinal libraries. For the BMAP1 library, the MG-430 chip set showed only a 1% gain in coverage over that of the MG-U74. In contrast, the MG-430 chip set improved sequence coverage of the BMAP2 and retinal libraries to 70 and 78%, respectively (Fig. 1).

Fig. 1. Custom cDNA library representation on Affymetrix GeneChips. GenBank sequences for five sets of 100 random clones in the BMAP1, BMAP2, and retinal cDNA libraries were queried against the Mu11k exemplar (black bars), MG-U74v2 consensus (gray bars), or MG-430 consensus (white bars) sequences using the NetAffx Web site. A cDNA clone sequence with no hit to the subject sequence database was considered absent from the Affymetrix GeneChip.

Differences seen across clone collections with respect to an individual chip set could be attributed to numerous factors involved with library construction, including the type of source tissues, sequencing quality of the library, or normalization and subtraction steps applied during synthesis. The BMAP1 collection, which contains 26.9% of clones from subtracted libraries, contained only 15% of ESTs that would be considered unique relative to the MG-430 chip set. By comparison, the retinal collection, in which 73.7% of clones are derived from subtracted sources, showed 22% of clones to be unique compared to the same chip set. In the BMAP2 collection, the trend was consistent, with 89.1% of cDNAs derived from subtracted libraries, resulting in 30% uniqueness versus the commercial array. These data support numerous reports in the literature documenting the virtues of subtracted libraries to yield novel cDNAs [17,18,25,26].

Further analysis demonstrated that library source tissue is a contributing factor in determining uniqueness. For example, the BMAP2 library contains 23.4% of clones from retina and the remaining clones from brain tissues. However, looking at sequences that are absent from the MG-430 chip reveals that 30.2% are from retinal sources. This enrichment occurs despite the fact that 96% of these brain-derived clones are from subtracted libraries, while only 65% of the retina-derived clones are from subtracted libraries. Furthermore, in the BMAP1 collection, 6.7 and 6.6% of all clones are derived from hypothalamus and prefrontal cortex, respectively. When looking only at clones not represented on the MG-430 chip, we find an enrichment of these tissues to 12.1 and 10.3%. Conversely, olfactory bulb and amygdala represent 7.7 and 5.8% of the BMAP1 library at large, while comprising only 3.4 and 1.7% of the unique clones. Therefore, the ability of Affymetrix to represent accurately all transcripts of a biological system is dependent on the tissue and subtissues used to construct the library, as well as the subtraction steps.

Differences across chip sets with respect to an individual clone collection reveal the significant advances made in the public sequence space, the algorithms that assemble this sequence into UniGene clusters, Affymetrix's refinement of target sequence and probe selection, and continued reduction in the feature size of oligonucleotide probes. The Mu11k chip set contained 11,000 full-length genes and EST clusters and used UniGene build 4 as the primary sequence source. The next chip release, MG-U74, increased the coverage of BMAP2 and retinal collections by 40% and coverage of BMAP1 by 53%. This dramatic improvement reflects the use of UniGene build 74 and the chip set's ability to represent 36,000 full-length genes and EST clusters, a threefold increase in probe density over the Mu11k. The most recent chip set, MG-430, improved representation of the BMAP2 and retinal libraries by an additional 15%. The MG-430 chip set featured significant improvements in design, particularly with regard to sequence selection, which now includes RefSeq and dbEST sources, as well as the updated UniGene build 107 and GenBank databases. Additionally, a draft assembly of the mouse genome was utilized to improve cDNA sequence orientation and annotation [1,2,27].
In contrast to our data, a similar study using a subtracted library showed no significant improvements in representation between the human Affymetrix U95 and U133 chip sets, perhaps demonstrating the maturity of human sequence space over that of the mouse [28]. The substantial increases in BMAP2 and retinal sequence representation by each new generation of commercial arrays suggest that future designs will continue to improve representation of our tissue-specific, subtracted cDNA libraries. We anticipate that these improvements will be largely driven by further reduction of feature size, thereby allowing more UniGene clusters to be represented per chip set. Currently the MG-430 chip set contains 34,323 of the 84,459 possible UniGene clusters, representing only 40% of all known and hypothetical transcription units. Despite the obvious improvements in Affymetrix's ability to represent the public transcript space, our initial studies led us to believe that commercial arrays lack significant numbers of previously reported transcriptional units.
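The 40% figure follows directly from the two cluster counts quoted above. As a minimal check of that arithmetic, the sketch below recomputes the percentage and the number of clusters left without probes; the function name is ours and is purely illustrative.

```python
def percent_represented(on_chip: int, total: int) -> float:
    """Return the percentage of clusters that are represented on a chip set."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * on_chip / total


# Counts quoted in the text: clusters on the MG-430 chip set and
# clusters in mouse UniGene build 107.
mg430_clusters = 34_323
unigene_107_clusters = 84_459

coverage = percent_represented(mg430_clusters, unigene_107_clusters)
unrepresented = unigene_107_clusters - mg430_clusters

print(f"{coverage:.1f}% represented; {unrepresented} clusters without probes")
```

The same helper can be applied to any of the library coverage values reported for Fig. 1, with "unique" clones being the complement of the represented fraction.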


Detailed BLAST alignments of the retinal cDNA library to the Affymetrix MG-430 chip set

To understand better the transcript representation of the most current Affymetrix chip set, we undertook a comprehensive comparison of the retinal library sequences to the MG-430 consensus sequences. A similar approach has been utilized to arrive at concordant data sets for the NCI-60 cancer cell lines across two different expression profiling technologies [29]. Our investigation was focused on the retinal collection, rather than BMAP2, because expression profiling using retina-specific tissues has enjoyed considerable success in EST mining [13–16,30], serial analysis of gene expression [31,32], and cDNA microarray-based studies [3,19,33–35]. UniGene build 124 was utilized as a common template upon which the MG-430 consensus sequences or retinal 3′ EST sequences were aligned by BLAST homology searches. Poor alignments, defined as those with an E value greater than 1.00 × 10^-50, were removed. This criterion removed 520 (1.2% of total) and 1077 (9.1% of total) sequences from the initial Affymetrix and retinal submission, respectively (Fig. 2A). The relatively large percentage of clones eliminated from the retinal library reflects inaccurate or abbreviated EST reads, known issues in using single-pass sequencing [36]. A stringent E value was chosen by design, in an effort to control false positive rates in downstream analysis.

Next, UniGene accession numbers represented more than once were removed from each sequence set to yield a group of nonredundant UniGene identifiers. The MG-430A and MG-430B chips contained 13,743 and 18,053 nonredundant UniGene clusters, respectively. Taken together as a single sequence set, the MG-430 represented 29,358 individual UniGene clusters, reflecting the elimination of an additional 2438 Affymetrix qualifiers due to redundancy between chips (Fig. 2A). Similarly, the retinal collection could be parsed to 8318 individual UniGene clusters. Therefore, based on unique UniGene cluster membership, the MG-430 set is 35% internally redundant, while the retinal collection is 30% internally redundant. Our data are in slight disagreement with Affymetrix literature that states the MG-430 chip contains 34,323 UniGene clusters. Discrepancies could be attributed to our use of UniGene build 124 versus build 107 used in MG-430 chip design. Due to the experimental nature of UniGene, the number and membership of clusters is known to fluctuate with algorithmic improvement or submission of new sequences [26,36]. In addition, our decision to assign only a single UniGene cluster to each query may have eliminated Affymetrix consensus sequences that span multiple clusters and would otherwise produce several high-scoring, nonredundant hits.

Fig. 2. Comprehensive comparison of retinal cDNA library to Affymetrix MG-430 GeneChip. UniGene build 124 was utilized as the common template upon which MG-430A, MG-430B, or MG-430 chip set or retinal 3′ EST sequences were aligned. Each MG-430 and retinal sequence received a single UniGene number derived from the UniGene EST producing the highest alignment score. (A) Poor alignments were removed by filtering for E values less than 1.00 × 10^-50. UniGene accession numbers represented more than once were removed from each sequence set to yield a group of unique UniGene identifiers (gray box). (B) Nonredundant UniGene accession numbers were used to determine the intersection of the retinal cDNA and MG-430 chip set sequences.

Nonredundant UniGene accession numbers were used to determine the intersection of MG-430 and retinal sequences. Sequence sets shared 7207 UniGene clusters, while 22,151 clusters were found only on the MG-430. We found that 1111 retinal clones, or 13.3% of the nonredundant set, were absent from the commercial array (Fig. 2B). In concordance with our data, similar studies have shown that 17% of clones from a subtracted library could not be identified as present on the human U133 chip set [28]. In addition, we found that 88.1% of the 1111 clones were derived from subtracted libraries, compared to 73.6% in the entire collection, underscoring the importance of the subtraction step during library construction.
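The E value filter, the redundancy removal, and the set intersection described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used in the study: it assumes each query has already been reduced to its single best-scoring UniGene hit, and the `Hit` record, the function names, the query names, and the `Mm.*` cluster identifiers in the toy data are all invented for the example.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Hit(NamedTuple):
    query_id: str    # chip consensus sequence or EST (placeholder names below)
    unigene_id: str  # cluster of the best-scoring UniGene EST for this query
    e_value: float   # BLAST E value of that best alignment


E_VALUE_CUTOFF = 1e-50  # threshold quoted in the text


def nonredundant_clusters(best_hits: Iterable[Hit],
                          cutoff: float = E_VALUE_CUTOFF) -> Set[str]:
    """Discard poor alignments, then collapse queries that share a cluster."""
    return {h.unigene_id for h in best_hits if h.e_value <= cutoff}


def compare_sets(chip: Set[str], library: Set[str]) -> Dict[str, int]:
    """Count clusters shared by both sets and clusters private to each."""
    return {
        "shared": len(chip & library),
        "chip_only": len(chip - library),
        "library_only": len(library - chip),
    }


# Toy data (hypothetical): two probes hit the same cluster, one probe fails
# the E value filter, and two ESTs collapse onto one library-only cluster.
chip_hits = [
    Hit("probe_a", "Mm.1", 1e-120),
    Hit("probe_b", "Mm.1", 1e-80),
    Hit("probe_c", "Mm.2", 1e-60),
    Hit("probe_d", "Mm.3", 1e-10),
]
library_hits = [
    Hit("est_1", "Mm.2", 1e-90),
    Hit("est_2", "Mm.4", 1e-75),
    Hit("est_3", "Mm.4", 1e-55),
]

chip = nonredundant_clusters(chip_hits)
library = nonredundant_clusters(library_hits)
print(compare_sets(chip, library))

# Consistency check on the counts reported in the text.
reported = {"chip": 29_358, "library": 8_318, "shared": 7_207}
assert reported["chip"] - reported["shared"] == 22_151
assert reported["library"] - reported["shared"] == 1_111
```

Working with sets of cluster identifiers rather than individual sequences is what makes the comparison conservative: several probes or ESTs that fall in one cluster count only once on either side.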
Our preliminary studies predicted that 2609 clones (22% of 11,862) in the retinal collection could not be found in the MG-430 chip set. Taken together, these results suggest that our E value and redundancy criteria were effective in generating a conservative set of retinal cDNA clones primarily derived from subtracted libraries that are not represented by Affymetrix.

We queried the 1111 sequences against the human UniGene database and NCBI h33 assembly of the human genome. We found that 262 of these sequences had BLAST E values of less than 1.00 × 10^-4, of which 60 ESTs showed significant homology to UniGene only, 66 ESTs to the genome only, and 136 ESTs to both databases. Further analysis was conducted by manually querying a randomized subset of 70 clones using the October 2003 build of the University of California Santa Cruz mouse genome browser. BLAT alignments for these sequences were categorized into four bins with the following results: (i) 17.0% of sequences are within 5000 bp of a known 3′ untranslated region (UTR), (ii) 24.0% of ESTs were not adjacent to any known mRNA full-length sequence or sequence fragment, (iii) 8.5% of sequences were homologous to known full-length or partial protein coding regions not covered by the Affymetrix MG-430 chip set, and (iv) 50.5% of clones resided as contiguous regions within the introns of known full-length sequence or mRNA sequence fragments.

Most of the sequences in the first category appear to be 3′ UTR extensions of known mRNA transcripts, although it is plausible that they could represent novel transcript units from the opposing genomic strand. Clones in the second category represent novel gene predictions, putative noncoding RNAs, and yet to be described mouse homologues of human genes. The relatively small number of clones in the third category demonstrates the success of Affymetrix clustering algorithms in identifying and representing traditional protein-coding motifs and previously described mRNA species. Conversely, the large number of clones present in category iv illustrates the limitations of Affymetrix's sequence selection criteria, which do not emphasize putative noncoding RNAs and alternative RNA isoforms of known mRNAs, such as 3′ truncations, novel exons, or splice variants.

Clones that were found to be intronic to known mRNA represent an intriguing majority of the 1111 unique sequences, especially in the context of noncoding RNAs. Virtually all sequences in this category showed no evidence of splicing. No clear pattern was seen with regard to the alignment loci or orientation of these sequences within the known mRNA. Approximately 50% are found intronic to the 5′ UTR or first three exons of the mRNA, 30% are found intronic to the 3′ UTR or last three exons of the mRNA, and the remainder are intronic to a central region of the mRNA. Interestingly, 56% of these directionally cloned sequences were found to be expressed antisense to the known mRNA or mRNA fragment. Transcription units of this nature are not without precedent. A recent study using serial analysis of gene expression identified 21 putative noncoding RNAs dynamically expressed in the developing retina [37].
Our analysis of their data found that approximately 50% of these sequences resided as contiguous regions within the introns of known mRNA or mRNA sequence fragments. In addition, these regions of alignment showed no 3′ UTR or 5′ UTR bias and could be found in both sense and antisense orientations. In a related array-based experiment, probes tiled to human chromosomes 21 and 22 demonstrated that between 44 and 47% of all transcriptional activity is located less than 300 bp distant from the nearest annotated exon [38]. This transcriptional activity was identified in regions throughout the length of the known mRNA loci. However, the use of double-stranded targets precluded the determination of strandedness. Similarly, 311 cases of antisense transcription of nonoverlapping exons, in which one of the transcription units is unspliced, have been described in a detailed analysis of the FANTOM2 cDNA set [39].

While a more comprehensive analysis of these 1111 sequences is outside the focus of this article, the results presented here are consistent with the accumulating evidence suggesting that a significant portion of the transcriptome does not code for protein [31,38–42]. Naturally, array probe selection algorithms that are weighted for protein-coding sequence motifs, such as those of Affymetrix, would overlook many of these noncoding transcriptional units. Tissue-specific and subtracted libraries therefore offer an unbiased route to discovering alternative RNA isoforms of previously annotated genes and novel protein-coding transcripts, as well as noncoding members of the transcriptome.

Identification of unique transcript clusters in a mouse model of retinal degeneration

Retinal degenerative diseases are a genetically diverse group of syndromes characterized by progressive loss of vision due to photoreceptor cell death. The underlying pathology associated with this process is poorly understood, despite a large number of murine mutants that recapitulate the physiology of the human disease. The Pde6b rd1 mouse model of retinal degeneration exhibits an autosomal recessive phenotype caused by a mutation in the gene encoding the β subunit of cGMP phosphodiesterase. The defect results in rapid rod degeneration between postnatal days 10 and 20 and a slower loss of cones over the following months [43,44]. Mutations in the canine and human homologues of this gene are associated with rod/cone dysplasia and retinitis pigmentosa [45,46].

To identify novel or alternatively spliced genes associated with disease pathology, while simultaneously evaluating the ability of Affymetrix GeneChips to capture adequately all transcripts with known biological relevance, a custom microarray containing 12,325 features was fabricated using PCR products from all retinal collection cDNA clones, as well as exogenous, endogenous, and eye-specific controls. RNAs harvested from the retina of diseased Pde6b rd1 and wild-type Pde6b + mice at postnatal day 15 were compared using the microarray. All microarray data reported here represent the consolidation of technical (n = 2) and biological (n = 2) replicate data.
Using a fold change filter of ±2.5 at a p value <0.1, a list of 153 differentially expressed genes was generated, of which 122 were down regulated and 31 were up regulated in the diseased state. Next, the results of our in silico modeling were applied to our gene list in an effort to focus on transcripts that we might not otherwise identify using commercial arrays. This filter yielded 17 nonredundant retinal clones, of which 3 were up regulated and 14 were down regulated in the diseased retina. In our in silico evaluation, we had shown that 13.3% of all retinal collection cDNA sequences were not represented on the MG-430 chip set. Similarly, 11.1% of the differentially regulated sequences in the diseased model were not represented on the commercial array. This concordance suggests that the majority of unique cDNA clusters identified in silico are not merely artifacts of library construction, but rather bona fide transcriptional units that can be differentially expressed.
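The two-stage selection described above, a fold change and p value filter followed by the in silico "not on MG-430" filter, can be illustrated with a short Python sketch. Everything here is hypothetical: the `Clone` record, the function names, the clone identifiers, and the toy values are invented, and treating the fold change threshold as inclusive and the p value threshold as strict is an assumption rather than something stated in the text.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    accession: str      # placeholder identifiers, not real GenBank accessions
    fold_change: float  # signed; negative means down regulated in diseased retina
    p_value: float
    on_mg430: bool      # outcome of the in silico comparison to the chip set


def differentially_expressed(clones: List[Clone],
                             min_fold: float = 2.5,
                             max_p: float = 0.1) -> List[Clone]:
    """Keep clones whose absolute fold change and p value pass the thresholds."""
    # Assumption: inclusive fold threshold, strict p value threshold.
    return [c for c in clones
            if abs(c.fold_change) >= min_fold and c.p_value < max_p]


def unique_to_custom_array(clones: List[Clone]) -> List[Clone]:
    """Keep clones with no representation on the commercial array."""
    return [c for c in clones if not c.on_mg430]


toy = [
    Clone("clone_A", -6.8, 0.01, False),
    Clone("clone_B", 2.6, 0.05, False),
    Clone("clone_C", -3.0, 0.02, True),
    Clone("clone_D", -1.4, 0.01, False),  # fails the fold change filter
    Clone("clone_E", 4.0, 0.30, False),   # fails the p value filter
]

hits = differentially_expressed(toy)
novel = unique_to_custom_array(hits)
print([c.accession for c in hits], [c.accession for c in novel])

# Proportion reported in the text: 17 of 153 differentially expressed clones.
percent_unique = 100 * 17 / 153
print(f"{percent_unique:.1f}%")
```

Applying the in silico filter after the expression filter, rather than before, keeps the generic 153-gene list available for comparison with the array-specific subset.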


A single sequencing attempt, using fresh plasmid isolated from archived bacterial stock, was made for these 17 clones. In addition, PCR products, generated from the fresh plasmid or original plasmid used in microarray construction, were compared directly. A sequence read could not be obtained for the plasmid representing the following clones (average fold changes in parentheses): BF455590 (-8.1) and BF464882 (-4.9). For the remaining 15 clones, fresh plasmid sequence matched the appropriate EST read within GenBank. However, PCR bands generated from each plasmid source did not agree in the case of BE953559 (2.8), BE954642 (-9.6), BE982313 (5.0), and BF462295 (-5.2). In addition, BE994188 (-2.6), BF462982 (-2.9), and BF463224 (-5.4) showed the presence of an additional band during PCR product comparison. Disparities found during verification could be attributed to cross-contamination or tracking errors in the many steps of microarray construction, as well as the subsequent retrieval of the original plasmid for PCR comparisons. While these 9 clones could not be definitively identified using DNA sequencing, further analysis by quantitative real-time PCR (QRT-PCR) could validate many of them as differentially regulated in the disease model.

We chose to focus our attention on a more complete understanding of the remaining 8 sequence-verified cDNA clones (Fig. 3): BE953520 (-2.8), BE989041 (-6.8), BE996537 (-3.9), BF455609 (2.6), BF455823 (-4.6), BF464562 (-2.7), BF464839 (-3.4), and BF464860 (-4.4). In our original list of 153 differentially regulated transcripts, these clones would have ranked 99th, 8th, 32nd, 126th, 20th, 120th, 52nd, and 22nd most differentially regulated, respectively, indicating that our filter was able to identify unique transcripts that otherwise would have been buried deep within a generic gene list. Secondary validation of the cDNAs was performed using QRT-PCR.
Biological replicates were kept separate to ascertain the heterogeneity of transcript expression in the disease model. The resulting variation in Pde6b rd1 versus Pde6b + fold change was minimal across the two litter pools (Fig. 3): BE953520 (-2.0, -3.2), BE989041 (-2.7, -5.5), BE996537 (-4.1, -4.3), BF455609 (2.1, 1.5), BF455823 (-4.3, -5.4), BF464562 (-1.6, -2.0), BF464839 (-2.8, -2.8), and BF464860 (-13.8, -6.9). In addition, QRT-PCR data were in strong agreement with those of the microarray, validating the presence and differential regulation of all 8 transcripts in the diseased retina.

Next, a panel of total RNA from 10 individual tissues was assayed using QRT-PCR and compared to Pde6b rd1, Pde6b +, adult retina, and mouse universal RNA. Relative expression of seven cDNA sequences was found to be specific to retinal-derived samples (Fig. 4). In particular, BE953520, BE989041, and BF455823 transcripts are virtually absent from all other tissues. Only a single clone, BF455609, showed relative expression values in other tissues that were equivalent to those found in retina.


Fig. 3. Microarray identification and QRT-PCR validation of cDNA clones differentially regulated in a mouse model of retinal degeneration and not represented by Affymetrix GeneChips. Total RNA isolated from diseased Pde6b rd1 and wild-type Pde6b + mouse retina was probed using microarray (white bars) or QRT-PCR (gray bars). Biological and technical replicate data were used to generate fold changes and error bars for microarray experiments (n = 4). Technical replicate data (n = 4) for each biological replicate (gray bars) were used to generate fold changes and error bars for QRT-PCR experiments.

Based on a single experiment using a disease model of retinal degeneration on a custom cDNA microarray, we identified 17 of 153 (11.1%) and validated 8 of 153 (5.2%) differentially expressed clones that would not be discovered using Affymetrix technology. Of the clones we chose to validate by QRT-PCR, 100% were found to exist and roughly 88% are retinal specific. Therefore, we conservatively estimate that our cDNA microarray contains at least 433 unique and nonredundant clusters. In addition, we predict that approximately 380 these clones are retinal specific. This evidence strongly supports the notion that tissue-specific, subtracted cDNA libraries are still valuable discovery tools despite the recent improvements in Affymetrix sequence quality. Validation of S-antigen and guanylate cyclase 2F transcript variants To authenticate these expressed sequences further, we turned our attention to isolating the full-length coding regions represented by two of the eight ESTs differentially regulated in the Pde6b rd1 mouse model. BF455609 was eliminated from consideration because of the relatively high expression in tissues such as lung and liver. The remaining seven ESTs were aligned to the October 2003 build of the University of California at Santa Cruz mouse genome browser and the NCBI m32 assembly of the Ensembl genome browser. BE953520 is a possible variant of the recently discovered rootletin transcript [47]. BE996537 and BF464562 are potential 3V UTR or coding sequence alterations of 6-phosphofructo-2-kinase/fructose-2,6biphosphatase 2 (Pfkfb2) and sorting nexin 10 (Snx10), respectively. BF464839 and BF464860 do not lie proximal to any known coding sequence. Ultimately, we selected BE989041 and BF455823 for further characterization

because of their retinal specificity in QRT-PCR experiments, strong differential regulation in the Pde6b rd1 mouse, and substantial sequence evidence suggesting involvement in a shared biological pathway consistent with our disease model. Specifically, these two clones were apparent variants of a known gene or gene fragment in the extremely well characterized visual phototransduction G-protein-coupled receptor-mediated signaling pathway [48–51]. The full insert sequence of BE989041 showed 100% homology to a 486-bp intron region between exons 8 and 9 of the S-antigen (SAG; RefSeq NM_009118) gene locus (Fig. 5A, lines 1 and 2). Primers were designed to span the known 5V UTR of SAG and the intron region represented by our EST. Using cDNA from adult retina, a single band of approximately 1700 bp was generated by PCR (data not shown). Sequencing and realignment showed that this fragment was 100% identical to the 884-bp region between the 5V UTR and exons 1 to 8 of SAG. However, our variant included an additional 790 bp directly 3V of exon 8 (Fig. 5A, line 5). Translation of this sequence resulted in a 13-aminoacid extension of exon 8 followed by a stop codon. This cloned fragment represents a transcript variant that results in a truncation of the 403-amino-acid wild-type protein at residue 254. The CDS of our variant is identical to an Ensembl EST prediction (Fig. 5A, lines 4 and 6) and is not represented by an Affymetrix MG-430 target or consensus sequence (Fig. 5A, lines 3 and 6). The SAG variant was submitted to GenBank and received Accession No. AY651760. The full insert sequence of BF455823 (Fig. 5B, line 3) showed 100% homology to a 1207-bp intron region between exons 18 and 19 of RefSeq Model XM_142224 (Fig. 5B, line 1). BF45523 also lies 24 kb downstream of AK44234 (Fig. 5B, line 2), the longest cloned fragment of the proposed mouse homologue of guanylate cyclase 2F

J.R. Shearstone et al. / Genomics 85 (2005) 309–321

315

Fig. 4. Relative expression of cDNA clones across a tissue panel using QRT-PCR. The expression of (A) BE953520, (B) BE989041, (C) BE996537, (D) BF455609, (E) BF455823, (F) BF464562, (G) BF464839, and (H) BF464860 was determined using total RNA isolated from the following sources: adult retina, Pde6b rd1 retina, Pde6b + retina, mouse universal pool of 11 different cell lines, brain, embryo, heart, kidney, liver, lung, ovary, spleen, testicle, and thymus. Amplification efficiencies for all cDNA probe sets were identical within experimental error. Error bars represent technical replicate data (n = 4).

(GUCY2F). Primers were designed to span the known 5′ UTR of AK044234 and the region represented by our EST. A single product of approximately 8200 bp was generated by PCR (data not shown). Sequencing and realignment showed that this fragment was 100% identical to the 5′ UTR and CDS of AK044234 (Fig. 5B, lines 2 and 5). However, our PCR product contained an additional 739 bp of coding sequence across six new exons and a 3′ UTR of approximately 4600 bp (Fig. 5B, line 7). The CDS of our clone is considerably different from that predicted by the RefSeq project, but is identical to an Ensembl Transcript prediction (Fig. 5B, line 4). Notably, this transcript is not represented by an Affymetrix MG-430 target or consensus sequence. The protein sequence encoded by this transcript has 96.2 and 90% homology to the previously described rat and human GUCY2F, respectively [52–54]. The full-length

Fig. 5. Identification of S-antigen (SAG) and guanylate cyclase 2F (GUCY2F) transcript variants. Exons are shown as black bars. 5′ and 3′ untranslated regions are designated by white bars. (A) The full insert sequence of BE989041 (line 2) aligns to the intron region between exons 8 and 9 of the full-length RefSeq SAG transcript (NM_009118, line 1). An Ensembl EST prediction (ENSMUSESTT00000028350, line 4) suggested an S-antigen truncation that could not be identified using the MG-430 chip set (1419025_at, line 3). Using primer sets designed to BE989041 and the 5′ UTR of the Ensembl prediction, a single 1700-bp fragment was generated from adult retina cDNA (line 5). Sequencing and in silico translation of the fragment identified a SAG variant with a 13-amino-acid extension of exon 8 followed by a stop codon (AY651760, line 6). (B) The full insert sequence of BF455823 (line 3) aligns to the intron region between exons 18 and 19 of a RefSeq model transcript for GUCY2F (XM_142224, line 1). The longest known fragment for murine GUCY2F aligns with the first 12 exons of the RefSeq prediction (AK044234, line 2). There are no Affymetrix MG-430 probes to either the RefSeq prediction or known transcript fragment. An Ensembl transcript prediction suggested the full-length GUCY2F (ENSMUST00000042530, line 4). Using primer sets designed to BF455823 and the 5′ UTR of the Ensembl prediction, a single 8200-bp fragment was generated from adult retina cDNA (line 5). Sequencing and in silico translation of the fragment identified the full-length GUCY2F (AY651761, line 6).
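The in silico translation step mentioned in the caption can be illustrated with a minimal Python sketch. It assumes the standard genetic code and an input that is already in frame; the function name `translate_to_stop` and the toy sequence in the usage note are hypothetical and are not the SAG or GUCY2F sequences, nor the software used in the study.

```python
# Standard genetic code, built from the canonical T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_to_stop(cds: str) -> str:
    """Translate an in-frame coding sequence up to, but not including, the first stop codon.

    Ambiguous bases such as N are not handled and raise a KeyError.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[pos:pos + 3]]
        if residue == "*":  # stop codon ends the open reading frame
            break
        protein.append(residue)
    return "".join(protein)
```

For example, `translate_to_stop("ATGGCCAAATGGTAAGCC")` reads four codons and halts at the TAA stop. Applied to a sequenced PCR product, the length of the returned string gives the position at which a variant protein would terminate.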

mouse GUCY2F sequence was submitted to GenBank and received Accession No. AY651761.

S-antigen and GUCY2F have been shown to be key molecules in returning photoreceptors to the dark state following absorption of light [55]. GUCY2F is part of a family of guanylate cyclases responsible for replenishing intracellular cyclic GMP levels and has been cited as a candidate gene for X-linked retinitis pigmentosa [52,54]. Transcript downregulation of GUCY2F is likely a cellular effort to lower cGMP production and thereby maintain visual signaling. This response is consistent with the elevated levels of cGMP caused by reduced cGMP phosphodiesterase activity of the Pde6b rd1 mice [43]. Similarly, S-antigen terminates the primary signaling event of the visual phototransduction cascade by selectively binding and desensitizing the activated phosphorylated form of rhodopsin [56,57]. Therefore, downregulation of this transcript is consistent with maintenance of visual signaling during photoreceptor degeneration. Interestingly, a single base pair deletion leading to a C-terminal truncation of SAG has been linked to Oguchi disease [58]. Bovine rods have been shown to express an active truncation variant of SAG, Arr1–370A, which localizes to the rod outer segment [59]. A red/green cone-specific SAG variant affecting C-terminal structure was described at the protein level in baboon retina, using antibodies directed against specific domains in the molecule [60]. Segment swapping, peptide competition, and crystallization studies have shown that the receptor-binding, or "activation recognition," domain consists of at least three regions within the first 191 residues of the SAG molecule [56,61–64]. Evidence also suggests that the "phosphate recognition" domain is localized to the polypeptide C-terminus [56,65–67]. The SAG truncation described here is 90% identical to the first 242 amino acids of the human homologue and lacks the remaining 152 C-terminal residues of the wild-type protein. Taken together, these data might indicate that this SAG variant is not only enriched in specific photoreceptors, but also capable of binding activated rhodopsin in the nonphosphorylated state.

We have demonstrated, through in silico modeling of a retinal-specific, subtracted cDNA library, both the dramatic improvement in transcript coverage made by Affymetrix in

the past 4 years and the remaining shortfalls in the technology. Through the use of a custom cDNA microarray and QRT-PCR experiments, we have identified and validated eight ESTs that are differentially regulated in a mouse model of retinal degeneration, yet not represented on the most current commercial array. Cloning strategies identified the full-length mouse homologue of GUCY2F and a splice variant of SAG, effectively highlighting the inability of the current Affymetrix GeneChip formats to capture all transcripts with known biological relevance. In addition, there remain six validated ESTs and nine nonvalidated ESTs that have the potential to yield novel or variant transcripts upon further investigations. In light of these data, we have proposed a revised strategy within our laboratory for expression profiling experiments aimed at gene discovery. Maintenance of a highly reproducible, custom-spotted cDNA platform is a difficult endeavor. Comparison of data generated by a two-color microarray based on PCR products to that of a single-color oligonucleotide-based GeneChip is also fraught with caveats. The ultimate discovery tool would be one that can incorporate novel transcript content into a single standardized technology. One approach may be simply to create a custom Affymetrix chip containing the nonrepresented UniGene clusters. This could easily be accommodated since the current state of the art allows 61,000 transcripts to be represented on a single chip. However, as this technology moves toward a complete representation of the entire public transcript space, more sustainable value will be obtained using custom GeneChips based on the sequence of high-quality, normalized and subtracted cDNA libraries generated from tissues associated with a particular disease of interest.

Materials and methods

BMAP and retinal cDNA libraries

Mouse cDNA clones were assembled from nonnormalized, normalized, and subtracted libraries generated by M.B. Soares as part of the Brain Molecular Anatomy Project [18]. The retinal collection contains 11,862 clones from eight libraries. NIH_BMAP_Ret1 (nonnormalized) and NIH_BMAP_Ret1_N (normalized) libraries were derived from embryonic retina tissue and comprise 4.9 and 4.5% of the collection, respectively. NIH_BMAP_Ret2 (nonnormalized) and NIH_BMAP_Ret2_N (normalized) libraries were derived from neonatal retina tissue and comprise 4.5 and 4.0% of the collection, respectively. NIH_BMAP_Ret3 (nonnormalized) and NIH_BMAP_Ret3_N (normalized) libraries were derived from retina tissue and comprise 5.0 and 3.4% of the collection, respectively. NIH_BMAP_Ret4_S1 and NIH_BMAP_Ret4_S2 are subtracted libraries derived from retina tissue at various stages of development and represent 1.1 and 72.6% of the collection, respectively. In total, 14.4% of clones are from nonnormalized sources,

11.9% are from normalized sources, and 73.7% are from subtracted sources.

The 11,136 clones of the BMAP1 collection are assembled using 22 libraries generated from 11 regions of the brain. The percentages of clones from nonnormalized and normalized libraries of each tissue are reported: cerebellum (2.8, 4.9%), brain stem (2.2, 2.1%), olfactory bulb (2.8, 4.9%), hypothalamus (2.8, 3.8%), prefrontal cortex (2.5, 4.1%), amygdala (3.2, 2.6%), basal ganglia (2.8, 3.9%), pineal gland (3.8, 3.9%), striatum (3.5, 3.3%), hippocampus (2.9, 3.5%), and spinal cord (2.6, 4.1%). In addition, BMAP1 contained 26.9% clones from a subtracted library, NIH_BMAP_M_S1, generated using a mixture of the normalized libraries. In total, 32.0% of clones are from nonnormalized sources, 41.1% are from normalized sources, and 26.9% are from subtracted sources.

The BMAP2 collection containing 20,352 clones is derived from 4774 and 480 clones from the retinal and BMAP1 collections, respectively. Also, 311 and 2030 clones were added from a nonnormalized and a subtracted hippocampus library, respectively. In addition, the majority of the BMAP2 collection, 12,757 clones, was derived from five libraries constructed by serial subtraction of NIH_BMAP_M_S1. Overall, 76.6% of BMAP2 is derived from brain tissue sources, while the remaining clones are derived from the retina. In total, 5.6% of this collection is from nonnormalized sources, 5.3% is from normalized sources, 26.4% is from subtracted sources, and 62.7% is from serial subtracted sources.

Preliminary BLAST alignments

The 3′ EST sequences of 100 clones were chosen at random from the BMAP1, BMAP2, or retinal cDNA libraries. GenBank sequences were queried against Mu11k exemplar, MG-U74 consensus, and MG-430 consensus sequences using the NetAffx Web site [1]. cDNA library sequences with E values less than 1.00 × 10⁻⁵⁰ were considered present on the Affymetrix chip set.
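The presence call described above can be sketched in a few lines of Python. This is an illustration only: the alignments in the study were run through the NetAffx Web site, and the data layout assumed here (a dictionary mapping each clone identifier to its best E value, or `None` when no hit was returned) and the function names are hypothetical.

```python
import random

E_VALUE_CUTOFF = 1.0e-50  # threshold stated in the text


def is_represented(best_e_value):
    """A clone counts as present on a chip set when its best hit beats the cutoff."""
    return best_e_value is not None and best_e_value < E_VALUE_CUTOFF


def percent_not_represented(best_hits, sample_size=100, seed=0):
    """Sample clones at random and return the percentage lacking a qualifying hit.

    best_hits maps clone id -> best E value (or None if no alignment was found).
    """
    rng = random.Random(seed)
    sampled = rng.sample(sorted(best_hits), sample_size)
    missing = sum(not is_represented(best_hits[clone]) for clone in sampled)
    return 100.0 * missing / sample_size
```

Calling `percent_not_represented` with different `seed` values corresponds to drawing independent random sets of 100 clones from a collection.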
Random selection of cDNA clones and subsequent alignment were repeated (n = 5) to generate error bars.

BLAST alignments of retinal library

FASTA formatted consensus sequences for each of 45,137 qualifiers on the Affymetrix MG-430A and MG-430B were obtained from the NetAffx Web site. FASTA formatted 3′ EST sequences for each of 11,862 cDNA clones in the mouse retinal library were obtained from the National Center for Biotechnology Information Web site. The approximately 3 million individual EST sequences comprising mouse UniGene build 124 were used as the common reference template onto which the MG-430 and retinal sequences were aligned using the BLAST algorithm [68,69]. Each MG-430 and retinal sequence received a single UniGene number derived from the cluster number of

the UniGene EST producing the highest alignment score. MG-430 and retinal sequences with redundant UniGene cluster numbers or alignment E values greater than 1.00 × 10⁻⁵⁰ were removed from further analysis. Sequence overlap between the MG-430 and the retinal cDNA content was then determined by direct comparison of the UniGene cluster assignment. The retinal library sequences that appeared to be unique were confirmed by attempting an alignment to the MG-430 chip set consensus sequences on the NetAffx Web site.

Retinal microarray fabrication

Escherichia coli containing the retinal library cDNA inserts were inoculated into 96 deep well blocks containing 1.2 ml of 2YT medium using disposable plastic replicators (Genetix, Hampshire, UK). Cultures were grown at 37°C for 20 h under constant shaking at 200 rpm and then pelleted via centrifugation. Plasmids were purified using the QIAprep Turbo kit (Qiagen, Valencia, CA, USA). Ninety-six-pin disposable replicators were used to transfer plasmid templates to PCR plates containing 60 µl of reaction mix. PCRs utilizing M13 forward and reverse primers and Taq polymerase (Qiagen) underwent 32 cycles of 95°C for 45 s, 53°C for 45 s, and 72°C for 2 min 45 s. PCR products were purified using the QIAquick Kit (Qiagen) and eluted in 100 µl of water. PCR products were quality scored and sized by running on a 2% agarose 96-well E-gel (Invitrogen Corp., Carlsbad, CA, USA). Products ranged in size from 300 to 3000 bp. The total PCR yield for each well was measured on a SPECTRAmax PLUS spectrophotometer (Molecular Devices, Sunnyvale, CA, USA). Plates containing an average of less than 1.5 µg of product per well were reamplified and combined until the total average mass exceeded this value. Plates were dried at room temperature, resuspended in 50 µl of water, and transferred to 384-well V-bottom spotting plates (Genetix). Spotting plates were dried at room temperature and products were resuspended in 7.5 µl of 30% DMSO (Sigma–Aldrich, St. Louis, MO, USA). A Biorobotics MicroGrid TAS II with 16 Microspot 10K quill pins (Genomic Solutions, Ann Arbor, MI, USA) was used to array PCR products at a pitch of 260 µm onto UltraGAPS amino-silane-coated slides (Corning, Corning, NY, USA) in a relative humidity of 45 ± 2%. Microarrays were rehydrated in a humidity chamber for 30 s, denatured on a heat block at 95°C for 30 s, and ultimately crosslinked at 125 mJ in a GS Genelinker UV chamber (Bio-Rad, Hercules, CA, USA). Slides were blocked for 1 h with a 50% formamide, 5× SSC, 0.1% SDS, 0.1 mg/ml BSA solution and dried via centrifugation.

Sample acquisition and preparation

Universal Reference RNA representing a pool of 11 different mouse cell lines was purchased from Stratagene

(La Jolla, CA, USA). Embryo, embryo (fibroblast), kidney, liver (hepatocyte), lung (alveolar macrophage), B lymphocyte, T lymphocyte, mammary gland, muscle myoblast, skin, and testis cell lines are represented. The Mouse Assorted Total RNA Kit (Ambion, Austin, TX, USA) was used in QRT-PCR validation experiments. Liver, brain, thymus, heart, lung, spleen, testicle, ovary, kidney, and embryo derived from Swiss Webster mice are represented. Adult retina total RNA was isolated from a pool of adult retinas harvested from 700 individual C57/B6 mice (Charles River Laboratories, Wilmington, MA, USA). Mouse retina containing 10–20% retinal pigment epithelium was harvested from mouse strain C3H/HeSnJ MMTV (Pde6b rd1) and wild-type strain C3.BLiA-Pde6b +-Krd/J (Pde6b +) at 15 days postbirth (The Jackson Laboratory, Bar Harbor, ME, USA). Each biological replicate represents a pool of tissue generated from two litters totaling 10 animals. All retinas were placed in TRIzol reagent (Invitrogen) and stored at −80°C until extraction. Total RNA was extracted with TRIzol reagent according to the manufacturer’s instructions. All RNA samples were then cleaned using RNeasy kit (Qiagen).

Microarray processing

One microgram of total RNA was amplified using the RiboAmp RNA Amplification kit (Arcturus Bioscience, Mountain View, CA, USA) and the manufacturer’s protocol for two-round amplification. Next, 2 µg of aRNA was labeled by direct incorporation of dCTP–Cy3 or dCTP–Cy5 dye (Amersham Pharmacia Biotech, Piscataway, NJ, USA) using SuperScript II reverse transcriptase (Invitrogen) and random hexamer primers (Invitrogen). Probe was quantified using absorbance at 260 and 550 nm for Cy3 labeling or 650 nm for Cy5 labeling. Samples containing 30 pmol of incorporated dye were hybridized to each array under an m-Series LifterSlip (Erie Scientific, Portsmouth, NH, USA) in blocking buffer supplemented with mouse COT I and salmon sperm DNA at 0.1 mg/ml. After hybridization, slides were washed under gentle agitation as follows: 2× SSC with 0.1% SDS at 42°C for 5 min, 0.1× SSC with 0.1% SDS at room temperature for 2 min, 0.1× SSC at room temperature for 2 min, 0.01× SSC at room temperature for 2 min (two times). Slides were dried by centrifugation and scanned at a resolution of 10 µm using a ScanArray Express HT (Perkin–Elmer, Wellesley, MA, USA). Images were saved as 16-bit TIFF files and imported into GenePix Pro analysis software for data extraction (Axon Instruments, Union City, CA, USA). Data normalization, replicate merging, fold change determination, and p value calculations were computed using the Axon/GenePix error model within the Resolver gene expression analysis software package (Rosetta Biosoftware, Seattle, WA, USA). Biological replicates are defined as multiple samples isolated from different retinal pools. Technical replicates are defined as multiple labelings

and hybridizations from a single retinal pool. In these experiments, technical replicates were always balanced by equal swapping of the fluorescent dyes between control and experimental samples. Microarray data generated from the Pde6b rd1 disease model versus matching Pde6b + control represent the error-weighted merging of biological and technical replicates, totaling an n of 4. Error bars reflect a 68% probability that the actual log ratio is within the defined range [70].

Quantitative real-time PCR

Oligonucleotide primers (Biosearch Technologies, Novato, CA, USA) and TaqMan MGB (Applied Biosystems, Foster City, CA, USA) probes were designed from cDNA clone insert sequence using Primer Express version 2.0.0 (Applied Biosystems). TaqMan MGB probes contain a 5′ covalently linked fluorescent reporter dye (FAM) and a minor-groove-binder nonfluorescent quencher (MGBNF) covalently linked to the 3′ end. Oligonucleotide standard template (Biosearch Technologies) design included 10 bp of gene-specific sequence at the 5′ and 3′ ends of the amplicon. Total RNA samples were concentration normalized by measuring absorbance at 260 nm. Residual DNA was removed from 5 µg of total RNA using 5 units of DNase I amplification grade (Invitrogen) at 20°C for 15 min. An aliquot of the treated sample was used as control in subsequent QRT-PCR assays to ensure the absence of DNA contamination. The remaining DNase-treated RNA was used in a cDNA synthesis reaction using a high-capacity cDNA archive kit (Applied Biosystems). Oligonucleotide templates were pooled and then serially diluted 1:10 eight times in 25 ng/µl yeast RNA (Ambion) to include a final range of 500 fM to 5 zM.
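The arithmetic of the standard dilution series, and one common way such a series can be used as a standard curve, can be sketched as follows. This is a hedged illustration, not the Sequence Detection Software procedure: the log-linear model Ct = intercept + slope × log10(concentration) is a standard assumption for real-time PCR, and all function names are hypothetical.

```python
import math


def dilution_series(start_molar=500e-15, fold=10.0, n_dilutions=8):
    """Concentrations (mol/l) of the starting standard and each successive dilution."""
    return [start_molar / fold ** step for step in range(n_dilutions + 1)]


def fit_standard_curve(concentrations, ct_values):
    """Ordinary least-squares fit of Ct = intercept + slope * log10(concentration)."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the fitted curve to estimate the concentration behind a measured Ct."""
    return 10 ** ((ct - intercept) / slope)
```

Eight tenfold dilutions of a 500 fM stock span eight orders of magnitude, so the last point of `dilution_series()` is 5 zM, matching the range given in the text.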
Quadruplicate PCRs for samples and standards were mixed in a 96-well plate and then transferred to a 384-well optical plate (Applied Biosystems) and cycled in a 7900HT (Applied Biosystems) thermal cycler under the following conditions: 50°C for 2 min (uracil N-deglycosylase digest), 95°C for 10 min (activation of Taq thermostable polymerase), and 40 cycles of 95°C for 15 s and 60°C for 60 s. Relative transcript quantities for each sample were determined by comparison to oligonucleotide standard curve using Sequence Detection Software (Applied Biosystems). Error bars were generated for transcript abundance based on technical replicate data (n = 4). Error was propagated during log ratio calculations and reflects a greater than 68% probability that the actual log ratio is within the defined range.

Cloning of ORF from BE989041 and BF455823

One microgram of adult retina total RNA was used as template to generate cDNA by means of the SuperScript First-Strand Synthesis System for RT-PCR kit (Invitrogen). Reverse transcription reactions were primed with either

random hexamers or oligo(dT)12–18. Efficiency of RT was judged by the PCR amplification of a 250-bp region at the 5′ and 3′ ends of the housekeeping genes clathrin (6 kb), GAPDH (1.7 kb), and β-actin (1.2 kb). EST sequence was aligned to the October 2003 build of the University of California at Santa Cruz mouse genome browser and the NCBI m32 assembly of the Ensembl genome browser [48–50]. Forward primer (5′-GTTATCTGATAGGATTGCACCAGGTCC-3′) was designed within the 5′ UTR of the S-antigen RefSeq transcript NM_009118 and paired with a reverse primer (5′-GGCCATCTGCTAGAGACAAACTTCC-3′) designed against the insert sequence of clone BE989041. Forward primer (5′-TTACTCTTTCAGTGGATTTTGTCAACAGAC-3′) was designed within the 5′ UTR of the GUCY2F transcript fragment AK044234 and paired with a reverse primer (5′-GGGAGACGGCTTCTCTTTCCTAAAC-3′) designed against the insert sequence of clone BF455823. Two microliters of a 1:1 mixture of random hexamer and oligo(dT)12–18 primed cDNA was the template for PCRs using the EST primer sets and the Elongase Enzyme Mix (Invitrogen). Reactions were denatured for 5 min at 95°C before undergoing 30 cycles of 95°C for 30 s, 55°C for 45 s, and 68°C for 10 min. PCR products were purified using the QIAquick Kit (Qiagen), eluted in 100 µl of water, and used as template for double-stranded sequencing.
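As a supplement to the methods, the UniGene-based comparison described under "BLAST alignments of retinal library" can be sketched as a set difference on cluster assignments. This is a minimal illustration under stated assumptions: each input record is taken to be the already-selected top-scoring hit for a query, "redundant" cluster numbers are handled by keeping the first query seen per cluster, and the function names and cluster identifiers in the usage note are hypothetical.

```python
E_VALUE_CUTOFF = 1.0e-50  # alignments with larger E values are discarded


def assign_clusters(best_hits):
    """Map UniGene cluster -> one representative query.

    best_hits is an iterable of (query_id, cluster_id, e_value) tuples, one per
    query, holding that query's top-scoring UniGene EST alignment.
    """
    clusters = {}
    for query_id, cluster_id, e_value in best_hits:
        if e_value > E_VALUE_CUTOFF:
            continue  # alignment too weak to assign a cluster
        clusters.setdefault(cluster_id, query_id)  # drop redundant cluster members
    return clusters


def clusters_missing_from_chip(retinal_hits, chip_hits):
    """Clusters hit by retinal library sequences but by no chip consensus sequence."""
    retinal = assign_clusters(retinal_hits)
    chip = assign_clusters(chip_hits)
    return sorted(set(retinal) - set(chip))
```

With retinal hits `[("r1", "Mm.1", 1e-80), ("r3", "Mm.2", 1e-60)]` and chip hits `[("p1", "Mm.1", 1e-100)]`, only `Mm.2` is reported as unrepresented. Candidates found this way would still need the confirmation step described in the text, a direct alignment attempt against the chip consensus sequences.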

Acknowledgments

We thank our collaborators in the Department of Genetics at Harvard Medical School: Constance Cepko for providing the BMAP1, BMAP2, and retinal clone collections; Wing H. Wong for supplying the adult retina RNA. We also thank our Biogen IDEC colleagues: Christopher Tonkin, Michele McAuliffe, Brittney Coleman, and Richard Tizard for DNA sequencing; Michael Getman for full-length cloning advice; Beth Murray for sequence analysis input; Huo Li for QRT-PCR statistics contributions; and John McCoy for reading the manuscript.

References

[1] G. Liu, et al., NetAffx: Affymetrix probesets and annotations, Nucleic Acids Res. 31 (2003) 82–86.
[2] Array design and performance of the GeneChip mouse expression set 430, Technical Note, Affymetrix, Inc, Santa Clara, CA (2003) 1–10.
[3] T. Miyahara, et al., Gene microarray analysis of experimental glaucomatous retina from cynomolgus monkey, Invest. Ophthalmol. Visual Sci. 44 (2003) 4347–4356.
[4] S. Katsuma, et al., Global analysis of differentially expressed genes during progression of calcium oxalate nephrolithiasis, Biochem. Biophys. Res. Commun. 296 (2002) 544–552.
[5] C.B. Herath, et al., Pregnancy-associated changes in genome-wide gene expression profiles in the liver of cow throughout pregnancy, Biochem. Biophys. Res. Commun. 313 (2004) 666–680.

[6] M.A. Shultz, et al., Gene expression analysis in response to lung toxicants. I. Sequencing and microarray development, Am. J. Respir. Cell Mol. Biol. 30 (2004) 296–310.
[7] L.A. Cogburn, et al., Systems-wide chicken DNA microarrays, gene expression profiling, and discovery of functional genes, Poult. Sci. 82 (2003) 939–951.
[8] J. Lo, et al., 15000 unique zebrafish EST clusters and their future use in microarray for profiling gene expression patterns during embryogenesis, Genome Res. 13 (2003) 455–466.
[9] S. Bortoli, et al., Gene expression profiling of human satellite cells during muscular aging using cDNA arrays, Gene 321 (2003) 145–154.
[10] L. Gan, et al., Identification of cathepsin B as a mediator of neuronal death induced by Abeta-activated microglial cells using a functional genomics approach, J. Biol. Chem. 279 (2004) 5565–5572.
[11] Y. Cho, J. Fernandes, S.H. Kim, V. Walbot, Gene-expression profile comparisons distinguish seven organs of maize, Genome Biol. 3 (2002) research0045.
[12] T. Girke, et al., Microarray analysis of developing Arabidopsis seeds, Plant Physiol. 124 (2000) 1570–1581.
[13] S. Bortoluzzi, F. d’Alessi, G.A. Danieli, A novel resource for the study of genes expressed in the adult human retina, Invest. Ophthalmol. Visual Sci. 41 (2000) 3305–3308.
[14] K. Malone, M.M. Sohocki, L.S. Sullivan, S.P. Daiger, Identifying and mapping novel retinal-expressed ESTs from humans, Mol. Vision 5 (1999) 5.
[15] G. Wistow, A project for ocular bioinformatics: NEIBank, Mol. Vision 8 (2002) 161–163.
[16] H. Stohr, et al., EST mining of the UniGene dataset to identify retina-specific genes, Cytogenet. Cell Genet. 91 (2000) 267–277.
[17] M.B. Soares, Identification and cloning of differentially expressed genes, Curr. Opin. Biotechnol. 8 (1997) 542–546.
[18] M.F. Bonaldo, G. Lennon, M.B. Soares, Normalization and subtraction: two approaches to facilitate gene discovery, Genome Res. 6 (1996) 791–806.
[19] X. Mu, et al., Gene expression in the developing mouse retina by EST sequencing and microarray analysis, Nucleic Acids Res. 29 (2001) 4983–4993.
[20] M.D. Adams, et al., Complementary DNA sequencing: expressed sequence tags and Human Genome Project, Science 252 (1991) 1651–1656.
[21] J. Xu, et al., Identification of differentially expressed genes in human prostate cancer using subtraction and microarray, Cancer Res. 60 (2000) 1677–1682.
[22] Y. Jiang, et al., Discovery of differentially expressed genes in human breast cancer using subtracted cDNA libraries and cDNA microarrays, Oncogene 21 (2002) 2270–2282.
[23] L. Sun, J. Lee, H.A. Fine, Neuronally expressed stem cell factor induces neural stem cell migration to areas of brain injury, J. Clin. Invest. 113 (2004) 1364–1374.
[24] M.F. Bonaldo, et al., 1274 full-open reading frames of transcripts expressed in the developing mouse nervous system, Genome Res. 14 (2004) 2053–2063.
[25] P. Laveder, C. De Pitta, S. Toppo, G. Valle, G. Lanfranchi, A two-step strategy for constructing specifically self-subtracted cDNA libraries, Nucleic Acids Res. 30 (2002) e38.
[26] K.J. Martin, A.B. Pardee, Identifying expressed genes, Proc. Natl. Acad. Sci. USA 97 (2000) 3789–3791.
[27] Array design for the GeneChip Human Genome 133 Set, Technical Note, Affymetrix, Inc, Santa Clara, CA (2001) 1–10.
[28] W. Cao, et al., Comparing gene discovery from Affymetrix GeneChip microarrays and Clontech PCR-select cDNA subtraction: a case study, BMC Genom. 5 (2004) 26.
[29] J.K. Lee, et al., Comparing cDNA and oligonucleotide array data: concordance of gene expression across platforms for the NCI-60 cancer cells, Genome Biol. 4 (2003) R82.

[30] A. Shimizu-Matsumoto, et al., An expression profile of genes in human retina and isolation of a complementary DNA for a novel rod photoreceptor protein, Invest. Ophthalmol. Visual Sci. 38 (1997) 2576–2585.
[31] S. Blackshaw, R.E. Fraioli, T. Furukawa, C.L. Cepko, Comprehensive analysis of photoreceptor gene expression and the identification of candidate retinal disease genes, Cell 107 (2001) 579–589.
[32] D. Sharon, S. Blackshaw, C.L. Cepko, T.P. Dryja, Profile of the genes expressed in the human peripheral retina, macula, and retinal pigment epithelium determined through serial analysis of gene expression (SAGE), Proc. Natl. Acad. Sci. USA 99 (2002) 315–320.
[33] I. Chowers, et al., Identification of novel genes preferentially expressed in the retina using a custom human retina cDNA microarray, Invest. Ophthalmol. Visual Sci. 44 (2003) 3732–3741.
[34] R. Farjo, et al., Mouse eye gene microarrays for investigating ocular development and disease, Vision Res. 42 (2002) 463–470.
[35] A.S. Wilson, B.G. Hobbs, T.P. Speed, P.E. Rakoczy, The microarray: potential applications for ophthalmic research, Mol. Vision 8 (2002) 259–270.
[36] J.U. Pontius, L. Wagner, G.D. Schuler, UniGene: a unified view of the transcriptome. The NCBI Handbook, National Center for Biotechnology Information, Bethesda, MD, 2003, p. 12.
[37] S. Blackshaw, et al., Genomic analysis of mouse retinal development, PLoS Biol. 2 (2004) E247.
[38] P. Kapranov, et al., Large-scale transcriptional activity in chromosomes 21 and 22, Science 296 (2002) 916–919.
[39] H. Kiyosawa, I. Yamanaka, N. Osato, S. Kondo, Y. Hayashizaki, Antisense transcripts with FANTOM2 clone set and their implications for gene regulation, Genome Res. 13 (2003) 1324–1334.
[40] R. Yelin, et al., Widespread occurrence of antisense transcription in the human genome, Nat. Biotechnol. 21 (2003) 379–386.
[41] S. Cawley, et al., Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs, Cell 116 (2004) 499–509.
[42] K. Numata, et al., Identification of putative noncoding RNAs among the RIKEN mouse full-length cDNA collection, Genome Res. 13 (2003) 1301–1306.
[43] C. Bowes, et al., Retinal degeneration in the rd mouse is caused by a defect in the beta subunit of rod cGMP-phosphodiesterase, Nature 347 (1990) 677–680.
[44] S.J. Pittler, W. Baehr, Identification of a nonsense mutation in the rod photoreceptor cGMP phosphodiesterase beta-subunit gene of the rd mouse, Proc. Natl. Acad. Sci. USA 88 (1991) 8322–8326.
[45] D.B. Farber, J.S. Danciger, G. Aguirre, The beta subunit of cyclic GMP phosphodiesterase mRNA is deficient in canine rod-cone dysplasia 1, Neuron 9 (1992) 349–356.
[46] M.E. McLaughlin, T.L. Ehrhart, E.L. Berson, T.P. Dryja, Mutation spectrum of the gene encoding the beta subunit of rod phosphodiesterase among patients with autosomal recessive retinitis pigmentosa, Proc. Natl. Acad. Sci. USA 92 (1995) 3249–3253.
[47] J. Yang, et al., Rootletin, a novel coiled-coil protein, is a structural component of the ciliary rootlet, J. Cell Biol. 159 (2002) 431–440.
[48] W.J. Kent, BLAT: the BLAST-like alignment tool, Genome Res. 12 (2002) 656–664.
[49] W.J. Kent, et al., The human genome browser at UCSC, Genome Res. 12 (2002) 996–1006.
[50] M.P. Hammond, E. Birney, Genome information resources: developments at Ensembl, Trends Genet. 20 (2004) 268–272.
[51] S.P. Daiger, B.F. Rossiter, J. Greenberg, A. Christoffels, W. Hide, Data services and software for identifying genes and mutations causing retinal degeneration, Invest. Ophthalmol. Visual Sci. 39 (1998) S295.
[52] R.B. Yang, H.J. Fulle, D.L. Garbers, Chromosomal localization and genomic organization of genes encoding guanylyl cyclase receptors expressed in olfactory sensory neurons and retina, Genomics 31 (1996) 367–372.

[53] D.G. Lowe, et al., Cloning and expression of a second photoreceptor-specific membrane retina guanylyl cyclase (RetGC), RetGC-2, Proc. Natl. Acad. Sci. USA 92 (1995) 5535–5539.
[54] R.B. Yang, D.C. Foster, D.L. Garbers, H.J. Fulle, Two membrane forms of guanylyl cyclase found in the eye, Proc. Natl. Acad. Sci. USA 92 (1995) 602–606.
[55] K.D. Ridge, N.G. Abdulaev, M. Sousa, K. Palczewski, Phototransduction: crystal clear, Trends Biochem. Sci. 28 (2003) 479–487.
[56] V.V. Gurevich, E.V. Gurevich, The molecular acrobatics of arrestin activation, Trends Pharmacol. Sci. 25 (2004) 105–111.
[57] M. Han, V.V. Gurevich, S.A. Vishnivetskiy, P.B. Sigler, C. Schubert, Crystal structure of beta-arrestin at 1.9 Å: possible mechanism of receptor binding and membrane translocation, Structure (Cambridge) 9 (2001) 869–880.
[58] S. Fuchs, et al., A homozygous 1-base pair deletion in the arrestin gene is a frequent cause of Oguchi disease in Japanese, Nat. Genet. 10 (1995) 360–362.
[59] W.C. Smith, et al., A splice variant of arrestin: molecular cloning and localization in bovine retina, J. Biol. Chem. 269 (1994) 15407–15410.
[60] I. Nir, N. Ransom, S-antigen in rods and cones of the primate retina: different labeling patterns are revealed with antibodies directed against specific domains in the molecule, J. Histochem. Cytochem. 40 (1992) 343–352.
[61] V.V. Gurevich, J.L. Benovic, Cell-free expression of visual arrestin: truncation mutagenesis identifies multiple domains involved in rhodopsin interaction, J. Biol. Chem. 267 (1992) 21919–21923.

[62] V.V. Gurevich, J.L. Benovic, Visual arrestin interaction with rhodopsin: sequential multisite binding ensures strict selectivity toward light-activated phosphorylated rhodopsin, J. Biol. Chem. 268 (1993) 11628–11638.
[63] S.A. Vishnivetskiy, M.M. Hosey, J.L. Benovic, V.V. Gurevich, Mapping the arrestin–receptor interface: structural elements responsible for receptor specificity of arrestin proteins, J. Biol. Chem. 279 (2004) 1262–1268.
[64] J.A. Hirsch, C. Schubert, V.V. Gurevich, P.B. Sigler, The 2.8 Å crystal structure of visual arrestin: a model for arrestin’s regulation, Cell 97 (1999) 257–269.
[65] A. Pulvermuller, et al., Functional differences in the interaction of arrestin and its splice variant, p44, with rhodopsin, Biochemistry 36 (1997) 9253–9260.
[66] K. Schroder, A. Pulvermuller, K.P. Hofmann, Arrestin and its splice variant Arr1-370A (p44): mechanism and biological role of their interaction with rhodopsin, J. Biol. Chem. 277 (2002) 43987–43996.
[67] S.A. Vishnivetskiy, et al., How does arrestin respond to the phosphorylated state of rhodopsin? J. Biol. Chem. 274 (1999) 11451–11454.
[68] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, D.J. Lipman, Basic local alignment search tool, J. Mol. Biol. 215 (1990) 403–410.
[69] M.S. Boguski, G.D. Schuler, ESTablishing a human transcript map, Nat. Genet. 10 (1995) 369–371.
[70] Error bars and p-values in the Rosetta Resolver application, Technical Note, Rosetta Biosoftware, Seattle, WA (2003) 1–13.