CHAPTER 3
Whole-exome sequencing and whole-genome sequencing
Hui Wang (a), Rui Chen (b, c)
(a) Institute of Life Sciences, College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, China
(b) Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, United States
(c) Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, United States
There are more than 1000 diseases and syndromes that affect the human eye [1]. Genetics plays a very important role in many of these diseases, which range from Mendelian-inherited genetic diseases, such as retinitis pigmentosa (RP), to adult-onset complex diseases, such as glaucoma and age-related macular degeneration (AMD). Sequencing of the individual genome is an essential step for both diagnosis and discovery of underlying mutations and variants. Over the past decade, mutation screening using next-generation sequencing (NGS) technology has become the standard of care for patients with Mendelian diseases. More recently, exome and whole-genome sequencing (WGS) have been increasingly used as the methods of choice after genome-wide association studies (GWAS) to identify causal variants and genes underlying complex diseases.

One of the prominent features of inherited eye diseases is their genetic heterogeneity, with more than 800 disease-associated genes identified (https://www.omim.org/). Since similar clinical phenotypes can result from mutations in different genes, screening a large number of genes is often essential to identify the pathogenic mutations carried by a patient. Historically, various technologies have been applied in molecular diagnosis, including Sanger sequencing of a few genes [2], array sequencing [3], and the arrayed primer extension reaction (APEX) [4], which tests a set of previously known mutations. Given the significant genetic heterogeneity and the large number of mutant alleles, it is not surprising that the yield of molecular diagnosis via these approaches is relatively low. As a result, molecular diagnosis was not performed for the vast majority of patients until the emergence of NGS, which allows mutation screening of a large number of genes at an affordable cost.
Genetics and Genomics of Eye Disease https://doi.org/10.1016/B978-0-12-816222-4.00003-4
Copyright © 2020 Elsevier Inc. All rights reserved.
Development of NGS technologies

NGS, also known as high-throughput sequencing, is a general term that refers to a set of new sequencing technologies that generate sequence data directly from a large DNA fragment pool in parallel at a low cost. In contrast to the Sanger method, which requires each reaction to have a purified DNA template, NGS technologies generate sequence information directly from a mixture of DNA fragments. A comparison of five commercially available platforms is listed in Table 1.
Illumina sequencing

Currently, the most commonly used NGS platforms are manufactured by Illumina. The technology was initially developed by Solexa, and the first-generation machine, named Genome Analyzer, was commercialized in 2007. As shown in Fig. 1, to perform sequencing, DNA is first fragmented to 300–500 bp and ligated to universal adaptors containing molecular barcodes. After a few cycles of polymerase chain reaction (PCR) amplification, the purified library is loaded onto a chip, called a flow cell, and then undergoes three steps: on-chip amplification, sequencing, and data analysis.

Along the bottom of the flow cell are hundreds of thousands of oligonucleotides that are complementary to the adaptors of the library (Fig. 2). Once the library molecules are anchored to the flow cell via these complementary sequences, the amplification phase, called cluster generation, begins. Each DNA strand bends over and attaches to another oligo, forming a bridge, and the complementary strand is then synthesized (bridge amplification). In each new cycle, both strands can serve as templates and form new bridges, resulting in thousands of DNA clusters containing forward and reverse strands. After amplification, the reverse strands are cleaved and only the forward strands remain on the flow cell as sequencing templates.

Next, during the sequencing step, sequencing primers and modified nucleotides are loaded onto the flow cell. Each nucleotide carries a reversible terminator and a fluorescent label, so only one nucleotide can be added during each sequencing cycle. The 3′ blocker (3′-O-azidomethyl) and the fluorescent dye attached to the incorporated nucleotide must be cleaved before extension can continue at the 3′ end. After each round of synthesis, a camera captures an image of the flow cell, and the added nucleotide is determined from the wavelength of its fluorescent tag. Nonincorporated molecules are washed away after each round. With this sequencing by synthesis (SBS) technology, thousands of fragments are sequenced simultaneously.

Illumina platforms produce reads of up to 300 bp, with an error rate of 0.1% or less for 85% of bases. With the latest NovaSeq instrument, each machine can generate up to 6 trillion bases within 2 days, bringing the cost of sequencing a human genome at 30× coverage to less than $1000. In addition to production-scale platforms, Illumina also offers four types of bench-top sequencers. Among them, iSeq is the latest model and can yield 1.2 Gb of data within 1 day.

TABLE 1 Basic features of NGS platforms.

| Platform | Company | Amplification | Chemistry |
|---|---|---|---|
| 454 | Roche | Emulsion PCR | Pyrosequencing |
| Solexa | Illumina | Bridge amplification | Reversible terminators |
| SOLiD | Life Technologies | Emulsion PCR | Ligation |
| Revolocity | BGI | Rolling circle amplification | Nanoball/ligation |
| Ion Torrent | Thermo Fisher | Emulsion PCR | Ion-sensitive SBS |

FIG. 1 Basic workflow of NGS library preparation.

FIG. 2 Illumina sequencing process. (A) The flow cell contains oligos which are complementary to the adaptors ligated to the DNA strand; (B) the DNA template strand is immobilized onto the flow cell; (C) the strand bends over and attaches to another oligo forming a bridge; (D) a polymerase synthesizes the reverse strand (bridge amplification); (E) the two strands release and straighten; (F) each forms a new bridge and results in a cluster of forward and reverse strand clones; (G) the reverse strands on the flow cell are cleaved and the forward strands are used as templates for sequencing; and (H) each cluster is sequenced using fluorescently labeled, reversible terminator nucleotides.
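The per-base error rate quoted for Illumina reads can be related to the Phred quality scale in which sequencers commonly report base-call confidence. The following is a minimal sketch, assuming the standard Phred definition Q = -10 log10(p), where p is the probability that a base call is wrong. The function names are illustrative and are not part of any Illumina software.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred quality q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality corresponding to an error probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


def fraction_at_or_above(qualities, threshold=30):
    """Fraction of base calls whose quality is at least `threshold`."""
    qualities = list(qualities)
    if not qualities:
        raise ValueError("no quality scores given")
    return sum(q >= threshold for q in qualities) / len(qualities)
```

Under this definition, an error rate of 0.1% corresponds to a quality score of 30, so a statement such as "0.1% error or less for 85% of bases" can be checked by computing `fraction_at_or_above(qualities, 30)` over the base qualities of a run.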
II. Genomics in the eye
As shown in Table 1, in addition to the Illumina platforms, several other instruments have been commercialized since 2005. SOLiD and 454 became obsolete due to their high per-base cost and relatively high error rates. Platforms from two competing companies, Ion Torrent from Thermo Fisher and DNA nanoball sequencing from BGI, currently remain on the market. Each platform uses a distinct sequencing technology: Ion Torrent is based on ion-sensitive SBS, whereas DNA nanoball sequencing is based on ligation chemistry [5, 6].
Targeted enrichment sequencing

Targeted enrichment selectively isolates specific genomic regions of interest prior to sequencing. The history of targeted enrichment sequencing dates back to 2007, after the introduction of the 454 sequencer, with which sequencing an individual genome cost over 1 million US dollars [7]. Scientists from NimbleGen and Baylor College of Medicine enriched 6726 exons from the human genome using a microarray hybridization approach and sequenced them on the 454 platform, making it possible to sequence large sets of regions of interest at low cost. Within a couple of years, various strategies were developed for enriching specific target regions, among which the most widely used are hybridization-based capture, molecular inversion probes (MIPs), and targeted amplicon enrichment (Fig. 3).
FIG. 3 Different targeted enrichment techniques. (A) Hybrid capture targeted enrichment on solid surface; (B) hybrid capture targeted enrichment in solution; (C) enrichment by molecular inversion probes (MIPs); and (D) targeted enrichment by multiplex PCR.
Targeted enrichment techniques

Hybridization-based enrichment

A pool of single-stranded DNA or RNA probes complementary to the genomic region of interest is designed and synthesized. The length of the probes varies between 65 and 120 bp. To enrich the target region, the DNA sequencing library is denatured and hybridized to the probes, so that DNA fragments from the target regions form duplexes with the probe oligos. After hybridization, DNA fragments that do not bind to probes are removed through stringent washing steps, and DNA fragments that remain bound to probes are eluted and amplified for sequencing [7, 8]. Hybridization-based enrichment can be performed either on a solid surface or in solution. This enrichment method is the most widely used in both research and clinical fields. However, it has some limitations. It requires the construction of genomic libraries prior to hybridization. The procedure takes a couple of hours or longer and does not scale well to large numbers of samples. Furthermore, the enrichment efficiency is reduced if the target region is small, resulting in a significant proportion of off-target reads.

Molecular inversion probes

As shown in Fig. 3, MIPs are single-stranded DNA molecules whose 5′ and 3′ ends are complementary to two regions flanking the target of interest, which can be up to several hundred base pairs apart. The middle portion of the MIP is a universal linker region that is used for PCR amplification. During incubation, the 5′ and 3′ ends of the MIP hybridize to the sequences flanking the genomic target. Gap filling and ligation then produce a fully circularized probe. Linear DNA, including the nonreacted probes in the reaction, is removed by exonuclease treatment [9, 10]. MIPs are suitable for enrichment of small regions, single-nucleotide polymorphism (SNP) genotyping, and copy number variation (CNV) detection [11, 12]. The advantages of MIPs are: (i) no library preparation step; (ii) less DNA required; and (iii) easy automation. However, designing optimal MIP probes might not be possible for some target regions. Furthermore, the coverage across the target region is often highly variable.

Targeted amplicon sequencing

Enrichment by PCR is more straightforward than the other methods. It is a two-step PCR process in which the target region is first amplified by traditional PCR, followed by an additional PCR that attaches a tag with a unique barcode and the adaptor sequence for the corresponding NGS platform. Several factors need to be considered when designing primers for multiplex PCR: (i) the melting temperatures (Tm) of the primers should be identical or within 1–2°C of each other; (ii) the guanine-cytosine (GC) content should be appropriate (50%–55%); and (iii) primer cross-complementarity should be avoided [13]. Amplicon sequencing is widely used in clinical settings, especially when specimens are low in DNA quality and quantity.

Despite the rapid reduction in sequencing costs, targeted enrichment sequencing is still widely used for two reasons. First, in cases where very high coverage (100× to 1000×) is essential, targeted enrichment sequencing is still significantly more cost effective than whole-genome sequencing. Second, since only a small fraction of the genome is sequenced, the amount of data generated is small, making data storage, transfer, and processing easier and faster.
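The three primer design criteria listed above for multiplex PCR can be checked programmatically. The following is a simplified sketch rather than a validated primer design tool. It assumes the Wallace rule of thumb (Tm ≈ 2 × (A+T) + 4 × (G+C) °C, intended for short oligos) as a stand-in for a proper thermodynamic Tm model, and it flags cross-complementarity only when the last `k` bases at the 3′ end of one primer can pair with a stretch of another. All function names and default thresholds are illustrative.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_percent(primer: str) -> float:
    """GC content of a primer as a percentage."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> float:
    """Rough melting temperature (deg C) using the Wallace rule of thumb."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2.0 * at + 4.0 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_clash(p1: str, p2: str, k: int = 4) -> bool:
    """True if the last k bases of p1 can base-pair with a stretch of p2."""
    tail = p1.upper()[-k:]
    return reverse_complement(tail) in p2.upper()


def check_primer_pool(primers, tm_window=2.0, gc_range=(50.0, 55.0), k=4):
    """Return a list of human-readable problems found in a {name: sequence} pool."""
    problems = []
    tms = {name: wallace_tm(seq) for name, seq in primers.items()}
    if max(tms.values()) - min(tms.values()) > tm_window:
        problems.append("Tm spread across the pool exceeds the allowed window")
    for name, seq in primers.items():
        gc = gc_percent(seq)
        if not gc_range[0] <= gc <= gc_range[1]:
            problems.append(f"{name}: GC content {gc:.1f}% outside range")
    for (n1, s1), (n2, s2) in combinations(primers.items(), 2):
        if three_prime_clash(s1, s2, k) or three_prime_clash(s2, s1, k):
            problems.append(f"{n1}/{n2}: possible 3' cross-complementarity")
    return problems
```

An empty list from `check_primer_pool` means only that none of these three coarse checks failed. Real designs would normally also consider self-dimers, hairpins, and amplicon specificity.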
Whole-exome sequencing

Whole-exome sequencing (WES) refers to the targeted sequencing of the exons of all annotated protein-coding genes in the genome. Based on current annotation, the human genome contains about 25,000 protein-coding genes, whose exons cover between 1% and 2% of the genome. Array-based and solution-based capture are the two main categories of whole-exome capture technologies. Array-based capture requires higher DNA input and is less flexible, and it has been largely replaced by the solution-based method. In the solution-based method, the DNA library is hybridized to a pool of biotinylated oligonucleotide probes. The biotinylated probe/target DNA hybrids are pulled down by streptavidin-coated magnetic beads, resulting in enrichment of the target regions.
Commercially available WES kits

Roche, Agilent, Illumina, and Integrated DNA Technologies (IDT) are the four major companies that provide commercial whole-exome enrichment kits, as shown in Table 2. Roche NimbleGen probes are DNA oligos synthesized on a microarray. The NimbleGen SeqCap v3.0 kit has approximately 2.1 million DNA probes covering 64 Mb and more than 20,000 genes in the human genome. Its probe density is high owing to an overlapping probe design [14]. Agilent SureSelect uses RNA probes instead of DNA probes. The 114–126 bp RNA probes are converted from DNA oligos that are synthesized on glass slides. Agilent probes reside immediately adjacent to one another across the target exon intervals. Agilent SureSelect Human All Exon V6 covers 60 Mb of target regions in 20,000 genes [15].

Illumina provides two different exome enrichment kits: the TruSeq exome enrichment kit and the Nextera rapid capture exome kit (Table 2). The probe design of the two Illumina kits leaves small gaps between probes in the target regions and relies on paired-end reads that extend beyond the probe sequence to fill the gaps. Instead of mechanical shearing, the Illumina Nextera kit uses transposons to fragment the DNA. It requires less DNA input (50 ng) than the other technologies. The Nextera kit exhibits increased coverage of high GC content regions due to the sequence bias of transposon fragmentation [14, 15].

Recently, IDT introduced its own exome kit, the xGen Exome Research Panel. The unique feature of the IDT exome kit is that all probes are individually synthesized and normalized before pooling to ensure that each probe is represented in the panel at the correct concentration. Its hybridization time is the shortest (4 h) among the different technologies, and its enrichment is more uniform.

TABLE 2 Comparison of WES kits.

| Feature | NimbleGen SeqCap EZ Human Exome Probes | Agilent SureSelect Human All Exon V6 | Illumina’s Nextera Rapid Capture Exome Kit | Illumina’s TruSeq Exome Enrichment Kit | IDT xGen® Exome Research Panel |
|---|---|---|---|---|---|
| Probe size | NP (a) | 114–126 bp | 95 bp | 95 bp | NP (a) |
| Probe type | DNA | RNA | DNA | DNA | DNA |
| Target region | 64 Mb | 60 Mb | 62 Mb | 51 Mb | 39 Mb |
| Input DNA | 1 μg | 100 ng | 50 ng | 100 ng | 500 ng |
| Adapter addition | Ligation | Ligation | Transposase | Ligation | Ligation |
| Hybridization hours | 72 | 16 | Up to 24 | Up to 24 | 4 |

(a) NP indicates information not provided.

After the first successful application of exome sequencing in discovering the causal gene of a rare Mendelian disorder [16], WES has become the most commonly used tool for Mendelian disease gene discovery. More than 1000 genes were identified between 2010 and 2014 owing to the early adoption of WES [17]. WES has also been widely applied to sequence patients with eye diseases since 2011 and has led to the discovery of over 100 new disease-associated genes. In addition, WES has been used in the molecular diagnosis of patients with eye diseases in both research and clinical settings [18–23].
Panel vs WES

Depending on the purpose, it is often only necessary to sequence a small set of genes instead of the entire exome. For example, to perform molecular diagnosis for many ocular disorders, it is often sufficient and more cost effective to screen for mutations using a gene panel approach, in which only genes that have been associated with the patient phenotype are enriched and sequenced. Since 2010, targeted disease gene panels have been developed by several research groups [24–26]. By focusing on the coding regions of known disease genes, it is feasible to achieve higher coverage and sensitivity at a lower price than with WES. However, with the continuously increasing throughput of NGS technology, reduced sequencing cost, and automation of experimental and analysis workflows, it has become feasible to sequence the human exome more quickly and affordably. As a result, gene panels are gradually being replaced by WES.
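The trade-off between panel sequencing, WES, and WGS can be made concrete by estimating how many raw bases each design needs. The following is a back-of-the-envelope sketch, assuming that mean depth equals on-target bases divided by target size. The 60 Mb exome target, the 3 billion base genome, and the 6 trillion base run yield are taken from this chapter. The 1 Mb panel size, the depths, and the 50% on-target fraction are hypothetical values chosen only for illustration.

```python
def bases_required(target_bp, mean_depth, on_target_fraction=1.0):
    """Raw bases needed to reach `mean_depth` over a target of `target_bp` bases.

    `on_target_fraction` is the share of sequenced bases that land on the
    target (1.0 for WGS, lower for capture-based designs).
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in the interval (0, 1]")
    return target_bp * mean_depth / on_target_fraction


def samples_per_run(run_yield_bp, target_bp, mean_depth, on_target_fraction=1.0):
    """Whole number of samples that fit into one run of `run_yield_bp` bases."""
    per_sample = bases_required(target_bp, mean_depth, on_target_fraction)
    return int(run_yield_bp // per_sample)


RUN_YIELD = 6e12  # bases per run, as quoted for the NovaSeq instrument

# Hypothetical designs: (target size in bp, mean depth, on-target fraction)
designs = {
    "panel": (1_000_000, 500, 0.5),
    "WES": (60_000_000, 100, 0.5),
    "WGS": (3_000_000_000, 30, 1.0),
}

for name, (target, depth, on_target) in designs.items():
    print(name, samples_per_run(RUN_YIELD, target, depth, on_target))
```

Under these assumed values, a single run accommodates far more panel or exome samples than genomes, which is why targeted approaches remain attractive when very high depth or large cohorts are needed. Practical planning would also account for duplicate reads and uneven coverage.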
Whole-genome sequencing technologies

Whole-genome sequencing (WGS) allows for sequencing of all 3 billion bases of the human genome, including the mitochondrial DNA. WGS follows a whole-genome shotgun sequencing approach and has a simpler workflow than WES because it skips the capture-enrichment step. The DNA sample undergoes library preparation, quantification, and sequencing. Initially, WGS was largely used as a research tool, but it has been gradually adopted in clinics as the cost of sequencing rapidly declines [27]. Because WGS covers the entire genome, a large number of variants, approximately 3–4 million, can be identified for each individual [28, 29].
WES vs WGS

Hundreds of researchers participated in the Human Genome Project, which was completed in 2003. The project took about 15 years and cost approximately 1 billion dollars. As sequencing and labor costs continue to decline rapidly, it is now possible to sequence an individual’s genome using WGS within a few days for $1000. WGS is the most comprehensive genetic test to date since it provides continuous coverage and identifies sequence variants throughout the genome. Direct comparison between WGS and WES data revealed that WGS is more powerful than WES in detecting exome variants due to greater uniformity of sequence read coverage, less bias in the detection of nonreference alleles, and more reliable CNV detection [30, 31].

It has been estimated that about 85% of the known genetic causes of Mendelian disorders are due to changes in protein-coding regions [32]. Recently, it has been reported that a molecular diagnostic rate of 25% is achieved when WES is applied to a large clinical cohort without a specific clinical diagnosis [33]. Limitations of WES might contribute to this relatively low positive rate. Not all targeted regions in WES can be efficiently captured, due to sequence duplication and sequence content (e.g., high G + C content). As a result, 10% of exons may not be covered at sufficient levels for reliable variant identification. Since gene annotation of the human genome is incomplete, the current WES design does not cover all coding regions of the genome. Moreover, exome sequencing is limited in its ability to detect certain types of mutations, such as large insertions and deletions (indels), chromosomal segment CNVs, and structural variations (SVs). Exome capture libraries have a bias toward the plus strand, causing significantly less coverage for genes on the minus strand [34]. WES also misses the 16-kb mitochondrial genome, which includes 13 protein-coding genes. Finally, it has been demonstrated that DNA variations outside the exons can affect gene activity and protein function and lead to genetic disorders, and WES would miss these variations.
In contrast, WGS generates more uniform sequence coverage and enables detection of structural variations, copy number changes, and deep intronic and intergenic mutations. In addition, WGS detects mutations in noncoding genes such as miRNAs and lncRNAs and can uncover mutations in the mitochondrial genome [35]. WGS at high sequencing depth can generate accurate assemblies of the entire mitochondrial genome and detect heteroplasmy [36, 37]. A recently developed PCR-free WGS protocol further reduces the potential GC bias and achieves coverage of all GC-rich first exons and of the genes recommended by the American College of Medical Genetics (ACMG) [38].

WGS has been used in the molecular diagnosis of rare diseases as proof of principle since 2010 [39–41], but its use in clinical settings was rare until recently. One of the first large-scale WGS projects was the UK10K project. Based on the success of this project, the United Kingdom’s 100,000 Genomes Project was launched with the goal of sequencing 100,000 genomes from UK National Health Service (NHS) patients who have a rare disease, an infectious disease, or cancer.

Researchers in the field of ocular genetics have also applied WGS to their cohorts. Recently, Ellingford et al. performed targeted NGS diagnostic testing in a cohort of 562 patients with inherited retinal diseases, 46 of whom also underwent WGS [42]. Direct comparison of the panel data and WGS data revealed that WGS successfully detected large deletions, variants in noncoding regions, complex insertion and deletion events, and additional variants not covered by the panel sequencing. In another study, Carss et al. performed WGS on 605 probands with inherited retinal diseases and demonstrated that WGS has advantages in detecting SVs, mutations in GC-rich regions, and mutations in regulatory regions [43]. As sequencing technology continues to advance and costs continue to decrease, WGS is expected to soon become routine in the diagnostic setting.
Long-read technologies

Although current short-read NGS technologies can effectively generate sequences for the vast majority of the genome, their ability to accurately resolve complex regions of the genome is limited. This is a significant shortcoming since about 50% of the human genome is composed of repetitive elements and pseudogenes. Accurate alignment and variant calling in these regions based on short reads is problematic. Several third-generation sequencing platforms have been developed recently to address these issues, such as Oxford Nanopore Technologies (ONT) sequencing [44] and single-molecule real-time (SMRT) sequencing by Pacific Biosciences (PacBio) [45]. These platforms produce significantly longer reads, with average read lengths of >10,000 bp and some reads of 100,000 bp or more, and thus have the potential to offer significant improvements over current short-read technologies. Furthermore, long reads enable improved “split-read” analysis so that indels, inversions, translocations, tandem/interspersed regions, and other structural changes can be more readily identified [46, 47]. Third-generation technology has already been used to create detailed maps of structural variations, to phase variants across large regions of human chromosomes, and to fill in gaps in the human reference genome [48–50].

SMRT sequencing by PacBio

PacBio introduced the first commercial platform using SMRT sequencing technology in 2010 [45, 51]. Hairpin adaptors are ligated to double-stranded DNA to form a single-stranded circular DNA template, which is called a SMRTbell. Upon loading onto the SMRT cell, each SMRTbell diffuses into a nanostructure called a zero-mode waveguide (ZMW), which confines light detection to a very small volume to reduce noise. In each ZMW, a single DNA polymerase binds to the hairpin adaptor of the SMRTbell and starts the replication process by incorporating fluorescently labeled dNTPs into the newly synthesized strand. The fluorescent signal is recorded by a charge-coupled device (CCD) camera, and each fluorescence pulse corresponds to an incorporated labeled dNTP and is converted into a base of the template sequence. PacBio sequencing generates long reads (average >15 kb, some reads >100 kb) with faster sequencing rates than short-read methods. Continuous improvement of this method to achieve higher throughput, a lower base error rate, and a lower cost per base is essential for further expanding its utility.

Nanopore sequencing by ONT

Another intriguing third-generation sequencing technology was developed by Oxford Nanopore, which released its nanopore-based single-molecule sequencing technology in 2014. It has been observed that the ionic current through a nanopore changes when biological molecules pass through it [52]. Furthermore, the ionic current depends on the shape of the molecule translocating through the pore. The current change is distinct for different nucleotides, allowing identification of the bases [53]. The key advantages of this approach are minimal sample preparation, long read length, and an inexpensive sequencer [54]. Further improvement of the technology to reduce the base error rate and the amount of input DNA is needed to expand its use [55].
Linked-reads sequencing by 10x Genomics

Linked-Reads is a virtual long-read technology commercialized by 10x Genomics in 2016. Using a microfluidic device, high molecular weight (HMW) DNA molecules are first sparsely partitioned into microdroplets so that each droplet has zero or one DNA molecule. Inside the droplet, each DNA molecule is fragmented and tagged with a unique barcode. Barcode-tagged DNA molecules are released from the droplets, undergo a regular library preparation step, and are sequenced on the Illumina platform [56, 57]. Based on the barcode information, sequencing reads can be linked to the originating HMW DNA molecule to generate a virtual long read. This enables the construction of long-range haplotype and structural variant information and allows for de novo diploid assembly of individual genomes.
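The idea of linking short reads back to their originating HMW molecule can be illustrated with a small grouping routine. The following is a simplified sketch and is not the vendor's software. It assumes that each aligned read is summarized as a hypothetical `(barcode, chromosome, position)` tuple and that reads sharing a barcode and lying within `max_gap` bases of each other on the same chromosome derive from one molecule. The 50 kb default for `max_gap` is an arbitrary illustrative value.

```python
from collections import defaultdict


def infer_molecules(reads, max_gap=50_000):
    """Group barcoded reads into inferred source molecules.

    reads: iterable of (barcode, chrom, pos) tuples.
    Returns a list of (barcode, chrom, start, end, n_reads) tuples, one per
    inferred molecule ("virtual long read").
    """
    positions_by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        positions_by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in sorted(positions_by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule and start a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        molecules.append((barcode, chrom, start, prev, count))
    return molecules
```

Each returned tuple gives the span covered by the reads of one inferred molecule, which is the long-range information that downstream phasing and structural variant analysis rely on.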
WES/WGS data analysis

Substantial efforts have been put into analyzing the large amount of data generated by WES and WGS. Most analysis procedures include the initial and refined alignment and mapping of sequence reads to the human reference genome, generation of a variant list, and annotation of the variant list. The final interpretation is based on a combination of previous publications, mutation databases, in silico variant prediction, gene function annotation, and clinical phenotypes. Due to the large capacity of the sequencing platforms, samples are often multiplexed by pooling uniquely barcoded libraries to reduce the cost. After the raw sequencing data are demultiplexed, Binary Alignment Map (BAM) files and variant call format (VCF) files are generated [58].

To identify potential pathogenic mutations, a set of factors is considered:
(1) whether the variant affects gene function, since most pathogenic mutations alter gene function;
(2) the minor allele frequency (MAF) in population databases such as ExAC, gnomAD, and the 1000 Genomes Project. Common variants with high allele frequencies should be filtered out using these control population databases;
(3) the mode of inheritance of the disease. In autosomal dominant inheritance, a mutation in one copy of the gene is sufficient to cause the phenotype, whereas in autosomal recessive inheritance, mutations in both copies are required for an individual to express the phenotype;
(4) whether the clinical phenotype matches the phenotype associated with the mutant gene, especially when the clinical diagnosis is uncertain; and
(5) whether the variant segregates with the disease in the family pedigree.

A variety of open-source algorithms and commercial software packages have been developed specifically for processing WES and WGS data. Examples include IMPACT, GotCloud, and SeqMule [59–61]. Since a large portion of variants have not been reported before, accurate annotation of these novel variants represents one of the biggest challenges in data analysis.
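The first three filtering factors described in this section (functional effect, population allele frequency, and mode of inheritance) can be sketched in a few lines. This is a minimal illustration under stated assumptions and is not the logic of IMPACT, GotCloud, SeqMule, or any other tool. The `Variant` fields, the set of effect labels, and the 0.5% allele frequency cutoff are hypothetical. The recessive rule treats two heterozygous variants in one gene as a possible compound heterozygote without checking phase.

```python
from collections import defaultdict
from dataclasses import dataclass

# Effect labels treated as likely to alter gene function (illustrative set).
DAMAGING_EFFECTS = {"nonsense", "frameshift", "splice_site", "missense"}


@dataclass(frozen=True)
class Variant:
    gene: str          # gene symbol
    effect: str        # predicted functional effect label
    max_pop_af: float  # highest allele frequency across population databases
    zygosity: str      # "het" or "hom"


def rare_and_damaging(variants, max_af):
    """Keep variants that are predicted damaging and rare in control populations."""
    return [
        v for v in variants
        if v.effect in DAMAGING_EFFECTS and v.max_pop_af <= max_af
    ]


def candidate_genes(variants, mode, max_af=0.005):
    """Return sorted gene symbols compatible with the given inheritance mode."""
    by_gene = defaultdict(list)
    for v in rare_and_damaging(variants, max_af):
        by_gene[v.gene].append(v)

    if mode == "dominant":
        # One rare damaging allele is enough.
        return sorted(by_gene)
    if mode == "recessive":
        # Require a homozygous variant or at least two variants in the gene.
        return sorted(
            gene for gene, vs in by_gene.items()
            if any(v.zygosity == "hom" for v in vs) or len(vs) >= 2
        )
    raise ValueError("mode must be 'dominant' or 'recessive'")
```

Genes passing this filter would still need to be evaluated against the remaining factors, namely the match between genotype and clinical phenotype and segregation of the variant in the family.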
Future direction/perspectives

We have witnessed the rapid development of NGS technologies over the past decade. These technologies have been widely utilized by the research and clinical community. Using WES and WGS as diagnostic tools has paved the road for precision medicine. With further improvements in sequencing technologies on the horizon, obtaining the complete genome sequence for each individual will likely become routine in the near future. The challenge will lie in the interpretation rather than the generation of sequencing data.
References [1] M. Singh, S.C. Tyagi, Genes and genetics in eye diseases: a genomic medicine approach for investigating hereditary and inflammatory ocular disorders, Int. J. Ophthalmol. 11 (2018) 117–134. [2] F. Sanger, S. Nicklen, A.R. Coulson, DNA sequencing with chain-terminating inhibitors, Proc. Natl. Acad. Sci. U. S. A. 74 (1977) 5463–5467. [3] M.N. Mandal, J.R. Heckenlively, T. Burch, L. Chen, V. Vasireddy, R.K. Koenekoop, P.A. Sieving, R. Ayyagari, Sequencing arrays for screening multiple genes associated with early-onset human retinal degenerations on a high-throughput platform, Invest. Ophthalmol. Vis. Sci. 46 (2005) 3355–3362. [4] A. Kurg, N. Tonisson, I. Georgiou, J. Shumaker, J. Tollett, A. Metspalu, Arrayed primer extension: solid-phase four-color DNA resequencing and mutation detection technology, Genet. Test. 4 (2000) 1–7. [5] R. Drmanac, A.B. Sparks, M.J. Callow, A.L. Halpern, N.L. Burns, B.G. Kermani, P. Carnevali, I. Nazarenko, G. B. Nilsen, G. Yeung, et al., Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays, Science 327 (2010) 78–81. [6] H. Stranneheim, J. Lundeberg, Stepping stones in DNA sequencing, Biotechnol. J. 7 (2012) 1063–1073. [7] T.J. Albert, M.N. Molla, D.M. Muzny, L. Nazareth, D. Wheeler, X. Song, T.A. Richmond, C.M. Middle, M. J. Rodesch, C.J. Packard, et al., Direct selection of human genomic loci by microarray hybridization, Nat. Methods 4 (2007) 903–905. [8] A. Gnirke, A. Melnikov, J. Maguire, P. Rogov, E.M. LeProust, W. Brockman, T. Fennell, G. Giannoukos, S. Fisher, C. Russ, et al., Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing, Nat. Biotechnol. 27 (2009) 182–189. [9] G.J. Porreca, K. Zhang, J.B. Li, B. Xie, D. Austin, S.L. Vassallo, E.M. LeProust, B.J. Peck, C.J. Emig, F. Dahl, et al., Multiplex amplification of large sets of human exons, Nat. Methods 4 (2007) 931–936. [10] M. Nilsson, H. Malmgren, M. Samiotaki, M. Kwiatkowski, B.P. 
Chowdhary, U. Landegren, Padlock probes: circularizing oligonucleotides for localized DNA detection, Science 265 (1994) 2085–2088.
[11] B.J. O’Roak, L. Vives, W. Fu, J.D. Egertson, I.B. Stanaway, I.G. Phelps, G. Carvill, A. Kumar, C. Lee, K. Ankenman, et al., Multiplex targeted sequencing identifies recurrently mutated genes in autism spectrum disorders, Science 338 (2012) 1619–1622.
[12] Y. Wang, M. Moorhead, G. Karlin-Neumann, N.J. Wang, J. Ireland, S. Lin, C. Chen, L.M. Heiser, K. Chin, L. Esserman, et al., Analysis of molecular inversion probe performance for allele copy number determination, Genome Biol. 8 (2007) R246.
[13] R. Tewhey, J.B. Warner, M. Nakano, B. Libby, M. Medkova, P.H. David, S.K. Kotsopoulos, M.L. Samuels, J.B. Hutchison, J.W. Larson, et al., Microdroplet-based PCR enrichment for large-scale targeted sequencing, Nat. Biotechnol. 27 (2009) 1025–1031.
[14] M.J. Clark, R. Chen, H.Y. Lam, K.J. Karczewski, R. Chen, G. Euskirchen, A.J. Butte, M. Snyder, Performance comparison of exome DNA sequencing technologies, Nat. Biotechnol. 29 (2011) 908–914.
[15] C.S. Chilamakuri, S. Lorenz, M.A. Madoui, D. Vodak, J. Sun, E. Hovig, O. Myklebost, L.A. Meza-Zepeda, Performance comparison of four exome capture systems for deep sequencing, BMC Genomics 15 (2014) 449.
[16] S.B. Ng, K.J. Buckingham, C. Lee, A.W. Bigham, H.K. Tabor, K.M. Dent, C.D. Huff, P.T. Shannon, E.W. Jabs, D.A. Nickerson, et al., Exome sequencing identifies the cause of a mendelian disorder, Nat. Genet. 42 (2010) 30–35.
[17] D. Salgado, M.I. Bellgard, J.P. Desvignes, C. Beroud, How to identify pathogenic mutations among all those variations: variant annotation and filtration in the genome sequencing era, Hum. Mutat. 37 (2016) 1272–1282.
[18] S. Zuchner, J. Dallman, R. Wen, G. Beecham, A. Naj, A. Farooq, M.A. Kohli, P.L. Whitehead, W. Hulme, I. Konidari, et al., Whole-exome sequencing links a variant in DHDDS to retinitis pigmentosa, Am. J. Hum. Genet. 88 (2011) 201–206.
[19] S.J. Bowne, M.M. Humphries, L.S. Sullivan, P.F. Kenna, L.C. Tam, A.S. Kiang, M. Campbell, G.M. Weinstock, D.C. Koboldt, L. Ding, et al., A dominant mutation in RPE65 identified by whole-exome sequencing causes retinitis pigmentosa with choroidal involvement, Eur. J. Hum. Genet. 19 (2011) 1074–1081.
[20] X. Wang, H. Wang, M. Cao, Z. Li, X. Chen, C. Patenia, A. Gore, E.B. Abboud, A.A. Al-Rajhi, R.A. Lewis, et al., Whole-exome sequencing identifies ALMS1, IQCB1, CNGA3, and MYO7A mutations in patients with Leber congenital amaurosis, Hum. Mutat. 32 (2011) 1450–1459.
[21] A. Takata, M. Kato, M. Nakamura, T. Yoshikawa, S. Kanba, A. Sano, T. Kato, Exome sequencing identifies a novel missense variant in RRM2B associated with autosomal recessive progressive external ophthalmoplegia, Genome Biol. 12 (2011) R92.
[22] I. Audo, K. Bujakowska, E. Orhan, C.M. Poloschek, S. Defoort-Dhellemmes, I. Drumare, S. Kohl, T.D. Luu, O. Lecompte, E. Zrenner, et al., Whole-exome sequencing identifies mutations in GPR179 leading to autosomal-recessive complete congenital stationary night blindness, Am. J. Hum. Genet. 90 (2012) 321–330.
[23] R.K. Koenekoop, H. Wang, J. Majewski, X. Wang, I. Lopez, H. Ren, Y. Chen, Y. Li, G.A. Fishman, M. Genead, et al., Mutations in NMNAT1 cause Leber congenital amaurosis and identify a new disease pathway for retinal degeneration, Nat. Genet. 44 (2012) 1035–1039.
[24] D.A. Simpson, G.R. Clark, S. Alexander, G. Silvestri, C.E. Willoughby, Molecular diagnosis for heterogeneous genetic diseases with targeted high-throughput DNA sequencing applied to retinitis pigmentosa, J. Med. Genet. 48 (2011) 145–151.
[25] I. Audo, K.M. Bujakowska, T. Leveillard, S. Mohand-Said, M.E. Lancelot, A. Germain, A. Antonio, C. Michiels, J.P. Saraiva, M. Letexier, et al., Development and application of a next-generation-sequencing (NGS) approach to detect known and novel gene defects underlying retinal diseases, Orphanet J. Rare Dis. 7 (2012) 8.
[26] X. Wang, H. Wang, V. Sun, H.F. Tuan, V. Keser, K. Wang, H. Ren, I. Lopez, J.E. Zaneveld, S. Siddiqui, et al., Comprehensive molecular diagnosis of 179 Leber congenital amaurosis and juvenile retinitis pigmentosa patients by targeted next generation sequencing, J. Med. Genet. 50 (2013) 674–688.
[27] C.G. van El, M.C. Cornel, P. Borry, R.J. Hastings, F. Fellmann, S.V. Hodgson, H.C. Howard, A. Cambon-Thomsen, B.M. Knoppers, H. Meijers-Heijboer, et al., Whole-genome sequencing in health care. Recommendations of the European Society of Human Genetics, Eur. J. Hum. Genet. 21 (Suppl. 1) (2013) S1–S5.
[28] K. Lohmann, C. Klein, Next generation sequencing and the future of genetic diagnosis, Neurotherapeutics 11 (2014) 699–707.
[29] M. Hegde, A. Santani, R. Mao, A. Ferreira-Gonzalez, K.E. Weck, K.V. Voelkerding, Development and validation of clinical whole-exome and whole-genome sequencing for detection of germline variants in inherited disease, Arch. Pathol. Lab. Med. 141 (2017) 798–805.
[30] A.M. Meynert, M. Ansari, D.R. FitzPatrick, M.S. Taylor, Variant detection sensitivity and biases in whole genome and exome sequencing, BMC Bioinformatics 15 (2014) 247.
[31] A. Belkadi, A. Bolze, Y. Itan, A. Cobat, Q.B. Vincent, A. Antipenko, L. Shang, B. Boisson, J.L. Casanova, L. Abel, Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants, Proc. Natl. Acad. Sci. U.S.A. 112 (2015) 5473–5478.
[32] C. Gilissen, A. Hoischen, H.G. Brunner, J.A. Veltman, Disease gene identification strategies for exome sequencing, Eur. J. Hum. Genet. 20 (2012) 490–497.
[33] Y. Yang, D.M. Muzny, F. Xia, Z. Niu, R. Person, Y. Ding, P. Ward, A. Braxton, M. Wang, C. Buhay, et al., Molecular findings among patients referred for clinical whole-exome sequencing, JAMA 312 (2014) 1870–1879.
[34] S.H. Lelieveld, M. Spielmann, S. Mundlos, J.A. Veltman, C. Gilissen, Comparison of exome and genome sequencing technologies for the complete capture of protein-coding regions, Hum. Mutat. 36 (2015) 815–822.
[35] W. Steyaert, S. Callens, P. Coucke, B. Dermaut, D. Hemelsoet, W. Terryn, B. Poppe, Future perspectives of genome-scale sequencing, Acta Clin. Belg. 73 (2018) 7–10.
[36] M. Li, A. Schonberg, M. Schaefer, R. Schroeder, I. Nasidze, M. Stoneking, Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes, Am. J. Hum. Genet. 87 (2010) 237–249.
[37] H. Goto, B. Dickins, E. Afgan, I.M. Paul, J. Taylor, K.D. Makova, A. Nekrutenko, Dynamics of mitochondrial heteroplasmy in three families investigated via a repeatable re-sequencing study, Genome Biol. 12 (2011) R59.
[38] J. Meienberg, R. Bruggmann, K. Oexle, G. Matyas, Clinical sequencing: is WGS the better WES? Hum. Genet. 135 (2016) 359–362.
[39] E.A. Ashley, A.J. Butte, M.T. Wheeler, R. Chen, T.E. Klein, F.E. Dewey, J.T. Dudley, K.E. Ormond, A. Pavlovic, A.A. Morgan, et al., Clinical assessment incorporating a personal genome, Lancet 375 (2010) 1525–1535.
[40] J.R. Lupski, J.G. Reid, C. Gonzaga-Jauregui, D. Rio Deiros, D.C. Chen, L. Nazareth, M. Bainbridge, H. Dinh, C. Jing, D.A. Wheeler, et al., Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy, N. Engl. J. Med. 362 (2010) 1181–1191.
[41] J.C. Roach, G. Glusman, A.F. Smit, C.D. Huff, R. Hubley, P.T. Shannon, L. Rowen, K.P. Pant, N. Goodman, M. Bamshad, et al., Analysis of genetic inheritance in a family quartet by whole-genome sequencing, Science 328 (2010) 636–639.
[42] J.M. Ellingford, S. Barton, S. Bhaskar, S.G. Williams, P.I. Sergouniotis, J. O’Sullivan, J.A. Lamb, R. Perveen, G. Hall, W.G. Newman, et al., Whole genome sequencing increases molecular diagnostic yield compared with current diagnostic testing for inherited retinal disease, Ophthalmology 123 (2016) 1143–1150.
[43] K.J. Carss, G. Arno, M. Erwood, J. Stephens, A. Sanchis-Juan, S. Hull, K. Megy, D. Grozeva, E. Dewhurst, S. Malka, et al., Comprehensive rare variant analysis via whole-genome sequencing to determine the molecular pathology of inherited retinal disease, Am. J. Hum. Genet. 100 (2017) 75–90.
[44] Y. Feng, Y. Zhang, C. Ying, D. Wang, C. Du, Nanopore-based fourth-generation DNA sequencing technology, Genomics Proteomics Bioinformatics 13 (2015) 4–16.
[45] A. Rhoads, K.F. Au, PacBio sequencing and its applications, Genomics Proteomics Bioinformatics 13 (2015) 278–289.
[46] C.S. Pareek, R. Smoczynski, A. Tretyn, Sequencing technologies and genome sequencing, J. Appl. Genet. 52 (2011) 413–435.
[47] K. Nakano, A. Shiroma, M. Shimoji, H. Tamotsu, N. Ashimine, S. Ohki, M. Shinzato, M. Minami, T. Nakanishi, K. Teruya, et al., Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area, Hum. Cell 30 (2017) 149–161.
[48] M.J. Chaisson, J. Huddleston, M.Y. Dennis, P.H. Sudmant, M. Malig, F. Hormozdiari, F. Antonacci, U. Surti, R. Sandstrom, M. Boitano, et al., Resolving the complexity of the human genome using single-molecule sequencing, Nature 517 (2015) 608–611.
[49] V. Kuleshov, D. Xie, R. Chen, D. Pushkarev, Z. Ma, T. Blauwkamp, M. Kertesz, M. Snyder, Whole-genome haplotyping using long reads and statistical methods, Nat. Biotechnol. 32 (2014) 261–266.
[50] M. Pendleton, R. Sebra, A.W. Pang, A. Ummat, O. Franzen, T. Rausch, A.M. Stutz, W. Stedman, T. Anantharaman, A. Hastie, et al., Assembly and diploid architecture of an individual human genome via single-molecule technologies, Nat. Methods 12 (2015) 780–786.
[51] J. Eid, A. Fehr, J. Gray, K. Luong, J. Lyle, G. Otto, P. Peluso, D. Rank, P. Baybayan, B. Bettman, et al., Real-time DNA sequencing from single polymerase molecules, Science 323 (2009) 133–138.
[52] H. Bayley, Nanopore sequencing: from imagination to reality, Clin. Chem. 61 (2015) 25–31.
[53] D. Stoddart, A.J. Heron, E. Mikhailova, G. Maglia, H. Bayley, Single-nucleotide discrimination in immobilized DNA oligonucleotides with a biological nanopore, Proc. Natl. Acad. Sci. U.S.A. 106 (2009) 7702–7707.
[54] D. Branton, D.W. Deamer, A. Marziali, H. Bayley, S.A. Benner, T. Butler, M. Di Ventra, S. Garaj, A. Hibbs, X. Huang, et al., The potential and challenges of nanopore sequencing, Nat. Biotechnol. 26 (2008) 1146–1153.
[55] Y. Wang, Q. Yang, Z. Wang, The evolution of nanopore sequencing, Front. Genet. 5 (2014) 449.
[56] G.X. Zheng, B.T. Lau, M. Schnall-Levin, M. Jarosz, J.M. Bell, C.M. Hindson, S. Kyriazopoulou-Panagiotopoulou, D.A. Masquelier, L. Merrill, J.M. Terry, et al., Haplotyping germline and cancer genomes with high-throughput linked-read sequencing, Nat. Biotechnol. 34 (2016) 303–311.
[57] S.U. Greer, L.D. Nadauld, B.T. Lau, J. Chen, C. Wood-Bouwens, J.M. Ford, C.J. Kuo, H.P. Ji, Linked read sequencing resolves complex genomic rearrangements in gastric cancer metastases, Genome Med. 9 (2017) 57.
[58] A. Magi, M. Benelli, A. Gozzini, F. Girolami, F. Torricelli, M.L. Brandi, Bioinformatics for next generation sequencing data, Genes (Basel) 1 (2010) 294–307.
[59] V. Chaitankar, G. Karakulah, R. Ratnapriya, F.O. Giuste, M.J. Brooks, A. Swaroop, Next generation sequencing technology and genomewide data analysis: perspectives for retinal research, Prog. Retin. Eye Res. 55 (2016) 1–31.
[60] S. Yohe, B. Thyagarajan, Review of clinical next-generation sequencing, Arch. Pathol. Lab. Med. 141 (2017) 1544–1557.
[61] J.D. Hintzsche, W.A. Robinson, A.C. Tan, A survey of computational tools to analyze and interpret whole exome sequencing data, Int. J. Genomics 2016 (2016) 7983236.