Identification of Genome-wide Copy Number Variations and a Family-based Association Study of Avellino Corneal Dystrophy

Joon Seol Bae, PhD,1 Hyun Sub Cheong, MS,2 Ji-Yong Chun, BS,1 Tae Joon Park, BS,1 Ji-On Kim, BS,2 Eun Mi Kim, MS,2 Miey Park, PhD,3 Dong-Joon Kim, MS,3 Eun-Ju Lee, PhD,3 Eung Kweon Kim, MD,4 Jong-Young Lee, PhD,3 Hyoung Doo Shin, PhD1,2

Objective: To determine the association of identified copy number variations (CNVs) in whole genome with the risk of Avellino corneal dystrophy (ACD) in a Korean population.

Design: Case-control study.

Participants: A total of 146 patients with ACD and 226 control subjects.

Methods: A total of 193 trios were genotyped by the Illumina HumanHapCNV370-Duo BeadChip (370 404 markers) (Illumina, Inc., San Diego, CA). The intensity signal (log R ratio) and allelic intensity ratio (B allele frequency) of each marker in all individuals were obtained by Illumina BeadStudio software (Illumina, Inc.). To obtain authentic CNVs in this study, we performed a family-based CNV validation and family-based boundary mapping using the PennCNV algorithm, which incorporates multiple factors, including total log R ratio, B allele frequency, and family information, based on an integrated hidden Markov model.

Main Outcome Measures: Statistical comparison and identification of CNVs between case and control using family information.

Results: We identified 27 267 individual trio CNVs with a median size of 16.2 kb, aggregated in 2245 CNV regions. Most of the identified trio CNVs in this study showed well-defined CNV boundaries and overlapped with those in the Database of Genomic Variants (DGV) (83.4% in number and 79.2% in length). With the common CNV regions (264 CNV regions with frequency >5%), we performed a family-based association test with the risk of ACD.

Conclusions: Two CNV regions (chr6:29978470-29987783 and chr14:59896944-59916129) were significantly associated with the risk of ACD (P = 0.003–0.05 and P = 0.008, respectively). This study describes the first results of a genome-wide association analysis of individual CNVs with the risk of ACD and shows that 2 novel CNV loci may be involved in the risk of ACD.

Financial Disclosure(s): The author(s) have no proprietary or commercial interest in any materials discussed in this article.

Ophthalmology 2010;117:1306–1312 © 2010 by the American Academy of Ophthalmology.
© 2010 by the American Academy of Ophthalmology. Published by Elsevier Inc. ISSN 0161-6420/10/$–see front matter. doi:10.1016/j.ophtha.2009.11.021

Bae et al · CNV and Avellino Corneal Dystrophy

Although association studies of copy number variations (CNVs) are at an early stage compared with those of single-nucleotide polymorphisms (SNPs), CNVs are now recognized as one of the major contributors to human genetic diversity. They likely influence gene expression, phenotypic variation, and adaptation by altering gene dosage. Several studies have reported genetic associations of CNVs with various human diseases, including autism, autoimmune diseases, osteoporosis, and rheumatoid arthritis.1–4 To date, more than 38,406 variants and 6558 CNV regions (CNVRs) (loci) have been reported in the Database of Genomic Variants (DGV) (last updated in March 2009, available at: http://projects.tcag.ca/variation), although several studies have pointed out that the real sizes of CNVs are smaller than those listed in the DGV.5,6 This suggests that more accurate and sensitive CNV identification methods are required. To overcome limitations in CNV identification based on the signal intensity information of the markers, an advanced algorithm using total signal intensity (log R ratio [LRR]) and allelic intensity ratio (B allele frequency [BAF]) has been developed.7,8 A CNV identification algorithm that incorporates multiple factors, including LRR, BAF, and family information, was recently developed. Called PennCNV, it is designed for high-resolution CNV detection in whole-genome SNP genotyping data.9,10 By making use of multiple factors, this algorithm can improve the sensitivity of CNV detection and the accuracy of CNV boundary mapping through family-based validation.9–11 Several researchers have used PennCNV to identify CNVs in SNP genotyping microarray data.4,11–13

Avellino corneal dystrophy (ACD) is a rare disorder with an autosomal dominant pattern of inheritance.14 This disease is characterized by granular deposits in the subepithelial and anterior stromal layers.15 Although most ACD cases seem to be caused by the R124H mutation of the transforming growth factor beta-induced protein (TGFBI),16,17 this mutation does not fully explain the cause of ACD. There have been cases in which ACD did not occur in carriers of the mutation, and the physiologic process by which the mutation causes ACD is still unclear. Recent reports suggest that other genetic factors may affect the onset of ACD.18 Moreover, previous studies examined specific genes in small numbers of samples; accurately discovering genetic factors requires a whole-genome study of a large number of samples. The current study reports on the genome-wide trio CNVs identified using LRR, BAF, and family information, and on the results from a family-based association test (FBAT) with the risk of ACD in a Korean population.

Figure 1. Karyotype map showing the genomic distribution of common CNVRs in Korean subjects. Common CNVRs are aggregated from all identified trio CNVs in this study. The genomic distribution of common CNVRs is represented by boxes using the Ideogram Browser. CNV = copy number variation; CNVR = copy number variation region.
Materials and Methods

Subjects and Whole-Genome Single-Nucleotide Polymorphism Genotyping

All individuals included in this study were of Korean ethnic origin. The 193 trios consisted of 146 patients with ACD (64 male and 82 female), 226 control subjects (111 male and 115 female), and 3 unknown individuals. All individuals were recruited from the Korea National Institute of Health. All the subjects in this study, including the controls, came from families with ACD. Cases were selected on the basis of granular stromal opacity on slit-lamp examination and the R124H mutation of the TGFBI gene on direct sequencing. All blood for genetic tests was donated after the subjects signed the informed consent form that was approved by the institutional review board. The age of the patients ranged from 5 to 80 years (mean = 38.1 years, standard deviation [SD] = 17.5), and the age of the controls ranged from 4 to 79 years (mean = 39.7 years, SD = 18.0).

We used the Illumina HumanHapCNV370-Duo BeadChip, a high-density BeadChip that contains 370 404 SNP/CNV markers (Illumina, Inc., San Diego, CA) and 79 631 probes for approximately 3034 DGV regions identified in the genome.19 The mean and median marker spacings are 7.9 and 5.0 kb, respectively, and 44% of all markers on the BeadChip lie within known RefSeq genes. In addition, the HumanHapCNV370-Duo BeadChip was also designed for detection of novel CNVRs (~9 K). A total of 370 404 SNP/CNV markers for genotyping data were collected for the genome-wide association study. Approximately 750 ng of genomic DNA, extracted from the blood of the 193 trios, was used to genotype each sample. The assay procedure has been described in our previous study.20 The overall SNP genotyping call rate was 99.84%, an indication of the high quality of the data in this study.
Ophthalmology Volume 117, Number 7, July 2010

Table 1. Summary of Identified Trio Copy Number Variations and Aggregated Regions (193 Trios, n = 375)

Type | No. of CNVs | Mean No. of CNVs per Sample | Mean Size of CNVs (kb) | Median Size of CNVs (kb) | No. of Gain | No. of Loss | Ratio (Loss/Gain) | No. of Common CNVs (Frequency > 1%) | No. of Common CNVs (Frequency > 5%)
Family-based individual trio CNV* | 27,267 | 72.7 | 44.6 | 16.9 | 9040 | 18,227 | 2.0 | 1300 | 210
CNVR | 2245 | 6.0 | 94.2 | 35.6 | — | — | — | 871 | 264

CNV = copy number variation; CNVR = copy number variation region. *Trio CNVs were detected by the PennCNV algorithm in each father-mother-offspring trio.

Identification of Individual Family-based Trio Copy Number Variations

All signal intensity (LRR) and allelic intensity (BAF) ratios of the 193 trios were exported from Illumina BeadStudio 3.2 software (Illumina, Inc.). Because samples with an SD of LRR >0.24 were expected to produce false-positive CNV calls, only samples with a call rate >99.0% and an SD of LRR <0.24 were included in this study. To identify individual CNVs, we incorporated multiple factors, including LRR, BAF, marker distance, and population frequency of the B allele using PennCNV.10 In the case of the X chromosome, separate clustering was performed in male (n = 177) and female (n = 198) participants. To make a gender file, an automatic estimation of gender was performed using Illumina BeadStudio 3.2 software. The “-chrX” argument was used to identify individual trio CNVs on the X chromosome.
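The sample-level inclusion criteria stated in this subsection (call rate >99.0% and SD of LRR <0.24) can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the pipeline used in the study: the `Sample` record, its field names, and the choice of the population standard deviation are hypothetical.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import List

# Thresholds quoted in the text; everything else below is illustrative.
MIN_CALL_RATE = 0.99
MAX_LRR_SD = 0.24


@dataclass
class Sample:
    sample_id: str      # hypothetical identifier
    call_rate: float    # fraction of markers with a genotype call
    lrr: List[float]    # log R ratio values across markers


def passes_qc(sample: Sample) -> bool:
    """Return True if the sample meets both inclusion criteria."""
    # Assumption: population SD; the text does not say which estimator was used.
    lrr_sd = pstdev(sample.lrr)
    return sample.call_rate > MIN_CALL_RATE and lrr_sd < MAX_LRR_SD


def filter_samples(samples: List[Sample]) -> List[Sample]:
    """Keep only samples that pass the call-rate and LRR-noise filters."""
    return [s for s in samples if passes_qc(s)]
```

A sample with noisy intensities (large LRR spread) or a low call rate is dropped before any CNV calling, which is the stated purpose of the filter.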
Aggregation of Trio Copy Number Variations to Copy Number Variation Regions

CNVRs were aggregated from the identified trio CNVs such that the resulting regions did not overlap one another. IdeogramBrowser software was used to depict a genomic distribution map of the identified common CNVRs (Fig 1). This software is available at http://www.informatik.uni-ulm.de/ni/staff/HKestler/ideo (accessed May 2009).
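One common way to aggregate individual CNV calls into non-overlapping regions is to merge any calls that overlap on the same chromosome. The sketch below shows that approach; it assumes that any overlap triggers a merge, which the text does not state explicitly, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def merge_into_cnvrs(cnvs):
    """Merge overlapping CNV calls into non-overlapping regions.

    cnvs: iterable of (chrom, start, end) tuples with start <= end.
    Returns a dict mapping chrom -> sorted list of (start, end) regions.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))

    regions = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                # Overlaps the current region: extend its right boundary.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                # No overlap: start a new region.
                merged.append([start, end])
        regions[chrom] = [tuple(r) for r in merged]
    return regions
```

Under this rule, a region grows as long as successive calls keep overlapping, so CNVRs are generally larger than the individual CNVs they contain.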
Family-based Association Test

An FBAT was performed on the marker-level CNV status of each individual within the common CNVRs (264 CNVRs with frequency >5%). We generated separate input data for loss and for gain within the common CNVRs per sample (copy number 0 or 1: loss; 2: normal; 3 or 4: gain). In the case of loss, CNV status (0), (1), and (2, 3, and 4) were coded as “A_A,” “A_B,” and “B_B,” respectively. In the case of gain, CNV status (0, 1, and 2), (3), and (4) were coded as “A_A,” “A_B,” and “B_B,” respectively. The gender and phenotype in 70 large families were merged into the 2 types of input data. Golden HelixTree software (Golden Helix, Inc., Bozeman, MT; http://www.goldenhelix.com) was used for the FBAT analysis, with the R124H mutation of the TGFBI gene as a covariate.
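The recoding of copy-number status into biallelic pseudo-genotypes described in this subsection can be written as two small lookup functions. This is a sketch of the coding scheme as stated in the text only; the function names are hypothetical, and the input format expected by the association software is not reproduced here.

```python
def code_for_loss(copy_number: int) -> str:
    """Recode copy number for the loss analysis (0 -> A_A, 1 -> A_B, 2-4 -> B_B)."""
    if copy_number == 0:
        return "A_A"
    if copy_number == 1:
        return "A_B"
    if copy_number in (2, 3, 4):
        return "B_B"
    raise ValueError(f"unexpected copy number: {copy_number}")


def code_for_gain(copy_number: int) -> str:
    """Recode copy number for the gain analysis (0-2 -> A_A, 3 -> A_B, 4 -> B_B)."""
    if copy_number in (0, 1, 2):
        return "A_A"
    if copy_number == 3:
        return "A_B"
    if copy_number == 4:
        return "B_B"
    raise ValueError(f"unexpected copy number: {copy_number}")
```

Coding loss and gain separately lets each analysis treat the CNV as a biallelic marker, at the cost of collapsing the states that are not of interest into a single category.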
Results

By using the Illumina SNP genotyping array and the PennCNV algorithm, we identified 27 267 trio CNVs and 2245 CNVRs in 193 trios. A summary of the identified trio CNVs and aggregated CNVRs is shown in Table 1. The number of trio CNVs per sample (mean = 72.7) was 1.5 times higher than that of the raw CNVs (mean = 48.1) (Table 2, available at http://aaojournal.org). The mean and median sizes of the 2245 CNVRs aggregated from the trio CNVs were 94.2 kb and 35.6 kb, respectively. The CNVRs covered 6.9% (211.5 Mb) of the whole human genome. Figure 1 shows the genomic distribution of 264 common CNVRs (>5% CNV frequency) and 871 common CNVRs (>1% CNV frequency). Figure 2 (available at http://aaojournal.org) shows the size distribution of CNVs and CNVRs. Figure 3 (available at http://aaojournal.org) shows the distribution of identified trio CNVs across chromosomes. Chromosome 6, which contains the major histocompatibility complex, showed the highest number of trio CNVs (mean size: ~14 kb). In line with previous results, the number of deletions in this study was approximately 2-fold higher than the number of duplications.10

As an example, one common CNVR (chr3:163389102-165625781) is shown in Figure 4A (available at http://aaojournal.org) using the University of California Santa Cruz Genome Browser (http://genome.ucsc.edu). The X-axis represents the chromosome position within this CNVR, and the black bars on the Y-axis display identified trio CNVs in this study. When compared with the DGV for this region, most of the trio CNVs showed a well-defined CNV boundary with small size. Figure 4B (available at http://aaojournal.org) shows 8 consecutive genoplot images of SNP/CNV markers within this CNVR. Each genoplot image displays the SNP genotype and signal intensity of each sample, as well as the copy number status inferred from that intensity. The 8 consecutive genoplot images showed that the copy number status of each sample matched across consecutive markers. Other common CNVRs also showed patterns in consecutive markers similar to those in Figure 4B (available at http://aaojournal.org) (data not shown).

Figure 5 compares the size distribution of the identified trio CNVs in this study with that of previously reported CNVs in the DGV (version: March 2009). The number of trio CNVs with small size (10~100 kb) was higher than that in the DGV. On the other hand, the number of trio CNVs with large size (>100 kb) was lower than that in the DGV. In addition, 22 743 (83.4% in number) and 964.4 Mb (79.2% in length) of the identified trio CNVs in this study overlapped with the DGV (Fig 6, available at http://aaojournal.org).

To examine the genetic effects of trio CNVs on the risk of ACD, we performed an FBAT using Golden HelixTree software, controlling for the R124H mutation of the TGFBI gene as a covariate. Table 3 shows the FBAT results for ACD. We found that chr6:29978470-29987783 with 13 consecutive markers (length: ~9.31 kb) and chr14:59896944-59916129 with 3 consecutive markers (length: ~19.1 kb) were significantly associated with the risk of ACD (P = 0.003–0.05 and P = 0.008, respectively). Furthermore, those CNVRs contained several genes, including the major histocompatibility complex, class I, G (HLA-G precursor; ENSG00000204632), HLA-B–associated transcript 1 pseudogene 1 (BAT1P; ENSG00000217224), and mitochondrial coiled-coil domain 1 pseudogene (MCCD1P1; ENSG00000217266), in the HapMap Genome Browser (http://www.hapmap.org/cgi-perl/gbrowse/hapmap27_B36) (Table 3; Fig 7A, available at http://aaojournal.org). The expanded region within 200 kb covered 5 genes, including HLA-G, HLA-A, HCG9, PPM1A, and C14orf39 (Figs 7B and 8B, available at http://aaojournal.org).

Figure 5. Comparison of size distribution of both identified CNVs in this study and previously reported CNVs in the DGV. CNV = copy number variation; DGV = Database of Genomic Variants.

Table 3. Whole-Genome, Family-Based Association Test of Common Copy Number Variation Regions with Risk of Avellino Corneal Dystrophy in a Korean Population (70 Large Families, n = 375)

CNV Region | Cytoband | Gene* | Nearby Gene within 200 kb
Chr6:29978470-29987783 | 6p21.33 | ENSG00000204632 (HLA-G precursor), ENSG00000217224 (BAT1P), ENSG00000217266 (MCCD1P1) | HLA-G,‡ HLA-A,§ HCG9
Chr14:59896944-59916129 | 14q23.1 | None | PPM1A,‖ C14orf39

CNV Region | Marker Name | Chromosome | Position | Allele | Frequency | No. of Informative Families | P Value†
Chr6:29978470-29987783 | rs9259545 | 6 | 29,978,470 | Loss | 0.120 | 19 | 0.05
Chr6:29978470-29987783 | rs1611522 | 6 | 29,980,430 | Loss | 0.133 | 22 | 0.006
Chr6:29978470-29987783 | rs1611523 | 6 | 29,980,521 | Loss | 0.143 | 22 | 0.003
Chr6:29978470-29987783 | rs1611528 | 6 | 29,981,029 | Loss | 0.143 | 22 | 0.003
Chr6:29978470-29987783 | rs13204550 | 6 | 29,981,555 | Loss | 0.143 | 22 | 0.003
Chr6:29978470-29987783 | rs2428530 | 6 | 29,982,364 | Loss | 0.143 | 20 | 0.003
Chr6:29978470-29987783 | rs9259606 | 6 | 29,982,594 | Loss | 0.145 | 20 | 0.003
Chr6:29978470-29987783 | rs2428527 | 6 | 29,983,504 | Loss | 0.145 | 20 | 0.003
Chr6:29978470-29987783 | rs9259650 | 6 | 29,984,419 | Loss | 0.145 | 20 | 0.007
Chr6:29978470-29987783 | rs28592582 | 6 | 29,985,843 | Loss | 0.141 | 19 | 0.01
Chr6:29978470-29987783 | rs2975022 | 6 | 29,986,412 | Loss | 0.137 | 18 | 0.05
Chr6:29978470-29987783 | rs440908 | 6 | 29,986,717 | Loss | 0.137 | 18 | 0.05
Chr6:29978470-29987783 | rs2734928 | 6 | 29,987,783 | Loss | 0.137 | 18 | 0.05
Chr14:59896944-59916129 | cnvcnv2007:105PP168 | 14 | 59,896,944 | Loss | 0.107 | 16 | 0.008
Chr14:59896944-59916129 | cnvcnv2007:105PP326 | 14 | 59,914,175 | Loss | 0.107 | 16 | 0.008
Chr14:59896944-59916129 | cnvcnv2007:105PP344 | 14 | 59,916,129 | Loss | 0.105 | 16 | 0.008

BAT1P = HLA-B–associated transcript 1 pseudogene 1; C14orf39 = chromosome 14 open reading frame 39; CNV = copy number variation; HCG9 = PERB11 family member in MHC class I region; HLA-A = major histocompatibility complex, class I, A; HLA-G = major histocompatibility complex, class I, G; MCCD1P1 = mitochondrial coiled-coil domain 1 pseudogene 1; PPM1A = protein phosphatase magnesium-dependent 1A.
*HapMap Genome Browser (http://www.hapmap.org/cgi-perl/gbrowse/hapmap27_B36).
†P values for each marker were obtained by using an FBAT controlling for the R124H mutation of the TGFBI gene as a covariate.
‡Online Mendelian Inheritance in Man (OMIM)-associated gene (142871), disease class: Aging, Cancer, Cardiovascular, Immune, Infection, Neurological, Psych, Reproduction.
§OMIM-associated gene (142800), disease class: Immune, Infection, Reproduction.
‖OMIM-associated gene (606108).
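The Results report overlap with the DGV in two ways: by number of CNVs (83.4%) and by total length (79.2%). The Python sketch below shows one way such figures could be computed for intervals on a single chromosome. It is an assumption-laden illustration rather than the method used in the study: it counts any overlap of at least 1 bp, treats coordinates as half-open intervals, and uses hypothetical function names.

```python
def merge(intervals):
    """Merge overlapping (start, end) intervals so lengths are not double counted."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_stats(calls, reference):
    """Return (fraction of calls overlapping the reference, fraction of call length overlapped).

    calls, reference: lists of (start, end) on the same chromosome,
    treated as half-open intervals [start, end).
    """
    ref = merge(reference)
    n_overlapping = 0
    total_len = 0
    overlapped_len = 0
    for start, end in calls:
        total_len += end - start
        # Sum the bases this call shares with each merged reference interval.
        shared = sum(
            max(0, min(end, r_end) - max(start, r_start)) for r_start, r_end in ref
        )
        if shared > 0:
            n_overlapping += 1
        overlapped_len += shared
    return n_overlapping / len(calls), overlapped_len / total_len
```

The two fractions can differ noticeably: a call that only clips a reference variant counts fully toward the number-based figure but contributes little to the length-based one.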
Discussion

To identify authentic individual CNVs, we used family information and a high-density SNP genotyping array, the Illumina HumanHapCNV370-Duo BeadChip, which contains 370 404 markers and 3034 DGV regions. Technical advantages of this SNP genotyping array include high-resolution CNV detection and authentic CNV identification.10 Log R ratio refers to the total fluorescence intensity signal from both sets of probes/alleles at each SNP marker, whereas BAF indicates the relative ratio of the fluorescence signals between the 2 probes/alleles at each SNP marker.10,21 The BAF and SNP genotypes are useful for analyzing family information and for sensitive CNV identification. In the case of deletion, the loss of 1 copy, referred to as a hemizygous deletion, can be detected by a decrease in LRR. In addition, all SNPs in this region are visualized in the B allele plot as loss of heterozygosity.19 In the case of single-copy duplication, each SNP marker may have 1 of 4 CNV genotypes (AAA, AAB, ABB, or BBB), with corresponding BAF values of 0, 0.33, 0.67, and 1.0, respectively. These distinct BAF values provide additional information for identifying authentic duplications.9,19 Furthermore, applying family information to the identification of reliable CNVs could increase the sensitivity and accuracy of results through CNV boundary mapping and family-based validation procedures. Tables 1 and 2 (available at http://aaojournal.org) illustrate the value of family information for identifying authentic CNVs. Comparison of trio CNVs with raw CNVs shows that the numbers of identified CNVs, small-size CNVs (<10 kb), and common CNVs (>1%) were higher in the trio CNVs, whereas the mean and median sizes of trio CNVs were smaller than those of raw CNVs, suggesting that well-defined CNV boundaries could be obtained through the use of family information. This means that the family-based validation procedure refined the rough, raw CNVs in each individual to accurate trio CNVs and led to the identification of missing CNVs.9 In addition, most of the identified trio CNVs in this study overlapped with reported CNVs in the DGV (83.4% in number and 79.2% in length). The number of small-size CNVs (1–100 kb) was also higher than in the DGV (Fig 5). These facts support the idea that the trio CNVs identified in this study are reliable and well defined. Additional evidence for the authenticity of the CNVs is the matching copy number status of each individual across consecutive SNP/CNV markers (Fig 4, available at http://aaojournal.org). The copy number status of all individuals can be detected through the intensity change shown in the genoplot images of consecutive SNP/CNV markers. One dot in a genoplot image represents the allelic intensity ratio (X-axis) and normalized intensity value (Y-axis) of 1 individual.20,21 In a previous study, we reported that the 3 types of copy number status (0X, 1X, and 2X) displayed in genoplot images were validated by quantitative polymerase chain reaction.20 In this study, 8 genoplot images showed that the 3 types of copy number status of each individual matched consecutively from the start marker to the end marker, which indicates that the trio CNVs identified in this study are reliable.
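The relationship between CNV genotype and the two signal measures discussed above can be made concrete with a small sketch. Under an idealized, noise-free model, the expected BAF of a genotype is simply the fraction of B alleles among its copies (so AAA gives 0 and BBB gives 1), and LRR is conventionally defined as the base-2 logarithm of observed over expected total intensity. The function names below are hypothetical, and real array data are noisier than this model suggests.

```python
import math


def expected_baf(genotype: str) -> float:
    """Idealized B allele frequency: number of B alleles divided by copy number."""
    if not genotype or set(genotype) - {"A", "B"}:
        raise ValueError(f"genotype must consist of 'A' and 'B' only: {genotype!r}")
    return genotype.count("B") / len(genotype)


def log_r_ratio(observed_intensity: float, expected_intensity: float) -> float:
    """LRR as log2(observed / expected) total signal intensity."""
    return math.log2(observed_intensity / expected_intensity)


# Expected BAF clusters for a single-copy duplication (3 copies).
duplication_bafs = {g: round(expected_baf(g), 2) for g in ("AAA", "AAB", "ABB", "BBB")}
```

For a hemizygous deletion, only the genotypes A and B are possible, so the BAF values collapse to 0 and 1 (apparent loss of heterozygosity) while the LRR drops below zero.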
Avellino corneal dystrophy is a rare disorder with an autosomal dominant pattern of inheritance.14 The phenotype of ACD is the formation of corneal opacities in different layers of the cornea, which may contribute to significant impairment of corneal transparency and refraction.22 The R124H mutation in the TGFBI gene has been considered a major contributor to the severity of ACD from family to family,23,24 but the disease-causing mechanism in TGFBI is still unclear. In addition, previous studies have used only small sample sizes, and global association analysis to identify other susceptibility loci has not been performed at the whole-genome level. Cao et al18 recently claimed that although the R124H mutation is one of the genetic causes of this disease, different genetic and environmental factors might influence the expressivity and the penetrance of ACD. To identify ACD susceptibility loci, we performed an FBAT in 70 large families (n = 375), controlling for the R124H mutation of the TGFBI gene as a covariate. For copy number loss, we found that 2 CNVRs were significantly associated with the risk of ACD. One CNVR (chr6:29978470-29987783) overlapped with previously reported CNVs, mainly representing deletions. This region includes the HLA-G precursor, BAT1P, and MCCD1P1 genes. The expanded region within 200 kb covered 3 genes, including 2 Online Mendelian Inheritance in Man-associated genes (Table 3; Fig 7, available at http://aaojournal.org). Previous studies have indicated that CNVs can affect the expression of genes located in their genomic neighborhoods.25,26 A recent study showed that expression levels of genes within 50 to 250 kb of CNV breakpoints can be influenced by those CNVs.27 In addition, although the second CNVR (chr14:59896944-59916129) was not found to contain any genes, the protein phosphatase magnesium-dependent 1A (PPM1A) and chromosome 14 open reading frame 39 (C14orf39) genes were detected within 200 kb of the region. The PPM1A gene, which encodes a serine/threonine protein phosphatase, is essential for regulating cellular stress responses.28 We suggest that the immune disease-related genes (HLA-A and HLA-G) and a component gene (PPM1A) of stress-activated pathways may be causative factors of ACD. Further studies are required to assess the relationship between TGFBI and the susceptibility genes identified in this study, because the disease-causing mechanism of ACD in TGFBI has not been made clear and the formation of abnormal accumulations resulting from mutant TGFBI may require other specific factors.29

Until now, the relationship between CNVs and pseudogenes has been unclear. However, a recent study reported that pseudogenes were associated with segmental duplications,30 and it has been suggested that they mediate CNV formation.23,31,32 The inclusion of the BAT1P and MCCD1P1 pseudogenes in the CNVR (chr6:29978470-29987783) might be explained by the mediation of CNV formation through segmental duplications. In addition, this CNVR was located near a reported segmental duplication (chr6:30071718-30079011) in the University of California Santa Cruz Genome Browser (http://genome.ucsc.edu).
In conclusion, we have identified several trio CNVs and CNVRs using PennCNV, a method that incorporates multiple factors, including signal intensity (LRR), allelic intensity ratio (BAF), and family information. Most of the identified trio CNVs in this study represent small-length, enriched, common CNVRs with well-defined boundaries that overlap the DGV-defined regions. Furthermore, we performed a genome-wide FBAT using the identified trio CNVs. We described the first whole-genome FBAT of CNVs with the risk of ACD and found that 2 CNVRs may be involved in the risk of ACD in a Korean population.

Acknowledgments. The authors thank Dr. Kai Wang (Department of Genetics, University of Pennsylvania, and Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania) for providing highly useful scripts for PennCNV.
References

1. Aitman TJ, Dong R, Vyse TJ, et al. Copy number polymorphism in Fcgr3 predisposes to glomerulonephritis in rats and humans. Nature 2006;439:851–5.
2. Gonzalez E, Kulkarni H, Bolivar H, et al. The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science 2005;307:1434–40.
3. Marshall CR, Noor A, Vincent JB, et al. Structural variation of chromosomes in autism spectrum disorder. Am J Hum Genet 2008;82:477–88.
4. Yang S, Wang K, Gregory B, et al. Genomic landscape of a three-generation pedigree segregating affective disorder. PLoS One 2009;4:e4474.
5. McCarroll SA, Kuruvilla FG, Korn JM, et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet 2008;40:1166–74.
6. Perry GH, Ben-Dor A, Tsalenko A, et al. The fine-scale and complex architecture of human copy-number variation. Am J Hum Genet 2008;82:685–95.
7. Itsara A, Cooper GM, Baker C, et al. Population analysis of large copy number variants and hotspots of human genetic disease. Am J Hum Genet 2009;84:148–61.
8. Marioni JC, White M, Tavare S, Lynch AG. Hidden copy number variation in the HapMap population. Proc Natl Acad Sci U S A 2008;105:10067–72.
9. Wang K, Chen Z, Tadesse MG, et al. Modeling genetic inheritance of copy number variations. Nucleic Acids Res 2008;36:e138.
10. Wang K, Li M, Hadley D, et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res 2007;17:1665–74.
11. Jakobsson M, Scholz SW, Scheet P, et al. Genotype, haplotype and copy-number variation in worldwide human populations. Nature 2008;451:998–1003.
12. Glessner JT, Wang K, Cai G, et al. Autism genome-wide copy number variation reveals ubiquitin and neuronal genes. Nature 2009;459:569–73.
13. Need AC, Ge D, Weale ME, et al. A genome-wide investigation of SNPs and CNVs in schizophrenia. PLoS Genet 2009;5:e1000373.
14. Klintworth GK. Advances in the molecular genetics of corneal dystrophies. Am J Ophthalmol 1999;128:747–54.
15. Holland EJ, Daya SM, Stone EM, et al. Avellino corneal dystrophy. Clinical manifestations and natural history. Ophthalmology 1992;99:1564–8.
16. Park KA, Ki CS, Chung ES, Chung TY. Deep anterior lamellar keratoplasty in Korean patients with Avellino dystrophy. Cornea 2007;26:1132–5.
17. Tsujikawa K, Tsujikawa M, Watanabe H, et al. Allelic homogeneity in Avellino corneal dystrophy due to a founder effect. J Hum Genet 2007;52:92–7.
18. Cao W, Ge H, Cui X, et al. Reduced penetrance in familial Avellino corneal dystrophy associated with TGFBI mutations. Mol Vis 2009;15:70–5. Available at: http://www.molvis.org/molvis/v15/a7/mv-v15-a7-cao.pdf. Accessed October 12, 2009.
19. Shaikh TH. Oligonucleotide arrays for high-resolution analysis of copy number alteration in mental retardation/multiple congenital anomalies. Genet Med 2007;9:617–25.
20. Bae JS, Cheong HS, Kim JO, et al. Identification of SNP markers for common CNV regions and association analysis of risk of subarachnoid aneurysmal hemorrhage in Japanese population. Biochem Biophys Res Commun 2008;373:593–6.
21. Peiffer DA, Le JM, Steemers FJ, et al. High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res 2006;16:1136–48.
22. Klintworth GK. The molecular genetics of the corneal dystrophies—current status. Front Biosci 2003;8:d687–713.
23. Fujiki K, Hotta Y, Nakayasu K, Kanai A. Homozygotic patient with betaig-h3 gene mutation in granular dystrophy. Cornea 1998;17:288–92.
24. Okada M, Yamamoto S, Inoue Y, et al. Severe corneal dystrophy phenotype caused by homozygous R124H keratoepithelin mutations. Invest Ophthalmol Vis Sci 1998;39:1947–53.
25. Guryev V, Saar K, Adamovic T, et al. Distribution and functional impact of DNA copy number variation in the rat. Nat Genet 2008;40:538–45.
26. Stranger BE, Forrest MS, Dunning M, et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 2007;315:848–53.
27. Henrichsen CN, Vinckenbosch N, Zollner S, et al. Segmental copy number variation shapes tissue transcriptomes. Nat Genet 2009;41:424–9.
28. Takekawa M, Maeda T, Saito H. Protein phosphatase 2Calpha inhibits the human stress-responsive p38 and JNK MAPK pathways. EMBO J 1998;17:4744–52.
29. Kim JE, Park RW, Choi JY, et al. Molecular properties of wild-type and mutant betaIG-H3 proteins. Invest Ophthalmol Vis Sci 2002;43:656–61.
30. Kim PM, Lam HY, Urban AE, et al. Analysis of copy number variants and segmental duplications in the human genome: evidence for a change in the process of formation in recent evolutionary history. Genome Res 2008;18:1865–74.
31. Cooper GM, Nickerson DA, Eichler EE. Mutational and selective effects on copy-number variants in the human genome. Nat Genet 2007;39(7 Suppl):S22–9.
32. Sharp AJ, Cheng Z, Eichler EE. Structural variation of the human genome. Annu Rev Genomics Hum Genet 2006;7:407–42.
Footnotes and Financial Disclosures

Originally received: July 3, 2009. Final revision: November 11, 2009. Accepted: November 12, 2009. Available online: March 3, 2010. Manuscript no. 2009-899.

1 Laboratory of Genomic Diversity, Department of Life Science, Sogang University, Shinsu-dong, Mapo-gu, Seoul, Republic of Korea.
2 Department of Genetic Epidemiology, SNP Genetics, Inc., Gasan-Dong, Geumcheon-Gu, Seoul, Republic of Korea.
3 Center for Genome Sciences, National Institutes of Health, Eunpyung-Gu, Seoul, Republic of Korea.
4 Corneal Dystrophy Research Institute, Department of Ophthalmology, Yonsei University College of Medicine, Seoul, Republic of Korea.

Financial Disclosure(s): The author(s) have no proprietary or commercial interest in any materials discussed in this article.

Supported by an intramural grant (4845-301-430-210-13) from the Korea National Institute of Health, Korea Center for Disease Control, Republic of Korea, and a grant from the Korea Science and Engineering Foundation funded by the Korean government (Ministry of Education, Science, and Technology) (No. 2009-0080157).

Correspondence: Jong-Young Lee, PhD, Center for Genome Sciences, National Institute of Health (NIH), 194 Tongil-Lo, Eunpyung-Gu, Seoul 122-701, Republic of Korea. E-mail: [email protected]; Hyoung Doo Shin, PhD, Laboratory of Genomic Diversity, Department of Life Science, Sogang University, Shinsu-dong, Mapo-gu, Seoul 121-742, Republic of Korea. E-mail: [email protected].
Table 2. Summary of Identified Trio Copy Number Variations (193 Trios, n = 375)

Category | Detection | No. of CNVs (<1 mb) | Mean No. of CNVs per Sample | Mean Size of CNVs (kb) | Median Size of CNVs (kb) | No. of Gain | No. of Loss | Ratio (Loss/Gain) | Common CNV (Frequency > 1%)
Individual CNV | CNV detected on each individual separately (raw CNV) | 18,048 | 48.1 | 53.9 | 22.7 | 5456 | 12,592 | 2.3 | 796
Individual CNV | CNV detected on each trio jointly (trio CNV) | 27 267 | 72.7 | 44.6 | 16.9 | 9040 | 18,227 | 2.0 | 1300
CNV region | CNV detected on each individual separately (raw CNV) | 2242 | 6.0 | 95.2 | 35.9 | — | — | — | 664
CNV region | CNV detected on each trio jointly (trio CNV) | 2245 | 6.0 | 94.2 | 35.6 | — | — | — | 871

CNV = copy number variation; kb = kilobase; mb = megabase.
Figure 2. Size distribution of identified trio CNVs and aggregated CNVRs. CNV = copy number variation; CNVR = copy number variation regions.

Figure 3. Distribution of identified trio CNVs in chromosomes. CNV = copy number variation; CNVR = copy number variation regions.

Figure 4. Visualization of a common CNVR and consecutive genoplot images. A, Visualization of a common CNVR (chr3:163977266-164424597) in the University of California Santa Cruz Genome Browser. Most trio CNVs showed a well-defined CNV boundary. When compared with CNVR of the DGV in this region, trio CNVs condensed into a very small region. B, Genoplot images of 8 consecutive markers within this CNVR. Each individual's intensity is plotted as 1 dot in the genoplot image. Individuals representing homozygous and heterozygous deletions are indicated in cyan and green, respectively. Three distinct groups representing 3 types of copy numbers (0X, 1X, and 2X) were consecutively elongated from start marker to end marker. CNV = copy number variation; CNVR = copy number variation regions; DGV = Database of Genomic Variants.
[Figure 6 graphic: bar charts comparing "In this study" with "DGV (Mar 2009)", each bar split into overlapped and unoverlapped portions. Panel A, Y-axis: Number of CNVs; labeled percentages 83.4% and 21.8%. Panel B, Y-axis: Length (mb); labeled percentages 79.2% and 25.4%.]

Figure 6. Overlapping analysis of both identified CNVs in this study and previously reported CNVs in the DGV. A, Our trio CNVs were compared in number with reported variations in the DGV (last updated in March 2009). B, Our trio CNVs were compared in length with reported variations in the DGV (last updated in March 2009). CNV = copy number variation; DGV = Database of Genomic Variants.
Figure 7. Visualization of chr6:29978470-29987783. A, Visualization of chr6:29978470-29987783 in HapMap Genome Browser. B, Visualization of chr6:29978470-29987783 within 200 kb.
Figure 8. Visualization of chr14:59896944-59916129. A, Visualization of chr14:59896944-59916129 in HapMap Genome Browser. B, Visualization of chr14:59896944-59916129 within 200 kb.