Gene structure variation in segmental duplication block C of human chromosome 7q 11.23 during primate evolution



GENE-40719; No. of pages: 11; 4C: Gene xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Research paper

Gene structure variation in segmental duplication block C of human chromosome 7q 11.23 during primate evolution

Yun-Ji Kim a,b,1, Kung Ahn c,1, Jeong-An Gim d, Man Hwan Oh a, Kyudong Han a,b, Heui-Soo Kim d,⁎

a Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan 330-714, Republic of Korea
b DKU-Theragen Institute for NGS Analysis (DTiNa), Cheonan 330-714, Republic of Korea
c TBI, Theragen BiO Institute, TheragenEtex, Suwon 443-270, Republic of Korea
d Department of Biological Sciences, College of Natural Sciences, Pusan National University, Busan 609-735, Republic of Korea

Article info

Article history: Received 27 February 2015; Received in revised form 15 July 2015; Accepted 16 July 2015; Available online xxxx

Keywords: Chromosome 7q 11.23; FKBP6 gene; NSUN5 gene; POM121 gene; Primate evolution; TRIM50 gene

Abstract

Segmental duplication, which gives rise to low-copy repeats (LCRs), occurred during primate evolution and is an important source of genomic diversity, including gain or loss of gene function. The human chromosome 7q 11.23 is associated with the Williams–Beuren syndrome and contains large region-specific LCRs composed of blocks A, B, and C that have different copy numbers in humans and other primates. We analyzed the structure of the POM121, NSUN5, FKBP6, and TRIM50 genes in the LCRs of block C. Based on computational analysis, POM121B, created by a segmental duplication, acquired a new exonic region, whereas NSUN5B (NSUN5C) showed structural variation through integration of a HERV-K LTR after duplication from the original NSUN5 gene. The TRIM50 gene originally consists of seven exons, whereas the duplicated TRIM73 and TRIM74 genes present five exons because of homologous recombination-mediated deletion. In addition, independent duplication events of the FKBP6 gene generated two pseudogenes at different genomic locations. In summary, these clustered genes were created by segmental duplication and subsequently underwent dynamic evolutionary events that led to structural variation in the primate genome. © 2015 Elsevier B.V. All rights reserved.

Abbreviations: LCR, low-copy repeat; POM121, POM121 transmembrane nucleoporin; POM121B, POM121 transmembrane nucleoporin B; POM121C, POM121 transmembrane nucleoporin C; NSUN5, NOP2/Sun domain family, member 5; NSUN5B, NOP2/Sun domain family, member 5 pseudogene 1; NSUN5C, NOP2/Sun domain family, member 5 pseudogene 2; FKBP6, FK506 binding protein 6; TRIM50, Tripartite motif containing 50; TRIM73, Tripartite motif containing 73; TRIM74, Tripartite motif containing 74; NAHR, non-allelic homologous recombination; TE, transposable element; HERV, Human Endogenous Retrovirus; LINE, long interspersed element; SINE, short interspersed element; SVA, SINE-VNTR-Alu.
⁎ Corresponding author. E-mail address: [email protected] (H.-S. Kim).
1 Yun-Ji Kim and Kung Ahn contributed equally to this work.

1. Introduction

The genomic DNA sequences of humans and other primates are highly similar (Chen and Li, 2001), but there are large variations in specific genomic regions. As driving forces of primate genome evolution, single base-pair mutations, segmental duplications, insertions/deletions, and chromosomal rearrangements (including homologous recombination) can affect the sequence diversity of human and primate genomes over time (Antonell et al., 2005; Chen and Li, 2001). In addition, these events contribute to phenotypic variation among species despite the similarity between human and primate genomes (Chen and Li, 2001; Frazer et al., 2003; Locke et al., 2003). In particular, the brain has evolved and diverged considerably; it shows significant enlargement and has a complicated structure and function in primates (Evans et al., 2004; Goodman et al., 1998).

Human and primate genomes are characterized by an enrichment of large interspersed segmental duplications compared with other mammalian genomes (Bailey and Eichler, 2006; Lander et al., 2001). In the human genome, segmental duplications, or low-copy repeats (LCRs), constitute approximately 5–10% of the sequence (Stankiewicz et al., 2004). These duplicated sequences increase genomic instability and contribute to the emergence of new functional genes during evolution (Marques-Bonet and Eichler, 2009; Marques-Bonet et al., 2009). For instance, chromosome 7q 11.23 is a well-established duplication region with different copy numbers in primate lineages (Perez Jurado et al., 1998). This region contains about 7% duplicated sequences, shows highly dynamic copy number variation of the duplicated regions (large region-specific LCRs of blocks A, B, and C) (Bayes et al., 2003; Hillier et al., 2003), and is deleted in the Williams–Beuren syndrome (WBS, OMIM 194050). WBS is a neurodevelopmental disorder caused by a hemizygous contiguous gene deletion of 1.5–1.8 Mb, including about 28 genes on chromosome 7q 11.23, and its prevalence is estimated to be 1 in 7500 to 1 in 20,000 births (Savina et al., 2011; Schubert, 2009). Loss of these genes results in cardiovascular defects, facial malformation, hypercalcemia, and other abnormal features in WBS patients. Antonell et al. (2005) reported that this region was generated by complicated evolutionary steps during primate divergence. From a single copy of blocks B and C in the orthologous

http://dx.doi.org/10.1016/j.gene.2015.07.060 0378-1119/© 2015 Elsevier B.V. All rights reserved.

Please cite this article as: Kim, Y.-J., et al., Gene structure variation in segmental duplication block C of human chromosome 7q 11.23 during primate evolution, Gene (2015), http://dx.doi.org/10.1016/j.gene.2015.07.060


region in the rhesus macaque genome, evolutionary events such as inversion, segmental duplication, and non-allelic homologous recombination (NAHR) occurred, leading to the generation of the present human 7q 11.23 and the orthologous regions of other primates (Antonell et al., 2005; Schubert, 2009). Interestingly, the primate-specific transposable element (TE), Alu, was predominantly located at the edges of the large duplication blocks, as well as within them. This suggests that Alu acted as a genomic trigger of evolutionary rearrangement in this region (Antonell et al., 2005).

TEs comprise 40–50% of the mammalian genome; they are generated in numerous copies via self-replication and can affect the variation of their host genome via various mechanisms (Makalowski, 2001; Thornburg et al., 2006). For example, TEs that integrated recently, after the divergence of humans and chimpanzees about 6 million years ago, could be important sources of genetic alteration and complex regulation associated with human diseases. Human-specific TEs have lower copy numbers because they have had less time to expand. However, these elements have a greater potential for activity because they have accumulated fewer mutations and are less affected by silencing mechanisms than other TEs that integrated into the primate or mammalian ancestors of humans. HML-2 (HERV-K; Human Endogenous Retrovirus-K), L1 (LINE; long interspersed element), Alu (SINE; short interspersed element), and SVA (SINE-VNTR-Alu) are considered human-specific TEs with copy numbers of 150, 1200, 5500, and 860, respectively (Baskayev and Buzdin, 2012; Mills et al., 2007). HERV integrated into the primate genome by infection of germ cells and became an intrinsic part of the genome during evolution (Yi and Kim, 2006).
In particular, HERV-K integrated into the genomes of the catarrhine lineage after its divergence from the platyrrhine lineage and is linked to cancer (Brady et al., 2009; Serafino et al., 2009; Goering et al., 2011; Katoh et al., 2011). The primate-specific TE, Alu, is the most abundant element in the human genome and expanded greatly during primate evolution. Similar to other TEs, Alu creates genomic mutations and alters gene expression by providing regulatory elements after integration into the host genome (Hasler and Strub, 2006; Shen et al., 2011). In addition, Alu is considered to be a major cause of gene structure variation because of its high copy number (millions of copies), which results in primate- and human-specific genomic features and disorders (Consortium, 2005; Sen et al., 2006). According to previous studies, Alu can create genomic deletions by interchromosomal and intrachromosomal recombination, called Alu recombination-mediated deletion (ARMD) (Han et al., 2007; Sen et al., 2006).

In this study, four genes (POM121, NSUN5, TRIM50, and FKBP6) in the LCR element of block C, formed by segmental duplication on the human chromosome 7q 11.23, were analyzed individually. These genes underwent different evolutionary events via segmental duplication, resulting in varied gene structures and features. In particular, we found that TEs contributed to gene structure variation of the NSUN5 and TRIM50 genes, supporting the critical function of TEs in genome variability and as an evolutionary driving force.

2. Materials and methods

2.1. Identification of orthologous transcripts in block C of 7q 11.23 in human and primates

The human chromosome 7q 11.23 consists of large LCRs in blocks A, B, and C, formed by segmental duplication (Bayes et al., 2003; Hillier et al., 2003). We analyzed four genes (POM121, NSUN5, TRIM50, and FKBP6) in the LCRs of block C.
DNA and mRNA sequences of humans (hg19; February 2009) and primates, including chimpanzee (Pan troglodytes) (panTro4; February 2011), gorilla (Gorilla gorilla) (gorGor3; May 2011), rhesus macaque (Macaca mulatta) (rheMac3; October 2010), and common marmoset (Callithrix jacchus) (calJac3; March 2009), were obtained from the UCSC Genome Browser (http://genome.ucsc.edu). We reconstructed the structures of the four genes using BLAST version 2.2.26+ (http://blast.ncbi.nlm.nih.gov) and BLAT (http://genome.ucsc.edu) from EST and RefSeq mRNAs extracted from the NCBI. We then aligned these paralogous genes using ClustalW (Thompson et al., 1994). Human POM121B sequences (ENST00000380760) were obtained from the Ensembl Genome Browser 75 (February 2014) (http://asia.ensembl.org). Additionally, we analyzed duplication block C in the Neandertal and Denisovan genomes and compared them with the human reference genome. The Neandertal (http://www.eva.mpg.de/neandertal/index.html) and Denisovan (http://www.eva.mpg.de/denisova/index.html) genome data were downloaded from the Max Planck Institute for Evolutionary Anthropology (Green et al., 2010; Meyer et al., 2012). We then checked the mapping rate of the sequencing reads using IGV tools (https://www.broadinstitute.org/igv/).

2.2. DNA preparation and PCR amplification

Genomic DNA was isolated from heparinized blood samples of the following species: (1) hominoids: common chimpanzee (P. troglodytes), gorilla (G. gorilla), orangutan (Pongo pygmaeus), and a gibbon (Hylobates agilis); (2) Old World monkeys: Japanese macaque (Macaca fuscata) and rhesus macaque (M. mulatta); (3) New World monkeys (Platyrrhini): night monkey (Aotus trivirgatus), squirrel monkey (Saimiri sciureus), and common marmoset (C. jacchus); and (4) Prosimian: galago (O. crassicaudatus). Heparinized blood samples were treated in a standard lysis buffer with 5 mg/mL proteinase K, 10 mM Tris, 100 mM NaCl, 1 mM EDTA, and 10% sodium dodecyl sulfate. After phenol extraction, all samples were dialyzed overnight in 10 mM Tris and 1 mM EDTA (TE solution). The DNA concentration was calculated based on measurements on an ND-1000 UV–VIS spectrophotometer (NanoDrop, Wilmington, DE, USA) and the DNA was diluted to 100 ng/μL. Heparinized blood samples were collected at Kyoto University Primate Research Institute.
AluSp in the NSUN5B gene was amplified from genomic DNA of human and 10 primates with the following primer pair from GenBank (accession no. NR03322): 5′-GGGACAGCACAGTGGAGC-3′ and 5′-TGTAGGAAGCTCTAAAGCCAGA-3′. The PCR samples were subjected to an initial denaturation at 94 °C for 4 min; 30 cycles of denaturation at 95 °C for 40 s, annealing at 60 °C for 40 s, and extension at 72 °C for 40 s; followed by a final extension at 72 °C for 7 min.

2.3. RNA preparation and reverse transcription (RT) reaction

Total RNA from human tissues, including the adrenal gland, cerebellum, adult whole brain, heart, kidney, liver, lung, testis, trachea, bone marrow, fetal brain, fetal liver, placenta, prostate, salivary gland, skeletal muscle, spinal cord, thymus, thyroid, and uterus, was purchased from Clontech (Madison, WI, USA). Total RNA from the adrenal gland, cerebellum, cerebrum, testis, trachea, spleen, heart, kidney, liver, colon, stomach, pancreas, and lung of the rhesus macaque and the colon, heart, kidney, liver, lung, ovary, pancreas, stomach, spleen, and small intestine of the common marmoset was extracted using Trizol reagent (Invitrogen, Carlsbad, CA, USA). All tissues from the rhesus monkey and common marmoset were provided by the National Primate Research Center (NPRC) of Korea. Animal procedures and study design were conducted in accordance with the Guidelines of the Institutional Animal Care and Use Committee (KRIBB-AEC-15031) in the Korea Research Institute of Bioscience and Biotechnology (KRIBB). To eliminate DNA contamination during RNA extraction, the Turbo DNA-free™ kit (Ambion, Austin, TX, USA) was used. Amplification of the DNase-treated total RNA samples was performed without RT (No-RT experiment) to confirm the absence of DNA contamination. The RNA was diluted to 500 ng/μL after its concentration was determined using an ND-1000 UV–VIS spectrophotometer.
M-MLV RT (Promega, Madison, WI, USA) and an RNase inhibitor (Promega) were used for RT-PCR, with an annealing temperature of 42 °C. The housekeeping gene, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), was amplified as a positive control, using the following primer pair: 5′-GAGCCCCAGCCTTCTCCATG-3′ and 5′-GAAATCCCATCACCATCTTCCAGG-3′. These primers were designed using the human GAPDH sequence (GenBank accession no. NM_002046). The control PCR experiments were conducted using the following conditions: initial denaturation at 94 °C for 4 min; 30 cycles of denaturation at 95 °C for 40 s, annealing at 55 °C for 40 s, and extension at 72 °C for 40 s; followed by a final extension at 72 °C for 7 min. The transcript of each gene was amplified with the corresponding primer set under the following conditions: initial denaturation at 94 °C for 4 min; 30 cycles of denaturation at 95 °C for 40 s, annealing at the optimal temperature for each gene for 40 s, and extension at 72 °C for 90 s; followed by a final extension at 72 °C for 7 min. Primer information and optimal annealing temperatures are indicated in Table 1.

2.4. Cloning and sequencing

A specific PCR product was amplified by RT-PCR, cloned, and sequenced to verify whether the PCR product was correct or a false positive. PCR products were separated on a 1.5% agarose gel, purified with the QIAquick gel extraction kit (Qiagen, Venlo, Netherlands), and ligated into the pGEM-T Easy vector (Promega). The ligation mixture was transformed into Escherichia coli DH5α. Clones were isolated using a plasmid DNA purification kit (LaboPass, Seoul, Korea). DNA sequencing was performed by Macrogen (Seoul, Korea) with the T7 and SP6 primers using dideoxy chain-termination sequencing on an Applied Biosystems 3730XL automated DNA sequencer.

2.5. Computational analysis

The transcription of each gene was analyzed directly by RT-PCR and from data obtained in a previous study.
We obtained BioGPS microarray data deposited in a centralized gene portal (http://biogps.gnf.org) and used two data sets (GeneAtlas U133A, gcrma and NCI60 on U133A, gcrma) with probes NSUN5-203802_at, NSUN5B-213670_x_at, NSUN5B-214100_x_at, NSUN5C-213842_x_at, and NSUN5C-213460_x_at for this analysis (Wu et al., 2009). Additional information is provided in Supplementary Fig. 1. The composition of TEs included in the NSUN5 and TRIM50 gene groups as well as the genomic loci of the TEs were identified using RepeatMasker (www.repeatmasker.org). The consensus sequences of the TEs were obtained from the RepBase library of repeats (http://www.girinst.org) (Jurka, 2000). The exon deletion between the TRIM50 and TRIM73 genes was detected using PipMaker (http://pipmaker.bx.psu.edu/pipmaker), which shows a percent identity plot (pip) based on the alignment of two DNA sequences (Schwartz et al., 2000).
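The cycling programs given in Sections 2.2 and 2.3 can be written down as data, which makes the difference between the 40 s and 90 s extension programs easy to see. The following is a minimal sketch, not the authors' tooling: the class name `PcrProgram` and its field names are hypothetical, temperatures are omitted, and ramp times between steps are ignored, so the result is a lower bound on the real run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrProgram:
    """Hold times in seconds for one PCR program (hypothetical helper)."""
    initial_denaturation_s: int
    cycles: int
    denaturation_s: int
    annealing_s: int
    extension_s: int
    final_extension_s: int

    def total_hold_seconds(self) -> int:
        # Sum of programmed hold times only; ramping between steps is ignored.
        per_cycle = self.denaturation_s + self.annealing_s + self.extension_s
        return (self.initial_denaturation_s
                + self.cycles * per_cycle
                + self.final_extension_s)


# Genomic PCR and GAPDH control: 4 min; 30 x (40 s, 40 s, 40 s); 7 min.
short_program = PcrProgram(240, 30, 40, 40, 40, 420)
# Gene-specific RT-PCR: same program but with a 90 s extension step.
long_program = PcrProgram(240, 30, 40, 40, 90, 420)

print(short_program.total_hold_seconds() / 60)  # minutes of hold time
print(long_program.total_hold_seconds() / 60)
```

Under these assumptions the two programs differ only in the per-cycle extension time, which accumulates over the 30 cycles.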


3. Results and discussion

3.1. Identification of transcripts from four genes in segmental duplication block C

Duplicated sequences could affect genetic instability and increase the emergence of new functional or nonfunctional genes during evolution (Marques-Bonet and Eichler, 2009). Human chromosome 7q 11.23 is composed of large region-specific LCRs within blocks A, B, and C (Antonell et al., 2005; Bayes et al., 2003; Hillier et al., 2003). This entire region has been shown to exist as three copies in humans, two copies in chimpanzees, gorillas, and orangutans, and a single copy in rhesus macaques, mice, and other mammals (Antonell et al., 2005; DeSilva et al., 1999; Valero et al., 2000). We analyzed four genes, POM121, NSUN5, TRIM50, and FKBP6, in the LCR of block C in human and primate genomes using computational analysis. As shown in Fig. 1, the medial block C (Cm), centromeric block C (Cc), and telomeric block C (Ct) were identified in the human genome. Block Cm is found in human, chimpanzee, rhesus macaque, and marmoset genomes, while block Ct is only found in human and chimpanzee genomes. This indicates that block Ct was generated by a segmental duplication of block Cm after the divergence of the Cercopithecoidea and Hominoidea lineages and that block Cc was generated by a human-specific segmental duplication of block Ct. Interestingly, the orthologous region in the orangutan genome has the same copy number as in the gorilla and chimpanzee genomes, but it arose via independent duplication events (Antonell et al., 2005). We also analyzed this region in the Neandertal and Denisovan genomes and found three copies in both genomes, the same as in the human reference genome. We observed that the four genes were partly or entirely duplicated and varied in gene structure from the original gene.
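The inference about when blocks Ct and Cc arose follows from which genomes carry each block. It can be sketched as a small presence/absence lookup. This is an illustrative sketch only: the names `SPECIES_BY_DISTANCE`, `BLOCK_PRESENCE`, and `origin_window` are hypothetical, the table holds only the four genomes named in the text, and the logic assumes a block is never lost after it appears. It therefore cannot represent independent duplications such as the one reported for the orangutan.

```python
# Species ordered from human outward (closest relative first).
SPECIES_BY_DISTANCE = ["human", "chimpanzee", "rhesus macaque", "marmoset"]

# Presence of each block C copy as stated in the text.
BLOCK_PRESENCE = {
    "Cm": {"human", "chimpanzee", "rhesus macaque", "marmoset"},
    "Ct": {"human", "chimpanzee"},
    "Cc": {"human"},
}


def origin_window(block):
    """Return (most distant carrier, nearest non-carrier or None).

    Under the no-loss assumption, the block arose before humans split
    from the most distant carrier and after the split from the nearest
    non-carrier.
    """
    carriers = BLOCK_PRESENCE[block]
    most_distant_carrier = None
    nearest_non_carrier = None
    for species in SPECIES_BY_DISTANCE:
        if species in carriers:
            most_distant_carrier = species
        elif nearest_non_carrier is None:
            nearest_non_carrier = species
    return most_distant_carrier, nearest_non_carrier


for block in ("Cm", "Ct", "Cc"):
    print(block, origin_window(block))
```

For block Ct the window is bounded by chimpanzee (carrier) and rhesus macaque (non-carrier), which matches the statement that Ct arose after the Cercopithecoidea and Hominoidea lineages diverged.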
We hypothesized that each gene underwent gene-specific evolutionary events that resulted in gene structure variation in humans and primates after the same genomic rearrangement. Using comparative analysis, we identified the structural variation of each gene and its corresponding evolutionary mechanism.

3.2. POM121 gene acquired new exons in humans and chimpanzees

The POM121, POM121B, and POM121C genes are arranged in tandem in the human chromosome 7q 11.23 by segmental duplication and recombination events during primate evolution (Fig. 1) (Antonell et al., 2005). The POM121 gene, which encodes a transmembrane nucleoporin, is essential for the initiation of nuclear pore complex (NPC) assembly that leads to nuclear envelope (NE) formation during mitosis in vertebrates (Antonin et al., 2005; Stavru et al., 2006). First, we analyzed the gene structures of orthologous and paralogous POM121 genes in humans and primates

Table 1. The primer pairs used in this study. Each entry lists: No.; target gene; method; use; species; product size (bp); primers.

1. POM121; RT-PCR; validation of acquired exons; human/rhesus monkey/marmoset; 578; forward 5′-CTGTGGATAACGGGAGGTGA-3′, reverse 5′-AGAGACCCAGGCTTAGGCAC-3′
2. POM121; RT-PCR; expression; human/rhesus monkey/marmoset; 371; forward 5′-CATCACCCTTCTCTAGCCCA-3′, reverse 5′-AGCATCGCTCTTGTCCTCC-3′
3. NSUN5B/NSUN5C; genomic PCR; integration of transposable elements; human/primates; 397/1378; forward 5′-GGGACAGCACAGTGGAGC-3′, reverse 5′-TGTAGGAAGCTCTAAAGCCAGA-3′
4. NSUN5; RT-PCR; expression; human; 901; forward 5′-CGGGAACATGGGGCTGTA-3′, reverse 5′-CTCATGGTAGCGTGGATCC-3′
5. NSUN5; RT-PCR; expression; rhesus monkey/marmoset; 543; forward 5′-CCAGAACGTGAAGCAGCTGTACG-3′, reverse 5′-TGCAGAATGAGGTGTCCG-3′
6. TRIM50/TRIM73/TRIM74; genomic PCR; validation of deletion (flanking); human/chimpanzee/rhesus monkey; 2414/7220
7. TRIM50/TRIM73/TRIM74; genomic PCR; validation of deletion (internal); human/chimpanzee/rhesus monkey; 1635
   Primers for entries 6 and 7: forward (a) 5′-ATCTCAGCTCCACTGGTTCCT-3′, reverse (b) 5′-AGCCACAGGATAAATCTCAGG-3′, forward (c) 5′-GACTTCCAGGGCAAGCTCTAC-3′
8. FKBP6; RT-PCR; validation of acquired exons; human/rhesus monkey; 315; forward 5′-GGCCAGATGCTTCGTCTT-3′, reverse 5′-TGATGGGATTTGGGATGTAAA-3′

a, c: Two forward primers were used in the amplification with the same reverse primer (b).


Fig. 1. Schematic representation of the genomic structure of the human chromosome 7q 11.23 and the orthologous region in primates. Original and duplicated genes were grouped into blocks Cc, Cm, and Ct. Humans and primates (chimpanzee, rhesus macaque, and marmoset) have different copy numbers of block C through segmental duplication. (a) Each gene is labeled in a form, A, A′, and A″ (the same in B, C, and D) in order of generation, and annotated symbol was placed at the bottom of the genes based on NCBI. The solid and dotted lines indicate a functional gene and pseudogene, respectively. Here, functional gene and pseudogene mean functional protein-coding gene and non-functional protein coding gene by structural variation, respectively. (b) Segmental duplication events occurred along primate lineages from a common ancestor.

by using computational analysis. According to this analysis, the POM121B gene in rhesus macaque was functional before the divergence of humans and chimpanzees and it remained in chimpanzees as an additional functional gene. However, in humans, intrachromosomal recombination occurred to generate three POM121 gene copies (POM121, POM121B, and POM121C) arranged in tandem in chromosome 7q 11.23. As a result, the POM121B gene in block Cm became a pseudogene via deletion of exons, whereas the POM121 and POM121C genes are potentially protein-coding genes (Figs. 1 and 2) (Antonin et al., 2005). Funakoshi et al. (2007) also reported that these two POM121 loci in this region are transcribed and translated in the HeLa cell line. Comparing gene structures, POM121 consists of 13 exons in the chimpanzee and rhesus macaque. However, POM121C in humans and POM121 in humans and chimpanzees present an additional three exons in the 5′-flanking region and POM121B lost exons in humans (Fig. 2a). To confirm the presence of acquired exons, specifically in humans, we conducted RT-PCR with various human, rhesus macaque, and marmoset tissues. As expected, PCR products were only detected in humans (Fig. 2b). In addition, POM121 expression was observed in human, rhesus macaque, and marmoset tissues. As shown in Fig. 2c, the POM121 gene is ubiquitously expressed. In conclusion, due to species-specific segmental duplication, the POM121 gene copy number increased and both the original and duplicated genes, POM121 and POM121C, have retained their functions. Moreover, humans acquired three additional POM121 genes after a second duplication event. In contrast, POM121B lost exons and evolved into a pseudogene in humans.

3.3. NSUN5B and NSUN5C genes present a new exon by integration of transposable elements

NSUN5, another gene in duplication block C, is the origin of the NSUN5B and NSUN5C genes, which are present as pseudogenes (Fig. 1).
Based on a GenBank database analysis, the NSUN5B and NSUN5C genes are structured very similarly to the NSUN5 gene. However, a remarkable difference between NSUN5 and the other two genes is the presence of an additional exon in the NSUN5B and NSUN5C genes; the orthologous region of the NSUN5 gene is an intron. Intron 4 of the NSUN5 gene is composed of various TEs such as MIR, Alus (SINE; short interspersed element), and L2 (LINE; long interspersed element) in the sense and antisense directions. Furthermore, a solitary HERV-K LTR (LTR5B) was identified in the NSUN5B and NSUN5C genes in addition to the TEs present in the NSUN5 gene. LTR5B was exonized with AluSp to generate a new exon only in the NSUN5B and NSUN5C genes in humans (Fig. 3). Full-length Alu is approximately 300 bp and is highly similar to other Alu subfamily members (Rowold and Herrera, 2000). In the NSUN5B gene, a single Alu element was split by LTR5B into fragments of 145 bp and 165 bp, and we found the target site duplication (TSD) of LTR5B, GGATTA, on both sides of Alu.

Additionally, to investigate the evolutionary history of LTR5B and AluSp in primate genomes, we used a genome browser analysis and performed PCR amplification from genomic DNA of humans and 10 primates. As shown in Fig. 3, we designed primer pairs to detect LTR5B and AluSp simultaneously. The expected PCR products were 1378 bp, 981 bp, and 397 bp. An amplicon of 1378 bp indicates integration of both LTR5B and AluSp, whereas amplicons of 981 bp and 397 bp indicate the integration of LTR5B and AluSp, respectively. From this result, LTR5B appeared to be integrated into the primate genome after the divergence of the Cercopithecoidea and Hominoidea lineages, while AluSp was detected in all primates, indicating that AluSp integrated into the primate ancestor genome before the divergence of the Haplorhini and Strepsirrhini lineages.

Next, to compare the expression pattern of the NSUN5 gene (lacking LTR5B) with that of the NSUN5B and NSUN5C genes (including LTR5B), we combined RT-PCR analysis and previously reported BioGPS microarray data (Wu et al., 2009). The NSUN5 gene is ubiquitously expressed in human tissues based on RT-PCR. In addition, we detected the expression of the NSUN5 gene in both rhesus macaque and common marmoset (Fig. 4a). However, we could not distinguish the expression of the NSUN5B and NSUN5C genes individually because their sequences are too similar.
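The genomic PCR design described above maps each expected amplicon size to an integration state, so reading a gel amounts to a lookup with some allowance for sizing error. The sketch below illustrates that lookup. The three sizes and their meanings are taken from the text; the function name `interpret_band` and the 10 bp tolerance are assumptions made for illustration, not part of the published method.

```python
# Expected amplicon sizes (bp) and their interpretation, as given in the text.
EXPECTED_AMPLICONS_BP = {
    1378: "LTR5B and AluSp both integrated",
    981: "LTR5B integrated",
    397: "AluSp integrated",
}


def interpret_band(observed_bp, tolerance_bp=10):
    """Return the interpretation of the closest expected amplicon.

    Returns None when no expected size lies within tolerance_bp of the
    observed band. The tolerance is a hypothetical allowance for gel
    sizing error.
    """
    closest = min(EXPECTED_AMPLICONS_BP, key=lambda size: abs(size - observed_bp))
    if abs(closest - observed_bp) <= tolerance_bp:
        return EXPECTED_AMPLICONS_BP[closest]
    return None


print(interpret_band(1380))
print(interpret_band(400))
print(interpret_band(700))
```

Note that the three sizes are additive (397 + 981 = 1378), which is what allows a single primer pair to report on both elements at once.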
Thus, we used the data set GeneAtlas U133A, gcrma with probes NSUN5B-213670_x_at and NSUN5C-213842_x_at in this analysis. As shown in Fig. 4b, the NSUN5, NSUN5B, and NSUN5C genes present similar expression patterns and are strongly expressed in T-cells, Burkitt lymphoma, and colorectal cancer. Thus, we compared the expression of the NSUN5, NSUN5B, and NSUN5C genes in cancer cell lines using the data set NCI60 on U133A, gcrma with probes NSUN5-203802_x_at, NSUN5B-214100_x_at, and NSUN5C-213460_x_at (Fig. 4b, right). Based on the microarray data for cancer cell lines, the NSUN5 gene was more strongly expressed than the NSUN5B and NSUN5C genes. These data suggest an effect of the LTR in the NSUN5B and NSUN5C genes. TEs in the opposite direction could inhibit the expression of host genes via “head-on


[Fig. 2 image: (a) POM121, POM121B, and POM121C in blocks Cc, Cm, and Ct for HU, CH, RH, and CM, with acquired exons marked; (b) RT-PCR gels for HU (578 bp product), RH, and CM; (c) RT-PCR gels for HU, RH, and CM (371 bp product). M, size marker; GAPDH control product, 120 bp.]

Fig. 2. Gene structure variation of the POM121 gene and duplicated genes during primate evolution. (a) Schematic representation of the POM121 gene and duplicated genes in humans and primates: The red bar in the black-dotted box indicates newly acquired exons before the second segmental duplication in humans. (b) Validation of newly acquired exons and expression of the POM121 gene: The primer pair for RT-PCR was designed to amplify the acquired exons. The PCR product size is indicated next to the RT-PCR data. (c) Expression of the POM121 gene was analyzed in humans, the rhesus macaque, and marmoset. Tissues, assayed in humans, the rhesus macaque, and marmoset, are indicated in the gel picture. M indicates a size marker. GAPDH was amplified as a positive control in all experiments. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

collision.” When a TE and its host gene are in opposite orientations, the RNA polymerases transcribing them also move in opposite directions, which retards the expression of both the TE and the host gene (Liu and Alberts, 1995; Wu et al., 1990). That is, the LTR could decrease the expression of its host genes, NSUN5B and NSUN5C. However, these expression data are insufficient and too preliminary for a direct comparison because of the low specificity of the BioGPS microarray probes. The evolution of the NSUN5 gene in primates can be summarized as follows. (1) AluSp was integrated into the NSUN5 gene before the divergence of the Haplorhini and Strepsirrhini lineages. (2) The NSUN5B gene


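[Fig. 3 image: (a) NSUN5C, NSUN5, and NSUN5B in blocks Cc, Cm, and Ct for HU, CH, RH, and CM, with the AluSp+LTR5 exon marked; (b) TE map (MIR, AluY, AluSp, AluSx, L2, LTR5B) of NSUN5C, NSUN5, and NSUN5B from 5′ to 3′, and a gel with lanes M, HU, CH, GO, OR, GI, JM, RH, NM, SQ, CM, and OG grouped as hominoid, Old World monkey, New World monkey, and prosimian; band labels 1380-1383 bp and 397-400 bp.]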

Fig. 3. TE-related structure variation of the NSUN5 and duplicated genes during primate evolution. (a) Schematic representation of NSUN5 and duplicated genes in humans and primates: LTR5B (HERV-K LTR) and AluSp are exonized in the NSUN5B and NSUN5C genes of humans and the chimpanzee. (b) TEs in the NSUN5, NSUN5B, and NSUN5C genes of humans and conservation of LTR5B and AluSp. The primer pairs were designed to simultaneously detect LTR5B and AluSp using the consensus sequence of the NSUN5, NSUN5B, and NSUN5C genes. Gel image shows LTR5B and AluSp in the NSUN5B gene after amplification from humans and 10 primates. The species are abbreviated as follows: HU, human; CH, chimpanzee; GO, gorilla; OR, orangutan; GI, gibbon; JM, Japanese macaque; RH, rhesus macaque; NM, night monkey; SQ, squirrel monkey; CM, common marmoset; and OG, galago. The 397-bp of PCR product indicates the integration of AluSp.

was created by segmental duplication, resulting in chimpanzees with NSUN5 and NSUN5B genes. (3) LTR5B was integrated into AluSp in intronic region of the NSUN5B gene. (4) Two TEs were exonized to generate a new exon in chimpanzees. (5) A second segmental duplication occurred and the NSUN5C gene was created that had the same structure as the NSUN5B gene. (6) The NSUN5C, NSUN5, and NSUN5B genes were arranged in tandem in the 7q 11.23 region. 3.4. TRIM73 and TRIM74 genes present only five exons because of homologous recombination-mediated deletion From the TRIM50 gene in rhesus macaque, two segmental duplication events created the TRIM50 and TRIM73 genes in chimpanzees, leading to the sequentially arranged TRIM74, TRIM50, and TRIM73 genes in the human genome (Fig. 1). Based on GenBank database analysis, the structure of these three genes is similar, but the TRIM73 and TRIM74 genes have a longer exon 5 and two additional exons (exons 6 and 7) compared with the TRIM50 gene (Fig. 5a and Supplementary Fig. 2). The TRIM50 gene was spliced using a canonical 5′ and 3′ splicing site (GT-AG). In contrast, the TRIM73 and TRIM74 genes have exonized L2 and AluJr, providing a polyadenylation signal (AATAAA) in the last exon that resulted in a 4 bp longer coding sequence (CDS). This event was also observed in the TRIM73 gene with integration of AluJr in the chimpanzee (Supplementary Fig. 2). Next, we found a large deletion in the TRIM73 and TRIM74 genes that is not present in the TRIM50 gene. These duplicated genes present only five exons corresponding to the deletion of a paralogous region of exons 6 and 7 in the TRIM50 gene and several TEs were present in this deletion region (Fig. 5b). Thus, we analyzed all of the TEs dispersed in the

intronic (introns 5 and 6) and 3′ flanking region (between the last exon of the TRIM50 gene and the first exon of the NSUN5 gene) of the TRIM50 gene and the orthologous region of the duplicated genes. Among the ≥15 TEs identified in the TRIM50 gene, primate-specific Alu elements were more abundant than any other TE class, including LINEs, DNA transposons, and LTRs. A similar TE composition, with slight differences, was observed in the common marmoset (data not shown). In addition, compared with the TRIM50 gene, the duplicated genes were highly homologous except for a deletion region (Fig. 5b). First, we confirmed the deletion using PCR amplification of human and primate templates (Fig. 5c). Sequence alignment revealed a 4817-bp deletion region between the TRIM50 gene and the two duplicated genes (Fig. 5d). Next, to examine the mechanism of deletion, we hypothesized that Alu recombination-mediated deletion (ARMD) had occurred, given the predominance of Alu elements dispersed in the TRIM50 gene. According to previous reports, Alu is considered a major cause of gene structure variation, and genomic deletions are caused by interchromosomal and intrachromosomal Alu–Alu recombination events, called ARMD, resulting in genomic disorders. The size range of the deletions is wide, but mostly shorter sequences (<1 kb) are deleted, with an average of approximately 806 bp and a maximum observed deletion of 7255 bp. ARMD events are associated with characteristics of Alu elements such as high GC content, high sequence similarity (regardless of family affiliation), and over 1 million copies in the primate genome (Sen et al., 2006). As described above, the intronic and 3′-flanking regions of the TRIM50 and duplicated genes are composed of various TEs, especially Alu, and thus computational and experimental analyses were conducted to test our hypothesis. Interestingly, between AluSx and AluSz, AluYc was detected in a deletion region of the duplicated genes in a truncated form (38 bp)

Please cite this article as: Kim, Y.-J., et al., Gene structure variation in segmental duplication block C of human chromosome 7q 11.23 during primate evolution, Gene (2015), http://dx.doi.org/10.1016/j.gene.2015.07.060


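[Fig. 4 image not reproduced. Recoverable labels: (a) gel panels for HU, RH, and CM with size markers (M), product sizes labeled 901 bp and 543 bp, and GAPDH at 120 bp; (b) expression plots for NSUN5, NSUN5B, and NSUN5C, with the labels "Colorectal adenocarcinoma", "Lymphoma burkitts", "CD8+ T cells", and "CD4+ T cells".]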

Fig. 4. Expression analysis of the NSUN5 gene family. (a) Expression of the NSUN5 gene in various tissues of humans, rhesus macaque, and marmoset. M indicates a size marker. GAPDH was amplified as a positive control in all cases. (b) Expression of the human NSUN5, NSUN5B, and NSUN5C genes in GeneAtlas U133A, gcrma (left) and NCI60 on U133A, gcrma (right), based on the BioGPS microarray data.

based on RepeatMasker. According to our analysis, AluYc did not appear to have been integrated recently, and its sequence was very similar to that of AluJb in the TRIM50 gene. Thus, we investigated the TSD and flanking sequences of each Alu in the TRIM50 and duplicated genes to determine the characteristics of AluYc in humans and chimpanzees. The interval sequence AAGTCT(C)T(G) was observed between AluSx and AluJb of the TRIM50 gene and between AluSz and the ambiguous AluYc of the TRIM73 gene (Fig. 6a). We also found these sequences in the TRIM50 and TRIM73 genes of the chimpanzee. These data strongly indicate that the AluYc is a partial form derived from AluJb (Figs. 5d and 6a). Lastly, to establish the mechanism of deletion, we identified a homologous CTTT sequence used as a break point, supporting homologous recombination-mediated deletion in the TRIM73 gene (Fig. 6b). Interestingly, the break points lay in an Alu element and in intronic sequence rather than in two Alu elements, which argues against ARMD. However, the break point in AluJb (CTTT) lies within a recombination “hot-spot”, 5′-TGTAATCCCAGGACTTTGGGAGG-3′, between positions 24 and 45 of the Alu consensus sequence (Han et al., 2007). In summary, the TRIM50 gene in primates was duplicated after the divergence of Hominoidea from Cercopithecoidea, resulting in the TRIM50 and TRIM73 genes in the chimpanzee genome. Exon 5 was then elongated by exonization of L2 and AluJr and the TRIM73 gene

was reconstructed by recombination-mediated deletion of an exonic region. Finally, the TRIM74 gene was created by a second segmental duplication, leading to the gene arrangement observed in the human genome.

3.5. Duplicated genes of FKBP6 have acquired new exons with partial duplication

FKBP6 is located at the edge of duplication block C. This gene also evolved via duplication events in primates, similar to the three genes described above. Through two segmental duplication events, FKBP6 came to be co-localized with LOC100101148 and LOC541473 in the human chromosome 7q 11.23 (Fig. 1). FKBP6 encodes a protein containing a prolyl isomerase/FK506-binding domain that localizes to meiotic chromosome cores and regions of homologous chromosome synapsis. Targeted inactivation of Fkbp6 in mice resulted in azoospermic males and the absence of normal pachytene spermatocytes (Crackower et al., 2003; Miyamato et al., 2006). FKBP6 in humans has two alternative transcripts with different first exons (data not shown). Furthermore, the duplicated genes, LOC100101148 and LOC541473, have not acquired functions. Genes arising from gene


[Fig. 5 image not reproduced. Recoverable labels: (a) blocks Cc, Cm, and Ct carrying TRIM74, TRIM50, and TRIM73, respectively; (c) gel lanes M, HU, CH, RH, and CM for the "Deleted region"/"Internal region" and the "Flanking region"; (d) TE annotation (MER5A, MADE1, L2b, AluJr, AluSx, AluJb, MIRc, AluSq, AluSz, L2c, LTR16C, MLT1C) across "TRIM50 gene intron 5 to 3′ flanking region" and "TRIM73 gene 3′ flanking region", with exons 5, 6, and 7 and "Deletion of 4817 bp" marked.]

Fig. 5. Structure variation of the TRIM50 and duplicated genes during primate evolution. (a) Schematic representation of the TRIM50 gene and duplicated genes in each block in humans and primates. The black-dotted box indicates a deletion region of the TRIM50 gene. (b) Identification of the deletion region of the TRIM73 gene using PipMaker. The TRIM73 gene has only 5 exons because of the deletion of the region orthologous to exons 6 and 7 of the TRIM50 gene. Several TEs were integrated into this deletion region. (c) PCR validation of the deleted region in humans, the chimpanzee, and rhesus macaque using two primer pairs designed in the internal and flanking regions of the deletion. HU, human; CH, chimpanzee; RH, rhesus macaque. (d) Detailed comparison of the TRIM50 and TRIM73 genes. The gray-dotted box indicates a 4817-bp deletion region, including exons 6 and 7 of the TRIM50 gene. In the TRIM73 gene, exon 5 is elongated by exonization of L2 and AluJr, and AluYc was corrected to AluJb (marked in red) by TSD analysis. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
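The logic of the PCR validation with an internal and a flanking primer pair can be illustrated with a minimal in silico sketch. Everything in it is an assumption made for illustration: the toy sequences and primers are hypothetical and are not the TRIM50/TRIM73 loci or the primers used in the study, and `amplicon_size` is a hypothetical helper that only looks for exact primer matches on one strand.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_size(template: str, forward: str, reverse: str):
    """Product length for a primer pair on `template`, or None when
    either primer has no exact binding site (no product expected)."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its binding
    # site on the given strand is its reverse complement.
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return end + len(reverse_site) - start


# Toy sequences (hypothetical; not taken from the TRIM50 or TRIM73 loci).
LEFT_FLANK = "ATGCCGTA"
DELETED = "CGTTACGGACTA"
RIGHT_FLANK = "TTAGGCAC"
intact_locus = LEFT_FLANK + DELETED + RIGHT_FLANK   # TRIM50-like
deleted_locus = LEFT_FLANK + RIGHT_FLANK            # TRIM73-like

FLANKING_PAIR = ("ATGCCG", "GTGCCT")  # primers outside the deletion
INTERNAL_PAIR = ("CGTTAC", "TAGTCC")  # primers inside the deletion

for name, locus in (("intact", intact_locus), ("deleted", deleted_locus)):
    print(name,
          "flanking:", amplicon_size(locus, *FLANKING_PAIR),
          "internal:", amplicon_size(locus, *INTERNAL_PAIR))
```

In this toy case the flanking pair gives a product on both loci that is shorter on the deleted locus by exactly the length of the deleted segment, whereas the internal pair gives a product only on the intact locus. That presence/absence pattern is what a two-primer-pair design is meant to reveal.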

[Fig. 6 image not reproduced. Recoverable labels: (a) TRIM50 (exon 6, AluJr, AluSx, exon 7, AluJb, MER5A, AluSz, AluJr) and TRIM73 (exon 5, AluJr), with flanking sequence rows in extraction order: TRIM50, between AluSx and AluJb: AAATGAAGGGACAGA AAGTCTT CACATTACTTGG ATGCTCG(C)AGTC TCGAGTCTA(T)G(T) TAGGCTG(C)TCATACC; TRIM73, between AluSz and AluJr: AAATGAAGGCACAGA AAGTCCG ATGCTCAAGTC TTGAGTCT(C)TTAT TAGCCTCCCATAAC. (b) CTTT marked in AluJb and AluSz regions of TRIM50 and TRIM73.]

Fig. 6. Homologous recombination-mediated deletion in the TRIM73 gene. (a) Comparison of the TSD and flanking sequences of each Alu region in TRIM50 and TRIM73 showed that AluYc was derived from AluJb (marked in red) in the TRIM50 gene, based on the interval sequence AAGTCT(C)T(G) between AluSx and AluJb of the TRIM50 gene and AluSx and AluJb of the TRIM73 gene. Nucleotides in parentheses denote the chimpanzee sequence. (b) CTTT was observed in AluJb and the 3′-flanking sequence of the TRIM50 gene and in AluJb of the TRIM73 gene in humans and the chimpanzee, and was considered the break point supporting homologous recombination-mediated deletion. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
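How a short homologous sequence such as CTTT can be recognized at a deletion break point may be sketched as follows. This is an illustrative assumption, not the procedure used in the study: `locate_deletion` is a hypothetical helper, the toy sequences are invented, and only the CTTT motif is taken from the text. The sketch trims the longest common prefix and suffix of an undeleted and a deleted sequence, then reports the bases shared by both break points.

```python
def locate_deletion(reference: str, copy: str):
    """Locate one contiguous deletion that turns `reference` into `copy`.

    Returns (start, end, homology): reference[start:end] is the deleted
    segment, placed as far right as possible, and `homology` is the
    sequence found at both break points (junction position is ambiguous
    within it).
    """
    if len(copy) > len(reference):
        raise ValueError("copy must not be longer than reference")
    # Longest common prefix.
    prefix = 0
    while prefix < len(copy) and reference[prefix] == copy[prefix]:
        prefix += 1
    # Longest common suffix that does not overlap the prefix in `copy`.
    suffix = 0
    while (suffix < len(copy) - prefix
           and reference[-1 - suffix] == copy[-1 - suffix]):
        suffix += 1
    if prefix + suffix != len(copy):
        raise ValueError("sequences differ by more than one simple deletion")
    start, end = prefix, len(reference) - suffix
    # Count bases immediately left of both break points that are identical.
    shared = 0
    while (shared < start and shared < end - start
           and reference[start - shared - 1] == reference[end - shared - 1]):
        shared += 1
    return start, end, reference[start - shared:start]


# Toy sequences (hypothetical); the motif flanks the deleted interior.
UPSTREAM, MOTIF, INTERIOR, DOWNSTREAM = "ACGGA", "CTTT", "GAGACC", "TGCAA"
reference = UPSTREAM + MOTIF + INTERIOR + MOTIF + DOWNSTREAM
copy = UPSTREAM + MOTIF + DOWNSTREAM  # one motif copy survives

start, end, homology = locate_deletion(reference, copy)
print(start, end, homology)
```

For these toy inputs the deleted segment spans one copy of the motif plus the interior, and the reported homology is the motif itself, which is the signature expected when recombination occurs between two short identical sequences.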


duplication events could be transformed into pseudogenes by subsequent acquisition of mutations that rendered them inactive (Khurana et al., 2010). Through a computational analysis, we found that LOC541473 and LOC100101148 are truncated forms of the original FKBP6 gene. The duplicated genes have only four exons duplicated from the FKBP6 gene, plus an extra exon (Antonell et al., 2005). The acquired exon was found to derive from intronic sequences of the upstream FKBP6 and STAG3L1 genes (Fig. 7b). To confirm this result, we performed RT-PCR of this region in humans and the rhesus macaque. As shown in Fig. 7c, the recently acquired exon predicted by computational analysis was detected in humans by RT-PCR. In addition, we analyzed the expression of the original FKBP6 gene. In humans, the FKBP6 gene is expressed in the testis, heart, skeletal muscle, liver, and kidney based on Northern blotting, and no marked expression was detected in any other normal tissues according to BioGPS microarray data (data not shown) (Meng et al., 1998; Wu et al., 2009). However, FKBP6 expression was not detected in the rhesus macaque and marmoset in this study (Supplementary Fig. 3). Regarding the evolutionary mechanism in primates, LOC541473 was generated by duplication of the FKBP6 gene after the divergence of Hominoidea and Cercopithecoidea, but because only four exons and their introns were duplicated, the copy lost its function. Furthermore, this disrupted FKBP6 gene acquired an additional exon derived from introns of the neighboring STAG3L1 gene


(LOC541473). Lastly, a second duplication occurred only in the human lineage, creating LOC100101148.

4. Conclusion: evolution of the four genes in block C

In summary, we investigated the structure of four genes (POM121, NSUN5, TRIM50, and FKBP6) in block C of the human chromosome 7q 11.23 by analyzing evolutionary processes using comparative genomics and experimental validation. Block C is divided into 3 blocks (Cc, Cm, and Ct, formed by segmental duplication) that vary in copy number in humans and other primates. Four genes in these blocks are arranged in a cluster with either partial or complete duplication. Based on comparative genomic analyses, we propose that these genes underwent different evolutionary events to generate varied gene structures and features in primate lineages after shared segmental duplication (Fig. 8). Our study provides information on gene structure variation and diversification during primate evolution. In particular, TEs were found to contribute to gene structure variations of the NSUN5 and TRIM50 genes, supporting the critical role of TEs in genome variability and as evolutionary driving forces.

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gene.2015.07.060.

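[Fig. 7 image not reproduced. Recoverable labels: (a) blocks Cc, Cm, and Ct carrying LOC100101148, FKBP6, and LOC541473, respectively, shown for HU, CH, RH, and CM, with "Acquired exon" and "4 exons duplicated" marked; (b) sequence alignment; (c) gel with size markers (M), HU and RH lanes, a 315-bp product, and GAPDH at 120 bp.]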

Fig. 7. Gene structure of FKBP6 and its duplicated pseudogenes, LOC541473 and LOC100101148. (a) Schematic representation of the FKBP6 gene and duplicated genes in humans and primates. Only 4 exons of the FKBP6 gene (black-solid box) were duplicated in LOC541473, which also presents a newly acquired exon (black-dotted box). (b) Alignment of sequences among the LOC541473 exon, the intronic region of the FKBP6 gene, and that of the STAG3L1 gene. (c) Validation of the newly acquired exon in LOC541473 and expression analysis of the FKBP6 gene in humans and the rhesus macaque. M indicates a size marker. GAPDH was amplified as a positive control.


[Fig. 8 image not reproduced. Recoverable labels: lineages shown as Human Chr. 7q 11.23 (blocks Cc, Cm, Ct), Chimpanzee Chr. 7 (blocks Cm, Ct), Rhesus Monkey Chr. 3, Mouse Chr. 5, and the common ancestor. Event labels: "AluSp in intron of the NSUN5 gene"; "Segmental duplication" (shown twice); "Acquisition of new exon from intron of the FKBP6 and STAG3L1 genes (LOC100613367)"; "Broken FKBP6 gene was inactivated in block Ct"; "Pseudogenization of the POM121B gene by deletion of exons in block Cm"; "Acquisition of new exon in the POM121C gene"; "Integration and exonization of LTR5B with AluSp in the NSUN5B gene"; "Homologous recombination in the TRIM73 gene"; "Exon 5 was elongated by exonization of L2 and AluJr in the TRIM50 gene"; "Creation of block Cc".]

Fig. 8. Proposed schematic representation of evolutionary events of four genes of block C in the human chromosome 7q 11.23 during primate evolution.

Conflict of interest

The authors declare that there is no conflict of interest.

References

Antonell, A., de Luis, O., Domingo-Roura, X., Perez-Jurado, L.A., 2005. Evolutionary mechanisms shaping the genomic structure of the Williams–Beuren syndrome chromosomal region at human 7q11.23. Genome Res. 15, 1179–1188.
Antonin, W., Franz, C., Haselmann, U., Antony, C., Mattaj, I.W., 2005. The integral membrane nucleoporin pom121 functionally links nuclear pore complex assembly and nuclear envelope formation. Mol. Cell 17, 83–92.
Bailey, J.A., Eichler, E.E., 2006. Primate segmental duplications: crucibles of evolution, diversity and disease. Nat. Rev. Genet. 7, 552–564.
Baskayev, K.K., Buzdin, A.A., 2012. Evolutionarily recent insertions of mobile elements and their contribution to human genome structure. Biol. Bull. Rev. 2, 371–385.
Bayes, M., Magano, L.F., Rivera, N., Flores, R., Perez Jurado, L.A., 2003. Mutational mechanisms of Williams–Beuren syndrome deletions. Am. J. Hum. Genet. 73, 131–151.
Brady, T., Lee, Y.N., Ronen, K., Malani, N., Berry, C.C., Bieniasz, P.D., Bushman, F.D., 2009. Integration target site selection by a resurrected human endogenous retrovirus. Genes Dev. 23, 633–642.
Chen, F., Li, W., 2001. Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am. J. Hum. Genet. 68, 444–456.
Chimpanzee Sequencing and Analysis Consortium, 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69–87.
Crackower, M.A., Kolas, N.K., Noguchi, J., Sarao, R., Kikuchi, K., Kaneko, H., Kobayashi, E., Kawai, Y., Kozieradzki, I., Landers, R., et al., 2003. Essential role of Fkbp6 in male fertility and homologous chromosome pairing in meiosis. Science 300, 1291–1295.
DeSilva, U., Massa, H., Trask, B.J., Green, E.D., 1999. Comparative mapping of the region of human chromosome 7 deleted in Williams syndrome. Genome Res. 9, 428–436.
Evans, P.D., Anderson, J.R., Vallender, E.J., Gilbert, S.L., Malcom, C.M., Dorus, S., Lahn, B.T., 2004. Adaptive evolution of ASPM, a major determinant of cerebral cortical size in humans. Hum. Mol. Genet. 13, 489–494.
Frazer, K.A., Chen, X., Hinds, D.A., Pant, P.V., Patil, N., Cox, D.R., 2003. Genomic DNA insertions and deletions occur frequently between humans and nonhuman primates. Genome Res. 13, 341–346.
Funakoshi, T., Maeshima, K., Yahata, K., Sugano, S., Imamoto, F., Imamoto, N., 2007. Two distinct human POM121 genes: requirement for the formation of nuclear pore complexes. FEBS Lett. 581, 4910–4916.
Goering, W., Ribarska, T., Schulz, W.A., 2011. Selective changes of retroelement expression in human prostate cancer. Carcinogenesis 32, 1484–1492.
Goodman, M., Porter, C.A., Czelusniak, J., Page, S.L., Schneider, H., Shoshani, J., Gunnell, G., Groves, C.P., 1998. Toward a phylogenetic classification of Primates based on DNA evidence complemented by fossil evidence. Mol. Phylogenet. Evol. 9, 585–598.
Green, R.E., Krause, J., Briggs, A.W., Maricic, T., Stenzel, U., Kircher, M., Patterson, N., Li, H., Zhai, W., Fritz, M.H., et al., 2010. A draft sequence of the Neandertal genome. Science 328, 710–722.
Han, K., Lee, J., Meyer, T.J., Wang, J., Sen, S.K., Srikanta, D., Liang, P., Batzer, M.A., 2007. Alu recombination-mediated structural deletions in the chimpanzee genome. PLoS Genet. 3, 1939–1949.
Hasler, J., Strub, K., 2006. Alu elements as regulators of gene expression. Nucleic Acids Res. 34, 5491–5497.
Hillier, L.W., Fulton, R.S., Fulton, L.A., Graves, T.A., Pepin, K.H., Wagner-McPherson, C., Layman, D., Maas, J., Jaeger, S., Walker, R., et al., 2003. The DNA sequence of human chromosome 7. Nature 424, 157–164.
Jurka, J., 2000. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16, 418–420.
Katoh, I., Mirova, A., Kurata, S., Murakami, Y., Horikawa, K., Nakakuki, N., Sakai, T., Hashimoto, K., Maruyama, A., Yonaga, T., et al., 2011. Activation of the long terminal repeat of human endogenous retrovirus K by melanoma-specific transcription factor MITF-M. Neoplasia 13, 1081–1092.
Khurana, E., Lam, H.Y., Cheng, C., Carriero, N., Cayting, P., Gerstein, M.B., 2010. Segmental duplications in the human genome reveal details of pseudogene formation. Nucleic Acids Res. 38, 6997–7007.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al., 2001. Initial sequencing and analysis of the human genome. Nature 409, 860–921.
Liu, B., Alberts, B., 1995. Head-on collision between a DNA replication apparatus and RNA polymerase transcription complex. Science 267, 1131–1137.
Locke, D., Segraves, R., Carbone, L., Archidiacono, N., Albertson, D.G., Pinkel, D., Eichler, E.E., 2003. Large-scale variation among human and great ape genomes determined by array comparative genomic hybridization. Genome Res. 13, 347–357.
Makalowski, W., 2001. The human genome structure and organization. Acta Biochim. Pol. 48, 587–598.
Marques-Bonet, T., Eichler, E.E., 2009. The evolution of human segmental duplications and the core duplicon hypothesis. Cold Spring Harb. Symp. Quant. Biol. 74, 355–362.
Marques-Bonet, T., Girirajan, S., Eichler, E.E., 2009. The origins and impact of primate segmental duplications. Trends Genet. 25, 443–454.
Meng, X., Lu, X., Morris, C., Keating, M., 1998. A novel human gene FKBP6 is deleted in Williams syndrome. Genomics 52, 130–137.
Meyer, M., Kircher, M., Gansauge, M.T., Li, H., Racimo, F., Mallick, S., Schraiber, J.G., Jay, F., Prufer, K., de Filippo, C., et al., 2012. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226.
Mills, R.E., Bennett, E.A., Iskow, R.C., Devine, S.E., 2007. Which transposable elements are active in the human genome? Trends Genet. 23, 183–191.
Miyamato, T., Sato, H., Yogev, L., Kleiman, S., Namiki, M., Koh, E., Sakugawa, N., Hayashi, H., Ishikawa, M., Lamb, D., et al., 2006. Is a genetic defect in Fkbp6 a common cause of azoospermia in humans. Cell. Mol. Biol. Lett. 11, 557–569.
Perez Jurado, L.A., Wang, Y.K., Peoples, R., Coloma, A., Cruces, J., Francke, U., 1998. A duplicated gene in the breakpoint regions of the 7q11.23 Williams–Beuren syndrome deletion encodes the initiator binding protein TFII-I and BAP-135, a phosphorylation target of BTK. Hum. Mol. Genet. 7, 325–334.
Rowold, D.J., Herrera, R.J., 2000. Alu elements and the human genome. Genetica 108, 57–72.
Savina, N.V., Smal, M.P., Kuzhir, T.D., Egorova, T.M., Khurs, O.M., Polityko, A.D., Goncharova, R.I., 2011. Chromosomal instability at the 7q11.23 region impacts on DNA-damage response in lymphocytes from Williams–Beuren syndrome patients. Mutat. Res. 724, 46–51.
Schubert, C., 2009. The genomic basis of the Williams–Beuren syndrome. Cell. Mol. Life Sci. 66, 1178–1197.
Schwartz, S., Zhang, Z., Frazer, K.A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R., Miller, W., 2000. PipMaker—a web server for aligning two genomic DNA sequences. Genome Res. 10, 577–586.
Sen, S., Han, K., Wang, J., Lee, J., Wang, H., Callinan, P., Dyer, M., Cordaux, R., Liang, P., Batzer, M., 2006. Human genomic deletions mediated by recombination between Alu elements. Am. J. Hum. Genet. 79, 41–53.
Serafino, A., Balestrieri, E., Pierimarchi, P., Matteucci, C., Moroni, G., Oricchio, E., Rasi, G., Mastino, A., Spadafora, C., Garaci, E., et al., 2009. The activation of human endogenous retrovirus K (HERV-K) is implicated in melanoma cell malignant transformation. Exp. Cell Res. 315, 849–862.
Shen, S., Lin, L., Cai, J.J., Jiang, P., Kenkel, E.J., Stroik, M.R., Sato, S., Davidson, B.L., Xing, Y., 2011. Widespread establishment and regulatory impact of Alu exons in human genes. Proc. Natl. Acad. Sci. U. S. A. 108, 2837–2842.
Stankiewicz, P., Shaw, C.J., Withers, M., Inoue, K., Lupski, J.R., 2004. Serial segmental duplications during primate evolution result in complex human genome architecture. Genome Res. 14, 2209–2220.
Stavru, F., Nautrup-Pedersen, G., Cordes, V.C., Gorlich, D., 2006. Nuclear pore complex assembly and maintenance in POM121- and gp210-deficient cells. J. Cell Biol. 173, 477–483.
Thompson, J., Higgins, D., Gibson, T., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific. Nucleic Acids Res. 22, 4673–4680.
Thornburg, B.G., Gotea, V., Makalowski, W., 2006. Transposable elements as a significant source of transcription regulating signals. Gene 365, 104–110.
Valero, M.C., de Luis, O., Cruces, J., Perez Jurado, L.A., 2000. Fine-scale comparative mapping of the human 7q11.23 region and the orthologous region on mouse chromosome 5G: the low-copy repeats that flank the Williams–Beuren syndrome deletion arose at breakpoint sites of an evolutionary inversion(s). Genomics 69, 1–13.
Wu, J., Grindlay, G., Bushel, P., Mendelsohn, L., Allan, M., 1990. Negative regulation of the human epsilon-globin gene by transcriptional interference: role of an Alu repetitive element. Mol. Cell. Biol. 10, 1209–1216.
Wu, C., Orozco, C., Boyer, J., Leglise, M., Goodale, J., Batalov, S., Hodge, C.L., Haase, J., Janes, J., Huss 3rd, J.W., et al., 2009. BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Genome Biol. 10, R130.
Yi, J.M., Kim, H.S., 2006. Molecular evolution of the HERV-E family in primates. Arch. Virol. 151, 1107–1116.
