The 5′ Region of Intron 11 of the Dystrophin Gene Contains Target Sequences for Mobile Elements and Three Overlapping ORFs


BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS 242, 401–406 (1998), Article No. RC977976

Alessandra Ferlini¹ and Francesco Muntoni
Neuromuscular Unit, Department of Paediatrics & Neonatal Medicine, Imperial College of Science, Technology and Medicine, Hammersmith Campus, Du Cane Road, London W12 0NN, United Kingdom

Received December 4, 1997

We have characterised the 2371 bp 5′ end of intron 11 of the dystrophin gene. Comparative analysis of this intronic region revealed homologies with the following sequences: regions containing mobile elements; target sites for numerous transcription factors, two resolvases and a histone-like DNA-binding protein; and three eukaryotic promoters. In addition, we identified three partially overlapping ORFs, and transcription analysis confirmed that one of these is expressed, representing the first gene reported to overlap the human dystrophin gene. We have also characterised a 136 bp sequence rearranged in intron 11 in a patient affected by X-linked dilated cardiomyopathy due to a dystrophinopathy. This multiple-copy sequence has the features of a repetitive element. Its comparative analysis showed very high homology with human genomic and EST regions that are adjacent to, and clustered with, Alu, LINE1 and THE elements. This pattern of homology suggests that it may represent a novel Alu-like, transcriptionally active sequence with a possible capacity for retrotransposition. We hypothesise that the 5′ region of dystrophin intron 11, which contains common target areas for the insertion of mobile elements, may have a role in the rearrangement of this novel Alu-like sequence. © 1998 Academic Press

The dystrophin gene is a giant gene on Xp21 in which mutations cause different phenotypes, involving either predominantly skeletal muscle (1) or cardiac muscle (X-linked dilated cardiomyopathy, XLDCM) (2, 3). The majority of mutations in the dystrophin gene are large deletions or gross rearrangements (duplications, inversions), often occurring in hot-spot regions of the gene (4, 5), although a few point mutations or small deletions have been reported in both Duchenne (DMD) and Becker (BMD) muscular dystrophies (6). The severity of the resulting phenotype is generally considered to be related to the effect of the mutation on the reading frame (7). In several XLDCM families, dystrophin gene mutations have been identified that abolish the expression of dystrophin only in cardiac muscle (2, 8, 9), whereas a missense mutation has been described that only reduces expression of the gene in this tissue (10). The dystrophin gene spans at least 3000 kb (http://ruly70.medfac.leidenuniv.nl/~duchenne/), 99% of which consists of very large intronic sequences (1). The growing literature on the sequence and comparative analysis of these introns has identified various interspersed repetitive sequences and mobile elements that occur with unexpectedly high frequency (11, 12). It has been hypothesised that the presence of these unstable repetitive sequences, rather than the size of the gene itself, accounts for the high mutation rate of dystrophin (12, 13, 14). In rare instances, rearrangements of these repetitive elements have been shown to be associated with the DMD or BMD phenotype (15, 16, 17). The function of such sequences is still not fully understood, although it has been established that these elements, inserted into the human genome in different evolutionary eras, can affect the expression of adjacent genes in eukaryotes (14, 18). Transcription analysis of cardiac and skeletal muscle from a patient affected with XLDCM demonstrated a novel sequence inserted in the dystrophin transcript between exons 11 and 12 (Ferlini et al., submitted). To better define the intronic region in which the rearrangement causing this splicing mutation occurred, we isolated and sequenced 2.4 kb at the 5′ end of intron 11. This newly determined intronic region revealed the presence of several target sequences for mobile elements. Three ORFs were also recognised in intron 11, one of which is indeed transcribed in both human foetal and adult cardiac and skeletal muscle. Comparative analysis of the inserted sequence suggested that, given structural and functional characteristics very similar to those of the known Alu elements, it might be part of a novel repetitive element family.

¹ Address for correspondence: Neuromuscular Unit, Department of Paediatrics & Neonatal Medicine, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK. Fax: +44 181 7462187. E-mail: aferlini@rpms.ac.uk.

0006-291X/98 $25.00. Copyright © 1998 by Academic Press. All rights of reproduction in any form reserved.

Vol. 242, No. 2, 1998

MATERIALS AND METHODS

Isolation of the 5′ End of Intron 11

Genomic studies. Genomic DNA was isolated by standard methods (19) from peripheral blood of the patient with XLDCM and of several controls. Earlier genomic studies had revealed a rearrangement within intron 11 in the patient (Ferlini et al., submitted). Using a reverse primer located within the rearranged region of intron 11 (nucleotides 91–114 in a.n. Y13186) and a forward primer at the 3′ end of exon 11 (5′ GAAGTAAGCTGATTGGAAC 3′), we amplified a 2.4 kb intronic region that was specific to the patient. A PCR walking approach with internal primers (Table I) was used to amplify and sequence this 2.5 kb fragment in the patient; the same primers were subsequently used for direct DNA sequencing. Amplification reactions were carried out with the large Taq Expand System in buffer 1 (Boehringer Mannheim). All amplifications used a 66°C annealing temperature for 2 minutes, with 24 seconds of auto-extension adjusted to the length of the expected product, for 30 cycles. Some reactions were repeated to obtain a reliable consensus sequence. Large amplification products (more than 2 kb) were gel purified by Agarase digestion (Boehringer); Qiagen purification columns were used for smaller fragments. A stabilised culture of cosmid cLA1H1 (gift of Dr. DenDunnen), which encompasses the exon 11/intron 11 boundary, was inoculated overnight in L-Broth medium, and DNA was isolated by Qiagen miniprep. Cosmid DNA was tested by PCR with primers M115, M114, DYS5 and DYS6 (Table I) under the amplification conditions described above.

Genomic DNA fragments were cloned into the PCR-Script SK(+) Amp vector (Stratagene), followed by manual (Sequenase 2.0, Pharmacia) or automated (Applied Biosystems) sequencing. The sequences have been submitted to the EMBL/GenBank databases.

Amplification using the forward primer in exon 11 and a reverse primer located within the rearranged region gave a 2.5 kb product in the patient with XLDCM only (data not shown). We isolated this fragment by PCR and sequenced it; it contains both the 5′ end of intron 11 and the rearranged Alu-like sequence. By PCR walking, we amplified the wild-type form of the 5′ end of intron 11 in normal controls. A minimal region encompassing nucleotides 2349–2371 of intron 11 was identical in the patient and the controls, which allowed us to establish that nucleotides 1 to 2371 represent the wild-type intron 11 sequence. Amplification of cosmid cLA1H1, which contains the exon 11/intron 11 boundary, with the primers used for the PCR walk (Table I) confirmed that this intronic sequence is part of the wild-type dystrophin gene.
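The boundary argument above can be illustrated with a minimal Python sketch. Sequence shared by patient and controls up to a given nucleotide marks how far the patient read follows the wild-type intron. The sketch assumes that both reads start at the same primer and are colinear (no insertions or deletions). The function `shared_prefix_length` and the toy reads are hypothetical illustrations, not the procedure or data of this study.

```python
def shared_prefix_length(patient: str, control: str) -> int:
    """Number of leading bases identical in two reads sequenced from the same
    primer. Assumes colinear reads (no insertions or deletions)."""
    length = 0
    for p_base, c_base in zip(patient.upper(), control.upper()):
        if p_base != c_base:
            break  # first mismatch: the patient read leaves the shared sequence
        length += 1
    return length


# Hypothetical toy reads: 12 shared bases, then divergent sequence.
patient_read = "GAAGTAAGCTGA" + "TTTCCCAAA"
control_read = "GAAGTAAGCTGA" + "ACGTACGTA"
print(shared_prefix_length(patient_read, control_read))  # 12
```

Real reads would need a gapped alignment, because rearrangement junctions and sequencing errors can introduce small insertions or deletions that this ungapped comparison would misplace.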

Transcription studies. Total RNA was isolated from control human tissues (adult skeletal muscle, adult heart, adult brain, lymphocytes, foetal skeletal muscle, foetal heart and foetal brain) by the method of Chomczynski and Sacchi (20). cDNA was synthesised with random hexanucleotide primers and M-MuLV reverse transcriptase (Pharmacia), following the procedure already described (21). PCR analysis was carried out with primers ORF2a (forward, nucleotides 1143–1164 in Y13187) and ORF2b (reverse, nucleotides 1273–1251 in Y13187), located in the putative ORF2 region of intron 11. To check the cDNAs for genomic DNA contamination, we ran PCR analysis (35 cycles, 58°C annealing temperature) with a pair of primers that amplify both genomic and cDNA regions, including exon a2 of the human c-abl oncogene (a.n. M13099): forward CA3, 5′ TGTTGACTGGCGTGATGTAGTTGCTTGG 3′, and reverse A2, 5′ TTCAGCGGCCAGTAGCATCTGACTT 3′. The amplification products were electrophoresed on a high-resolution agarose gel (Electran, Sigma) and visualised by ethidium bromide staining. PCR reactions were performed on a Perkin Elmer thermocycler with Taq Pfu polymerase (Stratagene). The amplification conditions are available upon request.

Cloning and sequencing. For cloning, the cDNA amplification products were inserted into the pGEM vector (TA cloning system, Promega) and sequenced manually (Sequenase kit version 2.0, Pharmacia).

Comparative sequence analysis. Comparative sequence analysis was performed with the GCG Wisconsin Package, using FASTA, BLAST and SIGNAL SCAN.
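The Results below rely on three simple kinds of sequence scan:

- exact matches to short motifs such as AGCTTTT and TTATAA
- ORFs on both strands
- GC content

The following Python sketch shows each scan in its most basic form. It makes several assumptions: DNA input, exact matching, an arbitrary 30-codon default ORF cutoff and hypothetical function names. It is not the GCG, SIGNAL SCAN or GRAIL implementation used in this study.

```python
"""Minimal sketches of three simple sequence scans: GC content, exact motif
matches on both strands, and ORFs on both strands. Illustrative only; these
are not the algorithms implemented in GCG, SIGNAL SCAN or GRAIL."""

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in seq."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (1-based start, strand) for every exact match of motif.

    A window equal to the motif is reported as '+', a window equal to its
    reverse complement as '-'. A palindromic motif such as TTATAA equals its
    own reverse complement, so each of its sites is reported once, as '+'.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = revcomp(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i + 1, "+"))
        elif window == rc:
            hits.append((i + 1, "-"))
    return hits


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, str]]:
    """Return (start, end, strand) for ATG...stop ORFs on both strands.

    Each frame is scanned from an ATG to the next in-frame stop codon
    (internal ATGs are not reported separately); lengths include the stop.
    Coordinates are 1-based on the input sequence; antisense ORFs have
    start > end because they are read from right to left.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((start + 1, i + 3, "+"))
                        else:
                            # index j on the reverse strand is base n - j (1-based) of the input
                            orfs.append((n - start, n - (i + 2), "-"))
                    start = None
    return sorted(orfs)


toy = "CCATGAAAAAAAAATAAGG"  # hypothetical toy sequence, not intron 11
print(find_orfs(toy, min_codons=1))           # [(3, 17, '+')]
print(find_orfs(revcomp(toy), min_codons=1))  # [(17, 3, '-')]
print(find_motif("GGTTATAAGG", "TTATAA"))     # [(3, '+')]
print(find_motif("AAAAGCTCC", "AGCTTTT"))     # [(1, '-')]
print(gc_content("GCAT"))                     # 50.0
```

TTATAA is its own reverse complement, so one forward comparison finds its sites on both strands. Antisense AGCTTTT sites, by contrast, appear in the forward sequence as AAAAGCT. The start > end convention simply reflects reading an antisense ORF, such as ORF3, from right to left.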

RESULTS

Intron 11 Comparative Analysis

BLAST comparative analysis of the wild-type intronic sequence against the interspersed repeat sequences (IRS) in GenBank did not show homology with known repetitive elements. There was, however, significant homology (75–80%) between nucleotides 645–972 of wild-type intron 11 and cosmids located on Xq22 (a.n. Z70272, Z68328, Z73913) and on chromosome 7q22 (a.n. AC000117). We observed further homology (55% over a 600 bp overlap) with a region within intron 7 of the dystrophin gene (a.n. U60822). Interestingly, this area of homology does not belong to the known repetitive element region already mapped to intron 7. Instead, it corresponds to the intron 7 region in which a t(X;1) translocation breakpoint has been determined (12) (Figure 1). BLAST analysis against ESTs revealed homology with 8 Homo sapiens EST sequences. Some of these ESTs (a.n. AA149803, R08216, AA210803) overlap with nucleotides 645–800, which contain the repetitive element homology. Among the other ESTs, three (a.n. F02926, R40458, R42803) show homology with nucleotides 2120–2157, and two (a.n. R31483 and H79099) with nucleotides 1470–1780 (Figure 1). A search for transcription factor binding sites with the SIGNAL SCAN and REPEAT programs showed several potential target sequences. Among these, we identified: i) three closely located target sites (AGCTTTT motif) for the rat sperm protamine TCF-1 (a.n. S02023), a DNA-binding protein belonging to the histone superfamily; and ii) four TTATAA motifs, representing the palindrome recognised by the Tn3 and γδ resolvases, two closely related recombinases active in site-specific recombination and transposition (22) (Figure 1).

TABLE I

No. Primer  Sequence (5′–3′)               Orientation
1   M112    ATGCACAGGGAGATTGTGTACAT        Forward
2   M113    ATCTTAAGAATATATATTGACTTAATAA   Forward
3   M114    AGCTCATCAAGACAAGATAGATTTA      Forward
4   M115    CTATTATTTATGAAGTTCAAAAGCGT     Forward
5   DYS2    CATGACCATGATAATGGAAACTGA       Reverse
6   DYS3    GCTCAAATAATGCTTTGTATATTCA      Reverse
7   DYS4    GCATTTCCATTCTCTAGTGGGAT        Reverse
8   DYS5    ATGAGAGCATTTAGCCTCCAGG         Reverse
9   DYS6    GCATAGATTAGTATCTCCTGGCT        Reverse

Note. Sequences of the primers used to amplify and sequence the 5′ end of intron 11 of the dystrophin gene.

ORFs in Intron 11 (a.n. Y13187)

The search for ORFs in the intronic sequence revealed three partially overlapping putative ORFs (Figure 1). Two of these ORFs are in the same orientation as the dystrophin gene (ORF1, nucleotides 991–1179; ORF2, nucleotides 1136–1276), while the third is in antisense orientation (ORF3, nucleotides 1024–889). The GRAIL search predicted two internal exons (10% true exon, medium-confidence prediction) at positions 1179 and 1276. It also predicted an HpaII site (GC-rich region) immediately upstream of the beginning of ORF1 and ORF2 (nucleotide 891) (Figure 1). Furthermore, the ORF2 region showed 80% homology with a human foetal cDNA (a.n. AAAGJLO). However, the GC content of the ORF region is only 38%. RT-PCR on human tissues, using a pair of oligonucleotides spanning the ORF region (ORF2a and ORF2b), detected a 133 bp PCR product corresponding to the ORF2 region in adult and foetal skeletal muscle, heart and brain cDNAs (Figure 2). Genomic DNA contamination of these cDNAs was excluded by amplifying the c-abl oncogene (data not shown). This suggests that ORF2 could represent a true internal exon, located within intron 11 in the same orientation as the dystrophin gene and transcribed in both human adult and foetal tissues. Interestingly, these are tissues in which the dystrophin gene is normally transcribed and highly expressed. Northern analysis will allow us to confirm these data.

FIG. 1. Schematic representation of the 5′ end of intron 11 of the human dystrophin gene. The orientation of the promoter-homologous regions is indicated by arrows. The orientation of the repetitive sequences with respect to intron 11 is indicated by cross-hatching (// forward, \\ reverse). AGCTTTT: TCF-1 protamine recognition site; TTATAA: γδ resolvase recognition site.

FIG. 2. Amplification of the ORF2 region in dystrophin intron 11 with the two internal primers ORF2a and ORF2b in adult and foetal cDNAs. Lanes: a) adult heart, b) adult skeletal muscle, c) adult brain, d) foetal heart, e) foetal skeletal muscle, f) foetal brain; V, molecular weight marker V (Boehringer). The expected 133 bp product is detected in all tissues analysed.

Comparative Analysis of the Rearranged Alu-like Sequence (a.n. Y13186)

BLAST comparative analysis of the 136 bp sequence detected no homology with the Alu or LINE1 consensus sequences, nor with other known repetitive element families. It did, however, highlight significant homology (more than 80% over a minimum overlap of 100 bp) with several human genomic sequences. These included several cosmids mapping to Xq13, Xq21–q22, Xq28, 4p16.3, 13q12–13 and 7q21, the glycerol kinase pseudogene, and a region flanking the 3′ UTR of the FRAXA gene on Xq27.3 (Ferlini et al., submitted). All of these regions contain several Alu and LINE1 elements. The topographic analysis of these homologous areas is reported in Figure 3. In all cases, the region of homology with the Alu-like sequence is adjacent to a LINE1 element, generally located at its 3′ end. It is also part of a cluster of repeats that includes Alu and other, less common repeats. These regions occur in both sense and antisense orientation. The homology that the present Alu-like sequence shares with these sequences always lies between its nucleotides 25 and 136. Interestingly, this interval showed significant homology (80%) with two eukaryotic promoters (a.n. EPD14029, EPD17097). In addition, comparative analysis against the EMBL-EST database revealed homology with 13 human EST sequences, the most relevant being two Homo sapiens cDNA sequences (a.n. C02353 and Z25899). All these EST sequences represent truncated transcripts that do not contain an ORF.

FIG. 3. Topographic organisation of the genomic and EST regions showing homology with the Alu-like sequence. The orientation of the repetitive sequences with respect to the Alu-like fragment is indicated by cross-hatching (// forward, \\ reverse). The EMBL accession numbers of the genomic or EST sequences are indicated in brackets. The gaps between bars indicate the distance in base pairs between the repetitive element regions.

DISCUSSION

The 5′ end of intron 11 showed several peculiar features. These include homology with regions flanking repetitive elements and several motifs recognised by resolvases and by a DNA-binding protein of the histone-like superfamily. These characteristics suggest that this region could delineate an area adjacent to mobile elements. Furthermore, some recombinases, in particular resolvases and integrases, have been reported to be involved in the site-specific recombination of transposable elements in both prokaryotes and eukaryotes (22, 23). To function, these recombinases require DNA-binding proteins belonging to the histone superfamily (24, 25). This intron contains three Tn3 and γδ resolvase target sites, as well as motifs recognised by TCF-1, a histone-superfamily DNA-binding protein. Together, these features suggest that it could be an effective target sequence for the transposition of mobile elements. This is consistent with the literature, which suggests that the genome should contain a very high number of potential target sites, given their low specificity (26). We hypothesise that the presence of mobile element target sites in intron 11 might have facilitated the Alu-like rearrangement found in this intronic region in a patient with XLDCM (Ferlini et al., submitted). The search for ORFs at the 5′ end of intron 11 detected three possible regions. The presence of ORF2 was supported by the GRAIL prediction program and by homology with the human cDNA fragment.
Preliminary transcription studies confirm that ORF2 is transcribed. Coding regions within introns have already been reported in the mammalian genome (27, 28, 29), but never in the dystrophin gene. It will be of interest to evaluate the size, structure and homology of this new gene, and particularly its relationship with the dystrophin gene and its mutations. These findings confirm that several repetitive and mobile elements are located in the intronic regions of the dystrophin gene. They also support the hypothesis that the high rate of gross mutations in this gene may be related to the presence of these sequences.

ACKNOWLEDGMENTS

This work was generously supported by a grant from the British Heart Foundation (BHF) to F.M. Dr. Johann DenDunnen is acknowledged for providing the dystrophin cosmids. Dr. Matthew Dunckley and Dr. Martin Brockington (Neuromuscular Unit, Imperial College of Medicine, Hammersmith Campus, London, UK) are acknowledged for their critical reading of the manuscript. Thanks are also due to the Legato Ferrari Foundation, Modena, Italy (to A.F.).

REFERENCES

1. Ahn, A. H., and Kunkel, L. M. (1993) Nat. Genet. 3, 283–291.
2. Muntoni, F., Cau, M., Ganau, A., Congiu, R., Arvedi, G., Mateddu, A., Marrosu, M. G., Cianchetti, C., Realdi, G., Cao, A., and Melis, M. A. (1993) N. Engl. J. Med. 329, 921–925.
3. Towbin, J. A., Hejmancik, J. F., Brink, P., Belb, B., Shu, X. M., Chamberlain, J. S., McCabe, E. R., and Swift, M. (1993) Circulation 87, 1854–1865.
4. DenDunnen, J. T., Grootscholten, P. M., Bakker, E., Blonden, L. A. J., Ginjaar, H. B., Wapenaar, M. C., Paassen, H. M. B., van Broeckhoven, C., Pearson, P. L., and van Ommen, G. J. B. (1989) Am. J. Hum. Genet. 45, 835–847.
5. Galvagni, F., Saad, F. A., Danieli, G. A., Miorin, M., Vitiello, L., Mostacciuolo, M. L., and Angelini, C. (1994) Hum. Genet. 94, 83–87.
6. Roberts, L. G., Gardner, R. J., and Bobrow, M. (1994) Hum. Mutat. 4, 1–11.
7. Monaco, A. P., Bertelson, C. J., Liechti-Gallati, S., Moser, H., and Kunkel, L. M. (1988) Genomics 2, 90–95.
8. Milasin, J., Muntoni, F., Severini, G. M., Bartoloni, L., Vatta, M., Krajinovic, M., Mateddu, A., Angelini, C., Camerini, F., Falaschi, A., Mestroni, L., Giacca, M., and the Heart Muscle Disease Study Group (1996) Hum. Mol. Genet. 5, 73–79.
9. Yoshida, K., Ikeda, S., Nakamura, A., Kagoshima, M., Takeda, S., Shoji, S., and Yanagisawa, N. (1993) Muscle Nerve 16, 1161–1166.
10. Ortiz-Lopez, R., Li, H., Su, J., Goytia, V., and Towbin, J. A. (1997) Circulation 95, 2434–2440.
11. McNaughton, J. C., Broom, J. E., Hill, D. F., Jones, W. A., Marshall, C. J., Renwick, N. M., Stockwell, P. A., and Petersen, G. B. (1993) J. Mol. Biol. 232, 314–321.
12. McNaughton, J. C., Hughes, G., Jones, W. A., Stockwell, P. A., Klamut, H. J., and Petersen, G. B. (1997) Genomics 40, 294–304.
13. Britten, R. J., Baron, W. F., Stout, D. B., and Davidson, E. H. (1988) Proc. Natl. Acad. Sci. USA 85, 4770–4774.
14. Britten, R. J. (1996) Proc. Natl. Acad. Sci. USA 93, 9374–9377.
15. Holmes, S. E., Dombrosky, B. A., Krebs, C. M., Boehm, C. D., and Kazazian, H. H. (1994) Nat. Genet. 7, 143–148.
16. Narita, N., Nishio, H., Kitoh, Y., Ishikawa, Y., Ishikawa, Y., Minami, R., Nakamura, H., and Matsuo, M. (1993) J. Clin. Invest. 91, 1862–1867.
17. Pizzuti, A., Pieretti, M., Fenwick, R. G., Gibbs, R. A., and Caskey, C. T. (1992) Genomics 13, 594–600.
18. Favor, J., and Morawetz, C. (1992) Mutat. Res. 284, 53–74.
19. Kunkel, L. M., Tantravahi, U., Kurnit, D. M., Eisenhard, M., Bruns, G. P., and Latt, S. A. (1983) Nucleic Acids Res. 11, 7961–7979.
20. Chomczynski, P., and Sacchi, N. (1987) Anal. Biochem. 162, 156–159.
21. Muntoni, F., Melis, M. A., Ganau, A., and Dubowitz, V. (1995) Am. J. Hum. Genet. 56, 151–157.
22. Stark, W. M., and Boocock, M. R. (1995) in Mobile Genetic Elements (Sherratt, D. J., Ed.), pp. 101–129, IRL Press, Oxford.
23. Rojo, F., and Alonso, J. C. (1995) Nucleic Acids Res. 23, 3181–3188.
24. Alonso, J. C., Gutierrez, C., and Rojo, F. (1995) Mol. Microbiol. 18, 471–478.
25. Johnson, R. C., Bruist, M. F., and Simon, M. I. (1986) Cell 46, 531–539.
26. Brookfield, J. F. Y. (1995) in Mobile Genetic Elements (Sherratt, D. J., Ed.), pp. 130–153, IRL Press, Oxford.
27. Derry, J. M., and Barnard, P. J. (1992) Genomics 12, 632–638.
28. Levinson, B., Kenwrick, S., Lakich, D., Hammonds, G., and Gitschier, J. (1990) Genomics 7, 1–11.
29. Viskochil, D., Cawthon, R., O'Connell, P., Xu, G., Stevens, J., Culver, M., Carey, J., and White, R. (1991) Mol. Cell. Biol. 11, 906–912.