CpNpG motifs are frequent sites for heterogeneous mutations in the Neurofibromatosis type 1 (NF1) tumour-suppressor gene

Mutation Research 373 (1997) 185–195

Homonucleotide tracts, short repeats and CpG/CpNpG motifs are frequent sites for heterogeneous mutations in the Neurofibromatosis type 1 (NF1) tumour-suppressor gene

David I. Rodenhiser a,*, J.D. Andrews a, D.N. Mancini a, J.H. Jung b, S.M. Singh b

a Molecular Medical Genetics Program, Room A4 WT, Child Health Research Institute, Children’s Hospital of Western Ontario, London Health Sciences Center, 800 Commissioners Road East, London, Ontario, Canada, N6C 2V5
b Departments of Paediatrics and Zoology, University of Western Ontario, London, Ontario, Canada, N6A 5B7

Received 22 December 1995; revised 24 June 1996; accepted 3 July 1996

Abstract

Neurofibromatosis type 1 (NF1) is among the most common human genetic disorders, having a constellation of cutaneous and skeletal manifestations, intellectual impairment, and an increased risk for a variety of malignancies. The NF1 gene has a high spontaneous mutation rate and is also associated with a variety of sporadic cancers in the general population. While a number of laboratories are involved in a coordinated effort to identify NF1 mutations, the mechanisms responsible for NF1 mutagenesis remain poorly understood. In this paper we describe our analysis of the sequence environment in the NF1 gene at those sites where small deletions, insertions and nucleotide substitution mutations have been reported. Our objective was to determine whether specific nucleotide sequences commonly occur at these mutation sites within the NF1 gene. We assessed how frequently independent NF1 mutations occur at the site of short direct repeats, single nucleotide repeats (homonucleotides) and at CpG and CpNpG motifs. We have established that homonucleotide and short direct repeats are involved in the majority of small deletions and insertions analysed. Substitution mutations are frequently associated with homonucleotide repeats and methylatable CpG dinucleotides and CpNpG trinucleotides. We suggest that NF1 mutations are acquired and retained by cells through an intricate balancing of repair and replication mechanisms. Such mutations may provide a proliferative advantage for that cell and its progeny.

Keywords: Neurofibromatosis; Mutation; Homonucleotide; Repeat; CpG; Mechanisms

* Corresponding author.

1. Introduction

The neurofibromatosis (NF1) gene is primarily known as the cause of the genetic disease resulting in a predisposition for hyperpigmented skin lesions, profound physical disfigurement due to benign tumours developing on or beneath the skin (neurofibromas), as well as developmental disabilities and the increased occurrence of a variety of malignant cancers (Rubenstein and Korf, 1990). The incidence of neurofibromatosis in the general population (1 in 3000) makes it one of the most common genetic afflictions in man. Up to 50% of affected cases may occur as a result of new mutations (Huson et al., 1989), with most originating in the paternal gamete (Jadayel et al., 1990). This gene also has a broader

0027-5107/97/$17.00 Copyright © 1997 Elsevier Science B.V. All rights reserved. PII S0027-5107(96)00171-6

D.I. Rodenhiser et al. / Mutation Research 373 (1997) 185–195

role in controlling normal cellular growth and development (Seizinger, 1993). NF1 mutations have been found in a variety of tumours outside the clinical phenotype usually associated with neurofibromatosis, including astrocytoma, myelodysplastic syndrome, sporadic malignant melanomas and neuroblastoma (Li et al., 1992; The et al., 1993; Andersen et al., 1993).

NF1 is a complex gene spanning 300 kb, consisting of 59 exons encoding an 11–13 kb mRNA with an open reading frame of 8454 nucleotides (Marchuk et al., 1991). Large introns separate exons 1 and 2 (~140 kb) and exons 27b and 28 (50 kb; Li et al., 1995). Three expressed genes (evi2A, evi2B and OMgp) lie within the 50 kb intron of the NF1 gene and are transcribed in the opposite orientation (Cawthon et al., 1990; O’Connell et al., 1990), while a processed pseudogene of adenylate kinase 3 (AK3) has been mapped at the 3′ end of the gene (Xu et al., 1992).

Many laboratories have been involved in a collaborative effort to identify NF1 mutations for diagnostic purposes and to derive an overall picture of the mutational basis of NF1 (Upadhyaya et al., 1994). This effort has been greatly enhanced by the International NF1 Mutation Analysis Consortium. No single, common mutation has been described in the NF1 gene. Instead, most mutations are family-specific, with only a few shared among unrelated families (Upadhyaya et al., 1994). This is also the case for other tumour-suppressor genes such as the retinoblastoma (Canning and Dryja, 1989), APC (Miyoshi et al., 1993) and BRCA1 genes (Shattuck-Eidens et al., 1995), where multiple sites for mutation have been described, with few recurrent mutations. Other characteristics of the NF1 gene, such as its high mutation rate (30 to 50% of NF1 patients represent new mutations; Huson et al., 1988), the appearance of NF1 mutations in tumours unrelated to the NF1 phenotype (Seizinger, 1993) and the variable expression of the NF1 phenotype (Riccardi, 1993; Easton et al., 1993), point to the complex molecular etiology of this gene.

While many NF1 mutations have been identified, the mechanisms responsible for NF1 mutagenesis remain speculative. In this paper we describe our analysis of the sequence environment in the NF1 gene at those sites where small deletions, insertions and nucleotide substitution mutations have been reported. Our objective was to determine whether specific nucleotide sequences commonly occur at independent mutation sites within the NF1 gene. Specifically, we have assessed how frequently these mutations occur at the site of small direct repeats and single nucleotide repeats (homonucleotides). Since we had previously established the presence of DNA methylation in the NF1 coding region as a possible means to generate mutations at methylation-sensitive and hypermutable CpG dinucleotides (Rodenhiser et al., 1993), we also assessed the frequency of mutations occurring at CpG dinucleotides and at CpNpG trinucleotide motifs (Laird and Jaenisch, 1994; Clark et al., 1995). We found that while homonucleotide and short direct repeats are involved in the majority of small deletions and insertions analyzed, substitution mutations are frequently associated with homonucleotides and CpG/CpNpG motifs. These results suggest that slippage repair during DNA replication and methylation-dependent mutagenesis are at least two of the probable mechanisms responsible for these mutational events.

2. Methods

Information concerning NF1 germline mutations was collected from the literature and from the listing of mutations submitted to the NF1 Mutation Analysis Consortium and found in the Consortium’s August 1995 Newsletter. Individual Consortium members (listed in the Acknowledgements) graciously gave permission to use unpublished NF1 mutations in our analyses. The sequence features of 90 germline mutations possessed by unrelated NF1 patients were analyzed by localizing the specific mutation sites to the NF1 cDNA sequence (GenBank No. M82814). These mutations include 29 small deletions, 19 insertional mutations and 42 substitution mutations. Twenty-six substitution mutations lead to premature stop codons and 16 substitutions result in amino acid changes. At least 40 large deletions that have been reported have not been included in this analysis, since few (if any) include information on DNA sequences flanking the deletion breakpoints.

Within the DNA sequence immediately flanking all mutations, two types of repeat sequences were assessed: single nucleotide repeats (homonucleotides; e.g., AA or GGGG) and short direct repeats (e.g., (AG)n). Direct repeats adjacent to multi-base deletions were subdivided into several groups, based on the inclusion of the repeats and intervening sequences in the deleted sequences (Jego et al., 1993). We also assessed all mutations for the involvement of CpG dinucleotides (Laird and Jaenisch, 1994) and CpNpG trinucleotides (Clark et al., 1995), since several reports have described the association of


CpG dinucleotides with a common recurrent mutation in the NF1 gene (Andrews et al., 1996; Ainsworth et al., 1993; Estivill et al., 1991).

3. Results

3.1. The sequence environment of NF1 deletions

We localized mutations to the NF1 cDNA sequence (GenBank M82814) and mapped the nucleotides flanking each side of the mutation to assess whether homonucleotide motifs or short direct repeats were adjacent to these mutation sites. Table 1 lists the 29 deletions in the NF1 coding region that were analysed. Nine deletions (31%) were associated with homonucleotide repeat motifs, of which seven involved deletions of only one base. The remaining two mutations were independent two-base deletions (1541delAG) that occurred at a four-base guanine repeat. Sixteen of the deletions we analysed (55%) were associated with small direct repeats, and all of these involved deletions of two or more bases. We used the categories of Jego et al. (1993) to subdivide deletions into several groups, based on the presence (or absence) of the repeats and their intervening sequences in the deleted sequences. Nine of the 16 deletions included all or part of at least one partner in the direct repeat pair (e.g., 6789delTTAC), while two deletions had part or all of both repeats deleted (e.g., 5672delGTAA). In five cases, the intervening sequence was involved but neither direct repeat was deleted (e.g., 5108delAG). From our results we conclude that repeat motifs are a major feature in the sequence environment surrounding small NF1 deletions.

Table 1
Deletions in the NF1 gene

(A) Deletions associated with homonucleotides
Pat. ID | Mutation | DNA sequence | Ref./Source
0091 | 4247delG | CGAAAG GGGCTTGA | Heim et al. (1995)
0386 | 658delT | agGCAT TTTGGAAC | Consortium: Wallace
– | 4424delT | TAGTCT TTCCTTC | Consortium: Horiuchi
0093 | 7260delT | ttagCT TTACTTA | Heim et al. (1995)
0003 | 4183delC | TCAATC CTGCCA | Consortium: Cawthon
0406 | 4190delT | TGCCAT TGTCTC | Anglani et al. (1993)
0544 | 1541delAG | AAAACAG GGGCCCC | Robinson et al. (1995b)
0545 | 1541delAG | AAAACAG GGGCCCC | Robinson et al. (1995b)
– | 8040delT | CATCT TACCT | Consortium: Horiuchi

(B) Deletions associated with direct repeats
Pat. ID | Mutation | DNA sequence | Ref./Source
0103 | 730del104bp | atgcagAAT...100 bp...AGAAAT | Heim et al. (1995)
0102 | 4088del23bp | TCGAAGTGT...20 bp...Ggtatg | Heim et al. (1995)
0619 | 4969delTCTATA | CGCAGTCTATATCTATAACTG | Consortium: Vidaud
0379 | 7096delAACTTT | ACTCTAACTTTAACTTTAACT | Abernathy et al. (1994)
0543 | 6789delTTAC | CTGACACTTACAACAG | Robinson et al. (1995a)
0547 | 6789delTTAC | GACACTTACAACAG | Boddrich et al. (1995)
– | 5679delACTG | AAGACACTGGCAGC | Hatta et al. (1994)
0331 | 5077del12bp | GAAACTGGCTGAGCA...18bp...GAAACT | Shen and Upadhyaya (1993)
0232 | 7118delTTTTA | AACTTTGCATTGGTTGGACAACTTTTA | Rodenhiser et al. (1996a)
0380 | 5672delGTAA | CATTAGTAAGACAC | Consortium: Wallace
– | 3937delGATT | CCTCTGATTGGCAA | Consortium: Horiuchi
0332 | 7745del10bp | AAATGTTCTCTTGGATGAAG | Shen et al. (1993)
0382 | 6810delAGAAG | CTGATAGAAGCTACA | Consortium: Wallace
0556 | 5108delAG | ACAACAGAAACT | Zhong et al. (1993)
0151 | 5123delCCACC | TGCTGCCACCTTGGC | Heim et al. (1995)
0097 | 3050delAATT | GTGTCAATTAGTTG | Heim et al. (1995)

(C) Other deletions
Pat. ID | Mutation | DNA sequence | Ref./Source
0337 | 5010delG | ACCAAGTATCA | Colman et al. (1993)
0338 | 4152del | GGTGCAGTAGG | Abernathy et al. (1994)
– | 5949del | ATTACAGATCT | Hatta et al. (1994)
– | 3635delT | ACTGGTCACAA | Consortium: Horiuchi

The deleted nucleotides are underlined and the repeated nucleotides are shown in bold print. The deleted nucleotide is indicated as the most 5′ of the repeated nucleotides. Nucleotides in capital letters are present in the NF1 coding region; nucleotides in lower case are located in intron sequence. The left column (Pat. ID; e.g., 0091) refers to the patient identification number assigned by the NF1 Mutation Analysis Consortium.

3.2. The sequence environment of NF1 insertions

Table 2 shows that 12 of 19 insertions (63%) involved the duplication of nucleotides at a common nucleotide motif (e.g., AGTT → AGTTTT; Ainsworth et al., 1993). Three insertions (16%) involved duplications of complex DNA sequences, as shown most dramatically by a 42-base duplication within exon 28 (Tassebehji et al., 1993). Four insertions (21%) were neither duplications nor adjacent to direct repeat motifs.

3.3. The sequence environment of NF1 substitution mutations

Forty-two NF1 mutations were analysed which involve substitutions leading to premature stop

Table 2
Insertions in the NF1 gene

(A) Insertions associated with homonucleotides
Pat. ID | Mutation | DNA sequence | Ref./Source
0227 | 5849insTT | AGAGTT TTACTG | Ainsworth et al. (1993)
0591 | 7486insGG | TCCCCG GGAGCC | Purandare et al. (1994)
0450 | 5055insT | AAGGCTT TGTTT | Consortium: Peters
0548 | 6790insTT | ACACTT TTACAA | Boddrich et al. (1995)
0096 | 2027insC | CCCCCCC CAATTT | Heim et al. (1995)
0095 | 2027insC | CCCCCCC CAATTT | Heim et al. (1995)
0387 | 4873insA | ACCTTA ACCCAT | Consortium: Wallace
0381 | 6709insC | GCAAAC CGAGTG | Consortium: Wallace
0557 | 5816insG | TTTTTG GCAAGC | Zhong et al. (1993)
0229 | 2875insA | ACTCAA ATTTGT | Consortium: Rodenhiser
– | 6519insG | AGAGAG GAGACT | Purandare et al. (1994)
– | 5843insA | ACGACAAA AGA | Consortium: Horiuchi

(B) Insertions associated with direct repeats
Pat. ID | Mutation | DNA sequence | Ref./Source
– | 4096dup42bp | 42 base tandem duplication | Tassebehji et al. (1993)
0166 | 6922ins10bp | GAGGAGGTCA GATGAGGTCA | Consortium: Legius
0546 | 1998insCCTCT | TCCTCT CCTCTA | Boddrich et al. (1995)

(C) Other insertions
Pat. ID | Mutation | DNA sequence | Ref./Source
– | 5451insC (2 people) | GACTCT CATCCC | Upadhyaya et al. (1992)
– | 5468insT | CACCAA TGATTC | Upadhyaya et al. (1992)
0384 | 6791insA | CACTT ATACAA | Consortium: Wallace
0383 | 5289insAA | TATTAT AAGCTT | Consortium: Wallace

The inserted nucleotide is underlined. In the majority of cases, insertions have duplicated an adjacent sequence, which is given in bold print.
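The duplication pattern described for insertions can be checked mechanically. The following is a minimal Python sketch, not the authors' procedure: the function name `duplicates_adjacent` and the split of each site into upstream, inserted and downstream strings are assumptions made for illustration. The first call uses the AGTT → AGTTTT example (5849insTT) cited in the text; the second uses a made-up sequence.

```python
def duplicates_adjacent(upstream, inserted, downstream):
    """Return True if the inserted bases repeat the sequence that lies
    directly 5' (upstream) or 3' (downstream) of the insertion point."""
    if not inserted:
        raise ValueError("inserted sequence must not be empty")
    upstream, inserted, downstream = (
        s.upper() for s in (upstream, inserted, downstream)
    )
    return upstream.endswith(inserted) or downstream.startswith(inserted)


# AGTT -> AGTTTT: the inserted TT copies the TT immediately upstream.
print(duplicates_adjacent("AGTT", "TT", "ACTG"))   # True
# Hypothetical site where the inserted base matches neither neighbour.
print(duplicates_adjacent("GACA", "T", "CGGA"))    # False
```

A test of this kind only asks whether an insertion copies its immediate neighbour; it does not distinguish homonucleotide runs from longer direct repeats, which Table 2 lists separately.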


codons (26 mutations; Table 3) or leading to amino acid changes (16 mutations; Table 4). We assessed whether homonucleotide motifs or short direct repeats were adjacent to these mutation sites in the same manner as in the analyses of deletions and insertions. In addition, since CGA codons are frequent sites for transition mutations leading to TGA (stop) codons, we determined the frequency of methylatable CpG dinucleotides and CpNpGs at these mutation sites. In contrast to the results from our analysis of deletions and insertions, direct repeat sequences were not associated with the occurrence of NF1 substitutions. Instead, substitution mutations occurred mainly at CpG/CpNpG (20 of 42; 48%) and homonucleotide (16 of 42; 38%) motifs (Table 5). Two-thirds of the independent substitutions leading to stop codons (18 of 26; 69%) occurred at either CpG dinucleotides (15 of 26; 58%) or CpNpGs (3 of 26; 11%; Table 3B). These sites include R1947X (C5839T), the most common mutation yet described in the NF1 gene and now known to be shared by at least 10 unrelated NF1 patients. In comparison, most substitutions leading to amino acid changes occurred at homonucleotide motifs (11 of 16; 69%). For example, a number of mutations were similar to L1339R, where a T → C transition alters the sequence CCTCC to CCCCC. Table 5 summarizes the data from our analyses.
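The two sequence contexts discussed for substitutions can be expressed as a small classifier. This is an illustrative Python sketch under stated assumptions: the function name `classify_substitution`, the 0-based position argument and the rule that a substitution counts as "homonucleotide" when the new base matches an immediate neighbour are choices made here, not a description of the authors' method. Unlike Tables 3 to 5, which place each mutation in a single category, the function returns every label that applies.

```python
def classify_substitution(seq, pos, new_base):
    """Label the sequence context of a substitution at 0-based index pos."""
    seq = seq.upper()
    new_base = new_base.upper()
    old = seq[pos]
    labels = []

    # CpG: a mutated C followed by G, or a mutated G preceded by C
    # (the latter is the C of the CpG on the opposite strand).
    if (old == "C" and seq[pos + 1:pos + 2] == "G") or (
        old == "G" and pos >= 1 and seq[pos - 1] == "C"
    ):
        labels.append("CpG")
    # CpNpG: the same test with one intervening nucleotide.
    elif (old == "C" and seq[pos + 2:pos + 3] == "G") or (
        old == "G" and pos >= 2 and seq[pos - 2] == "C"
    ):
        labels.append("CpNpG")

    # Homonucleotide: the new base matches a neighbour, so the change
    # creates or extends a run of identical bases.
    left = seq[pos - 1] if pos >= 1 else ""
    right = seq[pos + 1:pos + 2]
    if new_base in (left, right):
        labels.append("homonucleotide")

    return labels or ["other"]


# L1339R: T -> C turns CCTCC into CCCCC.
print(classify_substitution("CCTCC", 2, "C"))      # ['homonucleotide']
# Q239X: C -> T in the CAG of CCACAGA (Table 3B).
print(classify_substitution("CCACAGA", 3, "T"))    # ['CpNpG']
```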

Table 3
Substitution mutations leading to premature stop codons

(A) Substitutions associated with homonucleotides
Code | Mutation | DNA sequence | Ref./Source
0099 | C2113X (T6839A) | GTGTACT, T → A at adjacent A | Heim et al. (1995)
0541 | Y2264X (C6692A) | TTACAAC, C → A at adjacent A | Robinson et al. (1995a)
0542 | Y2264X (C6692A) | TTACAAC, C → A at adjacent A | Robinson et al. (1995a)
0105 | Q1017X (C3049T) | TGTCAAT, C → T at adjacent T | Heim et al. (1995)
0100 | W2208X (G6624A) | GTGGACA, G → A at adjacent A | Heim et al. (1995)

(B) Substitutions associated with CpG/CpNpG motifs
Code | Mutation | DNA sequence | Ref./Source
0094 | R440X (C1318T) | C → T transition at CpG | Heim et al. (1995)
0092 | R1276X (C3826T) | C → T transition at CpG | Consortium: Cawthon
0004 | R1276X (C3826T) | C → T transition at CpG | Heim et al. (1995)
0617 | R1513X (C4537T) | C → T transition at CpG | Consortium: Vidaud
0061 | R1947X (C5839T) | C → T transition at CpG | Ainsworth et al. (1993)
0511 | R1947X (C5839T) | C → T transition at CpG | Estivill et al. (1991)
0512 | R1947X (C5839T) | C → T transition at CpG | Horiuchi et al. (1994)
0226 | R1947X (C5839T) | C → T transition at CpG | Horiuchi et al. (1994)
0136 | R1947X (C5839T) | C → T transition at CpG | Consortium: Korf
– | R1947X (C5839T) | C → T transition at CpG | Cawthon et al. (1990)
– | R1947X (C5839T) | C → T transition at CpG | Lazaro et al. (1995)
– | R1947X (C5839T) | C → T transition at CpG | Lazaro et al. (1995)
0618 | R1947X (C5839T) | C → T transition at CpG | Consortium: Vidaud
0437 | R1947X (C5839T) | C → T transition at CpG | Consortium: Peters
0590 | R2496X (C7486T) | C → T transition at CpG | Purandare et al. (1994)
0549 | Q239X (C715T) | CCACAGA (C → T at CpNpG) | Horn et al.: in preparation
0168 | S1745X (C5234G) | CTTCAGC (C → G at CpNpG) | Consortium: Legius
0098 | Q1794X (C5380T) | CACCAGG (C → T at CpNpG) | Heim et al. (1995)

(C) Other substitutions
Code | Mutation | DNA sequence | Ref./Source
0101 | E2518X (G7553T) | ATGGAAA (transversion of G → T) | Heim et al. (1995)
– | Q682X (C2044T) | CGACAAG (transition of C → T) | Hatta et al. (1994)
0153 | W1538X (G4614A) | CTGGTCC (transition of G → A) | Consortium: Krone

Substitutions are organized as to whether they occur adjacent to homonucleotide repeats or involve CpG and CpNpG motifs. Nucleotides which are substituted are underlined. Nucleotides shown in bold print are adjacent to the substitution site and the mutation creates a homonucleotide repeat.


Table 4
Mutations leading to amino acid substitutions

(A) Substitutions associated with homonucleotides
Code | Mutation | DNA sequence | Nucleotide change | Ref./Source
0167 | M1035R (A3103G) | AGAGATGAA | A → G at adjacent G | Consortium: Legius
0616 | L1339R (T4016C) | AACCTCCTT | T → C at C repeat | Consortium: Vidaud
0169 | L2317P (G7050A) | CTTCTTGAA | T → C at adjacent C | Consortium: Legius
0154 | G2169C (G6505) | TCCTGGCTC | G → T at adjacent G | Consortium: Krone
– | Y2171R (T6513G) | CCTATGAGA | T → G at adjacent G | Upadhyaya et al. (1992)
– | L1423E (A4267G) | GTCAAAGgt | A → G at A repeat | Li et al. (1992)
– | K2435K (A7305G) | TAAAACATA | A → G at A repeat | Consortium: Horiuchi
0620 | P2056R (C6137G) | ACTCCTACT | C → G at C repeat | Consortium: Vidaud
– | N1776N (C5328G) | AGAACCAGT | C → G at C repeat | Purandare et al. (1994)
– | G1166D (G3497A) | ATAGGCTTA | G → A at G repeat | Purandare et al. (1994)
– | K1419R (A4255T) | CTTGAAGTT | A → T at A repeat | Purandare et al. (1994)

(B) Substitutions associated with CpG/CpNpG motifs
Code | Mutation | DNA sequence | Nucleotide change | Ref./Source
– | L2143M (C6427A) | CTTGCTGTT | C → A at CpNpG | Upadhyaya et al. (1992)
– | R2616Q (G7847A) | CAACGAATT | G → A at adjacent A | Consortium: Messiaen

(C) Other substitutions
Code | Mutation | DNA sequence | Nucleotide change | Ref./Source
0388 | R765H | AGGCGCATT | undefined at G2294 | Consortium: Wallace
– | G1404G | CAGGGATTT | undefined at G4212 | Purandare et al. (1994)
– | S1311S | CATCCTCTG | undefined at C3933 | Purandare et al. (1994)

Nucleotides which are substituted are underlined. Nucleotides shown in bold print are adjacent to the substitution site. One substitution involves a CpNpG motif and three mutations are not associated with CpNpGs or homonucleotide repeats.

Homonucleotides (37/90; 41%) occur frequently in conjunction with deletions and insertions and, perhaps surprisingly, in association with substitution mutations which result in amino acid changes (e.g., L1339R). Direct repeats are associated with 21% of all mutations analysed, and all of these were multiple-base deletions and insertions (19/48; 40% of deletions and insertions). Taken together, 83% (40 of 48) of all deletion and insertion events involve either of the two repeat types we analysed. CpG and CpNpG motifs are associated with 22% (20/90) of all mutations, but are the single largest contributor to substitution events (48%). Only 14 of the mutations analyzed (16%) did not fit into any of the three categories we assessed.

To determine whether irregularities in nucleotide frequencies in the NF1 gene were an important contributor to these data, we analyzed the frequencies of mono- and dinucleotides in the NF1 coding region and in a region of ten nucleotides flanking the sites of the 36 deletion events. The 8454 bp NF1 coding sequence consists of adenine (2511; 29.7%),

Table 5
Frequency of repeat sequences and CpGs/CpNpGs associated with small deletions, insertions and substitutions in the NF1 gene

Mutation class | Homonucleotides | Direct repeats | CpGs/CpNpGs | Other | Totals
Deletions | 9 (31%) | 16 (55%) | 0 (0%) | 4 (14%) | 29
Insertions | 12 (63%) | 3 (16%) | 0 (0%) | 4 (21%) | 19
Substitutions (Stops) | 5 (19%) | 0 (0%) | 18 (69%) | 3 (12%) | 26
Substitutions (Δ a.a.) | 11 (69%) | 0 (0%) | 2 (12%) | 3 (19%) | 16
Total (All mutations) | 37 (41%) | 19 (21%) | 20 (22%) | 14 (16%) | 90


thymine (2308; 27.3%), guanine (1796; 21.2%) and cytosine (1839; 21.8%) residues (from GenBank, accession No. M82814). Analysis of the mononucleotides in the 670 nucleotides from the cDNA sequence flanking the deletions showed a modest increase of adenine and cytosine residues over the frequencies observed for these nucleotides in the entire coding region (+1.6% and +2.6%, respectively). However, these increases were not significant, nor were the decreases in guanine and thymine residues observed in the flanking regions (−1.9% and −2.5%, respectively). Therefore, the frequencies of mononucleotides flanking the deletions are not significantly different from those elsewhere in the NF1 coding region.

We also evaluated the frequencies of the 8453 dinucleotides comprising the NF1 coding sequence and in the regions flanking the deletion sites. The frequency of each dinucleotide pair observed in the cDNA was determined, as was the expected dinucleotide frequency based on the mononucleotide frequency in the cDNA. The observed and expected frequencies of dinucleotide pairs in the coding region differ significantly (χ² = 455.8; df = 15; p < 0.005). The most dramatic changes from expected values are the significant increases at AG, CT, CA and TG dinucleotides and the significant decreases at AT, GT, CG and TA dinucleotides. However, no significant differences were observed in the frequency of homo-dinucleotides (AA, CC, GG or TT). Similar analyses of dinucleotide frequency involving the alcohol dehydrogenase (ADH), huntingtin (HD) and retinoblastoma (Rb) genes (for which large DNA sequences have been analysed) have also shown deficiencies and excesses of these same sets of dinucleotide pairs (Hill et al., 1992; Hill and Singh, submitted).

In the regions flanking the deletion sites, no significant differences were seen in dinucleotide frequencies in comparison with dinucleotides in the entire coding region, or with expectations based on the mononucleotide frequencies in these flanking regions. These results show that the frequencies of dinucleotide pairs observed in the NF1 coding region are significantly different from expected frequencies and that these differences continue to be present within the regions immediately flanking the sites of the NF1 deletions.
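The comparison of observed and expected dinucleotide frequencies can be sketched in a few lines of Python. This is an illustrative reconstruction, assuming that the expected count of each pair is the product of the two mononucleotide frequencies multiplied by the number of dinucleotide positions; the text does not spell out the exact formula the authors used, and the function names are hypothetical. The last lines apply the same expectation to the base counts reported above.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_table(seq):
    """Map each dinucleotide to (observed count, expected count), with the
    expectation derived from the mononucleotide frequencies of seq."""
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    observed = Counter(seq[i:i + 2] for i in range(n - 1))
    table = {}
    for a, b in product(BASES, repeat=2):
        expected = (mono[a] / n) * (mono[b] / n) * (n - 1)
        table[a + b] = (observed[a + b], expected)
    return table


def chi_square(table):
    """Pearson chi-square statistic over all pairs with a non-zero expectation."""
    return sum(
        (obs - exp) ** 2 / exp for obs, exp in table.values() if exp > 0
    )


# Base counts reported for the 8454 bp NF1 coding sequence.
NF1_COUNTS = {"A": 2511, "T": 2308, "G": 1796, "C": 1839}
n = sum(NF1_COUNTS.values())
expected_cpg = (NF1_COUNTS["C"] / n) * (NF1_COUNTS["G"] / n) * (n - 1)
print(n, round(expected_cpg))   # 8454 391
```

Under this assumption roughly 391 CpG dinucleotides would be expected by chance, against the 121 reported in the Discussion, which is in line with the CG deficit noted in the text.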


4. Discussion

In this paper we have presented our analysis of the sequence environment surrounding 90 deletions, insertions and substitution mutations possessed by NF1 patients. Specifically, we assessed how often these mutations occur at the site of single nucleotide repeats (homonucleotides), short direct repeats and at methylatable CpG dinucleotide and CpNpG trinucleotide motifs. We have established that in the NF1 gene, a specific group of DNA sequence elements is frequently associated with the site of small mutational events. We found that homonucleotide and short direct repeats are involved in the majority of small deletions and insertions analysed (83%). Substitution mutations are frequently associated with homonucleotide repeats (38%) and CpG/CpNpG motifs (48%; Table 5). In addition, the observed frequencies of a subset of dinucleotide pairs in the NF1 coding region and in the region flanking deletion mutations are significantly different from expected values.

At least 10 NF1 patients possess independent CpG mutations at R1947X (C5839T), the recurrent mutation site in exon 31. DNA methylation likely plays a role in CpG mutagenesis, where these mutational hotspots may be due to differences in repair efficiencies at these premutagenic lesions (Schmutte et al., 1995). DNA methylation is known to be a major contributor to point mutations leading to human genetic disease, when it precedes deamination of 5-methylcytosine (5mC) present within CpG dinucleotides. CpG dinucleotides are hypothesized to be a hotspot for mutations in a variety of genes, where a substantial proportion of intragenic single base-pair mutations (35%) occur in CpG dinucleotides and are the result of C → T or G → A transitions (Laird and Jaenisch, 1994; Bottema et al., 1993; Cooper and Youssoufian, 1988). Consequently, the rate of transitions at CpGs is suggested to be 20-fold higher than transitions at non-CpG sites.
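The link between CpG transitions and the arginine changes seen in Tables 3 and 4 can be made concrete with a short Python sketch. The function name `cpg_transition_products` is hypothetical, and the code simply enumerates the two transitions expected at a CpG: C → T on the coding strand, and G → A, which corresponds to C → T on the opposite strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cpg_transition_products(codon):
    """Return the codons produced by C -> T and G -> A transitions at each
    CpG found inside the codon."""
    codon = codon.upper()
    products = []
    for i in range(len(codon) - 1):
        if codon[i:i + 2] == "CG":
            products.append(codon[:i] + "T" + codon[i + 1:])       # C -> T
            products.append(codon[:i + 1] + "A" + codon[i + 2:])   # G -> A
    return products


# CGA (Arg) gives TGA (stop) or CAA (Gln).
print(cpg_transition_products("CGA"))   # ['TGA', 'CAA']

# Of the four CGN arginine codons, only CGA can become a stop codon this way.
arg_codons = ["CGA", "CGC", "CGG", "CGT"]
to_stop = [c for c in arg_codons
           if STOP_CODONS & set(cpg_transition_products(c))]
print(to_stop)                          # ['CGA']
```

This matches the pattern in the tables, where the CpG substitutions are R → X changes (Table 3B) and one R → Q change (R2616Q, Table 4).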
In p53, for example, five of six mutational hotspots are at CpG sites and all of these are methylated to 5mC in every tissue examined (Tornaletti and Pfeifer, 1995). DNA methylation has been identified within at least one segment of the coding region of the BRCA1 breast cancer susceptibility gene (Rodenhiser et al., 1996b). In our analysis of the NF1 gene, only 34 CpGs are


present in the coding region, yet transition mutations have been described at four of these sites, including the recurrent R1947X mutation. This is despite the low frequency of CpG dinucleotides within the NF1 coding region (1.4%; 121/8453). We have also investigated NF1 methylation patterns and have shown that methylation occurs at a variety of CpG dinucleotides, including the C5839T hotspot in exon 31 (Rodenhiser et al., 1993; Andrews et al., 1996). We also investigated the occurrence of NF1 mutations at CpNpG sites, since low-frequency patterns of DNA methylation at these sites have also been reported (Clark et al., 1995). Although several mutations were identified at these motifs, it is unclear whether methylated CpNpGs play a significant role in NF1 mutagenesis.

Short direct repeats and homonucleotides are involved in the majority of small NF1 deletions and insertions analysed. Similar results have been found for the frequency of small repeats and homonucleotide pairs in the p53, retinoblastoma (Rb) and BRCA1 genes. Jego et al. (1993) compiled 740 independent p53 mutations from a wide variety of human cancers and reported that in all 26 deletions of two (or more) bases, a direct repeat was present on the unaltered sequence in close proximity to the deletion. Redston et al. (1994) showed that many small p53 deletions in pancreatic adenocarcinoma occurred in iterations of single bases and that there was evidence for the involvement of homocopolymer (polypurine:polypyrimidine) tracts. With retinoblastoma (Rb), no specific DNA sequence within the Rb gene serves as a common hotspot for deletion events. However, short direct repeats ranging in size from 4–7 base pairs are frequently seen flanking deletion termini (Canning and Dryja, 1989). Our analysis of 74 BRCA1 mutations has revealed strong similarities with the patterns of sequence elements associated with NF1 mutations. In BRCA1, as with NF1, homonucleotides and short direct repeats are commonly associated with small deletions and insertions, while substitutions are frequently associated with homonucleotide repeats and methylatable CpGs and CpNpGs (Rodenhiser et al., 1996b).

Our data, as well as the evidence presented in the literature, show that homonucleotides and short direct repeats may represent common features of the

sequence environment flanking small deletions and insertions in a number of genes, including the NF1 tumour suppressor. A variety of studies have proposed models to explain how these motifs could be important to mutagenesis in animal and bacterial systems. Mutagenic errors can occur in regions where DNA symmetry and conformational changes may interfere with normal processes of replication, repair and recombination (Sinden and Wells, 1992). Streisinger et al. (1966) proposed a slippage-repair model to explain how single base-pair deletions or insertions could be accounted for by frameshifts within runs of a single base. DNA slippage is likely to occur during replication (or repair synthesis), when one DNA strand transiently dissociates from the other and then reanneals in a misaligned configuration. The misaligned ends can then be filled in by DNA polymerase (Schlotterer and Tautz, 1992). If the unpaired bases are in the primer strand, continued synthesis would result in an insertion, whereas if the unpaired bases are located in the template strand a deletion would be expected (Strand et al., 1993; Jego et al., 1993; Krawczak and Cooper, 1991). In vitro evidence for slippage-mediated frameshifts has shown that polymerase β is particularly prone to minus-T errors at TTT runs (Kunkel, 1986; Kunkel, 1990). For example, 658delT, 7260delT and 4424delT (Table 1) demonstrate minus-T errors at TTT runs. Similarly, mutations that involve the loss or gain of more than a single nucleotide (e.g., 6789del4 and 3937del4; Table 1B) may be explained by invoking the principles of Streisinger slippage between directly repeated DNA sequences separated by a variable number of intervening nucleotides (Kunkel, 1990; Krawczak and Cooper, 1991; Jego et al., 1993). Together, experimental evidence for mutational changes such as these could provide the mechanism responsible for a number of observed NF1 mutations.
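The outcome predicted by the slippage model for a single-base run can be illustrated with a toy Python function. This is a deliberately simplified sketch: the name `slip`, its arguments and the example sequence are hypothetical, and the code models only the end result (one base lost or gained within the run), not strand dissociation or polymerase behaviour.

```python
def slip(seq, run_start, run_length, strand):
    """Return the sequence after a one-base slippage event in a
    homonucleotide run that starts at 0-based index run_start.

    strand="template": the unpaired base is in the template, so the newly
    synthesized strand loses one base of the run (deletion).
    strand="primer": the unpaired base is in the primer strand, so one
    extra base of the run is added (insertion).
    """
    run = seq[run_start:run_start + run_length]
    if len(run) != run_length or len(set(run)) != 1:
        raise ValueError("the given span is not a homonucleotide run")
    if strand == "template":
        return seq[:run_start] + seq[run_start + 1:]
    if strand == "primer":
        return seq[:run_start] + run[0] + seq[run_start:]
    raise ValueError("strand must be 'template' or 'primer'")


seq = "GCATTTGGA"                     # hypothetical sequence with a TTT run
print(slip(seq, 3, 3, "template"))    # GCATTGGA   (minus-T deletion)
print(slip(seq, 3, 3, "primer"))      # GCATTTTGGA (plus-T insertion)
```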
However, it is clear that the Streisinger model may not be the only mechanism by which these mutational events occur, since there is no evidence that both duplications and deletions have occurred at the same homonucleotide run within this gene. Certain characteristics of the NF1 gene such as its high mutation rate and the variable expression of the NF1 phenotype point to the complex molecular etiology of this gene. While the slippage-repair models may present one mechanism by which NF1 muta-

D.I. Rodenhiser et al.r Mutation Research 373 (1997) 185–195

tions occur, one must ask why this gene is prone to these events. Is the high NF1 mutation rate solely due to the large size of the gene providing a substantial target for point mutations, deletions and other rearrangements? Or, is NF1 mutagenesis related to the tumour-suppressor nature of the NF1 gene itself? We suggest that NF1 is a disease where the acquisition and accumulation of mutations is advantageous to the host cell. Mutational events which do occur in this gene are selectively retained by the host cell if they provide a proliferative advantage for that cell. Such mutations could, for example, remove constraints on growth and proliferation previously provided by the presence and regulatory activity of the functional neurofibromin protein. Removal of growth constraints and the presentation of clinical phenotype would be modulated by several factors. First, the presence of the inherited mutation by itself Žthe first hit. could confer a proliferative advantage to the cell that was dependent on the location and functional effect of the mutation. Second, this first hit could also predispose the cell to secondary mutations, although the generation of second hits would also be dependent on the specific cell type and exposure to environmental factors. Third, the acquisition of a second hit in the NF1 wild-type allele would permit the generation of a clone of cells further released from the normal growth constraints. Fourth, subsequent mutational events could also involve so-called modifier genes ŽRiccardi, 1993. with the same result: a proliferative advantage for the cell and the generation of a cell clone displaying a tumour phenotype. This model has implications for other tumour suppressors such as the breast cancer susceptibility gene ŽBrCa1; Shattuck-Eidens et al., 1995. where multiple, family-specific germline mutations are the norm. 
Such mutations, although low in frequency, may be selectively retained as either somatic or germline events and lead to extremely variable clinical phenotypes, especially if the mutations are scattered throughout the particular gene. As well, a practical application of this model is that molecular diagnosis, especially in the case of large genes such as NF1 and BrCa1, will be onerous in the absence of high-efficiency screening methods. Since our data show that a specific group of DNA sequence elements is frequently associated with small NF1 mutations, the search for unknown mutations could be predicted to some degree based on the occurrence of these particular sequence elements in the NF1 cDNA sequence. In any case, it is clear that further work must be done to unravel the intricate balancing of repair and replication mechanisms that is contributing to mutagenesis in tumour suppressor genes such as NF1 and which displays its effect through a highly variable and somewhat unpredictable clinical phenotype.
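The idea that a search for unknown mutations could be guided by the occurrence of these sequence elements in a cDNA can be illustrated with a minimal Python sketch. It is not the analysis procedure used in this study: the function name `find_motifs`, the `min_run` threshold and the example sequence are hypothetical choices made only for illustration, and short direct repeats are left out for brevity.

```python
import re


def find_motifs(seq, min_run=3):
    """Locate candidate mutation-prone elements in a DNA sequence.

    Positions are 0-based offsets into `seq`. `min_run` is an assumed
    threshold for what counts as a homonucleotide run.
    """
    seq = seq.upper()
    # Homonucleotide runs: one base repeated at least `min_run` times.
    run_pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    runs = [(m.start(), m.group(0)) for m in run_pattern.finditer(seq)]
    # CpG dinucleotides; the lookahead reports every site, even overlapping ones.
    cpg = [m.start() for m in re.finditer(r"(?=CG)", seq)]
    # CpNpG trinucleotides, where N is any of the four bases.
    cpnpg = [m.start() for m in re.finditer(r"(?=C[ACGT]G)", seq)]
    return {"homonucleotide": runs, "CpG": cpg, "CpNpG": cpnpg}


# Made-up sequence for illustration only; it is not taken from NF1.
example = "ATGCCCCGATCAGTTTCGA"
hits = find_motifs(example)
print(hits)
```

Applied to a full cDNA sequence, a scan of this kind would only rank regions for closer inspection; whether a given motif actually harbours a mutation still has to be established by screening.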

Acknowledgements

This research was funded by the Child Health Research Institute and the Victoria Hospital Research Development Fund. We gratefully acknowledge the support of our colleagues in the Molecular Medical Genetics Program, as well as the clinicians and staff at the Regional Medical Genetics Centre, Children’s Hospital of Western Ontario. We also specifically acknowledge our colleagues in the NF1 Mutation Analysis Consortium whose permission to use unpublished mutations enhanced our analyses: Drs. Bruce Korf, Peggy Wallace, Peter Nürnberg, Hartmut Peters, Takahiko Horiuchi, Richard Cawthon, Alessandra Murgia, Eric Legius, Ludwine Messiaen, Dominique Vidaud, Wilfrid Krone and Sven Hoffmeyer.

References

Abernathy, C.R., Colman, S.D., Kouseff, B.G. and Wallace, M.R. (1994) Two NF1 mutations: Frameshift in the GAP-related domain and loss of two codons toward the 3′ end of the gene. Hum. Mutation, 3, 347–352.
Ainsworth, P.J., Rodenhiser, D.I. and Costa, M.T. (1993) Identification and characterization of sporadic and inherited mutations in exon 31 of the neurofibromatosis (NF1) gene. Hum. Genet., 91, 151–156.
Andersen, L.B., Fountain, J.W., Gutmann, D.H., Tarle, S.A., Glover, T.W., Dracopoli, N.C., Housman, D.E. and Collins, F.S. (1993) Mutations in the neurofibromatosis 1 gene in sporadic malignant melanoma cell lines. Nature Genet., 3, 118–121.
Andrews, J.D., Mancini, D.N., Singh, S.M. and Rodenhiser, D.I. (1996) Site and sequence specific DNA methylation in the neurofibromatosis (NF1) gene includes C5839T: the site of the recurrent substitution mutation in exon 31. Hum. Mol. Genet., 5, 503–508.


Anglani, F., Murgia, A., Bedin, S., Bresin, E., Bernardi, F., Clementi, M. and Tenconi, R. (1993) A new disease-causing mutation in the GAP-related domain of the NF1 gene. Hum. Mol. Genet., 2, 1057–1059.
Boddrich, A., Griesser, J., Horn, D., Kaufmann, D., Krone, W. and Nürnberg, P. (1995) Reduced neurofibromin content but normal GAP activity in a patient with neurofibromatosis type 1 caused by a five base pair duplication in exon 12b of the NF1 gene. Biochem. Biophys. Res. Commun., 214, 895–904.
Bottema, C.D., Ketterling, R.P., Vielhaber, E., Yoon, H.S., Gostout, B., Jacobson, D.P., Shapiro, A. and Sommer, S.S. (1993) The pattern of spontaneous germline mutation: relative rates of mutation at or near CpG dinucleotides in the factor IX gene. Hum. Genet., 91, 496–503.
Canning, S. and Dryja, T.P. (1989) Short direct repeats at the breakpoints of the retinoblastoma gene. Proc. Natl. Acad. Sci. USA, 86, 5044–5048.
Cawthon, R.W., Weiss, R., Xu, G., Viskochil, D., Culver, M., Stevens, J., Robertson, M., Dunn, D., Gesteland, R., O’Connell, P. and White, R. (1990) A major segment of the neurofibromatosis type 1 gene: cDNA sequence, genomic structure, and point mutations. Cell, 62, 193–203.
Clark, S.J., Harrison, J. and Frommer, M. (1995) CpNpG methylation in mammalian cells. Nature Genet., 10, 20–27.
Colman, S.D., Collins, F.S. and Wallace, M.R. (1993) Characterization of the single base-pair deletion in neurofibromatosis type 1. Hum. Mol. Genet., 2, 1709–1711.
Cooper, D.N. and Youssoufian, H. (1988) The CpG dinucleotide in human genetic disease. Hum. Genet., 78, 151–155.
Easton, D.F., Ponder, M.A., Huson, S.M. and Ponder, B.A.J. (1993) An analysis of variation in expression of neurofibromatosis type 1 (NF1): evidence of modifying genes. Am. J. Hum. Genet., 53, 305–313.
Estivill, X., Lazaro, C., Casals, T. and Ravella, A. (1991) Recurrence of a nonsense mutation in the NF1 gene causing classical neurofibromatosis type 1. Hum. Genet., 88, 185–188.
Hatta, N., Horiuchi, T. and Fujita, S. (1994) Analysis of NF1 gene mutations in neurofibromatosis type 1 patients in Japan. Biochem. Biophys. Res. Commun., 199, 207–212.
Heim, R.A., Kam-Morgan, L.N., Binnie, C.G., Corns, D.D., Cayouette, M.C., Farber, R.A., Aylsworth, A.S., Silverman, L.M. and Luce, M.C. (1995) Distribution of 13 truncating mutations in the neurofibromatosis 1 gene. Hum. Mol. Genet., 4, 975–981.
Hill, K.A., Schisler, N.J. and Singh, S.M. (1992) Chaos game representation of coding regions of human globin genes and alcohol dehydrogenase genes of phylogenetically divergent species. J. Mol. Evol., 35, 261–269.
Horiuchi, T., Hatta, N., Matsumoto, M., Ohtsuka, H., Collins, F.S., Kobayashi, Y. and Fujita, S. (1994) Nonsense mutations at Arg-1947 in two cases of familial neurofibromatosis type 1 in Japanese. Hum. Genet., 93, 81–83.
Huson, S.M., Harper, P.S. and Compston, D.A.S. (1988) Von Recklinghausen neurofibromatosis: a clinical and population study in south-east Wales. Brain, 111, 1355–1381.
Huson, S.M., Compston, D.A., Clark, P. and Harper, P.S. (1989) A genetic study of von Recklinghausen neurofibromatosis in south east Wales. I. Prevalence, fitness, mutation rate, and effect of parental transmission on severity. J. Med. Genet., 26, 704–711.
Jadayel, D., Fain, P., Upadhyaya, M., Ponder, M.A., Huson, S.M., Carey, J., Fryer, A., Mathew, C.G.P., Barker, D.F. and Ponder, B.A.J. (1990) Paternal origin of new mutations in von Recklinghausen neurofibromatosis. Nature, 343, 558–559.
Jego, N., Thomas, G. and Hamelin, R. (1993) Short direct repeats flanking deletions, and duplicating insertions in p53 gene in human cancers. Oncogene, 8, 209–213.
Krawczak, M. and Cooper, D.N. (1991) Gene deletions causing human genetic disease: mechanisms of mutagenesis and the role of the local DNA sequence environment. Hum. Genet., 86, 425–441.
Kunkel, T.A. (1986) Frameshift mutagenesis by eucaryotic DNA polymerases in vitro. J. Biol. Chem., 261, 13581–13587.
Kunkel, T.A. (1990) Misalignment-mediated DNA synthesis errors. Biochemistry, 29, 8003–8011.
Laird, P.W. and Jaenisch, R. (1994) DNA methylation and cancer. Hum. Mol. Genet., 3, 1487–1495.
Lazaro, C., Kruyer, H., Gaona, A. and Estivill, X. (1995) Two further cases of mutation R1947X in the NF1 gene: screening for a relatively common recurrent mutation. Hum. Genet., 96, 361–363.
Li, Y., Bollag, G., Clark, R., Stevens, J., Conroy, L., Fults, D., Friedman, E., Samowitz, W., Robertson, M., Bradley, P., McCormick, F., White, R. and Cawthon, R.W. (1992) Somatic mutations in the Neurofibromatosis 1 gene in human tumours. Cell, 69, 275–281.
Li, Y., O’Connell, P., Breidenbach, H.H., Cawthon, R., Stevens, J., Xu, G., Neil, S., Robertson, M., White, R. and Viskochil, D. (1995) Genomic organization of the neurofibromatosis 1 gene (NF1). Genomics, 25, 9–18.
Marchuk, D.A., Saulino, A.M., Tavakkol, R., Swaroop, M., Wallace, M.R., Andersen, L.B., Mitchell, A.L., Gutmann, D.H., Boguski, M. and Collins, F.S. (1991) cDNA cloning of the type 1 neurofibromatosis gene: complete sequence of the NF1 gene product. Genomics, 11, 931–940.
Miyoshi, Y., Anso, H., Nagase, H. and 11 others (1993) Germ-line mutations of the APC gene in 53 familial adenomatous polyposis patients. Proc. Natl. Acad. Sci. USA, 89, 4452–4456.
O’Connell, P., Viskochil, D., Buchberg, A.M., Fountain, J., Cawthon, R.W., Culver, M., Stevens, J., Rich, D.C., Ledbetter, D.H., Wallace, M.R., Carey, J.C., Jenkins, N.A., Copeland, N.G., Collins, F.S. and White, R. (1990) The human homolog of murine Evi-2 lies between two von Recklinghausen neurofibromatosis translocations. Genomics, 7, 547–554.
Purandare, S.M., Lanyon, W.G. and Connor, J.M. (1994) Characterization of inherited and sporadic mutations in neurofibromatosis type-1. Hum. Mol. Genet., 3, 1109–1115.
Redston, M.S., Caldas, C., Seymour, A.B., Hruban, R.H., da Costa, L., Yeo, C.J. and Kern, S.E. (1994) p53 mutations in pancreatic carcinoma and evidence of common involvement of homocopolymer tracts in DNA microdeletions. Cancer Res., 54, 3025–3033.
Riccardi, V.M. (1993) Genotype, malleotype, phenotype and randomness: lessons from neurofibromatosis-1 (NF1). Am. J. Hum. Genet., 53, 301–314.

Robinson, P.N., Boddrich, A., Peters, H., Tinschert, S., Buske, A., Kaufmann, D. and Nürnberg, P. (1995a) Two recurrent nonsense mutations and a 4 bp deletion in a quasi-symmetric element in exon 37 of the NF1 gene. Hum. Genet., 96, 95–98.
Robinson, P.N., Buske, A., Tinschert, S., Neumann, R. and Nürnberg, P. (1995b) A recurrent 2 bp deletion in exon 10c of the NF1 gene in two cases of von Recklinghausen neurofibromatosis. Hum. Mutation, in press.
Rodenhiser, D.I., Coulter-Mackie, M.B. and Singh, S.M. (1993) Evidence of DNA methylation in the Neurofibromatosis type 1 (NF1) gene region of 17q11.2. Hum. Mol. Genet., 2(3), 439–444.
Rodenhiser, D.I., Hovland, K., Andrews, J., Jung, J.H., Gillett, J.M.R., Ainsworth, P.J., Coulter-Mackie, M. and Singh, S.M. (1996a) A five base pair deletion in a direct-repeat region within exon 39 of the Neurofibromatosis type 1 (NF1) gene. Hum. Mutation, in press.
Rodenhiser, D.I., Chakraborty, P.K., Andrews, J.D., Ainsworth, P.J., Mancini, D.N., Lopes, E. and Singh, S.M. (1996b) Heterogeneous point mutations in the BRCA1 breast cancer susceptibility gene occur in high frequency at the site of homonucleotide tracts, short repeats and methylatable CpG/CpNpG motifs. Oncogene, in press.
Rubenstein, A.E. and Korf, B.R. (1990) Neurofibromatosis: A Handbook for Patients, Families and Health-Care Professionals. Thieme, Stuttgart, New York.
Schlotterer, C. and Tautz, D. (1992) Slippage synthesis of simple sequence DNA. Nucleic Acids Res., 20, 211–215.
Schmutte, C., Yang, A.S., Beart, R.W. and Jones, P.A. (1995) Base excision repair of U:G mismatches at a mutational hotspot in the p53 gene is more efficient than base excision repair of T:G mismatches in extracts of human colon tumours. Cancer Res., 55, 3742–3746.
Seizinger, B.R. (1993) NF1: a prevalent cause of tumorigenesis in human cancers? Nature Genet., 3, 97–99.
Shattuck-Eidens, McClure, M. and 40 other authors (1995) A collaborative survey of 80 mutations in the BrCa1 Breast and ovarian cancer susceptibility gene. JAMA, 273, 535–541.
Shen, M.H. and Upadhyaya, M. (1993) A de novo nonsense mutation in exon 28 of the neurofibromatosis type 1 (NF1) gene. Hum. Genet., 92, 410–412.


Shen, M.H., Harper, P.S. and Upadhyaya, M. (1993) Neurofibromatosis type 1 (NF1): the search for mutations by PCR-heteroduplex analysis on Hydrolink gels. Hum. Mol. Genet., 2, 1861–1864.
Sinden, R.R. and Wells, R.D. (1992) DNA structure, mutations and human genetic disease. Curr. Opin. Biotech., 3, 612–622.
Strand, M., Prolla, T.A., Liskay, R.M. and Petes, T.D. (1993) Destabilization of tracts of simple repetitive DNA in yeast by mutations affecting DNA mismatch repair. Nature, 365, 274–276.
Streisinger, G., Okada, Y., Emrich, J., Newton, J., Tsugita, A., Terzaghi, E. and Inouye, I. (1966) Frameshift mutation and the genetic code. Cold Spring Harbor Symp. Quant. Biol., 31, 77–86.
Tassabehji, M., Strachan, T., Sharland, M., Colley, A., Donnai, D., Harris, R. and Thakker, N. (1993) Tandem duplication within a neurofibromatosis type 1 (NF1) gene exon in a family with features of Watson syndrome and Noonan syndrome. Am. J. Hum. Genet., 53, 90–95.
The, I., Murthy, A.E., Hannigan, G.E., Jacoby, L.B., Menon, A.G., Gusella, J.F. and Bernards, A. (1993) Neurofibromatosis type 1 gene mutations in neuroblastoma. Nature Genet., 3, 62–66.
Tornaletti, S. and Pfeifer, G.P. (1995) Complete and tissue-independent methylation of CpG site in the p53 gene: implications for mutations in human cancers. Oncogene, 10, 1493–1499.
Upadhyaya, M., Shen, M., Cherryson, A., Farnham, J., Maynard, J., Huson, S.M. and Harper, P.S. (1992) Analysis of mutations at the neurofibromatosis 1 (NF1) locus. Hum. Mol. Genet., 1, 735–740.
Upadhyaya, M., Shaw, D.J. and Harper, P.S. (1994) Molecular basis of Neurofibromatosis type 1 (NF1): Mutation analysis and polymorphisms in the NF1 gene. Hum. Mutation, 4, 83–101.
Xu, G., O’Connell, P., Stevens, J. and White, R. (1992) Characterization of human adenylate kinase 3 (AK3) cDNA and mapping of the AK3 pseudogene to an intron of the NF1 gene. Genomics, 13, 537–542.
Zhong, J., Spiegel, R., Boltshauser, E. and Schmid, W. (1993) Two novel mutations: 5108delAG and 5816insG in the NF1 gene detected by SSCP analysis. Hum. Mol. Genet., 2, 1491–1492.