DNA-sequence specificity of mutations at the human thymidine kinase locus


Mutation Research, 289 (1993) 231-243 © 1993 Elsevier Science Publishers B.V. All rights reserved 0027-5107/93/$06.00


MUT 05280

Andrew J. Grosovsky, Barbara N. Walter and Cynthia R. Giver
Environmental Toxicology Graduate Program, University of California, Riverside, CA 92521, USA
(Received 1 February 1993) (Revision received 30 April 1993) (Accepted 5 May 1993)

Keywords: Thymidine kinase; Spontaneous mutagenesis; Mutational spectrum; DNA-sequence specificity

Summary

We have established a system for the study of DNA-sequence specificity at a functionally heterozygous thymidine kinase (tk) locus in a human lymphoblastoid cell line (TK6). Characterization of the parental locus demonstrated that the 2 tk alleles were fortuitously distinguished by differential gene expression. One round of PCR amplification yielded a specific tk cDNA product only for the functional parental allele. Analysis of cDNA from newly mutated alleles which retain substantial levels of expression is thus simplified. Amplification and sequencing of tk genomic sequences were used for analysis of low-expression mutants, and to distinguish and characterize deletion and splicing mutations. DNA-sequence analysis of the parental locus identified a frameshift in tk exon 4 of the non-functional parental allele and, surprisingly, an exon 7 frameshift mutation in the functional tk allele. This exon 7 frameshift results in a predicted alteration of the final 21 amino acids of the TK protein, and a C-terminal extension of 131 additional amino acids. Since TK6 is phenotypically TK+, we can infer that this major C-terminal modification does not eliminate enzymatic activity. The system was used for the analysis of 36 spontaneous TK- mutants. Loss of heterozygosity accounted for 58% of the mutations, 11% were attributable to intragenic deletions, and the remainder involved point mutations, primarily G:C to A:T transitions.

Correspondence: Dr. Andrew J. Grosovsky, Environmental Toxicology Graduate Program, University of California, Riverside, CA 92521, USA. Tel. (909) 787-3193; Fax (909) 787-3087; Bitnet Grosovsk@uervms.

DNA-sequence level analyses of mutational specificity in mammalian cells have focused on a small handful of selectable, mostly hemizygous, marker loci. Studies of the X-linked hypoxanthine-guanine phosphoribosyl transferase (hprt) locus account for the large majority of mutational specificity data, particularly in human cells. A significant amount of data is also available in rodent cells for adenine phosphoribosyl transferase (aprt; de Boer and Glickman, 1992; Phear et al., 1989; Grosovsky et al., 1988) and, to a lesser extent, for dihydrofolate reductase (dhfr; Carothers and Grunberger, 1990; Carothers et al., 1988). The distinctive properties of each target locus chosen for mutational specificity investigations will inevitably create limitations and biases affecting the variety of mutations that can be recovered. These locus-specific determinants include the distribution of DNA damage within a given target locus (Brash et al., 1987; Seeburg and Fuchs, 1990), the local sequence context surrounding a site of DNA damage (Gordon et al., 1990; Daubersies et al., 1992), the chromosomal context of the target locus, including the nature of the surrounding sequences and the zygosity of the locus itself (Amundson and Liber, 1992; Cavanee et al., 1983), the status of gene expression at the time of mutagen exposure and during subsequent attempts at repair (Bohr et al., 1985; Chen et al., 1992; Venema et al., 1992), and the potential phenotypic consequences of specific DNA-sequence alterations. To establish a general picture of mutational specificity it is thus necessary to address a number of issues that transcend the inherent characteristics of any one particular marker gene.

Hemizygosity is a primary reason for the widespread use of hprt in mutagenesis investigations; it enables convenient selection of HPRT- mutants from a wide variety of sources, and also simplifies molecular analyses of mutant hprt alleles. The availability of Chinese hamster ovary cell lines which are hemizygous for aprt (Simon et al., 1982, 1983), an autosomal locus, similarly accounts for the relatively large accumulated database of DNA-sequence alterations in APRT- mutants. Despite the logistical advantages of hemizygosity, many pathways for mutagenesis are only possible at heterozygous loci. Large multi-locus deletions, for example, may not be recoverable at a hemizygous locus, because a deletion extending into a closely linked essential gene within the hemizygous region would be lethal.
A variety of other mechanisms, involving homologous interactions in a heterozygote, result in reduction to homozygosity by conversion of the functional allele to the sequence found in its non-functional counterpart (Cavanee et al., 1983). Interest in these loss of heterozygosity (LOH) mechanisms has been stimulated by the important role they play in carcinogenesis by permitting expression of recessive alleles at tumor suppressor loci (Muller and Scott, 1992). Thymidine kinase (tk) has emerged as the principal heterozygous marker gene used in mammalian mutagenesis studies. Although rodent and human cell lines heterozygous for aprt also exist (Bradley and Letovanec, 1982; Kleindienst and Drinkwater, 1991; Fujimori et al., 1992), these have been much less extensively investigated. Like hprt and aprt, tk encodes a salvage pathway enzyme which is inessential for cell survival under non-selective culture conditions, and convenient selections exist for TK- and TK+ phenotypes. Studies of TK- mutants have been fostered by the availability of cell lines specifically developed to be functionally heterozygous at the tk locus. Two such cell lines, the L5178Y mouse lymphoma cell line (Clive et al., 1972) and the TK6 human B lymphoblastoid cell line (Liber and Thilly, 1982), have been most widely studied. Since the tk locus is distinguished by its zygosity, analyses of TK- mutants have focused on the relative importance of chromosomal versus intragenic mutations. Surprisingly, molecular characterization of TK- mutants has been restricted to this niche, despite the need for additional target loci that can be used in DNA-sequence specificity studies.

In this report we describe a methodology for the investigation of mutational spectra at the tk locus in TK6 human lymphoblasts. The DNA sequence of the parental tk alleles has been determined and their gene expression characterized. The strategy for analysis of point mutations in any individual TK- mutant depended strongly on the gene expression of the newly mutated allele. Genomic and cDNA sequence analyses of TK- point mutations were each used as required by the level of gene expression and by the need to distinguish and characterize deletion and splicing mutations. A diagnostic SacI polymorphism at the tk locus (Yandell et al., 1986) was used to identify mutants that had undergone loss of heterozygosity (LOH) or rearrangement events. The system was used for the analysis of 36 spontaneous TK- mutants.

Materials and methods

TK6 cell line, the tk locus, and selection of TK- mutants

TK6 is a human B lymphoblastoid cell line (Liber and Thilly, 1982) which is functionally heterozygous for thymidine kinase. The cell line was derived from HH4, a WI-L2 derivative (Skopek et al., 1979), by sequential selection of TK-/- mutants and back selection of TK+/- revertants following treatment with the frameshift mutagen ICR-191 (Liber and Thilly, 1982). TK6 cells are cultured in RPMI 1640 (Cellgro) with L-glutamine and 10% heat-inactivated, iron-supplemented calf serum (HyClone). Stock cultures are routinely maintained in exponential growth at concentrations up to 10^6 cells per ml. Clonal assays are performed using 96-well dishes without feeder layers. The human tk locus has been localized to map position 17q23-25 (Fain et al., 1991). The tk coding sequence consists of 702 bp distributed over 7 exons and 12.9 kb of genomic DNA (Flemington et al., 1987). The numbering systems used here for genomic (Flemington et al., 1987) and cDNA (Bradshaw and Deininger, 1984) tk sequences follow previously defined conventions.

Protocols for selection of TK- mutants have been published (Liber and Thilly, 1982; Grosovsky and Little, 1985). Briefly, selections are performed in 96-well dishes by seeding 4 × 10^4 cells per well in ordinary growth medium supplemented with 2 µg/ml trifluorothymidine (TFT). Colonies were collected following 14 days of growth. Cultures to be used for mutant collection were started from small initial inocula (5 × 10^3 cells) in order to guard against the occurrence of the same pre-existing spontaneous mutant in more than one culture, and only one TK- mutant was recovered from each independent culture. The phenotype of each TK- mutant was confirmed by growth in TFT and sensitivity to CHAT (10 µM deoxycytidine, 200 µM hypoxanthine, 0.1 µM aminopterin and 17.5 µM thymidine). All mutants grew normally in TFT, but some demonstrated leakiness by residual growth in CHAT. Slow-growth TK- mutants (Yandell et al., 1986) were not analyzed in this study.
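Why a small inoculum helps can be made concrete with a Poisson estimate. The Python sketch below assumes that TK- cells are randomly distributed in the stock culture at a fraction f; the value f = 3 × 10^-6 is hypothetical, chosen only for illustration and not measured in this study, and the function name is ours.

```python
import math


def p_preexisting_mutant(cells: int, mutant_fraction: float) -> float:
    """Poisson probability that an inoculum of `cells` already contains
    at least one mutant cell, assuming mutants are randomly distributed
    in the stock culture at frequency `mutant_fraction`."""
    expected = cells * mutant_fraction  # mean number of mutant cells sampled
    return 1.0 - math.exp(-expected)


# Hypothetical spontaneous TK- fraction, for illustration only.
F_HYPOTHETICAL = 3e-6

for inoculum in (5_000, 1_000_000):
    p = p_preexisting_mutant(inoculum, F_HYPOTHETICAL)
    print(f"{inoculum:>9} cells: P(>=1 pre-existing mutant) = {p:.4f}")
```

Under this assumption, a 5 × 10^3-cell inoculum carries a pre-existing mutant about 1.5% of the time, compared with about 95% for an inoculum of 10^6 cells; the estimate ignores mutants that arise during subsequent expansion.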

TABLE 1
tk AMPLIFICATION AND SEQUENCING PRIMERS

Fragment                      Length    Primer a      Primer sequence 5'→3'

(A) Primers used in genomic amplification and sequencing
Exons 1 and 2                 296 bp    tk-g475       ACGGCCTTGGAGAGTACTCGG
                                        tk-g750R      GAGAAGAGGAAGGAGCTCCAC
Exon 3                        420 bp    tk-g2201      GATATTATACCCTGCCTGCTCT
                                        tk-g2601R     GCGCTACCTTGCCAAGCACC
Exon 4                        381 bp    tk-g4729      GAACACTGAGCCTGCTTTGCAG
                                        tk-g5089R     CACTATGACAGGGAAACTAGA
Exons 5, 6, 7                 1016 bp   tk-g11811     GTCAGATCTGGCAGCGTCTTC
                                        tk-g12807R    GCAGCATGCAGGGCAGCGTC

(B) Primers used in cDNA amplification and sequencing
tk cDNA, 1st amplification    807 bp    tk-c23        AGAGTACTCGGGTTCGTGAA
                                        tk-c811R      Same as primer tk-g12807R
tk cDNA, 2nd amplification    698 bp    tk-c61        AGCTGCATTAACCTGCCCAC
                                        tk-c740R      GTTGGCAGGGCTGCATTGCA
Additional primers used in cDNA sequencing
                                        tk-c226       AAAGACACTCGCTACAG
                                        tk-c226R      CTGTAGCGAGTGTCTTT
                                        tk-c518       TGGAGTGCTTCCGGGAA
                                        tk-c789R      GTAGGCGGCAGTGGCAGGAA

a Primers are numbered according to the extreme 5' base on the non-coding strand of the template to which they anneal. Genomic sequences are designated with the prefix g, and cDNA sequences with the prefix c. Reverse primers are indicated by the suffix R.
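Two internal consistency checks on Table 1 can be scripted. In the Python sketch below, the rule "product length = reverse-primer number − forward-primer number + reverse-primer length" is our empirical reading of the table, not a convention stated by the authors. It reproduces all four genomic fragment lengths; for the two cDNA pairs it gives lengths one base longer than those listed (808 and 699 bp), so the cDNA coordinates may follow a different origin. The helper names are hypothetical. The same sketch confirms that the sequencing primers tk-c226 and tk-c226R are exact reverse complements of each other.

```python
# Genomic primer pairs from Table 1:
# fragment -> (forward name, reverse name, reverse sequence, listed length in bp)
GENOMIC_PAIRS = {
    "Exons 1 and 2": ("tk-g475", "tk-g750R", "GAGAAGAGGAAGGAGCTCCAC", 296),
    "Exon 3": ("tk-g2201", "tk-g2601R", "GCGCTACCTTGCCAAGCACC", 420),
    "Exon 4": ("tk-g4729", "tk-g5089R", "CACTATGACAGGGAAACTAGA", 381),
    "Exons 5, 6, 7": ("tk-g11811", "tk-g12807R", "GCAGCATGCAGGGCAGCGTC", 1016),
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_coordinate(name: str) -> int:
    """Extract the numeric coordinate from a primer name, e.g. 'tk-g750R' -> 750."""
    return int("".join(ch for ch in name.split("-", 1)[1] if ch.isdigit()))


def predicted_length(forward: str, reverse: str, reverse_seq: str) -> int:
    """Empirical rule (an assumption, not stated in the paper):
    reverse coordinate - forward coordinate + length of the reverse primer."""
    return primer_coordinate(reverse) - primer_coordinate(forward) + len(reverse_seq)


for fragment, (fwd, rev, rev_seq, listed) in GENOMIC_PAIRS.items():
    print(f"{fragment:14} predicted {predicted_length(fwd, rev, rev_seq):5} bp, listed {listed} bp")

# tk-c226 and tk-c226R should anneal to the same site on opposite strands.
print(reverse_complement("AAAGACACTCGCTACAG") == "CTGTAGCGAGTGTCTTT")
```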


Southern and northern blotting

DNA and RNA preparations were obtained from expanded mutant colonies using standard procedures. A full-length tk cDNA purified as a BamHI-SmaI fragment from plasmid ptk11 (Bradshaw and Deininger, 1984) was used as a hybridization probe in Southern and northern analyses. In Southern blotting, 20 µg of SacI-digested genomic DNA was electrophoresed in a 0.8% agarose-TAE buffer gel and transferred onto Gene Screen Plus membranes (NEN Research Products) by capillary blotting. Hybridizations were carried out overnight at 65°C in 1% SDS, 1 M NaCl, 10% dextran sulfate and 0.5 mg/ml herring sperm DNA, using 25 ng of hybridization probe random-primer labeled with [α-32P]dATP (New England Nuclear). Following stringent washes, autoradiographs were exposed for 2-7 days. For northern analysis, 30 µg of total RNA was electrophoresed on a 1% agarose, 37% formamide-MOPS gel. RNA was transferred to Gene Screen Plus nylon membranes by capillary blotting with 10× SSC according to the manufacturer's directions and hybridized to the tk cDNA probe. After stringent washes, membranes were exposed for 2-3 days. A second, independently isolated RNA sample was analyzed for each mutant to confirm observed expression levels. The steady-state level of tk transcript from the non-functional parental allele was extensively corroborated in the northern analysis, since all LOH mutants carry only this allele: two separate RNA samples from each of 9 independently isolated LOH mutants were analyzed by northern blotting, and all showed comparable levels of tk message. Expression levels for all mutants were subsequently found to be consistent with results from PCR amplification of cDNA (see below).

Reverse transcription and PCR amplification of cDNA

All PCR amplifications were performed using an Ericomp thermocycler and Taq polymerase (Cetus). Oligonucleotide primers were synthesized using a Model 391 DNA synthesizer (Applied Biosystems). All primers are listed in Table 1.

To analyze tk messenger RNA, 2 µg of total cellular RNA and 100 ng of primer tk-c811R were added to a reverse transcription reaction mixture containing 1 mM MgCl2, 50 mM KCl, 10 mM Tris pH 8.3, 200 µM each dNTP (Pharmacia) and 200 units MoMuLV reverse transcriptase (BRL) in a total volume of 20 µl. The reaction was incubated at 42°C for 60 min and then heated to 95°C for 4 min. The resultant cDNA was PCR amplified in the same reaction tube following the addition of 2.5 units of Taq polymerase and 100 ng of forward primer tk-c23. The reverse primer tk-c811R, used for reverse transcription, remains present in sufficient concentration to serve as the reverse primer in the PCR. The final reaction volume was 100 µl, maintaining the MgCl2, KCl and Tris concentrations used in the reverse transcription reaction. The cycling parameters were initial denaturation at 95°C for 2 min; 25 cycles of denaturation (95°C, 30 sec), annealing (57°C, 30 sec) and polymerization (72°C, 1 min); and a final extension at 72°C for 10 min. After completion, 10 µl were removed for analysis on a 1% agarose gel stained with ethidium bromide. In some cases, a second round of amplification was performed using 5 µl of the initial PCR product as template DNA and nested primers tk-c61 and tk-c740R. All other conditions for the second round of PCR were identical to those used for the initial amplification. cDNA products were purified prior to DNA sequencing using the Qiaex Gel Extraction kit (Qiagen) and resuspended in TE buffer.

Genomic PCR amplification

The tk coding regions and surrounding intron sequences were amplified as four PCR fragments. The primers and sizes of the PCR products are listed in Table 1. Exons 1 and 2 are amplified on the same fragment, exons 3 and 4 are each amplified individually, and exons 5, 6 and 7 are all amplified on one fragment. The conditions used for the amplification of each fragment were identical, except for a minor modification in the temperature cycle profile for the exon 4 fragment. These conditions are modeled after those specified by Gibbs et al. (1990) for multiplex amplification of the hprt locus. 250 ng of DNA and 50 pmol each of the appropriate forward and reverse primers were added to a reaction mixture containing 6.7 mM MgCl2, 16.6 mM (NH4)2SO4, 5 mM β-mercaptoethanol (BME), 6.8 µM EDTA, 67 mM Tris-HCl (pH 8.8 at 25°C), 1.5 mM of each dNTP and 2.5 units Taq DNA polymerase, in a final volume of 50 µl. The thermocycle parameters included an initial denaturation step of 94°C for 4.5 min; 25 cycles of denaturation (94°C, 30 sec), annealing (61°C, 50 sec; 57°C, 50 sec for exon 4) and polymerization (68°C, 2 min); and a final extension step at 68°C for 7 min. 10 µl of the reaction product was analyzed on an ethidium bromide-stained 1% agarose gel. Genomic amplification products were purified prior to DNA sequencing using either the Qiaex Gel Extraction kit or the Magic PCR Preps kit (Promega).
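The cDNA and genomic thermocycling programs can be written down as data, which makes the single difference for the exon 4 fragment (annealing at 57°C instead of 61°C) explicit and lets the total programmed hold time be computed. This is a minimal Python sketch; the `Step` and `Program` classes are hypothetical and not part of any thermocycler software, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float  # block temperature, degrees C
    seconds: int   # hold time at that temperature


@dataclass(frozen=True)
class Program:
    initial: Step
    cycle: Tuple[Step, ...]  # steps repeated in every cycle
    n_cycles: int
    final: Step

    def hold_seconds(self) -> int:
        """Total programmed hold time; ramping between temperatures is ignored."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.n_cycles * per_cycle + self.final.seconds


# cDNA amplification after reverse transcription (tk-c23 / tk-c811R).
CDNA_PCR = Program(
    initial=Step(95, 120),
    cycle=(Step(95, 30), Step(57, 30), Step(72, 60)),
    n_cycles=25,
    final=Step(72, 600),
)


def genomic_pcr(annealing_c: float = 61) -> Program:
    """Genomic fragment program; the exon 4 fragment uses 57 degrees C annealing."""
    return Program(
        initial=Step(94, 270),
        cycle=(Step(94, 30), Step(annealing_c, 50), Step(68, 120)),
        n_cycles=25,
        final=Step(68, 420),
    )


EXON4_PCR = genomic_pcr(annealing_c=57)

for name, prog in [("cDNA", CDNA_PCR), ("genomic", genomic_pcr()), ("exon 4", EXON4_PCR)]:
    print(f"{name:8} {prog.hold_seconds() / 60:.1f} min of programmed holds")
```

Under these assumptions the cDNA program amounts to 62 min of holds and each genomic program to about 95 min.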


DNA sequencing

Purified double-stranded PCR fragments were sequenced directly using 32P-dATP end-labeled primers (Table 1) and thermocycle sequencing with the fmol DNA Sequencing System (Promega). Approximately 50 ng of template DNA was used in each sequencing reaction. Cycle parameters included initial denaturation at 95°C for 2 min, followed by 30 two-step cycles of 95°C for 30 sec and 62-70°C for 30 sec.

Results

Initial characterization of TK- mutants

The initial step in the analysis of TK- mutants was to identify those attributable to a structural rearrangement or loss of the target allele. Fig. 1 shows a Southern analysis of SacI-digested genomic DNA from a representative set of spontaneous TK- mutants. The bands at 14.8 and 8.4 kb are a polymorphic pair resulting from a SacI RFLP at the tk locus (Yandell et al., 1986; Fig. 4). LOH mutants contain only the 8.4-kb fragment corresponding to the non-functional parental allele (Fig. 1). Mutants attributable to a structural rearrangement of the tk locus, such as a deletion, are evident by the appearance of an altered band (Fig. 1). Overall, LOH accounted for 58% (21/36) of the TK- mutants (Table 2). Another 11% (4/36) were found to contain a large intragenic deletion (Table 2) observable by Southern blotting or genomic DNA-sequence analysis (see below).

Fig. 1. Representative Southern analysis of spontaneous TK- mutants. Genomic DNA from each mutant was digested with SacI. A cloned human tk cDNA was used as a hybridization probe. A SacI RFLP at the tk locus results in 2 bands at 14.8 and 8.4 kb, corresponding to the functional and non-functional alleles, respectively. Numbers at the top of each lane designate individual TK- mutants (the TS prefix used in mutant nomenclature elsewhere in this report is omitted from lane headings). Loss of the 14.8-kb band without coordinate appearance of a new band is interpreted as a loss of heterozygosity event (TS86, TS91, TS42, TS47, TS88). An artifactual autoradiographic exposure seen in lane TS47 does not reflect specific hybridization to the tk probe. Novel bands seen in TS44 and TS45 reflect structural rearrangements of the functional tk allele.

Gene expression and PCR amplification of tk cDNA

Although the most convenient way to analyze point mutations is by amplification of the tk cDNA, gene expression from the alternate, non-functional tk allele in TK6 cells could potentially complicate sequence analyses. To evaluate this problem, we conducted a northern analysis of TK- mutants prior to cDNA amplification and sequencing. Mutants that have undergone loss of heterozygosity, and therefore contain only the non-functional allele from TK6, do not produce a level of transcript easily observable by northern analysis (Fig. 2). Gene expression from the newly mutated alleles was also nearly undetectable in some TK- point mutants (Fig. 2; TS48) and deletion mutants (Fig. 2; TS43, TS44, TS45).

Fig. 2. Representative northern analysis of spontaneous TK- mutants. A second, independently isolated RNA sample was analyzed for each mutant to confirm observed expression levels (data not shown). Total cellular RNA was hybridized with a cloned human tk cDNA probe. Expression in the parental TK6 cells is shown in the far right and far left lanes, and individual mutants are indicated by numbers at the top of each lane (the TS prefix used in mutant nomenclature elsewhere in this report is omitted from lane headings). TS43, TS44 and TS45 are deletion mutants. TS40, TS46, TS48 and TS49 are point mutants. LOH mutants shown include TS42, TS47, TS81, TS82, TS83, TS85, TS86, TS88 and TS91. Since each independent LOH mutant contains only the non-functional parental allele, the steady-state level of transcript from this allele is extensively confirmed. Full characterization of TS63, TS68 and TS77 will be reported elsewhere.

Fig. 3. PCR amplification of cDNA. Total cellular RNA was reverse transcribed with a tk-specific primer, and the first-strand cDNA was then PCR amplified. Results from a single round of amplification are shown in the left panel. An 807-bp cDNA product was obtained only for TK6 and for TS40, a mutant which retained high levels of expression in northern analysis. Reamplification with nested primers yielded a 698-bp cDNA fragment for all of the strains, including TS48, a low-expression TK- point mutant, and TS42, an LOH mutant.
[Figure panels: First Amplification and Second Amplification; lanes TS40, TS48, TS42, TK6.]

TABLE 2
SEQUENCE ANALYSIS OF SPONTANEOUS TK- POINT MUTANTS

Mutant   Position a    Alteration   AA or other change   Target sequence b

(A) Transitions
TS2      E7:607        G:C>A:T      Val>Met              ACTCC G TGT?T
TS9      E6:479        G:C>A:T      Pro>Leu              ?GTGC C ?C??C
TS5      E1:112        G:C>A:T      Gly>Arg              CCCGG G GGCAG
TS40     E6:487        G:C>A:T      Glu>Lys              TGGCC G AGAGC
TS46     E5:445        A:T>G:C      Arg>Gly              TCCAG A GGAAG
TS48     E4:313        G:C>A:T      Gln>Stop             TGGCC C AGGAG
TS49     I4:11900      G:C>A:T      Splice               ttcca g TTCCC
TS72     I3:2475       G:C>A:T      Splice               ACCGgtca g tccct

(B) Transversions
TS8      E4:334        G:C>T:A      Val>Phe              TGGCT G TCATA
TS12     E5:415        A:T>T:A      Ile>Phe              CCGTA A TTGTG

(C) Small deletion
TS18     E4:329-335    -7 bp                             GGGCG TGGCTGT CATAG

(D) Large deletions
TS15     Deletion on Southern; exon 4 missing in genomic sequencing
TS43     Exon 4 missing in genomic sequencing
TS44     Deletion on Southern; exon 4 missing in genomic sequencing
TS45     Deletion on Southern; exon 4 missing in genomic sequencing

(E) Loss of heterozygosity
TS1, TS3, TS4, TS7, TS10, TS11, TS13, TS14, TS16, TS17, TS19, TS20, TS42, TS47, TS81, TS82, TS83, TS85, TS86, TS88, TS91

a Mutated position is identified by the exon (E1-E7) or intron (I1-I6) in which it occurs. Mutations occurring in exons are numbered according to their cDNA position; mutations occurring in introns are numbered according to the genomic sequence position.
b Affected base or bases are indicated by spacing; bases that could not be read are shown as ?. Exon sequences are shown in upper case, and intron sequences in lower case.
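The class percentages quoted in the text follow directly from the mutant counts in Table 2. A minimal Python sketch that tallies them (the dictionary layout and the function name are illustrative, not from the paper):

```python
# Mutant classes transcribed from Table 2 (36 independent spontaneous TK- mutants).
TABLE2 = {
    "G:C>A:T transition": ["TS2", "TS9", "TS5", "TS40", "TS48", "TS49", "TS72"],
    "A:T>G:C transition": ["TS46"],
    "G:C>T:A transversion": ["TS8"],
    "A:T>T:A transversion": ["TS12"],
    "small deletion": ["TS18"],
    "large deletion": ["TS15", "TS43", "TS44", "TS45"],
    "LOH": ["TS1", "TS3", "TS4", "TS7", "TS10", "TS11", "TS13",
            "TS14", "TS16", "TS17", "TS19", "TS20", "TS42", "TS47",
            "TS81", "TS82", "TS83", "TS85", "TS86", "TS88", "TS91"],
}


def summarize(table):
    """Return the summary statistics quoted in the text."""
    ids = [m for mutants in table.values() for m in mutants]
    if len(ids) != len(set(ids)):
        raise ValueError("each mutant must be listed exactly once")
    counts = {cls: len(mutants) for cls, mutants in table.items()}
    total = len(ids)
    loh, large = counts["LOH"], counts["large deletion"]
    # Base substitutions are the classes written as "X:Y>Z:W".
    substitutions = sum(n for cls, n in counts.items() if ">" in cls)
    retained = total - loh  # mutants that retained heterozygosity
    return {
        "total": total,
        "LOH %": 100 * loh / total,
        "large deletion %": 100 * large / total,
        "point mutants": total - loh - large,
        "substitutions among retained": (substitutions, retained),
        "G:C>A:T among substitutions": (counts["G:C>A:T transition"], substitutions),
    }


for key, value in summarize(TABLE2).items():
    print(key, round(value, 1) if isinstance(value, float) else value)
```

The tally reproduces 58% (21/36) LOH, 11% (4/36) large deletions, 11 point mutants, base substitutions in 10 of the 15 mutants that retained heterozygosity, and G:C to A:T transitions in 7 of those 10.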

Fig. 4. Structure of the tk locus and position of frameshift mutations occurring in the coding sequence of each allele. The 7 exons are shown as filled boxes. The non-functional allele is at the top and the functional allele in the center of the figure. A polymorphic SacI recognition site is shown at approximately 18 kb. Sequences shown are altered regions in the tk locus of TK6. Single-base insertion frameshifts occurred in runs of G:C base pairs (shown in boldface type) in exon 4 of the non-functional allele and exon 7 of the allele whose product retains TK enzymatic function. The wild-type sequence is shown, in each case, in the opposite allele. Numbering of the frameshift sequences refers to genomic positions.
[Figure: scale -3 to 24 kb. Non-functional allele: exon 4 (4864) GCTGCCCCGCCT; exon 7 (12690) GCCAGGGGAAGC. Functional allele: exon 4 (4864) GCTGCCCGCCT; exon 7 (12690) GCCAGGGGGAAGC.]

A single round of PCR amplification of the tk cDNA yielded a specific product for those TK- mutants which demonstrated tk expression levels comparable to TK6 by northern analysis (Fig. 3, TS40). Using a second round of amplification with nested primers, tk cDNA could be obtained for low-expression point mutants (Fig. 3, TS48) and LOH mutants (Fig. 3, TS42). Since the amplified cDNA from LOH mutants could only represent the product of the non-functional parental allele, cDNA-sequence analysis could be used to unambiguously identify the inactivating mutation in that allele. Furthermore, owing to differential gene expression, the tk cDNA amplified with one round of PCR from TK6 cells essentially represents only the product of the functional allele.
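The screening logic described so far (Southern blot first, then expression level) can be summarized as a small decision function. This is a simplified Python sketch under our own assumptions: the argument names and returned labels are hypothetical, and in practice genomic sequencing was also applied to cDNA-characterized mutants where needed, for example to locate splice-site mutations.

```python
def choose_strategy(has_functional_band: bool,
                    has_novel_band: bool,
                    expression_near_tk6: bool) -> str:
    """Map screening results for one TK- mutant to an analysis route.

    has_functional_band: the 14.8-kb SacI band of the functional allele is present.
    has_novel_band:      the Southern blot shows an altered (rearranged) band.
    expression_near_tk6: northern signal comparable to parental TK6.
    """
    if has_novel_band:
        # Structural rearrangement: delimit it by genomic PCR/sequencing of exon 4.
        return "rearrangement"
    if not has_functional_band:
        # Loss of the 14.8-kb band without a new band: loss of heterozygosity.
        return "LOH"
    if expression_near_tk6:
        # One round of RT-PCR amplifies essentially only the mutated functional allele.
        return "cDNA"
    # Low expression: the non-functional allele can dominate nested RT-PCR,
    # so sequence the genomic exon fragments directly instead.
    return "genomic"
```

In this sketch a mutant such as TS40, whose expression is comparable to TK6, is routed to cDNA sequencing, whereas a low-expression mutant such as TS48 is routed to genomic sequencing.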

TABLE 3
AMINO ACID AND DNA SEQUENCE POLYMORPHISMS IN THE tk CODING SEQUENCE a,b

cDNA position   Reported sequence c   TK6 sequence
90              CTG CCC GGC           CTG CCT GGC
                Leu Pro Gly           Leu Pro Gly
282             GAG GCG CTG           GAG GCA CTG
                Glu Ala Leu           Glu Ala Leu
373             ATC ATG GAG           ATC GTG GAG
                Ile Met Glu           Ile Val Glu

a Bases and amino acids shown in boldface vary between the 2 sequences.
b Sequence alterations included here occur in both tk alleles in TK6. Frameshift mutations in individual TK6 tk alleles are depicted in Fig. 4.
c As reported by Bradshaw and Deininger (1984).
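Whether each Table 3 difference is silent or changes the encoded amino acid can be checked by translating the triplets with the standard genetic code. A minimal Python sketch (the compact codon-table idiom is standard; the function names are ours):

```python
# Standard genetic code, codons ordered T, C, A, G at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(codons: str) -> str:
    """Translate space-separated, in-frame codons to one-letter amino acids."""
    return "".join(CODON_TABLE[c] for c in codons.split())


def compare(reported: str, tk6: str):
    """Return both translations and a list of (codon index, old aa, new aa) changes."""
    ref, alt = translate(reported), translate(tk6)
    changes = [(n, r, a) for n, (r, a) in enumerate(zip(ref, alt)) if r != a]
    return ref, alt, changes


# Triplets from Table 3: reported sequence (Bradshaw and Deininger, 1984) versus TK6.
TABLE3 = {
    90: ("CTG CCC GGC", "CTG CCT GGC"),
    282: ("GAG GCG CTG", "GAG GCA CTG"),
    373: ("ATC ATG GAG", "ATC GTG GAG"),
}

for position, (reported, tk6) in TABLE3.items():
    ref, alt, changes = compare(reported, tk6)
    print(position, ref, alt, "silent" if not changes else f"missense {changes}")
```

Only the variant at 373 alters the protein (Met to Val); the differences at 90 and 282 are synonymous.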

Sequence analysis of parental tk alleles

The sequence of the entire tk cDNA from representative LOH mutants was determined by direct sequencing of the amplified fragments, and amplified cDNA from TK6 cells was analyzed in parallel. A single-base insertion frameshift was observed in exon 4 of the non-functional allele amplified from the LOH mutants (Fig. 4). This exon 4 frameshift mutation was not detectable in the tk cDNA amplified from TK6, confirming that the product of the non-functional allele represents an insignificant fraction of the total yield from the first round of PCR. Potential complications for DNA sequence analysis created by the presence of the alternate allele are thereby avoided, so the functional parental allele, and mutant derivatives that retain a significant level of gene expression, can be directly analyzed with this approach. Surprisingly, the TK6 cDNA contained a single-base insertion frameshift in exon 7 which was not present in the product of the non-functional allele (Fig. 4). This frameshift results in a predicted alteration of the final 21 amino acids at the C-terminus of the TK protein, and the inclusion of 131 additional amino acids before the first in-frame stop codon is reached. Since the frameshift is not present in tk cDNA from LOH mutants, we can be certain that it is carried on the functional tk allele in TK6. Furthermore, since TK6 is phenotypically TK+, we can infer that the frameshift in exon 7 does not eliminate the enzymatic activity of the gene product. The localization of the exon 7 frameshift to the functional tk allele in TK6 was further confirmed by genomic DNA-sequence analysis. Three other differences from the previously reported human tk coding sequence (Bradshaw and Deininger, 1984) were found in both alleles (Table 3). These may reflect polymorphisms existing in human populations. One of these sites (373) predicts a conservative amino acid substitution (Met to Val) in the TK protein (Table 3).
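Both parental frameshifts are single-base insertions in homopolymer runs. Because an insertion inside a run could be placed at any position of that run, a common convention is to report it at the leftmost position. The Python sketch below locates the insertion under that convention, using the sequence fragments shown in Fig. 4 (the wild-type version of each taken from the opposite allele); the function name and the leftmost-normalization choice are ours.

```python
def find_single_insertion(wild_type: str, mutant: str):
    """Locate a one-base insertion in `mutant` relative to `wild_type`.

    Returns (leftmost 0-based insertion index, inserted base,
    run length before, run length after), or None if `mutant` is not
    `wild_type` with exactly one extra base.
    """
    if len(mutant) != len(wild_type) + 1:
        return None
    # First position at which the two sequences disagree.
    i = next((n for n, (w, m) in enumerate(zip(wild_type, mutant)) if w != m),
             len(wild_type))
    base = mutant[i]
    # Slide left through the run of the same base (leftmost normalization).
    j = i
    while j > 0 and wild_type[j - 1] == base:
        j -= 1
    if wild_type[:j] + base + wild_type[j:] != mutant:
        return None
    run_before = i - j
    return j, base, run_before, run_before + 1


# Fragments from Fig. 4: (wild-type fragment, fragment carrying the insertion).
FIG4 = {
    "exon 4 (non-functional allele)": ("GCTGCCCGCCT", "GCTGCCCCGCCT"),
    "exon 7 (functional allele)": ("GCCAGGGGAAGC", "GCCAGGGGGAAGC"),
}

for label, (wt, mut) in FIG4.items():
    pos, base, before, after = find_single_insertion(wt, mut)
    shifts_frame = (len(mut) - len(wt)) % 3 != 0
    print(f"{label}: +{base} at index {pos}, {base}-run {before}->{after}, "
          f"frameshift={shifts_frame}")
```

In both alleles the extra base lengthens a run of C or G residues by one, and a net change of one base necessarily shifts the downstream reading frame.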

Genomic PCR amplification and DNA-sequence analysis

Direct sequencing of the amplified cDNA was difficult for mutants in which two rounds of PCR amplification were required to recover a tk cDNA product. DNA sequencing suggested that the predominant PCR product from these mutants was often the cDNA from the parental non-functional allele. Analysis of newly mutated alleles is therefore obscured if their expression is reduced to a level lower than that of the alternate allele. To circumvent this problem, an amplification and sequencing strategy was developed for the analysis of tk genomic sequences. The 7 exons were amplified from total genomic DNA in 4 fragments (Fig. 5). The length of these fragments, the primers used for their amplification, and the exons they contain are listed in Table 1. Although these genomic PCR fragments include sequences from both alleles in mutants that retain heterozygosity, direct DNA sequencing can identify the site of a new base-substitution mutation as a double-banded position on a sequencing ladder. Other point mutations, such as frameshifts and small deletions, can also be characterized using this method by considering the overlapping sequencing ladders from the two alleles individually.
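Direct sequencing of a heterozygous PCR product superimposes two ladders. The Python sketch below models this by combining two allele sequences position by position into IUPAC ambiguity codes: a base substitution gives a single double-banded position, whereas a single-base insertion throws the two ladders out of register from the insertion onward. The examples use the TS8 target sequence from Table 2 and the exon 4 fragments from Fig. 4; the function names are hypothetical, and band intensities, which differ in real ladders, are ignored.

```python
# IUPAC codes for two different bases read at the same ladder position.
IUPAC_PAIR = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def overlay(allele_1: str, allele_2: str) -> str:
    """Combine two sequences, aligned at their first base, into one IUPAC
    string over their common length; mixed codes stand for double bands."""
    return "".join(a if a == b else IUPAC_PAIR[frozenset((a, b))]
                   for a, b in zip(allele_1, allele_2))


def double_band_positions(allele_1: str, allele_2: str):
    """0-based indices at which the two ladders disagree."""
    return [n for n, (a, b) in enumerate(zip(allele_1, allele_2)) if a != b]


# Base substitution: TS8, G:C>T:A at the centre of its Table 2 target sequence.
print(overlay("TGGCTGTCATA", "TGGCTTTCATA"))
# Single-base insertion: exon 4 fragments of the two parental alleles (Fig. 4).
print(overlay("GCTGCCCGCCT", "GCTGCCCCGCCT"))
print(double_band_positions("GCTGCCCGCCT", "GCTGCCCCGCCT"))
```

Downstream of an insertion some positions still agree by chance, so the mixed signal is not uniform; this is why frameshifts and small deletions are read by following each of the two overlapping ladders individually.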

Characterization of deletions and point mutations in TK- mutants

Three TK- mutants were identified as intragenic deletions by Southern blotting (TS44 and TS45, Fig. 1; TS15, data not shown). To better define the extent of the deletions, these mutants were further analyzed by genomic amplification and sequencing of exon 4. The frameshift mutation carried in exon 4 of the non-functional allele results in a double-banding pattern on a sequencing ladder if exon 4 of the target allele is also present in the PCR-amplified genomic product. However, a double-banding pattern was not observed in TS15, TS44 and TS45, indicating that the deletion in each case encompasses a primer site for the exon 4 fragment. No alteration was observed for TS43 by Southern analysis (Fig. 1), but genomic sequencing revealed the loss of exon 4. Since exon 4 lies within a 6-kb SacI fragment corresponding to no other portion of the tk cDNA hybridization probe (Fig. 4), both observations could be explained by an internal deletion of the fragment. A small deletion, below the limits of resolution for Southern analysis, could also explain the data if it included a primer binding site for the exon 4 PCR fragment.

The 11 TK- point mutants were characterized by cDNA sequencing and additional genomic sequence analysis as required (Table 2). A small deletion of 7 base pairs from exon 4 was identified in TS18, the only deletion found in this size class. Base substitutions were the most common type of alteration observed among the mutants retaining heterozygosity (10/15). G:C to A:T transitions were most prominent, accounting for 7 of the 10 base substitutions.

In two mutants, TS49 and TS72, cDNA sequence alterations suggested that aberrant splicing of the tk transcript had occurred (Fig. 6), since each sequence missing from the cDNA began or ended at a normal exon boundary. Eighteen bases were missing from the 5' end of exon 5 in the cDNA of TS49, and 48 bases were missing from the 3' end of exon 3 in the cDNA of TS72. The genomic regions likely to be affected were sequenced, and both mutants were found to contain intronic G:C to A:T transitions affecting either a splice acceptor or a splice donor consensus sequence (Table 2, Fig. 6). Furthermore, the specific alteration to each cDNA is apparently attributable to the use of cryptic splice sites which precisely demarcate the missing exon sequences (Fig. 6). The last 2 bases of the sequence missing in the cDNA of TS49 comprise a cryptic AG dinucleotide splice acceptor. In TS72, the sequence missing from the cDNA begins with a GT dinucleotide which has apparently been used as a cryptic splice donor site. Additionally, several bases surrounding the GT dinucleotide closely resemble a splice donor consensus sequence.

Fig. 5. Genomic PCR amplification of tk exons and surrounding sequences. Exons included in each amplified fragment are indicated at the top of each lane.
[Figure: lanes 1-2, 3, 4 and 5-7; fragment sizes 1016, 420, 381 and 296 bp.]

Fig. 6. Mutations causing aberrant splicing of the tk transcript. Exon sequences are shown in upper case and intron sequences in lower case. The mutated position in the wild-type splice site is shown in boldface with the substituted base as indicated. The exon sequences missing from the cDNA are underlined. The cryptic splice sites used in the mutants are shown in italics. Numbering of the sequences refers to genomic positions.
[Figure: TS49, intron 4/exon 5 junction, substituted base a: ccctttccagTTCCCTGACATCATGGAGTTCTG (position 11918 marked). TS72, exon 3/intron 3 junction, substituted base a: ATCAAGTATGCCAAAGACA...GACCGgtcagtccctg (positions 2423 and 2466 marked).]

Discussion

In considering the construction of a mutational spectrum at a heterozygous locus, two problems must be faced that are not encountered in studies of hemizygous markers. First, mutants that have undergone a loss or conversion event must be efficiently identified. Second, among mutants that retain heterozygosity, the newly mutated allele must be unambiguously analyzed despite the potentially confounding presence of the alternate, non-functional parental allele. In this investigation, we have developed methodology for determining DNA-sequence specificity at the functionally heterozygous tk locus in TK6 human lymphoblasts. Although this locus has been widely used for the investigation of mutagenesis, previous studies have focused only on the relative predominance of chromosomal and intragenic mutations.

Identification of LOH mutants was accomplished through use of a previously reported SacI RFLP (Yandell et al., 1986) closely linked to the tk gene (Fig. 4). LOH mutants were not further analyzed in this investigation, since direct characterization of the tk locus could provide no additional information. However, a series of chromosome 17-specific RFLP probes, previously reported to be informative in TK6 cells (Li et al., 1992; Amundson and Liber, 1992), can be used to determine the extent of allele loss in individual mutants and to distinguish non-disjunction from recombination or interstitial deletion events.

The approach employed for characterizing DNA-sequence alterations was determined by the level of tk gene expression retained in individual TK− point mutants. Steady-state levels of the tk transcript from the two parental alleles differ substantially. This is a fortuitous circumstance, since cDNA sequence analysis of newly mutated alleles whose expression remains comparable to original levels is not complicated by the transcript from the poorly expressed, non-functional parental allele. However, direct cDNA sequence analysis of low-expression mutants may be obscured by the non-functional allele product. Amplification and sequencing of tk genomic sequences was most effective for this group. Genomic amplification and sequencing was also used to distinguish and characterize deletion and splicing mutations.

Functional heterozygosity at the tk locus in TK6 cells was derived by sequential treatments of the predecessor cell line HH4 with the frameshift mutagen ICR-191, selection of a TK−/− homozygote, and back-selection of a TK+/− heterozygote following additional treatment with ICR-191 (Liber and Thilly, 1982). It was therefore not surprising that a frameshift mutation was identified in exon 4 of the non-functional tk allele (Fig. 4). Functional inactivation of the exon 4 frameshift allele is demonstrated by the TK− phenotype of LOH mutants, which contain only that allele. The frameshift is presumed to be the inactivating mutation, since the other differences from the previously published human tk sequence found in this allele were also observed in the functional allele (Table 3). The observation of a frameshift in exon 7 of the functional tk allele was, in contrast, completely unanticipated. The exon 7 frameshift is predicted to alter the final 21 amino acids of the TK protein and to extend its C-terminus by 131 additional amino acids. Nevertheless, TK enzymatic activity appears to be substantially retained in the altered TK protein encoded by the exon 7 frameshift allele. The TK+ phenotype of TK6 cells is clearly demonstrated by sensitivity to TFT and resistance to CHAT. Measured levels of TK enzymatic activity have been reported to be approximately 30% of those in the functionally homozygous parental cell line HH4, only slightly less than the roughly 50% expected for a functional heterozygote (Liber and Thilly, 1982).
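How a single-base frameshift can change only the final residues of a protein while also lengthening it can be illustrated with a toy example. The Python sketch below assumes a short hypothetical open reading frame (not the human tk sequence) and a one-base insertion near its 3' end; the helper names `translate` and `compare_c_termini` are illustrative only. Because the shifted frame no longer encounters the original stop codon, translation continues into 3'-flanking sequence until a stop codon appears in the new frame.

```python
"""Toy illustration of a frameshift that reads through the original stop codon.

The nucleotide sequences are hypothetical and are NOT human tk sequences.
"""

BASES = "TCAG"
# Standard genetic code, indexed by first, second and third base in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str) -> tuple[str, bool]:
    """Translate from the first base until a stop codon; report whether one was found."""
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            return "".join(protein), True
        protein.append(aa)
    return "".join(protein), False  # ran off the end without a stop codon


def compare_c_termini(original: str, mutant: str) -> dict:
    """Count shared N-terminal residues, altered original residues and extra length."""
    shared = 0
    while shared < min(len(original), len(mutant)) and original[shared] == mutant[shared]:
        shared += 1
    return {
        "shared": shared,
        "altered": len(original) - shared,            # original residues replaced
        "extension": len(mutant) - len(original),     # negative if the mutant is shorter
    }


# Hypothetical ORF (ATG ... TAA) followed by 3'-flanking sequence.
original_dna = "ATGGCAGCAAAAGATTAA" + "CCCCCTAG"
# Insert one C after codon 3: a frameshift near the 3' end of the ORF.
mutant_dna = original_dna[:9] + "C" + original_dna[9:]

orig_protein, _ = translate(original_dna)
mut_protein, mut_stopped = translate(mutant_dna)
print(orig_protein, mut_protein, mut_stopped)    # MAAKD MAAQRLTP True
print(compare_c_termini(orig_protein, mut_protein))
# {'shared': 3, 'altered': 2, 'extension': 3}
```

With these toy sequences the mutant protein shares its first 3 residues with the original, replaces the final 2, and is 3 residues longer. The TK6 exon 7 frameshift is predicted to follow the same pattern on a larger scale (21 altered residues and a 131-residue extension).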
The persistence of TK activity in TK6 cells suggests that, although the effect of the exon 7 frameshift is not entirely negligible, the C-terminus of the TK protein may be essentially dispensable for function. Furthermore, the altered polypeptide segment evidently does not sterically interfere with critical functional domains. An inactivating single-base insertion frameshift has previously been reported (Benjamin et al., 1991) at position 12519-12523 near the beginning of tk exon 7. This mutation presumably inactivates TK enzymatic function by destroying essential coding sequences upstream of those affected in the TK6 exon 7 frameshift allele; both mutations predict an identical 131-amino-acid C-terminal extension of the TK protein. Because sequences downstream of the TK6 exon 7 frameshift appear to be functionally dispensable, we do not anticipate that inactivating mutations will be observed in that region. Further analysis of this exon 7 frameshift allele in TK6 may lead to a better structural and biochemical understanding of the human TK protein, and perhaps of other related salvage pathway enzymes.

To establish the methodology described here for mutational spectra at the tk locus, we analyzed 36 spontaneous TK− mutants. The majority of these mutants (21/36, 58%) were attributable to LOH events (Table 2). These results are comparable to those of previous investigations; the prevalence of LOH at the tk locus in TK6 and related cell lines has been reported to range from 51% to 70% among normal-growth spontaneous mutants (Yandell et al., 1990; Li et al., 1992; Amundson and Liber, 1992). The high frequency of allele loss is a logistical disadvantage for assembling a large collection of intragenic DNA-sequence alterations; only 11/36 (31%) spontaneous TK− mutants arose from intragenic point mutations (Table 2). In contrast, the recovery of point mutations is favored at hemizygous marker loci, but the observation of many chromosomal-scale events is precluded. These relative advantages should be weighed for specific applications, although a comprehensive understanding of mutagenesis requires taking the diversity of the genome into account. Since only 11 intragenic point mutations were recovered, specific conclusions regarding DNA-sequence specificity must be correspondingly cautious. Single-base substitutions were dominated by G:C to A:T transitions (Table 2).
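The class percentages above follow directly from the counts among the 36 mutants. The minimal Python sketch below recomputes them and, as an illustration that is not reported in this study, attaches an approximate 95% Wilson score interval to the LOH fraction to show how wide the uncertainty on a proportion remains with samples of this size. The dictionary labels and function names are hypothetical.

```python
from math import sqrt

# Counts stated in the text for 36 spontaneous TK- mutants (Table 2).
SPECTRUM = {
    "LOH": 21,
    "large intragenic deletion": 4,
    "intragenic point mutation": 11,
}


def spectrum_percentages(counts: dict) -> dict:
    """Percentage of all mutants in each class, rounded to whole numbers."""
    total = sum(counts.values())
    return {cls: round(100 * n / total) for cls, n in counts.items()}


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Approximate 95% Wilson score interval for a binomial proportion (illustrative)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


print(spectrum_percentages(SPECTRUM))
# {'LOH': 58, 'large intragenic deletion': 11, 'intragenic point mutation': 31}
low, high = wilson_interval(SPECTRUM["LOH"], sum(SPECTRUM.values()))
print(f"LOH fraction, approximate 95% interval: {low:.2f}-{high:.2f}")
```

With 21 of 36 mutants showing LOH, this interval spans roughly 0.42 to 0.73, which overlaps the 51% to 70% range reported previously; intervals for sub-classes of the 11 point mutants would be wider still.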
It is noteworthy that 3 of the 7 G:C to A:T transitions (Table 2; TS2, TS9 and TS40) affected a cytosine within a CpG dinucleotide. This pattern suggests that these mutations may have resulted from deamination of 5-methylcytosine, which in mammalian cells occurs primarily at CpG dinucleotides (Bird, 1986). The local sequence context of the affected cytosines in the spontaneous transitions observed here can furthermore be distinguished from that of the G:C to A:T transitions that characterize UV-light-induced mutational spectra. The UV-induced G:C to A:T transitions involve cytosines located directly 3' to a neighboring pyrimidine within a polypyrimidine tract (Drobetsky et al., 1987). In contrast, approximately half (4/7) of the affected cytosines in spontaneous TK− mutants are either in the 5' position or flanked by purines (Table 2).

Two transversions and one small deletion were also observed among the point mutants. Each single-base substitution occurred at a separate position within the tk coding sequence or within a splice junction consensus sequence. One transversion mutant, TS8, did fall within the 7-bp region that is deleted in TS18, although the significance of this coincidence is not clear. An additional 4 mutants were classified as large intragenic deletions. Although these deletions are clearly not identical in size (Fig. 1), each affects exon 4, and they may therefore share a closely related breakpoint at one end.

Missing exon sequences observed in cDNA analysis of TS49 and TS72 were attributable to G:C to A:T transitions affecting a splice acceptor or a splice donor consensus sequence, respectively (Fig. 6). Both mutations result in the emergence of a cryptic splice site within neighboring exon sequences (Fig. 6). Recent studies (Mullen et al., 1991; Smith et al., 1989) suggest that the first AG dinucleotide downstream of the branchpoint and polypyrimidine tract elements is used as the splice acceptor site. In TS49 the canonical AG dinucleotide in the splice acceptor sequence is specifically altered, and the first AG dinucleotide further downstream is used as the new splice junction. Mutations in splice donor sequences usually result in exon skipping or in the use of a cryptic splice site close to, and frequently upstream of, the mutated position (Robberson et al., 1990; Talerico and Berget, 1990).
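The first-AG scanning model cited above, and its consequence for a TS49-type acceptor mutation, can be sketched as follows. This is a deliberately simplified Python illustration under stated assumptions: the pre-mRNA fragment is hypothetical (not tk sequence), scanning is assumed to begin at a known branch-point position, and real acceptor choice also depends on factors this sketch ignores. The function name `acceptor_exon_start` is illustrative.

```python
from typing import Optional


def acceptor_exon_start(pre_mrna: str, scan_from: int) -> Optional[int]:
    """Index of the first exon base under a simplified first-AG scanning model.

    Scans from `scan_from` (assumed branch-point position) for the first AG;
    the exon is taken to begin immediately after that AG. Returns None if no AG.
    """
    ag = pre_mrna.find("AG", scan_from)
    return None if ag == -1 else ag + 2


# Hypothetical 3' end of an intron followed by the start of an exon.
BRANCH_REGION = "TACTAAC"            # assumed branch-point region (no AG inside)
PPT = "TTTCCTTTTCCCTTTC"             # assumed polypyrimidine tract
EXON = "CTCCGAAGCTTCC"               # contains a downstream AG at exon bases 6-7
wild_type = BRANCH_REGION + PPT + "AG" + EXON

# G -> A at the canonical AG (a G:C to A:T transition on this strand).
canonical_g = len(BRANCH_REGION) + len(PPT) + 1
mutant = wild_type[:canonical_g] + "A" + wild_type[canonical_g + 1:]

wt_start = acceptor_exon_start(wild_type, 0)
mut_start = acceptor_exon_start(mutant, 0)
print(wt_start, mut_start, wild_type[wt_start:mut_start])   # 25 33 CTCCGAAG
```

In this toy fragment the canonical AG becomes AA, so scanning continues to the next AG inside the exon and the first 8 exon bases are omitted from the spliced product, analogous to the missing exon sequence seen in cDNA from TS49.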
The mutation in TS72 conforms closely to the behavior expected of splice donor mutations, although the canonical GT dinucleotide within the intron 3 splice donor sequence is not directly affected by the mutation.

In summary, this investigation has established a comprehensive methodology for the analysis of mutational specificity at the functionally heterozygous tk locus in TK6 human lymphoblasts. Characterization of the parental locus demonstrated differential gene expression of the 2 tk alleles, a strategically advantageous situation. DNA-sequence analysis identified an unanticipated but functionally tolerated exon 7 frameshift mutation. The mutational spectrum methodology was used to examine 36 spontaneous TK− mutants. These were predominantly LOH mutants, although G:C to A:T transitions were the most common among intragenic mutations. Use of this system in future studies can provide additional target-locus diversity for the definition of mutational spectra in human cells.

Acknowledgements

We thank Leslie Hasegawa for technical assistance with the Southern analyses. This work was supported by grant R01 CA55659 from the National Institutes of Health.

References

Amundson, S.A., and H.L. Liber (1992) A comparison of induced mutation at homologous alleles of the tk locus in human cells, II. Molecular analysis of mutants, Mutation Res., 267, 89-95.

Benjamin, M.B., H. Potter, D.W. Yandell and J.B. Little (1991) A system for assaying homologous recombination at the endogenous human thymidine kinase gene, Proc. Natl. Acad. Sci. (U.S.A.), 88, 6652-6656.

Bird, A.P. (1986) CpG-rich islands and the function of DNA methylation, Nature (London), 321, 209-213.

Bohr, V.A., C.A. Smith, D.S. Okumoto and P.C. Hanawalt (1985) DNA repair in an active gene: removal of pyrimidine dimers from the DHFR gene of CHO cells is much more efficient than in the genome overall, Cell, 40, 359-369.

Bradley, W.E.C., and D. Letovanec (1982) High frequency nonrandom mutational event at the adenine phosphoribosyltransferase (aprt) locus of sub-selected CHO variants heterozygous for aprt, Somat. Cell. Genet., 8, 51-66.

Bradshaw, H.D., and P.L. Deininger (1984) Human thymidine kinase gene: molecular cloning and nucleotide sequence of a cDNA expressible in mammalian cells, Mol. Cell. Biol., 4, 2316-2320.

Brash, D.E., S. Seetharam, K.H. Kraemer, M.M. Seidman and A. Bredberg (1987) Photoproduct frequency is not the major determinant of ultraviolet mutation hotspots or coldspots in human cells, Proc. Natl. Acad. Sci. (U.S.A.), 84, 3782-3786.

Carothers, A.M., and D. Grunberger (1990) DNA base changes in benzo[a]pyrene diol epoxide-induced dihydrofolate reductase mutants of Chinese hamster ovary cells, Carcinogenesis, 11, 89-92.

Carothers, A.M., G. Urlaub, D. Grunberger and L.A. Chasin (1988) Mapping and characterization of mutations induced by benzo[a]pyrene diol epoxide at dihydrofolate reductase locus in CHO cells, Cell. Mol. Genet., 14, 169-183.

Cavenee, W.K., T.P. Dryja, R.A. Phillips, W.F. Benedict, R. Godbout, B.L. Gallie, A.L. Murphree, L.C. Strong and R.L. White (1983) Expression of recessive alleles by chromosomal mechanisms in retinoblastoma, Nature (London), 305, 779-784.

Chen, R.H., V.M. Maher, J. Brouwer, P. van de Putte and J.J. McCormick (1992) Preferential repair and strand-specific repair of benzo[a]pyrene diol epoxide adducts in the HPRT gene of diploid human fibroblasts, Proc. Natl. Acad. Sci. (U.S.A.), 89, 5413-5417.

Clive, D., M.R. Flamm, M.R. Machesko and N.J. Bernheim (1972) A mutational assay system using the thymidine kinase locus in mouse lymphoma cells, Mutation Res., 16, 77-87.

Daubersies, P., S. Galiegue-Zouitina, N. Koffel-Schwartz, R.P. Fuchs, M.H. Loucheux-Lefebvre and B. Bailleul (1992) Mutation spectra of the two guanine adducts of the carcinogen 4-nitroquinoline 1-oxide in Escherichia coli. Influence of neighbouring base sequence on mutagenesis, Carcinogenesis, 13, 349-354.

de Boer, J.G., and B.W. Glickman (1992) Mutations recovered in the Chinese hamster aprt gene after exposure to carboplatin: a comparison with cisplatin, Carcinogenesis, 13, 15-17.

Drobetsky, E.A., A.J. Grosovsky and B.W. Glickman (1987) The specificity of UV induced mutations at an endogenous locus in mammalian cells, Proc. Natl. Acad. Sci. (U.S.A.), 84, 9103-9107.

Fain, P.R., E. Solomon and D.H. Ledbetter (1991) Second international workshop on human chromosome 17, Cytogenet. Cell Genet., 57, 66-77.

Flemington, E., H.D. Bradshaw, V. Traina-Dorge, V. Slagel and P.L. Deininger (1987) Sequence, structure and promoter characterization of the human thymidine kinase gene, Gene, 52, 267-277.

Fujimori, A., A. Tachibana and K. Tatsumi (1992) Allelic losses in mutations at the aprt locus of human lymphoblastoid cells, Mutation Res., 269, 55-62.

Gibbs, R.A., P.-N. Nguyen, A. Edwards, A.B. Civitello and C.T. Caskey (1990) Multiplex DNA deletion detection and exon sequencing of the hypoxanthine phosphoribosyltransferase gene in Lesch-Nyhan families, Genomics, 7, 235-244.

Gordon, A.J., P.A. Burns and B.W. Glickman (1990) N-Methyl-N'-nitro-N-nitrosoguanidine induced DNA-sequence alteration; non-random components in alkylation mutagenesis, Mutation Res., 233, 95-103.

Grosovsky, A.J., and J.B. Little (1985) Evidence for linear response for the induction of mutations in human cells by X-ray exposures below 10 rads, Proc. Natl. Acad. Sci. (U.S.A.), 82, 2092-2095.

Grosovsky, A.J., J.G. de Boer, P.J. de Jong, E.A. Drobetsky and B.W. Glickman (1988) Base substitutions, frameshifts and small deletions constitute ionizing radiation-induced point mutations in mammalian cells, Proc. Natl. Acad. Sci. (U.S.A.), 85, 185-188.

Klinedinst, D.K., and N.R. Drinkwater (1991) Reduction to homozygosity is the predominant spontaneous mutational event in cultured human lymphoblastoid cells, Mutation Res., 250, 365-374.

Li, C.Y., D.W. Yandell and J.B. Little (1992) Molecular mechanisms of spontaneous and induced loss of heterozygosity in human cells in vitro, Somat. Cell. Mol. Genet., 18, 77-87.

Liber, H.L., and W.G. Thilly (1982) Mutation assay at the thymidine kinase locus in diploid human lymphoblasts, Mutation Res., 94, 467-485.

Mullen, M.P., C.W. Smith, J.G. Patton and B. Nadal-Ginard (1991) Alpha-tropomyosin mutually exclusive exon selection: competition between branchpoint/polypyrimidine tracts determines default exon choice, Genes and Develop., 5, 642-655.

Muller, H., and R. Scott (1992) Hereditary conditions in which the loss of heterozygosity may be important, Mutation Res., 284, 15-24.

Phear, G., W. Armstrong and M. Meuth (1989) Molecular basis of spontaneous mutation at the aprt locus of hamster cells, J. Mol. Biol., 209, 577-582.

Robberson, B.L., G.J. Cote and S.M. Berget (1990) Exon definition may facilitate splice site selection in RNAs with multiple exons, Mol. Cell. Biol., 10, 84-94.

Seeberg, E., and R.P. Fuchs (1990) Acetylaminofluorene bound to different guanines of the sequence -GGCGCC- is excised with different efficiencies by the UvrABC excision nuclease in a pattern not correlated to the potency of mutation induction, Proc. Natl. Acad. Sci. (U.S.A.), 87, 191-194.

Simon, A.E., M.W. Taylor, W.E.C. Bradley and L.H. Thompson (1982) Model involving gene inactivation in the generation of autosomal recessive mutants in mammalian cells in culture, Mol. Cell. Biol., 9, 11126-11133.

Simon, A.E., M.W. Taylor and W.E.C. Bradley (1983) Mechanism of mutation at the aprt locus in Chinese hamster ovary cells: analysis of heterozygotes and hemizygotes, Mol. Cell. Biol., 10, 1703-1710.

Skopek, T.R., H.L. Liber, D.A. Kaden, R.A. Hites and W.G. Thilly (1979) Mutation of human cells by kerosene soot, J. Natl. Cancer Inst., 63, 309-312.

Smith, C.W., E.B. Porro, J.G. Patton and B. Nadal-Ginard (1989) Scanning from an independently specified branch point defines the 3' splice site of mammalian introns, Nature (London), 342, 243-247.

Talerico, M., and S.M. Berget (1990) Effect of 5' splice site mutations on splicing of the preceding intron, Mol. Cell. Biol., 10, 6299-6305.

Venema, J., Z. Bartosova, A.T. Natarajan, A.A. van Zeeland and L.H. Mullenders (1992) Transcription affects the rate but not the extent of repair of cyclobutane pyrimidine dimers in the human adenosine deaminase gene, J. Biol. Chem., 267, 8852-8856.

Yandell, D.W., T.P. Dryja and J.B. Little (1986) Somatic mutations at a heterozygous autosomal locus in human cells occur more frequently by allele loss than by intragenic structural alterations, Somat. Cell. Mol. Genet., 12, 255-268.

Yandell, D.W., T.P. Dryja and J.B. Little (1990) Molecular genetic analysis of recessive mutations at a heterozygous autosomal locus in human cells, Mutation Res., 229, 89-102.