Direct IBD mapping: identical-by-descent mapping without genotyping


Genomics 83 (2004) 335 – 345 www.elsevier.com/locate/ygeno

Denis Smirnov a,b, Alan Bruzel c, Michael Morley c, and Vivian G. Cheung a,b,c,*

a Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
b Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19104, USA
c The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA

Received 17 April 2003; accepted 4 August 2003

Abstract

Direct identical-by-descent (IBD) mapping is a technique that combines genomic mismatch scanning (GMS) and DNA microarray technology to map regions shared IBD between two individuals without locus-by-locus genotyping or sequencing. The lack of reagents has limited its widespread application. In particular, two key reagents have been limiting: (1) the mismatch repair proteins MutS, MutL and MutH, and (2) genomic microarrays for identifying the genomic locations of the GMS-selected IBD fragments. Here, we describe steps that optimized the procedure and resources that will facilitate the development of direct IBD mapping.
© 2003 Elsevier Inc. All rights reserved.

Keywords: Gene mapping; Polymorphism; Mismatch repair protein; Microarray

Direct identical-by-descent mapping is a combination of two techniques, genomic mismatch scanning (GMS) and genomic DNA microarray hybridization. It scans the entire genome for sequence variation and enriches the DNA shared identical-by-descent (IBD) between individuals. The goals of gene-mapping projects are to identify the genomic regions shared IBD among affected individuals and then to search for the DNA variants responsible for the disease or phenotype of interest within these IBD regions. To facilitate these projects, many high-throughput methods have been developed to genotype polymorphic markers such as microsatellite and single nucleotide polymorphism (SNP) markers [1– 10]. Despite these advances, it is still quite difficult to genotype a sufficiently large number of polymorphic markers for gene identification. In a typical study, several thousand microsatellite markers or tens of thousands of SNP markers need to be genotyped for each individual [11,12]. There are many ongoing efforts to improve the technologies and to better understand the genome structure so as to decrease the number of markers that need to be genotyped in mapping studies.

* Corresponding author. Department of Pediatrics, University of Pennsylvania, 3516 Civic Center Boulevard, ARC 516G, Philadelphia, PA 19104, USA. E-mail address: [email protected] (V.G. Cheung). 0888-7543/$ - see front matter © 2003 Elsevier Inc. All rights reserved. doi:10.1016/j.ygeno.2003.08.002

In this paper, we describe improvements on a method, direct IBD mapping, which combines genomic mismatch scanning (GMS) and DNA microarrays. It allows simultaneous scanning of the entire genome for large regions of sequence identity. It does not require marker-by-marker genotyping or sequencing; thus, it is less labor-intensive and less expensive than existing mapping methods. Direct IBD mapping can be applied to samples from individuals of unknown relationship [13] and to those within a known pedigree. It rests on a highly efficient method known as GMS that enriches for IBD regions between individuals using mismatch repair proteins [14–17]. IBD regions are identified as large regions (~3 kb) of sequence identity, that is, regions free of sequence polymorphisms (mostly SNPs and small insertions and deletions). Since the frequency of natural polymorphism is about 1 per 1000 nucleotides [18,19], these enriched fragments likely represent IBD DNA. These polymorphisms do not need to be identified a priori or genotyped since they are recognized by the mismatch repair proteins in a procedure similar to in vivo detection of errors during DNA replication. The genomic locations of the selected IBD DNA are identified by hybridization of the DNA onto a microarray containing mapped genomic clones. GMS was shown to be a robust mapping method first in yeast, and then in mice and humans [13–17]. As microarray


D. Smirnov et al. / Genomics 83 (2004) 335–345

technology was developed [20], GMS was coupled to microarrays to allow mapping of all the GMS-selected IBD fragments onto microarrays containing mapped genomic clones from chromosome 11 [13]. The lack of two key reagents, mismatch-repair proteins and genomic microarrays, has limited the applicability of direct IBD mapping.

We have now tested the robustness of purified mismatch repair proteins and mapped a set of BAC clones that span the human genome at about 1-Mb intervals [21,22]. In this paper, we will provide a detailed protocol for direct IBD mapping and introduce the resources necessary to perform the procedure.

Fig. 1. Schematic of direct IBD mapping. Two PstI-digested genomic DNA samples, one of which was methylated (indicated by red circles), are mixed, denatured and reannealed. Homohybrids are removed. The remaining heterohybrids are treated with Escherichia coli mismatch-repair proteins to remove the mismatch-containing heterohybrids, thereby selectively enriching for IBD DNA fragments that are sequence identical. The genomic locations of the IBD DNA fragments are determined by hybridization onto a genomic microarray containing mapped BAC clones.
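The two selection steps summarized in Fig. 1 can be sketched as a pair of filters over reannealed duplexes. This is only an idealized illustration: the `Duplex` class, its fields and the function names are hypothetical, and the sketch assumes every enzyme works with perfect efficiency, which the Discussion notes is not the case in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duplex:
    """One reannealed PstI fragment (hypothetical model, not from the paper)."""
    top_methylated: bool     # top strand came from the dam-methylated sample
    bottom_methylated: bool  # bottom strand came from the dam-methylated sample
    has_mismatch: bool       # a SNP or small indel separates the two strands


def survives_heterohybrid_selection(d: Duplex) -> bool:
    # DpnI cleaves fully methylated GATC sites and MboI cleaves unmethylated
    # ones, so only hemimethylated duplexes (one strand from each sample)
    # escape cleavage and the exonuclease III / nitrocellulose removal step.
    return d.top_methylated != d.bottom_methylated


def survives_mismatch_selection(d: Duplex) -> bool:
    # MutS, MutL and MutH nick mismatch-containing heterohybrids, which are
    # then degraded by exonuclease III and removed on nitrocellulose.
    return not d.has_mismatch


def gms_select(duplexes):
    """Return the duplexes expected to remain after both selections."""
    return [
        d for d in duplexes
        if survives_heterohybrid_selection(d) and survives_mismatch_selection(d)
    ]
```

Under these assumptions, only hemimethylated, mismatch-free duplexes are returned, which corresponds to the IBD-enriched fraction that is hybridized to the array.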


Results

Fig. 4. Purification of E. coli His6-MutS, His6-MutL, and His6-MutH proteins. Final eluates from His-bind resins were fractionated using SDS–PAGE.

Direct IBD mapping consists of two hybridization steps and one mismatch detection step. The procedure is outlined in Fig. 1. We first give an overview of the procedure and then discuss each step in detail. Genomic DNA samples from two individuals are digested with a restriction enzyme, PstI, generating fragments that are about 3 kb in length with 3′ protruding ends. The PstI fragments from one of the individuals are then methylated to allow discrimination between the genomic DNA samples from the two individuals and to enhance the kinetics of the methyl-directed mismatch recognition step [23]. The two DNA samples are then denatured and allowed to reanneal in a formamide phenol emulsion [24]. The reannealed products include methylated homohybrids, unmethylated homohybrids and hemimethylated heterohybrids. Since the goal is to compare the DNA sequences of the two individuals, only the heterohybrids are of interest. The homohybrids are removed by digestion with the methylation-sensitive enzymes DpnI and MboI; the hemimethylated heterohybrids are resistant to DpnI and MboI digestion. The cleaved homohybrid DNA fragments are removed by addition of exonuclease III, which extends the cleavage at the GATC sites to form stretches of single-stranded DNA that are then removed by binding to nitrocellulose filters. The hemimethylated heterohybrids remain intact since their PstI-generated 3′ protruding ends protect them from exonuclease III. Mismatch repair proteins, MutS, MutL and MutH, are then added to identify and nick the heterohybrid DNA fragments that contain single nucleotide polymorphisms and small (3 to 5 bp) insertion/deletion polymorphisms. The nicked mismatch-containing DNA fragments are then exposed to a second round of exonuclease III treatment and removal of single-stranded DNA by nitrocellulose filters. The DNA fragments that are left are enriched with mismatch-free heterohybrids. In the following sections, we describe the different steps in this procedure in greater detail.

Fig. 5. Functional characterization of Mut proteins. (A) MutS protein was incubated with 32P-labeled oligonucleotides containing no mismatch (PM), G/T mismatch (MM) and +1 bulge T insertion/deletion (ID) in the presence of 20-fold excess of unlabeled PM oligonucleotides. The resulting DNA–protein complex was resolved by native PAGE. Arrows denote the positions of MutS–oligonucleotide complexes (top) and unbound oligonucleotides (bottom). (B) MutH cleavage assay performed with MutH alone (closed circles) or MutH plus MutL and MutS (closed squares).

Step 1: preparing genomic DNA for IBD selection

The DNA samples used for GMS analysis must have several characteristics: an optimal size for determining whether two fragments are IBD, 3′ protruding ends for exonuclease III-mediated heterohybrid selection, and a specific DNA sequence (GATC) for nicking of mismatch-containing heterohybrids. GMS selection starts with digestion of genomic DNA samples with a restriction enzyme, such as PstI, which produces fragments averaging 3 kb in size. In silico digestion of human genomic DNA showed that 68% of the genome (in base pairs) is found in PstI fragments that are 3 kb or greater in size. Since the frequency of DNA polymorphism is about 1 per 1,000 [18,19], the majority of the heterohybrids formed from these fragments should contain at least one sequence variant if they are not IBD. In addition, the starting DNA fragments should contain GATC sites, which are the recognition sequence for mut protein-directed nicking of mismatch-containing heterohybrids. An in silico analysis of the PstI fragments shows that almost all the PstI fragments (>99.99%) greater than 2 kb contain GATC sites. We estimated the frequency of the GATC sequence in human genomic DNA to be approximately one GATC site per 500 bases of sequence. Following initial digestion of genomic DNA samples with PstI, one of the two GMS reaction samples is methylated by Escherichia coli dam methylase.
The efficiency of this reaction can be monitored by subsequent digestion of methylated DNA sample with DpnI, which recognizes and cleaves the GmATC sites, and MboI, which cleaves unmethylated GATC sites.
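The in silico checks described in Step 1 can be approximated with a short script. This is a minimal sketch, not the authors' pipeline (which used fuzznuc and a relational database): the function names are hypothetical, the input is any plain DNA string, and the Poisson model used for the "at least one variant" estimate is our assumption based on the quoted rate of about 1 polymorphism per 1,000 bases.

```python
import math
import re

PSTI_SITE = "CTGCAG"  # PstI recognition sequence given in the Methods
PSTI_CUT_OFFSET = 5   # PstI cuts CTGCA^G on the top strand (3' overhang)
DAM_SITE = "GATC"     # site needed for methyl-directed nicking


def psti_fragments(seq):
    """Split a DNA string at every PstI cut position on the top strand."""
    seq = seq.upper()
    cuts = [m.start() + PSTI_CUT_OFFSET for m in re.finditer(PSTI_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def fraction_of_bases_in_long_fragments(fragments, min_len=3000):
    """Fraction of all bases that lie in fragments of at least min_len bp."""
    total = sum(len(f) for f in fragments)
    long_bases = sum(len(f) for f in fragments if len(f) >= min_len)
    return long_bases / total if total else 0.0


def fraction_with_gatc(fragments, min_len=2000):
    """Fraction of fragments longer than min_len bp that contain GATC."""
    long_frags = [f for f in fragments if len(f) > min_len]
    if not long_frags:
        return 0.0
    return sum(DAM_SITE in f for f in long_frags) / len(long_frags)


def prob_at_least_one_variant(length_bp, rate_per_bp=1 / 1000):
    """Assumed Poisson model: P(>=1 variant) = 1 - exp(-rate * length)."""
    return 1.0 - math.exp(-rate_per_bp * length_bp)
```

Under the Poisson assumption, a non-IBD 3 kb heterohybrid would carry at least one variant with probability 1 - exp(-3), roughly 0.95, which is in line with the statement that the majority of such fragments should contain a mismatch.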

Step 2: FPERT and heterohybrid selection

The two DNA samples that are to undergo GMS selection are combined and then denatured. Reannealing is performed using the formamide phenol emulsion reassociation technique (FPERT) [24]. By S1 nuclease digestion of the single-stranded DNA remaining in the renaturation reaction, we found that a minimum of 25 hours is required to reassociate enough double-stranded DNA for subsequent analysis (Fig. 2).

After the reannealing step, we select for the heterohybrids in the mixture by treating with DpnI and MboI, followed by a short incubation with exonuclease III and removal of single-stranded DNA with nitrocellulose filters. To test the efficiency of the FPERT and heterohybrid selection steps, we subjected either 10 µg of nonmethylated DNA, 10 µg of fully methylated DNA, or a mixture of 5 µg of nonmethylated and 5 µg of fully methylated DNA to FPERT renaturation followed by DpnI/MboI/ExoIII treatment and filtration through a nitrocellulose column. As can be seen in Fig. 3, hemimethylated heterohybrids (lane A) are selectively recovered following this procedure, whereas nonmethylated and fully methylated homohybrids (lanes B and C) are removed.

Step 3: mismatch detection

This step relies on the ability of E. coli mismatch repair proteins MutS, MutL and MutH to recognize mismatched DNA and introduce a nick into the nonmethylated strand of the mismatch-containing heterohybrid. Mut proteins can be purified using a His-tagged purification scheme [25]. Fig. 4 shows the His-tagged mut proteins that were purified by binding to Ni2+-chelation affinity resin. The functional activity of the purified proteins was assessed by testing for MutS binding to mismatch-containing heteroduplexes and for the endonuclease activity of MutH. The ability of MutS protein to recognize mismatches is

Fig. 2. Time course of FPERT reassociation between two genomic DNA samples. Two genomic DNA samples were mixed (PRE) and then denatured and reannealed using FPERT protocol for indicated amounts of time. After FPERT, DNA samples were treated with S1 nuclease to remove single-stranded DNA.


tested by a gel electrophoretic mobility assay in which purified MutS protein was incubated with radiolabeled perfectly matched homoduplex DNA or mismatched heteroduplex DNA [26]. As expected, MutS bound preferentially to a heteroduplex with a G/T mismatch and to one with a 1-bp insertion compared to a mismatch-free homoduplex (Fig. 5A). The fact that MutS binds to the mismatched heteroduplex DNA and not to perfectly matched homoduplex DNA in the presence of a 20-fold excess of unlabeled homoduplex DNA suggests that MutS has at least 10- to 20-fold higher affinity for mismatch-containing than for mismatch-free DNA (Fig. 5A). The enzymatic activity of MutH protein, as well as the ability of all three proteins to work in concert, can be tested by performing a MutH cleavage assay [27]. Briefly, 32P-labeled mismatch-containing heteroduplex was incubated with MutH protein in the absence or presence of MutS and MutL proteins. The extent of cleavage at the hemimethylated GATC site was monitored on a denaturing polyacrylamide gel. As shown in Fig. 5B, MutS and MutL facilitate the ability of MutH to cleave hemimethylated mismatch-containing heteroduplex. In the final steps of the GMS selection procedure, hemimethylated heterohybrids are incubated with purified E. coli MutS, MutL and MutH proteins. MutS and MutL identify the mismatched DNA. They then recruit MutH to nick the unmethylated strand of the mismatch-containing heterohybrids. Exonuclease III is used to extend the nick, creating single-stranded DNA that is then removed by nitrocellulose filtration. The procedure yields a mixture that is enriched with perfectly matched heterohybrids.

Fig. 3. Heterohybrid selection. PstI-digested (A) nonmethylated plus methylated DNA, (B) nonmethylated DNA alone, or (C) methylated DNA alone was electrophoresed through agarose gel following FPERT renaturation, DpnI/MboI/ExoIII treatment, and passage through nitrocellulose columns.

Step 4: mapping the GMS-selected IBD DNA fragments onto a whole-genome microarray

The enriched GMS-selected DNA fragments were mapped by hybridization onto a DNA microarray containing mapped bacterial artificial chromosome (BAC) clones. This allows the physical locations of all IBD fragments to be determined in a single hybridization experiment. We have assembled a collection of approximately 5,000 mapped RPCI-11 BAC clones that cover the human genome at about 1-Mb resolution. The clones were anchored to STS markers from the GeneBridge 4 radiation hybrid map. Each clone was mapped by filter hybridization and verified by PCR [21,22]. Chromosomal locations of some of the clones were verified by fluorescent in situ hybridization [28]. The mapped clones were further characterized by HindIII fingerprinting and end-sequencing. The clones were anchored to the human genome sequence assemblies by aligning the end sequences of the BAC clones to the DNA sequences at the UCSC Genome Browser. Information about the mapped BAC clones is available through our web-based database GenMapDB (http://genomics.med.upenn.edu/genmapdb) and through the UCSC Genome Browser (http://genome.ucsc.edu, on the GenMapDB clone track).

To construct a genomic microarray, we amplified the DNA purified from the BAC clones by inter-Alu PCR [29]. We found that arrays containing inter-Alu amplicons gave more robust and specific signals compared to arrays with pure BAC DNA or BACs amplified using degenerate oligonucleotide primers. This is likely due to the reduction in complexity of the hybridization targets achieved by using the DNA fragments that lie between Alu sequences. Depending on the density of Alu sequences and the amplification protocols, inter-Alu amplicons represent about 20% of the clone sequences. This 80% reduction in complexity significantly increases the robustness of hybridizations.
Relative-pair mapping

To test direct IBD mapping using the protocols and reagents described above, we performed four comparisons between a child and her maternal and paternal grandparents in the CEPH family 1362. In each reaction, the child was compared to one of her grandparents to identify the regions that were shared IBD between the two individuals. GMS-selected products were hybridized onto arrays containing a partial set of clones from GenMapDB, mainly mapped BAC clones from chromosomes 2 to 22 and X. The average interclone distance on this array was 3 Mb. The results of the four


comparisons for chromosomes 11 to 16 are shown in Fig. 6, which shows the location of contiguous regions on each chromosome that are IBD between the child and her grandparents. According to Mendelian segregation, for each maternal (haploid) chromosome, a child’s genome is IBD at every region with either the maternal grandfather or the maternal grandmother (and similarly for the paternal chromosomes). The IBD regions “switch” from one grandparent to the other where a meiotic crossover occurs. These crossovers are seen in the figure. No recombination events were detected for chromosome 18 in the paternal and maternal comparisons. Studies have suggested that there is an obligatory recombination event when the duplicated meiotic chromosomes pair [30]. However, 50% of the resulting gametes receive chromosomal products that did not participate in the recombination events; in addition, there may be some failures in enriching or mapping some of the IBD DNA.

We compared our data to those from microsatellite genotyping (Table 1). We expect the genotyping and the direct IBD mapping to give identical results. Genotypes for members of CEPH family 1362 are available through the Marshfield Center for Medical Genetics CEPH database. We used those genotypes to determine the grandparents from whom the child has inherited her DNA for different

Fig. 6. Regions on six chromosomes that are IBD between the child and her paternal grandparents. A line is used to represent the contiguous regions shared IBD between a child and one of her grandparents. Crossovers are marked with arrows in the first panel. Other crossovers are not marked but they are located where the IBD regions ‘‘switch’’ from one grandparent to the other.


regions in the genome. The markers were randomly selected from the Marshfield database. To obtain informative data for the 77 sets of IBD “calls” in Table 1, about 200 markers were searched. Many markers were noninformative, with alleles shared identical-by-state between the child and one or more of her grandparents, so definitive IBD calls were not possible. Among the 38 regions that were compared between the genotyping and direct IBD mapping data for the child and paternal grandparent comparisons, 7 did not agree. Among the 39 markers between the child and maternal grandparent comparisons, 13 did not agree. Combining the data, we observed a 74% concordance between our data and those from the microsatellite genotyping.
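The concordance figure can be re-derived from the concordance columns of Table 1. The sketch below simply tallies those columns; the variable and function names are ours, and the strings are the paternal and maternal concordance calls copied from the table (Y, N, or N/A where one of the two methods was not informative).

```python
# Concordance calls copied from Table 1, one entry per marker (45 markers).
PATERNAL = (
    "Y N/A N N Y Y Y Y Y Y N Y Y N/A N N Y Y Y N/A Y Y Y Y N/A "
    "N Y Y Y Y N N/A N/A Y Y Y Y N/A Y Y Y Y Y Y Y"
).split()
MATERNAL = (
    "Y Y Y Y N N N N/A N Y Y N Y Y N N N Y Y Y N Y N Y N/A "
    "N N/A Y Y N/A Y N N/A Y Y Y N Y Y Y N/A Y Y Y Y"
).split()


def concordance(calls):
    """Return (number of agreements, number of comparable calls)."""
    comparable = [c for c in calls if c != "N/A"]
    return comparable.count("Y"), len(comparable)


pat_agree, pat_total = concordance(PATERNAL)  # 31 of 38
mat_agree, mat_total = concordance(MATERNAL)  # 26 of 39
overall = (pat_agree + mat_agree) / (pat_total + mat_total)
print(f"overall concordance: {overall:.0%}")
```

The tally gives 31 of 38 paternal and 26 of 39 maternal comparisons in agreement, that is, 57 of 77, which rounds to the 74% reported in the text.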

Discussion

Direct IBD mapping is the only mapping technique that does not require a priori knowledge of the location and the type of polymorphic markers, and it also does not require locus-by-locus genotyping. Heteroduplex DNA samples are

Table 1
Comparison of IBD determination by genotyping and by direct IBD mapping

                                Paternal                                   Maternal
Chromosome  Marker      Mb      Genotype  Direct IBD mapping  Concordance  Genotype  Direct IBD mapping  Concordance
2           MFD330      64      GM        GM                  Y            GF        GF                  Y
2           052XF8      132     GF        NI                  N/A          GF        GF                  Y
3           D3S1215     111     GF        GM                  N            GM        GM                  Y
3           136XC1      197     GF        GM                  N            GF        GF                  Y
4           196XB6      3       GM        GM                  Y            GF        GM                  N
4           MFD357      96      GF        GF                  Y            GF        GM                  N
5           028XB12     0.3     GM        GM                  Y            GM        GF                  N
5           MFD300      65      GM        GM                  Y            GM        NI                  N/A
6           D6S105      27      GM        GM                  Y            GM        GF                  N
6           ATA29B09    68      GM        GM                  Y            GM        GM                  Y
7           MFD358      52      GF        GM                  N            GF        GF                  Y
7           D7S2842     100     GM        GM                  Y            GM        GF                  N
8           143XD8      19–21   GM        GM                  Y            GF        GF                  Y
8           D8S131      28      NI        GM                  N/A          GF        GF                  Y
8           182XH12     137     GF        GM                  N            GM        GF                  N
9           gata63a07   75      GF        GM                  N            GM        GF                  N
9           ATA38G03    77      GM        GM                  Y            GM        GF                  N
10          207WD12     0.2     GM        GM                  Y            GF        GF                  Y
10          GATA197e01  2       GM        GM                  Y            GF        GF                  Y
10          D10S1125    19      GM        NI                  N/A          GF        GF                  Y
11          MFD58       17      GM        GM                  Y            GM        GF                  N
11          109XC3      135     GF        GF                  Y            GF        GF                  Y
12          294YD9      0.7     GF        GF                  Y            GF        GM                  N
12          MFD331      119     GM        GM                  Y            GM        GM                  Y
13          249XB1      15      GF        NI                  N/A          GF        NI                  N/A
13          D13S1493    28      GF        GM                  N            GF        GM                  N
14          MFD335      41      GM        GM                  Y            NI        GM                  N/A
14          GATA70B06   79      GM        GM                  Y            GM        GM                  Y
15          MFD351      45      GM        GM                  Y            GF        GF                  Y
15          ATA20E12    50      GM        GM                  Y            NI        NI                  N/A
16          MFD240      49      GF        GM                  N            GF        GF                  Y
16          MFD98       67      NI        NI                  N/A          GM        GF                  N
16          031XA5      84      GM        NI                  N/A          GM        NI                  N/A
17          051XD10     10      GM        GM                  Y            GM        GM                  Y
17          MFD188      45      GM        GM                  Y            GM        GM                  Y
18          ACT1A01     9       GM        GM                  Y            GM        GM                  Y
18          MFD302      35      GM        GM                  Y            GF        GM                  N
19          INSR        7       NI        GM                  N/A          GM        GM                  Y
19          MFD240      16      GM        GM                  Y            GM        GM                  Y
20          MFD136      15      GM        GM                  Y            GF        GF                  Y
20          ACT1A04     53      GM        GM                  Y            NI        GM                  N/A
20          D20S102     55      GM        GM                  Y            GM        GM                  Y
21          ATA19C05    18      GM        GM                  Y            GM        GM                  Y
21          MFD338      26      GM        GM                  Y            GM        GM                  Y
22          168TH8      42      GM        GM                  Y            GM        GM                  Y


formed by hybridizing DNA samples from two related individuals. Mismatch-containing heteroduplexes are removed using the E. coli mismatch repair proteins. The surviving DNA is identical in sequence and likely represents regions that are shared identical-by-descent. However, it is possible that even large DNA fragments that are identical in sequence are identical-by-state and not necessarily IBD. The mismatch scanning is performed on thousands of DNA fragments within the genome; thus, tens of thousands of markers are screened simultaneously.

In this paper, we provide an update on the development of direct IBD mapping and a description of the resources that we have assembled to enable its development. We optimized the procedure, especially the first step, where DNA samples are reannealed using FPERT. By using nitrocellulose filters instead of benzoylated naphthylated DEAE cellulose (BNDC) to remove single-stranded DNA, we have minimized the loss of double-stranded DNA and therefore increased the yield of reannealed DNA from ~10–15% to ~50%. This is important because, in the past, most of the DNA was lost at this first reannealing step [16].

We demonstrated the effectiveness of direct IBD mapping in isolating IBD regions shared between close relatives. Using purified mut proteins, we enriched for IBD regions shared between the genomes of related individuals. Those IBD DNA fragments were mapped by hybridization onto a genomic microarray that contains clones for 22 chromosomes. We noted some variability in the level of selection. The comparisons were performed on child-grandparent pairs. A crossover event is called where a region switches from IBD with one grandparent to IBD with the other. In our data, for some chromosomes, no recombinants were found. This is probably due to the fact that only 50% of the gametes can receive the recombinant meiotic product. In addition, when we compared our results to those from microsatellite genotyping, we observed some discrepancies.
We assume that the discrepancies reflect poor selection of specific PstI fragments in our assay. However, it is also possible that the errors are in the genotyping data. The poor selection in some regions may be due to several factors. First, the mut proteins may have failed to identify the mismatches or may have removed some mismatch-free DNA. Some regions of the genome may be less susceptible to mismatch detection by mut proteins – these include regions without GATC sites that are required for mismatch recognition by mut proteins or those that are highly repetitive and therefore do not reanneal properly during the FPERT procedure. With the availability of a set of mapped BAC clones, we can begin to perform a large number of direct IBD mapping and identify the regions that are less prone to selection. The selection may be enhanced by choosing different initial restriction enzymes to alter the size of the average DNA fragments and therefore their sequence content. In addition, now with an abundant

amount of mut proteins that can be prepared, the mut proteins can be titrated to maximize mismatch detection. Other ways to improve mismatch detection include an additional round of detection with mut proteins, or using other mismatch detection proteins such as DNA glycosylases including TDG and MutY, which have been shown to identify mismatches in heteroduplex DNA [31,32]. Previously, when the GMS-selected DNA fragments were mapped by genotyping with microsatellite markers, more DNA was necessary for the mapping step. This limited options for multiple mismatch detection steps. However, with the development of whole genome arrays, smaller amounts of DNA are required. Direct IBD mapping has the potential to screen all the polymorphic sites between two genomes in a single reaction. The IBD maps generated by this technique can be applied to mapping human disease genes and for studying the meiotic patterns in the human genome.

Material and methods

In silico PstI digestion

PstI fragments were identified by searching for the recognition sequence for PstI (5′-CTGCAG-3′) in human DNA sequences at the UCSC human genome browser (June 2002 build) using the software package fuzznuc (http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/index.html). The DNA fragments generated by in silico cleavage at the recognition sequence were organized by size and stored in a relational database.

Purification of E. coli MutS, H and L proteins

E. coli strains pTX412 (MutS), pTX417 (MutH) and pTX418 (MutL) were described previously (Feng and Winkler, 1995). Proteins were purified from E. coli as previously described with the following modifications. MutS expression was induced with 0.25 mM isopropylthio-β-D-galactoside (IPTG) for 1.5 hours at 28 °C. MutH and MutL expression was induced with 0.5 mM IPTG for 1.5 hours at 28 °C. Purified proteins eluted from the nickel column were dialyzed overnight at 4 °C in 50 mM HEPES–NaOH pH 7.2, 100 mM KCl, 1 mM EDTA, 1 mM DTT, 50% glycerol.

MutS binding assay

The following deoxyoligonucleotides were used for MutS gel shift assays:
Topmut 5′-GCTAGCAAGCTTTCGATTCTAGAAATTCGATCAGCAT-3′
Botmut/TG 5′-ATGCTGATCGAATTTCTAGAATCGAGAGCTTGCTAGC-3′
Botmut/TA 5′-ATGCTGATCGAATTTCTAGAATCGAAAGCTTGCTAGC-3′
Botmut/TTT 5′-ATGCTGATCGAATTTCTAGAATCGGCTTGCTAGC-3′ (Operon Technologies).

The following oligos were used for the MutH assay:
MutHtopID 5′-ACGGCAGAAGGGTAGCAGCACTGAGCGTGTGGTTCCTTATGGCAAAGAAACGTGACGTTGCATGCTAGCTAAGCTCGATCCGTACAAGTATT-3′
MutHbot 5′-AATACTTGTACGG5TCGAGCTTAGCTAGCATGCAACGTCACGTTTCTTTGCCATAAGGAACCACCGCTCAGTGCTGCTACCCTTCTGCCGT-3′, where 5 denotes N6-methyladenine.

The gel shift assay was performed as described previously, with the following modifications [26]. A 3 µg MutS protein preparation was incubated with 2 ng of 32P-labeled DNA (A/T homoduplex, G/T mismatch, 1 bp insertion; see sequences above) in a binding buffer (20 mM Tris-HCl pH 7.6, 10 mM EDTA, 5 mM MgCl2, 0.1 mM DTT) in the presence of a 20-fold excess of unlabeled mismatch-free oligonucleotide.

MutH cleavage assay

The MutH assay was performed as described previously [27]. Briefly, 0.1 pmol of a 90 bp DNA substrate that contains a hemimethylated GATC site (32P-labeled on the nonmethylated strand) was incubated with MutH (0.5 µg) alone or with MutH (0.5 µg), MutL (3.5 µg) and MutS (10 µg) proteins in a binding buffer (20 mM HEPES–NaOH pH 7.8, 50 mM NaCl, 5 mM MgCl2, 1 mM DTT and 0.1 mg/ml BSA) in a total volume of 30 µl. Aliquots of the samples were removed at various timepoints. The aliquots were denatured at 90 °C and analyzed on denaturing polyacrylamide gels. After electrophoresis, gels were dried and analyzed on a Storm PhosphoImager.

Genomic mismatch scanning

The lymphoblastoid cell lines GM11992, GM11993, GM11994, GM11995 and GM11982, representing the CEPH family 1362, were obtained from the Coriell Cell Repository. Lymphoblastoid cells of the subjects were grown in RPMI 1640 medium with 15% fetal bovine serum, 1% penicillin-streptomycin and 1% L-glutamine at 37 °C in a humidified 5% CO2 chamber. Cells were grown to a density of ~1 × 10^6/ml.
Genomic DNA was isolated from these cells using the Genomic DNA Purification Kit (Gentra Systems, MN, USA) according to the manufacturer’s instructions. Genomic DNA samples were digested with PstI (New England Biolabs, MA, USA) restriction enzyme. For each comparison, one of the DNA samples was methylated with dam methylase (10 units/µg of genomic DNA) in the presence of 160 µM S-adenosylmethionine (New England Biolabs, MA, USA). 5 µg of PstI-digested genomic DNA from one individual was mixed with 5 µg of PstI-digested and methylated genomic DNA from the other individual in a volume of 94 µl. DNA was denatured by incubation of the mixed samples with 6 µl of freshly prepared 5 N NaOH at room temperature for 15 minutes. NaOH was neutralized by addition of 15.5 µl of 3 M MOPS (pH is approximately 8 after neutralization). To this reaction, 53 µl of water, 32 µl of formamide, 200 µl of FPERT buffer (4 M sodium thiocyanate, 20 mM Tris–HCl pH 8.0, 0.2 mM EDTA) and 150 µl of water-saturated phenol were added until an emulsion was formed. Samples were mixed on a wrist-action shaker for at least 48 hours at room temperature. Samples were then chloroform extracted twice and DNA was precipitated with EtOH. Samples were reconstituted in 150 µl of 0.5 M NaCl, 10 mM Tris–HCl pH 8.0 and spun through a Centrex MF-1.5 0.2 µm nitrocellulose filter assembly (Schleicher & Schuell) for 3 minutes at 500g. The nitrocellulose filter assembly was washed with an additional 25 µl of 0.5 M NaCl, 10 mM Tris–HCl pH 8.0 and spun again. Eluted DNA was precipitated with EtOH, reconstituted in 1× buffer 3 (New England Biolabs, MA, USA) and digested with DpnI (2 units) and MboI (2 units) at 37 °C for 30 minutes, followed by addition of an equal volume of exonuclease III (10 units) in 66 mM Tris–HCl pH 8.0, 0.66 mM MgCl2. Incubation was at 37 °C for 15 minutes. After heat inactivation (5 minutes at 65 °C), samples were brought to 150 µl in 0.5 M NaCl, 10 mM Tris–HCl pH 8.0 and centrifuged through a Centrex MF-1.5 nitrocellulose filter assembly as described above. Eluted DNA was EtOH precipitated. The DNA mixture was incubated with MutS (3500 ng), MutL (1700 ng) and MutH (50 ng) in 50 µl of 50 mM HEPES–NaOH pH 8.0, 20 mM KCl, 4 mM MgCl2, 1 mM DTT, 50 µg BSA/ml and 2 mM ATP at 37 °C for 45 minutes. Exonuclease III (4 units) was added with incubation at 37 °C for 15 minutes followed by heat inactivation. The reaction was brought to 150 µl in 0.5 M NaCl, 10 mM Tris–HCl pH 8.0 and centrifuged through a Centrex MF-1.5 nitrocellulose filter assembly as described above. The eluted DNA, now enriched in IBD fragments, was EtOH precipitated and dissolved in 30 µl of 10 mM Tris-HCl pH 8.0, 1 mM EDTA.
Microarray preparation, hybridization and analysis

BAC DNA was isolated using the Qiagen maxiprep kit according to the manufacturer’s instructions. DNA was then amplified using the inter-Alu PCR approach as described previously [13]. About 80% of the BAC DNAs contained inter-Alu fragments. PCR reactions were EtOH precipitated, resuspended in 2× SSC, 0.01% sarkosyl and arrayed in duplicate on CSA 100 silanated slides (CEL Associates Inc., TX, USA) using an Affymetrix 417 arrayer. DNA from each GMS reaction was labeled with Cy3-dCTP and a reference genomic DNA was labeled with Cy5-dCTP (Amersham Pharmacia Biotech) using the Bioprime random priming kit (Invitrogen). The labeled samples were purified through Microspin G50 columns (Amersham Pharmacia Biotech), EtOH precipitated with 100 µg of Cot-1 DNA and 100 µg of yeast tRNA and hybridized to arrayed slides overnight at 60 °C in ExpressHyb buffer (Clontech, CA, USA). All hybridizations were performed in duplicate.

Slides were washed at 55°C once with 2× SSC, 0.2% SDS for 10 minutes, then at room temperature once with 2× SSC for 10 minutes and once with 0.2× SSC for 10 minutes. Slides were spin dried and scanned using an Affymetrix 428 Scanner. Images were analyzed using the ArrayVision 6.0 software package (Imaging Research Inc.). Each array was normalized so that the overall intensity ratio of Cy3 to Cy5 was 1.

Array analysis

For each clone, a Cy3/Cy5 intensity ratio (R) was calculated. To determine whether a child inherited the region represented by a clone from the grandfather or the grandmother, we calculated a ratio of ratios (R′ = R_grandfather/R_grandmother). If R′ is greater than 1, the clone is scored as IBD with the grandfather; if it is less than 1, the clone is scored as IBD with the grandmother. Clones where the replicates do not agree are not included; this corresponds to about 10% of the clones. Most of these are clones that give little hybridization signal, mostly due to PCR failures when preparing the inter-Alu amplicons. The data were also screened for the presence of tight double recombinants. Using rules similar to those used in constructing the human genetic maps, we screened for and removed any two recombination events separated by a small distance (<3 Mb) [33-35]. This removes about 6% of the data points. The sources of these "errors" are mostly errors in clone order. Also included in the 6% are a few cases where the R′ values were equal to 1, indicating that the selection of IBD regions is incomplete (non-specificity in the GMS procedure).
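
The scoring rules described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the analysis code used in this study: the function names and the "GF"/"GM" labels are hypothetical, normalization is assumed to rescale Cy3 so that the total Cy3 and total Cy5 intensities on an array are equal, and a recombination breakpoint is assumed to lie midway between the two flanking clones.

```python
from typing import List, Optional, Tuple

Call = Optional[str]  # "GF", "GM", or None when a clone is not scored


def normalized_ratios(cy3: List[float], cy5: List[float]) -> List[float]:
    """Per-clone Cy3/Cy5 ratios after rescaling Cy3 so that the overall
    (total) Cy3/Cy5 intensity ratio of the array is 1 (assumed definition)."""
    scale = sum(cy5) / sum(cy3)
    return [c3 * scale / c5 for c3, c5 in zip(cy3, cy5)]


def score_clone(r_gf: List[float], r_gm: List[float]) -> Call:
    """Score one clone from replicate ratios R_grandfather and R_grandmother.
    Returns None if the replicates disagree or if any R' equals 1."""
    calls = set()
    for gf, gm in zip(r_gf, r_gm):
        r_prime = gf / gm  # ratio of ratios, R'
        if r_prime == 1:
            return None  # uninformative clone
        calls.add("GF" if r_prime > 1 else "GM")
    return calls.pop() if len(calls) == 1 else None


def drop_tight_double_recombinants(
    clones: List[Tuple[float, str]], min_gap_mb: float = 3.0
) -> List[Tuple[float, str]]:
    """clones: (position in Mb, call) for the scored clones of one
    chromosome, sorted by position. Removes interior runs of clones whose
    two flanking breakpoints are less than min_gap_mb apart."""
    # Group consecutive clones that carry the same call into runs.
    runs: List[List[Tuple[float, str]]] = []
    for clone in clones:
        if runs and runs[-1][-1][1] == clone[1]:
            runs[-1].append(clone)
        else:
            runs.append([clone])

    kept: List[Tuple[float, str]] = []
    for i, run in enumerate(runs):
        if 0 < i < len(runs) - 1:
            # Breakpoints are placed midway between flanking clones (assumption).
            left = (runs[i - 1][-1][0] + run[0][0]) / 2
            right = (run[-1][0] + runs[i + 1][0][0]) / 2
            if right - left < min_gap_mb:
                continue  # tight double recombinant: drop this run
        kept.extend(run)
    return kept
```

For example, a single clone scored as IBD with the grandmother that sits 1 Mb inside a long stretch scored as IBD with the grandfather would be dropped by `drop_tight_double_recombinants`, whereas a switch that persists over tens of megabases is retained as a genuine recombination event.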

Acknowledgments

We thank Malcolm Winkler (Lilly Research Laboratories) for the clones for His6-MutH, His6-MutL and His6-MutS repair proteins, Peggy Hsieh (National Institutes of Health) for advice on purification and characterization of the Mut proteins, and Terry Furey and David Haussler (University of California, Santa Cruz) for help with building the GenMapDB track on the UCSC Genome Browser. We also thank Yasmin Cruz (The Children's Hospital of Philadelphia) for technical assistance. This work is supported by grants from the NIH (HG01880, DC00154, and CA87769 to V.G.C.).

References

[1] X. Chen, P.Y. Kwok, Template-directed dye-terminator incorporation (TDI) assay: a homogeneous DNA diagnostic method based on fluorescence resonance energy transfer, Nucleic Acids Res. 25 (1997) 347–353.
[2] D.G. Wang, et al., Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome, Science 280 (1998) 1077–1082.
[3] S. Dong, et al., Flexible use of high-density oligonucleotide arrays for single-nucleotide polymorphism discovery and validation, Genome Res. 11 (2001) 1418–1424.
[4] A. Oliphant, D.L. Barker, J.R. Stuelpnagel, M.S. Chee, BeadArray technology: enabling an accurate, cost-effective approach to high-throughput genotyping, Biotechniques Suppl. (2002) 56–58, 60–61.
[5] X. Chen, P.Y. Kwok, Homogeneous genotyping assays for single nucleotide polymorphisms with fluorescence resonance energy transfer detection, Genet. Anal. 14 (1999) 157–163.
[6] W.M. Howell, M. Jobs, U. Gyllensten, A.J. Brookes, Dynamic allele-specific hybridization: a new method for scoring single nucleotide polymorphisms, Nat. Biotechnol. 17 (1999) 87–88.
[7] T.J. Griffin, J.G. Hall, J.R. Prudent, L.M. Smith, Direct genetic analysis by matrix-assisted laser desorption/ionization mass spectrometry, Proc. Natl. Acad. Sci. USA 96 (1999) 6301–6306.
[8] K. Lindblad-Toh, et al., Large-scale discovery and genotyping of single-nucleotide polymorphisms in the mouse, Nat. Genet. 24 (2000) 381–386.
[9] J.B. Fan, et al., Parallel genotyping of human SNPs using generic high-density oligonucleotide tag arrays, Genome Res. 10 (2000) 853–860.
[10] R.G. Blazej, B.M. Paegel, R.A. Mathies, Polymorphism ratio sequencing: a new approach for single nucleotide polymorphism discovery and genotyping, Genome Res. 13 (2003) 287–293.
[11] N. Risch, K. Merikangas, The future of genetic studies of complex human diseases, Science 273 (1996) 1516–1517.
[12] S.B. Gabriel, S.F. Schaffner, H. Nguyen, The structure of haplotype blocks in the human genome, Science 296 (2002) 2225–2229.
[13] V.G. Cheung, et al., Linkage-disequilibrium mapping without genotyping, Nat. Genet. 18 (1998) 225–230.
[14] S.F. Nelson, et al., Genomic mismatch scanning: a new approach to genetic linkage mapping, Nat. Genet. 4 (1993) 11–18.
[15] F. Mirzayans, A.J. Mears, S.W. Guo, W.G. Pearce, M.A. Walter, Identification of the human chromosomal region containing the iridogoniodysgenesis anomaly locus by genomic-mismatch scanning, Am. J. Hum. Genet. 61 (1997) 111–119.
[16] V.G. Cheung, S.F. Nelson, Genomic mismatch scanning identifies human genomic DNA shared identical by descent, Genomics 47 (1998) 1–6.
[17] L. McAllister, L. Penland, P.O. Brown, Enrichment for loci identical-by-descent between pairs of mouse or human genomes by genomic mismatch scanning, Genomics 47 (1998) 7–11.
[18] W.J. Ewens, R.S. Spielman, H. Harris, Estimation of genetic variation at the DNA level from restriction endonuclease data, Proc. Natl. Acad. Sci. USA 78 (1981) 3748–3750.
[19] D.N. Cooper, B.A. Smith, H.J. Cooke, S. Niemann, J. Schmidtke, An estimate of unique DNA sequence heterozygosity in the human genome, Hum. Genet. 69 (1985) 201–205.
[20] M. Schena, D. Shalon, R.W. Davis, P.O. Brown, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 270 (1995) 467–470.
[21] V.G. Cheung, et al., A resource of mapped human bacterial artificial chromosome clones, Genome Res. 9 (1999) 989–993.
[22] V.G. Cheung, et al., Integration of cytogenetic landmarks into the draft sequence of the human genome, Nature 409 (2001) 953–958.
[23] K.G. Au, K. Welsh, P. Modrich, Initiation of methyl-directed mismatch repair, J. Biol. Chem. 267 (1992) 12142–12148.
[24] N.J. Casna, D.F. Novack, M.T. Hsu, J.P. Ford, Genomic analysis. II. Isolation of high molecular weight heteroduplex DNA following differential methylase protection and formamide-PERT hybridization, Nucleic Acids Res. 14 (1986) 7285–7303.
[25] G. Feng, M.E. Winkler, Single-step purifications of His6-MutH, His6-MutL and His6-MutS repair proteins of Escherichia coli K-12, Biotechniques 19 (1995) 956–965.
[26] M.J. Schofield, et al., The Phe-X-Glu DNA binding motif of MutS: the role of hydrogen bonding in mismatch recognition, J. Biol. Chem. 276 (2001) 45505–45508.
[27] M.J. Schofield, S. Nayak, T.H. Scott, C. Du, P. Hsieh, Interaction of Escherichia coli MutS and MutL at a DNA mismatch, J. Biol. Chem. 276 (2001) 28291–28299.
[28] I.R. Kirsch, T. Ried, Integration of cytogenetic data with genome maps and available probes: present status and future promise, Semin. Hematol. 37 (2000) 420–428.
[29] C. Lengauer, E.D. Green, T. Cremer, Fluorescence in situ hybridization of YAC clones after Alu-PCR amplification, Genomics 13 (1992) 826–828.
[30] A. Yu, et al., Comparison of human genetic and sequence-based physical maps, Nature 409 (2001) 951–953.
[31] X. Pan, S.M. Weissman, An approach for global scanning of single nucleotide variations, Proc. Natl. Acad. Sci. USA 99 (2002) 9346–9351.
[32] Y. Zhang, M. Kaur, B.D. Price, S. Tetradis, G.M. Makrigiorgos, An amplification and ligation-based method to scan for unknown mutations in DNA, Hum. Mutat. 20 (2002) 139–147.
[33] C. Dib, et al., A comprehensive genetic map of the human genome based on 5,264 microsatellites, Nature 380 (1996) 152–154.
[34] K.W. Broman, J.C. Murray, V.C. Sheffield, R.L. White, J.L. Weber, Comprehensive human genetic maps: individual and sex-specific variation in recombination, Am. J. Hum. Genet. 63 (1998) 861–869.
[35] A. Kong, et al., A high-resolution recombination map of the human genome, Nat. Genet. 31 (2002) 241–247.