Genome Mapping


VK Tiwari, Kansas State University, Manhattan, KS, USA
JD Faris, USDA-ARS Cereal Crops Research Unit, Fargo, ND, USA
B Friebe, Kansas State University, Manhattan, KS, USA
BS Gill, Kansas State University, Manhattan, KS, USA
© 2016 Elsevier Ltd. All rights reserved.

Topic Highlights

• Molecular markers are important for genetic and genome mapping studies.
• Next-generation sequencing-based marker genotyping, such as genotyping by sequencing, is an important aid for gene and genome mapping.
• Single-nucleotide polymorphism-based markers: their development and detection.
• Genome mapping methods use recombination-dependent and recombination-independent approaches.
• Comparative mapping is an important tool for genome analysis in crops where sequence information is not available.

Learning Objective



To achieve an understanding of the commonly used molecular markers and approaches used for genome mapping

Introduction

Genome mapping is used to assign short DNA sequences (molecular markers) or specific genes to particular regions of chromosomes and to determine their relative linear orders and distances. A map is an essential tool for scientists to navigate across the genome. Genome maps can be divided into two groups: genetic maps and physical maps. Genetic maps are based on recombination frequencies between genetic markers and genes, and linked markers/genes form linkage groups showing their relative order. A physical map of a given chromosome or a genome shows the physical locations of genes and other DNA sequences of interest, and distances are typically measured in base pairs. Physical maps can be divided into three general types: chromosomal or cytogenetic maps, radiation hybrid (RH) maps, and sequence maps. The ultimate physical map is the complete sequence itself.

Molecular Markers and Their Visualization

DNA-based genetic markers rely on differences in DNA sequences (polymorphisms) between two parental lines. Polymorphisms can result from various factors that lead to either nucleotide changes or differences in DNA segment lengths, such as mutations, errors in DNA replication, and insertions, inversions, and deletions of DNA fragments.

Reference Module in Food Sciences

There are several established approaches for the detection of polymorphisms using molecular markers, including restriction fragment length polymorphism (RFLP), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), sequence-tagged site (STS), microsatellites or simple sequence repeats (SSRs), and single-nucleotide polymorphism (SNP). Originating in the 1980s, RFLP markers were the first type of DNA-based markers to be used. RFLPs involve the use of a restriction enzyme, which cleaves DNA at specific palindromic DNA sequences, and the hybridization of a short labeled DNA fragment, or probe, to the restriction enzyme-cleaved DNA. The probe label reveals the restriction fragment hybridized by the probe, and polymorphisms are revealed when an insertion/deletion occurs between critical restriction sites in one genotype compared to the other or when a particular restriction site is abolished due to mutation in one genotype and not the other. RFLP markers can be applied to essentially any organism, and they are still employed to a limited extent today due to their usefulness in comparative mapping analysis and map-based cloning studies. However, these markers are not amenable to high-throughput analysis, and they are difficult and laborious to handle due to the large amounts of DNA required, enzymatic digestions, Southern blotting, and probe labeling techniques.

Apart from RFLP, all the other marker types are based on the polymerase chain reaction (PCR). PCR-based markers require the development of oligonucleotide primers, which are fragments of DNA typically 15–30 nucleotides in length, to serve as starting points for PCR amplification on template DNA. In a PCR, template DNA is mixed with primers, nucleotides, and a heat-stable enzyme called Taq polymerase, which synthesizes new DNA strands. The mixture is placed into a thermal cycler and subjected to repeated cycles of different temperatures to allow the template DNA to denature, the oligonucleotide primers to anneal to complementary sites on the template DNA, and the Taq polymerase to catalyze the synthesis of new DNA strands, leading to the generation of billions of copies of the target sequence. After the completion of the PCR, the amplified product is electrophoresed through an agarose or polyacrylamide gel and subsequently visualized by DNA staining or other technologies.

RAPD markers are DNA fragments from PCR-based amplification of random segments of genomic DNA with a primer of arbitrary nucleotide sequence. RAPD markers were the first PCR-based markers to be used but, today, have very limited application in molecular biology and mapping studies due to the unpredictability of short primers in PCR and low repeatability. AFLPs, which combine the use of restriction enzymes with PCR, have been used extensively in a wide range of organisms.

http://dx.doi.org/10.1016/B978-0-08-100596-5.00220-1


GENETICS OF GRAINS | Genome Mapping

The AFLP technique uses restriction enzymes to digest the genomic DNA, followed by ligation of adapters to the sticky ends of the restriction fragments to serve as priming sites for PCR. Subsets of the restriction fragments are selected by using primers with sequence complementary to the adapter sequence and also to one or two nucleotides within the restriction fragments of the template DNA. The reactions often employ end-labeled radioactive or fluorescent primers for the visualization of the amplified products on polyacrylamide gels. AFLP has higher reproducibility, resolution, and sensitivity at the whole-genome level than some of the other marker techniques, and it can detect polymorphisms in different genomic regions simultaneously. It also has the capability to amplify multiple fragments (50–100) in a single PCR, which provides a high-throughput format.

STSs are short DNA sequences (200–500 bp) with known genomic locations. STSs can be easily detected by PCR using specific primers. In complex genomes, STS markers derived from the coding regions of genes, that is, from the expressed portion of the genome (expressed sequence tags, ESTs), can be a very useful resource for mapping the locations of expressed genes. These markers are usually codominant in nature, which allows the identification of homozygous and heterozygous individuals in a mapping population. The STS sequences may contain repetitive elements with unique and conserved sequences at both ends of the site, and in a broad sense, the STS class includes markers such as microsatellites, sequence-characterized amplified regions, cleaved amplified polymorphic sequences, and inter-simple sequence repeats.

Microsatellite markers, also called SSRs, are widely used in gene and genome mapping studies. These are simple sequence tandem repeats, and the repeat units are generally di-, tri-, tetra-, or pentanucleotides. In a common trinucleotide repeat motif in wheat, (GAA)n, the three-nucleotide unit GAA is repeated a variable number of times in a bead-like fashion (n could range from 8 to 50). SSRs are usually found in noncoding regions of DNA, with a few exceptions. On both sides of the repeat are flanking regions of nonrepetitive DNA, and these flanking regions are essential for developing locus-specific primers to amplify SSRs with PCR. The number of repeats within a microsatellite tends to be highly variable within a given species, which leads to a high frequency of polymorphism even among closely related individuals.
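The structure of a microsatellite described above (a short unit repeated in tandem, flanked by nonrepetitive sequence) can be illustrated with a small search routine. This is only a minimal sketch: the function name `find_ssrs`, the default thresholds, and the example sequence are illustrative assumptions, not part of any published SSR-mining tool.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=5, min_repeats=8):
    """Return (start, unit, copies) for each tandem repeat found in seq.

    start is a 0-based index; unit lengths of 2-5 correspond to di- to
    pentanucleotide repeats; min_repeats is an assumed threshold.
    """
    # A unit of min_unit..max_unit bases followed by at least
    # (min_repeats - 1) further copies of the same unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits

# Hypothetical sequence: a (GAA)10 repeat between two made-up flanking regions.
example = "ACGTTGCT" + "GAA" * 10 + "TCGATCGG"
print(find_ssrs(example))  # [(8, 'GAA', 10)]
```

In practice, the flanking sequence on each side of a reported repeat would be the place to design locus-specific primers.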

Many large and complex genomes, especially those of some plants, are composed of only about 10–20% gene sequences, whereas the vast majority (80–90%) is composed of transposable element (TE)-related sequences or repeat-based sequences. These repetitive elements, or TEs, are widespread throughout the genome and therefore represent a useful resource for whole-genome mapping. These elements have higher levels of tolerance for mutations or rearrangements, which makes TEs highly polymorphic and a good source of markers for genome mapping. Various TE-based marker development approaches have been used, and some of the most common repeat-based markers, which were developed in wheat, belong to two classes: insertion site-based polymorphism markers and repeat junction markers. These markers are based on PCR with primers designed in conserved regions of TEs. In general, repeat sequences in the genome are not unique, but the insertion sites or repeat junctions are. Therefore, by developing primers that are specific to particular insertion sites or repeat junctions, it is possible to develop genome-specific markers (Figure 1). After the identification of an insertion site or repeat junction, the flanking sequences can be used to design the primers. After the fragment is PCR-amplified, there are various detection methods available for visualizing the marker polymorphisms, including high-resolution melting analysis, temperature gradient capillary electrophoresis, and fluorescent capillary electrophoresis.

With advances in next-generation sequencing (NGS) technology, it is less expensive to determine the DNA sequence of a fragment, and this has led to dramatic advances in high-throughput marker technologies. With restriction site-associated DNA (RAD) markers, the flanking DNA sequence around each restriction site is an integral component for isolation of restriction site-associated tags. The application of the flanking DNA sequences in RAD tag techniques is referred to as a reduced-representation method. The RAD tag isolation procedure has been modified for use with high-throughput sequencing on the Illumina sequencing platform, to reduce error rates and make the process high throughput. Isolated RAD tags can be used to identify and genotype DNA sequence-based polymorphisms such as SNPs, and these polymorphic sites are called RAD markers.

The advent of automated Sanger sequencing and especially recent advances in NGS technologies led to the development of a second generation of markers based on sequence information. SNPs differ by a single nucleotide A, T, C, or G at a given

Figure 1 key: transposable sequences; TE junction; gene or unknown sequence. Panels (a)–(d).

Figure 1 Types of repeat junctions in a given genomic DNA sequence that can be used for designing unique locus-specific markers: (a) A repeat junction between two different transposable elements (TEs). (b) Two repeat junctions with two different TEs (black and green) and an unknown sequence (pink). (c) Repeat junction with a TE on one side and a gene fragment or unknown sequence on the other side. (d) Two repeat junctions (nested) created by a TE inserting into another TE.


locus between different individuals, populations, and parental lines (Figure 2). If this variation occurs between the members of the same population, the variants are considered alleles (e.g., A or T), and most SNPs have only two alleles. SNPs have emerged as the markers of choice because of their abundance and high-throughput detection capacities. There are many ways to identify SNPs, ranging from low-throughput methods such as PCR amplification followed by electrophoresis, sequence detection, and mass spectrometry to high-throughput NGS-based SNP discovery. After generating sequences for SNP discovery, the next step is to detect useful SNPs. Manual identification of putative SNPs had been a major bottleneck for high-throughput SNP calling, but now, there are numerous


software programs available for SNP discovery. These programs (CASAVA, GS Amplicon, BioScope™, NextGENe®, GigaBayes, SNPdetector, PolyScan, etc.) are very important for the development of accurate computational methods for automated SNP calling. There are established approaches and protocols for SNP discovery in many species, and for species with reference genome sequences, NGS reads can be mapped on the reference sequences and SNP discovery can be made. However, SNP discovery can also be done in the species without a reference sequence. There are many assays available for SNP genotyping including Illumina GoldenGate, KASPar, iPLEX Gold technology, and Illumina BeadChips, to name a few (Figure 3(a) and 3(b)). Exciting progress has been made in

Genotype1 TTGGCCTGATTTTAGTGGTACGGCCCCGTCACCCGTGATTGGTGAAGTTGGAATGGAGGA
Genotype2 TTGGCCTGATTTTAGTGGTATGGCCCCGTCACCCGTGATTGGTCAAGTTGGAATGGAGGA

Figure 2 Identification of SNP sites in a DNA sequence: Two SNPs between genomic DNA of two genotypes are shown. The length of the sequence is 60 base pairs, and genotype 1 and genotype 2 show variants at positions 21 [C/T] and 44 [G/C].
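The comparison shown in Figure 2 can be reproduced with a few lines of code. The sketch below assumes the two sequences are already aligned and of equal length; the function name `find_snps` is illustrative, and real SNP-calling software additionally handles read depth, base quality, and insertions/deletions.

```python
def find_snps(seq1, seq2):
    """Return (position, base1, base2) for each mismatch; positions are 1-based."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq1, seq2), start=1)
        if a != b
    ]

# The two 60-bp sequences from Figure 2.
genotype1 = "TTGGCCTGATTTTAGTGGTACGGCCCCGTCACCCGTGATTGGTGAAGTTGGAATGGAGGA"
genotype2 = "TTGGCCTGATTTTAGTGGTATGGCCCCGTCACCCGTGATTGGTCAAGTTGGAATGGAGGA"

print(find_snps(genotype1, genotype2))  # [(21, 'C', 'T'), (44, 'G', 'C')]
```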


Figure 3 (a) Hybridization-based SNP genotyping method (Illumina Infinium assay): In this assay, the genomic DNA is captured by direct hybridization to array-bound target sequences (50 bases directly upstream of the SNP). Following hybridization, a single-base extension reaction with fluorescent dideoxynucleotides is carried out at the target SNP nucleotide. Differences in the relative intensity of fluorescent signals can be used to make genotyping calls. (b) PCR-based genotyping methods (Applied Biosystems’ TaqMan assay): For each locus, two common locus-specific primers are designed, one on each side of the SNP, to amplify the fragment spanning the polymorphic site. Two fluorescence resonance energy transfer (FRET)-labeled oligonucleotides called TaqMan probes are then added to the PCR. Each probe is specific to one of the alleles and is designed to hybridize at the SNP site between the forward and reverse primers. By design, the probes have a reporter dye at their 5′ end (different for each allele) and a quencher (Q) at their 3′ end. If there is no reaction, the probes are intact and the reporter dye’s emission is suppressed by the quencher. During the PCR amplification, the Taq polymerase cleaves the probe that anneals to the template and separates reporter and quencher, resulting in the emission of fluorescence from the reporter. Genotype calls can then be made according to the fluorescent signal.
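Both assays in Figure 3 end with a genotype call made from the relative intensity of two allele-specific fluorescent signals. The sketch below shows one simple way such a call could be made. The function name `call_genotype` and the thresholds are hypothetical and chosen only for illustration; commercial genotyping software typically calls genotypes by clustering the signals of many samples rather than by fixed cutoffs.

```python
def call_genotype(signal_allele1, signal_allele2,
                  min_signal=0.2, het_band=(0.35, 0.65)):
    """Call a genotype from two normalized reporter intensities.

    min_signal and het_band are assumed, illustrative thresholds.
    """
    total = signal_allele1 + signal_allele2
    if total < min_signal:
        return "no call"              # too little fluorescence to decide
    fraction = signal_allele1 / total  # share of signal from allele 1
    if fraction > het_band[1]:
        return "homozygous allele 1"
    if fraction < het_band[0]:
        return "homozygous allele 2"
    return "heterozygous"

# Made-up intensity pairs for four samples.
for pair in [(0.90, 0.05), (0.05, 0.90), (0.50, 0.45), (0.05, 0.05)]:
    print(pair, call_genotype(*pair))
```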

sequencing technologies that are providing high-throughput molecular marker information at low cost. Genotyping by sequencing (GBS) provides marker polymorphisms using NGS technologies followed by a bioinformatics pipeline. It is a preferred method for several reasons, including reduced cost through an enzyme-based genomic complexity reduction step and the use of barcoded adapters for multiplexing. Additionally, it can be used for the discovery and identification of SNPs, even for those species with complex genomes that lack a reference sequence. GBS also has advantages for studying polyploid species, which are a challenge for any genotyping technology. It relies on secondary genome-specific polymorphisms that are next to the SNP, and it allows the assignment of a given sequence to a specific genome so that it becomes a single-locus marker.

Genetic Linkage Mapping

Markers are powerful tools for many diagnostic applications that involve typing biological samples, such as determining the identity of unknown samples and sample mixtures, forensic work in the criminal justice system, and the curation of biological collections, to name a few. High-density genetic linkage maps facilitate map-based cloning, quantitative trait mapping, marker-assisted breeding, and comparative genome evolution. Genetic mapping relies on the fact that nuclear genomes are made up of chromosomes, which contain both genes and noncoding DNA. When homologous chromosomes pair at meiosis, they recombine at various positions along the chromosomes. Thus, recombination is the basis for genetic linkage mapping and determining the order of markers along the chromosome, that is, markers are separated by genetic distances calculated based on the amount of meiotic recombination that occurs between them. An example of genetic linkage mapping of three linked markers in 20 F2 progeny is presented in Figure 4. The markers include two DNA markers (A and B) and one morphological marker (disease resistance gene ‘R’). The DNA markers are codominant, and therefore, all possible genotypes can be determined in the F2 progeny (homozygous for parent A, homozygous for parent B, and heterozygous). For the morphological marker, disease resistance is dominant, and therefore, the genotypic classes of heterozygous and homozygous for the resistant parent (parent A) cannot be distinguished (resistant plants can have allelic compositions of ‘RR’ or ‘Rr,’ and susceptible plants have ‘rr’). Inspection of Figure 4 indicates there are three individuals (2, 6, and 12) with genotypes that differ between markers A and B. Between A and R, there are two individuals (6 and 12) with differing genotypes, and one individual (2) has differing genotypes between markers B and R. This suggests that marker R (disease resistance gene) lies between markers A and B. The two recombination events between markers A and R translate into ten map units (2/20 × 100 = 10), and there are five map units between markers B and R (1/20 × 100 = 5).

Phenotype: R = resistant; S = susceptible
Genotypes: Parent A = RR; Parent B = rr; F1 = Rr; F2 progeny R = RR or Rr, S = rr
Linkage map: B (5 map units) R gene (10 map units) A

Figure 4 Genotypic data of two DNA markers (A and B) and phenotypic data for one morphological marker (disease resistance gene ‘R’) for two parents, the F1 plant derived from crossing the two parents, and 20 F2 individuals. The DNA markers are codominant; thus, all possible genotypes can be distinguished (homozygous for parent A, heterozygous, and homozygous for parent B). The morphological marker ‘R’ is dominant, and therefore, the genotypes of resistant F2 individuals cannot be distinguished (resistant plants can be either homozygous for parent A (RR) or heterozygous (Rr)). The resulting genetic linkage map of the three loci and the genetic distances separating them are shown at the bottom.
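The arithmetic behind the three-locus example of Figure 4 can be sketched as follows. The counts of individuals with differing genotypes (three between A and B, two between A and R, one between B and R, out of 20 F2 plants) are those given in the text; the function names are illustrative, and the per-individual counting is the simplified approach of the example rather than the maximum-likelihood estimation used by mapping software.

```python
def map_units(recombinants, n_individuals):
    """Percent recombination, using the simplified per-individual count."""
    return 100.0 * recombinants / n_individuals

def order_three_loci(distances):
    """Order three loci from their pairwise distances.

    The two loci that are farthest apart flank the third one.
    """
    outer = max(distances, key=distances.get)
    loci = {locus for pair in distances for locus in pair}
    middle = (loci - set(outer)).pop()
    return [outer[0], middle, outer[1]]

n = 20  # F2 individuals surveyed
differing = {("A", "B"): 3, ("A", "R"): 2, ("B", "R"): 1}
distances = {pair: map_units(k, n) for pair, k in differing.items()}

print(distances[("A", "R")])        # 10.0
print(distances[("B", "R")])        # 5.0
print(order_three_loci(distances))  # ['A', 'R', 'B']
```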


This type of analysis can be applied to hundreds, or even thousands of markers to construct complete genetic linkage maps of chromosomes. Fortunately, there are various computer software programs available to handle such large data sets and to determine the most likely marker orders and intermarker distances. The number of individuals surveyed in a mapping population determines the precision of the genetic distance measured. In the example, only 20 individuals were surveyed, and if no recombinants were identified between two markers, this would translate to a genetic distance of 0 map units between the markers. If 100 individuals were surveyed, then one or more recombinants may be identified leading to a genetic distance of one or more map units. Generally, initial genetic maps of plant species are generated using 80–120 individuals, which allows for the detection of recombination between markers one to three map units apart. This level of precision is considered acceptable, and, at the same time, the amount of labor and cost is considered manageable. However, certain mapping experiments such as map-based cloning of genes by chromosome walking require much higher resolution in order to separate markers extremely close to the target gene. In these experiments, it is not uncommon to survey 3000–5000 individuals to obtain the necessary level of precision. In plants, most populations are derived from crossing two highly homozygous parents. The population shown in the example in Figure 4 is an F2 population. While F2 populations are commonly used and generally a good choice for chromosome mapping, other types of populations, such as backcross (BC), doubled-haploid (DH), and recombinant inbred (RI), are also commonly used. However, DH technology is not easily accomplished in some crops, and it is currently impossible in others. Each type of population has its advantages and disadvantages. 
F2, BC, and DH populations can be developed very rapidly, while RI populations are developed by advancing each line by single-seed descent for many generations with the goal of selfing to homozygosity. F2 and BC populations are short-lived and provide limited opportunity to obtain DNA and phenotypic data, while DH and RI populations provide essentially pure lines that may be tested for traits in replicated experiments over several environments if desired. Thus, RI and DH populations are preferred for mapping of quantitative traits that may be affected by environmental influences. BC and DH populations result from one cycle of meiosis, but an F2 population has undergone recombination in both male and female gametes and, therefore, provides twice the recombination information. RI populations have undergone several cycles of meiosis but contain two identical homologues and, therefore, provide about the same amount of information as an F2. The development and analysis of genetic linkage maps lead to an abundance of information regarding genome structure. From a more applied perspective, they provide knowledge regarding the locations of genes and DNA markers associated with them. In a segregating population, morphological markers can be scored and analyzed in the same manner as DNA markers. The difference in scoring for morphological markers compared to DNA markers lies in the fact that, for morphological markers, the genotype is determined based on the visualization of the plant’s phenotype, while DNA markers


are scored at the DNA level. For example, a population segregating for resistance to a particular disease would be scored based on the reaction of each individual to the disease as being one of either parental type. Inclusion of these phenotypic data with genotypic DNA marker data for map generation might reveal that the disease resistance gene is flanked by closely linked DNA markers. Such markers are valuable tools that can be employed by plant breeders who wish to move the disease resistance gene into elite lines for the development of new and improved varieties. Using the markers to make selections is known as marker-assisted selection (MAS). MAS has advantages over selecting for the trait itself in that markers are not affected by environmental factors as phenotypic traits sometimes are. In addition, MAS allows breeders to make selections in early generations and growth stages, allowing them to eliminate undesirable material early on. In short, genetic mapping is a valuable resource for trait mapping and map-based cloning studies; however, a genetic map is not sufficient for sequencing a genome. A low level of polymorphism is often a limitation for genetic mapping; however, advanced and low-cost NGS approaches have helped greatly to overcome this limitation. GBS has been widely accepted and is now being used to map target traits in various crops. In addition, sequence-based genome mapping, where members (mostly RI lines) of a given mapping population are sequenced to high coverage, is now gaining momentum as well. This approach is very useful for generating a large number of sequence tags, which can be assembled and anchored on the genetic map chromosome by chromosome. However, precise ordering of these tags/contigs will be an issue due to the limitation of genetic mapping in terms of the number of recombination events. The resolution of a genetic map depends on the number of recombination events that have been scored in a given population.
Recombination events are not uniformly distributed across the length of the chromosome as recombination is suppressed around the centromeric regions. So reduced or nearly absent recombination events affect the resolving power of linkage analysis, which means that genes that are several kilobases to megabases apart may appear at the same position on the genetic map.
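The dependence of map resolution on population size discussed in this section can be made concrete with a short calculation. The sketch below uses the same simplified per-individual counting as the 20-plant example, and it assumes that individuals are independent and that each one shows a recombinant between two markers with probability equal to the recombination fraction. The function names are illustrative.

```python
def smallest_nonzero_distance(n_individuals):
    """Map units implied by a single recombinant among n individuals."""
    return 100.0 / n_individuals

def prob_no_recombinant(true_distance_mu, n_individuals):
    """Chance of seeing zero recombinants between two markers that are
    really true_distance_mu map units apart (simplified model)."""
    r = true_distance_mu / 100.0
    return (1.0 - r) ** n_individuals

for n in (20, 100, 3000, 5000):
    print(n, smallest_nonzero_distance(n), prob_no_recombinant(1.0, n))
```

Under these assumptions, two markers one map unit apart would show no recombinants in roughly a third of populations of 100 individuals, which is consistent with the much larger populations surveyed for map-based cloning.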

Physical Mapping

In contrast to genetic mapping, where distances between landmarks are calculated based on recombination frequency, physical mapping determines the actual physical distance. Physical mapping can be done cytologically by chemically staining and viewing whole chromosomes using techniques such as in situ hybridization (ISH) and C-banding. Such techniques have very low resolution in terms of physical mapping because chromosomes are viewed at the cellular level, usually at metaphase. However, more recent techniques, such as fiber-fluorescence in situ hybridization (fiber-FISH), in which DNA released from lysed nuclei is extended across a glass slide and used for in situ mapping, can provide a much higher resolution (see succeeding text). The highest-resolution physical mapping is obtained by sequencing the DNA itself. It is usually preceded by constructing local contiguous sequences (contigs) of large-insert DNA clones and anchoring the contigs to a genetic map.


In Situ Hybridization

The ISH technique was developed about 45 years ago and allows the localization of genes or DNA sequences directly on chromosomes in cytological preparations. The ISH technique uses probe DNA that is labeled with biotinylated dUTP or digoxigenin-dUTP, and the hybridization sites are detected by enzymatic reporter molecules such as horseradish peroxidase or alkaline phosphatase-conjugated avidin/streptavidin. ISH has been used successfully to determine the physical location and distribution of dispersed or tandemly repetitive DNA sequences on individual chromosomes. For example, it has been used to determine the physical location of multicopy gene families such as the 5S and 18S–26S ribosomal genes.

Fluorescence in situ hybridization (FISH) uses fluorochromes for signal detection. The FISH technique allows different DNA probes to be labeled with different fluorochromes that emit different colors (multicolor FISH). Thus, the physical order of two or more probes on a chromosome can be determined simultaneously. Also, FISH can allow more precise mapping of probes because the fluorescence signals can be analyzed with special cameras and digital imaging tools. In humans, the order of two DNA probes can be determined by ISH on metaphase chromosomes only if the two sequences are separated by at least 1 Mb. However, when ISH is done using interphase nuclei, DNA sequences separated by as little as 50 kb can be resolved. Plant metaphase chromosomes are more condensed than human metaphase chromosomes, and this may be one reason why ISH using low-copy probes is more difficult in some plant species. Thus, it has been suggested that interphase nuclei can be exploited for ISH mapping in plants. Subsequently, experiments where DNA probes were hybridized to maize interphase nuclei suggested that the resolving power of interphase FISH mapping can be as little as 100 kb.

The FISH technique has been used successfully to determine the physical location of bacterial artificial chromosome (BAC) clones on interphase and metaphase chromosomes. Rice BAC clones have been hybridized to rice (Oryza sativa L.) chromosomes, revealing that the repetitive DNA sequences in the BAC clones could be efficiently suppressed by using rice genomic DNA as a competitor in the hybridization mixture. The successful application of this technique to plants with very large genomes may depend on the size of the genomic clones analyzed and the amount of repetitive sequences in the genome.

Fiber-FISH

The fiber-FISH technique uses chromatin DNA fibers extended across a glass slide. A probe is labeled as in standard FISH and hybridized to the extended fibers, so that DNA sequences that are only a few kilobases apart can be ordered. In humans, fiber-FISH has been used to analyze overlapping clones, detect chromosomal rearrangements, determine the physical distances between genes, measure the sizes of long DNA loci, and aid in the positional cloning of specific genes. Fiber-FISH was used in Arabidopsis thaliana to measure clusters of DNA repeats as long as 1.71 Mb, which is more than 1% of the Arabidopsis genome. It was found that fiber-FISH signals derived from small DNA fragments (<3 kb) were often observed as single spots on extended DNA fibers, and thus, sequences that are less than 5–10 kb apart cannot be ordered.

Single-Copy Gene FISH

Single-copy gene FISH is an approach to develop a cytogenetic map of a given chromosome using full-length cDNA (fl-cDNA) probes. Because genes and gene syntenic blocks are conserved between different grass species such as wheat, barley, rice, and maize, single-copy FISH provides a rapid method for determining chromosome synteny for species for which little genetic or cytogenetic mapping information is available. When transferring important genes from wild relatives to bread wheat (Triticum aestivum L., 2n = 6x = 42, AABBDD) by induced homoeologous recombination, it is important to know the chromosomal relationships of the species involved. Single-copy FISH provides a powerful and rapid method for determining the genetic relationships of relatively little studied wild relatives with those of wheat. Once identified from single-gene markers, fl-cDNA probes are used for FISH, and the respective positions of these probes are determined to develop a cytogenetic map. This technique can also be used to identify structural changes between the homoeologous groups of chromosomes, between the genomes of wheat, and between wheat and other species from the Triticeae tribe. This provides important information on the strategies to be used for exploitation of those species for wheat improvement.

Aneuploid Mapping

Wheat is a polyploid and can tolerate a high degree of aneuploidy (abnormal chromosome numbers). There is a vast array of aneuploid stocks, such as the nullisomic–tetrasomic (NT) lines and the ditelosomic (dt) lines. NT lines lack one pair of chromosomes, which is compensated for by an extra pair of homoeologous chromosomes, and allow genes to be mapped to individual chromosomes. Ditelosomic lines lack one pair of chromosome arms and allow arm mapping of genes. With today’s molecular technology, the power and utility of the wheat aneuploids have been even more fully realized. DNA markers can be quickly located to a specific chromosome or chromosome arm using a single hybridization or amplification reaction without the need for polymorphism. Telocentric chromosomes can be flow-sorted, and their DNA can be amplified and used for NGS for marker development. Dense chromosomal arm maps have been developed and genes identified and ordered to specific chromosome arms. These maps are useful for gene tagging, linkage and mapping of quantitative trait loci (QTL), cytogenetic manipulations, estimation of genetic distance, and evolutionary studies.
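The logic of locating a marker with nullisomic–tetrasomic stocks can be sketched as a simple lookup: a chromosome-specific marker gives no signal only in the line that lacks the chromosome carrying it. In the sketch below, the function name `locate_marker` and the presence/absence scores are hypothetical and serve only to illustrate the reasoning.

```python
def locate_marker(detected_by_missing_chromosome):
    """Infer the chromosome carrying a marker from NT-line results.

    The argument maps the chromosome missing from each NT line to
    True/False for whether the marker was detected in that line.
    Returns the chromosome, or None if the result is ambiguous.
    """
    absent = [chrom for chrom, detected
              in detected_by_missing_chromosome.items() if not detected]
    if len(absent) == 1:
        return absent[0]
    return None  # no absence, or several absences: cannot assign

# Made-up scores for lines nullisomic for the group 1 chromosomes.
scores = {"1A": True, "1B": False, "1D": True}
print(locate_marker(scores))  # 1B
```

The same reasoning applies to ditelosomic lines, with chromosome arms in place of whole chromosomes.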

Chromosome Deletion Mapping

A unique system in wheat is the use of gametocidal (Gc) factors to construct chromosome deletion lines. Gc chromosomes were introduced into wheat by interspecific hybridization with the related Aegilops species and backcrossing. Plants monosomic for the Gc chromosome produce two types of gametes. Only those gametes possessing the Gc chromosome are normal. Gametes lacking the Gc chromosome undergo structural chromosome aberrations and, in most cases, are nonfunctional. However, if the damage caused by the chromosome breakage is not sufficient to kill the gamete, it may still function and be transmitted to the offspring. The Gc system has been used to develop wheat lines with terminal chromosome deletions. These stocks have proved very useful for the physical mapping of genes and DNA markers to subarm locations and for the development of physical maps, which have been constructed for all seven homoeologous chromosome groups of wheat. In addition, chromosome bin maps of most of the expressed genes in the wheat plant have been constructed using a set of wheat aneuploid and deletion lines (http://wheat.pw.usda.gov/wEST/binmaps/).

HAPPY Mapping

Another approach, known as HAPPY mapping, has also been used for genome mapping studies. It is based on haploid DNA samples analyzed using the polymerase chain reaction (hence HAPPY). HAPPY mapping does not require marker polymorphism or time-consuming population development. It is an in vitro approach for ordering DNA markers directly on native genomic DNA and is based on analyzing the segregation of markers amplified from high-molecular-weight genomic DNA. It is a three-step process. First, genomic DNA is isolated and broken into random fragments using gamma irradiation or mechanical shearing. The quality and integrity of the isolated DNA are the most important aspect of the technique, and various protocols have been tested and used to avoid unwanted mechanical breakage of the DNA molecules. Isolation is usually done by embedding living cells in agarose gel; during DNA extraction, the long molecules of chromosomal DNA remain trapped and protected within the agarose. The high-quality DNA is then subjected to random fragmentation by mechanical shearing, gel melting, or x-ray treatment. The average size of the resulting fragments depends on the radiation dose or the amount of mechanical shearing used. The second step is the development of a ‘mapping panel’: the broken DNA fragments are diluted to a very low concentration, and usually 100 samples from each treatment are dispensed into DNA collecting plates or tubes. Because these samples are very small, each well or tube contains a small, incomplete set of random fragments. The third and final step involves a highly sensitive PCR followed by the scoring of markers as present or absent in each member of the HAPPY mapping panel. Genotyping large sets of markers and analyzing the marker data in detail allow maps to be constructed and the locations of markers on a given chromosome or genome to be calculated.
Because the samples in a mapping panel are so small, each one contains only a randomly sampled subset of the markers rather than the complete genome, and any given marker is therefore present in only a subset of the panel members. If two marker loci are close together, they will tend to remain on the same fragment and be present or absent together across the panel, whereas distant markers are usually separated by breaks and segregate independently. As the distance between two markers increases, the frequency of random breaks between them also increases. Statistical analysis of the cosegregation frequencies, using various mapping software packages, can be used to deduce the marker order from the data generated with the HAPPY mapping panel. The approach has certain limitations. The first is that it is difficult to prepare DNA fragments of more than a few megabases in size, and therefore, intermarker distances of more than one megabase are difficult to measure. Another major limitation is the small amount of DNA in each sample of the mapping panel, as all markers need to be mapped by PCR.
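The cosegregation idea can be illustrated with a small Python sketch. It is not the statistical procedure used by HAPPY mapping software (which typically relies on likelihood-based scores); it simply computes, for a made-up panel, the fraction of samples in which exactly one of two markers is present, and then chains markers greedily from a chosen starting marker. The panel data and the function names discordance and greedy_order are hypothetical.

```python
def discordance(a, b):
    """Fraction of panel samples in which exactly one of two markers is present."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_order(panel, start):
    """Chain markers by repeatedly adding the marker least discordant with the last one."""
    order = [start]
    remaining = set(panel) - {start}
    while remaining:
        last = order[-1]
        # sorted() makes tie-breaking deterministic (alphabetical).
        nearest = min(sorted(remaining),
                      key=lambda m: discordance(panel[last], panel[m]))
        order.append(nearest)
        remaining.remove(nearest)
    return order


# Hypothetical scores (1 = amplified, 0 = not amplified) across eight panel samples.
panel = {
    "A": [1, 1, 0, 0, 1, 0, 1, 0],
    "B": [1, 1, 0, 0, 1, 0, 0, 0],
    "C": [1, 0, 0, 1, 1, 0, 0, 0],
}
print(discordance(panel["A"], panel["B"]))  # 0.125
print(discordance(panel["A"], panel["C"]))  # 0.375
print(greedy_order(panel, "A"))             # ['A', 'B', 'C']
```

Markers A and B rarely separate and are therefore inferred to be close, whereas A and C separate more often and are placed further apart.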

RH Mapping

RH mapping is a recombination-independent approach that has been exploited in animal genome mapping projects. It was pioneered in human genetics and uses radiation-induced chromosome breakage rather than meiotic recombination for mapping. After fragmentation, samples containing different subsets of the original chromosome or genome are isolated and used for marker assays. Each member of the mapping panel is assayed for the presence or absence of a given marker, thus circumventing the need for marker polymorphism between genotypes. Goss and Harris produced the first RHs by irradiating cultured human cells with a high dose of x-rays and then fusing them to unirradiated hamster cells. The resulting RHs carried many broken fragments of human chromosomes alongside the unfragmented hamster chromosomes. The approach was then modified and applied to a number of animal species. In the modified approach, donor cells are irradiated and then fused to unirradiated host cells, and RHs containing donor chromosome fragments are identified using selectable markers for a given species. Species-specific RHs can be isolated, cultured, and saved as an immortal resource. For RH mapping, the DNA of 100 hybrid cell lines (each containing a different set of donor fragments) can be assembled as an RH panel. The assembled panel can be used for marker genotyping, from which the order of the markers in a given genome and the distances between them can be inferred. Mapping resolution in an RH panel is a function of the size of the fragments generated during the development of the panel. Therefore, the mapping resolution can be altered simply by changing the level of chromosome fragmentation. Additionally, RH map distances reflect the true physical distance between markers better than distances on recombination-based maps do, so maps constructed by the RH approach better approximate the physical layout of a given chromosome.
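How distances are inferred from an RH panel can be sketched with a simple two-point calculation. The estimator below is a commonly used approximation from the RH mapping literature rather than anything specified in this article: the breakage frequency θ between two markers is taken as the fraction of discordant lines divided by R_A + R_B − 2·R_A·R_B, where R_A and R_B are the marker retention frequencies, and the distance in centiRays is −100·ln(1 − θ). The panel scores and the function name rh_two_point are hypothetical.

```python
import math


def rh_two_point(a, b):
    """Estimate breakage frequency (theta) and distance (cR) between two markers.

    a, b: lists of 1 (retained) / 0 (lost) scores across the same RH lines.
    """
    n = len(a)
    ra = sum(a) / n                      # retention frequency of marker A
    rb = sum(b) / n                      # retention frequency of marker B
    discordant = sum(x != y for x, y in zip(a, b)) / n
    denom = ra + rb - 2 * ra * rb
    if denom == 0:
        raise ValueError("markers retained in all or none of the lines are uninformative")
    theta = min(discordant / denom, 0.9999)   # cap to keep the logarithm finite
    distance_cr = -100 * math.log(1 - theta)
    return theta, distance_cr


# Hypothetical scores for two markers across eight RH lines.
marker_a = [1, 1, 1, 1, 0, 0, 0, 0]
marker_b = [1, 1, 1, 0, 0, 0, 0, 1]
theta, cr = rh_two_point(marker_a, marker_b)
print(round(theta, 2), round(cr, 1))  # 0.5 69.3
```

Real RH mapping software combines such pairwise estimates over many markers (multipoint analysis) to derive a map order.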
The RH approach has been used to map the human genome along with various animal genomes; however, its application in plants has been limited. RH mapping in plants was first reported for a maize chromosome and was then applied to cotton, barley, and wheat. Recently, RH mapping was used for genome mapping of hexaploid wheat. Figure 5 presents a scheme for the development of an RH panel for the D-genome chromosomes of hexaploid wheat. Pollen from the reference hexaploid wheat Chinese Spring was irradiated using gamma radiation, and the irradiated pollen was used to pollinate the tetraploid wheat line Altar 84. The resulting F1 (pentaploid) seeds constitute the RH panel, and each plant grown from these seeds represents a unique RH event. Chromosome lesions induced in the A and B genomes of Chinese Spring are masked in these quasi-pentaploids by the A- and B-genome chromosomes from the tetraploid parent, but the D-genome chromosomes are present in only one copy, which allows RH mapping of all D-genome chromosomes simultaneously. It has been found that even with a small RH panel (94 lines), a map resolution of up to 300 kb can be achieved along the length of any given chromosome in hexaploid wheat. The RH panel can be used to anchor BAC contigs derived from flow-sorted, chromosome arm-specific libraries to individual wheat chromosomes and to order them. RH panels will also be highly useful to ongoing wheat genome sequencing projects for the ordering of sequence scaffolds.

[Figure 5 scheme, panel labels: hexaploid wheat line Chinese Spring (2n = 6x = 42, AABBDD), greenhouse planting; pollen (n = 3x = 21, ABD), gamma irradiation; crossed (X) onto emasculated spikes of tetraploid wheat line Altar (2n = 4x = 28, AABB; egg n = 2x = 14, AB); spikes carrying RH1 seeds harvested about 25 days after pollination, each seed representing a Chinese Spring RH and independent deletion event(s); RH1 (2n = 5x = 35, AABBD); greenhouse planting, tissue collection, DNA extraction; genotyping.]

Figure 5 Development of the Chinese Spring D-genome radiation hybrid panel. The spikes of the hexaploid wheat cultivar Chinese Spring (T. aestivum; 2n = 6x = 42, AABBDD) were used for γ-irradiation. Pollen from the irradiated spikes was immediately used to pollinate the stigmas of emasculated florets (anthers removed) of the tetraploid wheat variety Altar 84 (T. turgidum; 2n = 4x = 28, AABB). Seeds of F1 hybrids were harvested 20 days after pollination. Each surviving F1 seed (RH1, pentaploid) represents, on germination, a unique RH event. DNA samples of the individual RH1 plants were then extracted and genotyped for RH mapping.

Large-Insert Clone Contigs

The construction of physical contig maps is important for facilitating positional cloning of genes, sequencing of genomic DNA, and detailed analysis of chromosome and genome structure. Physical contig mapping is the arrangement of large-insert clones (YACs, BACs, and cosmids) in a linear array that
represents the DNA sequence along the chromosome. Clones are selected by screening a library with the DNA probes used to detect genetic markers on a genetic linkage map of the organism. DNA probes that detect closely linked genetic loci will hybridize to the corresponding large-insert clones, and these clones can then be arranged into a contig based on overlapping segments and fingerprinting. BAC contigs are currently being developed in many crop species. However, crops with complex genomes pose major challenges because of their large genome size, polyploid nature, and very high percentage of repetitive sequences. To address these issues in wheat, a sophisticated flow-sorting technique was applied to isolate individual chromosomes or chromosome arms. The DNA from these flow-sorted chromosomes and arms was used to develop BAC libraries, which laid the foundation for the physical mapping of the wheat genomes. Once a physical contig map is complete, the structure and organization of the genome, such as the distribution of repetitive and single-copy sequences, can be discerned. A BAC-by-BAC approach has been considered the most suitable approach for generating reference genome maps of barley and wheat. In this method, a BAC library for an individual chromosome is the starting point, and BAC contigs are constructed by identifying BACs that contain overlapping fragments. Ideally, the BAC contigs are then anchored onto a genetic or RH map of the genome, so that the sequence data from the contig can
be checked and interpreted by looking for markers or genes known to be present in a particular region. The BACs constituting the minimum tiling path are then individually sequenced by the shotgun method and assembled into a pseudomolecule providing a sequence of each chromosome.
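Selecting a minimum tiling path can be illustrated with a greedy interval-cover sketch in Python. It assumes, purely for illustration, that each BAC has already been placed on the contig with start and end coordinates (in kb); in practice these positions are inferred from fingerprint overlaps and are not known exactly. The clone names, coordinates, and the function minimum_tiling_path are hypothetical.

```python
def minimum_tiling_path(clones):
    """Return the fewest clones that cover the contig from its first to its last base.

    clones: clone name -> (start, end) position on the contig.
    """
    ordered = sorted(clones.items(), key=lambda kv: kv[1][0])
    covered = ordered[0][1][0]                       # leftmost start
    target_end = max(end for _, end in clones.values())
    path = []
    i = 0
    while covered < target_end:
        best = None
        # Among clones starting inside the covered region, take the one reaching furthest.
        while i < len(ordered) and ordered[i][1][0] <= covered:
            if best is None or ordered[i][1][1] > best[1][1]:
                best = ordered[i]
            i += 1
        if best is None or best[1][1] <= covered:
            raise ValueError("gap in contig: no clone extends the covered region")
        path.append(best[0])
        covered = best[1][1]
    return path


# Hypothetical BAC positions (kb) within one contig.
clones = {
    "B1": (0, 120), "B2": (30, 150), "B3": (100, 230),
    "B4": (140, 260), "B5": (220, 340), "B6": (250, 330),
}
print(minimum_tiling_path(clones))  # ['B1', 'B3', 'B5']
```

The greedy choice (always extend coverage as far as possible) yields a smallest covering set for intervals on a line; real pipelines additionally require a minimum overlap between neighbouring clones.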

Comparing Physical Distance to Genetic Distance

Physical maps have yielded a wealth of information regarding the physical locations of morphological traits and evolutionary translocation breakpoints, as well as genome-wide structure and organization. Comparisons of physical maps with genetic linkage maps can reveal the physical distribution of genes and recombination along the chromosome. For example, RFLP probes derived from mRNA (called cDNA probes) represent expressed genes, and thus, the physical mapping of cDNA probes reveals the physical locations of expressed genes. Therefore, when sets of cDNA probes are mapped genetically as well as physically, one can infer the relationship between physical and genetic distances among the common markers. In wheat, physical maps constructed using the chromosome deletion lines have been compared extensively to the corresponding genetic maps of the same chromosomes. This work has revealed that genes and DNA markers tend to be clustered in small physical segments that undergo a high degree of recombination (Figure 6). These gene-rich regions are separated by large gene-poor segments that undergo very little recombination. This work has facilitated BAC contig construction of regions containing genes of interest for the purpose of positional cloning. In barley, physical maps based on translocation breakpoints were compared to the corresponding genetic linkage maps. The results agreed with those found in wheat by deletion mapping and showed that the barley genome consists of relatively small gene-rich regions that are hot spots for recombination, interspersed among large segments that are gene-poor and undergo very little recombination. The information obtained by physical mapping of translocation breakpoints has facilitated the construction of BAC contigs and the positional cloning of important genes by allowing researchers to focus on the gene-rich regions of the genome.
More intricate comparisons of physical and genetic relationships can be obtained by comparing local BAC contigs to genetic maps. The primary goal of such experiments is to identify a large-insert clone containing a gene of interest, but additional important information is obtained. For example, once a physical contig map of the region is developed, it can be compared to the genetic linkage map of the corresponding region to calculate physical to genetic distance ratios. This is important information because recombination is known to be distributed nonrandomly throughout the genomes of many plant species causing the physical to genetic distance ratios to be highly variable depending on the characteristics of the region.
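The physical to genetic distance ratio for each marker interval is a simple quotient, as the following Python sketch shows. The marker names, positions, and the function physical_to_genetic_ratios are hypothetical and chosen only to illustrate how strongly the ratio can vary between a recombination-rich and a recombination-poor interval.

```python
def physical_to_genetic_ratios(markers):
    """Return (interval, kb per cM) for each pair of adjacent markers.

    markers: list of (name, genetic position in cM, physical position in kb),
             ordered along the chromosome.
    """
    ratios = []
    for (n1, cm1, kb1), (n2, cm2, kb2) in zip(markers, markers[1:]):
        d_cm = cm2 - cm1
        d_kb = kb2 - kb1
        # No recombination observed in the interval: the ratio is unbounded.
        ratio = d_kb / d_cm if d_cm > 0 else float("inf")
        ratios.append((f"{n1}-{n2}", ratio))
    return ratios


# Hypothetical markers on one contig.
markers = [("m1", 0.0, 0), ("m2", 2.0, 500), ("m3", 2.5, 5500)]
for interval, ratio in physical_to_genetic_ratios(markers):
    print(interval, ratio, "kb/cM")
# m1-m2 250.0 kb/cM
# m2-m3 10000.0 kb/cM
```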

Comparative Mapping

Much effort has been put forth in comparing the genomic relationships among grasses and among members of other plant families. For example, comparative mapping experiments among members of the Poaceae such as wheat, rice, barley, rye, oat, and maize have revealed remarkable similarities in gene content and marker synteny at the chromosome level. It is well established that DNA probes cloned from these related species commonly identify sets of orthologous loci that lie at approximately the same positions relative to each other and to the centromeres. GenomeZipper-based consensus maps, which integrate ordered gene loci from the homoeologous wheat genomes and the corresponding chromosomes of barley, Ae. tauschii, T. monococcum, and rice, have been constructed. These experiments have shown that the genomes of barley, Ae. tauschii, and T. monococcum are essentially colinear with that of wheat. The genomes of more distantly related cereals such as oat, rice, and maize can be divided into linkage blocks that have homology to corresponding segments of the wheat genome. The degree of genomic similarity observed at the chromosome level among grass genomes led to the notion that information from the small genome of rice could be directly applied to the much larger genome of wheat. However, even though a substantial degree of synteny is observed at the chromosome level, studies of the degree of microcolinearity between rice and wheat show less promise for gene discovery in wheat. Genes whose order is conserved across grass species with sequenced genomes can be used to predict the order of the corresponding genes in other grass species using synteny-based analysis. There have been exciting developments in genome mapping studies in grasses in terms of the development of high-density genetic maps and physical maps. This was followed by the generation of EST databases in cereals. In the recent past, large-scale genome sequencing projects in grasses have been successfully implemented, including those for rice, Brachypodium, sorghum, maize, and foxtail millet.
These studies provided extensive information on the genome organization of the major cereals. Knowledge gained from genome sequencing has enhanced understanding of the structural and functional components of the genome, which can be used effectively in the genetic improvement of cereals. The whole-genome sequence of the diploid model grass Brachypodium (genome size 272 Mb) is available and provides a useful resource for studying genome evolution across the grasses. Among sequenced cereal crops, rice has a smaller genome (420 Mb) and a higher gene density than other cereals; sorghum follows rice with a genome size of 730 Mb, whereas the maize genome is larger (2.3 Gb), has undergone several rounds of genome duplication, and is distinguishable from that of its close relative, sorghum. Reference genome maps of sorghum and foxtail millet are available, and altogether, these reference genome maps provide a great resource for comparative genomics, making it possible to develop mapping information for orphan grasses or cereals for which no genomic information is available. Many software programs and databases have been developed to examine the syntenic relationships of the cereal genomes. Recently, the GenomeZipper approach was developed to provide an extensive database for studying syntenic relationships among grass genomes (wheat, Brachypodium, rice, sorghum, and barley). The GenomeZipper allows systematic exploitation of conserved synteny with model grasses. For example, it allowed the assignment of 86% of the estimated total of 32 000 barley genes to individual chromosome arms.

[Figure 6 image not reproduced: panels ‘Genetic Map’ and ‘Physical Map’ of wheat chromosome 5B, with markers from Xbcd873 to Xpsr580, including tsn1.]

Figure 6 Wheat chromosome 5B genetic linkage map (left) compared to the physical map (right). The genetic linkage map was constructed using a backcross population and the physical map was constructed using the chromosome deletion lines of wheat. On the genetic linkage map, map units separating markers are shown at the left, and markers are indicated on the right. On the physical map, hash marks on the left of the chromosome indicate deletion breakpoints; black and hatched regions on the chromosome represent dark and light C-bands, respectively; and DNA markers and their bin locations are shown to the right. Lines drawn between the maps indicate where deletion breakpoints occur relative to the genetic map. Notice that the centromeric region is nearly void of DNA markers and recombination, while more distal regions possess most of the DNA markers and recombination.
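The idea of deriving a virtual gene order from conserved synteny can be illustrated with a deliberately simplified Python sketch. This is not the GenomeZipper implementation, which integrates several reference genomes and marker data; it only shows the core step of ordering target genes by the position of their orthologs in one sequenced reference genome. All gene identifiers and the function virtual_gene_order are hypothetical.

```python
def virtual_gene_order(reference_order, orthologs):
    """Order target genes by the chromosomal position of their reference orthologs.

    reference_order: reference gene ids in their order along a chromosome.
    orthologs:       target gene id -> reference gene id.
    Target genes without an ortholog in reference_order are left unplaced.
    """
    rank = {gene: i for i, gene in enumerate(reference_order)}
    placed = [t for t, r in orthologs.items() if r in rank]
    return sorted(placed, key=lambda t: rank[orthologs[t]])


# Hypothetical reference (model grass) gene order and target (unsequenced crop) orthologs.
reference_order = ["Ref1", "Ref2", "Ref3", "Ref4"]
orthologs = {"TgtA": "Ref3", "TgtB": "Ref1", "TgtC": "Ref4", "TgtD": "RefX"}
print(virtual_gene_order(reference_order, orthologs))  # ['TgtB', 'TgtA', 'TgtC']
```

Such a predicted order holds only where colinearity is conserved, so it is usually checked against genetically or physically mapped anchor markers in the target species.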

Future Mapping Prospects

The ultimate goal in map construction is to decipher the linear DNA sequences of the full complement of chromosomes of an organism and to use the map information in trait mapping. The whole-genome sequence information available for major cereals such as rice, sorghum, maize, and foxtail millet has revolutionized the understanding of the mechanisms underlying genome evolution in these important cereal crops and has helped unravel important mechanisms in plant growth, development, and tolerance to various biotic and abiotic stresses. The practical applications of genome maps and reference sequences are fully realized only when allelic diversity among diverse germplasm is better understood. In crops where sequence information is not available, comparative genomics-based tools can be very useful for providing a virtual gene order based on synteny. Sequence-ready physical maps of diploid barley chromosomes, a reference sequence of wheat chromosome 3B, and sequence-ready physical maps of some other wheat chromosomes are available. These ongoing efforts in wheat and barley are critical for developing adaptable and high-yielding crops that can meet the challenges emerging in the form of new diseases and changing environmental conditions.

Exercises and Assignments for Revision

• What are molecular markers?
• What are the differences between genetic mapping and RH mapping?
• What are the limitations of HAPPY mapping for developing a genome map?
• Which cereal genomes have been sequenced?
• What is comparative genome mapping?

Exercises for Readers to Explore the Topic Further

• What is the status of cereal crop genome sequencing projects?
• How many wheat chromosomes have been sequenced to date?

See also: Genetics of Grains: Wheat Genetics (00253); Wheat Genomics (00228).

Further Reading

Appels R, Morris R, Gill B, and May C (1998) Chromosome Biology. Boston, MA: Kluwer Academic, p. 401.
Devos KM and Gale MD (2000) Genome relationships: The grass model in current research. Plant Cell 12: 637–646.
Faris JD, Friebe B, and Gill BS (2002) Wheat genomics: Exploring the polyploid model. Current Genomics 3: 577–591.
Feuillet C and Keller B (1999) High gene density is conserved at syntenic loci of small and large grass genomes. Proceedings of the National Academy of Sciences of the United States of America 96: 8265–8270.
Jiang J and Gill BS (1994) Nonisotopic in situ hybridization and plant genome mapping: The first 10 years. Genome 37: 717–725.
Jiang JM and Gill BS (2006) Current status and the future of fluorescence in situ hybridization (FISH) in plant genome research. Genome 49: 1057–1068.
Lander ES and Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.
Liu BH (1997) Statistical Genomics: Linkage, Mapping and QTL Analysis. Boca Raton, FL: CRC Press.
McCarthy LC (1996) Whole genome radiation hybrid mapping. Trends in Genetics 12: 491–493.
Paterson AH (1996) Making genetic maps. In: Paterson AH (ed.) Genome Mapping in Plants, pp. 23–39. Austin, TX: R G Landes Company.
Paux E, Sourdille P, Mackay I, and Feuillet C (2012) Sequence-based marker development in wheat: Advances and applications to breeding. Biotechnology Advances 30: 1071–1088.
Redei GP (1999) Genetics Manual. Singapore: World Scientific, p. 1141.
Tanksley SD, Ganal MW, and Martin GB (1995) Chromosome landing: A paradigm for map-based gene cloning in plants with large genomes. Trends in Genetics 11: 63–68.
Tanksley SD, Young ND, Paterson AH, and Bonierbale MW (1989) RFLP mapping in plant breeding: New tools for an old science. Biotechnology 7: 257–263.