Journal of Cereal Science 41 (2005) 37–46 www.elsevier.com/locate/jnlabr/yjcrs
A universal protocol for identification of cereals
Shane R. McIntosh, Toni Pacey-Miller*, Robert J. Henry
Southern Cross University, Centre for Plant Conservation Genetics (CPCG), PO Box 157, Lismore, NSW 2480, Australia
Received 13 January 2004; revised 18 May 2004; accepted 14 June 2004
Abstract
Plant fingerprinting and identification have increasingly become a focus in commerce and manufacturing, with an emphasis on fast, reliable and cost-effective high-throughput techniques. GBSS1 is a well-conserved single-copy nuclear gene in the grass family with potential for generating a universal approach to grass fingerprinting. Alignment of DNA sequences from Poaceae members identified five well-conserved regions. PCR primers designed to these regions amplified single DNA fragments from all grasses tested. DNA sequencing revealed polymorphism within these DNA fragments, allowing identification at the species level. A universal sequencing primer for Poaceae enabled pyrosequencing through a 28 base pair highly polymorphic region, generating unique pyrograms for rice, wheat, barley and maize. Analysis of the exon/intron composition of GBSS1, including intron length and number, provided another distinct fingerprinting method for grasses. Phylogenetic utility of these fragments was demonstrated by production of phylograms consistent with previously described taxonomic relationships for the Poaceae family. The sequence polymorphism of the GBSS1 gene provides the basis for universal primer design for identification of members of the Poaceae family. The protocols developed may prove more generally useful in the distinction of plant species in other plant families.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: GBSS1; Pyrosequencing; Universal primers
Abbreviations: AFLP, amplified fragment length polymorphism; RFLP, restriction fragment length polymorphism; SNP, single nucleotide polymorphism; SSCP, single-strand conformation polymorphism.
* Corresponding author. Tel.: +61 2 66203405; fax: +61 2 66269129 (T. Pacey-Miller).
0733-5210/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.jcs.2004.06.005

1. Introduction
Conventional plant classifications employ a diverse array of approaches ranging from anatomical and phytochemical characteristics to habitat restrictions. Many of these methods are susceptible to convergent evolution and have become the focus of reassessment. Due to its vast economic importance, many attempts have been made to understand the taxonomic relationships within the Poaceae family, which comprises approximately 10,000 species in some 600–900 genera (Kellogg and Campbell, 1987; Renvoize and Clayton, 1992). Only recently have we been able to examine the genetic material itself to investigate phylogenetic relationships. Many different techniques of genetic analysis have emerged, each with varying degrees of utility. A combination of these techniques may be needed to address taxonomic relationships, depending on the level of classification desired. The advances in DNA technology have enabled analysis of both nuclear and organelle DNA, each proving to be very useful in reconstructing phylogenies and as a tool for identification. The basic assumption is that variations within a defined genetic sequence result from mutations, each of which represents an evolutionary event. Chloroplast DNA (cpDNA) variation has become a standard source of data in studying Poaceae systematics, with two main areas under investigation. The first uses restriction site mapping or restriction fragment polymorphism of the entire chloroplast genome (Davis and Soreng, 1993). The second looks directly at sequence variation (nucleotide polymorphisms and indels) of specific genes including ndhF (Clark et al., 1995; Gaut et al., 1997) and rbcL (Duvall and Morton, 1996; Nishikawa et al., 2002). Each method has proved useful, depending on the level of classification desired.
S.R. McIntosh et al. / Journal of Cereal Science 41 (2005) 37–46
The need for phylogenetic markers from the nuclear genome to complement the growing cpDNA data sets has led to the examination of new genes for plant systematics. Nuclear ribosomal DNA (nrDNA), which encodes the 5.8S, 18S and 28S genes and non-coding regions (the internal transcribed spacers, ITS1 and ITS2), has been a region of choice showing broad utility (Appels and Clarke, 1992; Hsiao et al., 1995; Sastri et al., 1992). Additional nuclear loci under investigation include phytochrome B (Mathews et al., 2000) and granule-bound starch synthase 1 (GBSS1) (Mason-Gamer et al., 1998). Today the focus has shifted away from classical phylogenetic questions and towards more advanced genetic fingerprinting. Current goals are not only to identify plant sources but also to locate genetic markers for desired traits. These applications have been the major focus for DNA technologies in assisting plant breeding programs, maintaining crop consistency, plant forensics and commercial activities. The increasing demand for high-throughput analysis has led research away from labour-intensive gel-based methodologies (i.e. RFLP, SSCP and AFLP) to the analysis of simple sequence repeats such as microsatellites (Cordeiro et al., 2000; Nishikawa et al., 2002) and single nucleotide polymorphisms (SNPs) (Batley et al., 2003; Nasu et al., 2002) as genetic markers (reviewed by Henry, 2001). Identifying small informative regions of DNA has led to the development of new techniques, such as pyrosequencing, that are faster and more cost-effective (Ching and Rafalski, 2002; Fakhrai-Rad et al., 2002; Nordstrom et al., 2000). Here we evaluate the use of the GBSS1 gene to describe a system that will enable simple and rapid identification of grasses. GBSS1 is an ADP–glucose–starch glucosyltransferase involved in amylose synthesis in plants (McLauchlan et al., 2001). GBSS1 is a well-conserved gene within Poaceae and, due to its commercial importance, has undergone much investigation. Its utility in
phylogenetics and sequence polymorphism surrounding desired traits has been investigated (Mason-Gamer et al., 1998; McLauchlan et al., 2001). The level of GBSS1 sequence homology within grasses and its structural characteristics has led us to identify a number of simple, sequence based techniques for rapid plant identification. In addition to their potential in classical phylogenetics and plant identification there is opportunity for traceability within mixed samples in cereal processing and food quality control. The investigation of the constituents of an unknown cereal sample highlighted the need for a simple test to identify contaminants or multiple sources of ingredients in processed products. A set of potential universal primers for use in PCR and pyrosequencing were identified and assessed for their utility in fingerprinting grasses and preliminary tests have been undertaken for mixed cereal identification.
2. Experimental

2.1. Primer design
The nucleotide sequences of GBSS1 from various grasses (GenBank accession numbers: wheat, X57233; rice, X62134; barley, X07932; maize, X03935) were retrieved from the NCBI database and aligned using the Clustal W program (Thompson et al., 1994) at the EMBL-European Bioinformatics Institute (http://www.ebi.ac.uk/Tools/). Sequences were aligned in FASTA format to identify regions of homology between these grasses. Three primer pairs were designed to conserved areas that encompassed highly polymorphic regions (Fig. 1). Primers were assessed with Primer Premier Version 5.0 (Premier Biosoft International, Palo Alto, CA).
Fig. 1. Schematic representation of the Rice GBSS1 genomic sequence showing the position and direction of the primers used for PCR and sequencing. The sequence of the primers is also indicated including approximate melting temperatures. The PCR primer combinations are as follows: A/B; C/D and C/E.
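The search for conserved priming regions in an alignment can be illustrated with a short script. This is only a minimal sketch and not the procedure used in the study, where the alignment was produced with Clustal W and inspected directly. The function name `conserved_windows`, the window width and the toy sequences are all hypothetical and chosen for illustration.

```python
def conserved_windows(aligned, width=20, min_identity=1.0):
    """Return (start, end) windows (0-based, end exclusive) in which the
    fraction of columns identical across all sequences is >= min_identity.

    `aligned` is a list of equal-length, pre-aligned sequences; gap
    columns ('-') are never counted as identical.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    # One boolean per alignment column: True when every sequence agrees.
    identical = [len(set(col)) == 1 and col[0] != "-" for col in zip(*aligned)]

    hits = []
    for start in range(length - width + 1):
        fraction = sum(identical[start:start + width]) / width
        if fraction >= min_identity:
            hits.append((start, start + width))
    return hits


# Hypothetical toy alignment (not GBSS1 data): one variable column at index 4.
toy = ["ACGTACGTAC",
       "ACGTTCGTAC",
       "ACGTACGTAC"]
windows = conserved_windows(toy, width=4)
```

With real data, `width` would be set to the intended primer length and `min_identity` relaxed slightly if a few degenerate positions are acceptable.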
Table 1
Summary of grasses assayed

Genus/species                           DNA bank accession number
Oryza sativa var. Nipponbare            AC01-1001045
Oryza sativa var. Amaroo                AC01-1001023
Oryza sativa var. Basmati 370           AC01-1001016
Oryza sativa var. Calrose               AC01-1001028
Oryza longiglumis                       AC01-1002314
Oryza alta                              AC01-1002317
Oryza barthii                           AC01-1002318
Oryza australiensis                     AC01-1002319
Oryza glumaepatula                      AC01-1002321
Oryza officinalis                       AC01-1002322
Oryza rufipogon                         AC01-1002323
Potamophila parviflora                  AC01-1001064
Triticum aestivum var. Tasman           AC01-1001529
Triticum aestivum var. Sunco            AC01-1001530
Triticum aestivum var. Katepwar         AC01-1001531
Triticum aestivum var. Sunstar          AC01-1001533
Triticum aestivum var. Egret            AC01-1001534
Triticum aestivum var. Halberd          AC01-1001535
Triticum aestivum var. Cranbrook        AC01-1001536
Triticum aestivum var. Chinese Spring   AC01-1001537
Triticum monococcum                     *
Triticum speltoides                     *
Triticum tauschii                       *
Hordeum vulgare var. Beecher            AC03-1002442
Hordeum vulgare var. Skiff              AC03-1002454
Hordeum vulgare var. Galleon            AC03-1002446
Hordeum vulgare var. Tallon             AC03-1002457
Hordeum vulgare var. Arapalies          AC03-1002460
Hordeum spontaneum var. Iran            *
Hordeum spontaneum var. Jordon          *
Hordeum spontaneum var. Turkey          *
Zea mays var. PoblacionXV               AC03-1003064
Saccharum spontaneum                    AC03-1003161
Secale cereale                          AC03-1002636
Eleusine coracana                       AC03-1003346
Festuca canariensis                     AC03-1003349
Cynodon transvaalensis                  AC03-1003344
Setaria italica                         AC03-1003059

*Sourced from other collections.

Table 2
Summary of plant species assayed

Genus/species                           DNA bank accession number
Ipomoea batatas                         *
Arabidopsis thaliana                    *
Corymbia variegata                      *
Pinus caribaea var. Caribaea            *
Acronychia littoralis                   AC01-1000955
Elaeocarpus williamsianus               AC01-1002128
Austromyrtus fragrantissima             AC01-1001094
Lycopersicon esculentum                 *

*Sourced from other collections.

2.2. PCR amplification conditions and sequencing
All seeds and genomic DNA were provided, or sourced from private collections, by the Australian Plant DNA Bank (www.dnabank.com.au) with the corresponding accession numbers (Tables 1 and 2). Seedlings were grown to approximately 10 cm. Genomic DNA was extracted from fresh leaf material by the CTAB method (Maguire et al., 1994). PCR was performed in a total volume of 50 µl containing 50 mM Tris–HCl (pH 9.0), 15 mM ammonium sulfate, 7 mM MgCl2, 0.17 mg/ml BSA (Sigma Fraction V), 0.05% NP40, 3.75 mM dNTPs, 50 pmol or 500 ng each primer, 5 U Taq DNA Polymerase (Roche, Australia) and 20 ng genomic DNA. After an initial denaturation at 94 °C for 4 min, 30 cycles of amplification were carried out starting at 94 °C for 30 s, followed by 40 s at 62 °C (for AB) or 57 °C (for CD and CE), with a final extension at 72 °C for 35 s. PCR products were analysed on a 2% agarose gel. Sequencing was done by the Australian Genome Research Facility (AGRF, Brisbane, Australia) and analysed using Chromas software. Sequences were edited and aligned with Clustal W using default settings (Thompson et al., 1994).

2.3. Pyrosequencing
Pyrosequencing was performed on fragment CD for the four grasses wheat, rice, barley and maize. A biotinylated CD PCR product was generated using a 5′-biotin-labelled D primer and the PCR conditions described for CD above. PCR products were analysed by agarose gel separation to assess the quality of the product. Desalting and removal of unincorporated nucleotides and primers were carried out using a PCR clean-up kit (Qiagen, Germany). The following pyrosequencing method was adapted from that supplied with the SQA Sequencing kit (Pyrosequencing AB, Uppsala, Sweden). All pyrosequencing was performed by Southern Cross Plant Genomics (Lismore, Australia).

2.4. ssDNA production
Biotinylated PCR product (20 µl) was incubated with 8 µl Streptavidin Sepharose high performance beads (Amersham Pharmacia, England) in the presence of 12 µl Binding Buffer (10 mM Tris–HCl, pH 7.6; 2 M NaCl; 1 mM EDTA; 0.1% Tween 20) for 10 min at room temperature. dsDNA was denatured using 50 µl of 0.2 M NaOH for 1 min at room temperature, then washed twice with 150 µl of 10 mM Tris–acetate, pH 7.6.

2.5. Primer annealing and sequencing
Two pyrosequencing primers (CDpyroseq1: 5′-CTGCATCCACAACATCT and CDpyroseq2: 5′-ACAACATCTCCTACCAG) were examined (Fig. 3). The ssDNA isolated was incubated with 50 µl of annealing buffer (20 mM Tris–acetate, pH 7.6; 2 mM magnesium acetate) and 12.5 pmol of primer. Annealing of sequencing primer to template was performed at 80 °C for 2 min then allowed to
cool to room temperature. Pyrosequencing was performed automatically with a PSQ 96 system (Pyrosequencing AB, Uppsala, Sweden) using the SQA reagent kit as per the manufacturer's instructions.

2.6. Mixed sample analysis
Preliminary analysis was undertaken to examine the usefulness of PCR and DNA sequencing for traceability in mixed cereal samples. Genomic DNA from the grasses wheat, barley, rice and maize was randomly combined. Combinations of two or more templates were amplified using PCR, separated on an agarose gel and sequenced as previously described. The utility and accuracy of pyrosequencing were also examined on these mixtures.

2.7. Phylogenetic analysis
Coding DNA sequences of fragments AB and CD for rice (Oryza sativa var. Nipponbare), wheat (Triticum aestivum var. Chinese Spring), barley (Hordeum vulgare var. Arapalies) and maize (Zea mays var. PoblacionXV) were analysed with PAUP*4.0b10 (Swofford, 2002). A phylogenetic tree was generated by a heuristic search and Nearest Neighbour Interchange with steep descent.
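For only four taxa there are just three possible unrooted tree topologies, so the parsimony criterion can be illustrated by scoring all three directly. The sketch below does that with the Fitch small-parsimony count. It is an illustrative substitute for, not a reproduction of, the PAUP* heuristic search described above. The function name `quartet_score` and the four short sequences are hypothetical and are not GBSS1 data.

```python
# The three unrooted topologies for four taxa, written as two sister pairs.
TOPOLOGIES = [
    (("rice", "maize"), ("wheat", "barley")),
    (("rice", "wheat"), ("maize", "barley")),
    (("rice", "barley"), ("maize", "wheat")),
]


def quartet_score(seqs, topology):
    """Fitch parsimony length of one quartet topology.

    `seqs` maps taxon name -> aligned sequence (equal lengths assumed).
    The tree is rooted on its single internal edge for the calculation,
    which does not change the parsimony length.
    """
    (a, b), (c, d) = topology
    score = 0
    for sa, sb, sc, sd in zip(seqs[a], seqs[b], seqs[c], seqs[d]):
        left = {sa} & {sb}
        if not left:              # sister taxa disagree: one change
            left = {sa, sb}
            score += 1
        right = {sc} & {sd}
        if not right:
            right = {sc, sd}
            score += 1
        if not (left & right):    # change along the internal edge
            score += 1
    return score


# Hypothetical toy alignment, for illustration only.
toy = {
    "rice":   "AACG",
    "maize":  "AATG",
    "wheat":  "CCTG",
    "barley": "CCTA",
}
scores = [quartet_score(toy, t) for t in TOPOLOGIES]
best = TOPOLOGIES[scores.index(min(scores))]
```

With more than a handful of taxa the number of topologies grows too quickly for this exhaustive approach, which is why heuristic searches with branch swapping are used in practice.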
2.8. Structural analysis of introns
To analyse the intron-length polymorphism of GBSS1, genomic sequences for wheat (GenBank accession AB019622), rice (GenBank accession X53694), barley (GenBank accession X07931) and maize (GenBank accession X03935) were edited to remove all coding sequence. The length of each intron was measured in base pairs and compared to generate a bar graph.
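The intron-length measurement can be sketched as follows, assuming exon boundaries are available as coordinates on the genomic sequence (for example from a GenBank annotation). The function name `intron_lengths` and the coordinates shown are hypothetical and do not come from the accessions listed above.

```python
def intron_lengths(exons):
    """Return the length in bp of each intron lying between consecutive exons.

    `exons` is a list of (start, end) pairs in 1-based inclusive genomic
    coordinates, sorted by position along the gene.
    """
    lengths = []
    for (_, end_prev), (start_next, _) in zip(exons, exons[1:]):
        if start_next <= end_prev:
            raise ValueError("exons overlap or are not sorted")
        # Bases strictly between the end of one exon and the start of the next.
        lengths.append(start_next - end_prev - 1)
    return lengths


# Hypothetical exon coordinates, for illustration only.
example_exons = [(1, 100), (201, 300), (351, 400)]
example_introns = intron_lengths(example_exons)
```

Applying the same function to each species gives one list of lengths per species, and both the list length (intron number) and its values (intron sizes) can then be compared or plotted as a bar graph.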
3. Results

3.1. Sequencing
Alignment of GBSS1 cDNA sequences from different grasses, including species representing diverse groups within the grass family, revealed blocks of sequence that are highly conserved (data not shown). From these alignments, priming sites were identified in a number of conserved regions that would amplify fragments of GBSS1 in all of the grasses analysed. Three regions of GBSS1 showing varying degrees of polymorphism were identified and selected for further study. These regions were chosen on the basis that they would (1) be expected to work on all grasses, (2) amplify a region of high polymorphism and (3) have priming sites that fall completely within an exon and are not near the ends of exons (based upon analysis of available sequence information). The gene contains many small exons and introns, so finding sites where priming would not be affected by a shift in an intron boundary was a challenge.

PCR amplification from 20 ng of genomic DNA produced a single band for rice (Oryza sativa var. Nipponbare), wheat (Triticum aestivum var. Chinese Spring), barley (Hordeum vulgare var. Arapalies) and maize (Zea mays var. PoblacionXV). Each of the three regions amplified revealed variations in the size of the fragments between the species tested. This variation was found to result from intron sequence polymorphism. Intron sequences were removed prior to alignment. Alignment of fragment AB (Fig. 2) shows two almost totally homologous domains, bases 1–20 (the binding site for primer A) and bases 124–143 (the binding site for primer B). This fragment was approximately 23% polymorphic in sequence, providing ample variation to distinguish one genus from another. Base position 72 was identified as a possible SNP to discriminate all four genera. This base differed between rice, wheat, barley and maize and was consistent within the small set of variants tested (data not shown). Bases 60–73 would also provide good utility for SNP identification by pyrosequencing.

Fig. 2. Alignment of amplified DNA sequences. The 143 bp fragments amplified with primers A and B from rice, wheat, barley and maize were aligned by Clustal W with polymorphic bases highlighted. The binding sites for primers A and B are indicated.

The second GBSS1 region amplified, termed CD (Fig. 3), is a 151 bp fragment, once again bounded by two very well-conserved domains, bases 1–20 (the binding site for primer C) and bases 132–151 (the binding site for primer D). This region was found to have 29% sequence variation. It was established that the analysis of a relatively small number of these polymorphisms could differentiate between these grasses accurately.
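One way to arrive at a percent-polymorphism figure, and to find positions that separate every species from every other, is to scan the alignment column by column. The sketch below assumes polymorphism is counted as the fraction of alignment columns in which the sequences are not all identical, which may differ from how the published figures were calculated. The function names and the five-base toy sequences are hypothetical.

```python
def percent_polymorphic(aligned):
    """Percentage of alignment columns where not all sequences agree."""
    columns = list(zip(*aligned.values()))
    variable = sum(1 for col in columns if len(set(col)) > 1)
    return 100.0 * variable / len(columns)


def discriminating_positions(aligned):
    """1-based positions where every sequence carries a different base,
    so that a single position distinguishes all taxa."""
    n_taxa = len(aligned)
    return [i for i, col in enumerate(zip(*aligned.values()), start=1)
            if len(set(col)) == n_taxa]


# Hypothetical toy alignment, for illustration only.
toy = {
    "rice":   "ACGTA",
    "wheat":  "ACGCA",
    "barley": "ACGGA",
    "maize":  "ATGAA",
}
pct = percent_polymorphic(toy)
positions = discriminating_positions(toy)
```

In this toy case two of five columns vary, and only position 4 separates all four sequences at once, analogous to the role described for base 72 of fragment AB.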
To extend the utility of this CD region of GBSS1, the amplified fragment was extended to include another 100 bp of sequence downstream of the priming site D (amplified with primer E, bases 232–251). Alignment of this larger 251 bp fragment from the four species (Fig. 3) identifies more SNP positions, giving the CE fragment a total of 23.7% polymorphism. Depending on the level of differentiation required, polymorphisms may be examined within the smaller fragment or extended to include the larger fragment. A cross section of both wild and cultivated wheat, barley and rice (Table 1) has been assayed to identify the extent of sequence polymorphism between and within species. The variation within the amplified fragments showed no varietal differences in the rice, barley or wheat cultivars and was found to be limited when comparing related species. Wheat contained a small number of SNPs between species, but there was no variation evident within the rice or barley species examined in fragment CD. To test the utility of the highly conserved primers C and D for their ability to amplify and identify a more diverse range of grasses, PCR amplification was performed on members representing the genera Saccharum, Secale,
Eleusine, Festuca, Setaria and Cynodon (see Table 1). Alignment of available GBSS1 sequences from more diverse plant species, which include dicots and non-flowering species (Table 2), reveals highly conserved truncated regions in both primers C (bases 1–17) and D (bases 133–149), as shown in Fig. 3. This truncated primer pair showed greatest utility outside of grasses and in all cases we were able to amplify a single PCR fragment (data not shown).

3.2. Pyrosequencing
Two approaches were taken to generate ‘fingerprints’ or pyrograms for the four grasses in question: rice, maize, wheat and barley. For pyrosequencing we targeted fragment CD as it has a hypervariable region (bases 32–60) close to a well-conserved region (bases 1–32). Pyrosequencing from biotin-labelled ssDNA has been found to give the best results (Pacey-Miller and Henry, 2003). Hence the fragment to be pyrosequenced was labelled at the 5′ end of the D primer. Two pyrosequencing primers, CDpyroseq1 and CDpyroseq2, were designed upstream of this hypervariable region (Fig. 3). The first technique, termed ‘non-cyclic’, employs a pre-determined base dispensation sequence. The nucleotide sequence of the fragment in question needs to be known for this process. A dispensation order which provides for all four grasses can be produced. The dispensation order is the same for homologous bases; when a SNP position is reached, a combination of nucleotides is dispensed to account for the sequence variation. A control base dispensation is then released so that the sequencing returns to alignment. Using a dispensation order assessed with sequencing primers CDpyroseq1 and CDpyroseq2, accurate pyrograms showing SNPs were generated (data not shown). The CDpyroseq2 primer proved to be the most useful as it was closer to the hypervariable region and routinely gave approximately 35 bases of sequence data.
The second method, termed ‘cyclic’, does not call for a specific dispensation sequence but uses a more standard sequencing approach. All four nucleotides are presented in a cyclic fashion (i.e. CGAT, CGAT, etc.) at each nucleotide position on the template. The correct base is incorporated into the sequence. Using CDpyroseq2 primer with a cyclic protocol we generated consistent fingerprints for the species tested (Fig. 4). Individual peaks, but more often a combination of peaks, are used to differentiate between these plant species (highlighted bases, Fig. 4). The Y-axis of the pyrograms uses arbitrary units representing light intensity produced during the pyrosequencing chemistry. Specific peak heights are not comparable between different runs; therefore it is the presence or absence of nucleotide incorporation at the SNP positions that is assessed. Varieties within a species (Table 1) were examined to observe the consistency of peak patterns (pyrograms not shown). Within the varieties tested no SNPs were identified in this 30 base region and consistent pyrograms were produced for each species.
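The cyclic protocol can be mimicked with a few lines of code. The sketch below produces an idealised pyrogram, assuming noise-free chemistry in which each dispensation incorporates the complete homopolymer run of the dispensed base and nothing else. The function name `cyclic_pyrogram`, the dispensation cap and the example read are hypothetical. `read` is the strand being synthesised, 5′ to 3′, starting immediately after the sequencing primer.

```python
def cyclic_pyrogram(read, order="CGAT", max_dispensations=200):
    """Simulate an idealised cyclic pyrogram.

    Returns a list of (nucleotide, count) pairs, one per dispensation,
    where count is the number of bases incorporated (0 = no peak).
    """
    position = 0
    signal = []
    dispensation = 0
    while position < len(read) and dispensation < max_dispensations:
        base = order[dispensation % len(order)]
        incorporated = 0
        # Extend through the whole homopolymer run of the dispensed base.
        while position < len(read) and read[position] == base:
            incorporated += 1
            position += 1
        signal.append((base, incorporated))
        dispensation += 1
    return signal


# Hypothetical read, for illustration only.
example = cyclic_pyrogram("GTTTAC")
peaks = [(base, n) for base, n in example if n > 0]
```

Because peak heights are not comparable between runs, a comparison between species would use only which dispensations give a signal and, approximately, how many bases each represents, as in the `peaks` list.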
Fig. 3. Alignment of amplified DNA sequences. The 251 bp fragments amplified with primers C and E from rice, wheat, barley and maize were aligned by Clustal W with polymorphic regions highlighted. Amplification with the nested primer D generates a fragment of 151 bp (bases 1–151). Primer binding sites are highlighted, underlined regions 1 and 2 indicate the binding sites for the respective pyrosequencing primers (CDpyroseq1) and (CDpyroseq2).
Fig. 4. Pyrosequencing analysis. Pyrograms for rice, maize, barley and wheat (top to bottom) produced by priming with sequencing primer CDpyroseq2 (Fig. 3) using a cyclic dispensation. The highlighted peaks indicate some of the main polymorphic regions. Y-axis units are arbitrary representing light intensity.
The comparison of pyrograms between species shows obvious differences in incorporation at the SNP positions (Fig. 4). The first differential position, for example, contains a single G in rice, two Gs in maize and a GC in both barley and wheat. Rice contains a TTT sequence in the second SNP position whereas the other species contain only a TT. Further downstream, rice displays an incorporation of TTAC, maize CTAC, barley CTTT and wheat CTTC. Continued differences in the sequences can be seen downstream in Fig. 4. It can be seen from these results that the pattern obtained for each of the species gives a more consistent evaluation for identification than examining individual SNPs.
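The advantage of using a pattern over a single SNP can be shown with a small lookup built from the three differential positions described above. The incorporations in `REFERENCE` are those quoted in the text for Fig. 4. The dictionary layout and the function name `candidates` are hypothetical, and a practical implementation would need to read these positions from real pyrogram output.

```python
# Incorporations at the first three differential positions, as described
# in the text for Fig. 4.
REFERENCE = {
    "rice":   ("G",  "TTT", "TTAC"),
    "maize":  ("GG", "TT",  "CTAC"),
    "barley": ("GC", "TT",  "CTTT"),
    "wheat":  ("GC", "TT",  "CTTC"),
}


def candidates(observed):
    """Return the species whose reference pattern starts with `observed`,
    a sequence of incorporations read at successive differential positions."""
    observed = tuple(observed)
    return sorted(species for species, pattern in REFERENCE.items()
                  if pattern[:len(observed)] == observed)


first_only = candidates(["GC"])                 # one position examined
first_two = candidates(["GC", "TT"])            # two positions examined
all_three = candidates(["GC", "TT", "CTTC"])    # full pattern
```

With these reference values, barley and wheat are indistinguishable at the first two positions and are separated only when the third is included.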
3.3. Mixed sample analysis
Combinations of two or more samples of genomic DNA from rice, wheat, barley and maize were amplified using PCR primers C and D as previously described (data not shown). The resulting fragments were separated on a 2% agarose gel, excised and subjected to standard sequencing protocols. Positive identification was achieved on a limited sample set. The data suggest that further optimisation of protocols is needed before definitive traceability is achieved on mixed samples; however, this technique has the potential to achieve this outcome. Pyrosequencing has shown less promise for this application because of the difficulty in interpreting pyrograms.

3.4. Phylogenetics
To examine the relationship between these grasses and the use of GBSS1 as a phylogenetic tool, the DNA sequence data from combining both AB and CD fragments of rice, wheat, barley and maize were evaluated with PAUP*4.0b10 (Swofford, 2002). Using a heuristic search and nearest neighbour interchange with steep descent, only one consensus phylogram was produced (Fig. 5). This displays wheat and barley as more closely related and rice and maize as less closely related. We repeated this process using only fragment AB and only fragment CD DNA sequences. Phylograms were produced that contained the same phylogenetic groupings as the combined fragment, with minor variation in distance between branches (data not shown). These genetic distance relationships are indicated by numbers representing the evolutionary events (changes) required to separate these grasses into their relative taxa.

Fig. 5. Phylogram of 4 members from the Poaceae family generated from fragment AB and CD nucleotide sequences (see Figs. 2 and 3). This is the only parsimonious Neighbour Joining, unrooted tree found by PAUP*4.0b10. The numbers on the branches refer to number of changes (evolutionary distance) that have occurred to separate the four genera.

3.5. Intron analysis
The structural makeup of GBSS1 indicates a possible tool for use in fingerprinting plants. As an example of its utility we analysed wheat, rice, barley and maize genomic sequences. Each intron was spliced out and its length measured in base pairs. The lengths of the introns were compared, generating an intron length polymorphism map, illustrated by a bar graph (Fig. 6). Adding greater utility to this fingerprint is the variation in the number of introns for a given species. This is highlighted in rice and maize, which have two extra introns inserted into exons four and six of the GBSS1 locus; these two regions then become intron 4 and intron 7 in rice and maize.

Fig. 6. Intron comparison of four cereal genomic sequences. The genomic sequences for rice, maize, wheat and barley were analysed and the intron lengths were calculated in base pairs, which is represented on the Y-axis. The X-axis represents the intron number.

4. Discussion
Single/low-copy number genes in plants are a rich source of phylogenetic information. They have great potential in the improvement and confirmation of phylogenetic reconstruction at all taxonomic levels, especially where markers from cpDNA and nrDNA fail to build strong phylogenetic hypotheses. Plant phylogenetics and plant identification studies, however, still rely on a few universal markers derived primarily from cpDNA and nrDNA sequences. The utility of these sequences is broad, but many factors, including inheritance patterns and rates of sequence divergence, have limited their usefulness. Biologists face even more challenging situations when trying to follow gene flow or reconstruct speciation in hybrids. This is a result of inheritance pathways where we see predominantly uniparental (maternal) inheritance of cpDNA, or homogenization of biparentally inherited nrDNA. Phylogenetic utility of nrDNA spans a broader taxonomic range but relationships between distant genes and more closely related species are poorly resolved. This occurs because the coding sequences of the ribosomal subunits are highly conserved and, at the other extreme, the internal transcribed spacers are too polymorphic to identify any relationships (Soltis and Soltis, 1998). Identifying nuclear genes and a wider variety of universal markers that overcome these limitations will prove useful in plant identification and systematics. Initial in silico sequence comparisons of the nuclear gene GBSS1 identified a level of homology in both coding and non-coding sequence that will be useful across a broad taxonomic range. We have identified a number of conserved
regions for primer design enabling amplification of both coding and non-coding sequence in all grasses tested. Comparisons of the entire GBSS1 coding region between closely related species revealed three to five percent polymorphism but still enabled individual identification. For example, exon sequence alignment of the three diploid wheat progenitors reveals 7 SNPs in fragment CE and 2 SNPs in fragment AB, as shown by Yan et al. (2000). The polyploid nature of wheat did not result in mixed sequences in these regions of the gene and therefore did not complicate the SNP analysis. This level of homology was also observed for a cross section of Hordeum species analysed, although the sequence of fragment CD revealed no polymorphisms. Combining coding sequence data sets from the three fragments targeted will provide an accurate fingerprint at the species level. This level of sequence conservation, or lack of evolution within coding regions, is not unexpected for a well-conserved nuclear gene and allows alignments that identify relationships. The polymorphism between aligned CD sequences of GBSS1 coding regions could identify variation at the genus level but was limited at the species level, depending on the genus examined. No application was observed for varietal identification; a more rapidly evolving gene, or examination of the intron sequences of GBSS1, would be required for this purpose. Analysis of intron sequences in fragments AB and CD from closely related species and cultivars reveals a higher level of polymorphism that can be targeted for identification at this level. The number, sequence and length of introns show the greatest evolution and variation due to the presence of a wide range of insertions and deletions (indels). Analysis of these introns identified fingerprints for four grasses based solely upon the number and length of the introns across the entire GBSS1 locus. 
Highlighting the polymorphic nature of these sequences is the occurrence of additional exons in rice and maize that have resulted from the insertion of two non coding sequences into exons four and six, also described by Yan et al. (2000). With this study aimed at both phylogenetics and plant identification (traceability) we have investigated the usefulness of pyrosequencing and its potential as a high throughput commercial application. Pyrosequencing has already proven its effectiveness as a tool for targeting small regions of DNA including single nucleotide polymorphisms. Analysis of the coding sequences in fragments AB and CD identified potential regions to target with pyrosequencing using universal primers designed for grasses. CDpyroseq2 primer showed greatest utility allowing cyclic pyrosequencing through a highly polymorphic 30 base pair region. Sequencing through this coding region produced specific pyrograms or ‘fingerprints’ which are useful for plant identification. Also examined in this study was the technique of producing a specific dispensation order for nucleotides during sequencing. This dispensation was
designed from previously known sequence composition and enabled a larger fragment to be sequenced. This method produced accurate pyrograms with good utility to identify a species with known sequence; however, the cyclic approach proved more useful for developing a universal system for identification. Finally, we reinforced the phylogenetic utility of GBSS1 by the production of a single phylogram, generated from the sequence of fragments AB and CD, confirming previously described taxonomic relationships. To support the usefulness of these methods in the analysis of manufactured plant food products, we have tested them on mixed, plant-derived materials. The results thus far have enabled us to identify the individual plant constituents of materials derived from multiple sources. This will be very useful in industry, where the ever-increasing demand for quality control and traceability must be met. The future of this plant identification procedure is its potential for identification outside the grass family. In silico analysis of GBSS1 across a variety of plants from different families reveals the potential for universal primers C and D to function outside grasses. Initial results employing truncated versions of primers C and D that encompass highly conserved domains (beyond the grass family) are promising. We have been able to amplify fragment CD from a broad range of plants including monocots, dicots and non-flowering species (summarized in Tables 1 and 2). Therefore GBSS1, a single-copy nuclear gene, has proven to be a valuable source of information in plant systematics and a tool for plant identification in grasses and possibly all plants.
Acknowledgements This work was supported by the Australian Research Council and PROLIGO industries. We would like to thank the Australian DNA Bank for kindly providing and sourcing seed and DNA samples and Stirling Bowen from Southern Cross Plant Genomics for pyrosequencing analysis.
References
Appels, R., Clarke, B.C., 1992. The 5S DNA units of bread wheat (Triticum aestivum). Plant Systematics and Evolution 183, 195–208.
Batley, J., Barker, G., O’Sullivan, H., Edwards, K.J., Edwards, D., 2003. Mining for single nucleotide polymorphisms and insertions/deletions in maize expressed sequence tag data. Plant Physiology 132, 84–91.
Ching, A., Rafalski, A., 2002. Rapid genetic mapping of ESTs using SNP pyrosequencing and indel analysis. Cellular and Molecular Biology Letters 7, 803–810.
Clark, L.G., Weiping, Z., Wendel, J.F., 1995. A Phylogeny of the grass family (Poaceae) based on ndhF sequence data. Systematic Biology 20, 436–460.
Cordeiro, G.M., Taylor, G.O., Henry, R.J., 2000. Characterisation of microsatellite markers from sugarcane (Saccharum sp.), a highly polyploid species. Plant Science 155, 161–168.
Davis, J.I., Soreng, R.J., 1993. Phylogenetic structure in the grass family (Poaceae) as inferred from chloroplast DNA restriction site variation. American Journal of Botany 80, 1444–1454.
Duvall, M.R., Morton, B.R., 1996. Molecular phylogenetics of Poaceae: an expanded analysis of rbcL sequence data. Molecular Phylogenetics and Evolution 5, 352–358.
Fakhrai-Rad, H., Pourmand, N., Ronaghi, M., 2002. Pyrosequencing: an accurate detection platform for single nucleotide polymorphisms. Human Mutation 19, 479–485.
Gaut, B.S., Clark, L.G., Wendel, J.F., Muse, S.V., 1997. Comparisons of the molecular evolutionary process at rbcL and ndhF in the grass family (Poaceae). Molecular Biology and Evolution 14, 769–777.
Henry, R.J. (Ed.), 2001. Plant Genotyping (The DNA Fingerprinting of Plants). CABI, Wallingford.
Hsiao, C., Chatterton, N., Asay, K., Jensen, K., 1995. Molecular phylogeny of the Pooideae (Poaceae) based on nuclear rDNA (ITS) sequence. Theoretical and Applied Genetics 90, 389–398.
Kellogg, E.A., Campbell, C.S., 1987. in: Soderstrom, T.R., Hilu, K.W., Campbell, C.S., Barkworth, M.E. (Eds.), Grass Systematics and Evolution. Smithsonian Institute Press, Washington DC, pp. 310–322.
Maguire, T., Collins, G., Sedgley, M., 1994. A modified CTAB DNA extraction procedure for plants belonging to the family Proteaceae. Plant Molecular Biology Reporter 12, 106–109.
Mason-Gamer, R.J., Weil, C.F., Kellogg, E.A., 1998. Granule-bound starch synthase: structure, function, and phylogenetic utility. Molecular Biology and Evolution 15, 1658–1673.
Mathews, S., Tsai, R., Kellogg, E.A., 2000. Phylogenetic structure in the grass family (Poaceae): evidence from the nuclear gene phytochrome B. American Journal of Botany 87, 96–107.
McLauchlan, A., Ogbonnaya, F., Hollingsworth, B., Carter, M., Gale, K., Henry, R., Holton, T., Morell, M., Rampling, L., Sharpe, J., Shariflou, M., Jones, M., Appels, R., 2001. Development of robust PCR-based markers for each homo-allele of granule-bound starch synthase and their application in wheat breeding programs. Australian Journal of Agricultural Research 52, 1409–1416.
Nasu, S., Suzuki, J., Ohta, R., Hasegawa, K., Yui, R., Kitazawa, N., Monna, L., Minobe, Y., 2002. Search for and analysis of single nucleotide polymorphisms (SNPs) in rice (Oryza sativa, Oryza rufipogon) and establishment of SNP markers. DNA Research 9, 163–171.
Nishikawa, T., Salomon, B., Komatsuda, T., von Bothmer, R., Kadowaki, K., 2002. Molecular phylogeny of the genus Hordeum using three chloroplast DNA sequences. Genome 45, 1157–1166.
Nordstrom, T., Ronaghi, M., de Faire, U., Morgenstern, R., Nyren, P., 2000. Direct analysis of single-nucleotide polymorphism on double-stranded DNA by pyrosequencing. Biotechnology and Applied Biochemistry 31, 107–112.
Pacey-Miller, T., Henry, R., 2003. Single-nucleotide polymorphism detection in plants using a single-stranded pyrosequencing protocol with a universal biotinylated primer. Analytical Biochemistry 317, 165–170.
Renvoize, S.A., Clayton, W.D., 1992. Classification and evolution of grasses, in: Chapman, G.P. (Ed.), Grass Evolution and Domestication. Cambridge Univ. Press, Cambridge, pp. 3–37.
Sastri, D.C., Hilu, K., Appels, R., Lagudah, E.S., Playford, J., Baum, B.R., 1992. An overview of evolution in plant 5S DNA. Plant Systematics and Evolution 183, 169–181.
Soltis, D., Soltis, P., 1998. Choosing an approach and an appropriate gene for phylogenetic analysis, in: Soltis, D., Soltis, P., Doyle, J. (Eds.), Molecular Systematics of Plants. 2. DNA Sequencing. Kluwer, Boston.
Swofford, D., 2002. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, MA.
Thompson, J., Higgins, D., Gibson, T., 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Research 22, 4673–4680.
Yan, L., Bhave, M., Fairclough, R., Konik, C., Rahman, S., Appels, R., 2000. The genes encoding granule-bound starch synthases at the waxy loci of the A, B and D progenitors of common wheat. Genome 2000;, 264–272.