Plant Science 165 (2003) 941–949 www.elsevier.com/locate/plantsci
Obtaining and analysis of flanking sequences from T-DNA transformants of Arabidopsis Genji Qin, Dingming Kang, Yiyu Dong, Yunping Shen, Li Zhang, Xiaohui Deng, Yao Zhang, Song Li, Nan Chen, Weiran Niu, Cong Chen, Peicheng Liu, Haodong Chen, Jigang Li, Yanfei Ren, Hongya Gu, XingWang Deng, Li-Jia Qu *, Zhangliang Chen * Peking /Yale Joint Center for Plant Molecular Genetics and Agrobiotechnology, National Laboratory of Protein Engineering and Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, PR China Received 14 January 2003; received in revised form 16 April 2003; accepted 2 May 2003
Abstract
Large collections of Arabidopsis insertion lines are valuable for functional genomics research. Using the activation-tagging vector pSKI015, more than 45 000 T-DNA insertion lines were generated by the Agrobacterium-mediated floral-dip transformation protocol. Of these, 2304 insertion lines were analyzed, and 1502 plant sequences flanking the T-DNA insertion sites were obtained by a modified thermal asymmetric interlaced PCR (TAIL-PCR) protocol. These sequences were searched against the GenBank database using BLAST, and 1194 insertion sites were determined from the sequences matching the Arabidopsis genome sequence. The insertion sites were distributed on all five Arabidopsis chromosomes, and the interfered genes were classified into 14 functional categories. Analysis of the 1194 100-bp sequences surrounding the T-DNA insertion sites showed that sequences with 27 and 31% GC content were likely to favor T-DNA integration. Sixty-eight of these 100-bp sequences with more than two insertions were chosen to look for motifs favoring T-DNA integration. The results showed that “ATNTT” (N represents A/T/C/G) and the polyT and polyA motifs probably play a role in the T-DNA integration event.
© 2003 Elsevier Ireland Ltd. All rights reserved.
Keywords: T-DNA; Arabidopsis mutant collection; Flanking sequence; Location; Distribution
1. Introduction
Genomic research in Arabidopsis has seen many significant accomplishments in recent years. One of them is the almost complete sequence of the Arabidopsis genome, published in 2000 by the Arabidopsis Genome Initiative (AGI). The sequenced regions cover 115.4 megabases of the 125-megabase genome, and about 50% of the genome encodes about 25 500 putative genes, with an average density of one gene every 4.5 kb [1]. However, it is estimated that more than 90% of the Arabidopsis genes have not been functionally studied. The completion of the Arabidopsis genome sequence opens a new
* Corresponding authors. Fax: +86-10-6275-1841. E-mail addresses: [email protected] (L.-J. Qu), [email protected] (Z. Chen).
era for studying gene function using reverse genetics approaches. A reverse genetics approach starts with the mutation of a gene, and gene function is then predicted from the resulting phenotypic changes [2]. Several methods are commonly used to create gene mutations. Homologous recombination is a routine way to analyze gene function in yeast, but this method is not efficient in higher plants [3]. Double-stranded RNA (dsRNA) interference is another useful method to study gene function in plants. In this method, gene-specific sequences in the sense and antisense orientations are linked to the two ends of a GUS gene fragment, respectively, and then placed under the control of a CaMV 35S promoter before being transformed into plants. Duplex RNA is formed in vivo, resulting in reduction-of-function or loss-of-function of the specific gene [4]. Although it is efficient for analyzing the function of a specific gene, the dsRNA method is still difficult to use on a large scale because different
0168-9452/03/$ - see front matter © 2003 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/S0168-9452(03)00218-8
G. Qin et al. / Plant Science 165 (2003) 941 /949
Fig. 1. The left border sequence and the specific primers used for TAIL-PCR and sequencing. The orientation of the primers was indicated by arrows. LS1, LS2 and LS3 were for TAIL-PCR whereas LS4 for sequencing the PCR products.
genes require different constructs. Recently, random large-scale insertional mutagenesis has become a powerful alternative, since it allows a high density of insertions in the Arabidopsis genome, and transformation is easily achieved by simply dipping the flowers into an Agrobacterium culture. Moreover, the development of effective methods to amplify flanking sequences of the insertion sites and the availability of the whole genome sequence in public databases also favor insertional tagging [5]. The insertion elements are mainly transposons and the T-DNA of Agrobacterium. Although a transposon-created mutant can be easily confirmed by excising the transposon from the disrupted gene in the presence of
Table 1
Oligonucleotide primers for TAIL-PCR and sequencing

Name     Sequence
LS1      5′-GAC AAC ATG TCG AGG CTC AGC AGG A-3′
LS2      5′-TGG ACG TGA ATG TAG ACA CGT CGA-3′
LS3      5′-GCT TTC GCC TAT AAA TAC GAC GG-3′
LS4      5′-TTG GTA ATT ACT CTT TCT TTT CCT CC-3′
AD1a     5′-NTC GA(G/C) T(A/T)T (G/C)G(A/T) GTT-3′
AD2a     5′-NGT CGA (G/C)(A/T)G ANA (A/T)GA A-3′
AD3a     5′-(A/T)GT GNA G(A/T)A NCA NAG A-3′
AD4a     5′-AG(A/T) GNA G(A/T)A NCA (A/T)AG G-3′
AD5      5′-TC(G/C) TNA G(T/A)A CNT (A/T)GG A-3′
AD1-1    5′-NAC GT(G/C) A(A/T)T (G/C)CN AGA-3′
AD1-2    5′-NTC GA(G/C) T(A/T)T NG(A/T) GAA-3′
AD2-1    5′-NTC GT(G/C) (A/T)GA NA(A/T) GTT-3′
AD2-2    5′-NCA GCT (G/C)(A/T)C TNT (A/T)GA A-3′
AD2-3    5′-NCT CGT (G/C)(A/T)G ANT (A/T)GA T-3′
AD2-4    5′-NGT CGA (G/C)(A/T)C ANT (A/T)CT A-3′
AD2-5    5′-NGT CGA (G/C)(A/T)C TNA (A/T)CA A-3′
AD4-1    5′-AG(A/T) CAN G(A/T)T NCA (A/T)GA A-3′

a The four AD primers marked "a" (AD1 to AD4) were designed by Liu et al. [18]; the other primers were used in this paper for the first time.
transposase, the instability and preferential short-range transposition of transposons make this method difficult to use for random saturation mutagenesis in Arabidopsis. A T-DNA insertion cannot transpose after integration into the genome and thus generates stable insertion lines [5]. To decipher the function of all Arabidopsis genes, it is preferable to construct large collections of Arabidopsis insertion lines. Several types of collections have been established. Some collections can be screened for an insertion in a particular gene of interest by PCR using pooled DNA [2,6]. Some provide a database of the insertion flanking sequences and insertion sites for every individual mutant line [7–12]. Other collections also include phenotype data [13]. Researchers can therefore find a mutant line of interest simply by searching a database. All these collections are undoubtedly valuable in elucidating gene function by reverse genetics. In this study, Arabidopsis plants were transformed with the activation-tagging vector pSKI015 to generate insertional mutants. Four copies of the CaMV 35S enhancer element are located within and close to the right border of
Table 2
Summary of the identified T-DNA flanking sequences

Items                                                      Numbers   Percentage of total identified lines
Identified DNA samples of lines                            2304      -
The samples having bar gene                                1901      82.5%
T-DNA flanking sequences                                   1502      79.0%
Identical to Arabidopsis sequences                         1194      79.5%
Identical to vector                                        252       16.8%
Short sequences failing to match Arabidopsis sequences     56        3.7%
Fig. 2. The distribution of T-DNA insertions on the five Arabidopsis chromosomes. (A) Proportion of the insertion numbers on the chromosomes. (B) Comparison of insertion numbers between direct and opposite orientations. (C) The physical map of insertion sites on the chromosomes. The representation of the Arabidopsis chromosomes was referred from AGI [1]. Telomeric and centromeric regions are light blue while the other regions are red. The pseudo-colors represent different insertion frequency per 300 kb region.
the T-DNA. Therefore, loss-of-function as well as gain-of-function mutants can be obtained [14]. About 45 000 T-DNA lines were generated and 1502 flanking sequences of the insertion sites were obtained. These insertion sites were found to be spread over all five Arabidopsis chromosomes. The distribution and other characteristics of the insertion sites are also discussed.
2. Materials and methods
2.1. Plant materials and transformation
The Columbia ecotype of A. thaliana was used to generate T-DNA insertion lines. Plants for transformation were grown in soil at 22 ± 2 °C in the long-day
Table 3
The distribution of T-DNA insertions on five Arabidopsis chromosomes

Items                                   Chr1     Chr2     Chr3     Chr4     Chr5
Number of direct insertion              176      99       122      97       143
Number of opposite insertion            153      74       91       99       140
Number of total insertion               329      173      213      196      283
Percentage of total insertion (%)       27.55    14.48    17.83    16.41    23.70
Average percentage of every 1 Mb (%)    0.95     0.74     0.77     0.94     0.91
condition (16 h of light and 8 h of dark) and transformed by the floral dip protocol [15,16]. The transformed seeds were plated on petri dishes with 1/2 MS medium containing 20 mg/ml glufosinate ammonium and then kept at 4 °C for 3 days to synchronize germination. Seedlings were germinated and grown on the medium under the conditions mentioned above for a week. The green seedlings were transplanted into soil and grown under the same conditions. For polymerase chain reaction (PCR) amplification, 0.5 g of plant tissue was collected and genomic DNA was prepared using the CTAB (2%) method described by Wagner et al. [17].
2.2. PCR amplification
20 ng of DNA from T1 generation plants was used to amplify the Bar gene fragment in the pSKI015 vector. The primers were as follows: 5′-TCA TCA CAT CTC GGT GAC GG-3′ and 5′-TAC CAT GAG CCC AGA ACG AC-3′. DNA samples with positive Bar bands were then used to amplify the T-DNA flanking sequences by thermal asymmetric interlaced PCR (TAIL-PCR). To increase the efficiency of TAIL-PCR, several modifications were made to the protocol described by Liu et al. [18]. The secondary amplification products were not used directly for the tertiary amplification, but were first analyzed by electrophoresis on a 1% agarose gel. Samples showing no bright bands were subjected to primary amplification again with other arbitrary degenerate (AD) primers. Bright bands were cut from the gel and frozen in liquid nitrogen. DNA was collected by centrifugation at 12 000 rpm for 5 min and then used as the template for the tertiary amplification. Three nested primers (LS1, LS2 and LS3) complementary to the downstream sequence of the left border in pSKI015 were synthesized for TAIL-PCR. Up to 12 AD primers were used in combination with the three specific primers (Fig. 1 and Table 1).
2.3. Purification and sequencing of PCR products
The tertiary amplification products were checked on 1% agarose gels. Those having only one clear band were precipitated directly. For those having two or more bands, the PCR products were run again on a 0.8% low-melting agarose gel, and the brightest bands were excised and recovered by extraction with phenol and then chloroform:isoamyl alcohol (24:1) before the DNA was precipitated. About 25–50 ng of PCR products were used for sequencing on an ABI 377 Automatic DNA Sequencer according to the manufacturer's instructions (PE Applied Biosystems). A specific primer, LS4, was designed to sequence the PCR products (Fig. 1 and Table 1).
2.4. Sequence analysis and mapping
The sequences were used to search and align with the Arabidopsis genome sequence in the National Center for Biotechnology Information (NCBI) database by BLAST [19]. The insertion sites were mapped to the Arabidopsis chromosomes using MATLAB 6.1 (The MathWorks, Inc.) and MACROMEDIA FIREWORKS MX (Macromedia, Inc.) software.
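The mapping step described above can be illustrated with a short Python sketch. It is not the authors' MATLAB procedure. It assumes that BLAST hits are available in the standard 12-column tabular layout (query, subject, identity, length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score), that the subject start coordinate can stand in for the insertion site, and that a reversed subject range marks a minus-strand match. The function names and the three example hits are hypothetical and are not data from this study.

```python
from collections import Counter

BIN_SIZE = 300_000  # window size used for the physical map (Fig. 2C)

def parse_hit(line):
    """Parse one 12-column, tab-separated BLAST hit (assumed layout)."""
    f = line.rstrip("\n").split("\t")
    sstart, send = int(f[8]), int(f[9])
    return {
        "line": f[0],
        "chrom": f[1],
        # Assumption: the subject coordinate where the alignment starts
        # is taken as the insertion site.
        "site": sstart,
        # A reversed subject range means the read matches the minus strand.
        "strand": "plus" if sstart <= send else "minus",
    }

def bin_counts(hits, bin_size=BIN_SIZE):
    """Count insertions per (chromosome, window index)."""
    return Counter((h["chrom"], (h["site"] - 1) // bin_size) for h in hits)

# Made-up hits for illustration only (not data from the paper).
rows = [
    ["T0001", "Chr1", "99.2", "250", "2", "0", "30", "279", "150000", "150249", "1e-120", "450"],
    ["T0002", "Chr1", "98.5", "250", "3", "0", "25", "274", "299900", "299651", "1e-110", "430"],
    ["T0003", "Chr2", "100.0", "200", "0", "0", "40", "239", "600001", "600200", "1e-100", "390"],
]
hits = [parse_hit("\t".join(r)) for r in rows]
counts = bin_counts(hits)

for (chrom, window), n in sorted(counts.items()):
    # Print the 1-based window boundaries and the number of insertions.
    print(chrom, window * BIN_SIZE + 1, (window + 1) * BIN_SIZE, n)
```

The strand label is the piece of information the paper uses to call an insertion direct or opposite, and the per-window counts are the kind of quantity that a pseudo-color map would display.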
3. Results
3.1. Identification of T-DNA flanking sequences
More than 45 000 independent insertion lines were generated. DNA samples from 2304 individual lines were used to amplify the Bar gene fragment. A clear 560-bp Bar gene band was amplified from 1901 DNA samples. From these 1901 samples, 1502 T-DNA flanking sequences were obtained by TAIL-PCR [18]. These sequences were used to search NCBI GenBank by BLAST. Of them, 1194 sequences were either identical or significantly similar to Arabidopsis genome sequences. As expected, a short stretch of T-DNA left border sequence was found in these 1194 sequences. However, 56 sequences were too short to show any similarity to Arabidopsis sequences, and 252 sequences were actually vector sequences. The detailed data are summarized in Table 2.
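The percentages in Table 2 follow from the counts in this section if each step is expressed relative to the step before it, and the three sequence classes relative to the 1502 flanking sequences. The short Python check below makes those denominators explicit. The choice of denominators is inferred from the arithmetic, not stated in the table header, and the dictionary keys are illustrative names.

```python
# Counts reported in Section 3.1 and Table 2.
counts = {
    "analyzed": 2304,      # DNA samples tested for the Bar gene
    "bar_positive": 1901,  # samples with a clear Bar band
    "flanking": 1502,      # flanking sequences obtained by TAIL-PCR
    "arabidopsis": 1194,   # matched the Arabidopsis genome
    "vector": 252,         # matched vector sequence
    "too_short": 56,       # too short to match anything
}

def pct(part, whole):
    """Percentage rounded to one decimal, as in Table 2."""
    return round(100 * part / whole, 1)

table2 = {
    "bar_positive": pct(counts["bar_positive"], counts["analyzed"]),
    "flanking": pct(counts["flanking"], counts["bar_positive"]),
    "arabidopsis": pct(counts["arabidopsis"], counts["flanking"]),
    "vector": pct(counts["vector"], counts["flanking"]),
    "too_short": pct(counts["too_short"], counts["flanking"]),
}

for name, value in table2.items():
    print(f"{name}: {value}%")
```

The three sequence classes also add up to the 1502 flanking sequences (1194 + 252 + 56), so every sequenced product is accounted for.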
Fig. 3. The pie chart showing the gene proportion of different functions. A total of 1010 genes were interfered by T-DNA insertion.
Fig. 4. The proportion of T-DNAs inserted in the upstream, downstream, intron and exon of the interfered genes.
Fig. 5. The distribution of the GC% content of the 1194 100-bp SSISs and 1200 100-bp RSs.
3.2. Distribution of T-DNA insertion sites in Arabidopsis genome
The physical map of insertion sites was established by comparing the 1194 flanking sequences with the five Arabidopsis chromosome sequences. The different pseudo-colors represent the different T-DNA insertion frequencies per 300-kb window. Since all the T-DNA flanking sequences determined with the single primer LS4 (Fig. 1) have the 3′→5′ orientation, whereas the Arabidopsis genome sequences in GenBank are in the 5′→3′ orientation, the orientation of each T-DNA flanking sequence relative to the Arabidopsis genome sequence (+/+ or +/−) represents the direct or opposite orientation of the T-DNA inserted in the Arabidopsis genome. The physical map showed that T-DNA insertions, either direct or opposite, were distributed on all five Arabidopsis chromosomes. The proportions are shown in Fig. 2A. Although chromosome 2 is longer than chromosome 4, the number of insertions on chromosome 2 is lower than that on chromosome 4. The average insertion frequency per 1 Mb of sequence varied from 0.74% on chromosome 2 to 0.95% on chromosome 1 (Table 3). On each chromosome, the numbers of direct and opposite insertions were statistically similar (Fig. 2B and Table 3). However, some hot regions were observed (Fig. 2C). For example, some terminal regions of the long arms of chromosomes 1, 2 and 4, and of the short arms of chromosomes 1 and 5, have more insertions than other regions. It was also noted that there were fewer insertions around the centromere regions of chromosomes 1, 2, 3 and 5.
3.3. The analysis of T-DNA interfered genes
Out of the 1194 mapped T-DNA insertion sites, 1010 have their tags inserted in or close to a predicted gene, whereas only 68 tags are located in repetitive regions and 126 in intergenic regions. These 1010 interfered genes can be classified into 14 functional categories according to the description of Bevan et al. [20] (Fig. 3).
Nearly half of these genes have unknown functions. The categories with greater numbers of genes in the genome [1] also have greater numbers of T-DNA insertions. The percentages of T-DNA-interfered genes involved in metabolism, transcription, plant defense, signal transduction, and energy were 7.8, 6.4, 4.65, 6.93 and 5.64%, respectively. For the genes with T-DNA insertions, insertion sites can be assigned to four regions relative to the gene analyzed: upstream region (within 1 kb upstream of the start codon), downstream region (within 1 kb downstream of the stop codon), intron and exon. Analysis of the 1010 insertion sites showed that the frequency of T-DNA sites differed among the four regions. About one third of the T-DNA tags were inserted into the 1-kb upstream regions of the start codons of the corresponding genes (33.07%), whereas the frequencies in the other three regions were similar (Fig. 4).
3.4. The analysis of surrounding sequences of T-DNA insertion sites
In order to find out whether there is any sequence bias for T-DNA insertion, the 100-bp sequences surrounding the 1194 insertion sites (50 bp upstream and 50 bp downstream of each site) were selected from the Arabidopsis genome sequence database and analyzed. As controls, 1200 100-bp sequences were randomly selected from the database. The GC% contents of the 1194 100-bp sequences surrounding the insertion sites (SSISs) and the 1200 100-bp random sequences (RSs) were analyzed. The GC% contents of the two classes of 100-bp sequences ranged from 15 to 60%, although most of these sequences have 27–43% GC content (Fig. 5). Both RSs and SSISs have the largest number at 33% GC content, and their distributions are similar at most GC% levels. However, at 27 and 31% GC content, the number of SSISs was statistically about 2-fold higher than that of RSs (56 vs. 25 and 61 vs. 36, respectively). Sixty-eight SSISs with more than two insertions were selected as hot integration sites to search for sequence patterns using IBM Bioinformatics Group Tools [21]. A sequence “ATNTT” (N represents any nucleotide) was found in about 70% of these SSISs (46/68). Furthermore, this sequence pattern was found not only in the left border sequence but also in the right border sequence (Fig. 6). Interestingly, a polyT motif and a polyA motif were found in some SSISs, including the sequences having six and 13 integrations, by using the Motif Discovery Tool [22] (Fig. 7).
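The two measurements used in this section, GC% of a 100-bp window and occurrence of the “ATNTT” pattern, are simple to compute. The Python sketch below is only an illustration and does not reproduce the IBM pattern-discovery tools cited above. The function names and the toy 100-bp window are hypothetical; the window is constructed by hand and is not a sequence from this study.

```python
import re

# Lookahead so that overlapping occurrences of AT[ACGT]TT are all reported.
ATNTT = re.compile(r"(?=AT[ACGT]TT)")

def gc_percent(seq):
    """GC content of a sequence, in percent."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def motif_positions(seq):
    """0-based start positions of every ATNTT occurrence."""
    return [m.start() for m in ATNTT.finditer(seq.upper())]

def longest_run(seq, base):
    """Length of the longest homopolymer run of `base` (e.g. polyT)."""
    runs = re.findall(f"{base}+", seq.upper())
    return max((len(r) for r in runs), default=0)

# Toy 100-bp window built by hand for illustration (not real data).
toy = "ATGTT" + "A" * 10 + "GC" * 15 + "T" * 8 + "ATATT" + "C" * 3 + "A" * 39

print(len(toy), gc_percent(toy), motif_positions(toy), longest_run(toy, "T"))
```

Applying `gc_percent` to every SSIS and every RS and tabulating the results by integer GC% would give the two distributions compared in Fig. 5; `longest_run` is a crude stand-in for spotting polyT or polyA stretches.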
4. Discussion
With the completion of the Arabidopsis genome project, the 2010 Project was established with the goal of deciphering the function of all Arabidopsis genes [23]. One efficient approach to this goal is to generate large insertional mutant collections and related databases. Several independent groups have generated large collections of T-DNA or transposon insertion lines. For example, DNA pools from populations of 60 480 and 38 000 T-DNA transformants have been established [2,6]. A database of 500 individual DS insertion lines with position information [7] and other databases including 1000, 6000, 9000 and 85 108 T-DNA flanking sequences, respectively [8,10–12], have also been reported. Despite this progress, it is a long way to obtain insertion
Fig. 6. The alignment of 48 SSISs that are inserted more than twice, showing the ‘‘ATNTT’’ motif in both left and right borders.
mutants of all Arabidopsis genes, since for a specific gene of 1 kb in length, 600 000 insertion lines would be
needed to have a 99% probability of finding an insertion in it [5]. In this study, a collection of 45 000 Arabidopsis insertion lines was established by activation tagging. Out of 2304 DNA samples from independent insertion lines, 1502 T-DNA flanking sequences were obtained by TAIL-PCR. Of these, 1010 T-DNA flanking sequences were located in or close to functional genes and were distributed over the five Arabidopsis chromosomes. Since activation-tagging mutants can be used to analyze the function of redundant genes that are difficult to identify by loss-of-function approaches [14], this mutant collection will be valuable in determining the functions of Arabidopsis genes, especially those of redundant genes. It was reported that the nucleolus organizer regions NOR2 and NOR4 were two hot spots for DS insertion [7]. In this study, we analyzed the distribution of 1194 T-DNA insertion sites in the Arabidopsis genome. No specific bias for T-DNA insertion was observed in the NOR2 and NOR4 regions. However, T-DNA insertions were not distributed equally and randomly on each chromosome (Fig. 2). There were fewer insertions in the centromere regions and more insertions in the chromosome arms, as was also observed in other reports [9]. At the same time, about 84.6% of the T-DNA insertion sites were located in or close to functional genes, whereas only 5.7% were located in repetitive regions. A similar phenomenon was observed in rice [24]. These data imply that T-DNA prefers to insert in non-repetitive regions. A reasonable explanation is that, on the one hand, the centromere regions contain more repetitive sequences that are not transcriptionally active and, on the other hand, topological structures such as D-loops, nicks, gaps or breaks arise from transcription in the gene space [25]. It was reported that the DS element is preferentially inserted at the 5′ ends of genes [7] and that 37 156 out of 85 108 (44%) T-DNA insertion sites were mapped to promoters [11]. In our analysis of the T-DNA insertions, we found that about one third of the T-DNA insertions were in the 1-kb upstream region of the start codons of the corresponding genes. This implies that T-DNA insertion also seems to be biased toward the 5′ end region of a given gene. These mutants carrying T-DNA in the upstream regions of genes will favor the functional study of redundant genes through gain-of-function approaches [14]. In fact, the phenotypes of many mutants we observed were the results of over-expression of a gene (data not shown). In a report published in 2000, Barakat and his colleagues obtained 210 Arabidopsis transformants and analyzed the GC% contents of BAC clones having T-DNA insertions [24]. These BACs covered a 32–40% GC content range, and the maximum number of T-DNA insertions occurred in DNA fragments having 34–36% GC content. In this study, we
Fig. 7. The polyT and polyA motifs created by using SEQUENCE LOGOS software. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each nucleic acid at the position, measured in bits. The zero position represents the insertion site.
analyzed 1194 100-bp SSISs and 1200 100-bp RSs. The two classes of sequences shared a similar GC% distribution. Both RSs and SSISs have the largest numbers at the 33% GC level. However, at the 27 and 31% GC levels, the numbers of SSISs were significantly larger than those of RSs. This result indicates that T-DNA integrated into the Arabidopsis genome in a random pattern at most loci, but that this integration event probably favors sequences having 27 or 31% GC content. We analyzed 68 SSISs having more than two insertions, selected from the 1194 SSISs. Three motifs were found among these SSISs. The “ATNTT” motif was found in about 70% of these SSISs (46/68). Interestingly, this motif also exists in both the left and right borders. It was reported that limited homologous pairing between the T-DNA and plant target sequences plays a role in the T-DNA integration process [12,25]. The fact that the homologous motif “ATNTT” is found in both SSISs and the two T-DNA borders suggests that pairing between the motifs from SSISs and T-DNAs might occur and hence promote the integration of T-DNA. Pairing with the left border sequence of T-DNA may result in direct insertions, whereas pairing with the right border sequence may result in opposite insertions. These pairings may occur randomly, so that the numbers of direct and opposite insertions are similar, as shown in this study (Fig. 2). It may also account for why T-DNA insertion favors promoter regions, which usually include TATA-box sequences. It was reported that a T-rich region plays an important role in T-DNA integration into the plant genome, based on the analysis of 18 000 flanking sequence tags (FSTs) [12]. In our study, many SSISs were also found to have polyT or polyA motifs (Fig. 7). It is possible that polyT and polyA structures are easy to unwind and denature, thus favoring the T-DNA insertion event.
In conclusion, from the 1502 T-DNA flanking sequences we obtained and analyzed, we found that T-DNA integration is random at most chromosomal loci. However, T-DNA preferentially inserts into the upstream regions of genes and into regions having 27 or 31% GC content. The motif “ATNTT”, found both in SSISs and in the T-DNA borders, may facilitate T-DNA integration by homologous pairing. It is believed that, as the number of mutants grows and more T-DNA flanking sequences are obtained, more will be elucidated about the mechanisms behind insertion-site selection in T-DNA integration.
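The figure of 600 000 lines quoted in the Discussion from [5] can be approximated with a simple model. The Python sketch below assumes that insertions are independent and uniformly random over the genome, with one insertion per line, so that the probability of hitting a gene of length x in a genome of size G with n lines is P = 1 − (1 − x/G)^n. This model is an idealization, since the results above show that integration is not fully random, and the function names are illustrative. The genome size is the 125 megabases quoted in the Introduction.

```python
import math

GENOME_BP = 125_000_000  # genome size quoted in the Introduction

def p_hit(n_lines, gene_bp=1_000, genome_bp=GENOME_BP):
    """Probability that at least one of n random insertions hits the gene."""
    return 1.0 - (1.0 - gene_bp / genome_bp) ** n_lines

def lines_needed(p, gene_bp=1_000, genome_bp=GENOME_BP):
    """Smallest number of lines giving probability >= p of a hit."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - gene_bp / genome_bp))

n99 = lines_needed(0.99)   # lines for a 99% chance in a 1-kb gene
p45k = p_hit(45_000)       # chance with a 45 000-line collection

print(n99, round(p45k, 3))
```

Under these assumptions the required number of lines comes out somewhat below 600 000, which agrees with the rounded value cited from [5], and a 45 000-line collection gives roughly a 30% chance of tagging any particular 1-kb gene.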
Acknowledgements
This study was supported by the National Program for Transgenic Plants of China (Grant no. J99-A-001). We thank Professor Dr Liu Meihua for technical assistance. We are also indebted to Professor Dr Liu Yaoguang (Genetic Engineering Lab, College of Life Science, South China Agricultural University, China) for valuable suggestions on primer design for TAIL-PCR.
References
[1] The Arabidopsis Genome Initiative, Analysis of the genome sequence of the flowering plant Arabidopsis thaliana, Nature 408 (2000) 796–815.
[2] P.J. Krysan, J.C. Young, M.R. Sussman, T-DNA as an insertional mutagen in Arabidopsis, Plant Cell 11 (1999) 2283–2290.
[3] M. Hanin, S. Volrath, A. Bogucki, M. Briker, E. Ward, J. Paszkowski, Gene targeting in Arabidopsis, Plant J. 28 (6) (2001) 671–677.
[4] C.F. Chuang, E.M. Meyerowitz, Specific and heritable genetic interference by double-strand RNA in Arabidopsis thaliana, Proc. Natl. Acad. Sci. USA 97 (2000) 4985–4990.
[5] S. Parinov, V. Sundaresan, Functional genomics in Arabidopsis: large-scale insertional mutagenesis complements the genome sequencing project, Curr. Opin. Biotechnol. 11 (2000) 157–161.
[6] M. Galbiati, M.A. Moreno, G. Nadzan, M. Zourelidou, S.L. Dellaporta, Large-scale T-DNA mutagenesis in Arabidopsis for functional genomic analysis, Funct. Integr. Genom. 1 (2000) 25–34.
[7] S. Parinov, M. Sevugan, D. Ye, W.C. Yang, M. Kumaran, V. Sundaresan, Analysis of flanking sequences from dissociation insertion lines: a database for reverse genetics in Arabidopsis, Plant Cell 11 (1999) 2263–2270.
[8] F. Samson, V. Brunaud, S. Balaergue, B. Dubreucq, L. Lepiniec, G. Pelletier, M. Caboche, A. Lecharny, FLAGdb/FST: a database of mapped flanking insertion sites (FSTs) of Arabidopsis thaliana T-DNA transformants, Nucleic Acid Res. 30 (1) (2002) 94–97.
[9] D. Ortega, M. Raynal, M. Laudie, C. Llauro, R. Cooke, M. Devic, S. Genestier, G. Picard, P. Abad, P. Contard, C. Sarrobert, L. Nussaume, N. Bechtold, C. Horlow, G. Pelletier, M. Delseny, Flanking sequence tags in Arabidopsis thaliana T-DNA insertion lines: a pilot study, C. R. Biol. 325 (7) (1991) 773–780.
[10] L. Szabados, I. Kovacs, A. Oberschall, E. Abraham, I. Kerekes, L. Zsigmond, R. Nagy, M. Alvarado, I. Krasovskaja, M. Gal, A. Berente, G.P. Redei, A.B. Haim, C. Koncz, Distribution of 1000 sequenced T-DNA tags in the Arabidopsis genome, Plant J. 32 (2) (2002) 233–242.
[11] A. Sessions, E. Burke, G. Presting, G. Aux, J. McElver, D. Patton, B. Dietrich, P. Ho, J. Bacwaden, C. Ko, J.D. Clarke, D. Cotton, D. Bullis, J. Snell, T. Miguel, D. Hutchison, B. Kimmerly, T. Mitzel, F. Katagiri, J. Glazebrook, M. Law, S.A. Goff, A high-throughput Arabidopsis reverse genetics system, Plant Cell 14 (12) (2002) 2985–2994.
[12] V. Brunaud, S. Balzergue, B. Dubreucq, S. Aubourg, F. Samson, S. Chauvin, N. Bechtold, C. Cruaud, R. DeRose, G. Pelletier, L. Lepiniec, M. Caboche, A. Lecharny, T-DNA integration into the Arabidopsis genome depends on sequences of pre-insertion sites, EMBO Rep. 3 (12) (2002) 1152–1157.
[13] A. Ogarkova, N.B. Tomilova, A.A. Tomilov, V.A. Tarasov, Collection of Arabidopsis thaliana morphological insertion mutants, Russian J. Genet. 37 (8) (2001) 899–904.
[14] D. Weigel, J.H. Ahn, M.A. Blázquez, J. Borevitz, S.K. Christensen, C. Fankhauser, C. Ferrándiz, I. Kardailsky, E.J. Malancharuvil, M.M. Neff, J.T. Nguyen, S. Sato, Z. Wang, Y. Xia, R.A. Dixon, M.J. Harrison, C.J. Lamb, M.F. Yanofsky, J. Chory, Activation tagging in Arabidopsis, Plant Physiol. 122 (2000) 1003–1014.
[15] N. Bechtold, J. Ellis, G. Pelletier, In planta Agrobacterium-mediated gene transfer by infiltration of adult Arabidopsis thaliana plants, C.R. Acad. Sci. Paris Life Sci. 316 (1993) 1194–1199.
[16] S.J. Clough, A.F. Bent, Floral dip: a simplified method for Agrobacterial-mediated transformation of Arabidopsis thaliana, Plant J. 16 (1998) 735–743.
[17] D.B. Wagner, G.R. Furnier, M.A. Saghai-Maroof, S.M. Williams, B.P. Dancik, R.W. Allard, Chloroplast DNA polymorphisms in lodgepole and jack pines and their hybrids, Proc. Natl. Acad. Sci. USA 84 (1987) 2097–2100.
[18] Y.G. Liu, N. Mitsukawa, T. Oosumi, R.F. Whittier, Efficient isolation and mapping of Arabidopsis thaliana T-DNA insert junctions by thermal asymmetric interlaced PCR, Plant J. 8 (3) (1995) 457–463.
[19] S.F. Altschul, T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, D.J. Lipman, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acid Res. 25 (1997) 3389–3402.
[20] M. Bevan, I. Bancroft, E. Bent, K. Love, H. Goodman, C. Dean, R. Bergkamp, W. Dirkse, M. Van Staveren, W. Stiekema, L. Drost, P. Ridley, S.A. Hudson, K. Patel, G. Murphy, P. Piffanelli, H. Wedler, E. Wedler, R. Wambutt, T. Weitzenegger, T.M. Pohl, N. Terryn, J. Gielen, R. Villarroel, R. De Clerck, M. Van Montagu, A. Lecharny, S. Auborg, I. Gy, M. Kreis, N. Lao, T. Kavanagh, S. Hempel, P. Kotter, K.D. Entian, M. Rieger, M. Schaeffer, B. Funk, S. Mueller-Auer, M. Silvey, R. James, A. Montfort, A. Pons, P. Puigdomenech, A. Douka, E. Voukelatou, D. Milioni, P. Hatzopoulos, E. Piravandi, B. Obermaier, H. Hilbert, A. Düsterhöft, T. Moores, J.D.G. Jones, T. Eneva, K. Palme, V. Benes, S. Rechman, W. Ansorge, R. Cooke, C. Berge, M. Delseny, M. Voet, G. Volckaert, H.W. Mewes, S. Klosterman, C. Schueller, N. Chalwatzis, Analysis of 1.9 Mb of contiguous sequence from chromosome 4 of Arabidopsis thaliana, Nature 391 (1998) 485–488.
[21] I. Rigoutsos, A. Floratos, Motif discovery without alignment or enumeration, Proceedings 2nd Annual ACM International Conference on computational molecular biology, New York, March 1998.
[22] T.D. Schneider, R.M. Stephens, Sequence Logos: a new way to display consensus sequences, Nucleic Acid Res. 18 (1990) 6097–6100.
[23] J. Chory, J.R. Echer, S. Brigs, M. Caboche, G.M. Coruzzi, D. Cook, J. Dangl, S. Grant, M. Lou Guerinot, S. Henikoff, R. Martienssen, K. Okada, N.V. Raikhel, C.R. Somerville, D. Weigel, National science foundation-sponsored workshop report: “The 2010 Project” functional genomics and the virtual plant, a blueprint for understanding how plant are built and how to improve them, Plant Physiol. 123 (2000) 423–425.
[24] A. Barakat, P. Gallois, M. Raynal, D.M. Ortega, C. Sallaud, E. Guiderdon, M. Delseny, G. Bernardi, The distribution of T-DNA in the genomes of transgenic Arabidopsis and rice, FEBS Lett. 471 (2000) 161–164.
[25] R. Mayerhofer, Z. Koncz-Kalman, C. Nawrash, G. Bakkeren, A. Crameri, K. Angelis, G.P. Redei, J. Schell, B. Hohn, C. Koncz, T-DNA integration: a mode of illegitimate recombination in plants, EMBO J. 3 (1991) 697–704.