Analysis and recognition of the GAGA transcription factor binding sites in Drosophila genes




Computational Biology and Chemistry 35 (2011) 363–370

Contents lists available at SciVerse ScienceDirect

Computational Biology and Chemistry journal homepage: www.elsevier.com/locate/compbiolchem

Research Article

E.S. Omelina (a, ∗), E.M. Baricheva (a), D.Yu. Oshchepkov (a), T.I. Merkulova (a, b)

a Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, pr. Lavrentieva 10, Novosibirsk 630090, Russian Federation
b Novosibirsk State University, Pirogova Street 2, Novosibirsk 630090, Russian Federation

Article info

Article history: Received 9 June 2011; received in revised form 5 October 2011; accepted 7 October 2011
Keywords: GAGA transcription factor; Binding sites; Computer prediction; Drosophila

Abstract

The transcription factor GAGA, encoded by the gene Trl, controls the expression of many Drosophila melanogaster genes. We have compiled the largest sample to date (120 sites) of published nucleotide sequences with experimentally confirmed binding to the GAGA protein. Analysis of the sample has demonstrated that, despite the apparent structural diversity of GAGA sites, they fall into four distinct groups: (1) sites containing two GAG trinucleotides, each with no more than one nucleotide substitution, separated by a spacer of 1 or 3 nucleotides (GAGnGAG and GAGnnnGAG); (2) sites containing a single GAGAG motif; (3) (GA)3–9 microsatellite repeats; and (4) sites corresponding to three or more direct repeats of a GAG trinucleotide homolog, as well as its inverted repeats, separated by spacers of various lengths. Using the SITECON software package, we developed methods for recognizing sites of the GAGnGAG (method 1) and GAGnnnGAG (method 2) types in DNA sequences. Experimental verification confirmed GAGA binding for 72% of the sites predicted by method 1 and 94.4% of the sites predicted by method 2. Applying the experimentally verified methods to the localization of potential GAGA binding sites in the target genes of this transcription factor showed that 5′-untranslated regions (5′ UTRs) and first introns are enriched for these sites (two- to threefold relative to the average frequency in the D. melanogaster genome), whereas promoter regions (−4000/+100 bp or −1000/+100 bp) are only moderately enriched (no more than 1.5-fold). © 2011 Elsevier Ltd. All rights reserved.

∗ Corresponding author. Tel.: +7 383 363 49 17x1024; fax: +7 383 333 12 78. E-mail address: [email protected] (E.S. Omelina). 1476-9271/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.compbiolchem.2011.10.008

1. Introduction

The fruit fly Drosophila melanogaster is a uniquely advantageous system for biochemical and genetic analysis of the mechanisms involved in transcription regulation (Honkela et al., 2010). In addition to the core promoter recognition factors, transcription requires numerous transcription factors. Transcription factors control gene expression patterns by binding to specific sequence elements, transcription factor binding sites, located in the regulatory regions of the genome (Fuda et al., 2009; Orphanides and Reinberg, 2002). Therefore, identifying transcription factor binding sites in nucleotide sequences, analyzing their structural organization, and studying their arrangement relative to target genes are of paramount importance for understanding the molecular mechanisms of gene expression control. The Drosophila GAGA factor, encoded by the Trithorax-like (Trl) gene, is a sequence-specific DNA-binding protein with several functions. Mutations in Trl affect viability and produce distinct developmental phenotypes (Farkas et al., 1994). Expressed in various Drosophila organs and tissues (Baricheva et al., 1997; Bhat et al., 1996; Perelygina et al., 1993; Soeller et al., 1993), the transcription factor GAGA plays an important role in embryogenesis and oogenesis (Bhat et al., 1996; Ogienko et al., 2006, 2008), as well as in development of the Drosophila eye (Dos-Santos et al., 2008). The GAGA factor plays a global role in chromosome structure and function, since it is normally involved in modifying heterochromatin structure. Presumably, this underlies the important role of GAGA in cell division and makes Trl mutations dominant enhancers of position-effect variegation (Bhat et al., 1996; Farkas et al., 1994; Ogienko et al., 2006; Raff et al., 1994). The ability of the GAGA factor to interact with DNA through all phases of the cell cycle may underlie its involvement in cell memory processes (Brock and Fisher, 2005; Raff et al., 1994). The best-studied GAGA function is its involvement in the regulation of gene expression. The GAGA factor was originally identified as a transcription factor that binds to several (GA)n-rich sites in the Ultrabithorax and engrailed promoters (Biggin and Tjian, 1988; Soeller et al., 1988). Since then, binding sites for the GAGA protein have been identified in the promoters of other Drosophila
genes, in particular, in the promoters of the heat shock genes Heat-shock-protein-70Bb (Gilmour et al., 1989; Lee et al., 1992) and Heat shock protein 26 (Glaser et al., 1990; Lu et al., 1992), the genes Actin 5C (Chung and Keller, 1990) and Ecdysone-induced protein 74EF (Thummel, 1989), and the his3/his4 promoters (Gilmour et al., 1989). GAGA binding sites have also been found in regulatory regions distant from promoters (Busturia et al., 2001; Mahmoudi et al., 2003; Mishra et al., 2001), in exons (van Steensel et al., 2003), and in introns (O’Donnell and Wensink, 1994; Tchoubrieva and Gibson, 2004). The first analysis of GAGA binding sites dates back to 1995 (Granok et al., 1995). Analysis of the 48 GAGA sites from 12 Drosophila genes known at that time demonstrated that, although their sequences differed considerably, a homolog of the GAGAGAG heptanucleotide with no more than three nucleotide substitutions was detectable in 27 of the 48 sites. Thus, GAGAGAG was accepted as the consensus for GAGA sites (Granok et al., 1995). However, almost half of the binding sites for this transcription factor remained unclassified. It is also still unknown whether there are any regular patterns in the location of GAGA sites within target genes; the data obtained from analyses of individual genes are too sparse and fragmentary to resolve this issue. A genome-wide search technology (DamID chromatin profiling) has revealed several hundred new GAGA target genes (van Steensel et al., 2003).
However, this method, like the more widely used chromatin immunoprecipitation methods (ChIP-chip and ChIP-seq), cannot determine the precise location of a transcription factor binding site (or, possibly, several sites) within a rather large DNA fragment (2–5 kb in the case of Dam methylation), and it does not distinguish direct binding of the studied transcription factor to DNA from its binding to other chromatin proteins, including other transcription factors bound to their cognate DNA sites (Farnham, 2009; Kolchanov et al., 2007). Bioinformatics methods for GAGA site recognition can considerably help to solve both problems. In addition, computational site recognition can detect all the sites present in the genome, whereas chromatin profiling methods are limited to a particular cell type and state (Farnham, 2009; Kolchanov et al., 2007). The goal of this work was to study the structural organization of GAGA transcription factor binding sites, to construct computational methods for recognizing the binding sites of this factor in nucleotide sequences, to verify these methods experimentally, and to use them to analyze the distribution of GAGA sites in various regions of target genes.

2. Materials and methods

2.1. GAGA binding sites dataset

The sample comprises 120 GAGA binding sites from 18 D. melanogaster genes (Supplementary Table 1). Sites were selected if GAGA binding to the corresponding DNA region had been shown by at least one of the following methods: (1) DNase I footprinting with purified GAGA, (2) EMSA with purified GAGA, or (3) EMSA with nuclear extract and specific antibodies.

2.2. Recognition of potential GAGA binding sites

The search was performed in the 5′-flanking regions (−4000/+100 bp relative to the transcription start site), introns, 5′ UTRs, coding sequences (CDSs), and 3′ UTRs of 33 genes controlling Drosophila development, and in intron 2 of the Trl gene. Promoter regions (−4000/+100 bp) of 202 potential GAGA target genes (van Steensel et al., 2003) and of 6947 random Drosophila genes, as well as the complete D. melanogaster genome sequence, were also used. In addition, we analyzed the first introns and 5′ UTRs of 50 housekeeping genes controlling glycolysis, the Krebs cycle, and amino acid metabolism. The genome sequence was obtained from GenBank (http://www.ncbi.nlm.nih.gov/genbank/) and the remaining sequences from FlyBase (http://flybase.org/). GAGA sites were searched for with the SITECON software package (http://wwwmgs.bionet.nsc.ru/mgs/programs/sitecon/). SITECON detects the conformational and physicochemical properties of short regions that are conserved in an alignment of known functional DNA sites and computes a similarity measure between an analyzed sequence and the training sample, referred to as the level of conformational similarity. SITECON uses a set of 38 conformational and physicochemical DNA properties from the Property database (http://wwwmgs.bionet.nsc.ru/mgs/gnw/bdna/) (Ponomarenko et al., 1999). A built-in algorithm selects the subset of conserved properties that is most informative for recognition, which reduces recognition errors (Oshchepkov et al., 2004). SITECON has been tested in searches for new transcription factor binding sites and has shown high predictive ability (Kolchanov et al., 2007). Sites of the GAGnGAG and GAGnnnGAG types, 25 and 28 nt long, respectively, with the conserved core at the central position (Tables 2 and 3), were used as the training samples. To estimate type 1 errors (underpredictions), each sequence in turn was removed from its training sample and then recognized by the method built from the remaining sites (jack-knife method; Efron and Gong, 1983).
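The two error estimates used here, a jack-knife estimate of type 1 errors and a shuffled-sequence estimate of type 2 errors, can be illustrated with a much simpler scorer than SITECON. The sketch below is a hypothetical stand-in: it scores windows with a per-position nucleotide-frequency profile instead of SITECON's conformational similarity over 38 DNA properties, so its threshold values are not comparable to the 0.85 and 0.76 levels used in this work. The helper names (`train_profile`, `similarity`, `jackknife_type1`, `shuffled_background`, `type2_rate`) are illustrative assumptions; the training sites are the 14 GAGnGAG-type sequences of Table 2.

```python
import random
from collections import Counter

# GAGnGAG-type training sites from Table 2 (25 nt each).
GAGNGAG_SITES = [
    "cgtgcgtaagagcgagatacagata", "ggccgagtcgagtgagttgagtcgg",
    "ttcggacaacagagtgaatgagagt", "tggagaggagagcaagacggcaaga",
    "aaaataactgagtgacaaaaaaaaa", "agagttaacgcgagagccgattgaa",
    "tcgtctacggagcgacaattcaatt", "gcgcagagcgagagggcgcgactct",
    "gcaggcacggagaggggaagttatc", "gcattgccagatggagcgcaggaag",
    "tgtatctatgagagaagtgcctgcc", "tgcggtatggagagatgcgcacgct",
    "tacgagggtgagtgagaatctttcg", "gcgctccgtgagcgggaaaacaata",
]
ALPHABET = "ACGT"


def train_profile(sites, pseudocount=0.5):
    """Per-position base frequencies with pseudocounts (hypothetical stand-in for SITECON)."""
    total = len(sites) + len(ALPHABET) * pseudocount
    profile = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        profile.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return profile


def similarity(profile, window):
    """Mean of freq(observed base) / freq(most common base) over positions; lies in (0, 1]."""
    return sum(col[b] / max(col.values()) for col, b in zip(profile, window)) / len(profile)


def jackknife_type1(sites, threshold):
    """Type 1 error: share of sites scoring below threshold when left out of training."""
    misses = 0
    for i, held_out in enumerate(sites):
        profile = train_profile(sites[:i] + sites[i + 1:])
        if similarity(profile, held_out) < threshold:
            misses += 1
    return misses / len(sites)


def shuffled_background(sites, length, seed=0):
    """Random sequence built by repeatedly shuffling the training sample's nucleotides."""
    rng = random.Random(seed)
    pool = list("".join(sites))
    chunks, total = [], 0
    while total < length:
        rng.shuffle(pool)
        chunks.append("".join(pool))
        total += len(pool)
    return "".join(chunks)[:length]


def type2_rate(profile, background, threshold):
    """Type 2 error proxy: share of background windows scoring at or above threshold."""
    w = len(profile)
    n_windows = len(background) - w + 1
    hits = sum(similarity(profile, background[i:i + w]) >= threshold for i in range(n_windows))
    return hits / n_windows


if __name__ == "__main__":
    sites = [s.upper() for s in GAGNGAG_SITES]
    profile = train_profile(sites)
    # The text uses a 1 x 10^6 nt background; a shorter one keeps this pure-Python sketch fast.
    background = shuffled_background(sites, 10_000)
    for t in (0.6, 0.7, 0.8):
        print(t, jackknife_type1(sites, t), type2_rate(profile, background, t))
```

Raising the threshold trades type 2 errors for type 1 errors; tabulating both rates over a range of thresholds, as in Supplementary Table 2, is what allows a working cut-off to be chosen.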
The type 2 errors (overpredictions) for each method were estimated by recognizing binding sites in a random sequence 1 × 10⁶ nt long, generated by repeatedly shuffling the nucleotides of the sequences in the corresponding training sample. The estimates of recognition errors at various levels of conformational similarity are listed in Supplementary Table 2.

2.3. Preparation of GAGA fusion protein

The histidine-tagged GAGA fusion protein (His-GAGA) contains amino acids 5–519 of GAGA fused in frame to the six histidine residues of pET-28a (Novagen, USA). To produce the construct, a fragment of the Trl gene was generated by PCR amplification of Drosophila ovary cDNA (kindly provided by D.A. Karagodin, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia) using the sense primer 5′-aagag-ctcat-gtcgc-tgcca-atgaa-ttcg-3′ and the antisense primer 5′-tgaag-acctg-atcgc-ccac-3′. This fragment was initially inserted into the pBluescript (pBS) vector (Stratagene). The pBS carrying the cDNA insert was digested with the restriction endonucleases NdeI and HindIII, and the Trl fragment was inserted into the pET-28a vector digested with the same enzymes. The construct was verified by sequencing and transformed into Escherichia coli BL21(DE3). The cells were grown to an OD600 of 0.6–0.8, induced with 0.3 mM IPTG, grown for an additional 4 h, and harvested by centrifugation. His-GAGA was purified on a His-Trap HP column (Amersham Biosciences) according to the manufacturer’s recommendations. Protein concentration was determined spectrophotometrically (SmartSpec Plus Spectrophotometer, Bio-Rad Laboratories) as Protein (mg/ml) = 1.55 × (A280 − A320) − 0.76 × (A260 − A320).

2.4. Electrophoretic mobility shift assays (EMSA)

The oligonucleotides (Table 1) were synthesized by V.F. Kobzev (Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia). After annealing, the


Table 1
Potential GAGA sites predicted by methods 1 and 2 in the regulatory regions of D. melanogaster genes and the results of their experimental verification.

(a) Potential GAGA sites recognized by method 1 (GAGnGAG)

No.  Gene                                Position[a]/strand  Similarity[b]  Sequence                     Confirmed
1    domeless (dome)                     −321/−              0.882          ttTGTCCGTGTGAGTGAGAAAGTGACG  +
2    midline fasciclin (mfas)            −1516/−             0.856          ttCTTACATCGGAGAGAGCTGCACGCT  +
3    string (stg)                        −246/−              0.868          ttTAATAATGTGAGATCCACGCATTCC  +
4    rhino (rhi)                         −2448/+             0.878          ttGCAAAAACCGAGAGACAAAAGCAGG  +
5    cut up (ctp)                        −1407/−             0.862          ttGGGCTGCGAGAGCGAGAGCGGGAGA  +
6    Notch (N)                           −950/+              0.876          ttAAAACATTTGAGCGATCTACCAAAA  −
7    bunched (bun)                       −1243/−             0.865          ttTCCACTAAGGAGTTCTATATATGTA  −
8    grim (grim)                         −74/+               0.851          ttATTCACTGCGAGAGAGGAACTCGCT  +
9    BarH1 (B-H1)                        −307/+              0.908          ttGAAACCGGGGAGAGACTTTATTTCA  +
10   BarH2 (B-H2)                        −156/+              0.857          ttCAGGTCAATGAGCGAAAAAGAGACG  +
11   ovarian tumor (otu)                 −652/+              0.867          ttTGAAACACCGAGAGGAAATAGAATT  +
12   cup (cup)                           −1349/−             0.854          ttACCTTCGCCGAGTGATATGACTACT  −
13   poly U binding factor 68kD (pUf68)  −2602/+             0.859          ttCTCGGTAGAGAGCTCCAAACTGGAC  +
14   pipsqueak (psq)                     −133/−              0.859          ttTTTAAAATTGAGTGAGAACTCATTT  +
15   combgap (cg)                        −466/−              0.881          ttCTGAACACGGAGAGAATTCGCTATT  +
16   singed (sn)                         −1794/+             0.874          ttACGGGTCCTGAGCGAGAAACAGAAG  −
17   sickle (skl)                        −1780/−             0.881          ttTCAAGACGAGAGTGATGAAAGGATA  −
18   reaper (rpr)                        −3848/+             0.908          ttAAGGTAGGAGAGTGAGTTAATGACC  +

(b) Potential GAGA sites recognized by method 2 (GAGnnnGAG)

No.  Gene                                Position[a]/strand  Similarity[b]  Sequence                     Confirmed
1    gurken (grk)                        −14/−               0.840          ttAGGAAAACAGAGCCAGAGAAACAGA  +
2    chickadee (chic)                    −995/−              0.775          ttCAGTGCAGGGAGTGGGAGTGAGAGG  +
3    effete (eff)                        −2421/+             0.792          ttCATGCGAGGGAGGGAGAGAACGAAC  +
4    bifocal (bif1)                      −1528/−             0.858          ttTTTTTACTAGAGTAAGAGTTCGGCT  +
5    bifocal (bif2)                      −1717/+             0.760          ttTTATGATCCGAGGTGGAGCAGACGA  +
6    frayed (fray)                       −125/−              0.822          ttATTTGTAGGGAGTGAGAGTGTGTTA  +
7    thickveins (tkv)                    −502/−              0.789          ttCGTGTGATTGAGGTTGAGTTTTTTT  +
8    saxophone (sax)                     −154/+              0.786          ttCTTTGATGGGAGTTCGAGTTTGACG  +
9    slow border cells (slbo)            −284/−              0.809          ttCCTGTGGATGAGAGCGAGAGCCGCG  +
10   brinker (brk)                       −202/−              0.826          ttCAGCAGTGGGAGAGAGAGTGAGAGG  +
11   longitudinals lacking (lola)        −145/−              0.817          ttAGAACAAGTGAGTGCGAGCGAGATG  +
12   scalloped (sd)                      −67/+               0.836          ttCTCCATTTCGAGATAGAGATCTTCT  +
13   grainy head (grh)                   −179/+              0.803          ttTCTCTCGGAGAGCGTGAGCTTAAAG  +
14   enhancer of yellow 3 (e(y)3)        −888/−              0.788          ttCCGTTCAGCGAGTTCTCGCCGACAC  +
15   broad (br)                          −90/−               0.777          ttTCGCAGGCTGAGGCTGAGACTGAGG  −
16   Trithorax-like (Trl1)               +330/+              0.801          ttGGAGAACGAGAGAGTGAGACGTAGA  +
17   Trithorax-like (Trl2)               +363/+              0.819          ttGCAGACAGAGAGGGAGAGAGCGAGA  +
18   Trithorax-like (Trl3)               +1905/+             0.812          ttGAGGACGGAGAGTGTGAGCGACGCT  +

[a] For all genes except Trl, position is given relative to the transcription start site; for Trl, relative to the beginning of the second intron. Strand (chain direction): + or −.
[b] Level of conformational similarity to the experimentally verified GAGA sites, determined using SITECON.
The lowercase tt at the 5′ end of each sequence is the overhang required for (α-32P)-dATP labeling. The regions homologous to GAGnGAG and GAGnnnGAG are shaded. Confirmed: +, binding to His-GAGA detected by EMSA; −, no binding detected.

oligonucleotides were labeled with Klenow enzyme and (α-32P)-dATP and used as DNA probes in EMSA. The binding reaction mixture (12 μl) contained 2.6 μg of recombinant protein, 10 ng of 32P-labeled DNA probe, and 30 ng of poly(dI-dC) in band shift buffer (25 mM HEPES pH 7.8, 80 mM KCl, 0.2 mM EDTA, 1 mM DTT, 0.2 mM EGTA, and 10% glycerol). After incubation on ice for 10 min, the reaction mixture was analyzed by nondenaturing PAGE (4.5%).

3. Results and discussion

3.1. Structural variants of GAGA binding sites

We have compiled the largest sample to date of GAGA sites with strict experimental confirmation that the transcription factor GAGA binds directly to the corresponding DNA regions. This sample comprises 120 sites from 18 D. melanogaster genes (Supplementary Table 1). Note that most of the known GAGA binding sites are located in genes controlling development. Analysis of the primary structures of the sites in the sample has demonstrated that, despite their apparent diversity, all GAGA sites can be arranged into four groups. The first group unites sites in which two GAG trinucleotides, each with no more than one nucleotide substitution, are separated by a spacer of 1 or 3 nucleotides (consensuses GAGnGAG and GAGnnnGAG, respectively; see Tables 2 and 3). Sites of the GAGnGAG type account for 12%, and sites containing the GAGnnnGAG motif for 21%, of all analyzed GAGA sites. The second group, accounting for 16% of the total sample, comprises sites containing a single GAGAG motif. The largest group (31%) unites sites corresponding to microsatellites (group 3), with the number of GA repeats varying from three to nine (Supplementary Table 3). Finally, the fourth group (20%) contains various sites, most commonly of the GAGnGAGnGAGnGAGnGAG, GAGnnnGAGnnnGAG, and GAGnnnGAGnnnGAGnnnGAG types, as well as several sites containing GAGnnnnnCTC and CTCnnnnGAG inverted repeats. The common feature of all analyzed GAGA binding sites is the presence of one or several homologs of the GAG trinucleotide (with no more than one substituted base) separated by spacers of various lengths. In most cases, the trinucleotides are in direct orientation relative to one another, although they can also occur in inverted orientation. Thus, the GAG trinucleotide can be regarded as the elementary structural unit from which GAGA sites are assembled. This is supported by the finding that the presence of such a trinucleotide in a synthetic oligonucleotide is sufficient to bind


Table 2
Training sample of GAGA binding sites of the GAGnGAG type. The regions homologous to GAGnGAG are shaded.

No.  Gene      Sequence                   Reference
1    Ubx       cgtgcgtaagagcgagatacagata  Mahmoudi et al. (2003)
2    Ubx       ggccgagtcgagtgagttgagtcgg  Biggin and Tjian (1988)
3    Abd-B     ttcggacaacagagtgaatgagagt  Busturia et al. (2001)
4    Trl       tggagaggagagcaagacggcaaga  Kosoy et al. (2002)
5    Trl       aaaataactgagtgacaaaaaaaaa  Kosoy et al. (2002)
6    en        agagttaacgcgagagccgattgaa  Soeller et al. (1988, 1993)
7    Hsp70Bb   tcgtctacggagcgacaattcaatt  Weber et al. (1997); Katsani et al. (1999)
8    ftz       gcgcagagcgagagggcgcgactct
9    ftz       gcaggcacggagaggggaagttatc  Katsani et al. (1999)
10   Trl       gcattgccagatggagcgcaggaag  Kosoy et al. (2002)
11   Trl       tgtatctatgagagaagtgcctgcc  Kosoy et al. (2002)
12   Ubx       tgcggtatggagagatgcgcacgct  Mahmoudi et al. (2003)
13   Kr        tacgagggtgagtgagaatctttcg  Kerrigan et al. (1991)
14   Trl       gcgctccgtgagcgggaaaacaata  Kosoy et al. (2002)

GAGA (Wilkins and Lis, 1998). It is also known that the zinc finger of the DNA-binding domain of this protein interacts with this particular trinucleotide (Omichinski et al., 1997). However, the interaction of the GAGA protein with a DNA region containing only one GAG trinucleotide is likely to be insufficiently stable. In particular, sites containing 2.5 or more GA repeats have been shown to have a higher affinity for this protein than sites containing a single GAG trinucleotide (Wilkins and Lis, 1998). For many transcription factors, the stability of DNA binding has been shown to increase significantly when dimers or multimers bind to variously arranged repeats of the sites recognized by the monomer of the corresponding protein (Merkulov and Merkulova, 2009). GAGA is also known to interact with its sites as a dimer or multimer (Espinas et al., 1999; Katsani et al., 1999); consequently, the presence of two or more repeats of the GAG trinucleotide in the sites of groups 1, 3, and 4 should provide the necessary stability of the complexes. In addition, the many structural variants among the sites of these groups likely reflect the architectural diversity of the protein complexes binding to them.
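The four-group scheme above is a manual classification. As a rough, hypothetical aid (not a method used in this work), the motif patterns behind groups 1 to 3 can be tagged with a few string checks. The sketch assumes the rule stated above, at most one substitution per GAG trinucleotide, and scans only the given strand; group 4 (longer GAG arrays and inverted CTC/GAG repeats) is not covered, and the function names are illustrative. Because 10 of the 64 trinucleotides lie within one substitution of GAG, a GAG-like pair at a fixed spacing is expected by chance at about (10/64)² ≈ 2.4% of positions in a uniform random sequence, so such tags describe known sites rather than predict new ones.

```python
import re


def gag_like(tri: str, max_mismatch: int = 1) -> bool:
    """True if a trinucleotide differs from GAG at no more than max_mismatch positions."""
    return len(tri) == 3 and sum(a != b for a, b in zip(tri, "GAG")) <= max_mismatch


def paired_gag_hits(seq: str, spacer: int) -> list[int]:
    """Start positions of GAG-like + spacer + GAG-like (spacer=1: GAGnGAG; spacer=3: GAGnnnGAG)."""
    s = seq.upper()
    span = 6 + spacer
    return [i for i in range(len(s) - span + 1)
            if gag_like(s[i:i + 3]) and gag_like(s[i + 3 + spacer:i + span])]


def structural_features(seq: str) -> dict[str, bool]:
    """Report which group 1-3 motif patterns occur on the given strand of seq."""
    s = seq.upper()
    return {
        "GAGnGAG": bool(paired_gag_hits(s, 1)),       # group 1, spacer of 1 nt
        "GAGnnnGAG": bool(paired_gag_hits(s, 3)),     # group 1, spacer of 3 nt
        "GAGAG": "GAGAG" in s,                        # group 2
        "(GA)n>=3": re.search(r"(?:GA){3,}", s) is not None,  # group 3
    }


# Usage on training site 3 of Table 2 (Abd-B):
print(structural_features("ttcggacaacagagtgaatgagagt"))
# -> {'GAGnGAG': True, 'GAGnnnGAG': True, 'GAGAG': True, '(GA)n>=3': False}
```

The example shows how permissive the substitution rule is: a site assigned to the GAGnGAG training sample also carries GAGnnnGAG-like and GAGAG tags, which is one reason a statistical recognizer trained on a homogeneous sample is needed.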

There are two known isoforms of the GAGA protein (GAGA519 and GAGA581), which are produced by alternative splicing and can form heteromers with one another (Benyajati et al., 1997) and with other proteins. A well-known example of such a heteromer is the complex of GAGA with the protein Pipsqueak (Psq), which also interacts with (GA)-rich sequences (Schwendemann and Lehmann, 2002). Interactions of GAGA with the transcription factor Tramtrack (TTK) and the protein Pleiohomeotic (Pho), whose binding sites differ from those of GAGA, have also been demonstrated (Pagans et al., 2002; Mahmoudi et al., 2003). Presumably, the binding of one GAGA molecule to a site containing a single GAG trinucleotide (group 2) is stabilized by interaction with another transcription factor bound to closely located sites, as has been demonstrated for several other factors (Merkulov and Merkulova, 2009). Thus, our classification covers all the variants of GAGA sites discovered so far and agrees well with the available data on the mechanisms by which the GAGA factor interacts with its sites in DNA. Note that, in addition to the GAGA binding sites described above, which are found in euchromatin, regions interacting with this protein have been

Table 3
Training sample of GAGA binding sites of the GAGnnnGAG type. The regions homologous to GAGnnnGAG are shaded.

No.  Gene      Sequence                      Reference
1    en        cgagtgcgggagcgagagggcgagatcc  Soeller et al. (1988, 1993)
2    Abd-B     gatgtgaaagagagcgtgctcttggcct  Mishra et al. (2001)
3    αTub84B   ataactgaagtgggagaggtaatggcac  O’Donnell and Wensink (1994)
4    ftz       aggtgcgcagagcgagagggcgcgactc  Katsani et al. (1999)
5    Trl       ggcgtcgtggaggaggagtagaaagggg  Kosoy et al. (2002)
6    Kr        tctttcgccgagacagagcgtacttata  Kerrigan et al. (1991)
7    Kr        cgtgcgtgtgagcgggagagccaattga  Kerrigan et al. (1991)
8    Trl       tacgcaacagagcgagagcgcgcgagac  Kosoy et al. (2002)
9    Trl       ttggtaaatgagtcagagtcaagctcaa  Kosoy et al. (2002)
10   Trl       tacgggcaagagaaagagaaggcacaaa  Kosoy et al. (2002)
11   Trl       cagcaagaagaggaagagcaaagggagt  Kosoy et al. (2002)
12   Abd-B     tgtgttagtgcgtgagagtaagtgagac  Busturia et al. (2001)
13   αTub84B   cgataactgaagtgggagaggtaatggc  O’Donnell and Wensink (1994)
14   αTub84B   ggtattttgcaggactagttgggactcg  O’Donnell and Wensink (1994)
15   Ubx       cagcgagagaagaaagagcaaggcgaaa  Mahmoudi et al. (2003)
16   dpp       ttacctgccgaagaagaagagcagcgtt  Schwyter et al. (1995)
17   Eip74EF   tgaagttcagagcgtgaacttgagcgtt  Thummel (1989)
18   Eip74EF   tgcgctggtgtgcgtgagcgggtctcag  Thummel (1989)
19   en        aagagatgagagtagaagatgtgtcttt  Soeller et al. (1988, 1993)
20   en        ggagtgagtgagccggcgaaaccggttc  Soeller et al. (1988, 1993)
21   Hsp26     gcaattttctagaaagagcgcaaaagaa  Sandaltzopoulos et al. (1995)
22   Trl       gatggagaggagagcaagacggcaagac  Kosoy et al. (2002)
23   Trl       cgtctgtctgagtttgtgcgcgtgtgtg  Kosoy et al. (2002)
24   Trl       aatggaaaagaaaaagagtacagacgtt  Kosoy et al. (2002)
25   Ubx       gagagtcccaagcgagagcttttcatag  Mahmoudi et al. (2003)


Fig. 1. Examples of experimental confirmation of GAGA binding sites predicted by (A) method 1 and (B) method 2. DNA probes: (1 and 2) oligonucleotide corresponding to the known GAGA site (5′-ggcatagagagggagagaggcgtca-3′) from the bxd Polycomb response element, used as a positive control (Horard et al., 2000); (3) oligonucleotide containing (TTTG)6, used as a negative control; (4–10) oligonucleotides corresponding to predicted sites. Lane 1, E. coli extract without His-GAGA; lanes 2–10, extract containing His-GAGA.

also found in heterochromatin. These GAGA sites, of the (AAGAG)n and (AAGAGAG)n types, differ in structure from the sites in gene regulatory regions and are arranged in extremely long satellite repeats (Platero et al., 1998; Raff et al., 1994). In particular, (AAGAG)n occupies 7 Mb of the D. melanogaster Y chromosome, which consists mainly of heterochromatin (Lohe et al., 1993). Presumably, the differences in structure and length between the GAGA sites in euchromatin and heterochromatin reflect differences in their functions.

3.2. Detection of potential GAGA binding sites by SITECON and their experimental verification

Development of computational methods for predicting transcription factor binding sites is very important for the annotation of eukaryotic genomes. However, a high rate of false positive predictions is a serious obstacle to reliable recognition of transcription factor binding sites (Farnham, 2009). One of the main causes of poor recognition quality is heterogeneity of the training sample, that is, a situation in which the binding sites for the same transcription factor are represented by different structural variants; combining them in a single sample considerably degrades recognition quality (Kolchanov et al., 2007; Merkulova et al., 2007). In particular, combining sites of the GAGnGAG and GAGnnnGAG or CTCnnnnGAG types in one training sample would severely impair most computational recognition methods, since these methods are sensitive both to variation in spacer length and to the mutual orientation of half-sites (Kolchanov et al., 2007). Therefore, we constructed two separate training samples for sites of the GAGnGAG and GAGnnnGAG types (Tables 2 and 3) and, using the SITECON software package, developed method 1 and method 2 for recognizing sites of these types, respectively.
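The effect of mixing spacer variants can be seen even in a toy example. The sketch below uses hypothetical 9-nt cores (invented for illustration, not taken from Tables 2 or 3), aligns them on the first GAG, and reports, for each column, the fraction of sequences that carry the column's most common base. Position-based recognizers depend on this kind of column conservation; SITECON itself works with conformational properties rather than base frequencies, so this is only an analogy.

```python
from collections import Counter


def column_conservation(aligned: list[str]) -> list[float]:
    """Fraction of sequences carrying the most common base, column by column."""
    n = len(aligned)
    return [max(Counter(seq[i] for seq in aligned).values()) / n
            for i in range(len(aligned[0]))]


# Hypothetical toy cores, aligned on the first GAG (invented for illustration).
SPACER1 = ["GAGAGAGTT", "GAGTGAGTT", "GAGCGAGTT"]  # GAGnGAG, padded with TT to 9 nt
SPACER3 = ["GAGAAAGAG", "GAGTCAGAG", "GAGCGTGAG"]  # GAGnnnGAG, 9 nt

pure = column_conservation(SPACER1)
mixed = column_conservation(SPACER1 + SPACER3)
for col, (p, m) in enumerate(zip(pure, mixed)):
    print(f"column {col}: spacer-1 only {p:.2f}, mixed {m:.2f}")
```

In the spacer-1 set, the second GAG (columns 4 to 6) is fully conserved; once the spacer-3 cores are added, columns 4, 5, 7, and 8 drop to 0.67, 0.83, 0.50, and 0.50, respectively, because the second GAG no longer sits at a fixed offset from the first.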
Sites of the GAGnGAG and GAGnnnGAG types together constitute about one-third of the known sites in the sample (39 of 120). In addition, method 1 also recognizes sites corresponding to microsatellites, which account for approximately 30% of the sample, and both methods can detect many group 4 sites. Potential sites were searched for using method 1 at a level of conformational similarity above 0.85 and using method 2 at a level of 0.76 or higher (Supplementary Table 2). At the first stage, we examined the promoter regions (−4000/+100 bp) of selected D. melanogaster genes involved in the control of development (Table 1), as well as the second intron of the Trl gene, in the 5′ region of which 27 GAGA sites had been found (Kosoy et al., 2002). Since most of the known GAGA binding sites are located in the regulatory regions of developmental genes, we expected that the selected group would also contain many GAGA target genes. Moreover, DamID chromatin profiling demonstrated GAGA binding to the regions of nine of the 33 selected genes in Drosophila Kc cells (van Steensel et al., 2003). In total, 264 sites were detected in the 5′ regions of the 33 genes: 124 by method 1 and 140 by method 2. An additional 19 sites were found in the second intron of Trl. We selected 36 sites for experimental verification (Table 1), aiming (1) to take at least one site from each gene under study and (2) to cover sites at different locations within the −4000/+100 bp promoter regions. Verification was performed by EMSA with recombinant His-GAGA (see examples in Fig. 1A and B). Both recognition methods proved highly efficient: binding to the recombinant protein was observed for 13 of the 18 sites predicted by method 1 and for 17 of the 18 sites predicted by method 2 (Table 1). Thus, the experimentally determined overprediction rate is below 30% for method 1 (5 of 18 sites) and only about 6% for method 2 (1 of 18 sites). Consequently, method 2 with the chosen parameters can be regarded as a very reliable tool for finding new GAGA sites. The reliability of method 1 is somewhat lower, although even in this case over two-thirds of the predicted sites actually bind the GAGA factor. The difference in quality between the two methods is most likely due to the smaller training sample used to construct method 1 (14 sites) compared with method 2 (25 sites). Overall, these experiments identified 30 new GAGA binding sites (Table 1), most of them in genes not previously known to be targets of this protein.
The exception is Trl itself, whose promoter harbors multiple GAGA binding sites, suggesting that GAGA may regulate its own gene (Kosoy et al., 2002). Intron 2 of Trl was included in our analysis because we had earlier found that deletions of various regions in this intron reduce fly viability (Fedorova et al., 2006). The identification of three GAGA binding sites in intron 2 (Table 1) suggests that this region may also be involved in Trl autoregulation. In addition, nine of the genes in which we detected GAGA binding sites (effete, gurken, string, chickadee, midline fasciclin, frayed, domeless, bifocal, and longitudinals lacking) had earlier been identified by van Steensel et al. (2003) as potential targets of the GAGA factor.

3.3. Detection of potential GAGA binding sites in Drosophila DNA sequences

After experimental verification, both computational methods were used to analyze the distribution of potential sites in different


Fig. 2. Density of the potential binding sites of types (A) GAGnGAG and (B) GAGnnnGAG in various D. melanogaster DNA sequences. The first group of columns shows the site density in the whole genome sequence and the promoter regions of 6947 random genes; the second group, the site density in the sequences of 33 developmental genes; and the third group, the density in the sequences of 202 genes according to van Steensel et al. (2003).

regions of the 33 studied genes. As was expected, the promoter regions (−4000/100 bp) of these genes were enriched for potential GAGA sites (1.5-fold) as compared with the average density of these sites in D. melanogaster genome as well as in the sample of 6947 promoters (−4000/100 bp) of Drosophila genes (Fig. 2A and B). Analysis of a narrower promoter region (−1000/+100 bp) gave the same result. Surprisingly, a considerably more pronounced enrichment was observed in 5 UTRs (1.9-fold for method 1 and 3.1-fold for method 2) and the first intron (2.2- and 3.2-fold, respectively) of these genes. The density of predicted GAGA sites in the remaining introns was slightly higher than in the promoter regions, while in CDSs it was comparable to the average density over the genome. The frequency of GAGA sites in 3 UTRs was lower than on the average over the genome, namely, 1.5- and 1.4-fold lower for methods 1 and 2, respectively (Fig. 2A and B). An unusual distribution pattern of the GAGA sites, when their density in 5 UTRs and first introns was considerably higher as compared with the promoter regions, suggested us to analogously examine the same regions in other potential GAGA target genes. For this purpose, we took 202 genes with experimentally demonstrated by Dam methylation (van Steensel et al., 2003) GAGA binding to

the regions of their localization. It was shown that the frequency of GAGA sites in their promoter regions (−4000/+100 bp) was 0.824/1000 and 0.783/1000 bp for methods 1 and 2, respectively, which is comparable to their density for the sample of promoter regions of 6947 random Drosophila genes and considerably lower as compared with the promoter regions of the 33 selected developmental genes. However, the frequency of GAGA sites in the 5 UTRs of 202 genes was 2- and 2.7-fold (for methods 1 and 2, respectively) higher than the average value for the genome, which is comparable to the density of GAGA sites in the 5 UTRs of the earlier studied 33 genes. The first introns of these genes also displayed an elevated density of GAGA sites, although it was slightly lower than in the analogous regions of the 33 genes (Fig. 2A and B). For the control, we studied the promoters, 5 UTRs, and the first introns of the sample of genes (mainly, housekeeping genes; Supplementary Table 4) that were unlikely to be the targets for the GAGA transcription factor. The density of potential GAGA sites in the promoters, 5 UTRs, and the first introns was comparable to the average genomic density according to both methods. Thus, characteristic of the GAGA target genes is a high frequency of potential GAGA binding sites in the 5 UTRs and introns on the background of a relatively low density of these sites in promoter regions. This distinguished them from the target genes for many other transcription factors, the binding sites for which have been shown to be most abundant in the promoter regions of the target genes (Ananko et al., 2007; Farnham, 2009; Klimova et al., 2006; Rabinovich et al., 2008). Presumably, the unusual distribution pattern of GAGA binding sites is connected with that this protein in addition to transcription initiation is involved in its elongation. 
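The region-wise comparison described here (sites per 1000 bp in promoter windows, 5′ UTRs, introns, and so on, expressed as fold enrichment over the genome average) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the SITECON-based methods 1 and 2 are replaced by a plain exact-match scan for GAGnGAG and GAGnnnGAG (plus the reverse-strand forms CTCnCTC and CTCnnnCTC), which ignores the sequence variability of real sites and the conformational and physicochemical DNA properties SITECON relies on, so it will not reproduce the reported densities. The function names (`promoter_window`, `sites_per_kb`, `fold_vs_genome`) and the 0-based, half-open coordinate convention are hypothetical choices.

```python
import re

# Exact-match consensus patterns (a simplification, NOT the SITECON methods 1 and 2).
# The lookahead lets hits overlap; the CTC alternatives catch reverse-strand sites.
PATTERNS = {
    "GAGnGAG": re.compile(r"(?=(GAG.GAG|CTC.CTC))"),
    "GAGnnnGAG": re.compile(r"(?=(GAG...GAG|CTC...CTC))"),
}


def promoter_window(tss: int, strand: str, seq_len: int,
                    upstream: int = 4000, downstream: int = 100) -> tuple[int, int]:
    """Half-open [start, end) interval from -upstream to +downstream around a 0-based TSS.

    The interval is mirrored for minus-strand genes and clipped to the sequence bounds.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream + 1
    elif strand == "-":
        start, end = tss - downstream, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(0, start), min(seq_len, end)


def sites_per_kb(region: str, site_type: str) -> float:
    """Number of pattern hits per 1000 bp of the region."""
    if not region:
        return 0.0
    hits = PATTERNS[site_type].findall(region.upper())
    return 1000.0 * len(hits) / len(region)


def fold_vs_genome(region_per_kb: float, genome_bp_per_site: float) -> float:
    """Region density divided by the genome-wide density (1000 / bp per site)."""
    return region_per_kb * genome_bp_per_site / 1000.0


if __name__ == "__main__":
    toy = "TT" + "GAGAGAG" + "A" * 91          # hypothetical 100-bp region
    print(sites_per_kb(toy, "GAGnGAG"))          # 10.0
    print(sites_per_kb(toy, "GAGnnnGAG"))        # 0.0
    print(promoter_window(5000, "-", 10000))     # (4900, 9001)
    # Reported values: 0.824 sites/kb in promoters of the 202 genes (method 1)
    # versus one GAGnGAG-type site per 1342 bp genome-wide.
    print(round(fold_vs_genome(0.824, 1342), 2)) # 1.11
```

Half-open intervals keep window lengths easy to check (4101 bp for −4000/+100) and slice directly into Python strings. The fold-enrichment helper takes the genome average as bp per site, the form in which the genome-wide frequency is reported in this discussion; with the reported numbers, the promoters of the 202 genes come out only about 1.1-fold above the genome average for method 1.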
Consistent with a role in elongation, GAGA factor is known to interact directly with dFACT, the Drosophila FACT (facilitates chromatin transcription) complex, which is important for transcription elongation (Shimojima et al., 2003). This suggests that the observed clustering of GAGA sites near the transcription start site, but downstream rather than upstream of it (where binding sites are usually concentrated), is optimal for the involvement of this protein in both processes (O’Brien et al., 1995). Note also the overall high frequency of potential GAGA binding sites in the Drosophila genome: on average, one GAGnGAG-type site occurs every 1342 bp and one GAGnnnGAG-type site every 1410 bp. This high genome-wide frequency can be explained in several ways. First, the involvement of the ubiquitous protein GAGA in regulating numerous processes in many organs and tissues at different stages of Drosophila development implies a correspondingly large number of target genes, most of which are still unknown. It is likely that, in the chromatin of a particular cell type at a particular stage, only some of these binding sites are accessible to the transcription factor; in other cell types and situations, changes in chromatin state can make another set of GAGA binding sites accessible. Second, GAGA factor can perform architectural functions in chromosomes; for example, owing to its ability to oligomerize, GAGA can maintain connections between remote regions (Mahmoudi et al., 2002). Presumably, some GAGA sites in the genome are not functional, because they are located either in constitutively inactive chromatin or in regions lacking the other transcription factor binding sites needed to assemble protein complexes capable of influencing transcription efficiency (Rister and Desplan, 2010).
In addition, since the concentration of GAGA factor in cell nuclei is rather high (Wilkins and Lis, 1998), the numerous binding sites for this factor in the genome are likely to act as so-called “traps” that regulate the effective concentration of free transcription factor molecules in the nucleus (Moorman et al., 2006). However, it cannot be excluded that some of these traps serve to raise the local GAGA concentration in particular chromatin regions (Kolchanov et al., 2007).
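The “trap” argument can be made concrete with a toy equilibrium model. This is an illustrative assumption, not an analysis from the article: a single factor binds 1:1 to identical, independent sites with a hypothetical dissociation constant `kd`, the oligomerization and cooperative binding that GAGA is known to show are ignored, and all concentrations are hypothetical and in arbitrary units. Solving the mass-balance equation shows how sites in excess of the factor deplete the free pool.

```python
import math


def free_tf(total_tf: float, total_sites: float, kd: float) -> float:
    """Free transcription factor concentration at equilibrium (toy model).

    Assumes 1:1 binding to identical, independent sites with dissociation
    constant kd; all quantities share the same arbitrary concentration units.
    Mass balance: total_tf = F + total_sites * F / (kd + F), which rearranges to
    F**2 + (kd + total_sites - total_tf) * F - total_tf * kd = 0.
    """
    if kd <= 0:
        raise ValueError("kd must be positive")
    if total_tf <= 0:
        return 0.0
    b = kd + total_sites - total_tf
    disc = math.sqrt(b * b + 4.0 * total_tf * kd)
    # Positive root of the quadratic; use the algebraically equivalent form
    # that avoids subtractive cancellation when b is positive.
    if b >= 0:
        return 2.0 * total_tf * kd / (b + disc)
    return (-b + disc) / 2.0


if __name__ == "__main__":
    # Hypothetical numbers: 1.0 unit of factor, kd = 0.01, increasing numbers of trap sites.
    for sites in (0.0, 0.5, 2.0):
        print(sites, round(free_tf(1.0, sites, 0.01), 4))
```

With these hypothetical values, adding 2.0 units of sites lowers the free factor concentration roughly a hundredfold (from 1.0 to about 0.01), whereas with fewer sites than factor molecules the free pool drops only by about the number of sites.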

Thus, the computer methods developed and experimentally verified here for recognizing potential GAGA sites have yielded new information about the organization of regulatory regions in the target genes of this transcription factor and about the abundance of these sites in the D. melanogaster genome. We hope that these methods will be useful both for further studies of known and putative target genes of GAGA factor in Drosophila and for the search for binding sites and target genes of GAGA homologs in other species.

Acknowledgements

The authors are sincerely grateful to E.V. Antontseva (Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences) for her assistance in performing EMSA. The work was supported by the Russian Foundation for Basic Research (grant 10-04-01011) and the RAS Program “Molecular and cell biology” A.II.6.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.compbiolchem.2011.10.008.

References

Ananko, E.A., Kondrakhin, Y.V., Merkulova, T.I., Kolchanov, N.A., 2007. Recognition of interferon-inducible sites, promoters, and enhancers. BMC Bioinformatics 8, 56.
Baricheva, E.M., Katokhin, A.V., Perelygina, L.M., 1997. Expression of Drosophila melanogaster gene encoding transcription factor GAGA is tissue-specific and temperature-dependent. FEBS Lett. 414, 285–288.
Benyajati, C., Mueller, L., Xu, N., Pappano, M., Gao, J., Mosammaparast, M., Conklin, D., Granok, H., Craig, C., Elgin, S., 1997. Multiple isoforms of GAGA factor, a critical component of chromatin structure. Nucleic Acids Res. 25, 3345–3353.
Bhat, K.M., Farkas, G., Karch, F., Gyurkovics, H., Gausz, J., Schedl, P., 1996. The GAGA factor is required in the early Drosophila embryo not only for transcriptional regulation but also for nuclear division. Development 122, 1113–1124.
Biggin, M.D., Tjian, R., 1988. Transcription factors that activate the Ultrabithorax promoter in developmentally staged extracts. Cell 53, 699–711.
Brock, H.W., Fisher, C.L., 2005. Maintenance of gene expression patterns. Dev. Dyn. 232, 633–655.
Busturia, A., Lloyd, A., Bejarano, F., Zavortink, M., Xin, H., Sakonju, S., 2001. The MCP silencer of the Drosophila Abd-B gene requires both Pleiohomeotic and GAGA factor for the maintenance of repression. Development 128, 2163–2173.
Chung, Y.T., Keller, E.B., 1990. Regulatory elements mediating transcription from the Drosophila melanogaster actin 5C proximal promoter. Mol. Cell. Biol. 10, 206–216.
Dos-Santos, N., Rubin, T., Chalvet, F., Gandille, P., Cremazy, F., Leroy, J., Boissonneau, E., Théodore, L., 2008. Drosophila retinal pigment cell death is regulated in a position-dependent manner by a cell memory gene. Int. J. Dev. Biol. 52, 21–31.
Efron, B., Gong, G., 1983. A leisurely look at the bootstrap, the jackknife, and cross-validation. Am. Statistician 37, 36–48.
Espinas, M.L., Jimenez-Garcia, E., Vaquero, A., Canudas, S., Bernués, J., Azorín, F., 1999. The N-terminal POZ domain of GAGA mediates the formation of oligomers that bind DNA with high affinity and specificity. J. Biol. Chem. 274, 16461–16469.
Farnham, P.J., 2009. Insights from genomic profiling of transcription factors. Nat. Rev. Genet. 10, 605–616.
Farkas, G., Gausz, J., Galloni, M., Reuter, G., Gyurkovics, H., Karch, F., 1994. The Trithorax-like gene encodes the Drosophila GAGA factor. Nature 371, 806–808.
Fedorova, E.V., Ogienko, A.A., Karagodin, D.A., Aimanova, K.G., Baricheva, E.M., 2006. Generation and analysis of novel mutations of the Trithorax-like gene in Drosophila melanogaster. Genetika 42, 149–158.
Fuda, N.J., Ardehali, M.B., Lis, J.T., 2009. Defining mechanisms that regulate RNA polymerase II transcription in vivo. Nature 461, 186–192.
Gilmour, D.S., Thomas, G.H., Elgin, S.C., 1989. Drosophila nuclear proteins bind to regions of alternating C and T residues in gene promoters. Science 245, 1487–1490.
Glaser, R.L., Thomas, G.H., Siegfried, E., Elgin, S.C., Lis, J.T., 1990. Optimal heat-induced expression of the Drosophila Hsp26 gene requires a promoter sequence containing (CT)n·(GA)n repeats. J. Mol. Biol. 211, 751–761.
Granok, H., Leibovitch, B.A., Shaffer, C.D., Elgin, S.C.R., 1995. Chromatin. Ga-ga over GAGA factor. Curr. Biol. 5, 238–241.
Honkela, A., Girardot, C., Gustafson, E.H., Liu, Y.H., Furlong, E.E., Lawrence, N.D., Rattray, M., 2010. Model-based method for transcription factor target identification with limited data. Proc. Natl. Acad. Sci. U.S.A. 107, 7793–7798.
Horard, B., Tatout, C., Poux, S., Pirrotta, V., 2000. Structure of a polycomb response element and in vitro binding of polycomb group complexes containing GAGA factor. Mol. Cell. Biol. 20, 3187–3197.
Katsani, K.R., Hajibagheri, M.A., Verrijzer, C.P., 1999. Co-operative DNA binding by GAGA transcription factor requires the conserved BTB/POZ domain and reorganizes promoter topology. EMBO J. 18, 698–708.
Kerrigan, L.A., Croston, G.E., Lira, L.M., Kadonaga, J.T., 1991. Sequence-specific transcriptional antirepression of the Drosophila Kruppel gene by the GAGA factor. J. Biol. Chem. 266, 574–582.
Klimova, N.V., Levitskii, V.G., Ignat’eva, E.V., Vasil’ev, G.V., Kobzev, V.F., Busygina, T.V., Merkulova, T.I., Kolchanov, N.A., 2006. Recognition of the potential SF-1 binding sites by SiteGA method, their experimental verification and search for new SF-1 target genes. Mol. Biol. 40, 512–523.
Kolchanov, N.A., Merkulova, T.I., Ignatieva, E.V., Ananko, E.A., Oshchepkov, D.Y., Levitsky, V.G., Vasiliev, G.V., Klimova, N.V., Merkulov, V.M., Hodgman, C.T., 2007. Combined experimental and computational approaches to study the regulatory elements in eukaryotic genes. Brief Bioinform. 8, 266–274.
Kosoy, A., Pagans, S., Espinas, M.L., Azorin, F., Bernués, J., 2002. GAGA factor downregulates its own promoter. J. Biol. Chem. 277, 42280–42288.
Lee, H., Kraus, K.W., Wolfner, M.F., Lis, J.T., 1992. DNA sequence requirements for generating paused polymerase at the start of hsp70. Genes Dev. 6, 284–295.
Lohe, A.R., Hilliker, A.J., Roberts, P.A., 1993. Mapping simple repeated DNA sequences in heterochromatin of Drosophila melanogaster. Genetics 134, 1149–1174.
Lu, Q., Wallrath, L.L., Allan, B.D., Glaser, R.L., Lis, J.T., Elgin, S.C., 1992. Promoter sequence containing (CT)n·(GA)n repeats is critical for the formation of the DNase I hypersensitive sites in the Drosophila Hsp26 gene. J. Mol. Biol. 225, 985–998.
Mahmoudi, T., Katsani, K.R., Verrijzer, C.P., 2002. GAGA can mediate enhancer function in trans by linking two separate DNA molecules. EMBO J. 21, 1775–1781.
Mahmoudi, T., Zuijderduijn, L.M., Mohd-Sarip, A., Verrijzer, C.P., 2003. GAGA facilitates binding of Pleiohomeotic to a chromatinized polycomb response element. Nucleic Acids Res. 31, 4147–4156.
Merkulova, T.I., Oshchepkov, D.Y., Ignatieva, E.V., Ananko, E.A., Levitsky, V.G., Vasiliev, G.V., Klimova, N.V., Merkulov, V.M., Kolchanov, N.A., 2007. Bioinformatical and experimental approaches to investigation of transcription factor binding sites in vertebrate genes. Biochemistry (Mosc.) 72, 1187–1193.
Merkulov, V.M., Merkulova, T.I., 2009. Structural variants of glucocorticoid receptor binding sites and different versions of positive glucocorticoid responsive elements: analysis of GR-TRRD database. J. Steroid Biochem. Mol. Biol. 115, 1–8.
Mishra, R.K., Mihaly, J., Barges, S., Spierer, A., Karch, F., Hagstrom, K., Schweinsberg, S.E., Schedl, P., 2001. The iab-7 polycomb response element maps to a nucleosome-free region of chromatin and requires both GAGA and pleiohomeotic for silencing activity. Mol. Cell. Biol. 21, 1311–1318.
Moorman, C., Sun, L.V., Wang, J., de Wit, E., Talhout, W., Ward, L.D., Greil, F., Lu, X.J., White, K.P., Bussemaker, H.J., van Steensel, B., 2006. Hotspots of transcription factor colocalization in the genome of Drosophila melanogaster. Proc. Natl. Acad. Sci. U.S.A. 103, 12027–12032.
O’Brien, T., Wilkins, R.C., Giardina, C., Lis, J.T., 1995. Distribution of GAGA protein on Drosophila genes in vivo. Genes Dev. 9, 1098–1110.
O’Donnell, K.H., Wensink, P.C., 1994. GAGA factor and TBF1 bind DNA elements that direct ubiquitous transcription of the Drosophila α1-tubulin gene. Nucleic Acids Res. 22, 4712–4718.
Ogienko, A.A., Karagodin, D.A., Pavlova, N.V., Fedorova, S.A., Voloshina, M.A., Baricheva, E.M., 2008. Molecular and genetic description of a new hypomorphic mutation of Trithorax-like gene and analysis of its effect on Drosophila melanogaster oogenesis. Ontogenez 39, 134–142.
Ogienko, A.A., Karagodin, D.A., Fedorova, S.A., Fedorova, E.V., Lashina, V.V., Baricheva, E.M., 2006. Effect of hypomorphic mutation in Trithorax-like gene on Drosophila melanogaster oogenesis. Ontogenez 37, 211–220.
Omichinski, J.G., Pedone, P.V., Felsenfeld, G., Gronenborn, A.M., Clore, G.M., 1997. The solution structure of a specific GAGA factor-DNA complex reveals a modular binding mode. Nat. Struct. Biol. 4, 122–132.
Orphanides, G., Reinberg, D., 2002. A unified theory of gene expression. Cell 108, 439–451.
Oshchepkov, D.Yu., Vityaev, E.E., Grigorovich, D.A., Ignatieva, E.V., Khlebodarova, T.M., 2004. SITECON: a tool for detecting conservative conformational and physicochemical properties in transcription factor binding site alignments and for site recognition. Nucleic Acids Res. 32, 208–212.
Pagans, S., Ortiz-Lombardia, M., Espinas, M.L., Bernués, J., Azorín, F., 2002. The Drosophila transcription factor tramtrack (TTK) interacts with Trithorax-like (GAGA) and represses GAGA-mediated activation. Nucleic Acids Res. 30, 4406–4413.
Perelygina, L.M., Baricheva, E.M., Sebeleva, T.E., Kokoza, V.A., 1993. The evolutionarily conserved gene Nc70F, expressed in nerve tissue of Drosophila melanogaster, encodes a protein homologous to the mouse delta transcription factor. Genetika 29, 1597–1607.
Platero, J.S., Csink, A.K., Quintanilla, A., Henikoff, S., 1998. Changes in chromosomal localization of heterochromatin-binding proteins during the cell cycle in Drosophila. J. Cell Biol. 140, 1297–1306.
Ponomarenko, J.V., Ponomarenko, M.P., Frolov, A.S., Vorobyev, D.G., Overton, G.C., Kolchanov, N.A., 1999. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 15, 654–668.
Rabinovich, A., Jin, V.X., Rabinovich, R., Xu, X., Farnham, P.J., 2008. E2F in vivo binding specificity: comparison of consensus versus nonconsensus binding sites. Genome Res. 18, 1763–1777.
Raff, J.W., Kellum, R., Alberts, B.M., 1994. The Drosophila GAGA transcription factor is associated with specific regions of heterochromatin throughout the cell cycle. EMBO J. 13, 5977–5983.
Rister, J., Desplan, C., 2010. Deciphering the genome’s regulatory code: the many languages of DNA. Bioessays 32, 381–384.
Sandaltzopoulos, R., Mitchelmore, C., Bonte, E., Wall, G., Becker, P.B., 1995. Dual regulation of the Drosophila hsp26 promoter in vitro. Nucleic Acids Res. 23, 2479–2487.
Schwendemann, A., Lehmann, M., 2002. Pipsqueak and GAGA factor act in concert as partners at homeotic and many other loci. Proc. Natl. Acad. Sci. U.S.A. 99, 12883–12888.
Schwyter, D.H., Huang, J.D., Dubnicoff, T., Courey, A.J., 1995. The decapentaplegic core promoter region plays an integral role in the spatial control of transcription. Mol. Cell. Biol. 15, 3960–3968.
Shimojima, T., Okada, M., Nakayama, T., Ueda, H., Okawa, K., Iwamatsu, A., Handa, H., Hirose, S., 2003. Drosophila FACT contributes to Hox gene expression through physical and functional interactions with GAGA factor. Genes Dev. 17, 1605–1616.
Soeller, W.C., Oh, C.E., Kornberg, T.B., 1993. Isolation of cDNAs encoding the Drosophila GAGA transcription factor. Mol. Cell. Biol. 13, 7961–7970.
Soeller, W.C., Poole, S.J., Kornberg, T., 1988. In vitro transcription of the Drosophila engrailed gene. Genes Dev. 2, 68–81.
Tchoubrieva, E.B., Gibson, J.B., 2004. Conserved (CT)n·(GA)n repeats in the non-coding regions at the Gpdh locus are binding sites for the GAGA factor in Drosophila melanogaster and its sibling species. Genetica 121, 55–63.
Thummel, C.S., 1989. The Drosophila E74 promoter contains essential sequences downstream from the start site of transcription. Genes Dev. 3, 782–792.
van Steensel, B., Delrow, J., Bussemaker, H.J., 2003. Genomewide analysis of Drosophila GAGA factor target genes reveals context-dependent DNA binding. Proc. Natl. Acad. Sci. U.S.A. 100, 2580–2585.
Weber, J.A., Taxman, D.J., Lu, Q., Gilmour, D.S., 1997. Molecular architecture of the hsp70 promoter after deletion of the TATA box or the upstream regulation region. Mol. Cell. Biol. 17, 3799–3808.
Wilkins, R.C., Lis, J.T., 1998. GAGA factor binding to DNA via a single trinucleotide sequence element. Nucleic Acids Res. 26, 2672–2678.