Isolation of a family of Drosophila POU domain genes expressed in early development


Mechanisms of Development, 34 (1991) 75-84 © 1991 Elsevier Scientific Publishers Ireland, Ltd. 0925-4773/91/$03.50


MOD 00032

Andrew N. Billin, Keith A. Cockerill and Stephen J. Poole

Department of Biological Sciences, University of California, Santa Barbara, Santa Barbara, California, U.S.A.

(Accepted 1 February 1991)

The POU domain is a ~140 amino acid domain shared by several mammalian transcriptional activators. We have isolated genomic and cDNA clones from three Drosophila POU domain genes, designated pdm-1, pdm-2, and Cfla. All three genes encode a conserved POU-specific domain and a POU homeo domain but are otherwise divergent. By Northern analysis, all three of the genes are expressed in early embryos; the pdm-1 and Cfla genes are also expressed at lower levels throughout development. The spatial expression patterns of the pdm-2 and Cfla genes show that they probably play multiple roles during development: an early role in specific ectodermal cells, and a subsequent role in the embryonic nervous system.

Drosophila; Development; POU domain; Segmentation; Homeobox

Introduction

A number of approaches have identified genes important for Drosophila development (Bellen et al., 1989; Bier et al., 1989; Bopp et al., 1986; Jurgens et al., 1984; Nusslein-Volhard et al., 1984; Wieschaus et al., 1984). The protein products of many of these genes share common structural domains, such as the homeodomain, that have been highly conserved across species lines. Several mammalian transcription factors contain both a homeodomain and a second conserved region, together termed the POU domain (Herr et al., 1988), which has subsequently been found in a number of additional genes (Finney et al., 1988; He et al., 1989; Monuki et al., 1989; Scholer et al., 1990). Most of these POU domain-containing genes are expressed in the developing mammalian nervous system, although some are additionally expressed in specific nonneuronal tissues. We have used the polymerase chain reaction (PCR) to search for POU domain genes in Drosophila. In the studies described here, we show that Drosophila contains at least three genes that contain a POU domain but are otherwise unrelated. One of the genes, Cfla, has been previously identified as a putative regulator of

Correspondence to: S.J. Poole, Dept. of Biological Sciences, University of California, Santa Barbara, Santa Barbara, CA 93106, U.S.A.

neurotransmitter synthesis in larvae and adults, and two of the genes are newly identified. Temporally, all three POU domain genes are expressed at their maximal levels in early development; two of them, including the Cfla gene, are additionally expressed at lower levels throughout development. The early spatial expression of two of the genes is restricted to discrete ectodermal and neuronal patterns.

Results

Isolation of POU domain genes from Drosophila

To search for genes encoding POU domains in Drosophila, we used PCR to amplify Drosophila DNA sequences similar to the POU-specific regions of three mammalian genes (the Pit-1, Oct-1, and Oct-2 genes) as well as the unc-86 neuronal lineage gene of C. elegans [reviewed in (Herr et al., 1988)]. Amplification with degenerate oligonucleotides yielded a ~100 bp product, the expected size if Drosophila contained at least one POU domain gene with no introns in the amplified region. The PCR product was subcloned, and the DNA sequences of individual subclones fell into two distinct classes of POU domains. One member of each class was then used to screen a Drosophila genomic library; phage isolated in this way hybridized with both classes of probe. Restriction mapping of individual phage showed they fell into three families, each representing a different Drosophila POU domain gene (Fig. 1). The chromosomal location of each of the three genes was determined by in situ hybridization to polytene chromosomes (data not shown). Two of the genes mapped to band 33F on the second chromosome; chromosomes hybridized with both probes simultaneously showed only a single band. These two POU domain genes of Drosophila have been designated pdm-1 and pdm-2, and as shown below, they are virtually identical in the POU-specific region amplified by PCR but diverge outside of the POU domain. The extent of the DNA between the pdm-1 and pdm-2 genes has not yet been determined, but there is no overlap between the walks surrounding the two genes (Fig. 1A and 1B). The third gene mapped to band 65D on the third chromosome,

[Figure 1: restriction maps. (A) Location: 33F; Gene: pdm-1; cDNA: c6. (B) Location: 33F; Gene: pdm-2; cDNA: c9. (C) Location: 65D; Gene: Cfla; cDNA: c21.]

Fig. 1. Restriction maps of three families of phage containing Drosophila POU domain genes. Shown are the restriction maps of regions contained in three families of overlapping phage. Above each map is shown the location of the restriction fragment that hybridizes with a POU domain probe. Below each map is shown the extent of the DNA contained in individual phage in each class. Also shown is the chromosomal location of each gene as determined by in situ hybridization, the designation of the POU domain gene, and the representative cDNA clone arising from each gene used in further studies. (A) Phage of the pdm-1 gene at 33F, encoding cDNA 33i-c6. (B) Phage of the second 33F gene, pdm-2, encoding cDNA 33ii-c9. (C) The 65D gene, encoding the Cfla cDNA clone 65-c21. H = HindIII, S = SalI.

the reported location of the previously isolated Cfla POU domain gene (Johnson and Hirsh, 1990). The PCR amplification thus yielded three Drosophila POU domain genes. It is possible that others have yet to be identified. In order to reduce degeneracy, the upstream set of oligonucleotide primers for PCR amplification did not include possible codons for a Gln/Val substitution unique to the mammalian Pit-1 protein. In addition, the presence of an intron in the region between the two primers could make the region too large to amplify under the conditions used.

To test for the possibility of other Drosophila POU domain genes, the POU-specific domain of the pdm-2 gene was used to probe Southern blots of Drosophila genomic DNA at different stringencies of hybridization (Fig. 2). At high stringencies, the probe hybridized strongly to two genomic EcoRI or HindIII fragments, corresponding to the two similar pdm-1 and pdm-2 genes located at band 33F (Fig. 2A; as shown below, in the region of the probe they are 97% identical). At slightly lower stringencies, an additional band corresponding to the band 65D gene can be seen (Fig. 2B, C). At the lowest stringency used, an additional fragment not corresponding to any of the known genes can be clearly seen (Fig. 2D); on longer exposures additional weakly hybridizing fragments can be seen. These additional fragments may represent other more divergent Drosophila POU domain genes that have not yet been isolated.

Northern blots using the POU domain region of each gene showed that all are expressed in early embryos (see below). Accordingly, probes from each gene were used to screen an embryonic (3-9 h of development) cDNA library. Several clones arising from each gene were isolated, and the largest member of each class was used for subsequent analyses, including determination of the DNA sequence on both strands (Fig. 3).

Clone 33i-c6 is 2.8 kb in length and was derived from the pdm-1 gene at band 33F. It contains a 1806 bp open reading frame that can potentially encode a 65,000 dalton protein if the first ATG (at position 213) in the cDNA clone is used for initiating translation (Fig. 3A). It is likely that this represents the first AUG in the pdm-1 transcript giving rise to this cDNA, since there are several in-frame stop codons just upstream. The protein encoded by this open reading frame has an acidic region (amino acids 123-128), a potential basic nuclear localization signal at the N-terminal end of the homeo domain (RRRKKR), and trinucleotide repeats encoding runs of polyalanine (e.g. residues 191-199) and polyhistidine (residues 359-364).

Clone 33ii-c9 (1.8 kb) was derived from the pdm-2 gene, also mapping to 33F. It has a 1242 bp open reading frame potentially encoding a 45,800 dalton protein (Fig. 3B). Although there are no in-frame stop codons upstream of the first ATG in the cDNA clone, there is a stop codon in the genomic DNA 33 bp

[Figure 2: Southern blot panels A-D, each with lanes R and H; marked fragments: c21, c6, c9.]

Fig. 2. Low stringency hybridization of a Drosophila POU domain probe to Drosophila genomic DNA. A probe representing the POU-specific domain of pdm-2 was hybridized to Southern blots of Drosophila DNA digested with EcoRI (R) or HindIII (H) at various stringencies. Shown at the right are the expected sizes of genomic HindIII fragments hybridizing to the Cfla cDNA clone c21 (8.5 kb), the pdm-1 cDNA clone c6 (5.9 kb), and the pdm-2 cDNA clone c9 (3.5 kb). The arrowhead shows the strongest of the additional restriction fragments hybridizing at low stringency. (A) 50% formamide in the hybridization buffer. (B) 40% formamide. (C) 30% formamide. (D) 20% formamide.

upstream of the 5' end of the cDNA, with no consensus splice acceptor sites present (data not shown). Thus, the first ATG in the cDNA clone probably represents the first AUG in the pdm-2 transcript. There is an acidic region in the protein from amino acids 34-39, and a potential nuclear localization signal at the N-terminal end of the homeo domain (RRRKKKR).

Finally, clone 65-c21 (2.7 kb) was derived from the 65D gene, which as shown below is the previously identified Cfla gene. The sequence shown in Fig. 3C is a composite of the sequence of clone 65-c21 and the upstream end of 65-c23, a clone that extends 158 bp further upstream. There is an ATG codon at position 118 followed by an open reading frame potentially encoding a protein of 427 amino acids. This protein would also contain runs of polyalanine (e.g. residues 87-93) and polyhistidine (81-86), and a potential nuclear localization signal at the N-terminal end of the homeo domain (RKRKKR).

Each of the three genes encodes a different POU domain-containing protein, and regions outside the POU domain are in general not conserved with each other. As with other POU domain proteins, there is strong conservation in the upstream POU-specific region and in the homeo domain, and the short linker region between them is less conserved (Fig. 4). Among the Drosophila genes, the pdm-1 and pdm-2 genes at 33F are most closely related. In their predicted POU-specific domains, 78 of 79 amino acids are identical. The homeodomains are less conserved (53 identical residues out of 64), and the linker regions are even more divergent. There is some sequence similarity in the 23 amino acids upstream of the POU-specific domain between the two proteins, and further upstream they are unrelated. When the predicted pdm-1 protein is compared with that of the Cfla gene, the conserved POU-specific region is shorter (63 amino acids) and only 50 amino acids are identical. The Cfla homeodomain is still more distantly related, having only 43 of 64 residues identical with the pdm-1 homeodomain.

A comparison with other POU domain proteins reveals that the pdm-1 and pdm-2 POU domains are most similar to the mammalian Oct-1 and Oct-2 proteins (Fig. 4). The POU-specific region is the most conserved (e.g. 70/74 between pdm-2 and Oct-2), the homeodomain is less conserved (50/63 identical residues between pdm-2 and Oct-2), and the linker region is divergent. Regions outside of the POU domain are not conserved. The Cfla POU domain is most like the mammalian class III POU domain genes (He et al., 1989). Comparing Cfla with the class III POU domain Brn-1, 119 of 125 residues are identical in a region including the POU-specific domain, the linker, and the homeodomain. It is not known whether the region of similarity extends beyond this in the mammalian class III genes.
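The identity figures quoted above can be re-expressed as percentages directly from the residue counts. The following Python sketch is illustrative only and is not part of the original analysis: the helper names `percent_identity` and `identity_from_alignment` are hypothetical, and the counts are copied from the text rather than recomputed from the sequences of Fig. 4. The short example at the end compares the two basic motifs quoted in the text for pdm-1 (RRRKKR) and Cfla (RKRKKR).

```python
"""Re-express the identity counts quoted in the text as percentages (sketch)."""

# (protein A, protein B, region) -> (identical residues, residues compared).
# All counts are copied from the text; none are recomputed from Fig. 4.
QUOTED_COUNTS = {
    ("pdm-1", "pdm-2", "POU-specific domain"): (78, 79),
    ("pdm-1", "pdm-2", "homeodomain"): (53, 64),
    ("pdm-1", "Cfla", "POU-specific domain"): (50, 63),
    ("pdm-1", "Cfla", "homeodomain"): (43, 64),
    ("pdm-2", "Oct-2", "POU-specific domain"): (70, 74),
    ("pdm-2", "Oct-2", "homeodomain"): (50, 63),
    ("Cfla", "Brn-1", "POU-specific + linker + homeodomain"): (119, 125),
}


def percent_identity(identical, compared):
    """Return percent identity rounded to one decimal place."""
    if compared <= 0 or not 0 <= identical <= compared:
        raise ValueError("need compared > 0 and 0 <= identical <= compared")
    return round(100.0 * identical / compared, 1)


def identity_from_alignment(seq_a, seq_b, gap="-"):
    """Count identities in two pre-aligned sequences of equal length.

    Columns with a gap character in either sequence are skipped.
    Returns (identical, compared).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    return identical, compared


if __name__ == "__main__":
    for (a, b, region), (ident, total) in QUOTED_COUNTS.items():
        pct = percent_identity(ident, total)
        print(f"{a} vs {b}, {region}: {ident}/{total} = {pct}%")
    # Basic motifs quoted in the text for pdm-1 and Cfla.
    print(identity_from_alignment("RRRKKR", "RKRKKR"))
```

On these counts the pdm-1 and pdm-2 POU-specific domains are 98.7% identical, whereas the Cfla and Brn-1 comparison gives 95.2% over a 125-residue region. Skipping gapped columns is one common convention for counting identities; other conventions count gaps as mismatches and would give lower values.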

Clone 65-c21 arose from the Cfla gene

The isolation of a Drosophila POU domain gene, Cfla, encoding a putative regulator of the Dopa decarboxylase (Ddc) gene has recently been reported (Johnson and Hirsh, 1990). Dopa decarboxylase catalyzes the formation of the neurotransmitters dopamine and serotonin, and its expression can be detected in specific dopaminergic and serotonergic neurons in the larval and adult central and visceral nervous systems as well as the epidermis (Beall and Hirsh, 1987; Konrad and Marsh, 1987). Nuclear extracts from 10-22 h embryos contain factors that recognize several upstream elements of the Ddc promoter (Johnson and Hirsh, 1990; Johnson et al. 1989). The Cfla POU domain gene was originally identified by in vitro binding of a β-galactosidase/Cfl λgt11 fusion protein to upstream element C of the Ddc gene. Third instar larvae with a mutated element C show reduced or absent dopa decarboxylase levels in some dopaminergic neurons, but normal levels in other dopaminergic neurons, in serotonergic neurons, and in the hypoderm, suggesting that Cfl is a requisite transcription factor for the Ddc gene in only those affected neurons (Johnson and Hirsh 1990). The Cfla gene was localized by in situ hybridization to salivary chromosome region 65D, the location of the PCR-isolated POU domain gene that gave rise to our

[Fig. 3A, B: nucleotide sequences and predicted translation products of cDNA clones 33i-c6 (pdm-1) and 33ii-c9 (pdm-2).]

Fig. 3. Nucleotide sequence and predicted translation products of the three Drosophila POU cDNA clones. The nucleotide sequences are shown, with the predicted translation product underneath. The POU-specific domain and the homeodomain are underlined. The first ATG in the sequence is boxed. (A) Sequence of clone 33i-c6, arising from the pdm-1 gene at chromosomal band 33F. The boxed ATG is the first in the open reading frame. (B) Sequence of clone 33ii-c9, arising from the pdm-2 gene at chromosomal band 33F. As described in the text, the boxed ATG is most likely the first in the open reading frame. (C) Sequence of clone 65-c21, arising from the gene at chromosomal band 65D. The sequence shown is a composite of sequences from two cDNA clones. Nucleotides 159-2839 are derived from clone c21; nucleotides 1-158 are derived from clone c23, which was found to extend further 5' than c21. The first ATG in the sequence is boxed; it is not known whether there are upstream in-frame ATG codons. Nucleotides 363-1523 overlap with those of λCfla (Johnson and Hirsh, 1990). Asterisks mark the locations of 8 nucleotides in the region of overlap present in the sequence of c21 and absent from λCfla.

[Fig. 3C: nucleotide sequence and predicted translation product of the composite 65-c21/c23 cDNA (Cfla).]

cDNA clone 65-c21. A comparison of the sequence of 65-c21 with the reported sequence of λCfla (the λgt11 Cfla cDNA clone (Johnson and Hirsh 1990)) shows that although they are quite similar and probably arose from the same gene, there are significant differences between them (Fig. 3C). The 65-c21/23 sequence extends 362 bp further upstream than λCfla and 1310 bp further downstream, and the predicted translation product of 65-c21 is significantly different from that of λCfla. In the 1161 bp region of overlap, there are 8 sites (nucleotides 422, 504, 510, 537, 1205, 1206, 1241, and 1499 of Fig. 3C, shown by asterisks) where an additional nucleotide is present in 65-c21 that is absent in the reported sequence of λCfla, and 7 of these 8 sites occur within, and alter, the open reading frame. In addition, 65-c21 has a CG dinucleotide at position 557, part of an alanine codon, that is GC in λCfla, resulting in a glycine codon. Other differences between the two sequences, at nucleotides 546, 825, 879, 987, 1089, and 1161, are single base-pair changes that would not result in an amino acid substitution.

It is almost certain that λCfla and 65-c21 arose from the same gene. In addition to the nearly identical sequence, both map to the same chromosomal location. If λCfla and 65-c21 arose from different genes, their sequence near-identity across a span of > 1000 bases would result in cross hybridization between the different genes even at high stringency. However, genomic Southern blots probed with the entire c21 insert detect only a single major band of hybridization at normal stringencies (data not shown), and probes from the 33F POU-specific domains cross-hybridize with only a single genomic fragment representing the 65D region, not two different fragments (Fig. 2). Thus, despite the differences, the two cDNA clones were almost certainly derived from the same Cfla genomic transcription unit.

There are several arguments that the extra nucleotides seen in 65-c21 are not gel or cloning artifacts. First, the 65-c21 clone was sequenced completely on both strands, so that gel compressions on the different strands were offset, allowing the extra nucleotides to be detected. Second, one of the strands was also independently sequenced in its entirety with dITP, and the sequence obtained was the same as that determined by sequencing both strands with dGTP. Third, the four 5'-most of these additional nucleotides (positions 422, 504, 510, and 537) and the CG dinucleotide at position 557-8 were also confirmed in the sequence of c23, an independently isolated clone. Finally, there is a correlation between the sites of additional nucleotides and high G/C gel compression artifacts, suggesting that the differences between λCfla and 65-c21 are due to sequencing artifacts in the λCfla sequence.

The three POU domain genes are preferentially expressed in early Drosophila embryos

To determine whether any of the Drosophila POU domain genes may play a role in early development, inserts from the cDNA clone for each gene were hybridized to Northern blots of total RNA prepared from staged embryos, larvae, pupae and adult flies (Fig. 5). Each probe hybridized to transcripts preferentially expressed in early embryos. The pdm-1 clone 33i-c6 hybridized to a major transcript of ~2.8 kb in 0-3 and 3-6 h old embryos; a minor transcript of ~2.9 kb is also seen in 6-12 and 12-24 h embryos, and another

[Fig. 4: alignment of the POU-specific box and the homeo box of Pit-1, Oct-1, Oct-2, pdm-1, pdm-2, Cfla, Brn-1, Oct-4, and unc-86.]

Fig. 4. Comparison of POU domains. The predicted protein products of the POU domain region of each of the three Drosophila POU domain genes are shown. Also shown is a representative sampling of other POU domains: rat Pit-1 (Mangalam et al., 1989), human Oct-1 (Sturm et al., 1988), human Oct-2 (Clerc et al., 1988), Brn-1 (He et al., 1989), Oct-4 (Scholer et al., 1990), and C. elegans unc-86 (Finney et al., 1988). The upstream POU-specific region and the downstream POU homeodomain are indicated. Boxed regions indicate residues exactly conserved in at least 8 of the selected POU domains. Dashes indicate gaps inserted to maintain sequence alignment.
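The boxing rule used in Fig. 4 (a residue shared by at least 8 of the 9 aligned POU domains) can be sketched computationally. The Python fragment below is illustrative only: the function name `conserved_columns` and the toy alignment are assumptions made for this sketch, and the toy strings are invented, not the sequences shown in the figure.

```python
from collections import Counter


def conserved_columns(aligned, min_count=8, gap="-"):
    """Return the column indices at which a single residue occurs in at
    least `min_count` of the aligned sequences (gaps are not counted)."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    boxed = []
    for index, column in enumerate(zip(*aligned)):
        counts = Counter(residue for residue in column if residue != gap)
        # most_common(1) gives the best-represented residue in this column
        if counts and counts.most_common(1)[0][1] >= min_count:
            boxed.append(index)
    return boxed


# Hypothetical nine-sequence alignment (invented for illustration).
toy_alignment = ["RRIKLG"] * 7 + ["RKIRL-", "RRVRL-"]
print(conserved_columns(toy_alignment))  # [0, 1, 2, 4]
```

Columns 3 and 5 are not reported because their most common residue occurs only seven times, one short of the threshold.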

of the 5' untranslated region is not contained in the clone. A probe specific for the Cfla clone 65-c21 hybridized to two transcripts of ~3.4 and ~3.5 kb that are first strongly detected in 3-6 h embryos (Fig. 5C); their maximum level of expression is in 6-12 h embryos, but they persist through all subsequent stages of development. This later expression is consistent with the postulated role of this gene in regulating the expression of dopa decarboxylase. The 65-c21 and c23 cDNA clones together cover 2.8 kb, and clone c21 contains a

transcript of ~3.3 kb is seen in larvae and pupae (Fig. 5A). The 2.8 kb 33i-c6 cDNA clone was isolated from a 3-9 h cDNA library, so it most likely arose from either the 2.8 or the 2.9 kb transcript. The pdm-2 clone 33ii-c9 hybridized to a transcript of ~2.5 kb in 3-6 h embryos (Fig. 5B). Weak hybridization was also seen in 6-12 h embryos. No hybridization was seen in other stages of development. The cDNA clone is only 1.8 kb in length; it contains a short poly(A) tail and, as described above, probably contains the entire open reading frame of the transcript, so some

[Fig. 5: Northern blot panels A-D, probed with 33i-c6, 33ii-c9, 65-c21, and rp49 (image not reproduced).]

Fig. 5. Northern blot analysis of the three Drosophila POU domain genes. Inserts from the cDNA clone from each gene were hybridized to Northern blots of total RNA isolated from 0-3 h, 3-6 h, 6-12 h, and 12-24 h embryos, and from mixed larvae, pupae, and adults. (A) Probed with pdm-1 clone 33i-c6. (B) Same RNA preparation as in part A, but run on a separate gel and probed with pdm-2 clone 33ii-c9. (C) Same blot as in part A, probed with Cfla clone 65-c21. (D) Same blot as in part B, probed with ribosomal protein gene rp49 to show evenness of RNA loading.

poly(A) tail, so that the 5'-most portion of the transcript is probably missing from the cDNA clones.

To spatially localize the domains of expression of the genes during early embryogenesis, digoxigenin-labelled probes were used for in situ hybridization to whole mount embryos. Probes for the pdm-2 and Cfla genes showed that each is restricted to specific domains of expression during development. With a pdm-2 probe, expression is first detected in cellular blastoderm stage embryos in two distinct locations: a broad band of staining extending from about 25% egg length to 50% egg length, and an arc of staining at the anterior end of the embryo (Fig. 6A). As gastrulation begins, the broad band of staining resolves into narrower stripes, and additional stripes appear starting just anterior to the cephalic furrow. In addition, the extreme anterior end expression becomes localized to the anterior dorsal surface (Fig. 6B). As germ band extension progresses further, the expression in stripes disappears, but expression in the developing central nervous system and




hindgut becomes apparent (Fig. 6C, D). There is strong expression in the developing brain, and strong expression in discrete subsets of neuroblasts. During germ band retraction staining is seen only in a few cells along the ventral surface of the embryo. The Cfla gene is first expressed during germ band extension, in the tracheal placodes prior to their invagination (Fig. 7A) and weakly in a few cells down the ectodermal midline of the embryo (Fig. 7B). After invagination of the tracheal pits, expression is seen in the cells surrounding each pit (Fig. 7C). Weaker expression is also seen in a group of cells dorsal and anterior to each pit (Fig. 7D) that may be precursors to chordotonal organs (Bier et al., 1990), and in a group of cells on the roof of the stomodeal invagination. Expression later in embryogenesis is quite complex and has not been analyzed in detail, but includes some central and peripheral nervous system expression. Probes specific for the pdm-1 gene did not produce any conclusive staining patterns, even though such


Fig. 6. Spatial expression of pdm-2 RNA. Spatial expression was localized in whole mount embryos by in situ hybridization with digoxigenin-labelled probes from cDNA clone 33ii-c9. Stages are as in Campos-Ortega and Hartenstein (1985). (A) Expression is first detected after cellularization, in a broad band from 25% to 50% egg length (bar), and in an arc near the extreme anterior end (arrowhead). Anterior is to the right and dorsal is up. Shortly after this, early during gastrulation, additional narrower bands appear starting just anterior to the cephalic furrow. (B) Late stage 8 embryo. During the beginning of germ band extension, the striped expression becomes much weaker, but extends down the length of the embryo as ~13 weak stripes in the ectoderm and extending into the mesoderm. The arc of staining at the anterior end becomes localized to a strong dorsal patch (arrowhead). Anterior is to the left. (C) Late stage 10 embryo. The stripes are no longer present, but strong expression is seen in the ventral nervous system, in the developing brain, and in the hindgut. Anterior is to the right. (D) Ventral view of a germ band extended embryo, showing the strong expression in a subset of neuroblasts in the central nervous system. Anterior is to the right.


Fig. 7. Spatial expression of Cfla RNA. Cfla expression was localized by in situ hybridization to whole mount embryos with probes from cDNA clone 65-c21. Anterior is left and dorsal is up in all embryos. Stages are as in Campos-Ortega and Hartenstein (1985). (A) Early stage 10 embryos, slightly oblique view. Cfla expression is first seen in stage 10 embryos in 10 bilaterally paired spots in the ventrolateral ectoderm, corresponding to the tracheal placodes. The signal can be seen before any invagination of the tracheal pits occurs. Weak signal can also be seen in the stomodeal invagination. (B) Later stage 10 embryo, dorsal view. In addition to staining in the tracheal placodes, there is a thin line of staining in the ectodermal midline, with slightly stronger staining in a small group of midline cells at the level of each tracheal placode. (C) Stage 11 embryo. The tracheal placodes have begun to invaginate to form the tracheal pits (tp), and the Cfla gene is expressed in the cells surrounding each tracheal pit. Expression is also seen on the roof of the stomodeal invagination (st). Weak expression can be seen in cells anterior to and dorsal to each tracheal pit. (D) Higher magnification view of a stage 11 embryo after tracheal pit invagination. The ring of staining around each pit can be seen. The arrowheads indicate the cells anterior and dorsal to each tracheal pit that also show staining and may be precursors to cells of chordotonal organs (Bier et al., 1990).

probes detect transcripts on Northern blots throughout development at approximately the same levels as the other two genes, and cDNA clones from this gene were isolated from a 3-9 h cDNA library. Whether the lack of staining patterns reflects generalized weak expression in all cells or is a technical artifact will be resolved by the development of antisera against the presumed nuclear protein produced by this gene.

Discussion

We have identified and cloned three members of a family of Drosophila genes whose predicted protein products contain a POU domain, and thus are likely to be DNA-binding regulatory proteins. The regulatory roles for these genes are not yet known, but all three are strongly expressed in early embryos. A single regulatory gene may be expressed in multiple tissues and have several different roles in development. For example, many of the Drosophila homeodomain genes important in early development for the generation of the body plan are also expressed later in the developing nervous system (Akam, 1987). The mammalian POU domain genes in general also have different activities or patterns of expression at different stages of development (He et al., 1989). The expression patterns of the Drosophila POU domain genes shown here indicate that they also will likely prove to have multiple roles in development.

The most interesting pattern of expression is that of the pdm-2 gene. It is only expressed during the first 12 h of development, unlike the other two genes. Its expression begins just after cellularization in a pattern similar to the zygotic gap genes, and shortly thereafter is expressed in a series of metameric stripes along the length of the embryo, suggesting that it might play a role in segmentation. Its later expression is restricted to discrete subsets of cells in the developing central nervous system.

The Cfla gene also has several potential regulatory roles in development. One likely regulatory target is the Ddc gene, since a fusion protein containing the Cfla POU domain can bind in vitro to a site upstream of the Ddc promoter (Johnson and Hirsh, 1990). This binding site appears to be required for Ddc expression only in a small subset of dopaminergic neurons in third instar larvae (Johnson and Hirsh, 1990). However, Ddc expression can only be detected in the larval and adult central and visceral nervous systems, and in the epidermis (Beall and Hirsh, 1987; Konrad and Marsh, 1987), and catecholamine-containing processes cannot be detected prior to 18 h of development (Budnik and White, 1988). Thus, the early expression of the Cfla gene shows that it is likely to regulate genes other than Ddc. The Cfla gene is first expressed in the precursors to the tracheal pits and the stomodeal invagination, and later in the peripheral nervous system, none of which are known sites of Ddc expression. Other genes have been found that share some domains of expression with Cfla. For example, the pair-rule gene hairy is first expressed prior to Cfla, but is later expressed at the germ-band extension stage in cells surrounding each tracheal pit (Carroll et al., 1988). The rhomboid gene is transiently expressed in cells that will form the tracheal pits, in a chordotonal organ precursor cell, and in some other peripheral nervous system cells, in addition to other locations that do not overlap with Cfla (Bier et al., 1990).
Finally, a number of fly strains containing lacZ/P-element insertions at various chromosomal positions express β-galactosidase in patterns that overlap with Cfla expression domains (Bier et al., 1989).

We were not able to detect by in situ hybridization distinct patterns of expression for the pdm-1 gene. Although this may be simply a technical problem, one other possible explanation is that the gene is expressed at a low level throughout the embryo. In fact, the POU domain of this gene is similar to that of the mammalian Oct-1 gene, which is expressed in all tissue culture cell lines that have been examined (Sturm et al., 1988), and is expressed in a wide variety of tissues during mouse development (He et al., 1989).

In summary, we have described the isolation and initial characterization of three Drosophila POU domain genes. The developmental expression profiles and in situ hybridization patterns suggest that they may play interesting roles in early development. A definitive determination of what these roles might be must await a mutational analysis.

Experimental procedures

Isolation of Drosophila POU domain genes

General methods are as described (Ashburner, 1989; Sambrook et al., 1989). Drosophila genomic DNA was amplified with two sets of degenerate oligonucleotides representing two conserved regions in the POU-specific domain of the Pit-1, Oct-1, Oct-2, and unc-86 genes. The upstream primer was 5' TT(T/C)AA(A/G)CA(A/G)(C/A)GI(C/A)G(A/G/C/T)AT(T/C/A)AA 3' and the downstream primer was 5' TC(A/G)AA(A/G/C/T)C(T/G)I(C/G)(A/T)(A/G/T)AT(A/G/C/T)GT 3'. The primers were mixed with Drosophila genomic DNA and amplified by 40 PCR cycles of 94°C, 1 min; 50°C, 1 min; 55°C, 2 min. The PCR products were separated by acrylamide gel electrophoresis, and the single band that was dependent on the presence of both primers and Drosophila DNA was excised and subcloned into pBluescript. Eight individual transformant colonies were sequenced. All showed sequences corresponding to the upstream and downstream PCR primers at their ends. Seven were nearly identical in the region between the primers, with occasional single nucleotide differences, and the sequence encoded a POU domain (later shown to have arisen from the pdm-1 or pdm-2 genes). The eighth also encoded a POU domain, but was clearly from a different gene (later shown to be Cfla). The PCR reaction was repeated with an annealing temperature of 37°C, but no additional POU domain sequences were found among 20 clones sequenced (six were missing one or the other of the PCR primer sequences; the remainder were identical with the previously identified class of seven POU domain clones).

Purified inserts from the two classes of POU domain clones thus identified were used to screen genomic and embryonic cDNA libraries (Poole et al., 1985). The genomic phage thus isolated were restriction mapped and ordered into three families. The chromosomal location of each family was determined by in situ hybridization of a representative phage to salivary gland chromosomes (Ashburner, 1989). The cDNA clones were ordered by hybridization to the different genomic phage families. Each clone hybridized strongly to only one of the three classes of genomic phage. The largest cDNA clone originating from each family was selected for further analysis.
Clone 33i-c6 originated from the 33F gene designated pdm-1, clone 33ii-c9 arose from the 33F gene designated pdm-2, and clone 65-c21 arose from the 65D gene Cfla. For each clone, nested deletions from either end were made with exonuclease III and S1 nuclease (Promega). Each of the three cDNA clones was sequenced on both strands with Sequenase (U.S. Biochemicals) using dGTP. In addition, one strand from each clone was also sequenced in its entirety with dITP to help resolve compressions. The upstream 560 bp of clone 65-c23, an independent cDNA clone arising from the 65D gene that extended further 5' than 65-c21, was also sequenced with dGTP and dITP.
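The complexity of the two degenerate PCR primers can be estimated by multiplying the number of alternatives at each position. The Python sketch below is illustrative: the helper names `parse_degenerate` and `degeneracy` are hypothetical, the primer strings are transcribed from the text above with spacing removed (their exact form should be treated as an assumption), and inosine (I) is counted as a single, non-degenerate position.

```python
import re


def parse_degenerate(primer):
    """Split a primer written like 'TT(T/C)AA' into a list with one entry
    per position, each entry being the list of bases allowed there."""
    positions = []
    tokens = re.findall(r"\(([^)]*)\)|([ACGTI])", primer.replace(" ", ""))
    for choices, single in tokens:
        positions.append(choices.split("/") if choices else [single])
    return positions


def degeneracy(primer):
    """Number of distinct oligonucleotides represented by the mixture."""
    total = 1
    for choices in parse_degenerate(primer):
        total *= len(choices)
    return total


# Primer strings as given in the text (assumed transcription).
UPSTREAM = "TT(T/C)AA(A/G)CA(A/G)(C/A)GI(C/A)G(A/G/C/T)AT(T/C/A)AA"
DOWNSTREAM = "TC(A/G)AA(A/G/C/T)C(T/G)I(C/G)(A/T)(A/G/T)AT(A/G/C/T)GT"

for name, primer in (("upstream", UPSTREAM), ("downstream", DOWNSTREAM)):
    print(name, len(parse_degenerate(primer)), "nt,",
          degeneracy(primer), "-fold degenerate")
# upstream 20 nt, 384 -fold degenerate
# downstream 17 nt, 768 -fold degenerate
```

Under these assumptions the upstream primer is a 20-mer pool of 384 sequences and the downstream primer a 17-mer pool of 768 sequences.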

Low stringency hybridization

Drosophila DNA was digested with EcoRI or HindIII, separated by agarose gel electrophoresis, transferred to nitrocellulose, and hybridized with a probe prepared by random primer labelling (Feinberg and Vogelstein, 1983). The probe was the purified insert from the cloned PCR product representing the POU-specific region of the pdm-2 gene. Hybridization buffer contained 5× SSPE, 5× Denhardt's, 200 µg/ml sonicated, denatured salmon sperm DNA, 0.1% SDS, and either 50%, 40%, 30%, or 20% formamide. The hybridization temperature was 42°C. After hybridization, the filters were washed in 2× SSC, 0.1% SDS at room temperature, then at 42°C.

RNA analysis

Total RNA was prepared from embryos of specified ages, mixed larvae, pupae, and adults by acid guanidine HCl/phenol extraction (Ashburner, 1989). For Northern hybridization, 10 µg of total RNA was loaded per lane, separated on 0.9% agarose gels containing formaldehyde, and transferred to nylon membranes (Micron Separations, Inc.). Probes were either gel-purified inserts of the Drosophila POU domain cDNA clones, an engrailed cDNA clone to determine the sizes of various transcripts (Drees et al., 1987), or a cDNA clone for ribosomal protein rp49 (O'Connell and Rosbash, 1984) to check uniformity of loading different samples. Hybridization conditions were 5× SSPE, 5× Denhardt's, 200 µg/ml sonicated, denatured salmon sperm DNA, 0.1% SDS, 50% formamide, at 42°C.

In situ hybridization to whole mount embryos was carried out as described (Tautz and Pfeifle, 1989), except that the probe was labelled by an unpublished procedure of N. Patel using a PCR thermal cycler and a single primer to produce a probe labelled on one strand only. After hybridization, the initial washes were done at 45°C. After staining, embryos were washed into 70% glycerol in PBS and examined with Nomarski DIC optics.

Acknowledgements

The authors wish to thank C. Oh and N. Patel for advice on in situ hybridization and D. Clegg and S. Feinstein for critical comments. This work was supported by NSF grant DCB-9003908.

References

Akam, M. (1987) Development, 101, 1-22.
Ashburner, M. (1989) Drosophila: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 434 pp.
Beall, C. and Hirsh, J. (1987) Genes Dev., 1, 510-520.
Bellen, H., O'Kane, C., Wilson, C., Grossniklaus, U., Pearson, R. and Gehring, W. (1989) Genes Dev., 3, 1288-1300.
Bier, E., Jan, L. and Jan, Y. (1990) Genes Dev., 4, 190-203.
Bier, E., Vaessin, H., Shepherd, S., Lee, K., McCall, K., Barbel, S., Ackerman, L., Carretto, R., Uemura, T., Grell, E., Jan, L. and Jan, Y. (1989) Genes Dev., 3, 1273-1287.
Bopp, D., Burri, M., Baumgartner, S., Frigerio, G. and Noll, M. (1986) Cell, 47, 1033-1040.
Budnik, V. and White, K. (1988) J. Comp. Neurol., 268, 400-413.
Campos-Ortega, J. and Hartenstein, V. (1985) The Embryonic Development of Drosophila melanogaster. Springer, Berlin, 227 pp.
Carroll, S., Laughon, A. and Thalley, B. (1988) Genes Dev., 2, 883-890.
Clerc, R., Corcoran, L., LeBowitz, H., Baltimore, D. and Sharp, P. (1988) Genes Dev., 2, 1570-1581.
Drees, B., Ali, Z., Soeller, W., Coleman, K., Poole, S. and Kornberg, T. (1987) EMBO J., 6, 2803-2809.
Feinberg, A. and Vogelstein, B. (1983) Anal. Biochem., 132, 6-13.
Finney, M., Ruvkun, G. and Horvitz, H. (1988) Cell, 55, 757-769.
He, X., Treacy, M., Simmons, D., Ingraham, H., Swanson, L. and Rosenfeld, M. (1989) Nature, 340, 35-42.
Herr, W., Sturm, R., Clerc, R., Corcoran, L., Baltimore, D., Sharp, P., Ingraham, H., Rosenfeld, M., Finney, M., Ruvkun, G. and Horvitz, H. (1988) Genes Dev., 2, 1513-1516.
Johnson, W. and Hirsh, J. (1990) Nature, 343, 467-470.
Johnson, W., McCormick, C., Bray, S. and Hirsh, J. (1989) Genes Dev., 3, 676-686.
Jurgens, G., Wieschaus, E., Nusslein-Volhard, C. and Kluding, H. (1984) Roux Arch. Dev. Biol., 193, 283-295.
Konrad, K. and Marsh, J.L. (1987) Dev. Biol., 122, 172-185.
Mangalam, H., Albert, V., Ingraham, H., Kapiloff, M., Wilson, L., Nelson, C., Elsholtz, H. and Rosenfeld, M.G. (1989) Genes Dev., 3, 946-958.
Monuki, E., Weinmaster, G., Kuhn, R. and Lemke, G. (1989) Neuron, 3, 783-793.
Nusslein-Volhard, C., Wieschaus, E. and Kluding, H. (1984) Roux Arch. Dev. Biol., 193, 267-282.
O'Connell, P. and Rosbash, M. (1984) Nuc. Acid Res., 12, 5495-5513.
Poole, S.J., Kauvar, L., Drees, B. and Kornberg, T. (1985) Cell, 40, 37-43.
Sambrook, J., Fritsch, E. and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
Scholer, H., Ruppert, S., Suzuki, N., Chowdhury, K. and Gruss, P. (1990) Nature, 344, 435-439.
Sturm, R., Das, G. and Herr, W. (1988) Genes Dev., 2, 1582-1599.
Tautz, D. and Pfeifle, C. (1989) Chromosoma, 98, 81-85.
Wieschaus, E., Nusslein-Volhard, C. and Jurgens, G. (1984) Roux Arch. Dev. Biol., 193, 296-307.