Insect Biochem. Molec. Biol. Vol. 24, No. 4, pp. 419-435, 1994
Copyright © 1994 Elsevier Science Ltd Printed in Great Britain. All rights reserved 0965-1748/94 $6.00 + 0.00
Pergamon
Characterization of a cDNA and Gene Encoding a Cuticular Protein from Rigid Cuticles of the Giant Silkmoth, Hyalophora cecropia
DAVID J. LAMPE,*† JUDITH H. WILLIS‡
Received 15 April 1993; revised and accepted 13 August 1993

We have isolated a cDNA and gene encoding a protein (HCCP66) found in the rigid cuticles of both larvae and pupae of the silkmoth, Hyalophora cecropia. The cDNA encoded a protein similar to cuticle proteins isolated from several other insects and contained a sequence motif similar to one present in a "family" of cuticular proteins from flexible cuticles. The gene had a structure similar to that of cuticle protein genes isolated from Drosophila melanogaster, albeit with a much larger intron that contained three copies of a transposable element-like sequence similar to short interspersed repeated DNA elements (SINEs). A sequence found 5' to the transcription start site matched the Octamer (Oct) cis-acting element. This sequence was capable of binding protein(s) from whole cell extracts of wing epidermis with high affinity and sequence specificity, suggesting a role in transcriptional regulation.

Cuticle    Cuticle protein gene    Metamorphosis    Ecdysteroid    Juvenoid    Octamer
*Department of Entomology, University of Illinois, Urbana, IL 61801, U.S.A.
†Author for correspondence.
‡Department of Zoology, University of Georgia, Athens, GA 30602, U.S.A.

INTRODUCTION

Insect metamorphosis is a complex form of postembryonic development involving cell death, cell proliferation, and changes in the expression of many cellular products (Wigglesworth, 1959; Riddiford, 1985). The most conspicuous changes occur in the cuticle, the extracellular secretion of the epidermis. For example, in the giant silk moth, Hyalophora cecropia, the abdominal sclerite epidermis secretes in turn flexible larval cuticle decorated with many rigid tubercles, pupal cuticle that is rigid and heavily tanned, and finally adult cuticle that is specialized by the presence of scales. Because of the conspicuous changes that occur, the integument has become a model system for the study of metamorphosis (Wigglesworth, 1959; Riddiford, 1985). Attention has been focused on cuticular proteins that account for up to half the dry weight of the cuticle (Neville, 1975). These appear to comprise several distinct protein families based on currently available sequence data (Willis, 1989) and the finding that the presence of any given protein could be most closely correlated with the type of cuticle (i.e. flexible or rigid) in which it was found rather than any particular stage or region (Cox and Willis, 1985).

Cuticle protein genes have come under scrutiny recently to define the molecular genetic correlates of metamorphosis. Cuticle protein gene expression has been shown to be under the control of both ecdysteroids and juvenoids using in vitro cell and organ culture. In Manduca sexta (Hiruma et al., 1991), Drosophila melanogaster (Apple and Fristrom, 1991) and Bombyx mori (Nakato et al., 1992), ecdysteroids are necessary for, and negatively regulate, cuticle protein gene expression, although these studies differ in their conclusions as to whether regulation is direct or indirect. The role of juvenoids in the expression of these genes is less clear. The application of juvenile hormone (JH) or its analogues to epidermis before a defined critical period results in the production of the same kind of cuticle as the one previously synthesized. These results imply an effect of JH on cuticle protein genes, but whether they are affected directly by the hormone is unknown (Nijhout and Wheeler, 1982; Willis, 1990).

We have isolated a cDNA and gene encoding Hyalophora cecropia cuticular protein 66 (HCCP66), a protein found in rigid cuticles of the larvae and pupae (Cox and Willis, 1985). A comparison of the protein sequence encoded by the cDNA with other cuticle protein sequences revealed a motif common to two families of cuticle proteins previously thought to be distinct. The gene was typical in structure for a cuticle protein gene and contained an Octamer (ATTTGCAT; Schreiber et al., 1989) sequence 5' to the transcription initiation site capable of high-affinity binding with a protein(s) from epidermal cell extracts.

MATERIALS AND METHODS
Animals

Larvae of H. cecropia were reared out-of-doors on wild cherry trees and brought into the laboratory at appropriate ages. Larvae were maintained for brief periods in the laboratory on wild cherry leaves. When cuticle-synthesizing pupae were needed, fifth instar larvae were allowed to spin, cocoons were opened, pharate pupae monitored, and epidermis dissected within 48 h after ecdysis. Diapausing pupae were maintained at 6°C and were checked for signs of development before use. Pharate adults were obtained by placing chilled diapausing pupae at 25°C and monitoring their development until wing pigmentation could be detected through the pupal cuticle (Schneiderman and Williams, 1954).

Protein and amino acid analyses

In order to obtain sequence data from an internal peptide, HCCP66 was purified by running an extract of larval tubercle cuticle on an isoelectric focusing gel (Serva ampholytes pH 5-6). A 6 cm strip containing the desired polypeptide band was cut out and electrophoresed on an SDS (sodium dodecyl sulfate) polyacrylamide gel, electroblotted onto an Immobilon P filter in CAPS (10 mM 3-[cyclohexylamino]-1-propanesulfonic acid, 10% methanol, pH 11.0), extracted in 40% acetonitrile, dried under vacuum, resuspended in water and then redried (Matsudaira, 1989; Stone, 1989). This dried material was sent to the W. M. Keck Foundation Biotechnology Resource Laboratory, Yale University, where its quantity and amino acid composition were determined. The band extract contained 17 µg (c. 1.4 nmol) of protein. The protein was reduced, carboxyamidomethylated and then digested with Lys-C. Peptide fragments were separated on a 4.6 × 25 mm Vydac C18 column that was eluted at a flow rate of 0.5 ml/min with an increasing concentration of acetonitrile. One fragment eluting at 75 min was selected for sequencing, repurified on a 1.0 × 25 mm Aquapore C8 column, and sequenced on an Applied Biosystems model 470A gas phase microsequencer.
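The quoted yield (17 µg, c. 1.4 nmol) can be checked by converting mass to moles. The short sketch below is illustrative only: the function name is hypothetical, and it assumes the molecular weight of 12,442 that the paper later calculates for the mature protein from the cDNA, rather than a value measured on the band extract.

```python
def micrograms_to_nanomoles(mass_ug: float, mol_weight: float) -> float:
    """Convert a protein mass in micrograms to nanomoles.

    mol_weight is in g/mol (numerically the same as Da).
    """
    # ug divided by g/mol gives umol; multiply by 1000 for nmol.
    return mass_ug / mol_weight * 1000.0


# Assumption: 12,442 is the molecular weight calculated from the cDNA
# for the mature protein (reported in the Results of this paper).
yield_nmol = micrograms_to_nanomoles(17.0, 12442.0)
print(f"{yield_nmol:.2f} nmol")
```

Under that assumption the result rounds to about 1.4 nmol, in line with the amount stated for the band extract.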
Twenty-eight amino acids were identified with confidence (Table 1), three others tentatively. An additional peptide (eluting at 64 min) was subsequently selected. Only seven residues were identified and these corresponded to the N-terminus of fragment No. 75.

Total RNA and poly-A+ mRNA isolation

Tissues used for RNA extractions were dissected from surface-sterilized animals. These were homogenized in c. 2 tissue vol of 100 mM Tris-Cl (pH 7.8), 20 mM EDTA, and 1% SDS. The homogenates were spun in a clinical centrifuge for 5 min on the highest setting to remove unlysed cells and cell debris. The supernatant was transferred to Eppendorf tubes and repeatedly extracted with an equal volume of phenol/chloroform/isoamyl alcohol (25:24:1) until no white interface appeared between the aqueous and organic layers. Total nucleic acids were precipitated with ethanol. High molecular weight RNA was isolated by a urea-LiCl method (Auffray and Rougeon, 1980). Messenger RNA was isolated using FastTrack (Invitrogen) following the manufacturer's instructions, except that total RNA from pupal wings was used instead of tissue as the starting material.

Reverse transcription-polymerase chain reaction

Degenerate oligonucleotide primers were designed using N-terminal (Willis, 1989) and internal protein sequence data (see previous text), focusing on the least degenerate portions of each (Table 1). Primers were synthesized at the University of Illinois Biotechnology Center, Urbana, Ill. Reverse transcription-polymerase chain reaction (RT-PCR) was performed essentially according to Kawasaki (1989) using the IDEG primer (see Table 1) in the cDNA synthesis step. The RT-PCR product was made blunt-ended with E. coli DNA polymerase (Klenow fragment) and cloned into the EcoRV site of pcDNA II (Invitrogen).

Construction and screening of a cDNA library from pupal wing epidermis

Approximately 3.5 µg of 48 h old pupal wing mRNA isolated as described above was used to construct a cDNA library using Invitrogen's Librarian II (version 2.4) cDNA library synthesis kit following the manufacturer's instructions. cDNA > 400 bp was size selected on an agarose gel for construction of the library, based on an apparent molecular weight of 14,600 for HCCP66 (Willis, 1989). Ten thousand recombinants were screened using standard procedures (Sambrook et al., 1989).

TABLE 1. Degenerate primers and corresponding regions of HCCP66

(1) N-terminal peptide sequence and 768-fold degenerate primer, NDEG
    Peptide: SDFSSFSYGVADPSTGDFKSQIESXL
    Primer:  5' GATTTTAAAAGGCAGATTGA 3'
    Alternative bases at degenerate positions: A, C, C, C, C, G, T, C, T, A, A

(2) Internal peptide sequence and 1536-fold degenerate primer, IDEG
    Peptide: KDPALIAAAPYITAPYGYAVPYAYTsPFgiF
    Primer:  5' TAGCCATAGGGGGCGGTAAT 3'
    Alternative bases at degenerate positions: A, A, A, A, C, G, C, C, C, G, T, T, T, T, T

Underlined regions in protein sequences are regions from which primers were designed. NDEG sequence was taken from the sense strand, IDEG from the antisense strand. Protein sequence data from Willis (1989) and this report. Residues in lower case were not identified with confidence.
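The fold-degeneracy quoted for each primer in Table 1 is the product of the number of bases allowed at each position. The sketch below shows that calculation. The two IUPAC-coded primer strings are an inference, not taken from the paper: they are one rendering that agrees with the printed base sequences, the listed alternative bases, and the stated 768-fold and 1536-fold degeneracies. The names `degeneracy`, `NDEG_IUPAC` and `IDEG_IUPAC` are hypothetical.

```python
from math import prod

# Number of bases that each IUPAC nucleotide code stands for.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer represents."""
    return prod(IUPAC_CHOICES[base] for base in primer.upper())


# Assumed IUPAC renderings (inferred, see the note above).
NDEG_IUPAC = "GAYTTYAARWSNCARATHGA"   # sense strand, peptide region DFKSQIE
IDEG_IUPAC = "TANCCRTANGGNGCNGTDAT"   # antisense strand, peptide region ITAPYGY

print(degeneracy(NDEG_IUPAC), degeneracy(IDEG_IUPAC))
```

With these assumed strings the function reproduces the two degeneracy values given in the table headings, which is a useful check that a degenerate primer has been transcribed correctly.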
Synthesis of radiolabeled probes

DNA probes were constructed by the random-primed synthesis method (Sambrook et al., 1989) using α-[32P]dATP (3000 Ci/mmol) and random hexamers (Pharmacia).

RNA probes were constructed using suitably linearized templates and Promega's Riboprobe RNA synthesis kit (SP6 or T7 polymerase). Unincorporated nucleotides were removed either by two successive ethanol precipitations or by application to a 1 ml Sephadex G-50 spin column (Sambrook et al., 1989).

Hybridizations and autoradiography

All filters (cDNAs and Northern blots, see later) were prehybridized for at least 2 h in 6 × SSPE (60 mM Na phosphate, pH 7.4, 150 mM NaCl, 6 mM EDTA), 1% SDS, 10 × Denhardt's solution, and 100 µg/ml sonicated denatured salmon sperm DNA at 42°C. Hybridizations were performed for 16-20 h in 50% formamide (v/v), 6 × SSPE, 1% SDS, and 50 µg/ml sonicated denatured salmon sperm DNA in heat-sealed plastic bags at a temperature 20°C below the calculated Tm of the probe (Sambrook et al., 1989). Typically, 1 × 10^6 cpm probe/ml hybridization solution were used. When RNA probes were used, care was taken to treat solutions with DEPC (diethylpyrocarbonate) where practical to minimize degradation of the probe. Washes were performed twice for 15 min in 6 × SSPE, 1% SDS at room temperature; twice for 15 min each in 1 × SSPE, 1% SDS at 37°C; and a final wash for 1 h in 0.1 × SSPE, 1% SDS at 65°C. Autoradiography was performed with X-Omat-AR film (Kodak) using intensifying screens.

cDNA 5' sequence determination and primer extension analysis

The 5'-most nucleotide sequence of the HCCP66 cDNA was obtained using the RACE (Rapid Amplification of cDNA Ends) procedure of Frohman (1990) with < 48 h old pupal wing epidermal RNA. Primers specific for HCCP66 used in the RACE protocol are underlined in Fig. 1. Primer extension was performed as described by Sambrook et al. (1989) using 10 µg of total epidermal RNA from appropriate regions and stages and end-labeled RACE secondary primer to prime the reverse transcription reaction.

[Figure 1: annotated nucleotide sequence of the HCCP66 gene with its conceptual translation; the sequence is available under GenBank accession No. L13971.]

FIGURE 1. HCCP66 genomic, cDNA, and conceptual protein sequences. The genomic sequence and conceptual translation are shown. Nucleotides are numbered on the left and amino acids on the right. The first nucleotide of the cDNA is listed in bold (position +1). Nucleotides 1-54 and 3618-3642 were not present in any cDNA and are known only from RACE clones and the genomic sequence. The TATA box is underlined with a double underline. Fully and partially palindromic sequences in the 5' flanking DNA are underlined. Palindromes were identified with the program LOOPS using a window size of 30 and a cutoff of 50%. Any sequence of over 6 bp showing palindromy around the apex of the loop was accepted. The site of the secondary primer used in the RACE and primer extension experiments is underlined in bold. The Octamer and a sequence similar to a Drosophila EcRE are shown in bold. The insertion element sequences are in lower case (see Fig. 5 for orientation). We were unable to determine their exact boundaries but approximate boundaries were inferred from dot plot analyses (data not shown) and comparisons with the basic attacin IE. A large portion of the intron was omitted. The entire sequence including additional 5' flanking DNA has been submitted to Genbank (accession No. L13971).

Northern analysis

RNA agarose gel electrophoresis and sample preparations were performed according to Miller (1987) using 10 µg each of total RNA from various epidermal samples. RNA was transferred to a nylon filter (Nytran, Schleicher and Schuell) using the manufacturer's instructions and hybridized with an antisense RNA probe constructed using T7 RNA polymerase and a linearized HCCP66 cDNA. Hybridization was as outlined above, at 58°C for 16 h. The filter was washed three times in 1 × SSPE/0.5% SDS at 65°C and once in 0.1 × SSPE/0.1% SDS at 60°C.

Genomic library construction, screening, and clone characterization

A genomic library was prepared from a single larva using size-selected genomic DNA partially cut with Sau3A and ligated into LambdaGem-2 (Promega Corp.; see Binger, 1991). A total of 1.65 × 10^5 pfu were analyzed in the initial screen. Filter lifts and filter preparation were done as described by Binger (1991). Hybridization conditions were as above using an antisense RNA probe made from the subcloned PCR product used to screen the pupal wing epidermal cDNA library. The secondary screen was performed with the same probe at 57°C at a density of 100 plaques/plate. One clone was chosen for a large scale DNA preparation (plate lysis method, Sambrook et al., 1989) and restriction mapping. Phage DNA was digested with the restriction enzymes shown in Fig. 5 at a concentration of 10-15 U/µg DNA and a temperature of 37°C. All possible double digests were performed. Restriction fragments were separated on 0.8%, 1 × TAE (40 mM Tris-acetate pH 8.5, 2 mM EDTA) agarose (SeaKem, FMC) gels. The DNA in the gel was depurinated, denatured, and neutralized according to Sambrook et al. (1989), transferred to a nylon filter by the method of Southern (1975), and afterward was baked at 80°C for 2 h. The filter was screened with an antisense RNA probe made from the cDNA.
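The methods set the hybridization temperature 20°C below the calculated Tm of the probe but do not state which formula was used. The sketch below uses one widely quoted empirical approximation for long DNA-DNA hybrids as a stand-in; this is an assumption, the coefficients vary slightly between sources, and RNA probes generally form more stable hybrids than this expression predicts. All function names and the example inputs (sodium concentration, GC content) are hypothetical; only the 184 bp length (the RT-PCR product), the 50% formamide and the 20°C offset come from the text.

```python
import math


def estimate_tm(na_molar: float, gc_percent: float,
                formamide_percent: float, length_bp: int) -> float:
    """Approximate Tm (deg C) of a long DNA-DNA hybrid.

    Assumed empirical form:
    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.63*(%formamide) - 600/length
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / length_bp)


def hybridization_temperature(tm: float, offset: float = 20.0) -> float:
    """Temperature set a fixed offset below Tm (20 deg C in the methods)."""
    return tm - offset


# Hypothetical inputs: 1.0 M Na+ and 45% GC are placeholders, not values
# taken from the paper.
tm = estimate_tm(na_molar=1.0, gc_percent=45.0,
                 formamide_percent=50.0, length_bp=184)
print(f"Tm ~ {tm:.1f} C, hybridize at ~ {hybridization_temperature(tm):.1f} C")
```

The main point of the sketch is the dependence on formamide and probe length: in this approximation each 1% of formamide lowers the estimate by 0.63°C, which is why a 50% formamide buffer allows hybridization at moderate temperatures.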
Sequencing

Sequencing was performed using the dideoxy chain termination method (Sanger et al., 1977). All templates were sequenced twice, incorporating both ddGTP and ddITP to clarify regions containing compressions. Primers used for sequencing were made at the University of Illinois Biotechnology Center. Artifact banding was reduced by adding terminal deoxynucleotidyl transferase to each reaction tube after the termination step (Fawcett and Bartlett, 1990). The subclone p266EE3.8-5 (an EcoRI fragment containing the first exon and inserted elements, see Fig. 5) was used to make a nested set of deletions (3'-5') using the method of Henikoff (1984) to sequence the region of the clone containing the three tandemly-repeated SINE elements.

Computer analysis of sequence data

Sequence data was analyzed mainly using the DNASTAR package of programs (DNASTAR Inc., Madison, Wis.). Pairwise comparisons of insertion elements were performed using the program SEQCOMP with a window of 10 nucleotides, a match cutoff of 50%, and an allowable gap of 15 nucleotides. 5' flanking sequences of cuticle protein genes were compared with the program COMPARE using a window of 15 nucleotides and a similarity cutoff of 80%. The GCG package of programs and the transcription factor database were used to identify potential cis-acting sequences in the 5' flanking DNA. The Genpept and SwissProt libraries (release Nos 72.0 and 16.0) were searched using the conceptual translation product minus the signal peptide. Genbank (release 72.0) was searched using the cDNA sequence. All searches were performed using FASTA (Pearson and Lipman, 1988). The gene sequence has been submitted to Genbank (accession No. L13971).

Whole cell lysates and nuclear extracts

Whole cell lysates were prepared by the method of Manley et al. (1983). Nuclear extracts were prepared similarly, except the tissue was homogenized in four packed tissue volumes of solution A/B [a 1:1 mix of solution A (10 mM Tris-HCl pH 7.9, 1 mM EDTA, 5 mM DTT, 0.1 mM PMSF): solution B (50 mM Tris-HCl pH 7.9, 10 mM MgCl2, 2 mM DTT, 0.1 mM PMSF, 25% sucrose, and 50% glycerol)]; a crude nuclear pellet was isolated by centrifugation in a microfuge for 2 min. The pellet was resuspended and washed twice with 500 µl of buffer A/B. The final pellet was resuspended in eight times the estimated pellet volume of buffer A/B and the remainder of the procedure followed exactly. Protein concentrations were determined with a Pierce Micro BCA kit according to the manufacturer's instructions and typically were between 7 and 10 mg/ml.

Gel retardation assays
DNA fragments used in the gel retardation assays are shown in Fig. 7 and were isolated on low melting temperature agarose gels in 1 × TAE (Qian and Wilkinson, 1991). DNA was end-labeled by filling in 5' overhangs with α-[32P]dGTP (either 400 or 3000 Ci/mmol) and Klenow fragment in React 1 buffer (BRL; Tabor and Struhl, 1989). Unincorporated nucleotides were removed using Sephadex G-50 spin columns (Sambrook et al., 1989). The specific activity of the DNA was determined by TCA precipitation onto glass fiber filters. Oligonucleotides used for competition were either obtained from a commercial source (Oct1, Promega No. E3241) or synthesized locally. Oligonucleotides were made double-stranded by combining equimolar amounts of each complementary strand, heating the mixture to 85°C, and allowing it to cool slowly to room temperature.

Standard DNA/protein reactions were 20 µl in volume and contained the following: 20 mM HEPES (pH 7.9), 100 mM KCl, 10% glycerol, 2 µg double-stranded poly dI-dC (Pharmacia), 5-15 µg protein (whole cell lysates or crude nuclear extracts), and c. 10,000 cpm of labeled DNA (0.2-2.0 ng, depending on the specific activity). A control containing no protein was always performed. The radiolabeled DNA was added last and the mix allowed to incubate at room temperature for 15 min, after which the entire mix was loaded onto a polyacrylamide gel. Whole cell lysates were sometimes used, refrozen on dry ice, stored at -80°C, and used again with no noticeable difference in gel retardation patterns.

5% polyacrylamide (acrylamide:bis = 80:1), 0.5 × TBE (45 mM Tris-OH, 45 mM H3BO3, 1 mM EDTA) gels (1.5 mm thick) were cast and allowed to polymerize at least 2 h, after which they were transferred to a cold room and allowed to equilibrate to 4°C. Gels were pre-electrophoresed at 125 V for 2 h, by which time the current reading had stabilized. One lane (without a sample) was loaded with ~1 µl of 10 × TBE agarose sample buffer containing bromophenol blue to monitor the extent of electrophoresis. Samples were electrophoresed until the bromophenol blue was at the bottom of the gel. The gel was removed from the apparatus, transferred to a piece of Whatman 3MM paper, covered with a plastic transparency sheet (Xerox Corp. No. 3R3028), and dried on a vacuum drier at 60°C for 1.5 h. The dried gel was then autoradiographed using Kodak X-Omat AR film with intensifying screens at -80°C.

RESULTS

Isolation of a partial cDNA for HCCP66 using RT-PCR

A reverse transcription-polymerase chain reaction (RT-PCR) containing total RNA isolated from < 24 h old pupal wing epidermis and degenerate oligonucleotides constructed based on peptide sequence data from HCCP66 (Table 1) yielded a single discrete product of c. 200 bp. Cloning and sequencing of the product showed it to be 184 bp long. It contained one long open reading frame. The 3' sequence matched a sequence encoding 11 amino acids of the internal protein sequence not used in designing the PCR primer (see Table 1).

Isolation and characterization of a cDNA for HCCP66

Ten thousand bacterial colonies from a cDNA library prepared from cuticle-forming pupal wing epidermis were screened using a random-primed labeled probe consisting of the RT-PCR product. The primary screen revealed seven positive colonies which contained inserts ranging from c. 450-520 bp. Sequencing revealed all to contain the same cDNA, but of slightly different lengths, and none contained a full length cDNA as evidenced by the lack of an initiator methionine codon. The 5'-most sequences of the cDNA (+1 to +54 and +3616 to 3766 in Fig. 1) were obtained using the rapid amplification of cDNA ends (RACE) PCR procedure of Frohman (1990). The cDNA sequence derived from overlapping cDNA clones and RACE clones is shown in Fig. 1.

Primer extension analysis of HCCP66 mRNA was performed in order to confirm the transcription start site as determined by RACE, using epidermal RNA extracted from different regions and stages. Two mRNA species were found that differed in length by 3 bp (Fig. 2) and the same two species were present in all the RNA samples known to contain HCCP66 mRNA. No products were detected in the RNA sample from diapausing pupal wing epidermis.

[Figure 2: autoradiograph; sequencing ladder lanes A, C, G, T followed by sample lanes FW, HW, Tu, DFW.]

FIGURE 2. Primer extension analysis of epidermal RNA from various regions and stages. Ten micrograms of total RNA (determined spectrophotometrically) were loaded per lane. FW = < 24 h old pupal forewing; HW = < 24 h old pupal hindwing; Tu = < 24 h old larval tubercle; DFW = diapausing pupal forewing. A sequencing ladder of a full-length RACE clone is shown at left for a size standard. The RACE clone contains a poly-A tract added as part of the RACE protocol.

[Figure 3: autoradiograph; size markers indicated at 3.7 kb, 1.3 kb and 0.8 kb.]

FIGURE 3. Northern analysis of total epidermal RNA samples from various stages and regions using a HCCP66 cDNA probe. FD = late larval forewing discs (6 days post-spinning); HD = late larval hindwing discs; FW = < 24 h old pupal forewing; HW = < 24 h old pupal hindwing; DFW = diapausing pupal forewing; DHW = diapausing pupal hindwing; AW = 16-18 day pharate adult wing; Te = < 24 h old pupal testes; Tu = < 24 h old larval tubercle. Size standards at right are based on an RNA ladder not shown.

Message distribution of HCCP66

The detection of a major RNA in Northern analyses (Fig. 3) was consistent with the distribution of HCCP66 in cuticle (Cox and Willis, 1985), namely that it was present in young pupal wings and larval tubercles only. Unexpectedly, though, three bands were detected. The most intensely hybridizing RNA migrated at an apparent size of 775 bp. Given the size of the HCCP66 cDNA (481 bp) this would imply a poly-A tail of c. 300 nucleotides. A faint band (c. 1.3 kb) was present in both FW and HW samples while a second weak signal
(c. 3.7 kb) was present in hindwing and tubercles. These fainter bands may be due to hybridization with other messages encoding closely related cuticle proteins, as HCCP66 appears to be a member of a multigene family based on N-terminal amino acid data (Willis, 1989).

Conceptual translation of cDNA

Conceptual translation of the cDNA predicted a protein of 129 amino acids, the first 17 of which were not present in the N-terminal amino acid data previously described (Willis, 1989) and which presumably constituted a signal peptide (Fig. 1). Indeed, the presumptive cleavage point for this signal peptide followed the (-3, -1) rules of von Heijne (1984) and the sequence was also extremely hydrophobic based on hydropathy analysis (data not shown). The mature protein was thus 112 amino acids long with a molecular weight of 12,442 and an estimated pI of 5.1, in reasonable agreement with values obtained from SDS and isoelectric focusing gel studies on extracted protein (14,600 Da and pH 5.4, respectively; Willis, 1989).

The conceptual translation of the cDNA matched the N-terminal amino acid data exactly; however, two mismatches were found with the amino acid sequence data for an internal peptide (Fig. 1). Tyr88 and Ser91 of the conceptual translation (present both in the gene and cDNA) were both Phe in the amino acid data. The former mismatch represents a conservative substitution but the latter does not. Both changes, however, can be produced by single nucleotide substitutions at the second position of a Phe codon (TTT to TAT for Tyr and TTC to TCC for Ser), so these may represent genuine polymorphisms.

The HCCP66 cDNA encoded a protein rich in serine and alanine, which together accounted for c. 25% of the total residues. Alanine residues were distributed mainly in the carboxy half of the protein while serine showed no such skewed distribution. Proline residues, too, were confined mostly to the carboxy half of the molecule. Skewed amino acid distributions have been noted in many other cuticle proteins (Klarskov et al., 1989). Interestingly, five of the six prolines in the carboxy half of the protein were followed by a tyrosine residue and then between 4 and 13 mostly hydrophobic residues. This arrangement is similar to that found in some of the cuticle proteins of L. migratoria, B. mori and D. melanogaster (Hojrup et al., 1986; Klarskov et al., 1989; Nakato et al., 1990; Apple and Fristrom, 1991).

The mature HCCP66 sequence was used to search the GenPept and SwissProt sequence libraries and the cDNA sequence was used to search the GenBank library. Four sequences were found that strongly
resembled HCCP66. These were LMCP8 (a locust cuticle protein) (Hojrup et al., 1986), EDG84 (a Drosophila pupal cuticle protein) (Apple and Fristrom, 1991), and two cuticle proteins from the coleopteran, Tenebrio molitor, TMCP20 (Charles et al., 1992) and TMCP22 (Bouhin et al., 1992). The similarity was only in a region of hydrophilicity shared by the proteins (see Fig. 4). In sequence alignments HCCP66 was found to be 56.7% identical to LMCP8 (60 amino acid overlap), 47.2% identical to EDG84 (108 amino acid overlap), 38.4% identical to TMCP20 (99 amino acid overlap), and 53.7% identical to TMCP22 (54 amino acid overlap). All of these proteins come from rigid cuticles or, in the case of EDG84, from ecdysteroid-stimulated imaginal discs, and EDG84 is assumed to be a pupal cuticle protein.

[Figure 4: hydrophobicity plots, one panel each for HCCP66, LMCP8, EDG84, TMCP20 and TMCP22; x-axis: amino acid position; the R&R region is marked.]

FIGURE 4. Comparison of hydrophobicity plots of HCCP66 and four similar proteins aligned by sequence similarity. Hydrophobicity plots were constructed according to Kyte and Doolittle (1982) using the computer program DNAStrider (Mark, 1988). The plots were aligned around the region of greatest sequence similarity (between dashed lines). R&R indicates the region containing a consensus sequence identified in proteins from flexible cuticles by Rebers and Riddiford (1988). Proteins are identified in the caption of Fig. 10.

Isolation and characterization of a gene encoding HCCP66

One of the cDNA clones was used to screen a H. cecropia genomic library and a single plaque was chosen for extensive analysis. The restriction map for its insert is shown in Fig. 5. Under the stringency conditions used only one hybridizing band was detected in each restriction digest. Some cuticular protein genes have been shown to reside in clusters (Snyder et al., 1982; Rebers and Riddiford, 1988) but under the high stringency conditions used no clustering was evident for genes similar to the HCCP66 gene.

[Figure 5: restriction map drawn 5' to 3' with a 1 kb scale bar.]

FIGURE 5. Restriction map of a genomic clone containing the HCCP66 gene. Restriction enzyme abbreviations are E = EcoRI; H = HindIII; T = Tth111I; Xb = XbaI; Xh = XhoI. Exons of the gene are shown as black boxes. Labeled arrows show approximate position and orientation of insertion elements.

The exon-intron boundaries of the HCCP66 gene were established by comparison of the gene sequence with that of the cDNA and RACE clones obtained earlier. The HCCP66 gene contained two exons. The first was 55 bp long and coded for only four amino acids. The second exon was 428 bp long; the intron separating the two was 3561 bp. The intron-exon boundaries showed typical eukaryotic splice junctions (Padgett et al., 1986).
The HCCP66 gene contains three copies of a highly repetitive DNA sequence

Approximately 1500 bp of the intron length could be accounted for by three tandemly repeated copies of a repetitive DNA sequence. The position of this sequence and another copy in the opposite orientation in the 3' flanking DNA are shown in Fig. 1. These sequences are homologous to sequences from flanking and intronic DNA from a number of other Cecropia genes and correspond to an "insertion element" (IE) first reported in the 5' flanking DNA of the basic attacin gene (Sun et al., 1991). The first IE in the intron and the one in the 3' flanking DNA were nearly the same length as the basic attacin IE and were >80% identical to it at the sequence level. HCCP66 IE No. 2 lacked approximately the 3'-most 100 bp and HCCP66 IE No. 3 lacked approximately the 5'-most 100 bp of the "intact" IE.
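The exon and intron lengths reported above fix the layout of the transcribed region relative to position +1. The following is a minimal sketch of that arithmetic, assuming that exon 1 begins at +1; the segment names and the helper `layout` are illustrative and not part of the original analysis, and the 1500 bp figure for the insertion elements is the approximate value quoted in the text.

```python
# Segment lengths (bp) as reported in the text; names are illustrative only.
SEGMENTS = [("exon 1", 55), ("intron", 3561), ("exon 2", 428)]
IE_TOTAL_BP = 1500  # approximate combined length of the three IE copies


def layout(segments, first_position=1):
    """Return (name, start, end) tuples, numbering from the start site (+1)."""
    coords = []
    start = first_position
    for name, length in segments:
        end = start + length - 1
        coords.append((name, start, end))
        start = end + 1
    return coords


coords = layout(SEGMENTS)
transcribed_bp = coords[-1][2]
spliced_bp = sum(length for name, length in SEGMENTS if name.startswith("exon"))
ie_fraction = IE_TOTAL_BP / dict(SEGMENTS)["intron"]

for name, start, end in coords:
    print(f"{name}: +{start} to +{end}")
print(f"transcribed region: {transcribed_bp} bp; spliced exons: {spliced_bp} bp")
print(f"insertion elements: about {ie_fraction:.0%} of the intron")
```

On these figures the intron accounts for most of the transcribed region, and the repeated element for a little under half of the intron.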
HCCP66 gene promoter elements

The transcription start site of the gene was inferred from primer extension experiments and RACE clone sequence data. The 5'-most of the two sites identified by primer extension and RACE products matched an insect non-heat shock transcription start site consensus sequence (Hultmark et al., 1986) at six out of seven positions but matched a Drosophila cap site consensus (Cherbas et al., 1986) at only eight out of 13 positions. The other identified site matched only two out of seven positions of the non-heat shock consensus and six out of 13 positions of the Drosophila consensus. The first base of the larger product has therefore been designated position +1. The shorter product may be the result of a premature stop by reverse transcriptase (Sambrook et al., 1989) or may be a genuine second transcription start site that is used less efficiently by RNA polymerase II, as shown by the lesser amount of product produced in all samples of the primer extension analysis (Fig. 2). A TATA box was centered 26 bp upstream of the major start site. A search for the common CCAAT-binding transcription factor/nuclear factor-1 sequence GCCAAT (McKnight et al., 1984) located several sequences in both the 5'-3' (beginning at -101, -157, -232, and -272) and 3'-5' (beginning at -76, -195, -268, -698, -1025, -1058, and -1324) orientations. Most cuticle protein genes contain what appear to be CCAAT boxes in the 5'-3' orientation at c. -80. The sequence at -76 was in the opposite orientation from most CCAAT boxes and was incomplete. Indeed, CCAAT boxes may not be crucial for cuticle protein gene expression as DMCP3 and 4 completely lacked identifiable CCAAT boxes (Snyder et al., 1982).

Because palindromy is often associated with cis-acting promoter and enhancer sequences a search was undertaken to locate such sequences in the 5' flanking DNA of the HCCP66 gene. The whole or partially-palindromic sequences identified are shown in Fig. 1.
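The two searches described above (a motif in both orientations, and palindromic sites) can be sketched in a few lines. This is only an illustration under stated assumptions: the toy sequence is invented, not HCCP66 flanking DNA, the function names are hypothetical, and reporting a reverse-orientation hit by its leftmost base on the given strand is a convention chosen here, since the text does not say which end was used.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_both_orientations(seq, motif):
    """Return 0-based start indices of the motif as written (forward)
    and of its reverse complement (reverse), both on the given strand."""
    seq, motif = seq.upper(), motif.upper()
    rc_motif = reverse_complement(motif)
    width = len(motif)
    starts = range(len(seq) - width + 1)
    forward = [i for i in starts if seq[i:i + width] == motif]
    reverse = [i for i in starts if seq[i:i + width] == rc_motif]
    return forward, reverse


def promoter_coordinate(index, tss_index):
    """Convert a 0-based index to a promoter coordinate with no position 0:
    the base at tss_index is +1 and the base before it is -1."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def is_palindromic(site):
    """True if the site reads the same on both strands."""
    return site.upper() == reverse_complement(site)


# Invented upstream sequence; the base following it is taken as +1.
TOY_UPSTREAM = "TTGCCAATGGATTGGCAA"
TSS_INDEX = len(TOY_UPSTREAM)

forward, reverse = find_both_orientations(TOY_UPSTREAM, "GCCAAT")
print("5'-3' hits:", [promoter_coordinate(i, TSS_INDEX) for i in forward])
print("3'-5' hits:", [promoter_coordinate(i, TSS_INDEX) for i in reverse])
print("ATGCAT palindromic:", is_palindromic("ATGCAT"))
```

Exact matching is the simplest choice; a search for "whole or partially-palindromic" sequences would additionally need a mismatch tolerance, which is omitted here.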
A search of the HCCP66 gene for other possible cis-acting elements

A search was made for sequences corresponding to other known eukaryotic cis-acting sequences. Many matches were detected but only two were of sufficient interest to pursue further. The first was a pair of imperfect matches to the consensus sequence of the Drosophila ecdysone response element (EcRE). These were of particular interest as several cuticle protein genes have been shown to rely on 20-hydroxyecdysone (20E) for their transcription (Apple and Fristrom, 1991; Hiruma et al., 1991; Nakato et al., 1992). The two matches are shown in Fig. 6 aligned with the Drosophila EcRE and are listed in bold in Fig. 1. The match between +20 and +34 is interesting because it is both partially palindromic and so situated that, if bound by a hormone-receptor complex, it might interfere with the transcriptional machinery. Such a form of negative regulation has been suggested for the EDG84 cuticular protein gene where similarities were found between the sequences surrounding the TATA box and an EcRE (Apple and Fristrom, 1991).

1.
HCCP66 +20   TAGTCACAGTGAACA
identities     |||| | |||||
Drosophila   GGGTCA-N-TGAACT
alternatives A T         C C

2.
HCCP66 -567  GGGTCCGTGATAA
identities   ||||| ||||
Drosophila   GGGTCANTGAACT
alternatives A T       C C

FIGURE 6. HCCP66 sequences similar to a Drosophila ecdysone response element. Positions of the HCCP66 EcRE similarities are with respect to the transcription start site. Lines between sequences indicate identities. The Drosophila EcRE consensus sequence is after Cherbas et al. (1991).

A perfect Octamer (Oct) sequence was found between -81 and -74 (Fig. 1). This Oct sequence was notable because it was in a position similar to that of the tissue-specific Oct sequence in the immunoglobulin H (IgH) gene promoter (Schreiber et al., 1989). Moreover, it was preceded by a pyrimidine-rich region (-101 to -87), also a feature of the IgH promoter, although in the HCCP66 gene it is situated much closer to Oct. Interestingly, the Oct sequence was also part of a larger palindrome. This may be coincidental or it may allow competition for the same sequence by two different trans-acting factors.

Since cuticle proteins are tissue-specific products and since several have been shown to respond to ecdysteroids, we reasoned that the cis-acting sequences mediating these properties might be conserved between species. The sequences 5' to the transcription start sites of three cuticle protein genes were chosen for comparison with that of the HCCP66 gene. They were (1) the D. melanogaster cuticle protein 3 (DMCP3) gene, chosen because it lacked an identifiable CCAAT box (Snyder et al., 1982); (2) the HCCP12 gene, chosen because it was the only other Cecropia cuticle protein gene that had been characterized (Binger, 1991); and (3) D. melanogaster ecdysone dependent gene 84 (EDG84) (Apple and Fristrom, 1991), chosen because it encoded a protein very similar to that of HCCP66 (see previous text and Discussion). All of the 5' flanking sequence available in GenBank for each of these genes was used in pairwise comparisons with the 5' flanking DNA of the HCCP66 gene. No sequence was shared between all four cuticle protein genes but some were shared by three. The sequence beginning at -134 in HCCP66 was shared with some nucleotide changes between HCCP66, DMCP3 and HCCP12. Two other sequences were shared exactly. The sequence AGTTAATAA (-212 in HCCP66) and the sequence ATATTTTA (-365 in HCCP66) were shared between HCCP66, HCCP12, and EDG84.

Gel retardation assays reveal a strong shift associated with an Octamer-containing DNA fragment

The occurrence of both the Octamer and +20 EcRE sequence matches in the putative regulatory regions of the HCCP66 gene was of interest because cuticle protein genes are both 20E-responsive and tissue specific and these kinds of cis-acting elements mediate those characteristics in other genes (Schreiber et al., 1989; Cherbas et al., 1991). To test this potential, gel retardation assays were undertaken to ascertain whether proteins were present in the epidermis that were capable of specifically binding either of these two sequences. A fragment of genomic DNA containing both the +20 EcRE and Oct was subcloned into a plasmid vector so that the piece could be isolated and end labeled with T4 polynucleotide kinase [Fig. 7(a)]. Internal restriction sites allowed the isolation of portions of the subclone containing either both Oct and EcRE or each individually so that each sequence could be tested for its ability to bind protein. A whole cell extract (Manley et al., 1983) was prepared from day 2 pharate adult wings because a tissue containing 20E receptors was desired. Samples of this extract were tested by Dr L. Cherbas at Indiana University for the presence of 20E receptors using an 125I-iodoponasterone binding assay (Cherbas et al., 1988) and found to contain three times the level of hormone binding activity per mg of protein as similarly prepared Drosophila Kc cell extracts (data not shown).

[Figure 7: (a) map of the promoter subclone drawn 5' to 3' with Hind III (H), Tth 111 I, Taq I (T) and Eag I (E) sites and marked positions -164, -64, +19 and +34; (b) autoradiograph with lanes H/E, H/Tth, H/T and T/E, each run without (-) and with (+) extract.]
FIGURE 7. DNA fragments used in gel retardation assays and gel retardation assay of HCCP66 gene promoter fragments showing location of a strong shift. (a) A subclone of HCCP66 promoter DNA and flanking vector sequences is shown. Restriction sites are listed. Numbers are positions relative to the transcription start site of the HCCP66 gene. Black boxes are vector sequences. Cross hatched box contains a putative ecdysone response element. "8" box contains the Octamer sequence. (b) H/E = Hind III/Eag I; H/Tth = Hind III/Tth 111 I; H/T = Hind III/Taq I; T/E = Taq I/Eag I. Samples either were (+) or were not (-) incubated with a whole cell extract of day 2 pharate adult wings. For the location of fragments used here relative to EcRE and Octamer see Fig. 7(a). The very dark "bands" at the bottom of the autoradiograph are unshifted labeled DNA.

TABLE 2. Oligonucleotides used for competition in gel retardation assays

(1) Oct 1 (Promega)
5' TTCTAGTGATTTGCATTCGACA 3'
3' AAGATCACTAAACGTAAGCTGA 5'

(2) Oct 66 (-87 to -68)
5' TGATGCATTTGCATTTCCTG 3'
3' ACTACGTAAACGTAAAGGAC 5'

(3) Oct 66' (-87 to -68)
5' ........CCC......... 3'
3' ........GGG......... 5'

(4) Non-specific
5' GACACAGTGAACAACATCAAGGATCCGG 3'
3' CAGTGTCACTTGTTGTAGTTCCTAGGCC 5'

Numbered positions are relative to the transcription start site in the HCCP66 gene. In Oct66', dashes represent nucleotides identical to Oct66. Oct sites (or mutated Oct in Oct66') are underlined in each. The non-specific DNA corresponds to, from 5' to 3', bp +22 to +41 of the HCCP66 gene (double underline), a BamH I site (underline), and two extra base pairs.
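The coordinates given for the Octamer (-81 to -74) can be checked against the Oct 66 oligonucleotide in Table 2, which spans -87 to -68. The sketch below does that and also asks which competitor top strands carry an intact octamer on either strand. It is a hedged illustration: the helper names are hypothetical, the sequences are copied from Table 2 as printed, and the position of the CCC substitution in Oct 66' assumes that the placeholder pattern in the table is preserved exactly.

```python
OCTAMER = "ATTTGCAT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Top strands (5' to 3') as printed in Table 2.
OCT1 = "TTCTAGTGATTTGCATTCGACA"
OCT66 = "TGATGCATTTGCATTTCCTG"
OCT66_FIRST, OCT66_LAST = -87, -68
OCT66_PRIME_MASK = "........CCC........."  # assumed alignment of the mutation
NON_SPECIFIC = "GACACAGTGAACAACATCAAGGATCCGG"


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def apply_mask(seq, mask):
    """Copy seq, replacing bases wherever the mask is not a placeholder dot."""
    return "".join(s if m == "." else m for s, m in zip(seq, mask))


def site_coordinates(oligo, first_coord, site):
    """Coordinates of site within an oligo lying wholly upstream of +1."""
    i = oligo.find(site)
    if i < 0:
        return None
    start = first_coord + i
    return start, start + len(site) - 1


def has_site(oligo, site):
    """True if the site occurs on either strand of the duplex."""
    return site in oligo or reverse_complement(site) in oligo


oct66_prime = apply_mask(OCT66, OCT66_PRIME_MASK)
competitors = {"Oct1": OCT1, "Oct66": OCT66,
               "Oct66'": oct66_prime, "Non-specific": NON_SPECIFIC}
intact = {name: has_site(seq, OCTAMER) for name, seq in competitors.items()}

print("Oct66 length:", len(OCT66), "expected:", OCT66_LAST - OCT66_FIRST + 1)
print("Octamer in Oct66:", site_coordinates(OCT66, OCT66_FIRST, OCTAMER))
print("Intact octamer:", intact)
```

Under these assumptions the octamer falls at -81 to -74 within Oct 66, and only Oct 1 and Oct 66 retain an intact site.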
Gel retardation analyses with the day 2 pharate adult wing extract and the entire H/E fragment [see Fig. 7(a) for fragment locations and abbreviations] were used to
establish conditions for binding, length of gel runs, etc. One very strong shift was produced as well as three minor ones [Fig. 7(b), lane 2]. Subfragments of the H/E fragment were then used to localize the region of binding. The strong shift was retained on all fragments except T/E, which contained the putative EcRE but not Oct. This fragment is approximately the same length as H/T, but had only a weak shift in the position where H/T had a strong one. It was expected that the EcRE in the T/E fragment would shift if bound by receptor, as 20E receptors have been shown to bind their cognate cis-acting sequences in the absence of ligand (Koelle et al., 1991), and do not require the presence of any naturally occurring flanking DNA for normal binding (Cherbas et al., 1991). The weak shift with the T/E fragment might be caused by the "strong shift" protein binding to some sequence either related to its true binding sequence or to another one altogether, but in either case with weaker affinity than that of the site on the H/T fragment. Binding of more than one sequence by various DNA binding proteins has been described and is known to be a property of some Oct-binding proteins (Schreiber et al., 1989; Kemler and Schaffner, 1990). Because we could not confidently assign a specific shift to the fragment containing the putative EcRE, we focused our attention on the H/T fragment containing the Oct sequence.
The strong shift is caused by protein binding specifically to the Oct sequence

A competition assay was performed to determine whether the strong shift was produced by a protein binding to the Oct site. The competitor DNAs used for this experiment are shown in Table 2. These DNAs differed at most positions other than the Oct sequence itself. The results of the competition assay are shown in Fig. 8. The strong shift was competed away successfully only with competitor DNAs that contained an intact Oct sequence. By contrast, both the non-specific DNA and the "mutant" Oct66' DNA were incapable of competing away the strong shift, even at concentrations ten times greater than the highest concentration used in these experiments (data not shown).
[Figure 8: autoradiograph with lanes grouped by competitor: Oct1, Oct66, Oct66', Non-sp. and None.]
FIGURE 8. Gel retardation competition assay using Octamer-specific and non-specific oligonucleotides. Labeled DNA was the Hind III/Taq I (H/T) fragment in Fig. 7. Whole cell lysate was from post-commitment larval wing discs. Sequences of competitors are found in Table 2. Numbers under competitor labels are molar excess above labeled input DNA. Only the competed band has been shown here.
Oct binding activity is not tissue-specific

Whole cell lysates and nuclear extracts were prepared from a variety of tissues from different regions and stages to determine the distribution of Oct binding activity (Fig. 9). The distribution of the protein in wing tissue at various stages is shown in the first five lanes. The Oct binding protein was present in each one, but at a much lower level in the <24 h pupal wings. Pupal wing epidermis of this age is actively secreting cuticle proteins so at least some of the reduction in the level of the Oct protein may be due to a larger total protein pool within the cells or some protein contributed by remnants of adhering cuticle. No Oct shift could be detected in the only other epidermal sample to be surveyed (mid-fifth instar larval dorsal epidermis) even after extremely long exposure. Two separate extracts were prepared from this tissue to be certain of the negative results. We have subsequently learned that extracts from larval epidermis eliminate Oct-binding activity when mixed with active extracts (Gu and Willis, unpubl. observations). A strong shift was detected with the silk gland nuclear extract, but did not correspond to that found in wing tissue. Octamer binding activity has been shown in silk gland extracts of Bombyx mori (Hui et al., 1990) so this may be an analogous protein with a lower molecular weight. Very weak shifts (not visible in Fig. 9) comigrating with the high molecular weight Oct shift were detected in larval fat body nuclear extract and in extracts of pupal fat body and testes in autoradiographs exposed for long periods. These bands probably represent the presence of the same Oct-binding protein as in wing tissue, as competition assays with extracts of testes show sequence-specific competition (data not shown).

FIGURE 9. Tissue distribution of Oct-binding activity. For descriptions of tissues extracted see Materials and Methods. All gel retardations used whole cell extracts except silk gland and larval fat body which used crude nuclear extracts. Labeled DNA was the Hind III/Taq I (H/T) fragment [see Fig. 7(a)].
DISCUSSION

As part of a long term effort to understand how metamorphosis is regulated at the molecular level, we undertook the isolation of a cDNA and gene encoding Hyalophora cecropia cuticular protein 66 (HCCP66), a protein found in rigid larval and pupal cuticles (Cox and Willis, 1985). HCCP66 has been identified as a member of a multigene family of proteins from rigid cuticles (Willis, 1989). The cDNA isolated in this report coded for a protein similar in molecular weight and charge to HCCP66; the mRNA corresponding to the cDNA was expressed in both larval tubercles and pupal wings, a pattern of expression characteristic of HCCP66; and the conceptual translation matched the N-terminus of HCCP66 and not those of the two relatives for which sequence data are available (HCCP65 and HCCP91) (Willis, 1989).

Cuticular proteins from rigid and flexible cuticles share a conserved hydrophilic sequence indicating a common ancestry

Many cuticular proteins have been purified and sequenced and the amino acid sequences of others are known from the conceptual translation of their cDNAs (Nohr et al., 1992; Nakato et al., 1990; references for Fig. 10). Rebers and Riddiford (1988) aligned a number of cuticle proteins isolated from the flexible cuticles of several species and concluded that they probably belonged to a single protein family. Willis (1989) identified another family based on N-terminal amino acid data from cuticle proteins of H. cecropia. Still other cuticular proteins have been isolated which seem to bear little resemblance to these two groups (Nohr et al., 1992; Nakato et al., 1990).

[Figure 10: aligned amino acid sequences of TMCP20, TMCP22, HCCP66, EDG84, LMCP8, Abd-4, HCCP12, DMCP11, LCP14, LCP16/17, DMCP1, DMCP3, DMCPGA and EDG78, with an R&R consensus row.]
FIGURE 10. Alignment of cuticular proteins showing similarity in a hydrophilic region of each. Proteins from rigid or pupal cuticles are indicated in bold. TMCP20 and 22 = T. molitor Adult specific cuticle proteins 20 and 22 (Charles et al., 1992; Bouhin et al., 1992). HCCP12 and 66 = H. cecropia cuticle proteins 12 and 66 (Binger, 1991; this report). EDG78 and 84 = D. melanogaster ecdysone dependent gene products 78 and 84 (Apple and Fristrom, 1991). LMCP8 = L. migratoria cuticle protein 8 (Klarskov et al., 1989). Abd-4 = L. migratoria abdominal cuticle 4 (Talbo et al., 1991). LCP14 and 16/17 = M. sexta larval cuticle proteins 14 and 16/17 (Rebers and Riddiford, 1988; Horodyski and Riddiford, 1989). DMCP1 and 3 = D. melanogaster cuticle proteins 1 and 3 (Snyder et al., 1982). DMCP11 = D. melanogaster cuticle protein 11 (Silvert, 1985). DMCPGA = D. melanogaster cuticle protein from gart locus (Henikoff et al., 1986). R&R = Consensus residues derived for cuticle proteins from flexible cuticles (Rebers and Riddiford, 1988). * = Highly conserved residues not present in R&R consensus. Region underlined in EDG78 represents the region and alignment of all sequences used for PAUP [see Fig. 11(a) and text].
The comparison of the conceptual translation of HCCP66 with a subset of proteins from rigid or pupal cuticles shows these proteins to be similar to one another in a hydrophilic region. Interestingly, this region was not localized to any particular part of the proteins. For example, it occurred at, or near, the N-terminus of HCCP66 and EDG84, in the middle of LMCP8 and TMCP20, and near the C-terminus of TMCP22 (Figs 4 and 10). It is unclear what the function of such a motif might be. Cuticle proteins have the potential to interact with water, chitin, other cuticular proteins, or any of a number of postulated crosslinking agents so it is possible that this region interacts with one or more of these. We note, however, that a cuticular protein isolated from B. mori using a chitin affinity column lacks this motif entirely (Nakato et al., 1990). Searches of GenBank using the HCCP66 sequence not only identified cuticular proteins from rigid/pupal cuticles but some from flexible cuticles as well, albeit with a much lower degree of identity. We compared the conserved region of amino acids from rigid/pupal cuticles with a conserved region in flexible proteins identified by Rebers and Riddiford (1988) (Fig. 10), and were surprised to find a noticeable degree of similarity between the two, particularly in light of the fact that these two groups of proteins had been placed previously into two different families (Willis, 1989). The spacing of the first six amino acids in a consensus identified by Rebers and Riddiford was well conserved when minor gaps were introduced into the protein alignments. The seventh amino acid was frequently phenylalanine rather than tyrosine, a conservative substitution. The two prolines in the consensus were not conserved. The presence of this consensus in a cuticular protein from rigid cuticles of Tenebrio was also noted by Charles et al. (1992) and Bouhin et al. (1992).
With gapping it was possible to align other regions of apparent similarity in the proteins, but it is clear that the genes encoding these proteins have undergone numerous insertions and deletions. This is particularly evident from the termini of the proteins where it is impossible to find any similarity between all of the proteins. For example, LMCP8 contained N-terminal sequences that resembled those on its C-terminal end (Klarskov et al., 1989). TMCP22, HCCP12, and LCP16/17, on the other hand, completely lacked an extensive C-terminal portion beyond the consensus region, although both TMCP20 and TMCP22 had very long N-terminal regions rich in glycine. Where C-terminal sequences were present beyond the consensus sequence, they differed between the rigid and flexible groups. The rigid/pupal group contained a more periodic sequence of approximately regularly spaced prolines and no acidic residues. The flexible group, on the other hand, contained less regularly spaced prolines and acidic residues were present. The question arises as to whether the apparent sequence similarity of these two groups of proteins is the result of divergence from a shared ancestral sequence or convergence toward a common useful sequence motif.
To answer this question, we performed a phylogenetic analysis of the proteins based on the consensus region using the computer program PAUP [Swofford, 1990; Fig. 11(a)]. Based on a bootstrap analysis of 100 replications only two nodes (numbered in figure) could be supported by the data with confidence. The first of these is the node joining DMCP1 and DMCP3. The genes encoding these proteins have been postulated to have arisen as the result of a recent duplication event (Snyder et al., 1982) and so would be expected to be very similar on the sequence level. More interesting was the node joining a group of proteins all of which were from flexible cuticles to another all of which contained members from rigid/pupal cuticles. The node joining these groups was supported 100% of the time in 100 heuristic replications. The branching pattern within groups was much less robust, with most nodes being supported less than half the time by the data. The flexible/rigid grouping implies that these cuticle proteins are encoded by genes descended from a single common gene that underwent a duplication event no later than the evolution of the locusts (or the orthopteroids), the oldest taxonomic group in this analysis, over 300 million years ago. Assignment of related groups of proteins to families or superfamilies is a surprisingly arbitrary affair (Gonzalez, 1989). It is now clear that many proteins have more than one evolutionary origin generated by mechanisms such as exon shuffling (Doolittle, 1981). Thus, a protein can easily belong to more than one family. Some cuticle proteins may in fact have a multiple origin. Hojrup et al. (1986) noted the similarity between the repeated structure spanning the entire length of LMCP38 and the N- and C-termini of LMCP8. Apple and Fristrom (1991) recognized the similarity between EDG91 and short N- and C-terminal sequences of LMCP38 as well as oothecal and chorion proteins.
We have shown the similarity of the hydrophilic portion of LMCP8 to numerous proteins from flexible cuticles. Doolittle (1981) suggested a system of grouping proteins that takes arbitrariness and multiple origins into account. Based on his methods, we suggest that the best preliminary grouping for insect cuticle proteins would consist of a Venn diagram of overlapping circles [Fig. 11(b)]. All cuticle proteins sharing the hydrophilic consensus were placed into a single group and divided into two subgroups, one for proteins from rigid cuticles and one for proteins from flexible ones based on the parsimony analysis [Fig. 11(a)]. LMCP38 and NCP55 are composed of repeated hydrophobic residues punctuated by prolines. EDG91 is nearly uniformly hydrophobic and contains a high percentage of glycine. In this respect it is similar to oothecal proteins and regions of chorion proteins whose similarity to each other was noted previously (Regier and Kafatos, 1985). BMCP90, despite its unusually high percentage of tryptophan, does contain a repeating motif similar to that of the proline repeat family. Only one cuticular protein (NCP62) remains an "orphan". As more sequences become available the validity of these groupings can be more fully tested.
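The bootstrap support values quoted for the parsimony analysis come from resampling alignment columns with replacement and asking how often a grouping reappears. The sketch below illustrates only that resampling idea and is not the PAUP procedure: a simple distance criterion stands in for tree building (a group counts as recovered when every pair inside it is strictly closer than any inside-outside pair), and the four short sequences and their names are invented for illustration.

```python
import random


def resample_columns(alignment, rng):
    """One bootstrap pseudo-replicate: draw columns with replacement."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}


def differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def group_recovered(alignment, group):
    """Stand-in for 'the group appears as a clade' (an assumption here)."""
    inside = [name for name in alignment if name in group]
    outside = [name for name in alignment if name not in group]
    within = [differences(alignment[a], alignment[b])
              for i, a in enumerate(inside) for b in inside[i + 1:]]
    between = [differences(alignment[a], alignment[b])
               for a in inside for b in outside]
    return max(within, default=0) < min(between, default=0)


def bootstrap_support(alignment, group, replicates=100, seed=0):
    """Fraction of pseudo-replicates in which the group is recovered."""
    rng = random.Random(seed)
    hits = sum(group_recovered(resample_columns(alignment, rng), group)
               for _ in range(replicates))
    return hits / replicates


# Invented toy alignment (equal-length rows), not data from Fig. 10.
TOY = {
    "rigid_a": "KDGDVVKGSY",
    "rigid_b": "KDGDVVQGSY",
    "flex_a": "PEGNYNFETS",
    "flex_b": "PEGSYQFETS",
}

support = bootstrap_support(TOY, {"rigid_a", "rigid_b"})
print(f"support for the rigid pair: {support:.2f}")
```

A real analysis would rebuild a tree for every pseudo-replicate; the distance test is used here only to keep the example short.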
[Figure 11(a): tree with terminal taxa DMCP1, DMCP3, DMCP11, DMCPGA, EDG78, LCP14, LCP16/17, Abd-4, HCCP12, HCCP66, EDG84, TMCP22, TMCP20 and LMCP8; one node is labeled (100).]
[Figure 11(b): Venn diagram. Hydrophilic group, Flexible subgroup: HCCP12, LCP14, LCP16/17, DMCP1, DMCP3, DMCPGA, Abd-4, EDG78. Hydrophilic group, Rigid subgroup: HCCP66, EDG84, TMCP22, LMCP8, TMCP20. Proline Repeat group: BMCP90, LMCP38, NCP55. Also shown: EDG91, Chorion?, Oothecal?]
FIGURE 11. Phylogenetic analysis of cuticle proteins sharing a conserved hydrophilic sequence and potential groupings of related cuticle proteins. (a) The phylogenetic analysis of cuticle protein sequences was carried out with PAUP, version 3 (Swofford, 1990) using the sequences underlined in Fig. 10. A heuristic search with the bootstrap option (100 replications) was performed incorporating TBR branch swapping and midpoint rooting. This tree is one of 69 equally parsimonious trees. The labeled nodes are confidence limits of those exceeding 50. The abbreviations of the cuticle proteins follow Fig. 10. (b) A Venn diagram showing preliminary proposed relationships of insect structural proteins. NCP55 and NCP62 = L. migratoria nymphal cuticle proteins 55 and 62 (Nohr et al., 1992). BMCP90 = B. mori cuticle protein 90 (Nakato et al., 1990). Other cuticle protein abbreviations are those defined in Fig. 10. Only the group with the hydrophilic region has been analyzed by PAUP.

The HCCP66 gene
Using the cDNA as a probe we isolated a gene encoding HCCP66. As in all cuticular protein genes and chorion protein genes isolated to date, an intron interrupts the coding region of the signal peptide. Cuticular protein genes from flexible cuticles of Manduca sexta (Rebers and Riddiford, 1988; Horodyski and Riddiford, 1989) and H. cecropia (Binger, 1991) have a second intron interrupting the coding sequence of the mature peptide. The gene for HCCP66, like the genes for all Drosophila and Bombyx cuticular proteins so far isolated, lacks this second intron (Snyder et al., 1982; Henikoff et al., 1986; Apple and Fristrom, 1991; Nakato et al., 1992).
The HCCP66 gene contains copies of a SINE-like transposable element

The HCCP66 gene contained an intron of over 3 kb. Dot plot analyses of the intron showed it to contain a tandemly duplicated sequence totaling about 1.5 kb. The monomer of this sequence, first described as an insertion element (IE) in the basic attacin gene (Sun et al., 1991), accounts for up to 1% of the Cecropia genome (Lidholm, 1991). It is short (500 bp), contains a series of TCTG tandem repeats, and has short (4 bp) direct terminal repeats. Lidholm (pers. commun.) suggested the Cecropia IEs might be similar to SINEs (short interspersed repeated DNA elements) due to their size and short direct repeat termini. In support of this is our finding that nearly one third (8/30) of the matches in GenBank to the basic attacin IE were to flanking DNA of mitochondrial tRNA genes. Most SINEs contain the bipartite tRNA promoter in some form (Deininger, 1989). We searched the reverse complement of the basic attacin IE for a match to the consensus of both the A (TRGCNNAGYNGG) and B (GTTCGANTCC) tRNA gene promoter boxes (Geiduschek and Tocchini-Valentini, 1988). Three related sequences were identified at the 5' end of the IE (data not shown), their normal location in SINEs (Deininger, 1989).

Search for putative regulatory elements shared among cuticle protein genes

Certain characteristics of the expression of cuticular protein genes such as epidermal restriction and hormone responsiveness are known to be conserved (Apple and Fristrom, 1991; Hiruma et al., 1991; Nakato et al., 1992). It thus seemed likely that the elements controlling cuticular gene expression would also be conserved. Indeed, there are strong precedents on which to base this presumption.
Correct spatial and temporal expression of chorion genes of Bombyx mori introduced as transgenes into the Drosophila genome has been reported (Mitsialis and Kafatos, 1985), demonstrating that elements controlling chorion gene expression are conserved between these two distantly related species. Similar experiments have shown that silk genes of B. mori can be expressed in Drosophila salivary glands, even though Drosophila never makes silk (Bello and Couble, 1990). Moreover, 5' sequence comparisons of Bombyx and Drosophila chorion gene sequences demonstrated the presence of an identical hexanucleotide sequence in a similar position in both genes that, when mutated, eliminated proper expression (Wong et al., 1985; Spoerel et al., 1986; Mitsialis et al., 1987). Thus, by comparing cuticle protein gene sequences from diverse species, we hoped to identify, at least tentatively, conserved cis-acting sequences. Interestingly, we found no universally conserved 5' sequence in the cuticular protein genes we examined. This may indicate that cuticle protein gene expression relies on cis-acting elements positioned much farther 5' than the region we searched, or even in the introns or 3' flanking DNA.
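The search of the insertion element for tRNA promoter boxes described in the preceding section amounts to scanning the reverse complement of a sequence against degenerate (IUPAC) consensus strings. A minimal Python sketch of such a scan follows. It assumes a sliding window that tolerates a small number of mismatches; the function names, the mismatch threshold, and the toy sequence are hypothetical illustrations and are not the procedure or data used in this study. Only the A and B box consensus strings are taken from the text.

```python
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Consensus strings quoted in the text for the tRNA gene promoter boxes.
A_BOX = "TRGCNNAGYNGG"
B_BOX = "GTTCGANTCC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def scan(seq, consensus, max_mismatches=1):
    """Slide the consensus along seq and return (start, window, mismatches).

    start is a 0-based offset; max_mismatches is an assumed threshold.
    """
    seq = seq.upper()
    width = len(consensus)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Count positions where the base is not allowed by the IUPAC code.
        mismatches = sum(
            base not in IUPAC[code] for base, code in zip(window, consensus)
        )
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Hypothetical toy element (NOT the real insertion element sequence).
toy_element = reverse_complement(
    "AAAA" + "TGGCTCAGTTGG" + "CCCC" + "GTTCAAATCC" + "AAAA"
)
searched = reverse_complement(toy_element)  # the strand that is scanned

for name, box in (("A box", A_BOX), ("B box", B_BOX)):
    for start, window, mismatches in scan(searched, box):
        print(name, start, window, mismatches)
```

Mismatches are tolerated here because the text reports sequences "related" to the consensus rather than exact matches; setting `max_mismatches=0` restricts the scan to exact consensus matches.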
POU domain proteins may be involved in cuticle protein gene expression

Two sequence elements identified by matches to known cis-acting sequences of other genes were of particular interest. A sequence similar to a Drosophila ecdysone response element consensus sequence was present just 3' to the transcription start site. Several cuticle protein genes are reported to contain sequences that might mediate their response to ecdysteroids, but no sequences have been demonstrated as being functional (Apple and Fristrom, 1991; Binger, 1991). The +20 EcRE-like sequence we identified failed to bind proteins in whole cell lysates shown to contain proteins that bind ecdysteroids. Our gel retardation reactions, however, contained no hormone, and it is possible that the presence of ligand is necessary for the receptors to bind this sequence because many steroid receptors function only in the presence of ligand (Gronemeyer, 1991).

Another 5' sequence element did bind protein in a gel retardation assay. This was the Octamer sequence present between -81 and -74. Octamer binding proteins are members of the POU (= Pit-Oct-unc) domain family of DNA binding proteins, which are important regulators of tissue-specific gene expression in several systems (Kemler and Schaffner, 1990; Ruvkun and Finney, 1991; Schreiber et al., 1989). Recently, a Pit-like sequence was noted in the promoter of a cuticle protein gene from B. mori, and Oct sequences have been found in the tissue-specific fibroin gene promoter of this same insect (Nakato et al., 1992; Hui et al., 1990).

We find no simple explanation for the distribution of binding activity to the Oct sequence in the HCCP66 gene. It was present in wing tissues that were secreting HCCP66 but also was present in precommitment wing imaginal discs and in diapausing wings that contain no HCCP66 message. It was found, albeit in very low concentrations, in testes and fat body.
Strong binding activity, but to a factor of a different molecular size, was present in salivary gland nuclei. Obviously, Oct binding activity cannot be the sole regulator of HCCP66 expression. Other Oct binding proteins are known that require the presence of additional factors to activate transcription (Ruvkun and Finney, 1991). Particular sets of factors, not any individual one, may be the most important element in determining how and when a gene will be expressed (Latchman, 1990).

REFERENCES
Apple R. T. and Fristrom J. W. (1991) 20-hydroxyecdysone is required for, and negatively regulates, transcription of Drosophila pupal cuticle protein genes. Dev. Biol. 146, 569-582.
Auffray C. and Rougeon F. (1980) Purification of mouse immunoglobulin heavy chain messenger RNAs from total myeloma tumor RNA. Eur. J. Biochem. 107, 303-324.
Bello B. and Couble P. (1990) Specific expression of a silk-encoding gene of Bombyx in the anterior salivary gland of Drosophila. Nature 346, 480-482.
Binger L. C. (1991) Molecular analysis of the cDNA and gene for a major protein from flexible cuticles of the giant silkmoth, Hyalophora cecropia. Ph.D. dissertation. University of Illinois, Urbana, Ill.
Bouhin H., Charles J.-P., Quennedey B., Courrent A. and Delachambre J. (1992) Characterization of a cDNA clone encoding a glycine-rich cuticular protein of Tenebrio molitor: developmental expression and effect of a juvenile hormone analogue. Insect Molec. Biol. 1, 53-62.
Charles J. P., Bouhin H., Quennedey B., Courrent A. and Delachambre J. (1992) cDNA cloning and deduced amino acid sequence of a major glycine-rich cuticular protein from the coleopteran Tenebrio molitor: temporal and spatial distribution of the transcript during metamorphosis. Eur. J. Biochem. 206, 813-819.
Cherbas L., Lee K. and Cherbas P. (1991) Identification of ecdysone response elements by analysis of the Drosophila Eip28/29 gene. Genes Dev. 5, 120-131.
Cherbas L., Shultz R. A., Koehler M. M. D., Savakis C. and Cherbas P. (1986) Structure of the Eip28/29 gene, an ecdysone-inducible gene from Drosophila. J. Molec. Biol. 189, 617-631.
Cherbas P., Cherbas L., Lee S. S. and Nakanishi K. (1988) 26-[125I]Iodoponasterone is a potent ecdysone and a sensitive radioligand for ecdysone receptors. Proc. natn. Acad. Sci. U.S.A. 85, 2095-2100.
Cox D. C. and Willis J. H. (1985) The cuticular proteins of Hyalophora cecropia from different anatomical regions and metamorphic stages. Insect Biochem. 15, 349-362.
Deininger P. L. (1989) SINEs, short interspersed repeated DNA elements in higher eukaryotes. In Mobile DNA (Edited by Berg D. E. and Howe M. M.). American Society for Microbiology, Washington, D.C.
Doolittle R. F. (1981) Similar amino acid sequences: chance or common ancestry? Science 214, 149-159.
Fawcett T. W. and Bartlett S. G. (1990) An effective method for eliminating "artifact banding" when sequencing double-stranded DNA templates. Biotechniques 9, 46-48.
Frohman M. A. (1990) RACE, rapid amplification of cDNA ends. In PCR Protocols, a Guide to Methods and Applications (Edited by Innis M. I., Gelfand D. H., Sninsky J. J. and White T. J.), pp. 28-38. Academic Press, Calif.
Geiduschek E. P. and Tocchini-Valentini G. P. (1988) Transcription by RNA polymerase III. Ann. rev. Biochem. 57, 873-914.
Gonzalez F. J. (1989) The molecular biology of cytochrome P450s. Pharmac. Rev. 40, 243~76.
Gronemeyer H. (1991) Transcription activation by estrogen and progesterone receptors. Ann. Rev. Genet. 25, 89-123.
von Heijne G. (1984) How signal sequences maintain cleavage specificity. J. Molec. Biol. 173, 243-251.
Henikoff S. (1984) Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28, 351-359.
Henikoff S., Keene M. A., Fechtel K. and Fristrom J. W. (1986) Gene within a gene, nested Drosophila genes encode unrelated proteins on opposite DNA strands. Cell 44, 33-42.
Hiruma K., Hardie J. and Riddiford L. M. (1991) Hormonal regulation of epidermal metamorphosis in vitro, control of expression of a larval-specific cuticle gene. Dev. Biol. 144, 369-378.
Hojrup P., Andersen S. O. and Roepstorff P. (1986) Primary structure of a structural protein from the cuticle of the migratory locust, Locusta migratoria. Biochem. J. 236, 713-720.
Horodyski F. M. and Riddiford L. M. (1989) Characterization and expression of an ecdysone inducible larval cuticular protein gene of the tobacco hornworm Manduca sexta. Dev. Biol. 132, 292-303.
Hui C., Matsuno K. and Suzuki Y. (1990) Fibroin gene promoter contains a cluster of homeodomain binding sites that interact with three silk gland factors. J. molec. Biol. 213, 651-670.
Hultmark D., Klemenz R. and Gehring W. J. (1986) Translational and transcriptional control elements in the untranslated leader of the heat-shock gene hsp22. Cell 44, 429-438.
Kawasaki E. (1989) Amplification of RNA sequences via complementary DNA (cDNA). Amplifications 3, 4-6.
Kemler I. and Schaffner W. (1990) Octamer transcription factors and the cell type-specificity of immunoglobulin gene expression. FASEB J. 4, 1444-1449.
Klarskov K., Hojrup P., Andersen S. O. and Roepstorff P. (1989) Plasma-desorption mass spectrometry as an aid in protein sequence determination. Biochem. J. 262, 923-930.
Koelle M. R., Talbot W. S., Segraves W. A., Bender M. T., Cherbas P. and Hogness D. S. (1991) The Drosophila EcR gene encodes an ecdysone receptor, a new member of the steroid receptor superfamily. Cell 67, 59-77.
Kyte J. and Doolittle R. F. (1982) A simple method for displaying the hydropathic character of a protein. J. Molec. Biol. 157, 105-132.
Latchman D. (1990) Gene Regulation, a Eukaryotic Perspective. Unwin Hyman, London.
Lidholm D.-A. (1991) The cecropin locus in Hyalophora cecropia, structure and function of three genes for antibacterial peptides. Ph.D. dissertation. University of Stockholm, Stockholm.
Manley J. L., Fire A., Samuels M. and Sharp P. A. (1983) In vitro transcription, whole-cell extract. Meth. Enzymol. 101, 568-583.
Mark C. (1988) "DNA Strider", a "C" program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers. Nucl. acids Res. 16, 1829-1836.
Matsudaira P. (1989) N-terminal sequence from proteins purified by SDS-polyacrylamide gel electrophoresis. In Protein Sequence from Microquantities of Proteins and Peptides, pp. 27-31. Workshop Handbook for ASCB/ASBMB Special Interest Subgroup Meeting, 29 January 1989.
McKnight S. L., Kingsbury R. C., Spence A. and Smith M. (1984) The distal transcription signals of the herpesvirus tk gene share a common hexanucleotide control sequence. Cell 37, 253-262.
Miller K. (1987) Gel electrophoresis of RNA. Focus 9, 14-15.
Mitsialis S. A. and Kafatos F. C. (1985) Regulatory elements controlling chorion gene expression are conserved between flies and moths. Nature 317, 453-456.
Mitsialis S. A., Spoerel N., Leviten M. L. and Kafatos F. C. (1987) A short 5'-flanking DNA region is sufficient for developmentally correct expression of moth chorion genes in Drosophila. Proc. natn. Acad. Sci. U.S.A. 84, 7987-7991.
Nakato H., Izumi S. and Tomino S. (1992) Structure and expression of gene coding for a pupal cuticle protein of Bombyx mori. Biochim. Biophys. Acta 1132, 161-167.
Nakato H., Toriyama M., Izumi S. and Tomino S. (1990) Structure and expression of mRNA for a pupal cuticle protein of the silkworm, Bombyx mori. Insect Biochem. 20, 667-678.
Neville A. C. (1975) The Biology of the Arthropod Cuticle. Springer, Berlin.
Nijhout H. F. and Wheeler D. E. (1982) Juvenile hormone and the physiological basis of insect polymorphisms. Q. rev. Biol. 57, 109-133.
Nøhr C., Hojrup P. and Andersen S. O. (1992) Primary structure of two low molecular weight proteins isolated from cuticle of fifth instar nymphs of the migratory locust, Locusta migratoria. Insect Biochem. Molec. Biol. 22, 19-24.
Padgett R. A., Grabowski P. J., Konarska M. M., Seiler S. and Sharp P. A. (1986) Splicing of messenger RNA precursors. Ann. rev. Biochem. 55, 1119-1150.
Pearson W. R. and Lipman D. J. (1988) Improved tools for biological sequence comparison. Proc. natn. Acad. Sci. U.S.A. 85, 2444-2448.
Qian L. and Wilkinson M. (1991) DNA fragment purification, removal of agarose 10 minutes after electrophoresis. Biotechniques 10, 736-737.
Rebers J. E. and Riddiford L. M. (1988) Structure and evolution of a Manduca sexta larva cuticle gene homologous to Drosophila cuticle genes. J. Molec. Biol. 203, 411-423.
Regier J. C. and Kafatos F. C. (1985) Molecular aspects of chorion formation. In Comprehensive Insect Physiology, Biochemistry, and Pharmacology (Edited by Kerkut G. A. and Gilbert L. I.). Pergamon Press, Oxford.
Riddiford L. M. (1985) Hormone action at the cellular level. In Comprehensive Insect Physiology, Biochemistry and Pharmacology (Edited by Kerkut G. A. and Gilbert L. I.). Pergamon Press, Oxford.
Ruvkun G. and Finney M. (1991) Regulation of transcription and cell identity by POU domain proteins. Cell 64, 475-478.
Sambrook J., Fritsch E. F. and Maniatis T. (1989) Molecular Cloning, A Laboratory Manual, 2nd Edn. Cold Spring Harbor Laboratory, New York.
Sanger F., Nicklen S. and Coulson A. R. (1977) DNA sequencing with chain-terminating inhibitors. Proc. natn. Acad. Sci. U.S.A. 74, 5463-5467.
Schneiderman H. and Williams C. M. (1954) Physiology of insect diapause, IX. The cytochrome oxidase system in relation to the diapause and development of the Cecropia silkworm. Biol. Bull. 106, 238-252.
Schreiber E., Muller M. M., Schaffner W. and Matthias P. (1989) Octamer transcription factors mediate B-cell specific expression of immunoglobulin heavy chain genes. In Tissue Specific Gene Expression (Edited by Renkawitz R.). VCH Verlagsgesellschaft, Weinheim.
Silvert D. J. (1985) Cuticular proteins during postembryonic development. In Comprehensive Insect Physiology, Biochemistry and Pharmacology (Edited by Kerkut G. A. and Gilbert L. I.). Pergamon Press, Oxford.
Snyder M., Hunkapiller M., Yuen D., Silvert D., Fristrom J. and Davidson N. (1982) Cuticle protein genes in Drosophila, structure, organization and evolution of four clustered genes. Cell 29, 1027-1040.
Southern E. (1975) Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Molec. Biol. 98, 503-517.
Spoerel N., Nguyen H. T. and Kafatos F. C. (1986) Gene regulation and evolution in the chorion locus of Bombyx mori. J. Molec. Biol. 190, 23-35.
Stone K. (1989) Enzymatic digestion of proteins and HPLC peptide isolation. In Protein Sequence from Microquantities of Proteins and Peptides, pp. 17-25. Workshop Handbook for ASCB/ASBMB Special Interest Subgroup Meeting, 29 January 1989.
Sun S.-C., Lindstrom I., Lee J.-Y. and Faye I. (1991) Structure and expression of the attacin genes in Hyalophora cecropia. Eur. J. Biochem. 196, 247-254.
Swofford D. L. (1990) PAUP, Phylogenetic Analysis Using Parsimony, Version 3. Computer program distributed by the Illinois Natural History Survey, Champaign, Ill.
Tabor S. and Struhl K. (1989) Enzymes for modifying and radioactively labeling nucleic acids. In Current Protocols in Molecular Biology (Edited by Ausubel F. M., Brent R., Kingston R. E., Moore D. D., Seidman J. G., Smith J. A. and Struhl K.). Wiley, New York.
Talbo G., Højrup P., Rahbek-Nielsen H., Andersen S. O. and Roepstorff P. (1991) Determination of the covalent structure of an N- and C-terminally blocked glycoprotein from endocuticle of Locusta migratoria. Eur. J. Biochem. 195, 495-504.
Wigglesworth V. B. (1959) The Control of Growth and Form. Cornell University Press, New York.
Willis J. H. (1989) Partial amino acid sequences of cuticular proteins from Hyalophora cecropia. Insect Biochem. 19, 41-46.
Willis J. H. (1990) Regulating genes for metamorphosis, concepts and results. In Molecular Insect Science (Edited by Hagedorn H. H., Hildebrand J. G., Kidwell M. G. and Law J. H.). Plenum Press, New York.
Wong Y.-C., Pustell J., Spoerel N. and Kafatos F. C. (1985) Coding and potential regulatory sequences of a cluster of chorion genes in Drosophila melanogaster. Chromosoma 92, 124-135.
Acknowledgements--We acknowledge the technical assistance of T. Westfall in the construction and screening of the genomic library and thank Dr L. Cherbas for performing the ecdysteroid binding activity assay on the whole cell extracts. We are grateful to Dr H. Robertson for much advice concerning all aspects of PCR. We thank Drs J. Doctor, K. Fechtel, D. Osterbur, and G. Robinson for critically reading the manuscript. This paper is based upon a thesis submitted for the doctoral degree at the University of Illinois at Urbana-Champaign. This work was supported by grant GM44511 from the National Institutes of Health.