Tandemly duplicated Caenorhabditis elegans collagen genes differ in their modes of splicing


J. Mol. Biol. (1990) 211, 395-406

Yang-Seo Park and James M. Kramer†

University of Illinois, Department of Biological Sciences, Laboratory for Cell, Molecular and Developmental Biology, P.O. Box 4348, Chicago, IL 60680, U.S.A.

(Received 18 May 1989, and in revised form 29 August 1989)

Caenorhabditis elegans contains 50 to 150 collagen genes dispersed throughout its genome. We have determined the complete nucleotide sequences of two collagen genes, col-12 and col-13, that are separated by only 1800 bases and are transcribed in the same direction. The 951 nucleotides of their coding regions differ by only five nucleotides (99.5% identity). The amino acid sequences are identical except for two conservative amino acid changes within the putative secretory signal sequences, so the mature forms of the col-12 and col-13 collagens would be identical. The position and sequence of the intron (52 base-pairs) within the coding region of each gene are perfectly conserved. In contrast to the coding regions and the introns, the 5' and 3' flanking regions show little sequence similarity. col-12 and col-13 are expressed at similar levels at the same developmental stages, and appear to utilize conserved TATA boxes and transcription start sites. The major difference between the genes is that, preceding the initiator ATG, col-12 has a cis-spliced intron, while col-13 is trans-spliced. Thus, col-12 and col-13 are essentially identical in all aspects except that the col-12 mRNA has a 26-nucleotide cis-spliced leader at the same place where the col-13 mRNA has a 22-nucleotide trans-spliced leader. These results suggest that col-12 and col-13 are derived from a gene duplication and that sequence homology in the coding regions, but not in the flanking regions, has been maintained by gene conversion. The fact that the only significant difference between the two genes is in their modes of splicing suggests that cis- and trans-splicing can be interchanged during gene evolution.
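The identity values quoted for the coding regions can be checked with simple arithmetic. The following Python sketch is illustrative only: the function name `percent_identity` is hypothetical, and the count of 316 amino acid residues is an assumption derived here from the 951-nucleotide coding region (317 codons, one of which is taken to be the termination codon).

```python
def percent_identity(aligned_length: int, differences: int) -> float:
    """Percent identity of two gap-free aligned sequences of equal length."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= differences <= aligned_length:
        raise ValueError("differences must lie between 0 and aligned_length")
    return 100.0 * (aligned_length - differences) / aligned_length


CODING_NT = 951                   # coding-region length given in the text
NT_DIFFERENCES = 5                # nucleotide differences given in the text
AA_RESIDUES = CODING_NT // 3 - 1  # assumption: 317 codons minus the stop codon
AA_DIFFERENCES = 2                # amino acid differences given in the text

nt_identity = percent_identity(CODING_NT, NT_DIFFERENCES)
aa_identity = percent_identity(AA_RESIDUES, AA_DIFFERENCES)

print(f"nucleotide identity: {nt_identity:.1f}%")
print(f"amino acid identity: {aa_identity:.1f}%")
```

Under these assumptions the sketch gives 99.5% identity at the nucleotide level and 99.4% at the amino acid level.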

† Author to whom correspondence should be addressed.

0022-2836/90/020395-12 $03.00/0 © 1990 Academic Press Limited

1. Introduction

The cuticle of Caenorhabditis elegans is a complex, multi-layered extracellular structure that has important functions in morphogenesis, motility, and interaction with the external environment. The cuticle is primarily composed of collagens (Cox et al., 1981) that are encoded by a large family of between 50 and 150 collagen genes (Kramer et al., 1982; Cox et al., 1984). Unlike most known large multigene families, such as the silkmoth chorion genes (Eickbush & Kafatos, 1982) and the mouse transplantation antigen genes (Steinmetz et al., 1982), most members of the C. elegans collagen gene family are dispersed throughout the genome (Cox et al., 1984, 1985). The nucleotide sequences of 13 of these collagen genes have been determined (Kramer et al., 1982, 1988; von Mende et al., 1988; Cox et al., 1989; this report; J. Kramer, R. French, J. Johnson & A. Levy, unpublished results). Although these collagen genes encode polypeptides that share a common domain structure, the levels of amino acid sequence identity between different collagens range from less than 50% to as high as 99.4%. This large, diverse family of collagens is probably required to construct the five cuticles that are produced during the normal C. elegans life cycle, as well as the specialized dauer larva cuticle. Each of these six stage-specific cuticles can be shown to be distinct by structural, biochemical and/or genetic criteria (Cox et al., 1980, 1981). Different members of the collagen gene family display different temporal and quantitative patterns of expression during C. elegans development (Cox & Hirsh, 1985; Kramer et al., 1985), presumably accounting for the stage-specific differences in cuticle structure and composition. Mutations in several of the collagen genes, sqt-1 (Kramer et al., 1988), dpy-13 (von Mende et al., 1988), rol-6 (J. Kramer, R. French, J. Johnson & E.-C. Park, unpublished results) and dpy-10 (A. Levy & J. Kramer, unpublished results), have been shown to dramatically alter the morphology and motility of C. elegans. In the sqt-1 and rol-6 collagens, single amino acid replacements have been shown to be the cause of severe morphological alterations (J. Kramer & J. Johnson, unpublished results), underscoring the importance of the collagen sequences in producing a normal, orderly cuticle structure.

We are interested in knowing how members of the collagen gene family have evolved to be able to produce the distinct stage-specific cuticle structures. Presumably, gene duplication and divergence have given rise to collagens with different primary sequences and/or different temporal and quantitative expression patterns. We have chosen to examine two closely apposed collagen genes, col-12 and col-13, which are likely to be products of a gene duplication, to determine in what aspects they have begun to diverge from each other.

In our characterization of col-12 and col-13 we found that these two genes differ in their modes of splicing. Nematodes, including C. elegans, undergo trans-splicing as well as cis-splicing (for a review, see Blumenthal & Thomas, 1988). About 10% of C. elegans mRNAs are estimated to receive a 22-nucleotide spliced leader at their 5' ends by a trans-splicing mechanism (Bektesh et al., 1988). Krause & Hirsh (1987) first identified the 22-nucleotide trans-spliced leader in C. elegans as a 5'-terminal sequence shared by mRNAs derived from three different actin genes. The trans-spliced leader is derived from a 100-nucleotide poly(A)⁻ RNA, known as the spliced leader (SL) RNA. The gene encoding the SL RNA is located within a 1 kb tandem repeat that includes the 5 S rRNA gene (Krause & Hirsh, 1987; Nelson & Honda, 1985). The SL RNA has several characteristics indicating that it acts as a snRNA and forms part of a snRNP (Bruzik et al., 1988; Thomas et al., 1988; Van Doren & Hirsh, 1988).
Since the SL RNA donates the 5' exon and probably functions as an snRNA in the trans-splicing reaction, it has been proposed that it is an evolutionary intermediate between self-splicing RNAs and the snRNAs involved in cis-splicing. Trans-splicing has also been shown to occur in trypanosomes (Parsons et al., 1984; Sutton & Boothroyd, 1986). In contrast to the nematode C. elegans, all mRNAs in trypanosomes receive a 35-nucleotide trans-spliced leader at their 5' termini (Walder et al., 1986), and no cis-splicing occurs. It is not known what effects, if any, trans-splicing has on gene expression.

In this paper we present the sequences and expression characteristics of two collagen genes, col-12 and col-13. We chose to study col-12 and col-13 because these two genes are separated by only 1.8 kb, unlike most other collagen genes that are dispersed in the genome. Sequence comparisons of the genes indicate that their coding regions are nearly identical, while their 5' and 3' flanking regions are highly divergent. The expression patterns of the genes appear to be identical. The most significant detectable difference between the genes is that just preceding their initiator codons the col-13 transcript is trans-spliced, while the col-12 transcript is cis-spliced.

Abbreviations used: SL, spliced leader; kb, 10³ bases or base-pair(s); bp, base-pair(s); snRNA, small nuclear RNA; snRNP, small nuclear ribonucleoprotein.

2. Materials and Methods

(a) Nematode strains

The wild-type N2 strain (var. Bristol) of C. elegans was used for these studies. Maintenance and handling of C. elegans strains were as described by Brenner (1974).

(b) DNA isolation and analysis

col-12 and col-13 were previously identified by hybridizing C. elegans genomic phage libraries with clones of the C. elegans collagen genes col-1 or col-2 (Cox et al., 1984). C. elegans genomic DNA was isolated as described by Kramer et al. (1988). For genomic Southern hybridizations, genomic DNA was digested with EcoRI or HindIII, size fractionated on 0.7% (w/v) agarose gels, transferred to nitrocellulose filters, hybridized with nick-translated plasmid DNA containing col-12 or col-13, and washed under low or high stringency conditions. Hybridization was performed at 37°C for 16 to 24 h in 3 × SET, 50% formamide, 0.1 M sodium phosphate (pH 7.0), 0.1% SDS, 0.05 mg heparin/ml, and 0.01 mg denatured salmon sperm DNA/ml (20 × SET is 3 M-NaCl, 1 M-Tris, 0.02 M-EDTA, pH 7.8). Low stringency wash conditions were 50°C in 0.2 × SET, 0.1% SDS, and high stringency wash conditions were 65°C in 20% formamide, 0.2 × SET, 0.1% SDS. The tm for overall C. elegans collagen genes was calculated as described by Meinkoth & Wahl (1984), assuming that the G+C contents of most collagen genes are similar to those of col-1 and col-2, 59% (Kramer et al., 1982).

(c) DNA sequence analysis

Restriction fragments containing portions of col-12 or col-13 were subcloned into M13 phage vectors (Yanisch-Perron et al., 1985) or the Bluescribe vector (Stratagene) and sequenced by the dideoxy chain termination method (Sanger et al., 1977). DNA sequences were analyzed using the HIBIO DNASIS programs (Hitachi) and the Intelligenetics GENALIGN program.

(d) RNA isolation and analysis

RNA was prepared from a synchronous population of young adult worms grown in liquid culture (Sulston & Brenner, 1974), using a LiCl precipitation procedure (Cathala et al., 1983). RNAs from other developmental stages were previously prepared by Kramer et al. (1985). Total RNA (10 µg) from each stage was electrophoresed in a 1.2% (w/v) agarose/formaldehyde gel as described by Lehrach et al. (1977) and transferred to a nitrocellulose filter using 20 × SSC (3 M-NaCl, 0.3 M-sodium citrate, pH 7.0). For a col-12-specific probe, a 97 bp RsaI-DdeI fragment including 18 bp of the coding region and 79 bp of the 3' untranslated region was isolated from a 5% (w/v) polyacrylamide gel and labeled by random primer elongation (Hodgson & Fisk, 1987). A col-13-specific probe was prepared, as described above, from a 72 bp RsaI-HinfI fragment including 18 bp of the coding region and 54 bp of the 3' untranslated region. Filters were hybridized with these probes using the conditions described for Southern blots and were then washed at 50°C in 2 × SET and 0.1% SDS.

(e) Nuclease S1 mapping

Nuclease S1 mapping was performed as described by Kramer et al. (1985) with minor modifications. A 1.2 kb SalI-EcoRI fragment from col-12 and a 430 bp RsaI-EcoRI fragment from col-13 were isolated from 3.5% (w/v) polyacrylamide gels, end-labeled with T4 polynucleotide kinase, and electrophoresed through strand-separating gels (Maxam & Gilbert, 1980). The strand-separated fragments were annealed with 50 µg of young adult total RNA at 30°C in 80% formamide, 40 mM-Pipes (pH 6.4), 0.4 M-NaCl, 1 mM-EDTA for 14 to 16 h. The samples were diluted into cold S1 nuclease buffer, digested with 500 units S1 nuclease/ml, and electrophoresed on 8% (w/v) polyacrylamide/urea gels.

(f) Primer extension sequencing of RNA

Primer extension sequencing of RNA was done essentially as described by Bektesh et al. (1988). The mixture of an end-labeled oligonucleotide primer (20-mer) and 60 µg of young adult total RNA was heated at 80°C for 3 min, slowly cooled to 45°C and allowed to anneal for 3 h. Extension reactions were carried out with avian myeloblastosis virus reverse transcriptase at 45°C for 45 min. Extension products were analyzed on 8% (w/v) polyacrylamide/urea gels.

3. Results

(a) Genomic Southern hybridization of col-12 and col-13

We have studied the sequence relatedness of col-12 and col-13 by genomic Southern hybridization. Genomic blots were hybridized with plasmids containing either the col-12 or col-13 gene and washed under low or high stringency conditions. When the blots were washed under low stringency conditions (50°C, 0.03 M-Na+), which correspond to 30°C below the tm calculated for the average G+C contents of C. elegans collagen genes, both genes hybridized to many restriction fragments containing collagen-like sequences (Fig. 1, left panel). This hybridization pattern is similar to that seen when other collagen genes are used as probes under low stringency conditions (Cox et al., 1984). When the same blots were washed under high stringency conditions (65°C, 0.03 M-Na+, 20% formamide), which correspond to 3°C below the average tm of the collagen genes, most cross-hybridizing bands disappeared. However, col-12 and col-13 still hybridized with equal intensities to each other (Fig. 1, right panel). Similar results have been reported by Cox et al. (1984). The strong cross-hybridization between these two genes is unusual since under these high stringency conditions most other C. elegans collagen genes do not cross-hybridize. These results demonstrate that the col-12 and col-13 genes must have very similar nucleotide sequences.

Figure 1. Genomic hybridization patterns of col-12 and col-13. Three µg of C. elegans genomic DNA were digested with either EcoRI (E) or HindIII (H), electrophoresed through a 0.7% (w/v) agarose gel, and transferred to a nitrocellulose filter. Each strip of the filter was hybridized with nick-translated plasmid DNA containing col-12 or col-13 and washed under low (left panel) or high (right panel) stringency conditions. Arrowheads indicate self-hybridization bands. Under high stringency conditions only a single hybridizing band is seen in the HindIII lanes because col-12 and col-13 are located on the same HindIII fragment. The sizes of hybridizing bands are indicated in kb.

Figure 2. Restriction map and sequencing strategy for col-12 and col-13. The directions and extents of sequences obtained from restriction fragments subcloned into M13 phage vectors or the Bluescribe vector are indicated by small arrows. Large arrows indicate the locations and orientations of the coding regions of the col-12 and col-13 genes. B, BamHI; E, EcoRI; C, SacI; S, SalI; T, TaqI (not all TaqI sites are shown).
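The relationship between the wash conditions in section (a) and the quoted distances below the tm can be illustrated with the widely used empirical formula for DNA-DNA hybrids summarized by Meinkoth & Wahl (1984). The Python sketch below is an illustration under stated assumptions, not a record of the authors' actual calculation: the function names are hypothetical, the length-dependent term (500/L) is neglected on the assumption of long probes, and the Na+ concentration is taken from the NaCl in SET alone (20 × SET is 3 M-NaCl).

```python
import math


def sodium_molar(set_fold: float) -> float:
    """Approximate Na+ molarity of an n-fold SET buffer.

    Assumption: only the NaCl in SET is counted (20 x SET = 3 M NaCl).
    """
    return 3.0 / 20.0 * set_fold


def tm_dna_hybrid(na_molar: float, gc_percent: float,
                  formamide_percent: float = 0.0) -> float:
    """Empirical tm (deg C) for long DNA-DNA hybrids.

    tm = 81.5 + 16.6*log10[Na+] + 0.41*(%G+C) - 0.61*(%formamide)
    The 500/L length term is deliberately omitted (assumption).
    """
    return (81.5 + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent - 0.61 * formamide_percent)


GC_PERCENT = 59.0             # average G+C content assumed in the text
na_wash = sodium_molar(0.2)   # 0.2 x SET, about 0.03 M Na+

tm_low = tm_dna_hybrid(na_wash, GC_PERCENT)                           # no formamide
tm_high = tm_dna_hybrid(na_wash, GC_PERCENT, formamide_percent=20.0)  # 20% formamide

print(f"low stringency:  wash is {tm_low - 50.0:.1f} deg C below tm")
print(f"high stringency: wash is {tm_high - 65.0:.1f} deg C below tm")
```

With these inputs the sketch places the 50°C wash roughly 30°C below the tm and the 65°C wash roughly 3°C below it, in line with the values given in the text.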

(b) Sequence comparisons of col-12 and col-13

To determine how closely col-12 and col-13 are related to each other at the nucleotide and amino acid levels, we sequenced the two genes. A restriction map and sequencing strategy are shown in Figure 2. This Figure also shows that col-12 and col-13 are separated by only 1.8 kb and are transcribed in the same direction. The sequence comparison of the coding regions of col-12 and col-13 is presented in Figure 3. The 951 nucleotide long coding regions of these two genes differ by only five nucleotides (99.5% sequence identity); three nucleotide differences are located near the 5' ends of the genes (positions 4, 6 and 14), and two nucleotide differences are in the Gly-X-Y coding regions near the 3' ends (positions 835 and 856). Amino acid sequence comparison reveals that the collagen polypeptides encoded by these genes are identical except for two amino acid residues near the amino terminus (99.4% amino acid sequence identity). The two amino acid substitutions are conservative: Thr versus Ser at residue 2 and Pro versus Leu at residue 5, in col-12 and col-13, respectively. These changes occur within the 36 amino acid residue long amino-terminal domain that is likely to be the signal sequence necessary for secretion. The predicted signal sequence cleavage site (von Heijne, 1986) is between amino acid residues 36 and 37. Cleavage of the putative signal sequences would leave the col-12 and col-13 collagens with identical amino acid sequences.

The overall structures of the col-12 and col-13 collagen polypeptides are similar to those of the other C. elegans collagens that have been characterized (Kramer et al., 1982, 1988; von Mende et al., 1988; Cox et al., 1989). The col-12 and col-13 collagen polypeptides consist of five Gly-X-Y repeating blocks that are flanked by long amino-terminal and short carboxyl-terminal non-(Gly-X-Y) domains. They contain seven conserved cysteine residues: three in front of the first Gly-X-Y block, two between the first and second Gly-X-Y repeat blocks, and two at the carboxyl ends of the polypeptides.

Both col-12 and col-13 contain a 52 bp intron within their coding regions. The positions and sequences of the introns are perfectly conserved in the two genes. In contrast to the coding regions and the introns, the 5' and 3' flanking regions of the genes are highly divergent. The 5' flanking sequences of the two genes were aligned to compare the regions upstream from the ATG initiator codons (Fig. 4(a)). They show no significant sequence similarity except for their putative TATA boxes and 3' splice acceptor sequences. A maximum sequence similarity of 49% was obtained by insertion of 13 gaps into the col-12 sequence and 28 gaps into the col-13 sequence, resulting in a total of 291 positions compared including gaps. The proposed TATA boxes (GTATAAAAG) of col-12 and col-13 are located at positions -113 to -105 and -98 to -90, respectively, relative to the ATGs. There are no recognizable CAAT box sequences in either gene. Both genes have potential splice acceptor sequences (TTTTAG) located at positions -10 to -5 in col-12 and -6 to -1 in col-13.

The 3' flanking regions also show little sequence similarity even though in both genes these regions are very A+T-rich (Fig. 4(b)). A maximum similarity of 52% was obtained by introducing 12 and 35 gaps into col-12 and col-13, respectively, in the total of 293 nucleotides including gaps. col-12 contains a presumptive polyadenylation signal (AATAAA) 88 nucleotides downstream from the TAA termination codon. At the equivalent position in the sequence alignment col-13 differs by the insertion of a single nucleotide (AATGAAA), interrupting the consensus polyadenylation signal. There is no consensus polyadenylation signal within the 430 bp of sequence downstream from the col-13 termination codon that we have determined. It is unlikely that col-13 utilizes a polyadenylation signal further downstream since the size of its transcript is only 100 bp longer than the col-12 mRNA on Northern blots (see Fig. 5). It seems most likely that col-13 uses a non-consensus polyadenylation signal. The act-2 gene (Krause et al., 1989) and the Tc1 open reading frame (Rosenzweig et al., 1983) of C. elegans also lack consensus polyadenylation signals.

Figure 3. Sequence comparison of the coding regions of col-12 and col-13. The complete nucleotide and deduced amino acid sequences of col-12 are shown. The sequence of col-13 differs from col-12 at only 5 nucleotides and 2 amino acid residues, which are indicated in boldface type. At the positions where the sequences of col-12 and col-13 differ, the col-13 sequence is shown above the col-12 sequence. The Gly-X-Y repeating regions are in boldface type and underlined. The intron present in both col-12 and col-13 is indicated by lowercase letters. The putative secretory signal sequence, identified as described by von Heijne (1986), is underlined with dashes.

Figure 4. Sequence comparisons of the 5' and 3' flanking regions of col-12 and col-13. Sequences were aligned by the GENALIGN program (Intelligenetics). Gaps (dashes) were inserted to maximize similarity between the 2 genes. Identical nucleotides are connected by vertical lines. (a) The 5' flanking sequence comparison. The TATA boxes and the 3' splice acceptor signals are underlined. The transcription start sites of col-12 are indicated by asterisks. The CACTT sequences are in boldface type. (b) The 3' flanking sequence comparison. The putative polyadenylation signal of col-12 is underlined.

(c) col-12 and col-13 are expressed at the same developmental stages

We asked whether col-12 and col-13 are differentially regulated during development, since although their coding regions are nearly identical the 5' flanking regions of the genes are very different from each other. Northern blots of total RNA isolated from molting animals at various developmental stages were hybridized with either a col-12 or col-13-specific probe. The gene-specific probes were prepared from the 3' untranslated regions of the genes to avoid cross-hybridization (see Materials and Methods). Hybridization of these probes to genomic blots shows that they do not cross-hybridize to each other, or to other sequences in the genome (data not shown). The col-12-specific probe hybridizes to an approximately 1.2 kb transcript and the col-13-specific probe hybridizes to a 1.3 kb transcript (Fig. 5). These results are consistent with the sizes of the genes as determined by DNA sequence. Both transcripts are detected at similar levels in dauer-L4 and L4-adult molt RNAs, but not in embryo or L2-dauer molt RNAs. These data indicate that col-12 and col-13 are expressed at similar levels and at the same developmental stages.

Figure 5. Stage-specific expression of the col-12 and col-13 genes. Total RNAs (10 µg in each lane) isolated from embryos (E), animals at L2 to dauer molt (D), dauer to L4 (L), and L4 to adult molt (A) were electrophoresed through a formaldehyde gel, and transferred to a nitrocellulose filter. Each strip of the blot was hybridized with either a col-12 (left panel) or col-13 (right panel) specific probe. The smear seen below the 1.3 kb col-13 transcripts is not seen when other probes that detect col-13 are used, and may result from the extremely high A+T content of the probe. The sizes of the transcripts are indicated in kb.

(d) The col-12 and col-13 mRNAs undergo different modes of splicing

Both col-12 and col-13 possess potential 3' splice acceptor signals (TTTTAG) ending at positions -5 and -1, respectively, relative to their ATG initiator codons. A consensus 5' splice donor sequence (GTAAGG) is located 56 bp upstream from the ATG in col-12, suggesting that col-12 contains a 52 bp intron within its 5' untranslated region. However, there is no consensus splice donor sequence within 330 bp upstream from the initiator codon in col-13. This raised the strong possibility that col-13 might be trans-spliced utilizing the 3' splice acceptor signal. We performed primer extension and nuclease S1 mapping experiments to test this possibility. A synthetic oligonucleotide primer complementary to the coding regions of both col-12 and col-13, extending from +47 to +28 (see Fig. 3), was used for primer extension analysis. If the col-13 mRNA is trans-spliced, it should extend for 22 nucleotides beyond the ATG, since the trans-spliced leader is 22 nucleotides long (Krause & Hirsh, 1987). Major extension products corresponding to positions -30/31 and -22 with respect to the ATG codons were detected (see Fig. 7, lane N). Similar results were obtained using poly(A)+ RNA (data not shown). We believe that these products are extended from the col-12 and col-13 transcripts, respectively. The results are consistent with the col-13 mRNA being trans-spliced since a band terminating at position -22 is present. If this is the case, the -30/31 band would be the extension product from the col-12 mRNA. A shorter exposure of the gel shows that the intensity of the -31 band is much weaker than that of the -30 band (data not shown), suggesting that the -30 band is the major extension product from the col-12 mRNA. The primer extension bands from the col-12 and col-13 mRNAs are of equivalent intensities, supporting the conclusion from the Northern blot analyses that col-12 and col-13 are expressed at similar levels.
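The positions quoted in this paragraph can be cross-checked by counting nucleotides. The sketch below is a minimal illustration with hypothetical names; it assumes that positions are numbered relative to the A of the ATG (+1) with no position 0, that the col-12 donor GTAAGG begins at -56 and the acceptor TTTTAG ends at -5, and that col-12 transcription starts at -82 or -83 as mapped later in this section.

```python
def upstream_span(first: int, last: int) -> int:
    """Count nucleotides from `first` to `last` (both upstream, inclusive)."""
    if not (first <= last < 0):
        raise ValueError("expected first <= last < 0")
    return last - first + 1


DONOR_FIRST = -56    # first nucleotide of the col-12 intron (start of GTAAGG)
ACCEPTOR_LAST = -5   # last nucleotide of the col-12 intron (end of TTTTAG)
SL_LENGTH = 22       # length of the trans-spliced leader
PRIMER_5PRIME = 47   # the primer anneals from +47 to +28

intron_length = upstream_span(DONOR_FIRST, ACCEPTOR_LAST)


def col12_leader_length(start_site: int) -> int:
    """Length of the col-12 5' exon, from the start site to the donor site."""
    return upstream_span(start_site, DONOR_FIRST - 1)


def col12_utr_length(start_site: int) -> int:
    """Spliced col-12 5' untranslated region: 5' exon plus -4 to -1."""
    return col12_leader_length(start_site) + upstream_span(ACCEPTOR_LAST + 1, -1)


def extension_product_length(utr_length: int) -> int:
    """Primer extension product length for an mRNA with the given 5' UTR."""
    return PRIMER_5PRIME + utr_length


print("col-12 5' UTR intron:", intron_length, "nt")
for start in (-82, -83):
    print(f"col-12 start {start}: leader {col12_leader_length(start)} nt, "
          f"5' UTR {col12_utr_length(start)} nt, "
          f"product {extension_product_length(col12_utr_length(start))} nt")
print("col-13 (trans-spliced): product",
      extension_product_length(SL_LENGTH), "nt")
```

Under these assumptions the intron is 52 nt long, the cis-spliced col-12 leader is 26 or 27 nt, and the spliced col-12 5' untranslated region is 30 or 31 nt, which agrees with extension products ending at -30/31 for col-12 and at -22 for col-13.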
Minor bands corresponding to positions -36/-37 and -65 were also observed. We believe that the band at -65 is derived from the col-13 precursor RNA (see below). The -36/-37 bands may result from cross-annealing of the primer to other messages or possibly to the branched trans-splicing intermediate of the col-13 message.

For S1 nuclease mapping, a 1.2 kb SalI-EcoRI fragment (SE) from col-12 and a 430 bp RsaI-EcoRI fragment (RE) from col-13 were used as probes (Fig. 6(b)). If the 3' splice acceptor signals are functional, the col-12 and col-13 transcripts should protect 176 and 172 nucleotide fragments, respectively, spanning the regions from the 3' splice junctions to the EcoRI sites. To arrive at these expected sizes two nucleotides must be added because the last two nucleotides (AG) of the splice acceptor signal (TTTTAG) are identical with the last two nucleotides of the putative 5' untranslated exon of col-12 and with the last two nucleotides of the trans-spliced leader RNA. When the SE probe was used, two major protected fragments were produced: a 176 base fragment that maps at the 3' splice signal of col-12, and a 166 base fragment that results from cleavage at the mismatch between col-12 and col-13 at position +4 (Fig. 6(a), lane 2). The RE probe also generated two major protected fragments: a 172 base fragment that terminates at the 3' splice signal of col-13 and a 164 base fragment that terminates at the mismatch between col-12 and col-13 at position +6 (Fig. 6(a), lane 1). These results demonstrate that the 3' splice acceptor signals in both col-12 and col-13 are indeed functional, supporting the notion that col-12 is cis-spliced and col-13 is trans-spliced at these positions. In addition to these major protected fragments, several minor fragments were also detected. A 164 base fragment from the SE probe and a 156 base fragment from the RE probe correspond to the regions extending from the mismatches between col-12 and col-13 at positions +6 and +14, respectively.

Figure 6. S1 nuclease mapping of the col-12 and col-13 mRNAs. (a) Autoradiogram showing S1 nuclease protected fragments. The single-stranded RE (lane 1) or SE (lane 2) probe was hybridized with total RNA, digested with S1 nuclease, and analyzed on an 8% sequencing gel. The sizes of protected fragments are shown in nucleotides. Protection of full-length fragments (430 bp in lane 1 and 1200 bp in lane 2) probably results from incomplete strand separation. The differences in intensity of the bands resulting from cleavages at the mismatches between col-12 and col-13 (164 and 156 bp in lane 1, 166 and 164 bp in lane 2) may be due to differences in efficiency of their cleavage by S1 nuclease. (b) Restriction map of the probes for S1 nuclease mapping. The locations and sizes of RE and SE probes are shown below a restriction map of col-12 and col-13. E, EcoRI; R, RsaI; S, SalI. Not all RsaI sites are shown.
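The protected fragment sizes discussed above follow from position arithmetic. In the sketch below, the labeled EcoRI end of both probes is assumed to lie at position +170 relative to the ATG; this value is not stated in the text and is inferred here from the expected 172 and 176 nucleotide fragments. All names are hypothetical, and the calculation is only a consistency check, not a description of the authors' procedure.

```python
ECORI_END = 170  # assumption: last protected nucleotide, inferred from 172 - 2
AG_OVERLAP = 2   # AG shared by the acceptor signal and the upstream exon/leader


def splice_junction_fragment(upstream_exon_nt: int) -> int:
    """Fragment protected from the 3' splice junction to the EcoRI end.

    `upstream_exon_nt` is the number of 5' untranslated nucleotides lying
    between the splice junction and the ATG (4 for col-12, 0 for col-13).
    """
    return upstream_exon_nt + ECORI_END + AG_OVERLAP


def mismatch_fragment(mismatch_position: int) -> int:
    """Fragment protected from just past a mismatch to the EcoRI end."""
    return ECORI_END - mismatch_position


# SE probe (col-12): junction after -5, mismatches with col-13 at +4 and +6
se_fragments = [splice_junction_fragment(4),
                mismatch_fragment(4), mismatch_fragment(6)]

# RE probe (col-13): junction after -1, mismatches with col-12 at +6 and +14
re_fragments = [splice_junction_fragment(0),
                mismatch_fragment(6), mismatch_fragment(14)]

print("SE probe fragments:", se_fragments)
print("RE probe fragments:", re_fragments)
```

With the inferred EcoRI position, the sketch reproduces the 176, 166 and 164 nucleotide fragments for the SE probe and the 172, 164 and 156 nucleotide fragments for the RE probe.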
To confirm the results of primer extension and S1 nuclease mapping, we sequenced the 5’ ends of the col-12 and col-13 transcripts using the same oligonucleotide primer that was used in the primer extension experiments. Since the primer is complementary to both the col-12 and col-13 transcripts, it yielded double sequences. In fact, we could read both nucleotides at the positions where the coding regions of col-12 and col-13 differ (Fig. 7). Since the sequence of the 22 nucleotide trans-spliced leader was already known, we were able to read a unique 29 nucleotide sequence by subtracting the 22 nucleotide spliced leader sequence from the double sequences beyond the ATGs. This 29 nucleotide sequence is complementary to the region from the ATG to the 3’ splice acceptor signal (positions -1 to -4) and the region upstream from the 5’ splice donor sequence (-57 to -80) in col-12 (Fig. 8). These results confirm that the col-13 transcript receives the 22 nucleotide trans-spliced leader and that the col-12 transcript contains a 26 or 27 nucleotide long cis-spliced leader. The primer extension analysis also shows that the transcription of col-12 starts at position -82 or


-83 with respect to the ATG. The transcription start sites of col-12 occur within the sequence CACTT (-83 to -79), which is located 30 bp downstream from the TATA box (GTATAAAAG, -113 to -105) (Fig. 4(a)). The sequence CACTT (-67 to -63) also occurs 31 bp downstream from the putative TATA box (GTATAAAAG, -98 to -90) in col-13 (Fig. 4(a)). Considering the sequence similarity to col-12 and the conserved distance from the TATA box, we propose that the transcription of col-13 also starts within this CACTT sequence. In fact, we detect a weak primer extension product that maps at position -65 (Fig. 7). This extension product would be derived from the precursor of the col-13 mRNA, before it has been trans-spliced.

4. Discussion


The C. elegans collagen genes col-12 and col-13 are separated by only 1.8 kb and are transcribed in the same direction. Transcripts from the two genes appear at the same developmental stages and in equivalent amounts, but the col-13 transcript acquires a trans-spliced leader where the col-12 transcript is cis-spliced (summarized in Fig. 9). Sequence comparisons of the two genes show that their coding regions, including introns, are virtually identical (99.5% nucleotide sequence identity). The close proximity and coding region sequence similarity of col-12 and col-13 suggest that these two collagen genes arose by duplication of a single ancestral gene. In contrast to the coding regions and internal introns, the 5’ and 3’ flanking regions of the genes show very little nucleotide sequence similarity. This fact implies that the duplication event that gave rise to the col-12 and col-13 genes is very old, old enough for the flanking regions of the genes

Figure 7. Primer extension sequencing of the col-12 and col-13 mRNAs. A synthetic oligonucleotide primer complementary to both the col-12 and col-13 mRNAs (see Fig. 8) was used to sequence the 5’ ends of the col-12 and col-13 mRNAs. The 5’ end-labeled primer was annealed to total RNA and extended with reverse transcriptase in the presence of ddNTPs (lanes G, A, T, C), or in the absence of ddNTPs (lane N) to determine the termination sites (designated by X). Lane 1 shows the sequence derived from the col-13 mRNA, with the sequence that is complementary to the 22 nucleotide trans-spliced leader indicated in boldface type. Lane 2 shows the sequence derived from the col-12 mRNA. Nucleotides at positions where the coding regions of col-12 and col-13 differ are indicated by asterisks. Ambiguities in reading the sequence are indicated by question marks. Nucleotides complementary to the ATG initiator codon are indicated with a bar to the left. The sizes (in nucleotides) of extension products are given relative to the ATG initiator codons. The autoradiogram chosen for this Figure was obtained from a long exposure of the gel to X-ray film, in order that the RNA sequence ladder might be seen easily. With shorter exposures, the intensities of the -36/-37 and -65 bands are clearly much reduced relative to the -22 and -30/-31 bands.


col-12 (positions relative to the ATG; scale marks from -80 to +40)
DNA: CACTTGGCTTCTAAAGTCCAGTGACAGGTAAGGTTCTCGTTCTCGTTA~TTC~GTCT~GATTACT~GATT
RNA: XXGMCCGA??AT?T?AG?TCAcTGTC--------------------------------------
DNA: TGATTACTTTTTAGAAAAATGACCGAAGATC~~GCAGATTGCCCAGGAGACTGAGTCTCTCCG
RNA: -------------TTTTTACTGGCTTCTAGGTTTCGTCTAACGG <-- primer

col-13 (positions relative to the ATG; scale marks from -80 to +40)
DNA: AACGGCAGGAGATAAGCACTTTATTTCGAACCTTCAATCTTACCCCTTTTTTTTGAACCACTTAC
RNA: X
DNA: ATTTCAAACAATTTTTAGATGTCGGAAGAT~~*G~AGCAGAGAGA~TGAGT~TCT~~G
RNA: AT?M?GGGITCMACTCTACAGCCTTCTAGATTTCGTCTAACGG <-- primer

Figure 8. Summary of the 5’ end sequences of the col-12 and col-13 genes and their mRNAs. The sequences derived from the mRNAs are read as the antisense strand and are shown below the sequences of the DNA coding strand. The primer used for RNA sequencing is indicated. The 5’ and 3’ splice signals are underlined. The mRNA-derived sequence which is not complementary to the sequence of the DNA coding strand is in boldface type. This sequence is complementary to the 22 nucleotide trans-spliced leader sequence (Krause & Hirsh, 1987). Xs indicate the termination sites. Question marks indicate ambiguities in reading the sequence. Asterisks indicate nucleotides at positions where the coding regions of col-12 and col-13 differ.

to have lost almost all sequence similarity. The high degree of sequence similarity within the coding regions of the genes is not, therefore, a result of the gene duplication event having occurred recently.
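The identity figure quoted for the coding regions can be checked with a one-line calculation. The sketch below assumes the values given in the abstract (951 coding nucleotides, five differences); the function name is illustrative and not taken from the paper.

```python
def percent_identity(aligned_length: int, differences: int) -> float:
    """Percentage of aligned positions that are identical."""
    return 100.0 * (aligned_length - differences) / aligned_length


# Values reported in the abstract: 951 coding nucleotides, 5 differences
identity = percent_identity(951, 5)
print(f"{identity:.1f}% identity")
```

Rounded to one decimal place, this gives the 99.5% identity cited for the col-12 and col-13 coding regions.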

(a) Gene conversion between col-12 and col-13

The coding region sequence similarity between col-12 and col-13 must result from some form of sequence exchange between the two genes, most likely gene conversion. Gene conversion has been shown to play a major role in maintaining sequence homology in several gene families, including the human fetal globin genes (Slightom et al., 1980), the human alpha globin genes (Liebhaber et al., 1981), the human immunoglobulin genes (Flanagan et al., 1984), the mouse beta globin genes (Erhart et al., 1985), the silkmoth chorion genes (Eickbush & Burke, 1985, 1986), and the C. elegans heat shock protein genes (Russnak & Candido, 1985). How could gene conversion have maintained sequence identity from the ATG initiation codons to the termination codons, but not into the flanking regions? We propose that a combination of selection against amino acid changes and a high frequency of gene conversion events within the coding region could produce the observed characteristics. Selection against amino acid changes may be particularly strong in the col-12 and col-13 genes if the collagens they encode have important roles in forming the L4 and adult cuticles. Even though there are duplicate copies of this collagen, most amino acid changes would act dominantly due to the polymeric, structural nature of collagens. It seems reasonable, therefore, to propose that the coding regions of col-12 and col-13 would be under selective pressure to limit alterations in their encoded amino acid sequences. We would expect much less selection against nucleotide changes in the 5’ and 3’ flanking sequences, since a much smaller percentage of the nucleotides in these regions is likely to be functionally significant. As a result of selection, the rate of accumulation of nucleotide substitutions would be much lower in the coding regions than in the flanking regions of col-12 and col-13. Both the initiation of gene conversion events and their expansion via branch migration require regions of homology between genes (DasGupta & Radding, 1982; Ayares et al., 1986; Liskay et al., 1987). Selection against amino acid changes would keep the coding regions of col-12 and col-13 similar enough that gene conversions could extend through the coding region and maintain sequence identity. The more rapid accumulation of mutations in the flanking regions of the genes would reduce the frequency and extents of gene conversion events, allowing the flanking regions to diverge. Even small regions of non-homology have been proposed to block branch migration and therefore limit the

region of sequence exchange (Michelson & Orkin, 1983). Over time the extents of gene conversion would become progressively shorter, eventually being delimited by the region in which strong selection was active, defined by the initiation and termination codons in the case of col-12 and col-13. Strong selection within the coding regions of the two genes implies that both col-12 and col-13 have been continuously expressed since the time of the gene duplication that produced them. If either of the genes had ever been inactive it would have been released from selective pressure and its sequence would have begun to drift. There are 52-nucleotide long introns within the coding regions of both col-12 and col-13 that would not be expected to be under strong selective pressure against nucleotide changes. They would, therefore, be expected to present a block to the extension of gene conversion events. The sequence identity of the introns in the two genes, and of the coding regions around them, however, indicates that gene conversion has not been blocked by the introns. The introns are short and are bounded on either side by large blocks of coding sequence that would provide anchors of homology for gene conversion events. If gene conversions occurred with high frequency, then the short intron sequences might never accumulate sufficient nucleotide changes to block branch migration. Krause et al. (1989) have also proposed that both the coding regions and introns of the C. elegans actin genes, act-1 and act-3, have been conserved by gene conversion.

(b) The col-12 and col-13 mRNAs utilize different modes of splicing

Although the 5’ flanking regions of col-12 and col-13 have diverged extensively, there is no indication of differences in the temporal regulation of transcription of the two genes. Both transcripts are detected at similar levels during the formation of the L4 and adult cuticles. Furthermore, the TATA boxes and the transcription start sites of the two genes appear to have been conserved. These data suggest that since the time of gene duplication these two genes have retained the regulatory sequences necessary for stage-specific expression, while other flanking sequences have diverged. Since both genes encode structurally equivalent collagen polypeptides, it may be that a large amount of this particular collagen polypeptide is required for the formation of the L4 and adult cuticles.

col-12 and col-13 are essentially identical in all aspects except that the col-12 mRNA has a cis-spliced leader at the same position where the col-13 mRNA acquires the trans-spliced leader (Fig. 9). The different patterns of splicing displayed by two nearly identical genes raise the question of whether col-12 and col-13 are differentially regulated at the level of translation. Since the col-12 mRNA has a 26 nucleotide cis-spliced leader at its 5’ end while the col-13 mRNA has the 22 nucleotide trans-spliced leader, the translational properties of these two messages could be different. There is evidence showing that trans-spliced mRNAs of C. elegans acquire and retain the trimethylguanosine cap of the SL RNA (Blumenthal & Thomas, 1988; K. Van Doren & D. Hirsh, personal communication). Although it is not certain what effect the trimethylguanosine cap might have, the difference in cap structures between the col-12 and col-13 mRNAs could affect their translational properties. There is no sequence similarity between the cis- and the trans-spliced leaders.

The function of the trans-spliced leader is not known, but it has been speculated that it may stabilize mRNAs or increase their ribosome binding affinities, both effects that would result in increased levels of translation product. Most of the genes whose mRNAs have been shown to be trans-spliced encode abundant proteins and are members of gene families. Among them are three of the four actin genes (Krause & Hirsh, 1987), one of the two myosin light chain genes (Cummins & Anderson, 1988), two of the ribosomal protein genes (Bektesh et al., 1988) and all four of the GAPDH genes (Bektesh et al., 1988; X.-Y. Huang & D. Hirsh, personal communication). Thus, even though the col-12 and col-13 mRNAs accumulate to similar levels at the same developmental stages, more of the col-13 collagen than the col-12 collagen might be made. In this view, trans-splicing would provide another mechanism for controlling the level of a gene product.

Figure 9. Schematic representation of the different modes of mRNA maturation displayed by col-12 and col-13. A restriction map indicating the positions of the coding regions of col-12 and col-13 is given. E, EcoRI; S, SalI. The details of the 5’ ends of the genes are shown below the restriction map. Filled boxes represent the coding regions and horizontal lines represent flanking regions, including TATA boxes and introns. Single hatched boxes indicate a 26 nucleotide cis-spliced leader and double hatched boxes indicate a 22 nucleotide trans-spliced leader. The sequence AAAA, preceding the initiator ATG of col-12, is shown as a small open box. col-12 and col-13 are transcribed utilizing conserved TATA boxes and transcription start sites. However, the col-12 mRNA matures through cis-splicing while the col-13 mRNA matures through trans-splicing. nt, nucleotide.


(c) cis and trans-splicing may be interchanged during gene evolution

Immediately following the gene duplication that produced them, the col-12 and col-13 genes were either both cis-spliced or both trans-spliced. Subsequently, one of the genes must have altered its mode of splicing. From our results it is not possible to determine which of the modes of splicing was active first. However, col-12 and col-13 are the first example in which cis and trans-splicing appear to have been interchanged during the evolution of duplicate genes. The mRNAs from the remaining actin and myosin light chain genes are not trans- or cis-spliced at their 5’ ends, even though they do contain consensus 3’ splice acceptor sequences preceding their initiation codons. Thus, the simple presence of a 3’ splice acceptor sequence with no splice donor upstream is not necessarily sufficient for trans-splicing to occur. No specific recognition signal for trans-splicing has been identified. It is not certain whether the change in splicing modes between col-12 and col-13 occurred directly, or via an intermediate stage during which neither mode of splicing was active. As argued above, however, the change must have occurred without any period during which either of the genes was inactivated. We have also identified another pair of related collagen genes that differ in their modes of splicing. The sqt-1 (Kramer et al., 1988) and rol-6 (J. Kramer, R. French, J. Johnson & E.-C. Park, unpublished results) collagen mRNAs receive cis-spliced and trans-spliced leaders, respectively (Y.-S. Park & J. Kramer, unpublished results). The alternation between the cis and trans modes of splicing may not be uncommon amongst the numerous members of the C. elegans collagen gene family. Whether the differences in mRNA maturation of the collagen genes are related to the regulation of cuticle formation during development remains to be determined.

We thank Susan Bektesh, Tom Blumenthal, Tom Eickbush and David Hirsh for providing unpublished data and illuminating discussions. This work was supported by U.S. Public Health Service grant HD22028 to J.M.K.

References

Ayares, D., Chekuri, L., Song, K.-Y. & Kucherlapati, R. (1986). Proc. Nat. Acad. Sci., U.S.A. 83, 5199-5203.
Bektesh, S., Van Doren, K. & Hirsh, D. (1988). Genes Dev. 2, 1277-1283.
Blumenthal, T. & Thomas, J. (1988). Trends Genet. 4, 305-308.
Brenner, S. (1974). Genetics, 77, 71-94.
Bruzik, J. P., Van Doren, K., Hirsh, D. & Steitz, J. A. (1988). Nature (London), 335, 559-562.
Cathala, G., Savouret, J.-F., Mendez, B., West, B. L., Karin, M., Martial, J. A. & Baxter, J. D. (1983). DNA, 2, 329-335.
Cox, G. N. & Hirsh, D. (1985). Mol. Cell. Biol. 5, 363-372.
Cox, G. N., Laufer, J. S., Kusch, M. & Edgar, R. S. (1980). Genetics, 95, 317-339.
Cox, G. N., Staprans, S. & Edgar, R. S. (1981). Develop. Biol. 86, 456-470.
Cox, G. N., Kramer, J. M. & Hirsh, D. (1984). Mol. Cell. Biol. 4, 2389-2395.
Cox, G. N., Carr, S., Kramer, J. M. & Hirsh, D. (1985). Genetics, 109, 513-528.
Cox, G. N., Fields, C., Kramer, J. M., Rosenzweig, B. & Hirsh, D. (1989). Gene, 76, 331-343.
Cummins, C. & Anderson, P. (1988). Mol. Cell. Biol. 8, 5339-5349.
DasGupta, C. & Radding, C. M. (1982). Proc. Nat. Acad. Sci., U.S.A. 79, 762-766.
Eickbush, T. H. & Burke, W. D. (1985). Proc. Nat. Acad. Sci., U.S.A. 82, 2814-2818.
Eickbush, T. H. & Burke, W. D. (1986). J. Mol. Biol. 190, 357-366.
Eickbush, T. H. & Kafatos, F. C. (1982). Cell, 29, 633-643.
Erhart, M. A., Simons, K. S. & Weaver, S. (1985). Mol. Biol. Evol. 2, 304-320.
Flanagan, J. G., Lefranc, M.-P. & Rabbitts, T. H. (1984). Cell, 36, 681-688.
Hodgson, C. P. & Fisk, R. Z. (1987). Nucl. Acids Res. 15, 6295.
Kramer, J. M., Cox, G. N. & Hirsh, D. (1982). Cell, 30, 599-606.
Kramer, J. M., Cox, G. N. & Hirsh, D. (1985). J. Biol. Chem. 260, 1945-1951.
Kramer, J. M., Johnson, J. J., Edgar, R. S., Basch, C. & Roberts, S. (1988). Cell, 55, 555-565.
Krause, M. & Hirsh, D. (1987). Cell, 49, 753-761.
Krause, M., Wild, M., Rosenzweig, B. & Hirsh, D. (1989). J. Mol. Biol. 208, 381-392.
Lehrach, H., Diamond, D., Wozney, J. M. & Boedtker, H. (1977). Biochemistry, 16, 4743-4751.
Liebhaber, S. A., Goosens, M. & Kan, Y. W. (1981). Nature (London), 290, 26-29.
Liskay, R. M., Letsou, A. & Stachelek, J. L. (1987). Genetics, 115, 161-167.
Maxam, A. M. & Gilbert, W. (1980). Methods Enzymol. 65, 499-560.
Meinkoth, J. & Wahl, G. (1984). Anal. Biochem. 138, 267-284.
Michelson, A. M. & Orkin, S. H. (1983). J. Biol. Chem. 258, 15245-15254.
Nelson, D. W. & Honda, B. M. (1985). Gene, 38, 245.
Parsons, M., Nelson, R. G., Watkins, K. P. & Agabian, N. (1984). Cell, 38, 309-316.
Rosenzweig, B., Liao, L. W. & Hirsh, D. (1983). Nucl. Acids Res. 11, 4201-4209.
Russnak, R. H. & Candido, E. P. M. (1985). Mol. Cell. Biol. 5, 1268-1278.
Sanger, F., Nicklen, S. & Coulson, A. R. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 5463-5467.
Slightom, J. L., Blechl, A. E. & Smithies, O. (1980). Cell, 21, 627-638.
Steinmetz, M., Winoto, A., Minard, K. & Hood, L. (1982). Cell, 28, 489-498.
Sulston, J. E. & Brenner, S. (1974). Genetics, 77, 95-104.
Sutton, R. E. & Boothroyd, J. C. (1986). Cell, 47, 527-535.
Thomas, J. D., Conrad, R. C. & Blumenthal, T. (1988). Cell, 54, 533-539.
Van Doren, K. & Hirsh, D. (1988). Nature (London), 335, 556-559.
von Heijne, G. (1986). Nucl. Acids Res. 14, 4683-4690.
von Mende, N., Bird, D. M., Albert, P. S. & Riddle, D. L. (1988). Cell, 55, 567-576.
Walder, J. A., Eder, P. S., Engman, D. M., Brentano, S. T., Walder, R. Y., Knutzon, D. S., Dorfman, D. M. & Donelson, J. E. (1986). Science, 233, 569-571.
Yanisch-Perron, C., Vieira, J. & Messing, J. (1985). Gene, 33, 109-119.

Edited by J. Karn