Characterization of a complex satellite DNA in the mollusc Donax trunculus: Analysis of sequence variations and divergence


Gene, 169 (1996) 157-164 © 1996 Elsevier Science B.V. All rights reserved. 0378-1119/96/$15.00

GENE 09437

(EcoRV restriction; tandem repetition; DNA sequencing; repetitive subfamilies; concerted evolution; gene conversion; wedged clam; marine invertebrate)

M. Plohl and L. Cornudella

Departamento de Biologia Molecular y Celular, Centro de Investigación y Desarrollo del C.S.I.C. y Unitat de Biologia Molecular del Centre de Referència en Biotecnologia de la Generalitat de Catalunya, E-08034 Barcelona, Spain

Received by G. Bernardi: 2 June 1995; Revised/Accepted: 2 August/30 August 1995; Received at publishers: 9 October 1995

SUMMARY

A highly repetitive sequence in the genomic DNA of the bivalve mollusc Donax trunculus (Dt) has been identified upon restriction with EcoRV. During the time-course of DNA digestion, genomic fragments resolved electrophoretically into a ladder-like banding pattern revealing a tandem arrangement of the repeated elements, thus representing satellite DNA sequences. Cloning and sequence analysis unraveled the presence of two groups of monomer units which can be considered distinctive satellite subfamilies. Each subclass is distinguishable by the presence of 17 evenly spread diagnostic nucleotides (nt). The respective consensus sequences are 155 bp in length and differ by 11%, while relevant internal substructures were not observed. The two satellite subfamilies constitute 0.23 and 0.09% of the Dt genome, corresponding to 20 000 and 7600 copies per haploid complement, respectively. Sequence mutations often appear to be shared between two or more monomer variants, indicating a high degree of homogenization as opposed to that of random mutational events. Shared mutations among variants appear either as single changes or in long stretches. This pattern may arise from gene conversion mechanisms acting at different levels, such as the spread of nt sequences of a similar length to the monomer repeat itself, and the diffusion of short tracts a few bp long. Subfamilies might have evolved from the occasional amplification and spreading of a monomer variant effected by gene conversion events.

INTRODUCTION

Satellite DNA is a class of highly reiterated sequences, tandemly arranged, which are ubiquitous components of eukaryotic genomes. These sequences usually display great variability in reiteration frequency, complexity, repeat unit length and chromosomal distribution (Skinner, 1977; Miklos, 1985; Lohe and Roberts, 1988). No defined function has yet been assigned to satellite DNA, although it has been proposed to be involved in genome structure and evolutionary processes. Nonetheless, recent evidence has been reported concerning the existence of satellite DNA transcripts in a variety of organisms such as the Bermuda land crab (Varadaraj and Skinner, 1994). Repetitive DNA sequences appear to be subject to different mutational processes which introduce changes generating monomer variants that are then spread by molecular drive forces, resulting in homogenization and concerted evolution (Dover, 1982; 1986; 1993). Occurrence and evolution of divergence in repetitive sequences have been ascribed to unequal cross-over (Smith, 1976) and also to slippage replication (Tautz et al., 1986). Satellite DNA families in Drosophila seem to appear through gene conversion and unequal cross-over (Strachan et al., 1985). Likewise, gene conversion, recombination and unequal cross-over have been held responsible for the Alphitobius diaperinus satellite DNA divergence and the ensuing formation of subfamilies (Plohl and Ugarković, 1994a).

Satellite DNA sequences have been analyzed in a large number of organisms, but amongst marine invertebrates the information available is rather scarce. Sequences of satellite DNA have been examined in some crustacean species (Fowler and Skinner, 1985; Stringfellow et al., 1985; Badaracco et al., 1987), in echinoderms (Sainz et al., 1989; Sainz and Cornudella, 1990) and also in the common mussel Mytilus edulis (Ruiz-Lara et al., 1992). Molluscs represent a large group of marine invertebrates constituting the second most abundant phylum of metazoans. Several features of the biology of molluscs at the molecular level are known, such as genomic content, chromosome complements and genomic organization in some species (Hinegardner, 1974; Al-Sabti, 1989). Constitutive heterochromatin in low amounts has been found in chromosomes of Mytilus galloprovincialis, located in telomeres or intercalarily in some chromosomes of the complement, but absent from centromeres. This distribution suggests an unusual arrangement of repetitive DNA sequences (Martinez-Lage et al., 1994).

To gain further insight into the genome organization and evolutionary trends in these animal species, we have focused here on a satellite DNA family from the bivalve mollusc Donax trunculus (Dt), reporting on its isolation, nt sequence, abundance and organization. A detailed description of sequence variations is also provided with the aim of explaining the occurrence of the subfamilies evolved from it.

Correspondence to: Dr. L. Cornudella, Department of Molecular and Cell Biology, Centro de Investigación y Desarrollo, C.S.I.C., Jordi Girona 18-26, E-08034 Barcelona, Spain. Tel. (34-3) 400-6138; Fax (34-3) 204-5904; e-mail: [email protected]

Abbreviations: bp, base pair(s); CSPD, disodium 3-(4-methoxyspiro{1,2-dioxetane-3,2'-(5'-chloro)-tricyclo[3.3.1.1(3,7)]decan}-4-yl)-1-phenyl phosphate; Dt, Donax trunculus (Mollusca, Bivalvia); ENase, restriction endonuclease; EtdBr, ethidium bromide; kb, kilobase(s) or 1000 bp; nt, nucleotide(s); p, plasmid; pDTE, primary clones of Dt monomer repeats inserted into the EcoRV site of the Bluescript vector; PolIk, Klenow (large) fragment of E. coli DNA polymerase I; SDS, sodium dodecyl sulfate; SSC, 150 mM NaCl/15 mM Na3·citrate pH 7.6; TBE, 90 mM Tris/90 mM H3BO3/2 mM EDTA; u, unit(s).

SSDI 0378-1119(95)00734-2

RESULTS AND DISCUSSION

(a) Detection and cloning of an EcoRV repetitive sequence

High-molecular-mass DNA was isolated and purified from fresh sperm suspensions by phenol extraction (Sambrook et al., 1989), extensively digested with various ENases and the digests subsequently electrophoresed (Fig. 1A). Restriction of DNA to completion with EcoRV yielded two smear-masked fragments, one of approx. 150 bp in length and the second about twice this size. Detection of such bands was considered an indication of the presence of a satellite-type, repetitive nt sequence within the Dt genome containing EcoRV sites.

The smallest DNA fragment was eluted from the gel, biotinylated by PolIk random priming and used to probe a Southern blot of the gel in Fig. 1A. The autoradiograph yielded a strong chemiluminescent signal in the position of the 150-bp DNA fragment, as well as a short ladder of about four multimeric bands (Fig. 1B). In the limit restriction of genomic DNA with HindIII, a weakly-stained fragment comigrating with the EcoRV fragment (see Fig. 1A, lane 3) failed to hybridize with the biotinylated probe. A similar phenomenon occurred with the fastest migrating fragment in the HaeIII digest (lane 5). The lack of hybridization suggests the existence of other clustered repetitive sequences, unrelated to the EcoRV repeat, within the Dt genome. Hybridization screening also revealed the presence of short distributions of oligomers of the repeated fragment in limit restrictions of DNA with EcoRI, HindIII and MboII, indicating the presence of recognition sites for these enzymes within the EcoRV repeat, later confirmed by nt sequence analysis.

DNA fragments recovered from the two fast migrating bands in the EcoRV limit digest were ligated separately into the corresponding site of the Bluescript SK+ vector (Stratagene, La Jolla, CA, USA) and cloned in competent E. coli XL1-Blue cells. For identification of recombinants carrying EcoRV repetitive fragments, plasmid DNA from mini-preparations of a set of clones was digested with EcoRV to release the respective inserts, which were then electrophoresed, blotted to nylon and screened with the biotinylated uncloned 150-bp fragment. This approach produced two sets of positive transformants, one containing eleven monomer inserts and the other two dimeric DNA fragments. Recombinant plasmids were generically termed pDTE and numbered.

Fig. 1. Southern analysis of genomic DNA from Dt. Adult specimens of Dt were obtained from local fishermen during the breeding period, moved live to the laboratory and held in cold seawater until used. After carefully opening the shells, gills were removed to expose gonadal tissue. A small incision was performed and the spontaneously released sperm fluid was then collected with the aid of a Pasteur pipette. (A) Samples of purified sperm DNA (5 µg each) were serially cleaved to completion as specified by the manufacturer (Gibco-BRL) with the ENases listed below. DNA fragments were resolved on 1.5% agarose gels cast in 0.5 × TBE containing EtdBr, at 2 V/cm for 6 h. (B) Gels in A were blotted to a positively charged nylon membrane (Amersham) in 0.4 M NaOH, and hybridized to the DNA fragment electro-eluted from the fastest migrating band in the EcoRV digest. This probe was biotinylated by PolIk random priming and the hybridization was carried out overnight at 42°C in 0.25 M Na-phosphate buffer, pH 7.2/7% SDS/50% formamide. The filter was washed twice in 2 × SSC/0.1% SDS followed by two stringent washes at 68°C in 0.1 × SSC/1% SDS and finally rinsed at room temperature to remove excess detergent. The filter was next blocked with 0.2% casein and subsequently reacted with streptavidin-alkaline phosphatase conjugate. After the binding reaction, the filter was briefly incubated with the CSPD chemiluminescent dioxetane (Tropix) and the wet membrane finally exposed to Kodak X-Omat AR X-ray film. Lanes: 1 and 7, 142-bp DNA ladder as size standard; 2, EcoRI; 3, HindIII; 4, EcoRV; 5, HaeIII; 6, MboII. The position of the EcoRV monomer band of about 150 bp is indicated by an arrow.
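The presence of ENase recognition sites within a repeat can be checked on any sequence with a short script. The following is a minimal sketch only, not the procedure used in this work: the recognition sequences are the standard ones for the enzymes of Fig. 1, whereas the `toy_repeat` sequence and the `find_sites` helper are hypothetical and introduced purely for illustration.

```python
# Standard recognition sequences (5'->3') of the ENases used in Fig. 1.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "EcoRV": "GATATC",
    "HaeIII": "GGCC",
    "MboII": "GAAGA",
}

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_sites(seq, site):
    """Return sorted 0-based start positions of `site` on either strand."""
    seq = seq.upper()
    hits = set()
    # Non-palindromic sites (e.g. MboII) must also be searched as their
    # reverse complement to cover the opposite strand.
    for target in {site, reverse_complement(site)}:
        start = seq.find(target)
        while start != -1:
            hits.add(start)
            start = seq.find(target, start + 1)
    return sorted(hits)


# Hypothetical test sequence (NOT the Dt satellite monomer).
toy_repeat = "GATATCAACAGAATTCTTAAGCTTGGAAGATT"

for enzyme, site in RECOGNITION_SITES.items():
    print(enzyme, find_sites(toy_repeat, site))
```

Applied to a real monomer sequence, an empty list for a given enzyme would indicate that the corresponding limit digest is not expected to release oligomers of the repeat.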

(b) Genomic organization of the EcoRV repetitive sequence

The tandem arrangement of the repetitive sequence was determined by following EcoRV action on genomic DNA with time (Fig. 2). Ladder-like distributions of DNA fragments were generated as digestion progressed. The number of distinguishable ladder rungs decreased with time as the DNA fragments were gradually shortened, with a clear trend towards the monomer length. At moderate digestion intervals, as many as 15 oligomer bands could be distinguished in the corresponding electrophoretic patterns. All restriction fragments appeared to be integer multiples of a basic repeat length, while no fragments of unrelated size were seen. The characteristic ladder pattern obtained, together with the relative abundance of the repeat (see below), argues that the EcoRV repeated sequence is organized as a true satellite DNA. A short ladder containing up to four bands of decreased intensity persisted even at prolonged digestion intervals. This residual pattern is typical for type A restriction of satellite DNA species (Hoerz and Zachau, 1977), indicating partial random loss of EcoRV recognition sites within the tandem repeat. The sequence analysis of the cloned dimers confirmed the presence of mutated sites.
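The ladder described above can be illustrated numerically. The sketch below lists the expected oligomer sizes for a 155-bp monomer and, under the simplifying assumption that each EcoRV site is lost independently with the same probability, the expected share of n-mers in a limit digest. The independence model, the function names and the value of `p_lost` are illustrative assumptions, not quantities measured here.

```python
def ladder_sizes(monomer_bp=155, max_units=15):
    """Sizes (bp) of the first `max_units` rungs of a tandem-repeat ladder."""
    return [monomer_bp * n for n in range(1, max_units + 1)]


def limit_digest_fractions(p_lost, max_units=4):
    """Number fraction of n-mers (n = 1..max_units) in a complete digest.

    Assumes every recognition site is mutated independently with
    probability `p_lost`. A fragment spans n monomers when n - 1
    consecutive sites are lost and the next one is intact, which gives a
    geometric distribution.
    """
    return [(1 - p_lost) * p_lost ** (n - 1) for n in range(1, max_units + 1)]


print(ladder_sizes(max_units=4))  # monomer to tetramer

# Hypothetical site-loss probability, chosen only to show the trend.
for n, fraction in enumerate(limit_digest_fractions(p_lost=0.1), start=1):
    print(f"{n}-mer: {fraction:.4f}")
```

Under this model the band intensity falls steeply with oligomer size, which is qualitatively the behaviour of a residual short ladder of decreasing intensity.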

(c) Sequence comparisons of monomer variants and definition of subfamilies

Cloned satellite DNA monomers were sequenced by the dideoxy chain-termination procedure (Sanger et al., 1977) in both directions. The sequences of 15 monomers obtained from 11 monomeric recombinants and 2 cloned dimers were aligned to establish an unambiguous consensus using the PILEUP computer programme from the GCG package (Genetics Computer Group, Madison, WI, USA) (Fig. 3). The sequences of all analyzed monomers appeared to be largely similar, displaying nt substitutions as the predominant deviation from the consensus, while insertions or deletions were rather scarce. Neighboring monomers from the 2 cloned dimers yielded comparable sequence variation, indicating absence of specific intracluster nt variability. Some alterations seemed to be clearly non-random. A number of clones shared identical nt changes in exactly the same positions, hinting at the spread of new monomer variants rather than series of unrelated mutational events. The sharing of subsets of mutations between variant sequences has been proposed to be a consequence of partial homogenization driven by gene conversion mechanisms (Pages and Roizes, 1984; Strachan et al., 1985; Drouin and Dover, 1990). Several particular mutations were found to be restricted to a group of monomer variants, which increased their reciprocal homology and set them apart from the rest of the clones. For instance, 6 monomer sequences shared identical changes in 17 positions, contrasting with the nt normally occupying these positions in the remaining 9 cloned sequences. From these nt variations it seemed feasible to distinguish two main groups of monomer sequence variants, respectively designated E1 and E2 in Fig. 3. The relationships based on the mutual homologies between monomers, as well as their distribution into two sequence-related groups, are illustrated in the dendrogram of Fig. 4 and in Table I. Pairwise homology comparisons between monomer variants of the two subsets ranged from 15.4 to 29.8% (21.5% average). Intra-group sequence variations were 6.4 to 19.3% within the E1 group (average 12.5%), whereas in E2 they varied from 11.4 to 19.6% (average 14.7%), as indicated in Table I. Although the average intra-group homologies were higher than that between the two groups, some particular intra-group pairs were more dissimilar than some inter-group monomer pairs. This notwithstanding, the presence of diagnostic nt in monomer sequences was considered a determinant criterion to assign individual sequences to each group, as well as to consider them as distinct satellite DNA subfamilies. Consensus sequences (155 bp) for both monomer subfamilies differ in 17 nt, or 11%.

Fig. 2. Time-course of EcoRV digestion and tandem arrangement of the repeat. DNA samples (5 µg) were progressively digested with EcoRV (0.5 u/µg DNA) for various times, with the exception of the complete digest in the rightmost lane (N), which was carried out overnight with a 10-fold ENase concentration. Digestions were stopped by adding Na2EDTA pH 8.0 to 20 mM and chilling on ice. EcoRV DNA fragments were separated by electrophoresis as described in the legend to Fig. 1A. Size-fractionated fragments were blotted to a nylon membrane and probed with a balanced mix of cloned inserts released from recombinant plasmids pDTE24-8 and pDTE24-4, labelled with biotin. Hybridization conditions and chemiluminescent detection were as described in the legend to Fig. 1B. Restriction intervals (min) are indicated at the top.

Fig. 3. The nt sequence of eleven cloned EcoRV monomers and two dimer repeats. Sequences were analyzed using the PILEUP program from the Genetics Computer Group (GCG) and grouped according to their reciprocal homologies. Two groups of monomer variants can be distinguished by means of a set of distinct nt which are considered diagnostic changes. The two derived consensus sequences are shown above (in upper case) and below (in lower case, except for diagnostic nt) their respective groups. Consensus nt were determined by majority rule. Bases showing no variation from the respective consensus nt are omitted. Base gaps in the consensus sequences (black dots) have been arbitrarily introduced to account for nt insertions in individual variants, while particular deletions are marked by zeros. Diagnostic nt for group E2 of cloned monomers have been both capitalized and italicized in the consensus, in the corresponding individual variants, as well as in the particular E1 variants which also contain them. All possible nt at each position of the ten ambiguous sites in both consensus sequences are indicated. Sequences from the pair of monomers 27-9 and 27-9b, as well as from the pair 27-10 and 27-10b, were obtained from two dimer-sized clones. All sequences reported here have been submitted to the EMBL/GenBank databases under accession Nos. X86926-X86938.

Fig. 4. Dendrogram derived from the alignment of EcoRV monomer variant sequences. The diagram was obtained from the analysis of all sequences reported in Fig. 3 using the computer program PILEUP, and presents the clusters of homology relationships as implemented by the programme to derive the final best alignment. The tree indicates that the EcoRV monomer variants can be divided into two subfamilies (E1 and E2) on the basis of a set of shared mutations.

TABLE I
Pairwise homology comparison of monomer variants (pDTE clones; values in %)

Subfamily E1 variants (a): 24-3, 27-6, 24-7, 27-9, 24-4, 24-9, 27-9b, 24-5, 24-11
Subfamily E2 variants (b): 24-10, 27-10b, 24-1, 24-8, 24-6, 27-10

E1 versus E1 (36 pairs): 7.5, 10.8, 9.6, 14.4, 12.1, 6.4, 10.1, 10.3, 8.4, 12.1, 14.5, 13.4, 10.9, 12.1, 11.4, 11.5, 7.7, 10.2, 9.0, 10.3, 17.7, 16.7, 11.6, 15.3, 12.3, 13.5, 16.1, 19.3, 18.2, 14.6, 17.6, 10.8, 13.3, 15.2, 18.4, 8.3

E1 versus E2 (54 pairs): 21.7, 20.1, 18.4, 17.1, 17.0, 16.9, 17.9, 20.5, 15.4, 16.6, 18.2, 22.0, 18.7, 20.7, 19.4, 18.0, 17.7, 23.4, 21.0, 23.6, 22.3, 21.5, 21.9, 23.9, 19.4, 21.9, 20.0, 19.2, 20.3, 22.8, 21.2, 24.4, 21.8, 21.7, 23.3, 24.1, 20.6, 21.4, 21.3, 20.5, 21.5, 23.4, 22.4, 28.4, 26.5, 23.1, 23.4, 29.8, 22.8, 25.3, 27.2, 22.8, 21.9, 25.8

E2 versus E2 (15 pairs): 17.4, 16.8, 13.5, 15.5, 12.8, 11.5, 12.2, 15.8, 13.9, 15.2, 11.4, 17.1, 19.6, 16.6, 15.1

(a) Set of monomer variants of subfamily E1. (b) Set of monomer variants of subfamily E2.

(d) Sequence variations

Diagnostic nt appeared well-represented and defined in each satellite subfamily, except for cloned monomers pDTE24-3 and pDTE27-6, which contained diagnostic nt from both groups. In these two monomers, specific E2 nt stretched from position 81 in variant pDTE24-3 and 104 in pDTE27-6, to position 11 in both cases (shown in italics in Fig. 3), except for positions 115 and 118 in which E1 consensus nt are essentially maintained. This pattern of disruption may have arisen from recent point mutation and/or separate gene conversion events. Although the direction of these changes cannot be established unambiguously, the high frequency of E1 diagnostic nt in variants pDTE24-3 and pDTE27-6 definitely ascribes them to the E1 subfamily.

Among the analyzed monomer variants in group E1, coincident mutations occurred in eight positions (8, 30, 47, 111, 116, 120, 132bis (3'-adjacent to 132) and 136), whereas in E2 they did in six positions (10, 28, 76, 99bis, 107 and 117). Moreover, 18 positions common to both E1 and E2 subsets also shared identical changes (9, 12, 35, 41, 42, 47, 55, 77, 78, 99bis, 106, 112, 124, 125, 141, 154, 155 and 155bis). The occurrence of these shared changes does not seem to bear sufficient identity to allow for establishment of a distinct satellite DNA subgroup. Instead, they were mostly detected as single nt substitutions, randomly scattered with uniform frequency, within as well as between members of both subfamilies.

Gene conversion events have been proposed to involve regions containing clusters of mutations either a few bp long or else stretching along several kb (Dover, 1993). In this regard, the pattern of mutations shared among variants of the Dt satellite DNA described here strongly suggests the existence of two distinct levels of gene conversion occurring simultaneously and, in all likelihood, at different rates. The first probably spreads DNA fragments of a length comparable to the monomer sequence itself, and the second acts on short tracts a few bp long, subsequently yielding patterns of identical point mutations that do not form defined stretches. It is worth mentioning here that singly shared point mutations in monomer variants of the Tenebrio molitor satellite DNA lack any correlation and appear with unrelated frequencies, representing transition stages in the process of sequence homogenization (Plohl et al., 1992). In contrast, stretches of shared mutations have been unambiguously observed in monomer variants of the highly homologous satellite DNA present in the sibling species Tenebrio obscurus (Plohl and Ugarković, 1994b). These examples clearly illustrate a dissimilar contribution of gene conversion events in two closely related insect species.

To assess the degree of variability of the Dt satellite DNA exclusive of processes responsible for spreading new sequence variants, consideration of single mutation events was restricted to non-identical changes. In this respect, the average divergence from the consensus sequence relative to nt substitutions for monomers belonging to the E1 group was 4.5%, whereas that for E2 variants was 5.8%. Contribution of insertion and deletion events was essentially restricted to single nt, displaying average frequencies of 1.1% and 2.0% for E1 and E2, respectively. All alterations seem to be randomly spread among the sequenced monomers. The satellite sequence displays a content of A + T residues of 62%. However, since most of the nt alterations involve A or T residues, a shift to higher estimates would be predictable (Table II). Interestingly, the degree of mutation of the Dt satellite appears to be quite close to that of the mussel Mytilus edulis previously described (Ruiz-Lara et al., 1992).
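The two quantities used throughout the preceding comparison, a majority-rule consensus and the percent difference between two aligned sequences, can be sketched in a few lines. This is an illustrative sketch only and does not reproduce PILEUP: the function names and the toy alignment are hypothetical, gaps are simply treated as a fifth character, and ties are resolved in favour of the first residue encountered.

```python
from collections import Counter


def majority_consensus(aligned):
    """Majority-rule consensus of equal-length, pre-aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        # most_common(1) returns the most frequent residue; on a tie the
        # residue seen first in the column wins.
        consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)


def percent_difference(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * mismatches / len(seq_a)


# Hypothetical 10-nt alignment (NOT data from Fig. 3).
toy_alignment = [
    "GATATCAACA",
    "GATATCAACA",
    "GATTTCAACA",
    "GATATCAGCA",
]
consensus = majority_consensus(toy_alignment)
print(consensus)
print(percent_difference(toy_alignment[2], consensus))

# 17 diagnostic nt over a 155-bp consensus, as stated in the text.
print(round(100 * 17 / 155, 1))
```

The last line reproduces the arithmetic behind the quoted figure: 17 differing positions out of 155 correspond to about 11%.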

About 1/3 of the total altered positions in these two mollusc satellites appear to be contributed by identical multiple changes, suggesting that they are non-random and preferentially contributed by homogenization mechanisms rather than by independent mutational events. The inference is further supported by the non-random distribution of base substitutions observed in the heterochromatic repetitive AluI DNA in the brine shrimp Artemia salina (Landsberger et al., 1992). The single-base mutations, occurring outside the sequence blocks which confer a specific solenoidal geometry to the repetitive DNA, are thought to originate under the selective pressure to preserve the molecular structure of the repetitive DNA.

TABLE II
Analysis of nt changes in the two EcoRV satellite DNA subfamilies

Consensus    Subfamily E1 changes      Subfamily E2 changes
             A    G    C    T          A    G    C    T
A            -    5    2    11         -    3    2    8
G            13   -    2    6          6    -    4    6
C            3    0    -    9          4    1    -    7
T            7    2    3    -          2    1    2    -

Transversions/transitions    1.1 (E1)    1.6 (E2)
C,G→A,T / A,T→C,G            2.6 (E1)    2.9 (E2)
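The two ratios quoted under Table II follow directly from the substitution counts. The sketch below recomputes them; the cell assignment (rows, consensus nt; columns, replacing nt) is an assumed reading of Table II that should be checked against the published table, and the function and variable names are hypothetical.

```python
PURINES = {"A", "G"}


def is_transition(src, dst):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (src in PURINES) == (dst in PURINES)


def substitution_ratios(changes):
    """Return (transversions/transitions, CG->AT / AT->CG) for a count dict.

    `changes` maps (consensus_nt, replacing_nt) to the number of events.
    """
    transitions = sum(n for (s, d), n in changes.items() if is_transition(s, d))
    transversions = sum(changes.values()) - transitions
    gc_to_at = sum(n for (s, d), n in changes.items() if s in "GC" and d in "AT")
    at_to_gc = sum(n for (s, d), n in changes.items() if s in "AT" and d in "GC")
    return transversions / transitions, gc_to_at / at_to_gc


# Counts as read from Table II (assumed cell assignment).
E1_CHANGES = {
    ("A", "G"): 5, ("A", "C"): 2, ("A", "T"): 11,
    ("G", "A"): 13, ("G", "C"): 2, ("G", "T"): 6,
    ("C", "A"): 3, ("C", "G"): 0, ("C", "T"): 9,
    ("T", "A"): 7, ("T", "G"): 2, ("T", "C"): 3,
}
E2_CHANGES = {
    ("A", "G"): 3, ("A", "C"): 2, ("A", "T"): 8,
    ("G", "A"): 6, ("G", "C"): 4, ("G", "T"): 6,
    ("C", "A"): 4, ("C", "G"): 1, ("C", "T"): 7,
    ("T", "A"): 2, ("T", "G"): 1, ("T", "C"): 2,
}

for name, table in (("E1", E1_CHANGES), ("E2", E2_CHANGES)):
    tv_ts, bias = substitution_ratios(table)
    print(name, round(tv_ts, 1), round(bias, 1))
```

With this reading, both subfamilies show a ratio of C,G→A,T to A,T→C,G changes well above 1, in line with the predicted shift towards a higher A + T content.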

this mechanism is capable to replace subfamilies in the genome (Kass et al., 1995). In this respect, it seems reasonable to consider the possibility that members of the EcoRV satellite subfamilies in Dt are able to replace each other through a process of gene conversion entailing the homogenization of the satellite sequence at the monomer level. In fact, monomer variants can be likewise partially homogenized by gene conversion events spreading short stretches of DNA few bp in length. Although no data concerning the overall organization and chromosomal localization of these satellite subfamilies are presently available for confirmation, homogenization seems to affect all repetitive elements to the same extent.

(e) Evolutionary considerations

(f) Redundancy analysis and genomic content of the

The average number of variant nt per monomer sequence within each group comes out similar to the number of diagnostic nt distinguishing both subfamilies (Table I). Consequently, the actual formation of subfamilies may be explained by means of mutational events generating divergent repetitive units, in turn followed by amplification and spread of a particular monomer variant brought about by a bias in turnover mechanisms. The uniform distribution of diagnostic nt along the monomers, as well as their intact presence in almost all of them strongly points to gene conversion as the responsible mechanism which might have acted on a tract of sequence the length of the satellite monomer. Irrespective of the mechanism involved in the amplification, it can be argued that the formation of subfamilies was followed by similar mutational processes which keep operating alike in both subfamilies. Since the pattern of shared, nondiagnostic mutations does not support any further subgrouping of the monomer variants analyzed, amplification of the monomer variant and subdivision into existing subfamilies might have occurred as a single abrupt event. It has been recently shown that elements of the primate Alu repetitive family can be exchanged between subfamilies through gene conversions, thus demonstrating that

EcoR V repeats The nt sequences of the EcoRV satellite monomers were analyzed using the MicroGenie computer software (Beckman, Palo Alto, CA, USA) to retrieve any potential subrepeat structures within the satellite DNA. The search for direct repeats failed to yield any prominent internal substructures but for a small number of short perfect elements (3) containing several consecutive A or T residues, and a few imperfect repeats 6 bp in length. The sequence 5'-GATTTGAACA was detectable as an imperfect repeat (20-30% variation) in three different locations along the consensus sequences. A few shorter tracts scattered on the DNA sequence could be recognized as partial fragments of this sequence. These repeated motifs might well be relics of an ancient substructure once present in the satellite and which might have changed during evolution to such an extent as to become presently unrecognizable. It has been shown that direct repeated elements in eukaryotic genes are remarkably prone to variations, often involving irreversible loss of the motif itself (Efstratiadis et al., 1980; Fowler et al., 1985). The number of short inverted repeats was found also to be quite low (5). Most of them are presumed unable to adopt thermodinamically stable secondary conforma-

163 tions due to the presence of mismatches. Although absence of inverted repeated motifs has been reported in another mollusc species (Ruiz-Lara et al., 1992), this cannot be considered a generally occurring feature, since inverted repeats susceptible to form cruciforms have been described in DNA satellites from other invertebrates such as the cryptic satellites of the hermit crab (Fowler and Skinner, 1985). The relative genomic abundance of the two EcoRV repetitive sequences was determined from dot-blot hybridizations (Fig. 5). Two monomer clones (pDTE24-4 and pDTE24-8) were selected as representatives of each satellite subfamily and spot-blotted in increasing amounts onto positively charged nylon membranes, together with graded amounts of total sperm DNA. The membranes were then successively probed with the inserts released from both cloned monomers, labelled with

[Fig. 5 image: dot blots, panels A and B. Rows: pDTE24-4, D. trunculus, pDTE24-8; genomic DNA and plasmid DNA loaded in graded amounts (ng).]

Fig. 5. Estimation of the repeat copy number of EcoRV satellite subfamilies. Graded amounts of genomic DNA and cloned monomer variants pDTE24-4 and pDTE24-8, as representatives of the E1 and E2 subfamilies, respectively, were spot-blotted in duplicate onto nylon membranes and independently probed with the biotinylated monomer inserts cut out from the same clones. (A) Autoradiograph of the membrane hybridized to the pDTE24-4 monomer insert from subfamily E1. (B) Autoradiograph of the blot probed with the pDTE24-8 monomer insert from subfamily E2. Hybridization solutions lacked formamide, while the temperature of hybridization and filter washes was set at 72°C. Both screenings were performed under high-stringency conditions to minimize cross-hybridization contributed by nt tracts similarly represented in both subfamily sequences (see Table I). For other conditions, see the legend to Fig. 1.

biotin. After exposure to radiographic film, the chemiluminescent signals were processed and quantified using a computer-assisted laser densitometer loaded with the ImageQuant program (Molecular Dynamics). The genomic contents found are 0.23% for the E1 subfamily repeat and 0.09% for that of subfamily E2. Since the size of the haploid DNA complement of Dt has been estimated as 1.4 × 10^9 bp (Hinegardner, 1974), these values correspond to 20 000 copies per genome of the E1 satellite subfamily and to 7600 repeats of the E2 subfamily.
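The conversion from genomic fraction to copy number can be sketched as follows. This is only an illustrative calculation, not the authors' procedure: the function name `copies_per_haploid_genome` is hypothetical, and the inputs are the rounded figures quoted in the text (genome size 1.4 × 10^9 bp, monomer length 155 bp, fractions 0.23% and 0.09%). With these rounded inputs the estimates come out at roughly 20 800 and 8 100 copies, of the same order as the reported 20 000 and 7600; the difference presumably reflects rounding of the published percentages, which is an assumption here.

```python
# Illustrative sketch: copy number = (genome size x genomic fraction) / monomer length.
# All constants are the rounded values quoted in the text; names are hypothetical.

GENOME_SIZE_BP = 1.4e9   # haploid DNA complement of Dt (Hinegardner, 1974)
MONOMER_BP = 155         # length of the consensus monomer


def copies_per_haploid_genome(fraction_percent,
                              genome_size_bp=GENOME_SIZE_BP,
                              monomer_bp=MONOMER_BP):
    """Return the estimated number of monomer copies per haploid genome."""
    satellite_bp = genome_size_bp * fraction_percent / 100.0  # bp occupied by the satellite
    return satellite_bp / monomer_bp


if __name__ == "__main__":
    for subfamily, percent in (("E1", 0.23), ("E2", 0.09)):
        estimate = copies_per_haploid_genome(percent)
        print(f"{subfamily}: about {estimate:.0f} copies per haploid genome")
```

Running the same function on the combined fraction (0.23% + 0.09%) gives the total number of EcoRV monomers under the same assumptions.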

(g) Conclusions

(1) A distinct repeated DNA sequence in ripe sperm of the bivalve mollusc Dt has been detected by EcoRV digestion and cloned. The repetitive units are tandemly arrayed, thus conforming to the characteristic organization of DNA satellites, and account for 0.32% of the clam genome.

(2) Sequence analysis of all cloned repeats has revealed two distinct groups of monomers, representing two satellite subfamilies, which can be differentiated by the presence of diagnostic nt. Consensus sequences derived from both subfamilies are 155 bp long and differ by 17 nt (11% divergence).

(3) The degree of internal redundancy of the reiterated sequences is low, since only a few imperfect direct repeats are present; these are deemed remnants of a primordial substructure lost during the course of evolution. Inverted repeats are also quite scarce and likely unable to form cruciforms.

(4) Satellite subfamilies are thought to emerge from an accidental or sudden spread of monomer variant(s) of the satellite DNA as a consequence of a bias in the turnover mechanism. Subsequently, the sequence changes and is concomitantly subjected to homogenization by events that spread new alterations throughout and between subfamilies. Amplification of the original monomer variant and emergence of subfamilies represent a transient stage in the overall process of sequence homogenization.

(5) The substantial number of shared nt changes detected in the analyzed set of monomer variants indicates a relatively high rate of spreading as opposed to unrelated mutational events. Shared mutations occur either as single changes or in the form of stretches of alterations. It is proposed that this pattern emerged as a result of gene conversion mechanisms acting at two different levels of spreading: first, the spread of sequence segments of a length similar to that of the repeated monomers and, second, the spread of short sequence tracts a few nt long. Gene conversion acting at two levels should be sufficient to explain the evolution, divergence and homogenization of monomer variants in this satellite DNA. The spreading of short sequence fragments may have occurred at a higher rate, while conversion of entire monomer variants is likely to have been occasional.

ACKNOWLEDGEMENTS

We are greatly indebted to Dr. Đ. Ugarković for a thoughtful and critical review of the manuscript. This work was supported in part by grants to L.C. from the Spanish Dirección General de Investigación Científica y Técnica (PB94-0042), the NATO Scientific Affairs Division (CRG93-1146) and the Generalitat de Catalunya (GRQ93-8024). M.P. was the recipient of a postdoctoral fellowship for Foreign Scholars from the Ministry of Education and Science of Spain.

REFERENCES

Al-Sabti, K.: Chromosomal investigation of the mussel (Mytilus galloprovincialis) from the Adriatic sea. Cytobios 58 (1989) 149-153.
Badaracco, G., Baratelli, L., Ginelli, E., Meneveri, R., Plevani, P., Valsasnini, P. and Barigozzi, C.: Variations in repetitive DNA and heterochromatin in the genus Artemia. Chromosoma 95 (1987) 71-75.
Dover, G.A.: Molecular drive: a cohesive model of species evolution. Nature 299 (1982) 111-117.
Dover, G.A.: Molecular drive in multigene families: how biological novelties arise, spread and are assimilated. Trends Genet. 2 (1986) 159-165.
Dover, G.A.: Evolution of genetic redundancy for advanced players. Curr. Opin. Genet. Dev. 3 (1993) 902-910.
Drouin, G. and Dover, G.A.: Independent gene evolution in the potato actin gene family demonstrated by phylogenetic procedures for resolving gene conversions and the phylogeny of angiosperm actin genes. J. Mol. Evol. 31 (1990) 132-150.
Efstratiadis, A., Posakony, J.W., Maniatis, T., Lawn, R.M., O'Connell, C., Spritz, R.A., DeRiel, J.K., Forget, B.G., Weissman, S.M., Slightom, J.L., Blechl, A.E., Smithies, O., Baralle, F.E., Shoulders, C.C. and Proudfoot, N.J.: The structure and evolution of the human beta-globin gene family. Cell 21 (1980) 653-668.
Fowler, R.F. and Skinner, D.M.: Cryptic satellites rich in inverted repeats comprise 30% of the genome of a hermit crab. J. Biol. Chem. 260 (1985) 1296-1303.
Fowler, R.F., Bonnewell, V., Spann, M.S. and Skinner, D.M.: Sequences of three closely related variants of a complex satellite DNA diverge at specific domains. J. Biol. Chem. 260 (1985) 8964-8972.
Hinegardner, R.: Cellular DNA content of the mollusca. Comp. Biochem. Physiol. 47A (1974) 447-460.
Hoerz, W. and Zachau, H.G.: Characterization of distinct segments in mouse satellite DNA by restriction nucleases. Eur. J. Biochem. 73 (1977) 383-392.
Kass, D.H., Batzer, M.A. and Deininger, P.L.: Gene conversion as a secondary mechanism of short interspersed element (SINE) evolution. Mol. Cell. Biol. 15 (1995) 19-25.
Landsberger, N., Cancelli, S., Carettoni, D., Barigozzi, C. and Badaracco, G.: Nucleotide variation and molecular structure of the heterochromatic repetitive AluI DNA in the brine shrimp Artemia salina. J. Mol. Evol. 35 (1992) 486-491.
Lohe, A. and Roberts, A.: Evolution of satellite DNA sequences in Drosophila. In: Verma, R.S. (Ed.), Heterochromatin: Molecular and Structural Aspects. Cambridge University Press, Cambridge, 1988, pp. 148-186.
Martínez-Lage, A., González-Tizón, A. and Méndez, J.: Characterization of different chromatin types in Mytilus galloprovincialis L. after C-banding, fluorochrome and restriction endonuclease treatments. Heredity 72 (1994) 242-249.
Miklos, G.L.G.: Localized highly repetitive DNA sequences in vertebrate and invertebrate genomes. In: MacIntyre, R.J. (Ed.), Molecular Evolutionary Genetics. Plenum, New York, NY, 1985, pp. 241-345.
Pages, M.J. and Roizes, G.P.: Nature and organization of the sequence variations in the long-range periodicity calf satellite DNA I. J. Mol. Biol. 173 (1984) 143-157.
Plohl, M., Borštnik, B., Lucijanić-Justić, V. and Ugarković, Đ.: Evidence for random distribution of sequence variants in Tenebrio molitor satellite DNA. Genet. Res. 60 (1992) 7-13.
Plohl, M. and Ugarković, Đ.: Analysis of divergence of Alphitobius diaperinus satellite DNA - roles of recombination, replication slippage and gene conversion. Mol. Gen. Genet. 242 (1994a) 297-304.
Plohl, M. and Ugarković, Đ.: Characterization of two abundant satellite DNAs from the mealworm Tenebrio obscurus. J. Mol. Evol. 39 (1994b) 489-495.
Ruiz-Lara, S., Prats, E., Sainz, J. and Cornudella, L.: Cloning and characterization of a highly conserved satellite DNA from the mollusc Mytilus edulis. Gene 117 (1992) 237-242.
Sainz, J., Azorín, F. and Cornudella, L.: Detection and molecular cloning of highly repeated DNA in the sea cucumber sperm. Gene 80 (1989) 57-64.
Sainz, J. and Cornudella, L.: Preservation of a complex satellite DNA in two species of echinoderms. Nucleic Acids Res. 18 (1990) 885-890.
Sambrook, J., Fritsch, E.F. and Maniatis, T.: Molecular Cloning. A Laboratory Manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989.
Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5476.
Skinner, D.M.: Satellite DNAs. BioScience 27 (1977) 790-796.
Smith, G.P.: Evolution of repeated DNA sequences by unequal crossover. Science 191 (1976) 528-535.
Strachan, T., Webb, D. and Dover, G.A.: Transition stages of molecular drive in multiple-copy DNA families in Drosophila. EMBO J. 4 (1985) 1701-1708.
Stringfellow, L.A., Fowler, R.F., LaMarca, M.E. and Skinner, D.M.: Demonstration of remarkable sequence divergence in variants of a complex satellite DNA by molecular cloning. Gene 38 (1985) 145-152.
Tautz, D., Trick, M. and Dover, G.A.: Cryptic simplicity in DNA is a major source of genetic variation. Nature 322 (1986) 652-656.
Varadaraj, K. and Skinner, D.M.: Cytoplasmic localization of transcripts of a complex G + C rich crab satellite DNA. Chromosoma 103 (1994) 423-431.