© INSTITUT PASTEUR/ELSEVIER Paris 1991
Res. Microbiol. 1991, 142, 217-222
The BIME family of bacterial highly repetitive sequences

E. Gilson, W. Saurin, D. Perrin, S. Bachellier and M. Hofnung

Unité de Programmation Moléculaire et Toxicologie Génétique, CNRS UA271, INSERM U163, Institut Pasteur, 75724 Paris Cedex 15
SUMMARY

Palindromic units (PU or REP) were initially defined as a DNA sequence of 40 nucleotides which is highly repeated in the genome of several enterobacteria and found in clusters of up to six copies. It now appears that PU belong to a larger repeated DNA element, of up to 300 nucleotides, called BIME for bacterial interspersed mosaic element. BIME is a mosaic combination of ten small DNA motifs, including the PU sequence. A central question concerning BIME is whether they play a critical role within the cell. BIME exhibit only limited effects on local gene expression; it seems unlikely that these weak effects alone can account for the high BIME sequence homogeneity. It has recently been shown that DNA gyrase and DNA polymerase I are able to specifically recognize BIME DNA in vitro. These findings suggest that BIME could play a role in the functional organization of the bacterial nucleoid. Hypotheses on their origin and evolution are discussed.
Key-words: BIME, Palindromic unit, Nucleoid, Evolution, Repetitive sequences.
Historical account of the repeats
A palindromic unit (PU) was originally defined as a DNA element present several times in four intergenic regions of bacterial operons (Higgins et al., 1982). Two years later, a computer search revealed that a large number of intergenic sequences in Escherichia coli and in Salmonella typhimurium were homologous to PU DNA, so that PU amounted to almost 1% of the genomic DNA (Gilson et al., 1984, 1987a). A number of laboratories kept the name PU family for these sequences (Gilson et al., 1984; Merino and Bolivar, 1989), while others called them REP for repetitive extragenic palindromes (Stern et al., 1984; Higgins et al., 1988).

In 1987, a consensus sequence of 40 nucleotides was determined from 118 different PU (Gilson et al., 1987a). The PU is not a perfect palindrome and thus it can be oriented (symbolized by a triangle in figure 1). PU are found in clusters of one to six copies. In a recent study, we showed that all identified PU clusters contain, in addition to the PU, a limited number of other conserved DNA motifs, and that PU clusters can be exhaustively described by a specific combination of ten motifs: Y, Z1, Z2, A, B, S, L, s, l and r (Gilson et al., submitted) (table I). The motifs Y, Z1 and Z2 correspond to three homogeneous subsets of the PU sequences (table I). Since PU clusters are mosaic combinations of these and only these ten motifs, we called them BIME for bacterial interspersed mosaic element (Gilson et al., submitted). Our present total collection of BIME amounts to 105, originating from about 20% of the total E. coli genomic sequence; thus, it can be extrapolated that this genome
contains about 500 BIME. These 105 BIME present a great diversity of motif combinations: from BIME with only one PU to complex structures such as the one observed in the araA-araD intergenic region ("SYsZ2SYsZ2SYsZ2SYs").

Fig. 1. Structure of PU and BIME. The PU is symbolized by a grey rectangle enclosing a triangle which indicates its orientation. The same PU symbol is used in the other figures. A schematic representation of an extragenic BIME with three PU is indicated. Two of the three rules which account for the diversity of the known BIME motif combinations (see text) are evoked in this figure as follows. The "topological" rule: the different segments, defined by their topological location with respect to PU, are indicated below the BIME. The "alternance" rule: the three successive copies of PU alternate both in orientation and in type of motif, i.e. on the same line one can have either Z1 or Z2 followed by Y, followed by either Z1 or Z2 and so forth.

BIME = bacterial interspersed mosaic element; PU = palindromic unit.

The following three rules account for the organization of all known BIME (Gilson et al., submitted).

-- The "topological" rule

Figure 1 shows how the various motifs are located in well defined positions with respect to the PU. The left external segments are at the left end of a single PU; they can contain either the A or the B motifs. The internal segments of a BIME are located between two PU; we distinguished the right internal segments, which contain either the S or the L motifs, and the left internal segments, which contain either the s or the l or the r motifs.

-- The "alternance" rule

When successive copies of PU are present within a BIME, they strictly alternate both in orientation and in type of motif (Y or either Z1 or Z2).

-- The "uniformity" rule

Within a BIME, all the occurrences of the same type of
segment belong to the same motif subset: all the Z are either Z1 or Z2; all the right internal segments are either S or L; all the left internal segments are either s or l or r. Moreover, the homogeneity of any motif is higher between repeats of this motif belonging to the same BIME than between repeats belonging to different BIME.

BIME localization and possible origin
A total of 102 out of the 105 BIME are located in an extragenic position. If we estimate that the E. coli genome is composed of 3,000 genes, about one extragenic region in six contains a BIME; 85 BIME have been positioned on the E. coli chromosome map (fig. 2). Both their distribution along the chromosome and their sequence orientation relative to oriC appear to be random (fig. 2).

How can the great diversity of BIME structures be generated? The three rules presented above can be accounted for by considering that BIME containing more than two PU are direct repetitions of two convergent PU; this repeated unit is called the "doublet". We proposed that a special class of BIME formed by two PU is able to generate all the known BIME. Because a large fraction of the BIME with two convergent PU have an identical structure ("AYLZ1B"), we assume that they constitute a family of "generating doublets". The generation of all BIME could be ensured by these "generating doublets" through transposition and through sequence modifications such as localized amplification, deletion and motif conversion (fig. 3). A more detailed
REP = repetitive extragenic palindromes.
Table I. Consensus sequences of the E. coli BIME motifs.

Segment         Motif  Consensus (a)                              Occurrences in 105 BIME (b)  Homogeneity (c) (%)
PU              Y      ABDGCCGGATGCGGCGYPAACGCCTTA FCCGGCCTACP    112                          80
                Z1     ABTGCCTGATGCGCTACGCTTATCAGGCCTACP          43                           85
                Z2     ABTGCCTGATGCGACGCTDGCGCGTCTTATCAGGCCTACP   50                           85
Left external   A      AEEEAEAAZEECTTDPYCEF                       18                           58
                B      TTDGCGTTTGYCATCAPBYW                       20                           60
Right internal  S      NNNNNNNNNCPACT                             34                           50
                L      AAAGCATGCAAATTCAATATATTGCAGNNATCP          33                           80
Left internal   s      C                                          21
                l      CGFBGCAC                                   16
                r      A"B                                        7

Homogeneity values listed for the left internal motifs: 60, 68.

(a) The consensus rules are in Gilson et al., in preparation. N = A, T, G or C; W = T, A or C; Z = C, A or G; P = A or G; Y = T or C; B = A or T; D = G or T; E = A or C; F = G or C.
(b) The list of BIME is given in Gilson et al., in preparation.
(c) The homogeneity is defined as $H = \left[\sum_{i=1}^{I} 100\,N_i/(T - A_i)\right]/I$, where $N_i$ is the number of the bases present in the consensus at position $i$, $A_i$ is the number of gaps at position $i$ and $T$ the total number of sequences.
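The homogeneity defined in footnote (c) can be illustrated with a short Python sketch. It is not the authors' program: the function name `homogeneity`, the gap character `-`, and the choice to count a base as "present in the consensus" when it is allowed by the degenerate symbol at that position are all assumptions made for illustration. The divisor is taken to be the number of consensus positions, which is our reading of the formula, and a column made only of gaps is assumed to contribute nothing.

```python
# Degenerate-base symbols as listed in footnote (a) of Table I.
# This is the paper's own code, not the IUPAC nomenclature.
DEGENERATE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ATGC", "W": "TAC", "Z": "CAG", "P": "AG", "Y": "TC",
    "B": "AT", "D": "GT", "E": "AC", "F": "GC",
}
GAP = "-"  # assumed gap character; the source does not specify one


def homogeneity(consensus, aligned):
    """Mean per-position percentage of bases that agree with the consensus.

    consensus: string of symbols from DEGENERATE.
    aligned:   list of equal-length strings (A, C, G, T or GAP),
               already aligned to the consensus (hypothetical input format).
    """
    if any(len(seq) != len(consensus) for seq in aligned):
        raise ValueError("all aligned sequences must match the consensus length")
    total_sequences = len(aligned)  # T in footnote (c)
    score = 0.0
    for i, symbol in enumerate(consensus):
        column = [seq[i] for seq in aligned]
        gaps = column.count(GAP)  # A_i
        allowed = DEGENERATE[symbol]
        # N_i: bases at position i that the consensus symbol allows
        matches = sum(1 for base in column if base != GAP and base in allowed)
        if total_sequences > gaps:  # assumption: an all-gap column adds 0
            score += 100.0 * matches / (total_sequences - gaps)
    return score / len(consensus)


# Toy alignment (invented data, not taken from the BIME collection).
example = homogeneity("APT", ["AAT", "AGT", "ACT", "A-T"])
print(round(example, 1))
```

In this toy case the first and third positions agree in every sequence, while at the second position two of the three ungapped bases are allowed by P (A or G), so the score is the mean of 100, 66.7 and 100.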
Fig. 2. BIME location and orientation on the E. coli chromosome. We assigned a + orientation to BIME which contain either a Z in the same orientation as the direction of replication, or a Y in the opposite orientation to the direction of replication. The direction of replication is given by the two arrows located on each side of oriC. The BIME orientation was determined using the orientation of transcription relative to replication (Brewer, 1990). When BIME are in + orientation, the flags point in the same direction as replication, and inversely. When orientation is not known, the BIME is represented as a circle. The known sequences of the E. coli chromosome are represented by black bars when their size is more than 2,000 bp (Kröger et al., 1990). Chromosome size indicated on the map: 4.7 × 10^6 bp.
Fig. 3. BIME diversity from a "generating doublet". A "generating doublet" (see text), schematically represented at the top of the figure, can undergo either a transposition event, or localized amplification leading to BIME with more than two PU, or a motif conversion event leading to modified motifs within a segment (for example an L motif is converted to an S motif). During the localized amplification process, the joint segments between the generating doublet repeats constitute new sequences which correspond to the left internal segments (Gilson et al., submitted). Labels in the figure: "generating doublet" (A Y L Z1 B); transposition; localized amplification; motif conversion. RIS = right internal segment; LIS = left internal segment.
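The argument that direct repetition of a doublet reproduces the "alternance" and "uniformity" rules can be checked mechanically. The Python sketch below is only an illustration under stated assumptions. The helper names (`tokenize`, `obeys_alternance`, `obeys_uniformity`, `amplify`) are hypothetical. PU orientation is not modelled, so only the alternation in motif type is tested. The amplified unit is taken to be the doublet core Y-L-Z1 joined by a left internal motif, which is our reading of the figure rather than a layout given in the text.

```python
import re

PU_Z = {"Z1", "Z2"}              # the two Z subsets of PU
PU = {"Y"} | PU_Z                # all PU motif types
RIGHT_INTERNAL = {"S", "L"}
LEFT_INTERNAL = {"s", "l", "r"}


def tokenize(combo):
    """Split a motif string such as 'AYLZ1B' into a list of motif names."""
    tokens = re.findall(r"Z[12]|[YABSLslr]", combo)
    if "".join(tokens) != combo:
        raise ValueError(f"unrecognized motif in {combo!r}")
    return tokens


def obeys_alternance(motifs):
    """Successive PU must alternate in type (Y versus Z1/Z2)."""
    kinds = ["Y" if m == "Y" else "Z" for m in motifs if m in PU]
    return all(a != b for a, b in zip(kinds, kinds[1:]))


def obeys_uniformity(motifs):
    """Within one BIME, each segment type uses a single motif subset."""
    return all(
        len({m for m in motifs if m in family}) <= 1
        for family in (PU_Z, RIGHT_INTERNAL, LEFT_INTERNAL)
    )


def amplify(core, joint, copies):
    """Repeat a doublet core, inserting a joint segment between copies.

    Assumption: the joint is a left internal motif, as suggested by the
    caption of Fig. 3; external segments are left out of the model.
    """
    out = list(core)
    for _ in range(copies - 1):
        out.append(joint)
        out.extend(core)
    return out


doublet = tokenize("AYLZ1B")                      # generating doublet
amplified = amplify(["Y", "L", "Z1"], "s", 3)     # three copies of the core
print("".join(amplified), obeys_alternance(amplified), obeys_uniformity(amplified))
```

Because every copy of the core is identical, any number of repeats satisfies both checks, whereas a hand-built combination mixing Z1 with Z2, or S with L, fails the uniformity check.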
argumentation of this model is presented elsewhere (Gilson et al., submitted).

What are BIME doing in the cell?

A central question concerning BIME is whether they play an essential role within the cell. Two main approaches have been developed to study BIME function: effects of BIME on local gene expression and biochemical characterization of BIME-protein interactions.

Limited effects on local gene expression

From studies on their effects on local gene expression, BIME have been implicated in transcription termination (Gilson et al., 1986a), in mRNA stabilization (Newbury et al., 1987a,b) and in control of translation (Stern et al., 1988). However, even if some BIME are able to play a role in these processes, it seems unlikely that any of them alone can account for the high BIME sequence homogeneity, for a number of reasons presented elsewhere (Gilson et al., 1987a). Furthermore, BIME are not at identical chromosomal positions in closely related bacteria, for example in E. coli and S. typhimurium (Gilson et al., 1987b; Bachellier et al., in preparation), and thus they do not appear to be essential for the accurate expression of a given gene (or group of genes).

Specific binding sites for DNA gyrase and DNA polymerase I

The observation that BIME specifically bind nucleoid-associated proteins (Gilson et al., 1986b) could provide a plausible cause for BIME sequence homogeneity. Recently, it has been shown that DNA gyrase (Yang and Ames, 1988) and DNA polymerase I (Gilson et al., 1990a) are able to specifically recognize BIME DNA. In both cases, specific binding has been shown to occur in vitro but, to our knowledge, in vivo data are still lacking. In both cases, the interactions occurred with the PU motifs present in BIME.

Purified DNA gyrase binds specifically to PU (Yang and Ames, 1988; Gilson, unpublished results). The histone-like protein HU appears to modify the properties of the DNA gyrase-BIME interaction. In vitro, the presence of HU stimulates the binding of gyrase to BIME and inhibits gyrase-mediated DNA cleavage (Yang and Ames, 1990). As a consequence of these specific interactions, it can be asked whether the presence of BIME DNA modifies the supercoiling activity of gyrase. In vitro experiments show that the presence of HU increased the efficiency of supercoiling of a relaxed plasmid DNA containing a BIME sequence as compared with a plasmid DNA devoid of BIME sequences (Yang and Ames, 1990).
Starting from a crude E. coli extract, two activities which specifically protect a PU against digestion with exonuclease III were purified. One of these activities was due to Pol I. This interaction requires the presence of PU. It was confirmed and analysed by native gel electrophoresis and DNase footprinting experiments. The other activity is less well characterized but has been shown to be devoid of DNA gyrase (Gilson et al., 1990a).

This finding was the first evidence that Pol I is able to bind intact duplex DNA. It appears that Pol I is able to perform two types of DNA binding: the first is its "classical" capacity to bind DNA ends and nicks, and the second is its capacity to recognize duplex DNA in a sequence-dependent manner. On linear duplex DNA, it appears that Pol I first binds DNA ends and then duplex DNA, preferentially at PU sites (Gilson et al., 1990a).

A role in bacterial chromatin functional organization?

The multiple protein binding capacities of BIME (see above) suggest that these sequences can be involved in the formation of various types of nucleoprotein complexes. These complexes could have structural properties (for example by anchoring distant segments of the DNA to form an independent topological domain) or functional properties (for example by facilitating the entry on the DNA of proteins, like gyrase or DNA Pol I, or by modifying the local concentration of these proteins).

Another possible effect of the BIME-protein interactions is that they play a role in BIME DNA formation. On the one hand, since the topoisomerase activity is very similar to the activity catalysed by some site-specific recombinases (Wang et al., 1990), it is tempting to speculate that the specific BIME-gyrase interaction may be involved in some processes governing the genesis of certain BIME (for example, during the transposition of the "generating doublet"). On the other hand, the localized amplification events observed in certain BIME could be explained by a preferred entry, pausing or slippage site for Pol I in these DNA sequences.

BIME are species-specific sequences

If BIME constitute an essential genetic element for the functional organization of the nucleoid (see above), it is reasonable to predict that BIME exist in bacteria other than E. coli and S. typhimurium. By searching sequence databases and by hybridization experiments, we found BIME only in enterobacteria closely related to E. coli (Gilson et al., 1990b; Bachellier et al., in preparation). Within these bacteria, BIME motifs exhibited sequence species-specificity; for example, the Z1 consensus in E. coli and in S. typhimurium differs at two positions. Up to now, we have been able to define three species-specific groups of BIME: one common to E. coli and Shigella, one in Salmonella and in Citrobacter, and one in Klebsiella (Gilson et al., submitted; Bachellier et al., in preparation). Whether other groups exist in more distantly related bacteria is still an open question. Such new groups of BIME are currently being searched for by sequencing a test set of intergenic regions.

Parasitic element or essential DNA?

Two points of view, presented below as a series of statements, are still compatible with all the available data on BIME.

BIME are parasitic DNA elements

BIME are some kind of prokaryotic selfish DNA (Doolittle and Sapienza, 1980).
Formation. - They are generated by a series of mechanisms involving transposition or retrotransposition (gyrase, Pol I, other proteins?). Their diversity results from localized sequence modifications (amplification, deletion and conversion).
Bacterial distribution. - They are present in a small subset of Enterobacteria closely related to E. coli.
Sequence homogeneity. - Their homogeneity is maintained by efficient DNA turnover mechanisms (for example, RNA-mediated gene conversion or recombination).
Function. - They can sometimes be recruited by the cell to achieve a local function in gene expression (for example, mRNA stabilization or transcription termination).

BIME are essential DNA elements

Formation. - As above.
Bacterial distribution. - BIME are widely distributed amongst the bacterial world in a series of species-specific groups.
Sequence homogeneity. - BIME are conserved because of an important role played by them.
Function. - BIME are involved in an essential role for the cell. This role can be related to the functional and structural organization of the bacterial chromatin.

Further results are needed to decide if one or a combination of these points of view holds.
Acknowledgements

This work was supported by grants from the Fondation pour la Recherche Médicale, the Ligue Nationale contre le Cancer, and the Association pour la Recherche sur le Cancer.

References

Brewer, B.J. (1990), Replication and the transcriptional organization of the E. coli chromosome, in "The bacterial chromosome" (K. Drlica & M. Riley) (pp. 61-83). American Society for Microbiology, Washington, D.C.
Doolittle, W.F. & Sapienza, C. (1980), Selfish genes, the phenotype paradigm and genome evolution. Nature (Lond.), 284, 601-603.
Gilson, E., Clément, J.M., Brutlag, D. & Hofnung, M. (1984), A family of dispersed repetitive extragenic palindromic DNA sequences in E. coli. EMBO J., 3, 1417-1421.
Gilson, E., Rousset, J.P., Clément, J.M. & Hofnung, M. (1986a), A subfamily of E. coli palindromic units implicated in transcription termination? Ann. Inst. Pasteur/Microbiol., 137B, 259-270.
Gilson, E., Perrin, D., Clément, J.M., Szmelcman, S., Dassa, E. & Hofnung, M. (1986b), Palindromic units from E. coli as binding sites for a chromoid-associated protein. FEBS Letters, 206, 323-328.
Gilson, E., Clément, J.M., Perrin, D. & Hofnung, M. (1987a), Palindromic units: a case of highly repetitive DNA sequences in bacteria. Trends Genet., 3, 226-230.
Gilson, E., Perrin, D., Saurin, W. & Hofnung, M. (1987b), Species specificity of bacterial palindromic units. J. mol. Evol., 25, 371-373.
Gilson, E., Perrin, D. & Hofnung, M. (1990a), DNA polymerase I and a protein complex bind specifically to E. coli palindromic unit highly repetitive DNA: implications for bacterial chromosome organization. Nucl. Acids Res., 18, 3941-3952.
Gilson, E., Bachellier, S., Perrin, S., Perrin, D., Grimont, P.A.D., Grimont, F. & Hofnung, M. (1990b), Palindromic unit highly repetitive DNA sequences exhibit species specificity within Enterobacteriaceae. Res. Microbiol., 141, 1103-1116.
Higgins, C.F., Ames, G.F.-L., Barnes, W.M., Clément, J.M. & Hofnung, M. (1982), A novel intercistronic regulatory element of prokaryotic operons. Nature (Lond.), 298, 760-762.
Higgins, C.F., McLaren, R.S. & Newbury, S. (1988), Repetitive extragenic palindromic sequences, mRNA stability and gene expression: evolution by gene conversion? - a review. Gene, 72, 3-14.
Kröger, M., Wahl, R. & Rice, P. (1990), Compilation of DNA sequences of E. coli (update 1990). Nucl. Acids Res., 18, 2549-2552.
Merino, E. & Bolivar, F. (1989), The ribonucleoside diphosphate reductase gene (nrdA) of E. coli carries a repetitive extragenic palindromic (REP) sequence in its 3' structural terminus. Mol. Microbiol., 3, 839-841.
Newbury, S.F., Smith, N.H., Robinson, E.C., Hiles, I.D. & Higgins, C.F. (1987a), Stabilization of transcriptionally active mRNA by prokaryotic REP sequences. Cell, 48, 297-310.
Newbury, S.F., Smith, N.H. & Higgins, C.F. (1987b), Differential mRNA stability controls gene expression within a polycistronic operon. Cell, 51, 1131-1143.
Stern, M.J., Ferro-Luzzi Ames, G., Smith, N.H., Robinson, E.C. & Higgins, C.F. (1984), Repetitive extragenic palindromic sequences: a major component of the bacterial genome. Cell, 37, 1015-1026.
Stern, M.J., Prossnitz, E. & Ames, G.F.-L. (1988), Role of the intercistronic region in post-transcriptional control of gene expression in the histidine transport operon of S. typhimurium: involvement of REP sequences. Mol. Microbiol., 2, 141-152.
Wang, J.C., Caron, P.R. & Kim, R.A. (1990), The role of DNA topoisomerases in recombination and genome stability: a double-edged sword? Cell, 62, 403-406.
Yang, Y. & Ames, G.F.-L. (1988), DNA gyrase binds to the family of prokaryotic repetitive extragenic palindromic sequences. Proc. nat. Acad. Sci. (Wash.), 85, 8850-8854.
Yang, Y. & Ames, G.F.-L. (1990), The family of repetitive extragenic palindromic sequences: interaction with DNA gyrase and histone-like protein HU, in "The bacterial chromosome" (K. Drlica & M. Riley) (pp. 211-226). American Society for Microbiology, Washington, D.C.