Physica A 244 (1997) 497-509
ELSEVIER
Chromosome structure and gene regulation
Jonathan Widom*
Department of Biochemistry, Molecular Biology, and Cell Biology, and Department of Chemistry, Northwestern University, Evanston, IL 60208-3500, USA
Abstract
The packaging of DNA in chromosomes presents obstacles to the action of gene regulatory proteins and polymerases on their natural chromatin substrates. We critically assess previously existing ideas for how these obstacles may be overcome, and then summarize a recently proposed model and discuss its implications.
1. Chromatin structure
This review addresses unresolved problems in the regulation and action of genes that arise from the natural organization of DNA in chromosomes. Each chromosome of a eukaryotic cell (i.e., from an animal, plant, fungus, or protist) contains a single molecule of double-stranded DNA that has a contour length very much greater than the diameter of the cell's nucleus. For example, a typical human chromosome contains a DNA molecule with a contour length of several centimeters. This chromosomal DNA is folded in a hierarchical series of steps by a family of highly conserved proteins, eventually leading to a ~10000-fold linear compaction of the DNA immediately prior to cell division. (The complex of DNA and proteins is known as chromatin.) Key unanswered questions arise even with the lowest level of chromatin organization; as this is also the best understood structurally and the most readily accessible to experimental analysis in vitro, we will focus the discussion on this level of structure.
The first level of chromosome organization is based on a repeated structure known as a nucleosome. In each nucleosome, a short stretch (147 bp in length) of the very long chromosomal DNA is locally folded and compacted by a family of conserved proteins known as histones. Two molecules each of four different histones (known as H2A, H2B, H3, and H4) aggregate together to form a compact core that has the property of causing the DNA to wrap in ~1¼ superhelical turns on its outer surface.
* Tel: 847/467-1887; fax: 847/467-1380; e-mail
[email protected]
0378-4371/97/$17.00 Copyright © 1997 Elsevier Science B.V. All rights reserved. PII S0378-4371(97)00225-2
The resulting particle has the shape of a flat disk, with a diameter of 11 nm and a height (thickness) of 6 nm. This level of structure is particularly well understood. X-ray crystal structures are now available for the complete nucleosome at 7 Å resolution [1], and for the protein core of the nucleosome at 3.1 Å resolution (which suffices to reveal the path of each polypeptide chain) [2]. This packing motif is repeated at intervals, hundreds of thousands or millions of times along the DNA, with each nucleosome separated from the next along the DNA by a short variable-length (~10-80 bp) stretch of "linker" DNA [3]. DNA sequences that are organized in nucleosomes are largely inaccessible to other proteins. This inaccessibility is due to steric exclusion, a necessary consequence of the proximity of DNA to an impenetrable surface. The recognition of DNA sequences on a nucleosome is further hindered by steric exclusion from adjacent gyres of the DNA on the same nucleosome, as well as from the organization of the nucleosome filament into higher-order chromatin structures [4]. Most DNA in vivo is packaged in nucleosomes, but many DNA sequences are critical for gene regulation, and these must be accessible at appropriate or specific times to the regulatory proteins that act on them. What, then, is the principle that guarantees that regulatory proteins may have access to their DNA target sequences when necessary?
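The scale of the packaging problem described in this section can be checked with simple arithmetic. The sketch below is only illustrative: the rise of 0.34 nm per base pair is the textbook value for B-form DNA and is not stated in this review, and the 4 cm contour length and 50 bp linker are hypothetical choices within the ranges quoted above.

```python
BP_RISE_NM = 0.34  # assumed rise per base pair of B-form DNA (textbook value)


def nucleosome_count(contour_length_cm, core_bp=147, linker_bp=50):
    """Estimate total base pairs and nucleosome number for one DNA molecule.

    contour_length_cm: contour length of the DNA (hypothetical input).
    core_bp: DNA wrapped in each nucleosome (147 bp, from the text).
    linker_bp: assumed average linker length, within the quoted 10-80 bp.
    """
    total_bp = contour_length_cm * 1e7 / BP_RISE_NM  # 1 cm = 1e7 nm
    return total_bp, total_bp / (core_bp + linker_bp)


total_bp, n_nucleosomes = nucleosome_count(4.0)
print(f"{total_bp:.2e} bp, about {n_nucleosomes:.1e} nucleosomes")
```

Under these assumptions the estimate falls in the range of hundreds of thousands of nucleosomes per chromosome, in line with the figure given in the text.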
2. Existing ideas
Despite the great importance of this question, it is not known how regulatory proteins are provided the necessary access to their target sequences. The problem has been recognized for some time, however, and potential answers have been suggested. At present three ideas predominate (for references, see [5, 6]).
(i) A window of opportunity. Perhaps, following DNA replication, nucleosome assembly is delayed; regulatory proteins that are present at this time are given an opportunity to bind to their target sites as naked DNA; then, after some time, the remaining naked DNA is assembled into nucleosomes. This model is considered by some investigators as a paradigm for gene regulation in differentiation and development. The model is suggested by evidence from some studies that cells need to pass through S-phase of the cell cycle (when DNA is replicated) before they can switch the activity state of certain genes.
(ii) Precise positioning. Perhaps chromosomes are organized much more carefully than has previously been recognized, so that DNA regions that are critical for regulation are simply never packaged in nucleosomes in such a way as to be inaccessible. Target sequences may be always oriented on nucleosomes so as to face "outward", or else they may be restricted to linker DNA or to longer nucleosome-free regions. In this model, nucleosomes may also carry out essential repressive roles, acting at other sites to keep those sites inaccessible, preventing expression of certain genes in particular cell types. This model is suggested by studies using nucleases that
attack linker DNA preferentially over DNA in nucleosomes as probes for the locations of nucleosomes. Sites that are readily digested are attributed to the "linker DNA" regions of chromatin, whereas regions ~147 bp in length that are difficult to digest ("protected") are attributed to DNA that is packaged in a nucleosome. Many cases are now known in which, in the vicinity of gene regulatory sites, protected regions attributable to nucleosomes end abruptly, with single nucleotide resolution, at particular chromosomal locations. In some cases arrays of several or many consecutive nucleosomes appear to be precisely positioned in this sense.
(iii) Active invasion. Perhaps instead, regulatory proteins may have some capability of active invasion, so that steric exclusion simply does not apply. Proteins may recognize the existence of target site surfaces that they cannot touch, and then carry out a series of steps to actively expose these sites and bind them. This model is suggested by a number of surprising studies that establish that some proteins can in some cases bind to sites that really are buried in nucleosomes. In some examples, binding occurs in a process that is dependent on hydrolyzable ATP and on additional protein factors which have been shown genetically to be involved in gene activation and to interact with histones in vivo.
Each of these three models has significant limitations. The window-of-opportunity model has well-established counterexamples. The best known of these are phosphate induction of the yeast PHO5 gene and glucocorticoid induction of MMTV-LTR transcription (for a discussion of this point, see [7]).
The essential observations are that cells that are prevented from undergoing DNA replication by drugs or mutation, or that do not undergo replication during an experiment simply because they grow slowly, can still switch reversibly between transcriptional states - with corresponding changes in chromatin-structural states as detected by sensitivity to nuclease digestion - in response to changes in their environment. Thus, this model might apply in some instances, but it is not a general solution to the problem of protein access. The precise-positioning model is perhaps the most widely held of the three. Nevertheless, it violates fundamental physical principles. The free energies that contribute to specifying the positioning of nucleosomes along the DNA [8] are far too small to guarantee that a particular DNA sequence will be similarly organized in the chromatin of every cell, at all necessary times, in a large multicellular organism or particular tissue. It is illustrative to consider the case of prokaryotic repressor proteins as gene-regulatory factors. Repressor proteins do indeed bind to and act at "precise sites", but it is not helpful to use this language. Rather, it is recognized that repressor proteins have a finite free energy difference for binding to the specific site versus binding to a large set of alternative non-specific sites, and one equates their activities in gene regulation to the equilibrium fractional occupancy of the specific site [9]. Previous reports of "precise positioning" of single or multiple adjacent nucleosomes must be considered as reflecting preferential positioning at those sites. Moreover, on DNA sequencing-style gels (with which the data leading to the "precise positioning" model are obtained), all DNA fragments of differing lengths are resolved at single
nucleotide resolution. Investigators should focus not on the sharpness of any particular product band, but rather on the intensity of particular product bands relative to the intensities at the set of all alternative positions. It is important to recognize the statistical nature of positioning because this has substantial ramifications for proposed mechanisms of gene regulation. If positioning is not "precise", then essential DNA regulatory sequences will sometimes be buried when they need to be accessible, or will sometimes be accessible when they need to be repressed (buried). Mechanisms proposed for gene regulation must be robust to inevitable fluctuations.
The active invasion model as currently defined has two substantial problems. First, it has not yet been developed to the point of having a definite, testable, proposed mechanism. Second, a major unresolved question is how proteins that might be capable of active invasion of nucleosomes know which nucleosomes to invade - unless it is established in advance that specific DNA recognition sequences will be accessible - in which case the question again arises how such accessibility is achieved.
In summary, at present it is not known how regulatory proteins are guaranteed and provided access to their DNA target sites. It is clearly established that regulatory proteins can in some cases bind to nucleosomal target sequences in vivo or in vitro (see [5, 6]), but the underlying principles are not known. In particular, in such cases:
(i) It is not known how the proteins obtain access to their binding sites.
(ii) Some proteins reportedly "can bind" whereas others "cannot"; there is no framework for understanding these observations.
(iii) For proteins that "can bind", there is no basis for understanding the apparent binding constants.
(iv) Protein binding to nucleosomal target sites can occur cooperatively, but the mechanism of this is not known.
(v) The effects of binding on the structure and stability of the nucleosome are not understood.
(vi) It is not clear how the studies of regulatory protein binding in vitro relate to the situation in vivo, where the concentration of free regulatory protein may be substantially lower.
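The statistical view of nucleosome positioning argued for earlier in this section can be made concrete with a Boltzmann-weight calculation. The following is a minimal sketch, not the analysis used in the cited work: the free energy landscape, the number of alternative positions, and the temperature are all hypothetical values chosen for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def positioning_probabilities(free_energies_kcal, temperature_k=298.15):
    """Return Boltzmann probabilities for alternative nucleosome positions.

    free_energies_kcal: free energy of each candidate position (kcal/mol),
    relative to any common reference; lower means more favorable.
    """
    rt = R_KCAL * temperature_k
    weights = [math.exp(-g / rt) for g in free_energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical landscape (not from the paper): one position favored by
# 2 kcal/mol over 20 equivalent alternative positions.
landscape = [-2.0] + [0.0] * 20
probabilities = positioning_probabilities(landscape)
favored_fraction = probabilities[0]
print(f"fraction of molecules at the favored position: {favored_fraction:.2f}")
```

With these assumed numbers the favored position is occupied in roughly 60% of molecules. It would appear as a strong, sharp band on a gel, yet the target sequence is organized differently in the remaining molecules.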
3. The site exposure model
Given these problems in existing ideas, it is important to develop and consider alternative models. We are led to a surprising postulate: perhaps nucleosomes (N) are not inert, frozen structures as one had previously imagined, but perhaps instead they are dynamic, transiently exposing their DNA, perhaps (as one way of thinking about it) by an uncoiling mechanism, such that in the exposed state (S) regulatory proteins (R) may bind as though they were binding to naked DNA. Analogous nucleosome conformational changes have previously been proposed, but only for the very ends of the core particle DNA [10, 11]; the remainder of the nucleosomal DNA has previously been considered inert. Now the question arises, are only the end DNA segments "dynamic" in this fashion? Or, if one looks more sensitively, can one see such dynamic behavior along more, perhaps all, of the nucleosomal DNA length? We
devised a simple model that captures these ideas, and a sensitive experiment to test and to quantify it [5]. The site exposure model is illustrated in Fig. 1(a). We make the simplifying assumption that sufficient nucleosomal DNA is exposed such that the rates and equilibria for binding to an exposed nucleosomal target sequence or to a naked DNA target sequence are identical. Thus,

$$ N \underset{k_{21}}{\overset{k_{12}}{\rightleftharpoons}} S + R \underset{k_{32}}{\overset{k_{23}}{\rightleftharpoons}} RS \qquad \text{and} \qquad S + R \underset{k_{32}}{\overset{k_{23}}{\rightleftharpoons}} RS \tag{1} $$

for nucleosomes and naked DNA, respectively. With this assumption, binding of a regulatory protein to a nucleosomal target sequence will occur with a net free energy change $\Delta G^{0}_{\mathrm{net}}$ and an apparent dissociation constant $K_d^{\mathrm{apparent,\,net}}$, given by

$$ \Delta G^{0}_{\mathrm{net}} = \Delta G^{0}_{\mathrm{conf}} + \Delta G^{0}_{\mathrm{naked\,DNA}} \qquad \text{and} \qquad K_d^{\mathrm{apparent,\,net}} = K_d^{\mathrm{naked\,DNA}} / K_{eq}^{\mathrm{conf}}, \tag{2} $$
where $K_{eq}^{\mathrm{conf}}$ is the equilibrium constant for site exposure in the nucleosome ($k_{12}/k_{21}$), $K_d^{\mathrm{naked\,DNA}}$ is the dissociation constant for binding to naked DNA ($k_{32}/k_{23}$), and $\Delta G^{0}_{\mathrm{conf}}$ and $\Delta G^{0}_{\mathrm{naked\,DNA}}$ are the corresponding free energies. In this model, $K_{eq}^{\mathrm{conf}}$ depends primarily on the translational position of the target sequence within the nucleosome ($\Delta G^{0}_{\mathrm{conf}}$ depends on the length of DNA being uncoiled, i.e., on the length of protein-DNA interface that is disrupted). However, the effective $K_{eq}^{\mathrm{conf}}$ will also depend on other factors such as the size and shape of the regulatory protein, the rotational setting of the target site around the periphery of the DNA helix, and on DNA bending induced by the protein, since these affect the amount of DNA that must be uncoiled to allow the protein to bind.
We devised a sensitive procedure to detect and quantify the postulated conformational equilibrium. We replace the regulatory protein with a restriction enzyme (E), and we construct nucleosomes having a site for E at a known position in the particle. If this conformational equilibrium exists, yielding an exposed state, the restriction enzyme can bind (yielding ES) and catalyze cleavage of radiolabeled substrate to yield products (P), which can be resolved by gel electrophoresis and quantified using a phosphorimager. Thus,

$$ N \underset{k_{21}}{\overset{k_{12}}{\rightleftharpoons}} S + E \underset{k_{32}}{\overset{k_{23}}{\rightleftharpoons}} ES \overset{k_{34}}{\longrightarrow} E + P \qquad \text{and} \qquad S + E \underset{k_{32}}{\overset{k_{23}}{\rightleftharpoons}} ES \overset{k_{34}}{\longrightarrow} E + P \tag{3} $$

for nucleosomes and naked DNA, respectively. The kinetic analysis of this mechanism is well understood. One expects on theoretical grounds, and we demonstrate experimentally, that the system obeys a rapid pre-equilibrium limit. We measure the first-order rate constants $k_{\mathrm{obs}}$ for loss of
[Figure 1, panels (a) and (b): schematic diagrams, image not reproduced.]
Fig. 1. (a) The site exposure model illustrated for a single nucleosome. The histone octamer is shown from above as a disk with the DNA coiling around it. A particular DNA target sequence (stippled) is inaccessible to the regulatory protein (R) that acts on it. $k_{12}$ and $k_{21}$ are position-dependent apparent rate constants for site exposure and recapture, respectively. Exposure of sites nearer the middle of the nucleosomal DNA may occur by several successive steps of exposure of shorter segments from an end as illustrated; each smaller step would have its own microscopic rate constants. $k_{23}$ and $k_{32}$ are microscopic rate constants for binding and dissociation of R from its target site, and pertain to naked DNA as well as to the exposed state of nucleosomes. Real nucleosomes exist in long chains, but this need not prevent uncoiling such as illustrated. With just modest deformation of the linker DNA, a combined uncoiling coupled to a motion of the uncoiled DNA in a direction parallel to the axis of the nucleosomal disk allows uncoiling beyond the dyad (which is as far as necessary to allow binding anywhere) with no required crossings and with little motion of other nucleosomes. Higher levels of chromosome structure may need to be disassembled prior to the site exposure process illustrated here, but are also believed to possess only marginal stability. (b) Cooperativity in the binding of multiple proteins to target sites in a single nucleosome. A nucleosome is shown containing binding sites (stippled) for two proteins (X) and (Y). X and Y may be two unrelated proteins, or two molecules of the same protein. X is defined as the protein binding to the outer-more site, and Y as binding to the inner-more site. $\Delta G_1^{0}$ is the free energy cost for uncoiling enough DNA so as to expose the site for X. $\Delta G_2^{0}$ is the additional free energy cost for uncoiling sufficient additional DNA so as to expose the site for Y. In some cases, X and Y may have "conventional" cooperative interactions, also detectable in their binding to naked DNA (e.g., from favorable protein-protein contacts between X and Y); these are collectively represented as $\Delta G_3^{0}$.
reactant, for nucleosomal DNA and for naked DNA in identical conditions. The equilibrium constants for site exposure $K_{eq}^{\mathrm{conf}}$ are then calculated from the ratio of these quantities.
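The ratio calculation just described can be sketched numerically. This is an illustrative reconstruction under stated assumptions, not the authors' analysis code: the time courses are synthetic and noise-free, the rate constants are hypothetical, and the function names are introduced here only for the example. Loss of reactant is assumed to be a single exponential.

```python
import math


def first_order_rate(times, fraction_remaining):
    """Estimate k_obs from reactant loss, assuming f(t) = exp(-k_obs * t).

    Least-squares slope of -ln(f) versus t for a line through the origin.
    """
    numerator = sum(t * -math.log(f) for t, f in zip(times, fraction_remaining))
    denominator = sum(t * t for t in times)
    return numerator / denominator


def site_exposure_constant(k_obs_nucleosome, k_obs_naked):
    """K_eq^conf taken as the ratio of the two first-order rate constants."""
    return k_obs_nucleosome / k_obs_naked


# Synthetic, noise-free time courses (hypothetical rate constants, in min^-1).
naked_times = [1.0, 2.0, 5.0, 10.0]
naked_fraction = [math.exp(-0.2 * t) for t in naked_times]
nucleosome_times = [30.0, 60.0, 120.0, 240.0]
nucleosome_fraction = [math.exp(-0.002 * t) for t in nucleosome_times]

k_naked = first_order_rate(naked_times, naked_fraction)
k_nucleosome = first_order_rate(nucleosome_times, nucleosome_fraction)
k_conf = site_exposure_constant(k_nucleosome, k_naked)
print(f"K_eq^conf is approximately {k_conf:.3g}")
```

In a strict pre-equilibrium treatment the fraction of substrate in the exposed state is K/(1 + K) rather than K. For the small equilibrium constants at issue here the two are practically indistinguishable, which is why the simple ratio is used in this sketch.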
In addition to sensitivity, other goals in the experimental design included (1) using prokaryotic proteins as probes, so as to eliminate undefined possibilities of magical capabilities in eukaryotic proteins, and (2) developing an experiment that is thermodynamically equivalent to, but mechanistically different from, the (previously unexplained) binding experiments carried out by others, as a way of providing an independent test for the correctness of our ideas. The DNA constructs are designed to strongly bias the positioning of the test nucleosome [5, 12]. For each construct, patches of DNA were engineered which contained sites for multiple enzymes within roughly one helical turn of DNA. By measuring the apparent $K_{eq}^{\mathrm{conf}}$ for the different sites within one patch, we discriminate between true site exposure and inadvertent construction of a site that faces "out". Families of constructs were generated that allowed measurement of $K_{eq}^{\mathrm{conf}}$ for patches positioned throughout the nucleosome. Key control experiments included: mapping and assays for the homogeneity of nucleosome positioning; assays for the integrity of the nucleosomes throughout the digestions; and tests for possible direct roles of the restriction enzymes in facilitating site exposure.
The most important finding is that site exposure does in fact occur, even over the nucleosomal dyad (the middle and presumably least-accessible region of the nucleosomal DNA), with substantial values for $K_{eq}^{\mathrm{conf}}$. This dynamic property intrinsic to nucleosomes provides a general mechanism guaranteeing that regulatory proteins may have access to their DNA target sequences. The measured equilibrium constants for site exposure decrease more-or-less progressively as one moves inward from an end into the middle of the nucleosome, from $1$-$4 \times 10^{-2}$ just inside the core particle, to ~$10^{-5}$-$10^{-4}$ over the dyad axis. Such behavior is consistent with the simple uncoiling picture as illustrated. One may picture the DNA wrapped on the histone surface as making contacts ("bonds") in a small patch, every ~10 bp, each time the phosphodiester backbone (minor groove) faces inward toward the octamer. In that case, uncoiling proceeds stepwise, with an incremental increase in energetic cost $\Delta G^{0}_{\mathrm{conf}}$ (i.e., decreased equilibrium constant $K_{eq}^{\mathrm{conf}}$) associated with each additional 10 bp-long segment uncoiled.
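The equilibrium constants quoted in this section can be translated into free energy costs through the standard relation between an equilibrium constant and its free energy. The sketch below assumes a temperature of 25 °C, which is not specified in the text, so the resulting numbers are indicative only.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def exposure_free_energy(k_eq_conf, temperature_k=298.15):
    """Free energy cost of site exposure, dG_conf = -RT ln(K_eq^conf), in kcal/mol."""
    return -R_KCAL * temperature_k * math.log(k_eq_conf)


# Equilibrium constants quoted in the text for the two extremes of the nucleosome.
reported = {
    "just inside the core particle": (1e-2, 4e-2),
    "over the dyad axis": (1e-5, 1e-4),
}
for region, (k_low, k_high) in reported.items():
    costly = exposure_free_energy(k_low)
    cheap = exposure_free_energy(k_high)
    print(f"{region}: {cheap:.1f} to {costly:.1f} kcal/mol")
```

At the assumed temperature this gives roughly 2 to 3 kcal/mol near the ends of the nucleosomal DNA and roughly 5 to 7 kcal/mol over the dyad.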
4. Prediction of results from equilibrium binding studies
These results explain and clarify a large body of phenomenological experimental data in the literature, from studies that investigated the ability of proteins to bind to nucleosomal target sites. In many such studies, it was found that proteins "could bind"; but how they gained access to their target site was not known, and the resulting binding constants were unexplained. Some proteins apparently "could not" bind; but there was no basis for understanding why some proteins could bind whereas others (sometimes very closely related) could not. The site exposure model provides answers to these questions. For those cases where proteins "could bind", the site exposure model provides a physically plausible mechanism for how they gain access to their target sites. Moreover, it allows us to predict the
outcomes of the equilibrium binding studies: given the (readily measured) affinity of a protein for its target site on naked DNA, and the location of that target site in a nucleosome (for which our results provide the corresponding $K_{eq}^{\mathrm{conf}}$), the predicted affinity for the nucleosomal target site is given by Eq. (2). There is an excellent agreement between the predicted and the measured apparent dissociation constants. For further discussion, and an analysis of certain apparent exceptions, see [5]. Evidently, the site exposure mechanism and Eq. (2) provide a framework for analysis and interpretation of these binding studies. The site exposure model also explains why in seemingly arbitrary cases it was found that certain proteins "could not" bind to their target sites within nucleosomes. In many (possibly all) such cases, the observed failure to bind can be attributed to a simple consequence of ending the titrations too soon, prior to reaching the $K_d^{\mathrm{apparent}}$ that we predict for that system. Plainly, it is important to extend titrations to sufficiently high concentrations so that the free concentration exceeds the predicted $K_d^{\mathrm{apparent}}$.
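The prediction described in this section, and the consequence of ending a titration too early, can be illustrated with a short calculation. This is a sketch under stated assumptions: the naked-DNA dissociation constant, the site exposure constant, and the titration range are hypothetical, and a simple single-site binding isotherm is assumed.

```python
import math


def apparent_kd(kd_naked, k_eq_conf):
    """Predicted K_d(apparent) = K_d(naked DNA) / K_eq^conf."""
    return kd_naked / k_eq_conf


def fractional_occupancy(free_protein, kd):
    """Simple single-site binding isotherm."""
    return free_protein / (free_protein + kd)


# Hypothetical values (molar): 1 nM affinity on naked DNA, K_eq^conf = 1e-3.
kd_app = apparent_kd(1e-9, 1e-3)

# A titration that stops at 100 nM never approaches the predicted K_d.
titration = [1e-9, 1e-8, 1e-7]
occupancies = [fractional_occupancy(c, kd_app) for c in titration]
for conc, occ in zip(titration, occupancies):
    print(f"[protein] = {conc:.0e} M -> occupancy {occ:.3f}")
```

With these assumed values the predicted apparent dissociation constant is about 1 µM, so a titration ending at 100 nM reaches less than 10% occupancy. Such a result could easily be reported as a failure to bind.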
5. Cooperativity (synergy) in the binding of multiple proteins to sites within a single nucleosome
The site exposure model has within it the potential for important novel cooperative (synergistic) interactions between multiple proteins binding simultaneously to sites within a single nucleosome [13]. This cooperativity is distinct from any "conventional" (e.g. direct or other) cooperative interactions that may also exist between the proteins. The origin of this potential cooperativity (synergy) is illustrated in Fig. 1(b). It arises from the possibility that, once protein Y has bound, the binding of protein X may take place without having to pay the energetic penalty for site exposure (here defined as $\Delta G_1^{0}$) which otherwise would be required. Similarly, the ability of X to bind facilitates the subsequent binding of Y, since at least some of the final free energy penalty for the required conformational change is already paid. These processes are linked in a thermodynamic cycle, hence the same coupling free energy and corresponding binding cooperativity necessarily applies regardless of the order in which the proteins bind. The important conclusion is that X and Y may act cooperatively (synergistically) even if they do not touch: the binding of one protein radically alters the binding ability of the other. No special properties are required of X or Y: they need only bind DNA for this cooperativity to be manifested. X and Y may be two different proteins, or they may be two molecules of the same protein. By comparing the free energy for the binding of Y in the absence and presence of X bound at saturation, it can be seen that the amount of cooperativity between X and Y (the coupling free energy $\delta G_{XY}$ - the free energy by which the prior binding of X facilitates the binding of Y, or vice versa), is simply equal to $-\Delta G_1^{0}$, i.e., minus one times the energetic cost for exposing the outer-more site. This quantity is precisely what we defined and measured as $-\Delta G^{0}_{\mathrm{conf}}$ in the previous section - the energetic cost of the required conformational change when there was only one protein binding.
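The comparison described above can be written out explicitly. The derivation below is a minimal sketch under assumptions that are not spelled out in the text: the nucleosome is taken to have only three conformational states (fully wrapped, outer site exposed, both sites exposed), the inner site is exposed only when the outer site is, and no direct interaction between X and Y is included. The symbols $x = [X]/K_d^{X}$ and $y = [Y]/K_d^{Y}$ (concentrations scaled by the naked-DNA dissociation constants), $K_1$, $K_2$, and the partition function $Z$ are introduced here for illustration.

```latex
\begin{align*}
K_1 &= e^{-\Delta G_1^{0}/RT}, \qquad K_2 = e^{-\Delta G_2^{0}/RT}, \\
Z &= 1 + K_1 (1 + x) + K_1 K_2 (1 + x)(1 + y), \\
K_{d,Y}^{\mathrm{app}}\big|_{x = 0} &= K_d^{Y}\,\frac{1 + K_1 + K_1 K_2}{K_1 K_2}
  \;\approx\; \frac{K_d^{Y}}{K_1 K_2}, \\
K_{d,Y}^{\mathrm{app}}\big|_{x \to \infty} &= K_d^{Y}\,\frac{1 + K_2}{K_2}
  \;\approx\; \frac{K_d^{Y}}{K_2}, \\
\delta G_{XY} &= RT \ln \frac{K_{d,Y}^{\mathrm{app}}\big|_{x \to \infty}}{K_{d,Y}^{\mathrm{app}}\big|_{x = 0}}
  \;\approx\; RT \ln K_1 \;=\; -\Delta G_1^{0}.
\end{align*}
```

The approximations hold when $K_1, K_2 \ll 1$. Because $Z$ is symmetric in the order of binding, the same coupling free energy is obtained if the roles of X and Y are exchanged, as required by the thermodynamic cycle.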
This cooperativity (synergy) is a necessary consequence of the site exposure mechanism if and only if binding of one protein does reduce the conformational free energy penalty for binding at another site. If the site exposure process somehow operates in such a way as to expose individual sites while even closely positioned flanking sites are unaffected, then cooperativity would not be observed. Remarkably, this model, with no adjustable parameters, accounts quantitatively for a diverse set of experimental results on cooperative binding of various proteins to nucleosomal target sites obtained by another laboratory [14]. Moreover, there is very good agreement between the predictions of this model and the experimental data even using $\Delta G^{0}_{\mathrm{conf}}$ obtained from the restriction enzyme digestion kinetic measurements instead of the $\delta G_{XY}$ obtained directly from the primary cooperative binding data. These two experiments are completely unrelated except through the site exposure model, and therefore the agreement between the two provides strong evidence for the applicability of the site exposure model to the behavior of real nucleosomes. These results have many important ramifications.
(1) Real nucleosomes in vitro do behave in the manner described by Fig. 1(b), with the potential cooperativity of that model fully realized. Such behavior evidently requires mechanical linkage between events at the two binding sites, consistent with uncoiling from an end as illustrated.
(2) This cooperativity, which is intrinsic to nucleosomes, means that cells can control the occupancy at, for example, X's binding site, either by varying the concentration of X itself, or by varying the concentration of Y, with no requirement for conventional direct cooperative interactions between X and Y. This idea provides a natural mechanism for the construction of cooperative (synergistic) multi-protein control modules from the combinatorial action of independent and arbitrarily chosen parts.
(3) The free energy of this cooperativity ($\delta G_{XY}$), obtained as $\Delta G^{0}_{\mathrm{conf}}$ in our earlier studies, ranges from (minus) 2.5 to 6 kcal mol$^{-1}$. These large coupling free energies greatly increase the occupancies achieved by binding proteins that are present at realistic concentrations, compared to their occupancies if they act independently.
In real systems, X and Y may have direct "conventional" cooperative interactions, represented by $\Delta G_3^{0}$ in Fig. 1(b), in addition to the cooperativity that arises from competing against a common competitor. In that case, the net cooperativity will be given by the sum $\delta G_{XY} + \Delta G_3^{0}$. As one measure of the significance of the inherent cooperativity one can compare $\delta G_{XY}$ with typical values for $\Delta G_3^{0}$ measured in real systems. A survey of several well-known conventional cooperative interactions having clearly established significance in gene regulation reveals typical values of $\Delta G_3^{0}$ of 1-2 kcal mol$^{-1}$ [13]. Our novel cooperativity free energies are substantially greater than these free energies of previously recognized "conventional" cooperative interactions.
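Ramification (2), that the occupancy of X's site can be controlled through the concentration of Y alone, can be illustrated with a small equilibrium calculation. The following sketch rests on assumptions not made explicit in the text: three conformational states per nucleosome (fully wrapped, outer site exposed, both sites exposed), no direct X-Y interaction, and hypothetical values for the exposure constants and protein concentrations. The function name and its parameters are introduced here for illustration only.

```python
def site_occupancies(x, y, k1, k2):
    """Occupancies of the outer (X) and inner (Y) sites of one nucleosome.

    x, y: free protein concentrations divided by their naked-DNA K_d values.
    k1: equilibrium constant for exposing the outer site, exp(-dG1/RT).
    k2: equilibrium constant for the additional uncoiling that exposes
        the inner site, exp(-dG2/RT).
    """
    closed = 1.0                                  # fully wrapped
    outer_open = k1 * (1.0 + x)                   # X site exposed, X free or bound
    both_open = k1 * k2 * (1.0 + x) * (1.0 + y)   # both sites exposed
    z = closed + outer_open + both_open
    theta_x = (k1 * x + k1 * k2 * x * (1.0 + y)) / z
    theta_y = (k1 * k2 * (1.0 + x) * y) / z
    return theta_x, theta_y


# Hypothetical numbers: both exposure constants 1e-2, X present at its naked-DNA K_d.
theta_x_alone, _ = site_occupancies(x=1.0, y=0.0, k1=1e-2, k2=1e-2)
theta_x_with_y, _ = site_occupancies(x=1.0, y=1e5, k1=1e-2, k2=1e-2)
print(f"X occupancy without Y: {theta_x_alone:.3f}")
print(f"X occupancy with Y in large excess: {theta_x_with_y:.3f}")
```

Under these assumed values the occupancy of X rises from about 1% to nearly the 50% it would have on naked DNA, although the concentration of X is unchanged and X and Y never touch.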
6. New insights and predictions from the site exposure model
The site exposure model provides physically plausible explanations for a variety of additional phenomena associated with chromosome structure and gene
regulation. We conclude with discussions of several of the most important of these.
(i) Roles of posttranslational modifications and histone gene variants. Each of the histone proteins is subject to a diverse set of specific posttranslational modifications such as phosphorylation, acetylation, methylation, and others [15]. These modifications can occur independently at multiple specific sites in each protein, giving rise to a rich combinatorial complexity of distinct states of nucleosomes. Moreover, cells contain multiple genes coding for variants of each of the histone proteins [15], adding further to the combinatorial diversity of distinct nucleosomes. The presence of particular variants or posttranslationally modified states of the histones is linked to gene regulation in vivo [6], but mechanisms are not known. The site exposure model provides a mechanism by which histone posttranslational modifications and gene variants may affect gene regulation. Changes to the detailed state of the nucleosome may plausibly affect the magnitude of $K_{eq}^{\mathrm{conf}}$ ($k_{12}/k_{21}$, see Fig. 1(a)), thereby changing the net free energy for regulatory protein binding and hence the occupancy of regulatory sites that is achieved for any given protein concentration. In the context of the model of cooperative binding (Fig. 1(b)), posttranslational modifications and gene variants may affect either $\Delta G_1^{0}$ ($\delta G_{XY}$) or $\Delta G_2^{0}$, or both.
(ii) Histone H1. In addition to the four different "core" histone proteins that make up the octameric core of the nucleosome, each nucleosome contains one additional molecule of a fifth histone known as H1. H1 is located on the outer surface of the nucleosome near the dyad axis [16]; it interacts with one or both of the DNA segments entering and leaving the nucleosome, as well as with particular core histone proteins.
H1 protects the end segments of nucleosomal DNA against digestion by nonspecific nucleases, while, with continued digestion, it is released from the nucleosome concomitant with the digestion of the DNA from ~166 bp down to the 147 bp-containing nucleosome core particle. While it is often considered that H1 "seals two turns of DNA in the nucleosome", this cannot be literally correct since H1 is in free exchange in physiological ionic conditions [17]. The site exposure model suggests two distinct possible mechanisms for the action of histone H1. In one mechanism, release of H1 must precede site exposure. Thus,

$$ \mathrm{NH1} \underset{k_{r}}{\overset{k_{f}}{\rightleftharpoons}} \mathrm{H1} + N \underset{k_{21}}{\overset{k_{12}}{\rightleftharpoons}} \mathrm{H1} + S + R \underset{k_{32}}{\overset{k_{23}}{\rightleftharpoons}} \mathrm{H1} + SR, \tag{4a} $$

where NH1 is an H1-containing nucleosome. In this model, H1 acts effectively as a competitive inhibitor for site exposure. Cells may repress the genetic activity of regions of chromosomes either by increasing the concentration of H1, or by producing H1 variants or posttranslationally modified states of H1 that bind more tightly. Alternatively, histone H1 may always remain bound, but its presence may alter the equilibrium constants for site exposure $K_{eq}^{\mathrm{conf}}$ through effects on $k_{12}$, $k_{21}$, or
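both:

$$ \mathrm{NH1} \underset{k_{21}}{\overset{k_{12}}{\rightleftharpoons}} \mathrm{SH1} + R \underset{k_{32}}{\overset{k_{23}}{\rightleftharpoons}} \mathrm{RSH1}. \tag{4b} $$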
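The two schemes for H1 make different quantitative predictions, which the following sketch illustrates. It is not taken from the source: the expression for scheme (4a) follows from treating H1 as a simple competitive inhibitor at equilibrium, and all numerical values and function names are hypothetical.

```python
def exposure_constant_h1_release(k_conf, h1_free, kd_h1):
    """Scheme (4a): H1 must dissociate before the site can be exposed.

    Effective exposure constant measured relative to all unexposed
    nucleosomes (with or without H1). kd_h1 = k_f / k_r for NH1 <-> H1 + N.
    """
    return k_conf / (1.0 + h1_free / kd_h1)


def exposure_constant_h1_bound(k12_with_h1, k21_with_h1):
    """Scheme (4b): H1 stays bound and simply changes k12 and/or k21."""
    return k12_with_h1 / k21_with_h1


# Hypothetical values: K_eq^conf = 1e-3 without H1, K_d(H1) = 10 nM.
for h1 in (0.0, 1e-8, 1e-7):
    k_eff = exposure_constant_h1_release(1e-3, h1, 1e-8)
    print(f"[H1] = {h1:.0e} M -> effective K_eq^conf = {k_eff:.2e}")
```

In this sketch the effective exposure constant under scheme (4a) keeps falling as free H1 is raised, whereas under scheme (4b) it is fixed by the altered rate constants once H1 is bound. A dependence on H1 concentration is therefore one plausible observable for telling the schemes apart.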
These two models are readily tested and distinguished by experiment in vitro. (iii) Role of poly ( d A : d T ) elements in eukaryotic promoters. Polypurine tracts of length 15-30 bp are overrepresented in the genomes of all eukaryotic species, but this overabundance does not occur in prokaryotes [18]. Poly (dA:dT) elements are common and important elements of promoters in yeast [19, 20]. The function of these elements in vivo evidently depends on their intrinsic structure or properties and not on their interaction with sequence-specific DNA binding proteins. These elements increase the accessibility of adjacent D N A regulatory sequences to many enzymes, suggesting that poly(dA: dT) elements lead to nucleosomes having altered structure or stability, thereby facilitating access by regulatory proteins at those genomic locations. The site exposure model predicts such behavior and may account for the action of these promoter elements. Stretches of poly (dA:dT) of length 16 bp decrease by 1.2 kcal m o l - 1 the favorable free energy of histone-DNA interactions in nucleosome reconstitution [21]. This decreases, by that same amount, the free energy penalty that must be paid in order to uncoil the nucleosomal DNA beyond the poly (dA:dT) stretch so that a protein can have access to a site further inside the nucleosome. Decreasing the free energy penalty for exposing the site makes the overall process of protein binding more favorable. More generally, for any sequence element that decreases the favorable free energy of histone-DNA interactions by an amount AAG~eq, the site exposure model predicts that KeCqnf a n d the affinity for binding at a site further inside that same nucleosome will be increased by the factor exp(AAG~eq/RT). (iv) A new role for nucleosome positioning. Genomic DNA sequence may contribute significantly to gene regulation through effects on the statistical positioning of nucleosomes. 
Natural DNA sequences are well known that differ by a > 100-fold range in affinity for histone octamer in nucleosomes [22, 23]. If nucleosomes in vivo are mobile as currently believed [24, 25] (so that their positions may equilibrate), DNA sequences having a particularly high affinity for histone octamers will act to statistically bias nucleosome positioning [23]. This in turn governs the average equilibrium constant for site exposure at any particular (nearby) stretch of that DNA by controlling whether that stretch is found, on average, closer to a nucleosome end, where $K^{\mathrm{conf}}_{\mathrm{eq}}$ is greater, or closer to the nucleosome dyad, where $K^{\mathrm{conf}}_{\mathrm{eq}}$ is further decreased by a factor of $10^{2}$-$10^{3}$. The magnitude of $K^{\mathrm{conf}}_{\mathrm{eq}}$ in turn governs the fractional occupancy of regulatory DNA sequences by the proteins that must act on them. These effects of genomic DNA sequence on the fractional occupancy of regulatory sites are much greater than the effects attributable to many individual gene regulatory proteins.
(v) Site exposure as an initial step in gene activation. The site exposure model suggests definite, plausible mechanisms for "active invasion". A fundamental problem
for active invasion models has been how hypothetical proteins capable of active invasion might know where in the genome to act. The site exposure model implies that particular genomic locations can always be identified based on their underlying DNA sequence even if those regions are packaged in nucleosomes. After just one protein has bound site-specifically, even though its time-averaged occupancy may be low, this initially bound protein may recruit additional factors having the capability of actively displacing the nucleosome, or it may itself have such an activity. Alternatively, simple equilibrium binding of sufficient numbers of proteins to sites within a single nucleosome, each of which is competing for DNA surface with the histone octamer, may ultimately lead to displacement of the histone octamer. These possibilities have within them a very interesting additional property. When not bound to DNA, the histone octamer is unstable in physiological conditions: it dissociates into three subunits, two H2A/H2B heterodimers plus an H3$_2$/H4$_2$ tetramer [26]. Reassociation to form a nucleosome could require a quaternary reaction, which may be expected to occur with vanishingly low probability since the free concentrations of the subunits are likely to be vanishingly low (except, perhaps, during S-phase of the cell cycle). If reassembly essentially does not occur, this means that the octamer no longer exists as an effective competitor. Hence, even if the initial regulatory protein-nucleosome complex might have only marginal net stability (i.e., low time-averaged occupancy), loss of the histone octamer as an effective competitor now allows that same regulatory protein-DNA complex to be much more stable, with a much greater time-averaged occupancy.
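As a rough numerical illustration of points (iii) and (iv), the following Python sketch evaluates the factor exp(ΔΔG/RT) for the 1.2 kcal mol$^{-1}$ destabilization reported for a 16 bp poly(dA:dT) stretch [21], and then the fractional occupancy that follows by mass action from a simple two-step scheme in which a nucleosomal site is first exposed (N ⇌ S, equilibrium constant $K^{\mathrm{conf}}_{\mathrm{eq}}$) and then bound (S + R ⇌ RS, dissociation constant $K_d$ on naked DNA). The temperature of 25 °C, the baseline value of $K^{\mathrm{conf}}_{\mathrm{eq}}$, and the ratio of free protein concentration to $K_d$ are illustrative assumptions rather than measured values, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def exposure_boost(ddg_kcal_per_mol, temp_k=298.15):
    """Factor exp(ddG/RT) by which K_conf and the binding affinity increase.

    temp_k defaults to 25 C, which is an assumption of this sketch.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temp_k))


def fractional_occupancy(k_conf, protein_over_kd):
    """Equilibrium occupancy for N <-> S (K_conf), S + R <-> RS (Kd).

    protein_over_kd is the free protein concentration [R] divided by Kd,
    where Kd refers to binding on naked DNA.
    Occupancy = [RS] / ([N] + [S] + [RS]).
    """
    bound = k_conf * protein_over_kd  # [RS] relative to [N]
    # Relative populations are [N] : [S] : [RS] = 1 : k_conf : bound
    return bound / (1.0 + k_conf + bound)


# Illustrative inputs (assumptions, not values from the experiments)
K_CONF_BASE = 1e-3      # hypothetical K_conf for a site inside the nucleosome
PROTEIN_OVER_KD = 10.0  # hypothetical free [R] relative to Kd

boost = exposure_boost(1.2)  # 16 bp poly(dA:dT): 1.2 kcal/mol [21]
occ_base = fractional_occupancy(K_CONF_BASE, PROTEIN_OVER_KD)
occ_poly = fractional_occupancy(K_CONF_BASE * boost, PROTEIN_OVER_KD)

print(f"boost factor: {boost:.2f}")
print(f"occupancy without / with poly(dA:dT): {occ_base:.4f} / {occ_poly:.4f}")
```

Under these assumptions the poly(dA:dT) stretch corresponds to a roughly 7.6-fold increase in $K^{\mathrm{conf}}_{\mathrm{eq}}$ at 25 °C. As long as the occupancy remains low, the occupancy increases nearly in proportion; at higher occupancies the gain saturates.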
Acknowledgements
The author is grateful to the members of his research group for valuable discussions and for their essential contributions to the experimental studies reviewed here. Research in the author's laboratory is supported by grants from the NIH. This paper is dedicated to Professor Benjamin Widom on the occasion of his 70th birthday.
References
[1] T.J. Richmond, J.T. Finch, B. Rushton, D. Rhodes, A. Klug, Nature 311 (1984) 532.
[2] G. Arents, R.W. Burlingame, B.-C. Wang, W.E. Love, E.N. Moudrianakis, Proc. Natl. Acad. Sci. (USA) 88 (1991) 10148.
[3] J. Widom, Proc. Natl. Acad. Sci. (USA) 89 (1992) 1095.
[4] J. Widom, Ann. Rev. Biophys. Chem. 18 (1989) 365.
[5] K.J. Polach, J. Widom, J. Mol. Biol. 254 (1995) 130.
[6] G. Felsenfeld, Cell 86 (1996) 13.
[7] A. Schmid, K. Fascher, W. Hörz, Cell 71 (1992) 853.
[8] J. Yao, P.T. Lowary, J. Widom, Proc. Natl. Acad. Sci. (USA) 90 (1993) 9364.
[9] P.H. von Hippel, Science 263 (1994) 769.
[10] J.D. McGhee, G. Felsenfeld, Nucleic Acids Res. 8 (1980) 2751.
[11] H. Shindo, J.D. McGhee, J.S. Cohen, Biopol. 19 (1980) 523.
[12] R.T. Simpson, D.W. Stafford, Proc. Natl. Acad. Sci. (USA) 80 (1983) 51.
[13] K.J. Polach, J. Widom, J. Mol. Biol. 258 (1996) 800.
[14] C.C. Adams, J.L. Workman, Mol. Cell. Biol. 15 (1995) 1405.
[15] K.E. van Holde, Chromatin, Springer, New York, 1989.
[16] D. Pruss, B. Bartholomew, J. Persinger, J. Hayes, G. Arents, E.N. Moudrianakis, A.P. Wolffe, Science 274 (1996) 614.
[17] F. Caron, J.O. Thomas, J. Mol. Biol. 146 (1981) 513.
[18] M.J. Behe, Nucleic Acids Res. 23 (1995) 689.
[19] V. Iyer, K. Struhl, EMBO J. 14 (1995) 2570.
[20] Z. Zhu, D.J. Thiele, Cell 87 (1996) 459.
[21] J. Hayes, J. Bashkin, T.D. Tullius, A.P. Wolffe, Biochem. 30 (1991) 8434.
[22] J. Widom, J. Mol. Biol. 259 (1996) 579.
[23] P.T. Lowary, J. Widom, Proc. Natl. Acad. Sci. (USA) 94 (1997) 1183.
[24] S. Pennings, G. Meersseman, M.E. Bradbury, J. Mol. Biol. 220 (1991) 101.
[25] K. Ura, J.J. Hayes, A.P. Wolffe, EMBO J. 14 (1995) 3752.
[26] H.-P. Feng, D.S. Scherl, J. Widom, Biochem. 32 (1993) 7824.