A domain model for eukaryotic DNA organization: A molecular basis for cell differentiation and chromosome evolution


J. theor. Biol. (1988) 132, 479-507

JOHN W. BODNAR

Department of Biology, Northeastern University, 360 Huntington Avenue, Boston MA 02115, U.S.A.

(Received 18 December 1987, and in revised form 15 March 1988)

A model for eukaryotic chromatin organization is presented in which the basic structural and functional unit is the DNA domain. This simple model predicts that both chromosome replication and cell type-specific control of gene expression depend on a combination of stable and dynamic DNA-nuclear matrix interactions. The model suggests that in eukaryotes, DNA regulatory processes are controlled mainly by the intranuclear compartmentalization of the specific DNA sequences, and that control of gene expression involves multiple steps of specific DNA-nuclear matrix interactions. Predictions of the model are tested using available biochemical, molecular and cell biological data. In addition, the domain model is discussed as a simple molecular mechanism to explain cell differentiation in multi-cellular organisms and to explain the evolution of eukaryotic genomes consisting mainly of repetitive sequences and "junk" DNA.

0022-5193/88/120479+28 $03.00/0 © 1988 Academic Press Limited

Introduction

Study of eukaryotic molecular biology has drawn heavily on studies of prokaryotic molecular biology. Data and models derived from simple prokaryotic systems are extrapolated to the more complex eukaryotic systems. While this approach has been extremely productive, it has its limitations: many cellular functions are sufficiently different that such extrapolation is not possible, and in these cases one may obtain misleading results if the prokaryotic model is applied. I believe that one of these cases has been the application of models for prokaryotic gene function (in particular the operon model) more or less directly to the eukaryotic genome. While the operon model has many parallels to the mechanisms of eukaryotic gene regulation, many properties of eukaryotic DNA are completely beyond the ability of the operon model to explain. Among these are cell type-specific regulation of genes, the presence of repetitive DNA and "junk" DNA, and gene regulatory elements such as enhancers.

I suggest that the tremendous difference in genome size and cell or nuclear volume between prokaryotes and eukaryotes requires that the underlying principles behind DNA regulatory processes also be very different between these two groups. On this basis, I have developed a model for eukaryotic DNA organization which is consistent with the size of the eukaryotic genome and nucleus. This model suggests that the basic structural and functional unit of eukaryotic DNA is the DNA domain, a loop of approximately 50,000 base pairs of DNA which is stably associated at either end with the nuclear matrix.

I discuss this domain model first by presenting the underlying assumptions on which it is based, then by describing how the DNA domain can serve as the functional basis for both chromosome replication and cell type-specific gene regulation. I then test predictions that can be drawn from this model using available biochemical, molecular and cell biological data, concentrating on data which cannot easily be explained by extrapolation of the operon model. Finally, I demonstrate that the domain model is a simple way to explain, on a molecular level, both cell differentiation in multi-cellular eukaryotes and evolution of a eukaryotic genome consisting mainly of repetitive sequences and "junk" DNA.

Assumptions

A cell of the bacterium E. coli has a volume of about 1 µm³ and contains a DNA genome of about 4 × 10^6 base pairs; a typical human cell has a nuclear volume of about 500 µm³ and contains about 3 × 10^9 base pairs of DNA (Darnell et al., 1986). With a 500-fold difference in size and a thousand-fold difference in DNA content, one would expect that DNA-protein interactions must be markedly different in the two types of cells. Lin & Riggs (1975) measured the binding of the lac repressor to operator and non-operator DNA; they concluded that, given the lac repressor protein's ability to discriminate between operator and non-operator DNA, it could not function in a eukaryotic cell because the concentration of non-operator DNA would be too high to allow the repressor to find its specific DNA binding site. Therefore, the ability of a DNA-binding protein to find and associate with a specific DNA site in a eukaryotic cell must be driven by mechanisms other than simple diffusion.

I suggest that there are several specific ways in which the mechanisms of DNA-protein interactions differ between prokaryotes and eukaryotes. These are summarized in Table 1. While these can ultimately be tested, they will be simply stated here without proof. These assumptions will then be used to build a model for eukaryotic DNA organization.

TABLE 1
Proposed mechanistic differences between prokaryotic and eukaryotic gene regulation

                                 Prokaryotes                    Eukaryotes
DNA-protein interactions         Diffusion-limited              Compartment-limited
Regulatory DNA sequences         Few per gene                   Hundreds per gene
Regulatory DNA-binding factors   Many types per cell            Few types per cell (used combinatorially)
                                 Few copies of each per cell    Many copies of each per cell

(I). Eukaryotic DNA-protein interactions are compartment-limited. In a bacterial cell, DNA-binding proteins can find their specific binding sites by simple diffusion throughout the cell volume (Lin & Riggs, 1975). In eukaryotic cells, the ability of a regulatory protein to find its specific binding site is driven by compartmentalizing that protein and its DNA binding site in the same region of the nucleus. This raises the local concentration of both protein and specific DNA sequences to allow efficient binding, in much the same way as bacterial nucleotide-synthesizing enzymes raise the local concentrations of nucleotide intermediates by forming a macromolecular complex (Allen et al., 1983). Although a particular DNA sequence element may be found thousands of times in a eukaryotic genome, only those copies of that element which are found in the same intranuclear compartment as the appropriate regulatory factor can be recognized. Physically, this compartmentalization occurs by the binding of both DNA and regulatory proteins to specific regions of the cellular nuclear matrix.

(II). Hundreds of copies of DNA regulatory sequence elements are found in each eukaryotic gene. In the bacterial operon, regulatory sequence elements (usually in single copies) are localized in the DNA of the promoter region (Beckwith & Zipser, 1970). While the last steps in activation of a eukaryotic gene involve similar promoter-specific interactions, these interactions are only the final ones in a series of regulatory sequence binding steps which occur throughout the entire gene and its flanking DNA. There are hundreds of specific DNA regulatory elements throughout an entire eukaryotic gene. This allows a particular type of DNA regulatory element to be recognized by sheer numbers. If there are five hundred copies of a particular DNA-binding site for a regulatory protein in a given gene, they will be much more easily located than a single copy, and if only four hundred of the five hundred are bound, that will be sufficient to drive the gene activation. This means that the bulk of most eukaryotic gene sequences will be composed of various sorts of repetitive DNA elements.

(III). Few different types of regulatory DNA-binding proteins are found in a particular eukaryotic cell, and they are used combinatorially for gene regulation. In a bacterial operon there is typically one regulatory (repressor) gene found for control of the other three or four genes in the operon (Beckwith & Zipser, 1970). This means that for the few thousand genes in a bacterial cell (Darnell et al., 1986) there are probably several hundred regulatory genes. If this model is applied to a eukaryotic cell, one would expect to find about 25,000 regulatory genes for the approximately 100,000 genes in a human cell (Darnell et al., 1986). I propose instead that each eukaryotic organism has only a few hundred regulatory genes and that only a few (probably 20 or 30) of these are expressed in any one cell. These are used in combinations within a particular gene for regulation. The advantages and mechanisms for combinatorial regulation (Gierer, 1973; Alberts et al., 1983; Weintraub, 1985; Yamamoto, 1985) are discussed in detail below. This means that there is a limited variety of regulatory genes (e.g. oncogenes, transactivators, hormone receptors, etc.) in any eukaryotic organism.

(IV). Many copies of each regulatory gene product are produced in a eukaryotic cell. In a bacterial cell about ten copies of the lac repressor protein are produced to regulate its operon (Lin & Riggs, 1975). In a eukaryotic cell tens of thousands of copies of each regulatory gene product may be produced (Yamamoto & Alberts, 1976). This assumption follows from Assumptions II and III above. To find and recognize the hundreds of DNA regulatory sequences that are required to activate a gene (Assumption II), thousands of copies of regulatory proteins are made to drive the required DNA-protein interactions. However, the investment of energy into production of regulatory proteins is minimized in each cell since only a few different kinds of regulatory factors are produced in each cell (Assumption III). Therefore, while a bacterial cell makes only a few copies each of many kinds of regulatory proteins, a eukaryotic cell makes many copies of each of a few kinds.

A Domain Model for Eukaryotic DNA Structure and Function

The domain model for eukaryotic DNA structure and function is built upon several previous models for eukaryotic DNA organization which have indicated that eukaryotic DNA is organized into large DNA loop domains (Dingman, 1974; Paulson & Laemmli, 1977; Berezney, 1979; Dijkwel et al., 1979; Bouvier et al., 1980; Lawson et al., 1980; McCready et al., 1980; Pardoll et al., 1980; Stalder et al., 1980; Zuckerkandl, 1981; Hancock, 1982; LaFond & Woodcock, 1983; Cook & Lang, 1984; Mirkovitch et al., 1984; Small et al., 1985; Holmquist & Caston, 1986; Gross & Garrard, 1987; Watson & Gralla, 1987). The major addition made here to those models is the hypothesis that the entire DNA domain can serve as a functional unit for gene expression through multiple sequence-specific interactions of the DNA with the nuclear matrix.

The DNA domain as defined here is a single DNA loop which is stably bound to the eukaryotic nuclear matrix or chromosome scaffold. Several investigators have shown that eukaryotic DNA is bound tightly to the nuclear matrix during interphase and to a chromosome scaffold during metaphase in long DNA loops which have a size of approximately 50,000 base pairs (see Bodnar et al., 1983 and references therein). These DNA loops are most likely anchored by proteins which are tightly bound to the DNA at the sites of attachment (Spieb et al., 1982; Bodnar et al., 1983). Note that this domain size of 50,000 base pairs is merely an average; a DNA domain may be as small as 5,000 base pairs for the Drosophila histone locus (Mirkovitch et al., 1984) or as large as 2,000,000 base pairs for the human gene for Duchenne muscular dystrophy (Koenig et al., 1987).

The basic features of the domain model are shown in Fig. 1 and Table 2. DNA domains in the interphase nucleus are attached stably at their ends to the peripheral nuclear matrix by proteins tightly bound both to the DNA and the nuclear matrix. These DNA-nuclear matrix interactions are sequence-specific and remain throughout the cell cycle to provide the overall DNA organization for the cell. Most of the DNA domains in any given cell are in a condensed configuration where the DNA is packaged into nucleosomes, solenoids, and higher order structures. In these domains the DNA is attached to the nuclear matrix only at the domain ends, and the condensed DNA is too tightly packaged to allow transcription of any included genes. A portion of the DNA domains are extended into the fibrillar internal nuclear matrix by multiple sequence-specific DNA-protein interactions. These extended DNA domains represent active genes in which the chromatin structure is altered by the fact that these domains are bound at many dynamic sites to the internal nuclear matrix.

As summarized in Table 2, this hypothesis suggests that there are two basic kinds of DNA-nuclear matrix interactions: (1) stable sites at domain ends and (2) dynamic sites throughout active DNA domains. With this simple structural organization the cell can regulate both replication and gene expression. The schematic in Fig. 1 represents a nucleus "frozen" at a specific point in the cell cycle; the dynamic nature of the domain model is more evident when the mechanisms for DNA replication and gene expression are discussed in more detail.

FIG. 1. The domain model for eukaryotic DNA organization. The DNA is organized into loop domains by stable attachment to the nuclear matrix at approximately 50,000 base pair intervals. Most domains are condensed (as Domain I) into higher order chromatin structures. The DNA of active domains is extended (as Domain II) by multiple sequence-specific dynamic associations with the nuclear matrix. The figure marks stable nuclear matrix attachment sites, dynamic nuclear matrix attachment sites, and active gene sequences with distinct symbols.

TABLE 2
Predictions about eukaryotic DNA sequence organization in domain model

Stable attachment sites                                Dynamic attachment sites
(1) Long range (domain ends)                           (1) Multiple throughout domain
(2) Stable (covalently or tightly bound proteins)      (2) Labile (low affinity)
(3) Clustered with DNA replication origins,            (3) Found everywhere
    enhancers, and topoisomerase sites
(4) Sequences may play dual role in replication        (4) Promoter may be anywhere in gene
    and transcription

THE DOMAIN MODEL AND DNA REPLICATION

The DNA domain as a basis for DNA replication is becoming an accepted model which has been refined by several investigators (Dingman, 1974; Dijkwel et al., 1979; McCready et al., 1980; Pardoll et al., 1980; Aelen et al., 1983; Cook & Lang, 1984; Jackson & Cook, 1986). The major source of controversy here is whether the DNA replication origins are bound stably or dynamically to the nuclear matrix. I suggest that both types of origins may be found in the same cell.

As shown in Fig. 2, DNA replication (of a condensed domain) begins with recognition of a DNA replication origin which is close to the stable nuclear matrix attachment sites at the domain ends. The initiation complex, DNA polymerase, and associated enzymes are gathered together on the nuclear matrix where polymerization begins. The DNA replication complexes remain nuclear matrix-bound and the DNA is "reeled through" as it is replicated (Dijkwel et al., 1979; Pardoll et al., 1980). Higher order chromatin structures are disrupted before the replication fork and reform as soon as the fork passes. In this way the bulk of the DNA is found in its normal chromatin packaging, and after a domain is replicated the two daughter domains are again packaged into chromatin adjacent to each other on the nuclear matrix. The daughter chromosomes are then segregated by separation on the nuclear matrix of the proteins at the base of the DNA loops.

FIG. 2. The domain model and DNA replication (adapted from Dingman, 1974). The condensed DNA domains are bound to the nuclear matrix at stable attachment sites. The replication complexes, containing initiation factors, DNA polymerase, etc., are also matrix-bound and recognize the replication origins by scanning the DNA near the attachment sites. Replication proceeds in association with the nuclear matrix where chromatin is unfolded ahead of the replication complexes; the DNA is "reeled through" the replication complexes (in the direction of the arrows), and the parental and daughter DNA strands refold after the complexes pass. The stable attachment sites remain throughout the replication process and maintain the replication origins and transcriptional "uncoiling" sequences near the nuclear matrix throughout the cell cycle. Chromosome segregation is then accomplished by separation of the parental and daughter stable attachment sites along the nuclear matrix.

Note that in this model the stable attachment sites at the DNA domain ends always remain nuclear matrix-associated. This means that the DNA replication origins are nuclear matrix-bound throughout the entire cell cycle, ready for recognition by matrix-bound initiation factors. The "reeling through" of the DNA during replication occurs at the replication forks, and this transient dynamic association is lost when the two replication forks collide and the replication complexes are released from the DNA.

The advantage to the cell of this type of replicon organization is that it is modular and is essentially independent of genome size. Since the replication origins are recognized by scanning stable attachment sites, the huge amounts of other DNA will not compete for the initiation factors. In addition, mechanistically it is as straightforward to replicate and segregate hundreds of thousands of domains as it is the few domains shown in Fig. 2.

While most of the DNA domains in the cell are condensed and most probably replicated as described above with stable replication origins, the presence of a fraction of the DNA domains in an extended configuration suggests that some replicons may have replication origins which are dynamically bound to the nuclear matrix. If the intracellular location of a DNA sequence is as important as its nucleotide sequence for recognition by DNA replication initiation factors, and if DNA-nuclear matrix interactions change when a gene is activated, then one would expect that domains will have different replication origins when they are extended rather than condensed (i.e. that control of replication of genes is different when they are active). This specific example is discussed below.
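The modularity argument can be illustrated with simple arithmetic on figures quoted in the text: genome sizes of about 4 × 10^6 and 3 × 10^9 base pairs and an average domain of about 50,000 base pairs. The sketch below is only a back-of-the-envelope aid; the function names and the 1,000 base pair scanning window near each stable attachment site are assumptions made for illustration, not values given by the model.

```python
# Approximate figures quoted in the text.
ECOLI_GENOME_BP = 4e6        # E. coli genome, base pairs
HUMAN_GENOME_BP = 3e9        # human DNA content, base pairs
AVERAGE_DOMAIN_BP = 50_000   # average loop domain size, base pairs

# ASSUMPTION (not from the text): initiation factors scan a fixed window
# of DNA next to each stable attachment site.
SCAN_WINDOW_BP = 1_000


def domain_count(genome_bp, domain_bp=AVERAGE_DOMAIN_BP):
    """Number of average-sized loop domains needed to cover a genome."""
    return round(genome_bp / domain_bp)


def scanned_fraction(domain_bp=AVERAGE_DOMAIN_BP, window_bp=SCAN_WINDOW_BP):
    """Fraction of one domain that lies within the scan window of its two ends."""
    return min(1.0, 2 * window_bp / domain_bp)


if __name__ == "__main__":
    print("human domains:", domain_count(HUMAN_GENOME_BP))
    print("fraction of each domain scanned:", scanned_fraction())
```

With these inputs the human DNA content divides into roughly 60,000 average-sized domains, and under the assumed window only about 4% of each domain would be scanned for origins. That fraction depends on the domain size and the window, not on the total genome size, which is the sense in which the organization is modular.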

THE DOMAIN MODEL AND GENE EXPRESSION

The DNA domain can also serve as the functional unit of eukaryotic gene expression, as shown in Fig. 3. Here, activation of transcription is directed by three distinct sequential steps which will be termed uncoiling, extension, and promoter recognition. Only those genes which have undergone all three steps in a particular cell are actually transcribed.

FIG. 3. The domain model and gene expression. A. Gene activation begins with a condensed DNA domain stably bound to the nuclear matrix, and the activation process proceeds through three sequential steps: uncoiling, extension, and promoter recognition. Uncoiling involves recognition of DNA sequences near the stable attachment sites followed by a "loosening" of the higher order chromatin structure of the entire domain (most likely through changes in supercoiling). In the extension step, multiple transcription factors recognize specific DNA sequences throughout the DNA domain, unravel the chromatin, and bind the chromatin to the fibrillar internal nuclear matrix at many dynamic attachment sites. Finally, the promoter elements are localized on the nuclear matrix, recognized by the appropriate transcription factors, and transcription is initiated. B. Chromatin can be divided into condensed "inactive" domains, uncoiled and/or extended "active" domains, and extended domains which are actually transcribed.

The first step in gene activation is an uncoiling step which changes the chromatin structure of the entire DNA domain. This begins with the DNA condensed into loop domains as before. Sequences adjacent to the stable attachment sites are accessible to nuclear matrix-bound factors which can bind specific DNA sequences in these regions. This specific DNA binding "loosens" the chromatin structure of the entire loop domain (most probably through changes in supercoiling).

The second step in the gene activation process is an extension of the uncoiled chromatin throughout the internal nuclear matrix fibrils by another class of transcription factors. I suggest that there are hundreds of specific DNA sequences which are recognized by thousands of factors at this step to bind the DNA throughout the nuclear matrix (at sites a few hundred base pairs apart). In the condensed configuration these DNA sequences are inaccessible for binding, but in the uncoiled DNA domains they can be recognized, and the sequential binding of hundreds of factors can unravel the higher order chromatin structures, allowing extension and opening up of the chromatin structure throughout the entire domain. Demethylation of the DNA throughout the domain may aid this process (see below).
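The sequential logic of the three activation steps can be sketched as a simple gate. This is a deliberately minimal illustration: the class, the function, and the factor labels are hypothetical, and each step is reduced to a single required factor, whereas the model proposes that extension involves hundreds of binding events.

```python
from dataclasses import dataclass

# Activation steps in the order the model requires them, each paired with
# the state a domain reaches once that step has succeeded.
STEPS = (
    ("uncoiling", "uncoiled"),
    ("extension", "extended"),
    ("promoter_recognition", "transcribed"),
)


@dataclass
class Domain:
    """A DNA loop domain (hypothetical representation for illustration)."""
    name: str
    required_factors: dict  # step name -> factor label needed for that step


def activation_state(domain, cell_factors):
    """Return the furthest state the domain reaches in a cell that
    expresses the given set of factor labels."""
    state = "condensed"
    for step, reached in STEPS:
        if domain.required_factors[step] not in cell_factors:
            break  # a missing factor blocks this step and every later one
        state = reached
    return state


# Hypothetical domain whose three steps need factors U1, E1 and P1.
example = Domain(
    name="example domain",
    required_factors={
        "uncoiling": "U1",
        "extension": "E1",
        "promoter_recognition": "P1",
    },
)

if __name__ == "__main__":
    for factors in ({"U1", "E1", "P1"}, {"U1", "E1"}, {"E1", "P1"}):
        print(sorted(factors), "->", activation_state(example, factors))
```

A cell holding all three factors transcribes the gene; a cell lacking only the promoter factor leaves the domain extended ("active") but untranscribed; and a cell lacking the uncoiling factor leaves it condensed even if the later factors are present, because the steps are sequential.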


This extension step is the major difference between this domain model and previous ones in which the DNA domain is the unit of gene regulation. I suggest that hundreds or thousands of specific DNA-protein interactions throughout the entire DNA domain work co-operatively to unravel the chromatin structure. Recognition of any of these types of sites anywhere in the domain can start the extension process by bringing the DNA to the nuclear matrix and opening up a small region of chromatin. The proximity of the adjacent recognition sequences to the matrix then can initiate a sequential binding of nearby sites, followed by an overall extension of the entire domain along the matrix. This type of extension step will have profound effects on the chromatin structure of "active genes"; these are discussed in detail below.

Finally, nuclear matrix-bound transcription factors can recognize the promoter of an extended DNA domain to initiate transcription. The extension of DNA through the nuclear matrix allows specific DNA sequences near the gene (i.e., within a few kilobases) to be brought together with the appropriate transcription factors for recognition of upstream control elements (UCEs) and promoters and for initiation of transcription. Unless a cell contains transcription factors for the specific DNA sequences in a domain at each step, genes in that domain will not be transcribed. In this model condensed DNA domains are "inactive" while uncoiled or extended DNA domains are "active" (i.e. able to be transcribed), and promoter recognition of extended ("active") domains completes the process to begin transcription.

Overall chromosome structure varies among eukaryotes. For example, vertebrate chromosomes lose their recognizable structure during interphase, and heterochromatin is localized mainly at the nuclear periphery (Brown, 1966); however, Drosophila polytene chromosomes retain a recognizable structure throughout interphase, and the chromosome scaffolds run through the nuclear interior (Hochstrasser & Sedat, 1986a, 1986b and references therein). I will describe the domain model here using the vertebrate nuclear organization. This model can also be applied to Drosophila with minor modification where the interphase DNA domain organization is inside out from that of vertebrates. I suggest that the Drosophila interphase chromosome scaffolds correspond to the vertebrate peripheral nuclear matrix, and that the chromosome puffs extending out from Drosophila chromosomes correspond to the extended DNA domains which stretch in from the vertebrate nuclear membrane.

Test of the Domain Model with Available Biological Data

The value of any biological model is in its ability to make experimentally testable predictions and to provide insight into mechanisms of the processes it concerns. While the domain model is simple to describe, it can do both of these things at several different levels. In this section I will describe predictions of the domain model as it applies to the biochemistry, molecular biology, and cell biology of eukaryotic DNA organization, and then describe how these predictions are consistent with data currently available in the literature. These will be presented as follows: (1) Prediction; (2) Rationale for that prediction; and (3) Data from the literature that support the hypothesis. I will concentrate only on those areas which are different between prokaryotes and eukaryotes (i.e. difficult to explain by the operon model). A summary of the predictions on DNA sequence organization versus nuclear matrix attachment is shown in Table 2.

STABLE ATTACHMENT SITES ON THE NUCLEAR MATRIX ARE MEDIATED BY PROTEINS COVALENTLY OR TIGHTLY BOUND TO THE DNA

Rationale. The stable attachment sites serve as an overall structural organization for the DNA, and therefore the proteins involved should be bound tightly both to the DNA and to the nuclear matrix. The intractability of this class of proteins may be the major stumbling block to isolating and characterizing the stable attachment sites for cellular DNA.

Data. Proteins have been isolated which are bound tightly to the DNA of several eukaryotic cell types and to HSV-1 DNA (Razin et al., 1981; Spieb et al., 1982; Bodnar et al., 1983; Wu et al., 1979; Hyman, 1980) and which (while not covalently bound) remain associated with the DNA in the presence of SDS and reducing agents. The tightly bound proteins associated with HeLa cell DNA are attached to the DNA at stable nuclear matrix attachment sites, and the spacing of the proteins along the DNA is constant over the cell cycle (Bodnar et al., 1983). Proteins are found covalently bound to the DNA of several viruses including adenovirus and Minute Virus of Mice (MVM), a murine parvovirus (Rekosh et al., 1977; Astell et al., 1982; reviewed by Wimmer, 1982). The stable sites of interaction of adenovirus and MVM DNAs with the nuclear matrix are in the restriction fragments attached to the covalently bound proteins (Bodnar & Ward, in preparation).

The tightly or covalently bound proteins as a class are extremely hydrophobic and insoluble, as one would expect for nuclear matrix-associated proteins, and extreme care must be taken to ensure that they do not cause the loss of the DNA they bind during purification procedures. DNA-protein complexes of this type tend to stick to glass, aggregate, or precipitate unless kept in detergent solutions. Additionally, the proteins tightly bound to mammalian DNA are extremely resistant to protease digestion (Spieb et al., 1982 and references therein; Bodnar et al., 1983). If DNA is isolated by conventional methods using proteases, detergents, and phenol, and if the DNA fragments isolated are small, the residual peptides can cause the DNA fragments they bind to be lost during phenol extractions (Bodnar et al., 1983).

PROTEINS COVALENTLY OR TIGHTLY BOUND TO EUKARYOTIC DNA ARE LOCALIZED NEAR DNA REPLICATION ORIGINS

Rationale. The stable attachment sites are used to localize DNA replication origins on the nuclear matrix, and therefore proteins involved in stable association of DNA to the nuclear matrix are attached to DNA near DNA replication origins.

Data. The proteins tightly bound to HSV-1 DNA are bound to the inverted terminal repeats which contain the OriS DNA replication origin (Wu et al., 1979; Stow & McMonagle, 1983). The terminal proteins covalently bound to adenovirus and MVM DNA termini are near the DNA replication origins for these viral DNAs, and both proteins have been implicated in initiation of DNA replication (Rekosh et al., 1977; Astell et al., 1982).

EACH LOOP DOMAIN REPRESENTS AN INDIVIDUAL REPLICON

Rationale. Since there should be a DNA replication origin at each stable attachment site, each loop will be replicated from origins at either end and will represent an individual unit of DNA replication.

Data. Average replicon length in eukaryotic cells is the same size as the length of DNA domains (Edenberg & Huberman, 1975; Bodnar et al., 1983 and references therein). In cells of several organisms the average replicon length has been compared to the average size of a DNA loop domain, and the two are the same in each cell type (Buongiorno-Nardelli et al., 1982).

MOST EUKARYOTIC DNA REPLICATION ORIGINS ARE STABLY BOUND TO THE NUCLEAR MATRIX THROUGHOUT THE CELL CYCLE

Rationale. The stable attachment sites remain throughout the cell cycle; therefore most DNA replication origins remain nuclear matrix-bound also.

Data. Several investigators using several different techniques have shown that the bulk of the DNA replication origins are stably bound to the nuclear matrix throughout the cell cycle in Physarum polycephalum, Xenopus laevis, chicken, and hamster (Aelen et al., 1983; Carri et al., 1986; Razin et al., 1986; Dijkwel et al., 1986).

SOME EUKARYOTIC DNA REPLICATION ORIGINS ARE DYNAMICALLY BOUND TO THE NUCLEAR MATRIX AND THEIR MODE OF REPLICATION CHANGES DEPENDING ON WHETHER THE GENES IN THOSE DOMAINS ARE INACTIVE OR ACTIVE

Rationale. When a gene is activated, its nuclear matrix association changes from being only at its domain ends to sites throughout the domain. Therefore, if nuclear matrix attachment is as important as actual DNA sequence in determining origins, then the DNA replication origins would change from being at domain ends when a gene is inactive to being at multiple points throughout the domain when the gene is active.

Data. In general, inactive genes replicate late in S phase; active and housekeeping genes replicate early in S, and often genes change from late- to early-replicating when activated (Goldman et al., 1984). In particular, the mouse immunoglobulin heavy-chain constant region locus is a domain of over 300 kilobase pairs which replicates as a single replicon late in S phase when inactive, but shifts to earlier replication from apparently several replication origins when activated (Brown et al., 1987).

DNA REPLICATION ORIGINS, ENHANCERS, AND TOPOISOMERASE SITES ARE CLUSTERED IN SMALL DNA SEGMENTS NEAR STABLE ATTACHMENT SITES

Rationale. Since the recognition of DNA replication origins and the first steps in activation of a domain for gene expression require compartmentalization of proteins and specific DNA sequences at the stable attachment sites, one would expect the DNA sequences required for those functions to be clustered in a small region of the DNA domain near its ends. Data. To date, studies with eukaryotic cells are suggestive but inconclusive. Replication origin association with stable sites was discussed above. The mouse immunoglobin kappa chain enhancer is in a restriction fragment which contains both a stable matrix attachment site and several copies of the Topoisomerase II DNA recognition sequence (Cockerill & Garrard, 1986). Topoisomerase II has been localized to the base of DNA loops in chromosome scaffolds and to the nuclear matrix in interphase cells (reviewed by North, 1985 and by Nelson et al., 1986). The clustering of regulatory sites is much more recognizable in the DNA's of several animal viruses which replicate their DNA in the nucleus. In these cases, it is the immediate early (IE) genes, those transcribed immediately after viral infection by host factors, which have their enhancers and promoters in the cluster. For example, in the papovaviruses SV40 and polyoma the 5 kiiobase pair circular DNA contains a 400 base pair regulatory sequence which contains the DNA replication origins, all viral promoters and enhancers, and the strongest Topoisomerase II nicking site (reviewed by Chambon et al., 1984 and in Tyndall et al., 1981; Yang et aL, 1985). 
In the linear 36 kilobase pair adenovirus DNA genome, the last kilobase pair segment on each end contains the DNA replication origins, the promoter and enhancer for the IE genes (E1A and E1B), strong Topoisomerase I nick sites, the covalent terminal protein attachment sites, and the stable nuclear matrix attachment sites (reviewed by Futterer & Winnacker, 1984, and Shenk & Williams, 1984; Chow & Pearson, 1984; Rekosh et al., 1977; Bodnar & Ward, in preparation). In the linear 5 kilobase pair MVM DNA genome, the terminal restriction fragments contain the DNA replication origins, the promoter for the regulatory (NS-1) gene, specific DNA sites for nicking-closing steps in DNA replication, the covalent terminal protein attachment sites, and the stable nuclear matrix attachment sites (Astell et al., 1982 and references therein; Bodnar & Ward, in preparation). The HSV-1 genome is a linear 140 kilobase pair DNA composed of two fused segments, each with an inverted DNA terminal repeat. The sequences in (or very near) these inverted repeats contain the required DNA replication origin (OriS), the promoters for 4 of 5 immediate early (IE) genes (ICP-0, -4, -22, and -47), and the sites of attachment of the tightly bound proteins (reviewed by Roizman, 1979; Wu et al., 1979 and references therein; Stow & McMonagle, 1983; Sacks & Schaffer, 1987 and references therein; Hyman, 1980).

DNA SEQUENCES MAY PLAY A DUAL ROLE IN REGULATION OF BOTH DNA REPLICATION AND TRANSCRIPTION

Rationale. Since regulatory sites for both DNA replication and gene regulation are clustered at the stable attachment sites, one would expect that the chromatin alterations involved in both processes may be similar, and that sequences used in the processes may have evolved to be required for both replication and gene activation. Data. Experiments with polyoma and bovine papillomavirus indicate a requirement of enhancers for DNA replication (Muller et al., 1983; Veldman et al., 1985; Stenlund et al., 1987). Additionally, several transcriptional regulatory DNA sequence elements are identical to DNA replication elements or bind the same protein factors. For example, Nuclear Factor I (NF-I), which regulates adenovirus replication, is identical to a protein which binds the CAAT transcriptional signal (Jones et al., 1987). The conserved octanucleotide which is found in immunoglobulin enhancers (Falkner et al., 1986; Parslow et al., 1987) is the same sequence element as that bound by NF-III, an adenovirus replication regulator (Pruijn et al., 1986). Finally, the GC box in the SV40 21 base pair repeat (which binds the transcription factor Sp1) modulates both transcription and replication of that viral DNA (Bergsma et al., 1982; Hartzell et al., 1983; Kadonaga et al., 1986).

EACH DNA DOMAIN IS A SEPARATE FUNCTIONAL UNIT FOR CONTROL OF DNA SUPERCOILING: UNCOILING CAN SERVE AS THE FIRST STEP IN GENE ACTIVATION

Rationale. The stable attachment sites at the ends of a domain serve as anchor points for the DNA loop, which can both allow the loop to maintain superhelicity and insulate that loop from superhelicity in adjacent loops. Therefore, stable attachment at domain ends allows each DNA domain to be supercoiled independently, and changes in superhelicity can "uncoil" the chromatin structure of the entire DNA domain. Data. Eukaryotic chromosomes are maintained in domains of supercoiling which are the same size as the DNA loop domains (Hartwig, 1978; Benyajati & Worcel, 1976; Cook & Brazell, 1976). The "halo" of DNA loops around nuclear matrices isolated with no DNA breakage can be expanded or contracted by varying the ethidium bromide concentration, indicative of maintenance of DNA supercoiling throughout these structures (Vogelstein et al., 1980). Additionally, torsional stress promotes DNAse I sensitivity of active genes (Villeponteau et al., 1984) and is required for the maintenance of the chromatin structure of active genes (Luchnik et al., 1982; Ryoji & Worcel, 1985).

THE DYNAMIC ATTACHMENT SITES ARE MULTIPLE SPECIFIC DNA SEQUENCE ELEMENTS FOUND THROUGHOUT THE ENTIRE DNA DOMAIN

Rationale. The extension of DNA throughout the nuclear matrix depends on attachment at many sites (probably hundreds), which are found every few hundred base pairs, to open the chromatin structure and to bring the promoter into the proper nuclear compartment for recognition. These are specific DNA sequences bound by a specific transcription factor or hormone receptor. This means that the regulatory sequences in this step are the most prevalent oligonucleotides in a particular DNA domain. Data. The nuclear matrix has been implicated in RNA transcription as well as replication. Both transcribed genes and RNA synthetic complexes have been found associated with the nuclear matrix (reviewed by Nelson et al., 1986). By computer analysis, the most highly recurring sequence elements (HRSE's) can be identified in a DNA sequence, and in the test sequences analyzed (SV40, HSV-1, metallothionein, MMTV, and the ovalbumin locus), the HRSE's identified by computer analysis were homologous with sequence elements recognized by transcriptional regulatory factors (Bodnar & Ward, 1987). For example, in the ovalbumin locus the HRSE determined by computer analysis was a 9 of 10 match to the sequence element recognized by the progesterone receptor. A 9/10 match of this HRSE was found 134 times in the 13 kilobases of the ovalbumin DNA sequences available. The HRSE found in Mouse Mammary Tumor Virus (MMTV) sequences contained the glucocorticoid receptor binding site with a single mismatch; the MMTV HRSE was found with a single mismatch 85 times in 8400 base pairs (Bodnar & Ward, 1987).

SOME TRANSCRIPTION FACTORS WILL BIND WITH HIGH AFFINITY AT PROMOTER SEQUENCES BUT WITH LOW AFFINITY EVERYWHERE IN THE REGULATED DNA DOMAIN

Rationale. Since the extension step is driven by concerted interaction of thousands of regulatory proteins acting at hundreds of sites, the specific DNA sequence elements recognized would probably be high affinity (i.e. an exact match for the sequence element) near promoters to ensure they are nuclear matrix-bound, but could be low affinity (i.e. with one or more mismatches) elsewhere. Their total effect on extension would be similar to Velcro (i.e. hundreds of weak interactions giving an overall strong interaction). Note that transcription complexes could easily pass through these genes since the few binding sites disrupted as the complexes pass could be "zippered up" afterwards. Also, since the HRSE's bound by these factors are on average about 100 base pairs apart, the binding at the HRSE's would appear to be non-specific in laboratory experiments (i.e. every region of a domain would bind the factors with an affinity higher than control DNA). Data. Work with steroid receptor action indicates such high affinity/low affinity binding by the activated receptors throughout the entire DNA domain. Early studies with the steroid receptors indicated that they open up chromatin by high affinity binding of DNA near promoters but with low affinity "non-specific" binding elsewhere (Yamamoto & Alberts, 1976). Recent studies with MMTV indicate that the glucocorticoid receptor binds with high affinity to specific sequences near the MMTV promoter but with low affinity in other restriction fragments throughout the genome (Payvar et al., 1982, 1983; reviewed by Yamamoto, 1985). This low affinity binding is evident even in rodent DNA sequences flanking integrated MMTV genomes (Geisse et al., 1982).
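
The distinction drawn above between exact (high affinity) and mismatched (low affinity) copies of a sequence element can be illustrated with a minimal Python sketch. It is not the program used by Bodnar & Ward (1987); the function names `find_matches` and `mean_spacing`, the toy sequence, and the 4-base element are hypothetical and chosen only for illustration. The only figures taken from the text are the site counts and sequence lengths quoted for the ovalbumin locus and MMTV.

```python
def find_matches(sequence: str, element: str, max_mismatches: int = 0) -> list[int]:
    """Return 0-based start positions where `element` occurs in `sequence`
    with at most `max_mismatches` substitutions (no insertions or deletions)."""
    seq, elem = sequence.upper(), element.upper()
    k = len(elem)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        mismatches = sum(1 for a, b in zip(window, elem) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


def mean_spacing(region_length_bp: int, n_sites: int) -> float:
    """Average distance in base pairs between sites spread over a region."""
    return region_length_bp / n_sites


if __name__ == "__main__":
    # Hypothetical toy data, not a real regulatory sequence.
    toy_sequence = "AAACGTAAACGAAA"
    toy_element = "ACGT"

    # Exact copies stand in for "high affinity" sites.
    print(find_matches(toy_sequence, toy_element, 0))  # [2]
    # Allowing one mismatch adds the "low affinity" copies.
    print(find_matches(toy_sequence, toy_element, 1))  # [2, 8]

    # Site densities quoted in the text.
    print(round(mean_spacing(13_000, 134)))  # ovalbumin locus: 97
    print(round(mean_spacing(8_400, 85)))    # MMTV: 99
```

Both quoted densities work out to roughly one site per 100 base pairs, which is the spacing assumed in the Rationale.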

PROTEINS INVOLVED IN DOMAIN EXTENSION CAN BIND BOTH DNA AND THE NUCLEAR MATRIX

Rationale. The regulatory factors in the extension step act to bind DNA to the nuclear matrix and serve as a "bridge" between DNA and the matrix. In this way steroid hormone receptors would be free in the cell when no steroid is present, but when bound by the steroid would change conformation and then bind both DNA and the nuclear matrix. Data. Steroid receptors bind DNA (see above) but also bind the nuclear matrix (Rennie et al., 1983; Kaufmann et al., 1986; Alexander et al., 1987). This interaction with the nuclear matrix is stimulated by steroids in cell types that are responsive to those hormones (reviewed in Barrack & Coffey, 1982 and Nelson et al., 1986).

DYNAMIC ATTACHMENT SITES SHOULD BE LABILE AND THEIR STABILITY WILL BE REFLECTED BY THE SIZE OF THE DNA BOUND DURING EXTRACTION PROTOCOLS

Rationale. The dynamic attachment sites depend on multiple cooperative interactions of DNA with the nuclear matrix. Therefore, if DNA is digested in situ and then extracted with salt, the dynamically bound DNA should remain matrix-bound if it is long, with many sites to stabilize the interaction. Short DNA pieces will have few or no attachment sites and should be more easily extractable. Data. Experiments of the above type with HeLa cells, mouse alpha-globin genes, Drosophila actin genes, and adenovirus DNA indicate that large DNA pieces are retained in the nuclear matrix fraction, but that as the DNA size decreases so does nuclear matrix binding (Mirkovitch et al., 1984; Kirov et al., 1984; Small et al., 1985; Bodnar & Ward, in preparation). In particular, the actin gene had a restriction enzyme fragment which bound to the matrix as long as a 3.5 kilobase pair segment remained intact, but when that fragment was cut, binding was lost (Small et al., 1985).

DEMETHYLATION OF DNA PLAYS A ROLE IN THE EXTENSION STEP OF GENE ACTIVATION

Rationale. Methylation of DNA can play a role in control of protein binding; restriction enzymes and other regulatory proteins will often bind unmethylated DNA but not DNA in which the binding site is methylated (reviewed by Doerfler, 1983). Uncoiled DNA domains may be methylated so that the HRSE's (and many other sequences throughout the domain) cannot be bound by the transcription factors. Specific demethylation of the uncoiled DNA domains would then make the HRSE's accessible for protein binding. Thus, the sequence for gene activation would be uncoiling, demethylation, extension, etc. Data. Many eukaryotic genes are methylated when inactive and demethylated as part of the gene activation process (reviewed by Doerfler, 1983). Gene demethylation often occurs in domain-sized DNA regions (Molitor et al., 1976; Bird et al., 1979), extending throughout the gene and its flanking regions in both directions.

PROMOTER SEQUENCES AND ENHANCERS CAN BE FOUND ANYWHERE IN THE REGULATED GENE

Rationale. Since the transcriptional factors work mainly by localizing DNA and the proper proteins in the same nuclear compartment, one would expect the "promoter" recognition elements to be found anywhere near the transcriptional start site (i.e. within a few kilobases) and the orientation of those sequences not to matter. Similarly, since the uncoiling and extension steps affect the structure of the entire DNA domain, enhancer elements could cause those changes in structure while found anywhere within the DNA domain sequences. Data. Many studies on eukaryotic genes have indicated that the promoter elements can be moved or reversed and still function as long as they are "close" to the transcriptional start site. One good example is the GC box (the Sp1 binding site) which is found upstream of the HSV-1 thymidine kinase gene and the SV40 T-antigen genes; this element can be moved fifty base pairs upstream or reversed in orientation and still function (Everett et al., 1983; reviewed by McKnight & Tjian, 1986). Also, promoters for RNA Polymerase III transcripts can be found within the coding sequences of the gene itself (Smith et al., 1984). Enhancers are by definition sequence elements that can affect transcription at a distance in either direction (summarized in Schirm et al., 1987).

UPON ACTIVATION THE ENTIRE DNA DOMAIN WILL BE MORE SENSITIVE TO NUCLEASE THAN BULK CHROMATIN AND WILL CONTAIN MANY SPECIFIC NUCLEASE HYPERSENSITIVE SITES THROUGHOUT THE DOMAIN

Rationale. The extension of DNA through the nuclear matrix will open up the chromatin higher order structure so that the extended DNA will contain nucleosomes like "beads on a string" between dynamic attachment sites. With a spacing of a few hundred to a few thousand base pairs between dynamic attachment sites there is room for only a few nucleosomes between them. Near the attachment sites themselves or between two closely spaced sites one would expect naked DNA which would be hypersensitive to nucleases. Data. Active genes are located in domains of DNAse I sensitivity (Gazit & Cedar, 1980) which can range from 12 kilobase pairs for the glyceraldehyde-3-phosphate dehydrogenase gene (Alevy et al., 1984) to at least 54 kilobase pairs for the ovalbumin locus (Lawson et al., 1980). Within these sensitive regions are hypersensitive sites which may be found at specific sites, often in many places, anywhere in the gene (reviewed by Elgin, 1981 and by Weintraub, 1985).

NUCLEOSOME ORGANIZATION WILL BE HIGHLY DISRUPTED IN ACTIVE GENES

Rationale. If the low affinity binding sites are distributed, on average, only a few hundred base pairs apart on an active gene, there will be little room for nucleosomes between the binding sites. Therefore, if an active gene is digested with micrococcal nuclease, one would expect to see some nucleosome monomers, and very few dimers or higher order structures. In many places, the spacing between low affinity sites may be too close to allow any nucleosomes on the DNA. Here one would expect to see a heterogeneous population of DNA sizes upon nuclease digestion, most likely as a smear on the gels going to a limit size which is protected by the specific DNA-binding protein. Data. A non-nucleosomal organization has been reported for a variety of active cellular genes (see Cohen and Sheffrey, 1985 and references therein). Nuclease digestion of active sea urchin histone genes indicates the presence of only mononucleosomes and only trace amounts of higher order structures (Wu & Simpson, 1985). Additionally, the DNA of intranuclear adenovirus, herpesvirus, and MVM is not organized in classical nucleosomes; once again the digestion pattern of these DNAs is a smear to a band which is not the expected monomer nucleosome size (Futterer & Winnacker, 1984; Muggeridge & Fraser, 1986; Doerig et al., 1986).

GENE ACTIVATION IN VITRO WILL BE MUCH LESS EFFICIENT THAN IN VIVO

Rationale. In the domain model there are two reasons why in vitro experiments will not account for the high (10^8-fold) induction of gene expression seen in vivo (Ivarie et al., 1983): (a) the factors that control transcription and the DNA they bind are collected on the nuclear matrix in vivo, while in vitro the best that would be found is macromolecular complexes which must be assembled by 3-dimensional diffusion; (b) gene activation in vivo is the end result of several sequential reactions, each of which is only weakly activating. Suppose there are 5 biochemical steps in transcription (uncoiling, demethylation, dynamic site binding, UCE binding, and promoter recognition) and the overall activation is 10^8. Then an in vitro study of one of these steps would see a 10^(8/5)-fold, i.e. 10^1.6- or 40-fold, induction. If the efficiency of this reaction were further reduced in vitro, since it is now diffusion-limited, then an in vitro reaction may only see a 2- or 4-fold induction and still be biologically relevant. Additionally, plasmids transfected in vivo will be more efficient, but will not approach the 10^8-fold induction since the appropriate sequences from the domain ends are not present. Data. In vitro studies of transcription factor activation of promoters usually see a 3- to 100-fold enhancement of regulation. These same sequences may enhance transcription up to 400-fold in vivo in transfected plasmids. (For example, compare in vitro experiments in Sassone-Corsi et al., 1985 with in vivo experiments in Nomiyama et al., 1987.)
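
The per-step estimate in the Rationale can be checked numerically. The sketch below assumes, as the text implicitly does, that the sequential steps contribute equally and that their effects multiply; the function name `per_step_induction` is hypothetical.

```python
def per_step_induction(overall_fold: float, n_steps: int) -> float:
    """Fold induction of one step, assuming n_steps equal steps whose
    effects multiply to give overall_fold."""
    return overall_fold ** (1.0 / n_steps)


if __name__ == "__main__":
    steps = [
        "uncoiling",
        "demethylation",
        "dynamic site binding",
        "UCE binding",
        "promoter recognition",
    ]
    per_step = per_step_induction(1e8, len(steps))
    # 10^(8/5) = 10^1.6, i.e. roughly 40-fold per step
    print(f"{per_step:.1f}-fold per step")  # 39.8-fold per step
```

If the steps are not equally strong, a single step studied in isolation could show more or less than 40-fold; the equal split is only the simplest case.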

CONDENSED DNA CORRESPONDS TO HETEROCHROMATIN, INACTIVE CHROMATIN, AND NUCLEOSOMES WITH H1; EXTENDED DNA CORRESPONDS TO EUCHROMATIN, ACTIVE CHROMATIN, AND NUCLEOSOMES WITH HMG'S

Rationale. In addition to predictions about the biochemistry and molecular biology of eukaryotic DNA organization and function, the domain model predicts a general overall chromatin arrangement on the level of cell biology (Table 3). As seen in Fig. 1, the model suggests a mechanism for the structural difference between heterochromatin and euchromatin. In addition, it suggests a general pattern of condensed DNA at the nuclear periphery and extended DNA in the nuclear interior. However, note that even extended (active) DNA domains are stably bound to the nuclear periphery at their ends, and a fraction of extended domain transcription and replication will be in sequences near the nuclear periphery. Data. In general, heterochromatin is nontranscribed and found at the nuclear periphery, while euchromatin is transcribed and found in the nuclear interior (Brown, 1966). Inactive genes are associated with nucleosomes containing histone H1, while active genes are associated with nucleosomes containing high mobility group proteins (HMG's) (reviewed in Weisbrod, 1982). When the stable peripheral nuclear matrix is separated from the labile interior nuclear matrix, the nucleosomes from the peripheral matrix are H1-containing while the nucleosomes from the interior matrix contain HMG's (Bouvier et al., 1985). A more detailed discussion of the biochemical differences between these partitions of chromatin consistent with the notion of condensed and extended DNA domains is found in Gross & Garrard (1987).

CONDENSED (STABLE) DNA DOMAINS ARE ASSOCIATED WITH THE PERIPHERAL NUCLEAR MATRIX WHILE THE EXTENDED (DYNAMIC) DNA DOMAINS ARE ASSOCIATED WITH THE INTERIOR FIBRILLAR NUCLEAR MATRIX

Rationale. The most stable DNA-nuclear matrix associations should be with the most stable nuclear structures, while dynamic associations should be with dynamic structures.

TABLE 3
Correlation of partitions of chromatin to condensed and extended domains

Condensed                     Extended
Heterochromatin               Euchromatin
Inactive chromatin            Active chromatin
Nucleosomes with H1           Nucleosomes with HMG's
Peripheral nuclear matrix     Fibrillar internal nuclear matrix
Late replicating DNA          Early replicating DNA

Data. The peripheral nuclear matrix is composed mainly of the nuclear lamins, which are among the most insoluble and stable components of the cell (Gerace & Blobel, 1981). It has a simple protein composition and is stable through most nuclear matrix isolation protocols (Bouvier et al., 1980). On the other hand, the interior nuclear matrix is extremely labile with a complex protein composition and can be retained in nuclear matrix preparations only under conditions where disulfide bonds and metal-protein interactions are stabilized (Lebkowski & Laemmli, 1982; Kaufmann & Shaper, 1984; Dijkwel & Wenink, 1986). Immature chicken erythroblasts have a nuclear structure that contains a well defined interior fibrillar matrix when they are actively transcribing; as they mature, transcription stops, the interior matrix disappears, and the nuclear matrices of mature erythrocytes contain only exterior shells (LaFond & Woodcock, 1983). If stimulated to transcribe again by cell fusion, the erythrocytes regain their interior fibrillar nuclear matrix (LaFond et al., 1983). Similarly, in human lymphocytes the interior nuclear matrix increases in protein content and complexity when those cells are transcriptionally active (Konstantinovic & Sevaljevic, 1983; Setterfield et al., 1983).

CONDENSED (PERIPHERAL) DNA DOMAINS REPLICATE LATE WHILE EXTENDED (INTERIOR) DNA DOMAINS REPLICATE EARLY

Rationale. Since inactive chromatin replicates late and active chromatin replicates early as determined by biochemical methods (see above), one would expect cell biological experiments to also reflect this. Data. Heterochromatin generally replicates late in S phase, while euchromatin replicates early. In synchronized mammalian cells, DNA replication (as judged by 3H-thymidine autoradiography on electron microscope sections) is found at the nuclear periphery late in S phase, but in the nuclear interior early in S phase (see Huberman et al., 1973 and references therein). Also, the inactive X chromosome in mammals mainly replicates late in S phase and is localized at the nuclear periphery, while the active X chromosome replicates early (Migeon et al., 1986 and references therein).

IN SITU NUCLEASE SENSITIVE DNA WILL BE FOUND IN A DISCRETE SHELL AROUND THE NUCLEAR PERIPHERY AND IN A DIFFUSE REGION THROUGHOUT THE NUCLEAR INTERIOR

Rationale. In the domain model, one would expect to see two major types of nuclease sensitive DNA (Fig. 1). The condensed domains will comprise the bulk nuclease insensitive DNA, but there should remain short regions exposed at the base of all loops, available for recognition as DNA replication origins and accessible to transcriptional activators and also nucleases. These would be seen as a shell of nuclease sensitive DNA around the nuclear periphery close to the stable attachment sites and tightly bound proteins. The extended DNA domains will be nuclease sensitive (see above) and would be seen throughout the nuclear interior. While active genes are nuclease sensitive, I suggest that they represent only a subset of the nuclease sensitive DNA, since in the domain model all DNA domains will be sensitive to nucleases at their ends. Therefore, it should be noted that demonstrating that all active genes are nuclease sensitive does not imply that all nuclease sensitive DNAs represent active genes. Data. In situ nick translation of DNA with a biotinylated nucleotide followed by immunological detection of the biotinylated DNA has shown that mouse cells have exposed DNA in a shell at the nuclear periphery and in diffuse patterns throughout the nuclear interior (Hutchison & Weintraub, 1985). Additionally, ethylation of DNA in situ by N-nitroso carcinogens (presumably in exposed areas of chromatin) is preferentially localized near the tightly bound proteins on rat DNA (Nehls et al., 1984).

THE BIOPHYSICAL ISOLATION OF NUCLEAR MATRIX-BOUND DNA WILL GIVE DIFFERENT RESULTS DEPENDING ON THE EXACT PROTOCOLS USED IN THAT ISOLATION

Rationale. While the domain model is straightforward to describe, it suggests that DNA-nuclear matrix interactions are complex. There is one class of stable nuclear matrix attachment sites (which may in fact have several types of proteins involved), and there are several classes of dynamic attachment sites (DNA replication complexes, complexes involved in DNA extension, promoter recognition sites, and transcription complexes themselves), each of which could have several different kinds of interactions. In addition, the peripheral and interior nuclear matrices are stabilized differently. Therefore, a particular preparation may contain one or both of these types of nuclear matrix structures along with any subset of the nuclear matrix-bound DNAs depending on the exact protocol used (e.g. cell type, detergents, salt, reducing or oxidizing agents, chelating agents, DNAse, RNAse, etc.). Data. The data on nuclear matrix-bound DNA have been diverse and often apparently contradictory, to such an extent that one reviewer summed up the scepticism about the data (Zakian, 1985) by saying that all nuclear matrix experiments must be examined "with a grain of salt".

The Domain Model and Cell Differentiation

The domain model can be used to explain a wide variety of biochemical, molecular and cell biological data, but at first glance there seems to be little reason why eukaryotes would have evolved with a DNA organization that requires several sequential sets of DNA-protein interactions in concert with 50 kilobase pairs of DNA to activate one gene. In contrast, bacteria can activate a gene with a single repressor and a DNA segment one tenth as large. I propose that the domain model is advantageous to the eukaryotic organism in that it provides a simple and flexible mechanism for cell differentiation. In this model, the utilization of DNA in any cell of an organism may be inefficient, but that inefficiency is the price that is paid for the ability of the whole organism to regulate genes independently in many different cell types.

The domain model provides a simple method to regulate many genes with only a few regulatory proteins. The domain model is in essence a combinatorial model, which has been suggested by others as a means of gene regulation (Gierer, 1973; Alberts et al., 1983; Weintraub, 1985; Yamamoto, 1985). Let us consider a hypothetical eukaryotic organism in which gene activation occurs in 5 distinct biochemical steps (uncoiling, demethylation, gene extension, UCE recognition, and promoter recognition). In this organism, let us suppose that there are ten different control proteins at each step (i.e. 10 uncoiling proteins, 10 demethylases, etc.) which are encoded in that organism's genome. This organism will have in total 50 (= 10 x 5) different regulatory genes. However, to be expressed in this organism a gene must contain a particular combination of DNA sequences recognized by the appropriate control proteins, and only that one combination of factors will work to transcribe that gene. There are 100,000 (= 10^5) different combinations of control sequences that could be found in the proper places in a given DNA domain; in this organism 100,000 different genes could be regulated by its 50 control factors. Therefore, the human genome needs only to encode ideally 50 (and realistically only a few hundred) control genes to regulate its 100,000 genes rather than the 20,000 control genes that would be required if the operon model applied to humans.

The domain model provides a simple mechanism to regulate genes independently in different cell types and to coordinate activation of different genes in different cells by steroid hormones. In the domain model "active" genes are defined as those genes which are uncoiled or both uncoiled and extended (both categories probably exist). As shown in Fig. 4, this means that a particular cell type will be defined by which domains are uncoiled and extended in that cell.
A "housekeeping" gene (Domain 2 in Fig. 4) is extended in all cell types, whereas a gene which is active in only one cell type is extended in that cell type (e.g. Fig. 4 Domain 1 in Cell Type A) and condensed in all other cell types (e.g. Domain 1 in Cell Type B). Therefore, a particular cell need only express one control factor of each type for the first two (or three) biochemical steps of gene activation. For our hypothetical organism, assume that the first three biochemical steps (i.e. uncoiling, demethylation, gene extension) define the "active" domains in that cell type by expressing one factor of each of the 3 kinds in each cell type. Then this organism can, with its 50 factors, have 1000 (10^3) different cell types. Even if the appropriate promoter recognition proteins are available in two different cell types, a gene may be transcribed in one and not the other (Becker et al., 1987) depending on whether or not the promoter is extended for recognition. If Cell Type A is one of the 1000 cell types in our hypothetical organism, the genes that are actually transcribed in Cell Type A are defined by the factors expressed in that cell at the last 2 biochemical steps (UCE and promoter recognition). Therefore, this organism can regulate 100 (= 10^2) different genes independently in Cell Type A (and every other cell in the organism).

[Figure 4: two panels, labeled Cell type A and Cell type B.]
FIG. 4. The domain model and cell differentiation. Cell type-specific gene expression depends on the particular DNA domains that are extended in a given cell type. In all cell types, domains are defined by stable attachment sites (O), and active genes (boxes) are extended onto the nuclear matrix at multiple dynamic attachment sites (O). 1, DNA domain active in cell type A; 2, housekeeping gene active in both cell types; 3, DNA domain active in cell type B.

As discussed above, steroid hormone receptors have all the characteristics expected of control proteins involved in the extension step of gene activation. Therefore, when a steroid hormone receptor is activated by steroid binding it can bind the nuclear matrix and DNA, but only the DNA that is in an uncoiled conformation. Since different DNA domains are uncoiled in the different cell types, the steroid receptor will bind different classes of genes in each cell type. Therefore, by this mechanism, an organism can induce different genes in different cell types in response to stimulation by a single hormone.
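
The combinatorial counts used for the hypothetical organism in this section (50 control genes, 100,000 gene combinations, 1000 cell types, 100 genes per cell type) can be reproduced with a short Python sketch. The constant and function names are hypothetical labels introduced here; the numbers of steps and of factors per step are the ones assumed in the text.

```python
from itertools import product

N_STEPS = 5                # uncoiling, demethylation, extension, UCE and promoter recognition
FACTORS_PER_STEP = 10      # control proteins assumed at each step
DOMAIN_DEFINING_STEPS = 3  # steps assumed to define the "active" domains of a cell type


def n_control_genes(n_steps: int, factors_per_step: int) -> int:
    """Regulatory genes needed: one gene per control protein."""
    return n_steps * factors_per_step


def n_combinations(n_steps: int, factors_per_step: int) -> int:
    """Distinct ways to pick one factor at each of n_steps steps."""
    return factors_per_step ** n_steps


if __name__ == "__main__":
    print(n_control_genes(N_STEPS, FACTORS_PER_STEP))                         # 50
    print(n_combinations(N_STEPS, FACTORS_PER_STEP))                          # 100000
    print(n_combinations(DOMAIN_DEFINING_STEPS, FACTORS_PER_STEP))            # 1000
    print(n_combinations(N_STEPS - DOMAIN_DEFINING_STEPS, FACTORS_PER_STEP))  # 100

    # Brute-force check: enumerate every choice of one factor per
    # domain-defining step and count the resulting cell types.
    cell_types = list(product(range(FACTORS_PER_STEP), repeat=DOMAIN_DEFINING_STEPS))
    print(len(cell_types))  # 1000
```

The number of control genes grows linearly with the number of steps while the number of addressable combinations grows exponentially, which is the sense in which a few regulatory proteins can regulate many genes.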
The domain model provides a mechanism for the extremely high level of transcription activation required for hormones to work in a multicellular organism. In bacterial cells, the difference between mRNA levels of the lac operon in induced and uninduced cells is a factor of 10^3 (Beckwith & Zipser, 1970). In eukaryotes, the difference between induced and uninduced mRNA levels is estimated to be a factor of 10^8 (Ivarie et al., 1983). This high level of induction is a necessity for steroid hormones to work in a multicellular organism. For example, suppose our hypothetical organism had a steroid-secreting gland which represented 0.1% of the total cells in that organism (i.e. 1 out of 1000 cells). Then assume that the basal level of transcription was such that one steroid molecule was secreted per hour per cell. This would mean that the total basal level steroid synthesis in the organism would be 1000 molecules per hour (i.e. 1 molecule per cell x 1000 cells). Now, if the induction of steroid production in the gland were a factor of 10^3 (as in bacteria) when the gland was to make the steroid, the total steroid production in the organism would only double to 2000 steroid molecules per hour (10^3 molecules per hour in the gland plus 10^3 molecules per hour basal level throughout the organism). However, suppose that steroid production in the gland was controlled under the domain model where each of the 5 sequential steps in gene activation enhanced transcription levels by a factor of 100. In this case, the total induction of steroid production in the gland would be a factor of 10^10 (= 100^5) and the total increase in steroid levels throughout the organism would be a factor of 10^7 (= [10^10 molecules per hour in the gland plus 10^3 molecules per hour basal level throughout the organism]/10^3 molecules per hour total in the uninduced state). Therefore, the domain model can account for the extremely high levels of induction required for a hormone gene in a gland to ensure a significant increase of that hormone throughout the organism.

All steps in the domain model for gene activation are kinetically favorable. As discussed in detail above, the operon model cannot work in a eukaryotic nucleus since the kinetics of repressor-DNA interactions would be unfavorable in the midst of the tremendous amounts of DNA in the nucleus. In the domain model, each step is kinetically favorable: (1) recognition of DNA replication origins and of sequences involved in domain uncoiling is driven by compartmentalization of the appropriate protein factors and DNA sequences on the nuclear matrix at the base of the loop domains; (2) the extension step of gene activation is driven by extremely high concentrations of both the appropriate protein factors and the specific DNA sequences they bind; (3) the promoter recognition step is driven again by compartmentalization of factors and the sequences they bind in different locations on the nuclear matrix. Note that this scheme produces a cell where the regulation of DNA processes is essentially independent of the total amount of DNA in that cell (i.e. there is no constraint on genome size).

The Domain Model and Genome Evolution

Perhaps the most intriguing feature of the domain model is its ability to account for a eukaryotic genome that is composed mainly of repetitive sequences and "junk" DNA. Once again, one can predict very specific characteristics of the genome in an organism that has evolved using the mechanisms suggested by this model. These characteristics are complex but reflect features of the eukaryotic genome that are impossible to explain by any extrapolation of the operon model.

Long interspersed repeats. Within a DNA domain one would expect to find a class of repetitive elements at one or both ends of each domain which would serve both as DNA replication origins for condensed domains and as signals to control the uncoiling step in gene activation. The predicted characteristics of these elements are similar to those seen for LINEs, the long interspersed sequences (reviewed by Singer, 1982): (1) they will be found in moderate copy number (i.e. the total number of all types of these elements in the human genome should be one per domain or about 100,000); (2) they will be spaced about 50 kilobase pairs apart along the DNA but physically clustered at the base of adjacent loops (Manuelidis & Ward, 1984).

Simple sequence DNA. In the domain model, most of the DNA throughout each domain will consist of sequences whose major function is to extend the DNA on the nuclear matrix during gene activation. Therefore, one would expect to see long stretches of simple DNA sequences with multiple binding sites for control proteins. The entire domain would probably be extremely GC- or AT-rich depending on whether the specific low affinity protein binding sites are GC- or AT-rich. The co-operative nature of the extension step in gene activation is consistent with DNA full of long stretches of "mindlessly-repeated" sequence elements. Additionally, by


analogy, the long simple sequences repeated in satellite DNA would contain the thousands of repeated protein binding sites necessary to anchor an entire chromosome during mitosis.

Intermediate repeat DNA. While not discussed directly in the simple model presented above, one can envisage specific repetitive DNA sequences which are dispersed throughout a DNA domain to act as DNA replication origins for extended domains and/or as transcriptional activators during domain extension. The characteristics of intermediate repeat DNA such as human Alu or mouse B1 sequences would be consistent with this role: (1) they are dispersed throughout the genome but localized to specific chromosome bands (Manuelidis & Ward, 1984); (2) they are found in tens of copies "randomly" throughout gene domains which contain them (Singer, 1982); (3) they have significant homologies to viral DNA replication origins (Jelinek et al., 1980).

Amplified genes. Since the basic functional unit in this model is the DNA domain, when a genetic element is amplified, one would expect that the amplification boundaries would correspond to the domain ends. While these ends are widely separated on the linear DNA map, they are clustered at stable attachment sites so that recombination of over-replicated DNA would occur at the domain ends (Schimke, 1982). This is consistent with the specific amplification units seen for the hamster DHFR gene, where the amplicon is 150 kilobase pairs, the hamster CAD gene, where the amplicon is about 500 kilobase pairs, and the human U1 gene, where a 164 base pair transcription unit is amplified in a specific 44 kilobase pair domain (Burhans et al., 1986; Wahl et al., 1982; Bernstein et al., 1985).

Pseudogenes. The domain model suggests that in addition to all the functional repetitive elements there will be a certain amount of "junk" DNA in a eukaryotic genome since there is little selective pressure to remove "junk" DNA domains.
Suppose a gene is duplicated and during the duplication process the new domain is sufficiently jumbled that there are no cells in the organism which contain the appropriate combination of signals for gene activation. The duplicate would be carried in each cell (even if the coding sequences diverged), and each cell would treat it as if it were a domain used in another cell type. Mechanisms to remove that domain would have to be extremely well developed to prevent removal of a domain which is used in only one cell type in the organism (e.g. the globin genes are "junk" in every cell in the body except erythroblasts). This would suggest that mechanisms to remove pseudogenes might be advantageous only to unicellular or extremely simple eukaryotes such as yeast.

C-Value paradox. Certain species, such as lilies and salamanders, have as much as 20 times more DNA in their genomes than does the human genome. A major problem in understanding DNA function in eukaryotes (usually termed the C-Value paradox) is why an organism which is certainly not 20 times as complex as a human needs 20 times as much DNA (reviewed by Orgel & Crick, 1980). The domain model provides a simple basis by which eukaryotic genome DNA content is essentially independent of organismal complexity. In the domain model, DNA functions (both replication and transcription) are modular and independent of genome size. The modular replicons (Fig. 2) are compartmentalized so that replication of hundreds


of thousands of domains is mechanistically as simple as replication of only a few. The modular domains for gene expression (Fig. 3) are condensed and inaccessible to transcription factors except in the cells where they are active. Therefore, it is easy to visualize that these same mechanisms (Figs 2 and 3) would work as well in a genome of virtually any size.
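The size-independence argument above can be sketched numerically. The short script below is only an illustration under stated assumptions, not a model taken from the paper: it uses the figures quoted in the text (about 100,000 domains in the human genome, about 50 kilobase pairs per domain, genomes up to 20 times larger), while the fork speed, the bidirectional replication from one origin per domain, the simultaneous firing of all origins, and every function and constant name are hypothetical choices made for the example.

```python
"""Sketch: replication time with modular replicons versus a single origin.

Figures from the text: ~100,000 human domains, ~50 kb per domain,
genomes up to 20 times larger than the human genome.
ASSUMPTIONS (not from the text): the fork speed, bidirectional forks,
and all domain origins firing at the same time.
"""

DOMAIN_LENGTH_BP = 50_000      # about 50 kilobase pairs per domain (from the text)
HUMAN_DOMAINS = 100_000        # about one element per domain (from the text)
LARGE_GENOME_FACTOR = 20       # lily or salamander relative to human (from the text)
FORK_RATE_BP_PER_MIN = 2_000   # hypothetical fork speed, for illustration only


def genome_size_bp(n_domains, domain_length_bp=DOMAIN_LENGTH_BP):
    """Total DNA length for a genome made of uniform domains."""
    return n_domains * domain_length_bp


def modular_replication_minutes(n_domains,
                                domain_length_bp=DOMAIN_LENGTH_BP,
                                fork_rate=FORK_RATE_BP_PER_MIN):
    """One origin per domain, all firing together (assumption).

    With uniform domains, every domain finishes at the same moment, so the
    total time equals the time for one domain and does not depend on n_domains.
    """
    if n_domains < 1:
        raise ValueError("a genome needs at least one domain")
    return domain_length_bp / (2 * fork_rate)  # two forks per origin


def single_origin_replication_minutes(n_domains,
                                      domain_length_bp=DOMAIN_LENGTH_BP,
                                      fork_rate=FORK_RATE_BP_PER_MIN):
    """A single origin for the whole genome: time grows with genome size."""
    return genome_size_bp(n_domains, domain_length_bp) / (2 * fork_rate)


if __name__ == "__main__":
    for label, n in (("human-sized", HUMAN_DOMAINS),
                     ("20x larger", LARGE_GENOME_FACTOR * HUMAN_DOMAINS)):
        print(label,
              genome_size_bp(n),
              modular_replication_minutes(n),
              single_origin_replication_minutes(n))
```

Under these assumed values the modular replication time is the same for both genomes, whereas the single-origin time grows twentyfold with the genome. The absolute numbers carry no biological weight; only the scaling is the point.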

CONSERVATION OF STRUCTURAL DNA DOMAINS BETWEEN EUKARYOTES AND PROKARYOTES

Despite the major differences in DNA organization between eukaryotes and prokaryotes, there are elements of DNA domain structure which are shared between them and which are consistent with their evolution from a common ancestor. In E. coli, the genome is constrained in a "nucleoid" structure in which the DNA is anchored by proteins (and possibly RNA) to the membrane in about 100 independently supercoiled loops that are about 60 kilobase pairs long (Lydersen & Pettijohn, 1977; Sinden & Pettijohn, 1981 and references therein; Pettijohn, 1982). DNA replication complexes and DNA replication origins are associated with the bacterial membrane (Huberman, 1968; reviewed by Moyer, 1979; Sparks & Helinski, 1979; Kusano et al., 1984). Additionally, proteins tightly bound to E. coli DNA are very similar in size, physical characteristics, and antigenicity to those tightly bound to mammalian DNA at stable nuclear matrix attachment sites (Werner & Petzelt, 1981; Werner et al., 1981; Bodnar et al., 1983; Bodnar & Ward, unpublished observations). These similarities suggest that DNA loop domains predate the divergence of prokaryotes and eukaryotes and that eukaryotic nuclear matrix structures are functionally equivalent to the bacterial membrane (Dingman, 1974; Cavalier-Smith, 1982). In both cases this DNA organization is necessary for proper replication and chromosome segregation. The common ancestor of both prokaryotes and eukaryotes most likely had a DNA domain organization similar to present day bacteria, with one DNA replication origin at the base of one of the loop domains. One can envisage three changes in that DNA domain organization which led to the modular eukaryotic DNA organization which is independent of genome size. Modular DNA replicons required evolution of DNA replication origins at the bases of all the DNA loop domains.
Modular transcriptional units required the evolution of a multistep pathway for transcriptional activation along with an efficient DNA packaging mechanism (i.e. the nucleosome) which could sequester inactive genes from the transcriptional machinery. These changes allowed a genome which was unconstrained in size and could be regulated in a cell type-specific manner. I suggest that the evolution of a modular domain organization in the eukaryotic genome led to the emergence of differentiated multicellular organisms. In the domain model, the DNA use in each eukaryotic cell is extremely inefficient, but the flexibility in gene regulation afforded by the domain organization is advantageous to the whole organism. Therefore, it is short-sighted to compare the DNA organization in each cell of E. coli and man and suggest that prokaryotes are "efficient" while eukaryotes are "inefficient". One should instead look at the whole organism in each case and realize that the DNA organization is merely different, not better or worse,


and the result is that each type of organism has evolved at a price. E. coli exists with a small genome where each cell is efficient (since each cell is the organism); the price it pays is that the DNA per cell and consequent organismal complexity is limited by kinetic restraints (Table 1) in evolving beyond a single cell. In eukaryotes, the flexibility afforded by being able to have efficient cell type-specific gene regulation in many cell types with only a few hundred regulatory genes is apparently the underlying advantage; the price paid is that only a small percentage of the DNA is used in any cell and the genome may contain large amounts of "junk" (i.e. unused) DNA.

Conclusions

I have presented a simple model for eukaryotic DNA organization and function in which the basic structural and functional unit of eukaryotic DNA is the DNA domain. I believe that the most important advantages of this model over previous models are: (1) it accounts for an underlying difference in cell size, DNA content, and chromosome organization between prokaryotes and eukaryotes; (2) it is totally consistent with previous models (i.e. virtually every paragraph in this paper has been previously proposed in one way or another; it is the way in which the theories are synthesized that is new); (3) one simple scheme, which is essentially summarized in one diagram (Fig. 1) and explained in two more (Figs 2 and 3), is consistent with a wide variety of data at the levels of molecular, cellular, developmental, and evolutionary biology; (4) it presents a simple testable explanation for mechanisms of cell differentiation; (5) it accounts for a eukaryotic genome made up of mostly repetitive sequences and "junk" DNA. The value of this domain model should be in its ability to synthesize a wide variety of biological viewpoints and in its ability to make testable predictions on how eukaryotes and prokaryotes have evolved differently.

I thank Ron Berezney and David Ward for helpful discussions in the development of the model, Gwilym Jones, Charles Ellis, and Marie Chow for helpful discussions and critical reading of the manuscript, and Rosanne Cedroni and Elizabeth Fox for their assistance in preparation of the manuscript. This work was supported in part by the Northeastern University Research and Scholarship Development Fund.

REFERENCES

AELEN, J. M. A., OPSTELTEN, R. J. G. & WANKA, F. (1983). Nucl. Acids Res. 11, 1181.
ALBERTS, B., BRAY, D., LEWIS, J., RAFF, M., ROBERTS, K. & WATSON, J. D. (1983). Molecular Biology of The Cell. pp. 444-445. New York: Garland Publishing Inc.
ALEVY, M. C., TSAI, M. & O'MALLEY, B. W. (1984). Biochem. 23, 2309.
ALEXANDER, R. B., GREENE, G. L. & BARRACK, E. R. (1987). Endocrinology 120, 1851.
ALLEN, J. R., LASSER, G. W., GOLDMAN, D. A., BOOTH, J. W. & MATHEWS, C. K. (1983). J. Biol. Chem. 258, 5746.


ASTELL, C. R., THOMSON, M., CHOW, M. B. & WARD, D. C. (1982). Cold Spring Hbr. Symp. Quant. Biol. 47, 751.
BARRACK, E. R. & COFFEY, D. S. (1982). Rec. Prog. Horm. Res. 38, 133.
BECKER, P. B., RUPPERT, S. & SCHUTZ, G. (1987). Cell 51, 435.
BECKWITH, J. R. & ZIPSER, D. (ed.) (1970). The Lactose Operon. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory.
BENYAJATI, C. & WORCEL, A. (1976). Cell 9, 393.
BEREZNEY, R. (1979). The Cell Nucleus (Busch, H., ed.) Vol. 7, p. 413. New York: Academic Press.
BERGSMA, D. J., OLIVE, D. M., HARTZELL, S. W. & SUBRAMANIAN, K. N. (1982). Proc. Natl. Acad. Sci. U.S.A. 79, 381.
BERNSTEIN, L. B., MANSER, T. & WEINER, A. M. (1985). Mol. & Cell. Biol. 5, 2159.
BIRD, A. P., TAGGART, M. H. & SMITH, B. A. (1979). Cell 17, 889.
BODNAR, J. W. & WARD, D. C. (1987). Nucl. Acids Res. 15, 1835.
BODNAR, J. W., JONES, C. J., COOMBS, D. H., PEARSON, G. D. & WARD, D. C. (1983). Mol. & Cell. Biol. 3, 1567.
BOUVIER, D., HUBERT, J. & BOUTEILLE, M. (1980). J. Ultra. Res. 73, 288.
BOUVIER, D., HUBERT, J., SEVE, A. & BOUTEILLE, M. (1985). Exp. Cell. Res. 156, 500.
BROWN, E. H., IQBAL, M. A., STUART, S., HATTON, K. S., VALINSKY, J. & SCHILDKRAUT, C. L. (1987). Mol. & Cell. Biol. 7, 450.
BROWN, S. W. (1966). Science 151, 417.
BUONGIORNO-NARDELLI, M., MICHELI, G., CARRI, M. T. & MARILLEY, M. (1982). Nature 298, 100.
BURHANS, W. C., SELEGUE, J. E. & HEINTZ, N. H. (1986). Biochem. 25, 441.
CARRI, M. T., MICHELI, G., GRAZIANO, E., PACE, T. & BUONGIORNO-NARDELLI, M. (1986). Exp. Cell. Res. 164, 426.
CAVALIER-SMITH, T. (1982). In: Wistar Symposium, Series 2, 307.
CHAMBON, P., DIERICH, A., GAUB, M.-P., JAKOWLEV, S., JONGSTRA, J., KRUST, A., LEPENNEC, J.-P., OUDET, P. & REUDELHUBER, T. (1984). Rec. Prog. Horm. Res. 40, 1.
CHOW, K.-C. & PEARSON, G. D. (1984). Nucl. Acids Res. 12, 1489.
COCKERILL, P. N. & GARRARD, W. T. (1986). Cell 44, 273.
COHEN, R. B. & SHEFFREY, M. (1985). J. Mol. Biol. 182, 109.
COOK, P. R. & BRAZELL, I. A. (1976). J. Cell Sci. 22, 303.
COOK, P. R. & LANG, J. (1984). Nucl. Acids Res. 12, 1069.
DARNELL, J., LODISH, H. & BALTIMORE, D. (1986). Molecular Cell Biology, pp. 136-138. New York: Scientific American Books.
DIJKWEL, P. A. & WENINK, P. W. (1986). J. Cell Sci. 84, 53.
DIJKWEL, P. A., MULLENDERS, L. H. F. & WANKA, F. (1979). Nucl. Acids Res. 6, 219.
DIJKWEL, P. A., WENINK, P. W. & PODDIGHE, J. (1986). Nucl. Acids Res. 14, 3241.
DINGMAN, C. W. (1974). J. theor. Biol. 43, 187.
DOERFLER, W. (1983). Ann. Rev. Biochem. 52, 93.
DOERIG, C., McMASTER, G., SOGO, J., BRUGGMAN, H. & BEARD, P. (1986). J. Virol. 58, 817.
EDENBERG, H. J. & HUBERMAN, J. A. (1975). Ann. Rev. Gen. 9, 245.
ELGIN, S. C. R. (1981). Cell 27, 413.
EVERETT, R. D., BATY, D. & CHAMBON, P. (1983). Nucl. Acids Res. 11, 2447.
FALKNER, F. G., MOCIKAT, R. & ZACHAU, H. G. (1986). Nucl. Acids Res. 14, 8819.
FUTTERER, J. & WINNACKER, E.-L. (1984). Curr. Top. Micro. Immun. 111, 41.
GAZIT, B. & CEDAR, H. (1980). Nucl. Acids Res. 8, 5143.
GEISSE, S., SCHEIDEREIT, C., WESTPHAL, H. M., HYNES, N. E., GRONER, B. & BEATO, M. (1982). EMBO J. 1, 1613.
GERACE, L. & BLOBEL, G. (1981). Cold Spring Hbr. Symp. Quant. Biol. 46, 967.
GIERER, A. (1973). Cold Spring Hbr. Symp. Quant. Biol. 38, 951.
GOLDMAN, M. A., HOLMQUIST, G. P., GRAY, M. C., CASTON, L. A. & NAG, A. (1984). Science 224, 686.
GROSS, D. S. & GARRARD, W. T. (1987). Trends Biochem. Sci. 12, 293.
HANCOCK, R. (1982). Biol. of the Cell 46, 105.
HARTWIG, M. (1978). Acta. Biol. Med. Ger. 37, 421.
HARTZELL, S. W., YAMAGUCHI, J. & SUBRAMANIAN, K. N. (1983). Nucl. Acids Res. 11, 1601.
HOCHSTRASSER, M. & SEDAT, J. W. (1987a). J. Cell Biol. 104, 1455.
HOCHSTRASSER, M. & SEDAT, J. W. (1987b). J. Cell Biol. 104, 1471.
HOLMQUIST, G. P. & CASTON, L. A. (1986). Biochim. Biophys. Acta 868, 164.
HUBERMAN, J. A. (1968). Cold Spring Harbor Symp. on Quant. Biol. 33, 509.
HUBERMAN, J. A., TSAI, A. & DEICH, R. A. (1973). Nature 241, 32.


HUTCHISON, N. & WEINTRAUB, H. (1985). Cell 43, 471.
HYMAN, R. W. (1980). Virology 105, 254.
IVARIE, R. D., SCHACTER, B. S. & O'FARRELL, P. H. (1983). Mol. Cell. Biol. 3, 1460.
JACKSON, D. A. & COOK, P. R. (1986). EMBO J. 5, 1403.
JELINEK, W. R., TOOMEY, T. P., LEINWAND, L., DUNCAN, C., BIRO, P. A., CHOUDARY, P. V., WEISSMAN, S. M., RUBIN, C. M., HOUCK, C. M., DEININGER, P. L. & SCHMID, C. W. (1980). Proc. Natl. Acad. Sci. U.S.A. 77, 1398.
JONES, K. A., KADONAGA, J. T., ROSENFELD, P. J., KELLY, T. J. & TJIAN, R. (1987). Cell 48, 79.
KADONAGA, J. T., JONES, K. A. & TJIAN, R. (1986). Trends Biochem. Sci. 11, 20.
KAUFMANN, S. H. & SHAPER, J. H. (1984). Exp. Cell. Res. 155, 477.
KAUFMANN, S. H., OKRET, S., WIKSTROM, A., GUSTAFSSON, J. & SHAPER, J. H. (1986). J. Biol. Chem. 261, 11962.
KIROV, N., DJONDJUROV, L. & TSANEV, R. (1984). J. Mol. Biol. 180, 601.
KOENIG, M., HOFFMAN, E. P., BERTELSON, C. J., MONACO, A. P., FEENER, C. & KUNKEL, L. M. (1987). Cell 50, 509.
KONSTANTINOVIC, M. & SEVALJEVIC, L. (1983). Biochim. Biophys. Acta 762, 1.
KUSANO, T., STEINMETZ, D., HENDRICKSON, W. G., MURCHIE, J., KING, M., BENSON, A. & SCHAECTER, M. (1984). J. Bact. 158, 313.
LAFOND, R. E. & WOODCOCK, C. L. F. (1983). Exp. Cell Res. 147, 31.
LAFOND, R. E., WOODCOCK, H., WOODCOCK, C. L. F., KUNDAHL, E. R. & LUCAS, J. J. (1983). J. Cell Biol. 96, 1815.
LAWSON, G. M., TSAI, M. & O'MALLEY, B. W. (1980). Biochem. 19, 4403.
LEBKOWSKI, J. S. & LAEMMLI, U. K. (1982). J. Mol. Biol. 156, 325.
LIN, S. & RIGGS, A. D. (1975). Cell 4, 107.
LUCHNIK, A. N., BAKAYEV, V. V., ZBARSKY, I. B. & GEORGIEV, G. P. (1982). EMBO J. 1, 1353.
LYDERSEN, B. K. & PETTIJOHN, D. (1977). Chromosoma 62, 199.
MANUELIDIS, L. & WARD, D. C. (1984). Chromosoma 91, 28.
MCCREADY, S. J., GODWIN, J., MASON, D. W., BRAZELL, I. A. & COOK, P. R. (1980). J. Cell Sci. 46, 365.
MCKNIGHT, S. & TJIAN, R. (1986). Cell 46, 795.
MIGEON, B. R., SCHMIDT, M., AXELMAN, J. & CULLEN, C. R. (1986). Proc. Natl. Acad. Sci. U.S.A. 83, 2182.
MIRKOVITCH, J., MIRAULT, M. & LAEMMLI, U. K. (1984). Cell 39, 223.
MOLITOR, H., DRAHOVSKY, D. & WACKER, A. (1976). Biochim. Biophys. Acta 432, 28.
MOYER, M. P. (1979). Int. Rev. of Cytology 61, 1.
MUGGERIDGE, M. I. & FRASER. (1986). J. Virol. 59, 764.
MULLER, W. J., MUELLER, C. R., MES, A. & HASSELL, J. A. (1983). J. Virol. 47, 586.
NEHLS, P., RAJEWSKY, M. F., SPIESS, E. & WERNER, D. (1984). EMBO J. 3, 327.
NELSON, W. G., PIENTA, K. J., BARRACK, E. R. & COFFEY, D. S. (1986). Ann. Rev. Biophys. Chem. 15, 457.
NOMIYAMA, H., FROMENTAL, C., XIAO, J. H. & CHAMBON, P. (1987). Proc. Natl. Acad. Sci. U.S.A. 84, 7881.
NORTH, G. (1985). Nature 316, 394.
ORGEL, L. E. & CRICK, F. H. C. (1980). Nature 284, 604.
PARDOLL, D. M., VOGELSTEIN, B. & COFFEY, D. S. (1980). Cell 19, 527.
PARSLOW, T. G., JONES, S. D., BOND, B. & YAMAMOTO, K. R. (1987). Science 235, 1498.
PAULSON, J. R. & LAEMMLI, U. K. (1977). Cell 12, 817.
PAYVAR, F., DEFRANCO, D., FIRESTONE, G. L., EDGAR, B., WRANGE, O., OKRET, S., GUSTAFSSON, J. & YAMAMOTO, K. R. (1983). Cell 35, 381.
PAYVAR, F., FIRESTONE, G. L., ROSS, S. R., CHANDLER, V. L., WRANGE, O., CARLSTEDT-DUKE, J., GUSTAFSSON, J. & YAMAMOTO, K. R. (1982). J. Cell. Biochem. 19, 241.
PETTIJOHN, D. (1982). Cell 30, 667.
PRUIJN, G. J. M., VAN DRIEL, W. & VAN DEN VLIET, P. C. (1986). Nature 322, 656.
RAZIN, S. V., CHERNOKHVOSTOV, V. V., ROODYN, A. V., ZBARSKY, I. B. & GEORGIEV, G. P. (1981). Cell 27, 65.
RAZIN, S. V., KEKELIDZE, M. G., LUKANIDIN, E. M., SCHERRER, K. & GEORGIEV, G. P. (1986). Nucl. Acids Res. 14, 8189.
REKOSH, D. M. K., RUSSELL, W. C., BELLET, A. J. D. & ROBINSON, A. J. (1977). Cell 11, 283.
RENNIE, P. S., BRUCHOVSKY, N. & CHENG, H. (1983). J. Biol. Chem. 258, 7623.
ROIZMAN, B. (1979). Ann. Rev. Genet. 13, 25.


RYOJI, M. & WORCEL, A. (1985). Cell 40, 923.
SACKS, W. R. & SCHAFFER, P. A. (1987). J. Virol. 61, 829.
SASSONE-CORSI, P., WILDEMAN, A. & CHAMBON, P. (1985). Nature 313, 458.
SCHIMKE, R. T. (1982). In: Gene Amplification (Schimke, R. T., ed.), p. 317. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory.
SCHIRM, S., JIRICNY, J. & SCHAFFNER, W. (1987). Genes and Development 1, 65.
SETTERFIELD, G., HALL, R., BLADON, Z., LITTLE, J. & KAPLAN, J. G. (1983). J. Ultra. Res. 82, 264.
SHENK, T. & WILLIAMS, J. (1984). Curr. Top. Micro. and Immun. 111, 1.
SINDEN, R. R. & PETTIJOHN, D. (1981). Proc. Natl. Acad. Sci. U.S.A. 78, 224.
SINGER, M. F. (1982). Cell 28, 433.
SMALL, D., NELKIN, B. & VOGELSTEIN, B. (1985). Nucl. Acids Res. 13, 2413.
SMITH, D. R., JACKSON, I. J. & BROWN, D. D. (1984). Cell 37, 645.
SPARKS, R. B., Jr. & HELINSKI, D. R. (1979). Nature 277, 572.
SPIESS, E., NEUER, B. & WERNER, D. (1982). Biochem. Biophys. Res. Comm. 104, 548.
STALDER, J., LARSEN, A., ENGEL, J. D., DOLAN, M., GROUDINE, M. & WEINTRAUB, H. (1980). Cell 20, 451.
STENLUND, A., BREAM, G. L. & BOTCHAN, M. R. (1987). Science 236, 1666.
STOW, N. D. & McMONAGLE, E. C. (1983). Virology 130, 427.
TYNDALL, C., LA MANTIA, G., THACKER, C. M., FAVALORO, J. & KAMEN, R. (1981). Nucl. Acids Res. 9, 6231.
VELDMAN, G. M., LUPTON, S. & KAMEN, R. (1985). Mol. & Cell. Biol. 5, 649.
VILLEPONTEAU, B., LUNDELL, M. & MARTINSON, H. (1984). Cell 39, 469.
VOGELSTEIN, B., PARDOLL, D. M. & COFFEY, D. S. (1980). Cell 22, 79.
WAHL, G. M., VITTO, L., PADGETT, R. A. & STARK, G. R. (1982). Mol. & Cell. Biol. 2, 308.
WATSON, J. B. & GRALLA, J. D. (1987). J. Virol. 61, 748.
WEINTRAUB, H. (1985). Cell 42, 705.
WEISBROD, S. (1982). Nature 297, 289.
WERNER, D. & PETZELT, C. (1981). J. Mol. Biol. 150, 297.
WERNER, D., ZIMMERMAN, H-P., RAUTERBERG, E. & SPALINGER, J. (1981). Exp. Cell Res. 133, 149.
WIMMER, E. (1982). Cell 28, 199.
WU, M., HYMAN, R. W. & DAVIDSON, N. (1979). Nucl. Acids Res. 6, 3427.
WU, T.-C. & SIMPSON, R. T. (1985). Nucl. Acids Res. 13, 6185.
YAMAMOTO, K. R. (1985). Ann. Rev. Genet. 19, 209.
YAMAMOTO, K. R. & ALBERTS, B. M. (1976). Ann. Rev. Biochem. 45, 721.
YANG, L., ROWE, T. C., NELSON, E. M. & LIU, L. F. (1985). Cell 41, 127.
ZAKIAN, V. A. (1985). Nature 314, 223.
ZUCKERKANDL, E. (1981). Mol. Biol. Rep. 7, 149.