A predictive model for DNA recognition by the herpes simplex virus protein ICP4


J. Mol. Biol. (1991) 219, 451-470

J. A. DiDonato†, J. R. Spitzner‡ and M. T. Muller§

The Ohio State University, Department of Molecular Genetics and The Molecular, Cellular, and Developmental Biology Program, Columbus, OH 43210, U.S.A.

(Received 6 November 1990; accepted 29 January 1991)

The herpes simplex virus (HSV) type 1 immediate early protein ICP4 is an essential regulatory enzyme that binds DNA directly in order to stimulate or repress gene expression. The degree of transactivation is related to the locations and affinities of the ICP4 binding sites. A number of binding sites have been identified; some sites showed obvious homology to one another, and these were called consensus ICP4 binding sites. Other binding sites did not appear to be related, and these were termed non-consensus sites. We hypothesized, however, that a single model could describe all ICP4 binding sites, given the appropriate characterizations of sites. We performed statistical analyses on a set of ICP4 binding sites and found that the bases important for defining binding were located within a 13 base region. Missing contact analyses on several high-affinity binding sites revealed the same 13 base region as important for critical protein-DNA contacts. From these data we derived the consensus sequence RTCGTCNNYNYSG, where R is purine, Y is pyrimidine, S is C or G, and N is any base. In addition, we found that a better profile for ICP4 binding sites involves use of a matrix of base proportions from the binding site data; sites are analyzed by calculating the Matrix Mean score. We show that this Matrix Mean model could accurately predict the locations of novel ICP4 binding sites. Finally, we analyzed the entire HSV-1 genome for potential ICP4 binding sites and speculate about what these results suggest for the role of ICP4 in viral gene regulation.

Keywords: HSV-1; gene regulation; DNA-protein interactions; immediate early genes; consensus sequences

1. Introduction

Immediate early gene 3 (IE3||) of herpes simplex virus type 1 (HSV-1) encodes a 1298 amino acid residue protein, ICP4 (also called α4 or Vmw175), essential for productive viral infection (Preston, 1979; Watson & Clements, 1980; Dixon & Schaffer, 1980). ICP4 has been shown to bind DNA at sites containing the core sequence ATCGTC; this site is located at its own transcriptional start (Faber & Wilcox, 1986; Muller, 1987) and is responsible for negative regulation by ICP4, both in transient transfection assays and in the context of the viral genome (Roberts et al., 1988; DeLuca & Schaffer, 1988). Localization of the ICP4 binding site within proximity (2 to 3 helical turns) of the TATA box correlates with the negative regulatory activity of ICP4 (DiDonato & Muller, 1989; DiDonato, 1990). Promoters with ICP4 binding sites located further 5', such as the case with gD, are trans-activated by ICP4 both in vivo and in vitro (Tedder et al., 1989; Tedder & Pizer, 1988). ICP4 is a general transactivator of cellular and viral promoters which frequently lack the ATCGTC motif; therefore, either DNA binding is not important in activation by ICP4, or the existing consensus data do not adequately describe ICP4-DNA interactions.

† Present address: Dept of Pharmacology, University of California at San Diego, La Jolla, CA 92037, U.S.A.
‡ Present address: Dept of Biology, MIT, Cambridge, MA 02139, U.S.A.
§ Author to whom all correspondence should be addressed.
|| Abbreviations used: IE3, immediate early gene 3; HSV-1, herpes simplex virus type 1; PMSF, phenylmethylsulfonyl fluoride; TPCK, L-1-tosylamide-2-phenylmethyl chloromethyl ketone; bp, base-pair(s); DMS, dimethyl sulfate; CAT, chloramphenicol acetyltransferase.

The results obtained with a number of HSV-1 genes suggest a correlation between ICP4 activity and specific DNA binding sites (Muller, 1987; DeLuca & Schaffer, 1988; Faber & Wilcox, 1988; Kattar-Cooley & Wilcox, 1989; Roberts et al., 1988; DiDonato & Muller, 1989; Kristie & Roizman, 1986b; Michael et al., 1988); data from transient expression assays with ICP4 and constructions containing normal or altered ICP4 binding sites are consistent with the idea that ICP4 functions through direct binding to DNA (Roberts et al., 1988; Tedder et al., 1989; Shepard et al., 1989; Tedder & Pizer, 1988; DiDonato & Muller, 1989). ICP4-DNA binding to a variety of sites has been demonstrated in vitro using bandshift assays, DNase I footprints, and DMS interference analyses (Faber & Wilcox, 1986; Kristie & Roizman, 1986a; Muller, 1987; DiDonato & Muller, 1989; Tedder et al., 1989; Imbalzano et al., 1990; Kattar-Cooley & Wilcox, 1989). Finally, dissection of functional ICP4 domains has revealed that mutations affecting trans-activation always appear to affect DNA binding (DeLuca & Schaffer, 1988; Paterson & Everett, 1988; Shepard et al., 1989). The data soundly support the notion that DNA binding is a vital step in activation or repression by ICP4.

Faber & Wilcox (1986) aligned the sequences from several footprinted binding sites to derive a consensus sequence for ICP4. The consensus sequence was 5' ATCGTCNNNNYCGRC 3', where R is purine, Y pyrimidine, and N any base. Subsequent analyses of ICP4 binding sites suggested that there were two or more classes of binding sites. Some sites clearly resembled the consensus sequence and were labeled consensus sites, while others displayed little if any homology to the consensus sequence and were labeled non-consensus sites; these findings led to the suggestion that ICP4 has multiple DNA binding motifs (Paterson & Everett, 1988; Michael et al., 1988; Imbalzano et al., 1990; Tedder et al., 1989). Our hypothesis, however, was that ICP4 binding sites could, in fact, be described by a single recognition element. We believed that the apparent lack of homology among ICP4 sites could be explained by a more degenerate DNA binding motif than was inferred previously for ICP4.

Our objectives were to characterize a number of sites both biochemically and statistically and then evaluate the resulting model for its generality in predicting novel ICP4 binding sites. We analyzed 12 ICP4 binding sites and report a broader consensus sequence to which both consensus and non-consensus ICP4 sites are homologous. Consensus sequences are generally not the best profiles for degenerate binding sites in which not all positions and base substitutions are equal. Thus, consensus sequences work well for restriction enzyme sites, but many eukaryotic regulatory proteins have complex recognition elements. For example, the yeast TATA-binding TFIID was shown to bind promoters with and without the characteristic "TATA" consensus sequence (Hahn et al., 1989), and a consensus sequence was derived for the binding of Sp1, but not all sequences bound by Sp1 are homologous to this consensus sequence (nor are all consensus matches actually recognized by the protein; Jones & Tjian, 1985; Dynan et al., 1990; Ares et al., 1987). There are many cases of proteins that appear to recognize remarkably different DNA sequences (Baumruker et al., 1988; Costa et al., 1988; Stormo & Hartzell, 1989; Stormo, 1988; Berg & von Hippel, 1988a); for example, the yeast activator HAP1 binds DNA sequences with very little apparent similarity (Pfeifer et al., 1987). In these cases, any consensus sequence derived may be quite degenerate and offer little predictive value.

DNA binding sites are perhaps best profiled by matrices of base proportions, in which each position in the binding site has four possible values (1 per base) (Mulligan et al., 1981; Berg & von Hippel, 1988a; Stormo, 1990; Spitzner & Muller, 1988). To analyze a putative site, it is aligned with the matrix, then each base (+1, +2, etc.) of the site is scored by the value for that position and nucleotide from the matrix of base proportions (or in some analyses, the base proportions are weighted by the information content per position, as reviewed by Stormo (1990)). The scores per position in the binding site are then combined to yield the match of the site to the matrix. Berg & von Hippel (1988b) have shown that, in general, bases contribute independently to the binding affinity of a site; therefore, a matrix score can be related directly to the actual binding energies of protein-DNA interactions.

We report that both footprinting and statistical characterizations and analyses suggest a 13 base recognition element for ICP4. We constructed a 13 position recognition profile and showed that selection of the appropriate threshold value allowed accurate prediction of both known and novel ICP4 sites. We analyzed the entire HSV-1 genome using this profile (with a more conservative threshold value) and found that potential ICP4 binding sites appear to be distributed non-randomly, and are particularly enriched in the IE3 gene.
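The matrix scoring procedure outlined in the Introduction can be illustrated with a short sketch. The Python below is an illustration under stated assumptions, not the authors' implementation: it assumes that a Matrix Mean score is the arithmetic mean of the per-position base proportions (the exact calculation is described later in the text), and the four-base toy sites, the function names (`build_matrix`, `matrix_mean`, `scan`) and the threshold are hypothetical and are not taken from the ICP4 database.

```python
BASES = "ACGT"


def build_matrix(sites):
    """Per-position base proportions from aligned sites of equal length."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("sites must be aligned to the same length")
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        matrix.append({base: column.count(base) / len(sites) for base in BASES})
    return matrix


def matrix_mean(matrix, site):
    """Assumed score: mean of the proportions of the bases seen in `site`."""
    if len(site) != len(matrix):
        raise ValueError("site length must equal matrix width")
    return sum(matrix[pos][base] for pos, base in enumerate(site)) / len(matrix)


def scan(matrix, sequence, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = matrix_mean(matrix, window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Toy alignment (hypothetical data, for illustration only).
toy_sites = ["ATCG", "ATCG", "GTCG", "ATCA"]
toy_matrix = build_matrix(toy_sites)
toy_hits = scan(toy_matrix, "CCATCGTT", threshold=0.8)
```

With a real profile the matrix would have 13 positions, and the threshold would be chosen so that known sites score above it while most random sequence does not.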

2. Experimental Methods

(a) Materials

BHK cells were grown in Dulbecco's modified Eagle medium (Flow Laboratories, Inc., McLean, VA, U.S.A.) supplemented with 5% (v/v) fetal bovine serum (GIBCO Laboratories, Grand Island, NY, U.S.A.) and 50 µg gentamicin/ml. Stocks of HSV-1 (KOS strain) were prepared from low-multiplicity passage as described (Muller, 1987; DiDonato & Muller, 1989).

(c) Preparation of crude extracts

BHK cells growing in 100 mm tissue culture dishes were either mock infected or infected with HSV-1 (KOS) at an input multiplicity of 10 plaque-forming units/cell, and cultures were harvested at 11 h post infection. Cells were scraped into the tissue culture medium, collected on ice and centrifuged for 5 min at 800 g (4°C). The pellet was washed twice with 50 vol. cold Tris-saline (10 mM-Tris·HCl (pH 8.0), 150 mM-NaCl) and cells were resuspended in 1 pellet volume of cold sterile water containing 0.2 mM-Na2EDTA. An equal volume of cold lysis buffer (20 mM-Tris·HCl (pH 8.2), 2 M-KCl, 2 mM-Na2EDTA, 0.5 mM-PMSF, 0.1 mM-TPCK) was added, vortexed briefly and the lysate placed on ice for 1 h with periodic mixing. The lysate was then clarified by centrifugation at 100,000 g (4°C) for 60 min. The supernatant was collected and dialyzed overnight at 4°C against several changes of 0.3 M-buffer A (20 mM-Tris·HCl (pH 8.5), 1 mM-Na2EDTA, 1 mM-2-mercaptoethanol, 10% (v/v) glycerol, 0.5 mM-PMSF supplemented with KCl to 0.3 M). The extract was then centrifuged (12,000 g for 10 min) to deposit any insoluble material and stored at -80°C in portions. The DNA binding activity was stable for > 1 year under these conditions.

Infected cell lysates were prepared from 2 × 10^8 cells as described above and dialyzed against 0.3 M-buffer A. The dialysate was centrifuged (15,000 g for 5 min), and the supernatant adjusted to a final salt concentration of 50 mM-KCl using buffer A and loaded onto a DEAE-Sephacel column (3 ml volume in a 1 cm × 1 cm column, pre-equilibrated with buffer A) at a rate of 30 ml/h. The column was washed with 50 mM-buffer A until the ultraviolet absorbance was reduced to background, and bound proteins were desorbed in a 60 ml linear KCl gradient (in buffer A) from 0.05 M to 0.5 M. ICP4 was located by mobility shift assays (using IE3 cap site fragment as probe) and the active fractions were pooled (most of the activity eluted at 225 mM-KCl).

(e) Recombinant plasmids

All new gene constructions were verified by dideoxy DNA sequencing. The plasmid designated as p111 (Everett, 1987) contains IE gene 1 with approximately 750 bp of 5' upstream sequence; this includes a novel immediate early-like promoter region that is divergent from the IE gene 1. We refer to the gene region as IEx (the TATA box for IEx is 735 bp 5' of the IE1 TATA). The recombinant pJD2 contains a 815 bp region of the IE3 and IE4/5 promoters from +27 (BamHI) relative to the IE3 transcriptional start to +100 relative to IE4/5 (cloned between the BamHI and EcoRV sites of Bluescript+). The plasmid pBA-2 contains 1 copy of the BamHI/AvaI fragment (+27/-18 of IE3) cloned into the EcoRV site of Bluescript+. The construct is oriented so that the blunted AvaI end (-18) is toward the EcoRI site and the blunted BamHI site (+27) is to the HindIII site of the Bluescript+ polylinker. The minimal promoter IE1 recombinant pJD100 was constructed by isolation of the SmaI/AvaI fragment (-126/+51 relative to the IE1 transcriptional start site) from p111, the end filled with the Klenow fragment and cloned into the EcoRV site of Bluescript+. A clone with the +51 end nearest to EcoRI was isolated and designated pJD100. EcoRI and HindIII were used to digest pJD100, releasing the insert. The band-isolated insert was then cloned into M13mp19 using standard cloning techniques. Single-stranded DNA was prepared from this bacteriophage in order to mutate the ICP4 binding site at -68. The oligonucleotide 5' GGQG(:A(I(:T(:CA(‘T(:(I(r 3' was used in a site-directed in vitro mutagenesis reaction as described (see below). Recombinant phage were then screened by dideoxy sequencing and a mutant was identified. Replicative form DNA was isolated and restricted with EcoRI and HindIII to release the insert. This fragment was cloned between the EcoRI and HindIII sites in Bluescript+, resulting in pJD101. The resulting mutation contains 2 transitions and a transversion (GGGGGAATCGTC to GGGGGGGACGTC). The mutation creates a unique AatII site located in the ICP4 core consensus sequence. The mutant pJD101 was digested with AatII, the 3' overhangs digested and made blunt with phage T4 polymerase and religated with ligase to form pJD102. The sequence of the altered region is GGGGGGGC, resulting in a 4 bp deletion.

(f) In vitro mutagenesis

The mutant oligonucleotide spanning the ICP4 binding site in IE1 (described above) was used to extend complementary strand synthesis on the single-stranded M13mp19 recombinant containing the wild-type minimal IE1 sequence between EcoRI and HindIII. The procedure used was that of Su & El-Gewely (1988). Mutants were screened by dideoxy sequencing.

Mobility shift assays were performed as described (Muller, 1987; DiDonato & Muller, 1989). The input stoichiometry of the binding reaction is approximately 10 molecules of ICP4 per DNA molecule. DNA probes were 5' end-labeled with [γ-32P]ATP and T4 polynucleotide kinase (Maxam & Gilbert, 1980). The reaction mixtures (total vol., 20 µl) were assembled in binding buffer B (5 mM-Tris·HCl (pH 7.9), 0.5 mM-Na2EDTA, 0.05% (v/v) Nonidet P40, 12.5 mM-NaPO4 (pH 6.8)) which contained 30 to 100 mM-KCl (contributed from the partially purified protein fractions), 4 µg of poly(dA-dT)·poly(dA-dT), and 20,000 cts/min (1 to 2 ng) of end-labeled DNA. Reactions were initiated by addition of protein (1 to 2 µg) and incubations were performed either on ice for 45 min or at 22°C for 30 min (similar results were obtained with both incubation regimens). At the end of the incubation period, 0.1 vol. sample dye (0.25% (w/v) bromophenol blue, 0.25% (w/v) xylene cyanol FF, 50% (v/v) glycerol, 6.7 mM-Tris·HCl (pH 7.9), 3.3 mM-sodium acetate) was added and the samples were loaded onto a native 4% (w/v) low ionic strength polyacrylamide gel (acrylamide/bis weight ratio, 80:1) containing 6.7 mM-Tris·HCl (pH 7.9), 3.3 mM-sodium acetate. The gel was pre-run for 1 h at 20 mA and, after loading the samples, electrophoresis was carried out at 20 mA (with constant buffer recirculation) until the bromophenol blue dye front had reached the bottom of the gel. The gel was dried and exposed to Kodak XAR film with an intensifying screen, usually for 4 to 6 h at -80°C. When quantification was essential, soft laser densitometric scans of multiple exposures of the autoradiogram were used. The peaks under the curves were excised and weighed to determine the values. Alternatively, an AMBIS radioimaging detector (AMBIS, San Diego, CA, U.S.A.) was used to quantify radioactivity located in the gel.

(h) Missing base contact analysis

G+A and C+T reactions were performed on 5' end-labeled DNA fragments as described (Brunelle & Schleif, 1987) with slight modification. The only modification was that the G+A base elimination reaction was carried out at 37°C for 25 min, and the free DNA and bound DNA was precipitated with ethanol twice, washed twice with 70% (v/v) ethanol and dried. The dried pellets were washed once with 25 µl of distilled water and dried. Typically, 10^7 to 2 × 10^7 cts/min of end-labeled probe was modified per reaction with an expected recovery of [value illegible]%. Binding reactions contained between 0.5 × 10^6 and 1.0 × 10^6 cts/min of end-labeled modified probe, 7.0 µg of partially purified ICP4 and 14 µg of poly(dA-dT)·poly(dA-dT) in a reaction volume of 50 µl and were incubated on ice for 60 min before addition of loading dye. Samples were fractionated on a preparative 4% low-ionic strength gel as described above. Complexed and free DNA was visualized by autoradiography of the wet gel at 4°C for 3 h. Areas of interest were then cut from the gel and transferred to NA45 paper (Schleicher and Schuell, NH, U.S.A.) in 0.25 × TBE (0.023 M-Tris, 0.023 M-boric acid, 0.05 mM-Na2EDTA) at 95 V for 1.5 h in a BioRad Transblot unit at 4°C. The strips of NA45 paper (wet) were then subjected to autoradiography and all bands localized and cut from the paper. Radioactivity was eluted from the NA45 by incubation in 2 M-NET (20.0 mM-Tris·HCl (pH 8.0), 1.0 mM-Na2EDTA, 2.0 M-NaCl) at 68°C for 2 h. Eluted samples were extracted with phenol/chloroform, precipitated with ethanol, washed twice with 70% ethanol and dried. Samples were then washed once with 25 µl of distilled water, dried and resuspended in 100 µl of 1 M-piperidine, heated to 90°C for 25 min, vacuum desiccated to dryness, washed with 25 µl of distilled water and evaporated to dryness. The dried pellets were resuspended in sequencing loading dye and equal amounts of radioactivity were fractionated on a 12.5% (w/v) sequencing gel. The gel was soaked in 10% (v/v) methanol/12% (v/v) acetic acid for 2 h to remove the urea, dried and subjected to autoradiography.

(i) Footprinting assays

Copper footprinting on DNA-protein complexes in situ was accomplished using 1,10-phenanthroline co-ordinated copper ion as a cleavage reagent (Kuwabara & Sigman, 1987). Preparative low ionic strength polyacrylamide gels (described above) were used to separate bound and free DNA fragments. Following electrophoresis, the entire gel was rinsed with 50 mM-Tris·HCl (pH 7.9), and submerged in 200 ml of the same buffer, followed by the addition of 20 ml of solution A1 (0.45 mM-cupric sulfate, 2.0 mM-1,10-phenanthroline) and 20 ml of solution B1 (58 mM-3-mercaptopropionic acid). The gel was incubated for 10 min at room temperature with gentle agitation followed by addition of 20 ml of solution C1 (28 mM-2,9-dimethyl-1,10-phenanthroline) for 2 min at room temperature. The gel was then rinsed several times with distilled water and exposed to X-ray film overnight at 4°C. Bands of interest were excised from the gel and the DNA electroeluted. To prepare solution A1, 40 mM-1,10-phenanthroline (in 95% ethanol) was mixed with an equal volume of 9 mM-CuSO4 (in water) and the mixture diluted 10-fold with water to yield a final concentration of 2 mM-1,10-phenanthroline, 0.45 mM-CuSO4. To prepare solution B1, neat 3-mercaptopropionic acid was diluted 200-fold with water to give 58 mM-3-mercaptopropionic acid. Solution C1 was prepared in 95% ethanol.

The information content analysis was calculated as described in Spitzner & Muller (1988); the value for each sequence position is determined by use of the equation:

$$\%\ \text{information content} = (1 - H_{obs}/H_{max}) \times 100\%, \qquad (1)$$

where H is the entropy (in bits) calculated by the equation:

$$H = -\sum_{i=1}^{4} p(x_i)\,\log_2 p(x_i), \qquad (2)$$

in which p is the probability of a base occurring at a particular position (from the base proportion matrix) and x_i has 4 values (i = 1, 2, 3, or 4) corresponding to the 4 bases. Thus, the maximum entropy (H_max), from eqn (2), results when all 4 bases occur equally often (p = 0.25):

$$H_{max} = (4)[-(0.25)(-2)] = 2.0, \qquad (3)$$

so that if the base proportions at a position are random, the entropy at that position (H_obs) equals H_max and the information content percentage is zero. In contrast, if a single base occurred in 100% of cases at a given position, H_obs is zero for that position, and the resulting information content percentage is 100%. A conventional chi-square goodness of fit test was calculated for each position using the formula:

$$\chi^2 = \sum_{i=1}^{4} (O_i - E_i)^2 / E_i, \qquad (4)$$

where O_i is the observed frequency of base i in the database at the position, and E_i is the expected base frequency calculated from the base proportion of the overall sample multiplied by the sample size. Obtained values of chi-square were compared to the critical value with 3 degrees of freedom at a confidence level of 95% (α = 0.05). The calculations for evaluating Matrix Mean scores are described in the text. All statistical analyses were conducted using the EDEN [word illegible] computer software system (TEAM Associates, Westerville, OH, U.S.A.).
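The information content percentage and the chi-square goodness of fit statistic described in Experimental Methods can be sketched in a few lines of Python. This is a minimal illustration, not the software the authors used: the function names are hypothetical, the example counts are invented, and the constant 7.815 is the standard tabulated chi-square critical value for 3 degrees of freedom at α = 0.05.

```python
import math

H_MAX_BITS = 2.0              # entropy when all 4 bases are equally likely
CHI2_CRITICAL_3DF_05 = 7.815  # tabulated critical value, 3 d.f., alpha = 0.05


def entropy_bits(proportions):
    """Shannon entropy (bits) of the base proportions at one position."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)


def information_percent(proportions):
    """Percentage information content: (1 - H_obs / H_max) x 100."""
    return (1.0 - entropy_bits(proportions) / H_MAX_BITS) * 100.0


def chi_square(observed_counts, overall_proportions):
    """Goodness of fit of observed base counts to the overall base proportions."""
    n = sum(observed_counts)
    return sum(
        (obs - prop * n) ** 2 / (prop * n)
        for obs, prop in zip(observed_counts, overall_proportions)
    )


# Invented example: 12 aligned sites, all with the same base at this position,
# tested against a uniform overall base composition.
example_chi2 = chi_square([12, 0, 0, 0], [0.25, 0.25, 0.25, 0.25])
position_is_nonrandom = example_chi2 > CHI2_CRITICAL_3DF_05
```

A position with random base usage gives 0% information content, while a fully conserved position gives 100%, matching the two limiting cases stated in the methods.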

3. Results

(a) Identification of ICP4 binding sites

It is clear that ICP4 binding site characteristics have a significant impact on gene regulation in HSV-1; however, no model currently exists that can be applied generally to predict all ICP4 binding sites. From our previous experience with proteins that recognize degenerate DNA binding sites (Spitzner & Muller, 1988, 1989), we found that the first step in generating a model was to analyze a set of binding sites to determine what range of sequences and base substitutions is allowed for protein binding of DNA. Although the consensus sequence reported by Faber & Wilcox (1986) is not capable of predicting all ICP4 sites (see Introduction), ICP4 clearly does bind sites that are homologous to that consensus sequence. We used this consensus sequence to identify a potential ICP4 site located 740 bases upstream from the IE1 gene; this sequence was designated IEx because of its location external and 5' to the IE1 promoter/regulatory region. We found that a DNA fragment containing this IEx site was specifically bandshifted by ICP4 (see below). We then used copper/phenanthroline footprinting (Kuwabara & Sigman, 1987) to show that ICP4 bound the fragment at the site predicted by the consensus sequence; as Figure 1 illustrates, the sequence bound by ICP4 (lanes 2 and 4) was protected in a region corresponding to the consensus sequence given by Faber & Wilcox (1986) (see also Table 1A). In Figure 1, +1 corresponds to the first base of the core sequence ATCGTC (although the opposite strand was footprinted in this experiment). Bases +1 through +13 were strongly protected; additionally, bases at positions -1 and +14, +15 were partially protected. Finally, we note that a transcriptional start site has been mapped in the region of the ICP4 binding site (S. Silverstein, personal communication). The upstream TATA box (labeled in Fig. 1) was hyperreactive to cleavage by copper/phenanthroline, as reported for several immediate early gene ICP4 binding sites (DiDonato & Muller, 1989). Therefore, the putative IEx promoter appears to be very similar to that of IE3.

Figure 1. Copper/phenanthroline footprinting (Cu/OP) of the ICP4 binding site in the IEx promoter. Labeled fragments were prepared from p111 (-135 to +35 of IEx labeled at +35, to the TATA box and to the IEx transcriptional start site; S. Silverstein, personal communication). The fragments were incubated with partially purified ICP4 and the bound and free DNAs were separated on a 4% (w/v) low-ionic strength polyacrylamide gel at room temperature (see Experimental Methods). Cu/OP footprinting was carried out as described in Experimental Methods. Lanes 1, 3 and 5 contain unbound DNA; lanes 2 and 4 contain ICP4-bound DNA; lanes 6 and 7 contain the sequence markers (A/G and G) for the -135 to +35 IEx fragment. In lanes 4 and 5, a greater amount of radioactivity was loaded compared to lanes 1, 2 and 3. Positions of TATA and the ICP4 protected site (consensus sequence) are marked with brackets. Lanes designated B correspond to ICP4-bound DNA sample lanes; those with F are free DNA lanes. The +1/+13 region (on the right) corresponds to the bases underscored in site 2 of Table 1; however, the actual sequence shown as protected corresponds to the strand opposite the ATCGTC.

In Figure 2, we show that the protein which bound the IEx sequence is indeed ICP4 by use of a monoclonal antibody against ICP4 (Muller, 1987; DiDonato & Muller, 1989); in a bandshift experiment, inclusion of the antibody super-shifted the protein-DNA complex (compare lanes 11 and 19). In Figure 2, we also compared the relative binding affinities of a number of DNA sequences for ICP4. Radiolabeled probes for the DNA fragments were prepared concurrently and specific activities were calculated to ensure that each reaction contained very similar concentrations of input DNA; furthermore, all of the reactions were set up in parallel using the same stock solutions and ICP4 preparation. The results of this bandshift experiment were consistent in terms of relative amounts of fragments that formed primary complexes, super-shifted complexes with antibody, and amounts remaining with inclusion of specific competitor DNA. There was a clear hierarchy in binding affinities for these sites, in which the IEx site was the strongest.
The wild-type IE1 site was also shifted with high efficiency; the IE1-T site, with essentially a change in the first two bases of the ATCGTC core element of the IE1 site, bound ICP4 with much less affinity, while a further deletion of the core element (the IE1-dl" site) led to a barely detectable band shift with ICP4. The IE4/5 site showed a relative binding affinity on the order of the IE1-T mutant site. The IEx site is clearly the strongest ICP4 binding site we have identified. From competition experiments in which sites were all competed against the same IE3 cap site oligonucleotide (or as control, a heterologous competitor), we found that the ICP4 binding affinity for IEx is four- to sixfold greater than the wild-type IE1 site (Kristie & Roizman, 1986a; DiDonato & Muller, 1989; Kattar-Cooley & Wilcox, 1989) and eightfold greater than the wild-type IE3 promoter site (data not shown; Muller, 1987). The physiological effects of altering the affinity of a sequence for ICP4 have been demonstrated using transient expression assays (Tedder et al., 1989; Roberts et al., 1988; Batchelor & O'Hare, 1987): the results indicate that lowering the binding affinity of ICP4 sites diminishes the regulatory effects (both positive and negative) of ICP4. For example, with the wild-type IE3 promoter fused to CAT, inclusion of ICP4 led to a 6 to 15-fold reduction in CAT activity; however, changes in two bases of the core element (+1A and +2T, see below) neutralized ICP4-mediated negative regulation and ICP4 binding in vitro (Roberts et al., 1988; DiDonato & Muller, 1989; Shepard et al., 1989; DiDonato, 1990). The same result was seen when the entire ICP4 binding site was deleted, or when the distance between the ICP4 site and the TATA box was increased (from 25 bp in wild-type to 48 bp), while basal levels of expression were unaffected (DiDonato, 1990). We have also examined negative regulation of CAT expression by ICP4 in the IE1 promoter, which contains a high affinity ICP4 site located 5' of the TATA box (Kristie & Roizman, 1986a; Michael et al., 1988; Kattar-Cooley & Wilcox, 1989; DiDonato & Muller, 1989). In this construction, ICP4 produced a 44-fold reduction in CAT activity as compared to the basal level; alterations in the recognition site that reduced the binding affinity for ICP4 (the IE1-T and IE1-dl" sites shown in Fig. 2) also led to substantial losses of negative regulation (DiDonato, 1990). These results suggest that knowledge of the locations and binding strengths of ICP4 sites near genes allows for predictions of the effects of ICP4 upon regulation of these genes. For this reason we analyzed a collection of known ICP4 binding sites in an attempt to derive a more general profile for ICP4 binding sites.

Figure 2. Formation of ICP4-DNA complexes in different IE genes. Binding reactions were carried out using DNA fragments from wild-type (WT) or mutant IE1, IE4/5 or IEx (-105/+35). Reactions contained either mock-infected extract (2 µg) or partially purified ICP4 (2 µg). Lanes 1 to 3 contain wild-type IE1 probe from pJD100 (-129/+51) incubated with mock-infected extract (lane 1), partially purified ICP4 (lane 2), and partially purified ICP4 pre-incubated with IE3 ICP4 binding site oligonucleotide, referred to as the A4 oligonucleotide (lane 3). Lanes 4 to 6 contain the IE1 mutant probe from pJD101 (see Experimental Methods) incubated with mock-infected extract (lane 4), partially purified ICP4 (lane 5) and partially purified ICP4 plus A4 oligonucleotide (lane 6). Lanes 7 and 8 contain the IE1 mutant probe from pJD102 (see Experimental Methods) incubated with mock-infected extract (lane 7) and partially purified ICP4 (lane 8). Lanes 10 through 12 and lane 19 contain a probe derived from IEx (see Experimental Methods) incubated with mock-infected extract (lane 10), partially purified ICP4 (lane 11), partially purified ICP4 plus A4 oligonucleotide (lane 12) and partially purified ICP4 pre-incubated with anti-ICP4 monoclonal antibody (58 S) (lane 19). Lanes 13 through 15 and lane 20 contain the IE4/5 probe (-51/+108) incubated with mock extract (lane 13), partially purified ICP4 (lane 14), partially purified ICP4 pre-incubated with A4 oligonucleotide (lane 15), and partially purified ICP4 pre-incubated with anti-ICP4 monoclonal antibody (58 S) (lane 20). Lanes 16 to 18 display ICP4 complexes that were supershifted with a monoclonal antibody to ICP4: wild-type IE1 (lane 16), mutant IE1 from pJD101 (lane 17) and mutant IE1 from pJD102 (lane 18). Positions of the ICP4 complex (A4 complex), the shifted A4 complex and free DNA probes are marked. Note that in lanes 2, 5, 11 and 14, 2 shifted bands are apparent: the upper band probably represents ICP4-ICP4 multimerization, indicated by the following observations: upper band appearance is proportional to protein concentration and is found after purification of ICP4 to homogeneity (Kattar-Cooley & Wilcox, 1989); the footprint of the upper (2°) band is identical with that of the lower (1°) band (see Fig. 4(b); DiDonato & Muller, 1989); both bands are competed by specific ICP4 binding site oligonucleotides, and the upper band disappears first (Fig. 2).

(b) Statistical analyses of a database of ICP4 binding sites

We only analyzed ICP4 binding sites that had been demonstrated by footprinting, to avoid inclusion of erroneous data or sites that were of very low binding affinity (which might dilute the database). Seven sites contained perfect matches to the six base core element (ATCGTC) of the consensus sequence described by Faber & Wilcox (1986); these sites are shown as the first seven in Table 1A. A consensus sequence was determined from these sites and used subsequently to align the five other sites (numbered 8 to 12 in Table 1A) that did not contain complete core elements. This temporary consensus sequence was 13 bases in length and spanned the region underlined in Table 1A; its sequence was (5' to 3') ATCGTCNNYNYSG, where Y is pyrimidine, S is C or G, and N is any base. These non-consensus ICP4 sites (8 to 12 in Table 1A) were, in fact, quite homologous to this consensus sequence, with an average match of 11 out of 13 positions.

Table 1B shows the nucleotide proportions, by position, of the 12 ICP4 sequences; the proportions are shown from ten bases 5' of the six base core element to 24 bases 3' of the element (i.e. -10 to +30). In addition, a value for the information content of the database is shown for each position. Essentially this is a statistical measure of the non-randomness of base proportions at a position (Spitzner & Muller, 1988). Thus, the more information (or consensus) at a position, the greater the score (for a detailed description of information analysis, see Stormo & Hartzell, 1989; Stormo, 1988, 1990). The information scores for the 12 ICP4 binding sites are shown graphically in Figure 3(a). This graph indicates that the positions with conserved nucleotides are in the same 13 base region defined by the temporary consensus sequence given above (and in the first 13 bases of the 15 base consensus sequence described by Faber & Wilcox). The peak positions were +1 to 6, 9, and 11 to 13. In addition, peaks were also observed at several flanking positions. We suspected that the peaks at these flanking positions were due to chance occurrences associated with the high G+C content of the database; the information analysis shown in Figure 3(a) compares the ICP4 binding sites to random base proportions of 25% each, whereas the base proportions of the HSV-1 genome, the natural target for ICP4, are quite non-random. Therefore, the predominance of G or C at a position may not be significant compared to the high background of G+C in the database. Likewise, A and T bases are under-represented in our database. For these reasons, we analyzed the database using a chi-square goodness of fit test, because this procedure subtracts out background base proportions. The database background proportions of 32% for G and C and 18% for A and T were used to compute the expected frequencies. The chi-square value per posi-

Figure 3. Analysis of significant bases in the ICP4 database. (a) The information content percentage from Table 1B is shown graphically, by position. (b) A graph of a chi-square test by position using base frequencies from the ICP4 database (Table 2) and expected frequencies calculated from the background base proportions in the sample. The significant positions, taken from the graphs, lie within the bracketed region spanning positions +1 to +13 of the database. (c) The ICP4 consensus sequence, 5' RTCGTCNNYNYSG 3' (+1 to +13), derived from data in Table 1 and the graphs in (a) and (b).


sequence from a statistical analysis of binding sites, we sought to validate our findings by biochemical means.

(c) Determination of bases important for ICP4 contact by missing contact analysis

To identify particular bases in the ICP4 recognition sequence that are important for protein binding, missing contact analysis was performed (Brunelle & Schleif, 1987). This method is analogous to dimethyl sulfate interference (Siebenlist & Gilbert, 1980) in that it correlates protein-DNA complex formation with specific base modification (in this case, base elimination). The advantage of this method is that it shows the contribution of every base (not just purine residues) to binding of the protein; furthermore, both positive and negative contributions of the bases to binding strength are disclosed (Brunelle & Schleif, 1987). In the experiments shown in Figure 4, the DNA was first treated to eliminate either purine or pyrimidine base residues (without strand scission). The DNA was then reacted with partially purified ICP4, and protein-DNA complexes (bound) were separated from unbound (free) DNA on a low ionic strength polyacrylamide gel. DNA from each band was eluted and subjected to strand scission at the positions of prior base elimination, then the free and bound DNA samples were analyzed on sequencing gels. Comparison of band intensities between the bound and free samples indicates whether elimination of a particular base affected the relative affinity of ICP4 for the binding site.

We performed missing contact analysis on three ICP4 sites. The results (Fig. 4(a) to (c)) reveal that the IEx site (the strongest site) contained no position in which base elimination enhanced binding affinity; in contrast, both the IE3 and IE1 sites contained bases which appeared to interfere with binding. Together, these results (summarized in Table 2) indicate a 13 base region that is important for the binding of ICP4. This 13 base essential region begins with the first base of the ATCGTC (core consensus element) (Faber & Wilcox, 1986) and extends 3' to include the first 13 bases of that 15 base consensus sequence. Thus, the missing contact data are consistent with the statistical data from Figure 3. Each indicates the identical 13 base region as including the bases significant for ICP4 recognition of binding sites, and in each case the bases that appear less important within this 13 base span are located between the six base core element and the 3' element. Comparison of the missing contact data with the consensus sequence, however, suggests that all positions might not contribute equally to binding, and further, that the bases in the central region may be involved in site recognition by ICP4. These observations illustrate a problem inherent in the nature of consensus sequences, in that somewhat arbitrary assignments must be made. In addition, when consensus sequences are used, homology is scored as an all-or-none phenomenon where all positions are weighted equally. Recent studies have suggested that a better profile for a set of DNA sequences is simply the base proportions of the data set (Spitzner & Muller, 1989; Stormo & Hartzell, 1989; Stormo, 1988, 1990).

A test site is aligned to the base proportion matrix. For ICP4 sites, the first base is compared to the +1 base in the matrix (from Table 1B) and assigned the corresponding value (each of A, C, G and T scores its proportion at that position; for example, a G scores 0.25). Then the second base is compared to the +2 position of the matrix. The appropriate value is

Figure 4. Missing contact analysis of IE3, IE1 and IEx. Missing contact analyses were carried out as described in Experimental Methods. (a) The analysis with IE3 coding and non-coding strands. Fragments were prepared from the IE3 promoter (pBA2; see Experimental Methods) by unique end-labeling at the XbaI site (coding strand) or the XhoI site (non-coding strand) to yield probes from -18 to +27. The probes were either depurinated or depyrimidinated as described and binding reactions contained the DNA probe (10^6 cts/min per reaction) with 5 µg of partially purified ICP4. The coding strand (lanes 1 to 8) and non-coding strand (lanes 9 to 17) are as marked above the gel. Lanes labeled F contain unbound DNA, lanes labeled B contained ICP4 complexes; sequencing markers are indicated in lanes 4, 5, 9, 16 and 17. Purine eliminations are in lanes 1 to 3 and 10 to 12; pyrimidine eliminations in lanes 6 to 8 and 13 to 15. The ICP4 binding region is marked as the A4 site on the left and right of each gel. (b) The IE1 missing contact analysis. The DNA fragment was prepared from pJD100 (-129/+51, labeled at the -129 site; see Experimental Methods). Lanes are labeled as bound (B) or free (F) or purine-specific (G+A) or pyrimidine-specific (C+T) as indicated above. In this experiment, 1° and 2° ICP4 complexes were analyzed (DiDonato & Muller, 1989). The 1° and 2° complexes most likely differ in their stoichiometry of ICP4 as described (DiDonato & Muller, 1989). Positions relative to the transcriptional start site are listed to the left, as is the position of TATA and the ICP4 binding site protected in Cu/OP footprinting experiments (DiDonato & Muller, 1989). (c) The IEx missing contact analysis. Base-eliminated G+A and C+T modified DNA was prepared as described in Experimental Methods (labeled at +35). The TATA box and ICP4 binding site (A4 site) are marked.

IE3 coding strand
          0.0 *ooo*o*ooo
5' CGCCCCGATC GTCCACACGG AGCGCGGCTG CCGACAC 3'

IE1 coding strand
          0.0 0.0 000.*.
5' TGGGGGAATC GTCACTGCCG CCCCTTTGGG GAGGGGA 3'

IE3 non-coding strand
          *@a *** **ee*
3' GCGGGGCTAG CAGGTGTGCC TCGCGCCGAC GGCTGTG 5'

IEx non-coding strand
          *** oooooooooo
3' AGGACGGTAG CAGAGAGGCC TCTCGCCGAA CCA 5'

A. Matrix Mean analysis of the TK promoter ICP4 binding site

The TK sequence is shown from -200 to -280 relative to the transcription start site

-240 -228 I I G/CGCcACTCCCTGA*GCTCCTGc*GTcc~~~~~~GTG*c~G*TAGTGT*ccTGTGccccGTccTGGTGTTT~

-280

-200

CGCGGTGAGGGACTTCGAGGACGTCAGGGAGCGCGGAGGCCCACTGTTCTATCACATGGACAGGGGG~GGACCACAMC

TK site                    CTCGCGCCTCCGG
Consensus sequence (new)   RTCGTCNNYNYSG

The homology of the TK site to our new ICP4 consensus sequence is 10 matches in 13 bases total, and 7 matches in 10 non-N bases; this is the best score on the DNA sequence above; however, there are 4 other sites with this degree of homology.

IE3 (+1)
    * *+
    ATCGTCCACACGG
    TAGCAGGTGTGCC
    *

IE1 (-61)
    * *
    ATCGTCACTGCCG
    TAGCAGTGACGGC
    ** *+

gD (-111)
    ATCGTCCATACCG
    TAGCAGGTATGGC
    * **

TK† (-228)
    CTCGCGCCTCCGG
    GAGCGCGGAGGCC
    ++* *

Sources: Muller (1987); DiDonato & Muller (1989); Imbalzano et al. (1990).

The sources of the data are indicated; the TK site was aligned by the Matrix Mean prediction. Positions at which methylation of guanine interfered with ICP4 binding are marked with asterisks.
† An ICP4 binding site was localized by Imbalzano et al. (1990) to between 220 and 250 bases upstream from the TK transcription start by bandshifts and DMS interference; we have confirmed a binding site within this region (data not shown). The sequence analyzed included an additional 30 bases on each side of this region.


Matrix Mean values less than 0.5; as a general rule, it is likely that the site(s) with the highest Matrix Mean values above a threshold of 0.51 on a given DNA fragment are binding sites for ICP4 (see below). Additionally, sites with higher scores are usually bound with greater affinity; this is in agreement with work by Berg & von Hippel (1988a,b) showing that, in general, matrix scores are proportional to actual binding energies.

A test with an independent sample is shown in Table 3. Using DMS interference and mobility shifts, Imbalzano et al. (1990) identified an ICP4 binding site in the promoter of the HSV-1 thymidine kinase gene, about 240 bases upstream from the transcription start. In Table 3A, the sequence in this region (from -200 to -280) was analyzed by three methods. There were no sites with high homology to the Faber & Wilcox (1986) ICP4 consensus sequence, and there were nine sites with between 10 and 12 matches out of the 15 bases. In contrast, using the Matrix Mean method, the site spanning -228 to -240 was detected as the only site scoring greater than 0.5 (0.526; see Table 3A); this site matched the Faber & Wilcox consensus sequence at only 6 of the 11 non-N bases (Table 3B). This thymidine kinase site is a better match to our consensus sequence (7 of 10 non-N bases, or 10 of 13 total; Table 3B); however, the consensus sequence has more matches than the single site predicted by the Matrix Mean model. Table 3C shows experimental evidence that the site predicted by our profile is correct. DMS interference data from the thymidine kinase region was compared with that of the IE1, IE3, and gD strong ICP4 sites. When the thymidine kinase site is aligned as predicted in Table 3A, the guanine residues important for contact can be aligned between all four sites. For example, critical G residues common to all sites on the top strand (Table 3C) are at +4 and +13; on the bottom strand at +11 and +6, and on whichever strand possessed a guanine at +12. A difference is the +6 site in thymidine kinase, which is a C instead of G (of course, as the +13 G was the only guanine completely conserved among our sites, we expect to find certain differences in DMS interference patterns among sites).

We designed additional experiments to determine the predictive value of our profile for ICP4 binding sites. We show in Figure 2 (lanes 14 and 20) that a DNA fragment from the promoter of the IE4/5 gene was bound specifically by ICP4, although the relative binding affinity appears lower for this fragment than for the IEx and IE1 binding sites. Previous results using the IE4/5 promoter in transient assay experiments also suggested an interaction with ICP4, but there were no consensus ICP4 sites present (Gelman & Silverstein, 1987a,b). Analysis of this promoter sequence using our profile suggested four possible locations for ICP4 binding. All sites had Matrix Mean values of at least 0.5 (Fig. 5(a)). We isolated the potential sites by digestions with various restriction enzymes, and the subfragments were tested individually for ICP4 binding as described (DiDonato & Muller, 1989; Muller, 1987). For these experiments, partially purified ICP4 was used; however, we applied additional criteria (oligonucleotide competition and antibody-induced shifts) to confirm the formation of authentic ICP4-DNA complexes (see Fig. 2). The results (summarized in Fig. 5(a)) show that the most prominent band shift occurred with fragment 4, which contained the highest Matrix Mean site (0.519) in the fragment. Fragment 2 contained two potential sites, with Matrix Mean values of 0.513 and 0.50; however, ICP4 bound this fragment much less than fragment 4. These results suggest that the Matrix Mean threshold for ICP4 binding is around 0.50 to 0.51. Supporting this hypothesis, a third subfragment (number 3; Fig. 5(a)) contained a site with a Matrix Mean value of 0.50, but this DNA was not band-shifted by ICP4.

An ICP4 binding site had been demonstrated in the leader region (+8/+194) of the α-tif (Vmw65) gene (Michael et al., 1988). Analysis of this α-tif sequence revealed the presence of two additional sequences with high Matrix Mean values (Fig. 5(b)). A DNA fragment containing this promoter sequence was restricted to separate the potential ICP4 binding sites (see Fig. 5(b)), which were then tested for ICP4 binding. The fragment with the highest Matrix Mean value (0.69) was previously identified as an ICP4 site (Faber & Wilcox, 1986), and is included in Table 1A. ICP4 bound two subfragments containing this 0.69 site (fragments 5 and 6) with higher affinity than fragment 7 (Matrix Mean value of 0.58) or fragment 8 (Matrix Mean value of 0.60). However, the latter two subfragments were also band-shifted efficiently. Therefore, the α-tif region from +8 to +194 contains three high-affinity ICP4 binding sites, as predicted by our profile (and these sites appear to be stronger than those in Fig. 5(a), as greater band shifts were observed using the same reagents and running the fragments on the same gel).

A number of abundant, sequence-specific DNA binding proteins have been identified in the yeast Saccharomyces cerevisiae. These are called by various names (GFI, ABFI, BAFI, SUF, TAF; Dorsman et al., 1990; Buchman et al., 1988; Chambers et al., 1990; Vignais et al., 1987; Halfter et al., 1989; Hamil et al., 1988) and they may be related; however, collectively we will call these GFI. The consensus sequences for GFI and ICP4 are:

5' RTCRYYYNNNACG 3'   GFI
5' RTCGTCNNYNYSG 3'   ICP4
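The match counts quoted for consensus comparisons can be reproduced with a few lines of code. The following is a minimal sketch, not the authors' program: it assumes the ambiguity codes defined in the text (R = purine, Y = pyrimidine, S = C or G, N = any base), it takes the GFI consensus string as read from the line above, and the function names `count_matches` and `compatible_positions` are hypothetical.

```python
# Ambiguity codes as defined in the text.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"},            # purine
    "Y": {"C", "T"},            # pyrimidine
    "S": {"C", "G"},
    "N": {"A", "C", "G", "T"},  # any base
}

ICP4_CONSENSUS = "RTCGTCNNYNYSG"
GFI_CONSENSUS = "RTCRYYYNNNACG"  # assumption: our reading of the quoted GFI consensus


def count_matches(site, consensus):
    """Return (matches over all positions, matches over non-N positions)."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    pairs = list(zip(site, consensus))
    total = sum(base in IUPAC[code] for base, code in pairs)
    non_n = sum(base in IUPAC[code] for base, code in pairs if code != "N")
    return total, non_n


def compatible_positions(consensus_a, consensus_b):
    """Count positions where two consensus codes share at least one base."""
    return sum(bool(IUPAC[a] & IUPAC[b]) for a, b in zip(consensus_a, consensus_b))


if __name__ == "__main__":
    tk_site = "CTCGCGCCTCCGG"  # thymidine kinase site from Table 3
    print(count_matches(tk_site, ICP4_CONSENSUS))
    print(compatible_positions(GFI_CONSENSUS, ICP4_CONSENSUS))
```

Under these assumptions the thymidine kinase site gives 10 matches in 13 positions and 7 in the 10 non-N positions, in agreement with Table 3, and the two consensus sequences are compatible at 12 of 13 positions (the exception is +11).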

Clearly there is significant homology between these two sites. We cloned an oligonucleotide that contained a GFI binding site (CYC1a; Dorsman et al., 1990). The Matrix Mean value for this sequence (0.538) indicated that it should be bound by ICP4. This was confirmed by a bandshift experiment using partially purified ICP4 (Fig. 6). As shown in Figure 6, lane 2, the GFI site was band-shifted, and the complex was not competed by a molar excess of heterologous oligonucleotide (lane 3). The same concentration of the A4 oligonucleotide (cap site from

[Figure 5 panels: (a) Summary of band-shift results: IE4/5 sites with Matrix Mean values equal to or greater than 0.5; (b) Summary of band-shift results: α-tif (Vmw65) sites with Matrix Mean values equal to or greater than 0.5. Columns in each panel: fragment number; intensity of band shift; competition with A4 oligonucleotide; supershift with anti-ICP4 IgG.]

Figure 5. Summary of ICP4 binding sites detected in the IE4/5 and α-tif (Vmw65) promoter and leader sequences. (a) The top line shows the restriction map of the IE4/5 promoter-leader region and all ICP4 binding sites possessing a Matrix Mean value of 0.5 or greater. Darker lines underneath show fragments containing ICP4 binding sites (O) that were prepared from pJD2: the HindIII-SmaI (fragment 1, 153 bp), NarI-NarI (fragment 2, 104 bp), AvaII-Sau96I (fragment 3, 67 bp; note that Sau96I digestion disrupts both the 0.513 and 0.50 sites) and the HindIII-AvaII (fragment 4, 50 bp). Standard mobility shift assays were carried out as described using partially purified ICP4. Reactions were adjusted to contain equal amounts of each end-labeled fragment. Where possible, a larger fragment was double end-labeled (e.g. SmaI-HindIII) and subsequently digested with a second enzyme (e.g. AvaII) to release uniquely end-labeled subfragments (of equal specific activity) from the same parent fragment. Relative binding affinity of ICP4 for each subfragment was then compared by scanning gels on an AMBIS unit. Relative binding was quantified on a scale of 0 (-) to +5 (+5 being the most intense band-shifted fragment from a given region). Specificity was evaluated by competition with the IE3 ICP4 binding site oligonucleotide, and by antibody secondary shifts with anti-ICP4 immunoglobulin G (IgG). (b) Possible ICP4 binding sites in the α-tif (Vmw65) leader sequence with Matrix Mean values greater than 0.5. The restriction map of the α-tif leader (+8/+194) is shown. Fragments below this were prepared from pMC1 and are diagramed. Methods of analysis of binding for each fragment were identical with those used for (a). Fragments for the α-tif leader region include the SalI-AvaI (fragment 5, 186 bp), SalI/EcoRV (fragment 6, 41 bp), EcoRV/AvaI (fragment 7, 145 bp), AvaI/NcoI (fragment 8, 88 bp).

IE3) dissociated the complex (lane 4). Furthermore, anti-ICP4 immunoglobulin G produced an additional shift in the complex (lane 6) that was not detected with a control antibody (lane 5). On the basis of these results, we conclude that this GFI site (from the CYC1 gene of S. cerevisiae) is recognized by ICP4. Amino acid sequence comparison showed no homology between ICP4 and the yeast factor (Halfter et al., 1989); therefore, these two DNA binding proteins are unrelated yet share DNA recognition elements. Results of titrating various amounts of oligonucleotide competitor (cap site from IE3) against complexes formed with the GFI site and the IE3 cap site (Muller, 1987) revealed that ICP4 binds to the GFI site with a 15 to 20-fold lower affinity relative to the IE3 cap site (data not shown). Thus, the ICP4 binding to the GFI site (Matrix Mean value = 0.538) is similar to that of the IE4/5 promoter site (Fig. 5(a)), which has a similar Matrix Mean value (0.52). However, we wish to stress the following caveat. Due to the relatively small number of ICP4 binding sites that have been localized, and the fact that their binding energies have not been rigorously compared, it is not possible to generate a highly quantitative prediction model (namely, one that will allow prediction of

binding affinity). On the other hand, the results presented here are the most detailed analyses that have been completed for ICP4 binding sites, and it is clear that the profile we derived has value because it predicted the locations of several new ICP4 sites.

Figure 6. ICP4 binding to a GFI site. A 45 bp fragment containing the CYC1a GFI binding sequence (1 ng) was incubated with partially purified ICP4 in a bandshift experiment under conditions described in Experimental Methods. Lane 1, GFI probe only; lane 2, partially purified ICP4; lane 3, partially purified ICP4 plus 40 ng of (dGdC·dGdC) oligonucleotide; lane 4, partially purified ICP4 plus 40 ng of IE3 ICP4 binding site oligonucleotide; lane 5, partially purified ICP4 plus anti-topoisomerase type II monoclonal antibody ascites fluid (0.5 µg); lane 6, partially purified ICP4 plus 0.25 µg of anti-ICP4 monoclonal immunoglobulin G. Free probe is marked as DNA; ICP4-containing complexes are marked as ICP4 and the antibody-shifted ICP4 complexes as ICP4 + AB.

4. Discussion

(a) Identification of ICP4 binding sites

The results linking ICP4 binding sites with the regulation of HSV-1 gene expression make it clear that it would be valuable to have a model to describe ICP4 binding sites. A consensus sequence was derived for ICP4 sites (Faber & Wilcox, 1986); however, this did not describe all sites, leading to the view that ICP4 recognized consensus and non-consensus sites. For this reason, we sought to derive a model for ICP4 binding sites using additional data. We identified a potential ICP4 site located upstream from the IE1 gene by homology to the Faber & Wilcox (1986) consensus sequence. We showed that this sequence (referred to as IEx) was a genuine ICP4 binding site by use of bandshift assays, in which monoclonal antibody against ICP4 super-shifted the protein-DNA complex (Fig. 2, lane 19). Copper/phenanthroline footprinting confirmed ICP4 binding at the predicted location (Fig. 1). Competition experiments with ICP4 binding oligonucleotides (DiDonato, 1990) revealed that ICP4 had a higher affinity for the IEx site than either IE1 or IE3 sites (relative binding affinity 4 to 6-fold greater than the IE1 site and 8-fold greater than the IE3 cap site; see also Fig. 2). In addition to the IEx site, we included 11 other ICP4 sites in our "training set" from which we derived our profile. All 12 sites have been footprinted (Table 1A); seven sites contain the consensus core element ATCGTC, while five had been labeled non-consensus sites.

(b) Statistical analysis of ICP4 binding sites

The base proportions of all 12 ICP4 sites are shown in Table 1B. In order to determine which positions were conserved among binding sites, we analyzed the information content of the set. The information content, plotted by position in Figure 3(a), shows the deviation of base proportions in the database from random (25%) base proportions (Spitzner & Muller, 1988; Stormo, 1988, 1990). These results identified a number of positions within a 13 base region as important; this region is shown underlined in the sequences given in Table 1A. We also analyzed the ICP4 sites using a chi-square test, which allowed subtraction of background base proportions (64% G+C) from the sample. The peak region in Figure 3(b) was very pronounced compared to flanking positions, suggesting that the flanking positions were not significant (compare Fig. 3(a) and (b)). Therefore, the positions that conveyed the important base information in the ICP4 database were in the +1 to +13 region; this binding site size corresponds closely to footprinting data (see Figs 1 and 4). Using the results from Table 1 and Figure 3(a) and (b) we proposed the ICP4 consensus sequence shown in Figure 3(c). The five sites in Table 1A that had been called non-consensus sites are actually quite homologous to our new ICP4 consensus sequence (2 match 11 of 13 positions and 3 match 12 of 13 positions).
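The two per-position statistics used in this analysis can be illustrated with a short sketch. It is not the authors' program and rests on stated assumptions: information content is taken as the Shannon measure against uniform 25% proportions (2 bits minus the column entropy), the chi-square statistic is the usual goodness-of-fit sum against the quoted background (32% each for G and C, 18% each for A and T), and the input is only the four 13-base sites listed in Table 3, not the 12-site database of Table 1A. All function names are hypothetical.

```python
import math

BASES = "ACGT"
# Background proportions quoted in the text for the database (64% G+C).
BACKGROUND = {"A": 0.18, "C": 0.32, "G": 0.32, "T": 0.18}

# Illustrative input only: the four aligned sites of Table 3.
SITES = [
    "ATCGTCCACACGG",  # IE3
    "ATCGTCACTGCCG",  # IE1
    "ATCGTCCATACCG",  # gD
    "CTCGCGCCTCCGG",  # TK
]


def column_counts(sites):
    """Count each base at each aligned position."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must be aligned to the same length")
    return [{b: sum(site[i] == b for site in sites) for b in BASES}
            for i in range(length)]


def information_bits(counts):
    """Information of one column relative to uniform 25% base proportions."""
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values() if c)
    return 2.0 - entropy


def chi_square(counts, background=BACKGROUND):
    """Goodness-of-fit statistic of one column against background proportions."""
    n = sum(counts.values())
    return sum((counts[b] - n * background[b]) ** 2 / (n * background[b])
               for b in BASES)


if __name__ == "__main__":
    for pos, counts in enumerate(column_counts(SITES), start=1):
        print(f"+{pos:<2d} information={information_bits(counts):.2f} bits  "
              f"chi-square={chi_square(counts):.2f}")
```

With this toy input, a fully conserved T (position +2) and a fully conserved C (position +3) carry the same information against the uniform background, but the T gives the larger chi-square value because T is rare in the G+C-rich background. That is the effect the chi-square test was used to expose.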

(c) Missing contact analysis

The results of statistical analyses must be viewed cautiously when sample sizes are small. For this reason, we wanted an independent biochemical measure of bases important for ICP4 binding to validate our statistically derived model. The missing contact analysis (Brunelle & Schleif, 1987) is a powerful technique for determination of residues important for complex formation; in addition, the method is useful in identifying contacts on all four bases and reveals both positive and negative base contributions to complex formation (Brunelle & Schleif, 1987). The data in Figure 4 (summarized in Table 2) suggest that a 13 base region is the essential zone for interaction with ICP4. This is slightly smaller than the region protected from copper/phenanthroline modification (16 bases total, but only 13 bases were strongly protected; see Fig. 1), suggesting that ICP4 covers the DNA helix over a region wider than 13 bases, but not in a base-specific manner. Importantly, this 13 base region implicated biochemically in ICP4 binding is the same 13 base region identified by the statistical analyses. Comparison of the individual results in Figure 4(a) to (c) reveals that all 13 bases of the IEx site (our strongest site) contributed positively toward ICP4 binding, while in the two somewhat weaker sites (IE1 and IE3) there were several positions in which base elimination did not significantly reduce protein binding. The identities of these bases in IE1 and IE3 were clearly different from those at the corresponding positions in the IEx site. From these results, we inferred that it would be appropriate to use the 13 base region for further analyses of ICP4 binding site data.

Figure 7. Distribution of ICP4 Matrix Mean matches in the HSV-1 genome. The entire genome is represented with 2 lines showing, on top, the fraction of genome, and below this, the distance along the genome in kb (10^3 bases). HSV-1 protein-coding regions are indicated at the appropriate positions along the genome (directly below the distance map), with direction of translation designated by arrows; this Figure was taken from McGeoch et al. (1988). Above the gene map is a histogram showing the locations of all sites in the HSV-1 genome (strain 17) with an ICP4 Matrix Mean score of 0.55 or greater; the genome was subdivided into approximately 400 bins for this analysis, with each bin spanning 400 bp. The height of each bar indicates the number of Matrix Mean matches within each bin. Below the gene map, the sites with ICP4 Matrix Mean scores of at least 0.6 are marked with vertical arrows; the numbers 3 below the arrows indicate that at these sites there are 3 sites with very high Matrix Mean scores that are too close together to mark separately. The sites labeled a, b, c, d, and cap refer to sites to which binding has been demonstrated (see above); they are, respectively, the IEx, IE1, α-tif, IE3 cap site, and the glycoprotein D binding sites. (UL: Unique Long; US: Unique Short.)

(d) Matrix Mean analysis using observed base proportions

There are a number of problems associated with the use of consensus sequences to specify degenerate binding sites. These problems can be minimized by using a matrix of base proportions as a binding site profile (for example, degeneracy is less of a problem as positions need not be weighted equally, unlike a consensus sequence, which can be thought of as a matrix of zeros and ones). For our profile, we initially selected the 13 base inclusive region shown to be important for ICP4 binding by the information analysis (Fig. 3(a)) and by missing contact analysis (Table 2). We analyzed a variety of ICP4 sites and non-sites with this 13 base matrix and found this to be a good discriminator for predicting these faithfully (using a threshold Matrix Mean value of 0.51). Subsequent tests of independent data (the sites are discussed below) with matrices of different lengths (14 bases, 23 bases, etc.) or exclusion of certain positions led to results less reliable than our 13 position profile. Our ICP4 profile, therefore, is the 4 by 13 matrix that is positions 1 to 13 in Table 1B. Comparison of a site to the matrix is analogous to use of a consensus sequence, but values are taken directly from the matrix, using the value of the corresponding base. Use of this profile is illustrated in Table 3A.
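The scoring described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the Matrix Mean is assumed to be the arithmetic mean of the matrix entries selected by each base of the site; the proportion matrix is built from only the four sites of Table 3, so its scores are not comparable with the 0.51 threshold, which applies to the 12-site matrix of Table 1B; and all function names are hypothetical. The same scoring function is applied to a consensus sequence written as a matrix of zeros and ones, to show the analogy drawn in the text.

```python
BASES = "ACGT"

IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "N": {"A", "C", "G", "T"},
}
CONSENSUS = "RTCGTCNNYNYSG"

# Illustrative training set only (the four sites of Table 3).
TRAINING_SITES = [
    "ATCGTCCACACGG",  # IE3
    "ATCGTCACTGCCG",  # IE1
    "ATCGTCCATACCG",  # gD
    "CTCGCGCCTCCGG",  # TK
]


def proportion_matrix(sites):
    """Build one {base: proportion} column per aligned position."""
    n = len(sites)
    return [{b: sum(site[i] == b for site in sites) / n for b in BASES}
            for i in range(len(sites[0]))]


def consensus_matrix(consensus):
    """Express a consensus sequence as a matrix of zeros and ones."""
    return [{b: 1.0 if b in IUPAC[code] else 0.0 for b in BASES}
            for code in consensus]


def matrix_mean(site, matrix):
    """Average the matrix entries selected by each base of the site.

    Assumption: the Matrix Mean is the plain arithmetic mean over positions.
    """
    if len(site) != len(matrix):
        raise ValueError("site length must equal the number of matrix columns")
    return sum(column[base] for base, column in zip(site, matrix)) / len(matrix)


if __name__ == "__main__":
    proportions = proportion_matrix(TRAINING_SITES)
    zeros_and_ones = consensus_matrix(CONSENSUS)
    for name, site in [("IE3", "ATCGTCCACACGG"), ("TK", "CTCGCGCCTCCGG")]:
        print(name,
              round(matrix_mean(site, proportions), 3),
              round(matrix_mean(site, zeros_and_ones), 3))
```

With the zeros-and-ones matrix every mismatch costs the same amount, whereas with the proportion matrix a mismatch at a variable position costs less than one at a strictly conserved position, which is the advantage the text attributes to the profile.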

467

Buchman et al., 1988; Chambers et al., 1990; Vignais et al., 1987; Halfter et al., 1989; Hamil et al., 1988). The consensus sequence for these proteins is clearly quite similar to that of ICP4. To test the relationship between ICP4 and GFI, we selected a sequence (CYC1a) (Dorsman et al., 1990) with a Matrix Mean value of 0.538. The experiment displayed in Figure 6 shows that this site bound ICP4; however, ICP4 binding affinity at this site is 15 to 20-fold lower than at the IE3 cap site, based upon IE3 cap site oligonucleotide competition analysis. The GFI site is slightly different (+9, +11, for example) compared to the other ICP4 sites (Table 1A). These differences are important because deriving a more accurate model for ICP4 sites will ultimately require a large set of binding sites and relative binding affinities, in which sites are examined that contain all possible base substitutions. This is obviously beyond the scope of this paper; however, only in this manner will we be able to determine the contribution of each base at every position. For example, it appears from the current data that certain mismatches from preferred nucleotides are acceptable, but it is not known whether all are equal. Furthermore, we do not know whether certain substitutions actively interfere with ICP4 binding, as opposed to simply lacking hydrogen bond formation. Although we do not have a perfect model now, this is the most extensive statistical analysis of ICP4 binding sites that has been reported, and the resulting profile is clearly useful in locating sites in a given sequence; however, we hasten to add that this model is not intended to be quantitative. In general, it appears that sequences with the highest scores that are greater than about 0.51 will be bound by ICP4; in addition, from the data examined thus far, all isolated sites scoring at least 0.51 were bound by ICP4, and sites with Matrix Mean scores of 0.6 or greater were bound with high affinity.
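To make the notion of mismatches from preferred nucleotides concrete, the sketch below lists the positions at which a 13 base site departs from the consensus RTCGTCNSYNYSG given in the abstract (R is purine, Y is pyrimidine, S is C or G, N is any base). The helper name `mismatch_positions` and the example sites are hypothetical; the sketch only counts departures from the consensus and says nothing about their effect on binding.

```python
from typing import List

# Degenerate base codes used by the consensus sequence.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "S": "CG",    # C or G
    "N": "ACGT",  # any base
}

CONSENSUS = "RTCGTCNSYNYSG"


def mismatch_positions(site: str, consensus: str = CONSENSUS) -> List[int]:
    """Return the 1-based positions where the site does not fit the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return [
        i + 1
        for i, (base, code) in enumerate(zip(site.upper(), consensus))
        if base not in CODES[code]
    ]


# Invented example sites (assumption: not taken from Table 1A).
print(mismatch_positions("ATCGTCACTCTCG"))  # [] : fits at every position
print(mismatch_positions("TTCGACACACTCG"))  # [1, 5, 9]
```

A consensus comparison of this kind treats every mismatch equally, which is the limitation that motivates scoring against the matrix instead.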

(e) The Matrix Mean profile accurately predicts ICP4 sites

As a first test for whether our profile was also valid for predicting sites, we examined the sequence upstream from the HSV-1 thymidine kinase gene. Imbalzano et al. (1990) reported a band shift of the fragment spanning -2.~4 to -197 relative to the thymidine kinase transcription start. We scanned this sequence and found a single Matrix Mean value greater than 0.5 (0.526), as shown in Table 3A. This site showed a reasonable match to our ICP4 consensus sequence, but was not predicted by the Faber & Wilcox (1986) consensus sequence. Comparing the DMS interference pattern of the thymidine kinase site with other bona fide ICP4 sites confirmed that our profile predicted the correct site (Table 3C). An additional test was conducted on the IE4/5 upstream region. We show in Figure 2 that this DNA region is specifically band-shifted by ICP4. Analysis of the sequence by the Matrix Mean profile showed a total of four sites scoring at least 0.5 (0.50, 0.50, 0.51, 0.52). These sites were separated into subfragments and analyzed for specific band shifts with partially purified ICP4; the results are summarized in Figure 5(a). Comparing all of the sites in this region, fragment 4 contained the highest scoring site (0.52) and was shifted most efficiently, while fragment 2 (0.51 Matrix Mean value) was weakly shifted, and fragment 3 (0.50 value) showed no detectable ICP4 binding. A similar experiment was conducted with a sequence from the α-tif (Vmw65) gene (Michael et al., 1988). The analysis indicated three sites with high scores (as displayed in Fig. 5(b)); DNA fragments containing each of these sites individually were shifted by ICP4. Thus, by in vitro analysis, the α-tif promoter contains three high-affinity ICP4 sites; this may be relevant to α-tif regulation in vivo, as placement of multiple ICP4 binding sites in a promoter led to increased stimulation of gene expression by ICP4 (Tedder & Pizer, 1988).
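Scanning a sequence for candidate sites, as done for the thymidine kinase and IE4/5 upstream regions, can be sketched as a sliding window over the sequence. This is an illustration under stated assumptions: the Matrix Mean is taken to be the mean of the per-position matrix values, only the strand given is scanned, and `scan`, `matrix_mean` and the toy profile (invented proportions, 3 positions instead of 13) are hypothetical names and data, not the software used in this work.

```python
from typing import Dict, List, Tuple

Profile = List[Dict[str, float]]


def matrix_mean(site: str, profile: Profile) -> float:
    """Mean of the matrix values selected by the base at each position."""
    return sum(col.get(b, 0.0) for b, col in zip(site, profile)) / len(profile)


def scan(sequence: str, profile: Profile, threshold: float) -> List[Tuple[int, str, float]]:
    """Score every window of profile width; keep those at or above threshold."""
    sequence = sequence.upper()
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = matrix_mean(window, profile)
        if score >= threshold:
            hits.append((start, window, score))  # start is a 0-based offset
    return hits


# Toy profile with invented proportions (assumption: NOT the ICP4 matrix).
TOY_PROFILE: Profile = [
    {"A": 0.5, "C": 0.0, "G": 0.5, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
    {"A": 0.0, "C": 0.75, "G": 0.25, "T": 0.0},
]

for start, window, score in scan("GGATCAGTCC", TOY_PROFILE, threshold=0.7):
    print(start, window, round(score, 3))
# 2 ATC 0.75
# 6 GTC 0.75
```

Whether the opposite strand should also be scanned is left open here; doing so would amount to calling `scan` on the reverse complement as well.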
We also examined an oligonucleotide that contained a sequence identified as a high-affinity binding site for one or more yeast DNA binding protein/transcription factors (called GFI, ABF-1, HAFl, SI’F and TUF; Dorsman et al., 1990;

(f) Potential ICP4 sites in the HSV-1 genome

We used our profile to examine the distribution of potential ICP4 binding sites throughout the entire HSV-1 genome; a summary of sites scoring at least 0.55 and at least 0.6 is presented in Figure 7. These conservative thresholds were used because ICP4 is likely to be competed away from weak sites by proximal stronger sites (see Spitzner & Muller, 1989). A glance at the predicted binding sites revealed several interesting observations. From the HSV-1 gene map (McGeoch et al., 1988), we arbitrarily refer to regions that coded for proteins plus 300 bases 5' of the translation starts as genic regions, while the remaining sequences are non-genic; based upon this, it appears that approximately 90% of the genome is genic and 10% non-genic. Of the 556 sites with Matrix Mean scores greater than 0.55, 536 were in genic regions and 20 in non-genic regions; this is 1 site per 250 genic bases, and 1 site per 760 non-genic bases. Certain gene types appear to be associated with greater numbers of potential ICP4 sites. Specifically, IE3 (IE175,

Vmw175 and ICP4) contains an abundance of sites: eight sites displayed Matrix
regulatory proteins. II. The binding specificity of cyclic AMP receptor protein to recognition sites. J. Mol. Biol. 200, 709-723.

Berg, O. G. & von Hippel, P. H. (1988b). Selection of DNA binding sites by regulatory proteins. Trends Biochem. Sci. 13, 207-211.

Brunelle, A. & Schleif, R. F. (1987). Missing contact probing of DNA-protein interactions. Proc. Nat. Acad. Sci., U.S.A. 84, 6673-6676.

Buchman, A. R., Kimmerly, W. J., Rine, J. & Kornberg, R. D. (1988). Two DNA-binding factors recognize specific sequences at silencers, upstream activating sequences, autonomously replicating sequences and telomeres in Saccharomyces cerevisiae. Mol. Cell. Biol. 8, 210-225.

Chambers, A., Stanway, C., Tsang, J. S. H., Henry, Y., Kingsman, A. J. & Kingsman, S. M. (1990). ARS binding factor 1 binds adjacent to RAP1 at the UASs of the yeast glycolytic genes PGK and PYK1. Nucl. Acids Res. 18, 5393-5399.

Costa, R. H., Grayson, D. R., Xanthopoulos, K. G. & Darnell, J. E. (1988). A liver-specific DNA-binding protein recognizes multiple nucleotide sites in regulatory regions of transthyretin, antitrypsin, albumin and simian virus 40 genes. Proc. Nat. Acad. Sci., U.S.A. 85, 3840-3844.

DeLuca, N. A. & Schaffer, P. A. (1988). Physical and functional domains of the herpes simplex virus transcriptional regulatory protein ICP4. J. Virol. 62, 732-743.

DiDonato, J. A. (1990). The DNA binding and gene regulatory properties of the herpesvirus immediate early protein ICP4. Ph.D. thesis, Ohio State University.

DiDonato, J. A. & Muller, M. T. (1987). Analysis of the DNA sequence that confers negative regulation of immediate early gene 3. 12th International Herpes Virus Workshop, Philadelphia, PA, p. 233 (Abstract).

DiDonato, J. A. & Muller, M. T. (1989). DNA binding and gene regulation by the herpes simplex virus type 1 protein ICP4 and involvement of the TATA element. J. Virol. 63, 3737-3747.

We express our gratitude to Joseph Spitzner for applying his expertise in computer programming and data analysis to adapt the EDEN Genesys computer software system (TEAM Associates; Westerville, OH 43081, U.S.A.) for use in all of our statistical analyses. We also thank J. H. Spitzner for his valuable assistance in editing this work, and L. Pizer, S. Silverstein and R. Everett for discussions of data. This work was supported by Public Health Service grant AIldX3Nd. J.R.S. was supported by Public Health Services postdoctoral training grant CA09338-13.

References

Mean values greater than 0.6 (a 5-fold enrichment over the rest of the genome). Clusters of high Matrix Mean values were also noted in a group of glycoprotein genes (US4, US5, US6 and US7). If glycoprotein D (US6) is representative, it is likely that these genes would also be activated by ICP4 binding (Tedder et al., 1989; Tedder & Pizer, 1988). An abundance of sites was found within genes involved in DNA replication: the seven essential genes (Wu et al., 1988) contain a total of 1S sites scoring Matrix Means of at least 0.6, and except for UL8 and -9 all others contain two or more sites. In other gene regions, sites were noticeably infrequent (I’l,R6, UL24 to -26, UL48 to -45 and the region 3' of IE1); however, it should be stressed that, although these regions appear depleted of 0.6 sites, they still contain Matrix Mean scores of 0.55. The significance of ICP4 sites within genes is not known. It is likely that other DNA binding proteins and transcription factors will influence the radial access of ICP4 to a given site in the helix. Additionally, the number of ICP4 molecules, their DNA binding activity (which may be modified by post-translational modifications; Michael et al., 1988) and the number of viral genomes within a cell all act together to determine which particular viral sites are likely to be bound at a given time. In examining the distribution of high Matrix Mean values in the HSV-1 genome (Fig. 7), perhaps the most interesting observation is the high density of ICP4 sites predicted within the IE3 gene itself, an observation that implicates ICP4 binding in a regulatory feedback as follows. Very early after infection, the numerous binding sites may titrate available ICP4 molecules away from the binding site at the cap site that is required for repression of IE3 expression (Roberts et al., 1988; DeLuca & Schaffer, 1988; DiDonato, 1990); these more numerous sites may actually stimulate expression of the gene at this time. As the infection progresses, endogenous levels of ICP4 become elevated to the point where all binding sites are occupied and the IE3 promoter is repressed. These ideas are quite speculative, and many questions must be answered concerning both finer details of ICP4 binding sites and their locations before we can understand the role of ICP4 in HSV-1 infections. The ubiquity of ICP4 sites in the genome suggests that this essential protein is a key factor in the cascade of gene regulation.
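The genome-wide counts quoted in this section (536 genic and 20 non-genic sites, at 1 site per 250 and 1 site per 760 bases, respectively) can be checked against the stated split of roughly 90% genic and 10% non-genic sequence. The short calculation below is only an arithmetic consistency check on those quoted figures; the variable names are hypothetical.

```python
# Figures quoted in the text for sites above the genome-wide threshold.
genic_sites, nongenic_sites = 536, 20
bases_per_genic_site, bases_per_nongenic_site = 250, 760

# Sequence lengths implied by the quoted counts and densities.
genic_bases = genic_sites * bases_per_genic_site
nongenic_bases = nongenic_sites * bases_per_nongenic_site
total_bases = genic_bases + nongenic_bases

# Share of the implied total that is genic, and relative site density.
genic_fraction = genic_bases / total_bases
density_ratio = bases_per_nongenic_site / bases_per_genic_site

print(genic_bases, nongenic_bases, total_bases)  # 134000 15200 149200
print(round(genic_fraction, 2))                  # 0.9
print(round(density_ratio, 2))                   # 3.04
```

On these figures, predicted sites are about three times as frequent per base in genic as in non-genic sequence, and the implied genic share agrees with the approximate 90% stated above.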


Dixon, R. A. F. & Schaffer, P. A. (1980). Fine-structure mapping and functional analysis of temperature-sensitive mutants in the gene encoding the herpes simplex virus type 1 immediate early protein VP175. J. Virol. 36, 189-203.

Dorsman, J. C., van Heeswijk, W. C. & Grivell, L. A. (1990). Yeast general transcription factor GFI: sequence requirements for binding to DNA and evolutionary conservation. Nucl. Acids Res. 18, 2769-2776.

Dynan, W. S., Sazer, S., Tjian, R. & Schimke, R. T. (1986). Transcription factor Sp1 recognizes a DNA sequence in the mouse dihydrofolate reductase promoter. Nature (London), 319, 246-248.

Everett, R. D. (1987). A detailed mutational analysis of Vmw110, a trans-acting transcriptional activator encoded by herpes simplex virus type 1. EMBO J. 6, 2069-2076.

Faber, S. W. & Wilcox, K. W. (1986). Association of the herpes simplex virus regulatory protein ICP4 with specific nucleotide sequences in DNA. Nucl. Acids Res. 14, 6067-6083.

Faber, S. W. & Wilcox, K. W. (1988). Association of herpes simplex virus regulatory protein ICP4 with sequences spanning the ICP4 gene transcription initiation site. Nucl. Acids Res. 16, 555-570.

Gelman, I. H. & Silverstein, S. (1987a). Herpes simplex virus immediate-early promoters are responsive to virus and cell trans-acting factors. J. Virol. 61, 2286-2296.

Gelman, I. H. & Silverstein, S. (1987b). Dissection of immediate-early gene promoters from herpes simplex virus: sequences that respond to the virus transcriptional activators. J. Virol. 61, 3167-3172.

Hahn, S., Buratowski, S., Sharp, P. A. & Guarente, L. (1989). Yeast TATA-binding protein TFIID binds to TATA elements with both consensus and nonconsensus DNA sequences. Proc. Nat. Acad. Sci., U.S.A. 86, 5718-5722.

Halfter, H., Muller, U., Winnacker, E.-L. & Gallwitz, D. (1989). Isolation and DNA-binding characteristics of a protein involved in transcription activation of two divergently transcribed, essential yeast genes. EMBO J. 8, 3029-3037.

Hamil, K. G., Nam, H. G. & Fried, H. M. (1988). Constitutive transcription of yeast ribosomal protein gene TCM1 is promoted by uncommon cis- and trans-acting elements. Mol. Cell. Biol. 8, 4328-4341.

Imbalzano, A. N., Shepard, A. A. & DeLuca, N. A. (1990). Functional relevance of specific interactions between herpes simplex virus type 1 ICP4 and sequences from the promoter-regulatory domain of the viral thymidine kinase gene. J. Virol. 64, 2620-2631.

Jones, K. A. & Tjian, R. (1985). Sp1 binds to promoter sequences and activates herpes simplex virus "immediate-early" gene transcription in vitro. Nature (London), 317, 179-182.

Kattar-Cooley, P. & Wilcox, K. W. (1989). Characterization of the DNA-binding properties of herpes simplex virus regulatory protein ICP4. J. Virol. 63, 696-704.

Kristie, T. M. & Roizman, B. (1986a). Alpha 4, the major regulatory protein of herpes simplex virus type 1, is stably and specifically associated with promoter-regulatory domains of alpha genes and of selected other viral genes. Proc. Nat. Acad. Sci., U.S.A. 83, 3218-3222.

Kristie, T. M. & Roizman, B. (1986b). DNA-binding site of major regulatory protein alpha 4 specifically associated with promoter-regulatory domains of alpha genes of herpes simplex virus type 1. Proc. Nat. Acad. Sci., U.S.A. 83, 4700-4704.

Kuwabara, M. D. & Sigman, D. S. (1987). Footprinting DNA-protein complexes in situ following gel retardation assays using 1,10-phenanthroline-copper ion: Escherichia coli RNA polymerase-lac promoter complexes. Biochemistry, 26, 7234-7238.

Maxam, A. M. & Gilbert, W. (1980). Sequencing end-labelled DNA with base-specific chemical cleavages. Methods Enzymol. 65, 499-560.

McGeoch, D. J., Dalrymple, M. A., Davison, A. J., Dolan, A., Frame, M. C., McNab, D., Perry, L. J., Scott, J. E. & Taylor, P. (1988). The complete DNA sequence of the long unique region in the genome of herpes simplex virus type 1. J. Gen. Virol. 69, 1531-1574.

Michael, N. & Roizman, B. (1989). Binding of the herpes simplex virus regulatory protein to viral DNA. Proc. Nat. Acad. Sci., U.S.A. 86, 9808-9812.

Michael, N., Spector, D., Mavromara-Nazos, P., Kristie, T. M. & Roizman, B. (1988). The DNA-binding properties of the major regulatory protein alpha 4 of herpes simplex viruses. Science, 239, 1531-1534.

Muller, M. T. (1987). Binding of the herpes simplex virus immediate-early gene product ICP4 to its own transcription start site. J. Virol. 61, 858-865.

Mulligan, M. E., Hawley, D. K., Entriken, R. & McClure, W. R. (1984). Escherichia coli promoter sequences predict in vitro RNA polymerase selectivity. Nucl. Acids Res. 12, 789-800.

Paterson, T. & Everett, R. D. (1988). The regions of the herpes simplex virus type 1 immediate early protein Vmw175 required for site specific DNA binding closely correspond to those involved in transcriptional regulation. Nucl. Acids Res. 16, 11005-11025.

Pfeifer, K., Prezant, T. & Guarente, L. (1987). Yeast HAP1 activator binds to two upstream activation sites of different sequence. Cell, 49, 19-27.

Pizer, L. I., Tedder, D. G., Everett, R. D., Beard, P. & Wilcox, K. (1989). Nucleotide sequences of the ICP4 binding sites in the promoter region of glycoprotein genes. 14th International Herpes Virus Workshop, Nyborg Strand, Denmark, p. 50 (Abstract).

Preston, C. M. (1979). Abnormal properties of an immediate early polypeptide in cells infected with the herpes simplex virus type 1 mutant tsK. J. Virol. 32, 357-369.

Roberts, M. S., Boundy, A., O'Hare, P., Pizzorno, M. C., Ciufo, D. M. & Hayward, G. S. (1988). Direct correlation between a negative autoregulatory response element at the cap site of the herpes simplex virus type 1 IE175 (alpha 4) promoter and a specific binding site for the IE175 (ICP4) protein. J. Virol. 62, 4307-4320.

Shepard, A. A., Imbalzano, A. N. & DeLuca, N. A. (1989). Separation of primary structural components conferring autoregulation, transactivation, and DNA-binding properties to the herpes simplex virus transcriptional regulatory protein ICP4. J. Virol. 63, 3714-3728.

Siebenlist, U. & Gilbert, W. (1980). Contacts between Escherichia coli RNA polymerase and an early promoter of phage T7. Proc. Nat. Acad. Sci., U.S.A. 77, 122-126.

Spitzner, J. R. & Muller, M. T. (1988). A consensus sequence for cleavage by vertebrate DNA topoisomerase II. Nucl. Acids Res. 16, 5533-5556.

Spitzner, J. R. & Muller, M. T. (1989). Application of a degenerate consensus sequence to quantify recognition sites by vertebrate DNA topoisomerase II. J. Mol. Recog. 2, 63-74.

Stormo, G. (1988). Computer methods for analyzing sequence recognition of nucleic acids. Annu. Rev. Biophys. Biophys. Chem. 17, 241-263.

Stormo, G. (1990). Consensus matching in DNA. Methods Enzymol. 183, 211-221.

Stormo, G. & Hartzell, G. (1989). Identifying protein-binding sites from unaligned DNA fragments. Proc. Nat. Acad. Sci., U.S.A. 86, 1183-1187.

Su, T.-Z. & El-Gewely, M. R. (1988). A multisite-directed mutagenesis using T7 DNA polymerase: application for reconstructing a mammalian gene. Gene, 69, 81-89.

Tedder, D. G. & Pizer, L. I. (1988). Role for DNA-protein interaction in activation of the herpes simplex virus glycoprotein D gene. J. Virol. 62, 4661-4672.

Tedder, D. G., Everett, R. D., Wilcox, K. W., Beard, P. & Pizer, L. I. (1989). ICP4-binding sites in the promoter and coding regions of the herpes simplex virus gD gene contribute to activation of in vitro transcription by ICP4. J. Virol. 63, 2510-2520.

Vignais, M., Woudt, L. P., Wassenaar, G. M., Mager, W. H., Sentenac, A. & Planta, R. J. (1987). Specific binding of TUF factor to upstream activation sites of yeast ribosomal protein genes. EMBO J. 6, 1451-1457.

Wu, C. A., Nelson, N. J., McGeoch, D. J. & Challberg, M. D. (1988). Identification of herpes simplex virus type 1 genes required for origin-dependent DNA synthesis. J. Virol. 62, 435-443.

Edited by F. E. Cohen