HANS-PETER KLENK AND W. FORD DOOLITTLE
EVOLUTION
Archaea and eukaryotes versus bacteria?

The recent discovery of homologs of the eukaryotic transcription factor TATA-binding protein in archaea has been taken as support for the view that archaea and eukaryotes have a close phylogenetic relationship.

In their molecular features, archaea [1] may be as distinct from bacteria as bacteria are from eukaryotes. For insights into the relationships between these three ancient cellular lineages - which have been called 'domains' - evolutionary biologists find it illuminating to study the independent refinement, in each of the three domains, of molecular components and processes assumed to have been present in their last common ancestor 3-4 billion years ago. The components of the basic transcription machinery - DNA-dependent RNA polymerases, promoters and transcription factors - are promising candidates for such study, because transcription must have been in place in some form before the separation of the domains.
Eukaryotes and bacteria follow different strategies for the initiation of transcription. In bacteria, RNA polymerase finds promoters with the help of specific initiation factors (σ-factors) that are first bound to the polymerase and then mediate the binding of the polymerase to the DNA. In eukaryotes, general transcription initiation factors first form complexes with promoters, and then these complexes are recognized by the RNA polymerases in the formation of preinitiation complexes. The three classes of functionally specialized nuclear RNA polymerases - I, II and III - employ different sets of transcription initiation factors, which serve to recruit RNA polymerases to their correct transcription start sites. The TATA-binding protein (TBP), originally identified as a component of the multisubunit transcription factor IID (TFIID), is the only one of these factors that has been shown to be required by all three classes of RNA polymerase, regardless of whether the recognized promoter does or does not contain a TATA box (for review, see [2]). Until recently we knew nearly nothing about transcription in archaea. That is beginning to change, however: a number of archaeal transcription factors have recently been identified (Fig. 1) and are shedding light on the vexed question of the relationships between archaea, bacteria and eukaryotes.

The first component of the archaeal transcription machinery to become available for comparisons with bacterial and eukaryotic counterparts was the DNA-dependent RNA polymerase. In 1983, Huet et al. [3] reported that the complexity of archaeal polymerases is similar to that of the eukaryotic polymerases, which are made up of 10 to 15 subunits. With only four different subunits, bacterial enzymes are far less complex. In addition, a high degree of similarity between the large subunits of the archaeal and eukaryotic polymerases was demonstrated, first by semiquantitative immunoblotting and later by comparative sequence analysis. Furthermore, several of the small RNA polymerase subunits found in archaeal and eukaryotic polymerases have also been found to be homologous, but have no counterparts in bacterial polymerases [4,5]. The similarity between the archaeal and eukaryotic transcription machinery was further shown by site-specific mutational analyses of the ribosomal RNA promoter of Sulfolobus, which established an essential role for a TATA-box sequence - resembling the motif known from eukaryotic polymerase II and III promoters - in transcriptional efficiency and start-site selection [6].

Archaeal transcription factors appeared on the stage in 1992 when Ouzounis and Sander [7] identified an open reading frame encoding a homolog of the eukaryotic transcription factor TFIIB in Pyrococcus woesei. In eukaryotes, TFIIB associates with TFIID, the only one of the general eukaryotic factors with an intrinsic capability for site-specific DNA-binding (to the TATA box via its TBP subunit), and mediates the contact to RNA polymerase II with subsequent initiation of transcription. Together with the similar TATA-box-like promoter motifs and RNA polymerases in archaea and eukaryotes, the discovery of an archaeal TFIIB homolog prompted Ouzounis and Sander to predict that TFIID should also be present in archaea.
Fig. 1. Components of the archaeal transcription machinery and dates of their discovery. TAFs, TBP-associated factors; aTFA and aTFB, archaeal transcription factors A and B [12].
© Current Biology 1994, Vol 4 No 10
DISPATCH

The discovery of this missing link in the archaeal transcription machinery has now been reported. Recent papers by Marsh et al. [8] and Rowlands et al. [9] describe the identification and analysis of putative homologs of eukaryotic TBPs in two closely related extreme thermophiles, Thermococcus celer and P. woesei, respectively. Whereas Rowlands et al. fished out the gene encoding the missing TBP of P. woesei by using the polymerase chain reaction with degenerate oligonucleotide primers derived from the highly conserved carboxy-terminal sequences of eukaryotic TBPs, Marsh et al. found the T. celer counterpart while exploring random genomic DNA in the course of a genome project. Both archaeal TBP homologs have molecular weights of about 21 kD, corresponding to lengths of 189 and 191 amino acids, respectively. Their sequences show 82% identity with each other, and 29-41% identity with the essential, conserved 180-residue domain that occurs near the carboxyl terminus of all eukaryotic TBPs [8,9]. The amino-terminal domain of the eukaryotic TBPs, which is neither conserved nor functionally essential, is missing from both archaeal homologs. Both archaeal TBPs share with their eukaryotic counterparts an imperfect sequence repeat of about 90 residues, of which 35 have been shown, in yeast, to contact DNA; of the two copies of the repeat sequence, 25 and 22 residues are identical in the Pyrococcus [9] and Thermococcus [8] TBPs, respectively.

Rowlands et al. focused on a functional analysis of the Pyrococcus TBP. By probing with monoclonal antibodies against Pyrococcus TBP, they were able to show that the TBP gene is indeed expressed in vivo. Mobility-shift assays - in which the binding of a protein to a DNA fragment is detected by a change in the latter's electrophoretic mobility - showed that Pyrococcus TBP binds specifically to oligonucleotides containing the TATA-box motif and to TFIIB from Pyrococcus, forming a heat-stable TFIIB-TBP-DNA complex.
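The identity figures quoted above are position-by-position comparisons of aligned sequences. The sketch below illustrates one common way to compute such a figure. It is not the procedure used in [8] or [9]: the function name, the choice to skip gapped columns, and the toy sequences are all assumptions made for illustration, and other conventions for the denominator would give slightly different percentages.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues between two pre-aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are
    excluded from both the match count and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Hypothetical toy sequences, NOT real TBP sequences.
toy_a = "MKTAYIAKQR"
toy_b = "MKSAYLAKQR"
print(percent_identity(toy_a, toy_b))  # 8 of 10 columns match
```

Applied to two real aligned TBP sequences, the same count over the shared conserved domain would yield a figure comparable in kind to the 82% and 29-41% values reported above.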
Pyrococcus TBP also binds tightly and specifically to certain mammalian transcriptional regulators. Rowlands et al. predicted that archaea will also turn out to have homologs of other eukaryotic transcriptional components, such as TBP-associated factors (TAFs), TFIIA, TFIIE and TFIIF (Fig. 1). One putative archaeal homolog of a eukaryotic transcription factor (TFIIS) was in fact reported last year by Langer and Zillig [10], from studies with Sulfolobus acidocaldarius. But when Kaine et al. [11] analyzed the corresponding gene, discovered recently in a random survey of the T. celer genome, they found an even higher similarity of the archaeal sequences with those of small subunits of eukaryotic polymerases I and II. Kaine et al. concluded that the ancestral gene encoded a TFIIS-like transcription factor, and must have duplicated in eukaryotes, generating genes for the eukaryotic RNA polymerase subunit, which has diverged functionally from the ancestral protein, and TFIIS, which has retained the original function. But it seems likely that further archaeal homologs of eukaryotic transcription factors will soon be discovered. Functional evidence, at least, for the existence of two more archaeal transcription factors was reported recently by Thomm et al. [12].

Fig. 2. Possible branching topologies for the universal phylogenetic tree. Blue and red, two character states. B, bacteria; A, archaea; E, eukaryotes.

Marsh et al. [8] focused on a phylogenetic analysis of Thermococcus TBP. They inferred phylogenetic trees from the highly conserved carboxy-terminal domains of the eukaryotic TBPs and Thermococcus TBP, using distance, parsimony and maximum-likelihood analyses of the amino-acid sequences. In spite of the good agreement between the results determined with the different methods, the relatively small size of the TBPs gave poorly resolved trees and a barely convincing tree topology. In a comparison of the TBP repeat sequence, they found the highest fraction of identical amino acids between the two copies of the repeat in the Thermococcus TBP, concluding that this reflects greater conservation of the archaeal sequence. The inferred phylogenies of the TBP repeat sequence support the view that there was a single duplication event before the divergence of the archaea and the eukaryotes, and a lower average rate of sequence change in the archaeal lineage. The relatively low rate of sequence change in the archaea, and the close phylogenetic relationship between the archaea and the eukaryotes, led Marsh et al. to suggest that the archaeal genome might be a window into the primordial eukaryotic make-up [8]. If they are right, upcoming archaeal genome projects will be helpful for the investigation of (our) eukaryotic history.

It is tempting to conclude [13] that these demonstrations of profound similarity in the transcriptional machinery in archaea and eukaryotes help establish a root for the universal tree, with bacteria branching off first and archaea and eukaryotes later, as sister lineages (tree 1 in Fig. 2). Such a rooting has in fact been favored by sequence analyses of duplicated elongation factor and ATPase genes [14]. But the data cannot, with strictest logic, be taken as any real support for this root: having complex factor-dependent transcription machinery could as well be a primitive feature (characteristic of the universal common ancestor), with bacteria having since simplified the whole process as part of a general genomic 'streamlining' (any of trees 2a, 2b or 2c in Fig. 2). By the same token, even though we may fully accept that bacteria form the deepest branch - that is, accept that either tree 1 or tree 2a in Fig. 2 is correct - on the basis of the elongation factor and ATPase data, we still cannot conclude that the bacterial transcriptional apparatus is the most primitive.

Nevertheless, there is one extraordinarily important and largely unexpected conclusion that is allowed by even the strictest cladistic logic. This is that complex eukaryote-style transcription, with multisubunit polymerases, prior binding of transcription factors and perhaps much else, did not first evolve in eukaryotes. Much of this complexity was already in place in the prokaryotic ancestors of the first eukaryotic cell. Further detailed study of archaeal genomes should tell us in what other ways the first eukaryotes looked like their last prokaryotic ancestors.
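The cladistic argument above can be checked by counting character-state changes. The following is a minimal sketch, not an analysis from any of the cited papers. It assumes a single two-state character ("simple" versus "complex" transcription machinery), unit cost per change, and three rooted topologies whose labels are our reading of Fig. 2, which is not reproduced here. All names in the code are hypothetical.

```python
# One two-state character scored for the three domains (an assumption
# that compresses the whole transcription apparatus into one trait).
STATES = ("simple", "complex")
LEAF_STATE = {"B": "simple", "A": "complex", "E": "complex"}

# Rooted topologies as nested tuples; the labels are illustrative.
TREES = {
    "bacteria first": ("B", ("A", "E")),
    "archaea first": ("A", ("B", "E")),
    "eukaryotes first": ("E", ("A", "B")),
}


def min_changes(tree, state):
    """Fewest state changes below a node, given that node has `state`."""
    if isinstance(tree, str):  # a leaf: its observed state is fixed
        return 0 if LEAF_STATE[tree] == state else float("inf")
    total = 0
    for child in tree:
        # Choose the child's state that minimises changes in its subtree
        # plus one change on the connecting branch if the states differ.
        total += min(min_changes(child, s) + (s != state) for s in STATES)
    return total


for name, tree in TREES.items():
    costs = {root: min_changes(tree, root) for root in STATES}
    print(name, costs)
```

Under these assumptions, the bacteria-first topology needs one change whether the common ancestor is taken to be simple or complex, which is why the character cannot place the root or identify the primitive state. A complex ancestor also costs only one change (a loss in bacteria) on each of the other two topologies.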
Acknowledgements: We ask for the understanding of all authors who could not be mentioned in order to limit the number of citations. Hans-Peter Klenk has been supported by the Canadian Genome Analysis and Technology Program.
References
1. Woese CR, Kandler O, Wheelis ML: Towards a natural system of organisms: proposal for the domains Archaea, Bacteria and Eucarya. Proc Natl Acad Sci USA 1990, 87:4576-4579.
2. White RJ, Jackson SP: The TATA-binding protein: a central role in transcription by RNA polymerases I, II and III. Trends Genet 1992, 8:284-288.
3. Huet J, Schnabel R, Sentenac A, Zillig W: Archaebacteria and eukaryotes possess DNA-dependent RNA polymerases of a common type. EMBO J 1983, 2:1291-1294.
4. Klenk H-P, Palm P, Lottspeich F, Zillig W: Component H of DNA-dependent RNA polymerase of Archaea is homologous to a subunit shared by the three eucaryal nuclear RNA polymerases. Proc Natl Acad Sci USA 1992, 89:407-410.
5. Lanzendorfer M, Langer D, Hain J, Klenk H-P, Holz I, Arnold-Ammer I, Zillig W: Structure and function of the DNA-dependent RNA polymerase of Sulfolobus. Systematic Appl Microbiol 1994, 16:656-664.
6. Reiter W-D, Hudepohl U, Zillig W: Mutational analysis of an archaebacterial promoter: essential role of a TATA box for transcription efficiency and start-site selection in vivo. Proc Natl Acad Sci USA 1990, 87:9509-9513.
7. Ouzounis C, Sander C: TFIIB, an evolutionary link between the transcription machineries of archaebacteria and eukaryotes. Cell 1992, 71:189-190.
8. Marsh TL, Reich CI, Whitelock RB, Olsen GJ: Transcription factor IID in the Archaea: Sequences in the Thermococcus celer genome would encode a product closely related to the TATA-binding protein in eukaryotes. Proc Natl Acad Sci USA 1994, 91:4180-4184.
9. Rowlands T, Baumann P, Jackson SP: The TATA-binding protein: a general transcription factor in eukaryotes and archaebacteria. Science 1994, 264:1326-1329.
10. Langer D, Zillig W: Putative tfIIs gene of Sulfolobus acidocaldarius encoding an archaeal transcription elongation factor is situated directly downstream of the gene for a small subunit of DNA-dependent RNA polymerase. Nucleic Acids Res 1993, 21:2251.
11. Kaine BP, Mehr IJ, Woese CR: The sequence, and its evolutionary implications, of a Thermococcus celer protein associated with transcription. Proc Natl Acad Sci USA 1994, 91:3854-3856.
12. Thomm M, Hausner W, Hethke C: Transcription factors and termination of transcription in Methanococcus. Systematic Appl Microbiol 1994, 16:648-655.
13. Barinaga M: Archaea and eukarya grow closer. Science 1994, 264:1251.
14. Iwabe N, Kuma K, Hasegawa M, Osawa S, Miyata T: Evolutionary relationship of archaebacteria, eubacteria and eucaryota inferred from trees of duplicated genes. Proc Natl Acad Sci USA 1989, 86:9355-9359.
Hans-Peter Klenk and W. Ford Doolittle, Canadian Institute of Advanced Research and Department of Biochemistry, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada.