Gene, 134 (1993) 251-256
© 1993 Elsevier Science Publishers B.V. All rights reserved. 0378-1119/93/$06.00
GENE 07356

Genomic structure of the Xenopus laevis liver transcription factor LFB1

(Exon-intron structure; HNF1; homeobox protein; POU domain; evolutionary conservation)

Dirk Zapp, Sigrid Bartkowski*, Christiane Zoidl, Ludger Klein-Hitpass and Gerhart U. Ryffel

Institut für Zellbiologie (Tumorforschung), Universitätsklinikum Essen, D-45122 Essen, Germany

Received by H.G. Zachau: 29 March 1993; Revised/Accepted: 24 May/25 May 1993; Received at publishers: 21 June 1993
SUMMARY
Liver factor B1 [LFB1, also called hepatocyte nuclear factor 1 (HNF1)] is a tissue-specific vertebrate transcription factor that is present in the liver, intestine, stomach and kidney. The LFB1 protein contains an unusual homeobox that is characterized by an insertion of 21 amino acids (aa) not found in any other homeodomain protein. We have isolated and characterized the genomic sequences encoding the LFB1 of Xenopus laevis. By comparing the genomic sequences with the cDNA clones, we could identify nine exons. In general, the position of the introns is identical to the one previously found in the rat. However, the C-terminal activation domain of LFB1 contains, in each species, an exon that is split in two in the other species. The homeobox of the X. laevis LFB1 contains an intron at exactly the position where the 21 aa typical for LFB1 are inserted. This is in agreement with the structure found in the rat gene and supports the notion that the LFB1 homeobox evolved separately from the other genes encoding homeodomain proteins.
Correspondence to: Dr. G.U. Ryffel, Institut für Zellbiologie (Tumorforschung), Universitätsklinikum Essen, Hufelandstrasse 55, D-45122 Essen, Germany. Tel. (49-201) 723-3110; Fax (49-201) 723-5905.
*Present address: Institut für Humangenetik, Medizinische Universität zu Lübeck, Ratzeburger Allee 160, W-2400 Lübeck, Germany. Tel. (49-451) 5002627.
Abbreviations: aa, amino acid(s); bp, base pair(s); HNF1, hepatocyte nuclear factor 1 (the same as LFB1); kb, kilobase or 1000 bp; LFB1, liver factor B1 (the same as HNF1); LFB1, gene encoding LFB1; nt, nucleotide(s); ORF, open reading frame; ori, origin of DNA replication; POU, family of transcription factors including Pit-1, octamer factor and the factor of the unc-86 gene; X., Xenopus.

INTRODUCTION
LFB1 (also called HNF1 or HNF1α) was initially cloned as a cDNA encoding a liver-specific transcription factor recognizing a defined promoter element present in several genes specifically expressed in hepatocytes (Frain et al., 1989; Baumhueter et al., 1990; Chouard et al., 1990). However, LFB1 can also be found in other tissues including kidney, intestine and stomach (Blumenfeld et al., 1991; De Simone et al., 1991; Bartkowski et al., 1993), but the target genes in these non-hepatic cell types are not yet known. Analysis of the aa sequence deduced from the cloned cDNA revealed that LFB1 has a structure that has some features of homeobox proteins but differs from all other homeobox proteins by an additional 21-aa loop between the predicted helices II and III of the homeobox; moreover, the DNA-binding domain of LFB1 contains a second essential component with some weak similarity to the POU-A domain, and the C-terminal part of LFB1 has an activation domain rich in Ser and Thr (Frain et al., 1989; Baumhueter et al., 1990; Chouard et al., 1990). A unique feature of LFB1 is a short dimerization domain at the N-terminal end that allows homodimerization as well as heterodimerization with the closely related transcription factor LFB3 (also called vHNF1 or HNF1β), which has essentially the same structural organization as LFB1 (De Simone et al., 1991; Mendel et al., 1991; Rey-Campos et al., 1991). All these typical features of LFB1 have been conserved during vertebrate evolution, since the structure of LFB1 from Xenopus, an amphibian species, reflects all characteristics found in LFB1 of mammals (Bartkowski et al., 1993). The fact that we found two distinct LFB1 cDNAs, a and b, in Xenopus with about 10% sequence divergence (Bartkowski et al., 1993) is not too surprising, as the genome of X. laevis has been duplicated during evolution (Kobel and Du Pasquier, 1986). Our recent data suggest that LFB1 is not only important in the adult for tissue-specific gene expression but may even play a major role during early embryogenesis
as the LFB1 transcripts can be detected in X. laevis shortly after midblastula transition before organogenesis (Bartkowski et al., 1993).

To analyze the regulatory potential of the LFB1 gene in Xenopus it is a prerequisite to isolate and characterize genomic clones encoding LFB1. Furthermore, the genomic organization, i.e., the exon-intron structure, allows the evolutionary relationship between LFB1 and the homeodomain proteins to be deduced. In this report we analyze the exon-intron structure of the Xenopus LFB1, compare it with the reported mammalian LFB1 genes (Bach et al., 1992) and emphasize the differences between LFB1 and the other homeodomain proteins.

EXPERIMENTAL AND DISCUSSION

(a) Genomic LFB1 sequences of Xenopus
To isolate the genomic LFB1 sequences of Xenopus, we used the cDNA clone a1 that contains the entire ORF of the Xenopus LFB1 protein a and some 3' and 5' flanking sequences (see Fig. 1B). As the two groups of LFB1 cDNAs (a and b) are about 90% identical in their nt sequence (Bartkowski et al., 1993), the a1 cDNA should allow the isolation of both genes. Using three different Xenopus genomic libraries we isolated ten independent genomic clones. Five clones shown in Fig. 1A represent the LFB1b gene of Xenopus and these sequences were used to define the structure of the b genes as described in this paper. Based on restriction analysis we determined the overlap of these clones (Fig. 1). Hybridization of Southern blots with the LFB1-specific cDNA probes A, B and C, representing the 5', the middle and the 3' part of the cDNA, respectively (Fig. 1B), revealed the hybridizing EcoRI fragments as given in Fig. 1A. Quite surprisingly, the genomic sequences between the fragments hybridizing with the 5' fragment (A) and the middle fragment (B) did not react (Fig. 1A), suggesting the presence of a large intron. Further subcloning of all these fragments hybridizing with the cDNA was used to locate exon sequences in short restriction fragments, and these fragments were sequenced for proper identification of the exons. Sequences containing parts of the nt sequence in the cDNA, i.e., exons, are given in capitals in Fig. 2.

Based on sequencing we identified nine exons (Fig. 1A). Clearly exon 1 is separated by approximately 15 kb from

[Fig. 1 graphic: (A) overlapping genomic clones XLF1b, XLF2b, XLF3b, XLF4b and XLF5b with EcoRI sites (E) and hybridizing fragments A, B and C; (B) XLFB1 a1 cDNA with the domains Dim., POU-A, Homeo. and Activation Domain and the probes A, B and C; scale bar 100 bp.]
Fig. 1. Structure of the LFB1b gene of Xenopus. (A): The overlap of the genomic clones isolated by standard techniques (Sambrook et al., 1989) is illustrated by the EcoRI sites (E) within each clone. The EcoRI fragments hybridizing with the cDNA fragments A, B and C in part B are indicated by a bar. The position of the exons 1-9 (filled squares) is given. The open square of exon 9 indicates that the end of this exon is not known. In fact we cannot exclude an intron downstream of the stop codon. The EcoRI fragment smaller in clone XLF4b compared to XLF1b and XLF3b is marked by an arrow. (B): Schematic drawing of the protein domains encoded by the XLFB1a1 cDNA (Bartkowski et al., 1993) with the dimerization (Dim.), POU-A specific (POU-A), homeobox (Homeo.) and activation domain. The cDNA fragments A, B and C representing the nt 1-500, 540-1063 and 1058-2172, respectively, of the a1 LFB1 cDNA (Bartkowski et al., 1993) used to map the genomic clones in A are marked by arrows.
253 EXONl -1158 gatcaggaaattgttaggaagcctatgggggccttccccataggctaacattggcctcggtaggttttaggtggcgaa ctagggggtcgaagaattttttaaagagacagtacttcgactatcgaatggtcgaatgatttttagttcgaatcgttcgattcgaaggtcgtagtcaaaggtcgaagtagcccattcaat ~tcgaagtagcatattcgaccattcgaaattcaaactttttttcctctattccttcactcgaactaagtaaatgggcccccaagacaccattgttattttctcagtgacttccattctt aacaaaaagatgtcactggggaaatgacattaaattcccaacttaaatgaattttacccttttagtggtttactaattaaatgctaagcaaccagctgatcacataagaattacagttgt atcatatttttaattaaatcatcagatagaaagtaaccataaatacaccaatatttaaaaatatttatatatacaggctttattaaatgctactactaccccccccccccccatgccttc ctatcagtatctctcttcccctctctggtttatagttctgaagttatttctttgttgataaatgtctctactattaggttactgtgtatttGTTTTGGTATT~CAG~TTCTT~TGTA AATTCATTCAGGTTTCAGCCACTCACAGCTATTATTA~TCATC~T~C~TT~CCCTTTACCTA~TTGTGTCACTTTCACCTTCTCATTCTCTTACTTTTACATTCTTCCTTGATAT TTTGCTTTTTCAACTTTTGTTTCTTTCTCTCTTCTACCCCTCCTCATATTCCTCT~ACTCCCCCCTCTCT~CTCAT~ACTTTGTG~TCC~GTTCAGT~CTT~~~ ACAGGGATAAAGATGAACCTTGGAAGATTTACTCT~TCTGATGT~CAGAGAGTGAC~GGGTCCCTTATCTATGTCTCAGAG~~CTGTCC~G~GTGACCACTT~TGGTTGTG GCTGCACAGTGTGTTTTTTTGGGGGGGAGGAGGAAACAGAAGG AAAAAAGCATTGXTGATGGTGACAATTACGCATATCCCAAT ATOGCGTCTCAGCTTAGTTACCTGCAACMGAGCTTTTACGTCTG GTCCCGTTGGATGATATTAGAAACCTGGATGGATGAG~AGAC~CTGTGT~~CTACCT~T~GTTAG~GA~CTCAGATGTCAG~GATG~GTTCTGATGAT~TG~ACTTTACA CCACCCATTATGAAAGAGTTAGAAAGGCTGAGCCCCCG~GAG~T~TCATCAG~G~CGTGGTGG~CGTCT~T~~aaga -
1
EXON 2 ttaaccccagagcaaaaaaaaatacagccacatgcaaaatacagccacataacttcaatgtttcaccagttgggtaaaatggttgtttatttccac~ GGAGGACCCATGGCATGTAGCCAAACTTGTAAAGTCATATATCT~A~A~AC~CATCCCACA~GGG~GT~TTGACACCACA~TCTC~TCAGTCCCATCTTTCACA~ACCTC~ CAAGGGTACACCAATGAAGACTCAAAAGAGAGA~A~CCTGTAT~CT~TATGTGG~~~AGAGAGAGATT~CA~gtgggtatgagaaatttgatagaattC
321
EXON 3 gagagactgttcatca ccagatctct gcatacccaaaattaaaaatactgataacactcgcagggcaatcaaaggtatcaattagtccaggcagactatatataaatatgtatatgtgtgtgtgttttttacc~ AGTTCACACACGCGGGACACAGTATGATAACAGATGACATTCCTTTTCC AGGCTTATGAGCGACAGAAGAATCCAAGCAAGGAGGA~GAGAG~ACTAGTAG~G~T~~CA~taacaccacaaatatagagaacaaatggcaaacattgggaggcacatttatc aaaggtcaaa ttttgaattc
521
EXON
4 and
5 gatcatggagctacaaaatacctgggctgtaggtgtcacaaggtcgaagttaaagacata tgttcccaataattctacatgcacaagacatgaaagaagtaccaaagatagtagaataacagcacttgtatagaaaaccttgcaagtacacttttttcacattgtctgtctctgattc~ GGCAGAATGTTTGCAGAGAGGGGTCTCTCACCATCACAG~TCAG~TTTG~TCT~CCTGGTGACAG~GTACGTGTCTAT~TTGGTTT~C~TAG~~~GGAGGAG~ATTTCG ACACRAGTTAGCRATGGACACATACAATGGGCAGCAGAGAGTTCA~AC~CCTcTTTC~cCCATGATCTTCCTCATG~~CTCCT~aagcagattttgcaacaaatatttaaaaa attctactatgtttcctctcactagcacaaacacatgacttaattctactatttcaaacc~ GATTCAGATACACCCAAGACTCTTCCAC~ACA~AGT~T~TAT~G~CAGTCAGAGTACTCTCTCTCCTTCA~CCTGGA~CCA~CACATCCTGATG~CAGTGACA~~ TGGTCCCAGTTTCAGGTGGCTC_ACTACCACCAGTCAGTACACT~CT~TCT~ACAGTTTGGATCATAGTC~CACACACTTG~CAGACACAG~CCT~TTATG~TTCACTACCTA GTGTCATGACAATTGGCACTGATTCAGCACTA~A~CG~ATTCA~~TCCAG~TCCTCCAC~TGGTGATT~taactataatctttaaaatggcagaaatcaagcacatgtggctc agtcgttattactcccacctgtaccaaaacgagagccg~~ta~a~ttaatgagaa~agtttagttaagagttggagagg~ttaagca~t~tgggcacaatacttgtgcactag~attt atatgagcgtcaccttatattatcaccagtgtacatctaaaggtgttttgcttacttctatctatgcctc
708
917
EXON 6 aaaggtctcatctattttagagacagcagtgtttctatgtacccaaattattgactttgatatacgtacaattgtttttc~ GTCTTGCCTCACAAACACAGAGTGTACCAGTCATT~CAGTGTT~TA~A~TTGACCAC~T~AGTCTGTTCAG~rTTCTCA~A~T~ACCCTTCCCATCA~A~C~TTGTAC AACAAGTGCAGAGTCACATGACAGAGTCCCTTTATG~CAC~T~CCA~TACAGTCTCCTCAT~taaagctcataatcctaatgaaaatacatttaaaaaatacaggtatggga cctgctttccagaatgcttgggacctagggttacctggataactgatctttctgtaatttggatcttcataccttaagtatactagaaaatcatgtaaacattaaataaaccaatgggct aattttgtttccaatacggattatttatattttagttgggatcaactacaaggaattattacagaaaaaaaaacatttttaaaaatttgtattatttggataaaatggagtctatgggag acgccgttccataattttggagttttctggataacaggattacacataatggatcc
1232
EXON 7 ggatccaatacctatacatgggggggcttacccctaaaaatactaactatttattttttacctac~ CTCTTTACAGCCATAAGCCAGAGGTTGCCCAATATACATCT~A~TTCTTCCCCCAGACCATGGTCATCACAGATAC~~~TCT~G~CCCTGAC~GTCT~CCCCTA~~C AGgtaagactcaatgacacggggacctgtttagtttcagcccttaaaatgaatgactcttgtt
1421
EXON 8 taaattcaagccagtatatatactataaggcttaaatacagcatacttatttttttttttttacacggt~ GTTGTCTCCCATCACCCTACTGCACATGGRGATTCCCCAGATTCCCCAGGATCCCA~TTCAT~TCAGGATTCCA~ATATT~ACCTTCATCCTAGTCATCG~TGTCCCCCATACCCACT~taag aagtattaga gagaataagaaagaacttgagttgtttatatgtgaagtgaagccatctggttcagactgaaaactaaaactgaggaaaaaaa
1543
EXON 9
1658
aagcttatgactgccttccaagatgataaaagagttatatagatgtctttgcaggaaaacataatagtacagtacattcagggact aaaaacatccataatataaaagaatgcaaatatatagatcagttcccactggagaaactctgactgatcagttgtaaggaatgaagtatgatatggattatctttttatttatccaac~ TGTCCTCGGCCAGCTTGATACACTATCACAATTCAAGTTC~GTTCTCCAGAG~CCACAGTCACCTGTTATCTCCTTCACAC~CACCATTGACAGTTTCATATCCACCC~TG~ATCTTCCT CACAGTGAGACAGAATGTATTGGACRAAGCAAAACTGTGATCA~T~CACGTATATGTCTAGGGTTGAGTGGTGGATTT~TT~T~TGTCTTCTCTCCCTT~TGACCTTGG~G AATAAGTTGGGTATCAATTAGTACCACT~TTTACTTTTCCCTGATC~CACA~~CCT~CTC~~ACAGGACAC~CTTTT~CATGTGT~GG~~GATACACT~CAGATG TGAACTGGATAACCGCTGATATAATTTACTCACAAAATAA
Fig. 2. Sequence of the Xenopus LFB1b gene exons with their flanking regions. Subfragments containing exons of the genomic clones listed in Fig. 1 were subcloned and sequenced using a Pharmacia sequencing kit. Exon sequences are given in capital letters, whereas the 5' flanking region and the intron sequences are represented in lower-case letters. The numbering refers exclusively to the exon sequence given in capital letters, starting with the start codon ATG as No. 1. Sequences representing dispersed middle-repetitive DNA are underlined. The start codon, the stop codon and the dinucleotides of the intron-exon boundaries are indicated in bold face. The potential in-frame stop codons in the introns are underlined. Those nt in the exons that differ from the cDNA are underlined. A detailed characterization of the promoter structure will be described elsewhere (D.Z., S.B., B. Holewa, C.Z., L.K.-H. and G.U.R., in preparation). The sequences have been deposited in the EMBL data bank and can be retrieved by the following accession numbers: L09605 (promoter and exon 1), X72983 (exon 2), X72984 (exon 3), X72985 (exons 4 and 5), X72986 (exon 6), X72987 (exon 7), X72988 (exon 8) and X72989 (exon 9).
exon 2, whereas exons 4-7 are very closely spaced (Fig. 1A). As seen in Fig. 2, all introns contain the expected dinucleotides GT and AG as characteristics for splice donor and acceptor sites at their 5' and 3' boundaries, respectively.

Comparing the aa-coding sequences of the genomic clones given in Fig. 2 with the cDNA sequences of the b1 and b2 clones, which both contain potential intron sequences (Bartkowski et al., 1993), we found two single-nt changes in exon 5 (underlined in Fig. 2), with only one aa exchange (Ser→Cys). In addition, in the sequence of exon 9 following the stop codon and in the intron sequence between exons 5 and 6 that is present in one of our cDNA clones (Bartkowski et al., 1993), a total of nine differences between the cDNA and the genomic sequence were detected (compare the data bank for the exact type of variation). These heterogeneities are expected, since the cDNA library and genomic library are made from distinct individuals. Further heterogeneity is seen in the isolated genomic clones, as an EcoRI fragment in clone XLF4b is smaller than the corresponding fragment in the clones XLF1b and XLF3b (Fig. 1A). We assume that this difference reflects the presence of two distinct alleles of the b gene. This assumption is consistent with some sequence variation observed 5' of exon 1 between the clones XLF4b and XLF1b (data not shown).

A data bank search revealed in the 5' flanking region from about -1140 to -880 extensive similarities (70-85% identity) to a dispersed repetitive DNA element found in various other genes of Xenopus. This repetitive DNA element (underlined in Fig. 2) has initially been defined as Vi element in the Xenopus vitellogenin genes with some properties of a transposon (Schubiger et al., 1985). A distinct type of repetitive DNA was found in the intron between exons 6 and 7 (underlined in Fig. 2). This DNA segment has initially been described as a moderately repetitive dispersed DNA of the Xenopus genome with properties of a putative ori (Riggs and Taylor, 1987).

(b) The exon-intron structure of the Xenopus LFB1
To analyze how the various domains of the transcription factor LFB1 correspond to the exons, we have indicated in Fig. 3 the position of the introns along the various protein domains of LFB1. Clearly exon 1 contains not only the entire dimerization domain of LFB1 but also the sequences up to the beginning of the POU-A specific domain (Fig. 3A), whereas exon 2 is restricted to the central portion of the POU-A domain known to constitute an essential element for DNA binding (Tomei et al., 1992). The second protein domain involved in DNA binding, i.e., the homeodomain, is encoded in two separate exons (Nos. 3 and 4), and the C-terminal part of LFB1 containing the activation domain is distributed over the five remaining exons (Nos. 5-9).

Recently the exon-intron structure of the rat and mouse LFB1 has been reported (Bach et al., 1992). As illustrated in Fig. 3A, exons 1-4 of Xenopus and rat contain essentially the same protein domains, whereas the last five exons containing the activation domain are partially distinct. Most notably, exon 5 of the Xenopus gene seems to be encoded in two exons in the rat (exons 5 and 6), whereas the Xenopus exons 8 and 9 correspond to exon 9 of the rat gene. From the aa alignment of the Xenopus LFB1 with the
one of the rat (Fig. 3B), it is obvious that the exon boundaries in the conserved regions have been maintained precisely throughout evolution (e.g., exons 1, 2 and 3). Fig. 3B also illustrates that some exons are more conserved than others and that highly conserved and divergent domains can be present within the same exon (e.g., exons 1 and 4). There is no clear indication that an unrelated exon has been added during evolution. We postulate that even the rat exon 5 (R5 in Fig. 3B), which is quite distinct from the corresponding part of the Xenopus protein, has the same evolutionary origin as the Xenopus sequence, because rat exons 5 and 6 represent approximately the domain of the Xenopus exon 5. A similar argument can be made for the Xenopus exon 8 (X8). Comparing the aa sequence of the two LFB1 proteins a and b in Xenopus, we observed an insertion of a stretch of five and of six aa in the a protein compared to the b protein (Bartkowski et al., 1993). From the deduced exon-intron structure shown in Fig. 3B it is clear that the additional stretch of five aa, LFTFT, found in the POU-A domain of the a protein is located exactly at the border of exon 2 to exon 3; thus it is likely that these additional aa were added by displacement of the splicing site. At this same position an addition of 26 aa has been reported in a splicing variant of LFB3 (vHNF1-B in Rey-Campos et al., 1991). However, the stretch of five additional aa found in the a protein of Xenopus cannot arise in a variant of the b protein, as there is no consensus for an alternative splicing site 15 nt 3' of exon 2 (see Fig. 2). The second addition of a short stretch of aa (LHPSHQ) in the activation domain of the a protein in Xenopus (see Fig. 3B) is located within exon 6, and thus this insertion in the conserved part of the activation domain cannot be explained by a change in an exon-intron border. These findings illustrate the multiplicity of the mechanisms that contribute to sequence divergence during evolution.
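The Fig. 2 convention (exons in capitals, introns in lower case) lends itself to a simple mechanical check of the GT/AG rule and of in-frame stop codons in a retained intron. The following is only a minimal sketch under stated assumptions: the function names `split_segments`, `check_splice_sites` and `first_inframe_stop` are hypothetical, the toy sequence is invented for illustration and is not taken from Fig. 2, and the reading frame is assumed to start at the first nt of the exon string passed in.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def split_segments(seq):
    """Split a mixed-case sequence into (kind, text) runs.

    Capitals are treated as exon, lower case as intron or flanking
    sequence; whitespace and any non-letter characters are skipped.
    """
    return [("exon" if m.group().isupper() else "intron", m.group())
            for m in re.finditer(r"[A-Z]+|[a-z]+", seq)]

def check_splice_sites(seq):
    """For each lower-case run flanked by exons on both sides, report
    (starts_with_gt, ends_with_ag)."""
    segs = split_segments(seq)
    return [(text[:2] == "gt", text[-2:] == "ag")
            for i, (kind, text) in enumerate(segs)
            if kind == "intron" and 0 < i < len(segs) - 1]

def first_inframe_stop(exon, intron):
    """Offset (relative to the first intron nt) of the first in-frame
    stop codon met when reading from the exon into a retained intron.

    Assumes the exon string starts at codon position 1. The offset is
    negative if the stop codon begins in the last exon nt; None means
    no in-frame stop was found.
    """
    phase = len(exon) % 3                      # exon nt left in an open codon
    read = (exon[len(exon) - phase:] + intron).upper()
    for i in range(0, len(read) - 2, 3):
        if read[i:i + 3] in STOP_CODONS:
            return i - phase
    return None

# Invented toy sequence: exon, intron, exon
toy = "ATGGCTCAG" "gtatgactttcag" "GTTCTGTAA"
print(check_splice_sites(toy))
print(first_inframe_stop("ATGGCTCAG", "gtatgactttcag"))
```

Applied to real data, the sequences retrieved under the accession numbers listed in the legend to Fig. 2 would be a safer input than text copied from the printed figure.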
(c) Genomic structure of LFB1 in vertebrates
Comparing the genomic structure of Xenopus LFB1 with that of the rat and mouse, several common features are evident. In both the amphibian and mammalian
[Fig. 3 graphic. (A) Domain scheme of LFB1 (POU-A, homeo.) with the intron positions marked by arrows; exons numbered 1-9 for Xenopus (X) and rat (R). (B) aa alignment of Xenopus LFB1b (upper lines) and rat LFB1 (lower lines), with exon labels X3, X6-X9 and R7-R9 and the LHPSHQ insertion marked; aa position labels at the right: 471, 491, 561, 590, 594, 628. Legible rows of the C-terminal part of the alignment:]
Xenopus LFB1b: -T-QNLIMASLPSVMTIGT-D-SALGPAFSNPGSSTLVIGLRS-QTQSVPVINSVGSSLTTLQSVQFSQQLHPSHQQPIVQQVQSHMAQSPFMATMAQLQS
rat LFB1: QPQNLIMASLPGVMTIGPGEPASLGPTFTNTGASTLVIGLRSTQAQSVPVINSMGSSLTTLQPVQFSQPLHPSYQQP~PPVQSHVAQSP~TMAQLQS
Xenopus LFB1b: PHALYSHKPEVAQYTSAGPFPQTMVITDTSNLGTLTSLTPS~~~T~------~G-~S~GS--"LQ~Q~-SS~L~L"PS~~LS~~PTVSSASLIHY
rat LFB1: PHRLYSHKPEVAQYTHTSLLPQTMLITDT-NLSTLASLTP~TS~TS~SSS~GL"S~SS~*TT*~~PSQDPSNIQHLQPAHRLSTSPTVSSSSLVLY
Xenopus LFB1b: HNSSSPENHSHLLSPSHNTIDSFISTQkRSSSQ
rat LFB1: QSSDSNG-HSHLLPSNHGVIETFISTQMASSSQ
Fig. 3. Comparison of the exon-intron structure of the LFB1 gene between Xenopus and rat. (A) A schematic drawing shows the domains of the LFB1 protein (compare Fig. 1B) with the position of the introns given by arrows. The exons for the Xenopus (X) and rat (R) gene are numbered. (B) The positions of the introns (marked by arrows) are indicated in the alignment of the aa sequence of the Xenopus LFB1b (upper lines) and rat LFB1 proteins (lower lines). The data for the rat are taken from Bach et al. (1992). A 5-aa and a 6-aa insertion found in the Xenopus LFB1a gene (Bartkowski et al., 1993) are indicated at the respective positions. A triangle marks those introns where intron sequences were found in the cloned Xenopus cDNAs (Bartkowski et al., 1993). Four helices and the 21-aa loop characteristic for the LFB1 homeobox (Rey-Campos et al., 1991) are marked. The exons of the Xenopus (X) and rat (R) gene are numbered. The numbers to the right refer to the aa positions.
species (Bach et al., 1992) a very large intron (more than 10 kb) separates exon 1 from the remaining exons. This is even more remarkable as such a large intron is also found in the chicken gene. Furthermore, it is notable that the chicken LFB1 gene contains ten exons by combining the introns specific for Xenopus and rat (A. Horlein, K.-H. Grajer and T. Igo-Kemenes, in preparation).

A unique feature of LFB1 both in mammals (Frain et al., 1989; Baumhueter et al., 1990; Chouard et al., 1990) and Xenopus (Bartkowski et al., 1993) is an extra loop of 21 aa within the homeodomain. Precisely at this position both in Xenopus and rat exons 3 and 4 are joined together (Fig. 3B). It is reasonable to assume that initially during evolution an intron was generated at this position and that at a later stage the generation of a new acceptor site inside this putative intron led to the addition of 21 aa. As these 21 aa gained during evolution have been highly conserved between Xenopus and mammals (Fig. 3B), we postulate that they are of functional importance. The unusual extended structure of the homeobox of LFB1 has also been interpreted as two insertions of 18 and 3 aa (Nicosia et al., 1990). In this model the inserted aa-coding nt sequences would not be located at exon-intron boundaries.

Usually homeoboxes are encoded by a single exon, but there are a few cases with an intron interrupting this domain. However, so far no gene encoding a homeobox protein, including the POU transcription factors, has been identified with an intron at the same position of the homeobox as in the LFB1 gene (see Bach et al., 1992, for an updated list). As the intron position in Xenopus and mammals has been conserved, we argue that LFB1 has evolved as a gene separate from the ones encoding other homeodomain proteins.

Cloning of the LFB1 cDNA of Xenopus had revealed for the b group cDNA clones some sequences that contain interrupted ORFs of LFB1 (Bartkowski et al., 1993). Based on the analysis of the genomic clones we now know that the b1 cDNA clone contains exon 1 and the adjacent intron, whereas the b2 cDNA contains exactly the sequences of the intron between exons 5 and 6. Thus both these cDNA clones are not cloning artefacts derived from some rearrangement during cloning but rather represent splicing intermediates or variants. In Fig. 3B the relevant exon-intron boundaries are marked by a triangle. In this context it seems noteworthy that both in Xenopus (Fig. 2) and in mouse and rat (Bach et al., 1992) these intron sequences contain in-frame translation stop codons within a few nt. This apparent conservation might indicate some functional relevance. Thus it is reasonable to search for the corresponding potential truncated LFB1 proteins, which might have regulatory potentials distinct from the full-length protein.

ACKNOWLEDGEMENTS
Supported by the Deutsche Forschungsgemeinschaft (Ry 5/1-5).
REFERENCES
Bach, I., Pontoglio, M. and Yaniv, M.: Structure of the gene encoding hepatocyte nuclear factor 1 (HNF1). Nucleic Acids Res. 20 (1992) 4199-4204.
Bartkowski, S., Zapp, D., Weber, H., Eberle, G., Zoidl, C., Senkel, S., Klein-Hitpass, L. and Ryffel, G.U.: Developmental regulation and tissue distribution of the liver transcription factor LFB1 (HNF1) in Xenopus laevis. Mol. Cell. Biol. 13 (1993) 421-431.
Baumhueter, S., Mendel, D.B., Conley, P.B., Kuo, C.J., Turk, C., Graves, M.K., Edwards, C.A., Courtois, G. and Crabtree, G.R.: HNF-1 shares three sequence motifs with the POU domain proteins and is identical to LF-B1 and APF. Genes Dev. 4 (1990) 372-379.
Blumenfeld, M., Maury, M., Chouard, T., Yaniv, M. and Condamine, H.: Hepatic nuclear factor 1 (HNF1) shows a wider distribution than products of its known target genes in developing mouse. Development 113 (1991) 589-599.
Chouard, T., Blumenfeld, M., Bach, I., Vandekerckhove, J., Cereghini, S. and Yaniv, M.: A distal dimerization domain is essential for DNA-binding by the atypical HNF1 homeodomain. Nucleic Acids Res. 18 (1990) 5853-5863.
De Simone, V., Magistris, L., Lazzaro, D., Gerstner, J., Monaci, P., Nicosia, A. and Cortese, R.: LFB3, a heterodimer-forming homeoprotein of the LFB1 family, is expressed in specialized epithelia. EMBO J. 10 (1991) 1435-1443.
Frain, M., Swart, G., Monaci, P., Nicosia, A., Stampfli, S., Frank, R. and Cortese, R.: The liver-specific transcription factor LF-B1 contains a highly diverged homeobox DNA-binding domain. Cell 59 (1989) 145-157.
Kobel, H.R. and Du Pasquier, L.: Genetics of polyploid Xenopus. Trends Genet. 2 (1986) 310-315.
Mendel, D.B., Hansen, L.P., Graves, M.K., Conley, P.B. and Crabtree, G.R.: HNF-1α and HNF-1β (vHNF-1) share dimerization and homeo domains, but not activation domains, and form heterodimers in vitro. Genes Dev. 5 (1991) 1042-1056.
Nicosia, A., Monaci, P., Tomei, L., De Francesco, R., Nuzzo, M., Stunnenberg, H. and Cortese, R.: A myosin-like dimerization helix and an extra-large homeodomain are essential elements of the tripartite DNA binding structure of LFB1. Cell 61 (1990) 1225-1236.
Rey-Campos, J., Chouard, T., Yaniv, M. and Cereghini, S.: vHNF1 is a homeoprotein that activates transcription and forms heterodimers with HNF1. EMBO J. 10 (1991) 1445-1457.
Riggs, C.D. and Taylor, J.H.: Sequence organization and developmentally regulated transcription of a family of repetitive DNA sequences of Xenopus laevis. Nucleic Acids Res. 15 (1987) 9551-9565.
Sambrook, J., Fritsch, E.F. and Maniatis, T.: Molecular Cloning. A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989.
Schubiger, J.-L., Germond, J.-E., ten Heggeler, B. and Wahli, W.: The Vi element. A transposon-like repeated DNA sequence interspersed in the vitellogenin locus of Xenopus laevis. J. Mol. Biol. 186 (1985) 491-503.
Tomei, L., Cortese, R. and De Francesco, R.: A POU-A related region dictates DNA binding specificity of LFB1/HNF1 by orienting the two XL-homeodomains in the dimer. EMBO J. 11 (1992) 4119-4129.