Genomic structure of the Xenopus laevis liver transcription factor LFB1




Gene, 134 (1993) 251-256. © 1993 Elsevier Science Publishers B.V. All rights reserved. 0378-1119/93/$06.00. GENE 07356

(Exon intron structure; HNF1; homeobox protein; POU domain; evolutionary conservation)

Dirk Zapp, Sigrid Bartkowski*, Christiane Zoidl, Ludger Klein-Hitpass and Gerhart U. Ryffel

Institut für Zellbiologie (Tumorforschung), Universitätsklinikum Essen, D-45122 Essen, Germany

Received by H.G. Zachau: 29 March 1993; Revised/Accepted: 24 May/25 May 1993; Received at publishers: 21 June 1993

SUMMARY

Liver factor B1 [LFB1, also called hepatocyte nuclear factor 1 (HNF1)] is a tissue-specific vertebrate transcription factor present in the liver, intestine, stomach and kidney. The LFB1 protein contains an unusual homeobox characterized by an insertion of 21 amino acids (aa) not found in any other homeodomain protein. We have isolated and characterized the genomic sequences encoding LFB1 of Xenopus laevis. By comparing the genomic sequences with the cDNA clones, we identified nine exons. In general, the positions of the introns are identical to those previously found in the rat. However, in the C-terminal activation domain each species has an exon that is split in two in the other species. The homeobox of the X. laevis LFB1 contains an intron at exactly the position where the 21 aa typical for LFB1 are inserted. This agrees with the structure found in the rat gene and supports the notion that the LFB1 homeobox evolved separately from those of the other genes encoding homeodomain proteins.

Correspondence to: Dr. G.U. Ryffel, Institut für Zellbiologie (Tumorforschung), Universitätsklinikum Essen, Hufelandstrasse 55, D-45122 Essen, Germany. Tel. (49-201) 723-3110; Fax (49-201) 723-5905.

*Present address: Institut für Humangenetik, Medizinische Universität zu Lübeck, Ratzeburger Allee 160, W-2400 Lübeck, Germany. Tel. (49-451) 5002627.

Abbreviations: aa, amino acid(s); bp, base pair(s); HNF1, hepatocyte nuclear factor 1 (the same as LFB1); kb, kilobase or 1000 bp; LFB1, liver factor B1 (the same as HNF1); LFB1, gene encoding LFB1; nt, nucleotide(s); ORF, open reading frame; ori, origin of DNA replication; POU, family of transcription factors including Pit-1, octamer factor and the factor of the unc-86 gene; X., Xenopus.

INTRODUCTION

LFB1 (also called HNF1 or HNF1α) was initially cloned as a cDNA encoding a liver-specific transcription factor that recognizes a defined promoter element present in several genes specifically expressed in hepatocytes (Frain et al., 1989; Baumhueter et al., 1990; Chouard et al., 1990). However, LFB1 is also found in other tissues, including kidney, intestine and stomach (Blumenfeld et al., 1991; De Simone et al., 1991; Bartkowski et al., 1993), but its target genes in these non-hepatic cell types are not yet known. Analysis of the aa sequence deduced from the cloned cDNA revealed that LFB1 shares some features with homeobox proteins but differs from all other homeobox proteins by an additional 21-aa loop between the predicted helices II and III of the homeobox; moreover, the DNA-binding domain of LFB1 contains a second essential component with weak similarity to the POU-A domain, and the C-terminal part of LFB1 has an activation domain rich in Ser and Thr (Frain et al., 1989; Baumhueter et al., 1990; Chouard et al., 1990). A unique feature of LFB1 is a short dimerization domain at the N-terminal end that allows homodimerization as well as heterodimerization with the closely related transcription factor LFB3 (also called vHNF1 or HNF1β), which has essentially the same structural organization as LFB1 (De Simone et al., 1991; Mendel et al., 1991; Rey-Campos et al., 1991). All these typical features of LFB1 have been conserved during vertebrate evolution, since LFB1 from Xenopus, an amphibian species, shows all the characteristics found in mammalian LFB1 (Bartkowski et al., 1993). That we found two distinct LFB1 cDNAs, a and b, in Xenopus with about 10% sequence divergence (Bartkowski et al., 1993) is not surprising, as the genome of X. laevis was duplicated during evolution (Kobel and Du Pasquier, 1986). Our recent data suggest that LFB1 is not only important for tissue-specific gene expression in the adult but may also play a major role during early embryogenesis before organogenesis, as LFB1 transcripts can be detected shortly after the midblastula transition (Bartkowski et al., 1993). To analyze the regulatory potential of LFB1 in X. laevis, it is a prerequisite to isolate and characterize genomic clones encoding LFB1. Furthermore, the genomic organization, i.e., the exon-intron structure, allows us to deduce the evolutionary relationship between LFB1 and the homeodomain proteins. In this report we analyze the exon-intron structure of the Xenopus LFB1 gene, compare it with the reported mammalian LFB1 genes (Bach et al., 1992) and emphasize the differences between LFB1 and the other homeodomain proteins.

EXPERIMENTAL AND DISCUSSION

(a) Genomic LFB1 sequences of Xenopus

To isolate the genomic LFB1 sequences of Xenopus, we used the cDNA clone a1, which contains the entire ORF of the Xenopus LFB1 protein a and some 5' and 3' flanking sequences (see Fig. 1B). As the two groups of LFB1 cDNAs (a and b) are about 90% identical in their nt sequence (Bartkowski et al., 1993), the a1 cDNA should allow the isolation of both genes. Using three different genomic libraries, we isolated ten independent genomic clones. Five clones shown in Fig. 1A represent the LFB1b gene of Xenopus, and these sequences were used to define the structure of the gene as given in this paper. Based on restriction analysis, the isolated clones overlap as shown in Fig. 1A. Southern blotting of all these clones with cDNA probes representing the 5', the middle and the 3' part, respectively (Fig. 1B), revealed the presence of eight EcoRI fragments hybridizing with the cDNA. Quite surprisingly, the sequences between the genomic fragments hybridizing with the 5' fragment (A) and the middle fragment (B) did not react (Fig. 1A), suggesting the presence of a large intron. Further subcloning of fragments hybridizing with the cDNA probes was used to locate exon sequences in short restriction fragments, and these fragments were sequenced for proper identification of the exons. Sequences corresponding to parts of the cDNA nt sequence, i.e., exons, are given in capitals in Fig. 2.

Based on sequencing, we identified nine exons (Fig. 1A). Exon 1 is separated by approximately 15 kb from

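[Fig. 1 panels: (A) overlapping genomic clones XLF1b-XLF5b with EcoRI sites (E) and exons 1-9; (B) domain map of the XLFB1 a1 cDNA (Dim., POU-A, Homeo., Activation Domain) with cDNA fragments A, B and C; scale bar 100 bp.]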

Fig. 1. Structure of the LFB1b gene of Xenopus. (A) The overlap of the genomic clones, isolated by standard techniques (Sambrook et al., 1989), is illustrated by the EcoRI sites (E) within each clone. The EcoRI fragments hybridizing with the cDNA fragments A, B and C of panel B are indicated by a bar. The positions of exons 1-9 (filled squares) are given. The open square of exon 9 indicates that the end of this exon is not known; in fact, we cannot exclude an intron downstream of the stop codon. The EcoRI fragment that is smaller in clone XLF4b than in XLF1b and XLF3b is marked by an arrow. (B) Schematic drawing of the protein domains encoded by the XLFB1a1 cDNA (Bartkowski et al., 1993), with the dimerization (Dim.), POU-A specific (POU-A), homeobox (Homeo.) and activation domains. The cDNA fragments A, B and C, representing nt 1-500, 540-1063 and 1058-2172, respectively, of the a1 LFB1 cDNA (Bartkowski et al., 1993), which were used to map the genomic clones in A, are marked by arrows.
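The clone map in Fig. 1A is built from EcoRI sites. As a minimal illustration, assuming a linear fragment and the standard EcoRI recognition site GAATTC (cut between G and AATTC), the hypothetical helpers below list the top-strand cut positions and the resulting fragment lengths. The toy sequence is invented and is not taken from the Xenopus clones.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (palindromic)
ECORI_CUT_OFFSET = 1   # EcoRI cuts G^AATTC, i.e. one base into the site


def ecori_cut_positions(seq: str) -> list[int]:
    """Return 0-based top-strand cut positions of EcoRI in a linear sequence."""
    seq = seq.upper()
    cuts = []
    start = seq.find(ECORI_SITE)
    while start != -1:
        cuts.append(start + ECORI_CUT_OFFSET)
        start = seq.find(ECORI_SITE, start + 1)
    return cuts


def ecori_fragment_lengths(seq: str) -> list[int]:
    """Top-strand lengths of the fragments from a complete EcoRI digest."""
    bounds = [0] + ecori_cut_positions(seq) + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Invented toy sequence with two EcoRI sites; not a Xenopus clone.
    toy = "tt" + "gaattc" + "a" * 8 + "gaattc" + "gg"
    print(ecori_cut_positions(toy))     # [3, 17]
    print(ecori_fragment_lengths(toy))  # [3, 14, 7]
```

Comparing such fragment-length lists between clones is, in spirit, how a size difference like the smaller EcoRI fragment of XLF4b becomes visible; real mapping of course relied on gels and hybridization, not on sequence.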

[Fig. 2 sequence panels: EXON 1 (with 5' flanking region from nt -1158), EXON 2, EXON 3, EXONS 4 and 5, EXON 6, EXON 7, EXON 8 and EXON 9, each with flanking intron sequence.]

Fig. 2. Sequence of the Xenopus LFB1b gene exons with their flanking regions. Subfragments containing exons of the genomic clones listed in Fig. 1 were subcloned and sequenced using a Pharmacia sequencing kit. Exon sequences are given in capital letters, whereas the 5' flanking region and the intron sequences are given in lower-case letters. The numbering refers exclusively to the exon sequence given in capital letters, starting with the start codon ATG as No. 1. Sequences representing dispersed middle-repetitive DNA are underlined. The start codon, the stop codon and the dinucleotides of the intron-exon boundaries are indicated in bold face. Potential in-frame stop codons in the introns are underlined. Those nt in the exons that differ from the cDNA are underlined. A detailed characterization of the promoter structure will be described elsewhere (D.Z., S.B., B. Holewa, C.Z., L.K.-H. and G.U.R., in preparation). The sequences have been deposited in the EMBL data bank under the following accession numbers: L09605 (promoter and exon 1), X72983 (exon 2), X72984 (exon 3), X72985 (exons 4 and 5), X72986 (exon 6), X72987 (exon 7), X72988 (exon 8) and X72989 (exon 9).
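The Fig. 2 conventions (exons in capitals, introns and flanks in lower case, numbering over exon letters only with the start codon ATG as No. 1, GT/AG at the intron boundaries) can be mimicked on a toy sequence. This is a sketch under assumptions: the helper names are hypothetical, the ATG position is supplied by the caller rather than searched for, positions upstream of the ATG are simply rejected, and only the GT/AG dinucleotides are checked, not full splice-site consensus sequences.

```python
import re


def spliced_exon_sequence(genomic: str) -> str:
    """Keep only upper-case letters (exons), as in the Fig. 2 convention."""
    return "".join(base for base in genomic if base.isupper())


def exon_number(genomic: str, genomic_index: int, atg_exon_index: int) -> int:
    """Exon-only number of the base at genomic_index, with the ATG's A as No. 1.

    atg_exon_index (hypothetical parameter) is the 0-based index of the start
    codon within the spliced exon sequence; the caller must supply it.
    """
    if not genomic[genomic_index].isupper():
        raise ValueError("position lies in an intron or flanking region")
    exon_index = sum(1 for base in genomic[:genomic_index] if base.isupper())
    if exon_index < atg_exon_index:
        raise ValueError("position lies upstream of the start codon")
    return exon_index - atg_exon_index + 1


def introns_follow_gt_ag(genomic: str) -> list[bool]:
    """For each lower-case run flanked by exons, check the GT...AG rule."""
    results = []
    for match in re.finditer(r"(?<=[A-Z])[a-z]+(?=[A-Z])", genomic):
        intron = match.group().upper()
        results.append(intron.startswith("GT") and intron.endswith("AG"))
    return results


if __name__ == "__main__":
    # Invented toy gene: 5' flank, exon, intron, exon, 3' flank.
    genomic = "cc" + "ATGGC" + "gtaagttttcag" + "TTTAG" + "cc"
    print(spliced_exon_sequence(genomic))  # ATGGCTTTAG
    print(exon_number(genomic, 19, 0))     # 6 (first T of the second exon)
    print(introns_follow_gt_ag(genomic))   # [True]
```

Lower-case runs at the very ends (the 5' and 3' flanks) are deliberately not treated as introns, because they are not bounded by exons on both sides.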

exon 2, whereas exons 4-7 are very closely spaced (Fig. 1A). As seen in Fig. 2, all introns contain the expected dinucleotides GT and AG at their 5' and 3' boundaries, respectively, as is characteristic of splice donor and acceptor sites.

Comparing the aa-coding sequences of the genomic clones given in Fig. 2 with the cDNA sequences of the b1 and b2 clones, both of which contain potential intron sequences (Bartkowski et al., 1993), we found two single-nt changes in exon 5 (underlined in Fig. 2), with only one aa exchange (Ser→Cys). In addition, a total of nine differences between the cDNA and the genomic sequence were detected in the sequence of exon 9 following the stop codon and in the intron sequence between exons 5 and 6 that is present in one of our cDNA clones (Bartkowski et al., 1993) (see the data bank entries for the exact type of variation). These heterogeneities are expected, since the cDNA library and the genomic library were made from distinct individuals. Further heterogeneity is seen among the isolated genomic clones, as an EcoRI fragment in clone XLF4b is smaller than the corresponding fragment in clones XLF1b and XLF3b (Fig. 1A). We assume that this difference reflects the presence of two distinct alleles of the b gene. This assumption is consistent with some sequence variation observed 5' of exon 1 between clones XLF4b and XLF1b (data not shown).

A data bank search revealed extensive similarity (70-85% identity) between the 5' flanking region from about -1140 to -880 and a dispersed repetitive DNA element found in various other Xenopus genes. This repetitive DNA element (underlined in Fig. 2) was initially defined as the Vi element in the Xenopus vitellogenin genes, with some properties of a transposon (Schubiger et al., 1985). A distinct type of repetitive DNA was found in the intron between exons 6 and 7 (underlined in Fig. 2). This DNA segment was initially described as a moderately repetitive dispersed DNA of the Xenopus genome with properties of a putative ori (Riggs and Taylor, 1987).

(b) The exon-intron structure of the Xenopus LFB1 gene

To analyze how the various domains of the transcription factor LFB1 correspond to the exons, we have indicated in Fig. 3 the positions of the introns along the protein domains of LFB1. Exon 1 contains not only the entire dimerization domain of LFB1 but also the sequences up to the beginning of the POU-A specific domain (Fig. 3A), whereas exon 2 is restricted to the central portion of the POU-A domain, known to constitute an essential element for DNA binding (Tomei et al., 1992). The second protein domain involved in DNA binding, i.e., the homeodomain, is encoded by two separate exons (Nos. 3 and 4), and the C-terminal part of LFB1 containing the activation domain is distributed over the five remaining exons (Nos. 5-9). The exon-intron structure of the rat and mouse LFB1 genes has recently been reported (Bach et al., 1992). As illustrated in Fig. 3A, exons 1-4 of Xenopus and rat contain essentially the same protein domains, whereas the last five exons, containing the activation domain, are partially distinct. Most notably, exon 5 of the Xenopus gene seems to be encoded by two exons in the rat (exons 5 and 6), whereas the Xenopus exons 8 and 9 correspond to exon 9 of the rat gene. From the aa alignment of the Xenopus LFB1 with that of the rat (Fig. 3B), it is obvious that the exon boundaries in the conserved regions have been maintained precisely throughout evolution (e.g., exons 1, 2 and 3). Fig. 3B also illustrates that some exons are more conserved than others and that highly conserved and divergent domains can be present within the same exon (e.g., exons 1 and 4). There is no clear indication that an unrelated exon has been added during evolution. We postulate that even rat exon 5 (R5 in Fig. 3B), which is quite distinct from the corresponding part of the Xenopus protein, has the same evolutionary origin as the Xenopus sequence, because rat exons 5 and 6 together correspond approximately to the domain of Xenopus exon 5. A similar argument can be made for Xenopus exon 8 (X8).

Comparing the aa sequences of the two Xenopus LFB1 proteins a and b, we observed insertions of stretches of five and six aa in the a protein relative to the b protein (Bartkowski et al., 1993). The exon-intron structure shown in Fig. 3B makes clear that the additional stretch of five aa, LFTFT, found in the POU-A domain of the a protein is located exactly at the border between exon 2 and exon 3; it is therefore likely that these additional aa were added by displacement of the splice site. At this same position an addition of 26 aa has been reported in a splicing variant of LFB3 (vHNF1-B in Rey-Campos et al., 1991). However, the additional five aa found in the a protein of Xenopus cannot arise in a splicing variant of the b protein, as there is no consensus for an alternative splice donor site 15 nt 3' of exon 2 (see Fig. 2). The second short insertion (LHPSHQ) in the activation domain of the a protein (see Fig. 3B) lies within exon 6, so this insertion in the conserved part of the activation domain cannot be explained by a change in an exon-intron border. These findings illustrate the multiplicity of mechanisms that contribute to sequence divergence during evolution.
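The argument about the LFTFT insertion rests on codon arithmetic: 15 nt are exactly five codons, so moving the exon 2 donor site 15 nt downstream would add five aa without shifting the reading frame, but only if a GT donor dinucleotide sits at that new position. The hypothetical helper below makes this check explicit. It assumes a phase-0 intron (exon 2 ending at a codon boundary), which the text does not state, checks only the GT dinucleotide rather than the full donor consensus, and uses an invented toy intron whose first 15 nt encode LFTFT; it is not the Xenopus sequence.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate whole codons of seq (any trailing partial codon is ignored)."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def donor_shift(intron: str, shift: int) -> dict:
    """Evaluate moving an exon's 5' splice site `shift` nt into the intron.

    Assumes a phase-0 intron (the exon ends at a codon boundary), so the
    retained intron nt are read directly as codons. Only the GT dinucleotide
    is checked, not the full donor consensus.
    """
    intron = intron.upper()
    added = translate(intron[:shift])
    return {
        "donor_gt_present": intron[shift:shift + 2] == "GT",
        "keeps_frame": shift % 3 == 0,
        "added_peptide": added,
        "stop_in_extension": "*" in added,
    }


if __name__ == "__main__":
    # Invented intron whose first 15 nt encode LFTFT; not the Xenopus sequence.
    toy_intron = "ctgttcaccttcacc" + "gtaagt" + "tttcag"
    print(donor_shift(toy_intron, 15))
    # {'donor_gt_present': True, 'keeps_frame': True,
    #  'added_peptide': 'LFTFT', 'stop_in_extension': False}
```

In the paper's terms, the b gene fails the first condition: Fig. 2 shows no donor consensus 15 nt 3' of exon 2, so the LFTFT stretch cannot arise from the b gene by this route.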
(c) Genomic structure of LFB1 in vertebrates

Comparing the genomic structure of Xenopus LFB1 with that of the rat and mouse, several common features are evident. In both the amphibian and mammalian

[Fig. 3 panels: (A) domain map (POU-A, homeo.) with intron positions (arrows) for Xenopus (X) and rat (R) exons 1-9; (B) aligned aa sequences of Xenopus LFB1b and rat LFB1 with intron positions, the LHPSHQ insertion site in exon X6, and exon labels X6-X9 and R7-R9.]

Fig. 3. Comparison of the exon-intron structure of the LFB1 gene between Xenopus and rat. (A) Schematic drawing of the domains of the LFB1 protein (compare Fig. 1B), with the positions of the introns given by arrows. The exons of the Xenopus (X) and rat (R) genes are numbered. (B) The positions of the introns (marked by arrows) are indicated in the alignment of the aa sequences of the Xenopus LFB1b (upper lines) and rat LFB1 proteins (lower lines). The data for the rat are taken from Bach et al. (1992). The 5-aa and 6-aa insertions found in Xenopus LFB1a (Bartkowski et al., 1993) are indicated at the respective positions. A triangle marks those introns whose sequences were found in the cloned Xenopus cDNAs (Bartkowski et al., 1993). Four helices and the 21-aa loop characteristic of the LFB1 homeobox (Rey-Campos et al., 1991) are marked. The numbers to the right refer to the aa positions.

species (Bach et al., 1992), a very large intron (more than 10 kb) separates exon 1 from the remaining exons. This is even more remarkable as such a large intron is also found in the chicken gene. Furthermore, it is notable that the chicken LFB1 gene contains ten exons, combining the introns specific for Xenopus and for rat (A. Hörlein, K.-H. Grajer and T. Igo-Kemenes, in preparation).

A unique feature of LFB1 both in mammals (Frain et al., 1989; Baumhueter et al., 1990; Chouard et al., 1990) and in Xenopus (Bartkowski et al., 1993) is an extra loop of 21 aa within the homeodomain. Precisely at this position, exons 3 and 4 are joined together both in Xenopus and in rat (Fig. 3B). It is reasonable to assume that initially during evolution an intron was generated at this position, and that at a later stage the generation of a new acceptor site inside this putative intron led to the addition of 21 aa. As these 21 aa gained during evolution have been highly conserved between Xenopus and mammals (Fig. 3B), we postulate that they are of functional importance. The unusual extended structure of the LFB1 homeobox has also been interpreted as two insertions of 18 and 3 aa (Nicosia et al., 1990). In this model the inserted aa-coding nt sequences would not be located at exon-intron boundaries. Usually homeoboxes are encoded by a single exon, but there are a few cases with an intron interrupting this domain. However, so far no gene encoding a homeobox protein, including the POU transcription factors, has been identified with an intron at the same position of the homeobox as in the LFB1 gene (see Bach et al., 1992, for an updated list). As the intron position has been conserved in Xenopus and mammals, we argue that LFB1 has evolved as a gene separate from those encoding other homeodomain proteins.

Cloning of the Xenopus LFB1 cDNA had revealed, among the b group cDNA clones, some sequences that contain interrupted LFB1 ORFs (Bartkowski et al., 1993). Based on the analysis of the genomic clones, we now know that the b1 cDNA clone contains exon 1 and the adjacent intron, whereas the b2 cDNA contains exactly the sequences of the intron between exons 5 and 6. Thus, neither cDNA clone is an artefact derived from some rearrangement during cloning; rather, they represent splicing intermediates or variants. In Fig. 3B the relevant exon-intron boundaries are marked by a triangle. In this context it seems noteworthy that both in Xenopus (Fig. 2) and in mouse and rat (Bach et al., 1992) these intron sequences contain in-frame translation stop codons within a few nt. This apparent conservation might indicate some functional relevance. It is therefore reasonable to search for the corresponding potential truncated LFB1 proteins, which might have regulatory potentials distinct from those of the full-length protein.

ACKNOWLEDGEMENTS

Supported by the Deutsche Forschungsgemeinschaft (Ry 5/1-5).
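The closing argument, that the retained introns carry in-frame stop codons within a few nt and could therefore encode truncated LFB1 proteins, amounts to translating exon sequence plus a retained intron until the first in-frame stop. The sketch below is hypothetical: the function name is illustrative, the toy sequences are invented (they are not the Xenopus b1 or b2 clones), and it assumes the upstream exon sequence starts at the ATG, contains whole codons and has no internal stop.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def truncated_product(upstream_cds: str, retained_intron: str) -> tuple[str, int]:
    """Translate exon sequence plus a retained intron up to the first in-frame stop.

    Assumes upstream_cds starts at the ATG, contains whole codons and no stop.
    Returns the truncated peptide and the number of aa encoded by the intron.
    """
    if len(upstream_cds) % 3:
        raise ValueError("upstream_cds must contain whole codons")
    seq = (upstream_cds + retained_intron).upper()
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    else:
        raise ValueError("no in-frame stop codon in the retained intron")
    protein = "".join(peptide)
    return protein, len(protein) - len(upstream_cds) // 3


if __name__ == "__main__":
    # Invented toy sequences, not the Xenopus b1/b2 cDNA clones.
    print(truncated_product("ATGGCTAAA", "gtgagtaagtgaattt"))  # ('MAKVSK', 3)
```

A small second return value corresponds to the situation described above, where the stop codon lies only a few nt into the intron, so the putative truncated protein differs from the full-length one mainly by what it lacks downstream.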

REFERENCES

Bach, I., Pontoglio, M. and Yaniv, M.: Structure of the gene encoding hepatocyte nuclear factor 1 (HNF1). Nucleic Acids Res. 20 (1992) 4199-4204.

Bartkowski, S., Zapp, D., Weber, H., Eberle, G., Zoidl, C., Senkel, S., Klein-Hitpass, L. and Ryffel, G.U.: Developmental regulation and tissue distribution of the liver transcription factor LFB1 (HNF1) in Xenopus laevis. Mol. Cell. Biol. 13 (1993) 421-431.

Baumhueter, S., Mendel, D.B., Conley, P.B., Kuo, C.J., Turk, C., Graves, M.K., Edwards, C.A., Courtois, G. and Crabtree, G.R.: HNF-1 shares three sequence motifs with the POU domain proteins and is identical to LF-B1 and APF. Genes Dev. 4 (1990) 372-379.

Blumenfeld, M., Maury, M., Chouard, T., Yaniv, M. and Condamine, H.: Hepatic nuclear factor 1 (HNF1) shows a wider distribution than products of its known target genes in developing mouse. Development 113 (1991) 589-599.

Chouard, T., Blumenfeld, M., Bach, I., Vandekerckhove, J., Cereghini, S. and Yaniv, M.: A distal dimerization domain is essential for DNA-binding by the atypical HNF1 homeodomain. Nucleic Acids Res. 18 (1990) 5853-5863.

De Simone, V., Magistris, L., Lazzaro, D., Gerstner, J., Monaci, P., Nicosia, A. and Cortese, R.: LFB3, a heterodimer-forming homeoprotein of the LFB1 family, is expressed in specialized epithelia. EMBO J. 10 (1991) 1435-1443.

Frain, M., Swart, G., Monaci, P., Nicosia, A., Stampfli, S., Frank, R. and Cortese, R.: The liver-specific transcription factor LF-B1 contains a highly diverged homeobox DNA-binding domain. Cell 59 (1989) 145-157.

Kobel, H.R. and Du Pasquier, L.: Genetics of polyploid Xenopus. Trends Genet. 2 (1986) 310-315.

Mendel, D.B., Hansen, L.P., Graves, M.K., Conley, P.B. and Crabtree, G.R.: HNF-1α and HNF-1β (vHNF-1) share dimerization and homeo domains, but not activation domains, and form heterodimers in vitro. Genes Dev. 5 (1991) 1042-1056.

Nicosia, A., Monaci, P., Tomei, L., De Francesco, R., Nuzzo, M., Stunnenberg, H. and Cortese, R.: A myosin-like dimerization helix and an extra-large homeodomain are essential elements of the tripartite DNA binding structure of LFB1. Cell 61 (1990) 1225-1236.

Rey-Campos, J., Chouard, T., Yaniv, M. and Cereghini, S.: vHNF1 is a homeoprotein that activates transcription and forms heterodimers with HNF1. EMBO J. 10 (1991) 1445-1457.

Riggs, C.D. and Taylor, J.H.: Sequence organization and developmentally regulated transcription of a family of repetitive DNA sequences of Xenopus laevis. Nucleic Acids Res. 15 (1987) 9551-9565.

Sambrook, J., Fritsch, E.F. and Maniatis, T.: Molecular Cloning. A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989.

Schubiger, J.-L., Germond, J.-E., ten Heggeler, B. and Wahli, W.: The Vi element - a transposon-like repeated DNA sequence interspersed in the vitellogenin locus of Xenopus laevis. J. Mol. Biol. 186 (1985) 491-503.

Tomei, L., Cortese, R. and De Francesco, R.: A POU-A related region dictates DNA binding specificity of LFB1/HNF1 by orienting the two XL-homeodomains in the dimer. EMBO J. 11 (1992) 4119-4129.