Comparative genomics of the Hlx homeobox gene and protein: Conservation of structure and expression from fish to mammals


Gene 352 (2005) 45 – 56 www.elsevier.com/locate/gene

Michael D. Bates (a,b,*), James M. Wells (b), Byrappa Venkatesh (c)

(a) Division of Gastroenterology, Hepatology and Nutrition, Cincinnati Children’s Hospital Medical Center, Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
(b) Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
(c) Institute of Molecular and Cell Biology, 61 Biopolis Drive, Singapore 138673

Received 18 November 2004; received in revised form 8 February 2005; accepted 1 March 2005
Received by S. Yokoyama

Abstract

Hlx is a homeobox transcription factor gene that is expressed in intestinal and hepatic mesenchyme of the developing mouse embryo and is essential for normal intestinal and hepatic development. Because of the morphological and molecular similarities in the development of the digestive system across species, we hypothesized that the Hlx gene and protein sequences and expression patterns would be conserved among vertebrates. Comparison of the Hlx gene orthologues of human, chimpanzee, mouse, rat, pufferfish (Fugu) and zebrafish demonstrates that these six genes share an identical organization with four exons and three introns. Comparison of the inferred Hlx protein sequences from these and three additional species (chick, Spanish ribbed newt and rainbow trout) reveals significant sequence identity, with identical homeodomains. The expression of Hlx in the mesenchyme of developing chick embryos is highly similar to that of mouse. Fugu Hlx is expressed in a tissue-specific manner that is similar though not identical to that of mouse, suggesting a conservation of Hlx function between mammals and fish. The mammalian and fish Hlx genes share a putative 5′ upstream enhancer as well as an inverted repeat containing CCAAT boxes on opposite strands that we have previously shown to be important for mouse Hlx gene expression. These results suggest that the function of Hlx and the mechanisms regulating its expression are highly conserved in mammals, birds, amphibians and fish. © 2005 Elsevier B.V. All rights reserved.

Keywords: Transcription factor; Developmental gene expression regulation; Genome; Mouse; Takifugu

Abbreviations: kb, kilobases; bp, base pairs; min, minutes; E, embryonic day; RT-PCR, reverse transcription–polymerase chain reaction; Mb, megabases.
* Corresponding author. Division of Gastroenterology, Hepatology and Nutrition, Cincinnati Children’s Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH 45229, USA. Tel.: +1 513 636 4415; fax: +1 513 636 5581. E-mail address: [email protected] (M.D. Bates).
0378-1119/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2005.03.021

1. Introduction

Among the important participants in developmental programming are members of the family of transcription factors encoded by the homeobox genes. Initially identified in Drosophila, homeodomain transcription factors share a well-conserved, 60-amino acid DNA-binding homeodomain. The various homeodomain proteins are expressed in discrete cell types at specific times during development and in adult tissues. They often occupy high-level positions in the genetic hierarchy of development, in that the expression of a homeobox gene initiates a genetic pathway or cascade that regulates cell differentiation and/or proliferation. Mutations of homeobox genes cause or are associated with morphological and developmental anomalies in Drosophila, mice and humans. The discovery of such master regulatory transcription factors in development raises questions about how their expression is controlled (upstream regulation) and what genes the encoded transcription factors in turn regulate (downstream targets). Inter-species comparisons of gene and


protein sequences and expression patterns can shed important light on whether specific developmental processes are conserved across species. Hlx is a homeobox gene that was originally isolated from a mouse pre-B-lymphocyte cell line (Allen et al., 1991). It has homology to the Drosophila homeobox gene H2.0 (Barad et al., 1988) that is expressed in visceral mesenchyme but is not essential for gut morphogenesis (Barad et al., 1991). The mouse and human Hlx proteins are highly conserved, and their homeodomains are 100% identical (Kennedy et al., 1994). In adult mice, Hlx is expressed most abundantly in lung, heart, skeletal muscle and hematopoietic tissues and cells, and less abundantly in intestine (small and large), liver, uterus and ovaries; there was also trace expression detected in stomach, brain, kidney and testes by RNase protection assay (Allen et al., 1991). In mouse embryos, Hlx is most prominently expressed in mesodermal tissues, in particular in the midgut and hindgut mesenchyme of developing liver, gallbladder and intestine, where it is first detected by in situ hybridization at embryonic day (E) 9.5 (Lints et al., 1996). Targeted disruption of the mouse Hlx gene results in an autosomal-recessive embryonic lethal phenotype in which the intestine and liver fail to grow, although this mutation has no effect on the ability of embryonic hematopoietic cells to reconstitute the hematopoietic systems of irradiated mice (Hentsch et al., 1996). Hlx has more recently been shown to interact with the T-box transcription factor T-bet to induce helper T-lymphocyte type 1-specific gene expression (Mullen et al., 2002; Zheng et al., 2004). Thus, the Hlx transcription factor plays diverse roles in a variety of tissues. Recent genome sequencing efforts have provided the opportunity to gain additional insights into the structure and function of the Hlx gene and protein through cross-species comparisons of genomic data. 
In addition to the human and mouse genomes that have essentially been completely sequenced (for references, see Ureta-Vidal et al., 2003), progress is being made to complete the genomic sequence of a number of other species. Additional genome sequencing efforts have focused on species that are important model systems for biological, developmental or medical studies, such as the worm Caenorhabditis elegans, the fruitfly Drosophila melanogaster, the mosquito Anopheles gambiae, the pufferfish Takifugu rubripes (Fugu) and the zebrafish Danio rerio (for references, see Ureta-Vidal et al., 2003). The Fugu genome is of particular interest because, although it contains approximately as many genes as mammals, it is one-eighth the size of mammalian genomes (Venkatesh et al., 2000). This smaller size means that regulatory sequences are more tightly spaced in the non-coding regions of the genome.

Because of the morphological and molecular similarities in the development of the digestive system among vertebrates, we hypothesized that the Hlx gene and encoded protein sequences and expression pattern would be conserved. In this study, we have taken advantage of new genomic information to compare the gene structure and encoded protein sequences of Hlx from human and mouse with those from seven other species: chimpanzee, rat, chick, newt, Fugu, zebrafish and rainbow trout. We find that the Hlx gene and protein sequences are strongly conserved among vertebrates, including complete conservation of the DNA-binding homeodomain. Expression in chick and Fugu is highly similar to that previously reported for mouse (Lints et al., 1996). The similarities and differences observed provide insights into Hlx gene and protein structure and function.

2. Materials and methods

2.1. DNA and protein sequences

Human and mouse Hlx genomic and cDNA sequences have previously been reported (Kennedy et al., 1994; Allen et al., 1991; Bates et al., 2000). Table 1 lists accession

Table 1. Accession numbers and related information for Hlx genomic and cDNA sequences from nine vertebrate species and the Drosophila gene H2.0

Species | GenBank | Ensembl build | Chromosome | Contig | Nucleotides (gene ± 10 kb) | Ensembl peptide
Human | NM_021958 | 12.31.1 | 1 | AL445423.13.1.177941 | 216886181 – 216911839 | ENSP00000259148
Chimp | AADA01226078 | 1 | – | AADA012260781 | 7994 – 3141 (a) | –
Mouse | AF172318 | 12.3.1 | 1 | CAAA01218972.1.1.9634, CAAA01110281.1.1.1286 | 185016742 – 185042733 | ENSMUSP00000040505
Rat | – | 12.2.1 | 13 | RNOR01034700 | 95029139 – 95003612 | ENSRNOP00000003155
Fugu | – | 12.2.1 | | Chr_scaffold_316 | 137163 – 114922 | SINFRUP00000157884
Zebrafish | – | 16.2.1 | | ctg10680 | 4219 – 7478 (a) | ENSDARP00000004527
Chick | BM489797 (b) | | | | |
Newt | AF106694 (b) | | | | |
Trout | (c) | | | | |
Drosophila | NM_078764 (b) | | | | |

(a) Predicted coding region only; contig does not include 10 kb in each direction.
(b) GenBank accession number is for the cDNA sequence.
(c) Rainbow Trout Gene Index tentative consensus sequence TC29346 (see Section 2.1 for details).


information for other sequences used in this study. Additional human and mouse genomic DNA sequences were obtained from recent genomic sequence builds from the Ensembl database project (www.ensembl.org). Hlx genomic sequences for chimpanzee (Pan troglodytes), rat (Rattus norvegicus), Fugu (T. rubripes) and zebrafish (D. rerio) were identified by performing a protein vs. translated DNA BLAST search (TBLASTN) using the Hlx homeodomain (previously shown to be identical in human and mouse) against the relevant Ensembl database. For chimpanzee, we identified a 12.3 kb contig containing 4.3 kb of 5′ sequence, the entire Hlx-coding region and three introns, and 3.1 kb of 3′ sequence in the Ensembl chimpanzee genome preliminary data. The vertebrate Hlx homeodomain sequence was also used to identify a tentative consensus sequence (TC29346), representing the Hlx cDNA sequence for rainbow trout (Oncorhynchus mykiss), in the Rainbow Trout Gene Index (version 3.0, January 6, 2004) of The Institute for Genomic Research (TIGR; http://www.tigr.org/). Hlx cDNA sequence from Spanish ribbed newt (Pleurodeles waltl) was obtained from GenBank (accession number AF106694).

Chick Hlx cDNA sequence was obtained from chick expressed sequence tag (EST) plasmids. The Hlx protein homeodomain sequence was used to BLAST search the Chick EST Database from the University of Delaware (http://www.chickest.udel.edu) and clone pgf1n.pk009.j12 (GenBank accession number BI066827) was identified. The sequence of this truncated clone was then used to identify clone pgm2n.pk002.p23 (GenBank accession number BM489797). The inserts of both chick EST clones were sequenced by the University of Cincinnati DNA Sequencing Core on an ABI 3100 Capillary Electrophoresis Sequencing System. All sequences were confirmed in both directions.

Because clone pgm2n.pk002.p23 contained an additional 430 bp 5′ to the sequence of clone pgf1n.pk009.j12 (including a large open reading frame), the sequence of clone pgm2n.pk002.p23 and the translation of its open reading frame were taken as those of the chick cDNA and protein, respectively. Comparison of the sequences of the two clones demonstrated several differences in the 3′ untranslated regions: two single nucleotide polymorphisms and a 10 bp insertion in clone pgm2n.pk002.p23. In addition, a single nucleotide polymorphism was found in the protein-coding region, resulting in a threonine residue at position 284 in the translation of clone pgm2n.pk002.p23 vs. a proline residue suggested by clone pgf1n.pk009.j12. These differences may be due to chicken strain differences between the EST libraries; the coding polymorphism affects an amino acid residue in a region of the Hlx protein that is not tightly conserved.

2.2. Sequence alignments

Pairwise comparisons of inferred Hlx protein sequences were performed using the Lipman–Pearson algorithm (Lipman and Pearson, 1985) as implemented by LaserGene


Megalign 5.5 for Macintosh OS X (DNASTAR, Inc., Madison, Wisconsin), with these parameters: Ktuple = 2, gap penalty = 4 and gap length penalty = 12, to obtain similarity indices, defined as the number of matching residues divided by the sum of all residues and gap characters, multiplied by 100 to give the index as a percentage. Consensus Hlx protein sequence among the nine species was obtained using the Clustal V algorithm (Higgins and Sharp, 1989) as implemented in LaserGene Megalign, with these parameters: gap penalty = 10 and gap length penalty = 10. Phylogenetic relationships among these sequences were analyzed by alignment of the core aligned region (corresponding to consensus amino acids 173 to 427 in Fig. 2B) using Megalign. Alignments of genomic DNA sequences were performed using MultiPipMaker (http://bio.cse.psu.edu/pipmaker/; Schwartz et al., 2000). Alignments of conserved 5′ genomic sequences were performed using the Clustal V algorithm (Higgins and Sharp, 1989) as implemented by LaserGene Megalign.

2.3. Genomic DNA and protein sequence analysis

Conserved cis-regulatory elements in 5′ genomic sequences were identified using MatInspector (http://www.genomatix.de/matinspector; Genomatix Software GmbH, München, Germany) using Matrix Family Library Version 3.1.1. Identification of conserved protein sequence motifs was performed using the Motifs program of SeqWeb, version 2 (Accelrys, San Diego, California), which makes use of the PROSITE Dictionary of Protein Sites and Patterns (Expert Protein Analysis System, Swiss Institute of Bioinformatics, http://us.expasy.org/prosite/).

2.4. In situ hybridization

Developmental expression of Hlx was assayed in chick embryos by whole mount in situ hybridization (Wilkinson and Nieto, 1993). Antisense and sense riboprobes were synthesized by in vitro transcription from linearized pgf1n.pk009.j12, whose insert is in the cloning vector pSPORT1.
Fertilized White Leghorn chicken eggs (Charles River, Wilmington, MA) were incubated at 38 °C until Hamburger and Hamilton stages 14–20 (Hamburger and Hamilton, 1951). Chick embryos were isolated and fixed in 4% paraformaldehyde, dehydrated in methanol, rehydrated, treated with 6% hydrogen peroxide, digested with 10 mg/ml proteinase K for 5 min and postfixed in 4% paraformaldehyde. Embryos were hybridized in buffer containing 1 µg/ml probe overnight at 70 °C, washed, incubated overnight with an anti-digoxigenin antibody (1:1000) and developed with BM purple (Roche Applied Science, Indianapolis, IN). Embryos were then dehydrated in ethanol and xylene and embedded in paraffin. Paraffin sections were counterstained with nuclear fast red (Vector Laboratories, Burlingame, CA).
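
The similarity index defined in Section 2.2 (matching residues divided by all residues and gap characters, times 100) can be illustrated with a short Python sketch. This is not the Megalign implementation: it assumes the two sequences have already been aligned, and it takes the denominator to be the number of alignment columns, which is one reading of the definition. The function name and the toy sequences are hypothetical.

```python
def similarity_index(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumption: the inputs are two rows of an existing pairwise alignment,
    padded with the gap character so that they have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as a match only if the residues agree and neither is a gap.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    # Denominator: every column, i.e. residues plus gap characters.
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy alignment, not Hlx sequence data.
print(round(similarity_index("ACDE-G", "ACDEFG"), 1))
```

Under this reading a gap lowers the index in the same way as a mismatch, which is why sequences of very different length, such as the short chick protein, score lower even where the shared region agrees well.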


2.5. Reverse transcription–polymerase chain reaction (RT-PCR) analysis

Total RNA from various Fugu tissues was extracted using Trizol reagent (Gibco BRL) following the manufacturer’s protocol. The purified total RNA was reverse transcribed into cDNA using the AMV reverse transcriptase first-strand cDNA synthesis kit (Gibco BRL). The following primers complementary to the protein-coding region of Fugu Hlx were used in PCR: fHlxF1: 5′-GATACCGTGAAGAAACCGTCG-3′ (complementary to the first exon) and fHlxR1: 5′-GGCCCTGGGAAGGTGTCCTG-3′ (complementary to nucleotides in the second and third exon, bridging the second intron), to give a 557 bp product. The PCR cycles comprised a denaturation step at 95 °C for 2 min, followed by 25 or 35 cycles of 95 °C for 30 s, 60 °C for 30 s and 72 °C for 1 min, followed by a final elongation step at 72 °C for 5 min. Representative RT-PCR products were sequenced directly on an Applied Biosystems 3700 DNA Analyzer to confirm their identity and exon–intron boundaries. A fragment of actin cDNA was amplified by PCR (35 cycles) as an internal control for the quality and quantity of cDNA using the primers ACTF (5′-AACTGGGAYGACATGGAGAA-3′) and ACTR (5′-TTGAAGGTCTCAAACATGAT-3′).

Fig. 1. Organization of the Hlx genes of human, chimpanzee, mouse, rat, Fugu and zebrafish. Genes are shown to scale (bar = 1 kb). Protein-coding regions are shown by the wide dark blue bars, with the homeoboxes (which in all six genes are interrupted by an intron) shown in light blue. Known 5′ and 3′ untranslated sequences (human and mouse) are shown in shaded blue. The human and mouse Hlx genes have heterogeneous transcription start sites, with two known start sites in human (Kennedy et al., 1994) and multiple start sites in mouse (Bates et al., 2000). The green box and green asterisk show the positions of a conserved 5′ sequence element and a conserved inverted repeat containing two CCAAT boxes, respectively (see Section 3.5 and Fig. 5). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

3. Results

3.1. Identification of additional vertebrate Hlx genes: conservation of gene organization

The mouse and human Hlx cDNAs and genes have been described previously (Allen et al., 1991; Bates et al., 2000; Kennedy et al., 1994). We used the conserved mouse and human Hlx homeodomain to search for Hlx orthologues in the genome databases available in Ensembl using the BLAST algorithm. These searches yielded highly significant matches for chimpanzee, rat, Fugu and zebrafish, all vertebrate species, with genes encoding proteins having identical homeodomains (Fig. 1). More distantly related genes, more similar to the H2.0 gene of D. melanogaster, were found in searches of genomic data for the fruitfly Drosophila pseudoobscura, the mosquito A. gambiae and the honeybee Apis mellifera (Fig. 2A). No matches were found for C. elegans. The genomic sequences for chimpanzee, rat, Fugu and zebrafish in the region of the matches were obtained for further analysis.

The human and mouse Hlx genes have four exons and three introns, all in the protein-coding region, including one in the homeobox (Kennedy et al., 1994; Bates et al., 2000). We found that the chimpanzee, rat, Fugu and zebrafish genes also have three introns within the protein-coding region (Fig. 1), and these are located identically to those of the mouse and human genes, including significant homology of the presumed intron–exon borders (Fig. 2B and Table 2). The introns of the two fish Hlx genes are significantly smaller than those of the four mammalian species (Fig. 1 and Table 2).

3.2. Conservation of the Hlx protein

Identification of the Hlx gene sequences from chimpanzee, rat, Fugu and zebrafish allowed comparisons of the

Fig. 2. Conservation of the Hlx protein. (A) Inferred sequences of the vertebrate Hlx homeodomains (identical in all nine species) and of the related H2.0 homeodomain protein of Drosophila melanogaster (GenBank accession number NP_523488), and its orthologues in D. pseudoobscura, mosquito and honeybee. Amino acid residues identical to the consensus homeodomain sequence (Bürglin, 1994) are shown in bold red text; others that are among the most frequently observed at the given position are in black. Residues infrequently encountered at the position are shown in gray. The tyrosine residue that is part of a consensus tyrosine phosphorylation site is marked with an asterisk. The locations of the homeodomain helices are indicated. (B) Inferred Hlx protein sequences from human, chimpanzee, mouse, rat, chick, newt, Fugu, zebrafish (Danio) and rainbow trout were aligned as described in Section 2.2. Colors in the bar indicate amount of conservation at each position: red, identical in 9 of 9 sequences; orange, 8 of 9; green, 6–7 of 9; light blue, 4–5 of 9; dark blue, 3 of 9; white, ≤ 2 of 9. Locations of the three introns in human, chimp, mouse, rat, Fugu and zebrafish are designated by black triangles. The completely conserved homeodomain sequences are boxed. The location of a tyrosine residue that is part of a consensus tyrosine phosphorylation motif is indicated by an asterisk. (C) Phylogenetic relationships of Hlx protein sequences (using the core aligned region, corresponding to consensus amino acids 173 to 427, as described in Section 2.2) from the nine species. The inferred number of nucleotide substitutions is shown on the x-axis. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Table 2. Comparison of intron–exon boundaries of the Hlx genes from human, chimpanzee, mouse, rat, Fugu and zebrafish (Danio)

Intron | Species | Length (bp) | 5′ boundary | 3′ boundary
1 | Human | 744 | TGAGAGgtaggtcttgggcgg... | ttcccttgtccccagATCTCA
1 | Chimp | 744 | TGAGAGgtaggtcttgggcgg... | ttcccttgtccccagATCTCA
1 | Mouse | 687 | TGAGAGgtacgtatcgctgga... | tttccccctacccagATCTCA
1 | Rat | 700 | TGAGAGgtacgtattgctgga... | tttcgccctgcccagATCTCA
1 | Fugu | 305 | TGAGAGgtgagagtgggtttt... | ctcgctctgcaacagATCTCA
1 | Danio | 546 | TGAGAGgtaagacgctaattt... | gtttttctgtaacagATCTCA
2 | Human | 790 | TTCCAGgtacggaaaaactcc... | cccgttctgcggcagGTCCCT
2 | Chimp | 784 | TTCCAGgtacggaaaaactcc... | cccgttctgcggcagGTCCCT
2 | Mouse | 636 | TTCCAGgtagaaaaattcagc... | cctgctgcttggcagGTCCCT
2 | Rat | 647 | TTCCAGgtagaaaattttctg... | cctgctggttggcagGTCCCT
2 | Fugu | 130 | TCCCAGgtagagacatggctg... | ccttgtgtcttccagGGCCTT
2 | Danio | 216 | TCCCAGgtagtgtagaaactc... | gtttgaaccaaacagGGCCGT
3 | Human | 1846 | GCACAGgtaaggcagttctgg... | tgtggcgcggcgcagGTGAAG
3 | Chimp | 1843 | GCACAGgtaaggcagttctgg... | tgtggcgcggcgcagGTGAAG
3 | Mouse | 1866 | GCACAGgtaagctaactctcc... | tgctgttgcgcacagGTGAAG
3 | Rat | 1942 | GCACAGgtactctctccgcta... | tgatgtggcgcacagGTGAAG
3 | Fugu | ~724 (a) | GCGCAGgtaggacatgaggag... | cacctgcgttttcagGTGAAG
3 | Danio | 1538 | GCACAGgtgagcgtaacttac... | ttattattatcttagGTGAAA

Exon sequence is in upper case; intron sequence in lower case, with an ellipsis to indicate intron sequence away from the intron–exon boundary. Bold text indicates sequence matching consensus intron–exon boundary sequences (Mount, 1982). Sequence accession numbers are listed in Table 1.
(a) Exact length cannot be concluded because of about 50 bp of undefined sequence in Ensembl genomic sequence build 12.2.1.
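
The boundaries in Table 2 follow the familiar pattern in which the intron begins with gt and ends with ag. The short Python sketch below checks this for three intron 1 entries copied from the table, relying on the table's convention that intron sequence is lower case and exon sequence upper case. The helper names are hypothetical, and the check covers only the terminal dinucleotides, not the full consensus of Mount (1982).

```python
# Intron 1 boundary sequences copied from Table 2: (5' boundary, 3' boundary).
BOUNDARIES = {
    "Human": ("TGAGAGgtaggtcttgggcgg", "ttcccttgtccccagATCTCA"),
    "Fugu": ("TGAGAGgtgagagtgggtttt", "ctcgctctgcaacagATCTCA"),
    "Danio": ("TGAGAGgtaagacgctaattt", "gtttttctgtaacagATCTCA"),
}


def intron_part(boundary: str) -> str:
    """Keep only the lower-case (intron) characters of a boundary string."""
    return "".join(ch for ch in boundary if ch.islower())


def follows_gt_ag(five_prime: str, three_prime: str) -> bool:
    """True if the intron starts with 'gt' and ends with 'ag'."""
    return (intron_part(five_prime).startswith("gt")
            and intron_part(three_prime).endswith("ag"))


for species, (donor, acceptor) in BOUNDARIES.items():
    print(species, follows_gt_ag(donor, acceptor))
```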

inferred protein sequences with the human and mouse Hlx proteins. We also obtained the chick and rainbow trout Hlx cDNA and protein sequences as described in Section 2.1. Finally, a search of GenBank also yielded the cDNA sequence of Hlx (accession number AF106694) from P. waltl, the Spanish ribbed newt, whose inferred protein sequence we included in the comparison. The chimpanzee, rat, chick, newt, Fugu, zebrafish and rainbow trout Hlx proteins are 493, 476, 294, 388, 361, 356 and 366 amino acid residues, respectively (as compared with 488 for human and 476 for mouse).

Significant amino acid sequence identity was found among the nine species (Table 3 and Fig. 2B). The Hlx sequences of the four mammalian species were most closely related and the sequences otherwise match expected phylogenetic relationships (Fig. 2C). The human and chimpanzee sequences are identical except for a five-amino acid expansion of a proline–glutamine-rich motif in chimpanzee and two substitutions at non-conserved residues. The mouse and rat sequences showed 96.8% sequence identity. Kennedy et al. (1994) previously noted that the largest differences between the human and mouse proteins are at the carboxyl terminus. Rat Hlx follows this pattern, with all but 4 of the amino acid differences compared with mouse being within 60 amino acid residues of the carboxyl terminus. The more distantly related chick, newt and fish Hlx sequences were also most divergent from the mammalian species and each other at the carboxyl terminus. The inferred chick Hlx amino acid sequence begins with an MYTA motif that is also present in the three fish Hlx proteins, but it is otherwise much shorter than the other 8 vertebrate Hlx proteins at the amino-terminal end (Fig. 2B), with the loss of several regions that are highly conserved among the rest of the vertebrate Hlx proteins. However, in all nine species, the carboxyl terminal region is serine–threonine-rich and contains an acidic segment (Allen et al., 1991).

Notably, we found a 93-amino acid region of sequence identity among the nine species (with the exception of one conservative substitution each in chick, newt and zebrafish). This region includes the 60-amino acid homeodomain, which is identical among the nine species. The Hlx homeodomain most closely matches a consensus homeodomain (Bürglin, 1994) in helices 3 and 4 (Fig. 2A), the region that binds to the major groove of DNA. The Hlx homeodomain contains a consensus tyrosine phosphorylation site [(RK)XX(DE)XXXY; Cooper et al., 1984] (Fig. 2). Another feature, previously noted to be conserved between the human and mouse Hlx proteins, is a larger proline-rich region that contains a smaller glutamine-rich segment. The proline-rich region is also present in chimpanzee, rat, newt, Fugu, zebrafish and trout, but not chick, and the glutamine-rich segment (12 of 13 residues in human, 9 of 13 residues in mouse and rat) is only present in the mammalian species.

Table 3. Comparison of homology of Hlx protein sequences from various species

Species | Human | Chimp | Mouse | Rat | Chick | Newt | Fugu | Danio | Trout | Dros. (a)
Human | 100 | | | | | | | | |
Chimp | 98.2 | 100 | | | | | | | |
Mouse | 85.4 | 84.6 | 100 | | | | | | |
Rat | 86.0 | 85.2 | 96.8 | 100 | | | | | |
Chick | 74.5 | 74.5 | 74.0 | 73.6 | 100 | | | | |
Newt | 59.3 | 61.7 | 71.7 | 72.1 | 69.9 | 100 | | | |
Fugu | 60.3 | 52.8 | 59.0 | 57.7 | 56.9 | 52.1 | 100 | | |
Danio | 56.1 | 58.7 | 49.5 | 49.2 | 58.9 | 52.4 | 65.3 | 100 | |
Trout | 65.3 | 65.3 | 60.7 | 60.7 | 60.0 | 53.4 | 76.1 | 73.3 | 100 |
Dros. (a) | 49.1 | 49.2 | 50.8 | 50.8 | 61.9 | 47.2 | 45.0 | 30.1 | 48.0 | 100

Cross-comparisons of the full protein sequences for each species were performed as described in Section 2.2 using protein sequences from the indicated species that were inferred from the genomic or cDNA sequences (for accession numbers, see Table 1). Listed are the similarity indices for each comparison.
(a) H2.0 protein from Drosophila melanogaster (Barad et al., 1988).

3.3. Developmental expression of chick Hlx

Conservation of the expression of orthologous genes is suggestive of conserved function. To determine whether the


temporospatial expression of Hlx is conserved between mouse and chick, the expression of Hlx in developing chick embryos was determined by whole mount in situ hybridization. Expression of Hlx was seen as early as Hamburger and Hamilton stage 14 (Hamburger and Hamilton, 1951), approximately equivalent to E9.0 in mouse (Fig. 3A). At this stage, Hlx was expressed broadly in the gut prior to formation of a gut tube. Between stages 17 and 18 (approximately E9.5–10.0

Fig. 3. Expression of Hlx in chick embryos. Hamburger and Hamilton stages 14–20 (Hamburger and Hamilton, 1951) chick embryos were analyzed by whole mount in situ hybridization with a chick Hlx riboprobe. In panel A, Hlx expression is shown at chick embryonic stages 14, 17 and 18, approximately equivalent to mouse embryonic days (E) 9.0, 9.5 and 10.0, respectively. The lines indicate the planes of sections that are shown in D, D′ and F. In panels B and C, a stage 20 embryo (approximately equivalent to mouse E10.5; panel B) and a dissected gut tube (panel C) show the expression of Hlx in discrete organ domains along the anterior–posterior axis. The line in panel B indicates the plane of section in E. In panels D and E, transverse sections from embryos show that Hlx expression is in the intestinal mesenchyme and not in the epithelium. Panel F shows Hlx expression in the pharynx. Abbreviations: c, cardiac tube; p1, first pharyngeal arch; lb, forelimb bud; hg, hindgut; lu, lung; li, liver; g, gizzard; s, stomach; d, duodenum; e, epithelium; m, mesenchyme.


in mouse), the gut tube forms from anterior to posterior and Hlx expression increases in the posterior end (Fig. 3B). By stage 20, the gut tube is formed and Hlx expression in the gut is maintained (Fig. 3C). In all stages analyzed, the anterior limit of Hlx expression in the gut was adjacent to the developing cardiac tube, and expression extended to the hindgut. Hlx expression was also evident in the pharyngeal arches. To investigate whether mesenchymal Hlx expression is conserved between mouse and chick, we analyzed sectioned embryos. As in mouse, Hlx expression is restricted to the mesenchymal component of the gut (Fig. 3D–E). Expression in the pharyngeal arches is also evident in sections (Fig. 3F).

3.4. Expression of Hlx in Fugu

As an initial step toward understanding the function of Hlx in Fugu, we used RT-PCR analysis to determine the expression pattern of Hlx in adult Fugu tissues. We found that, similar to the adult mouse expression pattern determined by RNase protection assay (Allen et al., 1991), Fugu Hlx is expressed in adult heart, intestine and liver, and at lower levels in gill (analogous to lung) and muscle (Fig. 4). However, differences from mouse expression were observed. For example, among the tissues assayed, the expression of Hlx was relatively greater in Fugu brain, kidney, stomach and testis, whereas mouse Hlx was relatively more highly expressed in spleen (Allen et al., 1991). These differences may be due to differences in the techniques employed (RT-PCR vs. RNase protection assay).

3.5. Regulatory regions controlling Hlx expression

The similarities in protein sequence and tissue-specific and developmental expression of Hlx in mouse, chick and Fugu suggested that the genes in these species might contain similar regulatory regions. Although chick Hlx genomic sequence is not available, we were able to compare the genomic sequences of six species (Fig. 1).
Conservation of non-coding sequences may indicate regulatory regions that are functionally significant (Ureta-Vidal et al., 2003). The

Fig. 4. Expression of Hlx in adult Fugu. Expression of Fugu Hlx and β-actin genes (control) was assayed by RT-PCR in the indicated tissues obtained from adult Fugu, as described in Section 2.5. PCR for Hlx cDNA was run for 25 or 35 cycles as indicated.

human and mouse Hlx genes have multiple transcription start sites and TATA-less promoters (Kennedy et al., 1994; Bates et al., 2000). Like in human and mouse, the chimpanzee and rat Hlx genes lack a TATA box; in both, the closest TATA motifs are > 1 kb 5V to the protein-coding region. Analysis of the Fugu and zebrafish Hlx genes showed that the nearest TATA boxes are greater than 360 bp 5V to the translation start sites, suggesting that these Hlx genes either have very long 5V untranslated regions (UTR) (human and mouse have 5V UTRs of 300 – 600 nucleotides; Kennedy et al., 1994; Bates et al., 2000), or that, like mouse and human, they have TATA-less promoters. In previous work, we demonstrated that three consensus cis-regulatory/transcription factor binding elements that are conserved in the 5V regions of the mouse and human Hlx genes are important for promoter activity in reporter expression assays in NIH 3T3 cells, an embryonic mouse mesenchymal cell line in which Hlx is expressed (Bates et al., 2001). Two of these elements are CCAAT boxes in a novel inverted repeat. Examination of the chimpanzee, rat, Fugu and zebrafish Hlx 5V regions demonstrates that such an inverted repeat with two CCAAT boxes is conserved in these species as well (Fig. 4A). Consistent with the compact nature of the Fugu genome, this feature is located much closer to the start of the protein-coding region (186 bp) than in the mammalian species (495 – 518 bp); this feature is relatively closer in zebrafish as well (Figs. 1A and 5). The third regulatory element conserved between mouse and human, an AP-2 site that is located about 30 bp 5V to the inverted repeat, is conserved in chimpanzee. In rat, the closest AP-2 consensus sites are 60 and 80 bp 5V to the inverted repeat, whereas in Fugu the only consensus AP-2 site within 700 bp of the translation start site is within the inverted repeat (Fig. 4A), near the 5V end (5V-CCCATCCC3V). 
In zebrafish, there is no consensus AP-2 site within 1 kb of the translation start site. These results suggest that there are both conserved and mammalian-specific regulatory mechanisms for Hlx gene expression.

The mouse and human Hlx 5′ flanking regions are very highly conserved (Bates et al., 2000), and the chimpanzee and rat Hlx 5′ regions are very similar as well. The Fugu and zebrafish Hlx 5′ regions are overall much more divergent. However, comparison of the six genomic sequences demonstrates a 283–295 bp region that is highly conserved among the six species (Figs. 1 and 5B), with greater than 95% sequence identity among the mammalian species, 79% sequence identity between the two fish species and 63% sequence identity among all six species. This region is located about 2.9 kb 5′ to the protein-coding region in the four mammalian species but much closer in the two fish species (585 bp 5′ in Fugu, 820 bp in zebrafish). BLAST searches of these sequences against the genomes of the species studied did not identify matches, suggesting that these conserved regions are not merely repeat sequences. Analysis of the nucleotide sequences of this conserved region identified multiple


Fig. 5. Conserved elements in 5′ Hlx genomic sequences. (A) Conservation of an inverted repeat motif containing CCAAT boxes on opposite strands. Positions of these motifs relative to the protein-coding region in each species are indicated. The CCAAT boxes (or reverse complement) are shown in blue, conserved nucleotides that form part of the inverted repeat are in red and non-conserved nucleotides that form part of the inverted repeat in a particular species are shown in green. (B) Conservation of an approximately 290 bp element in the Hlx 5′ region. Human, chimpanzee, mouse, rat, Fugu and zebrafish (Danio) genomic sequences are shown along with the consensus sequence. Colors in the bar indicate amount of conservation at each position: red, identical in 6 of 6 sequences; orange, 5 of 6; green, 4 of 6; light blue, 3 of 6; white, ≤2 of 6. Conserved consensus cis-regulatory elements (Table 4) are indicated by the black and grey bars above the consensus sequence, with black indicating the site of the core sequence for each element and grey indicating the remainder of the consensus sequence. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

consensus cis-regulatory elements that are conserved in all six species (Table 4).
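The per-position conservation classes of Fig. 5B and percent identities of the kind quoted above can be illustrated with a short Python sketch. It is not the procedure used in this study: the function names, the toy six-sequence alignment and the definition of identity (columns in which every sequence carries the same base, divided by alignment length) are assumptions made purely for illustration.

```python
from collections import Counter

# Colour classes from the Fig. 5B legend, keyed by how many of the six
# aligned sequences share the most common base at a position.
COLOUR_BY_COUNT = {6: "red", 5: "orange", 4: "green", 3: "light blue"}


def column_conservation(alignment):
    """Return, for each column, the count of the most common non-gap base."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    counts = []
    for column in zip(*alignment):
        bases = [base for base in column if base != "-"]
        counts.append(Counter(bases).most_common(1)[0][1] if bases else 0)
    return counts


def percent_identity(alignment):
    """Percentage of columns in which all sequences carry the same base."""
    counts = column_conservation(alignment)
    identical = sum(1 for count in counts if count == len(alignment))
    return 100.0 * identical / len(counts)


def colour_class(count):
    """Map a per-column count to the Fig. 5B colour (white for 2 or fewer)."""
    return COLOUR_BY_COUNT.get(count, "white")


# Hypothetical toy alignment (NOT Hlx sequence data), six rows of eight columns.
toy_alignment = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGTACGA",
    "ACGTACCA",
    "ACGAATCA",
    "AC-ATTCA",
]

if __name__ == "__main__":
    for position, count in enumerate(column_conservation(toy_alignment), start=1):
        print(position, count, colour_class(count))
    print(f"identity: {percent_identity(toy_alignment):.1f}%")
```

Other definitions of identity (for example, pairwise averages or gap-excluded denominators) give different percentages, so the values reported in the text should not be expected to follow from this particular definition.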

4. Discussion

Homeobox genes regulate key processes in the development of the digestive system. In mouse development, the clustered Hox genes are expressed in specific patterns along the length of the gastrointestinal tract (Sekimoto et al., 1998), similar to their expression and role in axial patterning in the whole embryo. Many other homeobox genes are expressed in the developing digestive system and the functions of a number of these have been studied in mice (Beck et al., 2000). However, the specific transcription and signaling pathways through which homeobox genes regulate development of the digestive system remain to be elucidated.

The conservation among species of genomic sequences and of molecular mechanisms of development allows the use and comparison of animal models to provide insight into human genomics and development as well as disorders thereof. The pufferfish, Fugu, has provided a powerful model system for such studies because of the compact size of its genome (Venkatesh et al., 2000). It is estimated that the Fugu genome contains about 31,000 genes (Aparicio et al., 2002), similar to the number in human and mouse. However, these genes are contained within a genome of approximately 365 megabases (Mb) (Aparicio et al., 2002), compared to approximately 3000 Mb for the mammalian


Table 4
Conserved consensus cis-regulatory elements in a 5′ putative enhancer that is conserved in six vertebrate Hlx genes

                                          Location
Site                              Strand  Human           Chimp           Mouse           Rat             Fugu          Zebrafish
Location of the conserved region          −3183 to −2895  −3180 to −2893  −3235 to −2947  −3286 to −2998  −883 to −589  −1103 to −821
Ptx1ᵃ                             +       −3108           −3105           −3160           −3211           −820          −1040
Octamer binding siteᵃ             −       −3105           −3102           −3157           −3208           −817          −1037
Winged-helix/ZF5/Whnᵃ             −       −3105           −3100           −3157           −3208           −817          −1037
AP-1ᵃ                             −       −3103           −3100           −3155           −3206           −815          −1035
CREBᵃ                             −       −3103           −3102           −3155           −3206           −815          −1035
GATA-1                            +       −3078           −3075           −3130           −3181           −791          −1011
Serum response factor             +       −3064           −3061           −3116           −3167           −777          −997
Avian C-type LTR TATA box         −       −2991           −2988           −3043           −3094           −685          −914

Locations of core sequences for each conserved site in the Hlx genes (identified as described in Section 2.3) are listed by the 5′-most base of the consensus core sequence, relative to the translation start site, with 1 being the start of the protein-coding region. This numbering was used because there are multiple transcription start sites for Hlx in mouse (Bates et al., 2000) and human (Kennedy et al., 1994), making numbering relative to the transcription start arbitrary. These elements are also indicated in Fig. 5B. Note that the consensus sequences for several of the sites overlap; these are indicated by a superscript "a".
ᵃ Overlapping consensus sequences.

species. Many important regulatory/developmental genes are conserved between humans and Fugu, including the members of the Hox (Aparicio et al., 1997) and Dlx (Ghanem et al., 2003) homeobox transcription factor gene families. Comparisons of genomic sequences of Fugu and mammalian species have been used to identify conserved gene regulatory elements (Aparicio et al., 1995). However, approximately 25% of predicted human proteins do not have a Fugu orthologue (Aparicio et al., 2002).

Our analyses of current genomic sequence databases identified Hlx orthologues in human, chimpanzee, rat, Fugu and zebrafish as well as the mouse gene described by us previously (Bates et al., 2000). There is a related gene (H2.0) in Drosophila (Barad et al., 1988, 1991) that has orthologues in other insects (Fig. 2A) but not in C. elegans or lower eukaryotes. We found that the six vertebrate genes are highly conserved, including 5′ regulatory sequence, protein-coding sequence and intron–exon structure. In addition, we found that the chick and Fugu Hlx genes are expressed in a tissue-specific pattern that is similar to that seen in mouse.

Previously, zebrafish genes called hlx1, hlx2 and hlx3 have been reported (Seo et al., 1999). However, these genes encode homeodomain sequences that are more closely related to mouse Dbx1 and Dbx2. Thus, we propose that the zebrafish gene that we describe here should be designated hlx, and the previously described genes should be renamed dbx1, dbx2 and dbx3, respectively.

In the present work, we have compared the protein sequences for Hlx between nine vertebrate species, including four mammalian species, one bird, one amphibian and three fish. We found high sequence conservation of the Hlx protein in the nine species, including complete conservation of the DNA-binding homeodomain (Fig. 1). Interestingly, the region of high sequence identity extends for 27 amino acid residues amino-terminal to the homeodomain.
The encoded proteins share many but not all features previously seen in comparisons between human and mouse Hlx. The mammalian Hlx proteins share a glutamine-rich region, a feature that is observed in other transcription factor families,

including the Sp (Suske, 1999) and Groucho/TLE (Chen and Courey, 2000) families. Other features are observed in non-mammalian species as well, including a proline-rich region, a serine–threonine-rich region and a consensus site for tyrosine phosphorylation within the homeodomain. Interestingly, proline-rich domains are important for the repression activity of members of the Groucho/TLE family (Chen and Courey, 2000). The greatest sequence divergence among the nine proteins is at the carboxyl terminus, even when comparing the primate and rodent species. However, in all nine species, this region is serine–threonine-rich and contains an acidic segment.

These findings illuminate regions and features of the Hlx protein that likely are functionally important. The presence of a conserved consensus site for tyrosine phosphorylation within its homeodomain suggests that the Hlx protein function may be regulated by posttranslational modification. Regulation of homeodomain protein function by phosphorylation has been shown previously (Eklund et al., 2000). For example, phosphorylation of a tyrosine residue in the homeodomain of the human HOXA10 protein and the interaction of the phosphotyrosine with residues outside the homeodomain result in decreased DNA binding affinity and decreased repression of target gene expression (Eklund et al., 2000).

We found conservation not only of Hlx genomic and amino acid sequences, but also of its expression. Hlx expression in the developing chick (Fig. 3) is very similar to that observed in mouse development (Lints et al., 1996). In particular, expression was observed in the midgut, hindgut and pharyngeal arches (branchial arches in mouse). Hlx is expressed in the splanchnic mesoderm of mouse embryos as early as E9.5 and, by E10.5, it is expressed in gut mesenchyme only caudal to the foregut–midgut junction (Lints et al., 1996).
We found Hlx expression in developing chick at a relatively earlier stage, with mesenchymal expression observed along the length of the intestine during formation of the gut tube (stage 14). This difference could be a function of broader developmental differences between


mouse and chick. In the chick, the anterior intestinal portal, equivalent to the foregut in mouse, forms a day before the posterior intestinal portal (hindgut). In mouse, the foregut and hindgut tubes both form between E8 and E9. This suggests that Hlx may play a role in chick gut tube closure.

In Fugu, the adult tissue specificity of Hlx expression as determined by RT-PCR (Fig. 4) overlapped that of mouse, as determined by RNase protection assay (Allen et al., 1991). In particular, Hlx was expressed in liver and intestine in both species. It is likely that the mechanisms governing expression in the digestive system in both species are similar.

If there is conservation of the temporospatial expression of Hlx, it is likely mediated at the DNA level via conserved sequences. We found an approximately 290 bp sequence in the 5′ regions of the mammalian and fish Hlx genes that is highly conserved (Fig. 5B); it is possible that this segment directs expression in the Hlx-expressing tissues in common between mammals and fish. Many of the consensus putative cis-regulatory elements found in this upstream conserved region may be recognized by widely expressed transcription factor families. It is likely that these elements are used combinatorially to regulate gene expression. For example, this region contains adjacent/overlapping AP-1 and octamer binding sites. Such a motif has previously been shown to regulate the mouse interleukin-2 promoter in T-lymphocytes (Pfeuffer et al., 1994). Alternatively, these conserved elements may have other functional roles.

We also found conservation of more proximal regulatory elements. We have previously shown the importance for reporter gene expression of a conserved inverted repeat containing CCAAT boxes on opposite strands (Bates et al., 2001).
This CCAAT box-containing inverted repeat is present not only in the mammalian Hlx genes but is also conserved in Fugu and zebrafish, suggesting that it is an important sequence regulating Hlx gene expression. Our work suggested that the CCAAT boxes themselves are more important than the inverted repeat structure (Bates et al., 2001). However, in all species examined, the CCAAT boxes are still present in the context of an inverted repeat (Fig. 5A), suggesting that the inverted repeat structure may also be important for function. Notably, the gap between the CCAAT boxes is 12 bp (11 bp in zebrafish), meaning that transcription factors binding to them would not be exactly on the same side of the DNA double helix, which has a periodicity of 10.4 base pairs per turn (Wang, 1979).

In summary, we have identified and compared the gene and protein sequences for the Hlx homeobox transcription factor from nine vertebrate species. Comparisons of these sequences demonstrate significant conservation, suggesting that the mechanisms regulating Hlx expression and function are highly conserved in primates, rodents, birds, amphibians and fish. The similarities and differences observed provide insights into Hlx gene and protein structure and function. Further characterization of the Hlx genes and proteins will provide insight into the molecular regulation of the


development of the digestive system and other organ systems.
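The helical-phasing remark about the CCAAT boxes can be made concrete with a small Python sketch. It converts a spacing in base pairs into the rotation around the helix axis, using the 10.4 bp per turn helical repeat cited above (Wang, 1979). The function name and the choice to treat the quoted gap as the relevant spacing are assumptions for illustration; a centre-to-centre distance between the boxes would give different angles.

```python
HELICAL_REPEAT_BP = 10.4  # base pairs per helical turn in solution (Wang, 1979)


def angular_offset(spacing_bp, repeat_bp=HELICAL_REPEAT_BP):
    """Rotation in degrees, in the range (-180, 180], between two DNA sites.

    0 means the sites face the same side of the helix; values near 180 or
    -180 mean they lie on opposite faces.
    """
    angle = (spacing_bp / repeat_bp * 360.0) % 360.0
    return angle - 360.0 if angle > 180.0 else angle


if __name__ == "__main__":
    # Spacings quoted in the text: 12 bp (11 bp in zebrafish).
    for spacing in (12, 11):
        print(f"{spacing} bp -> {angular_offset(spacing):.1f} degrees")
```

Under these assumptions a 12 bp spacing corresponds to a rotation of roughly 55 degrees and an 11 bp spacing to roughly 21 degrees, that is, close to but not exactly on the same face of the helix.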

Acknowledgements

The authors would like to thank Steve Potter, Rashmi Hegde and Jun Ma for helpful discussions, Shiv Kumar Viswanathan for assistance with the chick Hlx cDNA, and Bob Opoka, Dana Dunagan and Margaret Betzel for technical assistance. This work was supported by National Institutes of Health K08 DK02791 and R01 DK61219 (to M.D.B.) and by R24 DK064403 (Cincinnati Digestive Diseases Research Development Center: Center for Growth and Development).

References

Allen, J.D., et al., 1991. Novel murine homeo box gene on chromosome 1 expressed in specific hematopoietic lineages and during embryogenesis. Genes Dev. 5, 509–520.
Aparicio, S., et al., 1995. Detecting conserved regulatory elements with the model genome of the Japanese puffer fish Fugu rubripes. Proc. Natl. Acad. Sci. U.S.A. 92, 1684–1688.
Aparicio, S., et al., 1997. Organization of the Fugu rubripes Hox clusters: evidence for continuing evolution of vertebrate Hox complexes. Nat. Genet. 16, 79–83.
Aparicio, S., et al., 2002. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297, 1301–1310.
Barad, M., Jack, T., Chadwick, R., McGinnis, W., 1988. A novel, tissue-specific, Drosophila homeobox gene. EMBO J. 7, 2151–2161.
Barad, M., Erlebacher, A., McGinnis, W., 1991. Despite expression in embryonic visceral mesoderm, H2.0 is not essential for Drosophila visceral muscle morphogenesis. Dev. Genet. 12, 206–211.
Bates, M.D., Schatzman, L.C., Lints, T., Hamlin, P.E., Harvey, R.P., Potter, S.S., 2000. Structural and functional characterization of the mouse Hlx homeobox gene. Mamm. Genome 11, 836–842.
Bates, M.D., Schatzman, L.C., Harvey, R.P., Potter, S.S., 2001. Two CCAAT boxes in a novel inverted repeat motif are required for Hlx homeobox gene expression. Biochim. Biophys. Acta 1519, 96–105.
Beck, F., Tata, F., Chawengsaksophak, K., 2000. Homeobox genes and gut development. BioEssays 22, 431–441.
Bürglin, T.R., 1994. A comprehensive classification of homeobox genes. In: Duboule, D. (Ed.), Guidebook to the Homeobox Genes. Oxford University Press, Oxford, pp. 25–71.
Chen, G., Courey, A.J., 2000. Groucho/TLE family proteins and transcriptional repression. Gene 249, 1–16.
Cooper, J.A., Esch, F.S., Taylor, S.S., Hunter, T., 1984. Phosphorylation sites in enolase and lactate dehydrogenase utilized by tyrosine protein kinases in vivo and in vitro. J. Biol. Chem. 259, 7835–7841.
Eklund, E.A., Jalava, A., Kakar, R., 2000. Tyrosine phosphorylation of HoxA10 decreases DNA binding and transcriptional repression during interferon γ-induced differentiation of myeloid leukemia cell lines. J. Biol. Chem. 275, 20117–20126.
Ghanem, N., et al., 2003. Regulatory roles of conserved intergenic domains in vertebrate Dlx bigene clusters. Genome Res. 13, 533–543.
Hamburger, V., Hamilton, H.L., 1951. A series of normal stages in the development of the chick embryo. J. Morphol. 88, 49–92.
Hentsch, B., et al., 1996. Hlx homeo box gene is essential for an inductive tissue interaction that drives expansion of embryonic liver and gut. Genes Dev. 10, 70–79.


Higgins, D.G., Sharp, P.M., 1989. Fast and sensitive multiple sequence alignments on a microcomputer. Comput. Appl. Biosci. 5, 151–153.
Kennedy, M.A., Rayner, J.C., Morris, C.M., 1994. Genomic structure, promoter sequence, and revised translation of human homeobox gene HLX1. Genomics 22, 348–355.
Lints, T.J., Hartley, L., Parsons, L.M., Harvey, R.P., 1996. Mesoderm-specific expression of the divergent homeobox gene Hlx during murine embryogenesis. Dev. Dyn. 205, 457–470.
Lipman, D.J., Pearson, W.R., 1985. Rapid and sensitive protein similarity searches. Science 227, 1435–1441.
Mount, S.M., 1982. A catalogue of splice junction sequences. Nucleic Acids Res. 10, 459–472.
Mullen, A.C., et al., 2002. Hlx is induced by and genetically interacts with T-bet to promote heritable TH1 gene induction. Nat. Immunol. 3, 652–658.
Pfeuffer, I., et al., 1994. Octamer factors exert a dual effect on the IL-2 and IL-4 promoters. J. Immunol. 153, 5572–5585.
Schwartz, S., et al., 2000. PipMaker: a web server for aligning two genomic DNA sequences. Genome Res. 10, 577–586.
Sekimoto, T., et al., 1998. Region-specific expression of murine Hox genes implies the Hox code-mediated patterning of the digestive tract. Genes Cells 3, 51–64.
Seo, H.C., Nilsen, F., Fjose, A., 1999. Three structurally and functionally conserved Hlx genes in zebrafish. Biochim. Biophys. Acta 1489, 323–335.
Suske, G., 1999. The Sp-family of transcription factors. Gene 238, 291–300.
Ureta-Vidal, A., Ettwiller, L., Birney, E., 2003. Comparative genomics: genome-wide analysis in metazoan eukaryotes. Nat. Rev. Genet. 4, 251–262.
Venkatesh, B., Gilligan, P., Brenner, S., 2000. Fugu: a compact vertebrate reference genome. FEBS Lett. 476, 3–7.
Wang, J.C., 1979. Helical repeat of DNA in solution. Proc. Natl. Acad. Sci. U.S.A. 76, 200–203.
Wilkinson, D.G., Nieto, M.A., 1993. Detection of messenger RNA by in situ hybridization to tissue sections and whole mounts. Methods Enzymol. 225, 361–373.
Zheng, W.P., et al., 2004. Up-regulation of Hlx in immature Th cells induces IFN-γ expression. J. Immunol. 172, 114–122.