Phylogenetic analysis of teneurin genes and comparison to the rearrangement hot spot elements of E. coli


Gene 257 (2000) 87–97 www.elsevier.com/locate/gene

Ariane D. Minet, Ruth Chiquet-Ehrismann *

Friedrich Miescher-Institute, PO Box 2543, CH-4002 Basel, Switzerland

Received 3 March 2000; received in revised form 16 August 2000; accepted 24 August 2000

Received by A. Dugaiczyk

Abstract

Teneurins are a novel family of transmembrane proteins conserved between invertebrates and vertebrates. There are two members in Drosophila, one in C. elegans and four members in mouse. Here, we describe the analysis of the genomic structure of the human teneurin-1 gene. The entire human teneurin-1 (TEN1) gene is contained in eight PAC clones representing part of the chromosomal locus Xq25. Interestingly, many X-linked mental retardation syndromes (XLMR) and non-specific mental retardation (MRX) are mapped to this region. The location of the human TEN1 together with the neuronal expression makes TEN1 a candidate gene for XLMR and MRX. We also identified large parts of the human teneurin-2 sequence on chromosome 5 and sections of human teneurin-4 at chromosomal position 11q14. Database searches resulted in the identification of ESTs encoding parts of all four human members of the teneurin family. Analysis of the genomic organization of the Drosophila ten-a gene revealed the presence of exons encoding a long form of ten-a, which can be aligned with all other teneurins known. Sequence comparison and phylogenetic trees of teneurins show that insects and vertebrates diverged before the teneurin ancestor was duplicated independently in the two phyla. This is supported by the presence of conserved intron positions between teneurin genes of man, Drosophila and C. elegans. It is therefore not possible to class any of the vertebrate teneurins with either Drosophila Ten-a or Ten-m. The C-terminal part of all teneurins harbours 26 repetitive sequence motifs termed YD-repeats. YD-repeats are most similar to the repeats encoded by the core of the rearrangement hot spot (rhs) elements of Escherichia coli. This makes the teneurin ancestor a candidate gene for the source of the rhs core acquired by horizontal gene transfer. © 2000 Elsevier Science B.V. All rights reserved.
Keywords: Carbohydrate binding; DOC4; Horizontal gene transfer; Nervous system; Neurestin; odz; ten-a; ten-m; Transmembrane protein; XLMR/MRX

Abbreviations: aa, amino acid(s); B. subtilis, Bacillus subtilis; C. difficile, Clostridium difficile; cDNA, DNA complementary to RNA; C. elegans, Caenorhabditis elegans; DOC4, downstream of chop 4; E. coli, Escherichia coli; EGF, epidermal growth factor; EST, expressed sequence tag; htg, high throughput; kb, kilobase(s); MRX, non-specific X-linked mental retardation; nt, nucleotide; odz, odd oz; ORF, open reading frame; rhs, rearrangement hot spot; Ten1, teneurin-1; Ten2, teneurin-2; Ten3, teneurin-3; Ten4, teneurin-4; TNC, tenascin-C; TNM1, tenascin-M1; WAPA, Wall Associated Protein A; XLMR, X-linked mental retardation syndromes; XLP, X-linked lymphoproliferative disease.

* Corresponding author. Present address: Friedrich Miescher-Institute, Maulbeerstrasse 66, CH-4058 Basel, Switzerland. Tel.: +41-61-697 24 94; fax: +41-61-697-39-76. E-mail address: [email protected] (R. Chiquet-Ehrismann)

0378-1119/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0378-1119(00)00388-7

1. Introduction

Several different laboratories have discovered members of a novel family of transmembrane proteins, summarized in Table 1. The first member was Ten-a of Drosophila melanogaster (Baumgartner and Chiquet-Ehrismann, 1993), which was found in a search for Drosophila homologues of tenascins and shares with this protein family the same type of EGF-like repeats. Ten-a is homologous to the N-terminal part of all other teneurins. The second member of the teneurin family, Drosophila Ten-m/Odd oz (Odz), was discovered independently in two laboratories (Baumgartner et al., 1994; Levine et al., 1994). Ten-m/odz is expressed in seven stripes during the blastoderm stage in early embryos, and mutational analysis confirmed a ‘pair-rule’ phenotype. Thus ten-m/odz was the first gene of this class of segmentation genes that does not solely encode a transcription factor but acts on the level of cellular interactions.

Mouse DOC4, the first vertebrate member of the teneurin family, was identified in a screen for proteins that were expressed in response to perturbation of protein folding in the endoplasmic reticulum (Wang et al., 1998a). Oohashi et al. (1999) screened a mouse cDNA library with the sequence of chicken ten1 (Minet et al., 1999) and identified four members of the teneurins in mouse, which were called Ten-m1 to Ten-m4, where Ten-m4 corresponds to DOC4. Ben-Zur et al. (2000) independently presented the same four mouse teneurins, calling them Odz1-4 using the same numbering scheme. The rat Ten2 homologue, neurestin, was identified based on the presence of a short sequence similar to odorant receptors (Otaki and Firestein, 1999a). Neurestin/Ten2 is expressed in the rat olfactory bulb, particularly in cells immediately below regenerating receptor neurons after chemical ablation. This result suggested an important role for neurestin/Ten2 in processes during neuronal regeneration. Further members of the teneurins have been found in zebrafish, where Ten-m3 and Ten-m4 have been isolated in a search for factors regulated by the LIM/homeodomain transcription factor Islet-3 (Mieda et al., 1999). In chicken, three teneurins, TEN1, TEN2, and TEN4, have been described (Minet et al., 1999; Rubin et al., 1999; Tucker et al., 2000). The sequence of human TEN1 was recognized in the database according to its sequence homology (Minet et al., 1999). In an independent report, Brandau et al. (1999) described the same gene as TNM1 and named it in the database entry tenascin-M1 (af100772). This name is misleading, and we would like to state clearly that neither TNM1 nor any other teneurin belongs to the protein family of tenascins, since tenascins consist of tenascin-type EGF-like repeats, fibronectin type III repeats and a fibrinogen globe, and none of the teneurins have the latter two features.

Table 1
Nomenclature of the teneurins

Suggested name | Abbreviation | Synonymous names | Organism | Authors
--- | --- | --- | --- | ---
Teneurin-1 | TEN1 | | Human | Minet and Chiquet-Ehrismann (this issue)
 | TEN1 | Tenascin-M1/TNM1 | Human | Brandau et al. (1999)
 | TEN1 | ten-1 | Chicken | Minet et al. (1999)
 | Ten1 | ten-m1 | Mouse | Oohashi et al. (1999)
 | Ten1 | odz-3 | Mouse | Ben-Zur and Wides (1999)
 | Ten1 | odz-1 | Mouse | Ben-Zur et al. (2000)
Teneurin-2 | TEN2 | | Human | Minet and Chiquet-Ehrismann (this issue)
 | TEN2 | ten-2 | Chicken | Rubin et al. (1999), Tucker et al. (2000)
 | Ten2 | ten-m2 | Mouse | Oohashi et al. (1999)
 | Ten2 | odz-1 | Mouse | Ben-Zur and Wides (1999)
 | Ten2 | odz-2 | Mouse | Ben-Zur et al. (2000)
 | Ten2 | Neurestin | Rat | Otaki and Firestein (1999a,b)
Teneurin-3 | TEN3 | | Human | Minet and Chiquet-Ehrismann (this issue)
 | Ten3 | ten-m3 | Mouse | Oohashi et al. (1999)
 | TEN3 | ten-m3 | Zebrafish | Mieda et al. (1999)
 | Ten3 | odz-2 | Mouse | Ben-Zur and Wides (1999)
 | Ten3 | odz-3 | Mouse | Ben-Zur et al. (2000)
Teneurin-4 | TEN4 | | Human | Minet and Chiquet-Ehrismann (this issue)
 | Ten4 | DOC4 | Mouse | Wang et al. (1998a)
 | Ten4 | ten-m4 | Mouse | Oohashi et al. (1999)
 | TEN4 | ten-m4 | Zebrafish | Mieda et al. (1999)
 | Ten4 | odz-4 | Mouse | Ben-Zur and Wides (1999), Ben-Zur et al. (2000)
 | TEN4 | | Chicken | Tucker et al. (2000)
ten-a | ten-a | ten-a | Drosophila | Baumgartner and Chiquet-Ehrismann (1993)
ten-m | ten-m | ten-m, odd Oz (odz) | Drosophila | Baumgartner et al. (1994), Levine et al. (1994, 1997a,b)
Teneurin | ten | R13F6.4 | C. elegans | Wilson et al. (1994)

Chicken TEN1 has been shown to bind to heparin via its YD-repeats. Furthermore, this part of the protein has been shown to promote neurite outgrowth of DRG explants when used as a substrate. This outgrowth could be inhibited by the presence of heparin in the culture medium. Nb2a neuroblastoma cells transfected with chicken TEN2 expression constructs revealed a cleavage of the extracellular part of the TEN2 protein at a furin-like site in the spacer region between the transmembrane sequence and the EGF-like repeats, and the transfected cells showed enhanced formation of filopodia and enlarged growth cones (Rubin et al., 1999).

The YD-repeats are located towards the C-terminus of the teneurins, and there are 26 of them followed by a region of condensed YD-like repeats (Minet et al., 1999). YD-repeats are about 20 aa long with the consensus sequence Gx_{3–9}YxYDx_{2}GR(L, I or V)x_{3–10}G, where x represents any aa and the subscripts give the number of consecutive x positions. The teneurins are the only known eukaryotic proteins containing YD-repeats, but repeats with the same consensus sequence are found in the predicted protein of the rearrangement hot spot (rhs) elements of E. coli (Feulner et al., 1990; Zhao et al., 1993) and in the Wall Associated Protein A (WAPA) of Bacillus subtilis (Foster, 1993). The N-terminal part of WAPA contains three long repeats of 102 aa length, which mediate association of the protein to the cell wall. The C-terminal part harbours 31 YD-repeats of unknown function (Foster, 1993).

In E. coli, eight rhs elements, termed rhsa to rhsh, have been described (Hill et al., 1994; Wang et al., 1998). A feature common to all rhs elements is a 3.7 kb, GC-rich core that encodes 35 YD-repeats. This core maintains a single open reading frame (ORF) that is not expressed to a detectable extent during routine cultivation (Hill et al., 1994). Nevertheless, retaining the rhs elements appears to be an advantage for E. coli, because the ORF has been kept open. Up to seven rhs elements can be present in the same E. coli strain, but there are also strains with no such element (Hill et al., 1995).

Sequences more distantly related to the YD-repeats are the class II repeats of Toxin A from Clostridium difficile (Dove et al., 1990; Von Eichel-Streiber et al., 1992). Toxin A is an enterotoxin and cytotoxin and causes haemagglutination of rabbit erythrocytes via the binding of its class II repeats to carbohydrates on the surface of the blood cells. Therefore, it has been speculated that the YD-repeats encoded by the rhs core may bind to carbohydrates as well. This hypothesis is supported by our own findings, which showed the binding of heparin to the YD-repeats of chicken TEN1 (Minet et al., 1999).

In this paper, we present the genomic organization of the complete human teneurin-1 gene, which may be a candidate gene for an X-linked mental retardation syndrome (XLMR). Furthermore, we identify the partial sequences of human TEN2, TEN3 and TEN4 and the sequence of a long version of Drosophila Ten-a. Based on these sequences, we investigate evolutionary aspects of the teneurin family and propose the YD-repeats of the teneurin ancestor as a possible source for the rhs core in E. coli.
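The YD-repeat consensus quoted in the Introduction can be turned into a simple pattern search. The following is a minimal sketch, assuming the consensus reads G, 3–9 arbitrary residues, Y, one residue, Y, D, two residues, G, R, one of L/I/V, 3–10 residues, G. The regular expression, the function name and the toy sequence are illustrative assumptions, not part of the original analysis, and real repeats that deviate from the consensus would be missed.

```python
import re

# Assumed translation of the consensus G x(3-9) Y x Y D x(2) G R [L/I/V] x(3-10) G.
# "." stands for "any residue"; this is an illustration, not the authors' method.
YD_REPEAT = re.compile(r"G.{3,9}Y.YD.{2}GR[LIV].{3,10}G")


def find_yd_repeats(protein):
    """Return (start, end, text) for non-overlapping consensus matches.

    Coordinates are 0-based, end-exclusive, as returned by the re module.
    """
    return [(m.start(), m.end(), m.group())
            for m in YD_REPEAT.finditer(protein.upper())]


# Hypothetical toy sequence containing one consensus-conforming stretch.
toy = "MKT" + "GAAAAYAYDAAGRLAAAAG" + "PPP"
hits = find_yd_repeats(toy)
for start, end, text in hits:
    print(start, end, text)
```

Because the pattern is scanned left to right without overlaps, closely packed repeats that share residues would be reported only once; a scan with lookahead would be needed to list overlapping candidates.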

2. Materials and methods

2.1. Sequences

The genomic sequence of human ten1 was obtained from a BLAST search at NCBI using the cDNA sequence of chicken ten1 (Minet et al., 1999). The exons were found by comparing both sequences using the program Bestfit provided by the Wisconsin Package (Genetics Computer Group, Inc.). The exon–intron junctions were defined according to Padgett et al. (1986). All eight PAC clones have been characterized at the Sanger Centre, Hinxton, UK: clones 394F12 and 799F15 by R. McDougall, clone 128N22 by J. Wilkinson, clone 789O11 by C. Bird, clones 384D21 and 369O24 by R. Deadman and clones 618F1 and 105M9 by P. Wray. Clones 394F12 and 384D21 are from the library RPCI3, clones 799F15, 789O11 and 618F1 from the library RPCI4, clone 128N22 from the library RPCI1 and clone 105M9 from the library RPCI5. All RPCI libraries have been constructed at the Roswell Park Cancer Institute by the group of Pieter de Jong. Clone 369O24 is from the human PAC library.

The genomic sequences of human ten2 and ten4 were obtained from a BLAST search at NCBI screening the high-throughput (htg) library using the deduced cDNA sequence of human ten1. The sequences in this database are highly preliminary and line up in several unordered pieces. Clones ac008637, ac008601 and ac011369 originate from the DOE Joint Genome Institute, and clone ap000716 has been found by Masahira Hattori, Kitasato University, Kanagawa, Japan. The ESTs encoding parts of the human teneurins (cf. Fig. 2A) were obtained as described by Minet et al. (1999).
They have the following Accession Nos: ESTs for Ten1: n38988, n69375, n38974; ESTs for Ten2: ai669183, aa069326, r95485, aa446367, z45705; ESTs for Ten3: aa382885, n73156, aa702693, ai076124, n67072, aa142988, aa149286, aa149382, aa150978, aa424477, aa449172, aa452858, aa452995, aa476500, ai080208, ai401605, ai159901, aa903519, aa045049, aa130373, ai373936, w35185, w73064, w78206, aa716614, aa716742, ai184444, ai131458, u24159, n24159, aa195521, aa253023, aa253381, aa253407, aa235960, aa479097, ai274854, ai250756, ai288687, ai123585, ai417102, ai288964, ai092540, ai070062, ai274285, ai278290, ai435000, ai474192, ai224973, ai225058, aa478464, h92917, aa037356, aa604438, ai024797, ai017896, aa724870, aa934101, aa775291, aa775879, ai391741, ai186614, ai183448, ai188411, ai189718, ai199810, ai189836, ai220992, ai200670, ai262318, ai367099, ai310370, ai338238, ai200509, ai310214, ai371856, ai222954; ESTs for Ten4: t05481, ai307633.

The genomic sequence of Drosophila ten-a was obtained from a BLAST search at NCBI using the published cDNA sequence of ten-a (Baumgartner and Chiquet-Ehrismann, 1993). All exons encoding the published ten-a cDNA could be identified within entry AE003488. The same entry also contained exons with homology to the Drosophila ten-m, encoding a putative long form of ten-a. All exons encoding ten-a were found by comparing the translated protein sequences of AE003488 and the adjoining entry AE003489 with the protein sequence of ten-m using the program Bestfit provided by the Wisconsin Package. The proposed cDNA sequence of the long ten-a variant can be assembled as follows: nt 256360–256643, 258915–259021, 267904–269109, 269263–269457, 271369–271578, 273982–274170, 277515–277661, 277802–278009, 295061–295180, 299033–299191, 299563–299674, 300312–301689 from AE003488 and nt 6764–10656 of AE003489. The description of the structure of the ten-m gene can be found in the GenBank entry AE003597, and the genomic structure and sequence of the C. elegans teneurin has been described as R13F6.4 in WormBase (http://wormbase.sanger.ac.uk).

2.2. Sequence comparison and alignments

Sequence identities were obtained using the program Bestfit provided by the Wisconsin Package. Distances between different teneurin sequences, marked by lines in Fig. 3A, were calculated using the formula 10 × (100/(X − 1)), where X stands for the sequence identity in per cent. Multiple sequence alignments were done with the program ClustalX 1.8 (Thompson et al., 1997) with the default parameters except for the matrix, where we used BLOSUM 30. For Fig. 4, the program Boxshade (http://www.ch.embnet.org/software/BOX_form.html) was used with the parameter 0.6 for shading. The Accession Nos of the predicted rhs cores of E. coli used for the alignment are as follows: rhsa, P16916; rhsb, P16917; rhsc, P16918; rhsd, P16919; rhse, AF044499; rhsf, AF044502; rhsg, AF044503; rhsh, AF044501, and that of the Wall Associated Protein A of Bacillus subtilis (WAPA) is L05634.

2.3. Phylogenetic trees

Phylogenetic trees were calculated using the program ClustalX 1.8 (Thompson et al., 1997). For all trees, the bootstrapping option with the default parameters (1000 runs) was used, and gaps were excluded. To draw the trees, the program njplot was used (Perrière and Gouy, 1996).
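To make the identity and distance calculation of Section 2.2 concrete, the sketch below computes a percent identity from two already aligned sequences and converts it into a plotting distance. It is only an illustration under stated assumptions: the identity function is a plain column count that skips gapped columns (it is not the Bestfit algorithm), the distance function follows a literal reading of the printed formula with ":" taken as division, and both function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two aligned sequences, ignoring gapped columns.

    Assumption: columns with '-' in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def plot_distance(identity_percent):
    """Literal reading of the formula in Section 2.2: 10 * (100 / (X - 1))."""
    if identity_percent <= 1:
        raise ValueError("formula is only defined here for X > 1")
    return 10.0 * (100.0 / (identity_percent - 1.0))


# Toy alignment (hypothetical sequences): 4 of 5 ungapped columns match.
identity = percent_identity("ACDE-G", "ACDFHG")
print(identity, plot_distance(identity))
```

Under this reading the distance shrinks as identity rises, so closely related orthologues are drawn near each other while insect and vertebrate teneurins end up far apart.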

3. Results and discussion

3.1. Genomic structure of the human teneurin-1 gene

Recently, we characterized the chicken teneurin-1 cDNA (Minet et al., 1999). In the course of searching the databases for homologous sequences, we realized that the human teneurin-1 (ten1) gene was contained in eight PAC clones representing part of the chromosomal locus Xq25. These clones cover a region of more than 600 kb (Fig. 1A). We were able to identify 31 exons representing the entire coding sequence of ten1, as shown in Fig. 1A. Below the cDNA, a model of the protein is shown, as assembled from the exon sequences. The first PAC clone, 394F12, contains two exons encoding the first half of the cytoplasmic part of TEN1. The two following PAC clones, 799F15 and 128N22, do not include any exons of TEN1 and are part of an intron of more than 140 kb length. The next PAC clone, 789O11, contains exons 3 and 4, which encode the second half of the cytoplasmic part, including two proline-rich stretches. PAC clone 384D21 contains exons 5–10. They encode the transmembrane region and the first part of the extracellular domain, including the first five tenascin-type EGF-like repeats. Exons 11–17 of PAC clone 369O24 encode another three tenascin-type EGF-like repeats followed by the first half of a cysteine-rich region. The second half of the cysteine-rich region is encoded by five exons (18–22) of PAC clone 618F1 and exon 23 of PAC clone 1052M9. The eight remaining exons (24–31) of PAC clone 1052M9 encode the C-terminal half of the entire protein, which harbours the 26 YD-repeats.

In Table 2, the position and sequence of all exon–intron boundaries of the human ten1 are given, which lead to the assembly of a putative full-length cDNA. Since the deduced primary sequence of human TEN1 matches the sequence of chicken TEN1 with 90% identity and no gaps (Minet et al., 1999), we assume that the prediction is accurate.

The PAC clones representing human TEN1 are mapped to the X-chromosome in the area Xq25. Genetic diseases such as X-linked lymphoproliferative disease (XLP), X-linked mental retardation syndromes (XLMR) and non-specific mental retardation (MRX) have been mapped to this region. Since Brandau et al. (1999) reported that patients with mutations in SH2D1A, a gene found to be located in a tail-to-tail orientation next to ten1 (referred to as TNM1 in their report), suffer from XLP, and since the ten1 genes of the same patients are intact, a link between ten1 and XLP can be excluded. However, the predominant expression of TEN1 in brain (Minet et al., 1999; Oohashi et al., 1999; Otaki and Firestein, 1999a,b), together with its chromosomal location, makes ten1 a candidate gene for the XLMR and MRX syndromes (Lubs et al., 1996; des Portes et al., 1999; Minet et al., 1999).

Fig. 1. Gene organization of human teneurin-1 and comparison to Drosophila ten-a and ten-m and to C. elegans teneurin. (A) Top: clones within the genomic region Xq25 covering the locus of about 600 kb of human ten1, drawn to scale. The whole genomic region can be assembled from eight PAC clones. Above the lines representing the clones, the PAC clone number and the length in base pairs are given. Below, the exons are numbered from 1 to 31, and their positions are indicated by a vertical line. Underneath, the exons are drawn to scale as black boxes, and the start ATG and stop codon are indicated. The length of the predicted cDNA from the beginning of the first exon to the stop codon is 8180 bp. A model of the encoded TEN1 protein is drawn to scale below. The following symbols were used to indicate structural features: black box, transmembrane sequence; diamonds, EGF-like repeats; gray boxes, YD-repeats; zigzag line, condensed YD-like repeats. (B) Proposed cDNA structure of Drosophila Ten-a (dtena) with intron positions indicated by white lines. (C) cDNA structure of Drosophila Ten-m as annotated in the GenBank entry AE003597 (dtenm). (D) cDNA structure of C. elegans teneurin (R13F6.4) as annotated in WormBase at the Sanger Centre (http://wormbase.sanger.ac.uk). Conserved introns are connected by stippled lines.

Table 2
Exon/intron boundaries of human teneurin-1 (a)

(a) Nucleotides matching the consensus splice site for vertebrates are shown in bold (Padgett et al., 1986).

3.2. Human teneurins derived from ESTs

After screening the human EST database, we found a large number of sequences corresponding to short segments of cDNAs for teneurin-like proteins (for Accession Nos, see Section 2). They all have high amino acid identities, from 79 to 97%, to one of the mouse teneurins and therefore appear to represent parts of human teneurins. For human TEN2, five ESTs were found, encompassing four regions that cover 13.2% of the protein (Fig. 2). The majority of the ESTs are most similar to mouse TEN3 and therefore give rise to a partial sequence of human TEN3 (Fig. 2), which covers 24.4% of the protein within three regions. Finally, two parts at the C-terminal end of human TEN4 are encoded by two ESTs, representing 4.6% of the protein. Every EST detected could be assigned to one of the four known teneurins, and none of them seems to represent a further family member. Therefore, we assume that no more than four teneurins exist in vertebrates.

While screening the htg library with a tblastn search with TEN1, we found three preliminary sequences on chromosome 5 containing 61% of the coding sequence of human TEN2. The 1737 most C-terminal aa of TEN2 are also contained in a cDNA sequence, named KIAA1127 (Hirosawa et al., 1999), which is derived from human male brain. Furthermore, we found one sequence in the htg library of chromosome 11 containing 33% of the coding sequence of TEN4. This clone is located at position 11q14. Thus, we can locate human ten1 to the X-chromosome at position Xq25, ten2 to chromosome 5 and ten4 to chromosome 11 at position 11q14.
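The coverage percentages quoted in Section 3.2 are the kind of figure obtained by merging the protein regions hit by individual ESTs and dividing by the protein length. A minimal sketch of that bookkeeping follows; the interval coordinates and the protein length are invented for illustration and do not correspond to any EST or teneurin in this paper.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous region instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_percent(intervals, protein_length):
    """Percentage of the protein covered by at least one interval."""
    covered = sum(end - start + 1 for start, end in merge_intervals(intervals))
    return 100.0 * covered / protein_length


# Hypothetical EST hits on a hypothetical 1000-aa protein.
est_hits = [(1, 100), (51, 150), (400, 450), (451, 500)]
print(merge_intervals(est_hits))
print(coverage_percent(est_hits, 1000))
```

Merging first matters because several ESTs often overlap the same stretch; summing their lengths directly would overstate the covered fraction.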

3.3. Evidence for a long Drosophila Ten-a

In order to analyze evolutionary aspects of teneurins, we decided to compare the genomic organization of the human ten1 gene with the respective members in Drosophila and C. elegans. Screening the genomic sequences of Drosophila, we were able to identify all exons representing the published ten-a cDNA sequence (Baumgartner and Chiquet-Ehrismann, 1993) within GenBank entry AE003488. However, in the same entry, further exons could be identified with homology to the Drosophila ten-m. Alternative splicing of a primary ten-a transcript could lead to the generation of an mRNA encoding an alternative long form of Ten-a similar to all other teneurins known. The entire sequence encoding a putative Ten-a with a length of 2515 aa could be assembled, as outlined in Section 2 (Fig. 1B). The positions of the exon–intron junctions of the ten-a gene were compared to those of the human ten1 (Fig. 1A), Drosophila ten-m (Fig. 1C) and C. elegans teneurin (Fig. 1D) genes. Interestingly, many exon–intron junctions are precisely conserved. They are indicated by stippled lines in Fig. 1. Four junctions are conserved between human ten1 and the C. elegans teneurin, three different junctions between human ten1 and Drosophila ten-m, and a total of nine between human ten1 and Drosophila ten-a. Most of the conserved exon–intron junctions mark proposed boundaries of protein domains, namely the transmembrane domain, the EGF-like repeats and the major block of the YD-repeats (Fig. 1). This conservation of gene organization implies that all of these genes have been derived from a common ancestor.
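The assembly of the long ten-a cDNA from the coordinate list in Section 2.1 amounts to concatenating genomic segments in order. The sketch below shows that operation with the published coordinates written as plain integers. Treating them as 1-based, inclusive positions on the forward strand is an assumption made here for illustration, the function names are hypothetical, and no sequence is downloaded, so the slicing is demonstrated on a toy string only.

```python
# Segment coordinates as listed in Section 2.1 (assumed 1-based, inclusive).
AE003488_SEGMENTS = [
    (256360, 256643), (258915, 259021), (267904, 269109), (269263, 269457),
    (271369, 271578), (273982, 274170), (277515, 277661), (277802, 278009),
    (295061, 295180), (299033, 299191), (299563, 299674), (300312, 301689),
]
AE003489_SEGMENTS = [(6764, 10656)]


def total_length(segments):
    """Summed length of 1-based inclusive segments."""
    return sum(end - start + 1 for start, end in segments)


def assemble(genomic, segments):
    """Concatenate the listed segments of a genomic sequence in order."""
    return "".join(genomic[start - 1:end] for start, end in segments)


# Length of the assembled sequence implied by the coordinate list.
assembled_nt = total_length(AE003488_SEGMENTS) + total_length(AE003489_SEGMENTS)
print(assembled_nt)

# Toy demonstration of the slicing logic on a made-up sequence.
print(assemble("AAACCCGGGTTT", [(1, 3), (7, 9)]))
```

With real data, `genomic` would be the sequence of the corresponding database entry, and segments on the reverse strand would need reverse-complementing before concatenation.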

Fig. 2. Known regions of the human teneurins. The structural model of human TEN1 is shown at the top. The structural domains are indicated with the same symbols as in Fig. 1. The human TEN1 sequence has been described in this paper; the sequences of human TEN2, TEN3 and TEN4 have been assembled either from ESTs or from sequences of the high-throughput genomic (htg) library at NCBI, and the locations of the assembled sequences are indicated by thick black lines. Furthermore, there is one clone from Hirosawa et al. (1999), named KIAA1127, spanning the last 1737 aa of TEN2.


3.4. Similarity of teneurin family members From all species investigated to date, 14 full-length members of the teneurin family are known, and they have a length between 2515 and 2825 amino acids. In the vertebrates, there is TEN1 of human, mouse and chicken, TEN2 of mouse, rat and chicken, and TEN3 and TEN4 from mouse as well as zebrafish. In invertebrates, there is one teneurin ( Ten) in C. elegans and Ten-m and Ten-a in Drosophila (for references, see Table 1). The amino acid identities between the different teneurins are shown in Fig. 3A. The aa identity between vertebrate teneurin family members is 58–72% and increases to 78–98% between species homologues. The aa identity between vertebrate teneurin family members and the Drosophila Ten-a or Ten-m is 37–41%. The aa identities between the vertebrate teneurins and Drosophila Ten-a and Ten-m, respectively, are not significantly different. Thus, it is not possible to class any of the vertebrate teneurins with Ten-a or Ten-m, respectively. This is also evident in a phylogenetic tree (Fig. 3B) using the full-length teneurins for alignment

93

and excluding gaps. According to this tree, only one teneurin existed in each organism at the time when the precursors of nematodes, insects and vertebrates separated. Later, the Drosophila teneurin precursor probably duplicated into Ten-a and Ten-m, and the vertebrate ancestor split into the four present classes of teneurins. C. elegans remains with only one teneurin. The way in which the duplication among the four vertebrate teneurins occurred is not clear, and this is reflected by the low bootstrap number separating TEN4 from TEN1 or TEN2 and 3, respectively ( Fig. 3B). The duplication of the vertebrate teneurins took place before the vertebrates themselves started to radiate into fish, birds and mammals. This is shown by the fact that in the phylogenetic tree, orthologous teneurins of different species group together (Fig 3B). The aa identity of 41% between Drosophila Ten-a and Ten-m is only slightly higher than the identities between insect and vertebrate teneurins. This high diversity implies that the duplication in Drosophila took place shortly after the separation of insects and vertebrates. The relationship of C. elegans teneurin to the insect and

Fig. 3. Relationship between the teneurin family members. (A) Degree of the relationship between single family members (white circles). Distances between circles are proportional to the degree of identity between them. Bold lines represent the degree of homology between vertebrates, insects and nematodes. The numbers designate the percentage of aa identity between the teneurins indicated. The abbreviations used are: h1, human TEN1; m1, mouse Ten1 (Accession No. AB025410); m2, mouse Ten2 (Accession No. AB025411); m3, mouse Ten3 (Accession No. AB025412); m4, mouse Ten4 (Accession No. AF059485, AB025413); c1, chicken TEN1 (Accession No. AJ238613); c2, chicken TEN2 (Accession No. AJ279031); z3, zebrafish TEN3 (Accession No. AB026979); z4, zebrafish TEN4 (Accession No. AB026980); da, Drosophila Ten-a (assembled as described in Section 2); dm, Drosophila Ten-m (Accession No. x73154); ce, C. elegans teneurin (Accession No. U00046). Black circles represent the centre of the vertebrate teneurins ( V ) and the centre of Drosophila Ten-a and Ten-m (D). (B) Neighbor-joining tree drawn from a sequence alignment of all known full-length teneurins. These are C. elegans teneurin ( Ten), the long Drosophila Ten-a ( Ten-a) and Ten-m ( Ten-m), human TEN-1 (hTEN1), the four mouse teneurins (mTen1, mTen2, mTen3 and mTen4), rat neurestin/ten2 (rTen2), chicken TEN1 and TEN2 (cTEN1, cTEN2) and zebrafish TEN3 and TEN4 (zTEN3, zTEN4). Numbers indicate the bootstrap values in per cent.

94

A.D. Minet, R. Chiquet-Ehrismann / Gene 257 (2000) 87–97

A.D. Minet, R. Chiquet-Ehrismann / Gene 257 (2000) 87–97

95

the vertebrate teneurins is nearly the same with 30–35% and 33–36% aa identity, respectively. 3.5. Similarity of teneurin YD-repeats with bacterial proteins The teneurins harbour the YD-repeats in their C-terminal part, and these repeats have been shown to bind to heparin (Minet et al., 1999). Interestingly, we found repeats with the same consensus sequence in the rhs elements of E. coli (Feulner et al., 1990; Zhao et al., 1993) and in the Wall Associated Protein A ( WAPA) of Bacillus subtilis (Foster, 1993). Hill et al. (1994) suggest that E. coli has acquired the rhs core by horizontal gene transfer from another organism because the GC content of this part of the gene is abnormally high for E. coli. It was possible to align the sequences of all vertebrate teneurins, Drosophila Ten-m and Ten-a, C. elegans teneurin, the predicted protein sequences of the eight rhs elements (rhsa to rhsh) and WAPA in the region of the YD-repeats and the condensed YD-like repeats with ClustalX. Fig. 4 shows a shortened version of this alignment using one sequence each of the vertebrate teneurins (mouse Ten1, Ten2, Ten3 and Ten4), Drosophila Ten-m and Ten-a and one predicted protein sequence each of the three subgroups (see this section below) of the rhs elements (rhsd, rhsf and rhsg). In addition to the YD-repeats consensus sequence (Gx YxYDx GR[L, I 3–9 2 or V ]x G), many hydrophobic (I/V/L) and basic 3–10 (R/K ) residues are conserved as well. Particularly striking is the conservation of the last 74 aa of the alignment representing the condensed YD-like region. The phylogenetic tree drawn from the complete alignment shows that the YD-repeats encoded by the rhs elements and the teneurins diverge shortly before nematodes, insects and the vertebrates separate (Fig. 5). This differs significantly from classic phylogenetic trees showing a much earlier separation of eukaryotes and prokaryotes. This supports the theory of the uptake of the YD-repeatDNA by E. 
coli from an origin outside the species by horizontal gene transfer. Y.D. Wang et al. (1998) showed that the rhs elements comprise three subfamilies that most likely diverged prior to the acquisition by E. coli. Therefore, E. coli must have acquired three independent subtypes of the rhs core. According to the phylogenetic tree in Fig. 5, the uptake of three independent rhs cores must have taken place at a relatively recent time

Fig. 5. Neighbor-joining tree drawn from an alignment in the region of the YD-repeats and condensed YD-like sequence. The sequences used for the teneurins are the same as those listed for Fig 3. They are compared to the predicted protein sequences of the eight rhs elements (rhsa to rhsh) of E. coli and to the YD-repeat containing part of the wall-associated protein A ( WAPA) of Bacillus subtilis. The numbers indicated are the bootstrap values in per cent.

after the segregation of the different rhs cores. E. coli cannot have acquired its YD-repeats from C. elegans, Drosophila or the vertebrates because their YD-repeat sequences are too divergent from those of the rhs cores (see Fig. 5). There are other taxa that have separated early from the branch leading to vertebrates and insects. We postulate that one of them has carried a teneurin gene with its YD-repeats, which evolved into the precursors of the three rhs core subtypes. E. coli then acquired three independent rhs cores from a species of this taxon. Other examples that transfer of domains or repeats of genes from eukaryotes to prokaryotes occur are the discoidin-like domains (Baumgartner et al., 1998) and the fibronectin type III repeats (Little et al., 1994).

Fig. 4. Multiple sequence alignment of the YD-repeat region of mouse and Drosophila teneurins and the predicted proteins of rhsd, rhsf and rhsg of E. coli. This alignment is a selection of that used for the phylogenetic tree shown in Fig. 5. It displays the two Drosophila teneurins Ten-a and Ten-m, one sequence per type of vertebrate teneurin (the mouse teneurins mTen1, mTen2, mTen3 and mTen4) and one sequence per group of the rhs elements (rhsd, rhsf and rhsg). The YD-repeats are numbered as they occur in the teneurins, and the condensed YD-like repeats are indicated. The latter region is especially conserved between teneurins and the predicted proteins of the rhs elements and aligns with almost no gaps. Amino acids are shaded in black when they are identical and in grey when they are similar in at least two-thirds of the sequences.
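The shading rule in the Fig. 4 legend (black for identical and grey for similar residues in at least two-thirds of the sequences) can be sketched per alignment column as follows. This is a minimal Python illustration and not the tool used to produce the figure. The similarity groups in `SIMILAR_GROUPS`, the choice to count gapped sequences in the denominator, and the function names are all assumptions made for the example.

```python
from collections import Counter

# Assumed similarity groups (hypothetical; the paper does not list the ones it used).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def group_of(residue):
    """Map a residue to its similarity group, or to itself if it has none."""
    for group in SIMILAR_GROUPS:
        if residue in group:
            return group
    return residue


def shade_column(column):
    """Classify one alignment column as 'black', 'grey' or 'none'."""
    n = len(column)  # gapped sequences still count toward the total (assumption)
    residues = [r for r in column if r != "-"]
    if not residues:
        return "none"
    top_identical = Counter(residues).most_common(1)[0][1]
    if 3 * top_identical >= 2 * n:  # at least two-thirds identical
        return "black"
    top_similar = Counter(group_of(r) for r in residues).most_common(1)[0][1]
    if 3 * top_similar >= 2 * n:  # at least two-thirds similar
        return "grey"
    return "none"


def shade_alignment(rows):
    """Shade every column of an alignment given as equal-length strings."""
    return [shade_column(col) for col in zip(*rows)]


# Hypothetical three-sequence toy alignment.
print(shade_alignment(["GYDLA", "GYDIK", "GFEV-"]))
```

The two-thirds test is written with integers (`3 * count >= 2 * n`) so that the boundary case is not affected by floating-point rounding.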


3.6. Conclusions

1. We present a detailed analysis of the genomic sequence of human ten1. This large gene of more than 600 kb harbours 31 exons that cover the full-length coding sequence of the protein.
2. From several human ESTs related to the teneurins, we obtained partial sequences of the other human members of the teneurin family, TEN2, TEN3 and TEN4. Since no other related ESTs were found, we assume that no further teneurins exist in humans. Screening of the htg database revealed that human ten2 is located on chromosome 5 and human ten4 on chromosome 11 at position 11q14.
3. After searching the Drosophila genome sequence, we identified exons encoding a long form of Drosophila Ten-a. Comparison of the gene organization between human ten1, Drosophila ten-a and ten-m and the C. elegans teneurin revealed the presence of several conserved intron locations. This implies that all teneurins are derived from one common ancestor gene.
4. Sequence comparisons of teneurins show that it is not possible to class any of the vertebrate teneurins with either Ten-a or Ten-m of Drosophila. The same result is obtained from phylogenetic trees. These show that insects and vertebrates separated before each lineage independently duplicated the single teneurin ancestor.
5. The duplication of the teneurins in the vertebrates occurred before the vertebrates radiated into fish, birds and mammals.
6. YD-repeats occur only in teneurins, in the predicted protein of rhs elements of E. coli and in WAPA of Bacillus subtilis. E. coli is believed to have acquired the rhs core by horizontal gene transfer from another species. We propose the teneurin ancestor as a possible source for the rhs core.

Acknowledgements

We would like to thank Dr Carl David Nager and Thomas Nyffenegger for computer support. We are very grateful to Dr Matthias Chiquet and Marianne Brown-Luedi for critical reading of the manuscript and to Dr Jürg Spring for helpful discussion. This work was supported by the Novartis Forschungsstiftung.

References

Baumgartner, S., Chiquet-Ehrismann, R., 1993. Tena, a Drosophila gene related to tenascin, shows selective transcript localization. Mech. Dev. 40, 165–176.
Baumgartner, S., Martin, D., Hagios, C., Chiquet-Ehrismann, R., 1994. Tenm, a Drosophila gene related to tenascin, is a new pair-rule gene. EMBO J. 13, 3728–3740.
Baumgartner, S., Hofmann, K., Chiquet-Ehrismann, R., Bucher, P., 1998. The discoidin domain family revisited: new members from prokaryotes and a homology-based fold prediction. Prot. Sci. 7, 1626–1631.
Ben-Zur, T., Wides, R., 1999. Mapping homologs of Drosophila odd Oz (odz): Doc4/Odz4 to mouse chromosome 7, Odz1 to mouse chromosome 11 and ODZ3 to human chromosome Xq25. Genomics 58, 102–103.
Ben-Zur, T., Feige, E., Motro, B., Wides, R., 2000. The mammalian Odz gene family: homologs of a Drosophila pair-rule gene with expression implying distinct yet overlapping developmental roles. Dev. Biol. 217, 107–120.
Brandau, O., Schuster, V., Weiss, M., Hellebrand, H., Fink, F.M., Kreczy, A., Friedrich, W., Strahm, B., Niemeyer, C., Belohradsky, B.H., Meindl, A., 1999. Epstein-Barr virus-negative boys with non-Hodgkin lymphoma are mutated in the SH2D1A gene, as are patients with X-linked lymphoproliferative disease. Hum. Mol. Genet. 8, 2407–2413.
des Portes, V., Beldjord, C., Chelly, J., Hamel, B., Kremer, H., Smits, A., van Bokhoven, H., Ropers, H.H., Claes, S., Fryns, J.P., Ronce, N., Gendrot, C., Toutain, A., Raynaud, M., Moraine, C., 1999. X-linked nonspecific mental retardation (MRX) linkage studies in 25 unrelated families: The European XLMR consortium. Am. J. Med. Genet. 85, 263–265.
Dove, C.H., Wang, S.Z., Price, S.B., Phelps, C.J., Lyerly, D.M., Wilkins, T.D., Johnson, J.L., 1990. Molecular characterization of the Clostridium difficile toxin A gene. Infect. Immun. 58, 480–488.
Feulner, G., Gray, J.A., Kirschman, J.A., Lehner, A.F., Sadosky, A.B., Vlazny, D.A., Zhang, J., Zhao, S., Hill, C.W., 1990. Structure of the rhsA locus from Escherichia coli K-12 and comparison of rhsA with other members of the rhs multigene family. J. Bacteriol. 172, 446–456.
Foster, S.J., 1993. Molecular analysis of three major wall-associated proteins of Bacillus subtilis 168: evidence for processing of the product of a gene encoding a 258 kDa precursor two-domain ligand-binding protein. Mol. Microbiol. 8, 299–310.
Hill, C.W., Sandt, C.H., Vlazny, D.A., 1994. Rhs elements of Escherichia coli: a family of genetic composites each encoding a large mosaic protein. Mol. Microbiol. 12, 865–871.
Hill, C.W., Feulner, G., Brody, M.S., Zhao, S., Sadosky, A.B., Sandt, C.H., 1995. Correlation of Rhs elements with Escherichia coli population structure. Genetics 141, 15–24.
Hirosawa, M., Nagase, T., Ishikawa, K., Kikuno, R., Nomura, N., Ohara, O., 1999. Characterization of cDNA clones selected by the GeneMark analysis from size-fractionated cDNA libraries from human brain. DNA Res. 6, 329–336.
Levine, A., Bashan-Ahrend, A., Budai-Hadrian, O., Gartenberg, D., Menasherow, S., Wides, R., 1994. Odd Oz: a novel Drosophila pair rule gene. Cell 77, 587–598.
Levine, A., Weiss, C., Wides, R., 1997a. Expression of the pair-rule gene odd Oz (odz) in imaginal tissues. Dev. Dyn. 209, 1–14.
Levine, A., Gartenberg, D., Yakov, R., Lieberman, Y., Budai-Hadrian, O., Bashan-Ahrend, A., Wides, R., 1997b. The genetics and molecular structure of the Drosophila pair-rule gene odd Oz (odz). Gene 200, 59–74.
Little, E., Bork, P., Doolittle, R.F., 1994. Tracing the spread of fibronectin type III domains in bacterial glycohydrolases. J. Mol. Evol. 39, 631–643.
Lubs, H.A., Schwartz, C.E., Stevenson, R.E., Arena, J.F., 1996. Study of X-linked mental retardation (XLMR): summary of 61 families in the Miami/Greenwood Study. Am. J. Med. Genet. 64, 169–175.
Mieda, M., Kikuchi, Y., Hirate, Y., Aoki, M., Okamoto, H., 1999. Compartmentalized expression of zebrafish ten-m3 and ten-m4, homologues of the Drosophila ten(m)/odd Oz gene, in the central nervous system. Mech. Dev. 87, 223–227.
Minet, A.D., Rubin, B.P., Tucker, R.P., Baumgartner, S., Chiquet-Ehrismann, R., 1999. Teneurin-1, a vertebrate homologue of the Drosophila pair-rule gene ten-m, is a neuronal protein with a novel type of heparin-binding domain. J. Cell. Sci. 112, 2019–2032.
Oohashi, T., Zhou, X.H., Feng, K., Richter, B., Morgelin, M., Perez, M.T., Su, W.D., Chiquet-Ehrismann, R., Rauch, U., Fassler, R., 1999. Mouse ten-m/Odz is a new family of dimeric type II transmembrane proteins expressed in many tissues. J. Cell. Biol. 145, 563–577.
Otaki, J.M., Firestein, S., 1999a. Neurestin: putative transmembrane molecule implicated in neuronal development. Dev. Biol. 212, 165–181.
Otaki, J.M., Firestein, S., 1999b. Segregated expression of neurestin in the developing olfactory bulb. Neuroreport 10, 2677–2680.
Padgett, R.A., Grabowski, P.J., Konarska, M.M., Seiler, S., Sharp, P.A., 1986. Splicing of messenger RNA precursors. Annu. Rev. Biochem. 55, 1119–1150.
Perrière, G., Gouy, M., 1996. WWW-Query: An on-line retrieval system for biological sequence banks. Biochimie 78, 364–369.
Rubin, B.P., Tucker, R.P., Martin, D., Chiquet-Ehrismann, R., 1999. Teneurins: A novel family of neuronal cell surface proteins in vertebrates, homologous to the Drosophila pair-rule gene product ten-m. Dev. Biol. 216, 195–209.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G., 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 24, 4876–4882.
Tucker, R.P., Chiquet-Ehrismann, R., Chevron, M.P., Martin, D., Hall, R.K., Koss, R., Rubin, B.P., 2000a. Teneurin-2 is expressed in tissues that regulate limb and somite pattern formation and is induced in vitro and in situ by FGF8. Dev. Dyn., in press.
Tucker, R.P., Martin, D., Kos, R., Chiquet-Ehrismann, R., 2000b. The expression of teneurin-4 in the avian embryo. Mech. Dev., in press.
Von Eichel-Streiber, C., Sauerborn, M., Kuramitsu, H.K., 1992. Evidence for a modular structure of the homologous repetitive C-terminal carbohydrate-binding sites of Clostridium difficile toxins and Streptococcus mutans glucosyltransferases. J. Bacteriol. 174, 6707–6710.
Wang, X.Z., Kuroda, M., Sok, J., Batchvarova, N., Kimmel, R., Chung, P., Zinszner, H., Ron, D., 1998a. Identification of novel stress-induced genes downstream of chop. EMBO J. 17, 3619–3630.
Wang, Y.D., Zhao, S., Hill, C.W., 1998b. Rhs elements comprise three subfamilies which diverged prior to acquisition by Escherichia coli. J. Bacteriol. 180, 4102–4110.
Wilson, R., et al., 1994. 2.2 Mb of contiguous nucleotide sequence from chromosome III of C. elegans. Nature 368, 32–38.
Zhao, S., Sandt, C.H., Feulner, G., Vlazny, D.A., Gray, J.A., Hill, C.W., 1993. Rhs elements of Escherichia coli K-12: complex composites of shared and unique components that have different evolutionary histories. J. Bacteriol. 175, 2799–2808.