A genome-wide survey of bHLH transcription factors in the Placozoan Trichoplax adhaerens reveals the ancient repertoire of this gene family in metazoan


Gene 542 (2014) 29–37


Fuki Gyoja

Marine Genomics Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904-0495, Japan

Article info

Article history: Accepted 11 March 2014; available online 12 March 2014

Keywords: Basic helix–loop–helix proteins; Genome-wide survey; Basal metazoans; Evolution

Abstract

Basic helix–loop–helix (bHLH) transcription factors play significant roles in multiple biological processes in metazoan cells. To address the evolutionary history of this gene family, comprehensive and detailed characterization in basal metazoans is essential. Here I report a genome-wide survey of bHLH genes in the Placozoan, Trichoplax adhaerens. The present survey revealed ancient origins of two orthologous families, 48-related-1/Fer1 and ASCb, which both belong to high-order Group A. Group A factors are mainly involved in neural and mesodermal differentiation. I also identified novel members of a Group E orthologous family previously thought to be unique to Homo sapiens. These were discovered in Trichoplax, Saccoglossus kowalevskii, Euperipatoides kanangrensis, and Crassostrea gigas, but apparently are not found in Drosophila melanogaster, Caenorhabditis elegans, or Nematostella vectensis. Furthermore, as reported previously, many unclassified Group A members were observed in Trichoplax. The present study provides important information to infer the ancestral state of bHLH components in the Metazoa. © 2014 Elsevier B.V. All rights reserved.

Abbreviations: bHLH, basic helix–loop–helix; NJ, Neighbor Joining; ML, Maximum Likelihood; BI, Bayesian Inference.

E-mail address: [email protected].

http://dx.doi.org/10.1016/j.gene.2014.03.024 0378-1119/© 2014 Elsevier B.V. All rights reserved.

1. Introduction

bHLH proteins constitute a large group of transcription factors that have been identified in a wide range of eukaryotes, including metazoans, plants, and fungi. bHLH proteins contain bHLH domain(s), which comprise a basic region for DNA binding, and two α-helices, separated by a variable loop region (HLH). The HLH regions participate in dimerization. Putative full sets of bHLH genes have been described in the genomes of various metazoans (Dang et al., 2011; Gyoja and Satoh, 2013; Gyoja et al., 2012; Ledent and Vervoort, 2001; Ledent et al., 2002; Liu et al., 2012, 2013; Moore et al., 2000; Satou et al., 2003; Simionato et al., 2007; Wang et al., 2007, 2008, 2009; Zhang et al., 2013; Zheng et al., 2009). bHLH proteins in metazoa are involved in multiple biological processes, including cell differentiation and proliferation (Jones, 2004; Massari and Murre, 2000). They were categorized into six high-order Groups, A to F, based on structural and biochemical properties (Atchley and Fitch, 1997; Ledent and Vervoort, 2001). Group A is the largest, and contains many factors such as NeuroD, Neurogenin, MyoD, and Twist, which have important functions in neural and mesodermal differentiation (Massari and Murre, 2000). One of the principal questions in bHLH studies is when and how Group A appeared and expanded during animal evolution. This group seems to be exclusive to metazoans (Sebe-Pedros et al., 2011; Simionato et al., 2007). Did the evolution of Group A bHLH genes accompany the acquisition of neural and/or mesodermal tissues?

To understand the evolution of bHLH genes, and especially those of Group A, information from basal metazoan clades is essential. In the remarkable work of Simionato et al. (2007), 16 bHLH genes, such as E12/E47 (Group A), Myc, max, MITF, SREBP, AP4 (all Group B), Hey (Group E) and COE (Group F), were described in the genome of the demosponge, Amphimedon queenslandica. In addition, one gene was associated with ARNT and Bmal, and another gene was associated with Hif, Sim and Trh in this demosponge species (Simionato et al., 2007). The authors suggested that this presumably indicates an ancestral situation prior to gene duplication (Simionato et al., 2007). Furthermore, some unclassified bHLH members were also described and three of them seem to be members of Group A (Simionato et al., 2007). On the other hand, two Cnidarian species, Nematostella vectensis and Acropora digitifera (Gyoja et al., 2012; Simionato et al., 2007), have at least 19 bHLH factors that belong to various Group A orthologous families, such as Twist, Hand, and E12/E47 (Gyoja et al., 2012; Simionato et al., 2007). Therefore, there seems to have been a great expansion of Group A orthologous families after divergence of the Porifera and the (Cnidaria + Bilateria), but before divergence of the Cnidaria and the Bilateria (see also Simionato et al., 2007).


At present, bHLH information from basal metazoans other than Porifera and Cnidaria is quite limited. For this reason, genome-wide surveys of Placozoa and Ctenophora would be very valuable. In this paper, I report the genome-wide survey of bHLH genes in the placozoan, Trichoplax adhaerens. bHLH proteins in this animal display a state that is intermediate between those of the Porifera and Cnidaria. Herein, the evolution of bHLH proteins, especially those of Group A in basal metazoans, will be discussed.

2. Materials and methods

2.1. Retrieval of bHLH sequences

Using two methods, I searched for candidate bHLH sequences in the genome of T. adhaerens (Srivastava et al., 2008). First, I performed BLAST searches with the BLASTP and TBLASTN algorithms, using the full set of bHLH amino acid sequences from Homo sapiens (Ledent et al., 2002; Simionato et al., 2007) as query sequences. Second, I subjected all predicted proteins to a Pfam search (Bateman et al., 2002), using a threshold value of 1e−5, and retrieved predicted proteins containing bHLH domains.

2.2. Molecular phylogenetic analyses

Multiple alignments were performed with T-Coffee (Notredame et al., 2000) or ClustalW (Thompson et al., 1994). Gaps and unaligned regions were removed using Gblocks (Talavera and Castresana, 2007) or were deleted manually. Molecular phylogenetic analyses were then performed by the NJ method using the BioNJ algorithm (Gascuel, 1997), and 1000 bootstrap pseudoreplications with SEAVIEW (Galtier et al., 1996). ML analysis was performed using MEGA5 (Tamura et al., 2011) with the JTT + G model (Jones et al., 1992) and 500 bootstrap pseudoreplications. BI was performed using MRBAYES (version 3.2; Huelsenbeck and Ronquist, 2001; Ronquist and Huelsenbeck, 2003; Ronquist et al., 2011) with the JTT + G model (Jones et al., 1992). Four independent Markov chains were sampled every 100th generation. The first 25% of the trees were discarded as ‘burn-in’.
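The sampling and burn-in settings described above can be illustrated with a minimal Python sketch. It is not the MRBAYES implementation; the function name `sample_and_discard` and the run length of 1000 generations are assumptions made only for illustration, since the number of generations is not stated here.

```python
def sample_and_discard(n_generations, sample_every=100, burnin_fraction=0.25):
    """Return the sampled generations that remain after burn-in removal.

    Assumption: generation 0 is included among the samples, and the burn-in
    is the first 25% of the sampled trees, rounded down.
    """
    # Generations at which a tree is recorded (every 100th generation).
    sampled = list(range(0, n_generations + 1, sample_every))
    # Number of leading samples to discard as burn-in.
    n_burnin = int(len(sampled) * burnin_fraction)
    return sampled[n_burnin:]


# Hypothetical run length, chosen only to keep the example small.
kept = sample_and_discard(1000)
print(kept)
```

With these illustrative settings, 11 trees are sampled and the first two are discarded, so the summary would be based on the remaining nine.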

Convergence of each run was assessed by plotting the log-likelihood. The average standard deviation of split frequencies and PSRF values were also checked.

2.3. Mutual best-hit analyses

Mutual best-hit analyses were performed as previously described (Gyoja et al., 2012; Satou et al., 2003).

2.4. Orthologous families

If proteins from two or more organisms formed a single clade with a bootstrap value > 50 or a posterior probability > 0.50 by two or more different molecular phylogenetic analyses, those proteins were considered to constitute an orthologous family (Gyoja et al., 2012; Ledent and Vervoort, 2001; Ledent et al., 2002; Simionato et al., 2007). For some orthologous families, such as myc or HES, this criterion was relaxed (Ledent and Vervoort, 2001; Ledent et al., 2002; Simionato et al., 2007).

3. Results and discussion

A total of 32 bHLH genes were retrieved from the genome of T. adhaerens with BLASTP, TBLASTN, and Pfam domain searches (Tables 1, 2). In the T. adhaerens genome paper, the authors described 27 bHLH genes (Srivastava et al., 2008; for review, see Degnan et al., 2009). The present survey identified five additional genes. bHLH genes were categorized into six high-order groups, A–F (Atchley and Fitch, 1997; Ledent and Vervoort, 2001). Molecular phylogenetic analyses were performed using a putative full set of bHLHs from H. sapiens, D. melanogaster (Simionato et al., 2007), and T. adhaerens. Members of

Groups D and F have divergent HLH regions, and do not contain clearly identifiable basic regions. Therefore, these proteins were eliminated from molecular phylogenetic analyses (see also below). bHLH regions of the remaining predicted proteins were aligned, and molecular phylogenetic trees were constructed by NJ, ML, and BI (Table 1, Fig. S1).

3.1. Group A

Group A contains 48-related-1/Fer1, 48-related-2/Fer2, ASCa, ASCb, ASCc, amber, Atonal, Beta3, Delilah, E12/E47, Hand, mesp, Mist, MyoD, MyoRa, MyoRb, Net, NeuroD, Neurogenin, NSCL, Oligo, paraxis, peridot, PTFa/Fer3, SCL, and Twist (Gyoja and Satoh, 2013; Gyoja et al., 2012; Simionato et al., 2007, 2008). Srivastava et al. (2008) reported E12/E47 and “PTF” from T. adhaerens. They also reported that some other genes might be members of Group A, although they could not assign those genes to specific orthologous families (Srivastava et al., 2008). Using BLASTP, TBLASTN, and Pfam searches, I obtained one candidate each for 48-related-1/Fer1, ASCb and E12/E47 in the T. adhaerens genome. Molecular phylogenetic analyses placed the ASCb candidate into the ASCb clade, and the E12/E47 candidate into the E12/E47 clade (Fig. S1). Mutual best-hit analysis also supported these orthologies (Table 1). Therefore, I named the genes Tad-ASCb and Tad-E12/E47.

For the 48-related-1/Fer1 candidate, the resolution of initial molecular phylogenetic analyses was low and I could not judge whether the gene is a bona fide 48-related-1/Fer1 member (Fig. S1). I then constructed an in-group molecular phylogenetic tree of Group A (Fig. 1). Nevertheless, the phylogenetic position of this gene was still unclear. Therefore, I constructed a triply-branched molecular phylogenetic tree using PTFa/Fer3, 48-related-1/Fer1, and 48-related-2/Fer2 (Fig. S2; also see Gyoja and Satoh, 2013). The T. adhaerens predicted protein and known 48-related-1/Fer1 proteins formed one clade. Since I could not infer the root position of the tree, this result alone is not enough to conclude that the T. adhaerens gene is 48-related-1/Fer1. Because we showed in previous work that 48-related-1/Fer1 proteins have a conserved C-terminal motif (Gyoja and Satoh, 2013), I tried to determine whether the predicted protein has such a motif. Because the C-terminal part of the T. adhaerens protein did not seem to be precisely predicted in the JGI model, I re-predicted the scaffold sequence around the locus using Genscan (Burge and Karlin, 1997), and the predicted protein was aligned with known 48-related-1/Fer1 proteins. The alignment showed that the T. adhaerens predicted protein has a 48-related-1/Fer1-type motif at its C-terminus (Fig. 2). Mutual best-hit analysis also supported this orthology (Table 1). Therefore, I named the gene Tad-48-related-1/Fer1. Since 48-related-1/Fer1 and 48-related-2/Fer2 were previously classified into PTFb, which is closely related to PTFa/Fer3 (Ledent and Vervoort, 2001; Ledent et al., 2002; Simionato et al., 2007), this gene seems to correspond to “PTF” in the T. adhaerens genome paper (Srivastava et al., 2008).

Furthermore, eleven bHLH factors that seem to belong to Group A were obtained. I named them Tad-Oligo/Beta3-like, Tad-ASC-like-a through c, and Tad-GroupA-HLH-a through g. Tad-Oligo/Beta3-like tends to cluster with Oligo and Beta3 (Fig. 1). Tad-ASC-like-a through c tend to cluster with ASCa, ASCb and ASCc (Fig. 1). Tad-GroupA-HLH-a through g did not form a statistically significant cluster with any known orthologous family (Fig. 1). Because I was able to identify orthologous family members among putative orphan bHLH genes in the coral A. digitifera (Gyoja et al., 2012), I tried to see if there are also novel orthologous family members among these unassigned genes. However, at present I cannot identify any candidates in Trichoplax. These unassigned bHLH genes that seem to belong to Group A will be discussed later.

3.2. Group B

Group B contains AP4, Fig-α, MAD, max, MITF, Mlx, Mnt, Myc, SREBP, TF4, and USF (Simionato et al., 2007). Srivastava et al. (2008) reported


Table 1 bHLH genes identified in the Trichoplax adhaerens genome.

| Group | Gene name | Amino acid sequence used for analyses | Best hit protein in the human proteome (RefSeq) | Bootstrap support by NJ/bootstrap support by ML/posterior probability by BI | Best hit analyses |
|---|---|---|---|---|---|
| A | Tad-48 related 1 | gw1.3.716.1 | NP_835455 pancreas transcription factor 1 subunit alpha | 55/NS(a)/NS | MBH(b) |
| A | Tad-ASCb | gw1.3.591.1 | NP_982260 achaete-scute homolog 4 | 98/86/1.00 | MBH |
| A | Tad-E12/E47 | gw1.1.987.1 | XP_006722917 transcription factor E2-alpha isoform X4 | 100/97/1.00 | MBH |
| A | Tad-ASC-like-a | fgeneshTA2_pg.C_scaffold_3000318 | NP_982260 achaete-scute homolog 4 | | |
| A | Tad-ASC-like-b | fgeneshTA2_pg.C_scaffold_3000319 | NP_001257530 achaete-scute homolog 5 | | |
| A | Tad-ASC-like-c | scaffold3:2924380–2924443(+) | NP_001257530 achaete-scute homolog 5 | | |
| A | Tad-Oligo/Beta3-like | gw1.4.684.1 | NP_006152 neurogenin-1 | | |
| A | Tad-GroupA-HLH-a | fgeneshTA2_pg.C_scaffold_3001041 | NP_690862 fer3-like protein | | |
| A | Tad-GroupA-HLH-b | scaffold12:1797966–1798035(−) | NP_067014 neurogenic differentiation factor 4 | | |
| A | Tad-GroupA-HLH-c | fgeneshTA2_pg.C_scaffold_4000748 | NP_803238 class A basic helix–loop–helix protein 15 | | |
| A | Tad-GroupA-HLH-d | fgeneshTA2_pg.C_scaffold_12000116 | NP_002491 neurogenic differentiation factor 1 | | |
| A | Tad-GroupA-HLH-e | gw1.3.715.1 | NP_067014 neurogenic differentiation factor 4 | | |
| A | Tad-GroupA-HLH-f | gw1.6.563.1 | NP_002491 neurogenic differentiation factor 1 | | |
| A | Tad-GroupA-HLH-g | fgeneshTA2_pg.C_scaffold_5000807 | NP_690862 fer3-like protein | | |
| B | Tad-AP4 | fgeneshTA2_pg.C_scaffold_5000170 | NP_003214 transcription factor AP-4 | 96/78/0.86 | MBH |
| B | Tad-MAD | fgeneshTA2_pg.C_scaffold_24000016 | NP_005953 max-interacting protein 1 isoform a | 71/54/0.98 | MBH |
| B | Tad-max | fgeneshTA2_pg.C_scaffold_1000539 | NP_002373 protein max isoform a | 99/70/0.94 | MBH |
| B | Tad-MITF | fgeneshTA2_pg.C_scaffold_13000094 | NP_937821 microphthalmia-associated transcription factor isoform 6 | 100/88/1.00 | MBH |
| B | Tad-Mlx-like | gw1.9.522.1 | XP_006716081 carbohydrate-responsive element-binding protein isoform X4 | 45/32/NS | MBH |
| B | Tad-myc | fgeneshTA2_pg.C_scaffold_7000132 | NP_002458 myc proto-oncogene protein | 81/67/0.82 | MBH |
| B | Tad-myc-like | fgeneshTA2_pg.C_scaffold_5000465 | NP_002458 myc proto-oncogene protein | –/–/0.82 | NO(c) |
| B | Tad-SREBP | fgeneshTA2_pg.C_scaffold_38000033 | NP_004167 sterol regulatory element-binding protein 1 isoform b | 98/75/1.00 | MBH |
| B | Tad-TF4 | fgeneshTA2_pg.C_scaffold_2000730 | NP_937848 max-like protein X isoform alpha | 73/54/0.97 | MBH |
| B | Tad-USF | fgeneshTA2_pg.C_scaffold_35000027 | NP_997174 upstream stimulatory factor 2 isoform 2 | 79/65/0.97 | MBH |
| C | Tad-AHR | fgeneshTA2_pg.C_scaffold_5000325 | NP_001612 aryl hydrocarbon receptor precursor | 100/91/0.97 | MBH |
| C | Tad-ARNT | gw1.3.1240.1 | XP_005245214 aryl hydrocarbon receptor nuclear translocator isoform X7 | 91/54/0.97 | MBH |
| C | Tad-Hif/Sim/Trh-like-a | fgeneshTA2_pg.C_scaffold_5000254 | NP_033664 single-minded homolog 2 short isoform | | |
| C | Tad-Hif/Sim/Trh-like-b | fgeneshTA2_pg.C_scaffold_42000010 | NP_001158221 neuronal PAS domain-containing protein 3 isoform 1 | | |
| E | Tad-HELT | fgeneshTA2_pg.C_scaffold_10000135 | XP_005263046 hairy and enhancer of split-related protein HELT isoform X1 | 94/71/0.96 | MBH |
| E | Tad-HES | fgeneshTA2_pg.C_scaffold_6000335 | NP_005515 transcription factor HES-1 | 79/12/1.00 | MBH |
| E | Tad-HEY | fgeneshTA2_pg.C_scaffold_21000090 | XP_005266935 hairy/enhancer-of-split related with YRPW motif protein 2 isoform X1 | 84/29/0.61 | MBH |
| F | Tad-COE | gw1.8.267.1 | XP_006723663 transcription factor COE4 isoform X3 | | MBH |

(a) Not supported.
(b) Mutual best-hit relationships were obtained using the H. sapiens proteome.
(c) Mutual best-hit relationships were not obtained.
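The support values in Table 1 can be read against the orthologous-family criterion of Section 2.4 (bootstrap value > 50 or posterior probability > 0.50 in two or more analyses). The following Python sketch applies a strict reading of that criterion to NJ/ML/BI support strings. The function name `meets_criterion` is hypothetical, and the sketch deliberately ignores the relaxation applied to families such as myc or HES and the additional evidence (motifs, mutual best hits) used in the text.

```python
def meets_criterion(support, bootstrap_min=50.0, posterior_min=0.50,
                    methods_required=2):
    """Check an 'NJ/ML/BI' support string against the two-method criterion.

    Non-numeric entries such as 'NS' (not supported) or a dash count as
    failing the corresponding analysis.
    """
    nj, ml, bi = support.split("/")

    def exceeds(value, threshold):
        try:
            return float(value) > threshold
        except ValueError:
            return False  # 'NS', '–', or any other non-numeric marker

    supported = (exceeds(nj, bootstrap_min)
                 + exceeds(ml, bootstrap_min)
                 + exceeds(bi, posterior_min))
    return supported >= methods_required


# Support strings taken from Table 1.
for gene, support in [("Tad-48 related 1", "55/NS/NS"),
                      ("Tad-ASCb", "98/86/1.00"),
                      ("Tad-Mlx-like", "45/32/NS"),
                      ("Tad-myc-like", "–/–/0.82"),
                      ("Tad-HES", "79/12/1.00")]:
    print(gene, meets_criterion(support))
```

Under this strict reading, Tad-ASCb and Tad-HES pass, whereas Tad-48 related 1, Tad-Mlx-like, and Tad-myc-like do not, which is consistent with the extra analyses or the "-like" names these genes receive in the text.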

AP4, Max, Myc, SREBP, Bigmax (TF4), and USF genes in the T. adhaerens genome. Sebe-Pedros et al. (2011) reported an additional Group B gene, MAD, in this organism. The results of the present survey were consistent with those results (Fig. S1, Tables 1, 2). For the Mlx gene, statistical supports of the T. adhaerens candidate with known Mlx genes were low (Fig. S1, Table 1). Therefore, I named the gene Tad-Mlx-like. I also obtained two myc candidates (Fig. S1, Table 1). I named them Tad-myc and Tad-myc-like. In addition, the present survey identified one additional gene, MITF, in the T. adhaerens genome. Molecular phylogenetic analyses and mutual best-hit analysis supported this orthology (Fig. S1, Table 1). I named the gene Tad-MITF.
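The mutual best-hit test referred to above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline of Gyoja et al. (2012) or Satou et al. (2003): the function names are hypothetical, and the bit scores are invented solely to reproduce the qualitative pattern in Table 1, where Tad-myc and Tad-myc-like share the same best human hit (NP_002458) but only Tad-myc has a mutual best-hit relationship.

```python
def best_hits(scores):
    """Map each query to its top-scoring subject.

    scores: dict mapping (query, subject) pairs to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_hits(forward, reverse):
    """Return (query, subject) pairs that are each other's best hit."""
    fwd = best_hits(forward)   # e.g. Trichoplax protein -> human protein
    rev = best_hits(reverse)   # e.g. human protein -> Trichoplax protein
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Invented scores, for illustration only.
forward = {("Tad-myc", "NP_002458"): 120.0,
           ("Tad-myc-like", "NP_002458"): 80.0}
reverse = {("NP_002458", "Tad-myc"): 120.0,
           ("NP_002458", "Tad-myc-like"): 80.0}
print(mutual_best_hits(forward, reverse))
```

Here both Trichoplax proteins find the same human protein, but the human protein finds only Tad-myc in return, so only that pair is reported.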

3.3. Group C

Group C contains AHR, ARNT, Bmal, Clock, Cranky, Hif, Sim, SRC, and Trh (Simionato et al., 2007). Srivastava et al. (2008) reported AHR, ARNT, and two “Hif/Sim” in the T. adhaerens genome. The present survey yielded no other candidates for Group C genes (Fig. S1, Tables 1, 2). Molecular phylogeny for ARNT will be discussed below (Fig. 3).

3.4. Group D

Group D contains ID and pearl (Gyoja et al., 2012; Simionato et al., 2007). No candidate gene of this high-order Group was reported in the previous study nor identified in the present survey.

3.5. Group E

Group E contains HES, Hey, and clockwork orange (Gyoja and Satoh, 2013; Simionato et al., 2007). Many members of this high-order Group have a hairy/orange domain. HES protein often has a WRPW motif at its C-terminus, and Hey protein often has a YRPW motif at its C-terminus (Gyoja et al., 2012; Satou et al., 2003). Srivastava et al. (2008) reported two “HES/Hey” in the T. adhaerens genome. I identified three Group E candidates in the present survey. One of them was assigned to HES, and one was assigned to Hey (Fig. 4, Fig. S1). Mutual best-hit analysis also supported this orthology (Table 1). Therefore, I named these genes Tad-HES and Tad-Hey, respectively. Tad-HES has a WRPW motif in its C-terminus, further supporting the orthology (Table S1). I could not find a YRPW motif in the C-terminus of Tad-Hey (Table S1). The


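Table 2 Numbers of bHLH genes in H. sapiens, D. melanogaster, T. adhaerens, and A. queenslandica. Species abbreviations: H. sap = H. sapiens; D. mel = D. melanogaster; T. adh = T. adhaerens; A. que = A. queenslandica. The numbers of genes from the H. sapiens, D. melanogaster, and A. queenslandica genomes are from Simionato et al. (2007).

| Group | Orthologous family | H. sap | D. mel | T. adh | A. que |
|---|---|---|---|---|---|
| A | 48 related 1 | 1 | 1 | 1 | 0 |
| A | 48 related 2 | 0 | 1 | 0 | 0 |
| A | ASCa | 2 | 4 | 0 | 0 |
| A | ASCb | 3 | 0 | 1 | 0 |
| A | ASCc(a) | 0 | 0 | 0 | 0 |
| A | Amber | 0 | 0 | 0 | 0 |
| A | Atonal | 2 | 3 | 0 | 0 |
| A | Beta3 | 2 | 1 | 0 (1?) | 0 |
| A | Delilah | 0 | 1 | 0 | 0 |
| A | E12/E47 | 4 | 1 | 1 | 1 |
| A | Hand | 2 | 1 | 0 | 0 |
| A | Mesp | 3 | 1 | 0 | 0 |
| A | Mist | 1 | 1 | 0 | 0 |
| A | MyoD | 4 | 1 | 0 | 0 |
| A | MyoRa | 2 | 0 | 0 | 0 |
| A | MyoRb | 2 | 1 | 0 | 0 |
| A | Net | 1 | 1 | 0 | 0 |
| A | NeuroD | 4 | 0 | 0 | 0 |
| A | Neurogenin | 3 | 1 | 0 | 0 |
| A | NSCL | 2 | 1 | 0 | 0 |
| A | Oligo | 3 | 0 | 0 (1?) | 0 |
| A | Paraxis | 2 | 1 | 0 | 0 |
| A | Peridot | 0 | 0 | 0 | 0 |
| A | PTFa | 1 | 1 | 0 | 0 |
| A | SCL | 3 | 1 | 0 | 0 |
| A | Twist | 2 | 1 | 0 | 0 |
| B | AP4 | 1 | 1 | 1 | 1 |
| B | Fig-α | 1 | 0 | 0 | 0 |
| B | Mad | 5 | 0 | 1 | 1 |
| B | Max | 1 | 1 | 1 | 1 |
| B | MITF | 5 | 1 | 1 | 1 |
| B | MLX | 2 | 1 | 1? | 1 |
| B | Mnt | 1 | 1 | 0 | 0 |
| B | Myc | 4 | 1 | 1 or 2 | 1 |
| B | SREBP | 2 | 1 | 1 | 1 |
| B | TF4 | 1 | 1 | 1 | 0 |
| B | USF | 3 | 1 | 1 | 1 |
| C | AHR | 2 | 2 | 1 | 0 |
| C | ARNT | 2 | 1 | 1 | 1 |
| C | Bmal | 2 | 1 | 0 | 0 |
| C | Clock | 2 | 3 | 0 | 1? |
| C | Cranky | 1(d) | 1(d) | 0 | 0 |
| C | HIF | 3 (4?) | 1 | 0 (1?) | 0 |
| C | Sim | 2 (3?) | 1 | 0 (1?) | 0 |
| C | SRC | 3 | 1 | 0 | 0 |
| C | Trh | 1 (2?) | 1 | 0 (1?) | 0 |
| D | ID | 4 | 1 | 0 | 0 |
| D (?) | Pearl | 0 | 0 | 0 | 0 |
| E | Clockwork orange | 0 | 1 | 0 | 0 |
| E | HELT | 1 | 0 | 1 | 0 |
| E | HES | 7 (9?) | 11 | 1 | 0 |
| E | Hey | 3 (5?) | 1 | 1 | 1 |
| F | Coe | 4 | 1 | 1 | 1 |
| – | Unassigned | 4 | 0 | 13 to 15 | 6 |
| | Total number | 118(e) | 59(e) | 32 | 16 |

(a) See Simionato et al. (2008). We named this gene “ACSc” in Gyoja and Satoh (2013), but we would like to change the name to “ASCc” after the abbreviation used in H. sapiens and D. melanogaster.
(d) Identified in Satou et al. (2003) and Gyoja and Satoh (2013).
(e) Excluding cranky.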

remaining predicted protein showed similarity with H. sapiens HELT protein, Crassostrea gigas (oyster) protein, Euperipatoides kanangrensis (Onychophora) protein, and Saccoglossus kowalevskii (acorn worm) protein. These five proteins fell into a single clade (Fig. 4). Mutual best-hit analysis also supports the orthology (Table 1). Therefore, I concluded that these proteins form a novel and ancient orthologous family, and named it “HELT” according to the nomenclature of H. sapiens. Tad-HELT has a hairy/orange domain, supporting its placement within Group E (Table S1). I could not identify HELT candidates from Drosophila melanogaster, Caenorhabditis elegans, A. digitifera (coral), or N. vectensis (sea anemone) genomes.

3.6. Group F

Only a single orthologous family, COE, is categorized in Group F. I obtained a COE candidate in the T. adhaerens genome. Alignment of the amino acid sequence with H. sapiens COE proteins and the D. melanogaster ortholog knot suggested that the predicted protein is COE (Fig. S3). Mutual best-hit analysis also supported its orthology (Table 1).

3.7. Molecular phylogeny of ARNT and Bmal

As described in the Group C section, one T. adhaerens bHLH protein formed a single clade with known ARNT proteins (Fig. S1). In contrast, I could not find any candidates for Bmal, which is frequently clustered with ARNT. Simionato et al. (2007) reported that one A. queenslandica gene was associated with both ARNT and Bmal. The authors suggested that this presumably indicates an ancestral situation before gene duplication (Simionato et al., 2007). To re-examine this, I constructed a molecular phylogenetic tree for Group C using bHLH, PAS, and PAS_3 domains (Fig. 3). Unexpectedly, the A. queenslandica gene clustered with ARNT with high confidence, but not with ARNT/Bmal (Fig. 3). I consider that this incongruity occurred because (1) while Simionato et al. (2007) used the bHLH domain alone for molecular phylogenetic analysis, I used mainly bHLH, PAS, and PAS_3 domains; (2) while Simionato et al. (2007) used ClustalW for alignments, I used T-Coffee; (3) deletion methods for gaps and unaligned regions might also differ between the two studies; and (4) Tad-ARNT might affect the tree topology. Aligned sequences used for molecular phylogenetic analyses are shown in Fig. S4. According to the molecular phylogenetic tree by Sebe-Pedros et al. (2011), which used bHLH and PAS domains, this A. queenslandica gene also tends to cluster with ARNT, albeit with low statistical support (47% for ML and 0.93 for BI). Because longer alignments generally tend to yield better tree resolution, at present I have greater confidence in the present results, although the expression pattern and/or a functional analysis of the A. queenslandica gene may resolve the matter. If my analysis is correct, Bmal may have been lost secondarily from genomes of A. queenslandica and T. adhaerens, since N. vectensis and A. digitifera both have Bmal (Gyoja et al., 2012; Simionato et al., 2007). Alternatively, a gene duplication may have occurred after the split of the Placozoa and other Eumetazoa, and one of those genes may have remained similar to the ancestral form (ARNT), while the others diverged (Bmal), as Larroux et al. (2008) also suggested.

3.8. Evolution of Group A: 48-related-1/Fer1 and ASCb

Group A is the largest bHLH high-order group in the Bilateria. bHLH factors belonging to this group are mainly involved in neural and mesodermal differentiation (Massari and Murre, 2000). Simionato et al. (2007) identified E12/E47 in the A. queenslandica genome. In this demosponge, they found three additional unclassified genes that seem to belong to Group A (Simionato et al., 2007). This contrasts sharply

Fig. 1. A molecular phylogenetic tree of Group A bHLH factors, generated by Bayesian Inference, based on alignment of bHLH domains. Trichoplax Group A candidates and one representative from each known Group A orthologous family were used. Numbers at nodes indicate bootstrap values from the Neighbor Joining method, the Maximum Likelihood method, or the posterior probability of Bayesian Inference. A rooted tree is shown for simplicity. Proteins are designated by GenBank/DDBJ/EMBL accession numbers or FlyBase annotation IDs. Abbreviated species names include: Hs for Homo sapiens, Dm for Drosophila melanogaster, Adi for Acropora digitifera, and Tca for Tribolium castaneum. For each, the official name and the name of the orthologous family are provided.


Fig. 2. An amino acid alignment for C-terminal motif of 48-related-1/Fer1 predicted proteins. Species abbreviations: Tad = T. adhaerens; Adi = A. digitifera; Sko = Saccoglossus kowalevskii; Bf = B. floridae; Dr = Danio rerio; Xtr = Xenopus tropicalis; Tc = Tribolium castaneum; Dp = Daphnia pulex. For Bf19, Tc37, and Dp30, see Simionato et al. (2007).

with two Cnidarian species, N. vectensis and A. digitifera (Gyoja et al., 2012; Simionato et al., 2007), which both have at least 19 bHLH factors that belong to orthologous families of Group A (Gyoja et al., 2012; Simionato et al., 2007). Therefore, there seems to have been a great expansion of Group A orthologous families after divergence of the Porifera and (Cnidaria + Bilateria), but before the divergence of the Cnidaria and Bilateria (see also Simionato et al., 2007). The present survey allocated one T. adhaerens gene to ASCb, a second gene to 48-related-1/Fer1, and a third gene to E12/E47. All of them are included in Group A. E12/E47 genes are, at least in H. sapiens and D. melanogaster, ubiquitously expressed, and form heterodimers with other Group A bHLH factors, such as MyoD, which are expressed in a tissue-specific manner (Lassar et al., 1991; Massari and Murre, 2000). Therefore, among tissue-specific and evolutionarily conserved Group A bHLH factors, ASCb and 48-related-1/Fer1 seem to be, at present, the most ancient members (Fig. 5). What, then, is their function in T. adhaerens, which has neither neural nor mesodermal tissue (Srivastava et al., 2008)? The function of ASCb in other metazoans suggests an interesting possibility. ASCb in mice is called Sgn1, which is required for salivary gland development (Yoshida et al., 2001). ASCb in C. elegans is called hlh-6, which is required for pharyngeal gland differentiation (Smit et al., 2008). ASCb in the jellyfish Podocoryne carnea is called ash2, which is expressed in the digestive tract gland (Seipel et al., 2004). Therefore, the ancient ASCb may have functioned in intestinal glands for processing or passage of food. Whether such secretory function is a common feature of ancient Group A bHLH factors will be the subject of future studies.

3.9. Evolution of Group A: unassigned bHLH

Molecular phylogenetic analyses revealed that as many as 13 Trichoplax bHLH genes are unassigned bHLHs (Fig. S1, Table 1). Two of them, Tad-Hif/Sim/Trh-like-a and b, seem to belong to Group C (Fig. S1, Fig. 3, Table 1). Other members, however, all seem to belong to Group A, according to molecular phylogenetic analyses and similarity searches (Fig. S1, Table 1). Previous studies also claim that there are several unclassified Group A genes in the T. adhaerens genome (Srivastava et al., 2008; for review, see Degnan et al., 2009). This number is much higher than that in the A. queenslandica genome, in which only three unclassified Group A genes have been identified (Simionato et al., 2007). As described in the previous section, there seems to have been a great expansion of Group A bHLH orthologous families after the divergence of Porifera and (Cnidaria + Bilateria) and before the divergence of Cnidaria and Bilateria (see also Simionato et al., 2007). A similar situation was observed in Cnidaria, where many unclassified Group A genes were identified in genomes of three Cnidarians (Gyoja et al., 2012; Simionato et al., 2007). Several orthologous families of Group A,

Fig. 3. A molecular phylogenetic tree of Group C, except for SRC, generated by Bayesian Inference, based on alignment of bHLH, PAS and PAS_3 domains. Numbers at nodes indicate bootstrap values from the Neighbor Joining method, the Maximum Likelihood method, or the posterior probability of Bayesian Inference. A rooted tree is shown for simplicity. Proteins are designated by GenBank/DDBJ/EMBL accession numbers. A. queenslandica proteins were obtained using BLASTP searches against NCBI nr database using bHLH domains described in Simionato et al. (2007). ACUQ01002286 is a gene ID from the Joint Genome Institute USA genome browser: http://www.jgi.doe.gov/.


Fig. 4. A molecular phylogenetic tree of Group E, generated by Bayesian Inference, based on alignment of bHLH domains. Numbers at nodes indicate bootstrap values from the Neighbor Joining method, the Maximum Likelihood method, or the posterior probability of Bayesian Inference. A rooted tree is shown for simplicity. Proteins are designated by GenBank/DDBJ/EMBL accession numbers. Abbreviated species names include: Eka = Euperipatoides kanangrensis, and Cgi = Crassostrea gigas.

such as MyoD or NeuroD, seem to have been established in the Bilaterian clade after the split of Cnidaria and Bilateria (Simionato et al., 2007). Evolutionary scenarios for Group A bHLH genes in metazoa could be explained as follows (Fig. 5). (1) In early metazoan evolution, a few Group A bHLH factors may have been in an evolutionarily unstable, rapidly evolving state. Three unclassified Group A bHLH factors in A. queenslandica (Simionato et al., 2007) might have originated from these factors. (2) Some of these ancient unstable bHLH factors may have been transformed into evolutionarily stable orthologous families. 48-related-1/Fer1 and ASCb are possible examples. Molecular evolution of “stable” bHLH factors may be slower than that of “unstable” bHLH factors, and once bHLH genes are in the “stable state”, their orthology could be assessed by molecular phylogenetic analyses (Fig. 5). (3) While (2) occurred, other unstable bHLH factors may have remained unstable and increased in number (Fig. 5). Eleven unclassified Group A bHLH factors in T. adhaerens may have originated from these factors. (4) Some of these increased unstable bHLH factors may have been transformed into evolutionarily stable orthologous families, such as


Fig. 5. A simplified hypothesis for evolution of Group A bHLH factors. Abbreviation: CA for common ancestor. Each CA's bHLH components were inferred from A. queenslandica (Simionato et al., 2007), T. adhaerens (this study), A. digitifera (Gyoja et al., 2012), H. sapiens, C. elegans and D. melanogaster (Ledent and Vervoort, 2001; Ledent et al., 2002; Simionato et al., 2007). Evolutionarily conserved orthologous families are shown in blue. Evolutionarily less conserved orphan genes are shown in magenta. The transition from an “unstable state” to a “stable state” is shown in green.

twist and paraxis, which can be found in Cnidarian and Bilaterian genomes. (5) While (4) occurred, other unstable bHLH factors may have remained unstable and duplicated further. Fourteen or so unclassified Group A bHLH factors in N. vectensis and A. digitifera (Gyoja et al., 2012; Simionato et al., 2007) may have originated from these factors. (6) Some of the further increased unstable bHLH factors may have transformed into stable orthologous families such as MyoD and NeuroD which can be found in Bilaterian genomes. (7) There are some unclassified bHLH factors in extant Bilaterian genomes (Simionato et al., 2007). Whether they are survivors of unstable factors or whether they simply diverged from established orthologous families, remains to be elucidated. Thus, Group A bHLH factors may have required a preparatory period before orthologous families became established. Fig. 5 is based on a parsimonious and conservative view and each common ancestor might have more “stable orthologous families” and/or more “unstable genes”. There might also be some unknown, conserved, orthologous families among “unclassified genes” of T. adhaerens and/or Cnidaria. An orthologous family may branch out from another orthologous family, as I suggest in the section entitled “Molecular phylogeny of ARNT and Bmal”. Functional analyses of these unclassified Group A genes in Trichoplax and Cnidaria may provide an important key to solve the evolutionary scenario of Group A bHLHs, which have pivotal roles in neural and mesodermal differentiation in higher Metazoa.

4. Conclusions
A total of 32 bHLH genes were identified in the genome of the basal metazoan, T. adhaerens. These were categorized into 16 or 17 orthologous families. Among them, 48-related-1/Fer1 and ASCb seem to be the most ancient tissue-specific-type Group A orthologous families. Furthermore, 13 to 15 unassigned bHLH members were identified. This study provides information that will be important in understanding the evolution of bHLH factors in early metazoans.
Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gene.2014.03.024.

Conflict of interest
The author declares that she has no conflict of interest.

Acknowledgments
I thank Dr. Nori Satoh for his continuous encouragement of this work and for critical reading of the manuscript. I also thank all members of the Marine Genomics Unit for their support. I thank Dr. Steven D. Aird for editing my English and for useful advice on the manuscript, and two anonymous reviewers for their helpful comments.

References
Atchley, W.R., Fitch, W.M., 1997. A natural classification of the basic helix–loop–helix class of transcription factors. Proceedings of the National Academy of Sciences of the United States of America 94, 5172–5176.
Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., et al., 2002. The Pfam protein families database. Nucleic Acids Research 30, 276–280.
Burge, C., Karlin, S., 1997. Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology 268, 78–94.
Dang, C.W., Wang, Y., Chen, K.P., Yao, Q., Zhang, D.B., Guo, M., 2011. The basic helix–loop–helix transcription factor family in the pea aphid, Acyrthosiphon pisum. Journal of Insect Science 11, 84.
Degnan, B.M., Vervoort, M., Larroux, C., Richards, G.S., 2009. Early evolution of metazoan transcription factors. Current Opinion in Genetics & Development 19, 591–599.
Galtier, N., Gouy, M., Gautier, C., 1996. SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Computer Applications in the Biosciences 12, 543–548.
Gascuel, O., 1997. BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution 14, 685–695.
Gyoja, F., Satoh, N., 2013. Evolutionary aspects of variability in bHLH orthologous families: insights from the pearl oyster Pinctada fucata. Zoological Science 30, 868–876.
Gyoja, F., Kawashima, K., Satoh, N., 2012. A genomewide survey of bHLH transcription factors in the coral Acropora digitifera identifies three novel orthologous families, pearl, amber, and peridot. Development Genes and Evolution 222, 63–76.
Huelsenbeck, J.P., Ronquist, F., 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755.
Jones, S., 2004. An overview of the basic helix–loop–helix proteins. Genome Biology 5, 226.
Jones, D.T., Taylor, W.R., Thornton, J.M., 1992. The rapid generation of mutation data matrices from protein sequences. Computer Applications in the Biosciences 8, 275–282.
Larroux, C., Luke, G.N., Koopman, P., Rokhsar, D.S., Shimeld, S.M., Degnan, B.M., 2008. Genesis and expansion of metazoan transcription factor gene classes. Molecular Biology and Evolution 25, 980–996.
Lassar, A., Davis, R., Wright, W.E., Kadesch, T., Murre, C., Voronova, A., Baltimore, D., Weintraub, H., 1991. Functional activity of myogenic HLH proteins requires hetero-oligomerization with E12/E47-like protein in vivo. Cell 66, 305–315.
Ledent, V., Vervoort, M., 2001. The basic helix–loop–helix protein family: comparative genomics and phylogenetic analysis. Genome Research 11, 754–770.
Ledent, V., Paquet, O., Vervoort, M., 2002. Phylogenetic analysis of the human basic helix–loop–helix proteins. Genome Biology 3, 0030.1–0030.18.

Liu, A., Wang, Y., Dang, C., Zhang, D., Song, H., Yao, Q., et al., 2012. A genome-wide identification and analysis of the basic helix–loop–helix transcription factors in the ponerine ant, Harpegnathos saltator. BMC Evolutionary Biology 12, 165.
Liu, A., Wang, Y., Zhang, D., Wang, X., Song, H., Dang, C., Yao, Q., Chen, K., 2013. Classification and evolutionary analysis of the basic helix–loop–helix gene family in the green anole lizard, Anolis carolinensis. Molecular Genetics and Genomics 288, 365–380.
Massari, M.E., Murre, C., 2000. Helix–loop–helix proteins: regulators of transcription in eukaryotic organisms. Molecular and Cellular Biology 20, 429–440.
Moore, A.W., Barbel, S., Jan, L.Y., Jan, Y.N., 2000. A genomewide survey of basic helix–loop–helix factors in Drosophila. Proceedings of the National Academy of Sciences of the United States of America 97, 10436–10441.
Notredame, C., Higgins, D.G., Heringa, J., 2000. T-Coffee: a novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology 302, 205–217.
Ronquist, F., Huelsenbeck, J.P., 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574.
Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D.L., Darling, A., Höhna, S., et al., 2011. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic Biology 61, 539–542.
Satou, Y., Imai, K.S., Levine, M., Kohara, Y., Rokhsar, D., Satoh, N., 2003. A genomewide survey of developmentally relevant genes in Ciona intestinalis. I. Genes for bHLH transcription factors. Development Genes and Evolution 213, 213–221.
Sebe-Pedros, A., Mendoza, A., Lang, B.F., Degnan, B.M., Ruiz-Trillo, I., 2011. Unexpected repertoire of metazoan transcription factors in the unicellular holozoan Capsaspora owczarzaki. Molecular Biology and Evolution 28, 1241–1254.
Seipel, K., Yanze, N., Schmid, V., 2004. Developmental and evolutionary aspects of the basic helix–loop–helix transcription factors Atonal-like 1 and Achaete-scute homolog 2 in the jellyfish. Developmental Biology 269, 331–345.
Simionato, E., Ledent, V., Richards, G., Thomas-Chollier, M., Kerner, P., Coornaert, D., et al., 2007. Origin and diversification of the basic helix–loop–helix gene family in metazoans: insights from comparative genomics. BMC Evolutionary Biology 7, 33.
Simionato, E., Kerner, P., Dray, N., Le Gouar, M., Ledent, V., Arendt, D., et al., 2008. Atonal- and achaete-scute-related genes in the annelid Platynereis dumerilii: insights into the evolution of neural basic-helix–loop–helix genes. BMC Evolutionary Biology 8, 170.


Smit, R.B., Schnabel, R., Gaudet, J., 2008. The HLH-6 transcription factor regulates C. elegans pharyngeal gland development and function. PLoS Genetics 4 (10), e1000222.
Srivastava, M., Begovic, E., Chapman, J., Putnam, N.H., Hellsten, U., Kawashima, T., Kuo, A., Mitros, T., Salamov, A., Carpenter, M.L., Signorovitch, A.Y., Moreno, M.A., Kamm, K., Grimwood, J., Schmutz, J., Shapiro, H., Grigoriev, I.V., Buss, L.W., Schierwater, B., Dellaporta, S.L., Rokhsar, D.S., 2008. The Trichoplax genome and the nature of placozoans. Nature 454, 955–960.
Talavera, G., Castresana, J., 2007. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Systematic Biology 56, 564–577.
Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., Kumar, S., 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular Biology and Evolution 28, 2731–2739.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22, 4673–4680.
Wang, Y., Chen, K., Yao, Q., Wang, W., Zhi, Z., 2007. The basic helix–loop–helix transcription factor family in Bombyx mori. Development Genes and Evolution 217, 715–723.
Wang, Y., Chen, K., Yao, Q., Wang, W., Zhu, Z., 2008. The basic helix–loop–helix transcription factor family in the honey bee, Apis mellifera. Journal of Insect Science 8, 1–12.
Wang, Y., Chen, K., Yao, Q., Zheng, X., Yang, Z., 2009. Phylogenetic analysis of zebrafish basic helix–loop–helix transcription factors. Journal of Molecular Evolution 68, 629–640.
Yoshida, S., Ohbo, K., Takakura, A., Takebayashi, H., Okada, T., Abe, K., Nabeshima, Y., 2001. Sgn1, a basic helix–loop–helix transcription factor delineates the salivary gland duct cell lineage in mice. Developmental Biology 240, 517–530.
Zhang, D.B., Wang, Y., Liu, A.K., Wang, X.H., Dang, C.W., Yao, Q., Chen, K.P., 2013. Phylogenetic analyses of vector mosquito basic helix–loop–helix transcription factors. Insect Molecular Biology 22, 608–621.
Zheng, X., Wang, Y., Yao, Q., Yang, Z., Chen, K., 2009. A genome-wide survey on basic helix–loop–helix transcription factors in rat and mouse. Mammalian Genome 20, 236–246.