Phylogenetic analysis and evolution of aromatic amino acid hydroxylase

Phylogenetic analysis and evolution of aromatic amino acid hydroxylase

FEBS Letters 584 (2010) 4775–4782 journal homepage: www.FEBSLetters.org Phylogenetic analysis and evolution of aromatic amino acid hydroxylase Jun C...

1MB Sizes 0 Downloads 26 Views

FEBS Letters 584 (2010) 4775–4782

journal homepage: www.FEBSLetters.org

Phylogenetic analysis and evolution of aromatic amino acid hydroxylase Jun Cao a,⇑, Feng Shi b, Xiaoguang Liu a, Guang Huang a, Min Zhou a a b

Institute of Life Science, Jiangsu University, Xuefu Road 301, Zhenjiang 212013, Jiangsu, PR China Shandong Lvdu Bio-technique Industry, 169# Huanghe 2 Road, Binzhou 256600, Shandong, PR China

a r t i c l e

i n f o

Article history: Received 31 August 2010 Revised 29 October 2010 Accepted 5 November 2010 Available online 10 November 2010 Edited by Takashi Gojobori Keywords: Aromatic amino acid hydroxylase Phylogenetic analysis Evolution

a b s t r a c t A study was performed to investigate the phylogenetic relationship among AAAH members and to statistically evaluate sequence conservation and functional divergence. In total, 161 genes were identified from 103 species. Phylogenetic analysis showed that well-conserved subfamilies exist. Exon–intron structure analysis showed that the gene structures of AAAH were highly conserved across some different lineage species, while some species-specific introns were also found. The dynamic distribution of ACT domain suggested one gene fusion event has occurred in eukaryota. Significant functional divergence was found between some subgroups. Analysis of the site-specific profiles revealed critical amino acid residues for functional divergence. This study highlights the molecular evolution of this family and may provide a starting point for further experimental verifications. Ó 2010 Published by Elsevier B.V. on behalf of the Federation of European Biochemical Societies.

1. Introduction Neurotransmitters are endogenous chemicals that are released from neurons. They usually act on receptors on post-synaptic cells and produce functional changes in the target cells. Serotonin (5-HT), dopamine (DA), norephinephrine (NE), epinephrine and their metabolites are conventional neurotransmitters in the brain. The synthesis of these neurotransmitters needs some catalytic reaction of a class of rate-limiting enzymes, aromatic amino acid hydroxylases (AAAH). AAAHs are family of non-heme, iron(II)dependent enzymes. According to the different substrates, they can be divided into three categories: phenylalanine-4-hydroxylase (PAH), tyrosine hydroxylase (TH) and tryptophan hydroxylase (TPH). Among them, PAH catalyzes the conversion of phenylalanine to tyrosine, the rate limiting step in the catabolism of phenylalanine [1]. TH catalyzes the hydroxylation of tyrosine to L-dopa, the rate-limiting step in the synthesis of catecholamines such as dopamine and noradrenaline [2]. And TPH catalyzes the rate limiting step in the biosynthesis of serotonin [3]. In the chemical reaction, all three enzymes require the reduced pterin tetrahydrobiopterin as cofactor, as well as molecular oxygen and iron, to hydroxylate their amino acid substrate [4]. The aromatic amino acid hydrogenase-mediated neurotransmission plays a crucial role in mental and physical health. Some researches have also shown that the decline in physiological functions is often associated with (or accompanied by) changes in various neurotransmitter levels in the central nervous system ⇑ Corresponding author. E-mail address: [email protected] (J. Cao).

[5]. Alterations in the brain concentration of some neurotransmitter can produce behavior abnormalities such as depression, insomnia, autism, eating disorders, despair, misery, aggression and schizophrenia [6–11]. In addition, hydroxylation of PAH is an important step in phenylalanine catabolism and neurotransmitter biosynthesis and is linked to a severe variant of phenylketonuria (PKU) in humans [1]. Calvo et al. (2008) also show that Pah knockout worms display serious cuticle abnormalities [12]. TH deficiency can cause the autosomal recessive form of dopa responsive dystonia (Segawa’s disease) [13]. And several researches have shown that aging are often associated with the TPH activity [3,14]. In phylogenetic terms, the structural features or expression profiles of some AAAH homologs have been described partially in human [15,16], amphioxus [17], amoeba [18], tobacco hornworm [19]. Given the relatively high amino acid sequence similarity found among AAAH proteins [4], surprisingly, few studies have investigated their relationships. Since phylogenetic studies of protein family is a valuable tool to determine conserved or divergent regions, which can potentially lead to functional predictions [20]. In this study we elucidated the evolutionary history of the AAAH protein family by a comprehensive bioinformatics/phylogenetic approach, which might lay the framework for further research into function of these genes. 2. Materials and methods 2.1. Data collection To identify members of AAAH protein family, the key words of ‘‘aromatic amino acid hydroxylases’’ are firstly used as a query to

0014-5793/$36.00 Ó 2010 Published by Elsevier B.V. on behalf of the Federation of European Biochemical Societies. doi:10.1016/j.febslet.2010.11.005

4776

J. Cao et al. / FEBS Letters 584 (2010) 4775–4782

search against the NCBI database and superfamily HMM library and genome assignment server (http://supfam.org/SUPERFAMILY/) is also used as supplement. At the same time, the BLASTP and TBLASTN programs were used to search NCBI, Ensembl and FlyBase. All amino acid sequences are examined for potential aromatic amino acid hydroxylase domain for their authenticity using CD-Search analyses. 2.2. Sequence alignment and phylogenetic analysis Firstly, amino acid sequences of AAAH genes were aligned using Clustal X v2.0.12 program [21] with the default parameters. PHYLIP v.3.69 was used for our phylogenetic analysis employing the neighbor joining (NJ) method with 1000 bootstrap values. Meanwhile, to further verify the reliability of the NJ tree, maximum likelihood (ML) analysis was also performed using ProtTest v2.4 [22]. It is based on the PhyML program for ML optimizations, and the best-fit model considers the relative rates of amino acid replacement and the evolutionary constraints imposed by conservation of protein structure and function. The Akaike Information criterion (AIC) was implemented in ProtTest v2.4 to estimate the most appropriate model of amino acid substitution for tree building analyses. Then, according to the best-fit model predicted by ProtTest v2.4, a rooted maximum likelihood tree was constructed using the PhyML v3.0 online program [23], and the reliability of interior branches was assessed with 1000 bootstrap resamplings. Finally, the phylogenetic trees were displayed using MEGA v4 [24].

eight discrete categories and the Ka/Ks values are computed by calculating the expectation of the posterior distribution [26]. 2.4. Functional divergence analysis To estimate the level of functional divergence and predict important amino acid residues for these functional differences among AAAH subfamilies, the coefficients of type-I functional divergence were calculated using the method suggested by Gu et al. [29,30]. The analysis was carried out with Diverge (version 2.0). This method is based on maximum likelihood procedures to estimate significant changes in the site-specific shift of evolutionary rate or site-specific shift of amino acid properties after the emergence of two paralogous sequences. The advantage of this method is that it uses amino acid sequences and, thereby, is not sensitive to saturation of synonymous sites. Type I designates amino acid configurations that are very conserved in gene 1 but highly variable in gene 2, or vice versa, implying that these residues have experienced altered functional constraints [29]. The coefficients of functional divergence values are significantly greater than 0, suggesting site-specific altered selective constraints or a radical shift of amino acid physiochemical properties after gene duplication. Moreover, a site-specific posterior analysis was used to predict amino acid residues that were crucial for functional divergence [31].

3. Results and discussion 3.1. Identification of AAAH genes

2.3. Positive selection assessment Identification of site-specific positive and purifying selection was calculated with The Selecton Server using a Bayesian inference approach [25,26]. Since an evolutionary model describes in probabilistic terms how characters are expressive enough to describe the biological reality. This Server can implement several evolutionary models, such as MEC (Mechanistic Empirical Combination Model), M5 (gamma), M7 (beta), M8a (xs = 1) and M8 (xs P 1), each of which assumes different biological assumptions and enable contrasting different hypotheses through testing which model better fits the data. M8 allows for positive selection operating on the protein. A proportion p0 of the sites are drawn from a beta distribution (which is defined in the interval [0,1]), and a proportion p1(=1  p0) of the sites are drawn from an additional category xs (P1). Thus, sites drawn from the beta distribution are sites experiencing purifying selection, whereas sites drawn from the xs category are sites experiencing either neutral or positive selection. M8a model is similar to the M8 model, except for the fact that it does not allow for positive selection by setting xs = 1. Thus, only neutral and purifying selection are allowed here. M7 model is similar to M8, except for the fact that it assumes only a beta distribution with no additional category. Thus it allows mainly for purifying selection in the protein. M5 model assumes Ka/Ks among sites are gamma distributed and thus may allow for purifying, neutral, and positive selection [27]. The MEC model differs from the M models in that it allows instantaneous substitutions between pairs of codons that differ at 2 or 3 codon positions and in its ability to take into account the different replacement probabilities between amino acids [28]. The advantage of the MEC model over the other models presented here is that by treating different amino-acid replacements differently, Ka is computed differently: under the MEC model, a position with radical replacements will obtain a higher Ka value than a position with more moderate replacements. These models all assume a statistical distribution to account for heterogenous Ka/Ks values among sites. And the distributions are approximated using

We collected AAAH family gene sequences through database search. The result revealed the presence of AAAH homologs in much of organisms. For Eukaryota, we did not find definitive AAAH orthologs in Fungi and higher Plants. While AAAH analogs mainly exist in Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Bacteroidetes, Actinobacteria and so on in Prokaryota. Considering incomplete nucleotides and proteins sequence databases of some species, the number of AAAH genes collected in a species varied from 1 to 5 in Eukaryota and only an AAAH gene existed in one species in Prokaryota. In total, 102 and 59 AAAH sequences from 44 and 59 selected species in Eukaryota and Prokaryota, respectively, were obtained (Supplementary data 1). 3.2. Phylogenetic analyses of AAAH gene lineages To explore the phylogenetic relationship among AAAH paralogous, a neighbor joining phylogenetic tree with 161 AAAH genes from 103 species was inferred from the amino acid sequences using the PHYLIP v.3.69. At the same time, a rooted maximum-likelihood (ML) phylogenetic tree was aslo constructed using the PhyML v3.0 online program [23] under the best-fit model (LG+I+G) for amino acid substitution was selected by ProtTest v2.4 [22] with discrete gamma distribution in four categories. All parameters (gamma shape = 1.051; proportion of invariants = 0.008) were estimated from the dataset. Tree topology assessed by ML method was substantially similar to the NJ tree (data not shown). Fig. 1 shows the consensus phylogeny obtained for these sequences. As a whole, eukaryotic AAAH genes originated from prokaryotic ones. In prokaryotes, AAAH members within the same species tend to be grouped in the same branch, such as Gammaproteobacteria (denoted by solid square, AAAH-gamma); Betaproteobacteria (denoted by hollow diamond, AAAH-beta); Alphaproteobacteria (denoted by hollow triangle, AAAH-alpha) and so on. Phylogenetic analysis showed that three distinct clusters, named PAH (bootstrap value is 198), TPH (bootstrap value is 607) and TH (bootstrap value

J. Cao et al. / FEBS Letters 584 (2010) 4775–4782

4777

Fig. 1. Phylogenetic tree of 161 aromatic amino acid hydroxylases. This NJ tree was inferred from the amino acid sequences alignment using the PHYLIP v.3.69. This tree was validated by ML method under the best-fit model LG+I+G (selected by ProtTest v2.4) with discrete gamma distribution in four categories. All parameters (gamma shape = 1.051; proportion of invariants = 0.008) were estimated from the dataset. Red, blue and green evolutionary branches denote TH, TPH and PAH subfamilies in eukaryote, respectively. Evolutionary branch in black denotes AAAH members in prokaryote, in which different symbols are marked in different prokaryotic species, such as, hollow round: Actinobacteria; solid round: Green non-sulfur bacteria; hollow square: Bacteroidetes; solid square: Gammaproteobacteria; hollow triangle: Alphaproteobacteria; hollow diamond: Betaproteobacteria; solid diamond: Deltaproteobacteria; solid triangle in green: Acidobacteria; solid square in green: Archaea Korarchaeota; solid round in green: Firmicutes; solid diamond in green: Chlamydiae. The dynamic distribution of ACT domain in AAAH family has also been marked in this figure. That is, AAAH members with ACT domain are shown in red and those without ACT domain in black.

is 477), were unambiguously separated from Protists and lower Plants (Mosses/Green algae) before the divergence of Choanoflagellates in eukaryote. From Fig. 1, we also inferred that two major duplications had occurred in TPH and TH subfamilies. TPH duplication, which occurred in the vertebrate lineages, ultimately led to the emergence of two lineages which evolved into TPH1 (bootstrap value is 1000) and TPH2 (bootstrap value is 1000). However, for TH duplication, the condition is more complicated. Two TH genes emerged as a consequence of a whole genome duplication before the divergence of jawed vertebrates, but TH2 (bootstrap value is 999) was secondarily lost in eutherians [32]. Since whole genome duplication has occurred in the ancestral vertebrate [33], and analysis of a phylogenetic perspective may provide the basis for understanding the functional diversity within conserved protein

families. Our phylogenetic analysis suggests that the AAAH originated by duplication and divergence of a common protein at the base of the eukaryotic tree. Both small-scale and large-scale gene duplications are known to contribute to the complexity of eukaryotic organisms [34]. The high level of sequence identity between different subfamilies suggests evolutionarily conserved functions. While the presence of AAAHs in phylogenetically distant taxonomic groups such as bacteria, insects and mammals highlight their general functional importance. 3.3. Exon-intron evolution of the AAAH family genes To examine the possible mechanisms of structural evolution of AAAH paralogous, we compared the exon-intron structures of

4778

J. Cao et al. / FEBS Letters 584 (2010) 4775–4782

Fig. 2. Conservation and variability of intron positions and phases in eukaryotic AAAH family. Three subgroups [PAH (a); TPH (b); TH (c)] are shown, respectively in here. The 0, 1 and 2 phase introns are marked with black, red and green lines, respectively. To facilitate the presentation, some intron such as Ia, Ig, Ii and so on are also named artificially according to their different inserted positions.

individual AAAH genes in Eukaryota. Fig. 2 provides a detailed illustration of the distribution and position of introns within each of the AAAH paralogous. In order to facilitate the presentation, we named some introns, such as intron Ia, Ig, Ih and so on according to their different inserted positions. In general, the positions of some spliceosomal introns are conserved in orthologous genes from protists, sea anemone, insect to mammalia whereas others are lineage-specific. Such as, intron Id is insect-specific only for TPH and not likely gained by the ancestor. As well, intron Ik exists not only in TH subfamily of Vertebrata, Hemichordata and Echinodermate but also in PAH subfamily of Chordata and Echinodermate, but in other evolutionary branches (such as in TPH), this intron was lost. Intron Io exists specifically in PAH of ascidians and vertebrates, while is lost in Canis lupus (Fig. 2a). Intron Ih exists not only in vertebrata TH gene but also in insect TPH lineage. So it seems that a intron gain has been occured in these groups (Fig. 2b and c). It has been suggested that abundance of intron loss occured after segmental duplication [35]. Our study about TH and TPH duplication also proved this. Excepting the duplicate gene TH2, intron Is exists in all other TH group (Fig. 2c). So, the duplication leading to the emergence of TH1 and TH2 might promote the loss of intron Is in TH2. This phenomena of intron loss following gene duplication also occurs in TPH, in which intron Is is lost in TPH1 (Fig. 2b). It is thus clear that duplication plays an important role

in the organization of gene. In Fig. 2, some conserved introns (such as intron Ia, Ig, Ii, Ij, Il, Im, In) are also marked. While, from Fig. 2, we also find that the gene structures of Arthropods, Nenatodes and Flatworms for TH, TPH and PAH seem to have fewer conserved introns when compared to other eukaryotes. From a genomic perspective, changes in exon-intron structures appear to be predominantly dependent on lineage-specific trends [36]. Our analysis about AAAH provides further support to the hypothesis that intron conservation is usually found in slow-evolving lineages [37], but is rare in fast-evolving species like Caenorhabditis elegans and Drosophila melanogaster [38,39]. This lineage-specific intron evolution trends might be connected with generation time and population size, which might affect whether species evolve quickly or slowly. Perhaps, introns are disfavoured to gene expression in these species that are under strong selective pressure for short genome replication time. And chosen in part for their short generation times, for which short replication times are presumably an advantage [39]. Therefore, the exon-intron structure of the AAAH family is highly dynamic such that evolution of this gene family must have involved numerous intron losses or gains. In general, the structural diversity of gene family members is a mechanism for the evolution of multiple gene families, while intron loss or gain can be an important step in generating this structural diversity and complex-

J. Cao et al. / FEBS Letters 584 (2010) 4775–4782

4779

Fig. 2 (continued)

ity [31,40]. In this study, we analyzed the structural diversity of AAAH genes and found that intron loss and gain events occurred during the expansion and structural evolution of AAAH paralogous. And the number and position of intron loss or gain was distinctly different among them. On the average, intron positions have been shown to be remarkably well conserved over long evolutionary time intervals [41]. In this paper, we observed that the intron positions and phases of the AAAH family genes were well-conserved even in almost all of the lineages (Fig. 2). And some lineage-specific

intron loss or gain events have occurred during the expansion of AAAH genes and generated diversity of gene structure. 3.4. The dynamic distribution of ACT domain in AAAH genes The ACT domain is one of many different small molecule-binding domains characterized by high sequence divergence and evolutionary mobility, which can be fused into a wide variety of proteins in the course of protein evolution [42]. The majority of these can

4780

J. Cao et al. / FEBS Letters 584 (2010) 4775–4782

bind amino acids that function to regulate some aspect of amino acid metabolism, such as ACT domains in acetohydroxyacid synthase [43]; 3-phosphoglycerate dehydrogenase [44]; chorismate mutase [45] and so on. To describe the distribution of ACT domain in AAAH genes, CD conserved domain analyses were also performed for all AAAH proteins. The results indicated that ACT domain is not present in all investigated species but first appeared in the protists tetrahymena and its distribution was obviously dynamic. Fig. 1 marked this change. AAAH genes with ACT domain are shown in red and genes without ACT domain in black. That is, ACT domain did not appear in prokaryotic AAAH genes but was later obtained in eukaryotic ones in evolution. In general, the origin and evolution of multidomain proteins is usually driven by diverse processes including fusion, fission, domain shuffling by genetic recombination, which play important roles in the generation of new protein architectures [46,47]. The first presence of linked ACT domain in tetrahymena strongly suggests the gene fusion event occurred at least as early as the protists appearing. Although different from most of ACT-containing proteins described above, the ACT domains in AAAH can not bind amino acids or substrate analogs [1,48], an allosteric effect has been demonstrated in the ACT domain of PAH in mammalian [49]. So, the acquisition of ACT domain is likely to reflect the demand for greater sophistication in protein function in complex eucaryotic species. Strangely, some eukaryotic AAAH genes (such as AAAHs in moss, green algae and apicomplexans; THs in flatworms; TPH and TH in nenatodes; PAH in fruitfly; TPH and PAH in sea squirt; TPH2 in platypuses) have not ACT domain. We have no clear explanation for this status, but it may be due to the functional inactivation or change of some genes. Overall, the presence of ACT domain in new AAAH may have had a role in contributing to the higher complexity of eucaryota. The effect of this on the structure and function of closely related proteins in different species still remains to be examined, but our findings suggest that acquisition of ACT domain may play an important role in protein evolution.

Table 1 Likelihood values and parameter estimates for the AAH genes. Gene branches

Ka/Ksa

Loglikelihood

Positively selected sites

MEC

TH PAH TPH TPH1 TPH2 AAAH-alpha AAAH-beta

0.092 0.109 0.176 0.083 0.073 0.193 0.249

23076.2 22817.5 17294.6 7435.92 9278.76 7902.06 8150.08

AAAH-gamma

0.220

10730.8

not found 3, 4, 6, 11, 13 41, 167, 168 136 40 2, 53 2, 3, 5, 8, 31, 76, 80, 95, 173, 215, 273, 276, 277, 287, 289, 295 82, 85, 208

M5(gamma)

TH PAH TPH TPH1 TPH2 AAAH-alpha AAAH-beta AAAH-gamma

0.174 0.165 0.238 0.084 0.077 0.239 0.221 0.252

23787.3 23613.7 17749.4 7473.3 9405.83 8107.92 8383.13 11034.1

not not not not not not not not

found found found found found found found found

M7(beta)

TH PAH TPH TPH1 TPH2 AAAH-alpha AAAH-beta AAAH-gamma

0.162 0.152 0.204 0.077 0.072 0.194 0.201 0.211

23736.3 23610.8 17729.6 7463.58 9395.16 8097.31 8379.84 11024.6

not not not not not not not not

found found found found found found found found

M8a(xs = 1)

TH PAH TPH TPH1 TPH2 AAAH-alpha AAAH-beta AAAH-gamma

0.173 0.158 0.210 0.077 0.073 0.203 0.204 0.219

23750.2 23601.2 17731.5 7466.88 9396.42 8097.7 8379.45 11023.9

not 4 not not not not not not

found

TH PAH TPH TPH1 TPH2 AAAH-alpha AAAH-beta AAAH-gamma

0.162 0.166 0.228 0.082 0.074 0.205 0.252 0.248

23752.2 23597.1 17741.1 7435.92 17741.1 8092 8348 11026.4

not found 4, 5, 6, 11, 20 not found 136 not found not found not found 4, 82

M8(xs P 1)

3.5. Variable selective pressures among amino acid sites The Ka/Ks metric is defined to measure selection pressure on amino acid substitutions. A Ka/Ks ratio greater than 1 suggests positive selection and the ratio less than 1 suggests purifying selection. Amino acids in a protein sequence are expected to be under different selective pressure and to have different underlying Ka/ Ks ratios. To analyze positive or negative selection of specific amino acid sites within the full-length protein sequences of AAAH sequences for different branches, substitution rate ratios of nonsynonymous (Ka) versus synonymous (Ks) mutations were calculated with The Selecton Server using a Bayesian inference approach [26]. The results showed that the Ka/Ks ratios of the five subclades in eukaryota and three subclades in prokaryota are significantly different (Table 1). In general, compared with the five eukaryotic branches, three prokaryotic branches have higher Ka/ Ks, suggesting an accelerated evolutionary rate in prokaryotic AAAH group. This study agrees with previous studies implying that shorter generation times (or higher replication speed) are associated with accelerated evolutionary rates [50,51]. The Ka/Ks value of TPH is higher and the Ka/Ks values of TPH1 and TPH2 are lower in Eukaryota. Despite the differences in Ka/Ks values among these subclades, all the estimated Ka/Ks values are substantially lower than 1, suggesting that the AAAH members within each subgroup were under strong purifying selection pressure. We used MEC (Mechanistic Empirical Combination Model), M5 (gamma), M7 (beta), M8a (xs = 1) and M8 (xs P 1) models to perform the tests. The selection model M5 and M7 do not suggest presence of positively selected sites, whereas MEC and M8 models get similar results for the PAH and TPH1 genes and give different results for

Model

a

found found found found found found

The Ka/Ks ratio is the an average over all sites of gene branch alignments.

all others (Table 1). For example the MEC model predicted possible positively selected sites in all genes except TH. In contrast, the M8 model found no positively selected sites for TPH, TPH2, AAAH-alpha, and AAAH-beta. In addition, while both models predicted some sites for AAAH-gamma, far more were predicted by MEC (16) than M8 (2) and none of the sites were in common between the two methods. Interestingly, much of the positively selected sites inferred in our research occurred on the branches right before (TPH) or after (TPH1 and TPH2) duplication events. Functional differentiation of duplicate genes is extremely important for genetic novelty. Based on our phylogenetic analyses (Fig. 1), both TPH1 and TPH2 come from the duplication of ancestral TPH on the emergence of vertebrate. And previous researches have also observed that asymmetric evolution rate often consist in the duplicated gene pairs [52,53]. So we suggest that TPH may play a broader role before the onset of vertebrate, and the higher Ka/Ks in TPH could result from relaxation of functional constraint, and that TPH1 and TPH2 come from the TPH replication and likely refine and polarize the role of TPH in vertebrates, so they have a lower Ka/ Ks value. The putatively selected sites are also shown in Table 1, which might be responsible for the functional divergence among hydroxylase genes.

4781

J. Cao et al. / FEBS Letters 584 (2010) 4775–4782 Table 2 Functional divergence estimated in AAAH paralogous.

a b c d

Comparison

ha

SEb

LRTc

N(0.5)d

N(0.6)d

TH/PAH TH/TPH1 TH/TPH2 PAH/TPH1 PAH/TPH2 TPH1/TPH2 TH/AAAH-gamma TH/AAAH-alpha TH/AAAH-beta PAH/AAAH-gamma PAH/AAAH-alpha PAH/AAAH-beta TPH1/AAAH-gamma TPH1/AAAH-alpha TPH1/AAAH-beta TPH2/AAAH-gamma TPH2/AAAH-alpha TPH2/AAAH-beta AAAH-gamma/AAAH-alpha AAAH-gamma/AAAH-beta AAAH-alpha/AAAH-beta

0.08 0.232 0.3 0.0296 0.6176 0.588 0.6128 0.5336 0.652 0.5912 0.632 0.7064 0.3608 0.276 0.5168 0.6208 0.5592 0.812 0.3992 0.3992 0.1

0.058434 0.181894 0.217097 0.155206 0.129162 0.164921 0.054716 0.060708 0.066395 0.060716 0.068595 0.066342 0.13534 0.226091 0.161215 0.148392 0.192622 0.204739 0.052172 0.059226 0.059405

1.874355 1.62682 1.909571 0.036372 22.86347 12.71032 125.4314 77.258 96.43274 94.81306 84.88775 113.378 7.106959 1.490231 10.27626 17.50178 8.427928 15.72941 58.54659 45.43073 2.833705

1 2 2 0 122 123 75 53 118 60 90 119 10 2 115 122 123 129 35 34 3

1 1 1 0 120 122 56 46 63 50 56 102 4 0 89 114 81 128 22 24 2

h is the coefficient of functional divergence. SE: standard error. LRT is a likelihood ratio test. N(0.5) and N(0.6) means the numbers of divergent residues when the cut-off value is 0.5 and 0.6, respectively.

3.6. Analysis of functional divergence Next, we further investigated whether amino acid substitutions in the highly conserved hydroxylase domain could have caused adaptive functional diversification. Type-I functional divergence between gene clusters of the AAAH family were estimated by posterior analysis using the DIVERGE program algorithms [29,30], which evaluates the shifted evolutionary rate and altered amino acid properties. Pairwise comparisons of paralogous members in subfamilies AAAH-gamma, AAAH-beta and AAAH-alpha in prokaryota and TH, PAH, TPH1 and TPH2 in eukaryota were carried out and the rate of amino acid evolution at each sequence position was estimated, respectively. Our results, as shown in Table 2, indicated that the coefficient of functional divergence (h) values between AAAH subfamilies vary from 0.0296 to 0.812. These observations indicated that there were significantly site-specific altered selective constraints on most members of the AAAH family, leading to subfamily-specific functional evolution after diversification. Moreover, some critical amino acid residues responsible for the functional divergence were predicted based on site-specific profiles in combination with suitable cut-off values derived from the posterior probability of each comparison. And the results indicate that distinct differences in the number and distribution of predicted sites for functional divergence within each pair. For example, no critical amino acid site was predicted for the subfamily PAH/TPH1 pair, while over 100 critical amino acids sites were predicted for the subfamily TPH2/PAH (122) and TPH1/TPH2 (123) in eukaryota. In prokaryota, lower theta value (0.1) was detected for the AAAH-alpha/AAAH-beta comparison. Interestingly, when compared AAAH members in eukaryotic and prokaryotic species, only 2 critical amino acid sites were predicted for the subfamily TPH1/AAAH-alpha. Moreover, when the cut-off value is 0.6, no sites were predicted, implying a lower evolutionary rate between the TPH1/AAAH-alpha. Since TPH1 and TPH2 come from the duplication of ancestral TPH on the emergence of vertebrate, higher seta value (0.588) between them indicates higher evolutionary rata after the emergence of two lineages TPH1 and TPH2. During the long period of evolution, the shifted evolutionary rate at specific amino acid sites with in each pair might facilitate the functional divergence of AAAH subfamilies. Next, residues

predicted to be functionally divergent were mapped onto topology models of human and Pseudomonas syringae AAAH members (Supplementary data 2). The predicted functional sites are not equally distributed throughout the respective AAAH, but instead are clustered in some areas. And several sites have been experimentally verified, such as, P206S substitution can reduce thermal stability and solubility of TPH2 [54]; The K274E mutation of PAH gene often has physiological consequences related to amine neurotransmitter function [55]; Y325 of PAH influences iron binding and coupling efficiency [56], and so on. The results of the functional divergence analysis suggested that AAAH genes should be significantly functionally divergent from each other, owing to the evolutionary rate differences at some amino acid sites. Perhaps, amino acid mutations spured AAAH family genes to evolve some new functions after divergence. Hence, functional divergence might reflect the existence of long-term selective pressure. 4. Conclusion This study provides a comparative genomic analysis addressing the phylogenetic relationships and evolution of the aromatic amino acid hydroxylase gene family in Eukaryota. The results of the phylogenetic analysis revealed that duplication and deletion had occurred in TPH and TH subfamily. The exon-intron structure analysis showed that the gene structures were diverse, while some intron positions and intron phases were highly conserved across different lineages. Functional divergence analysis revealed critical amino acid residues, leading to subgroup-specific functional evolution after their phylogenetic diversification. Acknowledgment This work is partly supported by Jiangsu University Senior Personnel Research Grants to JC (10JDG027). Appendix A. Supplementary data Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.febslet.2010.11.005.

4782

J. Cao et al. / FEBS Letters 584 (2010) 4775–4782

References [1] Flatmark, T. and Stevens, R.C. (1999) Structural insight into the aromatic amino acid hydroxylases and their disease-related mutant forms. Chem. Rev. 99, 2137–2160. [2] Kopin, I.J., Gordon, E.K., Jimerson, D.C. and Polinsky, R.J. (1983) Relation between plasma and cerebrospinal fluid levels of 3-methoxy-4-hydroxyphenylglycol. Science 1219, 73–75. [3] Sanatiago, M., Machado, A., Reinoso-Suarez, F. and Cano, F. (1988) Changes in biogenic amines in rat hippocampus during development and aging. Life Sci. 42, 2503–2508. [4] Teigen, K., McKinney, J.A., Haavik, J. and Martinez, A. (2007) Selectivity and affinity determinants for ligand binding to the aromatic amino acid hydroxylases. Curr. Med. Chem. 14, 455–467. [5] Sparks, D., Markesbery, W. and Slevin, J. (1985) Age-associated alteration of serotonergic synaptic neurochemistry in rat rostral hypothalamus. Neurobiol. Aging. 6, 213–217. [6] Lerer, B., Gillon, D., Litchenberg, P., Gorfine, M. and Allolio, B. (1995) Interrelationship of age, depression, and central serotonergic function: evidence from fenfluramine challenge studies. Int. Psychogeriatr. 8, 83–102. [7] Maes, M., Ombelet, W., Verkerk, R., Bosmans, E. and Scharpe, S. (2001) Effects of pregnancy and delivery on the availability of plasma tryptophan to the brain: relationships to delivery-induced immune activation and early anxiety and depression. Psychol. Med. 31, 847–858. [8] Van Praag, H.M. (2004) Can stress cause depression? Prog. Neuropsychopharmacol. Biol. Psychiatry 28, 891–907. [9] Maron, E. and Shlik, J. (2006) Serotonin function in panic disorder: important, but why? Neuropsychopharmacology 31, 1–11. [10] Popova, N.K. (2006) From genes to aggressive behavior: the role of serotonergic system. Bioessays 28, 495–503. [11] Jans, L.A., Riedel, W.J., Markus, C.R. and Blokland, A. (2007) Serotonergic vulnerability and depression: assumptions, experimental evidence and implications. Mol. Psychiatry 12, 522–543. [12] Calvo, A.C., Pey, A.L., Ying, M., Loer, C.M. and Martinez, A. (2008) Anabolic function of phenylalanine hydroxylase in Caenorhabditis elegans. FASEB J. 22, 346–3058. [13] Bräutigam, C., Wevers, R.A., Jansen, R.J., Smeitink, J.A., de Rijk-van Andel, J.F., Gabreëls, F.J. and Hoffmann, G.F. (1998) Biochemical hallmarks of tyrosine hydroxylase deficiency. Clin. Chem. 44, 1897–1904. [14] Hussain, A.M. and Mitra, A.K. (2000) Effect of aging on tryptophan hydroxylase in rat brain: implications on serotonin level. Drug Metab. Dispos. 28, 1038–1042. [15] Boularand, S., Darmon, M.C. and Mallet, J. (1995) The human tryptophan hydroxylase gene. An unusual splicing complexity in the 50 -untranslated region. J. Biol. Chem. 270, 3748–3756. [16] Sugden, K., Tichopad, A., Khan, N., Craig, I.W. and D’Souza, U.M. (2009) Genes within the serotonergic system are differentially expressed in human brain. BMC Beurosci. 10, 50. [17] Patton, S.J., Luke, G.N. and Holland, P.W. (1998) Complex history of a chromosomal paralogy region: insights from amphioxus aromatic amino acid hydroxylase genes and insulin-related genes. Mol. Biol. Evol. 15, 1373–1380. [18] Siltberg-Liberles, J., Steen, I.H., Svebak, R.M. and Martinez, A. (2008) The phylogeny of the aromatic amino acid hydroxylases revisited by characterizing phenylalanine hydroxylase from Dictyostelium discoideum. Gene 427, 86–92. [19] Gorman, M.J., An, C. and Kanost, M.R. (2007) Characterization of tyrosine hydroxylase from Manduca sexta. Insect Biochem. Mol. Biol. 37, 1327–1337. [20] Eisen, J.A. (1998) Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res. 8, 163–167. [21] Larkin, M.A., Blackshields, G., Brown, N.P., et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948. [22] Abascal, F., Zardoya, R. and Posada, D. (2005) ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21, 2104–2105. [23] Guindon, S., Lethiec, F., Duroux, P. and Gascuel, O. (2005) PHYML Online – a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res. 33, W557–W559. [24] Tamura, K., Dudley, J., Nei, M. and Kumar, S. (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24, 1596–1599. [25] Doron-Faigenboim, A., Stern, A., Mayrose, I., Bacharach, E. and Pupko, T. (2005) Selecton: a server for detecting evolutionary forces at a single amino-acid site. Bioinformatics 21, 2101–2103. [26] Stern, A., Doron-Faigenboim, A., Erez, E., Martz, E., Bacharach, E. and Pupko, T. (2007) Selecton 2007: advanced models for detecting positive and purifying selection using a Bayesian inference approach. Nucleic Acids Res. 35, W506– W511. [27] Yang, Z., Nielsen, R., Goldman, N. and Pedersen, A.M. (2000) Codonsubstitution models for heterogeneous selection pressure at amino acid sites. Genetics 155, 431–449.

[28] Doron-Faigenboim, A. and Pupko, T. (2007) A combined empirical and mechanistic codon model. Mol. Biol. Evol. 24, 388–397. [29] Gu, X. (1999) Statistical methods for testing functional divergence after gene duplication. Mol. Biol. Evol. 16, 1664–1674. [30] Gu, X. (2001) Maximum-likelihood approach for gene family evolution under functional divergence. Mol. Biol. Evol. 18, 453–464. [31] Li, W., Liu, B., Yu, L., Feng, D., Wang, H. and Wang, J. (2009) Phylogenetic analysis, structural evolution and functional divergence of the 12-oxophytodienoate acid reductase gene family in plants. BMC Evol. Biol. 9, 90. [32] Yamamoto, K., Ruuskanen, J.O., Wullimann, M.F. and Vernier, P. (2010) Two tyrosine hydroxylase genes in vertebrates New dopaminergic territories revealed in the zebrafish brain. Mol. Cell Neurosci. 43, 394–402. [33] Dehal, P. and Boore, J.L. (2005) Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 3, e314. [34] Gu, X., Wang, Y. and Gu, J. (2002) Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution. Nat. Genet. 31, 205–209. [35] Lin, H., Zhu, W., Silva, J.C., Gu, X. and Buell, C.R. (2006) Intron gain and loss in segmentally duplicated genes in rice. Genome Biol. 7, R41. [36] Yandell, M., Mungall, C.J., Smith, C., Prochnik, S., Kaminker, J., Hartzell, G., Lewis, S. and Rubin, G.M. (2006) Large-scale trends in the evolution of gene structures within 11 animal genomes. PLoS Comput. Biol. 2, e15. [37] Putnam, N.H., Srivastava, M., Hellsten, U., et al. (2007) Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317, 86–94. [38] Roy, S.W. and Gilbert, W. (2005) Rates of intron loss and gain: implications for early eukaryotic evolution. Proc. Natl. Acad. Sci. USA 102, 5773–5778. [39] Roy, S.W. and Gilbert, W. (2006) The evolution of spliceosomal introns: patterns, puzzles and progress. Nat. Rev. Genet. 27, 211–221. [40] Zhang, Z. and Kishino, H. (2004) Genomic background predicts the fate of duplicated genes: evidence from the yeast genome. Genetics 166, 1995– 1999. [41] Lander, E.S., Linton, L.M., Birren, B., et al. (2001) Initial sequencing and analysis of the human genome. Nature 409, 860–921. [42] Anantharaman, V., Koonin, E.V. and Aravind, L. (2001) Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-moleculebinding domains. J. Mol. Biol. 307, 1271–1292. [43] Mendel, S., Elkayam, T., Sella, C., Vinogradov, V., Vyazmensky, M., Chipman, D.M. and Barak, Z. (2001) Acetohydroxyacid synthase: a proposed structure for regulatory subunits supported by evidence from mutagenesis. J. Mol. Biol. 307, 465–477. [44] Schuller, D.J., Grant, G.A. and Banaszak, L.J. (1995) The allosteric ligand site in the V-max-type cooperative enzyme phosphoglycerate dehydrogenase. Nat. Struct. Biol. 2, 69–76. [45] Zhang, S., Pohnert, G., Kongsaeree, P., Wilson, D.B., Clardy, J. and Ganem, B. (1998) Chorismate mutase-prephenate dehydratase from Escherichia coli: study of catalytic and regulatory domains using genetically engineered proteins. J. Biol. Chem. 273, 6248–6253. [46] Snel, B., Bork, P. and Huynen, M.A. (2002) Genomes in flux: the evolution of archaeal and proteobacterial gene content. Genome Res. 12, 17–25. [47] Nilsen, T.W. and Graveley, B.R. (2010) Expansion of the eukaryotic proteome by alternative splicing. Nature 463, 457–463. [48] Thorolfsson, M., Ibarra-Molero, B., Fojan, P., Petersen, S.B., Sanchez-Ruiz, J.M. and Martinez, A. (2002) L-Phenylalanine binding and domain organization in human phenylalanine hydroxylase: a differential scanning calorimetry study. Biochemistry 41, 7573–7585. [49] Siltberg-Liberles, J. and Martinez, A. (2009) Searching distant homologs of the regulatory ACT domain in phenylalanine hydroxylase. Amino Acids 36, 235– 249. [50] Gu, X. and Li, W.H. (1992) Higher rates of amino acid substitution in rodents than in humans. Mol. Phylogenet. Evol. 1, 211–214. [51] Duffy, S., Shackelton, L.A. and Holmes, E.C. (2008) Rates of evolutionary change in viruses: patterns and determinants. Nat. Rev. Genet. 9, 267–276. [52] Conant, G.C. and Wagner, A. (2003) Asymmetric sequence divergence of duplicate genes. Genome Res. 13, 2052–2058. [53] Fang, S., Ting, C.T., Lee, C.R., Chu, K.H., Wang, C.C. and Tsaur, S.C. (2009) Molecular evolution and functional diversification of fatty acid desaturases after recurrent gene duplication in Drosophila. Mol. Biol. Evol. 26, 1447–1456. [54] Cichon, S., Winge, I., Mattheisen, M., et al. (2008) Brain-specific tryptophan hydroxylase 2 (TPH2): a functional Pro206Ser substitution and variation in the 50 -region are associated with bipolar affective disorder. Hum. Mol. Genet. 17, 87–97. [55] Chao, H.M. and Richardson, M.A. (2002) Aromatic amino acid hydroxylase genes and schizophrenia. Am. J. Med. Genet. 114, 626–630. [56] Miranda, F.F., Kolberg, M., Andersson, K.K., Geraldes, C.F. and Martinez, A. (2005) The active site residue tyrosine 325 influences iron binding and coupling efficiency in human phenylalanine hydroxylase. J. Inorg. Biochem. 99, 1320–1328.