Structural-based analysis of dihydrofolate reductase evolution

Structural-based analysis of dihydrofolate reductase evolution

Molecular Phylogenetics and Evolution 61 (2011) 212–230 Contents lists available at ScienceDirect Molecular Phylogenetics and Evolution journal home...

5MB Sizes 2 Downloads 123 Views

Molecular Phylogenetics and Evolution 61 (2011) 212–230

Contents lists available at ScienceDirect

Molecular Phylogenetics and Evolution journal homepage: www.elsevier.com/locate/ympev

Structural-based analysis of dihydrofolate reductase evolution David Hecht a,⇑, Jonathan Tran b, Gary B. Fogel c,1 a

Southwestern College, 900 Otay Lakes Rd., Chula Vista, CA 91910, USA Brown University, 69 Brown St., Providence, RI 02912, USA c Natural Selection, Inc., 9330 Scranton Rd., San Diego, CA 92121, USA b

a r t i c l e

i n f o

Article history: Received 12 March 2011 Revised 27 May 2011 Accepted 8 June 2011 Available online 17 June 2011 Keywords: DHFR Phylogeny Evolution Sequence alignment

a b s t r a c t The evolution of dihydrofolate reductase (DHFR) was studied through a comprehensive structural-based analysis. An amino acid sequence alignment was generated from a superposition of experimentally determined X-ray crystal structures of wild-type (wt) DHFR from the Protein Data Bank (PDB). Using this structure-based alignment of DHFR, a metric was generated for the degree of conservation at each alignment site – not only in terms of amino acid residue, but also secondary structure, and residue class. A phylogenetic tree was generated using the alignment that compared favorably with the canonical phylogeny. This structure-based alignment was used to confirm that the degree of conservation of active-site residues in terms of both sequence as well as structure was significantly greater than non-active site residues. These results can be used in helping to understand the likely future evolution of DHFR in response to novel therapies. Ó 2011 Elsevier Inc. All rights reserved.

1. Introduction DHFR is an essential enzyme that catalyzes the reduction of 7,8dihydrofolate to 5,6,7,8-tetrahydrofolate (Blakely, 1985, 1995). Given that this is a key step in the synthesis of purines, pyrimidines, and several amino acids, it is not surprising that the overall fold and core three dimensional structure of DHFR has been highly conserved over a wide range of organisms (Wallace and Matthews, 2002). Inhibition of DHFR effectively blocks DNA synthesis and leads to cell death (Ferone, 1977). DHFR is also considered as a therapeutic target for cancer (Huennekens, 1994), and because of its relevance, a large quantity of detailed structure and activity data exists including 200 X-ray crystal structures deposited in the Protein Data Bank (PDB). Because inhibition of DHFR kills prokaryotes such as Staphylococcus aureus and Mycobacterium tuberculosis, unicellular eukaryotes such as Plasmodium falciparum (malaria) as well as cancer cells, DHFR has been a very attractive therapeutic target (Ferone, 1977; Huennekens, 1994; Rollo, 1970). In response to the widespread application of antibiotics and anti-malarial drugs, resistance has quickly evolved in many of these organisms. The underlying mechanisms, dynamics, and kinetics leading to DHFR-based

⇑ Corresponding author. E-mail addresses: [email protected] (D. Hecht), [email protected] (J. Tran), [email protected] (G.B. Fogel). 1 Fax: +1 858 455 1560. 1055-7903/$ - see front matter Ó 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.ympev.2011.06.005

drug resistance, is a very active area of current research (Bertino, 2009; Mita et al., 2009; Volpato and Pelletier, 2009; Zhanel et al., 2006; Zhao et al., 2007). Chromosomal DHFR has a highly conserved structural motif including a well-defined active site and catalytic mechanism (Wallace and Matthews, 2002). Extensive structural and activity studies have been performed on DHFR making it one of the most extensively studied enzymes. DHFR has been used as the basis for phylogenetic analyses (Iwagami et al., 2007; O’Neil et al., 2003) as well as simulation studies of evolutionary processes (Vartanian et al., 2001; Iwakuri et al., 2006). Here we studied the evolution of DHFR to better understand the frequency, position dependence, and types of amino acid substitutions that occurred during interspecies differentiation. With this data in mind we plan to study the intraspecies differentiation that occurred in drug resistant strains of S. aureus, M. tuberculosis, and P. falciparum. Ultimately the goal of this research is to gain insights into the likely future evolution of DHFR that will be of assistance in developing novel therapies. In this paper we present a structure-based alignment of wild type DHFR protein sequences from 22 species representing broad phylogenetic diversity. Using this structure-based alignment, an analysis was performed to evaluate the degree of site-specific variability/conservation of both sequence as well as structure. These were compared with an existing sequence-based (ClustalW). As part of the evaluation of this approach, phylogenetic trees were generated and compared to the canonical phylogeny. Additionally, these trees were compared with the phylogenetic trees resulting from ClustalW. Finally, an analysis was performed to determine

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

213

Fig. 1. The structural superimposition of DHFR proteins from 22 species obtained from the PDB (overall rmsd = 1.409 Å).

Fig. 2. Structural alignment of 22 DHFR sequences with corresponding secondary structure residue masks. The first 22 sequences represent the actual residues in the sequence. The last set of 22 sequences are labeled for the type of secondary structure present: for alpha helix, for beta-sheet. Residues 66 Å from bound NADPH and the active site are labeled with ‘‘⁄’’. PDB identifiers follow each species name.

214

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

Fig. 2 (continued)

differences in the degree of conservation in active site residues versus non-active site residues. 2. Materials and methods 2.1. Structure-based sequence alignments Following a search for all DHFR structures deposited in the PDB, 188 DHFR X-ray crystal structures and associated protein sequences were downloaded in FASTA format, and imported into a spreadsheet where they were sorted by species and by their classification as either a wild-type or a mutant sequence. For species having more than one wild-type .pdb file, the FASTA sequences were compared and in several cases minor differences were iden-

tified. One .pdb structure file containing a wild-type sequence was selected at random for each of 22 unique prokaryotic and eukaryotic species. Using Deepview (Guex and Peitsch, 1997) each structure was ‘‘cleaned’’ by removing all waters and ions. In addition, each .pdb file was truncated to include only a monomer of DHFR in each .pdb file. These were then imported into MOE (http://www.chemcomp.com) where they were structurally aligned using MOE’s ‘‘Superpose’’ function (Fig. 1). The overall rmsd for the 22 structures was 1.409 Å. The aligned sequences were then exported as a .fasta file for import into BioEdit (Hall, 1999) for further alignment. Several species including P. falciparum, Plasmodium vivax and Chloroflexus aurantiacus have substantial sequence insertions particularly in unstructured regions of the alignment such as residues 20–31,

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

215

Fig. 2 (continued)

residues 47–83 and residues 101–125 (Fig. 2). Because of these large insertions, it was difficult to properly align the residues in these regions for the other sequences. Manual alignments were performed using the superposition of the 22 DHFR structures as a reference. For example, from visual inspection of the 22 superimposed structures the highly conserved PW was manually aligned to be in positions 84, 85 and not 47, 48. For each sequence, two masks were generated labeling each residue: one for the type of secondary structure present (alpha helix (H) or beta-sheet (S)) (Fig. 2), and one for the amino acid residue class (acid (A), base (B), non-aromatic hydrophobic (N), noncharged polar (Q), aromatic (R), proline (P), glycine (G)) (Fig. 3).

These residue classes reflect commonly used classifications of amino acids based on both side chain functionality and the underlying codon. Given the critical role that proline and glycine play in secondary structure elements such as turns and helices, they were given separate classifications. 2.2. ClustalW alignment The FASTA sequences for the 22 species were imported into MEGA 5.0 (Tamura et al., 2007, 2011). A full sequence alignment was performed using the ClustalW (Larkin et al., 2007) algorithm with default settings (Fig. 4).

216

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

Fig. 2 (continued)

2.3. Active site analysis

2.4. DHFR phylogenetics

Each of the 22 superimposed structures were truncated in MOE to include only residues 6 Å from bound NADPH and the active site residues (Fig. 5). The overall rmsd was 0.864 Å. The structurebased sequence alignment (Fig. 2) was likewise truncated in BioEdit to include residues 6 Å from bound NADPH and the active site residues (Figs. 6 and 7). The active site sequence alignment was exported as a FASTA file and uploaded into Weblogo (Crooks et al., 2004; Schneider and Stephens, 1990) in order to generate a graphical analysis of sequence conservation on a position-by-position basis (Fig. 8).

A ‘‘canonical’’ phylogenetic tree of the 22 species was generated from publically available tools at www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi by typing in the names of the 22 species. The resulting phylogeny was saved in Phylip tree format (.phy), and imported into MEGA for 5.0 for formatting (Fig. 9). Phylogenetic trees were generated with MEGA 5.0 using the Maximum Likelihood method with the following parameters: Dayhoff amino acid substitution model; Uniform Rates of site-specific mutation; Complete Deletion of gaps and missing data; and

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

217

Fig. 2 (continued)

Nearest-Neighbor Interchange heuristic tree inference method. Each phylogeny was evaluated using the bootstrap method with 500 replications. These included trees generated from the structure-based sequence alignments of the entire sequences of DHFR (Fig. 10) as well as the truncated sequences consisting of residues 6 Å from bound NADPH and active site residues (Fig. 11). Also included was the phylogenetic tree generated from the ClustalW alignment (Fig. 12). TOPD/FMTS (Puigbò et al., 2007) with default settings was used to perform pair-wise comparisons of the canonical DHFR phylogenetic tree (Fig. 9) with each of the trees resulting from: the complete structure-based alignment (Fig. 10); the truncated active-site alignment (Fig. 11); and the ClustalW alignment (Fig. 12). Additionally, the eukaryotic and prokaryotic clades

from the complete structure-based alignment were compared separately to the corresponding clades from the canonical tree. Three different algorithms were used to evaluate the differences between the trees. These include the split difference based on partition metrics (Robinson and Foulds, 1981), the Nodal Distance based on path length metrics (Steel and Penny, 1993) and the Disagree Distance which identifies the fraction of taxa that disagree between two trees (Penny and Hendy, 1985). One hundred comparisons using randomized trees generated with a Markovian algorithm were performed with each method for each pair of trees in order to determine if the similarity was greater than random (Puigbò et al., 2007). The results of these analyses are presented in Table 1.

218

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

Fig. 2 (continued)

PAML (Yang, 1997, 2007) was used to test if there was a difference in the trees generated assuming a non-uniform evolutionary rate among the sites of the sequences. Default values for codeml were used (see Fig. 13). TOPD/FMTS was used to compare this tree to the canonical one (Table 1). 2.5. Determination of sequence and structure conservation and variability The structure-based alignment was analyzed in an attempt to quantify the frequency of site-specific mutations and the degree of sequence conservation. BioEdit was used to calculate the frequency of finding each of the 20 naturally occurring amino acids at each and every position in the 22 aligned sequences. For exam-

ple, at position 91 (Fig. 2) there were 12 aspartic acid residues, 9 glutamic acid residues, and 1 asparagine residue. The calculated amino acid frequencies (at position 91) are: 12/22 or 54.54% D, 9/22 or 40.91% E and 1/22 or 4.55% N. For cases where one or more gaps are present at a particular position in the alignment, the gaps were assigned a null value. For example, position 20 (representing the first residue in a short insertion present in both P. falciparum and P. vivax but not any of the other species) was calculated as 2/22 or 9.09% V and null for all the other amino acids. Likewise for each position in the 22 aligned sequences, the frequency of finding a residue in either an a-helix or b-sheet conformation was calculated. For example, as each of the residues located at position 91 (Fig. 2) was in an a-helix conformation, this position

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

219

Fig. 3. Residue masks for the structural alignment of 22 wild type DHFR sequences representing the amino acid residue class: for acid, for base, N for non-aromatic hydrophobic, for non-charged polar, R for aromatic, for proline, for glycine. Residues 6 Å from bound NADPH and the active site are labeled with ⁄.

was assigned a value of ‘‘100% a-helix.’’ Similarly, the frequency of amino acid class (e.g., acid, base, glycine, proline, non-aromatic hydrophobic, aromatic, polar) at each position in the alignment was also calculated. The average amino acid frequencies for each position in the alignment were then calculated. The higher this average, the greater the degree of conservation at this particular position. The lower this average, the greater the tolerance for amino acid variability. For example, positions 132, 206 and 207 in Fig. 2, each had an average value of 100% representing complete conservation (e.g., glycine). Likewise, the average frequency for secondary structure and residue class was calculated at each position. The average of the average frequency per position was then calculated for amino acid, secondary structure, and amino acid class. These values provided a metric for the degree of conservation or variability at each site in the alignment – not only in terms of amino acid residue, but also secondary structure and residue class. For purposes of this paper these values will be called Amino Acid Con-

servation, Secondary Structure Conservation, and Residue Class Conservation respectively (Table 2). 3. Results and discussion 3.1. Comparison of the structure-based alignment and the ClustalW alignment From visual inspection of the structure-based and the ClustalW alignments, differences in the relative alignments of active site as well as non-active site residues were observed. Differences in active-site residue alignments are highlighted in Fig. 4. The most significant of these differences exist in the alignment of C. aurantiacus in which residues in potions 37–46 in the structure-based alignment (Fig. 2) were shifted left and out of the active site in the ClustalW alignment (Fig. 4). Additionally, insertions of GP for C. aurantiacus and SP for T. maritima in positions 145–46 caused a shift of the neighboring residue E and A

220

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

Fig. 3 (continued)

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

221

Fig. 4. Sequence alignment of 22 DHFR wt sequences using the default settings for ClustalW in Mega 5.0 (21–23). Residues 66 Å from bound NADPH and the active site are labeled with ‘‘⁄’’. Differences in active site residues from the structure-based alignment (Fig. 2) are highlighted in yellow. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

(respectively) out of the active site. There were also multiple misalignments in positions 107, 127–8, and 206–7. Finally, in the ClustalW alignment, residues 145–6 for C. albicans were shifted out of the active site. Between the structure-based and the ClustalW alignments, there existed numerous differences in the alignments of non-active site residues particularly in non-structured regions or regions with insertions. These included positions: 22–41 (or P. falciparum and P. vivax); 67–92; 129–142; as well as positions 155–165. The most extreme differences in the alignments occured at the C-terminus in which residues in positions 250–265 for the structure-based alignment (Fig. 2) were shifted right to positions 232–245 in the ClustalW alignment (Fig. 4) and the last 20 residues were compressed. Given the extent of the differences in the alignments generated by these two approaches, it is not surprising that the phylogenetic trees (Figs. 10 and 12) have corresponding differences as well. This will be discussed in greater detail in Section 3.3.

3.2. Determination of sequence and structure conservation and variability For the structure-based alignment the average value over all alignment positions for Amino Acid Conservation was 12.55 ± 1.83% (Table 2). This ranges from a value of 4.54 representing the case where each of the 22 sequences have a different amino acid at each position to a value of 100.00 representing complete conservation at each and every position. The calculated value of 12.55% is on the low side of this range and is indicative of a relative lack of site-specific sequence conservation. The value for Amino Acid Conservation for the ClustalW alignment was qualitatively similar: 13.71 ± 2.12%. The value for Secondary Structure Conservation was higher, 46.90 ± 5.84% (Table 2). This is to be expected given the 1.409 Å rmsd calculated from the structural alignment (Fig. 1). Likewise the value for Residue Class Conservation was also higher, 19.85 ± 2.90% (Table 2). These results suggest that although a

222

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

Fig. 4 (continued)

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

223

Fig. 5. Overlay of the DHFR active sites from 22 species obtained from the PDB and truncated to residues 6 Å from bound NADPH and the active site (overall rmsd = 0.864 Å).

Fig. 6. Structural alignment of 22 DHFR sequences truncated to residues 6 Å from bound NADPH and the active site with corresponding secondary structure residue masks. Residues >6 Å from bound NADPH and the active site are not shown and are designated by ‘‘–’’. The first 22 sequences represent the actual residues in the sequence. The last set of 22 sequences are labeled for the type of secondary structure present: for alpha helix, for beta-sheet. PDB identifiers follow each species name.

224

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

Fig. 7. Residue masks for the structural alignment of 22 wild type DHFR sequences truncated to residues 6 Å from bound NADPH and the active site with corresponding secondary structure residue masks representing the amino acid residue class: for acid, for base, N for non-aromatic hydrophobic, for non-charged polar, R for aromatic, for proline, for glycine. Residues >6 Å from bound NADPH and the active site are not shown and are designated by ‘‘–’’.

Fig. 8. Weblogo generated graphical representation of position by position sequence conservation for the structural alignment of 22 wild type DHFR sequences truncated to residues 6 Å from bound NADPH and the active site. The larger the size of the letters the greater the degree of conservation.

substantial degree of site-specific sequence variability exists across the 22 species, there is less tolerance for mutations that result in changes in secondary structure or residue class for DHFR relative to amino acid replacement. The DHFR sequences of P. falciparum, P. vivax and C. aurantiacus exist as outliers in the alignment as each has one or more extended sequence insertions. For P. falciparum and P. vivax, these residues are involved in protein–protein interactions with thymidylate synthase, and are not directly related to the catalytic activity of the active site (Sirawaraporn et al., 1997; Yuvaniyama et al., 2003; Kongsaeree et al., 2005). With the minor exceptions of the residues immediately neighboring these insertions (e.g., Fig. 2, positions 18, 19 and 37, 38 for P. falciparum, and P. vivax respectively) there does not appear to be any affect on the alignment of the ‘‘core’’ DHFR domain. Similarly, the large insertions in the sequence of C. aurantiacus do not appear to affect the alignment of the active site nor the ‘‘core’’ DHFR domain (3KGY.pdb, data not published). The function of the insertion remains unknown. A similar analysis was performed for the structure alignments of the active site residues (Table 2). Not surprisingly, the value of

Amino Acid Conservation for the active site, 27.08 ± 7.53%, was more than twice the Amino Acid Conservation for the entire sequence alignment. The value for Secondary Structure Conservation was also higher, 80.11 ± 10.73% indicating minimal tolerance for changes in secondary structure type. Likewise the value for Residue Class Conservation was also more than twice the total sequence alignment, 42.79 ± 10.17%. A graphical analysis of positional conservation for the active site residues (Fig. 8) suggests that positions where glycine, proline or acidic amino acids were present were among the most highly conserved. Interestingly, most of the variation arose from positions where non-aromatic residues were found (Figs. 7 and 8). Even though these positions were variable with regards to amino acid type, secondary structure class and residue class were highly conserved. 3.3. DHFR phylogenetics The ‘‘canonical’’ phylogenetic tree of the 22 species (Fig. 9) was used as a reference for evaluating the phylogenetic trees

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

225

Haloferax volcanii Cryptosporidium hominis Apicomplexa

Babesia bovis Aconoidasida

Plasmodium vivax

Plasmodium

Plasmodium falciparum Eukaryota

Gallus gallus Amniota

Mus musculus Euarchontoglires

Homo sapiens

Fungi/Metazoa group

Pneumocystis carinii Ascomycota

Candida glabrata

Saccharomycetales

Candida albicans Thermotoga maritima Chloroflexus aurantiacus Mycobacterium tubercuolsis Mycobacterium

Mycobacterium avium Lactobacillus casei Lactobacillales

Bacteria

Streptococcus pneumoniae Bacilli

Staphylococcus aureus Bacillales

Geobacillus stearothermophilus Bacillaceae

Bacillus anthracis Moritella profunda Gammaproteobacteria

Escherichia coli Fig. 9. Taxonomic tree of 22 species from www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi.

resulting from the structure-based alignment of DHFR. Table 1 presents the output from a pairwise comparison of each phylogenetic tree (Figs. 10–12) with the canonical tree (Fig. 9) using three different metrics to evaluate the differences between the trees. For the split difference and the Disagree Difference metrics, two truly random trees would provide a value of 1. For

the Nodal Difference, the larger the value, the greater the tree dissimilarity. Based on this analysis, the structure-based alignment produced a phylogenetic tree (Fig. 10) that was closest to the canonical one (Fig. 9). The eukaryotic clade was fairly close to the canonical with the notable exception of the ordering of

226

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

Mus musculus

42 98

Homo sapiens Gallus gallus

49

Pneumocystis carinii Candida glabrata

37 48

84

Candida albicans Babesia bovis

53

Cryptosporidium hominis 29

9

Plasmodium falciparum 93

Plasmodium vivax

95

Mycobacterium tuberculosis Mycobacterium avium Moritella profunda

75 5

Escherichia coli

38

Bacillus anthracis

4

Streptococcus pneumoniae

47 8

Lactobacillus casei

11

Staphylococcus aureus 22

Geobacillus stearothermophilus Haloferax volcanii Chloroflexus aurantiacus Thermotoga maritima

1.2

1.0

0.8

0.6

0.4

0.2

0.0

Fig. 10. Phylogenetic tree generated from structural alignment of complete DHFR sequences from 22 species using MEGA.

Plasmodium which was demoted to the Apicomplexa clade together with Babesia bovis and C. hominis. This is also reflected in the relatively low values for the calculated Split and Nodal Differences. In the prokaryotic clades there were several differences. Most notably, T. maritima and C. aurantiacus were considered as an outgroup from the prokaryotes and eukaryotes. This reflects the many amino acid substitutions in these taxa as shown in Figs. 2 and 3 and 6 and 7. Another major difference was the placement of M. profunda and Escherichia coli together with Bacillus. B. anthracis was also removed from the Bacillaceae clade and placed between M. profunda and E. coli and the other Bacillus species. It is not surprising that the calculated Split and Nodal Differences were therefore larger than those calculated for the eukaryotic clade. However, the Disagree Distance for the prokaryotic clade was actually less than that for the eukaryotic clade.

In this analysis, a uniform evolutionary rate over the sequences was assumed. PAML was used to generate a phylogenetic tree in which this rate was not kept constant (Fig. 13). The resulting tree was similar to the one generated with MEGA (Fig. 10) with some differences in the ordering of the taxa in the prokaryotic clades. The resulting calculated differences from the canonical tree were slightly higher (Table 1). Interestingly, the phylogenetic tree generated from the truncated sequences consisting of residues 6 Å from bound NADPH and active site residues (Fig. 11) had many differences relative to the canonical tree (Fig. 9). This was reflected in the relatively higher values for all three metrics presented in Table 1. Most notably there was no clear separation between eukaryotes and prokaryotes. Additionally there was very little similarity in the ordering of the prokaryotes to the canonical phylogeny. This divergence from the canonical phylogeny was most likely due to the high

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

227

Homo sapiens

56 60

Gallus gallus

88

Mus musculus

31

Babesia bovis Cryptosporidium hominis

57

Candida glabrata Candida albicans

71 38

Plasmodium falciparum

28

Plasmodium vivax

100 15

Pneumocystis carinii Haloferax volcanii 13

Mycobacterium tuberculosis

76

Mycobacterium avium

98 2

Bacillus anthracis Geobacillus stearothermophilus

Streptococcus pneumoniae

20

Lactobacillus casei

53

Staphylococcus aureus Thermotoga maritima

24

Moritella profunda Escherichia coli

29

Chloroflexus aurantiacus

1.0

0.8

0.6

0.4

0.2

0.0

Fig. 11. Phylogenetic tree generated from structural alignment of 22 wild type DHFR sequences truncated to residues 6 Å from bound NADPH and the active site using MEGA.

degree of sequence and amino acid type conservation (Figs. 6 and 7), and speaks to the unique evolutionary history and strong conservation of the DHFR active site. Based on the data presented in Table 1, the phylogenetic tree generated from the ClustalW alignment (Fig. 12) different more with respect to the canonical phylogeny (Fig. 9) than the structure-based alignment phylogeny (Fig. 10). However, the ClustalW derived tree was closer to the cannoncial tree than the active-site derived phylogeny (Fig. 11). Most notably, H. volcanii was placed in the middle of the Bacillus clade. In an arrangement similar to the tree generated from the structure-based alignment, both T. maritima and C. aurantiacus were placed outside of both the prokaryote and eukaryote clades. Eukaryotes B. bovis and C. hominis switched with the Saccharomycetales (C. glabrata and C. albicans).

There were also multiple differences in the ordering of the prokaryotes. These include the repositioning of the Mycobacterium and the Gammaprotobacteria clades into that of the Bacillus as well as the position of B. anthracis.

4. Conclusions Bacteria have a tremendous opportunity for rapid evolution with large effective population sizes and high rates of mutation. As a result, there has been a re-emergence of bacteria with evolved resistance to many conventional antibiotics, including those that target DHFR. In order to develop novel therapeutics targeting DHFR, it is therefore necessary to better understand the intraspe-

228

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

Mus musculus

73 98

Homo sapiens Gallus gallus

5

Babesia bovis

5

Cryptosporidium hominis

71

Pneumocystis carinii

36

Candida glabrata

92

Candida albicans Plasmodium falciparum

6 61

97

Plasmodium vivax

93

Mycobacterium tuberculosis

31

Mycobacterium avium Lactobacillus casei Staphylococcus aureus

12 42

Geobacillus stearothermophilus Bacillus anthracis 7

Moritella profunda

85

Escherichia coli

5

Streptococcus pneumoniae Haloferax volcanii

31

Thermotoga maritima Chloroflexus aurantiacus

3.0

2.5

2.0

1.5

1.0

0.5

0.0

Fig. 12. Phylogenetic tree generated from ClustalW alignment of complete DHFR sequences from 22 species using MEGA.

Table 1 Analysis of differences between the canonical DHFR phylogenetic tree (Fig. 9) and the trees resulting from: the complete structure alignment (Figs. 10 and 13); the truncated active site alignment (Fig. 11) and the ClustalW alignment (Fig. 12).

Structure-based alignment: (i) Eukaryotic clade (ii) Prokaryotic clade (iii) PAML Active site alignment ClustalW alignment

Split difference

Random

Nodal difference

Random

Disagree difference

Random

0.316 0.143 0.250 0.421 0.684 0.579

0.984 0.950 0.954 0.988 0.987 0.989

1.467 0.683 0.972 1.947 2.298 2.372

3.218 2.108 2.314 3.257 3.241 3.249

0.182 0.200 0.091 0.273 0.636 0.318

0.955 0.860 0.881 0.951 0.965 0.953

cies differentiation that occurred in drug resistant strains of pathogens such as S. aureus, M. tuberculosis, and P. falciparum.

In this paper, we present a structure-based alignment of wildtype DHFR protein sequences from 22 species representing broad

229

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

Fig. 13. Phylogenetic tree generated from structural alignment of complete DHFR sequences from 22 species using PAML.

Table 2 Site-specific frequencies of amino acid, secondary structure type and Residue Class Conservation calculated from for the entire structural based alignment (Figs. 2 and 3) and the active site alignment (Figs. 6 and 7).

Complete structure-based alignment Active site alignment

Amino Acid Conservation (%)

Secondary Structure Conservation (%)

Residue Class Conservation (%)

12.55 ± 1.83 27.08 ± 7.53

46.90 ± 5.84 80.11 ± 10.73

19.85 ± 2.90 42.79 ± 10.17

phylogenetic diversity. Using this structure-based alignment we evaluated the degree of site-specific variability/conservation of

both sequence as well as structure. Comparing the sequence-based alignment from ClustalW and the structure-based alignment, it is

230

D. Hecht et al. / Molecular Phylogenetics and Evolution 61 (2011) 212–230

clear that there are differences in the clustering and the placement of gaps. Even so, the calculated values of Amino Acid Conservation are essentially equivalent. However, the phylogenetic tree generated from the structure-based alignment was much closer to the canonical phylogeny than the tree resulting from ClustalW. Surprisingly, the tree generated from the active-site alignment was the least similar. With the structure-based alignment it was also possible to evaluate differences in site-specific sequence and structure variability between active site and non-active site residues. As expected, there was significantly more conservation at active site residues across the 22 species. Surprisingly, the phylogenetic tree generated from the sequence alignment of the active site residues was significantly different from the canonical tree. This is most likely due to the high degree of sequence and structure conservation for these residues. Using this structure-based alignment of DHFR, it is now possible to generate a metric for the degree of conservation or variability at each site in the alignment – not only in terms of amino acid residue, but also secondary structure and residue class. In future studies we plan to use these values to generate site-specific predictive models of amino acid replacements and identify locations where compensating amino acid replacements may be occurring. We plan to test these predictions with mutant DHFR structures from the PDB as well as homology models DHFR sequences from other species. Acknowledgments The authors would like to thank the reviewers for their very valuable suggestions and insightful comments that helped make this a much more complete manuscript. References Bertino, J.R., 2009. Cancer research: from folate antagonism to molecular targets. Best Pract. Res. Clin. Haematol. 22, 577–582. Blakely, R.L., 1985. Dihydrofolate reductases. In: Folates and Pterins. Wiley, J., New York, pp. 191–253. Blakely, R.L., 1995. Advances in Enzymology and Related Areas of Molecular Biology. Wiley, J., New York, pp. 23–102. Crooks, G.E., Hon, G., Chandonia, J.M., Brenner, S.E., 2004. WebLogo: a sequence logo generator. Genome Research 14, 1188–1190. Ferone, R., 1977. Folate metabolism in malaria. Bull. World Health Organ. 55, 291. Guex, N., Peitsch, M.C., 1997. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis 18, 2714– 2723. Hall, T.A., 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41, 95–98. Huennekens, F.M., 1994. The methotrexate story: a paradigm for development of cancer chemotherapeutic agents. Adv. Enzyme Regul. 34, 397. Iwagami, M., Higo, H., Yanagi, T., Tada, I., Kano, S., Agatsuma, T., 2007. Molecular phylogeny of Trypanosoma cruzi from Central America, Guatemala and a comparison with South American strains. Parasitol. Res. 102, 129–134.

Iwakuri, M., Maki, K., Takahashi, H., Takenawa, T., Yokota, A., Katayanagi, K., Kamiyama, T., Gekko, K., 2006. Evolutional design of a hyperactive cysteine- and methionine-free mutant of escherichia coli dihydrofolate reductase. J. Biol. Chem. 281, 13234–13246. Kongsaeree, P., Khongsuk, P., Leartsakulpanich, U., Chitnumsub, P., Tarnchombpoo, B., Walkinshaw, M.D., Yuthavong, Y., 2005. Crystal structure of dihydrofolate reductase from Plasmodium vivax: pyrimethamine displacement linked with mutation-induced resistance. Proc. Natl. Acad. Sci. USA 102, 13046–13051. Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., Thompson, J.D., Gibson, T.J., Higgins, D.G., 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948. Mita, T., Tanabe, M.T., Kita, K., 2009. Spread and evolution of Plasmodium falciaprum drug resistance. Parasit. Int. 58, 201–209. Molecular Operating Environment (MOE) Available under License from Chemical Computing Group Inc., 1010 Sherbrooke St. W. Suite 910, Montreal, Quebec, Canada H3A 2R7. . O’Neil, R.H., Lilien, R.H., Donald, B.R., Stroud, R.M., Anderson, A.C., 2003. Phylogenetic classification of protozoa based on the structure of the linker domain in the bifunctional enzyme, dihydrofolate reductase-thymidylate synthase. J. Biol. Chem. 278, 52980–52987. Penny, D., Hendy, M.D., 1985. The use of tree metrics. Syst. Zool. 34, 75–82. Puigbò, P., Garcia-Vallvé, S., McInerney, J.O., 2007. TOPD/FMTS: a new software to compare phylogenetic trees. Bioinformatics 23, 1556–1558. Robinson, D.F., Foulds, L.R., 1981. Comparison of phylogenetic trees. Math. Biosci. 53, 131–147. Rollo, I.M., 1970. CRC Crit. Rev. Clin. Lab Sci. 1, 565. Schneider, T.D., Stephens, R.M., 1990. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 18, 6097–6100. Sirawaraporn, W., Sathikul, T., Sirawaraporn, R., Yuthavong, Y., Santi, D.V., 1997. Antifolate-resistant mutants of Plasmodium falciparum dihydrofolate reductase. Proc. Natl. Acad. Sci. USA 94, 1124–1129. Steel, M.A., Penny, D., 1993. Distribution of tree comparison metrics – some new results. Syst. Biol. 42, 126–141. Tamura, K., Dudley, J., Nei, M., Kumar, S., 2007. MEGA4: molecular evolutionary genetics analysis, MEGA software version 4.0. Mol. Biol. Evol. 24, 1596–1599. Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., Kumar S., 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance and maximum parsimony methods. Mol. Biol. Evol. doi:10.1093/molbev/msr121. Vartanian, J.-P., Henry, M., Wain-Hobson, S., 2001. Simulating psuedogene evolution in vitro: determining the true number of mutations in a lineage. PNAS 98, 13172–13176. Volpato, J.P., Pelletier, J.N., 2009. Mutational ‘hot-spots’ in mammalian, bacterial and protozoal resistance: sequence and structural comparison. Drug Resist. Update 12, 28–41. Wallace, L., Matthews, C.R., 2002. Highly divergent dihydrofolate reductases conserve complex folding mechanisms. J. Mol. Biol. 315, 193–211. Yang, Z., 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555–556. Yang, Z., 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591. Yuvaniyama, J., Chitnumsub, P., Kamchonwongpaison, S., Vanichtanankul, J., Sirawaraporn, W., Taylor, P., Walkinshaw, M., Yuthavong, Y., 2003. Insights into antifolate resistance from malarial DHFR–TS structures. Nat. Struct. Biol. 10, 357–365. Zhanel, G.G., Wang, X., Nichol, K., Nikulin, A., Wierzbowski, A.K., Mulvey, M., Hoban, D.J., 2006. Molecular characterization of Canadian paediatric multi-drug resistant Streptococcus pneumoniae from 1998–2004. Int. J. Antimicrob. Agents 28, 465–471. Zhao, S., McDermott, P.F., White, D.G., Qaiyumi, S., Friedman, S.L., Abbott, J.W., Glenn, A., Ayers, S.L., Post, K.W., Fales, W.H., Wilson, R.B., Reggiardo, C., Walker, R.D., 2007. Characterization of multidrug resistant Salmonella recovered from diseased animals. Vet. Microbiol. 123, 122–132.