Production of bulk chemicals via novel metabolic pathways in microorganisms


Biotechnology Advances 31 (2013) 925–935

Journal homepage: www.elsevier.com/locate/biotechadv

Research review paper

Jae Ho Shin a,1, Hyun Uk Kim a,b,1, Dong In Kim a, Sang Yup Lee a,b,c,⁎

a Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 program), Center for Systems and Synthetic Biotechnology, Institute for the BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 305-701, Republic of Korea
b BioInformatics Research Center, KAIST, Daejeon 305-701, Republic of Korea
c BioProcess Engineering Research Center, KAIST, Daejeon 305-701, Republic of Korea

Article info

Available online 29 December 2012

Keywords: Synthetic biology; Metabolic engineering; Pathway prediction; Bulk chemicals; De novo pathway design; Promiscuous enzyme; Enzyme modification; Genome mining; Strain optimization

Abstract

Metabolic engineering has been playing important roles in developing high performance microorganisms capable of producing various chemicals and materials from renewable biomass in a sustainable manner. Synthetic and systems biology are also contributing significantly to the creation of novel pathways and the whole cell-wide optimization of metabolic performance, respectively. In order to expand the spectrum of chemicals that can be produced biotechnologically, it is necessary to broaden the metabolic capacities of microorganisms. Expanding the metabolic pathways for biosynthesizing the target chemicals requires not only the enumeration of a series of known enzymes, but also the identification of biochemical gaps whose corresponding enzymes might not actually exist in nature; this issue is the focus of this paper. First, pathway prediction tools, effectively combining reactions that lead to the production of a target chemical, are analyzed in terms of logics representing chemical information, and designing and ranking the proposed metabolic pathways. Then, several approaches for potentially filling in the gaps of the novel metabolic pathway are suggested along with relevant examples, including the use of promiscuous enzymes that flexibly utilize different substrates, design of novel enzymes for non-natural reactions, and exploration of hypothetical proteins. Finally, strain optimization by systems metabolic engineering in the context of novel metabolic pathways constructed is briefly described. It is hoped that this review paper will provide logical ways of efficiently utilizing ‘big’ biological data to design and develop novel metabolic pathways for the production of various bulk chemicals that are currently produced from fossil resources.

© 2012 Elsevier Inc. All rights reserved.

Contents

1. Introduction
2. Pathway prediction and design
2.1. Chemical languages employed for constructing pathways in software tools
2.2. Logics of ranking de novo metabolic pathways
2.3. Limitations of the software tools
3. Enzyme promiscuity and modifications
3.1. Use of promiscuous enzymes for filling in biochemical gaps
3.2. Creation of de novo enzymes for expanding metabolic landscape
3.3. Expanding enzyme availabilities by genome mining
4. Strain optimization
5. Conclusions
Acknowledgment
References

Abbreviations: 1,4-BDO, 1,4-butanediol; 3HP, 3-hydroxypropionic acid; 4HB-CoA, 4-hydroxybutyryl-CoA; ADH, alcohol dehydrogenase; BEM, bond and electron matrix; BNICE, Biochemical Network Integrated Computational Explorer; EC, Enzyme Commission; GOLD, Genomes OnLine Database; KDC, 2-keto-acid decarboxylase; KEGG, Kyoto Encyclopedia of Genes and Genomes; PHA, polyhydroxyalkanoate; SMILES, simplified molecular-input line-entry system; UM-BBD, University of Minnesota Biocatalysis/Biodegradation Database.

⁎ Corresponding author at: Department of Chemical and Biomolecular Engineering, KAIST, Daejeon 305-701, Republic of Korea. Tel.: +82 42 350 3930; fax: +82 42 350 3910. E-mail address: [email protected] (S.Y. Lee).

1 These two authors contributed equally to this work.

0734-9750/$ – see front matter © 2012 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.biotechadv.2012.12.008


J.H. Shin et al. / Biotechnology Advances 31 (2013) 925–935

1. Introduction

Biotechnological production of chemicals and materials from renewable non-food biomass has been receiving increasing attention as an alternative to petroleum-based refinery processes for the sustainability of our society. Systems metabolic engineering, which integrates metabolic engineering with systems and synthetic biology, is becoming an essential strategy to develop microbial strains capable of efficiently producing chemicals and materials of interest (Lee et al., 2011, 2012; Wittmann and Lee, 2012). In particular, synthetic biology enables sophisticated engineering of microorganisms using various molecular biological tools and computational simulations, such that the range of bulk chemicals producible from engineered microorganisms is approaching the list of petroleum-derived chemicals. This is an important milestone in our industrial history because many of the chemicals used in our daily life originate from petroleum in a rather unsustainable way. Systems metabolic engineering has successfully demonstrated the potential of using microorganisms as cell factories for producing non-natural chemicals and materials, including 1,4-butanediol (1,4-BDO; Yim et al., 2011), 5-methyl-1-heptanol (Zhang et al., 2008), and polylactic acid (Jung et al., 2010).

According to the studies reported so far, development of novel metabolic pathways in the context of chemical production can generally be pursued through the following considerations: (1) prediction, ranking and selection of potential metabolic pathways, (2) identification of enzymes for biochemical gaps in the constructed pathway, and (3) strain optimization via systems metabolic engineering (Fig. 1). The focus of this paper is on these considerations, each of which has concrete challenges and feasible solutions. Particular emphasis is put on the identification of enzymes suitable for filling in the gaps existing in the novel metabolic pathway, as this step is considered to be the major bottleneck in re-designing the biological system. Other comprehensive reviews should also be consulted for the aspects of synthetic biology that are not covered here in depth (Lee et al., 2012; Martin et al., 2009; Medema et al., 2012; Soh and Hatzimanikatis, 2010).

2. Pathway prediction and design

For biologically producing the target petroleum-derived chemicals, pathway prediction tools are of great use for constructing potential pathways composed of non-natural and/or natural reactions, and for ultimately choosing the enzyme candidates that are most likely to carry out the desired reactions through the given route. For this particular purpose, several tools have been developed over the past decade, including the Biochemical Network Integrated Computational Explorer

Fig. 1. A scheme for the construction of a microbial cell factory for production of the target chemicals. (A) An engineer chooses the chemical of interest for microbial production and uses it as the input for pathway construction. Various candidate chemicals are represented by triangles, squares, diamonds and circles with different gray scales, wherein the same shape represents structural similarity. (B) A biochemical database provides structural insight into the host metabolism, which is supported by data mining for better understanding of biological systems. (C) The information from the databases is used in the pathway prediction tools for the construction of hypothetical pathways using both native and novel biochemical reactions for the specified chemical. (D) The pathway prediction tools provide the general enzyme classification (the third level EC number category) for the novel reactions within the pathway. Manual screening among the substrates within the general enzyme classification suggests one or more specific reactions that resemble the novel reaction. Pie shapes denote native and modified enzymes. The upward arrow represents the biochemical reaction, in which the hexagons and squares represent the substrates and products, respectively. (E) A candidate enzyme that carries out a similar reaction can be experimentally or computationally analyzed for substrate compatibility, or modified to induce substrate promiscuity. Experimental demonstration of the novel reaction and iterative procedures for further modification of the enzyme lead to improved efficiency of the evolved enzyme in vitro. QM and MD refer to quantum mechanics and molecular dynamics, respectively. (F) Finally, introducing the evolved enzyme into a designated host cell must be accompanied by optimization of the production process for efficient in vivo manufacturing of the target molecule. Steps (D), (E) and (F) can be iterative based on feedback among them.


(BNICE) (Hatzimanikatis et al., 2005), DESHARKY (Rodrigo et al., 2008), SimPheny (Genomatica; www.genomatica.com), the system developed by Cho et al. (2010), and RetroPath (Carbonell et al., 2011). In general, these tools propose potential pathways and rank them according to the factors employed in each framework, in order to provide the most feasible route for microbial synthesis of the desired compound. While several platforms (e.g. BNICE) have been continuously developed over the years (Brunk et al., 2012; Hatzimanikatis et al., 2005; Henry et al., 2010; Jankowski et al., 2008), most of the hypothetical routes predicted by such platforms are still awaiting experimental validation.

2.1. Chemical languages employed for constructing pathways in software tools

In order to computationally construct a pathway, an unambiguous computational description of chemical information, including substrates, products and the biochemical reactions, is critical, preferably by “speaking” a chemical language. As one of the best developed prediction tools, the BNICE framework allows prediction of novel pathways using both known and unknown biotransformation reactions (Hatzimanikatis et al., 2005). The capacity of BNICE to adopt unknown reactions is attributed to exploiting the reaction patterns observed in nature rather than solely using the available databases strictly composed of the identified enzymes. It utilizes reaction rules called generalized reaction operators (or generalized enzyme reactions), based on the Enzyme Commission (EC) classification system (Nomenclature Committee of the International Union of Biochemistry and Molecular Biology), to describe the chemical bond differences between the reactant and the product (Hatzimanikatis et al., 2005). In this system, the information on atoms and the bond connectivities within the reactant and product molecules is described in the form of a matrix called the bond and electron matrix (BEM; Fig. 2B) (Li et al., 2004; Ugi et al., 1979). Additionally, the generalized reaction operator represents bond changes, for instance the formation and dissociation of bonds, in each reaction rule. To apply a reaction to a substrate, the reaction matrix is simply added to the BEM of the substrate (Fig. 2B). BNICE utilizes the reaction matrices to generate all the possible reactions from the designated starting compound to synthesize the product compound of interest (Brunk et al., 2012; Finley et al., 2009; Hatzimanikatis et al., 2005; Henry et al., 2010; Li et al., 2004). While the use of BEM and generalized

Fig. 2. Illustrative examples of the three main chemical languages, BEM, signatures, and SMILES, employed in the pathway prediction tools, using the glutamate deamination reaction as an example. (A) In the net reaction, L-glutamate loses a hydrogen atom and an ammonium ion while consuming a water molecule and reducing NAD+. As a result, the amino group of L-glutamate is replaced by the oxygen atom from the water molecule, indicated by replacing the C–N bond with the C=O bond, to generate α-ketoglutarate. (B) BEM-based chemical description requires a generalized reaction operator in addition to the matrices representing reactants and products. The size of the matrix is n by n, where n is the number of atoms that go through transformation during a reaction. Diagonal elements of the matrix are the number of nonbonding electrons of each atom, and the non-diagonal elements represent bonding between each atom pair. For instance, the first four atoms H, C, N, and H in the matrix that undergo chemical transformations belong to the substrate, L-glutamate. The next atoms, O, H, and H, are from the water molecule, and finally N, C, C, and C are from NAD+. The element ‘1’ in the first column and second row indicates the C–H bond of the α-carbon in L-glutamate, while the diagonal element ‘4’ indicates that the oxygen atom in the water molecule has four non-bonding electrons. Once BEMs of the reactants and the products are constructed in this manner, the generalized reaction operator is obtained by simply subtracting the BEM of the reactants from that of the products. Unlike the other chemical languages, BEM considers the atoms of cofactors participating in the reaction in addition to those of primary reactants and products. (C) A signature is somewhat similar to BEM in that it focuses on the atoms participating in the transformation. However, signatures can expand the number of atoms to be covered by increasing the branch length, called the ‘height (h)’, of the connected atoms. Here we only show h = 1 as an example. The atoms that underwent transformation are listed first in the signature before the parentheses, and their connected atoms (in other words, atoms one branch length apart from the transformed atoms, h = 1) are expressed within the following parentheses. iδ(Sj) denotes the signature of substrate j for h = i, and iδ(Pj) denotes the signature of product j for h = i. The subscripts j = 1, 2 for substrates are L-glutamate and a water molecule, and j = 1, 2 for products are α-ketoglutarate and an ammonium ion, respectively. As an example, for 1δ(S1), 1 [C]([C][C][N]) describes the α-carbon connected with two other carbon atoms and one nitrogen atom in L-glutamate, while [N]([C]) indicates that the nitrogen atom is only connected to the α-carbon. Hence, the two notations, written from the perspectives of the α-carbon and of the nitrogen attached to it, carry redundant information about the C–N bond. Protons are not expressed herein. Lastly, iδ(R) denotes the reaction signature for h = i. (D) SMILES represents the bond information among atoms in compounds by describing their bond order and arrangement in substrates and products. In SMILES, parentheses represent chemical branches. For instance, the C located between the two outermost parentheses on the substrate side indicates the α-carbon in L-glutamate, and the leftmost C indicates the β-carbon. Atoms described within the parentheses are atoms attached to these α- and β-carbons.
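The height-1 signatures described in panel (C) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RetroPath implementation: the molecules are hand-built heavy-atom graphs (protons omitted, as in the caption), bond orders and NAD+ are ignored, and the function names and atom numbering are hypothetical. The reaction signature is taken here as the product signatures minus the substrate signatures.

```python
from collections import Counter

def atom_signature(elements, bonds, atom):
    """Height-1 signature of one atom: the atom, then its sorted neighbours."""
    neighbours = sorted(
        elements[b if a == atom else a] for a, b in bonds if atom in (a, b)
    )
    inner = "".join(f"[{n}]" for n in neighbours)
    return f"[{elements[atom]}]({inner})"

def molecule_signature(elements, bonds):
    """Count every atomic signature that occurs in the molecule."""
    return Counter(atom_signature(elements, bonds, i) for i in range(len(elements)))

def reaction_signature(substrates, products):
    """Products minus substrates; atoms that do not change cancel out."""
    delta = Counter()
    for mol in products:
        delta.update(molecule_signature(*mol))
    for mol in substrates:
        delta.subtract(molecule_signature(*mol))
    return {sig: n for sig, n in delta.items() if n != 0}

# Hypothetical atom numbering: 0 = N (or keto O), 1 = alpha-carbon,
# 2-4 = alpha-carboxyl group, 5-6 = side-chain carbons, 7-9 = distal carboxyl group.
BONDS = [(0, 1), (1, 2), (2, 3), (2, 4), (1, 5), (5, 6), (6, 7), (7, 8), (7, 9)]
GLUTAMATE = (["N", "C", "C", "O", "O", "C", "C", "C", "O", "O"], BONDS)
KETOGLUTARATE = (["O", "C", "C", "O", "O", "C", "C", "C", "O", "O"], BONDS)
WATER = (["O"], [])
AMMONIUM = (["N"], [])

print(atom_signature(*GLUTAMATE, 1))  # [C]([C][C][N])
print(atom_signature(*GLUTAMATE, 0))  # [N]([C])
delta = reaction_signature([GLUTAMATE, WATER], [KETOGLUTARATE, AMMONIUM])
print(delta)
```

In this sketch only the signatures of the α-carbon, the nitrogen and the incoming oxygen survive in the reaction signature, which mirrors the caption's point that signatures focus on the atoms taking part in the transformation. Because the graph carries no stereochemical information, two enantiomers would yield identical signatures.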


reaction operators assists greatly in efficiently finding feasible pathways in BNICE, the method, by definition, does not take into account the atoms and bonds in the molecules that do not participate in the reaction. Similarly, the RetroPath framework uses ‘signatures’, or molecular graphs (Fig. 2C), to describe the topological information of chemical structures based on graph theory (Carbonell et al., 2011), although it cannot distinguish stereoisomers (Faulon et al., 2004). Other pathway prediction tools have adopted chemical languages other than the BEM. For instance, the simplified molecular-input line-entry system (SMILES) (Weininger, 1988) is used in the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD) Pathway Prediction system (Gao et al., 2011) and in the system developed by Cho et al. (2010) (Fig. 2D). While SMILES is also based on graph theory, it can distinguish chiral centers, in contrast to the signatures in RetroPath. In general, precise computational description of chemical structures using a chemical language is necessary unless the novel metabolic pathway of interest can be conceived by intuition and designed manually (Felnagle et al., 2012; Marcheschi et al., 2012; Zhang et al., 2008). While intuition can assist in developing novel pathways, it is often not sufficient for recognizing the most efficient route, because the biochemical characteristics of each enzyme are not always obvious, and experimentally testing each enzyme for a large number of predicted pathways can be very difficult and costly. Hence, the prediction tools are necessary not only for assisting with novel pathway construction, but also for screening for efficient reactions.
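As a rough illustration of the BEM representation discussed in this section, the sketch below builds bond and electron matrices for a toy reaction, derives the reaction operator as products minus reactants, and re-applies it by matrix addition. The reaction is an assumption chosen for brevity (hydrolysis of an imine C=N bond to a C=O bond with release of NH3, restricted to the five atoms whose bonds change); the atom labels and helper names are hypothetical, and this is not BNICE code.

```python
ATOMS = ["C", "N", "O", "Ha", "Hb"]  # only atoms whose bonds change (assumed toy system)

def bem(bonds, nonbonding):
    """Symmetric matrix: diagonal = nonbonding electrons, off-diagonal = bond order."""
    n = len(ATOMS)
    matrix = [[0] * n for _ in range(n)]
    for atom, electrons in nonbonding.items():
        i = ATOMS.index(atom)
        matrix[i][i] = electrons
    for (a, b), order in bonds.items():
        i, j = ATOMS.index(a), ATOMS.index(b)
        matrix[i][j] = matrix[j][i] = order
    return matrix

def combine(x, y, sign=1):
    """Element-wise x + sign * y."""
    return [[p + sign * q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

# Reactants: C=N double bond plus a water molecule (O-Ha, O-Hb).
reactants = bem({("C", "N"): 2, ("O", "Ha"): 1, ("O", "Hb"): 1}, {"N": 2, "O": 4})
# Products: C=O double bond, both hydrogens now on nitrogen.
products = bem({("C", "O"): 2, ("N", "Ha"): 1, ("N", "Hb"): 1}, {"N": 2, "O": 4})

# Reaction operator = BEM(products) - BEM(reactants).
operator = combine(products, reactants, sign=-1)

# Applying the operator to the reactant BEM by addition recovers the products.
assert combine(reactants, operator) == products
# Every row sums to zero: electrons are redistributed, not created or destroyed.
assert all(sum(row) == 0 for row in operator)

for atom, row in zip(ATOMS, operator):
    print(f"{atom:>2}", row)
```

The operator contains only bond changes (negative entries for broken bonds, positive entries for formed ones), so the same matrix can in principle be added to the BEM of any substrate that presents the same reacting atoms. That reuse is what makes a single rule applicable to many compounds.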

2.2. Logics of ranking de novo metabolic pathways

The pathway prediction tools employ different standards for estimating the biological feasibility of the proposed pathways and ranking them accordingly. The group contribution method, which allows prediction of the standard Gibbs free energy of formation for various chemical compounds (Mavrovouniotis, 1991), is widely used in the pathway prediction tools. For instance, implementation of this method in BNICE allows the framework to weed out biotransformation reactions that are thermodynamically infeasible in order to avoid combinatorial explosion (Jankowski et al., 2008; Soh and Hatzimanikatis, 2010). The group contribution method is also used in RetroPath (Carbonell et al., 2011) and the system of Cho et al. (2010) for ranking the proposed pathways. DESHARKY is unique in that it quantitatively considers the transcription, translation and metabolic burdens due to the expression of foreign genes in the host cell when ranking the predicted pathways. Similarly, the compatibility of expressing heterologous genes in the host is also considered in RetroPath and the system of Cho et al. (2010), although they use different methodologies. More detailed logics of predicting and ranking the pathways using different factors are comprehensively reviewed elsewhere (Martin et al., 2009; Medema et al., 2012; Soh and Hatzimanikatis, 2010).

2.3. Limitations of the software tools

Although the pathway prediction programs can generate a large number of routes systematically, they are not perfect because the biochemical and metabolic databases on which they depend are still far from complete for most organisms. BNICE extracts metabolic pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG; Kanehisa et al., 2010) and a genome-scale metabolic network model of Escherichia coli, iJR904 (Reed et al., 2003), for developing the reaction rules (Finley et al., 2009; Hatzimanikatis et al., 2005; Henry et al., 2010; Li et al., 2004). DESHARKY and RetroPath also rely on biochemical databases (i.e. KEGG) for the generation of the reaction rule sets. The strong dependence of these software tools on databases underlines the importance of metabolic databases, especially their correctness and coverage of metabolic reactions and related information. For software developers, it is important to ensure that their information sources, such as databases, are sufficiently reliable, and that the software tools are regularly updated to follow the constant upgrades and changes in databases, such that new biochemical reactions can be taken into account for designing metabolic pathways with greater potential and credibility.

The frameworks often fail to pinpoint the specific enzymes to be used in the proposed pathways because of the incomplete nature of the biochemical databases. For example, while BNICE ranks the most promising pathways based on thermodynamic feasibility and pathway length, it only provides third-level EC numbers rather than the actual availability of particular enzymes for the proposed novel reactions. As an alternative solution to this problem, RetroPath considers possible enzyme promiscuity using a signature-based prediction of the enzyme to fill in the gap of the predicted metabolic pathway (Carbonell and Faulon, 2010). Nonetheless, these software tools, despite their solid theoretical considerations, have not been widely used in the actual development of microbial strains for the production of industrially valuable chemicals. A notable exception is the use of SimPheny for the development of an engineered E. coli strain capable of producing 1,4-BDO at industrially relevant performance (Yim et al., 2011). Taken together, one of the most important capabilities that a prediction tool should offer is to pinpoint a specific enzyme that can be used, either directly or after modification, for the biosynthesis of a non-natural chemical of interest.

3. Enzyme promiscuity and modifications

The pathway prediction programs developed so far are not yet capable of automatically identifying the exact enzymes for the catalysis of non-natural biotransformation reactions. Potential methods for filling in the biochemical ‘gaps’, or the particular reactions not present in the databases, are described in this section. Promising approaches include the use of promiscuous enzymes and the creation of de novo or newly designed enzymes. The former employs enzymes that exhibit a broad substrate range that can be experimentally validated. Fundamental understanding of the reaction characteristics and evolutionary aspects of the enzyme is often essential in this case (Park et al., 2006; Zhang et al., 2009). Experiments also provide solid biochemical evidence for the use of promiscuous enzymes for certain desired reactions. The latter method, on the other hand, necessarily involves computational simulations that identify and thoroughly assess critical characteristics of the desired novel enzymes capable of performing the intended reaction. Designing novel enzymes often requires understanding of molecular dynamics and/or quantum mechanics, and their relevant calculations. Additionally, such simulations can make the gap-filling process more accurate when used in conjunction with the pathway prediction frameworks. Because finding suitable enzymes is often the bottleneck in completing the novel metabolic pathway, several studies on employing promiscuous enzymes or designing novel enzymes for producing chemicals of interest are discussed in this section (Tables 1 and 2; Fig. 3). In addition, approaches to explore hypothetical proteins are also briefly discussed for the same purpose.

3.1. Use of promiscuous enzymes for filling in biochemical gaps

Biochemical gaps in the novel metabolic pathway can be filled in by promiscuous enzymes since they allow expansion of metabolism by using structurally similar substrates (Fig. 3). A theoretical background for resolving this issue was well presented by Hatzimanikatis and colleagues with the combined use of BNICE and molecular simulations. With the predicted metabolic pathway, molecular simulations were used to predict the feasibility of the reaction between a naturally occurring candidate enzyme and a non-natural substrate for the


Table 1
Representative microbial production of bulk chemicals, which involves studies to fill in biochemical gaps in the newly designed novel metabolic pathway. Only the chemicals entirely produced using the engineered organism from a single carbon source (i.e. D-glucose) without exogenous feeding of precursors are shown. Also shown are representative chemicals produced where the metabolic pathways are constructed using enzymes that naturally exist. An empty cell repeats the entry of the row above.

| Chemical | Host organism | Titer | Synthetic biological relevance | References |
|---|---|---|---|---|
| 1,4-Butanediol a | E. coli | 18.0 g/L | Codon optimized promiscuous aldehyde dehydrogenase for novel reaction from 4-hydroxybutyryl-CoA to 4-hydroxybutyraldehyde as a part of the novel 1,4-butanediol pathway, in addition to metabolic engineering approach for increasing the titer | Yim et al. (2011) |
| Catechol | E. coli | 18.5 (±2.0) mM | Cloning of aroZ (3-dehydroshikimate dehydratase) and aroY (protocatechuic acid decarboxylase) from Klebsiella pneumoniae to complete the catechol biosynthetic pathway | Draths and Frost (1995) |
| D-Glucaric acid | E. coli | 1.1 g/L | Introduction of a synthetic pathway consisting of myo-inositol-1-phosphate synthase, an endogenous phosphatase, myo-inositol oxygenase and uronate dehydrogenase for conversion of D-glucose into D-glucaric acid | Moon et al. (2009) |
| | | 2.5 g/L | Development of an optimal scaffold structure for increasing the product titer by co-localizing the recombinant enzymes, on top of the study directly above | Moon et al. (2010) |
| L-Homoalanine | E. coli | 5.4 g/L | Evolved glutamate dehydrogenase (K92V/T195S) for L-homoalanine pathway construction for a novel reaction | Zhang et al. (2010a) |
| p-Hydroxybenzoate | Pseudomonas putida | 1.85 mM | Introduction of phenylalanine ammonia lyase and inactivation of p-hydroxybenzoate hydroxylase | Verhoef et al. (2007) |
| | E. coli | 12 g/L | Chorismate lyase (ubiC) overexpression | Barker and Frost (2001) |
| cis,cis-Muconic acid | E. coli | 16.8 (±1.2) mM | Construction of synthetic pathway consisting of dehydroshikimate dehydratase (aroZ), protocatechuate decarboxylase (aroY) and catechol 1,2-dioxygenase (catA) for shunting dehydroshikimate pool toward cis,cis-muconic acid | Draths and Frost (1994) |
| | | 36.8 g/L | Optimization of pathway by chromosomal insertion of foreign genes on top of the above study | Niu et al. (2002) |
| Phenol | Pseudomonas putida | 1.48 mM with a yield of 6.67% (mol/mol) | Introduction of tpl gene from Pantoea agglomerans, overexpression of aroF-1 and simultaneous disruption of oprB gene using transposon method, and finally random mutagenesis with the toxic phenol-analogs | Wierckx et al. (2005) |
| Polylactic acid | E. coli | 11 wt.% from glucose | Introduction of engineered propionate CoA transferase and PHA synthase and subsequent systems metabolic engineering that involves gene knockout and overexpression using in silico predictions in order to direct fluxes toward polylactic acid biosynthesis | Jung et al. (2010), Yang et al. (2010) |
| Styrene | E. coli | 260 mg/L | Use of ferulic acid decarboxylase for decarboxylation of trans-cinnamic acid | McKenna and Nielsen (2011) |

a While the biochemical gap can be filled in by either monofunctional aldehyde dehydrogenase or the bifunctional aldehyde/alcohol dehydrogenase, the monofunctional enzyme was advantageous due to less formation of ethanol.

enzyme to fill in the biochemical gap in the pathway. This approach was used to theoretically predict the pathways for the production of 3-hydroxypropionic acid (3HP) in E. coli using BNICE and the AMBER software packages (Brunk et al., 2012). One of the potential 3HP biosynthetic pathways predicted by BNICE consists of lactate, lactoyl-CoA, 3HP-CoA, and 3HP; it was ranked high and thus further assessed (Brunk et al., 2012). In this pathway, the non-natural biotransformation from lactoyl-CoA to 3HP-CoA was hypothesized to be carried out by one of the EC 5.4.99 subclass enzymes. Among all the EC 5.4.99 subclass enzymes, methylmalonyl-CoA mutase (EC 5.4.99.2) was selected as the chassis for the simulation of a possible catalyst that carries out intra-molecular rearrangement of lactoyl-CoA into 3HP-CoA in the novel 3HP production pathway. In order to assess the feasibility of this reaction, the AMBER software package was used to examine the hypothetical binding of lactoyl-CoA, the catalysis of the reaction, and the dissociation of 3HP-CoA from methylmalonyl-CoA mutase. The molecular simulations conducted include molecular dynamics simulations (PMEMD module of AMBER), free energy perturbation/thermodynamic integration simulations (SANDER module of AMBER), and modeling of mutant enzymes (xleap module of AMBER), thereby assessing the possible carbon skeletal rearrangement mechanism of lactoyl-CoA. A hybrid quantum mechanics/molecular mechanics approach, a unified method that combines molecular dynamics and density functional theory (Car and Parrinello, 1985), was also taken to study the catalytic efficacy of the reaction. Computational investigation of the likelihood of the novel reaction is a crucial step for pathway construction. Although the skeletal rearrangement of lactoyl-CoA has yet to be demonstrated experimentally, this study, using 3HP production as an example, suggested a promising framework for carrying out a non-natural chemical reaction with a naturally occurring enzyme by using the pathway prediction and molecular simulation tools in a complementary way, thereby efficiently filling in the biochemical gap in the pathway.

In addition to the above theoretical suggestion, the use of promiscuous enzymes to actually fill in the biochemical gap was experimentally demonstrated for producing 1,4-BDO in E. coli (Yim et al., 2011) (Fig. 3). In the 1,4-BDO biosynthetic pathway, the reaction converting 4-hydroxybutyryl-CoA (4HB-CoA) to 4-hydroxybutyraldehyde was a biochemical gap. The existence of reaction(s) to convert 4HB-CoA into 1,4-BDO in nature was conceivable, especially because the species from which the gap-filling gene was cloned, Clostridium acetobutylicum (Jewell et al., 1986), was already known to convert 4HB into 1,4-BDO although the responsible enzymes had not been characterized. Although evolutionary pressure has driven many enzymes to catalyze one particular biochemical reaction selectively, this is not true for all enzymes, because many of them might not have been subjected to as much pressure and still exhibit broad substrate specificities. A result of taking this approach was the use of a bifunctional aldehyde/alcohol dehydrogenase capable of catalyzing the hypothetical conversion from 4HB-CoA to 1,4-BDO, which was experimentally confirmed for the production of 1,4-BDO from glucose in engineered E. coli. This portion of the novel pathway for producing 1,4-BDO from 4HB-CoA would not have been a biochemical gap if the enzyme had already been identified. Therefore, understanding the promiscuity of all the enzymes that have been discovered thus far and including such information in the metabolic databases would greatly help us to design novel pathways.

The potential of promiscuous enzymes for filling in gaps in the novel metabolic pathway is further supported by the use of heterologous 2-keto-acid decarboxylase (KDC) and alcohol dehydrogenase (ADH) to process various 2-keto acids, thereby expanding the pool of higher alcohols that can be produced using a single host strain (Atsumi et al., 2008) (Fig. 3). KDC and ADH are enzymes involved in the Ehrlich pathway, which produces


Table 2
Alcohols produced from natural and non-natural 2-keto acids using their recursive pathway in E. coli as a host organism. The 2-keto acids can be converted into various alcohols by promiscuous KDC and ADH. Introduction of a few promiscuous enzymes can drastically change the host metabolism and lead to producing a variety of molecules. In contrast to the existence of natural recursive pathways for fatty acids, polyketides and isoprenoids in bacteria, 2-keto acid biosynthesis is known to be non-recursive in bacteria, while it is recursive in some plants and archaea (Felnagle et al., 2012). Additional strategies for 1-hexanol and 1-octanol production are also listed for comparison. Approaches and strategies for each corresponding study are also shown.

Chemical | Titer | Synthetic biological relevance | References
1-Pentanol (C5) | 750.5 (±52.9) mg/L | Binding pocket expansion (V461A/M538A) for KDC and leucine-feedback resistance for LeuA (G462D) | Zhang et al. (2008)
1-Pentanol (C5) | 2.220 (±0.142) g/L | Leucine-feedback resistance (G462D) and RosettaDesign-assisted binding pocket expansion (S139G) for LeuA in the threonine-overproducing strain | Marcheschi et al. (2012)
2-Methyl-1-butanol (C5) | 264.5 (±9.9) mg/L | Binding pocket expansion (V461A) for KDC and leucine-feedback resistance for LeuA (G462D) | Zhang et al. (2008)
2-Methyl-1-butanol (C5) | 1.25 g/L | Overexpression of isoleucine and threonine biosynthetic genes, use of promiscuous KDC and ADH2 and knockout of genes in the competing pathway | Cann and Liao (2008)
3-Methyl-1-butanol (C5) | 963.1 (±48.3) mg/L | Binding pocket expansion (F381L/V461A) for KDC and leucine-feedback resistance for LeuA (G462D) | Zhang et al. (2008)
3-Methyl-1-butanol (C5) | 1.28 g/L | Overexpression of valine and leucine biosynthetic genes, use of promiscuous KDC and ADH2 and knockout of genes in the competing pathway | Connor and Liao (2008)
1-Hexanol (C6) | 38.4 (±8.3) mg/L | Binding pocket expansion for LeuA (H97A/S139G) and KDC (F381L/V461A), and leucine-feedback resistance for LeuA (G462D) | Zhang et al. (2008)
1-Hexanol (C6) | 302 (±33) mg/L | Leucine-feedback resistance LeuA (G462D) and RosettaDesign-assisted binding pocket expansion for LeuA (H97A/S139G) for threonine-overproducing strain | Marcheschi et al. (2012)
1-Hexanol (C6) | 210 mg/L | Reverse β-oxidation for carbon chain elongation | Dellomonaco et al. (2011)
1-Hexanol (C6) | 469 mg/L | Use of broad substrate range BktB, Hbd/PaaH, Crt, Ter and ADH and directed evolution of PaaH | Machado et al. (2012)
3-Methyl-1-pentanol (C6) | 793.5 (±46.5) mg/L | Binding pocket expansion (F381L/V461A) for KDC and for LeuA (S139G), and leucine-feedback resistance (G462D) for LeuA | Zhang et al. (2008)
4-Methyl-1-pentanol (C6) | 202.4 (±1.1) mg/L | Binding pocket expansion (H97L/S139G) for LeuA and KDC (F381L/V461A), and leucine-feedback resistance (G462D) for LeuA | Zhang et al. (2008)
1-Heptanol (C7) | 80 (±11) mg/L | Leucine-feedback resistance (G462D) and RosettaDesign-assisted binding pocket expansion (H97A/S139G/N167G/P169A) for LeuA in the threonine-overproducing strain | Marcheschi et al. (2012)
4-Methyl-1-hexanol (C7) | 57.3 (±7.8) mg/L | Binding pocket expansion (H97A/S139G/N167A) for LeuA and for KDC (F381L/V461A), and leucine-feedback resistance (G462D) for LeuA | Zhang et al. (2008)
1-Octanol (C8) | 2.0 (±0.26) mg/L | Leucine-feedback resistance (G462D) and RosettaDesign-assisted binding pocket expansion (H97A/S139G/N167G/P169A) for LeuA in the threonine-overproducing strain | Marcheschi et al. (2012)
1-Octanol (C8) | 100 mg/L | Reverse β-oxidation for carbon chain elongation | Dellomonaco et al. (2011)
5-Methyl-1-heptanol (C8) | 22.0 (±2.5) mg/L | Binding pocket expansion (H97A/S139G/N167A) for LeuA and for KDC (F381L/V461A), and feedback resistance (G462D) for LeuA | Zhang et al. (2008)
2-Phenylethanol | 664.4 mg/L | Leucine-feedback resistance (G462D) and RosettaDesign-assisted binding pocket expansion (H97A/S139G/N167G/P169A) for LeuA in the phenylalanine-overproducing strain | Marcheschi et al. (2012)
3-Phenylpropanol | 4.1 mg/L | Leucine-feedback resistance (G462D) and RosettaDesign-assisted expansion of binding pocket (H97A/S139G/N167G/P169A) for LeuA in the phenylalanine-overproducing strain | Marcheschi et al. (2012)
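The chain lengths listed in Table 2 follow a simple bookkeeping rule: each LeuABCD cycle adds one carbon to a 2-keto acid, and decarboxylation by KDC followed by reduction by ADH yields an alcohol with one carbon fewer than the keto acid. The minimal Python sketch below illustrates only this carbon counting. The function names are hypothetical, and enzyme kinetics, branching and binding-pocket limits are deliberately ignored.

```python
# Toy carbon bookkeeping for the recursive "+1" route to higher alcohols.
# Assumption: only chain length is tracked; kinetics, branching and
# binding-pocket limits are ignored. All function names are illustrative.

def elongate(keto_acid_carbons: int) -> int:
    """One LeuABCD cycle adds a single carbon to a 2-keto acid."""
    return keto_acid_carbons + 1


def to_alcohol(keto_acid_carbons: int) -> int:
    """KDC removes one carbon as CO2; ADH then reduces the aldehyde."""
    return keto_acid_carbons - 1


def alcohol_series(start_carbons: int, cycles: int) -> list[int]:
    """Alcohol chain lengths reachable after 0..cycles elongation rounds."""
    lengths = []
    carbons = start_carbons
    for _ in range(cycles + 1):
        lengths.append(to_alcohol(carbons))
        carbons = elongate(carbons)
    return lengths


# 2-ketobutyrate (C4) gives 1-propanol (C3); each further cycle adds one carbon.
print(alcohol_series(start_carbons=4, cycles=5))  # [3, 4, 5, 6, 7, 8]
```

Under these assumptions, five elongation cycles starting from 2-ketobutyrate span the linear C3 to C8 alcohols; in practice the reachable range depends on how far the LeuA and KDC binding pockets have been expanded.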

higher alcohols via decarboxylation followed by reduction of 2-keto acids in two steps: for instance, biosynthesis of 1-propanol and isobutanol from 2-ketobutyrate and 2-ketoisovalerate, respectively, using KDC and ADH (Atsumi et al., 2008). Based on this concept, introduction of promiscuous enzymes can increase the number of desired compounds that can be produced from the existing metabolites in a single host organism (Fig. 3; Table 2). By simply introducing KDC, a broad-substrate-range enzyme, six different α-keto acids could be decarboxylated for the production of six different alcohols in an engineered E. coli strain (Atsumi et al., 2008). The six corresponding aldehydes were then reduced by a broad-substrate-range ADH. This study would not have been possible without an understanding of the broad substrate specificities of the enzymes in the Ehrlich pathway. The same approach can be taken to develop other artificial metabolic pathways, with promiscuous enzymes filling in the gaps toward the target chemical.

The following studies also demonstrated that the metabolic capacities of host enzymes can be increased by improving their promiscuity, ultimately enabling new metabolic pathways that would otherwise have remained gaps in the reaction space. Based on the demonstrated promiscuity of KDC and ADH, the subsequent studies further

explored and challenged their capability for accepting even more diverse types of substrates (Table 2). Solely because the molecular structure of 2-keto-3-methylvalerate, a precursor of L-isoleucine, closely resembles that of 2-ketoisovalerate, LeuA, which is involved in L-leucine biosynthesis, was tested for possible substrate promiscuity (Zhang et al., 2008). The hypothesis underlying this study was conceived entirely by intuitively examining the similarity of the molecular structures, without running a pathway prediction program. LeuA was indeed promiscuous enough to use 2-keto-3-methylvalerate as a substrate, allowing biosynthesis of 2-keto-4-methylhexanoate, a non-natural metabolite. Molecular simulations can then be conducted with existing structural information on proteins to assist rational enzyme design for improving their promiscuity, thereby increasing the chance of filling in the biochemical gaps. This approach was demonstrated for binding pocket expansion of LeuA and 2-ketoisovalerate decarboxylase, a type of KDC (Marcheschi et al., 2012; Zhang et al., 2008). Quantum mechanics calculations of the transition state using the Gaussian software indicated that the non-natural substrates with longer side chains do not face a thermodynamic barrier to the initial step of elongation reactions,


Fig. 3. A scheme of expanding host metabolism by multiple biochemical reactions through the introduction of just one or a few promiscuous enzymes. (A) Bifunctional aldehyde/alcohol dehydrogenase (vertical cylinders) from C. acetobutylicum, which is known for converting butyryl-CoA into butanol, exhibits enough substrate promiscuity for 4HB-CoA, allowing production of 1,4-BDO. (B) Broad-substrate range 2-keto-acid decarboxylase (KDC, horizontal cylinders) generally decarboxylates 2-keto acids and the resulting aldehyde is reduced by alcohol dehydrogenases (ADH, blocks) in the Ehrlich pathway. LeuA*, a mutated form of LeuA, can compete for 2-keto acids for their elongation after LeuA*BCD reactions (crosses with thick outline). Besides the higher alcohols detected for the purpose of the experiment, there may exist more types of 2-keto acids that can be decarboxylated by KDC from other donor organisms (question marks). (C) As part of the LeuABCD pathway, LeuA, known to elongate 2-ketoisovalerate (2-KIV) into 2-ketoisocaproate (2-KIC), can be modified (LeuA*) to elongate other 2-keto acids for the higher alcohol production. (D) Propionate CoA transferase (pct, shorter cylinders) and PHA synthase (longer cylinders) can be engineered, indicated with an asterisk, for broader substrate specificities for polymer production. Abbreviations are: 3HA, 3-hydroxyalkanoate; 3HB, 3-hydroxybutyrate. (E) Expansion of metabolism for producible non-natural metabolites can be achieved by introduction of a few promiscuous heterologous genes. Final chemicals from promiscuous enzymes shown on the metabolic map are: a, propanol; b, butanol; c, pentanol; d, hexanol; e, 2-methyl-1-butanol; f, 3-methyl-1-pentanol; g, 4-methyl-1-hexanol; h, 5-methyl-1-heptanol; i, isobutanol; j, 3-methyl-1-butanol; k, 4-methyl-1-pentanol; l, phenylethanol; m, phenylpropanol; n, polylactic acid; o, poly(3-hydroxybutyrate-co-lactate) or (3HB-co-LA); p, 1,4-BDO.

and molecular modeling using RosettaDesign (Kuhlman and Baker, 2000) provided a guideline for manipulating LeuA to minimize steric clashes with the substrates. Various LeuA mutants allowed development of a new pool of 2-keto acids after elongation of a shorter 2-keto acid in combination with LeuBCD, enzymes necessary for L-leucine biosynthesis in the wild-type E. coli strain (Marcheschi et al., 2012). Each elongation ‘cycle’ by LeuABCD increases the length of the 2-keto acid by one carbon, and this system can be used in an ‘iterative’ manner to produce a variety of chemicals as long as the binding pockets of LeuA and KDC are large enough to accommodate the substrates (Table 2). Thus, protein engineering assisted by computational calculations for expanding the binding pocket has shown considerable potential for filling in the biochemical gaps in a novel metabolic pathway, because it enables promiscuous enzymes to catalyze reactions involving a broader range of substrates.

As the last example, the use of promiscuous enzymes can also be found in the production of microbial polyesters using a modified polyhydroxyalkanoate (PHA) synthase, which produces a wide range of PHAs having varying carbon-chain lengths and side chains (Jung et al., 2010; Lee, 1996; Yang et al., 2010). For the biosynthesis of polylactic acid and lactate-containing copolymers, which are non-natural polymers, it was critical to evolve propionate CoA transferase to convert lactate to lactoyl-CoA, and PHA synthase to incorporate lactoyl-CoA into the growing chain of polylactic acid. Hence, these two enzymes were subjected to directed evolution to broaden their substrate utilization range and were subsequently used to produce polylactic acid and lactate-containing copolymers in E. coli, which enabled the one-step microbial fermentative production of these non-natural polyesters (Yang et al., 2010). Subsequent systems metabolic engineering further improved the biopolymer production by

employing in silico simulations and actual gene manipulations (Jung et al., 2010) (Fig. 3; Table 1).

3.2. Creation of de novo enzymes for expanding metabolic landscape

If the appropriate promiscuous enzymes cannot fill in the biochemical gap of a de novo metabolic pathway, it is necessary to custom-design novel enzymes for the reactions that have not been identified to occur in nature. This step can be critical for the production of certain non-natural chemicals because classical metabolic engineering techniques, including in silico simulations, omics data analysis and genetic manipulations, as well as the pathway prediction tools, are not capable of suggesting the enzymes required for such reaction steps. Recently, several methods have been reported along with convincing experimental results. Successful examples include the Rosetta-assisted design of enzymes for the Kemp elimination (Khersonsky et al., 2012; Rothlisberger et al., 2008), Diels–Alder (Siegel et al., 2010) and retro-aldol (Althoff et al., 2012; Jiang et al., 2008) reactions. The most important aspect of this progress is that no known enzymes in nature carry out these reactions. These custom-designed enzymes are built with RosettaDesign and RosettaMatch (Zanghellini et al., 2006), which are based on constructing theozymes, or theoretical enzymes (Tantillo et al., 1998). With the help of careful quantum mechanical calculations of the transition state, an active site with catalytic residues ideally positioned with respect to the substrate is modeled in conformational space. From a library of scaffold proteins, an ideal set of pockets is chosen to accommodate the catalytic residues surrounding the transition state of the ligand. Of all the possible active sites generated, the scaffold designs are screened for steric clashes, so that the number of designs becomes more manageable for experimental validation.
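The final screening step described above, in which candidate designs are discarded when scaffold atoms clash with the modeled ligand, can be illustrated with a deliberately simplified distance filter. The Python sketch below is not the RosettaMatch procedure; the coordinates, the design names and the 3.0 Å cutoff are all assumptions chosen only to show the idea.

```python
import math

# Hypothetical, highly simplified stand-in for a steric-clash screen.
# Coordinates are in angstroms; the 3.0 A cutoff is an illustrative
# assumption, not a RosettaMatch or RosettaDesign parameter.

Point = tuple[float, float, float]


def has_clash(scaffold_atoms: list[Point], ligand_atoms: list[Point],
              cutoff: float = 3.0) -> bool:
    """Return True if any scaffold atom lies closer than `cutoff` to a ligand atom."""
    return any(
        math.dist(scaffold_atom, ligand_atom) < cutoff
        for scaffold_atom in scaffold_atoms
        for ligand_atom in ligand_atoms
    )


def screen_designs(designs: dict[str, list[Point]], ligand_atoms: list[Point],
                   cutoff: float = 3.0) -> list[str]:
    """Keep the names of designs whose pocket atoms do not clash with the ligand."""
    return [
        name for name, atoms in designs.items()
        if not has_clash(atoms, ligand_atoms, cutoff)
    ]


# Made-up coordinates for a two-atom ligand and two candidate pockets.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
designs = {
    "design_A": [(5.0, 0.0, 0.0), (0.0, 4.0, 0.0)],  # roomy pocket
    "design_B": [(1.0, 1.0, 0.0), (6.0, 0.0, 0.0)],  # one atom too close
}
print(screen_designs(designs, ligand))  # ['design_A']
```

A real design pipeline would score clashes with atom-type-specific radii and an energy function; the all-pairs distance test here only conveys why the candidate list shrinks before experimental validation.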


Another successful example is the development of a biocatalyst for the production of an anti-diabetic compound (Savile et al., 2010). Although the novel enzymes developed have not been optimized to be part of a complete metabolic pathway, the advances brought about by these studies are of high impact in protein engineering. With its wide applications, the field of protein engineering will keep advancing. It is envisioned that more robust computational tools will be developed for systematically designing the necessary enzymes for desired reactions, and ultimately for constructing cell factories for the production of industrially valuable chemicals.

3.3. Expanding enzyme availabilities by genome mining

If the identification of promiscuous enzymes and the construction of novel enzymes fail to fill in the gap of the de novo metabolic pathway because the relevant biological knowledge space remains unexplored, hypothetical proteins can also be considered. Hypothetical proteins that have not been thoroughly investigated might possess a function of interest that fills in the reaction gap. This step thus utilizes the unprecedentedly large and still increasing number of genome sequences of various organisms as well as metagenomic data. As of September 2012, 3705 genome projects spanning archaea, bacteria and eukaryotes have been completed according to the Genomes OnLine Database (GOLD) (Liolios et al., 2010). Furthermore, an increasing amount of metagenomic data is becoming available through consortium-level research efforts (Abubucker et al., 2012); currently 339 metagenomic studies covering 2048 samples are being completed (Liolios et al., 2010). Despite the availability of unprecedentedly large amounts of genomic and metagenomic data, correct gene annotation is mostly limited to already available biochemical information, since it usually relies on sequence alignment to genes with known biochemical properties from previous experiments.
Moreover, a large portion of automatically predicted proteins is annotated as hypothetical or putative (for example, as much as 30%–50% of proteins in E. coli), and this pool might contain the target enzymes for the desired reactions (Kimelman et al., 2012; Markowitz et al., 2010). In order to identify hypothetical enzymes for novel biocatalysis from this entire pool, query enzymes, which catalyze reactions similar to the reaction gaps in the designed metabolic pathway, should be carefully selected first (Fig. 4). Query enzymes can be selected based on their interacting substrates whose chemical structures are similar to those of the target non-natural reaction (Fig. 4A). Once query enzymes are determined, potentially useful hypothetical proteins can be narrowed down based on their biochemical information using well-established concepts and tools. Such biochemical information includes 3-D structure comparison (Hermann et al., 2007; Matte et al., 2007; Shin et al., 2007), genome context analysis (Korbel et al., 2005; Kuhn et al., 2010; Szklarczyk et al., 2011), and omics data analysis (Yoon et al., 2012) (Fig. 4B). A key approach here is to deploy well-established tools in a complementary manner, depending on the availability of information on the query enzymes. There are software tools that allow characterization of various aspects of an enzyme. For annotating hypothetical proteins using their structural information, several online tools are available (Table 3), which enable a user to compare the known structure of a query enzyme against various protein databases, such as the Protein Data Bank. Hypothetical proteins might be detected as an output of this analysis, and can then be assigned a preliminary biochemical function to fill in the biochemical gap of the designed pathway. Another approach worth considering is genome context analysis, preferably using STRING (Szklarczyk et al., 2011).
If a query enzyme is predicted to be linked with any hypothetical protein by conserved neighborhood, gene fusion, and co-occurrence analyses of genes by STRING, the functions of hypothetical proteins can be inferred by following the associated known enzymes. Consequently, these hypothetical

proteins can be examined to fill in the gap. This concept is similar to the analysis of omics data, which often employs clustering algorithms. If a hypothetical protein is functionally related to the query enzyme, both would be clustered into a single functional group based on their similar expression patterns. STITCH (Kuhn et al., 2010) is also useful for identifying hypothetical proteins of interest, as it presents proteins associated with the chemical of interest. Taken together, all these tools can be employed in a creative way, depending on the available information on the target enzyme, such that heterogeneous data generated from different sources become coherent with one another.

Although the discussion so far on the investigation of hypothetical proteins has focused on high-throughput computational methods, experimental methods, especially enzyme assays, should never be overlooked. Experiments provide solid evidence for assigning novel or known functions to a hypothetical protein. So far, characterization of hypothetical proteins has tended to be skewed toward those associated with virulence and resistance in microbial pathogens (Colmer et al., 1998; Gengenbacher et al., 2008). This is probably due to the relative ease of identifying genes associated with cellular survival, compared to proteins with biochemical functions other than virulence and resistance. The in vitro enzyme assay remains the best way to characterize a protein of interest, but it is too laborious for narrowing down the list of candidate hypothetical proteins in a high-throughput manner. Instead, medium-throughput assays, such as the use of antibodies specific to the query enzymes or enzyme-specific functional screening, should be considered to experimentally screen initial candidates (Beare et al., 2008; Reyes-Duarte et al., 2012).
Initial experimental screening ultimately produces a reliable list of candidate proteins, from which gap-filling in the designed metabolic pathway can further be pursued.

4. Strain optimization

Once the novel biosynthetic pathway for the target chemical is constructed, the biological system of the host organism needs to be optimized at the systems level for the most efficient production of the target chemical (Park et al., 2007; Solomon and Prather, 2011; Yim et al., 2011). Several challenges lie ahead at this stage: (1) assuring in vivo activities of newly created novel enzymes, (2) adjusting the expression levels of the enzymes in the constructed biosynthetic pathway, (3) knocking out, knocking down, and/or overexpressing critical genes for metabolic flux optimization, and (4) optimizing the whole bioprocess, including fermentation and downstream processes. Of course, these steps do not necessarily have to be sequential; the process often goes backwards and is iterated for feedback optimization. Throughout the development of metabolic engineering over more than two decades, each of these challenges has been well addressed along with appropriate techniques (Lee et al., 2012). Given the importance of this process, however, strain optimization is briefly reviewed again here with the case study on 1,4-BDO (Yim et al., 2011).

The in vivo activity of the novel enzyme, either newly created or evolved from a template enzyme, and of the whole biosynthetic pathway must be confirmed and optimized. The first step is to confirm the in vitro activities of the evolved enzymes with standard techniques of biochemistry and molecular biology. Next, various factors involving gene expression need to be optimized; they include plasmid copy number, promoter and ribosome binding site, source of the heterologous gene, intergenic region, isozymes with different cofactor usages, and others. In the case of 1,4-BDO production by engineered E. coli, codon optimization (025B) of the monofunctional aldehyde dehydrogenase (025 for the native enzyme) from Clostridium beijerinckii was necessary for in vivo conversion of 4HB-CoA to 1,4-BDO since the native form did not show any in vitro activity (Yim et al., 2011). Also, the codon-optimized 025B instead of the bifunctional


Fig. 4. Strategies for searching potential existing enzymes to fill in the designed biosynthetic pathway using genome mining. (A) First step is to select a query enzyme, and this can be conducted by investigating their interacting substrates whose chemical structures are similar to those of the target non-natural reaction. Each circle with a capital letter indicates a metabolite, and capital letters with apostrophe indicate the chemically similar metabolites. (B) Once query enzymes are determined, potentially useful hypothetical proteins can be narrowed down based on 3-D structure comparisons, genome context analysis, and omics data analysis. In the case of 3-D structure comparisons, the known structure of the query enzyme can be compared against various protein databases, such that hypothetical proteins may be detected as an output from this analysis, which can then be assigned a preliminary biochemical function. For the genomic context and omics data analyses, if the query enzyme is predicted to be linked with any hypothetical proteins by conserved neighborhood, gene fusion, and co-occurrence of genes, or clustering, those linked hypothetical proteins may be tested for the unknown reaction in the biosynthetic pathway. Gene X herein indicates a hypothetical gene.
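The query-enzyme selection step in Fig. 4A relies on chemical similarity between known substrates and the substrate of the missing reaction. One common way to quantify such similarity is the Tanimoto coefficient between structural feature sets. The Python sketch below is a minimal illustration under strong assumptions: the feature sets are written by hand, the enzyme names are placeholders, and a real workflow would derive fingerprints with a cheminformatics toolkit.

```python
# Illustrative only: substrates are described by hand-written feature sets.
# Enzyme names and features are placeholders, not database entries.

def tanimoto(a: set[str], b: set[str]) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_query_enzymes(target_substrate: set[str],
                       enzyme_substrates: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Sort candidate query enzymes by substrate similarity to the target."""
    scored = [
        (name, tanimoto(target_substrate, features))
        for name, features in enzyme_substrates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical description of the substrate of the reaction gap.
target = {"CoA-thioester", "C4-chain", "terminal-OH"}
# Hypothetical substrates of three characterized enzymes.
candidates = {
    "enzyme_1": {"CoA-thioester", "C4-chain"},
    "enzyme_2": {"carboxylate", "C6-chain", "aromatic-ring"},
    "enzyme_3": {"CoA-thioester", "C4-chain", "terminal-OH", "2-methyl"},
}
for name, score in rank_query_enzymes(target, candidates):
    print(f"{name}: {score:.2f}")
```

The highest-ranked enzymes would then serve as queries for the structure comparison, genome context and omics analyses of Fig. 4B.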

Table 3
Software tools useful for annotation of hypothetical proteins.

Software or database name | URL | Ref

Tools employing structural information of query enzymes in order to identify similar proteins from protein databases
Dali server | http://ekhidna.biocenter.helsinki.fi/dali_server/ | Holm and Rosenstrom (2010)
PDBeFold | http://www.ebi.ac.uk/msd-srv/ssm/ | Krissinel and Henrick (2004)
deconSTRUCT | http://epsf.bmad.bii.a-star.edu.sg/struct_server.html | Zhang et al. (2010b)
VAST | http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml | Gibrat et al. (1996)
FATCAT | http://fatcat.burnham.org/fatcat-cgi/cgi/fatcat.pl?-func=search | Ye and Godzik (2003)
iSARST | http://140.113.15.73/iSARST/ | Lo et al. (2009)
SALAMI | http://public.zbh.uni-hamburg.de/salami/ | –
Superimposé | http://farnsworth.charite.de/superimpose-web/index.jsp | –
Vorometric | http://bio.cse.ohio-state.edu/Vorometric/ | Sacan et al. (2008)

Tools employing amino acid sequences of query enzymes in order to infer protein or chemical interactions
STRING | http://string.embl.de/ | Szklarczyk et al. (2011)
STITCH | http://stitch.embl.de/ | Kuhn et al. (2010)
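Complementing the structure- and sequence-based tools in Table 3, the omics-based idea described in the text, that a hypothetical protein functionally related to the query enzyme should show a similar expression pattern, can be sketched with a plain correlation filter. In the Python example below, the expression values are invented for illustration, the 0.9 threshold is an arbitrary assumption, and a simple Pearson correlation stands in for the clustering algorithms normally used.

```python
import math

# Sketch of the co-expression idea. Expression values are made up and the
# 0.9 threshold is an arbitrary assumption; real analyses typically use
# clustering over many conditions rather than a single pairwise cutoff.

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def coexpressed(query_profile: list[float],
                profiles: dict[str, list[float]],
                threshold: float = 0.9) -> list[str]:
    """Names of hypothetical genes whose profile tracks the query enzyme."""
    return [
        name for name, profile in profiles.items()
        if pearson(query_profile, profile) >= threshold
    ]


query = [1.0, 2.0, 4.0, 8.0]             # query enzyme across four conditions
hypothetical = {
    "gene_X": [2.0, 4.0, 8.0, 16.0],     # same trend as the query
    "gene_Y": [8.0, 4.0, 2.0, 1.0],      # opposite trend
}
print(coexpressed(query, hypothetical))  # ['gene_X']
```

Genes passing such a filter are only candidates; as the text stresses, their function still has to be confirmed experimentally.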

aldehyde/alcohol dehydrogenase (002C for the codon-optimized enzyme) was used so that byproduct production could be minimized. Most of these approaches were performed based on biological insight, but computational tools can potentially assist in a more systematic analysis: e.g., Gene Designer (Villalobos et al., 2006) for codon optimization, and RBS Calculator (Salis et al., 2009) and RBSDesigner (Na and Lee, 2010) for ribosome binding sites.

If the novel metabolic pathway constructed is successfully operating in the host organism, further systems metabolic engineering is needed to improve the strain's capability of producing the desired chemical. Constraints-based flux analysis using a genome-scale metabolic network model can greatly facilitate this process (Kim et al., 2008; Park et al., 2009). Briefly, this approach considers the whole metabolism of the host organism using stoichiometric balancing of the constituent metabolites, such that it can predict which reactions should be manipulated, by knockout or overexpression, on a genome-wide scale for the overproduction of the target chemical. For the production of 1,4-BDO, the OptKnock algorithm (Burgard et al., 2003) was used to predict four gene knockout targets that would enable high 1,4-BDO production while sustaining cellular growth (Yim et al., 2011). Algorithms for gene amplification (Choi et al., 2010) or heterologous pathway design (Chatsurachai et al., 2012) are also available. Biological insight is of course necessary for additional engineering, such as removing or rewiring regulatory circuits and competing pathways, such that the metabolic flux is optimized toward the target chemical.
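The logic of stoichiometric balancing and knockout prediction can be illustrated on a toy network. The Python sketch below is not OptKnock, which solves a bilevel optimization problem on a genome-scale model; it simply enumerates integer flux vectors that satisfy the steady-state condition S·v = 0 for a five-reaction network invented for this example, and reports how much target chemical is formed when growth is maximal, with and without a knockout. All reaction names, stoichiometries and bounds are assumptions.

```python
from itertools import product

# Toy network (all names and stoichiometries are hypothetical):
#   uptake:     -> A
#   glycolysis: A -> 2 B + NADH
#   biomass:    B ->             (growth objective)
#   byproduct:  B + NADH ->      (competing sink)
#   target:     B + NADH ->      (desired chemical)
REACTIONS = ["uptake", "glycolysis", "biomass", "byproduct", "target"]
# Rows: metabolites A, B, NADH; columns follow REACTIONS.
S = [
    [1, -1,  0,  0,  0],
    [0,  2, -1, -1, -1],
    [0,  1,  0, -1, -1],
]
MAX_FLUX = 10  # arbitrary upper bound applied to every flux


def steady_states(knockouts=()):
    """Yield integer flux vectors with S.v = 0; knocked-out fluxes are fixed at 0."""
    ranges = [
        range(1) if name in knockouts else range(MAX_FLUX + 1)
        for name in REACTIONS
    ]
    for v in product(*ranges):
        if all(sum(c * f for c, f in zip(row, v)) == 0 for row in S):
            yield v


def target_range_at_max_growth(knockouts=()):
    """Return (max biomass flux, min target flux, max target flux) at optimal growth."""
    states = list(steady_states(knockouts))
    bio = REACTIONS.index("biomass")
    tgt = REACTIONS.index("target")
    best = max(v[bio] for v in states)
    targets = [v[tgt] for v in states if v[bio] == best]
    return best, min(targets), max(targets)


print(target_range_at_max_growth())               # (10, 0, 10)
print(target_range_at_max_growth({"byproduct"}))  # (10, 10, 10)
```

In this toy case the unmodified network can grow maximally without making any target chemical, because the competing sink can absorb all the reducing equivalents. Removing that sink forces the target flux to its maximum at the same growth rate, which is the kind of growth-coupled production that knockout-prediction algorithms search for on real genome-scale models using linear programming rather than enumeration.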


Finally, the overall bioprocess should be systematically reviewed in order to maximize the host strain's production performance. Typical examples include the design of a defined minimal medium and of cultivation conditions, such as degree of aeration, pH and temperature, and nutrient feeding during fed-batch culture. In particular, the use of a chemically defined minimal medium is critical in many cases as it strongly affects the overall cost of the bioprocess (Song et al., 2008). The 1,4-BDO-producing E. coli strain was cultivated in M9 glucose minimal medium, but different media can be considered for other microbial hosts. Considerations on the bioprocess are discussed in detail elsewhere (Parekh et al., 2000; Park et al., 2008, 2011). Representative examples of microbial production of bulk chemicals, which have typically been produced by petroleum refineries, are summarized in Tables 1 and 2.

5. Conclusions

Many chemicals used in our daily lives are petroleum-derived, and their efficient and sustainable production by biorefineries has become important. Re-inventing microorganisms for the production of chemicals that have so far been derived from petroleum requires the design and creation of novel metabolic pathways and system-wide optimization of the microorganisms. The approaches described in this paper are just the beginning of what we will see in the future. Many innovative approaches utilizing ‘big’ biological data remain to be developed, while designing and ranking novel metabolic pathways require more insight and experience. Many hypothetical proteins and promiscuous enzymes need to be more thoroughly characterized for filling in the biochemical gaps. Furthermore, newly designed de novo enzymes still need to be more rigorously validated in the cell. Most importantly, both intuitive and systematic approaches should be undertaken in a complementary way, depending on the given objectives and problems.
For target chemicals produced on the basis of the intuitive approach, there might be unexplored biochemical reaction spaces that the software tools can help search. Likewise, strict adherence to the software tools can lead to biologically fallible decisions, and biological intuition can help sort out computer-generated candidates. Despite all these awaiting challenges, there is no doubt that the methods for designing novel metabolic pathways will continue to advance toward the expected goal of overproducing the majority of the conventionally petroleum-derived non-natural chemicals in the near future.

Acknowledgment

This work was supported by the Technology Development Program to Solve Climate Changes on Systems Metabolic Engineering for Biorefineries (NRF-2012-C1AAA001-2012M1A2A2026556) and the Intelligent Synthetic Biology Center of Global Frontier Project (2011-0031963) from the Ministry of Education, Science and Technology (MEST) through the National Research Foundation of Korea.

References

Abubucker S, Segata N, Goll J, Schubert AM, Izard J, Cantarel BL, et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput Biol 2012;8:e1002358.
Althoff EA, Wang L, Jiang L, Giger L, Lassila JK, Wang Z, et al. Robust design and optimization of retroaldol enzymes. Protein Sci 2012;21:717–26.
Atsumi S, Hanai T, Liao JC. Non-fermentative pathways for synthesis of branched-chain higher alcohols as biofuels. Nature 2008;451:86–9.
Barker JL, Frost JW. Microbial synthesis of p-hydroxybenzoic acid from glucose. Biotechnol Bioeng 2001;76:376–90.
Beare PA, Chen C, Bouman T, Pablo J, Unal B, Cockrell DC, et al. Candidate antigens for Q fever serodiagnosis revealed by immunoscreening of a Coxiella burnetii protein microarray. Clin Vaccine Immunol 2008;15:1771–9.
Brunk E, Neri M, Tavernelli I, Hatzimanikatis V, Rothlisberger U. Integrating computational methods to retrofit enzymes to synthetic pathways. Biotechnol Bioeng 2012;109:572–82.

Burgard AP, Pharkya P, Maranas CD. Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol Bioeng 2003;84:647–57.
Cann AF, Liao JC. Production of 2-methyl-1-butanol in engineered Escherichia coli. Appl Microbiol Biotechnol 2008;81:89–98.
Car R, Parrinello M. Unified approach for molecular dynamics and density-functional theory. Phys Rev Lett 1985;55:2471–4.
Carbonell P, Faulon JL. Molecular signatures-based prediction of enzyme promiscuity. Bioinformatics 2010;26:2012–9.
Carbonell P, Planson AG, Fichera D, Faulon JL. A retrosynthetic biology approach to metabolic pathway design for therapeutic production. BMC Syst Biol 2011;5.
Chatsurachai S, Furusawa C, Shimizu H. An in silico platform for the design of heterologous pathways in nonnative metabolite production. BMC Bioinformatics 2012;13:93.
Cho A, Yun H, Park JH, Lee SY, Park S. Prediction of novel synthetic pathways for the production of desired chemicals. BMC Syst Biol 2010;4:35.
Choi HS, Lee SY, Kim TY, Woo HM. In silico identification of gene amplification targets for improvement of lycopene production. Appl Environ Microbiol 2010;76:3097–105.
Colmer JA, Fralick JA, Hamood AN. Isolation and characterization of a putative multidrug resistance pump from Vibrio cholerae. Mol Microbiol 1998;27:63–72.
Connor MR, Liao JC. Engineering of an Escherichia coli strain for the production of 3-methyl-1-butanol. Appl Environ Microbiol 2008;74:5769–75.
Dellomonaco C, Clomburg JM, Miller EN, Gonzalez R. Engineered reversal of the beta-oxidation cycle for the synthesis of fuels and chemicals. Nature 2011;476:355–9.
Draths KM, Frost JW. Environmentally compatible synthesis of adipic acid from D-glucose. J Am Chem Soc 1994;116:399–400.
Draths KM, Frost JW. Environmentally compatible synthesis of catechol from D-glucose. J Am Chem Soc 1995;117:2395–400.
Faulon JL, Collins MJ, Carr RD. The signature molecular descriptor. 4. Canonizing molecules using extended valence sequences. J Chem Inf Comput Sci 2004;44:427–36.
Felnagle EA, Chaubey A, Noey EL, Houk KN, Liao JC. Engineering synthetic recursive pathways to generate non-natural small molecules. Nat Chem Biol 2012;8:518–26.
Finley SD, Broadbelt LJ, Hatzimanikatis V. Computational framework for predictive biodegradation. Biotechnol Bioeng 2009;104:1086–97.
Gao JF, Ellis LBM, Wackett LP. The University of Minnesota Pathway Prediction System: multi-level prediction and visualization. Nucleic Acids Res 2011;39:W406–11.
Gengenbacher M, Xu T, Niyomrattanakit P, Spraggon G, Dick T. Biochemical and structural characterization of the putative dihydropteroate synthase ortholog Rv1207 of Mycobacterium tuberculosis. FEMS Microbiol Lett 2008;287:128–35.
Gibrat JF, Madej T, Bryant SH. Surprising similarities in structure comparison. Curr Opin Struct Biol 1996;6:377–85.
Hatzimanikatis V, Li C, Ionita JA, Henry CS, Jankowski MD, Broadbelt LJ. Exploring the diversity of complex metabolic networks. Bioinformatics 2005;21:1603–9.
Henry CS, Broadbelt LJ, Hatzimanikatis V. Discovery and analysis of novel metabolic pathways for the biosynthesis of industrial chemicals: 3-hydroxypropanoate. Biotechnol Bioeng 2010;106:462–73.
Hermann JC, Marti-Arbona R, Fedorov AA, Fedorov E, Almo SC, Shoichet BK, et al. Structure-based activity prediction for an enzyme of unknown function. Nature 2007;448:775–9.
Holm L, Rosenstrom P. Dali server: conservation mapping in 3D. Nucleic Acids Res 2010;38:W545–9.
Jankowski MD, Henry CS, Broadbelt LJ, Hatzimanikatis V. Group contribution method for thermodynamic analysis of complex metabolic networks. Biophys J 2008;95:1487–99.
Jewell JB, Coutinho JB, Kropinski AM. Bioconversion of propionic, valeric, and 4-hydroxybutyric acids into the corresponding alcohols by Clostridium acetobutylicum NRRL 527. Curr Microbiol 1986;13:215–9.
Jiang L, Althoff EA, Clemente FR, Doyle L, Rothlisberger D, Zanghellini A, et al. De novo computational design of retro-aldol enzymes. Science 2008;319:1387–91.
Jung YK, Kim TY, Park SJ, Lee SY. Metabolic engineering of Escherichia coli for the production of polylactic acid and its copolymers. Biotechnol Bioeng 2010;105:161–71.
Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res 2010;38:D355–60.
Khersonsky O, Kiss G, Rothlisberger D, Dym O, Albeck S, Houk KN, et al. Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59. Proc Natl Acad Sci U S A 2012;109:10358–63.
Kim HU, Kim TY, Lee SY. Metabolic flux analysis and metabolic engineering of microorganisms. Mol Biosyst 2008;4:113–20.
Kimelman A, Levy A, Sberro H, Kidron S, Leavitt A, Amitai G, et al. A vast collection of microbial genes that are toxic to bacteria. Genome Res 2012;22:802–9.
Korbel JO, Doerks T, Jensen LJ, Perez-Iratxeta C, Kaczanowski S, Hooper SD, et al. Systematic association of genes to phenotypes by genome and literature mining. PLoS Biol 2005;3:e134.
Krissinel E, Henrick K. Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D Biol Crystallogr 2004;60:2256–68.
Kuhlman B, Baker D. Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci U S A 2000;97:10383–8.
Kuhn M, Szklarczyk D, Franceschini A, Campillos M, von Mering C, Jensen LJ, et al. STITCH 2: an interaction network database for small molecules and proteins. Nucleic Acids Res 2010;38:D552–6.
Lee SY. Bacterial polyhydroxyalkanoates. Biotechnol Bioeng 1996;49:1–14.
Lee JW, Kim TY, Jang YS, Choi S, Lee SY. Systems metabolic engineering for chemicals and materials. Trends Biotechnol 2011;29:370–8.

Lee JW, Na D, Park JM, Lee J, Choi S, Lee SY. Systems metabolic engineering of microorganisms for natural and non-natural chemicals. Nat Chem Biol 2012;8:536–46.

Li CH, Henry CS, Jankowski MD, Ionita JA, Hatzimanikatis V, Broadbelt LJ. Computational discovery of biochemical routes to specialty chemicals. Chem Eng Sci 2004;59:5051–60.

Liolios K, Chen IM, Mavromatis K, Tavernarakis N, Hugenholtz P, Markowitz VM, et al. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 2010;38:D346–54.

Lo WC, Lee CY, Lee CC, Lyu PC. iSARST: an integrated SARST web server for rapid protein structural similarity searches. Nucleic Acids Res 2009;37:W545–51.

Machado HB, Dekishima Y, Luo H, Lan EI, Liao JC. A selection platform for carbon chain elongation using the CoA-dependent pathway to produce linear higher alcohols. Metab Eng 2012;14:504–11.

Marcheschi RJ, Li H, Zhang K, Noey EL, Kim S, Chaubey A, et al. A synthetic recursive “+1” pathway for carbon chain elongation. ACS Chem Biol 2012;7:689–97.

Markowitz VM, Chen IM, Palaniappan K, Chu K, Szeto E, Grechkin Y, et al. The integrated microbial genomes system: an expanding comparative analysis resource. Nucleic Acids Res 2010;38:D382–90.

Martin CH, Nielsen DR, Solomon KV, Prather KL. Synthetic metabolism: engineering biology at the protein and pathway scales. Chem Biol 2009;16:277–86.

Matte A, Jia Z, Sunita S, Sivaraman J, Cygler M. Insights into the biology of Escherichia coli through structural proteomics. J Struct Funct Genomics 2007;8:45–55.

Mavrovouniotis ML. Estimation of standard Gibbs energy changes of biotransformations. J Biol Chem 1991;266:14440–5.

McKenna R, Nielsen DR. Styrene biosynthesis from glucose by engineered E. coli. Metab Eng 2011;13:544–54.

Medema MH, van Raaphorst R, Takano E, Breitling R. Computational tools for the synthetic design of biochemical pathways. Nat Rev Microbiol 2012;10:191–202.

Moon TS, Yoon SH, Lanza AM, Roy-Mayhew JD, Prather KL. Production of glucaric acid from a synthetic pathway in recombinant Escherichia coli. Appl Environ Microbiol 2009;75:589–95.

Moon TS, Dueber JE, Shiue E, Prather KL. Use of modular, synthetic scaffolds for improved production of glucaric acid in engineered E. coli. Metab Eng 2010;12:298–305.

Na D, Lee D. RBSDesigner: software for designing synthetic ribosome binding sites that yields a desired level of protein expression. Bioinformatics 2010;26:2633–4.

Niu W, Draths KM, Frost JW. Benzene-free synthesis of adipic acid. Biotechnol Prog 2002;18:201–11.

Parekh S, Vinci VA, Strobel RJ. Improvement of microbial strains and fermentation processes. Appl Microbiol Biotechnol 2000;54:287–301.

Park HS, Nam SH, Lee JK, Yoon CN, Mannervik B, Benkovic SJ, et al. Design and evolution of new catalytic activity with an existing protein scaffold. Science 2006;311:535–8.

Park JH, Lee KH, Kim TY, Lee SY. Metabolic engineering of Escherichia coli for the production of L-valine based on transcriptome analysis and in silico gene knockout simulation. Proc Natl Acad Sci U S A 2007;104:7797–802.

Park JH, Lee SY, Kim TY, Kim HU. Application of systems biology for bioprocess development. Trends Biotechnol 2008;26:404–12.

Park JM, Kim TY, Lee SY. Constraints-based genome-scale metabolic simulation for systems metabolic engineering. Biotechnol Adv 2009;27:979–88.

Park JH, Kim TY, Lee KH, Lee SY. Fed-batch culture of Escherichia coli for L-valine production based on in silico flux response analysis. Biotechnol Bioeng 2011;108:934–46.

Reed JL, Vo TD, Schilling CH, Palsson BO. An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol 2003;4:R54.

Reyes-Duarte D, Ferrer M, Garcia-Arellano H. Functional-based screening methods for lipases, esterases, and phospholipases in metagenomic libraries. Methods Mol Biol 2012;861:101–13.

Rodrigo G, Carrera J, Prather KJ, Jaramillo A. DESHARKY: automatic design of metabolic pathways for optimal cell growth. Bioinformatics 2008;24:2554–6.

Rothlisberger D, Khersonsky O, Wollacott AM, Jiang L, DeChancie J, Betker J, et al. Kemp elimination catalysts by computational enzyme design. Nature 2008;453:190–5.

Sacan A, Toroslu IH, Ferhatosmanoglu H. Integrated search and alignment of protein structures. Bioinformatics 2008;24:2872–9.

Salis HM, Mirsky EA, Voigt CA. Automated design of synthetic ribosome binding sites to control protein expression. Nat Biotechnol 2009;27:946–50.

Savile CK, Janey JM, Mundorff EC, Moore JC, Tam S, Jarvis WR, et al. Biocatalytic asymmetric synthesis of chiral amines from ketones applied to sitagliptin manufacture. Science 2010;329:305–9.

Shin DH, Hou J, Chandonia JM, Das D, Choi IG, Kim R, et al. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center. J Struct Funct Genomics 2007;8:99–105.

Siegel JB, Zanghellini A, Lovick HM, Kiss G, Lambert AR, St Clair JL, et al. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels–Alder reaction. Science 2010;329:309–13.

Soh KC, Hatzimanikatis V. DREAMS of metabolism. Trends Biotechnol 2010;28:501–8.

Solomon KV, Prather KL. The zero-sum game of pathway optimization: emerging paradigms for tuning gene expression. Biotechnol J 2011;6:1064–70.

Song H, Kim TY, Choi BK, Choi SJ, Nielsen LK, Chang HN, et al. Development of chemically defined medium for Mannheimia succiniciproducens based on its genome sequence. Appl Microbiol Biotechnol 2008;79:263–72.

Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res 2011;39:D561–8.

Tantillo DJ, Chen J, Houk KN. Theozymes and compuzymes: theoretical models for biological catalysis. Curr Opin Chem Biol 1998;2:743–50.

Ugi I, Bauer J, Brandt J, Friedrich J, Gasteiger J, Jochum C, et al. New applications of computers in chemistry. Angew Chem Int Ed 1979;18:111–23.

Verhoef S, Ruijssenaars HJ, de Bont JAM, Wery J. Bioproduction of p-hydroxybenzoate from renewable feedstock by solvent-tolerant Pseudomonas putida S12. J Biotechnol 2007;132:49–56.

Villalobos A, Ness JE, Gustafsson C, Minshull J, Govindarajan S. Gene Designer: a synthetic biology tool for constructing artificial DNA segments. BMC Bioinformatics 2006;7:285.

Weininger D. SMILES, a chemical language and information-system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 1988;28:31–6.

Wierckx NJP, Ballerstedt H, de Bont JAM, Wery J. Engineering of solvent-tolerant Pseudomonas putida S12 for bioproduction of phenol from glucose. Appl Environ Microbiol 2005;71:8221–7.

Wittmann C, Lee SY. Systems metabolic engineering. Dordrecht: Springer; 2012.

Yang TH, Kim TW, Kang HO, Lee SH, Lee EJ, Lim SC, et al. Biosynthesis of polylactic acid and its copolymers using evolved propionate CoA transferase and PHA synthase. Biotechnol Bioeng 2010;105:150–60.

Ye Y, Godzik A. Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics 2003;19(Suppl. 2):ii246–55.

Yim H, Haselbeck R, Niu W, Pujol-Baxley C, Burgard A, Boldt J, et al. Metabolic engineering of Escherichia coli for direct production of 1,4-butanediol. Nat Chem Biol 2011;7:445–52.

Yoon SH, Han MJ, Jeong H, Lee CH, Xia XX, Lee DH, et al. Comparative multi-omics systems analysis of Escherichia coli strains B and K-12. Genome Biol 2012;13:R37.

Zanghellini A, Jiang L, Wollacott AM, Cheng G, Meiler J, Althoff EA, et al. New algorithms and an in silico benchmark for computational enzyme design. Protein Sci 2006;15:2785–94.

Zhang K, Sawaya MR, Eisenberg DS, Liao JC. Expanding metabolism for biosynthesis of nonnatural alcohols. Proc Natl Acad Sci U S A 2008;105:20653–8.

Zhang Y, Thiele I, Weekes D, Li Z, Jaroszewski L, Ginalski K, et al. Three-dimensional structural view of the central metabolic network of Thermotoga maritima. Science 2009;325:1544–9.

Zhang K, Li H, Cho KM, Liao JC. Expanding metabolism for total biosynthesis of the nonnatural amino acid L-homoalanine. Proc Natl Acad Sci U S A 2010a;107:6234–9.

Zhang ZH, Bharatham K, Sherman WA, Mihalek I. deconSTRUCT: general purpose protein database search on the substructure level. Nucleic Acids Res 2010b;38:W590–4.