Theoretical Considerations for Reprogramming Multicellular Systems

Theoretical Considerations for Reprogramming Multicellular Systems

CHAPTER 5 G Theoretical Considerations for Reprogramming Multicellular Systems Joseph Xu Zhou and Sui Huang Institute for Systems Biology, Seattle, ...

2MB Sizes 1 Downloads 114 Views

CHAPTER

5 G

Theoretical Considerations for Reprogramming Multicellular Systems Joseph Xu Zhou and Sui Huang Institute for Systems Biology, Seattle, WA, USA

INTRODUCTION The physicist Richard Feynman once said: ‘What I cannot create, I do not understand.’ After more than 60 years of advances in molecular biology, it has through molecular dissection provided a solid knowledge base of the central elementary molecular processes of life. Now synthetic biology launches a new era in which we manipulate systems to change their fundamental properties. This goes beyond the singular perturbations used to probe a system feature in traditional ‘analytical’ biology. In the most extreme scenario synthetic biology reaches into the realm of ‘making’ livable systems de novo. The purpose is two-fold: to better understand biological systems or, even going beyond Feynman’s dictum, to harness it for a practical purpose. It is clear that our experience from engineering in man-made technology, such as designing cars or computers, offers ample methodology for synthetic biology. But in doing so we also need to be aware of fundamental differences between biological and artificial systems, notably in view of the question of how the encoded blueprints of programs (software, DNA) are translated into system behaviors. Specifically, here we confront the question of how do genes, which are arranged as a linear code in the genomic DNA and interact with each other in a gene regulatory network, give rise to complex behaviors poised at the optimum between stability and flexibility? One essential feature of complex multicellular organisms is their development (coming into being) from a single cell (fertilized egg) into a multicellular organism comprised not only of a large number of cells, but of discretely distinct types of cells that through ordered arrangement in space form tissues and organs.1 Development of the cell type repertoire of .1000s or so cell types found in the human body follows a hierarchical scheme of cellular differentiation characterized by a binary branching genealogy. Since cell types represent discrete, stable entities, it has long been recognized that they correspond to attractor states of molecular dynamic networks that govern the differentiation of the various cell lineages.25 Early conceptualization of developmental cell behavior characterized by the natural discontinuity of phenotype, the discrete binary decisions of less mature multipotent cells into two lineages, and the rare but observed switching between the discrete phenotypes used a landscape picture with valleys that correspond to the attractor states.6 This ‘epigenetic

Synthetic Biology. DOI: http://dx.doi.org/10.1016/B978-0-12-394430-6.00005-4 © 2013 Elsevier Inc. All rights reserved.

81

SECTION II Computational and Theoretical Tools in Synthetic Biology

landscape’ that C. Waddington sensibly proposed in the 1940s can be considered a pivotal conceptual intermediate for understanding the one-to-many mapping between the blueprint (genome) and the variety of cell phenotypes. The epigenetic landscape is akin to a high-dimensional potential-like surface with multiple potential wells that represent nonequilibrium (meta)stable steady-states and whose topography is encoded in the genome.7 With regard to devices or systems that exhibit multiple distinct states, in engineered systems, orderly state transitions follow a predesigned and deterministic process between explicitly designed a priori discrete states, e.g. the onoff state of a light signal. By contrast biological systems exist in a continuous space that nevertheless exhibit discretely distinct behaviors and achieve the same state through many different state transition paths. More concretely, changes of system states (5 phenotypes) in living cells which appear as quasi-discontinuous switching are driven by noise-induced state transitions that can be biased if not explicitly steered by biological signals.8,9 Hence, the various states must be attractor states for the latter to ensure discreteness and stability in a continuum of possibilities and allow for multiple entry (converging) paths to each state. Complex multicellular organisms are thought to have evolved the needed functionality by ‘apposition’ of new attractors to the existing system. In multicellular systems this has led to developmental paths from the egg cell attractor to the attractor states of the stem cells of the various tissues, and from these stem cells to the mature adult cell types, giving rise to a particular form of the landscape that is encoded by the genome.

82

Evolution thus acts on the genome to shape the particular landscape topography. The attractors are located on the landscape to allow the developing cell to access them in a particular, controlled order  when needed during development and in homeostasis. Access to an attractor is controlled locally, based on fine-tuned transition probabilities and susceptibility to regulatory signals, such as hormones. Thus, while nondeterministic processes, such as molecular noise and bifurcations,10 are in general avoided in engineered systems, biological systems exploit noise to drive the unfolding of the genome-encoded system of constraints into a highly complex structure. They are poised between being robust to noise (staying in the same attractor despite it) and being sensitive to noise (switching to an accessible nearby attractor in response to it). In a cartoonish but useful simplification, we can view development as the noise-driven successive occupation of the attractor states, which can represent intermediate or terminal states  all encoded in the genome.11 Thus the cellular states are predestined yet subject to randomness. Here we are concerned with the modification of the developmental paths between attractors with the ultimate goal to control attractor state transitions  which amounts to changing cell types. This has become known as ‘cell reprogramming.’ While the switch from one cell type to another of a closely related lineage has long been practiced in cell biology research,12 and can be achieved by manipulating just one ‘fate-determining’ gene, it is only a recent demonstration of drastic changes of cell phenotype, the reprogramming of adult cells back to an embryonic stem-cell-like state, referred to as iPS (induced pluripotent state)13 that has convinced the reductionist defenders of orthodoxy that cell-type identities are, after all, not carved in stone but a dynamic entity. Such reprogramming can be robustly achieved by manipulating multiple transcription factors although it remains a stochastic process.5,14 The complete reversion of a cell’s phenotype, which in a first approximation can be regarded as defined by the expression pattern of the tens of thousands of genes of the genome, warrants the shift of the notion of explorative manipulation for analysis to the realm of constructive, hence biosynthetic reengineering of a cell phenotype for practical purposes. In fact, the prospect of manipulating cell types at will, starting from easily available, proliferating cells (such as certain blood cells, liver cells, or fibroblasts) to more challenging therapeutically desired cells, has fostered the hope that such universal cell

CHAPTER 5 Theoretical Considerations for Reprogramming Multicellular Systems

phenotype reprogramming capacity will benefit regenerative medicine, where to repair tissue deficits or correct organ dysfunction one requires specialized cells of certain types in large quantities (such as heart muscle cells, skin stem cells, or pancreas insulin-producing cells).15 However, the design of reprogramming experiments remains largely empirical, typically relying on brute-force trial and error and qualitative ‘educated guesses’ to find how to manipulate the regulatory genes in order to achieve a desired attractor state transition. Here we present a formal framework for this purpose. It is based on our increasing understanding of how gene-regulatory networks produce the stable network attractor states that determine the cell-type-specific gene expression pattern, and will permit us to quantitate the ‘height of the barriers’ that separate the attractors and compute the most efficient path from one attractor to another. We organize this chapter as follows: the second section will introduce the very basic concept of dynamical systems and of the epigenetic or quasi-potential landscape in a qualitative manner for experimental biologists. Engineers, computational, and physical biologists can fast-forward to the central question of how to define a quasi-potential landscape and to calculate transition dynamics that is addressed in detail in the third section. In the fourth section a general step-by-step description of the principles for putting the theory into practice is given, considering the paucity of information on gene regulatory networks available. The fifth section offers examples for blood cell reprogramming and pancreas beta-cell reprogramming. Conclusions and outlook end this chapter in the final section.

CONCEPTUAL FRAMEWORK: GENE REGULATORY NETWORKS, NETWORK STATES, AND CELL TYPES The Gene Regulatory Network and Gene Expression Patterns as Network State A gene regulatory network (GRN)16 consists of the regulatory interactions through which regulatory genes control the expression behavior of their target genes. Since regulatory genes are themselves also target genes, this leads to a network replete with feedback loops that exhibits characteristic dynamics of the global gene expression changes. The gene expression patterns and their temporal behavior are thus predestined by the network structure which has evolved a ‘wiring diagram’ to produce the ‘meaningful,’ stable gene expression states that govern cell phenotypes. More specifically, the GRN controls cell differentiations during development. A GRN in which fate-determining transcription factors (TFs) regulate each other drives the development of tissues by orchestrating the activation or suppression of the appropriate genes across the genome to establish the stable steady-state gene expression patterns that specify a given cell type with its biological functions. These stable gene expression patterns  defined by the N gene of the genome, are the aforementioned attractor states. They guarantee the stability of the cell-type-specific expression patterns.5,17,18 The recent integrated analysis of gene expression profiles of tens of thousands of genes across the genome based on microarrays has provided evidence that cell types indeed represent high-dimensional attractor states of the dynamics of GRNs.5,19,20 If the cell-type-specific genomic expressions are attractors, then they are indeed ‘preprogrammed’ by the genome acting as a blueprint because it specifies the particular ‘wiring diagram’ which is defined by the invariant properties of which gene (conditionally) regulates which one. The GRN is thus directly encoded in the genome, since molecular interactions and their dependence on cooperative action (conditional binding, allostery, etc.) are defined by protein and promoter DNA sequences. Yet, while this establishes the genome as a blueprint, hence warranting the use of engineering metaphors in synthetic biology, the occupation of the attractor states is modulated by noise and environmental signal in a way that is unique to biological systems.

83

SECTION II Computational and Theoretical Tools in Synthetic Biology

The Toggle-Switch as an Example for a Gene Regulatory Network In the following we present a simplifed, qualitative introduction to network dynamics using the well-known example of N 5 2-gene network to demonstrate key concepts of network dynamics, attractors and landscape. The mutual inhibition of two opposing transcription factors (TFs), which we call X1 and X2, is one of the simplest GRNs (Fig. 5.1A). We are now interested in how the network state S, which is defined by the measurable gene expression

(A)

X1

X2

State space & vector field

Gene regulatory network

Quasi-potential landscape

(B)

84 X2

X1

Multi-potent progenitor cell binary cell fate decision

Differentiated (mature) cells

Quasi-potential U(X)

Cell type1

(C)

Cell type 1 Cell type 3

Cell type 2

State space coordinate ( ~ X2/X1 ) X1

Cell type 2

X2

Cell type 3 X1

X2

X1

X2

FIGURE 5.1 Topology, state spaces with the vector field and quasi-potential landscapes of gene-regulatory networks (circuits). For the two gene circuits (A) and (B), various representations are given from left to right: The gene circuit topology, the state space with the vector field, and the quasi-potential landscape. Circuit (A)   represents the cross-inhibition and bistability, with two attractors S1 and S2 which are separated by a saddle point S0. Circuit (B) has two positive-feedback loops which stabilize the central steady-state S0, converting it from unstable to stable. The latter can serve to represent a metastable progenitor state which can   differentiate to two differentiated cell fates, represented by attractors S1 ðX1 cX2 Þ and S2 ðX2 cX1 Þ, as shown in (C).

CHAPTER 5 Theoretical Considerations for Reprogramming Multicellular Systems pattern jointly established by the levels of X1 and X2, changes over time: S(t) 5 [X1(t), X2(t)]. The change of S is visualized as the movement of the state S in the two-dimensional state space spanned by the X1 and the X2 axis. The dynamics of this 2-gene network is constrained because X1 and X2 alter their levels in a coordinated manner  as prescribed by the network interactions (explained in Fig. 5.1A). As a consequence, most (X1, X2)-configurations, or states S, of the network are not stable but experience a ‘force’ of change until all the regulatory forces are balanced. In this case of mutual repression the dynamics is such that one typically   finds two discretely distinct stable stationary states, S1 and S2 (asterisks indicate ‘stability’). These stationary states are stable attractors because in the state space of all possible gene expression patterns they attract nearby states (Fig. 5.1A).21 As argued above they represent the two alternative cell types. The term ‘bistability’ is used to describe such dynamical behaviour in which the very same circuit can chose to be in either one of the two attractor states  depending on which ‘basin of attraction’ in the state space was its initial state, S(t 5 0) 5 [X1(t0), X2(t0)].22 Bistability and the associated inverse expression patterns of X1 and X2 in the two attractor states have been used to explain binary cell fate decisions in which an immature uncommitted multipotent cell can make a decision to commit to either one of two lineages (‘fates’). For instance, in a type of blood cell progenitor that faces the decision to become a cell of the erythroid or the myeloid lineage (roughly, red versus white blood cell lineages), the erythroid lineage exhibits the GATA1HIGH/PU.1LOW expression pattern, whereas the reciprocal myeloid lineage, has the inverse GATA1LOW/PU.1HIGH pattern. The model can also incorporate another feature observed in many of such bistable networks: the two antagonist fate-determining factors not only repress each other (Fig. 5.1A), but often also seem to activate their own expression (Fig. 5.1B). This constitutes a different network architecture, hence a different dynamical behavior. Various mathematical models of the cross-inhibitionauto-stimulation circuit motif indicate that this departure from the classical bistable circuit structure (Fig. 5.1A) typically leads to the stabilization of the symmetrical bipotent state S0, which becomes a locally stable attractor.23 Thus, we actually have a tristable system (Fig. 5.1B). A cell differentiation corresponds to the process in which a progenitor cell at central state (X1MID/X2MID) transitions to either the left attractor with X1HIGH/X2LOW expression pattern or the right attractor with X1HIGH/X2LOW expression pattern (Fig. 5.1C). We have here without equations given the intuition of network dynamics and the attractor states it produces. The above two-gene system, widely used in nature to control binary cell fate decisions, produces three attractor states, and thus constitutes a toy model for a multistable system. State transitions during cell fate commitment and subsequent differentiation can be viewed as transitions between attractor states  which so far are simply attracting points in a state space. But in which direction do spontaneous, noise-driven attractor transitions occur? This question of relative (meta)stability of attractors is of central biological significance, for it determines natural cell fate choice by multipotent progenitor cells and the ease of artificially induced fate switches (reprogramming). To conceptualize and quantitatively define attractor transitions and their rates, it is instrumental to introduce the notion of ‘barriers’ (height) between attractors and the ‘relative depths’ of attractors. This is the reason for the concept of potential-like landscapes which has been proposed even before the specific notion of gene networks and of attractor states in their dynamics.

THE QUASI-POTENTIAL LANDSCAPE Biological systems of course are not just 2-gene circuits but high-dimensional dynamical systems that exhibit a large number of stable steady-states (attractors). Dealing with systems with more than two attractors, which is rather uncommon in synthetic systems, is where the landscape metaphor particularly comes in handy. This will prompt us to leave the domain

85

SECTION II Computational and Theoretical Tools in Synthetic Biology

of intuitive explanations as presented above and enter the realm of abstract formalism. To stay focused, however, the practical utility for cell-type modulation will guide us in the following discourse. The stable steady-states (attractors) in the complex GRN represent so-called ‘dissipative structures’ that are sustained steady-states far from thermodynamic equilibrium, because their maintenance requires continuing intake of free energy and energy dissipation.24 Transitions between attractor states correspond to cell phenotype switches and are affected by the ‘relative depth’ or, equivalently, ‘relative barrier height’ that separates these attractors. These metaphoric concepts are not arbitrary but become necessary in multiattractor systems when one has to consider attractor transitions which are frequent and integral to the behaviors of complex biological systems. Such behavior where attractor exit is the norm and not the feared extreme are rarely encountered in engineered systems where one is interested in the stability around a single attractor state. In dynamical systems theory, such ‘local’ stability is evaluated by linear stability analysis with multiple attractors that are ‘agnostic of each other.’ Thus, the landscape notion arises when one confronts the problem of nonlocal stability and wishes to relate a set of attractors to each other with respect to their ‘relative stability’ and the associated transition rates. If we want to manipulate biological functionality by prescribing specific transitions from one attractor to another, one needs to understand the landscape topography associated with a given GRN. This goes beyond traditional dynamical systems analysis that focuses on existence and local stability of attractors. In the following sections, we will introduce a mathematical framework to analyze ‘relative stability’ of multistable dynamic systems.

The Absence of Integrability and the Essence of a Potential Function 

86







In a multistable system with M attractor states x1 ; x2 ; . . . xM (where x i is the N-dimensional state vector that defines the position of attractor i), one thus wishes to obtain some sense of the ‘relative depth,25 or more precisely, of the ordering of the attractor states with respect to their (meta)stability through some energy-like quantity, U1, U2, U3 . . . UM, associated with each steady-state. This becomes relevant when we design biological processes associated with trajectories that go through a certain set of attractors. Such a potential energy-like function U(x) for any point x in the state space of the system could be used to determine ‘potential differences’ that could inform about the probability and direction of transitions between attractor states in a noisy or perturbed system. Let us consider a deterministic system (network) of N variables xi (e.g. the activity of interacting genes) whose values describe the cell state x(t) 5 (x1(t), x2(t), . . ., xN(t))T and whose dynamics result from how each gene influences the activity of other genes (as invariantly predetermined by the gene regulatory network that is hard-coded in the genome). Such dynamics is described by the first-order ordinary differential equations (ODEs) which in general are nonlinear: dx1 5 F1 ðx1 ; x2 ;?; xN Þ dt dx2 5 F2 ðx1 ; x2 ;?xN Þ dt

(5.1)

^ or in vector form: dx=dt 5 FðxÞ F(x) represents the equivalent of ‘forces’ acting to change the system state x(x1, x2, . . ., xN) in this inertia-free system. If we have a gradient system then a potential function U can be

CHAPTER 5 Theoretical Considerations for Reprogramming Multicellular Systems obtained by integration, i.e. where there exists a function Uint(x) with the following properties: @Uint 5 2F1 ðxÞ; @x1

?;

@U int 5 2Fi ðxÞ; @xi

?;

@Uint 5 2Fn ðxÞ @xn

(5.2)

One can obtain a potential function Uint(x) by successive integration, dU int ðxÞ 52ðF1 ðxÞdx1 1 ? 1 Fi ðxÞdxi 1 ? 1 FN ðxÞdxN Þ; ð int U ðxÞ 52 F1 ðxÞdx1 1 ? 1 Fi ðxÞdxi 1 ? 1 FN ðxÞdxn

(5.3)

If U exists then the driving force is the gradient of U. However, unlike the equilibrium system, the transition rate for xA - xB is not path-independent here. Its spontaneity is determined by the potentials of attractor states and saddle points between them. The int transition probability PxA -xB is related to ΔUAS ðxÞ 5 Uint ðxS Þ 2 U int ðxA Þ and the transition 26 follows the least action path, as discussed below. Here, xS is the saddle point between two attractors xA and xB, as shown in Figure 5.2. Biological dynamical systems with N . 1 are typically nonintegrable, nongradient systems, i.e. a function U cannot be obtained by integration, and a pure potential function in general does not exist. Yet, there is validity in a formal notion of an equivalent quantity. It is loosely referred to as ‘quasi-potential.’ Such a function would allow us to compute transition trajectories if properly defined such that it can serve as a quantity for the

87 Stem cells Reprogrammed stem cell

Progenitor cells

XS

XB

XA

Epigenetic barrier Terminally differentiated cells in the ‘low’ attractors

FIGURE 5.2 The quasi-potential landscape as a formal concept for cell differentiation and reprogramming. The figure shows a schematic quasi-landscape that is computed from the normal decomposition of the dynamic system, which is constructed from a gene regulatory network. Here, for representation, the high-dimensional state space (Nc2 genes) is compressed and projected into a two-dimensional plane, whereas the ‘elevation’ represents the quasi-potential U associated with every state. The landscape captures the global dynamics, enabling the comparison of ‘relative depth’ of different attractors, which is equivalent to Waddington’s epigenetic landscape. XA and XB are two cell attractors in the landscape while the saddle point XS is in between them. The downhill movement (in yellow) of the ball represents the differentiation process from stem cells to progenitor cells, and to terminally differentiated cells. The transition paths (in red) of the ball represent the cell reprogramming which require external stimulus and do not happen often.

SECTION II Computational and Theoretical Tools in Synthetic Biology

ordering of (metastable) attractor states of the system in Eq. 5.1. Specifically, for U to be of meaning for this purpose we require that U: dU dU , 0 for x 6¼ x and 5 0 for x 5 x (where x are the stable steady-states, dt dt which can be either fixed points or limit cycle) according to the stability theory of Lyapunov;27 (ii) is related to the Freidlin-Wentzell potential V that in turn expresses the ‘the least action path’ (LAP) between two states and hence, captures the barrier height UAS. Thus, U permits the computation of the transition rate.26 Note that (i) expresses the stability property of attractor state in a nonlocal sense; and (ii) refers to the relationship of any two points. (i) satisfies

The Quasi-Potential Function for Nonintegrable Systems Since high-dimensional, nonequilibrium systems generally are not gradient systems, i.e. Eq. 5.1 is usually not satisfied: dx 5 FðxÞ 6¼ 2rU dt

(5.4)

By contrast, one can enforce a partial notion of a quasi-potential U, if we write the driving force as a sum of two terms: dx 5 FðxÞ 5 2rU 1 Fr dt

88

(5.5)

where Fr is the ‘remainder’ beyond the component of the driving force that takes the form of a potential gradient. Thus, we decompose the nongradient vector field F(x), which needs to be finite and smooth (at least twice differentiable), into two components: one that is the gradient of some ‘potential-like’ function U and the second that represents the remainder of the driving forces. The question then is: what is the physical meaning of these terms? Given that there are infinite ways of decomposing a vector field into a sum of two fields, uniqueness of decomposition must come from imposing constraints through which we can incorporate the physical meaning. Our objective here is to find a decomposition such that the quasi-potential difference ΔU that manifests the ‘height’ of the barrier between the two attractors represents exactly the associated state transitions, whereas the ‘remainder’ of the driving force Fr will not contribute to the efforts needed for the transition.

The Normal Decomposition One imposed constraint is that the gradient term is perpendicular to the ‘remainder’ force F\ (hereafter the notation Unorm, F\ instead of the general U, Fr implies that these two vector field components are perpendicular to each other): F 5 2rUnorm 1 F\

(5.6)

Then, one can show that Unorm is a Lyapunov function 27 and satisfies our condition (i) above. This is shown below for a two-dimensional system by taking its time derivative along any trajectory driven by the dynamic system in Eq. 5.1: dUnorm @Unorm @U norm 5 x_ 1 y_ dt @x @y     @Unorm @Unorm @Unorm @Unorm y x 5 2 1 F\ 1 2 1 F\ @x @x @y @y  norm 2  norm 2 @U @U 52 2 1 ðrUnorm ; F\ Þ @x @y

(5.7)

CHAPTER 5 Theoretical Considerations for Reprogramming Multicellular Systems If the gradient of Unorm is normal to the remainder force F\, i.e.: ðrU norm ; F\ Þ 5 0;

(5.8)

norm

then the function U will decrease monotonically with time during the process of reaching the attractors, thus satisfying Lyapunov’s condition for metastability:  norm 2  norm 2 dUnorm @U @U 52 2 ,0 dt @x @y

(5.9)

We can therefore see that the condition for the normal decomposition has an obvious physical meaning in that for (rUnorm, F\) 5 0, Unorm corresponds to a Lyapunov function Unorm of dynamical systems and can represent the global (meta)stability as opposed to local (linear) stability. Thus, we can decompose any sufficiently smooth vector field into a conservative potential field Unorm and the remaining forces F\: F 52rU norm 1 F\ ð2rU norm ; F\ Þ 5 0

(5.10)

If the physical interpretation is that if Unorm(x) represents a landscape over the state space region x, then for a ball in an attractor state that is perturbed to exit with least ‘energy’ against the system’s ‘driving force’ F(x) that keeps it in the attractor, we can decompose the field such that F\ will NOT contribute to this process, but only forces from the gradient field Unorm can contribute. Based on Freidlin-Wentzell’s large deviation theory of a stochastic process discussed below,26 Unorm can thus be used to compute the transition rate in this nonequilibrium dynamic system. Importantly, the normal potential Unorm can also be directly ‘read off’ the system equations without time-stepping solution. We can calculate the potential field Unorm as follows: ðrUnorm ; F 1 rU norm Þ 5 0

(5.11)

This can be written in a component format called the Hamilton-Jacob equation:       @Unorm @U norm @Unorm @U norm @Unorm @U norm  Fx1 1  F xi 1  Fxn 1 1?1 1?1 50 @x1 @x1 @xi @xi @xn @xn (5.12) The Hamilton-Jacob equation is a nonlinear partial differential equation, which usually has no analytical solutions. However, Unorm can be solved numerically using the iterative Newton-Raphson method after boundary conditions are specified for a real problem.28,29

The Freidlin-Wentzell Theory of Large Deviation in Multistable Systems For a dynamic system governed by deterministic forces F(x, t): dx0 5 Fðx0; tÞ dt

(5.13)

Let us now consider that the system is under a stochastic perturbation ξ(t): dx 5 Fðx; tÞ 1 ε  ξðtÞ dt

(5.14)

If ε is sufficiently small, the perturbed system will converge to the original dynamical system, i.e. jjx 2 x0jj - 0. However, if the perturbation is a random process with small

89

SECTION II Computational and Theoretical Tools in Synthetic Biology

average amplitude but with occasional large excursions, the perturbed dynamic system will behave differently. Freidlin and Wentzell26 proposed a large deviation theory of stochastic process as a theoretical framework to analyze the behavior of the dynamical system with multiple attractors. Supposing that our system satisfies the Langevin dynamics, the governing equations are described by the following ODEs: 8 x_ 1 5 f1 ðx1 ; . . .; xn Þ 1 ξ1 ðtÞ > > > > ^ < (5.15) x_ i 5 fi ðx1 ; . . .; xn Þ 1 ξ i ðtÞ > > ^ > > : x_ n 5 fn1 ðx1 ; . . .; xn Þ 1 ξn ðtÞ 



Suppose now that a ball is perturbed to go from attractor state x A -xB , one defines an action function VAB to measure the ‘energy’ barrier to be overcome for this transition: (ð " # ) n tB X 1 VAB 5 min jj_xi 2 fi ðxÞjj2 dt (5.16) 2 tA i51

90

Here the action function VAB is defined as a time integral of the square of the ‘remainder’ of the dynamic equations (deviation from deterministic trajectory) over the whole trajectory X(t) from attractor xA to xB. If a ball is only driven by the ‘forces’ specified in the deterministic part of the ODEs (Eq. 5.14), it will correspond to ‘free fall’ in the ‘gravity field’ and the action is zero. If the ball is perturbed against the ‘forces,’ the remainder term jj_xi 2 fi ðxÞjj2 will not be zero (in all of this metaphoric picture one is reminded that we are in an inertia-free world). We integrate the forces over the whole trajectory and obtain the total action when a ball switches from attractor xA to xB. Based on the variational principle, there exists a unique minimum integral, namely the action function VAB, which is an objective measure of the difficulty for a state switching in a nonequilibrium dynamic system.

The Relationship Between the Freidlin-Wentzell Action Function and the Normal Decomposition Although the Freidlin-Wentzell potential V and the normal potential Unorm are defined in different ways, they are mathematically related. For a dynamical system, F 5 2rUnorm 1 F\, (2rUnorm, F\) 5 0. We can rewrite the Freidlin-Wentzell potential as: VAB 5 5 5 5 5

ð tB   2 1 _ min :X2F: dt 2 tA ð tB   2 1 norm _ min :X1rU 2F\ : dt 2 tA ð tB  ð tB     1 norm 2 norm _ _ min  rU dt X 2 F :X2F 2rU : dt 1 4 \ \ 2 tA tA ð tB  ð tB     1 norm 2 norm _ _ min X2F X  rU 2rU dt 1 4 dt \ 2 tA tA ð tB h  i 1 norm 2 _ min :X2F : dt 1 4ðUBnorm 2 UAnorm Þ \ 2rU 2 tA

(5.17)

5 2ðUBnorm 2 UAnorm Þ During the process of exiting an attractor, the least action path (LAP) of a ball follows the governing equation: F 5 rUnorm 1 F\

(5.18)

CHAPTER 5 Theoretical Considerations for Reprogramming Multicellular Systems

As long as we are within the same attractor, the Freidlin-Wentzell potential is exactly twice as large as the potential from the normal decomposition, VAB 5 2ðUBnorm 2 UAnorm Þ. When a ‘ball’ transitions from one attractor to another, the Freidlin-Wentzell potential only accounts for the uphill ‘energy,’ which is two times that of Unorm. Once it goes over the saddle point (point Xs in Fig. 5.2), the Freidlin-Wentzell potential V is zero for the remaining ‘free-fall’ path. The same applies for a ball that transitions through many attractors in between: the Freidlin-Wentzell potential is equal to the sum of all uphill potential between two points. All downhill paths contribute nothing to the Freidlin-Wentzell potential.

HOW TO OBTAIN A TRAJECTORY ON THE QUASI-POTENTIAL LANDSCAPE FOR TRANSITION BETWEEN TWO ATTRACTORS The landscape notion offers a new vista that captures the intuition of state transitions by displaying this process as a jump between valleys. This picture has recently become fashionable to illustrate cell-type reprogramming and epigenetic regulation.15,30 However, the link to the underlying gene regulatory network and the theoretical description of its global dynamics has not been articulated. To demonstrate how the formal conceptualization of the landscape as a product of the GRN allows us in principles to build a landscape and determine the transition trajectories; we describe below five steps which one has to take in a typical setting in biology considering the paucity of data about the system determinants.

STEP 1: BUILD THE GENE REGULATORY NETWORK (GRN) FROM INCOMPLETE INFORMATION To modulate the cell phenotype by triggering an attractor switch, we obviously need to know the structure of the underlying dynamical system, i.e. the gene regulatory network. Unfortunately, only for a few biological systems is there sufficient information of a GRN available that permits the straightforward translation of GRN specification into a complete dynamical system in the form of Eq. 5.1. Much effort is currently spent in such ‘system identification’ based on data integration, inference from gene expression profile dynamics, or direct molecular characterization of gene regulatory interactions using high-throughput experimentation such as ChipSeq.31,32 Nevertheless, the obtained information pertains to the network architecture and remains sketchy. Not a single gene regulatory function that maps the inputs on the promoter to the output (gene expression), as needed to write the systems equations (Eq. 5.1), is even remotely known for any mammalian gene. Hence most mathematical modeling approaches that treat GRNs as dynamical systems have to use an educated guess from qualitative experiments (e.g. overexpression studies to determine whether a regulator is inhibiting or activating) based on information in the literature from isolated experiments. Given the sparseness of information, mathematical models have been limited to small subnetworks (‘circuits’) of a handful of genes.

STEP 2: TRANSLATE THE GRN INTO MATHEMATICAL FORMULA AND IDENTIFY FUNCTIONING ATTRACTORS IN GENETIC LANDSCAPE Assuming a GRN has been constructed for a given biological system, we can use a mathematical formalism, such as Boolean network or ODEs, to breathe life into the static GRN structure by translating it into mathematical equations that animate the dynamics of the GRN. The central challenge is to define the ‘transfer function’ at each network node: to map the values of the inputs to a gene X, that is the presence and activity of its upstream regulators at a given time into its change of the expression level, X(t). As mentioned above, almost no explicit molecular-level knowledge exists for the nature of the transfer functions in mammals, and thus they must be inferred from individual experiments that describe causal relationships.

91

SECTION II Computational and Theoretical Tools in Synthetic Biology

Boolean networks in which gene expression X takes the value 1 or 0 (on or off) are one of the simplest mathematical formalisms to model the network dynamics, and have been historically widely used to gain insight into the most fundamental features of complex system dynamics.22,3335 They evade the problem of having to determine the values of parameters that are part of the description of the interactions, but face the same problem as ODEs in that the Boolean function that integrates and maps all the inputs of a gene at time t to its future expression value (output at t 1 1) are virtually unknown. However, the qualitative nature of a Boolean network allows them to more readily capture the information obtained from qualitative overexpression/deletion experiments.3639 However, since in deterministic Boolean networks unstable steady-states do not exist the landscape has no saddles or hilltops, but consists of disjointed attractors.21 By contrast, ODE models are the most widely used formalisms and benefit from a large body of mathematical theory of dynamical systems that offer analytical tools to study the existence and stability of steady-states. Certainly, ODEs suffer from the need to define the functional form of regulatory relationships, i.e., how the expression state of upstream regulatory genes map into the expression state change of the target gene and determine the values of coefficients in the ODEs equations. One common work-around is the assumption of a universal sigmoidal transfer function (e.g. Hill function) that maps input into output.40 This is justified on the grounds that the mechanism of transcriptional gene activation by the regulatory factors requires the assembly of multiprotein complexes on the promoter, which typically generates cooperativity. But even without the latter, the stochastic nature of gene activation and other physicochemical factors in the cell leads to sigmoidal inputoutput.41

92

The next challenge is how to integrate all the sigmoidal transfer functions of the individual inputs, given that for no mammalian promoter, the promoter logics, let alone of the transfer functions are known. In this respect a variety of ad hoc argued approaches have been used. Duff et al. recently compared several ODE approaches for the same network for the case for a hematopoietic system. The case of pancreas cell reprogramming below offers another example.10,4246

STEP 3: DEFINE THE BIOLOGICAL TRANSFORMATION OF INTEREST AS A TRAJECTORY BETWEEN ATTRACTORS IN THE QUASI-POTENTIAL LANDSCAPE Many useful manipulations of biological behaviors consist of imposing a change along trajectories that cause cells or tissues to exit the current (possibly maladaptive) stable state and enter into a benign attractor that may even actively contribute to a vital function of the organism. For instance, a novel therapy envisaged for diabetes patients is to convert alpha cells in the pancreatic islet into the distinct but developmentally neighboring insulin-secreting pancreas beta-cells.47 This transition between two adjacent attractors can be externally triggered by the ectopic expression of a combination of regulatory genes or possibly by small molecules that modulate the activities of the appropriate set of genes. Such manipulations amount to the perturbation of a set of network nodes, which moves the state of a cell in state space. The ease of exit from an attractor is dictated by the gene network via the ODEs and depends on the direction of the perturbation (‘push’) as computed in the next step.

STEP 4: CALCULATE THE LEAST ACTION PATH (LAP) FOR THE TRANSITION BETWEEN CELL ATTRACTORS Once the systems equations are defined and the relevant attractors (departure and destination states) are identified, we can determine the nature of the perturbation such that it follows the course of the LAP for the desired phenotype switch. This can be computed using the Freidlin-Wentzell formalism based on the same knowledge of network specification that has allowed us to compute the attractor states.

CHAPTER 5 Theoretical Considerations for Reprogramming Multicellular Systems

Because the Freidlin-Wentzell action function is usually too complicated to be found analytically, Eq. 5.16 can be rewritten in discrete form to find an approximate solution, as shown below: VAB ðXðtÞÞ 5 Δt

M 21 X n X k51 i51

"

#

x ik11 2x ki 1 k11 k 2

2 ðfi 2fi Þ



Δt 2

(5.19)

Here the total time over the trajectory X(t) is divided into M21 equal parts Δt. The time integral of the action function is approximated with the sum of actions in the small time dx x k11 2 x ki is approximated with the first-order difference equation i . segments. The rate dt Δt To find the least-action trajectory in discrete form, initially two attractors are connected with a straight line and the conjugate gradient method (CG) is used to minimize the action function VAB(X(t)).48

STEP 5: DESIGN THE PERTURBATIONS NEEDED TO DRIVE THE TRANSITION BASED ON THE LAP When the LAP is calculated among the designated attractors, one will note that some genes never change their expression values while others are significantly modified as the cell moves along the LAP. Thus, the LAP serves as the first filter to select the genes whose expression needs to be modified to most efficiently drive the transition and which genes can be left unchanged. Besides identifying the list of genes that change during the state transition, the LAP also provides information on the detailed temporal profile of the expression behavior of each gene. Ideally, if we can find a ‘perturbation recipe’ which exactly modifies gene expression levels such that they collectively follow the LAP, this perturbation would be the most efficient way to induce that state transition. However, in reality this can currently hardly be realized because of a series of problems associated with experimentation, including intrinsic noise of the cell state (which changes the position of x(t)), time delays between manipulation (transfection, drug treatment) and desired gene expression change, the lack of precise control of expression levels, etc. Thus, if the LAP computation is to benefit reprogramming it will be in cases where the cursory initial direction of the computed LAP trajectory deviates significantly from what is intuitive. Such a deviation could be implemented qualitatively to improve transition efficiency.

EXAMPLE: STATE TRANSITION IN BLOOD CELL AND PANCREAS CELL DIFFERENTIATION AND REPROGRAMMING The ODE Model of Blood Cell Differentiation and Reprogramming Here, we use our example of blood cells for which a qualitative description was made earlier to demonstrate how to find the Freidlin-Wentzell least action path. Based on the cross-inhibition and self-activation between GATA1 and PU.1 (Fig. 5.1B), the governing equations can be written as: 8 θnb1 dx xn > > 5 F ðx; yÞ 5 a 1 b 2 k1 x > 1 1 1 > θna1 1 xn θnb1 1 yn < dt > θnb2 dy yn > > 5 F ðx; yÞ 5 a 1 b 2 k2 y > 2 2 n 2 n : dt θ a2 1 y n θb2 1 xn

(5.20)

93

SECTION II Computational and Theoretical Tools in Synthetic Biology

(B) 0.1

S2

• S0

2

0 0.05

0

2 • S1

3

0 0

1 Time

• S0

0

X2

0.2

2 • 3 S1

1 X1

• S2 2

0.3

1

X2 0

2

0.4

V

X2

0.05

1 1 X1

• S2 2



S0

2

1 X2 0

0.1



S2

0 0

1 Time

2

1 Time

2

0.4 •

0.3

S0

V

0

5

V(t)

• V(t)

U

(A) 5

1

0.2

0.1 0.1

0 0

1 X1

2 S1



0 0

1 Time

2

0 0

1 X1

2



S1

0 0

FIGURE 5.3 Least action paths on the quasi-potential landscape are computed based on the Freidlin-Wentzell theory. Least action path for attractor transition   (‘transdifferentiation’) between the two cell attractors of the example in Figure 5.1B. (A) The least action path for transition from attractor S2 -S1 . Green point is the starting point, red point is the end point. V(t) is the action function at every time step. V is the accumulative action function at each time t.     (B) The least action path for transition from attractor S1 -S2 , which is different from that of path S2 -S1 because this is a nonlinear dynamic     system. The Wentzell action function is the same as for the transition from S2 -S1 because attractors S1 and S2 have the same ‘height’ in norm the quasi-potential U .

94 Since the dynamics of this network is structurally quite robust, we can use simple, symmetric values of parameters to demonstrate the network dynamics. In this example, n 5 4, k1 5 k2 5 1, θa1 5 θa2 5 θb1 5 θb2 5 0:5, a1 5 a2 5 b1 5 b2 5 1. The quasi-potential function Unorm from the normal decomposition is shown in Figure 5.3A. It has   three attractors designated S0 (progenitor state), S1 (erythroid lineage exhibits the  GATA1HIGH/PU.1LOW pattern), and S2 (myeloid exhibits the GATA1LOW/PU.1HIGH pattern). If we want to perturb the blood cell system to cause a transition from the myeloid to the erythroid lineage, the associated Wentzell action functions and LAP are calculated from the numerical minimization based on Eq. 5.19, as shown in Figure 5.3B. Note that the least action path myeloid - erythroid is different from the reverse path erythroid - myeloid. Such rather counterintuitive path irreversibility is a common characteristic of nonlinear biological systems.

The Specific Pancreas Development GRN and the ODE Model In the second example, we return to pancreas cell reprogramming to illustrate the above principles of how to calculate a state transition given the partial knowledge of the underlying GRN. We first model the normal differentiation of the main pancreas cell lineages  as cell behavior that takes place on the quasi-potential landscape: the exocrine and the endocrine cells, including the β, δ, and α islet cells. Using ODEs to describe the mutual regulatory influence of 10 TFs involved in the differentiation of these cells during pancreas development, we present a minimal model that qualitatively captures known interactions and is able: (i) to recapitulate normal pancreas cell differentiation; (ii) to predict the temporal changes of key TFs during the development of particular cell lineages; and (iii) to predict the outcome of perturbations and hence, to help design new recipes of reprogramming experiments.46

CHAPTER 5 Theoretical Considerations for Reprogramming Multicellular Systems

There are three pairs of opposing (mutually inhibiting) TFs that control binary decisions at the three levels of pancreas differentiation. The first level of binary branching, between the exocrine and endocrine lineage, is governed by the opposing pair of the TFs, Ptf1a ,- -. Ngn3.46 The second-level branching is governed in a similar manner by the Pax4 ,--.Arx circuit which determines the β/δ versus the α-cell lineage, respectively.46 The fatedetermining TFs for the third branching into the β-cells versus the δ cells have not fully been characterized, but MafA has been shown to bias the decision to establishing the β-cells, whereas little information is available for the fate-determining factor for the δ cells. Thus, for modeling purposes and for maintaining symmetry, we use a placeholder for the δ-cell determining factor, called ‘δ factor’ and then assume a third pair of opposing TFs: MafA , – . δ factor, governing the determination of β versus δ cells, respectively (Fig. 5.4A). Below are the ODEs for the network of regulatory influences described above: xn4 1 xn7 1 xn8 1 xn10 2 k  x1 1 ξ1 ðtÞ 1 1 xn4 1 xn7 1 xn8 1 xn10

Pdx1:

x_ 1 5 as

Ptf 1a:

x_ 2 5 a

xn10 2 k  x2 1 ξ2 ðtÞ 1 1 xn10 1 xn3

Ngn3:

x_ 3 5 a

xn10 2 k  x3 1 ξ3 ðtÞ 1 1 xn10 1 xn2

Pax6:

x_ 4 5 a

ηn  xn3 1 xn4 2 k  x4 1 ξ4 ðtÞ 1 1 ηn  xn3 1 xn4

Pax4:

x_ 5 5 ae

ηnm  xn1  ηn  xn3 2 k  x5 1 ξ5 ðtÞ 1 1 ηnm  xn1  ηn  xn3 1 xn6

Arx:

x_ 6 5 ae

ηnm  xn1  ηn  xn3 2 k  x6 1 ξ6 ðtÞ 1 1 ηnm  xn1  ηn  xn3 1 xn5

MafA:

x_ 7 5 a

ηnm  xn1  ηn  xn3  xn5 1 xn7 2 k  x7 1 ξ 7 ðtÞ 1 1 ηnm  xn1  ηn  xn3  xn5 1 xn7 1 xn8

δgene:

x_ 8 5 a

ηnm  xn1  ηn  xn3  xn5 1 xn8 2 k  x8 1 ξ 8 ðtÞ 1 1 ηnm  xn1  ηn  xn3  xn5 1 xn7 1 xn8

Brn4:

x_ 9 5 a

ηn  xn6 1 xn9 2 k  x9 1 ξ9 ðtÞ 1 1 ηn  xn6 1 xn9

Hnf 6:

x_ 10 5 as

Maturity:

x_ 11 5 m

(5.21)

1 2 k  x10 1 ξ10 ðtÞ 1 1 ηn ðx2 1x7 1x8 1x9 Þn  xn11

Here, the variables x1 to x10 represent the expression levels of the 10 key genes involved in pancreas cell differentiation. Equations represent the regulatory relationship among these 10 genes, as shown in Figure 5.4B. Each equation has three terms which capture the: (i) production of the gene expression in response to the upstream TFs; (ii) the linear degradation; and (iii) contribution of the stochastic fluctuations to gene expression. The production rate for the gene expression change is, as explained in Step 2 above, a sigmoidal function of its upstream regulators. Hill function-like forms with uniform exponents n (n 5 4) were used (more details can be found in46). Figure 5.4C shows the three ‘branchings’ of gene expression trajectories during normal pancreas cell differentiation. The expression of Hnf6 starts at a high level and gradually decays, which activates the first switch between Ptf1a and Ngn3. The nondeterministic property of stochastic dynamics allows the state trajectory to split into two cell lineages,

95

SECTION II Computational and Theoretical Tools in Synthetic Biology

Pdx1 Ptf1a Ngn3 Pax4 Pax6

5 X11

Hnf6 X10

Pancreatic bud

Maturity X1 Pdx1

Ngn3 Ptf1a

Arx

MafA

“δ” δ cells

β cells

MafA

δ gene

X7

X8

Ngn3

4 Ptf1a

2

2

1 0 0

30

1

2

3

4

Arx

3

Brn4

2

Gene expression

Pax4

4

4 2

1

4 3 2 1 0 0

0 0

30

1

2

3

4

MafA

5

δ Cell

4 3 2 1

10

(D)

Time

20

30

0 0

1

2

3

25

30

35

0 0 5

5

10

15

20

25

30

35

0 0

5

10

15 20 Time

25

30

35

Hnf6 Ptf1a Ngn3

6 4

4

6

2 0 0

5

10 15 Time

20

6

5

10 15 Time

20

4 2 0 0

5

10 15 Time

20

25

2

4

6

4

6

4

6

6 4 2 0 0

25 MafA δCell

6

2

Ngn3

2 0 0

4

0 0

25 Pax4 Arx Brn4

4

Pax4 Gene expression

10 20 3rd branching, β - δ cell

20

X9

Ngn3

Arx

0 0 10 20 2nd branching, Ptx4 - Arx 5

15

(C)

Ptf1a

3

10

Brn4

Gene expression

4

5

Arx

(B) Hnf6

δ Cell

96

Gene Expression

Gene Expression

Gene Expression

(A) 1st branching, Ptr1a - Ngn3 5

0 0

X6

X5 Pdx4

α/PP progenitor

5

Ptf1a

β/δ progenitor

X4 Pdx6

Ngn3

X2

0 0 5

Ptf1a

Pax4

X3

Gene expression

Endocrine progenitor

2 Pax4

6 Ptf1a

Ptf1a Exocrine progenitor

4 2 0 0

2 MafA

MafA

(E)

FIGURE 5.4 The dynamic model of GRN to explain pancreatic cell differentiations and reprogramming. (A) The normal branching developmental paths for major pancreas cell types and the gene circuit modules involved in the three branching steps. (B) The underlying gene regulatory network for pancreatic cell differentiation integrating the gene circuit modules. (C) Temporal gene expression profile during pancreatic cell differentiation. The first panel is from experimental observations of both gene expression levels and timing, while the second and third panels show model simulation results from the master model and an alternative model with the inhibitory effects of Pdx1 upon Ngn3 and Ptf1a. (D, E) Time course and state space trajectories for gene expression profile dynamics during normal pancreatic cell differentiation (D) and cell reprogramming with the recipe of overexpressing Pdx1, Ngn3 and MafA(E). 46

corresponding to the two new attractors: the exocrine and the endocrine progenitors. If cells follow the endocrine linage, they become Ngn3 positive which dominates over Ptf1a (‘lower branching lines’ in Fig. 5.4D, right panel). High Ngn3 triggers the second switch, embodied by the branching governed by Pax4 and Arx. Later Ngn3 decreases, reflecting its observed transient expression character, as Hnf6 is down-regulated with maturation. Endocrine progenitors at this stage can differentiate into either α cells or β/δ cell progenitors. Cells fated to the trajectory with high Arx (and low Pax4) turn on the Arx target gene Brn4, which is a marker gene for the α phenotype and whose expression persists. By contrast, in cells fated to the trajectory with dominating Pax4, this TF then triggers the last switch that controls the branching between MafA and the δ cell gene which subsequently will activate the respective effector genes, producing the distinct β and δ cells.

CHAPTER 5 Theoretical Considerations for Reprogramming Multicellular Systems

Simulation and Prediction of Reprogramming It was recently reported that overexpression of Pdx11, Ngn31, and MafA1 could reprogram exocrine pancreas cells to the endocrine insulin-producing β cells47  a clinically desired transition because of its potential in cell therapy of diabetes. To model such reprogramming, we first describe the virus-mediated ectopic gene overexpression with temporal additional gain terms in the corresponding equations in our GRN model (Eq. 5.21). Figure 5.4E presents the gene expression time profiles and trajectories in the relevant phase planes during cell reprogramming using overexpression of the genes Pdx11, Ngn31, and MafA1. An exocrine cell starts with high expression of Ptf1a. In the model, reprogramming is implemented by the extra production terms for Pdx1, Ngn3, and MafA during a certain time window. We see that the cell switches its expression pattern from that with a high-Ptf1a to one with a high-Ngn3 which subsequently triggers the cell to go through the Pax4-Arx branch point to finally reach the steady-state of the β cell. It should be noted that some α cells are also produced in this process because of the stochasticity, and such outliers that ‘go off in the wrong direction’ are often observed in reprogramming experiments. Our model also predicts that Ngn3’s role in reprogramming can be enhanced by the inhibition of Ptf1a directly. This is important because an inhibitory perturbation of a network node is technically much easier (e.g., via RNAi technology, small molecules) than an activating perturbation. Since Ptf1a and Ngn3 are cross-inhibitory, when Ptf1a expression level is suppressed, Ngn3 expression will increase. Our simulations also show that combining the perturbation of the two network nodes, Ptf1 (inhibition) and Ngn3 (activation) can synergistically enhance efficiency of β cell reprogramming. In conclusion, this example demonstrates how a gene regulatory network model, built with qualitatively reported interaction schemes from the literature, which mostly represent incomplete ‘causal networks’ rather than GRNs, govern cell-type diversification and differentiation. Our results show that with a minimum of knowledge of the constraints imposed by the gene network topology, pancreas cell differentiation can be explained as the transitions among different cell attractors.

CONCLUSION AND OUTLOOK We have demonstrated the principles of manipulating the phenotype of cells  by actuating switches between entire cell phenotypes and how such engineering can be informed by the underlying gene regulatory network that governs normal development of these cell types. The perturbations force cells to achieve a physiological, predestined phenotype, but going there through a ‘road not taken’ during normal development. It is in this new rationale of predictive model-based whole phenotype manipulation across uncharted terrain in gene expression state space that warrants the placement of such manipulative approaches into the domain of ‘synthetic biology.’ Controlling transitions between high-dimensional attractors in rugged ‘epigenetic landscapes’ is an elementary capacity that will help the design and engineering of more complex biological systems. The state transitions between attractors on a landscape is not just a helpful metaphor but the direct mathematical manifestation of network dynamics that involve deterministic constraints imposed by regulatory interactions encoded by the genome and modulated and driven by external signals and gene expression noise. At the moment little specific information about these genomic interactions is known, so that educated guessing is a substantial ingredient in the modeling. But it turns out that, as modelers of biological systems have long realized, a profound property of robust complex systems is that the qualitative constraints of interactions, independently of quantitative details, capture much

97

SECTION II Computational and Theoretical Tools in Synthetic Biology

of the realized effective dynamics, thus already granting practical utility to an imperfect and incomplete model. At the academic level, much as a hypothetical generic model of a geographic landscape that we draw in the classroom teaches us about the basic principles behind the formation of mountains, valleys, and lakes, but does not help us to navigate the real world to move from one specific place to another due to the absence of information of the specific geography, so are we, with regard to gene regulatory networks and cell behaviour, only at the classroom stage of generic cartoonish models. We do not have all the exact information to compute the specific potential-landscape topography that could guide us from point (cell state) A to B. But the formal reduction of valleys and mountains in this landscape that Waddington already proposed, to the first principles of systems dynamics of a network provides us now with a solid conceptual framework for the current practice of reprogramming, much of which is still dominated by brute-force trial-and-error efforts in perturbing the dynamics of gene-regulatory networks. With sufficient information, as to be expected in the near future, it may be possible using the theories presented here, to reconstruct the real-world regulatory epigenetic landscape of model organisms. It can then serve as a specific road map to design efficient strategies for reprogramming of any desired cell type at will with high efficiency by computing the optimal starting point and paths.

References 1. Wolpert L, Tickle C. Principles of Development, 4th ed. Oxford, UK: Oxford University Press; 2011. 2. Delbruck M. Genetik der bakteriophagen. Klin Wochenschr. 1949;27:109. 3. Jacob F, Monod J. Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol. 1961;3:318356. 4. Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol. 1969;22:437467.

98

5. Huang S. Reprogramming cell fates: reconciling rarity with robustness. Bioessays. 2009;1:546560. 6. Waddington CH. The epigenotype. Endeavour. 1942;1:1820. 7. Pisco A, Fouquier d’Herouel A, Huang S. ‘Epigenetics:’ many meanings  one common concept. DNA Cell Biol. in review. 8. Eldar A, Elowitz MB. Functional roles for noise in genetic circuits. Nature. 2010;467:167173. 9. Huang S. Cell lineage determination in state space: a systems view brings flexibility to dogmatic canonical rules. PLoS Biol. 2010;8:e1000380. 10. Huang S, Guo Y-PP, May G, Enver T. Bifurcation dynamics in lineage-commitment in bipotent progenitor cells. Dev Biol. 2007;305:695713. 11. Muñoz-Descalzo S, De Navascues J, Arias AM, Munoz-Descalzo S. Wnt-Notch signalling: an integrated mechanism regulating transitions between cell states. Bioessays. 2012;34:110118. 12. Graf T. Historical origins of transdifferentiation and reprogramming. Cell Stem Cell. 2011;9:504516. 13. Okita K, Ichisaka T, Yamanaka S. Generation of germline-competent induced pluripotent stem cells. Nature. 2007;448:313317. 14. Hanna JH, Saha K, Jaenisch R. Pluripotency and cellular reprogramming: facts, hypotheses, unresolved issues. Cell. 2010;143:508525. 15. Zhou Q, Melton DA. Extreme makeover: converting one cell into another. Cell Stem Cell. 2008;3:382388. 16. Smith J, Theodoris C, Davidson EH. A gene regulatory network subcircuit drives a dynamic pattern of gene expression. Science. 2007;318:794797. 17. Huang S, Eichler G, Bar-Yam Y, Ingber DE, Yam YB. Cell fates as high-dimensional attractor states of a complex gene regulatory network. Phys Rev Lett. 2005;94:14. 18. Macarthur BD, Ma’ayan A, Lemischka IR, Ma A. Systems biology of stem cell fate and cellular reprogramming. Nat Rev Mol Cell Biol. 2009;10:672681. 19. Chang HH, Hemberg M, Barahona M, Ingber DE, Huang S. Transcriptome-wide noise controls lineage choice in mammalian progenitor cells. Nature. 2008;453:544547. 20. Enver T, Pera M, Peterson C, Andrews PW. Stem cell states, fates, and the rules of attraction. Cell Stem Cell. 2009;4:387397. 21. Huang S, Kauffman S. Complex gene regulatory networks  from structure to biological observables: cell fate determination. Encyclopedia of Complexity and Systems Science. 2009.

CHAPTER 5 Theoretical Considerations for Reprogramming Multicellular Systems

22. Kaplan D, Glass L. Understanding Nonlinear Dynamics. New York: Springer; 1995. 23. Zhou JX, Huang S. Understanding gene circuits at cell-fate branch points for rational cell reprogramming. Trends Genet. 2011;27:5562. 24. Prigogine I. The End of Certainty. New York NY: The Free Press; 1997. 25. Nicolis G. Dissipative systems. Rep Prog Phys. 1986;49:873949. 26. Freidlin MI, Wentzell AD. Random Perturbations of Dynamical Systems. New York, NY: Springer-Verlag; 1984. 27. Lyapunov AM. The general problem of the stability of motion. Int J Control. 1992;55:531534. 28. Polyanin AD, Zaitsev VF. Nonlinear partial differential equations. SIAM Rev. 1969;11:719. 29. Lee ES, Phillip PC. A generalized Newton-Raphson method for nonlinear partial differential equations — packed-bed reactors with axial mixing. Chem Eng Sci. 1966;21:143157. 30. Yamanaka S. Elite and stochastic models for induced pluripotent stem cell generation. Nature. 2009;460:4952. 31. Kerenyi MA, Orkin SH. Networking erythropoiesis. J Exp Med. 2010;207:25372541. 32. MacQuarrie KL, Fong AP, Morse RH, Tapscott SJ. Genome-wide transcription factor binding: beyond direct target regulation. Trends Genet. 2011;27:141148. 33. Huang S. Gene expression profiling, genetic networks, and cellular states: an integrating concept for tumorigenesis and drug discovery. J Mol Med. 1999;77:469480. 34. Shmulevich I, Kauffman SA. Activities and sensitivities in Boolean network models. Phys Rev Lett. 2004;93:48701. 35. Kauffman SA. The Origins of Order. New York: Oxford University Press; 1993. 36. Huang S, Ingber DE. Shape-dependent control of cell growth, differentiation, and apoptosis: switching between attractors in cell regulatory networks. Exp Cell Res. 2000;261:91103. 37. Albert R. The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster. J Theor Biol. 2003;223:118. 38. Espinosa-Soto C, Padilla-Longoria P, Alvarez-Buylla ER. A gene regulatory network model for cell-fate determination during Arabidopsis thaliana flower development that is robust and recovers experimental gene expression profiles. Plant Cell. 2004;16:29232939. 39. Balleza E, et al. Critical dynamics in genetic regulatory networks: examples from four kingdoms. PLoS One. 2008;3:e2456. 40. Glass L, Kauffman SA. The logical analysis of continuous, non-linear biochemical control networks. J Theor Biol. 1973;39:103129. 41. Andrecut M, Halley JD, Winkler DA, Huang S. A general model for binary cell fate decision gene circuits with degeneracy: indeterminacy and switch behavior in the absence of cooperativity. PLoS One. 2011;6:e19358. 42. Duff C, Smith-Miles K, Lopes L, Tian T. Mathematical modelling of stem cell differentiation: the PU.1-GATA-1 interaction. J Math Biol. 2012;64:449468. 43. Roeder I, Glauche I. Towards an understanding of lineage specification in hematopoietic stem cells: a mathematical model for the interaction of transcription factors GATA-1 and PU.1. J Theor Biol. 2006;241:852865. 44. Chickarmane V, Troein C, Nuber UA, Sauro HM, Peterson C. Transcriptional dynamics of the embryonic stem cell switch. PLoS Comput Biol. 2006;2:e123. 45. MacArthur BD, Please CP, Oreffo ROC. Stochasticity and the molecular mechanisms of induced pluripotency. PloS One. 2008;3:e3086. 46. Zhou JX, Brusch L, Huang S. Predicting pancreas cell fate decisions and reprogramming with a hierarchical multi-attractor model. PLoS One. 2011;6:e14752. 47. Zhou Q, Brown J, Kanarek A, Rajagopal J, Melton DA. In vivo reprogramming of adult pancreatic exocrine cells to beta-cells. Nature. 2008;455:627632. 48. Hestenes MR, Stiefel E. Methods of conjugate gradients for solving linear systems 1. J Res Natl Bur Stand. 1952;49:409436.

99