The Limited Information Capacity of Cross-Reactive Sensors Drives the Evolutionary Expansion of Signaling


Authors

Michał Komorowski, Dan S. Tawfik

Correspondence [email protected]

In Brief

An engineer designing a communication system would use few distinct signaling components while ensuring that the output of each component is highly accurate. However, natural evolution came up with a different solution: cells have many interconnected, cross-reactive components that individually produce noisy signals. Why?

Graphical Abstract (image not reproduced). Recoverable labels: information capacity, C* (bits); i (number of duplication events); degree of divergence; no divergence, partial divergence, complete divergence.

Highlights

• Cooperativity or allostery do not increase the signaling capacity

• Copy number variation and cross-reactivity severely restrict signaling capacity

• Duplication of cross-reactive receptors doubles capacity even with minimal divergence

• Duplication with minimal divergence yields multiple crosswired signaling pathways

Komorowski & Tawfik, 2019, Cell Systems 8, 76–85 January 23, 2019 © 2018 Elsevier Inc. https://doi.org/10.1016/j.cels.2018.12.006

Cell Systems

Article

Michał Komorowski,1,3,* and Dan S. Tawfik2
1Institute of Fundamental Technological Research, Polish Academy of Sciences, Warsaw 02-106, Poland
2Weizmann Institute of Science, The Department of Biomolecular Sciences, Rehovot 7610001, Israel
3Lead Contact
*Correspondence: [email protected]
https://doi.org/10.1016/j.cels.2018.12.006

SUMMARY

Signaling systems expand by duplications of various components, be it receptors or downstream effectors. However, whether and how duplicated components contribute to higher signaling capacity is unclear, especially because in most cases, their specificities overlap. Using information theory, we found that augmentation of capacity by an increase in the copy number is strongly limited by logarithmic diminishing returns. Moreover, counter to conventional biochemical wisdom, refinements of the response mechanism, e.g., by cooperativity or allostery, do not increase the overall signaling capacity. However, signaling capacity nearly doubles when a promiscuous, non-cognate ligand becomes explicitly recognized via duplication and partial divergence of signaling components. Our findings suggest that expansion of signaling components via duplication and enlistment of promiscuously acting cues is virtually the only accessible evolutionary strategy to achieve overall high-signaling capacity despite overlapping specificities and molecular noise. This mode of expansion also explains the highly cross-wired architecture of signaling pathways.

INTRODUCTION

Biochemical signaling underlies life. Along the evolutionary timeline, increasingly complex signaling systems have evolved. The evolutionary expansion of signaling pathways is characterized by duplication and divergence, thus yielding paralogous components including receptors and other signaling proteins (kinases, transcription factors, etc.). This trend is evident in multi-cellular organisms in particular (Bridgham et al., 2006; Pires-daSilva and Sommer, 2003; Amit et al., 2007; Housden and Perrimon, 2014; Semyonov et al., 2008; Strotmann et al., 2011; Warren et al., 2014; Hwang et al., 2013; Rowland and Deeds, 2014; Rowland et al., 2017), but it is also seen in prokaryotes (Goldman et al., 2006). In some cases, duplicates exhibit clear functional divergence, but in many others, paralogous signaling components have overlapping, sometimes nearly identical biochemical specificities with respect to the triggering ligands and/or the activated downstream components (Housden and Perrimon, 2014; Amit et al., 2007; Liongue et al., 2016; de Mendoza et al., 2014). Duplications occur frequently, and duplicates may fix not only because they provide a distinct adaptive advantage but also because of drift (Hughes, 2005; Lynch and Walsh, 2007) (e.g., sub-functionalization). As a result, a single ligand often activates multiple downstream components, and a distinct downstream component can be activated by numerous ligands. This functional overlap raises the question of whether and how these duplications increase the organism’s capacity to detect and adequately respond to various cues (Friedlander et al., 2016; Housden and Perrimon, 2014; Amit et al., 2007).

The above question is particularly intriguing given the inherent stochasticity of signaling processes as revealed by single-cell studies (Elowitz et al., 2002; English et al., 2006; Zhang et al., 2017; Bar-Even et al., 2006). Biochemical noise, resulting from multiple sources (Symmons and Raj, 2016; Levin et al., 2011; Bar-Even et al., 2006; Swain et al., 2002; Komorowski et al., 2013), also has a detrimental impact on signaling (Tkačik et al., 2008a, 2008b; Cheong et al., 2011; Selimkhanov et al., 2014; Symmons and Raj, 2016), and it is currently not clear how high overall signaling fidelity can be achieved (Suderman et al., 2017; Lestas et al., 2010). Following Berg and Purcell (Berg and Purcell, 1977), probabilistic modeling has been applied to examine fidelity of receptors, and of noisy biochemical signaling systems in general. In addition, information theory has been deployed (Selimkhanov et al., 2014; Brennan et al., 2012; Cheong et al., 2011; Tkačik et al., 2008a, 2008b; Tkačik and Walczak, 2011; Rhee et al., 2012; Suderman et al., 2017) as an integrated measure of signaling accuracy, a term known as “information capacity” (Cover and Thomas, 2012), C*. Information capacity is expressed in bits, and broadly speaking, $2^{C^*}$ represents the maximal number of different inputs that a system can effectively resolve (e.g., different ligand concentrations). If, for example, the derived C* = 2, then 4 different ligand concentrations can be resolved with negligible error. Information theory, therefore, allows for a better understanding of whether and why certain forms of biochemical sensing give preferential advantages (Martins and Swain, 2011; Marzen et al., 2013).

We surmised that information theory could be applied to examine the evolution of signaling, and specifically, the extent to which duplications of signaling components may contribute

to higher signaling capacity, and to further ascertain how dependent this contribution is on the degree of divergence of the duplicated paralogs.

We begin our analysis with a single gene, encoding a single signaling component. We refer to such a component as a sensor. A sensor can be either a binding protein or an enzyme that, upon activation, acquires the potential to transform a downstream effector (activation, hereafter). We developed a generic model that addresses the steady-state function of sensors, given any molecular scenario and set of microscopic parameters. Initially, we considered noise resulting from random activation of sensors and further accounted for copy number variability and activation of sensors by non-cognate ligands. For the sake of generality and simplicity, we focused on a steady-state model. Our model addresses essentially any molecular mechanism, and we analyzed in detail the common ones: cooperativity, allostery, and functional selectivity (Wisler et al., 2014) (the same sensor activating different downstream effector proteins with different efficacies, also known as biased agonism). We found that, somewhat counter to conventional biochemical logic, these refinement mechanisms do not increase the overall signaling capacity. As a result, the information capacity per single sensor type presents a practical ceiling imposed by logarithmic diminishing returns with respect to sensor copy number, N.

Individual signaling components are also subject to noise due to the presence of non-cognate ligands that promiscuously bind and activate sensors. The ligand binding specificity of sensors is limited by various physicochemical and evolutionary factors (Tawfik, 2014), and cross-reactivity is commonly observed in many sensors, e.g., G protein-coupled receptors (GPCRs) (Munk et al., 2016; Venkatakrishnan et al., 2013), and in downstream components such as kinases (Rubenstein et al., 2017) or phosphatases (Rowland et al., 2015).
As expected, we found that cross-reactivity can severely compromise signaling capacity. We found, however, that it may also lead to fold-increases in information capacity through duplication and divergence of cross-reactive sensors. Notably, fold-increases in information capacity may arise even with very limited modifications in ligand binding selectivity. Foremost, it appears that the only way of circumventing the severe loss in signaling capacity due to copy number variations and cross-reactivity is by the emergence of a paralog that signals the presence of the cross-reactive non-cognate ligand.

RESULTS

Copy Numbers of Sensors Impose an Information Capacity Ceiling

How many different ligand concentrations can be effectively recognized by N copies of a given sensor? This depends on several factors. The first is sensitivity, i.e., to what degree the number of active sensors (output) changes in response to changes in the ligand concentration (input, x). Sensitivity is readily addressed by classical mass-action treatments, typically via an activation function that describes the fraction of ligand-occupied sensor molecules as a function of ligand concentration (Figure 1A). However, the stochastic nature of signaling means that a second factor matters as well: how reproducibly the system responds to a given ligand concentration, i.e., the noise level. The third factor

is a key element of information theory: how frequently different inputs occur, i.e., the input distribution. For each given sensor, there exists a distribution of inputs for which signaling is optimal. Accordingly, classical biochemistry informs us that maximal sensitivity is obtained when a sensor’s Kd (or KM for an enzyme) matches the ligand’s mid concentration. Inputs within the optimal distribution generate distinct and reproducible outputs, whereas other inputs generate similar and/or irreproducible outputs (low sensitivity and/or high noise). If a sensor encounters suboptimal inputs more frequently than the optimal ones, its overall information transfer will be lower (see STAR Methods for more details).

Our statistical approach takes the above three factors into account (see STAR Methods), thus allowing us to systematically analyze biochemical sensors with any given molecular mechanism. Specifically, we considered sensors with any given activation function h(x), i.e., functions that describe the steady-state probability of sensors being in their activated state as a function of ligand concentration x (for an example, see Figure 1A). We found that the information capacity of a sensor is then given by

$$C^* = \frac{1}{2}\log_2\!\left(\frac{N}{2e/\pi}\right) - \beta \qquad \text{(Equation 1)}$$

where N is the sensor’s copy number, and

$$\beta = \log_2\!\left(\frac{\pi/2}{\arcsin\sqrt{h(\infty)} - \arcsin\sqrt{h(0)}}\right)$$

is a correction factor for “leaky” and/or incomplete activation. The above holds for Monod-Wyman-Changeux (MWC), or Hall’s model, or essentially any other relevant model, and includes all the microscopic rate terms that describe the model of choice. The correction factor, β, takes a value of zero if the sensor meets the default edge conditions, i.e., that the probability of being active when no ligand is present is zero (h(0) = 0), and the probability of being activated at saturating ligand concentrations is 1 (h(∞) = 1). Under these conditions, C* is maximal, as expected. An output signal in the absence of a ligand (leaky activation), and/or a partial signal at saturating ligand concentration (incomplete activation), result in β > 0 and hence in lower information capacity.

Overall, as shown in Equation 1, the information capacity of a sensor is logarithmically dependent on its copy number, N (Figure 1B, continuous line). The logarithmic dependence of capacity on N has been recently demonstrated for the MWC model (Martins and Swain, 2011; Marzen et al., 2013). Our model indicates that this dependency is a universal property that holds for any sensor regardless of the mechanism of binding (further elaborated in the next section) and also holds for out-of-steady-state scenarios with the output consisting of multiple readouts of sensor activity over time (see STAR Methods). Assuming no leakiness and complete activation, C* reaches a value of 4 at N = 500. An increase to 5,000 copies results in a 40% increase in C*, and a further increase to 50,000 copies provides a mere increase of 30%. Thus, the information capacity of a single sensor seems to hit a ceiling that is dictated by how high the copy numbers can be. Very high copy numbers are realistic for key metabolic enzymes; for example, the most abundant proteins in E. coli are present at >10^5 copies per cell (Milo and Phillips, 2015). However, the majority of proteins are present at much lower levels, in the range of several hundred, and the abundance of signaling proteins is often lower than that of other protein types (Newman et al., 2006; Taniguchi et al., 2010). Finally, in the STAR Methods, we show that multiple readouts of sensor activity over time also exhibit logarithmic dependence on copy number.

Figure 1. The Information Processing Properties of a Single Sensor
(A) Classical mass action input-output relationships for receptors with positive, neutral, and negative cooperativity. The input is given by the ligand concentration, x, and the output corresponds to the fraction of sensors in the activated state, h(x).
(B) (Solid line) The information capacity of all the above receptors is logarithmically dependent on the receptor copy number, N, as indicated by Equation 1, and this holds regardless of strength and direction of cooperativity (C and D show corresponding optimal input distributions and effectively distinguishable ligand concentrations). (Dashed/Dotted lines) The effect of copy number variations on capacity. A copy number, N, of a sensor with either of the activation functions shown in Figure 1A is distributed according to the Gamma distribution, Γ(μ_N, σ_N), with mean μ_N and standard deviation σ_N. Shown is the information capacity as a function of μ_N for systems with an indicated level of variability as quantified by the coefficient of variation σ_N/μ_N. The plotted information capacities correspond to either of the activation functions shown in (A), which numerically demonstrates that sensors with different degree and direction of cooperativity are equally affected.
(C) The optimal input distributions, P*(x), of sensors presented in (A). P*(x) represents the probability that the sensor’s output accurately reflects the actual ligand concentration, x.
(D) An illustration of the effectively distinguishable ligand concentrations (or states) of the above receptors (calculated for N = 500 using Chernoff information and thresholding to match $2^{C^*}$ distinguishable states; see STAR Methods).
(E) The impact of input distributions on information transfer. Information transfer, as quantified by the mutual information for sensors with no cooperativity of (A) for N = 100, yet with ligand concentrations distributed according to the log-normal distribution exp(N(μ, σ²)). Contours represent combinations of μ and σ that give rise to the same mutual information expressed in bits. For comparison, the maximal mutual information obtained while assuming an optimal distribution of ligand concentrations, i.e., the information capacity, is C* ≈ 2.9.
Modeling details: positive cooperativity was modeled using the MWC model with parameters L = 0.01, α = 20, n = 6. Neutral and negative cooperativity were represented by the Hall model using parameters L = 0.01, α = 400, δ = 400, and γ = 1, for no cooperativity, and L = 0.01, α = 400, δ = 400, and γ = 104 for negative cooperativity. In all three cases, the sensor’s dissociation constant, Kd, was numerically adjusted such that h(1) = 0.5.
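The copy-number figures quoted for Equation 1 can be reproduced with a few lines of Python. This is a minimal sketch that simply evaluates the closed-form expression; the function name `capacity` and its arguments `h0` and `hinf` (standing for h(0) and h(∞)) are illustrative choices, not names from the paper.

```python
import math

def capacity(N, h0=0.0, hinf=1.0):
    """Evaluate Equation 1: C* = 0.5*log2(N / (2e/pi)) - b.

    h0 and hinf are the activation probabilities without ligand and at
    saturation; the defaults (0 and 1) make the correction factor b zero.
    """
    span = math.asin(math.sqrt(hinf)) - math.asin(math.sqrt(h0))
    b = math.log2((math.pi / 2) / span)  # correction for leaky/incomplete activation
    return 0.5 * math.log2(N / (2 * math.e / math.pi)) - b

# Diminishing returns: each tenfold increase in N adds the same ~1.66 bits.
for N in (100, 500, 5000, 50000):
    print(N, round(capacity(N), 2))

# Leaky and incomplete activation lowers the capacity at the same copy number.
print(round(capacity(500, h0=0.1, hinf=0.9), 2))
```

Because a tenfold increase in N always adds 0.5·log2(10) bits, the relative gain shrinks as C* grows, which is the pattern of roughly 40% and then 30% described in the text.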

Factors Compromising Information Capacity

Equation 1 represents an optimal scenario. There exists a range of compromising factors, namely, different phenomena that decrease signaling capacity in living cells. Considering a single sensor type, the most evident compromising factors include copy number variation and cross-reacting ligands. In a given population of genetically identical cells, copy numbers vary significantly because of cell-to-cell variations in chromatin remodeling, transcription, translation levels, and many other factors. Accordingly, we considered a variable copy number with mean μ_N and standard deviation σ_N, as opposed to a fixed N. Specifically, assuming a Gamma distribution of N, we calculated the capacity, C*, as a function of the mean copy number, μ_N, for coefficients of variation, σ_N/μ_N, ranging from 0 to 1 as observed in typical experimental measurements (Bar-Even et al., 2006) (see STAR Methods). As expected, expression noise severely compromises C*, possibly to the actual, experimentally measured levels (Suderman et al., 2017) of C* ≈ 1 (Figure 1B, dashed lines). Mechanisms for limiting the effects of expression noise may exist, including negative feedback loops; these, however, demand additional signaling components (Raser and O’Shea, 2005). Hence, copy number variability severely lowers the capacity ceiling of a single sensor, essentially by limiting potential improvements of C* by increases in copy number.

From the ligand’s point of view, natural realms also deviate from the optimality represented by Equation 1. As described above, C* describes the maximal information transfer assuming an optimal distribution of input ligand concentrations. Optimality, therefore, relates to the sensor’s Kd matching the most frequent ligand concentrations that this sensor encounters (see STAR Methods for more details). The Kd of signaling sensors generally evolved to match the ligand concentrations presented by the environment; for example, the KM values of enzymes generally evolve to match the prevailing substrate concentrations (Davidi et al., 2018). However, ligand concentrations may deviate from optimality. We found that information transfer exhibits a certain degree of robustness to such deviations, but substantial discrepancies result in a severe decrease compared to maximal transfer, C* (Figure 1E).

Cooperativity Does Not Increase Information Capacity

Not only is the potential for higher C* limited by copy numbers, but changes in the mechanism of sensing cannot improve C* either. Indeed, Equation 1 indicates a fundamental property that may initially look counterintuitive: the information capacity of individual sensors does not depend on the shape of their activation function as long as the edge conditions are met (see the correction factor in Equation 1 above). Therefore, neither positive nor negative cooperativity augments information capacity.
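The shape independence stated by Equation 1 can be checked numerically. The sketch below is not the authors' derivation (the STAR Methods are not part of this section). It assumes the standard large-N approximation in which capacity equals log2 of the integral of the square root of the Fisher information, divided by sqrt(2πe), and it uses a Hill curve with coefficient n as a stand-in for positive (n > 1), neutral (n = 1), and negative (n < 1) cooperativity. All function names are illustrative.

```python
import math

def sqrt_fisher_integral(n, half_width=60.0, steps=100_000):
    """Integrate |dh/du| / sqrt(h (1 - h)) over u = ln(x) for a Hill curve.

    For a binomial output Bin(h, N) the Fisher information is
    N * h'^2 / (h (1 - h)); the factor sqrt(N) is left out here.
    Midpoint rule on a truncated log-concentration axis.
    """
    du = 2 * half_width / steps
    total = 0.0
    for k in range(steps):
        u = -half_width + (k + 0.5) * du
        h = 1.0 / (1.0 + math.exp(-n * u))          # Hill curve, K_d = 1 (assumed)
        total += n * math.sqrt(h * (1.0 - h)) * du  # uses dh/du = n h (1 - h)
    return total

def asymptotic_capacity(N, n):
    """Assumed large-N approximation: log2( sqrt(N / (2 pi e)) * integral )."""
    return math.log2(math.sqrt(N / (2 * math.pi * math.e)) * sqrt_fisher_integral(n))

for n in (0.5, 1.0, 4.0):
    print(n, round(sqrt_fisher_integral(n), 4), round(asymptotic_capacity(100, n), 3))
```

Under these assumptions the integral depends only on the end values h(0) and h(∞), so every Hill coefficient yields the same capacity, in line with the value of about 2.9 bits quoted for N = 100.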
Sensors with positive, neutral, and negative cooperativity that exhibit the response curves plotted in Figure 1A have exactly the same capacity and are equally affected by copy number variability (Figure 1B). Cooperativity has a profound effect on sensor properties (Ha and Ferrell, 2016; Ferrell and Ha, 2014a, 2014b, 2014c). How come it has no impact on information capacity? In a nutshell, information capacity is an integrated measure of signaling accuracy across an entire ligand concentration range, and hence, sensing sensitivity (the steepness of the activation curve; Figure 1A) trades off with detection range (i.e., the range of ligand concentration for optimal detection; Figure 1C). Positive cooperativity, for example, results in high sensitivity, but optimal sensing is achieved only within a narrow range of ligand concentrations (Figure 1D). Further, the sensing accuracy depends on both the sensitivity of the system and the stochasticity of activation. As we show in STAR Methods, the activation function and its derivative determine which values are recognized with high accuracy. Therefore, a fundamental implication of Equation 1 is that manipulation of the shape of the function, h(x), e.g., by cooperativity, and also by allostery (next section), can only optimize sensing accuracy such that ligand concentrations that occur most frequently are recognized with the highest accuracy. Positive cooperativity, for example, ensures that relatively similar concentrations are effectively distinguishable. However, the number of effectively distinguishable concentrations remains unchanged; what varies with cooperativity is only how close discretely distinguished concentrations can be (Figure 1D). It is important to note that the reasoning above, along with our methodological assumptions, is valid only up to a certain point in the steepness of the response curve. Practically, very strong positive cooperativity will result in a stepwise response curve, capable of only binary sensing, i.e., detection of whether ligand concentration is below or above a threshold.

Allosteric Regulation Does Not Increase Information Capacity

Like cooperativity, allostery also modulates sensor properties. Typically, the binding of an allosteric effector to a site outside the primary ligand’s binding site modulates the input-output response curve (Olsman and Goentoro, 2016) (the activation function, by our terminology). An allosteric sensor can, therefore, exist in more than one active state, e.g., k states, each with a different activation function, $h_i(x)$ (STAR Methods and Figure S1A). Accordingly, its output comprises the total number of active sensors, i.e., the sensors in all possible active states. Assuming that binding of the allosteric effector is independent of ligand binding, the number of active sensors in the i-th state, $Y_i$, follows the binomial distribution, $Y_i \sim \mathrm{Bin}(u_i h_i(x), N)$, where $u_i$ is the fraction of sensors in the i-th state (i goes from 1 to k). Then, the overall number of bound sensors, $Y = \sum_{i=1}^{k} Y_i$, also follows the binomial distribution, $Y \sim \mathrm{Bin}(h(x), N)$, with $h(x) = \sum_{i=1}^{k} u_i h_i(x)$. Thus, although the response function, h(x), may have a more complex form given that the allosteric effector affects ligand binding, Y follows the same distribution as a non-allosteric sensor because the sensor can only be either active or inactive. Therefore, according to Equation 1, as long as the edge conditions are met (h(0) and h(∞)), the sensor’s information capacity remains the same. To conclude, similar to cooperativity, allosteric regulation may modulate the distribution of concentrations of the input ligand, and may, therefore, enable, for example, the sensor to respond to a wider range of ligand concentrations. However, the number of distinguishable ligand concentrations remains the same.

To the biochemist, this conclusion may seem illogical. Indeed, from the sensor’s point of view, an allosteric effector increases the number of different molecular states and may thus provide additional information. However, whether the allosteric effector is bound, or not, is not part of the sensor’s downstream signal. Colloquially speaking, the sensor does not “know” whether the allosteric effector is there or not, and thus, the sensor’s ability to transmit information is exactly the same. The allosteric effector’s presence could, of course, be sensed by another sensor and duly transmitted by a separate downstream signal. Sensing separate signals downstream of a single receptor is known as functional selectivity or biased agonism and is typical for GPCR signaling (Wisler et al., 2014). In the STAR Methods, we show that functional selectivity provides a certain increase in capacity. The capacity, however, is still logarithmically dependent on the copy number, N.

To conclude, cooperativity and allosteric control, including functional selectivity, do not circumvent the information capacity ceiling of a single sensor (Equation 1). Therefore, the divergence of an allosteric sensor from a non-regulated sensor alone is not an effective evolutionary strategy for achieving overall high information capacity, foremost because harnessing the benefit of allostery demands an independent set of downstream components, and these are typically recruited by duplication and divergence of existing ones.
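The statement that the summed output of an allosteric sensor is again binomial can be verified exactly for a small case. The sketch below assumes k = 2 active states, hyperbolic activation functions with arbitrary illustrative dissociation constants, and arbitrary state fractions u1 and u2; none of these values come from the paper.

```python
import math

def hill(x, kd):
    """Hyperbolic activation function with dissociation constant kd (illustrative)."""
    return x / (x + kd)

def binom_pmf(y, N, p):
    return math.comb(N, y) * p**y * (1.0 - p)**(N - y)

def summed_pmf(y, N, p1, p2):
    """P(Y1 + Y2 = y) when each of N sensors is active in state 1 with
    probability p1, active in state 2 with probability p2, else inactive."""
    p0 = 1.0 - p1 - p2
    total = 0.0
    for y1 in range(y + 1):
        y2 = y - y1
        # Multinomial coefficient N! / (y1! y2! (N - y)!), computed exactly.
        coef = math.factorial(N) // (
            math.factorial(y1) * math.factorial(y2) * math.factorial(N - y))
        total += coef * p1**y1 * p2**y2 * p0**(N - y)
    return total

N, x = 20, 1.0
u1, u2 = 0.3, 0.7                                  # assumed state fractions
p1, p2 = u1 * hill(x, 0.5), u2 * hill(x, 4.0)      # u_i * h_i(x)
pmf_sum = [summed_pmf(y, N, p1, p2) for y in range(N + 1)]
pmf_binom = [binom_pmf(y, N, p1 + p2) for y in range(N + 1)]
print(max(abs(a - b) for a, b in zip(pmf_sum, pmf_binom)))
```

The two distributions agree to floating-point precision, so the downstream readout Y carries no trace of which active state a sensor occupied; only the effective activation function h(x) = u1·h1(x) + u2·h2(x) matters.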

Figure 2. Cross-Reactive Non-cognate Ligands Severely Compromise Information Capacity
(A) A sensor that obeys the activation functions shown in Figure 1 encounters a cross-reactive non-cognate “false” ligand. The most likely concentration of the false ligand matches the sensor’s Kd for the cognate ligand and x_NC follows a log-normal distribution, exp(N(μ_NC, σ²_NC)), with either narrow or wide variability (light gray, σ_NC = 0.3; or dark gray, σ_NC = 1.5, respectively) and μ_NC = 0.
(B) The information capacity as a function of the selectivity factor (the ratio of the dissociation constants of the non-cognate versus the cognate ligand; $\lambda = K_d^{(F)}/K_d$) in the case of low variability in the false, non-cognate ligand’s concentrations. The dashed line represents the C* value for the same sensor in the absence of the non-cognate ligand (N = 100).
(C) As in (B) for high variability in the non-cognate ligand’s concentrations.

Cross-Reacting Non-cognate Ligands Further Compromise Signaling Capacity

Competing ligands that resemble the cognate ligand are common. Such ligands may promiscuously bind and activate a sensor, thus introducing another noise source. Selectivity is a key property of all biomolecules including signaling systems. However, the selectivity of ligand binding is limited by various constraints, and some degree of cross-reactivity with non-cognate ligands is inevitable (Tawfik, 2014). How does cross-reactivity affect the information capacity of receptors? Can cooperativity, for example, improve the ability to accurately transmit signals in the presence of non-cognate, “false” ligands? We thus considered a model that includes both a true (cognate, C) and a false (non-cognate, NC) ligand. This model is described by the activation function, h(x), of Figure 1A, with $x = x_C + x_{NC}/\lambda$, where x_C and x_NC are the concentrations of the true and false ligands, respectively. The sensor’s selectivity factor is given by $\lambda = K_d^{(NC)}/K_d$, whereby $K_d^{(NC)}$ designates the dissociation constant of the non-cognate ligand. The higher λ is, the less likely it is for the receptor to bind the non-cognate ligand. The variability in the non-cognate ligand concentration is represented by the probability distribution P(x_NC).

We have subsequently calculated the information capacity of different sensors, including sensors with no, positive, or negative cooperativity, as a function of the degree of cross-reactivity with the non-cognate ligand, λ. We assumed two characteristic scenarios regarding the distribution of the non-cognate ligand. In both cases, the non-cognate ligand’s median concentration matches the sensor’s Kd for the true ligand, but the distribution of its concentrations can be narrow (an average deviation from the median of approximately 30%) or wide (an average deviation from the median of 150%; Figure 2A). Mathematically, this was represented by X_NC following the log-normal distribution, exp(N(μ_NC, σ²_NC)), with σ_NC = 0.3 or σ_NC = 1.5, and with μ_NC = 0 to position the distribution’s median at the sensor’s Kd. Variability of the non-cognate ligand shapes the distribution of the output Y for each given concentration of the cognate ligand. Precisely, $P(Y \mid X_C = x_C) = \int P(Y \mid X = x_C + x_{NC}/\lambda)\, P(x_{NC})\, dx_{NC}$. The information capacity of $P(Y \mid X_C = x_C)$ is here calculated numerically, as described in STAR Methods. The distribution of the cognate ligand’s concentration is assumed, as before, to be optimal, i.e., in correspondence with the sensor’s Kd for the cognate ligand.

Applying this model, Figures 2B and 2C show how information capacity, C*, depends on the receptor’s selectivity, λ (x axis), and on cooperativity. As expected, C* is highly compromised when a sensor’s cross-reactivity with the non-cognate ligand is high (low λ; the dashed line represents C* in the absence of a non-cognate ligand).
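The averaging over non-cognate ligand concentrations described in this section can be sketched numerically. The code below is only an illustration of the marginalization step, not the capacity calculation of the STAR Methods: it assumes a non-cooperative sensor with Kd = 1, a binomial output, and a simple midpoint rule over the log-normal distribution truncated at six standard deviations. Function and parameter names are illustrative.

```python
import math

def hill(x):
    """Non-cooperative activation function with K_d = 1 (assumed)."""
    return x / (1.0 + x)

def output_pmf(x_c, lam, sigma_nc, N=100, mu_nc=0.0, grid=201):
    """P(Y = y | x_C): binomial output averaged over log-normal x_NC.

    Midpoint rule on z = ln(x_NC), z ~ Normal(mu_nc, sigma_nc^2),
    truncated at +/- 6 standard deviations and renormalized.
    """
    lo, hi = mu_nc - 6 * sigma_nc, mu_nc + 6 * sigma_nc
    dz = (hi - lo) / grid
    combs = [math.comb(N, y) for y in range(N + 1)]
    pmf = [0.0] * (N + 1)
    total_w = 0.0
    for k in range(grid):
        z = lo + (k + 0.5) * dz
        w = math.exp(-0.5 * ((z - mu_nc) / sigma_nc) ** 2) * dz
        total_w += w
        p = hill(x_c + math.exp(z) / lam)   # effective input x = x_C + x_NC / lambda
        for y in range(N + 1):
            pmf[y] += w * combs[y] * p**y * (1.0 - p) ** (N - y)
    return [v / total_w for v in pmf]

def variance(pmf):
    mean = sum(y * p for y, p in enumerate(pmf))
    return sum((y - mean) ** 2 * p for y, p in enumerate(pmf))

# Output spread at x_C = K_d for a poorly and a highly selective sensor.
for lam in (2.0, 1024.0):
    print(lam, round(variance(output_pmf(1.0, lam, sigma_nc=1.5)), 1))
```

With low selectivity the output for a fixed cognate concentration is far more dispersed than the binomial variance N·h·(1 − h) = 25, which is the extra noise that erodes capacity; at high λ the distribution approaches the plain binomial case.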
High variability in the concentration of the non-cognate ligand results in further decline in signaling capacity, and C* is only restored at a very high selectivity (Figure 2C). The degree and direction of cooperativity can affect C* when non-cognate ligands are present (in opposition to the model with a cognate ligand only, where cooperativity has no effect; Figure 1B). Sensors with negative cooperativity are more robust to cross-reactivity, especially at high non-cognate ligand variability. This result is intuitively clear. The ability to accurately sense within a wider concentration range decreases vulnerability to a non-cognate ligand. However, this effect is relatively minor (Figures 2B and 2C). In summary, cross-reactivity is clearly a major hurdle for highsignaling capacity. Here also, modulation of the biochemical mechanism (e.g., cooperativity) is of little utility. Are there, then, other ways by which signaling capacity can be significantly augmented and that can break the theoretical ceiling represented by Equation 1 and Figure 1B? Duplication with Even Minor Divergence Doubles Signaling Capacity Gene duplication is abundant and in some genomes occurs at a frequency equal to or even higher than the frequency of single nucleotide mutations. The immediate effect of duplication is an increase in protein dose, but the effect of copy number on information capacity is subject to logarithmic diminishing returns (Equation 1). With time, mutations in the duplicated copy may

no divergence

complete divergence

15

20

A

10

complete divergence

5

no divergence

0

1

2

3

i (numer of duplication events)

2

3

4

5

6

B

1

divergent sensors duplicated non-divergent sensors (

)

)

0

prior to duplication (

2

16

λ

128

1024

Figure 3. The Information Capacity of a Signaling System Undergoing Duplication and Divergence (A) Duplication with no divergence results in the doubling of the copy number, thus yielding only a minor increase in information capacity. However, functional divergence of the duplicated gene to yield a completely new sensor (neo-functionalization) results in an exponential increase in information capacity, even when the copy number per sensor remains unchanged. Plotted is an example whereby the initial copy number is 100. (B) Duplication and partial divergence of cross-reactive sensors. The dashed lines report the information capacity of the ancestral copy, prior (gray) and immediately after duplication (i.e., with no divergence; black), as a function of the selectivity factor (assuming wide variability in the concentration of the noncognate ligand, as in Figure 2C, with no cooperativity). The black solid line shows the information capacity of the system following duplication and divergence of a new paralog that preferentially binds and signals the noncognate ligand (accordingly, for this newly diverged sensor, the original cognate ligand is the non-cognate ligand). The original copy maintains the ancestral ligand preference. Model details: for all computations N0 = 100 and neutral cooperativity model as in Figure 1 were assumed.

fixate, leading to functional divergence, i.e., to changes in ligand affinity and selectivity. Given sufficient time and sequence divergence, a completely new sensor type may evolve, thus doubling C*. Figure 3A compares these two edge scenarios. A single sensor type encoded by the ancestral gene is present at N0 = 100. Duplication with no functional divergence (e.g., recognition of the same ligand with the same affinity) results in doubling of the number of copies of the very same sensor molecule (N = 200). This scenario is compared to duplication and complete divergence while maintaining the same total copy number (2 × N = 100). Assuming complete divergence, further duplications with divergence yield an exponential increase in C*. That said, complete divergence of ligand specificity is not easily achieved. Typically, a new sensor, be it a receptor or enzyme, emerges from cross-reactivity in the original receptor, i.e., from promiscuous activation by alternative ligands (i.e., non-cognate, false ligands, as described above). A new sensor then arises via mutations that lead to a shift in selectivity. The sensor’s affinity for the promiscuous non-cognate ligand gradually increases, whereas the affinity for the original one decreases (reversion of selectivity). However, the specialization of biochemical functions is a gradual and slow process. The early mutations typically induce relatively large increases in affinity toward the target ligand (previously the non-cognate ligand, now the cognate ligand for the diverging receptor). However, binding of the original ligand (the previous cognate ligand, now the non-cognate ligand for the evolving receptor) is likely to be initially retained (Tawfik and Khersonsky, 2010).
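The two edge scenarios of Figure 3A can be sketched numerically. The snippet below is a minimal illustration, not the authors' code. It assumes the asymptotic capacity of a binomial sensor, C*(N) = ½ log2(πN/(2e)), which follows from Equations 10 and 14 in STAR Methods when the activation function h(x) spans the full range from 0 to 1: for Bin(h(x), N), FI(x) = N h'(x)²/[h(x)(1 − h(x))], so the integral of √FI(x) equals π√N. This expression may differ in detail from Equation 1. The sketch further assumes that completely diverged sensors respond to independent ligands, so that their capacities add, and that each sensor type keeps N0 copies, as in the caption of Figure 3A. All function names are hypothetical.

```python
import math


def capacity_bits(n_copies):
    """Asymptotic capacity (bits) of one binomial sensor with n_copies molecules.

    Assumption: h(x) covers the full range 0..1, so Equation 10 gives
    C* = log2(sqrt(pi * N / (2 * e))).
    """
    return 0.5 * math.log2(math.pi * n_copies / (2.0 * math.e))


def capacity_no_divergence(n0, i):
    """i duplications without divergence: one sensor type, copy number n0 * 2**i."""
    return capacity_bits(n0 * 2 ** i)


def capacity_complete_divergence(n0, i):
    """i duplications with complete divergence: 2**i independent sensor types,
    each kept at n0 copies (assumption), so the capacities add."""
    return (2 ** i) * capacity_bits(n0)


if __name__ == "__main__":
    n0 = 100  # initial copy number used in Figure 3A
    for i in range(4):
        print(i,
              round(capacity_no_divergence(n0, i), 2),
              round(capacity_complete_divergence(n0, i), 2))
```

Under these assumptions, each duplication without divergence adds only half a bit, whereas each duplication with complete divergence doubles the total capacity.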
Many more mutations, each with a relatively small effect, are typically required to achieve a reversion of selectivity (Tokuriki et al., 2012), and exclusion of binding of the original ligand may only occur under explicit selection against it (negative selection) (Tawfik, 2014). Accordingly, paralogs typically show patterns of overlapping selectivities, e.g., the primary ligand of one paralog would also activate the other, and vice versa. Is sensor duplication beneficial even with a limited level of functional divergence? To address this question, we examined the following scenario. A sensor that is activated by a given input ligand is also promiscuously activated by a related non-cognate ligand. The information capacity of the ancestral sensor prior to and immediately after duplication is presented in Figure 3B by the gray and black dashed lines, respectively. Because the ancestral sensor is a priori cross-reactive with respect to the non-cognate ligand, C* strongly depends on its selectivity ($\lambda$). Upon duplication, a new paralog diverges such that the non-cognate ligand becomes its cognate ligand. However, the original sensor retains its cross-reactivity, and the new paralog is inevitably cross-reactive toward the original cognate ligand. The discrimination between these two ligands, $\lambda$, is given, as described above, by the ratio of affinities, such that $\lambda = K_d^{(NC)}/K_d^{(C)}$, whereby $K_d^{(NC)}$ and $K_d^{(C)}$ represent the dissociation constants for the non-cognate and cognate ligands, respectively. For simplicity, $\lambda$ takes the same value for both paralogs, although the diverging paralog likely has a lower $\lambda$, at least at the early stages of divergence (upon duplication with no divergence, $K_d^{(NC)}$ becomes $K_d^{(C)}$, and thus the $\lambda$ value is the reciprocal of the original $\lambda$). Mathematically, the system studied here is defined by the two-dimensional output $(Y_1, Y_2)$ and two-dimensional input $(x_C, x_{NC})$.
The two output components are assumed to have binomial distributions

[Figure 4 plots. (A) Ligand distributions on the log scale and on the normal scale (optimal distribution for neutral cooperativity; log-normal), with the response plotted against x. (B) Mutual information, MI (bits), versus λ in Scenarios 1, 2, and 3, prior to duplication, for duplicated non-divergent sensors, and for divergent sensors.]

Figure 4. The Impact of the Input Distribution on Information Transfer Following Duplication and Partial Divergence of Cross-Reactive Sensors
(A) The optimal input distribution of a sensor with no cooperativity as in Figure 1 (in gray), and a log-normal distribution $\exp(\mathcal{N}(\mu_{NC}, \sigma_{NC}^2))$ with $\sigma_{NC} = 1.5$ and $\mu_{NC} = \sigma_{NC}^2$ (in green), were applied as distributions of the true, cognate ligand and/or the false, non-cognate one. The left and right panels show the same distributions in the log and normal scale, respectively. The activation function is plotted for reference (as in Figure 1).
(B) Information transfer, as quantified by the mutual information, in the setting considered in Figure 3B, and in three different scenarios regarding the distributions of the cognate and non-cognate ligands. In Scenario 1, both the cognate and non-cognate ligands follow the log-normal distribution. In Scenario 2, the cognate and non-cognate ligands follow the optimal and log-normal distribution, respectively. In Scenario 3, both the cognate and non-cognate ligands follow the optimal distribution.

with response functions $h(x_C + x_{NC}/\lambda)$ and $h(x_{NC} + x_C/\lambda)$, respectively. The information capacity of $P(Y_1, Y_2 | X_C = x_C, X_{NC} = x_{NC})$ is here calculated numerically, as described in STAR Methods. As can be seen, duplication gives rise to a substantial increase in C*, even with comparatively low selectivity (Figure 3B, black line), assuming of course that downstream components are available that result in transmission of a new output. Interestingly, C* saturates at $\lambda$ values that are smaller than those required for the ancestral sensor to plateau (16 versus > 100, respectively). Thus, in a scenario where a non-cognate cross-reactive ligand is present, duplication followed by limited divergence of selectivity can double C*. Furthermore, if cross-reactivity is a priori high, a reversion of selectivity in favor of the ancestral non-cognate ligand can be readily achieved via a few mutations. Alternative scenarios where the concentrations of the cognate and non-cognate ligands are distributed differently lead to a similar outcome (Figure 4). High capacity achieved at low selectivity may seem, at first, counter-intuitive. However, this result can be explained by a simple argument. Broadly speaking, the noise, i.e., a non-cognate promiscuously activating ligand, becomes part of the signal. Upon duplication and even minor divergence, the system can sense two linear combinations of ligand concentrations, i.e., $x_C + x_{NC}/\lambda$ and $x_{NC} + x_C/\lambda$, across a relatively wide, but not the entire, range of ligand concentrations.
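The argument that the non-cognate ligand becomes part of the signal can be illustrated with a noise-free sketch. The code below is not the authors' computation, which evaluates the capacity of the noisy two-output system. It only shows that the two effective inputs, x_C + x_NC/λ and x_NC + x_C/λ, jointly determine both ligand concentrations whenever λ differs from 1, whereas a single cross-reactive sensor cannot tell apart ligand mixtures that produce the same effective input. Concentrations are in arbitrary units, and all function names are hypothetical.

```python
def effective_inputs(x_c, x_nc, lam):
    """Effective inputs seen by the original sensor and the diverged paralog."""
    u1 = x_c + x_nc / lam   # original sensor: cognate ligand plus weak cross-reaction
    u2 = x_nc + x_c / lam   # diverged paralog: roles of the two ligands are swapped
    return u1, u2


def recover_ligands(u1, u2, lam):
    """Invert the 2x2 linear system; this requires some divergence (lam != 1)."""
    det = 1.0 - 1.0 / lam ** 2
    if det == 0.0:
        raise ValueError("singular system: the two sensors respond identically")
    x_c = (u1 - u2 / lam) / det
    x_nc = (u2 - u1 / lam) / det
    return x_c, x_nc


if __name__ == "__main__":
    lam = 16.0
    u1, u2 = effective_inputs(2.0, 0.5, lam)
    print(recover_ligands(u1, u2, lam))

    # Two different mixtures that the ancestral sensor alone cannot distinguish:
    print(effective_inputs(2.0, 0.0, lam))
    print(effective_inputs(1.0, 16.0, lam))
```

In the second part of the example, both mixtures give the same first effective input, so only the second sensor separates them. With noise, the separation is limited to part of the concentration range, as stated in the text.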

DISCUSSION

We present here a comprehensive framework for analyzing the information capacity of individual signaling components. Our work complements previous attempts to understand how signaling controls critical cellular processes despite noise. Previous contributions highlighted the relevance of signaling dynamics in enhancing information capacity (Noren et al., 2016; Behar and Hoffmann, 2010; Selimkhanov et al., 2014), as well as the information content of cellular populations (Suderman et al., 2017). Our framework allowed us to examine a wide range of factors that may augment signaling capacity (cooperativity, allostery, and copy number increases) or compromise it (copy number variations and cross-reactive ligands). Foremost, this framework has also enabled a deeper analysis of the evolution of signaling. We found that duplication accompanied by minimal divergence appears to be the most effective way of augmenting signaling capacity: this evolutionary process takes a nuisance, i.e., a promiscuously binding ligand, and turns it into an advantage. Further, newly duplicated sensors are a priori cross-reactive because divergence toward higher selectivity is a long evolutionary process. Thus, newly duplicated and diverged proteins tend to act as "generalist" intermediates. Such intermediates would bind and be activated by the newly evolving ligand, cross-react with the original one, and would also exhibit novel,

promiscuous binding specificities that were not present in the original sensor and were never selected for (Rockah-Shmuel and Tawfik, 2012; Matsumura and Ellington, 2001). This broad specificity may in turn provide the driving force for a second round of duplication and divergence of yet another generation of sensors. In this respect, duplication with minimal divergence is a self-amplifying process, and thus, the expansion of signaling systems may be inherent rather than purely adaptive, as with the expansion of other elements (Lynch and Walsh, 2007). Nonetheless, over long evolutionary timescales, paralog expansion does increase the capacity of sensing, as is apparent for example in olfactory systems (Nei et al., 2008). The cross-reactivity of receptors (GPCRs in this case) and of their downstream components may, therefore, be crucial for the evolution of higher sensing capacity, especially in systems that are fast evolving to respond to new cues (de Mendoza et al., 2014). Cross-reactivity may thus be perpetuated by these systems through frequent duplications and as a result of the limited information capacity that individual sensors can provide, be they highly selective, sensitive, and/or allosterically regulated. However, in other signaling systems such as bacterial two-component signaling, cross-reactivity is detrimental, and such systems seem to have explicitly evolved toward high specificity (Rowland and Deeds, 2014). Finally, the limited divergence of duplicated signaling components results in overlapping specificities, and thereby in high connectivity between pathways. Our model of a sensor applies not only to the primary sensing component (e.g., a receptor) but also to the downstream components, e.g., kinases. Hence, duplication with limited divergence applies to multiple components and entire signaling pathways. 
Indeed, cell signaling is best viewed as a highly connected network, i.e., a large number of branched, interconnected signaling pathways, each with a limited signaling capacity. An engineer aiming at an accurate signaling system would design few entirely orthogonal signaling components while ensuring that the output of each component is highly reproducible (i.e., least noisy, discrete, and non-overlapping with other components). Why then do cells have many interconnected, cross-reactive pathways, each producing a noisy output? Can crosstalk between pathways provide some distinct benefits? Different cell types, each expressing a different variation of the same pathway, can elicit different outcomes in response to the same ligand (Rowland et al., 2017). However, here we show that multiple cross-wired pathways are likely to be a direct outcome of the way by which signaling evolves. As opposed to increasing the capacity of individual, distinct pathways, duplication with limited divergence is the most accessible evolutionary strategy for achieving higher signaling capacity.

STAR+METHODS

Detailed methods are provided in the online version of this paper and include the following:

• KEY RESOURCES TABLE
• CONTACT FOR REAGENT AND RESOURCE SHARING
• METHOD DETAILS
  ○ Information-Theory Background
  ○ Calculation of Information Capacity
  ○ Modeling of Sensors Activity
  ○ Calculation of Distinguishable Ligand Concentrations
  ○ Copy Number Variation
  ○ Cross-Reacting Ligands
  ○ Additional Analysis
  ○ A Logarithmic Ceiling Holds for Temporally Resolved Outputs
  ○ Allosteric Regulation Does Not Increase Information Capacity: Example
  ○ Functional Selectivity Modestly Increases Information Capacity
  ○ Regulated Sensors Can Provide Further Benefit when Additional Signaling Components Are Available

SUPPLEMENTAL INFORMATION

Supplemental Information includes one figure and can be found with this article online at https://doi.org/10.1016/j.cels.2018.12.006.

ACKNOWLEDGMENTS

We thank Shalev Itzkovitz, Ron Milo, Tal Einav, Vahe Galstyan, Naama Barkai, and Uri Alon for helpful comments during the preparation of this manuscript. M.K. was supported by the European Commission Research Executive Agency under grant CIG PCIG12-GA-2012-334298 and by the Polish National Science Centre under grant 2015/17/B/NZ2/03692. D.S.T. is funded by the Adelis Foundation and is the Nella and Leon Benoziyo Professor of Biochemistry. We thank Csaba Pal and Fyodor Kondrashov, the organizers of an EMBO workshop on Evolution and Systems Biology, Barcelona 2014, that seeded our collaboration.

AUTHOR CONTRIBUTIONS

Conceptualization, M.K. (Information-theory perspective) and D.S.T. (Evolutionary & biochemical perspective); Modeling Methodology, M.K.; Biochemical Methodology, D.S.T.; Investigation, M.K. and D.S.T.; Writing – Review & Editing, M.K. and D.S.T.

DECLARATION OF INTERESTS

The authors declare no competing interests.

Received: May 22, 2018
Revised: October 15, 2018
Accepted: December 10, 2018
Published: January 16, 2019

REFERENCES

Amit, I., Wides, R., and Yarden, Y. (2007). Evolvable signaling networks of receptor tyrosine kinases: relevance of robustness to malignancy and to cancer therapy. Mol. Syst. Biol. 3, 151.

Bar-Even, A., Paulsson, J., Maheshri, N., Carmi, M., O’Shea, E., Pilpel, Y., and Barkai, N. (2006). Noise in protein expression scales with natural protein abundance. Nat. Genet. 38, 636–643.

Barkai, N., and Leibler, S. (1997). Robustness in simple biochemical networks. Nature 387, 913–917.

Behar, M., and Hoffmann, A. (2010). Understanding the temporal codes of intra-cellular signals. Curr. Opin. Genet. Dev. 20, 684–693.

Berg, H.C., and Purcell, E.M. (1977). Physics of chemoreception. Biophys. J. 20, 193–219.

Bernardo, J.M. (1979). Reference posterior distributions for bayesian inference. J. R. Stat. Soc. Series B 41, 113–128.

Brennan, M.D., Cheong, R., and Levchenko, A. (2012). How information theory handles cell signaling and uncertainty. Science 338, 334–335.


Bridgham, J.T., Carroll, S.M., and Thornton, J.W. (2006). Evolution of hormone-receptor complexity by molecular exploitation. Science 312, 97–101. Brunel, N., and Nadal, J.P. (1998). Mutual information, Fisher information, and population coding. Neural Comput. 10, 1731–1757.

Komorowski, M., Costa, M.J., Rand, D.A., and Stumpf, M.P.H. (2011). Sensitivity, robustness, and identifiability in stochastic chemical kinetics models. Proc. Natl. Acad. Sci. USA 108, 8645–8650.

Cheong, R., Rhee, A., Wang, C.J., Nemenman, I., and Levchenko, A. (2011). Information transduction capacity of noisy biochemical signaling networks. Science 334, 354–358.

Komorowski, M., Miękisz, J., and Stumpf, M.H. (2013). Decomposing noise in biochemical signaling systems highlights the role of protein degradation. Biophys. J. 104, 1783–1793.

Komorowski, M., Žurauskienė, J., and Stumpf, M.P.H. (2012). StochSens: matlab package for sensitivity analysis of stochastic chemical systems. Bioinformatics 28, 731–733.

Christopoulos, A., and Kenakin, T. (2002). G protein-coupled receptor allosterism and complexing. Pharmacol. Rev. 54, 323–374.

Clarke, B.S., and Barron, A.R. (1994). ‘Jeffreys’ prior is asymptotically least favorable under entropy risk. J. Stat. Plan. Inference 41, 37–60.

Le Cam, L. (2012). Asymptotic Methods in Statistical Decision Theory (Springer Science & Business Media).

Cornish-Bowden, A. (2013). Fundamentals of Enzyme Kinetics (Wiley).

Lestas, I., Vinnicombe, G., and Paulsson, J. (2010). Fundamental limits on the suppression of molecular fluctuations. Nature 467, 174–178.

Cover, T.M., and Thomas, J.A. (2012). Elements of Information Theory (John Wiley & Sons).

Davidi, D., Longo, L.M., Jabłońska, J., Milo, R., and Tawfik, D.S. (2018). A bird’s-eye view of enzyme evolution: chemical, physicochemical, and physiological considerations. Chem. Rev. 118, 8786–8797.

de Mendoza, A., Sebé-Pedrós, A., and Ruiz-Trillo, I. (2014). The evolution of the GPCR signaling system in eukaryotes: modularity, conservation, and the transition to metazoan multicellularity. Genome Biol. Evol. 6, 606–619.

Elowitz, M.B., Levine, A.J., Siggia, E.D., and Swain, P.S. (2002). Stochastic gene expression in a single cell. Science 297, 1183–1186.

English, B.P., Min, W., Van Oijen, A.M., Lee, K.T., Luo, G., Sun, H., Cherayil, B.J., Kou, S.C., and Xie, X.S. (2006). Ever-fluctuating single enzyme molecules: Michaelis-Menten equation revisited. Nat. Chem. Biol. 2, 87–94.

Ferrell, J.E., and Ha, S.H. (2014a). Ultrasensitivity part i: Michaelian responses and zero-order ultrasensitivity. Trends Biochem. Sci. 39, 496–503.

Ferrell, J.E., and Ha, S.H. (2014b). Ultrasensitivity part iii: cascades, bistable switches, and oscillators. Trends Biochem. Sci. 39, 612–618.

Ferrell, J.E., Jr., and Ha, S.H. (2014c). Ultrasensitivity part ii: multisite phosphorylation, stoichiometric inhibitors, and positive feedback. Trends Biochem. Sci. 39, 556–569.

Friedlander, T., and Brenner, N. (2011). Adaptive response and enlargement of dynamic range. Math. Biosci. Eng. 8, 515–528.

Levin, D., Harari, D., and Schreiber, G. (2011). Stochastic receptor expression determines cell fate upon interferon treatment. Mol. Cell. Biol. 31, 3252–3266. Liongue, C., Sertori, R., and Ward, A.C. (2016). Evolution of cytokine receptor signaling. J. Immunol. 197, 11–18. Lynch, M., and Walsh, B. (2007). The Origins of Genome Architecture, 98 (Sinauer Associates). Martins, B.M.C., and Swain, P.S. (2011). Trade-offs and constraints in allosteric sensing. PLoS Comp. Biol. 7, e1002261. Marzen, S., Garcia, H.G., and Phillips, R. (2013). Statistical mechanics of monod–wyman–changeux (mwc) models. J. Mol. Biol. 425, 1433–1460. Matsumura, I., and Ellington, A.D. (2001). In vitro evolution of beta-glucuronidase into a beta-galactosidase proceeds through non-specific intermediates. J. Mol. Biol. 305, 331–339. Milo, R., and Phillips, R. (2015). Cell Biology by the Numbers (CRC Press). Munk, C., Harpsøe, K., Hauser, A.S., Isberg, V., and Gloriam, D.E. (2016). Integrating structural and mutagenesis data to elucidate gpcr ligand binding. Curr. Opin. Pharmacol. 30, 51–58. Nei, M., Niimura, Y., and Nozawa, M. (2008). The evolution of animal chemosensory receptor gene repertoires: roles of chance and necessity. Nat. Rev. Genet. 9, 951–963.

Friedlander, T., Prizak, R., Guet, C.C., Barton, N.H., and Tkačik, G. (2016). Intrinsic limits to gene regulation by global crosstalk. Nat. Commun. 7, 12307.

Newman, J.R.S., Ghaemmaghami, S., Ihmels, J., Breslow, D.K., Noble, M., DeRisi, J.L., and Weissman, J.S. (2006). Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441, 840–846.

Goldman, B.S., Nierman, W.C., Kaiser, D., Slater, S.C., Durkin, A.S., Eisen, J.A., Ronning, C.M., Barbazuk, W.B., Blanchard, M., Field, C., et al. (2006). Evolution of sensory complexity recorded in a myxobacterial genome. Proc. Natl. Acad. Sci. USA 103, 15200–15205.

Noren, D.P., Chou, W.H., Lee, S.H., Qutub, A.A., Warmflash, A., Wagner, D.S., Popel, A.S., and Levchenko, A. (2016). Endothelial cells decode vegf-mediated ca2+ signaling patterns to produce distinct functional responses. Sci. Signal. 9, ra20.

Greenwood, P.E., and Nikulin, M.S. (1996). A Guide to Chi-Squared Testing (John Wiley & Sons).

Olsman, N., and Goentoro, L. (2016). Allosteric proteins as logarithmic sensors. Proc. Natl. Acad. Sci. USA 113, E4423–E4430.

Ha, S., and Ferrell, J. (2016). Thresholds and ultrasensitivity from negative cooperativity. Science 280, 990–993.

Pires-daSilva, A., and Sommer, R.J. (2003). The evolution of signalling pathways in animal development. Nat. Rev. Genet. 4, 39–49.

Hall, D.A. (2000). Modeling the functional effects of allosteric modulators at pharmacological receptors: an extension of the two-state model of receptor activation. Mol. Pharmacol. 58, 1412–1423.

Rajagopal, S., Rajagopal, K., and Lefkowitz, R.J. (2010). Teaching old receptors new tricks: biasing seven-transmembrane receptors. Nat. Rev. Drug Discov. 9, 373–386.

Housden, B.E., and Perrimon, N. (2014). Spatial and temporal organization of signaling pathways. Trends Biochem. Sci. 39, 457–464.

Raser, J.M., and O’shea, E.K. (2005). Noise in gene expression: origins, consequences, and control. Science 309, 2010–2013.

Hughes, A.L. (2005). Gene duplication and the origin of novel proteins. Proc. Natl. Acad. Sci. USA 102, 8791–8792.

Rhee, A., Cheong, R., and Levchenko, A. (2012). The application of information theory to biochemical signaling systems. Phys. Biol. 9, 045011.

Hwang, J.-I., Moon, M.J., Park, S., Kim, D.-K., Cho, E.B., Ha, N., Son, G.H., Kim, K., Vaudry, H., and Seong, J.Y. (2013). Expansion of secretin-like g protein-coupled receptors and their peptide ligands via local duplications before and after two rounds of whole-genome duplication. Mol. Biol. Evol. 30, 1119–1130.

Rockah-Shmuel, L., and Tawfik, D.S. (2012). Evolutionary transitions to new dna methyltransferases through target site expansion and shrinkage. Nucleic Acids Res. 40, 11627–11637.

Jetka, T., Nienałtowski, K., Filippi, S., Stumpf, M.P.H., and Komorowski, M. (2018). An information-theoretic framework for deciphering pleiotropic and noisy biochemical signaling. Nat. Commun. 9, 4591.

Rowland, M.A., Greenbaum, J.M., and Deeds, E.J. (2017). Crosstalk and the evolvability of intracellular communication. Nat. Commun. 8, 16009.

Kenakin, T. (2011). Functional selectivity and biased receptor signaling. J. Pharmacol. Exp. Ther. 336, 296–302. Kenakin, T.P. (2012). Biased signalling and allosteric machines: new vistas and challenges for drug discovery. Br. J. Pharmacol. 165, 1659–1669.


Rowland, M.A., and Deeds, E.J. (2014). Crosstalk and the evolution of specificity in two-component signaling. Proc. Natl. Acad. Sci. USA 111, 5550–5555.

Rowland, M.A., Harrison, B., and Deeds, E.J. (2015). Phosphatase specificity and pathway insulation in signaling networks. Biophys. J. 108, 986–996.

Rubenstein, A.B., Pethe, M.A., and Khare, S.D. (2017). MFPred: rapid and accurate prediction of protein-peptide recognition multispecificity using self-consistent mean field theory. PLoS Comput. Biol. 13, e1005614.

Selimkhanov, J., Taylor, B., Yao, J., Pilko, A., Albeck, J., Hoffmann, A., Tsimring, L., and Wollman, R. (2014). Accurate information transmission through dynamic biochemical signaling networks. Science 346, 1370–1373.

Tkačik, G., Callan, C.G., and Bialek, W. (2008a). Information flow and optimization in transcriptional regulation. Proc. Natl. Acad. Sci. USA 105, 12265–12270.

Semyonov, J., Park, J.-I., Chang, C.L., and Hsu, S.Y.T. (2008). GPCR genes are preferentially retained after whole genome duplication. PLoS One 3, e1903.

Tkačik, G., Callan, C.G., Jr., and Bialek, W. (2008b). Information capacity of genetic regulatory elements. Phys. Rev. E 78, 011910.

Strotmann, R., Schröck, K., Böselt, I., Stäubert, C., Russ, A., and Schöneberg, T. (2011). Evolution of gpcr: change and continuity. Mol. Cell. Endocrinol. 331, 170–178.

Tkačik, G., and Walczak, A.M. (2011). Information transmission in genetic regulatory networks: a review. J. Phys. Condens. Matter 23, 153102.

Suderman, R., Bachman, J.A., Smith, A., Sorger, P.K., and Deeds, E.J. (2017). Fundamental trade-offs between information flow in single cells and cellular populations. Proc. Natl. Acad. Sci. USA 114, 5755–5760.

Tokuriki, N., Jackson, C.J., Afriat-Jurnou, L., Wyganowski, K.T., Tang, R., and Tawfik, D.S. (2012). Diminishing returns and tradeoffs constrain the laboratory optimization of an enzyme. Nat. Commun. 3, 1257.

Swain, P.S., Elowitz, M.B., and Siggia, E.D. (2002). Intrinsic and extrinsic contributions to stochasticity in gene expression. Proc. Natl. Acad. Sci. USA 99, 12795–12800.

Venkatakrishnan, A.J., Deupi, X., Lebon, G., and Tate, C.G. (2013). Molecular Signatures of G-Protein-Coupled Receptors (Nature Publishing).

Symmons, O., and Raj, A. (2016). What’s luck got to do with it: single cells, multiple fates, and biological nondeterminism. Mol. Cell 62, 788–802.

Taniguchi, Y., Choi, P.J., Li, G.-W., Chen, H., Babu, M., Hearn, J., Emili, A., and Xie, X.S. (2010). Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329, 533–538.

Tawfik, D.S. (2014). Accuracy-rate tradeoffs: how do enzymes meet demands of selectivity and catalytic efficiency? Curr. Opin. Chem. Biol. 21, 73–80.

Tawfik, O.K., and Khersonsky, O. (2010). Enzyme promiscuity: a mechanistic and evolutionary perspective. Annu. Rev. Biochem. 79, 471–505.

Warren, I.A., Ciborowski, K.L., Casadei, E., Hazlerigg, D.G., Martin, S., Jordan, W.C., and Sumner, S. (2014). Extensive local gene duplication and functional divergence among paralogs in Atlantic salmon. Genome Biol. Evol. 6, 1790–1805. Wisler, J.W., Xiao, K., Thomsen, A.R., and Lefkowitz, R.J. (2014). Recent developments in biased agonism. Curr. Opin. Cell Biol. 27, 18–24. Zhang, R., Fruhwirth, G.O., Coban, O., Barrett, J.E., Burgoyne, T., Lee, S.H., Simonson, P.D., Baday, M., Kholodenko, B.N., Futter, C.E., et al. (2017). Probing the heterogeneity of protein kinase activation in cells by super-resolution microscopy. ACS Nano 11, 249–257.


STAR+METHODS

KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Software and Algorithms
R | R Foundation for Statistical Computing | https://cran.r-project.org

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Michał Komorowski ([email protected]).

METHOD DETAILS

Information-Theory Background

Regardless of the specific details of the model, within information theory, a system of sensory proteins, be it a single receptor or an entire pathway, can be considered as an input-output device represented by the distribution $P(Y|X = x)$ that measures an input signal, x, by eliciting a stochastic output, Y. For a system of N copies of a ligand-activated sensor, the ligand’s concentration comprises the input, and the output is the number of sensor molecules in the activated state. Due to stochasticity, the output relates to the input with limited precision. This precision can be quantified by considering a statistical estimator, $\hat{x}(Y)$, of the signal, x, as calculated from the output, Y. The estimator, $\hat{x}(Y)$, can be interpreted as a representation of the signal deduced from the output. The standard deviation, $\sigma(\hat{x}(Y))$, of any unbiased estimator, $E(\hat{x}(Y)) = x$, and therefore of any meaningful representation of the signal deduced from the output that is correct on average, satisfies the Cramér-Rao inequality, $\sigma(\hat{x}(Y)) \geq 1/\sqrt{FI(x)}$, where

$$FI(x) = E\left[\left(\frac{\partial \ln P(Y|X = x)}{\partial x}\right)^2\right]$$ (Equation 2)

is the Fisher information (FI) (Komorowski et al., 2011; Komorowski et al., 2012) of the system $P(Y|X = x)$. Moreover, in scenarios in which the Fisher information is relatively high, for instance for a high copy number, N, and when the estimator $\hat{x}(Y)$ is efficient, i.e., has the lowest possible variability (e.g., the maximum likelihood estimator), the standard deviation is well approximated by the inverse square root of the FI (Le Cam, 2012)

$$\sigma(\hat{x}(Y)) \approx \frac{1}{\sqrt{FI(x)}}.$$ (Equation 3)

The above quantification of signaling fidelity considers a fixed value of the signal, x. The precision of signaling is, however, different for various values of the signal, x. The quantification of the overall capability of the system to perform reliable signaling must consider all possible values of the signal. This is usually achieved by considering the distribution of different signal values, $P(x)$, and its entropy

$$H(X) = -\int_{\mathcal{X}} \log_2(P(x)) P(x)\,dx,$$ (Equation 4)

where $\mathcal{X}$ is the space of possible values of the signal. The distribution $P(x)$ can be interpreted as the distribution of different ligand concentrations in the environment. Via Bayes' formula for conditional probability, the distribution $P(X|Y = y)$ represents the plausible input values that generated the specific output, y. The entropy of the distribution $P(X|Y = y)$,

$$H(X|Y = y) = -\int_{\mathcal{X}} \log_2(P(x|Y = y)) P(x|Y = y)\,dx,$$ (Equation 5)

quantifies the uncertainty regarding the input after the specific output, y, was observed. As the output is random, averaging $H(X|Y = y)$ over all possible outputs quantifies the average uncertainty regarding the input given an output, $H(X|Y)$,

$$H(X|Y) = \int_{\mathcal{Y}} H(X|Y = y) P(y)\,dy,$$ (Equation 6)

where $\mathcal{Y}$ is the space of all possible outputs. The difference between $H(X)$ and $H(X|Y)$ measures, therefore, the average reduction in uncertainty regarding the input, X, resulting from observing an output, Y, and is referred to as mutual information, $I(X; Y)$, between the input and the output


$$I(X; Y) = H(X) - H(X|Y).$$ (Equation 7)
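Equations 4 to 7 can be evaluated directly when the input takes a finite set of values. The sketch below is a minimal illustration under that assumption (the paper treats X as continuous), with a binomial output as in the sensor model used in this work. Function names and the parameterization by activation probabilities are hypothetical choices made for the example.

```python
import math


def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ Bin(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def mutual_information_bits(p_active, p_input, n_copies):
    """I(X;Y) = H(X) - H(X|Y) for a discrete input and a binomial output.

    p_active[i] is the activation probability h(x_i) for input value x_i,
    p_input[i] is P(x_i), and n_copies is the number of sensor molecules, N.
    """
    outputs = range(n_copies + 1)
    # Joint distribution P(x_i, y) = P(x_i) * P(y | x_i)
    joint = [[p_input[i] * binomial_pmf(y, n_copies, p_active[i]) for y in outputs]
             for i in range(len(p_input))]
    h_x = entropy_bits(p_input)
    h_x_given_y = 0.0
    for y in outputs:
        p_y = sum(row[y] for row in joint)
        if p_y > 0.0:
            posterior = [row[y] / p_y for row in joint]  # Bayes' formula
            h_x_given_y += p_y * entropy_bits(posterior)  # discrete Equation 6
    return h_x - h_x_given_y


if __name__ == "__main__":
    # Two equally likely ligand concentrations sensed by N = 10 molecules.
    print(mutual_information_bits([0.2, 0.8], [0.5, 0.5], 10))
```

Two inputs that drive the sensor fully off and fully on are perfectly distinguishable and yield one bit, whereas two inputs with the same activation probability yield zero bits.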

The mutual information, naturally, depends on the distribution of inputs, $P(x)$. For instance, if distinct inputs that generate similar outputs occur frequently, the reduction of uncertainty resulting from observing these outputs will be lower than in the case where distinct inputs that generate distinct outputs occur frequently. Therefore, to quantify how much information can be transmitted through a system, the maximal mutual information with respect to all input distributions, named information capacity, C*, is considered

$$C^* = \max_{P(x)} I(X; Y).$$ (Equation 8)
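For a discretized input, the maximization in Equation 8 can be carried out with the Blahut-Arimoto algorithm. This is a standard textbook procedure and is shown here only as an illustration; it is not claimed to be the numerical method used in this work, which relies on the Fisher information approach described below. The sketch assumes a finite grid of input values, and the function names are hypothetical.

```python
import math


def blahut_arimoto_bits(channel, n_iter=200):
    """Estimate the maximum over P(x) of I(X;Y) for a discrete channel.

    channel[i][j] = P(Y = j | X = x_i).
    Returns (capacity estimate in bits, input distribution reached).
    """
    n_in, n_out = len(channel), len(channel[0])
    p_x = [1.0 / n_in] * n_in  # start from the uniform input distribution

    def divergences(p_in):
        # D_i = KL(P(Y | x_i) || P(Y)) in bits, with P(Y) induced by p_in
        p_y = [sum(p_in[i] * channel[i][j] for i in range(n_in))
               for j in range(n_out)]
        return [sum(c * math.log2(c / p_y[j]) for j, c in enumerate(row) if c > 0.0)
                for row in channel]

    for _ in range(n_iter):
        d = divergences(p_x)
        # Multiplicative update: inputs that are more informative gain weight
        weights = [p * 2.0 ** d_i for p, d_i in zip(p_x, d)]
        total = sum(weights)
        p_x = [w / total for w in weights]

    d = divergences(p_x)
    capacity = sum(p * d_i for p, d_i in zip(p_x, d))  # I(X;Y) at the final P(x)
    return capacity, p_x


def binomial_channel(p_active, n_copies):
    """One row per activation probability h: the Bin(n_copies, h) output distribution."""
    return [[math.comb(n_copies, k) * h ** k * (1.0 - h) ** (n_copies - k)
             for k in range(n_copies + 1)]
            for h in p_active]


if __name__ == "__main__":
    # Grid over activation probabilities h(x); assumes h is monotone in x,
    # so the capacity does not depend on how the input axis is parameterized.
    grid = [j / 30 for j in range(31)]
    capacity, p_opt = blahut_arimoto_bits(binomial_channel(grid, 20))
    print(round(capacity, 3))
```

The returned value is the mutual information at the last iterate, so it is a lower bound on the capacity of the discretized channel that tightens as the number of iterations grows.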

Calculation of Information Capacity  For the log of base 2 used above, the information capacity, C* is expressed in bits and 2C can be interpreted as the number of inputs that the system can effectively resolve in a specified mathematical sense (Cover and Thomas, 2012). For instance, if C* = 2 there exist four such concentrations that can be resolved with an arbitrary low error (Cover and Thomas, 2012). The distribution, PðxÞ, for which the capacity is achieved is called the optimal input distribution and is denoted by P ðxÞ.The information capacity quantifies the number of effectively resolvable states, whereas the Fisher information quantifies the fidelity of signaling. The two, however, are closely related. Precisely, classical results of Bayesian statistics (Bernardo, 1979; Clarke and Barron, 1994) describing non-informative priors show that, for large N, pffiffiffiffiffiffiffiffiffiffiffi (Equation 9) P ðxÞf FIðxÞ: Moreover, it can be shown (Clarke and Barron, 1994; Brunel and Nadal, 1998; Jetka et al., 2018) that the information capacity can be approximated with high accuracy by the following Equation: 1 0 Z pffiffiffiffiffiffiffiffiffiffiffi 1  (Equation 10) FIðxÞdxA: C = log2 @pffiffiffiffiffiffiffiffiffi 2pe X

It reduces the problem of finding information capacity to the problem of computing the Fisher information. Compared to the well ik et al., 2008a, 2008b) and (Tkac ik et al., 2008a, 2008b), Equaestablished approach of small noise approximation proposed in (Tkac tions 9 and 10 provide more general and, on average, more accurate tools to quantify information capacity (Jetka et al., 2018) Their derivation and applicability to study signaling systems is discussed in (Jetka et al., 2018). In short, the derivation is based on the so called asymptotic limit. Specifically, it is assumed that the signaling system is composed of N independent entities that sense the signal, e.g. receptor molecules. Capacity is calculated in the limit of large N and then regressed back to smaller N (Jetka et al., 2018). Similarly to small noise approximation, derivation of Equations 9 and 10 involved variational optimisation. In specific scenarios, e.g. binomial distribution, both approaches lead to the same results. Modifications of Equations 9 and 10 are particularly useful to study systems that sense more than one input signal, e.g. concentrations of l different ligands (Jetka et al., 2018). For such systems, the input value x is a vector with entries corresponding to concentrations of sensed ligands, i.e., x = ðx1 ; .; xl Þ. The Fisher information is then given as a l 3 l matrix with ij entry defined as   vln PðYjX = xÞ vln PðYjX = xÞ FIij ðxÞ = E : (Equation 11) vxi vxj The optimal input distribution and the information capacity are obtained by replacing the Fisher information with the determinant of the Fisher information matrix in Equations 9 and 10. Precisely, pffiffiffiffiffiffiffiffiffiffiffiffiffi P ðxÞf jFIðxÞj; (Equation 12) and

$$C^* = \log_2\left(\frac{1}{\sqrt{2\pi e}} \int_{\mathcal{X}} \sqrt{|FI(x)|}\,dx\right),$$ (Equation 13)

where $|\cdot|$ denotes the matrix determinant. Equations 12 and 13 were used to calculate the information capacity for systems with two inputs considered in the analysis of duplication scenarios.

Modeling of Sensors Activity

Here, we apply Equations 9 and 10 to understand the information processing properties of allosteric sensors. Specifically, we consider steady-state distributions of sensor activity and assume that each sensor molecule takes one of two or more conformational states independently of other sensor molecules. In the case of two allowed conformations, e.g., active and inactive, this leads to the steady-state distribution of active receptors, $Y$, given by the binomial distribution

$$P(Y|X=x) = \mathrm{Bin}(h(x), N),$$ (Equation 14)

where $N$ is the total number of receptors and $h(x)$ describes the probability of a sensor being active at a given ligand concentration, $x$. In principle, this model takes into account noise resulting from random ligand binding and from spontaneous activation (i.e., the sensor adopting the activated state despite the ligand being unbound), as described by the function $h(x)$. The case of multiple allowed conformations is modeled by replacing the binomial with the multinomial distribution. Mass action kinetics is usually used to derive the activation function, $h(x)$, which depends on the molecular and biochemical details of the sensory molecules. The Michaelis-Menten, Monod-Wyman-Changeux (Cornish-Bowden, 2013), and Hall (Hall, 2000) models are examples of frequently used activation functions.

The Monod, Wyman, and Changeux (MWC) model considers a largely homogeneous population of sensors composed of $n$ identical subunits that can independently bind a ligand. Each sensor is assumed to be in either of two conformational states: the T (tense) state, considered as the inactive state, and the R (relaxed) state, considered as the active state. Conformational change can occur spontaneously when no ligand is bound. The ratio between sensors in the R and T states with no ligand present is described by $L$. The dissociation constant of the T state is described by $K_d$, and the parameter $\alpha$ defines the ratio of the dissociation constants of the T and R states. The fraction of proteins in the active state within the MWC model is described as

$$h(x) = \frac{L(1 + \alpha x/K_d)^n}{L(1 + \alpha x/K_d)^n + (1 + x/K_d)^n}.$$ (Equation 15)

The allosteric interactions between binding sites in the MWC model result in the cooperative effect. The model allows only for positive cooperativity, the magnitude of which is controlled by the parameters $\alpha$ and $n$. A similar model of an allosteric sensor was proposed by David A. Hall (Hall, 2000). In contrast to the MWC model, it allows for negative cooperativity. Each sensor can be in either of two states: active and inactive. The sensor is assumed to have two binding sites. Here, we consider a version of the model with both sites binding the same ligand. The possibility to account for negative cooperativity comes at the cost of involving more parameters compared to the MWC model. In addition to the parameters $L$, $\alpha$, and $K_d$, which have the same definition as in the MWC model, the Hall model requires two more parameters: $\gamma$, the ratio of the ligand affinity of the inactive sensor with one ligand bound to that of the inactive sensor with no ligand bound, thus reflecting the magnitude of binding cooperativity; and $\delta$, the ratio of the ligand affinity of the active sensor with one ligand bound to that of the inactive sensor with one ligand bound, which quantifies the activation cooperativity (Hall, 2000). The fraction of sensors in the active state is described by the Hall model as

$$h(x) = \frac{L\left(1 + \alpha \frac{x}{K_d}\left(2 + \alpha\gamma\delta \frac{x}{K_d}\right)\right)}{1 + L + \frac{x}{K_d}(1 + \alpha L) + \frac{x}{K_d}\left(1 + \alpha L + \gamma \frac{x}{K_d}\left(1 + \alpha^2 \delta L\right)\right)}.$$ (Equation 16)

The Fisher information of any sensor, at copy number $N$, described by Equation 14 is given as

$$FI(x) = N\,\frac{\left(\frac{\partial h(x)}{\partial x}\right)^2}{h(x)\,(1 - h(x))}.$$ (Equation 17)
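As a numerical illustration, the sketch below evaluates Equation 10 with the Fisher information of Equation 17 for an MWC sensor (Equation 15). It is a minimal sketch under stated assumptions, not the authors' implementation: the parameter values, the finite-difference derivative, the log-spaced integration grid, and all function names are hypothetical choices made here, and `alpha` stands for the affinity-ratio parameter of Equation 15. For a monotonic activation function the integral of the square root of Equation 17 equals $2\sqrt{N}\,[\arcsin\sqrt{h}]$ evaluated between the smallest and largest activation levels, so the capacity cannot exceed $\frac{1}{2}\log_2(N\pi/(2e))$, the value reached when $h(x)$ spans the full range from 0 to 1.

```python
import math

def mwc_activation(x, L=1e-4, alpha=1e3, Kd=1.0, n=2):
    """Equation 15: fraction of sensors in the active (R) state.
    The default parameter values are illustrative assumptions."""
    active = L * (1.0 + alpha * x / Kd) ** n
    inactive = (1.0 + x / Kd) ** n
    return active / (active + inactive)

def fisher_information(x, N, h, rel_step=1e-5):
    """Equation 17, with dh/dx estimated by a central finite difference."""
    dx = rel_step * x
    dh = (h(x + dx) - h(x - dx)) / (2.0 * dx)
    hx = h(x)
    return N * dh * dh / (hx * (1.0 - hx))

def capacity_bits(N, h, t_min=-18.0, t_max=18.0, steps=4000):
    """Equation 10 by the trapezoid rule on a log grid, x = exp(t)."""
    dt = (t_max - t_min) / steps
    total = 0.0
    for i in range(steps + 1):
        x = math.exp(t_min + i * dt)
        weight = 0.5 if i in (0, steps) else 1.0
        # dx = x dt under the substitution x = exp(t)
        total += weight * math.sqrt(fisher_information(x, N, h)) * x * dt
    return math.log2(total / math.sqrt(2.0 * math.pi * math.e))

def capacity_ceiling_bits(N):
    """Value of Equation 10 when h(x) rises monotonically from 0 to 1."""
    return 0.5 * math.log2(N * math.pi / (2.0 * math.e))

if __name__ == "__main__":
    for copies in (10, 100, 1000):
        print(copies, capacity_bits(copies, mwc_activation),
              capacity_ceiling_bits(copies))
```

Because the copy number enters Equation 17 only as a multiplicative factor, quadrupling `N` in this sketch adds exactly one bit, whatever activation function is supplied.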

Therefore, the information capacity, regardless of the form of the activation function, $h(x)$, can be readily obtained by substituting the Fisher information into Equation 10. Direct integration then leads to Equation 1.

Calculation of Distinguishable Ligand Concentrations

$2^{C^*}$ represents the maximal number of different inputs that a system can effectively resolve assuming an optimal input distribution. However, the actual input values (i.e., ligand concentrations) that can be effectively distinguished cannot be directly derived from $C^*$ and the optimal input distribution. These can instead be identified using the Chernoff information, which quantifies the probability of error when distinguishing two close input values. Specifically, Chernoff information can be used to identify how, given the overall number of distinguishable states, $2^{C^*}$, the input values are distributed such that they are distinguishable with a minimum error. The distinguishable states for the receptor presented in Figure 1A are presented in Figure 1D. These states are distributed differently depending on the receptor's cooperativity, but their number remains the same, as elaborated in the next section. The distinguishable ligand levels in Figure 1C were determined using the Chernoff information.

Chernoff information

For two distributions $P(Y|X=x_1)$ and $P(Y|X=x_2)$, Chernoff information is defined as the Kullback-Leibler divergence, $KL(\cdot,\cdot)$, between an auxiliary distribution

$$P_\alpha(Y) = \frac{P(Y|X=x_1)^{\alpha}\,P(Y|X=x_2)^{1-\alpha}}{\sum_{Y} P(Y|X=x_1)^{\alpha}\,P(Y|X=x_2)^{1-\alpha}}$$

and either of the two distributions. The auxiliary distribution is selected to satisfy $KL(P_\alpha(Y), P(Y|X=x_1)) = KL(P_\alpha(Y), P(Y|X=x_2))$. Formally,

$$C(x_1, x_2) = KL(P_\alpha(Y), P(Y|X=x_1)) = KL(P_\alpha(Y), P(Y|X=x_2)).$$

Chernoff information allows quantification of the probability of making a wrong decision by the Bayesian maximum a posteriori principle when deciding which of two distinct inputs, $x_1$ or $x_2$, generated the observation $Y$ (Cover and Thomas, 2012), assuming that a priori both inputs are equally likely. Specifically, for the binomial distribution (Equation 14), by the Chernoff-Stein Lemma, the probability of selecting a wrong input, $P_e$, using the Bayesian maximum a posteriori principle, is given as

$$P_e = 2^{-C(x_1, x_2)}.$$ (Equation 18)
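The definition above can be turned into a short computation. The following is a minimal sketch, assuming binomial outputs (Equation 14) whose activation probabilities `p1` and `p2` stand for $h(x_1)$ and $h(x_2)$ and lie strictly between 0 and 1. The bisection over the exponent of the auxiliary distribution and all function names are illustrative choices made here rather than the procedure used by the authors.

```python
import math

def binom_logpmf(y, N, p):
    """Natural log of the Bin(p, N) probability of observing y active sensors."""
    log_comb = math.lgamma(N + 1) - math.lgamma(y + 1) - math.lgamma(N - y + 1)
    return log_comb + y * math.log(p) + (N - y) * math.log1p(-p)

def tilted_distribution(logp1, logp2, a):
    """Auxiliary distribution proportional to P1^a * P2^(1 - a)."""
    logs = [a * l1 + (1.0 - a) * l2 for l1, l2 in zip(logp1, logp2)]
    top = max(logs)  # subtract the maximum to avoid underflow
    weights = [math.exp(v - top) for v in logs]
    norm = sum(weights)
    return [w / norm for w in weights]

def kl_bits(p, logq):
    """Kullback-Leibler divergence KL(p, q) in bits; q is given as natural logs."""
    nats = sum(pi * (math.log(pi) - lq) for pi, lq in zip(p, logq) if pi > 0.0)
    return nats / math.log(2.0)

def chernoff_information(p1, p2, N, iterations=60):
    """Bisect on a until KL(P_a, P1) = KL(P_a, P2); return that common value."""
    logp1 = [binom_logpmf(y, N, p1) for y in range(N + 1)]
    logp2 = [binom_logpmf(y, N, p2) for y in range(N + 1)]
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        pa = tilted_distribution(logp1, logp2, mid)
        # The gap KL(P_a, P1) - KL(P_a, P2) decreases as a goes from 0 to 1.
        if kl_bits(pa, logp1) > kl_bits(pa, logp2):
            lo = mid
        else:
            hi = mid
    pa = tilted_distribution(logp1, logp2, 0.5 * (lo + hi))
    return kl_bits(pa, logp1)

def error_probability(p1, p2, N):
    """Equation 18: probability of selecting the wrong input."""
    return 2.0 ** (-chernoff_information(p1, p2, N))
```

For a pair of activation levels that mirror each other around 0.5 (for example 0.3 and 0.7), the balancing exponent is 0.5 by symmetry, which gives a convenient sanity check of the bisection.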

Distinguishable states presented in Figure 1C have been calculated as a sequence of inputs $x_1, \ldots, x_q$ such that (i) $x_1 = 0$; and (ii) for each $i > 0$, $x_{i+1}$ is the smallest $x > x_i$ such that $C(x_i, x) \geq \varepsilon$, where $\varepsilon$ has been selected so that $q = C^*$.

Calculation of information capacity for sensors with copy number variation and cross-reacting non-cognate ligands

Copy Number Variation

The distribution of active sensors, $P(Y|X=x)$, with a variable copy number can be obtained from the distribution of active sensors with a fixed copy number, $P(Y|X=x, N)$, and the distribution of the copy number, $P(N)$. Precisely, averaging $P(Y|X=x, N)$ over $P(N)$ leads to

$$P(Y|X=x) = \sum_{N=1}^{\infty} P(Y|X=x, N)\,P(N).$$ (Equation 19)
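A minimal sketch of how Equation 19 could be evaluated in practice is given below. It rests on several assumptions that are not taken from the text: the copy-number distribution is supplied as a finite dictionary `copy_dist` mapping $N$ to $P(N)$, the activation function is monotonic and spans the full range from 0 to 1, and the integral of Equation 10 is taken over the activation level $h$ instead of $x$ (the integral of $\sqrt{FI}$ is unchanged by such a monotonic change of variable). The substitution $h = \sin^2\theta$ removes the endpoint singularities; for a fixed copy number the integrand is then the constant $2\sqrt{N}$.

```python
import math

def binom_pmf(y, n, h):
    """Probability of y active sensors out of n at activation level h."""
    return math.comb(n, y) * h ** y * (1.0 - h) ** (n - y)

def fisher_info_mixture(h, copy_dist):
    """Fisher information with respect to h for the mixture of Equation 19.
    copy_dist is a hypothetical dict {N: P(N)} with finitely many entries."""
    n_max = max(copy_dist)
    fi = 0.0
    for y in range(n_max + 1):
        p = 0.0   # P(Y = y | h)
        dp = 0.0  # derivative of P(Y = y | h) with respect to h
        for n, weight in copy_dist.items():
            if y <= n:
                b = weight * binom_pmf(y, n, h)
                p += b
                dp += b * (y - n * h) / (h * (1.0 - h))
        if p > 0.0:
            fi += dp * dp / p
    return fi

def capacity_bits(copy_dist, steps=200):
    """Equation 10 by the midpoint rule in theta, with h = sin(theta)^2."""
    dtheta = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * dtheta
        h = math.sin(theta) ** 2
        dh = 2.0 * math.sin(theta) * math.cos(theta) * dtheta
        total += math.sqrt(fisher_info_mixture(h, copy_dist)) * dh
    return math.log2(total / math.sqrt(2.0 * math.pi * math.e))

if __name__ == "__main__":
    fixed = {20: 1.0}
    variable = {n: 1.0 / 21 for n in range(10, 31)}  # uniform, mean 20
    print(capacity_bits(fixed), capacity_bits(variable))
```

In this sketch a sensor whose copy number is spread uniformly between 10 and 30 yields a lower capacity than one fixed at the mean of 20 copies. This follows from the Cauchy-Schwarz inequality: the Fisher information of a mixture cannot exceed the weighted average of the Fisher information of its components.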

For the above distribution, the exact Fisher information cannot be explicitly derived. However, it can be calculated numerically, which in turn enables the use of Equation 10 for the calculation of $C^*$.

Cross-Reacting Ligands

Similarly to the above, for the model of a sensor with cross-reacting non-cognate ligands, the distribution of active sensors is given as

$$P(Y|X_C = x_C) = \int_{\mathcal{X}_{NC}} P(Y|X = x_C + x_{NC}/\lambda)\,P(x_{NC})\,dx_{NC},$$ (Equation 20)

where $\mathcal{X}_{NC}$ denotes the space of possible non-cognate ligand concentrations. Again, numerical evaluation of the Fisher information enables calculation of the information capacity, $C^*$, via Equation 10.

Additional Analysis

Relationship between Kd and Optimal Distribution of Ligand Concentrations

As described in the STAR Methods, our statistical approach further connects signaling accuracy, quantified by the Fisher information, with the distribution of inputs and the maximal number of resolvable states, i.e., information capacity (Equation 10). Importantly, the optimality criterion (Equation 9) applies to all our models: maximal information transmission is realized under the optimal input distribution, as quantified by the information capacity, $C^*$. In the optimal scenario, the ligand concentration, $x$, should appear with a frequency proportional to $1/\sigma(\hat{x}(Y))$, where $\sigma(\hat{x}(Y))$ is the standard deviation of an estimator of the signal, $x$, obtained from the output, $Y$, and the standard deviation is described by the Fisher information, $\sigma(\hat{x}(Y)) \approx 1/\sqrt{FI(x)}$. As exemplified in Figure 1C, this criterion is an extension of the classical assumption that the highest sensitivity is obtained when the $K_d$, and also the level of cooperativity, match the range of ligand concentrations that a sensor experiences most frequently.

As described in the main text, $C^*$ quantifies maximal information transfer assuming an optimal distribution of input, i.e., of ligand concentrations. Optimality relates to the sensor's $K_d$ (Figures 1A, 1C, and 1D) and also to the shape of the distribution. It is generally assumed that the $K_d$ of signaling sensors evolved to match the ligand concentrations presented by the environment (likewise, the $K_M$ of signaling enzymes generally evolves to match the prevailing substrate concentrations).
However, if in a given environment the distribution of input ligand concentrations happens to deviate from the optimal one, information transfer will be compromised accordingly. The deviation may relate to the $K_d$ value failing to match the most probable ligand concentration and/or to differences in the shapes of the actual and optimal distributions. Although information transfer exhibits a certain degree of robustness to such deviations, substantial discrepancies result in its severe decrease (Figure 1E), thus indicating a strong evolutionary driving force for optimization of $K_d$ to match the actual ligand concentrations.

A Logarithmic Ceiling Holds for Temporally Resolved Outputs

The logarithmic dependence on the copy number, $N$, holds in a much broader setting. Precisely, suppose that the activity of a single copy of a sensor is represented by a random vector $Y_T$. The output $Y_T$ may, for instance, represent the sensor's temporally resolved readouts (on/off trains), possibly out of steady state. Assume that $x$ is one-dimensional and $Y_T$ follows some probability distribution $P(Y_T|X=x)$ with Fisher information $FI_T(x)$. Note that here $FI_T(x)$ refers to a single copy of a receptor, as opposed to an ensemble of $N$ copies. Then, if each copy of a sensor operates independently of others and $\int_0^\infty FI_T(x)\,dx$

The above results directly from the fact that the Fisher information of a system composed of $N$ independent and identical copies is $N \cdot FI_T(x)$ (a basic property of the Fisher information) and from Equation 10. The condition $\int_0^\infty FI_T(x)\,dx$
$$Y \sim \mathrm{Mult}(h_0(x), u_1 h_1(x), \ldots, u_k h_k(x), N),$$ (Equation 22)

where $h_0(x) = 1 - \sum_{i=1}^{k} u_i h_i(x)$ denotes the probability of the sensor being in the inactive state. The Fisher information for the above distribution with respect to $x$ is given as (Greenwood and Nikulin, 1996)

$$FI(x) = \frac{\left(\frac{\partial h_0(x)}{\partial x}\right)^2}{h_0(x)} + \sum_{i=1}^{k} u_i\,\frac{\left(\frac{\partial h_i(x)}{\partial x}\right)^2}{h_i(x)},$$ (Equation 23)

and its integration according to Equation 10 allows calculation of the system's information capacity. To illustrate how sensing of both the complete output, i.e., the fraction of ligand-bound sensors, and of the number of sensor molecules in each state, $Y = (Y_0, Y_1, \ldots, Y_k)$, enhances information processing, we again applied the MWC model. The allosteric effectors thus alter the allosteric constant $L$, and the information capacity was calculated as a function of the log-ratio of subsequent allosteric constants, $\log_{10}(L_1/L_2)$ (Figure S1C). As expected, the benefit in terms of information capacity is largest when the response curves have non-overlapping dynamic ranges (i.e., higher $\log_{10}(L_1/L_2)$) and when the receptors are equally distributed among all conformations. Under these assumptions, and given that, in all cases, $h_i(0) \approx 0$ and $h_i(\infty) \approx 1$, integration of the Fisher information (Equation 23) according to Equation 10 gives

$$C^* = \frac{1}{2}\log_2\left(\frac{k \cdot N}{2e/\pi}\right).$$ (Equation 24)

Overall, it appears that sensing the number of bound sensors in individual sensor states provides only a moderate increase in information capacity compared to sensing the overall number of active sensors. The maximal achievable gain can be interpreted as being equivalent to a $k$-fold increase in the copy number of receptor molecules, as indicated by Equation 24.

Sensing the conformational state of the receptors occurs in phenomena such as functional selectivity or biased agonism (Kenakin, 2011), typically in G protein-coupled receptors (GPCRs) (Kenakin, 2012; Christopoulos and Kenakin, 2002; Rajagopal et al., 2010). Examples of functional selectivity include β-adrenergic, histamine, and numerous chemokine receptors.
In these cases, different conformational states of the same GPCR are recruited for signaling different outputs. Note, however, that signaling different outputs demands different variants of G proteins and β-arrestin (Rajagopal et al., 2010).

Regulated Sensors Can Provide Further Benefit when Additional Signaling Components Are Available

In the analysis of allostery and of functional selectivity, we assumed that each sensor molecule takes its allosteric state independently. However, it is often the case that the sensor's state depends on the signal. Such a situation occurs, for instance, in chemotactic receptors (Barkai and Leibler, 1997) via methylation. It has been shown that desensitization of the receptor via a negative feedback loop, also called activity-dependent desensitization, can substantially increase the dynamic range of receptors (Friedlander and Brenner, 2011). Using the calculations presented above, we can also calculate the information capacity for such a scenario. Again, consider that a sensor can take $k$ states. For simplicity, assume that the activation functions of sensors in each state, $h_i(x)$, are non-overlapping and that $h_i(0) = 0$ and $h_i(\infty) = 1$ for all $i$ from 1 to $k$. Also, assume that subsequent allosteric states, $i$, are taken by all copies of the sensor approximately simultaneously after the signal has reached saturation for the previous state. As described in (Friedlander and Brenner, 2011), such a system has $k$ times the dynamic range of the unregulated sensor. Therefore, the system has $k$ times more distinguishable states than the original sensor, and its capacity can be written as

$$C^* = \frac{1}{2}\log_2\left(\frac{k^2 \cdot N}{2e/\pi}\right).$$ (Equation 25)

Achieving the above capacity requires, however, an additional signaling component to implement the activity-dependent desensitization via a feedback loop, as well as a component to recognize the allosteric state of the sensor.
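The two closed-form expressions can be compared with a few lines of arithmetic. The sketch below evaluates Equation 24 (state-resolved sensing) and Equation 25 (regulated sensors) for a given copy number `N` and number of states `k`. The function name and the `regulated` flag are hypothetical conveniences introduced here.

```python
import math

def state_resolved_capacity_bits(N, k=1, regulated=False):
    """Equation 24 (regulated=False) or Equation 25 (regulated=True), in bits."""
    gain = k * k if regulated else k
    return 0.5 * math.log2(gain * N / (2.0 * math.e / math.pi))

if __name__ == "__main__":
    copies, states = 100, 4
    base = state_resolved_capacity_bits(copies)
    print(state_resolved_capacity_bits(copies, states) - base)
    print(state_resolved_capacity_bits(copies, states, regulated=True) - base)
```

With four states, resolving the individual states adds $\frac{1}{2}\log_2 4 = 1$ bit, the same as a fourfold increase in copy number, whereas the regulated sensor adds $\log_2 4 = 2$ bits.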
