Effects of codon sequence on the dynamics of genetic networks

Effects of codon sequence on the dynamics of genetic networks

Journal of Theoretical Biology 315 (2012) 17–25 Contents lists available at SciVerse ScienceDirect Journal of Theoretical Biology journal homepage: ...

928KB Sizes 2 Downloads 25 Views

Journal of Theoretical Biology 315 (2012) 17–25

Contents lists available at SciVerse ScienceDirect

Journal of Theoretical Biology journal homepage: www.elsevier.com/locate/yjtbi

Effects of codon sequence on the dynamics of genetic networks ¨ Ilya Potapov a, Jarno Makel a¨ a, Olli Yli-Harja a,b, Andre S. Ribeiro a,n a b

Computational Systems Biology Research Group, Department of Signal Processing, Tampere University of Technology, P.O. Box 527, FIN-33101, Finland Institute for Systems Biology, 1441N 34th St, Seattle, WA 98103-8904, USA

H I G H L I G H T S c c c c c

We study effects of codon sequences on the dynamics of protein numbers in E. coli. We model stochastic kinetics at nucleotide and codon level with real codons frequency. Mean protein numbers at near equilibrium differ with the codon sequence. Slow codon ramps affect the mean, but not fluctuations, in proteins numbers. Slow codon ramps affect the dynamics of genetic circuits such as switches and clocks.

a r t i c l e i n f o

abstract

Article history: Received 16 March 2012 Received in revised form 17 August 2012 Accepted 22 August 2012 Available online 31 August 2012

In prokaryotes, the rate at which codons are translated varies from one codon to the next. Using a stochastic model of transcription and translation at the nucleotide and codon levels, we investigate the effects of the codon sequence on the dynamics of protein numbers. For sequences generated according to the codon frequencies in Escherichia coli, we find that mean protein numbers at near equilibrium differ with the codon sequence, due to the mean codon translation efficiencies, in particular of the codons at the ribosome binding site region. We find close agreement between these predictions and measurements of protein expression levels as a function of the codon sequence. Next, we investigate the effects of short codon sequences at the start/end of the RNA sequence with linearly increasing/ decreasing translation efficiencies, known as slow ramps. The ramps affect the mean, but not the fluctuations, in proteins numbers by affecting the rate of translation initiation. Finally, we show that slow ramps affect the dynamics of small genetic circuits, namely, switches and clocks. In switches, ramps affect the frequency of switching and bias the robustness of the noisy attractors. In repressilators, ramps alter the robustness of periodicity. We conclude that codon sequences affect the dynamics of gene expression and genetic circuits and, thus, are likely to be under selection regarding both mean codon frequency as well as spatial arrangement along the sequence. & 2012 Elsevier Ltd. All rights reserved.

Keywords: Codon sequence Gene expression dynamics Genetic circuits Ribosome binding site

1. Introduction In prokaryotes, gene expression dynamics has a significant impact on the phenotype. Given the relatively short life time of most of these organisms and the small number of transcription events of most genes (Taniguchi et al., 2010), the moment when genes are transcribed, the mean frequency of transcription and the variability in the intervals between transcription events are relevant in determining how a cell behaves and responds, for example, to environmental changes.

n

Corresponding author. Tel.: þ358 408490736; fax: þ 358 31154989. E-mail addresses: ilya.potapov@tut.fi (I. Potapov), ¨ ¨ olli.yli-harja@tut.fi (O. Yli-Harja), jarno.makela@tut.fi (J. Makel a), andre.ribeiro@tut.fi, andre.sanchesribeiro@tut.fi (A.S. Ribeiro). 0022-5193/$ - see front matter & 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.jtbi.2012.08.029

The kinetics of gene expression is subject to multiple regulatory mechanisms at various stages. Some of these mechanisms and their effects have been studied making use of measurements and models. For example, several studies characterized how the kinetics of transcription initiation affects both mean and fluctuations in RNA and proteins numbers (McClure, 1980; Arkin et al., 1998). The effects of other sequence dependent events and mechanisms on gene expression dynamics still require much study. One example of such a mechanism is transcriptional pausing, which has been suggested to be an important regulator of transcription in both prokaryotes and eukaryotes (Herbert et al., 2006; Landick, 2006; Bender et al., 1987). There are several sequence dependent events in translation elongation whose effects on the kinetics of protein production, and possible regulatory roles, still require much study. In Sørensen and Pedersen (1991) different mutants of the LacZ

18

I. Potapov et al. / Journal of Theoretical Biology 315 (2012) 17–25

protein production of these sequences with those of ‘null models’ with uniform codons translation efficiencies. Finally, we investigate whether codons translation efficiencies affect the kinetics of two small genetic circuits, namely, genetic switches and genetic clocks.

gene were used to study the kinetics of translation of individual codons. The sequences were designed so as to enhance queue formation and traffic in elongation. One sequence corresponded to the wild-type lacZ while the other two differed in that a region of slow-to-translate codons was inserted. The speed of protein chain elongation was measured by subjecting the cells to a pulse of radioactive methionines and measuring the level of radioactivity in cells of each population, every 10 s after the pulse. Each strand contained 23 methionines, spread out unevenly on the DNA sequence, causing the incorporation curve to be non-linear. The study revealed that the translation elongation speed of these strands differed, as the speed of incorporation of an amino acid depends on which synonymous codon is coding for it. These and other measurements allowed the development of realistic kinetic models of transcription at the nucleotide level (Ribeiro et al., 2009; Roussel and Zhu, 2006) and translation at the codon level (Mitarai et al., 2008). Such models are necessary if one is to capture accurately the kinetics of gene expression. In Mitarai et al. (2008) it was shown that measurements of sequence dependent translation rates of synonymous codons cannot be modeled by neither deterministic nor uniform stochastic models. For proper mimicking of the kinetics, the models need to include ¨ explicit translation elongation. Another study (Makel a¨ et al., 2011) showed that the degree of coupling between fluctuations in RNA and protein numbers is sequence dependent and that events in elongation, such as sequence-specific transcriptional pauses, affects the temporal levels of proteins. Here, using a detailed stochastic model of transcription and ¨ translation in bacteria (Makel a¨ et al., 2011), we investigate how the codon sequence of a gene affects the dynamics of expression of proteins. The codon sequences are randomly generated according to the natural codons frequency of occurrence in Escherichia coli. We first study how the protein numbers differ between models with randomly generated codon sequences and the underlying causes. Next, we compare the kinetics of the model with measured mean expression levels of genes differing only in synonymous codon usage (Welch et al., 2009). After, we introduce sequences of codons at the start and at the end of the RNA sequence, known as ‘slow ramps’, as these are found in many genes in E. coli (Tuller et al., 2010). We compare the kinetics of

2. Methods In prokaryotes, gene expression comprises several events that cannot be modeled as common bimolecular chemical reaction since they take a relevant time to be completed once initiated and the chemical species involved exist in very small numbers in the cell. Thus, to model the dynamics of gene expression we use the delayed Stochastic Simulation Algorithm (delayed SSA) (Roussel and Zhu, 2006), which while with stochastic kinetics, it allows delaying the release of products of a reaction in the system following the reactive event. For that, it stores products in a waitlist and releases them after enough time has elapsed. The delayed SSA is based on the SSA, a method for the exact numerical simulation of well-stirred chemically reacting systems (Gillespie, 1977). In both methods, each chemical species is a variable of integer value and time advances in discrete steps. Each time a reaction occurs, the number of molecules of the species involved is updated according to the reaction’s formula. ¨ The model of gene expression used (Makel a¨ et al., 2011) is at the nucleotide and codon level. By modeling transcription and translation elongation at this level it allows starting translation as soon as the ribosome binding site region of the RNA is formed, which was found to be necessary to accurately account for the fluctuations in RNA and protein numbers in bacterial gene ¨ expression (Makel a¨ et al., 2011). Further, it allows including events at each elongation step, such as transcriptional pauses in transcription elongation and back-translocation in translation elongation. The reactions modeling transcription and the stochastic rate constants are shown in Table 1. The model of transcription at the nucleotide level (Ribeiro et al., 2009) includes transcription initiation (1), promoter occupancy time (toc in (1)), and promoter clearance (2) (McClure, 1980). The transcription initiation is

Table 1 Chemical reactions, rate constants (in s  1), and delays (in s) used to model transcription initiation, elongation and termination. Pro—promoter, Rp—RNA polymerase, O—nucleotide occupied by Rp; U—unoccupied nucleotide; A—activated nucleotide; UR —unoccupied ribonucleotide. n denotes the index number of the nucleotide in the sequence. ð2  DP þ 1Þ — range of nucleotides that Rp occupies, DP ¼ 12. Parameter values were obtained from measurements in E. coli, mainly for LacZ. Description 1. Initiation and promoter complex formation

Rates kx (s  1) and delays tx (s)

Reaction kinit

Proþ Rp- Rp  Proðtoc Þ

2. Promoter clearance

Rp  Proþ U½1,ðDP þ 1Þ -O1 þ Pro

3. Activation

On -An

4. Elongation 5. Pausing

km

kinit ¼ 0:0245, toc ¼ 40 7 4 km ¼ 150 ka ¼ 150, n 410; ka ¼ 30, n r 10

ka

km

An þ Un þ DP þ 1 -On þ 1 þ UnDP þ URnDP

km ¼ 150 kp ¼ 0:55, tp ¼ 3

kp

On " O n p 1=tp

6. Pause release by collision 7. Pause by collision 8. Arrests

Onp þ An2DP 1 - On þ An2DP 1

km ¼ 150

0:2km

km ¼ 150

0:8km

On þ An2DP 1 - Onp þ An2DP 1

kar ¼ 0:000278, tar ¼ 100

kar

On " Onar 1=tar

9. Editing

ked ¼ 0:00875, ted ¼ 5

ked

On " Oncorr 1=ted

10. Premature termination 11. Pyrophosphorolysis 12. Completion 13. mRNA degradation

kpre

On -Rp þ U½ðnDP Þ,ðn þ DP Þ kpyr

On þ UnDP 1 þ URnDP 1 - On1 þ Un þ DP 1 kf

Alast -Rp þ U½lastDP ,last þ mRNA kdr

mRNA-|

kpre ¼ 0:00019 kpyr ¼ 0:75 kf ¼ 2 kdr ¼ 0:025

I. Potapov et al. / Journal of Theoretical Biology 315 (2012) 17–25

19

strand, occupying DR nucleotides downstream. The first nucleotide only becomes available for the next ribosome to bind when the previous ribosome moves by at least 2DR þ 2 ¼ 32 nucleotides, thus revealing the first DRþ 1 ¼ 16 nucleotides of the RNA needed for the next ribosome to bind. This length is referred to as ribosome binding site (RBS) region of the RNA. Trans-translation corresponds to the release of the ribosome from the RNA template after stalling, which can occur for several reasons, such as incorporation of an incorrect codon, premature mRNA degradation, or frame-shifting (Moore and Sauer, 2005; Keiler, 2008). In the model, stalling followed by trans-translation can occur spontaneously with a given probability at any codon via (21). When occurring, the mRNA strand is degraded and all translating ribosomes are released. Estimates from the observation of expression activity in E. coli suggest that, on average, 0.4% of translation reactions are terminated by trans-translation (Moore and Sauer, 2005), meaning that the probability of occurrence of this event at each nucleotide depends on the length of the gene. Since we only model sequences 1000 nucleotides long, the trans-translation reaction is always set to the same value, defined by ktt (Table 2). We generate codon sequences randomly, according to the known statistical frequency of each codon (extracted from NCBI GenBank, Dec. 1st, 2011) (Benson et al., 2004). Slow ramps at the start or end of a sequence are set to be 35 codons long (Tuller et al., 2010). The codons activation rates (ktr in (15)) change linearly with the codon position in the ramp, with ktr values ranging from 1 to 35 s  1, varying by 1 s  1 per codon. Finally, we note that we do not consider cell growth or cell divisions in the model. Such processes are likely to be, to some extent, a non-negligible source of transient fluctuations in the numbers of RNA and protein molecules. However, we opted for not including these processes in the simulations for two reasons. First, essential proteins are not likely to be diluted as a cell population grows, as the health of cells of subsequent generations would be severely hampered. As these proteins numbers are (to some extent) kept within tight ranges, it is generally accepted that the rate of production and the rate of division are, to some extent, coupled so as to maintain fairly stable levels in these numbers. Second, cell division, as an external source of variability in RNA and protein numbers, would not influence our conclusions concerning the effects of the codon sequences in mean and noise gene expression levels.

followed by elongation, which includes nucleotide activation (3) (Proshkin et al., 2010) and stepwise elongation (4) (Proshkin et al., 2010). At the end of this process, a complete RNA molecule and the RNA polymerase (Rp) are released in the system (12) (Greive et al., 2008). Several events compete with elongation at each nucleotide, such as transcriptional pauses (5) (Greive and von Hippel, 2005). These can end spontaneously (5) or by collisions with preceding Rp’s (6) (Epshtein and Nudler, 2003). Collisions can also induce pauses (7). There are ubiquitous arrests and release from arrest (8) (Ribeiro et al., 2009), misincorporation and editing (9) (Greive and von Hippel, 2005), premature terminations (10) (Lewin, 2008), and pyrophosphorolysis (11) (Erie et al., 1993). The model accounts for the nucleotides occupied by an Rp on the DNA strand. Finally, two RNA polymerases can never occupy simultaneously the same nucleotide. Reaction (13) models RNA degradation (Yu et al., 2006). The reactions modeling translation and the stochastic rate constants are shown in Table 2. The delayed stochastic model of translation at the codon level includes initiation (14) (Mitarai et al., 2008), and ribonucleotide activation (15) (Wen et al., 2008) followed by stepwise translocation (16–18) (Mitarai et al., 2008). Reactions competing with translocation are back-translocation (19) (Shoji et al., 2009), ribosome drop-off (20) (Jørgensen and Kurland, 1990), and trans-translation (21) (Moore and Sauer, 2005). The stepwise translocation ends with elongation completion (Mitarai et al., 2008) followed by protein folding (22) (Cormack et al., 1996). The model accounts for the ribonucleotides occupied by a ribosome when on the RNA. Also accounted for are codon-specific translation rates (Sørensen and Pedersen, 1991), implying that each codon has a specific translation rate. While, in general, this rate is either ktrA, ktrB or ktrC, in some models we introduce other rates so as to model the slow ramps. Finally, proteins undergo degradation (23) (Andersen et al., 1998). Finally, the time for a ribosome to move from the Ribosome Binding Site (RBS) region to the start codon (Mitarai et al., 2008; Ringquist et al., 1992) is accounted in the kinetic rate of the translation initiation chemical rate constant ktl. Ribosomes on the mRNA strand occupy 31 nucleotides, covering DR ð ¼ 15Þ nucleotides upstream and DR nucleotides downstream from the nucleotide being processed. This region cannot be occupied by other ribosomes. Initiation proceeds as follows. A ribosome binds to the first nucleotide of the growing mRNA

Table 2 Chemical reactions modeling translation and rate constants (in s  1) used to model translation initiation, elongation, and termination, as well as protein folding and R

R

degradation. Rib—ribosome; Rib —ribosome on mRNA strand; ½Rib —number of translating ribosomes on mRNA strand; P—complete protein; OR —ribonucleotide R

R

occupied by Rib; U —unoccupied ribonucleotide; A —activated ribonucleotide. n denotes the index number of the ribonucleotide in the sequence. ð2  DR þ 1Þ—range of ribonucleotides that Rib occupies, DR ¼ 15. Parameter values from measurements in E. coli, mainly for LacZ. Rates kx (s  1) and delays tx (s)

Description

Reaction

14. Initiation

Rib þ UR½1, DR þ 1 -OR1 þ Rib

ktl ¼ 0:53

15. Activation

ktrfA,B,Cg ORn - ARn

Codon dependent: ktrA ¼ 35; ktrB ¼ 8; ktrC ¼ 4:5

16–18. Stepwise translocation

ktm ARn3 þ UR½n þ DR 3,n þ DR 1 -ORn2

ktm ¼ 1000

ktl

R

ktm ORn2 -ORn1 ktm ORn1 -ORn þ UR½nDR 2,nDR 

19. Back-translocation

ORn þ UR½nDR 2,nDR  -ARn3 þ UR½n þ DR 3,n þ DR 1

kbt ¼ 1:5

20. Drop-off

kdrop ORn - Rib þ UR½nDR ,n þ DR 

kdrop ¼ 1:14  104

21. Trans-translation 22. Elongation completion and protein folding 23. Protein degradation

kbt

ktt

ktt ¼ 0:000159

R

mRNA-½Rib   Rib ktlf ARlast -Rib þ UR½lastDR ,last þ Pð fold Þ

t

kdp

P-|

ktlf ¼ 2, tfold ¼ 420 7 100 kdp ¼ 0:0029

20

I. Potapov et al. / Journal of Theoretical Biology 315 (2012) 17–25

3. Results and discussion 3.1. Kinetics of gene expression for codon sequences randomly generated Based on the frequency of occurrence of codons in E. coli (Benson et al., 2004) we generated 30 random sequences, each 1000 nucleotides long. The sequences differ only in the frequency of occurrence of its codons. We performed 200 simulations of the kinetics of gene expression of each sequence, each 105 s long with a sampling frequency of 1 s  1. Reactions and stochastic rate constants are shown in Tables 1 and 2. Since simulations are initialized without mRNA or proteins, in all calculations below we disregard an initial transient of 2  103 s, found sufficient to allow mRNA and proteins to reach numbers near-equilibrium. First, we computed the mean mRNA (E(RNA)) and protein (E(P)) numbers in each simulation. Distributions of the mRNA mean numbers at near equilibrium are shown in Fig. 1(left) for the 30 sequences as well as for a single sequence, chosen at random from the 30 sequences. This allows comparing the variability between simulations due to the stochasticity in the kinetics. Similarly, we show the same distributions for mean protein numbers in Fig. 1(right), to compare the variability in protein numbers between simulations due to stochasticity and differing codon sequences. Fig. 1 (left) shows that there is no significant difference in mRNA kinetics for different codon sequences, as expected. However, the kinetics of protein production in Fig. 1 (right) differs between the 30 sequences causing the overall distribution to be much wider than the distribution of the individual sequences (Fig. 1 (right)). Such differences arise solely from the differing

rates of protein elongation and thus overall production, as the kinetics of transcription and RNA and proteins degradation do not differ between models. The different proteins production rate is due to differing codon sequences. These affect the mean translation elongation time, in particular, how long ribosomes remain, on average, in the RBS region, which affects the mean of the intervals between initiations of translation events. To demonstrate this, in Fig. 2 (left) we show the mean rate of translation events per time unit versus the mean protein numbers for each simulation, for two sequences (arbitrarily named sequences 1 and 2). There is a clear positive correlation between the two quantities, showing that the codon sequence affects the rate of translation initiation events. Having a faster rate of translation initiation events leads to a higher number of premature terminations during translation (ribosome drop-offs) (Fig. 2 (right)). However, this is not sufficient to cancel out the effect of having different rates of initiation on the mean protein numbers. We next analyze the fluctuations in protein numbers. From the formula for the stationary variance in protein abundance (Pedraza and Paulsson, 2008), one expects that differences in mean protein numbers cause the stationary variances to differ as well due to the low-copy noise term (/pS1 ) and because the kinetics of RNA production does not differ between sequences. In Fig. 3 (top) we show the CV2 (square of the coefficient of variation) in protein numbers at near-equilibrium for the 30 sequences, averaged over all simulations for each sequence. Also shown are the distributions of protein numbers from the sequence that exhibited the lowest CV2 (Fig. 3, bottom left) and from the sequence that exhibited the highest CV2 (Fig. 3, bottom right), out of the 30 sequences. The differences in CV2 in protein numbers between sequences are of the order of 5%. Note that this quantity is also affected by

Fig. 1. (Left) Distribution of mean mRNA numbers (E(RNA)) and (Right) mean protein numbers (E(P)) at near-equilibrium from 200 simulations of individuals sequences. Also shown are the distributions resulting from 30 sequences, each of which simulated 200 times.

110

E (P)

105

800

1 sequence (1) 1 sequence (2) Linear fit (1) Linear fit (2) r2 =0.95196

100 95 90 85

r =0.91755 1

80 0.24 0.26 0.28 0.3 0.32 0.34 0.36 No. translation initiation events/s

No. ribosome drop−offs

115

750 700

1 sequence (1) 1 sequence (2) Linear fit (1) Linear fit (2)

650 r =0.58595 2

600 550 500 450

r1 =0.5655

400 0.24 0.26 0.28 0.3 0.32 0.34 0.36 No. translation initiation events/s

Fig. 2. Left: mean number of protein E(P) is positively correlated (with correlation coefficients ri ) with number of translation events. Right: number of ribosome drop-offs is positively correlated with number of translation events. Data shown for the same two codon sequences, denoted as 1 and 2.

I. Potapov et al. / Journal of Theoretical Biology 315 (2012) 17–25

0.2 0.15 0.1 0.05 0 0.0505

0.051

0.0515

0.052

0.0525

Mean CV2(P) 0.25

0.25

0.2

0.2

0.15

0.15

0.1

0.1

0.05

0.05

0 105

110

115 120 No. P

125

0 115

120

125

130

135

No. P

Fig. 3. Distribution of mean CV2(P) of protein numbers for the 30 different codon sequences (top) and distributions of protein numbers(P) for the models that exhibited the lowest (bottom left) and the highest (bottom right) values of CV2(P). All distributions are obtained from the complete time series except for an initial transient of 2  103 s since the networks are initialized without mRNA and proteins.

the stochasticity in protein degradation which tends to diminish differences arising from the kinetics of production. The relevance of these differences is, as noted later, context dependent. While, at the single gene level they are likely not very significant, at a gene network level they may be of significance, e.g., in allowing to cross thresholds in protein numbers due to fluctuations. These results are derived from simulations using models with codon sequences that were randomly generated. From these, we expect that the average codon translation rate of an RNA sequence and, in particular, the codon translation rate of the region following the translation start site, affect the mean rate of protein production. These predictions can be tested by measurements of mean expression levels of genes with the same promoter, coding for the same protein, but differing in synonymous codon usage (Welch et al., 2009). 3.2. Kinetics of gene expression as a function of the codon sequence of a gene Recently, measurements were reported of mean expression levels of genes with the same promoter, coding for the same protein, but differing in synonymous codon usage (Welch et al., 2009). From these and the sequences of the constructs, one can test if the mean expression level of a protein depends on the average codon translation rate of the codon sequence (also referred to as codon bias) and/or the average codon translation rate of the codons coded in the initial region of the RNA. Codon bias has been associated to gene expression level (Varenne et al., 1984) and other phenotypic traits (Bailly-Bechet et al., 2006; Hershberg and Petrov, 2008). There is much debate whether the codon bias affects expression levels. In (Karlin et al., 1998) the codon usages of different classes of genes in E. coli were

analyzed. Some classes of highly expressed genes showed significant deviations from the average codon bias, while others did not. Other studies suggest that the correlation between expression level and average codon bias is strong in several organisms (Gouy and Gautier, 1982; Kanaya et al., 1999; Fuglsang, 2003). However, in Kudla et al. (2009), using a synthetic library of 154 genes differing at synonymous sites while encoding the same protein (GFP) it was concluded that the codon bias did not correlate with expression levels although these varied by 250-fold. In Welch et al. (2009), measurements were reported of protein expression levels in E. coli from genes differing only in synonymous codon usage. From these and the sequences of the genes, it is possible to test if the model is accurate in predicting differences in mean expression levels due to differing synonymous codon usage, as well as the effects of slower mean codon translation rates in the initial region of the RNA. For that, we model and simulate the kinetics of 24 genes encoding a synthetic single-chain antibody fragment (‘scFv’, OncoMed, Redwood City, CA) as the constructs used in Welch et al. (2009). From this, we calculate the Pearson correlation between the reported expression levels and the mean levels from the simulations of each model gene (Fig. 4). This correlation equals to 0.143, indicating that the mean codon bias affects, weakly, the mean expression level of a protein as indicated in the previous section. Next, we test for correlation only for those genes with a slow ramp at the starting region of the RNA (Fig. 4). This region is assumed to be 25 codons long in agreement with genome wide measurements in E. coli, as well as other prokaryotes (Tuller et al., 2010). To select for genes with slow ramps, we selected the sequences whose initial 25 codons mean translation speed is slower than the overall average translation speed for the 24 sequences. We found that the correlation for this set of genes between models and measurements is 0.656, in agreement with the expectation that slow ramps in the initial region of the sequence have a visible effect on the average protein production rate. Given these results, we next explore in greater detail how slow ramps affect the dynamics of single gene expression and of small genetic circuits.

3.3. Kinetics of gene expression in the presence of ramps of slow codons Recent studies revealed universally conserved profiles of RNA translation efficiencies in different classes of organisms including 1 Relative E(P) (experiments)

0.25

21

All sequences

0.8 0.6

r = 0.656

Sequences with ramp Linear fit (ramp) Linear fit (all)

r = 0.143

0.4 0.2 0 0.7

0.75

0.8

0.85

0.9

0.95

1

Relative E(P) (models) Fig. 4. Relative mean protein numbers, E(P), at near-equilibrium of model and measurement for each RNA sequence (solid gray circles). Of these, the ones concerning sequences with a slow ramp at the start are indicated with a black circle around the data point. The sequences differ only in synonymous codons. The E(P) numbers are normalized by the highest value of each set for easier visualization. Also shown is the least square fit to a line for all data points, as well as the least square fit to a line for the data points concerning sequences with a ramp at the start alone. The slope of these two lines corresponds to the Pearson correlation between E(P) numbers of measurements and models.

22

I. Potapov et al. / Journal of Theoretical Biology 315 (2012) 17–25

bacteria (Tuller et al., 2010). In E. coli, many mRNA sequences are such that the first 15–35 codons are, on average, translated with lower efficiency than the rest of the sequence. Some genes also have slow ramps at the end of the sequence. In general, these ramps exhibit an approximately linear increase/decrease in translation speed from beginning to end when at the start/end of the RNA sequence (Tuller et al., 2010). In this section, we study the gene expression kinetics of such sequences. We introduce linearly increasing codon rates for codons in positions 1–35 (counting forward from the translation start site) or linearly decreasing codon rates for codons in positions  35 to  1 (counting backwards from the stop codon). The rest of the sequence is generated randomly as previously. We also generated three sequences with uniform codon activation rates (models ‘‘A’’, ‘‘B’’ and ‘‘C’’), corresponding to the three groups of codon activation rates (ktrA, ktrB, ktrC in Table 2). These are used as ‘‘null models’’ that we compare with those with slow ramps. We simulate each of the five models (the three null models and the two models with a slow ramp at the start and at the end) 200 times, each simulation lasting 105 s with 1 s  1 sampling frequency. Results are shown in Fig. 5. From Fig. 5, the model with a slow ramp at the start of the sequence and null model ‘‘A’’ exhibit the smallest E(P) values, due to their rate of protein production. It is of interest to note that these two models differ significantly from one another in codon translation rates, except in the RBS region. In this region, the two models have slow translation rates. In one of these models, this is due to having codons with slow translation rates throughout the whole sequence. In the other, it is due to the presence of a slow ramp. This supports the hypothesis that the mean rate of protein production is largely controlled by the translation rate of the RBS region of the RNA. On the other hand, the model with a slow ramp at the end and null model ‘‘B’’ have E(P) distributions similar to those of randomly generated codon sequences (see Fig. 1 (right)). From this, we conclude that adding slow ramps in the end of sequences does not affect significantly mean rates of protein production. Interestingly, while the ramps affect E(P), they do not affect the degree of fluctuations in protein numbers (Fig. 6). From Fig. 6, the distributions of values of CV2 of the sequences with a ramp at the start and end are identical (from the same simulations used for Fig. 5). These distributions are also identical to those of the null models (data not shown).

0.3 Null A Null B Null C Start End

0.25 0.2 0.15 0.1 0.05 0 60

70

80

90

100

110

120

130

140

150

160

E (P)

Fig. 5. Distributions of mean protein numbers, E(P), at near-equilibrium of the two models with a slow ramp at the start (‘‘Start’’) and the end (‘‘End’’) and of the three null models with the uniform codons sequences. Null model A (slowest codon activation rate, 4.5 s  1/codon for all codons), null model B (8 s  1/codon for all codons) and, null model C (fastest activation rate, 35 s  1/codon for all codons).

0.35 Start End

0.3 0.25 0.2 0.15 0.1 0.05 0

0.04

0.045

0.05

0.055

0.06

0.065

0.07

0.075

CV2(P) Fig. 6. Distribution of mean CV2 of protein numbers of the models with a slow ramp at the start and at the end of the RNA sequence.

3.4. Effects of ramps of slow codons in small genetic circuits We next study the effect of ramps in the kinetics of small genetic circuits. We first consider a toggle switch (Gardner et al., 2000), which consists of two genes, each of which expressing a protein that represses the expression of the other gene. We compare the kinetics of three model switches. One switch has random codon sequences, generated as previously. Another switch has slow ramps at the start of the sequences of both genes. The last model switch has slow ramps at the end of the sequences of both genes. In all cases, both genes are 1000 nucleotides long. Due to the interactions that define the structure of a genetic switch, usually one protein level is ‘‘high’’ while the other is ‘‘low’’. In the context of stochastic genetic circuits, regions of the state space where the network remains in for long periods of time, yet can leave due to fluctuations in the molecule numbers, are usually referred to as ‘‘noisy attractors’’ (Ribeiro and Kauffman, 2007). The term is used instead of the usual concept of attractor, since stochastic systems do not have attractors. The 2-gene switches usually have two noisy attractors (Ribeiro and Kauffman, 2007). We simulated the dynamics of each of the three model switches 200 times, each simulation 106 s long, sampled every second. From the time series of protein numbers we calculated the stability of the noisy attractors. Stability (S) is defined as T=ðn þ 1Þ, where T is the duration of a simulation in seconds and n is number of switches during T. We define a switch as an event that satisfies the following conditions: the number of molecules of one of the proteins becomes, from one sample to the next, bigger than the number of the other protein, and there is no other switch event within the next 1000 s (so that a single switch between noisy attractors is not counted multiple times). This definition follows the one in Ribeiro (2007). Compared to the model switch without ramps, the slow ramp at the end of the sequences does not alter S significantly due to the weak effects on E(P) numbers. However, the ramp at the start of the sequences causes S to decrease by  25%. Thus, codon sequences can significantly affect the kinetics of small genetic circuits and the location of a slow ramp, relative to the RBS region, is relevant to the degree of change in the dynamics. Finally, we simulated a model switch such that only one gene has a slow ramp, located at the start of the sequence (here named ‘biased switch’). Again, 200 simulations were performed. This model exhibits a stability increase of  400% in comparison to the model without ramps. As such, we conclude that ramps can

I. Potapov et al. / Journal of Theoretical Biology 315 (2012) 17–25

200

No. protein

200

P

P (Start)

P

150

P (No ramp)

150

100

100

50

50

0

23

0

0.5

1

1.5

2

0

0

0.5

1

1.5

2 x 10

x 10 1000

3000 P

No. protein

800

2500

P

2000

P

600

1500 400

1000

200 0

500 0

2

4

6 Time, s

8

10 x 10

0

0

2

4

6 Time, s

8

10 x 10

Fig. 7. Examples of the kinetics of protein numbers of the model genetic circuits. Top left: model toggle switch with codon sequences randomly generated. Top right: model toggle switch with one gene (gene 1) having a slow ramp at the start of the RNA sequence. Bottom left: model repressilator with codon sequences with slow ramps at the start of the RNA sequence of each of the three genes. Bottom right: model repressilator with codon sequences with slow ramps at the end of the RNA sequence of each of the three genes.

bias the switch’s behavior and this bias has a strong effect in the ability of the switch to ‘hold state’. In Fig. 7 we show examples of the kinetics of protein numbers of two model switches. In the top left panel of Fig. 7 the kinetics of the model switch without ramps is shown, while in the top right panel the kinetics of the model switch where one of the two genes has a ramp, placed at the start of the RNA sequence, is shown. Note the bias in the choice of noisy attractor in the latter model. Next, we study the effects of ramps of slow codons in a model genetic Repressilator. The Repressilator consists of three genes, each of which repressing another gene in a negative feedback fashion. This circuit is known to exhibit oscillatory behavior due to the negative feed-back loop with odd number of elements (Elowitz and Leibler, 2000). Again, we compare the dynamics of three models. In one, the sequences were generated randomly as previously. In the other two we introduced slow ramps at the start and at the end of each gene’s sequence, respectively. Each gene consists of 1000 nucleotides and each model was simulated 200 times, for 105 s each with a 1 s sampling interval. In Fig. 7 (bottom panels) we show examples of the kinetics of protein numbers of two model repressilators. In the bottom left panel of Fig. 7 is shown the kinetics of the model repressilator with slow ramps at the start of the RNA sequence of each gene, and in the bottom right panel is shown the kinetics of the model repressilator with slow ramps at the end of the RNA sequence of each gene. Note how the periodicity of the latter model is more robust. The Fast Fourier Transform (Fig. 8) can be used to assess what is the main frequency of oscillation (which corresponds to the main peak) as well as the robustness of the period of oscillation of the proteins numbers. In this case, the robustness can be assessed by the height of the main peak. The higher the height of the main peak, the more times were the oscillations’ duration equal to the mean period. In this sense, a more robust Repressilator is one whose oscillations’ durations are closer to the mean period, and thus are more ‘‘predictable’’. This transform was made on one of the three proteins, and is identical for all proteins. From Fig. 8, the three models have the same main frequency of oscillation

600 No ramp Start

500

End

400 300 200 100 0

0

0.2

0.4

0.6

Frequency, s−1

0.8

1 x

10−3

Fig. 8. Fast Fourier Transform for the model Repressilators without ramps, and with ramps at the start and at end of the genes’ sequences. The transform is made on one of the three Repressilators’ proteins.

(8.4  10  5 s  1) corresponding to a period of  12  103 s, but differ in the robustness of the periodicity, as the heights and widths of the peaks differ. The strongest decrease in the peak’s height is for the model with the slow ramp at the start of the sequence, as expected, due to having lower mean protein numbers. Interestingly, the effects due to adding slow ramps at the end of the sequences are also tangible.

4. Conclusions Using a model of transcription and translation at nucleotide and codon levels with realistic parameters values extracts from measurements in E. coli we studied how the codon sequence affects the kinetics of protein production of individual genes and the dynamics of small genetic circuits. Among the models

24

I. Potapov et al. / Journal of Theoretical Biology 315 (2012) 17–25

simulated we modeled sequences with slow ramps, with positioning and rates of translation according to the conserved profiles of codon translation rates in various groups of living organisms (Tuller et al., 2010). From the studies of the kinetics of single gene expression of genes with codon sequences generated randomly according to the natural probabilities of occurrence in E. coli, we found that the codon sequence affects the mean protein numbers in nearequilibrium significantly, due to affecting the rate of translation initiation (by changing how long ribosomes remain at the RBS region). Interestingly, the noise in protein numbers, as measured by square of coefficient of variation (CV2), does not change significantly with the codon sequence, indicating that altering the codon bias may allow to engineer more strongly expressing genes with weaker fluctuations in protein numbers. To determine whether these results agree with observations, we compared the expression levels between models and measurements. The measurements consist of gene expression levels of a set of genes that have identical promoters and code for identical proteins, but differ in synonymous codon usage (Welch et al., 2009). We modeled the sequence of each of these genes, simulated the expression kinetics, and computed the correlation between their mean expression levels and those reported in Welch et al. (2009). We found weak positive correlation between the models and measurements with respect to the average codon translation rate. We also found a significantly stronger positive correlation for the subset of genes with slower average codon translation rate in the initial region of the RNA, as predicted. Slow ramps were found to influence the kinetics of both single gene expression as well as small genetic circuits. Relevantly, their effect likely depends on their location relative to the RBS region, and both our kinetics studies, as well as sequence analysis studies (Tuller et al., 2010), suggest that selection may have shaped these ramps. The strongest effects are attained by placing a slow ramp at the start of the sequence. In Tuller et al. (2010) it was hypothesized that slow ramps can reduce ribosomal jams by averaging the spacing between ribosomes on the mRNA strand. Our results support the hypothesis of reduction of ribosomal jams, and further indicate that this is achieved mostly by reducing the rate of translation initiation, rather than by averaging the spacing between ribosomes (although this also occurs). At the level of small genetic circuits we found evidence that slow ramps have tangible effects in the kinetics. Again, effects are stronger when slow ramps are placed at the start of the RNA sequences. The present model does not account for some effects that ribosomes may have on the kinetics of transcription elongation. Recent studies (Burmann et al., 2010) suggest that the rate of translation elongation may affect transcription elongation, due to interactions between the ribosome that first binds to the mRNA and the polymerase transcribing it. Such effects may include the release of paused polymerases. We do not exclude the possibility that paused polymerases may, instead, in some cases, force the ribosome to pause momentarily. Altering the rate of translation initiation may also affect the probability of collisions between consecutive polymerases, provided that their elongation kinetics changes in the presence of a ribosome close by. In the future, and provided a more precise knowledge on the consequences of these interactions, these can be included in the model. However, there are indirect effects already visible with the present the model. For example, in the model, having a ribosome close to the polymerase prevents transcriptional backtracking, because the RNA is not available for this event to be possible. This has been observed in measurements (Burmann et al., 2010). In any case, changes in the rate of translation do not necessarily affect the rate of transcription (Zellars and Squires, 1999) and thus, much study is still required before these effects can be included in the model.

The dynamics of genetic circuits is subject to the regulation and influence of a multitude of parameters and features. Extensive research in this area has revealed that events during transcription initiation (e.g. open complex formation Kandhavelu et al., 2011), elongation (e.g. transcriptional pauses Landick, 2006), and translation initiation and elongation (Mitarai et al., 2008), among others, can affect the temporal numbers of RNA and/or of proteins. Our results using a model with parameters values within realistic intervals suggest that the codon sequence affects the dynamics of genetic circuits. However, experimental validation is still required. This validation is likely possible, given the present techniques. For example, to test whether the codon sequence affects the dynamics of genetic switches or clocks one could proceed as follows. Starting from the Repressilator circuit (Elowitz and Leibler, 2000), one could first use site-directed mutagenesis to introduce, in each of the three genes, a sequence that would code for a slow ramp. Provided that the novel circuit is functional, one could then compare its behavior with that of the original Repressilator circuit. A similar approach could also be used to engineer a genetic switch with similar ramps in each gene. By the comparing the oscillatory behavior of the two Repressilators and the switching kinetics of the two switches, it would be possible to assess to what extent these ramps affect the dynamics of small genetic circuits. In conclusion, our results suggest that the codon sequences affect significantly the dynamics of single gene expression and genetic circuits of prokaryotes. These results may be of relevance in synthetic biology, when engineering genetic circuits for specific purposes, and in computational biology, when modeling genetic circuits. Finally, they may assist in a better understanding of the evolutionary pressures that genomes are subject to.

Acknowledgments Work supported by Academy of Finland (ASR), CIMO (IP), and FiDiPro program of Finnish Funding Agency for Technology and Innovation (JM, OYH, and ASR). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. References Andersen, J., Sternberg, C., Poulsen, L., Bjorn, S., Givskov, M., Molin, S., 1998. New unstable variants of green fluorescent protein for studies of transient gene expression in bacteria. Appl. Environ. Microbiol. 64, 2240–2246. Arkin, A., Ross, J., McAdams, H., 1998. Stochastic kinetic analysis of developmental pathway bifurcation in phage l-infected Escherichia coli cells. Genetics 149 (4), 1633–1648. Bailly-Bechet, M., Danchin, A., Iqbal, M., Marsili, M., Vergassola, M., 2006. Codon usage domains over bacterial chromosomes. PLoS Comput. Biol. 2, e37. Bender, T., Thompson, C., Kuehl, W., 1987. Differential expression of c-myb mrna in murine b lymphomas by a block to transcription elongation. Science 237, 1473–1476. Benson, D., KarschMizrachi, I., Lipman, D., Ostell, J., Wheeler, D., 2004. Genbank: update. Nucleic Acids Res. 32, D23–D26. ¨ Burmann, B., Schweimer, K., Luo, X., Wahl, M., Stitt, B., Gottesman, M., Rosch, P., 2010. A nuse: nusg complex links transcription and translation. Science 328, 501–504. Cormack, B., Valdivia, R., Falkow, S., 1996. Facs-optimized mutants of the green fluorescent protein (gfp). Gene 173, 33–38. Elowitz, M., Leibler, S., 2000. A synthetic oscillatory network of transcriptional regulators. Nature 403, 335. Epshtein, V., Nudler, E., 2003. Cooperation between rna polymerase molecules in transcription elongation. Science 300, 801–805. Erie, D., Hajiseyedjavadi, O., Young, M., von Hippel, P., 1993. Multiple rna polymerase conformations and grea: control of the fidelity of transcription. Science 262, 867–873. Fuglsang, A., 2003. Strong associations between gene function and codon usage. APMIS 111, 843–847. Gardner, T., Cantor, C., Collins, J., 2000. Construction of a genetic toggle switch in Escherichia coli. Nature 403, 339–342. Gillespie, D., 1977. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81, 2340–2361.

I. Potapov et al. / Journal of Theoretical Biology 315 (2012) 17–25

Gouy, M., Gautier, C., 1982. Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10, 7055–7074. Greive, S., von Hippel, P., 2005. Thinking quantitatively about transcriptional regulation. Nat. Rev. Mol. Cell. Biol. 6, 221–232. Greive, S., Weitzel, S., Goodarzi, J., Main, L., Pasman, Z., von Hippel, P., 2008. Monitoring rna transcription in real time by using surface plasmon resonance. Proc. Natl. Acad. Sci. 105, 3315–3320. Herbert, K., Porta, A.L., Wong, B., Mooney, R., Neuman, K., Landick, R., Block, S., 2006. Sequence-resolved detection of pausing by single rna polymerase molecules. Cell 125, 1083–1094. Hershberg, R., Petrov, D., 2008. Selection on codon bias. Ann. Rev. Genet. 42, 287–299. Jørgensen, F., Kurland, C., 1990. Processivity errors of gene expression in Escherichia coli. J. Mol. Biol. 215, 511–521. Kanaya, S., Yamada, Y., Kudo, Y., Ikemura, T., 1999. Studies of codon usage and trna genes of 18 unicellular organisms and quantification of bacillus subtilis trnas: gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene 238, 143–155. ¨ ¨ Kandhavelu, M., Mannerstrom, H., Gupta, A., Hakkinen, A., Lloyd-Price, J., Yli-Harja, O., Ribeiro, A., 2011. In vivo kinetics of transcription initiation of the lar promoter in Escherichia coli. Evidence for a sequential mechanism with two rate-limiting steps. BMC Syst. Biol. 5, 149. Karlin, S., Mra´zek, J., Campbell, A., 1998. Codon usages in different gene classes of the Escherichia coli genome. Mol. Microbiol. 29, 1341–1355. Keiler, K., 2008. Biology of trans-translation. Ann. Rev. Microbiol. 62, 133–151. Kudla, G., Murray, A., Tollervey, D., Plotkin, J., 2009. Coding-sequence determinants of gene expression in Escherichia coli. Science 324, 255–258. Landick, R., 2006. The regulatory roles and mechanism of transcriptional pausing. Biochem. Soc. Trans. 34, 1062–1066. Lewin, B., 2008. Genes. Jones and Bartlett Publishers, USA. (Chapter IX), pp. 256–299. ¨ ¨ J., Lloyd-Price, J., Yli-Harja, O., Ribeiro, A., 2011. Stochastic sequence-level Makel a, model of coupled transcription and translation in prokaryotes. BMC Bioinformatics 12, 121. McClure, W., 1980. Rate-limiting steps in rna chain initiation. Proc. Natl. Acad. Sci. 77, 5634–5638. Mitarai, N., Sneppen, K., Pedersen, S., 2008. Ribosome collisions and translation efficiency: optimization by codon usage and mrna destabilization. J. Mol. Biol. 382, 236–245. Moore, S., Sauer, R., 2005. Ribosome rescue: tmrna tagging activity and capacity in Escherichia coli. Mol. Microbiol. 58, 456–466. Pedraza, J., Paulsson, J., 2008. Effects of molecular memory and bursting on fluctuations in gene expression. Science 319, 339–343.

25

Proshkin, S., Rahmouni, A., Mironov, A., Nudler, E., 2010. Cooperation between translating ribosomes and rna polymerase in transcription elongation. Science 328, 504–508. Ribeiro, A., 2007. Effects of coupling strength and space on the dynamics of coupled toggle switches in stochastic gene networks with multiple-delayed reactions. Phys. Rev. E 75, 061903. ¨ Ribeiro, A., Smolander, O., Rajala, T., Hakkinen, A., Yli-Harja, O., 2009. Delayed stochastic model of transcription at the single nucleotide level. J. Comp. Biol. 16, 539–553. Ribeiro, A.S., Kauffman, S.A., 2007. Noisy attractors and ergodic sets in models of gene regulatory networks. J. Theor. Biol. 247, 743–755. Ringquist, S., Shinedling, S., Barrick, D., Green, L., Binkley, J., Stormo, G., Gold, L., 1992. Translation initiation in Escherichia coli: sequences within the ribosomebinding site. Mol. Microbiol. 6, 1219–1229. Roussel, M., Zhu, R., 2006. Validation of an algorithm for delay stochastic simulation of transcription and translation in prokaryotic gene expression. Phys. Biol. 3, 274–284. Shoji, S., Walker, S., Fredrick, K., 2009. Ribosomal translocation: one step closer to the molecular mechanism. ACS Chem. Biol. 4, 93–107. Sørensen, M., Pedersen, S., 1991. Absolute in vivo translation rates of individual codons in Escherichia coli: the two glutamic acid codons gaa and gag are translated with a threefold difference in rate. J. Mol. Biol. 222, 265–280. Taniguchi, Y., Choi, P., Li, G., Chen, H., Babu, M., Hearn, J., Emili, A., Xie, X., 2010. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329, 533–538. Tuller, T., Carmi, A., Vestsigian, K., Navon, S., Dorfan, Y., Zaborske, J., Pan, T., Dahan, O., Furman, I., Pilpel, Y., 2010. An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell 141, 344–354. Varenne, S., Buc, J., Lloubes, R., Lazdunski, C., 1984. Translation is a non-uniform process: effect of trna availability on the rate of elongation of nascent polypeptide chains. J. Mol. Biol. 180, 549–576. Welch, M., Govindarajan, S., Ness, J., Villalobos, A., Gurney, A., Minshull, J., Gustafsson, C., 2009. Design parameters to control synthetic gene expression in Escherichia coli. PLoS ONE 4, e7002. Wen, J., Lancaster, L., Hodges, C., Zeri, A., Yoshimura, S.H., Noller, H.F., Bustamante, C., Tinoco, I., 2008. Following translation by single ribosomes one codon at a time. Nature 452, 598–603. Yu, J., Xiao, J., Ren, X., Lao, K., Xie, S., 2006. Probing gene expression in live cells, one protein molecule at a time. Science 311, 1600–1603. Zellars, M., Squires, C., 1999. Antiterminator-dependent modulation of transcription elongation rates by nusb and nusg. Mol. Microbiol. 32, 1296–1304.