Clumped-MCEM: Inference for multistep transcriptional processes

Clumped-MCEM: Inference for multistep transcriptional processes

Computational Biology and Chemistry 81 (2019) 16–20 Contents lists available at ScienceDirect Computational Biology and Chemistry journal homepage: ...

960KB Sizes 0 Downloads 66 Views

Computational Biology and Chemistry 81 (2019) 16–20

Contents lists available at ScienceDirect

Computational Biology and Chemistry journal homepage: www.elsevier.com/locate/cbac

Research Article

Clumped-MCEM: Inference for multistep transcriptional processes Keerthi S. Shetty , Annappa B ⁎

T

Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India

ARTICLE INFO

ABSTRACT

Keywords: Parameter inference Mass action kinetics Time-series data Multistep promoter model Model reduction

Many biochemical events involve multistep reactions. Among them, an important biological process that involves multistep reaction is the transcriptional process. A widely used approach for simplifying multistep reactions is the delayed reaction method. In this work, we devise a model reduction strategy that represents several OFF states by a single state, accompanied by specifying a time delay for burst frequency. Using this model reduction, we develop Clumped-MCEM which enables simulation and parameter inference. We apply this method to time-series data of endogenous mouse glutaminase promoter, to validate the model assumptions and infer the kinetic parameters. Further, we compare efficiency of Clumped-MCEM with state-of-the-art methods – Bursty MCEM2 and delay Bursty MCEM. Simulation results show that Clumped-MCEM inference is more efficient for time-series data and is able to produce similar numerical accuracy as state-of-the-art methods – Bursty MCEM2 and delay Bursty MCEM in less time. Clumped-MCEM reduces computational cost by 57.58% when compared with Bursty MCEM2 and 32.19% when compared with delay Bursty MCEM.

1. Introduction Gene expression is the process by which the information in a DNA sequence is converted into mRNAs and proteins – plays an essential role in the execution of all cellular functions. The irregularity of this process results in diseases such as cancer, diabetes and neurological disorders (Lee and Young, 2013). In particular, transcriptional bursting in gene expression is not well understood. The accurate characterization of the bursts is very important, as the properties of these bursts have been implicated in disease-related processes such as bacterial phenotype switching (Choi et al., 2008) and HIV activation (Singh et al., 2010). Recent single cells experiments have confirmed bursting in many organisms (Golding et al., 2005; Chubb et al., 2006; Raj et al., 2006; Zenklusen et al., 2008). Although not all genes are transcribed in bursts (Zenklusen et al., 2008), bursting appears predominant in mammals (Suter et al., 2011; Dar et al., 2012; Halpern et al., 2015). Transcriptional bursting has been formalized as a random telegraph model (Peccoud and Ycart, 1995), in which a promoter switches between ON and OFF states. The synthesis of mRNAs only possible in the ON state. Hence, the random telegraph model is seen as an oversimplification of the architecture of most promoters. The assumption of random telegraph model is not valid because it involves multiple kinetic steps in promoter activation. Several experimental studies on promoters shed light on multistep OFF mechanism. The effective number of steps for majority of promoters is considered



greater than two due to the simultaneous regulation by multiple transcription factors, as well as chromatin modifications (Zhang et al., 2012). When a study of the human prolactin gene was conducted, it showed that the distribution of time its promoter spent in an inactive state was inferred to be strongly non-exponential, which in turn indicates towards multiple, sequential OFF states (Harper et al., 2011). In particular, by modeling promoters as an irreversible cycle, endogenous promoters showed five sequential inactive steps, while minimal synthetic promoters exhibited only one (Zoller et al., 2015). Recently, to explore kinetic control- the combinatorial control of gene expression, through regulation of different steps in the transcription cycle, is presented (Scholes et al., 2017). All these experimental facts combined with above analysis motivate us to introduce a more accurate model for the multistep transcriptional bursting processes. Therefore, the aim of this work is to develop a computational method for characterizing transcriptional bursting, given single-cell time-series data. In this work, we develop a model reduction approach that represents several OFF states by a single state, accompanied by specifying a time delay for burst frequency. Our goal is to infer parameters of such a model, consistent with observed data. The parameter inference of discrete-state stochastic systems is often performed by applying the principle of maximum likelihood. We combine Monte Carlo approach of EM (MCEM) (Wei and Tanner, 1990) and Modified Cai's Exact SSA Method (MCEM) and call the resulting approach as Clumped-MCEM. The advantage of using Modified Cai's Exact SSA Method (MCEM) is to

Corresponding author. E-mail address: [email protected] (K.S. Shetty).

https://doi.org/10.1016/j.compbiolchem.2019.107092 Received 16 November 2018; Received in revised form 20 June 2019; Accepted 10 July 2019 Available online 01 August 2019 1476-9271/ © 2019 Elsevier Ltd. All rights reserved.

Computational Biology and Chemistry 81 (2019) 16–20

K.S. Shetty and A. B

give an accurate representation of the multistep bursting model by incorporating delays into the model. To accelerate the proposed method, Cross Entropy (CE) phase (Daigle et al., 2015) is adopted. The CE phase simulates trajectories using Modified Cai's Exact SSA Method (MCEM). We apply this algorithm to time-series data of endogenous mouse glutaminase promoter to infer unknown kinetic rates and calibrate the theoretical model. Simulation results show that Clumped-MCEM inference is more efficient for time-series data and is able to produce similar numerical accuracy as state-of-the-art methods, Bursty MCEM2 (Daigle et al., 2015) and delay Bursty MCEM (Shetty and Annappa, 2018a) in less time. The paper is organized as follows. Section 2 formulates the multistep promoter OFF state bursting model. Section 3 describes the proposed approach for simulation and parameter inference to accommodate multistep transcription bursting model. Section 4 presents results and discussions in applying our algorithm to time-series data of endogenous mouse glutaminase promoter. Finally, Section 5 contains conclusions.

non-exponential. It follows the hypoexponential distribution which approaches an Erlang distribution when switching rates are identical. The representation of the transcriptional bursting Model (2) in terms of (3) requires the generation of inter-burst arrival times. This is achieved by introducing prescribed delays for inter-burst arrival times. The corresponding bursting model is given by

(4) Model (4) can be modeled as a delayed reaction by generating delay time (τd) as an Erlang distribution. There are two points to consider: First, the delay time of the reaction, which is modeled as an Erlang distribution, is the time from the initiation to completion. Second, we focus on delayed and nondelayed nonconsuming reactions. The multistep promoter OFF states of the bursting model is highlighted in red color and reduction of the model to a single state is highlighted in blue color.

2. Model reduction strategy 2.1. Random telegraph model k on

DNA off

k off km

DNA on

3.1. Experimental data

DNA on + mRNA

m

mRNA

3. Methods

DNA on

In this work, a single trajectory of luminescence data collected once every five minutes from the glutaminase promoter (Suter et al., 2011) (Fig. 1C in Suter et al.) is extracted. Data consists of 539 measurements sampled approximately once every five minutes for 43.5 h arbitrary units (Daigle et al., 2015).

(1)

Model (1) (Shetty and Annappa, 2018b) gives representation of random telegraph model using biochemical reactions. The promoter switches from DNAoff to DNAon and DNAon to DNAoff state with rate kon and koff respectively. The mRNA is produced with rate km only in the DNAon state. mRNAs live for an exponentially-distributed time interval with mean lifetime 1/γm, where mRNA degradation rate is given as γm.

3.2. Cai's Exact SSA Method The Stochastic Simulation Algorithm (SSA) (Gillespie, 1977) has been widely used to simulate the stochastic dynamics of biochemical reactions. But, to describe certain biochemical reactions such as transcription and translation accurately delays are employed. Since SSA cannot be used for reactions with delays. The extension of SSA with delays is introduced by Cai Cai (2007). The basic working of Cai's method to compute firing time τ is as follows. To begin with, it has to consider each element of the delayed event queue Tstruct, to evaluate the cdf F of the firing time τ. Following this, the relative completion times of delayed events, in the delayed event queue Tstruct, need to be updated by the simulation.

2.2. Transcriptional bursting model The representation of bursting model based on random telegraph model is given as follows. km

DNA on

DNA on + Bm × mRNA

m

mRNA Bm Geometric(c3)

(2)

The km and Bm denote the burst frequency and burst size in transcriptional bursting model. mRNA bursts arrive at exponentially-distributed time intervals with rate km in this model formulation of (2) (Gillespie, 2007). Each burst produces a geometrically-distributed number of transcripts Bm with the mean value (1 − c3)/c3 (Evans et al., 2000).

3.3. Modified Cai's Exact SSA Method (MCEM) This work proposes the Modified Cai's Exact SSA Method (MCEM) to simplify multistep reactions. The idea for processing delays for Model (4) is as follows. Let R be a nonconsuming delayed reaction with the delay τd, assuming that R is initiated at time t + τ. First, an event is created with time t + τ + τd and stored for later processing. Then the simulation continues processing until time t + τ + τd. At this point, the delay reaction R is retrieved to update the state. This work is based on the fact that none of the delayed reactions are scheduled to complete before the specified time that follows an exponential distribution. Considering that the delayed event occurs after time t + τ, the firing time τ does not change (Thanh et al., 2017). Thus, it is safe to select the a next reaction Rj, with probability j , to initiate at time t + τ. The a0 Modified Cai's Exact SSA Method (MCEM) considers absolute completion times of delayed reactions, instead of relative time.

2.3. Multistep promoter model formulation The realistic representation of multiple, sequential OFF states for Model (1) is shown in Model (3):

DNA off1 DNA offN DNA offN DNA on

k1

DNA off2

kN 1 1

kN

k off

DNA offN

DNA on

DNA off1

(3)

Model (3) differs from Model (1) in the distribution of time spent in OFF states between transcription events. In contrast to (1), it is now 17

Computational Biology and Chemistry 81 (2019) 16–20

K.S. Shetty and A. B

3.4. Discrete state chemical kinetics

simulated trajectories that are consistent with the observed data (in Eq. (9)). We set K to the value that leads to the number of consistent trajectories K′. Simplifying Eq. (9) as in Wilkinson (2006), the MCEM version of maximum likelihood estimates for each reaction is given by

We consider discrete, stochastic chemical kinetic models that assume a well-mixed reactor volume consisting of N molecular species, denoted by Si, for i = 1, …, N. The system state is represented by the Ndimensional random process X(t) = (X1(t), …, XN(t)) at time t, Xi(t) denotes the number of molecular species Si at time t. The firing of M reactions R1, …, RM, evolve through the discrete-valued molecular population numbers. aj(X(t))dt (j = 1, …, M) gives the probability that, reaction Rj will fire in the next infinitesimal time interval [t, t + dt) given X(t) = x with sum a0(X(t)). In this work, we focus on reactions that follow mass action kinetics – i.e. where aj(X(t)) = θjhj(X(t)). Where θj, a kinetic rate constant and hj(X(t)), a function that quantifies the number of possible ways Rj, can occur, given system state x. Note that the time delay is denoted using τ instead of τd for ease of interpretation in these Sections 3.4 and 3.5. The implementation of the Modified Cai's Exact SSA Method (MCEM) provides a numerical procedure for generating system trajectories of the molecular populations from their underlying distribution. It works by selecting the time to the next reaction (τ) and the index of the next reaction (j′) as exponential with mean 1/a0(x) and categorical with probabilities aj(x)/a0(x)(j = 1, …, M) random variables, respectively. Application of the Modified Cai's Exact SSA Method yields a trajectory z ( 1, j1 , …, r , jr ) . Where r is the number of times the jth reaction fires. The likelihood of the complete system trajectory as the function of kinetic parameters θ is given by, r

f (x 0 , z ) = i=1

r ji

h ji (X (t )) × exp

j h j (X j=1

ˆ (n+ 1) = arg max ( [log f (x 0 , z )|y, ˆ (n)])

(n ) [g (z|y , ˆ ) × log f (x 0 , z )] z Z (y )

(t ))

(11) The unknown kinetic parameters of the Model (11) are τd, kon, koff, km. Model (11) is simulated for the different number of promoter OFF states as presented in Tables 2 and 3 . Model (11) includes bursting with the correct parameterization and also assumes random bursts production. The unknown parameters of the model are initialized to 1. But c3 has been initialized to 0.5. We initialize unobserved initial promoter state and number of mRNAs to DNAoff and 20 respectively. We select time delay value ranging from 0.5 to 5 (when present, i.e. denoted as τd) (Table 1). 4. Results and discussion

(6)

In this section, we present empirical results to compare efficiency of Clumped-MCEM approach developed in this paper with Bursty MCEM2 and delay Bursty MCEM. The Clumped-MCEM is able to produce similar numerical accuracy as Bursty MCEM2 and delay Bursty MCEM in less time.

(7)

] is the expectation operator w.r.to the conditional where [·|y, distribution of z given y and θ(n). Z(y) is the set of all valid trajectories (n ) that are consistent with y. g [z|y , ˆ ] denotes the unknown conditional density of z. An explicit evaluation of the summation is intractable in Eq. (7), instead, we combine Monte Carlo approach of Expectation Maximization (MCEM) and Modified Cai's Exact SSA Method (MCEM) to generate (n + 1) reaction trajectories ˆ :

arg max

[I (z kn

Z (y )) × log f (x 0 , z kn)]

k=1

ˆ (n+ 1) = arg max

K k =1

log(f (x 0 , z kn ))

(10)

(5)

ˆ (n )

ˆ (n+ 1)

n ik )

a jk ×

3.6. Clumped-MCEM inference using time-series data

Single-cell time-series data is incomplete as it provides the number of molecules for a species at d discrete time instances. The observed data is represented as y (x 0 , x1 , …, xd) , where x i denotes the numbers of molecules of a subset of the N species at some time point ti. The Expectation Maximization (Dempster et al., 1977), given an incomplete (0) data, is an algorithm to calculate maximum likelihood. Given ˆ , this algorithm is based on iterative computation (Robert and Casella, 2004):

K

(

n r jk

To accelerate MCEM version of maximum likelihood estimate computation, Cross Entropy (CE) phase (Daigle et al., 2015) is adopted. The CE phase simulates trajectories (K′ in Eq. (10)) using Modified Cai's Exact SSA Method (MCEM). The CE phase parameter estimates are used as input parameters to ascent-based MCEM algorithm (Caffo et al., 2005) (K′ replaced by K″ in Eq. (10)). In this work, for the delayed nonconsuming reactions hj(X(t)) is set to 1.

3.5. Clumped-MCEM

ˆ (n+ 1) = arg max

K k =1

K k =1 r kn i=0

M i

i=1

ˆj (n+ 1) =

4.1. Model selection To compare the complexity of different models Akaike Information Criterion (AIC) (Akaike, 1974) is used. It gives the lower value for models which best fits observed experimental data. It is given by

AIC = 2m

2 log(Lˆ )

(12)

Table 1 Reactions defining Model (11).

(8)

(9)

where z kn is the kth Modified Cai's Exact SSA Method (MCEM) trajectory n simulated using the parameter vector ˆ . I (z kn Z (y )) is an indicator n function taking a value of 1 if z k is consistent with y(otherwise 0). K is the total number of simulated trajectories. k′ indexes only the K′ 18

Reaction

Rate constant

Interpretation

DNAoff ⟶ DNAon DNAon ⟶ DNAoff DNAon ⟶ DNAon + mRNA mRNA ⟶ ϕ mRNA ⟶ mRNA + Protein Protein ⟶ ϕ

kon, τd koff km 0.924 (Suter et al., 2011) 12.6 (Molina et al., 2013) 1.98 (Suter et al., 2011)

Promoter activation Promoter inactivation Transcription mRNA degradation translation Protein degradation

Computational Biology and Chemistry 81 (2019) 16–20

K.S. Shetty and A. B

Table 2 Clumped-MCEM parameter inference using glutaminase promoter time-series data. No.of States

τd

kon

koff

km

km/koff

AIC

1 2 3 4 5 6 7 9 10 12 15 20

0 0.5 0.75 1.10 1.40 1.65 2.10 3.40 4.00 4.10 4.75 5.00

2.12 2.89 3.29 4.28 4.88 6.70 8.81 20.30 30.66 27.38 35.76 46.37

5.18 3.95 4.07 3.57 3.85 4.17 3.46 3.43 3.07 3.59 3.26 3.22

73.89 70.85 73.79 71.21 75.56 77.01 69.11 68.81 63.90 71.46 67.62 66.02

14.26 17.93 18.13 19.94 19.62 18.46 19.97 20.06 20.81 19.90 20.74 20.50

3482.94 3474.89 3475.89 3474.63 3467.75 3469.72 3466.52 3464.91 3464.47 3463.7 3460.27 3463.98

Table 5 Execution times for the multistep promoter model using time-series data.

τd

kon

koff

km

km/koff

AIC

1 2 3 4 5 6 7 9 10 12 15 20

0 0.5 0.75 1.10 1.40 1.65 2.10 3.40 4.00 4.10 4.75 5.00

1.95 3.13 3.23 4.27 5.10 6.35 9.41 19.69 25.55 27.78 41.64 45.91

4.36 3.73 3.44 3.78 3.75 3.71 3.28 3.55 3.45 3.31 3.15 3.36

69.35 66.06 70.46 73.13 72.33 72.30 66.41 70.38 69.99 67.40 63.36 68.82

15.90 17.71 20.48 19.34 19.28 19.48 20.24 19.82 20.28 20.36 20.11 20.48

3484.46 3476.68 3473.48 3470.04 3468.99 3467.60 3465.89 3464.66 3464.87 3465.91 3461.25 3464.06

No. of trajectories in CE phase

No. of trajectories in MCEM phase

Clumped-MCEM delay Bursty MCEM Bursty MCEM2

10,000 – 10,000

1500 2500 2500

CPU (h)

15 15 15

Clumped-MCEM delay Bursty MCEM Bursty MCEM2

81.6 120.34 192.36

5. Conclusions In this work, we present a model reduction approach that represents several promoter OFF states by a single state, accompanied by specifying a time delay for burst frequency. To model multistep reactions, first, we create Clumped-MCEM. Application of this algorithm, to timeseries data of endogenous mouse glutaminase promoter, validates the model assumptions and infer the kinetic parameters. Using this above method, we can reduce computational cost. For instance, modeling a promoter switching between five OFF states and single ON state requires six switching parameters and simulation of six reactions per transcription process. But lumping multistep reactions, using time delays in transcriptional bursting model, reduce the number of unknown parameters that should be inferred from experimental data, as well as computational cost. This work is based on the assumption that switching rates are identical in the multistep reactions. In addition, the proposed approach is based on the mass action kinetics. However, there is scope to extend this research work for other types of multistep reactions. In this work, we present a method to reduce the model complexity that is involved in modeling multistep reactions, along with a computational technique for inferring unknown kinetic rates. We apply this method to the time-series data of endogenous mouse glutaminase promoter. Simulation results show that Clumped-MCEM inference is able to produce similar numerical accuracy as Bursty MCEM2 and delay Bursty MCEM, in less time, given experimentally observed time-series data. This can open new perspectives in gene expression models, where often researchers have to balance the accuracy of their parameter inference and simulations with the need of considering complex models.

Table 4 Initial number of trajectories simulated for the glutaminase data. Method

Method

MCEM and Bursty MCEM2 (Daigle et al., 2015)/delay Bursty MCEM (Shetty and Annappa, 2018a) respectively. The comparison of these approaches is made in terms of efficiency. We ran our numerical experiments on 8 core Intel Xeon CPU (E52650 @ 26GHz) with 64 GB memory. Table 4 shows an initial number of trajectories simulated for each method using glutaminase promoter time-series data. The total simulation time in Table 5 shows that the Bursty MCEM2 takes ≈8 days to obtain an estimate of similar accuracy as the delay Bursty MCEM estimate, which takes ≈5 days. The total simulation time for ClumpedMCEM in Table 5 shows that it takes ≈3.5 days to obtain similar accuracy as delay Bursty MCEM and Bursty MCEM2. Clumped-MCEM reduces computational cost by 57.58% and 32.19% as compared to Bursty MCEM2 and delay Bursty MCEM respectively.

Table 3 Bursty MCEM2/delay Bursty MCEM parameter inference using glutaminase promoter time-series data. No.of States

No.of States

where m denotes the number of unknown parameters from the model and Lˆ denotes the likelihood estimate. Tables 2 and 3 shows km/koff and AIC score for various values of τd and No.of States using experimentally observed values in the glutaminase dataset. The accuracy of the inference with the experimental data has been discussed in our earlier work (Shetty and Annappa, 2018a). The goal of this paper is to get the same accuracy as Shetty and Annappa (2018a) in less time. From these rows of Tables 2 and 3, we find that setting τd = 4.75 and No.of States = 15 gives the best fit when compared to experimentally observed time-series data (with mRNAs = 20) with the lowest value of AIC. The simulation results supports that model with the larger number of OFF states (15 in our case) is able to better explain the experimentally observed values during transcriptional bursting. When τd is set to 0, it reduces to one OFF state model (random telegraph model). From these rows of Tables 2 and 3, we find that the one OFF state model fails to produce bursts consistent with data (mRNAs = 20 in our case). Results in bold text describe the models that best fits the data.

Appendix A. Supplementary data Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.compbiolchem.2019. 107092. References Akaike, H., 1974. A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723. Caffo, B.S., Jank, W., Jones, G.L., 2005. Ascent-based Monte Carlo expectation-maximization. Ser. B Stat. Methodol. 67, 235–251. https://doi.org/10.1111/j.1467-9868. 2005.00499.x. Cai, X., 2007. Exact stochastic simulation of coupled chemical reactions with delays. J. Phys. Chem. 126. https://doi.org/10.1063/1.2710253.

4.2. Comparison with the literature The results presented in Tables 2 and 3 is produced using Clumped19

Computational Biology and Chemistry 81 (2019) 16–20

K.S. Shetty and A. B Choi, P.J., Cai, L., Frieda, K., Xie, X.S., 2008. A stochastic single-molecule event triggers phenotype switching of a bacterial cell. Science 322, 442–446. https://doi.org/10. 1126/science.1161427. Chubb, J.R., Trcek, T., Shenoy, S.M., Singer, R.H., 2006. Transcriptional pulsing of a developmental gene. Curr. Biol. 16, 1018–1025. https://doi.org/10.1016/j.cub.2006. 03.092. Daigle, B.J., Soltani, M., Petzold, L.R., Singh, A., 2015. Inferring single-cell gene expression mechanisms using stochastic simulation. Bioinformatics 31, 1428–1435. https://doi.org/10.1093/bioinformatics/btv007. Dar, R.D., Razooky, B.S., Singh, A., Trimeloni, T.V., McCollum, J.M., Cox, C.D., et al., 2012. Transcriptional burst frequency and burst size are equally modulated across the human genome. Proc. Natl. Acad. Sci. U.S.A. 109https://doi.org/10.1073/pnas. 1213530109. 17454-9. Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Methodol. 39, 1–38. Evans, M., Hastings, N., Peacock, B., 2000. Statistical Distributions, 3rd ed. Wiley. Gillespie, D.T., 1977. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81, 2340–2361. https://doi.org/10.1021/j100540a008. Gillespie, D.T., 2007. Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem. 58, 35–55. https://doi.org/10.1146/annurev.physchem.58.032806.104637. Golding, I., Paulsson, J., Zawilski, S.M., Cox, E.C., 2005. Real-time kinetics of gene activity in individual bacteria. Cell 123, 1025–1036. https://doi.org/10.1016/j.cell. 2005.09.031. Halpern, K.B., Tanami, S., Landen, S., Chapal, M., Szlak, L., Hutzler, A., et al., 2015. Bursty gene expression in the intact mammalian liver. Mol. Cell 58, 147–156. https:// doi.org/10.1016/j.molcel.2015.01.027. Harper, C.V., Finkenstadt, B., Woodcock, D.J., et al., 2011. Dynamic analysis of stochastic transcription cycles. PLoS Biol. 9, 1–14. https://doi.org/10.1371/journal.pbio. 1000607. Lee, T.I., Young, R.A., 2013. Transcriptional regulation and its misregulation in disease. Cell 152, 1237–1251. https://doi.org/10.1016/j.cell.2013.02.014. Molina, N., Suter, D.M., Cannavo, R., Zoller, B., Gotic, I., Naef, F., 2013. Stimulus-induced modulation of transcriptional bursting in a single mammalian gene. Proc. Natl. Acad. Sci. U.S.A. 110, 20563–20568. https://doi.org/10.1073/pnas.1312310110. Peccoud, J., Ycart, B., 1995. Markovian modeling of gene-product synthesis. Theoret. Popul. Biol. 48, 222–234. https://doi.org/10.1006/tpbi.1995.1027. Raj, A., Peskin, C.S., Tranchina, D., Vargas, D.Y., Tyagi, S., 2006. Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 4, e309. https://doi.org/10.1371/journal.

pbio.0040309. Robert, C.P., Casella, G., 2004. Monte Carlo Statistical Methods, 2nd ed. Springer, New York. Scholes, C., DePace, A.H., Sanchez, A., 2017. Combinatorial gene regulation through kinetic control of the transcription cycle. Cell Syst. 4, 97–108. https://doi.org/10. 1016/j.cels.2016.11.012. Shetty, K.S., Annappa, 2018a. Transcriptional processes: models and inference. J. Bioinform. Comput. Biol. 16. https://doi.org/10.1142/S0219720018500233. Shetty, K.S., Annappa, et al., 2018b. Inferring transcriptional dynamics with time-dependent reaction rates using stochastic simulation. In: In: Sa, P.K. (Ed.), Recent Findings in Intelligent Computing Techniques, Advances in Intelligent Systems and Computing 708 Springer, Singapore. https://doi.org/10.1007/978-981-10-86366_58. Singh, A., Razooky, B., Cox, C.D., Simpson, M.L., Weinberger, L.S., 2010. Transcriptional bursting from the HIV-1 promoter is a significant source of stochastic noise in HIV-1 gene expression. Biophys. J. 98, L32–L34. https://doi.org/10.1016/j.bpj.2010.03. 001. Suter, D.M., Molina, N., Gatfield, D., Schneider, K., Schibler, U., Naef, F., 2011. Mammalian genes are transcribed with widely different bursting kinetics. Science 332, 472–474. https://doi.org/10.1126/science.1198817. Thanh, V.H., Zunino, R., Priami, C., 2017. Efficient stochastic simulation of biochemical reactions with noise and delays. J. Chem. Phys. 146. https://doi.org/10.1063/1. 4976703. Wei, G., Tanner, M., 1990. A Monte-Carlo implementation of the emalgorithm and the poor man's data augmentation algorithms. J. Am. Stat. Assoc. 85, 699–704. https:// doi.org/10.2307/2290005. Wilkinson, D.J., 2006. Stochastic modelling for systems biology. Taylor and Francis: Chapman and Hall/CRC Mathematical and Computational Biology Series, Boca Raton. Zenklusen, D., Larson, D.R., Singer, R.H., 2008. Single-RNA counting reveals alternative modes of gene expression in yeast. Nat. Struct. Mol. Biol. 15, 1263–1271. https://doi. org/10.1038/nsmb.1514. Zhang, J., Chen, L., Zhou, T., 2012. Analytical distribution and tunability of noise in a model of promoter progress. Biophys. J. 102, 1247–1257. https://doi.org/10.1016/j. bpj.2012.02.001. Zoller, B., Nicolas, D., Molina, N., Naef, F., 2015. Structure of silent transcription intervals and noise characteristics of mammalian genes. Mol. Syst. Biol. 11. https://doi.org/10. 15252/msb.20156257.

20