Next-generation sequencing-based analysis of reverse transcriptase fidelity

Accepted Manuscript

Kiyoshi Yasukawa, Kei Iida, Hiroyuki Okano, Ryota Hidese, Misato Baba, Itaru Yanagihara, Kenji Kojima, Teisuke Takita, Shinsuke Fujiwara

PII: S0006-291X(17)31536-X
DOI: 10.1016/j.bbrc.2017.07.169
Reference: YBBRC 38275
To appear in: Biochemical and Biophysical Research Communications
Received Date: 20 July 2017
Accepted Date: 31 July 2017

Please cite this article as: K. Yasukawa, K. Iida, H. Okano, R. Hidese, M. Baba, I. Yanagihara, K. Kojima, T. Takita, S. Fujiwara, Next-generation sequencing-based analysis of reverse transcriptase fidelity, Biochemical and Biophysical Research Communications (2017), doi: 10.1016/j.bbrc.2017.07.169.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Kiyoshi Yasukawa (a,*), Kei Iida (b), Hiroyuki Okano (a), Ryota Hidese (c), Misato Baba (a), Itaru Yanagihara (d), Kenji Kojima (a), Teisuke Takita (a), and Shinsuke Fujiwara (c)

(a) Division of Food Science and Biotechnology, Graduate School of Agriculture, Kyoto University, Kitashirakawa Sakyo-ku, Kyoto 606-8502, Japan

(b) Medical Research Support Center, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan

(c) Department of Bioscience, School of Science and Technology, Kwansei-Gakuin University, 2-1 Gakuen, Sanda, Hyogo 669-1337, Japan

(d) Department of Developmental Medicine, Research Institute, Osaka Women’s and Children’s Hospital, Osaka 594-1101, Japan

*Corresponding author. Fax: +81-75-753-6265. E-mail address: [email protected] (K. Yasukawa)

Abbreviations: RT, reverse transcriptase; NGS, next-generation sequencing; MMLV, Moloney murine leukemia virus; HIV-1, human immunodeficiency virus type 1; AMV, avian myeloblastosis virus.

In this study, we devised a simple and rapid method to analyze the fidelity of reverse transcriptase (RT) using next-generation sequencing (NGS). The method comprises a cDNA synthesis reaction from standard RNA with a primer containing a tag of 14 randomized bases and the RT to be tested, PCR using a high-fidelity DNA polymerase, and NGS. Mutations were identified by comparing the sequence of each read with the reference sequence. Each mutation can be attributed to an error introduced during cDNA synthesis, PCR, or NGS based on whether the sequence reads with the same tag contain the same mutation. The error rates in cDNA synthesis with the thermostable Moloney murine leukemia virus (MMLV) RT variant MM4 or with RTX, the recently developed 16-tuple variant of family B DNA polymerase with RT activity from Thermococcus kodakarensis, were 0.75−1.0 × 10⁻⁴ errors/base, while that in the reaction with the wild-type human immunodeficiency virus type 1 (HIV-1) RT was 2.6 × 10⁻⁴ errors/base. Overall, our method could precisely evaluate the fidelity of various RTs under different reaction conditions in a high-throughput manner without the use of expensive optics and troublesome adaptor ligation.

1. Introduction

Reverse transcriptase (RT) [EC 2.7.7.49] is the enzyme responsible for viral genome replication. It has RNA- and DNA-dependent DNA polymerase as well as RNase H activities. In cDNA synthesis, RTs from Moloney murine leukemia virus (MMLV) and avian myeloblastosis virus (AMV) are extensively used because they have high fidelity [1]. On the other hand, RT from human immunodeficiency virus type 1 (HIV-1) is rarely used for this purpose because it has lower fidelity than MMLV RT or AMV RT [2, 3].

Various methods have been used to analyze the fidelity of DNA polymerase, including misincorporation, misextension, primer extension, and M13 lacZ mutation assays. In a misincorporation assay, the reaction rate for the incorporation of incorrect nucleotides is compared with that of correct ones. In a misextension assay, the reaction rate for the extension from a mispaired end (e.g., G:T) is compared with that from a correctly paired end (e.g., G:C). In these two assays, the reactions are performed under single-turnover conditions. In a primer extension assay, a primer extension reaction in the absence of one dNTP is compared with that in the presence of all four dNTPs [4]. In an M13 lacZ mutation assay, mutation frequency is determined as the ratio of mutant plaques to all plaques, from which the error rates are calculated [5]. By using this assay, the error rates of MMLV RT and AMV RT were reported to be in the range of 3.3−5.9 × 10⁻⁴ errors/base and that of HIV-1 RT was 5.9 × 10⁻³ errors/base. One problem with this method is that the identification of plaque color depends considerably on the person performing it.

Next-generation sequencing (NGS) is widely used in basic research and clinical medicine. Because hundreds of millions of sequences are obtained in one run, NGS is an attractive tool for identifying ultra-rare mutations in genomic DNA [6] and ultra-rare misincorporations or base modifications introduced during DNA synthesis [7]. However, a number of errors are introduced during NGS, which makes this application problematic. To circumvent this, Schmitt et al. devised a method to identify ultra-rare mutations in the genome [8]: DNA fragments containing the sequences to be analyzed were ligated with adaptors containing two tags of 12 randomized bases, and each mutation observed via NGS was classified as either already present in the genome or incorporated by PCR or NGS, by analyzing whether all sequence reads with the same tag sequences and orientations had the same mutation [8]. In this study, we present a simple and rapid method to identify mutations in cDNA fragments introduced during cDNA synthesis. By using this method, we obtained the error rates for the cDNA synthesis reactions using various RTs to evaluate their fidelities.

2. Materials and methods

2.1. Standard RNA

Standard RNA, a 419-nucleotide RNA corresponding to DNA sequence 15,112–15,530 of the cesD gene of Bacillus cereus (GenBank accession number NC010924), was prepared by in vitro transcription [9, 10]. The concentration of purified RNA was determined spectrophotometrically at A260, adjusted to 10 to 10⁶ copies/µL with RNase-free water, and stored at −80°C until use.

2.2. Recombinant enzymes

Wild-type HIV-1 RT [11], the thermostable MMLV RT variant MM4 (E286R/E302K/L435R/D524A) [12], the family A DNA polymerase variant with RT activity, K4polL329A, from the thermophilic Thermotoga petrophila K4 [13], the DNA/RNA helicase Tk-EshA from the hyperthermophilic archaeon Thermococcus kodakarensis [14], and the family B DNA polymerase variant with RT activity, RTX, from T. kodakarensis [15] were prepared as described previously. The enzyme concentration was determined by the method of Bradford using Protein Assay CBB Solution (Nacalai Tesque, Kyoto, Japan) with bovine serum albumin (Nacalai Tesque) as the standard.

2.3. cDNA synthesis

cDNA synthesis reactions (20 µL) were performed with (i) 10 nM HIV-1 RT, (ii) 10 nM MM4, (iii) 10 nM RTX, or (iv) 10 nM MM4, 50 nM K4polL329A, and 10 nM Tk-EshA in the presence of 1 × 10⁶ copies/µL cesD RNA, 5 mM MgCl2, 0 mM ((i)−(iii)) or 1 mM (iv) Mn(OCOCH3)2, 50 mM Bicine-KOH buffer (pH 8.2), 20 mM Tris-HCl buffer (pH 8.3), 10 mM KCl, 100 mM CH3COOK, 0.2 mM dNTP, 0 mM ((i)−(iii)) or 1 mM (iv) ATP, 0.5 µM IonP-CesD-Rev3 primer, 10 µg/mL E. coli RNA, and 100 mM trehalose at 50°C for 30 min. After the reaction, the reaction solution was incubated at 65°C for 5 min.

2.4. NGS Sequencing and data processing

The reaction mixture for PCR (25 µL) was prepared by mixing water (15.5 µL), the 50-fold diluted product of the cDNA synthesis reaction (1.5 µL), 10× PCR buffer (100 mM Tris–HCl buffer (pH 8.3) containing 500 mM KCl and 15 mM MgCl2) (2.5 µL), 25 mM MgSO4 (1.5 µL), 10 µM IonP-CesD-For3 primer (0.5 µL), 10 µM IonP-Rev3 primer (0.5 µL), 2.0 mM dNTP (2.5 µL), and 1 U/µL recombinant KOD-Plus-Neo (0.5 µL) (Toyobo, Osaka, Japan). PCR was performed in a 0.2 mL PCR tube for 40 cycles of 30 s at 95°C, 30 s at 62°C, and 30 s at 72°C. The amplified products were purified with MagExtractor™ -PCR & Gel Clean up- (Toyobo, Osaka, Japan).

NGS was performed with the Ion Proton Sequencer (Thermo Fisher Scientific, Waltham, MA). Sequence reads with the correct signatures and barcode sequences of the correct lengths were selected from all sequence reads by clipping the nucleotide sequences derived from cesD mRNA with the fastx_clipper program in FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) using the following options: -c, -v, -M25, and "-a CACCAAAGAGGTACGGTCTAATGGTCTTGT". The initial 70-base sequences adjacent to the barcode sequences, containing the 21-base primer-derived sequence and the 49-base cDNA synthesis reaction-derived sequence, were grouped using Perl scripts as follows: when the biggest sequence group had more than 5 reads and accounted for 80% of the barcode group, its sequence was treated as the representative sequence for the barcode group. Each representative sequence was aligned to the reference cesD sequence with the Needle program in the EMBOSS package [16]. After the alignment, substitutions, insertions, and deletions were counted.
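The grouping rule described above can be illustrated with a minimal Python sketch. This is not the authors' Perl script; the function name `representative_sequence`, the constants, and the example reads are hypothetical. It assumes that "more than 5 reads" means strictly greater than 5 and that "accounts for 80%" means at least 80% of the reads in the barcode group.

```python
from collections import Counter

MIN_READS = 5        # assumption: "more than 5 reads" is read literally as > 5
MIN_FRACTION = 0.80  # assumption: "accounts for 80%" is read as >= 80%


def representative_sequence(reads):
    """Return the dominant sequence of one barcode group, or None.

    `reads` is the list of 70-base sequences that share one barcode.
    """
    if not reads:
        return None
    # Most frequent sequence in the group and its read count.
    sequence, count = Counter(reads).most_common(1)[0]
    if count > MIN_READS and count / len(reads) >= MIN_FRACTION:
        return sequence
    return None


# Hypothetical barcode group: 9 identical reads plus 1 read with a sequencing error.
group = ["ACGTACGT"] * 9 + ["ACGTACGA"]
print(representative_sequence(group))  # ACGTACGT
```

Groups that return None would simply be skipped before the alignment step.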

3. Results and Discussion

One of the disadvantages of NGS for evaluating error rates in cDNA synthesis reactions is that the error rate of NGS can be as high as 1% [17], much higher than that of the cDNA synthesis reaction. This disadvantage is more apparent for the Thermo Fisher Scientific Ion Proton Sequencer than for the Illumina MiSeq Sequencer [17]. However, the advantage of the Ion Proton Sequencer is that it does not require expensive optics and troublesome adaptor ligation, thus resulting in high throughput and low cost. Against this background, we devised a new method using the Ion Proton Sequencer.

Figure 1A shows the workflow to generate products for NGS. The cDNA was synthesized with RT using a primer containing sequences for the Ion Proton sequencing adaptor α, a five-base key nucleotide ATCGA, and a 14-base randomized barcode. The individually labeled cDNAs were amplified by PCR with a high-fidelity thermostable DNA polymerase using a pair of primers containing the Ion Proton sequencing adaptor α and β sequences, respectively. The PCR products were then subjected to NGS with the Ion Proton Sequencer. Figure 1B shows the analysis of sequencing reads. Sequencing reads containing the same barcode sequences detected in five or more individual reads were selected, grouped according to barcode, and used for analysis, while those containing the same barcode sequences detected in 1−4 individual reads were discarded. Errors present in all sequence reads from one group were regarded as those introduced by cDNA synthesis, whereas errors present in one or some, but not all, sequencing reads were regarded as those introduced by PCR or NGS.
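The classification logic of Fig. 1B can be sketched in a few lines of Python. This is an illustrative simplification, not the pipeline used in the study: the function `classify_errors` and the example sequences are hypothetical, and the sketch assumes gap-free reads of the same length as the reference, so it handles substitutions only.

```python
def classify_errors(reads, reference, min_reads=5):
    """Classify mismatches within one barcode group.

    Returns (cdna_positions, pcr_or_ngs_positions) as lists of 0-based
    positions, or None when the group has fewer than `min_reads` reads
    and is therefore discarded.
    """
    if len(reads) < min_reads:
        return None
    cdna_positions, pcr_or_ngs_positions = [], []
    for pos, ref_base in enumerate(reference):
        observed = {read[pos] for read in reads}
        if observed == {ref_base}:
            continue  # every read matches the reference here
        if len(observed) == 1:
            # All reads carry the same non-reference base:
            # attributed to cDNA synthesis.
            cdna_positions.append(pos)
        else:
            # Only some reads differ: attributed to PCR or NGS.
            pcr_or_ngs_positions.append(pos)
    return cdna_positions, pcr_or_ngs_positions


# Hypothetical group: all five reads share G>T at position 2,
# and one read has an extra C>G at position 5.
reads = ["ACTTAC"] * 4 + ["ACTTAG"]
print(classify_errors(reads, "ACGTAC"))  # ([2], [5])
```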

We used the standard RNA and primers shown in Fig. 2A because we have previously used this RNA as a model for optimizing a novel one-step [10] or two-step [9] RT-PCR using the genetically engineered thermostable variant of MMLV RT, MM4 (E286R/E302K/L435R/D524A), the genetically engineered variant of family A DNA polymerase with RT activity, K4polL329A (L329A), from the hyperthermophilic bacterium Thermotoga petrophila K4, and the euryarchaeota-specific DNA/RNA helicase Tk-EshA from the hyperthermophilic archaeon Thermococcus kodakarensis. The cDNA synthesis reactions were performed using a combination of these three enzymes and the IonP-cesD-Rev3 primer, and the PCR was performed using a commercial high-fidelity DNA polymerase and the IonP-cesD-For3 and IonP-Rev3 primers. Figure 2B shows the predicted nucleotide sequences of the PCR product. Conventional nucleotide sequencing of the PCR product revealed that all sequences coincided with those shown in Fig. 2B (data not shown) and that the 14-base randomized tag had no nucleotide bias (Fig. 2C).

We first investigated the reproducibility of NGS. The numbers of total sequence reads were 79,955,703 in Experiment (Exp.) 1 and 98,725,285 in Exp. 2, and the average length of the sequence reads was 161 in both Exp. 1 and Exp. 2. Figure S1 shows the distribution of the number of sequencing reads with the same barcode in Exp. 1. Except for groups in which the number of sequencing reads with the same barcode was less than 50, the distribution exhibited a bell-shaped profile with a maximum at 750−849 reads with the same barcode. Sequencing reads containing the same barcode sequences detected in five or more individual reads were selected and grouped according to barcode, and the 49-base sequences (nucleotide number 234−282 in Fig. 2) were used for analysis. Figure S2 shows a comparison of the error rates at each position in the cDNA from Exp. 1 and Exp. 2. The distribution profiles of errors in Exp. 1 and Exp. 2 were highly similar, with a regression coefficient of 0.87. Therefore, we concluded that NGS analysis produces highly reproducible quantitative results.

We next investigated the error rates of the cDNA synthesis reactions with HIV-1 RT, MM4, RTX (variant of Thermococcus kodakarensis family B DNA polymerase with RT activity), or the above-described enzyme combination of MM4, K4polL329A, and Tk-EshA. It is known that HIV-1 RT has low fidelity, whereas MMLV RT and RTX have high fidelities. Among the three error types, substitutions were much more frequent than insertions or deletions (Table 1, Fig. 3). However, in the cDNA synthesis with the combination of MM4, K4polL329A, and Tk-EshA, a number of deletions were detected at 282T, which was the nucleotide incorporated at the first turnover (Table 1, Fig. 3). In the analysis of 234A−282T, the error rates (errors/base) of the reaction with MM4 (1.0 × 10⁻⁴), RTX (7.5 × 10⁻⁵), and the combination of the three enzymes (1.1 × 10⁻⁴) were similar and smaller than that of the reaction with HIV-1 RT (2.6 × 10⁻⁴) (Table 1). Our results for MM4 and RTX were consistent with those of a previous report with the Illumina MiSeq Sequencer by Ellefson et al. (1.1 × 10⁻⁴ and 3.7 × 10⁻⁵ for MMLV RT and RTX, respectively) [15], suggesting that this method is reliable. In the analysis of 282T, the error rate of the combination of the three enzymes (2.4 × 10⁻³) was the highest (Table 1). We speculate that this might be because Tk-EshA weakened the annealing of the primer to the template RNA.
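As a worked check of the arithmetic, the error rate is the number of errors (B) divided by the total number of bases analyzed (A). The short Python sketch below recomputes the rates quoted above from the counts listed in Table 1; the dictionary layout and the function name `error_rate` are illustrative choices, not part of the original analysis.

```python
# Errors (B) and total bases analyzed (A), copied from Table 1.
TABLE1 = {
    "HIV-1 RT": (939, 3_599_712),
    "MM4": (358, 3_558_096),
    "RTX": (259, 3_465_936),
    "MM4, K4polL329A, and Tk-EshA": (400, 3_654_048),
}


def error_rate(errors, bases):
    """Error rate in errors/base (B/A)."""
    return errors / bases


for enzyme, (errors, bases) in TABLE1.items():
    print(f"{enzyme}: {error_rate(errors, bases):.1e} errors/base")
# HIV-1 RT: 2.6e-04 errors/base
# MM4: 1.0e-04 errors/base
# RTX: 7.5e-05 errors/base
# MM4, K4polL329A, and Tk-EshA: 1.1e-04 errors/base
```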

It should be noted that the error rates reported above include not only mutations introduced during cDNA synthesis but also those introduced during the preparation of standard RNA using T7 RNA polymerase. However, the error rate during RNA synthesis is reported to be lower than that during cDNA synthesis [5, 18]. Thus, we believe that this effect is negligible.

The spectrum of HIV-1 RT had 13 hot spots (error rate of > 5.0 × 10⁻⁴ errors/base) at positions 282T, 278C, 272A, 271C, 270C, 267C, 263A, 261C, 255C, 254A, 248C, 247A, and 235A (Fig. 3). This was rather different from the other three spectra: the spectrum of MM4 had two hot spots at positions 270C and 267C; that of RTX at 261C and 255C; and that of the combination of the three enzymes at 282T, 270C, and 267C (Fig. 3). Figure 4 shows the comparison of the error rates at each position in the cDNA. The regression coefficients were low, in the range of 0.12−0.48, suggesting that the distribution profiles of the errors differed between enzymes. Table 2 shows the substitution profile. The frequency of mutation depended on the base species, with C being the highest and G being the lowest. The C to A and T to C substitutions were frequent in all reactions, which is in contrast to a recent report on PCR where the C to A substitution was the most frequent whereas the T to G mutation was the least [19]. The A to G, C to T, A to C, and C to G mutations were frequent only in the reaction with HIV-1 RT (Table 2), suggesting that this might be due to the lower fidelity of HIV-1 RT.
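The two per-position summaries used in this paragraph, hot-spot detection and the comparison of two error spectra, can be sketched as follows. The sketch assumes that the reported coefficient r is the Pearson correlation coefficient of the linear least-squares fit; the function names are hypothetical, and apart from the 282T value for HIV-1 RT (8.5 × 10⁻⁴, Table 1) the example rates are made up for illustration.

```python
import math

HOT_SPOT_THRESHOLD = 5.0e-4  # errors/base, as defined in the text


def hot_spots(rates, threshold=HOT_SPOT_THRESHOLD):
    """Positions whose error rate exceeds the hot-spot threshold."""
    return [position for position, rate in rates.items() if rate > threshold]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


# 282T is taken from Table 1; the other rates are hypothetical placeholders.
enzyme_a = {"282T": 8.5e-4, "281G": 1.0e-4, "280G": 0.5e-4, "279T": 2.0e-4}
enzyme_b = {"282T": 2.7e-4, "281G": 0.8e-4, "280G": 1.5e-4, "279T": 0.4e-4}

print(hot_spots(enzyme_a))  # ['282T']
r = pearson_r(list(enzyme_a.values()), list(enzyme_b.values()))
print(f"r = {r:.2f}")
```

A low r between two enzymes, as reported in Fig. 4, indicates that their errors concentrate at different positions.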

In conclusion, the error rate of cDNA synthesis can be determined with this simple and rapid method using an NGS sequencer. Conventional methods have revealed that polyamines [20], Mg²⁺ [21], reaction temperature [22], and the amino acid residue at position 65 of HIV-1 RT [23] affect the fidelity of HIV-1 RT. Our method might be valuable for evaluating the fidelity of various RTs under different reaction conditions and for identifying ultra-rare mutations in mRNA.

Acknowledgments

We thank Mr. Tsukasa Hayashi and Dr. Takeshi Ujiiye of Kainos Laboratories, Inc. and Ms. Tomomi Yamasaki of Kyoto University for their contribution to this work. This work was supported by SENTAN, Japan Science and Technology Agency. NGS was performed at the Medical Research Support Center, Graduate School of Medicine, Kyoto University.

Notes

The authors declare no competing financial interest.

References

[1] A.R. Kimmel, S.L. Berger, Preparation of cDNA and the generation of cDNA libraries: overview, Methods Enz., 152 (1987) 307–316.

[2] B.D. Preston, B.J. Poiesz, L.A. Loeb, Fidelity of HIV-1 reverse transcriptase, Science, 242 (1988) 1168–1171.

[3] J.D. Roberts, K. Bebenek, T.A. Kunkel, The accuracy of reverse transcriptase from HIV-1, Science, 242 (1988) 1171–1173.

[4] W.M. Kati, K.A. Johnson, L.F. Jerva, K.S. Anderson, Mechanism and fidelity of HIV reverse transcriptase, J. Biol. Chem., 267 (1992) 25988–25997.

[5] K. Bebenek, T.A. Kunkel, Analyzing fidelity of DNA polymerase, Methods Enz., 262 (1995) 217–232.

[6] J. Shendure, H. Ji, Next-generation DNA sequencing, Nat. Biotechnol., 26 (2008) 1135–1145.

[7] K. Iida, H. Jin, J.K. Zhu, Bioinformatics analysis suggests base modifications of tRNAs and miRNAs in Arabidopsis thaliana, BMC Genomics, 10 (2009) 155.

[8] M.W. Schmitt, S.R. Kennedy, J.J. Salk, E.J. Fox, J.B. Hiatt, L.A. Loeb, Detection of ultra-rare mutations by next-generation sequencing, Proc. Natl. Acad. Sci. USA, 109 (2012) 14508–14513.

[9] H. Okano, Y. Katano, M. Baba, A. Fujiwara, R. Hidese, S. Fujiwara, I. Yanagihara, T. Hayashi, K. Kojima, T. Takita, K. Yasukawa, Enhanced detection of RNA by MMLV reverse transcriptase coupled with thermostable DNA polymerase and DNA/RNA helicase, Enzyme Microb. Technol., 96 (2017) 111–120.

[10] H. Okano, M. Baba, T. Yamasaki, R. Hidese, S. Fujiwara, I. Yanagihara, T. Ujiiye, T. Hayashi, K. Kojima, T. Takita, K. Yasukawa, High sensitive one-step RT-PCR using MMLV reverse transcriptase, DNA polymerase with reverse transcriptase activity, and DNA/RNA helicase, Biochem. Biophys. Res. Commun., 487 (2017) 128–133.

[11] K. Nishimura, M. Shinomura, A. Konishi, K. Yasukawa, Stabilization of human immunodeficiency virus type 1 reverse transcriptase by site-directed mutagenesis, Biotechnol. Lett., 35 (2013) 2165–2175.

[12] K. Yasukawa, M. Mizuno, A. Konishi, K. Inouye, Increase in thermal stability of Moloney murine leukaemia virus reverse transcriptase by site-directed mutagenesis, J. Biotechnol., 150 (2010) 299–306.

[13] S. Sano, Y. Yamada, T. Shinkawa, S. Kato, T. Okada, H. Higashibata, S. Fujiwara, Mutations to create thermostable reverse transcriptase with bacterial family A DNA polymerase from Thermotoga petrophila K4, J. Biosci. Bioeng., 113 (2012) 315–321.

[14] A. Fujiwara, K. Kawato, S. Kato, K. Yasukawa, R. Hidese, S. Fujiwara, Application of a Euryarchaeota-specific helicase from Thermococcus kodakarensis for noise reduction in PCR, Appl. Environ. Microbiol., 82 (2016) 3022–3031.

[15] J.W. Ellefson, J. Gollihar, R. Shroff, H. Shivram, V.R. Iyer, A.D. Ellington, Synthetic evolutionary origin of a proofreading reverse transcriptase, Science, 352 (2016) 1590–1593.

[16] P. Rice, I. Longden, A. Bleasby, EMBOSS: the European molecular biology open software suite, Trends Genet., 16 (2000) 276–277.

[17] M.A. Quail, M. Smith, P. Coupland, T.D. Otto, S.R. Harris, T.R. Connor, A. Bertoni, H.P. Swerdlow, Y. Gu, A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers, BMC Genomics, 13 (2012) 341.

[18] S. Ulrich, E.T. Kool, The importance of steric effects on the efficiency and fidelity of transcription by T7 RNA polymerase, Biochemistry, 50 (2011) 10343–10349.

[19] D.A. Shagin, I.A. Shagina, A.R. Zaretsky, E.V. Barsova, I.V. Kelmanson, S. Lukyanov, S.M. Chudakov, M. Shugay, A high-throughput assay for quantitative measurement of PCR errors, Sci. Rep., 7 (2017) 2718.

[20] M. Bakhanashvili, E. Novitsky, I. Levy, G. Rahav, The fidelity of DNA synthesis by human immunodeficiency virus type 1 reverse transcriptase increases in the presence of polyamines, FEBS Lett., 579 (2005) 1435–1440.

[21] V. Achuthan, B.J. Keith, B.A. Connolly, J.J. Destefano, Human immunodeficiency virus reverse transcriptase displays dramatically higher fidelity under physiological magnesium conditions in vitro, J. Virol., 88 (2014) 8514–8527.

[22] M. Álvarez, L. Menéndez-Arias, Temperature effects on the fidelity of a thermostable HIV-1 reverse transcriptase, FEBS J., 281 (2014) 342–351.

[23] M. Álvarez, A. Sebastián-Martin, G. Garcia-Marquina, L. Menéndez-Arias, Fidelity of classwide-resistant HIV-2 reverse transcriptase and differential contribution of K65R to the accuracy of HIV-1 and HIV-2 reverse transcriptases, Sci. Rep., 7 (2017) 44834.

Figure legends

Fig. 1. Overview of the next-generation sequencing (NGS) analysis for the fidelity of reverse transcriptases (RTs). (A) Workflow to generate products for NGS. The cDNA synthesis reaction was performed with RT in the presence of a primer containing the Ion Proton sequencing adaptor α and the 14-base randomized barcode sequence. PCR was performed with a high-fidelity polymerase in the presence of a pair of primers containing the Ion Proton sequencing adaptor α and β, respectively. (B) Analysis of sequencing reads. After NGS, sequencing reads are grouped according to barcode. Errors that were present in all of the sequencing reads from the same group were regarded as those introduced during cDNA synthesis, whereas errors that were present in one or a few sequencing reads were regarded as those introduced by PCR or NGS. Sequences for the α adaptors are in purple, those for the β adaptor in sky blue, those for the key nucleotide "ATCGA" in green, and those for the barcode in red.

Fig. 2. Nucleotide sequences. (A) The RNA and primer sequences that correspond to those in Fig. 1A. The base position corresponds to that described for cesD from Bacillus cereus (GenBank accession number NC010924) at positions 15,112–15,530. The primer binding sequences are underlined. (B) The PCR product that corresponds to that in Fig. 1A. The base position corresponds to that described in Fig. 2A except for 304N−352G. (C) Result from conventional sequencing of the PCR product obtained using the IonP-cesD-For3 primer. The results corresponding to 301G−319C are shown. (A−C) Sequences used for the NGS analysis of the mutation are marked in brown. Sequences for the α adaptors of the IonP-cesD-Rev3 and IonP-Rev3 primers are in purple, and those for the β adaptor of the IonP-cesD-For3 primer are in sky blue. Sequences for the key nucleotide "ATCGA" are in green. The 14 successive "N"s (304−317) marked in red indicate a tag of 14 randomized bases.

Fig. 3. Spectrum of mutation. Blue, orange, and gray bars indicate substitution, insertion, and deletion, respectively.

Fig. 4. Comparison of the error rates at each position in the cDNA. (A) HIV-1 RT vs. MM4. (B) HIV-1 RT vs. RTX. (C) HIV-1 RT vs. the combination of MM4, K4polL329A, and Tk-EshA. (D) MM4 vs. RTX. (E) MM4 vs. the combination of MM4, K4polL329A, and Tk-EshA. (F) RTX vs. the combination of MM4, K4polL329A, and Tk-EshA. The lines are drawn by linear least-squares regression. The regression coefficients, r, in A−F are 0.32, 0.22, 0.20, 0.13, 0.48, and 0.12, respectively.

Table 1. Analysis of sequence reads from NGS.

| Enzymes | HIV-1 RT | MM4 | RTX | MM4, K4polL329A, and Tk-EshA |
|---|---|---|---|---|
| Number of total sequence reads | 88,671,943 | 87,590,907 | 87,110,839 | 79,955,703 |
| Number of sequence reads with correct barcodes | 60,545,241 (0.68)a | 58,737,223 (0.67) | 59,411,448 (0.68) | 47,278,840 (0.59) |
| Number of groups of sequence reads with 5 or more same barcodes | 74,994 | 74,127 | 72,207 | 76,126 |
| Analysis of 282T | | | | |
| Total base number to be analyzed (A) | 74,994 | 74,127 | 72,207 | 76,126 |
| Number of errors (B) | 64 | 20 | 12 | 180 |
| Substitution | 37 | 18 | 9 | 55 |
| Insertion | 1 | 1 | 1 | 0 |
| Deletion | 26 | 1 | 2 | 125 |
| Error rates (B/A) | 8.5 × 10⁻⁴ (1.0)b | 2.7 × 10⁻⁴ (0.32) | 1.7 × 10⁻⁴ (0.20) | 2.4 × 10⁻³ (2.8) |
| Analysis of 234A−282T | | | | |
| Total base number to be analyzed (A) | 3,599,712 | 3,558,096 | 3,465,936 | 3,654,048 |
| Number of errors (B) | 939 | 358 | 259 | 400 |
| Substitution | 912 | 307 | 233 | 355 |
| Insertion | 6 | 25 | 5 | 16 |
| Deletion | 21 | 26 | 21 | 29 |
| Error rates (B/A) | 2.6 × 10⁻⁴ (1.0)b | 1.0 × 10⁻⁴ (0.38) | 7.5 × 10⁻⁵ (0.29) | 1.1 × 10⁻⁴ (0.42) |

a Numbers in parentheses indicate values relative to total sequence reads.
b Numbers in parentheses indicate error rates relative to that of HIV-1 RT.

Table 2. Substitution profile.

| Enzymes | HIV-1 RT | MM4 | RTX | MM4, K4polL329A, and Tk-EshA |
|---|---|---|---|---|
| Total substitution | 912 | 307 | 233 | 355 |
| Purine to purine | | | | |
| G to A | 45 (0.049)a | 18 (0.059) | 15 (0.064) | 39 (0.110) |
| A to G | 278 (0.305) | 44 (0.143) | 14 (0.060) | 40 (0.113) |
| Pyrimidine to pyrimidine | | | | |
| T to C | 69 (0.076) | 50 (0.163) | 41 (0.176) | 52 (0.146) |
| C to T | 177 (0.194) | 16 (0.052) | 7 (0.030) | 7 (0.020) |
| Purine to pyrimidine | | | | |
| G to C | 4 (0.004) | 7 (0.023) | 0 (0) | 5 (0.014) |
| G to T | 9 (0.010) | 6 (0.020) | 2 (0.009) | 15 (0.042) |
| A to C | 32 (0.035) | 6 (0.020) | 1 (0.004) | 7 (0.020) |
| A to T | 19 (0.021) | 9 (0.029) | 8 (0.034) | 12 (0.034) |
| Pyrimidine to purine | | | | |
| C to G | 24 (0.026) | 3 (0.010) | 3 (0.013) | 4 (0.011) |
| C to A | 194 (0.213) | 99 (0.322) | 98 (0.421) | 110 (0.310) |
| T to G | 37 (0.041) | 32 (0.104) | 26 (0.112) | 36 (0.101) |
| T to A | 24 (0.026) | 17 (0.055) | 18 (0.077) | 28 (0.079) |

a Numbers in parentheses indicate values relative to total substitution.
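The relative values in Table 2 are each substitution count divided by the total number of substitutions for that enzyme. The Python sketch below recomputes them for the HIV-1 RT column and also sums the purine-to-purine and pyrimidine-to-pyrimidine changes (commonly called transitions); the "X>Y" key notation and the helper `is_transition` are illustrative choices, not taken from the original analysis.

```python
# HIV-1 RT substitution counts copied from Table 2 ("X>Y" means X to Y).
HIV1_RT = {
    "G>A": 45, "A>G": 278, "T>C": 69, "C>T": 177,
    "G>C": 4, "G>T": 9, "A>C": 32, "A>T": 19,
    "C>G": 24, "C>A": 194, "T>G": 37, "T>A": 24,
}
PURINES = {"A", "G"}


def is_transition(change):
    """True for purine-to-purine or pyrimidine-to-pyrimidine changes."""
    ref, alt = change.split(">")
    return (ref in PURINES) == (alt in PURINES)


total = sum(HIV1_RT.values())
fractions = {change: count / total for change, count in HIV1_RT.items()}
transition_fraction = (
    sum(count for change, count in HIV1_RT.items() if is_transition(change)) / total
)

print(total)                        # 912
print(round(fractions["A>G"], 3))   # 0.305
print(round(transition_fraction, 3))
```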

[Fig. 1. Schematic. (A) Workflow: RNA; cDNA synthesis with the IonP-CesD-Rev3 primer (adaptor α, key nucleotide sequence, barcode); PCR with the IonP-cesD-For3 (adaptor β) and IonP-Rev3 primers; PCR product. (B) Reads grouped by barcode: errors at cDNA synthesis versus errors at PCR or NGS.]

[Fig. 2. Nucleotide sequences.]

(A) RNA and primer sequences

1 AAAGCAUCUCUAAAAGCACAUAGUAAAGGCUUAUCUCUUUAU
43 CUAUCUAGUAUUAUCAUCUAUAUGAUUGGUUUGUUUCUUGUUUUUCCGAGUGUUUCAAAA
103 AGUAGUGGUAUUUCAGAUUUAAUGAAAUCCUUACCCCCUGGUCUAAUGAAAUCAUUAGGA
163 AUCGAAGGGAAUAUGGCAAAUUUAAAUGACUAUUUAAAUAUUAAUUUCUUUAAUUCAUUG
223 UUUUUAUACAUUUUAAUGGCCUAUUGUAUAAUGACAACGAUUAAGUUGGUGACAAGACCA
283 UUAGACCGUACCUCUUUGGUGUAUUAUUUAUCUUCACCUGUUUCAAAAUCAAAGGUACUU
343 UUCACGCAAUUUAUGGUGUUUUUUACAGGGUUAUUAUUGAUUUCCCUAGUAACGGUUCUU
403 UCUGGUAUUUUAGGAGC

IonP-cesD-For3: 5'-CCTCTCTATGGGCAGTCGGTGATATGAAATCCTTACCCCCTGG-3'
IonP-cesD-Rev3: 3'-AATCTGGCATGGAGAAACCACNNNNNNNNNNNNNNAGCTAGACTCAGCCTCTGTGCGTCCCTACTCTACC-5'
IonP-Rev3: 3'-GACTCAGCCTCTGTGCGTCCCTACTCTACC-5'

(B) PCR product (both strands)

101 CCTCTCTATGGGCAGTCGGTGATATGAAATCCTTACCCCCTGGTCTAATGAAATCATTAGGA
    GGAGAGATACCCGTCAGCCACTATACTTTAGGAATGGGGGACCAGATTACTTTAGTAATCCT
163 ATCGAAGGGAATATGGCAAATTTAAATGACTATTTAAATATTAATTTCTTTAATTCATTG
    TAGCTTCCCTTATACCGTTTAAATTTACTGATAAATTTATAATTAAAGAAATTAAGTAAC
223 TTTTTATACATTTTAATGGCCTATTGTATAATGACAACGATTAAGTTGGTGACAAGACCA
    AAAAATATGTAAAATTACCGGATAACATATTACTGTTGCTAATTCAACCACTGTTCTGGT
283 TTAGACCGTACCTCTTTGGTGNNNNNNNNNNNNNNTCGATCTGAGTCGGAGACACGCAGGGATGAGATGG
    AATCTGGCATGGAGAAACCACNNNNNNNNNNNNNNAGCTAGACTCAGCCTCTGTGCGTCCCTACTCTACC

(C) Conventional sequencing result

GTGNNNNNNNNNNNNNNTC

[Fig. 3. Bar charts, panels A−D. y-axis: error rates at each position in the cDNA (× 10⁴). x-axis: position in cDNA, from 282 to 234:
282:T 281:G 280:G 279:T 278:C 277:T 276:T 275:G 274:T 273:C 272:A 271:C 270:C 269:A 268:A 267:C 266:T 265:T 264:A 263:A 262:T 261:C 260:G 259:T 258:T 257:G 256:T 255:C 254:A 253:T 252:T 251:A 250:T 249:A 248:C 247:A 246:A 245:T 244:A 243:G 242:G 241:C 240:C 239:A 238:T 237:T 236:A 235:A 234:A]

[Fig. 4. Scatter plots, panels A−F. Both axes: error rates at each position in the cDNA (× 10⁴), 0−10, for pairs of HIV-1 RT, MM4, RTX, and the combination of MM4, K4polL329A, and Tk-EshA.]

Highlights

We devised a simple and rapid method to analyze the fidelity of RT using NGS.
cDNA is synthesized with RT and a primer containing a tag of 14 randomized bases.
The cDNA is then subjected to PCR using a high-fidelity DNA polymerase, followed by NGS.
Our method could evaluate the fidelity of various RTs under different reaction conditions.
Our method enables high-throughput analysis without the use of troublesome adaptor ligation.