Accepted Manuscript

Next-generation sequencing-based analysis of reverse transcriptase fidelity

Kiyoshi Yasukawa, Kei Iida, Hiroyuki Okano, Ryota Hidese, Misato Baba, Itaru Yanagihara, Kenji Kojima, Teisuke Takita, Shinsuke Fujiwara

PII: S0006-291X(17)31536-X
DOI: 10.1016/j.bbrc.2017.07.169
Reference: YBBRC 38275
To appear in: Biochemical and Biophysical Research Communications
Received Date: 20 July 2017 Revised Date:
0006-291X June 0006-291X
Accepted Date: 31 July 2017
Please cite this article as: K. Yasukawa, K. Iida, H. Okano, R. Hidese, M. Baba, I. Yanagihara, K. Kojima, T. Takita, S. Fujiwara, Next-generation sequencing-based analysis of reverse transcriptase fidelity, Biochemical and Biophysical Research Communications (2017), doi: 10.1016/j.bbrc.2017.07.169.
Next-generation sequencing-based analysis of reverse transcriptase fidelity
Kiyoshi Yasukawa a,*, Kei Iida b, Hiroyuki Okano a, Ryota Hidese c, Misato Baba a, Itaru Yanagihara d, Kenji Kojima a, Teisuke Takita a, and Shinsuke Fujiwara c

a Division of Food Science and Biotechnology, Graduate School of Agriculture, Kyoto University, Kitashirakawa, Sakyo-ku, Kyoto 606-8502, Japan
b Medical Research Support Center, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan
c Department of Bioscience, School of Science and Technology, Kwansei-Gakuin University, 2-1 Gakuen, Sanda, Hyogo 669-1337, Japan
d Department of Developmental Medicine, Research Institute, Osaka Women’s and Children’s Hospital, Osaka 594-1101, Japan

*Corresponding author. Fax: +81-75-753-6265. E-mail address: [email protected] (K. Yasukawa)

Abbreviations: RT, reverse transcriptase; NGS, next-generation sequencing; MMLV, Moloney murine leukemia virus; HIV-1, human immunodeficiency virus type 1; AMV, avian myeloblastosis virus.
Abstract

In this study, we devised a simple and rapid method to analyze the fidelity of reverse transcriptase (RT) using next-generation sequencing (NGS). The method comprises a cDNA synthesis reaction from standard RNA with the RT to be tested and a primer containing a tag of 14 randomized bases, PCR using a high-fidelity DNA polymerase, and NGS. Mutations were identified by comparing the sequence of each read with the reference sequence. Each mutation can be attributed to an error introduced during cDNA synthesis, PCR, or NGS, based on whether all sequence reads with the same tag contain the same mutation. The error rates in cDNA synthesis with the thermostable Moloney murine leukemia virus (MMLV) RT variant MM4 or with RTX, the recently developed 16-tuple variant of the family B DNA polymerase from Thermococcus kodakarensis with RT activity, were 0.75–1.0 × 10⁻⁴ errors/base, while that in the reaction with the wild-type human immunodeficiency virus type 1 (HIV-1) RT was 2.6 × 10⁻⁴ errors/base. Overall, our method could precisely evaluate the fidelity of various RTs under different reaction conditions in a high-throughput manner without the use of expensive optics or troublesome adaptor ligation.
1. Introduction

Reverse transcriptase (RT) [EC 2.7.7.49] is the enzyme responsible for viral genome replication. It has RNA- and DNA-dependent DNA polymerase activities as well as RNase H activity. In cDNA synthesis, RTs from Moloney murine leukemia virus (MMLV) and avian myeloblastosis virus (AMV) are extensively used because they have high fidelity [1]. On the other hand, RT from human immunodeficiency virus type 1 (HIV-1) is rarely used for this purpose because it has lower fidelity than MMLV RT or AMV RT [2, 3].
Various methods have been used to analyze the fidelity of DNA polymerases, including misincorporation, misextension, primer extension, and M13 lacZ mutation assays. In a misincorporation assay, the reaction rate for the incorporation of incorrect nucleotides is compared with that for correct ones. In a misextension assay, the reaction rate for the extension from a mispaired end (e.g., G:T) is compared with that from a correctly paired end (e.g., G:C). In these two assays, the reactions are performed under single-turnover conditions. In a primer extension assay, a primer extension reaction in the absence of one dNTP is compared with that in the presence of all four dNTPs [4]. In an M13 lacZ mutation assay, mutation frequency is determined as the ratio of mutant plaques to all plaques, from which the error rates are calculated [5]. By using this assay, the error rates of MMLV RT and AMV RT were reported to be in the range of 3.3–5.9 × 10⁻⁴ errors/base and that of HIV-1 RT was reported to be 5.9 × 10⁻³ errors/base. One problem with this method is that the identification of plaque color depends considerably on the observer.
Next-generation sequencing (NGS) is widely used in basic research and clinical medicine. Because hundreds of millions of sequences are obtained in one run, NGS is an attractive tool for identifying ultra-rare mutations in genomic DNA [6] and ultra-rare misincorporations or base modifications introduced during DNA synthesis [7]. However, a number of errors are introduced during NGS, which makes this application problematic. To circumvent this, Schmitt et al. devised a method to identify ultra-rare mutations in the genome [8]: DNA fragments containing the sequences to be analyzed were ligated with adaptors containing two tags of 12 randomized bases, and each mutation observed via NGS could be classified as either one already present in the genome or one introduced by PCR or NGS, by analyzing whether all sequence reads with the same tag sequences and orientations had the same mutation [8]. In this study, we present a simple and rapid method to identify mutations in cDNA fragments introduced during cDNA synthesis. By using this method, we obtained the error rates for the cDNA synthesis reactions using various RTs to evaluate their fidelities.
2. Materials and methods

2.1. Standard RNA

Standard RNA, a 419-nucleotide RNA corresponding to DNA sequence 15,112–15,530 of the cesD gene of Bacillus cereus (GenBank accession number NC010924), was prepared by in vitro transcription [9, 10]. The concentration of the purified RNA was determined spectrophotometrically at A260 and adjusted to 10 to 10⁶ copies/µl with RNase-free water, and the RNA was stored at −80°C until use.
2.2. Recombinant enzymes

Wild-type HIV-1 RT [11], the thermostable MMLV RT variant MM4 (E286R/E302K/L435R/D524A) [12], the family A DNA polymerase variant with RT activity, K4polL329A, from the thermophilic Thermotoga petrophila K4 [13], the DNA/RNA helicase Tk-EshA from the hyperthermophilic archaeon Thermococcus kodakarensis [14], and the family B DNA polymerase variant with RT activity, RTX, from T. kodakarensis [15] were prepared as described previously. The enzyme concentration was determined by the method of Bradford using Protein Assay CBB Solution (Nacalai Tesque, Kyoto, Japan) with bovine serum albumin (Nacalai Tesque) as the standard.
2.3. cDNA synthesis
cDNA synthesis reactions (20 µL) were performed with (i) 10 nM HIV-1 RT, (ii) 10 nM MM4, (iii) 10 nM RTX, or (iv) 10 nM MM4, 50 nM K4polL329A, and 10 nM Tk-EshA in the presence of 1 × 10⁶ copies/µl cesD RNA, 5 mM MgCl₂, 0 mM ((i)–(iii)) or 1 mM (iv) Mn(OCOCH₃)₂, 50 mM Bicine-KOH buffer (pH 8.2), 20 mM Tris-HCl buffer (pH 8.3), 10 mM KCl, 100 mM CH₃COOK, 0.2 mM dNTP, 0 mM ((i)–(iii)) or 1 mM (iv) ATP, 0.5 µM IonP-cesD-Rev3 primer, 10 µg/mL E. coli RNA, and 100 mM trehalose at 50°C for 30 min. After the reaction, the reaction solution was incubated at 65°C for 5 min.
2.4. NGS Sequencing and data processing

The reaction mixture for PCR (25 µL) was prepared by mixing water (15.5 µL), the 50-fold diluted product of the cDNA synthesis reaction (1.5 µL), 10× PCR buffer (100 mM Tris–HCl buffer (pH 8.3) containing 500 mM KCl and 15 mM MgCl₂) (2.5 µL), 25 mM MgSO₄ (1.5 µL), 10 µM IonP-cesD-For3 primer (0.5 µL), 10 µM IonP-Rev3 primer (0.5 µL), 2.0 mM dNTP (2.5 µL), and 1 U/µL recombinant KOD-Plus-Neo (0.5 µL) (Toyobo, Osaka, Japan). PCR was performed in a 0.2 mL PCR tube for 40 cycles of 30 s at 95°C, 30 s at 62°C, and 30 s at 72°C. The amplified products were purified with MagExtractor™ -PCR & Gel Clean up- (Toyobo).

NGS was performed with the Ion Proton Sequencer (Thermo Fisher Scientific, Waltham, MA). Sequence reads with the correct signatures and barcode sequences of the correct lengths were selected from all sequence reads by clipping the nucleotide sequences derived from cesD mRNA with the fastx_clipper program in FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) using the following options: -c, -v, -M25, and “-a CACCAAAGAGGTACGGTCTAATGGTCTTGT”. The initial 70-base sequences adjacent to the barcode sequences, containing a 21-base primer-derived sequence and a 49-base cDNA synthesis reaction-derived sequence, were grouped using Perl scripts as follows: when the largest sequence group had more than 5 reads and accounted for 80% of the barcode group, the sequence was treated as the representative sequence for the barcode group. Each representative sequence was aligned to the reference cesD sequence with the Needle program in the EMBOSS package [16]. After the alignment, substitutions, insertions, and deletions were counted.
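The grouping rule described above can be sketched in Python as follows. This is a minimal illustration and not the authors' Perl scripts: the function name `representative_sequences`, the `(barcode, sequence)` input format, and the reading of "accounts for 80%" as "at least 80%" are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def representative_sequences(reads, min_reads=5, min_fraction=0.8):
    """Pick one representative sequence per barcode group.

    reads: iterable of (barcode, sequence) pairs (hypothetical input format).
    A group is kept only if its most common sequence has more than
    `min_reads` reads and makes up at least `min_fraction` of the group.
    """
    groups = defaultdict(Counter)
    for barcode, sequence in reads:
        groups[barcode][sequence] += 1

    representatives = {}
    for barcode, counts in groups.items():
        sequence, top = counts.most_common(1)[0]
        total = sum(counts.values())
        if top > min_reads and top / total >= min_fraction:
            representatives[barcode] = sequence
    return representatives


# Toy data: barcode AAAA has 7 reads, 6 of them identical;
# barcode CCCC has only 3 reads and is discarded.
toy_reads = [("AAAA", "ACGT")] * 6 + [("AAAA", "ACGA")] + [("CCCC", "ACGT")] * 3
print(representative_sequences(toy_reads))
```

In this sketch the representative sequence, rather than each individual read, would then be aligned to the reference, so that errors seen in only a minority of reads within a group do not enter the error count.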
3. Results and Discussion
One of the disadvantages of NGS for evaluating error rates in cDNA synthesis reactions is that the error rate of NGS can be as high as 1% [17], much higher than that of the cDNA synthesis reaction. This disadvantage is more apparent for the Thermo Fisher Scientific Ion Proton Sequencer than for the Illumina MiSeq Sequencer [17]. However, the advantage of the Ion Proton Sequencer is that it does not require expensive optics or troublesome adaptor ligation, thus resulting in high throughput and low cost. Against this background, we devised a new method using the Ion Proton Sequencer.

Figure 1A shows the workflow to generate products for NGS. The cDNA was synthesized with RT using a primer containing sequences for the Ion Proton sequencing adaptor α, the five-base key nucleotide sequence ATCGA, and a 14-base randomized barcode. The individually labeled cDNAs were amplified by PCR with a high-fidelity thermostable DNA polymerase using a pair of primers containing the Ion Proton sequencing adaptor α and β sequences, respectively. The PCR products were then subjected to NGS with the Ion Proton Sequencer. Figure 1B shows the analysis of sequencing reads. Barcodes detected in five or more individual reads were selected, and their reads were grouped according to barcode and used for analysis, while reads whose barcodes were detected in only 1–4 individual reads were discarded. Errors present in all sequence reads from one group were regarded as those introduced during cDNA synthesis, whereas errors present in one or some, but not all, sequencing reads were regarded as those introduced by PCR or NGS.
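The assignment rule illustrated in Fig. 1B can be sketched as follows. This is a simplified illustration under stated assumptions: the reads in a group are taken to be already aligned to the reference without gaps (so only substitutions are handled), and the function name `classify_mismatches` is hypothetical.

```python
def classify_mismatches(reference, group_reads):
    """Split mismatches in one barcode group by their likely origin.

    reference: reference sequence (string).
    group_reads: equal-length, gap-free reads sharing one barcode (assumption).
    Returns (cdna_errors, pcr_or_ngs_errors) as lists of 0-based positions.
    """
    cdna_errors, pcr_or_ngs_errors = [], []
    for pos, ref_base in enumerate(reference):
        bases = [read[pos] for read in group_reads]
        mismatching = [b for b in bases if b != ref_base]
        if not mismatching:
            continue
        # The same mismatch in every read: attributed to cDNA synthesis.
        if len(mismatching) == len(bases) and len(set(bases)) == 1:
            cdna_errors.append(pos)
        else:
            # A mismatch in only one or some reads: attributed to PCR or NGS.
            pcr_or_ngs_errors.append(pos)
    return cdna_errors, pcr_or_ngs_errors


reference = "ACGTACGT"
reads = ["ACGAACGT", "ACGAACGT", "ACGAACGT", "ACGAACGT", "TCGAACGT"]
print(classify_mismatches(reference, reads))
```

Here position 3 differs from the reference in all five reads and is assigned to cDNA synthesis, while position 0 differs in a single read and is assigned to PCR or NGS.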
We used the standard RNA and primers shown in Fig. 2A, since we have previously used this RNA as a model for optimizing a novel one-step [10] or two-step [9] RT-PCR using the genetically engineered thermostable variant of MMLV RT, MM4 (E286R/E302K/L435R/D524A), in addition to the genetically engineered variant of family A DNA polymerase with RT activity, K4polL329A (L329A), from the hyperthermophilic bacterium Thermotoga petrophila K4, as well as the euryarchaeota-specific DNA/RNA helicase Tk-EshA from the hyperthermophilic archaeon Thermococcus kodakarensis. The cDNA synthesis reactions were performed using a combination of these three enzymes and the IonP-cesD-Rev3 primer, and the PCR was performed using a commercial high-fidelity DNA polymerase and the IonP-cesD-For3 and IonP-Rev3 primers. Figure 2B shows the predicted nucleotide sequence of the PCR product. Conventional nucleotide sequencing of the PCR product revealed that all sequences coincided with those shown in Fig. 2B (data not shown) and that the 14-base randomized tag had no nucleotide bias (Fig. 2C).
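As a rough guide to how many distinct labels a 14-base randomized tag provides, the calculation below counts the possible tags and estimates the chance that two molecules receive the same tag. It is a sketch only: it assumes that tags are drawn uniformly and independently (which the absence of nucleotide bias supports but does not prove), and the molecule count used in the example call is a hypothetical input, not a value from this study.

```python
import math

TAG_LENGTH = 14                # randomized bases in the tag (from the text)
n_tags = 4 ** TAG_LENGTH       # number of distinct 14-base tags


def collision_probability(n_molecules, n_tags=n_tags):
    """Birthday-problem approximation of the probability that at least two
    of n_molecules uniformly and independently tagged cDNAs share a tag."""
    expected_pairs = n_molecules * (n_molecules - 1) / (2 * n_tags)
    return 1.0 - math.exp(-expected_pairs)


print(n_tags)  # 268435456
print(collision_probability(1_000))  # hypothetical 1,000 tagged molecules
```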
We first investigated the reproducibility of NGS. The numbers of total sequence reads were 79,955,703 in Experiment (Exp.) 1 and 98,725,285 in Exp. 2, and the average length of the sequence reads was 161 in both Exp. 1 and Exp. 2. Figure S1 shows the distribution of the number of sequencing reads with the same barcode in Exp. 1. Except for groups in which the number of sequencing reads with the same barcode was less than 50, the distribution exhibited a bell-shaped profile with a maximum at 750–849 reads with the same barcode. Barcodes detected in five or more individual reads were selected, their reads were grouped according to barcode, and the 49-base sequences (nucleotide numbers 234–282 in Fig. 2) were used for analysis. Figure S2 shows a comparison of the error rates at each position in the cDNA between Exp. 1 and Exp. 2. The distribution profiles of errors in Exp. 1 and Exp. 2 were highly similar, with a regression coefficient of 0.87. Therefore, we concluded that NGS analysis produces highly reproducible quantitative results.
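A comparison of this kind can be sketched as follows, assuming that the reported coefficient is the Pearson correlation coefficient r between the per-position error rates of the two experiments (the text does not state how it was computed). The function name `pearson_r` and the example values are illustrative and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up per-position error rates (x 10^4) for two runs; not data from this study.
exp1 = [0.5, 1.2, 3.4, 0.8, 2.1]
exp2 = [0.6, 1.0, 3.0, 1.1, 2.4]
print(round(pearson_r(exp1, exp2), 2))
```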
We next investigated the error rates of the cDNA synthesis reactions with HIV-1 RT, MM4, RTX (a variant of the Thermococcus kodakarensis family B DNA polymerase with RT activity), or the above-described enzyme combination of MM4, K4polL329A, and Tk-EshA. It is known that HIV-1 RT has low fidelity, whereas MMLV RT and RTX have high fidelities. Among the three error types, substitutions were much more frequent than insertions or deletions (Table 1, Fig. 3). However, in the cDNA synthesis with the combination of MM4, K4polL329A, and Tk-EshA, a number of deletions were detected at 282T, which was the nucleotide incorporated at the first turnover (Table 1, Fig. 3). In the analysis of 234A–282T, the error rates (errors/base) of the reactions with MM4 (1.0 × 10⁻⁴), RTX (7.5 × 10⁻⁵), and the combination of the three enzymes (1.1 × 10⁻⁴) were similar to each other and lower than that of the reaction with HIV-1 RT (2.6 × 10⁻⁴) (Table 1). Our results for MM4 and RTX were consistent with those of a previous report using the Illumina MiSeq Sequencer by Ellefson et al. (1.1 × 10⁻⁴ and 3.7 × 10⁻⁵ for MMLV RT and RTX, respectively) [15], suggesting that this method is reliable. In the analysis of 282T, the error rate of the combination of the three enzymes (2.4 × 10⁻³) was the highest (Table 1). We speculate that this might be because Tk-EshA weakened the annealing of the primer to the template RNA.
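The error rates quoted above follow directly from the counts in Table 1 as the number of errors (B) divided by the total number of bases analyzed (A). The short calculation below reproduces them; the dictionary layout and the function name `error_rate` are illustrative choices, and the relative values it prints are computed from unrounded rates, so they may differ in the last digit from values derived from rounded rates.

```python
# Counts for the 234A-282T analysis as listed in Table 1.
table1 = {
    "HIV-1 RT": {"bases": 3_599_712, "errors": 939},
    "MM4": {"bases": 3_558_096, "errors": 358},
    "RTX": {"bases": 3_465_936, "errors": 259},
    "MM4, K4polL329A, and Tk-EshA": {"bases": 3_654_048, "errors": 400},
}


def error_rate(errors, bases):
    """Error rate in errors/base: number of errors (B) over bases analyzed (A)."""
    return errors / bases


hiv_rate = error_rate(**table1["HIV-1 RT"])
for enzyme, counts in table1.items():
    rate = error_rate(**counts)
    print(f"{enzyme}: {rate:.1e} errors/base, {rate / hiv_rate:.2f} relative to HIV-1 RT")
```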
It should be noted that the error rates in the abovementioned results reflect not only mutations introduced during cDNA synthesis but also those introduced during the preparation of the standard RNA using T7 RNA polymerase. However, it has been reported that the error rate during RNA synthesis is lower than that during cDNA synthesis [5, 18]. Thus, we believe that this effect is negligible.
The spectrum of HIV-1 RT had 13 hot spots (error rate of > 5.0 × 10⁻⁴ errors/base) at positions 282T, 278C, 272A, 271C, 270C, 267C, 263A, 261C, 255C, 254A, 248C, 247A, and 235A (Fig. 3). This was rather different from the other three spectra: the spectrum of MM4 had two hot spots at positions 270C and 267C; that of RTX had two at 261C and 255C; and that of the combination of the three enzymes had three at 282T, 270C, and 267C (Fig. 3). Figure 4 shows the comparison of the error rates at each position in the cDNA. The regression coefficients were low, in the range of 0.12–0.48, suggesting that the distribution profiles of the errors differed between enzymes. Table 2 shows the substitution profile. The frequency of mutation depended on the base species, with C being the highest and G being the lowest. The C to A and T to C substitutions were frequent in all reactions, which is in contrast to a recent report on PCR where the C to A substitution was the most frequent whereas the T to G mutation was the least frequent [19]. The A to G, C to T, A to C, and C to G mutations were frequent only in the reaction with HIV-1 RT (Table 2), suggesting that this might be due to the lower fidelity of HIV-1 RT.
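The dependence on base species can be checked by summing the substitution counts in Table 2 by original base. The sketch below does this for the HIV-1 RT column; the data structure and the function name `fractions_by_original_base` are illustrative, and the raw fractions are not normalized for the base composition of the analyzed region.

```python
from collections import defaultdict

# HIV-1 RT substitution counts from Table 2, keyed by (original base, new base).
hiv1_substitutions = {
    ("G", "A"): 45, ("A", "G"): 278, ("T", "C"): 69, ("C", "T"): 177,
    ("G", "C"): 4, ("G", "T"): 9, ("A", "C"): 32, ("A", "T"): 19,
    ("C", "G"): 24, ("C", "A"): 194, ("T", "G"): 37, ("T", "A"): 24,
}


def fractions_by_original_base(substitutions):
    """Fraction of all substitutions that start from each base."""
    totals = defaultdict(int)
    for (original, _new), count in substitutions.items():
        totals[original] += count
    grand_total = sum(totals.values())
    return {base: n / grand_total for base, n in totals.items()}


for base, frac in sorted(fractions_by_original_base(hiv1_substitutions).items()):
    print(base, round(frac, 3))
```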
In conclusion, the error rate of cDNA synthesis can be determined with this simple and rapid method using an NGS sequencer. Conventional methods have revealed that polyamines [20], Mg²⁺ [21], reaction temperature [22], and the amino acid residue at position 65 of HIV-1 RT [23] affect the fidelity of HIV-1 RT. Our method might be valuable for evaluating the fidelity of various RTs under different reaction conditions and for identifying ultra-rare mutations in mRNA.

Acknowledgments

We thank Mr. Tsukasa Hayashi and Dr. Takeshi Ujiiye of Kainos Laboratories, Inc. and Ms. Tomomi Yamasaki of Kyoto University for their contributions to this work. This work was supported by SENTAN, Japan Science and Technology Agency. NGS was performed at the Medical Research Support Center, Graduate School of Medicine, Kyoto University.

Notes

The authors declare no competing financial interest.
References

[1] A.R. Kimmel, S.L. Berger, Preparation of cDNA and the generation of cDNA libraries: overview, Methods Enz., 152 (1987) 307–316.
[2] B.D. Preston, B.J. Poiesz, L.A. Loeb, Fidelity of HIV-1 reverse transcriptase, Science, 242 (1988) 1168–1171.
[3] J.D. Roberts, K. Bebenek, T.A. Kunkel, The accuracy of reverse transcriptase from HIV-1, Science, 242 (1988) 1171–1173.
[4] W.M. Kati, K.A. Johnson, L.F. Jerva, K.S. Anderson, Mechanism and fidelity of HIV reverse transcriptase, J. Biol. Chem., 267 (1992) 25988–25997.
[5] K. Bebenek, T.A. Kunkel, Analyzing fidelity of DNA polymerase, Methods Enz., 262 (1995) 217–232.
[6] J. Shendure, H. Ji, Next-generation DNA sequencing, Nat. Biotechnol., 26 (2008) 1135–1145.
[7] K. Iida, H. Jin, J.K. Zhu, Bioinformatics analysis suggests base modifications of tRNAs and miRNAs in Arabidopsis thaliana, BMC Genomics, 10 (2009) 155.
[8] M.W. Schmitt, S.R. Kennedy, J.J. Salk, E.J. Fox, J.B. Hiatt, L.A. Loeb, Detection of ultra-rare mutations by next-generation sequencing, Proc. Natl. Acad. Sci. USA, 109 (2012) 14508–14513.
[9] H. Okano, Y. Katano, M. Baba, A. Fujiwara, R. Hidese, S. Fujiwara, I. Yanagihara, T. Hayashi, K. Kojima, T. Takita, K. Yasukawa, Enhanced detection of RNA by MMLV reverse transcriptase coupled with thermostable DNA polymerase and DNA/RNA helicase, Enzyme Microb. Technol., 96 (2017) 111–120.
[10] H. Okano, M. Baba, T. Yamasaki, R. Hidese, S. Fujiwara, I. Yanagihara, T. Ujiiye, T. Hayashi, K. Kojima, T. Takita, K. Yasukawa, High sensitive one-step RT-PCR using MMLV reverse transcriptase, DNA polymerase with reverse transcriptase activity, and DNA/RNA helicase, Biochem. Biophys. Res. Commun., 487 (2017) 128–133.
[11] K. Nishimura, M. Shinomura, A. Konishi, K. Yasukawa, Stabilization of human immunodeficiency virus type 1 reverse transcriptase by site-directed mutagenesis, Biotechnol. Lett., 35 (2013) 2165–2175.
[12] K. Yasukawa, M. Mizuno, A. Konishi, K. Inouye, Increase in thermal stability of Moloney murine leukaemia virus reverse transcriptase by site-directed mutagenesis, J. Biotechnol., 150 (2010) 299–306.
[13] S. Sano, Y. Yamada, T. Shinkawa, S. Kato, T. Okada, H. Higashibata, S. Fujiwara, Mutations to create thermostable reverse transcriptase with bacterial family A DNA polymerase from Thermotoga petrophila K4, J. Biosci. Bioeng., 113 (2012) 315–321.
[14] A. Fujiwara, K. Kawato, S. Kato, K. Yasukawa, R. Hidese, S. Fujiwara, Application of a Euryarchaeota-specific helicase from Thermococcus kodakarensis for noise reduction in PCR, Appl. Environ. Microbiol., 82 (2016) 3022–3031.
[15] J.W. Ellefson, J. Gollihar, R. Shroff, H. Shivram, V.R. Iyer, A.D. Ellington, Synthetic evolutionary origin of a proofreading reverse transcriptase, Science, 352 (2016) 1590–1593.
[16] P. Rice, I. Longden, A. Bleasby, EMBOSS: the European molecular biology open software suite, Trends Genet., 16 (2000) 276–277.
[17] M.A. Quail, M. Smith, P. Coupland, T.D. Otto, S.R. Harris, T.R. Connor, A. Bertoni, H.P. Swerdlow, Y. Gu, A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers, BMC Genomics, 13 (2012) 341.
[18] S. Ulrich, E.T. Kool, The importance of steric effects on the efficiency and fidelity of transcription by T7 RNA polymerase, Biochemistry, 50 (2011) 10343–10349.
[19] D.A. Shagin, I.A. Shagina, A.R. Zaretsky, E.V. Barsova, I.V. Kelmanson, S. Lukyanov, S.M. Chudakov, M. Shugay, A high-throughput assay for quantitative measurement of PCR errors, Sci. Rep., 7 (2017) 2718.
[20] M. Bakhanashvili, E. Novitsky, I. Levy, G. Rahav, The fidelity of DNA synthesis by human immunodeficiency virus type 1 reverse transcriptase increases in the presence of polyamines, FEBS Lett., 579 (2005) 1435–1440.
[21] V. Achuthan, B.J. Keith, B.A. Connolly, J.J. Destefano, Human immunodeficiency virus reverse transcriptase displays dramatically higher fidelity under physiological magnesium conditions in vitro, J. Virol., 88 (2014) 8514–8527.
[22] M. Álvarez, L. Menéndez-Arias, Temperature effects on the fidelity of a thermostable HIV-1 reverse transcriptase, FEBS J., 281 (2014) 342–351.
[23] M. Álvarez, A. Sebastián-Martin, G. Garcia-Marquina, L. Menéndez-Arias, Fidelity of classwide-resistant HIV-2 reverse transcriptase and differential contribution of K65R to the accuracy of HIV-1 and HIV-2 reverse transcriptases, Sci. Rep., 7 (2017) 44834.
Figure legends

Fig. 1. Overview of the next-generation sequencing (NGS) analysis for the fidelity of reverse transcriptases (RTs). (A) Workflow to generate products for NGS. The cDNA synthesis reaction was performed with RT in the presence of a primer containing the Ion Proton sequencing adaptor α and the 14-base randomized barcode sequence. PCR was performed with a high-fidelity polymerase in the presence of a pair of primers containing the Ion Proton sequencing adaptor α and β, respectively. (B) Analysis of sequencing reads. After NGS, sequencing reads are grouped according to barcode. Errors that were present in all of the sequencing reads from the same group were regarded as those introduced during cDNA synthesis, whereas errors that were present in one or a few sequencing reads were regarded as those introduced by PCR or NGS. Sequences for the α adaptors are in purple, those for the β adaptor in sky blue, those for the key nucleotide sequence “ATCGA” in green, and those for the barcode in red.

Fig. 2. Nucleotide sequences. (A) The RNA and primer sequences that correspond to those in Fig. 1A. The base position corresponds to that described for cesD from Bacillus cereus (GenBank accession number NC010924) at positions 15,112–15,530. The primer binding sequences are underlined. (B) The PCR product that corresponds to that in Fig. 1A. The base position corresponds to that described in Fig. 2A except for 304N–352G. (C) Result from conventional sequencing of the PCR product obtained using the IonP-cesD-For3 primer. The results corresponding to 301G–319C are shown. (A–C) Sequences used for the NGS analysis of the mutation are marked in brown. Sequences for the α adaptors of the IonP-cesD-Rev3 and IonP-Rev3 primers are in purple, and those for the β adaptor of the IonP-cesD-For3 primer are in sky blue. Sequences for the key nucleotide sequence “ATCGA” are in green. The 14 successive “N”s (304–317) marked in red indicate a tag of 14 randomized bases.

Fig. 3. Spectrum of mutation. Blue, orange, and gray bars indicate substitution, insertion, and deletion, respectively.

Fig. 4. Comparison of the error rates at each position in the cDNA. (A) HIV-1 RT vs. MM4. (B) HIV-1 RT vs. RTX. (C) HIV-1 RT vs. the combination of MM4, K4polL329A, and Tk-EshA. (D) MM4 vs. RTX. (E) MM4 vs. the combination of MM4, K4polL329A, and Tk-EshA. (F) RTX vs. the combination of MM4, K4polL329A, and Tk-EshA. The lines are drawn by linear least-squares regression. The regression coefficients, r, in A–F are 0.32, 0.22, 0.20, 0.13, 0.48, and 0.12, respectively.
Table 1. Analysis of sequence reads from NGS.

| Enzymes | HIV-1 RT | MM4 | RTX | MM4, K4polL329A, and Tk-EshA |
|---|---|---|---|---|
| Number of total sequence reads | 88,671,943 | 87,590,907 | 87,110,839 | 79,955,703 |
| Number of sequence reads with correct barcodes | 60,545,241 (0.68)ᵃ | 58,737,223 (0.67) | 59,411,448 (0.68) | 47,278,840 (0.59) |
| Number of groups of sequence reads with 5 or more same barcodes | 74,994 | 74,127 | 72,207 | 76,126 |
| Analysis of 282T | | | | |
| Total base number to be analyzed (A) | 74,994 | 74,127 | 72,207 | 76,126 |
| Number of errors (B) | 64 | 20 | 12 | 180 |
| Substitution | 37 | 18 | 9 | 55 |
| Insertion | 1 | 1 | 1 | 0 |
| Deletion | 26 | 1 | 2 | 125 |
| Error rates (B/A) | 8.5 × 10⁻⁴ (1.0)ᵇ | 2.7 × 10⁻⁴ (0.32) | 1.7 × 10⁻⁴ (0.20) | 2.4 × 10⁻³ (2.8) |
| Analysis of 234A–282T | | | | |
| Total base number to be analyzed (A) | 3,599,712 | 3,558,096 | 3,465,936 | 3,654,048 |
| Number of errors (B) | 939 | 358 | 259 | 400 |
| Substitution | 912 | 307 | 233 | 355 |
| Insertion | 6 | 25 | 5 | 16 |
| Deletion | 21 | 26 | 21 | 29 |
| Error rates (B/A) | 2.6 × 10⁻⁴ (1.0) | 1.0 × 10⁻⁴ (0.38) | 7.5 × 10⁻⁵ (0.29) | 1.1 × 10⁻⁴ (0.42) |

ᵃ Numbers in parentheses indicate values relative to total sequence reads.
ᵇ Numbers in parentheses indicate error rates relative to that of HIV-1 RT.
Table 2. Substitution profile.

| Enzymes | HIV-1 RT | MM4 | RTX | MM4, K4polL329A, and Tk-EshA |
|---|---|---|---|---|
| Total substitution | 912 | 307 | 233 | 355 |
| Purine to purine | | | | |
| G to A | 45 (0.049)ᵃ | 18 (0.059) | 15 (0.064) | 39 (0.110) |
| A to G | 278 (0.305) | 44 (0.143) | 14 (0.060) | 40 (0.113) |
| Pyrimidine to pyrimidine | | | | |
| T to C | 69 (0.076) | 50 (0.163) | 41 (0.176) | 52 (0.146) |
| C to T | 177 (0.194) | 16 (0.052) | 7 (0.030) | 7 (0.020) |
| Purine to pyrimidine | | | | |
| G to C | 4 (0.004) | 7 (0.023) | 0 (0) | 5 (0.014) |
| G to T | 9 (0.010) | 6 (0.020) | 2 (0.009) | 15 (0.042) |
| A to C | 32 (0.035) | 6 (0.020) | 1 (0.004) | 7 (0.020) |
| A to T | 19 (0.021) | 9 (0.029) | 8 (0.034) | 12 (0.034) |
| Pyrimidine to purine | | | | |
| C to G | 24 (0.026) | 3 (0.010) | 3 (0.013) | 4 (0.011) |
| C to A | 194 (0.213) | 99 (0.322) | 98 (0.421) | 110 (0.310) |
| T to G | 37 (0.041) | 32 (0.104) | 26 (0.112) | 36 (0.101) |
| T to A | 24 (0.026) | 17 (0.055) | 18 (0.077) | 28 (0.079) |

ᵃ Numbers in parentheses indicate values relative to total substitution.
Fig. 1 [figure image not reproduced]. Panel A labels: RNA, cDNA synthesis, cDNA, PCR, PCR product; primers IonP-cesD-Rev3, IonP-cesD-For3, and IonP-Rev3; adaptor α, adaptor β, key nucleotide sequence, and barcode. Panel B labels: errors at cDNA synthesis; errors at PCR or NGS.
Fig. 2

(A)
1 AAAGCAUCUCUAAAAGCACAUAGUAAAGGCUUAUCUCUUUAU
43 CUAUCUAGUAUUAUCAUCUAUAUGAUUGGUUUGUUUCUUGUUUUUCCGAGUGUUUCAAAA
IonP-cesD-For3: 5'-CCTCTCTATGGGCAGTCGGTGATATGAAATCCTTACCCCCTGG-3'
103 AGUAGUGGUAUUUCAGAUUUAAUGAAAUCCUUACCCCCUGGUCUAAUGAAAUCAUUAGGA
163 AUCGAAGGGAAUAUGGCAAAUUUAAAUGACUAUUUAAAUAUUAAUUUCUUUAAUUCAUUG
223 UUUUUAUACAUUUUAAUGGCCUAUUGUAUAAUGACAACGAUUAAGUUGGUGACAAGACCA
283 UUAGACCGUACCUCUUUGGUGUAUUAUUUAUCUUCACCUGUUUCAAAAUCAAAGGUACUU
IonP-cesD-Rev3: 3'-AATCTGGCATGGAGAAACCACNNNNNNNNNNNNNNAGCTAGACTCAGCCTCTGTGCGTCCCTACTCTACC-5'
IonP-Rev3: 3'-GACTCAGCCTCTGTGCGTCCCTACTCTACC-5'
343 UUCACGCAAUUUAUGGUGUUUUUUACAGGGUUAUUAUUGAUUUCCCUAGUAACGGUUCUU
403 UCUGGUAUUUUAGGAGC

(B)
101 CCTCTCTATGGGCAGTCGGTGATATGAAATCCTTACCCCCTGGTCTAATGAAATCATTAGGA
    GGAGAGATACCCGTCAGCCACTATACTTTAGGAATGGGGGACCAGATTACTTTAGTAATCCT
163 ATCGAAGGGAATATGGCAAATTTAAATGACTATTTAAATATTAATTTCTTTAATTCATTG
    TAGCTTCCCTTATACCGTTTAAATTTACTGATAAATTTATAATTAAAGAAATTAAGTAAC
223 TTTTTATACATTTTAATGGCCTATTGTATAATGACAACGATTAAGTTGGTGACAAGACCA
    AAAAATATGTAAAATTACCGGATAACATATTACTGTTGCTAATTCAACCACTGTTCTGGT
283 TTAGACCGTACCTCTTTGGTGNNNNNNNNNNNNNNTCGATCTGAGTCGGAGACACGCAGGGATGAGATGG
    AATCTGGCATGGAGAAACCACNNNNNNNNNNNNNNAGCTAGACTCAGCCTCTGTGCGTCCCTACTCTACC

(C)
GTGNNNNNNNNNNNNNNTC
Fig. 3 [figure image not reproduced]. Panels A–D. x-axis: position in cDNA (282T to 234A). y-axis: error rates at each position in the cDNA × 10⁴.
Fig. 4 [figure image not reproduced]. Panels A–F. Both axes: error rates at each position in the cDNA × 10⁴ (0–10) for the enzymes compared in each panel (HIV-1 RT, MM4, RTX, and the combination of MM4, K4polL329A, and Tk-EshA).
Highlights

We devised a simple and rapid method to analyze the fidelity of RT using NGS.
cDNA is synthesized with RT and a primer containing a tag of 14 randomized bases.
The cDNA is then subjected to PCR using a high-fidelity DNA polymerase, followed by NGS.
Our method could evaluate the fidelity of various RTs under different reaction conditions.
Our method enables high-throughput analysis without the use of troublesome adaptor ligation.