Performance of host-associated genetic markers for microbial source tracking in China


Journal Pre-proof

Yang Zhang, Renren Wu, Kairong Lin, Yishu Wang, Junqing Lu

PII: S0043-1354(20)30206-2
DOI: https://doi.org/10.1016/j.watres.2020.115670
Reference: WR 115670
To appear in: Water Research
Received Date: 29 July 2019
Revised Date: 25 February 2020
Accepted Date: 26 February 2020

Please cite this article as: Zhang, Y., Wu, R., Lin, K., Wang, Y., Lu, J., Performance of host-associated genetic markers for microbial source tracking in China, Water Research (2020), doi: https://doi.org/10.1016/j.watres.2020.115670. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2020 Published by Elsevier Ltd.

Yang Zhang^{a}, Renren Wu^{b,c,*}, Kairong Lin^{a,*}, Yishu Wang^{b,c}, Junqing Lu^{b,c}

a Department of Water Resources and Environment, Sun Yat-sen University, Guangzhou 510275, PR China;

b The Key Laboratory of Water and Air Pollution Control of Guangdong Province, South China Institute of Environmental Sciences, Ministry of Ecology and Environment of the People’s Republic of China, Guangzhou 510000, PR China;

c State Environmental Protection Key Laboratory of Water Environmental Simulation and Pollution Control, South China Institute of Environmental Sciences, Ministry of Ecology and Environment of the People’s Republic of China, Guangzhou 510530, P.R. China

Running title: Performance of host-associated microbial source tracking markers in China

Corresponding Author: Renren Wu; Kairong Lin

Address: Ruihe road 18, Huangpu District, Guangzhou 510000, P. R. China; West Xingang Road 135, Guangzhou 510275, P. R. China.

Email: [email protected]; [email protected]

Renren Wu and Kairong Lin contributed equally to this study.

Abstract: Numerous genetic markers have been developed to establish microbial source tracking (MST) assays in the last decade. However, the selection of suitable markers is challenging due to a lack of understanding of fundamental factors such as sensitivity, specificity, and concentration in target/nontarget hosts, especially in East Asia. In this study, a total of 506 faecal samples from human and 12 nonhuman hosts were collected from 28 cities across China and tested for marker performance characteristics. We first tested 40 host-associated markers based on a binary (presence/absence) criterion. Here, 15 markers (7 human-associated, 4 pig-associated, 3 ruminant-associated, and 1 poultry-associated) showed potential applicability in our study area. The selected 15 markers were then tested using qualitative and quantitative methods to characterise their performance. Overall, Bacteroidales markers presented higher sensitivity and concentrations in target samples compared to other bacterial or viral markers, but their specificity was low. Among nontarget samples, pets accounted for 43.7% and 35.7% of cross-reactivity with human-associated and poultry-associated markers, respectively. Noncommon animals, including horse and donkey, contributed 61.3% of cross-reactivity with ruminant-associated markers. When considering the quantitative distribution of markers, their concentrations in nontarget samples were 1-3 orders of magnitude lower than in target samples. Moreover, a novel classification method was proposed to classify the nontarget hosts into four groups spanning “no cross-reactivity”, “weak cross-reactivity”, “moderate cross-reactivity”, and “strong cross-reactivity” animal hosts. In total, 77.9% of nontarget samples were identified as no cross-reactivity or weak cross-reactivity hosts, suggesting that these nontarget hosts produce little interference for the corresponding markers. Our findings elucidate the performance of host-associated markers around China in a qualitative and quantitative manner, and reveal the degree of interference caused by cross-reactivity of genetic markers with nontarget animals, which will facilitate tracking of multiple faecal pollution sources and planning timely remedial strategies in China.

Keywords: faecal pollution; microbial source tracking; genetic marker; quantitative PCR; China

1. Introduction

Microbial source tracking (MST) is a tool used to discriminate faecal pollution from different source hosts. This method presents an advantage over the traditional faecal indicator bacteria (FIB) approach. The FIB approach generally cannot determine the source of faecal pollution because FIB are widely present in the faeces of most warm-blooded animals (Reischer et al., 2013; Mayer et al., 2018). Another limitation of monitoring FIB is that these bacteria can reproduce in aquatic environments alongside aquatic bacteria (Zhang et al., 2018), which may confound pollution assessment. Therefore, library-independent MST methods, which rely on the measurement of genetic markers targeting certain host-associated gut microorganisms, have become increasingly prominent (Reischer et al., 2013; Harwood et al., 2014; Feng et al., 2019). The presence of these genetic markers in a watershed indicates faecal pollution from specific hosts in environmental waters (Ahmed et al., 2019). Furthermore, most genetic markers were developed from obligate anaerobes and therefore degrade rapidly outside a host intestine (Bonjoch et al., 2005).

Specific genetic markers, such as host-associated Bacteroidales 16S rRNA gene markers, have been developed to discriminate the sources of faecal pollution from humans (Ahmed et al., 2010a), pigs (Mieszkin et al., 2009), ruminants (Bernhard et al., 2000a), poultry (Green et al., 2012), common pets (Kildare et al., 2007), seagulls (Lu et al., 2008), and other animals. The performance of these markers is a key determinant in accurately identifying the source of faecal contamination. Moreover, the suitability and accuracy of these genetic markers are susceptible to regional variability. For example, HF183 is known to be highly sensitive and specific to human-sourced faecal pollution in Belgium (Seurinck et al., 2005) and the USA (Boehm et al., 2013), but performed poorly in Singapore and India (Nshimyimana et al., 2017; Odagiri et al., 2015). The Bac305 marker also exhibits high specificity to ruminant faecal pollution in one particular region, but not in others (Bernhard et al., 2000a; Malla et al., 2018). Thus, the search for high-performance host-associated marker genes in new and varied geographical regions has become a major focus of MST research in recent years. The widespread applicability of genetic markers has also been limited by the often narrow regional nature of many seminal studies that evaluated the performance of faecal markers. To our knowledge, only one study has reported the efficacy of gene markers (two human-associated, two ruminant-associated, and one bovine-associated) beyond a regional context, collecting faecal samples from sixteen countries across six continents (Reischer et al., 2013). This lack of data on genetic marker performance in a broad geographical context makes it difficult to select MST markers for validation in new areas with certainty.

The performance of markers in different locations relies on repeated testing of reference faecal samples (Ahmed et al., 2009; Bernhard et al., 2000b; Shanks et al., 2010a). Sensitivity and specificity, the most typical evaluation endpoints, are usually determined by testing the presence/absence of specific markers in host samples. However, this approach fails to quantify the abundance of said markers in individual sources. The variation of marker abundance in different host species has significant implications for accurate and in-depth assessment of marker performance; thus, characterising these variations is of great importance (Reischer et al., 2013).

To investigate the characteristics of a range of previously reported MST markers in hosts beyond a regional context, China was selected as our study area. China is the second-largest country in Asia, spanning five temperature zones from cold to tropical climates. As of 2018, more than 1.39 billion people inhabited China. Moreover, Chinese animal husbandry rapidly developed throughout the economic “great leap forward” period. Nonetheless, few studies have validated MST assays across China. In the present study, more than 500 faecal samples were collected from 28 cities across different regions in China. With respect to the composition of samples, faecal pollution from humans, common livestock and poultry is of the greatest concern. Moreover, the markers selected for our study included but were not limited to the widely acknowledged Bacteroidales genetic markers. The objectives of this study were to (i) characterise the performance of host-associated markers based on qualitative and quantitative analysis in a broad geographical area, (ii) determine cross-reactivity between hosts and the resulting level of false-positive signals for each marker, and (iii) provide a novel classification method to quantitatively assess the degree of cross-reactivity between hosts.

2. Materials and methods

2.1 Faecal sample collection

From 2018 to 2019, a total of 506 faecal samples were collected from human volunteers and nonhuman hosts in 28 cities across China. The sampling cities are shown in Figure 1. Most of the sample sites were distributed across seven major river systems in China, except for those in Lhasa City and Urumqi City, which were distributed across the continental river system of Tibet and north-western China, respectively. Twenty-two sampling cities were selected in south-eastern China, following the Heihe–Tengchong geo-demographic demarcation line. The cities located east of said line account for the vast majority of the Chinese population and, consequently, contribute far more faecal pollution to nearby watersheds. Of the 506 faecal samples, 117 were of human source, 76 from pigs, 102 from ruminants (including cattle, sheep, and camels), 104 from poultry (including chickens, ducks, and geese), 70 from common pet animals (including dogs, cats, and rabbits), and 37 from uncommon animals (including horses and donkeys). The number of faecal samples collected from each city along with their respective source species are summarized in Table S1. Fresh faecal samples were collected from volunteers in hospitals and from families in highly urbanized areas. Although some human faecal samples were collected from hospitals, we asked the doctors for faecal samples from healthy people who went to the hospital for routine physical examination, rather than from patients. Most dog, cat and rabbit faecal samples were collected from the same urban households that donated human faecal samples; however, a few dog, cat and rabbit faeces were collected from rural households, and as these animals were also living with humans, they were considered as pets in this study. The rest of the animal faecal material was collected from rural families, livestock farms, and zoos. To ensure each sample came from a known source, and to avoid contamination from other hosts, unified sampling guidelines were defined and sent to all research partners prior to sampling in each city. All faecal samples were collected using 60 mL sterile tubes. Collected samples were immediately placed in a sealed icebox, then transported to the laboratory, protected from sunlight. Upon arrival at the laboratory, the faecal samples were stored at -80 °C until DNA was extracted.

Fig. 1. Sampling cities in China. Red triangles represent the sampling cities. The white line is the Heihe–Tengchong geo-demographic demarcation line. The cities located east of the dividing line account for the vast majority of the Chinese population.

2.2 DNA Extraction

The TIANamp Stool DNA Kit (TIANGEN, Beijing, China) was used to extract genomic DNA from all faecal samples following the manufacturer's recommendations. Briefly, 0.25 g (wet weight) of each faecal sample was added into bead tubes with a lysis buffer. Then, the samples were vigorously homogenized using a TGrinder H24 Tissue Homogenizer (TIANGEN, Beijing, China). Afterwards, the Universal DNA Purification Kit (TIANGEN, Beijing, China) was employed to remove polymerase chain reaction (PCR) inhibitors and ensure DNA purity. The concentration and quality of genomic DNA were then measured using a NanoDrop ND 1000 UV spectrophotometer (MAESTROGEN, USA). DNA concentrations in the purified extracts were between 15 and 120 ng/µL. Following a previous study (Reischer et al., 2013), purified DNA extracts with concentrations > 30 ng/µL were diluted tenfold to ensure that all the purified DNA extract concentrations ranged from 3 to 30 ng/µL for downstream analyses, and the concentrations of most DNA templates were within 10 ng/µL (Fig. S1).
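The dilution rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (tenfold dilution of extracts above 30 ng/µL); the function name and the example concentrations are hypothetical and do not come from the study data.

```python
def dilute_if_needed(conc_ng_per_ul, threshold=30.0, factor=10.0):
    """Return (working concentration, dilution factor applied).

    Extracts above `threshold` ng/uL are diluted by `factor`;
    all others are used undiluted.
    """
    if conc_ng_per_ul > threshold:
        return conc_ng_per_ul / factor, factor
    return conc_ng_per_ul, 1.0


# Hypothetical extract concentrations within the reported 15-120 ng/uL range.
extracts = [15.0, 30.0, 45.0, 120.0]
working = [dilute_if_needed(c)[0] for c in extracts]
```

With inputs between 15 and 120 ng/µL, every working concentration produced by this rule falls between 3 and 30 ng/µL, which matches the range stated in the text.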

2.3 qPCR assays and preliminary experiments

All qPCR reactions were performed in triplicate on a Roche LightCycler® 480 II system (Roche Diagnostics Ltd., Rotkreuz, Switzerland). Because we used different commercial reaction components (e.g. polymerases) than those reported in original publications, all assays were run according to the recommended reaction mixtures and procedure of the commercial kit used (TIANGEN, China). Probe-based host-associated gene marker assays were performed using 20 µL qPCR mixtures, containing 10 µL of 2x SuperReal probe PreMix (TIANGEN, China) and 2 µL of DNA template. The quantities of probe and primer added were determined by their intended final concentration in the mixtures. For SYBR-based marker assays, 20 µL qPCR mixtures were prepared, incorporating 10 µL 2x Talent qPCR PreMix (TIANGEN, China), 2 µL of DNA template, and 10 µM of each primer. The protocol for probe-based qPCR assays was executed according to the SuperReal probe PreMix manufacturer's instructions (95 °C for 15 min, followed by 40 cycles of 95 °C for 3 s and 60 °C for 30 s). The SYBR-based qPCR protocol consisted of a step at 95 °C for 30 min, followed by 40 cycles of 95 °C for 5 s and 60 °C for 10 s. As described in previous studies, the general Bacteroidetes marker assay AllBac was performed to confirm the amplification of DNA templates and the absence of PCR inhibition (Reischer et al., 2013; Mayer et al., 2018). The details of AllBac are shown in Table S2.

To identify suitable host-associated markers for an in-depth analysis, we adopted the validation method of a previous study to perform pre-screening experiments targeting 40 markers (Table S2) (Fan et al., 2017). These genetic markers (21 human-associated, 8 pig-associated, 7 ruminant-associated, and 4 poultry-associated) included but were not limited to the often used Bacteroidales genetic markers. The host-associated marker selection criteria in this study were (i) that they targeted human, pig, ruminant, and poultry hosts, and (ii) that they had a precedent of good performance, either in the region where they were developed or in other locations. The performance of the 40 markers is shown in Table S3. Markers with sensitivity and specificity greater than 50% were selected for further analysis. Among these 40 markers, only fifteen met this criterion, including 7 human-associated, 4 pig-associated, 3 ruminant-associated and 1 poultry-associated (Table 1). After the pre-screening experiments, we performed an in-depth assessment of these fifteen markers.

Quantitative analysis of these 15 markers was based on plasmid standard dilutions. Plasmid DNA for different hosts was prepared with the respective target PCR product and primers. The pGEM®-T Easy Vector (BGI, China) was used for the crAssphage marker; all other assays were performed with the pMD 19-T vector. Standard curves for all assays were generated using seven 10-fold serial dilutions of plasmid DNA (i.e. 10^0–10^6 gene copies; GC). The resulting qPCR efficiencies were between 90 and 110%. The limits of detection (LODs) for individual markers were calculated at 99% confidence intervals, as previously described (Nshimyimana et al., 2014). Every qPCR incorporated DNA template triplicates and non-template controls (Table S4). To ensure reproducibility between different plates, two standards, the 10^2 and 10^3 copies/µL dilutions of the positive controls (plasmid DNA) of each marker, were tested in different plates as described in a previous study (Nshimyimana et al., 2017). The average coefficient of variability (%CV) was 3.87±0.87% for the 10^3 copies/µL standard and 3.75±0.80% for the 10^2 copies/µL standard (Table S5).
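As an illustration of how a plasmid standard curve yields an amplification efficiency and a gene-copy estimate, the following Python sketch fits Cq against log10 copies by ordinary least squares and applies the commonly used relation E = 10^(-1/slope) - 1. The Cq values are hypothetical, and the function names are illustrative; the study's own curve-fitting software and settings are not described in the text.

```python
import math


def fit_standard_curve(log10_copies, cq_values):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency as a fraction (1.0 = 100%), from the curve slope."""
    return 10 ** (-1.0 / slope) - 1.0


def cq_to_log10_copies(cq, slope, intercept):
    """Invert the standard curve to estimate log10 gene copies."""
    return (cq - intercept) / slope


# Hypothetical Cq values for seven 10-fold dilutions (10^0 to 10^6 copies).
log10_copies = [0, 1, 2, 3, 4, 5, 6]
cq_values = [37.9, 34.6, 31.2, 27.9, 24.5, 21.2, 17.9]

slope, intercept = fit_standard_curve(log10_copies, cq_values)
efficiency = amplification_efficiency(slope)
```

A slope near -3.32 corresponds to an efficiency near 100%, so the 90 to 110% acceptance window reported above translates into a fairly narrow range of acceptable slopes.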

Table 1. Primer and probe information for selected qPCR assays in the second test phase

| host | qPCR assay | primer or probe | sequence 5’-3’ | target microorganism | reference |
|---|---|---|---|---|---|
| Human | BacH | BacH-f | CTTGGCCAGCCTTCTGAAAG | Bacteroides-Prevotella | (Reischer et al., 2010) |
| | | BacH-r | CCCCATCGTCTACCGAAAATAC | | |
| | | BacH-PC | FAM-TCATGATCCCATCCTG-NFQ-MGB | | |
| | BacHum | BacHum-160f | TGAGTTCACATGTCCGCATGA | Bacteroidales | (Kildare et al., 2007) |
| | | BacHum-241r | CGTTACCCCGCCTACTATCTAATG | | |
| | | BacHum-193p | FAM-TCCGGTAGACGATGGGGATGCGTT-NFQ | | |
| | SYBR-HF183 | HF183-f | ATCATGAGTTCACATGTCCG | Bacteroides dorei | (Ahmed et al., 2010b) |
| | | HF183-r | TACCCCGCCTACTATCTAATG | | |
| | Hum2 | Hum2-f | CGTCAGGTTTGTTTCGGTATTG | Hypothetical protein BF3236 | (Shanks et al., 2010a) |
| | | Hum2-r | TCATCACGTAACTTATTTATATGCATTAGC | | |
| | | HumM2P | (FAM)-TATCGAAAATCTCACGGATTAACTCTTGTGTACGC-(TAMRA) | | |
| | Hum163 | Hum163-f | CGTCAGGTTTGTTTCGGTATTG | Hypothetical protein BF3236 | (Shanks et al., 2010a) |
| | | Hum163-r | AAGGTGAAGGTCTGGCTGATGTAA | | |
| | CPQ_056 | 056F1 | CAGAAGTACAAACTCCTAAAAAACGTAGAG | crAssphage | (Stachler et al., 2014) |
| | | 056R1 | GATGACCAATAAACAAGCCATTAGC | | |
| | | 056P1 | (FAM)-AATAACGATTTACGTGATGTAAC-(MGB) | | |
| | CPQ_064 | 064F1 | TGTATAGATGCTGCTGCAACTGTACTC | crAssphage | (Stachler et al., 2014) |
| | | 064R1 | CGTTGTTTTCATCTTTATCTTGTCCAT | | |
| | | 064P1 | (FAM)-CTGAAATTGTTCATAAGCAA-(MGB) | | |
| Pig | Pig-1-Bac | Bac32-f | AACGCTAGCTACAGGCTTAAC | Pig-specific Bacteroidales | (Mieszkin et al., 2009) |
| | | Bac108r | CGGGCTATTCCTGACTATGGG | | |
| | | Bac44P | (FAM)ATCGAAGCTTGCTTTGATAGATGGCG(BHQ-1) | | |
| | Pig-2-Bac | Bac41-f | GCATGAATTTAGCTTGCTAAATTTGAT | Pig-specific Bacteroidales | (Mieszkin et al., 2009) |
| | | Bac163-r | ACCTCATACGGTATTAATCCGC | | |
| | L.amylovorus | L.amylovorus-f | TTCTGCCTTTTTGGGATCAA | Lactobacillus amylovorus | (He et al., 2016) |
| | | L.amylovorus-r | CCTTGTTTATTCAAGTGGGTGA | | |
| | P.ND5 | P.ND5-f | ACAGCTGCACTACAAGCAATGC | Mitochondrial DNA NADH 5 gene | (He et al., 2016) |
| | | P.ND5-r | GGATGTAGTCCGAATTGAGCTGATTAT | | |
| Ruminant | Rum-2-Bac | BacB2-590f | ACAGCCCGCGATTGATACTGGTAA | Ruminant-specific Bacteroidales | (Mieszkin et al., 2010) |
| | | Bac708Rm | CAATCGGAGTTCTTCGTGAT | | |
| | | BacB2626P | (FAM)ATGAGGTGGATGGAATTCGTGGTGT(BHQ-1) | | |
| | Bac708 | CF128-f | CCAACYTTCCCGWTACTC | Bacteroides-Prevotella | (Bernhard et al., 2000a) |
| | | Bac708-r | CAATCGGAGTTCTTCGTG | | |
| | BacCow | CF128-f | CCAACYTTCCCGWTACTC | Cow Bacteroidales | (Kildare et al., 2007) |
| | | 305r | GGACCGTGTCTCAGTTCCAGTG | | |
| Poultry | GFD | GFD-f | TCGGCTGAGCACTCTAGGG | Unclassified Helicobacter spp. | (Green et al., 2012) |
| | | GFD-r | GCGTCTCTTTGTACATCCCA | | |

2.4 Data analysis

Sensitivity (r) and specificity (s) were determined according to the following equations (Kildare et al., 2007; Odagiri et al., 2015):

$$r = \frac{TP}{TP + FN} \tag{1}$$

$$s = \frac{TN}{TN + FP} \tag{2}$$

TP represents positive results for target reference samples, and FN represents negative results for target reference samples. Conversely, TN indicates negative results for nontarget reference samples and FP represents positive results for nontarget reference samples. In preliminary experiments, a reaction with a mean Cq < 31.0 was considered a positive result. For the 15 selected markers, the qualitative performance was strictly re-assessed based on the lower limit of detection (LOD) (Boehm et al., 2013; Layton et al., 2013). The concentrations of the 15 markers in target and nontarget samples were evaluated with standard curves.
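As a worked example of the sensitivity and specificity definitions, the short Python sketch below uses the BacH counts reported in Table 2 of this paper (115 of 117 human samples positive; positives in the twelve nontarget hosts as listed). The function and variable names are illustrative only.

```python
def sensitivity(tp, fn):
    """Fraction of target reference samples that tested positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of nontarget reference samples that tested negative."""
    return tn / (tn + fp)


# BacH counts taken from Table 2.
target_total = 117            # human samples
target_positive = 115         # human samples positive for BacH
nontarget_total = 506 - 117   # all nonhuman samples
# BacH positives in pig, cattle, sheep, camel, chicken, duck, goose,
# dog, cat, rabbit, horse, donkey (in that order).
nontarget_positive = sum([35, 22, 16, 0, 21, 16, 13, 17, 14, 26, 0, 11])

bach_sensitivity = sensitivity(target_positive, target_total - target_positive)
bach_specificity = specificity(nontarget_total - nontarget_positive,
                               nontarget_positive)
```

Rounded to whole percentages, these counts give 98% sensitivity and 51% specificity, in agreement with the BacH values listed in Table 2.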

We employed a “25th/75th” metric to classify nontarget animal marker specificity and abundance into 4 groups. The 25th/75th metric was determined by subtracting the 75th percentile concentration in the nontarget hosts from the 25th percentile concentration in the target hosts for each marker (i.e. 25th/75th metric = 25th percentile_target − 75th percentile_nontarget) (Reischer et al., 2013). The four aforementioned groups were: (1) “no cross-reactivity” (NCR), the marker did not produce any positive signals in the nontarget animal; (2) “weak cross-reactivity” (WCR), the 25th/75th metric rendered a positive value; (3) “moderate cross-reactivity” (MCR), the 25th/75th metric rendered a negative value; and (4) “strong cross-reactivity” (SCR), the disparity between the mean concentrations of target and nontarget samples was below 1 order of magnitude. qPCR data were converted into a log10 format, and statistical significance was determined via the t-test or one-way ANOVA. All data analysis was performed using Microsoft Excel 2010, SPSS 22 and the R Statistical Computing Software.
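The four-group classification can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not specify the percentile interpolation method (the sketch uses the "inclusive" method of Python's `statistics.quantiles`), the order in which the rules are checked (the sketch tests SCR before WCR/MCR), or how a metric of exactly zero is handled (treated here as MCR). All input values are hypothetical log10 GC/g concentrations.

```python
from statistics import mean, quantiles


def _quartiles(values):
    """Return (Q1, Q2, Q3); a single value is its own quartile."""
    if len(values) == 1:
        return values[0], values[0], values[0]
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


def classify_cross_reactivity(target_log10, nontarget_log10):
    """Classify one nontarget host for one marker as NCR/WCR/MCR/SCR.

    Both arguments are log10 concentrations of positive samples.
    Assumption: SCR takes precedence over WCR/MCR.
    """
    if not nontarget_log10:
        return "NCR"  # no positive signal in the nontarget host
    if mean(target_log10) - mean(nontarget_log10) < 1.0:
        return "SCR"  # means differ by less than 1 order of magnitude
    p25_target = _quartiles(target_log10)[0]
    p75_nontarget = _quartiles(nontarget_log10)[2]
    metric = p25_target - p75_nontarget  # the 25th/75th metric
    return "WCR" if metric > 0 else "MCR"


# Hypothetical concentrations (log10 GC/g).
target = [5.0, 5.5, 6.0, 6.5, 7.0]
labels = [
    classify_cross_reactivity(target, []),
    classify_cross_reactivity(target, [2.0, 2.5, 3.0]),
    classify_cross_reactivity(target, [2.0, 3.0, 6.0, 6.5]),
    classify_cross_reactivity(target, [5.5, 6.0, 6.2]),
]
```

Because the data are already on a log10 scale, a difference of 1.0 between means corresponds to one order of magnitude.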

3. Results

To understand the potential challenges of applying markers in a wide range of geographical regions, the subsequent analysis focuses on the performance of the 15 promising pre-selected markers. Interestingly, qualitative analysis showed that the cow-specific Bacteroidales markers Bac708 and BacCow were not only detected in cattle samples but were also highly prevalent in sheep and camel samples. Therefore, BacCow and BoBac should be more generally considered ruminant-associated markers rather than cow-specific markers.

3.1 Qualitative analysis

The sensitivities of all the markers tested ranged from 59% to 100% (Table 2). Among these, human-associated markers had the most variable sensitivity (59–98%), followed by pig-associated markers (68–100%). In contrast, ruminant-associated marker sensitivity was in the 96–100% range (Table 2). Among human-associated markers, the Bacteroidales markers BacH, BacHum, and SYBR-HF183 were the most prevalent genetic markers, exhibiting host sensitivity values of 98%, 82%, and 74%, respectively. Mitochondrial DNA markers (Hum2, Hum163) and crAssphage markers (CPQ_056, CPQ_064) exhibited relatively low sensitivity (59–67%). Moreover, pig-associated Bacteroidales markers (Pig-1-Bac, Pig-2-Bac) and the mitochondrial marker (P.ND5) exhibited significantly higher sensitivity (95–100%) compared to the Lactobacillus amylovorus marker (68%). All ruminant-associated markers targeted Bacteroidales and showed the highest prevalence in target samples compared to other host-associated markers (≥ 96%). Unfortunately, only one poultry-associated marker (GFD) was selected for applicability in our study from preliminary screens, and the sensitivity of this marker (68%) was low compared to the Bacteroidales markers.

The specificity of the evaluated markers ranged from 50 to 91%. No marker exhibited absolute host specificity, but most host-associated markers presented limited cross-reactivity to nontarget samples except for Pig-1-Bac, Rum-2-Bac, and Bac708. Human-associated markers presented the highest number of false-positives occurring with pets (i.e. 43.7% of the pet samples tested positive for seven human-associated markers). Similarly, the pets also contributed many false-positive signals to the poultry-associated marker GFD (i.e. 35.7% of pet samples tested positive for the GFD marker). Likewise, ruminant-associated markers yielded the highest numbers of false-positives with non-common animals (horse and donkey), with 61.3% of non-common animal samples testing positive for ruminant-associated markers.

Among human-associated markers, Bacteroidales markers exhibited lower specificity compared to mitochondrial DNA markers and crAssphage markers. This trend was especially true for BacH and BacHum, which exhibited host specificity values of 51% and 55%, respectively. Among the pig-associated markers, the specificity values of Pig-2-Bac and P.ND5 were approximately 90% (88% and 90%, respectively; Table 2), but Pig-1-Bac showed relatively lower specificity (68%) compared to other pig-associated markers. Ruminant-associated markers all presented low specificity (<80%), especially Bac708, which exhibited a specificity of barely 50%. In contrast, GFD showed the highest specificity (91%) compared to other host-associated markers. Overall, there was an apparent trade-off between sensitivity and specificity in MST markers, whereby an improvement in one parameter usually translated to a decrease in the other.
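The pooled cross-reactivity percentages quoted above appear to be reproducible from the counts in Table 2 by dividing the total number of positive results by the number of sample-by-marker tests. The Python sketch below shows this arithmetic; the pooling rule is an inference from the reported numbers rather than a method stated in the text, and the function name is illustrative.

```python
def pooled_positive_rate(sample_counts, positives_per_marker):
    """Positive results divided by (samples x markers), pooled over hosts."""
    total_positive = sum(sum(v) for v in positives_per_marker.values())
    n_markers = len(next(iter(positives_per_marker.values())))
    total_tests = sum(sample_counts.values()) * n_markers
    return total_positive / total_tests


# Counts from Table 2.
pets = {"dog": 20, "cat": 17, "rabbit": 33}
# Seven human-associated markers: BacH, BacHum, SYBR-HF183, Hum2,
# Hum163, CPQ_056, CPQ_064.
pets_human_markers = {
    "dog": [17, 11, 10, 0, 8, 11, 0],
    "cat": [14, 7, 11, 0, 5, 0, 0],
    "rabbit": [26, 28, 20, 18, 11, 8, 9],
}
pets_gfd = {"dog": [11], "cat": [6], "rabbit": [8]}

non_common = {"horse": 18, "donkey": 19}
# Three ruminant-associated markers: Rum-2-Bac, Bac708, BacCow.
non_common_ruminant_markers = {"horse": [10, 11, 9], "donkey": [9, 16, 13]}

pet_vs_human = pooled_positive_rate(pets, pets_human_markers)
pet_vs_poultry = pooled_positive_rate(pets, pets_gfd)
non_common_vs_ruminant = pooled_positive_rate(non_common,
                                              non_common_ruminant_markers)
```

Under this pooling assumption the three rates come out at 43.7%, 35.7% and 61.3%, matching the values reported in the text.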

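Table 2. Numbers of qPCR positives with the indicated primers in source species or source groups

Marker groups: human-associated (BacH, BacHum, SYBR-HF183, Hum2, Hum163, CPQ_056, CPQ_064); pig-associated (Pig-1-Bac, Pig-2-Bac, L.amylovorus, P.ND5); ruminant-associated (Rum-2-Bac, Bac708, BacCow); poultry-associated (GFD).

| source group | source | no. samples | BacH | BacHum | SYBR-HF183 | Hum2 | Hum163 | CPQ_056 | CPQ_064 | Pig-1-Bac | Pig-2-Bac | L.amylovorus | P.ND5 | Rum-2-Bac | Bac708 | BacCow | GFD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| human | human | 117 | 115 | 96 | 87 | 69 | 77 | 71 | 78 | 26 | 13 | 24 | 10 | 22 | 26 | 15 | 0 |
| pig | pig | 76 | 35 | 25 | 16 | 24 | 12 | 22 | 21 | 76 | 72 | 52 | 72 | 13 | 47 | 23 | 12 |
| ruminant | cattle | 51 | 22 | 30 | 0 | 0 | 7 | 0 | 0 | 17 | 5 | 18 | 0 | 51 | 51 | 51 | 0 |
| | sheep | 32 | 16 | 14 | 11 | 0 | 0 | 0 | 0 | 11 | 4 | 9 | 5 | 32 | 32 | 32 | 0 |
| | camel | 19 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 8 | 0 | 8 | 0 | 15 | 19 | 19 | 0 |
| poultry | chicken | 48 | 21 | 23 | 13 | 0 | 9 | 0 | 0 | 12 | 10 | 16 | 7 | 19 | 24 | 16 | 35 |
| | duck | 35 | 16 | 12 | 8 | 0 | 5 | 14 | 13 | 6 | 0 | 9 | 5 | 15 | 20 | 8 | 29 |
| | goose | 21 | 13 | 10 | 7 | 0 | 6 | 0 | 0 | 13 | 8 | 11 | 8 | 12 | 17 | 0 | 7 |
| pet | dog | 20 | 17 | 11 | 10 | 0 | 8 | 11 | 0 | 11 | 0 | 7 | 0 | 11 | 14 | 9 | 11 |
| | cat | 17 | 14 | 7 | 11 | 0 | 5 | 0 | 0 | 6 | 7 | 0 | 4 | 8 | 15 | 0 | 6 |
| | rabbit | 33 | 26 | 28 | 20 | 18 | 11 | 8 | 9 | 7 | 0 | 8 | 4 | 7 | 12 | 0 | 8 |
| non-common animals | horse | 18 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 11 | 0 | 5 | 0 | 10 | 11 | 9 | 0 |
| | donkey | 19 | 11 | 15 | 9 | 9 | 0 | 0 | 0 | 10 | 5 | 6 | 0 | 9 | 16 | 13 | 0 |
| sensitivity(%) | | 506^a | 98 | 82 | 74 | 59 | 66 | 61 | 67 | 100 | 95 | 68 | 95 | 96 | 100 | 100 | 68 |
| specificity(%) | | 506^a | 51 | 55 | 70 | 87 | 84 | 86 | 89 | 68 | 88 | 72 | 90 | 69 | 50 | 77 | 91 |

a Total number of samples.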

3.2 Quantitative analysis

Marker abundance in faecal material was characterised per gram of wet faeces, as discussed in previous studies (Ahmed et al., 2019; Nshimyimana et al., 2017; Layton et al., 2013). The abundance of markers in target and nontarget samples was assessed based on the 25th/75th percentiles and mean concentrations. Mean concentrations of human-associated markers in target samples ranged from 3.57 ± 0.77 log10 GC/g to 5.27 ± 1.25 log10 GC/g, while those of pig-associated markers in target samples ranged from 4.99 ± 1.79 log10 GC/g to 6.58 ± 1.59 log10 GC/g. Ruminant-associated markers had the highest concentrations, ranging from 6.19 ± 1.26 log10 GC/g to 7.15 ± 0.89 log10 GC/g. The poultry-associated marker GFD exhibited relatively low concentrations (4.09 ± 0.96 log10 GC/g) in target samples. Bacteroidales markers generally presented significantly higher concentrations in target samples compared to most of the other markers (paired t-test, p < 0.05). Meanwhile, the mitochondrial marker P.ND5 also exhibited high concentrations and showed no statistically significant difference in abundance from Pig-1-Bac (paired t-test, p > 0.05). Moreover, the concentrations of tested markers in target samples presented much broader distributions: the 25th and 75th percentiles of marker concentrations were separated by 1-4 orders of magnitude (Fig. 2). Among human-associated markers, the 25th and 75th percentiles of Bacteroidales markers were separated by 3-4 orders of magnitude, which indicates a relatively broad distribution compared to mitochondrial DNA and crAssphage markers. In contrast, all ruminant-associated markers presented a relatively narrow gap between the 25th and 75th percentiles, which were separated by 1-3 orders of magnitude. The 25th and 75th percentiles of pig-associated markers were separated by 1-4 orders of magnitude, but Bacteroidales markers only showed differences of 1-2 orders of magnitude between the 25th and 75th percentiles. The 25th and 75th percentiles of the poultry-associated marker GFD were only separated by 2 orders of magnitude.

The concentrations of the 15 host-associated markers in nontarget samples were also determined. The mean concentration in nontarget samples ranged from 2.48 ± 0.48 log10 GC/g for Hum163 to 4.09 ± 0.90 log10 GC/g for BacCow, which revealed that marker concentrations in nontarget samples were nearly 1-3 orders of magnitude lower compared to target samples. In nontarget samples, the 25th and 75th percentiles of markers were separated by only 1-2 orders of magnitude. These results indicate relatively limited distributions for the markers in nontarget samples (Fig. 2). To investigate the contribution of different nontarget animals to marker concentrations, the distribution of markers in each nontarget host was calculated based on a corresponding standard curve (Figs. S2-S5).

Fig. 2. Concentrations of human- and nonhuman-associated markers in target/nontarget faecal samples. Boxes indicate the 25th/75th percentiles. Diamonds indicate the median values. Panel (a): human-associated markers, and Panel (b): nonhuman-associated markers.

3.3 Classification of nontarget samples

Based on the distribution of false positives in nontarget samples, we classified nontarget hosts into 4 groups (Fig. 3). The concentration ranges of the no cross-reactivity (NCR), weak cross-reactivity (WCR), moderate cross-reactivity (MCR) and strong cross-reactivity (SCR) classes for each marker are shown in Tables S6-S9. The classification assigned 47.1% of nontarget animals to the WCR class and 30.8% to the NCR class. The MCR (14.5%) and SCR (7.6%) classes were much less frequent (Fig. 3). Overall, this classification method indicates that most nontarget samples (77.9%) had little or no impact on MST assays (WCR and NCR). The distributions of WCR and SCR differed significantly between human and nonhuman hosts (paired t-test, p < 0.05). For instance, 40.5% of nontarget hosts were classified as NCR for human-associated markers, whereas 52.9% of nontarget samples were classified as WCR for nonhuman-associated markers. Fewer nontarget animal hosts fell into the strongly affected category. Among human-associated markers, Bacteroidales and mitochondrial markers resulted in high mean concentrations in rabbit samples (ranging from 5.37±1.03 log10 GC/g for BacHum to 3.11±0.72 log10 GC/g for Hum163), which were similar to those in target samples. Moreover, for the ruminant-associated markers Bac708 and BacCow, the level of false-positive signals from donkey was similar to that of ruminant faecal samples.


Fig. 3. Classification of nontarget samples for each host-associated marker. The results were colored on the basis of the following criteria: NCR (no cross-reactivity), no false-positive signal was amplified; WCR (weak cross-reactivity), positive value for the 25th/75th metrics; MCR (moderate cross-reactivity), negative value for the 25th/75th metrics; SCR (strong cross-reactivity), the disparity of mean concentration between target and nontarget samples is less than 1 order of magnitude.
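The criteria in the caption can be read as a small decision rule. The sketch below is one possible interpretation and not the authors' implementation. It assumes that the 25th/75th metric is the 25th percentile of target concentrations minus the 75th percentile of nontarget concentrations, that the SCR check takes precedence over WCR/MCR, and that "no false-positive signal" corresponds to an empty list of nontarget concentrations. All names and example values are hypothetical.

```python
import statistics


def _quartiles(values):
    """Return (q25, q75); a single value is treated as both quartiles."""
    if len(values) == 1:
        return values[0], values[0]
    q = statistics.quantiles(values, n=4, method="inclusive")
    return q[0], q[2]


def classify_cross_reactivity(target_log10, nontarget_log10):
    """Assign one nontarget host to NCR, WCR, MCR or SCR for one marker.

    Both arguments are lists of log10 GC/g values from positive samples.
    """
    # NCR: no false-positive signal was amplified in the nontarget host.
    if not nontarget_log10:
        return "NCR"
    # SCR: mean concentrations differ by less than 1 order of magnitude
    # (checked first; this precedence is an assumption).
    mean_gap = statistics.mean(target_log10) - statistics.mean(nontarget_log10)
    if mean_gap < 1.0:
        return "SCR"
    # Assumed 25th/75th metric: target 25th minus nontarget 75th percentile.
    target_q25, _ = _quartiles(target_log10)
    _, nontarget_q75 = _quartiles(nontarget_log10)
    metric = target_q25 - nontarget_q75
    return "WCR" if metric > 0 else "MCR"


# Hypothetical target-host concentrations for one marker (log10 GC/g).
target = [7.0, 7.5, 8.0, 8.5, 9.0]
print(classify_cross_reactivity(target, []))               # no amplification
print(classify_cross_reactivity(target, [3.0, 3.5, 4.0]))  # clear gap
print(classify_cross_reactivity(target, [7.5, 8.0, 8.5]))  # similar means
```

Under these assumptions, a positive metric means the interquartile boxes of target and nontarget samples do not overlap, which matches the WCR description in the caption.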


4. Discussion


4.1 Evaluation of genetic markers beyond a regional context

Suitable genetic markers spanning a wide variety of geographical regions were selected from a preliminary literature review, rather than conducting a blind validation study for a range of genetic markers. The performance of markers typically declines dramatically when validation studies are performed beyond the regional context in which the markers were originally developed. Moreover, it is difficult to meet the >80% requirement for both sensitivity and specificity (USEPA, 2005), at which point the marker is considered useful. For example, previously published studies reported BacH and BacHum to be the best performing markers (Kildare et al., 2007; Reischer et al., 2010), but these markers performed poorly in our study. Consistent with our findings, one study found the specificity of BacH and BacHum to be 53% and 68%, respectively, even in the context of a study area that spanned sixteen countries (Reischer et al., 2013). Several possible factors, such as differences in diet, climate, animal health and lifestyle, may lead to the inconsistent performance of genetic markers (Stewart et al., 2013; Shanks et al., 2010b; Shanks et al., 2011; Ahmed et al., 2019). In addition, different DNA isolation procedures and qPCR parameters (e.g. qPCR reagent, DNA load) are further influencing factors that cannot be ignored and may also contribute to the variable outcomes observed in different locations (Reischer et al., 2013; Boehm et al., 2013). For example, in the verification of BacH, BacHum and BacCow, all qPCR reactions were run in a total volume of 20 µL in our study, whereas previous studies performed qPCR reactions in a total volume of 25 µL (Kildare et al., 2007; Reischer et al., 2010; Reischer et al., 2013), and the temperature settings of the qPCR assays also differed among these studies. This discrepancy was due to the application of different commercial reaction components in each study, leading to different qPCR protocols and affecting the performance of these markers. Therefore, we postulate that there is no single performance threshold that determines a genetic marker’s applicability for MST in a wide array of regions. Rather, the performance requirements are subject to each MST challenge and the conditions present within each particular study area.
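The >80% benchmark discussed above relies on the standard binary definitions: sensitivity is the fraction of target-host samples that test positive, and specificity is the fraction of nontarget-host samples that test negative. The short Python sketch below illustrates the calculation. The function names and the sample counts are hypothetical and are not taken from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of target-host samples in which the marker was detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of nontarget-host samples in which the marker was not detected."""
    return true_neg / (true_neg + false_pos)


def meets_benchmark(sens, spec, threshold=0.80):
    """True only if both metrics exceed the threshold (>80% in the text)."""
    return sens > threshold and spec > threshold


# Hypothetical counts for one marker: 30 target and 100 nontarget samples.
sens = sensitivity(true_pos=27, false_neg=3)
spec = specificity(true_neg=53, false_pos=47)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, "
      f"meets benchmark={meets_benchmark(sens, spec)}")
```

In this hypothetical case the marker is sensitive but fails the benchmark on specificity, which is the pattern the paragraph describes for markers applied outside the region in which they were developed.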


4.2 Variation of source-sensitivity

Quantitative data on the concentrations of host-associated markers in target samples are critical for detecting their presence in aquatic environments. A previous study proposed that if markers have a high qualitative sensitivity but a low quantitative sensitivity (i.e. abundance in target samples), they are unlikely to detect faecal pollution in water samples or will otherwise tend to underestimate the level of contamination due to dilution or losses during sample processing steps (Ahmed et al., 2019). Based on our results, we propose that sensitivity may also be positively associated with the concentrations of specific markers in the target samples. For example, the human-associated markers Hum2, Hum163, CPQ_056, and CPQ_064 exhibited poor sensitivity (59–67%), and their concentrations in the target samples were also lower than those of the other human-associated markers by 1–3 orders of magnitude. A similar trend was reported by previous studies that evaluated the performance of human-associated markers in Singapore and Australia (Nshimyimana et al., 2017; Ahmed et al., 2019). These observations suggest that a higher mean concentration of markers allows for a higher likelihood of obtaining true-positive signals above the LOD in target hosts. Thus, the sensitivity of genetic markers could be used to predict their quantitative performance in target hosts when conducting preliminary screens to select promising markers based on binary data (i.e. presence/absence).

The population distribution of target microorganisms in the host intestines, which is another factor considered in the development of genetic markers, may also be linked to marker sensitivity and concentrations in target samples. In the present study, Bacteroidales markers typically exhibited higher sensitivity and concentrations than other bacterial and viral markers. This may be because Bacteroidales are a dominant microbial population, compared to most other microorganisms, in mammalian intestines (Ahmed et al., 2010a). The pig-associated mitochondrial DNA (mtDNA) marker P.ND5 also exhibited high sensitivity in our study. This is consistent with several earlier studies on the abundance of mtDNA markers, which suggest that multiple copies of mtDNA are contained in exfoliated epithelial cells (He et al., 2016; Tambalo et al., 2012; Caldwell et al., 2009). Thus, these mtDNA copies could provide strong positive signals comparable to those of the Bacteroidales 16S rRNA genes.

We also found that human-associated Bacteroidales markers had lower sensitivity and concentrations in target samples than nonhuman-associated Bacteroidales markers. Moreover, human-associated Bacteroidales markers in target samples exhibited a relatively broad concentration distribution. As far as we know, although the widespread distribution and stability of most Bacteroidales markers have not been systematically investigated to date, our partial results are consistent with a previous study that reported that BacCow has a broader target host distribution and greater stability than BacH and BacHum (Reischer et al., 2013). This can be attributed to target Bacteroidales being less dominant and more variable in the human gut than in other mammals. This was illustrated in a previous study, where target Bacteroidales in pig and cow faeces showed higher and more stable relative abundances than human-associated Bacteroidales including B. fragilis, B. caccae, B. uniformis, and B. vulgatus (Hong, 2010). Overall, the data suggest that the distribution of target microorganism populations has a significant effect on the sensitivity and concentrations of genetic markers.

A previous study reported the discovery of a highly abundant bacteriophage, “crAssphage”, in human faeces (Dutilh et al., 2014); however, in our study, the sensitivity and concentrations of the CPQ_056 and CPQ_064 crAssphage markers in target samples were relatively low compared to other evaluated markers. Consistent with this, one study found low sensitivity of CPQ_056 and CPQ_064 (both 46.1%) in faecal samples (Ahmed et al., 2018). This may be explained by the uneven distribution of crAssphage in human faeces (Ahmed et al., 2018). Previous studies have suggested that host age, health or other factors may influence the dispersion or aggregation of crAssphage in faeces (Liang et al., 2016; Cinek et al., 2018; Ahmed et al., 2018), thereby affecting its detection using qPCR (Ahmed et al., 2018). However, as this experiment did not consider the age of donors and lacked faecal samples from patients, the reason for the low sensitivity of CPQ_056 and CPQ_064 needs to be confirmed in future studies. In addition, differences in crAssphage abundance among Chinese, European and American populations may also explain the low sensitivity of CPQ_056 and CPQ_064 in this study. A previous survey of 86 publicly available metagenomes reported that crAssphage was less abundant in sewage from Asia than in sewage from the United States and Europe (Stachler et al., 2014).


4.3 Cross-reactivity

The occurrence of cross-reactivity from nontarget samples leads to poor marker specificity. To our knowledge, there are no genetic markers with absolute host specificity to date. The mechanisms of cross-reactivity are unclear and require further investigation. Some genome regions targeted by host-associated genetic markers may have homology to microorganisms found in different animals (Stachler et al., 2018). Cohabitation may also be an important factor explaining cross-reactivity. In our study, frequent contact between humans and pets in urbanised areas likely led to a higher proportion of false-positive signals from pets for human-associated markers. Similar observations were made in other Asian areas. For instance, a previous study reported that rabbits contributed a large proportion of the cross-reactivity with human-associated markers among nontarget samples in Singapore due to their widespread adoption as pets in this area (Nshimyimana et al., 2017). Moreover, in rural areas, free-roaming poultry and pets (especially dogs) that are not isolated from each other might also explain the large proportion of cross-reactivity from pets with the poultry-associated GFD marker. However, as we did not collect faecal samples from pet-like animals that are isolated in habitat from humans and poultry, such as wild rabbits, the influence of cohabitation needs to be confirmed in future studies using animals from different habitats. Besides cohabitation, diet and physiology may also explain cross-reactivity. For instance, the high proportion of cross-reactivity from non-common animals with ruminant-associated markers observed in our study might be attributed to the relatively similar diets and physiologies of these animals.

Despite the observations mentioned above, most genetic markers exhibited limited cross-reactivity with nontarget samples. This may help to correctly identify sources of contamination if two or more markers are used simultaneously (Ahmed et al., 2019). It also highlights the importance of characterising the species cross-reactivity of each marker in order to combine markers effectively for optimized target identification. For example, based on our results, if the Hum2 and CPQ_056 human-associated markers are used simultaneously, potential false-positive signals due to the presence of Hum2 in donkey may be identified when tracking the source of faecal contamination in water bodies. Similarly, Hum2 could also be used to resolve the false-positive results associated with duck and dog, which derive from the CPQ_056 marker (Fig. S1).
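The pairing logic described above can be sketched as a simple presence/absence rule. This is an illustrative simplification rather than a validated decision procedure: the function name and the wording of the outcomes are hypothetical, and the rule ignores marker concentrations. Given the limited sensitivity reported for both markers, a single-marker positive may still reflect a true human signal that the other marker missed, so the sketch only flags such cases for follow-up.

```python
def interpret_marker_pair(hum2_detected, cpq056_detected):
    """Interpret presence/absence of Hum2 and CPQ_056 in one water sample.

    The host examples follow the cross-reactions named in the text:
    donkey for Hum2, duck and dog for CPQ_056.
    """
    if hum2_detected and cpq056_detected:
        return "both detected: consistent with a human source"
    if hum2_detected:
        return "Hum2 only: possible cross-reaction (e.g. donkey), verify"
    if cpq056_detected:
        return "CPQ_056 only: possible cross-reaction (e.g. duck, dog), verify"
    return "neither detected: no evidence of a human source"


# Enumerate all four presence/absence combinations.
for hum2 in (True, False):
    for cpq in (True, False):
        print(hum2, cpq, "->", interpret_marker_pair(hum2, cpq))
```

The value of the pair comes from the two markers cross-reacting with different hosts, so a signal confirmed by both is less likely to originate from any single nontarget species.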


4.4 Low level of false positive signals

The inconsistent species cross-reactivity of genetic markers in different locations poses another significant challenge for the application of MST assays. For instance, Hum2 was previously found to cross-react with chicken, sheep, cattle, and goose in some parts of the United States (Layton et al., 2013; Shanks et al., 2009), but it did not amplify from these faecal samples in our study, although cross-reactivity was observed in pigs, donkeys, and rabbits (Fig. S2). This geographical variability in cross-reactivity suggests that markers may produce solely negative results for some animals if tested in a different place or region. However, our study revealed that most false-positive signals in nontarget hosts were significantly lower than in target hosts (Fig. 2), an observation that could help to exclude cross-reactivity. This is consistent with several earlier studies, which reported that the mean concentrations of markers in target sources were generally 1–5 orders of magnitude higher than in nontarget sources (Reischer et al., 2013; Ahmed et al., 2019). Such low marker levels in nontarget hosts may not significantly interfere with the interpretation of results, owing to dilution and the loss of gene copies during sample concentration and DNA extraction from water samples (Ahmed et al., 2019). Moreover, we observed that most of the highly sensitive markers, such as Pig-1-Bac, Pig-2-Bac and the ruminant-associated markers, appeared to have a greater gap between true-positive and false-positive signals. The mean concentrations of these markers in target samples were 3–4 orders of magnitude higher than those in nontarget samples. This may also provide more possibilities for detecting the potential source of faecal pollution while eliminating false-positive signals.

It should be noted that although most false-positive signals were significantly lower than true-positive signals, individual cross-reactive species will still have a higher likelihood of producing false-positive signals and influencing the results of MST. For example, in our study, rabbit and donkey exhibited a high degree of cross-reactivity with human- and ruminant-associated markers, respectively. These false-positive signals are likely to confound the detection of true-positive signals. Therefore, if false-positive signals are suspected to be present in water samples, the species known to cross-react (i.e. those that show concentrations similar to target samples) should be prioritised for higher scrutiny or otherwise excluded from the study by adopting other markers instead.


4.5 Implication of classification for host specificity

Although the mean concentrations of markers were lower in nontarget samples than in target samples, it would be difficult to predict whether the resulting lower false-positive signals from cross-reactive animals will accordingly have little impact on the MST results. We proposed a novel way to address this problem based on a 25th/75th metric. If the 25th percentile concentration of target samples does not overlap with the 75th percentile concentration of nontarget samples, markers usually present the clearest gap between the distributions of true-positive and false-positive signals. A small proportion of positive signals in nontarget samples may further reduce the likelihood of a false-positive signal interfering with the interpretation of results.

Our recommendations provide opportunities for policymakers and managers to select suitable markers according to specific requirements, such as land-use patterns. For instance, in highly urbanised areas, there may not be sufficient faecal contamination input from livestock and poultry farming to justify monitoring it. Rather, the main pollution sources here would be humans and pets. According to our classification results, the human-specific SYBR-HF183 marker could provide high sensitivity to human-originated pollution. Furthermore, the cross-reactivity from most nontarget animals could be masked by the low abundance of the marker in nontarget hosts (i.e. the animals assigned to WCR). Additionally, pairing the Hum2 and CPQ_064 markers could accurately rule out interference from cats (MCR), since no false-positive signals were present in these animals. Moreover, Pig-2-Bac or BacCow could also be used to rule out cross-reactivity from rabbits (SCR).

It is important to note that the acquisition of reliable 25th/75th metrics and mean concentrations in target and nontarget samples requires a large number of validation samples from various animals. There is no established guideline on how many nontarget hosts must be validated in specificity testing; however, the USEPA MST guidelines suggest that more than 10 species of animals should be used for evaluating host specificity, which would place our study well within an acceptable range. However, the number of nontarget host species and the sample size should be increased when testing host specificity beyond a regional context, due to the variability of the markers in different geographic locations. Also, when tracking the source of faecal contamination in environmental waters, the level of true-positive signals may critically mask false-positive signals from WCR. If high concentrations of markers occur in water samples, potential false-positive signals from WCR may also be increased and interfere with MST results. Therefore, water samples with high marker concentrations should be diluted to ensure that the lower false-positive signals from WCR cannot be detected.
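The dilution argument above can be expressed as simple log10 arithmetic: a 10^d-fold dilution lowers every log10 concentration by d, so a suitable dilution pushes the weaker nontarget signal below the limit of detection (LOD) while the target signal stays at or above it. The sketch below is a hypothetical illustration of that reasoning. The function name, the example concentrations and the LOD value are assumptions, and the sketch ignores measurement variability and recovery losses.

```python
def dilution_window(target_log10, nontarget_log10, lod_log10):
    """Return (lower, upper) bounds on the log10 dilution d, or None.

    A dilution d is suitable when the nontarget signal falls below the
    LOD (nontarget - d < LOD) while the target signal remains detectable
    (target - d >= LOD). If the nontarget signal is above the LOD, d must
    be strictly greater than `lower`; in all cases d must be at most
    `upper`. None is returned when no such dilution exists.
    """
    lower = max(0.0, nontarget_log10 - lod_log10)
    upper = target_log10 - lod_log10
    if upper <= 0 or lower >= upper:
        return None
    return lower, upper


# Hypothetical water sample: target signal 6.0, WCR false-positive signal
# 3.0, LOD 2.0 (all in log10 gene copies per unit volume).
print(dilution_window(6.0, 3.0, 2.0))  # more than 1 and up to 4 log10 units
print(dilution_window(3.0, 3.5, 2.0))  # None: signals cannot be separated
```

The window is wide only when target and nontarget concentrations are well separated, which is why the approach is described for the WCR class rather than for MCR or SCR hosts.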

This study validated the performance of a range of host-associated genetic markers based on qualitative and quantitative tests for MST in China and proposed a novel classification method for comprehensive characterization of the specificity of markers. Our findings provide opportunities for policymakers and managers to gain access to key background information for selecting suitable markers to address the challenges of accurately tracking faecal pollution sources in a quantitative manner.

Unfortunately, in our pre-screening experiments, only one poultry marker (GFD) was found to be applicable in China, and the accuracy of MST results is hard to validate with a single marker. Developing reliable genetic markers for the specific detection of poultry faecal contamination is still a worldwide challenge. This could be due to different distributions of gut microbes in poultry compared to mammals. Bacteroides and its associated organisms are commonly used to develop host-associated genetic markers (Vadde et al., 2019), but previous studies revealed that Bacteroides were not frequently detected in poultry gut or faeces (Zhu et al., 2002; Scupham et al., 2008). Thus, further research needs to focus on developing reliable methods for poultry-associated source tracking. Moreover, the varying persistence of different genetic markers could also affect MST results. A recent study established decay models for HF183 and CPQ_056 in fresh and seawater as a function of temperature; the results showed that the decay rate of HF183 was significantly faster than that of CPQ_056 (Balleste et al., 2019). This difference in decay rate may affect the coupling of HF183 with CPQ_056 to determine cross-reactivity from nontarget hosts. Therefore, identification of the decay rates of suitable markers is equally important for MST validation.

5. Conclusions

Fifteen host-associated markers presented potential suitability for MST of human-, pig-, ruminant- and poultry-derived faecal contamination in China. Overall, Bacteroidales markers exhibited higher sensitivity and concentrations compared to other bacterial and viral markers in target samples, but their specificity was low, suggesting that it might be necessary to use multiple markers when tracking the sources of faecal contamination.

The observed high proportion of cross-reactivity from pets to human- and poultry-associated markers was likely due to cohabitation. Likewise, similarities in diet and physiology may explain cross-reactivity from noncommon animals (horse and donkey) to ruminant-associated markers.

Our novel animal cross-reactivity classification method has broad implications for identifying the degree of impact of false-positive results. According to this method, 77.9% of nontarget samples were considered unlikely to be mismatched with the selected 15 markers, owing to the absence or low concentrations of the markers in nontarget samples.


Acknowledgements

This research was financially supported by the National Natural Science Foundation of China (Grant No. 41303054), the Basal Specific Research of the Central Public-Interest Scientific Institute (Grant No. PM-zx703-201803-089), the Outstanding Youth Science Foundation of NSFC (Grant No. 51822908) and the Natural Science Foundation of Guangdong Province of China (Grant No. 2015A030313850).

References

Ahmed, W., Goonetilleke, A., Powell, D., Gardner, T., 2009. Evaluation of multiple sewage-associated Bacteroides PCR markers for sewage pollution tracking. Water Res. 43(19), 4872-4877.

Ahmed, W., Gyawali, P., Feng, S., McLellan, S.L., 2019a. Host specificity and sensitivity of the established and novel sewage-associated marker genes in human and non-human fecal samples. Appl. Environ. Microbiol. in press.

Ahmed, W., Payyappat, S., Cassidy, M., Besley, C., Power, K., 2018. Novel crAssphage marker genes ascertain sewage pollution in a recreational lake receiving urban stormwater runoff. Water Res. 145, 769-778.

Ahmed, W., O'Dea, C., Masters, N., Kuballa, A., Marinoni, O., Katouli, M., 2019b. Marker genes of fecal indicator bacteria and potential pathogens in animal feces in subtropical catchments. Sci. Total Environ. 656, 1427-1435.

Ahmed, W., Stewart, J., Powell, D., Gardner, T., 2010a. Evaluation of Bacteroides markers for the detection of human faecal pollution. Lett. Appl. Microbiol. 46(2), 237-242.

Ahmed, W., Yusuf, R., Hasan, I., Goonetilleke, A., Gardner, T., 2010b. Quantitative PCR assay of sewage-associated Bacteroides markers to assess sewage pollution in an urban lake in Dhaka, Bangladesh. Can. J. Microbiol. 56(10), 838.

Ayeni, F.A., Biagi, E., Rampelli, S., Fiori, J., Soverini, M., Audu, H.J., Cristino, S., Caporali, L., Schnorr, S.L., Carelli, V., 2018. Infant and Adult Gut Microbiome and Metabolome in Rural Bassa and Urban Settlers from Nigeria. Cell Rep. 23(10), 3056-3067.

Balleste, E., Pascual-Benito, M., Martin-Diaz, J., Blanch, A.R., Lucena, F., Muniesa, M., Jofre, J., Garcia-Aljaro, C., 2019. Dynamics of crAssphage as a human source tracking marker in potentially faecally polluted environments. Water Res. 155, 233-244.

Bernhard, A.E., Field, K.G., 2000. A PCR assay to discriminate human and ruminant feces on the basis of host differences in Bacteroides-Prevotella genes encoding 16S rRNA. Appl. Environ. Microbiol. 66(10), 4571-4574.

Boehm, A.B., Van De Werfhorst, L.C., Griffith, J.F., Holden, P.A., Jay, J.A., Shanks, O.C., Dan, W., Weisberg, S.B., 2013. Performance of forty-one microbial source tracking methods: A twenty-seven lab evaluation study. Water Res. 47(18), 6812-6828.

Bonjoch, X., Balleste, E., Blanch, A.R., 2005. Enumeration of bifidobacterial populations with selective media to determine the source of waterborne fecal pollution. Water Res. 39(8), 1621-1627.

Caldwell, J.M., Levine, J.F., 2009. Domestic wastewater influent profiling using mitochondrial real-time PCR for source tracking animal contamination. J. Microbiol. Meth. 77(1), 17-22.

Cinek, O., Mazankova, K., Kramna, L., Odeh, R., Alassaf, A., Ibekwe, M.U., Ahmadov, G., Mekki, H., Abdullah, M.A., Elmahi, B.M.E., 2018. Quantitative CrAssphage real-time PCR assay derived from data of multiple geographically distant populations. J. Med. Microbiol. 90(4), 767-771.

Dutilh, B.E., Cassman, N.A., Mcnair, K., Sanchez, S.E., Silva, G.G.Z., Boling, L., Barr, J.J., Speth, D.R., Seguritan, V., Aziz, R.K., 2014. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat. Commun. 5(1), 4498-4498.

Fan, L., Shuai, J., Zeng, R., Mo, H., Wang, S., Zhang, X., He, Y., 2017. Validation and application of quantitative PCR assays using host-specific Bacteroidales genetic markers for swine fecal pollution tracking. Environ. Pollut. 23, 1569-1577.

Feng, S., McLellan, S.L., 2019. Highly specific sewage-derived Bacteroides qPCR assays target sewage polluted waters. Appl. Environ. Microbiol. in press.

Gawler, A.H., Beecher, J.E., Brandão, J., Carroll, N.M., Falcão, L., Gourmelon, M., Masterson, B., Nunes, B., Porter, J., Rincé, A., 2007. Validation of host-specific Bacteriodales 16S rRNA genes as markers to determine the origin of faecal pollution in Atlantic Rim countries of the European Union. Water Res. 41(16), 3780-3784.

Green, H.C., Dick, L.K., Brent, G., Mansour, S., Field, K.G., 2012. Genetic markers for rapid PCR-based identification of gull, Canada goose, duck, and chicken fecal contamination in water. Appl. Environ. Microbiol. 78(2), 503-510.

Harwood, V.J., Christopher, S., Badgley, B.D., Kim, B., Asja, K., 2014. Microbial source tracking markers for detection of fecal contamination in environmental waters: relationships between pathogens and human health outcomes. FEMS Microbiol. Rev. 38(1), 1-40.

He, X., Liu, P., Zheng, G., Chen, H., Shi, W., Cui, Y., Ren, H., Zhang, X.X., 2016. Evaluation of five microbial and four mitochondrial DNA markers for tracking human and pig fecal pollution in freshwater. Sci. Rep. 6, 35311.

Hong, P.Y., 2010. A high-throughput and quantitative hierarchical oligonucleotide primer extension (HOPE)-based approach to identify sources of faecal contamination in water bodies. Environ. Microbiol. 11(7), 1672-1681.

Karkman, A., Parnanen, K., Larsson, D.G.J., 2019. Fecal pollution can explain antibiotic resistance gene abundances in anthropogenically impacted environments. Nat. Commun. 10(1), 80.

Kildare, B.J., Leutenegger, C.M., McSwain, B.S., Bambic, D.G., Rajal, V.B., Wuertz, S., 2007. 16S rRNA-based assays for quantitative detection of universal, human-, cow-, and dog-specific fecal Bacteroidales: A Bayesian approach. Water Res. 41(16), 3701-3715.

Kim, J.Y., Lee, H., Lee, J.E., Chung, M., Ko, G., 2013. Identification of Human and Animal Fecal Contamination after Rainfall in the Han River, Korea. Microbes Environ. 28(2), 187-194.

Kushugulova, A., Forslund, S.K., Costea, P.I., Kozhakhmetov, S., Khassenbekova, Z., Urazova, M., Nurgozhin, T., Zhumadilov, Z., Benberin, V., Driessen, M., 2018. Metagenomic analysis of gut microbial communities from a Central Asian population. BMJ Open 8(7).

Layton, B.A., Mckay, L., Dan, W., Garrett, V., Gentry, R., Sayler, G., 2006. Development of Bacteroides 16S rRNA Gene TaqMan-Based Real-Time PCR Assays for Estimation of Total, Human, and Bovine Fecal Pollution in Water. Appl. Environ. Microbiol. 72(6), 4214-4224.

Layton, B.A., Yiping, C., Ebentier, D.L., Kaitlyn, H., Elisenda, B., João, B.O., Muruleedhara, B., Reagan, C., Farnleitner, A.H., Jennifer, G.S., 2013. Performance of human fecal anaerobe-associated PCR-based assays in a multi-laboratory method evaluation study. Water Res. 47(18), 6897-6908.

Liang, Y.Y., Zhang, W., Tong, Y.G., Chen, S.P., 2016. CrAssphage is not associated with diarrhoea and has high genetic diversity. Epidemiol. Infect. 144(16), 3549-3553.

Lu, J., Santo Domingo, J.W., Lamendella, R., Edge, T., Hill, S., 2008. Phylogenetic diversity and molecular detection of bacteria in gull feces. Appl. Environ. Microbiol. 74(13), 3969-3976.

Lu, S., Smith, A.P., Dan, M., Lee, N.M., 2010. Different real-time PCR systems yield different gene expression values. Mol. Cell. Probes 24(5), 315-320.

854

Malla, B., Ghaju, S.R., Tandukar, S., Bhandari, D., Inoue, D., Sei, K.,

855

Tanaka, Y., Sherchand, J.B., Haramoto, E., 2018. Validation of

856

host-specific Bacteroidales quantitative PCR assays and their

857

application to microbial source tracking of drinking water sources in

858

the Kathmandu Valley, Nepal. J. Appl. Microbiol. 125, 609-619.

859

Mayer, R.E., Reischer, G., Ixenmaier, S.K., Derx, J., Blaschke, A.P.,

860

Ebdon, J.E., Linke, R., Egle, L., Ahmed, W., Blanch, A., 2018.

861

Global Distribution of Human-associated Fecal Genetic Markers in

862

Reference Samples from Six Continents. Environ. Sci. Technol.

863

52(9), 5076-5084.

864

Mieszkin, S., Furet, J.-P., Corthier, G., Gourmelon, M., 2009. Estimation

865

of Pig Fecal Contamination in a River Catchment by Real-Time

866

PCR Using Two Pig-Specific Bacteroidales 16S rRNA Genetic

867

Markers. Appl Environ Microbiol. 75(10), 3045-3054.

868

Mieszkin, S., J-F, Y., Joubrel, R., Gourmelon, M., 2010. Phylogenetic

869

analysis of Bacteroidales 16S rRNA gene sequences from human

870

and animal effluents and assessment of ruminant faecal pollution by

871

real-time PCR. J. Appl. Microbiol. 108(3), 974-984.

Nshimyimana, J.P., Cruz, M.C., Thompson, R.J., Wuertz, S., 2017. Bacteroidales markers for microbial source tracking in Southeast Asia. Water Res. 118, 239-248.

Nshimyimana, J.P., Ekklesia, E., Shanahan, P., Chua, L.H.C., Thompson, J.R., 2014. Distribution and abundance of human specific Bacteroides and relation to traditional indicators in an urban tropical catchment. J. Appl. Microbiol. 116(5), 1369-1383.

Odagiri, M., Schriewer, A., Hanley, K., Wuertz, S., Misra, P.R., Panigrahi, P., Jenkins, M.W., 2015. Validation of Bacteroidales quantitative PCR assays targeting human and animal fecal contamination in the public and domestic domains in India. Sci. Total Environ. 502(5), 462-470.

Reischer, G.H., Ebdon, J.E., Bauer, J.M., Nathalie, S., Warish, A., Johan, A.M., Blanch, A.R., Günter, B.S., Denis, B., Tricia, C., 2013. Performance characteristics of qPCR assays targeting human- and ruminant-associated Bacteroidetes for microbial source tracking across sixteen countries on six continents. Environ. Sci. Technol. 47(15), 8548-8556.

Reischer, G., Kasper, D., Steinborn, R., Farnleitner, A., Mach, R., 2010. A quantitative real-time PCR assay for the highly sensitive and specific detection of human faecal influence in spring water from a large alpine catchment area. Lett. Appl. Microbiol. 44(4), 351-356.

Seurinck, S., Defoirdt, T., Verstraete, W., Siciliano, S.D., 2005. Detection and quantification of the human-specific HF183 Bacteroides 16S rRNA genetic marker with real-time PCR for assessment of human faecal pollution in freshwater. Environ. Microbiol. 7(2), 249-259.

Scupham, A.J., Patton, T.G., Bent, E., Bayles, D.O., 2008. Comparison of the Cecal Microbiota of Domestic and Wild Turkeys. Microb. Ecol. 56(2), 322-331.

Shanks, O.C., Karen, W., Kelty, C.A., Mano, S., Janet, B., Mark, M., Manju, V., Haugland, R.A., 2010a. Performance of PCR-based assays targeting Bacteroidales genetic markers of human fecal pollution in sewage and fecal samples. Environ. Sci. Technol. 44(16), 6281-6288.

Shanks, O.C., Karen, W., Kelty, C.A., Sam, H., Mano, S., Michael, J., Manju, V., Haugland, R.A., 2010b. Performance assessment PCR-based assays targeting Bacteroidales genetic markers of bovine fecal pollution. Appl. Environ. Microbiol. 76(5), 1359-1366.

Shanks, O.C., Kelty, C.A., Mano, S., Manju, V., Haugland, R.A., 2009. Quantitative PCR for genetic markers of human fecal pollution. Appl. Environ. Microbiol. 75(17), 5507-5513.

Shanks, O.C., Kelty, C.A., Shawn, A., Michael, J., Newton, R.J., Mclellan, S.L., Huse, S.M., Sogin, M.L., 2011. Community structures of fecal bacteria in cattle from different animal feeding operations. Appl. Environ. Microbiol. 77(9), 2992-3001.

Stachler, E., Akyon, B., Carvalho, N.A.d., Ference, C., Bibby, K., 2018. Correlation of crAssphage qPCR Markers with Culturable and Molecular Indicators of Human Fecal Pollution in an Impacted Urban Watershed. Environ. Sci. Technol. 52(13), 7505-7512.

Stachler, E., Bibby, K., 2014. Metagenomic Evaluation of the Highly Abundant Human Gut Bacteriophage CrAssphage for Source Tracking of Human Fecal Pollution. Environ. Sci. Technol. Lett. 1(10), 405-409.

Stewart, J.R., Boehm, A.B., Dubinsky, E.A., Fong, T.T., Goodwin, K.D., Griffith, J.F., Noble, R.T., Shanks, O.C., Vijayavel, K., Weisberg, S.B., 2013. Recommendations following a multi-laboratory comparison of microbial source tracking methods. Water Res. 47(18), 6829-6838.

Tambalo, D.D., Boa, T., Liljebjelke, K., Yost, C.K., 2012. Evaluation of two quantitative PCR assays using Bacteroidales and mitochondrial DNA markers for tracking dog fecal contamination in waterbodies. J. Microbiol. Meth. 91(3), 459-467.

USEPA, 2005. Microbial Source Tracking Guide Document EPA/600/R-05/064. Washington, DC.

Vadde, K.K., McCarthy, A.J., Rong, R., Sekar, R., 2019. Quantification of Microbial Source Tracking and Pathogenic Bacterial Markers in Water and Sediments of Tiaoxi River (Taihu Watershed). Front. Microbiol. 10.

Zhang, Y., Wu, R., Zhang, Y., Wang, G., Li, K., 2018. Impact of nutrient addition on diversity and fate of fecal bacteria. Sci. Total Environ. 636, 717-726.

Zhu, X.Y., Zhong, T., Pandya, Y., Joerger, R.D., 2002. 16S rRNA-Based Analysis of Microbiota from the Cecum of Broiler Chickens. Appl. Environ. Microbiol. 68(1), 124-137.

Zuo, T., Kamm, M.A., Colombel, J.F., Ng, S.C., 2018. Urbanization and the gut microbiota in health and inflammatory bowel disease. Nat. Rev. Gastro. Hepat. 15(7), 440-452.


Highlights:
1. The performance of host-associated genetic markers was investigated across a large area of China.
2. The distribution of target microorganisms affects the sensitivity of the corresponding markers and their concentrations in target samples.
3. Cohabitation, diet and physiology are important reasons for the occurrence of cross-reactivity.
4. A novel classification method identifies the degree of impact of false-positive results from nontarget hosts.

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.