Distinguishing neurocysticercosis epilepsy from epilepsy of unknown etiology using a minimal serum mass profiling platform


Accepted Manuscript

Jay S. Hanas, James R. Hocker, Govindan Ramajayam, Vasudevan Prabhakaran, Vedantam Rajshekhar, Anna Oommen, Josephine J. Manoj, Michael P. Anderson, Douglas A. Drevets, Hélène Carabin

PII: S0014-4894(17)30648-3
DOI: 10.1016/j.exppara.2018.07.015
Reference: YEXPR 7590
To appear in: Experimental Parasitology
Received Date: 5 January 2018
Revised Date: 8 June 2018
Accepted Date: 20 July 2018

Please cite this article as: Hanas, J.S., Hocker, J.R., Ramajayam, G., Prabhakaran, V., Rajshekhar, V., Oommen, A., Manoj, J.J., Anderson, M.P., Drevets, D.A., Carabin, H., Distinguishing neurocysticercosis epilepsy from epilepsy of unknown etiology using a minimal serum mass profiling platform, Experimental Parasitology (2018), doi: 10.1016/j.exppara.2018.07.015. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Jay S. Hanas*, James R. Hocker*, Govindan Ramajayam†, Vasudevan Prabhakaran†, Vedantam Rajshekhar†, Anna Oommen†, Josephine J. Manoj†, Michael P. Anderson‡, Douglas A. Drevets§, Hélène Carabin‡

Corresponding author:
Hélène Carabin, D.V.M., Ph.D.
Department of Biostatistics and Epidemiology, College of Public Health
University of Oklahoma Health Sciences Center
801 NE 13th St, Oklahoma City, OK, 73104
Tel: +1 405 271-2229 x48083
Fax: +1 405 271-2068
email: [email protected]

Note: Supplementary data associated with this article′


Abstract

Neurocysticercosis is associated with epilepsy in pig-raising communities with poor sanitation. Current internationally recognized diagnostic guidelines for neurocysticercosis rely on brain imaging, a technology that is frequently not available or not accessible in areas endemic for neurocysticercosis. Minimally invasive and low-cost aids for diagnosing neurocysticercosis epilepsy could improve treatment of neurocysticercosis. The goal of this study was to test the extent to which patients with neurocysticercosis epilepsy, epilepsy of unknown etiology, and idiopathic headaches, as well as patients with different types of neurocysticercosis lesions, could be distinguished from each other based on serum mass profiling. For this, we collected sera from patients with neurocysticercosis-associated epilepsy, epilepsy of unknown etiology, recovered neurocysticercosis, and idiopathic headaches, then performed binary group comparisons among them using electrospray ionization mass spectrometry. A leave-one-[serum sample]-out cross-validation procedure was employed to analyze spectral data. Sera from neurocysticercosis patients were distinguished from those of epilepsy of unknown etiology patients with a p-value of 10⁻²⁸. This distinction was lost when samples were randomized to either group (p-value=0.22). Similarly, binary comparisons of patients with neurocysticercosis who had different types of lesions showed that different forms of this disease were also distinguishable from one another. These results suggest neurocysticercosis epilepsy can be distinguished from epilepsy of unknown etiology based on biomolecular differences in sera detected by mass profiling.

Keywords: neurocysticercosis, epilepsy, diagnosis, serum, electrospray mass spectrometry, India


1. Introduction

Epilepsy is a common neurological disorder affecting approximately 6.38 per 1000 persons (95% Confidence Interval: 5.57-7.30 per 1000) (Fiest et al., 2017). Eighty-five percent of patients with epilepsy reside in low and middle income countries (LMIC) where mortality rates from epilepsy are also significantly higher (Newton and Garcia, 2012; Ngugi et al., 2010). Neuroimaging (CT and MRI) of patients with adult onset epilepsy in endemic regions frequently reveals lesions diagnostic or suggestive of neurocysticercosis (NCC), a zoonotic infection of the central nervous system by larvae of Taenia solium. The infection is transmitted between humans (definitive host) and pigs (intermediate host). However, NCC may develop when humans become accidentally infected with the eggs of T. solium shed in human feces. NCC is most prevalent where sanitation is poor and pigs roam and scavenge for food, which is the case in several countries of Latin America, Africa and Asia (Donadeu et al., 2016). A meta-analysis estimated that 29% of people with epilepsy show lesions consistent with NCC in endemic areas (Ndimubanzi et al., 2010). The internationally recognized NCC diagnostic guidelines rely on brain imaging (Del Brutto, 2012), for which facilities are poorly accessible to most people in endemic areas in LMIC. The mismatch between brain imaging accessibility and prevailing economic realities in LMICs creates challenges for accurately diagnosing NCC as a cause of epilepsy and for validly estimating the frequency of, and risk factors for, NCC (John et al., 2015; Ndimubanzi et al., 2010).

Inflammatory responses are important for NCC epilepsy and epilepsy of unknown etiology (EUE), but the host response to NCC is not completely understood (Garcia et al., 2014; Vezzani, 2005). Host responses to degenerating larvae and calcified lesions are thought to be associated with seizures in NCC epilepsy (Nash et al., 2015). Clinical distinction between seizures associated with NCC (including among different types of NCC lesions) and those associated with EUE is impossible, although it is critical for guiding treatment and monitoring of patients after diagnosis (Coyle, 2014; Nash and Garcia, 2011).

Analysis of biomolecules in readily accessible body fluids is one avenue of research for developing alternative NCC diagnostic aids in LMICs. While serological tests detecting antigens and antibodies have had success in diagnosing the most active forms of NCC, their specificities remain poor because metacestodes can be found in most human tissues, and their sensitivities are low for detection of single or calcified NCC lesions (Rodriguez et al., 2012; Sako et al., 2015). In addition, these tests do not differentiate multiple from single lesions or among cysts in different stages of development in the brain.

Although soluble adhesion molecules in CSF and sera have been suggested as biomarkers for EUE (Luo et al., 2014), one method of biomarker investigation not explored for NCC and EUE is electrospray ionization (ESI) mass spectrometry (MS) serum mass profiling. The hypothesis of serum biomolecule mass profiling is that the amounts and kinds of biomolecules in serum reflect physiological changes, including those accompanying disease states (Hocker et al., 2011a; Hocker et al., 2011b; Richter et al., 1999). The ESI-MS serum mass profiling platform requires minimal sample preparation and examines a large number of different biomolecules in sera. In contrast, other biomarker platforms focus on one or relatively small numbers of similar components. Examining larger numbers of biomolecules increases the power of the platform to discriminate among disease states (Hocker et al., 2011a; Hocker et al., 2011b; Vachani et al., 2015). For example, ESI-MS serum profiling has been used to discriminate early-stage pancreatic cancer and lung cancer patients from control individuals (Hanas et al., 2008; Hocker et al., 2011a; Hocker et al., 2011b).

The goal of this study was to assess the degree to which serum mass profiling using ESI-MS could discriminate between patients with NCC-associated epilepsy and those with EUE or idiopathic headaches, and among different types of NCC lesions.

2. Materials and methods

2.1 Study participant descriptions

Patients aged 18 to 51 years were recruited at the Department of Neurological Sciences, Christian Medical College (CMC) and Hospital, Vellore, India, as described elsewhere (Prabhakaran et al., 2017). The study was approved by the Institutional Review Boards of CMC and the University of Oklahoma HSC, USA. All participants consented to participate in this study. Participants were categorized into four groups. Group 1 included new patients diagnosed with NCC-associated epilepsy who had experienced at least one seizure in the 7 months prior to enrollment. Patient sub-groups included: i) solitary cysticercus granuloma (SCG), ii) single calcified cysts (SCC), iii) multiple neurocysticercosis cysts at various stages of development (MNCC). NCC patients were further categorized for the absence or presence of peri-lesional edema on brain imaging. Group 2 included previously-diagnosed NCC patients with no seizures for at least two years and no residual brain lesions (recovered NCC - RNCC). Group 3 included new patients with EUE reporting at least one seizure in the 7 months prior to enrollment, no evidence of NCC or other lesions on brain imaging and seronegative for cysticercosis antigens and antibodies. Group 4 included new patients with headaches and normal brain imaging, no history of seizures, head trauma, human immunodeficiency virus (HIV), hepatitis B virus (HBV) and hepatitis C virus (HCV) infections, or serum cysticercosis antigens or antibodies (herein designated as idiopathic headaches). Patients in all groups had not taken anti-inflammatory drugs (i.e., acetaminophen, ibuprofen) for at least 7 days prior to enrollment, and were not acutely ill at the time of phlebotomy. Peripheral blood was obtained and serum was prepared as described previously according to blood biomarker standards (Hocker et al., 2017; Hocker et al., 2015; Tuck et al., 2009).

NCC and RNCC patients with extra-parenchymal lesions were excluded because the study focused on patients with epilepsy. Patients were tested for HIV, HBV and HCV only as clinically indicated. All patients were tested for antigens and antibodies for cysticercosis (Prabhakaran et al., 2017).

2.2 Definition of NCC-associated epilepsy and epilepsy of unknown etiology

As described before (Prabhakaran et al., 2017), we used the proposed diagnostic criteria, including computed tomography (CT) or magnetic resonance imaging (MRI) brain images interpreted by one of the authors (VR), to define cases of NCC and categorize them into the subgroups described above (Del Brutto et al., 2001; García and Del Brutto, 2003). These diagnostic criteria were the only ones available at the time the study was initiated and were therefore retained throughout the study. These criteria were recently shown to have sensitivity and specificity similar to those proposed by Carpio et al. (2016). Single calcified lesions were defined according to the recommendations by Del Brutto et al. in their diagnostic criteria: “solid, dense, supratentorial calcifications 1 to 10mm in diameter, in the absence of other illnesses should be considered as highly suggestive of neurocysticercosis” (Del Brutto et al., 2001). In this study, the diagnosis of solitary cysticercus granuloma (SCG) was made on the basis of previously validated criteria for SCG published by Rajshekhar and Chandy (1997). Patients with MNCC could show only active lesions (i.e. viable or degenerating cysts), only calcified lesions, or a combination of both. The operational definition for epilepsy of the International League Against Epilepsy was used so that those with NCC and a single seizure met the definition (Fisher et al., 2014). All EUE patients had experienced at least two seizures in their lifetime.

2.3 Mass spectrometry

Mass spectrometry (MS) analysis was conducted with an Advantage LCQ ion-trap bench top ESI-MS instrument (ThermoFisher, Inc.) and an ESI-Single Quadrupole (Advion) instrument; both were calibrated following manufacturer protocols. All solvents were HPLC grade and purchased from ThermoFisher. Each patient’s serum aliquot (4 µl) was diluted 1:300 into 50% methanol and 2% formic acid, and separated into 3 aliquots. The samples were loop injected (20 µl) into the nano source of the mass spectrometer fitted with a 20 micron inner diameter fused silica (Polymicro Technologies) tip at a flow rate of 0.5 µl/min using an Eldex MicroPro series 1000 pumping system with instrument settings determined in previous work (Hocker et al., 2017). High-resolution triplicate mass spectra from two of the study groups were collected each day. The spectra were sampled with m/Z (mass divided by charge) resolution of two hundredths over the m/Z range of the instrument (i.e. 400 to 2000 m/Z). Positive ion mode spectra were collected over 30 min for each injection. Raw spectral data were extracted using the manufacturer's software "Qual Browser" version 1.4SR1 and exported in rounded unit m/Z and intensity values. Data were locally normalized in segments of 10 m/Z from 400-2000 m/Z. MS spectral peak area assignments were calculated as centroid m/Z peak area values (valley to valley) using Mariner Data Explorer 4.0.0.1 software (Applied BioSystems).

Centroid m/Z mass peak areas (referred to as peak areas), defined as the area of the peak calculated from its geometric m/Z center, were exported into Excel 2013, and triplicate peak areas at each m/Z value were averaged for each serum sample.
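The two data-reduction steps described above (local normalization in 10 m/Z segments and averaging of triplicate peak areas) can be sketched in a few lines. This is only an illustration: the text does not state how each segment was normalized, so dividing by the segment total is an assumption, and the function names and the dictionary layout (unit m/Z mapped to intensity or peak area) are hypothetical.

```python
from statistics import mean


def normalize_local(spectrum, lo=400, hi=2000, width=10):
    """Scale intensities so that each `width`-m/Z segment sums to 1.

    ASSUMPTION: the source only says data were "locally normalized in
    segments of 10 m/Z"; normalizing to the segment total is one plausible
    reading, not the documented procedure.
    """
    out = {}
    for start in range(lo, hi + 1, width):
        window = [mz for mz in range(start, start + width) if mz in spectrum]
        total = sum(spectrum[mz] for mz in window)
        for mz in window:
            out[mz] = spectrum[mz] / total if total > 0 else 0.0
    return out


def average_triplicates(replicates):
    """Average the peak area at each unit m/Z across replicate injections."""
    shared = set(replicates[0]).intersection(*replicates[1:])
    return {mz: mean([rep[mz] for rep in replicates]) for mz in sorted(shared)}
```

For instance, a spectrum with intensities 2.0 and 6.0 in the 400-409 segment would be rescaled to 0.25 and 0.75 under this assumed scheme.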

An ESI-Single Quadrupole instrument (Advion, Inc.) was also used to analyze the sera. This MS instrument uses a different mass analyzer with reduced m/Z range. Daily calibration with Agilent ESI Tuning Mix (G242A) diluted 1:4 with 100% acetonitrile on peaks of 188.09, 322.05, 622.03, and 922.01 m/Z was performed. Fluid solvent flow of 0.23 µL/min was provided by a Harvard Apparatus Pump 11 Elite equipped with a Hamilton 250 microliter gastight syringe. There was no gas flow provided. General modifications made to the standard Advion system and set-up included: Advion Data Express version 3.3.5.2; the tip was identical to that used in the Advantage LCQ except that the voltage was supplied through an M-572 IDEX Health & Science conductive MicroUnion Assembly. All solvents were HPLC grade and purchased from ThermoFisher Scientific. All acquisition and calibration were performed with the same voltages and flow rate as the Advantage LCQ serum analysis. For data analysis of Advion samples, a 15-minute averaged mass spectrum (150-1200 m/Z data range) was extracted for each of 3 injections of each patient sample.

2.4 Statistical and quantitative analysis

Peak areas were analyzed with a nested leave-one-[serum sample]-out cross-validation (LOOCV) protocol to mitigate “over-fitting” (Guan et al., 2009; Hocker et al., 2015; Ransohoff, 2004). Fig. 1A illustrates the general approach for comparing two study groups. First, all peak areas of one subject (in either group) are taken out of the database (“left-out” serum sample). Second, the difference of the means of peak area at each m/Z value for subjects “left-in” the two compared groups is analyzed with a Student’s t-test (one-tailed, unequal variance; Hocker et al., 2015) at an alpha value of 0.05. For each statistically significant peak area, the mid-point between these two means is used as a Peak Classification Value (PCV) to classify the left-out sample. If the peak area value of the left-out sample is above the PCV, it is allocated to the study group with the highest mean peak area at this m/Z value. Otherwise, it is allocated to the other group. As an example, Fig. 1B illustrates this classification procedure for 10 differentially expressed peak areas observed between 650 and 720 m/Z when one NCC sample is left out and 75 NCC (solid line) and 29 EUE (dotted line) samples are left in. The peak area at 670 m/Z is categorized as an “NCC” peak area and the one at 689 m/Z as an “EUE” peak area. If the left-out sample had a peak area of 12 at 670 m/Z (> PCV), it would be classified as an NCC peak area, but if its value was 8 (< PCV), it would be classified as an EUE peak area, and so on for all 10 peaks in Fig. 1B. This process is repeated sequentially for all significant peak areas between 400 and 2000 m/Z and until all samples have been left out and compared to the remaining left-in samples.

Each left-out sample is scored as the number of significant peak areas assigned to a specific group divided by the number of all significant peak areas in that group. We refer to this score as the % Total Group LOOCV classified peak areas, abbreviated as % Total Group LOOCV.
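The leave-one-out procedure and the resulting % Total Group LOOCV score can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names and data layout (one dictionary of m/Z to peak area per serum sample) are hypothetical, the one-tailed test is taken in the direction of the observed difference, the t-distribution tail is obtained by simple numerical integration, and the score denominator is assumed to be all significant peak areas evaluated for the left-out sample.

```python
import math
from statistics import mean, variance


def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t) of Student's t via Simpson's rule."""
    if t < 0:
        return 1.0 - t_sf(-t, df, steps)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 0.5 - total * h / 3)


def welch_one_tailed_p(a, b):
    """One-tailed, unequal-variance t-test in the observed direction."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    if va + vb == 0:
        return 1.0 if mean(a) == mean(b) else 0.0
    t = abs(mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_sf(t, df)


def loocv_scores(group_a, group_b, alpha=0.05):
    """Return (true_label, % of significant peaks assigned to A) per sample.

    Peaks are re-selected inside every fold using only the left-in samples.
    """
    labelled = [("A", s) for s in group_a] + [("B", s) for s in group_b]
    results = []
    for i, (label, left_out) in enumerate(labelled):
        rest = labelled[:i] + labelled[i + 1:]
        in_a = [s for lab, s in rest if lab == "A"]
        in_b = [s for lab, s in rest if lab == "B"]
        n_sig = hits_a = 0
        for mz in left_out:
            xa = [s[mz] for s in in_a]
            xb = [s[mz] for s in in_b]
            if welch_one_tailed_p(xa, xb) >= alpha:
                continue  # not a significant peak area in this fold
            ma, mb = mean(xa), mean(xb)
            pcv = (ma + mb) / 2  # Peak Classification Value (mid-point)
            high, low = ("A", "B") if ma > mb else ("B", "A")
            assigned = high if left_out[mz] > pcv else low
            n_sig += 1
            hits_a += assigned == "A"
        pct_a = 100.0 * hits_a / n_sig if n_sig else float("nan")
        results.append((label, pct_a))
    return results
```

Calling `loocv_scores(ncc_samples, eue_samples)` on two lists of per-sample dictionaries would return, for every left-out sample, its true label and a score analogous to the % Total NCC LOOCV; the variable names in this call are placeholders.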

The overall ability of the LOOCV approach to correctly classify subjects is determined by comparing the mean % Total Group LOOCV (for example mean % Total NCC LOOCV) between subjects from two study groups (for example, NCC and EUE). The p-value of the difference in the means is determined using a Student’s t-test with unequal variance.

2.5 Estimating the sensitivity and specificity of the ESI-MS approach to classify subjects

Means and standard deviations (SD) of the % Total Group LOOCV used to estimate the p-value of the difference between two study groups were used to estimate the sensitivity and specificity of the ESI-MS LOOCV to correctly classify subjects. Cohen’s d effect size values are calculated from the % LOOCV means and standard deviations of the two groups being compared to gauge the magnitude of the difference observed (Cohen, 1988; Soper, 2018). A Cohen’s d value of 0.8 and above is interpreted as a large effect size (Cohen, 1988). The observed Cohen’s d values were then combined with the observed means and standard deviations of the two groups compared to estimate the statistical power for each comparison conducted, as described by Soper (Soper, 2018). Classification of subjects into the NCC or EUE groups is used as an example, with the assumption that the mean % Total NCC LOOCV is larger for the NCC group. First, a scale factor is calculated as follows:

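$$\text{scale factor} = \frac{\text{Mean}_{\text{NCC}}(\%\ \text{Total NCC LOOCV}) - \text{Mean}_{\text{EUE}}(\%\ \text{Total NCC LOOCV})}{\text{SD}_{\text{NCC}}(\%\ \text{Total NCC LOOCV}) + \text{SD}_{\text{EUE}}(\%\ \text{Total NCC LOOCV})}$$

Second, the cut-off threshold value to classify samples into the NCC or EUE groups is obtained as follows:

$$\text{cut-off threshold} = \text{Mean}_{\text{NCC}}(\%\ \text{Total NCC LOOCV}) - \text{SD}_{\text{NCC}}(\%\ \text{Total NCC LOOCV}) \times \text{scale factor}$$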

Each subject’s % Total NCC LOOCV is then compared to the cut-off threshold and if above, the subject is classified with the NCC group, otherwise, the subject is classified in the EUE group. Taking NCC as, for example, the “infection” state and EUE as the “non-infection” state, this approach results in each sample being either correctly classified as NCC (True Positive or TP) or EUE (True Negative or TN) or being wrongly classified as EUE (False Negative or FN) or NCC (False Positive or FP). The sensitivity (Se = TP/(TP+FN)) and specificity (Sp = TN/(TN+FP)) are then determined and the 95% confidence interval (95%CI) estimated using a binomial distribution.
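As a worked illustration of this section, the sketch below computes Cohen's d, a cut-off threshold, sensitivity and specificity, and an exact binomial confidence interval from two lists of % Total NCC LOOCV scores. Several choices are assumptions rather than details confirmed by the text: Cohen's d uses a pooled SD, the scale factor is taken as the difference in group means divided by the sum of the group SDs (the form under which an NCC-side and an EUE-side threshold coincide), and the binomial interval is the Clopper-Pearson interval. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                       / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled


def cutoff_threshold(a, b):
    """ASSUMED form: mean_a - sd_a * (mean_a - mean_b) / (sd_a + sd_b)."""
    scale = (mean(a) - mean(b)) / (stdev(a) + stdev(b))
    return mean(a) - stdev(a) * scale


def sens_spec(scores_pos, scores_neg, cutoff):
    """Scores above the cut-off are classified with the 'positive' group."""
    tp = sum(s > cutoff for s in scores_pos)
    tn = sum(s <= cutoff for s in scores_neg)
    return tp / len(scores_pos), tn / len(scores_neg)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))


def _bisect(f, target, increasing):
    """Solve f(p) = target on [0, 1] for a monotone f."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n (assumed method)."""
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, True)
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper
```

With equal SDs in the two groups, the assumed threshold reduces to the mid-point between the two group means; when the SDs differ, it shifts toward the group with the smaller spread.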


2.6 Randomizing subject allocation to assess the potential for over-fitting

Sample randomization was used to assess “over-fitting” (Baker et al., 2002). A randomized database (RND) was created by randomizing each sample (Fig. 1A) to one of two study groups while maintaining the original number of subjects, gender, and age group distribution in each group. The nested LOOCV approach described above is applied to the RND. The number of significant peak areas selected using the original dataset determines the number of peak areas selected in the RND. The resulting classification p-values are expected to be either non-significant or considerably larger than that obtained with the original dataset when good discrimination between groups is present. The cut-off thresholds to classify the randomized subjects are determined using the scale factor estimated with the original database. This results in different cut-off threshold values for the two groups being compared (Fig. 2), which in turn means that a subject randomized to the NCC group can have a % Total NCC LOOCV value which is both above the cut-off threshold for NCC (TP) and below the cut-off value for EUE (FN), resulting in this randomized subject being simultaneously classified in two groups.
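One way to randomize group membership while keeping the number of subjects and the gender and age-group distribution of each group unchanged is to shuffle the group labels within each gender-by-age stratum. The sketch below illustrates that idea; the text does not describe the actual randomization mechanics, so the stratified shuffle, the field names (`id`, `group`, `sex`, `age_group`) and the function name are all assumptions.

```python
import random
from collections import defaultdict


def randomize_groups(subjects, seed=0):
    """Return {subject id: randomized group label}.

    Labels are permuted only among subjects sharing the same sex and age
    group, so each group keeps its size and its sex/age composition.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in subjects:
        strata[(s["sex"], s["age_group"])].append(s)
    assignment = {}
    for members in strata.values():
        labels = [s["group"] for s in members]
        rng.shuffle(labels)  # permute labels inside this stratum only
        for s, label in zip(members, labels):
            assignment[s["id"]] = label
    return assignment
```

The randomized labels can then be passed through the same LOOCV analysis as the original labels, so that any discrimination that survives randomization points to over-fitting rather than to disease-related signal.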


2.7 Analysis of data from members of a smaller sized group

When small sized groups are analyzed, two larger groups are used to identify significant peak areas and corresponding PCVs as well as the % Total Group LOOCV cut-off thresholds. Each subject of the small group, referred to as the left-out “blinded sample group”, is classified at each significant peak area to obtain their % Total Group LOOCV and further classified according to the % Total Group LOOCV cut-off threshold. For the RND, subjects are randomized into three groups, including the small sized group, and then treated as described above.
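Scoring a blinded sample against PCVs derived from two larger groups can be sketched as follows. The PCV table layout (m/Z mapped to a PCV and the label of the group with the higher mean peak area), the group labels "A" and "B", and the function names are hypothetical and serve only to illustrate the classification step.

```python
def score_blinded(sample, pcv_table):
    """Return the % of significant peak areas assigned to group 'A'.

    pcv_table maps m/Z -> (pcv, high_group), where high_group is the label
    ('A' or 'B') of the training group with the higher mean peak area.
    """
    hits_a = 0
    for mz, (pcv, high) in pcv_table.items():
        low = "B" if high == "A" else "A"
        assigned = high if sample[mz] > pcv else low
        hits_a += assigned == "A"
    return 100.0 * hits_a / len(pcv_table)


def classify_blinded(sample, pcv_table, cutoff):
    """Assign the sample to 'A' if its score exceeds the training cut-off."""
    return "A" if score_blinded(sample, pcv_table) > cutoff else "B"
```

Because the PCVs and the cut-off come only from the two larger groups, the blinded samples never influence the peaks or thresholds used to classify them.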


2.8 Analysis of “left-out” data to determine the ability of the approach to classify new cases

The approach to analyze smaller groups can be applied to assess how “new” subjects classify into two groups. A set number of subjects are excluded from the LOOCV analysis and put into a “blind database”. The left-in subjects are put into a “training database” and analyzed with LOOCV to determine the cut-off threshold for classification of subjects in the blind database. A p-value for the classification of members of the blind database is estimated using the % Total Group LOOCV as described in the general approach.

3. Results


3.1 Demographics and characteristics of study groups

The socio-demographic and clinical characteristics of all recruited patients, i.e. 76 patients with NCC-associated epilepsy (29 SCG, 20 SCC and 27 MNCC), 29 with EUE, 17 with idiopathic headaches and 10 RNCC, were as given earlier (Prabhakaran et al., 2017). Overall, 44 (57%) of 76 patients with NCC had a Definite NCC diagnosis and 32 had a Probable NCC diagnosis using the criteria of Del Brutto et al. (Del Brutto et al., 2001). Among the idiopathic headache group, patients suffered from vascular and migraine headache (n=20), tension type headache (n=3) and unspecified headache (n=3). There were no statistical differences among the groups except for patients in the NCC and RNCC groups more frequently living near a pig-rearing household. Patients with MNCC were more often sero-positive for cysticercosis antigens and antibodies than those with single cysts. While all EUE patients reported two or more lifetime seizures, 9 SCG and 5 MNCC patients reported having had only one lifetime seizure. No SCC case reported only one lifetime seizure (see Supplementary Table 1).

3.2 Distinguishing NCC-associated epilepsy, EUE and idiopathic headache patients with serum mass profiling

ESI-MS distinguished patients with NCC-associated epilepsy from those with EUE and idiopathic headache (Fig. 2). The % Total NCC LOOCV clearly distinguished the NCC and EUE groups (Fig. 2A), with 75 of the 76 NCC subjects correctly classified as NCC (Se=99%) and all of the EUE subjects correctly classified (Sp=100%) (Table 1). The p-value of the classification was estimated at 2.6 × 10⁻²⁸. In contrast, the groups were not distinguished from each other using the RND, as represented by the much higher p-values and the finding that most randomized NCC cases were classified as both NCC and EUE (Fig. 2C) or as both NCC and idiopathic headache (Fig. 2D).

Similarly, NCC-associated epilepsy patients were distinguished from idiopathic headache patients with 74 of the 76 NCC subjects classified as NCC (Se=97%) and all of the idiopathic headache subjects classified as such (Sp=100%) (p-value=2.7 × 10⁻¹²; Table 1). When RND was used, all but one subject were classified as both NCC and idiopathic headache (Fig. 2D). The larger p-value associated with the classification in Fig. 2B and 2D as compared to Fig. 2A and 2C is due in part to the smaller number of idiopathic headache subjects (n=17) compared with EUE (n=29).

3.3 Different forms of NCC can be distinguished by serum mass profiling

Subjects with SCG and SCC appeared most different from each other with Se and Sp values of 100% each (p-value of 3.9 × 10⁻²⁵) (Table 1). Good discrimination was obtained among all three sub-groups (Fig. 3). The RND resulted in p-values that were all significant, indicating some degree of over-fitting; however, RND p-values were several orders of magnitude larger than p-values obtained from the actual data, and discrimination with this database was poor (Supplemental Data Fig. S1).

Next, patients with only active NCC lesions (29 SCG and 11 MNCC) or only calcified lesions (20 SCC and 10 MNCC) were compared to those with EUE. MNCC subjects (n=6) with both active and calcified lesions were not analyzed. Patients with EUE were distinct from patients with calcified NCC (p-value=1.4 × 10⁻²⁵) and from active NCC patients (p-value=8.2 × 10⁻¹⁸) (Fig. 4A and 4B). Moreover, active NCC patients were also distinct from those with calcified NCC (Fig. 4C), with 38 out of 40 patients with active NCC (Se=95%) and 28 out of 30 with calcified NCC (Sp=93%) being correctly classified (Table 1, p-value=1.6 × 10⁻¹⁹). The RND showed much larger p-values, although the comparison between active NCC and EUE may have slight over-fitting (p-value=0.02). However, the distinction among groups using the RND was poor with most subjects simultaneously classified in two groups (Supplemental Data Fig. S2).

3.4 Analysis of NCC patients with and without brain edema

Our study population included 48 NCC patients with edema, but edema was not evenly distributed among the types of lesions (Suppl Table 1). To prevent the analysis from being overly influenced by the types of lesions, subjects were selected to balance the number of subjects with and without edema in each study sub-group and to frequency match for age and sex. Results shown in Fig. 4D illustrate discrimination between NCC patients with (n=20) and without (n=20) edema (p-value=1.8 × 10⁻¹⁹). The % Total edema LOOCV cut-off threshold correctly classified the 40 NCC cases evaluated (Se=100%; Sp=100%) (Table 1) whereas the RND yielded a p-value of 0.02 and poor discrimination (Fig. 4D).

3.5 Assessing the classification of Recovered NCC (RNCC) with the other study groups

The 10 RNCC patients were analyzed as a left-out blinded sample group. When compared with the idiopathic headache and EUE patients, RNCC patients were best differentiated from the idiopathic headache group (p-value=8.0 × 10⁻¹⁰), and appeared more similar to the EUE group, although the p-value for the latter comparison was significant (Fig. 5A; p-value=0.002). In contrast, RNCC patients were indistinguishable from the NCC group (p-value=0.1) while remaining distinctly different from the EUE group (Fig. 5B, p-value=4.1 × 10⁻⁷). Collectively these data suggest that the RNCC mass peak profiles were more similar to NCC patients with visible lesions than to NCC-free subjects. Fig. 5C suggests more similarity between RNCC and SCG (p-value=2.0 × 10⁻⁶) than between RNCC and SCC patients (p-value=3.3 × 10⁻¹¹). Even when testing RNCC sera against the more complicated relationship between active and calcified NCC patients (Fig. 5D), the RNCC profiles indicated higher similarity to those with active lesions (p-value=0.045) than with calcified lesions (p-value=1.1 × 10⁻⁵).

All comparisons presented above showed Cohen’s d values well above 0.8, a value considered to indicate a large difference. The observed data suggested that the power of our analyses was above 90% for all comparisons. This suggests that our sample size was sufficient to observe the large differences which we found between groups (Table 1).

3.6 Assessing the classification of “blinded” left-out samples

The left-in training dataset used to determine the group cut-off threshold when comparing the NCC to the EUE group is illustrated in Fig. 6A while Fig. 6B shows how 28 NCC and five EUE left-out subjects were classified. All five EUE were classified as such while 21 of 28 of the blinded NCC samples were classified correctly, for an estimated sensitivity of 75%. Fig. 6D exhibits a similar blind analysis of five active and five calcified NCC patient serum samples, tested against their training set (Fig. 6C). Nine out of 10 samples were identified correctly with a sub-group discriminatory p-value of 10⁻⁴.

3.7 Performance of the Advion instrument

The Advion instrument performed reasonably well at distinguishing subjects in the NCC group from those in the EUE group (Suppl Fig. S3A, p-value=1.0 × 10⁻¹⁴) and those with active NCC from those with calcified NCC (Suppl. Fig. S3B, p-value=9.9 × 10⁻¹⁷). However, the classification of subjects was not as good as that observed with the Advantage LCQ. Indeed, 64 out of 76 NCC (84.2%) and 24 out of 29 EUE (82.8%) were classified as such. A similar performance was observed in classifying the active NCC patients (40/46 or 87.0%) as compared to the calcified NCC patients (27/30 or 90%). All p-values for the RND were non-significant, suggesting that over-fitting was not an issue here. These results suggest that even a less accurate and lower resolution instrument with reduced m/Z range can detect enough mass spectrum signal differences between these groups, strengthening our conclusion that there are some biomolecules in the serum which differ among the study groups and that could, if identified, help in the diagnosis of NCC-associated epilepsy and of NCC lesions.

4. Discussion

This study reports an initial step towards developing minimally invasive and low-cost aids to diagnose NCC-associated epilepsy based on biomolecules identified from mass spectra. We used a LOOCV method combined with randomization of subjects to limit over-fitting of high-dimensional mass peak data. All comparisons showed very good discrimination among groups, whereas poor discrimination was observed with the RND, supporting the hypothesis that disease-specific perturbations contributed to measurable differences in serum (Hocker et al., 2011a; Hocker et al., 2011b). These observations were further supported by similar results using a different instrument, the Advion CMS, which employs a different type of mass analyzer, albeit with a reduced m/Z range (see Supplemental material and Suppl. Data Figure S3).
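The logic of contrasting true groupings with randomized groupings can be illustrated with a simple label-permutation check. The sketch below is not the RND procedure of this study (which re-ran the LOOCV analysis on subjects randomized to groups); it is a simplified analogue in which group labels are shuffled and the difference in mean scores is recomputed. All names and the toy scores are hypothetical.

```python
import random
import statistics


def mean_difference(scores, labels, group1, group2):
    """Difference in mean score between two labelled groups."""
    g1 = [s for s, lab in zip(scores, labels) if lab == group1]
    g2 = [s for s, lab in zip(scores, labels) if lab == group2]
    return statistics.mean(g1) - statistics.mean(g2)


def permutation_p_value(scores, labels, group1, group2, n_shuffles=1000, seed=0):
    """Share of random relabelings whose mean difference is at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(mean_difference(scores, labels, group1, group2))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # group sizes are preserved, membership is random
        if abs(mean_difference(scores, shuffled, group1, group2)) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


# Toy scores (hypothetical): six subjects per group, clearly separated.
toy_scores = [50, 52, 54, 51, 53, 55, 35, 37, 36, 38, 34, 39]
toy_labels = ["NCC"] * 6 + ["EUE"] * 6
print(permutation_p_value(toy_scores, toy_labels, "NCC", "EUE"))
```

When the true grouping carries signal, random relabelings rarely reproduce the observed separation, which mirrors the pattern of highly significant actual comparisons and non-significant RND comparisons described above.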

Serum mass peak profiles are hypothesized to result from tissue shedding and secretion of biomolecules into the bloodstream (Hocker et al., 2017; Hocker et al., 2015). The small spectral masses of 500-1200 m/Z that were analyzed comprise a lower mass peptide "serome" and likely result from differential host tissue/organ exoprotease activities and other cell/tissue signaling activities (Villanueva et al., 2006). Possible mechanisms yielding differences due to different pathologies could involve "alarmin"-like molecules shed or secreted by differentially damaged/altered cells which could trigger downstream responses in other cells (Bianchi, 2007).

Differentiating between seizures due to EUE or NCC is clinically relevant and important for treatment, as is the knowledge of the presence of brain edema. Using serum mass profiling, differences between different NCC lesions and between those with or without edema were evident, despite all NCC patients having seizures. An interesting finding was that the serum mass profile of RNCC patients segregated with NCC patients, in particular with those with SCG, rather than with EUE patients. However, RNCC patients segregated with EUE patients rather than with seizure-free idiopathic headache subjects. These results suggest that novel tests based on biomolecules corresponding to the mass peak areas showing differences could guide therapeutic decisions downstream of a diagnosis of NCC.

Our blinded analyses showed promise that biological differences between groups could be helpful in identifying new patients. These results are limited by the potential for data over-fitting and by the lack of identification of the molecular composition of the key discriminating peaks. Although the RND approach could demonstrate that age and gender were unlikely to be confounders, it is possible that other variables could have confounded the observed association. However, it would be very difficult if not impossible to account for all potential confounders in a LOOCV model such as the one used here. In addition, this study was meant to be the first step in exploring the possibility of using mass spectrometry as a tool to differentiate among patients with different lesions and symptoms. However, while the comparison of the NCC group with the idiopathic headache group could have been subject to confounding due to the imbalanced distribution of several variables, other major comparisons, such as those between SCG and MNCC, SCC and SCG, and MNCC and SCC, were conducted among patients with similar distributions of these same potential confounders. Yet, the difference between the p-values of the study group comparisons (which were all highly significant) and those of the RND comparisons (which were much higher) was similar when the NCC sub-groups were compared as when the NCC group was compared to the idiopathic headache group. These observations suggest that confounders may play minimal roles in these group separations. Furthermore, the possibility of over-fitting was mitigated by RND analysis and by performing a blinded analysis of patient samples against a training database. All comparisons showed good power as suggested by the Cohen's d. In addition, analyses are underway to determine the composition of disease-discriminating mass peaks more completely and gain a better understanding of the molecules and mechanisms that differentiate the clinical groups.

Acknowledgments

We wish to thank all participants for their time and willingness to take part in this study. This work was supported by the National Institute of Neurological Disorders and Stroke in the U.S. [R21NS077466] and by the Department of Biotechnology in India [BT/MB/BRCP/06/2011] under the U.S.-India Bilateral Brain Research Collaborative Partnerships (U.S.-India BRCP). Further support was received from the National Institute of Neurological Disorders and Stroke and the Fogarty International Center [R01NS098891] under the Global Brain and Nervous System Disorders Research Across the Lifespan program.

Table 1: Estimated sensitivity and specificity values (95% CI) of the % Total Group LOOCV mass peak areas to classify subjects into their appropriate groups using the actual and randomized databases.

| Group 1 | Group 2 | Group 1 % LOOCV mean (SD) | Group 2 % LOOCV mean (SD) | TP (a) | TN (b) | Sensitivity (c) (95% CI) | Specificity (d) (95% CI) | p-value of the classifications (actual data) | p-value of the classifications (randomized dataset) | Cohen's d of the classifications |
|---|---|---|---|---|---|---|---|---|---|---|
| NCC | EUE | 52.1 (4.2) | 37.0 (3.4) | 75 | 29 | 99 (93; 100) | 100 (88; 100) | 2.6 × 10⁻²⁸ | 0.22 | 3.95 |
| NCC | Idiopathic headache | 38.3 (5.0) | 18.1 (5.7) | 74 | 16 | 97 (91; 100) | 94 (71; 100) | 2.7 × 10⁻¹² | 0.08 | 3.76 |
| Idiopathic headache | EUE | 51.7 (4.5) | 35.1 (3.4) | 17 | 29 | 100 (88; 100) | 100 (88; 100) | 8.2 × 10⁻¹⁸ | 0.12 | 4.16 |
| SCG | SCC | 49.3 (4.3) | 26.6 (3.4) | 29 | 20 | 100 (88; 100) | 100 (83; 100) | 3.9 × 10⁻²⁵ | 0.03 | 5.85 |
| SCC | MNCC | 63.7 (40.6) | 52.5 (2.9) | 19 | 24 | 95 (75; 100) | 89 (71; 98) | 6.2 × 10⁻¹¹ | 0.02 | 2.91 |
| MNCC | SCG | 67.3 (4.3) | 46.5 (5.2) | 26 | 29 | 96 (81; 100) | 100 (88; 100) | 1.1 × 10⁻²² | 0.04 | 4.35 |
| Active NCC | EUE | 56.3 (5.8) | 37.7 (6.5) | 37 | 27 | 93 (77; 98) | 93 (77; 99) | 8.2 × 10⁻¹⁸ | 0.02 | 3.01 |
| Calcified NCC | EUE | 65.5 (4.3) | 46.1 (4.0) | 30 | 28 | 100 (88; 100) | 97 (82; 100) | 1.4 × 10⁻²⁵ | 0.28 | 4.67 |
| Active NCC | Calcified NCC | 54.2 (5.7) | 34.5 (6.5) | 38 | 28 | 95 (83; 99) | 93 (78; 99) | 1.6 × 10⁻¹⁹ | 0.14 | 3.27 |
| NCC with edema | NCC without edema | 59.2 (4.2) | 34.2 (5.0) | 20 | 20 | 100 (83; 100) | 100 (83; 100) | 1.8 × 10⁻¹⁹ | 0.02 | 5.41 |

(a) TP: Number of subjects truly from Group 1 classified as Group 1

(b) TN: Number of subjects truly from Group 2 classified as Group 2

(c) Sensitivity: Proportion of subjects truly from Group 1 classified as Group 1

(d) Specificity: Proportion of subjects truly from Group 2 classified as Group 2

References

Baker, S.G., Kramer, B.S., Srivastava, S., 2002. Markers for early detection of cancer: Statistical guidelines for nested case-control studies. BMC Medical Research Methodology 2, 4-4.

Bianchi, M.E., 2007. DAMPs, PAMPs and alarmins: all we need to know about danger. Journal of Leukocyte Biology 81, 1-5.

Carpio, A., Fleury, A., Romo, M.L., Abraham, R., Fandino, J., Duran, J.C., Cardenas, G., Moncayo, J., Leite Rodrigues, C., San-Juan, D., Serrano-Duenas, M., Takayanagui, O., Sander, J.W., 2016. New diagnostic criteria for neurocysticercosis: Reliability and validity. Ann Neurol 80, 434-442.

Cohen, J., 1988. Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Lawrence Erlbaum Associates, Hillsdale, NJ.

Coyle, C.M., 2014. Neurocysticercosis: an update. Curr Infect Dis Rep 16, 437.

Del Brutto, O.H., 2012. Diagnostic criteria for neurocysticercosis, revisited. Pathogens and Global Health 106, 299-304.

Del Brutto, O.H., Rajshekhar, V., White, A.C., Tsang, V.C.W., Nash, T.E., Takayanagui, O.M., Schantz, P.M., Evans, C.A.W., Flisser, A., Correa, D., Botero, D., Allan, J.C., Sarti, E., Gonzalez, A.E., Gilman, R.H., García, H.H., 2001. Proposed diagnostic criteria for neurocysticercosis. Neurology 57, 177-183.

Donadeu, M., Lightowlers, M.W., Fahrion, A.S., Kessels, J., Abela-Ridder, B., 2016. Taenia solium: WHO endemicity map update. The Weekly Epidemiological Record 91, 595-599.

Fiest, K.M., Sauro, K.M., Wiebe, S., Patten, S.B., Kwon, C.S., Dykeman, J., Pringsheim, T., Lorenzetti, D.L., Jette, N., 2017. Prevalence and incidence of epilepsy: A systematic review and meta-analysis of international studies. Neurology 88, 296-303.

Fisher, R.S., Acevedo, C., Arzimanoglou, A., Bogacz, A., Cross, J.H., Elger, C.E., Engel, J., Forsgren, L., French, J.A., Glynn, M., Hesdorffer, D.C., Lee, B.I., Mathern, G.W., Moshé, S.L., Perucca, E., Scheffer, I.E., Tomson, T., Watanabe, M., Wiebe, S., 2014. ILAE Official Report: A practical clinical definition of epilepsy. Epilepsia 55, 475-482.

García, H.H., Del Brutto, O.H., 2003. Imaging findings in neurocysticercosis. Acta Tropica 87, 71-78.

Garcia, H.H., Gonzales, I., Lescano, A.G., Bustos, J.A., Pretell, E.J., Saavedra, H., Nash, T.E., The Cysticercosis Working Group in Peru, 2014. Enhanced steroid dosing reduces seizures during antiparasitic treatment for cysticercosis and early after. Epilepsia 55, 1452-1459.

Guan, W., Zhou, M., Hampton, C.Y., Benigno, B.B., Walker, L.D., Gray, A., McDonald, J.F., Fernández, F.M., 2009. Ovarian cancer detection from metabolomic liquid chromatography/mass spectrometry data by support vector machines. BMC Bioinformatics 10, 259.

Hanas, J.S., Hocker, J.R., Cheung, J.Y., Larabee, J.L., Lerner, M.R., Lightfoot, S.A., Morgan, D.L., Denson, K.D., Prejeant, K.C., Gusev, Y., Smith, B.J., Hanas, R.J., Postier, R.G., Brackett, D.J., 2008. Biomarker Identification in Human Pancreatic Cancer Sera. Pancreas 36, 61-69.

Hocker, J.R., Deb, S.J., Li, M., Lerner, M.R., Lightfoot, S.A., Quillet, A.A., Hanas, R.J., Reinersman, M., Thompson, J.L., Vu, N.T., Kupiec, T.C., Brackett, D.J., Peyton, M.D., Dubinett, S.M., Burkhart, H.M., Postier, R.G., Hanas, J.S., 2017. Serum Monitoring and Phenotype Identification of Stage I Non-Small Cell Lung Cancer Patients. Cancer Investigation 35, 573-585.

Hocker, J.R., Lerner, M.R., Mitchell, S.L., Lightfoot, S.A., Lander, T.J., Quillet, A.A., Hanas, R.J., Peyton, M.D., Postier, R.G., Brackett, D.J., Hanas, J.S., 2011a. Distinguishing early-stage pancreatic cancer patients from disease-free individuals using serum profiling. Cancer Invest 29, 173-179.

Hocker, J.R., Peyton, M.D., Lerner, M.R., Lightfoot, S.A., Hanas, R.J., Brackett, D.J., Hanas, J.S., 2011b. Distinguishing non-small cell lung adenocarcinoma patients from squamous cell carcinoma patients and control individuals using serum profiling. Cancer Invest 30, 180-188.

Hocker, J.R., Postier, R.G., Li, M., Lerner, M.R., Lightfoot, S.A., Peyton, M.D., Deb, S.J., Baker, C.M., Williams, T.L., Hanas, R.J., Stowell, D.E., Lander, T.J., Brackett, D.J., Hanas, J.S., 2015. Discriminating patients with early-stage pancreatic cancer or chronic pancreatitis using serum electrospray mass profiling. Cancer Letters 359, 314-324.

John, C.C., Carabin, H., Montano, S.M., Bangirana, P., Zunt, J.R., Peterson, P.K., 2015. Global research priorities for infections that affect the nervous system. Nature 527, S178-186.

Luo, J., Wang, W., Xi, Z., dan, C., Wang, L., Xiao, Z., Wang, X., 2014. Concentration of Soluble Adhesion Molecules in Cerebrospinal Fluid and Serum of Epilepsy Patients. Journal of Molecular Neuroscience 54, 767-773.

Nash, T.E., Garcia, H.H., 2011. Diagnosis and Treatment of Neurocysticercosis. Nature reviews. Neurology 7, 584-594.

Nash, T.E., Mahanty, S., Loeb, J.A., Theodore, W.H., Friedman, A., Sander, J.W., Singh, G., Cavalheiro, E., Del Brutto, O.H., Takayanagui, O.M., Fleury, A., Verastegui, M., Preux, P.M., Montano, S., Pretell, E.J., White, A.C., Jr., Gonzales, A.E., Gilman, R.H., Garcia, H.H., 2015. Neurocysticercosis: A natural human model of epileptogenesis. Epilepsia 56, 177-183.

Ndimubanzi, P.C., Carabin, H., Budke, C.M., Nguyen, H., Qian, Y.-J., Rainwater, E., Dickey, M., Reynolds, S., Stoner, J.A., 2010. A Systematic Review of the Frequency of Neurocyticercosis with a Focus on People with Epilepsy. PLOS Neglected Tropical Diseases 4, e870.

Newton, C.R., Garcia, H.H., 2012. Epilepsy in poor regions of the world. Lancet 380, 1193-1201.

Ngugi, A.K., Bottomley, C., Kleinschmidt, I., Sander, J.W., Newton, C.R., 2010. Estimation of the burden of active and life-time epilepsy: a meta-analytic approach. Epilepsia 51, 883-890.

Prabhakaran, V., Drevets, D.A., Ramajayam, G., Manoj, J.J., Anderson, M.P., Hanas, J.S., Rajshekhar, V., Oommen, A., Carabin, H., 2017. Comparison of monocyte gene expression among patients with neurocysticercosis-associated epilepsy, Idiopathic Epilepsy and idiopathic headaches in India. PLOS Neglected Tropical Diseases 11, e0005664.

Rajshekhar, V., Chandy, M.J., 1997. Validation of diagnostic criteria for solitary cerebral cysticercus granuloma in patients presenting with seizures. Acta Neurologica Scandinavica 96, 76-81.

Ransohoff, D.F., 2004. Rules of evidence for cancer molecular-marker discovery and validation. Nat Rev Cancer 4, 309-314.

Richter, R., Schulz-Knappe, P., Schrader, M., Standker, L., Jurgens, M., Tammen, H., Forssmann, W.G., 1999. Composition of the peptide fraction in human blood plasma: database of circulating human peptides. J Chromatogr B Biomed Sci Appl 726, 25-35.

Rodriguez, S., Wilkins, P., Dorny, P., 2012. Immunological and molecular diagnosis of cysticercosis. Pathogens and Global Health 106, 286-298.

Sako, Y., Takayanagui, O.M., Odashima, N.S., Ito, A., 2015. Comparative Study of Paired Serum and Cerebrospinal Fluid Samples from Neurocysticercosis Patients for the Detection of Specific Antibody to Taenia solium Immunodiagnostic Antigen. Tropical Medicine and Health 43, 171-176.

Soper, D., 2018. Free Statistical Calculators, 4.0 ed.

Tuck, M.K., Chan, D.W., Chia, D., Godwin, A.K., Grizzle, W.E., Krueger, K.E., Rom, W., Sanda, M., Sorbara, L., Stass, S., Wang, W., Brenner, D.E., 2009. Standard Operating Procedures for Serum and Plasma Collection: Early Detection Research Network Consensus Statement Standard Operating Procedure Integration Working Group. Journal of Proteome Research 8, 113-117.

Vachani, A., Pass, H.I., Rom, W.N., Midthun, D.E., Edell, E.S., Laviolette, M., Li, X.-J., Fong, P.-Y., Hunsucker, S.W., Hayward, C., Mazzone, P.J., Madtes, D.K., Miller, Y.E., Walker, M.G., Shi, J., Kearney, P., Fang, K.C., Massion, P.P., 2015. Validation of a Multiprotein Plasma Classifier to Identify Benign Lung Nodules. Journal of Thoracic Oncology 10, 629-637.

Vezzani, A., 2005. Inflammation and Epilepsy. Epilepsy Currents 5, 1-6.

Villanueva, J., Shaffer, D.R., Philip, J., Chaparro, C.A., Erdjument-Bromage, H., Olshen, A.B., Fleisher, M., Lilja, H., Brogi, E., Boyd, J., Sanchez-Carbayo, M., Holland, E.C., Cordon-Cardo, C., Scher, H.I., Tempst, P., 2006. Differential exoprotease activities confer tumor-specific serum peptidome patterns. The Journal of Clinical Investigation 116, 271-284.

Figure captions and legends

Figure 1. A, Flowchart of the steps taken for an electrospray ionization mass spectrometry analysis using the comparison between neurocysticercosis (NCC) and epilepsy of unknown etiology (EUE) as an example. B, Example of statistically significant different LOOCV means in normalized spectral mass peaks seen between 650 and 750 m/Z when data from 75 NCC and 29 EUE are left in while 1 NCC case is left out.

Legend: * indicates that the difference in a normalized spectral mass peak mean between the NCC and EUE left-in subjects is statistically significant. The horizontal bar indicates the median between the two means and corresponds to the LOOCV peak classification value.

Abbreviations: LOOCV: Leave One Out Cross Validation; NCC: neurocysticercosis; EUE: epilepsy of unknown etiology.
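The peak classification described in this legend can be sketched in a few lines. The example below is a simplified reading, not the study's pipeline: for each peak it takes the midpoint between the two left-in group means as the classification value, assigns the left-out subject's peak to whichever group lies on the same side of that value, and reports the percentage of peaks assigned to Group 1. The selection of statistically significant peaks is omitted, and all function names and toy peak areas are hypothetical.

```python
from statistics import mean


def percent_peaks_classified_group1(left_out, group1, group2):
    """Percent of the left-out subject's peaks that fall on Group 1's side of the cut-off."""
    hits = 0
    for k, value in enumerate(left_out):
        m1 = mean(sample[k] for sample in group1)
        m2 = mean(sample[k] for sample in group2)
        cutoff = (m1 + m2) / 2.0  # midpoint between the two left-in means
        if (value - cutoff) * (m1 - cutoff) > 0:
            hits += 1
    return 100.0 * hits / len(left_out)


def loocv_scores_group1(group1, group2):
    """Leave each Group 1 subject out in turn and score it against the left-in data."""
    scores = []
    for i, sample in enumerate(group1):
        left_in = group1[:i] + group1[i + 1:]
        scores.append(percent_peaks_classified_group1(sample, left_in, group2))
    return scores


# Toy data (hypothetical): three subjects per group, two normalized peak areas each.
ncc = [[10, 5], [12, 4], [11, 6]]
eue = [[2, 9], [4, 8], [3, 10]]
print(loocv_scores_group1(ncc, eue))
```

Plotting such per-subject percentages for both groups, and placing a group cut-off threshold between them, gives the kind of display described in the captions of Figures 2 to 6.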

Figure 2. Percent total (randomized) NCC LOOCV classified mass peaks of each study subject and of randomized subjects in relation to the cut-off thresholds used to classify subjects into one of two groups and with the p-values corresponding to the difference in the means of the two groups being compared. A, Comparison of subjects with neurocysticercosis (NCC) and with epilepsy of unknown etiology (EUE). B, Comparison of subjects with NCC and headaches. C, Comparison of subjects randomized to either the NCC or EUE groups. D, Comparison of subjects randomized to the NCC or headache groups.

Abbreviations: NCC: neurocysticercosis; EUE: epilepsy of unknown etiology; LOOCV: leave one out cross validation; RND: randomized database; SD: standard deviation.

Figure 3. Percent total group LOOCV classified mass peaks of each subject in the three NCC sub-groups in relation to the cut-off thresholds used to classify subjects into one of two groups, the p-value corresponding to the difference in the means of the two groups being compared, and the p-value obtained with the corresponding randomized database. A, Comparison of subjects with multiple neurocysticercosis (MNCC) and with single cysticercus granuloma (SCG). B, Comparison of subjects with SCG and single calcified cyst (SCC). C, Comparison of subjects with SCC and MNCC.

Abbreviations: NCC: neurocysticercosis; MNCC: multiple neurocysticercosis; SCG: single cysticercus granuloma; SCC: single calcified cyst; LOOCV: leave one out cross validation; SD: standard deviation.

Figure 4. Percent total group LOOCV classified mass peaks of each subject with calcified neurocysticercosis (NCC), active NCC, NCC with or without edema or epilepsy of unknown etiology (EUE) in relation to the cut-off thresholds used to classify subjects into one of two groups, the p-value corresponding to the difference in the means of the two groups being compared, and the p-value obtained with the corresponding randomized database. A, Comparison of subjects with calcified NCC and with EUE. B, Comparison of subjects with active NCC and with EUE. C, Comparison of subjects with active NCC and calcified NCC. D, Comparison of NCC subjects with edema* and without edema**.

Abbreviations: NCC: neurocysticercosis; LOOCV: leave one out cross validation; SD: standard deviation.

Legend: NCC with edema*: There were a total of 48 NCC subjects with edema. Among these, 6/6 multiple NCC patients with mixed lesions, 2/2 multiple NCC patients with calcified cysts only, 1/10 multiple NCC patient with active cysts only, 4/4 single calcified cyst patients, 7/26 single cysticercus granuloma patients were included in this analysis.

NCC without edema**: There were a total of 28 NCC subjects without edema. Among these, 1/1 multiple NCC patient with active cysts only, 8/8 multiple NCC patients with calcified cysts only, 8/16 single calcified cyst patients, 3/3 single cysticercus granuloma patients were included in this analysis.

Figure 5. Classification of patients with recovered NCC (RNCC) according to their percent total group LOOCV classified mass peaks, using the percent total group LOOCV classified mass peaks and cut-off thresholds obtained from subjects in other groups. A, Classification of RNCC patients using data from the idiopathic headache (headache) and epilepsy of unknown etiology (EUE) patients. B, Classification of the RNCC patients using data from the epilepsy of unknown etiology (EUE) and the NCC patients. C, Classification of the RNCC patients using data from the single cysticercus granuloma (SCG) and single calcified cyst (SCC) patients. D, Classification of the RNCC patients using data from patients with active and calcified NCC.

Abbreviations: EUE: epilepsy of unknown etiology; NCC: neurocysticercosis; RNCC: recovered neurocysticercosis; SCG: single cysticercus granuloma; SCC: single calcified cyst; LOOCV: leave one out cross validation; SD: standard deviation.

Figure 6. Classification of 28 neurocysticercosis (NCC), five epilepsy of unknown etiology (EUE), five active NCC and five calcified NCC samples taken out of the original database (blind samples) according to their percent Total group LOOCV classified mass peaks using data from the remaining samples (training set) to determine the group cut-off thresholds. A, Percent Total NCC LOOCV classification mass peaks and group cut-off threshold for classifying subjects into the NCC or EUE groups using the data from 48 NCC and 24 EUE subjects in the training dataset. B, Classification of 28 NCC and five EUE subjects not included in the training dataset according to their % Total NCC LOOCV classification mass peaks compared to the group cut-off threshold obtained in (A). C, Percent Total active NCC classification mass peaks and group cut-off threshold for classifying subjects into the active NCC or calcified NCC groups using the data from 35 active NCC and 25 calcified NCC subjects in the training dataset. D, Classification of five active NCC and five calcified NCC subjects not included in the training dataset according to their % Total NCC LOOCV classification mass peaks compared to the group cut-off threshold obtained in (C).

Supplementary Figure captions and legends

Supplementary Data Figure S1. Percent total randomized group LOOCV classified mass peaks of each subject randomized in the three NCC sub-groups in relation to the randomized cut-off thresholds used to classify subjects into one of two groups and the p-value obtained with the corresponding randomized database. A, Comparison of subjects randomized to the multiple neurocysticercosis (MNCC) and single cysticercus granuloma (SCG) sub-groups. B, Comparison of subjects randomized to the SCG and single calcified cyst (SCC) sub-groups. C, Comparison of subjects randomized to the SCC and MNCC sub-groups.

Abbreviations: NCC: neurocysticercosis; MNCC: multiple neurocysticercosis; SCG: single cysticercus granuloma; SCC: single calcified cyst; LOOCV: leave one out cross validation; RND: randomized database; SD: standard deviation.

Supplemental Data Figure S2. Percent total randomized group LOOCV classified mass peaks of each subject randomized to calcified neurocysticercosis (NCC), active NCC, NCC with or without edema or epilepsy of unknown etiology (EUE) in relation to the randomized cut-off thresholds used to classify subjects into one of two groups and the p-value obtained with the corresponding randomized database. A, Comparison of subjects randomized to calcified NCC and to EUE. B, Comparison of subjects randomized to active NCC and to EUE. C, Comparison of subjects randomized to calcified NCC and to active NCC. D, Comparison of subjects randomized to NCC with edema* or without edema**.

Abbreviations: NCC: neurocysticercosis; EUE: epilepsy of unknown etiology; LOOCV: leave one out cross validation; RND: randomized database; SD: standard deviation.

NCC with edema*: There were a total of 48 NCC subjects with edema. Among these, 6/6 multiple NCC patients with mixed lesions, 2/2 multiple NCC patients with calcified cysts only, 1/10 multiple NCC patient with active cysts only, 4/4 single calcified cyst patients, 7/26 single cysticercus granuloma patients were included in this analysis.

NCC without edema**: There were a total of 28 NCC subjects without edema. Among these, 1/1 multiple NCC patient with active cysts only, 8/8 multiple NCC patients with calcified cysts only, 8/16 single calcified cyst patients, 3/3 single cysticercus granuloma patients were included in this analysis.

Supplemental Data Figure S3. Results using the Advion desktop instrument showing the percent total (randomized) NCC LOOCV classified mass peaks of each study subject and of randomized subjects in relation to the cut-off thresholds used to classify subjects into one of two groups and with the p-values corresponding to the difference in the means of the two groups being compared. A, Comparison of subjects with neurocysticercosis (NCC) and with epilepsy of unknown etiology (EUE). B, Comparison of subjects with NCC and headaches. C, Comparison of subjects randomized to either the NCC or EUE groups. D, Comparison of subjects randomized to the NCC or headache groups.

Abbreviations: NCC: neurocysticercosis; EUE: epilepsy of unknown etiology; LOOCV: leave one out cross validation; RND: randomized database; SD: standard deviation.

[Figure 1A flowchart: subject samples (NCC patient serum, EUE patient serum); dilute; all-liquid ESI-MS sample analysis; mass peak processing prior to database analysis; subject groups (NCC vs. EUE) LOOCV data analysis; plot % of NCC classified serum mass peaks vs. patient/subject number; p-value distribution for true pathology and for random grouping. LOOCV: Leave One Out Cross Validation.]


Highlights

• Patients with NCC and epilepsy of unknown etiology were compared by serum mass profiling

• Patients with NCC, epilepsy of unknown etiology and headache had distinct spectral signals

• NCC patients with different types of lesions had distinct mass profiles

• Analysis of serum biomolecules could be used to diagnose NCC.