Comparison among probabilistic neural network, support vector machine and logistic regression for evaluating the effect of subthalamic stimulation in Parkinson disease on ground reaction force during gait


ARTICLE IN PRESS Journal of Biomechanics 43 (2010) 720–726

Contents lists available at ScienceDirect

Journal of Biomechanics journal homepage: www.elsevier.com/locate/jbiomech www.JBiomech.com

A.M.S. Muniz (a), H. Liu (b), K.E. Lyons (c), R. Pahwa (c), W. Liu (b), F.F. Nobre (a), J. Nadal (a,*)

(a) Biomedical Engineering Program, Federal University of Rio de Janeiro, COPPE, P.O. Box 68510, 21941-972, Rio de Janeiro, RJ, Brazil
(b) Department of Physical Therapy and Rehabilitation Sciences, University of Kansas Medical Center, Kansas City, KS, USA
(c) Department of Neurology, University of Kansas Medical Center, Kansas City, KS, USA

Article info

Article history: Accepted 6 October 2009

Abstract

Deep brain stimulation of the subthalamic nucleus (DBS-STN) is an approved treatment for advanced Parkinson disease (PD) patients; however, its effect on gait needs further evaluation. This study compares logistic regression (LR), probabilistic neural network (PNN) and support vector machine (SVM) classifiers for discriminating between normal and PD subjects and for assessing the effects of DBS-STN on ground reaction force (GRF) with and without medication. Gait analysis of 45 subjects (30 normal and 15 PD subjects who underwent bilateral DBS-STN) was performed. PD subjects were assessed under four test conditions: without treatment (mof-sof), with stimulation alone (mof-son), with medication alone (mon-sof), and with medication and stimulation (mon-son). Principal component (PC) analysis was applied to each of the three GRF components separately, and the broken stick test retained six PC scores from the vertical, one from the anterior–posterior and one from the medial–lateral component. Stepwise LR analysis selected the first, second and fifth vertical PC scores as input variables. Using the bootstrap approach to compare model performances for classifying GRF patterns from normal and untreated PD subjects, the first three and the fifth vertical PCs were selected as SVM input variables, while the same scores plus the first anterior–posterior PC were selected as PNN input variables. PNN performed better than LR and SVM according to the area under the receiver operating characteristic curve and the negative likelihood ratio. When evaluating treatment effects, the classifiers indicated that DBS-STN alone was more effective than medication alone, but the greatest improvements occurred with both treatments together. © 2009 Published by Elsevier Ltd.

Keywords: Parkinson disease; Deep brain stimulation; Logistic regression; Probabilistic neural network; Support vector machine; Gait analysis

1. Introduction

Parkinson disease (PD) is a neurodegenerative disorder that impairs motor function, including gait and balance. Deep brain stimulation of the subthalamic nucleus (DBS-STN) is a treatment for advanced PD patients with disabling motor fluctuations, allowing a significant reduction in dopaminergic medication (Ferrarin et al., 2005). Various studies have evaluated the effects of DBS-STN using clinical motor scores (Krack et al., 2003; Ostergaard and Sundae, 2006), while only a few have quantitatively assessed the gait of PD patients (Liu et al., 2005; Ferrarin et al., 2005). Gait speed has been shown to be the variable most affected by DBS-STN; however, it does not capture atypical waveforms and therefore does not provide enough information about the gait pattern (Schwartz and Rozumalski, 2008). Approaches that capture features of the entire waveform instead of a few parameters may improve the effectiveness of the analysis (Chester et al., 2007). Additionally, the correlations among variables must be considered to accurately evaluate the extent of gait abnormalities and to assess the changes resulting from a specific treatment (Schutte et al., 2000).

A clinical challenge is to understand the disease process as well as the outcomes of potential interventions. Logistic regression (LR) is commonly used as a linear predictive model for diagnostic and prognostic tasks. Recently, computational intelligence techniques such as artificial neural networks (ANN) and support vector machines (SVM) have played an important role in gait classification and the diagnosis of diseases (Lai et al., 2009). Studies have compared the predictive ability of LR and ANN (Dreiseitl and Ohno-Machado, 2002; Song et al., 2005). ANN modeling has been used in gait analysis for pattern recognition (Hahn et al., 2005) and for classifying normal and pathological patterns (Lafuente et al., 1997; Su and Wu, 2000). SVM has recently been used for the automated identification of gait pathologies (Begg et al., 2005; Lai et al., 2009). However, no previous study has compared LR, probabilistic neural network (PNN) and SVM in classifying gait patterns, or evaluated the effect of therapeutic interventions on the ground reaction force (GRF) of PD patients.

This study evaluated LR, PNN and SVM models for discriminating between normal and PD subjects using principal components derived from the GRF as input variables. For performance evaluation, the accuracies (ACC) and the areas under the receiver operating characteristic (ROC) curves (AUC) based on 1000 bootstrap runs of the classifiers were compared. The effects of DBS-STN on GRF with and without medication were also evaluated with all three models.

* Corresponding author. Tel.: +55 21 25628577; fax: +55 21 25628591. E-mail address: [email protected] (J. Nadal).

0021-9290/$ - see front matter © 2009 Published by Elsevier Ltd. doi:10.1016/j.jbiomech.2009.10.018

2. Materials and methods

2.1. Studied groups

The subjects (n = 45) consisted of 30 healthy controls (20 women) and 15 PD patients (three women). The healthy subjects, recruited through advertisements, had no neurological illness, degenerative condition or any general disease that might interfere with gait (Table 1). A screening questionnaire was completed to ensure that subjects were suitable for the study. The PD subjects were recruited from the Parkinson Disease and Movement Disorder Center of the University of Kansas Medical Center. All PD subjects had undergone bilateral DBS-STN and were stable when the study was conducted. Each subject signed an informed consent form approved by the local Institutional Review Board.

Table 1 Subjects' characteristics.

                               Control group    PD patients
Age (years)                    56.4 ± 8.3       50.1 ± 7.8
Mass (kg)                      50.52 ± 8.02     90.56 ± 15.7
Height (m)                     1.67 ± 0.09      1.73 ± 0.08
Duration of PD (years)         N.A.             12.2 ± 4.3
Time since surgery (months)    N.A.             15.1 ± 9.5

2.2. Experimental protocol and signal processing

For each PD subject, quantitative gait measurements were obtained on two different days. In the first session, the subject had taken the usual dose of PD medication and the stimulators were turned on. The gait assessment was first conducted with both medication and stimulation (mon-son). After the stimulators had been turned off for 30 min, the measurements were repeated (mon-sof). In the second session, the subjects had been without medication for at least 12 h. Gait analysis was first conducted with stimulation (mof-son) and repeated after 30 min without stimulation (mof-sof). Due to technical problems, some subjects did not complete all tests; therefore, 13 subjects were evaluated in the mof-sof, 12 in the mof-son, 14 in the mon-sof and 11 in the mon-son condition. Subjects from the control group were evaluated only once. The data from the controls and from the PD subjects in the mof-sof condition were used to develop the classifier models; the data from the other three PD conditions were then input to the developed models to evaluate the effect of DBS-STN in PD treatment.

Two force platforms (AMTI, USA) were mounted in series in the middle of a walkway. All subjects practiced the walking trial five times before the experiment. The subjects walked barefoot at their self-selected speed and repeated the walking trial five times. The GRFs from both force platforms were collected for 10 s at a sampling frequency of 100 Hz, filtered with a low-pass Butterworth filter (cut-off frequency 30 Hz) and normalized by subject body weight. The vertical, anterior–posterior and medial–lateral GRF components averaged over the five walking trials were interpolated with cubic splines and re-sampled to 101 points over the stance phase of each foot. Thus, 202 GRF samples (101 per foot) were analyzed for each GRF component separately.

The waveforms of each GRF component (vertical, anterior–posterior and medial–lateral) were stored in a separate matrix E with 43 rows (the 30 controls plus the 13 PD subjects in the mof-sof condition) and 202 columns (GRF samples for both right and left limbs). Principal component analysis (PCA) was applied to the 202 × 202 covariance matrix S of each E separately (Jolliffe, 2002).

2.3. Logistic regression

LR is a statistical modeling technique that estimates the probability of a dichotomous outcome as a function of a set of explanatory variables (Schumacher et al., 1996):

$$P(x) = \frac{1}{1 + e^{-\left(b_0 + \sum_{i=1}^{n} b_i \, PC_i\right)}} \qquad (1)$$

where $b_0$ is the intercept and $b_i$ is the coefficient associated with the explanatory variable $PC_i$. The maximum likelihood estimation method is used to estimate the $b$ coefficients. For the classification of subjects, the PC scores were used as the independent variables for computing the natural logarithm of the odds, i.e., of the ratio between the probabilities that an event will or will not occur (coded 1 for controls and 0 for PD subjects). The classification threshold was set to 0.5.

2.4. Probabilistic neural networks

PNN is a feedforward ANN developed by Specht (1990), in which the response to an input pattern is processed from one layer to the next, without feedback paths to previous layers. A typical PNN has four layers: input, pattern, summation and output. The input units supply the same values to all pattern units. The pattern units form a dot product between the input pattern vector $x$ and a weight vector $w_i$ ($z_i = x \cdot w_i$), which is followed by the nonlinear neuron activation function:

$$g(z_i) = \exp\left[-\frac{(w_i - x)^{\top}(w_i - x)}{2\sigma^2}\right] \qquad (2)$$

This Bayesian function takes into account the relative likelihood of events and uses a priori information to improve the prediction (Specht and Romsdahl, 1994). The summation units simply sum the inputs from the pattern units corresponding to the category from which the training patterns were selected. Repeating this procedure for each class, the un-normalized density functions $g_k(x)$, for $k = 1, 2, \ldots, K$, were estimated. The Bayesian probability that the case belongs to class $k$ is:

$$P(x \in k) = \frac{g_k(x)}{\sum_{i=1}^{K} g_i(x)} \qquad (3)$$

The output units have a competitive transfer function that picks the maximum of the probabilities and produces 1 for one class (normal subjects) and 0 for the other (PD patients).
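The scoring rules of Eqs. (1)–(3) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the function names and the toy training patterns are illustrative assumptions, the summation layer is taken to be a plain (unweighted) sum as described above, and the logistic function is written in the usual $1/(1+e^{-\eta})$ form with $\eta$ the linear predictor.

```python
import math
from typing import Dict, List, Sequence


def logistic_probability(b0: float, b: Sequence[float], pcs: Sequence[float]) -> float:
    """Eq. (1): P(x) for the class coded 1 (controls), given PC scores."""
    eta = b0 + sum(bi * pci for bi, pci in zip(b, pcs))  # linear predictor (log-odds)
    return 1.0 / (1.0 + math.exp(-eta))


def pnn_posteriors(x: Sequence[float],
                   train: Dict[str, List[Sequence[float]]],
                   sigma2: float) -> Dict[str, float]:
    """Eqs. (2)-(3): Gaussian pattern units, per-class summation, normalization."""
    g: Dict[str, float] = {}
    for label, patterns in train.items():
        total = 0.0
        for w in patterns:  # each training pattern acts as one pattern unit w_i
            d2 = sum((wi - xi) ** 2 for wi, xi in zip(w, x))  # (w_i - x)'(w_i - x)
            total += math.exp(-d2 / (2.0 * sigma2))           # activation, Eq. (2)
        g[label] = total                                       # summation unit g_k(x)
    z = sum(g.values())
    if z == 0.0:
        raise ValueError("all pattern-unit activations underflowed; increase sigma2")
    return {label: gk / z for label, gk in g.items()}          # Eq. (3)


def pnn_classify(x: Sequence[float],
                 train: Dict[str, List[Sequence[float]]],
                 sigma2: float) -> str:
    """Competitive output layer: pick the class with the largest posterior."""
    post = pnn_posteriors(x, train, sigma2)
    return max(post, key=post.get)


if __name__ == "__main__":
    # Hypothetical 2-D PC scores, for illustration only (not study data).
    train = {"normal": [[0.0, 0.0], [0.1, -0.1]], "PD": [[3.0, 3.0], [2.9, 3.1]]}
    print(pnn_classify([0.05, 0.0], train, sigma2=0.1))    # -> normal
    print(logistic_probability(0.0, [1.0], [0.0]) >= 0.5)  # -> True (threshold 0.5)
```

Because this summation layer adds rather than averages activations, a class with more training patterns (here 30 controls versus 13 PD subjects) implicitly receives a larger prior weight; dividing each $g_k(x)$ by the number of patterns in class $k$ is a common alternative.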

2.5. Support vector machine

The SVM estimates a function for classifying data into two classes (Vapnik, 2000). Using a nonlinear transformation $\Phi$, the input vectors are mapped into a high-dimensional feature space, where a linear separation is sought; a regularization parameter C controls the trade-off between margin width and training errors (Begg et al., 2005). To construct a nonlinear support vector classifier, the inner product (x, y) is replaced by a kernel function K(x, y):

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b\right) \qquad (4)$$

where f(x) determines the class membership of x. In this study, the normal subjects were labeled as −1 and PD subjects as +1. The SVM has two layers. During the learning process, the first layer selects the basis $K(x_i, x)$, $i = 1, 2, \ldots, N$, from the given set of kernels, while the second layer constructs a linear function in the $\Phi$ space. This is equivalent to finding the optimal hyperplane in the corresponding feature space. The SVM algorithm can construct a variety of learning machines using different kernel functions.
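Eq. (4) can be evaluated directly once the support vectors, multipliers $\alpha_i$ and bias $b$ are known. The Python sketch below assumes these have already been obtained from training (it does not train an SVM). The polynomial form $(u \cdot v + c)^d$ and the Gaussian form $\exp(-\lVert u - v \rVert^2 / 2\sigma^2)$ are common parameterizations assumed here, since the text does not state them; labels follow the study's convention (−1 normal, +1 PD), and all names are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u: Vector, v: Vector, d: int = 3, c: float = 1.0) -> float:
    return (linear_kernel(u, v) + c) ** d


def gaussian_kernel(u: Vector, v: Vector, sigma: float = 1.0) -> float:
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def svm_decision(x: Vector,
                 support_vectors: Sequence[Vector],
                 alphas: Sequence[float],
                 labels: Sequence[int],
                 b: float,
                 kernel: Kernel) -> int:
    """Eq. (4): sign of sum_i alpha_i * y_i * K(x_i, x) + b (+1 PD, -1 normal)."""
    s = sum(a * y * kernel(xi, x)
            for a, y, xi in zip(alphas, labels, support_vectors)) + b
    return 1 if s >= 0.0 else -1  # a score of exactly 0 is assigned to +1 here


if __name__ == "__main__":
    # Toy problem: the hard-margin linear SVM for these two points has
    # w = (0.5, 0.5), b = 0 and alpha_1 = alpha_2 = 0.25.
    sv = [[1.0, 1.0], [-1.0, -1.0]]
    alphas, labels = [0.25, 0.25], [+1, -1]
    print(svm_decision([2.0, 2.0], sv, alphas, labels, 0.0, linear_kernel))    # -> 1
    print(svm_decision([-2.0, -1.0], sv, alphas, labels, 0.0, linear_kernel))  # -> -1
```

Swapping `linear_kernel` for `gaussian_kernel` or `polynomial_kernel` (for example via `functools.partial` to fix σ or d) changes only the kernel argument, which mirrors how the study compared kernels while keeping the decision rule fixed.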

2.6. Variable selection

The broken stick criterion (Jolliffe, 2002) was used to choose the significant PCs of the vertical, anterior–posterior and medial–lateral GRF components for the analysis. Moreover, to build a more accurate classifier model, it was necessary to evaluate which scores contributed to improving the task (Chang, 1983). In the LR model, a stepwise approach based on the Akaike information criterion (AIC) was used to select the input variables, followed by χ² tests contrasting the result with a full model including all PC scores selected by the broken stick criterion, or with subsets of variables close to the final model (Krzanowski, 1998). PNN requires the selection of an optimal value for the width (σ²) of the radial basis function. To optimize σ² and select the relevant input variables, PNN models were trained and evaluated by the bootstrap method, considering each possible combination of scores and varying σ² in the interval [0.1, 1]. Briefly, bootstrapping generates training sets by drawing samples with replacement from the original data set. For the SVM, the kernel function, the number of PC scores used as input variables and the parameter C were evaluated with the same bootstrap approach applied to the PNN. The input set was determined by considering each possible combination of scores. All SVM models were trained over the range C = {0.1, 1, 10, 100, 1000} using linear, polynomial and Gaussian kernels (Lai et al., 2009).
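The broken stick rule compares the proportion of variance carried by the k-th largest eigenvalue of S with the expected length of the k-th longest piece of a unit stick broken at random into p pieces, $b_k = \frac{1}{p}\sum_{i=k}^{p} \frac{1}{i}$. A minimal Python sketch follows. Retaining components only while the criterion keeps holding (stopping at the first failure) is one common convention and an assumption here, as is the hypothetical helper `bootstrap_split`, which illustrates the in-bag/out-of-bag partition that bootstrap model selection relies on.

```python
import random
from typing import List, Sequence, Tuple


def broken_stick_thresholds(p: int) -> List[float]:
    """Expected proportions b_k = (1/p) * sum_{i=k}^{p} 1/i, for k = 1..p."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def broken_stick_count(eigenvalues: Sequence[float]) -> int:
    """Number of leading PCs whose variance proportion exceeds b_k."""
    lam = sorted(eigenvalues, reverse=True)
    total = sum(lam)
    count = 0
    for value, b_k in zip(lam, broken_stick_thresholds(len(lam))):
        if value / total > b_k:
            count += 1
        else:
            break  # stop at the first component that fails the criterion
    return count


def bootstrap_split(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n indices with replacement (training set); the rest are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return in_bag, out_of_bag


if __name__ == "__main__":
    print(broken_stick_count([5.0, 2.0, 1.0, 0.5, 0.5]))  # -> 1
    train_idx, test_idx = bootstrap_split(43, random.Random(0))
    print(len(train_idx), len(set(train_idx) & set(test_idx)))  # -> 43 0
```

In a selection loop of the kind described above, each candidate combination of PC scores (and each σ² or C value) would be fitted on the in-bag indices and scored on the out-of-bag indices, repeated over many bootstrap draws.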


Fig. 1. Averaged vertical ground reaction force from controls (bold line) and PD (dotted line): (a) mof-sof, (b) mof-son, (c) mon-sof and (d) mon-son conditions.

Fig. 2. Averaged anterior–posterior ground reaction force from controls (bold line) and PD (dotted line): (a) mof-sof, (b) mof-son, (c) mon-sof and (d) mon-son conditions.

2.7. Performance evaluation

The models' performances were assessed using the AUC and ACC. With small sample sizes, a resampling technique such as the bootstrap is recommended to estimate classifier performance (Sahiner et al., 2008). Both ACC and AUC were computed on the samples not included in each bootstrap training sample and combined with the 0.632+ bootstrap method (Efron and Tibshirani, 1997). The predictive power of a classifier to ascertain whether a PD patient under treatment has achieved a normal gait profile was evaluated with the negative likelihood ratio (NLR), the ratio of the false-negative rate to the true-negative rate. A low NLR means that a negative (normal) classification is strong evidence that the subject does not have symptoms of the disease, here indicating that the treatment had a positive effect. To compare the LR, PNN and SVM models, the same bootstrap samples were used for developing and testing all of them. The AUC, ACC and NLR indices for each classifier represent the average and standard error over all 1000 tests with different bootstrap samples. Comparisons among models were performed using one-way ANOVA (α = 0.05) followed by the post-hoc Bonferroni test. The agreement between each pair of models was assessed with the percent agreement and Cohen's kappa coefficient, computed with the algorithm proposed by Cardillo (2007) and interpreted according to the standard criteria (Landis and Koch, 1977).
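The quantities in this subsection can be sketched in a few lines of Python. The .632+ combination below follows the form given by Efron and Tibshirani (1997) for an error rate (ACC = 1 − error), with weight $w = 0.632/(1 - 0.368R)$ and relative overfitting rate $R$; the NLR is computed as (1 − sensitivity)/specificity with PD as the positive class; and Cohen's kappa uses the standard 2 × 2 formula. Function names and toy counts are illustrative assumptions, not the study's code or data (the study used the algorithm of Cardillo (2007) for kappa).

```python
from typing import Sequence


def no_information_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """gamma for 0/1 labels: error expected if predictions ignored the labels."""
    n = len(y_true)
    p1 = sum(y_true) / n   # observed proportion of class 1
    q1 = sum(y_pred) / n   # predicted proportion of class 1
    return p1 * (1.0 - q1) + (1.0 - p1) * q1


def err_632_plus(err_train: float, err_oob: float, gamma: float) -> float:
    """.632+ estimate from resubstitution error, out-of-bag error and gamma."""
    err1 = min(err_oob, gamma)
    if err1 > err_train and gamma > err_train:
        r = (err1 - err_train) / (gamma - err_train)  # relative overfitting rate in [0, 1]
    else:
        r = 0.0
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * err_train + w * err1


def negative_likelihood_ratio(tp: int, fn: int, tn: int, fp: int) -> float:
    """NLR = false-negative rate / true-negative rate = (1 - sens) / spec."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (1.0 - sensitivity) / specificity


def cohen_kappa_2x2(a: int, b: int, c: int, d: int) -> float:
    """Kappa for a 2x2 agreement table [[a, b], [c, d]] between two classifiers."""
    n = a + b + c + d
    p_observed = (a + d) / n
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    print(round(err_632_plus(0.0, 0.1, 0.5), 4))             # -> 0.0682
    print(round(negative_likelihood_ratio(9, 1, 19, 1), 4))  # -> 0.1053
    print(round(cohen_kappa_2x2(20, 5, 10, 15), 4))          # -> 0.4
```

When the training error already equals the out-of-bag error (no overfitting), R = 0 and the estimate reduces to the plain .632 weighting of the two errors.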

2.8. Effect of treatments

To quantify the effect of the treatments, the PC scores of the PD subjects in the mon-sof, mof-son and mon-son conditions were calculated and used as inputs to the developed classifiers.


Fig. 3. Averaged medial–lateral ground reaction force from controls (bold line) and PD (dotted line): (a) mof-sof, (b) mof-son, (c) mon-sof and (d) mon-son conditions.

Fig. 4. (a) Area under ROC (AUC) and accuracy (ACC) of probabilistic neural network classifier; and (b) average and 95% confidence bands for AUC and ACC for one thousand bootstrap training and testing samples as a function of the spread constants for the selected inputs.

3. Results

Visual inspection of the three averaged GRF components from normal and PD subjects showed the PD pattern moving toward the normal pattern (Figs. 1–3), especially when both medication and stimulation were used (Figs. 1d, 2d, 3d). The broken stick test indicated that the first six PCs from the vertical GRF, the first PC from the anterior–posterior GRF and the first PC from the medial–lateral GRF, which respectively explained 91.1%, 69.1% and 62.1% of the total variation, should be considered in the analysis. The stepwise selection of LR variables identified the first, second and fifth PC scores from the vertical GRF as the significant inputs. A χ² test contrasting this model with the LR model including all eight scores gave P > 0.05, indicating a non-significant difference between the models, which validates the selected PC scores as input variables for the LR classifier. The highest PNN performance occurred with the first three and the fifth vertical PCs and the first anterior–posterior PC (Fig. 4a). The model including these five variables was therefore retained for the comparison among models and for the treatment assessment. The AUC showed high average values over the whole range of spread constants, reaching its maximum at σ² = 0.8, while the ACC reached its maximum at σ² = 0.1 (Fig. 4b). As a compromise between AUC and ACC, a spread of 0.1 was chosen. At this spread constant, the AUC (± standard error) and the ACC were 0.98 (±0.019) and 0.92 (±0.058), respectively.

The best SVM performance was obtained for the linear kernel with C = 1 and four inputs (the first three and the fifth vertical GRF PC scores), for both AUC and ACC criteria (Table 2). The resulting values of AUC, ACC and NLR for the 1000 bootstrap samples indicate that all methods presented high performance indexes (Fig. 5). The ANOVA showed a significant difference among these values (Table 3), indicating that PNN presented better AUC and NLR and the SVM a better ACC. According to the standard criteria for interpreting Cohen’s kappa coefficients, the classification agreement was substantial in the comparison between PNN and SVM, and just moderate between LR and PNN and between LR and SVM (Table 4).

Table 2 The best SVM bootstrap training and test set area under ROC (AUC), accuracy (ACC) and the selected input scores.

Kernel                C     AUC      ACC      Variables
Linear                1     0.9463   0.9460   1, 2, 3 and 5 (a)
Gaussian (σ = 1)      10    0.8907   0.9115   1, 4 and 7 (a)
Polynomial (d = 3)    1     0.8979   0.9291   1, 5 and 7 (a)

(a) Variables 1–5 refer to the first to fifth PC scores of the vertical GRF, while variable 7 is the first PC score of the anterior–posterior horizontal GRF.

Table 3 Average ± standard error and P value for AUC, ACC and NLR.

       LR                 PNN                SVM                P
AUC    0.9813 ± 0.0314    0.9867 ± 0.0195    0.9463 ± 0.0622    < 0.001 *,+,†
ACC    0.9185 ± 0.0711    0.9273 ± 0.0729    0.9460 ± 0.0556    < 0.001 *,+,†
NLR    0.1216 ± 0.1761    0.0354 ± 0.0942    0.0569 ± 0.1172    < 0.001 *,+,†

Post-hoc P < 0.05: * LR different from PNN; + LR different from SVM; † PNN different from SVM.

Table 4 Comparison between models using Cohen's kappa coefficient.

               Agree (%)   Disagree (%)   Cohen's kappa   Agreement
LR and PNN     70          30             0.4537          Moderate
LR and SVM     67          33             0.4127          Moderate
PNN and SVM    86.5        13.5           0.8649          Substantial

LR, logistic regression; PNN, probabilistic neural network; SVM, support vector machine.

Fig. 5. Box plot of performance indexes of LR, PNN and SVM models obtained with 1000 bootstraps: (a) AUC; (b) ACC; and (c) NLR.


Table 5 Comparison among logistic regression (LR), probabilistic neural network (PNN) and support vector machine (SVM) models: number (%) of PD subjects classified as normal or PD in each treatment condition.

Condition   LR Normal    LR PD        PNN Normal   PNN PD       SVM Normal   SVM PD       Total
Mon-sof     4 (28.6%)    10 (71.4%)   4 (28.6%)    10 (71.4%)   2 (14.3%)    12 (85.7%)   14 (100%)
Mof-son     10 (83.3%)   2 (16.7%)    4 (33.3%)    8 (66.7%)    4 (33.3%)    8 (66.7%)    12 (100%)
Mon-son     10 (90.9%)   1 (9.1%)     7 (63.6%)    4 (36.4%)    7 (63.6%)    4 (36.4%)    11 (100%)

When assessing treatments (Table 5), the classifiers ranked the treatments in the same order, with DBS-STN alone presenting better results than medication alone, and further GRF improvements with combined treatments.

4. Discussion

This is the first study that used force platforms to assess the effects of DBS-STN on PD gait. Although most DBS-STN assessments show an improvement in gait speed (Liu et al., 2005; Ferrarin et al., 2005), none of them quantified its effect by analyzing the entire waveform. GRF is an important indicator of joint moments and muscle activities (Zajac et al., 2003), and is partially explained by the subject's velocity or cadence (Winter, 1991). Assessing its improvement with treatment through force peaks represents only a preliminary step in gait analysis (Loslever and Barbier, 1998); indeed, it does not allow a clear classification of subjects in different conditions. Conversely, PCA coefficients account for information from the entire GRF waveforms, rather than the discrete parameters used in most past studies (Li and Hamill, 2002; Hsiang and Chang, 2002), which disregard the high correlation among those parameters (Chester et al., 2007). Additionally, PCA provided a substantial dimension reduction of the classifier inputs and simplified the use of the bootstrap technique, which is recommended for small sample sizes (Sahiner et al., 2008). Not all PCs selected by the broken stick method are necessarily important for classification, since the best trade-off between variance and classification should be taken into account (Chang, 1983; Jolliffe, 2002). Dreiseitl and Ohno-Machado (2002) also recommend determining which variables are relevant to reach a good model performance. The stepwise and bootstrap approaches allowed the selection of the PCs that maximized the performance of the three tested models, using adequate criteria. The comparison using AUC and ACC also allowed the selection of the best spread value in PNN, as recommended by Hernández-Caraballo et al. (2005), as well as the best kernel and C parameter in SVM.
The bootstrap is recognized as a robust approach for estimating classifier performance with a limited sample size (Sahiner et al., 2008). According to Sahiner et al. (2008), the 0.632+ bootstrap approach provides the lowest bias with the greatest accuracy. In the present study, with 1000 bootstrap samples, the PNN presented the highest AUC and the lowest NLR values compared with LR and SVM, while SVM had the highest ACC value (Table 3). AUC has been recommended as a better performance index, since it presents higher convergence than ACC (Brandley, 1997) and represents the average sensitivity across all possible specificities (Glas et al., 2003). On the other hand, ACC is still the most commonly used index, although it depends on the prevalence of the disorder in the studied group (Glas et al., 2003). The NLR describes the discriminatory properties of negative test results. According to Deeks (2009), the NLR values obtained here correspond to strong diagnostic evidence. Thus, PNN was more rigorous than LR and SVM in accepting a pattern as normal, which agrees with other studies that compared only LR and ANN (Nguyen et al., 2002; Song et al., 2005).

Following the method of Landis and Koch (1977) for describing the strength of agreement associated with Cohen's kappa statistics (Table 4), the present results indicate that PNN and LR had moderate agreement and PNN and SVM presented substantial agreement. These findings also agree with Dhanalakshmi et al. (2009), who demonstrated that SVM and ANN can be effectively used as classification approaches. Additionally, these results are in line with other research (Begg et al., 2005; Lai et al., 2009) indicating that SVM performs well in pathological gait pattern recognition. When applied to PD treatment comparison, the three models ranked the testing conditions in the same order, indicating that DBS-STN alone was more effective than medication alone in moving PD patients toward a normal GRF pattern. The best results occurred with the combination of treatments (Table 5). Similarly, past studies (Ferrarin et al., 2005; Xie et al., 2001) have reported enhancements in gait performance with stimulation alone and further improvement when it was combined with medication. Those authors suggested a synergistic effect of DBS-STN and levodopa on PD symptoms. PNN classified fewer PD subjects as normal, showing a smaller classification difference between DBS-STN and medication alone (Table 5), in agreement with past studies (Ferrarin et al., 2005; Xie et al., 2001). Moreover, PNN had the lowest NLR; it is therefore more reliable when it classifies a PD subject as normal after the GRF pattern has improved.

5. Conclusion

The LR, PNN and SVM models presented high performance indices for classifying the GRF patterns of normal subjects and untreated PD patients. With the bootstrap approach, PNN performed best according to the AUC and NLR criteria, and SVM showed the best ACC. When evaluating the effect of treatments, the three classifiers indicated that DBS-STN alone was more effective than medication alone, with the greatest improvement under combined treatment. However, PNN was more restrictive in accepting the patients' GRF as normal.

Conflict of interest statement

None of the authors have any financial or personal relationship that could inappropriately influence the work submitted for publication.

Acknowledgements

This work was partially supported by the Brazilian Research Council (CNPq) and by the CAPES and FAPERJ foundations.


References

Brandley, A.P., 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30 (7), 1145–1159.

Begg, R.K., Palaniswami, M., Owen, B., 2005. Support vector machines for automated gait classification. IEEE Transactions on Biomedical Engineering 53 (5), 828–838.

Cardillo, G., 2007. Cohen's kappa: compute the Cohen's kappa ratio on a 2x2 matrix. http://www.mathworks.com/matlabcentral/fileexchange/15365.

Chang, W.C., 1983. On using principal components before separating a mixture of two multivariate normal distributions. Applied Statistics 32 (3), 267–275.

Chester, W.L., Tingley, M., Biden, E.N., 2007. An extended index to quantify normality of gait in children. Gait and Posture 25, 549–554.

Dhanalakshmi, P., Palanivel, S., Ramalingam, V., 2009. Classification of audio signals using SVM and RBFNN. Expert Systems with Applications 36, 6069–6075.

Deeks, J.J., 2009. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. British Medical Journal 323, 157–162.

Dreiseitl, S., Ohno-Machado, L., 2002. Logistic regression and artificial neural network classification models: a methodology review. Journal of Biomedical Informatics 35, 352–359.

Efron, B., Tibshirani, R., 1997. Improvements on cross-validation: the .632+ bootstrap method. Journal of the American Statistical Association 92 (438), 548–560.

Ferrarin, M., Rizzone, M., Bergamasco, B., Lanotte, M., Recalcati, M., Pedotti, A., Lopiano, L., 2005. Effects of bilateral subthalamic stimulation on gait kinematics and kinetics in Parkinson's disease. Experimental Brain Research 160, 517–527.

Glas, A.S., Lijmer, J.G., Prins, M.H., Bonsel, G.J., Bossuyt, P.M.M., 2003. The diagnostic odds ratio: a single indicator of test performance. Journal of Clinical Epidemiology 56, 1129–1135.

Hahn, M.E., Farley, A.M., Lin, V., Chou, L., 2005. Neural network estimation of balance control during locomotion. Journal of Biomechanics 38, 717–724.

Hernández-Caraballo, E.A., Rivas, F., Pérez, A.G., Marcó-Parra, M., 2005. Evaluation of chemometric techniques and artificial neural networks for cancer screening using Cu, Fe, Se and Zn concentrations in blood serum. Analytica Chimica Acta 533, 161–168.

Hsiang, S., Chang, C., 2002. The effect of gait speed and load carrying on the reliability of ground reaction forces. Safety Science 40, 639–657.

Jolliffe, I.T., 2002. Principal Component Analysis, second ed. Springer, New York.

Krack, P., Batir, A., Van Blercom, N., Chabardes, S., Fraix, V., Ardouin, A., et al., 2003. Five-year follow-up of bilateral stimulation of the subthalamic nucleus in advanced Parkinson's disease. The New England Journal of Medicine 340 (20), 1925–1934.

Krzanowski, W.J., 1998. An Introduction to Statistical Modelling. Arnold, London.

Lafuente, R., Belda, J.M., Sánchez-Lacuesta, J., Soler, C., Prat, J., 1997. Design and test of neural networks and statistical classifiers in computer-aided movement analysis: a case study on gait analysis. Clinical Biomechanics 13 (3), 216–229.

Lai, D.T., Levinger, P., Begg, R.K., Gilleard, W.L., Palaniswami, M., 2009. Automatic recognition of gait patterns exhibiting patellofemoral pain syndrome using a support vector machine approach. IEEE Transactions on Information Technology in Biomedicine 13 (4), 810–817.

Landis, J.R., Koch, G.G., 1977. The measurement of observer agreement for categorical data. Biometrics 33 (1), 159–174.

Li, L., Hamill, J., 2002. Characteristics of the vertical ground reaction force component prior to gait transition. Research Quarterly for Exercise and Sport 73 (3), 229–237.

Liu, W., Mcintire, K., Kim, S.H., Zhang, J., Dascalos, S., Lyons, K.E., Pahwa, R., 2005. Quantitative assessments of the effect of bilateral subthalamic stimulation on multiple aspects of sensorimotor function for patients with Parkinson's disease. Parkinsonism and Related Disorders 11, 503–508.

Loslever, P., Barbier, F., 1998. Multivariate graphical presentation for gait rehabilitation study. Gait and Posture 7, 39–44.

Nguyen, T., Malley, R., Inkelis, S., Kuppermann, N., 2002. Comparison of prediction models for adverse outcome in pediatric meningococcal disease using artificial neural network and logistic regression analyses. Journal of Clinical Epidemiology 55, 687–695.

Ostergaard, K., Sundae, N.A., 2006. Evolution of Parkinson's disease during 4 years of bilateral deep brain stimulation of the subthalamic nucleus. Movement Disorders 21 (5), 624–631.

Sahiner, B., Chan, H., Hadjiiski, L., 2008. Classifier performance prediction for computer-aided diagnosis using a limited dataset. Medical Physics 35 (4), 1559–1570.

Schumacher, M., Roßner, R., Wemer, V., 1996. Neural networks and logistic regression. Part I. Computational Statistics and Data Analysis 21, 661–682.

Schutte, L.M., Narayanan, U., Stout, J.L., Selber, P., Gage, J.R., 2000. An index for quantifying deviations from normal gait. Gait and Posture 11, 25–31.

Schwartz, M.H., Rozumalski, A., 2008. The gait deviation index: a new comprehensive index of gait pathology. Gait and Posture 28, 351–357.

Song, J.H., Venkatesh, S.S., Conant, E.A., Arger, P.H., Sehgal, C.M., 2005. Comparative analysis of logistic regression and artificial neural network for computer-aided diagnosis of breast masses. Academic Radiology 12, 487–495.

Specht, D.F., 1990. Probabilistic neural networks. Neural Networks 3, 109–118.

Specht, D., Romsdahl, H., 1994. Experience with adaptive probabilistic neural networks and adaptive general regression neural networks. Neural Networks 1203–1208.

Su, F., Wu, W., 2000. Design and testing of a genetic algorithm neural network in the assessment of gait patterns. Medical Engineering and Physics 22, 67–74.

Vapnik, V.N., 2000. The Nature of Statistical Learning Theory, second ed. Springer, New York.

Winter, D.A., 1991. The Biomechanics and Motor Control of Human Gait: Normal, Elderly and Pathological, second ed. University of Waterloo Press, Waterloo.

Xie, J., Krack, P., Benabid, A., Pollak, P., 2001. Effect of bilateral subthalamic nucleus stimulation on parkinsonian gait. Journal of Neurology 248, 1068–1072.

Zajac, F.E., Sachin, R.R., Kautz, S.A., 2003. Biomechanics and muscle coordination of human walking part II: lessons from dynamical simulations and clinical implications. Gait and Posture 17, 1–17.