Multiple kernel learning based on three discriminant features for a P300 speller BCI

Kyungae Yoon and Kiseon Kim, Neurocomputing (accepted manuscript). DOI: 10.1016/j.neucom.2016.09.053. Received 30 June 2015; revised 16 August 2016; accepted 25 September 2016.

Kyungae Yoon^a,* and Kiseon Kim^b

^a Department of NanobioMaterials and Electronics, Gwangju Institute of Science and Technology (GIST), Republic of Korea
^b School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Republic of Korea
* Corresponding author. Email addresses: [email protected] (Kyungae Yoon), [email protected] (Kiseon Kim)

Abstract

In this paper, we propose multiple kernel learning (MKL) based on three discriminant features to learn an efficient P300 classifier and thereby improve the accuracy of character recognition in a P300 speller BCI. The three discriminant features are the raw samples and two morphological features, which can complement one another in the MKL of a P300 classification. A linear kernel is established for each discriminant feature. Kernel weights differentiate the linear kernels, both to explore complementary information among the three discriminant features and to weigh the contribution of each discriminant feature in the MKL. The ℓ1 norm regularization of the kernel weights ultimately enforces an optimal discriminant feature set for the MKL of a P300 classification. The performance of the proposed method is evaluated according to the size of the three discriminant feature sets generated from dataset II of BCI competition III. Compared to an existing SVM-based classification method, the proposed method consistently obtains better or similar accuracy of character recognition, with execution times that vary with the size of the three discriminant feature sets. Furthermore, the kernel weight of the raw samples was found to be consistently more dominant than the kernel weights of the two morphological features across the variable size of the three discriminant feature sets. This finding means that the two morphological features supplement what the raw samples lack in the MKL of a P300 classification. We ultimately conclude that the proposed method is sufficiently robust to improve the accuracy of character recognition, at varying execution times, for the variable size of the three discriminant feature sets in a P300 speller BCI.

Keywords: Three discriminant features, Multiple kernel learning, Variable size of three discriminant feature sets, Accuracy of character recognition

1. Introduction

A brain-computer interface (BCI) is a direct communication pathway between a human brain and an external device [2][3][4]. A BCI can make communication possible for people suffering from motor disabilities such as spinal cord injury or amyotrophic lateral sclerosis (ALS; i.e., Lou Gehrig's disease), because it does not require any movement from the person using it [5][6]. An event-related potential (ERP) is one of the brain signals used in BCIs. The ERP is a brain response that is time-locked to the onset of an external event such as the presentation of an audio or video stimulus [7][8]; in particular, P300 appears about 300 ms after the external event [9][10]. In general, the ERP is difficult to classify because it is quite small relative to artifacts such as eye blinks, eye movements, muscle activity, and power-line noise [21]. Recently, discriminant features have been presented for learning efficient ERP classifiers, as discriminant features can represent various kinds of information for ERP classification (see Fig. 1). Discriminant features for ERP classification include morphological features such as peaks or latency periods [11][12], wavelets [13], principal component analysis (PCA) parameters [14], and autoregressive (AR) model parameters [15]. These discriminant features have been applied in classification techniques for a P300 speller BCI, with several techniques having demonstrated notable performance, including stepwise linear discriminant analysis (SWLDA) and the support vector machine (SVM) [16][17][18][20][24].

[Figure 1 appears here. Decoded block labels: EEG data → feature extraction → discriminant feature sets (raw samples; morphological features, e.g., peak and latency; wavelets (DWT); AR model parameters), each with its own dimension → ERP classification.]

Figure 1: Illustration of the ERP classification framework based on a discriminant feature.

Reproducing kernel theory has important applications in numerical analysis, computational mathematics, machine learning, and probability and statistics [22][23]. Multiple kernel learning (MKL), an extension of SVM, has recently been used to efficiently handle multiple kernels in classification [19][25][26][28][29][30][32]. MKL can enhance the interpretability of a decision function and thus improve classification performance. In [32], multiple kernels are combined via a convex optimization problem to represent different types of biological data. SimpleMKL [24] casts each feature type as one or more kernels and solves the optimization problem over support vectors and kernel weights under simplex constraints. MKL has also been applied to concatenated discriminant features to improve classification performance in a P300 speller BCI [31]. In addition, [22][23] present a reproducing kernel Hilbert space (RKHS) method for obtaining numerical solutions of fuzzy differential equations. When discriminant features are used to learn an efficient P300 classifier, however, it is often unclear which discriminant features complement one another and how they should be applied to the learning. In this paper, the proposed method applies MKL based on three discriminant features to learn an efficient P300 classifier and thereby improve the accuracy of character recognition in a P300 speller BCI. The three discriminant features are the raw samples and two morphological features, the negative area and the amplitude of P300, which can complement one another in the MKL of a P300 classification. Importantly, a linear kernel is established for each discriminant feature, and kernel weights differentiate the linear kernels, both to explore complementary information among the three discriminant features and to weigh the contribution of each discriminant feature in the MKL. The ℓ1 norm regularization of the kernel weights ultimately enforces an optimal discriminant feature set for the MKL of a P300 classification.

The performance of the proposed method is evaluated on dataset II of BCI competition III. We evaluate the accuracy of character recognition and the execution time according to the variable size of the three discriminant feature sets. Compared to an existing SVM-based classification method, the proposed method consistently obtains better or similar accuracy of character recognition, with execution times that vary with the size of the three discriminant feature sets. Furthermore, the kernel weight of the raw samples is consistently more dominant than the kernel weights of the two morphological features across the variable size of the three discriminant feature sets. This dominance implies that the two morphological features supplement what the raw samples lack in the MKL of a P300 classification. A weakness of the proposed method is that its performance is affected by the kernel type and the hyperparameters. The proposed method is limited in the kernel type suitable for the three discriminant feature sets; that is, only a linear kernel fits the three discriminant feature sets. Moreover, the regularization parameter C affects the execution time, and the kernel degree affects the accuracy of character recognition for the variable size of the three discriminant feature sets. However, when the kernel type and the hyperparameters are properly predefined, we conclude that the proposed method is sufficiently robust to improve the accuracy of character recognition, at varying execution times, for the variable size of the three discriminant feature sets in a P300 speller BCI. The proposed and

existing SVM-based classification methods were implemented by modifying the SimpleMKL toolbox (Matlab code), which is available at http://asi.insa-rouen.fr/enseignants/~arakoto/code/mklindex.html.

The remainder of this paper is organized as follows. Section 2 reviews previous MKL research, and Section 3 describes the proposed method, which applies MKL based on three discriminant features. Section 4 reports and discusses the numerical results. Finally, Section 5 concludes this paper.

2. Multiple kernel learning

A support vector machine (SVM) [20] is an effective binary classifier known for its generalization ability. It learns an optimal separating hyperplane to distinguish data between two different classes without any assumption on the data distribution. However, a single kernel might not be sufficient to model the data of interest and thereby produce a satisfactory separating hyperplane. As a result, multiple kernels have recently been applied for modeling the data of interest. That is, one can replace a single kernel with a linear combination of multiple kernels, with each kernel used to describe a different property of the data of interest (i.e., different feature spaces or distributions):

k(x_i, x_j) = \sum_m \beta_m k_m(x_i, x_j) \quad \text{with} \quad \sum_m \beta_m = 1, \ \beta \succeq 0,   (1)

where β = [β_1, β_2, ..., β_M], β_m denotes the weight of the mth kernel function, and M is the number of kernels. The process of learning the kernel weights while simultaneously finding an optimal separating hyperplane is known as multiple kernel learning (MKL) [24]. This process describes the data in multiple feature spaces by using different norm vectors w_m. For example, let φ : χ → H be a feature mapping function, in which χ is an input space and H is a feature space associated with the inner product ⟨φ(x_i), φ(x_j)⟩_H. Note that we then have ⟨φ(x_i), φ(x_j)⟩_H = k(x_i, x_j), which computes the inner product between the transformed feature vectors φ(x_i) and φ(x_j), where k(·, ·) is a positive semidefinite kernel function.
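As a concrete illustration of (1), the following minimal NumPy sketch (not from the paper; the function names are ours) forms a convex combination of linear base kernels over several feature views:

```python
import numpy as np

def linear_kernel(A, B):
    """k_m(x_i, x_j) = <x_i, x_j>, computed for all pairs at once."""
    return A @ B.T

def combined_kernel(views_a, views_b, beta):
    """k(x_i, x_j) = sum_m beta_m k_m(x_i, x_j), as in (1).

    views_a, views_b: lists of (n_a x d_m) and (n_b x d_m) arrays,
    one array per feature space m."""
    beta = np.asarray(beta, dtype=float)
    assert np.all(beta >= 0) and np.isclose(beta.sum(), 1.0)  # simplex constraint
    return sum(b * linear_kernel(Va, Vb)
               for b, Va, Vb in zip(beta, views_a, views_b))
```

Any convex combination of positive semidefinite kernels is itself positive semidefinite, so the combined matrix remains a valid kernel.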


When MKL is plugged into the SVM, the primal form of MKL is reformulated as the following optimization problem:

\min_{\beta, w, b, \varepsilon} \ \frac{1}{2} \sum_m \frac{1}{\beta_m} \|w_m\|_{H_m}^2 + C \sum_i \varepsilon_i
\text{s.t.} \ y_i \Big( \sum_m \langle w_m, \phi_m(x_i) \rangle_{H_m} + b \Big) + \varepsilon_i \ge 1,
\quad \varepsilon_i \ge 0 \ \text{for all } i,
\quad \sum_m \beta_m = 1, \ \beta \succeq 0,   (2)

where β = [β_1, β_2, ..., β_M], M is the number of multiple kernels (i.e., the number of feature spaces), and C is a regularization parameter that trades off training errors against the margin of the separating hyperplane. From the above formulation, the primal form of MKL restricts the weight of the norm vector w_m through the constraint \sum_m \beta_m = 1 and \beta_m \ge 0, which tends to produce a sparse solution for β. In (2), it can also be observed that if β_m vanishes, then the corresponding \|w_m\|_{H_m} must be zero; otherwise, the value of the objective function would be unbounded. In [24], Rakotomamonjy et al. reported that \|w_m\|_{H_m} \to 0 as \beta_m \to 0, which prevents the value of the objective function from approaching infinity. Therefore, the sparsity constraint \sum_m \beta_m = 1 produces a valid and sparse solution for β. The primal form of MKL can be converted into the following min-max problem by introducing Lagrange multipliers α_i:

\min_\beta \max_\alpha \ \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j y_i y_j \alpha_i \alpha_j \sum_m \beta_m k_m(x_i, x_j)
\text{s.t.} \ \sum_i y_i \alpha_i = 0,
\quad 0 \le \alpha_i \le C \ \text{for all } i,
\quad \sum_m \beta_m = 1, \ \beta \succeq 0.   (3)

Compared with the standard SVM, multiple kernels k_m(x_i, x_j) are applied in (3), whereas only a single kernel is used in the SVM. Since the constraint \sum_m \beta_m = 1 tends to result in a sparse solution for β, this learning process can be viewed as the removal of redundant kernels from among the multiple kernels. In other words, the MKL formulation in (3) aims to determine an optimal linear combination of multiple kernels to improve the classification performance, which is achieved by learning the best weights β for the multiple kernels. For a test input x, the decision function of MKL is computed as

F_{MKL}(x) = \sum_i \alpha_i y_i \sum_m \beta_m k_m(x_i, x) + b.   (4)
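As a hedged sketch of the decision rule (4), assuming the dual variables and the per-kernel train-test Gram matrices have already been computed elsewhere:

```python
import numpy as np

def mkl_decision(alpha, y, b, beta, test_kernels):
    """F_MKL(x) = sum_i alpha_i y_i sum_m beta_m k_m(x_i, x) + b, eq. (4).

    test_kernels: list of (N_train x N_test) matrices K_m[i, t] = k_m(x_i, x_t)."""
    K = sum(bm * Km for bm, Km in zip(beta, test_kernels))
    return (alpha * y) @ K + b  # one decision value per test point
```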

3. Method

The proposed method consists of four main parts: 1) pre-processing, to improve the general quality of the EEG signal and allow more accurate signal measurement and analysis; 2) feature extraction, in which three discriminant features of P300 are extracted for further classification; 3) training, in which the MKL is trained for a P300 classification on a training set labeled "+1" or "-1" for P300 presence or absence; and 4) character recognition, in which the trained MKL predicts a character from an unlabeled test set.

3.1. EEG data of a P300 speller paradigm

The EEG data used in the proposed method are from BCI competition III - dataset II [1]. The P300 speller paradigm allows a user to choose a character from a predefined set of alphanumeric characters (the letters A to Z, the digits 1 to 9, and the character "_"). A 6 × 6 matrix of these alphanumeric characters is presented to the user, and the rows and columns of the matrix are intensified in random order. The user concentrates on the target character they want to spell. Since the target character is rare compared to the others, P300 is elicited when the row and the column containing the target character are intensified. To make the spelling procedure reliable, the sequence of intensifications is repeated 15 times for each character to spell. The task of the P300 speller BCI is then to determine which target character the user focuses on by comparing the responses evoked by the intersection of each row and each column.


For each subject, there are two datasets: a training set and a test set. The training set is composed of 85 characters spelled over a number of sessions, where each session consists of a number of runs of five characters each. Note that one spelled character corresponds to 180 trials (12 row/column intensifications × 15 repetitions). Only 30 of the 180 trials correspond to an intensification of the target character and thus elicit P300. Each trial in the training set is labeled "+1" or "-1", indicating P300 presence or absence. The test set is composed of 100 characters; trials in the test set are unlabeled and are classified by the classifier learned from the training set. More details of the datasets can be found in the BCI competition III report [1].
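The trial bookkeeping above can be checked with a few lines (a sanity-check sketch, not part of the original pipeline):

```python
# Trial counts of BCI competition III dataset II, as described above.
n_chars = 85                       # spelled characters in the training set
trials_per_char = 12 * 15          # 12 row/column intensifications x 15 repetitions
targets_per_char = 2 * 15          # the target row and target column, 15 times each
assert trials_per_char == 180 and targets_per_char == 30
print(n_chars * trials_per_char)   # 15300 labeled trials in the training set
```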

3.2. Character recognition of a P300 speller paradigm

[Figure 2 appears here. Decoded block labels: training set EEG signals → pre-processing (bandpass filtering) → feature extraction (three discriminant features of P300) → P300 classification (MKL based on the extracted discriminant features); test set EEG signals → the same pre-processing and feature extraction → trained MKL → character recognition (predict a character).]

Figure 2: Overall process of character recognition in a P300 speller BCI.

As illustrated in Fig. 2, we propose a method to predict a character in a P300 speller BCI.

3.2.1. Preprocessing

From the 64 channels, we preprocess the EEG signals between 0 ms and 600 ms after the onset of a stimulus. Given that P300 appears about 300 ms after a stimulus, we posit that this time interval is sufficient to capture all the information required for a P300 classification. To filter the EEG signal, we use a 4th-order Chebyshev Type I bandpass filter with cut-off frequencies of 0.1 Hz and 20 Hz.

3.2.2. Feature extraction

In general, P300 has the property that its amplitude reaches a maximum at around 300 ms after a stimulus, and its latency is measured at the time of the largest amplitude [8][9][10]. We specify three discriminant features that can complement one another in the MKL of a P300 classification [11][12][13]. These three features are the raw samples and two morphological features, the amplitude and the negative area of P300. No other discriminant features are considered here, in order to avoid overfitting the P300 classification. In summary, the three discriminant features we selected are as follows (a code sketch of this extraction follows the list):

1. Raw samples (RAW, S_RAW): the result of bandpass filtering the EEG signal s(t) between 0 ms and 600 ms after a stimulus. The raw samples are represented as S_RAW = bandpass(s(t)_{0≤t≤600}); each trial retains 14 samples per channel, and thus the corresponding feature vector has dimensions of 14 × 64.

2. Amplitude (AMP, S_AMP): the largest peak value of the EEG signal s(t) between 0 ms and 600 ms after a stimulus, represented as S_AMP = max(s(t)_{0≤t≤600}). The amplitude of each trial retains 1 sample per channel, and thus the corresponding feature vector has dimensions of 1 × 64.

3. Negative area (NAR, S_NAR): the result of bandpass filtering the negative area of the EEG signal s(t) between 0 ms and 600 ms after a stimulus, represented as S_NAR = bandpass(0.5 (s(t) − |s(t)|)_{0≤t≤600}). The negative area of each trial retains 7 samples per channel, and thus the corresponding feature vector has dimensions of 7 × 64.
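A minimal sketch of this feature extraction for one 64-channel trial, assuming the dataset II sampling rate of 240 Hz; the passband ripple and decimation factors below are our assumptions, since the paper states only the filter order, cut-offs, and output dimensions:

```python
import numpy as np
from scipy.signal import cheby1, decimate, filtfilt

FS = 240  # sampling rate (Hz) of BCI competition III dataset II

def extract_features(trial):
    """trial: (64, T) array, the 0-600 ms post-stimulus EEG segment.
    Returns the RAW (14*64), AMP (1*64), and NAR (7*64) feature vectors."""
    b, a = cheby1(4, 0.5, [0.1, 20.0], btype="bandpass", fs=FS)  # 4th-order Chebyshev I
    T = trial.shape[1]
    filt = filtfilt(b, a, trial, axis=1)
    raw = decimate(filt, T // 14, axis=1)[:, :14].ravel()        # S_RAW: 14 samples/channel
    amp = filt.max(axis=1)                                       # S_AMP: largest peak/channel
    neg = 0.5 * (trial - np.abs(trial))                          # negative part of s(t)
    nar = decimate(filtfilt(b, a, neg, axis=1), T // 7, axis=1)[:, :7].ravel()  # S_NAR
    return raw, amp, nar
```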


[Figure 3 appears here. Panel (a) lists the training set D and the three discriminant feature sets D_RAW, D_AMP, and D_NAR, each pairing a feature vector v_i^(m) with the label y_i; panel (b) shows each set feeding its own kernel matrix k_1, k_2, k_3, weighted by β_1, β_2, β_3 and combined into a single kernel matrix k for the MKL (P300 classification), which outputs w and b.]

Figure 3: Illustration of the proposed MKL based on three discriminant features of P300: (a) three discriminant feature sets of P300: D_RAW of raw samples, D_AMP of amplitude, and D_NAR of negative area; (b) the proposed MKL based on the three extracted discriminant features.


3.2.3. MKL based on three discriminant features of P300

For the MKL of a P300 classification, we assume that a training set D is composed of N labeled trials: D = \{(x_i, y_i)\}_{i=1}^N, where x_i is the feature vector formed by concatenating the timed samples of all channels of the ith trial, and the corresponding label y_i \in \{-1, +1\} refers to the two classes: y_i = +1 corresponds to an expected presence of P300 for a stimulus, and y_i = -1 corresponds to an expected absence. From the feature extraction, we have three discriminant feature sets, D_RAW = \{(v_i^{(1)}, y_i)\}_{i=1}^N, D_AMP = \{(v_i^{(2)}, y_i)\}_{i=1}^N, and D_NAR = \{(v_i^{(3)}, y_i)\}_{i=1}^N, where v_i^{(1)} \in R^{14 \times 64}, v_i^{(2)} \in R^{1 \times 64}, and v_i^{(3)} \in R^{7 \times 64} are the discriminant feature vectors of all channels of the ith trial in terms of raw samples, amplitude, and negative area, respectively (see Fig. 3(a)).

We apply MKL with ℓ1 norm regularization to the three discriminant feature sets (see Fig. 3(b)), such that a linear kernel is established for each discriminant feature. Kernel weights under the ℓ1 norm regularization differentiate the three linear kernels, which are then used both to explore complementary information among the three discriminant features and to weigh the contribution of each discriminant feature in the MKL of a P300 classification. Consequently, three linear kernels and the associated kernel weights β = [β_1, β_2, β_3] are obtained. The corresponding MKL is formulated as the following optimization problem:

\min_{\beta, w, \varepsilon, b} \ \frac{1}{2} \sum_{m=1}^{3} \frac{1}{\beta_m} \|w_m\|^2 + C \sum_{i=1}^{N} \varepsilon_i
\text{s.t.} \ y_i \Big( \sum_{m=1}^{3} \langle w_m, \phi_m(v_i^{(m)}) \rangle + b \Big) + \varepsilon_i \ge 1 \ \text{for all } i,
\quad \varepsilon_i \ge 0 \ \text{for all } i,
\quad \sum_{m=1}^{3} \beta_m = 1, \ \beta \succeq 0,   (5)

where w_m is the weight vector of the mth discriminant feature set, ε_i is the ith slack variable, b is an offset, and C is a regularization parameter that trades off training errors against the margin of the separating hyperplane.


As discussed above, a redundant kernel among the three linear kernels is removed via the ℓ1 norm regularization on the kernel weights β. This removal ultimately enables us to construct an optimal discriminant feature set for the MKL of a P300 classification. In addition, \|w_m\| \to 0 as \beta_m \to 0 in the optimization problem of the MKL. This property prevents the value of the objective function in (5) from approaching infinity, and a valid solution β under the ℓ1 norm regularization is thus maintained, as shown in [24]. The optimization problem of the MKL can be transformed into the following min-max problem by introducing Lagrange multipliers α = \{α_i\}_{i=1}^N:

\min_\beta \max_\alpha \ S(\alpha, \beta) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \sum_{m=1}^{3} \beta_m k_m(v_i^{(m)}, v_j^{(m)})
\text{s.t.} \ \sum_{i=1}^{N} \alpha_i y_i = 0,
\quad 0 \le \alpha_i \le C \ \text{for all } i,
\quad \sum_{m=1}^{3} \beta_m = 1, \ \beta \succeq 0.   (6)

The min-max problem (6) is solved using a gradient descent method. For any admissible value of β, the maximization over α can be solved with any standard SVM solver. With the determined values of α, the minimization over β is solved using a gradient descent method that converges for the function S(α, β). Once the gradient of the objective in (6) is computed, β is updated along the descent direction, while ensuring that both the ℓ1 norm regularization and the non-negativity constraints on the kernel weights β remain satisfied. These two steps are iterated until a stopping criterion is reached; the stopping criterion we choose is based on the variation of the objective function value and on the number of iterations. The decision function of a P300 classification for a new trial is then generated as

F_{proposedMKL}(v) = \sum_{i=1}^{N} \sum_{m=1}^{3} \alpha_i y_i \beta_m k_m(v^{(m)}, v_i^{(m)}) + b,   (7)

where v is the feature vector composed of the three discriminant features of

the new trial. Table 1 describes the overall process of the MKL of a P300 classification in the proposed method.

Input:
(1) Three discriminant feature sets D_RAW = \{(v_i^{(1)}, y_i)\}_{i=1}^N, D_AMP = \{(v_i^{(2)}, y_i)\}_{i=1}^N, and D_NAR = \{(v_i^{(3)}, y_i)\}_{i=1}^N, where v_i^{(1)}, v_i^{(2)}, and v_i^{(3)} are the discriminant feature vectors of the ith trial in terms of raw samples, amplitude, and negative area, y_i ∈ \{-1, +1\} is the corresponding label, and N is the number of trials
(2) A predefined linear kernel function and a regularization parameter C
Output: α and β
begin
    n ← 1; β_m^0 ← 1/3 for all m; α^0 ← 0
    while S^n < S^{n-1} and n ≤ 100 do
        α^n ← solve (6) with fixed β^{n-1}
        S^n ← S(α^n, β^{n-1})
        β^n ← solve (6) with all determined values of α^0, α^1, ..., α^n
        n ← n + 1
    end while
end
(n: number of iterations; α^n, β_m^n: α and β_m at the nth iteration)

Table 1: The overall process of the proposed MKL based on three discriminant features of P300. A Python-style sketch of this loop follows.
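The sketch below renders the loop of Table 1 under stated assumptions: scikit-learn's SVC with a precomputed kernel stands in for the SVM solver of the α-step, and the β-step is a single gradient step with a crude simplex projection rather than the reduced-gradient update with line search used by SimpleMKL [24]:

```python
import numpy as np
from sklearn.svm import SVC

def train_mkl(kernels, y, C=1.0, max_iter=100, step=0.1):
    """Alternate an SVM solve for alpha with a gradient step on beta, eq. (6).

    kernels: list of M precomputed (N x N) Gram matrices, one per feature set."""
    M = len(kernels)
    beta = np.full(M, 1.0 / M)        # beta_m^0 = 1/M (1/3 for three features)
    prev_obj = np.inf
    svm = None
    for _ in range(max_iter):
        K = sum(b * Km for b, Km in zip(beta, kernels))
        svm = SVC(C=C, kernel="precomputed").fit(K, y)
        sv = svm.support_
        ay = svm.dual_coef_.ravel()   # alpha_i * y_i at the support vectors
        # dual objective S(alpha, beta) = sum_i alpha_i - 0.5 * ay' K ay
        obj = np.abs(ay).sum() - 0.5 * ay @ K[np.ix_(sv, sv)] @ ay
        if obj >= prev_obj:           # stop once S no longer decreases
            break
        prev_obj = obj
        # dS/dbeta_m = -0.5 * ay' K_m ay, with alpha held fixed
        grad = np.array([-0.5 * ay @ Km[np.ix_(sv, sv)] @ ay for Km in kernels])
        beta = np.clip(beta - step * grad, 0.0, None)
        beta /= beta.sum()            # crude projection back onto the simplex
    return svm, beta
```

In practice one would also rebalance or weight the classes, since only 2 of every 12 intensifications contain P300.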

3.2.4. Character recognition

The MKL of a P300 classification is applied to predict a character from an unlabeled test set. Under normal conditions, the data are too noisy to predict the correct character from a single trial. Therefore, the trials for the corresponding rows and columns of each character are repeated several times, as in [18][19]. We regard the value of the decision function (7) as a score and average the scores over J repetitions of the trials corresponding to each row and each column. The row and the column with the highest average score after J repetitions are chosen as the row and column containing P300. The evidence for the presence or absence of P300 over J repetitions is calculated for each row and each column as

S_i = \frac{1}{J} \sum_{j=1}^{J} F_{proposedMKL}(v_{ij}),   (8)

where F_{proposedMKL}(v_{ij}) is the score of v_{ij}, the jth repetition of the trial for the ith row or column. The target row and column are then chosen as

\arg\max_i (S_i) \ \text{for } i = 1, 2, \ldots, 6, \ \text{with respect to a row or a column}.   (9)
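To make (8) and (9) concrete, a small sketch follows; the (12, J) score layout and the mapping of intensification indices to matrix rows and columns are our assumptions, since the dataset's stimulus codes are not reproduced here:

```python
import numpy as np

# assumed 6 x 6 speller layout (A-Z, 1-9, '_')
MATRIX = np.array(list("ABCDEFGHIJKLMNOPQRSTUVWXYZ123456789_")).reshape(6, 6)

def predict_character(scores):
    """scores[i, j]: decision value F(v_ij) of the jth repetition for
    intensification i; here rows 0-5 are taken as the matrix columns
    and rows 6-11 as the matrix rows (an illustrative convention)."""
    s = scores.mean(axis=1)           # S_i, eq. (8): average over J repetitions
    col = int(np.argmax(s[:6]))       # eq. (9) over the six columns
    row = int(np.argmax(s[6:12]))     # eq. (9) over the six rows
    return MATRIX[row, col]
```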

4. Numerical results and discussions

This section presents the results of the proposed method according to the variable size of the three discriminant feature sets for a P300 speller BCI. The proposed method is compared to an existing method that applies an SVM-based classifier, in terms of the accuracy of character recognition and the execution time. The effects of the hyperparameters, namely the regularization parameter C and the kernel degree, are also evaluated. The three discriminant feature sets of the training set are scaled to small, middle, and large sizes for all methods considered; the small-sized, middle-sized, and large-sized discriminant feature sets are composed of 20, 40, and 60 characters, respectively. All discriminant feature sets are normalized between 0 and 1. When the three discriminant feature sets are used simultaneously in the existing method, they are combined by concatenating the three discriminant feature vectors per trial. Linear SVM (LSVM) uses a polynomial kernel function with kernel degree d = 1, and Gaussian SVM (GSVM) uses a Gaussian kernel function with kernel width σ = 1. When MKL is applied in the existing method, multiple kernels are obtained with a polynomial kernel function having different kernel degrees d = {1, 3, 5}. In contrast, when the proposed method applies MKL to the three discriminant feature sets, each linear kernel is obtained with a polynomial kernel function having kernel degree d = 1. In addition, multiple

[Figure 4 appears here. Panels for subjects A and B, one map per discriminant feature set: RAW, NAR, AMP, RAW+NAR, RAW+AMP, NAR+AMP, and RAW+NAR+AMP.]

Figure 4: Voltage difference between the averaged P300 and the averaged non-P300 for each discriminant feature set.

kernels obtained using the proposed method are reimplemented with a polynomial kernel function having different kernel degrees d = {1, 2, 3, 4, 5} to verify the effect of the kernel degree. The regularization parameter C is predefined to 1 in all methods and is redefined to {10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}} to evaluate the effect of C in the proposed method.

Fig. 4 shows the voltage difference between the averaged P300 and the averaged non-P300 for the large-sized discriminant feature set. Note that although the voltage difference is distinguishable for each single discriminant feature set, it tends to be similar between the raw sample set and any combined discriminant feature set. This result indicates that the capability of a combined discriminant feature set is similar to that of the raw sample set in terms of describing the brain region of P300. Here, we posit that the accuracy of character recognition strongly depends on the discriminant feature set and the classifier in the P300 speller BCI. Table 2 shows the accuracy of character recognition according to classifier for each single discriminant feature set for subjects A and B. Similarly, Table 3 shows the accuracy of character recognition according to classifier for each combined discriminant feature set for subjects A and B.

Subject A

| Feature set | Method | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RAW | LSVM | 17 | 20 | 36 | 36 | 38 | 47 | 57 | 63 | 65 | 72 | 73 | 82 | 86 | 84 | 86 |
| RAW | GSVM | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| RAW | MKL | 3 | 2 | 4 | 4 | 6 | 4 | 2 | 2 | 4 | 4 | 4 | 3 | 3 | 4 | 4 |
| NAR | LSVM | 6 | 9 | 12 | 13 | 15 | 22 | 22 | 29 | 31 | 37 | 40 | 42 | 51 | 48 | 50 |
| NAR | GSVM | 3 | 3 | 3 | 3 | 4 | 4 | 5 | 5 | 4 | 3 | 4 | 4 | 2 | 2 | 3 |
| NAR | MKL | 1 | 6 | 6 | 5 | 2 | 5 | 6 | 6 | 6 | 6 | 9 | 9 | 10 | 11 | 10 |
| AMP | LSVM | 5 | 1 | 2 | 5 | 7 | 5 | 4 | 7 | 10 | 14 | 16 | 14 | 13 | 15 | 11 |
| AMP | GSVM | 8 | 2 | 1 | 4 | 6 | 5 | 7 | 6 | 6 | 8 | 11 | 10 | 11 | 7 | 6 |
| AMP | MKL | 1 | 1 | 2 | 4 | 3 | 4 | 2 | 5 | 4 | 4 | 4 | 4 | 4 | 5 | 4 |

Subject B

| Feature set | Method | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RAW | LSVM | 37 | 51 | 57 | 61 | 74 | 75 | 79 | 78 | 82 | 83 | 85 | 89 | 91 | 93 | 90 |
| RAW | GSVM | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| RAW | MKL | 9 | 8 | 10 | 12 | 12 | 11 | 11 | 13 | 14 | 14 | 13 | 12 | 11 | 10 | 10 |
| NAR | LSVM | 4 | 5 | 8 | 10 | 13 | 14 | 19 | 20 | 20 | 17 | 21 | 22 | 23 | 25 | 28 |
| NAR | GSVM | 2 | 3 | 4 | 5 | 2 | 1 | 1 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 5 |
| NAR | MKL | 1 | 5 | 5 | 3 | 5 | 5 | 6 | 4 | 5 | 5 | 4 | 2 | 2 | 2 | 2 |
| AMP | LSVM | 4 | 7 | 8 | 5 | 7 | 8 | 8 | 8 | 7 | 7 | 9 | 12 | 9 | 9 | 9 |
| AMP | GSVM | 2 | 2 | 1 | 3 | 2 | 3 | 5 | 4 | 3 | 2 | 2 | 3 | 2 | 2 | 2 |
| AMP | MKL | 5 | 4 | 2 | 2 | 3 | 4 | 5 | 5 | 5 | 5 | 5 | 6 | 6 | 4 | 4 |

Table 2: Accuracy of character recognition (%) according to a single discriminant feature set and classifier (columns: number of repetitions).

Subject A

| Feature set | Method | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RAW+NAR | LSVM | 14 | 20 | 28 | 31 | 28 | 34 | 40 | 45 | 49 | 57 | 63 | 74 | 74 | 78 | 79 |
| RAW+NAR | GSVM | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| RAW+NAR | MKL | 3 | 2 | 4 | 4 | 5 | 4 | 6 | 3 | 3 | 3 | 2 | 2 | 3 | 2 | 3 |
| RAW+AMP | LSVM | 14 | 17 | 33 | 36 | 40 | 47 | 51 | 57 | 59 | 65 | 69 | 77 | 74 | 79 | 80 |
| RAW+AMP | GSVM | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| RAW+AMP | MKL | 2 | 1 | 3 | 3 | 4 | 4 | 4 | 3 | 3 | 3 | 1 | 4 | 5 | 4 | 5 |
| NAR+AMP | LSVM | 10 | 11 | 15 | 16 | 20 | 24 | 29 | 27 | 31 | 36 | 44 | 46 | 52 | 49 | 54 |
| NAR+AMP | GSVM | 3 | 2 | 2 | 2 | 2 | 1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 4 |
| NAR+AMP | MKL | 2 | 4 | 4 | 2 | 3 | 5 | 5 | 4 | 6 | 5 | 4 | 3 | 4 | 4 | 4 |
| RAW+NAR+AMP | LSVM | 17 | 19 | 26 | 35 | 31 | 34 | 38 | 46 | 53 | 60 | 61 | 66 | 72 | 72 | 71 |
| RAW+NAR+AMP | GSVM | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| RAW+NAR+AMP | MKL | 7 | 0 | 4 | 3 | 4 | 7 | 5 | 4 | 4 | 3 | 3 | 3 | 2 | 3 | 1 |

Subject B

| Feature set | Method | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RAW+NAR | LSVM | 23 | 40 | 44 | 52 | 57 | 61 | 72 | 72 | 76 | 76 | 79 | 79 | 84 | 83 | 85 |
| RAW+NAR | GSVM | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| RAW+NAR | MKL | 13 | 14 | 13 | 16 | 19 | 21 | 23 | 17 | 23 | 19 | 20 | 21 | 16 | 20 | 17 |
| RAW+AMP | LSVM | 28 | 46 | 57 | 60 | 69 | 70 | 76 | 75 | 79 | 84 | 86 | 89 | 90 | 93 | 90 |
| RAW+AMP | GSVM | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| RAW+AMP | MKL | 9 | 7 | 9 | 10 | 11 | 10 | 7 | 10 | 10 | 12 | 10 | 9 | 12 | 10 | 10 |
| NAR+AMP | LSVM | 9 | 8 | 4 | 13 | 14 | 17 | 20 | 20 | 26 | 23 | 30 | 33 | 37 | 38 | 38 |
| NAR+AMP | GSVM | 4 | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 6 | 5 | 6 | 6 | 6 | 6 |
| NAR+AMP | MKL | 3 | 6 | 7 | 4 | 4 | 5 | 8 | 4 | 7 | 7 | 9 | 6 | 7 | 4 | 6 |
| RAW+NAR+AMP | LSVM | 29 | 44 | 53 | 52 | 64 | 63 | 75 | 74 | 76 | 77 | 79 | 82 | 82 | 79 | 81 |
| RAW+NAR+AMP | GSVM | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| RAW+NAR+AMP | MKL | 13 | 16 | 17 | 19 | 23 | 23 | 26 | 21 | 25 | 24 | 24 | 26 | 26 | 26 | 25 |

Table 3: Accuracy of character recognition (%) according to a combined discriminant feature set and classifier (columns: number of repetitions).

Notably, the LSVM accuracy is superior to the accuracies of GSVM and MKL for both the single discriminant

feature set and the combined discriminant feature set for subjects A and B. Interestingly, LSVM based on the raw sample set displays better accuracy than LSVM using the combined discriminant feature set for both subjects. These results indicate that, within the existing method, LSVM based on the raw sample set learns character recognition best for the P300 speller BCI. However, these methods do not consider which discriminant features can be complementary, nor how discriminant features should be applied to learn an efficient classifier in the P300 speller BCI.

[Figure 5 appears here: accuracy of character recognition (%) versus number of repetitions for subjects A and B, for the small-sized, middle-sized, and large-sized discriminant feature sets.]

Figure 5: Comparison of the accuracy of character recognition between the proposed and existing methods with respect to the variable size of the three discriminant feature sets: the proposed MKL based on three discriminant features; LSVM using the raw sample set; and LSVM, GSVM, and MKL based on the concatenation of the three discriminant features.

Let us now show the results of the proposed method according to the variable size of the three discriminant feature sets discussed above. In Fig. 5, it is clear that the proposed method consistently obtains better or similar accuracy of character recognition than the existing methods with respect to the variable size of the three discriminant feature sets. In particular, the accuracy of the proposed method significantly outperforms that of the existing methods for the small-sized discriminant feature set, from the beginning of the repetitions for subject A and from the 7th repetition for subject B. Furthermore, compared to the existing MKL, the proposed method markedly improves the accuracy of character recognition for both subjects.

[Figure 6 appears here: execution time versus the size of the discriminant feature set for subjects A and B, for LSVM with RAW, LSVM with RAW+NAR+AMP, GSVM with RAW+NAR+AMP, MKL with RAW+NAR+AMP, and the proposed MKL with RAW+NAR+AMP.]

Figure 6: Comparison of execution times between the proposed and existing methods with respect to the variable size of a discriminant feature set for subjects A and B.

Fig. 6 compares the execution times of the proposed and existing methods. The execution time of the proposed MKL grows comparably as the size of the three discriminant feature sets increases, while the accuracy of character recognition improves. When compared to the existing MKL, the proposed MKL significantly reduces the execution time for the large-sized discriminant feature set, whereas it requires more time for the small-sized and middle-sized discriminant feature sets. Unlike standard MKL, which generally requires a longer execution time than single kernel-based SVM, the proposed MKL is faster than the existing LSVM for the large-sized discriminant feature set of subject B. Interestingly, GSVM requires the longest execution time of all methods and has the lowest accuracy of character recognition for the variable size of the three discriminant feature sets. These results confirm that the execution times of SVM-based classification methods depend on the number of features, the suitability of the feature set for learning a classifier, and the optimization algorithm. Note that the execution time of all methods was estimated on a personal computer with an Intel(R) Core(TM) i5-2500 CPU @ 3.60 GHz and 64.0 GB RAM.

[Figure 7 appears here: kernel weights of RAW, NAR, and AMP versus the size of the three discriminant feature sets for subjects A and B.]

Figure 7: Kernel weights according to the variable size of three discriminant feature sets for subjects A and B.

Fig. 7 illustrates the kernel weights of the three discriminant features in the proposed method. Each kernel weight corresponds to the degree of contribution of the corresponding discriminant feature to the MKL of a P300 classification. The figure shows that the raw samples have a large kernel weight, and that the two

morphological features have relatively small kernel weights with respect to the variable size of the three discriminant feature sets. These results confirm that the two morphological features supplement what the raw samples lack in the MKL of a P300 classification for the variable size of the three discriminant feature sets.

Subject A

| Size of feature set | C | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Small | 0.01 | 14 | 25 | 48 | 51 | 59 | 59 | 67 | 75 | 82 | 83 | 85 | 88 | 90 | 90 | 90 |
| Small | 0.1 | 14 | 25 | 47 | 52 | 60 | 63 | 67 | 75 | 80 | 83 | 85 | 88 | 90 | 89 | 92 |
| Small | 1 | 13 | 25 | 46 | 51 | 57 | 63 | 69 | 74 | 75 | 85 | 86 | 88 | 90 | 90 | 93 |
| Small | 10 | 13 | 25 | 45 | 51 | 57 | 63 | 69 | 75 | 75 | 85 | 86 | 87 | 90 | 90 | 93 |
| Small | 100 | 13 | 25 | 45 | 51 | 57 | 63 | 69 | 75 | 75 | 85 | 86 | 88 | 89 | 90 | 93 |
| Middle | 0.01 | 16 | 33 | 49 | 57 | 66 | 73 | 77 | 84 | 87 | 86 | 89 | 89 | 91 | 94 | 95 |
| Middle | 0.1 | 14 | 28 | 20 | 57 | 67 | 71 | 75 | 82 | 84 | 87 | 87 | 90 | 92 | 95 | 96 |
| Middle | 1 | 13 | 30 | 47 | 56 | 63 | 68 | 75 | 81 | 83 | 85 | 86 | 89 | 94 | 95 | 95 |
| Middle | 10 | 14 | 30 | 47 | 56 | 64 | 68 | 75 | 82 | 83 | 85 | 85 | 89 | 93 | 95 | 95 |
| Middle | 100 | 13 | 30 | 46 | 56 | 64 | 68 | 75 | 80 | 83 | 85 | 87 | 89 | 94 | 95 | 95 |
| Large | 0.01 | 17 | 36 | 50 | 58 | 66 | 73 | 75 | 83 | 78 | 85 | 88 | 94 | 96 | 95 | 97 |
| Large | 0.1 | 17 | 38 | 52 | 60 | 68 | 71 | 77 | 82 | 79 | 86 | 90 | 93 | 97 | 97 | 97 |
| Large | 1 | 17 | 39 | 48 | 57 | 67 | 73 | 78 | 79 | 82 | 88 | 91 | 93 | 94 | 96 | 97 |
| Large | 10 | 16 | 41 | 47 | 57 | 68 | 71 | 78 | 79 | 82 | 88 | 91 | 93 | 94 | 96 | 97 |
| Large | 100 | 16 | 41 | 50 | 59 | 71 | 72 | 79 | 82 | 82 | 88 | 91 | 93 | 94 | 96 | 98 |

Subject B

| Size of feature set | C | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Small | 0.01 | 28 | 52 | 54 | 64 | 69 | 73 | 77 | 87 | 89 | 92 | 93 | 95 | 94 | 93 | 95 |
| Small | 0.1 | 30 | 53 | 64 | 64 | 74 | 75 | 81 | 88 | 90 | 90 | 94 | 95 | 92 | 92 | 98 |
| Small | 1 | 28 | 52 | 62 | 63 | 73 | 75 | 81 | 88 | 89 | 90 | 94 | 95 | 92 | 93 | 98 |
| Small | 10 | 28 | 53 | 64 | 63 | 74 | 75 | 81 | 88 | 89 | 90 | 94 | 95 | 92 | 93 | 98 |
| Small | 100 | 29 | 53 | 63 | 64 | 73 | 75 | 82 | 88 | 89 | 90 | 94 | 95 | 92 | 93 | 98 |
| Middle | 0.01 | 35 | 55 | 64 | 67 | 71 | 82 | 85 | 92 | 95 | 94 | 95 | 97 | 96 | 97 | 97 |
| Middle | 0.1 | 34 | 49 | 60 | 67 | 74 | 81 | 84 | 89 | 93 | 93 | 93 | 97 | 97 | 97 | 97 |
| Middle | 1 | 35 | 59 | 65 | 66 | 73 | 82 | 85 | 92 | 94 | 95 | 96 | 97 | 97 | 96 | 97 |
| Middle | 10 | 34 | 60 | 65 | 64 | 72 | 82 | 85 | 92 | 94 | 95 | 96 | 97 | 97 | 96 | 97 |
| Middle | 100 | 37 | 55 | 61 | 67 | 75 | 80 | 85 | 91 | 94 | 94 | 96 | 98 | 97 | 96 | 96 |
| Large | 0.01 | 39 | 61 | 64 | 68 | 76 | 83 | 87 | 89 | 93 | 97 | 96 | 98 | 97 | 96 | 97 |
| Large | 0.1 | 39 | 60 | 65 | 70 | 74 | 81 | 88 | 89 | 94 | 96 | 97 | 98 | 97 | 97 | 97 |
| Large | 1 | 39 | 60 | 65 | 70 | 74 | 82 | 88 | 89 | 94 | 96 | 97 | 98 | 98 | 97 | 97 |
| Large | 10 | 38 | 59 | 65 | 69 | 72 | 81 | 89 | 90 | 94 | 96 | 96 | 99 | 98 | 97 | 97 |
| Large | 100 | 39 | 60 | 65 | 70 | 74 | 82 | 88 | 89 | 94 | 96 | 97 | 98 | 98 | 97 | 97 |

Table 4: Accuracy of character recognition (%) according to the value of C for each size of the three discriminant feature sets (columns: number of repetitions).

Let us now evaluate the effect of the hyperparameters in the proposed method, looking first at the effect of the regularization parameter C for the variable size of the three discriminant feature sets. Table 4 shows that the accuracy of character recognition is similar across all values of C for each size of the three discriminant feature sets, although, for every value of C, the accuracy improves as the size of the three discriminant feature sets increases.



[Figure 8 appears here: execution time versus the size of the three discriminant feature sets for subjects A and B, for C = 0.01, 0.1, 1, 10, and 100.]

Figure 8: Execution time according to the value of C for the variable size of the three discriminant feature sets.

[Figure 9 appears here: kernel weights of RAW, NAR, and AMP according to the value of C for the small-sized, middle-sized, and large-sized discriminant feature sets of subjects A and B.]

Figure 9: Kernel weights according to the value of C for the variable size of the three discriminant feature sets.


Fig. 8 shows that, for each size of the three discriminant feature sets, small values of C yield faster execution times than large values. These results indicate that the regularization parameter C affects the execution time more significantly than it affects the accuracy of character recognition for the variable size of the three discriminant feature sets. Fig. 9 presents the variation of the kernel weights according to C for each size of the three discriminant feature sets. Compared with the two morphological features, the raw samples consistently receive the highest share of the kernel weight across the different values of C for each size of the three discriminant feature sets.

Subject A

| Size of feature set | Kernel degree | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Small | d=1 | 13 | 25 | 46 | 51 | 57 | 63 | 69 | 74 | 75 | 85 | 86 | 88 | 90 | 90 | 93 |
| Small | d=2 | 2 | 1 | 0 | 2 | 5 | 5 | 4 | 6 | 3 | 3 | 2 | 4 | 5 | 2 | 2 |
| Small | d=3 | 1 | 1 | 3 | 2 | 7 | 5 | 2 | 2 | 4 | 5 | 5 | 7 | 4 | 4 | 5 |
| Small | d=4 | 2 | 2 | 1 | 3 | 3 | 4 | 3 | 3 | 3 | 4 | 4 | 3 | 2 | 1 | 3 |
| Small | d=5 | 0 | 3 | 5 | 3 | 5 | 4 | 3 | 3 | 4 | 5 | 5 | 7 | 6 | 6 | 5 |
| Middle | d=1 | 13 | 30 | 47 | 56 | 63 | 68 | 75 | 81 | 83 | 85 | 86 | 89 | 94 | 95 | 95 |
| Middle | d=2 | 3 | 3 | 3 | 4 | 5 | 5 | 4 | 6 | 5 | 2 | 4 | 6 | 7 | 7 | 8 |
| Middle | d=3 | 1 | 3 | 4 | 1 | 5 | 3 | 3 | 4 | 5 | 4 | 5 | 6 | 5 | 2 | 6 |
| Middle | d=4 | 3 | 1 | 1 | 1 | 1 | 3 | 3 | 2 | 3 | 4 | 4 | 4 | 3 | 3 | 5 |
| Middle | d=5 | 0 | 2 | 3 | 2 | 3 | 3 | 3 | 3 | 3 | 4 | 3 | 3 | 3 | 3 | 3 |
| Large | d=1 | 17 | 39 | 48 | 57 | 67 | 73 | 78 | 79 | 82 | 88 | 91 | 93 | 94 | 96 | 97 |
| Large | d=2 | 4 | 6 | 5 | 5 | 8 | 6 | 6 | 8 | 7 | 6 | 5 | 8 | 10 | 10 | 7 |
| Large | d=3 | 2 | 3 | 4 | 2 | 5 | 4 | 4 | 5 | 6 | 6 | 5 | 6 | 6 | 3 | 7 |
| Large | d=4 | 3 | 1 | 1 | 2 | 2 | 3 | 3 | 3 | 2 | 3 | 3 | 3 | 2 | 3 | 5 |
| Large | d=5 | 0 | 2 | 3 | 2 | 3 | 3 | 3 | 3 | 3 | 4 | 3 | 3 | 3 | 3 | 3 |

Subject B

| Size of feature set | Kernel degree | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Small | d=1 | 28 | 52 | 62 | 63 | 73 | 75 | 81 | 88 | 89 | 90 | 94 | 95 | 92 | 93 | 98 |
| Small | d=2 | 2 | 4 | 6 | 2 | 4 | 3 | 2 | 2 | 2 | 2 | 3 | 4 | 5 | 5 | 3 |
| Small | d=3 | 11 | 8 | 17 | 17 | 22 | 22 | 23 | 21 | 21 | 16 | 14 | 19 | 22 | 24 | 23 |
| Small | d=4 | 6 | 3 | 3 | 2 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 4 |
| Small | d=5 | 4 | 4 | 3 | 7 | 7 | 8 | 6 | 8 | 9 | 7 | 4 | 5 | 6 | 4 | 3 |
| Middle | d=1 | 35 | 59 | 65 | 66 | 73 | 82 | 85 | 92 | 94 | 95 | 96 | 97 | 97 | 96 | 97 |
| Middle | d=2 | 1 | 2 | 3 | 7 | 6 | 5 | 6 | 6 | 8 | 9 | 10 | 10 | 12 | 12 | 17 |
| Middle | d=3 | 10 | 15 | 19 | 20 | 21 | 28 | 24 | 28 | 30 | 33 | 32 | 31 | 29 | 31 | 31 |
| Middle | d=4 | 4 | 4 | 3 | 5 | 4 | 4 | 3 | 3 | 3 | 4 | 4 | 3 | 8 | 8 | 6 |
| Middle | d=5 | 4 | 5 | 4 | 5 | 6 | 4 | 4 | 4 | 3 | 4 | 6 | 6 | 7 | 8 | 7 |
| Large | d=1 | 39 | 60 | 65 | 70 | 74 | 82 | 88 | 89 | 94 | 96 | 97 | 98 | 98 | 97 | 97 |
| Large | d=2 | 5 | 6 | 6 | 8 | 9 | 7 | 10 | 11 | 12 | 12 | 12 | 14 | 11 | 10 | 12 |
| Large | d=3 | 11 | 22 | 25 | 21 | 23 | 31 | 29 | 32 | 37 | 35 | 39 | 38 | 41 | 37 | 41 |
| Large | d=4 | 2 | 1 | 4 | 6 | 4 | 5 | 4 | 3 | 3 | 3 | 3 | 4 | 5 | 5 | 4 |
| Large | d=5 | 3 | 4 | 2 | 0 | 3 | 2 | 4 | 2 | 2 | 3 | 3 | 4 | 6 | 6 | 5 |

Table 5: Accuracy of character recognition (%) according to kernel degree d for each size of the three discriminant feature sets (columns: number of repetitions).

Finally, let us discuss how the kernel degree d of a polynomial kernel function affects performance for the variable size of the three discriminant feature sets.


[Figure 10 appears here: execution time versus the size of the three discriminant feature sets for subjects A and B, for d = 1, 2, 3, 4, and 5.]

Figure 10: Execution time according to kernel degree d for the variable size of the three discriminant feature sets.

[Figure 11 appears here: kernel weights of RAW, NAR, and AMP according to kernel degree d for the small-sized, middle-sized, and large-sized discriminant feature sets of subjects A and B.]

Figure 11: Kernel weights according to kernel degree d for the variable size of the three discriminant feature sets.


In Table 5 and Fig. 10, with respect to the variable size of the three discriminant feature sets, the accuracy for kernel degree d = 1 is markedly superior to the accuracies for all other kernel degrees, although d = 1 requires a longer execution time than some of the other kernel degrees considered. Fig. 11 shows that, compared with the two morphological features, the raw samples have the dominant kernel weight for all kernel degrees and each size of the three discriminant feature sets. From these findings, we posit that the regularization parameter C = 1 and the kernel degree d = 1 of a polynomial kernel function are appropriate for improving the performance of the proposed method.

5. Conclusions

In this paper, the proposed method applies MKL based on three discriminant features in order to learn an efficient P300 classifier and improve the accuracy of character recognition in a P300 speller BCI. The main contributions of the proposed method are summarized as follows. First, the three discriminant features are specified as complementary features for the MKL of a P300 classification. Second, through the kernel weight of each discriminant feature, the proposed method both explores complementary information among the three discriminant features and weighs the contribution of each discriminant feature in the MKL of a P300 classification. Third, the ℓ1 norm regularization on the kernel weights ultimately enforces an optimal discriminant feature set for the MKL of a P300 classification. The performance of the proposed method was evaluated according to the variable size of the three discriminant feature sets generated from dataset II of BCI competition III. Compared to an existing SVM-based classification method, the proposed method consistently obtained better or similar accuracy of character recognition, with execution times that vary with the size of the three discriminant feature sets. Based on the kernel weight ratios of the three discriminant features, we could infer that the two morphological features play a role in supplementing

what the raw samples lack in the MKL of a P300 classification. In addition, we observed that the performance of the proposed method is affected by the kernel type and the hyperparameters. Only a linear kernel fits the three discriminant feature sets. The regularization parameter C affects the execution time, and the kernel degree d affects the accuracy of character recognition with regard to the size of the three discriminant feature sets. However, when the kernel type and the hyperparameters are properly predefined, we conclude that the proposed method is sufficiently robust to improve the accuracy of character recognition, at varying execution times, for the variable size of the three discriminant feature sets in a P300 speller BCI.

References

[1] Berlin Brain-Computer Interface - BCI Competition III - Data set II: P300 speller paradigm, 2005. Available at http://www.bbci.de/competition/iii/

[2] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, Brain-computer interfaces for communication and control, Clinical Neurophysiology, 2002 March, vol. 113, pp. 767-791.

[3] L. Farwell and E. Donchin, Talking off the top of your head: Toward a mental prosthesis utilizing event-related brain potentials, Electroencephalography and Clinical Neurophysiology, 1988, vol. 70, no. 6, pp. 510-523.

[4] F. R. Reza, B. Z. Allison, C. Guger, E. W. Sellers, S. C. Kleih, and A. Kubler, P300 brain computer interface: current challenges and emerging trends, Frontiers in Neuroengineering, 2012 July, vol. 5, no. 14.

[5] N. Birbaumer and L. G. Cohen, Brain-computer interfaces: Communication and restoration of movement in paralysis, Journal of Physiology, 2007 January, vol. 579, no. 3, pp. 621-636.


[6] E. Donchin, K. M. Spencer, and R. Wijesinghe, The mental prosthesis: assessing the speed of a P300-based brain-computer interface, IEEE Transactions on Rehabilitation Engineering, 2000 June, vol. 8, no. 2, pp. 174-179.

[7] I. Reinvang, Cognitive event-related potentials in neuropsychological assessment, Neuropsychology Review, 1999 December, vol. 9, no. 4.

[8] C. C. Duncan, R. J. Barry, J. F. Connolly, C. Fischer, P. T. Michie, R. Näätänen, J. Polich, I. Reinvang, and C. Van Petten, Event-related potentials in clinical research: Guidelines for eliciting, recording, and quantifying mismatch negativity, P300, and N400, Clinical Neurophysiology, 2009 September, vol. 120, pp. 1883-1908.

[9] D. Friedman, Y. M. Cycowicz, and H. Gaeta, The novelty P3: an event-related brain potential (ERP) sign of the brain's evaluation of novelty, Neuroscience and Biobehavioral Reviews, 2001 May, vol. 25, pp. 355-373.

[10] J. Polich, Updating P300: An integrative theory of P3a and P3b, Clinical Neurophysiology, 2007 June, vol. 118, pp. 2128-2148.

[11] A. Kok, On the utility of P3 amplitude as a measure of processing capacity, Psychophysiology, 2001 May, vol. 38, pp. 557-577.

[12] Z. Amini, V. Abootalebi, and M. T. Sadeghi, Comparison of performance of different feature extraction methods in detection of P300, Biocybernetics and Biomedical Engineering, 2013 June, vol. 33, no. 1, pp. 3-20.

[13] T. Demiralp, A. Ademoglu, Y. Istefanopulos, C. B. Erglu, and E. Basar, Wavelet analysis of oddball P300, International Journal of Psychophysiology, 2001 January, vol. 39, pp. 221-227.

[14] K. A. Do and K. Kirk, Discriminant analysis of event-related potential curves using smoothed principal components, Biometrics, 1999 March, vol. 55, no. 1, pp. 174-181.

[15] D. P. Burke, S. P. Kelly, P. de Chazal, R. B. Reilly, and C. Finucane, A parametric feature extraction and classification strategy for brain-computer interfacing, IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2005 March, vol. 13, no. 1, pp. 12-17.

[16] D. J. Krusienski, E. W. Sellers, F. Cabestaing, S. Bayoudh, D. J. McFarland, T. M. Vaughan, and J. R. Wolpaw, A comparison of classification techniques for the P300 speller, Journal of Neural Engineering, 2006 July, vol. 3, pp. 299-305.

[17] M. Thulasidas, C. Guan, and J. Wu, Robust classification of EEG signal for brain-computer interface, IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2006 March, vol. 14, no. 1.

[18] M. Kaper, P. Meinicke, U. Grossekathoefer, T. Lingner, and H. Ritter, BCI Competition 2003 - Data Set IIb: Support vector machines for the P300 speller paradigm, IEEE Transactions on Biomedical Engineering, 2004 June, vol. 51, no. 6.

[19] A. Rakotomamonjy and V. Guigue, BCI Competition III: Dataset II - Ensemble of SVMs for BCI P300 speller, IEEE Transactions on Biomedical Engineering, 2008 March, vol. 55, no. 3.

[20] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, 1995 September, vol. 20, no. 3, pp. 273-297.

[21] H. Kook, L. Gupta, S. Kota, D. Molfese, and H. Lyytinen, An offline/real-time artifact rejection strategy to improve the classification of multi-channel evoked potentials, Pattern Recognition, 2007 September, vol. 41, pp. 1985-1996.

[22] O. A. Arqub, M. Al-Smadi, S. Momani, and T. Hayat, Numerical solutions of fuzzy differential equations using reproducing kernel Hilbert space method, Soft Computing, 2015 May, DOI: 10.1007/s00500-015-1707-4.


[23] O. A. Arqub, Adaptation of reproducing kernel algorithm for solving fuzzy Fredholm-Volterra integrodifferential equations, Neural Computing and Applications, 2015 December, DOI: 10.1007/s00521-015-2110-x.

[24] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, SimpleMKL, Journal of Machine Learning Research, 2008 November, vol. 9, pp. 2491-2521.

[25] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, Large scale multiple kernel learning, Journal of Machine Learning Research, 2006 July, vol. 7, pp. 1531-1565.

[26] F. R. Bach, Consistency of the group lasso and multiple kernel learning, Journal of Machine Learning Research, 2008 June, vol. 9, pp. 1179-1225.

[27] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. I. Jordan, Learning the kernel matrix with semidefinite programming, Journal of Machine Learning Research, 2004 January, vol. 5, pp. 27-72.

[28] W. Luo, J. Yang, W. Xu, J. Li, and J. Zhang, Higher-level feature combination via multiple kernel learning for image classification, Neurocomputing, in press, 2015.

[29] Z. Dai, C. Yan, Z. Wang, J. Wang, M. Xia, K. Li, and Y. He, Discriminative analysis of early Alzheimer's disease using multi-modal imaging and multi-level characterization with multi-classifier (M3), NeuroImage, 2011 October, vol. 59, pp. 2187-2195.

[30] D. Zhang, Y. Wang, L. Zhou, H. Yuan, D. Shen, and the Alzheimer's Disease Neuroimaging Initiative, Multimodal classification of Alzheimer's disease and mild cognitive impairment, NeuroImage, 2011 January, vol. 55, pp. 856-867.

[31] H. P. Huang, T. H. Huang, Y. H. Liu, Z. H. Kang, and J. T. Teng, A brain-controlled rehabilitation system with multiple kernel learning, IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2011 October 9-12, pp. 591-596.

[32] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble, A statistical framework for genomic data fusion, Bioinformatics, 2004.

[33] M. Kandemir, A. Vetek, M. Gönen, A. Klami, and S. Kaski, Multi-task and multi-view learning of user state, Neurocomputing, 2014 April, vol. 139, pp. 97-106.

[34] I. Kalatzis, N. Piliouras, E. Ventouras, C. C. Papageorgiou, A. D. Rabavilas, and D. Cavouras, Design and implementation of an SVM-based computer classification system for discriminating depressive patients from healthy controls using the P600 component of ERP signals, Computer Methods and Programs in Biomedicine, 2004 July, vol. 75, no. 1, pp. 11-22.

[35] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, 2001.
