Accepted Manuscript
BHCR: RSVP Target Retrieval BCI Framework Coupling with CNN by A Bayesian Method Liangtao Huang, Yaqun Zhao, Ying Zeng, Zhimin Lin PII: DOI: Reference:
S0925-2312(17)30169-8 10.1016/j.neucom.2017.01.061 NEUCOM 17981
To appear in:
Neurocomputing
Received date: Revised date: Accepted date:
28 August 2016 11 December 2016 22 January 2017
Please cite this article as: Liangtao Huang, Yaqun Zhao, Ying Zeng, Zhimin Lin, BHCR: RSVP Target Retrieval BCI Framework Coupling with CNN by A Bayesian Method, Neurocomputing (2017), doi: 10.1016/j.neucom.2017.01.061
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Highlights • A BCI framework couples HV and CV for target image retrieval is proposed.
CR IP T
• A Bayesian brain-computer interaction method consists the core of our framework.
• We conduct a comparison on classification algorithm for EEG-decoding module.
• A CV system based on convolutional neural network is introduced for our framework.
AC
CE
PT
ED
M
AN US
• Further propose a propagation scheme and an image database retrieval scheme.
1
ACCEPTED MANUSCRIPT
BHCR: RSVP Target Retrieval BCI Framework Coupling with CNN by A Bayesian Method
a State
CR IP T
Liangtao Huanga,b,∗, Yaqun Zhaoa,b , Ying Zengc,d , Zhimin Linc Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, China Digital Switching System Engineering & Technological R&D Center, Zhengzhou, China c Zhengzhou Information Science and Technology Institute, Zhengzhou, China d University of Electronic Science And Technology of China, Chengdu, China
b National
AN US
Abstract
To combine the complementary strengths of human vision (HV) and computer vision (CV) in target image retrieval, we proposed a brain-computer interface framework, Bayesian HV-CV Retrieval (BHCR), which couples HV with CV by a Bayesian method to retrieve target images in rapid serial visual presentation (RSVP) sequences. To construct a well-suited electroencephalogram (EEG) decoding module for BHCR,
M
we conducted a comparative inspection on the selection of classification algorithms, and adopted linear discriminant analysis and random forests as a feature extraction
ED
method and classification algorithm, respectively. We also introduced a CV system based on convolutional neural network (CNN) as a component of BHCR. A Bayesian brain-computer interaction (BBCI) module was carefully designed so that for each pre-
PT
sented image, a Bayesian model that takes HV insight as prior information and CV insights as sample information is built up to present retrieval results. Unlike existing HV-CV coupled works that usually require extra manual labor, BHCR directly en-
CE
hanced retrieval performance with the help of CV insights. As an auxiliary work and a natural extension of BHCR, we then proposed a probability propagation scheme that
AC
incorporates EEG decoding insights to improve the CV system and a one-shot image database retrieval scheme. We demonstrated the effectiveness of BHCR by extensive experiments and simulations on both the entire framework and its sub-components. ∗ Corresponding
author Email addresses:
[email protected] (Liangtao Huang ),
[email protected] (Yaqun Zhao),
[email protected] (Ying Zeng),
[email protected] (Zhimin Lin)
Preprint submitted to Journal of LATEX Templates
February 6, 2017
ACCEPTED MANUSCRIPT
The results showed the following: (1) The performance of BHCR was significantly better than the EEG-only mechanism in both receiver operating characteristic (ROC) and classification aspects; (2) The robustness of BHCR was ensured by its process flow
CR IP T
and the steady performances of its sub-components. Keywords: brain-computer interface (BCI), electroencephalogram (EEG), rapid serial visual presentation (RSVP), target image retrieval, convolutional neural network (CNN), Bayesian brain-computer interaction 2010 MSC: 00-01, 99-00
AN US
1. Introduction
As a classical topic in the domain of image processing, image recognition has attracted much attention from academia to industry for a very long time. In many cases, we want to search a certain kind of image that has specific content within a set of 5
images or even a large image database.
M
Human vision (HV) may be the most effective and robust way to fulfill such tasks since the visual system of human beings can parse scenes and recognize objects easily, despite wide variations of images in scale, lighting, pose, background, etc. However,
10
ED
HV is often thought to be inferior to computer vision (CV) in terms of efficiency, even though HV can finish with an image in as little as a few hundred milliseconds [1].
PT
Recent years have seen great achievements in the field of CV, which aims at solving various image recognition problems with computers. The arrival of a variety of deep learning methods, especially convolutional neural networks (CNNs) [2, 3, 4] for image
CE
classification, has boosted the CV developments and attracted the attention of many
15
researchers. Being focused on automatic imagery processing, state-of-the-art CV systems have impressed people with their great efficiency and effectiveness. Nevertheless,
AC
when the processing task is not well-defined, or the images to be processed contain unstructured information, CV systems usually do not perform as good as HV.
20
Considering their respective advantages and shortcomings, a promising direction
is to combine the complementary strengths of CV and HV. From this perspective, the interaction between humans and computers is the first thing that should be considered.
3
ACCEPTED MANUSCRIPT
Over the past 2 decades, noninvasive electroencephalogram (EEG) recordings have been successfully utilized to quantify human perception to certain kinds of stimuli [5, 6, 7], which makes it possible to perform direct interactions between brain and computer. Unlike traditional human-computer interfaces which employ extra devices,
CR IP T
25
e.g., buttons or a mouse, these so called “brain-computer interfaces (BCIs)” aim at “reading the mind,” can realize brain-actuated control, and help handicapped users with complex and difficult tasks.
In this paper, we focused on a BCI that makes use of a subject’s EEG signals for 30
image retrieval [8]. In such BCI applications, images are presented to the subject by
AN US
a specific paradigm; meanwhile, the subject’s EEG signals are collected and analyzed in real-time (a person attentive to certain stimuli will produce measurable scalp EEG signals, in particular, event-related potentials such as the P300 [1]). The problem we considered here is target image retrieval in rapid serial visual presentation (RSVP) 35
sequences, which aims at searching for images that the subject is interested in among a set of images that are presented to the subject using an RSVP paradigm.
M
Previous works have provided enlightening guidance for this study. The most common strategy of solving the target image retrieval problem in RSVP sequences works
40
ED
in a 2-stage manner: (1) The EEG signals are firstly filtered by a spatiotemporal feature extraction method, or down-sampled by a specific operation; (2) Then, a binary classification approach judges whether an image is a target image. Linear discriminant
PT
analysis (LDA, [6, 9, 10, 1]), principal component analysis (PCA, [9]), and independent components analysis [11] has been demonstrated as practical feature extraction
CE
methods. There are also specifically designed feature extraction methods, such as bi45
linear discriminant analysis [12], bilinear discriminant component analysis [13], etc. Among these methods, LDA is most commonly utilized. For summaries of feature
AC
extraction methods, we refer readers to the literature [14, 7]. In the second retrieving stage, classification algorithms, such as logistic regression (LR) [1] and support vector machine (SVM) [15], are trained and used to detect target images in RSVP sequences.
50
For classification algorithms, we further refer readers to other literature[16, 17, 18, 19] and the experimental comparative research on ensemble algorithms [20]. Recent research take target image retrieval in RSVP sequences as a key step in 4
ACCEPTED MANUSCRIPT
image searching tasks in large-scale image databases [21, 22, 1, 8]. In these works, open-loop processes [21, 22] or closed-loop (iterative) processes [1, 8], which often 55
incorporate extra manual labor, were proposed for RSVP target image retrieval so that
CR IP T
a good initial set of target images was sent to further image database retrieval schemes. Lately, this kind of research has been extended to face retrieval tasks [23, 24]. Since database retrieval schemes are commonly linked to a pre-trained CV system, these works have presented a real sense of combining HV with CV.
However, after investigating these works fully focused on target image retrieval in
60
RSVP sequences and those treating it as a subpart of image database retrieval systems,
AN US
we found that there remains some problems. The first is to discover a better EEG decoding method. Since comparative research in the problem of decoding EEG signals is not adequate, there still exists potential for exploration. Second, we should investigate 65
whether there is any better way of building brain-computer interactions. Specifically, we desire a brain-computer interaction method that directly improves target image retrieval performance in RSVP sequences without extra manual labor, but with the help
M
of CV insights. With such an interaction method, not only would the original problem of target image retrieval be better solved, but also retrieval in large image databases would be more efficient (with less manual labor involved in iterations, or even free of
ED
70
extra manual labor).
The main contribution of this paper can be summarized as follows: (1) We com-
PT
pared EEG decoding performance of 6 classification algorithms on 3 feature sets derived by different feature extraction methods. Comparisons showed that the random forest (RF) outperformed all other candidate algorithms. (2) We proposed an inno-
CE
75
vative BCI framework Bayesian HV-CV Retrieval (BHCR) for target image retrieval in RSVP sequences based on a Bayesian brain-computer interaction (BBCI) method,
AC
which incorporates CV insights provided by a pre-trained CNN CV system to improve retrieval performance. Experiments showed that the retrieval performance of BHCR
80
improved significantly compared to the EEG-only scheme in both the receiver operating characteristic (ROC) and classification aspects. Furthermore, we correctly discovered the category of target images in all test sequences and almost all simulation sequences (99.7%), which indicated the potential of our RSVP retrieval results to be a 5
ACCEPTED MANUSCRIPT
good starting point for following retrieval in a large image database. (3) We designed 85
a probability propagation scheme that improvs the CV system with the help of HV insights, and then presented a one-shot imagery database retrieval scheme as a natural
CR IP T
extension of our work. The remainder of this paper is organized as follows: We first introduce the construction of the BHCR framework in Section 2. Then, we describe the proposed BBCI 90
method in detail in Section 3. The comparison between different classification algorithms is presented in Section 4. Extensive experiments and simulations for demonstrating the BHCR framework are shown in Section 5. The proposed probability prop-
AN US
agation scheme and the one-shot image database retrieval scheme are illustrated in Section 6. In Section 7, we compare our work to previous works, and discuss future 95
topics. Finally, we conclude this paper in Section 8. 2. Framework Construction
M
In this section, we overview the proposed framework, and then introduce the sub-
2.1. Overview
ED
modules.
The proposed BHCR framework consists of 5 primary components: (1) an image
100
database; (2) a pre-trained CV system; (3) an RSVP presenting system with an EEG
PT
signal collecting device; (4) an EEG interest decoding module; and (5) a BBCI module. Figure.1 illustrates the connections between these components, and Algorithm.1
CE
summarizes the entire procedure of finding target images in an RSVP sequence. The retrieval started with a set of images randomly sampled from the image database.
105
These images were then presented to the subject using an RSVP paradigm while the
AC
subject’s EEG signals were collected. According to the collected EEG signals, a subject-specific EEG interest decoding module (created previously by a training phase) estimated the subject’s interest in the presented images. For each image, an estimation
110
result (an interest score) was returned as a probability of it being a target image. After estimation of the EEG interest decoding module, the presented images were not
6
Pre-Trained
Computer Vision: CNN
CR IP T
ACCEPTED MANUSCRIPT
Take Out Classification Probabilities of Presented Images
Random Sample
AN US
Image Database
Interest Scores (probabilities)
EEG Decoding Module
Baysian BrainComputer Interaction
Newly Estimated Probabilities
Final RSVP Target Image Retrieval Results
ED
M
RSVP Presentation
EEG Signal Collecting
Matrix of Classification Probability
Figure 1: BHCR Framework. First, a set of images from the database is randomly sampled and presented to the subject using the RSVP paradigm. The subject’s EEG signals are collected in the meantime, and
PT
(probability-formed) interest scores are assigned to each RSVP image based on the EEG response. These scores are subsequently sent to a BBCI module as an input. The other input of the BBCI module is a matrix in which each row corresponds to an array of classification probabilities of an RSVP image returned by a
CE
pre-trained CV system (based on a CNN). The BBCI module then executes the following. (i) The EEG-based scores are treated as prior information by establishing prior distributions for each image. (ii) The matrix from the CV system is transformed into an array of probabilities after the operation of target category discovery, and the BBCI module recovers sample information by conducting randomized trials for each image. (iii)
AC
With prior distribution and sample information ready, a Bayesian mechanism is employed to estimate the posterior probability of each image. (iv) The newly estimated (posterior) probabilities are gathered as an array, and the final retrieval result is derived by setting a threshold to the posterior array. See Section 3 for more details about the BBCI module.
7
CR IP T
ACCEPTED MANUSCRIPT
Algorithm 1 BHCR Framework Offline processing:
AN US
1. Train the CNN-based CV system for the image database.
2. Train the subject-specific EEG interest decoding module.
3. Set a classification threshold pthr in light of retrieval preferences. Online processing:
0. Suppose an RSVP sequence contains n images, online processing starts here. 1. Randomly sample n images I1 , I2 , ..., In from the image database, tell the subject
M
the category to be focused on. presentation.
ED
2. Display images I1 , I2 , ..., In to the subject via RSVP, collect EEG signals during 3. Decode the collected EEG signals, and return probability formed interest scores IS IS P IS = (pIS 1 , p2 , ..., pn ).
PT
4. Run the Bayesian human-computer interaction, a new (posterior) probability array P P P P = (pP 1 , p2 , ..., pn ) is returned.
AC
CE
5. Get RSVP target image retrieval results by comparing elements of P P to pthr .
8
ACCEPTED MANUSCRIPT
flagged directly. Instead, estimated probabilities were transferred to a BBCI module. Then, the Bayesian module combined EEG decoding results with image classification insights from the pre-trained CV system, providing new probability estimations of the presented images (for details, see Figure 1 and Section 3). Finally, target image re-
CR IP T
115
trieval results were derived by setting a probability threshold and comparing the newly estimated probabilities to it. 2.2. Image Database & CV System
All presented images were sampled from the image database Caltech-256 [25]. This image database consists of 257 categories of images. Among these 257 categories,
AN US
120
there are 256 exact categories in which each image contains a certain kind of object, such as an AK-47 or a butterfly, and 1 clutter category with 827 messy and meaningless images. For each category, at least 80 images were collected from the Internet. The total number of images in the database is 30,607.
To develop classification insights of the image database from a CV perspective,
125
M
a CV system was trained in offline manner. Considering the advanced performance of CNNs in the field of image recognition, we followed the work of the open project
ED
Caffe proposed by Yangqing Jia [3]. Specifically, we derived our CV system based on the BVLC Reference CaffeNet model, with a minor variation from as described in the 130
literature [2], from the caffe model zoo.
PT
Here follows a brief description of our CNN model: • Training and Test Sets. The training set for the CNN model was created by
CE
randomly sampling 70% of the images in each category. As a result, the training set consisted of 21,432 images from 257 categories. The remaining (30%) part
AC
135
of the image database Caltech-256 (9,175 images) formed the test set.
• Model Modification. We modified the number of outputs of the final InnerProd-
uct layer of CaffeNet to 257, but kept the other specifications of the model unchanged. We used a mean file derived from our training set.
• Training Setting. The training was a refined procedure based on the published 140
model snapshot of iteration 310, 000 (”bvlc reference caffenet.caffemodel” and 9
ACCEPTED MANUSCRIPT
”bvlc reference caffenet.solverstate”). For the refining procedure, we modified the initial learning rate to 0.001 and the step size to 20, 000. The iteration number we ran for our model was 100, 000.
CR IP T
• Model Usage. Since we randomly sampled images from the entire image database for experiments, we used the trained CNN model on the entire database as well.
145
This treatment was reasonable considering that the model generalized well to the
test set (Overall accuracies were 0.910, 0.821, 0.884 in training set, test set and the entire image database, respectively. For more details of CV performance, see
150
AN US
Section 5.3). 2.3. Image Presentation & EEG Signal Collecting
Images were presented to the subject using an RSVP paradigm [26]. In our setting, images were shown in blocks of 96 and flashed at 5 Hz (Figure 2). The subjects were seated 75 cm from a monitor, and images were centered on the monitor. A fixation cross appeared just prior to each block to allow the subjects to center their gaze on the images during the RSVP sequences. In each RSVP block, 96 images were sampled
M
155
from the image dataset and presented to the subject. The 96 images corresponded to 8
ED
randomly chosen image categories (1 of which was the target category), with 12 images per category (randomly sampled from each category). Subjects are allowed to take a rest between blocks, and the target category can be varied among blocks.
PT
EEG data were acquired by a g.USBamp system (G.Tec) using 16 electrodes dis-
160
tributed in accordance with the International 10 − 20 system. EEG data were sampled
CE
at 2400 Hz using 200 Hz low-pass and 50 Hz notch filters. The acquired EEG data were divided into epochs, each consisting of 1000 ms EEG data after stimulus onset. Thus, we obtained raw data for every image at the scale of 16 × 2400. The spatial dis-
tribution of EEG activities was assumed to change over time with a temporal resolution
AC
165
of 25 ms. Then, for each image in an RSVP sequence, we applied arithmetic average to its corresponding EEG signals within every 25 ms window. Since this was done in different electrodes (16 electrodes in total), we obtained data at a scale of 16 × 40 for each image. After this preliminary processing of EEG signals, we obtained data at a
10
Time Target
200 ms
96 images
CR IP T
ACCEPTED MANUSCRIPT
Figure 2: RSVP paradigm. Each block was comprised of 1 target category (12 images) and 7 nontarget
170
AN US
categories (84 images). Each image was presented for 200 ms.
scale of 96 × 16 × 40 for each RSVP block. We named this ”averaged-window data
(AWD)” to facilitate later referencing. 2.4. EEG Decoding Module
M
An EEG decoding module was employed in our framework to detect user interest in each image shown during the presentation. In previous works, this was often regarded as the main component for target image retrieval tasks in RSVP sequences.
ED
175
The common strategy to build an EEG decoding module, as stated in Section 1, uses a feature extraction method and classification algorithm. We followed this tackling rou-
PT
tine to design our EEG decoding module, and employed LDA as the feature extraction method and RF as the classification algorithm. The selection of our classification algorithm was suggested by a comparative inspection, which will be shown in detail in
CE
180
Section 4.
Unlike other systems, we did not flag any image directly after the EEG interest
AC
estimation. Conversely, we treated the interest scores as a preliminary result and sent it to the BBCI module (see Section 2.5) for further processing.
185
2.5. BBCI Module For an RSVP sequence, we obtained a probability array P IS of 96 elements from
the EEG decoding module, in which each element indicated the possibility of its cor11
ACCEPTED MANUSCRIPT
Table 1: Target Categories of 25 Training Sequences
S EQUENCE
TARGET
S EQUENCE
TARGET
S EQUENCE
TARGET
C UP
10
PAN
19
P HOTOCOPIER
2
B UTTERFLY
11
H ELICOPTER
20
R AINBOW
3
C AMEL
12
S ANDGLASS
21
S UPERMAN
4
C ENTIPEDE
13
L IZARD
22
S HOES
5
D ISPLAYER
14
K ANGAROO
23
F OOTBALL
6
C ARRIAGE
15
C HEETAH
24
B ICYCLE
7
D OLPHIN
16
OWL
25
U NICORN
8
G LASSES
17
M INOTAUR
9
H ELMET
18
AN US
CR IP T
1
PAPER - CLIP
responding image being a target image. These insights were derived from HV by decoding the EEG signals collected during the RSVP.
For the same RSVP sequence, we also obtained a 96 × 257 scaled matrix P CV =
M
190
CV (pCV ij )96×257 of probabilities from the pre-trained CV system, where pij was the eval-
uated probability of the ith image belonging to the jth category in the image database.
ED
Since target images in an RSVP sequence belonged to the same category, we concluded that the matrix P CV provides insights of target images in a latent way. Furthermore, 195
if the category of target images was explicit, those insights given by P CV would turn
PT
explicit.
To make use of CV insights, we designed a simple scheme to discover the target
CE
category. The BBCI module started with this scheme. The module was aimed at combining both of the insights given by the HV-related components (including the image presenting system, signal collecting system, and EEG decoding module) and the CV system. Particularly, this module took HV insights as prior information, and obtained
AC
200
sample information from CV insights. Thus, we could derive posterior probabilities P P to reach the final retrieval result. In the next section, we will provide further explanations about the BBCI module.
12
ACCEPTED MANUSCRIPT
Table 2: Target Categories of All 35 Test Sequences of the 7 Subjects
S EQUENCE 1
S EQUENCE 2
S EQUENCE 3
S EQUENCE 4
S EQUENCE 5
S UBJECT 1
P EOPLE
H AWKSBILL
C ARTMAN
L AWN - MOWER
C OCKROACH
S UBJECT 2
P HOTOCOPIER
R AINBOW
S CORPION
T ENNIS - SHOES
S OCCER - BALL
S UBJECT 3
F RYING - PAN
G ORILLA
H ELICOPTER
H OURGLASS
L IZARD
S UBJECT 4
F RENCH - HORN
T OURING - BIKE
A IRPLANES
H AMBURGER
L LAMA
S UBJECT 5
M EGAPHONE
H OURGLASS
L EOPARDS
T OMATO
PAPER - SHREDDER
S UBJECT 6
Z EBRA
B IRDBATH
F LOPPY- DISK
M ICROSCOPE
G OAT
S UBJECT 7
K ANGAROO
L EOPARDS
OWL
M INOTAUR
C LIP
AN US
205
CR IP T
S UBJECTS
2.6. Experimental Protocol
For each subject, the experiment consisted of 2 phases: the training phase, and the testing phase. For the training phase, we used a training RSVP sequence of 25 blocks, the target categories which are shown in Table 1. To facilitate later expression, we
210
M
treated the training sequence as 25 single-block sequences. Further operations conducted on the training and test sets will also be described in terms of these single-block
ED
RSVP sequences. For each sequence, subjects were instructed to pay attention to images containing a certain kind of object and an example of target images was shown before the presentation (right before the fixation cross) as preparation. In the training
215
PT
phase, we trained the feature extraction and classification models for the EEG decoding module. During the testing phase, 5 single-block sequences were presented, and the target categories of the test sequences are shown in Table 2. The performance of
CE
the BHCR framework was evaluated with respect to each test sequence of each subject (Section 5.1,5.2). The target categories in the testing phase were different from
AC
the training phase, and thus some images that were target images during the training
220
phrase might have appeared as distractors in the testing phase. Seven subjects participated in the experiment (6 males and 1 female; average age
of 22.5 years with a standard deviation of 1.2 years; and all right-handed). All subjects were students of Zhengzhou University and had no previous training in the task. The subjects had normal or corrected-to-normal vision with no neurological problems, and 13
ACCEPTED MANUSCRIPT
Algorithm 2 Processing the BBCI Module IS IS Input: P IS = (pIS 1 , p2 , . . . , p96 ), CV classification probability matrix M =
(mij )30607×257 , CV classification accuracy r, and an intermediate threshold pithr .
CR IP T
P P Output: Posterior array P P = (pP 1 , p2 , . . . , p96 ).
Begin:
1. Extract classification probabilities of the presented images from M ; form them into a small sequence-dependent matrix P CV = (pij )CV 96×257 .
2. Target Category Discovery. Determine the category of target images based on P CV , P IS , pithr .
AN US
C C 3. Matrix to Array. Transform P CV into an array P C = (pC 1 , p2 , . . . , p96 ), in
which pC i indicates the possibility of the ith image being a target image. 4. Bayesian Model Construction. For the ith image of the sequence, a Bayesian C P model is constructed based on pIS i and pi to yield a posterior probability pi .
End.
were financially compensated for their participation.
M
225
ED
3. Details of the BBCI Module
IS IS Suppose we obtained P IS = (pIS 1 , p2 , . . . , p96 ), an array of interest scores of an
RSVP sequence, from the EEG decoding module. The array P IS would be transferred
230
PT
to the BBCI module, which would then start processing. The processing flow-path of the BBCI module is shown in Algorithm 2. Except for the first step of deriving a small
CE
matrix P CV , we organized this processing into 3 subparts. The designing details of these subparts are described below.
AC
3.1. Target Category Discovery
235
Since P CV contains image classification information derived from the CV system,
we must somehow transform them into information about target images of the test RSVP sequence. Notice that compared with P IS , P CV has an apparently different
form. Therefore, the first concern is to reform it. To guide this transformation, we need to know the category of the image database that target images belong to. 14
ACCEPTED MANUSCRIPT
We named this sub-processing of deriving the target category as Target Category 240
Discovery. It contains 3 steps:
CR IP T
Step 1. Obtain a index set Ind = {i|PiIS ≥ Pithr }. P Step 2. For j ∈ {1, 2, . . . , 257}, calculate Sj = i∈Ind pCV ij .
Step 3. The category corresponding to the Jth column of P CV is recognized as the cat-
egory of target images (target category), where J satisfies SJ = max1≤j≤257 Sj . 245
Remark 1. An explanation. We obtained a coarse target retrieval result with pithr , and
the recognized images were indexed by Ind. Then, we turned to matrix P CV , summed
AN US
the (CV-based) probabilities of recognized images in each column (i.e. with respect to each category), sorted summation results, and recognized the category with the biggest accumulate probability as the target category.
Our design specifications suggest that the performance of Target Category Discov-
250
ery mainly depends on (1) the intermediately recognized images with P IS and pithr ,
M
(2) the classification possibilities of the recognized images assigned by the CV system. To describe a suitable condition, we summarized the demand of the target category
255
ED
discovery with the following 2 assumptions: Assumption 1. Most of the intermediately recognized images are true targets. Assumption 2. In most cases, for an image, the CV system tends to evaluate a larger
PT
value of probability to its exact category.
CE
These 2 assumptions call for as good as possible intermediate retrieval performance and classification performance of the CV system. This also partly addresses the reason
260
for us to conduct method selection for the EEG decoding module in Section 4 and to
AC
employ the state-of-the-art CNN as the CV system (with attempts to meet the assumptions, we reached a category discovery accuracy of 100% among test sequences and 99.7% in a simulation of 1000 sequences, see Section 5.3).
265
Note that the way to choose pithr remains unexplained. In fact, the 2 proposed
indexes in Section 4.2 were designed partly to meet Assumption.1, and we just set the value of pithr by maximizing 1 of the indexes, DT −F (for details, see Section 4.2). 15
ACCEPTED MANUSCRIPT
3.2. Matrix to Array Knowing the target category, we could transform CV classification information into target-related information by reforming P CV into an array similar to P IS . For each image, we used a process that is divided into 2 cases depending on whether
CR IP T
270
the image had been classified to the discovered target category by the CV system:
(1) For i = 1, 2, . . . , 96, if the ith image is classified to the target category, then calculate the possibility of it belonging to the target category as PiC =
CV r×Pi,J CV +(1−r)×(1−P CV ) ; r×Pi,J i,J
(2) For i = 1, 2, . . . , 96, if the ith image is classified to another category, then calculate
CV (1−r)×Pi,J CV +r×(1−P CV ) . (1−r)×Pi,J i,J
AN US
the possibility of it belonging to the target category as PiC =
275
Remark 2. These calculations helped us derive insights of the target image retrieval task from the CV system. We could then treat P C as another array of interest decoding probabilities for target image retrieval by some other experimental treatment. Since P IS and P C had the same pattern, we then tried to combine them by a Bayesian model.
M
280
3.3. Bayesian Model for Each Image
ED
For ith image in an RSVP sequence, we considered the event ”this image is one of user’s interesting images”, which followed a 2-point distribution parameterized by
285
PT
θi . Our proposed Bayesian method enabled us to give a posterior estimation of every θi in an RSVP sequence. As we know, a Bayesian model requires prior information and sample information. But at this point, we have just 2 probabilities for each image.
CE
Thus, we must address these additional problems: (1) Which should be taken as prior information? (2) How can the other be utilized as sample information? Let’s consider the following scenario. Some images are presented to the subject by
RSVP and the subject rates these images in the form of probability (P IS ). However,
AC 290
the subject is worried about missing some of these scores during the recording, and that some may be inaccurate due to carelessness. To eliminate doubts, the subject turns to a computer for help after his rating session. The subject indicates the target category and provides all presented images to the computer. Then, the computer conducts
16
ACCEPTED MANUSCRIPT
295
randomized trails for every image, records the frequency of it to be recognized as a target image, and transforms the frequencies into probabilities (P C ). If some of the above worried situations occurred, the subject receives help from those probabilities: values between P IS and P C for some images.
CR IP T
for example, replacing some values in P IS with values in P C , or calculating average Our Bayesian method was built based on the above idea. We took probabilities in
300
P
IS
as prior information and recovery sample information from probabilities in P C
by re-conducting randomized trials according to these probabilities. Specifically, we
conducted N random trials (parameterized by PiC ) for the ith image, and recorded the
305
AN US
frequency ni of it being recognized as a target image as sample information for the Bayesian model.
For the ith image, the process of the Bayesian model is described as follows:
(1)
M
Step 1. Establish prior distribution for the ith image, 2 · (1 − pIS ), θi ∈ [0, 0.5]; i π(θi ) = 2 · pIS , θi ∈ (0.5, 1]. i
Step 2. Conduct randomized trials that are parameterized by PiC for N times, and
ED
record the frequency ni of it being recognized as a target image. Step 3. Calculate the posterior distribution according to the above results, π(θi |ni ) ∝ h(ni , θi ) = p(ni |θi ) · π(θi ).
PT
310
Step 4. Integral the posterior distribution at the interval (0.5, 1], and the resulting value
CE
pP i is the desired posterior probability.
AC
4. Classification Algorithm Comparison for EEG Decoding Module
315
As stated in Section 1 and Section 3.1, we conducted a comparative inspection
for the selection of classification algorithm here. The discussion of this section is organized as 3 parts: (1) comparison settings of feature sets and candidate classification algorithms; (2) comparison indexes utilized to evaluate the performance of candidate algorithms; (3) the comparison and results.
17
ACCEPTED MANUSCRIPT
4.1. Comparison Settings 320
Feature Sets. To ensure that the selected classification algorithm works well on feature sets derived by feature extraction methods in both unsupervised and supervised condi-
CR IP T
tions, we conducted comparisons on LDA-based feature set LDA(40) and PCA-based feature set PCA(51). Specifically, LDA(40) was acquired by an LDA of AWD data,
which took the filtered data as a linear combination of the data of all 16 electrodes in 325
the same time window, while PCA(51) was acquired by a PCA of AWD data, thus setting the cumulative proportion as 85%. The numbers 40 and 51 in parentheses indicates the dimension of the feature set. Additionally, we treated AWD as a trivial feature set
AN US
and employed it as a baseline for comparison.
Candidate Classification Algorithms. This comparison was conducted using 6 preva330
lent classification algorithms in machine learning communities: adaboost [27], bagging [27], artificial neural network (ANN)[28], RF [29], SVM [30] and LR [1]. 4.2. Comparison Indexes
M
The common measure to evaluate information retrieval performance uses the area under the curve (AUC) value of the ROC curve, but new comparison indexes should be introduced here for the following reasons: (1) to meet Assumption.1 stated in Section
ED
335
3.1; (2) to suit the unbalanced characteristic of the binary classification for RSVP target image retrieval.
PT
We concluded 2 requirements from Assumption.1, which were the basis for our comparison indexes design: (1) recognize true targets as much as possible; (2) recognize false alarms as little as possible. Since detecting the correct target category is an
CE
340
important concern of the BBCI module, we did not take the absolute value of recognized true targets as a direct measurement of the comparison. Instead, we proposed
AC
2 indexes (True-False Difference and Corrected False-True Ratio) to address these requirements.
345
To make sure our comparison result was available for common information retrieval
settings, an AUC-based comparison was conducted (see Section 5.3). As expected, our 2 results were consistent with each other. Thus, the suggestion we provide here is also constructive to common EEG decoding settings. 18
ACCEPTED MANUSCRIPT
Definition 1. True-False Difference DT −F (2)
Definition 2. Corrected False-True Ratio RF/T : RF/T = 7 ·
1 − p00 p01 =7· 1 − p10 p11
CR IP T
DT −F = 12 · p11 − 84 · p01
(3)
In the definitions, p10 indicates the probability that a target image is recognized as 350
a distractor, p00 indicates the probability that a distractor is recognized as a distractor,
AN US
p11 = 1 − p10 , and p01 = 1 − p00 . Here probabilities are approximated by frequencies. Remark 3. The 2 indexes indicate the differences in value and ratio between true targets and false alarms. In the comparison, the desired algorithm should have a large DT −F value and a small RF/T value to meet the above 2 requirements. 355
4.3. Comparison & Results
M
Retrieval performances in terms of DT −F and RF/T are shown in Table 3 and Table 4 (since the performances shared similar pattern among subjects, we show figures of one subject here). All indexes were calculated over an average of RSVP sequences,
360
ED
and intermediate threshold pithr for each algorithm was set to maximize index DT −F . The left half of each table displays the results in the training set, and the right side
PT
corresponds to the test set. For each subject, comparison was conducted with a 10-fold repeat random sub-sampling validation. In each fold, 20 of the 25 training sequences were randomly sampled as the training set and the remaining 5 sequences were taken
CE
as the test set. Performances on both sets were considered to evaluate the candidate
365
classification algorithms. Adaboost and RF were the best fitting models with respect to the training set, across
AC
all feature sets. In fact, DT −F = 12 and RF/T = 0 means that all target images were recognized and no false alarm appeared. Viewing the right-side columns of Table 3 and Table 4, it is obvious that, in most conditions, classification algorithms built upon
370
an LDA feature set outperformed others in both indexes. This observation was consistent our intuition that supervised filters could encode data information better than
19
ACCEPTED MANUSCRIPT
Table 3: Performance of Candidate Classification Algorithms w.r.t. True-False Difference
AWD(16X40)
PCA(51)
LDA(40)
DT −F ( TEST )
ADABOOST
12.00(0.00)
12.00(0.00)
12.00(0.00)
BAGGING
11.20(0.23)
8.01(0.50)
7.13(0.36)
RF
12.00(0.00)
12.00(0.00)
12.00(0.00)
RF
ANN
12.00(0.00)
8.40(0.75)
3.27(0.340)
ANN
SVM
11.94(0.04)
4.63(0.32)
8.36(0.15)
SVM
LR
12.00(0.00)
2.06(0.20)
3.27(0.27)
LR
ADABOOST BAGGING
AWD(16X40)
PCA(51)
LDA(40)
3.92(1.05)
3.08(0.80)
5.06(0.99)
CR IP T
DT −F ( TRAIN )
2.80(0.81)
1.74(0.71)
4.22(0.96)
4.06(0.98)
2.88(0.38)
5.68(1.07)
3.14(1.04)
1.36(0.64)
3.52(1.22)
4.90(0.78)
1.06(0.40)
4.34(1.40)
-2.90(1.05)
1.72(0.69)
3.34(1.10)
PCA(51)
LDA(40)
AWD(16X40)
PCA(51)
LDA(40)
ADABOOST
0.00(0.00)
0.00(0.00)
0.00(0.00)
ADABOOST
0.17(0.11)
0.15(0.12)
0.15(0.06)
BAGGING
0.04(0.12)
0.11(0.05)
0.11(0.03)
BAGGING
0.22(0.13)
0.35(0.12)
0.21(0.11)
RF
0.00(0.00)
0.00(0.00)
0.00(0.00)
RF
0.14(0.11)
0.14(0.10)
0.11(0.05)
ANN
0.00(0.00)
0.15(0.05)
0.29(0.05)
ANN
0.39(0.11)
0.49(0.13)
0.34(0.08)
SVM
0.00(0.00)
0.00(0.00)
0.19(0.02)
SVM
0.17(0.07)
0.04(0.08)
0.28(0.10)
LR
0.00(0.00)
0.33(0.05)
0.29(0.06)
LR
1.60(0.30)
0.41(0.18)
0.32(0.08)
M
AWD(16X40)
RF/T ( TEST )
ED
RF/T ( TRAIN )
AN US
Table 4: Performance of Candidate Classification Algorithms w.r.t. Corrected False-True Ratio
unsupervised filters. The only exception was in SVM, which performed better on the
PT
AWD(16X40) set. This exception may have been due to the use of a sigmoid kernel when constructing the SVM algorithm. This result also suggests a possibility of de375
coding EEG signals with a kernel method, which remains to be inspected. Ultimately,
CE
the best performances were with the LDA feature set, so we focused our attention on the third columns. There, we can see that RF was the best setting choice.
AC
Simple calculations suggested that, by taking LDA as the feature extraction method
and RF as the classification algorithm, the numbers of recognized real targets and false
380
alarms in a test RSVP sequence were expected to be 6 (actually 6.38) and 1 (actually 0.70).
20
ACCEPTED MANUSCRIPT
5. Experimental Results In this section, we will firstly show the performance of the entire BHCR framework
385
CR IP T
and then investigate the robustness of the BHCR framework. 5.1. Performance of BHCR: ROC Asepct
Although the RSVP target image retrieval task result was given by a probability threshold in practice, it is common to evaluate the performance of an information retrieval mechanism by ROC curves and their corresponding AUC values.
To intuitively illustrate the improvement of retrieval performance by combining CV insights under the help of the proposed Bayesian method, a simple example is shown
AN US
390
in Figure 3a. Focus is on 1 randomly selected test RSVP sequence, and consists of 2 ROC curves. The dashed blue curve (AU C = 0.80) depicts the performance of the RSVP target image retrieve mechanism in which only the EEG signals were employed (with LDA as feature extraction method and RF as classification method), and the solid 395
red curve (AU C = 1) depicts the performance of the proposed framework, which
M
took advantage of the CV insights by the BBCI module. The red curve being well above the blue one highlights that the proposed framework outperformed the EEG-
ED
only mechanism in this example.
Figure 3b further illustrates the intuition obtained from Figure 3a by showing all 400
AUC values of the 2 compared mechanisms. In particular, points with a vertical coordi-
PT
nate indicate the performance of the BHCR framework, while a horizontal coordinate indicates that of the EEG-only mechanism. Since all points are located in the upper-
CE
left of the figure, the AUC values along the vertical axis are larger than those along the horizontal axis. We found that the proposed framework outperformed the EEG-only
405
mechanism in all 35 test sequences. The average AUC value of the BHCR framework,
AC
0.987, was 13.1% higher than that of the EEG-only mechanism, 0.873. Furthermore, among all 35 test sequences, 54.3% of cases in which the proposed framework outper-
formed the EEG-only mechanism by 10% in terms of AUC values, and 25.7% of cases
by 20%. The greatest improvement was by 31.5%.
410
Figure 3c shows the mean AUC values of test sequences in the 3 different mechanisms with respect to different subjects. This figure shows that, on average, the BHCR 21
1.00
1.00
CR IP T
ACCEPTED MANUSCRIPT
0.50
ROC Curves
0.25
BHCR
0.90
AUC 1.00 0.85
0.98 0.96
0.80 0.94
EEG−only 0.00
0.92 0.75
0.25
0.50
0.75
1.00
0.75
False Positive Rate
0.80
0.85
0.90
0.95
1.00
0.75
0.50
0.25
0.00
1.00
1
AUC Value of the EEG−only Mechanism
(a)
BHCR
2
1.00
0.9
AUC Value
0.75
0.8
Method 0.7
0.50
0.25
EEG−only
2
M
BHCR
3
4
5
Chance
6
7
Subject
(b)
1
1.0
EEG−only
AN US
0.00
AUC Value
Average AUC Value of Test Sequences
True Positive Rate
0.75
AUC Value of The BCHR Framework
AUC 0.95
(c)
3
4
5
6
7
Methods: BHCR EEG−only Chance
0.00
1
2
3
4
5
Subject
6
7
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
All Test Sequnces of All Subjects
(e)
ED
(d)
1 2 3 4 5
Figure 3: Performance of the BHCR framework (ROC Aspect), mainly compared with an EEG-only mechanism and retrieval by chance. (a) A randomly selected test RSVP sequence. The ROC curve of the BHCR
PT
framework is well above the ROC curve of the EEG-only mechanism. (b) AUC values of all 35 test RSVP sequences of the 7 subjects. AUC values are drawn as 2D points, where a vertical coordinate indicates the performance of the BHCR framework and a horizontal coordinate indicates the EEG-only mechanism. The
CE
BHCR framework outperformed the EEG-only mechanism in all cases. (c) Bar chart of average AUC values of each subject with the 3 compared mechanisms. The average AUC value of the BCHRR framework was close to 1 for all 7 subjects, and its corresponding bars were higher than those of the other mechanisms.
AC
(d)Box plot of AUC values of each subject with the BHCR framework and EEG-only mechanism. (e) AUC values of all 35 test sequences of the 7 subjects.
22
ACCEPTED MANUSCRIPT
framework performed better than the EEG-only mechanism for each subject. The comparison for retrieving by chance was aimed at revealing the practical significance of both the BHCR framework and the EEG-only method.
CR IP T
We provide a finer subject-specific comparison between the BHCR framework and
415
the EEG-only mechanism in Figure 3d, in which box plots are drawn. The BHCR
framework performed better than the EEG-only mechanism since (1) their correspond-
ing box plots were well above those of the EEG-only mechanism, and (2) their box plots were thinner than those of the EEG-only mechanism.
Lastly, we conducted a subject-and-sequence specific comparison, which is dis-
420
with those from the above analysis.
AN US
played in Figure 3e, to present greater detail of our results. These results also accord
5.2. Performance of BHCR: Classification Aspect
Since the RSVP procedure we investigated was aimed at finding target images in 425
RSVP sequences, and we actually obtained intermediate EEG-only retrieval results
M
with a temporal probability threshold pithr during the processing of the BBCI module, we then conducted comparisons regarding target image retrieval (binary classification)
ED
performances.
We conducted 3 kinds of experiments to reveal the improvement of our proposed 430
framework in terms of classification performance. We utilized 3 different retrieval
PT
principles: (1) Retrieving by optimizing the indexes proposed in Sec 4.2; (2) Retrieving without false alarm images; (3) Retrieving so that all true targets are hit. Table 5 shows the target image retrieval performance of the EEG-only mechanism,
CE
and Table 6 shows the performance of the BCHR framework. In each table, the num-
435
bers contained in the first 2 rows correspond to how many true targets and false alarms
AC
were recognized from the test RSVP sequence following the first principle. The third rows show the number of recognized true targets in an RSVP sequence in the circumstance that no false alarms appeared, and the fourth rows indicate the number of false alarms in a retrieval process if all true targets were hit. All values are presented as
440
means over 5 test sequences of each subject, with standard deviations. The BCHR framework outperformed the EEG-only mechanism under all 3 retrieval 23
ACCEPTED MANUSCRIPT
Table 5: Target Image Retrieve Performance of EEG-only Mechanism
S UBJECT 1
S UBJECT 2
S UBJECT 3
S UBJECT 4
S UBJECT 5
S UBJECT 6
S UBJECT 7
Ntrue
5.6(1.82)
6.2(2.17)
6.4(1.82)
6.6(2.51)
5.6(2.71)
7.6(1.82)
7.0(1.22)
Nf alse
1.0(1.00)
2.0(2.35)
0.8(1.30)
2.2(2.77)
1.2(1.30)
2.8(2.95)
0.4(0.89)
Ntrue
3.8(0.84)
3.0(1.73)
4.6(2.30)
3.0(1.58)
3.0(2.35)
2.2(1.48)
6.6(1.52)
54.6(19.30)
54.2(22.33)
51.4(30.33)
56.6(22.30)
65.6(18.28)
52.0(22.36)
68.2(8.44)
Nf alse
CR IP T
S UBJECTS
Table 6: Target Image Retrieve Performance of BHCR Framework
S UBJECT 1
S UBJECT 2
S UBJECT 3
S UBJECT 4
S UBJECT 5
S UBJECT 6
S UBJECT 7
Ntrue
11.2(0.84)
11.6(0.55)
11.8(0.45)
11.6(0.55)
11.8(0.45)
11.8(0.45)
11.8(0.45)
Nf alse
0.0(0.00)
0.00(0.00)
0.00(0.00)
1.00(0.00)
0.00(0.00)
0.00(0.00)
1.00(0.00)
Ntrue
11.2(0.84)
11.6(0.55)
11.8(0.45)
10.6(1.14)
11.8(0.45)
11.8(0.45)
11.4(0.89)
Nf alse
17.0(18.14)
4.2(6.57)
0.6(1.34)
21.4(31.30)
16.6(37.12)
4.2(9.39)
17.6(37.12)
M
AN US
S UBJECTS
principles. Actually, by employing the first principle, the average number of true targets selected by the BCHRR framework had increased by 81.3% compared to the EEG-only
445
ED
mechanism, and the false alarm rate decreased by 57.8%. The average number of true targets with respect to the second principle increased by 342.0%, and in terms of the
PT
third principle, the average number of false alarms decreased by 80.4%. Remark 4. From viewing Section 5.1 and 5.2 we concluded that the BHCR framework outperformed the EEG-only mechanism to a considerable degree. Since we em-
CE
phasized the coupling of HV and CV in our framework, one may also consider how
450
to compare the BHCR framework to the CV part. However, there is no direct way to
AC
make such a comparison since the CV system itself does not contain specific retrieval information. Thus, for a real retrieval task, a target category discovery procedure is necessary for utilizing the CV insights to refine the EEG-only mechanism. A possible measure to compare the BHCR framework to the CV system is classification accu-
455
racy. For simplicity, we calculated the average accuracy of all 35 test sequences for the BHCR framework (0.955), and used the average classification accuracy of the CV sys24
ACCEPTED MANUSCRIPT
0.95
0.90
Algorithm adaboost ANN bagging
0.80 LR RF 0.75 SVM
0.70
1
2
3
4
5
6
AN US
Candidate Algorithm
CR IP T
AUC Value
0.85
Figure 4: EEG decoding performance of candidate classification algorithms (box plots of AUC values). Both SVM and RF seemed to have top performances, but an outlier made SVM less satisfactory than RF.
tem (0.887, see Section 5.3). This indicated that the BHCR framework outperformed forced to the CV system. 460
M
the CV system even when a strong assumption (the target category is known) was en-
5.3. Robustness of the BHCR Framework
ED
Performance of EEG Decoding Module. In Section 4.2, we presented 2 indexes for selecting the classification algorithm. We considered to what extent would the
PT
selected algorithm be consistent with the best algorithm suggested by the ROC perspective. Figure 4 shows the AUC performances of all 6 candidate algorithms with the 465
test sequences (we only display the LDA-based results since they had better perfor-
CE
mance from a global perspective). The RF algorithm had the best AUC performance compared to other candidate algorithms, which was consistent with the conclusion we
AC
derived in Section 4.3 (both SVM and RF seemed to have top performances, but an outlier made SVM less satisfactory than RF).
470
CV System Performance. The CV system plays an important role in the entire
framework since the posterior probabilities are derived by combining CV insights with EEG decoding insights. Intuitively, the BHCR framework would return a good retrieval result if the CV system provided image classification information in a proper way. 25
ACCEPTED MANUSCRIPT
25
15
10
5
0 0.6
0.7
0.8
0.9
1.0
AN US
Classification Accuracy
CR IP T
Numbe of Categories
20
Figure 5: Classification accuracy histogram of the CV system. The dashed line indicates the mean value of classification accuracy of all 257 categories, which equals 0.887. Most categories had accuracy values in the interval (0.7, 1) and only 3 had accuracy below 0.700. A density curve is also displayed.
Investigated the classification performance of the CV system would help us obtain a better understanding of the Bayesian interaction mechanism. We plotted a histogram of
M
475
the category number relative to classification accuracy (Figure 5, with a density curve, and a dashed line indicating the average accuracy (0.887, sd = 0.0685, n = 257)).
ED
The majority of these accuracies were located in the interval (0.7, 1), and only 3 were below 0.700 (0.620, 0.653, and 0.660). This indicates that when the target category 480
discovery processes successfully, we could expect 12 × 0.887 = 10.644 true target
PT
EEG decoding probabilities would be amended correctly by CV-side information in
average, more than 12 × 0.700 = 8.400 in most cases, and about 12 × 0.620 = 7.440
CE
in the worst case.
Recall that the average values of the selected true target images of different sub-
485
jects were consistently close to 11 (Section 5.2), which highly agrees with the 10.664
AC
obtained here. There are some subtle reasons for this phenomenon, and we will briefly explain this in the following discussion. Target Category Discovery Performance. As the first operation of the BBCI
module, target category discovery is a pivotal step that decides whether reliable CV
490
insights can be obtained to refine the EEG-only retrieval performance. To ensure the
26
ACCEPTED MANUSCRIPT
7.5
5.0
2.5
0.0
Discover The Category Fail 9 Success
6
3
0 0.0
2.5
5.0
7.5
10.0
0
The Number of Recognized TRUE Targets
3
CR IP T
The Number of Recognized FALSE Targets
The Number of Recognized FALSE Targets
10.0
6
9
The Number of Recognized TRUE Targets
(a)
(b)
AN US
Figure 6: Performance of the target category discovery process. Each point corresponds to an RSVP sequence, and the vertical axis indicates the number of false alarms in the intermediate retrieval result while the horizontal axis indicates the number of true targets. Dashed lines of X = Y are plotted to separate conditions of X > Y and X < Y . (a) Performance in all 35 test sequences. The target category discovery process succeeded in all cases. (b) Performance in 1000 simulation sequences. The process succeeded in
M
997 cases, and succeeded in all cases if we neglected 3 cases of X <= Y .
BHCR framework works steadily, we must carefully inspect the process of target category discovery.
ED
Target category discovery succeeded in all 35 test sequences of all subjects (Figure 6a). In all cases, the numbers of intermediately recognized true targets were larger 495
than the numbers of false alarms. However, with such a simple observation we cannot
PT
conclude that we would see similar success in all future experiments. To evaluate a more realistic performance, we conducted a simulation experiment for
CE
target category discovery. Several steps were involved: (i) Make hypotheses and apply hypothesis testings for the distributions of the true target number and the false alarm
500
number; (ii) Generate 1000 simulated RSVP sequences according to our experimental
AC
protocol and hypotheses; (iii) Run target category discovery process for the simulated sequences, and obtain binary outcomes indicating whether the process succeeded. We organized simulation results in the same way as real sequences (Figure 6b).
The simulation revealed that target category discovery failed in some circumstances,
505
e.g., when the false alarm number was larger than the true target number. However,
27
ACCEPTED MANUSCRIPT
none failed when the true number was larger than the false number. Success rate of the simulation was 99.7%, which is considerably high. Note that the false alarm number showed a tendency to be smaller than the true numbers in practice, and therefore, we
510
CR IP T
could even expect a higher success rate. The steady performance of the target category discovery provided initial support to the final retrieval results, and partly addressed the aforementioned phenomenon regarding CV system performance.
Prior-Posterior Probability Spread. It is essential for us to inspect how the 2 side probabilities (the CV side probabilities were transformed into sample information)
react for the sake of better understanding the BHCR framework. Here, we provide an analysis with 3 core concepts and illuminating examples.
AN US
515
For an image Ii in an RSVP sequence I1 , I2 , ..., In of n images, the 3 concepts are stated as follows: (1) The fact that Ii is classified to the target category by the CV system tends to promote the possibility of it to be finally recognized as a target image; (2) Although (1) holds, if the CV module cannot give strong enough support in terms 520
of the classification probability, stronger support from the EEG decoding module is
M
required to finally recognize Ii as a target image. (3) The fact that Ii had not been classified into the target category tends to strongly diminish the possibility of Ii to be
ED
finally recognized as a target image.
Examples of these concepts are drawn in Figures 7a and 7b. To facilitate the illustrations, we assumed a probability threshold of 0.5, i.e., an image would be finally recognized as a target image if its posterior probability was greater than 0.5. Figure 7a depicts the condition in which an image was classified to the target category by the CV system. The top curve shows that the posterior probability was larger than the EEG decoding probability when Pi,tc > 0.113, and the lowest curve shows that the posterior probability was smaller than the EEG decoding probability when Pi,tc < 0.113 (we set Pi,tc = 0.09 there). Figure 7b depicts the condition in which an image was classified into another category (not the target category) by the CV system. Here, we chose large probabilities below 0.5 to illustrate the third concept.
Notice that the above threshold satisfies 0.113 = 1 − 0.887. This is because we applied a scaling technique to the CV probabilities in Section 3.2 with a scale parameter r = 0.887, the average classification accuracy. Furthermore, the phenomenon we described when discussing CV performance can also be partly ascribed to this scaling technique, since the CV estimated probabilities were usually larger than 0.113.
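To make the origin of this threshold concrete, the sketch below combines the two probabilities with the standard odds-product rule and assumes, by analogy with Step 1 of the propagation scheme in Section 6.1, that the Section 3.2 scaling takes the same Bayesian-correction form with r in place of PE. This is our illustrative reading, not a formula quoted from the paper; under these assumptions the crossover falls exactly at 1 − r = 0.113, matching Figure 7a.

```python
def scale_cv(p_cv, r=0.887):
    # Assumed form of the Section 3.2 scaling (illustration only): a Bayesian
    # correction of the CV probability by the average CV accuracy r.
    return p_cv * r / (p_cv * r + (1.0 - p_cv) * (1.0 - r))

def posterior(p_eeg, p_cv, r=0.887):
    # Odds-product combination of the EEG prior with the scaled CV evidence.
    q = scale_cv(p_cv, r)
    return p_eeg * q / (p_eeg * q + (1.0 - p_eeg) * (1.0 - q))

# The posterior crosses the EEG probability exactly at p_cv = 1 - r = 0.113:
# below it the image needs stronger EEG support, above it the CV evidence helps.
for p_cv in (0.09, 0.113, 0.14):
    print(f"p_cv={p_cv:<6} posterior={posterior(0.6, p_cv):.3f}")  # EEG prob fixed at 0.6
```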
Figure 7: Illustration of the prior-posterior probability spread. Each panel plots the final evaluated (posterior) probability against the probability evaluated by the EEG decoding module, with one curve per CV probability (CVProb). (a) An image classified to the target category by the CV system: when the CV probability was larger than 0.113 (e.g., CVProb = 0.14), its chance of being finally recognized as a target image was promoted; when the CV probability was smaller than 0.113 (e.g., CVProb = 0.09), a larger EEG decoding probability was required. (b) An image not classified to the target category by the CV system (CVProb = 0.4, 0.45, 0.49): its chance of being finally recognized as a target image was strongly diminished.
Global Robustness. Inspecting the whole procedure of the BHCR framework, we note that the EEG decoding module follows EEG signal collection; the target category discovery for the CV system then proceeds with the help of the EEG decoding results, and the prior-posterior probability spread follows. With the above analysis of the relevant components of the BHCR framework, we concluded that all components of the framework performed steadily. Thus, from a global view, the BHCR framework was robust whenever the EEG signals of a subject correctly captured information about the retrieval task. Since the latter condition is generally assumed by common BCI applications, this further supports the global robustness of the BHCR framework.
6. A Propagation Scheme & A One-shot Image Database Retrieval Scheme

6.1. A Probability Propagation Scheme

Although the CV system performed well with respect to accuracy, a portion of the images remained wrongly classified. To address this problem, we designed a probability propagation scheme that improves the CV system by amending the CV estimated probabilities with the EEG decoding probabilities across test RSVP sequences.

For an image, the EEG estimated probability of it being a target image is used to amend its CV estimated probability. Since the EEG decoding module sometimes recognizes false alarms, we cannot ensure that the amending process succeeds in a single pass. Instead, we assume a sequential amending process for each image in the image database.
Suppose that an image IoDi had been recognized as a target image in n test RSVP sequences, that the EEG interest scores of IoDi among those sequences were Pe,1, Pe,2, ..., Pe,n, and that the initial CV estimated probabilities of IoDi were P1,i, P2,i, ..., P257,i. The amending process worked as follows (a code sketch follows the steps):

Step 1. For each j ∈ {1, 2, ..., n}, calculate

    P′e,j = (Pe,j · PE) / (Pe,j · PE + (1 − Pe,j) · (1 − PE)),

where PE is the precision of the EEG decoding module;

Step 2. For each j ∈ {1, 2, ..., n}, if image IoDi was recognized to the c-th category in the j-th sequence, update the probability Pc,i = Pc,i + P′e,j;

Step 3. Rescale the probabilities P1,i, P2,i, ..., P257,i to a sum of 1.
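As a concrete reading of Steps 1-3, here is a minimal Python sketch of one round of the amending process for a single image. The function name, array layout, and NumPy usage are our own illustration, not taken from the paper's implementation.

```python
import numpy as np

def amend_cv_probs(cv_probs, eeg_scores, discovered_cats, precision_e):
    """One round of the amending process for a single image (a sketch).

    cv_probs        -- length-257 array P_{1,i}..P_{257,i} of CV category probabilities
    eeg_scores      -- EEG interest scores P_{e,1}..P_{e,n} from the n sequences in
                       which the image was recognized as a target
    discovered_cats -- index of the target category discovered in each such sequence
    precision_e     -- P_E, the precision of the EEG decoding module
    """
    probs = np.asarray(cv_probs, dtype=float).copy()
    for p_e, cat in zip(eeg_scores, discovered_cats):
        # Step 1: correct the EEG interest score by the decoder's precision
        p_corr = p_e * precision_e / (p_e * precision_e + (1 - p_e) * (1 - precision_e))
        # Step 2: accumulate the corrected score on the discovered category
        probs[cat] += p_corr
    # Step 3: rescale so the 257 category probabilities sum to 1
    return probs / probs.sum()
```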
Since the possibility of an EEG recognized image being a true target was about 9.11 (= 6.38/0.70) times the possibility of it being a false alarm, and the target category discovery process worked steadily, we can expect that in Step 2 the probabilities accumulate on the correct category, so that the updated CV probabilities will be close to reality.
We conducted a simulation of the propagation process and obtained encouraging figures. In the simulation, we examined how many wrongly classified images could be corrected to the right category under different numbers of propagations. The simulation was conducted in 4 steps (a code sketch follows the list):
(1) Make a normal distribution hypothesis for the distribution of the EEG decoding probabilities and apply hypothesis testing;

(2) For n propagations, generate n · Nwc probabilities Pe,i,j, i = 1, 2, ..., n, j = 1, 2, ..., Nwc, as simulated EEG decoding probabilities according to the hypothesis (Nwc = 3563 is the number of wrongly classified images);

(3) For each j ∈ {1, 2, ..., Nwc} and its corresponding image IoWCj, simulate its discovered target categories n times, keeping the right-wrong category ratio consistent with the true-false ratio of the EEG recognized images (the ratio equals 9.11), and ensure that the wrongly discovered categories are randomly selected from the remaining 256 categories;

(4) Apply the proposed propagation scheme to all the wrongly classified images.
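A minimal Python sketch of steps (2)-(4) is given below. The distribution parameters (mu, sd), the EEG precision, and the initial probability mass on the wrong category are placeholders rather than the fitted quantities used in the paper, so its output will not reproduce Table 7 exactly.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
N_WC, N_CATS, TRUE_FALSE_RATIO = 3563, 257, 9.11

def corrected_percentage(n_prop, mu=0.8, sd=0.1, precision_e=0.9, wrong_mass=0.5):
    """Sketch of simulation steps (2)-(4) with placeholder parameters."""
    p_right = TRUE_FALSE_RATIO / (TRUE_FALSE_RATIO + 1.0)  # step (3): right-wrong ratio
    corrected = 0
    for _ in range(N_WC):
        probs = np.zeros(N_CATS)
        probs[0] = wrong_mass        # index 0 plays the initially wrong CV category
        for _ in range(n_prop):
            # step (2): a simulated EEG decoding probability, corrected as in Step 1
            p_e = float(np.clip(rng.normal(mu, sd), 1e-3, 1 - 1e-3))
            p_e = p_e * precision_e / (p_e * precision_e + (1 - p_e) * (1 - precision_e))
            # step (3): index 1 plays the true category; misses land elsewhere
            cat = 1 if rng.random() < p_right else int(rng.integers(2, N_CATS))
            probs[cat] += p_e        # step (4): the propagation update
        corrected += int(probs.argmax() == 1)
    return 100.0 * corrected / N_WC

print(f"corrected after 1 propagation: {corrected_percentage(1):.1f}%")
```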
We ran the simulation process 10 times to obtain figures for the propagation scheme (see Table 7). The results show that 74.5% (sd = 0.28%, N = 10) of the wrongly classified images were corrected to the right categories with 1 propagation, and the performance improved as the number of propagations increased; all wrongly classified images were corrected to the right category when the number of propagations reached 7.

Table 7: Performance of the propagation scheme (mean corrected percentage over 10 runs; standard deviation in parentheses).

    Propagation Times    Corrected Percentage
    1                    74.48% (0.28%)
    2                    91.44% (0.29%)
    3                    98.44% (0.21%)
    4                    99.75% (0.07%)
    5                    99.95% (0.05%)
    6                    99.99% (0.01%)
    7                    100.00% (0.00%)

6.2. A One-shot Image Database Retrieval Scheme

Since the CV system contained classification information for the whole image dataset, and the target category discovery offered sound results, it was natural for us to conduct database retrieval after the target image retrieval among RSVP presented images. Here we provide a one-shot image database retrieval scheme and illustrate its performance.
Table 8: Performance of the proposed database retrieval scheme.

    Precision    Test Sequences     All Categories
    K = 50       100.00% (0.00%)    100.00% (0.00%)
    K = 80       97.45% (5.30%)     97.69% (4.93%)
    K = |C|      94.39% (5.10%)     96.86% (4.87%)
Suppose a test RSVP sequence with images I1, I2, ..., I96 had been presented to the subject, and the BHCR framework returned the target category C and a set of recognized target images IT = {Ii | Ii is recognized as a target image}. A set RD of K images was expected to be returned as the image database retrieval result. The one-shot retrieval process worked as follows (see the sketch after the steps):

Step 1. The set IT forms the first part of RD;

Step 2. Sort all images in the image database (except those in IT) in decreasing order w.r.t. their CV estimated probabilities of belonging to category C;

Step 3. The rest of RD consists of the top K − |IT| images of the sorted list derived from Step 2.
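The steps translate directly into code. The following Python sketch is illustrative only; the function name and data layout (CV probabilities held as a NumPy matrix) are our assumptions.

```python
import numpy as np

def one_shot_retrieve(recognized_ids, cv_probs, target_cat, k):
    """One-shot database retrieval (a sketch).

    recognized_ids -- indices of the images in IT, recognized as targets by BHCR
    cv_probs       -- (num_images, 257) matrix of CV category probabilities
    target_cat     -- index of the discovered target category C
    k              -- desired size K of the retrieval result RD
    """
    result = list(recognized_ids)                  # Step 1: IT forms the first part
    taken = set(recognized_ids)
    rest = [i for i in range(cv_probs.shape[0]) if i not in taken]
    rest.sort(key=lambda i: cv_probs[i, target_cat], reverse=True)  # Step 2
    result.extend(rest[:max(0, k - len(result))])  # Step 3: top K - |IT| of the rest
    return result
```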
The performance of the image database retrieval scheme was measured in 2 ways: (1) conducting the scheme for all test RSVP sequences in our experiments; (2) inspecting the retrieval performance for all 257 categories with simulated test sequences. Since the minimum number of images in a category was 80, we set K = 50 and K = 80 to observe the performance. In addition, we also set K to the exact number of images in each category (K = |C|). Table 8 shows that the one-shot retrieval scheme achieved retrieval precisions higher than 90% in all the inspected settings.
7. Discussion

7.1. Practicability of the BHCR Framework

As a real-time BCI application, the practicability of the BHCR framework is an important concern. Since we have already demonstrated the retrieval performance of the BHCR framework, what remains to be explained is its time efficiency.
Recall that, for a subject, the following requirements must be met before the BHCR framework starts operating: i) the subject-specific EEG decoding module is ready; ii) the CV system is ready; iii) the BBCI module is ready for its calculations. Keeping these requirements in mind, we can give an efficiency analysis here.
The operation of the BHCR framework for an RSVP sequence consists of: i) estimating the EEG interest scores for the presented images; ii) extracting a probability matrix for the sequence from the CV system; iii) running the BBCI module to obtain the retrieval result. Among the 3 steps, the first is common to all existing real-time BCIs, and the second is an indexed read from local storage with negligible time consumption. Thus the efficiency of the BBCI module remains the decisive factor in the practicability of the BHCR framework.
Suppose an RSVP sequence consists of n images. The BBCI operation can be divided into several parts: i) the target category discovery procedure; ii) the matrix-to-array procedure; iii) the Bayesian model for each image. The total calculation complexity of all the above parts is O(n), since the number of operations per part is fixed (or has a constant upper bound) once the application setting (e.g., the category number K and the number of randomized trials N for each image) is settled.
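Spelled out, and assuming each per-image Bayesian model runs in constant time once K and N are fixed (our reading of the analysis above), the breakdown is:

```latex
T(n) = \underbrace{T_{\text{discovery}}(n)}_{O(n)}
     + \underbrace{T_{\text{matrix-to-array}}(n)}_{O(n)}
     + \sum_{i=1}^{n} \underbrace{T_{\text{Bayes}}(I_i)}_{O(1)}
     = O(n).
```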
With the above efficiency analysis, we conclude that the proposed framework meets the requirements of a real-time BCI application. In our experiments, we found no significant difference between the running time of the BHCR framework and that of an EEG-only retrieval mechanism, which supports this conclusion.

7.2. Comparison with Previous Works

Since the BHCR framework focuses on target image retrieval in RSVP sequences, we first compare our work to previous works on the same task. In fact, we have already conducted such a comparison with respect to EEG decoding methodology in Section 4.3 and Section 5.3. Firstly, the comparison of feature extraction methods in Section 4.3 took into consideration the classical unsupervised method PCA [31, 32] and the commonly used supervised method LDA [33, 1, 9]. Secondly, the candidate classification algorithms we inspected include SVM [15] and LR [1], which were frequently used in previous works, as well as algorithms such as AdaBoost [16], bagging [17], ANN [18], and RF [19], which have been used less often in similar tasks. In light of the overview paper [7], feature extraction and classification should not be regarded as isolated processes, so we conducted the comparison on combinations of the two. For example, the combination of LDA and LR in our comparison employed the same methodology as the paper [22]. Although some other combinations may have no corresponding existing works, our comparison still provides sound experimental references. Note that deep learning techniques have recently been employed to solve the EEG-based classification problem [34, 35]; it is worthwhile to compare against them in future work.
Besides the EEG decoding methodology, we can also compare our work to previous works that employ extra information to enhance the performance of retrieving target images in RSVP sequences. As mentioned in Section 1, iterative processes have been proposed in [1, 8] for RSVP target image retrieval so that a good initial set of target images is sent for further image database retrieval. These iterative processes generally require extra manual labor, such as viewing additional RSVP images. Other ways to utilize extra information have been explored only to a limited extent. For example, Marathe and his collaborators [36] introduced a measure of confidence to filter non-target images, but they simply corrected the filtered images by human labeling. Compared with such labor-intensive methods, our framework provides an approach that improves retrieval performance without extra manual labor.
7.3. Potential of Future BCIs

Reduce manual labor for future retrieval tasks. The BHCR framework was demonstrated to be an attractive mode of brain-computer interaction. By fully utilizing the information provided by the CV system, the BHCR framework can recognize more true targets and fewer false alarms in RSVP sequences while requiring no additional manual labor. The sound target category discovery performance shown in Section 5.3 implies that we no longer need to present extra images to the subject for a category guess, which plays an important role in the closed-loop C3Vision system [1]. In addition, the recognized images covered almost all the target images. These observations suggest that, based on the BHCR framework, future systems for large database retrieval can be expected to be free of multiple presentations, or to reduce extra manual labor, while achieving better retrieval results.
Refine the CV system with HV insights. In Section 6.1, a simple propagation method was proposed for improving the CV system with HV insights. For each image, the propagation process is efficient, but it requires many RSVP sequences to correct all the wrongly classified images to the right category. A promising way to improve the solution is to take image similarities into consideration. If a similarity-based CV component [1, 8, 24] is introduced, it may also help improve the database retrieval performance.
Employ other EEG decoding techniques. Although feature extraction techniques and classification methods are commonly used in EEG-based BCI applications, other techniques can also be used to decode EEG signals. Channel selection methods [37] for dimension reduction and feature learning techniques [38, 39, 40] aimed at denoising or filtering can be used in a pre-processing stage for EEG signals. As for the recognition stage, projective dictionary pair learning [41] and mask-based methods [42] can be used for classification. Since EEG signals can be treated as high-dimensional tensors, techniques such as nonnegative tensor factorization [43] and other tensor processing methods [44] may also be taken into consideration.

Further application situations. We suggest that the BHCR framework can be applied to a variety of future recognition tasks and to some complex but realistic situations, for example, when we fail to keep the user's brain activity in a steady mode, or when the images presented to the user are messy and dim [45]. We also believe that a large design space remains to be explored in constructing different efficient BCIs, whether based on our framework or not.
8. Conclusion

In this paper, we presented a BCI framework, BHCR, based on a BBCI method for target image retrieval in RSVP sequences. By fully taking advantage of CV and HV insights, the BHCR framework achieved better retrieval performance than the EEG-only mechanism. We also compared the BHCR framework to previous works in several ways to reveal its advantages. To construct a well-suited EEG decoding module for BHCR, we conducted a comparison of classification algorithms, which indicated that RF outperformed the other candidate classification algorithms in the EEG decoding task. Investigating the image database retrieval task more deeply, we provided a probability propagation scheme and a one-shot image database retrieval scheme. Although there are still many interesting work directions around the BHCR framework, and some of its aspects could be improved in the future, this study suggests that the proposed framework could be useful for assisting individuals in handling target image retrieval tasks in RSVP sequences.

Acknowledgements
The authors would like to thank Dr. Linyuan Wang and Vice Prof. Li Tong from Zhengzhou Information Science and Technology Institute for their invaluable expert advice, which helped bring this paper to completion. We thank LetPub for linguistic assistance with this manuscript.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
References
[1] E. A. Pohlmeyer, J. Wang, D. C. Jangraw, B. Lou, S.-F. Chang, P. Sajda, Closing the loop in cortically-coupled computer vision: a brain-computer interface for searching image databases, Journal of Neural Engineering 8 (3) (2011) 036025.

[2] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[3] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, T. Darrell, Caffe: Convolutional architecture for fast feature embedding, in: Proceedings of the ACM International Conference on Multimedia, ACM, 2014, pp. 675–678.

[4] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556.

[5] J. R. Wolpaw, N. Birbaumer, W. J. Heetderks, D. J. McFarland, P. H. Peckham, G. Schalk, E. Donchin, L. A. Quatrano, C. J. Robinson, T. M. Vaughan, et al., Brain-computer interface technology: a review of the first international meeting, IEEE Transactions on Rehabilitation Engineering 8 (2) (2000) 164–173.

[6] P. Sajda, A. Gerson, L. Parra, High-throughput image search via single-trial event detection in a rapid serial visual presentation task, in: Neural Engineering, 2003. Conference Proceedings. First International IEEE EMBS Conference on, IEEE, 2003, pp. 7–10.

[7] J. Mak, Y. Arbel, J. Minett, L. McCane, B. Yuksel, D. Ryan, D. Thompson, L. Bianchi, D. Erdogmus, Optimizing the P300-based brain–computer interface: current status, limitations and future directions, Journal of Neural Engineering 8 (2) (2011) 025003.

[8] M. Ušćumlić, R. Chavarriaga, J. d. R. Millán, An iterative framework for EEG-based image search: robust retrieval with weak classifiers, PLoS ONE 8 (8) (2013) e72018.

[9] L. C. Parra, C. D. Spence, A. D. Gerson, P. Sajda, Recipes for the linear analysis of EEG, NeuroImage 28 (2) (2005) 326–341.

[10] A. D. Gerson, L. C. Parra, P. Sajda, Cortically coupled computer vision for rapid image search, IEEE Transactions on Neural Systems and Rehabilitation Engineering 14 (2) (2006) 174–179.

[11] N. Bigdely-Shamlo, A. Vankov, R. R. Ramirez, S. Makeig, Brain activity-based image classification from rapid serial visual presentation, IEEE Transactions on Neural Systems and Rehabilitation Engineering 16 (5) (2008) 432–441.

[12] M. Dyrholm, L. C. Parra, Smooth bilinear classification of EEG, in: Engineering in Medicine and Biology Society, 2006. EMBS'06. 28th Annual International Conference of the IEEE, IEEE, 2006, pp. 4249–4252.

[13] M. Dyrholm, C. Christoforou, L. C. Parra, Bilinear discriminant component analysis, The Journal of Machine Learning Research 8 (2007) 1097–1111.

[14] L. C. Parra, C. Christoforou, A. D. Gerson, M. Dyrholm, A. Luo, M. Wagner, M. Philiastides, P. Sajda, Spatiotemporal linear decoding of brain state, IEEE Signal Processing Magazine 25 (1) (2008) 107–115.

[15] S. Mathan, S. Whitlow, D. Erdogmus, M. Pavel, P. Ververs, M. Dorneich, Neurophysiologically driven image triage: a pilot study, in: CHI'06 Extended Abstracts on Human Factors in Computing Systems, ACM, 2006, pp. 1085–1090.

[16] S. Xiao, B. Cai, L. Jiang, Y. Wang, W. Chen, X. Zheng, ERP component analysis for rapid image searching in finer categories, in: 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2013, pp. 3089–3092.

[17] D. Izzo, M. Rucinski, C. Ampatzis, E. Martin, On the EEG footprint of image saliency.

[18] E. Haselsteiner, G. Pfurtscheller, Using time-dependent neural networks for EEG classification, IEEE Transactions on Rehabilitation Engineering 8 (4) (2000) 457–463.

[19] F. Akram, S. M. Han, T.-S. Kim, An efficient word typing P300-BCI system using a modified T9 interface and random forest classifier, Computers in Biology and Medicine 56 (2015) 30–36.

[20] S. Sun, C. Zhang, D. Zhang, An experimental evaluation of ensemble methods for EEG signal classification, Pattern Recognition Letters 28 (15) (2007) 2157–2163.

[21] J. Wang, E. Pohlmeyer, B. Hanna, Y.-G. Jiang, P. Sajda, S.-F. Chang, Brain state decoding for rapid image retrieval, in: Proceedings of the 17th ACM International Conference on Multimedia, ACM, 2009, pp. 945–954.

[22] P. Sajda, E. Pohlmeyer, J. Wang, L. C. Parra, C. Christoforou, J. Dmochowski, B. Hanna, C. Bahlmann, M. K. Singh, S.-F. Chang, In a blink of an eye and a switch of a transistor: cortically coupled computer vision, Proceedings of the IEEE 98 (3) (2010) 462–478.

[23] Y. Wang, L. Jiang, B. Cai, Y. Wang, S. Zhang, X. Zheng, A closed-loop system for rapid face retrieval by combining EEG and computer vision, in: Neural Engineering (NER), 2015 7th International IEEE/EMBS Conference on, IEEE, 2015, pp. 130–133.

[24] Y. Wang, L. Jiang, Y. Wang, B. Cai, Y. Wang, W. Chen, S. Zhang, X. Zheng, An iterative approach for EEG-based rapid face search: A refined retrieval by brain computer interfaces, IEEE Transactions on Autonomous Mental Development 7 (3) (2015) 211–222.

[25] G. Griffin, A. Holub, P. Perona, Caltech-256 object category dataset.

[26] M. C. Potter, E. I. Levy, Recognition memory for a rapid sequence of pictures, Journal of Experimental Psychology 81 (1) (1969) 10.

[27] E. Alfaro, M. Gámez, N. García, adabag: An R package for classification with boosting and bagging, Journal of Statistical Software 54 (2) (2013) 1–35.

[28] F. Kemp, Modern applied statistics with S, Journal of the Royal Statistical Society: Series D (The Statistician) 52 (4) (2003) 704–705.

[29] A. Liaw, M. Wiener, Classification and regression by randomForest, R News 2 (3) (2002) 18–22.

[30] E. Dimitriadou, K. Hornik, F. Leisch, D. Meyer, A. Weingessel, Misc functions of the Department of Statistics (e1071), TU Wien, R package 1 (2008) 5–24.

[31] K. C. Squires, E. Donchin, R. I. Herning, G. McCarthy, On the influence of task relevance and stimulus probability on event-related-potential components, Electroencephalography and Clinical Neurophysiology 42 (1) (1977) 1–14.

[32] E. Bullmore, S. Rabe-Hesketh, R. Morris, S. Williams, L. Gregory, J. Gray, M. Brammer, Functional magnetic resonance image analysis of a large-scale neurocognitive network, NeuroImage 4 (1) (1996) 16–33.

[33] Z. Mao, V. Lawhern, L. M. Merino, K. Ball, L. Deng, B. J. Lance, K. Robbins, Y. Huang, Classification of non-time-locked rapid serial visual presentation events for brain-computer interaction using deep learning, in: Signal and Information Processing (ChinaSIP), 2014 IEEE China Summit & International Conference on, IEEE, 2014, pp. 520–524.

[34] Y. Ren, Y. Wu, Convolutional deep belief networks for feature extraction of EEG signal, in: 2014 International Joint Conference on Neural Networks (IJCNN), IEEE, 2014, pp. 2850–2853.

[35] S. Ding, N. Zhang, X. Xu, L. Guo, J. Zhang, Deep extreme learning machine and its application in EEG classification, Mathematical Problems in Engineering 2015.

[36] A. R. Marathe, A. J. Ries, V. J. Lawhern, B. J. Lance, J. Touryan, K. McDowell, H. Cecotti, The effect of target and non-target similarity on neural classification performance: a boost from confidence, Frontiers in Neuroscience 9.

[37] Z. Qiu, J. Jin, H.-K. Lam, Y. Zhang, X. Wang, A. Cichocki, Improved SFFS method for channel selection in motor imagery based BCI, Neurocomputing.

[38] J. Li, Z. Struzik, L. Zhang, A. Cichocki, Feature learning from incomplete EEG with denoising autoencoder, Neurocomputing 165 (2015) 23–31.

[39] Y. Zhang, G. Zhou, J. Jin, X. Wang, A. Cichocki, Optimizing spatial patterns with sparse filter bands for motor-imagery based brain–computer interface, Journal of Neuroscience Methods 255 (2015) 85–91.

[40] K. Sadatnejad, S. S. Ghidary, Kernel learning over the manifold of symmetric positive definite matrices for dimensionality reduction in a BCI application, Neurocomputing 179 (2016) 152–160.

[41] R. Ameri, A. Pouyan, V. Abolghasemi, Projective dictionary pair learning for EEG signal classification in brain computer interface applications, Neurocomputing 218 (2016) 382–389.

[42] J. Li, Y. Wang, L. Zhang, A. Cichocki, T.-P. Jung, Decoding EEG in cognitive tasks with time-frequency and connectivity masks.

[43] Y. Zhang, G. Zhou, Q. Zhao, A. Cichocki, X. Wang, Fast nonnegative tensor factorization based on accelerated proximal gradient and low-rank approximation, Neurocomputing 198 (2016) 148–154.

[44] Y. Zhang, Q. Zhao, G. Zhou, J. Jin, X. Wang, A. Cichocki, Removal of EEG artifacts for BCI applications using fully Bayesian tensor completion, in: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2016, pp. 819–823.

[45] M. Ušćumlić, B. Blankertz, Active visual search in non-stationary scenes: coping with temporal variability and uncertainty, Journal of Neural Engineering 13 (1) (2016) 016015.
Biography of the Authors

Liangtao Huang

Liangtao Huang received his B.S. degree in applied mathematics in the summer of 2014 from the Information Engineering University, Zhengzhou, China. He is now a Master's degree candidate in applied mathematics at the National Digital Switching System Engineering & Technological R&D Center (NDSC), China. His research interests are in statistics, machine learning and their applications.

Yaqun Zhao

Yaqun Zhao was born in 1961. She received her B.S., M.S. and Ph.D. degrees in mathematics from Zhengzhou Information Science and Technology Institute, China. She is currently a professor with the National Digital Switching System Engineering & Technological R&D Center in Zhengzhou, China. Her research interests are in statistics, applied probability and data analysis.

Ying Zeng

Ying Zeng was born in 1983. She received the B.S., M.S., and Ph.D. degrees from the Zhengzhou Information Science and Technology Institute in 2004, 2007, and 2011, respectively. She is currently a lecturer with the Zhengzhou Information Science and Technology Institute. Her current research interests are in pattern recognition, brain-computer interface (BCI) and information security techniques.

Zhimin Lin

Zhimin Lin was born in 1992. He received the B.S. degree in electrical engineering from Nantong University, Nantong, China, in 2014. He is currently working toward the M.S. degree at the Zhengzhou Information Science and Technology Institute, Zhengzhou. His current research interests include biomedical signal processing and brain-computer interface.