Accepted Manuscript
BHCR: RSVP Target Retrieval BCI Framework Coupling with CNN by A Bayesian Method Liangtao Huang, Yaqun Zhao, Ying Zeng, Zhimin Lin PII: DOI: Reference:
S0925-2312(17)30169-8 10.1016/j.neucom.2017.01.061 NEUCOM 17981
To appear in:
Neurocomputing
Received date: Revised date: Accepted date:
28 August 2016 11 December 2016 22 January 2017
Please cite this article as: Liangtao Huang, Yaqun Zhao, Ying Zeng, Zhimin Lin, BHCR: RSVP Target Retrieval BCI Framework Coupling with CNN by A Bayesian Method, Neurocomputing (2017), doi: 10.1016/j.neucom.2017.01.061
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Highlights • A BCI framework couples HV and CV for target image retrieval is proposed.
CR IP T
• A Bayesian brain-computer interaction method consists the core of our framework.
• We conduct a comparison on classification algorithm for EEG-decoding module.
• A CV system based on convolutional neural network is introduced for our framework.
AC
CE
PT
ED
M
AN US
• Further propose a propagation scheme and an image database retrieval scheme.
1
ACCEPTED MANUSCRIPT
BHCR: RSVP Target Retrieval BCI Framework Coupling with CNN by A Bayesian Method
a State
CR IP T
Liangtao Huanga,b,∗, Yaqun Zhaoa,b , Ying Zengc,d , Zhimin Linc Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, China Digital Switching System Engineering & Technological R&D Center, Zhengzhou, China c Zhengzhou Information Science and Technology Institute, Zhengzhou, China d University of Electronic Science And Technology of China, Chengdu, China
b National
AN US
Abstract
To combine the complementary strengths of human vision (HV) and computer vision (CV) in target image retrieval, we proposed a brain-computer interface framework, Bayesian HV-CV Retrieval (BHCR), which couples HV with CV by a Bayesian method to retrieve target images in rapid serial visual presentation (RSVP) sequences. To construct a well-suited electroencephalogram (EEG) decoding module for BHCR,
M
we conducted a comparative inspection on the selection of classification algorithms, and adopted linear discriminant analysis and random forests as a feature extraction
ED
method and classification algorithm, respectively. We also introduced a CV system based on convolutional neural network (CNN) as a component of BHCR. A Bayesian brain-computer interaction (BBCI) module was carefully designed so that for each pre-
PT
sented image, a Bayesian model that takes HV insight as prior information and CV insights as sample information is built up to present retrieval results. Unlike existing HV-CV coupled works that usually require extra manual labor, BHCR directly en-
CE
hanced retrieval performance with the help of CV insights. As an auxiliary work and a natural extension of BHCR, we then proposed a probability propagation scheme that
AC
incorporates EEG decoding insights to improve the CV system and a one-shot image database retrieval scheme. We demonstrated the effectiveness of BHCR by extensive experiments and simulations on both the entire framework and its sub-components. ∗ Corresponding
author Email addresses:
[email protected] (Liangtao Huang ),
[email protected] (Yaqun Zhao),
[email protected] (Ying Zeng),
[email protected] (Zhimin Lin)
Preprint submitted to Journal of LATEX Templates
February 6, 2017
ACCEPTED MANUSCRIPT
The results showed the following: (1) The performance of BHCR was significantly better than the EEG-only mechanism in both receiver operating characteristic (ROC) and classification aspects; (2) The robustness of BHCR was ensured by its process flow
CR IP T
and the steady performances of its sub-components. Keywords: brain-computer interface (BCI), electroencephalogram (EEG), rapid serial visual presentation (RSVP), target image retrieval, convolutional neural network (CNN), Bayesian brain-computer interaction 2010 MSC: 00-01, 99-00
AN US
1. Introduction
As a classical topic in the domain of image processing, image recognition has attracted much attention from academia to industry for a very long time. In many cases, we want to search a certain kind of image that has specific content within a set of 5
images or even a large image database.
M
Human vision (HV) may be the most effective and robust way to fulfill such tasks since the visual system of human beings can parse scenes and recognize objects easily, despite wide variations of images in scale, lighting, pose, background, etc. However,
10
ED
HV is often thought to be inferior to computer vision (CV) in terms of efficiency, even though HV can finish with an image in as little as a few hundred milliseconds [1].
PT
Recent years have seen great achievements in the field of CV, which aims at solving various image recognition problems with computers. The arrival of a variety of deep learning methods, especially convolutional neural networks (CNNs) [2, 3, 4] for image
CE
classification, has boosted the CV developments and attracted the attention of many
15
researchers. Being focused on automatic imagery processing, state-of-the-art CV systems have impressed people with their great efficiency and effectiveness. Nevertheless,
AC
when the processing task is not well-defined, or the images to be processed contain unstructured information, CV systems usually do not perform as good as HV.
20
Considering their respective advantages and shortcomings, a promising direction
is to combine the complementary strengths of CV and HV. From this perspective, the interaction between humans and computers is the first thing that should be considered.
3
ACCEPTED MANUSCRIPT
Over the past 2 decades, noninvasive electroencephalogram (EEG) recordings have been successfully utilized to quantify human perception to certain kinds of stimuli [5, 6, 7], which makes it possible to perform direct interactions between brain and computer. Unlike traditional human-computer interfaces which employ extra devices,
CR IP T
25
e.g., buttons or a mouse, these so called “brain-computer interfaces (BCIs)” aim at “reading the mind,” can realize brain-actuated control, and help handicapped users with complex and difficult tasks.
In this paper, we focused on a BCI that makes use of a subject’s EEG signals for 30
image retrieval [8]. In such BCI applications, images are presented to the subject by
AN US
a specific paradigm; meanwhile, the subject’s EEG signals are collected and analyzed in real-time (a person attentive to certain stimuli will produce measurable scalp EEG signals, in particular, event-related potentials such as the P300 [1]). The problem we considered here is target image retrieval in rapid serial visual presentation (RSVP) 35
sequences, which aims at searching for images that the subject is interested in among a set of images that are presented to the subject using an RSVP paradigm.
M
Previous works have provided enlightening guidance for this study. The most common strategy of solving the target image retrieval problem in RSVP sequences works
40
ED
in a 2-stage manner: (1) The EEG signals are firstly filtered by a spatiotemporal feature extraction method, or down-sampled by a specific operation; (2) Then, a binary classification approach judges whether an image is a target image. Linear discriminant
PT
analysis (LDA, [6, 9, 10, 1]), principal component analysis (PCA, [9]), and independent components analysis [11] has been demonstrated as practical feature extraction
CE
methods. There are also specifically designed feature extraction methods, such as bi45
linear discriminant analysis [12], bilinear discriminant component analysis [13], etc. Among these methods, LDA is most commonly utilized. For summaries of feature
AC
extraction methods, we refer readers to the literature [14, 7]. In the second retrieving stage, classification algorithms, such as logistic regression (LR) [1] and support vector machine (SVM) [15], are trained and used to detect target images in RSVP sequences.
50
For classification algorithms, we further refer readers to other literature[16, 17, 18, 19] and the experimental comparative research on ensemble algorithms [20]. Recent research take target image retrieval in RSVP sequences as a key step in 4
ACCEPTED MANUSCRIPT
image searching tasks in large-scale image databases [21, 22, 1, 8]. In these works, open-loop processes [21, 22] or closed-loop (iterative) processes [1, 8], which often 55
incorporate extra manual labor, were proposed for RSVP target image retrieval so that
CR IP T
a good initial set of target images was sent to further image database retrieval schemes. Lately, this kind of research has been extended to face retrieval tasks [23, 24]. Since database retrieval schemes are commonly linked to a pre-trained CV system, these works have presented a real sense of combining HV with CV.
However, after investigating these works fully focused on target image retrieval in
60
RSVP sequences and those treating it as a subpart of image database retrieval systems,
AN US
we found that there remains some problems. The first is to discover a better EEG decoding method. Since comparative research in the problem of decoding EEG signals is not adequate, there still exists potential for exploration. Second, we should investigate 65
whether there is any better way of building brain-computer interactions. Specifically, we desire a brain-computer interaction method that directly improves target image retrieval performance in RSVP sequences without extra manual labor, but with the help
M
of CV insights. With such an interaction method, not only would the original problem of target image retrieval be better solved, but also retrieval in large image databases would be more efficient (with less manual labor involved in iterations, or even free of
ED
70
extra manual labor).
The main contribution of this paper can be summarized as follows: (1) We com-
PT
pared EEG decoding performance of 6 classification algorithms on 3 feature sets derived by different feature extraction methods. Comparisons showed that the random forest (RF) outperformed all other candidate algorithms. (2) We proposed an inno-
CE
75
vative BCI framework Bayesian HV-CV Retrieval (BHCR) for target image retrieval in RSVP sequences based on a Bayesian brain-computer interaction (BBCI) method,
AC
which incorporates CV insights provided by a pre-trained CNN CV system to improve retrieval performance. Experiments showed that the retrieval performance of BHCR
80
improved significantly compared to the EEG-only scheme in both the receiver operating characteristic (ROC) and classification aspects. Furthermore, we correctly discovered the category of target images in all test sequences and almost all simulation sequences (99.7%), which indicated the potential of our RSVP retrieval results to be a 5
ACCEPTED MANUSCRIPT
good starting point for following retrieval in a large image database. (3) We designed 85
a probability propagation scheme that improvs the CV system with the help of HV insights, and then presented a one-shot imagery database retrieval scheme as a natural
CR IP T
extension of our work. The remainder of this paper is organized as follows: We first introduce the construction of the BHCR framework in Section 2. Then, we describe the proposed BBCI 90
method in detail in Section 3. The comparison between different classification algorithms is presented in Section 4. Extensive experiments and simulations for demonstrating the BHCR framework are shown in Section 5. The proposed probability prop-
AN US
agation scheme and the one-shot image database retrieval scheme are illustrated in Section 6. In Section 7, we compare our work to previous works, and discuss future 95
topics. Finally, we conclude this paper in Section 8. 2. Framework Construction
M
In this section, we overview the proposed framework, and then introduce the sub-
2.1. Overview
ED
modules.
The proposed BHCR framework consists of 5 primary components: (1) an image
100
database; (2) a pre-trained CV system; (3) an RSVP presenting system with an EEG
PT
signal collecting device; (4) an EEG interest decoding module; and (5) a BBCI module. Figure.1 illustrates the connections between these components, and Algorithm.1
CE
summarizes the entire procedure of finding target images in an RSVP sequence. The retrieval started with a set of images randomly sampled from the image database.
105
These images were then presented to the subject using an RSVP paradigm while the
AC
subject’s EEG signals were collected. According to the collected EEG signals, a subject-specific EEG interest decoding module (created previously by a training phase) estimated the subject’s interest in the presented images. For each image, an estimation
110
result (an interest score) was returned as a probability of it being a target image. After estimation of the EEG interest decoding module, the presented images were not
6
Pre-Trained
Computer Vision: CNN
CR IP T
ACCEPTED MANUSCRIPT
Take Out Classification Probabilities of Presented Images
Random Sample
AN US
Image Database
Interest Scores (probabilities)
EEG Decoding Module
Baysian BrainComputer Interaction
Newly Estimated Probabilities
Final RSVP Target Image Retrieval Results
ED
M
RSVP Presentation
EEG Signal Collecting
Matrix of Classification Probability
Figure 1: BHCR Framework. First, a set of images from the database is randomly sampled and presented to the subject using the RSVP paradigm. The subject’s EEG signals are collected in the meantime, and
PT
(probability-formed) interest scores are assigned to each RSVP image based on the EEG response. These scores are subsequently sent to a BBCI module as an input. The other input of the BBCI module is a matrix in which each row corresponds to an array of classification probabilities of an RSVP image returned by a
CE
pre-trained CV system (based on a CNN). The BBCI module then executes the following. (i) The EEG-based scores are treated as prior information by establishing prior distributions for each image. (ii) The matrix from the CV system is transformed into an array of probabilities after the operation of target category discovery, and the BBCI module recovers sample information by conducting randomized trials for each image. (iii)
AC
With prior distribution and sample information ready, a Bayesian mechanism is employed to estimate the posterior probability of each image. (iv) The newly estimated (posterior) probabilities are gathered as an array, and the final retrieval result is derived by setting a threshold to the posterior array. See Section 3 for more details about the BBCI module.
7
CR IP T
ACCEPTED MANUSCRIPT
Algorithm 1 BHCR Framework Offline processing:
AN US
1. Train the CNN-based CV system for the image database.
2. Train the subject-specific EEG interest decoding module.
3. Set a classification threshold pthr in light of retrieval preferences. Online processing:
0. Suppose an RSVP sequence contains n images, online processing starts here. 1. Randomly sample n images I1 , I2 , ..., In from the image database, tell the subject
M
the category to be focused on. presentation.
ED
2. Display images I1 , I2 , ..., In to the subject via RSVP, collect EEG signals during 3. Decode the collected EEG signals, and return probability formed interest scores IS IS P IS = (pIS 1 , p2 , ..., pn ).
PT
4. Run the Bayesian human-computer interaction, a new (posterior) probability array P P P P = (pP 1 , p2 , ..., pn ) is returned.
AC
CE
5. Get RSVP target image retrieval results by comparing elements of P P to pthr .
8
ACCEPTED MANUSCRIPT
flagged directly. Instead, estimated probabilities were transferred to a BBCI module. Then, the Bayesian module combined EEG decoding results with image classification insights from the pre-trained CV system, providing new probability estimations of the presented images (for details, see Figure 1 and Section 3). Finally, target image re-
CR IP T
115
trieval results were derived by setting a probability threshold and comparing the newly estimated probabilities to it. 2.2. Image Database & CV System
All presented images were sampled from the image database Caltech-256 [25]. This image database consists of 257 categories of images. Among these 257 categories,
AN US
120
there are 256 exact categories in which each image contains a certain kind of object, such as an AK-47 or a butterfly, and 1 clutter category with 827 messy and meaningless images. For each category, at least 80 images were collected from the Internet. The total number of images in the database is 30,607.
To develop classification insights of the image database from a CV perspective,
125
M
a CV system was trained in offline manner. Considering the advanced performance of CNNs in the field of image recognition, we followed the work of the open project
ED
Caffe proposed by Yangqing Jia [3]. Specifically, we derived our CV system based on the BVLC Reference CaffeNet model, with a minor variation from as described in the 130
literature [2], from the caffe model zoo.
PT
Here follows a brief description of our CNN model: • Training and Test Sets. The training set for the CNN model was created by
CE
randomly sampling 70% of the images in each category. As a result, the training set consisted of 21,432 images from 257 categories. The remaining (30%) part
AC
135
of the image database Caltech-256 (9,175 images) formed the test set.
• Model Modification. We modified the number of outputs of the final InnerProd-
uct layer of CaffeNet to 257, but kept the other specifications of the model unchanged. We used a mean file derived from our training set.
• Training Setting. The training was a refined procedure based on the published 140
model snapshot of iteration 310, 000 (”bvlc reference caffenet.caffemodel” and 9
ACCEPTED MANUSCRIPT
”bvlc reference caffenet.solverstate”). For the refining procedure, we modified the initial learning rate to 0.001 and the step size to 20, 000. The iteration number we ran for our model was 100, 000.
CR IP T
• Model Usage. Since we randomly sampled images from the entire image database for experiments, we used the trained CNN model on the entire database as well.
145
This treatment was reasonable considering that the model generalized well to the
test set (Overall accuracies were 0.910, 0.821, 0.884 in training set, test set and the entire image database, respectively. For more details of CV performance, see
150
AN US
Section 5.3). 2.3. Image Presentation & EEG Signal Collecting
Images were presented to the subject using an RSVP paradigm [26]. In our setting, images were shown in blocks of 96 and flashed at 5 Hz (Figure 2). The subjects were seated 75 cm from a monitor, and images were centered on the monitor. A fixation cross appeared just prior to each block to allow the subjects to center their gaze on the images during the RSVP sequences. In each RSVP block, 96 images were sampled
M
155
from the image dataset and presented to the subject. The 96 images corresponded to 8
ED
randomly chosen image categories (1 of which was the target category), with 12 images per category (randomly sampled from each category). Subjects are allowed to take a rest between blocks, and the target category can be varied among blocks.
PT
EEG data were acquired by a g.USBamp system (G.Tec) using 16 electrodes dis-
160
tributed in accordance with the International 10 − 20 system. EEG data were sampled
CE
at 2400 Hz using 200 Hz low-pass and 50 Hz notch filters. The acquired EEG data were divided into epochs, each consisting of 1000 ms EEG data after stimulus onset. Thus, we obtained raw data for every image at the scale of 16 × 2400. The spatial dis-
tribution of EEG activities was assumed to change over time with a temporal resolution
AC
165
of 25 ms. Then, for each image in an RSVP sequence, we applied arithmetic average to its corresponding EEG signals within every 25 ms window. Since this was done in different electrodes (16 electrodes in total), we obtained data at a scale of 16 × 40 for each image. After this preliminary processing of EEG signals, we obtained data at a
10
Time Target
200 ms
96 images
CR IP T
ACCEPTED MANUSCRIPT
Figure 2: RSVP paradigm. Each block was comprised of 1 target category (12 images) and 7 nontarget
170
AN US
categories (84 images). Each image was presented for 200 ms.
scale of 96 × 16 × 40 for each RSVP block. We named this ”averaged-window data
(AWD)” to facilitate later referencing. 2.4. EEG Decoding Module
M
An EEG decoding module was employed in our framework to detect user interest in each image shown during the presentation. In previous works, this was often regarded as the main component for target image retrieval tasks in RSVP sequences.
ED
175
The common strategy to build an EEG decoding module, as stated in Section 1, uses a feature extraction method and classification algorithm. We followed this tackling rou-
PT
tine to design our EEG decoding module, and employed LDA as the feature extraction method and RF as the classification algorithm. The selection of our classification algorithm was suggested by a comparative inspection, which will be shown in detail in
CE
180
Section 4.
Unlike other systems, we did not flag any image directly after the EEG interest
AC
estimation. Conversely, we treated the interest scores as a preliminary result and sent it to the BBCI module (see Section 2.5) for further processing.
185
2.5. BBCI Module For an RSVP sequence, we obtained a probability array P IS of 96 elements from
the EEG decoding module, in which each element indicated the possibility of its cor11
ACCEPTED MANUSCRIPT
Table 1: Target Categories of 25 Training Sequences
S EQUENCE
TARGET
S EQUENCE
TARGET
S EQUENCE
TARGET
C UP
10
PAN
19
P HOTOCOPIER
2
B UTTERFLY
11
H ELICOPTER
20
R AINBOW
3
C AMEL
12
S ANDGLASS
21
S UPERMAN
4
C ENTIPEDE
13
L IZARD
22
S HOES
5
D ISPLAYER
14
K ANGAROO
23
F OOTBALL
6
C ARRIAGE
15
C HEETAH
24
B ICYCLE
7
D OLPHIN
16
OWL
25
U NICORN
8
G LASSES
17
M INOTAUR
9
H ELMET
18
AN US
CR IP T
1
PAPER - CLIP
responding image being a target image. These insights were derived from HV by decoding the EEG signals collected during the RSVP.
For the same RSVP sequence, we also obtained a 96 × 257 scaled matrix P CV =
M
190
CV (pCV ij )96×257 of probabilities from the pre-trained CV system, where pij was the eval-
uated probability of the ith image belonging to the jth category in the image database.
ED
Since target images in an RSVP sequence belonged to the same category, we concluded that the matrix P CV provides insights of target images in a latent way. Furthermore, 195
if the category of target images was explicit, those insights given by P CV would turn
PT
explicit.
To make use of CV insights, we designed a simple scheme to discover the target
CE
category. The BBCI module started with this scheme. The module was aimed at combining both of the insights given by the HV-related components (including the image presenting system, signal collecting system, and EEG decoding module) and the CV system. Particularly, this module took HV insights as prior information, and obtained
AC
200
sample information from CV insights. Thus, we could derive posterior probabilities P P to reach the final retrieval result. In the next section, we will provide further explanations about the BBCI module.
12
ACCEPTED MANUSCRIPT
Table 2: Target Categories of All 35 Test Sequences of the 7 Subjects
S EQUENCE 1
S EQUENCE 2
S EQUENCE 3
S EQUENCE 4
S EQUENCE 5
S UBJECT 1
P EOPLE
H AWKSBILL
C ARTMAN
L AWN - MOWER
C OCKROACH
S UBJECT 2
P HOTOCOPIER
R AINBOW
S CORPION
T ENNIS - SHOES
S OCCER - BALL
S UBJECT 3
F RYING - PAN
G ORILLA
H ELICOPTER
H OURGLASS
L IZARD
S UBJECT 4
F RENCH - HORN
T OURING - BIKE
A IRPLANES
H AMBURGER
L LAMA
S UBJECT 5
M EGAPHONE
H OURGLASS
L EOPARDS
T OMATO
PAPER - SHREDDER
S UBJECT 6
Z EBRA
B IRDBATH
F LOPPY- DISK
M ICROSCOPE
G OAT
S UBJECT 7
K ANGAROO
L EOPARDS
OWL
M INOTAUR
C LIP
AN US
205
CR IP T
S UBJECTS
2.6. Experimental Protocol
For each subject, the experiment consisted of 2 phases: the training phase, and the testing phase. For the training phase, we used a training RSVP sequence of 25 blocks, the target categories which are shown in Table 1. To facilitate later expression, we
210
M
treated the training sequence as 25 single-block sequences. Further operations conducted on the training and test sets will also be described in terms of these single-block
ED
RSVP sequences. For each sequence, subjects were instructed to pay attention to images containing a certain kind of object and an example of target images was shown before the presentation (right before the fixation cross) as preparation. In the training
215
PT
phase, we trained the feature extraction and classification models for the EEG decoding module. During the testing phase, 5 single-block sequences were presented, and the target categories of the test sequences are shown in Table 2. The performance of
CE
the BHCR framework was evaluated with respect to each test sequence of each subject (Section 5.1,5.2). The target categories in the testing phase were different from
AC
the training phase, and thus some images that were target images during the training
220
phrase might have appeared as distractors in the testing phase. Seven subjects participated in the experiment (6 males and 1 female; average age
of 22.5 years with a standard deviation of 1.2 years; and all right-handed). All subjects were students of Zhengzhou University and had no previous training in the task. The subjects had normal or corrected-to-normal vision with no neurological problems, and 13
ACCEPTED MANUSCRIPT
Algorithm 2 Processing the BBCI Module IS IS Input: P IS = (pIS 1 , p2 , . . . , p96 ), CV classification probability matrix M =
(mij )30607×257 , CV classification accuracy r, and an intermediate threshold pithr .
CR IP T
P P Output: Posterior array P P = (pP 1 , p2 , . . . , p96 ).
Begin:
1. Extract classification probabilities of the presented images from M ; form them into a small sequence-dependent matrix P CV = (pij )CV 96×257 .
2. Target Category Discovery. Determine the category of target images based on P CV , P IS , pithr .
AN US
C C 3. Matrix to Array. Transform P CV into an array P C = (pC 1 , p2 , . . . , p96 ), in
which pC i indicates the possibility of the ith image being a target image. 4. Bayesian Model Construction. For the ith image of the sequence, a Bayesian C P model is constructed based on pIS i and pi to yield a posterior probability pi .
End.
were financially compensated for their participation.
M
225
ED
3. Details of the BBCI Module
IS IS Suppose we obtained P IS = (pIS 1 , p2 , . . . , p96 ), an array of interest scores of an
RSVP sequence, from the EEG decoding module. The array P IS would be transferred
230
PT
to the BBCI module, which would then start processing. The processing flow-path of the BBCI module is shown in Algorithm 2. Except for the first step of deriving a small
CE
matrix P CV , we organized this processing into 3 subparts. The designing details of these subparts are described below.
AC
3.1. Target Category Discovery
235
Since P CV contains image classification information derived from the CV system,
we must somehow transform them into information about target images of the test RSVP sequence. Notice that compared with P IS , P CV has an apparently different
form. Therefore, the first concern is to reform it. To guide this transformation, we need to know the category of the image database that target images belong to. 14
ACCEPTED MANUSCRIPT
We named this sub-processing of deriving the target category as Target Category 240
Discovery. It contains 3 steps:
CR IP T
Step 1. Obtain a index set Ind = {i|PiIS ≥ Pithr }. P Step 2. For j ∈ {1, 2, . . . , 257}, calculate Sj = i∈Ind pCV ij .
Step 3. The category corresponding to the Jth column of P CV is recognized as the cat-
egory of target images (target category), where J satisfies SJ = max1≤j≤257 Sj . 245
Remark 1. An explanation. We obtained a coarse target retrieval result with pithr , and
the recognized images were indexed by Ind. Then, we turned to matrix P CV , summed
AN US
the (CV-based) probabilities of recognized images in each column (i.e. with respect to each category), sorted summation results, and recognized the category with the biggest accumulate probability as the target category.
Our design specifications suggest that the performance of Target Category Discov-
250
ery mainly depends on (1) the intermediately recognized images with P IS and pithr ,
M
(2) the classification possibilities of the recognized images assigned by the CV system. To describe a suitable condition, we summarized the demand of the target category
255
ED
discovery with the following 2 assumptions: Assumption 1. Most of the intermediately recognized images are true targets. Assumption 2. In most cases, for an image, the CV system tends to evaluate a larger
PT
value of probability to its exact category.
CE
These 2 assumptions call for as good as possible intermediate retrieval performance and classification performance of the CV system. This also partly addresses the reason
260
for us to conduct method selection for the EEG decoding module in Section 4 and to
AC
employ the state-of-the-art CNN as the CV system (with attempts to meet the assumptions, we reached a category discovery accuracy of 100% among test sequences and 99.7% in a simulation of 1000 sequences, see Section 5.3).
265
Note that the way to choose pithr remains unexplained. In fact, the 2 proposed
indexes in Section 4.2 were designed partly to meet Assumption.1, and we just set the value of pithr by maximizing 1 of the indexes, DT −F (for details, see Section 4.2). 15
ACCEPTED MANUSCRIPT
3.2. Matrix to Array Knowing the target category, we could transform CV classification information into target-related information by reforming P CV into an array similar to P IS . For each image, we used a process that is divided into 2 cases depending on whether
CR IP T
270
the image had been classified to the discovered target category by the CV system:
(1) For i = 1, 2, . . . , 96, if the ith image is classified to the target category, then calculate the possibility of it belonging to the target category as PiC =
CV r×Pi,J CV +(1−r)×(1−P CV ) ; r×Pi,J i,J
(2) For i = 1, 2, . . . , 96, if the ith image is classified to another category, then calculate
CV (1−r)×Pi,J CV +r×(1−P CV ) . (1−r)×Pi,J i,J
AN US
the possibility of it belonging to the target category as PiC =
275
Remark 2. These calculations helped us derive insights of the target image retrieval task from the CV system. We could then treat P C as another array of interest decoding probabilities for target image retrieval by some other experimental treatment. Since P IS and P C had the same pattern, we then tried to combine them by a Bayesian model.
M
280
3.3. Bayesian Model for Each Image
ED
For ith image in an RSVP sequence, we considered the event ”this image is one of user’s interesting images”, which followed a 2-point distribution parameterized by
285
PT
θi . Our proposed Bayesian method enabled us to give a posterior estimation of every θi in an RSVP sequence. As we know, a Bayesian model requires prior information and sample information. But at this point, we have just 2 probabilities for each image.
CE
Thus, we must address these additional problems: (1) Which should be taken as prior information? (2) How can the other be utilized as sample information? Let’s consider the following scenario. Some images are presented to the subject by
RSVP and the subject rates these images in the form of probability (P IS ). However,
AC 290
the subject is worried about missing some of these scores during the recording, and that some may be inaccurate due to carelessness. To eliminate doubts, the subject turns to a computer for help after his rating session. The subject indicates the target category and provides all presented images to the computer. Then, the computer conducts
16
ACCEPTED MANUSCRIPT
295
randomized trails for every image, records the frequency of it to be recognized as a target image, and transforms the frequencies into probabilities (P C ). If some of the above worried situations occurred, the subject receives help from those probabilities: values between P IS and P C for some images.
CR IP T
for example, replacing some values in P IS with values in P C , or calculating average Our Bayesian method was built based on the above idea. We took probabilities in
300
P
IS
as prior information and recovery sample information from probabilities in P C
by re-conducting randomized trials according to these probabilities. Specifically, we
conducted N random trials (parameterized by PiC ) for the ith image, and recorded the
305
AN US
frequency ni of it being recognized as a target image as sample information for the Bayesian model.
For the ith image, the process of the Bayesian model is described as follows:
(1)
M
Step 1. Establish prior distribution for the ith image, 2 · (1 − pIS ), θi ∈ [0, 0.5]; i π(θi ) = 2 · pIS , θi ∈ (0.5, 1]. i
Step 2. Conduct randomized trials that are parameterized by PiC for N times, and
ED
record the frequency ni of it being recognized as a target image. Step 3. Calculate the posterior distribution according to the above results, π(θi |ni ) ∝ h(ni , θi ) = p(ni |θi ) · π(θi ).
PT
310
Step 4. Integral the posterior distribution at the interval (0.5, 1], and the resulting value
CE
pP i is the desired posterior probability.
AC
4. Classification Algorithm Comparison for EEG Decoding Module
315
As stated in Section 1 and Section 3.1, we conducted a comparative inspection
for the selection of classification algorithm here. The discussion of this section is organized as 3 parts: (1) comparison settings of feature sets and candidate classification algorithms; (2) comparison indexes utilized to evaluate the performance of candidate algorithms; (3) the comparison and results.
17
ACCEPTED MANUSCRIPT
4.1. Comparison Settings 320
Feature Sets. To ensure that the selected classification algorithm works well on feature sets derived by feature extraction methods in both unsupervised and supervised condi-
CR IP T
tions, we conducted comparisons on LDA-based feature set LDA(40) and PCA-based feature set PCA(51). Specifically, LDA(40) was acquired by an LDA of AWD data,
which took the filtered data as a linear combination of the data of all 16 electrodes in 325
the same time window, while PCA(51) was acquired by a PCA of AWD data, thus setting the cumulative proportion as 85%. The numbers 40 and 51 in parentheses indicates the dimension of the feature set. Additionally, we treated AWD as a trivial feature set
AN US
and employed it as a baseline for comparison.
Candidate Classification Algorithms. This comparison was conducted using 6 preva330
lent classification algorithms in machine learning communities: adaboost [27], bagging [27], artificial neural network (ANN)[28], RF [29], SVM [30] and LR [1]. 4.2. Comparison Indexes
M
The common measure to evaluate information retrieval performance uses the area under the curve (AUC) value of the ROC curve, but new comparison indexes should be introduced here for the following reasons: (1) to meet Assumption.1 stated in Section
ED
335
3.1; (2) to suit the unbalanced characteristic of the binary classification for RSVP target image retrieval.
PT
We concluded 2 requirements from Assumption.1, which were the basis for our comparison indexes design: (1) recognize true targets as much as possible; (2) recognize false alarms as little as possible. Since detecting the correct target category is an
CE
340
important concern of the BBCI module, we did not take the absolute value of recognized true targets as a direct measurement of the comparison. Instead, we proposed
AC
2 indexes (True-False Difference and Corrected False-True Ratio) to address these requirements.
345
To make sure our comparison result was available for common information retrieval
settings, an AUC-based comparison was conducted (see Section 5.3). As expected, our 2 results were consistent with each other. Thus, the suggestion we provide here is also constructive to common EEG decoding settings. 18
ACCEPTED MANUSCRIPT
Definition 1. True-False Difference DT −F (2)
Definition 2. Corrected False-True Ratio RF/T : RF/T = 7 ·
1 − p00 p01 =7· 1 − p10 p11
CR IP T
DT −F = 12 · p11 − 84 · p01
(3)
In the definitions, p10 indicates the probability that a target image is recognized as 350
a distractor, p00 indicates the probability that a distractor is recognized as a distractor,
AN US
p11 = 1 − p10 , and p01 = 1 − p00 . Here probabilities are approximated by frequencies. Remark 3. The 2 indexes indicate the differences in value and ratio between true targets and false alarms. In the comparison, the desired algorithm should have a large DT −F value and a small RF/T value to meet the above 2 requirements. 355
4.3. Comparison & Results
M
Retrieval performances in terms of DT −F and RF/T are shown in Table 3 and Table 4 (since the performances shared similar pattern among subjects, we show figures of one subject here). All indexes were calculated over an average of RSVP sequences,
360
ED
and intermediate threshold pithr for each algorithm was set to maximize index DT −F . The left half of each table displays the results in the training set, and the right side
PT
corresponds to the test set. For each subject, comparison was conducted with a 10-fold repeat random sub-sampling validation. In each fold, 20 of the 25 training sequences were randomly sampled as the training set and the remaining 5 sequences were taken
CE
as the test set. Performances on both sets were considered to evaluate the candidate
365
classification algorithms. Adaboost and RF were the best fitting models with respect to the training set, across
AC
all feature sets. In fact, DT −F = 12 and RF/T = 0 means that all target images were recognized and no false alarm appeared. Viewing the right-side columns of Table 3 and Table 4, it is obvious that, in most conditions, classification algorithms built upon
370
an LDA feature set outperformed others in both indexes. This observation was consistent our intuition that supervised filters could encode data information better than
19
ACCEPTED MANUSCRIPT
Table 3: Performance of Candidate Classification Algorithms w.r.t. True-False Difference
AWD(16X40)
PCA(51)
LDA(40)
DT −F ( TEST )
ADABOOST
12.00(0.00)
12.00(0.00)
12.00(0.00)
BAGGING
11.20(0.23)
8.01(0.50)
7.13(0.36)
RF
12.00(0.00)
12.00(0.00)
12.00(0.00)
RF
ANN
12.00(0.00)
8.40(0.75)
3.27(0.340)
ANN
SVM
11.94(0.04)
4.63(0.32)
8.36(0.15)
SVM
LR
12.00(0.00)
2.06(0.20)
3.27(0.27)
LR
ADABOOST BAGGING
AWD(16X40)
PCA(51)
LDA(40)
3.92(1.05)
3.08(0.80)
5.06(0.99)
CR IP T
DT −F ( TRAIN )
2.80(0.81)
1.74(0.71)
4.22(0.96)
4.06(0.98)
2.88(0.38)
5.68(1.07)
3.14(1.04)
1.36(0.64)
3.52(1.22)
4.90(0.78)
1.06(0.40)
4.34(1.40)
-2.90(1.05)
1.72(0.69)
3.34(1.10)
PCA(51)
LDA(40)
AWD(16X40)
PCA(51)
LDA(40)
ADABOOST
0.00(0.00)
0.00(0.00)
0.00(0.00)
ADABOOST
0.17(0.11)
0.15(0.12)
0.15(0.06)
BAGGING
0.04(0.12)
0.11(0.05)
0.11(0.03)
BAGGING
0.22(0.13)
0.35(0.12)
0.21(0.11)
RF
0.00(0.00)
0.00(0.00)
0.00(0.00)
RF
0.14(0.11)
0.14(0.10)
0.11(0.05)
ANN
0.00(0.00)
0.15(0.05)
0.29(0.05)
ANN
0.39(0.11)
0.49(0.13)
0.34(0.08)
SVM
0.00(0.00)
0.00(0.00)
0.19(0.02)
SVM
0.17(0.07)
0.04(0.08)
0.28(0.10)
LR
0.00(0.00)
0.33(0.05)
0.29(0.06)
LR
1.60(0.30)
0.41(0.18)
0.32(0.08)
M
AWD(16X40)
RF/T ( TEST )
ED
RF/T ( TRAIN )
AN US
Table 4: Performance of Candidate Classification Algorithms w.r.t. Corrected False-True Ratio
unsupervised filters. The only exception was in SVM, which performed better on the
PT
AWD(16X40) set. This exception may have been due to the use of a sigmoid kernel when constructing the SVM algorithm. This result also suggests a possibility of de375
coding EEG signals with a kernel method, which remains to be inspected. Ultimately,
CE
the best performances were with the LDA feature set, so we focused our attention on the third columns. There, we can see that RF was the best setting choice.
AC
Simple calculations suggested that, by taking LDA as the feature extraction method
and RF as the classification algorithm, the numbers of recognized real targets and false
380
alarms in a test RSVP sequence were expected to be 6 (actually 6.38) and 1 (actually 0.70).
20
ACCEPTED MANUSCRIPT
5. Experimental Results In this section, we will firstly show the performance of the entire BHCR framework
385
CR IP T
and then investigate the robustness of the BHCR framework. 5.1. Performance of BHCR: ROC Asepct
Although the RSVP target image retrieval task result was given by a probability threshold in practice, it is common to evaluate the performance of an information retrieval mechanism by ROC curves and their corresponding AUC values.
To intuitively illustrate the improvement of retrieval performance by combining CV insights under the help of the proposed Bayesian method, a simple example is shown
AN US
390
in Figure 3a. Focus is on 1 randomly selected test RSVP sequence, and consists of 2 ROC curves. The dashed blue curve (AU C = 0.80) depicts the performance of the RSVP target image retrieve mechanism in which only the EEG signals were employed (with LDA as feature extraction method and RF as classification method), and the solid 395
red curve (AU C = 1) depicts the performance of the proposed framework, which
M
took advantage of the CV insights by the BBCI module. The red curve being well above the blue one highlights that the proposed framework outperformed the EEG-
ED
only mechanism in this example.
Figure 3b further illustrates the intuition obtained from Figure 3a by showing all 400
AUC values of the 2 compared mechanisms. In particular, points with a vertical coordi-
PT
nate indicate the performance of the BHCR framework, while a horizontal coordinate indicates that of the EEG-only mechanism. Since all points are located in the upper-
CE
left of the figure, the AUC values along the vertical axis are larger than those along the horizontal axis. We found that the proposed framework outperformed the EEG-only
405
mechanism in all 35 test sequences. The average AUC value of the BHCR framework,
AC
0.987, was 13.1% higher than that of the EEG-only mechanism, 0.873. Furthermore, among all 35 test sequences, 54.3% of cases in which the proposed framework outper-
formed the EEG-only mechanism by 10% in terms of AUC values, and 25.7% of cases
by 20%. The greatest improvement was by 31.5%.
410
Figure 3c shows the mean AUC values of test sequences in the 3 different mechanisms with respect to different subjects. This figure shows that, on average, the BHCR 21
1.00
1.00
CR IP T
ACCEPTED MANUSCRIPT
0.50
ROC Curves
0.25
BHCR
0.90
AUC 1.00 0.85
0.98 0.96
0.80 0.94
EEG−only 0.00
0.92 0.75
0.25
0.50
0.75
1.00
0.75
False Positive Rate
0.80
0.85
0.90
0.95
1.00
0.75
0.50
0.25
0.00
1.00
1
AUC Value of the EEG−only Mechanism
(a)
BHCR
2
1.00
0.9
AUC Value
0.75
0.8
Method 0.7
0.50
0.25
EEG−only
2
M
BHCR
3
4
5
Chance
6
7
Subject
(b)
1
1.0
EEG−only
AN US
0.00
AUC Value
Average AUC Value of Test Sequences
True Positive Rate
0.75
AUC Value of The BCHR Framework
AUC 0.95
(c)
3
4
5
6
7
Methods: BHCR EEG−only Chance
0.00
1
2
3
4
5
Subject
6
7
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
All Test Sequnces of All Subjects
(e)
ED
(d)
1 2 3 4 5
Figure 3: Performance of the BHCR framework (ROC Aspect), mainly compared with an EEG-only mechanism and retrieval by chance. (a) A randomly selected test RSVP sequence. The ROC curve of the BHCR
PT
framework is well above the ROC curve of the EEG-only mechanism. (b) AUC values of all 35 test RSVP sequences of the 7 subjects. AUC values are drawn as 2D points, where a vertical coordinate indicates the performance of the BHCR framework and a horizontal coordinate indicates the EEG-only mechanism. The
CE
BHCR framework outperformed the EEG-only mechanism in all cases. (c) Bar chart of average AUC values of each subject with the 3 compared mechanisms. The average AUC value of the BCHRR framework was close to 1 for all 7 subjects, and its corresponding bars were higher than those of the other mechanisms.
AC
(d)Box plot of AUC values of each subject with the BHCR framework and EEG-only mechanism. (e) AUC values of all 35 test sequences of the 7 subjects.
22
ACCEPTED MANUSCRIPT
framework performed better than the EEG-only mechanism for each subject. The comparison for retrieving by chance was aimed at revealing the practical significance of both the BHCR framework and the EEG-only method.
CR IP T
We provide a finer subject-specific comparison between the BHCR framework and
415
the EEG-only mechanism in Figure 3d, in which box plots are drawn. The BHCR
framework performed better than the EEG-only mechanism since (1) their correspond-
ing box plots were well above those of the EEG-only mechanism, and (2) their box plots were thinner than those of the EEG-only mechanism.
Lastly, we conducted a subject-and-sequence specific comparison, which is dis-
420
with those from the above analysis.
AN US
played in Figure 3e, to present greater detail of our results. These results also accord
5.2. Performance of BHCR: Classification Aspect
Since the RSVP procedure we investigated was aimed at finding target images in 425
RSVP sequences, and we actually obtained intermediate EEG-only retrieval results
M
with a temporal probability threshold pithr during the processing of the BBCI module, we then conducted comparisons regarding target image retrieval (binary classification)
ED
performances.
We conducted 3 kinds of experiments to reveal the improvement of our proposed 430
framework in terms of classification performance. We utilized 3 different retrieval
PT
principles: (1) Retrieving by optimizing the indexes proposed in Sec 4.2; (2) Retrieving without false alarm images; (3) Retrieving so that all true targets are hit. Table 5 shows the target image retrieval performance of the EEG-only mechanism,
CE
and Table 6 shows the performance of the BCHR framework. In each table, the num-
435
bers contained in the first 2 rows correspond to how many true targets and false alarms
AC
were recognized from the test RSVP sequence following the first principle. The third rows show the number of recognized true targets in an RSVP sequence in the circumstance that no false alarms appeared, and the fourth rows indicate the number of false alarms in a retrieval process if all true targets were hit. All values are presented as
440
means over 5 test sequences of each subject, with standard deviations. The BCHR framework outperformed the EEG-only mechanism under all 3 retrieval 23
ACCEPTED MANUSCRIPT
Table 5: Target Image Retrieve Performance of EEG-only Mechanism
S UBJECT 1
S UBJECT 2
S UBJECT 3
S UBJECT 4
S UBJECT 5
S UBJECT 6
S UBJECT 7
Ntrue
5.6(1.82)
6.2(2.17)
6.4(1.82)
6.6(2.51)
5.6(2.71)
7.6(1.82)
7.0(1.22)
Nf alse
1.0(1.00)
2.0(2.35)
0.8(1.30)
2.2(2.77)
1.2(1.30)
2.8(2.95)
0.4(0.89)
Ntrue
3.8(0.84)
3.0(1.73)
4.6(2.30)
3.0(1.58)
3.0(2.35)
2.2(1.48)
6.6(1.52)
54.6(19.30)
54.2(22.33)
51.4(30.33)
56.6(22.30)
65.6(18.28)
52.0(22.36)
68.2(8.44)
Nf alse
CR IP T
S UBJECTS
Table 6: Target Image Retrieve Performance of BHCR Framework
S UBJECT 1
S UBJECT 2
S UBJECT 3
S UBJECT 4
S UBJECT 5
S UBJECT 6
S UBJECT 7
Ntrue
11.2(0.84)
11.6(0.55)
11.8(0.45)
11.6(0.55)
11.8(0.45)
11.8(0.45)
11.8(0.45)
Nf alse
0.0(0.00)
0.00(0.00)
0.00(0.00)
1.00(0.00)
0.00(0.00)
0.00(0.00)
1.00(0.00)
Ntrue
11.2(0.84)
11.6(0.55)
11.8(0.45)
10.6(1.14)
11.8(0.45)
11.8(0.45)
11.4(0.89)
Nf alse
17.0(18.14)
4.2(6.57)
0.6(1.34)
21.4(31.30)
16.6(37.12)
4.2(9.39)
17.6(37.12)
M
AN US
S UBJECTS
principles. Actually, by employing the first principle, the average number of true targets selected by the BCHRR framework had increased by 81.3% compared to the EEG-only
445
ED
mechanism, and the false alarm rate decreased by 57.8%. The average number of true targets with respect to the second principle increased by 342.0%, and in terms of the
PT
third principle, the average number of false alarms decreased by 80.4%. Remark 4. From viewing Section 5.1 and 5.2 we concluded that the BHCR framework outperformed the EEG-only mechanism to a considerable degree. Since we em-
CE
phasized the coupling of HV and CV in our framework, one may also consider how
450
to compare the BHCR framework to the CV part. However, there is no direct way to
AC
make such a comparison since the CV system itself does not contain specific retrieval information. Thus, for a real retrieval task, a target category discovery procedure is necessary for utilizing the CV insights to refine the EEG-only mechanism. A possible measure to compare the BHCR framework to the CV system is classification accu-
455
racy. For simplicity, we calculated the average accuracy of all 35 test sequences for the BHCR framework (0.955), and used the average classification accuracy of the CV sys24
ACCEPTED MANUSCRIPT
0.95
0.90
Algorithm adaboost ANN bagging
0.80 LR RF 0.75 SVM
0.70
1
2
3
4
5
6
AN US
Candidate Algorithm
CR IP T
AUC Value
0.85
Figure 4: EEG decoding performance of candidate classification algorithms (box plots of AUC values). Both SVM and RF seemed to have top performances, but an outlier made SVM less satisfactory than RF.
tem (0.887, see Section 5.3). This indicated that the BHCR framework outperformed forced to the CV system. 460
M
the CV system even when a strong assumption (the target category is known) was en-
5.3. Robustness of the BHCR Framework
ED
Performance of EEG Decoding Module. In Section 4.2, we presented 2 indexes for selecting the classification algorithm. We considered to what extent would the
PT
selected algorithm be consistent with the best algorithm suggested by the ROC perspective. Figure 4 shows the AUC performances of all 6 candidate algorithms with the 465
test sequences (we only display the LDA-based results since they had better perfor-
CE
mance from a global perspective). The RF algorithm had the best AUC performance compared to other candidate algorithms, which was consistent with the conclusion we
AC
derived in Section 4.3 (both SVM and RF seemed to have top performances, but an outlier made SVM less satisfactory than RF).
470
CV System Performance. The CV system plays an important role in the entire
framework since the posterior probabilities are derived by combining CV insights with EEG decoding insights. Intuitively, the BHCR framework would return a good retrieval result if the CV system provided image classification information in a proper way. 25
ACCEPTED MANUSCRIPT
25
15
10
5
0 0.6
0.7
0.8
0.9
1.0
AN US
Classification Accuracy
CR IP T
Numbe of Categories
20
Figure 5: Classification accuracy histogram of the CV system. The dashed line indicates the mean value of classification accuracy of all 257 categories, which equals 0.887. Most categories had accuracy values in the interval (0.7, 1) and only 3 had accuracy below 0.700. A density curve is also displayed.
Investigated the classification performance of the CV system would help us obtain a better understanding of the Bayesian interaction mechanism. We plotted a histogram of
M
475
the category number relative to classification accuracy (Figure 5, with a density curve, and a dashed line indicating the average accuracy (0.887, sd = 0.0685, n = 257)).
ED
The majority of these accuracies were located in the interval (0.7, 1), and only 3 were below 0.700 (0.620, 0.653, and 0.660). This indicates that when the target category 480
discovery processes successfully, we could expect 12 × 0.887 = 10.644 true target
PT
EEG decoding probabilities would be amended correctly by CV-side information in
average, more than 12 × 0.700 = 8.400 in most cases, and about 12 × 0.620 = 7.440
CE
in the worst case.
Recall that the average values of the selected true target images of different sub-
485
jects were consistently close to 11 (Section 5.2), which highly agrees with the 10.664
AC
obtained here. There are some subtle reasons for this phenomenon, and we will briefly explain this in the following discussion. Target Category Discovery Performance. As the first operation of the BBCI
module, target category discovery is a pivotal step that decides whether reliable CV
490
insights can be obtained to refine the EEG-only retrieval performance. To ensure the
26
ACCEPTED MANUSCRIPT
7.5
5.0
2.5
0.0
Discover The Category Fail 9 Success
6
3
0 0.0
2.5
5.0
7.5
10.0
0
The Number of Recognized TRUE Targets
3
CR IP T
The Number of Recognized FALSE Targets
The Number of Recognized FALSE Targets
10.0
6
9
The Number of Recognized TRUE Targets
(a)
(b)
AN US
Figure 6: Performance of the target category discovery process. Each point corresponds to an RSVP sequence, and the vertical axis indicates the number of false alarms in the intermediate retrieval result while the horizontal axis indicates the number of true targets. Dashed lines of X = Y are plotted to separate conditions of X > Y and X < Y . (a) Performance in all 35 test sequences. The target category discovery process succeeded in all cases. (b) Performance in 1000 simulation sequences. The process succeeded in
M
997 cases, and succeeded in all cases if we neglected 3 cases of X <= Y .
BHCR framework works steadily, we must carefully inspect the process of target category discovery.
ED
Target category discovery succeeded in all 35 test sequences of all subjects (Figure 6a). In all cases, the numbers of intermediately recognized true targets were larger 495
than the numbers of false alarms. However, with such a simple observation we cannot
PT
conclude that we would see similar success in all future experiments. To evaluate a more realistic performance, we conducted a simulation experiment for
CE
target category discovery. Several steps were involved: (i) Make hypotheses and apply hypothesis testings for the distributions of the true target number and the false alarm
500
number; (ii) Generate 1000 simulated RSVP sequences according to our experimental
AC
protocol and hypotheses; (iii) Run target category discovery process for the simulated sequences, and obtain binary outcomes indicating whether the process succeeded. We organized simulation results in the same way as real sequences (Figure 6b).
The simulation revealed that target category discovery failed in some circumstances,
505
e.g., when the false alarm number was larger than the true target number. However,
27
ACCEPTED MANUSCRIPT
none failed when the true number was larger than the false number. Success rate of the simulation was 99.7%, which is considerably high. Note that the false alarm number showed a tendency to be smaller than the true numbers in practice, and therefore, we
510
CR IP T
could even expect a higher success rate. The steady performance of the target category discovery provided initial support to the final retrieval results, and partly addressed the aforementioned phenomenon regarding CV system performance.
Prior-Posterior Probability Spread. It is essential for us to inspect how the 2 side probabilities (the CV side probabilities were transformed into sample information)
react for the sake of better understanding the BHCR framework. Here, we provide an analysis with 3 core concepts and illuminating examples.
AN US
515
For an image Ii in an RSVP sequence I1 , I2 , ..., In of n images, the 3 concepts are stated as follows: (1) The fact that Ii is classified to the target category by the CV system tends to promote the possibility of it to be finally recognized as a target image; (2) Although (1) holds, if the CV module cannot give strong enough support in terms 520
of the classification probability, stronger support from the EEG decoding module is
M
required to finally recognize Ii as a target image. (3) The fact that Ii had not been classified into the target category tends to strongly diminish the possibility of Ii to be
ED
finally recognized as a target image.
Examples of these concepts are drawn in Figures 7a and 7b. To facilitate the illustrations, we assumed a probability threshold of 0.5, i.e., an image would be finally recognized as a target image if its posterior probability was greater than 0.5. Figure 7a depicts the condition in which an image was classified to the target category by the CV system. The top curve shows that the posterior probability was larger than the EEG decoding probability when Pi,tc > 0.113, and the lowest curve shows that the posterior probability was smaller than the EEG decoding probability when Pi,tc < 0.113 (we set Pi,tc = 0.09 there). Figure 7b depicts the condition in which an image was classified into another category (not the target category) by the CV system. Here, we chose large probabilities below 0.5 to illustrate the third concept.
Notice that the above threshold satisfies 0.113 = 1 − 0.887. This is because we applied a scaling technique to the CV probabilities in Section 3.2 with a scale parameter r = 0.887, the average classification accuracy. Furthermore, the phenomenon we described when discussing CV performance can also be partly ascribed to this scaling technique, since the CV estimated probabilities were usually larger than 0.113.
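To make the origin of this threshold concrete, the sketch below combines the two probabilities with the standard odds-product rule and assumes, by analogy with Step 1 of the propagation scheme in Section 6.1, that the Section 3.2 scaling takes the same Bayesian-correction form with r in place of PE. This is our illustrative reading, not a formula quoted from the paper; under these assumptions the crossover falls exactly at 1 − r = 0.113, matching Figure 7a.

```python
def scale_cv(p_cv, r=0.887):
    # Assumed form of the Section 3.2 scaling (illustration only): a Bayesian
    # correction of the CV probability by the average CV accuracy r.
    return p_cv * r / (p_cv * r + (1.0 - p_cv) * (1.0 - r))

def posterior(p_eeg, p_cv, r=0.887):
    # Odds-product combination of the EEG prior with the scaled CV evidence.
    q = scale_cv(p_cv, r)
    return p_eeg * q / (p_eeg * q + (1.0 - p_eeg) * (1.0 - q))

# The posterior crosses the EEG probability exactly at p_cv = 1 - r = 0.113:
# below it the image needs stronger EEG support, above it the CV evidence helps.
for p_cv in (0.09, 0.113, 0.14):
    print(f"p_cv={p_cv:<6} posterior={posterior(0.6, p_cv):.3f}")  # EEG prob fixed at 0.6
```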
Figure 7: Illustration of the prior-posterior probability spread. Each panel plots the final evaluated (posterior) probability against the probability evaluated by the EEG decoding module, with one curve per CV probability (CVProb). (a) An image classified to the target category by the CV system: when the CV probability was larger than 0.113 (e.g., CVProb = 0.14), its chance of being finally recognized as a target image was promoted; when the CV probability was smaller than 0.113 (e.g., CVProb = 0.09), a larger EEG decoding probability was required. (b) An image not classified to the target category by the CV system (CVProb = 0.4, 0.45, 0.49): its chance of being finally recognized as a target image was strongly diminished.
Global Robustness. Inspecting the whole procedure of the BHCR framework, we note that the EEG decoding module follows EEG signal collection; the target category discovery for the CV system then proceeds with the help of the EEG decoding results, and the prior-posterior probability spread follows. With the above analysis of the relevant components of the BHCR framework, we concluded that all components of the framework performed steadily. Thus, from a global view, the BHCR framework was robust whenever the EEG signals of a subject correctly captured information about the retrieval task. Since the latter condition is generally assumed by common BCI applications, this further supports the global robustness of the BHCR framework.
6. A Propagation Scheme & A One-shot Image Database Retrieval Scheme

6.1. A Probability Propagation Scheme

Although the CV system performed well with respect to accuracy, a portion of the images remained wrongly classified. To address this problem, we designed a probability propagation scheme that improves the CV system by amending the CV estimated probabilities with the EEG decoding probabilities across test RSVP sequences.

For an image, the EEG estimated probability of it being a target image is used to amend its CV estimated probability. Since the EEG decoding module sometimes recognizes false alarms, we cannot ensure that the amending process succeeds in a single pass. Instead, we assume a sequential amending process for each image in the image database.
Suppose that an image IoDi had been recognized as a target image in n test RSVP sequences, that the EEG interest scores of IoDi among those sequences were Pe,1, Pe,2, ..., Pe,n, and that the initial CV estimated probabilities of IoDi were P1,i, P2,i, ..., P257,i. The amending process worked as follows (a code sketch follows the steps):

Step 1. For each j ∈ {1, 2, ..., n}, calculate

    P′e,j = (Pe,j · PE) / (Pe,j · PE + (1 − Pe,j) · (1 − PE)),

where PE is the precision of the EEG decoding module;

Step 2. For each j ∈ {1, 2, ..., n}, if image IoDi was recognized to the c-th category in the j-th sequence, update the probability Pc,i = Pc,i + P′e,j;

Step 3. Rescale the probabilities P1,i, P2,i, ..., P257,i to a sum of 1.
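As a concrete reading of Steps 1-3, here is a minimal Python sketch of one round of the amending process for a single image. The function name, array layout, and NumPy usage are our own illustration, not taken from the paper's implementation.

```python
import numpy as np

def amend_cv_probs(cv_probs, eeg_scores, discovered_cats, precision_e):
    """One round of the amending process for a single image (a sketch).

    cv_probs        -- length-257 array P_{1,i}..P_{257,i} of CV category probabilities
    eeg_scores      -- EEG interest scores P_{e,1}..P_{e,n} from the n sequences in
                       which the image was recognized as a target
    discovered_cats -- index of the target category discovered in each such sequence
    precision_e     -- P_E, the precision of the EEG decoding module
    """
    probs = np.asarray(cv_probs, dtype=float).copy()
    for p_e, cat in zip(eeg_scores, discovered_cats):
        # Step 1: correct the EEG interest score by the decoder's precision
        p_corr = p_e * precision_e / (p_e * precision_e + (1 - p_e) * (1 - precision_e))
        # Step 2: accumulate the corrected score on the discovered category
        probs[cat] += p_corr
    # Step 3: rescale so the 257 category probabilities sum to 1
    return probs / probs.sum()
```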
Since the possibility of an EEG recognized image being a true target was about 9.11 (= 6.38/0.70) times the possibility of it being a false alarm, and the target category discovery process worked steadily, we can expect that in Step 2 the probabilities accumulate on the correct category, so that the updated CV probabilities will be close to reality.
We conducted a simulation of the propagation process and obtained encouraging figures. In the simulation, we examined how many wrongly classified images could be corrected to the right category under different numbers of propagations. The simulation was conducted in 4 steps (a code sketch follows the list):
(1) Make a normal distribution hypothesis for the distribution of the EEG decoding probabilities and apply hypothesis testing;

(2) For n propagations, generate n · Nwc probabilities Pe,i,j, i = 1, 2, ..., n, j = 1, 2, ..., Nwc, as simulated EEG decoding probabilities according to the hypothesis (Nwc = 3563 is the number of wrongly classified images);

(3) For each j ∈ {1, 2, ..., Nwc} and its corresponding image IoWCj, simulate its discovered target categories n times, keeping the right-wrong category ratio consistent with the true-false ratio of the EEG recognized images (the ratio equals 9.11), and ensure that the wrongly discovered categories are randomly selected from the remaining 256 categories;

(4) Apply the proposed propagation scheme to all the wrongly classified images.
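A minimal Python sketch of steps (2)-(4) is given below. The distribution parameters (mu, sd), the EEG precision, and the initial probability mass on the wrong category are placeholders rather than the fitted quantities used in the paper, so its output will not reproduce Table 7 exactly.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
N_WC, N_CATS, TRUE_FALSE_RATIO = 3563, 257, 9.11

def corrected_percentage(n_prop, mu=0.8, sd=0.1, precision_e=0.9, wrong_mass=0.5):
    """Sketch of simulation steps (2)-(4) with placeholder parameters."""
    p_right = TRUE_FALSE_RATIO / (TRUE_FALSE_RATIO + 1.0)  # step (3): right-wrong ratio
    corrected = 0
    for _ in range(N_WC):
        probs = np.zeros(N_CATS)
        probs[0] = wrong_mass        # index 0 plays the initially wrong CV category
        for _ in range(n_prop):
            # step (2): a simulated EEG decoding probability, corrected as in Step 1
            p_e = float(np.clip(rng.normal(mu, sd), 1e-3, 1 - 1e-3))
            p_e = p_e * precision_e / (p_e * precision_e + (1 - p_e) * (1 - precision_e))
            # step (3): index 1 plays the true category; misses land elsewhere
            cat = 1 if rng.random() < p_right else int(rng.integers(2, N_CATS))
            probs[cat] += p_e        # step (4): the propagation update
        corrected += int(probs.argmax() == 1)
    return 100.0 * corrected / N_WC

print(f"corrected after 1 propagation: {corrected_percentage(1):.1f}%")
```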
We ran the simulation process 10 times to obtain figures for the propagation scheme (see Table 7). The results show that 74.5% (sd = 0.28%, N = 10) of the wrongly classified images were corrected to the right categories with 1 propagation, and the performance improved as the number of propagations increased; all wrongly classified images were corrected to the right category when the number of propagations reached 7.

Table 7: Performance of the propagation scheme (mean corrected percentage over 10 runs; standard deviation in parentheses).

    Propagation Times    Corrected Percentage
    1                    74.48% (0.28%)
    2                    91.44% (0.29%)
    3                    98.44% (0.21%)
    4                    99.75% (0.07%)
    5                    99.95% (0.05%)
    6                    99.99% (0.01%)
    7                    100.00% (0.00%)

6.2. A One-shot Image Database Retrieval Scheme

Since the CV system contained classification information for the whole image dataset, and the target category discovery offered sound results, it was natural for us to conduct database retrieval after the target image retrieval among RSVP presented images. Here we provide a one-shot image database retrieval scheme and illustrate its performance.
Table 8: Performance of the proposed database retrieval scheme.

    Precision    Test Sequences     All Categories
    K = 50       100.00% (0.00%)    100.00% (0.00%)
    K = 80       97.45% (5.30%)     97.69% (4.93%)
    K = |C|      94.39% (5.10%)     96.86% (4.87%)
Suppose a test RSVP sequence with images I1, I2, ..., I96 had been presented to the subject, and the BHCR framework returned the target category C and a set of recognized target images IT = {Ii | Ii is recognized as a target image}. A set RD of K images was expected to be returned as the image database retrieval result. The one-shot retrieval process worked as follows (see the sketch after the steps):

Step 1. The set IT forms the first part of RD;

Step 2. Sort all images in the image database (except those in IT) in decreasing order w.r.t. their CV estimated probabilities of belonging to category C;

Step 3. The rest of RD consists of the top K − |IT| images of the sorted list derived from Step 2.
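The steps translate directly into code. The following Python sketch is illustrative only; the function name and data layout (CV probabilities held as a NumPy matrix) are our assumptions.

```python
import numpy as np

def one_shot_retrieve(recognized_ids, cv_probs, target_cat, k):
    """One-shot database retrieval (a sketch).

    recognized_ids -- indices of the images in IT, recognized as targets by BHCR
    cv_probs       -- (num_images, 257) matrix of CV category probabilities
    target_cat     -- index of the discovered target category C
    k              -- desired size K of the retrieval result RD
    """
    result = list(recognized_ids)                  # Step 1: IT forms the first part
    taken = set(recognized_ids)
    rest = [i for i in range(cv_probs.shape[0]) if i not in taken]
    rest.sort(key=lambda i: cv_probs[i, target_cat], reverse=True)  # Step 2
    result.extend(rest[:max(0, k - len(result))])  # Step 3: top K - |IT| of the rest
    return result
```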
The performance of the image database retrieval scheme was measured in 2 ways: (1) conducting the scheme for all test RSVP sequences in our experiments; (2) inspecting the retrieval performance for all 257 categories with simulated test sequences. Since the minimum number of images in a category was 80, we set K = 50 and K = 80 to observe the performance. In addition, we also set K to the exact number of images in each category (K = |C|). Table 8 shows that the one-shot retrieval scheme achieved retrieval precisions higher than 90% in all the inspected settings.
7. Discussion

7.1. Practicability of the BHCR Framework

As a real-time BCI application, the practicability of the BHCR framework is an important concern. Since we have already demonstrated the retrieval performance of the BHCR framework, what remains to be explained is its time efficiency.
Recall that, for a subject, the following requirements must be met before the BHCR framework starts operating: i) the subject-specific EEG decoding module is ready; ii) the CV system is ready; iii) the BBCI module is ready for its calculations. Keeping these requirements in mind, we can give an efficiency analysis here.
The operation of the BHCR framework for an RSVP sequence consists of: i) estimating the EEG interest scores for the presented images; ii) extracting a probability matrix for the sequence from the CV system; iii) running the BBCI module to obtain the retrieval result. Among the 3 steps, the first is common to all existing real-time BCIs, and the second is an indexed read from local storage with negligible time consumption. Thus the efficiency of the BBCI module remains the decisive factor in the practicability of the BHCR framework.
Suppose an RSVP sequence consists of n images. The BBCI operation can be divided into several parts: i) the target category discovery procedure; ii) the matrix-to-array procedure; iii) the Bayesian model for each image. The total calculation complexity of all the above parts is O(n), since the number of operations per part is fixed (or has a constant upper bound) once the application setting (e.g., the category number K and the number of randomized trials N for each image) is settled.
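Spelled out, and assuming each per-image Bayesian model runs in constant time once K and N are fixed (our reading of the analysis above), the breakdown is:

```latex
T(n) = \underbrace{T_{\text{discovery}}(n)}_{O(n)}
     + \underbrace{T_{\text{matrix-to-array}}(n)}_{O(n)}
     + \sum_{i=1}^{n} \underbrace{T_{\text{Bayes}}(I_i)}_{O(1)}
     = O(n).
```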
With the above efficiency analysis, we conclude that the proposed framework meets the requirements of a real-time BCI application. In our experiments, we found no significant difference between the running time of the BHCR framework and that of an EEG-only retrieval mechanism, which supports this conclusion.

7.2. Comparison with Previous Works

Since the BHCR framework focuses on target image retrieval in RSVP sequences, we first compare our work to previous works on the same task. In fact, we have already conducted such a comparison with respect to EEG decoding methodology in Section 4.3 and Section 5.3. Firstly, the comparison of feature extraction methods in Section 4.3 took into consideration the classical unsupervised method PCA [31, 32] and the commonly used supervised method LDA [33, 1, 9]. Secondly, the candidate classification algorithms we inspected include SVM [15] and LR [1], which were frequently used in previous works, as well as algorithms such as AdaBoost [16], bagging [17], ANN [18], and RF [19], which have been used less often in similar tasks. In light of the overview paper [7], feature extraction and classification should not be regarded as isolated processes, so we conducted the comparison on combinations of the two. For example, the combination of LDA and LR in our comparison employed the same methodology as the paper [22]. Although some other combinations may have no corresponding existing works, our comparison still provides sound experimental references. Note that deep learning techniques have recently been employed to solve the EEG-based classification problem [34, 35]; it is worthwhile to compare against them in future work.
Besides the EEG decoding methodology, we can also compare our work to previous works that employ extra information to enhance the performance of retrieving target images in RSVP sequences. As mentioned in Section 1, iterative processes have been proposed in [1, 8] for RSVP target image retrieval so that a good initial set of target images is sent for further image database retrieval. These iterative processes generally require extra manual labor, such as viewing additional RSVP images. Other ways to utilize extra information have been explored only to a limited extent. For example, Marathe and his collaborators [36] introduced a measure of confidence to filter non-target images, but they simply corrected the filtered images by human labeling. Compared with such labor-intensive methods, our framework provides an approach that improves retrieval performance without extra manual labor.
7.3. Potential of Future BCIs

Reduce manual labor for future retrieval tasks. The BHCR framework was demonstrated to be an attractive mode of brain-computer interaction. By fully utilizing the information provided by the CV system, the BHCR framework can recognize more true targets and fewer false alarms in RSVP sequences while requiring no additional manual labor. The sound target category discovery performance shown in Section 5.3 implies that we no longer need to present extra images to the subject for a category guess, which plays an important role in the closed-loop C3Vision system [1]. In addition, the recognized images covered almost all the target images. These observations suggest that, based on the BHCR framework, future systems for large database retrieval can be expected to be free of multiple presentations, or to reduce extra manual labor, while achieving better retrieval results.
Refine the CV system with HV insights. In Section 6.1, a simple propagation method was proposed for improving the CV system with HV insights. For each image, the propagation process is efficient, but it requires many RSVP sequences to correct all the wrongly classified images to the right category. A promising way to improve the solution is to take image similarities into consideration. If a similarity-based CV component [1, 8, 24] is introduced, it may also help improve the database retrieval performance.
Employ other EEG decoding techniques. Although feature extraction techniques and classification methods are commonly used in EEG-based BCI applications, other techniques can also be used to decode EEG signals. Channel selection methods [37] for dimension reduction and feature learning techniques [38, 39, 40] aimed at denoising or filtering can be used in a pre-processing stage for EEG signals. As for the recognition stage, projective dictionary pair learning [41] and mask-based methods [42] can be used for classification. Since EEG signals can be treated as high-dimensional tensors, techniques such as nonnegative tensor factorization [43] and other tensor processing methods [44] may also be taken into consideration.

Further application situations. We suggest that the BHCR framework can be applied to a variety of future recognition tasks and to some complex but realistic situations, for example, when we fail to keep the user's brain activity in a steady mode, or when the images presented to the user are messy and dim [45]. We also believe that a large design space remains to be explored in constructing different efficient BCIs, whether based on our framework or not.
8. Conclusion

In this paper, we presented a BCI framework, BHCR, based on a BBCI method for target image retrieval in RSVP sequences. By fully taking advantage of CV and HV insights, the BHCR framework achieved better retrieval performance than the EEG-only mechanism. We also compared the BHCR framework to previous works in several ways to reveal its advantages. To construct a well-suited EEG decoding module for BHCR, we conducted a comparison of classification algorithms, which indicated that RF outperformed the other candidate classification algorithms in the EEG decoding task. Investigating the image database retrieval task more deeply, we provided a probability propagation scheme and a one-shot image database retrieval scheme. Although there are still many interesting work directions around the BHCR framework, and some of its aspects could be improved in the future, this study suggests that the proposed framework could be useful for assisting individuals in handling target image retrieval tasks in RSVP sequences.

Acknowledgements
The authors would like to thank Dr. Linyuan Wang and Vice Prof. Li Tong from Zhengzhou Information Science and Technology Institute for their invaluable expert advice, which helped bring this paper to completion. We thank LetPub for linguistic assistance with this manuscript.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
References
[1] E. A. Pohlmeyer, J. Wang, D. C. Jangraw, B. Lou, S.-F. Chang, P. Sajda, Closing the loop in cortically-coupled computer vision: a brain-computer interface for searching image databases, Journal of Neural Engineering 8 (3) (2011) 036025.

[2] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[3] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, T. Darrell, Caffe: Convolutional architecture for fast feature embedding, in: Proceedings of the ACM International Conference on Multimedia, ACM, 2014, pp. 675–678.

[4] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556.

[5] J. R. Wolpaw, N. Birbaumer, W. J. Heetderks, D. J. McFarland, P. H. Peckham, G. Schalk, E. Donchin, L. A. Quatrano, C. J. Robinson, T. M. Vaughan, et al., Brain-computer interface technology: a review of the first international meeting, IEEE Transactions on Rehabilitation Engineering 8 (2) (2000) 164–173.

[6] P. Sajda, A. Gerson, L. Parra, High-throughput image search via single-trial event detection in a rapid serial visual presentation task, in: Neural Engineering, 2003. Conference Proceedings. First International IEEE EMBS Conference on, IEEE, 2003, pp. 7–10.

[7] J. Mak, Y. Arbel, J. Minett, L. McCane, B. Yuksel, D. Ryan, D. Thompson, L. Bianchi, D. Erdogmus, Optimizing the P300-based brain–computer interface: current status, limitations and future directions, Journal of Neural Engineering 8 (2) (2011) 025003.

[8] M. Ušćumlić, R. Chavarriaga, J. d. R. Millán, An iterative framework for EEG-based image search: robust retrieval with weak classifiers, PLoS ONE 8 (8) (2013) e72018.

[9] L. C. Parra, C. D. Spence, A. D. Gerson, P. Sajda, Recipes for the linear analysis of EEG, NeuroImage 28 (2) (2005) 326–341.

[10] A. D. Gerson, L. C. Parra, P. Sajda, Cortically coupled computer vision for rapid image search, IEEE Transactions on Neural Systems and Rehabilitation Engineering 14 (2) (2006) 174–179.

[11] N. Bigdely-Shamlo, A. Vankov, R. R. Ramirez, S. Makeig, Brain activity-based image classification from rapid serial visual presentation, IEEE Transactions on Neural Systems and Rehabilitation Engineering 16 (5) (2008) 432–441.

[12] M. Dyrholm, L. C. Parra, Smooth bilinear classification of EEG, in: Engineering in Medicine and Biology Society, 2006. EMBS'06. 28th Annual International Conference of the IEEE, IEEE, 2006, pp. 4249–4252.

[13] M. Dyrholm, C. Christoforou, L. C. Parra, Bilinear discriminant component analysis, The Journal of Machine Learning Research 8 (2007) 1097–1111.

[14] L. C. Parra, C. Christoforou, A. D. Gerson, M. Dyrholm, A. Luo, M. Wagner, M. Philiastides, P. Sajda, Spatiotemporal linear decoding of brain state, IEEE Signal Processing Magazine 25 (1) (2008) 107–115.

[15] S. Mathan, S. Whitlow, D. Erdogmus, M. Pavel, P. Ververs, M. Dorneich, Neurophysiologically driven image triage: a pilot study, in: CHI'06 Extended Abstracts on Human Factors in Computing Systems, ACM, 2006, pp. 1085–1090.

[16] S. Xiao, B. Cai, L. Jiang, Y. Wang, W. Chen, X. Zheng, ERP component analysis for rapid image searching in finer categories, in: 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2013, pp. 3089–3092.

[17] D. Izzo, M. Rucinski, C. Ampatzis, E. Martin, On the EEG footprint of image saliency.

[18] E. Haselsteiner, G. Pfurtscheller, Using time-dependent neural networks for EEG classification, IEEE Transactions on Rehabilitation Engineering 8 (4) (2000) 457–463.

[19] F. Akram, S. M. Han, T.-S. Kim, An efficient word typing P300-BCI system using a modified T9 interface and random forest classifier, Computers in Biology and Medicine 56 (2015) 30–36.

[20] S. Sun, C. Zhang, D. Zhang, An experimental evaluation of ensemble methods for EEG signal classification, Pattern Recognition Letters 28 (15) (2007) 2157–2163.

[21] J. Wang, E. Pohlmeyer, B. Hanna, Y.-G. Jiang, P. Sajda, S.-F. Chang, Brain state decoding for rapid image retrieval, in: Proceedings of the 17th ACM International Conference on Multimedia, ACM, 2009, pp. 945–954.

[22] P. Sajda, E. Pohlmeyer, J. Wang, L. C. Parra, C. Christoforou, J. Dmochowski, B. Hanna, C. Bahlmann, M. K. Singh, S.-F. Chang, In a blink of an eye and a switch of a transistor: cortically coupled computer vision, Proceedings of the IEEE 98 (3) (2010) 462–478.

[23] Y. Wang, L. Jiang, B. Cai, Y. Wang, S. Zhang, X. Zheng, A closed-loop system for rapid face retrieval by combining EEG and computer vision, in: Neural Engineering (NER), 2015 7th International IEEE/EMBS Conference on, IEEE, 2015, pp. 130–133.

[24] Y. Wang, L. Jiang, Y. Wang, B. Cai, Y. Wang, W. Chen, S. Zhang, X. Zheng, An iterative approach for EEG-based rapid face search: A refined retrieval by brain computer interfaces, IEEE Transactions on Autonomous Mental Development 7 (3) (2015) 211–222.

[25] G. Griffin, A. Holub, P. Perona, Caltech-256 object category dataset.

[26] M. C. Potter, E. I. Levy, Recognition memory for a rapid sequence of pictures, Journal of Experimental Psychology 81 (1) (1969) 10.

[27] E. Alfaro, M. Gámez, N. García, adabag: An R package for classification with boosting and bagging, Journal of Statistical Software 54 (2) (2013) 1–35.

[28] F. Kemp, Modern applied statistics with S, Journal of the Royal Statistical Society: Series D (The Statistician) 52 (4) (2003) 704–705.

[29] A. Liaw, M. Wiener, Classification and regression by randomForest, R News 2 (3) (2002) 18–22.

[30] E. Dimitriadou, K. Hornik, F. Leisch, D. Meyer, A. Weingessel, Misc functions of the Department of Statistics (e1071), TU Wien, R package 1 (2008) 5–24.

[31] K. C. Squires, E. Donchin, R. I. Herning, G. McCarthy, On the influence of task relevance and stimulus probability on event-related-potential components, Electroencephalography and Clinical Neurophysiology 42 (1) (1977) 1–14.

[32] E. Bullmore, S. Rabe-Hesketh, R. Morris, S. Williams, L. Gregory, J. Gray, M. Brammer, Functional magnetic resonance image analysis of a large-scale neurocognitive network, NeuroImage 4 (1) (1996) 16–33.

[33] Z. Mao, V. Lawhern, L. M. Merino, K. Ball, L. Deng, B. J. Lance, K. Robbins, Y. Huang, Classification of non-time-locked rapid serial visual presentation events for brain-computer interaction using deep learning, in: Signal and Information Processing (ChinaSIP), 2014 IEEE China Summit & International Conference on, IEEE, 2014, pp. 520–524.

[34] Y. Ren, Y. Wu, Convolutional deep belief networks for feature extraction of EEG signal, in: 2014 International Joint Conference on Neural Networks (IJCNN), IEEE, 2014, pp. 2850–2853.

[35] S. Ding, N. Zhang, X. Xu, L. Guo, J. Zhang, Deep extreme learning machine and its application in EEG classification, Mathematical Problems in Engineering 2015.

[36] A. R. Marathe, A. J. Ries, V. J. Lawhern, B. J. Lance, J. Touryan, K. McDowell, H. Cecotti, The effect of target and non-target similarity on neural classification performance: a boost from confidence, Frontiers in Neuroscience 9.

[37] Z. Qiu, J. Jin, H.-K. Lam, Y. Zhang, X. Wang, A. Cichocki, Improved SFFS method for channel selection in motor imagery based BCI, Neurocomputing.

[38] J. Li, Z. Struzik, L. Zhang, A. Cichocki, Feature learning from incomplete EEG with denoising autoencoder, Neurocomputing 165 (2015) 23–31.

[39] Y. Zhang, G. Zhou, J. Jin, X. Wang, A. Cichocki, Optimizing spatial patterns with sparse filter bands for motor-imagery based brain–computer interface, Journal of Neuroscience Methods 255 (2015) 85–91.

[40] K. Sadatnejad, S. S. Ghidary, Kernel learning over the manifold of symmetric positive definite matrices for dimensionality reduction in a BCI application, Neurocomputing 179 (2016) 152–160.

[41] R. Ameri, A. Pouyan, V. Abolghasemi, Projective dictionary pair learning for EEG signal classification in brain computer interface applications, Neurocomputing 218 (2016) 382–389.

[42] J. Li, Y. Wang, L. Zhang, A. Cichocki, T.-P. Jung, Decoding EEG in cognitive tasks with time-frequency and connectivity masks.

[43] Y. Zhang, G. Zhou, Q. Zhao, A. Cichocki, X. Wang, Fast nonnegative tensor factorization based on accelerated proximal gradient and low-rank approximation, Neurocomputing 198 (2016) 148–154.

[44] Y. Zhang, Q. Zhao, G. Zhou, J. Jin, X. Wang, A. Cichocki, Removal of EEG artifacts for BCI applications using fully Bayesian tensor completion, in: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2016, pp. 819–823.

[45] M. Ušćumlić, B. Blankertz, Active visual search in non-stationary scenes: coping with temporal variability and uncertainty, Journal of Neural Engineering 13 (1) (2016) 016015.
Biography of the Authors

Liangtao Huang

Liangtao Huang received his B.S. degree in applied mathematics in the summer of 2014 from the Information Engineering University, Zhengzhou, China. He is now a Master's degree candidate in applied mathematics at the National Digital Switching System Engineering & Technological R&D Center (NDSC), China. His research interests are in statistics, machine learning and their applications.

Yaqun Zhao

Yaqun Zhao was born in 1961. She received her B.S., M.S. and Ph.D. degrees in mathematics from Zhengzhou Information Science and Technology Institute, China. She is currently a professor with the National Digital Switching System Engineering & Technological R&D Center in Zhengzhou, China. Her research interests are in statistics, applied probability and data analysis.

Ying Zeng

Ying Zeng was born in 1983. She received the B.S., M.S., and Ph.D. degrees from the Zhengzhou Information Science and Technology Institute in 2004, 2007, and 2011, respectively. She is currently a lecturer with the Zhengzhou Information Science and Technology Institute. Her current research interests are in pattern recognition, brain-computer interface (BCI) and information security techniques.

Zhimin Lin

Zhimin Lin was born in 1992. He received the B.S. degree in electrical engineering from Nantong University, Nantong, China, in 2014. He is currently working toward the M.S. degree at the Zhengzhou Information Science and Technology Institute, Zhengzhou. His current research interests include biomedical signal processing and brain-computer interface.