Accepted Manuscript
Kernel quaternion principal component analysis and its application in RGB-D object recognition
Beijing Chen, Jianhao Yang, Byeungwoo Jeon, Xinpeng Zhang
PII: S0925-2312(17)30897-4
DOI: 10.1016/j.neucom.2017.05.047
Reference: NEUCOM 18465
To appear in: Neurocomputing
Received date: 19 October 2016
Revised date: 18 March 2017
Accepted date: 21 May 2017
Please cite this article as: Beijing Chen , Jianhao Yang , Byeungwoo Jeon , Xinpeng Zhang , Kernel quaternion principal component analysis and its application in RGB-D object recognition, Neurocomputing (2017), doi: 10.1016/j.neucom.2017.05.047
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Kernel quaternion principal component analysis and its application in RGB-D object recognition
Beijing Chen a,b,c, Jianhao Yang a, Byeungwoo Jeon c, Xinpeng Zhang d
a Jiangsu Engineering Center of Network Monitoring, School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing 210044, China
b Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science & Technology, Nanjing 210044, China
c College of Information & Communication Engineering, Sungkyunkwan University, Suwon 440746, Korea
d School of Communication & Information Engineering, Shanghai University, Shanghai 200072, China
ABSTRACT
While the existing quaternion principal component analysis (QPCA) is a linear tool developed mainly for processing linear quaternion signals, the quaternion representation (QR) used in QPCA creates redundancy when representing a color image signal of three components by a quaternion matrix having four components. In this paper, the kernel technique is used to extend QPCA to kernel QPCA (KQPCA) for processing nonlinear quaternion signals; in addition, both RGB information and depth information are considered to improve the QR for representing RGB-D images. The improved QR fully utilizes the four-dimensional quaternion domain. We first provide the basic idea of three types of our KQPCA and then propose an algorithm for RGB-D object recognition based on bidirectional two-dimensional KQPCA (BD2DKQPCA) and the improved QR. Experimental results on four public datasets demonstrate that the proposed BD2DKQPCA-based algorithm achieves the best performance among seventeen compared algorithms, including other existing PCA-based algorithms, for both RGB object recognition and RGB-D object recognition. Moreover, for all compared algorithms, considering both RGB and depth information achieves better object recognition performance than considering RGB information alone.
Keywords:
Principal component analysis; quaternion; kernel function; RGB-D object recognition
Corresponding author. Tel.: +82-31-290-7144. E-mail address: [email protected].
1. Introduction
The quaternion is a generalization of the complex number. During the past two decades, it has been increasingly used to deal with color image signals by encoding their three channels into the imaginary parts of the quaternion representation [1-17]. Its main advantage for representing and processing color signals lies in that a color image signal can be treated holistically as a vector field [1, 2, 14, 15, 17]. Many classical tools developed for gray-scale images have been successfully generalized to color image processing using the quaternion algebra, such as the Fourier transform [2, 3], wavelet transform [4, 5], neural networks [6, 7], principal component analysis [8, 9], singular value decomposition [10], independent component analysis [11], polar harmonic transform [12], and moments [13-16].
However, the extra fourth dimension of the quaternion when representing color images of three components creates redundancy and brings additional computational cost. Assefa et al. [18] tried to circumvent these disadvantages by introducing a new representation scheme in three-dimensional space using the trinion, which has one real and two imaginary components, and subsequently the trinion Fourier transform based on this new representation. However, only a few color image processing approaches adopt this representation, while more and more published works still use the quaternion representation (QR). Some of the main reasons are: (a) the theory of trinions is still less developed than that of the quaternions, which provides the theoretical basis for quaternion-based color image processing [1-17]; (b) the QR has been successfully used in many fields of color image processing [1-17]. So, this paper still considers the QR and resolves its redundancy problem by introducing the additional important depth information into the QR.
The recent popularity of Kinect devices makes it easy to capture RGB-D images carrying both color and depth information [19]. The depth information has many extra advantages: it is invariant to lighting and color variations, allows better separation from the background, and provides pure geometry and shape cues [20]. So, combining color and depth information can dramatically improve the performance of many vision problems, e.g., object recognition [19-31], object detection [32], object tracking [33], and human activity analysis [34]. Among these problems, RGB-D object recognition has attracted the most attention [19-31].
However, we note that quaternion-based work in the area of RGB-D object recognition is still absent. This motivates us to improve the QR for RGB-D object recognition by considering both color and depth information in an efficient way.
In the areas of pattern recognition, computer vision, and multimedia signal processing, while principal component analysis (PCA) [35, 36] is a well-known technique for extracting features from possibly high-dimensional data sets, its basic nature of being a linear technique does not allow effective treatment of the possibly higher-order statistics of signals, since factors like variations in illumination, shape, and pose make pattern recognition problems highly nonlinear in real situations [37-39]. This explains why the concept of kernel PCA (KPCA) [40] was introduced. KPCA is the nonlinear generalization of PCA via the kernel trick: it first maps the input data into a high-dimensional feature space to convert the nonlinear input data into linear ones, and then performs the conventional PCA in the feature space [40, 41].
Recently, using the quaternion algebra, the so-called quaternion PCA (QPCA) [8], two-dimensional QPCA (2DQPCA), and bidirectional 2DQPCA (BD2DQPCA) [9] have been proposed to generalize the conventional PCA to color image signals. They have been successfully used in color face recognition [9, 42], color texture segmentation [43], color image watermarking [44], and multispectral palmprint recognition [45]. However, these works, using the conventional QR, are no exception to the previously mentioned redundancy problem. Moreover, a quaternion version of KPCA is still absent. So, in this paper we propose the kernel QPCA (KQPCA) and its two-dimensional versions, two-dimensional KQPCA (2DKQPCA) and bidirectional 2DKQPCA (BD2DKQPCA), and then apply them to the RGB-D object recognition problem by combining them with the improved QR.
This paper is organized as follows. In Section 2, we recall some basic features of quaternions, and then present some types of QPCA and KPCA. Section 3, the theoretical part of this paper, presents the proposed types of KQPCA. The improved QR and an algorithm for RGB-D object recognition based on BD2DKQPCA are given in Section 4. Experimental results and analysis are provided in Section 5 to illustrate the performance of the proposed KQPCA. Section 6 concludes the paper.
2. Some Preliminaries
2.1 Quaternion number and quaternion color representation
The quaternion, introduced by the mathematician Hamilton in 1843 [46], is a generalization of the complex number: a quaternion has one real part and three imaginary parts given by,
q = q_r + q_i i + q_j j + q_k k, (1)
where q_r, q_i, q_j, q_k ∈ R, and i, j, k are three imaginary units obeying the following rules,
i^2 = j^2 = k^2 = -1, ij = -ji = k, jk = -kj = i, ki = -ik = j. (2)
Moreover, i, j, k are also the fundamental quaternion units. The rules given in (2) make the algebra of quaternions an important normed division algebra, where the norm of a product equals the product of the norms, and where every element of the algebra (except zero) has a multiplicative inverse [47]. If the real part q_r = 0 in (1), q is called a pure quaternion.
Let q_1 = q_{1,r} + q_{1,i} i + q_{1,j} j + q_{1,k} k and q_2 = q_{2,r} + q_{2,i} i + q_{2,j} j + q_{2,k} k be two quaternion numbers. According to the rules given in (2), the addition and multiplication of q_1 and q_2 are respectively given by,
q_1 + q_2 = (q_{1,r} + q_{2,r}) + (q_{1,i} + q_{2,i}) i + (q_{1,j} + q_{2,j}) j + (q_{1,k} + q_{2,k}) k, (3)
q_1 q_2 = (q_{1,r} q_{2,r} - q_{1,i} q_{2,i} - q_{1,j} q_{2,j} - q_{1,k} q_{2,k}) + (q_{1,r} q_{2,i} + q_{1,i} q_{2,r} + q_{1,j} q_{2,k} - q_{1,k} q_{2,j}) i + (q_{1,r} q_{2,j} + q_{1,j} q_{2,r} + q_{1,k} q_{2,i} - q_{1,i} q_{2,k}) j + (q_{1,r} q_{2,k} + q_{1,k} q_{2,r} + q_{1,i} q_{2,j} - q_{1,j} q_{2,i}) k. (4)
The conjugate and modulus of a quaternion q in (1) are respectively defined as,
q^* = q_r - q_i i - q_j j - q_k k, (5)
|q| = sqrt((q_r)^2 + (q_i)^2 + (q_j)^2 + (q_k)^2). (6)
For a quaternion q = q_r + q_v with q_v = q_i i + q_j j + q_k k, its logarithm and exponential are respectively given by [48],
ln(q) = ln|q| + (q_v / |q_v|) arccos(q_r / |q|), (7)
exp(q) = exp(q_r) (cos|q_v| + (q_v / |q_v|) sin|q_v|). (8)
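The quaternion operations (1)-(8) are straightforward to implement. The sketch below is our own minimal illustration (not part of the original manuscript); it encodes a quaternion as a NumPy array [q_r, q_i, q_j, q_k] and follows (4)-(8) directly, assuming q_v ≠ 0 for the logarithm and exponential.

```python
import numpy as np

def qmul(p, q):
    """Quaternion product (4); p, q are arrays [q_r, q_i, q_j, q_k]."""
    pr, pi, pj, pk = p
    qr, qi, qj, qk = q
    return np.array([
        pr*qr - pi*qi - pj*qj - pk*qk,
        pr*qi + pi*qr + pj*qk - pk*qj,
        pr*qj + pj*qr + pk*qi - pi*qk,
        pr*qk + pk*qr + pi*qj - pj*qi,
    ])

def qconj(q):
    """Conjugate (5): negate the three imaginary parts."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def qmod(q):
    """Modulus (6): Euclidean norm of the four components."""
    return np.linalg.norm(q)

def qlog(q):
    """Logarithm (7), assuming the vector part q_v is nonzero."""
    nv, nq = np.linalg.norm(q[1:]), np.linalg.norm(q)
    return np.concatenate(([np.log(nq)], np.arccos(q[0] / nq) * q[1:] / nv))

def qexp(q):
    """Exponential (8), assuming the vector part q_v is nonzero."""
    nv = np.linalg.norm(q[1:])
    return np.exp(q[0]) * np.concatenate(([np.cos(nv)], np.sin(nv) * q[1:] / nv))
```

For example, qmul applied to the units i and j returns k, matching (2), and qexp(qlog(q)) recovers q, matching (7)-(8).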
Let f(u, v) be an RGB image function. With the QR, each pixel can be represented as a pure quaternion,
f(u, v) = f_R(u, v) i + f_G(u, v) j + f_B(u, v) k, (9)
where f_R(u, v), f_G(u, v) and f_B(u, v) are respectively the red, green, and blue components of the pixel (u, v).
2.2 Quaternion version of linear PCA
Here, we briefly describe three types of linear QPCA for quaternion color images given in [8, 9], i.e., one-dimensional QPCA (1DQPCA), 2DQPCA, and BD2DQPCA.
Let x be the 1D version of a given M×N color image represented by the QR in (9). 1DQPCA linearly projects the MN×1 pure quaternion vector x to the r_{1DQPCA}-dimensional feature subspace as y_{1DQPCA},
y_{1DQPCA} = W_{1DQPCA}^T (x - x̄), (10)
where W_{1DQPCA} is the MN×r_{1DQPCA}-dimensional QPCA projection matrix, (·)^T represents the conjugate transpose of a quaternion matrix, and x̄ is the mean vector of all training images. The method for computing the projection matrix W_{1DQPCA} based on quaternion singular value decomposition can be found in [8].
In contrast, 2DQPCA treats the color image directly by representing the M×N color image as the M×N pure quaternion matrix X via (9), and then projects X to the M×r_row-dimensional feature subspace as Y_{2DQPCA} in the row direction,
Y_{2DQPCA} = (X - X̄) W_row, (11)
where W_row is the 2DQPCA projection matrix of dimension N×r_row in the row direction, and X̄ is the mean matrix of all training images. More details about the computation of the 2DQPCA projection matrix W_row can be found in [9]. In fact, 2DQPCA can also project X to the r_col×N-dimensional feature subspace in the column direction,
Y_{2DQPCA} = W_col^T (X - X̄), (12)
where W_col is the M×r_col-dimensional 2DQPCA projection matrix in the column direction. Finally, BD2DQPCA projects the quaternion image matrix X to the r_col×r_row-dimensional feature subspace in both the row and column directions as Y_{BD2DQPCA},
Y_{BD2DQPCA} = W_col^T (X - X̄) W_row. (13)
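For intuition, the bidirectional projection (13) can be illustrated with a real-valued BD2DPCA sketch. This is our own analogy, not code from [9]: the quaternion version replaces the real scatter matrices and eigen-decomposition below with their quaternion counterparts obtained via quaternion singular value decomposition.

```python
import numpy as np

def bd2dpca_fit(samples, r_row, r_col):
    """Fit bidirectional 2DPCA projection matrices on real image matrices
    (a real-valued stand-in for the quaternion BD2DQPCA of [9])."""
    X = np.stack(samples)                 # (S, M, N) training images
    Xm = X.mean(axis=0)                   # mean image, as in (11)-(13)
    Xc = X - Xm
    # Row-direction scatter (N x N): correlations among row vectors.
    G_row = np.einsum('smn,smp->np', Xc, Xc)
    # Column-direction scatter (M x M): correlations among column vectors.
    G_col = np.einsum('smn,spn->mp', Xc, Xc)
    # Eigenvectors of the largest eigenvalues give the projection bases.
    _, V_row = np.linalg.eigh(G_row)
    _, V_col = np.linalg.eigh(G_col)
    W_row = V_row[:, ::-1][:, :r_row]     # N x r_row
    W_col = V_col[:, ::-1][:, :r_col]     # M x r_col
    return Xm, W_row, W_col

def bd2dpca_project(X, Xm, W_row, W_col):
    """Bidirectional projection, the real-valued analogue of (13)."""
    return W_col.T @ (X - Xm) @ W_row
```

The projected feature has size r_col × r_row, matching the dimensionality stated for (13).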
2.3 Nonlinear kernel PCA
In this subsection, we briefly introduce three types of nonlinear KPCA, i.e., one-dimensional KPCA (1DKPCA) [38], two-dimensional KPCA (2DKPCA) [49], and bidirectional 2DKPCA (BD2DKPCA) [50].
KPCA is a nonlinear generalization of the conventional PCA. The basic idea of 1DKPCA is first to map the original input data x_s (s = 1, 2, …, S) of size N×1 into a high-dimensional feature space F via a nonlinear mapping,
φ: R^N → F, x_s → x̂_s, s = 1, 2, …, S, (14)
and then to perform a linear one-dimensional PCA in F.
Because we only need the dot products between the mapped patterns and never the mapped patterns explicitly, the kernel technique was introduced in [40] to avoid the high computational load in the possibly very high-dimensional space F. The covariance matrix is replaced by the kernel matrix K = (k_{s,t})_{S×S} using the kernel functions,
k_{s,t} = k(x_s, x_t) = <x̂_s, x̂_t>. (15)
The commonly-used kernel functions are as follows,
k(x_s, x_t) = (x_s · x_t + c)^b (polynomial kernels),
k(x_s, x_t) = exp(-||x_s - x_t||^2 / (2σ^2)) (Gaussian kernels),
k(x_s, x_t) = tanh(ρ x_s · x_t + a) (sigmoid kernels), (16)
where b, c, σ, ρ, and a are some real numbers. For 2D images, one can in advance transform 2D image matrices into 1D image vectors before conducting 1DKPCA.
However, in this method it is difficult to evaluate the covariance matrix accurately due to its large size [40, 42]. So, Nhat and Lee [49] proposed 2DKPCA, which considers the rows of a 2D image for the nonlinear mapping. The nonlinear mapping for the 2D image samples X_s (s = 1, 2, …, S) of size M×N is given by,
φ: R^N → F, X_{s,m} → X̂_{s,m}, s = 1, 2, …, S, m = 1, 2, …, M, (17)
where X_{s,m} is the mth row vector of the sth sample X_s. The following procedures, i.e., kernel matrix calculation, eigenvalue calculation, and projection, are similar to those of 1DKPCA, treating each row vector as a sample. Of course, one can also consider the columns of the 2D image for the nonlinear mapping; this is analogous to 2DKPCA in the row direction.
In fact, 2DKPCA is essentially 1DKPCA on the rows or columns of an image: it only considers the correlation among the row or column vectors of the image matrix and ignores the other [50]. So, Zhang et al. [50] introduced BD2DKPCA by integrating the row-direction 2DKPCA and the column-direction 2DKPCA. Moreover, to further alleviate the computational cost of 2DKPCA, they used the mean row (or column) vector to approximate the 2D image. After performing both the row-direction and the column-direction 2DKPCA and transforming the two feature matrices into two 1D vectors, they presented two ways of integration: the first applies PCA to the two 1D vectors and then combines the results into one feature vector for classification, while the other first combines the two 1D vectors into one vector and then applies PCA to this vector.
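The whole 1DKPCA pipeline (mapping, kernel matrix, centering, eigen-decomposition, projection) can be sketched as follows. This is a generic illustration of the technique with a polynomial kernel, not code from [40] or [50]; the function name and parameters are our own.

```python
import numpy as np

def kpca_features(X, X_new, b=2, c=1.0, r=2):
    """Minimal 1DKPCA sketch with a polynomial kernel (x_s . x_t + c)^b.
    X: (S, N) training samples; X_new: (T, N) test samples.
    Returns the r-dimensional projected features of X_new."""
    S = X.shape[0]
    K = (X @ X.T + c) ** b                            # kernel matrix, (15)-(16)
    one_S = np.full((S, S), 1.0 / S)
    Kc = K - one_S @ K - K @ one_S + one_S @ K @ one_S  # centering in F
    lam, V = np.linalg.eigh(Kc)                       # eigen-decomposition
    lam, V = lam[::-1], V[:, ::-1]                    # descending order
    # Normalize the expansion coefficients so the feature-space
    # eigenvectors have unit length.
    A = V[:, :r] / np.sqrt(np.maximum(lam[:r], 1e-12))
    # Kernel values between test and training samples, centred consistently.
    K_new = (X_new @ X.T + c) ** b
    one_T = np.full((X_new.shape[0], S), 1.0 / S)
    K_newc = K_new - one_T @ K - K_new @ one_S + one_T @ K @ one_S
    return K_newc @ A
```

The same skeleton underlies the quaternion variants in Section 3, with the kernel and eigen-decomposition replaced by their quaternion counterparts.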
3. Kernel Quaternion PCA
Following the definitions of QPCA and KPCA, this section introduces three types of KQPCA: one-dimensional KQPCA (1DKQPCA), 2DKQPCA, and BD2DKQPCA. These three types of KQPCA can be applied to process quaternion data, including color image data represented by the QR in (9).
3.1 1DKQPCA
Given a set of 1D training quaternion data samples x^q_s (s = 1, 2, …, S) of dimension M×1. Note that here and in the rest of this paper, the superscript "q" indicates that a variable is quaternion-valued. A quaternion function ψ is used to map the vector data in a quaternion space Q^M into a high-dimensional quaternion feature space G as,
ψ: Q^M → G, x^q_s → x̂^q_s, s = 1, 2, …, S. (18)
After the mapping, the quaternion covariance matrix can be written as,
C^q = (1/S) Σ_{s=1}^{S} (x̂^q_s - x̄^q)(x̂^q_s - x̄^q)^T, (19)
where x̄^q is the mean vector of all mapped training samples x̂^q_s. However, similar to KPCA, a high computational load may occur in the possibly very high-dimensional space G. Moreover, the mapped patterns are in fact not needed explicitly in the subsequent procedures. So, the covariance matrix is replaced with the quaternion kernel matrix K^q = (k^q_{s,t})_{S×S}, computed from the dot products of pairs of patterns in G with the kernel function as follows,
k^q_{s,t} = k(x^q_s, x^q_t) = <x̂^q_s, x̂^q_t>. (20)
Here the kernel function is the quaternion version of that given in (16). The quaternion polynomial kernel is adopted in this paper due to its simple form and its quaternion-preserving property (the mapped pattern of a quaternion pattern is still a quaternion one),
k^q_{s,t} = k(x^q_s, x^q_t) = (x^q_s · x^q_t + c)^b = exp(b ln(x^q_s · x^q_t + c)), (21)
where b and c are two real numbers, and exp(·) and ln(·) are respectively the quaternion exponential function given in (8) and the quaternion logarithmic function given in (7).
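A direct way to evaluate (21) is via the quaternion exp/ln pair of (7)-(8). The sketch below is our own illustration: the quaternion helpers are redefined for self-containment, and taking the inner product with conjugation of the first argument is an assumption consistent with the conjugate transpose used in (10).

```python
import numpy as np

def qmul(p, q):
    """Quaternion product (4); p, q are arrays [q_r, q_i, q_j, q_k]."""
    pr, pi, pj, pk = p
    qr, qi, qj, qk = q
    return np.array([pr*qr - pi*qi - pj*qj - pk*qk,
                     pr*qi + pi*qr + pj*qk - pk*qj,
                     pr*qj + pj*qr + pk*qi - pi*qk,
                     pr*qk + pk*qr + pi*qj - pj*qi])

def qlog(q):
    """Quaternion logarithm (7), assuming a nonzero vector part."""
    nv, nq = np.linalg.norm(q[1:]), np.linalg.norm(q)
    return np.concatenate(([np.log(nq)], np.arccos(q[0] / nq) * q[1:] / nv))

def qexp(q):
    """Quaternion exponential (8), assuming a nonzero vector part."""
    nv = np.linalg.norm(q[1:])
    return np.exp(q[0]) * np.concatenate(([np.cos(nv)], np.sin(nv) * q[1:] / nv))

def qpoly_kernel(xs, xt, b=0.5, c=1.0):
    """Quaternion polynomial kernel (21): (<xs, xt> + c)^b = exp(b ln(<xs, xt> + c)).
    xs, xt: (M, 4) arrays, one quaternion per row."""
    s = np.zeros(4)
    for a, d in zip(xs, xt):
        s += qmul(np.array([a[0], -a[1], -a[2], -a[3]]), d)  # conj(a) * d
    s[0] += c
    return qexp(b * qlog(s))
```

With b = 1 the kernel reduces to the inner product plus c, and with b = 0.5 squaring the result recovers that value, which makes the power-via-exp/ln construction easy to sanity-check.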
Before calculating the eigenvalues of the quaternion matrix K^q, K^q is centralized as,
K̄^q = K^q - 1_S K^q - K^q 1_S + 1_S K^q 1_S, (22)
where 1_S ∈ R^{S×S} with (1_S)_{s,t} = 1/S. The centralization property of K̄^q is proved as follows.
Proof. Let K^q = (k^q_{s,t})_{S×S} and K̄^q = (k̄^q_{s,t})_{S×S}. From (22) we have,
k̄^q_{s,t} = k^q_{s,t} - (1/S) Σ_{s=1}^{S} k^q_{s,t} - (1/S) Σ_{t=1}^{S} k^q_{s,t} + (1/S^2) Σ_{s=1}^{S} Σ_{t=1}^{S} k^q_{s,t}. (23)
Then, the sum of all elements in the matrix K̄^q is,
Σ_{s=1}^{S} Σ_{t=1}^{S} k̄^q_{s,t} = Σ_{s=1}^{S} Σ_{t=1}^{S} k^q_{s,t} - Σ_{s=1}^{S} Σ_{t=1}^{S} k^q_{s,t} - Σ_{s=1}^{S} Σ_{t=1}^{S} k^q_{s,t} + Σ_{s=1}^{S} Σ_{t=1}^{S} k^q_{s,t} = 0. (24) □
So, the matrix K̄^q is a centralized version of K^q. Then, the eigenvalue problem of 1DKQPCA is,
K̄^q w = λ w, (25)
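Since 1_S is a real matrix, the centralization (22) acts on each of the four components of K^q independently. A quick real-valued check (our own illustration, standing in for one component of K^q) confirms the zero-sum property (24):

```python
import numpy as np

rng = np.random.default_rng(0)
S = 6
K = rng.random((S, S))            # stands in for one component of K^q
one_S = np.full((S, S), 1.0 / S)  # the matrix 1_S with entries 1/S
K_bar = K - one_S @ K - K @ one_S + one_S @ K @ one_S   # centering (22)
# By (24), all elements of the centralized matrix sum to zero;
# in fact every row sum and column sum vanishes as well.
print(abs(K_bar.sum()))           # on the order of machine epsilon
```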
where λ and w are respectively the eigenvalue and the eigenvector. It is easy to obtain the eigenvalues and eigenvectors of a real matrix, but not so for a quaternion matrix due to the complexity of quaternion computation. In order to solve this eigenvalue problem, the method for self-conjugate quaternion matrices used in [9] and [44] is adopted, based on the following theorem given in [9, 44].
Theorem 1. Let
K^σ = [ K_1, K_2 ; -(K_2)^*, (K_1)^* ]
be the educing matrix¹ of K̄^q, and let λ_s be the eigenvalues of K^σ. For two different eigenvalues λ_s, λ_t, we can construct the following two matrices,
I - (K^σ - λ_s I)^+ (K^σ - λ_s I), I - (K^σ - λ_t I)^+ (K^σ - λ_t I), (26)
where I is the unit matrix and the symbol "+" represents the Moore-Penrose pseudo-inverse. If (η_1 η_2)^T and (β_1 β_2)^T are the first column vectors of the two matrices in (26), the eigenvectors of K̄^q corresponding to the eigenvalues λ_s and λ_t can be expressed as η_1 + i(η_2)^* and β_1 + i(β_2)^*, which are mutually orthogonal.
Suppose w_t, t = 1, 2, …, r, are the normalized solutions of (25) corresponding to the largest r eigenvalues. Then, the projection matrix is given by,
W = [w_1, w_2, …, w_r] ∈ Q^{S×r}. (27)
For a new sample x^q, one can simply project the mapped sample as,
y_{1DKQPCA} = W^T K̄^q_x = (y^q_t)_{r×1}, y^q_t = Σ_{s=1}^{S} w_{t,s} k̄(x^q, x^q_s), (28)
where w_{t,s} is the sth coefficient of the eigenvector w_t, and K̄^q_x = [k̄(x^q, x^q_1), k̄(x^q, x^q_2), …, k̄(x^q, x^q_S)]^T is the kernel vector centralized by (22), whose elements k̄(x^q, x^q_s), s = 1, 2, …, S, are the quaternion kernel coefficients between the new sample x^q and all S training samples x^q_s computed by (21).
3.2 2DKQPCA
For a 2D training quaternion data sample, one can transform it into a 1D vector to perform KQPCA. However, this makes the kernel matrix too large to calculate accurately [49, 51] and ignores the intrinsic spatial structure of the 2D data [50]. So,
¹ (·)^* is the conjugate operator of a complex matrix, and K̄^q = K_r + i K_i + j K_j + k K_k = (K_r + i K_i) + (K_j + i K_k) j := K_1 + K_2 j, where K_r, K_i, K_j and K_k are the four components of the quaternion matrix K̄^q.
using a method similar to 2DKPCA, 2DKQPCA considers the rows or columns of the 2D quaternion sample X^q_s of size M×N for the nonlinear mapping as,
ψ_row: Q^N → G, X^{q,row}_{s,m} → X̂^{q,row}_{s,m}, s = 1, 2, …, S, m = 1, 2, …, M, (29)
ψ_col: Q^M → G, X^{q,col}_{s,n} → X̂^{q,col}_{s,n}, s = 1, 2, …, S, n = 1, 2, …, N, (30)
where X^{q,row}_{s,m} and X^{q,col}_{s,n} are respectively the mth row vector and the nth column vector of the sth quaternion data sample X^q_s. The following procedures, i.e., kernel matrix calculation, eigenvalue calculation, and projection, are similar to those of 1DKQPCA, treating each row or column vector as a sample; they are therefore omitted here.
In fact, the dimension of the kernel matrix of 2DKQPCA is usually still too large. For a set of S quaternion data samples X^q_s of dimension M×N, the dimension of the kernel matrix is SM×SM for 2DKQPCA in the row direction and SN×SN in the column direction. Such large dimensions result in a heavy computational cost for the subsequent eigenvalue calculation. So, to further alleviate the computational cost, the method proposed for 2DKPCA by Zhang et al. [50] is adopted here for 2DKQPCA: the mean row vector (or mean column vector), instead of the M row vectors (or N column vectors) themselves, is used to represent a sample for the nonlinear mapping,
ψ_row: Q^N → G, X̄^{q,row}_s → X̂^{q,row}_s, s = 1, 2, …, S, (31)
ψ_col: Q^M → G, X̄^{q,col}_s → X̂^{q,col}_s, s = 1, 2, …, S, (32)
where X̄^{q,row}_s and X̄^{q,col}_s are respectively the mean row vector and the mean column vector of the sample X^q_s, obtained by,
X̄^{q,row}_s = (1/M) Σ_{m=1}^{M} X^{q,row}_{s,m}, X̄^{q,col}_s = (1/N) Σ_{n=1}^{N} X^{q,col}_{s,n}. (33)
This processing reduces the dimension of the kernel matrix to S×S, so the computational cost is reduced significantly. In fact, this method reduces 2DKQPCA to 1DKQPCA by representing each 2D image by its 1D mean vector.
3.3 BD2DKQPCA
2DKQPCA only considers the correlation among the row or column vectors of the data matrix and ignores the other. So, in order to make 2DKQPCA more efficient in the following RGB-D object recognition application, BD2DKQPCA is introduced
in this subsection. Similar to BD2DKPCA in [50], the basic idea of BD2DKQPCA is first to perform 2DKQPCA in the row direction and in the column direction respectively, and then to fuse the feature vectors for classification. Implementation details are given in Subsection 4.2, in the context of the RGB-D object recognition application.
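Both directional 2DKQPCA passes start from the mean vectors of (33). A minimal sketch of that step, under our own assumption that a quaternion image is stored as an (M, N, 4) array with the four quaternion parts on the last axis:

```python
import numpy as np

def mean_vectors(Xq):
    """Mean row and mean column vectors (33) of one quaternion image.
    Xq: (M, N, 4) array whose last axis holds (q_r, q_i, q_j, q_k)."""
    x_row = Xq.mean(axis=0)  # (N, 4): average of the M row vectors
    x_col = Xq.mean(axis=1)  # (M, 4): average of the N column vectors
    return x_row, x_col
```

Representing each of the S samples by these mean vectors is what reduces the row- and column-direction kernel matrices to size S×S.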
4. RGB-D Object Recognition Based on KQPCA
The KQPCA presented in the previous section can process general four-dimensional quaternion signals, including three-dimensional color image signals represented by the QR. However, as mentioned in the introduction, the conventional QR creates redundancy when using four-dimensional quaternions to represent color images of three components [18]. So, in this paper the RGB-D object recognition application is considered by improving the QR for RGB-D object images. This section provides an algorithm for RGB-D object recognition using the improved QR and BD2DKQPCA. The algorithms based on the other types of KQPCA are similar to the following BD2DKQPCA-based algorithm and are thus omitted.
4.1 Improved quaternion representation of RGB-D images
It is well known that the depth feature is invariant to lighting and color variations [20]. In order to resolve the redundancy problem, the existing QR is improved for RGB-D images by combining color and depth information.
Let g(u, v) be an RGB-D image function. Each of its pixels can be represented as a full quaternion number,
g(u, v) = g_D(u, v) + g_R(u, v) i + g_G(u, v) j + g_B(u, v) k, (34)
where g_D(u, v), g_R(u, v), g_G(u, v) and g_B(u, v) are respectively the depth, red, green and blue components of the pixel (u, v). The new QR of RGB-D images in (34) improves the existing QR of color images in (9) in at least two aspects: (a) it considers the color information together with the important depth information; (b) it avoids redundancy by encoding the depth information into the previously unused extra dimension (the real part in (34)).
4.2 RGB-D object recognition using the improved QR and BD2DKQPCA
The flow chart of the proposed RGB-D object recognition algorithm is given in Fig. 1. The steps of the training and testing procedures are described in the following.
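The representation step (34), shared by the training and testing procedures, amounts to stacking the depth map into the real part and the color channels into the imaginary parts. A minimal sketch, under our own assumption of an (M, N, 4) array layout:

```python
import numpy as np

def improved_qr(depth, rgb):
    """Improved quaternion representation (34) of an RGB-D image.
    depth: (M, N) array; rgb: (M, N, 3) array.
    Returns an (M, N, 4) array g with g = g_D + g_R i + g_G j + g_B k."""
    g = np.empty(depth.shape + (4,), dtype=float)
    g[..., 0] = depth   # real part: depth component g_D
    g[..., 1:] = rgb    # imaginary parts: g_R, g_G, g_B
    return g
```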
Fig. 1 Flow chart of the proposed RGB-D object recognition algorithm based on BD2DKQPCA: (a) training procedure; (b) testing procedure.
Training procedure:
1) Represent each RGB-D sample X_s, s = 1, 2, …, S, in the training set by the improved quaternion representation (34) as X^q_s.
2) Calculate the mean row vector and the mean column vector of each quaternion-based training sample X^q_s by (33) as X̄^{q,ξ}_s, s = 1, 2, …, S, ξ ∈ {row, col}.
3) Perform 1DKQPCA on the mean row vectors X̄^{q,row}_s and the mean column vectors X̄^{q,col}_s respectively, following the steps of Subsection 3.1 from (20) to (27): (20) and (21) for the kernel matrix computation, (22) for the kernel matrix centralization, (25) and Theorem 1 for the solution of the eigenvalues and eigenvectors, and (27) for the projection matrix computation. Finally, obtain the row projection matrix W_row and the column projection matrix W_col.
4) Project each quaternion-based training sample X^q_s by (28) in both directions with the projection matrices W_row and W_col to obtain the row and column projected features of the training set, y^{X,ξ}_s = {y^{X,ξ}_{s,h} | h = 1, 2, …, r_ξ}, s = 1, 2, …, S, ξ ∈ {row, col}.
Testing procedure:
1) Represent each RGB-D sample Z_t, t = 1, 2, …, T, in the testing set by the improved quaternion representation (34) as Z^q_t.
2) Calculate the mean row vector and the mean column vector of each quaternion-based test sample Z^q_t by (33) as Z̄^{q,ξ}_t, t = 1, 2, …, T, ξ ∈ {row, col}.
3) Compute the kernel function between each test sample and each training sample in both the row and column directions by (21) as k(Z̄^{q,ξ}_t, X̄^{q,ξ}_s), t = 1, 2, …, T, s = 1, 2, …, S, ξ ∈ {row, col}. In addition, centralize the obtained kernel matrix by (22).
4) Using the projection matrices W_row and W_col obtained in the training procedure, project each quaternion-based test sample Z^q_t in both directions by (28) to obtain the row and column projected features of the testing set, y^{Z,ξ}_t = {y^{Z,ξ}_{t,h} | h = 1, 2, …, r_ξ}, t = 1, 2, …, T, ξ ∈ {row, col}.
5) Perform classification using the nearest neighbor classifier with the fused quaternion Euclidean distances. The distances between the test samples and the training samples are based on their corresponding projected features y^{Z,ξ}_t and y^{X,ξ}_s. Note that other classifiers can also be considered, such as SVM, neural networks, and Bayes classifiers.
Here we mainly describe step 5), since the other steps have been introduced in Section 3 and Subsection 4.1. It is well known that the feature distance is the key issue in the nearest neighbor classifier. Take the recognition of the test sample Z_t as an example. To decide the class of Z_t, first, the row- and column-direction distances between the test sample Z_t and each training sample X_s, s = 1, 2, …, S, based on their corresponding projected features y^{Z,ξ}_t and y^{X,ξ}_s, are defined as,
d^ξ_{s,t} = || y^{Z,ξ}_t - y^{X,ξ}_s ||_2 = sqrt( Σ_{h=1}^{r_ξ} | y^{Z,ξ}_{t,h} - y^{X,ξ}_{s,h} |^2 ), ξ ∈ {row, col}, (35)
where ||·||_2 is the quaternion vector Euclidean distance given in [9], and |·| is the quaternion modulus defined in (6).
Before fusion, the two directional distances are normalized by dividing by their corresponding maximum values, as in [52, 53],
d̄^ξ_{s,t} = d^ξ_{s,t} / max_{1≤s≤S}(d^ξ_{s,t}), ξ ∈ {row, col}, (36)
where the denominator max_{1≤s≤S}(d^ξ_{s,t}) is the maximum of the S directional distances d^ξ_{s,t}, s = 1, 2, …, S, between the test sample Z^q_t and all training samples X^q_s.
Then, the final distance between the two samples X_s and Z_t is obtained by fusing the two normalized directional distances with a weight α ∈ [0, 1] as,
d_{s,t} = α d̄^{row}_{s,t} + (1 - α) d̄^{col}_{s,t}. (37)
Finally, the nearest neighbor classifier is used for classification based on the fused distance: if s* = argmin_{1≤s≤S}(d_{s,t}), the test sample Z_t is assigned to the class of the training sample X_{s*}.
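The decision rule of (35)-(37) can be sketched as follows. This is our own illustration; the projected features are assumed to be stored as quaternion arrays of shape (r, 4), and the function name and layout are not from the paper.

```python
import numpy as np

def classify(yZ_row, yZ_col, yX_row, yX_col, labels, alpha=0.1):
    """Nearest-neighbor decision with the fused distance (35)-(37).
    yZ_row, yZ_col: (r, 4) projected features of one test sample.
    yX_row, yX_col: (S, r, 4) projected features of the S training samples.
    labels: (S,) class labels of the training samples."""
    def qdist(A, b):
        # Eq. (35): quaternion Euclidean distance per training sample,
        # using the quaternion modulus on the last axis.
        return np.sqrt((np.linalg.norm(A - b, axis=-1) ** 2).sum(axis=-1))
    d_row = qdist(yX_row, yZ_row)            # (S,) row-direction distances
    d_col = qdist(yX_col, yZ_col)            # (S,) column-direction distances
    d_row = d_row / d_row.max()              # eq. (36): max normalization
    d_col = d_col / d_col.max()
    d = alpha * d_row + (1 - alpha) * d_col  # eq. (37): weighted fusion
    return labels[np.argmin(d)]
```

A test sample identical to one training sample yields zero distance in both directions and is therefore assigned that sample's class.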
5. Experimental results and analysis
This section provides experimental results on RGB-D object recognition to illustrate the performance of the proposed KQPCA. The proposed KQPCA-based algorithms consider four types of KQPCA: 1DKQPCA, 2DKQPCA_Row, 2DKQPCA_Col, and BD2DKQPCA. 1DKQPCA converts 2D RGB-D images into 1D quaternion vectors; 2DKQPCA_Row performs 2DKQPCA in the row direction, and 2DKQPCA_Col in the column direction. Moreover, our algorithms were also compared with two existing PCA-based algorithms: the quaternion-based algorithm using BD2DQPCA [9] and the conventional algorithm using BD2DKPCA [50]. BD2DQPCA was originally proposed for RGB face recognition in [9]; here it was also applied to RGB-D object recognition by combining it with the proposed improved QR (34). BD2DKPCA was applied to gray-scale face recognition in [50]; here it was used for RGB object recognition and RGB-D object recognition by treating each component of the RGB or RGB-D object image independently.
5.1 Optimal parameters for KQPCA-based algorithm using IIIT-D RGB-D face dataset
Here the IIIT-D RGB-D dataset [30] was used for the experiments. It contains 4065 RGB-D face images of 106 individuals captured in two sessions using a Kinect sensor. The images are under normal illumination with variations in pose, expression, and eyeglasses. The number of images per individual varies from 11 to 254. The dataset provides five folds. As in [30], each test in this subsection was carried out five times using the five folds separately; each time, 4 images per individual were used for training and the remaining images for testing. All RGB images and depth images were resized to 100×100, also following [30], and depth images were converted to gray-scale. Some samples of this dataset are given in Fig. 2.
Fig. 2 Some sample images of two individuals from the IIIT-D RGB-D dataset. The first and third rows are RGB images, while the second and fourth rows are their corresponding depth images.
There are a few parameters in the proposed KQPCA-based algorithms: the real numbers b and c of the kernel function in (21); the weight α of the feature fusion in (37); and the dimensions of the projected features in the row direction, r_row, and in the column direction, r_col, for BD2DKQPCA (only one of them for 1DKQPCA and 2DKQPCA). In this paper the parameter c was set to 1, as is usual for the conventional KPCA. The other parameters were decided by the following experiments.
We first considered two parameters: the power b and the weight α. In order to find the optimal values of these two parameters, r_row and r_col were set to their maximum values: r_row and r_col were respectively equal to the width and height of the
image. Then, the power b was varied from 0.05 to 1.5 in steps of 0.05, and the weight α from 0 to 1.0 in steps of 0.1. The average recognition rates over the five folds of the IIIT-D RGB-D dataset are given in Fig. 3. It can be observed from this figure that: (a) for the power b, the recognition rates first increase, reach a maximum, and then decrease for all KQPCA-based algorithms except 1DKQPCA. The maximum average rates are respectively 82.47 for 1DKQPCA with b = 0.8, 74.86 for 2DKQPCA_Row with b = 0.7, 87.58 for 2DKQPCA_Col with b = 0.1, and 89.03 for BD2DKQPCA with b = 0.2 and α = 0.1. These optimal parameters were used in the following tests; (b) the 2DKQPCA_Col-based algorithm outperforms the
Fig. 3 Average recognition rates for different KQPCA-based algorithms under different parameters b and α: (a) 1DKQPCA and 2DKQPCA; (b) BD2DKQPCA
Fig. 4 Two original RGB face images and their corresponding mean column and mean row images. (a), (b), and (c) are respectively the original image, the mean column image, and the mean row image for the first image in Fig. 2; (d), (e), and (f) are those for the first image of the third row in Fig. 2. Let the size of the original image be M×N; the mean column image is produced by copying the M×1 mean column vector, computed by (33), N times. The mean row image is produced similarly.
2DKQPCA_Row-based algorithm. The reason is that the mean column vector has a stronger ability to represent face images than the mean row one. This is also why the optimal weight α has a small value of 0.1, and why the influence of the weight α is greater than that of the power b in Fig. 3(b) for the BD2DKQPCA-based algorithm. To make this clear, Fig. 4 shows two original face images and their corresponding mean column and mean row images. It can be seen that the mean column vector represents the features of some important face parts for the recognition task (eyes, mouth, eyebrows, forehead, etc.) better than the mean row vector: these important parts can be distinguished in the mean column images, Fig. 4(b) and (e), but not in the mean row images, Fig. 4(c) and (f).
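The mean column and mean row images of Fig. 4 can be reproduced in a few lines of NumPy. This is a sketch under the assumption, as the caption of Fig. 4 states, that the M×1 mean column vector (an average over the N columns) is replicated N times, and analogously for rows; the helper names are ours.

```python
import numpy as np

def mean_column_image(img):
    """Replicate the M x 1 mean column vector N times to form an M x N image,
    as described for Fig. 4 (the mean column vector averages each row of
    pixels over the N columns)."""
    m, n = img.shape
    col = img.mean(axis=1, keepdims=True)  # M x 1 mean column vector
    return np.tile(col, (1, n))

def mean_row_image(img):
    """Replicate the 1 x N mean row vector M times, analogously."""
    m, n = img.shape
    row = img.mean(axis=0, keepdims=True)  # 1 x N mean row vector
    return np.tile(row, (m, 1))

img = np.arange(12, dtype=float).reshape(3, 4)
ci = mean_column_image(img)  # every column is identical
ri = mean_row_image(img)     # every row is identical
```

Applied to a face image, `mean_column_image` preserves the vertical profile (eyes, mouth, forehead bands), which is the visual effect discussed above.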
Then, using the optimal parameters b and α obtained in the previous test, we considered the remaining two parameters: the dimensions of the projected features in the row direction rrow and in the column direction rcol. Because one can project the feature vector to any dimension from 1 to the image width 100 in the row direction and from 1 to the image height 100 in the column direction for the 2DKQPCA-based algorithms, as well as from 1 to the number of pixels 100 × 100 for the 1DKQPCA-based algorithm, comparing all cases takes a long time and is unnecessary. So, two stages were considered. In the first stage, rrow and rcol were tested in the range [5, 100] with interval 5, while in the second stage, they were evaluated in the range [r′−5, r′+5] with the minimum interval 1, where r′ was the optimal rξ, ξ ∈ {row, col}, obtained in the first stage. Notice that, for comparison, the maximum value of r considered for 1DKQPCA was 100 instead of 100 × 100; moreover, the following experimental results show that the rate reaches its maximum within the range [5, 100]. The results of the two stages are given in Fig. 5. It can be seen from this figure that: (a) the recognition rates first increase, reach a maximum, and then remain stable for all KQPCA-based algorithms. The reason is that there exists high redundancy in the feature vector, which is also why PCA is introduced to reduce this redundancy; (b) among the 2DKQPCA-based algorithms, the 2DKQPCA_Col-based algorithm is superior to the 2DKQPCA_Row-based algorithm. This is due to the stronger ability of the mean column vector to represent face images compared with the mean row one, as also shown in the previous test (Fig. 4). This is also the reason why, for BD2DKQPCA, increasing rcol raises the rates more than increasing rrow does; (c) BD2DKQPCA, whose maximum average recognition rate is 89.09 with the optimal parameters rrow = 25 and rcol = 20, has the best performance among the four types of KQPCA. This is because BD2DKQPCA considers the correlation not only among the mean row vectors of the face samples but also among the mean column vectors. The optimal parameters rrow and rcol of the four types of KQPCA are shown in Table I, and their corresponding average recognition rates are provided in Table III.
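The coarse-to-fine procedure above can be sketched as follows; `rate` is a hypothetical callback standing in for training and evaluating a KQPCA model at a given projection dimension, and the grids match those stated in the text.

```python
def two_stage_search(rate, r_max=100):
    """Coarse-to-fine search for a projection dimension:
    stage 1 scans [5, r_max] in steps of 5; stage 2 refines around the
    coarse optimum r' in [r'-5, r'+5] with step 1.  `rate(r)` is a
    hypothetical callback returning the average recognition rate at
    dimension r."""
    coarse = range(5, r_max + 1, 5)
    r1 = max(coarse, key=rate)                       # stage 1 optimum r'
    fine = range(max(1, r1 - 5), min(r_max, r1 + 5) + 1)
    return max(fine, key=rate)                       # stage 2 optimum

# Toy stand-in whose true optimum (r = 23) lies between two coarse grid points
toy = lambda r: -abs(r - 23)
r_opt = two_stage_search(toy)
```

The two-stage scheme evaluates at most 20 + 11 settings per dimension instead of 100, which is why it was preferred to an exhaustive sweep.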
Fig. 5 Average recognition rates for different KQPCA-based algorithms under different parameters rrow and rcol: (a) first stage of 1DKQPCA and 2DKQPCA; (b) second stage of 1DKQPCA and 2DKQPCA (r = r′ + δ); (c) first stage of BD2DKQPCA; (d) second stage of BD2DKQPCA
Table I Optimal parameters of different PCA-based algorithms for RGB-D face recognition

Parameters   BD2DQPCA [9]   BD2DKPCA [50]   1DKQPCA   2DKQPCA_Row   2DKQPCA_Col   BD2DKQPCA
α            —              —               —         —             —             0.1
b            —              0.6             0.8       0.7           0.1           0.2
rrow         20             18              47        40            —             25
rcol         37             40              —         —             10            20
Table II Optimal parameters of different PCA-based algorithms for RGB face recognition

Parameters   BD2DQPCA [9]   BD2DKPCA [50]   1DKQPCA   2DKQPCA_Row   2DKQPCA_Col   BD2DKQPCA
α            —              —               —         —             —             0.1
b            —              0.7             0.8       1.3           0.3           0.3
rrow         35             55              78        35            —             34
rcol         46             43              —         —             12            10
Table III. Average recognition rate (%) of different algorithms using the optimal parameters on the IIIT-D RGB-D face dataset

Datasets   RISE [30]   RISE+ADM [30]   BD2DQPCA [9]   BD2DKPCA [50]   1DKQPCA   2DKQPCA_Row   2DKQPCA_Col   BD2DKQPCA
RGB        —           —               78.90          81.35           84.94     68.43         86.37         87.65
RGB-D      82.78       86.16           82.23          83.91           85.29     74.86         87.77         89.09
In fact, we also obtained the optimal parameters for the other compared algorithms and for RGB face recognition without using the depth information. For RGB face recognition, the algorithm using the proposed KQPCA is similar to the proposed RGB-D object recognition algorithm given in Section 4. The only difference is that the input is the pure quaternion data of (9), with the real part equal to 0, for RGB face images, while it is the quaternion data of (34) for RGB-D face images. The optimal parameters are respectively shown in Table I for RGB-D face recognition and in Table II for RGB face recognition. The recognition results under these optimal parameters are given in Table III. Table III also provides the results of two algorithms proposed by the creators of the IIIT-D dataset in [30], using the RGB-D image descriptor based on saliency and entropy (RISE) and using RISE together with attributes based on the depth map (RISE+ADM). Table III shows that: (1) for all compared algorithms, considering both RGB and depth information performs better than considering only RGB information, which again demonstrates the importance of depth information in object recognition; (2) all types of the proposed KQPCA except 2DKQPCA_Row perform well in RGB and RGB-D face recognition. Among them, the proposed BD2DKQPCA-based algorithm outperforms the other compared algorithms. This is attributable to the quaternion-based RGB-D object processing, the nonlinear processing using the kernel technique, and the consideration of both the row and column directions.
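The difference between the two inputs can be illustrated with a small sketch: following (9), an RGB image maps to a pure quaternion array with zero real part, while for RGB-D data the depth map occupies the otherwise-unused real part, in the spirit of the improved QR in (34). The 4-channel array layout, the helper name, and the assignment of R, G, B to the i, j, k parts are our illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def to_quaternion_image(rgb, depth=None):
    """Stack an H x W x 3 RGB image (and optional depth map) into an
    H x W x 4 quaternion-component array [real, i, j, k].

    With no depth map the real part is zero (a pure quaternion, as in (9));
    with a depth map, depth fills the real part (improved QR, as in (34)).
    """
    h, w, _ = rgb.shape
    real = np.zeros((h, w)) if depth is None else depth.astype(float)
    return np.concatenate([real[..., None], rgb.astype(float)], axis=2)

rgb = np.ones((2, 2, 3))
q_rgb = to_quaternion_image(rgb)                         # real part all zero
q_rgbd = to_quaternion_image(rgb, np.full((2, 2), 5.0))  # real part = depth
```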
5.2 Performance comparison using the other three datasets

In this subsection, the other three datasets were considered to evaluate the proposed KQPCA-based algorithms using the optimal parameters obtained in the previous subsection. One was also a face dataset (the color FERET dataset [54]), while the other two were datasets of general RGB-D objects (the CIN 2D/3D dataset [21] and the Washington RGB-D dataset [22]). Some examples from these three datasets are provided in Fig. 6. The color FERET face dataset [54] contains a total of 11338 facial images of 994 individuals at various angles. The two subsets fa/fb, captured in frontal views, were considered in this test. As in [9], the first 200 samples of 200 individuals in the fa subset were used for training, while the corresponding 200 samples in the fb subset were used for testing. The 200 samples cover different races, sexes, ages, and facial expressions. Notice that the original data of these samples, with some background, were used without cropping by a face location procedure. Because the FERET dataset provides only RGB color face images, the recently proposed algorithm using deep convolutional neural fields [55] was used to obtain the depth images of these face images. The CIN 2D/3D dataset [21] consists of 163 object instances organized into 18 categories. For each object instance, RGB-D images corresponding to 36 views are recorded, with a 10 degree angle between two neighboring views. The first two object instances of each category were considered in this test. Notice that the samples with very small sizes (5×5) were not considered, and the provided depth images were converted into gray-scale ones. For category recognition, we randomly chose 14 views of each instance for training and the remaining 22 views for testing. As for the IIIT-D dataset, all of the images were resized to 100×100. The Washington RGB-D dataset [22] consists of 300 object instances organized into 51 categories. The number of images per instance varies from 110 to 166. This dataset has 41,877 RGB-D images in total, captured from three different viewpoint angles (30, 45, and 60 degrees above the horizon). Our experiments focused on category recognition. The first instance of each category was considered. We randomly selected 20% of the images of each instance for training and the remaining images for testing. All of the images were also resized to 100×100.
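The per-instance random split used for the Washington RGB-D dataset can be sketched as follows (a hypothetical helper; `views` is any list of image identifiers for one instance, and the seed is ours for reproducibility).

```python
import random

def split_instance(views, train_frac=0.2, seed=0):
    """Randomly pick a training subset of one instance's views and keep the
    rest for testing, as done for the Washington RGB-D dataset (20% train)."""
    rng = random.Random(seed)
    views = list(views)
    rng.shuffle(views)
    k = max(1, round(train_frac * len(views)))
    return views[:k], views[k:]

# An instance with 110 views yields a 22 / 88 train/test split
train, test = split_instance(range(110), train_frac=0.2)
```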
Fig. 6 Some examples from the three datasets. First row: CIN 2D/3D dataset (depth images are the gray-scale version of those provided in [21]); second row: color FERET face dataset (depth images are pseudo-color images); third row: Washington RGB-D dataset.
Table IV. Average recognition rate (%) of different algorithms using the optimal parameters on the FERET, CIN 2D/3D and Washington RGB-D datasets

                                    FERET            CIN 2D/3D        Washington RGB-D
Algorithms                          RGB     RGB-D    RGB     RGB-D    RGB     RGB-D
Gaussian kernel SVM [22]            —       —        —       —        74.5    83.8
CKM Descriptor [23]                 —       —        —       —        —       86.4
Upgraded HMP [24]                   —       —        86.3    91.0     82.4    87.5
KDES [25]                           —       —        —       —        —       86.5
FusionNet (jet) [26]                —       —        —       —        84.1    91.3
FusionNet (surface normals) [27]    —       —        —       —        84.7    94.0
CNN-TRANSFER+DAE [19]               —       —        87.6    91.3     —       —
CNN-SPM-RNN [20]                    —       —        88.5    92.9     85.2    90.7
Ev2D3D [21]                         —       —        66.6    82.8     —       —
Multi-Modal CNN [28]                —       —        —       88.4     —       86.9
Subset-SAE-RNNs [29]                —       —        88.0    92.8     82.8    88.5
BD2DQPCA [9]                        86.50   87.50    59.72   66.29    77.13   74.25
BD2DKPCA [50]                       90.00   91.50    89.02   89.90    90.81   93.06
1DKQPCA                             86.00   87.50    89.27   90.53    91.75   92.32
2DKQPCA_Row                         80.50   84.50    71.59   79.17    84.92   88.94
2DKQPCA_Col                         91.00   92.50    89.02   91.29    89.85   92.05
BD2DKQPCA                           92.50   93.50    90.28   92.80    91.57   94.04
The results on these three datasets for the different compared algorithms are shown in Table IV. Table IV also provides the results of eleven other existing algorithms [19-29] for RGB-D object recognition. The results of these eleven state-of-the-art algorithms are taken from the corresponding literature. However, the authors of these algorithms reported results on only one or two of the two RGB-D object datasets (the CIN 2D/3D dataset and the Washington RGB-D dataset). So, in Table IV we present the results of these algorithms for their corresponding datasets only, and use the symbol "—" for the other datasets. It can be observed from this table that, on the three new datasets, our proposed BD2DKQPCA-based algorithm again achieves the best performance among the seventeen compared algorithms, including the PCA-based algorithms considered in the previous subsection. Here we again analyze why 2DKQPCA_Col still outperforms 2DKQPCA_Row on the three datasets. For the color FERET face dataset, the reason is the same as for the IIIT-D face dataset described in subsection 5.1. For the CIN 2D/3D dataset and the Washington RGB-D dataset, the reason is that most of the objects in these two datasets (see Fig. 6) are placed vertically, and thus the mean column feature vector contains more object features than the row one.

6. Conclusions
In this paper, three types of KQPCA were proposed to improve the existing QPCA so as to deal effectively with the possibly higher-order statistics of a quaternion signal. Moreover, the QR was also improved to resolve the redundancy problem and to represent RGB-D images effectively. Then, an RGB-D object recognition algorithm was proposed using the KQPCA and the improved QR. The proposed BD2DKQPCA-based algorithm outperforms other existing algorithms for the following reasons: (a) the BD2DKQPCA-based algorithm uses the improved QR, which incorporates the important additional depth information; (b) BD2DKQPCA utilizes the kernel technique to process nonlinear quaternion signals, and the RGB-D object recognition problem is highly nonlinear in real situations [37, 38]; (c) BD2DKQPCA considers the correlation not only among the mean row vectors of the samples but also among the mean column vectors. However, the 2DKQPCA in this paper only considers the mean row/column vector to represent each sample, owing to the large dimension of the quaternion kernel matrix. As future work, we will seek an efficient approach that makes use of all row/column vectors for even better performance.
Acknowledgement

We thank Prof. Shuisheng Zhou of Xidian University, China, for providing the code for KPCA. This work was supported by the NSFC under Grants 61572258, 61232016, 61572257, 61672294, and 61602253, the Natural Science Foundation of Jiangsu Province of China under Grants BK20151530 and BK20150925, the BK21+ program of the Ministry of Education of Korea, the G-ITRC support program (IITP-2016-R6812-16-0001) supervised by the IITP, and the PAPD fund.
REFERENCES
[1] O.N. Subakan and B.C. Vemuri, "A quaternion framework for color image smoothing and segmentation," Int. J. Comput. Vis., vol. 91, no. 3, pp. 233–250, 2011.
[2] T.A. Ell and S.J. Sangwine, "Hypercomplex Fourier transforms of color images," IEEE Trans. Image Process., vol. 16, no. 1, pp. 22–35, 2007.
[3] S.J. Sangwine, "Fourier transforms of colour images using quaternion or hypercomplex numbers," Electron. Lett., vol. 32, no. 1, pp. 1979–1980, 1996.
[4] W.L. Chan, H. Choi, and G. Baraniuk, "Directional hypercomplex wavelets for multidimensional signal analysis and processing," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP 2004), pp. 996–999, 2004.
[5] S. Gai, "New banknote defect detection algorithm using quaternion wavelet transform," Neurocomputing, vol. 196, pp. 133–139, 2016.
[6] T. Nitta, "A quaternary version of the back-propagation algorithm," in Proc. 1995 IEEE Int. Conf. Neural Networks (ICNN'95), Perth, Australia, vol. 5, pp. 2753–2756, 1995.
[7] L.S. Saoud, R. Ghorbani, and F. Rahmoune, "Cognitive quaternion valued neural network and some applications," Neurocomputing, 2016, doi: 10.1016/j.neucom.2016.09.060.
[8] N.L. Bihan and S.J. Sangwine, "Quaternion principal component analysis of color images," in Proc. 2003 10th IEEE Int. Conf. Image Processing (ICIP 2003), vol. 1, pp. 809–812, 2003.
[9] Y.F. Sun, S.Y. Chen, and B.C. Yin, "Color face recognition based on quaternion matrix representation," Pattern Recognit. Lett., vol. 32, no. 4, pp. 597–605, 2011.
[10] S.C. Pei and C.M. Cheng, "Quaternion matrix singular value decomposition and its applications for color image processing," in Proc. 2003 Int. Conf. Image Processing (ICIP 2003), vol. 1, pp. 805–808, 2003.
[11] N.L. Bihan and S. Buchholz, "Quaternionic independent component analysis using hypercomplex nonlinearities," in Proc. IMA 7th Conf. Mathematics in Signal Processing, pp. 1–4, 2006.
[12] Y.N. Li, "Quaternion polar harmonic transforms for color images," IEEE Signal Process. Lett., vol. 20, no. 8, pp. 803–806, 2013.
[13] L.Q. Guo and M. Zhu, "Quaternion Fourier–Mellin moments for color image," Pattern Recognit., vol. 44, no. 2, pp. 187–195, 2011.
[14] B.J. Chen, H.Z. Shu, H. Zhang, G. Chen, C. Toumoulin, J.L. Dillenseger, and L.M. Luo, "Quaternion Zernike moments and their invariants for color image analysis and object recognition," Signal Process., vol. 92, no. 2, pp. 308–318, 2012.
[15] B.J. Chen, H.Z. Shu, G. Coatrieux, G. Chen, X.M. Sun, and J.L. Coatrieux, "Color image analysis by quaternion-type moments," J. Math. Imag. Vis., vol. 51, no. 1, pp. 124–144, 2015.
[16] X.Y. Wang, W.Y. Li, H.Y. Yang, P.P. Niu, and Y.W. Li, "Invariant quaternion radial harmonic Fourier moments for color image retrieval," Opt. Laser Technol., vol. 66, pp. 78–88, 2015.
[17] Y.M. Fang, W.S. Lin, B.S. Lee, C.T. Lau, Z.Z. Chen, and C.W. Lin, "Bottom-up saliency detection model based on human visual sensitivity and amplitude spectrum," IEEE Trans. Multimedia, vol. 14, no. 1, pp. 187–198, 2012.
[18] D. Assefa, L. Mansinha, K.F. Tiampo, H. Rasmussen, and K. Abdella, "The trinion Fourier transform of color images," Signal Process., vol. 91, no. 8, pp. 1887–1900, 2011.
[19] J.H. Tang, L. Jin, Z.C. Li, and S.H. Gao, "RGB-D object recognition via incorporating latent data structure and prior knowledge," IEEE Trans. Multimedia, vol. 17, no. 11, pp. 1899–1908, 2015.
[20] Y. Cheng, X. Zhao, K. Huang, and T.N. Tan, "Semi-supervised learning and feature evaluation for RGB-D object recognition," Comput. Vis. Image Und., vol. 139, pp. 149–160, 2015.
[21] B. Browatzki, J. Fischer, B. Graf, H.H. Bulthoff, and C. Wallraven, "Going into depth: Evaluating 2D and 3D cues for object classification on a new, large-scale object dataset," in Proc. 2011 IEEE Int. Conf. Computer Vision Workshops (ICCV 2011), Barcelona, Spain, pp. 1189–1195, 2011.
[22] K. Lai, L.F. Bo, X.F. Ren, and D. Fox, "A large-scale hierarchical multi-view RGB-D object dataset," in Proc. 2011 IEEE Int. Conf. Robotics and Automation (ICRA), pp. 1817–1824, 2011.
[23] M. Blum, J.T. Springenberg, J. Wülfing, and M. Riedmiller, "A learned feature descriptor for object recognition in RGB-D data," in Proc. 2012 IEEE Int. Conf. Robotics and Automation (ICRA), pp. 1298–1303, 2012.
[24] L.F. Bo, X.F. Ren, and D. Fox, "Unsupervised feature learning for RGB-D based object recognition," in Proc. 13th Int. Symp. Experimental Robotics, pp. 387–402, 2013.
[25] K. Lai, L.F. Bo, X.F. Ren, and D. Fox, "RGB-D object recognition: Features, algorithms, and a large scale benchmark," in Consumer Depth Cameras for Computer Vision, pp. 167–192, 2013.
[26] A. Eitel, J.T. Springenberg, L. Spinello, M. Riedmiller, and W. Burgard, "Multimodal deep learning for robust RGB-D object recognition," in Proc. 2015 IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IROS), pp. 681–687, 2015.
[27] L. Madai-Tahy, S. Otte, R. Hanten, and A. Zell, "Revisiting deep convolutional neural networks for RGB-D based object recognition," in Proc. 2016 Int. Conf. Artificial Neural Networks, pp. 29–37, 2016.
[28] A.R. Wang, J.W. Lu, J.F. Cai, T.J. Cham, and G. Wang, "Large-margin multi-modal deep learning for RGB-D object recognition," IEEE Trans. Multimedia, vol. 17, no. 11, pp. 1887–1898, 2015.
[29] J. Bai, Y. Wu, J.M. Zhang, and F.Q. Chen, "Subset based deep learning for RGB-D object recognition," Neurocomputing, vol. 165, pp. 280–292, 2015.
[30] G. Goswami, M. Vatsa, and R. Singh, "RGB-D face recognition with texture and attribute features," IEEE Trans. Inf. Foren. Sec., vol. 9, no. 10, pp. 1629–1640, 2014.
[31] X. Lv, X.D. Liu, X.Y. Li, X. Li, S.Q. Jiang, and Z.Q. He, "Modality-specific and hierarchical feature learning for RGB-D hand-held object recognition," Multimed. Tools Appl., vol. 76, no. 3, pp. 4273–4290, 2016.
[32] S. Gupta, R. Girshick, P. Arbeláez, and J. Malik, "Learning rich features from RGB-D images for object detection and segmentation," in Proc. Euro. Conf. Computer Vision, pp. 345–360, 2014.
[33] H. Xue, Y. Liu, D. Cai, and X. He, "Tracking people in RGBD videos using deep learning and motion clues," Neurocomputing, vol. 204, pp. 70–76, 2016.
[34] H. Zhang and L.E. Parker, "CoDe4D: color-depth local spatio-temporal features for human activity recognition from RGB-D videos," IEEE Trans. Circ. Syst. Vid. Tech., vol. 26, no. 3, pp. 541–555, 2016.
[35] I. Jolliffe, Principal Component Analysis. John Wiley & Sons, Ltd, 2002.
[36] Z. Zhang, M.B. Zhao, B. Li, P. Tang, and F.Z. Li, "Simple yet effective color principal and discriminant feature extraction for representing and recognizing color images," Neurocomputing, vol. 149, pp. 1058–1073, 2015.
[37] A. Eftekhari, M. Forouzanfar, H.A. Moghaddam, and J. Alirezaie, "Block-wise 2D kernel PCA/LDA for face recognition," Inf. Process. Lett., vol. 110, no. 17, pp. 761–766, 2010.
[38] N. Sun, H.X. Wang, Z.H. Ji, C.R. Zou, and L. Zhao, "An efficient algorithm for kernel two-dimensional principal component analysis," Neural Comput. Appl., vol. 17, no. 1, pp. 59–64, 2008.
[39] B.J. Chen, G. Coatrieux, J.S. Wu, Z.F. Dong, J.L. Coatrieux, and H.Z. Shu, "Fast computation of sliding discrete Tchebichef moments and its application in duplicated regions detection," IEEE Trans. Signal Process., vol. 63, no. 20, pp. 5424–5436, 2015.
[40] B. Schölkopf, A. Smola, and K.R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Comput., vol. 10, no. 5, pp. 1299–1319, 1998.
[41] A. Chakrabarti, A.N. Rajagopalan, and R. Chellappa, "Super-resolution of face images using kernel PCA-based prior," IEEE Trans. Multimedia, vol. 9, no. 4, pp. 888–892, 2007.
[42] R. Zeng, J.S. Wu, Z.H. Shao, L. Senhadji, and H.Z. Shu, "Quaternion softmax classifier," Electron. Lett., vol. 50, no. 25, pp. 1929–1930, 2014.
[43] L. Shi and B. Funt, "Quaternion color texture segmentation," Comput. Vis. Image Und., vol. 107, no. 1, pp. 88–96, 2007.
[44] F.N. Lang, J.L. Zhou, S. Cang, H. Yu, and Z. Shang, "A self-adaptive image normalization and quaternion PCA based color image watermarking algorithm," Expert Syst. Appl., vol. 39, no. 15, pp. 12046–12060, 2012.
[45] X. Xu, Z. Guo, C. Song, and Y. Li, "Multispectral palmprint recognition using a quaternion matrix," Sensors, vol. 12, no. 4, pp. 4633–4647, 2012.
[46] W.R. Hamilton, Elements of Quaternions. Longmans, Green, and Company, 1899.
[47] T.A. Ell, N. Le Bihan, and S.J. Sangwine, Quaternion Fourier Transforms for Signal and Image Processing. John Wiley & Sons, 2014.
[48] D.B. Sweetser, Doing Physics with Quaternions. Open access at http://www.theworld.com/~sweetser/quanternions/ps/book.pdf.
[49] V.D.M. Nhat and S.Y. Lee, "Kernel-based 2DPCA for face recognition," in Proc. 2007 IEEE Int. Symp. Signal Processing and Information Technology, pp. 35–39, 2007.
[50] D.Q. Zhang, S.C. Chen, and Z.H. Zhou, "Recognizing face or object from a single image: linear vs. kernel methods on 2D patterns," in Structural, Syntactic, and Statistical Pattern Recognition, Springer Berlin Heidelberg, pp. 889–897, 2006.
[51] J. Yang, D.Q. Zhang, A.F. Frangi, and J.Y. Yang, "Two-dimensional PCA: a new approach to appearance-based face representation and recognition," IEEE Trans. Pattern Anal. Machine Intell., vol. 26, no. 1, pp. 131–137, 2004.
[52] N.K. Logothetis, J. Pauls, M. Augath, T. Trinath, and A. Oeltermann, "Neurophysiological investigation of the basis of the fMRI signal," Nature, vol. 412, no. 6843, pp. 150–157, 2001.
[53] E. Attalla and P. Siy, "Robust shape similarity retrieval based on contour segmentation polygonal multiresolution and elastic matching," Pattern Recogn., vol. 38, no. 12, pp. 2229–2241, 2005.
[54] P.J. Phillips, H. Moon, P.J. Rauss, and S. Rizvi, "The FERET evaluation methodology for face recognition algorithms," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, no. 10, pp. 1090–1104, 2000.
[55] F.Y. Liu, C.H. Shen, G.S. Lin, and I. Reid, "Learning depth from single monocular images using deep convolutional neural fields," IEEE Trans. Pattern Anal. Machine Intell., vol. 38, no. 10, pp. 2024–2039, 2016.
Biographies of Authors
Beijing Chen
Beijing Chen received the Ph.D. degree in Computer Science in 2011 from Southeast University, Nanjing, China. Now he
is an associate professor in the School of Computer & Software, Nanjing University of Information Science & Technology,
China. His research interests include color image processing, pattern recognition and information security.
Jianhao Yang received the B.S. degree in Computer Science in 2015 from Nanjing University of Information Science & Technology, Nanjing, China. He is currently pursuing the M.S. degree in the School of Computer & Software, Nanjing
University of Information Science & Technology, Nanjing, China. His research interest includes color image processing.
Byeungwoo Jeon received the B.S. and M.S. degrees, both from the Department of Electronics Engineering, Seoul National University, Seoul, Korea, in 1985 and 1987, respectively, and the Ph.D. degree from the School of Electrical Engineering,
Purdue University, West Lafayette, IN, USA, in 1992. He was with the Signal Processing Laboratory, Samsung Electronics, Suwon, Korea, from 1993 to 1997, where he was involved in the research and development of video compression algorithms, the design of digital broadcasting satellite receivers, and other MPEG-related research for multimedia applications. He served as the Project Manager of Digital TV and Broadcasting with the Korean Ministry of Information and Communications from 2004 to
2006, where he supervised all digital TV-related research and development in Korea. Since 1997, he has been a Faculty Member with the School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, where he is currently a Professor. He has authored many papers in video compression, pre/postprocessing, and pattern recognition. He holds more than 50 issued
patents (Korea and worldwide) in these areas. His research interests include multimedia signal processing, video compression,
statistical pattern recognition, and remote sensing.
Xinpeng Zhang received the B.S. degree in computational mathematics from Jilin University, China, in 1995, and the M.E. and Ph.D. degrees in communication and information system from Shanghai University, China, in 2001 and 2004, respectively. Since
2004, he has been with the faculty of the School of Communication and Information Engineering, Shanghai University, where he
is currently a Professor. He was with the State University of New York at Binghamton as a visiting scholar from January 2010 to January 2011, and Konstanz University as an experienced researcher sponsored by the Alexander von Humboldt Foundation from March 2011 to May 2012. He is an Associate Editor for IEEE Transactions on Information Forensics and Security. His research interests include multimedia security, image processing, and digital forensics. He has published more than 200 papers in these areas.