Kernel quaternion principal component analysis and its application in RGB-D object recognition


Accepted Manuscript

Kernel quaternion principal component analysis and its application in RGB-D object recognition
Beijing Chen, Jianhao Yang, Byeungwoo Jeon, Xinpeng Zhang

PII: S0925-2312(17)30897-4
DOI: 10.1016/j.neucom.2017.05.047
Reference: NEUCOM 18465

To appear in: Neurocomputing

Received date: 19 October 2016
Revised date: 18 March 2017
Accepted date: 21 May 2017

Please cite this article as: Beijing Chen , Jianhao Yang , Byeungwoo Jeon , Xinpeng Zhang , Kernel quaternion principal component analysis and its application in RGB-D object recognition, Neurocomputing (2017), doi: 10.1016/j.neucom.2017.05.047

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Kernel quaternion principal component analysis and its application in RGB-D object recognition

Beijing Chen a,b,c, Jianhao Yang a, Byeungwoo Jeon c, Xinpeng Zhang d

a Jiangsu Engineering Center of Network Monitoring, School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing 210044, China
b Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science & Technology, Nanjing 210044, China
c College of Information & Communication Engineering, Sungkyunkwan University, Suwon 440746, Korea
d School of Communication & Information Engineering, Shanghai University, Shanghai 200072, China

ABSTRACT

While the existing quaternion principal component analysis (QPCA) is a linear tool developed mainly for processing linear quaternion signals, the quaternion representation (QR) used in QPCA creates redundancy when it represents a color image signal of three components by a quaternion matrix of four components. In this paper, the kernel technique is used to extend QPCA to kernel QPCA (KQPCA) for processing nonlinear quaternion signals; in addition, both RGB and depth information are considered to improve the QR for representing RGB-D images, so that the improved QR fully utilizes the four-dimensional quaternion domain. We first present the basic idea of three types of KQPCA and then propose an algorithm for RGB-D object recognition based on bidirectional two-dimensional KQPCA (BD2DKQPCA) and the improved QR. Experimental results on four public datasets demonstrate that the proposed BD2DKQPCA-based algorithm achieves the best performance among seventeen compared algorithms, including other existing PCA-based algorithms, for both RGB and RGB-D object recognition. Moreover, for all compared algorithms, considering both RGB and depth information achieves better recognition performance than considering RGB information alone.

Keywords: Principal component analysis; quaternion; kernel function; RGB-D object recognition

Corresponding author. Tel.: +82-31-290-7144.

E-mail address: [email protected].


1. Introduction

The quaternion is a generalization of the complex number. During the past two decades, it has been increasingly used to deal with color image signals by encoding their three channels into the imaginary parts of the quaternion representation [1-17]. Its main advantage for representing and processing color signals lies in that a color image signal can be treated holistically as a vector field [1, 2, 14, 15, 17]. Many classical tools developed for gray-scale images have been successfully generalized to color image processing using the quaternion algebra, such as the Fourier transform [2, 3], wavelet transform [4, 5], neural networks [6, 7], principal component analysis [8, 9], singular value decomposition [10], independent component analysis [11], polar harmonic transform [12], and moments [13-16].

However, the extra fourth dimension of the quaternion, when representing color images of three components, creates redundancy and brings additional computational cost. Assefa et al. [18] tried to circumvent these disadvantages by introducing a representation scheme in three-dimensional space using the trinion, which has one real and two imaginary components, together with a trinion Fourier transform based on this representation. However, only a few color image processing approaches adopt this representation, while more and more published works still use the quaternion representation (QR). The main reasons may be: (a) the theory of trinions is still less developed than that of the quaternions, which provides the theoretical basis for quaternion-based color image processing [1-17]; (b) the QR has been successfully used in many fields of color image processing [1-17]. So, this paper still adopts the QR and resolves its redundancy problem by introducing the additional, important depth information into the QR.

The recent popularity of Kinect devices makes it easy to capture RGB-D images carrying both color and depth information [19]. Depth information has many extra advantages: it is invariant to lighting and color variations, allows better separation from the background, and provides pure geometry and shape cues [20]. So, combining color and depth information can dramatically improve the performance of many vision tasks, e.g., object recognition [19-31], object detection [32], object tracking [33], and human activity analysis [34]. Among these tasks, RGB-D object recognition has attracted the most attention [19-31].


However, we note that quaternion-based work in the area of RGB-D object recognition is still absent. This motivates us to improve the QR for RGB-D object recognition by considering both color and depth information in an efficient way.

In the areas of pattern recognition, computer vision, and multimedia signal processing, the principal component analysis (PCA) [35, 36] is a well-known technique for extracting features from possibly high-dimensional data. However, its basic nature of being a linear technique does not allow effective treatment of the possibly higher-order statistics of signals, since factors such as variations in illumination, shape, and pose make pattern recognition problems highly nonlinear in real situations [37-39]. This explains why the kernel PCA (KPCA) [40] was introduced. KPCA is the nonlinear generalization of PCA via the kernel trick: it first maps the input data into a high-dimensional feature space, in which the nonlinear input data become linear, and then performs the conventional PCA in that feature space [40, 41].

Recently, using the quaternion algebra, the so-called quaternion PCA (QPCA) [8], two-dimensional QPCA (2DQPCA), and bidirectional 2DQPCA (BD2DQPCA) [9] have been proposed to generalize the conventional PCA to color image signals. They have been successfully used in color face recognition [9, 42], color texture segmentation [43], color image watermarking [44], and multispectral palmprint recognition [45]. However, these works use the conventional QR and are thus no exception from the previously mentioned redundancy problem. Moreover, a quaternion version of KPCA is still absent. So, in this paper we propose the kernel QPCA (KQPCA), its two-dimensional version (2DKQPCA), and the bidirectional 2DKQPCA (BD2DKQPCA), and then apply them to the RGB-D object recognition problem by combining them with the improved QR.

This paper is organized as follows. In Section 2, we recall some basic properties of quaternions and then present several types of QPCA and KPCA. Section 3, the theoretical part of this paper, presents the proposed types of KQPCA. The improved QR and an algorithm for RGB-D object recognition based on BD2DKQPCA are given in Section 4. Experimental results and analysis are provided in Section 5 to illustrate the performance of the proposed KQPCA. Section 6 concludes the paper.

2. Some Preliminaries

2.1 Quaternion number and quaternion color representation

The quaternion, introduced by the mathematician Hamilton in 1843 [46], is a generalization of the complex number: a quaternion has one real part and three imaginary parts,

q = q_r + q_i i + q_j j + q_k k,  (1)

where q_r, q_i, q_j, q_k ∈ R, and i, j, k are three imaginary units obeying the following rules,

i² = j² = k² = −1, ij = −ji = k, jk = −kj = i, ki = −ik = j.  (2)

Moreover, i, j, k are the fundamental quaternion units. The rules in (2) make the quaternion algebra an important normed division algebra, in which the norm of a product equals the product of the norms and every nonzero element has a multiplicative inverse [47]. If the real part q_r = 0 in (1), q is called a pure quaternion.

Let q_1 = q_1,r + q_1,i i + q_1,j j + q_1,k k and q_2 = q_2,r + q_2,i i + q_2,j j + q_2,k k be two quaternions. According to the rules in (2), the addition and multiplication of q_1 and q_2 are respectively given by

q_1 + q_2 = (q_1,r + q_2,r) + (q_1,i + q_2,i)i + (q_1,j + q_2,j)j + (q_1,k + q_2,k)k,  (3)

q_1 q_2 = (q_1,r q_2,r − q_1,i q_2,i − q_1,j q_2,j − q_1,k q_2,k) + (q_1,r q_2,i + q_1,i q_2,r + q_1,j q_2,k − q_1,k q_2,j)i + (q_1,r q_2,j + q_1,j q_2,r + q_1,k q_2,i − q_1,i q_2,k)j + (q_1,r q_2,k + q_1,k q_2,r + q_1,i q_2,j − q_1,j q_2,i)k.  (4)

The conjugate and modulus of a quaternion q in (1) are respectively defined as

q* = q_r − q_i i − q_j j − q_k k,  (5)

|q| = sqrt(q_r² + q_i² + q_j² + q_k²).  (6)

For a quaternion q = q_r + q_v with vector part q_v = q_i i + q_j j + q_k k, the logarithm and exponential are respectively given by [48]

ln(q) = ln(|q|) + (q_v / |q_v|) arccos(q_r / |q|),  (7)

exp(q) = exp(q_r) (cos(|q_v|) + (q_v / |q_v|) sin(|q_v|)).  (8)

Let f(u, v) be an RGB image function. With the QR, each pixel can be represented as a pure quaternion,


f(u, v) = f_R(u, v)i + f_G(u, v)j + f_B(u, v)k,  (9)

where f_R(u, v), f_G(u, v), and f_B(u, v) are respectively the red, green, and blue components of the pixel (u, v).

2.2 Quaternion version of linear PCA

Here, we briefly describe three types of linear QPCA for quaternion color images given in [8, 9], i.e., one-dimensional QPCA (1DQPCA), 2DQPCA, and BD2DQPCA.

Let x be the 1D version of a given M×N color image represented by the QR in (9). 1DQPCA linearly projects the MN×1 pure quaternion vector x onto an r_1DQPCA-dimensional feature subspace as

y_1DQPCA = W_1DQPCA^T (x − x̄),  (10)

where W_1DQPCA is the MN×r_1DQPCA QPCA projection matrix, (·)^T denotes the conjugate transpose of a quaternion matrix, and x̄ is the mean vector of all training images. The method for computing the projection matrix W_1DQPCA based on the quaternion singular value decomposition can be found in [8].

In contrast, 2DQPCA treats the color image directly: it represents the M×N color image by the M×N pure quaternion matrix X as in (9) and projects X onto an M×r_row-dimensional feature subspace in the row direction,

Y_2DQPCA = (X − X̄) W_row,  (11)

where W_row is the N×r_row 2DQPCA projection matrix in the row direction and X̄ is the mean matrix of all training images. More details about the computation of the projection matrix W_row can be found in [9]. In fact, 2DQPCA can also project X onto an r_col×N-dimensional feature subspace in the column direction,

Y_2DQPCA = W_col^T (X − X̄),  (12)

where W_col is the M×r_col 2DQPCA projection matrix in the column direction.

Finally, BD2DQPCA projects the quaternion image matrix X onto an r_col×r_row-dimensional feature subspace in both the row and column directions,

Y_BD2DQPCA = W_col^T (X − X̄) W_row.  (13)

2.3 Nonlinear kernel PCA

In this subsection, we briefly introduce three types of nonlinear KPCA, i.e., one-dimensional KPCA (1DKPCA) [38], two-dimensional KPCA (2DKPCA) [49], and bidirectional 2DKPCA (BD2DKPCA) [50].

KPCA is a nonlinear generalization of the conventional PCA. The basic idea of 1DKPCA is first to map the original input data x_s (s = 1, 2, …, S) of size N×1 into a high-dimensional feature space F via a nonlinear mapping,

φ: R^N → F, x_s ↦ x̂_s, s = 1, 2, …, S,  (14)

and then to perform a linear one-dimensional PCA in F.

Because we only need to compute dot products between the mapped patterns, and never need the mapped patterns explicitly, the kernel technique was introduced in [40] to avoid the high computational load in the possibly very high-dimensional space F. The covariance matrix is replaced by the kernel matrix K = (k_s,t)_{S×S} computed with a kernel function,

k_s,t = k(x_s, x_t) = <x̂_s, x̂_t>.  (15)

The commonly used kernel functions are

k(x_s, x_t) = (x_s · x_t + c)^b  (polynomial kernels),
k(x_s, x_t) = exp(−||x_s − x_t||² / (2σ²))  (Gaussian kernels),
k(x_s, x_t) = tanh(ρ x_s · x_t + a)  (sigmoid kernels),  (16)

where b, c, σ, ρ, and a are real numbers. For 2D images, one can first transform the 2D image matrices into 1D vectors before conducting 1DKPCA.
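For reference, the whole 1DKPCA pipeline of (14)-(16) (kernel matrix, centering in feature space, eigendecomposition, projection) can be sketched for real-valued data with the polynomial kernel. This is a generic KPCA sketch under our own conventions, not the authors' implementation:

```python
import numpy as np

def kpca_fit(X, b=2, c=1.0, r=2):
    """1DKPCA sketch with the polynomial kernel of (16): k(x, y) = (x.y + c)^b.
    X: (S, N) real data. Returns the projections of the S training samples
    onto the r leading kernel principal directions."""
    K = (X @ X.T + c) ** b                       # kernel matrix (15)
    S = K.shape[0]
    one = np.full((S, S), 1.0 / S)
    Kc = K - one @ K - K @ one + one @ K @ one   # centering in feature space
    vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]       # reorder to descending
    alphas = vecs[:, :r] / np.sqrt(np.maximum(vals[:r], 1e-12))  # normalize
    return Kc @ alphas                           # projected training features

rng = np.random.default_rng(1)
Y = kpca_fit(rng.normal(size=(10, 4)))
```

The projected features have zero mean by construction, since the kernel matrix is centered before the eigendecomposition.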

However, in this method it is difficult to evaluate the covariance matrix accurately due to its large size [40, 42]. So, Nhat and Lee [49] proposed 2DKPCA, which considers the rows of a 2D image for the nonlinear mapping. The nonlinear mapping of the 2D image samples X_s (s = 1, 2, …, S) of size M×N is given by

φ: R^N → F, X_s,m ↦ X̂_s,m, s = 1, 2, …, S, m = 1, 2, …, M,  (17)

where X_s,m is the mth row vector of the sth sample X_s. The following procedures, i.e., kernel matrix calculation, eigenvalue calculation, and projection, are similar to 1DKPCA, treating each row vector as a sample. Of course, one can also consider the columns of a 2D image for the nonlinear mapping; this is analogous to 2DKPCA in the row direction.

In fact, 2DKPCA is essentially 1DKPCA applied to the rows or columns of an image: it considers only the correlation among the row vectors or among the column vectors of the image matrix and ignores the other [50]. So, Zhang et al. [50] introduced BD2DKPCA by integrating the row-direction and the column-direction 2DKPCA. Moreover, to further reduce the computational cost of 2DKPCA, they used the mean row (or column) vector to approximate the 2D image. After performing both the row-direction and the column-direction 2DKPCA and transforming the two feature matrices into two 1D vectors, they presented two integration schemes: the first applies PCA to the two 1D vectors separately and then combines the results into one feature vector for classification, while the second first concatenates the two 1D vectors into one vector and then applies PCA to it.

3. Kernel Quaternion PCA

According to the definitions of QPCA and KPCA, this section introduces three types of KQPCA: one-dimensional KQPCA (1DKQPCA), 2DKQPCA, and BD2DKQPCA. These three types of KQPCA can be applied to process quaternion data, including color image data represented by the QR in (9).

3.1 1DKQPCA

Given a set of 1D training quaternion data samples x_s^q (s = 1, 2, …, S) of dimension M×1, where the superscript q indicates that the variable is a quaternion variable (the same notation is used throughout this paper), a quaternion function ψ maps the vector data from the quaternion space Q^M into a high-dimensional quaternion feature space G,

ψ: Q^M → G, x_s^q ↦ x̂_s^q, s = 1, 2, …, S.  (18)

After the mapping, the quaternion covariance matrix can be written as

C^q = (1/S) Σ_{s=1}^S (x̂_s^q − x̄^q)(x̂_s^q − x̄^q)^T,  (19)

where x̄^q is the mean vector of all mapped training samples x̂_s^q. However, as with KPCA, a high computational load may arise in the possibly very high-dimensional space G; moreover, the mapped patterns are never needed explicitly in the subsequent procedures. So, the covariance matrix is replaced with the quaternion kernel matrix K^q = (k_s,t^q)_{S×S}, computed from the dot products of pairs of mapped patterns in G via the kernel function

k_s,t^q = k(x_s^q, x_t^q) = <x̂_s^q, x̂_t^q>.  (20)

Here the kernel function is the quaternion version of one of those given in (16). The quaternion polynomial kernel is adopted in this paper because of its simple form and its quaternion-preserving property (the mapped pattern of a quaternion pattern is still a quaternion),

k_s,t^q = k(x_s^q, x_t^q) = (x_s^q · x_t^q + c)^b = exp(ln(x_s^q · x_t^q + c) · b),  (21)

where b and c are two real numbers, and exp(·) and ln(·) are respectively the quaternion exponential function given in (8) and the quaternion logarithmic function given in (7).
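The quaternion power in (21) can be sketched through exp(ln(·)·b), reusing the helpers of (7) and (8). The inner product is taken here as Σ_m x_s(m) x_t(m)* (conjugated on the second argument); the paper does not spell out the quaternion dot product, so treat that choice, and the default parameter values, as assumptions of this sketch:

```python
import numpy as np

def _qmul(p, q):
    """Quaternion product (4), components (r, i, j, k)."""
    pr, pi, pj, pk = p
    qr, qi, qj, qk = q
    return np.array([pr*qr - pi*qi - pj*qj - pk*qk,
                     pr*qi + pi*qr + pj*qk - pk*qj,
                     pr*qj + pj*qr + pk*qi - pi*qk,
                     pr*qk + pk*qr + pi*qj - pj*qi])

def _qlog(q):
    """Quaternion logarithm (7), with a guard for a zero vector part."""
    nq = np.linalg.norm(q)
    v = q[1:]
    nv = np.linalg.norm(v)
    axis = v / nv if nv > 1e-12 else np.zeros(3)
    return np.concatenate(([np.log(nq)],
                           axis * np.arccos(np.clip(q[0] / nq, -1.0, 1.0))))

def _qexp(q):
    """Quaternion exponential (8), with a guard for a zero vector part."""
    v = q[1:]
    nv = np.linalg.norm(v)
    axis = v / nv if nv > 1e-12 else np.zeros(3)
    return np.exp(q[0]) * np.concatenate(([np.cos(nv)], axis * np.sin(nv)))

def quat_poly_kernel(xs, xt, b=0.2, c=1.0):
    """Quaternion polynomial kernel (21): (x_s . x_t + c)^b = exp(ln(.) * b).
    xs, xt: (M, 4) arrays of quaternion components (r, i, j, k)."""
    acc = np.zeros(4)
    for p, q in zip(xs, xt):
        acc += _qmul(p, np.array([q[0], -q[1], -q[2], -q[3]]))  # x * conj(y)
    acc[0] += c
    return _qexp(_qlog(acc) * b)
```

For quaternions with zero imaginary parts the kernel reduces to the real polynomial kernel of (16), which gives a quick sanity check.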

Before calculating the eigenvalues of the quaternion matrix K^q, it is centralized as

K̄^q = K^q − 1_S K^q − K^q 1_S + 1_S K^q 1_S,  (22)

where 1_S ∈ R^{S×S} with (1_S)_s,t = 1/S. The centralization property of K̄^q is proved as follows.

Proof. Let K̄^q = (k̄_s,t^q)_{S×S} and K^q = (k_s,t^q)_{S×S}. From (22) we have

k̄_s,t^q = k_s,t^q − (1/S) Σ_{s'=1}^S k_{s',t}^q − (1/S) Σ_{t'=1}^S k_{s,t'}^q + (1/S²) Σ_{s'=1}^S Σ_{t'=1}^S k_{s',t'}^q.  (23)

Then, summing all elements of the matrix K̄^q, we obtain

Σ_{s=1}^S Σ_{t=1}^S k̄_s,t^q = Σ_{s=1}^S Σ_{t=1}^S k_s,t^q − Σ_{s=1}^S Σ_{t=1}^S k_s,t^q − Σ_{s=1}^S Σ_{t=1}^S k_s,t^q + Σ_{s=1}^S Σ_{t=1}^S k_s,t^q = 0.  (24)

So, the matrix K̄^q is a centralized version of K^q. □
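The zero-sum property (24) of the centralized kernel matrix can be checked numerically. For brevity the sketch below uses a real-valued kernel matrix; the quaternion case behaves the same, component by component:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                  # 5 toy samples of dimension 3
K = (X @ X.T + 1.0) ** 2                     # real polynomial kernel, b=2, c=1
S = K.shape[0]
one = np.full((S, S), 1.0 / S)               # the matrix 1_S of (22)
Kc = K - one @ K - K @ one + one @ K @ one   # centralization (22)
print(abs(Kc.sum()))                         # ~0, illustrating (24)
```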

Then, the eigenvalue problem of 1DKQPCA is solved as

K̄^q w = λw,  (25)

where λ and w are respectively an eigenvalue and an eigenvector. It is easy to obtain the eigenvalues and eigenvectors of a real

matrix, but not so of a quaternion matrix, owing to the complexity of quaternion computation. To resolve this eigenvalue problem, the method for self-conjugate quaternion matrices used in [9] and [44] is adopted, based on the following theorem given in [9, 44].

Theorem 1. Let K_σ = [K_1, K_2; −K_2^*, K_1^*] be the educing matrix of K̄^q,¹ and let λ_s denote the eigenvalues of K_σ. For two different eigenvalues λ_s and λ_t, we can construct the following two matrices,

I − (K_σ − λ_s I)^+ (K_σ − λ_s I),  I − (K_σ − λ_t I)^+ (K_σ − λ_t I),  (26)

where I is the identity matrix and the symbol + denotes the Moore-Penrose pseudo-inverse. If (η_1 η_2)^T and (β_1 β_2)^T are the first column vectors of the two matrices in (26), the eigenvectors of K̄^q corresponding to the eigenvalues λ_s and λ_t can be expressed as η_1 + i(η_2)^* and β_1 + i(β_2)^*, which are mutually orthogonal.

Suppose w_t, t = 1, 2, …, r, are the normalized solutions of (25) corresponding to the largest r eigenvalues. Then, the projection matrix is given by

W = [w_1, w_2, …, w_r] ∈ Q^{S×r}.  (27)

For a new sample x^q, one can simply project the mapped sample as

y_1DKQPCA = W^T K̄_x^q = (y_t^q)_{r×1} = ( Σ_{s=1}^S w_t,s k̄(x^q, x_s^q) )_{r×1},  (28)

where w_t,s is the sth coefficient of the eigenvector w_t, and K̄_x^q = [k̄(x^q, x_1^q), k̄(x^q, x_2^q), …, k̄(x^q, x_S^q)]^T is the centralized kernel vector obtained by (22), whose elements k̄(x^q, x_s^q), s = 1, 2, …, S, are the quaternion kernel coefficients between the new sample x^q and all S training samples x_s^q computed by (21).

3.2 2DKQPCA

For a 2D training quaternion data sample, one could transform it into a 1D vector and perform 1DKQPCA. However, this makes the kernel matrix too large to calculate accurately [49, 51] and ignores the intrinsic spatial structure of the 2D data [50]. So,

¹ (·)^* is the conjugate operator of a complex matrix, and K̄^q = K_r + iK_i + jK_j + kK_k = (K_r + iK_i) + (K_j + iK_k)j := K_1 + K_2 j, where K_r, K_i, K_j, and K_k are the four components of the quaternion matrix K̄^q.


using a method similar to 2DKPCA, 2DKQPCA considers the rows or columns of a 2D quaternion sample X_s^q of size M×N for the nonlinear mapping,

ψ_row: Q^N → G, X_s,m^{q,row} ↦ X̂_s,m^{q,row}, s = 1, 2, …, S, m = 1, 2, …, M,  (29)

ψ_col: Q^M → G, X_s,n^{q,col} ↦ X̂_s,n^{q,col}, s = 1, 2, …, S, n = 1, 2, …, N,  (30)

where X_s,m^{q,row} and X_s,n^{q,col} are respectively the mth row vector and the nth column vector of the sth quaternion data sample X_s^q. The subsequent procedures, i.e., kernel matrix calculation, eigenvalue calculation, and projection, are the same as in 1DKQPCA, treating each row or column vector as a sample; they are therefore omitted here.

In fact, the dimension of the kernel matrix of 2DKQPCA is usually still too large. For a set of S quaternion data samples X_s^q of dimension M×N, the kernel matrix is SM×SM for 2DKQPCA in the row direction and SN×SN in the column direction. These large dimensions result in a heavy computational cost for the subsequent eigenvalue calculation. So, to further alleviate the computational cost, the method proposed for 2DKPCA by Zhang et al. [50] is adopted here for 2DKQPCA: the mean row vector (or mean column vector), instead of the M row vectors (or N column vectors) themselves, is used to represent a sample for the nonlinear mapping,

ψ_row: Q^N → G, X̄_s^{q,row} ↦ X̂_s^{q,row}, s = 1, 2, …, S,  (31)

ψ_col: Q^M → G, X̄_s^{q,col} ↦ X̂_s^{q,col}, s = 1, 2, …, S,  (32)

where X̄_s^{q,row} and X̄_s^{q,col} are respectively the mean row vector and the mean column vector of the sample X_s^q, obtained by

X̄_s^{q,row} = (1/M) Σ_{m=1}^M X_s,m^{q,row},  X̄_s^{q,col} = (1/N) Σ_{n=1}^N X_s,n^{q,col}.  (33)
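With a quaternion image stored as an M×N×4 array (an illustrative layout, not one prescribed by the paper), the mean vectors of (33) are one line each:

```python
import numpy as np

# Toy quaternion image: M x N pixels, each a 4-vector (r, i, j, k).
M, N = 4, 6
X = np.arange(M * N * 4, dtype=float).reshape(M, N, 4)

mean_row = X.mean(axis=0)  # (33): average of the M row vectors -> shape (N, 4)
mean_col = X.mean(axis=1)  # (33): average of the N column vectors -> shape (M, 4)
```

Each sample is thus reduced to one length-N (or length-M) quaternion vector before the mapping of (31)-(32).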

This processing reduces the dimension of the kernel matrix to S×S, decreasing the computational cost significantly. In fact, this method reduces 2DKQPCA to 1DKQPCA by representing each 2D image by its 1D mean vector.

3.3 BD2DKQPCA

2DKQPCA considers only the correlation among the row vectors or among the column vectors of the data matrix and ignores the other. So, in order to make 2DKQPCA more effective in the following RGB-D object recognition application, BD2DKQPCA is introduced


in this subsection. Similar to BD2DKPCA in [50], the basic idea of BD2DKQPCA is to first perform 2DKQPCA in the row direction and in the column direction separately, and then fuse the resulting feature vectors for classification. The implementation steps are detailed in Subsection 4.2 in the context of the RGB-D object recognition application.

RGB-D Object Recognition Based on KQPCA

CR IP T

4.

KQPCA presented in previous section can process the general four dimensional quaternion signals, including three dimensional color image signal represented by the QR. However, as mentioned in the introduction, the conventional QR creates

AN US

redundancy when four-dimensional quaternions represent color images of three components [18]. So, in this paper the RGB-D object recognition application is addressed by improving the QR for RGB-D object images. This section provides an algorithm for RGB-D object recognition using the improved QR and BD2DKQPCA. The algorithms based on the other types of KQPCA are similar to the BD2DKQPCA-based algorithm below and are thus omitted.

4.1 Improved quaternion representation of RGB-D images

It is well known that the depth feature is invariant to lighting and color variations [20]. In order to resolve the redundancy problem, the existing QR is improved for RGB-D images by combining the color and depth information.

Let g(u, v) be an RGB-D image function. Each of its pixels can be represented as a full quaternion,

g(u, v) = g_D(u, v) + g_R(u, v)i + g_G(u, v)j + g_B(u, v)k,  (34)

where g_D(u, v), g_R(u, v), g_G(u, v), and g_B(u, v) are respectively the depth, red, green, and blue components of the pixel (u, v).
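Packing an RGB-D image into the improved QR of (34) is then straightforward. The sketch below stores each quaternion pixel as a 4-vector, a layout chosen purely for illustration:

```python
import numpy as np

def rgbd_to_quaternion(rgb, depth):
    """Improved QR of (34): q = depth + R i + G j + B k.
    rgb: (M, N, 3) array; depth: (M, N) array.
    Returns an (M, N, 4) array of quaternion components (r, i, j, k)."""
    M, N, _ = rgb.shape
    q = np.empty((M, N, 4), dtype=float)
    q[..., 0] = depth      # real part: depth channel
    q[..., 1:] = rgb       # imaginary parts: R, G, B
    return q

def rgb_to_pure_quaternion(rgb):
    """Conventional QR of (9): a pure quaternion with a zero real part."""
    M, N, _ = rgb.shape
    q = np.zeros((M, N, 4))
    q[..., 1:] = rgb
    return q
```

Comparing the two functions makes the redundancy argument visible: the conventional QR leaves the real channel identically zero, while the improved QR fills it with depth.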

The new QR of RGB-D images in (34) improves the existing QR of color images in (9) in at least two aspects: (a) it considers the color information together with the important depth information; (b) it does not create redundancy, because the depth information is encoded into the extra dimension (the real part in (34)).

4.2 RGB-D object recognition using the improved QR and BD2DKQPCA

The flow chart of the proposed RGB-D object recognition algorithm is given in Fig. 1. The training and testing procedures are described below.

Fig. 1 Flow chart of the proposed RGB-D object recognition algorithm based on BD2DKQPCA: (a) training procedure; (b) testing procedure

Training procedure:

1) Represent each RGB-D sample X_s, s = 1, 2, …, S, in the training set by the improved quaternion representation (34) as X_s^q.

2) Calculate the mean row vector and the mean column vector of each quaternion-based training sample X_s^q using (33), as X̄_s^{q,ξ}, s = 1, 2, …, S, ξ ∈ {row, col}.

3) Perform 1DKQPCA separately on the mean row vectors X̄_s^{q,row} and the mean column vectors X̄_s^{q,col}, following the steps in Subsection 3.1 from (20) to (27): (20) and (21) for the kernel matrix computation, (22) for the kernel matrix centralization, (25) and Theorem 1 for the solution of the eigenvalues and eigenvectors, and (27) for the projection matrix computation. This yields the row projection matrix W_row and the column projection matrix W_col.

4) Project each quaternion-based training sample X_s^q by (28) in both directions with the projection matrices W_row and W_col to obtain the row and column projected features of the training set, y_s^{X,ξ} = { y_s,h^{X,ξ} | h = 1, 2, …, r_ξ }, s = 1, 2, …, S, ξ ∈ {row, col}.

Testing procedure:

1) Represent each RGB-D sample Z_t, t = 1, 2, …, T, in the testing set by the improved quaternion representation (34) as Z_t^q.

2) Calculate the mean row vector and the mean column vector of each quaternion-based test sample Z_t^q using (33), as Z̄_t^{q,ξ}, t = 1, 2, …, T, ξ ∈ {row, col}.

3) Compute the kernel function between each test sample and each training sample in both the row and column directions by (21), as k(Z̄_t^{q,ξ}, X̄_s^{q,ξ}), t = 1, 2, …, T, s = 1, 2, …, S, ξ ∈ {row, col}. In addition, centralize the resulting kernel matrix by (22).

4) Using the projection matrices W_row and W_col obtained in the training procedure, project each quaternion-based test sample Z_t^q in both directions by (28) to obtain the row and column projected features of the testing set, y_t^{Z,ξ} = { y_t,h^{Z,ξ} | h = 1, 2, …, r_ξ }, t = 1, 2, …, T, ξ ∈ {row, col}.

5) Perform classification using the nearest neighbor classifier with the fused quaternion Euclidean distances. The distances between the test samples and the training samples are based on their corresponding projected features y_t^{Z,ξ} and y_s^{X,ξ}. Note that other classifiers could also be used, such as SVM, neural networks, or Bayes classifiers.

Here we mainly describe step 5), since the other steps have been introduced in Section 3 and Subsection 4.1. It is well known that the feature distance is the key issue in the nearest neighbor classifier. Take the recognition of a test sample Z_t as an example. To decide the class of Z_t, the row- and column-directional distances between the test sample Z_t and each training sample X_s, s = 1, 2, …, S, based on their corresponding projected features y_t^{Z,ξ} and y_s^{X,ξ}, are first defined by

d_s,t^ξ = || y_t^{Z,ξ} − y_s^{X,ξ} ||_2 = sqrt( Σ_{h=1}^{r_ξ} | y_t,h^{Z,ξ} − y_s,h^{X,ξ} |² ), ξ ∈ {row, col},  (35)

where ||·||_2 is the quaternion vector Euclidean distance given in [9] and |·| is the quaternion modulus defined in (6).

Before the fusion of the two directional distances, they are normalized by dividing by their corresponding maximum values [52, 53],

d̄_s,t^ξ = d_s,t^ξ / max_{1≤s≤S}(d_s,t^ξ), ξ ∈ {row, col}.  (36)

Here, the denominator max_{1≤s≤S}(d_s,t^ξ) is the maximum of the S directional distances d_s,t^ξ, s = 1, 2, …, S, between the test sample Z_t^q and the training samples X_s^q.

Then, the final distance between two samples X_s and Z_t is obtained by fusing the two normalized directional distances with a weight α ∈ [0, 1],

d_s,t = α d̄_s,t^{row} + (1 − α) d̄_s,t^{col}.  (37)

Finally, the nearest neighbor classifier is applied to the fused distance: if s* = arg min_{1≤s≤S}(d_s,t), then the test sample Z_t is assigned to the class of the training sample X_{s*}.
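The normalization (36), fusion (37), and nearest-neighbor decision can be sketched as follows for one test sample, using real scalar distances for simplicity; fused_nn_label is a hypothetical helper name, not from the paper:

```python
import numpy as np

def fused_nn_label(d_row, d_col, labels, alpha=0.1):
    """Nearest-neighbor decision with fused directional distances, (35)-(37).
    d_row, d_col: length-S arrays of row/column distances between one test
    sample and the S training samples; labels: length-S training labels."""
    d_row = d_row / d_row.max()               # normalization (36)
    d_col = d_col / d_col.max()
    d = alpha * d_row + (1 - alpha) * d_col   # fusion (37)
    return labels[np.argmin(d)]               # nearest-neighbor decision

labels = np.array([0, 1, 2])
print(fused_nn_label(np.array([0.9, 0.2, 0.8]),
                     np.array([0.7, 0.1, 0.9]), labels))  # prints 1
```

With α = 0 the decision uses only the column direction and with α = 1 only the row direction, so the weight directly trades the two 2DKQPCA directions against each other.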

5. Experimental results and analysis

This section provides experimental results on RGB-D object recognition to illustrate the performance of the proposed KQPCA.


The proposed KQPCA-based algorithms consider four types of KQPCA: 1DKQPCA, 2DKQPCA_Row, 2DKQPCA_Col, and BD2DKQPCA. 1DKQPCA converts 2D RGB-D images into 1D quaternion vectors; 2DKQPCA_Row performs 2DKQPCA in the row direction, while 2DKQPCA_Col performs it in the column direction. Moreover, our algorithms were compared with two existing PCA-based algorithms: the quaternion-based algorithm using BD2DQPCA [9] and the conventional algorithm using BD2DKPCA [50]. BD2DQPCA was first proposed for RGB face recognition in [9]; here it was also applied to RGB-D object recognition by combining it with the proposed improved QR (34). BD2DKPCA was applied to gray-scale face recognition in [50]; here it was used for RGB and RGB-D object recognition by treating each component of the RGB or RGB-D object image independently.

5.1 Optimal parameters for the KQPCA-based algorithm on the IIIT-D RGB-D face dataset

The IIIT-D RGB-D dataset [30] was used in this experiment. It contains 4065 RGB-D face images of 106 individuals captured in two sessions using a Kinect sensor. The images were taken under normal illumination with variations in pose, expression, and eyeglasses, and the number of images per individual varies from 11 to 254. The dataset provides five folds. As in [30], each test in this subsection was carried out five times using the five folds separately; each time, 4 images per individual were used for training and the rest for testing. All of the RGB images and depth images were

14

AN US

CR IP T

ACCEPTED MANUSCRIPT

Fig. 2 Some sample images of two individuals from the IIIT-D RGB-D dataset. The first and third rows are RGB images, while the second and forth rows are their corresponding depth images

resized to 100×100, which was also the same as [30]. Depth images were converted to gray-scale ones. Some samples of this

M

dataset are given in Fig. 2.

There are a few parameters in the proposed KQPCA-based algorithm: the real numbers b and c of the kernel function in (21); the weight α of the feature fusion in (37); and the dimensions of the projected features in the row direction, rrow, and in the column direction, rcol, for BD2DKQPCA (1DKQPCA and 2DKQPCA each have only one such dimension parameter). In this paper, the parameter c was set to 1, as is usual for the conventional KPCA. The other parameters were determined by the following experiments.
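For reference, a real-valued analogue of a polynomial kernel with the two parameters b and c can be sketched as below. The kernel in (21) is quaternion-valued, so this scalar version is only illustrative; c is fixed to 1 as in the experiments, and the fractional power b assumes non-negative inputs (as for pixel data):

```python
import numpy as np

def poly_kernel(x, y, b=0.2, c=1.0):
    """Polynomial kernel k(x, y) = (<x, y> + c)^b with fractional power b."""
    return (np.dot(x, y) + c) ** b

def kernel_matrix(X, b=0.2, c=1.0):
    """Gram matrix K[i, j] = k(X[i], X[j]) over the rows of X."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = poly_kernel(X[i], X[j], b, c)
    return K
```

The matrix K is symmetric by construction, which is what the eigendecomposition step of any KPCA-style method relies on.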

We first considered two parameters: the power b and the weight α. To find their optimal values, the parameters rrow and rcol were set to their maxima, i.e., the width and the height of the image, respectively. Then, the power b was varied from 0.05 to 1.5 in steps of 0.05, and the weight α from 0 to 1.0 in steps of 0.1. The average recognition rates over the five folds of the IIIT-D RGB-D dataset are given in Fig. 3. It can be observed from this figure that: (a) for the power b, the recognition rates first increase, reach a maximum, and then decrease for all KQPCA-based algorithms except 1DKQPCA. The maximum average rates are respectively 82.47 for 1DKQPCA with b = 0.8, 74.86 for 2DKQPCA_Row with b = 0.7, 87.58 for 2DKQPCA_Col with b = 0.1, and 89.03 for BD2DKQPCA with b = 0.2 and α = 0.1. These optimal parameters were used in the following tests; (b) the 2DKQPCA_Col-based algorithm outperforms the


Fig. 3 Average recognition rates for different KQPCA-based algorithms under different parameters b and α: (a) 1DKQPCA and 2DKQPCA; (b) BD2DKQPCA.
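The sweep over b and α just described amounts to a grid search maximizing the recognition rate averaged over the five folds. A generic sketch follows; `evaluate` is a hypothetical callback standing in for training and testing one KQPCA configuration on one fold, not a function from the paper:

```python
import numpy as np

def grid_search_b_alpha(evaluate, n_folds=5):
    """Return the (b, alpha) pair maximizing the mean rate over folds.

    evaluate(b, alpha, fold) must return the recognition rate for one fold.
    b is swept from 0.05 to 1.5 in steps of 0.05, alpha from 0 to 1.0 in
    steps of 0.1, matching the experimental protocol above.
    """
    b_grid = np.arange(0.05, 1.5 + 1e-9, 0.05)
    a_grid = np.arange(0.0, 1.0 + 1e-9, 0.1)
    best, best_rate = None, float("-inf")
    for b in b_grid:
        for a in a_grid:
            rate = np.mean([evaluate(b, a, f) for f in range(n_folds)])
            if rate > best_rate:
                best, best_rate = (round(float(b), 2), round(float(a), 1)), rate
    return best, best_rate
```

For the 1DKQPCA and 2DKQPCA variants, which have no fusion weight, the α loop simply collapses to a single value.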

Fig. 4 Two original RGB face images and their corresponding mean column and mean row images: (a), (b), and (c) are respectively the original image, the mean column image, and the mean row image for the first image in Fig. 2; (d), (e), and (f) are the same for the first image of the third row in Fig. 2. For an original image of size M×N, the mean column image is produced by copying the M×1 mean column vector, computed by (33), N times; the mean row image is produced similarly.
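The mean column and mean row images of Fig. 4 can be reproduced with a few lines of NumPy. Here the mean column vector is simply the per-row average over all N columns; this is a plain-array reading of the construction, whereas (33) in the paper is quaternion-valued:

```python
import numpy as np

def mean_column_image(img):
    """Average the N columns of an M x N image into an M x 1 vector,
    then copy that vector N times to form the M x N mean column image."""
    m, n = img.shape
    mean_col = img.mean(axis=1, keepdims=True)   # M x 1
    return np.tile(mean_col, (1, n))             # M x N

def mean_row_image(img):
    """Same construction along the other axis: a 1 x N mean row vector
    copied M times."""
    m, n = img.shape
    mean_row = img.mean(axis=0, keepdims=True)   # 1 x N
    return np.tile(mean_row, (m, 1))             # M x N
```

Every column of the mean column image is identical, which is why horizontal face structures (eyes, mouth, eyebrows) survive column averaging but are smeared out by row averaging.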

2DKQPCA_Row-based algorithm. The reason is that the mean column vector has a stronger ability to represent face images than the mean row vector. This is also why the optimal weight α takes the small value 0.1, and why the influence of the weight α is greater than that of the power b in Fig. 3(b) for the BD2DKQPCA-based algorithm. To make this clear, Fig. 4 shows two original face images and their corresponding mean column and mean row images. It can be seen that the mean column vector represents the face parts important for recognition (eyes, mouth, eyebrows, forehead, etc.) better than the mean row vector: these parts can still be distinguished in the mean column images, Fig. 4(b) and (e), but not in the mean row images, Fig. 4(c) and (f).
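Assuming the fusion in (37) linearly weights the row-direction features against the column-direction features (the exact form of (37) is not reproduced in this section, so this is a hedged sketch with our own function name), a small optimal α is consistent with the stronger column features dominating the fused representation:

```python
import numpy as np

def fuse_features(f_row, f_col, alpha=0.1):
    """Weighted fusion of row- and column-direction projected features.

    alpha weights the row part and (1 - alpha) the column part, so with
    the optimal alpha = 0.1 the column-direction features dominate.
    Assumes both feature vectors have been brought to the same length.
    """
    return alpha * np.asarray(f_row) + (1.0 - alpha) * np.asarray(f_col)
```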

Then, using the optimal parameters b and α obtained in the previous test, we examined the remaining two parameters: the dimensions of the projected features in the row direction, rrow, and in the column direction, rcol. Because the feature vector can be projected to any dimension from 1 to the image width 100 in the row direction and from 1 to the image height 100 in the column direction for the 2DKQPCA-based algorithms, and from 1 to the number of pixels 100×100 for the 1DKQPCA-based algorithm, comparing all cases would take a long time and is unnecessary. So, a two-stage search was used. In the first stage, rrow and rcol were tested in the range [5, 100] with interval 5; in the second stage, they were evaluated in the range [r'-5, r'+5] with the minimum interval 1, where r' is the optimal rξ, ξ ∈ {row, col}, obtained in the first stage. Note that, for comparability, the maximum value of r considered for 1DKQPCA was 100 instead of 100×100; moreover, the following experimental results show that the rate reaches its maximum within the range [5, 100]. The results of the two stages are given in Fig. 5. It can be seen from this figure that: (a) the recognition rates first increase, reach a maximum, and then remain stable for all KQPCA-based algorithms, because there is high redundancy in the feature vector; this is also why PCA is introduced to reduce that redundancy; (b) among the 2DKQPCA-based algorithms, the 2DKQPCA_Col-based algorithm is superior to the 2DKQPCA_Row-based algorithm, again because the mean column vector represents face images better than the mean row vector, as shown in the previous test (Fig. 4); for the same reason, increasing rcol raises the rates of BD2DKQPCA more than increasing rrow does; (c) BD2DKQPCA, whose maximum average recognition rate is 89.09 with the optimal parameters rrow = 25 and rcol = 20, performs best among the four types of KQPCA, because it considers the correlation not only among the mean row vectors of the face samples but also among the mean column vectors. The optimal parameters rrow and rcol of the four types of KQPCA are shown in Table I, and their corresponding average recognition rates are provided in Table III.
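The two-stage procedure above is a coarse-to-fine search. It can be sketched generically as follows, with `score` a hypothetical stand-in for evaluating one candidate projection dimension r (in the experiments this would mean training and testing a KQPCA model):

```python
def two_stage_search(score, lo=5, hi=100, step=5):
    """Coarse-to-fine search for the best projection dimension r.

    Stage 1 scans [lo, hi] with the given step; stage 2 refines around
    the stage-1 optimum r' over [r'-step, r'+step] with interval 1.
    """
    coarse = range(lo, hi + 1, step)
    r1 = max(coarse, key=score)                        # first stage
    fine = range(max(1, r1 - step), min(hi, r1 + step) + 1)
    return max(fine, key=score)                        # second stage
```

With step 5 this evaluates about 20 + 11 candidates per direction instead of 100, which is the point of the two-stage design.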

Fig. 5 Average recognition rates for different KQPCA-based algorithms under different parameters rrow and rcol: (a) first stage of 1DKQPCA and 2DKQPCA; (b) second stage of 1DKQPCA and 2DKQPCA (r = r'+δ); (c) first stage of BD2DKQPCA; (d) second stage of BD2DKQPCA.


Table I Optimal parameters of different PCA-based algorithms for RGB-D face recognition

Parameters | BD2DQPCA [9] | BD2DKPCA [50] | 1DKQPCA | 2DKQPCA_Row | 2DKQPCA_Col | BD2DKQPCA
α          | —            | —             | —       | —           | —           | 0.1
b          | —            | 0.6           | 0.8     | 0.7         | 0.1         | 0.2
rrow       | 20           | 18            | 40      | 47          | —           | 25
rcol       | 37           | 40            | —       | —           | 10          | 20

Table II Optimal parameters of different PCA-based algorithms for RGB face recognition

Parameters | BD2DQPCA [9] | BD2DKPCA [50] | 1DKQPCA | 2DKQPCA_Row | 2DKQPCA_Col | BD2DKQPCA
α          | —            | —             | —       | —           | —           | 0.1
b          | —            | 0.7           | 0.8     | 1.3         | 0.3         | 0.3
rrow       | 35           | 55            | 35      | 78          | —           | 34
rcol       | 46           | 43            | —       | —           | 12          | 10

Table III. Average recognition rate (%) of different algorithms using the optimal parameters on the IIIT-D RGB-D face dataset

Datasets | RISE [30] | RISE+ADM [30] | BD2DQPCA [9] | BD2DKPCA [50]
RGB      | —         | —             | 78.90        | 81.35
RGB-D    | 82.78     | 86.16         | 82.23        | 83.91

Datasets | 1DKQPCA | 2DKQPCA_Row | 2DKQPCA_Col | BD2DKQPCA
RGB      | 84.94   | 68.43       | 86.37       | 87.65
RGB-D    | 85.29   | 74.86       | 87.77       | 89.09

In fact, we also obtained the optimal parameters for the other compared algorithms, and for RGB face recognition without using the depth information. For RGB face recognition, the algorithm using the proposed KQPCA is similar to the proposed RGB-D object recognition algorithm given in Section 4; the only difference is that the input is the pure quaternion data of (9), with the real part equal to 0, for RGB face images, rather than the quaternion data of (34) for RGB-D face images. The optimal parameters are shown in Table I for RGB-D face recognition and in Table II for RGB face recognition, and the recognition results under these optimal parameters are given in Table III. Table III also provides the results of two algorithms proposed by the creators of the IIIT-D dataset [30], one using the RGB-D image descriptor based on saliency and entropy (RISE) and the other using RISE together with attributes based on the depth map (RISE+ADM). Table III shows that: (1) for all compared algorithms, considering both RGB and depth information performs better than considering RGB information alone, which again demonstrates the importance of depth information in object recognition; (2) all types of the proposed KQPCA except 2DKQPCA_Row perform well in RGB and RGB-D face recognition. Among them, the proposed BD2DKQPCA-based algorithm


outperforms the other compared algorithms. This is attributable to the quaternion-based RGB-D object processing, the nonlinear processing using the kernel technique, and the consideration of both the row and the column direction.

5.2 Performance comparison using the other three datasets

In this subsection, three further datasets were used to evaluate the proposed KQPCA-based algorithms with the optimal parameters obtained in the previous subsection. One is also a face dataset (the color FERET dataset [54]), while the other two are datasets of general RGB-D objects (the CIN 2D/3D dataset [21] and the Washington RGB-D dataset [22]). Some examples from these three datasets are provided in Fig. 6.

The color FERET face dataset [54] contains a total of 11338 facial images of 994 individuals at various angles. The two subsets fa/fb, captured in frontal views, were considered in this test. As in [9], the first 200 samples of 200 individuals in the fa subset were used for training and the corresponding 200 samples in the fb subset for testing; these samples cover different races, sexes, ages, and facial expressions. Note that the original data of these samples, including some background, were used without cropping by a face-location procedure. Because the FERET dataset provides only RGB color face images, the recently proposed algorithm using deep convolutional neural fields [55] was used to obtain depth images for these face images.

The CIN 2D/3D dataset [21] consists of 163 object instances organized into 18 categories. For each object instance, RGB-D images corresponding to 36 views are recorded, with a 10-degree angle between neighboring views. The first two object instances of each category were considered in this test. Samples with very small sizes (5×5) were excluded, and the provided depth images were converted into gray-scale ones. For category recognition, we randomly chose 14 views of each instance for training and the remaining 22 views for testing. As for the IIIT-D dataset, all of the images were resized to 100×100.

The Washington RGB-D dataset [22] consists of 300 object instances organized into 51 categories; the number of images per instance varies from 110 to 166. This dataset has 41,877 RGB-D images in total, captured under three different viewpoint angles (30, 45, and 60 degrees above the horizon). Our experiments focused on category recognition. The first instance of each category was considered; we randomly selected 20% of the images of each instance for training and the remaining images for testing. All of the images were also resized to 100×100.
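The per-instance random splits used above, 14 of 36 views for CIN 2D/3D training and 20% of each instance's images for Washington RGB-D, can be sketched with a generic helper. The grouping of image indices by instance is assumed given, and the function is ours, not from the paper:

```python
import random

def split_instance(indices, n_train=None, frac_train=None, seed=0):
    """Randomly split one instance's image indices into train and test sets.

    Either a fixed count (e.g. n_train=14 of 36 views for CIN 2D/3D) or a
    fraction (e.g. frac_train=0.2 for Washington RGB-D) selects the
    training data; the remainder is the test set.
    """
    rng = random.Random(seed)   # seeded for a reproducible split
    idx = list(indices)
    rng.shuffle(idx)
    k = n_train if n_train is not None else int(round(frac_train * len(idx)))
    return sorted(idx[:k]), sorted(idx[k:])
```

Splitting per instance (rather than over the pooled dataset) keeps the training fraction identical for every object, which matters when instances have very different image counts.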


Fig. 6 Some examples of the three datasets. First row: CIN 2D/3D dataset (depth images are the gray-scale version of those provided in [21]); second row: color FERET face dataset (depth images are the pseudo-color images); third row: Washington RGB-D dataset.

Table IV. Average recognition rate (%) of different algorithms using the optimal parameters on the FERET, CIN 2D/3D, and Washington RGB-D datasets

Algorithms                        | FERET RGB | FERET RGB-D | CIN 2D/3D RGB | CIN 2D/3D RGB-D | Washington RGB | Washington RGB-D
Gaussian kernel SVM [22]          | —     | —     | —     | —     | 74.5  | 83.8
CKM Descriptor [23]               | —     | —     | —     | —     | —     | 86.4
Upgraded HMP [24]                 | —     | —     | 86.3  | 91.0  | 82.4  | 87.5
KDES [25]                         | —     | —     | —     | —     | —     | 86.5
FusionNet (jet) [26]              | —     | —     | —     | —     | 84.1  | 91.3
FusionNet (surface normals) [27]  | —     | —     | —     | —     | 84.7  | 94.0
CNN-TRANSFER+DAE [19]             | —     | —     | 87.6  | 91.3  | —     | —
CNN-SPM-RNN [20]                  | —     | —     | 88.5  | 92.9  | 85.2  | 90.7
Ev2D3D [21]                       | —     | —     | 66.6  | 82.8  | —     | —
Multi-Modal CNN [28]              | —     | —     | —     | 88.4  | —     | 86.9
Subset-SAE-RNNs [29]              | —     | —     | 88.0  | 92.8  | 82.8  | 88.5
BD2DQPCA [9]                      | 86.50 | 87.50 | 59.72 | 66.29 | 77.13 | 74.25
BD2DKPCA [50]                     | 90.00 | 91.50 | 89.02 | 89.90 | 90.81 | 93.06
1DKQPCA                           | 86.00 | 87.50 | 89.27 | 90.53 | 91.75 | 92.32
2DKQPCA_Row                       | 80.50 | 84.50 | 71.59 | 79.17 | 84.92 | 88.94
2DKQPCA_Col                       | 91.00 | 92.50 | 89.02 | 91.29 | 89.85 | 92.05
BD2DKQPCA                         | 92.50 | 93.50 | 90.28 | 92.80 | 91.57 | 94.04


The results on these three datasets for the different compared algorithms are shown in Table IV. Table IV also provides the results of eleven other existing algorithms [19-29] for RGB-D object recognition; these results are taken from the corresponding literature. However, the authors of these algorithms reported results on only one or both of the two RGB-D object datasets (the CIN 2D/3D dataset and the Washington RGB-D dataset), so Table IV presents the results of these algorithms for their corresponding datasets only and uses the symbol "—" for the others. It can be observed from this table that, on the three new datasets, the proposed BD2DKQPCA-based algorithm again achieves the best performance among the seventeen compared algorithms, including the PCA-based algorithms considered in the previous subsection. We again analyze why 2DKQPCA_Col outperforms 2DKQPCA_Row on these three datasets. For the color FERET dataset, the reason is the same as that for the IIIT-D face dataset described in subsection 5.1. For the CIN 2D/3D and Washington RGB-D datasets, the reason is that most objects in these two datasets (see Fig. 6) are placed vertically, and thus the mean column feature vector contains more object features than the mean row one.

6. Conclusions

In this paper, three types of KQPCA were proposed to improve the existing QPCA so that it can effectively deal with the possibly higher-order statistics of a quaternion signal. Moreover, the QR was also improved to resolve the redundancy problem and to represent RGB-D images effectively. An RGB-D object recognition algorithm was then proposed using the KQPCA and the improved QR. The proposed BD2DKQPCA-based algorithm outperforms the other existing algorithms considered here for the following reasons: (a) it uses the improved QR, which incorporates the additional, important depth information; (b) it utilizes the kernel technique to process the nonlinear quaternion signal, and the RGB-D object recognition problem is highly nonlinear in real situations [37, 38]; (c) it considers the correlation not only among the mean row vectors of the samples but also among the mean column vectors. However, the 2DKQPCA in this paper uses only the mean row/column vector to represent a sample, because of the large dimension of the quaternion kernel matrix. As future work, we will seek an efficient approach that makes use of all row/column vectors for even better performance.


Acknowledgement

The authors thank Prof. Shuisheng Zhou of Xidian University, China, for providing the code for KPCA. This work was supported by the NSFC under Grants 61572258, 61232016, 61572257, 61672294, and 61602253; the Natural Science Foundation of Jiangsu Province of China under Grants BK20151530 and BK20150925; the BK21+ program of the Ministry of Education of Korea; the G-ITRC support program (IITP-2016-R6812-16-0001) supervised by the IITP; and the PAPD fund.

REFERENCES

[1] O.N. Subakan and B.C. Vemuri, "A quaternion framework for color image smoothing and segmentation," Int. J. Comput. Vis., vol. 91, no. 3, pp. 233–250, 2011.
[2] T.A. Ell and S.J. Sangwine, "Hypercomplex Fourier transforms of color images," IEEE Trans. Image Process., vol. 16, no. 1, pp. 22–35, 2007.
[3] S.J. Sangwine, "Fourier transforms of colour images using quaternion or hypercomplex numbers," Electron. Lett., vol. 32, no. 1, pp. 1979–1980, 1996.
[4] W.L. Chan, H. Choi, and G. Baraniuk, "Directional hypercomplex wavelets for multidimensional signal analysis and processing," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP 2004), pp. 996–999, 2004.
[5] S. Gai, "New banknote defect detection algorithm using quaternion wavelet transform," Neurocomputing, vol. 196, pp. 133–139, 2016.
[6] T. Nitta, "A quaternary version of the back-propagation algorithm," in Proc. 1995 IEEE Int. Conf. Neural Networks (ICNN'95), Perth, Australia, vol. 5, pp. 2753–2756, 1995.
[7] L.S. Saoud, R. Ghorbani, and F. Rahmoune, "Cognitive quaternion valued neural network and some applications," Neurocomputing, 2016, doi: 10.1016/j.neucom.2016.09.060.
[8] N.L. Bihan and S.J. Sangwine, "Quaternion principal component analysis of color images," in Proc. 2003 10th IEEE Int. Conf. Image Processing (ICIP 2003), vol. 1, pp. 809–812, 2003.
[9] Y.F. Sun, S.Y. Chen, and B.C. Yin, "Color face recognition based on quaternion matrix representation," Pattern Recognit. Lett., vol. 32, no. 4, pp. 597–605, 2011.
[10] S.C. Pei and C.M. Cheng, "Quaternion matrix singular value decomposition and its applications for color image processing," in Proc. 2003 Int. Conf. Image Processing (ICIP 2003), vol. 1, pp. 805–808, 2003.
[11] N.L. Bihan and S. Buchholz, "Quaternionic independent component analysis using hypercomplex nonlinearities," in Proc. IMA 7th Conf. Mathematics in Signal Processing, pp. 1–4, 2006.
[12] Y.N. Li, "Quaternion polar harmonic transforms for color images," IEEE Signal Process. Lett., vol. 20, no. 8, pp. 803–806, 2013.
[13] L.Q. Guo and M. Zhu, "Quaternion Fourier–Mellin moments for color image," Pattern Recognit., vol. 44, no. 2, pp. 187–195, 2011.
[14] B.J. Chen, H.Z. Shu, H. Zhang, G. Chen, C. Toumoulin, J.L. Dillenseger, and L.M. Luo, "Quaternion Zernike moments and their invariants for color image analysis and object recognition," Signal Process., vol. 92, no. 2, pp. 308–318, 2012.
[15] B.J. Chen, H.Z. Shu, G. Coatrieux, G. Chen, X.M. Sun, and J.L. Coatrieux, "Color image analysis by quaternion-type moments," J. Math. Imag. Vis., vol. 51, no. 1, pp. 124–144, 2015.
[16] X.Y. Wang, W.Y. Li, H.Y. Yang, P.P. Niu, and Y.W. Li, "Invariant quaternion radial harmonic Fourier moments for color image retrieval," Opt. Laser Technol., vol. 66, pp. 78–88, 2015.
[17] Y.M. Fang, W.S. Lin, B.S. Lee, C.T. Lau, Z.Z. Chen, and C.W. Lin, "Bottom-up saliency detection model based on human visual sensitivity and amplitude spectrum," IEEE Trans. Multimedia, vol. 14, no. 1, pp. 187–198, 2012.
[18] D. Assefa, L. Mansinha, K.F. Tiampo, H. Rasmussen, and K. Abdella, "The trinion Fourier transform of color images," Signal Process., vol. 91, no. 8, pp. 1887–1900, 2011.
[19] J.H. Tang, L. Jin, Z.C. Li, and S.H. Gao, "RGB-D object recognition via incorporating latent data structure and prior knowledge," IEEE Trans. Multimedia, vol. 17, no. 11, pp. 1899–1908, 2015.
[20] Y. Cheng, X. Zhao, K. Huang, and T.N. Tan, "Semi-supervised learning and feature evaluation for RGB-D object recognition," Comput. Vis. Image Und., vol. 139, pp. 149–160, 2015.
[21] B. Browatzki, J. Fischer, B. Graf, H.H. Bulthoff, and C. Wallraven, "Going into depth: Evaluating 2D and 3D cues for object classification on a new, large-scale object dataset," in Proc. 2011 IEEE Int. Conf. Computer Vision Workshops (ICCV 2011), Barcelona, Spain, pp. 1189–1195, 2011.
[22] K. Lai, L.F. Bo, X.F. Ren, and D. Fox, "A large-scale hierarchical multi-view RGB-D object dataset," in Proc. 2011 IEEE Int. Conf. Robotics and Automation (ICRA), pp. 1817–1824, 2011.
[23] M. Blum, J.T. Springenberg, J. Wülfing, and M. Riedmiller, "A learned feature descriptor for object recognition in RGB-D data," in Proc. 2012 IEEE Int. Conf. Robotics and Automation (ICRA), pp. 1298–1303, 2012.
[24] L.F. Bo, X.F. Ren, and D. Fox, "Unsupervised feature learning for RGB-D based object recognition," in Proc. 13th Int. Symp. Experimental Robotics, pp. 387–402, 2013.
[25] K. Lai, L.F. Bo, X.F. Ren, and D. Fox, "RGB-D object recognition: Features, algorithms, and a large scale benchmark," in Consumer Depth Cameras for Computer Vision, pp. 167–192, 2013.
[26] A. Eitel, J.T. Springenberg, L. Spinello, M. Riedmiller, and W. Burgard, "Multimodal deep learning for robust RGB-D object recognition," in Proc. 2015 IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IROS), pp. 681–687, 2015.
[27] L. Madai-Tahy, S. Otte, R. Hanten, and A. Zell, "Revisiting deep convolutional neural networks for RGB-D based object recognition," in Proc. 2016 Int. Conf. Artificial Neural Networks, pp. 29–37, 2016.
[28] A.R. Wang, J.W. Lu, J.F. Cai, T.J. Cham, and G. Wang, "Large-margin multi-modal deep learning for RGB-D object recognition," IEEE Trans. Multimedia, vol. 17, no. 11, pp. 1887–1898, 2015.
[29] J. Bai, Y. Wu, J.M. Zhang, and F.Q. Chen, "Subset based deep learning for RGB-D object recognition," Neurocomputing, vol. 165, pp. 280–292, 2015.
[30] G. Goswami, M. Vatsa, and R. Singh, "RGB-D face recognition with texture and attribute features," IEEE Trans. Inf. Foren. Sec., vol. 9, no. 10, pp. 1629–1640, 2014.
[31] X. Lv, X.D. Liu, X.Y. Li, X. Li, S.Q. Jiang, and Z.Q. He, "Modality-specific and hierarchical feature learning for RGB-D hand-held object recognition," Multimed. Tools Appl., vol. 76, no. 3, pp. 4273–4290, 2016.
[32] S. Gupta, R. Girshick, P. Arbeláez, and J. Malik, "Learning rich features from RGB-D images for object detection and segmentation," in Proc. Euro. Conf. Computer Vision, pp. 345–360, 2014.
[33] H. Xue, Y. Liu, D. Cai, and X. He, "Tracking people in RGBD videos using deep learning and motion clues," Neurocomputing, vol. 204, pp. 70–76, 2016.
[34] H. Zhang and L.E. Parker, "CoDe4D: color-depth local spatio-temporal features for human activity recognition from RGB-D videos," IEEE Trans. Circ. Syst. Vid. Tech., vol. 26, no. 3, pp. 541–555, 2016.
[35] I. Jolliffe, Principal Component Analysis. John Wiley & Sons, Ltd, 2002.
[36] Z. Zhang, M.B. Zhao, B. Li, P. Tang, and F.Z. Li, "Simple yet effective color principal and discriminant feature extraction for representing and recognizing color images," Neurocomputing, vol. 149, pp. 1058–1073, 2015.
[37] A. Eftekhari, M. Forouzanfar, H.A. Moghaddam, and J. Alirezaie, "Block-wise 2D kernel PCA/LDA for face recognition," Inf. Process. Lett., vol. 110, no. 17, pp. 761–766, 2010.
[38] N. Sun, H.X. Wang, Z.H. Ji, C.R. Zou, and L. Zhao, "An efficient algorithm for kernel two-dimensional principal component analysis," Neural Comput. Appl., vol. 17, no. 1, pp. 59–64, 2008.
[39] B.J. Chen, G. Coatrieux, J.S. Wu, Z.F. Dong, J.L. Coatrieux, and H.Z. Shu, "Fast computation of sliding discrete Tchebichef moments and its application in duplicated regions detection," IEEE Trans. Signal Process., vol. 63, no. 20, pp. 5424–5436, 2015.
[40] B. Schölkopf, A. Smola, and K.R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Comput., vol. 10, no. 5, pp. 1299–1319, 1998.
[41] A. Chakrabarti, A.N. Rajagopalan, and R. Chellappa, "Super-resolution of face images using kernel PCA-based prior," IEEE Trans. Multimedia, vol. 9, no. 4, pp. 888–892, 2007.
[42] R. Zeng, J.S. Wu, Z.H. Shao, L. Senhadji, and H.Z. Shu, "Quaternion softmax classifier," Electron. Lett., vol. 50, no. 25, pp. 1929–1930, 2014.
[43] L. Shi and B. Funt, "Quaternion color texture segmentation," Comput. Vis. Image Und., vol. 107, no. 1, pp. 88–96, 2007.
[44] F.N. Lang, J.L. Zhou, S. Cang, H. Yu, and Z. Shang, "A self-adaptive image normalization and quaternion PCA based color image watermarking algorithm," Expert Syst. Appl., vol. 39, no. 15, pp. 12046–12060, 2012.
[45] X. Xu, Z. Guo, C. Song, and Y. Li, "Multispectral palmprint recognition using a quaternion matrix," Sensors, vol. 12, no. 4, pp. 4633–4647, 2012.
[46] W.R. Hamilton, Elements of Quaternions. Longmans, Green, and Company, 1899.
[47] T.A. Ell, L.B. Nicolas, and S.J. Sangwine, Quaternion Fourier Transforms for Signal and Image Processing. John Wiley & Sons, 2014.
[48] D.B. Sweetser, Doing Physics with Quaternions. Open access at http://www.theworld.com/~sweetser/quanternions/ps/book.pdf.
[49] V.D.M. Nhat and S.Y. Lee, "Kernel-based 2DPCA for face recognition," in Proc. 2007 IEEE Int. Symp. Signal Processing and Information Technology, pp. 35–39, 2007.
[50] D.Q. Zhang, S.C. Chen, and Z.H. Zhou, "Recognizing face or object from a single image: linear vs. kernel methods on 2D patterns," in Structural, Syntactic, and Statistical Pattern Recognition, Springer Berlin Heidelberg, pp. 889–897, 2006.
[51] J. Yang, D.Q. Zhang, A.F. Frangi, and J.Y. Yang, "Two-dimensional PCA: a new approach to appearance-based face representation and recognition," IEEE Trans. Pattern Anal. Machine Intell., vol. 26, no. 1, pp. 131–137, 2004.
[52] N.K. Logothetis, J. Pauls, M. Augath, T. Trinath, and A. Oeltermann, "Neurophysiological investigation of the basis of the fMRI signal," Nature, vol. 412, no. 6843, pp. 150–157, 2001.
[53] E. Attalla and P. Siy, "Robust shape similarity retrieval based on contour segmentation polygonal multiresolution and elastic matching," Pattern Recogn., vol. 38, no. 12, pp. 2229–2241, 2005.
[54] P.J. Phillips, H. Moon, P.J. Rauss, and S. Rizvi, "The FERET evaluation methodology for face recognition algorithms," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, no. 10, pp. 1090–1104, 2000.
[55] F.Y. Liu, C.H. Shen, G.S. Lin, and I. Reid, "Learning depth from single monocular images using deep convolutional neural fields," IEEE Trans. Pattern Anal. Machine Intell., vol. 38, no. 10, pp. 2024–2039, 2016.


Biographies of Authors

Beijing Chen received the Ph.D. degree in Computer Science in 2011 from Southeast University, Nanjing, China. He is now an associate professor in the School of Computer & Software, Nanjing University of Information Science & Technology, China. His research interests include color image processing, pattern recognition, and information security.

PT

Jianhao Yang received the B.S. degree in Computer Science in 2015 from Nanjing University of Information Science & Technology, Nanjing, China. He is currently pursuing the M.S. degree in the School of Computer & Software, Nanjing

AC

CE

University of Information Science & Technology, Nanjing, China. His research interest includes color image processing.

Byeungwoo Jeon received the B.S. and M.S. degrees from the Department of Electronics Engineering, Seoul National University, Seoul, Korea, in 1985 and 1987, respectively, and the Ph.D. degree from the School of Electrical Engineering, Purdue University, West Lafayette, IN, USA, in 1992. He was with the Signal Processing Laboratory, Samsung Electronics, Suwon, Korea, from 1993 to 1997, where he was involved in the research and development of video compression algorithms, the design of digital broadcasting satellite receivers, and other MPEG-related research for multimedia applications. He served as the Project Manager of Digital TV and Broadcasting with the Korean Ministry of Information and Communications from 2004 to 2006, where he supervised all digital TV-related research and development in Korea. Since 1997, he has been a faculty member of the School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, where he is currently a Professor. He has authored many papers on video compression, pre/postprocessing, and pattern recognition, and he holds more than 50 issued patents (Korea and worldwide) in these areas. His research interests include multimedia signal processing, video compression, statistical pattern recognition, and remote sensing.

Xinpeng Zhang received the B.S. degree in computational mathematics from Jilin University, China, in 1995, and the M.E. and Ph.D. degrees in communication and information systems from Shanghai University, China, in 2001 and 2004, respectively. Since 2004, he has been with the faculty of the School of Communication and Information Engineering, Shanghai University, where he is currently a Professor. He was a visiting scholar at the State University of New York at Binghamton from January 2010 to January 2011, and an experienced researcher at Konstanz University, sponsored by the Alexander von Humboldt Foundation, from March 2011 to May 2012. He is an Associate Editor of IEEE Transactions on Information Forensics and Security. His research interests include multimedia security, image processing, and digital forensics. He has published more than 200 papers in these areas.