Multi-kernel Gaussian process latent variable regression model for high-dimensional sequential data modeling

Ziqi Zhu a,b, Jiayuan Zhang a,b, Jixin Zou c, Chunhua Deng a,b,∗

a School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China
b Hubei Key Lab of Intelligent Information Processing and Real-time Industrial System, Wuhan, Hubei, China
c The Institute of Forensic Science, Ministry of Public Security, Beijing, China
Article history: Received 6 January 2018; Revised 25 June 2018; Accepted 1 July 2018; Available online xxx

Keywords: Sequential data modeling; High-dimensional data; Kernel learning; Gaussian process latent variable model
Abstract: Modeling sequential data has been a hot research field for decades. One of the most challenging problems in this field is modeling real-world high-dimensional sequential data with limited training samples, mainly for two reasons. First, if the dimension of the data is significantly greater than the number of samples, over-fitting may result. Second, the dynamic behavior of real-world data is very complex and difficult to approximate. To overcome these two problems, we propose a multi-kernel Gaussian process latent variable regression model for high-dimensional sequential data modeling and prediction. In our model, we design a regression model based on the Gaussian process latent variable model. Furthermore, a multi-kernel learning model is designed to automatically construct suitable nonlinear kernels for various types of sequential data. We evaluate the effectiveness of our method using two types of real-world high-dimensional sequential data: human motion data and motion texture video data. In addition, our method is compared with several representative sequential data modeling methods. Experimental results show that our method achieves promising modeling capability and is capable of predicting human motion and texture video with higher quality. © 2018 Elsevier B.V. All rights reserved.
1. Introduction

Time is a natural element that is always present in natural events. Most real-world data have a temporal component, whether they are measurements of natural processes (wind, sea waves) or man-made ones (human motion). Therefore, sequential data modeling and prediction are among the most important problems analysts face in many fields, ranging from engineering and science to economics. The analysis and modeling of sequential data has been an active research field for decades and is considered one of the top 10 challenging problems in data mining [1]. Meanwhile, research on sequential data modeling has been widely applied in many fields and has drawn much attention over the years [2–6].

Traditional approaches for modeling sequential data focus on univariate sequential data. Some representative works, such as the autoregressive model and the Box–Jenkins approach [2], achieve great success in univariate sequential data modeling. However, most real-world sequential data are not univariate.
∗ Corresponding author at: School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China. E-mail address: [email protected] (C. Deng).
It is difficult to model these data with univariate sequential data modeling methods. In the last decade, more and more research has focused on modeling multivariate sequential data. Among this research, the most challenging task is modeling real-world high-dimensional sequential data, such as human motion [7–9] and video [3,10–13]. These real-world data are highly complex and contain noise. Furthermore, the number of training samples is always very limited compared with the dimension of the data, which causes the "curse of dimensionality" in the optimization process. Traditional methods do not have the capacity to model such complex data accurately.

In this paper, we focus on the problem of modeling real-world high-dimensional sequential data, such as human motion and motion texture video, with limited training samples. First, since the dimension of the data D is significantly greater than the number of training samples N, a proper subspace learning method (also known as a dimensionality reduction method) [14–18] is required, not only for dimensionality reduction but also for high-quality data reconstruction. Second, as it is difficult to obtain prior knowledge about the dynamic behavior of sequential data, we intend to design a self-adaptive modeling method that fits different sequential data. Based on the intentions mentioned above, we propose a regression model based on the Gaussian process latent variable model (GPLVM) [19].
Compared with other subspace learning schemes, the GPLVM learns the subspace from the perspective of Bayesian analysis and is robust against over-fitting and noise. Furthermore, we design a multi-kernel learning model to capture the internal probability density function of the data and to fit the nonlinear dynamic behavior of sequential data self-adaptively. An optimization algorithm is proposed for parameter learning, and a prediction algorithm is proposed for data synthesis. We evaluate the effectiveness of our method on human motion data and motion texture video. Experimental results show that our modeling method achieves promising modeling capability and is capable of predicting human motion and texture video with higher quality.

The main contributions of this paper are as follows:

(1) An adaptive multi-kernel Gaussian process latent variable model is proposed for sequential data dimensionality reduction. By applying the Bayesian methodology, it is possible to capture the non-stationary distribution of sequential data with limited training samples and severe noise. Furthermore, by introducing the multi-kernel method, a self-adaptive subspace learning framework is designed to find a smooth trajectory in the subspace of the sequential data, which is critical for high-accuracy modeling from the perspective of dynamic modeling.

(2) A multi-kernel regression model is proposed to model and predict the latent trajectory in the subspace. The dynamic behavior of sequential data varies greatly among applications, so it is difficult to make appropriate assumptions about the nonlinear properties of the dynamic behavior empirically. To avoid poor modeling results caused by inappropriate assumptions, the proposed multi-kernel regression model is capable of learning a suitable nonlinear model driven by the training data.

(3) A loss function based on both the adaptive multi-kernel Gaussian process latent variable model and the multi-kernel regression model is designed, together with an optimization algorithm for the proposed model. In addition, a prediction algorithm is proposed for high-dimensional sequential data forecasting.

The rest of the paper is organized as follows. In Section 2, related work on sequential data modeling is briefly introduced. In Section 3, we propose the multi-kernel Gaussian process latent variable model for high-dimensional sequential data modeling. The optimization and prediction algorithms for the proposed model are given in Section 4. Section 5 presents the experimental results. Conclusions are drawn in Section 6.

2. Related work

High-dimensional sequential data modeling has been an attractive research field for decades, and many methods have been proposed. In this section, a brief introduction is given for some classical methods for high-dimensional sequential data modeling.

2.1. Autoregressive model

Autoregressive (AR) models have proved successful for sequential data modeling and have been widely used in many applications. A linear autoregressive model can be expressed as:
$$y_l = \sum_{i=1}^{p} \phi_i\, y_{l+l_i} + e_l \qquad (1)$$
where y is the sequential data (also called space-time data in some literature [20,21]). l denotes the location of y in space-time, i.e. $l = (l_s, l_t)$, in which $l_s$ is the location of a pixel in an image and $l_t$ is a location in time. $l + l_i$ indicates a neighboring location around l, and $l_i$ is the lag. $\phi_i$ are the autoregressive parameters, and $e_l$ denotes the noise process, normally assumed to be a GP.

Originally, AR models were used to describe univariate sequential data. More recently, some variants have been proposed for high-dimensional sequential data modeling, such as the spatio-temporal autoregressive model [20,22], the multiscale autoregressive model [23] and the autoregressive eigen model [24]. The advantage of AR methods is that they are easy to train and can be applied for fast recursive data generation. However, they are limited in representing non-stationary sequential data [25].

2.2. Linear dynamic system

The Linear Dynamic System (LDS) model (also known as the Kalman filter) is a classic model for nonlinear sequential data modeling. The LDS model assumes that the sequential data are the output of a dynamic system driven by an independent and identically distributed (IID) process. The LDS model can be expressed as
$$x_{t+1} = A x_t + e_t \qquad (2)$$
$$y_t = B x_t + \hat{e}_t \qquad (3)$$
where $y_t \in \mathbb{R}^D$ denotes the observed high-dimensional sequential data and $x_t \in \mathbb{R}^Q$ denotes the internal dynamics which govern the behavior of the observed data. Normally, $Q \ll D$. $e_t$ and $\hat{e}_t$ denote the noise terms; they are independent and identically distributed processes, normally defined as Gaussian processes. A is the dynamic matrix, and B is a projection matrix. The key assumption of LDS modeling is that there exists a potential lower-dimensional dynamic trajectory that governs the evolution of the observed high-dimensional sequential data. Therefore, it is possible to optimize the model with few training samples. Some variants have been proposed based on LDS for high-dimensional sequential data modeling, such as the time-varying LDS [26] and the switching LDS [27]. LDS-based methods can model unstable sequential data. However, they fit well only for sequential data generated by an oscillatory system. Meanwhile, they have a tendency to smooth the trajectory while generating the sequential data.

2.3. Gaussian process regression

Different from the previous two types of modeling methods, Gaussian process (GP) regression models treat the regression problem from the perspective of Bayesian analysis. The target of modeling is to make inferences about the relationship between inputs and targets, i.e. the conditional distribution of the targets given the inputs, p(y|x). GP regression models can be optimized by maximum likelihood (ML) or maximum a posteriori (MAP) estimation. Meanwhile, some useful techniques, such as the latent variable model [19], sparse approximations [28], and variational inference [29], make GP regression more powerful for complex learning tasks.

Compared with other modeling methods, the GP regression model has several advantages [30,31]. First, the GP model is designed based on probability theory, and it provides a framework for modeling the uncertainty in sequential data. Second, Bayesian-based models are not prone to overfitting, since they average over, rather than fit, the parameters during optimization. Thus, it is possible to model high-dimensional sequential data with little training data. Third, GP regression models are very flexible for unknown functions. For most sequential data sampled from the real world, it is difficult or even impossible to acquire prior knowledge about the
suitable functional form for the sequential data. Therefore, GP regression models have attracted great attention in the last decade and have been successfully applied in various applications.
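As an illustration of the GP regression machinery discussed above, the following is a minimal sketch of GP posterior prediction with a fixed RBF kernel; the function names and hyperparameter values here are our illustrative assumptions, not part of the original implementation.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between the rows of A and B."""
    d2 = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train)
    K_inv = np.linalg.inv(K)
    mean = K_star @ K_inv @ y_train
    var = (rbf_kernel(X_test, X_test).diagonal()
           - np.einsum('ij,jk,ik->i', K_star, K_inv, K_star))
    return mean, var

# Toy usage: regress a noisy sine wave and predict at two new inputs.
X = np.linspace(0.0, 2.0 * np.pi, 20)[:, None]
y = np.sin(X[:, 0]) + 0.05 * np.random.randn(20)
mu, var = gp_predict(X, y, np.array([[1.0], [4.0]]))
```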
3. Multi-kernel Gaussian process latent variable regression model

To avoid the "curse of dimensionality" in high-dimensional sequential data modeling, our model consists of two steps: (1) dimensionality reduction and (2) nonlinear regression. In the former step, a projection function is designed to map the high-dimensional data into a subspace, so that it is possible to optimize the model with little training data. In the latter step, a nonlinear regression model is designed to capture the temporal dependence in the data. Based on these two steps, a general model for high-dimensional sequential data modeling can be expressed as:

$$y_t = f(x_t, A) \qquad (4)$$
$$x_{t+1} = g(x_t, B) \qquad (5)$$

where $y_t \in \mathbb{R}^D$ denotes the observed high-dimensional sequential data indexed by time t, and we define $Y = \{y_1, y_2, \ldots, y_N\}$. $x_t \in \mathbb{R}^Q$ denotes the latent variables which govern the dynamic behavior of the observed data $y_t$; here we assume $Q \ll D$. Similarly, we define $X = \{x_1, x_2, \ldots, x_N\}$. Since it is difficult to obtain prior knowledge about the form of the density function of the sequential data, it is unreasonable to make strong assumptions about the functional forms of f(·) and g(·). Thus, we infer them in a fully Bayesian fashion based on Gaussian processes [32]. We adopt the Gaussian process latent variable model for dimensionality reduction [19] and assume that the sequence $y_i$ is a multivariate GP indexed by $x_i$:

$$p(Y|X, \theta, W_Y) = \prod_{t=1}^{N} p(y_t|x_t, \theta, W_Y) = \frac{1}{(2\pi)^{\frac{DN}{2}} |K_Y|^{\frac{D}{2}}} \exp\left(-\frac{1}{2}\,\mathrm{tr}\big(K_Y^{-1} Y Y^T\big)\right) \qquad (6)$$

where $K_Y$ is the kernel matrix whose elements are generated by a kernel function, $(K_Y)_{i,j} = k_Y(x_i, x_j)$. In some previous research, the form of $K_Y$ is pre-defined, e.g. a linear kernel or a squared-exponential kernel. The form of the kernel function determines the properties of the latent mapping f(·). However, in most cases it is impossible to know a priori which kernel fits the data best, so it is difficult to make a suitable choice when designing the model. Instead of making an assumption about the form of the kernel, we let the data choose the form of the kernel. Thus, we propose a multi-kernel learning model for dimensionality reduction, in which the kernel function $k_Y(x_i, x_j)$ is expressed as:

$$k_Y(x_i, x_j) = \sum_{l=1}^{M} w_{Yl}\, k_{Yl}(x_i, x_j) + w_\delta\, \delta_{x_i, x_j}, \qquad i, j = 1, 2, \ldots, N \qquad (7)$$

where $k_{Yl}$, $l = 1, 2, \ldots, M$ denote different kernel functions and $W_Y = \{w_{Yl}\,|\,l = 1, 2, \ldots, M\}$ are the weights of the kernels. We assume all weights $w_{Yl}$ are positive and $\sum_l w_{Yl} = 1$. $\delta_{x_i, x_j}$ is the Kronecker delta. $\theta = \{\theta_l\,|\,l = 1, \ldots, M\}$ are the kernels' hyperparameters. As the parameters may differ among kernels, they are defined as $\theta_l = \{(\theta_l)_1, \ldots, (\theta_l)_{M_l}\}$, where $M_l$ is the number of parameters of kernel $k_{Yl}$.

As for the regression function g(·), we model it as a first-order Markov model [33] based on a Gaussian process:

$$p(X|\lambda, W_X) = p(x_1) \prod_{t=2}^{N} p(x_t|x_{t-1}, \lambda, W_X) = p(x_1)\, \frac{1}{(2\pi)^{\frac{Q(N-1)}{2}} |K_X|^{\frac{Q}{2}}} \exp\left(-\frac{1}{2}\,\mathrm{tr}\big(K_X^{-1} X_{2:N} X_{2:N}^T\big)\right) \qquad (8)$$

where $K_X$ is the kernel matrix whose elements are generated by a kernel function, $(K_X)_{i,j} = k_X(x_i, x_j)$. The kernel function $k_X$ therefore governs the behavior of the latent variables. Similarly, we adopt the multi-kernel approach for the regression function. The kernel function can be expressed as:

$$k_X(x_i, x_j) = \sum_{l=1}^{\tilde{M}} w_{Xl}\, k_{Xl}(x_i, x_j) + \tilde{w}_\delta\, \delta_{x_i, x_j}, \qquad i, j = 1, 2, \ldots, N-1 \qquad (9)$$

where $k_{Xl}$, $l = 1, 2, \ldots, \tilde{M}$ denote different kernel functions, and $W_X = \{w_{Xl}\,|\,l = 1, 2, \ldots, \tilde{M}\}$ are the kernel weights, which we constrain to be positive and to satisfy $\sum_l w_{Xl} = 1$. $\lambda = \{\lambda_l\,|\,l = 1, \ldots, \tilde{M}\}$ are the kernels' hyperparameters; the parameters of each kernel are further defined as $\lambda_l = \{(\lambda_l)_1, \ldots, (\lambda_l)_{\tilde{M}_l}\}$.

Finally, to discourage overfitting, we place priors on the hyperparameters as in [7]:

$$p(\theta) \propto \prod_i \theta_i^{-1} \quad \text{and} \quad p(\lambda) \propto \prod_{i,j} (\lambda_i)_j^{-1} \qquad (10)$$

Based on the latent mapping (6), the latent regression model (8) and the priors (10), we obtain the multi-kernel Gaussian process latent variable regression (MKGPLVR) model:

$$p(X, Y, \theta, \lambda|W_X, W_Y) = p(Y|X, \theta, W_Y)\, p(X|\lambda, W_X)\, p(\theta)\, p(\lambda) \qquad (11)$$

It should be noted that the regression function can easily be extended to a higher-order Markov model. With the first-order Markov model, we simplify the correlation among the data and assume that each sample depends only on its immediate predecessor. With a higher-order Markov model, it is possible to obtain a more precise model of the training data. For example, the kernel function of the latent regression model based on a second-order Markov model is:

$$K_X\big([x_i, x_{i-1}], [x_j, x_{j-1}]\big) = \sum_{l=1}^{\tilde{M}} w_l\, K_l\big([x_i, x_{i-1}], [x_j, x_{j-1}]\big) + w_\delta\, \delta_{([x_i, x_{i-1}], [x_j, x_{j-1}])} \qquad (12)$$

In (12), the correlation among four neighboring data points is considered, but this significantly increases the computation cost due to the exponentially increasing number of parameters.
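To make the multi-kernel construction in (7) and (9) concrete, here is a small sketch that assembles a compound kernel matrix as a weighted sum of base kernels plus the Kronecker-delta (white noise) term; the particular base kernels, weights and hyperparameter values are illustrative assumptions, not the paper's learned values.

```python
import numpy as np

def linear_kernel(X, theta):
    # k(x_i, x_j) = theta_0 * <x_i, x_j>
    return theta[0] * X @ X.T

def rbf_kernel(X, theta):
    # k(x_i, x_j) = theta_0 * exp(-|x_i - x_j|^2 / (2 * theta_1^2))
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return theta[0] * np.exp(-0.5 * d2 / theta[1]**2)

def multi_kernel_matrix(X, kernels, thetas, weights, w_delta):
    """K_{ij} = sum_l w_l * k_l(x_i, x_j) + w_delta * delta_{ij}, cf. Eq. (7)."""
    K = w_delta * np.eye(X.shape[0])          # Kronecker-delta (noise) term
    for kern, theta, w in zip(kernels, thetas, weights):
        K += w * kern(X, theta)
    return K

# Illustrative usage: two base kernels whose weights sum to one.
X = np.random.randn(50, 5)                    # N = 50 latent points, Q = 5
K_Y = multi_kernel_matrix(
    X, [linear_kernel, rbf_kernel],
    thetas=[[1.0], [1.0, 2.0]], weights=[0.3, 0.7], w_delta=1e-3)
```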
4. Optimization and prediction

Learning the MKGPLVR model from observed data Y entails using numerical optimization to estimate the unknown parameters {X, λ, θ, W_X, W_Y} of the model. A model gives rise to a distribution over new data and their latent coordinates, and we expect modes of this distribution to correspond to new sequential data similar to the training data. In this section, we introduce a method for model optimization and a prediction method for new time series data.

4.1. Optimization

Our objective is to optimize the parameters in model (11), including the latent variables X, the hyperparameters {θ, λ} and the weights of the different kernels, W_X and W_Y, using the given training data Y. A two-step optimization algorithm is designed.

Step 1: We estimate the latent variables X and the hyperparameters {θ, λ} by maximum a posteriori estimation while the kernel weights W_X and W_Y are fixed. According to Bayesian inference, the posterior distribution is

$$p(X, \theta, \lambda|Y) \propto p(Y|X, \theta, \lambda)\, p(X, \theta, \lambda) = p(Y|X, \theta)\, p(X|\lambda)\, p(\theta)\, p(\lambda) \qquad (13)$$

Since W_X and W_Y are fixed, they can be treated as constants. Maximizing the posterior is equivalent to minimizing the negative logarithm of (13). Thus we have:

$$L = -\ln p(X, \theta, \lambda|Y) = \frac{D}{2} \ln|K_Y| + \frac{1}{2}\,\mathrm{tr}\big(K_Y^{-1} Y Y^T\big) + \frac{Q}{2} \ln|K_X| + \frac{1}{2}\,\mathrm{tr}\big(K_X^{-1} X_{2:N} X_{2:N}^T\big) + \sum_i \ln\theta_i + \sum_{i,j} \ln(\lambda_i)_j + C \qquad (14)$$

where C denotes the constant part of L. We minimize L with respect to X, θ and λ numerically using the scaled conjugate gradient optimizer. The gradient of L with respect to X can be found by combining

$$\frac{\partial L}{\partial X_{1:N-1}} = \frac{\partial L}{\partial K_Y} \cdot \frac{\partial K_Y}{\partial X_{1:N-1}} + \frac{\partial L}{\partial K_X} \cdot \frac{\partial K_X}{\partial X_{1:N-1}} + \frac{\partial L}{\partial X_{2:N}} \cdot \frac{\partial X_{2:N}}{\partial X_{1:N-1}} \qquad (15)$$

in which

$$\frac{\partial L}{\partial K_Y} = \frac{D}{2} K_Y^{-T} - \frac{1}{2} K_Y^{-1} Y Y^T K_Y^{-1} \qquad (16)$$
$$\frac{\partial L}{\partial K_X} = \frac{Q}{2} K_X^{-T} - \frac{1}{2} K_X^{-1} X_{2:N} X_{2:N}^T K_X^{-1} \qquad (17)$$
$$\frac{\partial L}{\partial X_{2:N}} = \frac{1}{2}\left(K_X^{-1} + K_X^{-T}\right) X_{2:N} \qquad (18)$$

with $\frac{\partial K_Y}{\partial x_i}$, $\frac{\partial K_X}{\partial x_i}$ and $\frac{\partial X_{2:N}}{\partial x_i}$ ($i = 1, 2, \ldots, N-1$) obtained via the chain rule. The gradient of L with respect to the hyperparameters {θ, λ} can be calculated by combining

$$\frac{\partial L}{\partial \theta_i} = \frac{D}{2}\,\mathrm{tr}\left(K_Y^{-1} \frac{\partial K_Y}{\partial \theta_i}\right) - \frac{1}{2}\,\mathrm{tr}\left(K_Y^{-1} Y Y^T K_Y^{-1} \frac{\partial K_Y}{\partial \theta_i}\right) + \frac{1}{\theta_i} \qquad (19)$$
$$\frac{\partial L}{\partial \lambda_i} = \frac{Q}{2}\,\mathrm{tr}\left(K_X^{-1} \frac{\partial K_X}{\partial \lambda_i}\right) - \frac{1}{2}\,\mathrm{tr}\left(K_X^{-1} X_{2:N} X_{2:N}^T K_X^{-1} \frac{\partial K_X}{\partial \lambda_i}\right) + \frac{1}{\lambda_i} \qquad (20)$$

with

$$\frac{\partial K_Y}{\partial \theta_i} = w_{Yi} \cdot \frac{\partial K_{Yi}}{\partial \theta_i}, \qquad \frac{\partial K_X}{\partial \lambda_i} = w_{Xi} \cdot \frac{\partial K_{Xi}}{\partial \lambda_i}$$

via the chain rule.

Step 2: We estimate the kernel weights W_X and W_Y while the latent variables X and the hyperparameters {θ, λ} are fixed. The optimization problem can be formulated as:

$$\begin{aligned} \underset{W_X,\, W_Y}{\text{minimize}} \quad & \frac{D}{2} \ln|K_Y| + \frac{1}{2}\,\mathrm{tr}\big(K_Y^{-1} Y Y^T\big) + \frac{Q}{2} \ln|K_X| + \frac{1}{2}\,\mathrm{tr}\big(K_X^{-1} X_{2:N} X_{2:N}^T\big) + C \\ \text{subject to} \quad & w_{Xi} \ge 0,\; w_{Yi} \ge 0,\; \sum_i w_{Xi} = 1,\; \sum_i w_{Yi} = 1 \end{aligned} \qquad (21)$$

where K_X and K_Y are defined in (9) and (7), respectively, and C denotes the constant part. Since it is difficult to obtain a closed-form solution, we instead use a gradient descent method to optimize (21) with respect to W_X and W_Y.

Based on Step 1 and Step 2, the optimization procedure is summarized in Algorithm 1.

Algorithm 1. The optimization of MKGPLVR.
1. Input: sequential data Y.
2. Output: the latent variables X, the hyperparameters {θ, λ} and the kernel weights W_X, W_Y.
3. Initialize X by PCA on Y with Q dimensions; initialize the kernel parameters {θ, λ} and the weights W_X, W_Y.
4. For i = 1 to I do
5. Fix W_X, W_Y; optimize (14) with respect to {X, θ, λ} using the gradient descent method for J_1 iterations.
6. Fix {X, θ, λ}; optimize (21) with respect to W_X, W_Y using the gradient descent method for J_2 iterations.
7. For any w_Yl < 0, set w_Yl ← 0; for any w_Xl < 0, set w_Xl ← 0.
8. Normalize W_Y and W_X such that Σ_l w_Yl = 1 and Σ_l w_Xl = 1.
9. End for
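The following sketch shows the overall alternating structure of Algorithm 1, with plain gradient descent standing in for the scaled conjugate gradient optimizer; the callbacks `grad_latent` and `grad_weights` are hypothetical stand-ins for the gradients in (15)–(20) and the gradient of (21), and the loop counts are illustrative.

```python
import numpy as np

def optimize_mkgplvr(Y, Q, grad_latent, grad_weights,
                     n_kernels=4, I=10, J1=50, J2=20, lr=1e-3):
    """Alternating optimization skeleton following Algorithm 1.

    grad_latent(X, theta, lam, wX, wY, Y) -> (dX, dtheta, dlam)  # Eqs. (15)-(20)
    grad_weights(X, theta, lam, wX, wY, Y) -> (dwX, dwY)         # gradient of (21)
    """
    # Initialize X by projecting Y onto its top-Q principal components (PCA).
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    X = Yc @ Vt[:Q].T
    theta = np.ones(n_kernels)                 # illustrative parameter shapes
    lam = np.ones(n_kernels)
    wX = np.full(n_kernels, 1.0 / n_kernels)   # uniform initial kernel weights
    wY = np.full(n_kernels, 1.0 / n_kernels)

    for _ in range(I):
        # Step 1: update {X, theta, lambda} with the kernel weights fixed.
        for _ in range(J1):
            dX, dtheta, dlam = grad_latent(X, theta, lam, wX, wY, Y)
            X, theta, lam = X - lr * dX, theta - lr * dtheta, lam - lr * dlam
        # Step 2: update the kernel weights with {X, theta, lambda} fixed.
        for _ in range(J2):
            dwX, dwY = grad_weights(X, theta, lam, wX, wY, Y)
            wX, wY = wX - lr * dwX, wY - lr * dwY
        # Clip negative weights and renormalize so each set sums to one.
        wX, wY = np.maximum(wX, 0.0), np.maximum(wY, 0.0)
        wX, wY = wX / wX.sum(), wY / wY.sum()
    return X, theta, lam, wX, wY
```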
4.2. Prediction

For many applications, it is desirable to make predictions from the trained sequential data model efficiently. We therefore adopt the mean-prediction method for online prediction. The latent variable $x_t$ conditioned on $x_{t-1}$ can be generated from the first-order Markov model using the Gaussian prediction:

$$x_t \sim \mathcal{N}\left(\mu_X(x_{t-1}),\, \sigma_X^2(x_{t-1})\, I\right) \qquad (22)$$
$$\mu_X(x) = X_{2:N}^T K_X^{-1} k_X(x) \qquad (23)$$
$$\sigma_X^2(x) = k_X(x, x) - k_X(x)^T K_X^{-1} k_X(x) \qquad (24)$$

where $k_X(\cdot,\cdot)$ is the kernel function defined by (9) and $k_X(x)$ is the vector containing $k_X(x, x_i)$ in its i-th entry. In the prediction process, the latent variable $x_t$ is set to the mean point given by the previous time index, $x_t = \mu_X(x_{t-1})$. Similarly, new data can be generated by $y_t = \mu_Y(x_t)$. Moreover, the prediction method can easily be extended to higher-order Markov models.
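A minimal sketch of this mean-prediction rollout follows, under the assumption that the trained latent variables, the inverse kernel matrix and a latent-to-observation mean function `mu_Y` are already available; all names here are illustrative.

```python
import numpy as np

def mean_rollout(x0, X_in, X_out, KX_inv, k_X, mu_Y, T):
    """Roll the model forward T steps with x_t = mu_X(x_{t-1}), y_t = mu_Y(x_t).

    X_in  : regression inputs  X_{1:N-1}, shape (N-1, Q)
    X_out : regression targets X_{2:N},   shape (N-1, Q)
    KX_inv: inverse of the kernel matrix K_X, shape (N-1, N-1)
    k_X   : kernel function from Eq. (9)
    """
    x, ys = x0, []
    for _ in range(T):
        k_vec = np.array([k_X(x, xi) for xi in X_in])  # k_X(x), cf. Eq. (23)
        x = X_out.T @ KX_inv @ k_vec                   # mu_X(x), Eq. (23)
        ys.append(mu_Y(x))                             # reconstruct observation
    return np.stack(ys)
```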
5. Experiments

In this section, we evaluate the performance of the MKGPLVR modeling method. First, the basic experimental setup is introduced. Then, we evaluate the performance of MKGPLVR with subspaces of different dimensionalities, and the computation cost is analyzed. Furthermore, a comparison between MKGPLVR and other related methods is conducted and further analysis is provided.

5.1. Experimental setup

In this paper, we focus on the problem of modeling real-world high-dimensional sequential data. Therefore, instead of using toy data, we evaluate the performance of the proposed method on data sampled from the real world. Two databases are chosen for testing: (1) the CMU graphics lab motion capture database and (2) the DynTex database.

(1) The CMU Graphics Lab Motion Capture Database. The Carnegie Mellon University graphics lab motion capture database (CMU motion capture database for short) contains 2605 trials in 6 categories and 23 subcategories of human motion. Fig. 1 shows a simplified version of the default skeleton in the CMU mocap database. Each pose comprises 56 Euler angles for joints, three global (torso) pose angles, and three global (torso) translational velocities. The database can be downloaded from http://mocap.cs.cmu.edu. In addition, we use the software "Motion Capture Player" for mocap data visualization; it can be downloaded from http://run.usc.edu/cs520-s12/assign2/. In the experiments, for each frame, the global velocity is set to the difference between the next and the current frames; the velocity for the last frame is copied from the second-to-last frame. Thus we have the observed training data $y_t \in \mathbb{R}^{50}$. In our experiments, we choose three types of human motion data, including walking (SN: 07-01), running (SN: 09-06) and golfing (SN: 64-02), as the training samples.

Fig. 1. The skeleton used in our experiments is a simplified version of the default skeleton in the CMU database. The numbers in parentheses indicate the number of degrees of freedom for the joint directly above the labeled body node in the kinematic tree.

(2) The DynTex Database. The DynTex database [34] is a diverse collection of high-quality motion texture videos (as shown in Fig. 2). The latest version contains 650 video sequences of motion textures, mostly in everyday surroundings. It serves as a standard database for different motion texture research areas, such as video texture recognition, video texture segmentation and video texture synthesis. Compared with human motion data, motion texture video has much higher dimension and is much more complex. The database can be downloaded from http://dyntex.univ-lr.fr. For evaluation, we choose three types of natural-scene motion texture video clips: the ascending escalator (SN: 54pd210), the rotating whirligig (SN: 645a920) and the swinging pendant lamp (SN: 64ccf10). Due to the limitation of the computer's memory, all testing images are resized to a resolution of 120 × 90. For each video clip, we use 250 frames (approximately 10 s) as the training sample. All training samples are converted to gray-scale video. Thus we have the observed training data $y_t \in \mathbb{R}^{10800}$. 250 frames are predicted based on the modeling results, using the first frame of the training sample as the initial frame of the predicted sequence.

We compare the proposed MKGPLVR with the LDS [35], the FFT-LDS [36], the GPDM [7], the KADTS [3] and the KPCR [3]. For the LDS model, we utilize the code provided by Doretto from [35]. For GPDM, we utilize the code provided by Wang from [7]; the dimensionality of the latent space is set to 3, as recommended in [7]. For KADTS and KPCR, we utilize the code provided by You from [3]. For MKGPLVR, 7 different types of kernels are utilized in the experiments: the linear kernel (Lin kernel for short), the rational quadratic kernel (Ratquad kernel for short), the radial basis function kernel (Rbf kernel for short), the squared-exponential kernel (Sqexp kernel for short), the Gaussian kernel with a white noise process (GaussWhite for short), the Matérn kernel (with nu = 3/2, Matern32 for short) and the multilayer perceptron kernel (MLP kernel).

5.2. The performance of MKGPLVR with latent spaces of different dimensionalities

In this subsection, we evaluate the performance of MKGPLVR with latent spaces of different dimensionalities and analyze the computation cost of the proposed method. The experiments in this section are conducted on the CMU motion capture database. The dimensionality of the subspace is set to q = {3, 5, 10, 20}. We use the original motion sequence as the reference data to measure the prediction error, which is defined as the Euclidean norm between the predicted frame and the reference frame. The experimental results are shown in Fig. 3.

From the perspective of dimensionality reduction, the dimensionality of the subspace determines the amount of information preserved in the latent subspace projection: the smaller the dimensionality, the less information is preserved. Therefore, to pursue highly accurate modeling, we would choose a subspace with higher dimensionality. The experimental results in Fig. 3 demonstrate that models with larger subspace dimensionality achieve smaller prediction error. However, the prediction error remains stable once the dimensionality of the subspace exceeds 5.

From the perspective of dynamic modeling, the dimensionality of the subspace determines the modeling difficulty and the severity of over-fitting. Since the training samples are limited, we would choose a subspace with lower dimensionality to avoid the over-fitting problem. Furthermore, when the dimensionality of the subspace is too large, the predicted dynamic trajectories are more likely to degenerate into a fixed point. This phenomenon can be observed in Fig. 4, where the vertical red line indicates the number of training samples.

From the perspective of computation cost, the dimensionality of the subspace is not a key factor in the computation complexity of the proposed MKGPLVR. There are two stages in MKGPLVR: the optimization stage and the prediction stage. In the optimization stage, the computation complexity of the learning algorithm depends mainly on the inversion of the kernel matrices $K_Y \in \mathbb{R}^{N \times N}$ and $K_X \in \mathbb{R}^{(N-1) \times (N-1)}$ and on the number of iteration steps $N_I$, which is about $O(N_I N^3)$. In the prediction stage, since only one matrix inversion is conducted in mean prediction, the computation complexity is low and the method can be applied online. It should be mentioned that the computation complexity of MKGPLVR can be further reduced by the variational approximations proposed in [28].

Based on the analysis above, considering both the information preserved during dimensionality reduction and the modeling accuracy of dynamic modeling, the dimensionality of the subspace is set to 5 in the following experiments. It should be noted that when more training samples are available in practice, the dimensionality of the subspace can be set to a larger value.
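For reference, the per-frame prediction error used above and in the later comparisons can be written as follows (our formalization of the Euclidean-norm metric described in the text):

$$e_t = \left\| \hat{y}_t - y_t \right\|_2$$

where $\hat{y}_t$ is the predicted frame and $y_t$ the corresponding reference frame.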
Fig. 2. Sample frames from the DynTex database (left-to-right, top-to-bottom): revolving windmill, rotating whirligig, ascending escalator, warning lamps, revolving bicycle wheel, fountain, controlled fire, swinging pendant lamp.
Fig. 3. The modeling and prediction results of MKGPLVR with latent spaces of different dimensionalities. (a) The human walking sequence; (b) the human running sequence; (c) the human golfing sequence.
5.3. Human motion modeling

In this subsection, we further evaluate our modeling method using human motion sequential data. Since the dynamic behavior of human motion is relatively simple compared with motion texture, we use four types of kernel functions, the Lin kernel, the Ratquad kernel, the Rbf kernel and the Sqexp kernel, to construct the multi-kernel function. The modeling and prediction results are shown in Figs. 5–7.

Fig. 5 shows the modeling and prediction result using the human walking sequence (SN: 07-01) as the training sample. The motion of human walking is smooth and cyclic. As shown in Fig. 5(c), the trajectory of the optimized latent variables is also smooth and cyclic. As shown in Fig. 5(a) and (b), the projection kernel $k_Y(\cdot,\cdot)$ is constructed mainly from the linear kernel and the rational quadratic kernel, while the regression kernel $k_X(\cdot,\cdot)$ is constructed mainly from the linear kernel, the RBF kernel and the squared-exponential kernel. The weights of the RBF and squared-exponential kernels in $k_Y(\cdot,\cdot)$, and the weight of the rational quadratic kernel in $k_X(\cdot,\cdot)$, are very small. This indicates that the kernel forms "Lin+RatQuad" and "Lin+Rbf+SqExp" fit the data better than other kernel forms. Meanwhile, Fig. 5(d) shows sampled frames of the predicted walking motion with a sampling interval of 30.

Fig. 6 shows the modeling and prediction result using the human running sequence (SN: 09-06) as the training sample. Similar to the walking data, the running motion is also cyclic, but with higher motion speed. As shown in Fig. 6(c), the spacing between points in the latent space is larger than for the walking data, reflecting the higher motion speed of the observed data.

Different from the previous two types of human motion, golfing is an acyclic motion, and the swings all contain periods of high acceleration, so the golfing motion is more complex. As shown in Fig. 7(a) and (b), the optimized kernel function is constructed from all four preset basic kernel functions with different weights. Meanwhile, the spacing between points in the latent space is more varied than for the human walking and running data (as shown in Fig. 7(c)). The predicted result is smooth, follows the training data closely, and produces animations of high visual quality. The modeling and prediction results indicate that the proposed model can be applied to both cyclic and acyclic motions.

Furthermore, a comparison has been made among the proposed MKGPLVR, GPDM, LDS, KPCR and KADTS. The error between the predicted sequence and the original sequence is shown in Fig. 8. The average prediction error of MKGPLVR is approximately 10^{-5}, while the average prediction errors of the compared methods are approximately 10^{2}. This indicates that the modeling accuracy of the proposed method is significantly improved. The predicted human motion sequences can be downloaded from https://github.com/zhuziqi.

5.4. Motion texture modeling

In this subsection, we evaluate the performance of MKGPLVR on motion texture video sequences. As mentioned above, 7 different types of kernels are utilized to construct the multi-kernel function. The proposed method is compared with several state-of-the-art methods, including LDS, FFT-LDS, GPDM, KADTS and KPCR.
Fig. 4. 500 predicted human running frames using MKGPLVR with latent spaces of different dimensionalities. (a) 3-dimensional subspace; (b) 5-dimensional subspace; (c) 10-dimensional subspace; (d) 20-dimensional subspace.
Fig. 5. The modeling and prediction result of human walking sequential data. (a) The weight of each kernel function in Kx; (b) the weight of each kernel function in Ky; (c) the visualization of the prediction result of the human walking sequence.
Fig. 6. The modeling and prediction result of human running sequential data. (a) The weight of each kernel function in Kx; (b) the weight of each kernel function in Ky; (c) the visualization of the prediction result of the human running sequence.
Fig. 7. The modeling and prediction result of human golfing sequential data. (a) The weight of each kernel function in Kx; (b) the weight of each kernel function in Ky; (c) the visualization of the prediction result of the human golfing sequence.
The prediction results are shown in Figs. 9–11, and all the predicted video sequences can be downloaded from https://github.com/zhuziqi. The weights of the different kernels are shown in Fig. 12.

As the predicted motion texture video frames in Figs. 9–11 show, the image quality of the video predicted by LDS and FFT-LDS is low. The reason is that LDS-based methods are essentially linear modeling methods and are incapable of capturing the complex behavior of nonlinear high-dimensional sequential data. Compared with LDS, the GPDM method is a nonlinear modeling method. As shown in Figs. 9(b)–11(b), the quality of the video synthesized by GPDM is much better and close to the training sequence. However, the kernel form of GPDM is fixed and uses only the linear kernel and the RBF kernel, so its capability of modeling motion texture video sequences with various nonlinear dynamic behaviors is limited. As can be seen from the frames shown in Fig. 11(b), there is a ghost image of each moving bulb, which indicates that the modeling result does not fit the data well. A similar phenomenon can be observed in the predicted videos. KADTS and KPCR are both typical kernel-based regression methods. By introducing the kernel method into traditional principal component regression and least mean squares regression, their capability of handling nonlinearity in complex data and their flexibility are improved. Such improvement is demonstrated by the quality of the predicted frames shown in Figs. 9(b)–11(b). However, these two methods are deterministic modeling approaches, which suffer from the over-fitting problem, especially when the training samples are limited (Fig. 13).
Fig. 8. The error between the original sequence and the sequences predicted by different modeling methods. (a) Human walking sequence; (b) human running sequence; (c) human golfing sequence.
Fig. 9. Modeling and prediction results of the ascending escalator sequence (SN: 54pd210) from the DynTex database. (a) The frames of the original training sequence. (b) The frames of the predicted sequence.
By introducing the kernel method into probabilistic modeling, MKGPLVR achieves promising prediction results with smooth transitions between frames and high-quality video frames. Compared with the previous five methods, the quality of the predicted video sequences is improved. The kernel weights of Kx and Ky shown in Fig. 12 vary with the training samples, indicating that the proposed method learns different forms of kernel to fit the data. Meanwhile, the kernel form for motion texture video is more complex than that for the human motion data. This is because the motion texture video is more complex than the human motion data from the perspective of probability distribution and dynamic behavior.
Fig. 10. Modeling and prediction results of the rotating whirligig sequence (SN: 645a920) from the DynTex database. (a) The frames of the original training sequence. (b) The frames of the predicted sequence.
We further use the peak signal-to-noise ratio (PSNR) to measure the prediction error and evaluate the objective quality of the prediction results. The PSNR of the proposed method is much higher than that of the other methods and much more stable, meaning that the quality of the synthesized video is stable and does not degenerate with time.
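For completeness, the standard PSNR definition we assume here (not restated in the original text) is:

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right), \qquad \mathrm{MSE} = \frac{1}{HW} \sum_{u=1}^{H} \sum_{v=1}^{W} \left(\hat{I}(u,v) - I(u,v)\right)^2$$

where $\mathrm{MAX}_I$ is the maximum pixel value (255 for 8-bit gray-scale frames), and $\hat{I}$ and $I$ are the predicted and reference frames of size $H \times W$.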
Fig. 11. Modeling and prediction results of the swinging pendant lamp sequence (SN: 64ccf10) from the DynTex database. (a) The frames of the original training sequence. (b) The frames of the predicted sequence.
Fig. 12. The kernels of Kx and Ky learned from different motion texture sequences. (a) The ascending escalator sequence (SN: 54pd210); (b) the rotating whirligig sequence (SN: 645a920); (c) the swinging pendant lamp sequence (SN: 64ccf10).
Fig. 13. The PSNR for the video sequences predicted by the proposed MKGPLVR and other methods, with the original video sequence as the reference. (a) The ascending escalator sequence (SN: 54pd210); (b) the rotating whirligig sequence (SN: 645a920); (c) the swinging pendant lamp sequence (SN: 64ccf10).
6. Conclusions

In this paper, we proposed a multi-kernel Gaussian process latent variable regression model for high-dimensional sequential data modeling. In our method, the modeling process is divided into two steps: dimensionality reduction and latent regression modeling. First, to capture the internal behavior of the sequential data and avoid the curse of dimensionality in the training process, we design a multi-kernel Gaussian process latent variable model for dimensionality reduction. Compared with existing methods, our approach is more flexible for different types of highly complex high-dimensional sequential data. Second, we propose a multi-kernel Gaussian process regression model for the latent trajectory. Similar to the previous step, by introducing multi-kernel learning, the capability of modeling the complex behavior of sequential data is improved. We evaluated the effectiveness of our method on both the CMU human motion dataset and the DynTex motion texture video dataset. Experimental results show that our method achieves great improvement compared with existing methods.

Acknowledgments

This work was supported in part by the National Key Research and Development Program of China (No. 2016YFC0800705-2), in part by the Key Program for International S&T Cooperation Projects of China (No. 2016YFE0121200), in part by the Provincial Natural Science Fund of Hubei (Nos. 2016CFB117 and 2018CFB195), in part by the Provincial Education Office Science and Technology Research Project Youth Talent Project of Hubei (No. Q20161113), and in part by the National Natural Science Foundation of China (No. 61702382).

Appendix A. Notation and nomenclature
φ_i: the autoregressive parameters
l: the location of y in space-time
l_i: a lag for location l
A: the dynamic matrix for LDS
B: the projection matrix for LDS
Y = {y_i | i = 1, ..., N}: the observed sequential data
X = {x_i | i = 1, ..., N}: the latent variables
X_{2:N}: X_{2:N} = {x_i | i = 2, ..., N}
X_{1:N−1}: X_{1:N−1} = {x_i | i = 1, ..., N−1}
N: the length of the observed sequential data
D: the dimensionality of the observed data
Q: the dimensionality of the latent variables
K_Y: the kernel matrix for dimensionality reduction
K_X: the kernel matrix for dynamic regression
k_{Yl}, l = 1, 2, ..., M: the kernel functions used to generate K_Y
k_{Xl}, l = 1, 2, ..., M̃: the kernel functions used to generate K_X
θ = {θ_l | l = 1, ..., M}: the hyperparameters for k_{Yl}
λ = {λ_l | l = 1, ..., M̃}: the hyperparameters for k_{Xl}
W_Y: the weights for the compound kernel K_Y
W_X: the weights for the compound kernel K_X
δ_{x_i,x_j}: the Kronecker delta
w_δ: the weight for δ_{x_i,x_j}
C: a constant
L: the loss function
μ_X(x_t): the predicted mean value of x at time t
μ_Y(x_t): the reconstructed mean value of y given x_t
σ_X^2(x): the variance of x
References

[1] Q. Yang, X. Wu, 10 challenging problems in data mining research, Int. J. Inf. Technol. Decis. Making 5 (4) (2006) 597–604.
[2] G. Box, Box and Jenkins: Time Series Analysis, Forecasting and Control, Palgrave Macmillan UK, London, pp. 161–215.
[3] X. You, W. Guo, S. Yu, K. Li, J.C. Príncipe, D. Tao, Kernel learning for dynamic texture synthesis, IEEE Trans. Image Process. 25 (10) (2016) 4782–4795.
[4] S. Yi, Z. Lai, Z. He, Y.-m. Cheung, Y. Liu, Joint sparse principal component analysis, Pattern Recognit. 61 (2017) 524–536.
[5] X. You, Q. Peng, Y. Yuan, Y. Cheung, J. Lei, Segmentation of retinal blood vessels using the radial projection and semi-supervised approach, Pattern Recognit. 44 (10) (2011) 2314–2324.
[6] Y. Wei, X. You, H. Li, Multiscale patch-based contrast measure for small infrared target detection, Pattern Recognit. 58 (2016) 216–226.
[7] J.M. Wang, D.J. Fleet, A. Hertzmann, Gaussian process dynamical models for human motion, IEEE Trans. Pattern Anal. Mach. Intell. 30 (2) (2008) 283–298.
[8] C. Hong, J. Yu, J. Wan, D. Tao, M. Wang, Multimodal deep autoencoder for human pose recovery, IEEE Trans. Image Process. 24 (12) (2015) 5659–5670.
[9] W. Liu, Z.-J. Zha, Y. Wang, K. Lu, D. Tao, p-Laplacian regularized sparse coding for human activity recognition, IEEE Trans. Ind. Electron. 63 (8) (2016) 5120–5129.
[10] G. Doretto, A. Chiuso, Y.N. Wu, S. Soatto, Dynamic textures, Int. J. Comput. Vis. 51 (2) (2003) 91–109.
[11] A. Schödl, R. Szeliski, D.H. Salesin, I. Essa, Video textures, in: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, ACM Press/Addison-Wesley Publishing Co., 2000, pp. 489–498.
[12] Y. Guo, G. Zhao, Z. Zhou, M. Pietikainen, Video texture synthesis with multi-frame LBP-TOP and diffeomorphic growth model, IEEE Trans. Image Process. 22 (10) (2013) 3879–3891.
[13] X. Yang, W. Liu, D. Tao, J. Cheng, Canonical correlation analysis networks for two-view image recognition, Inf. Sci. 385 (2017) 338–352.
[14] A. Schulz, J. Brinkrolf, B. Hammer, Efficient kernelisation of discriminative dimensionality reduction, Neurocomputing 268 (2017) 34–41.
[15] B. Liu, Y. Zhou, Z.-g. Xia, P. Liu, Q.-y. Yan, H. Xu, Spectral regression based marginal Fisher analysis dimensionality reduction algorithm, Neurocomputing 277 (2018) 101–107.
[16] D. Zhang, Q. Zhu, D. Zhang, Multi-modal dimensionality reduction using effective distance, Neurocomputing 259 (2017) 130–139.
[17] X. Chen, H. Yin, F. Jiang, L. Wang, Multi-view dimensionality reduction based on universum learning, Neurocomputing 275 (2018) 2279–2286.
[18] S. Yi, Z. He, Y.-m. Cheung, W.-S. Chen, Unified sparse subspace learning via self-contained regression, IEEE Trans. Circ. Syst. Video Technol. (2017).
[19] N.D. Lawrence, Gaussian process latent variable models for visualisation of high dimensional data, Adv. Neural Inf. Process. Syst. 16 (3) (2004) 329–336.
[20] M. Szummer, R.W. Picard, Temporal texture modeling, in: Proceedings of the International Conference on Image Processing, IEEE, 1996, pp. 823–826.
[21] W. Ou, X. You, D. Tao, P. Zhang, Y. Tang, Z. Zhu, Robust face recognition via occlusion dictionary learning, Pattern Recognit. 47 (4) (2014) 1559–1572.
[22] J.K. Ord, Space-time modeling with an application to regional forecasting, Trans. Inst. Brit. Geograph. 64 (1975) 119–128.
[23] G. Doretto, E. Jones, S. Soatto, Spatially homogeneous dynamic textures, in: Proceedings of the European Conference on Computer Vision, Springer, 2004, pp. 591–602.
[24] J. Filip, M. Haindl, D. Chetverikov, Fast synthesis of dynamic colour textures, in: Proceedings of the 18th International Conference on Pattern Recognition, vol. 4, IEEE, 2006, pp. 25–28.
[25] M. Haindl, J. Filip, Dynamic Textures, Springer, pp. 97–117.
[26] R. Vidal, A. Ravichandran, Optical flow estimation & segmentation of multiple moving dynamic textures, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 2, IEEE, 2005, pp. 516–521.
[27] Z. Ghahramani, G.E. Hinton, Variational learning for switching state-space models, Neural Comput. 12 (4) (2000) 831–864.
[28] N.D. Lawrence, Learning for larger datasets with the Gaussian process latent variable model, in: Proceedings of AISTATS, vol. 11, 2007, pp. 243–250.
[29] M.K. Titsias, N.D. Lawrence, Bayesian Gaussian process latent variable model, in: Proceedings of AISTATS, 2010, pp. 844–851.
[30] Z. Ghahramani, Probabilistic machine learning and artificial intelligence, Nature 521 (7553) (2015) 452–459.
[31] X. You, L. Du, Y.-m. Cheung, Q. Chen, A blind watermarking scheme using new nontensor product wavelet filter banks, IEEE Trans. Image Process. 19 (12) (2010) 3271–3284.
[32] C.E. Rasmussen, Gaussian processes in machine learning, in: Advanced Lectures on Machine Learning, Springer, 2004, pp. 63–71.
[33] Z. He, X. You, Y.Y. Tang, Writer identification of Chinese handwriting documents using hidden Markov tree model, Pattern Recognit. 41 (4) (2008) 1295–1307.
[34] R. Péteri, S. Fazekas, M.J. Huiskes, DynTex: a comprehensive database of dynamic textures, Pattern Recognit. Lett. 31 (12) (2010) 1627–1632.
[35] G. Doretto, A. Chiuso, Y.N. Wu, S. Soatto, Dynamic textures, Int. J. Comput. Vis. 51 (2) (2003) 91–109.
[36] B. Abraham, O.I. Camps, M. Sznaier, Dynamic texture with Fourier descriptors, in: Proceedings of the 4th International Workshop on Texture Analysis and Synthesis, vol. 1, 2005, pp. 53–58.
Jiayuan Zhang received the B.S. degree in computer science from Wuhan University of Science and Technology, Wuhan, China, in 2015. He is currently pursuing the Master's degree at Wuhan University of Science and Technology, Wuhan, China. His current research interests include topics in probabilistic inference, computer vision and machine learning.
Jixin Zou received the B.S. degree in environmental engineering from China Agricultural University, Beijing, China, in 2004, the Master's degree in applied chemistry from China Agricultural University, Beijing, China, in 2006, and the Ph.D. degree in law from the Chinese People's Public Security University, Beijing, China, in 2013. He is currently a Senior Researcher with the Institute of Forensic Science, Ministry of Public Security, Beijing, China. His current research interests include topics in computer vision and counterfeit money analysis and recognition.
Chunhua Deng received his B.S., M.S., and Ph.D. degrees from Huazhong University of Science and Technology, Wuhan, China. He is currently a lecturer with the School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China. His research interests include topics in computer vision, image processing, and machine learning.
Ziqi Zhu received the B.S. degree in computer science from Wuhan University, Wuhan, China, in 2005, and the Ph.D. degree in computer science from the Huazhong University of Science and Technology, Wuhan, in 2011. He is currently an Associate Professor with the School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China. His current research interests include topics in computer vision, pattern recognition and machine learning.