Image super-resolution via 2D tensor regression learning


Computer Vision and Image Understanding xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Computer Vision and Image Understanding journal homepage: www.elsevier.com/locate/cviu

Image super-resolution via 2D tensor regression learning ☆

Ming Yin (a), Junbin Gao (b, ⇑), Shuting Cai (a)

(a) School of Automation, Guangdong University of Technology, Guangzhou, China
(b) School of Computing and Mathematics, Charles Sturt University, Bathurst, NSW 2795, Australia

Article info

Article history: Received 31 December 2013; Accepted 27 November 2014; Available online xxxx

Keywords: Image super-resolution; Tensor regression; Multi-task regression; Non-negative constraint; Orthogonal constraint; Frobenius norm

Abstract

Among example-based learning methods for image super-resolution (SR), the mapping function between a high-resolution (HR) image and its low-resolution (LR) version plays a critical role in the SR process. This paper presents a novel 2D tensor regression learning framework for single-image SR reconstruction. From the viewpoint of image statistics, the statistical matching relationship between an HR image patch and its LR counterpart can be represented efficiently in tensor spaces. Specifically, we define a generalized 2D tensor regression framework between HR and LR image patch pairs. This framework learns a set of tensor coefficients that capture the statistical dependency between HR and LR patches. Imposing different constraint terms on the framework leads to interesting interpretations of the linear mapping function relating the LR and HR image patch spaces. Finally, the HR image is synthesized from a set of patches of a single LR input image using the learned tensor regression model. Experimental results show that our algorithm generates HR images that are competitive with, or even superior to, those produced by other similar SR methods in both PSNR (peak signal-to-noise ratio) and visual quality.

© 2014 Elsevier Inc. All rights reserved.

1. Introduction

As a software-based resolution enhancement technique, image super-resolution aims to recover high-resolution (HR) images from low-resolution (LR) input images [27]. Image super-resolution (SR) processing is often desirable for low-cost imaging devices with limited resolution [13]. It is also useful in applications such as mobile phones, satellite imaging, video surveillance, microscopy and digital mosaicing. SR can also provide high-quality content for increasingly capable modern HR displays. The basic idea behind SR is to fuse a single noisy, blurred LR image, or a sequence of them, into an HR image or sequence. In many cases, multiple LR images of the same scene provide more information for recovering an HR image of that scene. In practice, however, only a few LR images, or even a single one, may be available. An SR technique that recovers the HR image from a single LR input is therefore more practical [25,39]. In this paper, we only consider the case where the input is a single LR image. In general, the SR task is cast as an inverse problem [7] of recovering the original HR image by fusing the observed LR images.

☆ This paper has been recommended for acceptance by C.V. Jawahar.

⇑ Corresponding author.

The inverse problem is formulated under the following generic model,

$$ y = Hx, \tag{1} $$

where y is the observed LR image (vectorized) and x is the unknown HR image (vectorized). The matrix H represents the imaging system, which consists of several processes such as blurring and down-sampling. However, recovering x from y under the above model is severely ill-posed. The LR images carry insufficient information, so the solution satisfying the reconstruction constraint is not unique.

Generally speaking, existing super-resolution methods fall roughly into three categories: interpolation-based [5], reconstruction-based [22] and learning-based [3] approaches. Interpolation-based techniques have their roots in sampling theory, and the HR image is recovered directly by interpolating the LR input. These approaches tend to blur high-frequency details. The result is noticeably smooth images with ringing and jagged artifacts, particularly along edges. They nevertheless remain popular because of their computational simplicity.

In reconstruction-based approaches, the SR problem is cast as an inverse problem [7] of recovering an HR image. This relies on a reasonable observation model that maps the HR image to the LR image(s), together with prior knowledge [6] about HR images. Many kinds of regularization [24] are incorporated into the model as prior knowledge to stabilize the inversion of this ill-posed problem; see [11,30].

http://dx.doi.org/10.1016/j.cviu.2014.11.005
1077-3142/© 2014 Elsevier Inc. All rights reserved.

Please cite this article in press as: M. Yin et al., Image super-resolution via 2D tensor regression learning, Comput. Vis. Image Understand. (2014), http://dx.doi.org/10.1016/j.cviu.2014.11.005

M. Yin et al. / Computer Vision and Image Understanding xxx (2014) xxx–xxx

However, the performance of these approaches is acceptable only for small upscaling factors. This limitation led to the development of example-based learning approaches [12], which learn the co-occurrence prior between local HR and LR image structures from an external training database [3,29]. Learning-based methods are the contemporary approach to super-resolution and have produced promising results in reported experiments.

In [3], Chang et al. adopted the philosophy of Locally Linear Embedding (LLE) from manifold learning to recover a high-resolution image from its low-resolution counterpart. Notably, Yang et al. [37] proposed a learning-based super-resolution scheme based on sparse representation. Gao et al. [13] implemented sparse dictionary learning for image super-resolution using the state-of-the-art Restricted Boltzmann Machine (RBM). Fundamentally, all the approaches mentioned rely on an assumed linear (possibly locally linear) relationship between LR and HR pairs. To better exploit the underlying context information of images, Yang et al. [39] modeled the relationship between HR and LR patches by learning similar textural contexts for image sparse representation. To enhance the performance of image restoration (IR), Dong et al. [10] introduced two adaptive regularization terms into an adaptive sparse representation framework: a piecewise autoregressive (AR) model and non-local (NL) self-similarity. To further improve sparse-representation-based IR, the same authors [9] proposed the concept of sparse coding noise and recast the IR goal as suppressing the sparse coding noise.
Other researchers have proposed SR techniques that exploit nonlinear relations. For example, Gaussian process (GP) regression models have been used to learn such nonlinear mappings at the pixel level [16]. The resulting HR images are visually pleasing, but the computational overhead is huge, because the Gaussian models have to be recomputed for each pixel of each input image. Extending their earlier work [37], Yang et al. proposed a bilevel optimization model for coupled dictionary training within the sparse representation framework [36]. For image SR, they considered the case where the mapping function may take nonlinear forms. Their experiments showed that the new learning method outperforms their earlier joint dictionary training method both quantitatively and qualitatively.

In all the above techniques for learning the mapping function, the HR/LR image patches are explicitly vectorized, so important spatial information among pixels tends to be lost. To exploit such spatial information effectively, an appropriate feature representation for image patches is desirable. The 2D tensor [1] is an effective representation for images that preserves the spatial relationships between pixels. Jia et al. [18] proposed a Bayesian framework to perform face image super-resolution for recognition in tensor space. Furthermore, Wu et al. [34] proposed a regression model in the tensorPCA subspace for face super-resolution reconstruction. Their aim was to explore local spatial information effectively and avoid the curse of dimensionality. They learn the tensor subspaces for the HR images and their LR counterparts separately. It is more desirable, however, to learn the matching relations between LR and HR images directly. Tensor learning methods are currently drawing considerable attention [15,41].
Motivated by this popular tool, in this paper we propose a generalized 2D tensor regression learning framework for single-image super-resolution. The framework learns tensor coefficients for HR and LR image patch pairs simultaneously. The mapping function is central to learning-based SR, because SR quality largely depends on how well it represents the underlying relation between HR and LR pairs. It is also well known that patches in a natural image tend to recur many times, both within the same scale and across different scales. This observation motivates us to better exploit the relationship between HR/LR patch pairs for the SR task. Unlike existing vectorization-based methods, our approach preserves the spatial information of pixels, since image patches are represented as tensorial data. By imposing different constraint terms, we obtain three kinds of 2D tensor regression learning tasks for image SR.

Our main contributions are summarized as follows. First, by taking advantage of the tensorial representation, we propose a general 2D tensor regression learning framework to learn a mapping function for image super-resolution. Second, to stabilize the solution of the 2D tensor learning task, we impose three different regularization terms on the generic model: an orthogonal constraint, a squared ℓ2-norm constraint and a non-negative constraint. Together these leverage the power of the framework for image super-resolution.

The remainder of the paper is organized as follows. In Section 2, a generalized 2D tensor regression learning framework is proposed to learn a mapping function for the single-image SR problem. Section 3 gives a detailed description of the algorithms. In Section 4, we discuss how to apply the learned mapping function to single-image SR. Extensive experimental results for image SR and their analysis are reported in Section 5. They show that the proposed methods are quantitatively and qualitatively competitive with, or even superior to, existing interpolation-based and learning-based SR approaches. Finally, conclusions are drawn in Section 6.

2. 2D tensor regression learning framework

To review some related learning-based SR techniques and introduce our proposed model, we first introduce some notation from tensor algebra that will be used throughout this paper.
More specifically, matrices are denoted by capital letters, e.g., $X$, vectors by boldface lowercase letters, e.g., $\mathbf{x}$, and scalars by lowercase letters, e.g., $x$. Tensors are denoted by Euler script calligraphic letters, e.g., $\mathcal{X}$. In [15], Guo et al. considered the following linear regression model based on the tensor representation,

$$ y = \langle \mathcal{X}, \mathcal{W} \rangle + b. $$

As in the vectorial case, the inner product $\langle \mathcal{X}, \mathcal{W} \rangle$ is defined as the sum of the elementwise products of two tensors $\mathcal{X}$ and $\mathcal{W}$ of the same size. In least squares learning, we may assume $b = 0$ if all the training data are centralized by removing their mean. In this paper, we are particularly interested in the case where $\mathcal{X}$ and $\mathcal{W}$ are 2D tensors (matrices). In particular, we consider the following bilinear model

$$ y = \mathbf{u} X \mathbf{v}^T + b, $$

where $\mathbf{u}$ and $\mathbf{v}$ are two parameter vectors. To apply this model in our SR setting, we need to generalize the bilinear model to image-patch outputs.

To use 2D visual data efficiently in the SR setting, Wu et al. [34] proposed a regression model over tensorPCA subspaces [1]. Their approach learns an HR tensor subspace and an LR tensor subspace under the PCA criterion, i.e., by maximizing the tensor variances. The HR subspace is learned from the training HR tensors $\mathcal{X} = \{X_i\}_{i=1}^N$, and the LR subspace from their LR counterparts $\mathcal{Y} = \{Y_i\}_{i=1}^N$. Each subspace defines a mapping $Y = UXV^T$ from a higher-dimensional 2D tensor (e.g., an HR image patch) to a lower-dimensional 2D tensor. Here $U$ and $V$ are the left and right projection matrices, respectively. Once the tensorPCA subspace projections for the HR and LR tensors are obtained, a co-occurrence linear model is trained [34]. This model captures the relationship between the HR and LR tensorPCA subspaces. More specifically, an approximate conditional probability model is applied to the tensor subspace coefficients, and the maximum-likelihood (ML) estimator yields an ordinary linear regression model.

Motivated by [15,34], in this section we focus on how to train the mapping function between LR and HR patch pairs using tensorial data, and we propose a generalized 2D tensor regression learning framework. Without loss of generality, assume a set of HR and LR image patch pairs $\{(X_i, Y_i)\}_{i=1}^N$, where each $X_i \in \mathbb{R}^{m \times n}$ and $Y_i \in \mathbb{R}^{p \times q}$, with $m > p$ and $n > q$.

Traditional learning-based super-resolution methods usually model the relationship between the HR patches $\{X_i\}_{i=1}^N$ and their LR counterparts $\{Y_i\}_{i=1}^N$ by exploiting priors of specific images [6,7,24,37]. However, these models treat each image patch as a single feature vector, so the spatial relations between pixels tend to be lost in this conversion. As noted in [34], vectorization degrades the underlying structural information, because spatial locality is lost. Moreover, vectorizing tensor data often produces algorithms that suffer from the curse of dimensionality. Tensors are often a more natural representation of visual data, as observed in [41].

The method in [34] assumes a linear regression model on the learned tensor subspaces. In contrast, we directly regress the HR image patches $\{X_i\}_{i=1}^N$ on the LR inputs $\{Y_i\}_{i=1}^N$ with a multiple tensor regression learning model. Our proposed model can be formulated as follows,

$$ X_i = U Y_i V^T, \tag{2} $$

where $U$ and $V$ are the left and right parameter matrices, respectively. The model may seem counter-intuitive, since it uses less information (LR) to predict more information (HR). However, an implicit relation exists between $X_i$ and $Y_i$ that makes recovery possible. The model still defines a linear relation between $\{X_i\}$ (output) and $\{Y_i\}$ (input), although its coefficients are constrained. It is this constraint that forces the model to learn the information shared among spatially related pixels, as the following example shows. Consider a pixel $x_{ij}$ of an HR patch $X$. Under model (2), we can write $x_{ij}$ as follows

$$ x_{ij} = \mathbf{u}_i Y \mathbf{v}_j = \sum_{k,l} u_{ik} v_{jl} y_{kl}, $$

where $\mathbf{u}_i = (u_{i1}, \ldots, u_{ip})$ is the $i$th row of $U$. Similarly, $\mathbf{v}_j = (v_{j1}, \ldots, v_{jq})^T$ is the $j$th column of $V^T$, i.e., the $j$th row of $V$ written as a column vector. The $y_{kl}$ are the pixels of image patch $Y$. In this bilinear relation, the coefficient of $y_{kl}$ is decomposed into two factors, $u_{ik}$ and $v_{jl}$, which capture row and column relations independently. In fact, if we vectorize both $X_i$ and $Y_i$, then (2) is equivalent to the following structured linear model,

$$ \mathrm{vec}(X_i) = (V \otimes U)\,\mathrm{vec}(Y_i), $$

where $\mathrm{vec}(\cdot)$ stacks the columns of a matrix and $\otimes$ denotes the Kronecker product. The overall coefficient matrix $W = V \otimes U$ is thus composed of the two separate matrices $U$ and $V$. In other words, the generic LR image formation model (1) is defined with $H = W^{+} = (V \otimes U)^{+}$, where $W^{+}$ is the pseudo-inverse of $W$.

Inspired by advances in tensor regression learning [15], we consider the following generalized 2D tensor learning task. Given a training set $\{(X_i, Y_i)\}_{i=1}^N$, our goal is to find two appropriate parameter matrices $U$ and $V$ that satisfy the following criterion

$$ \min_{U,V} \sum_{i=1}^{N} \|X_i - U Y_i V^T\|_F^2 + J_U(U) + J_V(V), \tag{3} $$

where $\|\cdot\|_F$ is the matrix Frobenius norm, defined as $\|X\|_F^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}^2$. The functionals $J_U(U)$ and $J_V(V)$ are regularization terms that enforce application-dependent characteristics of the optimal solution.

2.1. Orthogonally constrained 2D tensor learning model

To obtain stable parameter matrices $U$ and $V$, we propose the following regularized version, named the Orthogonally Constrained Learning Model:

$$ \min_{U^TU = I,\; V^TV = I} \sum_{i=1}^{N} \|X_i - U Y_i V^T\|_F^2, \tag{4} $$

where $I$ is the identity matrix. The explicit motivation for this regularization is to regard $U$ and $V$ as projection operators (or bases) along the row and column directions. An algorithm for solving problem (4) is proposed in the next section. Once $U$ and $V$ have been learnt, the HR image patch $X$ can easily be computed from the LR image patch $Y$. The whole HR image is then recovered by averaging the corresponding HR patches given by the model.

Remark 1. Here we stress the difference between our learning model and the tensor regression learning presented in [15]. In our case, the regression output is an image patch rather than a scalar as in [15,41]. Indeed, our model can be regarded as multi-task tensor regression learning [35], since all the output pixels share the same pair of parameter matrices. Under this constraint, a model that accounts for 2D spatial information is likely to perform better. Consequently, it is likely that a better relationship between HR patches and LR inputs can be learned.

2.2. Squared ℓ2-norm constrained 2D tensor learning model

The orthogonally constrained SR model above imposes the strong conditions $U^TU = I$ and $V^TV = I$ on $U$ and $V$, respectively. Thus (4) is an optimization problem over the Stiefel manifold [31]. We can relax the orthogonality conditions by regularizing the norms of $U$ and $V$ instead. Here we set $J_U(U) = \sum_{i,j} u_{ij}^2$ and $J_V(V) = \sum_{i,j} v_{ij}^2$. This can be implemented as an unconstrained regularized problem by introducing Lagrange multipliers $\lambda_1$ and $\lambda_2$:

$$ \min_{U,V} \sum_{i=1}^{N} \|X_i - U Y_i V^T\|_F^2 + \lambda_1 \|U\|_F^2 + \lambda_2 \|V\|_F^2. \tag{5} $$
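Model (2), its vectorized form and the objective (5) can be checked numerically. The following is a minimal NumPy sketch, not the authors' code. The helper names (`bilinear_predict`, `vec`, `regularized_objective`) and the matrix sizes are illustrative assumptions. `vec` stacks columns (Fortran order), which is the convention under which $\mathrm{vec}(UYV^T) = (V \otimes U)\,\mathrm{vec}(Y)$ holds.

```python
import numpy as np


def bilinear_predict(U, Y, V):
    """Model (2): map an LR patch Y (p x q) to an HR patch X = U Y V^T (m x n)."""
    return U @ Y @ V.T


def vec(A):
    """Column-major (Fortran-order) vectorization."""
    return A.reshape(-1, order="F")


def regularized_objective(U, V, Xs, Ys, lam1, lam2):
    """Objective (5): sum_i ||X_i - U Y_i V^T||_F^2 + lam1 ||U||_F^2 + lam2 ||V||_F^2."""
    fit = sum(np.linalg.norm(X - bilinear_predict(U, Y, V), "fro") ** 2
              for X, Y in zip(Xs, Ys))
    return fit + lam1 * np.sum(U ** 2) + lam2 * np.sum(V ** 2)


rng = np.random.default_rng(0)
m, n, p, q = 6, 5, 3, 2                      # illustrative sizes with m > p and n > q
U = rng.standard_normal((m, p))
V = rng.standard_normal((n, q))
Y = rng.standard_normal((p, q))
X = bilinear_predict(U, Y, V)

# Elementwise form: x_ij = sum_{k,l} u_ik * v_jl * y_kl
X_elementwise = np.einsum("ik,jl,kl->ij", U, V, Y)

# Structured linear form: vec(X) = W vec(Y) with W = V kron U, of size (m*n) x (p*q)
W = np.kron(V, U)

# Scalar bilinear model: u Z v^T equals <Z, u^T v>, the sum of elementwise products
u, v, Z = rng.standard_normal(m), rng.standard_normal(n), rng.standard_normal((m, n))
scalar_bilinear = u @ Z @ v
scalar_inner = np.sum(Z * np.outer(u, v))

print(np.allclose(X, X_elementwise), np.allclose(vec(X), W @ vec(Y)),
      np.isclose(scalar_bilinear, scalar_inner))  # prints: True True True
```

The explicit matrix $W$ has $mnpq$ entries, while $U$ and $V$ together have only $mp + nq$. This is the parameter saving that the bilinear structure buys.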

We call model (5) the Regularized Learning Model. In this model, we impose the squared ℓ2-norm on the variables in order to find a smooth solution of (3). Since (5) is an unconstrained optimization problem, it is relatively easy to solve with a standard optimization algorithm such as gradient descent [4]. Algorithms for this model are discussed in the next section.

2.3. Non-negative constrained 2D tensor learning model

Both $U$ and $V$ in (2) can be considered as up-sampling and filtering operators. We therefore propose to find $U$ and $V$ with only non-negative elements by imposing non-negativity constraints on both of them:

$$ \min_{U \ge 0,\; V \ge 0} \sum_{i=1}^{N} \|X_i - U Y_i V^T\|_F^2, \tag{6} $$

where $U \ge 0$ and $V \ge 0$ mean that all the matrix elements are non-negative. Optimization problem (6) is termed the Non-negative Learning Model. The non-negative SR model has a more natural interpretation: an HR image is recovered by a purely additive combination of low-resolution image patches.

Non-negative Matrix Factorization (NMF) [20,26] is a useful tool for finding a suitable representation of data, and it typically reveals latent structures in the given data. Applications of NMF are broad, including image processing [40], text data mining [2,23], face recognition [28] and subspace learning [38]. To find such a parts-based subspace, NMF is formulated as the following optimization problem,

$$ \min_{U \ge 0,\; V \ge 0} \|X - UV\|_F^2, $$

where $X$ is the data matrix, $U$ is the basis matrix and $V$ is the coefficient matrix. The inequalities are referred to as bound constraints, since the variables are bounded from below. In our context, model (6) imposes non-negativity constraints on the parameter matrices $U$ and $V$. This leads to a parts-based HR image representation, because only additive, not subtractive, combinations of the LR patch data are allowed.

Many algorithms and variants have been proposed for the NMF problem in recent years [8,19,20,26]. The NMF objective is convex in $U$ or in $V$ when the other is fixed, but it is not jointly convex in both. Paatero et al. [26] presented a gradient algorithm for this optimization. Lee and Seung [20] provided a multiplicative update rule that is simpler to implement and also performs well. Despite its simplicity, however, this algorithm is not guaranteed to converge to a stationary point. To tackle this drawback, Kim and Park [19] proposed the alternating non-negative least squares (NNLS) algorithm, whose convergence to a stationary point has been proved. Meanwhile, Ding et al. [8] extended traditional NMF to semi-NMF, which removes the non-negativity constraint on the data $X$ and the basis matrix $U$. They applied a gradient-descent-based update rule to solve the semi-NMF problem efficiently. Similarly, model (6) can also be tackled by projected gradient descent methods [21]. In this paper, we use alternating coordinate algorithms to solve the problem; the details for model (6) are given in the following section.

3. 2D tensor learning algorithm description

In this section, we detail how to solve the 2D tensor learning SR optimization problems under the different constraints on the parameter matrices $U$ and $V$.

3.1. Orthogonal constrained SR model

First we rewrite the objective function of (4) as follows,

$$ \begin{aligned} \sum_i \|X_i - UY_iV^T\|_F^2 &= \sum_i \operatorname{tr}\big[(X_i - UY_iV^T)^T (X_i - UY_iV^T)\big] \\ &= \sum_i \operatorname{tr}(Y_i^TY_i) + \sum_i \operatorname{tr}(X_i^TX_i) - 2\sum_i \operatorname{tr}(UY_iV^TX_i^T), \end{aligned} \tag{7} $$

where $\operatorname{tr}(\cdot)$ is the trace function, and the second equality uses the constraints $U^TU = I$ and $V^TV = I$. Since $\{X_i\}$ and $\{Y_i\}$ are constants, the above problem is equivalent to

$$ \max_{U^TU = I,\; V^TV = I} \sum_i \operatorname{tr}(UY_iV^TX_i^T). \tag{8} $$

Note that problem (8) is not jointly convex in $U$ and $V$. However, with either variable fixed, the subproblem in the other can be solved exactly. For example, when $V$ is known and fixed, problem (8) becomes a well-posed problem that can be solved efficiently [32],

$$ \max_{U^TU = I} \sum_i \operatorname{tr}(UY_iV^TX_i^T) = \max_{U^TU = I} \operatorname{tr}\Big(U \sum_i Y_iV^TX_i^T\Big). \tag{9} $$

Let $M = \sum_i Y_iV^TX_i^T$, and suppose its SVD is $\operatorname{svd}(M) = \hat{U}\Sigma\hat{V}^T$. Then the objective in (9) can be written as

$$ \operatorname{tr}\Big(U\sum_i Y_iV^TX_i^T\Big) = \operatorname{tr}(U\hat{U}\Sigma\hat{V}^T) = \operatorname{tr}(\hat{V}^TU\hat{U}\Sigma). $$

Denoting $Z = \hat{V}^TU\hat{U}$, problem (9) is finally recast as

$$ \max_{U^TU = I} \operatorname{tr}(Z\Sigma). \tag{10} $$

Here $\Sigma$ is diagonal with entries $\sigma_i \ge 0$. Every diagonal entry of $Z$ satisfies $|z_{ii}| \le 1$, because $Z$ is a product of matrices with orthonormal rows or columns. Hence $\operatorname{tr}(Z\Sigma) = \sum_i z_{ii}\sigma_i$ is maximized by $Z = I$. The solution is therefore $U^* = \hat{V} I \hat{U}^T$, where $[\hat{U}, \Sigma, \hat{V}] = \operatorname{svd}\big(\sum_i Y_iV^TX_i^T\big)$. Alternately, fixing $U$, we obtain $V^* = \hat{V} I \hat{U}^T$, where $[\hat{U}, \Sigma, \hat{V}] = \operatorname{svd}\big(\sum_i Y_i^TU^TX_i\big)$. The main procedure of the optimization algorithm for the Orthogonally Constrained Learning Model is summarized in Algorithm 1.

Algorithm 1. Training the Orthogonally Constrained Learning Model.
Require: $\{X_i, Y_i\}_{i=1}^N$
Ensure: $U^*, V^*$
1: Initialization: $V_0$ is set to a random matrix; tol = 1e-5;
2: while not converged ($k = 1, 2, \ldots$) do
3:   $[\hat{U}, \Sigma, \hat{V}] = \operatorname{svd}\big(\sum_i Y_iV_{k-1}^TX_i^T\big)$,
4:   $U_k = \hat{V} I \hat{U}^T$,
5:   $[\hat{U}, \Sigma, \hat{V}] = \operatorname{svd}\big(\sum_i Y_i^TU_k^TX_i\big)$,
6:   $V_k = \hat{V} I \hat{U}^T$.
7:   check convergence:
8:   if $\sum_i \|X_i - U_kY_iV_k^T\|_F^2 <$ tol then
9:     break;
10:  end if
11: end while

3.2. Regularized Learning Model

Next, we turn to the Regularized Learning Model (5) and give an algorithm to find its optimal solution. The problem is unconstrained, and the objective function is quadratic in one variable when the other is fixed. Hence it is easy to solve with an alternating coordinate method. Suppose that $U$ is known and let $\tilde{Y}_i = UY_i$. Then problem (5) is recast as $\min_V L(V)$ with

$$ L(V) = \frac{1}{2}\sum_i \|X_i - \tilde{Y}_iV^T\|_F^2 + \frac{\lambda_2}{2}\|V\|_F^2. $$

Taking the derivative of $L(V)$ with respect to $V$ and setting it to zero, we have

$$ \frac{\partial L(V)}{\partial V} = \sum_i (\tilde{Y}_iV^T - X_i)^T\tilde{Y}_i + \lambda_2 V = 0. $$

The solution $V^*$ is then given by

$$ V^* = \Big(\sum_i X_i^T\tilde{Y}_i\Big)\Big(\sum_i \tilde{Y}_i^T\tilde{Y}_i + \lambda_2 I\Big)^{-1}, \tag{11} $$

where $I$ is an identity matrix of appropriate dimensions.
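The closed-form update (11) can be verified numerically by checking that the gradient of $L(V)$ vanishes at $V^*$. This is a minimal NumPy sketch under illustrative assumptions: random data, $\lambda_2 = 0.1$, and hypothetical helper names `update_V_ridge` and `grad_L_V`. Instead of forming the inverse in (11), it solves the equivalent linear system, which is the numerically preferred route.

```python
import numpy as np


def update_V_ridge(Xs, Ys, U, lam2):
    """Eq. (11): V* = (sum_i X_i^T Yt_i)(sum_i Yt_i^T Yt_i + lam2 I)^{-1} with Yt_i = U Y_i."""
    Yts = [U @ Y for Y in Ys]                       # each Yt_i is m x q
    A = sum(X.T @ Yt for X, Yt in zip(Xs, Yts))     # n x q
    G = sum(Yt.T @ Yt for Yt in Yts)                # q x q, symmetric positive semi-definite
    S = G + lam2 * np.eye(G.shape[0])
    # V S = A  <=>  S V^T = A^T because S is symmetric; solve instead of inverting
    return np.linalg.solve(S, A.T).T


def grad_L_V(Xs, Ys, U, V, lam2):
    """Gradient of L(V) = 1/2 sum_i ||X_i - Yt_i V^T||_F^2 + lam2/2 ||V||_F^2."""
    g = lam2 * V
    for X, Y in zip(Xs, Ys):
        Yt = U @ Y
        g = g + (Yt @ V.T - X).T @ Yt
    return g


rng = np.random.default_rng(1)
m, n, p, q, N = 8, 8, 4, 4, 50
U = rng.standard_normal((m, p))
Ys = [rng.standard_normal((p, q)) for _ in range(N)]
Xs = [rng.standard_normal((m, n)) for _ in range(N)]
V_star = update_V_ridge(Xs, Ys, U, lam2=0.1)
print(np.abs(grad_L_V(Xs, Ys, U, V_star, lam2=0.1)).max())  # round-off level, near zero
```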

Alternately, fixing $V$ and letting $\tilde{Y}_i = Y_iV^T$, we obtain the analogous formulation $\min_U L(U)$ with

$$ L(U) = \frac{1}{2}\sum_i \|X_i - U\tilde{Y}_i\|_F^2 + \frac{\lambda_1}{2}\|U\|_F^2. $$

Similarly, taking the derivative of $L(U)$ with respect to $U$ and setting it to zero gives

$$ \frac{\partial L(U)}{\partial U} = \sum_i (U\tilde{Y}_i - X_i)\tilde{Y}_i^T + \lambda_1 U = 0, $$

and the optimal solution $U^*$ is obtained as

$$ U^* = \Big(\sum_i X_i\tilde{Y}_i^T\Big)\Big(\sum_i \tilde{Y}_i\tilde{Y}_i^T + \lambda_1 I\Big)^{-1}. \tag{12} $$

After updating $U$, one may alternately iterate Eq. (11) and Eq. (12) until a convergence criterion is satisfied. The algorithm for the Regularized Learning Model (5) is summarized in Algorithm 2.

Algorithm 2. Training the Regularized Learning Model.
Require: $\{X_i, Y_i\}_{i=1}^N$
Ensure: $U^*, V^*$
1: Initialization: $U_0$ is set to a random matrix; tol = 1e-5;
2: while not converged ($k = 1, 2, \ldots$) do
3:   fix $U$ and set $\tilde{Y}_i^k = U_{k-1}Y_i$; update $V$ by
4:   $V_k = \big(\sum_i X_i^T\tilde{Y}_i^k\big)\big(\sum_i (\tilde{Y}_i^k)^T\tilde{Y}_i^k + \lambda_2 I\big)^{-1}$,
5:   fix $V$ and set $\tilde{Y}_i^k = Y_iV_k^T$; update $U$ by
6:   $U_k = \big(\sum_i X_i(\tilde{Y}_i^k)^T\big)\big(\sum_i \tilde{Y}_i^k(\tilde{Y}_i^k)^T + \lambda_1 I\big)^{-1}$,
7:   check convergence:
8:   if $\sum_i \|X_i - U_kY_iV_k^T\|_F^2 <$ tol then
9:     break;
10:  end if
11: end while

3.3. Non-negative Learning Model

The objective of problem (6) is non-convex, so it is impractical to seek its global minimum. Here we propose an iterative algorithm that reaches a local minimum. First, we initialize $V$ with non-negative values and let $\tilde{Y}_i = Y_iV^T$. Problem (6) is then reformulated as

$$ \min_{U \ge 0} L(U) = \sum_{i=1}^{N} \|X_i - U\tilde{Y}_i\|_F^2. $$

Alternately, fixing $U$ and letting $\tilde{Y}_i = UY_i$, $V$ is updated by

$$ \min_{V \ge 0} L(V) = \sum_{i=1}^{N} \|X_i - \tilde{Y}_iV^T\|_F^2. $$

Our task is now to solve these two subproblems iteratively until a convergence criterion is satisfied. They are termed alternating non-negative least squares (ANLS) problems. Their solutions are given by

$$ U^* = \Big(\sum_i X_i\tilde{Y}_i^T\Big)\Big(\sum_i \tilde{Y}_i\tilde{Y}_i^T\Big)^{-1}, $$

where $\tilde{Y}_i$ is equal to $Y_iV^T$, and

$$ V^* = \Big(\sum_i X_i^T\tilde{Y}_i\Big)\Big(\sum_i \tilde{Y}_i^T\tilde{Y}_i\Big)^{-1}, $$

where $\tilde{Y}_i$ is equal to $UY_i$. The normal matrices $\sum_i Y_iV^TVY_i^T$ or $\sum_i Y_i^TU^TUY_i$ may become very ill-conditioned. In that case, the two solutions above can be computed with a regularized pseudo-inverse, that is,

$$ U^* = \max\Big\{0,\; \Big(\sum_i X_i\tilde{Y}_i^T\Big)\Big(\alpha I + \sum_i \tilde{Y}_i\tilde{Y}_i^T\Big)^{+}\Big\}, \tag{13} $$

where $\tilde{Y}_i = Y_iV^T$. Similarly,

$$ V^* = \max\Big\{0,\; \Big(\sum_i X_i^T\tilde{Y}_i\Big)\Big(\alpha I + \sum_i \tilde{Y}_i^T\tilde{Y}_i\Big)^{+}\Big\}, \tag{14} $$

where $\tilde{Y}_i = UY_i$, $(\cdot)^{+}$ is the pseudo-inverse operator, and $\max\{0, \cdot\}$ is taken elementwise. The parameter $\alpha$ in (13) and (14) is a regularization parameter introduced to keep the algorithm from getting stuck in local minima. In the $k$th iteration of our algorithm, $\alpha$ is adjusted as

$$ \alpha_k = \alpha_0 \exp(-k/q). $$

In this paper, $\alpha_0$ and $q$ are empirically set to 20 and 10, respectively. In the optimization procedure, $U$ and $V$ are updated alternately; that is, one iterates Eq. (14) and then Eq. (13) until a convergence condition is met. The algorithm for the Non-negative Learning Model (6) is summarized in Algorithm 3.

Algorithm 3. Training the Non-negative Learning Model.
Require: $\{X_i, Y_i\}_{i=1}^N$
Ensure: $U^*, V^*$
1: Initialization: $U_0$ is set to a random matrix with positive values; tol = 1e-5; $\alpha_0$ = 20; $q$ = 10.
2: while not converged ($k = 1, 2, \ldots$) do
3:   fix $U$ and set $\tilde{Y}_i^k = U_{k-1}Y_i$; update $V$ by
4:   $V_k = \max\big\{0, \big(\sum_i X_i^T\tilde{Y}_i^k\big)\big(\alpha_k I + \sum_i (\tilde{Y}_i^k)^T\tilde{Y}_i^k\big)^{+}\big\}$,
5:   fix $V$ and set $\tilde{Y}_i^k = Y_iV_k^T$; update $U$ by
6:   $U_k = \max\big\{0, \big(\sum_i X_i(\tilde{Y}_i^k)^T\big)\big(\alpha_k I + \sum_i \tilde{Y}_i^k(\tilde{Y}_i^k)^T\big)^{+}\big\}$,
7:   update $\alpha$: $\alpha_{k+1} = \alpha_0 \exp(-k/q)$.
8:   check convergence:
9:   if $\sum_i \|X_i - U_kY_iV_k^T\|_F^2 <$ tol then
10:    break;
11:  end if
12: end while
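Algorithms 1, 2 and 3 can be sketched compactly in NumPy. This is an illustration under stated assumptions, not the authors' implementation. The function names, the fixed iteration cap `n_iter`, the default $\lambda_1 = \lambda_2 = 0.1$ and the synthetic data are all hypothetical choices. `q_decay` stands in for the paper's $q$ in $\alpha_k$, renamed to avoid a clash with the patch dimension $q$. The convergence test mirrors the absolute-residual check in the algorithms; on real data it will rarely fire, so the iteration cap usually ends the loop.

```python
import numpy as np


def residual(Xs, Ys, U, V):
    """Data term sum_i ||X_i - U Y_i V^T||_F^2 used in the convergence checks."""
    return sum(np.linalg.norm(X - U @ Y @ V.T, "fro") ** 2 for X, Y in zip(Xs, Ys))


def procrustes(M):
    """argmax over W with W^T W = I of tr(W M): W = Vhat Uhat^T, where M = Uhat S Vhat^T."""
    Uh, _, Vh_t = np.linalg.svd(M, full_matrices=False)
    return Vh_t.T @ Uh.T


def train_orthogonal(Xs, Ys, n_iter=50, tol=1e-5, seed=0):
    """Algorithm 1 sketch: alternate the closed-form SVD updates for U and V."""
    n, q = Xs[0].shape[1], Ys[0].shape[1]
    V = np.random.default_rng(seed).standard_normal((n, q))          # random V_0
    for _ in range(n_iter):
        U = procrustes(sum(Y @ V.T @ X.T for X, Y in zip(Xs, Ys)))    # steps 3-4
        V = procrustes(sum(Y.T @ U.T @ X for X, Y in zip(Xs, Ys)))    # steps 5-6
        if residual(Xs, Ys, U, V) < tol:
            break
    return U, V


def train_regularized(Xs, Ys, lam1=0.1, lam2=0.1, n_iter=50, tol=1e-5, seed=0):
    """Algorithm 2 sketch: alternate the ridge solutions (11) and (12)."""
    m, (p, q) = Xs[0].shape[0], Ys[0].shape
    U = np.random.default_rng(seed).standard_normal((m, p))          # random U_0
    for _ in range(n_iter):
        Yt = [U @ Y for Y in Ys]                                      # Yt_i = U Y_i
        G = sum(T.T @ T for T in Yt) + lam2 * np.eye(q)
        V = np.linalg.solve(G, sum(X.T @ T for X, T in zip(Xs, Yt)).T).T     # Eq. (11)
        Yt = [Y @ V.T for Y in Ys]                                    # Yt_i = Y_i V^T
        G = sum(T @ T.T for T in Yt) + lam1 * np.eye(p)
        U = np.linalg.solve(G, sum(X @ T.T for X, T in zip(Xs, Yt)).T).T     # Eq. (12)
        if residual(Xs, Ys, U, V) < tol:
            break
    return U, V


def train_nonnegative(Xs, Ys, alpha0=20.0, q_decay=10.0, n_iter=50, tol=1e-5, seed=0):
    """Algorithm 3 sketch: projected, regularized pseudo-inverse updates (14) then (13)."""
    m, (p, q) = Xs[0].shape[0], Ys[0].shape
    U = np.random.default_rng(seed).random((m, p))                   # positive random U_0
    alpha = alpha0
    for k in range(1, n_iter + 1):
        Yt = [U @ Y for Y in Ys]
        A = sum(X.T @ T for X, T in zip(Xs, Yt))
        V = np.maximum(0.0, A @ np.linalg.pinv(alpha * np.eye(q) + sum(T.T @ T for T in Yt)))
        Yt = [Y @ V.T for Y in Ys]
        A = sum(X @ T.T for X, T in zip(Xs, Yt))
        U = np.maximum(0.0, A @ np.linalg.pinv(alpha * np.eye(p) + sum(T @ T.T for T in Yt)))
        alpha = alpha0 * np.exp(-k / q_decay)                         # alpha_{k+1}
        if residual(Xs, Ys, U, V) < tol:
            break
    return U, V


# Synthetic, non-negative patch pairs generated by a known bilinear map (illustration only)
rng = np.random.default_rng(2)
m, n, p, q = 8, 8, 4, 4
U0, V0 = rng.random((m, p)), rng.random((n, q))
Ys = [rng.random((p, q)) for _ in range(200)]
Xs = [U0 @ Y @ V0.T for Y in Ys]

U1, V1 = train_orthogonal(Xs, Ys)
U2, V2 = train_regularized(Xs, Ys)
U3, V3 = train_nonnegative(Xs, Ys)
```

Each Algorithm 2 sweep solves its two ridge subproblems exactly. The regularized objective (5) therefore never increases from one sweep to the next, but that alone does not guarantee a global optimum.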

4. Image SR via learned mapping function

In this section, we discuss how to perform patchwise SR recovery with the learned mapping function. At the training stage, we collect a large number of HR and LR image pairs so that the relationship between them can be learned. To represent the mapping accurately, we first assign each training patch pair to a cluster. Specifically, similarly to [9,10], we apply a high-pass filter to each HR patch and use its output as the clustering feature. Given the patches $\{X_i, Y_i\}_{i=1}^N$, we partition the dataset into $K$ clusters. The mapping function for each of the $K$ clusters is learned with the three 2D tensor learning models (4)–(6) above.
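The clustering step can be sketched as follows. The text does not specify the high-pass filter or the clustering algorithm. This sketch therefore makes two assumptions: it uses "patch minus its 3 × 3 box-filtered version" as a hypothetical high-pass feature, and it uses plain k-means (Lloyd iterations) as a stand-in clustering method. All names, the patch size and $K = 4$ are illustrative.

```python
import numpy as np


def highpass_feature(P):
    """Hypothetical high-pass feature: patch minus its 3x3 box-filtered version (edge padding)."""
    h, w = P.shape
    padded = np.pad(P, 1, mode="edge")
    box = sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return (P - box).ravel()


def assign(F, C):
    """Index of the nearest centroid (squared Euclidean distance) for each feature row."""
    d2 = ((F[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(F, K, n_iter=20, seed=0):
    """Plain Lloyd iterations; centroids start at K distinct randomly chosen samples."""
    rng = np.random.default_rng(seed)
    C = F[rng.choice(len(F), size=K, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(F, C)
        for k in range(K):
            if np.any(labels == k):          # keep the old centroid if a cluster empties
                C[k] = F[labels == k].mean(axis=0)
    return C, assign(F, C)


rng = np.random.default_rng(3)
patches = rng.random((100, 5, 5))            # stand-ins for training patches
F = np.stack([highpass_feature(P) for P in patches])
C, labels = kmeans(F, K=4)
# Index sets from which a separate (U_k, V_k) pair would be trained per cluster
clusters = {k: np.flatnonzero(labels == k) for k in range(4)}
```

At test time, `assign` selects the cluster, and hence the mapping function, for each new patch.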


After training each 2D tensor learning model, we obtain the parameter matrices $U$ and $V$ for subsequent SR processing. For simplicity, we first interpolate the input LR image to the size of the desired HR image with the bicubic method. This initial HR estimate is then divided into a set of overlapping patches of the same size as those used when learning the mapping function. These patches are also assigned to the $K$ clusters, so that the most suitable mapping function can be selected for each low-resolution patch. For each LR image patch $Y_i$, we remove its DC component, recover its HR counterpart by computing $X_i = UY_iV^T$, and add the mean value back. The HR image is then assembled by tiling these HR patches together, averaging the multiple estimates for each pixel in the overlapping regions. The HR image obtained this way may not satisfy the reconstruction constraint (1), so a back-projection process is carried out to refine it further. In particular, the recovered HR image is projected onto the solution space of (1), correcting the HR pixels by back-projection [17]. This process ensures that the final recovered HR image is consistent with the input LR image.
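The patchwise reconstruction and the back-projection step can be sketched as below. This minimal NumPy illustration rests on several assumptions. The bicubic-interpolated image is passed in directly, and `U`, `V` are square (patch × patch), because patches are taken from an image already at HR size. Cluster-specific mapping selection is omitted. $H$ is modeled as plain $s$-fold decimation (no blur), matching the experimental setup of Section 5. The nearest-neighbour error-spreading kernel is a hypothetical choice, not necessarily that of [17].

```python
import numpy as np


def reconstruct(Y0, U, V, patch, step):
    """Patchwise SR on the interpolated image Y0; assumes U and V are patch x patch."""
    H, W = Y0.shape
    acc = np.zeros((H, W))
    cnt = np.zeros((H, W))
    for r in range(0, H - patch + 1, step):
        for c in range(0, W - patch + 1, step):
            Y = Y0[r:r + patch, c:c + patch]
            mu = Y.mean()
            X = U @ (Y - mu) @ V.T + mu          # remove DC, apply X = U Y V^T, add the mean back
            acc[r:r + patch, c:c + patch] += X
            cnt[r:r + patch, c:c + patch] += 1
    # Average overlapping estimates; pixels that no patch covers keep their interpolated value
    return np.where(cnt > 0, acc / np.maximum(cnt, 1), Y0)


def back_project(X, y_lr, s, n_iter=1):
    """Back-projection for H = plain s-fold decimation with a nearest-neighbour spreading kernel."""
    for _ in range(n_iter):
        err = y_lr - X[::s, ::s]                 # y - Hx on the LR grid
        X = X + np.kron(err, np.ones((s, s)))    # spread the LR error back onto the HR grid
    return X


rng = np.random.default_rng(4)
Y0 = rng.random((16, 16))                        # stand-in for the interpolated LR input
X_hat = reconstruct(Y0, np.eye(8), np.eye(8), patch=8, step=4)
y_lr = rng.random((8, 8))
X_bp = back_project(X_hat, y_lr, s=2)
```

With identity mappings the reconstruction returns `Y0` unchanged, a quick sanity check. With pure decimation, a single back-projection step already makes `X_bp[::2, ::2]` equal to `y_lr`.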

5. Experimental results

To investigate the performance of the proposed SR schemes based on our 2D tensor regression learning model, we conducted several single-image super-resolution experiments with the proposed method and other existing methods. Except for Parthenon, all test images are 256 × 256, a size commonly used for validating image super-resolution performance in the literature [7,24,37]. An observed LR image is synthesized by simply down-sampling an HR image by a given scaling factor in both the horizontal and vertical directions, without any blurring. We use three magnification factors, i.e., 2, 3 and 4, all of which are widely used in the single-image super-resolution literature. Because the human visual system is more sensitive to luminance changes, we apply the SR methods only to the luminance component; the other components are up-sampled with a simple Bicubic interpolator.
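The degradation and color handling described above amount to plain decimation plus a luminance/chrominance split. The sketch below assumes 8-bit RGB input and the full-range ITU-R BT.601 (JPEG) YCbCr conversion; the paper does not state which YCbCr variant was used (MATLAB's `rgb2ycbcr`, for example, uses the studio-range one). The function names are illustrative, and the Bicubic up-sampling of the chroma channels is not shown.

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Full-range BT.601 RGB -> YCbCr (assumed convention); SR acts on channel 0 only."""
    rgb = rgb.astype(float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)


def make_lr(hr, s):
    """Keep every s-th pixel in both directions, with no blurring beforehand."""
    H, W = hr.shape[:2]
    hr = hr[: H - H % s, : W - W % s]   # crop so both sides are divisible by s
    return hr[::s, ::s]
```

For a 256 × 256 image this gives LR sizes of 128 × 128, 85 × 85 and 64 × 64 for factors 2, 3 and 4 (the factor-3 case drops one row and column first).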

5.1. Training dataset

To train the parameter matrices U and V of the proposed tensor regression learning models, we collected 100,000 HR/LR patch pairs by random sampling from natural images. Some examples of training images are shown in Fig. 1. Specifically, similar to [37], the LR training patches are extracted from the Bicubic-upsampled LR image instead of the original LR image. To cluster the image patches efficiently, a Gaussian filter with $\sigma = 2.2$ is adopted to capture the image features, and the clustering is then performed in this feature space. To describe the feature relationship more effectively, we keep only patches whose variance exceeds a threshold, i.e., $\mathrm{var}(X_i) > 10$; patches with lower variance are excluded because they are smooth. The retained patches are mean-subtracted and normalized to eliminate the mismatch problem.

5.2. Experimental setting

To compare the proposed method with existing techniques both qualitatively and quantitatively, we consider the following state-of-the-art methods, since our algorithm belongs to the family of methods that learn the mapping only from collected LR and HR image pairs, without any statistical prior. The following four SR methods are therefore used for performance comparison.
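The patch sampling and filtering of Section 5.1 can be sketched as below, assuming grayscale (luminance) arrays `hr` and `lr_up` of equal size, where `lr_up` is the Bicubic-upsampled LR image. Taking the high-pass feature as the image minus its Gaussian-blurred version ($\sigma = 2.2$) is our reading of the text rather than a confirmed detail, and only mean subtraction is applied because the paper does not specify its normalization. The function names are hypothetical.

```python
import numpy as np


def gaussian_blur(img, sigma=2.2):
    """Separable Gaussian blur with edge padding, NumPy only."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    padded = np.pad(img.astype(float), r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def sample_training_pairs(hr, lr_up, p=3, n=1000, var_thresh=10.0, seed=0):
    """Randomly sample aligned p x p HR/LR patch pairs, keeping textured ones only."""
    rng = np.random.default_rng(seed)
    high = hr.astype(float) - gaussian_blur(hr)   # assumed high-pass feature image
    H, W = hr.shape
    X, Y, F = [], [], []
    for _ in range(n):
        r, c = rng.integers(0, H - p + 1), rng.integers(0, W - p + 1)
        x = hr[r:r + p, c:c + p].astype(float)
        if x.var() <= var_thresh:   # smooth patch: var(X_i) must exceed the threshold
            continue
        y = lr_up[r:r + p, c:c + p].astype(float)
        X.append(x - x.mean())      # mean subtraction (normalization omitted)
        Y.append(y - y.mean())
        F.append(high[r:r + p, c:c + p].ravel())
    return np.array(X), np.array(Y), np.array(F)
```

The variance test is applied to the HR patch, matching the condition $\mathrm{var}(X_i) > 10$; a perfectly flat image therefore yields no training pairs at all.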

Table 1
Performance comparison of PSNR and SSIM with different patch sizes when the down-sampling factor is 2. The best PSNR and the best SSIM in each row are marked with *.

Images       PSNR (dB)                   SSIM
             size=3   size=5   size=7    size=3   size=5   size=7
Butterfly    29.02*   27.86    27.24     0.932*   0.916    0.898
Parrots      32.44*   31.47    31.29     0.936*   0.928    0.925
Parthenon    28.57*   28.21    28.01     0.823*   0.816    0.811
Plants       35.43*   34.67    34.34     0.932*   0.928    0.925
Girl         35.07*   34.48    34.47     0.784*   0.779    0.780
Bike         26.84*   26.37    26.28     0.870*   0.856    0.849
Hat          32.83*   32.36    32.37     0.894*   0.891    0.888
Leaves       29.01*   27.46    26.91     0.944*   0.929    0.915
Flower       31.29*   30.56    30.58     0.900*   0.887    0.885

Fig. 1. Examples of training data.


Table 2
PSNR values (dB) of SR images with different methods: Bicubic, Yang's method [37], Chang's method [3], Glasner's method [14], O-SR (4), R-SR (5) and NN-SR (6) (magnification factor = 3). The best value in each row is marked with *.

Images      Bicubic   Yang [37]   Chang [3]   Glasner [14]   O-SR     R-SR     NN-SR
Butterfly   24.08     24.73       24.34       25.11          25.95*   25.59    24.72
Parrots     28.13     28.38       23.02       28.59          29.50*   29.25    28.98
Parthenon   26.05     26.13       25.12       26.40          26.59*   26.57    26.07
Plants      31.12     31.48       30.13       31.96          32.23*   32.10    31.42
Girl        32.70     32.79       32.00       33.00          33.48    33.49*   33.09
Bike        22.83     23.20       22.16       23.41          23.75*   23.59    23.35
Hat         29.22     29.65       28.51       30.01          30.19*   30.05    29.79
Leaves      23.49     24.26       21.81       24.66          25.22*   24.93    24.08
Flower      27.48     27.76       26.58       28.01          28.49*   28.31    27.83
Average     27.23     27.60       25.96       27.91          28.38*   28.21    27.70
Gain        1.15      0.78        2.42        0.47           –        0.17     0.68

Table 3
SSIM values for different methods (magnification factor = 3). The best value in each row is marked with *.

Images      Bicubic   Yang [37]   Chang [3]   Glasner [14]   O-SR     R-SR     NN-SR
Butterfly   0.822     0.819       0.830       0.857          0.863*   0.847    0.829
Parrots     0.875     0.877       0.631       0.856          0.894*   0.892    0.884
Parthenon   0.692     0.700       0.630       0.667          0.721*   0.721*   0.707
Plants      0.834     0.870*      0.796       0.846          0.860    0.859    0.850
Girl        0.688     0.801*      0.661       0.745          0.712    0.712    0.707
Bike        0.684     0.719       0.615       0.684          0.740*   0.732    0.720
Hat         0.802     0.836*      0.767       0.813          0.825    0.821    0.816
Leaves      0.796     0.817       0.745       0.834          0.851*   0.837    0.821
Flower      0.773     0.793       0.721       0.763          0.811*   0.806    0.790
Average     0.774     0.804       0.711       0.785          0.809*   0.803    0.792
Gain        0.035     0.005       0.098       0.024          –        0.006    0.017
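The Average and Gain rows in Tables 2–5 follow from simple arithmetic: each method's average over the nine images is rounded to two decimals, and its gain is the average of the best proposed method (O-SR) minus its own. The script below is a consistency check written for this text, not part of the method; it reproduces the PSNR figures of Table 2 from the per-image values, and the same function applies unchanged to the other tables.

```python
# Per-image PSNR values (dB) from Table 2, magnification factor 3, in the order
# Butterfly, Parrots, Parthenon, Plants, Girl, Bike, Hat, Leaves, Flower.
TABLE2 = {
    "Bicubic": [24.08, 28.13, 26.05, 31.12, 32.70, 22.83, 29.22, 23.49, 27.48],
    "Yang": [24.73, 28.38, 26.13, 31.48, 32.79, 23.20, 29.65, 24.26, 27.76],
    "Chang": [24.34, 23.02, 25.12, 30.13, 32.00, 22.16, 28.51, 21.81, 26.58],
    "Glasner": [25.11, 28.59, 26.40, 31.96, 33.00, 23.41, 30.01, 24.66, 28.01],
    "O-SR": [25.95, 29.50, 26.59, 32.23, 33.48, 23.75, 30.19, 25.22, 28.49],
    "R-SR": [25.59, 29.25, 26.57, 32.10, 33.49, 23.59, 30.05, 24.93, 28.31],
    "NN-SR": [24.72, 28.98, 26.07, 31.42, 33.09, 23.35, 29.79, 24.08, 27.83],
}


def averages_and_gains(table, reference="O-SR"):
    """Average per method (2 d.p.) and the gain of `reference` over every other method."""
    avg = {m: round(sum(v) / len(v), 2) for m, v in table.items()}
    # Gains are taken between the rounded averages, which is how the tables report them.
    gain = {m: round(avg[reference] - a, 2) for m, a in avg.items() if m != reference}
    return avg, gain


avg, gain = averages_and_gains(TABLE2)
print(avg)
print(gain)
```

Note that the gains only match the tables when they are computed from the rounded averages; for example, the unrounded O-SR and Chang averages differ by slightly less than 2.415 dB.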

Fig. 2. SR results on Flower magnified by a factor of 3 using different methods. (a) Bicubic; (b) Chang’s [3]; (c) Yang’s [37]; (d) Glasner’s [14]; (e) O-SR (4); (f) R-SR (5); (g) NN-SR (6).

1. The Bicubic interpolation method: the baseline for super-resolution.
2. Yang's method [37]: sparse representation for SR, a state-of-the-art example-based SR method that uses information learned from training image data.
3. Chang's method [3]: example-based SR using neighbour embedding.
4. Glasner's method [14]: a classical self-exemplar-based SR method that exploits similar patches across multiple scaled versions of the input image.


Fig. 3. SR results on Parrots magnified by a factor of 3 using different methods. (a) Bicubic; (b) Chang’s [3]; (c) Yang’s [37]; (d) Glasner’s [14]; (e) O-SR (4); (f) R-SR (5); (g) NN-SR (6).

Fig. 4. SR results on Leaves magnified by a factor of 3 using different methods. (a) Bicubic; (b) Chang's [3]; (c) Yang's [37]; (d) Glasner's [14]; (e) O-SR (4); (f) R-SR (5); (g) NN-SR (6).

In our experiments, we use two criteria to assess the results. One is the peak signal-to-noise ratio (PSNR), the most popular objective measure of image quality. Besides PSNR, we also use the Structural Similarity (SSIM) index [33] to measure the visual quality of the recovered images. As in most current work on super-resolution, we report results for the luminance component only. For convenience, we call SR based on the Orthogonally Constrained learning method (4) Orthogonal SR (O-SR), SR based on the Regularized learning method (5) Regularized SR (R-SR), and SR based on the Non-negative learning method (6) Non-negative SR (NN-SR).

First, we need to determine the patch size used both for preparing the training dataset and for recovering the high-resolution image. The patch size should not be too large. Otherwise,


Fig. 5. SR results on Parthenon magnified by a factor of 3 using different methods. (a) Bicubic; (b) Chang’s [3]; (c) Yang’s [37]; (d) Glasner’s [14]; (e) O-SR (4); (f) R-SR (5); (g) NN-SR (6).

if the patch size is too large, the self-similarity among patches becomes too low to reflect the information shared by spatially related pixels. For simplicity, the LR image used to assess SR performance for different patch sizes is obtained by down-sampling the original image by a factor of 2.¹ We consider three patch sizes, i.e., 3 × 3, 5 × 5 and 7 × 7, and compare the resulting super-resolution performance. Table 1 reports the super-resolution results on the test images obtained with the Orthogonal SR (4), with the best result for each image highlighted. The table shows that the patch size affects SR performance in terms of both PSNR and SSIM at the given down-sampling factor. In particular, the smallest patch size (3 × 3) gives the best PSNR and SSIM on every test image, and it also performs better qualitatively. Based on these visual and quantitative considerations, we adopt 3 × 3 as the patch size in our experiments.

¹ We also tested down-sampling factors of 3 and 4 and reached a similar conclusion, so those experiments are omitted from this paper.
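For reference, the PSNR between a ground-truth image and its estimate on an 8-bit scale is $10\log_{10}(255^2/\mathrm{MSE})$, where MSE is the mean squared pixel error. The helper below computes it, assuming both arrays hold luminance values on the same 0–255 scale; whether any border pixels were excluded in the reported experiments is not stated, so none are cropped here. SSIM [33] involves local windowed statistics and is not sketched.

```python
import numpy as np


def psnr(ref, est, peak=255.0):
    """PSNR in dB between two images on the same intensity scale."""
    mse = np.mean((ref.astype(float) - est.astype(float)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

As a quick reference point, a uniform error of one gray level (MSE = 1) corresponds to about 48.13 dB.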


5.3. Experimental results

Table 2 reports the PSNR values of 9 different images magnified by a factor of 3, where our proposed methods perform single-image super-resolution with a patch size of 3 × 3 and an overlap of 2 pixels. The four other super-resolution methods are also included for comparison. Besides PSNR, we provide SSIM values in Table 3 to measure the visual quality of the recovered SR images. To show the gains in PSNR and SSIM clearly, we report the average value of each SR method and the gain achieved by our method over each of the others; the gain is computed with respect to our best method, O-SR. The results show that our proposed SR methods generally outperform the other methods in terms of PSNR. The proposed methods always perform better than classical Bicubic interpolation, with an average gain of 1.15 dB, and our best method also improves on the other SR methods, e.g., by 0.47 dB over Glasner's method [14] and 0.78 dB over Yang's method [37] on average. Chang's method [3] relies on finding a training image similar to the input test image, so that the weights for high-resolution reconstruction can be learned efficiently. When the training image is not similar enough, however, the quality of the recovered image degrades significantly; our average PSNR gain over this method is 2.42 dB. In contrast, since the 2D spatial structure of the image patches is exploited, our learned mapping is more likely to synthesize HR images with realistic details. Glasner's method [14] has the advantage of finding similar patches in slightly downsampled versions of the input image, but the number of such patches is too limited to recover a good high-resolution image. Although our proposed methods achieve a pleasing improvement in objective quality, their SSIM is lower than that of Yang's method [37] on some test images, e.g., Plants, Hat and Girl.
This may be because Yang's method [37] can learn a good dictionary that captures certain details from infrequent patches, whereas this kind of high-resolution information cannot be learned by our models owing to the particular character of the test image. However, this difference is relatively minor and hard to notice visually. Figs. 2–6 give a visual comparison of test images magnified by a factor of 3 with the different methods, including zoomed-in regions. Although the qualitative differences are less noticeable than the quantitative ones, careful inspection shows that our methods recover HR images with clear and sharp details. According to the reported results, our three models show similar performance, with the Orthogonally Constrained learning model (4) performing best among them. This observation is further supported by Tables 4 and 5 for an up-sampling factor of 4, where O-SR ties with R-SR in average PSNR (27.07 dB) and has the highest average SSIM (0.758). With a factor of 4, the average PSNR gain over Bicubic interpolation and Chang's method [3] grows to 1.46 dB and 3.24 dB, respectively, compared with 1.15 dB and 2.42 dB at a factor of 3. This suggests that the learned mapping function can capture the details relating HR and LR patches even at a larger down-sampling factor. The advantage can also be attributed to the back-projection applied in each up-sampling loop.

5.4. Computational complexity analysis

The computational cost of our proposed methods lies mainly in training the parameter matrices U and V. Once U and V are learned, the HR image can be recovered very quickly. For a 256 × 256 image, our proposed methods require only limited time to perform super-resolution on a PC with an Intel Core2 Duo 2.0 GHz processor and 2.0 GB of RAM. Reducing the overlap between patches makes HR image recovery much faster, at the cost of some degradation in the quality of the recovered image.
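To make the overlap trade-off concrete: with 3 × 3 patches, an overlap of 2 pixels means a stride of 1, so a 256 × 256 image yields 254 × 254 = 64,516 patches, each requiring two 3 × 3 matrix products for $U Y_i V^T$. Reducing the overlap to 1 pixel (stride 2) cuts this to 128 × 128 = 16,384 patches, and removing it entirely gives 86 × 86 = 7,396. These counts assume the last row and column of patches are kept flush with the image border, which is our assumption rather than a detail from the paper; the helper below reproduces them.

```python
def patch_positions(size, p, step):
    """Top-left offsets along one axis; the last patch is kept flush with the border."""
    pos = list(range(0, size - p + 1, step))
    if pos[-1] != size - p:
        pos.append(size - p)
    return pos


def num_patches(height, width, p=3, overlap=2):
    """Number of p x p patches visited when neighbouring patches share `overlap` pixels."""
    step = p - overlap
    return len(patch_positions(height, p, step)) * len(patch_positions(width, p, step))


for overlap in (2, 1, 0):
    print(overlap, num_patches(256, 256, overlap=overlap))
```

Since the per-patch cost is constant, recovery time scales roughly with these counts, which is consistent with the speed-up the text describes when the overlap is reduced.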
To give concrete figures, Table 6 lists the actual run-time of the different SR methods, reporting the average computation time (in minutes) for both training and testing, i.e., $T_{Tr}$ and $T_{Te}$. We simply report the training/

Fig. 6. SR results on Bike magnified by a factor of 3 using different methods. (a) Bicubic; (b) Chang's [3]; (c) Yang's [37]; (d) Glasner's [14]; (e) O-SR (4); (f) R-SR (5); (g) NN-SR (6).


Table 4
PSNR values (dB) of SR images with different methods (magnification factor = 4). The best value in each row is marked with *.

Images      Bicubic   Yang [37]   Chang [3]   Glasner [14]   O-SR     R-SR     NN-SR
Butterfly   22.14     23.60       21.77       23.82          24.04*   24.02    23.80
Parrots     26.33     27.29       22.10       27.32          27.55    27.61*   27.28
Parthenon   24.96     25.41       23.98       25.88          26.25*   26.23    26.01
Plants      29.28     30.24       28.05       30.73          31.22*   31.20    30.94
Girl        31.39     32.14       30.61       32.40          32.66*   32.60    32.32
Bike        21.46     22.29       20.64       22.54          22.49    22.70*   22.40
Hat         27.88     28.75       22.43       28.93          29.40*   29.33    28.81
Leaves      21.22     22.37       20.02       22.84          23.02*   22.95    22.71
Flower      25.79     26.66       24.86       26.85          27.04*   27.00    26.78
Average     25.61     26.53       23.83       26.81          27.07*   27.07*   26.78
Gain        1.46      0.54        3.24        0.26           –        0        0.29

Table 5
SSIM values for different methods (magnification factor = 4). The best value in each row is marked with *.

Images      Bicubic   Yang [37]   Chang [3]   Glasner [14]   O-SR     R-SR     NN-SR
Butterfly   0.734     0.782       0.732       0.793          0.813*   0.797    0.779
Parrots     0.830     0.850       0.597       0.850          0.856*   0.848    0.845
Parthenon   0.623     0.650       0.553       0.691          0.710*   0.703    0.708
Plants      0.757     0.789       0.711       0.780          0.791*   0.787    0.784
Girl        0.624     0.650       0.596       0.660          0.714*   0.709    0.710
Bike        0.571     0.637       0.480       0.641*         0.639    0.625    0.618
Hat         0.751     0.774       0.498       0.779          0.785*   0.782    0.779
Leaves      0.664     0.729       0.627       0.742          0.761*   0.746    0.732
Flower      0.680     0.724       0.615       0.730          0.752*   0.739    0.737
Average     0.693     0.732       0.601       0.741          0.758*   0.748    0.744
Gain        0.065     0.026       0.157       0.017          –        0.010    0.014

Table 6
Average run-time of different methods (in minutes). Each entry gives training time T_Tr / testing time T_Te; – indicates that no value is reported.

Time          Bicubic   Chang [3]   Glasner [14]   Yang [37]   Ours
T_Tr / T_Te   – / –     – / 10      – / 40         500 / 5     660 / 4.5

test time using Matlab code either implemented by ourselves or released by the original authors. As can be seen, the testing time of our SR algorithm (4.5 min) is lower than that of all the other SR methods except Bicubic interpolation, while its one-off training time (660 min) is somewhat longer than that of Yang's method [37] (500 min).

6. Conclusion

In this paper, we proposed a generalized 2D tensor regression learning framework for single-image super-resolution reconstruction, which efficiently preserves the 2D spatial information of pixels in HR and LR images. Instead of using a linear regression model to represent the relationship between vectorized HR/LR image patch pairs, we directly regress each HR patch on its LR patch with 2D tensor regression learning models under different constraints. Once the model parameter matrices have been learned, the desired HR image can be recovered efficiently by evaluating the 2D tensor model on the given LR input. Extensive experimental results on the test LR images show that our proposed methods are competitive with, or even superior to, many existing state-of-the-art methods in both PSNR and visual quality.

Acknowledgments

Junbin Gao's work is supported by the Australian Research Council (ARC) through Grant DP130100364. The work of the first and third authors is supported by NSF China under Grant No. 61201392.

References

[1] Deng Cai, Xiaofei He, Jiawei Han, Subspace learning based on tensor analysis, Technical report UIUCDCS-R-2005-2572, Computer Science Department, UIUC, May 2005.
[2] Deng Cai, Xiaofei He, Jiawei Han, Thomas S. Huang, Graph regularized nonnegative matrix factorization for data representation, IEEE Trans. Pattern Anal. Mach. Intell. 33 (8) (2011) 1548–1560.
[3] H. Chang, D.-Y. Yeung, Y. Xiong, Super-resolution through neighbor embedding, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004, pp. 275–282.
[4] P.L. Combettes, J.C. Pesquet, Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer, New York, 2011.
[5] S. Dai, M. Han, W. Xu, Y. Wu, Y. Gong, Soft edge smoothness prior for alpha channel super resolution, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2007, pp. 1–8.
[6] Shengyang Dai, Mei Han, Wei Xu, Ying Wu, Yihong Gong, A.K. Katsaggelos, Softcuts: a soft edge smoothness prior for color image super-resolution, IEEE Trans. Image Process. 18 (5) (2009) 969–981.
[7] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pure Appl. Math. 57 (11) (2004) 1413–1457.
[8] Chris H.Q. Ding, Tao Li, Michael I. Jordan, Convex and semi-nonnegative matrix factorizations, IEEE Trans. Pattern Anal. Mach. Intell. 32 (1) (2010) 45–55.
[9] Weisheng Dong, Lei Zhang, Guangming Shi, Xin Li, Nonlocal centralized sparse representation for image restoration, IEEE Trans. Image Process. 22 (4) (2013) 1620–1630.
[10] Weisheng Dong, Lei Zhang, Guangming Shi, Xiaolin Wu, Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization, IEEE Trans. Image Process. 20 (7) (2011) 1838–1857.
[11] S. Farsiu, M.D. Robinson, M. Elad, P. Milanfar, Fast and robust multiframe super resolution, IEEE Trans. Image Process. 13 (2004) 1327–1344.
[12] William T. Freeman, Thouis R. Jones, Egon C. Pasztor, Example-based super-resolution, IEEE Comput. Graph. Appl. 22 (2) (2002) 56–65.
[13] Junbin Gao, Yi Guo, Ming Yin, Restricted Boltzmann machine approach to couple dictionary training for image super-resolution, in: IEEE International Conference on Image Processing (ICIP), 2013, pp. 499–503.
[14] Daniel Glasner, Shai Bagon, Michal Irani, Super-resolution from a single image, in: IEEE International Conference on Computer Vision (ICCV), 2009, pp. 349–356.
[15] W. Guo, I. Kotsia, I. Patras, Tensor learning for regression, IEEE Trans. Image Process. 21 (2) (2012) 816–827.
[16] He He, Wan-Chi Siu, Single image super-resolution using Gaussian process regression, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 449–456.
[17] Michal Irani, Shmuel Peleg, Improving resolution by image registration, CVGIP: Graph. Models Image Process. 53 (3) (1991) 231–239.


[18] Kui Jia, Shaogang Gong, Multi-modal tensor face for simultaneous super-resolution and recognition, in: Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV), vol. 2, 2005, pp. 1683–1690.
[19] Hyunsoo Kim, Haesun Park, Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method, SIAM J. Matrix Anal. Appl. 30 (2) (2008) 713–730.
[20] Daniel D. Lee, H. Sebastian Seung, Learning the parts of objects by nonnegative matrix factorization, Nature 401 (6755) (1999) 788–791.
[21] C.J. Lin, Projected gradient methods for nonnegative matrix factorization, Neural Comput. 19 (10) (2007) 2756–2779.
[22] Zhouchen Lin, Heung-Yeung Shum, Fundamental limits of reconstruction-based superresolution algorithms under local translation, IEEE Trans. Pattern Anal. Mach. Intell. 26 (1) (2004) 83–97.
[23] Haifeng Liu, Zhaohui Wu, Deng Cai, Thomas S. Huang, Constrained nonnegative matrix factorization for image representation, IEEE Trans. Pattern Anal. Mach. Intell. 34 (7) (2012) 1299–1311.
[24] Antonio Marquina, Stanley J. Osher, Image super-resolution by TV-regularization and Bregman iteration, J. Sci. Comput. 37 (3) (2008) 367–382.
[25] Qiang Ning, Kan Chen, Li Yi, Chuchu Fan, Yao Lu, Jiangtao Wen, Image super-resolution via analysis sparse prior, IEEE Signal Process. Lett. 20 (4) (2013) 399–402.
[26] P. Paatero, U. Tapper, Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values, Environmetrics 5 (2) (1994) 111–126.
[27] Sung Cheol Park, Min Kyu Park, Moon Gi Kang, Super-resolution image reconstruction: a technical overview, IEEE Signal Process. Mag. 20 (3) (2003) 21–36.
[28] Menaka Rajapakse, Lonce Wyse, Face recognition with non-negative matrix factorization, in: Proc. SPIE, vol. 5150, 2003, pp. 1838–1847.
[29] J. Sun, Z. Xu, H. Shum, Image super-resolution using gradient profile prior, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2008, pp. 1–8.
[30] M.E. Tipping, C.M. Bishop, Bayesian image super-resolution, in: Advances in Neural Information Processing Systems 16 (NIPS), 2003.

[31] Pavan K. Turaga, Ashok Veeraraghavan, Rama Chellappa, Statistical analysis on Stiefel and Grassmann manifolds with applications in computer vision, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, 2008.
[32] T. Viklands, Algorithms for the Weighted Orthogonal Procrustes Problem and Other Least Squares Problems, Umeå universitet, 2006.
[33] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.
[34] Junwen Wu, Mohan M. Trivedi, A regression model in TensorPCA subspace for face image super-resolution reconstruction, in: The 18th International Conference on Pattern Recognition (ICPR), 2006, pp. 627–630.
[35] Ya Xue, Xuejun Liao, Lawrence Carin, Balaji Krishnapuram, Multi-task learning for classification with Dirichlet process priors, J. Mach. Learn. Res. 8 (2007) 35–63.
[36] Jianchao Yang, Zhaowen Wang, Zhe Lin, Scott Cohen, Thomas S. Huang, Coupled dictionary training for image super-resolution, IEEE Trans. Image Process. 21 (8) (2012) 3467–3478.
[37] Jianchao Yang, John Wright, Thomas Huang, Yi Ma, Image super-resolution via sparse representation, IEEE Trans. Image Process. 19 (11) (2010) 2861–2873.
[38] Jianchao Yang, Shuicheng Yan, Yun Fu, Xuelong Li, Thomas S. Huang, Non-negative graph embedding, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, 2008.
[39] Ming-Chun Yang, Chang-Heng Wang, Ting-Yao Hu, Yu-Chiang Frank Wang, Learning context-aware sparse representation for single image super-resolution, in: Benoît Macq, Peter Schelkens (Eds.), IEEE International Conference on Image Processing (ICIP), IEEE, 2011, pp. 1349–1352.
[40] Chunjie Zhang, Jing Liu, Qi Tian, Changsheng Xu, Hanqing Lu, Songde Ma, Image classification by non-negative sparse coding, low-rank and sparse decomposition, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2011, pp. 1673–1680.
[41] Hua Zhou, Lexin Li, Hongtu Zhu, Tensor regression with applications in neuroimaging data analysis, Technical report, North Carolina State University, 2012.
