Communicated by Jun Yu
Accepted Manuscript
Double sparsity for multi-frame super resolution
Toshiyuki Kato, Hideitsu Hino, Noboru Murata

PII: S0925-2312(17)30334-X
DOI: 10.1016/j.neucom.2017.02.043
Reference: NEUCOM 18117

To appear in: Neurocomputing

Received date: 7 December 2016
Revised date: 15 February 2017
Accepted date: 17 February 2017

Please cite this article as: Toshiyuki Kato, Hideitsu Hino, Noboru Murata, Double sparsity for multi-frame super resolution, Neurocomputing (2017), doi: 10.1016/j.neucom.2017.02.043
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Double sparsity for multi-frame super resolution

Toshiyuki Kato (a), Hideitsu Hino (b,∗), Noboru Murata (a)

(a) School of Science and Engineering, Waseda University, 3-4-1 Ohkubo, Shinjuku, Tokyo, Japan, 169-8555
(b) Department of Computer Science, University of Tsukuba, 1-1-1 Tennoudai, Tsukuba, Ibaraki, Japan, 305-8573
Abstract
A number of image super resolution algorithms based on sparse coding have successfully implemented multi-frame super resolution in recent years. In order to utilize multiple low-resolution observations, both accurate image registration and sparse coding are required. Previous studies on multi-frame super resolution based on sparse coding first apply block matching for image registration, followed by sparse coding to enhance the image resolution. In this paper, these two problems are solved by optimizing a single objective function. The proposed formulation not only has a mathematically interesting structure, called double sparsity, but also yields numerical performance comparable or superior to conventional methods.

Keywords: Image Super Resolution, Sparse Coding, Double Sparsity, Dictionary Learning
∗ Corresponding author. Tel.: +81 298 53 5538; Fax: +81 298 53 5538. Email address: [email protected] (Hideitsu Hino)

Preprint submitted to Neurocomputing, February 22, 2017

1. Introduction

Image super resolution (SR) is the problem of enhancing low-resolution (LR) images. Its importance is increasing over time because of a growing need to remaster old videos or to investigate low-resolution surveillance videos,
for example. A large number of methods have been proposed for SR, among which a group of actively studied methods is based on sparse coding (Olshausen and Field, 1996, 2005; Natarajan, 1995; Elad, 2010; Bruce et al., 2016), which is the focus of this paper. Sparse coding is a methodology in signal processing in which observed signals are approximated by a linear combination of simple, common components called atoms. A distinctive feature of sparse coding is that the number of atoms prepared for signal reconstruction is large, whereas the number of atoms actually used to represent a signal is small. This property enables us to represent signals compactly and to remove noise effectively. Sparse coding has broad applications, and one of the most actively studied areas is image representation. For example, combined with graph-based regularization techniques (Belkin and Niyogi, 2003), manifold regularized sparse coding methods have been proposed (Zheng et al., 2011; Jin et al., 2015; Lou et al., 2016; Shu et al., 2016). Other interesting applications of sparse coding include human pose estimation problems (Wang et al., 2014; Hong et al., 2015b,a) and image ranking problems (Yu et al., 2014, 2016), to name but a few.
Sparse coding is also used for super resolution with a single LR image (Yang et al., 2008, 2010; Zeyde et al., 2010) and, recently, with multiple LR images (Kato et al., 2015b). Multi-frame image SR is expected to generate clearer high-resolution (HR) images than single-frame SR, provided the relative positions of the observed LR images are accurately estimated. Indeed, the estimation of the relative displacement or shift of multiple images, referred to as image registration (Zitova and Flusser, 2003), plays a critical role in multi-frame SR, as does sparse coding.

In our previous work on image SR (Kato et al., 2015b), we treated the problems of image registration and sparse coding separately. That is, we first
estimated shifts in the observed LR images, and then applied the sparse coding algorithm to the aligned LR images. Since the purpose of registering LR images is high-quality HR image restoration by sparse representation, it is natural to perform image registration so that the error of the sparse image representation is minimized. The contribution of this paper is to treat image registration at the sub-pixel level and the sparse coding problem in a unified framework. More concretely, we simultaneously estimate both the shifts of the LR images and the coefficients of sparse coding with a single objective. Theoretically, we pose the multi-frame SR problem in a framework called double sparsity, which is an interesting approach to sparse modeling (Rubinstein et al., 2010; Zhan et al., 2013).
The rest of this paper is organized as follows. Section 2 describes the problem setting and the underlying model of multi-frame super resolution; the sparse coding approach to super resolution is also described in this section. Section 3 briefly explains how fine shifts (relative displacements) between observations are expressed by combinations of pixel-level shifts. Section 4 describes our proposed approach for simultaneously estimating shifts and sparse coding coefficients in a unified framework; some theoretical analyses of the proposed method are then presented. Section 5 details the experimental results, and the final section is devoted to our concluding remarks. A short, preliminary version of this paper appeared in MLSP 2016 (Kato et al., 2016). In addition to the preliminary results that appeared in MLSP, we show:
1. a theoretical error analysis of estimating sparse coding coefficients under image registration errors;
2. a theoretical evaluation of the computational cost of the proposed method;
3. experimental results on the application to movie datasets, and an evaluation of computational costs for both still image and movie datasets.
2. Notation and Formulation

We first explain the notion of single-frame super resolution by sparse coding, and then extend it to multi-frame super resolution.

2.1. Single-frame Super Resolution
Let Y ∈ R^Q be the observed LR image. The aim of super resolution is to construct an HR image X ∈ R^P from Y, where Q < P. To reduce the computational cost, image super resolution is often performed on small image regions called patches, which are then combined to construct the entire image. Following this approach, we consider reconstructing an HR image patch, represented by a vector x ∈ R^p, by using a single LR image patch y ∈ R^q, where q < p. Having obtained all the HR patches, post-processing to construct the full-sized HR image is performed, which is explained in detail in Sections 5.2 and 5.3 of our previous paper (Kato et al., 2015a).

For each patch, we assume the following degradation process: the HR and LR patches x and y are connected to each other by the observation model

y = Gx + ε,   (1)
where G is a degradation operator composed of blur and down-sampling operations, and ε ∈ R^q is the additive observation noise.

In the framework of sparse coding (Olshausen and Field, 1996; Elad, 2010), the observed signals are represented by combinations of only a small number of basis vectors chosen from a large number of candidates. These basis vectors are called atoms henceforth.
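The degradation model of Eq. (1) can be sketched in code. This is a minimal illustration assuming a separable Gaussian blur followed by decimation; the helper names, patch size, and filter parameters are our own illustrative choices, not values fixed by this section.

```python
import numpy as np

def gaussian_blur_1d(size, sigma):
    """1-D Gaussian filter taps, normalized to sum to one."""
    t = np.arange(size) - (size - 1) / 2.0
    w = np.exp(-t**2 / (2.0 * sigma**2))
    return w / w.sum()

def degrade(x, factor=3, blur_size=9, sigma=1.0):
    """A sketch of the operator G in Eq. (1): blur, then down-sample.

    x is a square HR patch; the result plays the role of Gx.
    """
    w = gaussian_blur_1d(blur_size, sigma)
    # Separable blur: filter rows, then columns ('same' keeps the size).
    blurred = np.apply_along_axis(lambda r: np.convolve(r, w, mode="same"), 1, x)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, w, mode="same"), 0, blurred)
    # Down-sample by keeping every `factor`-th pixel.
    return blurred[::factor, ::factor]

rng = np.random.default_rng(0)
x = rng.random((15, 15))                              # toy HR patch
y = degrade(x) + 0.01 * rng.standard_normal((5, 5))   # y = Gx + eps
```

Since G is linear, it could equivalently be written as a q × p matrix acting on the vectorized patch, which is the form used in the equations below.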
Let D = [d_1, d_2, ..., d_K] ∈ R^{q×K} be a dictionary consisting of K atoms d_k, k = 1, ..., K, and let α ∈ R^K be a coefficient vector for the sparse representation of the patch y ∈ R^q. Typically, K > q. The problem of sparse coding is then formulated as follows:

minimize_α ‖y − Dα‖₂² + η‖α‖₁,  η > 0,   (2)

where ‖α‖₁ = Σ_{j=1}^K |α_j| is the ℓ1-norm of a vector, and η is called the regularization parameter. Problem (2) adopts the ℓ1-norm of the coefficients as a measure of sparsity, and is referred to as ℓ1-norm sparse coding. By minimizing both the approximation error and the ℓ1-norm of the coefficient vector, the resulting α has only a few nonzero elements, and the observed patch y is well approximated by a small number of atoms. Specifically, we call the problem of obtaining the coefficients α with a fixed dictionary D sparse coding.

On the other hand, the problem of optimizing the dictionary D with a set of observations {y_i}_{i=1}^n is called dictionary learning:

minimize_D Σ_{i=1}^n ‖y_i − Dα_i‖₂²,   (3)

where α_i is the solution of problem (2) for each y_i. A number of methods are available for dictionary learning (Engan et al., 1999; Aharon et al., 2006) and sparse coding (Rezaiifar and Krishnaprasad, 1993; Tibshirani, 1996). In this study, we use the ones proposed in (Lee et al., 2006) for their computational efficiency.

We assume the HR patch x is represented by a sparse combination of HR atoms as x = D^h α. Because of relation (1), the LR patch y is connected to the HR patch x as

y ≃ Gx = G D^h α = D^l α,   (4)
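The ℓ1-norm sparse coding of Eq. (2) can be illustrated with a short sketch. The paper uses the feature-sign search of Lee et al. (2006); here we substitute plain iterative soft-thresholding (ISTA) as a stand-in solver, and all dimensions are toy assumptions.

```python
import numpy as np

def ista(y, D, eta, n_iter=500):
    """Minimize ||y - D a||_2^2 + eta * ||a||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(D, 2) ** 2   # half the Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - y) / L        # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - eta / (2 * L), 0.0)  # shrinkage
    return a

rng = np.random.default_rng(1)
q, K = 25, 100                       # LR patch dimension and K > q atoms
Dl = rng.standard_normal((q, K))
Dl /= np.linalg.norm(Dl, axis=0)     # unit-norm atoms
a_true = np.zeros(K)
a_true[[3, 40, 77]] = [1.0, -0.5, 0.8]
y = Dl @ a_true                      # synthetic LR patch with a sparse code
a_hat = ista(y, Dl, eta=0.05)
# With the paired HR dictionary Dh of Eq. (4), the HR patch would then be
# reconstructed as x = Dh @ a_hat.
```

The shrinkage step is what produces the sparsity: coefficients whose gradient update stays below the threshold are set exactly to zero.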
[Figure 1 about here.]

Figure 1: Super resolution by sparse coding. Step 0: construct HR atoms using HR training images (dictionary learning). Step 1: degenerate the HR atoms by blur and down-sampling. Step 2: approximate an observed LR image by using the LR atoms (sparse coding). Step 3: use the coefficients obtained in step 2 to construct an HR image.
where D^l is the LR dictionary. Atoms in D^l are generated from D^h by the image degradation process explained above, and have a one-to-one correspondence with the HR atoms. The correspondence (4) naturally leads to the following two-step procedure for single-frame super resolution: represent the LR patch by a sparse combination of LR atoms, and reconstruct the corresponding HR patch by using the combination coefficients of the LR atoms to combine the HR atoms, as shown in Fig. 1. Before performing this single-frame SR, an HR dictionary needs to be prepared by an appropriate dictionary learning algorithm with HR training images. The single-frame SR algorithm with sparse coding is summarized in Algorithm 1.

2.2. Multi-frame Super Resolution by Sparse Coding
Suppose N low-resolution images Y_i ∈ R^Q, i = 1, ..., N, are observed, all degraded in different ways from a single high-resolution image X ∈ R^P. Without loss of generality, we assume that the LR image Y_1 is the target for super resolution; the other N − 1 LR images are called auxiliary images. We assume that each patch is produced by the following image degradation process: for the target patch y_1, image degradation is modeled by Eq. (1), and,
Algorithm 1 Single-frame Super Resolution with Sparse Coding
Input: LR image Y; arbitrarily chosen HR images Z_1, Z_2, ..., Z_N ∈ R^P for training the dictionary; regularization parameter η > 0 for the ℓ1-norm sparse coding
Initialize:
  • Randomly extract patches z_j of size p from the input HR images.
  • Learn an HR dictionary by solving the problems
      minimize_{D^h} Σ_j ‖z_j − D^h α_j‖₂²  and  minimize_{α_j} ‖z_j − D^h α_j‖₂² + η‖α_j‖₁
    iteratively until convergence.
  • Generate an LR dictionary by applying the degradation operator: D^l = G D^h.
Enhancing Resolution by Sparse Coding:
  while there remain patches in Y to be extracted do
    extract patch y_i from the input LR image Y
    α_i ← arg min_α ‖y_i − D^l α‖₂² + η‖α‖₁
    x_i ← D^h α_i
  end while
  post-processing to construct the entire image from {x_i} (see, e.g., (Kato et al., 2015b))
return HR image X
for auxiliary patch pairs (x, y_i), i = 2, ..., N, the observation model

y_i = G W_i x + ε   (5)

is assumed, where W_i is the parallel shift and clipping operator corresponding to the i-th LR patch, which is explained later and given as an explicit operation in Appendix A. We note that G need not be identical across the different images Y_i, but we assume it is for the sake of simplicity, and we concentrate on estimating W_i, i = 2, ..., N for the N − 1 observations.
We explain the concept of the shift and clipping operators W_i with a schematic diagram in Fig. 2. A detailed explanation is deferred until Section 3. For ease of explanation, Fig. 2 depicts the case where y_1 ∈ R^{5×5} and the magnification factor is three, i.e., x ∈ R^{15×15}. The target patch y_1 is first extracted from the target image Y_1, as shown by the red square in the "target LR image" of Fig. 2. We then consider estimating the shift in the i-th image Y_i. We can roughly estimate the shift of the target patch y_1 in the auxiliary image Y_i by sub-pixel-accuracy matching in LR space, as shown in the "i-th LR image" of Fig. 2. For a detailed explanation, a small region around the placed target patch in the i-th LR image is zoomed into in the bottom-right part of Fig. 2. The grid corresponds to pixels in the i-th LR image. In general, the target patch shown in the red square does not lie on the grid. To avoid negative effects from the non-overlapping region and to use only the informative area, the pixels completely included in the target patch are extracted as the i-th LR patch y_i ∈ R^{(5−1)×(5−1)}, which is shown by the blue square. Since the target patch y_1 is represented by a vector of length q = √q × √q, the size of the patch in its two-dimensional form is √q × √q. On the other hand, the boundary-clipped patch has size q′ = (√q − 1) × (√q − 1). Once W_i, i = 2, ..., N are estimated, image SR based on sparse coding is straightforward. With the shift and clipping operators {W_2, ..., W_N} of the auxiliary
[Figure 2 about here.]

Figure 2: Shift of the i-th observation relative to the target image patch. The target image patch (red square in the upper-left part of the figure) is to be magnified as shown in the upper-right part, where the HR patch x is also shown in a red square. The target LR patch is placed in the i-th observation (lower left) by sub-pixel-level matching in LR space, and the image around the target patch in Y_i is zoomed in for display. The blue dashed square is the patch y_i clipped from Y_i. The upper-left point of the upper-left pixel of y_i is regarded as the origin (0, 0), and the shift from the target patch y_1 is denoted by (a, b) ∈ [0, k]².

patches y_2, ..., y_N ∈ R^{q′}, the LR dictionaries corresponding to the observations
{y_i}_{i=1}^N are stacked to construct a stacked LR dictionary:

\tilde{D}^l = \begin{bmatrix} D^l_1 \\ D^l_2 \\ D^l_3 \\ \vdots \\ D^l_N \end{bmatrix}
            = \begin{bmatrix} G D^h \\ G W_2 D^h \\ G W_3 D^h \\ \vdots \\ G W_N D^h \end{bmatrix}.   (6)

This dictionary is used to approximate the stacked LR patch

\tilde{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}   (7)

by sparse coding: namely, the HR estimate x̂ is given by

α̂ = arg min_α ‖ỹ − D̃^l α‖₂² + η‖α‖₁,   (8)
x̂ = D^h α̂.   (9)

Each atom d^h in the HR dictionary D^h is shifted and clipped to an atom of size q′ by the action of W_i, i = 2, ..., N, and then blurred and down-sampled by the action of G to form the corresponding LR atoms G W_i d^h, i = 2, ..., N. Each block of the stacked dictionary in Eq. (6) is composed of LR atoms obtained in this manner.

3. Approximation of Displacements by Pixel-level Shifts

As discussed in the previous section, the major problem of multi-frame SR
can be reduced to the problem of estimating the shift and clipping operators W_i, i = 2, ..., N. In the following, we consider enhancing the resolution by a magnification factor k. Since the LR target patch y_1 ∈ R^q is represented by a square with √q pixels on a side, the corresponding HR patch is a square with √p = k√q pixels on a side, namely, a vector of size p = k²q.

To estimate W_i, i = 2, ..., N, we consider the upper-left-most point of the i-th auxiliary patch y_i as the origin (0, 0) of the shift, which is on the grid of the i-th LR image Y_i, as shown in Fig. 2. We consider the shift of the target LR patch with HR accuracy to achieve a satisfactory result, which means placing the target LR patch on the LR image Y_i by sub-pixel-level matching. Let the nearest upper-left grid point to the origin in the i-th LR image Y_i be (k, k), and let that of the target patch be (a, b) ∈ [0, k]², as shown in Fig. 2.

There are two cases of displacement: pixel-level parallel shifts in the HR image space, and all others. In the former case, where a and b are both integers, the LR dictionary for the auxiliary patch can simply be constructed by clipping the corresponding areas from the HR dictionary and degrading with G. These LR dictionaries of pixel-level parallel shifts a, b = 0, 1, ..., k are denoted by D^{l(a,b)} and referred to as the LR base dictionaries henceforth. In the latter case, we assume that the shift of the auxiliary patch is well approximated by a linear combination of pixel-level parallel shifts in HR space: namely, the estimation of the shift and clipping operators is reduced to that of the combination coefficients for the (k + 1)² possible parallel shifts. The i-th block of the LR dictionary D^l_i, i = 2, 3, ..., in Eq. (6) is then given by

D^l_i = G W_i D^h = θ_{i,(0,0)} D^{l(0,0)} + θ_{i,(0,1)} D^{l(0,1)} + ··· + θ_{i,(k,k)} D^{l(k,k)},
Σ_{a,b=0}^{k} θ_{i,(a,b)} = 1,  θ_{i,(a,b)} ≥ 0,   (10)
where the LR base dictionaries are common to all LR observations, and different observations are represented by different coefficients θ. We call θ the registration coefficient.

In our previous work (Kato et al., 2015b), we used a two-step procedure for multi-frame SR. That is, we first estimated the registration coefficient by using the 2D simultaneous block matching method proposed in (Shimizu and Okutomi, 2006) due to its computational efficiency. To further reduce the computational cost, we then simplified the registration procedure by considering only the four pixel-level shift operators around the origin (0, 0), obtained by pixel-level matching. The sub-pixel-level shift operator was estimated by estimating the coefficients (1 − v_1)(1 − v_2), (1 − v_1)v_2, v_1(1 − v_2), and v_1 v_2, with v_1, v_2 ≥ 0, for the four shift operators corresponding to the coordinates (0, 0), (1, 0), (0, 1), and (1, 1). We then constructed the stacked LR dictionary in Eq. (6) and the stacked LR patches in Eq. (7). The HR patch was obtained by sparse coding. In this work, instead of this two-step procedure, we propose a novel approach that estimates the sub-pixel-accuracy shifts through a common optimization objective for sparse coding.

4. Double Sparsity for Image Super Resolution
In this section, we formalize the proposed method to estimate both the shifts in LR images and the coefficients for sparse coding.
4.1. Problem Formulation

We start by showing that the stacked LR patch ỹ in Eq. (7) is approximated by a bilinear form. Let B be a block-diagonal matrix of the base dictionaries D^l_1 and D^{l(a,b)}, a, b = 0, ..., k, defined as

B = \begin{bmatrix}
D^l_1 & & & \\
 & D^{l(0,0)} \cdots D^{l(k,k)} & & \\
 & & \ddots & \\
 & & & D^{l(0,0)} \cdots D^{l(k,k)}
\end{bmatrix},   (11)
and the registration coefficients of the base dictionaries are collectively denoted by

θ = [1, θ_{2,(0,0)}, θ_{2,(0,1)}, ..., θ_{2,(k,k)}, ..., θ_{N,(0,0)}, ..., θ_{N,(k,k)}]^⊤.   (12)

Then the stacked dictionary in Eq. (6) is expressed as

D̃^l = B(θ ⊗ I),   (13)

where ⊗ is the Kronecker product and I is the identity matrix of appropriate size. The approximation of an LR patch then has three equivalent representations:

ỹ ≃ D̃^l α = B(θ ⊗ I)α   (14)
          = B(I ⊗ α)θ   (15)
          = B vec(αθ^⊤),   (16)

where vec is the column-stacking vectorization operator. In the above expression, Eqs. (14) and (15) are linear with respect to α and θ, respectively. We use this fact to derive a computationally efficient iterative algorithm. The estimated coefficient α is then used to reconstruct the HR image as x = D^h α.

The optimization problem to be solved is

minimize_{α,θ}  ‖ỹ − B vec(αθ^⊤)‖₂² + η‖α‖₁
subject to  Eθ ≤ 1,  θ ≥ 0,   (17)

where 1 denotes the vector of all ones, and E ∈ R^{N×(1+(k+1)²(N−1))} is defined as

E = \begin{bmatrix}
1 & & & \\
 & 1 \cdots 1 & & \\
 & & \ddots & \\
 & & & 1 \cdots 1
\end{bmatrix},   (18)

with all unspecified entries equal to zero.
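The equivalence of the three representations in Eqs. (14)-(16) can be checked numerically with random data; the sizes below are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
K, m = 6, 4                    # atoms per block, rows of B (toy sizes)
n_theta = 5                    # length of the registration vector theta
B = rng.standard_normal((m, n_theta * K))
theta = rng.random(n_theta)
alpha = rng.standard_normal(K)
I = np.eye(K)

v1 = B @ (np.kron(theta[:, None], I) @ alpha)                # Eq. (14)
v2 = B @ (np.kron(np.eye(n_theta), alpha[:, None]) @ theta)  # Eq. (15)
v3 = B @ np.ravel(np.outer(alpha, theta), order="F")         # Eq. (16)
# All three vectors coincide: B (theta ⊗ I) alpha = B (I ⊗ alpha) theta
#                            = B vec(alpha theta^T).
```

The reshape to a column vector (`theta[:, None]`, `alpha[:, None]`) matters: `np.kron` on 1-D inputs would produce row-block rather than column-block structure, and the identity vec(αθ^⊤) = θ ⊗ α requires column-stacking (`order="F"`).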
The regularization term η‖α‖₁ imposes sparsity on the representation of the observed signals by linear combinations of atoms. The inequality Eθ ≤ 1 encodes the constraint that the sum of the interpolation coefficients is less than or equal to one for each image. Under the non-negativity constraint θ ≥ 0, each element of Eθ coincides with the ℓ1-norm of the corresponding image's coefficients; hence Eθ ≤ 1 works as a sparsity-inducing constraint for θ, like ℓ1-norm regularization. The sparsity-inducing mechanism of ℓ1-norm-like constraints with non-negativity has been studied in the literature on non-negative matrix factorization (Lee and Seung, 1999; Cichocki et al., 2009).

4.2. Optimization method
We solve the optimization problem (17). Since a closed-form solution of problem (17) jointly in α and θ is intractable, we alternately solve the problem with respect to α with θ fixed, and with respect to θ with α fixed. As explained in the next subsection, the optimization problem (17) has the structure of double sparsity: the dictionary used for sparse coding is itself a sparse combination of the base dictionaries. Using only the target LR patch y_1, we first initialize the sparse coding coefficient α^{(1)} by solving the following optimization problem:

α^{(1)} = arg min_α ‖y_1 − D^l_1 α‖₂² + η‖α‖₁,   (19)

which is efficiently solved using sparse coding algorithms, such as the methods proposed by Tibshirani (1996) or Lee et al. (2006).

Sparse optimization for shift operator coefficients: Fixing the coefficient α^{(t)}, we obtain the registration coefficient by

θ^{(t)} = arg min_θ ‖ỹ − B(I ⊗ α^{(t)})θ‖₂²
subject to Eθ ≤ 1, θ ≥ 0.   (20)
This is a quadratic programming problem that is efficiently solved using any off-the-shelf solver. We used the active-set method implemented in the MATLAB quadprog function. We note that the optimization cost of Eq. (20) is very small, since the dimension of the parameter θ to be optimized is (N − 1) × (k + 1)², which is 64 when, for example, N = 5 images are observed and the image is magnified by a factor of k = 3.

Sparse optimization for atom coefficients: To optimize α with θ^{(t)} fixed, we solve the following problem:

α^{(t+1)} = arg min_α ‖ỹ − B(θ^{(t)} ⊗ I)α‖₂² + η‖α‖₁.   (21)

This problem is also an instance of the ℓ1-norm regularized least-squares problem, which is efficiently solved by sparse coding algorithms. We iteratively solve the optimization problems (20) and (21) until convergence.
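The paper solves the constrained problem (20) with MATLAB's quadprog; as an illustrative, solver-free alternative, the same kind of problem can be sketched with projected gradient descent. The helper names `solve_reg` and `project_capped_simplex` are hypothetical, the per-image projection onto {θ ≥ 0, Σθ ≤ 1} follows the standard simplex-projection recipe, the leading fixed coefficient of the target image is omitted, and all sizes are toy assumptions.

```python
import numpy as np

def project_capped_simplex(v):
    """Euclidean projection of v onto {u : u >= 0, sum(u) <= 1}."""
    u = np.maximum(v, 0.0)
    if u.sum() <= 1.0:
        return u
    # Otherwise the projection lies on the face sum(u) = 1: project onto
    # the probability simplex by the sort-and-threshold method.
    s = np.sort(v)[::-1]
    css = np.cumsum(s) - 1.0
    rho = np.nonzero(s - css / np.arange(1, v.size + 1) > 0)[0][-1]
    tau = css[rho] / (rho + 1.0)
    return np.maximum(v - tau, 0.0)

def solve_reg(y_tilde, A, n_img, n_iter=2000):
    """Minimize ||y_tilde - A theta||^2 with each image's block of theta in
    {theta >= 0, sum <= 1}; A plays the role of B (I x alpha) in Eq. (20)."""
    n = A.shape[1]
    blk = n // n_img                        # coefficients per image
    theta = np.full(n, 1.0 / blk)           # feasible start
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    for _ in range(n_iter):
        theta = theta - step * 2.0 * A.T @ (A @ theta - y_tilde)
        for i in range(n_img):              # project block by block
            theta[i*blk:(i+1)*blk] = project_capped_simplex(theta[i*blk:(i+1)*blk])
    return theta

rng = np.random.default_rng(2)
n_img, blk = 4, 16                          # e.g. N - 1 = 4 images, (k+1)^2 = 16 shifts
A = rng.standard_normal((30, n_img * blk))
theta_true = np.zeros(n_img * blk)
for i in range(n_img):
    theta_true[i * blk + rng.integers(blk)] = 1.0   # one active shift per image
y_tilde = A @ theta_true
theta_hat = solve_reg(y_tilde, A, n_img)
```

Since the problem is convex and the feasible set is a product of capped simplices, projected gradient converges to the global minimizer, though an active-set QP solver is typically faster at this small scale.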
Algorithm 2 summarizes the proposed method. The operation ClipByMatching estimates the position of the target patch in the auxiliary image Y_i and extracts the auxiliary patch y_i of size q′. Moreover, the operations that solve the registration problem of Eq. (20) and the sparse coding problem of Eq. (21) are denoted by SolveReg and SolveSC, respectively.

Algorithm 2 Proposed Algorithm
Input: LR images Y_1, Y_2, ..., Y_N; HR dictionary D^h; and LR base dictionaries D^l_1, B
while there remain patches to be extracted do
  extract patch y_1 from the target image Y_1
  for i = 2 to N do
    y_i ← ClipByMatching(y_1, Y_i)
  end for
  ỹ ← [y_1, y_2, ..., y_N]^⊤
  α^{(1)} ← arg min_α ‖y_1 − D^l_1 α‖₂² + η‖α‖₁
  for t = 1 to T do
    θ^{(t)} ← SolveReg(ỹ, B, α^{(t)})      (Eq. (20))
    α^{(t+1)} ← SolveSC(ỹ, B, θ^{(t)})     (Eq. (21))
  end for
  x ← D^h α^{(T+1)}
end while
Post-processing to construct the entire image (see, e.g., (Kato et al., 2015b))
return HR image X

4.3. Computational Complexity

We briefly analyze the computational complexity of the proposed super resolution algorithm. Its main part is composed of two modules, SolveReg and SolveSC. We use the feature-sign-search method for SolveSC, namely, for solving the sparse coding problem (21). This algorithm searches for the signs of the coefficients α. Then, considering only the nonzero elements, the problem reduces to a standard unconstrained quadratic optimization problem, which can be solved analytically with cubic computational
complexity with respect to the number of nonzero elements of the coefficient vector α. The computational cost of solving equation (21) is of order O(L · n_α³), where L is the number of iterations of the feature-sign-search algorithm and n_α is the number of non-zero elements in the optimal vector α. The length of the coefficient vector is the number of atoms K in the dictionary. In the framework of sparse coding, the number of atoms in the dictionary is usually several times the patch dimension q_l, whereas the proportion of non-zero elements in the coefficients is typically less than one-tenth; namely, n_α ≲ q_l in typical solutions.
The problem (20) is a quadratic program with linear constraints, and is solved by the active-set method. The computational cost of quadratic programming is cubic in the number of variables. Since the length of the registration coefficient θ is (N − 1) × (k + 1)², the computational cost of solving (20) is of order O(N³k⁶). Typically, the number of observations is N = 5 and the magnification factor is k = 3. The active-set method significantly reduces the effective computational cost by using only the elements of θ that satisfy the constraints with equality¹, and the computational cost of the QP becomes of order O(L′ · n_θ³), where L′ is the number of iterations of the active-set algorithm and n_θ is the number of non-zero elements in the estimated coefficient vector θ. The sparsity of θ depends on the actual shifts in the observed images. In our experiments, we observed that n_θ ≃ 10 in many cases.

The procedures SolveReg and SolveSC are repeated at most T times; hence, the overall computational cost of the proposed algorithm is of order O(T · (L · n_α³ + L′ · n_θ³)).

¹ We note that the constraint in the problem (20) leads to a sparse solution.
4.4. Double sparsity structure

It is worth noting that the optimization problem (17) shares its form with the formulation of double sparsity dictionary learning proposed by Rubinstein et al. (2010). Double sparsity dictionary learning was proposed to bridge the gap between analytic dictionaries, such as wavelets (Daubechies, 1992), and learning-based dictionaries, such as MOD (Engan et al., 1999) and K-SVD (Aharon et al., 2006). Double sparsity dictionary learning assumes that the dictionary atoms have a sparse underlying structure over a set of more fundamental base dictionaries. In our formulation of multi-frame super resolution, the atoms are generated from a sparse combination of fundamental atoms derived by pixel-level shifting and degradation of the HR atoms. A schematic diagram of the double sparsity structure in our formulation is shown in Fig. 3.

[Figure 3 about here.]

Figure 3: Double sparsity in our formulation of the multi-frame super resolution problem. The LR dictionary used for the sparse coding of the stacked LR patches is itself a sparse combination of the (k + 1)² base dictionaries D^{l(0,0)}, ..., D^{l(k,k)} generated by pixel-level parallel shifts of the HR dictionary followed by degradation; the registration step solves Eq. (20) for the coefficients (θ_{i,(0,0)}, ..., θ_{i,(k,k)}), and the sparse coding step solves Eq. (21) for (α_1, ..., α_K).

By modeling the shift operation by a set of base shift operators, we obtain a natural and simple formulation of multi-frame super resolution based on sparse coding.

4.5. Sparse Coding Stability in Double Sparse Dictionary Learning
The sparse coding coefficients in equations (2), (19), and (21) are affected by the dictionary used for representing the observed signal. Quite a few theoretical analyses of sparse coding coefficients have been proposed (Fan and Li, 2001; Donoho et al., 2006). The stability and sample complexity of dictionary learning have also been analyzed (Vainsencher et al., 2011). In this section, following the analysis presented in Mehta and Gray (2013), we consider the effect of the image registration estimation error on the sparse coding coefficient estimate.

First, we define the space of dictionaries parametrized by the registration coefficient θ as D = {B(θ ⊗ I) | Eθ ≤ 1, θ ≥ 0}. Since the estimated sparse coding coefficient vector α ∈ R^K in equation (2) depends on both the dictionary D and the observed signal y, we write the estimated sparse coding coefficient as α(D, y) or, in our double sparse dictionary formulation, as α(θ, y), because the dictionary is specified by a registration coefficient θ. For the sake of simplicity, each atom of a dictionary is assumed to be normalized to unit ℓ2-norm.

Let [n] = {1, 2, ..., n} be the set of natural numbers from 1 to n ∈ N. To guarantee the stability of sparse codes, we define the notion of s-incoherence.

Definition 1 (s-incoherence). For s ∈ [K] and D ∈ D, the s-incoherence µ_s(D) is defined as the square of the minimum singular value among sub-dictionaries of D composed of s atoms:

µ_s(D) = (min{ζ_s(D_Λ) | Λ ⊆ [K], |Λ| = s})²,   (22)
where D_Λ denotes the sub-dictionary with columns indicated by the index set Λ, |Λ| is the cardinality of Λ, and ζ_s(A) is the s-th largest singular value of a matrix A.

In Mehta and Gray (2013) (Theorem 4), the following result is presented:

Theorem 1 (Sparse Coding Stability). Let D, D′ satisfy µ_s(D), µ_s(D′) ≥ µ for some positive constant µ and ‖D − D′‖₂ ≤ ε, and let y ∈ B_{R^q}, where B_{R^q} is the unit ball of R^q. Suppose that there exists an index set I ⊆ [K] of K − s indices such that, for all i ∈ I,

|d_i^⊤ (y − Dα(D, y))| < η − τ   (23)

with ε ≤ τ²η/27. Then, the difference between the two sparse coding coefficient vectors α(D, y) and α(D′, y) is bounded as

‖α(D, y) − α(D′, y)‖₂ ≤ (3ε√s) / (2ηµ).   (24)
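For intuition, the s-incoherence of Definition 1 can be evaluated directly for small dictionaries by enumerating all size-s sub-dictionaries; the brute-force helper below is our own illustration and is only practical for toy sizes.

```python
import numpy as np
from itertools import combinations

def s_incoherence(D, s):
    """mu_s(D): squared minimum of the s-th largest singular value over
    all sub-dictionaries of s atoms (brute force over column subsets)."""
    worst = np.inf
    for idx in combinations(range(D.shape[1]), s):
        sv = np.linalg.svd(D[:, list(idx)], compute_uv=False)
        worst = min(worst, sv[-1])   # smallest of the s singular values
    return worst ** 2

# An orthonormal dictionary is maximally incoherent: mu_s = 1 for s <= q,
# whereas a dictionary containing two identical atoms has mu_2 = 0.
mu = s_incoherence(np.eye(4), 2)
```

A positive µ_s certifies that every s-atom sub-dictionary is well conditioned, which is exactly what the denominator of the bound (24) requires.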
In this theorem, condition (23) implies that at least K − s inactive atoms in α(D, y) do not have too high an absolute correlation with the residual y − Dα(D, y), and the degree of permissible correlation is bounded by η − τ, where η is the sparse coding regularization coefficient and τ (< η) controls the strictness of the permissible correlation. By applying this theorem to our double sparse dictionary learning problem, we obtain the following corollary:

Corollary 1. Let θ, θ′ ∈ {θ | Eθ ≤ 1, θ ≥ 0}, and µ_s(B(θ ⊗ I)), µ_s(B(θ′ ⊗ I)) ≥ µ. We also let, for some ε ≥ 0,

√( Σ_{i=2}^{N} Σ_{a,b=0}^{k} (θ_{i,(a,b)} − θ′_{i,(a,b)})² ) ≤ ε/√K = ε̃,   (25)

and ỹ ∈ B_{R^{q+q′(N−1)}}. Suppose that there exists an index set I ⊆ [K] of K − s indices such that, for all i ∈ I, |b_i^⊤ (ỹ − B(θ ⊗ I)α(θ, ỹ))| < η − τ with √K ε̃ ≤ τ²η/27, where b_i is the i-th atom of B(θ ⊗ I). Then, the difference between the two sparse coding coefficient vectors α(θ, ỹ) and α(θ′, ỹ) is bounded as

‖α(θ, ỹ) − α(θ′, ỹ)‖₂ ≤ (3ε̃√(Ks)) / (2ηµ).   (26)
Suppose θ′ is the true shift parameter and θ is the estimate obtained by solving problem (20). Then, this corollary states that the error of the sparse coding coefficient α is bounded as in equation (26). Equation (25) is the counterpart of ‖D − D′‖₂ in Theorem 1, and is based on the fact that the dictionary D is written as B(θ ⊗ I); hence

‖B(θ ⊗ I) − B(θ′ ⊗ I)‖₂ = √( Σ_{i=2}^{N} Σ_{a,b=0}^{k} (θ_{i,(a,b)} − θ′_{i,(a,b)})² ‖D^{l(a,b)}‖₂² )   (27)
                        = √( K Σ_{i=2}^{N} Σ_{a,b=0}^{k} (θ_{i,(a,b)} − θ′_{i,(a,b)})² ),   (28)

where we used the assumption that each atom is normalized to unit norm. The bound (26) explicitly contains the dimension of the sparse coefficient, and we can see that when we increase the dimension K of the sparse coefficient, we need to increase the regularization coefficient η of the sparse coding problem to ensure the stability of α with respect to the registration. Finally, we show the worst-case bound for the error of the sparse coding coefficient vector α. Noting that the registration coefficients satisfy the restriction Σ_{a,b=0}^{k} θ_{i,(a,b)} = 1, θ_{i,(a,b)} ≥ 0 for i = 2, ..., N, the left-hand side of equation (25) is bounded as

√( Σ_{i=2}^{N} Σ_{a,b=0}^{k} (θ_{i,(a,b)} − θ′_{i,(a,b)})² ) ≤ √(2(N − 1)),

hence we obtain the inequality

‖α(θ, ỹ) − α(θ′, ỹ)‖₂ ≤ (3/√2) · √(K(N − 1)s) / (ηµ),  N ≥ 2,   (29)
ACCEPTED MANUSCRIPT
as the worst case bound. 5. Experimental Results
images and those of images from movies2 . 5.1. Application to still images
CR IP T
In this section, we report the result of super-resolution on some sets of
We assumed that the observed LR images were generated from an HR image through parallel shifts, blurring, down-sampling, and the addition of noise. The degrees of vertical and horizontal shift were randomly sampled from a uniform distribution on [−5, 5]. The blurring operation was implemented by convolution with a 9 × 9-pixel Gaussian filter with standard deviation σh = 1. The blurred and shifted images were then down-sampled by a factor of three. Finally, noise sampled from N(0, σn = 1) was added to generate the LR observation images. In our experiments, both the intensity of the blur and that of the noise were assumed to be given. Our algorithm iteratively solved the quadratic programming problem (20) and the sparse coding problem (21). We observed that the algorithm converged in fewer than three iterations, and we fixed the number of iterations T to three for all experiments. We magnified the input LR images by a factor of three in all cases. We used five LR images to estimate an HR image, i.e., N = 5.
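The generation of the LR observations described above can be sketched as follows. This is a simplified stand-in, not the authors' code: it uses circular shifts and circular-boundary convolution, whereas the paper does not specify the boundary handling used.

```python
import numpy as np

def gaussian_kernel(size=9, sigma=1.0):
    """Normalized 2-D Gaussian filter of the given size."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def degrade(hr, shift, scale=3, sigma_h=1.0, sigma_n=1.0, rng=None):
    """HR image -> one LR observation:
    parallel shift, 9x9 Gaussian blur, down-sample by `scale`, add noise."""
    rng = rng or np.random.default_rng()
    img = np.roll(hr, shift, axis=(0, 1))          # parallel shift (circular)
    kernel = gaussian_kernel(9, sigma_h)
    blurred = np.zeros_like(img, dtype=float)
    for (dy, dx), w in np.ndenumerate(kernel):     # convolution as weighted shifts
        blurred += w * np.roll(img, (dy - 4, dx - 4), axis=(0, 1))
    lr = blurred[::scale, ::scale]                 # down-sample by factor `scale`
    return lr + rng.normal(0.0, sigma_n, size=lr.shape)

# five LR observations (N = 5) with example pixel-level shifts
hr = np.random.default_rng(0).uniform(0, 255, size=(60, 60))
lrs = [degrade(hr, s, rng=np.random.default_rng(i))
       for i, s in enumerate([(0, 0), (2, 3), (3, 2), (0, 1), (1, 0)])]
```

Each call produces one degraded observation; collecting several with different shifts mimics the multi-frame setting.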
We compared the proposed method with seven conventional methods. The first was bi-cubic interpolation, a simple method regarded as a baseline for SR. The second and third methods were Single-Frame Joint Dictionary Learning (SF-JDL; Yang et al., 2010) and Adaptive Sparse Domain Selection (ASDS; Dong et al., 2011); these are considered state-of-the-art single-frame SR methods with publicly available software implementations. The fourth method was that proposed by Wang et al. (2011), a multi-frame SR method based on joint dictionary learning, which we refer to as MF-JDL (Multi-Frame super resolution based on Joint Dictionary Learning). The next two methods are representative reconstruction-based SR methods from the literature. In Farsiu et al. (2004), a multi-frame SR method based on regularization in the form of Bilateral Total Variation (BTV) was proposed; owing to its impressive performance and simplicity, it has become one of the most commonly used multi-frame SR methods. BTV was further improved in Li et al. (2010), where a method based on regularization by Locally Adaptive Bilateral Total Variation (LABTV) was proposed. In this paper, these reconstruction-based methods are referred to as BTV and LABTV, respectively. Finally, we also used the multi-frame SR method based on sparse coding proposed in our previous paper (Kato et al., 2015b), which is referred to as MF-SC.

There were several tuning parameters for each SR method. To make a fair comparison, we first optimized the parameters of each method to maximize the PSNR on the image “Lena”, one of the most commonly used benchmark images in image processing. For all other images, we kept the parameters optimized for Lena.

We used two grayscale images (Lena and Cameraman) and three color images (Flower, Girl, and Parthenon) to assess the performance of the SR methods. When dealing with color images, we first converted them to the YCbCr format and then applied the SR methods only to the luminance channel (Y); the values of the Cb and Cr channels were simply expanded by bi-cubic interpolation. The experimental results are shown in Fig. 4-Fig. 8. To focus on the difference between our previous method and the newly proposed method, we only show the original images, the degraded images, the images obtained by MF-SC, and those obtained by the proposed method. These figures indicate that the proposed method was able to generate images comparable to or better than those of MF-SC.

²A simple MATLAB implementation will be made available on the authors' website following the acceptance of the paper.
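The luminance-only processing of the color images can be sketched as below, using the standard ITU-R BT.601 RGB-YCbCr conversion; `apply_sr` and `upscale_chroma` are hypothetical stand-ins for an SR method and for bi-cubic interpolation (a nearest-neighbour up-scaling is used in the toy check):

```python
import numpy as np

# ITU-R BT.601 RGB -> YCbCr conversion matrix (full-range approximation)
A = np.array([[ 0.299,     0.587,     0.114    ],
              [-0.168736, -0.331264,  0.5      ],
              [ 0.5,      -0.418688, -0.081312 ]])

def rgb_to_ycbcr(rgb):
    ycc = rgb @ A.T
    ycc[..., 1:] += 128.0          # offset the chroma channels
    return ycc

def ycbcr_to_rgb(ycc):
    ycc = ycc.copy()
    ycc[..., 1:] -= 128.0
    return ycc @ np.linalg.inv(A).T

def super_resolve_color(lr_rgb, apply_sr, upscale_chroma):
    """SR on the luminance channel only; chroma is merely interpolated."""
    ycc = rgb_to_ycbcr(lr_rgb)
    y_hr  = apply_sr(ycc[..., 0])         # SR on Y
    cb_hr = upscale_chroma(ycc[..., 1])   # e.g. bi-cubic interpolation
    cr_hr = upscale_chroma(ycc[..., 2])
    return ycbcr_to_rgb(np.stack([y_hr, cb_hr, cr_hr], axis=-1))

# toy check: 3x nearest-neighbour up-scaling standing in for both operations
up = lambda c: np.kron(c, np.ones((3, 3)))
lr = np.random.default_rng(0).uniform(0, 255, (8, 8, 3))
out = super_resolve_color(lr, up, up)     # shape (24, 24, 3)
```

Processing only Y exploits the fact that the human visual system is far more sensitive to luminance detail than to chroma detail.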
Figure 4: Reconstructed images estimated from LR observations for Lena. (a) Observed LR image. (b) Original HR image. Results of (c) MF-SC (our previous work), and (d) the proposed method.
For a quantitative comparison of the SR methods, we used the Peak Signal-to-Noise Ratio (PSNR), defined as

PSNR[dB] = 10 log₁₀(255² / MSE),    (30)

where MSE is the mean squared error between the original HR image and
Figure 5: Reconstructed images estimated from LR observations for Cameraman. (a) Observed LR image. (b) Original HR image. Results of (c) MF-SC, and (d) the proposed method.

Figure 6: Reconstructed images estimated from LR observations for Flower. (a) Observed LR image. (b) Original HR image. Results of (c) MF-SC, and (d) the proposed method.
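Equation (30) can be computed directly; a minimal sketch:

```python
import numpy as np

def psnr(original, estimate, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB, as in equation (30)."""
    mse = np.mean((np.asarray(original, float) - np.asarray(estimate, float)) ** 2)
    return 10.0 * np.log10(peak**2 / mse)

# a uniformly perturbed image: MSE = 1, so PSNR = 10 log10(255^2) ~ 48.13 dB
img = np.arange(64, dtype=float).reshape(8, 8)
val = psnr(img, img + 1.0)
```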
Figure 7: Reconstructed images estimated from LR observations for Girl. (a) Observed LR image. (b) Original HR image. Results of (c) MF-SC, and (d) the proposed method.

Figure 8: Reconstructed images estimated from LR observations for Parthenon. (a) Observed LR image. (b) Original HR image. Results of (c) MF-SC, and (d) the proposed method.
the estimated HR image; a higher PSNR indicates better SR performance. The PSNR values obtained by the various methods are shown in Table 1. To evaluate the PSNR values, we randomly generated 100 sets of 5 shift operators and generated degraded images by adding random observation noise. From each set of observed images, we randomly chose one target image. Only the target image was used for single-frame SR, whereas in multi-frame SR, the remaining 4 images were used as auxiliary LR images. The means and standard deviations of the PSNR values were calculated over the 100 SR results of each method. The best and second-best results are shown in bold and underlined, respectively. The average computational times over the 100 trials are also shown in parentheses.

Table 1: PSNRs and computational times (in parentheses) of the SR methods. The best and second-best results are shown in bold and underlined, respectively.

Image     | Bicubic      | SF-JDL              | ASDS                  | MF-JDL                | BTV                   | LABTV                 | MF-SC                | Proposed
Lena      | 27.91 ± 0.00 | 28.73 ± 0.01 (1.90) | 30.08 ± 0.02 (149.07) | 29.25 ± 0.05 (99.79)  | 29.01 ± 0.22 (44.03)  | 29.33 ± 0.20 (71.15)  | 29.69 ± 0.15 (24.78) | 29.82 ± 0.14 (59.54)
Cameraman | 27.03 ± 0.00 | 28.25 ± 0.01 (1.81) | 29.88 ± 0.02 (157.71) | 28.29 ± 0.03 (98.45)  | 29.43 ± 0.37 (44.29)  | 29.83 ± 0.37 (71.06)  | 30.19 ± 0.38 (24.23) | 30.44 ± 0.35 (42.78)
Flower    | 35.50 ± 0.01 | 35.81 ± 0.01 (2.24) | 36.22 ± 0.02 (124.88) | 36.32 ± 0.04 (94.14)  | 36.26 ± 0.24 (43.86)  | 36.46 ± 0.21 (67.96)  | 36.61 ± 0.10 (21.68) | 36.61 ± 0.12 (30.80)
Girl      | 31.12 ± 0.00 | 31.49 ± 0.01 (1.88) | 31.72 ± 0.01 (157.10) | 31.73 ± 0.02 (118.64) | 31.84 ± 0.17 (53.53)  | 32.09 ± 0.16 (83.21)  | 31.98 ± 0.06 (26.66) | 31.92 ± 0.07 (56.52)
Parthenon | 24.40 ± 0.00 | 24.59 ± 0.00 (4.28) | 25.07 ± 0.01 (442.35) | 24.70 ± 0.02 (257.19) | 25.45 ± 0.19 (108.94) | 25.33 ± 0.13 (173.38) | 25.41 ± 0.09 (57.80) | 25.37 ± 0.08 (141.46)

As shown in Table 1, the proposed method outperformed the other conventional methods for two out of five images (Cameraman and Flower), and was the second-best for Lena. It showed improvement over our previous method on two images, was the same on one image, and was slightly worse on the other two. The proposed method
requires more computational time than our previous method, but is comparable to or faster than the other methods. We can see that the proposed method, which has a theoretically interesting double sparse structure, achieves performance comparable to the other methods in both image reconstruction quality and computational cost.

Illustration of the Double Sparsity Structure

For a deeper understanding of the behavior of our algorithm, we illustrate, for the “Lena” image as an example, the registration coefficients estimated by our proposed double sparse dictionary learning. To the original HR Lena image, we applied the following four shift operations:
S2 = (2, 3), S3 = (3, 2), S4 = (0, 1), S5 = (1, 0),
where Si = (a, b) means that the i-th observed image has a relative displacement to the target image of a pixels in the horizontal and b pixels in the vertical direction. Sub-pixel-level shifts in HR space could also be considered, but to simplify the explanation, we consider pixel-level shifts in HR space. We note that we consider five LR observations and treat the first one as the target image; hence, only four shift operations are involved. The target and shifted images were then degraded by applying the blur, down-sampling, and additive noise as in equation (5). We extracted a patch of size q = 5 × 5 from the target LR observed image, and the corresponding 4 × 4 patches from the four auxiliary LR images. According to the (N − 1) × (k + 1)² = 4 × 16 = 64 possible pixel-level shifts in HR space, we generated 64 LR dictionaries, which are the element dictionaries in terms of the double sparse dictionary learning framework. The stacked observed patch ỹ was generated from the target
and auxiliary patches. Then, we obtained the registration coefficients θ by solving the problem (20). The coefficient vector θ had length 64 and was composed of N − 1 = 4 blocks of length 16, the blocks corresponding to the j-th observed image (j = 2, . . . , 5). For an arbitrarily chosen patch, we show the four blocks of coefficients as barplots in Fig. 9.

Figure 9: Examples of the estimated registration coefficients for four shift patterns. (Panels: coefficients for shifts S2, S3, S4, and S5.)

In the barplots, the indices corresponding to the true shifts are marked with ∗. From these plots, we can see that the estimated coefficients are sparse and that the elements corresponding to the true shifts have large values, as expected. Figure 9 shows the estimated registration coefficients for a single patch, but we observed the same tendency in the estimated coefficients for other patches. The averages of the estimated registration coefficients over all patches are shown in Fig. 10. It is seen that the elements of the coefficients corresponding to the true shifts have larger values than the other elements; thus, by solving the problem (20), we can obtain a reasonable estimate of the registration of the multiple observations.
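The block structure of θ described above, and the read-out of the estimated shift per block, can be sketched as follows; the coefficient values here are synthetic stand-ins for a QP solution of problem (20):

```python
import numpy as np

N, k = 5, 3                     # five observations; shifts (a, b), a, b in {0, ..., 3}
shifts = [(a, b) for a in range(k + 1) for b in range(k + 1)]   # 16 shifts per block

# synthetic registration coefficients: (N-1) blocks of length 16, each on the simplex
rng = np.random.default_rng(0)
true = [(2, 3), (3, 2), (0, 1), (1, 0)]         # the shifts S2..S5 of the example
theta = np.zeros((N - 1, len(shifts)))
for i, s in enumerate(true):
    theta[i, shifts.index(s)] = 0.8             # large weight on the true shift
    theta[i] += rng.uniform(0, 0.2 / 16, 16)    # small spurious weights elsewhere
    theta[i] /= theta[i].sum()                  # sum-to-one constraint per block

# estimated shift per block = index of the largest coefficient in that block
estimated = [shifts[np.argmax(block)] for block in theta]
# estimated == [(2, 3), (3, 2), (0, 1), (1, 0)]
```

Reading off the dominant coefficient in each block recovers the shift of the corresponding auxiliary observation, which is exactly the behavior seen in Figs. 9 and 10.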
Figure 10: Averages of the estimated registration coefficients for four shift patterns. (Panels: averaged coefficients for shifts S2, S3, S4, and S5.)

5.2. Application to motion pictures

We show the experimental results on LR images sequentially captured from movies. From five consecutive LR images, the middle (third in the temporal sequence) image was selected as the target image, and the other four were treated as auxiliary images. The HR images obtained using MF-SC and the proposed method are
shown in Fig. 11 and Fig. 12. We also show the PSNRs of all methods in Table 2.
Figure 11: Images estimated from LR observations for MacArthur. (a) Observed LR image. (b) Original HR image. Results of (c) MF-SC, and (d) the proposed method.
Figure 12: Images estimated from LR observations for Samurai. (a) Observed LR image. (b) Original HR image. Results of (c) MF-SC, and (d) the proposed method.
Table 2: PSNRs and computational times (in parentheses) of the SR methods applied to movie data.

Image     | Bicubic | SF-JDL       | ASDS           | MF-JDL         | BTV           | LABTV         | MF-SC         | Proposed
MacArthur | 34.11   | 34.33 (2.69) | 35.63 (178.08) | 35.18 (133.78) | 34.39 (61.72) | 34.40 (96.17) | 34.79 (27.70) | 35.63 (61.74)
Samurai   | 25.36   | 25.97 (2.50) | 26.66 (211.65) | 26.12 (138.38) | 26.16 (62.13) | 26.07 (96.24) | 25.90 (30.75) | 26.49 (59.86)
From Fig. 11 and Fig. 12, we see that the HR images obtained by the proposed method were clearer and had more distinct edges than the images obtained by MF-SC. From Table 2, the PSNRs of the proposed method were the same as or lower than those of ASDS. However, the computational cost of the proposed method was lower than that of ASDS and the other multi-frame SR methods. Although the proposed method incurs approximately double the computational cost of our previous method (MF-SC), it significantly improves the quality of the reconstructed image in terms of PSNR.

6. Conclusion
PT
In this paper, we discussed multi-frame image super resolution as a combination of two distinct problems, i.e., image registration and sparse coding.
CE
The main contribution of this work was formulating these problems within a framework of double sparsity dictionary learning. Image registration and
AC
sparse coding were unified in a single objective function, following which registration coefficients and sparse coding coefficients were alternatingly optimized with quadratic programming and `1 -norm constraint least squares, respectively, both of which led the sparse estimation of those coefficients. The proposed method improved our previous formulation of multi-frame super resolution for some images, particularly images from movies. We explained
32
ACCEPTED MANUSCRIPT
the proposed super resolution method using an application to image resolution enhancement; however, we think that our double sparsity formulation is applicable to the enhancement or refinement of multiple observations from a number of inaccurate sensors with appropriate base dictionaries, such as
CR IP T
information integration from observations by autonomous mobile robots.
Our future work includes an application of the proposed method to other signal resolution enhancement problems as well as further improvement of
computational efficiency by using or developing optimization methods for
AN US
sparse coding and shift estimation. It is interesting to optimize the learning
method of HR dictionary for super resolution task. For example, in the framework of the super resolution based on sparse coding, we generate the LR atoms from HR atoms. In reality, there can be a lot of different ways of degeneration of an HR atom that results in the same LR atom. So, when
M
we optimize the HR dictionary, making each atom in HR space as different as possible would improve the overall super resolution results. The way to
ED
impose this constraint and how to formulate the optimization problem will be investigated in our future work. It is also of great interest to investigate
PT
the relationship between the double sparse structure and the intermediate representation obtained by using the deep neural network architecture (Dong
CE
et al., 2016).
AC
Acknowledgment The authors would like to express their special thanks to the editor, the
associate editor and anonymous reviewers whose comments led to valuable improvements of the manuscript. Part of this work was supported by JSPS KAKENHI No. 25120009, 25120011, 16H02842 and 16K16108.
Appendix A. Shift and clipping operator

We explain how the parallel shift and clipping operator Wi in Eq. (5) is realized as an explicit operator. Suppose Wi shifts the original LR patch by x pixels in the row direction and y pixels in the column direction; we denote Wi by W^{(x,y)}. We consider an LR patch of size q = √q × √q: the patch has √q pixels in both rows and columns. Let O_{a×b} ∈ R^{a×b} be the zero matrix and I_{a×a} ∈ R^{a×a} be the unit matrix. We define the matrix S_z, z ∈ {0, ±1, ±2, . . . , ±(√q − 1)}, as follows:

S_z = [ O_{z×(√q−z)}          O_{z×z}
        I_{(√q−z)×(√q−z)}     O_{(√q−z)×z} ] ∈ R^{√q×√q}    (A.1)

when z ≥ 1,

S_z = [ O_{(√q−|z|)×|z|}      I_{(√q−|z|)×(√q−|z|)}
        O_{|z|×|z|}           O_{|z|×(√q−|z|)} ] ∈ R^{√q×√q}    (A.2)

when z ≤ −1, and

S_z = I_{√q×√q}    (A.3)

when z = 0. Then, S_y ⊗ S_x gives a matrix acting on the (vectorized) LR patch and realizes a pixel-level parallel translation by (x, y), where ⊗ is the matrix Kronecker product. We note that in the shifted patch, pixel values coming from outside the patch are replaced with zeros (zero padding). The shift and clipping operator W^{(x,y)} acting on y ∈ R^q is defined by

W^{(x,y)} y = C^⊤ (S_y ⊗ S_x) y C ∈ R^{(√q−2)×(√q−2)},    (A.4)

where (S_y ⊗ S_x) y is understood to be reshaped into a √q × √q matrix before the clipping, and

C = [ O_{(√q−2)×1}   I_{(√q−2)×(√q−2)}   O_{(√q−2)×1} ]^⊤ ∈ R^{√q×(√q−2)}.    (A.5)
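A direct implementation of the operators (A.1)-(A.5) can be sketched as follows, treating the patch as a √q × √q matrix (the Kronecker-product action (S_y ⊗ S_x) vec(Y) then becomes S_x Y S_y^⊤):

```python
import numpy as np

def S(z, n):
    """Shift matrix S_z of size n x n (n = sqrt(q)), with zero padding."""
    Sz = np.zeros((n, n))
    if z >= 0:
        Sz[z:, :n - z] = np.eye(n - z)    # shift forward by z pixels
    else:
        Sz[:n + z, -z:] = np.eye(n + z)   # shift backward by |z| pixels
    return Sz

def clip_matrix(n):
    """C of (A.5): drops the one-pixel border of an n x n patch."""
    C = np.zeros((n, n - 2))
    C[1:n - 1, :] = np.eye(n - 2)
    return C

def W(x, y, patch):
    """Shift-and-clip operator W^{(x,y)} on a sqrt(q) x sqrt(q) patch."""
    n = patch.shape[0]
    shifted = S(x, n) @ patch @ S(y, n).T   # row shift by x, column shift by y
    C = clip_matrix(n)
    return C.T @ shifted @ C                # (sqrt(q)-2) x (sqrt(q)-2) result

patch = np.arange(25, dtype=float).reshape(5, 5)
out = W(1, 0, patch)    # shifted down one row, border clipped; shape (3, 3)
```

Note that S_{−z} = S_z^⊤, so opposite shifts are realized by transposition.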
References

Aharon, M., Elad, M., Bruckstein, A., 2006. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing 54 (11), 4311-4322.

Belkin, M., Niyogi, P., 2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15 (6), 1373-1396.

Bruce, N. D., Rahman, S., Carrier, D., 2016. Sparse coding in early visual representation: From specific properties to general principles. Neurocomputing 171, 1085-1098.

Cichocki, A., Zdunek, R., Phan, A. H., Amari, S.-i., 2009. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Wiley Publishing.

Daubechies, I., 1992. Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.

Dong, C., Loy, C. C., He, K., Tang, X., 2016. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (2), 295-307.

Dong, W., Zhang, D., Shi, G., Wu, X., 2011. Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. IEEE Transactions on Image Processing 20 (7), 1838-1857.

Donoho, D. L., Elad, M., Temlyakov, V. N., 2006. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Transactions on Information Theory 52 (1), 6-18.

Elad, M., 2010. Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer.

Engan, K., Aase, S. O., Husøy, J. H., 1999. Method of optimal directions for frame design. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), Vol. 5, pp. 2443-2446.

Fan, J., Li, R., 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96 (456), 1348-1360.

Farsiu, S., Robinson, M., Elad, M., Milanfar, P., 2004. Fast and robust multiframe super resolution. IEEE Transactions on Image Processing 13 (10), 1327-1344.

Hong, C., Yu, J., Tao, D., Wang, M., 2015a. Image-based three-dimensional human pose recovery by multiview locality-sensitive sparse retrieval. IEEE Transactions on Industrial Electronics 62 (6), 3742-3751.

Hong, C., Yu, J., Wan, J., Tao, D., Wang, M., 2015b. Multimodal deep autoencoder for human pose recovery. IEEE Transactions on Image Processing 24 (12), 5659-5670.

Jin, T., Yu, Z., Li, L., Li, C., 2015. Multiple graph regularized sparse coding and multiple hypergraph regularized sparse coding for image representation. Neurocomputing 154, 245-256.

Kato, T., Hino, H., Murata, N., 2015a. Double sparse multi-frame image super resolution. CoRR abs/1512.00607.

Kato, T., Hino, H., Murata, N., 2015b. Multi-frame image super resolution based on sparse coding. Neural Networks 66, 64-78.

Kato, T., Hino, H., Murata, N., 2016. Doubly sparse structure in image super resolution. In: 26th IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2016), Vietri sul Mare, Salerno, Italy, pp. 1-6.

Lee, D. D., Seung, H. S., 1999. Learning the parts of objects by non-negative matrix factorization. Nature 401 (6755), 788-791.

Lee, H., Battle, A., Raina, R., Ng, A. Y., 2006. Efficient sparse coding algorithms. In: Advances in Neural Information Processing Systems (NIPS), pp. 801-808.

Li, X., Hu, Y., Gao, X., Tao, D., Ning, B., 2010. A multi-frame image super-resolution method. Signal Processing 90 (2), 405-414.

Lou, S., Zhao, X., Chuang, Y., Yu, H., Zhang, S., 2016. Graph regularized sparsity discriminant analysis for face recognition. Neurocomputing 173, Part 2, 290-297.

Mehta, N., Gray, A. G., 2013. Sparsity-based generalization bounds for predictive sparse coding. In: Proceedings of the 30th International Conference on Machine Learning (ICML-13), JMLR Workshop and Conference Proceedings, Vol. 28, pp. 36-44.

Natarajan, B. K., 1995. Sparse approximate solutions to linear systems. SIAM Journal on Computing 24 (2), 227-234.

Olshausen, B. A., Field, D. J., 1996. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607-609.

Olshausen, B. A., Field, D. J., 2005. How close are we to understanding V1? Neural Computation 17, 1665-1699.

Pati, Y. C., Rezaiifar, R., Krishnaprasad, P. S., 1993. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In: Proceedings of the 27th Annual Asilomar Conference on Signals, Systems, and Computers, pp. 40-44.

Rubinstein, R., Zibulevsky, M., Elad, M., 2010. Double sparsity: Learning sparse dictionaries for sparse signal approximation. IEEE Transactions on Signal Processing 58 (3), 1553-1564.

Shimizu, M., Okutomi, M., 2006. Multi-parameter simultaneous estimation on area-based matching. International Journal of Computer Vision 67 (3), 327-342.

Shu, Z., Zhou, J., Huang, P., Yu, X., Yang, Z., Zhao, C., 2016. Local and global regularized sparse coding for data representation. Neurocomputing 175, Part A, 188-197.

Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58, 267-288.

Vainsencher, D., Mannor, S., Bruckstein, A. M., 2011. The sample complexity of dictionary learning. Journal of Machine Learning Research 12, 3259-3281.

Wang, C., Wang, Y., Lin, Z., Yuille, A. L., Gao, W., 2014. Robust estimation of 3D human poses from a single image. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2369-2376.

Wang, P., Hu, X., Xuan, B., Mu, J., Peng, S., 2011. Super resolution reconstruction via multiple frames joint learning. In: International Conference on Multimedia and Signal Processing (CMSP), Vol. 1, pp. 357-361.

Yang, J., Wright, J., Huang, T., Ma, Y., 2008. Image super-resolution as sparse representation of raw image patches. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08), pp. 1-8.

Yang, J., Wright, J., Huang, T. S., Ma, Y., 2010. Image super-resolution via sparse representation. IEEE Transactions on Image Processing 19 (11), 2861-2873.

Yu, J., Rui, Y., Tao, D., 2014. Click prediction for web image reranking using multimodal sparse coding. IEEE Transactions on Image Processing 23 (5), 2019-2032.

Yu, J., Yang, X., Gao, F., Tao, D., 2016. Deep multimodal distance metric learning using click constraints for image ranking. IEEE Transactions on Cybernetics PP (99), 1-11.

Zeyde, R., Elad, M., Protter, M., 2010. On single image scale-up using sparse-representations. In: Proceedings of the 7th International Conference on Curves and Surfaces, pp. 711-730.

Zhan, X., Zhang, R., Yin, D., Hu, A., Hu, W., 2013. Remote sensing image compression based on double-sparsity dictionary learning and universal trellis coded quantization. In: IEEE International Conference on Image Processing (ICIP 2013), Melbourne, Australia, pp. 1665-1669.

Zheng, M., Bu, J., Chen, C., Wang, C., Zhang, L., Qiu, G., Cai, D., 2011. Graph regularized sparse coding for image representation. IEEE Transactions on Image Processing 20 (5), 1327-1336.

Zitova, B., Flusser, J., 2003. Image registration methods: a survey. Image and Vision Computing 21 (11), 977-1000.
Toshiyuki Kato received the Bachelor's degree in engineering in 2013 and the Master's degree in engineering in 2015 from Waseda University. Since April 2015, he has been working at Toshiba Solutions Corporation. His research interests include image processing and sparse representation.

Hideitsu Hino received his B.Eng. in 2003 and M. Informatics in Applied Mathematics and Physics in 2005 from Kyoto University, Japan. He was with Hitachi's Systems Development Laboratory from April 2005 to August 2007. He earned a D.Eng. in 2010 from Waseda University. Since April 2017, he has been an Associate Professor at the University of Tsukuba. His research interests include the analysis of learning algorithms. He is a member of the IEEE.

Noboru Murata received the B.Eng., M.Eng., and Dr.Eng. in Mathematical Engineering and Information Physics from the University of Tokyo in 1987, 1989, and 1992, respectively. After working at the University of Tokyo, GMD FIRST in Germany, and RIKEN in Japan, he joined Waseda University in April 2000, where he is presently a professor. His research interests include the theoretical aspects of learning machines such as neural networks, focusing on the dynamics and statistical properties of learning.