Combining sparse coding with structured output regression machine for single image super-resolution

Accepted Manuscript

Combining Sparse Coding with Structured Output Regression Machine for Single Image Super-Resolution
Yongliang Tang, Weiguo Gong, Qiane Yi, Weihong Li

PII: S0020-0255(16)30735-6
DOI: 10.1016/j.ins.2017.12.001
Reference: INS 13294

To appear in: Information Sciences

Received date: 3 September 2016
Revised date: 22 October 2017
Accepted date: 4 December 2017

Please cite this article as: Yongliang Tang , Weiguo Gong , Qiane Yi , Weihong Li , Combining Sparse coding with Structured Output Regression Machine for Single Image Super-Resolution, Information Sciences (2017), doi: 10.1016/j.ins.2017.12.001

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Highlights

• The structured output regression machine (SORM) is improved by considering the correlation and independence between different output components. The model combining sparse coding with the improved SORM is applied to SISR.

• The global and nonlocal optimization with a new similarity weight is proposed to further improve the reconstructed HR images.

Combining Sparse Coding with Structured Output Regression Machine for Single Image Super-Resolution

Yongliang Tang, Weiguo Gong*, Qiane Yi, Weihong Li

Key Lab of Optoelectronic Technology & Systems of Education Ministry, Chongqing University, Chongqing 400044, China

*Correspondence: Weiguo Gong, Key Lab of Optoelectronic Technology & Systems of Education Ministry, Chongqing University, Chongqing 400044, China, [email protected], Tel.: +86 023 65112779


Abstract

In this paper, considering that the dictionary pairs trained by sparse coding based super-resolution (SR) methods have difficulty capturing the complicated nonlinear relationships between the low-resolution (LR) and high-resolution (HR) feature spaces, we propose a new single image SR method by combining sparse coding with an improved structured output regression machine (SORM). In the proposed method, the dictionary pairs are first learned by joint sparse coding to characterize the structural domain of each feature space and to add more consistency between the sparse codes of the two feature spaces. Then, since the classical SORM does not give sufficient weight to the independence of different output components, we improve the SORM by considering both the correlation and the independence between different output components to establish a set of mapping functions for tying the sparse codes of the two feature spaces. In this way, more precise mapping relationships between the two feature spaces are obtained from the trained dictionary pairs and mapping functions. Moreover, we propose a new global and nonlocal optimization for further enhancing the quality of the restored HR images. Extensive experiments validate that the proposed method achieves convincing improvement over other state-of-the-art methods in terms of reconstruction quality and computational cost.

Keywords: Single image super-resolution, structured output regression machine, sparse coding, global and nonlocal optimization.

1 Introduction

High-resolution (HR) images often preserve more details and critical information for later image processing, analysis, and interpretation. However, due to the limitations of the physical imaging system and disturbances from the external environment, it is difficult to obtain an image with the desired resolution [5]. As an effective technique that can obtain visually pleasing HR images from a low-cost imaging system under limited environmental conditions, image super-resolution (SR) has attracted much attention and produced extremely promising results for many civilian and military applications, such as criminal investigation [61], video surveillance [60], medical imaging [28], etc.

According to the number of input images, the existing image SR methods can be divided into multi-frame SR [18,32,44] and single-frame or single image SR [6,11,21,35,36,39,40,45,46,49,58,59]. The multi-frame SR methods attempt to reconstruct an HR image by fusing a sequence of low-resolution (LR) images with subpixel displacement from the same scene. By contrast, the single image SR (SISR) methods aim to recover the HR image from a given LR image by removing the degradations caused by the limitations of the physical imaging system and the imaging environment. In this paper, we primarily focus on the SISR problem.


The SISR methods can be classified into interpolation-based, reconstruction-based, and learning-based methods. The interpolation-based methods, such as bicubic interpolation, edge-guided interpolation, and nearest neighbor interpolation, typically adopt fixed-function kernels [21,45] or structure-adaptive kernels [11,40,59] to estimate the unknown pixels in the HR grid. Although the interpolation-based methods can reconstruct HR images in a very simple and efficient way, they are usually prone to yield overly smooth and blurred images. Therefore, the reconstructed results are unsatisfactory for practical applications.

The reconstruction-based methods usually suppose that the observed LR image is a product of several degradation factors such as blurring, down-sampling, and corruption with additive zero-mean white Gaussian noise [58]. Since many HR images may degrade to the same LR image, SISR is an inherently ill-posed problem. To tackle the ill-posedness, certain priors or constraints need to be imposed in the SR process. Typical priors include edge-directed priors [6,39,46,49], gradient profile priors [34,35,36], Bayesian priors [1,24,29], and nonlocal self-similarity priors [23,27,30,57]. Although this kind of SR method is particularly effective at preserving geometric structure and suppressing ringing artifacts, it fails to add sufficient novel high-frequency details to the reconstructed HR image and is limited in reconstructing the visual complexity of the real image, especially at high magnification.


The learning-based methods presume that certain relationships exist between LR images and their HR counterparts, and that these relationships can be learned from millions of co-occurring LR-HR image pairs before being used to reconstruct a new HR image. Since the learning-based methods adequately exploit the information in training or example images, they are able to recover the missing high-frequency details caused by the degradation factors and are superior to the other SR methods. Depending on how the mapping relationships are established, the learning-based methods can be mainly divided into regression-based methods and coding-based methods.

The regression-based methods typically establish regression models to reveal the relationships between the LR and HR images. For example, Kim and Kwon [22] utilize kernel ridge regression (KRR) to estimate the high-frequency details of the desired HR image. He and Siu [16] predict each pixel of the HR image from its neighbors through Gaussian process regression (GPR) with a proper covariance function. Timofte et al. [42] reduce image SR to a projection problem from the input feature space to the HR feature space via anchored neighborhood regression (ANR), and further propose an improved ANR framework (A+) by combining the best qualities of ANR and simple functions [41]. Dong et al. [9] propose a deep learning method that trains a deep convolutional neural network to establish an end-to-end mapping between the LR and HR images. Kim et al. [19,20] present two highly accurate SISR methods by training a deeper convolutional network inspired by the VGG-net used for ImageNet classification [33] and a deeply-recursive convolutional network (up to 16 recursions). More recently, further baseline techniques have been proposed by Timofte et al. [43] to improve learning-based single image SR.


Another kind of learning-based method is the coding-based one, which attempts to capture the relationships between the LR and HR images by space transformation. The coding-based methods include NE-based learning [4,13,56], k-nearest neighbor (k-NN) learning [12,37], and sparse coding [10,15,25,47,50,52,53,54]. The k-NN and NE-based learning methods often need to search for similar patterns in an immense reference dataset to optimally represent the complicated structures in generic images, so the resulting image SR lacks efficiency for practical applications [58]. Recently, the sparse coding methods have shown promising performance in reconstructing a visually pleasing HR image from one LR image. In particular, Yang et al. [53] presume that the LR and HR feature spaces share the same sparse code with respect to their own dictionaries and jointly train a compact dictionary pair to capture the relationships between the two feature spaces. However, since the dictionary pair jointly trained in the training phase cannot guarantee the co-occurrence of the sparse code in the reconstruction phase, it fails to accurately reveal the intrinsic relationships between the two feature spaces. Several methods have been proposed to alleviate this problem. Zeyde et al. [54] propose a two-step learning method, where the LR dictionary is learned by K-SVD and the HR one is generated via least-squares. Although it can guarantee the co-occurrence of the sparse code, the HR dictionary incurs more errors in characterizing the structure of the HR feature space. Wang et al. [47] improve the SR result by training a pair of dictionaries and a linear mapping function simultaneously. However, a linear mapping function is often not enough to reveal the complicated nonlinear relationships between the two feature spaces. Yang et al. [52] use a bilevel optimization of the problem instead of solving the two optimization problems in the two feature spaces together in the training phase. Since the bilevel optimization is highly nonlinear and nonconvex, it is difficult to achieve an optimal solution. He et al. [15] utilize a beta process prior to learn over-complete dictionary pairs for a more consistent and accurate mapping between the two feature spaces. Yang et al. [50] propose a consistent coding scheme to guarantee the prediction accuracy of the HR sparse code. Nevertheless, the independently trained dictionaries for each feature space increase the difficulty of revealing the relationships via the established mapping function in the sparse representation space. Li et al. [25] present a dual-sparsity regularized sparse representation (DSRSR) model that explores both the column and row nonlocal similarity priors among the matrices of sparse codes of different image patches. Dong et al. [10] present a unified image restoration framework by combining local sparsity, nonlocal self-similarity, and local autoregressive constraints, which performs well on image denoising, deblurring, and SR reconstruction.

Although the sparse coding methods have attained convincing improvement over many other state-of-the-art SR methods, several challenging issues still need to be addressed. First, most existing sparse coding methods attempt to train one or a set of dictionary pairs to reveal the complex, spatially-variant, and nonlinear relationships between the LR and HR feature spaces, which leads to complex SR models, large errors, and sensitivity in time or space. Second, because the current SR methods are typically based on fixed-size image patches, which often ignore the inherent geometric structures in the natural image, they easily cause ringing artifacts and increase reconstruction errors. Third, most sparse coding algorithms mainly focus on developing image priors to reconstruct details rather than to preserve geometric features, which leads to the blurring of small-scale structures and excessively smooth image edges.

To address the aforementioned problems, we explore a new SISR method by

combining sparse coding with the improved structured output regression machine (SORM). In the proposed method, we first train a set of dictionary pairs by the joint dictionary training method to transform the model domain from the image space to the sparse coding space. Secondly, considering that the classical SORM does not give sufficient weight to the independence of different output components, we improve the SORM and make it more suitable for training the mapping functions in the sparse coding space, so as to better reveal the relationships between the sparse codes of the LR and HR feature spaces. Once the jointly trained dictionary pairs and the mapping functions are obtained, more accurate relationships between the two feature spaces can be established, and then the missing details in a given LR input image can be predicted efficiently. In addition, considering that reconstruction based on fixed-size image patches may cause ringing artifacts, a new global and nonlocal optimization is introduced into our framework to suppress these artifacts and further improve the quality of the desired HR image. The major advantages of the proposed method are summarized as follows:

1) Since the establishment of the mapping functions is based on the elementary structures (atoms) of the two feature spaces, the proposed method is able to better preserve geometric structures and suppress noise.

2) Compared with the consistent coding scheme [50], the jointly trained dictionary pairs can well characterize each feature space and add more consistency between the sparse codes of the two feature spaces. This greater consistency is beneficial to establishing the mapping functions.

3) Since the improved SORM considers not only the underlying correlations between inputs and the corresponding output but also the correlations and independence between different output components, the trained mapping functions can more precisely reveal the complicated nonlinear relationships between the two feature spaces.

4) Because the geometric structure information is adequately considered in measuring the similarity between image patches, the proposed global and nonlocal optimization can better suppress ringing artifacts and improve the final HR image quality.


The remainder of the paper is organized as follows. Section 2 briefly reviews the related work that is important to the proposed method. Section 3 details our formulation of and solution to the image SR problem based upon sparse coding and the improved SORM. The experimental results in Section 4 demonstrate the efficacy of the proposed method. The conclusion of this paper is drawn in Section 5.

2 Related Work

In this section, we describe two related dictionary learning methods, namely sparse coding in a single feature space and coupled sparse coding in two feature spaces for the SR problem, both of which are important to the proposed method.

2.1 Sparse Coding

For an image vector $\boldsymbol{x} \in \mathbb{R}^M$, we denote the $i$th local patch of $\boldsymbol{x}$ by $\boldsymbol{x}_i = \boldsymbol{P}_i \boldsymbol{x}$, where $\boldsymbol{P}_i$ is a matrix that extracts the patch $\boldsymbol{x}_i$ from $\boldsymbol{x}$ at spatial location $i$. The goal of sparse coding is to represent the patch $\boldsymbol{x}_i \in \mathbb{R}^d$ approximately as a weighted linear combination of a few basis atoms, often chosen from an over-complete dictionary $\boldsymbol{D}_x \in \mathbb{R}^{d \times K}$ ($d \ll K$). Compared with classical signal representations, the key advantage of sparse coding is its ability to learn such a dictionary $\boldsymbol{D}_x$ from a large number of samples. Given enough image patches to form a training set $\{\boldsymbol{x}_i\}_{i=1}^N$, the dictionary learning problem is solved by minimizing the following energy function, which combines squared reconstruction errors and $\ell_1$-sparsity penalties:

$$\min_{\boldsymbol{D}_x,\,\{\boldsymbol{\alpha}_i\}_{i=1}^N} \sum_{i=1}^N \left\{ \|\boldsymbol{x}_i - \boldsymbol{D}_x \boldsymbol{\alpha}_i\|_2^2 + \lambda \|\boldsymbol{\alpha}_i\|_1 \right\} \quad \text{s.t. } \|\boldsymbol{D}_x(:,k)\|_2 \le 1,\ \forall k \in \{1,2,\cdots,K\} \tag{1}$$

where $\boldsymbol{\alpha}_i$ is the sparse code of $\boldsymbol{x}_i$, $\boldsymbol{D}_x(:,k)$ is the $k$th column of $\boldsymbol{D}_x$, and $\lambda$ is a parameter controlling the trade-off between the sparsity penalty and representation fidelity. Since the above optimization problem is convex in either $\boldsymbol{D}_x$ or $\{\boldsymbol{\alpha}_i\}_{i=1}^N$ when the other is fixed, it can be effectively solved by many $\ell_1$-minimization techniques, such as the iterative shrinkage algorithm [7] and the split Bregman algorithm [55]. Once the dictionary $\boldsymbol{D}_x$ is trained, the sparse code for a given patch $\boldsymbol{x}_i$ can be obtained by minimizing

$$\hat{\boldsymbol{\alpha}}_i = \arg\min_{\boldsymbol{\alpha}_i} \|\boldsymbol{x}_i - \boldsymbol{D}_x \boldsymbol{\alpha}_i\|_2^2 + \lambda \|\boldsymbol{\alpha}_i\|_1 \tag{2}$$
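As a concrete illustration of Eqn. (2), the following minimal sketch solves the patch coding problem with the iterative shrinkage (ISTA) scheme mentioned above. It is our own toy implementation, not the authors' code; the dictionary, step-size choice, and parameter values are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(x, D, lam=0.01, n_iter=500):
    """Solve Eqn. (2), min_a ||x - D a||_2^2 + lam * ||a||_1, by iterative shrinkage."""
    # Step size 1/L, where L = 2 * sigma_max(D)^2 is a Lipschitz constant
    # of the gradient of the quadratic data term.
    L = 2.0 * np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ a - x)   # gradient of the squared reconstruction error
        a = soft_threshold(a - grad / L, lam / L)
    return a
```

With a column-normalized random dictionary and a patch synthesized from two atoms, the recovered code concentrates on those atoms and the reconstruction error shrinks toward the level set by $\lambda$.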

2.2 Coupled Sparse Coding

Unlike general sparse coding, coupled sparse coding considers the problem of two coupled feature spaces. The observed LR image can be seen as a degraded product of the HR image, which is generally formulated as [51]

$$\boldsymbol{y} = \boldsymbol{D}\boldsymbol{H}\boldsymbol{x} + \boldsymbol{v} \tag{3}$$

where $\boldsymbol{x}$ and $\boldsymbol{y}$ represent the original HR and observed LR images respectively, $\boldsymbol{D}$ is the down-sampling operator, $\boldsymbol{H}$ is the blurring filter, and $\boldsymbol{v}$ represents the additive noise. The recovery of the HR image $\boldsymbol{x}$ from the observation $\boldsymbol{y}$ is thus a typical coupled feature space problem. To tackle this problem, Yang et al. [53] propose to train two dictionaries $\boldsymbol{D}_y$ and $\boldsymbol{D}_x$ that tie the LR image feature space $Y$ to its corresponding HR image feature space $X$ by postulating that the sparse code of $\boldsymbol{y}_i \in Y$ in terms of $\boldsymbol{D}_y$ is the same as that of $\boldsymbol{x}_i \in X$ in terms of $\boldsymbol{D}_x$. To train the dictionaries $\boldsymbol{D}_y$ and $\boldsymbol{D}_x$,

a joint dictionary training method for coupled sparse coding is proposed by generalizing the sparse coding scheme as follows:

$$\min_{\boldsymbol{D}_x,\boldsymbol{D}_y,\{\boldsymbol{\alpha}_i\}_{i=1}^N} \sum_{i=1}^N \left\{ \|\boldsymbol{x}_i - \boldsymbol{D}_x \boldsymbol{\alpha}_i\|_2^2 + \|\boldsymbol{y}_i - \boldsymbol{D}_y \boldsymbol{\alpha}_i\|_2^2 + \lambda \|\boldsymbol{\alpha}_i\|_1 \right\} \quad \text{s.t. } \|\boldsymbol{D}_x(:,k)\|_2 \le 1,\ \|\boldsymbol{D}_y(:,k)\|_2 \le 1,\ \forall k \in \{1,2,\cdots,K\} \tag{4}$$

Grouping the two reconstruction error terms together and denoting

$$\bar{\boldsymbol{x}}_i = \begin{bmatrix} \boldsymbol{x}_i \\ \boldsymbol{y}_i \end{bmatrix}, \qquad \bar{\boldsymbol{D}} = \begin{bmatrix} \boldsymbol{D}_x \\ \boldsymbol{D}_y \end{bmatrix} \tag{5}$$

Eqn. (4) can be converted to standard sparse coding as

$$\min_{\bar{\boldsymbol{D}},\{\boldsymbol{\alpha}_i\}_{i=1}^N} \sum_{i=1}^N \left\{ \|\bar{\boldsymbol{x}}_i - \bar{\boldsymbol{D}} \boldsymbol{\alpha}_i\|_2^2 + \lambda \|\boldsymbol{\alpha}_i\|_1 \right\} \quad \text{s.t. } \|\bar{\boldsymbol{D}}(:,k)\|_2 \le 1,\ \forall k \in \{1,2,\cdots,K\} \tag{6}$$

Thus, once the dictionaries $\boldsymbol{D}_y$ and $\boldsymbol{D}_x$ are obtained by minimizing Eqn. (6), we can recover the desired HR image $\boldsymbol{x}$ from its corresponding observed LR image $\boldsymbol{y}$. However, since there are no equivalent constraints on the sparse codes of $\boldsymbol{x}$ and $\boldsymbol{y}$ as in the training phase, we can only obtain the sparse code of $\boldsymbol{y}$ with respect to $\boldsymbol{D}_y$ and use it as an approximation to the sparse code of $\boldsymbol{x}$ in terms of $\boldsymbol{D}_x$, which leads to large errors in reconstructing the HR image.

To alleviate the problems of jointly learned dictionaries, Yang et al. [52] further


propose a bilevel dictionary training method to alternately optimize the dictionaries $\boldsymbol{D}_x$ and $\boldsymbol{D}_y$. When $\boldsymbol{D}_x$ is fixed, the optimization over $\boldsymbol{D}_y$ is solved by

$$\min_{\boldsymbol{D}_y} \sum_{i=1}^N \|\boldsymbol{x}_i - \boldsymbol{D}_x \hat{\boldsymbol{z}}_i\|_2^2 \quad \text{s.t. } \hat{\boldsymbol{z}}_i = \arg\min_{\boldsymbol{z}_i} \|\boldsymbol{y}_i - \boldsymbol{D}_y \boldsymbol{z}_i\|_2^2 + \lambda \|\boldsymbol{z}_i\|_1, \quad \|\boldsymbol{D}_y(:,k)\|_2 \le 1,\ \forall k \in \{1,2,\cdots,K\} \tag{7}$$

where $\hat{\boldsymbol{z}}_i$ is the sparse code of each $\boldsymbol{y}_i \in Y$ with respect to $\boldsymbol{D}_y$. Minimizing Eqn. (7) over $\boldsymbol{D}_y$ is a complicated nonlinear and nonconvex bilevel programming problem

because the independent variable $\hat{\boldsymbol{z}}_i$ of the upper-level optimization is the optimum of a lower-level $\ell_1$-minimization. For that reason, Yang et al. [52] explore a first-order projected stochastic gradient descent algorithm to search for a feasible direction that decreases the objective value of the upper-level optimization. Although, compared with the joint dictionary training method, the bilevel dictionary training method for coupled sparse coding achieves some improvement in recovering the desired HR image owing to the establishment of a more accurate mapping between the two feature spaces, it still has the following problems:

1) Owing to the high nonlinearity and non-convexity of Eqn. (7), it is difficult to find a local optimum that noticeably decreases the objective function value.

2) During each iteration of dictionary training, an $\ell_1$-minimization problem must be solved for each $\boldsymbol{y}_i$ in the training set $\{\boldsymbol{y}_i\}_{i=1}^N$, which dramatically increases the computational burden.

Moreover, the coupled sparse coding method is still a linear model. As a result, the dictionaries $\boldsymbol{D}_x$ and $\boldsymbol{D}_y$ learned by coupled sparse coding still have difficulty capturing the complex nonlinear relationship between the two feature spaces exactly, while also increasing the computational burden.

3 Proposed method

To alleviate the above-mentioned problems, we propose a new method by

combining sparse coding with the improved SORM to reveal the complicated nonlinear relationships between the LR feature space and its corresponding HR feature space. Moreover, a new global and nonlocal optimization model is introduced into the proposed method to further improve the reconstructed result. Therefore, in this section, we first describe the improved SORM and the global and nonlocal optimization model, and then discuss how to combine them into our framework to solve the SISR problem.

3.1 Improved Structured Output Regression Machine


Multi-output regression aims to predict an output vector from an input vector by learning the relationships between the input space and the output space. Since classical multi-output regression models are usually constructed by assembling multiple single-output regressions, they ignore the correlations between the different output components, which reduces the regression accuracy [2,38]. In order to utilize the possible correlations to improve the regression accuracy, Brudnak [3] proposes a vector-valued support vector regression (SVR) model by extending the estimator, regularization, and loss function from scalar-valued to vector-valued forms. Xu et al. [48] propose a multiple-output least-squares SVR (MLS-SVR) model that takes full consideration of the circumstance that each sub-model may affect the other sub-models during the training phase. Since this kind of MLS-SVR model considers the correlations between different output components, we also call it the structured output regression machine (SORM) model [8].

Although the SORM improves the regression accuracy by utilizing the correlations, it does not give sufficient weight to the independence of different output components. Hence, we improve the SORM by fully considering both the correlation and the independence between different output components. In the following, we detail the improved SORM.

Given a training set $\{(\boldsymbol{\mu}_i, \boldsymbol{\nu}_i)\}_{i=1}^N$, where $\boldsymbol{\mu}_i \in \mathbb{R}^d$ is an input vector with dimensionality $d$ and $\boldsymbol{\nu}_i \in \mathbb{R}^m$ is an $m$-dimensional output vector, the regression function of the SORM in the primal weight space is formulated as

$$f(\boldsymbol{\mu}) = \boldsymbol{W}^T \varphi(\boldsymbol{\mu}) + \boldsymbol{b} \tag{8}$$

where $\boldsymbol{W} = (\boldsymbol{w}_1, \boldsymbol{w}_2, \cdots, \boldsymbol{w}_m) \in \mathbb{R}^{d_h \times m}$, $\boldsymbol{b} \in \mathbb{R}^m$, $\boldsymbol{\mu} \in \mathbb{R}^d$, and $\varphi(\cdot): \mathbb{R}^d \to \mathbb{R}^{d_h}$ maps the input vector to a high-dimensional feature space. According to the structural risk minimization principle, the SORM problem can be expressed as the constrained optimization problem

$$\min_{\boldsymbol{w}_i, \boldsymbol{b}, \boldsymbol{e}_k} J(\boldsymbol{w}_i, \boldsymbol{e}_k) = \frac{1}{2} \sum_{i=1}^m \boldsymbol{w}_i^T \boldsymbol{w}_i + \frac{1}{2} \sum_{i=1}^m \gamma_i \sum_{k=1}^N \left( e_k^{(i)} \right)^2 \quad \text{s.t. } \boldsymbol{\nu}_k = \boldsymbol{W}^T \varphi(\boldsymbol{\mu}_k) + \boldsymbol{b} + \boldsymbol{e}_k,\ \forall k \in \{1,2,\cdots,N\} \tag{9}$$

(๐‘–)

where ๐’†๐‘˜ โˆˆ ๐‘… ๐‘š is the error variable, ๐’†๐‘˜ is the ๐‘–th element of the vector ๐’†๐‘˜ , and ๐›พ๐‘–

M

(โˆ€๐‘– โˆˆ *1,2, โ‹ฏ , ๐‘š+, โˆ€๐›พ๐‘– > 0) are the regularization parameters. In order to consider the

ED

correlations, Xu et al [48] assume that all ๐’˜๐‘– โˆˆ ๐‘… ๐‘‘โ„Ž (โˆ€๐‘– โˆˆ *1,2, โ‹ฏ , ๐‘š+) can be written as ๐’˜๐‘– = ๐’˜0 + ๐’˜ ฬƒ ๐‘– , where the vectors ๐’˜ ฬƒ ๐‘– are small when the different outputs are

PT

similar to each other, otherwise the mean vectors ๐’˜0 are small. Accordingly, the Eqn. (9) can be rewritten as follow: 2 1 1 1 ๐‘š (๐‘–) ๐‘‡ ๐‘ โˆ‘ โˆ‘ ๐ฝ(๐’˜0 , ๐’˜ ฬƒ ๐‘– , ๐’†๐‘˜ ) = 2 ๐’˜๐‘‡0 ๐’˜0 + 2 โˆ‘๐‘š ๐œ‰ ๐’˜ ฬƒ ๐’˜ ฬƒ + ๐›พ .๐’† / ๐‘–=1 ๐‘– ๐‘– ๐‘– 2 ๐‘–=1 ๐‘– ๐‘˜=1 ๐‘˜

CE min

๐‘ . ๐‘ก ๐‚๐‘˜ = ๐‘พ๐‘‡ ๐œ‘(๐๐‘˜ ) + ๐’ƒ + ๐’†๐‘˜ , โˆ€๐‘˜ โˆˆ *1,2 โ‹ฏ ๐‘+

(10)

AC

๐’˜0 ,๐’˜ ฬƒ ๐‘– ,๐’ƒ,๐’†๐‘˜

where ๐œ‰๐‘– (โˆ€๐‘– โˆˆ *1,2, โ‹ฏ , ๐‘š+, โˆ€๐œ‰๐‘– > 0) is the regularization parameter that control the weight of correlation. Since the improved SORM can optimize the parameters ๐œ‰๐‘– and ๐›พ๐‘– for each out component, the weight of correlations can be tuned for more regression accuracy. Actually, it preserves the independence between different output components. The Lagrangian function for Eqn. (10) is 15

$$L(\boldsymbol{w}_i, \boldsymbol{b}, \boldsymbol{e}_k, \boldsymbol{\alpha}_k) = J(\boldsymbol{w}_0, \tilde{\boldsymbol{w}}_i, \boldsymbol{e}_k) - \sum_{k=1}^N \boldsymbol{\alpha}_k^T \left( \boldsymbol{W}^T \varphi(\boldsymbol{\mu}_k) + \boldsymbol{b} + \boldsymbol{e}_k - \boldsymbol{\nu}_k \right) \tag{11}$$

where $\boldsymbol{\alpha}_k \in \mathbb{R}^m$ ($\forall k \in \{1,2,\cdots,N\}$). According to the KKT conditions, the partial derivatives of Eqn. (11) yield

$$\begin{cases} \dfrac{\partial L}{\partial \boldsymbol{w}_0} = 0 \;\to\; \boldsymbol{w}_0 = \sum_{k=1}^N \left( \sum_{i=1}^m \alpha_k^{(i)} \right) \varphi(\boldsymbol{\mu}_k) \\[2mm] \dfrac{\partial L}{\partial \tilde{\boldsymbol{w}}_i} = 0 \;\to\; \tilde{\boldsymbol{w}}_i = \dfrac{1}{\xi_i} \sum_{k=1}^N \alpha_k^{(i)} \varphi(\boldsymbol{\mu}_k) \\[2mm] \dfrac{\partial L}{\partial \boldsymbol{b}} = 0 \;\to\; \sum_{k=1}^N \boldsymbol{\alpha}_k = \boldsymbol{0}_m \\[2mm] \dfrac{\partial L}{\partial \boldsymbol{e}_k} = 0 \;\to\; \boldsymbol{\alpha}_k = \boldsymbol{\Upsilon} \boldsymbol{e}_k \\[2mm] \dfrac{\partial L}{\partial \boldsymbol{\alpha}_k} = 0 \;\to\; \boldsymbol{W}^T \varphi(\boldsymbol{\mu}_k) + \boldsymbol{b} + \boldsymbol{e}_k - \boldsymbol{\nu}_k = \boldsymbol{0}_m \end{cases} \tag{12}$$

where ๐‘˜ = 1,2, โ‹ฏ ๐‘, ๐‘– = 1,2, โ‹ฏ ๐‘š, ๐šผ = diag(๐›พ1 , โ‹ฏ , ๐›พ๐‘š ) โˆˆ ๐‘… ๐‘šร—๐‘š , ๐‘ฌ๐‘š is the identity (๐‘–)

matrix of dimension ๐‘š ร— ๐‘š , and ๐œถ๐‘˜

is the ๐‘– th element of the vector ๐œถ๐‘˜ . By

M

eliminating the variable ๐‘พ and ๐’†๐‘˜ , we then obtain the following KKT system

[๐ŸŽ๐‘šร—๐‘š ๐‘ช

๐‘ช๐‘‡ ] 0 ๐’ƒ 1 = 0๐ŸŽ๐’Ž 1 ๐‚ ๐œด+๐‘ซ ๐œถ

(13)

PT

where

ED

equation.

CE

1 1 ๐‘ช = blockdiag {[ โ‹ฎ ] , โ‹ฏ , [ โ‹ฎ ]} โˆˆ ๐‘…๐‘๐‘™ร—๐‘š , ๐‘๐‘™ = ๐‘š ร— ๐‘; 1 1

AC

๐œด = diag(1โ„๐œ‰1 , โ‹ฏ , 1โ„๐œ‰๐‘š ) โŠ— ๐‘ฒ + ๐‘ฌ๐‘š โŠ— ๐‘ฒ๐‘‡ โˆˆ ๐‘…๐‘๐‘™ร—๐‘๐‘™ ; ๐‘‡

๐‘ฒ = (๐œ‘(๐1 ), โ‹ฏ , ๐œ‘(๐๐‘ )) (๐œ‘(๐1 ), โ‹ฏ , ๐œ‘(๐๐‘ )) โˆˆ ๐‘… ๐‘ร—๐‘ ; ๐‘ซ = ๐‘ฌ๐‘ โŠ— ๐šผ โˆ’1 โˆˆ ๐‘…๐‘๐‘™ร—๐‘๐‘™ ; ๐œถ = (๐œถ1๐‘‡ , โ‹ฏ , ๐œถ๐‘‡๐‘ )๐‘‡ โˆˆ ๐‘…๐‘๐‘™ ; ๐‚ = (๐‚1๐‘‡ , โ‹ฏ , ๐‚๐‘‡๐‘ )๐‘‡ โˆˆ ๐‘…๐‘๐‘™ .

where โŠ— is the Kronecker product. Thus, the matrix ๐‘ฒ is composed of the kernel 16

function $\boldsymbol{K}(\boldsymbol{\mu}_{k_i}, \boldsymbol{\mu}_{k_j}) = \varphi(\boldsymbol{\mu}_{k_i})^T \varphi(\boldsymbol{\mu}_{k_j})$ ($\forall k_i, k_j \in \{1,2,\cdots,N\}$). A variety of kernel functions is available for different problems, including the linear kernel, the polynomial kernel, the radial basis function (RBF) kernel, and so on. Since our problem is a complicated nonlinear one, the following RBF kernel function is applied in the improved model:

$$\boldsymbol{K}(\boldsymbol{\mu}_{k_i}, \boldsymbol{\mu}_{k_j}) = \varphi(\boldsymbol{\mu}_{k_i})^T \varphi(\boldsymbol{\mu}_{k_j}) = \exp\left( -\|\boldsymbol{\mu}_{k_i} - \boldsymbol{\mu}_{k_j}\|_2^2 / \sigma_i^2 \right), \quad i = 1,\cdots,m \tag{14}$$

where $\sigma_i$ is the scale parameter related to the bandwidth of the kernel; it is well established in statistics that the bandwidth is an important parameter governing the generalization behavior of a kernel method. Since the KKT system in Eqn. (13) consists of $(N+1) \times m$ linear equations and the RBF is a positive definite kernel, it can be reliably solved by many algorithms developed in numerical analysis, including Conjugate Gradient (CG), Successive Over-Relaxation (SOR), Generalized Minimal Residual (GMRES), etc. Once $\boldsymbol{\alpha} = (\boldsymbol{\alpha}_1^T, \cdots, \boldsymbol{\alpha}_N^T)^T$ and $\boldsymbol{b}$ are obtained by solving

Eqn. (13), the predictive model of the improved SORM can be expressed as ๐‘“(๐) = ๐‘พ๐‘‡ ๐œ‘(๐) + ๐’ƒ = (๐’˜1 , โ‹ฏ , ๐’˜๐‘š )๐‘‡ ๐œ‘(๐) + ๐’ƒ

PT

๐‘‡ (1) (๐‘š) ๐‘ .โˆ‘๐‘ ๐‘˜=1 ๐œถ๐‘˜ ๐œ‘(๐๐‘˜ ) , โ‹ฏ , โˆ‘๐‘˜=1 ๐œถ๐‘˜ ๐œ‘(๐๐‘˜ )/ ๐œ‘(๐)

=

(1)

(๐‘š)

(15) +๐’ƒ ๐‘‡

CE

๐‘ ๐‘‡ ๐‘‡ = .โˆ‘๐‘ ๐‘˜=1 ๐œถ๐‘˜ ๐œ‘(๐) ๐œ‘(๐๐‘˜ ) , โ‹ฏ , โˆ‘๐‘˜=1 ๐œถ๐‘˜ ๐œ‘(๐) ๐œ‘(๐๐‘˜ )/ + ๐’ƒ

AC

Unlike the classical SORM, the improved SORM not only considers the correlation among different output components but also the independence by tuning the parameters *๐œ‰๐‘– , ๐›พ๐‘– , ๐œŽ๐‘– + for each output components. Therefore, the improved SORM is superior to the classical one. In this paper, since the sparse code has a strong sparsity that only a few of components is nonzero, which implies a high similarity between different components, and independence that each component represents a very different 17

ACCEPTED MANUSCRIPT Y. Tang et al. /Information Sciences elementary structure, the improved SORM is more suitable for solving our problem. 3.2 Global and Nonlocal Optimization Because of the patch-based reconstruction and other factors, the estimated HR image ๐’™0 may not satisfy the Eqn. (3) exactly and may contain ringing artifacts.

CR IP T

Inspired by [26,53], the following global constraint is introduced into the proposed method as the regularization term to enforce the estimated HR image ๐’™0 to satisfy the image degradation model

๐’™โˆ— = minโ€–๐‘ซ๐‘ฏ๐’™ โˆ’ ๐’šโ€–22 + ฮดโ€–๐’™ โˆ’ ๐’™0 โ€–22

(16)

AN US

๐‘ฅ

where δ is the regularization parameter and 𝑫𝑯 is the linear operator that simulates the degradation process of blurring and down-sampling. At the same time, in order to eliminate the ringing artifacts caused by noise and other factors, we exploit nonlocal self-similarity, a useful prior for various image restoration tasks. Nonlocal self-similarity presumes that a small patch in a natural image tends to repeat itself many times and can be approximated by a convex combination of its similar image patches [14]. Mathematically, the constraint is given by

𝒙_i = ∑_j ψ_ij 𝒙_i^j    (17)

where 𝒙_i is the image patch at the i-th position of the image, 𝒙_i^j is the j-th similar patch of 𝒙_i within a search window, and ψ_ij denotes the similarity weight, which can be computed by

ψ_ij = (1 ⁄ c_i) exp( −‖𝒙_i − 𝒙_i^j‖₂² ⁄ h )    (18)

where c_i = ∑_j ψ_ij is a normalization factor and h is the attenuator that controls the extent of averaging. However, since this weight measures similarity using only the Euclidean distance between image patches, it is not sensitive enough to image patches with fine texture structure. In order to overcome this drawback, a new similarity weight is proposed that utilizes both the spatial and the intensity information of the pixels. The proposed similarity weight is calculated by the following formula:

ψ_ij = (1 ⁄ c_i) exp( −‖𝒙_i − 𝒙_i^j‖₂² ⁄ h₁ − |D_{𝒙_i}(p_i, c_i) − D_{𝒙_i^j}(p_j, c_j)| ⁄ h₂ )    (19)

where h₁ and h₂ are the attenuators, D_{𝒙_i}(p_i, c_i) is the geodesic distance between a pixel p_i of the image patch 𝒙_i and its center c_i, and D_{𝒙_i^j}(p_j, c_j) is the geodesic distance computed on the similar patch 𝒙_i^j with the same pixel and path as for 𝒙_i. D_{𝒙_i}(p_i, c_i) is defined by the shortest path that connects p_i and c_i:

D_{𝒙_i}(p_i, c_i) = min_{p ∈ P_{p_i, c_i}} d(p)    (20)

where P_{p_i, c_i} is the set of all paths connecting the pixel p_i and the center c_i. A path p is defined as a sequence of spatially neighboring points in the 8-connected neighborhood that connects the pixel p_i and the center c_i. Let the sequence be p = {p_i^1, p_i^2, ⋯, p_i^{n_i}}; then d(p) is computed by

d(p) = ∑_{k=2}^{n_i} |𝒙_i(p_i^k) − 𝒙_i(p_i^{k−1})|    (21)
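As an illustrative sketch (not the authors' implementation), the path cost d(p) of Eqn. (21) can be minimized over all 8-connected paths, as required by Eqn. (20), with Dijkstra's algorithm on the pixel grid of a small patch; the function name and interface below are hypothetical.

```python
import heapq
import numpy as np

def geodesic_distance(patch, start, center):
    """Geodesic distance of Eqn. (20): the minimum of the path cost d(p)
    in Eqn. (21) over 8-connected paths from `start` to `center`,
    found with Dijkstra's algorithm.  `patch` is a 2-D intensity array;
    `start` and `center` are (row, col) tuples."""
    h, w = patch.shape
    dist = np.full((h, w), np.inf)
    dist[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == center:
            return d                      # shortest path cost reached
        if d > dist[r, c]:
            continue                      # stale heap entry
        for dr in (-1, 0, 1):             # 8-connected neighborhood
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    # step cost = absolute intensity difference, as in Eqn. (21)
                    nd = d + abs(float(patch[r, c]) - float(patch[rr, cc]))
                    if nd < dist[rr, cc]:
                        dist[rr, cc] = nd
                        heapq.heappush(heap, (nd, (rr, cc)))
    return dist[center]
```

On a patch this small the search is exact; for full-image use the fast approximation of [17] cited above would be the practical choice.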

Intuitively, if the difference of geodesic distances between the image patches 𝒙_i and 𝒙_i^j is small, the two patches have more similar texture or geometric structure, and a larger similarity weight should be given to the patch 𝒙_i^j. Although computing the geodesic distance for all pixels in an image patch is an NP-hard problem, a fast approximation algorithm was reported in [17]. Moreover, for each image patch we only need to compute the geodesic distance of one randomly chosen pixel, and because the same pixel and path are reused, the geodesic distances of its similar patches are easy to compute. Therefore, the new similarity weight maintains a low computational cost. By incorporating the nonlocal self-similarity constraint into Eqn. (16), the following global and nonlocal optimization is obtained:

๐’™โˆ— = minโ€–๐‘ซ๐‘ฏ๐’™ โˆ’ ๐’šโ€–22 + ฮดโ€–๐’™ โˆ’ ๐’™0 โ€–22 + ฮทโ€–๐’™ โˆ’ ๐๐’™โ€–22 ๐’™

(22)

computed by ๐(๐‘–, ๐‘—) = {๐œ“๐‘–๐‘—

AN US

where ฮท is the regularization parameter, ๐ is the similarity weight matrix and can be

๐‘—

if ๐’™๐‘– is the similar image patch to ๐’™๐‘– 0 otherwise

(23)

M
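For concreteness, the weight computation of Eqns. (18)–(19) for a single reference patch can be sketched as below. This is a hypothetical helper, not the authors' code; the geodesic quantities of Eqn. (20) are assumed to be precomputed (e.g., by the approximation of [17]), and passing none of them recovers the classical weight of Eqn. (18).

```python
import numpy as np

def similarity_weights(ref, candidates, h1, geo_ref=0.0, geo_cands=None, h2=1.0):
    """Normalized similarity weights psi_ij of Eqn. (19).
    ref:        flattened reference patch x_i
    candidates: array of flattened candidate patches x_i^j (one per row)
    geo_ref, geo_cands: geodesic distances D(p, c) for the reference and
                        for each candidate; with geo_cands=None the geodesic
                        term vanishes and Eqn. (18) is recovered."""
    diff = np.sum((candidates - ref) ** 2, axis=1)     # ||x_i - x_i^j||_2^2
    if geo_cands is None:
        geo = np.zeros(len(candidates))
    else:
        geo = np.abs(geo_ref - np.asarray(geo_cands))  # |D_xi - D_xij|
    w = np.exp(-diff / h1 - geo / h2)
    return w / w.sum()                                 # 1/c_i normalization
```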

3.3 Algorithm of the Proposed Method for SISR

In Sections 3.1 and 3.2, we introduced the improved SORM and the global and nonlocal optimization model. Now we discuss how to incorporate them into the proposed method to solve the image SR problem. As shown in Fig. 1, the proposed SISR method is composed of two parts: the learning phase and the HR image reconstruction phase.

(1) Learning phase: Firstly, we extract image patches from the training LR images {𝒚} and their corresponding HR images {𝒙} to form a coupled feature space {(𝒚_i, 𝒙_i)}_{i=1}^N. Secondly, the coupled feature space is partitioned into K sub-feature spaces {(𝒚_i^1, 𝒙_i^1)}_{i=1}^{N_1}, {(𝒚_i^2, 𝒙_i^2)}_{i=1}^{N_2}, …, {(𝒚_i^K, 𝒙_i^K)}_{i=1}^{N_K} by k-means with the Euclidean distance. For each sub-feature space {(𝒚_i^k, 𝒙_i^k)}_{i=1}^{N_k}, ∀k ∈ {1,2,⋯,K}, a compact LR–HR sub-dictionary pair {𝑫_L^k, 𝑫_H^k} is trained via joint sparse coding, and then the sub-feature space is transformed into the sparse coding space to form a sparse code set {(𝛎_i^k, 𝛓_i^k)}_{i=1}^{N_k} via the basic sparse coding scheme with the jointly learned sub-dictionary pair {𝑫_L^k, 𝑫_H^k}. Finally, the improved SORM is utilized to establish the following mapping functions to reveal the relationships between 𝛎_i^k and 𝛓_i^k, ∀i ∈ {1,2,⋯,N_k}:

f(𝛎^k) = (𝑾^k)^T φ(𝛎^k) + 𝒃^k,  ∀k ∈ {1,2,⋯,K}    (24)

where 𝑾^k and 𝒃^k are the parameters of the mapping functions and can be obtained by solving the KKT system in Eqn. (13). Accordingly, the trained sub-dictionary pairs and the corresponding mapping functions make the following energy equation reach a smaller value with low time and space complexity, which means that the proposed method can establish a more precise mapping to reveal the relationships between the two feature spaces:

min_{f(𝛎^k), 𝑫_L^k, 𝑫_H^k}  ∑_{i=1}^{N_k} ‖𝒙_i^k − 𝑫_H^k 𝛓_i^k‖₂²
s.t.  𝛓_i^k = (𝑾^k)^T φ(𝛎_i^k) + 𝒃^k                                  (25)
      𝛎_i^k = arg min_{𝒛_i^k} ‖𝒚_i^k − 𝑫_L^k 𝒛_i^k‖₂² + λ‖𝒛_i^k‖₁
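The learning phase above can be sketched in miniature as follows. This is a toy stand-in, not the paper's pipeline: a plain Lloyd k-means partitions the coupled features, and an ordinary least-squares regressor per cluster takes the place of joint sparse coding plus the improved SORM of Eqn. (24); all names are illustrative.

```python
import numpy as np

def learn_clustered_mappings(Y, X, K, iters=10, seed=0):
    """Partition coupled pairs (y_i, x_i) into K clusters by k-means on the
    LR features Y, then fit one affine mapping (W^k, b^k) per cluster."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=K, replace=False)].copy()
    for _ in range(iters):                              # Lloyd's k-means
        d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = Y[labels == k].mean(axis=0)
    maps = {}
    for k in range(K):                                  # per-cluster mapping
        if not np.any(labels == k):
            continue                                    # skip empty clusters
        A = np.hstack([Y[labels == k], np.ones((np.sum(labels == k), 1))])
        maps[k], *_ = np.linalg.lstsq(A, X[labels == k], rcond=None)
    return centers, maps

def predict(y, centers, maps):
    """Map an LR feature through the mapping of its matched (nearest) cluster."""
    ks = sorted(maps)
    k = ks[int(np.argmin([((centers[k] - y) ** 2).sum() for k in ks]))]
    return np.append(y, 1.0) @ maps[k]
```

The real method replaces the least-squares fit with the kernelized per-component regression of Eqns. (13)–(15), but the cluster-then-map control flow is the same.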

(2) HR image reconstruction phase: Firstly, the input LR image 𝒚 is divided into many overlapped image patches {𝒚_i}_{i=1}^{N_t}. Secondly, for each image patch 𝒚_i (∀i ∈ {1,2,⋯,N_t}), we obtain its sparse code 𝛎_i via the basic sparse coding scheme with a matched LR sub-dictionary chosen from the K trained LR sub-dictionaries, and then the sparse code 𝛎_i is mapped to the sparse code 𝒗_i of the desired HR image patch by the corresponding learned mapping function. Then, the initial HR image 𝒙_0 is reconstructed by

𝒙_0 = ( ∑_{i=1}^{N_t} 𝑷_i^T 𝑷_i )⁻¹ ∑_{i=1}^{N_t} 𝑷_i^T 𝑫_h^k 𝒗_i    (26)

where 𝑫_h^k (∀k ∈ {1,2,⋯,K}) is the HR sub-dictionary corresponding to 𝒗_i and 𝑷_i is the image patch extracting matrix. Finally, the initial HR image 𝒙_0 is refined by the global and nonlocal optimization of Eqn. (22). Thus the final HR image 𝒙* can be obtained by iterating

𝒙^{n+1} = 𝒙^n − τ[(𝑫𝑯)^T(𝑫𝑯𝒙^n − 𝒚) + δ(𝒙^n − 𝒙_0) + η(𝑬 − 𝚿)^T(𝑬 − 𝚿)𝒙^n]    (27)

where n is the iteration number and τ is the step size.

The entire procedure of the proposed method is summarized in Algorithm 1.

Algorithm 1: the algorithm of the proposed method for SISR
Input: the test LR image 𝒚; the dictionary pairs and mapping functions {𝑫_L^k, 𝑫_H^k, f(𝛎^k)}, ∀k ∈ {1,2,⋯,K}; the scaling factor s; the size of LR patch p × p; and the initial regularization and step parameters λ, δ, η, τ.
Output: the final HR image 𝒙*.
1. Upscale 𝒚 to 𝒚* using Bicubic interpolation with a factor of s
2. Extract overlapping patches {𝒚_i} from the image 𝒚*
3. For each LR image patch 𝒚_i do
4.   Compute the sparse code 𝛎_i using Eqn. (2) with a matched LR sub-dictionary
5.   Map 𝛎_i to 𝒗_i using Eqn. (24)
6.   Generate the HR image patch with 𝒙_i = 𝑫_H^k 𝒗_i
7. End for
8. Obtain the initial HR image 𝒙_0 by fusing all the HR patches using Eqn. (26)
9. For n ≤ MaxIter and ‖𝒙^{n+1} − 𝒙^n‖₂² > ε do
10.  If mod(n, ) = 0, update the similarity weight matrix 𝚿 using Eqn. (19)
11.  Update 𝒙^{n+1} using Eqn. (27)
12. End for

4 Experimental Results and Discussion

To validate the effectiveness and robustness of the proposed method, SR

experiments for different scaling factors (×2, ×3 and ×4) are performed on all the images in several datasets (Set5, Set14, Set17, and B100) [14,42]. Some of the test images are shown in Fig. 2. Five representative learning-based SR methods are used as comparison baselines, including NE [4], SCSR [53], NCSR [10], A+ [41], and SRCNN [9]. In order to evaluate the quality of the reconstructed HR images objectively, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are adopted as evaluation indexes in our experiments. All of the compared results are reproduced by models retrained with the codes downloaded from the authors' websites, under the same blurring and down-sampling conditions as in our experiments.

4.1 Experimental Settings

Since the human visual system is more sensitive to the luminance component of an image than to the chrominance ones, we transform images from the RGB color space to the YCbCr color space and only perform SR reconstruction on the luminance component; the other components are simply magnified to the desired size via the Bicubic interpolation algorithm. To mimic the real imaging process, all the test HR images are first blurred by a 5×5 Gaussian kernel with standard deviation 1.2 and down-sampled by a decimation factor to produce noiseless LR images for the noiseless experiments. In addition, Gaussian noise is added to the noiseless LR images for the noisy experiments. For a fair comparison, we also select 91 HR images, similar to [53], as training HR images; the corresponding training LR images are generated under the same blurring and down-sampling conditions as the test LR images. Some training images are shown in Fig. 3.
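The degradation pipeline just described (5×5 Gaussian blur with σ = 1.2, then decimation) and the PSNR index can be sketched as follows. This is a simplified stand-in rather than the authors' exact code: it uses zero padding at the borders, and the function names are illustrative.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.2):
    """Normalized 2-D Gaussian kernel (default: the 5x5, sigma=1.2 blur above)."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def degrade(hr, scale=3, size=5, sigma=1.2):
    """Blur an HR image and decimate by `scale` to simulate the LR image."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    p = np.pad(hr, pad)                       # simple zero padding at borders
    blurred = np.zeros(hr.shape, dtype=float)
    for r in range(size):                     # direct 2-D convolution
        for c in range(size):
            blurred += k[r, c] * p[r:r + hr.shape[0], c:c + hr.shape[1]]
    return blurred[::scale, ::scale]          # decimation

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    mse = np.mean((ref.astype(float) - est.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```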

4.2 Parameters Selection and Optimization

In our experiments, the basic parameters of the proposed method are empirically set as follows. For the training phase, the number of clusters used to partition the coupled feature space is set to K = 256. According to [53], we set the sparsity parameter λ, the size of each sub-dictionary, and the iteration number for dictionary training to 0.15, 128, and 40, respectively, when training the coupled sub-dictionary pairs by joint sparse coding. Considering both computation and reconstruction quality, the number of training image patches used for learning the sub-dictionaries and mapping functions is 1,000,000. Since the RBF kernel function is adopted in the improved SORM, when we train the regression functions in the sparse coding space we must determine the parameter pairs {γ_i, σ_i, ξ_i}, ∀i ∈ {1,2,⋯,m}, for each output component. Because these parameter pairs have an important effect on the mapping accuracy of the regression functions, an improved optimization algorithm is utilized to optimize them for each component of the output variable, following Suykens et al. [38]'s research on structural risk minimization. The complete optimization process goes as follows: first, Coupled Simulated Annealing (CSA) determines suitable starting points for each parameter pair in the range exp(−10) to exp(10); then these starting points are given to the optimization routines to produce an evaluated grid over the parameter space; finally, the optimized parameter pairs are obtained by picking the minimum in the grid.

In the reconstruction phase, the parameters for the global and nonlocal optimization are set based on the data in [14]. The number of similar image patches in each cluster is set to 85, and the window width for searching the similar image patches is set to 50. For the noiseless experiments, the sparsity penalty and regularization parameters λ, δ, h₁, h₂, and η are set to 0.10, 0.02, 3, 3, and 0.05, respectively. Similarly, λ, δ, h₁, h₂, and η are experimentally set to 0.15, 0.01, 8, 8, and 0.53 for noisy image reconstruction. The iteration parameters of the optimization will be discussed in Section 4.4. Correspondingly, all the empirical parameters used in our experiments are also shown in Table 1.

4.3 Effectiveness of the Improved SORM

AN US

In this section, we design two groups of experiment on the noiseless and noisy test images to evaluate the effectiveness of the improved SORM. One experiment is the proposed method without the improved SORM named as N_SORM, thus reconstructed

M

model will degenerate into the classical sparse representation model, and the other one is the proposed method. The reconstructed results on noiseless and noisy Baboon image

ED

by the proposed and N-SORM method are showed in Fig. 4-5. It is obvious that the

PT

proposed method outperforms N_SORM method in preserving edges, reconstructing visual details, and suppressing noise. This is mainly because the improved SORM

CE

forces the sparse code of the reconstructed images to approximate the sparse code of the

AC

original HR images.

Moreover, we report the PSNR and SSIM values of the reconstructed HR images

by the proposed and N-SORM method for the noiseless and noisy experiments in Table 2 and Table 3. From the tables, the average PSNR and SSIM gains of the reconstructed noiseless results by the proposed method over the N_SORM method are 1.16 dB and 0.0244, respectively. Correspondingly, the average PSNR and SSIM gains over the 25

N_SORM method are 1.53 dB and 0.0722, respectively, in the noisy experiments. In view of the above, combining sparse coding with the improved SORM can significantly enhance the performance of sparse coding SR methods.

4.4 Effectiveness of Global and Nonlocal Optimization

In this section, three groups of experiments are carried out on all the images in Set17 to illustrate the effectiveness of the proposed global and nonlocal optimization: the proposed method without the global and nonlocal optimization (N_GNO), with the global and nonlocal optimization using the classical similarity weight (N_GNC), and with the global and nonlocal optimization using the new similarity weight. As shown in Fig. 6 and Fig. 7, the Parrots images reconstructed by the proposed method contain fewer ringing artifacts and sharper edges. Correspondingly, we also show the average PSNR and SSIM values of all the restored HR images for the three groups of experiments in Table 4. It is clear that the proposed method outperforms the N_GNO and N_GNC methods. Concretely, the average PSNR gains of the proposed method over the N_GNO and N_GNC methods are 0.16 dB and 0.04 dB, respectively, in the noiseless experiments, and 0.39 dB and 0.03 dB, respectively, in the noisy experiments. Simultaneously, we also report the optimization process in Fig. 8 to show the performance of the refinement and the influence of the number of iterations of Eqn. (27). From Fig. 8, we can observe that the average PSNR over all the reconstructed HR images in Set5 tends to increase with the number of iterations. In addition, since nonlocal self-similarity is also widely applied in the denoising area, the proposed optimization achieves more

promising performance in recovering HR images from noisy images. The above evaluation validates that the proposed global and nonlocal optimization can further improve the quality of the reconstructed HR images.

4.5 Experiments on Noiseless Images

In order to verify the effectiveness of the proposed method, we run the proposed and compared methods on the noiseless test images. Table 5 shows the PSNR and SSIM values of the reconstructed HR images. From Table 5, we can observe that the NE method always gives the worst performance. Since the SCSR and A+ methods exploit more prior knowledge from examples, they obtain better results than the NE method. SRCNN obtains higher PSNR and SSIM values than the SCSR and NE methods because it establishes more complex relationships between samples. The NCSR method achieves more promising performance than the above-mentioned methods owing to its exploration of local and nonlocal self-similarity. Correspondingly, the proposed method significantly outperforms the compared methods on all the test images. The average PSNR and SSIM gains over the NCSR method are 0.59 dB and 0.0126, respectively. At the same time, we perform an unequal-variances T-test [31] between the proposed method and each of the compared methods to evaluate the significance of the average PSNR and SSIM gains. The T-statistics of PSNR and SSIM in Table 5 verify that the proposed method significantly improves the quality of the reconstructed HR images, since most of the test statistics between the proposed and compared methods are obviously away from 1. To further evaluate the SR performance of the proposed method, the scaling

factor ×3 results for the Butterfly, Boats, and Barbara images are shown in Figs. 9–11, respectively, for visual quality comparison. For further comparison, we also show a local magnification of the red rectangle region segmented from each reconstructed HR image. From Figs. 9–11, we can observe that the HR images reconstructed by the NE method contain a large number of jaggy artifacts and blurred edges. Although the SCSR method performs better than the NE method, its reconstructed HR images show noticeable artifacts around the edges. The SRCNN and A+ methods perform better than the NE method in suppressing ringing artifacts, but they always give low performance for blurred images. By exploiting the local and nonlocal self-similarity in natural images, the NCSR method is effective in suppressing ringing and jaggy artifacts, but it always blurs small-scale and sharp edge structures. Compared with the baseline methods, the reconstructed results of the proposed method show both sharp edges and finer details, as can be observed from the black edges and white enclosed areas of the reconstructed Butterfly image in Fig. 9(g). Similar observations can be made around the edge regions in Figs. 10 and 11. This is mainly because the proposed method can more accurately reveal the intrinsic relationship between the two feature spaces. Hence, our method is more capable of estimating the desired HR image.

4.6 Experiments on Noisy Images

Since the input LR images are always corrupted by noise in real applications, it is necessary to verify the effects of image noise on the proposed method. For a fair comparison, the proposed and compared methods are all trained under the noiseless condition. In addition, we add additive Gaussian noise with different levels of standard

deviation (σ_n = 4, 6, 8) to the LR input images. Figs. 12–14 show the HR images reconstructed by the proposed and compared methods at the different noise levels. As shown in Figs. 12–14, the NE method generates serious jaggy artifacts along edges, caused by the noise. The SCSR and A+ methods suppress these artifacts better, but watercolor-like artifacts appear in their reconstructed HR images. Similarly, there are many unpleasing blurred textural details and ringing effects in the HR images reconstructed by SRCNN. It is obvious that the NCSR method performs better than the other compared methods in suppressing noise and recovering the missing high-frequency information; however, its reconstructed results always suffer from blurring of small-scale image structures and over-smoothing, especially at high noise levels. Compared with the baselines, the proposed method shows a more convincing performance in suppressing noise, preserving edges, and reconstructing visual details at the different noise levels.

Moreover, we show the PSNR and SSIM values of the HR images reconstructed by the proposed and compared methods at different noise levels in Table 6. From the table, the PSNR and SSIM values of all methods tend to decrease as the noise level becomes larger. However, compared with the other methods, our method still achieves the best reconstruction quality at every noise level. Similar to the noiseless experiments, we also compute the T-statistics between the proposed method and the compared methods. These statistics at the different noise levels indicate that although the quality of the HR images reconstructed by the proposed method declines markedly as the noise level increases, the average PSNR and SSIM gains remain significant compared with the other methods. These results indicate that even if the proposed

model is trained under the noiseless condition, it still performs impressively in recovering the HR image from a given noisy LR image, which means that the proposed method has better noise robustness than the compared methods.

4.7 Benchmarks

To further verify the effectiveness of the proposed method, additional statistical experiments with the scaling factors ×2, ×3 and ×4 are performed on all the images in Set5, Set14 and B100. Tables 7 and 8 show the PSNR and SSIM values of the results reconstructed by the proposed and compared methods in the noiseless and noisy experiments. From Tables 7 and 8, we can see that the proposed method achieves consistent performance on Set5, Set14, and B100. Since the proposed and compared methods are learning-based and the training data are limited, the performance of all methods tends to decline as the number of test examples increases. However, the proposed method still performs better than all the compared methods in both PSNR and SSIM.

In addition, considering that most of the compared methods (NE, SCSR, A+ and SRCNN) reported their performance on LR images generated by the bicubic downsampling process in their original publications, we also execute the same experiment to obtain a fairer comparison. Table 9 shows the average PSNR and SSIM values on the datasets Set5, Set14, and B100 with the scale factors ×2, ×3 and ×4. We can observe that the proposed method achieves more impressive performance than the compared methods, as before. The average PSNR gains of the proposed method over SRCNN for the scale factor ×3 on Set5, Set14, and B100 are

CR IP T

need to be considered owing to the limitation of computing resource. In this section, we discuss and show the computational complexity of the proposed algorithm in terms of both training and reconstructing phase. For fair comparison, the proposed and compared

AN US

methods are conducted in MATLAB 2014a platform on a desktop PC with 64-win10 operating system, 2.9 GHz Intel-Core CPU, and 12 GB memory. In the training phase, the time complexity of the proposed method is higher than the SCSR method because

M

we have to train the mapping functions after the learning of the couple dictionary pairs. However, since the training of mapping functions is to solve the linear systems of

ED

equations in Eqn. (23) offline, the training time of mapping functions is at a low level.

PT

However, the total training time of the proposed method is about 3.43 times of SCSR because we train a set of joint dictionary pairs and mapping functions (K=256). At the

CE

same time, we also compare the CPU running time and the PSNR values between the

AC

proposed method and the compared methods to evaluate the practicability of the proposed algorithm. The comparisons of 3ร— SR magnification on โ€œBarbaraโ€ image are shown in Fig. 15. Since the RBF kernel function and the improved nonlocal regularization term are adopted during the SR process, the proposed method takes about 171 seconds to recover a 255 ร— 255 HR image. As shown in Fig. 15, although the proposed method costs more time than the A+, SRCNN, SCSR, and NE methods and 31

proposed method and the compared methods to evaluate the practicability of the proposed algorithm. The comparisons for 3× SR magnification on the "Barbara" image are shown in Fig. 15. Since the RBF kernel function and the improved nonlocal regularization term are adopted during the SR process, the proposed method takes about 171 seconds to recover a 255 × 255 HR image. As shown in Fig. 15, although the proposed method costs more time than the A+, SRCNN, SCSR, and NE methods and is only slightly faster than the NCSR method, its reconstructed results are the best.

5 Conclusion

In this paper, we propose a new SISR method that combines sparse coding with the improved SORM. In the proposed method, we train a set of joint dictionary pairs and mapping functions. Since the trained dictionary pairs well characterize the structures of the LR and HR feature spaces, and the learned mapping functions precisely map the sparse code of the LR feature space to the corresponding sparse code of the HR feature space, the proposed method establishes a more accurate model to reveal the intrinsic relationship between the two feature spaces and notably improves the quality of the reconstructed HR images. Furthermore, to alleviate the ringing artifacts caused by the patch-wise reconstruction and other factors such as noise, we propose a new global and nonlocal optimization that further improves the HR image quality by exploiting the spatial and intensity information of nonlocal self-similar image patches. The experimental results on the test images demonstrate that the proposed method achieves performance that is competitive with many state-of-the-art SISR methods.

ACCEPTED MANUSCRIPT Y. Tang et al. /Information Sciences

Acknowledgements This work was supported by Key Projects of the National Science and Technology Program, China (Grant No. 2013GS500303), the Key Science and Technology Projects of CSTC, China (Grant Nos. CSTC2012GG-YYJSB40001, CSTC2013-JCSF40009),

CR IP T

and the National Natural Science Foundation of China (Grant No. 61105093).

The authors would like to thank the editors and reviewers for their valuable comments and suggestions.

AN US

Author Contributions

This manuscript was performed in collaboration between the authors. Weiguo

M

Gong is the corresponding author of this research work. Yongliang proposed a new SISR method based on the sparse coding and improved SORM. Qiane Yi proposed the

ED

global and nonlocal optimization model. Weihong Li was involved in the writing and

CE

manuscript.

PT

argumentation of the manuscript. All authors discussed and approved the final

AC

References
[1] N. Akhtar, F. Shafait, A. Mian, Bayesian sparse representation for hyperspectral image super resolution, in: IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 3631–3640.
[2] X. An, S. Xu, L. Zhang, S. Su, Multiple dependent variables LS-SVM regression algorithm and its application in NIR spectral quantitative analysis, Spectrosc. Spect. Anal. 29 (01) (2009)


127–130.
[3] M. Brudnak, Vector-valued support vector regression, in: Int. Joint Conf. Neural Netw., 2006, pp. 1562–1569.
[4] H. Chang, D.-Y. Yeung, Y. Xiong, Super-resolution through neighbor embedding, in: IEEE Conf.


Comput. Vis. Pattern Recognit., 2004, pp. 275-282. [5] X. Chen, C. Qi, Nonlinear neighbor embedding for single image super-resolution via kernel mapping, Signal Process. 94 (1) (2014) 6-22.


[6] S. Dai, M. Han, W. Xu, Y. Wu, Y. Gong, Soft edge smoothness prior for alpha channel super resolution, in: IEEE Conf. Comput. Vis. Pattern Recognit., 2007, pp. 1โ€“8. [7] I. Daubechies, M. Defriese, C. DeMol, An iterative thresholding algorithm for linear inverse


problems with a sparsity constraint, Communications on Pure and Applied Mathematics 57(11) (2004) 1413-1457.


[8] C. Deng, J. Xu, K. Zhang, Similarity constraints-based structured output regression


machine: an approach to image super-resolution, IEEE Transactions on Neural Networks & Learning Systems 27 (12) (2015) 2472-2485.


[9] C. Dong, C. Loy, X. Tang, Image super-resolution using deep convolutional networks, IEEE


Trans. Pattern Anal. Mach. Intell. 38 (2) (2016) 295-307. [10] W. Dong, L. Zhang, G. Shi, X. Li, Nonlocally centralized sparse representation for image restoration, IEEE Trans. Image Process. 22 (4) (2013) 1620โ€“1630.

[11] R. Fattal, Image up-sampling via imposed edge statistics, ACM Transactions on Graphics 26 (03) (2007) 095-103. [12] W.-T. Freeman, T.-R. Jones, E.-C. Pasztor, Example-based super-resolution, IEEE Comput.


Graph. Appl. 22 (2) (2002) 56–65.
[13] X. Gao, K. Zhang, X. Li, D. Tao, Image super-resolution with sparse neighbor embedding, IEEE Trans. Image Process. 21 (7) (2012) 3194–3205.
[14] W. Gong, L. Hu, J. Li, W. Li, Combining sparse representation and local rank constraint for


single image super resolution, Information Sciences 325 (2015) 1-19. [15] L. He, H. Qi, R. Zaretzki, Beta process joint dictionary learning for coupled feature spaces with application to single image super-resolution, in: IEEE Conf. Comput. Vis. Pattern


Recognit., 2013, pp. 345-352.

[16] H. He, W.-C. Siu, Single image super-resolution using Gaussian process regression, in: IEEE Conf. Comput. Vis. Pattern Recognit., 2011, pp. 449โ€“456.


[17] A. Hosni, M. Bleyer, M. Gelautz, C. Rhemann, Local stereo matching using geodesic support weights, in: IEEE International Conference on Image Processing, 2009, pp. 2093โ€“2096.

ED

[18] M. Irani, S. Peleg, Improving resolution by image registration, CVGIP Graphical Models &

PT

Image Processing 53 (03) (1991) 231-239. [19] K. Jiwon, J.-K. Lee, K.-M. Lee. Accurate image super-resolution using very deep

CE

convolutional networks, in: IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1646-1654.

AC

[20] K. Jiwon, J.-K. Lee, K.-M. Lee. Deeply-recursive convolutional network for image super-resolution, in: IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp.1637-1645.

[21] R. Keys, Cubic convolution interpolation for digital image processing, IEEE Trans. Acoust., Speech, Signal Process. 29 (06) (1981) 1153-1160. [22] K. I. Kim, Y. Kwon, Single-image super-resolution using sparse regression and natural image prior, IEEE Trans. Pattern Anal. Mach. Intell. 32 (06) (2010) 1127-1132.

35

ACCEPTED MANUSCRIPT Y. Tang et al. /Information Sciences [23] S. Kindermann, S. Osher, P. W. Jones, Deblurring and denoising of images by nonlocal functionals, Multiscale Model. Simul. 04 (04) (2005) 1091-1115. [24] R.-O. Lane, Non-parametric Bayesian super-resolution, IET radar, sonar & navigation, 04 (04) (2010) 639-648.

super-resolution, Information Sciences 298 (2015) 257-273.

CR IP T

[25] J. Li, W. Gong, W. Li, Dual-sparsity regularized sparse representation for single image

[26] W. Liu, S. Li, Sparse representation with morphologic regularizations for single image

AN US

super-resolution, Signal Process. 98 (2014) 410โ€“422.

[27] J. Mairal, F. Bach, J. Ponce, G. Sapiro, A. Zisserman, Non-local sparse models for image restoration, in: IEEE Int. Conf. Comput. Vis., 2009, pp. 2272โ€“2279.

M

[28] S. Peled, Y. Yeshurun, Super-resolution in MRI: application to human white matter fiber tract visualization by diffusion tensor imaging, Magnetic Resonance in Medicine Official Journal of

ED

the Society of Magnetic Resonance in Medicine 45 (2001) 29โ€“35.

PT

[29] G. Polatkan, M. Zhou, L. Carin, D. Blei, I. Daubechies, A Bayesian Nonparametric Approach to Image Super-Resolution, IEEE Trans. Pattern Anal. Mach. Intell. 37 (02) (2015) 346-358.

CE

[30] M. Protter, M. Elad, H. Takeda, P. Milanfar, Generalizing the nonlocal-means to

AC

super-resolution reconstruction, IEEE Trans. Image Process. 18 (01) (2009) 36-51. [31] G.-D. Ruxton, Unequal variance t -test is an underused alternative to student's t -test and the mannโ€“whitney u test, Behavioral Ecology. 17(4) (2006) 688-690.

[32] R.-R. Schultz, R.-L. Stevenson, Extraction of high-resolution frames from video sequences, IEEE Trans. Image Process. 05 (06) (1996) 996-1011. [33] K. Simonyan, A. Zisserman. Very deep convolutional networks for large-scale image

36

ACCEPTED MANUSCRIPT Y. Tang et al. /Information Sciences recognition, in: International Conference on Learning Representations, 2015. [34] J. Su, Z. Xu, H. Shum, Gradient pro๏ฌle prior and its applications in image super-resolution and enhancement, IEEE Transactions on Image Process. 20 (06) (2011) 1529-42 [35] J. Sun, Z. Xu, H.-Y. Shum, Image super-resolution using gradient profile prior, in: IEEE Conf.

CR IP T

Comput. Vis. Pattern Recognit., 2008, pp. 1โ€“8. [36] J. Sun, J. Zhu, M. F. Tappen, Context-constrained hallucination for image super-resolution, in: IEEE Conf. Comput. Vis. Pattern Recognit., 2010, pp.231โ€“238.

AN US

[37] J. Sun, N.-N. Zheng, H. Tao, H.-Y. Shum, Image hallucination with primal sketch priors, in: IEEE Conf. Comput. Vis. Pattern Recognit., 2003, pp. 729โ€“736.

[38] J. Suykens, J. Vandewalle, Least squares support vector machine classifiers, Neural Process.

M

Lett. 09 (03) (1999) 293โ€“300.

[39] Y.-W. Tai, S. Liu, M. S. Brown, S. Lin, Super resolution using edge prior and single image

ED

detail synthesis, in: IEEE Conf. Comput. Vis. Pattern Recognit., 2010, pp. 2400โ€“2407.

PT

[40] H. Takeda, S. Farsiu, P. Milanfar, Kernel regression for image processing and reconstruction, IEEE Trans. Image Process. 16 (02) (2007) 349-366.

CE

[41] R. Timofte, V. De, L. Gool, A+: Adjusted anchored neighborhood regression for fast

AC

super-resolution, in: 12th Asian Conf. Comput. Vis., 2014, pp. 111โ€“126. [42] R. Timofte, V. De, L. Gool, Anchored neighborhood regression for fast example-based super-resolution, in: IEEE Conf. Comput. Vis. Pattern Recognit., 2013, pp. 1920โ€“1927.

[43] R. Timofte, R. Rothe, L. Gool, Seven ways to improve example-based single image super resolution, in: IEEE Conf. Comput. Vis. Pattern Recognit., 2016:1865-1873. [44] R.-Y. Tsai, T.-S. Huang, Multi-frame image restoration and registration, Advances in

37

ACCEPTED MANUSCRIPT Y. Tang et al. /Information Sciences Computer Vision & Image Process. 01 (1984) 317-339. [45] M. Unser, A. Aldroubi, M. Eden, Fast B-spline transforms for continuous image representation and interpolation, IEEE Trans. Pattern Anal. Mach. Intell. 13 (03) (1991) 227-285. [46] L. Wang, S. Xiang, G. Meng, H. Wu, C. Pan, Edge-directed single-image super-resolution via

Video Technology 23 (08) (2013) 1289-1299.

CR IP T

adaptive gradient magnitude self-interpolation, IEEE Transactions on Circuits & Systems for

[47] S. Wang, D. Zhang, Y. Liang, Q. Pan, Semi-coupled dictionary learning with applications to

Recognit., 2012, pp. 2216โ€“2223.

AN US

image super-resolution and photo-sketch synthesis, in: IEEE Conf. Comput. Vis. Pattern

[48] S. Xu, X. An, X. Qiao, L. Zhu, Multi-task least-squares support vector machines, Multimedia

M

Tools Appl. 71 (02) (2014) 699-715.

[49] H. Xu, G. Zhai, X. Yang, Single image super-resolution with detail enhancement based on

PT

1740-1754.

ED

local fractal analysis of gradient, IEEE Trans. Circuits Syst. Video Technol. 23 (10) (2013)

[50] W. Yang, Y. Tian, F. Zhou, Consistent coding scheme for single image super-resolution via

CE

independent dictionaries, IEEE Transactions on Multimedia 18 (03) (2016) 313-325.

AC

[51] S. Yang, M. Wang, Y. Chen, Y. Sun, Single-image super-resolution reconstruction via learned geometric dictionaries and clustered sparse coding, IEEE Trans. Image Process. 21 (09) (2012) 4016โ€“4028.

[52] J. Yang, Z. Wang, Z. Lin, S. Cohen, Thomas Huang, Couple dictionary training for image super-resolution, IEEE Trans. Image Process. 21 (08) (2012) 3467-3487. [53] J. Yang, J. Wright, T.-S. Huang, Y. Ma, Image super-resolution via sparse representation, IEEE

38

ACCEPTED MANUSCRIPT Y. Tang et al. /Information Sciences Trans. Image Process. 19 (11) (2010) 2861-2873. [54] R. Zeyde, M. Elad, M. Protter, on single image scale-up using sparse-representations, in: 7th Int. Conf. Curves Surf., 2010, pp. 711โ€“730. [55] X. Zhang, M. Burger, X. Bresson, S. Osher, Bregmanized nonlocal regularization for

CR IP T

deconvolution and sparse reconstruction, SIAM J. Imag. Sci.. 03 (03) (2010) 253โ€“276. [56] K. Zhang, X. Gao, X. Li, D. Tao, Partially supervised neighbor embedding for example-based image super-resolution, IEEE J. Sel. Topics Signal Process. 05 (02) (2011) 230-239.

AN US

[57] K. Zhang, X. Gao, D. Tao, and X. Li, Image super-resolution via nonlocal steering kernel regression regularization, in: 20th IEEE Conf. Image Process, 2013, pp. 943โ€“946. [58] K. Zhang, D. Tao, X. Gao, X. Li, Z. Xiong, Learning multiple linear mappings for efficient

M

single image super-resolution, IEEE Trans. Image Process. 24 (03) (2015) 846-861. [59] L. Zhang, X. Wu, An edge-guided image interpolation algorithm via directional filtering and

ED

data fusion, IEEE Trans. Image Process. 15 (08) (2006) 2226-2238.

PT

[60] L. Zhang, H. Zhang, H. Shen, P. Li, A super-resolution reconstruction algorithm for surveillance images, Signal Process. 90 (3) (2010) 848โ€“859.

CE

[61] L-W. Zou, P.-C. Yuen, Very low resolution face recognition problem, IEEE Trans. Image

AC

Process. 21 (1) (2012) 327โ€“340.

39

Vitae

Yongliang Tang received the B.E. and M.S. degrees from Sichuan University of Science & Engineering, China, in 2010 and 2013, respectively. Currently, he is a Ph.D. candidate in the College of Opto-Electronic Engineering, Chongqing University. His research interests include machine learning and image processing.

Weiguo Gong received his doctoral degree in computer science from the Tokyo Institute of Technology, Japan, in March 1996 as a scholarship recipient of the Japanese Government. From April 1996 to March 2002, he served as a researcher or senior researcher in the NEC Labs of Japan. He is now a professor at Chongqing University, China. He has published over 120 research papers in international journals and conferences and two books as an author or co-author. His current research interests are in the areas of pattern recognition and image processing.

Qiane Yi is an M.S. candidate majoring in Control Technology and Instrument in the College of Opto-Electronic Engineering, Chongqing University. Her research interests are in information acquiring and image processing.

Weihong Li received her doctoral degree from Chongqing University in 2006. She is now a professor at Chongqing University, China. Her current research interests are in the areas of pattern recognition and image processing.

Table captions:

Table 1 Empirical parameters for the learning and reconstruction phases of the proposed method.
Table 2 PSNR (dB) and SSIM results of reconstructed HR images by the proposed and N_SORM methods (scaling factor s = 3, noise level σ = 0).
Table 3 PSNR (dB) and SSIM results of reconstructed HR images by the proposed and N_SORM methods (scaling factor s = 3, noise level σ = 6).
Table 4 Average PSNR (dB) and SSIM results of reconstructed HR images by the proposed, N_GNO and N_GNC methods (scaling factor s = 3).
Table 5 PSNR (dB) and SSIM results of reconstructed HR images by the proposed and baseline methods (scaling factor s = 3, noise level σ = 0).
Table 6 PSNR (dB) and SSIM results of reconstructed HR images by the proposed and baseline methods (scaling factor s = 3, noise level σ = 4, 6, 8).
Table 7 Average PSNR (dB) and SSIM for scaling factors ×2, ×3 and ×4 on the Set5, Set14, and B100 datasets among different methods (noise level σn = 0).
Table 8 Average PSNR (dB) and SSIM for scaling factors ×2, ×3 and ×4 on the Set5, Set14, and B100 datasets among different methods (noise level σn = 6).
Table 9 Average PSNR (dB) and SSIM for scaling factors ×2, ×3 and ×4 on the Set5, Set14, and B100 datasets with the bicubic downsampling process (noise level σn = 0).

Figure captions:

Fig.1 Framework of the proposed method.
Fig.2 Some of the test images, from left to right and top to bottom: Butterfly, Parrots, Bike, Boats, Flower, Hat, Leaves, Lighthouse, Plants, and Raccoon.
Fig.3 Some of the training images.
Fig.4 Comparison of reconstructed results on the noiseless Baboon image by the N_SORM and proposed methods (scaling factor s = 3, noise level σ = 0). (a) LR image. (b) N_SORM. (c) Proposed method. (d) Original image.
Fig.5 Comparison of reconstructed results on the noisy Baboon image by the N_SORM and proposed methods (scaling factor s = 3, noise level σ = 6). (a) LR image. (b) N_SORM. (c) Proposed method. (d) Original image.
Fig.6 Comparison of reconstructed results on the noiseless Starfish image by the N_GNO, N_GNC and proposed methods (scaling factor s = 3, noise level σ = 0). (a) LR image. (b) N_GNO. (c) N_GNC. (d) Proposed method. (e) Original image.
Fig.7 Comparison of reconstructed results on the noisy Starfish image by the N_GNO, N_GNC and proposed methods (scaling factor s = 3, noise level σ = 6). (a) LR image. (b) N_GNO. (c) N_GNC. (d) Proposed method. (e) Original image.
Fig.8 Performance and influence of the number of iterations in the global and nonlocal optimization. (a) Average PSNR over the iterations for scaling factor ×3 on the noiseless images in Set5 (noise level σn = 0). (b) Average PSNR over the iterations for scaling factor ×3 on the noisy images in Set5 (noise level σn = 6).
Fig.9 Comparison of reconstructed results on the noiseless Butterfly image by the baseline and proposed methods (scaling factor s = 3, noise level σ = 0). (a) LR image. (b) NE [4]. (c) SCSR [53]. (d) NCSR [10]. (e) A+ [41]. (f) SRCNN [9]. (g) Proposed method. (h) Original image.
Fig.10 Comparison of reconstructed results on the noiseless Boats image by the baseline and proposed methods (scaling factor s = 3, noise level σ = 0). (a) LR image. (b) NE [4]. (c) SCSR [53]. (d) NCSR [10]. (e) A+ [41]. (f) SRCNN [9]. (g) Proposed method. (h) Original image.
Fig.11 Comparison of reconstructed results on the noiseless Barbara image by the baseline and proposed methods (scaling factor s = 3, noise level σ = 0). (a) LR image. (b) NE [4]. (c) SCSR [53]. (d) NCSR [10]. (e) A+ [41]. (f) SRCNN [9]. (g) Proposed method. (h) Original image.
Fig.12 Comparison of reconstructed results on the noisy Butterfly image by the baseline and proposed methods (scaling factor s = 3, noise level σ = 4). (a) LR image. (b) NE [4]. (c) SCSR [53]. (d) NCSR [10]. (e) A+ [41]. (f) SRCNN [9]. (g) Proposed method. (h) Original image.
Fig.13 Comparison of reconstructed results on the noisy Leaves image by the baseline and proposed methods (scaling factor s = 3, noise level σ = 6). (a) LR image. (b) NE [4]. (c) SCSR [53]. (d) NCSR [10]. (e) A+ [41]. (f) SRCNN [9]. (g) Proposed method. (h) Original image.
Fig.14 Comparison of reconstructed results on the noisy Hat image by the baseline and proposed methods (scaling factor s = 3, noise level σ = 8). (a) LR image. (b) NE [4]. (c) SCSR [53]. (d) NCSR [10]. (e) A+ [41]. (f) SRCNN [9]. (g) Proposed method. (h) Original image.
Fig.15 Speed versus PSNR for the proposed and compared methods. The proposed method provides the best quality among the compared state-of-the-art learning-based SR methods while preserving the time complexity of NCSR.

[Fig.1 graphic: flowchart of the proposed pipeline — preprocessing (bicubic interpolation and RGB-to-YCbCr conversion of the LR input); model training (dictionary learning, min_D Σ_i { ||x_i − Dα_i||₂² + λ||α_i||₁ }, and SORM training); feature mapping (sparse coding û_i = argmin ||y_i − D_y u_i||₂² + λ||u_i||₁, SORM mapping v_i = Wᵀφ(u_i) + b, and reconstruction x_i = D_x v_i); global and nonlocal optimization combining the global constraint ||DHx − y||₂² with the terms δ||x − x⁰||₂² and ||x − ψx||₂² from the nonlocal self-similarity, iterated until n > N_m or ||xⁿ⁺¹ − xⁿ||₂² < ε; postprocessing (YCbCr-to-RGB conversion of the HR output).]

Fig.1 Framework of the proposed method.

Fig.2 Some of the test images, from left to right and top to bottom: Butterfly, Parrots, Bike, Boats, Flower, Hat, Leaves, Lighthouse, Plants, and Raccoon.

Fig.3 Some of the training images.

Fig.4 Comparison of reconstructed results on the noiseless Baboon image by the N_SORM and proposed methods (scaling factor s = 3, noise level σ = 0). (a) LR image. (b) N_SORM. (c) Proposed method. (d) Original image.

Fig.5 Comparison of reconstructed results on the noisy Baboon image by the N_SORM and proposed methods (scaling factor s = 3, noise level σ = 6). (a) LR image. (b) N_SORM. (c) Proposed method. (d) Original image.

Fig.6 Comparison of reconstructed results on the noiseless Starfish image by the N_GNO, N_GNC and proposed methods (scaling factor s = 3, noise level σ = 0). (a) LR image. (b) N_GNO. (c) N_GNC. (d) Proposed method. (e) Original image.

Fig.7 Comparison of reconstructed results on the noisy Starfish image by the N_GNO, N_GNC and proposed methods (scaling factor s = 3, noise level σ = 6). (a) LR image. (b) N_GNO. (c) N_GNC. (d) Proposed method. (e) Original image.

[Fig.8 graphic: two curves of average PSNR (dB) versus the number of iterations (0–180); panel (a) spans roughly 33.25–33.55 dB and panel (b) roughly 31.30–31.80 dB.]

Fig.8 Performance and influence of the number of iterations in the global and nonlocal optimization. (a) Average PSNR over the iterations for scaling factor ×3 on the noiseless images in Set5 (noise level σn = 0). (b) Average PSNR over the iterations for scaling factor ×3 on the noisy images in Set5 (noise level σn = 6).

Fig.9 Comparison of reconstructed results on the noiseless Butterfly image by the baseline and proposed methods (scaling factor s = 3, noise level σ = 0). (a) LR image. (b) NE [4]. (c) SCSR [53]. (d) NCSR [10]. (e) A+ [41]. (f) SRCNN [9]. (g) Proposed method. (h) Original image.

Fig.10 Comparison of reconstructed results on the noiseless Boats image by the baseline and proposed methods (scaling factor s = 3, noise level σ = 0). (a) LR image. (b) NE [4]. (c) SCSR [53]. (d) NCSR [10]. (e) A+ [41]. (f) SRCNN [9]. (g) Proposed method. (h) Original image.

Fig.11 Comparison of reconstructed results on the noiseless Barbara image by the baseline and proposed methods (scaling factor s = 3, noise level σ = 0). (a) LR image. (b) NE [4]. (c) SCSR [53]. (d) NCSR [10]. (e) A+ [41]. (f) SRCNN [9]. (g) Proposed method. (h) Original image.

Fig.12 Comparison of reconstructed results on the noisy Butterfly image by the baseline and proposed methods (scaling factor s = 3, noise level σ = 4). (a) LR image. (b) NE [4]. (c) SCSR [53]. (d) NCSR [10]. (e) A+ [41]. (f) SRCNN [9]. (g) Proposed method. (h) Original image.

Fig.13 Comparison of reconstructed results on the noisy Leaves image by the baseline and proposed methods (scaling factor s = 3, noise level σ = 6). (a) LR image. (b) NE [4]. (c) SCSR [53]. (d) NCSR [10]. (e) A+ [41]. (f) SRCNN [9]. (g) Proposed method. (h) Original image.

Fig.14 Comparison of reconstructed results on the noisy Hat image by the baseline and proposed methods (scaling factor s = 3, noise level σ = 8). (a) LR image. (b) NE [4]. (c) SCSR [53]. (d) NCSR [10]. (e) A+ [41]. (f) SRCNN [9]. (g) Proposed method. (h) Original image.

Fig.15 Speed versus PSNR for the proposed and compared methods. The proposed method provides the best quality among the compared state-of-the-art learning-based SR methods while preserving the time complexity of NCSR.

Table 1 Empirical parameters for the learning and reconstruction phases of the proposed method.

Phase      Parameters
Training   K = 256, λ = 0.15, t_e = 40, patch size = 5×5, dict. size = 128, sample size = 1,000,000
Testing    Noise         δ      λ      –      h1   h2   τ      N_p   l    M    t_e
           σn = 0        0.10   0.02   0.05   3    3    0.12   85    50   40   160
           σn = 4,6,8    0.20   0.01   0.53   8    8    0.08   85    50   40   160

Table 2 PSNR (dB) and SSIM results of reconstructed HR images by the proposed and N_SORM methods (scaling factor s = 3, noise level σ = 0).

Images    N_SORM            Proposed
          PSNR    SSIM      PSNR    SSIM
Plants    32.94   0.9101    34.13   0.9254
Baboon    24.86   0.6245    25.33   0.6561
Bike      24.08   0.7806    25.31   0.8274
Leaf      39.97   0.9522    40.62   0.9564
Hat       30.73   0.8730    31.85   0.8910
Leaves    26.11   0.8954    27.90   0.9254
Avg.      29.78   0.8392    30.86   0.8636

Table 3 PSNR (dB) and SSIM results of reconstructed HR images by the proposed and N_SORM methods (scaling factor s = 3, noise level σ = 6).

Images    N_SORM            Proposed
          PSNR    SSIM      PSNR    SSIM
Plants    30.44   0.7806    31.86   0.8630
Baboon    24.36   0.5764    24.56   0.5782
Bike      23.65   0.7261    24.16   0.7619
Leaf      33.14   0.7755    36.79   0.9068
Hat       29.10   0.7309    30.39   0.8404
Leaves    25.46   0.8409    27.25   0.9124
Avg.      27.69   0.7383    29.17   0.8105

Table 4 Average PSNR (dB) and SSIM results of reconstructed HR images by the proposed, N_GNO and N_GNC methods (scaling factor s = 3).

Noise     N_GNO             N_GNC             Proposed
          PSNR    SSIM      PSNR    SSIM      PSNR    SSIM
σn = 0    30.32   0.8670    30.44   0.8685    30.48   0.8688
σn = 6    28.65   0.8038    29.01   0.8125    29.04   0.8130

Table 5 PSNR (dB) and SSIM results of reconstructed HR images by the proposed and compared methods (scaling factor s = 3, noise level σ = 0). A T-test value closer to 0 indicates that the average PSNR and SSIM gains of the proposed method over the compared method are more significant, while a value closer to 1 indicates that the difference is less pronounced.

Images      Bicubic          NE               SCSR             NCSR             A+               SRCNN            Proposed
            PSNR    SSIM     PSNR    SSIM     PSNR    SSIM     PSNR    SSIM     PSNR    SSIM     PSNR    SSIM     PSNR    SSIM
Starfish    26.38   0.7958   26.86   0.8047   27.80   0.8464   28.87   0.8726   28.08   0.8524   28.31   0.8577   29.51   0.8864
Parrots     27.36   0.8697   27.97   0.8787   29.19   0.9024   29.94   0.9137   29.56   0.9067   29.73   0.9120   30.70   0.9194
Barbara     27.58   0.7975   27.87   0.8080   28.71   0.8399   29.30   0.8590   28.99   0.8475   29.22   0.8496   29.67   0.8686
Flower      26.98   0.7725   27.42   0.7866   28.52   0.8321   29.14   0.8541   28.81   0.8408   28.95   0.8420   29.70   0.8679
Butterfly   23.51   0.8134   24.80   0.8511   26.05   0.8892   28.04   0.9248   26.67   0.9037   26.75   0.9043   28.35   0.9286
Hat         28.94   0.8314   29.69   0.8447   30.62   0.8693   31.31   0.8833   30.89   0.8741   30.94   0.8771   31.85   0.8910
Plants      30.60   0.8586   31.21   0.8692   32.58   0.9032   33.66   0.9203   33.05   0.9122   33.07   0.9145   34.13   0.9254
Bike        22.47   0.6842   23.11   0.7238   23.95   0.7712   24.55   0.8010   24.35   0.7845   24.38   0.7854   25.31   0.8274
House       30.55   0.8510   30.86   0.8511   32.49   0.8766   33.53   0.8890   32.75   0.8804   32.91   0.8906   34.49   0.8981
Lena        29.48   0.8272   30.10   0.8368   31.08   0.8642   31.92   0.8811   31.48   0.8719   31.59   0.8779   32.32   0.8885
Raccoon     27.63   0.6992   27.70   0.7081   28.24   0.7408   28.57   0.7533   28.30   0.7395   28.47   0.7499   28.97   0.7699
Lighthouse  24.03   0.7191   24.27   0.7291   24.61   0.7538   24.89   0.7697   27.72   0.7582   24.96   0.7602   25.47   0.7874
Boats       23.76   0.7291   23.97   0.7238   25.04   0.8005   25.35   0.8224   25.30   0.8129   25.39   0.8247   26.60   0.8578
Girl        32.81   0.8130   32.95   0.8155   33.88   0.8429   34.18   0.8511   34.02   0.8472   34.30   0.8522   34.41   0.8548
Avg.        27.51   0.7850   27.76   0.8043   28.77   0.8380   29.52   0.8568   29.07   0.8450   29.21   0.8506   30.11   0.8694
T-Test      0.0226  0.0008   0.0511  0.0030   0.2649  0.1134   0.6239  0.5184   0.3848  0.2176   0.4536  0.3295   1.000   1.000

Table 6 PSNR (dB) and SSIM results of reconstructed HR images by the proposed and baseline methods (scaling factor s = 3, noise level σ = 4, 6, 8).

Noise σ = 4
Images      Bicubic          NE               SCSR             NCSR             A+               SRCNN            Proposed
            PSNR    SSIM     PSNR    SSIM     PSNR    SSIM     PSNR    SSIM     PSNR    SSIM     PSNR    SSIM     PSNR    SSIM
Starfish    26.28   0.7860   26.47   0.7858   27.73   0.8177   28.40   0.8487   27.72   0.8270   27.85   0.8298   29.22   0.8631
Parrots     27.24   0.8501   27.74   0.8402   28.70   0.8467   29.77   0.8919   29.08   0.8572   29.19   0.8586   30.57   0.8941
Boats       23.71   0.7163   23.85   0.7272   24.83   0.7605   25.17   0.8017   25.10   0.7772   25.17   0.7858   26.59   0.8350
House       30.29   0.8318   30.28   0.8114   31.48   0.8196   32.76   0.8675   31.79   0.8257   31.88   0.8293   34.10   0.8792
Butterfly   23.46   0.8019   24.61   0.8278   25.77   0.8552   27.56   0.9099   26.41   0.8750   26.46   0.8774   27.89   0.9103
Lighthouse  23.97   0.7033   24.18   0.7010   24.43   0.7085   24.74   0.7485   24.54   0.7167   24.76   0.7217   25.41   0.7588
Avg.        25.83   0.7816   26.19   0.7822   27.16   0.8001   28.07   0.8477   27.44   0.8131   27.55   0.8171   28.96   0.8568
T-Test      0.0910  0.0468   0.1208  0.0424   0.3078  0.1142   0.6220  0.7240   0.3874  0.2087   0.4206  0.2429   1.000   1.000

Noise σ = 6
Starfish    26.16   0.7743   26.01   0.7442   27.01   0.7872   27.94   0.8298   27.30   0.7989   27.34   0.8037   28.33   0.8364
Parrots     27.09   0.8283   27.19   0.7684   28.15   0.8751   29.47   0.8751   28.52   0.8063   28.59   0.8085   29.87   0.8768
Boats       23.64   0.7013   23.59   0.6665   24.58   0.7219   25.05   0.7874   24.86   0.7415   24.92   0.7435   25.96   0.8086
House       29.98   0.8095   29.35   0.7331   30.47   0.7629   32.06   0.8507   30.79   0.7759   30.85   0.7797   33.36   0.8659
Butterfly   23.39   0.7891   24.15   0.7766   25.46   0.8224   27.32   0.8999   26.10   0.8456   26.12   0.8466   27.71   0.9040
Lighthouse  23.89   0.6854   23.70   0.6333   24.21   0.6636   24.57   0.7301   24.34   0.6746   24.53   0.6774   25.10   0.7389
Avg.        25.69   0.7647   25.67   0.7202   26.65   0.7722   27.74   0.8288   26.99   0.7738   27.06   0.7766   28.39   0.8384
T-Test      0.1247  0.0543   0.1081  0.0057   0.2906  0.1198   0.7032  0.7887   0.3913  0.0880   0.4139  0.0999   1.000   1.000

Noise σ = 8
Starfish    25.99   0.7590   25.81   0.7344   26.48   0.7510   27.45   0.8088   26.78   0.7650   26.61   0.7544   27.60   0.8126
Parrots     26.88   0.8003   26.88   0.7488   27.50   0.7319   29.11   0.8551   27.85   0.7490   27.87   0.7497   29.25   0.8598
Boats       23.55   0.6822   23.49   0.6624   24.26   0.6784   24.86   0.7694   24.56   0.7005   24.58   0.6989   25.45   0.7848
House       29.58   0.7814   28.93   0.7203   29.39   0.7026   31.14   0.8295   29.71   0.7180   29.73   0.7156   32.66   0.8515
Butterfly   23.30   0.7729   24.16   0.7738   25.05   0.7850   26.90   0.8857   25.71   0.8110   25.69   0.7877   27.22   0.8912
Lighthouse  23.79   0.6628   23.79   0.6268   23.92   0.6149   24.49   0.7122   24.07   0.6281   24.13   0.6286   24.86   0.7204
Avg.        25.52   0.7431   25.51   0.7111   26.10   0.7106   27.33   0.8101   26.45   0.7286   26.44   0.7225   27.87   0.8201
T-Test      0.1612  0.0478   0.1408  0.0093   0.2577  0.0108   0.7470  0.7865   0.3598  0.0286   0.3552  0.0164   1.000   1.000

Table 7 Average PSNR (dB) and SSIM for scaling factors ×2, ×3 and ×4 on the Set5, Set14, and B100 datasets among different methods (noise level σn = 0). For each method, the first row reports PSNR and the second row reports SSIM.

Methods       Set5                         Set14                        B100
              ×2       ×3       ×4         ×2       ×3       ×4         ×2       ×3       ×4
Bicubic       30.67    29.19    27.51      27.69    26.46    25.42      27.56    26.52    25.33
              0.8790   0.8390   0.7926     0.7897   0.7350   0.6846     0.7554   0.6990   0.6480
NE            30.99    29.87    27.54      28.50    27.24    25.52      28.17    26.85    25.46
              0.9002   0.8555   0.7927     0.8275   0.7604   0.6883     0.7997   0.7250   0.6556
SCSR          -        30.13    -          -        27.10    -          -        28.13    -
              -        0.8733   -          -        0.7780   -          -        0.7485   -
NCSR          36.01    32.82    30.19      32.13    29.17    27.16      31.01    28.22    26.57
              0.9431   0.9132   0.8664     0.8969   0.8217   0.7522     0.8768   0.7844   0.7090
A+            34.67    32.15    29.49      30.93    28.82    26.67      29.98    28.05    26.36
              0.9371   0.9027   0.8454     0.8796   0.8100   0.7394     0.8549   0.7735   0.7021
SRCNN         34.78    32.32    29.60      30.96    29.10    26.75      30.12    28.12    26.39
              0.9387   0.9028   0.8549     0.8821   0.8258   0.7496     0.8564   0.7845   0.6960
Proposed      36.68    33.51    30.89      32.61    29.96    27.71      31.79    28.94    27.15
              0.9573   0.9267   0.8796     0.9103   0.8332   0.7623     0.8893   0.7954   0.7194

Table 8 Average PSNR (dB) and SSIM for scaling factors ×2, ×3 and ×4 on the Set5, Set14, and B100 datasets among different methods (noise level σn = 6). For each method, the first row reports PSNR and the second row reports SSIM.

Methods       Set5                         Set14                        B100
              ×2       ×3       ×4         ×2       ×3       ×4         ×2       ×3       ×4
Bicubic       30.12    28.97    27.42      27.30    26.56    25.87      27.18    26.22    25.40
              0.8375   0.8062   0.7677     0.7530   0.7066   0.6626     0.7199   0.6711   0.6267
NE            30.04    28.72    27.06      27.13    26.47    25.80      26.94    26.16    25.13
              0.7775   0.7808   0.7562     0.7181   0.6932   0.6571     0.6937   0.6597   0.6248
SCSR          -        28.71    -          -        27.05    -          -        26.27    -
              -        0.7839   -          -        0.6980   -          -        0.6715   -
NCSR          32.94    31.34    29.51      30.01    28.32    26.76      28.95    27.48    26.18
              0.8947   0.8723   0.8395     0.8256   0.7761   0.7228     0.7913   0.7324   0.6774
A+            30.72    30.29    27.83      28.52    27.56    26.10      27.37    27.01    25.81
              0.8349   0.8207   0.7947     0.7569   0.7369   0.6957     0.7147   0.7016   0.6588
SRCNN         30.68    30.06    28.41      27.77    27.68    25.95      27.31    26.89    25.56
              0.8264   0.8198   0.8017     0.7482   0.7219   0.6929     0.7030   0.6881   0.6509
Proposed      33.43    31.79    29.89      30.41    28.66    27.19      29.32    27.81    26.49
              0.9058   0.8826   0.8492     0.8356   0.7851   0.7322     0.8007   0.7409   0.6851

Table 9 Average PSNR (dB) and SSIM for scaling factors ×2, ×3 and ×4 on the Set5, Set14, and B100 datasets with the bicubic downsampling process (noise level σn = 0). For each method, the first row reports PSNR and the second row reports SSIM.

Methods       Set5                         Set14                        B100
              ×2       ×3       ×4         ×2       ×3       ×4         ×2       ×3       ×4
Bicubic       33.66    30.39    28.42      30.24    27.55    26.00      29.56    27.21    25.96
              0.9299   0.8682   0.8104     0.8688   0.7742   0.7027     0.8431   0.7385   0.6675
NE            35.57    31.84    29.61      31.76    28.60    26.81      30.41    27.85    26.47
              0.9490   0.8956   0.8402     0.8993   0.8076   0.7331     0.8648   0.7592   0.6951
SCSR          -        31.42    -          -        28.31    -          -        27.72    -
              -        0.8821   -          -        0.7956   -          -        0.7647   -
A+            36.54    32.58    30.28      32.28    29.13    27.32      31.21    28.29    26.82
              0.9544   0.9088   0.8603     0.9056   0.8215   0.7491     0.8863   0.7835   0.7087
SRCNN         36.66    32.75    30.48      32.42    29.28    27.49      31.36    28.41    26.90
              0.9542   0.9090   0.8628     0.9063   0.8209   0.7503     0.8879   0.7863   0.7101
Proposed      36.93    33.03    30.76      32.61    29.59    27.71      31.59    28.63    27.07
              0.9599   0.9142   0.8681     0.9087   0.8255   0.7543     0.8917   0.7896   0.7132