Super-resolution image reconstruction using surface fitting with hierarchical structure

Super-resolution image reconstruction using surface fitting with hierarchical structure

Accepted Manuscript Super-Resolution Image Reconstruction Using Surface Fitting with Hierarchical Structure Xiaofeng Wang, Didong Zhou, Nengliang Zeng...

859KB Sizes 0 Downloads 64 Views

Accepted Manuscript Super-Resolution Image Reconstruction Using Surface Fitting with Hierarchical Structure Xiaofeng Wang, Didong Zhou, Nengliang Zeng, Xina Yu, Shaolin Hu PII: DOI: Reference:

S1047-3203(18)30059-2 https://doi.org/10.1016/j.jvcir.2018.03.011 YJVCI 2158

To appear in:

J. Vis. Commun. Image R.

Received Date: Revised Date: Accepted Date:

6 June 2017 6 February 2018 4 March 2018

Please cite this article as: X. Wang, D. Zhou, N. Zeng, X. Yu, S. Hu, Super-Resolution Image Reconstruction Using Surface Fitting with Hierarchical Structure, J. Vis. Commun. Image R. (2018), doi: https://doi.org/10.1016/j.jvcir. 2018.03.011

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Super-Resolution Image Reconstruction Using Surface Fitting with Hierarchical Structure★ Xiaofeng Wang, Didong Zhou, Nengliang Zeng, Xina Yu, Shaolin Hu

Abstract

We propose a super-resolution image reconstruction method using multi-source low resolution images. The proposed method includes a hierarchical structure that combines a neighborhood expansion process with the surface fitting technique. In the proposed method, a series of nested neighborhoods are created to collect LR pixels, and a purification algorithm is put forward to remove the outliers. Then we fit with a surface in each neighborhood to obtain a value at the location of estimated high resolution grid site. These values are pooled to a MAP frame to reconstruct high resolution pixels. Therefore, a reconstructed pixel is associated with the pixel correlation, pixel intensity and the spatial structure. Moreover, our method is non-iterative and does not suffer from convergence problem. Comparing with the state-of-the-art schemes, the proposed method provides superior effect and computational efficiency. Experimental results demonstrate the superiority of the proposed method in both visual fidelity and numerical measures.

Key words: Super-resolution image reconstruction, Neighborhood expansion, Multi-surface fitting, Hierarchical structure, MAP estimation



This work was supported by the National Natural Science Foundation of China under Grant No.61772416, 91646108, and 61473222; the Key Laboratory Project of the Education Department of Shaanxi Province under Grant No.17JS098; The Scientific Researching Fund of the State Key Laboratory of Astronautic Dynamics. Xiaofeng Wang, Didong Zhou, Nengliang Zeng, Xina Yu are with the School of Science, Xi'an University of Technology, Xi'an, Shaanxi, 710048, P.R.China. E-mail: [email protected] Shaolin Hu is with the Automation School, Foshan University, Foshan, Guangdong, 528000, P.R. China *Corresponding author: Xiaofengwang, E-mail: [email protected]

1 Introduction In the process of digital image acquisition, digital images often show lower resolution because of many factors such as the limitation of imaging equipment performance and the influence of shooting conditions. Super-resolution (SR) image reconstruction [1] is a kind of technology that estimates high-resolution (HR) images by using single or multiple observed low-resolution (LR) images. It uses the method of signal processing to increase high frequency components and remove degradations caused by imaging process. Currently, SR image reconstruction is a very active research area. It offers a promising method to overcome inherent resolution limitations of low-cost imaging sensors (e.g. cell phone or surveillance cameras) and allows the better utilization of the growing capability of high-resolution displays (e.g. high-definition LCDs) [2]. Super resolution image reconstruction technology is aiming at the image quality problems caused by degradation factors in imaging process, such as motion deformation, optical blurring, warping, low sampling rate, random noise, and aliasing. It has important applications in various fields such as space target imaging [3], remote sensing imaging [4], medical imaging [5, 6], tomographic imaging [7], and satellite imaging [8] where diagnosis or analysis from low-quality images may be extremely difficult. It has some special applications in video monitoring, special target identification and forensics, robot vision, microwave imaging, multi-spectral imaging, ultrasonic imaging, and so on. In the view of mathematic, SR image reconstruction is a typical ill-posed problem because of the insufficient of reasonable assumptions or prior knowledge about the observation model that maps a HR image to LR versions, ill-conditioned registration and unknown blurring operators, and non-unique reconstruction constraint. It has become one of the hot topics in multi-disciplinary fields including applied mathematics, image processing, computer vision, computational harmonic analysis, and so on. SR image reconstruction is one of the most active research areas since the pioneering work [9] was reported by Tsai and Huang in 1984. In the last three decades, many techniques have been proposed [10]. These techniques include single frame image SR reconstruction and

multiple frame images SR reconstruction. The involved fields are from signal processing to machine learning. The former is achieved by taking a single low-resolution image as input, and the latter is achieved by taking multiple low-resolution images of the same scene as input, and reconstructing a HR version based on reasonable assumptions or prior knowledge about the observation model. According to literature [11], the existing SR image reconstruction methods

can

be

divided

into

three

categories:

interpolation-based

methods,

reconstruction-based methods, and example learning-based methods. Interpolation-based SR methods are mainly aimed on single frame image SR reconstruction. It is achieved through pixels interpolation in spatial domain. This kind of method is mainly based on the following theory: it is non-uniform that the low resolution image pixels are mapped to a high resolution image grid through the algorithm of motion estimation, so the pixels of sampling point in high resolution image grid can be obtained by non-uniform interpolation. Based on this view, work [12] proposed an interpolation-based SR image reconstruction approach that used multi-surface fitting to take full advantage of image spatial structure. In [12], each low-resolution pixel was fitted with one surface, and the final estimation was made by fusing the multisampling values on these surfaces in the maximum a posteriori fashion. Subsequently, authors improved their work in [13], where, they took the advantage of nonlocal pixels to fit local surfaces. These pixels from nonlocal spatial- temporal positions were selected and weighted based on patch similarity and outlier removal. With this method, the fitted surfaces become more elaborate so that more details can be retrieved in SR results. Work [14] presented a sparse neighbor selection scheme for SR image reconstruction, where a larger number of neighbors were predetermined as potential candidates and an extended Robust- SL0 algorithm was developed to find the neighbors and solve the reconstruction weights simultaneously. Work [15] incorporated image nonlocal self-similarity into sparse representation model for image interpolation. This method can reconstruct the edge structures and suppress the jaggy/ringing artifacts. Reconstruction-based SR methods are mainly aimed multiple frame images SR reconstruction. This kind of method usually based upon an image degradation model and

solves an ill-posed inverse problem of de-blurring, up-sampling, and de-noising for a high-quality image [11]. There are many typical methods such as the Iterative Back-Projection (IBP) Approach [16], the Projection On to Convex Sets (POCS) Algorithm [17] and the Maximum A Posteriori (MAP) Estimation [18], and so on. In work [16], the HR image was estimated by back projecting the error (difference) between simulated LR images via imaging blur and observed LR images. The estimation procedure is repeated iteratively to minimize the error. The defect of this method is that the solutions are non-unique due to the ill-posed nature of SR image reconstruction problem [2]. The work [17] was based on the set theory. The work [18] was based on the statistical theory. The main advantage of these methods is that they are convenient to add image priors to regularize the ill-posed inverse problem. E.g. [19] utilized total variation function as image prior. Work [20] suggested incorporating registration information as prior. Example learning-based SR methods are mainly aimed single frame image SR reconstruction. This kind of method usually utilize different machine learning techniques to learn the mapping relationship from LR images to HR images, and obtain nonlinear qualitative relations between HR images and LR images for generation of details [11]. The prior knowledge obtained from the learning model is introduced in the process of image reconstruction. The early method was proposed by Freeman [21]. It is an example-based learning strategy, in which, Markov network was used to probabilistically model the relationships between high-resolution and low-resolution patches. This method relies heavily on enormous databases of millions of HR and LR patch pairs, and is therefore computationally intensive. Recent work [22] proposed an example learning based image super-resolution method. This method was based on a series of linear least squares functions, and possesses the characteristic of simple, effective, robust, and fast. About example-based learning strategy, exemplar-based face sketch synthesis has been widely applied to both digital entertainment and law enforcement, for example, Bayesian face sketch synthesis [23] and face

sketch-photo synt hesis [24]. Work [25] adopted the theory of Locally Linear Embedding (LLE) [26] from manifold learning. This idea was based on the assumption that small image patches

in low- and high-resolution images form manifolds with similar local geometry in two distinct feature spaces. The algorithm maps the local geometry of the LR patch space to HR one, and generates HR patch by a linear combination of neighbors in feature space. The potential problems existed in this method is it often results in blurring effects in reconstructed images, due to over- or under-fitting. Work [27] reported a method that adaptively chooses most relevant reconstruction neighbors based on sparse coding. This method avoids over- or under-fitting and produces superior results. Later, the same authors developed a new method based on sparse signal representation for single-image super resolution reconstruction, in their new work [28]. The main idea of work [28] is, by jointly training two dictionaries for the lowand high-resolution image patches, the similarity of sparse representations between LR and HR image-patch-pairs with respect to their own dictionaries can be enforced. Therefore, the sparse representation of a LR image patch can be applied with a HR image patch dictionary to generate a HR image patch. Work [29] presented a single image super-resolution reconstruction method by learning both non-local and local regularization priors from a given low-resolution image, in which, the algorithm utilized the non-local means filter to learn a non-local prior, and used steering kernel regression to learn a local prior. By assembling two complementary regularization terms, authors proposed a maximum a posteriori probability framework for SR recovery. Work [30] suggested a multiple-geometric-dictionaries-based clustered sparse coding scheme for single image SR reconstruction. In their method, a large number of high- resolution (HR) image patches were randomly extracted from a set of example training images and clustered into several groups of “geometric patches,” from which the corresponding “geometric dictionaries” were learned to further sparsely code each local patch in a LR image. A clustering aggregation was performed on the HR patches recovered by different dictionaries, followed by a subsequent patch aggregation to estimate the HR image. Work [31] reported a fully automatic SR algorithm that used the nonparametric Bayesian inference method based on numerical integration, which is known in the statistical literature as integrated nested Laplace approximation (INLA). The experimental results show that this algorithm performs better than other SR algorithms recently proposed.

Consider that Gaussian Process Regression (GPR) had been successfully applied to example learning-based image SR reconstruction, work [11] employed an active learning strategy to heuristically select more informative samples for training the regression parameters of the GPR model, and reported an example learning-based single image SR method, called active-sampling GPR (AGPR). This method is superior to other competitors for producing much sharper edges and finer details. Afterwards, the same authors reported a fast single image SR reconstruction method using sparse GPR [32], in which, authors first trained a full GP model, and then employed the sparse GP to further approximate the full GP model by seeking a sparse projection vector which can significantly accelerate the prediction efficiency while getting higher reconstruction quality. The experimental results indicated that this method is promising for real-time SR application. Recent work [33] developed a coarse-to-fine framework for single-image super-resolution (SR) reconstruction. This approach achieved high-quality SR recovery based on the complementary properties of both example learning and reconstruction-based algorithms. The experimental results indicated that this method outperforms other state-of-the-art methods for producing high-quality images. Enlightened by previous works, we propose a novel super-resolution image reconstruction method that comprehensively utilizes the multiple and structured information of LR images. The main contribution of our work is as follows. (1) The proposed method is provided with a frame of hierarchical structure that combines a neighborhood expansion process with the surface fitting technology, in which, a series of nested neighborhoods are created, and the LR pixels within each neighborhood are used to fit with a surface and thus a referenced sampling values of estimated pixel is obtained. (2) We propose an algorithm to remove the outliers according to pixel intensity. (3) In the proposed method, LR pixels provide effectively contribute to the final HR estimations through fitted surfaces in spatial distributions, correlation and intensity of pixels. (4) The proposed method does not need any artificial hypothesis on image prior, and it avoids to use conventional iterative process thus does not suffer from convergence problem. Comparing with the existing method, it is superior in both visual fidelity and numerical measures.

The remainder of this paper is organized as follows. In section 2, we describe the proposed method. Section 3 presents experimental results and performances analysis. Section 4 concludes the paper with some thoughts on future works. 2 Proposed Method In this section, we describe the proposed method. The proposed method consists of three pipeline stages: image registration, image fusion and image reconstruction. The flowchart of the proposed method is shown in Fig.1. 2.1 Image quality degradation The digital imaging system is not perfect due to hardware limitations, acquiring images with various kinds of degradations. These degradations are modeled fully or partially in different SR techniques. We use an observation model [2, 31, 34] to obtain a series of LR images. The observation model of a real imaging system relates a desired referenced HR image to all observed LR images [2, 31, 34]. Typically, an imaging process involves warping, followed by blurring and down-sampling to generate LR images from real scenes. So observation model can be described as follows:

Underlying HR image I Image fusion

Images registration Distance definition

SURF features

Matching rule

interpolate, amplify mapping to HR grid

Image Reconstruction Nested neighborhoods

The final HR image

Surface fitting

Removing abnormal data

MAP estimation

A series of estimated values from nested neighborhoods

Fig.1 The flowchart of the proposed method

The underlying HR image can be denoted as a vector I  [ I1 , I 2 ,..., I N1L1  N2 L2 ]T , where N1L1  N2 L2 is the size of HR image. Let L1 and L2 denote down-sampling factors of

horizontal and vertical directions, respectively, each observed LR image has size N1  N 2 . Thus, LR images can be represented as Tk  [Tk ,1 , Tk ,2 ,..., Tk , N1 N2 ]T , and the LR observation images are related with HR scene I by model

Tk  D  Bk  Mk  I  n k  W k  I  kn

(1)

where k  1,2,..., N , N is the number of LR images, D represents down-sampling matrix, Bk models the blurring effects, M k is a warp matrix, and nk is the noise term. The size of

Wk is ( N1 N2 )2  L1 N1 L2 N2 . Based on this observation model, the aim of SR image

reconstruction is estimating HR image from LR images Tk , k  1,2,..., N . In our work, the noisy version of an image is generated by mixing signal with a statistical random noise, in particular an additive white Gaussian noise. The blurred version is generated by convoluting the image with a motion blur operator. It is equivalent to the linear motion of a camera by LEN pixels, with an angle of THETA degrees in a counter-clockwise direction. That is, the motion blur operator becomes a vector for horizontal and vertical motions. Finally the rotated version is generated by using inbuilt Matlab function viz.imrotate( ). After this process, we can obtain a series of LR images (T1 , T2 ,..., TN ) . 2.2 Image Registration Image registration is an important preprocessing task for super-resolution image reconstruction. It is a process of overlaying two or more same scene images that are taken at different times, from different viewpoints and by different sensors. In this process, it is necessary to select a referenced image from processed image sequence. Image registration is implemented by applying spatial transformations which help in mapping a series of lower resolution image to the same reference benchmark. Considering that SURF (Speeded Up Robust Features) [36] is a fast algorithm of the SIFT [35], it is invariant to translation, rotation and scaling transformation in image domain and robust to moderate perspective transformations and illumination variations. We use SURF feature to define a matching rules to realize image registration.

(1) Selecting referenced image For a series of LR images Tk  k  1,2,..., N  , we calculate the Peak Signal Noise Ratio (PSNR) for each Tk , respectively, and define the image whose PSNR reaches maximum as a referenced image, denoted as T0 .

T0  arg max (PSNR k |PSNR k =PSNR(Tk ))

(2)

k

Here, PSNR( Tk ) represents the PSNR value of image Tk . (2) Defining matching rule Assuming the sets of SURF feature points are denoted as:



P1  p1,1 , p1,2 ,..., p1, m1

 (3)





PN  pN, 1, pN , ,2 . . . ,pNN ,m Here,

mk

represents

the

number

of

feature

points

extracted

from

Tk ,

pk , j  { pk1, j , pk2, j ,..., pk64, j } , k  1,2,..., N , j  1,2,...,max(m1 , m2 ,..., mN ) . Then we calculate the Euclidean distance between each point in P1 and each point in P2 ,..., PN . That is, for p1, j1  P1 and p2, j2  P2 , let 64

( p

d ( p1, j1 , p2, j2 ) 

i 1, j1

i 1

 p2,i j2 ) 2 ,

… For p1, j1  P1 and p2, jN  PN , let 64

( p

d ( p1, j1 , pN , jN ) 

i 1, j1

i 1

 pNi , jN ) 2

Here, j1  1,2,..., m1 , j2  1,2,..., m2 ,…, jN  1,2,..., mN . For d ( p1, j , p2, j ) , let d1 be the minimum valueand d 2 be the second minimum value, 1

2

that is d1  min{d ( p1, j , p2, j )} , d2  min{{d ( p1, j , p2, j )}  {d1}} . j1 , j2

1

2

The matching rule is defined as:

j1 , j2

1

2

r   , s u c c e s s   r  , f a i l u r e

(4)

Here, r  d1 / d 2 ,  is a threshold. Repeat the above process for d ( p1, j , p3, j ),..., d ( p1, j , pN , j ) , and matched feature points 1

3

1

N

between P1 and each Pk ( k  2,..., N ) are denoted as: Q1  q1,1 , q1,2 ,..., q1, m 

(5) QN  qN ,1 , qN ,2 ,..., qN , m 

Here, m is the number of matched feature points. Using these points, we can get parameters: rotation angle  k , horizontal offset xk and vertical offset yk , via affine transformation formula (6). These parameters represent the relative position of k-th LR image relative to the referenced image T0 .

Qk  Rk Q1  Dk  cos k Here, Rk     sin  k

(6)

sin  k   xk  , Dk     , k  2,..., N . cos k   yk 

2.3 Image Fusion Image fusion is a process that stacks multiple registered low-resolution images together and maps them into an estimated HR grid. Through image fusion, pixels from multiple low-resolution images are clustered together, and more relevant information from multiple low-resolution images is combined to estimate the HR pixels.

Fig.2 The schematic diagram of LR images fusion via interpolation, amplification and mapping. The “circles” represent HR pixels to be estimated. Other shapes represent LR pixels that

come from different LR images.

In order to get more information of low resolution pixels, the image fusing techniques used here are a series of continuous operations include interpolation, amplification and mapping.

We

first

execute

bicubic

interpolation

to

amplify

registered

images

Tk  k  1,2,..., N  , and then map them into a high resolution grid via using rotation angle  k ,

horizontal offset xk and vertical offset yk . These parameters are obtained from image registration stage. Here, Bicubic interpolation is used to ensure the smoothness of generated images. After this process, multiple LR images are positioned into an HR grid, as shown in Fig.2. Fig.2 is a schematic diagram of LR images fusion via interpolation, amplification and mapping. The “circles” represent the HR pixels should be estimated. The LR pixels are represented by other shapes. Different shapes represent different pixels that come from different LR images. 2.4 Image Reconstruction In this section, we describe a frame of hierarchical structure that combines a neighborhood expansion process with surface fitting technology to estimate high resolution pixel. We treat the reconstructed image as a grid, and our goal is to estimate the pixel value of each grid node. Firstly, we take a grid site as a center to generate an initial neighborhood, and fit with a surface using LR pixels within this neighborhood. For the fitted surface, we can regard it as a continuous image, and thus, a sampling value can be obtained at the location of discussed grid node. Then we expand above mentioned neighborhood by using a given stepsize to obtain a series of nested neighborhoods, fit with one surface for each neighborhood, and thus, a series of sampling value at same location can be obtained. The final HR pixel can be estimated via pooling these values to a MAP frame. To get exact estimated values, the sizes of initial neighborhood, final neighborhood, and step-size are all very important. The algorithm process can be formulated as follows. (1) Neighborhoods expanding We denote a discussed grid site as Hg. To estimate high resolution pixel HP at Hg, we

create a series of progressively expanding neighborhoods. These neighborhoods are nested one by one and around Hg (see the “circle” point in the center of Fig.3). Hg

b2

b1

Fig.3 The schematic diagram of the neighborhood expanding Take Hg as center, create a 1×1 initial neighborhood. Thus the initial length of half side of the neighborhood is 0.5. Seeing Fig.3, b1  0.5 . Let l  0.1 as step-size, b2  1.5 as the final value of half side of the neighborhood. From b1 to b2 , we can create a series of nested neighborhoods one by one. The size of the final neighborhood is 3  3 , and the number of neighborhoods is b  (b2  b1 ) / l  1 , denote these neighborhoods as NBi

 i  1,2,..., b  ,

seeing Fig.3. (2) Data collecting Searching for LR pixels within NBi , denoted as LPij , i  1,2,

, b , j  1, 2,...,bi , bi is

the number of the LR pixels within NBi . (3) Establish pixel intensity planes We define a coordinate system, and take the high resolution grid plane is XOY plane. Let Z-axis be perpendicular to the XOY plane. For each LR pixel Lij  j  1,2, neighborhood NBi ,

 i  1, 2,...,b  ,

, bi  within

using pixel value of Lij as high, define a plane Pij to

parallel to the XOY plane (see Fig.4), we call pixel intensity plane, and denoted as:





Pi j  p f Ni B L i j

(7)

Here, f NBi  Lij  denotes the pixel value of Lij , p( ) denotes a function that generates a plane.

Z

Pij plane

Hg Y

O X

Fig.4 The schematic diagram of pixel intensity plane (4) Data cleansing Considering that some outlier data will influence the accuracy of image reconstruction, we develop the following algorithm to remove outlier data. Definition 1. The distance between two pixels: The distance between pixels Lik and Lij is defined as the distance between corresponding plane Pik and Pij . In neighborhood

NBi , computing distance  i ,k , j from each pixel Lik to Lij :

 i ,k , j  f NB ( Lik )  f NB ( Lij ) , k =1, 2,...,bi ,k  j i

Let

Wij 

i

   k j

i ,k , j

2

/ (bi  1) , j  1, 2,...,bi

Wi  {Wi1 ,Wi 2 ,

,Wibi }

(8)

(9) (10)

Rearrange elements of Wi in ascending order and obtain set: H

Wi  {Wi1 ,Wi2 ,

g

,Wibi }

(11)

Computing the difference of adjacent elements in Wi  , and the difference set is denoted as: Hi  {hi1 , hi 2 ,

, hi (bi 1) } , here

hi1  Wi 2  Wi1 hi 2  Wi3  Wi2



(12)

hi (bi 1)  Wibi  Wi(bi 1)

ni  arg max (hint )

Let

(13)

nt 1,2,...,bi 1

ni is the index of the maximal element hini  H i . Using ni , we can find the

corresponded element Wini Wi  . Then we define a criterion to decide whether a low resolution pixel Lij is an outlier:   if Wij  Wni , Lij   if Wij  Wni ,Lij

i s a n o r m aj  l d a t abi (

1, 2 , . . . ,

ani ab s normal data ( j  1, 2,..., bi )

) (14)

We remove the outlier, and obtain a new low resolution pixel set of neighborhood NBi :

LPi  {LPi1 , LPi 2 ,

, LPiMi }

(15)

Here, M i  M i  bi  is the number of remaining low resolution pixels in NBi . (5) Pixel estimating For neighborhood NBi  i  1,2,

, b  with M i LR pixels, we fit with a surface from

LPij ( j  1,2,..., M i ), that is, using coordinates

x , y  ij

ij

of LPij to fit with the quadric

surface equation:

fˆ  xij , yij   c0  c1 xij  c2 yij  c3 xij 2  c4 xij yij  c5 yij 2 Here,

x , y  ij

ij

(16)

is the horizontal and vertical coordinates of a LR pixel LPij in the

coordinate system of HR grid. Hence, each neighborhood is corresponding to a fitted surface. For each LR pixel LPij , the intensity value is estimated by the following equation:

f NBi  LPij   fˆNBi  LPij   ij , 1  i  b, 1  j  M i

(17)

Here, f NBi  LPij  is estimated intensity value of LR pixel LPij . Let ij  0 ,we can get an equation system:

1 xi1   1 xij   1 x iM i 

yi1

xi21

xi1  yi1

yij

xij2

xij  yij

yiM i

2 xiM i

xiM i  yiM i

c  yi21   0   f NBi  LPi1     c1    c     2 2  yij      f NBi  LPij     c3         c 4 2  yiM   f  LP i  iM i   c5   NBi



(18)



As can be seen from equation (18), as long as M i is larger enough, solving equations system (18) to get the value of parameter c0 ,..., c5 is equivalent to solve an overdetermined equations system. Here, we solve such a problem by using the method of least squares. To get better surface fitting effect, a certain number of low resolution pixels are needed, thus the size of b1 should not be too small. However, bigger neighborhood probably introduces more outlier that will cause negative impact to surface fitting. By selecting appropriate value of b1 and b2 , fitted surfaces can effectively represent image details such as gradients, curvatures, and higher order information. After surface fitting, the error value  ij can be obtained from equation (17). Then we calculate mean squared error and an estimated value HP in HR grid site:

 i2 

1 Mi

Mi

 j 1

ij

, 1 i  b

f NBi  HP   f NBi  HP    i , 1  i  b

(19)

(20)

By this way, a HR pixel is associated with multiple LR pixels in terms of three aspects. One is the spatial distribution of LR pixels in the HR grid, and it can be represented by the local structures of a series of nested neighborhoods. The second is the correlation of LR pixels, and it can be represented by fitted surface. The third is the intensity of LR pixels within the neighborhood, and this can be represented by pixel value. (6) Image reconstruction Finally, the intensity of HR pixel can be calculated by MAP estimation [12], i.e.,



 H P, .N. B. , f  

fˆ  H P  a r g m a xq f H  PN |B1 f f  H P



 a r g m aqx fN B1  H P f  H P

b

HP



H  P  f |  H P

, . N.fb .B,

 q f

(21)

HP

where q () denotes the probability density function. Under Gaussian assumption, equation (21) is equivalent to:



 b f  HP   f  HP  NBi fˆ  HP   a r g m i  n  i2 f  HP   i 1 

Here,

f 0  PH  is the prior estimation of



2

 2    f 0  HP   f  HP     

(22)

f  PH  , it is obtained using B-spline

interpolation [12].  is an empirical parameter. Let the gradient of cost in equation (22) equal to zero, and we will get the MAP estimation for the intensity of the HR pixel HP. fˆ  H P 

  B    i  1

0 f

B

H P i i 1

f

Ni B



H  P 

(23)

i 1

Here, i  1/  i2 . Let   0 , the expression (23) is essentially a weighted sum. The high resolution image is finally obtained by estimated high resolution pixels. 3 Experimental Results and Performance Analysis In this section, we discuss the performance of the proposed method via visual fidelity and numerical measures, and compare it with the state-of-the-art SR approaches [12], [13], [15]. Unlike literature [12] that utilized 60, and 70 LR images in reconstruction, in the proposed method, just 25 LR images can achieve obvious reconstruction effect. In our experiment, the magnification factor of SR is 4. The parameters of the motion blur operator are LEN =3 and THETA =5. Our method is implemented and tested using MATLAB2010a. We run programs in the computer with Dual-Core CPU, E5400 @ 2.70GHz, 2.00GB RAM. A. Simulations Results In order to demonstrate and compare the performance of the proposed method, we carry out the simulation experiments on USC-SIPI [38] image database, and use four well-known images that are shown in Fig.5. Fig.5(a) is the top left corner of the EIA 1956 video resolution target, and Fig.5 (b)-(d) are from the USC-SIPI image database. The resolution of all these

images is 512 ×512.

(a)

(b)

(c)

(d)

Fig.5. The tested images. (a) The top left corner of the EIA 1956 video resolution target. (b) Lena. (c) Tank. (d) Aerial image of San Diego.

We use the proposed method to implement SR reconstruction for tested images, and the results are shown in Fig.6. As can be seen from Fig.6, comparing with (b), (c) and (d), results (e) show clearer margin and fewer artifacts, and show better visual effect. This means that the proposed method is superior to methods [12], [13] and [15] in visual effect.

(a)

(b)

(c)

(d)

(e)

Fig.6 The part close-up views of the reconstruction results and the comparison. (a) the part

close-up views of the referenced low resolution image, (b) the reconstruction results of the method [12], (c) the reconstruction results of the method [13], (d) the reconstruction results of the method [15], (e) the reconstruction results of the proposed method. Table 1 Comparison results of PSNR and FSIM Images EIA Lena Tank Aerial

Criteria PSNR FSIM PSNR FSIM PSNR FSIM PSNR FSIM

Zhou[12] 28.8335 0.8667 29.3164 0.8701 29.7847 0.8917 26.7937 0.8267

Zhou[13] 34.5030 0.8667 32.8475 0.8798 33.5940 0.8665 29.5954 0.8288

Song[15] 32.9989 0.8590 31.5453 0.8483 32.1511 0.8508 29.3640 0.8427

The proposed method 35.5495 0.9516 34.4815 0.9218 34.9665 0.8857 30.2280 0.8594

In order to estimate reconstruction results in quantify manner, we investigate the image quality assessment (IQA) indicators. Feature similarity index (FSIM/FSIMc) is a new image quality assessment indicator proposed in [37]. FSIM/FSIMc is proposed for full reference IQA. It is based on the fact that human visual system (HVS) understands an image mainly according to low-level features, specifically, phase congruency (PC) and image gradient magnitude (GM). Phase congruency is a dimensionless measure of the significance of image local structure, which is used as the primary feature in FSIM. Image gradient magnitude is computed as the secondary feature to encode contrast information of the image. In [37], experimental results performed on six benchmark IQA databases demonstrate that FSIM can achieve higher consistency with the subjective evaluations than state-of-the-art IQA metrics, such as the visual information fidelity (VIF) metric, the structural similarity (SSIM) index, the multiscale extension of SSIM (MS-SSIM), the information fidelity criterion (IFC), the visual SNR (VSNR), the noise quality measure (NQM), and Peak Signal-to-Noise Ratio (PSNR). We utilize PSNR and Feature similarity index (FSIM/FSIMc) to demonstrate the superiority of our method in the case of SR image reconstruction. SR image reconstruction is considered as more effective if reconstructed image is provided higher PSNR and FSIM. In experiment, we use 25 LR images in reconstruction. We compare the proposed method with state-of-the-art methods [12], [13], and [15], the results are shown in Table 1. As can be seen from Table 1, comparing with the methods [12], [13], and [15], the proposed method reaches highest PSNR and FSIM. This means that the proposed method can achieve better reconstruction effect via using a little amount of LR images.

Now we estimate the computational efficiency of the proposed method. We compare average run time, and results are listed in Table 2. In Table 2, work [15] is a learning-based method, though the average run time is 165s only, it is indispensable to need large-scale training-test sample data set in prior. As can be seen from Table 2, compared with [12], [13], and [15], the average run time of the proposed method is the lowest. This means that the proposed method is computationally efficient. Table 2 The comparison of the average run time (s) Methods Average run time (s)

Zhou[12] 563

Zhou[13] 395

Song[15] 165

The proposed method 381

B. The Performance Estimation To estimation algorithm performance, we investigate the behaviors of the proposed method in different image noise level and different number of LR images. In experiment, we add Gaussian white noise into testing images and let the noise variances equal to 0.01 and

0.03, respectively, and then we use equation (1) to obtain a series of LR images. We execute the proposed algorithm by using different number of LR images, 15 frames, 20 frames, 25

frames, 30 frames and 35 frames, respectively. Taking Pepper image and Lena image as examples, testing results are shown in Fig.7 and Fig.8. In Fig.7, (a) is the original Pepper image; the noise variance is 0.01; (b) is the LR version of (a), and (c), (d), (e), (f) and (g) are restructured images via 15 frames, 20 frames, 25 frames, 30 frames, and 35 frames LR images, respectively. In Fig.8, (a) is the original Lena image; the noise variance is 0.03; (b) is the LR version of (a), and (c), (d), (e), (f) and (g) are restructured images via 15 frames, 20 frames, 25 frames, 30 frames, and 35 frames LR images, respectively. Here, parameter   0 .

(a) Pepper (original image) (b)Inputted LR Image

(c)N=15

(d)N=20

(e)N=25

(f)N=30

(g)N=35

Fig. 7 Reconstruction results of Pepper image with 0.01 Gaussian white noise

(a)Lena (original image) (b)Inputted LR Image

(e)N=25

(c)N=15

(f)N=30

(d)N=20

(g)N=35

Fig. 8 Reconstruction results of Lena image with 0.03 Gaussian white noise

As can be seen from Fig.7 and Fig.8, when noise variance is 0.01, the reconstruction quality is less affected by the changing of LR image numbers. However, when noise variance is 0.03, the reconstruction quality presents some changes with different image numbers. For further analysis, we investigate the image quality evaluation standard -- Peak Signal-to- Noise Ratio (PSNR), results are shown in Table 3. Table 3. The comparison of PSNR in different noise level and different number of LR images The

number

of LR images

Pepper

Lena

The noise variance

The noise variance

0.01

0.03

0.01

0.03

15

33.0027

32.3700

33.1878

31.5315

20

33.0689

32.6104

33.2074

31.6352

25

33.0629

32.7287

33.1219

31.6639

30

33.0636

32.8046

33.1456

31.8299

35

33.0332

32.7765

33.0913

31.7575

As can be seen from the Table 3, the reconstruction quality is changing with different noise level and different LR image number. While, the reconstruction quality is not proportional to the number of low resolution image number. From table 3, in the case that noise variance is equal to 0.01, the PSNR of the reconstruction image is highest when the number of LR image is 20, while, if the noise variance is equal to 0.03, the PSNR of the reconstruction image is highest when the number of LR image is 30. In order to further analyze the performance of the proposed algorithm, we show the two dimensional Fourier transform results of the original image, LR image, and the reconstructed HR image. Fig.9 is the examples. In Fig.9, (a) and (d) are original images, (b) is the LR image of (a), (c) is the reconstructed HR image of (b); (e) is the LR image of (d), (f) is the reconstructed HR image of (e). (a’)~(f’) are the two dimensional Fourier frequency spectrums of (a)~(f). As can be seen from Fig.9, (a’) and (c’) are similar, and the high frequency points in (a’) or (c’) are more than those in (b’), obviously; (d’) and (f’) are similar, and the high frequency points in (d’) or (f’) are more than those in (e’), obviously. This means that the image quality of the reconstructed HR images (c) and (f) are close to that of the original image (a) and (d), and are higher than their LR version (b) and (e), respectively.

(a)

(b)

(c)

(d)

(e)

(f)

(a’) (b’) (c’) (d’) (e’) (f’) Fig.9 The two dimensional Fourier transform results of the original image, LR image, and the reconstructed HR image

Furthermore, we compare the proposed method with work [12] in term of PSNR, and the results are shown in Fig.10. In Fig.10, (a) is the PSNR tendency of the proposed method, and (b) is the PSNR tendency of work [12]. As can be seen from Fig.10, for the proposed method, image reconstruction quality is less affected by the changing of LR image numbers when

noise level is lower. However, the image reconstruction quality of work [12] is affected obviously by the changing of LR image numbers. This means that the proposed method is more stability than that of work [12]. Furthermore, for the work [12], one has to use a large number of LR images if he/she want to get high quality reconstruction image.

(a)

(b)

Fig.10 The comparison of the proposed method and work [12] in term of PSNR

C. The Reconstruction Effect for Reality Scene As we know, an algorithm is useful if it is effective not only in laboratory, also for the real scene. In this section, we test the effectiveness of the proposed approach by using real-world data. We photograph two sequences images of “Car” and “Leaf” from long distance with a portable digital camera, the blur types of these images include out-of-focus blur and motion blur. Then we use the proposed method to implement SR reconstruction for these images. The visual comparisons are illustrated in Fig.11 and Fig.12. Here, we use 25 images as the input for each sequence. For each group of results, the top row is the example of inputted image, (a), (b), (c) and (d) are the reconstruction results of [12], [13], [15], and the proposed method, respectively. Certainly, merely a small part of the testing image is highlighted. As can be seen from Fig.11 and Fig.12, compared with original image, the reconstructed results obtained from our method has definite improvement in image quality. Compared with other SR methods, the proposed method produces better visual effect. For the real-world scene, there is not the ground truth of the HR image. To quantitatively evaluate the reconstruction results, we use a state-of-art blind image quality assessment

method, known as quality-aware clustering (QAC) [39]. The QAC is based on blind image quality assessment (BIQA), which predicts image quality that correlates with human perception without any knowledge of a reference image.

Leaf

(a)

(b)

(c)

(d)

Fig.11 The example of a real-world image “Leaf”. The top is the input. (a), (b), (c) and (d) are the reconstruction results of [12], [13], [15], and the proposed method, respectively.

Car

(a)

(b)

(c)

(d)

Fig.12 The example of a real-world image “Car”. The top is the input. (a), (b), (c) and (d) are the reconstruction results of [12], [13], [15], and the proposed method, respectively.

An image is considered as higher quality if it has higher QAC. In the case of SR reconstruction, the result is considered as more accurate if higher QAC are obtained. We show the quantitative comparisons in terms of QAC and average run time in Table 4. As can be seen, the proposed method is superior to others. Though the running time of work [15] is lowest, it is a learning-based method and need large-scale training-test sample data in prior. This demonstrates that the proposed method is feasible for real scene. Table 4 The comparison results of QAC and average run time (s) Criteria

Zhou[12]

Zhou[13]

Song[15]

QAC

0.6271

0.6518

0.6460

Average run time (s)

112

64

23

The proposed method 0.6690 45

4. Conclusion In this paper, we propose an SR image reconstruction frame of hierarchical structure by using the techniques of neighborhood expansion and multi-surface fitting. In the proposed method, each HR pixel is determined by neighboring LR pixels, and associated with three aspects: pixel correlation, and pixel intensity and the spatial structure. The proposed method does not need too many low-resolution images in reconstruction, and is non-iterative and does not suffer from convergence problem. The experimental results demonstrate that the proposed method is superior in both visual fidelity and numerical measures. Image SR reconstruction is a very promising filed since various important applications. However, there are some larger gaps between current study results and practical application requests. Our future researches are mainly concentrated in the following respects. (1) Developing accurate image degradation model to achieve precise estimation for point diffusion function and noise. The technique of image super-resolution enhancement depends heavily on the accuracy of imaging system and imaging condition. However, it is very difficult to get image degradation model that is conformity with the actual imaging process. Existing approximate observation models used in literatures are all simple and deterministic, and are difficult to tally with the real imaging process. (2) Robustness. Many of the existing algorithms are based on the assumptions that are

difficult to meet. In practical application, the robustness to tolerate content-preserving manipulations is necessary. (3) Compressed-domain SR image reconstruction. Some SR image reconstruction algorithms are aiming at image sequences. In fact, the most common image sequences are video files. Therefore, in designing of super-resolution image reconstruction algorithm, it is valuable to take full account of the image degradation factor caused by different video compression format and codec technology, imaging model and compression algorithm, as well as the motion compensation and coding transmission mechanism, to implement SR image reconstruction in compressed domain.

REFERENCES [1] Chaudhuri, S.(2001).Super-resolution imaging. In Springer. [2] Park,S.C., Park,M.K., & Kang,M.G.(2003). Super-resolution image reconstruction: a technical overview. IEEE Signal Processing Magazine,20(4), 21-36. [3] Kidera,S., Sakamoto,T., & Sato,T.(2011).Super-Resolution UWB Radar Imaging Algorithm Based on Extended Capon with Reference Signal Optimization. IEEE Transactions on Antennas and Propagation, 59(5), 1606-1615. [4] Jia,X., Li,F., & Fraser.D.(2008). Universal HMT based super resolution for remote sensing images. IEEE International Conference on Image Processing, 333-336. [5] Kennedy,J.A., Israel,O., Frenkel,A., Bar-Shalom,R., & Azhari,H.(2006). Super resolution in PET imaging. IEEE Transactions on Medical Imaging ,25(2), 137-147. [6] Robinson,M.D., Chiu,S.J., Lo, J.Y., Toth, C., Izatt, J.A.,& Farsiu, S.(2011).New Applications of Super-Resolution in Medical Imaging. Book chapter in Super-Resolution Imaging, CRC Press, 383-412. [7] Zhu,X., & Bamler,R.(2012). Demonstration of Super-Resolution for Tomographic SAR Imaging in Urban Environment. IEEE Transactions on Geoscience and Remote Sensing, 50(8), 3150-3157. [8] Ilovitsh,A., Zach,S., & Zalevsky,Z.(2012).Contour superresolved imaging of static ground targets using satellite platform. Applied Optics,51(24),5863. [9] Tsa,R.Y., & Huang,T.S.(1984). Multiple frame image restoration and registration. Advances in Computer Vision and Image Processing, CT, 317-339. [10] Yang,J., & Huang,T.(2011). Image super-resolution: Historical overview and future challenges. SuperResolution Imaging, CRC Press, 1-24. [11] Wang,H., Gao,X., Zhang, K., (2016). Single Image Super-resolution Using Active-sampling Gaussian Process Regression. IEEE Trans. on Image Processing, 25(2), 935-948. [12] Zhou,F., Yang,W., & Liao,Q.(2012). Interpolation-Based Image Super-Resolution Using Multisurface Fitting. IEEE Transactions on Image Processing, 21(7), 3312-3318. [13] Zhou,F., Xia,S., & Liao, Q.(2014).Nonlocal Pixel Selection for Multisurface Fitting-Based Super-Resolution. IEEE Trans.Circuits and Systems for Video Technology, 24(12), 2013-2017. [14] Gao,X., Zhang,K., Tao,D., Li,X.(2012). Image Super-Resolution with Sparse Neighbor Embedding. IEEE Transactions on Image Processing,21(7), 3194-3205. [15] Dong,W., Zhang,L., Lukac,R., & hi,G.(2013). Sparse representation based image interpolation with nonlocal autoregressive modeling. IEEE Transactions on Image Processing, 22(4), 1382–1394. [16] Irani,M., & Peleg,S.(1991). Improving resolution by image registration. Graphical Models and Image Proc(CVGIP), 53, 231-239.

[17] Patti,A.J., Sezan,M.I., & Tekalp,A.M.(1997). Super-resolution video reconstruction with arbitrary sampling lattices and nonzero aperture time. IEEE Transactions on Image Processing, 6(8), 1064–1076. [18] Schulz,R.R., & Stevenson,R.L.(1996). Extraction of high-resolution frames from video sequences. IEEE Transactions on Image Processing,5(6), 996- 1011. [19] Ng,M.K., Shen,H., Lam,E.Y., & Zhang,L.(2007). A total variation regularization based super-resolution reconstruction algorithm for digital video. Journal on Advances in Signal Processing, 2007(2),230-236 [20] Belekos,S.P., Galatsanos,N.P., & Katsaggelos,A.K.(2010).Maximum a posteriori video super-resolution using a new multichannel image prior. IEEE Transactions on Image Processing, 19(6), 1451–1464. [21] Freeman,W.T., Jones,T.R., & Pasztor,E.C.(2002).Example-Based Super Reslution. IEEE Computer Graphics and Applications, 22, 56-65. [22] Hu,Y, Wang,N, Tao,D, Gao,X., Li,X., (2016). SERF: A Simple, Effective, Robust, and Fast Image Super-Resolver From Cascaded Linear Regression, IEEE Transactions on Image Processing, 25(9), 4091-4102. [23] Wang,N. Gao,X., Sun,L., Li,j. (2017), Bayesian Face Sketch Synthesis, IEEE Transactions on Image Processing, 26(3). [24] Wang,N., Tao,D., Gao,X., Li,X., Li,j.,(2014), A Comprehensive Survey to Face Hallucination, International Journal of Computer Vision, 106(1), 9–30. [25] Chang,H., Yeung,D.Y., & Xiong,Y.(2004). Super-resolution through neighbor embedding. IEEE Conference on Computer Vision and Pattern Classification (CVPR), 275-282. [26] Roweis,S.T., & Saul,L.K.(2000).Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323-2326. [27] Yang,J., Wright,J., Huang,T., & Ma,Y.(2008). Image super-resolution as sparse representation of raw image patches. CVPR , 1-8. [28] Yang,J., Wright,J., Huang,T., & Ma,Y.(2010). Image Super Resolution via Sparse Representation.IEEE Transactions on Image Processing,19(11), 2861-2873. [29] Zhang,K., Gao,X., Tao,D., & Li,X.(2012). Single Image Super-Resolution With Non-Local Means and Steering Kernel Regression. IEEE Transactions on Image Processing, 21(11), 4544-4556. [30] Yang,S., Wang,M., Chen,Y., & Sun,Y.(2012). Single-Image Super-Resolution Reconstruction via Learned Geometric Dictionaries and Clustered Sparse Coding. IEEE Transactions on Image Processing, 21(9), 4016-4028. [31] Camponez,M.O., Salles,E.O.T., & Filho,M.S.(2012). Super-Resolution Image Reconstruction Using Nonparametric Bayesian INLA Approximation. IEEE Transactions on Image Processing ,21(8), 3491-3501. [32] Wang, H., Gao, X., Zhang, K., Li, J.(2017). Fast Single Image Super-resolution Using Sparse Gaussian Process Regression. Signal Processing, 134, 52-62. [33] Zhang, K., Tao, D., Gao, X., Li, X., Li. J.(2017). Coarse-to-Fine Learning for Single-Image Super-Resolution. IEEE Transactions on Neural Networks and Learning Systems, 28(5), 1109-1122. [34] Elad,M., & Feuer,A.(1997). Restoration of a single super-resolution image from several blurred, noisy, and undersampled measured images. IEEE Transactions on Image Processing, 6(12),1646-1658. [35] Lowe, D.G.(2004). Distinctive image features from scale-invariant keypoints. IJCV, 60(2), 91-110. [36] Bay,H., Ess, A., Tuytelaars,T., & Van Gool,L.(2008). SURF: Speeded Up Robust Features. Computer Vision and Image Understanding, 110(3), 346-359. [37] Zhang,L., Zhang, L., Mou,X., & Zhang,D. (2011). FSIM: A Feature Similarity Index for Image Quality Assessment. IEEE Transactions on Image Processing, 20(8), 2378-2386. [38] Weber, A. USC-SIPI Image Database. [Online]. Available: http://sipi.usc.edu/database /database.php. [39] Xue,W., Zhang, L., & Mou, X.(2013). Learning without human scores for blind image quality assessment. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

Xiaofeng Wang received her B.S. degree in applied mathematics from Tianjin University, China, and both her M.S. degree in mathematics and her Ph.D. degree in mechanical and electronic engineering from the Xi’an University of Technology, China. In 2007, she joined the Institute of Artificial Intelligence and Robotics, Xi’an

Jiaotong University, where she was a post-doctoral researcher until 2010. In 2012, she joined the Grasp Lab, University of Pennsylvania, where she was a visiting scholar until 2013. She is currently a professor with the Department of Mathematics, Xi’an University of Technology, China. Her current research interests include multimedia forensics and security, image processing, steganography and steganalysis. Didong Zhou received his B.S. degree in mathematics from Xi’an University of Technology, Xi’an, Shaanxi, China, in 2013. Currently he is studying for an M.S. degree in mathematics, Xi’an University of Technology, Xi’an, Shaanxi, China. His research interests include images processing and super-resolution image reconstruction. Nengliang Zeng received his B.S. degree in mathematics from Xi’an University of Technology, Xi’an, Shaanxi, China, in 2012, and received his M.S. degree in mathematics from Xi’an University of Technology, Xi’an, Shaanxi, China. His research interests include, images processing, super-resolution image reconstruction. Xina Yu received her B.S. degree in mathematics from Xi’an University of Technology, Xi’an, Shaanxi, China, in 2014. Currently she is studying for an M.S. degree in mathematics, Xi’an University of Technology, Xi’an, Shaanxi, China. Her research interests include images processing and super- resolution image reconstruction. Hu Shaolin received his B.S degree in mathematics from Anhui Normal University, and his M.S degree in statistics and her Ph D degree in system engineering from the Xi’an Jiaotong University, China. In 2002, he joined the School of Information, University of Science and Technology of China, where he was a post-doctoral researcher until 2004. In 2006, he joined the Royal Swedish Institute of Technology, where he was a visiting professor until 2007. He is currently a distinguished professor with the School of Automation, Foshan University, and supervisor of doctorate candidate at the School of Automation and Information Engineering, Xi’an University of Technology, China. His current research interests include system safety, data fusion, fault diagnosis and big data analysis.