Computers and Electrical Engineering xxx (2015) xxx–xxx
A performance comparison among different super-resolution techniques
Damber Thapa a, Kaamran Raahemifar b,⇑, William R. Bobier a, Vasudevan Lakshminarayanan a,c,d
a School of Optometry and Vision Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada
b Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON M5B 2K3, Canada
c Department of Physics and Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
d Department of Physics, University of Michigan, Ann Arbor, MI, USA
Article info
Article history: Received 1 March 2014; received in revised form 13 September 2015; accepted 15 September 2015; available online xxxx
Keywords: Super-resolution; Reconstruction; Learning-based; Example-based; Sparse representation; Interpolation
Abstract
Improving image resolution by refining hardware is usually expensive and/or time consuming. A critical challenge is to optimally balance the trade-off among image resolution, Signal-to-Noise Ratio (SNR), and acquisition time. Super-resolution (SR), an off-line approach for improving image resolution, is free from these trade-offs. Numerous methodologies such as interpolation, frequency domain, regularization, and learning-based approaches have been developed for SR of natural images. In this paper we provide a survey of the existing SR techniques. Various approaches for obtaining a high resolution image from a single and/or multiple low resolution images are discussed. We also compare the performance of various SR methods in terms of Peak SNR (PSNR) and Structural Similarity (SSIM) index between the super-resolved image and the ground truth image. For each method, the computational time is also reported. © 2015 Elsevier Ltd. All rights reserved.
1. Introduction

Computerized image resolution enhancement began in 1984, when Tsai and Huang [1] introduced a mathematical method for combining multiple low resolution (LR) images to obtain a single high resolution (HR) image. Although there was initially little interest in this technology, theoretical and practical improvements over time led to the development of many of the tools currently available, and the technique has been used in fields such as security surveillance,
Abbreviations: LR, low resolution; HR, high resolution; SNR, Signal-to-Noise Ratio; SR, super-resolution; PSNR, Peak-Signal-to-Noise Ratio; SSIM, Structural Similarity; MSE, Mean Square Error; EGI, Edge-Guided Interpolation; NEDI, New Edge-Directed Interpolation; GBA, Gradient-Based Adaptive; ASDS, Adaptive Sparse Domain Selection; DFT, Discrete Fourier Transform; DCT, Discrete Cosine Transform; DWT, Discrete Wavelet Transform; IBP, Iterative Back Projection; POCS, Projection Onto Convex Set; MAP, Maximum A-Posteriori; MRF, Markov Random Field; PDF, Probability Density Function; SAR, Simultaneous Autoregressive; CRF, Conditional Random Field; MLE, Maximum Likelihood Estimation; PCA, Principal Component Analysis; CSR, Centralized Sparse Representation.
Reviews processed and recommended for publication to the Editor-in-Chief by Associate Editor Dr. F. Sahin.
⇑ Corresponding author at: Department of Electrical and Computer Engineering, Ryerson University, 350 Victoria St, Toronto, ON M5B 2K3, Canada. Tel.: +1 416 979 5000x6097; fax: +1 (416) 979 5280. E-mail addresses:
[email protected] (D. Thapa),
[email protected] (K. Raahemifar),
[email protected] (W.R. Bobier),
[email protected] (V. Lakshminarayanan).
http://dx.doi.org/10.1016/j.compeleceng.2015.09.011
0045-7906/© 2015 Elsevier Ltd. All rights reserved.
Please cite this article in press as: Thapa D et al. A performance comparison among different super-resolution techniques. Comput Electr Eng (2015), http://dx.doi.org/10.1016/j.compeleceng.2015.09.011
biomedical applications, remote sensing, object recognition (such as face, fingerprint, iris, vehicle number plate and text) and video conversion [2,3]. Resolution enhancement is one of the most rapidly growing areas of research in the field of image processing.

The term resolution refers to the ability of an imaging instrument to reveal the fine details of an object. The resolution of an imaging device depends on the quality of its optics as well as its recording (sensor) and display components. The spatial resolution of an imaging instrument can be improved by modifying the hardware (sensor) in two ways. The first approach is to increase the number of pixels. However, this approach has rather limited applications since it decreases the Signal-to-Noise Ratio (SNR) and increases the image acquisition time, which makes it challenging to balance the trade-off among resolution, SNR, and acquisition time [4]. The second approach is to increase the chip size; however, a chip large enough to capture a HR image would be very expensive [5]. An interesting alternative to both of these approaches is to use super-resolution (SR) techniques.

SR is an off-line approach for improving the resolution of an image. SR techniques are broadly divided into multi-frame SR (the classic approach) and single-frame SR. In multi-frame SR techniques, a set of LR images acquired from the same scene is combined to reconstruct a single HR image. The LR images can be taken by the same imaging instrument or by different instruments. The goal is to find the information missing from one LR image in the other LR images. By doing so, the information contained in all LR images is pooled to obtain a HR image [5]. Several multi-frame SR techniques have been investigated in medical imaging [4].
In single-frame SR techniques, the high frequency information lost from the LR image during the acquisition step is estimated from a large set of training images and added to the LR image [2].

In this paper, we present a survey of major SR techniques. In addition, the MATLAB codes written and published by different groups of researchers were downloaded from their websites, and the performance of various SR techniques was compared. The comparisons are made in terms of common image quality metrics such as peak SNR (PSNR) and Structural Similarity (SSIM), which are discussed in detail in Section 5. We also report the execution time of the codes for each method.

A number of review papers have also been published in this field [3,5–8]. While some of these papers provide a good overview of SR techniques, only [8] provides a comprehensive performance comparison in terms of objective image quality metrics, and it is limited to single-frame SR techniques. This paper differs from the previous review papers in that it provides performance comparisons of both single-frame and multi-frame SR techniques.

The rest of the paper is organized as follows. Section 2 explains the observation model that relates the HR image to the observed LR images. Several multi-frame SR techniques are described in Section 3. The single-frame SR techniques are described in Section 4. The image quality metrics are discussed in Section 5. Section 6 provides comprehensive performance comparisons of various SR techniques on natural images. Section 7 presents a detailed discussion of the pros and cons of each technique and concludes the paper.

2. Observation model

The observation model describes how the observed LR images are obtained. It models the parameters that degrade the original HR image into the observed LR images; it is therefore also called the forward model.
A number of factors contribute to the reduced image quality. These include: (a) the blur created either by defocus or by motion of the camera; (b) the aliasing artifacts produced by sampling an object at a frequency lower than the highest frequency it contains; and (c) noise, since all natural images contain some level of noise. These image degradation factors (i.e., blur, aliasing, and noise) can be incorporated into a mathematical model that relates the HR image to the observed LR image [5]. The schematic diagram of the observation model is depicted in Fig. 1. Mathematically, let X be an original image degraded by motion blur (M), camera blur (B), and decimation (D). Suppose the image contains white Gaussian noise ($\eta$). The forward observation model that relates the HR image to the observed LR image is then [9]:

$$ y_k = D B_k M_k X + \eta_k, \qquad Y = HX + \eta \tag{1} $$

where $k = 1, \ldots, N$ indexes the $N$ LR images. Slightly different blur and motion parameters are used to create the different LR images. An example of creating simulated LR images from a HR natural image is described in Section 6. Once the
Fig. 1. The low resolution image is the blurred, warped, decimated and noisy version of the high resolution image created by the observation model.
model is known, an inverse process can be used to recover a HR image from a series of LR images. Mathematically speaking, this is an inverse problem that needs prior information about the HR image to find a reliable solution.

3. Super-resolution algorithms

As discussed earlier, a HR image is reconstructed either from a single LR image or from a sequence of LR images. There are a number of different approaches for reconstructing a single HR image from LR image(s). This paper includes only the most common reconstruction approaches.

3.1. Interpolation-based approaches

Interpolation is the process of estimating new pixels from an image’s given set of pixels. It is one of the simplest ways of improving the resolution of an image. Interpolation methods have proven useful in many practical cases. Most commercial software such as Photoshop, Qimage, PhotoZoom Pro, and Genuine Fractals uses interpolation methods to resize an image.

The interpolation-based SR methods involve the following three intermediate steps: registration, interpolation, and restoration. Image registration is the process of geometrically aligning a set of LR images of the same scene with reference to one particular LR image called the reference image. LR images have different sub-pixel displacements and rotations from each other; therefore, it is very important to estimate the motion parameters accurately before fusing the images to create a HR image. Inaccurate estimation of motion parameters results in various types of visual artifacts that degrade the quality of the reconstructed image. The registration is performed in either the frequency domain or the spatial domain. The frequency domain approaches for estimating motion parameters are described in more detail in Section 3.2. There are various techniques to estimate motion in the spatial domain as well. Keren et al. [10] proposed an algorithm based on Taylor expansion which estimates the motion parameters with sub-pixel accuracy.
Bergen and colleagues [11] proposed a hierarchical framework for the estimation of motion models such as planar and affine models. Irani and Peleg [12] developed an iterative multi-resolution approach for estimating motion parameters. To estimate motion parameters, some algorithms map the whole image while others map only the features that are common among the LR images [13]. The HR image and motion parameters can be simultaneously estimated using Bayesian methods. Hardie et al. [14] explain one such approach. The Bayesian approaches are described in more detail in Section 3.3. Recently a gradient-based motion estimation method has been presented by Botella et al. [15].

Besides registration, interpolation also plays an important role in estimating a HR image. There are many different interpolation methods, and the complexity of each method depends upon the number of adjacent pixels used to estimate the intermediate pixels. The most commonly used interpolation methods are the nearest neighbor, bilinear and bicubic methods [16]. Nearest neighbor is the most basic interpolation method; it simply selects the closest pixel surrounding the interpolated point. The disadvantage of nearest neighbor is the stair-step shaped linear features visible in the HR image. Bilinear interpolation takes a weighted average of the closest 2 × 2 neighborhood of pixels to estimate the value of the unknown interpolated pixel. Similarly, bicubic interpolation uses the closest 4 × 4 neighborhood of pixels to estimate the value of the unknown interpolated pixel. In both of the latter methods the closer pixels are given the higher weights [16]. Since the shifts among the LR images are unequal, non-uniform interpolation methods are required to fuse all LR frames into one HR frame. In 1992 Ur and Gross [17] developed a non-uniform interpolation method for a set of spatially translated LR images using the generalized multi-channel sampling theorem.
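The nearest neighbor and bilinear rules described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited implementations: the function names (`nearest_neighbor`, `bilinear`, `upscale`), the list-of-lists image representation, the clamping at the image border, and the integer scale factor are all assumptions made for illustration.

```python
def nearest_neighbor(img, y, x):
    """Return the value of the pixel closest to the fractional position (y, x)."""
    h, w = len(img), len(img[0])
    yi = min(max(int(y + 0.5), 0), h - 1)   # round to nearest row, clamp to image
    xi = min(max(int(x + 0.5), 0), w - 1)   # round to nearest column, clamp to image
    return img[yi][xi]


def bilinear(img, y, x):
    """Weighted average of the 2 x 2 neighborhood around (y, x).

    Closer pixels receive larger weights; positions are clamped to the image.
    """
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * img[y0][x0] + dx * img[y0][x1]
    bottom = (1 - dx) * img[y1][x0] + dx * img[y1][x1]
    return (1 - dy) * top + dy * bottom


def upscale(img, factor, sampler):
    """Resize img by an integer factor, sampling the LR grid with `sampler`."""
    h, w = len(img), len(img[0])
    return [[sampler(img, r / factor, c / factor) for c in range(w * factor)]
            for r in range(h * factor)]


# Toy 2 x 2 LR image (hypothetical values).
lr = [[0.0, 10.0],
      [20.0, 30.0]]
hr_nn = upscale(lr, 2, nearest_neighbor)   # blocky, stair-step result
hr_bl = upscale(lr, 2, bilinear)           # smoother result
```

Bicubic interpolation follows the same pattern with a 4 × 4 neighborhood and cubic weights; it is omitted here to keep the sketch short.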
There are many other, more complex interpolation approaches that are used in resizing a single image, such as cubic B-spline [18], New Edge-Directed Interpolation (NEDI) [19], and Edge-Guided Interpolation (EGI) [20]. In short, the cubic spline fits a piecewise continuous curve passing through a number of points. This spline consists of weights, which are the coefficients of the cubic polynomials. The essential task of cubic spline interpolation is to calculate the weights used to interpolate the data. NEDI [19] is a covariance-based adaptive directional interpolation method in which the interpolated pixels are estimated from the local covariance coefficients of the LR image based on the geometric duality between the LR covariance and the HR covariance. EGI [20] divides the neighborhood of each pixel into two observation subsets in two orthogonal directions. Each observation subset approximates a missing pixel. The algorithm fuses these two approximate values into a more robust estimate by using linear minimum mean square error estimation. Other interpolation methods include Gradient-Based Adaptive (GBA) interpolation [21] and interpolation by an autoregressive model [22]. These complex interpolation methods are very effective and preserve most of the image information; however, their processing time and computational cost are higher than those of the general interpolation methods.

The registration, interpolation, and restoration steps in the SR method can be conducted iteratively to obtain a HR image from a sequence of LR images through the Iterative Back Projection (IBP) approach [12]. In this method the HR image is estimated by iteratively minimizing the error between the simulated and observed LR images. This approach is very simple and easy to understand; however, it does not provide a unique solution due to the ill-posed nature of the inverse problem.
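The IBP idea can be sketched on a 1-D signal. The following Python example is a simplified illustration under stated assumptions, not the algorithm of Irani and Peleg [12]: the shifts are known circular integer shifts on the HR grid, the blur is a box average over `factor` samples, the back-projection simply replicates each LR residual over the HR samples it came from, and all function names are hypothetical.

```python
def simulate_lr(hr, shift, factor):
    """Forward model: circular shift, box blur over `factor` samples, decimation."""
    n = len(hr)
    shifted = [hr[(i + shift) % n] for i in range(n)]
    return [sum(shifted[i:i + factor]) / factor for i in range(0, n, factor)]


def back_project(residual, shift, factor, n):
    """Spread each LR residual over its HR samples, then undo the shift."""
    upsampled = [residual[i // factor] for i in range(n)]
    return [upsampled[(i - shift) % n] for i in range(n)]


def iterative_back_projection(lr_frames, shifts, factor, iterations=200):
    """Refine an HR estimate until its simulated LR frames match the observed ones."""
    n = len(lr_frames[0]) * factor
    # Initial estimate: the first LR frame replicated onto the HR grid.
    hr = back_project(lr_frames[0], shifts[0], factor, n)
    for _ in range(iterations):
        correction = [0.0] * n
        for frame, shift in zip(lr_frames, shifts):
            simulated = simulate_lr(hr, shift, factor)
            residual = [obs - sim for obs, sim in zip(frame, simulated)]
            projected = back_project(residual, shift, factor, n)
            for i, value in enumerate(projected):
                correction[i] += value / len(lr_frames)
        hr = [h + c for h, c in zip(hr, correction)]
    return hr


# Toy data (hypothetical): two LR frames simulated from a known HR signal.
truth = [0.0, 1.0, 4.0, 9.0, 16.0, 9.0, 4.0, 1.0]
shifts = [0, 1]
frames = [simulate_lr(truth, s, 2) for s in shifts]
estimate = iterative_back_projection(frames, shifts, 2)

# Largest mismatch between observed and re-simulated LR frames.
worst_residual = max(
    abs(obs - sim)
    for frame, s in zip(frames, shifts)
    for obs, sim in zip(frame, simulate_lr(estimate, s, 2))
)
```

In this toy setting the estimate reproduces both LR frames closely, yet it generally differs from `truth`, because the box blur removes a component of the signal that no amount of iteration can restore. This is a small-scale view of the non-uniqueness noted above.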
Another easily implementable SR approach is the Projection Onto Convex Set (POCS) approach devised by Stark and Oskoui [23]. In this method, a set of constraints is defined to restrict the space of the HR image. The constraint sets are convex and represent certain desirable SR image characteristics such as smoothness, positivity, bounded energy, and reliability. The intersection of these sets represents the space of permissible solutions. Thus, the problem is reduced to finding the intersection of the constraint sets. To find the solution, a projection operator is determined for each convex constraint set. The projection
operator projects an initial estimate of the HR image onto the associated constraint set. By iteratively performing this projection, a good solution can be obtained at the surface of intersection of the k convex constraint sets. The algorithm originally did not incorporate the observation noise [23]; therefore, it was subsequently extended by many other researchers. Tekalp et al. [24] extended the method to incorporate noise. Patti et al. [25] further expanded the algorithm by including space varying blur, non-zero aperture time, non-zero physical dimension of each individual sensor element, sensor noise, and arbitrary sampling lattices.

3.2. Frequency domain approaches

Another popular approach to increasing the resolution of an image is the frequency domain approach [1,26,27]. In fact, the first SR technique, developed by Tsai and Huang [1] for LR satellite images, was based on the frequency domain. Many researchers have subsequently expanded this approach to formulate different SR methods. In the frequency domain method, the LR images are first transformed into the Discrete Fourier Transform (DFT) domain and the HR image is estimated in this domain. The estimated HR image is then transformed back to the spatial domain. Tsai and Huang [1] assume that the satellite images are similar but globally translated and can be treated as undersampled images of a static and unknown scene. The shift and aliasing parameters are used to devise a set of equations which relate the DFT of the LR images to the continuous Fourier transform (CFT) of the unknown HR image ($F_y = \Phi F_x$, where $F_y$ is the DFT of the LR image $y$ and $F_x$ is the CFT of the unknown HR image $x$) [7]. The system matrix $\Phi$ is constructed from the motion information between LR images. Thus, the SR problem is reduced to finding the CFT of the HR image from the DFTs of multiple LR images and the system matrix. This simplified SR problem is usually solved by using a Least Squares method [6].
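The frequency domain formulation relies on the Fourier shift property: a global translation changes only the phase of the DFT coefficients, not their magnitude. The Python sketch below illustrates this property by recovering a shift with phase correlation. It is a toy example under stated assumptions (a 1-D signal, a circular integer shift, no aliasing, blur, or noise) and is not the Tsai and Huang method; the function names and the direct $O(n^2)$ DFT are choices made only to keep the example self-contained.

```python
import cmath


def dft(signal):
    """Direct O(n^2) discrete Fourier transform of a short sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of `dft`."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def phase_correlation_shift(ref, moved):
    """Estimate the circular shift d such that moved[t] == ref[(t - d) % n]."""
    cross = []
    for a, b in zip(dft(ref), dft(moved)):
        c = b * a.conjugate()            # cross-power spectrum
        mag = abs(c)
        # Keep only the phase; skip bins with (numerically) no energy.
        cross.append(c / mag if mag > 1e-12 else 0j)
    correlation = [v.real for v in idft(cross)]
    # The correlation peaks at the index equal to the shift.
    return max(range(len(correlation)), key=lambda i: correlation[i])


# Toy data (hypothetical): a reference signal and a copy shifted by 3 samples.
ref = [0.0, 1.0, 3.0, 7.0, 2.0, 0.0, 0.0, 1.0]
moved = [ref[(t - 3) % len(ref)] for t in range(len(ref))]
estimated_shift = phase_correlation_shift(ref, moved)
```

Practical SR registration needs sub-pixel shifts and must cope with aliasing, so this sketch only conveys the underlying principle.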
The blur and noise introduced during image acquisition were ignored in the study by Tsai and Huang. Later, Kim et al. [28] extended their work by considering additive noise and blurring. Correlation methods are often used to find motion parameters in the frequency domain. The motion parameters are estimated based on the fact that spatially shifted images differ only by a phase shift in the frequency domain [26,27]. The phase shift between the two images can be obtained from their correlation. Using the phase correlation method, both the image rotation and the scale can be converted into horizontal and vertical shifts. To minimize errors due to aliasing, only the parts of the discrete Fourier coefficients that are free of aliasing are used [26]. After estimating the registration parameters, the LR images are combined according to the relationship between the aliased discrete Fourier transform coefficients of the observed LR images and those of the unknown HR image. The fused data are then transformed back to the spatial domain to reconstruct a HR image. The advantage of the frequency domain method is that it is easy to apply and more suitable for removing aliasing than spatial domain methods. The disadvantage of the frequency domain method is that it is limited to global motion, and therefore it works only for planar shifts and planar rotations [26]. More recently, the Fourier transform has been replaced by the Discrete Cosine Transform (DCT) [29] and the Discrete Wavelet Transform (DWT) [30]. Rhee and Kang [29] modified the Fourier transform based approach to perform regularized deconvolution using the DCT. This method works even for ill-posed cases or cases with insufficient sub-pixel information. The DCT uses only real coefficients; therefore, it is computationally less expensive than Fourier-based processing. Recently, several researchers have investigated the use of the wavelet transform to address the SR problem [30–37].
Nguyen and Milanfar [30] used wavelet interpolation followed by a restoration method for SR. They first calculated the wavelet coefficients of the LR images and then interpolated them to obtain blurred values at the HR grid points. By deconvolving the interpolated values with the known blur, an estimate of the HR image can be obtained. El-Khamy and colleagues [31] performed the registration of multiple LR images in the wavelet domain. Wavelet coefficients were fused and denoised after registration using a regularization method. Interpolation methods were used to get HR wavelet coefficients, and finally an inverse wavelet transform was performed to get the HR image in the spatial domain. Chappalli and Bose [32] further implemented soft thresholding techniques to remove the noise associated with the wavelet coefficients. Ji and Fermuller [33,34] used a multi-resolution scheme to decompose the wavelet coefficients into two channels; those coefficients were then upsampled, filtered, and fused to get the simulated image. The super-resolved image was obtained using an iterative back-projection method with efficient regularization criteria at each iteration to remove the noise. Li [35] proposed image resolution enhancement by extrapolating high-band wavelet coefficients. Recently, researchers have started to use the contourlet transform to address the SR problem [2,38]. The advantage of the contourlet transform is that, unlike the wavelet transform, which captures only the horizontal and vertical edges in an image, it can capture edges oriented along any arbitrary direction [2].

3.3. Regularization-based approaches

As already discussed in Section 2, SR is an underdetermined problem with many possible solutions. Another interesting approach for solving this ill-posed problem is to use a regularization term [39]. The regularization approach incorporates prior knowledge of the unknown HR image to solve the SR problem.
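To make the role of a regularization term concrete, the Python sketch below solves a small 1-D instance of a regularized least-squares problem of the form used in Eq. (2), minimizing $\|y - Hx\|^2 + \lambda \|Rx\|^2$ through its normal equations $(H^T H + \lambda R^T R)\,x = H^T y$. Everything specific here is an assumption chosen for illustration: a single LR observation, a pair-averaging matrix `H`, a first-difference (high pass) matrix `R`, the two values of `lam`, and the helper names.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                         # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def regularized_ls(H, R, y, lam):
    """Minimize ||y - H x||^2 + lam * ||R x||^2 via the normal equations."""
    Ht, Rt = transpose(H), transpose(R)
    HtH, RtR = matmul(Ht, H), matmul(Rt, R)
    n = len(HtH)
    A = [[HtH[i][j] + lam * RtR[i][j] for j in range(n)] for i in range(n)]
    rhs = [sum(Ht[i][k] * y[k] for k in range(len(y))) for i in range(n)]
    return solve(A, rhs)


# Underdetermined toy problem: 2 LR samples, 4 HR unknowns.
H = [[0.5, 0.5, 0.0, 0.0],      # each LR sample averages two HR samples
     [0.0, 0.0, 0.5, 0.5]]
R = [[-1.0, 1.0, 0.0, 0.0],     # first differences act as a high pass filter
     [0.0, -1.0, 1.0, 0.0],
     [0.0, 0.0, -1.0, 1.0]]
y = [1.0, 3.0]

x_small = regularized_ls(H, R, y, 0.01)    # follows the data closely
x_large = regularized_ls(H, R, y, 100.0)   # strongly smoothed
```

Without the regularization term, `H` alone leaves the four unknowns underdetermined. With it, the system has a unique solution, and increasing `lam` trades data fidelity for smoothness.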
Deterministic and stochastic approaches are two different ways to implement regularization. The deterministic approach introduces a regularization term that converts the ill-posed problem into a well-posed one [4]. The HR image X is estimated by minimizing the following cost function,

$$ X = \arg\min_{X} \left[ \sum_{k=1}^{N} \| y_k - H_k X \|^2 + \lambda \| R X \|^2 \right] \tag{2} $$

where R is the regularization operator and $\lambda$ is the regularization constant. The constrained least square regularization method uses smoothness constraints as a priori information. In this case R is a high pass filter, so that minimizing $\| R X \|^2$ limits the amount of high frequency content in the reconstructed image. The regularization parameter $\lambda$ controls the high frequency information. Larger values of $\lambda$ may over-smooth the reconstructed image, which is an appropriate choice if only a small number of LR images are available and/or there is a lot of noise. Smaller values of $\lambda$ may result in a noisy solution, which is acceptable when a large number of LR images are available and the amount of noise is small [6]. The regularized Tikhonov least-square estimator uses the $\ell_2$-norm of the second order derivative of the HR reconstruction as a regularization term [4]. The $\ell_2$-norm does not guarantee a unique solution. Farsiu et al. [40] exploited an alternative $\ell_1$-norm minimization for fast and robust SR. Zomet and colleagues [41] described a robust SR method that accounts for outliers. Kim and Bose [28] proposed a weighted recursive least-square-based algorithm for SR. The weight depends on the prior knowledge of the image; the algorithm assigns higher weights to the LR images with higher SNR. With different weights, the problem simply reduces to the general least square estimate. Finally, interpolation and restoration are used to obtain the HR image. Recently, Mallat and Yu [36] proposed a regularization-based SR method which uses adaptive estimators obtained by mixing a family of linear inverse estimators.

The stochastic approach [42–57], especially the Maximum A-Posteriori (MAP) approach, is popular because it provides a flexible and convenient way to include a priori information and builds a strong relationship between the LR images and the unknown HR image. The method finds the MAP estimate of the HR image, $\hat{X}_{MAP}$, for which the a posteriori probability $P(X|Y)$ is a maximum [5].
$$ \hat{X}_{MAP} = \arg\max_{X} P(X|Y) \tag{3} $$

Using Bayes theorem, the above equation can be written as [5]:

$$ \hat{X}_{MAP} = \arg\max_{X} \left[ \log P(Y|X) + \log P(X) \right] \tag{4} $$
where P(Y|X) is the likelihood function and P(X) is the prior. A Markov Random Field (MRF) is commonly used as the prior model, and the Probability Density Function (PDF) of the noise is calculated to determine the likelihood function. The HR image is computed by solving the optimization problem defined in Eq. (4). Several models, such as the TV norm [42], the $\ell_1$ norm [43] of horizontal and vertical gradients, the Simultaneous Autoregressive (SAR) norm [44], the Gaussian MRF model [14,45], the Huber MRF model [46], the discontinuity adaptive MRF model [47], the two-level Gaussian non-stationary model [48], and the Conditional Random Field (CRF) model [49], are used for the prior image model. A special case of MAP in which no prior information about the HR image is given is called Maximum Likelihood Estimation (MLE). Tom and Katsaggelos [50] examined the application of MLE to the SR of an image; however, since the solution of the underdetermined system needs a priori information, MLE remains an uncommon method.

In multi-frame SR, multiple images are fused to obtain a single HR image. Pixel averaging methods are used while fusing the LR frames. These methods blur the image; hence, image restoration methods are also needed to remove the blur [5,6]. Estimation of the blur kernel plays an important role in predicting a HR image; however, many SR approaches assume a known blur kernel for simplicity. A known blur kernel can help estimate a HR image from a set of simulated LR images; for real LR images, however, the motion blur and point spread functions may lead to an unknown blur kernel [58]. Many algorithms have been proposed in the Bayesian framework to estimate the blur kernel. Recently, Liu and Sun [58] proposed a Bayesian approach for simultaneously predicting the motion blur, blur kernel, noise level and HR image. Blind deconvolution algorithms are used when the blur kernel and the noise level are unknown.
The blind deconvolution methods recover the blurring function from the degraded LR images and estimate the HR image without any prior knowledge of the blur or the original image [59–65]. Sroubek et al. [59] proposed a multichannel blind deconvolution model for SR, in which multiple LR images are combined to get a single HR image by minimizing the regularized energy function E(X, h):

$$ E(X, h) = \sum_{k=1}^{K} \| y_k - D H_k X \|^2 + \alpha Q(X) + \beta R(h) \tag{5} $$
The first term is the fidelity term, and the remaining two are regularization terms. The regularization Q(X) is a smoothing term, while R(h) is the PSF regularization term. The regularization is carried out in both the image and blur domains. Bai and colleagues [62] proposed a wavelet approach to blind deconvolution that adaptively selects the parameter of the regularization term. In [63–65], combined blind deconvolution and MAP estimator methods have been used for estimating the blur and the HR image, respectively.

4. Single-frame super-resolution

Although a number of multi-frame SR algorithms have been developed to enhance the resolution of an image, they depend heavily on the estimation accuracy of the registration parameters [3]. The registration methods are restricted mostly to global motion; however, different components in the same scene may have different or complex motion in the real world
applications. In such cases, multi-frame SR methods do not give good results; sometimes the LR images are even better than the super-resolved image. Furthermore, the high frequency information lost during acquisition cannot be recovered using multi-frame SR approaches. An alternative approach is to use single image based SR algorithms [2]. Most of the single image based SR algorithms use learning mechanisms and are therefore called learning-based SR algorithms. The learning mechanism extracts the high frequency information lost during the image acquisition process from external sources (a training set) and integrates this information with the input LR image to obtain a super-resolved image [2]. The training set includes a large number of HR images and their simulated LR versions. The performance of the learning-based SR methods depends heavily on the training set data; therefore, the training set images are chosen so that they contain high frequency information and are similar to the input LR image [2]. Fig. 2 shows the flow chart of the learning-based SR algorithms. The learning-based SR methods include the following three stages: feature extraction, learning, and reconstruction.

4.1. Feature extraction stage

In this stage the features of the test image and the training set images are extracted separately. First, the images are divided into small patches. The training set patches are convolved with filters to create their LR versions. The HR patches of the training set and their simulated LR versions are stored as pairs. The features of the training set patches are then extracted. Similarly, the features of the test images are also extracted. A number of feature extraction models have been developed by several groups of researchers. For example, in [66–70] a bandpass filter is used to extract the features, while in [71] low and high frequency components of images are used.
Similarly, the extraction of Gaussian derivatives [72], gradient derivatives [73], the Laplacian pyramid [74], and the steerable pyramid [75] has also been proposed. Many researchers used luminance values as the key features in their work [76–83]. The coefficients of the DCT [84], wavelet transform [37,85], contourlet transform [2,38], and Principal Component Analysis (PCA) [86] have also been used as the features in learning-based SR.

4.2. Learning stage

The features extracted from the input LR patches and the training set patches are matched using learning models. As with feature extraction models, a number of learning models have been proposed in recent years. The most common learning models are Best Matching [37,38,71,73,75,80], MRF [2,66,72,85], Neighbor Embedding [82,87–89], and Sparse Representation [68–70,76,90–96] models. Other methods such as Content-based Classification and Class-specific Predictors [74], Support Vector Regression [84], Locally Linear Embedding Construction [67], PCA Construction [78], Canonical Correlation Analysis [86], Position-patch Construction [79], and Learning-based Interpolation Method and Deconvolution [77] have also been used. Glasner et al. [83] introduced a unified method for combining both reconstruction-based and learning-based SR. The learning method matches the features of the LR patches of the test image with the features of the LR patches of the training set images. Since the HR and LR patches in the training set are stored in pairs, the features of the HR patch corresponding to the selected LR patch are chosen for SR. Fig. 3 summarizes the various feature extraction and learning models available in the literature.
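The matching step of the learning stage can be sketched with a toy best-matching lookup in Python. This is a minimal illustration under stated assumptions and does not reproduce any of the cited methods: patches are short 1-D lists, the feature is simply the mean-removed patch (a crude stand-in for the filters listed in Section 4.1), the distance is the squared Euclidean distance, and the names `features`, `best_match`, and `training_pairs` are hypothetical.

```python
def features(patch):
    """Toy feature: the patch with its mean removed (a crude high pass filter)."""
    mean = sum(patch) / len(patch)
    return [p - mean for p in patch]


def best_match(lr_patch, training_pairs):
    """Return the HR patch paired with the training LR patch whose features are closest."""
    target = features(lr_patch)

    def distance(pair):
        lr_train, _ = pair
        return sum((a - b) ** 2 for a, b in zip(features(lr_train), target))

    _, hr_patch = min(training_pairs, key=distance)
    return hr_patch


# Training set of (LR patch, HR patch) pairs; the values are made up.
training_pairs = [
    ([1.0, 1.0], [1.0, 1.0, 1.0, 1.0]),   # flat region
    ([0.0, 2.0], [0.0, 0.0, 2.0, 2.0]),   # rising edge
    ([2.0, 0.0], [2.0, 2.0, 0.0, 0.0]),   # falling edge
]

# An input LR patch with a rising edge and a different brightness level.
hr_detail = best_match([5.0, 7.1], training_pairs)
```

Because the mean is removed before matching, the input patch is matched to the rising-edge example even though its brightness differs from every training patch. The selected HR patch would then feed the reconstruction stage.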
Fig. 2. Flowchart of learning-based super-resolution.
Please cite this article in press as: Thapa D et al. A performance comparison among different super-resolution techniques. Comput Electr Eng (2015), http://dx.doi.org/10.1016/j.compeleceng.2015.09.011
D. Thapa et al. / Computers and Electrical Engineering xxx (2015) xxx–xxx
Fig. 3. Feature extraction and learning models available in the literature.
4.3. Reconstruction stage
The feature extraction and learning models estimate the HR features for each input LR patch. These features are integrated into the input LR patch to obtain a super-resolved patch. Finally, all super-resolved patches are combined to generate the HR image [2]. The overall flow of learning-based SR methods is shown in Fig. 2.
There are many learning-based SR approaches; however, the example-based (EB) SR method proposed by Kim and Kwon [97,98] has outperformed several state-of-the-art algorithms in single image SR. This method is based on the framework of Freeman et al. [66], which collects pairs of LR and HR image patches in the training stage. In the learning stage, each LR patch of the input image is compared to the stored training set LR patches, and a nearest neighbor search selects the nearest LR patch and its corresponding HR patch. However, the approach of Freeman et al. [66] often results in a blurred image owing to the limitations of nearest neighbor search. Kim and Kwon [97] modified this approach by replacing the nearest neighbor search with sparse kernel ridge regression. In their approach, kernel ridge regression is used to learn a map from LR patches to HR patches from the training set's LR and HR patch pairs. This method, however, also produces some blurring and ringing effects near edges, which can be removed using post-processing techniques [97].
Sparse representation algorithms have been studied extensively. Sparse representation is the approximation of an image/signal by a linear combination of only a small set of elementary signals called atoms. The atoms are chosen either from a predefined set of functions (analytical-based dictionary), such as the Discrete Cosine Transform and wavelets, or learned from a training set (learning-based dictionary). The main advantage of these algorithms is that the signal representation coefficients are sparse, i.e., they have many zero coefficients and only a few nonzero coefficients. To be precise, consider a finite dimensional discrete time signal $x \in \mathbb{R}^{N}$ and an over-complete dictionary $D \in \mathbb{R}^{N \times K}$, $N < K$. The aim is to represent the signal $x$ using the dictionary $D$ such that the representation error $\|D\alpha - x\|_2$, where $\alpha$ is the sparse representation vector, is minimized. The sparse representation of a signal is obtained by solving the following optimization problem [69]:
$$\arg\min_{\alpha} \|\alpha\|_0 \quad \text{subject to} \quad x = D\alpha \tag{6}$$
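Problem (6) is combinatorial, so in practice it is usually approximated. One common greedy approximation is orthogonal matching pursuit (OMP), sketched below in Python/NumPy. This is an illustrative sketch only and is not claimed to be the solver used in any of the cited works; the function name `omp`, its parameters, and the assumption that the columns of the dictionary have unit norm are ours.

```python
import numpy as np

def omp(D, x, max_atoms, tol=1e-10):
    """Greedy approximation to: min ||alpha||_0 subject to x = D alpha.

    D: (N, K) dictionary whose columns are assumed to have unit norm.
    x: (N,) signal.
    Returns a (K,) coefficient vector with at most `max_atoms` nonzeros.
    """
    x = np.asarray(x, dtype=float)
    alpha = np.zeros(D.shape[1])
    support = []
    coef = np.zeros(0)
    residual = x.copy()
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol:
            break
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Refit all selected atoms by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    if support:
        alpha[support] = coef
    return alpha
```

As a usage example, with the dictionary `D = np.hstack([np.eye(3), np.ones((3, 1)) / np.sqrt(3)])` and the signal `x = [2, 0, 3]`, calling `omp(D, x, 2)` selects the third and first atoms and reproduces `x` exactly with two nonzero coefficients.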
Sparse representation has become a major field of research in signal processing, and several researchers have used it to build learning-based SR algorithms [68–70,76,90–96]. Sparse representation based SR computes the sparse approximation of the input LR patch and uses the coefficients of this approximation to estimate an HR patch. In this method, two dictionaries $D_h$ and $D_l$ are jointly trained from HR and LR patches. The sparse codes of the LR patch ($\kappa = D_l\beta$) and of the HR patch ($\mu = D_h\alpha$) must be constrained to be similar. The HR dictionary $D_h$ is then applied to the sparse representation of the LR patch ($D_h\beta$) to recover the super-resolved patch. Zeyde et al. [70] applied the K-SVD dictionary learning algorithm to learn the HR and LR dictionary pair, which improved the performance of this approach. Because the final super-resolved patch is generated from the sparse coefficients of the LR patch and the HR dictionary, the performance of the method depends on both [99].
Many researchers have proposed new algorithms to better estimate the HR dictionary and the sparse coefficients of the LR image. Zhang and colleagues [90] proposed a dual-dictionary learning method consisting of main dictionary learning and residual dictionary learning, which recover the main and the residual high frequency information, respectively. Additional details can thus be added to the LR image through this double-channel learning process. Yang et al. [93] reduced the execution time of sparse representation based SR by learning a neural network model for fast sparse inference and then selectively processing only the visually salient features. Kanakaraj and Kathiravan [94] improved the dictionary learning method by using both analytical-based and learning-based models. Dong et al. [95] proposed a cluster-based sparse representation model called Adaptive Sparse Domain Selection (ASDS) to improve the dictionary. In this approach, the image patches are grouped into many clusters and a compact subdictionary is learned for each cluster. For each image patch, the best subdictionary is selected, which reconstructs the patch more accurately than a universal dictionary. In another study, Dong et al. [96] proposed sparse representation based image interpolation that incorporates nonlocal image self-similarity into the sparse representation model. The term self-similarity refers to the similarity of pixel values or structures at different parts of the image. The algorithm adds a nonlocal autoregressive model as a new fidelity term to the sparse representation model, which reduces the coherence between the dictionaries and consequently makes the sparse representation model more effective. Dong and colleagues not only estimated a better HR dictionary for each image patch but also used nonlocal self-similarity to obtain a good estimate of the sparse representation coefficients of the LR image. Recently, Dong et al. proposed two models that use nonlocal sparsity constraints to make the sparse coding coefficients extracted from an LR image as close as possible to those of the original image: the Centralized Sparse Representation (CSR) [91] and the Nonlocally Centralized Sparse Representation (NCSR) [92] models.
Fig. 4. Natural images used for simulations.
5. Image quality metrics
To compare the performance of SR techniques, the Peak SNR (PSNR) and the Structural Similarity (SSIM) index between the super-resolved image and its original are calculated. The PSNR is calculated from the Mean Square Error (MSE), which is the average squared error between the original image and the super-resolved image. Given a super-resolved $m \times n$ image $\hat{X}(i,j)$ and its original $X(i,j)$, MSE and PSNR are defined as:
$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[X(i,j) - \hat{X}(i,j)\right]^2 \tag{7}$$
$$\mathrm{PSNR} = 20\log_{10}\left(\frac{L}{\sqrt{\mathrm{MSE}}}\right) \tag{8}$$
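Eqs. (7) and (8) translate directly into a few lines of code. The following Python/NumPy sketch is illustrative only (the simulations reported in this paper were run in MATLAB). The function names `mse` and `psnr` are hypothetical, and L is assumed to be the peak pixel value, 255 for 8-bit images, which is the usual convention.

```python
import numpy as np

def mse(original, estimate):
    """Mean square error between two images of equal size, Eq. (7)."""
    original = np.asarray(original, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return np.mean((original - estimate) ** 2)

def psnr(original, estimate, peak=255.0):
    """Peak SNR in dB, Eq. (8); `peak` plays the role of L (assumed 255)."""
    err = mse(original, estimate)
    if err == 0:
        return float("inf")  # identical images
    return 20.0 * np.log10(peak / np.sqrt(err))
```

For example, if every pixel of the estimate differs from the original by 5 gray levels, the MSE is 25 and the PSNR is 20 log10(255/5), roughly 34.15 dB.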
Table 1. The Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) indices between the original and super-resolved images obtained from different reconstruction-based SR approaches.

| Technique | Metric | Barbara | Butterfly | Lena | Parrot | Peppers |
|---|---|---|---|---|---|---|
| Frequency domain | PSNR | 25.42 | 23.16 | 26.05 | 26.72 | 24.10 |
| | SSIM | 0.8621 | 0.8724 | 0.8594 | 0.8984 | 0.8686 |
| | Time (s) | 13.86 | 12.14 | 12.14 | 12.01 | 11.02 |
| IBP | PSNR | 28.21 | 22.83 | 27.73 | 27.73 | 28.38 |
| | SSIM | 0.8787 | 0.8728 | 0.8705 | 0.9038 | 0.9140 |
| | Time (s) | 2.92 | 1.94 | 2.07 | 2.04 | 1.94 |
| POCS | PSNR | 27.46 | 20.69 | 27.15 | 26.40 | 26.69 |
| | SSIM | 0.8854 | 0.8148 | 0.8526 | 0.8810 | 0.8861 |
| | Time (s) | 1.10 | 1.19 | 1.41 | 1.10 | 1.12 |
| TV-norm | PSNR | 34.94 | 32.45 | 32.19 | 30.84 | 35.52 |
| | SSIM | 0.9618 | 0.9734 | 0.9330 | 0.9431 | 0.9663 |
| | Time (s) | 66.72 | 62.89 | 77.54 | 92.32 | 85.87 |
| l1-norm | PSNR | 34.52 | 32.46 | 31.97 | 30.52 | 35.68 |
| | SSIM | 0.9598 | 0.9751 | 0.9300 | 0.9403 | 0.9683 |
| | Time (s) | 66.13 | 107.9 | 94.31 | 138.2 | 79.49 |
| SAR-norm | PSNR | 33.69 | 28.49 | 30.33 | 29.70 | 33.57 |
| | SSIM | 0.9527 | 0.9390 | 0.9116 | 0.9316 | 0.9581 |
| | Time (s) | 59.89 | 55.04 | 70.31 | 71.98 | 87.10 |

Fig. 5. Results of different reconstruction-based super-resolution techniques applied to Lena image.
Fig. 6. Results of different reconstruction-based super-resolution techniques applied to Peppers image.
The SSIM index computes the similarity between the original and super-resolved images [100]. The SSIM takes into account luminance, contrast, and structural changes between the two images. The SSIM index is defined as:
$$\mathrm{SSIM}(x,\hat{x}) = \frac{(2\mu_x\mu_{\hat{x}} + c_1)(2\sigma_{x\hat{x}} + c_2)}{(\mu_x^2 + \mu_{\hat{x}}^2 + c_1)(\sigma_x^2 + \sigma_{\hat{x}}^2 + c_2)} \tag{9}$$
where $\mu_x$ and $\mu_{\hat{x}}$ are the means and $\sigma_x$ and $\sigma_{\hat{x}}$ are the standard deviations of the original and super-resolved images, $\sigma_{x\hat{x}}$ is the covariance of $X$ and $\hat{X}$, and $c_1$ and $c_2$ are constants. SSIM measures the similarity between the two images: when the super-resolved image is very similar to its original, the value of SSIM approaches 1.
6. Simulations
MATLAB (version R2008a) was used to code and/or run the programs. The MATLAB codes were downloaded from the websites of the respective authors, and the parameters of each method were set according to the values given in the corresponding papers. The simulations were run on a computer with the 64-bit version of Windows 7, an Intel(R) Pentium(R) G620T 2.2 GHz CPU, and 4 GB of RAM. The screen resolution was 1920 × 1080. The natural images Barbara, Butterfly, Lena, Parrot and Peppers shown in Fig. 4 were reduced to 180 × 180 pixels for faster simulations. SR approaches were applied to simulated LR images, which are shifted, rotated and downsampled versions of an HR image. Four 90 × 90 pixel LR images were created from each of these HR images. For each simulation the shift and rotation parameters were generated randomly, and the downsampling factor was set to 2. The first LR image is the reference LR image, which is a downsampled version of the HR image with zero shift and rotation. We used these simulated LR images to recover the original HR image (180 × 180 pixels) using various SR methods.

Table 2. The Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) indices between the original and super-resolved images obtained from different single image SR approaches.

| Technique | Metric | Barbara | Butterfly | Lena | Parrot | Peppers |
|---|---|---|---|---|---|---|
| Bicubic intp. | PSNR | 26.20 | 22.13 | 26.17 | 26.78 | 26.07 |
| | SSIM | 0.8541 | 0.8584 | 0.8403 | 0.8903 | 0.8888 |
| | Time (s) | 0.18 | 0.17 | 0.18 | 0.18 | 0.19 |
| EGI | PSNR | 26.63 | 22.28 | 26.60 | 26.52 | 26.94 |
| | SSIM | 0.8586 | 0.8859 | 0.8370 | 0.8850 | 0.8962 |
| | Time (s) | 3.24 | 3.23 | 3.26 | 3.14 | 2.88 |
| Cubic spline | PSNR | 26.71 | 22.43 | 26.63 | 27.19 | 26.70 |
| | SSIM | 0.8653 | 0.8798 | 0.8534 | 0.8989 | 0.9033 |
| | Time (s) | 0.18 | 0.17 | 0.18 | 0.18 | 0.18 |
| NEDI | PSNR | 26.46 | 22.36 | 26.44 | 27.09 | 26.20 |
| | SSIM | 0.8484 | 0.8764 | 0.8356 | 0.8904 | 0.8893 |
| | Time (s) | 12.42 | 12.43 | 12.14 | 11.31 | 10.27 |
| SME | PSNR | 26.68 | 22.39 | 26.63 | 27.21 | 26.66 |
| | SSIM | 0.8624 | 0.8822 | 0.8505 | 0.8973 | 0.9004 |
| | Time (s) | 50.15 | 48.56 | 49.53 | 49.96 | 51.28 |
| ASDS | PSNR | 31.57 | 27.49 | 31.11 | 30.31 | **32.78** |
| | SSIM | 0.9338 | 0.9452 | 0.9207 | 0.9376 | 0.9553 |
| | Time (s) | 183.7 | 194 | 175.8 | 173.5 | 170.5 |
| Sparse intp. | PSNR | 29.76 | 27.69 | 30.77 | 29.86 | 31.45 |
| | SSIM | 0.9241 | 0.9585 | 0.9269 | 0.9454 | 0.9555 |
| | Time (s) | 135.5 | 136.4 | 124.7 | 137.4 | 140.93 |
| CSR | PSNR | 31.04 | 28.05 | 31.27 | 30.89 | 31.87 |
| | SSIM | 0.9264 | 0.9505 | 0.9224 | 0.9438 | 0.9540 |
| | Time (s) | 525 | 541 | 478 | 468 | 485 |
| Nonlocal CSR | PSNR | 31.52 | 28.48 | 31.98 | 31.26 | 32.13 |
| | SSIM | 0.9370 | 0.9532 | 0.9374 | 0.9538 | 0.9599 |
| | Time (s) | 321 | 335 | 309 | 320 | 374 |
| Sparse representation (Yang et al.) | PSNR | **32.44** | **29.52** | **33.02** | **32.82** | 32.35 |
| | SSIM | 0.9397 | 0.9521 | 0.9271 | 0.9509 | 0.9518 |
| | Time (s) | 73.68 | 69.63 | 88.15 | 74.39 | 94.78 |
| Sparse representation (Zeyde et al.) | PSNR | 31.3 | 27.6 | 31.2 | 30.8 | 32.7 |
| | SSIM | 0.9407 | 0.9528 | 0.9286 | 0.9500 | 0.9601 |
| | Time (s) | 3730 | 3735 | 3710 | 3705 | 3718 |
| Example-based (Kim et al.) | PSNR | 31.43 | 28.90 | 32.03 | 31.91 | 31.58 |
| | SSIM | **0.9476** | **0.9660** | **0.9388** | **0.9587** | **0.9665** |
| | Time (s) | 5.17 | 12.91 | 5.6 | 4.65 | 7.15 |

The bold number indicates the highest value.
Fig. 7. Results of different single image super-resolution techniques applied to Lena image.
Fig. 8. Results of different single image super-resolution techniques applied to Peppers image.
Frequency domain SR approaches [26] were first examined on the simulated LR images. The images were transformed into the Fourier domain, and the shift and rotation parameters between each LR image and the reference image were calculated from their low-frequency, aliasing-free part. Shifts were estimated from the ten central low frequency components, and rotations were estimated from a disc of radius 0.8. After applying these motion parameters to the simulated LR images, an HR image was reconstructed using cubic interpolation. Besides cubic interpolation, the performances of IBP [12], Robust Regularization [41], and POCS [23] were also examined, with the motion parameters measured in the Fourier domain. We employed the MATLAB software prepared by Vandewalle et al. [26] to implement these algorithms.
For IBP, an upsampled version of the reference LR image was used as the initial estimate of the HR image. The upsampling was performed using bicubic interpolation. IBP created a set of LR images from the initial HR estimate using the motion parameters estimated in the Fourier domain. The estimate was then updated by iteratively minimizing the error between the simulated LR images and the test LR images, following the algorithm developed in [12]. Robust regularization further incorporates a median estimator in the iterative process to achieve better results; we implemented the robust regularization algorithm proposed by Zomet and colleagues [41]. The POCS algorithm [23], which reconstructs an HR image using projection onto convex sets, was examined only for planar shifts.
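The IBP loop described above can be sketched as follows. This Python/NumPy example is a deliberately simplified illustration and not the implementation of [12] or [26]: it assumes known whole-pixel circular shifts with no rotation, uses 2 × 2 block averaging as the imaging model, and starts from a nearest neighbor (rather than bicubic) upsampling of the reference frame. All function names are hypothetical.

```python
import numpy as np

def downsample(img, f=2):
    """Average non-overlapping f x f blocks (assumed imaging model)."""
    h, w = img.shape
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def upsample(img, f=2):
    """Nearest neighbor upsampling by pixel replication."""
    return np.kron(img, np.ones((f, f)))

def simulate_lr(hr, shift, f=2):
    """Circularly shift the HR image by whole pixels, then downsample."""
    return downsample(np.roll(hr, shift, axis=(0, 1)), f)

def ibp(lr_frames, shifts, f=2, n_iter=50):
    """Iterative back-projection for known integer shifts.

    lr_frames[0] is the reference frame and must have shift (0, 0).
    """
    hr = upsample(lr_frames[0], f)  # initial HR estimate
    for _ in range(n_iter):
        correction = np.zeros_like(hr)
        for lr, shift in zip(lr_frames, shifts):
            # Residual between the observed and the simulated LR frame.
            error = lr - simulate_lr(hr, shift, f)
            # Back-project the residual onto the HR grid and undo the shift.
            back = upsample(error, f)
            correction += np.roll(back, (-shift[0], -shift[1]), axis=(0, 1))
        hr += correction / len(lr_frames)
    return hr
```

With noise-free frames generated by `simulate_lr` from a common HR image, each iteration cannot increase the distance between the estimate and that image, so the result is at least as close to it as the initial upsampled reference frame. Sub-pixel shifts and rotations, as used in the simulations here, would require interpolation inside `simulate_lr` and its back-projection.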
Similarly, Bayesian SR methods were studied and their robustness on the LR images was tested for various prior models. For the simulations we used the algorithms and MATLAB software prepared by Villena et al. [44]. Total Variation (TV) [42], the l1 norm of the horizontal and vertical gradients [43], and Simultaneous Auto Regressive (SAR) [44] models were used as priors. The shift and rotation parameters for creating the LR images were generated randomly, and the four simulated LR images were used as input. The algorithm uses a hierarchical Bayesian model in which the model parameters, registration parameters and HR image are estimated simultaneously from the LR images. Variational approximation was applied to estimate the posterior distributions of the unknowns. The algorithm terminated when either the maximum number of iterations (k = 100) was reached or the criterion $\|x^{k} - x^{k-1}\|^2 / \|x^{k-1}\|^2 < 10^{-4}$, where $x^{k}$ is the kth estimated HR image, was satisfied. The Bayesian methods showed the highest PSNR values among the multi-frame SR methods. However, the TV, l1 norm, and SAR prior models over-smoothed the non-edge regions of the image. Table 1 shows the performance comparison of all of the aforementioned multi-frame SR methods. Figs. 5 and 6 show the results of various reconstruction-based SR approaches applied to the Lena and Peppers images, respectively.
Single image interpolation methods were also studied on natural images. The input LR image was created by direct subsampling of the original image by a factor of 2. The LR image was then upscaled by a factor of 2 to 180 × 180 pixels using nearest neighbor, bilinear and bicubic interpolation, and the interpolated images were compared with the original image. The PSNR and SSIM indices for the bicubic method were greater than those of nearest neighbor and bilinear interpolation. More complex interpolation methods, Cubic Spline [36], NEDI [19], and EGI [20], were also applied to the downsampled LR images. A regularization-based SR method with Sparse Mixing Estimators (SME) [36] was also examined. Because no noise was added to the LR images used for the single image interpolation methods, they showed better PSNR and SSIM indices. Table 2 compares the objective quality metrics (PSNR and SSIM) of the various single image SR approaches.
We examined the EB method proposed by Kim and Kwon [97] on the natural images. We chose this method because it has outperformed many state-of-the-art algorithms and because it removes blurring and ringing effects near edges [97]. The input LR images were created by downsampling the original image by a factor of 2; noise was not added to the downsampled image. The training set was created by randomly selecting generic HR images. The LR training images were obtained by blurring and subsampling the HR images, so the training set consisted of LR and HR image pairs. The algorithm operates on image patches. The input LR patch was first interpolated by a factor of 2 using cubic interpolation. Next, kernel ridge regression was used to learn a map from LR patches to HR patches from the training set's patch pairs. The regression provided a set of candidate images, and the super-resolved image was obtained by combining the candidate images based on estimated confidences. Artifacts around the edges of the reconstructed image were removed using an image prior regularization term. This method gave higher PSNR and SSIM values than the interpolation methods (Table 2).
Similarly, sparse representation based SR techniques were examined on the LR image. We extracted 5 × 5 patches with 1 pixel overlap between adjacent patches from the input image. The HR dictionaries and sparse coefficients were learned from both the training set HR images and the LR test image. We used the method and software proposed by Yang et al. [69] to run the simulation. In addition, the sparse representation based SR method proposed by Zeyde et al. [70] was examined. In this method, LR and HR dictionaries were constructed from the LR and HR image patches, respectively, and learned using the K-SVD dictionary learning algorithm. The learned HR dictionary was combined with the sparse coefficients of the LR image to recover the HR patches. The ASDS [95], sparse interpolation [96], CSR [91], and the most recent NCSR [92] methods proposed by Dong et al. were also applied to the LR images. The latter two methods introduce a centralized sparsity constraint by exploiting nonlocal statistics, so that local and nonlocal sparsity constraints are combined. The centralized sparse representation approach brings the sparse coefficients of the LR image closer to those of the original HR image, which results in better image reconstruction and hence better PSNR and SSIM indices. Figs. 7 and 8 show the results of various single image-based SR approaches applied to the Lena and Peppers images, respectively.
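The reconstruction step shared by the dictionary-based methods above, coding an LR patch over the LR dictionary and applying the same coefficients to the HR dictionary, can be sketched as follows. This Python/NumPy example is a minimal sketch under stated assumptions: the sparse code is obtained with the iterative soft-thresholding algorithm (ISTA) applied to an l1-regularized least-squares problem, which is chosen here for brevity and is not claimed to be the solver used in [69] or [70]. The function names, the regularization weight `lam`, and the dictionaries passed in are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(D, y, lam, n_iter=200):
    """Minimize 0.5*||y - D a||_2^2 + lam*||a||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)
        a = soft_threshold(a - step * grad, step * lam)
    return a

def reconstruct_hr_patch(D_l, D_h, lr_patch, lam=0.1):
    """Code the LR patch over D_l, then synthesize the HR patch with D_h."""
    beta = sparse_code_ista(D_l, lr_patch, lam)
    return D_h @ beta, beta
```

In a complete pipeline the two dictionaries would be trained jointly on LR and HR patch pairs, and the overlapping HR patches would be averaged to form the final image. The l1 penalty biases the coefficients toward zero, so a debiasing step or a greedy l0 solver may be preferred when exact amplitudes matter.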
7. Conclusion
In this paper, we provided a general survey of the existing SR techniques. We also reported a comprehensive performance comparison among different SR techniques in terms of PSNR and SSIM indices. The results showed that the Fourier-based cubic interpolation method significantly blurred the reconstructed image. The IBP, robust regularization and single image bicubic interpolation methods introduced a small amount of "ringing effect"; however, they preserved most of the image features. The performance of these methods depends strongly on the estimation of the registration parameters; therefore, a small sub-pixel error in the registration may result in a different estimate. A robust motion estimation algorithm is essential to increase the performance of multi-frame SR. The Bayesian approaches perform registration and fusion jointly. They provided visually pleasing images with the highest PSNR indices. Many algorithms, including the Bayesian approaches, assume a spatially uniform Gaussian blur, which is usually unrealistic. Similarly, most SR algorithms deteriorate when noise is present in the image; therefore, a method that is more robust to noise while preserving image features is essential. The EB and sparse representation methods show good results and can be used when there are not enough input LR images and/or when a higher resolution factor is required. The performance of these algorithms may be improved by using a larger set of training images and a more relevant learning method.
The MATLAB codes used in this paper were downloaded from the websites of the respective authors, and the parameters of each method were set according to the values given in the corresponding papers. The differences in the PSNRs and SSIMs of the various SR approaches may be due to differences in the techniques and/or the parameters used. We refer interested readers to the webpage http://quark.uwaterloo.ca/~dthapa, which contains the MATLAB source codes for various SR techniques developed by several groups of researchers.
Acknowledgements
We thank several authors for providing their MATLAB codes online. We also thank the University of Waterloo and Ryerson University for providing lab equipment and financial support for this study.
References
[1] Tsai RY, Huang TS. Multiframe image restoration and registration. Adv Comput Vis Image Process 1984;1:317–39. [2] Wu W, Liu Z, Gueaieb W, He X. Single-image super-resolution based on Markov random field and contourlet transform. J Electron Imag 2011;20. 023005–023005. [3] Nasrollahi K, Moeslund TB. Super-resolution: a comprehensive survey. Mach Vis Appl 2014;25(6):1423–68. [4] Plenge E, Poot DH, Bernsen M, Kotek G, Houston G, Wielopolski P, et al. Super-resolution methods in MRI: can they improve the trade-off between resolution, signal-to-noise ratio, and acquisition time? Magn Reson Med 2012;68:1983–93. [5] Tian J, Ma KK. A survey on super-resolution imaging. Signal Image Video Process 2011;5:329–42. [6] Park SC, Park MK, Kang MG. Super-resolution image reconstruction: a technical overview. IEEE Signal Process Mag 2003;20:21–36. [7] Borman S, Stevenson R. Spatial resolution enhancement of low-resolution image sequences: a comprehensive review with directions for future research. Lab Image and Signal Analysis, University of Notre Dame. Tech rep; 1998. [8] Van Ouwerkerk JD. Image super-resolution survey. Image Vis Comput 2006;24(10):1039–52. [9] Simpkins JD, Stevenson RL. An introduction to super-resolution imaging. In: Lakshminarayanan V, Calvo ML, Alieva T, editors. Mathematical optics: classical, quantum, and computational methods. CRC Press; 2012. p. 555–78. [10] Keren D, Peleg S, Brada R. Image sequence enhancement using sub-pixel displacements.
In: Proceedings of IEEE computer society conference on CVPR, Ann Arbor, MI; 1988. p. 742–6. [11] Bergen JR, Anandan P, Hanna KJ, Hingorani R. Hierarchical model-based motion estimation. In: Proceedings of 2nd ECCV 1992, lecture notes in computer science. p. 237–52. [12] Irani M, Peleg S. Improving resolution by image registration. CVGIP: Graph Models Image Process 1991;53(3):231–9. [13] Capel D, Zisserman A. Computer vision applied to super-resolution. IEEE Signal Process Mag 2003;20(3):75–86. [14] Hardie RC, Barnard KJ, Armstrong EE. Joint MAP registration and high-resolution image estimation using a sequence of undersampled images. IEEE Trans Image Process 1997;6(12):1621–33. [15] Botella G, Meyer-Baese U, García A, Rodríguez M. Quantization analysis and enhancement of a VLSI gradient-based motion estimation architecture. Digit Signal Process 2012;22(6):1174–87. [16] Cambridge in colour. Digital image interpolation.
[accessed 20 December 2012]. [17] Ur H, Gross D. Improved resolution from sub-pixel shifted pictures. CVGIP: Graph Models Image Process 1992;54:181–6. [18] Zhang X, Liu Y. A computationally efficient super-resolution reconstruction algorithm based on the hybrid interpolation. J Comput 2010;5:885–92. [19] Li X, Orchard MT. New edge-directed interpolation. IEEE Trans Image Process 2001;10:1521–7. [20] Zhang L, Wu X. An edge-guided image interpolation algorithm via directional filtering and data fusion. IEEE Trans Image Process 2006;15:2226–38. [21] Chu J, Liu J, Qiao J, Wang X, Li Y. Gradient-based adaptive interpolation in super-resolution image restoration. In: IEEE ICSP; 2008. p. 1027–30. [22] Zhang X, Wu X. Image interpolation by adaptive 2-D autoregressive modeling and soft-decision estimation. IEEE Trans Image Process 2008;17:887–96. [23] Stark H, Oskoui P. High-resolution image recovery from image-plane arrays, using convex projections. JOSA A 1989;6:1715–26. [24] Tekalp AM, Ozkan MK, Sezan MI. High-resolution image reconstruction from lower-resolution image sequences and space-varying image restoration. In: IEEE ICASSP; 1992. p. 169–72. [25] Patti AJ, Sezan MI, Tekalp AM. Super-resolution video reconstruction with arbitrary sampling lattices and nonzero aperture time. IEEE Trans Image Process 1997;6:1064–76. [26] Vandewalle P, Süsstrunk S, Vetterli M. A frequency domain approach to registration of aliased images with application to super-resolution. EURASIP J Appl Signal Process 2006:1–14. [27] Lucchese L, Cortelazzo GM. A noise-robust frequency domain technique for estimating planar roto-translations. IEEE Trans Signal Process 2000;48:1769–86. [28] Kim S, Bose NK, Valenzuela HM. Recursive reconstruction of high resolution image from noisy under-sampled multiframes. IEEE Trans Acoust Speech Signal Process 1990;38(6):1013–27. [29] Rhee S, Kang MG. Discrete cosine transform based regularized high-resolution image reconstruction algorithm. Opt Eng 1999;38:1348–56.
[30] Nguyen N, Milanfar P. A wavelet-based interpolation-restoration method for super-resolution (wavelet super-resolution). Circ Syst Signal Process 2000;19:321–38. [31] El-Khamy SE, Hadhoud MM, Dessouky MI, Salam BM, El-Samie FA. Regularized super-resolution reconstruction of images using wavelet fusion. Opt Eng 2005;44:097001–0970010. [32] Chappalli MB, Bose NK. Simultaneous noise filtering and super-resolution with second-generation wavelets. IEEE Signal Process Lett 2005;12:772–5. [33] Ji H, Fermüller C. Wavelet-based super-resolution reconstruction: theory and algorithm. In: ECCV; 2006. p. 295–307. [34] Ji H, Fermuller C. Robust wavelet-based super-resolution reconstruction: theory and algorithm. IEEE Trans Pattern Anal Mach Intell 2009;31:649–60. [35] Li X. Image resolution enhancement via data-driven parametric models in the wavelet space. EURASIP J Image Video Process 2007:1–12. [36] Mallat S, Yu G. Super-resolution with sparse mixing estimators. IEEE Trans Image Process 2010;19:2889–900. [37] Jiji CV, Joshi MV, Chaudhuri S. Single-frame image super-resolution using learned wavelet coefficients. Int J Imag Syst Technol 2004;14:105–12. [38] Jiji CV, Chaudhuri S. Single-frame image super-resolution through contourlet learning. EURASIP J Appl Signal Process 2006. 235–235. [39] Kim SP, Su WY. Recursive high-resolution reconstruction of blurred multiframe images. IEEE Trans Image Process 1993;2:534–9. [40] Farsiu S, Robinson MD, Elad M, Milanfar P. Fast and robust multiframe super-resolution. IEEE Trans Image Process 2004;13:1327–44. [41] Zomet A, Rav-Acha A, Peleg S. Robust super-resolution. IEEE Comput Soc Conf Comput Vis Pattern Recogn 2001;1:I-645–50. [42] Babacan SD, Molina R, Katsaggelos AK. Total variation super resolution using a variational approach. In: 15th IEEE international conference on image processing; 2008. p. 641–4. [43] Villena S, Vega M, Molina R, Katsaggelos AK. Bayesian super-resolution image reconstruction using an l1 prior. 
In: 6th International symposium on image and signal processing and analysis; 2009. p. 152–7. [44] Villena S, Vega M, Babacan SD, Molina R, Katsaggelos AK. Bayesian combination of sparse and non-sparse priors in image super-resolution. Digit Signal Process 2013;23:530–41. [45] Tian J, Ma KK. Stochastic super-resolution image reconstruction. J Vis Commun Image Represent 2010;21:232–44.
[46] Cheeseman P, Kanefsky B, Kraft R, Stutz J, Hanson R. Super-resolved surface reconstruction from multiple images. In: Heidberg GR, editor. Maximum entropy and bayesian methods. Netherlands: Springer; 1996. p. 293–308. [47] Suresh KV, Rajagopalan AN. Robust and computationally efficient super-resolution algorithm. J Opt Soc Am A 2007;24:984–92. [48] Belekos S, Galatsanos NP, Katsaggelos AK. Maximum a posteriori video super-resolution using a new multichannel image prior. IEEE Trans Image Process 2010;19:1451–64. [49] Kong D, Han M, Xu W, Tao H, Gong Y. A conditional random field model for video super-resolution. In: IEEE ICPR; 2006. p. 619–22. [50] Tom BC, Katsaggelos AK. Reconstruction of a high resolution image from multiple degraded mis-registered low resolution images. In: Proc SPIE, vol. 2308; 1994. p. 971–81. [51] Woods NA, Galatsanos NP, Katsaggelos AK. Stochastic methods for joint registration, restoration, and interpolation of multiple undersampled images. IEEE Trans Image Process 2006;15:201–13. [52] Zhang H, Wipf D, Zhang Y. Image super-resolution via sparse bayesian modeling of natural images; 2012. p. 1–29. Available from: arXiv:1209.4317. [53] Elad M, Hel-Or Y. A fast super-resolution reconstruction algorithm for pure translational motion and common space-invariant blur. IEEE Trans Image Process 2001;10:1187–93. [54] Dalley G, Freeman B, Marks J. Single-frame text super-resolution: a Bayesian approach. In: IEEE ICIP; 2004. p. 3295–8. [55] Capel D, Zisserman A. Super-resolution from multiple views using learnt image models. In: IEEE comput soc conf comput vis pattern recogn; 2001. p. II-627–II-634. [56] Pickup LC, Capel DP, Roberts SJ, Zisserman A. Bayesian methods for image super-resolution. Comput J 2009;52:101–13. [57] Polatkan G, Zhou M, Carin L, Blei D, Daubechies I. A bayesian nonparametric approach to image super-resolution; 2012. p. 1–30. Available from: arXiv: 1209.5019. [58] Liu C, Sun D. 
A Bayesian approach to adaptive video super resolution. In: IEEE conference on CVPR; 2011. p. 209–21. [59] Sroubek F, Cristóbal G, Flusser J. A unified approach to super-resolution and multichannel blind deconvolution. IEEE Trans Image Process 2007;16:2322–32. [60] Sroubek F, Flusser J, Sorel M. Super-resolution and blind deconvolution of video. In: IEEE ICPR; 2008. p. 1–4. [61] Hirsch M, Harmeling S, Sra S, Schölkopf B. Online multi-frame blind deconvolution with super-resolution and saturation correction. Astron Astrophys 2011;531:1–11. [62] Bai Y, Hu J, Luo Y. Self-adaptive blind super-resolution image reconstruction. In: IEEE ICSIP; 2010. p. 1208–12. [63] Kasturiwala SB, Ladhake SA. Super-resolution: a novel application to image restoration. IJCSE 2010;2:1659–64. [64] Šroubek F, Flusser J. Resolution enhancement via probabilistic deconvolution of multiple degraded images. Pattern Recogn Lett 2006;27:287–93. [65] Sroubek F, Flusser J. Multichannel blind deconvolution of spatially misaligned images. IEEE Trans Image Process 2005;14:874–83. [66] Freeman WT, Jones TR, Pasztor EC. Example-based super-resolution. IEEE Comput Graph Appl 2002;22:56–65. [67] Chang H, Yeung DY, Xiong Y. Super-resolution through neighbor embedding. In: IEEE comput soc conf comput vis pattern recogn; 2004. p. I-275–I-282. [68] Yang J, Wright J, Huang T, Ma Y. Image super-resolution as sparse representation of raw image patches. In: IEEE CVPR; 2008. p. 1–8. [69] Yang J, Wright J, Huang TS, Ma Y. Image super-resolution via sparse representation. IEEE Trans Image Process 2010;19:2861–73. [70] Zeyde R, Elad M, Protter M. On single image scale-up using sparse-representations. In: Curves and surfaces; 2012. p. 711–30. [71] Suetake N, Sakano M, Uchino E. Image super-resolution based on local self-similarity. Opt Rev 2008;15:26–30. [72] Sun J, Zheng NN, Tao H, Shum HY. Image hallucination with primal sketch priors. In: IEEE comput soc conf comput vis pattern recogn; 2003. p. II-729– II-736. 
[73] Baker S, Kanade T. Limits on super-resolution and how to break them. IEEE Trans Pattern Anal Mach Intell 2002;24:1167–83.
[74] Li X, Lam KM, Qiu G, Shen L, Wang S. Example-based image super-resolution with class-specific predictors. J Vis Commun Image Represent 2009;20:312–22.
[75] Su C, Zhuang Y, Huang L, Wu F. Steerable pyramid-based face hallucination. Pattern Recogn 2005;38:813–24.
[76] Dong W, Shi G, Zhang L, Wu X. Super-resolution with nonlocal regularized sparse representation. In: Proc SPIE, vol. 7744; 2010. p. 77440H–77440H.
[77] Shi G, Dong W, Wu X, Zhang L. Context-based adaptive image resolution upconversion. J Electron Imag 2010;19:013008–013008.
[78] Wang X, Tang X. Hallucinating face by eigen transformation. IEEE Trans Syst Man Cyb Part C: Appl Rev 2005;35:425–34.
[79] Ma X, Zhang J, Qi C. Hallucinating face by position-patch. Pattern Recogn 2010;43:2224–36.
[80] Hertzmann A, Jacobs CE, Oliver N, Curless B, Salesin DH. Image analogies. In: Proceedings of the 28th annual conference on computer graphics and interactive techniques; 2001. p. 327–40.
[81] Ni KS, Kumar S, Vasconcelos N, Nguyen TQ. Single image super-resolution based on support vector regression. In: IEEE ICASSP; 2006. p. II.
[82] Bevilacqua M, Roumy A, Guillemot C, Morel MLA. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In: BMVC; 2012. p. 1–10.
[83] Glasner D, Bagon S, Irani M. Super-resolution from a single image. In: IEEE ICCV; 2009. p. 349–56.
[84] Ni KS, Nguyen TQ. Image super-resolution using support vector regression. IEEE Trans Image Process 2007;16:1596–610.
[85] Lui SF, Wu JY, Mao HS, Lien JJ. Learning-based super-resolution system using single facial image and multi-resolution wavelet synthesis. In: ACCV; 2007. p. 96–105.
[86] Huang H, He H, Fan X, Zhang J. Super-resolution of human face image using canonical correlation analysis. Pattern Recogn 2010;43:2532–43.
[87] Gao X, Zhang K, Tao D, Li X. Image super-resolution with sparse neighbor embedding. IEEE Trans Image Process 2012;21:3194–205.
[88] Chan JCW, Ma J, Canters F. A comparison of super-resolution reconstruction methods for multi-angle CHRIS/Proba images. Proc SPIE 2008;7109:1–11.
[89] Gong M, He K, Zhou J, Zhang J. Single color image super-resolution through neighbor embedding. JCIS 2011;7:49–56.
[90] Zhang J, Zhao C, Xiong R, Ma S, Zhao D. Image super-resolution via dual-dictionary learning and sparse representation. In: IEEE ISCAS; 2012. p. 1688–91.
[91] Dong W, Zhang L, Shi G. Centralized sparse representation for image restoration. In: IEEE ICCV; 2011. p. 1259–66.
[92] Dong W, Zhang L, Shi G, Li X. Nonlocally centralized sparse representation for image restoration. IEEE Trans Image Process 2013;22:1620–30.
[93] Yang J, Wang Z, Lin Z, Cohen S, Huang T. Coupled dictionary training for image super-resolution. IEEE Trans Image Process 2012;21:3467–78.
[94] Kanakaraj J, Kathiravan S. Super-resolution image reconstruction using sparse parameter dictionary framework. Sci Res Essays 2012;7:586–92.
[95] Dong W, Zhang L, Shi G, Wu X. Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. IEEE Trans Image Process 2011;20:1838–57.
[96] Dong W, Zhang L, Lukac R, Shi G. Sparse representation based image interpolation with nonlocal autoregressive modeling. IEEE Trans Image Process 2013;22:1382–94.
[97] Kim KI, Kwon Y. Example-based learning for single-image super-resolution and JPEG artifact removal. Max Planck Institute Biological Cybernetics, Tübingen, Germany. Technical report no TR-173; 2008.
[98] Kim KI, Kwon Y. Single-image super-resolution using sparse regression and natural image prior. IEEE Trans Pattern Anal Mach Intell 2010;32:1127–33.
[99] Thapa D, Raahemifar K, Lakshminarayanan V. Comparison of super-resolution algorithms applied to retinal images. J Biomed Opt 2014;19(5):056002. http://dx.doi.org/10.1117/1.JBO.19.5.056002.
[100] Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 2004;13(4):600–12.
Damber Thapa (PhD) received his PhD degree in Vision Science from the University of Waterloo in 2015. He received his B.Sc. degree in Physics from Tribhuvan University, Kathmandu, Nepal. His current research interests include adaptive optics of the eye, optical imaging, and biomedical image processing.
Kaamran Raahemifar (PhD) is a professor in the Department of Electrical and Computer Engineering at Ryerson University. His research interests include hardware implementation of, and software-based approaches to, signal and image processing algorithms, with a focus on biomedical engineering applications. He is a professional engineer in Ontario and a senior member of IEEE.
William R. Bobier (PhD) is a professor of Optometry and Vision Science at the University of Waterloo. His research interests include the optics of the eye and related binocular motor development. He is primarily interested in normal and abnormal developmental patterns in infants and children.
V. Lakshminarayanan (PhD) is a professor at the University of Waterloo and has held positions at UC Irvine and the Universities of Missouri and Michigan. He was a KITP Scholar at the Kavli Institute for Theoretical Physics, and is an optics advisor to the International Centre for Theoretical Physics, Trieste, Italy. He is a fellow of APS, SPIE, OSA, AAAS, and IoP.