Enhancing Image Visuality by Multi-Exposure Fusion

Qingsen Yan, Yu Zhu, Yulin Zhou, Jinqiu Sun, Lei Zhang, Yanning Zhang

Accepted Manuscript, Pattern Recognition Letters
PII: S0167-8655(18)30801-8
DOI: https://doi.org/10.1016/j.patrec.2018.10.008
Reference: PATREC 7334
Received: 1 June 2018; Revised: 2 October 2018; Accepted: 8 October 2018

Please cite this article as: Qingsen Yan, Yu Zhu, Yulin Zhou, Jinqiu Sun, Lei Zhang, Yanning Zhang, Enhancing Image Visuality by Multi-Exposure Fusion, Pattern Recognition Letters (2018), doi: https://doi.org/10.1016/j.patrec.2018.10.008

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Highlights

• A simulated exposure image mechanism generates more input data.
• Four weight maps increase the visibility of the degraded regions.
• Color correction of the simulated exposure images.
• Gradient enhancement of the simulated exposure images.



Enhancing Image Visuality by Multi-Exposure Fusion

Qingsen Yan a,c, Yu Zhu a,∗∗, Yulin Zhou a, Jinqiu Sun b, Lei Zhang c, Yanning Zhang a

a School of Computer Science, Northwest Polytechnical University, Xi'an, Shaanxi Province, China, 710072
b School of Astronomy, Northwest Polytechnical University, Xi'an, Shaanxi Province, China, 710072
c School of Computer Science, The University of Adelaide, South Australia, Australia, 5005

ABSTRACT


Image visuality enhancement aims at increasing the visual quality of a given image to convey more useful information. The key to visuality enhancement is to comprehensively exploit the details of the image scene. However, one (or several) observed image only provides partial information of the scene. To address this problem, we present a novel multi-exposure fusion based visuality enhancement method in this study. First, we propose a simulated exposure model to synthesize the results of the observed image under different exposure conditions. Then, white balancing and gradient enhancement are separately performed on those simulated exposure results. By doing this, scene details in different exposure conditions as well as in different image feature spaces (i.e., color and gradient spaces) can be well exploited. To appropriately take advantage of the resulting details, we adopt a Laplacian pyramid based linear fusion framework to integrate them into the enhanced image in a multi-scale way. Different from conventional fusion methods, we develop a more powerful weight map for fusion, which is able to simultaneously highlight pixels with good exposedness, contrast, saturation and gamut. Experimental results demonstrate that the proposed method can effectively enhance the visuality of the observed image. Compared with existing methods, the proposed method highlights fine details while avoiding halo artifacts.

© 2018 Elsevier Ltd. All rights reserved.

1. Introduction


High dynamic range (HDR) imaging is significant in various computer vision applications. First, enhancing the dynamic range of an image can improve the visual effect of the scene; in general, a high dynamic range image is more visually satisfactory. Second, HDR images provide more information than conventional images for most computer vision tasks, from low-level image processing to high-level object detection. Lastly, high dynamic range imaging has started to make its way into commercial products such as smartphones and televisions. High dynamic range imaging has thus become increasingly popular in recent years and has attracted many professional and amateur photographers. However, on one hand, the radiance of an outdoor scene usually has a wider dynamic range than the imaging ability of conventional digital cameras. On the other hand, most display devices

∗∗ Corresponding author. E-mail: [email protected] (Yu Zhu)

have a narrow luminance dynamic range due to the limited capacity of CCD or CMOS sensors. As a consequence, when taking a photograph of a natural scene, the bright areas often turn out to be overexposed while the dark regions are underexposed. Obviously, these overexposed and underexposed regions lose the details of the scene and decrease its visuality. Therefore, it is difficult for a single image to preserve all the detailed information of a natural scene. Although the autoexposure mechanism may correctly expose a specific region chosen by the user, it cannot capture the whole dynamic range of the image, such as the background region. Many approaches have been developed to capture high dynamic range images. A few hardware methods [33, 26] focused on directly capturing HDR images with customized cameras. For example, the basic principle of [26] is to simultaneously capture the exposure and spatial dimensions of the same scene irradiance. The captured image is mapped to a high dynamic range image according to an effective image reconstruction algorithm. Aggarwal [1] described a camera design for simultaneously sampling multiple images of the same scene


response function. Debevec and Malik [3] used a triangle-shaped weighting function, based on the assumption that the reliability of intensity values near the two ends of the range is lower. However, those methods require estimating the complicated camera response function (CRF) and irradiance. Our technique avoids CRF calibration and can be computationally efficient; it simulates exposed images to cope with the dark and saturated regions in the sequence. The method by Li et al. [22] used a pyramidal image decomposition and attenuated the coefficients at each level to compress the dynamic range. Reinhard et al. [32] computed a multi-scale measure related to contrast and rescaled the HDR pixel values accordingly. Grundland et al. [24] cross-dissolved between two images using a pyramid decomposition. Mertens et al. [25] blended multiple exposures with pyramidal image decomposition, guided by simple quality measures such as saturation and contrast as parameters of the weighting functions. However, these methods generate unsatisfactory results when the inputs are captured in extreme environments. Zhang and Cham [40] used quality measures based on gradient information changes to obtain weight maps among differently exposed images. Paul et al. [28] proposed a method based on blending the gradients of the luminance components of the input images; it uses the maximum gradient magnitude at each pixel location and then obtains the fused luminance by a Haar wavelet-based image reconstruction technique. Ke et al. [16] proposed a multi-exposure image fusion method based on human visual perception with an overall image quality index (OIQ) and local saturation. Oh et al.'s method [27] employed a rank minimization algorithm which simultaneously aligns LDR images and detects outliers. Lee et al. [19] also explored rank minimization for HDR deghosting. The results are unsatisfying when most of the input images are too dark or too bright. Yan et al. [37, 38] proposed a ghost-free HDR image synthesis algorithm that utilizes a sparse representation [39] framework. Ghosting does not occur in our algorithm, because the differently exposed images are simulated from the same frame. Compared to those algorithms, our result retains more information.


under different exposure settings. This is done by splitting the aperture into multiple parts and directing the light exiting from each part in a different direction using an assembly of mirrors. Tumblin et al. [35] proposed a new camera design that measures static gradients instead of static intensities and then quantizes the differences appropriately to capture an HDR image. This intrinsically differential design can correct for its own saturated sensors. Despite their effectiveness in enhancing the visual impact of an image, these devices are too expensive for mainstream consumers, which reduces their practical applicability. The most common solution for enlarging the dynamic range is to combine multiple low dynamic range (LDR) images with different exposure times. The motivation is that images with different exposure times highlight different regional information of the natural scene. In general, dark regions are captured well in longer exposures while bright regions saturate, and bright regions are captured well in shorter exposures while dark regions remain black. Inspired by this, by varying the exposure time we can capture different information about the same scene. During the past decades, many algorithms have been proposed for high dynamic range imaging, such as [23, 3, 14, 9]. In this study, we propose a novel algorithm which can enhance the visual effect of the scene. Our algorithm is built on the fusion framework that has been used in many applications. Considering that several images can hardly preserve all the content of a natural scene and that differently exposed images are fairly complementary, we simulate differently exposed images from each image. Since color information plays an important role in HDR, the first version of the input images employs traditional white balance enhancing techniques to produce a natural appearance. The second version of the input images enhances the differently exposed images in the gradient domain, which renders the details over the entire intensity range. Finally, fusion is performed in a multi-resolution fashion that is robust to brightness variation and artifacts in the sequence. The weight maps of each image in the fusion framework are computed by assessing image quality at each pixel. In contrast to other approaches, all of the input images and the weight maps are derived from the original image. The framework of our algorithm is shown in Figure 1.

The remainder of the paper is organized as follows. Section 2 describes related work on multi-exposure fusion techniques. A detailed explanation of the enhancement algorithm is presented in Section 3. Section 4 demonstrates the results and comparisons with other algorithms for high dynamic range imaging. Finally, the conclusion of the proposed algorithm is presented in Section 5.

2. Related Work

During the past decades, there have been many works focusing on generating a high dynamic range image from differently exposed images. Fusion in the irradiance domain synthesizes all the differently exposed LDR images into an irradiance image, from which an LDR image can then be obtained by a mapping function. Considering that the reliability of pixel values relates to the camera's sensitivity to light, Mann and Picard [23] calculated irradiance maps from differently exposed images using the camera

It is well known that most display devices have a much narrower dynamic range than real-world scenes. For this reason, tone mapping technology [33] compresses an HDR image to fit low dynamic range display devices while simultaneously preserving the visual content. Various tone mapping methods have been proposed to cope with different problems. According to the transformation pattern, they are usually categorized into two main modes: global and local operators. Global tone mapping methods [18, 5, 31] compress the dynamic range of the original image according to a spatially uniform function of intensity. They have the advantage of speed, but sometimes cause a loss of detail visibility. Local tone mapping methods [4, 6, 22, 32] apply different mapping functions to different regions of the image. This feature of local operators gives the result more details; however, it may also cause artifacts. Durand and Dorsey [6] used a two-scale decomposition of the image into a base and a detail layer for compression in the gradient domain; only the base layer has its contrast reduced, thereby preserving detail. Reinhard et al. [32] applied automatic dodging-and-burning to accomplish dynamic range

Fig. 1. The pipeline of the proposed method. From left to right: original images, differently exposed images, the two versions of input images (white balancing and enhancement in the gradient domain), the corresponding weight maps of the input images, and the result of the proposed method. (a) The framework for a single image. (b) The framework for multiple images. The result with multiple images is better than with only one input.


compression. Our method attempts to generate as much detail and color as possible.

In recent years, the image fusion process has been used in many fields such as defogging [21], high dynamic range imaging [25] and video enhancement [30]. In this paper, image fusion is employed to enhance the visual effect of the scene from differently exposed images. Burt et al. [2] presented an extension of the pyramid approach to image fusion. Goshtasby [13] proposed a method for fusing multi-exposure images with maximum information content, but it may cause block artifacts. In contrast, our method uses multiple assessment functions, making it more reliable for enhancing image visuality. It is worth mentioning that our method can also be used for single-frame image enhancement. A more complete presentation can be found in Section 3.

3. Enhancement Algorithm

Considering that several images can hardly capture all the content of a natural scene, and that differently exposed images are complementary, we propose an enhancement algorithm based on differently exposed images within a fusion framework. This method first derives additional differently exposed images from the original images, which capture more details of the scene. With these data, the final image is computed by keeping the high quality parts of each image via multi-scale fusion. The fusion process is guided by a weight map consisting of four quality measurements. Our framework consists of four main steps: exposed image generation, input assignment, weight map calculation and multi-scale fusion. We assume that all the images are perfectly aligned.
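As a minimal end-to-end sketch of these four steps, the toy below uses a 1-D signal, only the well-exposedness measure from the weight-map step, and a single-scale weighted average in place of the Laplacian-pyramid fusion; all function bodies are simplified stand-ins, not the authors' exact formulas:

```python
import numpy as np

def simulate_exposures(x, ks=(0.6, 1.0, 1.4)):
    # Step 1: simulated exposure images y = k * x, clipped to the valid range.
    return [np.clip(k * x, 0.0, 1.0) for k in ks]

def quality_weight(y, sigma=0.2):
    # Step 3 (greatly simplified): favor well-exposed, mid-tone pixels.
    return np.exp(-((y - 0.5) ** 2) / (2 * sigma ** 2))

def fuse(images):
    # Step 4 (single-scale stand-in): per-pixel normalized weighted average.
    weights = [quality_weight(y) for y in images]
    total = np.sum(weights, axis=0) + 1e-12
    return sum(w * y for w, y in zip(weights, images)) / total

x = np.linspace(0.0, 1.0, 5)        # toy 1-D "image"
result = fuse(simulate_exposures(x))
```

The exposure factors (0.6, 1, 1.4) mirror the values used in the experiments of Section 4.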


3.1. Exposed Images Generation

An HDR image is generated from several LDR images at different exposures. We derive several exposed images based on the following observation. The LDR images provide information about the natural scene; however, the details of bright or dark regions cannot be captured given insufficient data or certain specific circumstances. Based on this observation, we generate inputs that recover the visibility of the entire image from the original LDR images. For a static scene, the intensity values at any pixel in differently exposed images are linearly related to the exposure time ∆t when the sensor irradiance R of the scene is constant [20] (Figure 2):

    Z^1(u, v) / ∆t_1 ≈ Z^2(u, v) / ∆t_2 ≈ ... ≈ Z^n(u, v) / ∆t_n    (1)


where Z^1(u, v) represents the intensity value at location (u, v) in the first image and ∆t_1 denotes the corresponding exposure time.

Fig. 2. Relationship between intensity value and exposure time. The intensity value has a linear relation to the exposure time, except in the saturation region (green region at maximal exposure time).

Based on the above observation, we can generate differently exposed images from a single image. Therefore, the intensities of the input images y_ij in our method are obtained through

    y_ij = k_ij x_i    (2)

where x_i denotes the i-th degraded original image and k_ij denotes an exposure factor applied to x_i. Given different exposure factors, the input images y_ij are obtained from x_i. If k_ij > 1, the brightness of dark regions is increased; if k_ij < 1, the brightness of bright regions is reduced. In this way, we obtain more images to enhance the details of the bright and dark regions. This ensures a favorable result even when most of the input images are too dark or too bright, and creates appropriate conditions for generating results in which every part has an appropriate appearance.

3.2. Inputs of the Fusion Processing

A pleasant visuality of the final result is determined by the input images and the weights. Consequently, we aim to rectify the input images so as to make the image sharp and clear. The first version of the input images employs traditional white balance enhancing techniques to produce a natural appearance. The second version of the input images is enhanced in the gradient domain, which promotes the image details.

3.2.1. White Balancing

White balance is a very important concept in the image processing field. It corrects the color balance of the lighting and can solve problems such as color restoration and image tone processing. White balancing can recover the color information, though the degree varies with the exposure factors of Section 3.1. To achieve computational white balance, it is necessary to estimate the scene illumination and perform chromatic adaptation, adjusting the scene colors so that they look as they would under a desired light condition (mostly daylight). If the scene illumination is assumed uniform, the global illumination can be estimated for the whole scene; otherwise, the illumination has to be estimated locally. There are many white balance algorithms, such as Shades-of-Grey [8], Grey-Edge [36] and Grey-World. Finlayson proposed Shades-of-Grey [8], a statistical method which estimates the illumination of the scene for each channel. In particular, Shades-of-Grey expresses Max-RGB and Grey-World through the Minkowski p-norm: when p equals 1, it turns into Grey-World, which assumes that the scene average is grey; for p = ∞, it is the particular case of Max-RGB, which assumes that at least one white patch exists in the image. Weijer and Gevers presented the Grey-Edge [36] method based on the derivative structure of images, because the average edge difference in a scene is achromatic; it can also be formulated with the Minkowski p-norm. Compared with other white balance algorithms, Shades-of-Grey is a simple and effective approach to estimate the color of the prevailing light. The first version of the input images employs this traditional technique to eliminate chromatic casts. Our method is similar to the Shades-of-Grey algorithm but more robust to saturated regions. We use M to denote a mask which removes all saturated pixels. Then the average of the scene, µ_avg, is calculated by the Minkowski p-norm over the unsaturated regions:

    µ_avg = ( Σ (O ⊙ M)^p )^{1/p}    (3)

where ⊙ denotes the element-wise product and O is the input image. After that, the illumination of the scene µ is

    µ = η µ_avg + 0.5    (4)

where η denotes the capacity for detecting the set of colors, with a default value of 0.4. Our approach takes saturated regions into consideration, hence it effectively protects the color balance of the non-saturated regions.

3.2.2. Enhancement in Gradient Domain

Currently, a number of image processing algorithms operate in the gradient domain [7, 12, 34, 21, 11, 10], for tasks such as image enhancement, image segmentation, edge detection and image fusion. In general, image reconstruction from gradients is solved via the Poisson equation and the Laplacian operator. It is a simple and robust approach that improves low-contrast details and avoids halo effects. Since the details of bright and dark (low contrast) regions are small, we propose an adaptive method for visuality enhancement in the gradient domain. Those low-contrast regions have smaller gradient values, so larger coefficients are multiplied with the gradients of those regions to recover details. The result is calculated by an optimization algorithm which minimizes the mean square error between the input and output gradients.
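As a hedged illustration of this rescaling idea, the sketch below boosts gradient magnitudes below a threshold α and attenuates larger ones; the epsilon clamp is our assumption, and the Poisson reconstruction step that recovers the image from the modified gradients is omitted:

```python
import numpy as np

def enhance_gradients(gx, gy, alpha, beta=0.7):
    # T = (alpha / |G|)^(1 - beta): > 1 for small gradients (magnified),
    # < 1 for large gradients (attenuated). The 1e-6 clamp avoids division by zero.
    mag = np.hypot(gx, gy)
    t = (alpha / np.maximum(mag, 1e-6)) ** (1.0 - beta)
    return gx * t, gy * t  # modified gradient field, component-wise

gx = np.array([[0.01, 0.5]])   # one small and one large horizontal gradient
gy = np.zeros((1, 2))
hx, hy = enhance_gradients(gx, gy, alpha=0.1)
# 0.01 < alpha, so that gradient grows; 0.5 > alpha, so that one shrinks
```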

Our algorithm employs an enhancement function T(u, v) to transform the magnitudes of the image gradients at each pixel. The enhancement function T(u, v) is defined as

    T(u, v) = ( α / ||G(u, v)|| )^{1−β}    (5)

where G(u, v) denotes the gradient of the image, α decides whether the gradient magnitudes change, and β determines the range of the gradient magnitudes. For instance, gradient magnitudes smaller than α will be magnified. In our experiments, α is 0.3 times the average gradient magnitude, and β is 0.7. More specifically, we compute

    H(u, v) = G(u, v) T(u, v).    (6)

Our purpose is to search for the enhanced image J by minimizing the mean square error between the gradients of J and H:

    E = ||∇J − H||²₂    (7)

The derivative of formula (7) is

    dE/dJ = ∇^T ∇J − ∇^T H    (8)

Also, the initialization condition is the Dirichlet boundary

    J|_Ω = O|_Ω    (9)

where Ω is the boundary of the image and O|_Ω denotes the intensity values of the original image at Ω. The second version of the input images is obtained by enhancing the differently exposed images in the gradient domain. In the implementation, the enhanced image J, which preserves local changes of small gradient magnitude, is calculated by the conjugate gradient descent method.

3.3. Weights of the Images

Image weight maps play a considerable role in image fusion. A higher weight map value means that the corresponding portion of the image should be preserved, while a lower value represents low image quality, which has a relatively small probability of appearing in the final image. In our framework, the two versions of the input images contain flat, bright, dark and low-contrast regions that should be given lower weight. In this section, various measures are used to assess image quality.

Well-exposedness weight (W_E): For the raw intensities within a channel, this metric reveals how well a pixel is exposed. It provides a criterion to preserve regions that are neither overexposed (too bright) nor underexposed (too dark). Generally speaking, pixels have high credibility when their normalized intensity values are near 0.5. This weight map is expressed as a Gaussian curve around that value:

    W_E(u, v) = exp( −(I(u, v) − 0.5)² / (2σ²) )    (10)

where I(u, v) denotes the normalized intensity value at location (u, v) in the image I, and the standard deviation σ equals 0.2. In consequence, this weight map preserves the well-exposed appearance of the image.

Contrast weight (W_C): The contrast weight describes the spatial relation of pixels. Its role is to strengthen the appearance of bright and dark regions. A Laplacian filter is applied to the grayscale image, and we take the absolute value of the filter response. W_C tends to assign high values to high-frequency information such as texture and edges. According to this indicator, details are preserved in the fusion process.

Saturation weight (W_S): Photographers pay close attention to saturation, which is a constituent element of a color image. The eye is quite sensitive to saturation: the purer the color, the more distinctive the appearance. However, a longer exposure causes colors to desaturate and eventually clip. To address this, we use a saturation weight W_S, calculated as the standard deviation of the intensity values of the R, G and B channels at each pixel.

Gamut weight (W_G): We propose a simple but effective image quality assessment, the Gamut prior, to describe the range of color in a single image. The Gamut prior is a kind of statistic of high dynamic images, based on a key observation: in most high dynamic images, the difference between the maximum and minimum channel intensity values is often larger than in others. Formally, for an image I, we define

    W_G(u, v) = max_{c∈{r,g,b}} I_c(u, v) − min_{c∈{r,g,b}} I_c(u, v)    (11)

where I_c(u, v) is a color channel at location (u, v) of the image I. To verify the performance of the Gamut weight, Figure 3 displays several images and the corresponding Gamut weight W_G.

Fig. 3. Example images (blue box) and the corresponding Gamut weight W_G (red box).

To yield the weight map of an image, multiple factors should be considered. A pixel may have a low value of contrast or of another metric, which means the pixel is of low quality. If we combine the above metrics using multiplication, the weight values are determined by the minimum of all metrics. Only in this way can we ensure that the weight map extracts high quality areas:

    W(u, v) = W_E(u, v) × W_C(u, v) × W_S(u, v) × W_G(u, v)    (12)
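A hedged sketch of the combined weight map of Eq. (12), assuming normalized float images; the paper does not specify the grayscale conversion or border handling, so those are our choices:

```python
import numpy as np

def laplacian_abs(gray):
    # |Laplacian| response via the standard 4-neighbour kernel (replicated borders).
    p = np.pad(gray, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * gray
    return np.abs(lap)

def weight_map(img, sigma=0.2):
    # img: H x W x 3 array in [0, 1]. Returns the product of the four weights.
    gray = img.mean(axis=2)                                # assumed grayscale conversion
    w_e = np.exp(-((gray - 0.5) ** 2) / (2 * sigma ** 2))  # well-exposedness, Eq. (10)
    w_c = laplacian_abs(gray)                              # contrast
    w_s = img.std(axis=2)                                  # saturation
    w_g = img.max(axis=2) - img.min(axis=2)                # Gamut prior, Eq. (11)
    return w_e * w_c * w_s * w_g

img = np.random.default_rng(0).random((8, 8, 3))
w = weight_map(img)
```

Multiplying the metrics means any single near-zero factor suppresses the pixel, matching the "minimum of all metrics" behaviour described above.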

Table 1. The algorithm performance of window dataset.

Method              | AG   | IE    | GV    | IS
Mertens et al. [25] | 2.65 | 12.11 | 29.16 | 4.18
Lee et al. [19]     | 3.63 | 12.24 | 30.69 | 3.52
Oh et al. [27]      | 2.74 | 12.05 | 26.89 | 2.47
Proposed            | 3.01 | 12.31 | 30.72 | 4.21

Fig. 4. Differently exposed images of Window dataset.

In order to obtain consistent results, the weight maps should be normalized:

    W̄^s(u, v) = W^s(u, v) / Σ_{n=1}^{N} W^n(u, v)    (13)

where W^s(u, v) denotes the weight at location (u, v) of the s-th image and N is the total number of input images.

3.4. Multi-scale Fusion


Actually, we could obtain the result by direct weighted blending, but wherever the weights vary quickly, disturbing artifacts would appear. The reason is that the combined images contain different absolute intensities due to their different exposure times. To address this problem, the resulting image R is calculated by fusing the input images with the weight maps using Laplacian pyramid image decomposition. First, each image is decomposed into a Laplacian pyramid, which applies the Laplacian operator at different scales. Meanwhile, each normalized weight map is decomposed into a Gaussian pyramid. Blending is then carried out for each level separately. Let the l-th level in a Laplacian pyramid decomposition of an image A be defined as L_l{A}, and G_l{B} the l-th level of a Gaussian pyramid of image B. Then, we blend the Laplacian and Gaussian coefficients at each pyramid level.


    L_l{R(u, v)} = Σ_{n=1}^{N} G_l{W^n(u, v)} L_l{I^n(u, v)}    (14)


where G_l{W^n} denotes the Gaussian pyramid of the n-th weight map W^n and L_l{I^n(u, v)} denotes the Laplacian pyramid of the n-th image I^n. The final image R is obtained by collapsing the pyramid L_l{R(u, v)}. The multi-scale fusion strategy is a relatively effective method: since the fusion procedure is computed at every scale, sharp transitions in the weight maps are minimized. This approach retains the details of the image effectively.
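The fusion rule of Eq. (14) can be sketched as follows; the factor-2 nearest-neighbour resampling stands in for whatever smoothing filter the authors used, which the paper does not specify:

```python
import numpy as np

def _down(a):
    # factor-2 decimation (stand-in for a proper low-pass + subsample)
    return a[::2, ::2]

def _up(a, shape):
    # nearest-neighbour upsampling back to `shape`
    out = np.repeat(np.repeat(a, 2, axis=0), 2, axis=1)
    return out[:shape[0], :shape[1]]

def gaussian_pyr(a, levels):
    pyr = [a]
    for _ in range(levels - 1):
        pyr.append(_down(pyr[-1]))
    return pyr

def laplacian_pyr(a, levels):
    g = gaussian_pyr(a, levels)
    pyr = [g[i] - _up(g[i + 1], g[i].shape) for i in range(levels - 1)]
    pyr.append(g[-1])  # coarsest level stores the residual image
    return pyr

def fuse_multiscale(images, weights, levels=3):
    total = np.sum(weights, axis=0) + 1e-12
    weights = [w / total for w in weights]        # Eq. (13) normalization
    fused = None
    for img, w in zip(images, weights):
        blended = [g * l for g, l in zip(gaussian_pyr(w, levels),
                                         laplacian_pyr(img, levels))]  # Eq. (14)
        fused = blended if fused is None else [f + b for f, b in zip(fused, blended)]
    out = fused[-1]                               # collapse the fused pyramid
    for lvl in reversed(fused[:-1]):
        out = _up(out, lvl.shape) + lvl
    return out

a = np.arange(64, dtype=float).reshape(8, 8) / 64.0
out = fuse_multiscale([a, 1.0 - a], [np.ones((8, 8)), np.ones((8, 8))])
```

With a single input and a uniform weight, collapsing the pyramid reconstructs the original image exactly, which is a convenient sanity check for the decomposition.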

4. Experiments

We implemented our algorithm in MATLAB and tested it on degraded images, including multiple frames and single frames, as shown throughout the paper. In our experiments, for the parameter k we set three exposure factors (0.6, 1, 1.4); the other parameters are set to their default values. For quantitative evaluation, the average gradient (AG), information entropy (IE), gray variance (GV) and image sharpening (IS) are employed to assess the performance of the algorithms.
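The paper does not give formulas for these metrics, so the sketch below uses common definitions of average gradient and information entropy as an assumption:

```python
import numpy as np

def average_gradient(gray):
    # mean magnitude of forward differences over the shared interior grid
    dx = np.diff(gray, axis=1)[:-1, :]
    dy = np.diff(gray, axis=0)[:, :-1]
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))

def information_entropy(gray, bins=256):
    # Shannon entropy of the intensity histogram, in bits (gray in [0, 1])
    hist, _ = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```

Higher values of both indicate more detail and more information content, which is how the tables below are read.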


Fig. 5. Comparison with other HDR algorithms: (a) Mertens et al., (b) Lee et al., (c) Oh et al., (d) proposed algorithm, (e) zoomed-in regions.

4.1. Multiple Frame

To demonstrate the effectiveness of the algorithm, we tested the proposed algorithm on multiple degraded images. For comparison, we compared our algorithm with state-of-the-art methods: the Mertens et al. [25], Lee et al. [19] and Oh et al. [27] algorithms. The Mertens et al. algorithm combines multiple exposures, guided by simple quality measures such as saturation and contrast as parameters of the weighting functions. The Lee et al. algorithm is based on the assumption that the irradiance map is linearly related to the low dynamic range images; the HDR image is obtained using a low-rank matrix completion framework. The Oh et al. algorithm introduces a high dynamic range (HDR) imaging algorithm that utilizes rank minimization. The implementations of all algorithms are taken directly from the authors. We begin by showing results on the experimental scenes to validate our method. Figure 4 shows the differently exposed images of the window dataset. Figure 5 shows the results (a-d) and zoomed-in regions (e) of the Mertens et al., Lee et al., Oh et al. and proposed algorithms. The method by Mertens generates an acceptable result. However, the zoomed-in regions of their method have fewer details, such as the border of

Table 2. The algorithm performance of cloud dataset.

Method              | AG   | IE    | GV    | IS
Mertens et al. [25] | 3.33 | 12.55 | 30.61 | 3.57
Lee et al. [19]     | 4.87 | 12.73 | 26.01 | 5.02
Oh et al. [27]      | 5.73 | 12.73 | 25.74 | 6.19
Proposed            | 8.79 | 13.02 | 29.89 | 8.93

Fig. 6. Differently exposed images of Cloud dataset.

Table 3. The algorithm performance of house dataset.

Method              | AG   | IE    | GV    | IS
Mertens et al. [25] | 2.97 | 10.85 | 32.39 | 3.17
Lee et al. [19]     | 7.31 | 12.78 | 30.10 | 8.58
Oh et al. [27]      | 3.10 | 11.47 | 31.58 | 3.39
Proposed            | 3.49 | 11.51 | 31.86 | 3.80

Table 4. The algorithm performance of door dataset.

Method              | AG   | IE    | GV    | IS
Mertens et al. [25] | 6.92 | 10.26 | 29.56 | 7.81
Lee et al. [19]     | 7.88 | 12.14 | 31.05 | 7.83
Oh et al. [27]      | 6.57 | 11.19 | 27.78 | 7.68
Proposed            | 7.33 | 11.93 | 31.46 | 7.86

Fig. 7. Comparison with other HDR algorithms: (a) Mertens et al., (b) Lee et al., (c) Oh et al., (d) proposed algorithm, (e) zoomed-in regions.


the window and the straps of the bag. The Lee et al. and Oh et al. algorithms obtain an HDR image from the irradiance domain, which is calculated from a number of differently exposed images. Although the irradiance image can be estimated from three images, the results are unsatisfactory (the Lee et al. algorithm produces artifacts; the Oh et al. algorithm changes the color of the scene). The proposed algorithm can synthesize an image with more details, because we capture more images as described in Section 3.1. The edges of the window and the straps of the bag show that our algorithm is preferable. From Table 1 we can see that the proposed method achieves higher performance than the other methods. The reason Oh et al.'s algorithm achieves a higher value on average gradient is its greater noise. Data analysis shows that our method can bring out more details in bright and dark areas. Figure 6 shows the differently exposed images of the cloud dataset. Figure 7 shows the results (a-d) and zoomed-in regions (e) of the Mertens et al., Lee et al., Oh et al. and proposed algorithms. This is a very challenging image


sequence: even in the image with the longest exposure time, the details of the darkest areas are not clear, not to mention the shorter exposure times. The result of the Mertens et al. algorithm has less information in dark regions, but better saturation due to more information from the last frame. The irradiance of the scene can be estimated effectively due to the great changes in exposure time; however, image segmentation affects the results of the Oh et al. algorithm. Compared with the others, the proposed algorithm, which uses simulated exposure images, achieves more details and plausible visual effects. The algorithm performance on the cloud dataset is shown in Table 2. The result of Mertens et al.'s algorithm has more dark regions, which may not meet viewers' needs. Compared with Oh et al.'s and Lee et al.'s algorithms, our result achieves higher performance. A very challenging 2-image stack is shown in Figure 8. From the original images we can see that the correctly exposed regions of the left image are barely visible in the right image, and the longer exposure image also has dark regions. Figure 9 shows the results (a-d) and zoomed-in regions (e) of the Mertens et al., Lee et al., Oh et al. and proposed algorithms. Although the result of the Mertens et al. algorithm has higher contrast and better visual effect, the result on the roof (zoomed in) fails to display the contents of the scene due to having only two input images. Obviously, the halo in the results of the Lee et al. and Oh et al. algorithms is caused by the estimated irradiance map and tone mapping. Compared to the others, the proposed method attempts to increase the brightness of dark areas in the source image and acquires artifact-free results.


Fig. 8. Differently exposed images of the house dataset.

Fig. 9. Comparison with other HDR algorithms. (a) Mertens et al. algorithm. (b) Lee et al. algorithm. (c) Oh et al. algorithm. (d) Proposed algorithm. (e) Zoomed-in regions.

Fig. 10. Differently exposed images of the door dataset.

The performance on the house dataset is shown in Table 3. There is considerable noise in the result of Lee et al.'s algorithm in Figure 9, which gives Lee et al.'s algorithm high values in some quality assessments. Hence, we should discount those quality assessment metrics and focus on the visual effects. Despite this, our result still achieves moderate performance. Multi-scale blending is quite effective at avoiding noise, because it blends image features instead of intensities. Since the blending is computed at each scale separately, sharp transitions in the weight map can only affect sharp transitions that appear in the original images. Figure 10 displays the door sequence, an extreme case of a stack containing only two exposed images. The image on the left shows the outdoor scene; in contrast, only the indoor scene can be captured in the right image. Figure 11 shows the results and zoomed-in regions of the algorithms of Mertens et al., Lee et al., Oh et al., and the proposed algorithm, respectively. Mertens et al.'s algorithm produces artifacts. The color of Lee et al.'s algorithm

Fig. 11. Comparison with other HDR algorithms. (a) Mertens et al. algorithm. (b) Lee et al. algorithm. (c) Oh et al. algorithm. (d) Proposed algorithm. (e) Zoomed-in regions.

Table 5. The algorithm performance of the cave dataset.

Method  Jobson et al. [15]  Petro et al. [29]  Kim et al. [17]  Proposed
AG      7.54                6.96               8.26             8.44
IE      12.08               11.52              10.27            12.49
GV      29.70               21.39              51.60            45.75
IS      7.71                7.06               12.49            11.50


Table 6. The algorithm performance of the church dataset.

Method  Jobson et al. [15]  Petro et al. [29]  Kim et al. [17]  Proposed
AG      9.03                8.49               8.26             9.02
IE      12.05               12.58              12.07            12.69
GV      27.76               28.16              33.91            32.21
IS      10.40               9.23               10.11            9.41

is not satisfactory, and Oh et al.'s algorithm captures few details of the scene. Compared with the others, the proposed algorithm preserves the textural information of the image and has a better color appearance. The algorithm performance on the door dataset is shown in Table 4. We can see that Lee et al.'s result and ours obtain better performance.
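The multi-scale blending credited above with avoiding noise can be sketched as a standard Laplacian-pyramid fusion: each input's Laplacian pyramid is weighted by the Gaussian pyramid of its weight map, the weighted pyramids are summed, and the result is collapsed. This sketch uses plain subsampling and pixel replication instead of the smoothing filters a full implementation would use:

```python
import numpy as np

def downsample(img):
    # naive 2x subsampling (a real pyramid would low-pass filter first)
    return img[::2, ::2]

def upsample(img, shape):
    # nearest-neighbor 2x upsampling, cropped to the target shape
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]

def gaussian_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    gp = gaussian_pyramid(img, levels)
    lp = [gp[i] - upsample(gp[i + 1], gp[i].shape) for i in range(levels - 1)]
    lp.append(gp[-1])  # coarsest level stores the residual image
    return lp

def fuse(images, weights, levels=3):
    # normalize the weight maps so they sum to one at every pixel
    wsum = np.sum(weights, axis=0) + 1e-12
    weights = [w / wsum for w in weights]
    blended = None
    for img, w in zip(images, weights):
        lp = laplacian_pyramid(img, levels)
        wp = gaussian_pyramid(w, levels)
        contrib = [l * g for l, g in zip(lp, wp)]
        blended = contrib if blended is None else [b + c for b, c in zip(blended, contrib)]
    # collapse the blended pyramid from coarse to fine
    out = blended[-1]
    for lvl in range(levels - 2, -1, -1):
        out = upsample(out, blended[lvl].shape) + blended[lvl]
    return out
```

Because the blend happens per scale, a hard edge in a weight map only influences image features at comparably sharp scales, which is why the fused result stays free of seams.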


Fig. 13. Comparison with other image enhancement algorithms. (a) Original image. (b) Jobson et al. algorithm. (c) Petro et al. algorithm. (d) Kim et al. algorithm. (e) Proposed algorithm.

4.2. Single Frame

To further verify the effectiveness of the algorithm, we tested the proposed algorithm in the more extreme case of only one input image. We compared the proposed algorithm with several image enhancement methods: Jobson et al.'s algorithm [15], Petro et al.'s algorithm [29], and Kim et al.'s algorithm [17]. Jobson et al. propose a multi-scale retinex with color restoration, which helps recover visual information; however, it introduces many parameters, which increases the complexity of the algorithm. Petro et al.'s algorithm, MSRCP (multi-scale retinex with chromaticity preservation), performs retinex on the intensity of the image and then maps the result to each channel based on the RGB ratio; this method enhances the image while preserving the original color distribution. Kim et al.'s algorithm enhances the image by minimizing a cost function that consists of a contrast term and an information loss term. Figures 12 and 13 (a) display the degraded images of the cave and church scenes, respectively. Some details are drowned in the dark regions, degrading the visual effect. Figures 12 and 13 (b-e) show the results of the algorithms of Jobson et al., Petro et al., Kim et al., and the proposed algorithm. Jobson et al.'s and Petro et al.'s algorithms enhance the details of the dark regions; however, they produce halo artifacts in some areas. The result of Kim et al.'s algorithm optimally preserves the information of the original image and promotes contrast, but it cannot improve the visibility of the dark regions. In comparison, the result of our method has more details. With multiple differently exposed images, the proposed algorithm works well for both regions. The comparison results are shown in Tables 5 and 6, from which we can see that the proposed method performs favourably even on a single frame.
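The chromaticity-preservation step attributed to MSRCP above can be sketched as follows: enhance the intensity channel only, then rescale each RGB channel by the ratio of enhanced to original intensity, so per-channel color ratios are untouched. The square-root boost used here is an illustrative stand-in for the actual multi-scale retinex:

```python
import numpy as np

def enhance_preserving_chromaticity(rgb, boost=lambda i: np.sqrt(i)):
    """Apply an intensity-only enhancement, then restore color by the RGB ratio.

    `boost` is a placeholder for the real multi-scale retinex on intensity.
    `rgb` is expected in [0, 1]."""
    rgb = rgb.astype(np.float64)
    intensity = rgb.mean(axis=2)
    enhanced = boost(np.clip(intensity, 0.0, 1.0))
    ratio = enhanced / (intensity + 1e-12)
    return np.clip(rgb * ratio[..., None], 0.0, 1.0)
```

Wherever no channel clips, the ratio between any two channels (and therefore the chromaticity) of the output equals that of the input, which is the property the text describes.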


Fig. 12. Comparison with other image enhancement algorithms. (a) Original image. (b) Jobson et al. algorithm. (c) Petro et al. algorithm. (d) Kim et al. algorithm. (e) Proposed algorithm.



4.3. Ablation Experiments

To evaluate the contribution of the gamut weight, we conduct an ablation experiment on the proposed method. The well-exposedness, contrast, and saturation weights are widely used in fusion tasks; in this paper, we focus on the importance of the gamut weight. We remove the gamut weight term in Equation (12) and perform the multi-scale fusion. The result without the gamut weight is shown in Figure 14 (a), and Figure 14 (b) is the result with the gamut weight. Although the results look similar to each other, the zoomed-in patch demonstrates that the method with the gamut weight obtains more details in edge regions. As we can see from Table 7, the quantitative results with the gamut weight are better than those without it. We believe the improvement arises because the gamut weight captures useful information from the input images. We also experiment with different weight blending strategies: average, maximum, and multiplication. The results are shown in Figure 15; the multiplication strategy achieves the best result. Its output has greater contrast and richer color; we consider the reason to be that the multiplication strategy pays more attention to useful regions. Since there are four quality measures per image, the multiplication strategy favors regions where all of the measures are good. If one of the measures is poor, the weight of that pixel becomes small, and the corresponding information is instead taken from the other images to obtain a vivid result. In contrast, the average strategy weakens the ability to suppress bad areas, and the maximum strategy, which considers only the largest of the measures, loses useful weight information. The results of these two strategies have poor visual effects.
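The three blending strategies compared above differ only in how the four per-pixel quality maps are reduced to a single weight map; a minimal sketch (the measure maps passed in are placeholders, not the paper's actual weights):

```python
import numpy as np

def combine_weights(measures, strategy="multiplication"):
    """Combine a list of per-pixel quality maps into one weight map.

    multiplication penalizes a pixel if ANY measure is poor,
    average dilutes a poor measure, maximum keeps only the best one."""
    stack = np.stack(measures, axis=0)
    if strategy == "multiplication":
        return np.prod(stack, axis=0)
    if strategy == "average":
        return np.mean(stack, axis=0)
    if strategy == "maximum":
        return np.max(stack, axis=0)
    raise ValueError(strategy)
```

With measures (0.9, 0.9, 0.9, 0.1) at a pixel, multiplication yields 0.0729 while average gives 0.7 and maximum gives 0.9, which illustrates why a single poor measure suppresses a pixel under multiplication and pushes the fusion to take that region from another image.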

Table 7. The quantitative analysis of the gamut weight.

Method         AG    IE     GV     IS
Without gamut  7.78  12.61  32.88  8.93
With gamut     7.86  12.65  33.35  9.03


Fig. 14. Ablation study without/with the gamut weight. (a) Result without the gamut weight. (b) Result with the gamut weight. Zoomed-in regions are shown at the bottom right.

Table 8. The quantitative analysis of exposure factors.

Exposure Factor  0.2    0.4    0.6    0.8
AG               6.41   7.22   7.86   7.40
IE               11.24  11.46  12.65  12.23
GV               29.16  31.63  33.35  32.92
IS               8.36   8.66   9.03   8.91


To determine the exposure factors of the simulated images, we set k to different values. In the experiment, we use the same rate to generate the bright and dark images from the inputs: if a bright image is calculated using the coefficient k, the coefficient of the dark image is 1 + (1 − k) = 2 − k. A quantitative analysis of the exposure factors is shown in Table 8. The key observations are as follows. First, the algorithm achieves the best performance with a factor of 0.6, probably because this factor reveals the most detail from the inputs. Second, the performance decreases when the factor is less than 0.6; this is reasonable because many details are lost in the simulated images. Third, the proposed method still obtains acceptable performance when the factor exceeds 0.6. For a particular image, the best result can be obtained by manually adjusting the parameter.
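The bright/dark pairing rule above (coefficient k for the bright image, 2 − k for the dark one) can be sketched as follows; the power-law mapping is an illustrative stand-in, not the paper's simulated exposure model from Section 3.1:

```python
import numpy as np

def simulated_exposure_pair(img, k=0.6):
    """Generate a bright and a dark simulated exposure from one input in [0, 1].

    The bright image uses coefficient k and the dark one 2 - k, matching the
    pairing rule in the text; the power-law mapping itself is a placeholder."""
    img = np.clip(img.astype(np.float64), 0.0, 1.0)
    bright = img ** k          # k < 1 lifts midtones
    dark = img ** (2.0 - k)    # 2 - k > 1 suppresses midtones
    return bright, dark
```

Under this mapping the midtones of the two simulated exposures move apart symmetrically around the exponent 1 while pure black and pure white stay fixed, which mirrors the symmetric choice of k and 2 − k.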


Fig. 15. The results of different weight fusion strategies. (a) Average. (b) Maximum. (c) Multiplication.


4.4. Computational Complexity

As is well known, computational complexity is an essential issue for HDR imaging. We compare the runtime of the methods using Matlab R2011a on a machine with a 3.0 GHz Intel Pentium 4 processor. It should be noted that our MATLAB implementation is not optimized. The comparison results are shown in Table 9, from which we can see that Mertens et al.'s method is the fastest, since it fuses the image information directly. However, Mertens et al.'s method obtains poor results in certain challenging situations. The other methods are slower because their optimization processes are time-consuming. Our method is faster than Lee et al.'s and Oh et al.'s methods and achieves favorable results.

5. Conclusion

In this paper, we present a novel multi-exposure-fusion-based visuality enhancement method. We propose a simulated exposure model to synthesize the results of the observed image under different exposure conditions. Then, white balancing and


Table 9. Computational complexity for different methods.

Method    Mertens  Lee     Oh     Proposed
Time (s)  2.09     103.41  76.16  17.34

image gradient enhancement are performed separately on those simulated exposure results. To take full advantage of the resulting details, we adopt the pyramid fusion framework to integrate them into the enhanced image in a multi-scale way. Unlike conventional fusion methods, four weight maps based on quality metrics are introduced to represent the useful information in the image. In particular, we propose a simple and effective image quality measure, the gamut prior, to describe the range of colors in a single image. The proposed method is computationally efficient and enhances the details of the original image sequence. Experiments on challenging datasets show that the proposed algorithm is robust and effective for image enhancement.

Acknowledgments

This work is supported by grants from the NSF of China [grant numbers 61231016, 61301192, 61301193, 61303123], the Natural Science Basic Research Plan in Shaanxi Province of China [grant number 2013JQ8032], and the Chang Jiang Scholars Program of China [grant numbers 100017GH030150, 100015GH0301].

References

[1] Aggarwal, M., Ahuja, N., 2004. Split aperture imaging for high dynamic range. International Journal of Computer Vision 2, 7–17.

[2] Burt, P.J., Kolczynski, R.J., 1993. Enhanced image capture through fusion, in: International Conference on Computer Vision, pp. 173–182.
[3] Debevec, P.E., Malik, J., 1997. Recovering high dynamic range radiance maps from photographs.
[4] Dicarlo, J.M., Wandell, B.A., 2000. Rendering high dynamic range images. Proceedings of SPIE - The International Society for Optical Engineering 3965, 392–401.
[5] Drago, F., Myszkowski, K., Annen, T., Chiba, N., 2003. Adaptive logarithmic mapping for displaying high contrast scenes. Computer Graphics Forum 22, 419–426.
[6] Durand, F., Dorsey, J., 2002. Fast bilateral filtering for the display of high-dynamic-range images. ACM Transactions on Graphics 21, 257–266.
[7] Fattal, R., Lischinski, D., Werman, M., 2002. Gradient domain high dynamic range compression. ACM Transactions on Graphics 21, 249–256.
[8] Finlayson, G.D., Trezzi, E., 2004. Shades of gray and colour constancy, in: The Twelfth Color Imaging Conference: Color Science and Engineering Systems, Technologies, Applications, pp. 37–41.
[9] Gallo, O., Gelfand, N., Chen, W.C., Tico, M., Pulli, K., 2009. Artifact-free high dynamic range imaging, in: Proceedings of the IEEE International Conference on Computational Photography, pp. 1–7.
[10] Gong, D., Tan, M., Zhang, Y., van den Hengel, A., Shi, Q., 2017a. MPGL: an efficient matching pursuit method for generalized lasso, in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 1934–1940.
[11] Gong, D., Tan, M., Zhang, Y., van den Hengel, A., Shi, Q., 2016. Blind image deconvolution by automatic gradient activation, in: Computer Vision and Pattern Recognition, pp. 1827–1836.
[12] Gong, D., Tan, M., Zhang, Y., van den Hengel, A., Shi, Q., 2017b. Self-paced kernel estimation for robust blind image deblurring, in: IEEE International Conference on Computer Vision, pp. 1670–1679.
[13] Goshtasby, A.A., 2005. Fusion of multi-exposure images. Image and Vision Computing 23, 611–618.
[14] Hasinoff, S.W., Durand, F., Freeman, W.T., 2010. Noise-optimal capture for high dynamic range photography, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pp. 553–560.
[15] Jobson, D.J., Rahman, Z.U., Woodell, G.A., 1997. A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Transactions on Image Processing 6, 965–976.
[16] Ke, P., Jung, C., Fang, Y., 2017. Perceptual multi-exposure image fusion with overall image quality index and local saturation. Multimedia Systems 23, 239–250.
[17] Kim, J.H., Jang, W.D., Sim, J.Y., Kim, C.S., 2013. Optimized contrast enhancement for real-time image and video dehazing. Journal of Visual Communication and Image Representation 24, 410–425.
[18] Larson, G.W., Rushmeier, H., Piatko, C., 1997. A visibility matching tone reproduction operator for high dynamic range scenes. IEEE Transactions on Visualization and Computer Graphics 3, 291–306.
[19] Lee, C., Li, Y., Monga, V., 2014. Ghost-free high dynamic range imaging via rank minimization. IEEE Signal Processing Letters 21, 1045–1049.
[20] Lee, J.Y., Matsushita, Y., Shi, B., Kweon, I.S., Ikeuchi, K., 2013. Radiometric calibration by rank minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 144–156.
[21] Li, W.J., Gu, B., Huang, J.T., Wang, S.Y., Wang, M.H., 2012. Single image visibility enhancement in gradient domain. IET Image Processing 6, 589–595.
[22] Li, Y., Sharan, L., Adelson, E.H., 2005. Compressing and companding high dynamic range images with subband architectures. ACM Transactions on Graphics 24, 836–844.
[23] Mann, S., Picard, R.W., 1995. On being undigital with digital cameras: Extending dynamic range by combining differently exposed pictures, in: Proceedings of IS&T, pp. 442–448.
[24] Grundland, M., Vohra, R., Williams, G.P., Dodgson, N.A., 2006. Cross dissolve without cross fade: Preserving contrast, color and salience in image compositing, in: Computer Graphics Forum, pp. 577–586.
[25] Mertens, T., Kautz, J., Van Reeth, F., 2007. Exposure fusion, in: Conference on Computer Graphics and Applications, pp. 382–390.
[26] Nayar, S.K., Mitsunaga, T., 2000. High dynamic range imaging: Spatially varying pixel exposures, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pp. 472–479.
[27] Oh, T.H., Lee, J.Y., Tai, Y.W., Kweon, I.S., 2015. Robust high dynamic range imaging by rank minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence 37, 1219–1232.
[28] Paul, S., Sevcenco, I.S., Agathoklis, P., 2016. Multi-exposure and multi-focus image fusion in gradient domain. Journal of Circuits, Systems and Computers 25, 1650123.
[29] Petro, A.B., Sbert, C., Morel, J.M., 2014. Multiscale retinex. Image Processing on Line 4, 71–88.
[30] Raskar, R., Ilie, A., Yu, J., 2004. Image fusion for context enhancement and video surrealism, in: Proceedings of NPAR, pp. 85–152.
[31] Reinhard, E., Devlin, K., 2005. Dynamic range reduction inspired by photoreceptor physiology. IEEE Transactions on Visualization and Computer Graphics 11, 13–24.
[32] Reinhard, E., Stark, M., Shirley, P., Ferwerda, J., 2002. Photographic tone reproduction for digital images. ACM Transactions on Graphics 21, 267–276.
[33] Reinhard, E., Ward, G., Pattanaik, S.N., Debevec, P., 2005. High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting.
[34] Pérez, P., Gangnet, M., Blake, A., 2003. Poisson image editing. ACM Transactions on Graphics 22, 313–318.
[35] Tumblin, J., Agrawal, A., Raskar, R., 2005. Why I want a gradient camera, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pp. 103–110.
[36] Van de Weijer, J., Gevers, T., Gijsenij, A., 2007. Edge-based color constancy. IEEE Transactions on Image Processing 16, 2207–2214.
[37] Yan, Q., Sun, J., Li, H., Zhu, Y., Zhang, Y., 2017. High dynamic range imaging by sparse representation. Neurocomputing 269, 160–169.
[38] Yan, Q., Zhu, Y., Zhang, Y. Robust artifact-free high dynamic range imaging of dynamic scenes. doi:10.1007/s11042-018-6625-x.
[39] Zhang, L., Wei, W., Zhang, Y., Shen, C., van den Hengel, A., Shi, Q., 2018. Cluster sparsity field: An internal hyperspectral imagery prior for reconstruction. International Journal of Computer Vision, 1–25.
[40] Zhang, W., Cham, W.K., 2012. Gradient-directed multiexposure composition. IEEE Transactions on Image Processing 21, 2318–2323.