J. Vis. Commun. Image R. 23 (2012) 908–923
Rate-distortion optimized layered coding of high dynamic range videos
Chul Lee, Chang-Su Kim*
School of Electrical Engineering, Korea University, Seoul, Republic of Korea
Article history: Received 28 March 2011; Accepted 30 May 2012; Available online 12 June 2012.
An efficient algorithm to compress high dynamic range (HDR) videos into layered bitstreams is proposed in this work. First, we separate an HDR video sequence into a tone-mapped low dynamic range (LDR) sequence and a ratio sequence, which represents ratios between HDR and LDR pixel values. Then, we encode the LDR and ratio sequences to maximize the rate-distortion (R–D) performance by extending the standard H.264/AVC codec. Specifically, we estimate the distortion of the HDR sequence from those of the LDR sequence and the ratio sequence, and then allocate a limited bit budget to the LDR sequence and the ratio sequence efficiently to maximize the qualities of both LDR and HDR sequences. Conventional LDR devices use only the LDR stream, whereas HDR devices reconstruct the HDR video from the LDR and ratio streams. Simulation results show that the proposed algorithm provides significantly better R–D performance than conventional HDR video coding techniques. © 2012 Elsevier Inc. All rights reserved.
Keywords: High dynamic range video; Video coding; Rate-distortion optimization; Layered coding; H.264/AVC; Tone mapping; Backward compatibility; Human visual system
1. Introduction

The dynamic range of a digital image is defined as the intensity ratio between the brightest pixel and the darkest pixel. Although recent advances in display technology have enabled the dynamic ranges of high-end display devices to reach three or more orders of magnitude, real world scenes have much higher dynamic ranges. Moreover, human eyes can perceive up to about five orders of magnitude in a scene simultaneously and can adapt to about nine orders of magnitude depending on scene contents. Conventional devices hence cannot display realistic scenes as the human visual system (HVS) perceives them. To overcome this limitation, much research has been carried out on acquiring, processing, and displaying images that can handle the full dynamic ranges of natural scenes. Such images, with higher dynamic ranges than conventional display devices can reproduce, are called HDR images. An HDR image contains the information about actual scene colors, instead of the colors to be displayed on devices [1]. HDR imaging technologies have already been adopted in games, movies, and surveillance systems [2,3]. For example, the video games 'Need for Speed: Undercover' and 'Resident Evil 5' and the movies 'Harry Potter and the Half-Blood Prince' and 'Iron Man' have used HDR technologies to render computer graphics scenes. In addition, graphics card manufacturers supply products supporting HDR rendering. This trend is expected to continue, and HDR imaging should find use in general applications in the near future [1]. However, more advanced

* Corresponding author. Fax: +82 2 921 0544. E-mail addresses: [email protected] (C. Lee), [email protected] (C.-S. Kim).
1047-3203/$ - see front matter © 2012 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jvcir.2012.05.009
HDR representation and compression technologies are required for HDR images to be used in a wider range of practical applications. Due to their higher dynamic ranges, HDR images and videos require a huge amount of storage space and transmission bandwidth. However, only a few algorithms have been proposed to compress HDR images or videos. They can be classified according to their encoding strategies. First, HDR image formats that represent floating-point colorimetric HDR values, such as Radiance [4], OpenEXR [3], and LogLuv [5], have been proposed. The Radiance and OpenEXR formats employ efficient floating-point coding, and the LogLuv format uses a perceptually motivated quantization scheme. Second, conventional image and video coding standards have been extended to support HDR data compression. These algorithms quantize HDR images and videos, and then use the results as the input to the JPEG 2000 codec [6] and the MPEG-4 codec [7], respectively. Another approach to HDR image and video compression is to separate an HDR image into an LDR image and residual data, which represent the differences between HDR and LDR pixel values, to support backward compatibility. This approach first obtains an LDR image from an input HDR image using a tone mapping scheme. Then, it encodes the tone-mapped LDR image and the residual data using the JPEG codec [8], the MPEG-4 codec [9], or bit-depth scalable video coding schemes [10–12]. In this way, an HDR-enabled decoder can reconstruct the HDR data, while a conventional decoder can reconstruct only the LDR data. While recent research on HDR data compression has focused on the extension of conventional image/video coding standards and algorithms, little work has been done to optimize the compression system in the R–D sense. In this work, we propose an R–D optimized video compression algorithm for HDR sequences.
We separate an HDR video into an LDR sequence and a ratio sequence using a tone mapping algorithm. The proposed algorithm then extends the standard H.264/AVC codec [13,14] to encode the LDR and ratio sequences. Whereas conventional methods attempt to maximize the quality of the reconstructed HDR sequence only, we develop an R–D optimized scheme to allocate a limited bit budget to the LDR and ratio sequences efficiently to maximize the qualities of both LDR and HDR sequences. Simulation results demonstrate that the proposed algorithm provides significantly better R–D performance than the conventional techniques. The rest of this paper is organized as follows. Section 2 briefly reviews related work. Section 3 describes the proposed HDR video compression algorithm, and Section 4 discusses experimental results. Finally, Section 5 concludes the paper.

2. Related work

2.1. Tone mapping

Although HDR images are expected to be employed in general applications in the future, they should also be usable with conventional display devices in the transition stage. Thus, the conversion from HDR images to LDR images is necessary, since conventional devices cannot display HDR images due to their limited dynamic ranges. This conversion is called tone mapping, tone reproduction, or dynamic range compression. Various algorithms have been proposed for tone mapping [1]. These tone mapping operators (TMO's) can be classified into global or local operators according to the methods for deriving mapping functions. A global operator uses a single curve to convert all pixels in an entire image. For example, the adaptive logarithmic mapping [15] reduces the dynamic range by a logarithmic function, which imitates the response of HVS to light. Mantiuk et al. [16] developed a global TMO to minimize the perceptual contrast distortion between an input HDR image and its tone-mapped LDR image. Mai et al. 
[17] proposed another global TMO for HDR video compression to improve the qualities of reconstructed HDR videos based on a statistical distortion model. On the other hand, a local operator derives the mapping for each pixel by considering a local neighborhood around the pixel. The photographic tone reproduction [18], the fast bilateral filter [19], and the gradient domain tone mapping [20] are examples of local operators. For video tone mapping, a high quality LDR video cannot be obtained by simply compressing the dynamic ranges of successive HDR frames independently. This simple approach often leads to flickering artifacts, which are caused by the lack of temporal coherence. One approach to overcome this problem is to smooth tone mapping parameters temporally, as done in [9]. Alternatively, state-of-the-art image tone mapping algorithms can be extended to video tone mapping by adding temporal coherency terms. For example, the gradient domain tone mapping scheme [20] has been extended to provide a temporally coherent LDR sequence by treating an HDR video as a three-dimensional array of pixels [21] or by exploiting the motion information in an input sequence [22]. Also, temporally coherent LDR sequences can be obtained by exploiting the temporal adaptation models of HVS [23,24]. The proposed HDR video coding algorithm is designed to be compatible with any TMO. As will be discussed in Section 4.4, a TMO can be selected depending on applications or user preferences.

2.2. HDR image and video compression

Recent video coding standards support video compression with extended bit-depths [25,13]. Also, several attempts have been made to add the HDR capability to existing video coding standards by developing bit-depth scalable video coding schemes
[10–12], which use bit-depths of 10 or 12 bits per pixel. In these approaches, tone-mapped LDR information is encoded as a base layer, and HDR refinement information, which is estimated using the inter-layer prediction, is encoded as an enhancement layer. However, their compression performance is not as good as that of the conventional 8-bit coding, since the standard codecs are optimized mainly for 8-bit data. Furthermore, because of the limited bit-depths, they cannot faithfully represent the full dynamic ranges of HDR images in the Radiance [4], OpenEXR [3], or LogLuv [5] formats. Only the recent image compression standard, JPEG XR [26,27], supports the coding of 16-bit integer and 32-bit floating-point pixel values. Research has also been conducted on encoding HDR data based on perceptual luminance quantization schemes. Xu et al. [6] extended JPEG 2000 to compress HDR images. They quantized an input HDR image with 12 bits per pixel and used it as the input to the JPEG 2000 codec. Mantiuk et al. [7] adopted a similar approach based on a luminance quantization method, which was optimized for the contrast threshold perception in HVS. They compressed quantized videos, using the MPEG-4 codec, with an additional scheme to transform high contrast blocks. Garbas and Thoma [28] proposed a temporally coherent luminance quantization based on the weighted prediction tool and the quantization adaptation, and compressed quantized videos using the H.264/AVC codec. However, these algorithms can be employed only when both the encoder and the decoder are HDR-enabled. HDR image and video compression algorithms that are backward compatible with conventional LDR devices have also been proposed. This can be achieved by separating an input HDR image into a tone-mapped LDR image and residual data, which represent the differences between HDR and LDR pixel values, and then compressing them with the standard codecs. 
At the decoder side, a conventional decoder reconstructs only the LDR data, whereas an HDR-enabled decoder can reconstruct the HDR data. In [8], Ward and Simmons proposed a backward-compatible compression algorithm for HDR images. They obtained an LDR image from an input HDR image using a tone mapping scheme. Then, they encoded the tone-mapped LDR image and the ratio image using the JPEG codec. In [29], to improve compression performance, Okuda and Adami adopted an inverse tone mapping function to predict HDR pixel values from tone-mapped LDR pixel values. They used a wavelet encoder to compress prediction residuals. In [9], Mantiuk et al. extended their HDR video compression algorithm in [7] to offer the backward compatibility with LDR devices. They proposed several tools, such as a residual frame filter, to improve compression performance. In [30], Lee and Kim employed the H.264/AVC codec to encode LDR and ratio sequences with a bit allocation scheme to enhance the qualities of the reconstructed LDR and HDR sequences. However, they controlled the qualities of LDR and ratio frames at the sequence level without rigorous distortion modeling. In contrast, in this work, we develop a more effective R–D optimization scheme at the macroblock level to maximize the qualities of both LDR and HDR sequences subject to the constraint on a limited bit budget.

3. HDR video compression

Fig. 1 shows the block diagram of the proposed HDR video encoder. The encoder takes HDR frames as the input, and converts them into LDR frames using a TMO, which can be selected from the conventional algorithms [15,23,19,22,18,24,20,21,17,16]. The LDR frames are compared with the original HDR frames to construct ratio frames. The luma (Y) and chroma (U, V) channels of the LDR frames and the luma channel of the ratio frames are encoded. Notice that, whereas both luma and chroma channels of the LDR sequence are encoded, only the luma channel of the ratio sequence
Fig. 1. The block diagram of the proposed HDR video encoder. An input HDR frame is tone-mapped and decomposed into an LDR frame and a ratio frame. Then, the LDR and ratio frames are encoded with a modified H.264 encoder to produce two streams: an LDR stream as a base layer and a ratio stream as an enhancement layer. ME stands for motion estimation, MC motion-compensated prediction, T transform, and Q quantizer.
is acquired and encoded. Then, the chroma channels of the HDR sequence are reconstructed by the decoder using the color values of LDR pixels. The LDR and ratio frames are encoded by an extended H.264/AVC encoder, which compresses an LDR stream as a base layer and a ratio stream as an enhancement layer. Both the LDR frames and the ratio frames are exploited in all modules of the extended H.264/AVC encoder, including motion estimation (ME), motion compensation (MC), transform (T), quantization (Q), mode decision, deblocking filter, intra prediction, and entropy coding. In this work, we implement the HDR video codec based on the H.264/AVC standard. The main contribution of this work is, however, to develop R–D optimized HDR coding techniques, rather than to apply those techniques to a specific video coding standard. We note that the proposed techniques can be easily adapted to any video coding standard based on motion-compensated transform coding, such as the emerging HEVC standard [31].

3.1. Luminance-to-luma conversion

To allocate a limited bit budget to LDR frames and ratio frames efficiently, we will develop an R–D optimized scheme in Section 3.3. To this end, we need the distortion measures for LDR and HDR frames, respectively. However, the distortions of HDR values cannot be directly compared with those of LDR values, since they represent different quantities [32]. An HDR value represents the luminance or the linear light intensity, which is defined as the radiance weighted by the spectral sensitivity function of HVS. On the other hand, an LDR value represents the luma, which is a weighted sum of nonlinear color components. The nonlinear components are obtained by processing linear color values with a nonlinear transfer function to approximate the lightness response of HVS. To measure and compare the distortions of HDR and LDR values fairly, we transform HDR luminance values into the perceptually uniform space of luma values. 
The relations between luminance and luma involve a power function in general. However, since power functions poorly match the sensitivity of HVS for large luminance values, Mantiuk et al. [33] proposed a more robust transform function. They derived a perceptually linearized measure so that luma values correlate with the human perception of brightness. In this work, we employ their conversion formula from HDR luminance y to HDR luma h and the inverse conversion formula, which were modeled with a threshold versus intensity (TVI) function in [33], given by

h(y) = { a·y,           if y < y₁,
       { b·y^c + d,     if y₁ ≤ y < y₂,     (1)
       { e·ln y + f,    if y ≥ y₂,

and

y(h) = { a′·h,              if h < h₁,
       { b′·(h + d′)^c′,    if h₁ ≤ h < h₂,  (2)
       { e′·exp(f′·h),      if h ≥ h₂,
respectively.¹ All coefficients are listed in Table 1. Note that, in this work, HDR luminance values are used only in the tone mapping and the color space transformation; the transformed HDR luma values are used in all the other modules of the encoder in Fig. 1.

3.2. Layered representation

We decompose an HDR frame into a backward compatible LDR frame and an enhancement layer, which represents the differences between the HDR and LDR frames. Several approaches have been proposed for the decomposition process. In [8], the enhancement layer consists of ratios, obtained by dividing the HDR value by the LDR value at each pixel location. In general, the HDR value and the LDR value are correlated, since they represent the same object in a scene. Inspired by this correlation, HDR values are predicted from LDR values in [29,9], and the prediction errors are employed, instead of ratios, as an enhancement layer. However, prediction errors are in general temporally less correlated and contain more high frequency components than ratios, especially when a local TMO is used. Thus, their compression efficiency is not as good as that of ratios. Therefore, as in [8], we employ ratios as the enhancement layer. A ratio is defined as
r(x, y) = a·ln( h(x, y) / l(x, y) ) + b,   (3)
where the constants a and b are selected at each frame such that the minimum and the maximum values of the ratio r are 0 and 255, respectively, and transmitted to the decoder as side information. Then, after the rounding operation, r can fit into the 1-byte representation in conventional image and video codecs. The logarithm operator suppresses unstable values when h and l are extremely large or extremely small. Fig. 2 shows examples of LDR frames and ratio frames. The luma channel of an LDR frame looks similar to the corresponding ratio frame, since both LDR and ratio frames represent the same scene. However, note that the ratio frames have less texture than the LDR frames. In [8], the spatial resolution of ratio frames is reduced to decrease the bit rate. This approach, how-
¹ Throughout this paper, l and h denote nonlinear luma values of LDR and HDR pixels, respectively, whereas y_l and y_h are the corresponding linear luminance values.
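For concreteness, the piecewise conversion in (1) and (2) can be sketched as follows. This is an illustrative implementation, not the reference code; the signs of d and f are inferred from the continuity of the piecewise curve and should be checked against Table 1.

```python
import math

# Coefficients of Eqs. (1) and (2), as tabulated in Table 1.
# The signs of D and F are inferred from continuity of the curve.
A, B, C, D = 17.554, 826.81, 0.10013, -884.17
E, F = 209.16, -731.28
Y1, Y2 = 5.6046, 10469.0
A_, B_, C_, D_ = 0.056968, 7.3014e-30, 9.9872, 884.17
E_, F_ = 32.994, 0.0047811
H1, H2 = 98.381, 1204.7

def luminance_to_luma(y):
    """Eq. (1): map linear HDR luminance y to perceptually uniform luma h."""
    if y < Y1:
        return A * y
    if y < Y2:
        return B * y ** C + D
    return E * math.log(y) + F

def luma_to_luminance(h):
    """Eq. (2): map HDR luma h back to linear luminance y."""
    if h < H1:
        return A_ * h
    if h < H2:
        return B_ * (h + D_) ** C_
    return E_ * math.exp(F_ * h)
```

A quick round trip, e.g. `luma_to_luminance(luminance_to_luma(100.0))`, recovers the input to within the precision of the tabulated coefficients, which confirms that (2) inverts (1).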
Table 1
Coefficients in Eqs. (1) and (2).

a  = 17.554       b  = 826.81          c  = 0.10013     d  = −884.17
e  = 209.16       f  = −731.28         y₁ = 5.6046      y₂ = 10,469
a′ = 0.056968     b′ = 7.3014×10⁻³⁰    c′ = 9.9872      d′ = 884.17
e′ = 32.994       f′ = 0.0047811       h₁ = 98.381      h₂ = 1204.7
Fig. 2. Examples of LDR and ratio frames of the ‘‘Tunnel’’ and ‘‘Cactus’’ sequences.
ever, degrades the qualities of reconstructed HDR sequences, even when pre-correction or post-correction techniques are employed. In this work, we encode ratio frames at their original resolutions.
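The layered decomposition in (3), and its inversion at the decoder, can be sketched as follows. This is a minimal numpy illustration; the guard constant eps is our own addition to avoid taking the logarithm of zero-valued pixels, and the function assumes the frame is not constant.

```python
import numpy as np

def ratio_frame(h, l, eps=1e-6):
    """Eq. (3): r = a*ln(h/l) + b, with a and b chosen per frame so that
    the minimum and maximum ratio values map to 0 and 255, respectively.
    h and l are luma arrays of the HDR frame and the tone-mapped LDR frame."""
    log_ratio = np.log((h + eps) / (l + eps))
    lo, hi = log_ratio.min(), log_ratio.max()
    a = 255.0 / (hi - lo)              # per-frame scaling constant
    b = -a * lo                        # per-frame offset
    r = np.round(a * log_ratio + b)    # after rounding, r fits into 1 byte
    return r.astype(np.uint8), a, b

def reconstruct_hdr(r, l, a, b):
    """Invert Eq. (3) at the decoder: h = l * exp((r - b) / a)."""
    return l * np.exp((r.astype(np.float64) - b) / a)
```

The only loss in this layer comes from rounding r to 8 bits, which perturbs the reconstructed HDR luma by at most a factor of exp(0.5/a).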
3.3. Rate-distortion modeling
After the layered representation, we have two sequences to encode: an LDR sequence and a ratio sequence. Their encoding should be jointly controlled to use a limited bit budget efficiently. In [8], Ward and Simmons used the highest quality setting for the ratio sequence, assuming that the ratio sequence is more important than the LDR sequence. Mantiuk et al. [9,34] experimentally observed that, when the same quality setting was used for both the LDR sequence and the enhancement layer, the HDR sequence could be reconstructed with a high quality. However, these strategies attempt to achieve good performance for the HDR sequence only. When we support the backward compatibility, the conventional decoder should be capable of decoding the LDR sequence with a high quality as well. Therefore, in this work, we consider the qualities of both HDR and LDR sequences simultaneously. Furthermore, we optimize encoding parameters at the macroblock (MB) level, whereas those parameters are adjusted at the entire sequence level in [9,30,34]. Specifically, we optimize the encoding parameters of the LDR component and the ratio component for each MB to minimize the distortions of the reconstructed LDR block (D_LDR) and the reconstructed HDR block (D_HDR), subject to the constraint on an overall bit budget. This constrained optimization problem can be solved by minimizing the Lagrangian cost function

J = D_LDR + μ·D_HDR + λ·(R_LDR + R_ratio),   (4)

where R_LDR and R_ratio denote the bit rates for the LDR and ratio blocks, respectively. In this work, the distortions are measured by the sum of squared differences (SSD) between original and reconstructed pixels. Also, λ is a Lagrangian multiplier, which controls the tradeoff between the rates and the distortions [35], and μ is another multiplier, which determines the relative importance of the HDR block as compared with the LDR block. The selection of the multipliers λ and μ will be described in Section 3.5 and at the end of this section, respectively. The LDR or HDR block is reconstructed at the decoder side, whereas the LDR and ratio blocks are compressed at the encoder side. Therefore, the distortion of the HDR block (D_HDR), instead of the distortion of the ratio block, is incorporated into the Lagrangian cost function in (4). It is technically possible that, for each set of encoding parameters, the encoder computes the exact distortion
D_HDR by emulating the decoder and reconstructing the HDR block. However, this requires prohibitively high complexity and is impractical for most applications. Therefore, to reduce the encoding complexity, we first derive a formula to estimate D_HDR from the distortions of the LDR block (D_LDR) and the ratio block (D_ratio). It is not straightforward to express D_HDR in terms of D_LDR and D_ratio, since HDR, LDR, and ratio pixels are related by the logarithm operator in (3). In the Appendix, based on a piecewise linear approximation of the logarithm function and statistical assumptions, we derive an approximate relationship among D_HDR, D_LDR, and D_ratio, which is given by

D_HDR = ( D_LDR + (E[l²]/a²)·D_ratio )·E[z²] = C_LDR·D_LDR + C_ratio·D_ratio,   (5)

where a is the constant in (3), and z = h/l at each pixel location. Also, C_LDR = E[z²] and C_ratio = E[l²]·E[z²]/a². From (5), we make the following observations: First, D_HDR increases as D_LDR or D_ratio gets higher. Second, for fixed D_LDR and D_ratio, D_HDR is proportional to E[z²], which represents the average squared ratio between HDR and LDR pixels. Third, D_HDR is also proportional to E[l²], which is the average squared value of LDR pixels.

Using the sufficiently high rate approximation for scalar quantization [36], the rates and distortions for the LDR block and the ratio block, respectively, can be modeled by

D_LDR = η_LDR·2^(−2·R_LDR),   (6)
D_ratio = η_ratio·2^(−2·R_ratio),   (7)

where η_LDR and η_ratio are constants depending on the distribution of the pixel values [37]. By substituting (5)–(7) into the Lagrangian cost function J in (4), we have

J = (1 + μ·C_LDR)·D_LDR + μ·C_ratio·D_ratio + λ·(R_LDR + R_ratio)
  = η_LDR·(1 + μ·C_LDR)·2^(−2·R_LDR) + η_ratio·μ·C_ratio·2^(−2·R_ratio) + λ·(R_LDR + R_ratio),   (8)

which is expressed in terms of the two rates R_LDR and R_ratio only. By differentiating (8) with respect to R_LDR and R_ratio and setting the derivatives to 0, we derive the bit allocation strategy to minimize J, which is given by

R_LDR = (1/ln 4)·ln[ η_LDR·(1 + μ·C_LDR) ] + δ,   (9)
R_ratio = (1/ln 4)·ln[ η_ratio·μ·C_ratio ] + δ,   (10)

where δ is a constant. Notice that the bit allocation ratio between the LDR block and the ratio block is determined by C_LDR and C_ratio, which are obtained from the input HDR and the tone-mapped LDR pixel values. In the next subsection, we will analyze the relationships among C_LDR, C_ratio, and the bit allocation ratio in more detail.

As mentioned earlier, the multiplier μ in (4) determines the relative importance of the HDR distortion D_HDR in comparison with the LDR distortion D_LDR. Thus, the selection of μ affects the reconstruction qualities of the LDR and HDR blocks. Different μ's can be chosen in different applications. For example, customers' interest can be considered to select μ, as done in the scalable video coding in [38]. In this work, we set μ as follows. Note that D_HDR in (5) is proportional to the average squared ratio E[h²/l²] and that D_HDR is much bigger than D_LDR in general. Therefore, to make D_HDR comparable to D_LDR, we set μ to be proportional to the ratio of the squared sums of LDR and HDR luma values by

μ = ω · Σ_{x,y} [l(x,y)]² / Σ_{x,y} [h(x,y)]²,   (11)

where l(x,y) and h(x,y) denote the luma values of the LDR block and the HDR block, respectively, and ω is a user parameter. When ω = 0, only the LDR quality is considered. As ω gets larger, more bits are allocated to the HDR block to increase its quality. Therefore, ω can control the tradeoff between the LDR quality and the HDR quality, depending on applications. In this work, ω is fixed to 1, unless specified otherwise.

3.4. Quantization parameter (QP) control

As shown in Fig. 1, an MB, which is composed of LDR and ratio components, is predicted spatially by the intra prediction or temporally by the motion-compensated prediction. Then, the prediction residuals are transformed and quantized. In this work, we use different quantization parameters (QP's) for the LDR component and the ratio component to control their bit rates adaptively. In a quantizer without entropy coding, the quantization step size Q_step is halved as the rate R is increased by 1 bit per sample [36]. In other words, Q_step can be approximated as

Q_step = κ·2^(−R),   (12)

where κ is a constant. Also, in H.264/AVC, Q_step is related to QP via

Q_step ∝ (1.12)^QP,   (13)

so that Q_step increases by a factor of approximately 12% for a unit increase in QP [14]. Therefore, QP can be modeled as an affine function of the rate

QP = c1 + c2·R,   (14)

where c1 and c2 are constants. Using this model, the bit allocation strategy in (9) and (10) can be expressed in terms of the QP's for the LDR block and the ratio block by

QP_LDR − QP_ratio = c1·ln[ μ·C_ratio / (1 + μ·C_LDR) ] + c2,   (15)

where c1 and c2 are constants. Note that the formula for Q_step in (12) is not accurate due to the entropy coding in H.264/AVC, and thus the affine model in (14) expresses the QP-rate relationship only approximately. In [39], Kamaci et al. analyzed the Q_step-rate relationship more precisely, which leads to a logarithm model QP = c1 + c2·ln R. However, although the affine model is less accurate than the logarithm model, it simplifies the derivation of the bit allocation strategy in (15), which has only two parameters c1 and c2. If the logarithm model is employed, the formula for QP_LDR − QP_ratio becomes more complicated with more parameters, which are harder to train. Therefore, we employ the affine model in this work.

We train the parameters c1 and c2 with three training sequences, which are not employed in the experiments in the next section. Fig. 3 shows the occurrence frequency of the optimal combinations (ln[μ·C_ratio/(1 + μ·C_LDR)], QP_LDR − QP_ratio) that maximize the R–D performance of MB's in the training sequences. More specifically, for a fixed QP_LDR for each MB, we search exhaustively for the optimal QP_ratio that minimizes the Lagrangian cost in (4). Then, we record the corresponding combination (ln[μ·C_ratio/(1 + μ·C_LDR)], QP_LDR − QP_ratio). This is repeated over all QP_LDR's and all MB's to obtain the occurrence frequency in Fig. 3. We see that the optimal relationship between ln[μ·C_ratio/(1 + μ·C_LDR)] and QP_LDR − QP_ratio can be approximated by a fitting line, as predicted from our analysis in (15). The line parameters c1 and c2 of this fitting line are computed using the method of least squares, which are 2.99 and 0.47, respectively. Therefore, we treat QP_ratio as a dependent variable upon QP_LDR, which is given by
Fig. 5. The same motion vector is used for both the LDR (l) block and the ratio (r) block in the motion compensated prediction. The motion vector is selected to minimize the overall sum of absolute differences.
Fig. 3. The occurrence frequency of the optimal combinations (ln[μ·C_ratio/(1 + μ·C_LDR)], QP_LDR − QP_ratio), which maximize the R–D performance on MB's in three test sequences. The solid line is fitted with the method of least squares.
QP_ratio = QP_LDR − 2.99·ln[ μ·C_ratio / (1 + μ·C_LDR) ] + 0.47   (16)
and control the overall bit rate by QP_LDR only. The relationship in (16) can be intuitively understood as follows. For a fixed QP_LDR, or equivalently for a fixed C_LDR, it can be shown that

C_ratio ∝ (1/a²)·E[l²] ∝ (ln z_max − ln z_min)²·E[l²],   (17)
where z_min and z_max are the minimum and maximum values of z, respectively. Thus, C_ratio increases with the dynamic range of HDR pixels in an MB. Therefore, the rule in (16) indicates that, when an MB has a higher dynamic range, the encoder should set a smaller QP_ratio to allocate more bits to the ratio component than to the LDR component. Fig. 4 shows examples of HDR, LDR, and ratio frames of the ''Cactus'' sequence, when QP_LDR = 24. Since the HDR pixel values are clamped for printing [40], the printed HDR frame in Fig. 4(a) does not show the details in very bright or very dark regions. However, those details are preserved faithfully during the encoding, as shown in the reconstructed LDR frame in Fig. 4(b).
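The MB-level rule in (16), together with the model constants from (5) and the multiplier from (11), can be sketched as follows. This is a simplified illustration with our own variable names; the fitted line parameters 2.99 and 0.47 are taken from the text above.

```python
import numpy as np

def qp_for_ratio_block(qp_ldr, l, h, a, omega=1.0):
    """Derive QP_ratio from QP_LDR via Eq. (16).
    l, h : LDR and HDR luma arrays of one macroblock.
    a    : scaling constant of the ratio definition in Eq. (3).
    omega: user parameter of Eq. (11) trading LDR vs. HDR quality."""
    z = h / l                                       # per-pixel HDR/LDR luma ratio
    c_ldr = np.mean(z ** 2)                         # C_LDR = E[z^2]
    c_ratio = np.mean(l ** 2) * c_ldr / a ** 2      # C_ratio = E[l^2] E[z^2] / a^2
    mu = omega * np.sum(l ** 2) / np.sum(h ** 2)    # Eq. (11)
    qp_ratio = qp_ldr - 2.99 * np.log(mu * c_ratio / (1.0 + mu * c_ldr)) + 0.47
    return int(round(float(qp_ratio)))              # QP's are integers in H.264/AVC
```

For a nearly flat block whose HDR luma is simply twice the LDR luma, the log term is small and QP_ratio stays close to QP_LDR; blocks with a wide spread of z receive a noticeably smaller QP_ratio, as the text explains.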
In Fig. 4(c), the reconstructed ratio frame is overlaid with colors, which represent the QP differences (QP_LDR − QP_ratio) for MB's. Red and blue depict positive and negative differences, respectively. Red MB's correspond to the regions with higher dynamic ranges, and are assigned smaller QP_ratio's. For example, some MB's containing stairs and background trees have both very bright and very dark pixels, and thus they are assigned smaller QP_ratio's and encoded at higher bit rates. On the other hand, blue MB's on the pot are almost textureless and have very low dynamic ranges. Therefore, they are assigned larger QP_ratio's.

3.5. Mode decision and motion estimation

The H.264/AVC standard supports various intra prediction modes and various motion compensation modes to encode MB's. Among them, we should select the optimal encoding mode, and also the optimal motion vectors if a motion compensation mode is chosen, for each MB to achieve the highest compression performance. In this work, to encode HDR sequences, we modify the mode decision and motion estimation procedures of the H.264/AVC standard accordingly. Since the LDR component and the ratio component of an MB represent the same scene, as illustrated in Fig. 2, they can share some data elements in the encoded bitstream to reduce the amount of header bits to be transmitted to the decoder. The proposed HDR video encoder is designed to share the same MB mode, the same intra prediction mode, and the same motion vectors. Specifically, given a bit budget, the proposed algorithm determines the common mode to minimize the distortions of LDR and HDR blocks, and the common motion vectors to minimize the prediction errors of LDR and ratio blocks. We describe these two procedures in turn. For the intra prediction, the H.264/AVC reference software [41] chooses the best mode among all possible intra prediction modes,
Fig. 4. Examples of (a) HDR, (b) LDR, and (c) ratio frames. In (a), HDR pixel values are clamped for printing. In (c), the ratio frame is overlaid with colors, which represent the QP differences (QP_LDR − QP_ratio) for MB's. Red and blue depict positive and negative differences, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Table 2
Properties of the test HDR sequences used in the experiments.

Sequence       Frame resolution   Number of frames   Average dynamic range
Tunnel         320 × 240          300                2.23
Sun            320 × 240          300                5.69
Parking Lot    320 × 240          300                3.30
Cactus         320 × 240          300                3.82
J motion ¼ SADðv x ; v y Þ þ kmotion Rðv x ; v y Þ;
which minimizes the Lagrangian cost D þ kR. We modify the software to use our Lagrangian cost in (4), where DHDR is estimated by the model in (5). In other words, we choose the intra prediction mode that minimizes
J ¼ ð1 þ lC LDR ÞDLDR þ lC ratio Dratio þ kðRLDR þ Rratio Þ:
ð18Þ
For the motion compensation, H.264/AVC supports variable block sizes from 4 4 to 16 16 luma samples. The proposed algorithm estimates the motion vector of each block as follows. As shown in Fig. 5, the motion vector ðv x ; v y Þ of an LDR block is used for the corresponding ratio block as well. For each motion vector candidate ðv x ; v y Þ within a search range ½W; W ½W; W, we compute the sum of absolute differences (SAD) by
X SADðv x ; v y Þ ¼ jlðx; yÞ l0ðx þ v x ; y þ v y Þj x;y
þ
X x;y
jrðx; yÞ r 0 ðx þ v x ; y þ v y Þj;
where l and r denote the LDR and ratio pixel values in the current 0 frame, and l and r 0 are the LDR and ratio pixel values in the reference frame, respectively. Then, similarly to the reference software [41], we select the motion vector that minimizes the cost Jmotion , given by
ð19Þ
ð20Þ
where Rðv x ; v y Þ is the number of bits to encode the motion vector ðv x ; v y Þ, and kmotion is the Lagrangian multiplier for the rate-constrained motion estimation, which is set as
kmotion ¼
pffiffiffiffiffiffiffiffiffiffi ðQP 12Þ=6 0:85 2 LDR :
ð21Þ
pffiffiffiffiffiffiffiffiffiffi Note that the constant 0:85 was experimentally obtained in [35,42]. After the encoder chooses the best intra prediction mode and the best set of motion vectors for each motion compensation mode, it determines the mode that minimizes the overall Lagrangian cost in (18). The multiplier k, which controls the tradeoff between distortions and rates, is set to
k ¼ k2motion ;
ð22Þ
as suggested in [35,42]. Notice that the proposed algorithm controls the overall bit rate by varying the single parameter QPLDR , and sets all the other parameters QPratio ; kmotion , and k via (16), (21), and (22), respectively.
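As an illustration, the shared motion search described above can be sketched in Python. The joint SAD of (19) and the cost of (20) follow the text; the motion-vector rate term `mv_bits` is a simplified stand-in for the entropy-coded vector rate of the real H.264/AVC encoder, so this is a sketch under that assumption, not the reference implementation:

```python
import numpy as np

def lambda_motion(qp_ldr):
    """Eq. (21): Lagrangian multiplier for rate-constrained motion estimation."""
    return np.sqrt(0.85 * 2.0 ** ((qp_ldr - 12) / 6))

def joint_motion_search(l_cur, r_cur, l_ref, r_ref, x0, y0, bsize, W, qp_ldr):
    """Search the single motion vector shared by an LDR block and its ratio block.

    The SAD of Eq. (19) sums absolute differences over both components, and the
    cost of Eq. (20) adds a rate term for the vector bits. `mv_bits` is a crude
    proxy for the true motion-vector rate (an assumption for this sketch).
    """
    lam = lambda_motion(qp_ldr)
    cur_l = l_cur[y0:y0 + bsize, x0:x0 + bsize].astype(np.int64)
    cur_r = r_cur[y0:y0 + bsize, x0:x0 + bsize].astype(np.int64)
    best, best_cost = (0, 0), float("inf")
    for vy in range(-W, W + 1):
        for vx in range(-W, W + 1):
            ys, xs = y0 + vy, x0 + vx
            if ys < 0 or xs < 0 or ys + bsize > l_ref.shape[0] or xs + bsize > l_ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            ref_l = l_ref[ys:ys + bsize, xs:xs + bsize].astype(np.int64)
            ref_r = r_ref[ys:ys + bsize, xs:xs + bsize].astype(np.int64)
            # Eq. (19): joint SAD over the LDR and ratio components.
            sad = int(np.abs(cur_l - ref_l).sum() + np.abs(cur_r - ref_r).sum())
            mv_bits = 2 * (abs(vx) + abs(vy)) + 2  # simplified vector rate
            cost = sad + lam * mv_bits             # Eq. (20)
            if cost < best_cost:
                best_cost, best = cost, (vx, vy)
    return best, best_cost
```

Because a single vector serves both components, a candidate that matches only the LDR block but not the ratio block accumulates a large ratio-component SAD and tends to be rejected.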
Fig. 6. PSNR performance comparison on the luma components of the HDR sequences. The unit for bit rate is bits per pixel (bpp). The data for Mantiuk et al.'s algorithm [9] in (a) were provided by Dr. Mantiuk.
Fig. 7. PSNR performance comparison on the chroma components of the HDR sequences. The data for Mantiuk et al.'s algorithm [9] in (a) and (b) were provided by Dr. Mantiuk.
Fig. 8. PSNR performance comparison on the tone-mapped LDR sequences. The data for Mantiuk et al.'s algorithm [9] in (a) were provided by Dr. Mantiuk.
3.6. Reconstruction of HDR color components

While both luma and chroma components of the LDR sequence are encoded, only the luma component of the ratio sequence is defined and encoded. Thus, to reconstruct HDR color values, the decoder should estimate the color components of the HDR sequence from those of the LDR sequence. Several algorithms have been developed for this purpose. For example, Ward and Simmons [43] proposed a saturation compensation scheme, assuming that the saturation change can be estimated from LDR information. Okuda and Adami [29] proposed another saturation compensation scheme, which estimates HDR values from LDR values using a polynomial approximation. In this work, we use a simpler saturation compensation scheme to make the HDR color reconstruction consistent with the tone mapping, assuming that the amount of saturation is the same for all color components. Specifically, in order to obtain the color channels of each LDR pixel during the tone mapping, we use the following formula in [1]:
$$\begin{bmatrix} r_l \\ g_l \\ b_l \end{bmatrix} = \begin{bmatrix} y_l \left( r_h / y_h \right)^s \\ y_l \left( g_h / y_h \right)^s \\ y_l \left( b_h / y_h \right)^s \end{bmatrix}, \quad (23)$$

where y_h and y_l are the linear luminance values of HDR and LDR pixels, respectively. r_l, g_l, and b_l are the linear R, G, B components of the tone-mapped LDR pixel, while r_h, g_h, and b_h are the linear HDR color components. The exponent s, which is less than or equal to 1, is a saturation parameter and represents the correlation between LDR and HDR color components.

Then, we use the inverse mapping of (23) to reconstruct the linear RGB color vector (r̂_h, ĝ_h, b̂_h) of the HDR pixel at the decoder side, which is given by

$$\begin{bmatrix} \hat{r}_h \\ \hat{g}_h \\ \hat{b}_h \end{bmatrix} = \begin{bmatrix} \hat{y}_h \left( \hat{r}_l / \hat{y}_l \right)^{1/s} \\ \hat{y}_h \left( \hat{g}_l / \hat{y}_l \right)^{1/s} \\ \hat{y}_h \left( \hat{b}_l / \hat{y}_l \right)^{1/s} \end{bmatrix}, \quad (24)$$

where ŷ_l and ŷ_h are the linearized luminance values of the reconstructed LDR and HDR pixels, respectively. Also, r̂_l, ĝ_l, and b̂_l denote the linearized R, G, B components of the reconstructed LDR pixel. The saturation parameter s is set to be the same as in (23) and is transmitted from the encoder to the decoder as side information. Note that the mapping in (24) is similar to the saturation compensation scheme in [29].
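The mapping in (23) and its inverse (24) form an exact round trip when the same saturation parameter s is used at both ends; with quantized LDR values the reconstruction is only approximate. A minimal sketch (function names and the sample values are illustrative):

```python
import numpy as np

def tone_map_color(rgb_h, y_h, y_l, s=1.0):
    """Eq. (23): LDR color channels from the HDR color and the two luminances."""
    return y_l * (rgb_h / y_h) ** s

def reconstruct_hdr_color(rgb_l, y_l, y_h, s=1.0):
    """Eq. (24): inverse saturation compensation used at the decoder."""
    return y_h * (rgb_l / y_l) ** (1.0 / s)
```

For example, mapping an HDR color with s = 0.8 and then inverting it with the same s recovers the original channels up to floating-point error, which is why s must be sent as side information.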
Fig. 9. Comparison of the overall distortions D_overall = D_LDR + μD_HDR at various bit rates. The data for Mantiuk et al.'s algorithm [9] in (a) were provided by Dr. Mantiuk.

4. Simulation results

4.1. Experimental conditions

We evaluate the performance of the proposed algorithm on four HDR sequences. Table 2 summarizes the properties of these HDR sequences. Whereas the average dynamic range of the "Sun" sequence is very high, it has only a small region of an extreme dynamic range, and most regions have little texture and lower dynamic ranges. We acquired the "Parking Lot" and "Cactus" sequences, whereas the "Tunnel" and "Sun" sequences were provided by Krawczyk [44]. We captured short and long exposure frames alternately using a conventional LDR camera at a frame rate of 200 frames per second (fps), and then synthesized HDR frames from successive LDR frames as done in [45]. Finally, we sampled every seventh frame to reduce the frame rate of the HDR video sequence to about 28.6 fps. The average dynamic range of a sequence is defined as

$$\frac{1}{N} \sum_{i=1}^{N} \log_{10} \frac{y_{i,\max}}{y_{i,\min}}, \quad (25)$$
where N is the number of frames in the sequence, and y_{i,min} and y_{i,max} denote the minimum and maximum luminance values of the ith HDR frame, respectively. The saturation parameter s in (23) is fixed to 1, and the quantization parameters of the LDR sequences (QP_LDR) are set to 12, 16, 20, 24, or 28 to evaluate the compression performance at various bit rates. Finally, the parameter ω in (11) is fixed to 1 to equalize the importance of the HDR sequence and the LDR sequence, except for the results in Fig. 10, which shows the impacts of the parameter ω on the LDR and HDR distortions. We test four tone mapping schemes, which will be discussed in Section 4.4, using the software in [46]. Also, we implement the proposed HDR video encoding algorithm by extending the JM 10.2 implementation [41] of the baseline profile of the H.264/AVC standard.

4.2. Compression performance

The compression performance of the proposed algorithm is evaluated in this section. Specifically, we compare the distortions of HDR and LDR sequences and the overall distortions, respectively, at various bit rates. In Figs. 6–9, blue curves show the performance of the proposed R–D optimized encoding algorithm. For comparison, we provide the performance of three conventional approaches. Green curves are for the simulcast method that uses H.264/AVC to encode an LDR sequence and a ratio sequence independently with the same QP, as done in [9]. Red curves are for the simulcast method using the smallest QP for a ratio sequence, as done in [8]. Violet curves are for the sequence level R–D optimized encoding algorithm [30]. Results in Figs. 6–9 are obtained using the gradient domain video tone mapping [22], but other TMO's also produce similar results. It is unfair to compare the proposed algorithm directly with Mantiuk et al.'s algorithm [9], whose performance on the "Tunnel" sequence is shown in Figs. 6–9 as light blue curves, since it employs MPEG-4 as a basis codec. Note that MPEG-4 is much inferior to H.264/AVC in terms of compression performance. Therefore, for the simulcast methods and the sequence level R–D optimization algorithm in Figs. 6–9, we employ H.264/AVC as a basis codec.
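For reference, the average dynamic range metric in (25) can be computed directly from per-frame minimum and maximum luminances. The helper below is a minimal sketch; the list of (y_min, y_max) pairs is a hypothetical input format:

```python
import math

def average_dynamic_range(frames_minmax):
    """Eq. (25): mean of log10(y_max / y_min) over the N frames of a sequence.

    `frames_minmax` is a list of per-frame (y_min, y_max) luminance pairs
    (an assumed input format for this sketch).
    """
    n = len(frames_minmax)
    return sum(math.log10(y_max / y_min) for y_min, y_max in frames_minmax) / n
```

A frame spanning luminances from 0.01 to 100 contributes four orders of magnitude (log10 of 10^4) to the average, matching the log10 units of Table 2.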
Fig. 10. Compression performance on the "Tunnel" sequence for different values of ω.
Fig. 6 shows the PSNR performances on the luma components of the HDR sequences at various bit rates. The PSNR's of the reconstructed HDR sequences are computed with luma values, obtained by the conversion formula in (1), and the unit for bit rate is bits per pixel (bpp). We observe that, on all test sequences, the proposed algorithm provides better performance than the simulcast with the same QP setting (QP_ratio = QP_LDR) by employing the R–D optimized bit allocation and sharing MB modes and motion vectors. For example, the proposed algorithm produces about 0.6, 0.4, 0.3, and 0.5 dB higher PSNR results at 0.5 bpp, and about 0.5, 0.3, 0.4, and 0.4 dB higher PSNR results at 1 bpp on the "Tunnel", "Sun", "Parking Lot" and "Cactus" sequences, respectively. We also see that the simulcast using the smallest QP for a ratio sequence yields inferior R–D performance due to its wasteful bit usage for the ratio sequence. The sequence level R–D optimization outperforms all the other methods on the "Tunnel" sequence, whereas it achieves even worse performance than the simulcast with the same QP setting on the other three sequences. This is because the sequence level R–D optimization assigns a fixed QP_ratio to all ratio MB's in a frame without considering the characteristics of individual MB's. This is acceptable only for the "Tunnel" sequence with a low dynamic range, in which ratio frames have little texture, but is not efficient for the other sequences with higher dynamic ranges. However, even on the "Tunnel" sequence, the proposed algorithm yields better performance than the sequence level R–D optimization in terms of LDR PSNR's and overall distortions, as will be discussed later in this section.

Fig. 7 compares the PSNR performances on the chroma components of the reconstructed HDR sequences. The chroma PSNR's are computed with 8-bit uniform chromaticity scales u′ and v′ as defined in [9,33]. We see that the chroma PSNR curves show similar
tendencies to the luma PSNR curves in Fig. 6. Specifically, the proposed algorithm provides better performance than the simulcast methods on all test sequences. Also, the sequence level R–D optimization yields the best performance on the ‘‘Tunnel’’ sequence, while providing lower PSNR’s than the proposed algorithm on the other three sequences. To support the backward compatibility with conventional LDR devices, it is also important to achieve high quality reconstruction of a tone-mapped LDR sequence. Fig. 8 compares the PSNR performances on the LDR sequences. The proposed algorithm achieves significantly better performance than the simulcast methods. For example, at 1 bpp, the proposed algorithm provides about 1.3, 0.2, 2.0, and 0.8 dB higher PSNR results on the ‘‘Tunnel’’, ‘‘Sun’’, ‘‘Parking Lot’’ and ‘‘Cactus’’ sequences, respectively, than the simulcast with the same QP setting. The performance gaps between the proposed algorithm and the simulcast with the same QP setting on the ‘‘Sun’’ and ‘‘Cactus’’ sequences with relatively higher dynamic ranges are not as high as those of the ‘‘Tunnel’’ and ‘‘Parking Lot’’ sequences, since the proposed algorithm allocates more bits to the ratio sequences for HDR sequences with higher dynamic ranges via (16). Contrary to the HDR PSNR results, the sequence level R-D optimization yields lower LDR PSNR’s than the proposed algorithm on the ‘‘Tunnel’’ sequence, while providing higher LDR PSNR’s on the other three sequences. This is because the proposed algorithm
Table 3. Comparison of the average encoding times for encoding a frame in the "Tunnel" sequence at QP_LDR = 20.

  Exhaustive search   Proposed algorithm
  41.20 s             5.94 s
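Conceptually, the exhaustive baseline compared in Table 3 evaluates the MB-level Lagrangian cost of (18) for every candidate QP_ratio, whereas the proposed algorithm computes QP_ratio directly from QP_LDR via the rule in (16). A minimal sketch of the baseline, in which the `encode_mb` callback and the candidate set are hypothetical stand-ins for an actual trial encoding:

```python
def lagrangian_cost(d_ldr, d_ratio, r_ldr, r_ratio, mu, c_ldr, c_ratio, lam):
    """Eq. (18): MB-level Lagrangian cost combining the LDR and ratio
    distortions and rates."""
    return (1 + mu * c_ldr) * d_ldr + mu * c_ratio * d_ratio + lam * (r_ldr + r_ratio)

def exhaustive_qp_ratio(encode_mb, qp_candidates=range(0, 52)):
    """Baseline strategy: trial-encode the MB's ratio component at every
    candidate QP_ratio and keep the QP with the smallest Lagrangian cost.
    `encode_mb(qp)` is a hypothetical callback returning that cost."""
    return min(qp_candidates, key=encode_mb)
```

Each candidate QP requires a full trial encoding of the MB, which is why the exhaustive search in Table 3 is roughly an order of magnitude slower than the rule-based decision.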
Fig. 11. Comparison of the overall distortions of the exhaustive search and the proposed algorithm.
attempts to minimize the overall distortion, not just the LDR distortion, by adapting to the characteristics of MB's. Thus, on the "Sun," "Parking Lot," and "Cactus" sequences, the proposed algorithm sacrifices LDR PSNR's to increase HDR PSNR's, thereby reducing the overall distortions.

Fig. 9 compares the overall distortions D_overall = D_LDR + μD_HDR. The LDR distortion D_LDR and the HDR distortion D_HDR, respectively, are defined as the sums of squared differences between the luma values of original and reconstructed pixels, and μ is determined by (11). The proposed algorithm provides lower overall distortions than the conventional simulcast methods and the sequence level R–D optimization method, by allocating bits adaptively according to the dynamic range of each MB. Compared with the simulcast with the same QP setting, the proposed algorithm reduces overall distortions efficiently, except for the "Sun" sequence. Specifically, at the overall distortion 20, the proposed R–D optimized algorithm uses about 30.9%, 15.4%, and 14.1% fewer bits than the simulcast with the same QP setting to encode the "Tunnel", "Parking Lot" and "Cactus" sequences, respectively. The R–D curve of the proposed algorithm on the "Sun" sequence is almost the same as that of the simulcast method using the same QP. This is because, although the "Sun" sequence has a small region of an extreme dynamic range, most of its regions have little texture and lower dynamic ranges. The simulcast with the same QP also achieves quite good performance on average when a sequence consists mainly of MB's with low dynamic ranges, for which ratio components have similar importance to LDR components. Moreover, as compared
with the sequence level R–D optimization method, the proposed algorithm reduces the overall distortions consistently on all sequences. In particular, the proposed algorithm reduces the distortions more effectively on the "Sun" and "Cactus" sequences. Those sequences contain MB's with extremely high dynamic ranges, for which the proposed algorithm, based on the MB level adaptation, significantly outperforms the sequence level optimization.

The proposed HDR video compression algorithm allocates a limited bit budget to LDR and ratio sequences, and a single parameter ω in (11) controls the bit allocation between these two sequences. Fig. 10 shows the compression performance on the "Tunnel" sequence for three different values of ω. When the smallest ω is used, i.e., ω = 0.5, the proposed algorithm yields the worst performance in terms of HDR PSNR, while providing the highest LDR PSNR, since the HDR distortion D_HDR is weighted by a smaller μ in computing the overall distortion in (4). On the contrary, as ω gets larger, the HDR PSNR increases, whereas the LDR PSNR decreases. The overall distortion is not much affected by different values of ω. Note that, for a fixed λ, the bit rate increases as ω gets larger, since the distortion is emphasized in the R–D cost function in (4). The parameter ω can be selected adaptively depending on applications.

4.3. Comparison with exhaustive search

The proposed algorithm controls the bit allocation to the LDR and ratio sequences based on the estimation of the HDR distortion D_HDR in (5). Without the estimation, for a fixed QP_LDR, the encoder
Fig. 12. Comparison of compression performance for different TMO’s on the ‘‘Tunnel’’ sequence.
Fig. 13. Bit usage for LDR and ratio components.
should exhaustively search for the optimal QP_ratio that minimizes the Lagrangian cost in (4) for each MB to optimize the R–D performance. However, the exhaustive search method requires high computational complexity, even though it can provide the optimal R–D performance. Table 3 lists the average encoding times for encoding a frame in the "Tunnel" sequence at QP_LDR = 20. We use a PC with a 2.8 GHz CPU for this test. The times for performing the tone mapping and obtaining the ratio sequences are not included in Table 3, since the tone mapping is independent of the encoding and the ratio sequences are obtained once and used for all tests. As compared with the exhaustive search method, the proposed algorithm significantly reduces the encoding time by deciding QP_ratio via the rule
in (16). More specifically, the proposed algorithm is about 6.9 times faster than the exhaustive search method. Moreover, we note that the proposed algorithm provides similar encoding times to the simulcast methods, since the proposed algorithm requires only a few more computations to estimate HDR distortions in the R–D optimization procedure. Fig. 11 compares the overall distortions Doverall of the exhaustive search method and the proposed algorithm. At the cost of very high computational complexity, the exhaustive search provides the optimal R–D performance. Thus, the proposed algorithm provides inferior performance to the exhaustive search. However, the differences in the R–D curves are negligible on the ‘‘Tunnel’’ and the ‘‘Sun’’ sequences. Moreover, as shown in the last section, the
Fig. 14. Examples of reconstructed HDR frames. The pixel values in each region, marked by a blue or red rectangle, are linearly scaled to a displayable range and shown separately. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
proposed algorithm provides significantly better R–D performance than the conventional simulcast methods.

4.4. Effects of tone mapping operators

We use a TMO to separate an input HDR video sequence into an LDR sequence and a ratio sequence. The proposed HDR video encoder works independently of a TMO, and any TMO can be used for this purpose. The selection of a TMO, however, affects the compression efficiency of an HDR video sequence. Let us evaluate the compression performance using four TMO's: the fast bilateral filter [19], the photographic tone reproduction [18], the adaptive logarithmic mapping [15], and the gradient domain video tone mapping [22]. Fig. 12 compares the compression performance on the "Tunnel" sequence using the four TMO's. Although the TMO's provide similar compression performances, they exhibit different tendencies for each quality metric. The fast bilateral filter provides the best performance in terms of the overall distortion and the LDR PSNR, whereas the adaptive logarithmic mapping outperforms the others on the HDR PSNR. Furthermore, it was observed that different sequences favor different TMO's. Therefore, the selection of a TMO should depend on applications and the characteristics of input sequences. Note that similar observations were made in [9,17].

4.5. Bit use analysis

The proposed HDR video coding algorithm extends the H.264/AVC standard by adding three types of syntax elements for ratio
Fig. A.15. The domain of the logarithm function is divided into M partitions. Then, the logarithm is approximated as a linear segment within each partition.
blocks: coded block patterns, transform coefficients, and QP's. Fig. 13 analyzes the bit usage on the "Tunnel" and "Cactus" sequences. In this test, the gradient domain video tone mapping [22] is employed. Bits are used for LDR pixels, ratio pixels, and parameters that are employed during the tone mapping and ratio frame generation. However, the bits for these parameters are negligible and are not plotted in Fig. 13. Since the dynamic range of the "Tunnel" sequence is relatively low, its ratio frames have little texture with large flat regions, as shown in Fig. 2. As a result, larger QP's are assigned to many MB's in the ratio sequence, and fewer ratio bits are required. On the other hand, the dynamic range of the "Cactus" sequence is
higher than that of the "Tunnel" sequence, and the ratio frames contain more complex textures. This causes smaller QP's for the ratio frames, requiring a higher bit rate. Fig. 13 also indicates that, in addition to the LDR bits, about 25–94% more bits are required to transmit HDR videos. Although this portion of ratio bits is larger than that of the simulcast method in [9], the proposed algorithm provides significantly better R–D performance than the simulcast method.
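The overhead figures quoted above follow from a simple ratio of the two layers' bit counts; a one-line helper makes the computation explicit (the bit counts in the example are illustrative, not measured values):

```python
def ratio_overhead_percent(ldr_bits, ratio_bits):
    """Extra rate, as a percentage of the LDR layer, needed to also carry
    the ratio layer of the layered bitstream."""
    return 100.0 * ratio_bits / ldr_bits
```

For instance, a frame spending 1000 bits on the LDR layer and 250 bits on the ratio layer incurs a 25% overhead, the low end of the range observed in Fig. 13.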
4.6. Decoding examples

Fig. 14 shows selected frames of the reconstructed HDR sequences. In this test, QP_LDR = 12, and the bit rates for the "Tunnel", "Sun", "Parking Lot" and "Cactus" sequences are 2.11, 1.44, 1.81, and 2.17 bpp, respectively. Also, the gradient domain video tone mapping [22] is employed. Compared with an LDR frame, an HDR frame can represent very dark and very bright regions simultaneously. Since HDR pixel values are clamped for the purpose of printing [40], very dark and very bright regions within blue and red rectangles appear to lose visual details in Fig. 14. To show these details, we scale the corresponding histograms to a displayable range. For example, in the "Parking Lot" sequence, the front window and headlights of the vehicle are too bright, and the wall of the tunnel is too dark. Similarly, in the "Sun" sequence, the signpost, streetlight, and clouds are not clear. However, after scaling the histograms, all details of these objects are recognizable. This indicates that the proposed encoding algorithm preserves the details in HDR sequences faithfully by utilizing limited bit budgets efficiently.

5. Conclusions

In this work, we proposed an R–D optimized compression algorithm for HDR videos. The proposed algorithm decomposes an input HDR sequence into a tone-mapped LDR sequence and a ratio sequence. We also derived a distortion model of an HDR block, which is expressed in terms of the distortions of the LDR block and the ratio block. Then, based on the distortion model, we proposed an efficient bit allocation scheme, which maximizes the qualities of both LDR and HDR sequences subject to the constraint on a total bit rate for LDR and ratio components. Simulation results demonstrated that the proposed algorithm provides significantly better R–D performance than the conventional simulcast methods [9,8].

Acknowledgments

We thank Dr. Grzegorz Krawczyk for making the HDR sequences "Tunnel" and "Sun" available for our experiments, and Dr. Rafał Mantiuk for providing valuable comments and the experimental data on the "Tunnel" sequence for comparison. This work was supported partly by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2012-011031), and partly by the Global Frontier R&D Program on Human-centered Interaction for Coexistence, funded by the National Research Foundation of Korea grant funded by the Korean Government (MEST) (NRF-M1AXA003-2011-0031648).

Appendix A. Derivation of the distortion model D_HDR in (5)

The relation among HDR, LDR, and ratio signals in (3) can be written as

$$r = a \ln \frac{h}{l} + b = a \ln z + b, \quad (A.1)$$

where z = h/l. Let us divide the domain of the logarithm function in (A.1) into M partitions, as shown in Fig. A.15. Then, assuming that M is large enough, the logarithm is approximated as a linear function of z within each partition {z : z_{k−1} ≤ z < z_k}, given by

$$r = \frac{a}{z_{k-1}} (z - z_{k-1}) + a \ln z_{k-1} + b = a_k z + b_k, \quad (A.2)$$

where a_k = a/z_{k−1} and b_k = a ln z_{k−1} − a + b. Since z = h/l, this implies

$$h = \frac{l}{a_k} (r - b_k). \quad (A.3)$$

Therefore, in the kth partition {z : z_{k−1} ≤ z < z_k}, the squared error of the HDR signal can be written as

$$(h - \hat{h})^2 = \frac{1}{a_k^2} \left\{ l (r - b_k) - \hat{l} (\hat{r} - b_k) \right\}^2 = \frac{1}{a_k^2} \left\{ (l - \hat{l})(r - b_k) + \hat{l} (r - \hat{r}) \right\}^2, \quad (A.4)$$

where ĥ, l̂, and r̂ denote the reconstructed HDR, LDR, and ratio signals, respectively. Let B_k denote the event that z belongs to the kth partition, and P_k denote its probability

$$P_k = \Pr[B_k] = \Pr[z_{k-1} \le z < z_k]. \quad (A.5)$$

Then, the conditional squared error of the HDR signal given the event B_k can be approximated as

$$E[(h - \hat{h})^2 \mid B_k] = \frac{1}{a_k^2} E\!\left[ \left\{ (l - \hat{l})(r - b_k) + \hat{l} (r - \hat{r}) \right\}^2 \mid B_k \right] \simeq \frac{1}{a_k^2} E[(l - \hat{l})^2 \mid B_k] \, E[(r - b_k)^2 \mid B_k] + \frac{1}{a_k^2} E[l^2 \mid B_k] \, E[(r - \hat{r})^2 \mid B_k], \quad (A.6)$$

where, in order to derive a simple approximate distortion model, we assume that the l components and the r components are uncorrelated, E[(l − l̂) l̂ | B_k] = 0, and E[l² | B_k] = E[l̂² | B_k]. Furthermore, assuming that the LDR signal value and the distortions of LDR and ratio signals are independent of the event B_k, we have E[l² | B_k] = E[l²], E[(l − l̂)² | B_k] = D_LDR, and E[(r − r̂)² | B_k] = D_ratio. Then, (A.6) can be rewritten as

$$E[(h - \hat{h})^2 \mid B_k] = \frac{1}{a_k^2} E[(r - b_k)^2 \mid B_k] \, D_{\mathrm{LDR}} + \frac{1}{a_k^2} E[l^2] \, D_{\mathrm{ratio}}. \quad (A.7)$$

From (A.2), we have r − b_k = a_k z ≃ a_k z_{k−1}. Then, E[(r − b_k)² | B_k] in (A.7) can be approximated as

$$E[(r - b_k)^2 \mid B_k] \simeq a_k^2 z_{k-1}^2. \quad (A.8)$$

Substituting (A.8) into (A.7), we have

$$E[(h - \hat{h})^2 \mid B_k] = z_{k-1}^2 D_{\mathrm{LDR}} + \frac{z_{k-1}^2}{a^2} E[l^2] \, D_{\mathrm{ratio}}. \quad (A.9)$$

Finally, D_HDR is obtained by summing E[(h − ĥ)² | B_k] P_k in (A.9) over all partitions, given by

$$D_{\mathrm{HDR}} = \sum_{k=1}^{M} E[(h - \hat{h})^2 \mid B_k] \, P_k = \left( D_{\mathrm{LDR}} + \frac{E[l^2]}{a^2} D_{\mathrm{ratio}} \right) \sum_{k=1}^{M} z_{k-1}^2 P_k. \quad (A.10)$$

As the number of partitions, M, approaches infinity, the sum of z²_{k−1} P_k in (A.10) can be written as an integral,

$$\lim_{M \to \infty} \sum_{k=1}^{M} z_{k-1}^2 P_k = \lim_{M \to \infty} \sum_{k=1}^{M} z_{k-1}^2 f(z_{k-1}) \Delta = \int_{0}^{\infty} z^2 f(z) \, dz = E[z^2], \quad (A.11)$$

where Δ denotes the partition size z_k − z_{k−1}, and f is the probability density function of z. Therefore, given the distortions of the LDR and ratio signals, the distortion of the HDR signal can be expressed as

$$D_{\mathrm{HDR}} = \left( D_{\mathrm{LDR}} + \frac{E[l^2]}{a^2} D_{\mathrm{ratio}} \right) E[z^2]. \quad (A.12)$$
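As a sanity check, the closed-form model (A.12) can be compared against a direct Monte Carlo simulation. The sketch below synthesizes independent l and z samples, applies the log mapping (A.1) with assumed parameters a and b, adds small Gaussian stand-ins for the coding distortions, and compares the empirical HDR mean squared error with the model's prediction; all numeric values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 50.0, 0.0                 # log-encoding parameters of the ratio signal (assumed)
n = 200_000

l = rng.uniform(50.0, 200.0, n)  # LDR luma samples
z = rng.uniform(1.0, 4.0, n)     # ratio z = h / l, drawn independently of l
h = l * z                        # HDR luma
r = a * np.log(z) + b            # ratio signal, Eq. (A.1)

# Small Gaussian perturbations standing in for the LDR and ratio coding errors.
sigma_l, sigma_r = 1.0, 1.0
l_hat = l + rng.normal(0.0, sigma_l, n)
r_hat = r + rng.normal(0.0, sigma_r, n)
h_hat = l_hat * np.exp((r_hat - b) / a)  # decoder side: invert Eq. (A.1)

d_ldr = np.mean((l - l_hat) ** 2)
d_ratio = np.mean((r - r_hat) ** 2)
d_hdr_empirical = np.mean((h - h_hat) ** 2)
# Eq. (A.12): model prediction from the LDR and ratio distortions alone.
d_hdr_model = (d_ldr + np.mean(l ** 2) / a ** 2 * d_ratio) * np.mean(z ** 2)

print(d_hdr_empirical / d_hdr_model)  # close to 1 when the distortions are small
```

The agreement degrades as the coding errors grow, since (A.6)-(A.8) rely on first-order approximations that hold only for small distortions.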
References

[1] E. Reinhard, G. Ward, S. Pattanaik, P. Debevec, High Dynamic Range Imaging, Morgan Kaufmann Publishers, 2005.
[2] P.E. Debevec, E. Reinhard, G. Ward, K. Myszkowski, H. Seetzen, H. Zargarpour, G. McTaggart, D. Hess, Course on high dynamic range imaging: theory and applications, in: ACM SIGGRAPH Course Notes, 2006.
[3] F. Kains, R. Bogart, D. Hess, OpenEXR image file format, in: ACM SIGGRAPH Sketches & Applications, 2003.
[4] G.W. Larson, R. Shakespeare, Rendering With Radiance: The Art and Science of Lighting Visualization, Morgan Kaufmann Publishers, 1998.
[5] G.W. Larson, LogLuv encoding for full-gamut high-dynamic range images, J. Graph. Tools 3 (1998) 15–31.
[6] R. Xu, S. Pattanaik, C. Hughes, High-dynamic-range still-image encoding in JPEG 2000, IEEE Comput. Graph. Appl. 25 (6) (2005) 57–64.
[7] R. Mantiuk, G. Krawczyk, K. Myszkowski, H.-P. Seidel, Perception-motivated high dynamic range video encoding, ACM Trans. Graph. 23 (3) (2004) 733–741.
[8] G. Ward, M. Simmons, Subband encoding of high dynamic range imagery, in: Proceedings of the 1st Symposium on Applied Perception in Graphics and Visualization, 2004, pp. 83–90.
[9] R. Mantiuk, A. Efremov, K. Myszkowski, H.-P. Seidel, Backward compatible high dynamic range MPEG video compression, ACM Trans. Graph. 25 (3) (2006) 713–723.
[10] A. Segall, Scalable coding of high dynamic range video, in: Proceedings of the IEEE ICIP, vol. 1, 2007, pp. 1–4.
[11] S. Liu, W.-S. Kim, A. Vetro, Bit-depth scalable coding for high dynamic range video, in: Proceedings of the SPIE Visual Communication and Image Processing, vol. 6822, 2008, p. 8220O.
[12] Y. Gao, Y. Wu, Y. Chen, H.264/Advanced video coding (AVC) backward-compatible bit-depth scalable coding, IEEE Trans. Circuits Syst. Video Technol. 19 (4) (2009) 500–510.
[13] ITU-T Rec. H.264 and ISO/IEC 14496-10 AVC, Advanced Video Coding for Generic Audiovisual Services, 2005.
[14] T. Wiegand, G.J. Sullivan, G. Bjøntegaard, A. Luthra, Overview of the H.264/AVC video coding standard, IEEE Trans. Circuits Syst. Video Technol. 13 (7) (2003) 560–576.
[15] F. Drago, K. Myszkowski, T. Annen, N. Chiba, Adaptive logarithmic mapping for displaying high contrast scenes, Comput. Graphics Forum 22 (3) (2003) 419–426.
[16] R. Mantiuk, S. Daly, L. Kerofsky, Display adaptive tone mapping, ACM Trans. Graph. 27 (3) (2008) 1–10.
[17] Z. Mai, H. Mansour, R. Mantiuk, P. Nasiopoulos, R. Ward, W. Heidrich, Optimizing a tone curve for backward-compatible high dynamic range image and video compression, IEEE Trans. Image Process. 20 (6) (2011) 1558–1571.
[18] E. Reinhard, M. Stark, P. Shirley, J. Ferwerda, Photographic tone reproduction for digital images, ACM Trans. Graph. 21 (3) (2002) 267–276.
[19] F. Durand, J. Dorsey, Fast bilateral filtering for the display of high-dynamic-range images, ACM Trans. Graph. 21 (3) (2002) 257–266.
[20] R. Fattal, D. Lischinski, M. Werman, Gradient domain high dynamic range compression, ACM Trans. Graph. 21 (3) (2002) 249–256.
[21] H. Wang, R. Raskar, N. Ahuja, High dynamic range video using split aperture camera, in: Proceedings of the IEEE 6th Workshop on Omnidirectional Vision, Camera Networks and Non-Classical Cameras, 2005, pp. 83–90.
[22] C. Lee, C.-S. Kim, Gradient domain tone mapping of high dynamic range videos, in: Proceedings of the IEEE ICIP, vol. 3, 2007, pp. 461–464.
[23] J.H. van Hateren, Encoding of high dynamic range video with a model of human cones, ACM Trans. Graph. 25 (4) (2006) 1380–1399.
[24] P. Irawan, J.A. Ferwerda, S.R. Marschner, Perceptually based tone mapping of high dynamic range image streams, in: Proceedings of the Eurographics Symposium on Rendering, 2005, pp. 231–242.
[25] ISO/IEC 14496-2, Information Technology – Coding of Audio-Visual Objects – Part 2: Visual, 2004.
[26] ITU-T Rec. T.832 and ISO/IEC 29199-2, Information Technology – JPEG XR Image Coding System – Image Coding Specification, 2009.
[27] F. Dufaux, G.J. Sullivan, T. Ebrahimi, The JPEG XR image coding standard, IEEE Signal Process. Mag. 26 (6) (2009) 195–199, 204.
[28] J.-U. Garbas, H. Thoma, Temporally coherent luminance-to-luma mapping for high dynamic range video coding with H.264/AVC, in: Proceedings of the IEEE ICASSP, 2011, pp. 829–832.
[29] M. Okuda, N. Adami, Two-layer coding algorithm for high dynamic range images based on luminance compensation, J. Vis. Commun. Image R. 18 (5) (2007) 377–386.
[30] C. Lee, C.-S. Kim, Rate-distortion optimized compression of high dynamic range videos, in: Proceedings of the 16th European Signal Processing Conference, 2008.
[31] B. Bross, W.-J. Han, J.-R. Ohm, G.J. Sullivan, T. Wiegand, Working Draft 5 of High-Efficiency Video Coding, JCTVC-G1103, Geneva, Switzerland, November 2011.
[32] C. Poynton, Digital Video and HDTV: Algorithms and Interfaces, Morgan Kaufmann Publishers, 2003.
[33] R. Mantiuk, K. Myszkowski, H.-P. Seidel, Lossy compression of high dynamic range images and video, in: Proceedings of the SPIE Human Vision and Electronic Imaging XI, vol. 6057, 2006, p. 60570V.
[34] R. Mantiuk, A. Efremov, K. Myszkowski, H.-P. Seidel, Design and evaluation of backward compatible high dynamic range video compression, Research Report MPI-I-2006-4-001, Max-Planck-Institut für Informatik, April 2006.
[35] G.J. Sullivan, T. Wiegand, Rate-distortion optimization for video compression, IEEE Signal Process. Mag. 15 (6) (1998) 74–90.
[36] A. Gersho, R.M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Press, 1992.
[37] T.M. Cover, J.A. Thomas, Elements of Information Theory, second ed., Wiley-Interscience, 2006.
[38] Z.G. Li, S. Rahardja, H. Sun, Implicit bit allocation for combined coarse granular scalability and spatial scalability, IEEE Trans. Circuits Syst. Video Technol. 16 (12) (2006) 1449–1459.
[39] N. Kamaci, Y. Altunbasak, R.M. Mersereau, Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate distortion models, IEEE Trans. Circuits Syst. Video Technol. 15 (8) (2005) 994–1006.
[40] C. Tchou, P. Debevec, HDR Shop, in: ACM SIGGRAPH Sketches & Applications, 2001.
[41] H.264/AVC reference software JM 10.2.
[42] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, G.J. Sullivan, Rate-constrained coder control and comparison of video coding standards, IEEE Trans. Circuits Syst. Video Technol. 13 (7) (2003) 688–703.
[43] G. Ward, M. Simmons, JPEG-HDR: A backwards-compatible, high dynamic range extension to JPEG, in: Proceedings of the 13th Color Imaging Conference, 2005, pp. 283–290.
[44] G. Krawczyk, M. Goesele, H.-P. Seidel, Photometric calibration of high dynamic range cameras, Research Report MPI-I-2005-4-005, Max-Planck-Institut für Informatik, April 2005.
[45] P.E. Debevec, J. Malik, Recovering high dynamic range radiance maps from photographs, in: Proceedings of SIGGRAPH 97, 1997, pp. 369–378.
[46] Tone mapping library PFStmo 1.4.