An Entropy based Approach for SSIM Speed Up

V. Bruni (a), D. Vitulano (b)

(a) University of Rome 'Sapienza', Dept. of SBAI, Faculty of Engineering, Via A. Scarpa 16, 00161 Rome, Italy, e-mail: [email protected]
(b) Istituto per le Applicazioni del Calcolo "M. Picone", C.N.R., Via dei Taurini 19, 00185 Rome, Italy, e-mail: [email protected]
Abstract

This paper focuses on an entropy based formalism to speed up the evaluation of the Structural SIMilarity (SSIM) index in images affected by a global distortion. Looking at images as information sources, a visual distortion typical set can be defined for SSIM. This typical set consists of just a subset of the information belonging to the original image and of the corresponding one in the distorted version. As a side effect, some general theoretical criteria for the computation of any full reference quality assessment measure can be given in order to maximize its computational efficiency. Experimental results on various test images show that the proposed approach allows SSIM to be estimated with a considerable speed up (by a factor of about 200) and a small relative error (often lower than 5%).

Keywords: Information theory, SSIM, asymptotic equipartition property, image quality assessment, typical set.

1. Introduction

In the last few years, there has been an increasing interest in defining new objective visual Quality Assessment (QA) measures that better correlate with human perception [6, 12, 28, 31, 33, 44, 46, 48]. They can be roughly classified as follows [40]:

1. Full Reference (FR) or non blind approaches [1, 13, 38, 41, 43], which measure the quality of the observed (distorted) image when the original (or reference) one is available.
2. Reduced Reference (RR) approaches [8, 17], which predict the quality of the observed image on the basis of partial information about the original image (reference features).
3. No Reference (NR) or blind approaches [9, 11, 19, 21, 27, 36, 49], which try to give a measure of the quality of the observed image when the original one is not available.

It is worth noting that these measures can depend on the kind of distortion they are able to deal with. An alternative classification of FR measures in terms of the preferred distortion kinds can be found in [13]. Probably the most famous FR measures are the classical Signal-to-Noise Ratio (SNR) [14, 47] and the more recent Structural SIMilarity (SSIM) index [42]. The latter is pixel-based and exploits a suitable combination of first and second order statistical moments in each pixel neighborhood (block) of both the recovered and the original image [39, 34] — see Fig. 1 for details regarding its implementation. Despite its high correlation with human perception, SSIM suffers from a higher (than SNR) computational effort, which may penalize its use in various real-time applications, especially video-oriented ones. In fact, the SSIM complexity is $O(l^2 N)$, as it depends on both the image size (N) and the block dimension ($l^2$) — necessary to locally evaluate the metric. On the contrary, the SNR complexity is $O(N)$, as it only depends on the image size. The aim of this paper is to investigate the possibility of speeding up SSIM on images affected by a global distortion, i.e., a distortion that affects the original image almost uniformly. The speed up can be achieved by selecting a subset of information, denoted as the Visual Distortion Typical Set, from both the original image (first source) I and its distorted version (second source) J. The proposed approach is motivated by two empirical observations. The first one accounts for the fact that human beings do not explore the whole information sources (original and distorted) when determining the degree of distortion and assigning a subjective quality score. The second derives from the fact that any distortion affects images differently, with some suprathreshold effects that do not yet have a clear theoretical explanation [5, 47]. This also confirms the observation that there is still a "lack of theoretical principles as the basis for the development of reliable computational models" regarding human perception [46]. We will show that a formalism able to estimate the Visual Distortion Typical Set allows the SSIM computation to be sped up by a factor of about 200, with a very small estimation error: lower than 5% on average (with few peaks lower than 8%) in images affected by
different distortions with different intensities.

The outline of the paper is the following. The next section gives some preliminary details useful for following the rest of the paper. In particular, the proposed study is compared with the state of the art, outlining the differences with related approaches and research lines. Section 3 deals with the problem of formalizing the Visual Distortion Typical Set starting from an original source and its degraded copy. In the first part of that section, following an Information Theory based approach, it will be shown that some theoretical findings leading to a more correct QA estimate can be derived. They are very general, as they are valid for any QA measure and any distortion kind, and they hold independently of the characterization of the Visual Distortion Typical Set. The last part of the section shows how to find one sequence belonging to the Visual Distortion Typical Set by means of an Information Theory based strategy. Section 4 shows some results on the LIVE [32, 29, 30, 16] and TID2013 [24] databases, oriented to experimentally confirming the theoretical results of Section 3, while showing the potential of the proposed approach in terms of SSIM speed up with a very small estimation error. Finally, Section 5 draws the conclusions.

2. Preliminary Background

In this paper, a scene denotes the image I under study. Its Visual Distortion Typical Set $A_M$ with respect to any FR quality measure M is supposed to be composed of a subset of the information of I along with the corresponding one in its distorted version J. This set will then depend on the original source I, on its distorted version J, on the considered FR quality measure M and on $\epsilon$, i.e. the distance between $\hat{M}$ (M estimated on $A_M$) and $\bar{M}$ (M computed using the whole available information of I and J). In order to find the visual distortion typical set $A_M$, an entropy based formulation will be adopted. In other words, both the original image and its distorted version will be seen as related to two sources [37]. In agreement with Information Theory principles, it is always possible to assume the existence of a typical set, i.e. a subset of sequences (composed of 2D blocks) able to describe the original source content within a small error — to be fixed a priori. It is worth stressing that the proposed approach is not far from some widely investigated problems like selective visual attention and quality metrics. In fact, there is a wide literature on how to find fixation points [10, 20,
47], which allow scene information to be synthesized and understood in the preattentive phase. In particular, several approaches oriented to determining the most representative subset of information of a given scene are usually based on saliency maps [2, 4, 26, 45]. However, the proposed approach differs in some respects from this specific literature. It accounts for both the original and the distorted versions, while most of the existing approaches deal with just the original source information. Among the approaches and attempts also dealing with degradation, to the best of the authors' knowledge there are no complete theoretical formalisms that lead to a specific subset, like $A_M$, in a limited time [25], as the proposed approach does. In fact, a deterministic formalism should account for suprathreshold effects that have not been understood yet [47]. Unlike existing approaches, which provide empirical strategies that lead to a specific solution (i.e. a specified walk in terms of blocks within the scene under exam), the proposed approach proves the existence of more than one walk (or block sequence) given I, J, M and $\epsilon$. These block sequences have an informative content very close to that of the whole sources I and J — in agreement with the notion of typical set in Information Theory [37]. In this sense, the proposed formalism may also be interesting from a theoretical point of view, as it may give some basic criteria used by the Human Visual System in the selection of information for quality assessment. The proposed approach is also related to the problem of selecting the optimal pooling function for a quality assessment measure. In fact, the estimation of any QA measure can be split into two phases. In the former, local distortions are estimated via a suitable visibility based distance function. In the latter, these distortions are combined through a pooling strategy. Some interesting and effective approaches focusing on this topic have already been investigated (see for instance [3, 23, 46]). The proposed approach can then be seen as a binary pooling function that preserves some blocks while discarding the others. However, again, we are not interested in a precise and specific strategy for balancing all the available information. The possibility of estimating a rough but not computationally expensive subset of the available information is the main target in order to get a real speed up for any visual FR QA measure and for any global distortion.

3. Visual Distortion Typical Set

In order to find the Visual Distortion Typical Set, SSIM will be mainly considered here, even though the theoretical results are valid for any distortion measure.
SSIM Computation

1. Split the original image I into a set of $W_1 \cdot W_2$ blocks $\{b_i\}$ of size $l \times l$, centered at each pixel of I. Do the same for the distorted version J, obtaining blocks $\{d_i\}$.
2. For each block $b_i$ and the corresponding $d_i$, estimate SSIM:

$$M_i(b_i, d_i) = \underbrace{\frac{2\mu_{b_i}\mu_{d_i} + C_1}{\mu_{b_i}^2 + \mu_{d_i}^2 + C_1}}_{\text{luminance adaptation}} \cdot \underbrace{\frac{2\sigma_{b_i}\sigma_{d_i} + C_2}{\sigma_{b_i}^2 + \sigma_{d_i}^2 + C_2}}_{\text{contrast masking}} \cdot \underbrace{\frac{\sigma_{b_i d_i} + C_3}{\sigma_{b_i}\sigma_{d_i} + C_3}}_{\text{spatial correlation}}$$

where $C_1$, $C_2$ and $C_3$ are numerical stabilizing constants. The array M (which can also be seen as a matrix, as each pixel of I (or J) can be assigned the corresponding SSIM value) is then produced.
3. Compute the mean of M: $\bar{M} = \frac{1}{W_1 W_2}\sum_{i=1}^{W_1 W_2} M_i$.

Figure 1: Algorithm for SSIM estimation.
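To make the steps of Fig. 1 concrete, the following is a minimal numpy sketch of the algorithm. It is an illustration, not the authors' reference code: the constants follow the common choice in [42], $C_3 = C_2/2$ is assumed so that the last two factors merge, and blocks whose $l \times l$ support would exceed the image border are simply skipped.

```python
import numpy as np

def ssim_map(I, J, l=17, C1=(0.01 * 255) ** 2, C2=(0.03 * 255) ** 2):
    """Local SSIM values M_i for l x l blocks centered at interior pixels."""
    h = l // 2
    M = []
    for r in range(h, I.shape[0] - h):
        for c in range(h, I.shape[1] - h):
            b = I[r - h:r + h + 1, c - h:c + h + 1].astype(float)
            d = J[r - h:r + h + 1, c - h:c + h + 1].astype(float)
            mu_b, mu_d = b.mean(), d.mean()
            cov = ((b - mu_b) * (d - mu_d)).mean()
            lum = (2 * mu_b * mu_d + C1) / (mu_b**2 + mu_d**2 + C1)  # luminance
            cs = (2 * cov + C2) / (b.var() + d.var() + C2)  # contrast * structure
            M.append(lum * cs)
    return np.array(M)

# Step 3 of Fig. 1: the SSIM index is the mean of the local values,
# e.g. M_bar = ssim_map(I, J).mean()
```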
Fig. 1 gives a simple algorithm for SSIM evaluation [42, 43] between the original image I and its distorted version J. It is worth observing that, in general, the blocks $b_i$ overlap and $W_1 \cdot W_2 = N$, where N is the image dimension. The SSIM algorithm in Fig. 1 has been defined by choosing the apparently simplest way to face the problem. However, one may ask whether these choices are always the best one can do. Specifically, one may deal with the following aspects:

1. Information reduction. Is the whole information of I and J really important, or is it sufficient to select just a part of it?
2. Selection of the best reduction domain. Is it more convenient to reduce the information of I and J, or that of M (i.e., to subsample M in Fig. 1)?
3. Locality of the selected information. Is it more convenient to take I (and J) samples from local and compact regions (for instance, blocks), or by means of a sparse spatial selection of pixels from I (and J)?
4. Overlapping blocks. In the case of block-based measures, do blocks have to overlap?
5. How to find an $A_M$ sequence. Is there a formal (and possibly fast) procedure for finding this reduced information?
Figure 2: The original image (left) can be considered to be composed of two homogeneous regions X1 , X2 — as well as its distorted version (right).
The sequel shows that the estimate of $A_M$ (and the search for at least one subsequence belonging to it) yields, as a side effect, a formal answer to these questions.

3.1. Information reduction

From a qualitative point of view, the visual distortion typical set $A_M$ can be defined as a subset of all sequences composed of samples of I (and the corresponding ones of J) such that they give an approximated value $\hat{M}$ of the expected value $\bar{M}$ of M within an error $\epsilon$, i.e.: $|\hat{M} - \bar{M}| < \epsilon$. More formally, $A_M$ can be thought of in terms of IT quantities. Shannon's typical set is defined as the set of sequences of fixed size whose entropy is close to the true (source) one. Similarly, we can think of the original image I as the first source, associated with the variable X, of its distorted version J as the second source, associated with the variable Y, while the variable $Z = M(X, Y)$ characterizes the third source M, which in turn depends on X and Y. Note that $A_M$ also depends, even though not explicitly, on the kind of degradation $D_J$ that gave J starting from the original image I: $J = D_J(I)$. However, $D_J$ can be considered 'embedded' in J, and it will not be explicitly mentioned in the sequel. $A_M$ will then be composed of the subset $\{X_1,..,X_{N_r}, Y_1,..,Y_{N_r}\}$ of size $2N_r < 2N = 2W_1W_2$ such that

$$|\bar{M}(X, Y) - \bar{M}(X_1,..,X_{N_r}, Y_1,..,Y_{N_r})| < \epsilon, \qquad \epsilon > 0. \qquad (1)$$
The existence of $A_M$ is guaranteed, again, by IT results. In fact, the weak law of large numbers states that for i.i.d. r.v.s $X_i$ it holds $\frac{1}{n}\sum_{i=1}^{n} X_i \to \bar{X}$ as $n \to \infty$. However, it is more convenient to use the equivalent concept, known as the Asymptotic Equipartition Property (AEP) [37], for which $\frac{1}{n}\log\frac{1}{p(X_1, X_2,..,X_n)} \to H(X)$ as $n \to \infty$, where p denotes the pdf. That is why in the sequel just the entropy will be considered. Entropy is more mathematically tractable, as it has a monotonic behavior as the number of samples grows [37], while this is not so for the mean value, as proved in the following:

Proposition 1. Let $X \sim Q$ with a positive and numerical alphabet $\chi$, and $\{X_1\} \sim p_1$, $\{X_1, X_2\} \sim p_2$, ..., $\{X_1, X_2,..,X_n\} \sim p_n$, while $\mu_n$ is the mean of $p_n$ and $\mu$ the mean of Q. Then

1. the sequence $\{\mu_n\}$ is not monotonic for increasing n;
2. $|\mu_n - \mu|^2 \le 2C^2 D_{KL}(p_n||Q)$, with $C = \max_{x\in\chi} x$, $\forall n$.

Proof 1. 1) $\mu_{n+1} - \mu_n = \sum_{x\in\chi} x(p_{n+1}(x) - p_n(x)) = -\frac{\mu_n}{n+1} + \frac{x_{n+1}}{n+1} = \frac{1}{n+1}(x_{n+1} - \mu_n)$. Hence, the difference between two successive mean values changes its sign depending on the new value $x_{n+1}$: it is positive if $x_{n+1} > \mu_n$, negative otherwise. It turns out that the convergence of the mean value is not monotonic.
2) $|\mu_n - \mu|^2 = \left|\sum_{x\in\chi} x(p_n(x) - Q(x))\right|^2 \le C^2 V^2(p_n, Q)$, where $V(p_n, Q)$ is the variational distance between $p_n$ and Q, i.e. $V(p_n, Q) = \sum_{x\in\chi} |p_n(x) - Q(x)|$, and $C = \max_{x\in\chi} x$. Since $D_{KL}(p_n||Q) \ge \frac{1}{2}V^2(p_n, Q)$, we have $|\mu_n - \mu|^2 \le 2C^2 D_{KL}(p_n||Q)$. Hence, if n is such that $D_{KL}(p_n||Q) \le \frac{\varepsilon}{2C^2}$, with $\varepsilon > 0$, then $|\mu_n - \mu|^2 \le \varepsilon$. •

3.2. Selection of the best reduction domain

In order to get a typical subsequence $\{X_1,..,X_{N_r}, Y_1,..,Y_{N_r}\}$, one may ask whether it is more convenient to reduce the information of the sources X and Y (and then Z), or to leave X and Y unchanged, estimating $\bar{M}$ on them while reducing Z's information. This is the topic of the following:

Proposition 2. $H(Z) \equiv H(M(X, Y)) \le H(X, Y)$.

Proof 2. Since $p(X, Y, Z) = p(Z)p(X, Y|Z)$, we have

$$H(X, Y, Z) = H(Z) + H(X, Y|Z) \ge H(Z). \qquad (2)$$

On the other hand, since $p(X, Y, Z) = p(X, Y)p(Z|X, Y)$ and Z is a function of (X, Y), so that $H(Z|X, Y) = 0$, we have

$$H(X, Y, Z) = H(X, Y) + H(Z|X, Y) = H(X, Y). \qquad (3)$$

The proposition is proved by inserting eq. (3) into eq. (2). •

Prop. 2 can be seen as a straightforward consequence of the well-known result $H(f(X)) \le H(X)$, which holds for any function f [37]. It states that if the information in X is correlated by means of a given function f, the entropy decreases.
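The following toy experiment, under an assumed four-symbol source, illustrates Proposition 1 numerically: the running mean oscillates, yet its squared deviation from the true mean always stays below the bound $2C^2 D_{KL}(p_n||Q)$ (natural logarithms, as required by the Pinsker inequality used in the proof).

```python
import numpy as np

rng = np.random.default_rng(0)
alphabet = np.array([1.0, 2.0, 3.0, 4.0])   # positive numerical alphabet chi
Q = np.array([0.1, 0.2, 0.3, 0.4])          # true source law, mean mu = 3.0
mu, C = alphabet @ Q, alphabet.max()

x = rng.choice(alphabet, size=2000, p=Q)
for n in (10, 100, 1000, 2000):
    mu_n = x[:n].mean()
    p_n = np.array([(x[:n] == a).mean() for a in alphabet])  # empirical pmf
    mask = p_n > 0                  # empty bins contribute 0 to D_KL(p_n||Q)
    dkl = np.sum(p_n[mask] * np.log(p_n[mask] / Q[mask]))    # in nats
    print(n, abs(mu_n - mu), (mu_n - mu) ** 2 <= 2 * C**2 * dkl)
```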
Concretely, the entropy of SSIM (Z) is less than the joint entropy of the source images (X and Y). As a result, if we are interested in finding the typical set of Z, we should sample Z and not the original sources X and Y, since part of the information of X and Y may be lost in the computation of $Z = M(X, Y)$. That is why it is more convenient to leave X and Y unchanged while reducing the information of Z. However, as we will see in the following section, sampling Z can correspond to a proper sampling of X and Y, whenever the criteria used in the latter are derived from the properties of Z.

3.3. Locality of the selected information

Though the results above would lead us to select information directly from M, it is necessary to find a strategy for retrieving part of the significant information directly from X and Y. In fact, with regard to the SSIM algorithm in Fig. 1, it is useless (and ineffective) to first build the whole vector M only to take just a subset of its samples. In the sequel, we find formal criteria for the selection of this significant information. More formally, we can assume the subsequence $\{X_1,..,X_{N_r}, Y_1,..,Y_{N_r}\}$ to be built in a progressive manner:

$$\{X_1, Y_1\}, \quad \{X_1, X_2, Y_1, Y_2\}, \quad \ldots, \quad \{X_1,..,X_{N_r}, Y_1,..,Y_{N_r}\} \qquad (4)$$

until the constraint in eq. (1) is verified — with $\epsilon$ fixed a priori. The problem is then to determine whether it is better to take samples locally from the image (i.e., by means of blocks) or non-locally. The original image I can be considered to be composed of a finite number of 'homogeneous' regions (for instance 'grass', 'sky', etc.). Without loss of generality, we can consider only two regions, as in Fig. 2. Hence:

Proposition 3. Let

$$X = \begin{cases} X_1 & \text{with prob. } \alpha \\ X_2 & \text{with prob. } 1-\alpha \end{cases} \qquad \text{and} \qquad Y = \begin{cases} Y_1 & \text{with prob. } \alpha \\ Y_2 & \text{with prob. } 1-\alpha \end{cases}$$

where $X_1 \cap X_2 = \emptyset$, $Y_1 \cap Y_2 = \emptyset$ and $Z = M(X, Y) \sim p_Z$. Then

$$H(p_Z) \le H(\alpha) + H(p_{M(X_1,Y_1)}) + H(p_{M(X_2,Y_2)}). \qquad (5)$$
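A quick numerical sanity check of eq. (5), under two assumed component laws on disjoint alphabets:

```python
import numpy as np

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

alpha = 0.3
p1 = np.array([0.6, 0.4])        # law of M(X1, Y1) on its own alphabet
p2 = np.array([0.2, 0.5, 0.3])   # law of M(X2, Y2) on a disjoint alphabet
pZ = np.concatenate([alpha * p1, (1 - alpha) * p2])   # mixture law
print(H(pZ) <= H(np.array([alpha, 1 - alpha])) + H(p1) + H(p2))  # True
```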
The proof of Proposition 3 is in the Appendix. For a suitable cardinality (> 2) of the alphabet of M, $H(\alpha)$ (whose maximum is equal to 1) can be neglected, and then one can say that a mixture leads to a lower entropy. Hence, in order to search for the subsequence $\{X_1,..,X_{N_r}, Y_1,..,Y_{N_r}\}$ necessary to build the reduced vector of Z, it is more convenient to look for it in local regions within the images I and J. In this way, we maximize the entropy of $\{M(X_1, Y_1),..,M(X_{N_r}, Y_{N_r})\}$ while minimizing $N_r$ at the same time. This theoretical result shows that the practical choice of getting SSIM information from blocks in I and J is really the most convenient one. It is no coincidence that the HVS follows the same procedure (see [10, 20]).

3.4. Overlapping blocks

It remains to see whether blocks have to overlap or not. The next proposition proves that the selection of non overlapping blocks maximizes entropy and is therefore the most convenient method for block selection.

Proposition 4. If $Z_1, Z_2,..,Z_T$ are variables associated with image blocks that overlap and such that $\bigcup_{i=1}^{T} Z_i = I$, while $\bar{Z}_1, \bar{Z}_2,..,\bar{Z}_R$ are blocks such that $\bigcap_{i=1}^{R} \bar{Z}_i = \emptyset$ but $\bigcup_{i=1}^{R} \bar{Z}_i = I$, then

$$\frac{H(Z_1, Z_2,..,Z_T)}{T} \le \frac{H(\bar{Z}_1, \bar{Z}_2,..,\bar{Z}_R)}{R}, \qquad T > R.$$
The proof is in the Appendix.

3.5. How to find one sequence belonging to $A_M$

The objective of this paper is to go beyond the theoretical existence of $A_M$. We want to find at least one subsequence in $A_M$ with the least size (i.e., the minimum $N_r$) — and with a low computational effort, if possible. Mathematically, we look for a subset of indices $\{i_1,..,i_m\}$ such that

$$N_r = \operatorname{argmin}_m |\bar{M}(X, Y) - \bar{M}(x_{i_1},..,x_{i_m}, y_{i_1},..,y_{i_m})| < \epsilon. \qquad (6)$$
The previous sections told us that information has to be extracted via non overlapping blocks, but there is no constructive way of finding such blocks. In the sequel, an entropy approach based on the behavior of the visual system in its preattentive phase is proposed. It is natural to guess that the peculiarities of natural scenes guided the evolution of the Human Visual System over time [35]. In particular, saccadic movements (generating fixation points) are mainly guided by the image content in the preattentive phase (i.e., in the first milliseconds of scene inspection) rather than by the observer's experience, needs, etc. — as happens in the successive (attentive) phase. As we are considering global distortions, this conjecture still holds. The proposed method tries to account for the aforementioned considerations through the following phases: i) a rough image segmentation into regions having different characteristics and ii) a random explorative walk over these regions. The latter step allows us to find the blocks of a sequence belonging to the typical set. Some hints can be found in [10]. In particular, since just a few fixation points are employed in the preattentive 'scene understanding', the number of significant blocks will be estimated by means of the Minimum Description Length, exploiting the entropy monotonic behavior shown in Section 3.1, as explained in the sequel.

3.5.1. Image segmentation

Image segmentation has been performed on a low-pass version of the luminance of the original image. It is performed on the original image since a global distortion uniformly affects each image region. The luminance criterion for segmentation has been considered since luminance is one of the two measures (the second one is contrast) that regulate the adaptation process in the preattentive phase. Contrast has not been considered here in order to keep the model complexity low. Finally, a low pass filter whose cutoff frequency depends on the viewing distance simulates the early vision process. Specifically, the approximation band (low-pass component) at level G ($A^G$) of the dyadic wavelet expansion of the image I has been computed [18], since its dimension is $1/2^{G+1}$ of the original image size. For segmenting $A^G$, the Successive Mean Quantization Transform (SMQT) [22] has been adopted due to its simplicity and reduced computational effort. SMQT builds a binary tree using the following rule: given a set of data $A^G$ and a real parameter L (number of levels), split $A^G$ into two subsets, $A_0^G = \{x \in A^G \,|\, A^G(x) \le \bar{A}^G\}$ and $A_1^G = \{x \in A^G \,|\, A^G(x) > \bar{A}^G\}$, where $\bar{A}^G$ is the mean value of $A^G$. $A_0^G$ and $A_1^G$ form the first level of the SMQT. The same procedure is recursively applied to $A_0^G$ and $A_1^G$ until the L-th level, which is composed of $2^L$ subsets (regions) that will be denoted by $R_1, R_2, \ldots, R_{2^L}$.
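A minimal sketch of the SMQT-based segmentation, assuming A is a numpy array holding the level-G approximation band of the luminance; function and variable names are illustrative.

```python
import numpy as np

def smqt_labels(A, L=3):
    """Label map with 2**L regions: at each of the L levels, every current
    region is split at its mean value, as in the SMQT binary tree."""
    labels = np.zeros(A.shape, dtype=int)
    for _ in range(L):
        new = np.zeros_like(labels)
        for r in np.unique(labels):
            mask = labels == r
            thr = A[mask].mean()
            # left child (2r): values <= regional mean;
            # right child (2r+1): values > regional mean
            new[mask] = 2 * r + (A[mask] > thr)
        labels = new
    return labels   # values in {0, ..., 2**L - 1}
```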
3.5.2. Explorative walk on the scene

The observation process used to find the image blocks has been modeled as a Markov chain, i.e. a random walk on a connected weighted graph whose nodes are the $2^L$ regions $R_1, R_2, \ldots, R_{2^L}$, with weights $W_{ij} \ge 0$ on the edge joining node i to node j. The graph is undirected, i.e. $W_{ij} = W_{ji}$, and $W_{ij} = 0$ if there is no edge joining node i to node j. Hence, given a point randomly extracted from the region $R_i$, the successive point in the walk is a random point in the region $R_j$, chosen among the nodes connected to $R_i$ with a probability

$$P_{ij} = \frac{W_{ij}}{\sum_{i\sim k} W_{ik}} \qquad (7)$$

that is proportional to the weight $W_{ij}$. By denoting with $n_i$ the number of pixels in the region $R_i$, the weights are defined as follows:

$$W_{ij} = \begin{cases} n_i & i = j \\ \frac{Z_{ij} + Z_{ji}}{2} & i \ne j \end{cases} \qquad (8)$$

where $Z_{ij} = n_j \frac{\sum_{i\sim k,\, k\ne i} n_k}{\sum_{k=1}^{2^L} n_k}$. $W_{ij}$ takes into account the representativeness of the region $R_j$ both in the image and as a neighbouring region of $R_i$. Even though a more refined definition of the weights could be used, this choice is simple but significant enough for our preliminary study. The initial point of the walk can be extracted by looking at the stationary distribution of the process [37]. The number of blocks can be estimated via an automatic procedure based on the minimum description length principle [15], as shown in the sequel.
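The sketch below implements eqs. (7)-(8) for the walk, assuming the label map of the previous sketch. Since the paper leaves the edge set of the graph implicit, every pair of regions is treated as adjacent here — an assumption made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_weights(labels):
    n = np.bincount(labels.ravel())          # region cardinalities n_i
    R, total = len(n), n.sum()
    W = np.zeros((R, R))
    for i in range(R):
        for j in range(R):
            if i == j:
                W[i, j] = n[i]               # eq. (8), diagonal term
            else:
                Zij = n[j] * (total - n[i]) / total   # all regions adjacent
                Zji = n[i] * (total - n[j]) / total
                W[i, j] = (Zij + Zji) / 2    # eq. (8), off-diagonal term
    return W

def next_region(i, W):
    P = W[i] / W[i].sum()                    # transition law, eq. (7)
    return rng.choice(len(P), p=P)

def sample_pixel(region, labels):
    rows, cols = np.nonzero(labels == region)
    k = rng.integers(len(rows))
    return rows[k], cols[k]
```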
3.5.3. MDL for the estimation of the number of blocks

This principle allows the selection of a good model for approximating the data with the least complexity. It is based on the concept that good compression means good approximation, in agreement with the definition of Kolmogorov complexity. Specifically, the simplest version of MDL, namely crude MDL, selects a model from a set of candidates $M^{(1)}, M^{(2)}, \ldots$ by minimizing the following cost

$$L(M^{(k)}) + L(X|M^{(k)}), \qquad (9)$$

where $L(M^{(k)})$ is the cost (in terms of bits) required for coding the model $M^{(k)}$, while $L(X|M^{(k)})$ is the number of bits required for coding the data X given the model. In general, the better the model, the higher its cost but the smaller the approximation error. That is why the selection of the best model is a trade-off between complexity and good approximation. In our case the model $M^{(k)}$ is the fixation path containing the SSIM values of k points, whose average gives an approximation of the SSIM of the whole image. The data X are the corresponding blocks in I and J, centered at the selected pixels, that are involved in the SSIM computation. The cost is measured as entropy per element. More precisely, by indicating with $M_1, M_2, \ldots, M_k$ the values of SSIM computed at the first k points selected during the random walk on the graph described above, and with $(b_1, b_2, \ldots, b_k)$ the blocks used for the evaluation of SSIM, we have $L(M^{(k)}) = \frac{H(M_1, M_2, \ldots, M_k)}{k}$ and $L(X|M^{(k)}) = \frac{H(b_1, b_2, \ldots, b_k) + 2\log_2(k) + 1}{2l^2}$, where H is the entropy, $l^2$ is the dimension of a block and $2\log_2(k) + 1$ is the cost for coding the integer k. By coding the blocks independently, $H(b_1, b_2, \ldots, b_k) = kH(b_i)$, $i = 1, 2, \ldots, k$, and by considering a compression ratio 8:1, eq. (9) can be rewritten as

$$N_r = \operatorname{argmin}_k \left[ \frac{H(M_1, M_2,..,M_k)}{k} + \frac{k + 2\log_2(k) + 1}{2l^2} \right] \qquad (10)$$

where $N_r$ gives the length of the optimal path, i.e. the length of a sequence in the visual distortion typical set.

Figure 3: Original version of the images from the LIVE database considered in this paper for tests. Clockwise: Ocean, Stream, Lighthouse, Sailing4, House, Flowersonih35.
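A sketch of the cost in eq. (10), assuming the SSIM values gathered along the walk are quantized with step delta (0.01 in Section 4.5) to build their empirical pmf:

```python
import numpy as np

def mdl_cost(ssim_values, l=17, delta=0.01):
    """Per-element MDL cost of eq. (10) for the first k SSIM values."""
    k = len(ssim_values)
    bins = np.round(np.asarray(ssim_values) / delta).astype(int)
    _, counts = np.unique(bins, return_counts=True)
    p = counts / k
    H = -np.sum(p * np.log2(p))              # entropy of M_1, ..., M_k
    return H / k + (k + 2 * np.log2(k) + 1) / (2 * l * l)
```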
Figure 4: Ocean image and its fastfaded copy (LIVE). SSIM (blue) and SNR (red) versus the number of considered blocks. The other two (smoother) curves indicate SSIM (black) and SNR (green) entropy. For a clearer presentation on just one picture, all plots have been normalized.
Figure 5: Ocean image and its Gaussian blurred version (LIVE). Entropy versus number of blocks (or an equivalent number of randomly selected and non adjacent pixels). SSIM entropy via random pixels (solid), SSIM entropy via non overlapping blocks (dashdot), SNR entropy via random pixels (dashed), SNR entropy via non overlapping blocks (dot).
3.6. The Algorithm

The complete algorithm of the proposed method is given below.

1. Compute the wavelet approximation band $A^G$ at the G-th level of the image I.
2. Apply L levels of the SMQT transform to $A^G$ and extract the regions $R_1, R_2, \ldots, R_{2^L}$.
3. Compute the cardinalities $n_1, n_2, \ldots, n_{2^L}$ of the segmented regions and evaluate the weights of the graph as in eq. (8).
Original Image | Distorted Image | Distortion kind | $H(X)$ | $H(Y|X)$ | $H(Z)$ | $H(\bar{Z})$
ocean | img57 | gaussian blur | 7.1785 | 4.8630 | 5.0798 | 5.3230
stream | img58 | gaussian blur | 7.4230 | 6.6357 | 5.9525 | 5.9997
lighthouse | img97 | gaussian blur | 7.3799 | 5.3012 | 5.1560 | 5.4239
sailing4 | img127 | gaussian blur | 6.8476 | 5.2000 | 5.0161 | 5.1369
ocean | img7 | fastfading | 7.1785 | 4.7998 | 4.6848 | 4.8755
house | img73 | fastfading | 7.1803 | 4.6494 | 4.0312 | 4.3385
stream | img100 | jpeg | 7.4230 | 6.4137 | 5.2691 | 5.5687
flowersonih35 | img27 | jpeg | 7.7161 | 5.6465 | 3.4555 | 3.6996
ocean | img118 | white noise | 7.1785 | 4.6308 | 4.2645 | 4.6762
house | img109 | white noise | 7.1803 | 6.4434 | 6.1604 | 6.1460
flowersonih35 | img72 | white noise | 7.7161 | 5.5789 | 3.6160 | 3.8698

Table 1: Entropy of the original images in Fig. 3 ($H(X)$), entropy of the corresponding distorted images given the original ones (i.e. $H(Y|X)$), entropy of SSIM with overlapping blocks ($H(Z)$) and entropy of SSIM for non overlapping blocks ($H(\bar{Z})$).
4. Extract a point from a region $R_1$ chosen according to the stationary distribution of the graph as defined in eq. (8).
5. Compute $M_1$, i.e. SSIM on a block of dimension $l \times l$ centered at the selected point, and set k = 2.
6. Extract a point in the region $R_j$ selected according to the probability $P_{ij}$ defined in eq. (7).
7. Compute $M_k$, i.e. SSIM on a block of dimension $l \times l$ centered at the selected point.
8. Evaluate the argument of eq. (10) and assign its value to the new variable $L_k$.
9. If $L_k > L_{k-1}$, set $N_r = k - 1$ and $\hat{M} = \frac{1}{N_r}\sum_{k=1}^{N_r} M_k$, and stop; otherwise set k = k + 1 and go to step 6.

Figure 6: (Left) Original Ocean image and (Right) its blurred version from TID2013.
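A sketch of steps 4-9 wired together, reusing the helpers sketched in the previous sections (ssim_map, build_weights, next_region, sample_pixel, mdl_cost); the choice of the starting region and the border clipping are assumptions made for illustration.

```python
import numpy as np

def estimate_ssim(I, J, labels, l=17):
    W = build_weights(labels)
    i = int(np.argmax(W.diagonal()))     # start in the largest region, a
                                         # stand-in for the stationary law
    h, values, costs = l // 2, [], [np.inf]
    while True:
        r, c = sample_pixel(i, labels)
        r = min(max(r, h), I.shape[0] - h - 1)   # keep the block inside I
        c = min(max(c, h), I.shape[1] - h - 1)
        b = I[r - h:r + h + 1, c - h:c + h + 1]
        d = J[r - h:r + h + 1, c - h:c + h + 1]
        values.append(ssim_map(b, d, l=l)[0])    # SSIM of one l x l block
        costs.append(mdl_cost(values, l=l))      # L_k, eq. (10)
        if costs[-1] > costs[-2]:                # step 9: first increase
            return float(np.mean(values[:-1]))   # M_hat over N_r = k-1 blocks
        i = next_region(i, W)
```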
3.7. Model's complexity

The following proposition provides a constraint on the number $N_r$ of blocks to use in the proposed method in order to have a computational gain with respect to the classical algorithm for SSIM computation in Fig. 1.

Proposition 5. Let $O_{SSIM}$ and $O_{algo}$ be the number of operations required for the computation of SSIM using, respectively, all image pixels and the $N_r$ blocks selected by the MDL procedure. Let $C_{log}$ be the cost of the calculation of the logarithm of a number, N the image size and $|\chi|$ the cardinality of the alphabet of SSIM, and G and L two integers denoting the levels of, respectively, the wavelet transform and the SMQT; then

$$O_{algo} = \left[\frac{7}{3}\left(1 - \frac{1}{2^{2G}}\right) + 2L + 1\right]N - 2^L - L + 1 + 2^{2L} + \left(8l^2 + 30 + \frac{3}{2}C_{log} + \log_2|\chi|\right)N_r + (4 + C_{log})\frac{N_r^2}{2}$$

and $O_{SSIM} = (8l^2 + 18)N$. In addition,

$$N_r < N_0 = \frac{-8l^2 - 30 - \frac{3}{2}C_{log} - \log_2|\chi| + \sqrt{\Delta}}{4 + C_{log}} \;\Rightarrow\; O_{algo} < O_{SSIM}$$

with

$$\Delta = \left(8l^2 + 30 + \frac{3}{2}C_{log} + \log_2|\chi|\right)^2 + (4 + C_{log})\left[2N\left(8l^2 + 17 - 2L - \frac{7}{3}\left(1 - \frac{1}{2^{2G}}\right)\right) + 2^{L+1} + 2L - 2 - 2^{2L+1}\right].$$

Figure 7: (Left) SSIM map of Fig. 6 and (right) its SMQT segmentation.

Proof: The proof is simply achieved by observing that $N_0$ is the positive root of the equation $O_{algo} - O_{SSIM} = 0$. The detailed computation of $O_{algo}$ and $O_{SSIM}$ is in the Appendix. It is worth observing that the constraint on $N_r$ is feasible: $N_0 = 1053$ when the default parameter values, given in Section 4.5, are used for 256 × 256 images.
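A small numerical check of Proposition 5, solving $O_{algo}(N_r) = O_{SSIM}$ for the positive root. The cost $C_{log}$ of one logarithm is machine dependent and is an assumed input here; the resulting $N_0$ varies strongly with it, and the value 1053 quoted above refers to the authors' own cost model.

```python
import numpy as np

def N0(N, l=17, G=3, L=3, chi=200, Clog=1.0):
    a = (4 + Clog) / 2                               # coefficient of Nr^2
    b = 8 * l * l + 30 + 1.5 * Clog + np.log2(chi)   # coefficient of Nr
    # O_algo(Nr = 0) - O_SSIM, with O_SSIM = (8 l^2 + 18) N
    c = (N * (7 / 3 * (1 - 4.0 ** -G) + 2 * L + 1)
         - 2 ** L - L + 1 + 2 ** (2 * L)
         - (8 * l * l + 18) * N)
    return (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)

# e.g. N0(256 * 256); the exact value depends on the assumed Clog
```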
4. Experimental Results

In order to clearly present the tests that confirm the proposed theoretical results, this section has been organized as follows. The first part is devoted to the theoretical results in Sections 3.1-3.4. The second part presents some tests regarding the topic of Section 3.5 and shows the potential of the proposed approach in speeding up the SSIM evaluation. All tests have been performed on several test images. For the sake of brevity, we will present just those concerning some images coming from the LIVE [32] (Fig. 3) and TID2013 [24] databases. The LIVE database is composed of 779 images having a different amount of (five kinds of) distortion: Fast Fading, Gaussian Blur, JPEG2K, JPEG and Additive Gaussian Noise. The TID2013 database [24] contains images affected by different kinds of distortion, among which the global ones are additive and multiplicative Gaussian noise, high frequency noise, Gaussian blurring, JPEG and JPEG 2000 compression, mean shift and contrast change. For each distortion, four intensity levels have been considered.

4.1. Some tests on reducing information

Fig. 4 shows a first qualitative, but very informative, result about the theoretical findings in Section 3.1 on the Ocean image and its fastfaded version (from the LIVE database). Specifically, it shows both the quality assessment
measure versus the number of selected blocks and the entropy of the same quality assessment measure versus the number of selected blocks. Both SSIM and SNR have been considered, using (non overlapping) 16 × 16 blocks on each image. The blocks have been randomly selected, in order to show that the results are not tied to a specific block selection procedure. Notice that the curves in Fig. 4 have been normalized (divided by the corresponding maximum value) in order to better appreciate their trend on the same plot. In addition, a uniform quantization with bins of width Δ equal to $10^{-4}$ has been used for the SSIM vector, which is composed of the SSIM values computed on each selected block — a very small bin size that reduces the quantization distortion ($\propto \log(\Delta)$ [37]). Looking at Fig. 4, the different behavior of the curve of the Full Reference measure (i.e., SSIM and SNR) and the curve of the corresponding entropy can be seen. In particular, the first part of the SSIM and SNR curves clearly oscillates before approaching values close to the true one — where their trend becomes nearly horizontal. This confirms the result in Prop. 1. On the contrary, the corresponding entropies have a smoother increasing trend with a critical curvature, after which the values approach the entropy of the whole available sample (i.e., the true one). However, it can also be noted that both the critical ascending trend of the entropies and the oscillating region of SSIM and SNR stop at nearly the same point. It is worth highlighting that the behavior in Fig. 4 is common to all considered images and any distortion kind. Interestingly, the behavior of those curves does not change if the involved parameters change. This preliminary result shows two important things. The first is that it is really possible to drastically reduce the available information sent by the two sources I and J in order to achieve a nearly exact value of the involved quality measure (quantitative results are presented below). The second is that the search for the reduced information may be performed by exploiting the smooth profile of the entropy.

4.2. Some tests about the best reduction domain

Section 3.2 has proved that it is more convenient to work in the measure (i.e. SSIM, SNR, etc.) domain rather than directly in the image one. In fact, any measure naturally reduces entropy [37]. Hence, any further information reduction is more convenient on the already reduced information — without introducing any distortion beforehand. It can easily be verified, as follows, that the measure domain contains less information than the original source domain. For each image in Fig. 3, the corresponding entropy H(X) has been computed in Table 1 (first column).
distortion kind | dist. level | $\bar{M}$ | $\hat{M}$ | $\epsilon$ (%) | $\sigma_\epsilon$ | $\bar{N}_r$ | $\sigma_{N_r}$ | nopp
high frequency noise | 2 | 0.8661 | 0.8672 | 1.1222 | 0.0113 | 48.00 | 2.92 | 10.00
high frequency noise | 3 | 0.7034 | 0.7086 | 2.6788 | 0.0237 | 48.87 | 4.54 | 10.01
high frequency noise | 4 | 0.4829 | 0.4842 | 4.9297 | 0.0275 | 48.70 | 3.51 | 10.01
high frequency noise | 5 | 0.2722 | 0.2627 | 8.9767 | 0.0284 | 47.80 | 4.43 | 10.00
Gaussian noise | 2 | 0.8673 | 0.8728 | 1.4447 | 0.0132 | 45.67 | 3.87 | 9.95
Gaussian noise | 3 | 0.7781 | 0.7821 | 2.1462 | 0.0217 | 48.50 | 4.55 | 10.00
Gaussian noise | 4 | 0.6614 | 0.6607 | 3.4739 | 0.0308 | 48.87 | 3.50 | 10.01
Gaussian noise | 5 | 0.5276 | 0.5272 | 4.3112 | 0.0260 | 49.30 | 3.37 | 10.02
Gaussian blur | 2 | 0.9513 | 0.9542 | 0.6528 | 0.0069 | 37.37 | 6.15 | 9.82
Gaussian blur | 3 | 0.8805 | 0.8828 | 1.4811 | 0.0159 | 44.47 | 5.10 | 9.93
Gaussian blur | 4 | 0.7925 | 0.8045 | 2.9532 | 0.0260 | 46.00 | 6.59 | 9.96
Gaussian blur | 5 | 0.7012 | 0.7128 | 4.1474 | 0.0346 | 48.40 | 4.67 | 10.01
JPEG compression | 2 | 0.9451 | 0.9448 | 0.5260 | 0.0061 | 41.87 | 5.17 | 9.90
JPEG compression | 3 | 0.8891 | 0.8895 | 0.8761 | 0.0097 | 47.33 | 3.94 | 9.99
JPEG compression | 4 | 0.7578 | 0.7518 | 2.1654 | 0.0198 | 48.33 | 3.56 | 10.00
JPEG compression | 5 | 0.6320 | 0.6257 | 4.1073 | 0.0311 | 49.37 | 4.10 | 10.02
JPEG2K compression | 2 | 0.8516 | 0.8553 | 1.9288 | 0.0183 | 46.73 | 3.59 | 9.98
JPEG2K compression | 3 | 0.6942 | 0.6939 | 4.0191 | 0.0326 | 49.43 | 4.35 | 10.02
JPEG2K compression | 4 | 0.5394 | 0.5529 | 6.4984 | 0.0395 | 47.73 | 3.55 | 9.99
JPEG2K compression | 5 | 0.4799 | 0.4827 | 7.6406 | 0.0426 | 49.53 | 2.83 | 10.02
Mean shift | 2 | 0.9951 | 0.9950 | 0.1259 | 0.0015 | 21.00 | 5.52 | 9.58
Mean shift | 3 | 0.9778 | 0.9779 | 0.2198 | 0.0028 | 34.37 | 6.13 | 9.78
Mean shift | 4 | 0.9620 | 0.9644 | 0.5506 | 0.0059 | 31.27 | 7.36 | 9.73
Mean shift | 5 | 0.8929 | 0.8930 | 1.0167 | 0.0111 | 45.53 | 4.57 | 9.96
Contrast change | 2 | 0.9829 | 0.9832 | 0.3674 | 0.0042 | 29.23 | 5.70 | 9.70
Contrast change | 3 | 0.9713 | 0.9711 | 0.1596 | 0.0019 | 33.53 | 5.51 | 9.76
Contrast change | 4 | 0.9349 | 0.9392 | 0.8867 | 0.0086 | 40.20 | 5.70 | 9.87
Contrast change | 5 | 0.8726 | 0.8749 | 0.7490 | 0.0081 | 44.60 | 3.28 | 9.94
Multipl. gaussian noise | 2 | 0.8594 | 0.8697 | 2.3503 | 0.0220 | 45.80 | 5.25 | 9.96
Multipl. gaussian noise | 3 | 0.7730 | 0.7851 | 3.3095 | 0.0286 | 47.40 | 4.33 | 9.99
Multipl. gaussian noise | 4 | 0.6615 | 0.6627 | 4.0346 | 0.0327 | 48.60 | 3.75 | 10.01
Multipl. gaussian noise | 5 | 0.5376 | 0.5346 | 5.0814 | 0.0322 | 50.77 | 3.18 | 10.05

Table 2: Ocean image; I16 in the TID2013 database with different distortion kinds and distortion levels. SSIM ($\bar{M}$), estimated SSIM ($\hat{M}$) using the proposed method, mean value of the estimation error (%) over 30 runs ($\epsilon$), standard deviation of the estimation error ($\sigma_\epsilon$), mean value of the number of blocks used ($\bar{N}_r$), standard deviation of the number of blocks ($\sigma_{N_r}$), number of operations per pixel (nopp) required by the proposed algorithm.
In this case, the bin width (necessary to build the empirical p.d.f.) has been set equal to 1. This choice has been motivated by setting the precision to that of the image. For any distorted image J, the
conditional entropy H(Y|X) has also been computed (second column). It has been obtained by considering the distortion as an additive term, i.e. H(X − Y). Although this may seem a rough approximation, since the distortion process may be much more complicated, it is quantitatively significant, as it is proportional to the residual between the original source I and the distorted one J. Again, the bin width for H(Y|X) has been set equal to 1, while tests have been performed with 32 × 32 blocks. Finally, the third column of Table 1 contains the entropy H(Z) relative to the SSIM vector (i.e., the vector M in Fig. 1), obtained by quantizing M with a bin width of 0.01, which accounts for the fact that the ratio between the range of SSIM and that of the images (original and distorted) is $1/255 \approx 0.004$. It can then easily be observed in Table 1 that H(Z) < H(X) + H(Y|X). Hence, SSIM naturally reduces the entropy of the original images [37]. In particular, the SSIM entropy can be considerably less than the joint entropy of the original and distorted images, according to both image content and distortion kind. As a result, any preprocessing on X and Y can alter the data, making the estimation of H(Z) completely unfair. That is why the selection of corresponding blocks in X and Y must obey constraints and rules directly tied to SSIM. This result holds for both overlapping and non overlapping blocks. Moreover, this behavior does not change for a different parameter setting.

4.3. Some tests about the locality of the selected information

Fig. 5 is one of the examples that prove the results in Section 3.3. In other words, it is more convenient to use blocks rather than pixels randomly spread over the image for building Full Reference quality measures. In particular, blocks convey a greater amount of information. Fig. 5 shows that the SSIM entropy curve built via non overlapping and randomly selected blocks is always above the SSIM entropy curve built via randomly selected pixels — pixels not adjacent in the image. The same result can be observed for the SNR curves, even though in this case the effect is much less visible: the curves are very close to each other. Quantization bins have been set equal to $10^{-4}$ in all tests, but different settings confirm the same trend.

4.4. Some tests on the overlapping blocks choice

Section 3.4 states that it is more convenient to select non overlapping blocks from I and J. A simple practical proof can be performed by taking all possible (non overlapping) blocks from the images shown in Fig. 3 and computing SSIM on them. Considering again a bin width of 0.01 (for the same reasons as
distortion kind | distortion level | $\epsilon$ (%) | $\sigma_\epsilon$ | $\bar{N}_r$ | $\sigma_{N_r}$ | nopp
Gaussian noise | 2 | 1.11 | 0.012 | 42.82 | 5.17 | 9.91
Gaussian noise | 3 | 1.76 | 0.017 | 44.75 | 5.16 | 9.94
Gaussian noise | 4 | 2.77 | 0.024 | 46.70 | 4.35 | 9.97
Gaussian noise | 5 | 3.67 | 0.029 | 48.17 | 3.96 | 10.00
High frequency noise | 2 | 1.08 | 0.011 | 43.18 | 5.14 | 9.92
High frequency noise | 3 | 2.57 | 0.023 | 46.58 | 4.54 | 9.97
High frequency noise | 4 | 4.16 | 0.030 | 48.60 | 3.61 | 10.01
High frequency noise | 5 | 7.06 | 0.033 | 48.92 | 3.61 | 10.01
Gaussian blur | 2 | 0.43 | 0.005 | 39.38 | 5.05 | 9.86
Gaussian blur | 3 | 1.02 | 0.010 | 44.68 | 4.64 | 9.94
Gaussian blur | 4 | 2.26 | 0.020 | 47.36 | 4.42 | 9.99
Gaussian blur | 5 | 3.89 | 0.028 | 48.94 | 3.67 | 10.01
Jpeg | 2 | 0.45 | 0.005 | 41.10 | 4.57 | 9.88
Jpeg | 3 | 0.76 | 0.008 | 45.10 | 4.48 | 9.95
Jpeg | 4 | 1.57 | 0.015 | 47.64 | 3.98 | 9.99
Jpeg | 5 | 3.16 | 0.025 | 48.84 | 3.85 | 10.01
Jpeg2K | 2 | 0.89 | 0.009 | 43.74 | 4.74 | 9.93
Jpeg2K | 3 | 1.79 | 0.017 | 46.99 | 4.02 | 9.98
Jpeg2K | 4 | 2.96 | 0.023 | 47.99 | 3.64 | 10.00
Jpeg2K | 5 | 4.88 | 0.032 | 49.25 | 3.27 | 10.02
Mean shift | 2 | 0.14 | 0.002 | 23.48 | 5.38 | 9.61
Mean shift | 3 | 0.76 | 0.008 | 34.83 | 6.32 | 9.78
Mean shift | 4 | 0.61 | 0.007 | 36.20 | 6.28 | 9.81
Mean shift | 5 | 2.24 | 0.021 | 44.76 | 4.49 | 9.94
Contrast change | 2 | 0.51 | 0.006 | 27.50 | 6.72 | 9.67
Contrast change | 3 | 0.27 | 0.003 | 29.82 | 7.23 | 9.71
Contrast change | 4 | 1.23 | 0.013 | 38.58 | 6.14 | 9.84
Contrast change | 5 | 0.85 | 0.009 | 42.68 | 4.78 | 9.91
Multiplicative gaussian noise | 2 | 1.71 | 0.017 | 39.77 | 6.44 | 9.86
Multiplicative gaussian noise | 3 | 2.28 | 0.023 | 43.91 | 4.97 | 9.93
Multiplicative gaussian noise | 4 | 3.03 | 0.027 | 45.37 | 4.58 | 9.95
Multiplicative gaussian noise | 5 | 4.44 | 0.034 | 47.75 | 4.00 | 9.99

Table 3: TID2013 database. Mean value of the SSIM estimation error (%) over 30 runs ($\epsilon$), standard deviation of the estimation error ($\sigma_\epsilon$), mean value of the number of blocks used ($\bar{N}_r$), standard deviation of the number of blocks ($\sigma_{N_r}$), required number of operations per pixel (nopp). The latter must be compared with 2330, which is the nopp required by the SSIM computation using all pixels in 512 × 384 images. Different distortion kinds and distortion intensities have been considered.
in Section 4.2) for this new vector M and 32 × 32 blocks, the corresponding entropy $H(\bar{Z})$ has then been computed (fourth column of Table 1). Table 1 shows that non overlapping blocks lead to a higher entropy and hence convey a greater amount of information. In order to minimize the number of selected blocks that contain most of the information, the selection of non overlapping blocks is then the most effective strategy to adopt.

4.5. SSIM speed up

The proposed method has been tested on several images affected by different distortion kinds with different intensities from the TID2013 database. The step by step procedure is presented for the 512 × 384 Ocean image (image I16 in TID2013) and its blurred version (see Fig. 6); results for different images do not significantly differ from those for the Ocean image. In all tests, the following parameters have been used. The level G of the wavelet transform (a Daubechies wavelet with 2 vanishing moments) has been set equal to 3; the level L of the SMQT has been fixed to 3 in order to have 8 regions; the block size (l × l) for SSIM computation has been set to 17 × 17, since it corresponds to a visual angle equal to 0.56 degrees [20] — however, smaller dimensions provide similar results; the cardinality of the alphabet for SSIM has been set equal to 200, which corresponds to a quantization step equal to 0.01. Figure 7 shows the segmentation used for the Ocean image. As can be observed, the segmentation is quite faithful to its SSIM map except for the edges. This is due to the fact that the criterion used for the segmentation is based just on the luminance, and a region based segmentation has been employed. It turns out that the optimal point selected by the MDL principle (see Fig. 8) on the entropy curve corresponds to a good value of SSIM, providing acceptable estimation errors. It is worth pointing out that Fig. 8 clearly shows that MDL is able to exploit the property in Section 3.1, i.e., the entropy behavior is monotonic and significantly more regular than the SSIM one. Finally, Fig. 9 shows the blocks belonging to the selected fixation path. As can be observed, more blocks are selected in the regions where the blurring is more visible. With regard to quantitative results, note that each run of the proposed algorithm provides a different sequence in the visual distortion typical set of the image under study. That is why the average value of the SSIM estimates obtained over 30 runs of the algorithm is given in Table 2.
Figure 8: (Left) SSIM value of Fig. 6 estimated for an increasing number of blocks and (Right) entropy per sample used in the MDL based procedure — the optimal point has been marked.
Figure 9: Selected blocks on Fig. 6.
The table includes the standard deviation of the estimate, the average number of blocks used for computing it and the corresponding standard deviation, as well as the average number of required operations per pixel (nopp). The latter is compared with the nopp required by the computation of SSIM using all image pixels. As can be observed, the estimation error increases as the distortion level increases, but it rarely reaches 8%. In addition, the standard deviation is very small ($10^{-2}$–$10^{-3}$). For some distortion kinds, like Gaussian noise and Gaussian blur, this percentage is less than 5%; for distortions like mean shift and contrast change it does not exceed 1.2%. The same considerations are valid for a larger class of images, as shown in Table 3, which reports the average results achieved for images in the TID2013 database. The average number of selected blocks is always less than 50. It turns out that the number of operations required for the computation of the SSIM of the Ocean image is reduced by a factor of about 200. It is also worth stressing that the proposed procedure does not involve an exhaustive search for points of interest, as required by the contrast-based procedure in [25].
5. Conclusions and Future Research

This paper has shown that it is possible to estimate a quality assessment measure from just a subset of the image information. The relative estimation error obviously depends on the scene content and on the kind of distortion, but it has been shown to be usually small — very often under 5%. The proposed entropy based formalism has also proved that there exist some criteria for an optimal estimation of QA measures. It is worth pointing out that the proposed formalism may have various further advantages, both theoretical and practical. In fact, such an approach may also allow: i) improving the design of existing FR measures, ii) designing novel and possibly more precise ones, iii) building novel 'HVS based functionals' according to a novel concept of 2D and 3D function regularity, iv) adding some novel elements to Visual Information Theory [7], with possible effects on the definition of new visual image coding schemes, etc. Moreover, since the proposed approach is close to two well investigated topics, namely pooling and fixation points, future research will be oriented to deepening the common aspects involving them. Specifically, on the one hand it will be investigated whether a more effective binary pooling can be designed while keeping, at the same time, the computational effort low. Following the approaches in [23, 46], the proposed binary pooling function could be designed by weighting the information in $A_M$, accounting for the scene content. Similarly, the fixation points search strategy may also be embedded in the proposed framework for making the block selection pseudo-random and more adaptive to both scene and distortion content. Finally, instead of taking an a priori fixed number of blocks, an optimal number of them in terms of the Minimum Description Length [15] of the available information may be defined.

Appendices

Proof of Proposition 3

It is well-known [37] that $H(X) = H(\alpha) + \alpha H(X_1) + (1-\alpha)H(X_2)$, and the same holds for H(Y). On the other hand,

$$Z = \begin{cases} M(X_1, Y_1) & \text{with prob. } \alpha \\ M(X_2, Y_2) & \text{with prob. } 1-\alpha \end{cases}$$

where $M(X_1, Y_1) \cap M(X_2, Y_2) = \emptyset$ and $p_Z = \alpha p_{M(X_1,Y_1)} + (1-\alpha)p_{M(X_2,Y_2)}$. In order to show that $H(p_Z) \le H(\alpha) + \alpha H(p_{M(X_1,Y_1)}) + (1-\alpha)H(p_{M(X_2,Y_2)})$, let us suppose by contradiction that

$$H(p_Z) > H(\alpha) + \alpha H(p_{M(X_1,Y_1)}) + (1-\alpha)H(p_{M(X_2,Y_2)}). \qquad (.1)$$

By recalling the definition of the Jensen-Shannon divergence $D_{JS}^\alpha$, i.e. $D_{JS}^\alpha(p_{M(X_1,Y_1)}||p_{M(X_2,Y_2)}) = H(p_Z) - \alpha H(p_{M(X_1,Y_1)}) - (1-\alpha)H(p_{M(X_2,Y_2)})$, and by combining the previous equations, we have $D_{JS}^\alpha(p_{M(X_1,Y_1)}||p_{M(X_2,Y_2)}) > H(\alpha)$, which is absurd since $0 \le D_{JS}^\alpha(p_{M(X_1,Y_1)}||p_{M(X_2,Y_2)}) \le H(\alpha)$. Hence, eq. (.1) is absurd and, since $\alpha < 1$, eq. (5) follows. •
Proof of Proposition 4

Let $\hat{Z}_j = \{Z_1, Z_2,..,Z_{N_j}\}$ denote a collection of $N_j$ variables $Z_i$ selected in $\{Z_1, Z_2,..,Z_T\}$ such that $\bigcap_{i=1}^{N_j} Z_i = \emptyset$, and let K be the number of $N_j$-ples such that $\bigcup_{j=1}^{K} \hat{Z}_j = \{Z_1, Z_2,..,Z_T\}$. Since for generic variables $S_1,..,S_n$ it holds $H(S_1, S_2,..,S_n) \le \sum_{i=1}^{n} H(S_i)$ [37] and $KR \le T$, then

$$\frac{H(Z_1, Z_2,..,Z_T)}{T} \le \frac{1}{T}\sum_{j=1}^{K} H(\hat{Z}_j) \le \frac{1}{T}\sum_{j=1}^{K} H(\bar{Z}_1, \bar{Z}_2,..,\bar{Z}_R) = \frac{K H(\bar{Z}_1, \bar{Z}_2,..,\bar{Z}_R)}{T} \le \frac{K H(\bar{Z}_1, \bar{Z}_2,..,\bar{Z}_R)}{KR} = \frac{H(\bar{Z}_1, \bar{Z}_2,..,\bar{Z}_R)}{R}. \; \bullet$$
Proof of Proposition 5

Let m, d, s and c denote, respectively, a multiplication, a division, an algebraic sum and a comparison. The number of operations required by SSIM is

$$O_{SSIM} = N \cdot O_{SSIM_p} = (8l^2 + 18)N, \qquad (.2)$$

where $O_{SSIM_p}$ is the number of operations required for the SSIM computation at a given pixel. In fact, $SSIM_p = S_1 S_2 = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \cdot \frac{2\sigma_{xy} + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$, so $O_{SSIM_p} = O_{S_1} + O_{S_2} + 1m$, where $O_{S_1} = 2m + 3s + 1d + 2O_\mu$ and $O_{S_2} = 1m + 1d + 3s + 2O_{\sigma^2} + O_{\sigma_{xy}}$, with $O_\mu = (l^2-1)s + 1d$, $O_{\sigma^2} = (l^2+1)m + l^2 s + 1d$ and $O_{\sigma_{xy}} = (l^2+1)m + l^2 s + 1d$. As a result, $O_{S_1} = (2l^2+1)s + 2m + 3d$, $O_{S_2} = (3l^2+4)m + (3l^2+3)s + 4d$, and then

$$O_{SSIM_p} = 8l^2 + 18. \qquad (.3)$$

The number of operations required by the Algorithm in Section 3.5 is the sum of the numbers of operations required by:

1. the G-th low pass (Haar) wavelet subband ($O_{approxwavelet}$);
2. L levels of SMQT ($O_{SMQT}$);
3. the regions' cardinalities and the weights of the graph ($O_{graph}$);
4. the sequential SSIM computation on an increasing number of blocks up to $N_r$ ($O_{SSIMblocks}$);
5. the entropy of the SSIM vectors ($O_{SSIMentropy}$);
6. the MDL for the selection of the best $N_r$ ($O_{MDL}$),

where:

1. the computation of the low pass wavelet subband using the Haar wavelet requires $(4m + 3s)N/2^{2j}$ at each level j, so

$$O_{approxwavelet} = 7N \sum_{j=1}^{G} \frac{1}{2^{2j}} = \frac{7}{3}N\left(1 - \frac{1}{2^{2G}}\right); \qquad (.4)$$

2. each level l of the SMQT requires $(N-1)s + 2^{l-1}d + Nc$, so

$$O_{SMQT} = L(N-1)s + LNc + \sum_{l=1}^{L} 2^{l-1}d = 2LN + 2^L - L - 1; \qquad (.5)$$

3. the cardinality of the selected regions requires $Nc$, while the computation of the weights requires $1m + 1d + (2^L(2^L - 2))s$, so

$$O_{graph} = Nc + 1m + 1d + (2^L(2^L - 2))s = N + 2 + 2^L(2^L - 2); \qquad (.6)$$

4. using eq. (.3), we have

$$O_{SSIMblocks} = N_r(8l^2 + 18); \qquad (.7)$$

5. if $N_r \le |\chi|$, at the k-th step the computation of the pdf (as in Proposition 1) requires $\log_2(|\chi|)c + km + kd + 1s$, while the entropy requires $km + (k-1)s + kC_{log} + 1d$. Hence, the complete entropy computation at the k-th step requires $O_{k\text{-}entr} = (2k)m + ks + (k+1)d + kC_{log} + \log_2(|\chi|)c$. As a result, for $N_r$ steps

$$O_{SSIMentropy} = N_r(\log_2|\chi| + 1) + (4 + C_{log})\sum_{k=1}^{N_r} k = \left(\log_2|\chi| + 3 + \frac{C_{log}}{2}\right)N_r + (4 + C_{log})\frac{N_r^2}{2}; \qquad (.8)$$

6. at the k-th step of the MDL procedure it is necessary to compute $L(k) + L(H|k)$. This computation requires $3s + C_{log} + 4m + 1d$, while the minimum value requires $1c$. Hence,

$$O_{MDL} = (9 + C_{log})N_r. \qquad (.9)$$

As a result,

$$O_{algo} = \left[\frac{7}{3}\left(1 - \frac{1}{2^{2G}}\right) + 2L + 1\right]N - 2^L - L + 1 + 2^{2L} + (4 + C_{log})\frac{N_r^2}{2} + \left(8l^2 + \log_2|\chi| + 30 + \frac{3}{2}C_{log}\right)N_r. \qquad (.10)$$
Hence, using eqs. (.2) and (.10), the thesis follows by solving the inequality $O_{algo} < O_{SSIM}$ with respect to $N_r$. •

Acknowledgements

The authors would like to thank the anonymous reviewer for the valuable comments and suggestions that contributed to improving the paper.

References

[1] A. Beghdadi, B. Pesquet-Popescu, A new image distortion measure based on wavelet decomposition, 7th Int. Symp. on Signal Processing and Its Applications, Paris, France, 2003.
[2] S. Benabdelkader, M. Boulemden, Recursive Algorithm based on Fuzzy 2-Partition Entropy for 2-Level Image Thresholding, Pattern Rec., 38:1289-1294, 2005.
[3] B. P. Bondzulic, V. S. Petrovic, Additive Models and Separable Pooling, a New Look at Structural Similarity, Signal Processing, 97:110-116, 2014.
[4] V. Bruni, D. Vitulano, G. Ramponi, Image quality assessment through a subset of the image data, Proc. of ISPA 2011.
[5] V. Bruni, E. Rossi, D. Vitulano, On the equivalence between Jensen-Shannon divergence and Michelson contrast, IEEE Trans. on Information Theory, 58(7):4278-4288, 2012.
[6] V. Bruni, E. Rossi, D. Vitulano, Jensen-Shannon divergence for visual quality assessment, Signal Image and Video Processing, Springer, 7(3):411-421, 2013.
[7] V. Bruni, D. Vitulano, Z. Wang, Special issue on human vision and information theory, Signal Image and Video Processing, Springer, 7(3), 2013.
[8] V. Bruni, D. Vitulano, Evaluation of degraded images using adaptive Jensen-Shannon divergence, Proc. of the 8th Int. Symp. ISPA 2013, Trieste, Italy, 2013.
[9] R. Ferzli, L. J. Karam, A no-reference objective image sharpness metric based on the notion of just noticeable blur (JNB), IEEE Trans. on Image Processing, 18(4):717-728, 2009.
[10] R. A. Frazor, W. S. Geisler, Local luminance and contrast in natural images, Vision Research, 46:1585-1598, 2013.
[11] S. Gabarda, G. Cristóbal, Blind image quality assessment through anisotropy, J. Opt. Soc. Amer., 24(12):42-51, 2007.
[12] F. Gao, J. Yu, Biologically Inspired Image Quality Assessment, Signal Processing, article in press.
[13] D. Gayle, H. Mahlab, Y. Ucar, A. M. Eskicioglu, A Full-Reference Color Image Quality Measure in the DWT Domain, Proc. of EUSIPCO 2005.
[14] R. C. Gonzalez, R. E. Woods, Digital Image Processing, Prentice Hall, 2nd Edition, 2002.
[15] P. D. Grunwald, A Tutorial Introduction to the Minimum Description Length Principle, in Advances in Minimum Description Length: Theory and Applications, edited by Grunwald, Myung, Pitt, 2004.
[16] URL: http://live.ece.utexas.edu/research/quality/live_video.html
[17] Q. Li, Z. Wang, General-Purpose Reduced-Reference Image Quality Assessment based on Perceptually and Statistically Motivated Image Representation, Proc. of IEEE ICIP, San Diego, CA, 2008.
[18] S. Mallat, A wavelet tour of signal processing, Academic Press, 1998.
[19] A. Mittal, A. K. Moorthy, A. C. Bovik, No-Reference Image Quality Assessment in the Spatial Domain, IEEE Trans. on Image Processing, 21(12):4695-4708, 2012.
[20] V. Mante, R. A. Frazor, V. Bonin, W. S. Geisler, M. Carandini, Independence of luminance and contrast in natural scenes and in the early visual system, Nature Neuroscience, 8(12), 2005.
[21] A. K. Moorthy, A. C. Bovik, Blind image quality assessment: From natural scene statistics to perceptual quality, IEEE Trans. on Image Processing, 20(12):3350-3364, 2011.
[22] M. Nilsson, M. Dahl, I. Claesson, The successive mean quantization transform, Proc. of ICASSP 2005, 2005.
[23] J. Park, K. Seshadrinathan, S. Lee, A. C. Bovik, Spatio-Temporal Quality Pooling Accounting for Transient Severe Impairments and Egomotion, Proc. of ICIP 2011.
[24] N. Ponomarenko, L. Jin, O. Ieremeiev, V. Lukin, K. Egiazarian, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, C.-C. Jay Kuo, Image Database TID2013, Signal Processing: Image Communication, 30, 2015.
[25] R. Raj, W. S. Geisler, R. A. Frazor, A. C. Bovik, Contrast statistics for foveated visual systems: fixation selection by minimizing contrast entropy, J. Opt. Soc. Am. A, 20(10), 2005.
[26] M. Rivera, O. Ocegueda, J. L. Marroquin, Entropy-Controlled Quadratic Markov Measure Field Models for Efficient Image Segmentation, IEEE Trans. on Image Processing, 16(12):3047-3057, 2007.
[27] M. Saad, A. C. Bovik, C. Charrier, Blind image quality assessment: A natural scene statistics approach in the DCT domain, IEEE Trans. on Image Processing, 21(8):3339-3352, 2012.
[28] A. Saha, Q. M. J. Wu, Perceptual Image Quality Assessment using Phase Deviation Sensitive Energy Features, Signal Processing, 93:3182-3191, 2013.
[29] K. Seshadrinathan, R. Soundararajan, A. C. Bovik, L. K. Cormack, Study of Subjective and Objective Quality Assessment of Video, IEEE Trans. on Image Processing, 19(6):1427-1441, 2010.
[30] K. Seshadrinathan, R. Soundararajan, A. C. Bovik, L. K. Cormack, A Subjective Study to Evaluate Video Quality Assessment Algorithms, SPIE Proceedings Human Vision and Electronic Imaging, Jan. 2010.
[31] H. R. Sheikh, A. C. Bovik, G. De Veciana, An Information Fidelity Criterion for Image Quality Assessment using Natural Scene Statistics, IEEE Trans. on Image Processing, 14(12), 2005.
[32] H. R. Sheikh, Z. Wang, L. Cormack, A. C. Bovik, LIVE Image Quality Assessment Database Release 2. [Online]. Available: http://live.ece.utexas.edu/research/quality
[33] H. R. Sheikh, A. C. Bovik, Image Information and Visual Quality, IEEE Trans. on Image Processing, 15(2), 2006.
[34] J. Silvestre-Blanes, Structural Similarity Image Quality Reliability: Determining Parameters and Window Size, Signal Processing, 91:1012-1020, 2011.
[35] E. P. Simoncelli, B. A. Olshausen, Natural Image Statistics and Neural Representation, Annu. Rev. Neurosci., 24:1193-1216, 2001.
[36] S. Suthaharan, No-reference visually significant blocking artifact metric for natural scene images, Signal Processing, 89(8):1647-1652, 2009.
[37] T. M. Cover, J. A. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.
[38] D. Van der Weken, M. Nachtegael, E. E. Kerre, A new similarity measure for image processing, Journal of Computational Methods in Sciences and Engineering, 3(2):209-222, 2003.
[39] Y. Yang, J. Ming, Image Based Assessment based on the Space Similarity Decomposition Model, Signal Processing, 120:797-805, 2016.
[40] Z. Wang, A. C. Bovik, Modern Image Quality Assessment, Morgan & Claypool Publishers, 2006.
[41] Z. Wang, A. C. Bovik, A Universal Image Quality Index, IEEE Signal Processing Letters, 9(3):81-84, 2002.
[42] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image Quality Assessment: From Error Visibility to Structural Similarity, IEEE Trans. on Image Processing, 13:600-612, 2004.
[43] Z. Wang, L. Lu, A. C. Bovik, Video Quality Assessment based on Structural Distortion Measurement, Signal Processing: Image Communication, 19(2):121-132, 2004.
[44] Z. Wang, E. P. Simoncelli, Reduced-Reference Image Quality Assessment using a Wavelet-Domain Natural Image Statistic Model, Proc. of SPIE Human Vision and Electronic Imaging X, vol. 5666, 2005.
[45] W. Wang, Y. Wang, Q. Huang, W. Gao, Measuring Visual Saliency by Site Entropy Rate, Proc. of CVPR, 2010.
[46] Z. Wang, Q. Li, Information Content Weighting for Perceptual Image Quality Assessment, IEEE Trans. on Image Processing, 20(5):1185-1198, 2011.
[47] S. Winkler, Digital Video Quality, Vision Models and Metrics, Wiley, 2005.
[48] D. Zhang, E. Jernigan, An Information Theoretic Criterion for Image Quality Assessment based on Natural Scene Statistics, Proc. of IEEE ICIP 2006, Atlanta, GA, USA, 2006.
[49] J. Zhang, T. M. Le, S. H. Ong, T. Q. Nguyen, No-reference Image Quality Assessment using Structural Activity, Signal Processing, 91:2575-2588, 2011.