An Entropy based Approach for SSIM Speed Up

V. Bruni (a), D. Vitulano (b)

(a) University of Rome 'Sapienza', Dept. of SBAI, Faculty of Engineering, Via A. Scarpa 16, 00161 Rome, Italy, e-mail: [email protected]
(b) Istituto per le Applicazioni del Calcolo "M. Picone", C.N.R., Via dei Taurini 19, 00185 Rome, Italy, e-mail: [email protected]
Abstract

This paper focuses on an entropy based formalism to speed up the evaluation of the Structural SIMilarity (SSIM) index in images affected by a global distortion. Looking at images as information sources, a visual distortion typical set can be defined for SSIM. This typical set consists of just a subset of the information belonging to the original image and of the corresponding one in the distorted version. As a side effect, some general theoretical criteria for the computation of any full reference quality assessment measure can be given in order to maximize its computational efficiency. Experimental results on various test images show that the proposed approach allows SSIM to be estimated with a considerable speed up (by a factor of about 200) and a small relative error (often lower than 5%).

Keywords: Information theory, SSIM, asymptotic equipartition property, image quality assessment, typical set.

1. Introduction

In the last few years, there has been an increasing interest in defining new objective visual Quality Assessment (QA) measures that better correlate with human perception [6, 12, 28, 31, 33, 44, 46, 48]. They can be roughly classified as follows [40]:

1. Full Reference (FR) or non blind approaches [1, 13, 38, 41, 43], which measure the quality of the observed (distorted) image when the original (or reference) one is available.
2. Reduced Reference (RR) approaches [8, 17], which predict the quality of the observed image on the basis of partial information about the original image (reference features).
3. No Reference (NR) or blind approaches [9, 11, 19, 21, 27, 36, 49], which try to give a measure of the quality of the observed image when the original one is not available.

It is worth noting that these measures can depend on the kind of distortion they are able to deal with. An alternative classification of FR measures in terms of the preferred distortion kinds can be found in [13]. Probably the most famous FR measures are the classical Signal-to-Noise Ratio (SNR) [14, 47] and the more recent Structural SIMilarity (SSIM) index [42]. The latter is pixel-based and exploits a suitable combination of first and second order statistical moments in each pixel neighborhood (block) of both the recovered and the original image [39, 34] — see Fig. 1 for details regarding its implementation. Despite its high correlation with human perception, SSIM suffers from a higher (than SNR) computational effort, which may penalize its use in various real-time applications, especially video-oriented ones. In fact, the SSIM complexity is $O(l^2 N)$, as it depends on both the image size (N) and the block dimension ($l^2$) — necessary to locally evaluate the metric. On the contrary, the SNR complexity is $O(N)$, as it only depends on the image size. The aim of this paper is to investigate the possibility of speeding up SSIM on images affected by a global distortion, i.e., a distortion that affects the original image almost uniformly. The speed up can be achieved by selecting a subset of information, denoted as the Visual Distortion Typical Set, from both the original image (first source) I and its distorted version (second source) J. The proposed approach is motivated by two empirical observations. The first one accounts for the fact that human beings do not explore the whole information sources (original and distorted) when determining the degree of distortion and assigning a subjective quality score. The second derives from the fact that any distortion affects images differently, with some suprathreshold effects that do not yet have a clear theoretical explanation [5, 47]. This also confirms the observation that there is still a "lack of theoretical principles as the basis for the development of reliable computational models" regarding human perception [46]. We will show that a formalism able to estimate the Visual Distortion Typical Set allows the SSIM computation to be sped up by a factor of about 200, with a very small estimation error: lower than 5% on average (with few peaks lower than 8%) in images affected by
different distortions with different intensities.

The outline of the paper is the following. The next section gives some preliminary details useful for following the rest of the paper. In particular, the proposed study is compared with the state of the art, outlining the differences with related approaches and research lines. Section 3 deals with the problem of formalizing the Visual Distortion Typical Set starting from an original source and its degraded copy. In the first part of that section, following an Information Theory based approach, it will be shown that some theoretical findings leading to a more correct QA estimate can be derived. They are very general, as they are valid for any QA measure and any distortion kind, and they hold independently of the characterization of the Visual Distortion Typical Set. The last part of the section shows how to find one sequence belonging to the Visual Distortion Typical Set by means of an Information Theory based strategy. Section 4 shows some results on the LIVE [32, 29, 30, 16] and TID2013 [24] databases, oriented to experimentally confirming the theoretical results of Section 3, while showing the potential of the proposed approach in terms of SSIM speed up with a very small estimation error. Finally, Section 5 draws the conclusions.

2. Preliminary Background

In this paper, a scene denotes the image I under study. Its Visual Distortion Typical Set $A_M$ with respect to any FR quality measure M is supposed to be composed of a subset of the information of I along with the corresponding one in its distorted version J. This set will then depend on the original source I, on its distorted version J, on the considered FR quality measure M and on $\epsilon$, i.e. the distance between $\hat{M}$ (M estimated on $A_M$) and $\bar{M}$ (M computed using the whole available information of I and J). In order to find the visual distortion typical set $A_M$, an entropy based formulation will be adopted. In other words, both the original image and its distorted version will be seen as related to two sources [37]. In agreement with Information Theory principles, it is always possible to assume the existence of a typical set, i.e. a subset of sequences (composed of 2D blocks) able to describe the original source content within a small error — to be fixed a priori. It is worth stressing that the proposed approach is not far from some widely investigated problems like selective visual attention and quality metrics. In fact, there is a wide literature on how to find fixation points [10, 20,
47], which allow scene information to be synthesized and understood in the preattentive phase. In particular, several approaches oriented to determining the most representative subset of information of a given scene are usually based on saliency maps [2, 4, 26, 45]. However, the proposed approach differs in some respects from this specific literature. It accounts for both the original and the distorted versions, while most of the existing approaches deal with just the original source information. Among the approaches and attempts also dealing with degradation, to the best of the authors' knowledge there are no complete theoretical formalisms that lead to a specific subset, like $A_M$, in a limited time [25], as the proposed approach does. In fact, a deterministic formalism should account for suprathreshold effects that have not been understood yet [47]. Unlike existing approaches, which provide empirical strategies that lead to a specific solution (i.e. a specified walk in terms of blocks within the scene under exam), the proposed approach proves the existence of more than one walk (or block sequence) given I, J, M and $\epsilon$. These block sequences have an informative content very close to that of the whole sources I and J — in agreement with the notion of typical set in Information Theory [37]. In this sense, the proposed formalism may also be interesting from a theoretical point of view, as it may give some basic criteria used by the Human Visual System in the selection of information for quality assessment. The proposed approach is also related to the problem of selecting the optimal pooling function for a quality assessment measure. In fact, the estimation of any QA measure can be split into two phases. In the former, local distortions are estimated via a suitable visibility based distance function. In the latter, these distortions are combined through a pooling strategy. Some interesting and effective approaches focusing on this topic have already been investigated (see for instance [3, 23, 46]). The proposed approach can then be seen as a binary pooling function that preserves some blocks while discarding the others. However, again, we are not interested in a precise and specific strategy for balancing all the available information. The possibility of estimating a rough but not computationally expensive subset of the available information is the main target in order to get a real speed up for any visual FR QA measure and for any global distortion.

3. Visual Distortion Typical Set

In order to find the Visual Distortion Typical Set, SSIM will be mainly considered here, even though the theoretical results are valid for any distortion measure.
SSIM Computation

1. Split the original image I into a set of $W_1 \cdot W_2$ blocks $\{b_i\}$ of size $l \times l$, centered at each pixel of I. Do the same for the distorted version J, obtaining blocks $\{d_i\}$.
2. For each block $b_i$ and the corresponding $d_i$, estimate SSIM:

$$M_i(b_i, d_i) = \underbrace{\frac{2\mu_{b_i}\mu_{d_i} + C_1}{\mu_{b_i}^2 + \mu_{d_i}^2 + C_1}}_{\text{luminance adaptation}} \cdot \underbrace{\frac{2\sigma_{b_i}\sigma_{d_i} + C_2}{\sigma_{b_i}^2 + \sigma_{d_i}^2 + C_2}}_{\text{contrast masking}} \cdot \underbrace{\frac{\sigma_{b_i d_i} + C_3}{\sigma_{b_i}\sigma_{d_i} + C_3}}_{\text{spatial correlation}}$$

where $C_1$, $C_2$ and $C_3$ are numerical stabilizing constants. The array M (which can also be seen as a matrix, as each pixel of I (or J) can be assigned the corresponding SSIM value) is then produced.
3. Compute the mean of M: $\bar{M} = \frac{1}{W_1 W_2}\sum_{i=1}^{W_1 W_2} M_i$.

Figure 1: Algorithm for SSIM estimation.
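To make the steps of Fig. 1 concrete, the following is a minimal numpy sketch of the algorithm. It is an illustration, not the authors' reference code: the constants follow the common choice in [42], $C_3 = C_2/2$ is assumed so that the last two factors merge, and blocks whose $l \times l$ support would exceed the image border are simply skipped.

```python
import numpy as np

def ssim_map(I, J, l=17, C1=(0.01 * 255) ** 2, C2=(0.03 * 255) ** 2):
    """Local SSIM values M_i for l x l blocks centered at interior pixels."""
    h = l // 2
    M = []
    for r in range(h, I.shape[0] - h):
        for c in range(h, I.shape[1] - h):
            b = I[r - h:r + h + 1, c - h:c + h + 1].astype(float)
            d = J[r - h:r + h + 1, c - h:c + h + 1].astype(float)
            mu_b, mu_d = b.mean(), d.mean()
            cov = ((b - mu_b) * (d - mu_d)).mean()
            lum = (2 * mu_b * mu_d + C1) / (mu_b**2 + mu_d**2 + C1)  # luminance
            cs = (2 * cov + C2) / (b.var() + d.var() + C2)  # contrast * structure
            M.append(lum * cs)
    return np.array(M)

# Step 3 of Fig. 1: the SSIM index is the mean of the local values,
# e.g. M_bar = ssim_map(I, J).mean()
```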
Fig. 1 gives a simple algorithm for SSIM evaluation [42, 43] between the original image I and its distorted version J. It is worth observing that, in general, the blocks $b_i$ overlap and $W_1 \cdot W_2 = N$, where N is the image dimension. The SSIM algorithm in Fig. 1 has been defined by choosing the apparently simplest way to face the problem. However, one may ask whether these choices are always the best one can do. Specifically, one may deal with the following aspects:

1. Information reduction. Is the whole information of I and J really important, or is it sufficient to select just a part of it?
2. Selection of the best reduction domain. Is it more convenient to reduce the information of I and J, or that of M (i.e., to subsample M in Fig. 1)?
3. Locality of the selected information. Is it more convenient to take I (and J) samples from local and compact regions (for instance, blocks), or by means of a sparse spatial selection of pixels from I (and J)?
4. Overlapping blocks. In the case of block-based measures, do blocks have to overlap?
5. How to find an $A_M$ sequence. Is there a formal (and possibly fast) procedure for finding this reduced information?
Figure 2: The original image (left) can be considered to be composed of two homogeneous regions X1 , X2 — as well as its distorted version (right).
The sequel shows that the estimate of $A_M$ (and the search for at least one subsequence belonging to it) yields, as a side effect, a formal answer to these questions.

3.1. Information reduction

From a qualitative point of view, the visual distortion typical set $A_M$ can be defined as a subset of all sequences composed of samples of I (and the corresponding ones of J) such that they give an approximated value $\hat{M}$ of the expected value $\bar{M}$ of M within an error $\epsilon$, i.e.: $|\hat{M} - \bar{M}| < \epsilon$. More formally, $A_M$ can be thought of in terms of IT quantities. Shannon's typical set is defined as the set of sequences of fixed size whose entropy is close to the true (source) one. Similarly, we can think of the original image I as the first source, associated with the variable X, of its distorted version J as the second source, associated with the variable Y, while the variable $Z = M(X, Y)$ characterizes the third source M, which in turn depends on X and Y. Note that $A_M$ also depends, even though not explicitly, on the kind of degradation $D_J$ that gave J starting from the original image I: $J = D_J(I)$. However, $D_J$ can be considered 'embedded' in J, and it will not be explicitly mentioned in the sequel. $A_M$ will then be composed of the subset $\{X_1,..,X_{N_r}, Y_1,..,Y_{N_r}\}$ of size $2N_r < 2N = 2W_1W_2$ such that

$$|\bar{M}(X, Y) - \bar{M}(X_1,..,X_{N_r}, Y_1,..,Y_{N_r})| < \epsilon, \qquad \epsilon > 0. \qquad (1)$$
The existence of $A_M$ is guaranteed, again, by IT results. In fact, the weak law of large numbers states that for i.i.d. r.v.s $X_i$ it holds $\frac{1}{n}\sum_{i=1}^{n} X_i \to \bar{X}$ as $n \to \infty$. However, it is more convenient to use the equivalent concept, known as the Asymptotic Equipartition Property (AEP) [37], for which $\frac{1}{n}\log\frac{1}{p(X_1, X_2,..,X_n)} \to H(X)$ as $n \to \infty$, where p denotes the pdf. That is why in the sequel just the entropy will be considered. Entropy is more mathematically tractable, as it has a monotonic behavior as the number of samples grows [37], while this is not so for the mean value, as proved in the following:

Proposition 1. Let $X \sim Q$ with a positive and numerical alphabet $\chi$, and $\{X_1\} \sim p_1$, $\{X_1, X_2\} \sim p_2$, ..., $\{X_1, X_2,..,X_n\} \sim p_n$, while $\mu_n$ is the mean of $p_n$ and $\mu$ the mean of Q. Then

1. the sequence $\{\mu_n\}$ is not monotonic for increasing n;
2. $|\mu_n - \mu|^2 \le 2C^2 D_{KL}(p_n||Q)$, with $C = \max_{x\in\chi} x$, $\forall n$.

Proof 1. 1) $\mu_{n+1} - \mu_n = \sum_{x\in\chi} x(p_{n+1}(x) - p_n(x)) = -\frac{\mu_n}{n+1} + \frac{x_{n+1}}{n+1} = \frac{1}{n+1}(x_{n+1} - \mu_n)$. Hence, the difference between two successive mean values changes its sign depending on the new value $x_{n+1}$: it is positive if $x_{n+1} > \mu_n$, negative otherwise. It turns out that the convergence of the mean value is not monotonic.
2) $|\mu_n - \mu|^2 = \left|\sum_{x\in\chi} x(p_n(x) - Q(x))\right|^2 \le C^2 V^2(p_n, Q)$, where $V(p_n, Q)$ is the variational distance between $p_n$ and Q, i.e. $V(p_n, Q) = \sum_{x\in\chi} |p_n(x) - Q(x)|$, and $C = \max_{x\in\chi} x$. Since $D_{KL}(p_n||Q) \ge \frac{1}{2}V^2(p_n, Q)$, we have $|\mu_n - \mu|^2 \le 2C^2 D_{KL}(p_n||Q)$. Hence, if n is such that $D_{KL}(p_n||Q) \le \frac{\varepsilon}{2C^2}$, with $\varepsilon > 0$, then $|\mu_n - \mu|^2 \le \varepsilon$. •

3.2. Selection of the best reduction domain

In order to get a typical subsequence $\{X_1,..,X_{N_r}, Y_1,..,Y_{N_r}\}$, one may ask whether it is more convenient to reduce the information of the sources X and Y (and then Z), or to leave X and Y unchanged, estimating $\bar{M}$ on them while reducing Z's information. This is the topic of the following:

Proposition 2. $H(Z) \equiv H(M(X, Y)) \le H(X, Y)$.

Proof 2. Since $p(X, Y, Z) = p(Z)p(X, Y|Z)$, we have

$$H(X, Y, Z) = H(Z) + H(X, Y|Z) \ge H(Z). \qquad (2)$$

On the other hand, since $p(X, Y, Z) = p(X, Y)p(Z|X, Y)$ and Z is a function of (X, Y), so that $H(Z|X, Y) = 0$, we have

$$H(X, Y, Z) = H(X, Y) + H(Z|X, Y) = H(X, Y). \qquad (3)$$

The proposition is proved by inserting eq. (3) into eq. (2). •

Prop. 2 can be seen as a straightforward consequence of the well-known result $H(f(X)) \le H(X)$, which holds for any function f [37]. It states that if the information in X is correlated by means of a given function f, the entropy decreases.
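The following toy experiment, under an assumed four-symbol source, illustrates Proposition 1 numerically: the running mean oscillates, yet its squared deviation from the true mean always stays below the bound $2C^2 D_{KL}(p_n||Q)$ (natural logarithms, as required by the Pinsker inequality used in the proof).

```python
import numpy as np

rng = np.random.default_rng(0)
alphabet = np.array([1.0, 2.0, 3.0, 4.0])   # positive numerical alphabet chi
Q = np.array([0.1, 0.2, 0.3, 0.4])          # true source law, mean mu = 3.0
mu, C = alphabet @ Q, alphabet.max()

x = rng.choice(alphabet, size=2000, p=Q)
for n in (10, 100, 1000, 2000):
    mu_n = x[:n].mean()
    p_n = np.array([(x[:n] == a).mean() for a in alphabet])  # empirical pmf
    mask = p_n > 0                  # empty bins contribute 0 to D_KL(p_n||Q)
    dkl = np.sum(p_n[mask] * np.log(p_n[mask] / Q[mask]))    # in nats
    print(n, abs(mu_n - mu), (mu_n - mu) ** 2 <= 2 * C**2 * dkl)
```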
Concretely, the entropy of SSIM (Z) is less than the joint entropy of the source images (X and Y). As a result, if we are interested in finding the typical set of Z, we should sample Z and not the original sources X and Y, since part of the information of X and Y may be lost in the computation of $Z = M(X, Y)$. That is why it is more convenient to leave X and Y unchanged while reducing the information of Z. However, as we will see in the following section, sampling Z can correspond to a proper sampling of X and Y, whenever the criteria used in the latter are derived from the properties of Z.

3.3. Locality of the selected information

Though the results above would lead us to select information directly from M, it is necessary to find a strategy for retrieving part of the significant information directly from X and Y. In fact, with regard to the SSIM algorithm in Fig. 1, it is useless (and ineffective) to first build the whole vector M only to take just a subset of its samples. In the sequel, we find formal criteria for the selection of this significant information. More formally, we can assume the subsequence $\{X_1,..,X_{N_r}, Y_1,..,Y_{N_r}\}$ to be built in a progressive manner:

$$\{X_1, Y_1\}, \quad \{X_1, X_2, Y_1, Y_2\}, \quad \ldots, \quad \{X_1,..,X_{N_r}, Y_1,..,Y_{N_r}\} \qquad (4)$$

until the constraint in eq. (1) is verified — with $\epsilon$ fixed a priori. The problem is then to determine whether it is better to take samples locally from the image (i.e., by means of blocks) or non-locally. The original image I can be considered to be composed of a finite number of 'homogeneous' regions (for instance 'grass', 'sky', etc.). Without loss of generality, we can consider only two regions, as in Fig. 2. Hence:

Proposition 3. Let

$$X = \begin{cases} X_1 & \text{with prob. } \alpha \\ X_2 & \text{with prob. } 1-\alpha \end{cases} \qquad \text{and} \qquad Y = \begin{cases} Y_1 & \text{with prob. } \alpha \\ Y_2 & \text{with prob. } 1-\alpha \end{cases}$$

where $X_1 \cap X_2 = \emptyset$, $Y_1 \cap Y_2 = \emptyset$ and $Z = M(X, Y) \sim p_Z$. Then

$$H(p_Z) \le H(\alpha) + H(p_{M(X_1,Y_1)}) + H(p_{M(X_2,Y_2)}). \qquad (5)$$
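A quick numerical sanity check of eq. (5), under two assumed component laws on disjoint alphabets:

```python
import numpy as np

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

alpha = 0.3
p1 = np.array([0.6, 0.4])        # law of M(X1, Y1) on its own alphabet
p2 = np.array([0.2, 0.5, 0.3])   # law of M(X2, Y2) on a disjoint alphabet
pZ = np.concatenate([alpha * p1, (1 - alpha) * p2])   # mixture law
print(H(pZ) <= H(np.array([alpha, 1 - alpha])) + H(p1) + H(p2))  # True
```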
The proof of Proposition 3 is in the Appendix. For a suitable cardinality (> 2) of the alphabet of M, $H(\alpha)$ (whose maximum is equal to 1) can be neglected, and then one can say that a mixture leads to a lower entropy. Hence, in order to search for the subsequence $\{X_1,..,X_{N_r}, Y_1,..,Y_{N_r}\}$ necessary to build the reduced vector of Z, it is more convenient to look for it in local regions within the images I and J. In this way, we maximize the entropy of $\{M(X_1, Y_1),..,M(X_{N_r}, Y_{N_r})\}$ while minimizing $N_r$ at the same time. This theoretical result shows that the practical choice of getting SSIM information from blocks in I and J is really the most convenient one. It is no coincidence that the HVS follows the same procedure (see [10, 20]).

3.4. Overlapping blocks

It remains to see whether blocks have to overlap or not. The next proposition proves that the selection of non overlapping blocks maximizes entropy and is therefore the most convenient method for block selection.

Proposition 4. If $Z_1, Z_2,..,Z_T$ are variables associated with image blocks that overlap and such that $\bigcup_{i=1}^{T} Z_i = I$, while $\bar{Z}_1, \bar{Z}_2,..,\bar{Z}_R$ are blocks such that $\bigcap_{i=1}^{R} \bar{Z}_i = \emptyset$ but $\bigcup_{i=1}^{R} \bar{Z}_i = I$, then

$$\frac{H(Z_1, Z_2,..,Z_T)}{T} \le \frac{H(\bar{Z}_1, \bar{Z}_2,..,\bar{Z}_R)}{R}, \qquad T > R.$$
The proof is in the Appendix.

3.5. How to find one sequence belonging to $A_M$

The objective of this paper is to go beyond the theoretical existence of $A_M$. We want to find at least one subsequence in $A_M$ with the least size (i.e., the minimum $N_r$) — and with a low computational effort, if possible. Mathematically, we look for a subset of indices $\{i_1,..,i_m\}$ such that

$$N_r = \operatorname{argmin}_m |\bar{M}(X, Y) - \bar{M}(x_{i_1},..,x_{i_m}, y_{i_1},..,y_{i_m})| < \epsilon. \qquad (6)$$
The previous sections told us that information has to be extracted via non overlapping blocks, but there is no constructive way of finding such blocks. In the sequel, an entropy approach based on the behavior of the visual system in its preattentive phase is proposed. It is natural to guess that the peculiarities of natural scenes guided the evolution of the Human Visual System over time [35]. In particular, saccadic movements (generating fixation points) are mainly guided by the image content in the preattentive phase (i.e., in the first milliseconds of scene inspection) rather than by the observer's experience, needs, etc. — as happens in the successive (attentive) phase. As we are considering global distortions, this conjecture still holds. The proposed method tries to account for the aforementioned considerations through the following phases: i) a rough image segmentation into regions having different characteristics and ii) a random explorative walk over these regions. The latter step allows us to find the blocks of a sequence belonging to the typical set. Some hints can be found in [10]. In particular, since just a few fixation points are employed in the preattentive 'scene understanding', the number of significant blocks will be estimated by means of the Minimum Description Length, exploiting the entropy monotonic behavior shown in Section 3.1, as explained in the sequel.

3.5.1. Image segmentation

Image segmentation has been performed on a low-pass version of the luminance of the original image. It is performed on the original image since a global distortion uniformly affects each image region. The luminance criterion for segmentation has been considered since luminance is one of the two measures (the second one is contrast) that regulate the adaptation process in the preattentive phase. Contrast has not been considered here in order to keep the model complexity low. Finally, a low pass filter whose cutoff frequency depends on the viewing distance simulates the early vision process. Specifically, the approximation band (low-pass component) at level G ($A^G$) of the dyadic wavelet expansion of the image I has been computed [18], since its dimension is $1/2^{G+1}$ of the original image size. For segmenting $A^G$, the Successive Mean Quantization Transform (SMQT) [22] has been adopted due to its simplicity and reduced computational effort. SMQT builds a binary tree using the following rule: given a set of data $A^G$ and a real parameter L (number of levels), split $A^G$ into two subsets, $A_0^G = \{x \in A^G \,|\, A^G(x) \le \bar{A}^G\}$ and $A_1^G = \{x \in A^G \,|\, A^G(x) > \bar{A}^G\}$, where $\bar{A}^G$ is the mean value of $A^G$. $A_0^G$ and $A_1^G$ form the first level of the SMQT. The same procedure is recursively applied to $A_0^G$ and $A_1^G$ until the L-th level, which is composed of $2^L$ subsets (regions) that will be denoted by $R_1, R_2, \ldots, R_{2^L}$.
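A minimal sketch of the SMQT-based segmentation, assuming A is a numpy array holding the level-G approximation band of the luminance; function and variable names are illustrative.

```python
import numpy as np

def smqt_labels(A, L=3):
    """Label map with 2**L regions: at each of the L levels, every current
    region is split at its mean value, as in the SMQT binary tree."""
    labels = np.zeros(A.shape, dtype=int)
    for _ in range(L):
        new = np.zeros_like(labels)
        for r in np.unique(labels):
            mask = labels == r
            thr = A[mask].mean()
            # left child (2r): values <= regional mean;
            # right child (2r+1): values > regional mean
            new[mask] = 2 * r + (A[mask] > thr)
        labels = new
    return labels   # values in {0, ..., 2**L - 1}
```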
3.5.2. Explorative walk on the scene

The observation process used to find the image blocks has been modeled as a Markov chain, i.e. a random walk on a connected weighted graph whose nodes are the $2^L$ regions $R_1, R_2, \ldots, R_{2^L}$, with weights $W_{ij} \ge 0$ on the edge joining node i to node j. The graph is undirected, i.e. $W_{ij} = W_{ji}$, and $W_{ij} = 0$ if there is no edge joining node i to node j. Hence, given a point randomly extracted from the region $R_i$, the successive point in the walk is a random point in the region $R_j$, chosen among the nodes connected to $R_i$ with a probability

$$P_{ij} = \frac{W_{ij}}{\sum_{i\sim k} W_{ik}} \qquad (7)$$

that is proportional to the weight $W_{ij}$. By denoting with $n_i$ the number of pixels in the region $R_i$, the weights are defined as follows:

$$W_{ij} = \begin{cases} n_i & i = j \\ \frac{Z_{ij} + Z_{ji}}{2} & i \ne j \end{cases} \qquad (8)$$

where $Z_{ij} = n_j \frac{\sum_{i\sim k,\, k\ne i} n_k}{\sum_{k=1}^{2^L} n_k}$. $W_{ij}$ takes into account the representativeness of the region $R_j$ both in the image and as a neighbouring region of $R_i$. Even though a more refined definition of the weights could be used, this choice is simple but significant enough for our preliminary study. The initial point of the walk can be extracted by looking at the stationary distribution of the process [37]. The number of blocks can be estimated via an automatic procedure based on the minimum description length principle [15], as shown in the sequel.
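The sketch below implements eqs. (7)-(8) for the walk, assuming the label map of the previous sketch. Since the paper leaves the edge set of the graph implicit, every pair of regions is treated as adjacent here — an assumption made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_weights(labels):
    n = np.bincount(labels.ravel())          # region cardinalities n_i
    R, total = len(n), n.sum()
    W = np.zeros((R, R))
    for i in range(R):
        for j in range(R):
            if i == j:
                W[i, j] = n[i]               # eq. (8), diagonal term
            else:
                Zij = n[j] * (total - n[i]) / total   # all regions adjacent
                Zji = n[i] * (total - n[j]) / total
                W[i, j] = (Zij + Zji) / 2    # eq. (8), off-diagonal term
    return W

def next_region(i, W):
    P = W[i] / W[i].sum()                    # transition law, eq. (7)
    return rng.choice(len(P), p=P)

def sample_pixel(region, labels):
    rows, cols = np.nonzero(labels == region)
    k = rng.integers(len(rows))
    return rows[k], cols[k]
```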
3.5.3. MDL for the estimation of the number of blocks

This principle allows the selection of a good model for approximating the data with the least complexity. It is based on the concept that good compression means good approximation, in agreement with the definition of Kolmogorov complexity. Specifically, the simplest version of MDL, namely crude MDL, selects a model from a set of candidates $M^{(1)}, M^{(2)}, \ldots$ by minimizing the following cost

$$L(M^{(k)}) + L(X|M^{(k)}), \qquad (9)$$

where $L(M^{(k)})$ is the cost (in terms of bits) required for coding the model $M^{(k)}$, while $L(X|M^{(k)})$ is the number of bits required for coding the data X given the model. In general, the better the model, the higher its cost but the smaller the approximation error. That is why the selection of the best model is a trade-off between complexity and good approximation. In our case the model $M^{(k)}$ is the fixation path containing the SSIM values of k points, whose average gives an approximation of the SSIM of the whole image. The data X are the corresponding blocks in I and J, centered at the selected pixels, that are involved in the SSIM computation. The cost is measured as entropy per element. More precisely, by indicating with $M_1, M_2, \ldots, M_k$ the values of SSIM computed at the first k points selected during the random walk on the graph described above, and with $(b_1, b_2, \ldots, b_k)$ the blocks used for the evaluation of SSIM, we have $L(M^{(k)}) = \frac{H(M_1, M_2, \ldots, M_k)}{k}$ and $L(X|M^{(k)}) = \frac{H(b_1, b_2, \ldots, b_k) + 2\log_2(k) + 1}{2l^2}$, where H is the entropy, $l^2$ is the dimension of a block and $2\log_2(k) + 1$ is the cost for coding the integer k. By coding the blocks independently, $H(b_1, b_2, \ldots, b_k) = kH(b_i)$, $i = 1, 2, \ldots, k$, and by considering a compression ratio 8:1, eq. (9) can be rewritten as

$$N_r = \operatorname{argmin}_k \left[ \frac{H(M_1, M_2,..,M_k)}{k} + \frac{k + 2\log_2(k) + 1}{2l^2} \right] \qquad (10)$$

where $N_r$ gives the length of the optimal path, i.e. the length of a sequence in the visual distortion typical set.

Figure 3: Original version of the images from the LIVE database considered in this paper for tests. Clockwise: Ocean, Stream, Lighthouse, Sailing4, House, Flowersonih35.
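A sketch of the cost in eq. (10), assuming the SSIM values gathered along the walk are quantized with step delta (0.01 in Section 4.5) to build their empirical pmf:

```python
import numpy as np

def mdl_cost(ssim_values, l=17, delta=0.01):
    """Per-element MDL cost of eq. (10) for the first k SSIM values."""
    k = len(ssim_values)
    bins = np.round(np.asarray(ssim_values) / delta).astype(int)
    _, counts = np.unique(bins, return_counts=True)
    p = counts / k
    H = -np.sum(p * np.log2(p))              # entropy of M_1, ..., M_k
    return H / k + (k + 2 * np.log2(k) + 1) / (2 * l * l)
```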
Figure 4: Ocean image and its fastfaded copy (LIVE). SSIM (blue) and SNR (red) versus the number of considered blocks. The other two (smoother) curves indicate SSIM (black) and SNR (green) entropy. For a clearer presentation on just one picture, all plots have been normalized.
Figure 5: Ocean image and its Gaussian blurred version (LIVE). Entropy versus number of blocks (or an equivalent number of randomly selected and non adjacent pixels). SSIM entropy via random pixels (solid), SSIM entropy via non overlapping blocks (dashdot), SNR entropy via random pixels (dashed), SNR entropy via non overlapping blocks (dot).
3.6. The Algorithm

The complete algorithm of the proposed method is given below.

1. Compute the wavelet approximation band $A^G$ at the G-th level of the image I.
2. Apply L levels of the SMQT transform to $A^G$ and extract the regions $R_1, R_2, \ldots, R_{2^L}$.
3. Compute the cardinalities $n_1, n_2, \ldots, n_{2^L}$ of the segmented regions and evaluate the weights of the graph as in eq. (8).
Original Image | Distorted Image | Distortion kind | $H(X)$ | $H(Y|X)$ | $H(Z)$ | $H(\bar{Z})$
ocean | img57 | gaussian blur | 7.1785 | 4.8630 | 5.0798 | 5.3230
stream | img58 | gaussian blur | 7.4230 | 6.6357 | 5.9525 | 5.9997
lighthouse | img97 | gaussian blur | 7.3799 | 5.3012 | 5.1560 | 5.4239
sailing4 | img127 | gaussian blur | 6.8476 | 5.2000 | 5.0161 | 5.1369
ocean | img7 | fastfading | 7.1785 | 4.7998 | 4.6848 | 4.8755
house | img73 | fastfading | 7.1803 | 4.6494 | 4.0312 | 4.3385
stream | img100 | jpeg | 7.4230 | 6.4137 | 5.2691 | 5.5687
flowersonih35 | img27 | jpeg | 7.7161 | 5.6465 | 3.4555 | 3.6996
ocean | img118 | white noise | 7.1785 | 4.6308 | 4.2645 | 4.6762
house | img109 | white noise | 7.1803 | 6.4434 | 6.1604 | 6.1460
flowersonih35 | img72 | white noise | 7.7161 | 5.5789 | 3.6160 | 3.8698

Table 1: Entropy of the original images in Fig. 3 ($H(X)$), entropy of the corresponding distorted images given the original ones (i.e. $H(Y|X)$), entropy of SSIM with overlapping blocks ($H(Z)$) and entropy of SSIM for non overlapping blocks ($H(\bar{Z})$).
4. Extract a point from a region $R_1$ chosen according to the stationary distribution of the graph as defined in eq. (8).
5. Compute $M_1$, i.e. SSIM on a block of dimension $l \times l$ centered at the selected point, and set k = 2.
6. Extract a point in the region $R_j$ selected according to the probability $P_{ij}$ defined in eq. (7).
7. Compute $M_k$, i.e. SSIM on a block of dimension $l \times l$ centered at the selected point.
8. Evaluate the argument of eq. (10) and assign its value to the new variable $L_k$.
9. If $L_k > L_{k-1}$, set $N_r = k - 1$ and $\hat{M} = \frac{1}{N_r}\sum_{k=1}^{N_r} M_k$, and stop; otherwise set k = k + 1 and go to step 6.

Figure 6: (Left) Original Ocean image and (Right) its blurred version from TID2013.
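A sketch of steps 4-9 wired together, reusing the helpers sketched in the previous sections (ssim_map, build_weights, next_region, sample_pixel, mdl_cost); the choice of the starting region and the border clipping are assumptions made for illustration.

```python
import numpy as np

def estimate_ssim(I, J, labels, l=17):
    W = build_weights(labels)
    i = int(np.argmax(W.diagonal()))     # start in the largest region, a
                                         # stand-in for the stationary law
    h, values, costs = l // 2, [], [np.inf]
    while True:
        r, c = sample_pixel(i, labels)
        r = min(max(r, h), I.shape[0] - h - 1)   # keep the block inside I
        c = min(max(c, h), I.shape[1] - h - 1)
        b = I[r - h:r + h + 1, c - h:c + h + 1]
        d = J[r - h:r + h + 1, c - h:c + h + 1]
        values.append(ssim_map(b, d, l=l)[0])    # SSIM of one l x l block
        costs.append(mdl_cost(values, l=l))      # L_k, eq. (10)
        if costs[-1] > costs[-2]:                # step 9: first increase
            return float(np.mean(values[:-1]))   # M_hat over N_r = k-1 blocks
        i = next_region(i, W)
```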
3.7. Model's complexity

The following proposition provides a constraint on the number $N_r$ of blocks to use in the proposed method in order to have a computational gain with respect to the classical algorithm for SSIM computation in Fig. 1.

Proposition 5. Let $O_{SSIM}$ and $O_{algo}$ be the number of operations required for the computation of SSIM using, respectively, all image pixels and the $N_r$ blocks selected by the MDL procedure. Let $C_{log}$ be the cost of the calculation of the logarithm of a number, N the image size and $|\chi|$ the cardinality of the alphabet of SSIM, and G and L two integers denoting the levels of, respectively, the wavelet transform and the SMQT; then

$$O_{algo} = \left[\frac{7}{3}\left(1 - \frac{1}{2^{2G}}\right) + 2L + 1\right]N - 2^L - L + 1 + 2^{2L} + \left(8l^2 + 30 + \frac{3}{2}C_{log} + \log_2|\chi|\right)N_r + (4 + C_{log})\frac{N_r^2}{2}$$

and $O_{SSIM} = (8l^2 + 18)N$. In addition,

$$N_r < N_0 = \frac{-8l^2 - 30 - \frac{3}{2}C_{log} - \log_2|\chi| + \sqrt{\Delta}}{4 + C_{log}} \;\Rightarrow\; O_{algo} < O_{SSIM}$$

with

$$\Delta = \left(8l^2 + 30 + \frac{3}{2}C_{log} + \log_2|\chi|\right)^2 + (4 + C_{log})\left[2N\left(8l^2 + 17 - 2L - \frac{7}{3}\left(1 - \frac{1}{2^{2G}}\right)\right) + 2^{L+1} + 2L - 2 - 2^{2L+1}\right].$$

Figure 7: (Left) SSIM map of Fig. 6 and (right) its SMQT segmentation.

Proof: The proof is simply achieved by observing that $N_0$ is the positive root of the equation $O_{algo} - O_{SSIM} = 0$. The detailed computation of $O_{algo}$ and $O_{SSIM}$ is in the Appendix. It is worth observing that the constraint on $N_r$ is feasible: $N_0 = 1053$ when the default parameter values, given in Section 4.5, are used for 256 × 256 images.
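A small numerical check of Proposition 5, solving $O_{algo}(N_r) = O_{SSIM}$ for the positive root. The cost $C_{log}$ of one logarithm is machine dependent and is an assumed input here; the resulting $N_0$ varies strongly with it, and the value 1053 quoted above refers to the authors' own cost model.

```python
import numpy as np

def N0(N, l=17, G=3, L=3, chi=200, Clog=1.0):
    a = (4 + Clog) / 2                               # coefficient of Nr^2
    b = 8 * l * l + 30 + 1.5 * Clog + np.log2(chi)   # coefficient of Nr
    # O_algo(Nr = 0) - O_SSIM, with O_SSIM = (8 l^2 + 18) N
    c = (N * (7 / 3 * (1 - 4.0 ** -G) + 2 * L + 1)
         - 2 ** L - L + 1 + 2 ** (2 * L)
         - (8 * l * l + 18) * N)
    return (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)

# e.g. N0(256 * 256); the exact value depends on the assumed Clog
```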
4. Experimental Results

In order to clearly present the tests that confirm the proposed theoretical results, this section has been organized as follows. The first part is devoted to the theoretical results in Sections 3.1-3.4. The second part presents some tests regarding the topic of Section 3.5 and shows the potential of the proposed approach in speeding up the SSIM evaluation. All tests have been performed on several test images. For the sake of brevity, we will present just those concerning some images coming from the LIVE [32] (Fig. 3) and TID2013 [24] databases. The LIVE database is composed of 779 images having a different amount of (five kinds of) distortion: Fast Fading, Gaussian Blur, JPEG2K, JPEG and Additive Gaussian Noise. The TID2013 database [24] contains images affected by different kinds of distortion, among which the global ones are additive and multiplicative Gaussian noise, high frequency noise, Gaussian blurring, JPEG and JPEG 2000 compression, mean shift and contrast change. For each distortion, four intensity levels have been considered.

4.1. Some tests on reducing information

Fig. 4 shows a first qualitative, but very informative, result about the theoretical findings in Section 3.1 on the Ocean image and its fastfaded version (from the LIVE database). Specifically, it shows both the quality assessment
measure versus the number of selected blocks and the entropy of the same quality assessment measure versus the number of selected blocks. Both SSIM and SNR have been considered, using (non overlapping) 16 × 16 blocks on each image. The blocks have been randomly selected, in order to show that the results are not tied to a specific block selection procedure. Notice that the curves in Fig. 4 have been normalized (divided by the corresponding maximum value) in order to better appreciate their trend on the same plot. In addition, a uniform quantization with bins of width Δ equal to $10^{-4}$ has been used for the SSIM vector, which is composed of the SSIM values computed on each selected block — a very small bin size that reduces the quantization distortion ($\propto \log(\Delta)$ [37]). Looking at Fig. 4, the different behavior of the curve of the Full Reference measure (i.e., SSIM and SNR) and the curve of the corresponding entropy can be seen. In particular, the first part of the SSIM and SNR curves clearly oscillates before approaching values close to the true one — where their trend becomes nearly horizontal. This confirms the result in Prop. 1. On the contrary, the corresponding entropies have a smoother increasing trend with a critical curvature, after which the values approach the entropy of the whole available sample (i.e., the true one). However, it can also be noted that both the critical ascending trend of the entropies and the oscillating region of SSIM and SNR stop at nearly the same point. It is worth highlighting that the behavior in Fig. 4 is common to all considered images and any distortion kind. Interestingly, the behavior of those curves does not change if the involved parameters change. This preliminary result shows two important things. The first is that it is really possible to drastically reduce the available information sent by the two sources I and J in order to achieve a nearly exact value of the involved quality measure (quantitative results are presented below). The second is that the search for the reduced information may be performed by exploiting the smooth profile of the entropy.

4.2. Some tests about the best reduction domain

Section 3.2 has proved that it is more convenient to work in the measure (i.e. SSIM, SNR, etc.) domain rather than directly in the image one. In fact, any measure naturally reduces entropy [37]. Hence, any further information reduction is more convenient on the already reduced information — without introducing any distortion beforehand. It can easily be verified, as follows, that the measure domain contains less information than the original source domain. For each image in Fig. 3, the corresponding entropy H(X) has been computed in Table 1 (first column).
distortion kind | dist. level | $\bar{M}$ | $\hat{M}$ | $\epsilon$ (%) | $\sigma_\epsilon$ | $\bar{N}_r$ | $\sigma_{N_r}$ | nopp
high frequency noise | 2 | 0.8661 | 0.8672 | 1.1222 | 0.0113 | 48.00 | 2.92 | 10.00
high frequency noise | 3 | 0.7034 | 0.7086 | 2.6788 | 0.0237 | 48.87 | 4.54 | 10.01
high frequency noise | 4 | 0.4829 | 0.4842 | 4.9297 | 0.0275 | 48.70 | 3.51 | 10.01
high frequency noise | 5 | 0.2722 | 0.2627 | 8.9767 | 0.0284 | 47.80 | 4.43 | 10.00
Gaussian noise | 2 | 0.8673 | 0.8728 | 1.4447 | 0.0132 | 45.67 | 3.87 | 9.95
Gaussian noise | 3 | 0.7781 | 0.7821 | 2.1462 | 0.0217 | 48.50 | 4.55 | 10.00
Gaussian noise | 4 | 0.6614 | 0.6607 | 3.4739 | 0.0308 | 48.87 | 3.50 | 10.01
Gaussian noise | 5 | 0.5276 | 0.5272 | 4.3112 | 0.0260 | 49.30 | 3.37 | 10.02
Gaussian blur | 2 | 0.9513 | 0.9542 | 0.6528 | 0.0069 | 37.37 | 6.15 | 9.82
Gaussian blur | 3 | 0.8805 | 0.8828 | 1.4811 | 0.0159 | 44.47 | 5.10 | 9.93
Gaussian blur | 4 | 0.7925 | 0.8045 | 2.9532 | 0.0260 | 46.00 | 6.59 | 9.96
Gaussian blur | 5 | 0.7012 | 0.7128 | 4.1474 | 0.0346 | 48.40 | 4.67 | 10.01
JPEG compression | 2 | 0.9451 | 0.9448 | 0.5260 | 0.0061 | 41.87 | 5.17 | 9.90
JPEG compression | 3 | 0.8891 | 0.8895 | 0.8761 | 0.0097 | 47.33 | 3.94 | 9.99
JPEG compression | 4 | 0.7578 | 0.7518 | 2.1654 | 0.0198 | 48.33 | 3.56 | 10.00
JPEG compression | 5 | 0.6320 | 0.6257 | 4.1073 | 0.0311 | 49.37 | 4.10 | 10.02
JPEG2K compression | 2 | 0.8516 | 0.8553 | 1.9288 | 0.0183 | 46.73 | 3.59 | 9.98
JPEG2K compression | 3 | 0.6942 | 0.6939 | 4.0191 | 0.0326 | 49.43 | 4.35 | 10.02
JPEG2K compression | 4 | 0.5394 | 0.5529 | 6.4984 | 0.0395 | 47.73 | 3.55 | 9.99
JPEG2K compression | 5 | 0.4799 | 0.4827 | 7.6406 | 0.0426 | 49.53 | 2.83 | 10.02
Mean shift | 2 | 0.9951 | 0.9950 | 0.1259 | 0.0015 | 21.00 | 5.52 | 9.58
Mean shift | 3 | 0.9778 | 0.9779 | 0.2198 | 0.0028 | 34.37 | 6.13 | 9.78
Mean shift | 4 | 0.9620 | 0.9644 | 0.5506 | 0.0059 | 31.27 | 7.36 | 9.73
Mean shift | 5 | 0.8929 | 0.8930 | 1.0167 | 0.0111 | 45.53 | 4.57 | 9.96
Contrast change | 2 | 0.9829 | 0.9832 | 0.3674 | 0.0042 | 29.23 | 5.70 | 9.70
Contrast change | 3 | 0.9713 | 0.9711 | 0.1596 | 0.0019 | 33.53 | 5.51 | 9.76
Contrast change | 4 | 0.9349 | 0.9392 | 0.8867 | 0.0086 | 40.20 | 5.70 | 9.87
Contrast change | 5 | 0.8726 | 0.8749 | 0.7490 | 0.0081 | 44.60 | 3.28 | 9.94
Multipl. gaussian noise | 2 | 0.8594 | 0.8697 | 2.3503 | 0.0220 | 45.80 | 5.25 | 9.96
Multipl. gaussian noise | 3 | 0.7730 | 0.7851 | 3.3095 | 0.0286 | 47.40 | 4.33 | 9.99
Multipl. gaussian noise | 4 | 0.6615 | 0.6627 | 4.0346 | 0.0327 | 48.60 | 3.75 | 10.01
Multipl. gaussian noise | 5 | 0.5376 | 0.5346 | 5.0814 | 0.0322 | 50.77 | 3.18 | 10.05

Table 2: Ocean image; I16 in the TID2013 database with different distortion kinds and distortion levels. SSIM ($\bar{M}$), estimated SSIM ($\hat{M}$) using the proposed method, mean value of the estimation error (%) over 30 runs ($\epsilon$), standard deviation of the estimation error ($\sigma_\epsilon$), mean value of the number of blocks used ($\bar{N}_r$), standard deviation of the number of blocks ($\sigma_{N_r}$), number of operations per pixel (nopp) required by the proposed algorithm.
In this case, the bin width (necessary to build the empirical p.d.f.) has been set equal to 1. This choice has been motivated by setting the precision to that of the image. For any distorted image J, the
conditional entropy H(Y|X) has also been computed (second column). It has been obtained by considering the distortion as an additive term, i.e. H(X − Y). Although this may seem a rough approximation, since the distortion process may be much more complicated, it is quantitatively significant, as it is proportional to the residual between the original source I and the distorted one J. Again, the bin width for H(Y|X) has been set equal to 1, while tests have been performed with 32 × 32 blocks. Finally, the third column of Table 1 contains the entropy H(Z) relative to the SSIM vector (i.e., the vector M in Fig. 1), obtained by quantizing M with a bin width of 0.01, which accounts for the fact that the ratio between the range of SSIM and that of the images (original and distorted) is $1/255 \approx 0.004$. It can then easily be observed in Table 1 that H(Z) < H(X) + H(Y|X). Hence, SSIM naturally reduces the entropy of the original images [37]. In particular, the SSIM entropy can be considerably less than the joint entropy of the original and distorted images, according to both image content and distortion kind. As a result, any preprocessing on X and Y can alter the data, making the estimation of H(Z) completely unfair. That is why the selection of corresponding blocks in X and Y must obey constraints and rules directly tied to SSIM. This result holds for both overlapping and non overlapping blocks. Moreover, this behavior does not change for a different parameter setting.

4.3. Some tests about the locality of the selected information

Fig. 5 is one of the examples that prove the results in Section 3.3. In other words, it is more convenient to use blocks rather than pixels randomly spread over the image for building Full Reference quality measures. In particular, blocks convey a greater amount of information. Fig. 5 shows that the SSIM entropy curve built via non overlapping and randomly selected blocks is always above the SSIM entropy curve built via randomly selected pixels — pixels not adjacent in the image. The same result can be observed for the SNR curves, even though in this case the effect is much less visible: the curves are very close to each other. Quantization bins have been set equal to $10^{-4}$ in all tests, but different settings confirm the same trend.

4.4. Some tests on the overlapping blocks choice

Section 3.4 states that it is more convenient to select non overlapping blocks from I and J. A simple practical proof can be performed by taking all possible (non overlapping) blocks from the images shown in Fig. 3 and computing SSIM on them. Considering again a bin width of 0.01 (for the same reasons as
distortion kind | distortion level | $\epsilon$ (%) | $\sigma_\epsilon$ | $\bar{N}_r$ | $\sigma_{N_r}$ | nopp
Gaussian noise | 2 | 1.11 | 0.012 | 42.82 | 5.17 | 9.91
Gaussian noise | 3 | 1.76 | 0.017 | 44.75 | 5.16 | 9.94
Gaussian noise | 4 | 2.77 | 0.024 | 46.70 | 4.35 | 9.97
Gaussian noise | 5 | 3.67 | 0.029 | 48.17 | 3.96 | 10.00
High frequency noise | 2 | 1.08 | 0.011 | 43.18 | 5.14 | 9.92
High frequency noise | 3 | 2.57 | 0.023 | 46.58 | 4.54 | 9.97
High frequency noise | 4 | 4.16 | 0.030 | 48.60 | 3.61 | 10.01
High frequency noise | 5 | 7.06 | 0.033 | 48.92 | 3.61 | 10.01
Gaussian blur | 2 | 0.43 | 0.005 | 39.38 | 5.05 | 9.86
Gaussian blur | 3 | 1.02 | 0.010 | 44.68 | 4.64 | 9.94
Gaussian blur | 4 | 2.26 | 0.020 | 47.36 | 4.42 | 9.99
Gaussian blur | 5 | 3.89 | 0.028 | 48.94 | 3.67 | 10.01
Jpeg | 2 | 0.45 | 0.005 | 41.10 | 4.57 | 9.88
Jpeg | 3 | 0.76 | 0.008 | 45.10 | 4.48 | 9.95
Jpeg | 4 | 1.57 | 0.015 | 47.64 | 3.98 | 9.99
Jpeg | 5 | 3.16 | 0.025 | 48.84 | 3.85 | 10.01
Jpeg2K | 2 | 0.89 | 0.009 | 43.74 | 4.74 | 9.93
Jpeg2K | 3 | 1.79 | 0.017 | 46.99 | 4.02 | 9.98
Jpeg2K | 4 | 2.96 | 0.023 | 47.99 | 3.64 | 10.00
Jpeg2K | 5 | 4.88 | 0.032 | 49.25 | 3.27 | 10.02
Mean shift | 2 | 0.14 | 0.002 | 23.48 | 5.38 | 9.61
Mean shift | 3 | 0.76 | 0.008 | 34.83 | 6.32 | 9.78
Mean shift | 4 | 0.61 | 0.007 | 36.20 | 6.28 | 9.81
Mean shift | 5 | 2.24 | 0.021 | 44.76 | 4.49 | 9.94
Contrast change | 2 | 0.51 | 0.006 | 27.50 | 6.72 | 9.67
Contrast change | 3 | 0.27 | 0.003 | 29.82 | 7.23 | 9.71
Contrast change | 4 | 1.23 | 0.013 | 38.58 | 6.14 | 9.84
Contrast change | 5 | 0.85 | 0.009 | 42.68 | 4.78 | 9.91
Multiplicative gaussian noise | 2 | 1.71 | 0.017 | 39.77 | 6.44 | 9.86
Multiplicative gaussian noise | 3 | 2.28 | 0.023 | 43.91 | 4.97 | 9.93
Multiplicative gaussian noise | 4 | 3.03 | 0.027 | 45.37 | 4.58 | 9.95
Multiplicative gaussian noise | 5 | 4.44 | 0.034 | 47.75 | 4.00 | 9.99

Table 3: TID2013 database. Mean value of the SSIM estimation error (%) over 30 runs ($\epsilon$), standard deviation of the estimation error ($\sigma_\epsilon$), mean value of the number of blocks used ($\bar{N}_r$), standard deviation of the number of blocks ($\sigma_{N_r}$), required number of operations per pixel (nopp). The latter must be compared with 2330, which is the nopp required by the SSIM computation using all pixels in 512 × 384 images. Different distortion kinds and distortion intensities have been considered.
in Section 4.2) for this new vector M and 32 × 32 blocks, the corresponding entropy $H(\bar{Z})$ has then been computed (fourth column of Table 1). Table 1 shows that non overlapping blocks lead to a higher entropy and hence convey a greater amount of information. In order to minimize the number of selected blocks that contain most of the information, the selection of non overlapping blocks is then the most effective strategy to adopt.

4.5. SSIM speed up

The proposed method has been tested on several images affected by different distortion kinds with different intensities from the TID2013 database. The step by step procedure is presented for the 512 × 384 Ocean image (image I16 in TID2013) and its blurred version (see Fig. 6); results for different images do not significantly differ from those for the Ocean image. In all tests, the following parameters have been used. The level G of the wavelet transform (a Daubechies wavelet with 2 vanishing moments) has been set equal to 3; the level L of the SMQT has been fixed to 3 in order to have 8 regions; the block size (l × l) for SSIM computation has been set to 17 × 17, since it corresponds to a visual angle equal to 0.56 degrees [20] — however, smaller dimensions provide similar results; the cardinality of the alphabet for SSIM has been set equal to 200, which corresponds to a quantization step equal to 0.01. Figure 7 shows the segmentation used for the Ocean image. As can be observed, the segmentation is quite faithful to its SSIM map except for the edges. This is due to the fact that the criterion used for the segmentation is based just on the luminance, and a region based segmentation has been employed. It turns out that the optimal point selected by the MDL principle (see Fig. 8) on the entropy curve corresponds to a good value of SSIM, providing acceptable estimation errors. It is worth pointing out that Fig. 8 clearly shows that MDL is able to exploit the property in Section 3.1, i.e., the entropy behavior is monotonic and significantly more regular than the SSIM one. Finally, Fig. 9 shows the blocks belonging to the selected fixation path. As can be observed, more blocks are selected in the regions where the blurring is more visible. With regard to quantitative results, note that each run of the proposed algorithm provides a different sequence in the visual distortion typical set of the image under study. That is why the average value of the SSIM estimates obtained over 30 runs of the algorithm is given in Table 2.
Figure 8: (Left) SSIM value of Fig. 6 estimated for an increasing number of blocks and (Right) entropy per sample used in the MDL based procedure — the optimal point has been marked.
Figure 9: Selected blocks on Fig. 6.
The table includes the standard deviation of the estimate, the average number of blocks used for computing it and the corresponding standard deviation, as well as the average number of required operations per pixel (nopp). The latter is compared with the nopp required by the computation of SSIM using all image pixels. As can be observed, the estimation error increases as the distortion level increases, but it rarely reaches 8%. In addition, the standard deviation is very small ($10^{-2}$–$10^{-3}$). For some distortion kinds, like Gaussian noise and Gaussian blur, this percentage is less than 5%; for distortions like mean shift and contrast change it does not exceed 1.2%. The same considerations are valid for a larger class of images, as shown in Table 3, which reports the average results achieved for images in the TID2013 database. The average number of selected blocks is always less than 50. It turns out that the number of operations required for the computation of the SSIM of the Ocean image is reduced by a factor of about 200. It is also worth stressing that the proposed procedure does not involve an exhaustive search for points of interest, as required by the contrast-based procedure in [25].
5. Conclusions and Future Research

This paper has shown that it is possible to estimate a quality assessment measure from just a subset of the image information. The relative estimation error obviously depends on the scene content and on the kind of distortion, but it has been shown to be usually small — very often under 5%. The proposed entropy based formalism has also proved that there exist some criteria for an optimal estimation of QA measures. It is worth pointing out that the proposed formalism may have various further advantages, both theoretical and practical. In fact, such an approach may also allow: i) improving the design of existing FR measures, ii) designing novel and possibly more precise ones, iii) building novel 'HVS based functionals' according to a novel concept of 2D and 3D function regularity, iv) adding some novel elements to Visual Information Theory [7], with possible effects on the definition of new visual image coding schemes, etc. Moreover, since the proposed approach is close to two well investigated topics, namely pooling and fixation points, future research will be oriented to deepening the common aspects involving them. Specifically, on the one hand it will be investigated whether a more effective binary pooling can be designed while keeping, at the same time, the computational effort low. Following the approaches in [23, 46], the proposed binary pooling function could be designed by weighting the information in $A_M$, accounting for the scene content. Similarly, the fixation points search strategy may also be embedded in the proposed framework for making the block selection pseudo-random and more adaptive to both scene and distortion content. Finally, instead of taking an a priori fixed number of blocks, an optimal number of them in terms of the Minimum Description Length [15] of the available information may be defined.

Appendices

Proof of Proposition 3

It is well-known [37] that $H(X) = H(\alpha) + \alpha H(X_1) + (1-\alpha)H(X_2)$, and the same holds for H(Y). On the other hand,

$$Z = \begin{cases} M(X_1, Y_1) & \text{with prob. } \alpha \\ M(X_2, Y_2) & \text{with prob. } 1-\alpha \end{cases}$$

where $M(X_1, Y_1) \cap M(X_2, Y_2) = \emptyset$ and $p_Z = \alpha p_{M(X_1,Y_1)} + (1-\alpha)p_{M(X_2,Y_2)}$. In order to show that $H(p_Z) \le H(\alpha) + \alpha H(p_{M(X_1,Y_1)}) + (1-\alpha)H(p_{M(X_2,Y_2)})$, let us suppose by contradiction that

$$H(p_Z) > H(\alpha) + \alpha H(p_{M(X_1,Y_1)}) + (1-\alpha)H(p_{M(X_2,Y_2)}). \qquad (.1)$$

By recalling the definition of the Jensen-Shannon divergence $D_{JS}^\alpha$, i.e. $D_{JS}^\alpha(p_{M(X_1,Y_1)}||p_{M(X_2,Y_2)}) = H(p_Z) - \alpha H(p_{M(X_1,Y_1)}) - (1-\alpha)H(p_{M(X_2,Y_2)})$, and by combining the previous equations, we have $D_{JS}^\alpha(p_{M(X_1,Y_1)}||p_{M(X_2,Y_2)}) > H(\alpha)$, which is absurd since $0 \le D_{JS}^\alpha(p_{M(X_1,Y_1)}||p_{M(X_2,Y_2)}) \le H(\alpha)$. Hence, eq. (.1) is absurd and, since $\alpha < 1$, eq. (5) follows. •
Proof of Proposition 4

Let $\hat{Z}_j = \{Z_1, Z_2,..,Z_{N_j}\}$ denote a collection of $N_j$ variables $Z_i$ selected in $\{Z_1, Z_2,..,Z_T\}$ such that $\bigcap_{i=1}^{N_j} Z_i = \emptyset$, and let K be the number of $N_j$-ples such that $\bigcup_{j=1}^{K} \hat{Z}_j = \{Z_1, Z_2,..,Z_T\}$. Since for generic variables $S_1,..,S_n$ it holds $H(S_1, S_2,..,S_n) \le \sum_{i=1}^{n} H(S_i)$ [37] and $KR \le T$, then

$$\frac{H(Z_1, Z_2,..,Z_T)}{T} \le \frac{1}{T}\sum_{j=1}^{K} H(\hat{Z}_j) \le \frac{1}{T}\sum_{j=1}^{K} H(\bar{Z}_1, \bar{Z}_2,..,\bar{Z}_R) = \frac{K H(\bar{Z}_1, \bar{Z}_2,..,\bar{Z}_R)}{T} \le \frac{K H(\bar{Z}_1, \bar{Z}_2,..,\bar{Z}_R)}{KR} = \frac{H(\bar{Z}_1, \bar{Z}_2,..,\bar{Z}_R)}{R}. \; \bullet$$
Proof of Proposition 5

Let m, d, s and c denote, respectively, a multiplication, a division, an algebraic sum and a comparison. The number of operations required by SSIM is

$$O_{SSIM} = N \cdot O_{SSIM_p} = (8l^2 + 18)N, \qquad (.2)$$

where $O_{SSIM_p}$ is the number of operations required for the SSIM computation at a given pixel. In fact, $SSIM_p = S_1 S_2 = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \cdot \frac{2\sigma_{xy} + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$, so $O_{SSIM_p} = O_{S_1} + O_{S_2} + 1m$, where $O_{S_1} = 2m + 3s + 1d + 2O_\mu$ and $O_{S_2} = 1m + 1d + 3s + 2O_{\sigma^2} + O_{\sigma_{xy}}$, with $O_\mu = (l^2-1)s + 1d$, $O_{\sigma^2} = (l^2+1)m + l^2 s + 1d$ and $O_{\sigma_{xy}} = (l^2+1)m + l^2 s + 1d$. As a result, $O_{S_1} = (2l^2+1)s + 2m + 3d$, $O_{S_2} = (3l^2+4)m + (3l^2+3)s + 4d$, and then

$$O_{SSIM_p} = 8l^2 + 18. \qquad (.3)$$

The number of operations required by the Algorithm in Section 3.5 is the sum of the numbers of operations required by:

1. the G-th low pass (Haar) wavelet subband ($O_{approxwavelet}$);
2. L levels of SMQT ($O_{SMQT}$);
3. the regions' cardinalities and the weights of the graph ($O_{graph}$);
4. the sequential SSIM computation on an increasing number of blocks up to $N_r$ ($O_{SSIMblocks}$);
5. the entropy of the SSIM vectors ($O_{SSIMentropy}$);
6. the MDL for the selection of the best $N_r$ ($O_{MDL}$),

where:

1. the computation of the low pass wavelet subband using the Haar wavelet requires $(4m + 3s)N/2^{2j}$ at each level j, so

$$O_{approxwavelet} = 7N \sum_{j=1}^{G} \frac{1}{2^{2j}} = \frac{7}{3}N\left(1 - \frac{1}{2^{2G}}\right); \qquad (.4)$$

2. each level l of the SMQT requires $(N-1)s + 2^{l-1}d + Nc$, so

$$O_{SMQT} = L(N-1)s + LNc + \sum_{l=1}^{L} 2^{l-1}d = 2LN + 2^L - L - 1; \qquad (.5)$$

3. the cardinality of the selected regions requires $Nc$, while the computation of the weights requires $1m + 1d + (2^L(2^L - 2))s$, so

$$O_{graph} = Nc + 1m + 1d + (2^L(2^L - 2))s = N + 2 + 2^L(2^L - 2); \qquad (.6)$$

4. using eq. (.3), we have

$$O_{SSIMblocks} = N_r(8l^2 + 18); \qquad (.7)$$

5. if $N_r \le |\chi|$, at the k-th step the computation of the pdf (as in Proposition 1) requires $\log_2(|\chi|)c + km + kd + 1s$, while the entropy requires $km + (k-1)s + kC_{log} + 1d$. Hence, the complete entropy computation at the k-th step requires $O_{k\text{-}entr} = (2k)m + ks + (k+1)d + kC_{log} + \log_2(|\chi|)c$. As a result, for $N_r$ steps

$$O_{SSIMentropy} = N_r(\log_2|\chi| + 1) + (4 + C_{log})\sum_{k=1}^{N_r} k = \left(\log_2|\chi| + 3 + \frac{C_{log}}{2}\right)N_r + (4 + C_{log})\frac{N_r^2}{2}; \qquad (.8)$$

6. at the k-th step of the MDL procedure it is necessary to compute $L(k) + L(H|k)$. This computation requires $3s + C_{log} + 4m + 1d$, while the minimum value requires $1c$. Hence,

$$O_{MDL} = (9 + C_{log})N_r. \qquad (.9)$$

As a result,

$$O_{algo} = \left[\frac{7}{3}\left(1 - \frac{1}{2^{2G}}\right) + 2L + 1\right]N - 2^L - L + 1 + 2^{2L} + (4 + C_{log})\frac{N_r^2}{2} + \left(8l^2 + \log_2|\chi| + 30 + \frac{3}{2}C_{log}\right)N_r. \qquad (.10)$$
Hence, using eqs. (.2) and (.10), the thesis follows by solving the inequality $O_{algo} < O_{SSIM}$ with respect to $N_r$. •

Acknowledgements

The authors would like to thank the anonymous reviewer for the valuable comments and suggestions that contributed to improving the paper.

References

[1] A. Beghdadi, B. Pesquet-Popescu, A new image distortion measure based on wavelet decomposition, 7th Int. Symp. on Signal Processing and Its Applications, Paris, France, 2003.
[2] S. Benabdelkader, M. Boulemden, Recursive Algorithm based on Fuzzy 2-Partition Entropy for 2-Level Image Thresholding, Pattern Rec., 38:1289-1294, 2005.
[3] B. P. Bondzulic, V. S. Petrovic, Additive Models and Separable Pooling, a New Look at Structural Similarity, Signal Processing, 97:110-116, 2014.
[4] V. Bruni, D. Vitulano, G. Ramponi, Image quality assessment through a subset of the image data, Proc. of ISPA 2011.
[5] V. Bruni, E. Rossi, D. Vitulano, On the equivalence between Jensen-Shannon divergence and Michelson contrast, IEEE Trans. on Information Theory, 58(7):4278-4288, 2012.
[6] V. Bruni, E. Rossi, D. Vitulano, Jensen-Shannon divergence for visual quality assessment, Signal Image and Video Processing, Springer, 7(3):411-421, 2013.
[7] V. Bruni, D. Vitulano, Z. Wang, Special issue on human vision and information theory, Signal Image and Video Processing, Springer, 7(3), 2013.
[8] V. Bruni, D. Vitulano, Evaluation of degraded images using adaptive Jensen-Shannon divergence, Proc. of the 8th Int. Symp. ISPA 2013, Trieste, Italy, 2013.
[9] R. Ferzli, L. J. Karam, A no-reference objective image sharpness metric based on the notion of just noticeable blur (JNB), IEEE Trans. on Image Processing, 18(4):717-728, 2009.
[10] R. A. Frazor, W. S. Geisler, Local luminance and contrast in natural images, Vision Research, 46:1585-1598, 2013.
[11] S. Gabarda, G. Cristóbal, Blind image quality assessment through anisotropy, J. Opt. Soc. Amer., 24(12):42-51, 2007.
[12] F. Gao, J. Yu, Biologically Inspired Image Quality Assessment, Signal Processing, article in press.
[13] D. Gayle, H. Mahlab, Y. Ucar, A. M. Eskicioglu, A Full-Reference Color Image Quality Measure in the DWT Domain, Proc. of EUSIPCO 2005.
[14] R. C. Gonzalez, R. E. Woods, Digital Image Processing, Prentice Hall, 2nd Edition, 2002.
[15] P. D. Grunwald, A Tutorial Introduction to the Minimum Description Length Principle, in Advances in Minimum Description Length: Theory and Applications, edited by Grunwald, Myung, Pitt, 2004.
[16] URL: http://live.ece.utexas.edu/research/quality/live_video.html
[17] Q. Li, Z. Wang, General-Purpose Reduced-Reference Image Quality Assessment based on Perceptually and Statistically Motivated Image Representation, Proc. of IEEE ICIP, San Diego, CA, 2008.
[18] S. Mallat, A wavelet tour of signal processing, Academic Press, 1998.
[19] A. Mittal, A. K. Moorthy, A. C. Bovik, No-Reference Image Quality Assessment in the Spatial Domain, IEEE Trans. on Image Processing, 21(12):4695-4708, 2012.
[20] V. Mante, R. A. Frazor, V. Bonin, W. S. Geisler, M. Carandini, Independence of luminance and contrast in natural scenes and in the early visual system, Nature Neuroscience, 8(12), 2005.
[21] A. K. Moorthy, A. C. Bovik, Blind image quality assessment: From natural scene statistics to perceptual quality, IEEE Trans. on Image Processing, 20(12):3350-3364, 2011.
[22] M. Nilsson, M. Dahl, I. Claesson, The successive mean quantization transform, Proc. of ICASSP 2005, 2005.
[23] J. Park, K. Seshadrinathan, S. Lee, A. C. Bovik, Spatio-Temporal Quality Pooling Accounting for Transient Severe Impairments and Egomotion, Proc. of ICIP 2011.
[24] N. Ponomarenko, L. Jin, O. Ieremeiev, V. Lukin, K. Egiazarian, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, C.-C. Jay Kuo, Image Database TID2013, Signal Processing: Image Communication, 30, 2015.
[25] R. Raj, W. S. Geisler, R. A. Frazor, A. C. Bovik, Contrast statistics for foveated visual systems: fixation selection by minimizing contrast entropy, J. Opt. Soc. Am. A, 20(10), 2005.
[26] M. Rivera, O. Ocegueda, J. L. Marroquin, Entropy-Controlled Quadratic Markov Measure Field Models for Efficient Image Segmentation, IEEE Trans. on Image Processing, 16(12):3047-3057, 2007.
[27] M. Saad, A. C. Bovik, C. Charrier, Blind image quality assessment: A natural scene statistics approach in the DCT domain, IEEE Trans. on Image Processing, 21(8):3339-3352, 2012.
[28] A. Saha, Q. M. J. Wu, Perceptual Image Quality Assessment using Phase Deviation Sensitive Energy Features, Signal Processing, 93:3182-3191, 2013.
[29] K. Seshadrinathan, R. Soundararajan, A. C. Bovik, L. K. Cormack, Study of Subjective and Objective Quality Assessment of Video, IEEE Trans. on Image Processing, 19(6):1427-1441, 2010.
[30] K. Seshadrinathan, R. Soundararajan, A. C. Bovik, L. K. Cormack, A Subjective Study to Evaluate Video Quality Assessment Algorithms, SPIE Proceedings Human Vision and Electronic Imaging, Jan. 2010.
[31] H. R. Sheikh, A. C. Bovik, G. De Veciana, An Information Fidelity Criterion for Image Quality Assessment using Natural Scene Statistics, IEEE Trans. on Image Processing, 14(12), 2005.
[32] H. R. Sheikh, Z. Wang, L. Cormack, A. C. Bovik, LIVE Image Quality Assessment Database Release 2. [Online]. Available: http://live.ece.utexas.edu/research/quality
[33] H. R. Sheikh, A. C. Bovik, Image Information and Visual Quality, IEEE Trans. on Image Processing, 15(2), 2006.
[34] J. Silvestre-Blanes, Structural Similarity Image Quality Reliability: Determining Parameters and Window Size, Signal Processing, 91:1012-1020, 2011.
[35] E. P. Simoncelli, B. A. Olshausen, Natural Image Statistics and Neural Representation, Annu. Rev. Neurosci., 24:1193-1216, 2001.
[36] S. Suthaharan, No-reference visually significant blocking artifact metric for natural scene images, Signal Processing, 89(8):1647-1652, 2009.
[37] T. M. Cover, J. A. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.
[38] D. Van der Weken, M. Nachtegael, E. E. Kerre, A new similarity measure for image processing, Journal of Computational Methods in Sciences and Engineering, 3(2):209-222, 2003.
[39] Y. Yang, J. Ming, Image Based Assessment based on the Space Similarity Decomposition Model, Signal Processing, 120:797-805, 2016.
[40] Z. Wang, A. C. Bovik, Modern Image Quality Assessment, Morgan & Claypool Publishers, 2006.
[41] Z. Wang, A. C. Bovik, A Universal Image Quality Index, IEEE Signal Processing Letters, 9(3):81-84, 2002.
[42] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image Quality Assessment: From Error Visibility to Structural Similarity, IEEE Trans. on Image Processing, 13:600-612, 2004.
[43] Z. Wang, L. Lu, A. C. Bovik, Video Quality Assessment based on Structural Distortion Measurement, Signal Processing: Image Communication, 19(2):121-132, 2004.
[44] Z. Wang, E. P. Simoncelli, Reduced-Reference Image Quality Assessment using a Wavelet-Domain Natural Image Statistic Model, Proc. of SPIE Human Vision and Electronic Imaging X, vol. 5666, 2005.
[45] W. Wang, Y. Wang, Q. Huang, W. Gao, Measuring Visual Saliency by Site Entropy Rate, Proc. of CVPR, 2010.
[46] Z. Wang, Q. Li, Information Content Weighting for Perceptual Image Quality Assessment, IEEE Trans. on Image Processing, 20(5):1185-1198, 2011.
[47] S. Winkler, Digital Video Quality, Vision Models and Metrics, Wiley, 2005.
[48] D. Zhang, E. Jernigan, An Information Theoretic Criterion for Image Quality Assessment based on Natural Scene Statistics, Proc. of IEEE ICIP 2006, Atlanta, GA, USA, 2006.
[49] J. Zhang, T. M. Le, S. H. Ong, T. Q. Nguyen, No-reference Image Quality Assessment using Structural Activity, Signal Processing, 91:2575-2588, 2011.