APPENDIX B

Selected Theoretical Results

B.1 INFORMATION-THEORETIC ANALYSIS OF SECURE WATERMARKING (MOULIN AND O'SULLIVAN)

In [201], O'Sullivan et al. suggest that watermarking can be viewed as a game played between an information hider and an adversary. The information hider embeds watermarks in content, and the adversary attempts to remove them. The resulting system can be studied using a combination of information theory and game theory. The study of watermarking as a game was extended by Moulin and O'Sullivan in [301, 305]. In this section of the appendix, we briefly summarize two main results of their analysis and provide some insight as to what these results mean. Those interested in the actual proofs are referred to [301].

B.1.1 Watermarking as a Game

The formal description of the watermarking game studied by Moulin and O'Sullivan is based on formal definitions of a distortion function, a watermarking code, and an attack channel. These are defined as follows:

• A distortion function is a real-valued function, D(c_1, c_2), where c_1 is a Work and c_2 is a distorted version of c_1. This function is meant to be monotonic with the perceptual difference between the two Works. For their most general results, Moulin and O'Sullivan assume only that D(c_1, c_2) is a classical extension of a distortion function applied to the individual elements of c_1 and c_2; that is,

    D(c_1, c_2) = (1/N) Σ_{i=1}^{N} d(c_1[i], c_2[i]).    (B.1)

  For example, if d(a, b) = (a − b)², then D(c_1, c_2) is the mean squared error between c_1 and c_2.


  Of course, realistic perceptual distance models, as discussed in Chapter 8, seldom fit into the pattern of Equation B.1. In particular, masking effects mean that the perceptual difference between one pair of terms can be affected by the values of other terms, and therefore the distance function might not be separable into successive applications of a one-dimensional function, d(). However, many models can be implemented by first applying some nonlinear transform to the two Works, and then computing an L_p norm between them (see, for example, [33, 444]). Masking effects are embodied in the nonlinear transform, and thus the perceptual distance can be computed simply as

    L_p(C_1, C_2) = ( Σ_{i=1}^{N} |C_2[i] − C_1[i]|^p )^{1/p},    (B.2)

  where C_1 and C_2 are the transformed versions of c_1 and c_2, and p is a constant. If we let d(a, b) = |b − a|^p, then D(C_1, C_2) = (1/N) Σ_i |C_2[i] − C_1[i]|^p is monotonic with L_p(C_1, C_2). Thus, assuming that true perceptual distance can be captured by an L_p norm applied to a nonlinear transform of two Works, we can assume a meaningful distortion function exists in the form of Equation B.1 (see the sketch following these definitions).

• A length-N information-hiding code subject to distortion D_1 comprises three parts:

  — A set of messages, M.
  — A watermark embedding function, E_K(c_o, m), where c_o is an unwatermarked Work of dimensionality N, m ∈ M is a message, and K is a watermarking key.
  — A watermark detection function, D_K(c), where c is a (possibly watermarked) Work and K is a key.

  The watermark embedding function is constrained to yield an expected distortion less than or equal to D_1. That is,

    (1/|M|) Σ_{c_o, K, m} P_{c_o,K}(c_o, K) D(c_o, E_K(c_o, m)) ≤ D_1,    (B.3)

  where the summation is performed over all possible combinations of cover Work c_o, key K, and message m. The expression P_{c_o,K}(c_o, K) is the probability of c_o being drawn from the distribution of unwatermarked content and K being used as a key for that Work. This is expressed as a joint probability distribution to handle the case where the key is dependent on the cover Work.

• An attack channel subject to distortion D_2 is a conditional probability function, Q_2(c_n | c) = P_{c_n|c}(c_n), which gives the probability of obtaining the Work, c_n, after applying a specific attack to c. The probability function is constrained to yield an expected distortion less than or equal to D_2 when applied to watermarked content. That is,

    Σ_{c_n, c_w} P_{c_w}(c_w) Q_2(c_n | c_w) D(c_w, c_n) ≤ D_2.    (B.4)

  Here, P_{c_w}(c_w) is the probability of obtaining c_w by embedding a random message in a randomly selected cover Work; that is, the embedding distribution (see Chapter 3). If the distortion function is symmetric, so that D(c_1, c_2) = D(c_2, c_1), then we should assume that D_2 ≥ D_1, because the adversary should always be satisfied with the original unwatermarked Work as the result of an attack (so D_2 must be at least equal to D_1). In general, D_2 will be greater than D_1, because the adversary will be satisfied with lower fidelity than the information hider is.

In an information-hiding game subject to distortions (D_1, D_2), the information hider designs an information-hiding code subject to distortion D_1, and the adversary designs an attack channel subject to distortion D_2. The information hider is trying to maximize the information communicated across the attack channel, and the adversary is trying to minimize it. A given rate, R, is achievable for distortions (D_1, D_2) if it is possible to design information-hiding codes with rates of at least R, such that the probability of error after the worst-case attack diminishes to 0 as N increases toward infinity (see Section A.1.3 of Appendix A). The data-hiding capacity, C(D_1, D_2), is the supremum of all rates achievable for distortions (D_1, D_2).
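The distortion functions above are straightforward to realize in code. The following sketch is ours, not Moulin and O'Sullivan's; it computes the L_p distance of Equation B.2, with the transform argument as a hypothetical stand-in for a masking-aware perceptual model such as those of [33, 444].

    import numpy as np

    def lp_distance(c1, c2, transform=lambda c: c, p=2.0):
        """Perceptual distance in the form of Equation B.2.

        transform maps a Work into perceptual coordinates, embodying
        masking effects; the identity default reduces this to a plain
        L_p norm on the raw Works."""
        C1 = transform(np.asarray(c1, dtype=float))
        C2 = transform(np.asarray(c2, dtype=float))
        return (np.abs(C2 - C1) ** p).sum() ** (1.0 / p)

    # The separable distortion of Equation B.1, with d(a, b) = |b - a|**p,
    # is (1/N) * lp_distance(c1, c2, transform, p)**p, which is monotonic
    # with the L_p distance above.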

B.1.2 General Capacity of Watermarking

Moulin and O'Sullivan's main result is a general expression for data-hiding capacity, C(D_1, D_2). As in Gel'fand and Pinsker's expression for the capacity of communications with side information, this expression involves the use of an auxiliary variable, u. The watermark-embedding algorithm can be divided into two steps. First, a value of u is selected based on the desired message, m, the cover Work, c_o, and a key, K. Assuming messages are uniformly distributed, this gives rise to a conditional distribution of u based on the distribution of unwatermarked content and keys, P_{u|c_o,K}(u). Second, the value of u is combined with the unwatermarked Work, c_o, with reference to the key, K, to obtain the watermarked Work, c_w. This gives rise to a distribution of watermarked Works from a given cover Work and key:

  Q_1(c_w, u | c_o, K) = P_{c_w|c_o,u,K}(c_w) P_{u|c_o,K}(u).    (B.5)


In general, it is possible to design an embedding algorithm that leads to any desired distribution for Q_1(), provided that distribution satisfies the embedding distortion constraint

  Σ_{c_w, c_o, u, K} D(c_w, c_o) Q_1(c_w, u | c_o, K) P_{c_o,K}(c_o, K) ≤ D_1.    (B.6)

The capacity of a data-hiding game subject to distortions (D_1, D_2) is given by

  C(D_1, D_2) = max_{Q_1} min_{Q_2} [ I(u; c_n | K) − I(u; c_w | K) ],    (B.7)

where u, c_n, and c_w are corresponding elements of u, c_n, and c_w. The maximum is taken over all embedding distributions that satisfy the D_1 distortion constraint, and the minimum is taken over all attack channels that satisfy the D_2 distortion constraint. I(u; c|K) is the mutual information between u and c, computed with their probability distributions conditioned on K.

Equation B.7 can be understood intuitively by recognizing that the value inside the max-min, I(u; c_n|K) − I(u; c_w|K), is essentially the same as the value maximized in Gel'fand and Pinsker's expression. Thus, if we were to limit the possible attack distributions, Q_2, to one specific distribution, we would obtain the capacity of communications with side information at the transmitter. Of course, the adversary is not limited to only one attack and will choose the one that minimizes the amount of information that gets through. Therefore, the amount of information that can be transmitted with a given Q_1 distribution is the minimum over all possible attacks. The best Q_1 (from the point of view of the data hider) is the one that maximizes this minimum. Thus, the data-hiding capacity is maximized over possible Q_1s and minimized over possible Q_2s.

The value of Equation B.7 depends on D_1, D_2, and the distortion function, D(). It may or may not also depend on the distribution of unwatermarked content. One can argue that for a given type of content there is a single distortion function that best reflects human perception, and a single distribution of unwatermarked Works. Thus, in principle, the data-hiding capacity is determined by D_1, D_2, and the type of cover content. However, for images, video, and audio, the best distortion function and true distribution are as yet unknown, and therefore Equation B.7 can only be used to obtain estimates of capacity under some simplifying assumptions.

B.1.3 Capacity with MSE Fidelity Constraint

Moulin and O'Sullivan explore the hiding capacity that results under two assumptions:

• D() is the mean square error (MSE); that is,

    D(c_1, c_2) = (1/N) Σ_{i=1}^{N} (c_2[i] − c_1[i])²,    (B.8)

  and

• the cover Work, c_o, is drawn from an i.i.d. Gaussian distribution with variance σ_{c_o}².

Of course, these assumptions are known to be unrealistic for the types of media with which we have been concerned. MSE correlates poorly with perceptual distance (see Chapter 8), and the distribution of unwatermarked content is correlated and non-Gaussian (see Chapter 3). However, the results obtained under these assumptions may indicate some qualitative behaviors of capacity under a more realistic distortion function. These qualitative behaviors are as follows:

• The value of σ_{c_o}² affects capacity only by affecting the severity of the attacks the adversary may perform.

• If D_1, D_2 ≪ σ_{c_o}², the value of σ_{c_o}² has little effect on capacity and can be ignored.

• Furthermore, if D_1, D_2 ≪ σ_{c_o}², the assumption of Gaussian-distributed content can be relaxed. That is, non-Gaussian distributions should behave in essentially the same way.

Under the assumptions previously listed, Moulin and O'Sullivan show that

  C(D_1, D_2) = 0                               if D_2 ≥ σ_{c_o}² + D_1,
  C(D_1, D_2) = (1/2) log(1 + D_1 / (βD_2))     otherwise,    (B.9)

where

  β = ( 1 − D_2 / (σ_{c_o}² + D_1) )^{−1}.    (B.10)
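Equations B.9 and B.10 are simple to evaluate numerically. The following sketch is ours; it assumes capacity is measured in bits (log base 2).

    import math

    def capacity_mse(D1, D2, var_co):
        """Data-hiding capacity under an MSE fidelity constraint
        (Equations B.9 and B.10), in bits per sample."""
        if D2 >= var_co + D1:
            return 0.0  # the adversary can zero out the Work (see Equation B.11)
        beta = 1.0 / (1.0 - D2 / (var_co + D1))
        return 0.5 * math.log2(1.0 + D1 / (beta * D2))

    # When D1 and D2 are small relative to var_co, beta is close to 1 and
    # the capacity approaches 0.5 * log2(1 + D1/D2).
    print(capacity_mse(D1=10.0, D2=20.0, var_co=1000.0))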

Intuitively, Equation B.9 results from three assertions.

First, if we look at the vector added by an optimal embedding algorithm, w_a = c_w − c_o (where c_w is the watermarked Work and c_o is the original), we will find that it has expected magnitude |w_a| ≈ √(N D_1), and expected correlation with the original c_o · w_a ≈ 0. That it should have expected magnitude √(N D_1) is ensured because the expected MSE between c_o and c_w is constrained to D_1. That it should be orthogonal to c_o is intuitive, because, in high dimension, most vectors are orthogonal to a given vector. A coding system that led to nonorthogonal added vectors would severely limit its use of the space, and thereby limit its capacity. Because the expected magnitude of the original Work is |c_o| ≈ √N σ_{c_o}, we can conclude that the expected magnitude of the watermarked Work will be |c_w| ≈ √(N(σ_{c_o}² + D_1)).

Second, if D_2 ≥ σ_{c_o}² + D_1, the adversary can simply zero out the watermarked Work to remove the watermark. That is, the attacked Work, c_n, is just c_n[i] = 0 for all i, regardless of the watermarked Work. The MSE will be

  D(c_w, c_n) = (1/N)|c_w|² = σ_{c_o}² + D_1 ≤ D_2.    (B.11)

Obviously, after such an attack, no information is retained, and the capacity is 0.

Finally, if D_2 < σ_{c_o}² + D_1, the optimal attack will be the Gaussian test channel. A geometric interpretation of this attack is shown in Figure B.1. The first step of the attack is to add Gaussian white noise with power βD_2. The noise vector added is likely to be orthogonal to the watermarked Work, as shown in the figure. The second step is to scale the resulting vector by β^{−1}. With the β given in Equation B.10, and assuming that |c_w| = √(N(σ_{c_o}² + D_1)), this yields a point on the edge of the region of acceptable fidelity defined by D_2 (shown as a circle in the figure). The resulting channel is equivalent to an AWGN dirty-paper channel with the first noise source having power σ_{c_o}² and the second noise source having power βD_2. Thus, regardless of whether the detector is blind or informed, the capacity is (1/2) log(1 + D_1/(βD_2)).

FIGURE B.1 Geometric interpretation of the Gaussian test channel. The black dot represents a watermarked Work. The two steps of the attack (step 1: add noise; step 2: scale) are shown with solid arrows. The ⊗ represents the result of the attack. Note that the resulting Work lies on the edge of the region of acceptable fidelity, shown with a circle. Note also that this attack maximizes the angle between the attacked Work and the original Work, thereby minimizing the signal-to-noise ratio.

Note that, unlike the case of Costa's capacity for dirty-paper channels, the distribution of unwatermarked content affects capacity in a data-hiding game, in that β depends on σ_{c_o}². This leads to the first qualitative point with which we began this discussion of capacity under an MSE distortion function. The reason that σ_{c_o}² affects capacity is that it affects the severity of attack the adversary may perform. In the case of an MSE distortion function, a smaller value of σ_{c_o}² allows the adversary to perform a more severe attack, adding more noise in the first step and scaling down by a larger amount in the second step to stay within the region of acceptable fidelity. In the case of more realistic distortion functions that account for masking effects, it may be that larger values of σ_{c_o}² allow for more severe attacks, because they imply more noise in each Work and thus greater ability to hide distortions. In either case, the qualitative point that σ_{c_o}² affects capacity still holds.

The second qualitative point, that σ_{c_o}² has little effect on capacity if D_1, D_2 ≪ σ_{c_o}², follows from Equation B.10. As D_2/σ_{c_o}² tends toward 0, β tends toward 1, and the capacity tends toward (1/2) log(1 + D_1/D_2). Because both watermarks and attacks are meant to be imperceptible, it is generally safe to assume that D_1 and D_2 are small relative to σ_{c_o}², so that σ_{c_o}² can be ignored. Moulin and O'Sullivan go on to prove the third qualitative point: that the shape of the distribution of unwatermarked content can also be ignored when D_1, D_2 ≪ σ_{c_o}².
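The two-step attack can be checked by simulation. The following sketch is ours; it verifies that adding noise of power βD_2 and then scaling by β^{−1} yields an expected MSE of D_2, as required by the attack channel's distortion constraint.

    import numpy as np

    rng = np.random.default_rng(1)
    N = 100_000
    var_co, D1, D2 = 1000.0, 10.0, 20.0

    # Model the watermarked Work as white with power var_co + D1.
    cw = rng.normal(0.0, np.sqrt(var_co + D1), N)
    beta = 1.0 / (1.0 - D2 / (var_co + D1))

    # Gaussian test channel: add white noise of power beta*D2, then scale by 1/beta.
    cn = (cw + rng.normal(0.0, np.sqrt(beta * D2), N)) / beta

    print(np.mean((cn - cw) ** 2))  # approximately D2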

B.2 ERROR PROBABILITIES USING NORMALIZED CORRELATION DETECTORS (MILLER AND BLOOM)

We here describe a precise method of estimating the false positive probability when using a normalized correlation detector. We also extend this to estimate the effectiveness of blind embedding when using normalized correlation at the detector. We refer to this method, described in [289], as the spherical method. It gives an exact value for the false positive probability, under the assumption that the random vectors are drawn from a radially symmetric distribution.

We derive the spherical method for the case of a random watermark, w_r, and a constant Work, c. We assume that the distribution of w_r is radially symmetric. That is, we assume that the probability of obtaining a given random vector, w_r, depends only on the length of w_r, and that it is independent of the direction of w_r. A white Gaussian distribution satisfies this assumption.

The derivation of the spherical method begins by observing that if we normalize each randomly drawn vector to unit length before computing normalized correlation, we will not change the probability of detection. Thus, we have

  w̃_r = w_r / |w_r|,    (B.12)

  z_nc(c, w̃_r) = (c · w̃_r) / (|c||w̃_r|) = (c · w_r) / (|c||w_r|) = z_nc(c, w_r),    (B.13)

where c is the constant vector against which the random vectors are being compared.

The distribution of w̃_r is limited to the surface of the unit N-sphere. That is, we never obtain a vector w̃_r that is not on this surface. The probability of obtaining a given w̃_r is clearly independent of its direction, because the distribution of w_r is assumed to be radially symmetric. This means that w̃_r is drawn from a distribution uniformly distributed over the surface of the unit N-sphere.

The probability that a randomly chosen w̃_r will lie in the detection region for c can be found by simply finding the fraction of the surface of the unit N-sphere that lies within the detection region. Because the detection region is an N-dimensional cone, we need to determine the fraction of the surface of the sphere that is intersected by the cone. The intersection of an N-cone and an N-sphere is an N-spherical cap, and we need to find the ratio between the (N − 1)-content¹ of a given N-spherical cap and the (N − 1)-surface content of the N-sphere. This point is illustrated for N = 2 in Figure B.2 and for N = 3 in Figure B.3.

¹ Content is a generalization of the terms length, area, and volume. The 1-content of a one-dimensional region (i.e., a line) is the length of that region. The 2-content of a two-dimensional region is the area of that region. The 3-content of a three-dimensional region is the volume of that region. In higher dimensions, content has the analogous meanings.

Let cap(N, θ) be the (N − 1)-content of the N-spherical cap obtained by intersecting the unit N-sphere with an N-cone that subtends angle θ (in radians). We then have

  P_fp = cap(N, τ_θ) / (2 cap(N, π/2)).    (B.14)

Note that 2 cap(N, π/2) is the complete (N − 1)-surface content of the unit N-sphere. The cap() function, found in [379], is

  cap(N, θ) = S_{N−1} I_{N−2}(θ),    (B.15)

FIGURE B.2 Computing the false positive probability when using normalized correlation in a two-dimensional marking space. (The length of the arc cut from the unit circle by the detection region, divided by the perimeter of the unit circle, equals the probability of false detection.)

where θ is half the angle subtended by the N-cone, and where, for any d,

  S_d = d π^{d/2} / (d/2)!    (B.16)

and

  I_d(θ) = ∫_0^θ sin^d(u) du.    (B.17)

Thus, if a random N-dimensional vector is drawn from a radially symmetric distribution, the probability that its normalized correlation with some constant vector will be over a given threshold, τ_nc, is given exactly by

  P_fp = I_{N−2}(cos^{−1}(τ_nc)) / (2 I_{N−2}(π/2))
       = ∫_0^{cos^{−1}(τ_nc)} sin^{N−2}(u) du / ( 2 ∫_0^{π/2} sin^{N−2}(u) du ).    (B.18)

Evaluating Equation B.18 requires calculation of the integral I_d(θ), with d = N − 2. This integral has a closed-form solution for all integer values of d.


FIGURE B.3 Computing the false positive probability when using normalized correlation in a three-dimensional marking space. (The area of the spherical cap cut by the detection region, divided by the surface area of the unit sphere, equals the probability of false detection.)

Table B.1 provides the solutions for d = 0 through 5. When d > 5, the solution can be found by the recursive formula shown on the last row of the table.

The spherical method also can be used to estimate the effectiveness of a blind embedding system for which detection is performed using normalized correlation. To illustrate this, we consider the case of a fixed Work and random watermarks. Figure B.4 illustrates the process geometrically. We are given a fixed vector, v_o, which is extracted from an unwatermarked Work. To test for effectiveness, we generate a series of random watermark vectors from a radially symmetric distribution, one of which is depicted in Figure B.4, aligned with the horizontal axis. The angle, φ, between these two vectors is therefore a random value. Given the Work vector, v_o, and a random watermark vector, w_r, the blind embedder adds αw_r to the Work vector to produce a watermarked vector, v_w.


Table B.1 Closed-form solutions for I_d(θ).

  d      I_d(θ)
  0      θ
  1      1 − cos(θ)
  2      (θ − sin(θ)cos(θ)) / 2
  3      (cos³(θ) − 3cos(θ) + 2) / 3
  4      (3θ − (3sin(θ) + 2sin³(θ))cos(θ)) / 8
  5      (4cos³(θ) − (3sin⁴(θ) + 12)cos(θ) + 8) / 15
  >5     ((d − 1)/d) I_{d−2}(θ) − cos(θ)sin^{d−1}(θ)/d
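The recursion in the last row of Table B.1, together with Equation B.18, gives a direct way to compute false positive probabilities. The following sketch is ours; the function names are not from [289].

    import math

    def I(d, theta):
        """I_d(theta) = integral of sin^d(u) du from 0 to theta,
        computed iteratively from the recursion in Table B.1."""
        val = theta if d % 2 == 0 else 1.0 - math.cos(theta)
        start = 2 if d % 2 == 0 else 3
        for k in range(start, d + 1, 2):
            val = ((k - 1) / k) * val - math.cos(theta) * math.sin(theta) ** (k - 1) / k
        return val

    def false_positive_probability(N, tau_nc):
        """Equation B.18: probability that a random vector from a radially
        symmetric distribution has normalized correlation above tau_nc."""
        return I(N - 2, math.acos(tau_nc)) / (2.0 * I(N - 2, math.pi / 2))

    # For example, N = 1000 and tau_nc = 0.2 give a probability on the
    # order of 10**-10.
    print(false_positive_probability(1000, 0.2))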

The Work is said to be watermarked if this vector falls inside the detection region defined by θ. Clearly, when v_w lies on the surface of the detection region,

  φ = τ_φ = θ + φ_o = θ + sin^{−1}( (α / |v_o|) sin θ ),    (B.19)

where τ_φ is the critical angle. Thus, provided the angle subtended by a Work vector and a random watermark vector (i.e., φ) is less than or equal to τ_φ, watermark embedding will be successful.

If the watermarks are drawn from a radially symmetric distribution, as we have assumed, then the probability that φ ≤ τ_φ (i.e., the probability of successful embedding, or probability of a true positive) can be calculated in the same manner as the false positive probability using the spherical method. The effectiveness (i.e., true positive probability), P_tp, is the fraction of the surface of the N-sphere that lies within the cone subtending the critical angle τ_φ, given by

  P_tp = ∫_0^{τ_φ} sin^{N−2}(u) du / ( 2 ∫_0^{π/2} sin^{N−2}(u) du ).    (B.20)

If the probability of a false negative, P_fn, is required rather than the effectiveness, it is simply P_fn = 1 − P_tp.
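Equations B.19 and B.20 translate directly into code. The following sketch is ours and reuses the I() function from the previous sketch.

    import math

    def effectiveness(N, theta, alpha, v_norm):
        """True positive probability of blind embedding (Equations B.19
        and B.20). theta is the detection cone half-angle, alpha the
        embedding strength, and v_norm = |v_o| the magnitude of the
        extracted Work vector."""
        tau_phi = theta + math.asin((alpha / v_norm) * math.sin(theta))  # (B.19)
        return I(N - 2, tau_phi) / (2.0 * I(N - 2, math.pi / 2))         # (B.20)

    # The false negative probability is then 1 - effectiveness(...).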


FIGURE B.4 Calculation of effectiveness of blind embedding for normalized correlation detection. (The figure shows the Work vector v_o, a random watermark vector w_r along the horizontal axis, the watermarked vector v_w, the embedding strength α, the detection-cone half-angle θ, and the angles φ and φ_o.)

B.3 EFFECT OF QUANTIZATION NOISE ON WATERMARKS (EGGERS AND GIROD)

Quantization noise is often considered to be independent of the signal. When the quantization step size is small, this approximation is valid. However, as the quantization step size increases, the assumption breaks down. In the following analysis, we model a blind watermark detector as equivalent to nonsubtractive dither quantization. This analysis results in a model for the correlation between the quantization noise and the watermark.

Assume the embedder simply adds a reference mark, so c_w = c_o + w_r, and let c_wq denote the output from the watermarking and quantization process. The quantization error, n, is defined as

  n = c_wq − c_w = c_wq − c_o − w_r.    (B.21)

The output from the correlator, z_lc(c_wq, w_r), is given by

  z_lc(c_wq, w_r) = c_wq · w_r = (c_o + w_r + n) · w_r = c_o · w_r + w_r · w_r + n · w_r.    (B.22)

Assuming that c_o · w_r = 0, this gives us

  z_lc(c_wq, w_r) = Nσ_{w_r}² + n · w_r,    (B.23)

where N is the dimensionality of marking space and σ_{w_r} is the standard deviation of w_r. Ideally, the quantization error, n, would be independent of the watermark, and Equation B.23 would simply equal Nσ_{w_r}². However, if the quantization error is negatively correlated with the watermark, the correlator output will be reduced.

The independence assumption is often reasonable for small quantization factors. However, for large quantization factors, the quantization error does indeed become negatively correlated with the watermark. In fact, when the quantization factor is large enough, the quantization output will always be zero, and the error, given by Equation B.21, will be n = −c_o − w_r. Thus, n · w_r will equal −Nσ_{w_r}², and the correlator output will be zero.

Our objective, then, is to find the expected correlation between the quantization noise and the reference mark. We consider both the Work and the watermark as being chosen at random. The quantization factor for each term, on the other hand, is prespecified. The correlation between the error vector, n, and the reference mark, w_r, is the sum of random values, n[1]w_r[1] + n[2]w_r[2] + ··· + n[N]w_r[N]. The expected value of these random values' sum is just the sum of their respective expected values; that is,

  E(n · w_r) = Σ_i E(n[i] w_r[i]).    (B.24)

Thus, we can meet our objective by finding the expected value of the product of two random scalar values, E(nw), where w is drawn from some distribution with a density function of P_w(w) and n is computed as

  n = c_wq − c − w,    (B.25)

where c, a scalar value for the term of the cover Work that corresponds with w, is drawn from a distribution with density function P_c(c), and where c_wq is the corresponding term of the watermarked and quantized Work, given by

  c_wq = q round((w + c)/q),    (B.26)

where q, the quantization step size, is given.
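The behavior described above is easy to observe empirically. The following simulation is ours, not part of Eggers and Girod's derivation; the parameter values are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 100_000
    sigma_w, sigma_c = 1.0, 4.0

    w = rng.normal(0.0, sigma_w, N)                 # Gaussian reference mark
    c = rng.laplace(0.0, sigma_c / np.sqrt(2), N)   # Laplacian content, std sigma_c

    for q in [0.5, 4.0, 32.0]:
        cwq = q * np.round((w + c) / q)   # quantize the watermarked Work (B.26)
        n = cwq - c - w                   # quantization error (B.25)
        # Empirical E(nw): near 0 for small q, near -sigma_w**2 for large q.
        print(q, np.mean(n * w))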


B.3.1 Background

Before we continue, we briefly review some facts about statistics [320]. Given a random variable, x, the characteristic function, C_x(u), of its probability density function, P_x(x), is defined as

  C_x(u) = ∫_{−∞}^{∞} P_x(x) e^{jux} dx.    (B.27)

Thus, C_x(−2πu) is the Fourier transform of the probability density function, P_x(x). The moment-generating function, M_x(s), of P_x(x) is defined as

  M_x(s) = ∫_{−∞}^{∞} P_x(x) e^{sx} dx.    (B.28)

Thus,

  M_x(ju) = C_x(u).    (B.29)

The importance of the moment-generating function derives from the property that the moments of a random variable can be determined from the derivatives of its moment-generating function:

  E(x^k) = j^{−k} (d^k/du^k) M_x(ju) |_{u=0}.    (B.30)

B.3.2 Basic Approach

The expected value of nw can be found by determining the moment-generating function for n, given P_w(w), P_c(c), and q. We begin by finding the probability density function for n while assuming a constant value for w; that is, P_{n|w=w̃}(n), where w̃ is some constant. From this, we find the moment-generating function M_{n|w=w̃}(s), which gives us the expected value of n for w = w̃. The derivation of M_{n|w=w̃}(s) is due to Schuchman [363]. Finally, E(nw) is found by integrating P_w(w̃) w̃ E(n|w = w̃) over all possible values of w̃.

B.3.3 Finding the Probability Density Function

To find P_{n|w=w̃}(n), start by assuming that the watermarked and quantized value, c_wq, is equal to some given integer, b, times q. Because c_wq = q round((w + c)/q), the assumption that c_wq = bq holds whenever

  −q/2 ≤ w̃ + c − bq < q/2.    (B.31)

The quantization error, n, is

  n = bq − w̃ − c.    (B.32)

From Inequality B.31 and Equation B.32, it is clear that n must lie between −q/2 and q/2, regardless of the value of b. Because w = w̃ is a constant, c is the only random value in Equation B.32, and the probability that a given value of n arises is just the probability that c = bq − w̃ − n. Thus,

  P_{n|c_wq=bq, w=w̃}(n) = P_c(bq − w̃ − n),   −q/2 < n ≤ q/2.    (B.33)

The total probability of a given error value, n, is then the sum of the probabilities of n occurring for each possible value of b; that is,

  P_{n|w=w̃}(n) = Σ_{b=−∞}^{∞} P_{n|c_wq=bq, w=w̃}(n)
               = Σ_{b=−∞}^{∞} P_c(bq − w̃ − n).    (B.34)

B.3.4 Finding the Moment-Generating Function

We will now derive the moment-generating function, M_{n|w=w̃}(u). The moment-generating function is defined as

  M_{n|w=w̃}(u) = ∫_{−∞}^{∞} P_{n|w=w̃}(n) e^{un} dn.    (B.35)

Substituting Equation B.34 into Equation B.35, and noting that P_{n|w=w̃}(n) = 0 when |n| > q/2, we obtain

  M_{n|w=w̃}(u) = ∫_{−q/2}^{q/2} Σ_{b=−∞}^{∞} P_c(bq − w̃ − n) e^{un} dn.    (B.36)

Now we convert this into an expression equivalent to a Fourier transform, so that we can take advantage of the Fourier convolution theorem. To perform the conversion, we substitute −2πju′ for u, and multiply by the rect() function, which allows the integral to go from −∞ to ∞. This results in

  M_{n|w=w̃}(−2πju′) = ∫_{−∞}^{∞} rect(n/q) Σ_{b=−∞}^{∞} P_c(bq − w̃ − n) e^{−2πju′n} dn
                     = F_n{ rect(n/q) Σ_{b=−∞}^{∞} P_c(bq − w̃ − n) },    (B.37)

where

  rect(n/q) = 1 if |n| < q/2, and 0 otherwise,    (B.38)


and F_x{f(x)} denotes the Fourier transform of f(x). Now, applying the Fourier convolution theorem [48], and noting that F_n{rect(n/q)} = q sinc(qu′), we have

  M_{n|w=w̃}(−2πju′) = q sinc(qu′) ∗ F_n{ Σ_{b=−∞}^{∞} P_c(bq − w̃ − n) }
                     = q sinc(qu′) ∗ Σ_{b=−∞}^{∞} F_n{ P_c(bq − w̃ − n) }.    (B.39)

Let's now look at the Fourier transform of P_c(bq − w̃ − n):

  F_n{ P_c(bq − w̃ − n) } = ∫_{−∞}^{∞} P_c(bq − w̃ − n) e^{−2πju′n} dn.    (B.40)

This can be converted into a function of the moment-generating function for c (which we have if we are given the probability density function for c). We introduce the substitution v = bq − w̃ − n, so that n = bq − w̃ − v and dn = −dv. Equation B.40 becomes

  F_n{ P_c(bq − w̃ − n) } = −∫_{∞}^{−∞} P_c(v) e^{−2πju′(bq − w̃ − v)} dv
                          = e^{−2πju′(bq − w̃)} ∫_{−∞}^{∞} P_c(v) e^{2πju′v} dv.    (B.41)

The remaining integral in Equation B.41 is the moment-generating function, M_c(j2πu′), of c. Referring to Equation B.39, we now examine the summation of the Fourier transforms. This can be rewritten as

  Σ_{b=−∞}^{∞} F_n{ P_c(bq − w̃ − n) } = Σ_{b=−∞}^{∞} e^{−2πju′(bq − w̃)} M_c(j2πu′)
                                       = e^{j2πu′w̃} M_c(j2πu′) Σ_{b=−∞}^{∞} e^{−j2πu′bq}.    (B.42)

Using the substitution

  Σ_{b=−∞}^{∞} e^{−j2πu′bq} = (1/q) Σ_{b=−∞}^{∞} δ(u′ − b/q),    (B.43)

Equation B.42 becomes

  Σ_{b=−∞}^{∞} F_n{ P_c(bq − w̃ − n) } = e^{j2πu′w̃} M_c(j2πu′) (1/q) Σ_{b=−∞}^{∞} δ(u′ − b/q).    (B.44)

Finally, we substitute Equation B.44 into B.39, replace u′ with −u/2π, and simplify, obtaining

  M_{n|w=w̃}(ju) = Σ_{b=−∞}^{∞} sinc( (q/2π)(u + 2πb/q) ) M_c(j2πb/q) e^{−j2π(b/q)w̃}.    (B.45)


B.3.5 Determining the Expected Correlation for a Gaussian Watermark and Laplacian Content

Having derived a model for the moment-generating function of P_{n|w=w̃}(n), we are now in a position to analyze the quantization error in more detail. Ideally, the error, n, should be independent of the watermark, w. Equation B.45 shows that in general this will not be true. For dithered quantization, a careful choice of the dither signal can ensure this independence. However, for watermarking we do not have control over the content, and must therefore assume that the quantization noise is not independent of the watermark signal.

At this point, we can outline the derivation of Eggers and Girod. Applying Equation B.30 to Equation B.45 yields

  E(n|w = w̃) = −Σ_{b=−∞, b≠0}^{∞} ((−1)^b / (j2πb/q)) M_c(j2πb/q) e^{−j2π(b/q)w̃}.    (B.46)

We would like to determine E(nw). By definition, we have

  E(nw) = ∫_{−∞}^{∞} P_w(w̃) w̃ E(n|w = w̃) dw̃.    (B.47)

Substituting Equation B.46 into B.47, we have

  E(nw) = −Σ_{b=−∞, b≠0}^{∞} ((−1)^b / (j2πb/q)) M_c(j2πb/q) ∫_{−∞}^{∞} w̃ P_w(w̃) e^{−j2π(b/q)w̃} dw̃,    (B.48)

which can be rewritten as

  E(nw) = σ_w² Σ_{b=1}^{∞} ((−1)^b / (πbσ_w/q)) M_c(j2πb/q) Im{ (1/σ_w) M_w^{(1)}(j2πb/q) },    (B.49)

where M_x^{(k)}(u) is defined as

  M_x^{(k)}(u) = ∫_{−∞}^{∞} x^k P_x(x) e^{ux} dx,    (B.50)

and σ_w is the standard deviation of the watermark. To apply Equation B.49, we need to know the probability density functions for the watermark and the content. The probability distribution for the watermark is up to the designer of the watermarking system. For convenience, we will assume it is Gaussian. As discussed in Chapter 3, the probability distribution for the content is difficult to model accurately. In Chapter 7, when deriving the whitening filter of System 11, we assumed an elliptical Gaussian distribution as a model of image pixel values. In that case, the Gaussian assumption led to a reasonable result. However, Eggers and Girod investigated the use of an elliptical Gaussian model


for the distribution of coefficients in an image's block DCT and found that the resulting predictions of E(nw) did not match experimental results. Instead, they suggested using a Laplacian distribution, as recommended in [41, 345], or a generalized Gaussian, as recommended in [41, 305]. The generalized Gaussian yielded better results, but it was not possible to compute E(nw) analytically. We therefore conclude this section by providing the equation for E(nw) when c is drawn from a Laplacian distribution.

We are assuming that the content has a Laplacian distribution with zero mean and standard deviation σ_c,

  P_c(c) = (1 / (√2 σ_c)) e^{−√2 |c| / σ_c},    (B.51)

and that the watermark has a Gaussian distribution with zero mean and standard deviation σ_w, rendering

  P_w(w) = (1 / (σ_w √(2π))) e^{−w²/(2σ_w²)}.    (B.52)

Under these assumptions, the expected value, E(nw), is given by

  E(nw) = σ_w² Σ_{b=−∞, b≠0}^{∞} (−1)^b ( 1 / (1 + 2(πbσ_c/q)²) ) e^{−2(πbσ_w/q)²}.    (B.53)
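Equation B.53 converges quickly and is simple to evaluate. The following sketch is ours; it truncates the series and exploits the fact that the b and −b terms are equal.

    import numpy as np

    def expected_nw(q, sigma_w, sigma_c, terms=1000):
        """Expected correlation E(nw) between quantization noise and a
        Gaussian watermark for Laplacian content (Equation B.53)."""
        b = np.arange(1, terms + 1, dtype=float)
        series = ((-1.0) ** b
                  / (1.0 + 2.0 * (np.pi * b * sigma_c / q) ** 2)
                  * np.exp(-2.0 * (np.pi * b * sigma_w / q) ** 2))
        return sigma_w ** 2 * 2.0 * series.sum()

    # Small step sizes leave the watermark nearly untouched; large step
    # sizes drive E(nw) toward -sigma_w**2, canceling the correlator output.
    for q in [0.5, 4.0, 32.0]:
        print(q, expected_nw(q, sigma_w=1.0, sigma_c=4.0))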

(B.53)