ARTICLE IN PRESS
Signal Processing 88 (2008) 539–557 www.elsevier.com/locate/sigpro
An interpolation-based watermarking scheme
V. Martin, M. Chabert, B. Lacaze
ENSEEIHT/IRIT, National Polytechnic Institute of Toulouse, 2 Rue Camichel, BP 7122, 31071 Toulouse Cedex 7, France
Received 5 April 2007; received in revised form 2 August 2007; accepted 28 August 2007
Available online 4 September 2007
Abstract
Interpolation techniques are often designed to provide a good perceptual quality from known sample values. However, interpolation is essentially considered as a source of decoding errors for watermarking schemes. Conversely, this paper proposes an informed watermarking scheme based on interpolation. This scheme takes advantage of interpolation to generate imperceptible marks in the spatial domain. It can be related to random binning schemes with particular codebook and decoding rule. Theoretical performances are derived and informed embedding strategies are proposed. Two particular implementations based on bilinear and spline interpolation are then applied to image watermarking. The good robustness of these schemes to noise and valumetric attacks is confirmed by simulations. Finally, an attack is specifically designed to check the algorithm security.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Digital watermarking; Interpolation; Informed embedding
1. Introduction
1.1. Digital watermarking
Digital watermarking consists of embedding data at the content-level of digital media under constraints on imperceptibility, security and robustness to attacks. Its applications range from digital rights management to integrity protection. This paper considers scenarios where document-dependent watermarks can be embedded. This includes the copyright protection application. The host document is not used at the detection.

Corresponding author. IRIT-ENSEEIHT, 2 rue Charles Camichel, B.P. 7122, 31000 Toulouse, France. Tel.: +33 561588072; fax: +33 561588306. E-mail addresses:
[email protected] (V. Martin),
[email protected] (B. Lacaze).
In direct sequence (DS) spread spectrum watermarking [1], the additive mark is the message modulated by a pseudo-noise. The message can be decoded by correlation with this pseudo-noise. Classical spread spectrum methods are subject to host interference. Extensions provide improved performance thanks to Wiener prefiltering before decoding (DS + W) or optimal decoding for a given host statistical model [2]. Informed watermarking provides better performance when the host signal is known to the embedder [3]. In informed coding, a watermark template is directly generated from the host document. It can be combined with informed embedding, which uses knowledge of both the host and the decoding technique. Specific strategies are designed to improve imperceptibility, robustness or detection receiver performance. For instance, linear improved spread spectrum (LISS) [4] is a modulation technique derived from DS. It removes
0165-1684/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.sigpro.2007.08.016
Nomenclature
$x$                 original document
$y$                 watermarked document
$z$                 received document
$m$                 message
$d$                 dither vector or security parameters
$\hat{m}$           message estimation
$u$                 any input vector
$\tilde{u}$         result of an interpolation of $u$
$\epsilon(x)$       interpolation error on $x$
$N$                 document length
$L$                 payload
$P$                 total rate
$S$                 watermarked sample coordinates
$S_l$               subset of $S$ associated to $m_l$
$G$                 interpolation grid
$K$                 secret key
a part of the host signal interference, as a compromise with robustness to a given attack. Recent advances focus on random binning inspired by Costa's work in information theory [5]. The inserted mark is selected in a random codebook divided into bins. Each bin is associated to a possible message. For a given message, the inserted mark is the bin element which is closest to the host data. The decoding identifies the bin which is closest to the received document. In practice, a computationally tractable binning codebook can be constructed using structured quantization [6]. A popular scalar quantization-based watermarking scheme is called scalar Costa scheme (SCS) [7]. It improves the robustness by an informed embedding strategy: the quantization step is increased, while additional distortions are compensated. In spread transform scalar Costa scheme (ST-SCS) [7], the robustness to noise is improved by quantizing the projection of the data onto a pseudo-random vector. Classical quantization-based watermarking schemes are fragile to valumetric attacks such as gain or histogram modifications. Host-proportional embedding techniques [8,9] such as rational dither modulation (RDM) offer invariance to linear amplitude scaling by using locally adaptive quantization step-sizes. This paper contains comparisons to DS, DS + W, LISS, SCS, ST-SCS and RDM schemes. Imperceptibility of the mark is a major concern in watermarking. Thus, most watermarking schemes
$N_S$               cardinality of $S$
$P_S$               embedding redundancy
$g$                 interpolation operation
$f$                 interpolant
$N_v$               interpolant support length
$\eta^3$            cubic spline interpolant
$\beta^3$           B-spline synthesis function
$\rho_l^2$          decoding mean square error on $S_l$
$\nu$               detection threshold
$\nu_{th}$          optimum decoding threshold
$\nu^{(0)}$         empirical decoding threshold
$\nu^{(t)}$         iterative decoding threshold
$\alpha$            scalar weighting factor
$a$                 range of $d_k$
$\Delta$            associated to noise influence at decoding or quantization step size
$N_o$               number of observations (security attack)
must be combined with the so-called perceptual masks. These masks often weight spread spectrum watermarks. They also apply to spread transform-based schemes such as ST-SCS. In other quantization-based schemes, the quantization step must be locally adapted to the host according to a perceptual analysis, as suggested in the conclusion of [9]. In image watermarking, most popular spatial masks consist of weighting by a local variance computed with the noise visibility function (NVF) [10], subtracting a portion of second derivatives by a Laplacian filter [11] or weighting by filtered horizontal and vertical first derivatives [12]. All masks are based on empirical properties of the human visual system, combined with a statistical analysis. Spatial masks favor edges and regions of high local variance, which may concern few pixels. These masks often lead to high-pass watermarks. On the other hand, frequential masks such as DCT-domain masking are based on contrast and texture masking [2]. Interpolation techniques, widely studied in digital signal processing, are often designed to provide a good perceptual quality from known sample values. However, interpolation is essentially considered as a source of decoding errors for watermarking schemes, as detailed in Section 1.2. Conversely, this article proposes to take advantage of the perceptual properties of interpolation to generate imperceptible marks in the spatial domain. An informed watermarking algorithm is proposed. It shares some
properties with host-proportional random binning techniques, but uses an original codebook and decoding rule.

1.2. Role of interpolation in watermarking schemes
In watermarking schemes, interpolation is usually considered as a source of decoding errors. Interpolation is involved in most geometrical attacks such as rotation [13]. Indeed, such attacks distort the original data coordinates. The pixel values on the original discrete grid are then derived by interpolation. The software Checkmark proposes an attack based on downsampling followed by interpolation-based upsampling [14]. Interpolation is also required to construct a geometrically transformed mark [15]. The need for heavy interpolation discourages watermarking in continuous transformed domains such as the Fourier–Mellin domain [16]. In all cases, the necessary resampling of continuous interpolated data acts as a perturbation.

In three specific cases, interpolation has been exploited as an element of the watermarking scheme. Firstly, classical additive embedding is combined with polynomial interpolation in [17]. Interpolation is used there for a cryptographic purpose, in a hierarchical and deterministic secret sharing procedure. Secondly, 3D object documents consist of mathematical models. These models are used to generate the visual output by interpolation. For instance, 3D objects can be represented by non-uniform rational B-Splines. The watermark modifies the interpolation parameters, such as the spline degree or node positions [18]. Thirdly, the detection performance of correlation-based watermarking schemes facing an upscaling attack is improved in [19] by combining information from different polyphase components of the interpolated image.

1.3. Outline and notations
The following notations will be used. Let $x$ denote the original document, $w$ the watermark, $y = x + w$ the watermarked document, $n$ the attack noise and $z$ the received document, all of size $N$. Let $m$ denote the original message of size $L$. Let $\hat{m}$ denote the estimation of $m$ at the decoding. Let $P = N/L$ denote the redundancy. In an application to images, $a$ will refer indifferently to a matrix of $N$ entries or to the associated vector. Elements of any vector $a$ will be denoted $a_k$. Elements of any matrix $a$ will be denoted $a_{k,l}$. Let $\sigma_a^2$ denote the variance of any element of $a$. The watermarked document $y$ is transmitted and possibly attacked, leading to $z$. Under the widespread additive white Gaussian noise (AWGN) assumption, $z = y + n$ where $n_k \sim \mathcal{N}(0, \sigma_n^2)$. Finally, let us define the document to watermark ratio (DWR) and the watermark to noise ratio (WNR), expressed in dB:

$$\mathrm{DWR} \triangleq 10 \log_{10} \frac{\sigma_x^2}{\sigma_w^2}, \qquad \mathrm{WNR} \triangleq 10 \log_{10} \frac{\sigma_w^2}{\sigma_n^2}.$$

WNR measures the transmission noise and the attack influence. DWR measures the strength of the watermark. It gives an indication on the imperceptibility of $w$ with respect to the host. However, as detailed in Section 3, a perceptual analysis must be conducted in practical conditions.

This paper is organized as follows: Section 2.1 presents a short review of interpolation techniques. In Section 2.2, a generic watermarking scheme called W-interp is proposed. The examples of W-bilin, based on bilinear interpolation, and W-spline, based on spline interpolation, are detailed in an application to image watermarking. In Section 3, the imperceptibility of the scheme is assessed. Section 4 derives theoretical decoding performance in the context of AWGN. This analysis is used in informed embedding strategies. Section 5 provides an experimental study of the robustness of W-bilin and W-spline. Finally, Section 6 studies W-interp security.

2. Interpolation-based watermarking
2.1. Review of interpolation techniques
Let $u = \{u_k\}_{k \in \mathbb{Z}}$ denote discrete samples. Let $G = \{t_k\}_{k \in \mathbb{Z}}$ denote a grid corresponding to $u$ coordinates. Interpolation is the construction of a continuously defined signal $g(G, u, t),\ t \in \mathbb{R}$ with $g(G, u, t_k) = u_k,\ \forall k \in \mathbb{Z}$. For instance, perfect reconstruction of polynomials is possible under some conditions from a sufficiently large number of observed values. More complex models such as band-limited signals are more suitable in the field of signal processing. If the model is not appropriate, the reconstruction by interpolation performs poorly.
Artifacts such as ringing, blocking, aliasing or blurring may then occur [20]. In digital image processing, the original data points generally belong to a regular grid. Since the spectrum of natural
images decreases rapidly, low-pass interpolation techniques are often appropriate and preserve visual quality. Signal interpolation techniques include, in order of increasing performance, nearest-neighbor, linear, cubic-spline and B-spline interpolation [20]. For each of these techniques, a bidimensional extension is possible: an image $g(G, u, t_1, t_2)$ is interpolated from $u = \{u_{k,l}\}_{(k,l) \in \mathbb{Z}^2}$ by separable filtering on lines and columns. In this case, each line (resp. each column) must have the same sampling, which excludes scattered data. Other techniques allow for interpolation from scattered data, e.g. radial basis interpolants. In this paper, implementations will focus on bilinear and bicubic B-spline interpolation.

2.1.1. Linear interpolation
Most interpolation techniques use convolution by finite-support interpolants $f$:

$$g(G, u, t) = \sum_{k=-\infty}^{+\infty} u_k f(t - t_k). \qquad (1)$$

If $f$ is linear (i.e. polynomial of degree 1), the technique is called linear interpolation. Bilinear interpolation at $(t_1, t_2)$ is the mean of the four nearest neighbors on the grid weighted by their distance from $(t_1, t_2)$.

2.1.2. Cubic spline interpolation
Consider now regularly spaced samples ($t_k = k\ \forall k \in \mathbb{Z}$). Let $N_v$ denote the support length of $f$: $f(t) = 0\ \forall t \notin\, ]-N_v/2, N_v/2[$. Splines are piecewise polynomial functions whose pieces are smoothly connected together. Cubic convolutional splines provide interpolants of degree 3 and support length 4 [20]. B-splines allow for interpolation by an infinite-support interpolant $\eta^3$, called cardinal spline (cf. Fig. 1(b)), with a reasonable computational cost, by avoiding direct convolution [21].

Let $\beta^3(t)$ denote the B-spline synthesis function of degree 3 (cf. Fig. 1(a)). Let $TZ$ and $TZ^{-1}$ denote the Z-transform and its inverse, and $(\beta^3)^{-1}(k) = (TZ^{-1}(1/TZ(\beta^3(k))))(k)$. Then

$$\eta^3(t) = \sum_{k=-\infty}^{+\infty} (\beta^3)^{-1}(k)\, \beta^3(t - k). \qquad (2)$$

2.2. Informed watermarking scheme based on interpolation
An intuitive idea, which will not be developed in this paper, is to propose interpolation-based psychovisual masks. These masks could be proportional to an interpolation error in the spatial domain. For instance, a mask based on bilinear interpolation would be similar to [11], or to a second derivative extension of [12]. In this first proposal, the watermark is only weighted thanks to interpolation. In this section, the watermark $w$ is rather directly generated from an interpolation function. The proposed watermarking technique can be considered as a random binning scheme. However, it is based on an original codebook and decoding rule. Two bins are considered. The first bin, corresponding to the bit 0, consists of the original host samples. The second one, corresponding to 1, consists of samples interpolated from their neighbors. Thus, only samples outside an interpolation grid are watermarked. The decoding rule relies on redundancy, since the two bins are not disjoint. Security is based on the secrecy of the exact interpolation technique. Parts of this work have been presented in [22,23].
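As an illustration of the convolution form (1) of Section 2.1.1, the following minimal sketch evaluates $g(G, u, t)$ for linear interpolation on a unit-spaced grid. The triangle-shaped interpolant and the function names `triangle` and `interpolate` are assumptions chosen for this example, not part of the original scheme description.

```python
def triangle(t):
    """Assumed linear interpolant f: a triangle of support length 2."""
    return max(0.0, 1.0 - abs(t))


def interpolate(grid, samples, t, f=triangle):
    """Evaluate g(G, u, t) = sum_k u_k * f(t - t_k), as in Eq. (1)."""
    return sum(u_k * f(t - t_k) for t_k, u_k in zip(grid, samples))


# Illustrative data on a unit-spaced grid (hypothetical values).
grid = [0, 1, 2, 3]
samples = [2.0, 4.0, 3.0, 7.0]

at_grid_point = interpolate(grid, samples, 1.0)   # reproduces the sample u_1
between_points = interpolate(grid, samples, 1.5)  # average of u_1 and u_2
```

Because the triangle equals 1 at the origin and 0 at the other grid points, the interpolated signal passes through the known samples, which is the defining property $g(G, u, t_k) = u_k$.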
Fig. 1. Cubic B-spline: (a) synthesis function, (b) cardinal spline.

2.2.1. W-interp, a generic watermarking scheme
A generic watermarking scheme called W-interp is presented in Fig. 2. As previously, $g$ denotes an interpolating function and $G$ the coordinates of an interpolation grid. The watermark is embedded in a subset $S$ of the complementary set of the grid in the host: $S \subset \{1, \ldots, N\} \setminus G$. Let $N_S$ denote the cardinality of $S$ and $P_S = N_S/L$ the embedding redundancy. $S$ is divided into $L$ non-overlapping, randomly constructed sets of size $P_S$: $S = S_1 \cup \cdots \cup S_L$, $S_i \cap S_j = \emptyset\ \forall i \neq j$. $S_l$ is associated to the bit $m_l$ of the message. Let $x_{|G}$ denote the restriction of $x$ to samples whose coordinates belong to $G$. A vector of parameters is specifically introduced to guarantee the algorithm security. This vector is denoted $d = \{d_k\}_{k=1,\ldots,N}$ in reference to the "dither signal" used in quantization-based schemes [6]. Finally, for any input vector $u$, let us define the interpolation result

$$\tilde{u}_k \triangleq g(G, u_{|G}, k - d_k). \qquad (3)$$

Fig. 2. Watermarking scheme W-interp (block diagram: embedding by interpolation and substitution, attacks, detection/decoding by interpolation).
In the following, the interpolation error on any vector $u$ will be denoted $\epsilon(u_k) = \tilde{u}_k - u_k$. By using interpolation, the output document $\tilde{u}$ is meant to be perceptually close to $u$. Computation of the interpolation result requires knowledge of $d$. At the embedding, if $m_l = 1$, the values $x_{|S_l}$ of the samples in $S_l$ are substituted by $\tilde{x}_{|S_l}$. Otherwise, $x_{|S_l}$ is not modified. The embedding rule is thus the following:

$$y_{|S_l} = x_{|S_l} + m_l (\tilde{x}_{|S_l} - x_{|S_l}). \qquad (4)$$

After possible attacks, the decoding compares $z_{|S_l}$ and $\tilde{z}_{|S_l}$. For a given bit $m_l$, let us consider the mean square error $\rho_l^2$ defined by

$$\rho_l^2 = \frac{1}{P_S} \sum_{k \in S_l} \epsilon(z_k)^2. \qquad (5)$$
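The embedding rule (4) and the mean square error (5) can be sketched on a one-dimensional signal. This is only an illustrative toy under stated assumptions: the grid $G$ is taken as the even indices, $g$ is a linear interpolation between the two grid neighbours of an odd index, and the helper names (`interp_error`, `embed`, `mse`) as well as the random host are hypothetical.

```python
import random


def interp_error(u, k, d_k=0.0):
    """eps(u_k) = u~_k - u_k for an odd index k. u~_k is linearly interpolated
    from the grid neighbours u[k-1] and u[k+1] at the shifted position k - d_k
    (assumed 1-D setting with the even indices as grid G)."""
    w_right = (1.0 - d_k) / 2.0  # weight of u[k+1]
    return (1.0 - w_right) * u[k - 1] + w_right * u[k + 1] - u[k]


def embed(x, message, sets, d):
    """Rule (4): samples of S_l are replaced by their interpolation if m_l = 1."""
    y = list(x)
    for bit, s_l in zip(message, sets):
        if bit == 1:
            for k in s_l:
                y[k] = x[k] + interp_error(x, k, d[k])
    return y


def mse(z, s_l, d):
    """Mean square interpolation error (5) on the set S_l."""
    return sum(interp_error(z, k, d[k]) ** 2 for k in s_l) / len(s_l)


rng = random.Random(0)
N, L = 64, 4
x = [rng.uniform(0, 255) for _ in range(N)]       # hypothetical host signal
candidates = list(range(1, N - 1, 2))             # odd indices, outside the grid
rng.shuffle(candidates)                           # random, key-dependent partition
P_S = len(candidates) // L
sets = [candidates[l * P_S:(l + 1) * P_S] for l in range(L)]
d = [rng.uniform(-1, 1) for _ in range(N)]        # secret shifts
message = [1, 0, 1, 0]

y = embed(x, message, sets, d)
errors = [mse(y, s_l, d) for s_l in sets]         # noiseless case: z = y
```

In this noiseless toy run, the sets carrying a bit 1 yield a mean square error that is zero up to rounding, while the sets carrying a bit 0 keep the interpolation error of the host. Grid samples are never modified, so the decoder recomputes the same interpolation as the embedder.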
A basic decoding strategy is to compare $\rho_l^2$ to a document-dependent threshold $\nu$. If $\rho_l^2 < \nu$, the decision is $\hat{m}_l = 1$, else $\hat{m}_l = 0$. $\nu$ can be chosen empirically or from appropriate hypotheses about the interpolation error distribution, as described in Section 4.1. As developed in Section 4.1.3, this mean square error-based decoding rule corresponds to the Neyman–Pearson detector under the assumption of Gaussian interpolation error and noise distributions. Specific decoders can be designed under other hypotheses. Finally, a symmetric watermarking algorithm is based on a secret key $K$, known to the embedder and to the decoder, that prevents unauthorized users from decoding the watermark. Here, $K$ consists of the watermark coordinates and the associated parameters $d$: $K = \{S_1, \ldots, S_L, d\}$. The algorithm is characterized by the choice of function $g$, parameters $d$, a grid $G$, and locations $S_1, \ldots, S_L$ of $N_S$ watermarked samples.

2.2.2. W-interp analysis
W-interp is a blind watermarking scheme, since $x$ is not used at the decoding. It is an informed coding method since $x$ is used during the generation of $w$. The basic embedding strategy considered here is to maximize detection for a fixed distortion, in the absence of attack noise. Other informed embedding strategies are considered in Section 4.2. In the implementations proposed in this paper, $g$ will be a linear function of $x_{|G}$. It will act as a local filter. The condition of imperceptibility imposes that $w$ modifies the high and middle frequencies of $x$. Under this assumption, $g$ behaves like a low-pass filter. The watermark modifies the high-pass components of $x$. Due to the embedding rule (4), W-interp can be linked to random binning techniques. In random binning, in the case of binary signaling, two codebooks $M_0$ and $M_1$ are constructed, corresponding to bits 0 and 1. The embedding of $m_l$ chooses the codeword that is the perceptually closest to $x_k$ among the bin $M_{m_l}$. Thus, a perceptual analysis is often necessary. At the decoding, $\hat{m}_l$ corresponds to the bin $M_{\hat{m}_l}$ that is the closest to $z_k$.
The distance between bins associated to different transmitted symbols is usually maximized. A minimum-distance decoder is then efficient. Bins are often designed to approach the channel capacity. Space-filling binning codebooks are usually built on non-linear quantizers [5]. In such quantization-based schemes, the security relies on a secret dither signal $d$ added prior to quantization. For instance, if $Q_\Delta$ is a scalar quantizer of step $\Delta$,

$$y_k = Q_\Delta\left(x_k - \Delta\left(d_k + \frac{m_k}{2}\right)\right) + \Delta\left(d_k + \frac{m_k}{2}\right). \qquad (6)$$
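For comparison with W-interp, the dithered scalar quantization (6) can be sketched as follows. The rounding quantizer, the minimum-distance decoder and the names `quantize`, `dm_embed` and `dm_decode` are assumptions made for this example. The sketch covers plain dither modulation only and not the distortion compensation used in SCS.

```python
import math


def quantize(v, step):
    """Uniform scalar quantizer Q_step: round to the nearest multiple of step."""
    return step * math.floor(v / step + 0.5)


def dm_embed(x_k, m_k, d_k, step):
    """Eq. (6): quantize on the lattice shifted by step * (d_k + m_k / 2)."""
    offset = step * (d_k + m_k / 2.0)
    return quantize(x_k - offset, step) + offset


def dm_decode(z_k, d_k, step):
    """Minimum-distance decoding: choose the bit whose lattice is closest to z_k."""
    dist = [abs(z_k - dm_embed(z_k, m, d_k, step)) for m in (0, 1)]
    return 0 if dist[0] <= dist[1] else 1


# Illustrative values (hypothetical): host sample, secret dither, step size.
x_k, d_k, step = 100.3, 0.37, 8.0
marked = {m: dm_embed(x_k, m, d_k, step) for m in (0, 1)}
decoded = {m: dm_decode(marked[m] + 1.5, d_k, step) for m in (0, 1)}
```

The two lattices are offset by half a step, so a perturbation smaller than a quarter of the step (1.5 for a step of 8 here) does not change the decoded bit. Without `d_k`, the position of the lattices is unknown, which is the security role of the dither mentioned above.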
A pirate might succeed in identifying the quantization grid associated to $y_k$. However, the pirate cannot directly decode $m_k$ without knowledge of $d_k$. The secrecy of the codebook has also been envisaged [24]. W-interp does not follow some of these guidelines. W-interp can be considered as a random binning technique using repetition coding since, in the embedding rule (4), the original sample is substituted by a document-dependent codeword. For a sample $x_k$, the codebook of W-interp consists of two bins. The bin $M_0$ consists of an original host sample $x_k$. The bin $M_1$ consists of the interpolated sample $\tilde{x}_k$. Thus, only one codeword belongs to each bin. The two bins do not have the same statistical properties. Thanks to interpolation, the particular codebook of W-interp provides an intrinsic perceptual mask: both codewords are meant to be perceptually close to $x_k$. Like in random binning, decoding a bit $m_l$ for W-interp consists in identifying the bin associated to a sample $z_k$, $k \in S_l$. However, at the sample level, $M_0$ and $M_1$ can overlap, since the interpolation error may equal zero. Moreover, $M_0$ is unknown to the decoder. Thus, the original decoding rule (5) has been introduced. Its performance is intrinsically due to the redundancy. For the basic embedding strategy and for the noiseless case, W-interp performance can be arbitrarily tuned as a function of the redundancy, as shown in Section 4.1. Obviously, the W-interp codebook leads to a poorer robustness to AWGN than SCS, which is specifically designed for this attack. However, simulations on real data in a low-rate scenario, provided in Section 5, show the high robustness of W-interp to other attacks such as compression, denoising and histogram equalization. Neighboring samples indexed by $G$ are necessary to compute $\epsilon(x)$, and thus the bin $M_1$. Thus, they can be considered as information about the document-dependent codebook that must be transmitted in the document alongside the watermark, in $y_{|G}$.
The achievable rate is $N_S/N$ because of the interpolation grid transmission. Note that since performance increases with redundancy, lower rates are considered in practice (cf. Section 4.1). The generating operator $g$ plays a similar role as quantizers: both are idempotent. However, dither modulation in W-interp is useless, since it would not alternate the two bins. Dither addition in W-interp would correspond to

$$\tilde{u}_k \triangleq g(G, u_{|G} + d_{|G}, k). \qquad (7)$$
A specificity of W-interp is that $g$ can be linear. In this case, dither addition (7) and secrecy of the exact interpolation (3) have close behaviors. The possible linearity of $g$ may damage the algorithm security. In this case, indeed, estimation of $K$ is possible by solving a linear equation system with latent variables, as developed in Section 6.

2.2.3. Application to images
Two instances of the generic algorithm W-interp are detailed for an application to images. Both algorithms use the quincunx grid (lattice $D_2$ according to the notations used in [5]). This grid allows for computationally efficient implementation. Comparison between a basic interpolation technique and a more sophisticated one is necessary, since imperceptibility and robustness of the scheme depend on the interpolation error. In W-bilin, the interpolation technique $g$ is built on bilinear interpolation. In W-spline, $g$ performs a bicubic B-spline interpolation. In two dimensions, the interpolation (3) becomes

$$\tilde{u}_{k_1,k_2} \triangleq g(G, u_{|G}, k_1 + d^1_{k_1,k_2}, k_2 + d^2_{k_1,k_2}), \qquad (8)$$

where the elements of $d^1, d^2$ are independent random variables uniformly distributed over $]-a, +a[$, with $0 \le a \le 1$. A geometrical interpretation of these shifts is displayed in Fig. 3. Section 6 explains how these random shifts improve security. The choice of parameter $N_S$ is discussed in Section 3.
Fig. 3. Random shifts in the coordinates (shifts $d^1$ and $d^2$ of the unknown point relative to the known grid points).
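For W-bilin, the shifted interpolation (8) of a pixel outside the quincunx grid can be sketched from its four grid neighbours. The mapping between the signs of the shifts and the four neighbours is an assumption of this example (shifts measured along the two diagonals of the quincunx cell), as is the function name `bilinear_quincunx`. Only the product form of the weights, $(1 \pm d^1)(1 \pm d^2)/4$, is taken from the analysis of Section 4.1.1.

```python
def bilinear_quincunx(u, k1, k2, d1=0.0, d2=0.0):
    """Interpolate pixel (k1, k2) from its four quincunx-grid neighbours.

    Assumed convention: (d1, d2) are shifts along the two diagonals of the
    quincunx cell, so each weight has the form (1 +/- d1)(1 +/- d2) / 4.
    """
    weights = {
        (k1 + 1, k2): (1 + d1) * (1 + d2) / 4.0,
        (k1 - 1, k2): (1 - d1) * (1 - d2) / 4.0,
        (k1, k2 + 1): (1 + d1) * (1 - d2) / 4.0,
        (k1, k2 - 1): (1 - d1) * (1 + d2) / 4.0,
    }
    return sum(w * u[i][j] for (i, j), w in weights.items())


# Hypothetical 3x3 patch: the centre pixel is outside the grid,
# its four direct neighbours belong to the grid.
patch = [[0, 10, 0],
         [20, 99, 40],
         [0, 30, 0]]

no_shift = bilinear_quincunx(patch, 1, 1)              # mean of the neighbours
full_shift = bilinear_quincunx(patch, 1, 1, 1.0, 1.0)  # reaches one grid point
```

The four weights always sum to one. Without shift the interpolation is the plain average of the neighbours, and as both shifts approach their maximal range the interpolated value tends to a single grid sample.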
3. Imperceptibility
3.1. Theoretical embedding strength
DWR is used to control the embedding strength. Let $\sigma_{\epsilon(x)}^2$ denote the variance of $\epsilon(x_k)$ on $\{1, \ldots, N\} \setminus G$. Note that the interpolation accuracy measured by $\sigma_{\epsilon(x)}^2$ depends on the interpolation technique, on $x$ and on the cardinality of $G$. It is independent of the subset $S$ and of $N_S$. For W-interp, assuming the equiprobability of bits 0 and 1,

$$\mathrm{DWR} = \frac{2\sigma_x^2}{\sigma_{\epsilon(x)}^2} \frac{N}{N_S}. \qquad (9)$$

$N_S$ and thus the embedding redundancy $P_S$ can be derived as a function of the DWR for a given document and interpolation technique. Note that the local distortion cannot be constrained to lie in a given range. Moreover, on a subset $S_l$ with $m_l = 1$, the DWR per sample is $\sigma_x^2/\sigma_{\epsilon(x)}^2$, fixed by the document. A large value of $\sigma_{\epsilon(x)}^2$, corresponding to a small $N_S$, could result in a large distortion localized on few samples. However, several solutions are available in order to increase $N_S$ at constant DWR. The most elegant solution would be to improve the interpolation technique itself. Clipping the interpolation error would affect its statistical properties and the robustness. Finally, limitation of the range of shifts $d$ through parameter $a$ improves the imperceptibility at the expense of the security (cf. Section 6).

3.2. Application to images
A typical watermark generated by W-bilin is presented in Fig. 4. Interpolation leads to large pixel modifications only in regions of high local variance, where modifications are less perceptible. Thus, perceptual masking is intrinsic to W-interp. Moreover, the resulting watermark is highly correlated to the host document, which contributes to the robustness to attacks such as denoising (cf. Section 5.1).
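Relation (9) of Section 3.1 can be inverted to obtain the number of watermarked samples for a target DWR. The sketch below assumes that the DWR in (9) is a linear ratio (converted here from dB) and caps $N_S$ at the number of samples available outside the grid. The function name `embedding_size` and the numerical values are hypothetical.

```python
import math


def embedding_size(dwr_db, var_x, var_eps, n, n_max):
    """Invert Eq. (9): N_S = 2 * var_x * N / (var_eps * DWR), capped at n_max."""
    dwr = 10.0 ** (dwr_db / 10.0)              # dB to linear ratio
    n_s = round(2.0 * var_x * n / (var_eps * dwr))
    return min(n_s, n_max)


# Hypothetical document statistics: host variance, interpolation error
# variance, document length and number of samples outside the grid.
var_x, var_eps, n, n_max = 1600.0, 40.0, 1000, 500

target_db = 10.0 * math.log10(800.0)           # about 29 dB
n_s = embedding_size(target_db, var_x, var_eps, n, n_max)
```

Lowering the target DWR increases $N_S$ until the cap is reached, which corresponds to the minimum achievable DWR of a given document.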
The spectrum of $w$ has the form of a lobe, as shown by the cumulated periodograms on the lines displayed in Fig. 5. Indeed, the interpolation error is a high-pass signal [20] and the spectrum of natural images decreases rapidly around the origin (cf. Fig. 5). Since W-interp is based on intrinsic properties of the image such as correlation between neighboring pixels, its features depend on $x$. On a database of 47 natural images extracted from [25], the DWR per sample on a subset $S_l$ with $m_l = 1$ is $\sigma_x^2/\sigma_{\epsilon(x)}^2 = 16$ dB on average for bilinear interpolation and $a = 1$. The cardinality of the proposed quincunx grid is $N/2$, thus the maximal cardinality of $S$ is $N_S = N/2$. Consequently, the minimum achievable DWR is $4\sigma_x^2/\sigma_{\epsilon(x)}^2$ when $N_S = N/2$. The robustness and security properties of W-interp are thus limited on some images. The imperceptibility can be evaluated thanks to objective quality metrics. DWR is a quality metric in itself since it is related to the peak signal-to-noise ratio (PSNR), defined as follows for 8-bit images:

$$\mathrm{PSNR} \triangleq 10 \log_{10} \left( \frac{255^2}{\frac{1}{N} \sum_{k=1}^{N} E[(y_k - x_k)^2]} \right).$$

However, it fails to handle some properties of the human visual system. For instance, the perceptual distortion introduced by geometrical transforms (cf. Section 5.3) is overrated by PSNR. Thus, objective metrics have been introduced to assess the image perceptual quality. Note that no existing quality metric is completely satisfying [26]. Watson's perceptual distance [27] is a classical quality metric in the watermarking community. However, it is obviously biased toward the DCT mask derived from the same psychovisual studies [2]. Experimental results (cf. Table 1) show a distortion for W-bilin and W-spline similar to that of spatial psychovisual masks associated to DS. An implementation of the NVF based on a non-stationary Gaussian model in the spatial domain was used [10]. The performance
Fig. 4. Lena (detail): original, watermarked and watermark, PSNR = 43 dB.

Fig. 5. Original image and watermark spectrum, Lena (logarithmic scale, normalized frequencies from −0.5 to 0.5).

Table 1
Comparison of Watson quality metric, PSNR = 43 dB, N = 2^18

Method               Watson metric    Method                Watson metric
DS                   367              DS + NVF              315
DS + Laplacian       345              DS + [12]             341
W-bilin, a = 1       357              W-spline, a = 1       354
W-bilin, a = 1/2     330              W-spline, a = 1/2     321
W-bilin, a = 0       322              W-spline, a = 0       295
of this mask would be improved by implementation in an appropriate transformed domain and by adapting its parameters to each image. The test image set is composed of 47 images extracted from [25]. Kullback–Leibler and SSIM criteria confirm these results [23]. However, a rough subjective evaluation of W-bilin reveals artifacts on edges of $x$, as well as some aliasing artifacts on oriented textures. These artifacts are smaller for W-spline, but still present. A formal subjective evaluation would be necessary, following the example of [26]. An accurate but costly perceptual evaluation would allow the selection of the most appropriate interpolation technique for W-interp. Edge-preserving interpolation techniques (see Ref. [28]) are good candidates. For instance, some of these techniques involve an edge detection step. A simple implementation would combine W-spline with an edge-excluding mask, possibly at the cost of some additional fragility to attacks.

4. W-interp performance analysis
4.1. Theoretical performance study
This section studies the performance of W-interp in the presence of an AWGN attack. When $\sigma_n^2$ is unknown, an iterative decoder is proposed. When the attack parameter $\sigma_n^2$ is known or can be accurately estimated, theoretical detection and
decoding thresholds are derived. This theoretical performance expression is also used in informed embedding strategies of Section 4.2. For simplicity and generality, the interpolation error $\epsilon(x_k)$ will be modeled as a zero-mean Gaussian variable of variance $\sigma_{\epsilon(x)}^2$. In the particular application to images, a generalized Gaussian distribution is more appropriate, according to the histogram with truncated tails of $\epsilon(x)$ [23]. A detector for W-interp based on this model was derived in [23].

4.1.1. AWGN influence
Let us define $d' \triangleq (1 - d)/2$. According to (1), in the case of a unidimensional regularly spaced grid

$$\epsilon(u_k) = \sum_{i=-\infty}^{+\infty} f(d'_k + i)(u_{k+2i-1} - u_k). \qquad (10)$$

The decoding results from the thresholding of $\epsilon(z)$, with

$$\epsilon(z_k) = \epsilon(y_k) + \epsilon(n_k). \qquad (11)$$

Straightforward derivations lead to $E[\epsilon(n)] = 0$ and $\sigma_{\epsilon(n)}^2 = (1 + \Delta)\sigma_n^2$, where the constant $\Delta$ depends on $d$:

$$\Delta \triangleq \sum_{i=-\infty}^{+\infty} E[f(d'_k + i)^2]. \qquad (12)$$
In the case of W-bilin, each weight expresses as $\frac{1}{4}(1 \pm d^1_{k,l})(1 \pm d^2_{k,l})$. Thus, $\Delta = \frac{1}{4}(1 + \sigma_d^2)^2$ for any distribution of $d$ with zero mean and variance $\sigma_d^2$. This leads to $\Delta = 0.25$ if $a = 0$ and $\Delta = 4/9 \simeq 0.44$ if $a = 1$. In the case of W-spline, $f = \eta^3$ and according to the bidimensional extension by tensor product of (12),

$$\Delta = \sum_{i=-\infty}^{+\infty} \sum_{j=-\infty}^{+\infty} E\left[\left(\eta^3({d'}^{1}_{k,l} + i)\, \eta^3({d'}^{2}_{k,l} + j)\right)^2\right]. \qquad (13)$$
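The W-bilin values of $\Delta$ quoted above can be checked numerically. The sketch below compares the closed form $\frac{1}{4}(1 + \sigma_d^2)^2$, with $\sigma_d^2 = a^2/3$ for shifts uniformly distributed over $]-a, +a[$, to a Monte Carlo estimate of the expected sum of squared weights. The function names and the number of trials are illustrative assumptions.

```python
import random


def delta_bilin_closed_form(a):
    """Delta = (1 + var_d)^2 / 4, with var_d = a^2 / 3 for a uniform law on ]-a, +a[."""
    var_d = a * a / 3.0
    return 0.25 * (1.0 + var_d) ** 2


def delta_bilin_monte_carlo(a, trials=50_000, seed=0):
    """Estimate E[sum of the four squared weights (1 +/- d1)(1 +/- d2) / 4]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        d1, d2 = rng.uniform(-a, a), rng.uniform(-a, a)
        total += sum(((1 + s1 * d1) * (1 + s2 * d2) / 4.0) ** 2
                     for s1 in (-1, 1) for s2 in (-1, 1))
    return total / trials


exact_no_shift = delta_bilin_closed_form(0.0)    # 0.25
exact_full_range = delta_bilin_closed_form(1.0)  # 4/9
estimate_full_range = delta_bilin_monte_carlo(1.0)
```

The noise variance seen by the decoder is then $(1 + \Delta)\sigma_n^2$, so a wider range of shifts increases the noise influence.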
For W-spline, numerically, $\Delta \simeq 0.58$ if $a = 0$ and $\Delta \simeq 0.77$ if $a = 1$. Note that the noise influence is reduced in the absence of any shift and that W-spline is less robust than W-bilin since $f$ can take negative values. The choice of the distribution of $d$ is a compromise between imperceptibility and robustness on the one hand, and security on the other hand (cf. Section 6).

4.1.2. Neyman–Pearson detector
This section considers a single bit mark ($L = 1$) with $m_1 = 1$. The detection problem consists of a binary hypothesis test:

hypothesis $H_0$: absence of mark;
hypothesis $H_1$: presence of a mark.

Under both hypotheses, $z_k = y_k + n_k$, where $n_k$ is modeled as an AWGN noise. In this section, the Neyman–Pearson detector will be derived under the assumption of Gaussian $\epsilon(z_{|S})$. Note that this assumption concerns only the interpolation error since the spatial samples themselves would be modeled at best by mixtures of Gaussians in an application to images. Let $P_d$ denote the probability of detection, $P_{fa}$ the probability of false alarm. Provided that $\sigma_{\epsilon(n)}^2$ is known to the detector, the Neyman–Pearson detector maximizes $P_d$ for a given $P_{fa}$ [29]. Otherwise, $P_d$ and $P_{fa}$ can still be computed as a function of $\sigma_{\epsilon(n)}^2$ and of the detection threshold $\nu$. Under hypothesis $H_1$, the substitution at the embedding leads to $\epsilon(y_k) = 0$, thus $\epsilon(z_k) = \epsilon(n_k)$ is a zero-mean Gaussian variable of variance $\sigma_{\epsilon(n)}^2$. Under hypothesis $H_0$, $\epsilon(z_k) = \epsilon(x_k) + \epsilon(n_k)$. As $x$ and $n$ are supposed to be uncorrelated and in the case of linear interpolation, $\epsilon(x_k)$ and $\epsilon(n_k)$ are uncorrelated. If $\epsilon(x_k)$ is assumed to be zero-mean Gaussian, then $\epsilon(z_k)$ is a zero-mean Gaussian variable of variance $\sigma_{\epsilon(n)}^2 + \sigma_{\epsilon(x)}^2$. The test statistics corresponding to the Gaussian distribution is

$$T = \sum_{k \in S} \epsilon(z_k)^2. \qquad (14)$$

As $\epsilon(z_k)$ is zero-mean Gaussian and $N_S$ is the cardinality of $S$, $T$ is distributed according to a $\chi^2_{N_S}$ distribution under both hypotheses, up to a scaling by the variance of $\epsilon(z_k)$. Let $F_{\chi^2_{N_S}}$ denote the $\chi^2_{N_S}$ cumulative distribution function. Under hypothesis $H_1$, the presence of a mark corresponds to the absence of an interpolation error $\epsilon(x)$. The lower mean and variance of $T$ under $H_1$ are involved in the detector derivation. Thus, the Neyman–Pearson detector decides $H_1$ when $T < \nu$ with

$$\nu = \sigma_{\epsilon(n)}^2 F^{-1}_{\chi^2_{N_S}}(1 - P_{fa}),$$

thus

$$P_d = 1 - F_{\chi^2_{N_S}}\left(\frac{\nu}{\sigma_{\epsilon(n)}^2 + \sigma_{\epsilon(x)}^2}\right).$$

The detection performance is evaluated through the receiver operating characteristic (ROC) curves [29]. These curves display $P_d$ as a function of $P_{fa}$. W-interp provides good detection results since it is affected by AWGN attacks only for very high values of $\sigma_n^2$ or of PSNR, such as the one shown in Fig. 6. The simulations provide the averaged performance on the image set composed of Lena, Baboon, Fishingboat, Peppers, Pentagon.

Fig. 6. Receiver operating characteristic ($P_d$ as a function of $P_{fa}$) for W-bilin, $N = 2^{18}$, PSNR = 43 dB, WNR = −23 dB.

4.1.3. Decoding problem
This section considers a multiple bit mark ($L \ge 1$) for decoding performance evaluation. The decoding problem consists of estimating the message from $z$. The decoding performance is measured experimentally through the bit error rate $\mathrm{BER} \triangleq 1 - (1/L) \sum_{l=1}^{L} \delta(\hat{m}_l, m_l)$, where $\delta$ denotes the Kronecker delta. The optimum decision threshold $\nu_{th}$ minimizes the BER under the Gaussian assumption of $\epsilon(z_{|S})$. Classical random binning techniques are usually designed for the decoding problem. Thus, detection must be studied separately [30]. Due to the embedding rule (4), W-interp is more related to the detection problem. However, in this case, these results can be extended to decoding. Indeed, on a
given set Sl , hypothesis H0 (absence of a mark) is exactly similar to the embedding of ml ¼ 0. The redundancy on each set Sl is now PS . Thus, the test statistics T on each set Sl is the mean square error r2l defined in (5). The total probability of error is BER ¼ p½ml ¼ 1p½r2l 4nth jH1 þ p½ml ¼ 0p½r2l onth jH0 .
ð15Þ
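As an illustration of (15), the error probability can be estimated by simulation for a given threshold. The sketch below is not the authors' implementation. It assumes that $r_l^2$ is the average of $P_S$ squared interpolation errors (the exact normalization of (5) is not restated here), that the errors are independent zero-mean Gaussian variables as in Section 4.1.2, and that the message symbols are equiprobable. All function and parameter names are hypothetical.

```python
import random


def mean_square_error(p_s, variance, rng):
    """Mean of P_S squared zero-mean Gaussian errors (assumed form of r_l^2)."""
    std = variance ** 0.5
    return sum(rng.gauss(0.0, std) ** 2 for _ in range(p_s)) / p_s


def empirical_ber(eta, p_s, var_noise, var_host, trials=2000, seed=0):
    """Monte Carlo estimate of (15) with equiprobable symbols:
    BER = 1/2 P[r^2 > eta | H1] + 1/2 P[r^2 < eta | H0]."""
    rng = random.Random(seed)
    # H1 (m_l = 1): only the noise contributes to the interpolation error.
    missed = sum(mean_square_error(p_s, var_noise, rng) > eta
                 for _ in range(trials))
    # H0 (m_l = 0): host and noise interpolation errors add up.
    false_ones = sum(mean_square_error(p_s, var_noise + var_host, rng) < eta
                     for _ in range(trials))
    return 0.5 * missed / trials + 0.5 * false_ones / trials


if __name__ == "__main__":
    # Scan a few thresholds; the BER is smallest between the two variances.
    for eta in (0.5, 1.0, 2.0, 4.0, 8.0):
        print(eta, empirical_ber(eta, p_s=64, var_noise=1.0, var_host=4.0,
                                 trials=500))
```

Scanning `eta` in this way gives an experimental counterpart to the theoretical threshold: the estimated BER is close to 0.5 for very small or very large thresholds and drops when the threshold separates the two variances.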
Let $f_{r_l^2|H_0}$ and $f_{r_l^2|H_1}$ denote the respective probability density functions of $r_l^2$ under $H_0$ and $H_1$. Assuming the equiprobability of the binary message symbols, $\eta_{th}$ is the solution of $\partial \mathrm{BER}/\partial \eta\,|_{\eta = \eta_{th}} = 0$, thus $f_{r_l^2|H_1}(\eta_{th}) - f_{r_l^2|H_0}(\eta_{th}) = 0$. Let $f_{\chi^2_{P_S}}$ denote the $\chi^2_{P_S}$ probability density function. $\eta_{th}$ is numerically derived as the solution of

$$\frac{1}{\sigma^2_{\epsilon(n)} + \sigma^2_{\epsilon(x)}}\, f_{\chi^2_{P_S}}\!\left( \frac{\eta_{th}/P_S}{\sigma^2_{\epsilon(n)} + \sigma^2_{\epsilon(x)}} \right) = \frac{1}{\sigma^2_{\epsilon(n)}}\, f_{\chi^2_{P_S}}\!\left( \frac{\eta_{th}/P_S}{\sigma^2_{\epsilon(n)}} \right). \tag{16}$$

Fig. 7 displays the experimental and theoretical BER. The discrepancy between the theoretical and experimental performance for large threshold values is due to the Gaussian approximation of $\epsilon(x_k)$ for images. W-interp is almost errorless for a low-rate scenario. For a total rate $L/N = 1/P < 1/64$, which corresponds to $L < 4096$ if $N = 2^{18}$, experiments on the test image set indicate that $\mathrm{BER} < 10^{-5}$ at PSNR = 43 dB. Results for larger values of $L$ are displayed in Fig. 8.

4.1.4. Iterative decoding
If $\sigma_n$ is unknown to the decoder, an empirical threshold $\eta$ must be used.
Firstly, $\eta$ can be chosen as the mean of the decoding results:

$$\eta^{(0)} = \frac{1}{L} \sum_{l=1}^{L} r_l^2. \tag{17}$$
The theoretical threshold $\eta_{th}$ derived in the previous section provides better performance than $\eta^{(0)}$, as shown in Fig. 7. Secondly, an iterative decoder can improve the threshold $\eta$ based on the estimation of $\sigma_{\epsilon(n)}$. At each iteration $t > 0$, the following decisions are taken: $\hat{m}_l^{(t)} = 1$ if $r_l^2 < \eta^{(t-1)}$, else $\hat{m}_l^{(t)} = 0$. Then $\hat{\sigma}^{(t)}_{\epsilon(z)|H_1}$ is defined as the standard deviation of $\epsilon(z_k) \mid k \in S_l,\ \hat{m}_l^{(t)} = 1$. Thus, an estimation $\hat{\sigma}^{(t)}_{\epsilon(n)} = \hat{\sigma}^{(t)}_{\epsilon(z)|H_1}$ of $\sigma_{\epsilon(n)}$ is computed. Finally, the decision threshold $\eta^{(t)}$ at iteration $t$ is the optimal threshold computed in Section 4.1.3, associated to $\hat{\sigma}^{(t)}_{\epsilon(n)}$.

4.2. Informed embedding strategies
In this section, informed embedding strategies are proposed, based on the analysis of the decoder in Section 4.1. In particular, the link between random binning and W-interp is exploited. An extension to distortion-compensation is proposed, inspired by quantization-based schemes [6].
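The iterative decoder of Section 4.1.4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: $r_l^2$ is taken as the average of squared zero-mean Gaussian errors on each set, the variance under $H_0$ is re-estimated from the sets currently decoded as 0 (the text only describes the estimate under $H_1$), and the threshold is the point where the two scaled chi-square densities cross, which has a closed form under this normalization. The names `crossing_threshold`, `iterative_decode` and `simulate` are hypothetical.

```python
import math
import random


def crossing_threshold(var_marked, var_unmarked):
    """Value of r^2 at which the H1 and H0 densities are equal, assuming
    r^2 is a mean of squared zero-mean Gaussian errors (same set size)."""
    return (math.log(var_unmarked / var_marked)
            / (1.0 / var_marked - 1.0 / var_unmarked))


def iterative_decode(r2, max_iter=20):
    """Decode one bit per set from r2[l] without knowing the noise power."""
    eta = sum(r2) / len(r2)                      # initial threshold, as in (17)
    bits = [1 if r < eta else 0 for r in r2]
    for _ in range(max_iter):
        marked = [r for r, b in zip(r2, bits) if b == 1]
        unmarked = [r for r, b in zip(r2, bits) if b == 0]
        if not marked or not unmarked:
            break                                # one class is empty: stop
        var_marked = sum(marked) / len(marked)        # noise error variance
        var_unmarked = sum(unmarked) / len(unmarked)  # noise + host variance
        if var_marked <= 0.0 or var_unmarked <= var_marked:
            break
        eta = crossing_threshold(var_marked, var_unmarked)
        new_bits = [1 if r < eta else 0 for r in r2]
        if new_bits == bits:
            break                                # decisions are stable
        bits = new_bits
    return bits, eta


def simulate(num_sets, p_s, var_noise, var_host, seed=0):
    """Synthetic r_l^2 values for random bits (Gaussian error model)."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(num_sets)]
    r2 = []
    for b in bits:
        std = math.sqrt(var_noise if b == 1 else var_noise + var_host)
        r2.append(sum(rng.gauss(0.0, std) ** 2 for _ in range(p_s)) / p_s)
    return bits, r2


if __name__ == "__main__":
    sent, r2 = simulate(num_sets=200, p_s=64, var_noise=1.0, var_host=9.0)
    decoded, eta = iterative_decode(r2)
    print("bit errors:", sum(a != b for a, b in zip(sent, decoded)), "eta:", eta)
```

On real images the Gaussian model is only approximate, so the closed-form crossing point used here should be read as a stand-in for the numerically derived threshold of Section 4.1.3.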
Fig. 7. Choice of $\eta$ (theoretical and experimental BER as a function of the threshold $\eta$; legend: $\mathrm{BER}_{th}$, $\mathrm{BER}_{xp}$, $\eta$ (emp.), $\eta_{th}$): $N = 2^{18}$, PSNR = 43 dB, $L = 1024$, WNR = 10 dB, Fishingboat.

Fig. 8. Influence of the embedding redundancy on the decoding (BER as a function of $L$), $\sigma^2_n = 0$, $N = 2^{18}$, PSNR = 43 dB.
4.2.1. Distortion-compensation
The proposed extension is based on the fact that the distance between the bins $M_0$ and $M_1$ depends on $\sigma^2_{\epsilon(x)}$. Thus, the distance between the distributions of the test statistic under the two hypotheses also depends on $\sigma^2_{\epsilon(x)}$ (cf. Section 4.1.2). However, the distortion in Eq. (9) increases with $\sigma^2_{\epsilon(x)}$ when $N_S$ is constant. In this section, a constant-distortion embedding strategy called distortion-compensation is applied to W-interp. Distortion-compensation consists firstly of increasing $\sigma^2_{\epsilon(x)}$ to $\sigma'^2_{\epsilon(x)} > \sigma^2_{\epsilon(x)}$. For instance, it is sufficient to increase $a$ (cf. Section 2.2.3). Let $\tilde{x}'$ denote this new interpolation result. Let $\alpha$ denote a scalar, $0 < \alpha < 1$. In order to keep the distortion constant, a fraction $(1 - \alpha)$ of the new interpolation error is introduced in (4):

$$y|_{S_l} = x|_{S_l} + m_l\, \alpha\, (\tilde{x}'|_{S_l} - x|_{S_l}). \tag{18}$$
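A minimal sketch of the embedding rule (18) on one set $S_l$ is given below, assuming that the scale factor $\alpha$ is set from the constant-distortion condition, i.e. as the square root of the ratio between the target (original) and the increased interpolation-error powers. The increased power is estimated here as the empirical mean square of $\tilde{x}' - x$ on the set, which is an assumption of this sketch. The function names and the sample values are hypothetical.

```python
def mean_square(values):
    """Empirical mean of squared values."""
    return sum(v * v for v in values) / len(values)


def dc_embed(x, x_interp_new, bit, var_target):
    """Distortion-compensated embedding of one bit on a set, following (18).

    x            : host samples on the set
    x_interp_new : interpolated values obtained with the increased shift
    bit          : message bit m_l (0 or 1)
    var_target   : interpolation-error power to preserve (original sigma^2)
    """
    err = [xn - xi for xi, xn in zip(x, x_interp_new)]
    var_new = mean_square(err)              # empirical sigma'^2
    alpha = (var_target / var_new) ** 0.5   # constant-distortion scale factor
    y = [xi + bit * alpha * e for xi, e in zip(x, err)]
    return y, alpha


if __name__ == "__main__":
    host = [10.0, 12.0, 15.0, 11.0]
    interp = [12.0, 9.0, 17.0, 14.0]
    marked, alpha = dc_embed(host, interp, bit=1, var_target=2.0)
    distortion = mean_square([m - h for m, h in zip(marked, host)])
    print(alpha, distortion)
```

With `bit = 1` the embedding distortion equals `var_target` by construction, and the residual $(\alpha - 1)(\tilde{x}' - x)$ is what remains of the new interpolation error after embedding. The scale factor stays below 1 only when `var_target` is smaller than the increased error power.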
In this section, $N_S$ will be kept constant and $\alpha$ will not be used at the decoding. The constant distortion condition, according to Eq. (9), imposes that

$$\alpha = \sqrt{\frac{\sigma^2_{\epsilon(x)}}{\sigma'^2_{\epsilon(x)}}}. \tag{19}$$

The decoding will be based on the new value of $a$. The impact of distortion-compensation on the robustness is more limited for W-interp than for SCS. Indeed, the robustness improvement for W-interp is a function of two factors and thus must be studied numerically. On the one hand, $\sigma^2_{\epsilon(x)}$ increases with $a$, thus with $\Delta$ (cf. Eq. (12)). On the other hand, $\sigma^2_{\epsilon(n)}$ is proportional to $(1 + \Delta)$ and also plays a role in the test performance of Section 4.1.2. The dependence between $\sigma^2_{\epsilon(x)}$ and $a$ must be studied experimentally or according to a model of the interpolation error. In W-interp, $\Delta$ (cf. Eq. (12)) plays a role similar to that of the quantization step in SCS. Distortion-compensation also benefits the security of W-interp (cf. Section 6). We now consider the scenario of an AWGN attack whose power is known to the embedder. Let $\Delta'$ denote the value of $\Delta$ corresponding to $\sigma'^2_{\epsilon(x)}$. The optimal value of $\Delta'$ is then chosen to minimize the BER computed in Section 4.1.3. This scheme will be called distortion-compensated W-interp (DC-W-interp). Fig. 9 shows the improvement brought by DC-W-interp to the theoretical performance against AWGN. The performance in Fig. 9 and the optimal interpolation step are computed numerically, according to experimental values of $\sigma^2_{\epsilon(x)}$ corresponding, as an example, to bilinear interpolation and the image Lena.

4.2.2. Other strategies
DC-W-interp can be considered as a strategy of maximization of the robustness at constant distortion. Other informed embedding strategies can optimize the parameters $N_S$, $\Delta$ and $a$ to achieve a goal at the reception. Strategies include minimizing distortion at constant detection or at constant robustness. Note that when $g$ is linear, W-interp cannot be combined with a spread transform to improve robustness as for SCS. However, this strategy could apply to non-linear interpolation techniques.

5. Robustness to attacks
Attacks on the robustness imperceptibly distort $y$ in order to prevent a correct decoding. For instance,
Fig. 9. Theoretical improvement of the decoding performance by DC-W-bilin (BER as a function of DNR), Lena, PSNR = 43 dB, $N = 2^{18}$, $L = 256$.
Fig. 10 shows, on the Lena image, an example of the impact of a JPEG compression on the distribution of the interpolation error under both hypotheses. The source of decoding errors is the intersection between the two histograms. Since there exist almost as many sophistications as possible attacks, the proposed simulations compare W-bilin and W-spline to some classical schemes in their standard implementations. The parameter $a$ is set to 1 in W-interp. In the experiments, the classical schemes DS [2], DS + W [2], LISS [4], SCS [7], ST-SCS [7] and RDM [9] have been applied in the same conditions as W-interp, in the spatial domain of natural images. Consequently, no transformed domain, no perceptual mask and no host-statistics optimal decoding have been used. The robustness of all these schemes could clearly be improved by embedding in a specific transformed domain and by using specific decoding strategies. Moreover, no standard implementation of quantization-based schemes in the spatial domain of natural images has been defined. Thus, the basic implementations initially proposed to work on synthetic data have been used. SCS and RDM have been implemented with a repetition coding strategy, binary signaling and uniformly distributed dither vectors. More sophisticated quantization-based strategies available in the literature could outperform SCS in these scenarios. Therefore, the study does not intend to derive general conclusions about the intrinsic performance of classical schemes. The compatibility of W-interp with more sophisticated embedding domains and decoding techniques is a key issue for future work. The simulations illustrate two different scenarios.
The AWGN attack scenario assumes that $\sigma^2_n$ is known at the embedding (Fig. 11). This hypothesis allows for distortion compensation in ST-SCS and LISS and optimal decoding in W-interp. In the second (more practical) scenario, the attack parameters are unknown to the embedder. The iterative decoding threshold (cf. Section 4.1.4) is then used in W-interp. Similarly, no distortion compensation is performed in SCS, ST-SCS and LISS. SCS would perform better if an appropriate distortion compensation parameter were used. The loss of performance in ST-SCS and LISS is low for reasonable values of the spreading factor $P = N/L$ and of the WNR. Indeed, the optimal distortion compensation parameter ($\alpha$ in [7] or $\lambda$ in [4]) is close to 1 when $P \cdot \mathrm{WNR}$ is large. The chosen robustness evaluation criterion is the BER. In Figs. 11 and 12, "(th)" denotes theoretical performance while "(xp)" denotes experimental performance. In Figs. 13–15, only experimental results on real data are displayed. The simulations provide the averaged performance on the image set composed of Lena, Baboon, Fishingboat, Peppers and Pentagon for 100 iterations. The feature vector length is $N = 2^{18}$. Due to the projection of the noise on a pseudorandom pattern at the decoding, DS techniques are robust at very low WNR. W-interp allows for better performance at reasonable WNR only. When severe attacks occur, the performance of LISS falls close to that of simple DS. W-interp provides good robustness to compression and denoising. The larger interpolant support length of W-spline compared to W-bilin does not improve the robustness to these attacks.
Fig. 10. Example of histogram of $r^2(l)$ under hypotheses $H_0$ and $H_1$: no attack (left) and after a JPEG compression (right).
Fig. 11. Robustness to AWGN (BER as a function of WNR), $L = 300$, PSNR = 43 dB.

Fig. 12. Robustness to AWGN (BER as a function of $L$), WNR = 4 dB, PSNR = 43 dB.

Fig. 13. Robustness to JPEG compression (BER as a function of the JPEG quality factor), $L = 64$, PSNR = 43 dB.

Fig. 14. Robustness to denoising (WNR = 14 dB), PSNR = 43 dB.

Fig. 15. Robustness to histogram equalization (BER as a function of $L$; legend: W-bilin, DS, DS + W, SCS, RDM, LISS), PSNR = 43 dB.
5.1. Robustness to noise attacks
Classical noise attacks include AWGN, JPEG compression and denoising.

5.1.1. AWGN attack
According to the study of $\Delta$, W-bilin is more robust than W-spline to the AWGN attack. When WNR and/or $L$ are low, DS + W is the most robust (cf. Fig. 11). Fig. 12 shows that, thanks to its host-interference rejecting property, W-interp outperforms DS, DS + W and LISS for a reasonable WNR and a high $L$. However, SCS, which is also based on repetition coding, is by far the most robust technique to the AWGN attack. In Fig. 12, low BER values are obtained theoretically. Note that for low BER values, the theoretical W-interp decoding performance might not be very accurate due to the Gaussian approximation for $\epsilon(z)$.

5.1.2. JPEG compression
Fig. 13 displays the robustness to a JPEG compression as a function of the quality factor. In the experimental conditions, DS + W provides a $\mathrm{BER} < 10^{-4}$, not shown in Fig. 13. According to these experiments, W-interp is more robust than the four remaining techniques at low quality factors.

5.1.3. Denoising
Denoising algorithms such as Wiener filtering aim at estimating the watermark as an additive Gaussian noise independent of the document, and at removing it. The assumption is valid for DS and DS + W methods. LISS and ST-SCS marks are slightly correlated to the document. Since W-interp marks are strongly correlated to the document, the assumption is not valid for W-interp. Thus, W-interp is very robust to denoising (cf. Fig. 14).

5.2. Robustness to valumetric attacks
Valumetric attacks refer to intensity scale modifications. The simplest example is the gain attack: $z = \rho y$, $\rho \in \mathbb{R}$. More powerful attacks include local scale factors or non-linear operations. Histogram equalization combines these features. SCS and ST-SCS are vulnerable to valumetric attacks. Thus, RDM has been designed under the constraint of robustness to a gain attack. This scheme uses an embedding rule similar to (6), with a locally adaptive quantization step $\Delta h(\mathbf{x}_k)$. Here, $\mathbf{x}_k$ is a vector of $N_v$ samples preceding $x_k$, and $h$ is a homogeneous function: $h(\rho \mathbf{y}_k) = \rho h(\mathbf{y}_k)$. In the simulations, $h$ is the Minkowski distance and $N_v = 4$. The function $h$ could be close to the interpolation function $g$ used in W-interp, which is also homogeneous. However, in order to ensure errorless decoding in the absence of attacks, the embedding in RDM is causal. This is not the case in W-interp, where the interpolation-based psychovisual mask is inherently non-causal and 2D (for images). Obviously, RDM is invariant to the gain attack, which is not totally the case for W-interp. However, Fig. 15 shows that W-interp is also very robust to a non-linear valumetric attack, histogram equalization, as are LISS and DS + W.
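The homogeneity property on which RDM relies (Section 5.2) can be checked numerically. The sketch below only illustrates why a step proportional to $h$ of the preceding samples makes the normalized sample insensitive to a positive gain; it is not the RDM embedding rule itself. The order $p = 2$ of the Minkowski norm and the sample values are assumptions of this example ($N_v = 4$ follows the simulations described in the text), and the function names are hypothetical.

```python
def minkowski_norm(samples, p=2.0):
    """Minkowski (l_p) norm, a homogeneous function: h(rho * v) = rho * h(v)
    for any positive gain rho."""
    return sum(abs(s) ** p for s in samples) ** (1.0 / p)


def normalized_sample(y, k, n_v=4, p=2.0):
    """Sample y[k] divided by h applied to the n_v samples preceding it."""
    return y[k] / minkowski_norm(y[k - n_v:k], p)


if __name__ == "__main__":
    y = [52.0, 55.0, 61.0, 58.0, 60.0]
    rho = 1.7                              # gain attack z = rho * y
    z = [rho * v for v in y]
    # The norm scales with the gain, so the normalized sample is unchanged.
    print(minkowski_norm(z[:4]), rho * minkowski_norm(y[:4]))
    print(normalized_sample(z, 4), normalized_sample(y, 4))
```

Because the gain cancels in the ratio, a quantizer applied to the normalized sample sees the same value before and after the gain attack, which is the invariance referred to in the text.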
5.3. Robustness to geometrical attacks
Simple geometrical transforms such as scaling, translation or rotation lead to a desynchronization between the watermark and the detector. This is difficult to cope with for most watermarking schemes. More complex attacks such as the Stirmark attack [31] consist of local geometrical transforms. Even after resynchronization, generally based on autocorrelation-based techniques [12], decoding errors are introduced by interpolation. It is difficult here to distinguish between the two kinds of interpolation caused by embedding and attack. Moreover, low values of WNR can be observed. Due to the perceptual properties of interpolation, this noise is hardly perceptible. In the case of W-interp, the interpolation caused by attack and resynchronization interferes with the embedding interpolation, leading to low WNR values. As a consequence, W-interp is less robust to resynchronized geometrical attacks than DS + W, for instance. Moreover, content-based watermarking and invariant embedding domain synchronization techniques are not compatible with the basic version of W-interp. Image normalization [32] could be used, but it also involves several interpolation steps. Thus, specific resynchronization techniques should be investigated. In the specific case of upscaling followed by a noise addition attack, the improved detection technique of [19] could be easily adapted to W-interp in order to benefit from the additional information provided by interpolation during the upscaling step. The same polyphase components linear combination and optimization procedure as in [19] can be used. The W-interp detector must then take into account the new noise power.

6. Security
Attacks on the security aim at uncovering or estimating the secret key $K$ from several observations of data watermarked with $K$.

6.1. Embedding model and security level
The security level of an algorithm is the minimum number of observations required to estimate $K$ with a sufficient accuracy. Tools based on information theory have been recently developed to assess the theoretical security level of watermarking schemes [24,33]. Practical algorithms implement attacks on the security for classical spread spectrum [33] and quantization-based schemes [24]. An embedding algorithm makes a perfect covering if the mutual information $I(y; w) = 0$, i.e. the observation of $y$ does not reveal any information about $w$ [33].
Due to the specificity of W-interp, it is difficult to conduct theoretical derivations similar to [24,33]. Thus, we propose in the following an ad hoc study of the security. W-interp does not make a perfect watermark covering, since it modifies the distribution of an interpolation error. The embedding model can be reformulated as follows:

$$y_k = x_k + m_l \left( \sum_{i=-\infty}^{+\infty} f(d'_k + i)\, x_{k+(2i-1)} - x_k \right) + n_k. \tag{20}$$

The noise $n|_{S_l}$ is introduced for instance by an inherent quantization in 8-bit images. It can also be voluntarily introduced by the embedder, to improve the security level. Eq. (20) shows that estimating each non-null $f(d'_k + i)$ amounts to solving a noisy linear system with $N_v$ unknowns. Moreover, other latent variables are unknown: $S_l$ and $m_l$, $l = 1, \ldots, L$. The pirate would need at least $N_o = N_v$ observations to estimate $d$ if $\sigma^2_n = 0$ and $m_l = 1$ for all observations. In practice, the simultaneous estimation of all secret parameters is more complicated. However, the security level of W-interp remains lower than that of DS techniques and dither modulation quantization-based schemes. In quantization-based techniques, the dither signal $d$ can be estimated using (6) from several observations. Thus, strong distortion-compensation (cf. Section 4.2) is essential to the security. In this case, $y_k$ is randomly distributed around the codeword, since $x_k$ is unknown to the pirate [24]. A similar improvement is offered by distortion-compensation in W-interp. In DC-W-interp, a noise $n$ is added to the interpolation codeword, since $x_k$ is unknown to the pirate if $m_l = 1$: $n_k = m_l (\alpha - 1)(\tilde{x}_k - x_k)$. However, similarly to DC-QIM, there is a compromise between security and robustness. Indeed, a high value of $a$ affects the decoding performance if it is not justified by an AWGN attack.

6.2. Security attack on W-interp
In this section, security to the watermarked only attack (WOA) is studied: the attacker has only access to $N_o$ documents $\{y^{n_o}\}_{n_o \in \{1, \ldots, N_o\}}$ watermarked with the same key $K$. Two security attack algorithms specifically tailored to W-interp are proposed, as well as simulations assessing its security level.

6.2.1. Proposed approach
In the context of digital forensics, [34] proposed to expose tampering by detecting interpolation traces. Indeed, most digital cameras use specific algorithms for missing sample interpolation. In the
tampered areas of the document, this form of interpolation might disappear. Thus, one can detect and even locate digital forgeries. An expectation-maximization (EM) [35] algorithm is used to estimate simultaneously the interpolated pixels and the specific interpolation weights. In the context of this paper, this EM algorithm has been adapted to estimate $K = \{S_1, \ldots, S_L, d\}$. The first security attack consists of Popescu and Farid's EM algorithm, applied to $y$ ($N_o = 1$). The EM algorithm provides an estimation of the weights, as well as the map of the probability for each sample to be marked. The algorithm also estimates the variance $\sigma^2_{EM}$ of the estimated interpolation error on the estimated watermarked samples. When the algorithm converges to the correct interpolant, $\sigma^2_{EM} \to 0$ since the interpolation error is null on the marked samples. For a given DWR, $S$ is estimated as the coordinates of the $N_S/2$ greatest values of the probability map. In the second attack, $N_o > 1$. For each $k$, the EM algorithm is applied to the collection $\{y_k^{n_o}\}_{n_o \in \{1, \ldots, N_o\}}$ and their neighborhoods. On $S$, the prior probability that a sample is marked is $1/2$. The estimation of $S$ consists of the coordinates where the EM variance among marked samples is the lowest, since in this case the algorithm converges if $N_o$ is sufficiently large. At the output, an estimation of $d_k$ is provided. The probability map then allows $m$ to be decoded. Finally, $S_1, \ldots, S_L$ could be estimated thanks to source separation algorithms [33]. Note that the attack is suboptimal, since the estimation of $m$ would be aided by a simultaneous estimation of $S_1, \ldots, S_L$. W-interp is more vulnerable to the known message attack (KMA) and the known original attack (KOA), for which the proposed algorithms can be simplified. For KOA, the probability of $S_l$ not being exposed is $1/2^{N_o}$. If $S$ is known, estimation of $d$
simply amounts to solving a noisy linear system. For KMA, this linear system is combined with the $S_l$ decision strategy of the second attack, while no EM algorithm is necessary for estimating $m$.

6.2.2. Practical results
Simulations have been performed on W-bilin for simplicity. Extensions to W-spline (approximated to a finite interpolant support) and to other implementations of W-interp are possible. The prior probability of a pixel to be in $S$ is derived from the DWR. In a practical scenario, the pirate may not have access to the DWR and the estimation of $k$ might be more difficult. The test image set is composed of 47 images extracted from [25]. Images are tiled to reach a large $N_o$. Note that intercorrelation between neighboring pixels already exists in non-watermarked documents. This affects the efficiency of the EM algorithm. In [34], good performance is achieved since the probability $p_{H_1}$ of a pixel to be interpolated is $1/2$. For W-interp, $p_{H_1}$ can be much lower, depending on the DWR, which increases security. Fig. 16 displays the probability of each pixel to be watermarked, when the algorithm converges ($d$ constant). Dark pixels correspond to high probabilities. When $z$ is not watermarked, the least probable points correspond to edges, where the interpolation error is the largest. When $z$ is watermarked with a very low DWR, the most probable points are spread over the whole document: they correspond to watermarked pixels. If $N_o = 1$, the attack is successful only if $d$ is null or constant for all watermarked pixels and only for a low PSNR (cf. Fig. 17). Up to 80% of points in $S$ can be estimated. If PSNR > 39 dB, the intrinsic properties of $x$ prevent the convergence of the algorithm. If $d$ is randomly generated for each $k \in S$, the attack fails for any PSNR, which confirms that
W-interp is secure against this specific attack when a single document is known. Up to 25% of points in $S$ can be estimated at PSNR = 37 dB, which is only slightly better than a random choice since the ratio of watermarked points ($N_S/2N$) is high at low PSNR. Thus, $d$ guarantees W-interp security when $N_o = 1$. If $N_o > 1$ and $d$ is variable, $d$ can be uncovered only when $N_o$ is very large (cf. Fig. 18). However, $\sigma^2_{EM}$ decreases rapidly on $S$ when $N_o$ increases, and a rough estimation of $S$ becomes possible. If the attacker has access to fewer than $N_o = 10^3$ documents, W-interp is secure against this attack. Note that such an $N_o$ can be reached in video watermarking, while it is a reasonable security level for document watermarking. A key-informed attack on the robustness could also be performed with a rough estimation of $K$.

Fig. 16. EM probability map, Lena, constant $d$: (left) not watermarked, (right) watermarked, PSNR = 37 dB.

Fig. 17. Attack on security, $N_o = 1$ (shifts estimation error and percentage of correct points as functions of the PSNR; legend: constant shift, no shift, variable shift).

Fig. 18. Attack on security, $N_o > 1$, PSNR = 43 dB (shifts estimation error and detection error rate on $S$ as functions of $N_o$).

7. Conclusion
This paper has proposed an interpolation-based generic watermarking scheme W-interp. W-interp is a blind informed watermarking algorithm which can be related to random binning schemes. Interpolation is used to construct a document-dependent bin. Informed embedding strategies have been proposed. In particular, a distortion-compensation technique improves the robustness. The security of W-interp is based on the secrecy of the embedding sites and random parameters of the interpolation algorithm. For linear interpolation functions, the security level is weaker than that of classical binning schemes. However, the security can be improved by distortion-compensation techniques and by the choice of non-linear interpolation
functions. W-interp applies to low-rate scenarios since redundancy is required at the decoding. The maximum achievable rate is imposed by the transmission of information about the host-dependent codes. Since the bins of W-interp are overlapping, W-interp is less robust to AWGN than classical binning schemes. Like most classical watermarking schemes, W-interp is also fragile to desynchronizing attacks and should be combined with specific counter-measures. However, W-interp possesses two main advantages. Firstly, it includes an inherent perceptual mask. Secondly, experiments on images have shown a good robustness to various attacks, including histogram equalization. This study is valid for each implementation of the generic scheme W-interp. Experiments have focused on the application to natural images. Two particular cases have been studied: W-bilin, based on bilinear interpolation, and W-spline, based on bicubic B-spline interpolation. The comparison shows a better imperceptibility for W-spline, but a better robustness of W-bilin to AWGN and JPEG compression attacks. Objective quality metrics have confirmed the perceptual properties of W-interp. However, a subjective evaluation might reveal visual artifacts. An accurate subjective evaluation would be useful to select the most suitable interpolation techniques for the watermarking application. For instance, edge-preserving interpolation could be used. Among other interpolation techniques, non-linear interpolation techniques or scattered sampling grids would introduce major changes in the properties of W-interp. Finally, the implementation has been conducted in the spatial domain for perceptual reasons. Classical watermarking schemes are more robust to some attacks in transformed domains. It would be interesting to study W-interp in such transformed domains in order to address its lack of robustness to some attacks.
In particular, the possibility to implement the detection stage in a transformed domain could be investigated.
References
[1] J. Smith, B. Comiskey, Modulation and information hiding in images, Information Hiding Workshop, 1996, pp. 207–226.
[2] J. Hernández, F. Pérez-González, Statistical analysis of watermarking schemes for copyright protection of images, IEEE Proc. (Special Issue on Identification and Protection of Multimedia Information) 87 (7) (1999) 1142–1166.
[3] M. Miller, I. Cox, J. Bloom, Informed embedding: exploiting image and detector information during watermark insertion, in: IEEE International Conference on Image Processing (ICIP), vol. 3, 2000, pp. 1–4.
[4] H. Malvar, D. Florêncio, Improved spread spectrum: a new modulation technique for robust watermarking, IEEE Trans. Signal Process. 51 (4) (2003) 898–905.
[5] P. Moulin, R. Koetter, Data-hiding codes, Proc. IEEE 93 (12) (2005) 2083–2127.
[6] B. Chen, G. Wornell, Quantization index modulation: a class of provably good methods for digital watermarking and information embedding, IEEE Trans. Inform. Theory (2001) 1423–1443.
[7] J. Eggers, R. Bäuml, R. Tzschoppe, B. Girod, Scalar Costa scheme for information embedding, IEEE Trans. Signal Process. 51 (4) (2003) 1003–1019.
[8] J. Oostven, T. Kalker, M. Staring, Adaptive quantization watermarking, Proc. SPIE 5306 (2004) 296–303.
[9] F. Pérez-González, C. Mosquera, M. Barni, A. Abrardo, Rational dither modulation: a high-rate data-hiding method invariant to gain attacks, IEEE Trans. Signal Process. 10 (2) (2005) 3960–3975.
[10] S. Voloshynovskiy, A. Herrigel, N. Baumgartner, T. Pun, A stochastic approach to content adaptive digital image watermarking, in: International Workshop on Information Hiding, 1999, pp. 212–236.
[11] T. Kalker, A. Janssen, Analysis of SPOMF detection, in: Proceedings of the IEEE Conference on ICIP, vol. 1, 1999, pp. 316–319.
[12] M. Alvarez-Rodríguez, F. Pérez-González, Analysis of pilot-based synchronization algorithms for watermarking of still images, Signal Processing: Image Commun. 17 (8) (2002) 611–633.
[13] K. Su, D. Kundur, D. Hatzinakos, A content-dependent spatially localized video watermark for resistance to collusion and interpolation attacks, in: Proceedings of the IEEE International Conference on Image Processing, vol. 1, 2001, pp. 818–821.
[14] CheckMark, <http://watermarking.unige.ch/Checkmark/>.
[15] P. Bas, J.-M. Chassery, B. Macq, Robust watermarking based on the warping of pre-defined triangular patterns, Proceedings of the SPIE Electrical Imaging, Security and Watermarking of Multimedia Content II 18 (2000) 99–109.
[16] J.J.K. Ó Ruanaith, T. Pun, Rotation, scale and translation invariant spread spectrum digital image watermarking, Signal Processing 66 (3) (1998) 303–317.
[17] G. Boato, F.D. Natale, C. Fontanari, F. Melgani, Hierarchical ownership and deterministic watermarking of digital images via polynomial interpolation, Signal Processing: Image Commun. 21 (7) (2006) 573–585.
[18] R. Ohbuchi, H. Masuda, M. Aono, A shape-preserving data embedding algorithm for NURBS curves and surfaces, in: Proceedings of the Computer Graphics International (CGI), 1999, pp. 170–177.
[19] A. Giannoula, N. Boulgouris, D. Hatzinakos, K. Plataniotis, Watermark detection for noisy interpolated images, IEEE Trans. Circuits Systems 53 (5) (2006) 359–363.
[20] P. Thévenaz, T. Blu, M. Unser, Image interpolation and resampling, in: I. Bankman (Ed.), Handbook of Medical Imaging, Processing and Analysis, Academic Press, San Diego, USA, 2000, pp. 393–420 (Chapter 25).
[21] M. Unser, Splines: a perfect fit for signal and image processing, IEEE Signal Process. Magazine 16 (6) (1999) 22–38.
[22] V. Martin, M. Chabert, B. Lacaze, A novel watermarking scheme based on interpolation for digital images, in: Proceeding of ICASSP, vol. 2, 2006, pp. 217–220.
[23] V. Martin, M. Chabert, B. Lacaze, Substitutive watermarking algorithms based on interpolation, in: Proceedings of EUSIPCO, 2006.
[24] L. Pérez-Freire, P. Comesaña, J. Troncoso-Pastoriza, F. Pérez-González, Watermarking security: a survey, Lecture Notes in Computer Science, Transactions on Data Hiding and Multimedia Security, vol. 4300, 2006, pp. 41–72.
[25] City University of Hong Kong Corel Image Database, <http://abacus.ee.cityu.edu.hk/benjiman/corel_1/>.
[26] E. Marini, F. Autrusseau, P.L. Callet, P. Campisi, Evaluation of standard watermarking techniques, in: SPIE Electronic Imaging, Security, Steganography, and Watermarking of Multimedia Contents IX, 2007.
[27] A. Watson, Visually optimal DCT quantization matrices for individual images, in: Data Compression Conference, 1993, pp. 178–187.
[28] X. Li, M. Orchard, New edge directed interpolation, IEEE Trans. Image Process. 10 (10) (2001) 1521–1527.
[29] H.V. Trees, Detection Estimation and Modulation Theory, Wiley, New York, 1968.
[30] L. Pérez-Freire, P. Comesaña, F. Pérez-González, Detection in quantization-based watermarking: performance and security issues, SPIE, 2005.
[31] Stirmark, <http://www.petitcolas.net/fabien/watermarking/stirmark/>.
[32] P. Dong, J. Brankov, N. Galatsanos, Y. Yang, F. Davoine, Digital watermarking robust to geometric distortions, IEEE Trans. Image Process. 14 (12) (2005) 2140–2150.
[33] F. Cayre, C. Fontaine, T. Furon, Watermarking security: theory and practice, IEEE Trans. Signal Process. (Special Issue on Content Protection) 53 (10) (2005) 3976–3987.
[34] A. Popescu, H. Farid, Exposing digital forgeries in color filter array interpolated images, IEEE Trans. Signal Process. 53 (10) (2005) 3948–3959.
[35] A. Dempster, N. Laird, D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc. 99 (1) (1977) 1–38.