Neurocomputing 337 (2019) 191–202
Finding robust domain from attacks: A learning framework for blind watermarking

Seung-Min Mun a, Seung-Hun Nam a, Haneol Jang b, Dongkyu Kim a,∗, Heung-Kyu Lee a,∗

a School of Computing, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, South Korea
b The Affiliated Institute of ETRI, Daejeon, South Korea
∗ Corresponding authors. E-mail addresses: [email protected] (S.-M. Mun), [email protected] (S.-H. Nam), [email protected] (H. Jang), [email protected] (D. Kim), [email protected] (H.-K. Lee).
https://doi.org/10.1016/j.neucom.2019.01.067

Article history: Received 27 December 2017; Revised 21 November 2018; Accepted 24 January 2019; Available online 1 February 2019. Communicated by Jungong Han.

Keywords: Digital watermarking; Color image watermarking; Blind watermarking; Convolutional neural network (CNN)

Abstract

In recent years, researchers have investigated whether robustness and blindness can be simultaneously secured in watermarking based on machine learning. However, achieving robustness against various attacks at once remains difficult for watermarking techniques. To address this problem, we propose a learning framework for robust and blind watermarking based on reinforcement learning. We repeat three stages: watermark embedding, attack simulation, and weight updating. Specifically, we present image watermarking networks called WMNet using convolutional neural networks (CNNs). Two methods to embed a watermark are proposed, based on backpropagation and on an autoencoder, respectively. We optimize the robustness while carefully considering the invisibility of the watermarking system. The experimental results show that the trained WMNet captures more robust features than current watermarking schemes that use the frequency domain. The trade-off between the robustness and the invisibility of each technique was measured. We also adopt visual masking, with which we can achieve an appropriate balance between robustness and invisibility of the watermark. Our reinforcement-learning-based technique is more robust than existing techniques against both attacks seen in learning and unseen attacks. Moreover, due to the generalization ability of WMNet, it shows high robustness against multiple attacks and against levels of attacks that are not considered in the training stage. © 2019 Elsevier B.V. All rights reserved.

1. Introduction

Watermarking is used to identify and protect the ownership of copyrighted media content by embedding invisible data into the original content. The most challenging requirement for watermarking schemes is robustness: the detector should be able to properly extract the watermark even if the content is distorted. Another requirement is invisibility: the watermark embedding should not excessively damage the visual quality.

Over the last decade, most watermarking techniques have acquired robustness using the frequency domain. Generally, the first step is to use a transformation such as the discrete cosine transform (DCT), discrete wavelet transform (DWT), or discrete Fourier transform (DFT) to create a set of values that correspond to the original pixel data. Then, some values in the transformed domain are modified and turned back into pixel data by inverse transformation [1–3]. In recent years, the quaternion discrete Fourier transform (QDFT) has been the best approach for blind watermarking of color images, as shown in [4–6]. These techniques are robust against signal-processing attacks and geometric attacks. For geometric attacks, robustness is secured with the help of templates. However, since these templates are limited to compensating for rotation, scaling, and translation (RST) attacks, they are vulnerable to general affine transformations.

The coefficients of the frequency domain are signals widely used for watermarking, but they are obtained from a simple linear transform. Therefore, we cannot be sure that these domains are simultaneously robust against various signal-processing attacks. For example, our experiments show that the QDFT is vulnerable to salt-and-pepper noise and gamma correction, and the DCT is weak against Gaussian filtering and median filtering. Other special embedding domains have been proposed for robustness against geometric attacks [7–11]. One approach uses feature points that are invariant to geometric attacks [7,8,10,11]. A histogram-based technique has also been proposed so as not to be affected by pixel coordinates [9]. However, the watermark capacity of these techniques is limited (< 255).


Over the past years, there have been some studies on watermarking using neural networks (NNs). Most involve non-blind techniques [12,13]. Other methods build on existing frequency-domain watermarking algorithms such as spread spectrum (SS) and adjust the embedding intensity of each sub-block using NNs; the optimization target of these techniques is invisibility, and robustness is not considered in learning [14–17]. In summary, most of these techniques are non-blind, which is less practical, and the blind ones ultimately remain frequency-domain-based watermarking. In addition, deeper neural networks guarantee higher performance in various tasks such as image segmentation, detection, and classification [18–21], but the previous techniques used notably shallow networks.

The first CNN-based watermarking technique was proposed in [22]. This technique first trains the network using general deep learning and subsequently uses it for watermarking. For detection, an image block to be inspected is input, and two different blocks are generated by two autoencoders; through the Euclidean distance between the suspect block and each generated block, the autoencoder used at embedding, and hence the message, is determined. Unlike [22], our learning process is specialized for watermarking and acquires robustness in response to attacks. Although the robustness of [22] is excellent, it is a non-blind scheme, so it lacks practicality in the real world. For example, when there is an image to check for a watermark, the non-blind technique first searches the database for the autoencoder that corresponds to this image. How to identify the suspect image within a prepared image set is then another problem, and this process may be more difficult than the watermark detection itself.

"Blind watermarking" indicates that, when we detect a message, we do not require any other stored information that depends on the suspect image. Conventional machine-learning-based watermarking techniques do not aim to learn a watermarking domain; they focus on a few specific individual images instead of general watermark patterns. In particular, the same images are used in training the neural network as well as in detection. Therefore, when the previous techniques detect a message after an attack, their networks already know information related to the suspect image. For our technique to be a blind watermarking technique, the following condition must be satisfied: the detector must not be trained on the suspect image in advance, and during embedding its weights must be fixed. Note that the detector can be used in the embedding, but its weights or parameters must not be changed.

Our goal is to achieve optimized robustness while satisfying the above condition, i.e., without losing blindness. Since, in real scenarios, the attacks are not known to the embedder or the detector, a reinforcement learning framework is borrowed to find a robust domain from the attacks. As a result, the framework in Fig. 1 was derived to find a watermarking domain that resists those attacks. In other words, we find, by machine learning, a set of cover signals that can hold messages. The framework requires no human labeling: we automatically obtain the watermarking embedder and detector using uncompressed natural images and only eight types of attacks. Signal-processing attacks and geometric attacks were experimentally tested, and a comparison with the QDFT-based watermarking paper [6] was performed. The experimental results show that our learning-based watermarking scheme can surpass the existing frequency domains for watermarking. Again, note that our technique blindly extracts the message: after training completes, we no longer modify the weights of WMNet, so the weights are independent of the test images. The image sets for training and testing are strictly separate. The training and testing processes are described in Sections 2 and 3.2, respectively.

Fig. 1. (a) Framework of general reinforcement learning. (b) Framework of the training for WMNet. The embedded watermark is shown as visible for better understanding. This iterative process can create a robust domain based on the given attacks, such as JPEG compression, resizing, and noise addition. Our framework can adaptively re-acquire robustness.
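To make the loop of Fig. 1(b) concrete, the following Python sketch shows one possible shape of a training round; embed, attack functions, and update_detector are hypothetical placeholders for the three stages detailed in Sections 2–4, not the implementation used in our experiments.

import random

def train_wmnet(wmnet, cover_images, messages, attacks, n_rounds):
    # Hypothetical sketch of the iterative framework in Fig. 1(b).
    # wmnet:   network with an embed() step and a trainable detector
    # attacks: list of attack functions (e.g., JPEG, resizing, noise addition)
    for _ in range(n_rounds):
        for cover, message in zip(cover_images, messages):
            watermarked = wmnet.embed(cover, message)       # stage 1: embedding
            attacked = random.choice(attacks)(watermarked)  # stage 2: attack simulation
            wmnet.update_detector(attacked, message)        # stage 3: weight updating
    return wmnet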

The contributions of this paper are summarized as follows:

• To the best of our knowledge, this paper is the first attempt to design a deep-learning-based blind watermarking technique. The proposed technique finds a watermarking domain with a deep neural network instead of the frequency domain. As a result, more robust features that are customized for watermarking are extracted, as shown in Fig. 5.
• We have designed a domain that adapts to the given set of attacks. Most watermarking studies acquire robustness by repeating attack experiments and adjusting parameters by hand; in this paper, this process is automated and expressed as a reinforcement learning framework.
• We propose an embedding method that can freely control the invisibility of the watermark when using an embedder network. We repetitively modify an image little by little to ensure perceptual invisibility; for each embedding, we apply visual masking while adjusting the embedding strength of the message. As a result, the invisibility can be changed even though we use a fixed-weight CNN. More details are given in Section 4.2.

A summary of the proposed learning framework is given in Section 2. The two methods to embed a watermark are proposed in Sections 3 and 4, respectively. In Sections 5 and 6, we evaluate the performance of the proposed method. In the last section, we conclude our work.

2. Main concept of the proposed learning framework

As shown in Fig. 1, the training process of WMNet improves performance through repetition, following the framework of general reinforcement learning. The goal of this iterative learning is to train our WMNet as an optimized watermark embedder and detector. Training comprises three stages, beginning from a CNN with random weights. These three stages are repeated until the message embedded by the network (or embedder CNN) is correctly identified after the attacks. The weights of WMNet are changed to minimize our loss function L during training. Then, these CNNs are used in watermarking techniques as described in Section 3.2. In this section, we provide the concept of each stage.

The first stage is watermark embedding, the process of inserting a message into the cover image using the embedder network. This stage accepts images of size M × N and generates watermarked images; WMNet and a binary message (or watermark) image are required. Here, we incrementally change the pixels using the detector network, repeating this image modification until the detector correctly detects the message in the image. There are two methods to achieve this goal: (1) embed the message using only the detector network: with back-propagation, we can add noise to the image that causes the detector network to output the desired message value (details in Section 3.1); (2) use an autoencoder designed solely for embedding: the embedder and detector are connected as if they were the generator and discriminator of a generative adversarial network (GAN), as shown in Fig. 3 (details in Section 4.1).

The second stage is attack simulation, which is necessary for the CNN to adaptively capture features robust to various attacks. This stage accepts the watermarked images and message images and produces the attacked images. After the message image is attacked, it is used as a label in the updating stage. A simulated attack set can include any type of attack; for each image and message, as many attacked images are created as there are attacks.

The third stage is updating. We update the weights of the detector to correctly extract the message from a given image. The detector from the embedding stage and the attacked images are required. Stochastic gradient descent (SGD) is used to update the weights so that the detector identifies the message from an attacked image. Although our entire framework is a form of reinforcement learning, we supervise the learning in this stage. At the end of this stage, all created watermarked and attacked images are discarded, and the process returns to the embedding stage.

3. Backpropagation-based watermarking using a single network

This section explains how to embed a watermark using only a single network (i.e., the detector network). Similar to previous works that guarantee high capacity in the existing frequency domain [4–6], our technique embeds and detects the watermark on non-overlapping sub-blocks. We divide the image into sub-blocks and detect the message in each sub-block.

3.1. Construction of the detector

We use a network of convolutional layers and rectified linear units (ReLUs) as presented in [18–20]. The convolutional layer is a linear operation that accepts a 3D volume and produces a convolved 3D volume; each kernel of the convolution consists of variable parameters called weights. A ReLU layer is a non-linear operator that applies the simple function f : R → [0, ∞), f(x) = max(0, x) to every element x of a given 3D volume. WMNet alternates between convolutional layers and ReLU layers and takes a pixel block B of size R × C as input to determine the message bit m ∈ {0, 1}.
From the two outputs h_0 and h_1 of the last convolutional layer, we apply the following softmax function:

p(y = m \mid W_D, B) = \frac{e^{h_m}}{e^{h_0} + e^{h_1}},   (1)

where W_D denotes the weights of the detector network; y is a random variable for the output message; and p(y = m \mid W_D, B), abbreviated p(m \mid W_D, B), indicates the probability that message bit m is embedded into B. If we define the derivative of ReLU at 0, both convolution and ReLU are differentiable functions. Therefore, p(m \mid W_D, B) is differentiable with respect to any weight in W_D and with respect to the pixel block B.

Fig. 2. Examples of our (a) watermark embedding and (b) weight-updating stage. In the embedding stage (a), unlike common SGD, we fix the weights and repeatedly modify the image blocks. We put an image block into the network and modify the block so that the output of the network matches the message. When this process is completed for all blocks, we obtain the watermarked image into which the message is inserted. In (b), we apply SGD in the usual manner: the network accepts image blocks as inputs and modifies its weights so that the output matches the message. As this process is repeated, a more accurate detector is obtained. WMNet includes residual mappings [20], but the architecture is simplified in this figure.

Stage 1: Watermark embedding using the CNN

First, the cover images are divided into non-overlapping blocks of size R × C. The message bit m ∈ {0, 1} of each pixel block B is determined by its coordinates, as shown in Fig. 2: if the pixel block belongs to the ith row and jth column of the cover image, the message bit is taken from the ith row and jth column of the message image. We intend to incrementally change the pixel blocks so that p(m \mid W_D, B) approaches 1. Our modification of the cover image is inspired by the stochastic gradient descent (SGD) method [23]:

B^{(t+1)} = B^{(t)} - \alpha \nabla_B L(W_D, B^{(t)}, m),   (2)

where B^{(t)} is the pixel block at embedding iteration t; α is the embedding factor, a hyper-parameter that stabilizes the convergence; and L is the loss function defined as

L(W_D, B, m) = -\log p(m \mid W_D, B) + \frac{\lambda}{2} \lVert B - B^{(0)} \rVert^{2},   (3)

where B^{(0)} is the initial pixel block, i.e., the raw pixel block of the cover image. The first term on the right-hand side is a cross-entropy loss; we introduce a regularization factor λ and an l2 regularization term ‖·‖ to ensure the invisibility. Empirically, we have confirmed that if α is sufficiently small, the loss gradually decreases even if the weights are initialized to random values at the beginning of learning. A loss close to 0 indicates that message bit m is properly embedded into pixel block B. At this point, the blocks are combined to produce a watermarked image. The weights W_D do not change at this stage. We can detect the message bit in each block by computing arg max_m p(m \mid W_D, B) in Eq. (1).
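As a minimal sketch of Eqs. (2) and (3), the embedding step can be written with a generic autograd library; PyTorch is used here for illustration, the detector is assumed to map one pixel block to the two logits (h_0, h_1), and the default hyper-parameter values are placeholders, not those of our experiments.

import torch
import torch.nn.functional as F

def embed_block(detector, block0, m, alpha=0.01, lam=1e-3, steps=8):
    # Embed bit m into block0 by descending on the block itself (Eq. (2));
    # the detector weights W_D stay fixed throughout.
    block = block0.clone().requires_grad_(True)
    target = torch.tensor([m])
    for _ in range(steps):
        logits = detector(block.unsqueeze(0))              # two logits (h0, h1)
        loss = F.cross_entropy(logits, target) \
               + 0.5 * lam * (block - block0).pow(2).sum() # Eq. (3)
        grad, = torch.autograd.grad(loss, block)
        with torch.no_grad():
            block -= alpha * grad                          # Eq. (2)
    return block.detach()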


Stage 2: Attack simulation

The message images are upsampled to size M × N and attacked with the same type and parameters as applied to the cover images; the message images are then downsampled back to their original size. As a result, even if a geometric attack such as a rotation occurs, each block can be correctly labeled, as shown in Fig. 2(b). Both the watermarked cover images and the message images are attacked. If we skipped this calibration of the message images, the detector would be notably vulnerable to geometric transformations, like the QDFT technique, as shown in Table 1.

Stage 3: CNN weight updating

This stage begins by resizing the attacked images to size M × N. As before, each image is divided into non-overlapping blocks of size R × C. Again, the message bit m of each distorted block B′ is determined by its coordinates, as shown in Fig. 2: if the average value of the corresponding message image block is greater than 0.5, we set m to 1; otherwise, we set it to 0. When updating is completed by repeating the following equation, the detector can successfully predict m from the input B′:

W_D^{(t+1)} = W_D^{(t)} - \eta \nabla_{W_D} L(W_D^{(t)}, B', m),   (4)

where W_D^{(t)} is the weight at iteration t, and η is the learning rate, a hyperparameter. The loss function L is identical to that of the embedding stage. A lower loss indicates that the CNN can more accurately extract message bits even under the attacks; in other words, we secure robustness against various types of attacks.

3.2. Use of the trained network

If the weight parameters of the detector network sufficiently converge through the above learning framework, the network can be used as an embedder and detector for blind watermarking in an actual application, as described in this subsection. The method of using the detector for embedding is described in detail in Section 3.1; the embedding processes for learning and for the actual application are identical, except that the network now has the weights obtained at the end of learning. Naturally, the detector network can also extract a 1-bit message for each R × C block, as before, from an image of size M × N, i.e., it predicts the message from each block of the attacked image. If the probability p(m = 0 \mid W_D, B) predicted by the detector is larger than 0.5, the message of the block is designated as 0; otherwise, it is designated as 1. As a result, a message image of size M/R × N/C is extracted from one image.

4. Improved embedding method using an autoencoder

The embedding method using only one detector requires a back-propagation operation to add a gradient, as shown in Section 3.1. Experimentally, we have also found that adding the gradient to an image causes performance degradation if batch normalization is included in the network; we assume that the initial behavior of batch normalization interferes with the computation of the gradient. However, without dropout or batch normalization, it might be difficult to obtain a global optimum [24]. Thus, we create a new network for embedding so that these modern regularization methods can be used. In other words, we propose an embedding method that embeds a message through a feed-forward operation, without calculating a gradient. We can use two networks (i.e., WMNet), including the embedder network, instead of only the detector network. The robustness is now more reliably increased using an embedder with an autoencoder structure. This section explains the improvements over the framework of Section 3 and introduces a new network specialized for embedding. Visual masking is also introduced to flexibly control the invisibility and robustness, and we extend the receptive field without modifying the network structure to further exploit the relationship between neighboring pixels. The attack simulation and weight updating of Section 3 are applied equally here.

4.1. Embedder network of the autoencoder structure

This subsection describes a separate network designed solely for embedding. Unlike the detection network, the embedding network receives a pixel block and outputs a modified pixel block. There are many recent successful applications of autoencoders built from the essentials of deep learning [25–29]. Using these structures, a directly watermarked pixel block can be obtained without back-propagation. Therefore, when using the autoencoder, Eq. (2) can be simply modified as follows:

B^{(t+1)} = B^{(t)} + \alpha^{(t)} \, \mathrm{AE}(W_{AE}, B^{(t)}, m),   (5)

where the function AE denotes the output of the autoencoder. To calculate AE, a message image is required: the message image is simply added to the feature map reduced by the encoder, as shown in Fig. 3, and the decoder then spreads the effect of the message image over the spatial domain. Note that, unlike [25–29], the proposed autoencoder must accept not only images but also messages as inputs. This network structure is also part of our contribution, because no existing deep-learning-based watermarking technique embeds messages in this way. The loss function remains as defined in Eq. (3), and W_AE is updated similarly to Eq. (4). The structure of the autoencoder is a symmetrical extension of the detector network; W_AE and W_D are the weights of the autoencoder (i.e., the embedder network) and the detector network, respectively:

W_{AE}^{(t+1)} = W_{AE}^{(t)} - \eta \nabla_{W_{AE}} L(W_D, B', m),   (6)

where B′ is a watermarked block. Like previous techniques using autoencoders [25,27], the embedder network consists of convolutions and transposed convolutions. Because the embedder network contains batch normalization, it suppresses the phenomenon in which performance decreases as the iterations increase. In addition, as previously mentioned, to ensure blind detection, the detector network must not change during the embedding stage. In the training process (Section 2), the autoencoder for embedding optimizes its weights, but in testing (Section 3.2) we do not modify them. After learning is complete, neither the embedder nor the detector is trained further. Moreover, the detector is no longer required in the embedding stage: after learning ends, the detector needs no information about the embedder and vice versa, i.e., the embedder and detector are used independently. Therefore, this is more secure than the case where the message is embedded and detected by one detector network.

To summarize, in the back-propagation-based method, we embedded by adding the gradient obtained through back-propagation, Eq. (2); we now modify the embedding method to the feed-forward operation of Eq. (5). The detection and attack simulation are identical to the previously described case. In this process, it is possible to safely regularize the embedder and detector networks and continuously improve the performance of the system without overfitting. When the autoencoder is used, the embedding and detection times are shortened, as shown in Table 2, because the feed-forward operation is generally much faster than back-propagation.
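A minimal PyTorch sketch of this embedder is given below, with a single encoder/decoder stage standing in for the residual units of Fig. 5; the layer sizes are placeholders, and the message is assumed to be a binary (float) map at the reduced resolution.

import torch
import torch.nn as nn

class Embedder(nn.Module):
    # Simplified sketch of the autoencoder embedder of Fig. 3.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(              # halves the spatial size
            nn.Conv2d(3, 16, 3, stride=2, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU())
        self.decoder = nn.Sequential(              # doubles it back
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1))

    def forward(self, image, message):
        # The message map is added to every channel of the reduced
        # feature map (cf. Fig. 3), then spread back over the spatial domain.
        feat = self.encoder(image) + message.unsqueeze(1)
        return self.decoder(feat)

def embed_step(embedder, image, message, alpha):
    # One iteration of Eq. (5): a purely feed-forward update.
    return image + alpha * embedder(image, message)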

Fig. 3. Steps to train the autoencoder for embedding. The embedder network is indirectly trained through the detector network according to loss L; the detector weights do not change here. On the feature map reduced by the encoder, embedding is performed by simply computing the weighted sum with the message vector: the message vector is added to all channels of the reduced feature map. In our experiments, the encoder and the detector have the same structure but do not share parameters.

Fig. 4. (a) Change in peak signal-to-noise ratio (PSNR) with parameter K and (b) variance of pixel value differences before and after embedding. When K increases, the covariance between the gradients can become negative because the watermark energy tends to avoid redundancy; therefore, we obtain a lower variance than the expected value.



4.2. Visual masking and parameter analysis

In general deep learning tasks, using the l2 loss of Eq. (3) in the update of Eq. (2) is a direct approach to ensure the visual quality. However, in that case the invisibility cannot be adjusted at embedding time. Thus, we instead embed a message using only the cross-entropy term and apply classic visual masking. The following operations are equivalent to the l2 regularization term of Eq. (3); this equivalence can be deduced from the equivalence between weight decay and the l2 loss on weights [30]:

B \leftarrow B + \alpha^{(t)} \Delta B,   (7)

B \leftarrow \gamma_r (B - B^{(0)}) + B^{(0)},   (8)

where ΔB equals ∇_B log p(m \mid W_D, B) if we embed a watermark using backpropagation as in Section 3.1; otherwise, ΔB is AE(W_AE, B, m) as in Section 4.1. In both methods, the residuals generated in the embedding process up to iteration t are compressed by a ratio γ_r = 1 − λ > 0. As a result, we can freely adjust the invisibility in the embedding step using only one embedder network, without having to store multiple versions of it. The embedding factor α is decayed by a factor γ_α > 0 per iteration:

\alpha^{(t)} = \alpha^{(0)} \gamma_\alpha^{t}.   (9)

We define the invisibility ratio K = γ_α γ_r and use it as a new parameter to control the invisibility and robustness. If K is 0, then B^{(t)} = B^{(0)} because γ_α or γ_r is zero; no watermark is embedded, so the invisibility is infinite but there is no robustness. In other words, the image after embedding is identical to the original, but the message cannot be detected. Conversely, if K is close to 1, the robustness is high but the invisibility is sacrificed. Thus, we set K = γ_α γ_r to an appropriate value between 0 and 1 to balance invisibility and robustness.

Techniques such as batch normalization are widely used to normalize the intermediate results of a layer. Even when batch normalization is not used, we force the gradient to be normalized as described in Section 5. Thus, we can assume that in Eq. (2) the gradient is identically distributed, and that for all t each element of ∇_B L(W, B^{(t)}, m) follows N(0, σ²). To analyze the invisibility, we first define the difference D in a pixel block as

D = |B^{(T)} - B^{(0)}|,   (10)

the pixel value difference between the block after complete embedding and the original block. The variance is approximated as follows:

\mathrm{Var}[D] = \mathrm{Var}\left[\sum_{t=1}^{T} \gamma_r^{(T-t+1)} \alpha^{(t)} \nabla_B \log p(m \mid W, B^{(t)})\right] \approx \sum_{t=1}^{T} \mathrm{Var}\left[\gamma_r^{(T-t+1)} \alpha^{(t)} \nabla_B \log p(m \mid W, B^{(t)})\right] = (\alpha^{(0)})^{2} \sigma^{2} \sum_{t=1}^{T} \gamma_\alpha^{2t} \gamma_r^{2(T-t+1)}.   (11)

Analyzing the sum of variances is relatively straightforward compared to the covariances; thus, for maximum invisibility, we focus on the sum of variances, which should be minimized. According to Jensen's inequality,

(\alpha^{(0)})^{2} \sigma^{2} \sum_{t=1}^{T} \gamma_\alpha^{2t} \gamma_r^{2(T-t+1)} \ge (\alpha^{(0)})^{2} \sigma^{2} T \left( \gamma_\alpha^{2\sum_{t=1}^{T} t} \, \gamma_r^{2\sum_{t=1}^{T} (T-t+1)} \right)^{1/T} = (\alpha^{(0)})^{2} \sigma^{2} T K^{T+1}.   (12)

To establish equality, γ_α = γ_r must hold. Therefore, if we use only K to control the robustness and invisibility, we specify γ_α = γ_r = √K. The change in the actual PSNR value according to parameter K and the experimental verification of Eq. (12) are shown in Fig. 4.
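The per-iteration schedule of Eqs. (7)–(9) with γ_α = γ_r = √K can be sketched as follows; delta_fn stands for either the back-propagated gradient of Section 3.1 or the autoencoder output of Section 4.1, and the default values are placeholders.

import numpy as np

def masked_embedding(block0, delta_fn, K=0.5, alpha0=0.01, T=8):
    # Iterative embedding with visual masking (Eqs. (7)-(9)).
    gamma = np.sqrt(K)              # gamma_alpha = gamma_r = sqrt(K): equality in Eq. (12)
    block, alpha = block0.copy(), alpha0
    for _ in range(T):
        alpha *= gamma                              # Eq. (9): alpha^(t) = alpha^(0) gamma_alpha^t
        block = block + alpha * delta_fn(block)     # Eq. (7): add the update direction
        block = gamma * (block - block0) + block0   # Eq. (8): compress the residual
    return block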

4.3. Extended receptive field

The proposed scheme is block-based to maintain consistency with widely used frequency-domain watermarking. However, traditional block-based watermarking uses small block sizes to achieve high embedding capacity; typically, R = C = 8 is used. A space of 8 × 8 × 3 pixels can hold only a notably limited amount of information: to embed a watermark, it must store both the visual information of the original content and the information of the hidden message. Therefore, if the block size is small, the entropy is small, and messages cannot be embedded strongly enough to secure robustness. To improve this situation and obtain a wide receptive field, the structure of WMNet is adjusted so that the entire image, instead of a single block, becomes the input of the network. The receptive field is the pixel patch that influences the detector when it determines one message bit. For example, in the proposed technique, if the number of residual units is r, the receptive field of the detector is of size (2^{(r+1)} − 1) × (2^{(r+1)} − 1). The weights of the detector may be identical to those of Section 3. On average, 1 bit is still embedded per R × C block, but embedding and detection now take the surrounding pixels into account: the messages are not handled one bit at a time, and the entire message image is compared as a whole.
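For instance, the receptive-field size stated above can be computed directly; with r = 3 residual units, each message bit is decided from a 15 × 15 pixel patch.

def receptive_field_side(r):
    # Side length of the detector's receptive field for r residual
    # units, as given in Section 4.3: 2^(r+1) - 1.
    return 2 ** (r + 1) - 1

assert receptive_field_side(3) == 15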


Algorithm 1 Improved embedding for a training batch.

Watermarked_batch ← ∅
// I is a cover image.
for I ∈ Training_batch do
    // M_0 is a randomly generated binary message image.
    // To prevent overfitting, also generate M_1, the bitwise complement of M_0.
    M_1 ← 1 − M_0
    for M ∈ {M_0, M_1} do
        α ← α^(0)
        I^(0) ← I
        for t = 1 to T do
            α ← α × γ_α
            I ← I + α · AE(W_AE, I, M)
            // L is the sum of the cross-entropy losses for each message bit (see Fig. 3).
            W_AE ← W_AE − η ∇_{W_AE} L(W_D, I, M)
            // Apply visual masking.
            I ← γ_r (I − I^(0)) + I^(0)
        end for
        Watermarked_batch ← Watermarked_batch ∪ {I}
    end for
end for

The algorithm summarizing these improvements is shown in Algorithm 1. After generating a watermarked batch with this algorithm, we attack each image in the batch; then we train the detector to extract the original message from the attacked images.
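A sketch of this attack-and-update round is given below; upsample, downsample, and blocks_of are hypothetical helpers for the resizing and block-splitting steps of Sections 2 and 3.1, and the attack functions are assumed to be deterministic closures so that the cover and message images receive the same distortion.

import random
import torch.nn.functional as F

def attack_and_update(detector, optimizer, watermarked, message_img, attacks):
    # Stage 2: attack the watermarked image and, with the same parameters,
    # the upsampled message image, so that blocks stay correctly labeled
    # even under geometric attacks; then downsample back to message size.
    attack = random.choice(attacks)
    attacked = attack(watermarked)
    big_msg = upsample(message_img, watermarked.shape)       # hypothetical helper
    attacked_msg = downsample(attack(big_msg), message_img.shape)
    labels = (attacked_msg > 0.5).long().flatten()           # bit = 1 if block mean > 0.5
    # Stage 3: one supervised step of Eq. (4) on the detector weights.
    logits = detector(blocks_of(attacked))                   # one (h0, h1) pair per block
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()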

5. Implementation details

We propose a CNN model modified from part of the powerful Residual Network model [20], which is built from residual units. The proposed CNN contains more than 12 convolution layers, and the kernel size of each convolution layer is 1 × 1 or 3 × 3. A softmax layer is added at the end for classification. Generally, more layers correspond to better detection performance; however, there is a trade-off in increased feed-forward and back-propagation time. Pooling may ensure spatial invariance, but it causes performance degradation when fine patterns must be captured; hence, we did not use max-pooling or average-pooling.

The image data for training consisted of 4000 24-bit color images from the BOSSBase dataset designed for data-hiding studies [31]. Without loss of generality, we set M = N = 512 and R = C = 8. Initially, α and η were set to 0.01 and 0.0001, respectively. We used eight randomly sampled images per embedding stage to form a training batch. For each image, we used a pair of message images in which the 0s and 1s were inverted from each other, to avoid overfitting. In the attack simulation (Section 3.1), some attacks were chosen with higher probability: we chose the JPEG attack with probability 1/2, and each of the remaining seven attacks with probability 1/14. In practice, we used adaptive moment estimation (Adam) [32], an extension of the SGD method, because Adam reduces the loss better. As in Section 4.2, inspired by learning-rate annealing, we reduced α by a factor of γ_α per embedding iteration. The iteration count t for each block in the embedding stage was limited to a maximum of T = 8.

The detailed structure of the detector and embedder is shown in Fig. 5. For the autoencoder (i.e., the embedder), the number of filters in each residual unit was 4-16-32 for the encoder and 32-16-4 for the decoder. For embedding and detecting with only a detector network (i.e., the backpropagation-based method of Section 3, without the autoencoder), 4-32-64-64 filters were required.

6. Experimental results

6.1. Setup of the experiment

The network was trained for a week using an NVIDIA GTX 1070 GPU and an i7-4770k CPU; the test images and test messages were not used for training. Attack experiments were performed using the Python libraries scikit-image [33] and scipy [34].
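The attack-sampling rule described in Section 5 amounts to the following; jpeg and the seven-element list others are placeholders for the actual attack implementations.

import random

def sample_attack(jpeg, others):
    # JPEG with probability 1/2; each of the other seven attacks
    # with probability 1/2 * 1/7 = 1/14.
    if random.random() < 0.5:
        return jpeg
    return random.choice(others)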

Fig. 5. The embedder and detector structures consist of residual units. In the encoder, the size of the feature map is halved each time it passes through a residual unit; conversely, the size of the feature map produced by the decoder is doubled. In the decoder, a transposed convolution is used instead of the convolution of the standard residual units. Below the autoencoder, we visualize which feature map is computed each time a residual unit is passed. For backpropagation-based embedding, we use the detector only; for autoencoder-based embedding, both the embedder and the detector are used. A residual unit as proposed by [20] is shown at the bottom right; here, conv1 and conv3 denote convolutions with kernel sizes of 1 × 1 and 3 × 3, respectively.


Fig. 6. Visual impact comparison before and after watermark embedding. The top row shows the original test images, and the bottom row shows the watermarked test images.

We conducted robustness tests against signal-processing attacks and geometric attacks. The comparison techniques were blind QDFT-based [4–6] and DCT-based [35] watermarking. One bit was inserted into the QDFT mid-frequency coefficients of four components using the QIM method. To ensure that the embedding capacity was identical to that of the proposed technique, the comparison techniques inserted one bit per block. Similar to the related studies [4–6,22,36,37], 64 standard test images, including Baboon, Lenna, and Peppers, were used as cover images; they were 24-bit color images of size 512 × 512, as shown in Fig. 6.

The attacks used in training are as follows. The standard deviation of the Gaussian filtering kernel was 1. Median filtering was performed per channel with a 3 × 3 kernel. The JPEG quality factor was set to 80; note that even with the same factor, JPEG compression may differ depending on the program, and as mentioned before, we used scikit-image [33]. Additive Gaussian noise with a standard deviation of 0.05 was applied. The cropping ratio, rotation angle, and rescaling ratio were set to 0.8, 10°, and 0.6, respectively. We trained WMNet with these fixed attack parameters, but we also tested the generalization ability of WMNet, as shown in the next subsection and Fig. 9.

The robustness and invisibility were measured by the normalized correlation (NC) and the peak signal-to-noise ratio (PSNR), respectively; NC and PSNR are the most widely used performance indicators in previous watermarking studies [2,6,11,22,36–38]:







\mathrm{PSNR}(I, I') = 10 \log_{10} \frac{255^{2} \times 3MN}{\lVert I - I' \rVert^{2}},   (13)

\mathrm{NC}(w, w') = \frac{\langle w, w' \rangle}{\lVert w \rVert \, \lVert w' \rVert},   (14)

where I and I′ are the original cover image and its watermarked image, respectively; w and w′ are the bit sequences of the original watermark and of the watermark detected after the attack, respectively; and ⟨·, ·⟩ denotes the inner product. We subtracted the mean from each sequence and then clipped negative outputs, so that the value of NC lies between 0 and 1.
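Under the conventions above, Eqs. (13) and (14) can be sketched in NumPy as follows; the inputs are assumed to be an M × N × 3 8-bit image pair and two flat bit sequences.

import numpy as np

def psnr(I, I_w):
    # Eq. (13): I.size equals 3MN for an M x N x 3 image.
    err = np.sum((I.astype(float) - I_w.astype(float)) ** 2)
    return 10 * np.log10(255.0 ** 2 * I.size / err)

def nc(w, w_d):
    # Eq. (14), with both sequences mean-centered and negative
    # outputs clipped so that the result lies in [0, 1].
    a = w.astype(float) - w.astype(float).mean()
    b = w_d.astype(float) - w_d.astype(float).mean()
    value = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(0.0, value)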


6.2. Results with registration

The existing block-based techniques [4,6] estimate the rotation angle, translation, and scaling factor at detection using an inserted template. For all techniques in Fig. 7, we assumed that this RST correction was performed without error, as [5] did, in order to compare robustness. We measured the NC while changing the embedding intensity, i.e., while changing the PSNR on the test set. In general, there is a strong trade-off between robustness and invisibility: lower invisibility for a technique means higher robustness. Therefore, to compare the techniques more explicitly, we should check which technique lies at the upper right of the PSNR–NC graph, as shown in Fig. 7; a technique at the upper right is superior to techniques at the bottom left in both robustness and invisibility.

To verify the effectiveness of the proposed technique, we also tested robustness against attacks that were not used in training. Figs. 7(a)–(h) show the results for the trained attacks, while Figs. 7(i)–(l) show the test results against unseen attacks: salt-and-pepper noise with probability 0.01, gamma adjustment with factor 0.3, and general affine transformations that slightly alter the pixel coordinates. Overall, the robustness was better than that of the QDFT-based and DCT-based techniques; our technique is not limited to the types of attacks used in training.

We experimented with QIM and with a combination of SS [39] and QDFT; a DCT-based technique was also tested. WMNet (Single) is the result of the framework that uses only the single network, and WMNet (E/D) is the result with the autoencoder included, as described in Fig. 3. For WMNet (Single) and WMNet (E/D), we obtained the graphs by changing the invisibility ratio K. For the QDFT-QIM technique, we controlled the robustness and invisibility through the quantization step. In the QDFT-SS technique, the embedding factor, commonly written as α, played this role. For DCT-DIFF, the robustness was tested while adjusting the threshold; in their paper, the embedding factor is set to 0.15 times the threshold, so we retained that ratio when obtaining the DCT-DIFF graph. For the QR-decomposition-based technique [38], we controlled the threshold T from 0.01 to 0.1.

WMNet (E/D) showed good robustness, while traditional frequency-domain-based techniques are notably vulnerable to one or two attacks. For example, DCT-DIFF is vulnerable to Gaussian filtering and median filtering; with QDFT-SS, the watermark message is not inserted well even when there is no attack, and its robustness to rescaling and rotation falls short of the other techniques. In our experiments, QDFT-QIM showed the best performance among the frequency-domain-based watermarking techniques, but it is notably vulnerable to salt-and-pepper noise and gamma correction. WMNet (E/D) was the most robust in most cases; however, it did not have the best robustness under no attack and under cropping, as shown in Fig. 7(a) and (g). As the embedding domain of WMNet (E/D) becomes more complicated through training, it appears harder to embed messages than with WMNet (Single); however, once the watermark was embedded, WMNet (E/D) showed a higher level of robustness. In Fig. 7(a), the QDFT-QIM method performed better than WMNet (E/D).

We also measured the invisibility with the structural similarity (SSIM) [40], as shown in Fig. 8. SSIM is a more reliable metric than PSNR because it correlates better with the human visual system.
In Fig. 9, we present the performance of WMNet (E/D) and the previous watermarking techniques as the degree of the attacks changes. As shown, WMNet (E/D) achieved better robustness than the existing frequency-domain-based techniques, and it worked well under both various levels of attacks and multiple combined attacks. Fig. 10 shows how the robustness to unseen attacks changes as the number of training iterations increases: we measured the performance change for salt-and-pepper noise when we trained WMNet (E/D) only on JPEG, and the performance gradually improved for this unseen attack.



Fig. 7. Comparison of the normalized correlation values with registration as the embedding intensity changes. In this figure, the invisibility was measured by PSNR and the levels of the attacks were fixed. Each point of the graphs was determined by the average over 64 test images.



Fig. 8. Comparison of the normalized correlation values with registration as the embedding intensity changes. In this figure, the invisibility was measured by SSIM and the levels of attacks were fixed. Each point of the graphs was determined by the average of 64 test images.



Fig. 9. Comparison of the normalized correlation values as the degree of attacks changes. The embedding intensity for each technique was fixed and SSIM values are presented in the bottom left corner. Again, for all techniques, registration was assumed. Each point of the graphs was determined by the average of 64 test images.

Fig. 10. We measured the performance change for the salt-and-pepper noise when we trained WMNet for JPEG only.


6.3. Results without registration

We tested the robustness when template matching was not applied; the extracted messages without registration are shown in Table 1. Here, we compared against QDFT-QIM, which has the best performance among the frequency-domain-based watermarking techniques. For frequency domains such as the QDFT, if the block synchronization fails, the scheme is vulnerable because meaningless noise is detected. Since we attacked both the message images and the cover images in the attack simulation stage, our WMNet showed better robustness against geometric attacks compared to the QDFT domain. Our future work will determine the correlation directly from a corrupted message, without using template matching.

Table 1. Comparison of the extracted messages without registration. Without assuming registration, the proposed technique cannot obtain robust correlation results for rotation and cropping, but the watermark messages were not entirely erased.

6.4. Embedding and detection speed

The proposed method has higher computational complexity than QIM after QDFT or other frequency-domain-based watermarking methods, so the time required for embedding and detection should be longer. However, most computations in the proposed technique are implemented on a GPU and run in parallel. The embedding and detection times of the comparison methods and the proposed technique are shown in Table 2; they were obtained by averaging over 60 images. WMNet (E/D) and WMNet (Single) have identical detector structures, but WMNet (E/D) achieves higher performance even with fewer parameters; hence the difference in detection speed.

7. Discussion

In fact, watermark detection using the QDFT can be represented by a notably shallow network. According to [5], each component of the QDFT can be obtained as a linear combination of DFT coefficients of the color channels. Most recent techniques use quantization index modulation (QIM) [4–6], which can be represented by one ReLU layer between two fully connected layers, where adjacent linear layers can be merged into one. In total, QDFT plus QIM consists of three layers in linear, ReLU, linear order, as shown in Fig. 11. Therefore, a CNN with a sufficiently large number of layers and filters can be considered an extension of QDFT and QIM. The proposed CNN model has 12 ReLU layers alternating with linear layers, and the parameters of our model are optimized by learning, whereas the QDFT is fixed; in other words, unlike QDFT, the proposed scheme has a domain specialized for watermarking. By the same argument, an SS-based watermarking system can also be expressed as a notably shallow network. The domain of the proposed technique is superior to the domains obtained by linear transforms such as QDFT and DCT; even when these transforms are combined with QIM or SS, they remain notably shallow when expressed as neural networks.

Fig. 11. Network representation of an existing watermarking domain. The same logic holds for a linear transform such as DFT, DCT, or QDFT; Δ is the quantization step of QIM.

Deepening the network may raise the concern that the effect of fine noise gradually disappears. However, in recent years, deep learning has proven useful both in high-level contexts and for capturing fine patterns [41–43]. Moreover, the widely used residual unit uses a shortcut connection: shortcuts transfer the noise in the front layers to the back layers without loss, so even if the network is deepened, the fine pattern remains transmitted.
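To illustrate the claim, the sketch below expresses QIM detection of a single non-negative, bounded coefficient as a linear layer, one ReLU layer, and a linear layer; the convention that bit 1 is quantized to odd multiples of Δ is an assumption made for this example.

import numpy as np

def qim_detect_as_relu_net(x, delta, k_max=64):
    # First linear layer: hinge inputs x - k*delta (weights 1, biases -k*delta).
    k = np.arange(k_max)
    hinges = np.maximum(x - k * delta, 0.0)           # the single ReLU layer
    # Second linear layer: combine the hinges into a triangle wave that
    # peaks (value delta) at odd multiples of delta.
    coeffs = np.where(k == 0, 1.0, 2.0 * (-1.0) ** k)
    triangle = np.dot(coeffs, hinges)
    return int(triangle > delta / 2)                  # 1 iff x is closer to an odd multiple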

In this paper, over 4000 training images and eight types of attacks were used. This amount was enough to show robustness on 64 unseen images and four unseen attacks. As future work, one could combine smaller amounts of data, more attack modules, and appropriate regularization methods to obtain better robustness.

8. Conclusion

We proposed a learning framework for a robust and blind watermarking technique based on reinforcement learning. The rationale is that current blind-watermarking detectors can be represented by notably shallow neural networks. The proposed methods achieve higher SSIM and NC values than current watermarking methods based on the QDFT, and the trained model outperformed QDFT- and DCT-based schemes when registration is assumed. Traditional watermarking researchers first design the transform and subsequently attack it and measure the performance; in contrast, the proposed framework optimizes the watermarking domain through attack functions. In addition, our technique does not require detailed algorithms for each attack or expert knowledge to counter them. Our technique is adaptive to simulated attacks and supports blind detection.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2016R1A2B2009595).

Table 2. Embedding and extraction time of the comparison techniques and the proposed techniques. WMNet (E/D) is the variant with the autoencoder added to the proposed technique, as described in Fig. 3.

Technique        Device   Emb. (s)   Ext. (s)
DCT-DIFF         CPU      0.3577     0.1507
QDFT-SS          CPU      0.8617     0.4942
QDFT-QIM         CPU      0.8626     0.5042
WMNet (Single)   CPU      59.7471    2.7645
WMNet (E/D)      CPU      7.1802     0.5602
WMNet (Single)   GPU      0.6703     0.0229
WMNet (E/D)      GPU      0.2168     0.0093
References

[1] X. Kang, J. Huang, Y.Q. Shi, Y. Lin, A DWT-DFT composite watermarking scheme robust to both affine transform and JPEG compression, IEEE Trans. Circ. Syst. Video Technol. 13 (8) (2003) 776–786.
[2] S.D. Lin, C.F. Chen, A robust DCT-based watermarking for copyright protection, IEEE Trans. Consum. Electron. 46 (3) (2000) 415–421.
[3] C. Li, Z. Zhang, Y. Wang, B. Ma, D. Huang, Dither modulation of significant amplitude difference for wavelet based robust watermarking, Neurocomputing 166 (2015) 404–415.
[4] X.Y. Wang, C.P. Wang, H.Y. Yang, P.P. Niu, A robust blind color image watermarking in quaternion Fourier transform domain, J. Syst. Softw. 86 (2) (2013) 255–277.
[5] B. Chen, G. Coatrieux, G. Chen, X. Sun, J.L. Coatrieux, H. Shu, Full 4-D quaternion discrete Fourier transform based watermarking for color images, Digital Signal Process. 28 (2014) 106–119.
[6] J. Ouyang, G. Coatrieux, B. Chen, H. Shu, Color image watermarking based on quaternion Fourier transform and improved uniform log-polar mapping, Comput. Electr. Eng. 46 (2015) 419–432.
[7] J.S. Tsai, W.B. Huang, Y.H. Kuo, On the selection of optimal feature region set for robust digital image watermarking, IEEE Trans. Image Process. 20 (3) (2011) 735–743.


[8] P.C. Su, Y.C. Chang, C.Y. Wu, Geometrically resilient digital image watermarking by using interest point extraction and extended pilot signals, IEEE Trans. Inf. Forensics Secur. 8 (12) (2013) 1897–1908.
[9] T. Zong, Y. Xiang, I. Natgunanathan, S. Guo, W. Zhou, G. Beliakov, Robust histogram shape-based method for image watermarking, IEEE Trans. Circ. Syst. Video Technol. 25 (5) (2015) 717–729.
[10] F. Ji, C. Deng, L. An, D. Huang, Desynchronization attacks resilient image watermarking scheme based on global restoration and local embedding, Neurocomputing 106 (2013) 42–50.
[11] S.H. Nam, W.H. Kim, S.M. Mun, J.U. Hou, S. Choi, H.K. Lee, A SIFT features based blind watermarking for DIBR 3D images, Multimed. Tools Appl. (2017) 1–40.
[12] C.T. Yen, Y.J. Huang, Frequency domain digital watermark recognition using image code sequences with a back-propagation neural network, Multimed. Tools Appl. 75 (16) (2016) 9745–9755.
[13] L. Sun, J. Xu, S. Liu, S. Zhang, Y. Li, C. Shen, A robust image watermarking scheme using Arnold transform and BP neural network, Neural Comput. Appl. (2017) 1–16.
[14] B. Jagadeesh, P.R. Kumar, P.C. Reddy, Robust digital image watermarking based on fuzzy inference system and back propagation neural networks using DCT, Soft Comput. 20 (9) (2016) 3679–3686.
[15] K.J. Davis, K. Najarian, Maximizing strength of digital watermarks using neural networks, in: Proceedings of the IJCNN'01 International Joint Conference on Neural Networks, 4, IEEE, 2001, pp. 2893–2898.
[16] S.C. Mei, R.H. Li, H.M. Dang, Y.K. Wang, Decision of image watermarking strength based on artificial neural-networks, in: Proceedings of the 9th International Conference on Neural Information Processing (ICONIP'02), 5, IEEE, 2002, pp. 2430–2434.
[17] M.S. Hwang, C.C. Chang, K.F. Hwang, Digital watermarking of images using neural networks, J. Electronic Imaging 9 (4) (2000) 548–555.
[18] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Proceedings of the Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[19] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556.
[20] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[21] J. Dai, K. He, J. Sun, Instance-aware semantic segmentation via multi-task network cascades, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3150–3158.
[22] H. Kandi, D. Mishra, S.R.S. Gorthi, Exploring the learning capabilities of convolutional neural networks for robust image watermarking, Comput. Secur. 65 (2017) 247–268.
[23] L. Bottou, Large-scale machine learning with stochastic gradient descent, in: Proceedings of COMPSTAT'2010, Springer, 2010, pp. 177–186.
[24] B.D. Haeffele, R. Vidal, Global optimality in neural network training, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7331–7339.
[25] L. Theis, W. Shi, A. Cunningham, F. Huszár, Lossy image compression with compressive autoencoders, arXiv:1703.00395.
[26] D. Kim, H.U. Jang, S.M. Mun, S. Choi, H.K. Lee, Median filtered image restoration and anti-forensics using adversarial networks, IEEE Signal Process. Lett. 25 (2) (2018) 278–282.
[27] J. Kim, J.K. Lee, K.M. Lee, Accurate image super-resolution using very deep convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646–1654.
[28] Z. Zhu, X. Wang, S. Bai, C. Yao, X. Bai, Deep learning representation using autoencoder for 3D shape retrieval, Neurocomputing 204 (2016) 41–50.
[29] C.Y. Liou, W.C. Cheng, J.W. Liou, D.R. Liou, Autoencoder for words, Neurocomputing 139 (2014) 84–96.
[30] A. Krogh, J.A. Hertz, A simple weight decay can improve generalization, in: Proceedings of the Advances in Neural Information Processing Systems, 1992, pp. 950–957.
[31] P. Bas, T. Filler, T. Pevný, Break our steganographic system: the ins and outs of organizing BOSS, in: Proceedings of the International Workshop on Information Hiding, Springer, 2011, pp. 59–70.
[32] D. Kingma, J. Ba, Adam: a method for stochastic optimization, arXiv:1412.6980.
[33] S. van der Walt, J.L. Schönberger, J. Nunez-Iglesias, F. Boulogne, J.D. Warner, N. Yager, E. Gouillart, T. Yu, scikit-image: image processing in Python, PeerJ 2 (2014) e453.
[34] E. Jones, T. Oliphant, P. Peterson, et al., SciPy: open source scientific tools for Python, 2001. http://www.scipy.org/ (accessed 2019-02-07).
[35] S.A. Parah, J.A. Sheikh, N.A. Loan, G.M. Bhat, Robust and blind watermarking technique in DCT domain using inter-block coefficient differencing, Digit. Signal Process. 53 (2016) 11–24.
[36] L. An, X. Gao, X. Li, D. Tao, C. Deng, J. Li, Robust reversible watermarking via clustering and enhanced pixel-wise masking, IEEE Trans. Image Process. 21 (8) (2012) 3598–3611.
[37] L. An, X. Gao, Y. Yuan, D. Tao, Robust lossless data hiding using clustering and statistical quantity histogram, Neurocomputing 77 (1) (2012) 1–11.
[38] Q. Su, G. Wang, X. Zhang, G. Lv, B. Chen, An improved color image watermarking algorithm based on QR decomposition, Multimed. Tools Appl. 76 (1) (2017) 707–729.

[39] L. Pérez-Freire, F. Pérez-González, Spread-spectrum watermarking security, IEEE Trans. Inf. Forensics Secur. 4 (1) (2009) 2–24.
[40] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.
[41] H.U. Jang, D. Kim, S.M. Mun, S. Choi, H.K. Lee, DeepPore: fingerprint pore extraction using deep convolutional neural networks, IEEE Signal Process. Lett. 24 (12) (2017) 1808–1812.
[42] B. Bayar, M.C. Stamm, A deep learning approach to universal image manipulation detection using a new convolutional layer, in: Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security, ACM, 2016, pp. 5–10.
[43] J. Chen, X. Kang, Y. Liu, Z.J. Wang, Median filtering forensics based on convolutional neural networks, IEEE Signal Process. Lett. 22 (11) (2015) 1849–1853.

Seung-Min Mun received the B.S. degree in mathematical sciences from the Korea Advanced Institute of Science and Technology (KAIST), Korea, in 2014, and the M.S. degree from the School of Computing, KAIST, in 2016. He is currently working toward the Ph.D. degree in the Multimedia Computing Lab., School of Computing, KAIST. His research interests include digital watermarking and deep learning.

Seung-Hun Nam received the B.S. degree in Information Communication Engineering from Dongguk University, Seoul, Republic of Korea, in 2013, and the M.S. degree from the School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea, in 2015. He is currently pursuing the Ph.D. degree in the Multimedia Computing Lab., School of Computing, KAIST. His research interests include computer vision, digital watermarking, and image forensics.

Haneol Jang received his B.E. degree in information computer engineering from Ajou University, Korea, in 2012, and his M.S. and Ph.D. degrees in computer science from KAIST, Korea, in 2014 and 2018, respectively. Since 2018 he has been a senior researcher in the Affiliated Institute of ETRI. His research interests include multimedia security, machine learning, and computer vision.

Dongkyu Kim received the B.S. degree in electronics and communications engineering from Hanyang University, Seoul, South Korea, in 2013, and the M.S. degree in electrical engineering and Ph.D. degree in computer science from the Korea Advanced Institute of Science and Technology, Daejeon, South Korea, in 2015 and 2019, respectively. His research interests include various aspects of information hiding, digital image forensics, machine learning, and deep learning.

Heung-Kyu Lee received a B.S. degree in electronics engineering from Seoul National University, Seoul, Korea, in 1978, and M.S. and Ph.D. degrees in computer science from the Korea Advanced Institute of Science and Technology, Korea, in 1981 and 1984, respectively. Since 1986 he has been a professor in the Department of Computer Science, KAIST. He has authored/coauthored over 100 international journal and conference papers. He has been a reviewer of many international journals, including Journal of Electronic Imaging, Real-Time Imaging, and IEEE Trans. on Circuits and Systems for Video Technology. His major interests are information hiding, digital watermarking, and multimedia forensics.