Detecting USM image sharpening by using CNN

Accepted Manuscript

Jingyu Ye, Zhangyi Shen, Piyush Behrani, Feng Ding, Yun-Qing Shi

PII: S0923-5965(18)30435-1
DOI: https://doi.org/10.1016/j.image.2018.04.016
Reference: IMAGE 15385

To appear in: Signal Processing: Image Communication

Received date: 2 December 2017
Revised date: 18 March 2018
Accepted date: 29 April 2018

Please cite this article as: J. Ye, Z. Shen, P. Behrani, F. Ding, Y.-Q. Shi, Detecting USM image sharpening by using CNN, Signal Processing: Image Communication (2018), https://doi.org/10.1016/j.image.2018.04.016

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Detecting USM image sharpening by using CNN

Jingyu Ye, Zhangyi Shen, Piyush Behrani, Feng Ding, Yun-Qing Shi

Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102 USA

ARTICLE INFO

Article history:
Received 2 December 2017
Received in revised form 18 March 2018
Accepted 29 April 2018

Keywords:
Image sharpening
Image forensics
Unsharp Masking
Convolutional neural network
Edge perpendicular ternary coding

ABSTRACT

Image sharpening is a basic digital image processing operation used to improve the visual quality of an image. From the image forensics point of view, revealing the processing history of a given image is essential to authenticating its content; hence, image sharpening detection has attracted increasing attention from researchers. In this paper, a convolutional neural network (CNN) based architecture is reported to detect Unsharp Masking (USM), the most commonly used sharpening algorithm, applied to digital images. Extensive experiments have been conducted on two benchmark image datasets. The reported results show the superiority of the proposed CNN-based method over the existing sharpening detection method, i.e., edge perpendicular ternary coding (EPTC).

1. Introduction

Images provide information to human eyes and have long been accepted as evidence in courts. They are also used in various media such as newspapers, magazines and television. With the rapid advancement of digital technologies, digital images have replaced traditional photographic images, and for the convenience of modern life more and more techniques have been created, including image editing techniques. Since ordinary people can now tamper with, forge or alter digital images, the reliability of images serving as evidence has become an issue [1]. Therefore, scientists use image forensics [2][3] to determine whether a given image has been illegally or improperly manipulated. Furthermore, as digital images become dominant in daily life, we need to know not only whether a given image has been processed, but also, if it has, to what extent it has been altered.

Image sharpening [4] is one of the most widely used manipulations of digital images. It is applied to obtain clearer detail: after sharpening, the contrast of the image is enhanced, and edges, outlines and details become more distinct. Detecting image sharpening is therefore important in image forensics. Since 2009, several methods [5][6][7] have been proposed for Unsharp Masking (USM) sharpening detection. In [5], Cao et al. discovered the histogram aberration caused by image sharpening and proposed a method to detect this artifact. According to their later work [6], however, the method in [5] is only effective for images with a wide histogram. To overcome this drawback, Cao et al. [6] proposed another

detection method. First, the method detects the edge pixels of a given image. Second, a set of side-planar crosswise pixel sequences is located on the basis of the detected edge pixels. Then, an overshoot strength is calculated for each side-planar pixel sequence, and the overshoot metric of the whole image is measured as the average of these strengths. Finally, a threshold is applied to the overshoot metric to make a binary sharpening-detection decision. The method in [6] overcomes the drawback of [5] and is more effective, but it is vulnerable to JPEG compression and image sharpening, which limits its use in practical applications. By regarding the appearance of overshoot artifacts as a special kind of texture modification, a detection method [7] was proposed based on a widely used texture classification technique called the local binary pattern (LBP) [8][9]. The LBP-based method has been shown to be more accurate for sharpening detection than the method in [6]. In contrast to the methods in [5][6][7], Lu et al. [10] proposed a method to remove overshoot artifacts for anti-forensics of USM sharpening. Inspired by the LBP-based method [7], Ding et al. [11] proposed a method called Edge Perpendicular Binary Coding (EPBC) to detect USM sharpening. Considering that the texture modification caused by USM sharpening occurs mainly along the direction perpendicular to image edges, EPBC uses a long rectangular window perpendicular to the edge to characterize image textures. To avoid the long rectangular window used in EPBC, a particular coding strategy called Edge Perpendicular Ternary Coding (EPTC) was proposed in [12], which outperforms the method reported in [11]. Convolutional Neural Networks (CNNs) have made tremendous achievements in computer vision since 2012, which has aroused researchers' interest in applying CNNs to image forensics. In this paper, a CNN-based

approach is proposed to detect image sharpening. Experimental results show that the use of CNNs gives better results in image sharpening detection than EPTC, which has so far been the best method for detecting USM-based sharpening. The rest of this paper is organized as follows. Section 2 presents USM sharpening and EPTC. Section 3 describes the proposed CNN structure. The experimental work and results are shown in Section 4, and the conclusion is drawn in Section 5.

2. USM Sharpening and Edge Perpendicular Ternary Coding

A. USM Sharpening

Unsharp masking (USM) is a common algorithm for image sharpening. It improves the visual appearance of an image by increasing contrast in the high-frequency regions, such as edge areas. The sharpening is implemented by adding a scaled unsharp mask M to the original image:

Y = X + λM    (1)

where X, Y, M and λ denote, respectively, the input image, the output image, the unsharp mask and the scaling coefficient.

In the traditional USM algorithm, the unsharp mask is generated by high-pass filtering the original image:

M = X ⊗ H    (2)

where ⊗ and H denote the convolution operator and a high-pass filter, respectively. However, the unsharp mask can also be generated via Gaussian filtering:

M = X − X ⊗ G_σ    (3)

where G_σ denotes a Gaussian filter with variance σ, which controls the range of sharpening. According to Eq. (1) and Eq. (3), two parameters control the sharpening process: the scaling coefficient λ and the variance σ of the Gaussian filter. Different combinations of λ and σ produce correspondingly different sharpening effects. After sharpening, the edges, contour lines and details of the image become clearer than in the original image. Images sharpened with different values of λ and σ are discussed in the experiment section.

B. Edge Perpendicular Ternary Coding (EPTC)

The edge perpendicular ternary coding (EPTC) method, reported in [12], detects sharpened images and serves as the comparative method for our proposed CNN structure. It was inspired by the previous work that uses LBP [7] for sharpening detection. The detailed steps are as follows.

B1. Edge detection. The Canny operator [16], a popular edge detector, is used to detect the edge pixels of the image.

B2. Determination of local edge datasets. A rectangular window of 1×N pixels is set perpendicular to the direction of the edge, where N is an odd number no less than 3 and the center of the window is the edge pixel. A pixel set S1 is then obtained, represented as

S1 = {P_0, P_1, …, P_(N−1)}    (4)

where P_i denotes the pixel values in the rectangular window; the edge pixel is the element P_((N−1)/2).

B3. Ternary coding. First, calculate the difference between each pixel and the pixel on its right side, producing a new data set

S2 = {d_0, d_1, …, d_(N−2)}, with d_i = P_(i+1) − P_i    (5)

Second, convert S2 into a ternary code T as follows:

T = {t_0, t_1, …, t_(N−2)}    (6)

where

t_i = 1 if d_i > q; 0 if |d_i| ≤ q; −1 if d_i < −q    (7)

where q is a positive threshold computed from the pixel differences, as given by Eq. (8) in [12].

B4. Calculation of the EPTC histogram. From the ternary code T obtained above, transferred binary sets

T′ = {t′_0, t′_1, …, t′_(N−2)}    (9)

are derived by splitting the ternary code into an upper and a lower binary code:

for the upper code, t′_i = 1 if t_i = 1, and t′_i = 0 otherwise;    (10)

for the lower code, t′_i = 1 if t_i = −1, and t′_i = 0 otherwise.    (11)

Then, a decimal number, denoted EPTC(T), is derived as

EPTC(T) = Σ_(i=0)^(N−2) t′_i · 2^i    (12)

As can be seen, EPTC(T) lies in the range [0, 2^(N−1) − 1]. These 2^(N−1) different integer values can be regarded as 2^(N−1) patterns. The histogram of the EPTC patterns for a given image is calculated as

H(i) = Σ_(T ∈ Δ) f(EPTC(T) = i),  for i = 0, 1, …, 2^(N−1) − 1    (13)

where f(·) denotes the indicator function, which equals 1 if its argument holds and 0 otherwise, and Δ denotes the set of ternary codes T calculated from all of the local edge areas in the given image. To make H invariant to image size, H is normalized as

H′(i) = H(i) / Σ_j H(j),  for i = 0, 1, …, 2^(N−1) − 1    (14)

This normalized histogram is used as the feature vector of a given image.

B5. SVM training and classification. Finally, the features obtained in Eq. (14) are selected and used as input to an SVM classifier. The trained SVM can recognize whether images have been sharpened or not.
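The USM operation of Eqs. (1) and (3) can be sketched in a few lines of Python. This is an illustrative re-implementation using SciPy's Gaussian filter, not the authors' code (the paper's experiments use a MATLAB implementation); the function name `usm_sharpen` and the clipping to 8-bit range are assumptions for the sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def usm_sharpen(x, lam=1.0, sigma=1.0):
    """USM sharpening: Y = X + lam * M, with M = X - X (*) G_sigma (Eqs. (1), (3))."""
    x = x.astype(np.float64)
    mask = x - gaussian_filter(x, sigma=sigma)   # unsharp mask M
    y = x + lam * mask                           # add the scaled mask back
    return np.clip(y, 0, 255)                    # keep the 8-bit intensity range

# A flat image yields a zero mask, so it passes through unchanged:
flat = np.full((16, 16), 128.0)
assert np.allclose(usm_sharpen(flat, lam=1.3, sigma=1.0), flat)
```

On an image with real edges the mask is nonzero there, producing the overshoot artifacts that the detection methods above exploit.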

3. Convolutional Neural Network (CNN)

The convolutional neural network (CNN) has recently become one of the most effective tools in image forensics research. In this section, an efficient CNN structure for sharpening detection is proposed.

A. History of CNN

In 1989, a neural network named 'Net-5' was designed to solve a handwritten digit recognition problem [17]. The author made two major contributions in designing the network: reducing the number of free parameters to gain better generalization, and forcing hidden units to learn from local information to achieve better results. The hidden layers in Net-5 are composed of several feature maps, and each unit in a feature map is connected to units within a fixed-size neighborhood, for instance 3×3, on the input plane. The number of free parameters is therefore greatly reduced compared with traditional fully connected neural networks. Furthermore, all units in a feature map share the same set of weights, and subsampling is used to reduce the complexity of the network, so far fewer parameters are involved in the computation. Back-propagation [18] is employed to train the network. According to the reported results, Net-5 achieved the best performance among the five compared structures. Net-5 is regarded as the first CNN, and the ideas behind it remain the essence of today's various deep CNNs. Later, in [19], this work was applied to recognizing handwritten ZIP-code digits from the U.S. Mail, and the network was extended from Net-5's two hidden layers to three hidden layers: two convolutional layers and one fully connected layer. Although the number of convolutional layers

was not increased, the number of kernels adopted in each hidden layer was significantly increased, and the results were state of the art. In [20], another CNN named 'LeNet-5' was proposed for handwritten character recognition; however, the network was still shallow. The first deep CNN architecture, 'AlexNet', was presented in 2012 [21]. This network achieved remarkable success in the ILSVRC-2012 competition, which is considered a major step for the machine learning community. In AlexNet, five convolutional layers are employed to generate hierarchical feature maps, max pooling is applied to reduce the size of the network, and an activation function named ReLU is used to increase non-linearity. AlexNet achieved a top-5 test error rate of 15.3% on the ImageNet database [22], while the second-best result was 26.2%. After the big success of AlexNet, deep CNNs aroused tremendous interest, and several successful CNNs have been presented for image classification, such as ZF Net [23], VGGNet [24], GoogLeNet [25] and ResNet [26]. Beyond image classification, deep CNNs have spread widely to related areas and achieved successes there as well, such as face recognition, human action recognition and steganalysis [27-29].

B. Common Layers in CNN Structure Design

A neural network is built by stacking different types of layers. Most of the proposed CNNs share a similar hierarchical architecture that starts with multiple stages of convolutional modules and ends with a classification module. A common convolutional module includes a convolutional layer, an activation layer and a pooling layer.
By stacking a series of convolutional modules, hierarchical feature maps are extracted and then fed into the classification module, which is composed of one or more fully connected layers and a SoftMax layer with cross-entropy loss. In this part, several essential types of layers are introduced.

B1. Convolutional layer

A convolutional layer is a trainable filter bank that can be considered a feature extractor: it transforms images into feature maps or feature vectors. For example, in Fig. 3.1 a 6×6 input image (left) is filtered by the 3×3 filter shown at the upper left. The filter scans the whole image row by row; at each position, each image element is multiplied by the element at the corresponding position in the filter, and the sum of these products forms the value at the corresponding position in the feature map. In this example, after scanning the whole image, a 4×4 feature map is generated. The filter is also called a kernel. Normally there are several different kernels, and the number of feature maps increases accordingly. A color image has 3 channels (R, G, B), so its kernels, and hence the resulting feature maps, also have 3 channels.
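The sliding-window computation described above can be sketched with NumPy. This is an illustrative "valid" 2-D correlation (no padding, stride 1) matching the 6×6 input / 3×3 kernel / 4×4 output sizes in the example; the function name `conv2d_valid` and the averaging kernel are assumptions for the sketch, not code from the paper.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output entry is the sum of
    elementwise products over one (kh x kw) block of the image."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1          # 6 - 3 + 1 = 4 in the example
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r+kh, c:c+kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0                 # simple 3x3 averaging kernel
fmap = conv2d_valid(image, kernel)
assert fmap.shape == (4, 4)                    # 6x6 input, 3x3 kernel -> 4x4 map
assert fmap[0, 0] == 7.0                       # average of the first 3x3 block
```

In a real convolutional layer the kernel weights are learned rather than fixed, and several kernels are applied to produce several feature maps.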

Fig. 3.1 The example of convolution

B2. Activation layer

The activation layer introduces nonlinearity into the neural network so that it can better solve complex problems. In CNN design there are three popular activation functions: Sigmoid, TanH and ReLU. ReLU, the activation function selected for our CNN structure, is further discussed in subsection D.

B3. Pooling layer

The pooling layer reduces the number of features extracted by the immediately preceding convolutional layer to avoid overfitting. There are two types of pooling layers: average pooling and max pooling. An example of average pooling is shown in Fig. 3.2; the result is generated by taking the average value within each pooling mask.

Fig. 3.2 The example of average pooling

Fig. 3.3 shows an example of max pooling; the maximum value within each pooling mask is taken as the result.

Fig. 3.3 The example of max pooling

B4. Normalization layer

Stochastic gradient descent (SGD) is the process by which a CNN searches for an optimal solution and optimizes the whole network through back-propagation. More specifically, through back-propagation the weights and biases in the convolutional layers are optimized to reduce the training loss, strengthening the network's ability to predict the labels of unseen data. After each SGD step, the corresponding activation is normalized over the mini-batch, which makes the mean of each dimension of the output signal 0 and its variance 1. This is the function of the normalization layer; it speeds up training and improves model precision.

B5. Fully connected layer

The function of the fully connected layer is to map the features from the convolutional modules to target labels. It acts like a voting system: each feature is given a weight, and a total score is calculated for each class. A higher score for a class means the system considers the image more likely to belong to that class.

B6. SoftMax layer

The SoftMax layer transforms the scores from the fully connected layer into probabilities and ensures that they sum to 1. It is always the last layer of the classification module.

The six layers introduced above are the most common layers used in CNN design. In the next subsection, our proposed CNN structure is presented.

C. Proposed CNN Structure

Fig. 3.4 presents the overall architecture of the proposed CNN structure, which is motivated by LeNet-5 [20]. The whole CNN contains four convolutional modules and one linear classification module.

The inputs of the network are the original images and the sharpened images. Following the input layer are the convolutional modules, whose function is to transform the images into 64 feature maps of size 1×1, i.e., a 64-D feature vector. Since each module contains a group of functional layers, the modules are presented as groups with labels in the figure ("Group 1" through "Group 4" in Fig. 3.4). Each group contains four layers, starting with a convolutional layer that generates feature maps. In the convolutional layers, the input image is filtered by 8 kernels of size 1×(3×3) in Group 1 (the kernel size follows the format: number of input feature maps × (height × width)). In the following convolutional layers, there are 16 kernels of size 8×(3×3) in Group 2, 32 kernels of size 16×(3×3) in Group 3 and 64 kernels of size 32×(3×3) in Group 4, respectively. The generated feature maps then enter a Batch Normalization layer, which helps prevent gradient diffusion. The rectified linear unit (ReLU), an efficient activation function, is applied next to enhance the statistical modeling power of the network. Finally, each group except Group 4 ends with a max pooling layer, which performs local maximum taking as well as down-sampling on the feature maps; in Groups 1, 2 and 3 the feature maps are filtered by a mask of size 5×5 with stride 2. In Group 4, the pooling layer merges each map into a single element through global averaging: the kernel size of this pooling layer is fixed to the spatial size of its input feature maps. In this way, 64-D features are generated and then enter the linear classification module.

Fig. 3.4 Proposed CNN structure. Layer types and parameter choices are displayed inside boxes. Sizes of feature maps are displayed on the two sides, shown as (number of feature maps) × (height × width). Sizes of convolution kernels are shown in the boxes in the format (number of kernels × number of input feature maps × (height × width)).

The linear classification module consists of a fully connected layer and a SoftMax layer. It transforms the feature vectors into output probabilities for each class; these probabilities determine the sharpening-detection decision made by our proposed CNN structure.

D. ReLU Layers with Max Pooling

The rectified linear unit (ReLU) is an activation function defined as

ReLU(x) = max(0, x)    (15)

where x is the input to a neuron. The plot of ReLU is shown in Fig. 3.5.

As an activation function, ReLU increases the nonlinearity of the CNN model. In addition, ReLU efficiently speeds up the convergence of stochastic gradient descent and makes the optimization easier because of its piecewise linear nature [13]. As a result, it has been selected as the activation function in the proposed CNN structure.

Fig. 3.5 Plot of ReLU function

In addition, max pooling layers have been adopted to reduce the sizes of the feature maps. Since USM applies high-pass filtering to the original image to generate the mask, the CNN should attend to the contours of the sharpened image. Compared with average pooling, max pooling is therefore able to preserve more effective features. The superiority of max pooling is also confirmed by the experimental results shown in Section 4.

4. Experimental Results

A. Datasets and Settings

Two different image datasets have been used in our experimental work. One is BOSSbase v1.01 [15], a well-established image database designed for steganography and steganalysis, which contains 10,000 uncompressed grayscale images of size 512×512. The other database, the same as that used for testing the EPTC scheme in [12], was generated by randomly selecting 1,000 images from the UCID database and another 1,000 images from the NRCS database, i.e., a total of 2,000 uncompressed grayscale images of size 384×384.

In Section 2, we introduced two parameters: λ and σ. In the experimental work, nine different (σ, λ) parameter pairs have been used. That is, for each image in the databases, the USM sharpening algorithm is applied with the following nine combinations: (σ = 0.7, λ = 1.0), (σ = 1.0, λ = 0.5), (σ = 1.0, λ = 0.8), (σ = 1.0, λ = 1.0), (σ = 1.0, λ = 1.3), (σ = 1.0, λ = 1.5), (σ = 1.3, λ = 1.0), (σ = 1.3, λ = 1.5) and (σ = 1.5, λ = 1.0). Experiments have been conducted to detect all nine USM sharpening cases so as to evaluate the performance of the proposed CNN architecture. Since there are two datasets, there are 18 sharpened image sets in total. In this work, the USM sharpening algorithm has been implemented in MATLAB.

In the experiments, the proposed CNN architecture has been implemented using a modified version of the Caffe toolbox [14], and stochastic gradient descent is applied to train all the CNNs with a batch size of 64 images. As for the essential hyper-parameters used in building a CNN, the momentum is fixed at 0.9 and the weight decay at 0.0005. The learning rate was initialized to 0.001 and decreased by 10% after every 5,000 iterations. Additionally, we used 2-fold cross validation in our experiments: each dataset is divided equally into two parts, one part is first used for training and the other for testing, then the partition is switched and the experiment is repeated. The final accuracy is the average of the two experiments.

B. Results

For each pair of σ and λ, we ran our proposed CNN structure 10 times and used the average of the 10 final accuracies as the experimental result. In addition, we ran the EPTC scheme [12] on these two databases for performance comparison; the EPTC implementation was provided by the authors of [12]. The results obtained on BOSSbase are recorded in Table 4.1 and the results obtained on UCID&NRCS in Table 4.2.

As shown in Tables 4.1 and 4.2, the proposed CNN structure significantly outperforms EPTC. Even in the worst case for EPTC, σ = 1.0, λ = 0.5, the CNN achieves 99.91% on BOSSbase and 98.89% on UCID&NRCS, which are 10.09% and 13.12% higher than the corresponding EPTC results, respectively. The training loss and testing error for this worst case (σ = 1.0, λ = 0.5) are plotted in Fig. 4.1; the two curves show that the results converge within 20,000 iterations.
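The learning-rate schedule described above ("initialized to 0.001 and decreased by 10% after every 5,000 iterations") can be sketched as a step decay. This is one illustrative reading of that schedule (a multiplicative 0.9 factor per 5,000-iteration step), not the authors' actual Caffe solver configuration; the function name and parameter names are assumptions.

```python
def learning_rate(iteration, base_lr=0.001, gamma=0.9, step=5000):
    """Step decay: multiply the base rate by gamma once per `step` iterations."""
    return base_lr * gamma ** (iteration // step)

assert learning_rate(0) == 0.001                    # initial rate
assert abs(learning_rate(5000) - 0.0009) < 1e-12    # 10% lower after one step
assert learning_rate(20000) < learning_rate(5000)   # keeps decreasing
```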

Table 4.1 Detection accuracy on the BOSSbase dataset

Parameters of USM      CNN      EPTC
σ = 0.7, λ = 1.0       99.96%   93.22%
σ = 1.0, λ = 0.5       99.91%   89.82%
σ = 1.0, λ = 0.8       99.95%   93.95%
σ = 1.0, λ = 1.0       99.98%   95.26%
σ = 1.0, λ = 1.3       99.98%   96.53%
σ = 1.0, λ = 1.5       99.96%   97.10%
σ = 1.3, λ = 1.0       99.98%   95.96%
σ = 1.3, λ = 1.5       99.97%   97.48%
σ = 1.5, λ = 1.0       99.98%   96.23%

Table 4.2 Detection accuracy on the UCID&NRCS dataset

Parameters of USM      CNN      EPTC
σ = 0.7, λ = 1.0       99.53%   88.73%
σ = 1.0, λ = 0.5       98.89%   85.73%
σ = 1.0, λ = 0.8       99.50%   91.00%
σ = 1.0, λ = 1.0       99.65%   92.87%
σ = 1.0, λ = 1.3       99.81%   94.92%
σ = 1.0, λ = 1.5       99.86%   95.32%
σ = 1.3, λ = 1.0       99.75%   94.15%
σ = 1.3, λ = 1.5       99.89%   95.67%
σ = 1.5, λ = 1.0       99.78%   94.23%

Fig. 4.1 Training loss (a) and testing error (b) when σ = 1.0, λ = 0.5

In addition, we have demonstrated the rationale for choosing 4 layer-groups. Note that the number of layer-groups and the testing performance are not proportional; more is not always better. Adding a layer-group means adding a similar convolutional module, which always contains four layers (convolutional layer, normalization layer, activation layer and pooling layer), as described in Section 3.B. The principle is that the last pooling layer should transform feature maps of any size into a single element. To find a proper number of layer-groups, networks with 4, 5, 6 and 7 layer-groups were compared on the worst case, which occurs at σ = 1, λ = 0.5. The results are shown in Table 4.3, which reveals that 4 layer-groups gives the best performance. Furthermore, as the number of layer-groups increases, overfitting to image content during CNN training becomes much more serious. As a result, 4 layer-groups are used in the proposed CNN structure.

Table 4.3 Comparison of different numbers of layer-groups when σ = 1, λ = 0.5

Group number   4        5        6        7
Accuracy (%)   99.16%   96.85%   98.35%   97.35%

USM adds a proportion of the high-pass regions of an image, such as its edges, back to the original image, which makes those high-pass regions more pronounced. In this situation, choosing max pooling instead of average pooling for down-sampling is the more appropriate choice, and the experimental comparison also confirms this. The result for the worst case, σ = 1, λ = 0.5, is listed in Table 4.4.

Table 4.4 Comparison between max pooling and average pooling

Type of pooling layer   Max pooling   Average pooling
Accuracy (%)            98.15%        93.35%

As shown in Table 4.4, the performance of max pooling is almost 5% higher than that of average pooling, confirming the clear improvement brought by max pooling.
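The max-versus-average pooling distinction discussed above can be illustrated with NumPy. This toy sketch is not the paper's code; it simply shows how max pooling preserves a strong, isolated edge response that average pooling dilutes, which is the intuition behind Table 4.4. The function name `pool2d` and the example feature map are assumptions.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling over size x size blocks (stride = size)."""
    h, w = x.shape
    blocks = x[:h - h % size, :w - w % size].reshape(
        h // size, size, w // size, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

# One strong edge response surrounded by weak activations:
fmap = np.array([[0., 0., 0., 0.],
                 [0., 9., 0., 0.],
                 [0., 0., 0., 0.],
                 [0., 0., 0., 1.]])
assert pool2d(fmap, mode="max")[0, 0] == 9.0   # the peak survives max pooling
assert pool2d(fmap, mode="avg")[0, 0] == 2.25  # the peak is diluted (9 / 4)
```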

5. Conclusion

In this paper, an efficient CNN structure is proposed to detect images processed by unsharp masking (USM). The proposed CNN structure contains 16 layers in its convolutional modules and generates 64-D features. Because sharpening is an edge enhancement operation, max pooling is used for down-sampling, and the ReLU layers speed up processing and greatly reduce the experimental time. The experimental results on two large image datasets show that the proposed CNN structure outperforms the existing method (EPTC) for sharpening detection in terms of detection accuracy. In future work, we will continue studying image sharpening detection to further improve detection accuracy and speed.


REFERENCES

[1] S. Lyu and H. Farid, "How realistic is photorealistic?" IEEE Trans. Signal Process., vol. 53, no. 2, pp. 845-850, 2005.
[2] H. Farid, "Digital image forensics," Sci. Amer., vol. 298, no. 6, pp. 66-71, 2008.
[3] J. Fridrich, "Digital image forensics," IEEE Signal Process. Mag., vol. 26, no. 2, pp. 26-37, 2009.
[4] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 2002.
[5] G. Cao, Y. Zhao, and R. Ni, "Detection of image sharpening based on histogram aberration and ringing artifacts," in Proc. IEEE Int. Conf. Multimedia and Expo (ICME), 2009, pp. 1026-1029.
[6] G. Cao, Y. Zhao, R. Ni, and A. C. Kot, "Unsharp masking sharpening detection via overshoot artifacts analysis," IEEE Signal Process. Lett., vol. 18, no. 10, pp. 603-606, 2011.
[7] F. Ding, G. Zhu, and Y.-Q. Shi, "A novel method for detecting image sharpening based on local binary pattern," in Int. Workshop on Digital Forensics and Watermarking (IWDW), 2013.
[8] T. Ojala, M. Pietikäinen, and D. Harwood, "A comparative study of texture measures with classification based on featured distributions," Patt. Recognit., vol. 29, no. 1, pp. 51-59, 1996.
[9] T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Trans. Patt. Anal. Mach. Intell., vol. 24, no. 7, pp. 971-987, 2002.
[10] L. Lu, G. Yang, and M. Xia, "Anti-forensics for unsharp masking sharpening in digital images," Int. J. Digital Crime Forensics, vol. 5, no. 3, pp. 53-65, 2013.
[11] F. Ding, G. Zhu, J. Yang, J. Xie, and Y.-Q. Shi, "Edge perpendicular binary coding for USM sharpening detection," IEEE Signal Process. Lett., vol. 22, no. 3, pp. 327-331, 2015.
[12] F. Ding, W. Dong, G. Zhu, and Y.-Q. Shi, "An advanced texture analysis method for image sharpening detection," in Digital-Forensics and Watermarking, 14th Int. Workshop (IWDW), 2015, pp. 72-82.
[13] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. 14th Int. Conf. Artificial Intelligence and Statistics (AISTATS 2011), JMLR W&CP, 2011.
[14] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: convolutional architecture for fast feature embedding," in Proc. ACM Int. Conf. Multimedia, 2014, pp. 675-678.
[15] P. Bas, T. Filler, and T. Pevný, "Break our steganographic system - the ins and outs of organizing BOSS," in Inf. Hiding, 13th Int. Conf., LNCS vol. 6958, Prague, Czech Republic, May 18-20, 2011, pp. 59-70.
[16] J. F. Canny, "Finding edges and lines in images," Mass. Inst. Technol., Tech. Rep., Cambridge, MA, USA, 1983.
[17] Y. LeCun, "Generalization and network design strategies," in Proc. Connectionism in Perspective, Zurich, Switzerland, Oct. 1989.
[18] S. Becker and Y. LeCun, "Improving the convergence of back-propagation learning with second-order methods," in Proc. Connectionist Models Summer School, Morgan Kaufmann, San Mateo, 1988, pp. 29-37.
[19] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541-551, 1989.
[20] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[21] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1106-1114.
[22] O. Russakovsky et al., "ImageNet large scale visual recognition challenge," Int. J. Comput. Vision, vol. 115, pp. 211-252, 2015.
[23] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conf. Comput. Vision, Springer International Publishing, 2014.
[24] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[25] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proc. IEEE Int. Conf. Comput. Vision Pattern Recog., Jun. 2015, pp. 1-9.
[26] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vision Pattern Recog., 2016.
[27] Y. Qian, J. Dong, W. Wang, and T. Tan, "Deep learning for steganalysis via convolutional neural networks," Proc. SPIE Media Watermarking, Security, and Forensics, vol. 9409, pp. 94090J-94090J-10, 2015.
[28] L. Pibre, J. Pasquet, D. Ienco, and M. Chaumont, "Deep learning is a good steganalysis tool when embedding key is reused for different images, even if there is a cover source mismatch," Proc. SPIE, Feb. 14-18, 2016.
[29] G. Xu, H. Wu, and Y.-Q. Shi, "Structural design of convolutional neural networks for steganalysis," IEEE Signal Process. Lett., vol. 23, no. 5, pp. 708-712, 2016.