Accepted Manuscript Digital video tampering detection: An overview of passive techniques K. Sitara, B.M. Mehtre
PII: S1742-2876(16)30071-8
DOI: 10.1016/j.diin.2016.06.003
Reference: DIIN 638
To appear in: Digital Investigation
Received Date: 25 November 2015
Revised Date: 24 May 2016
Accepted Date: 22 June 2016
Please cite this article as: Sitara K, Mehtre BM, Digital video tampering detection: An overview of passive techniques, Digital Investigation (2016), doi: 10.1016/j.diin.2016.06.003. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Digital Video Tampering Detection: An Overview of Passive Techniques
Sitara K.a,b,∗∗, B.M. Mehtrea,∗

a Center for Cyber Security (CCS), Institute for Development and Research in Banking Technology (IDRBT), Established by Reserve Bank of India, Hyderabad-500057, India
b School of Computer Science and Information Sciences (SCIS), University of Hyderabad, Hyderabad-500046, India.
Abstract
Video tampering is a process of malicious alteration of video content, so as to conceal an object, an event, or change the meaning conveyed by the imagery in the video. The fast proliferation of video acquisition devices and powerful video editing software tools has made video tampering an easy task. Hence, the authentication of video files (especially in surveillance applications like bank ATM videos, the medical field and legal proceedings) is becoming important. Video tampering detection aims to find the traces of tampering and thereby evaluate the authenticity and integrity of the video file. These methods can be classified into active and passive (blind) methods. In this paper, we present a survey on passive video tampering detection methods. Passive video tampering detection methods are classified into the following three categories based on the type of forgery they address: detection of double or multiple compressed videos, region tampering detection and video inter-frame forgery detection. First, we briefly present the preliminaries of video files required for understanding video tampering. The existing papers surveyed are presented concisely; the features used and their limitations are summarized in a compact tabular form. Finally, we identify some open issues that point to new research areas in passive video tampering detection.
Keywords: Video forgery detection, video tampering detection, video forensics, video anti-forensics.
1. Introduction
Video forensics deals with the scientific examination, comparison or analysis of video. Why is it required? Nowadays, most people carry a device with which they can capture a video or an image, and a wide variety of video and image editing tools are available that make tampering with a video easy. This combination is potentially dangerous, as anyone can modify video content according to his or her wish. Here comes the importance of video forensics.

∗ Corresponding author
∗∗ Principal corresponding author
Email addresses: [email protected] (Sitara K.), [email protected] (B.M. Mehtre)

Preprint submitted to Elsevier, May 24, 2016
“Seeing is no longer believing”. Consider a scenario in which a CCTV camera is the only eye witness of a crime scene. When such a video is produced as evidence in a court of law, the authenticity of the video should be proved in a scientific manner; otherwise it could be challenged by the defense attorney. The word ‘scientific’ is very important when it comes to video forensics. Applying any operation to a video may make some changes to it, i.e., we are manipulating the video to arrive at desired results. Therefore, these operations have to be scientifically correct, as the results in most cases have to be submitted before a legal authority.

A video can be thought of as a sequence of images called frames. Video tampering attacks can be performed in the spatial, temporal and spatio-temporal domains. Region splicing and copy-paste tampering occur in the spatial and spatio-temporal domains. Frame insertion, deletion, duplication and shuffling occur in the temporal domain. It is intuitive that image tampering detection methods could be applied to videos, but they may not produce satisfactory results, due to the fact that videos may have complex scenarios like constantly moving objects or noise incurred due to compression. Also, the exclusion of temporal domain information may lead to an increase in computation cost.

Video tampering detection techniques can be classified into active and passive (also called blind). Watermarking and digital signatures come under active techniques, where some key information to authenticate the video is embedded intentionally into it. Any change to this embedded information indicates tampering. Not all devices have the capability to embed a watermark or digital signature into the video that they capture. These methods may fail in situations where tampering is done before inserting the watermark or digital signature. In most realistic scenarios, prior information about the video, such as metadata, may not be available.
Here comes the importance of passive video forensic techniques, where the authenticity of the video is verified by extracting features from the video. Any editing operation will leave some footprints in the video which could be exploited for checking its authenticity. These footprints include an increase in prediction error; high spatial and temporal correlation between frame intensity values, noise and motion residues; abnormalities in optical flow, motion vectors and quality of frames; the Variation of Prediction Footprint (VPF); the Motion-Compensated Edge Artifact (MCEA); etc. A perpetrator who knows how these forensic methods work can use anti-forensic techniques to hide tampering by reducing the effect of the footprints left by the editing process, which itself may leave another footprint. This paper aims to provide an overview of the blind video tampering detection and anti-forensic techniques that exist in the literature. The pros and cons mentioned against each work in this survey are those stated by their respective authors. Some are also mentioned by other authors who have implemented the prior work, identified these limitations based on their experiments, and published solutions to overcome them.

The rest of the paper is organized as follows. Section 2 deals with the background concepts required for understanding this survey. Sections 3 to 5 are dedicated to the survey, addressing the three types of passive video tampering detection under consideration. Section 6 deals with anti-forensic techniques in the literature. Section 7 discusses the limitations or challenges faced by most methods covered in this survey. Section 8 concludes the survey by specifying the open issues in video tampering detection.
2. Background
A video is a sequence of images which can be displayed continuously so as to create the illusion of motion by exploiting the persistence of vision of the human visual system. The audio components associated with the captured scenes are also stored in the video file. In this survey, we concentrate only on the video data part of a video file. Normally, a raw video may require a lot of storage space, so most acquisition devices store videos in compressed formats. Video coding standards like Motion JPEG, MPEG-1 (Sikora, 1997), MPEG-2 (Sikora, 1997), MPEG-4 (Richardson, 2003), H.264 (Richardson, 2003), etc., are used for compression, which helps in efficient storage and transmission. Video can be represented in three dimensions (two spatial and one temporal), as shown in Fig.1. Video compression exploits redundancy in the spatial and temporal domains: the former is handled by transform-domain coding and the latter by predictive coding.
Figure 1: Video as a sequence of frames
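The three-dimensional view of video described above can be made concrete with a toy array. This is only an illustrative sketch: the "clip" below is synthetic, and the drift of one gray level per frame is an arbitrary choice made to show how small frame-to-frame (temporal) change typically is relative to the pixel range.

```python
import numpy as np

# A video as a 3-D array (time, height, width): neighbouring pixels within a
# frame (spatial redundancy) and co-located pixels across frames (temporal
# redundancy) are highly similar, which is what compression exploits.
# The synthetic clip drifts slowly, so temporal differences are tiny.
rng = np.random.default_rng(0)
first = rng.integers(0, 256, (48, 64)).astype(float)
video = np.stack([first + t for t in range(10)])      # 10 slowly changing frames

temporal_diff = np.abs(np.diff(video, axis=0)).mean() # mean frame-to-frame change
full_range = video.max() - video.min()
print(temporal_diff, full_range)  # tiny temporal change vs. large pixel range
```

Predictive coding stores only that small temporal difference instead of every frame in full.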
The difference between any two consecutive frames in most videos is often very small. So, instead of storing all the frames in the video, the practice is to keep a reference frame and predict the remaining frames from it. In this scheme of video storage, there are mainly three types of frames in a compressed video: I-frames (intra-coded), P-frames (predicted) and B-frames (bi-directionally predicted). I-frames are coded using a JPEG-like scheme where the spatial redundancy alone is exploited; that is, an I-frame is treated more like an image. The first frame of a video is an I-frame. A P-frame is predicted from previous I- or P-frames and stores only the changes from its reference, so P-frames are more efficient than I-frames. B-frames are predicted from both forward and backward reference frames, providing even more compression. It is not always possible to predict all the frames in a video from the first frame (e.g., dynamic background videos); I-frames need to be inserted at regular intervals or based on the motion in the video. The video sequence is divided into fragments, called Groups of Pictures (GOP). Each GOP has a particular structure, wherein the order of I, B and P frames is such that the I-frame appears first, followed by B and P frames (Fig.2).

Figure 2: Structure of GOP
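The reference-plus-residual idea behind P-frames can be sketched as a toy encoder/decoder. This is a minimal illustration, not any real codec: the frames are synthetic, and the "codec" stores raw residuals with no transform, quantization or motion compensation.

```python
import numpy as np

def encode_gop(frames):
    """Toy predictive coder: keep the first frame (the 'I-frame') intact and
    store each later frame only as its difference from the previous frame
    (a crude stand-in for P-frame prediction)."""
    i_frame = frames[0].copy()
    residuals = []
    prev = i_frame
    for f in frames[1:]:
        residuals.append(f - prev)   # prediction error to be stored
        prev = f
    return i_frame, residuals

def decode_gop(i_frame, residuals):
    """Rebuild every frame by adding the stored residuals back."""
    frames = [i_frame.copy()]
    for r in residuals:
        frames.append(frames[-1] + r)
    return frames

# A slowly changing synthetic 'video': residuals are tiny compared to frames.
rng = np.random.default_rng(0)
base = rng.integers(0, 256, (8, 8)).astype(np.int32)
frames = [base + k for k in range(5)]
i_frame, residuals = encode_gop(frames)
decoded = decode_gop(i_frame, residuals)
assert all(np.array_equal(a, b) for a, b in zip(frames, decoded))
```

Because the residuals here are all small integers, they compress far better than the full frames would, which is exactly why P- and B-frames pay off.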
Most encoders use a fixed number of frames in a GOP as it is easy to implement, but this affects coding efficiency and also visual quality. It is better to use an adaptive GOP (AGOP) structure, where the length of the GOP (number of frames) varies according to the video content. Each frame in a GOP is segmented into blocks, called macro-blocks (MB). If a frame is in RGB, it is converted to YCbCr or YUV so that the video can be compressed in 4:2:0 format, where the MB corresponding to a 16×16 pixel block consists of four 8×8 sample blocks from the Y component and one 8×8 sample block from each chrominance component (the human eye is more sensitive to the luminance component), giving a total of six 8×8 blocks in an MB. To encode an MB in a P or B frame, the encoder searches for its best-matched MB in the reference frames and stores how far this matched MB has to be displaced (the motion vectors) to create the predicted frame. The predicted frames are compared with the original frames to find the error residues, which are coded in a JPEG-like process. For each coded MB, its error residuals and motion vectors (MV) are stored. There are several types of MBs: intra-coded MB (I-MB), predicted MB (P-MB), bi-predicted MB (B-MB) and skipped MB (S-MB); no coding is required for skipped MBs. A frame is coded as one or more slices, each of which contains one or more MBs.
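The best-match search described above can be sketched as exhaustive block matching. This is a simplified illustration under assumed parameters (8×8 block, ±4 pixel window, sum-of-absolute-differences cost); real encoders use larger windows, faster search strategies and rate-distortion-aware costs.

```python
import numpy as np

def best_match(block, ref, top, left, search=4):
    """Exhaustive block matching: slide `block` over a +/-`search` pixel
    window of the reference frame, starting from (top, left), and return
    the motion vector (dy, dx) minimising the sum of absolute differences."""
    h, w = block.shape
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue  # candidate falls outside the reference frame
            sad = np.abs(ref[y:y+h, x:x+w].astype(int) - block.astype(int)).sum()
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv, best

# Synthetic reference frame; the 'current' block is content that sits at
# (10, 13) in the reference, searched for from the co-located spot (8, 10).
rng = np.random.default_rng(1)
ref = rng.integers(0, 256, (32, 32)).astype(np.uint8)
cur_block = ref[10:18, 13:21]
mv, sad = best_match(cur_block, ref, top=8, left=10)
print(mv, sad)   # -> (2, 3) 0
```

The encoder would then store only the motion vector `(2, 3)` plus the (here zero) residual for this MB, rather than the block's 64 pixel values.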
Figure 3: Classification of Digital Video Forensics
With this brief introduction to video, compression and its structure, we can move on to video tampering detection. The classification of digital video forensics is shown in Fig.3. The first one focuses on identifying fake copies of a video for copyright protection. The second one tries to differentiate between computer-generated and real videos. The third one is video tampering detection, which can be defined as the detection of malicious modification of video content, done in order to conceal an object, an event, or change the meaning of the video sequence. Source camera identification is concerned with identifying the specific device used to record the video under examination. Steganography is the art of hiding information in a cover medium (here, video), hence the name video steganography; video steganalysis deals with the detection of this hidden information in a video file. The works in (Milani et al., 2012b) and (Rocha et al., 2011) provide an overview of video forensics. A few works in the literature have the capability to localize the tampered regions in the video ((Subramanyam and Emmanuel, 2012); (Labartino et al., 2013); (Lin and Tsay, 2014); (D’Amiano et al., 2015); (Feng et al., 2014)), whereas others can only classify the video as tampered or not ((Wang and Farid, 2006); (Chen et al., 2015); (Bidokhti and Ghaemmaghami, 2015); (Su et al., 2011); (Dong et al., 2012); (Stamm et al., 2012)). As mentioned in the introduction, video tampering detection techniques can be classified into active and passive methods. This survey aims to explore the passive video forgery detection methods as shown in Fig.4 (enclosed in the dotted box), based on the type of forgery they address:

1. Detection of double or multiple compression,
2. Region tampering detection and
3. Video inter-frame forgery detection.
Each of these is explained in detail in the later sections. (Wahab et al., 2014) gives an idea of some of the passive video tampering detection techniques, whereas (Joshi and Jain, 2015) provides a review of video authentication techniques developed in the last few years.
Figure 4: Classification of Tampering Detection
Datasets: (Qadir et al., 2012) designed a dataset for video forensics, specially related to source camera identification and integrity verification, called the Surrey University Library for Forensic Analysis (SULFA), which was the first of its kind and is freely available. There are 150 original videos collected from 3 different cameras. Each video has a resolution of 320×240 and a frame rate of 30 fps. (Bestagini et al., 2013b) developed another freely available video dataset consisting of 20 videos: 10 original and 10 forged videos for detecting copy-move forgeries. Some of the videos in it are taken from the SULFA dataset; the resolution and frame rate of the videos are the same as in SULFA. Most researchers download videos in uncompressed format, or capture videos and make modifications to them according to their research requirements for training and/or testing. A tool to create tampered videos is developed by the authors in (Ardizzone and Mazzola, 2015). It consists of a selection step where the user has to select the object or region to be copied, along with the number of frames in which this object is present and is to be tracked. The user should also specify the destination where the copied object is to be pasted, and the transformation parameters if a plain copy-move forgery is not desired. The copied regions can be pasted at some locations in the same video or in a different video. Speeded Up Robust Features (SURF) keypoint descriptors are used for tracking the selected object. The trajectory of objects can also be changed. Blending is done to make the forgery more convincing.
3. Detection of Double or Multiple Compression
To tamper with a video in compressed format, it has to be decompressed first. After editing, the resultant video may be stored again in compressed format. So a video might be a forged one if it has undergone double or multiple compression. This process will leave some footprints or artifacts in the video, which can be exploited for tampering detection. The coding parameters used in the first and subsequent compressions will differ in most cases, and this difference leads to these artifacts. In a natural video, the distribution of block Discrete Cosine Transform (DCT) coefficients (DC/AC) may be Gaussian or Laplacian. After quantization, the quantized DCT coefficients will also obey this distribution, but they may no longer follow it once recompression is done with different encoding parameters. A summary of the techniques discussed in this section is given in Table 1.
Table 1: Summary of double or multiple compression detection techniques (Features Used | Ref. | Limitations)

Statistical patterns in the distribution of DCT coefficients | (Wang and Farid, 2006) | Works with fixed GOP; sensitive to noise and changes in GOP; fails when the number of frames deleted is an integral multiple of the GOP size
| (Wang and Farid, 2009) | Accuracy decreases when the ratio between the first and the second quantization scale is less than 1.7; analysis is done with I-frames only
| (Su and Xu, 2010) | Fails with simple content or slow motion videos; accuracy decreases when the second encoder bitrate is less than the first encoder bitrate; analysis is done with I-frames only
| (Xu et al., 2012) | Performance decreases with increase in target output (transcoding) bitrate
Statistical patterns in the first digit distribution of DCT coefficients | (Sun et al., 2012) | Analysis is done with I-frames only
| (Milani et al., 2012a) | Performance decreases with multiple compression
Variation of Prediction Footprint | (Vazquez-Padin et al., 2012) | Works with fixed GOP structure; fails when G1 = G2; as G1 increases, accuracy drops (G1 and G2 are the GOP sizes of the first and second compression, respectively)
Markov statistics of quantized DCT coefficients | (Jiang et al., 2013) | Fails if the second quantization scale (qs) is the same as the first; performance degrades when the second compression qs is an odd multiple of the first
Markov statistics of compression noise | (Ravi et al., 2014) | Fails if the second qs is the same as the first
Pixel estimation in GOP, error between the true and estimated value | (Subramanyam and Emmanuel, 2013) | Works with fixed GOP; accuracy decreases when the ratio between the first and the second qs is less than 1.3; works with videos from static cameras only
Blocking artifact | (Luo et al., 2008) | Fails when the number of frames deleted is an integral multiple of the GOP size; works with fixed GOP structure
Correlation of a video with its re-encoded version with the same codec and coding parameters | (Bestagini et al., 2012) | Performance degrades if coarse quantization is adopted in the second encoding step; content dependent
Number of different coefficients between I-frames of singly and doubly compressed videos, and between I-frames of the corresponding doubly and triply compressed videos | (Huang et al., 2014) | Able to address double compression with the same bitrate only; performance depends on proper selection of recompression bitrate
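The double-quantization footprint underlying several of these methods can be illustrated numerically. This is a minimal 1-D sketch on synthetic Laplacian-distributed coefficients, with illustrative quantization steps (5 then 3) standing in for the two compression passes; real detectors work on per-block DCT histograms.

```python
import numpy as np

# Quantising first with step q1 = 5, de-quantising, then re-quantising with a
# different step q2 = 3 leaves periodically empty histogram bins, while a
# single compression produces a smooth, fully populated histogram.
rng = np.random.default_rng(2)
coeffs = rng.laplace(0, 10, 200000)        # synthetic AC-coefficient stand-in

def quantize(x, q):
    return np.round(x / q)

single = quantize(coeffs, 3)                   # one compression
double = quantize(quantize(coeffs, 5) * 5, 3)  # decompress, then recompress

bins = np.arange(-20, 21)
h_single = np.array([(single == b).sum() for b in bins])
h_double = np.array([(double == b).sum() for b in bins])
# Interior bins (levels -15..15): no gaps after single compression, but the
# double-compressed histogram has unreachable (empty) levels.
print((h_single[5:-5] == 0).any(), (h_double[5:-5] == 0).any())  # -> False True
```

A detector can test for exactly these periodic gaps (or over-full neighbours) in the observed coefficient histogram.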
Wang and Farid were the pioneers in this field; they developed two techniques for tamper detection in MPEG videos (Wang and Farid, 2006), exploiting the facts that the I-frames of an MPEG sequence will undergo double JPEG compression, and that relatively larger motion estimation errors occur when frames are moved from one GOP to another in an attempt to perform frame deletion or insertion. The method relies on identifying the periodic spikes in the Discrete Fourier Transform (DFT) of the P-frame prediction error sequence. It requires human inspection and cannot run automatically on large amounts of data, so it may be prone to human error. Its sensitivity to noise and to changes in GOP structure is another downside. Also, it fails when the number of frames deleted is an integral multiple of the GOP size.

In (Wang and Farid, 2009), for each MB in the I-frames, the distribution of its DCT coefficients is computed and compared to an expected distribution. To measure how much they differ, a slight variant of the normalized Euclidean distance is used. This distance is converted to a probability, based on which an MB is classified as double compressed or not. The authors in (Su and Xu, 2010) also utilized the distribution of quantized DCT coefficients from the MBs of I-frames, but used a different detection step. Another work exploiting the DCT coefficient distribution from I, P and B frames was presented in (Xu et al., 2012) and (Xu et al., 2013). A feature vector of dimension 3×3 is computed from 3 features (Xu et al., 2012): sum of squares due to error (SSE), root mean squared error (RMSE) and R-Square. A Support Vector Machine (SVM) classifier is used for classifying the input video as forged or not. In (Sun et al., 2012), the method used is similar to (Xu et al., 2012) and (Chen and Shi, 2009), but a 12-D feature is created by fitting the first digit distribution of Alternating Current (AC) coefficients from each I-frame alone, using a parametric logarithmic law.
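The periodic-spike cue used by (Wang and Farid, 2006) can be sketched numerically. The prediction-error sequence below is synthetic: a baseline error with one elevated value per GOP stands in for the P-frames that, after frame deletion, are predicted across what used to be a GOP boundary. The GOP size and noise level are illustrative choices.

```python
import numpy as np

# A once-per-GOP spike in the mean P-frame prediction error is a periodic
# signal, so the magnitude of its DFT shows strong peaks at harmonics of the
# GOP rate (bins that are multiples of N/G).
G, N = 12, 240                          # GOP size and number of P-frames
rng = np.random.default_rng(3)
err = rng.normal(1.0, 0.05, N)          # baseline prediction error
err[::G] += 2.0                         # one elevated error per GOP

spec = np.abs(np.fft.rfft(err - err.mean()))
peak = np.argmax(spec[1:]) + 1          # strongest non-DC frequency bin
print(peak % (N // G) == 0)             # -> True: peak at a harmonic of the GOP rate
```

In the original method, a human inspector looks for exactly such spikes in the spectrum; their absence on an untampered sequence is what makes the cue discriminative.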
In (Milani et al., 2012a), the authors proposed a method to recover the number of compression steps applied to a video sequence using multiple SVMs exploiting Benford's law. The statistics of the most significant digit of the quantized transform coefficients are used to generate feature vectors for training and testing the SVMs. Apart from double compression detection, (Vazquez-Padin et al., 2012) discusses a method to estimate the GOP size of the first compression if the video is doubly compressed. The idea is that, whenever an I-frame was re-coded as a P-frame, there is a noticeable increase in the number of I-MBs and a considerable reduction of S-MBs from the (n − 1)th frame to the nth frame, and a decrease in I-MBs and increase in S-MBs from the nth to the (n + 1)th frame, provided (n − 1), n and (n + 1) are all P-frames. This variation of the MB prediction types in the re-encoded P-frames, called VPF, is recorded and its periodicity is calculated. As it works with VPF, good performance can be expected with H.264, but the algorithm works with fixed GOP only.

In (Jiang et al., 2013), the authors modeled first order Markov statistics on the differences along the horizontal, vertical, major diagonal and minor diagonal directions of the quantized DCT coefficients. A Transition Probability Matrix (TPM) of size 9 × 9 is obtained from this. Each GOP is treated as a detection unit, which is classified as singly or doubly compressed by Fisher's linear discriminant (FLD) analysis. In (Ravi et al., 2014), compression noise is estimated from the video based on a Huber Markov Random Field (HMRF) and the maximum a posteriori (MAP) criterion, modeled as a first order Markov process. A TPM of size 3 × 3 is calculated in each of the eight directions, considering the 8-connected neighborhood in a 16 × 16 block. A clip is classified as forged when the number of frames classified as forged/double compressed is above a threshold. In (Subramanyam and Emmanuel, 2013), each pixel of a given frame is estimated from the spatially co-located pixels of all the other frames in a GOP; the error between the true and estimated value is subjected to a threshold to identify the double compressed frame or frames in a GOP. Recompression detection based on blocking artifacts is discussed in (Luo et al., 2008).

(Bestagini et al., 2012) proposed a method to identify the type of codec used in the first encoding step. They utilized the fact that when the reconstructed sequence from double compression is re-encoded with the same codec and coding parameters, it produces a sequence which is highly correlated with the input sequence. So they recompressed the reconstructed sequence with different codecs and coding parameters, looking for similarities between them. Most algorithms fail when the video is recompressed with the same encoding parameters used for the first compression. (Huang et al., 2014) proposed a method to handle this particular scenario alone. The basic assumption is that when a frame is recompressed with the same quantization matrix again and again, the number of different DCT coefficients between two sequential versions will monotonically decrease; i.e., the number of different coefficients between the I-frames of singly and doubly compressed MPEG-2 videos with the same bitrate will be larger than that between doubly and triply compressed MPEG-2 videos with the same bitrate.

Dealing with H.264/AVC coded video is difficult in forensics compared to other codecs. (Yammine et al., 2010) developed a method to find the GOP structure of a video based on the behavior of the noise variance along the decoded sequence. This could be used in video tampering detection where the GOP structure is not known. The Fast Noise Variance Estimation algorithm is used for measuring the noise. As I-frames have high or low noise power, they can be differentiated.
Autocorrelation gives high periodic peaks with a period equal to the GOP size. This method works better on sequences having just one scene; detection becomes more difficult in the presence of more scenes due to the difference in noise power between scenes. (Tagliasacchi and Tubaro, 2010) developed a method to find the quantization parameter in H.264 decoded video by using motion residuals at the block level, but it works with fixed GOP. Estimation of MPEG-2 parameter information from the decoded video stream, without access to the MPEG stream, is presented by (Li and Forchhammer, 2009). (Valenzise et al., 2010) estimated the quantization parameter and motion vectors from H.264 decoded video without using the encoded bitstream, but the use of the de-blocking filter affects performance. They took a 4 × 4 block size for motion estimation, whereas H.264 allows variable block sizes.

4. Detection of Region Tampering
In this section, methods that give information about the location of tampering in the spatial as well as temporal domain, such as copy-paste/region duplication tampering and splicing, are discussed. By copy-paste or copy-move tampering, we mean copying a small portion of a frame and pasting it at another location in the same frame, or copying particular regions from a sequence of frames and pasting them into another sequence of the same video, as shown in Fig.6(a). In this figure, the red frame regions indicate tampering. Fig.5 shows an example of copy-move tampering from the SULFA dataset provided by (Qadir et al., 2012): the images on the left side of the figure correspond to frames 101 to 105 of an original video, and those on the right correspond to the frames at the same positions in a tampered video, where the presence of a lady in the scene is concealed by copy-move forgery. Splicing in video tampering indicates the insertion of foreign objects, or the copying of frame regions from a different video and pasting them into the target video frames, as shown in Fig.6(b), where the green frame regions stand for the pasted portion copied from frames of a different video. Geometric transformations and retouching of the tampered regions make tampering detection a difficult task in video. Most of the methods discussed in this section are not capable of addressing copy-paste attacks with geometric transformations. A summary of the techniques discussed in this section is given in Table 2.

Figure 5: Copy-move tampering - the images in the left side of the figure (a,c,e,g,i) correspond to frames 101 to 105 of an original video and those on the right (b,d,f,h,j) correspond to the frames at the same position in a tampered video where the presence of a lady in the scene is concealed by copy-move forgery (frames taken from the SULFA dataset)

Figure 6: Region tampering (a) Copy-move forgery - indicated in red are small portions of frames copied from the video and pasted at a later sequence of frames of the same video, (b) Splicing - green frame regions stand for the pasted portion which is copied from frames of a different video
Table 2: Summary of region tampering detection techniques (Features Used | Ref. | Limitations)

Spatial & temporal correlation | (Wang and Farid, 2007a) (De-interlaced) | Compression artifacts and noise will degrade performance
| (Wang and Farid, 2007b) | Gives false positives for uniform areas like clouds; fails when duplication is done synchronously with the period of the moving camera; fails to detect duplication in static background video frames
Motion vector based | (Wang and Farid, 2007a) (Interlaced) | Compression artifacts and noise will degrade performance
| (Li et al., 2013) | Works well on static background videos compared to moving background; compression affects performance; may fail if tampering is done carefully
| (Pandey et al., 2014) | Sensitive to quantization noise, too high or too low illumination; content dependent; works well on static background videos only
Temporal noise | (Hsu et al., 2008) | Sensitive to noise, too high or too low illumination; works well on static background videos only
| (Kobayashi et al., 2010) | Deals with static scene videos only; performance depends on the codec used for video compression; post-processing in forgery such as tuning brightness and contrast affects the noise characteristics heavily
Ghost shadow artifact | (Zhang et al., 2009) | Cannot accurately locate the tampered areas in each frame; works well in static background videos only
Noise and quantization residue | (Chetty et al., 2010; Goodwin and Chetty, 2011) | Exact localization of forged objects in the video frame is not possible; bit rate reduction reduces the performance of the system
Motion residuals | (Chen et al., 2015) | Detection accuracy decreases with increase in compression ratio
Histogram of Oriented Gradients (HOG) | (Subramanyam and Emmanuel, 2012) | Works with fixed GOP; copy-paste tampering alone is addressed
VPF, histogram of DCT coefficients | (Labartino et al., 2013) | Presence of B-frames is not considered; works with Variable Bit Rate (VBR) coding only
Spatio-temporal coherence | (Lin and Tsay, 2014) | Performance decreases with increase in compression
Difference between current & non-tampered reference frame | (Su et al., 2015a) | Works with static background videos; detection accuracy decreases when the deleted foreground is very small or too fast moving
Zernike moments and 3D patch match | (D’Amiano et al., 2015) | Accuracy is very low
Optical flow | (Bidokhti and Ghaemmaghami, 2015) | Performance depends on Region of Interest (ROI) mask selection, which is a manual process; works with fixed GOP; fails when the displacement of forged parts is not a multiple of the GOP length; performance decreases in videos with a high amount of motion
(Wang and Farid, 2007a) developed a method for tamper detection in interlaced and de-interlaced video. The spatial and temporal correlations introduced by the de-interlacing algorithms may get destroyed due to tampering. This may disturb the motion across the fields of neighboring frames in de-interlaced videos, and the equality of the motion between the fields of a single frame in interlaced video. The authors also described a method to detect the frame rate conversion that might result after a video was manipulated. They proposed a technique to detect frame duplication and region duplication in (Wang and Farid, 2007b) using correlation. For tampering detection with a moving (panning surveillance) camera, the motion of the camera and the period of the scene view are also taken into account, which fails when duplication is done synchronously with the period.

In (Hsu et al., 2008), block-level correlation values of noise residuals are extracted, and their distribution in forged and normal video is modeled as a Gaussian mixture model (GMM). The Expectation-Maximization (EM) algorithm is used to estimate the parameters of the GMM. Based on these estimated parameters, a Bayesian classifier is employed to find the optimal threshold value. This method is not good for moving camera or dynamic background videos. In (Li et al., 2013), the magnitude and orientation of the motion vectors computed from adjacent frames are used to differentiate the authentic region and the forged region. The distribution of MVs is often uniform for normal movement compared to the tampered region, so the variance of the angle of the MVs in the tampered regions is larger than in the normal moving regions. A method to detect region tampered frames (added/removed objects) using features extracted from the motion residuals of each frame is proposed by (Chen et al., 2015). In order to make the system robust to variable GOP structure videos, collusion operators are used for generating the motion residues.
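The correlation cue behind duplication detection, as in (Wang and Farid, 2007b), can be sketched on co-located blocks. This is a minimal illustration on synthetic frames: the block position, the 0.99 threshold and the normalised correlation coefficient are illustrative choices; the actual method scans many block positions and also handles camera motion.

```python
import numpy as np

def block_corr(a, b):
    """Normalised correlation coefficient between two pixel blocks."""
    a = a.astype(float).ravel() - a.mean()
    b = b.astype(float).ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(7)
frames = [rng.integers(0, 256, (32, 32)) for _ in range(4)]
frames[3][8:24, 8:24] = frames[1][8:24, 8:24]   # region copied frame 1 -> frame 3

# Natural co-located blocks of distinct frames correlate weakly; a pasted
# copy correlates almost perfectly, which flags the frame pair.
suspicious = []
for j in range(len(frames)):
    for k in range(j + 1, len(frames)):
        if block_corr(frames[j][8:24, 8:24], frames[k][8:24, 8:24]) > 0.99:
            suspicious.append((j, k))
print(suspicious)   # -> [(1, 3)]
```

The frames here are independent noise, so only the deliberately duplicated pair crosses the threshold; in real footage the threshold must be tuned, since uniform areas (clouds, walls) correlate highly even without tampering, which is exactly the false-positive mode noted above.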
A sequence of frames centered at k is taken to compute the motion residue of the k-th frame. Feature extractors used in image steganalysis, such as CC-PEV, SPAM, CF*, CDF, SRM, CC-JRM and J+SRM (the abbreviations are taken from http://dde.binghamton.edu/download/feature_extractors/), are adopted in their work to extract forensic features from the motion residuals. Their experiments show that CC-PEV and J+SRM provide better results. An ensemble classifier consisting of multiple FLD base learners is adopted for classifying a frame as pristine/double compressed or double compressed/forged, where the ensemble classifier makes its decision using a majority voting strategy over the individual base learners. Usually, removal of moving objects by video inpainting causes ghost shadow artifacts. In (Zhang et al., 2009), these are detected from the inconsistencies between the moving foreground segmented from the video frames and the moving track obtained from accumulative frame differences. This method is robust to MPEG compression and recompression. In (Kobayashi et al., 2010), the authors developed an approach to detect suspicious regions in static-scene (surveillance) videos using noise characteristics. The variance of the irradiance-dependent noise in image signals is described by a noise level function (NLF). They used a probabilistic model that controls the characteristics of the noise at each pixel. Pixels in spliced regions are identified using MAP estimation of the noise model, since the NLFs of these regions are inconsistent with the rest of the video. In (Chetty et al., 2010) and (Goodwin and Chetty, 2011), video tamper detection in low-bandwidth Internet-streamed videos is discussed, using residue features (the noise and quantization residues) from intra-frame and inter-frame pixel sub-blocks, their transformation in the cross-modal subspace using Latent Semantic Analysis (LSA), Cross-modal Factor Analysis (CFA) and Canonical Correlation Analysis (CCA), and their subsequent multimodal fusion. In (Subramanyam and Emmanuel, 2012), Histogram of Oriented Gradients (HOG) features are computed from the cells of overlapping sub-blocks of each frame. The HOG feature of each block in a frame is compared with those of the rest of the blocks in the same frame to identify intra-frame copy-move forgery. For detecting inter-frame copy-move forgery, the HOG features of each block in a frame are compared with those of all blocks in another frame (taken from a fixed GOP size of 12).
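The block-matching idea behind such descriptor-based copy-move detection can be sketched as follows. This is a simplified stand-in, not the method of (Subramanyam and Emmanuel, 2012): the function names are ours, and a single gradient-orientation histogram per block replaces the full cell-based HOG descriptor. Each block gets a normalized orientation histogram, and highly correlated block pairs are flagged as duplication suspects.

```python
import numpy as np

def orientation_histogram(block, bins=9):
    """Simplified HOG-style descriptor: a histogram of gradient orientations,
    weighted by gradient magnitude, normalized to unit length."""
    gy, gx = np.gradient(block.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientations
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist

def find_duplicated_blocks(frame, block=16, threshold=0.995):
    """Compare the descriptor of every block against every other block in the
    same frame; highly correlated pairs are copy-move suspects."""
    h, w = frame.shape
    coords, feats = [], []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            coords.append((y, x))
            feats.append(orientation_histogram(frame[y:y+block, x:x+block]))
    feats = np.array(feats)
    suspects = []
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            if np.dot(feats[i], feats[j]) > threshold:
                suspects.append((coords[i], coords[j]))
    return suspects
```

In practice the threshold must be tuned, and smooth or weakly textured regions correlate trivially, which is why published detectors add a verification stage after descriptor matching.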
In (Labartino et al., 2013), the authors used (Vazquez-Padin et al., 2012) to estimate the GOP size of the first compression, locate the frames that are intra-coded twice, and perform a double quantization analysis on them. In (Lin and Tsay, 2014), an approach is proposed to detect tampering done by inpainting methods such as temporal copy-and-paste (TCP) and exemplar-based texture synthesis (ETS), which are used to fill the holes left by removed objects. Frame motion information is calculated from the grayscale-converted video for frame grouping and alignment, so that camera motion can be neglected in the subsequent analysis and detection steps. Spatio-temporal coherence analysis is performed over each frame group independently. This produces a group coherence abnormality pattern (GCAP), which is used to identify regions having unnaturally high or abnormally low coherence. Each spatio-temporal slice is compared to its GCAP to determine whether it is tampered. The method works with both static- and dynamic-background videos. In (Pandey et al., 2014), copy-move forgery detection is done separately in the spatial and temporal domains. SIFT with k-Nearest Neighbors (k-NN) matching is used for copy-move detection in the spatial domain, whereas cross-correlation of noise residue is used in the temporal domain. In (Su et al., 2015a), removal of moving foreground objects from static-background videos is detected. The difference between the current frame and a non-tampered reference frame is computed, features are extracted using k-SVD (k-Singular Value Decomposition) and projected into a lower-dimensional subspace, which is clustered by k-means, and the final result is obtained by combining the detection results for each frame. A method to detect copy-move forgery using Zernike moments and 3D patch match is discussed by (D'Amiano et al., 2015). The authors extended the 2D
patch match for images used in (Cozzolino et al., 2014) to account for temporal information. Though the method's accuracy is very low, it is rotation and scale invariant. Regional copy-move forgery detection based on Lucas-Kanade optical flow between adjacent frames is proposed by (Bidokhti and Ghaemmaghami, 2015). Each video frame is divided into two parts: a suspicious region (the ROI for attackers) and the remaining area. The optical flow (OF) and the optical flow variation factor (OPVF) of these two parts are computed separately. If secondary peaks are found in the OPVF of the two parts, the video is a forged one. Assuming a fixed GOP size, the peaks are periodic. Autocorrelation is used to find the periodic peaks in the OPVF of the two parts separately, and the ratios (rs and rr) between the largest and the second largest values of the autocorrelation in each of these sequences are computed. A thresholding approach on the ratios is used for tamper classification.

5. Detection of Video Inter-frame forgery
Video inter-frame forgery detection deals with the following four types of tampering in the temporal domain: (1) frame insertion - inserting frames from a different video (shown in red in Fig. 7(b)) into a target video; (2) frame deletion/removal - deleting frames from the video, indicated with dashed border lines in Fig. 7(c); (3) frame duplication - copying a sequence of frames and pasting it at another location in the same video, shown in Fig. 7(d), where frames 2 to 4 are copied and pasted after the 7th frame and no frames are deleted (the same can be combined with frame deletion to fill the gap left by deletion); (4) frame shuffling - changing the order of frames to change the order of events. Frame shuffling combined with duplication and deletion is shown in Fig. 7(e), where frames 2 to 5 are copied and pasted at positions 8 to 11 after shuffling. Fig. 7(a) shows the first 12 frames of a video. A summary of the techniques discussed in this section is given in Table 3.
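These four operations can be made concrete on a toy representation where a video is just a list of frame identifiers (the function names below are illustrative, not from the surveyed papers):

```python
# A video is modeled as a list of frame identifiers; each helper returns the
# identifier sequence of the forged video.

def insert_frames(video, foreign, at):
    """Frame insertion: splice frames from another video in at position `at`."""
    return video[:at] + foreign + video[at:]

def delete_frames(video, start, count):
    """Frame deletion: remove `count` consecutive frames starting at `start`."""
    return video[:start] + video[start + count:]

def duplicate_frames(video, start, count, paste_at):
    """Frame duplication: copy a clip and paste it elsewhere in the same video."""
    clip = video[start:start + count]
    return video[:paste_at] + clip + video[paste_at:]

def shuffle_overwrite(video, start, count, paste_at, order):
    """Frame shuffling with duplication and deletion: paste a re-ordered copy
    of the clip over the frames at `paste_at`, as in Fig. 7(e)."""
    clip = [video[start + i] for i in order]
    return video[:paste_at] + clip + video[paste_at + count:]
```

With `video = list(range(1, 13))` (the 12 frames of Fig. 7(a)), `duplicate_frames(video, 1, 3, 7)` reproduces Fig. 7(d): frames 2 to 4 reappear after the 7th frame.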
Table 3: Summary of Video Inter-frame forgery detection techniques

| Forgery Addressed | Ref. | Features Used | Limitations |
|---|---|---|---|
| Frame deletion | (Su et al., 2009) | MCEA in P-frames | Works with fixed GOP structure with at least 3 P-frames; fails when the number of frames deleted is an integral multiple of the GOP size; fails when the tampered video's impact factor and the original video's impact factor fall in the same interval; performance degrades in slow-motion videos |
| Frame deletion | (Su et al., 2011) | Periodic artifacts in the DCT coefficients of P- and B-frames | Works with fixed GOP structure; fails when the number of frames deleted is an integral multiple of the GOP size |
| Frame deletion | (Dong et al., 2012) | FFT of MCEA differences between adjacent P-frames | Works with fixed GOP structure only; fails when the number of frames deleted is an integral multiple of the GOP size |
| Frame deletion | (Liu et al., 2014) | Sequence of Average Residual of P-frames (SARP) | Fails when the number of frames deleted is an integral multiple of the GOP size |
| Frame deletion | (Feng et al., 2014) | Differences of mean motion residual of adjacent frames | Fails when the number of frames deleted is an integral multiple of the GOP size |
| Frame deletion | (Shanableh, 2013) | Prediction residuals, percentage of I-MBs, quantization scales and reconstruction quality | As bitrate increases, accuracy decreases; not good for surveillance videos where motion is very low |
| Frame duplication | (Lin et al., 2011) | Color histogram difference | Fails if the copied set of frames is shuffled before pasting |
| Frame duplication | (Yang et al., 2014) | Correlation of SVD features of frame sequences | Fails if the number of frames duplicated is less than the window size considered, and when frame duplication is done in a different order |
| Frame duplication | (Singh et al., 2015) | Correlation between suspicious frames | Compression may decrease performance since the method works with correlation; fails when frame duplication is done in a different order |
| Frame insertion & deletion | (Stamm et al., 2012) | P-frame prediction error sequence | Sensitive to noise |
| Frame insertion & deletion | (Wang et al., 2014a) | Differences of correlation coefficients of gray values between sequential frames | Increase in compression leads to decrease in performance; frame deletion accuracy is low |
| Frame insertion & deletion | (Gironi et al., 2014) | VPF | Works with fixed GOP size; fails when G1 = G2; localization of the cutting and insertion point is not as precise as in MV-based schemes |
| Frame insertion & deletion | (Zhang et al., 2015) | Quotients of consecutive correlation coefficients of local binary pattern coded frames | Can only report whether forgeries exist and cannot distinguish between frame insertion and deletion; performance decreases if the number of frames deleted is small |
| Frame insertion & deletion | (Zheng et al., 2015) | Block-wise Brightness Variance Descriptor (BBVD) | Works well on frame insertion when the inserted frames have a different background scene |
| Frame insertion & deletion | (Chao et al., 2013) | Optical flow | Detection accuracy decreases when the number of frames inserted or deleted is less than 25; works on videos recorded by stationary cameras and on forged videos having only one type of forgery, where insertion or deletion is performed once |
| Frame insertion & deletion | (Su et al., 2015b) | Sum of absolute differences between video frames before and after applying the deblocking filter | The deblocking filter and intra prediction in H.264/AVC will degrade performance; fails when the same rate control method is used to transcode the video |
| Frame deletion & duplication | (Wu et al., 2014) | Consistency of velocity field intensity | As compression increases, frame duplication detection accuracy decreases; works on videos recorded by static surveillance cameras |
| Frame insertion, deletion and duplication | (Wang et al., 2014b) | Optical flow | Detection accuracy decreases with increase in compression |
| Frame insertion, deletion, replacement and duplication | (Liu and Huang, 2015) | Zernike Opponent chromaticity moments (ZOCM) | Frame deletion accuracy is low; complicated backgrounds, frequent motion and compression affect performance |
| Frame repetition and deletion | (Gupta et al., 2015) | Motion energy at spatial region of interest (SROI), average object area and entropy | Works well on videos with high motion content |
| Fixed frame repetition and spatio-temporal region duplication | (Bestagini et al., 2013b) | Residual computed between adjacent frames, cross-correlation of residual | Good performance for cameras in stationary or slow-moving mode; fails in dynamic-background videos |
| Frame interpolation | (Bestagini et al., 2013a) | MVs, periodicity of squared prediction error | Compression affects performance; fails to detect downsampling when the interpolation factor >= 2 |
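Many of the tabulated techniques reduce to thresholding a per-transition consistency signal. As a minimal illustration (a crude version of the color histogram difference cue of (Lin et al., 2011), not any paper's full algorithm; the function name is ours), the following computes a histogram-difference signal whose spikes mark candidate tampering points:

```python
import numpy as np

def histogram_differences(frames, bins=16):
    """L1 distance between the normalized intensity histograms of consecutive
    frames; spikes in this signal mark abrupt content changes that inter-frame
    forgery detectors treat as candidate tampering points."""
    hists = []
    for f in frames:
        h, _ = np.histogram(f, bins=bins, range=(0, 256))
        hists.append(h / h.sum())
    hists = np.array(hists)
    return np.abs(np.diff(hists, axis=0)).sum(axis=1)
```

A real detector would combine such a signal with spatial checks (as Lin et al. do with block-level HD correlation), since scene cuts in authentic video produce the same spikes.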
(Su et al., 2009) used the motion-compensated edge artifact (MCEA) for frame deletion detection; this artifact is a side effect of blocking impairment and motion-compensated prediction, and appears in video codecs where block-based motion-compensated prediction is used. With coarse quantization, blocking artifacts may propagate from I-frames into subsequent frames and accumulate. Within a GOP, new high-frequency artifacts arise at block boundaries as the frame distance from the last I-frame increases. Video inter-frame forgery affects the temporal correlation between frames and hence disturbs the MCEA energy. This extra energy is detected, and an impact factor is calculated from the MCEAs of the first, second and third P-frames in a GOP, which characterizes the distribution of MCEA values of the P-frames in that GOP; based on this, a clip is classified as forged or not. It fails if an entire GOP or several GOPs are deleted, since no P-frame is then moved from one GOP to another. In (Lin et al., 2011), the video is divided into subsequences and each subsequence is used as a query clip to check for duplicates. The similarity between adjacent frames is evaluated using the color histogram difference (HD). The HDs of candidate and query clips are computed and the correlation between them is obtained. If it is greater than a threshold, the spatial correlation is computed by calculating the HD of each block between the query and candidate frames. A thresholding approach is then used to detect duplication. This method fails if the copied set of frames is shuffled before pasting. (Su et al., 2011) developed a method exploiting the periodic artifacts in the DCT
[Figure 7: Video inter-frame forgery. Panels: (a) original video sequence; (b) frame insertion; (c) frame deletion; (d) frame duplication; (e) frame shuffling.]
coefficients of P- and B-frames which occur due to tampering in MPEG videos. (Dong et al., 2012) used the Fast Fourier Transform (FFT) of the MCEA difference instead of the thresholding in (Su et al., 2009), owing to its limitations. Across the GOPs of a sequence, the difference of MCEA (dM) between adjacent P-frames is relatively steady. Video inter-frame forgery followed by recompression or a change of GOP structure may lead to greater motion-compensation errors in P-frames, producing periodic transitions in the dM distribution that show up as peak spikes in the Fourier domain. In (Stamm et al., 2012), frame addition and deletion are detected in both fixed and variable GOP structures. For codecs with a fixed GOP pattern, the prediction error increases periodically, which is exploited for detection; for variable GOPs, an energy detector is used. A method to detect an anti-forensic technique is discussed in (Stamm and Liu, 2011): it detects unusual zero-valued MVs by comparing the MVs of the video in question with the estimated MVs of the same video. (Bestagini et al., 2013b) proposed algorithms to detect whether a spatio-temporal region of a video sequence was replaced by either a series of fixed images repeated in time or a portion of the same video. In the first case, the algorithm detects the forgery by analyzing the footprint left on the residual computed between adjacent frames (exactly zero). In the second case, the forgery is detected by a correlation analysis similar to (Wang and Farid, 2007b), using the cross-correlation of small 3-dimensional residual blocks. In (Bestagini et al., 2013a), the authors proposed a method for detecting temporal interpolation in videos. In video sequences tampered with by temporal splicing, if the spliced sequences do not share the same frame rate, they have to be temporally
interpolated beforehand, often using motion-compensated interpolators that minimize visual artifacts. For detection, motion vectors are estimated from adjacent frames, and the periodicity of the squared prediction error in the frequency domain is used to find the interpolation factor. In (Liu et al., 2014), the Sequence of Average Residual of P-frames (SARP) is used in both the time and frequency domains to detect frame deletion in H.264-encoded videos. The SARP of P-frames that are shifted from one GOP to another after frame deletion and recompression will be high compared with that of the other P-frames in the same GOP. Thresholding is used to distinguish tampered videos from original ones. (Wu et al., 2014) proposed a method for detecting consecutive frame deletion and consecutive frame duplication based on the consistency of the velocity field. The Velocity Field Intensity (VFI) in the horizontal and vertical directions is computed using the Particle Image Velocimetry (PIV) technique. The consistency of the VFI sequences in both directions is destroyed if the video is manipulated by inter-frame forgery operations, so relative factors are computed to reveal these changes. The generalized extreme studentized deviate (ESD) test can detect one or more outliers in a univariate data set that follows an approximately normal distribution. Since the probability distribution of the relative-factor sequence of the VFIs is approximately normal, the ESD test is applied to identify the forgery types and locate the manipulated positions in forged videos. In (Feng et al., 2014), a method to detect the frame deletion point using the motion residual of video frames is proposed. To reduce false positives (rapid-motion videos have large motion residuals), the differences of the mean motion residual of adjacent frames are taken.
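As a rough sketch of this deletion-point cue (with the codec's true motion residual approximated by a plain frame difference, which is our simplification, not the method of (Feng et al., 2014)):

```python
import numpy as np

def deletion_point_scores(frames):
    """Mean 'motion residual' per frame transition (approximated here by the
    mean absolute difference of consecutive frames), followed by the absolute
    first difference of that sequence; a spike suggests a deletion point."""
    residual = np.array([np.abs(b - a).mean() for a, b in zip(frames, frames[1:])])
    return np.abs(np.diff(residual))
```

On a smoothly varying sequence the score stays near zero; removing a run of frames creates one abnormally large transition and hence a spike at the deletion point.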
Interference frames with statistical properties similar to the deletion point (DP) might be present in tampered sequences, such as relocated I-frames (RI - frames that were initially I-frames and became P-frames after recompression), frames with sudden lighting changes, frames with sudden zooming, etc. For a DP, however, the distribution of the residual may not be as uniform as that of an RI, i.e., the fluctuation is stronger at a DP than at an RI. The authors propose an algorithm based on the texture descriptor relative smoothness to quantify the fluctuation strength of the frame motion residual, which suppresses the RI spikes. The temporal content difference due to forgery is the reason for this unique fluctuation feature of the DP; hence, intra-prediction coding will lower the robustness of the feature. An Intra-Prediction Elimination (IPE) processing procedure is proposed to eliminate the intra-predicted blocks in P-frames. This exposes the temporal motion and enhances the robustness of the fluctuation features. A thresholding approach is used to locate the deletion point. In (Wang et al., 2014a), the differences of correlation coefficients of gray values are used to detect forgery. For original sequences they are consistent, whereas they are abnormal for inter-frame forgeries. The differences of correlation coefficients of gray values between sequential frames are calculated, normalized and quantized to form a vector that represents the statistical distribution of the differences. An SVM is used to classify original videos and inter-frame forgeries. (Gironi et al., 2014) extended the work in (Vazquez-Padin et al., 2012) for frame deletion or addition detection based on the phase change in periodicity. The correlation coefficients of adjacent local binary pattern (LBP)-coded frames are used for frame insertion and deletion detection in (Zhang et al., 2015). The correlation between them will be high, or very close, if the video is
authentic. Frame insertion or deletion leads to poor correlation at the insertion or deletion point. Variation of video content may also lead to low correlation coefficients; to handle such situations, the authors propose using quotients of consecutive correlation coefficients of local binary patterns (QCCoLBP). Frame insertion or deletion disturbs the continuity of QCCoLBP, which can therefore be used for tampering detection. Chebyshev's inequality is applied twice on QCCoLBP to detect the abnormal points. Video inter-frame forgery detection based on Lucas-Kanade optical flow between adjacent frames is proposed by (Chao et al., 2013) and (Wang et al., 2014b). In (Chao et al., 2013), the total optical flow values in the X direction of adjacent frames are almost the same for non-tampered videos, i.e., the optical flow is consistent, and likewise in the Y direction. Frame insertion and deletion forgery destroys this consistency. The video is divided into subsequences or windows and the optical flow between the first and last frames of each window is computed; if it is above a threshold, a binary searching scheme is used to detect the frame insertion point. Frame-to-frame optical flows are computed and double adaptive thresholds are applied to detect frame deletion forgery. In (Wang et al., 2014b), the sum of the magnitudes of the optical flow velocities is calculated for each frame in the video. An optical flow variation factor, revealing the relative changes in the optical flow sequence, is also computed for each frame from the magnitudes of the optical flow velocities of its immediate neighbors. A Gaussian-model-based statistical anomaly detection technique is employed for detecting frame insertion, duplication and deletion forgery. (Zheng et al., 2015) proposed a Block-wise Brightness Variance Descriptor (BBVD) based method for detecting frame insertion and frame deletion forgery.
It utilizes the persistence phenomenon of human vision: an image is retained in the human visual system for about 0.1 to 0.4 s after its disappearance. So, in normal videos, the ratio of variation in brightness between two frames within a certain interval (0.4 s) is close to a constant; a large variation indicates tampering. A similarity-analysis based method for frame duplication detection was proposed in (Yang et al., 2014). Features of each frame are obtained via Singular Value Decomposition (SVD). The video is divided into overlapping subsequences (with an overlap of one frame). In each subsequence, the Euclidean distance between the features of each frame and those of the reference frame (the first frame of the subsequence) is calculated and taken as the new feature of that frame. The similarities between subsequences are calculated using the correlation of these features, and video sequences with high similarity are identified as candidate duplications, which are then confirmed through random block matching. In (Singh et al., 2015), 9 features (1 from the mean of the frame/block, 4 from the ratio for each sub-block, and 4 from the residue of each sub-block) are extracted from each frame after dividing the frame into 4 sub-blocks. After feature extraction, lexicographical sorting is performed to group similar frames, and the RMSE between the features of adjacent frames is calculated after sorting. If the RMSE is less than a threshold, those frames are discarded; the rest are kept as suspicious. To detect duplicated frames, the correlation between suspicious frames is computed, and the authors consider frames with a correlation value close to 1 as duplicates. To avoid false positives, the 4 consecutive frames after the highly correlated ones are also checked for duplication; if these groups of frames are correlated, they are candidates for frame duplication. Video frames in RGB format are transformed to a 2D opponent chromaticity space by (Liu and Huang, 2015) for detection of frame insertion, deletion, replacement and duplication forgeries. This gives the chromaticity aberration of the frame. Zernike moment correlation is used to calculate the Zernike Opponent Chromaticity Moments (ZOCM) from the chromaticity space. Abnormal points are extracted based on the difference in ZOCM between adjacent frames; to reduce false positives, fine detection using the Tamura coarseness feature is proposed. (Gupta et al., 2015) proposed methods for detecting frame repetition and deletion. For detection of repeated frames, the mean square value of the motion energy in the video, computed using frame differencing at a spatial region of interest (SROI), is taken. To detect dropped or deleted frames, the average object area and the entropy of the difference between every pair of consecutive frames are computed and given as input to an SVM for tampered-frame classification. For detecting frame insertion and deletion forgery, (Su et al., 2015b) calculate the sum of absolute differences between the video frames before and after applying the deblocking filter and take its DFT; a high difference indicates tampering. This method works well with a fixed Quantization Parameter (QP) or smooth-content video. A method that uses the rate control mechanism to check the quantization parameters is also discussed: the QP of each frame and its relationship with the bitrate can be used to determine what QP should be assigned to the next frame, and if the actual QP value differs from the one that should have been assigned, tampering is indicated. This is suitable in situations where a complete GOP is removed. Machine learning techniques such as k-NN, logistic regression and SVM, with prediction residuals, percentage of intra-coded MBs, quantization scales and an estimate of the PSNR values as features, are used for frame deletion detection in (Shanableh, 2013). This method is suitable for variable GOP, VBR and CBR coding.

6. Anti-forensic Techniques
(Stamm and Liu, 2011) proposed an anti-forensic method to remove temporal fingerprints in tampered (frame addition or deletion followed by recompression) MPEG video sequences. It works by increasing the prediction error of certain P-frames (the motion vectors of certain MBs within the frame are set to zero) so that the P-frame prediction error sequence approximates a target prediction error sequence. (Su et al., 2015b) discuss an anti-forensic method in which, after decoding a video, the MB types of the frames and the quantization indices before and after the targeted frames are recorded as a reference to limit the coding modes of the targeted frames. In the second-round encoding, the indices are adjusted along with the MB types and residuals so that the edited video appears to be a normal one. The limitations discussed in Section 7 could be exploited by a perpetrator to develop anti-forensic techniques.

7. Discussion on Limitations of Existing Methods

Most of the present methods are very sensitive to camera settings and lighting conditions. Sudden lighting changes and automatic zooming of the lens will affect the performance of most algorithms discussed in this survey. Algorithms
that exploit the correlation of pixel values or residues are sensitive to compression artifacts and may fail on natural videos. Generally, methods developed to address one type of forgery are not capable of addressing other kinds of forgeries. The performance of these algorithms depends on the codec used for compression and on the video content; for example, methods capable of detecting forgeries in slow-motion videos will fail on rapid-motion videos, and methods designed for static backgrounds will fail on dynamic-background videos. Moreover, most of these algorithms assume a fixed GOP structure; only a few methods can detect tampering in variable-GOP-structure videos, and these work under some initial assumptions and limitations. Widely used video encoders such as H.264/AVC use an adaptive GOP structure where the GOP size can grow up to 250 frames depending on changes in the video content, so tampering detection techniques that exploit abnormal changes in error residues, motion vectors, relocated I-frames and fixed GOPs may fail on H.264-coded videos. The performance of algorithms that rely on changes in the quantized DCT coefficient distribution is very poor. Frame shuffling in static-scene videos is another problem that has not been much explored in this field. Geometric-transformation-independent features are not used for tampering detection in video copy-move forgeries. A recompressed video is not necessarily a tampered one, and compression-artifact based methods may fail on uncompressed tampered videos. Recompression of videos using the same encoding parameters as the first compression, removal of frames where the number of frames deleted is an integral multiple of the GOP size, and forgery detection in highly compressed videos are other problems to be solved.
8. Conclusion and Future Scope
With the advent of inexpensive and easy-to-use video acquisition devices (e.g., smartphones, camcorders) and video editing software tools (e.g., Adobe Premiere Pro, Mocha Pro by Imagineer Systems, CyberLink's PowerDirector), anyone can record a scene and alter its content. A video can be altered by replicating, removing, inserting or replacing frames, or objects within frames. There is growing interest in verifying the authenticity of videos in many settings (e.g., video submitted as evidence, the genuineness of Internet videos). In this survey, we have analyzed passive tampering detection methods based on the type of forgery they address. An in-depth study of the works in these categories is performed by identifying the features used, their capabilities and their limitations. We could not do a comparative study of the accuracy of these methods, as each of them used different video datasets for training and testing; most of these datasets were custom built to satisfy the authors' research assumptions and constraints. Hence, these research findings may or may not carry over to other datasets, including real scenarios. As of now, there exists no universal tool for video tampering detection. There is a need for effective and efficient methods capable of addressing the open issues discussed in Section 7. Furthermore, new techniques applicable to a wider range of video forgery detection have to be developed, and their anti-forensic counterparts should also be explored. We envision that this topic is a fertile and fruitful research area.
References
Ardizzone, E., Mazzola, G., 2015. Image Analysis and Processing — ICIAP 2015: 18th International Conference, Genoa, Italy, September 7-11, 2015, Proceedings, Part II. Springer International Publishing, Cham, Ch. A Tool to Support the Creation of Datasets of Tampered Videos, pp. 665–675. URL http://dx.doi.org/10.1007/978-3-319-23234-8_61

Bestagini, P., Allam, A., Milani, S., Tagliasacchi, M., Tubaro, S., March 2012. Video codec identification. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 2257–2260.
Bestagini, P., Battaglia, S., Milani, S., Tagliasacchi, M., Tubaro, S., 2013a. Detection of temporal interpolation in video sequences. In: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, pp. 3033–3037.
Bestagini, P., Milani, S., Tagliasacchi, M., Tubaro, S., 2013b. Local tampering detection in video sequences. In: Multimedia Signal Processing (MMSP), 2013 IEEE 15th International Workshop on. IEEE, pp. 488–493.

Bidokhti, A., Ghaemmaghami, S., March 2015. Detection of regional copy/move forgery in MPEG videos using optical flow. In: Artificial Intelligence and Signal Processing (AISP), 2015 International Symposium on. pp. 13–17.
Chao, J., Jiang, X., Sun, T., 2013. Digital Forensics and Watermarking: 11th International Workshop, IWDW 2012, Shanghai, China, October 31 – November 3, 2012, Revised Selected Papers. Springer Berlin Heidelberg, Berlin, Heidelberg, Ch. A Novel Video Inter-frame Forgery Model Detection Scheme Based on Optical Flow Consistency, pp. 267–281. URL http://dx.doi.org/10.1007/978-3-642-40099-5_22
Chen, S., Tan, S., Li, B., Huang, J., 2015. Automatic detection of object-based forgery in advanced video. IEEE Transactions on Circuits and Systems for Video Technology PP (99), 1–1.
Chen, W., Shi, Y. Q., 2009. Digital Watermarking. Springer-Verlag, Berlin, Heidelberg, Ch. Detection of Double MPEG Compression Based on First Digit Statistics, pp. 16–30. URL http://dx.doi.org/10.1007/978-3-642-04438-0_2

Chetty, G., Biswas, M., Singh, R., 2010. Digital video tamper detection based on multimodal fusion of residue features. In: Network and System Security (NSS), 2010 4th International Conference on. IEEE, pp. 606–613.

Cozzolino, D., Poggi, G., Verdoliva, L., Oct 2014. Copy-move forgery detection based on PatchMatch. In: 2014 IEEE International Conference on Image Processing (ICIP). pp. 5312–5316.

D'Amiano, L., Cozzolino, D., Poggi, G., Verdoliva, L., June 2015. Video forgery detection and localization based on 3D PatchMatch. In: Multimedia Expo Workshops (ICMEW), 2015 IEEE International Conference on. pp. 1–6.
Dong, Q., Yang, G., Zhu, N., 2012. A MCEA based passive forensics scheme for detecting frame-based video tampering. Digital Investigation 9 (2), 151–159.
Feng, C., Xu, Z., Zhang, W., Xu, Y., 2014. Automatic location of frame deletion point for digital video forensics. In: Proceedings of the 2nd ACM workshop on Information hiding and multimedia security. ACM, pp. 171–179.

Gironi, A., Fontani, M., Bianchi, T., Piva, A., Barni, M., 2014. A video forensic technique for detecting frame deletion and insertion. In: Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, pp. 6226–6230.
Goodwin, J., Chetty, G., 2011. Blind video tamper detection based on fusion of source features. In: Digital Image Computing Techniques and Applications (DICTA), 2011 International Conference on. IEEE, pp. 608–613.
Gupta, A., Gupta, S., Mehra, A., Feb 2015. Video authentication in digital forensic. In: Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE), 2015 International Conference on. pp. 659–663.

Hsu, C.-C., Hung, T.-Y., Lin, C.-W., Hsu, C.-T., 2008. Video forgery detection using correlation of noise residue. In: Multimedia Signal Processing, 2008 IEEE 10th Workshop on. IEEE, pp. 170–174.

Huang, Z., Huang, F., Huang, J., 2014. Detection of double compression with the same bit rate in MPEG-2 videos. In: Signal and Information Processing (ChinaSIP), 2014 IEEE China Summit & International Conference on. IEEE, pp. 306–309.
Jiang, X., Wang, W., Sun, T., Shi, Y. Q., Wang, S., 2013. Detection of double compression in MPEG-4 videos based on Markov statistics. Signal Processing Letters, IEEE 20 (5), 447–450.
Joshi, V., Jain, S., March 2015. Tampering detection in digital video - a review of temporal fingerprints based techniques. In: Computing for Sustainable Global Development (INDIACom), 2015 2nd International Conference on. pp. 1121–1124.
Kobayashi, M., Okabe, T., Sato, Y., 2010. Detecting forgery from static-scene video based on inconsistency in noise level functions. Information Forensics and Security, IEEE Transactions on 5 (4), 883–892.

Labartino, D., Bianchi, T., De Rosa, A., Fontani, M., Vazquez-Padin, D., Piva, A., Barni, M., 2013. Localization of forgeries in MPEG-2 video through GOP size and DQ analysis. In: Multimedia Signal Processing (MMSP), 2013 IEEE 15th International Workshop on. IEEE, pp. 494–499.

Li, H., Forchhammer, S., 2009. MPEG2 video parameter and no reference PSNR estimation. In: Picture Coding Symposium, 2009. PCS 2009. IEEE, pp. 1–4.

Li, L., Wang, X., Zhang, W., Yang, G., Hu, G., 2013. Detecting removed object from video with stationary background. In: Proceedings of the 11th International Conference on Digital Forensics and Watermarking. IWDW'12. Springer-Verlag, Berlin, Heidelberg, pp. 242–252. URL http://dx.doi.org/10.1007/978-3-642-40099-5_20
Lin, C.-S., Tsay, J.-J., 2014. A passive approach for effective detection and localization of region-level video forgery with spatio-temporal coherence analysis. Digital Investigation 11 (2), 120–140.
Lin, G.-S., Chang, J.-F., Chuang, C.-H., 2011. Detecting frame duplication based on spatial and temporal analyses. In: Computer Science & Education (ICCSE), 2011 6th International Conference on. IEEE, pp. 1396–1399.

Liu, H., Li, S., Bian, S., 2014. Detecting frame deletion in H.264 video. In: Information Security Practice and Experience. Springer, pp. 262–270.
Liu, Y., Huang, T., 2015. Exposing video inter-frame forgery by Zernike opponent chromaticity moments and coarseness analysis. Multimedia Systems, 1–16.
Luo, W., Wu, M., Huang, J., 2008. MPEG recompression detection based on block artifacts. In: Electronic Imaging 2008. International Society for Optics and Photonics, pp. 68190X–68190X.

Milani, S., Bestagini, P., Tagliasacchi, M., Tubaro, S., 2012a. Multiple compression detection for video sequences. In: Multimedia Signal Processing (MMSP), 2012 IEEE 14th International Workshop on. IEEE, pp. 112–117.

Milani, S., Fontani, M., Bestagini, P., Barni, M., Piva, A., Tagliasacchi, M., Tubaro, S., 2012b. An overview on video forensics. APSIPA Transactions on Signal and Information Processing 1, e2.
Pandey, R. C., Singh, S. K., Shukla, K., 2014. Passive copy-move forgery detection in videos. In: Computer and Communication Technology (ICCCT), 2014 International Conference on. IEEE, pp. 301–306.

Qadir, G., Yahaya, S., Ho, A., July 2012. Surrey university library for forensic analysis (SULFA) of video content. In: Image Processing (IPR 2012), IET Conference on. pp. 1–6.
Ravi, H., Subramanyam, A., Gupta, G., Kumar, B. A., 2014. Compression noise based video forgery detection. In: Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, pp. 5352–5356.
Richardson, I. E., 2003. H.264 and MPEG-4 video compression: video coding for next-generation multimedia. John Wiley & Sons.

Rocha, A., Scheirer, W., Boult, T., Goldenstein, S., 2011. Vision of the unseen: Current trends and challenges in digital image and video forensics. ACM Computing Surveys (CSUR) 43 (4), 26.

Shanableh, T., 2013. Detection of frame deletion for digital video forensics. Digital Investigation 10 (4), 350–360.

Sikora, T., 1997. Digital consumer electronics handbook. McGraw-Hill, Inc., Hightstown, NJ, USA, Ch. Digital Video Coding Standards, pp. 83–823. URL http://dl.acm.org/citation.cfm?id=275869.275882
Singh, V. K., Pant, P., Tripathi, R. C., 2015. Detection of frame duplication type of forgery in digital video using sub-block based features. In: Digital Forensics and Cyber Crime. Springer, pp. 29–38.
Stamm, M. C., Lin, W. S., Liu, K., 2012. Temporal forensics and anti-forensics for motion compensated video. Information Forensics and Security, IEEE Transactions on 7 (4), 1315–1329.
Stamm, M. C., Liu, K. R., 2011. Anti-forensics for frame deletion/addition in MPEG video. In: Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on. IEEE, pp. 1876–1879.
Su, L., Huang, T., Yang, J., Sep. 2015a. A video forgery detection algorithm based on compressive sensing. Multimedia Tools Appl. 74 (17), 6641–6656. URL http://dx.doi.org/10.1007/s11042-014-1915-4
Su, P.-C., Suei, P.-L., Chang, M.-K., Lain, J., 2015b. Forensic and anti-forensic techniques for video shot editing in H.264/AVC. Journal of Visual Communication and Image Representation 29, 103–113.

Su, Y., Nie, W., Zhang, C., 2011. A frame tampering detection algorithm for MPEG videos. In: Information Technology and Artificial Intelligence Conference (ITAIC), 2011 6th IEEE Joint International. Vol. 2. IEEE, pp. 461–464.

Su, Y., Xu, J., 2010. Detection of double-compression in MPEG-2 videos. In: Intelligent Systems and Applications (ISA), 2010 2nd International Workshop on. IEEE, pp. 1–4.
Su, Y., Zhang, J., Liu, J., 2009. Exposing digital video forgery by detecting motion-compensated edge artifact. In: Computational Intelligence and Software Engineering, 2009. CiSE 2009. International Conference on. IEEE, pp. 1–4.
Subramanyam, A., Emmanuel, S., 2012. Video forgery detection using HOG features and compression properties. In: Multimedia Signal Processing (MMSP), 2012 IEEE 14th International Workshop on. IEEE, pp. 89–94.
Subramanyam, A., Emmanuel, S., 2013. Pixel estimation based video forgery detection. In: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, pp. 3038–3042.

Sun, T., Wang, W., Jiang, X., 2012. Exposing video forgeries by detecting MPEG double compression. In: Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. IEEE, pp. 1389–1392.

Tagliasacchi, M., Tubaro, S., 2010. Blind estimation of the QP parameter in H.264/AVC decoded video. In: Image Analysis for Multimedia Interactive Services (WIAMIS), 2010 11th International Workshop on. IEEE, pp. 1–4.

Valenzise, G., Tagliasacchi, M., Tubaro, S., 2010. Estimating QP and motion vectors in H.264/AVC video from decoded pixels. In: Proceedings of the 2nd ACM workshop on Multimedia in forensics, security and intelligence. ACM, pp. 89–92.
Vazquez-Padin, D., Fontani, M., Bianchi, T., Comesaña, P., Piva, A., Barni, M., 2012. Detection of video double encoding with GOP size estimation. In: Information Forensics and Security (WIFS), 2012 IEEE International Workshop on. IEEE, pp. 151–156.
Wahab, A., Bagiwa, M., Idris, M., Khan, S., Razak, Z., Ariffin, M., Nov 2014. Passive video forgery detection techniques: A survey. In: Information Assurance and Security (IAS), 2014 10th International Conference on. pp. 29–34.

Wang, Q., Li, Z., Zhang, Z., Ma, Q., 2014a. Video inter-frame forgery identification based on consistency of correlation coefficients of gray values. Journal of Computer and Communications 2 (04), 51.
Wang, W., Farid, H., 2006. Exposing digital forgeries in video by detecting double MPEG compression. In: Proceedings of the 8th workshop on Multimedia and security. ACM, pp. 37–47.
Wang, W., Farid, H., 2007a. Exposing digital forgeries in interlaced and deinterlaced video. Information Forensics and Security, IEEE Transactions on 2 (3), 438–449.

Wang, W., Farid, H., 2007b. Exposing digital forgeries in video by detecting duplication. In: Proceedings of the 9th workshop on Multimedia & security. ACM, pp. 35–42.
Wang, W., Farid, H., 2009. Exposing digital forgeries in video by detecting double quantization. In: Proceedings of the 11th ACM workshop on Multimedia and security. ACM, pp. 39–48.
Wang, W., Jiang, X., Wang, S., Wan, M., Sun, T., 2014b. Digital-Forensics and Watermarking: 12th International Workshop, IWDW 2013, Auckland, New Zealand, October 1-4, 2013. Revised Selected Papers. Springer Berlin Heidelberg, Berlin, Heidelberg, Ch. Identifying Video Forgery Process Using Optical Flow, pp. 244–257. URL http://dx.doi.org/10.1007/978-3-662-43886-2_18
Wu, Y., Jiang, X., Sun, T., Wang, W., 2014. Exposing video inter-frame forgery based on velocity field consistency. In: Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, pp. 2674–2678.

Xu, J., Su, Y., Liu, Q., 2013. Detection of double MPEG-2 compression based on distributions of DCT coefficients. International Journal of Pattern Recognition and Artificial Intelligence 27 (01), 1354001.

Xu, J., Su, Y., You, X., 2012. Detection of video transcoding for digital forensics. In: Audio, Language and Image Processing (ICALIP), 2012 International Conference on. IEEE, pp. 160–164.

Yammine, G., Wige, E., Kaup, A., 2010. Blind GOP structure analysis of MPEG-2 and H.264/AVC decoded video. In: Picture Coding Symposium (PCS), 2010. IEEE, pp. 258–261.

Yang, J., Huang, T., Su, L., 2014. Using similarity analysis to detect frame duplication forgery in videos. Multimedia Tools and Applications, 1–19.
Zhang, J., Su, Y., Zhang, M., 2009. Exposing digital video forgery by ghost shadow artifact. In: Proceedings of the First ACM workshop on Multimedia in forensics. ACM, pp. 49–54.
Zhang, Z., Hou, J., Ma, Q., Li, Z., 2015. Efficient video frame insertion and deletion detection based on inconsistency of correlations between local binary pattern coded frames. Security and Communication Networks 8 (2), 311–320.
Zheng, L., Sun, T., Shi, Y.-Q., 2015. Digital-Forensics and Watermarking: 13th International Workshop, IWDW 2014, Taipei, Taiwan, October 1-4, 2014. Revised Selected Papers. Springer International Publishing, Cham, Ch. Interframe Video Forgery Detection Based on Block-Wise Brightness Variance Descriptor, pp. 18–30. URL http://dx.doi.org/10.1007/978-3-319-19321-2_2