J. Vis. Commun. Image R. 25 (2014) 1865–1877
Rapid detection of camera tampering and abnormal disturbance for video surveillance system

Deng-Yuan Huang a, Chao-Ho Chen b,*, Tsong-Yi Chen b, Wu-Chih Hu c, Bo-Cin Chen b

a Department of Electrical Engineering, Dayeh University, 168 University Rd., Dacun, Changhua 515, Taiwan, ROC
b Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, 415 Chien Kung Rd., Kaohsiung 807, Taiwan, ROC
c Department of Computer Science and Information Engineering, National Penghu University of Science and Technology, 300 Liu-Ho Rd., Makung, Penghu 880, Taiwan, ROC
Article info

Article history: Received 23 October 2013; Accepted 14 September 2014; Available online 23 September 2014

Keywords: Camera tampering; Camera motion; Camera occlusion; Background subtraction; Video surveillance system; Screen shaking; Defocus; Color cast

Abstract

Camera tampering may indicate that a criminal act is occurring. Common examples of camera tampering are turning the camera lens to point to a different direction (i.e., camera motion) and covering the lens by opaque objects or with paint (i.e., camera occlusion). Moreover, various abnormalities such as screen shaking, fogging, defocus, color cast, and screen flickering can strongly deteriorate the performance of a video surveillance system. This study proposes an automated method for rapidly detecting camera tampering and various abnormalities for a video surveillance system. The proposed method is based on the analyses of brightness, edge details, histogram distribution, and high-frequency information, making it computationally efficient. The proposed system runs at a frame rate of 20–30 frames/s, meeting the requirement of real-time operation. Experimental results show the superiority of the proposed method with an average of 4.4% of missed events compared to existing works.

© 2014 Elsevier Inc. All rights reserved.
1. Introduction

Video surveillance systems are widely used in the fields of environmental safety, traffic control, and crime prevention. Important public places such as government agencies, malls, schools, railway stations, airports, military bases, and historical sites are often equipped with digital camera recording systems for video surveillance. However, when cameras are tampered with, the video surveillance system fails to work properly. Moreover, long-term monitoring of a screen by operators is difficult and many video cameras are frequently left unattended. An intelligent video surveillance system that can automatically analyze live video content, detect suspicious activities, and trigger an alarm to notify operators is therefore desirable. Camera tampering is any sustained event which thoroughly alters the image seen by a video camera. Common examples of camera tampering in video surveillance systems are turning the camera lens to point to a different direction (i.e., camera motion) and covering the lens by opaque objects or with paint (i.e., camera occlusion). In general, camera problems are caused by: (1)
deliberate actions, such as camera motion and occlusion; (2) weather conditions, such as image blurring due to fogging; (3) abnormal disturbances, such as screen shaking, defocus, color cast, and screen flickering. For automated camera tampering and abnormality detection systems, high reliability and a relatively low false alarm rate are strongly desirable. Most research on the detection of camera tampering and abnormalities has focused on discovering events that move, cover, or defocus the camera [1–6] in a video surveillance system. However, other abnormalities such as screen shaking, fogging, color cast, and screen flickering have received less attention. Aksay et al. [1] proposed computationally efficient wavelet domain methods for the rapid detection of camera tampering and identified real-life security-related problems. Two algorithms were presented for detecting an obscured camera view and reduced visibility based on a learned background model together with the wavelet transform. However, camera tampering detection based on background modeling often suffers from instability due to varying light source intensity. Ribnick et al. [2] presented an approach to identify camera tampering by detecting large differences between older and more recent frames in video sequences, which are separately stored in two buffers, labeled as the short-term pool and the long-term pool, respectively. Three measures of image dissimilarity are then used to compare the frames to determine whether camera tampering has occurred. However,
several preset thresholds are required and need to be tuned manually for optimal performance. Sağlam and Temizel [3] proposed adaptive algorithms to detect and identify abnormalities in video surveillance when the camera lens is defocused, moved, or covered. In their method, background subtraction is utilized to build the absolute background, which is used to determine camera tampering types. In the detection of camera defocus, the discrete Fourier transform is used and then a Gaussian windowing function is applied to eliminate low-frequency content. By comparing the high-frequency components of the current frame image and its background, a defocused camera view can be detected. In the detection of a moved camera, a delayed background image is built and compared with the current background using a preset criterion. The detection of a covered camera is done using the peak histograms of the current frame and its background. However, many thresholds must be set. Lin and Wu [4] identified camera tampering by detecting edge differences and analyzing the grayscale histograms between current and previous frames. An adaptive non-background model image is compared with both incoming video frames and an updated background image for edge difference detection and abnormality justification. Three types of camera tampering and abnormality, namely occlusion, defocus, and motion, were detected in a timely fashion with an overall recognition rate of 94% in their test scenarios. However, differentiating between camera defocus and motion may be unstable if only edge difference information is used. A detection method for camera tampering was also proposed in [5]. More recently, a method that uses an adaptive background codebook model was utilized for classifying camera tampering into displacement and obstruction types [6]. In general, camera tampering may indicate that a criminal act might be happening. Detecting abnormalities and triggering alarms to notify operators may thus decrease crime. This study thus develops a system for the rapid detection of camera tampering and various abnormalities in a video surveillance system. To achieve this goal, a computationally efficient method for the rapid detection of various abnormalities, including screen shaking, fogging, color cast, and screen flickering, is proposed. Camera motion, occlusion, and defocus are also detected and compared with existing works.
The rest of this paper is organized as follows. The proposed method for detecting camera tampering and various abnormalities is introduced in Section 2. Experimental results are provided to demonstrate the performance of the proposed method in Section 3. Finally, concluding remarks are given in Section 4.

2. Proposed method

Fig. 1 shows a flowchart of the proposed method for the detection of camera tampering and other abnormalities, including fogging, defocus, color cast, and screen flickering. Screen shaking is first detected to determine whether the input images can be used to build absolute backgrounds. In this work, two backgrounds with a delay of n frames, i.e., B_t and B_{t−n}, are built when video frames are stable; otherwise, the alarm for screen shaking is triggered. Then, the difference between backgrounds B_t and B_{t−n} is evaluated to determine the types of camera tampering and various abnormalities. If the difference between them is larger than a threshold θ_B, camera motion or occlusion is determined; otherwise, fogging, defocus, color cast, or screen flickering is determined. Finally, a background update is carried out to respond in a timely manner to changes in the input video frames. In Section 2.1, the method of detecting screen shaking is described. Background modeling and updating are introduced in Sections 2.2 and 2.3, respectively. Then, the method of evaluating the difference between the backgrounds B_t and B_{t−n} is explained in Section 2.4. Finally, the methods of detecting camera motion and occlusion, and of detecting other abnormalities such as fogging, defocus, color cast, and screen flickering, are given in Sections 2.5 and 2.6, respectively.

2.1. Detection of screen shaking

Screen shaking is often caused by wind or vibrations from nearby vehicles. Screen shaking makes the absolute background unstable. As shown in Fig. 2(b) and (d), for shaking frames, the number of pixels with larger gray intensities in a frame-difference image is higher than that of pixels whose gray intensities are smaller. However, for stable frames, the number of pixels with smaller gray intensities in a frame-difference image is higher than that of pixels whose gray intensities are larger. Based on this observation,
Fig. 1. Flowchart of proposed method for the detection of camera tampering and other abnormalities.
Fig. 2. Results of temporal difference for two consecutive frames. (a) Shaking image, (b) temporal difference of (a), (c) normal image, and (d) temporal difference of (c).
the decision rule for determining whether the frame is shaking or stable is derived as [7]:

if N_s > θ_2, the frames are shaking; else, the frames are stable   (1)

where

N_s = Σ_{x,y} s(x,y),  with s(x,y) = 1 if |I_t(x,y) − I_{t−1}(x,y)| > θ_1 and s(x,y) = 0 otherwise   (2)

where I_t(x,y) and I_{t−1}(x,y) denote the gray intensities of pixel (x,y) at frame t and frame t−1, respectively, θ_1 and θ_2 are two thresholds, and N_s represents the number of pixels whose values are larger than θ_1 in the frame-difference image. The frames at t and t−1 are considered to be shaking if N_s is larger than θ_2, as shown in Eq. (1). However, for some special shaking cases, N_s may be below θ_2 due to the varying shaking speed of frames. To detect frame shaking more accurately, an iterative process for checking whether N_s is larger than θ_2 is required. The screen is considered to be shaking if N_{(N_s>θ_2)} > T, where N_{(N_s>θ_2)} denotes the number of times that N_s is larger than θ_2 in a given period, and T is the threshold on the number of checks, set to 12 times per second. Thresholds θ_1 and θ_2 can be determined from statistical data obtained from experiments, though they may vary with the content of the frame-difference image. The modified decision rule for determining whether the frame is shaking is:

if N_{(N_s>θ_2)} > T, the frames are shaking; else, the frames are stable   (3)

It is noted that "frames are shaking" means that the background of frame t has moved relative to that of frame t−1.
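To make the shaking test concrete, the following Python sketch (using NumPy) counts the large-difference pixels of Eq. (2) and applies the iterative rule of Eq. (3) over a short checking window. It is an illustrative approximation rather than the authors' implementation; the threshold values, the window length, and the class and function names are assumptions.

```python
import numpy as np

def frame_difference_count(frame_t, frame_t1, theta1=25):
    """N_s of Eq. (2): number of pixels whose absolute temporal difference
    between two consecutive grayscale frames exceeds theta1."""
    diff = np.abs(frame_t.astype(np.int16) - frame_t1.astype(np.int16))
    return int(np.count_nonzero(diff > theta1))

class ShakeDetector:
    """Iterative rule of Eq. (3): the screen is declared shaking when
    N_s > theta2 occurs more than T times within the checking window
    (roughly one second of frames)."""
    def __init__(self, theta1=25, theta2=5000, T=12, window=30):
        self.theta1, self.theta2 = theta1, theta2    # placeholder thresholds
        self.T = T                                   # allowed exceedances per window
        self.window = window                         # checks per window (e.g., 30 fps)
        self.exceed = 0                              # N_(Ns > theta2)
        self.checked = 0

    def update(self, frame_t, frame_t1):
        ns = frame_difference_count(frame_t, frame_t1, self.theta1)
        self.exceed += int(ns > self.theta2)
        self.checked += 1
        shaking = self.exceed > self.T               # Eq. (3)
        if self.checked >= self.window:              # start a new checking window
            self.exceed = 0
            self.checked = 0
        return shaking
```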
2.2. Modeling of absolute background
Motion detection in video sequences focuses on detecting regions corresponding to moving objects. Methods for this task can be roughly categorized as (1) background subtraction [8,9], (2) temporal differencing [10–12], and (3) optical flow approaches [13,14]. Background subtraction is suitable for detecting moving objects because the backgrounds in video streams are often stationary. In this paper, background modeling based on the temporal distribution of grayscales is built for each point, where the grayscale value with the maximum occurrence probability is assigned as the grayscale value of the absolute background. For details of background modeling, please refer to our previous study [15]. Here, the method of background modeling is briefly described. To facilitate the process of background extraction, a distribution of gray levels for a fixed point p(x, y) in consecutive frames of a video sequence is first built. Then, the probability of occurrence, also called the appearance probability (AP), for each gray level at that point is calculated. Next, gray levels at that point are grouped and assigned to classes, where each class comprises a certain range of gray levels centered at a specific gray level. Finally, the absolute background is built using the gray level with the maximum AP for each point. For the proposed method of background modeling, the mean and variance of the kth class for each point p(x, y) at the next frame t+1, i.e., μ_{t+1}^k(p) and Σ_{t+1}^k(p), are temporally updated using Eqs. (4) and (5), respectively.
μ_{t+1}^k(p) = [N_t^k(p)·μ_t^k(p) + I_t(p)] / [N_t^k(p) + 1]   (4)

Σ_{t+1}^k(p) = {N_t^k(p)·Σ_t^k(p) + [I_t(p) − μ_t^k(p)]²} / [N_t^k(p) + 1]   (5)
where N_t^k(p) is the number of points for the kth class at the current frame t, and I_t(p) is the grayscale value of point p(x, y). In this paper, a class is a bin grouping of grayscale values for each point. In the proposed background modeling, the grayscale value corresponding to the class with the maximum occurrence probability is then assigned as the grayscale value of the background. The appearance probability AP_t^k(p) of the kth class for each point is calculated using Eq. (6). The grayscale value of the mth class, which has the maximum appearance probability (AP), is assigned as the grayscale value of the background, as determined using Eq. (7).
AP_t^k(p) = N_t^k(p) / Σ_{c=0}^{Nc_t(p)−1} N_t^c(p),  where 0 ≤ k ≤ Nc_t(p) − 1   (6)

m = arg max_{0 ≤ c ≤ Nc_t(p)−1} AP_t^c(p)   (7)

where Nc_t(p) is the total number of classes for point p(x, y). Thus, the grayscale value of the background can be determined as:

B_t(p) = μ_t^m(p),  Σ_t(p) = Σ_t^m(p)   (8)

where B_t(p) and Σ_t(p) are the mean and variance of the background, respectively, established from the class m, which has the maximum AP value for each point p(x, y) at the current frame t. The criterion used to determine whether a point belongs to the background or the foreground is proposed as:

if |I_t(p) − B_t(p)| ≤ β·σ_t(p), then I_t(p) ∈ background; else, I_t(p) ∈ foreground   (9)

where σ_t(p) is the standard deviation of the background (i.e., the square root of Σ_t(p)), and β is the weight assigned to the frame-difference threshold, with a range of 1 ≤ β ≤ 5. In this work, β is set to 3, indicating that the probability of points with gray levels that fall within 3 standard deviations of the background is greater than 88.9% according to Chebyshev's inequality. This setting is reasonable for most situations.
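A minimal sketch of the class-based background model described by Eqs. (4)-(8) and the background/foreground test of Eq. (9) is given below. It assumes fixed-width grayscale bins as the classes and uses NumPy array indexing; the bin width, the default parameters, and all names are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np

class ClassBackgroundModel:
    """Per-pixel class (bin) statistics; the class with the highest
    appearance probability supplies the background value (Eqs. (4)-(8))."""
    def __init__(self, shape, n_classes=16):
        h, w = shape
        self.k = n_classes                                     # bins over [0, 255]; bin width is an assumption
        self.count = np.zeros((n_classes, h, w), np.float32)   # N_t^k(p)
        self.mean = np.zeros((n_classes, h, w), np.float32)    # mu_t^k(p)
        self.var = np.zeros((n_classes, h, w), np.float32)     # Sigma_t^k(p)

    def observe(self, gray):
        g = gray.astype(np.float32)
        cls = np.clip((gray.astype(np.int32) * self.k) // 256, 0, self.k - 1)
        idx = (cls, *np.indices(gray.shape))
        n, mu = self.count[idx], self.mean[idx]
        new_mu = (n * mu + g) / (n + 1.0)                      # Eq. (4): running class mean
        new_var = (n * self.var[idx] + (g - mu) ** 2) / (n + 1.0)  # Eq. (5): running class variance
        self.mean[idx], self.var[idx] = new_mu, new_var
        self.count[idx] = n + 1.0

    def background(self):
        """Eqs. (6)-(8): pick the class with the maximum appearance probability."""
        m = np.argmax(self.count, axis=0)
        idx = (m, *np.indices(m.shape))
        return self.mean[idx], self.var[idx]                   # B_t(p), Sigma_t(p)

    def foreground_mask(self, gray, beta=3.0):
        """Eq. (9): a pixel is background if it lies within beta standard
        deviations of the background mean; True marks foreground."""
        bg, var = self.background()
        sigma = np.sqrt(np.maximum(var, 1e-6))
        return np.abs(gray.astype(np.float32) - bg) > beta * sigma
```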
2.3. Background updating

Background updating is required to respond to changes in the environment. The performance of motion detection can be greatly enhanced by refreshing backgrounds. In this work, an iterative rule is used to carry out background updating. The background is updated by taking a weighted average of the current background and the current frame of the video sequence. Given the pixels classified as background or foreground in the current frame, the updating rules are:

B_{t+1}(p) = B_t(p) if I_t(p) ∈ foreground;  B_{t+1}(p) = (1 − α)·B_t(p) + α·I_t(p) if I_t(p) ∈ background   (10)

Σ_{t+1}(p) = Σ_t(p) if I_t(p) ∈ foreground;  Σ_{t+1}(p) = (1 − α)·Σ_t(p) + α·[I_t(p) − B_t(p)]² if I_t(p) ∈ background   (11)

where α ∈ [0, 1] represents the weight (or updating rate) assigned to the current frame relative to the background. In this paper, a typical value of α = 0.05 is used, as given elsewhere [16].

2.4. Evaluation of the difference of backgrounds

Prior to detecting the events of camera tampering and various abnormalities, the difference between the input and background images is calculated using the background subtraction scheme. If the difference between them is large, it implies the possibility of camera motion or occlusion. However, it is undesirable for pedestrians walking in front of the lens to trigger an alarm, so using only the input image and its background image to detect camera tampering is not reliable. To improve reliability, two absolute backgrounds with a delay of n frames, i.e., B_t and B_{t−n}, are built. In this work, n = 30 (obtained from experiments) is used. To determine how large the difference of the backgrounds is, the following criterion is proposed:

Σ_{x,y} s(x,y) ≥ θ_B·N_T,  where s(x,y) = 1 if B_D(x,y) ≠ 0 and s(x,y) = 0 otherwise   (12)

where B_D(x,y) = B_t(x,y) − B_{t−n}(x,y) is the background difference at pixel (x,y), B_t(x,y) and B_{t−n}(x,y) are the gray intensities of pixel (x,y) at background t and background t−n, respectively, N_T is the image size, and θ_B is a threshold (set to 0.7 from experiments). If Eq. (12) is satisfied, camera motion or occlusion is detected and the corresponding alarm is triggered; otherwise, fogging, defocus, color cast, and screen flickering are checked (see Fig. 1).
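The update rules of Eqs. (10)-(11) and the background-difference test of Eq. (12) could be combined as in the following sketch. The small gray-level tolerance used in place of a strict non-zero comparison, and the function names, are assumptions of this illustration.

```python
import numpy as np

def update_background(bg, var, frame, fg_mask, alpha=0.05):
    """Eqs. (10)-(11): blend the current frame into the background only at
    pixels classified as background; foreground pixels are left unchanged."""
    frame = frame.astype(np.float32)
    bg_mask = ~fg_mask
    new_bg, new_var = bg.copy(), var.copy()
    new_bg[bg_mask] = (1 - alpha) * bg[bg_mask] + alpha * frame[bg_mask]
    new_var[bg_mask] = (1 - alpha) * var[bg_mask] + alpha * (frame[bg_mask] - bg[bg_mask]) ** 2
    return new_bg, new_var

def tampering_suspected(bg_t, bg_tn, theta_b=0.7):
    """Eq. (12): fraction of pixels whose values differ between the current
    background B_t and the delayed background B_{t-n} (n = 30 frames in the
    paper).  A tolerance of +/-2 gray levels replaces the strict non-zero
    test here; that tolerance is an assumption of this sketch."""
    changed = np.abs(bg_t.astype(np.int16) - bg_tn.astype(np.int16)) > 2
    return changed.mean() >= theta_b
```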
Fig. 3. Flowchart of proposed method for detection of camera motion and occlusion.
Fig. 4. Histograms of (a) normal, (b) lens covered, (c) lens painted, and (d) absolute background images.
2.5. Detection of camera motion and occlusion

Fig. 3 shows the proposed method for the detection of camera motion and occlusion. In this work, camera motion includes events that move the camera or make it point to a different direction, and camera occlusion includes events that cover the lens by opaque objects or with paint. To achieve more accurate detection of camera occlusion, two decision rules must be satisfied, based on histogram analysis and edge detection of the current frame and its background. If these two decision rules are satisfied, lens covering or painting is considered; otherwise, the camera is detected as having been moved.

Fig. 4 shows histogram distributions for normal, lens covered, lens painted, and background images. The distributions in Fig. 4(b) and (c) are more concentrated than those in Fig. 4(a) and (d). Based on this observation, the number of pixels around the maximum histogram value in the obscured image (see Fig. 4(b)) and its background (see Fig. 4(d)) can be used to determine whether the lens has been covered or painted. Therefore, the first decision rule, using histogram analysis, is:

(Σ_{k=−n}^{n} H_{f0+k})_CF ≥ (Σ_{k=−n}^{n} H_{f0+k})_BI · θ_occlusion   (13)

where subscripts CF and BI denote the current frame (i.e., the obscured image) and its background image, respectively, f_0 is the grayscale value corresponding to the maximum histogram value in the obscured image, and H_{f0} is the histogram value at grayscale f_0. The thresholds n = 2 and θ_occlusion = 1.5 are set from experiments. Further analysis of the edge information of an obscured image and its background image, as shown in Fig. 5, reveals that edge information is lost when a frame is obscured. Therefore, the second decision rule, using edge information, is proposed as:

S_CF ≤ S_BI · θ_edge   (14)

where S_CF and S_BI denote the numbers of edge pixels in the obscured image and the background image, respectively, and θ_edge is a proportional ratio, set to 0.7 from experiments. Hence, when the criteria in Eqs. (13) and (14) are both satisfied, the event is considered as lens covering or painting; otherwise, camera movement is detected.
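The two decision rules of Eqs. (13) and (14) can be prototyped as follows with OpenCV histograms and a Canny edge map; the paper does not specify the edge detector or its settings, so the Canny thresholds here are placeholders, and the function names are illustrative. Both inputs are assumed to be 8-bit grayscale images.

```python
import cv2
import numpy as np

def peak_mass(gray, n=2):
    """Sum of histogram bins within +/-n gray levels of the histogram peak."""
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    f0 = int(np.argmax(hist))
    lo, hi = max(0, f0 - n), min(255, f0 + n)
    return float(hist[lo:hi + 1].sum())

def classify_tampering(frame_gray, background_gray,
                       theta_occlusion=1.5, theta_edge=0.7, n=2):
    """Decision rules of Eqs. (13)-(14): an occluded view shows a histogram
    concentrated around one peak and a large loss of edge pixels; otherwise
    the large background change is attributed to camera motion."""
    rule1 = peak_mass(frame_gray, n) >= theta_occlusion * peak_mass(background_gray, n)
    edges_frame = np.count_nonzero(cv2.Canny(frame_gray, 50, 150))
    edges_bg = np.count_nonzero(cv2.Canny(background_gray, 50, 150))
    rule2 = edges_frame <= theta_edge * edges_bg
    return "occlusion" if (rule1 and rule2) else "motion"
```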
Fig. 5. Results of edge detection for (a) normal and (b) obscured images.
2.6. Detection of other abnormalities

Other abnormalities considered in this work include fogging, defocus, color cast, and screen flickering. The detection methods are proposed and described in Sections 2.6.1 to 2.6.4. Generally, both blurring and fogging lead to edge reduction, but their causes are different. Fogging is due to weather conditions, while defocus can result from camera damage, mist, or water droplets. For a fogged image, the color saturation is greatly attenuated due to the scattering characteristics of airlight, but this is not the case for a blurred image. Moreover, the grayscale histogram of a fogged image is more concentrated in the central portion than that of a normal image, which cannot be deduced for a blurred image. Therefore, it is necessary to distinguish the cases of blurring and fogging for an image prior to detecting them.

Fig. 6 shows the proposed method for determining whether an input image is blurred or fogged. As shown in Fig. 6, the input image is first converted from the RGB color space to HSI. The average saturation S_AVG is then calculated from the saturation channel of the whole image. If S_AVG < 0.04, the image might be a fogged one; otherwise, further grayscale histogram analysis is performed. Note that the value of 0.04 was observed experimentally from 35 video sequences. However, because the condition S_AVG < 0.04 is quite loose, the ratio of pixels with saturation less than 0.04 in the whole image is also estimated as:

R_{S<0.04} = (1/MN)·Σ_{y=0}^{N−1} Σ_{x=0}^{M−1} n(x,y),  where n(x,y) = 1 if S(x,y) < 0.04 and n(x,y) = 0 otherwise   (15)

where S(x,y) is the saturation value at pixel (x,y), and M and N denote the height and width of the image, respectively. If R_{S<0.04} > 60%, the input image is determined to be a fogged one and the fogging subroutine described in Section 2.6.1 is applied. However, if either of the conditions S_AVG < 0.04 or R_{S<0.04} > 60% is not satisfied, further grayscale histogram analysis is required. Suppose the whole grayscale range of an image is divided into four parts; the middle range is defined as the central two parts, i.e., 64 ≤ I(x,y) ≤ 191. The ratio of pixels that have gray intensities in the middle range can then be estimated as:

R_{I_MIDDLE} = (1/MN)·Σ_{y=0}^{N−1} Σ_{x=0}^{M−1} n(x,y),  where n(x,y) = 1 if 64 ≤ I(x,y) ≤ 191 and n(x,y) = 0 otherwise   (16)

As described earlier, a fogged image has a more concentrated grayscale distribution than a defocused image. Therefore, if R_{I_MIDDLE} > 95%, the input image is determined to be a fogged one; otherwise, it is a defocused image and the corresponding subroutine described in Section 2.6.2 is applied.

Fig. 6. The proposed method for distinguishing fogging and defocus.
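The routing between the fogging and defocus subroutines (Fig. 6, Eqs. (15)-(16)) can be sketched as below. HSV is used as a readily available stand-in for the HSI space of the paper, and the function name and return labels are illustrative; both substitutions are assumptions of this sketch.

```python
import cv2
import numpy as np

def classify_blur_type(bgr):
    """Fig. 6 / Eqs. (15)-(16): low average saturation together with a high
    ratio of low-saturation pixels points to fog; otherwise a grayscale
    histogram concentrated in the middle range (64..191) points to fog, and
    the remaining cases are treated as defocus."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    s = hsv[:, :, 1].astype(np.float32) / 255.0          # saturation in [0, 1]
    s_avg = float(s.mean())
    r_low_sat = float((s < 0.04).mean())                  # Eq. (15)
    if s_avg < 0.04 and r_low_sat > 0.60:
        return "fogging"
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    r_middle = float(((gray >= 64) & (gray <= 191)).mean())   # Eq. (16)
    return "fogging" if r_middle > 0.95 else "defocus"
```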
2.6.1. Detection of fogging

Two decision rules, shown in Fig. 7, are proposed to determine whether the frame has been fogged, based on the Lambert–Beer principle [17]. As shown in Fig. 8(c) and (d), the histogram of a fogged image is more concentrated than that of a fog-free one, indicating a lower contrast for the fogged image. Fewer edge details are the most important feature of fogged images compared to fog-free images, as shown in Fig. 9(b) and (e). Based on this observation, the Sobel edge detection method is suitable for determining frame fogging. As shown in Fig. 9(c) and (f), the total number of pixels with grayscale values of 250–255 in the edge image is much lower for fogged images than for fog-free ones. Therefore, if the first decision rule, given in Eq. (17), is satisfied, the current frame is possibly a fogged image.
(n_{250≤I≤255}/MN) < θ_E  and  (n_{I=250}/MN) < θ_E   (17)

where n_{250≤I≤255} = Σ_{I=250}^{255} n_I and n_{I=250} denote the total number of pixels with grayscale values of 250–255 and the number of pixels with a grayscale value of 250, respectively, and θ_E is a threshold, set to 0.001 from experiments.
Fig. 7. Flowchart of proposed method for fogging detection.
Fig. 8. Analysis of gray intensities for fogged and fog-free images. (a) Fogged image, (b) fog-free image, (c) histogram of (a), and (d) histogram of (b).
As suggested in [17], fogging can be roughly divided according to the degree of concentration into four levels, i.e., heavy fog, medium fog, slight fog, and fog-free. In this work, a measure z = μ + σ is proposed to determine the fogging level, where μ and σ denote the mean value and standard deviation of the edge image, respectively:

μ = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} I_E(x,y) / MN   (18)

σ = sqrt( Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} [I_E(x,y) − μ]² / MN )   (19)

where I_E(x,y) is the grayscale value at pixel (x,y) of the edge image. In our work, z = 0–45, 45–75, 75–90, and >90 correspond to the levels of heavy fog, medium fog, slight fog, and fog-free, respectively. Hence, the second decision rule is z > 90, which indicates that the frame is fog-free. Fogging and its level can be determined from the two proposed decision rules, as shown in Fig. 7. Note that the fogging levels only indicate the degree of concentration of fog.
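A possible realization of the two fogging rules (Eq. (17) together with the z = μ + σ levels of Eqs. (18)-(19)) is sketched below. The Sobel kernel size and the scaling of the gradient magnitude to an 8-bit edge image are assumptions, since the paper does not state them.

```python
import cv2
import numpy as np

FOG_LEVELS = [(45, "heavy fog"), (75, "medium fog"), (90, "slight fog")]

def fog_level(gray, theta_e=0.001):
    """Section 2.6.1: rule 1 (Eq. (17)) requires very few near-saturated
    pixels in the Sobel edge image; rule 2 thresholds z = mean + std of the
    edge image against the level boundaries quoted in the paper."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    edge = np.clip(cv2.magnitude(gx, gy), 0, 255).astype(np.uint8)
    mn = edge.size
    rule1 = (np.count_nonzero(edge >= 250) / mn < theta_e and
             np.count_nonzero(edge == 250) / mn < theta_e)
    if not rule1:
        return "fog-free"
    z = float(edge.mean() + edge.std())        # z = mu + sigma of the edge image
    for bound, label in FOG_LEVELS:
        if z <= bound:
            return label
    return "fog-free"                          # second rule: z > 90 means no fog
```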
Fig. 9. Histogram distribution of edge image for fogging and fog-free cases. (a) Fog-free image, (b) Sobel edge image of (a), (c) histogram distribution of (b), (d) fogged image, (e) Sobel edge image of (d), and (f) histogram distribution of (e).
Fig. 10. Variation of cosine spectrum for normal and defocus images. (a) Normal image, (b) cosine spectrum of (a), (c) defocused image of (a), and (d) cosine spectrum of (c).
2.6.2. Detection of defocus

Defocus causes blurring due to a loss of edge details and may be caused by camera damage, mist, or water droplets. The discrete cosine transform (DCT) is suitable for the detection of defocus because the change of edge details due to blurring is reflected in the distribution of the power spectrum. As shown in Fig. 10(b) and (d), the cosine spectrum is clearly different for normal and defocused images, specifically in the high-frequency region. In addition, the number of zero coefficients in the cosine spectrum at high frequency increases with increasing loss of image focus. Therefore, the number of nonzero coefficients in the cosine spectrum, called the content of high-frequency components, can be used as a measure to evaluate whether the frame is defocused. It is denoted as Q_t^HF, where the subscript t indicates that the value is calculated at frame t:

Q_t^HF = Σ_{y=M/2}^{M−1} Σ_{x=N/2}^{N−1} s(x,y),  where s(x,y) = 1 if f(x,y) > 0 and s(x,y) = 0 otherwise   (20)

where f(x,y) is the coefficient of the cosine spectrum at pixel (x,y), and M and N denote the height and width of the cosine spectrum image, respectively. Note that in Eq. (20), Q_t^HF is calculated only in the high-frequency region. The proposed decision rule for determining whether the frame is defocused is:

Q_t^HF < α_L,  where α_L = Q_Base^HF · β_L   (21)
where β_L is set to 0.7 from experiments, and Q_Base^HF is a base value that is initially determined from the first frame of the input video sequence and updated by the rule:

Q_Base^HF = Q_t^HF if Q_t^HF > α_U; otherwise, Q_Base^HF is kept unchanged, where α_U = Q_Base^HF · β_U   (22)

where β_U is set to 0.95 from experiments. The complete flowchart of the detection of defocus is shown in Fig. 11.
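The DCT-based defocus check of Eqs. (20)-(22) might be implemented as follows. The epsilon used to decide whether a floating-point DCT coefficient counts as non-zero, and the cropping to even dimensions required by cv2.dct, are assumptions of this sketch.

```python
import cv2
import numpy as np

class DefocusDetector:
    """High-frequency content check of Eqs. (20)-(22): count the non-zero DCT
    coefficients in the bottom-right (high-frequency) quadrant and compare
    them against a slowly refreshed base value."""
    def __init__(self, beta_l=0.7, beta_u=0.95, eps=1e-3):
        self.beta_l, self.beta_u, self.eps = beta_l, beta_u, eps
        self.q_base = None

    def high_freq_count(self, gray):
        g = gray.astype(np.float32)
        g = g[: g.shape[0] // 2 * 2, : g.shape[1] // 2 * 2]   # cv2.dct needs even dimensions
        spec = cv2.dct(g)
        h, w = spec.shape
        hf = spec[h // 2:, w // 2:]                            # high-frequency quadrant
        return int(np.count_nonzero(np.abs(hf) > self.eps))   # Eq. (20)

    def is_defocused(self, gray):
        q = self.high_freq_count(gray)
        if self.q_base is None:
            self.q_base = q                                    # initialised from the first frame
            return False
        defocused = q < self.beta_l * self.q_base              # Eq. (21)
        if q > self.beta_u * self.q_base:                      # Eq. (22): refresh the base value
            self.q_base = q
        return defocused
```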
Fig. 11. Flowchart of proposed method for defocus detection.
2.6.3. Detection of color cast

A color cast is a tint of a particular color, usually undesirable, which affects an image uniformly. Color cast often arises from incorrect settings of white balance, a weak or abnormal video signal
Fig. 12. Color cast caused by (a) incorrect settings of white balance, (b) abnormal video signal input, and (c) monitor damage.
Fig. 13. Histogram distributions of B, G, and R color channels for (a) normal and (b) color cast images. (c) Histogram for B, G, and R color channels of (a) with average gray intensities of 116, 113, and 119, respectively, and (d) histogram for B, G, and R color channels of (b) with average gray intensities of 141, 107, and 94, respectively. Note that these two images satisfy decision rule 1.
Fig. 14. Flowchart of proposed method for color cast detection.
input, and monitor damage, as shown in Fig. 12. Different white balance settings are required to compensate for unnatural color due to the color temperature of a given light source. In general, human eyes do not perceive the unnatural color because our eyes and brains can automatically adjust and compensate for different types of light in ways that cameras cannot. In general, when video cameras do not have a color cast, the average gray intensities of R, G, and B color channels are similar. The difference between them becomes large when there is a color cast [18], as shown in Fig. 13. Two decision rules based on the analysis of gray intensities for all color channels, as shown in Fig. 14, are proposed to detect color cast. Let Ravg, Gavg, and Bavg be the average gray intensities of R, G, and B color channels, respectively, and Lmax = max {Ravg, Gavg, Bavg} and Lmin = min {Ravg, Gavg, Bavg} denote the maximum and minimum values of the average gray intensities for all color channels, respectively. The first proposed decision rule for determining the presence of color cast is:
|L_max − L_min| > θ_cc   (23)
where θ_cc is set to 30 from experiments. Generally, human eyes can perceive images with a color cast only when the histogram distributions of the color channels differ significantly (see Fig. 13). Hence, images can be considered to have a color cast when the biggest difference between color channels is greater than 30, as given in Eq. (23). However, in some special situations, considering only the difference of the global average values of the color channels can fail to detect a color cast. To avoid such misjudgment, local details in images should also be considered.
Let Ccc be the channel with color cast, which is determined from the difference between color channels, i.e., Rdiff, Gdiff, and Bdiff, as:
C_cc = arg max_{color channel} {R_diff, G_diff, B_diff},  where
R_diff = |R_avg − B_avg| + |R_avg − G_avg|
G_diff = |G_avg − B_avg| + |G_avg − R_avg|
B_diff = |B_avg − G_avg| + |B_avg − R_avg|   (24)

Based on experimental observations, when video cameras have a color cast, the whole image has a uniform tint toward a particular color, and hence even local blocks do, as shown in Fig. 15. Define the local light and dark regions in the channel with color cast as R_light = {f | f = I_max^Ccc − n to I_max^Ccc} and R_dark = {f | f = I_min^Ccc to I_min^Ccc + n}, respectively, where n is set to 5, and I_min^Ccc and I_max^Ccc denote the minimum and maximum grayscale values in the histogram of the channel with color cast, respectively. The sets R_light and R_dark can then be used to calculate the average intensities in these two local regions for all color channels as:

I_light^c = (1/N_light^c)·Σ_{f=I_max^Ccc−n}^{I_max^Ccc} n_f^c·f,  c = R, G, B,  where N_light^c = Σ_{f=I_max^Ccc−n}^{I_max^Ccc} n_f^c

I_dark^c = (1/N_dark^c)·Σ_{f=I_min^Ccc}^{I_min^Ccc+n} n_f^c·f,  c = R, G, B,  where N_dark^c = Σ_{f=I_min^Ccc}^{I_min^Ccc+n} n_f^c   (25)
Fig. 15. Histogram distributions of blocks A and B for color cast image. (a) Color cast image, (b) magnification of block A, (c) magnification of block B, (d) histograms for B, G, and R color channels of block A with average gray intensities of 148, 202, and 203, respectively, and (e) histograms for B, G, and R color channels of block B with average gray intensities of 18, 77, and 73, respectively. Note that Blocks A and B satisfy decision rule 1.
where n_f^c is the number of pixels at grayscale value f for color channel c, and N_light^c and N_dark^c denote the total numbers of pixels in sets R_light and R_dark for color channel c, respectively. Subsequently, the difference of the average intensities between the color cast channel and the other color channels in the light and dark regions is calculated as:

ΔI_light^c = |I_light^Ccc − I_light^c|  and  ΔI_dark^c = |I_dark^Ccc − I_dark^c|,  for c ≠ C_cc   (26)

Based on the values of ΔI_dark^c and ΔI_light^c, the second proposed decision rule is:

ΔI_dark^c > θ_cc and ΔI_light^c > θ_cc for any color channel c ≠ C_cc   (27)
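The two color cast rules (Eqs. (23)-(27)) can be prototyped as below. Here R_light and R_dark are interpreted as the pixels whose cast-channel values lie within n gray levels of that channel's maximum and minimum, and the handling of empty regions is an added safeguard; both points, like the function name, are assumptions of this sketch rather than details stated in the paper.

```python
import numpy as np

def has_color_cast(bgr, theta_cc=30, n=5):
    """Two-rule color cast check of Section 2.6.3 (Eqs. (23)-(27)): rule 1
    compares the global means of the B, G, R channels; rule 2 compares the
    channels again inside the locally lightest and darkest regions of the
    dominant (cast) channel."""
    chans = [bgr[:, :, i].astype(np.float32) for i in range(3)]   # B, G, R
    means = [float(c.mean()) for c in chans]
    if max(means) - min(means) <= theta_cc:                       # Eq. (23)
        return False
    # Eq. (24): channel with the largest summed distance to the other two
    diffs = [abs(means[i] - means[(i + 1) % 3]) + abs(means[i] - means[(i + 2) % 3])
             for i in range(3)]
    cc = int(np.argmax(diffs))
    cast = chans[cc]
    light = cast >= cast.max() - n     # R_light: near-maximum pixels of the cast channel
    dark = cast <= cast.min() + n      # R_dark: near-minimum pixels of the cast channel
    if not light.any() or not dark.any():
        return False                   # safeguard, not part of the paper's rules
    for c in range(3):                 # Eqs. (25)-(27)
        if c == cc:
            continue
        d_light = abs(cast[light].mean() - chans[c][light].mean())
        d_dark = abs(cast[dark].mean() - chans[c][dark].mean())
        if d_light > theta_cc and d_dark > theta_cc:
            return True
    return False
```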
2.6.4. Detection of screen flickering

A flickering screen can contribute to eyestrain and headaches and make a surveillance system unstable. It is often caused by electromagnetic interference (EMI) or damaged signal lines. Flickering is a visible fading between cycles on display monitors and makes the gray intensities of the screen change quickly. Fig. 16(a) and (b) show variations of the average gray intensity with frame t for normal and flickering screens, respectively. The changes of gray intensities are stable for the normal screen and extremely unstable for the flickering screen. Based on this observation, the change of the average gray intensity (I_avg) of frames can be used to detect a flickering screen, as shown in Fig. 17. Let state(t) ∈ {0, 1} be 0 when the change of average gray intensity (ΔI_avg = I_avg(t+1) − I_avg(t)) is decreasing and 1 when ΔI_avg is increasing. It is updated as:

state(t+1) = 1 if |I_avg(t+1) − I_avg(t)| > Δf, I_avg(t+1) > I_avg(t), and state(t) ≠ 1;
state(t+1) = 0 if |I_avg(t+1) − I_avg(t)| > Δf, I_avg(t+1) < I_avg(t), and state(t) ≠ 0;
state(t+1) = state(t) otherwise   (28)

where Δf is set to 5 from experiments to avoid recording small changes of the signal. Therefore, the proposed decision rule for determining a flickering screen is:
If N_SC > θ_f, then trigger an alarm for flickering,  where N_SC ← N_SC + 1 if the state is changed and N_SC is kept unchanged otherwise   (29)

where θ_f is set to 10 from experiments. In Eq. (29), when the number of state changes (i.e., from 1 to 0 or from 0 to 1) is greater than the threshold θ_f within a preset time period, an alarm for a flickering screen is triggered and N_SC is reset to zero. However, N_SC is also reset to zero after the preset time period even if no alarm is triggered.

Fig. 16. Variations of average gray intensities with frame t for (a) normal and (b) flickering screens.

Fig. 17. Schematic illustration of Iavg increasing and decreasing.
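The flickering rule of Eqs. (28)-(29) amounts to a small state machine driven by the per-frame average intensity; a sketch is given below. The length of the evaluation period is not stated numerically above, so the default here is an assumption, as are the class and parameter names.

```python
class FlickerDetector:
    """State-change counter of Eqs. (28)-(29): the average gray intensity of
    each frame drives a rising/falling state, and an alarm is raised when the
    state flips more than theta_f times within the evaluation period."""
    def __init__(self, delta_f=5.0, theta_f=10, period=30):
        self.delta_f = delta_f      # minimum change of I_avg worth recording
        self.theta_f = theta_f      # allowed number of state changes per period
        self.period = period        # frames per evaluation period (assumed)
        self.prev_avg = None
        self.state = None           # 1 = rising, 0 = falling
        self.n_sc = 0               # N_SC, the state-change counter
        self.frames_seen = 0

    def update(self, gray):
        avg = float(gray.mean())
        alarm = False
        if self.prev_avg is not None and abs(avg - self.prev_avg) > self.delta_f:
            new_state = 1 if avg > self.prev_avg else 0        # Eq. (28)
            if self.state is not None and new_state != self.state:
                self.n_sc += 1
            self.state = new_state
        self.prev_avg = avg
        self.frames_seen += 1
        if self.n_sc > self.theta_f:                            # Eq. (29)
            alarm = True
            self.n_sc = 0
        if self.frames_seen >= self.period:                     # reset after the preset period
            self.frames_seen = 0
            self.n_sc = 0
        return alarm
```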
3. Experimental results and discussion
To evaluate the performance of the proposed system, video sequences with seven kinds of camera tampering and abnormalities, containing a total of 137 abnormal events, were tested. The abnormal events included screen shaking (16), camera motion (30), camera occlusion (30), fogging (18), defocus (20), color cast (9), and screen flickering (14), where the values in parentheses denote the numbers of occurrences of the given abnormal event. Typical test video frames are shown in Fig. 18. In this work, 320 × 240-pixel videos were captured and saved in AVI format at a frame rate of 30 frames/s; videos of other sizes were normalized to 320 × 240 pixels. The proposed system was run on a PC with a 3.2 GHz Intel i5 processor and 4 GB of RAM.
Fig. 18. Detection results for camera tampering and other abnormalities, including shaking, fogging, defocus, color cast, and flickering. (a) Shaking screen, (b) camera lens pointed in a fixed direction, (c) camera lens turned to another direction, (d) camera occlusion, (e) fogging, (f) defocus, (g) color cast, and (h) flickering screen.
Table 1
Performance evaluation of the proposed method for camera tampering and abnormalities.

Abnormality type | Number of false alarms | Number of missed events | Percentage of missed events (%)
Screen shaking (16) | 2 | 0 | 0.0
Camera motion (30) | 4 | 2 | 6.7
Camera occlusion (30) | 0 | 1 | 3.3
Fogging (18) | 2 | 0 | 0.0
Defocus (20) | 1 | 2 | 10.0
Color cast (9) | 4 | 0 | 0.0
Screen flickering (14) | 3 | 1 | 7.1

Note: The values in parentheses denote the numbers of occurrences of the given abnormal event.
Table 2
Comparison of proposed and existing methods for camera occlusion.

Approach | Number of false alarms | Number of missed events | Percentage of missed events (%)
Sağlam and Temizel [3] | 0 | 6 | 20.0
Lin and Wu [4] | 3 | 5 | 16.7
Gil-Jimenez et al. [5] | 0 | 1 | 3.3
Proposed method | 0 | 1 | 3.3
Table 3
Comparison of proposed and existing methods for camera motion.

Approach | Number of false alarms | Number of missed events | Percentage of missed events (%)
Sağlam and Temizel [3] | 4 | 4 | 13.3
Lin and Wu [4] | 7 | 6 | 20.0
Gil-Jimenez et al. [5] | 3 | 4 | 13.3
Proposed method | 4 | 2 | 6.7
Table 4
Comparison of proposed and existing methods for a defocused lens.

Approach | Number of false alarms | Number of missed events | Percentage of missed events (%)
Sağlam and Temizel [3] | 3 | 2 | 10.0
Lin and Wu [4] | 6 | 9 | 45.0
Gil-Jimenez et al. [5] | 1 | 2 | 10.0
Proposed method | 1 | 2 | 10.0
The results in terms of the number of false alarms, the number of missed events, and the percentage of missed events for the seven types of camera tampering and abnormalities are listed in Table 1 to show the performance of the proposed system. As shown in Table 1, a total of 6 abnormal events were missed, indicating that an average of 4.4% (= 6/137 × 100%) of missed events is obtained for the proposed system. Comparisons with existing works [3–5] are shown in Tables 2–4 for camera occlusion, camera motion, and a defocused lens, respectively, to show the superiority of the proposed method and indicate its feasibility. However, since the abnormalities of screen shaking, fogging, color cast, and screen flickering have rarely been investigated, no comparisons were made for them. As shown in Tables 2 and 4, the lowest numbers of both false alarms and missed events are obtained for the proposed method when compared with existing works [3–5]. As shown in Table 3, the number of false alarms for camera motion for the proposed method is slightly higher than that for the method proposed by Gil-Jimenez et al. [5], but the number of missed events is much lower. The slightly higher number of false alarms may arise from a scene change from an edge-rich frame to a frame with fewer edge details, which may lead to a misjudgment of camera motion as camera occlusion.

4. Conclusion
This paper presented a simple but efficient method for detecting camera tampering and various abnormalities for video surveillance systems. Since screen shaking, fogging, color cast, and screen flickering are rarely investigated, several schemes were proposed with satisfactory detection results in terms of the percentage of missed events. To verify the performance of the proposed method,
comparisons with existing works for the detection of camera motion, occlusion, and defocus were carried out. The results show the superiority of the proposed system with an average of 4.4% of missed events, indicating its feasibility. Since the proposed method is based on the analyses of brightness, edge details, histogram distribution, and the content of high-frequency components, it is computationally efficient. Moreover, the proposed system runs at a frame rate of 20–30 frames/s, meeting the requirement of real-time operation.

Acknowledgements

This work was partially supported by the National Science Council of Taiwan under Grants NSC102-2221-E-151-042 and NSC-102-2221-E-212-015.

References

[1] A. Aksay, A. Temizel, A.E. Cetin, Camera Tamper Detection Using Wavelet Analysis for Video Surveillance, in: Proc. of IEEE Int. Conf. on Advanced Video and Signal Based Surveillance, London, United Kingdom, 2007, pp. 558–562.
[2] E. Ribnick, S. Atev, O. Masoud, N. Papanikolopoulos, R. Voyles, Real-Time Detection of Camera Tampering, in: Proc. of Int. Conf. on Advanced Video and Signal Based Surveillance, Sydney, Australia, 2006, pp. 1–6.
[3] A. Sağlam, A. Temizel, Real-time Adaptive Camera Tamper Detection for Video Surveillance, in: Proc. of IEEE Int. Conf. on Advanced Video and Signal Based Surveillance, Genova, Italy, 2009, pp. 430–435.
[4] D.T. Lin, C.H. Wu, Real-time Active Tampering Detection of Surveillance Camera and Implementation on Digital Signal Processor, in: Proc. of IEEE Int. Conf. on Intelligent Information Hiding and Multimedia Signal Processing, Piraeus, Greece, 2012, pp. 383–386.
[5] P. Gil-Jimenez, R. Lopez-Sastre, P. Siegmann, J. Acevedo-Rodriguez, S. Maldonado-Bascon, Automatic Control of Video Surveillance Camera Sabotage, in: Proc. of Int. Work-Conference on the Interplay between Natural and Artificial Computation (IWINAC), vol. 4528, 2007, pp. 222–231.
[6] C.L. Tung, P.L. Tung, C.W. Kuo, Camera Tamper Detection Using Codebook Model for Video Surveillance, in: Proc. of IEEE Int. Conf. on Machine Learning and Cybernetics, Xian, China, 2012, pp. 1760–1763.
[7] C.H. Chen, C.Y. Chen, C.H. Chen, J.R. Chen, Real-time video stabilization based on vibration compensation by using feature block, Int. J. Innovative Comput., Inform. Control 7 (9) (2011) 5285–5298.
[8] C.R. Wren, A. Azarbayejani, T. Darrell, A.P. Pentland, Pfinder: real-time tracking of the human body, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 780–785.
[9] C. Stauffer, W. Grimson, Adaptive Background Mixture Models for Real-time Tracking, in: Proc. of IEEE Int. Conf. on Computer Vision and Pattern Recognition, Ft. Collins, CO, USA, 1999.
[10] A.J. Lipton, H. Fujiyoshi, R. Patil, Moving Target Classification and Tracking from Real-time Video, in: Proc. of the 4th IEEE Workshop on Applications of Computer Vision, Washington, DC, USA, 1998.
[11] R. Collins, A. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enomoto, O. Hasegawa, A system for video surveillance and monitoring, VSAM final report, Technical Report CMU-RI-TR-00-12, 2000.
[12] C. Zhang, M.Y. Siyal, A New Segmentation Technique for Classification of Moving Vehicles, in: Proc. of IEEE 51st Vehicular Technology Conference, Tokyo, Japan, 2000.
[13] S. Galic, S. Loncaric, Spatio-Temporal Image Segmentation Using Optical Flow and Clustering Algorithm, in: Proc. of IEEE International Workshop on Image and Signal Processing and Analysis, Pula, Croatia, 2000.
[14] D. Gutchess, M. Trajkonic, E. Cohen-Solal, D. Lyons, A.K. Jain, A Background Model Initialization Algorithm for Video Surveillance, in: Proc. of IEEE International Conference on Computer Vision, Vancouver, Canada, 2001.
[15] D.Y. Huang, C.H. Chen, W.C. Hu, S.S. Su, Reliable moving vehicle detection based on the filtering of swinging tree leaves and raindrops, J. Vis. Commun. Image Represent. 23 (4) (2012) 648–664.
[16] A. Prati, I. Mikic, M.M. Trivedi, R. Cucchiara, Detecting moving shadows: algorithms and evaluation, IEEE Trans. Pattern Anal. Mach. Intell. 25 (7) (2003) 918–923.
[17] C.H. Chen, W.W. Tsai, D.J. Wang, Image Defogging Method based on the Atmosphere Scattering Theory and Color Analysis, in: Proc. of the 4th Intelligent Living Technology Conference, Taichung, Taiwan, 2009, pp. 1596–1602.
[18] S.C. Tai, T.W. Liao, Y.Y. Chang, C.P. Yeh, Automatic White Balance Algorithm Through the Average Equalization and Threshold, in: Proc. of Information Science and Digital Content Technology, Jeju Island, South Korea, vol. 3, 2012, pp. 571–576.