Pattern Recognition Letters 26 (2005) 91–99
www.elsevier.com/locate/patrec

Cast shadow detection in video segmentation

Dong Xu a, Xuelong Li b,*, Zhengkai Liu a, Yuan Yuan c

a Information Processing Center, University of Science and Technology of China, Anhui 230027, PR China
b School of Computer Science and Information Systems, Birkbeck College, University of London, Malet Street, London WC1E 7HX, UK
c Department of Electronic and Electrical Engineering, University of Bath, Bath BA2 7AY, UK

Received 4 November 2003; received in revised form 31 July 2004
Available online 18 October 2004
Abstract

In video segmentation, an intrinsic problem is that moving cast shadows are always misclassified as part of the moving objects. This paper presents a novel moving cast shadow detection algorithm. The experiments demonstrate that shadow regions can be removed quite well, and thus good video segmentation results can be obtained.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Cast shadow; Video segmentation
1. Introduction

Video segmentation, the key technique for semantic object extraction, plays a very important role in content-based video coding (MPEG4, 1998). Several schemes (Chien et al., 2002; Mech and Wollborn, 1998; Meier and Ngan, 1999; Neri et al., 1998) have been applied to video segmentation, among which the change detection based techniques attract more and more attention. Since motion is the most important cue for distinguishing a moving object from the static background, the efficiency of this kind of algorithm is obvious. The frame difference between two consecutive frames is computed and thresholded first, so that the rough location and shape of the objects can be detected; spatial and temporal information is then applied to tune the boundary (Chien et al., 2002). However, several serious drawbacks still exist in the traditional algorithms (Chien et al., 2002). These shortcomings bring false positives and false negatives to the system, and the problems increase markedly when shadows exist. A cast shadow on the background is generated where light is blocked. Like moving objects, moving cast shadows can also cause large differences between two successive frames; meanwhile, the moving object and the cast shadow region are usually connected, so cast shadows must be handled explicitly in change detection based algorithms (Chien et al., 2002; Stauder et al., 1999).

Some solutions have been proposed to prevent shadows from being misclassified as moving objects (Chien et al., 2002; Stauder et al., 1999). Chien et al. (2002) employed a gradient filter to remove the shadow region. The criterion is based on the observation that a shadow area tends to have a gradual change in luminance value; in addition, it can remove some effects of luminance or camera gain changes, because the effect of these changes is obviously decreased in the gradient domain. Generally, it performs well under various conditions; nevertheless, some limitations arise (Chien et al., 2002). First, since the algorithm relies on the smoothly changing gradient within one region, the benefit of the gradient filter is reduced when processing textured regions. Second, some objects with weak edges and low texture may be damaged by the removal of image details. Stauder et al. (1999) analyzed the behavior of moving cast shadows extensively and based their approach on four assumptions: (1) a strong light source causing a cast shadow, (2) a static camera and a static, textured background, (3) a planar background, and (4) a light source with a certain extent. A cast shadow is found mainly according to the results of change detection, static edge detection, shading change detection and penumbra detection. There are some problems in this approach. Some regions of a moving object, such as the facial part of a human, are easily misclassified as shadow regions, because their uniform colors present the same characteristics as shadow regions. Regions that are shadowed throughout the sequence cannot be detected by their algorithm, as pointed out by the authors (Stauder et al., 1999). Moreover, the computation is quite complex (Chien et al., 2002).

* Corresponding author. E-mail addresses: [email protected], [email protected] (X. Li).
0167-8655/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2004.09.005
So, nowadays, it is increasingly necessary to establish an efficient and effective method for the shadow problem. Shadows can be classified as significant and insignificant according to their extent; shadow significance is a comparative concept. When the edge of a shadow region is as sharp as that of the corresponding moving object, the shadow is defined as a significant shadow; otherwise, it is regarded as an insignificant shadow. Shadow significance depends on several factors, such as the light source, the width of the penumbra, the background grey value, etc. On sunny days and on cement roads, the shadow cast by pedestrians or vehicles can be considered significant: under this condition no penumbra exists, and the contrast between the cement road and the dark shadow is obvious, so the edge of the shadow region is as sharp as that of the moving object. However, shadows tend to be insignificant in normal indoor environments. A penumbra usually exists because the size of the light source cannot be ignored when compared with the distance between the light source and the object (Stauder et al., 1999). Moreover, the penumbra is usually quite wide, so the change of grey values from un-shadowed to shadowed regions is quite smooth, and the shadow is considered insignificant.

Facing the problem that the moving cast shadow is often misclassified as part of the moving object in change detection based video segmentation, we propose an effective approach to the detection and removal of insignificant moving cast shadows in normal indoor scenes where the camera is stationary. It is especially appropriate for indoor video surveillance and conferencing applications. The main contribution of this paper is that we successfully remove cast shadows from moving objects by a conditional dilation operation, where edge and region information are used in a unified framework. Compared with the method in (Stauder et al., 1999), our approach does not require (or assume) the region uniformity property, and thus seldom misclassifies uniform moving object regions as shadow regions. Moreover, it can detect insignificant shadows appearing along a whole image sequence.
We have compared our approach with the gradient filter method (Chien et al., 2002), which is the most recent state of the art related to our approach, and the experimental results show that our approach improves the detection performance. In our algorithm, the edge information plays an important role in shadow removal. Canny edge is
also applied for video segmentation in (Meier and Ngan, 1999). However, there are several obvious differences between the two approaches. In (Meier and Ngan, 1999), Canny edges are used to find the edge of the moving object, and filling algorithms are then applied to extract the whole moving object. It is not an easy task to find the actual edge of the moving object from the Canny edges and the initial change detection mask, so two model update schemes are proposed to handle slowly changing and rapidly changing components, respectively. Moreover, to handle un-closed boundaries, some complex filling algorithms are implemented as a post-processing step. In our approach, by contrast, Canny edge detection exploits a property of insignificant shadows: although the moving cast shadows appear in the initial change detection mask, the transition from the background to the shadow regions is gradual, which makes the edges caused by the shadow boundaries almost invisible. Thus Canny edges can be used to find some initial seed points for the shadow region. In a word, the Canny edges are used for a different purpose, and thus in a different way.

This paper is organized as follows: Section 2 specifies the presented scheme in detail; Section 3 shows preliminary experimental results that demonstrate the advantages of the proposed scheme; Section 4 summarizes the paper.
2. Cast shadow detection algorithm

The block diagram of the algorithm is shown in Fig. 1. It is composed of several steps, which are described in the following subsections.

2.1. Initial change detection mask (CDM) computation

Obtaining the initial CDM for a frame is an important step in moving object segmentation. This mask usually contains both moving objects and cast shadows. The most popular scheme for detecting moving objects in outdoor environments is background subtraction or background suppression (Cucchiara et al., 2001), where the objects move fast and the complete background is easy to obtain. However, we found that this scheme cannot yield satisfactory results for indoor video segmentation. Frame difference has been used for moving object detection in indoor scenes. In (Mech and Wollborn, 1998; Stauder et al., 1999), the thresholded frame difference between two consecutive frames is utilized as the initial CDM, but this mask contains both the uncovered background region and the uncovered shadow region, and excludes the still parts of moving objects. For our application, we adopt the background registration algorithm proposed in (Chien et al., 2002) to obtain the initial CDM. This scheme has been verified, both in (Chien et al., 2002) and
Fig. 1. The block diagram of the cast shadow detection algorithm.
in our experiments, to have better performance than the frame difference technique. Since we only need an initial CDM, the morphologic close and open operations in (Chien et al., 2002) are not implemented here. The result is denoted as ICDM_i, the initial change detection mask, a binary mask in which the white pixels (grey level 255) indicate the foreground region and the black pixels (grey level 0) indicate the background region. Note that the shadow is misclassified as part of the foreground region, as shown in Fig. 3(b).

2.2. Canny edge detection

For indoor video sequences, the shadow tends to be insignificant, i.e. the edge of the shadow is not as sharp as that of the moving object, so the Canny edge detection algorithm is applied:

CE_i = Φ(∇(G ∗ f_i)),  i = 1, …, n    (1)

where CE_i denotes the result of Canny edge detection, a binary mask in which the white pixels (grey value 255) indicate the Canny edges; ∇ denotes the gradient operation; G ∗ f_i is the Gaussian-convolved image; and Φ denotes non-maximum suppression of the gradient magnitude, which thins the edges, followed by thresholding with hysteresis to detect and link edges. The edges of the shadow region are suppressed by Canny edge detection, as shown in Fig. 4(b): although the edge between the shadow region and the background in Fig. 4(c) is obvious, it is not detected in Fig. 4(b). Hence, Canny edges can be applied to distinguish the shadow region from the moving objects. Another example is shown in Fig. 3(c).

2.3. Shadow region detection

This is the kernel of our proposed shadow removal algorithm, and its block diagram is given in Fig. 2. It consists of several steps, whose details are described as follows.

2.3.1. Step 1: Edge extraction
Part of the edge of ICDM_i is not detected by Canny edge detection (Canny, 1986), so E_i, the
edge pixels of ICDM_i, are computed first. Since ICDM_i is a binary mask, it is easy to compute its edge: those pixels which lie in the foreground region of ICDM_i but have at least one neighbour in the background region are selected as E_i, i.e.

E_i = {p | ICDM_i(p) = 255, ∃q ∈ NG(p): ICDM_i(q) ≠ 255}    (2)

where p and q denote image pixels and the function NG(p) returns the neighbours of p, i.e. q is one of the 4-neighbourhood of p. The result of this step is shown in Fig. 3(d), where the white pixels indicate the edge of ICDM_i.

2.3.2. Step 2: Multiple frames integration
When a shadow appears on a textured background, although the insignificant shadow brings few edges of its own, there are still many Canny edges in the shadow region that are caused by the static background, as shown in Fig. 3(c). Distinguishing the shadow region from the moving object therefore becomes difficult if these edges are not removed beforehand. It is impossible for a computer to decide from a single frame whether a Canny edge in the mask belongs to a moving object or to the static background. However, static Canny edges should appear at the same locations in some of the past frames, so in principle the Canny edge information of two consecutive frames could be used to remove them. Since some part of the moving object usually stops moving for a short period, we instead roughly classify the Canny edges into moving Canny edges and static Canny edges using multiple frames:

ICE_i(p) = static Canny edge, if CE_i(p) = 255, CE_{i−n}(p) = 255 and CE_{i−m}(p) = 255 (for i ≥ m);
           moving Canny edge, otherwise    (3)

where ICE_i(p) denotes the information of the Canny edge (ICE) at pixel p. The ICE of a pixel is a static Canny edge only when that pixel is judged as a Canny edge in the ith, (i − m)th and (i − n)th
Fig. 2. The block diagram of shadow region detection.
Fig. 3. The illustration of the cast shadow detection algorithm (Erik frame 46): (a) f_i, (b) ICDM_i, (c) CE_i, (d) E_i, (e) ICE_i, (f) IMCE_i, (g) ISSR_i, (h) SSR_i, (i) DIMCE_i, (j) SR_i, (k) IMO_i and (l) MO_i.
frames simultaneously (m > n > 0). For the first m frames, there are not enough previous frames to serve as references, so the Canny edge masks of the following frames are used instead, i.e.
Fig. 4. Few Canny edges in the insignificant shadow region (Hall frame 53): (a) f_i, (b) CE_i and (c) ICDM_i.
ICE_i(p) = static Canny edge, if CE_i(p) = 255, CE_{i+n}(p) = 255 and CE_{i+m}(p) = 255 (for i < m);
           moving Canny edge, otherwise    (4)

Since multi-frame information is applied, nearly all the Canny edges which belong to the moving object are correctly judged as moving Canny edges; thus it is easy to close the edge of the moving object by the simple morphologic dilation operation of Step 6. Due to the strict criterion, however, some Canny edges which lie in the shadow region are misclassified as moving Canny edges too. But because enough Canny edges in the shadow region are correctly classified as static Canny edges, the shadow region can still be distinguished clearly from the moving object region. This is shown in Fig. 3(e), where the white pixels indicate the pixels whose ICE is moving Canny edge. From Fig. 3(e), we can clearly distinguish the shadow region from the moving object region.
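The multi-frame classification of Eqs. (3) and (4) can be sketched as follows. This is our own illustrative reading of the step, not the authors' code: the Canny masks `CE` are assumed to be precomputed boolean arrays (from any standard Canny implementation), and the function name is ours.

```python
import numpy as np

def classify_canny_edges(CE, i, n=3, m=5):
    """Split the Canny edge mask of frame i into static and moving edges.

    CE is a list of boolean edge masks, one per frame; a sketch of
    Eqs. (3)-(4) with frame intervals m > n > 0.
    """
    if i >= m:
        # Eq. (3): an edge pixel is static only if it is also an edge
        # in frames i-n and i-m.
        static = CE[i] & CE[i - n] & CE[i - m]
    else:
        # Eq. (4): for the first m frames, look ahead instead.
        static = CE[i] & CE[i + n] & CE[i + m]
    moving = CE[i] & ~static
    return static, moving
```

An edge that persists at the same location across the reference frames is labelled static; everything else in the current mask is labelled moving.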
2.3.3. Step 3: Compute the interior moving Canny edge (IMCE)
According to ICDM_i, the pixels whose ICE is moving Canny edge can be divided into two categories: the interior moving Canny edge IMCE_i and the exterior moving Canny edge EMCE_i. IMCE_i is defined by

IMCE_i = {p | ICE_i(p) = moving Canny edge, ∃q: ‖p − q‖ < T_1, ICDM_i(q) = 255}    (5)

where q ranges over the T_1-neighbourhood of p and ‖·‖ denotes the chessboard distance. This equation states that a pixel whose ICE is moving Canny edge belongs to IMCE_i only when it neighbours the foreground region of the initial CDM. T_1 is applied here because the edge of the initial CDM is not precise enough. The result is shown in Fig. 3(f), where the white pixels denote IMCE_i. We define EMCE_i by

EMCE_i = {p | ICE_i(p) = moving Canny edge, p ∉ IMCE_i}    (6)

2.3.4. Step 4: Edge matching
Some pixels of E_i cannot find any corresponding moving Canny edge, so they are chosen as the initial seed points of the shadow region, ISSR_i:

ISSR_i = {p | p ∈ E_i, ∀q: ‖p − q‖ < T_2, ICE_i(q) ≠ moving Canny edge}    (7)
This equation states that a pixel of E_i which cannot find any corresponding pixel with ICE of moving Canny edge in its T_2-neighbourhood is considered an initial seed of the shadow region. T_2 is applied to tolerate the difference between the Canny edges and E_i. The result of this step is illustrated in Fig. 3(g): enough connected seed points are obtained, but some sporadic false seed points are also detected in the moving object area; these will be removed in the next step.

2.3.5. Step 5: Seed region formation
In order to remove the false initial seed points, the classic connected component algorithm (Haralick and Shapiro, 1992) is implemented to connect the initial seed points. Among all connected seed regions, only those whose size is greater than a certain threshold T_3 are considered correct seed regions, where the size of a region means the number of pixels within it.
The result is denoted as SSR_i, the seeds of the shadow region, as shown in Fig. 3(h). Compared with Fig. 3(g), the false initial seed points are removed and only the correct initial seed points, corresponding to the edge of the shadow region, are preserved.
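Steps 1, 4 and 5 (Eqs. (2) and (7) plus the connected-component filtering) can be sketched with `scipy.ndimage`. The function names and the exact neighbourhood handling are our own assumptions, not the paper's implementation:

```python
import numpy as np
from scipy import ndimage

def mask_boundary(icdm):
    """Eq. (2): foreground pixels of the binary CDM with at least one
    background pixel in their 4-neighbourhood."""
    fg = icdm.astype(bool)
    cross = ndimage.generate_binary_structure(2, 1)  # 4-connectivity
    interior = ndimage.binary_erosion(fg, structure=cross, border_value=1)
    return fg & ~interior

def seed_regions(icdm, moving_edges, T2=2, T3=30):
    """Eq. (7) + Step 5: initial shadow seeds (ISSR) filtered into SSR."""
    E = mask_boundary(icdm)
    # A pixel is "near" a moving Canny edge if its chessboard distance to
    # one is < T2; a square structuring element of side 2*T2-1 covers that.
    near_moving = ndimage.binary_dilation(
        moving_edges.astype(bool), structure=np.ones((2 * T2 - 1, 2 * T2 - 1), bool))
    issr = E & ~near_moving
    # Step 5: keep only connected seed components larger than T3 pixels.
    labels, k = ndimage.label(issr, structure=np.ones((3, 3), dtype=int))
    if k == 0:
        return np.zeros_like(issr)
    sizes = np.asarray(ndimage.sum(issr, labels, range(1, k + 1)))
    keep = 1 + np.flatnonzero(sizes > T3)
    return np.isin(labels, keep)
```

The dilation of the moving-edge mask plays the role of the T_2-neighbourhood test in Eq. (7): a boundary pixel survives as a seed only if no moving Canny edge lies within the tolerance window.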
2.3.6. Step 6: Morphologic dilation
A few small gaps appear at the common boundary of the moving object and the shadow region, mainly due to wrong connections in the heuristic edge-linking step of Canny edge detection (Canny, 1986). In order to remove them, a morphologic dilation operation is applied, i.e.

DIMCE_i = IMCE_i ⊕ B_1    (8)

where DIMCE_i denotes the dilated IMCE and B_1 is the 3 × 3 square structure element shown in Fig. 5(a). This small structure element is enough to fill the small gaps, as shown in Fig. 3(i).

2.3.7. Step 7: Conditional dilation
With SSR_i, ICDM_i and DIMCE_i, the conditional dilation operation (Haralick and Shapiro, 1992) can be applied to acquire the shadow region. The structure element B_2 used in the conditional dilation is shown in Fig. 5(b), where only the 4-neighbourhood is considered, and the constraint set S is defined by

S = {p | p ∈ ICDM_i, p ∉ DIMCE_i}    (9)

The conditional dilation operation is described in Table 1. Its final result is denoted as SR_i, the shadow region, as shown in Fig. 3(j), from which we observe that most parts of the shadow region are detected correctly.

Fig. 5. The structure elements used in the dilation operations. (a) B_1:
1 1 1
1 1 1
1 1 1
(b) B_2:
0 1 0
1 1 1
0 1 0

Table 1
Conditional dilation algorithm
Let C_0 = SSR_i, m = 0
Do
  m = m + 1
  C_m = (C_{m−1} ⊕ B_2) ∩ S
Until C_m = C_{m−1}

2.4. Shadow region removal
With ICDM_i and SR_i, the initial moving object mask IMOM_i is acquired by

IMOM_i = ICDM_i − SR_i    (10)

The detected initial moving object IMO_i is simply the region of the original frame within IMOM_i, as shown in Fig. 3(k). From it, we find that the moving object is extracted quite well. However, some small parts of the shadow are still retained, so post-processing is applied to remove them.

2.5. Post-processing
The post-processing step removes the retained shadow parts and tunes the boundary of the moving object. First, a morphologic erosion operation is applied with a 5 × 5 square structure element. Second, the connected component algorithm (Haralick and Shapiro, 1992) is implemented again to remove the noise (some small retained shadow regions) in the background region. Finally, a morphologic dilation operation with a 3 × 3 square structure element is used to compensate for the erosion. Thus MO_i, the actual moving object, is obtained, as shown in Fig. 3(l).
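The core of Steps 6–7 and Sections 2.4–2.5 can be sketched as follows. This is a hedged reconstruction, not the authors' code: the helper names are ours, and the post-processing here keeps only the largest connected component, a simplification of the paper's connected-component noise removal.

```python
import numpy as np
from scipy import ndimage

B1 = np.ones((3, 3), dtype=bool)                # Fig. 5(a): 3x3 square
B2 = ndimage.generate_binary_structure(2, 1)    # Fig. 5(b): 4-neighbourhood cross

def conditional_dilation(ssr, icdm, dimce):
    """Table 1: grow the shadow seeds SSR_i inside S = ICDM_i minus DIMCE_i."""
    S = icdm.astype(bool) & ~dimce.astype(bool)  # constraint set, Eq. (9)
    C = ssr.astype(bool) & S                     # C_0
    while True:
        C_next = ndimage.binary_dilation(C, structure=B2) & S
        if np.array_equal(C_next, C):            # C_m == C_{m-1}: converged
            return C                             # SR_i
        C = C_next

def moving_object_mask(icdm, sr):
    """Eq. (10): IMOM_i = ICDM_i - SR_i."""
    return icdm.astype(bool) & ~sr

def post_process(imom):
    """Section 2.5: 5x5 erosion, component cleaning, 3x3 dilation.
    Keeping only the largest component is our simplification."""
    m = ndimage.binary_erosion(imom, structure=np.ones((5, 5), dtype=bool))
    labels, k = ndimage.label(m, structure=np.ones((3, 3), dtype=int))
    if k > 0:
        sizes = np.asarray(ndimage.sum(m, labels, range(1, k + 1)))
        m = np.isin(labels, 1 + np.flatnonzero(sizes >= sizes.max()))
    return ndimage.binary_dilation(m, structure=B1)
```

Because every iterate is intersected with S, the seeds can only grow across pixels of the initial CDM that are not covered by the dilated moving Canny edges, so the flood stops exactly at the moving-object boundary.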
3. Experimental results

Simulations have been carried out on the standard MPEG-4 test sequences Hall and Erik. The format of Hall is CIF, 352 × 288 at 30 fps. The shadow is caused by the indoor light source, as illustrated in Fig. 6(a). Fig. 6(b) shows the result of the gradient filter approach (Chien et al., 2002): since the moving object has weak edges, some part of the moving object is misclassified as background. Our algorithm, however, yields a satisfactory video segmentation result, as shown
in Fig. 6(c). Another example is the sequence Erik, with format CIF, 352 × 288 at 10 fps. The shadow, caused by diffuse light and a spot light, appears clearly at the left side of the person, as in Fig. 6(d). The result of the gradient filter approach is shown in Fig. 6(e): since the shadow appears in a strongly textured region, the gradient filter fails to remove it thoroughly. The video segmentation result of our algorithm, however, is quite satisfactory, as shown in Fig. 6(f). Other video segmentation results are given in Fig. 7, which further demonstrate the performance of the proposed algorithm.
In this version of our algorithm, several parameters need to be pre-defined. As mentioned in the corresponding descriptions, the frame intervals n and m are chosen as 3 and 5, respectively. We do not set them to 1 and 2 because the larger intervals better preserve edges when the moving object moves very slowly. T_1 and T_2 are set to 2 to tolerate the imprecision of ICDM_i, and T_3 is set to 30 for removing the false initial seed points. Other choices of these parameters are possible; in future work, we will carry out more experiments to see whether better combinations exist.
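For reference, the parameter values stated above can be collected in one place. The dictionary and its name are ours; only the values come from the paper:

```python
# Parameter settings reported in Section 3 (values as stated by the authors;
# the dictionary itself is our own convenience, not part of the algorithm).
SHADOW_PARAMS = {
    "n": 3,    # shorter frame interval for multi-frame edge integration
    "m": 5,    # longer frame interval (m > n > 0 must hold)
    "T1": 2,   # chessboard-distance tolerance for IMCE, Eq. (5)
    "T2": 2,   # matching tolerance between Canny edges and E_i, Eq. (7)
    "T3": 30,  # minimum size of a connected seed region, Step 5
}
```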
Fig. 6. Results of our shadow removal algorithm and the comparison with the gradient filter approach (Hall frame 85 and Erik frame 23): (a, d) ICDM_i, (b, e) results of the gradient filter and (c, f) MO_i.
Fig. 7. Video segmentation results (Hall frames 46, 121, 255 and Erik frames 6, 27, 36).
4. Conclusions

This paper has presented a new approach to the detection of moving cast shadows in a normal indoor environment. Compared with outdoor shadows in the daytime, these shadows are generally insignificant and more difficult to detect. In our approach, a number of techniques have been developed to achieve this goal, including the generation of initial change detection masks and Canny edge maps, shadow region detection by multi-frame integration, edge matching and conditional dilation, and post-processing. We have tested our algorithm on standard MPEG-4 test sequences and compared it with a recently published method, the gradient filter (Chien et al., 2002). The results show that our approach performs better: it produces satisfactory segmentation when shadows appear on both smooth and highly textured backgrounds. Further work includes carrying out more experiments to find better parameter choices and combining our approach with others to achieve better performance.
Acknowledgments

This research was supported by the Nature Science Fund of Anhui, PR China, under grant no. 03042307. X. Li and Y. Yuan were also under Prof. Z. Liu's direction at USTC during this work.
References

Canny, J.F., 1986. A computational approach to edge detection. IEEE Trans. Pattern Anal. Machine Intell. 8, 679–698.
Chien, S.Y., Ma, S.Y., Chen, L.G., 2002. Efficient moving object segmentation algorithm using background registration technique. IEEE Trans. Circuits Syst. Video Technol. 12, 577–585.
Cucchiara, R., Grana, C., Piccardi, M., Prati, A., 2001. Detecting objects, shadows and ghosts in video streams by exploiting color and motion information. In: Proc. Internat. Conf. Image Anal. Process., pp. 26–28.
Haralick, R.M., Shapiro, L.G., 1992. Computer and Robot Vision. Addison-Wesley, Reading, MA, pp. 28–48.
Mech, R., Wollborn, M., 1998. A noise robust method for 2D shape estimation of moving objects in video sequences considering a moving camera. Signal Process. 66, 203–217.
Meier, T., Ngan, K.N., 1999. Video segmentation for content-based coding. IEEE Trans. Circuits Syst. Video Technol. 9, 1190–1203.
MPEG Video Group, 1998. MPEG-4 video verification model version 11.0. ISO/IEC JTC1/SC29/WG11 MPEG98/N2172, Tokyo, Japan, March 1998.
Neri, A., Colonnese, S., Russo, G., Talone, P., 1998. Automatic moving object and background separation. Signal Process. 66, 219–232.
Stauder, J., Mech, R., Ostermann, J., 1999. Detection of moving cast shadows for object segmentation. IEEE Trans. Multimedia 1, 65–76.