Tracking multiple people with recovery from partial and total occlusion


Pattern Recognition 38 (2005) 1059–1070. www.elsevier.com/locate/patcog. doi:10.1016/j.patcog.2004.11.022

Charay Lerdsudwichai, Mohamed Abdel-Mottaleb*, A-Nasser Ansari
Department of Electrical and Computer Engineering, University of Miami, 1251 Memorial Drive, Coral Gables, FL 33146, USA
Received 14 June 2004; accepted 15 November 2004
*Corresponding author. Tel.: +1 305 284 3825; fax: +1 305 284 4044. E-mail address: [email protected] (M. Abdel-Mottaleb).

Abstract

Robust tracking of multiple people in video sequences is a challenging task. In this paper, we present an algorithm for tracking faces of multiple people even in cases of total occlusion. Faces are detected first, and a model for each person is built. The models are handed over to the tracking module, which is based on the mean shift algorithm, where each face is represented by the non-parametric distribution of the colors in the face region. The mean shift tracking algorithm is robust to partial occlusion and rotation and is computationally efficient, but it does not deal with the problem of total occlusion. Our algorithm overcomes this problem by detecting the occlusion using an occlusion grid, and it uses a non-parametric distribution of the color of the occluded person's cloth to distinguish that person after the occlusion ends. Our algorithm uses the speed and the trajectory of each occluded person to predict the locations that should be searched after the occlusion ends. It integrates multiple features to handle tracking multiple people in cases of partial and total occlusion. Experiments on a large set of video clips demonstrate the robustness of the algorithm and its capability to correctly track multiple people, even when faces are temporarily occluded by other faces or by other objects in the scene.
© 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Face tracking; Multiple people; Occlusion recovery; Video

1. Introduction

Detecting and tracking multiple people in a video sequence is important for automated video surveillance and monitoring systems. Face tracking can provide input to higher-level processing modules such as face or activity recognition for surveillance applications. Several researchers have proposed different face detection and tracking methods; a review of face detection techniques can be found in Ref. [1]. Most of the methods for tracking faces are model-based [2–5], where face models are used for detection and tracking. In Refs. [2–4,6], a face is represented by a 3D model and tracking is performed using motion compensation on the points of the facial features, e.g., eyes, nose, and mouth, in 2D, where the 2D facial features are the projections estimated from the 3D model. 3D face models are also used in Ref. [5], with eigenfaces for pose estimation and matching. Generally, 3D model-based approaches are reliable and accurate, but their computational cost is usually high, which makes them unsuitable for real-time applications. Color-based methods [5,7–13], on the other hand, can handle real-time processing. These methods rely on the fact that skin color is invariant to face orientation and insensitive to partial occlusion. Usually, a generic skin-color model is created from a large training set of face images; the model is then used for face detection and tracking. Tracking faces with a generic skin-color model fails when the face is occluded by another face or another object with similar colors.


In Ref. [12], a face tracking algorithm using skin color and a shape template was presented, where motion information was used to predict a search region for the face in the next image in the sequence. This method could track multiple faces; however, it could not handle total occlusion or significant changes in the size and orientation of the face. In Refs. [9,10], skin color and other facial features were used to reduce the false rate of detection and tracking; however, a face missing any facial features because of occlusion could not be tracked. In Refs. [7,9], systems were presented for tracking a face and controlling the camera by panning, tilting, and zooming to keep the face centered. These systems assume only one face in the image sequence and do not handle tracking multiple faces. In Ref. [14], it was shown that integrating multiple visual cues increases the robustness of object tracking.

In this paper, we present an approach for tracking multiple people with recovery from both partial and total occlusion. The tracker initially detects faces and represents each person by the non-parametric distribution of the colors in the face region as well as the non-parametric distribution of the cloth. It then uses the mean shift method [15,16] for tracking the faces. The algorithm uses multiple cues to detect and recover from occlusion, e.g., an occlusion grid, the speed, and the motion trajectory of a person's face in each frame. These features are used to predict the area within the video frame that should be searched in order to recover tracking the occluded face. Experiments were conducted using more than thirty video clips, both outdoor and indoor, with multiple people moving around. The results demonstrate the robustness of the algorithm and its capability to deal with occlusions.

This paper is organized as follows: Section 2 presents the details of the tracking algorithm, experimental results are provided in Section 3, and conclusions are presented in Section 4.

2. The tracking algorithm

A number of existing systems for face detection, recognition, tracking, and facial expression analysis employ skin detection as an initial step to locate faces [11,17–27]. Table 1 summarizes some of the previous research on face detection and tracking. In our algorithm, we initialize the tracking process by face detection. The face detection algorithm locates the faces in the grayscale version of the image [28] and then verifies the results using the skin color model of Ref. [17]. The face detector in Ref. [28] is a variant of the AdaBoost algorithm [29] and is provided in the OpenCV library [30]. Our tracking system models each face by a non-parametric distribution of the colors in the face region, and the tracker initially uses this distribution with the mean shift method [15,16] for face tracking.
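As a concrete illustration of this initialization step (a sketch of ours, not the authors' code), the snippet below runs OpenCV's boosted-cascade detector on the grayscale frame and verifies each detection by the fraction of skin-colored pixels. The cascade file name and the fixed CbCr thresholds are illustrative stand-ins for the skin model of Ref. [17].

```python
import cv2
import numpy as np

# Illustrative sketch: detect faces with OpenCV's boosted cascade [28,30],
# then keep only detections dominated by skin-colored pixels. The CbCr box
# below is a common heuristic range, used here as a stand-in for Ref. [17].
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame_bgr, min_skin_ratio=0.4):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    candidates = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)  # channels: Y, Cr, Cb
    faces = []
    for (x, y, w, h) in candidates:
        cr = ycrcb[y:y+h, x:x+w, 1]
        cb = ycrcb[y:y+h, x:x+w, 2]
        skin = (cr > 133) & (cr < 173) & (cb > 77) & (cb < 127)
        if skin.mean() >= min_skin_ratio:  # verification step
            faces.append((x, y, w, h))
    return faces
```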

Fig. 1. Block diagram of the face tracking algorithm: video frames → face detection → person models → mean shift tracker → occlusion detection/recovery.

The non-parametric distribution of each person's cloth color is used during occlusion recovery to identify the person after occlusion. During mean shift tracking, the location, speed, and motion trajectory of each person are updated in the person's model; they are used to predict the search area for an occluded person. Fig. 1 shows the block diagram of the tracking algorithm. When the system detects an occlusion event, it attempts to recover the occluded person using multiple cues, which are kept and updated in the person's model. The details of the algorithm are presented in the following sections.
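The per-person state just described can be pictured as a small record. The sketch below is ours, with illustrative field names; the paper does not prescribe a data layout.

```python
from dataclasses import dataclass, field
import numpy as np

# Sketch of the per-person state the tracker maintains (field names are
# illustrative): color distributions for face and cloth, plus the motion
# cues used for occlusion detection and recovery.
@dataclass
class PersonModel:
    face_hist: np.ndarray          # kernel-weighted CbCr histogram Q, Eq. (1)
    cloth_hist: np.ndarray         # cloth distribution Co, Eq. (8)
    bbox: tuple                    # current face bounding box (x, y, w, h)
    velocity: tuple = (0.0, 0.0)   # (Vx, Vy) from Eqs. (4)-(5)
    trajectory: list = field(default_factory=list)  # past face centers
    occluded: bool = False         # set when occlusion is detected
```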

2.1. Non-parametric person's model

Many researchers have proposed skin color for face detection and tracking. When using skin color, there are two issues: the color space used and the representation of the skin color in this space. Vezhnevets et al. [31] present a survey of skin color detection techniques, covering color spaces, distribution modeling, and segmentation methods. Skin color can be modeled by either parametric or non-parametric distributions; each has advantages and disadvantages, as discussed in Ref. [31]. In our work, after face detection, each person is modeled with two non-parametric distributions, one for the face region and one for the cloth region. These distributions are non-parametric kernel density estimations [32], which can represent any mixture of colors and/or patterns. We use only the two chrominance components, $C_b C_r$, of the $Y C_b C_r$ color space, and use our method in Ref. [17] for lighting compensation to reduce the color sensitivity to lighting variations. The representation of each region is derived by employing a convex and monotonically decreasing kernel, the Epanechnikov kernel, which assigns smaller weights to pixels farther from the center of the region, as shown in Fig. 2. Given the distribution of colors in a face region, let $px_{ij}$ be a pixel location inside the face region, with the origin at the center of the region. The non-parametric distribution of the face, $Q$, is computed as

$$Q = \{q_u;\ u = 1 \ldots m\}, \qquad (1)$$

where

$$q_u = C \sum_{i=1,j=1}^{x,y} k(\|px_{ij}\|^2)\,\delta[b(px_{ij}) - u]. \qquad (2)$$

Here $\delta$ is the Kronecker delta function, $k$ is the Epanechnikov kernel function, and $b$ is the mapping function that associates with the pixel at location $px_{ij}$ the index $b(px_{ij})$ of the histogram bin corresponding to the color of that pixel. $C$ is the normalization constant derived by imposing the condition $\sum_{u=1}^{m} q_u = 1$. The non-parametric distribution of the cloth is computed in the same way.

Table 1. Summary of face and human tracking techniques

| Authors | Year | Features for face detection | Tracking method | Multiple tracking | Occlusion recovery |
|---|---|---|---|---|---|
| K. Schwerdt et al. [7] | 2000 | Skin color | Track the skin region based on a color histogram | No | No |
| S. Spors et al. [8] | 2000 | Skin color, eigeneyes | Track eyes by a block matching technique; predict location by a linear Kalman filter | No | No |
| R. Herpers et al. [9] | 1999 | Skin color, eigeneyes | Predict location using motion of the face region | No | No |
| V. Vezhnevets et al. [10] | 2002 | Skin color, face elliptical shape, edges of eyebrows, color and shape of the mouth region, change of brightness gradient in nostril regions | Locate an elliptical skin-color region; verify by locating facial features in the skin area | No | No |
| J. Yang et al. [11] | 1996 | Skin color | Predict search region of the face from motion estimation, then apply face detection | Yes | No |
| L. Wang et al. [12] | 2002 | Skin color, face elliptical shape | Find moving silhouettes using background subtraction and histogram projection of moving regions; locate the face region using template matching | Yes | No |
| M. Hunke et al. [21] | 1994 | Skin color, face elliptical shape and size | Repeat face detection on the image sequence | No | No |
| W.N. Long Jiao Han et al. [40] | 2003 | No face detection | Track moving foreground regions using a Kalman filter | Yes | Yes; Kalman filter and matching of moving regions to re-identify |
| R. Liang et al. [41] | 2003 | Skin color | Build a 3D face model based on 2D facial features; estimate locations of the 2D facial features for the next frame using Kalman filters; verify facial features by matching; adjust the 3D model based on the 2D facial features | No | No |
| J. Ruiz-del-Solar et al. [42] | 2003 | Skin color; detect faces by cascade AdaBoost | Repeat face detection on the image sequence | Yes | No |

Fig. 2. Non-parametric distribution of CbCr after employing the Epanechnikov kernel function.

Given the non-parametric distributions of the face model and a candidate face, the similarity, or Bhattacharyya coefficient, can be computed as

$$\rho(y) \equiv \rho[P_y, Q] = \sum_{u=1}^{m} \sqrt{p_u(y)\, q_u}, \qquad (3)$$

where $P_y$ is the non-parametric distribution of the candidate face at position $y$ in the image. Fig. 3 shows a video frame with three people, after face detection, with the regions used to derive the models marked by rectangles.

Fig. 3. Rectangular borders generated around the faces/clothes after face detection.

We assume that both parts, i.e., the face and the body, always move together. This constraint helps the tracking system avoid confusion when other objects that have a color similar to skin occlude the face. It also helps the tracking system recover each tracked face after the occlusion ends, as explained in detail in Section 2.3.
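To make Eqs. (1)–(3) concrete, here is a minimal Python sketch that builds the Epanechnikov-weighted CbCr histogram of a region and compares two such histograms. The 16×16 binning and the kernel scaling are our illustrative choices, not values from the paper.

```python
import numpy as np

# Sketch of Eqs. (1)-(3): a CbCr histogram weighted by the Epanechnikov
# profile, and the Bhattacharyya similarity between two such histograms.

def color_distribution(cb, cr, bins=16):
    """cb, cr: uint8 Cb and Cr channels of one (face or cloth) region."""
    cb, cr = cb.astype(int), cr.astype(int)
    h, w = cb.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # squared distance from the region center, scaled so the kernel
    # support just covers the region
    r2 = ((ys - h / 2) / (h / 2)) ** 2 + ((xs - w / 2) / (w / 2)) ** 2
    weight = np.maximum(1.0 - r2, 0.0)            # Epanechnikov profile k(.)
    step = 256 // bins
    bin_idx = (cb // step) * bins + cr // step    # mapping b(px_ij)
    q = np.bincount(bin_idx.ravel(), weights=weight.ravel(),
                    minlength=bins * bins)
    return q / q.sum()                            # C imposes sum_u q_u = 1

def bhattacharyya(p, q):
    """Eq. (3): similarity between a candidate distribution p and model q."""
    return float(np.sqrt(p * q).sum())
```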

2.2. Tracking

To initialize the persons' models, we assume that the models are built while people are isolated in the scene, so that the face detection algorithm can initially locate the faces. Much research has been conducted on segmenting and tracking multiple persons. The techniques in Refs. [33–35] are either based on heuristics or assume that the people are separated at the beginning of tracking. Elgammal and Davis [33] rely on the assumption that the people are initially isolated in order to segment the individuals and obtain their models to initialize the tracking; they model each person by the major regions of the body, such as head, torso, and legs. Similarly, Siebel and Maybank [34] and Zhao and Nevatia [35] initialize the individual models before tracking. In Ref. [36], multiple cameras provide different views of the same scene to help in segmentation and tracking; the system automatically models people by observing them over time in the sequences from the multiple cameras. The benefit of using multiple cameras is that people occluded in one view might be isolated in another. In Ref. [37], a tracking algorithm was presented that works for a static camera and static background; it tracks multiple people with particle filters, which are limited by the dimensionality of the state space, so extending the approach to tracking a large number of people may be difficult. In Ref. [38], an approach was presented for tracking multiple objects with probabilistic exclusion of occluded objects; it is a general object tracking algorithm, but the paper did not show experiments for tracking people under different environments and conditions.

Our tracking system starts by creating models for people, i.e., faces and clothes. The faces are detected automatically under the assumption that people are initially isolated; we use this assumption to simplify our algorithm. One way to relax it is to extract motion and only build the cloth model when the face and the body are moving consistently. In that case, if the cloth of the person to be modeled is occluded by a stationary object in the scene or by a moving object whose motion is inconsistent with the person, we build the cloth model only when the motion becomes consistent. If the person is occluded by a moving object whose motion is consistent with the person's face, the cloth model will be wrong, and the person will be tracked correctly only as long as the face does not get occluded.

Fig. 4 shows the Bhattacharyya coefficients for every position in the shown image, where the face in the bounding rectangle is used as the face model. The peak value of the Bhattacharyya coefficient is at the location of the face model in the frame.

Fig. 4. Bhattacharyya coefficients calculated for every position in the image, with the face inside the bounding rectangle used as the model.

The tracker applies the mean shift algorithm to each video frame. The locations of the faces are updated in the occlusion grid, and the motion information is updated in each person's model, as shown in the diagram in Fig. 1. The mean shift algorithm is used only to track the face of each person; the models of the persons' clothes are used to identify each person after occlusion ends, as explained later. Unlike the original mean shift formulation, where the size of a tracked object is kept constant or allowed to increase or decrease only by a fixed percentage, we allow the size to increase or decrease adaptively based on the area of the skin-colored region. This allows people to be tracked while moving towards or away from the camera. To prevent the tracker from being confused when two faces come close to each other, i.e., from increasing the size of one of the models to include the other face, we do not allow size changes when an overlap of faces is detected in the occlusion grid.
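The sketch below shows one mean shift iteration of the kind used here, in the spirit of Refs. [15,16], together with the ±15% size clamp described in Section 3.5. Function names and numerical details are our assumptions, not the authors' implementation; `color_distribution()` is the Section 2.1 sketch.

```python
import numpy as np

# One mean shift iteration: pixels whose colors are under-represented in
# the candidate window relative to the model pull the window toward them.

def mean_shift_step(frame_cbcr, model_q, bbox, bins=16):
    x, y, w, h = bbox
    cb = frame_cbcr[y:y+h, x:x+w, 0].astype(int)
    cr = frame_cbcr[y:y+h, x:x+w, 1].astype(int)
    step = 256 // bins
    bin_idx = (cb // step) * bins + cr // step
    p = np.bincount(bin_idx.ravel(), minlength=bins * bins).astype(float)
    p /= p.sum()
    # per-pixel weight sqrt(q_u / p_u), u being the pixel's color bin
    wgt = np.sqrt(model_q[bin_idx] / np.maximum(p[bin_idx], 1e-12))
    ys, xs = np.mgrid[0:h, 0:w]
    dx = float(((xs - w / 2) * wgt).sum() / wgt.sum())
    dy = float(((ys - h / 2) * wgt).sum() / wgt.sum())
    return int(round(x + dx)), int(round(y + dy)), w, h

def clamp_size(new_w, new_h, old_w, old_h, limit=0.15):
    """Allow the window to grow or shrink by at most 15% per frame."""
    new_w = int(np.clip(new_w, (1 - limit) * old_w, (1 + limit) * old_w))
    new_h = int(np.clip(new_h, (1 - limit) * old_h, (1 + limit) * old_h))
    return new_w, new_h
```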


2.3. Occlusion

People often interact in groups, which increases the chances of partial or complete occlusion by other people or by other objects in the scene. Our tracking system deals with the occlusion problem using multiple cues: it detects occlusion events and recovers each tracked face after the occlusion ends. During occlusion, we look for faces using the face detection algorithm around predicted areas near where the occlusion occurred; the same process is applied when multiple occlusions occur. Once a face is detected, the algorithm uses the non-parametric distribution of the cloth to calculate the similarity with the cloth models of the occluded people. Based on the similarity values, the algorithm determines whether that person was occluded before or is a newly appearing person.

2.3.1. Occlusion detection

A face can become occluded by another tracked face or by another object. In the first case, where the face is occluded by another tracked face, occlusion is detected using an occlusion grid. The locations that the tracked faces occupy in the image are recorded in the grid, which is used to determine the locations of the moving objects and their overlap. The occlusion grid we use [39] has the same size as the video frames, i.e., each cell of the grid represents a pixel in the video frame. The grid is initialized after face detection by filling each cell with the number of the face that covers the corresponding pixel. During tracking, the new locations of the tracked faces are used to update the grid. To detect occlusion and determine which faces are involved, we search the cells of the grid: cells with more than one face number indicate overlapping faces and their locations. Fig. 5 shows an example of an occlusion grid with three faces, where face 2 and face 3 overlap.

Fig. 5. Occlusion grid for three faces, where face 2 and face 3 overlap.
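A minimal sketch of such a grid follows. Instead of storing a list of face numbers per cell, this version keeps one boolean layer per face, a representation choice of ours; cells covered by two layers signal overlap, as in Fig. 5.

```python
import numpy as np

# Sketch of the occlusion grid of Section 2.3.1 (one cell per pixel).

def build_occlusion_grid(frame_h, frame_w, faces):
    """faces: dict of face id -> (x, y, w, h) bounding boxes."""
    ids = sorted(faces)
    grid = np.zeros((len(ids), frame_h, frame_w), dtype=bool)
    for layer, fid in enumerate(ids):
        x, y, w, h = faces[fid]
        grid[layer, y:y+h, x:x+w] = True
    return ids, grid

def overlapping_faces(ids, grid):
    """Return pairs of face ids whose regions share at least one cell."""
    pairs = []
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            if np.any(grid[a] & grid[b]):
                pairs.append((ids[a], ids[b]))
    return pairs
```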

The similarity values of the faces during tracking indicate which face is occluded. Once occlusion occurs, the similarity values for the occluded face usually drop more than those for the occluding face; in fact, the similarity values for the occluding face may not drop at all. In the second case, where a person is occluded by a non-face object in the scene, the occlusion can be detected by locating significant decreases in the similarity value. Fig. 6 depicts the sequence of similarity values for the third person, who is occluded by the second person in frame nos. 53–94. The figure shows two sequences in the plot: the original similarity values and a low-pass filtered version of the original sequence. It is easier to determine the occlusion event by thresholding the smoothed sequence. Note that the sequence of similarity values shown in Fig. 6 spans the frames before, during, and after the occlusion.

Fig. 6. Plot of the sequence of similarity values for only the third person before, during, and after occlusion by the second person. Frame nos. 53, 57, 71, 83, and 95 are shown below the plot.

In the case where two people come close to each other and then one of them occludes the other, the tracker detects the occlusion event from the occlusion grid, and the drop in the similarity value for one of them indicates who is being occluded. The tracker can then confirm this using the color distribution of the person's cloth. The color distribution of the cloth helps determine which face is occluded and can also be used during occlusion recovery to identify a person when that person reappears, as explained in Section 2.3.2.

2.3.2. Occlusion recovery

In many applications, such as human–computer interaction and video surveillance, it is essential that the tracking system continue tracking the correct face after occlusion ends. In the case where the face is occluded by another tracked face, our system uses the color distribution of the face as well as the color distribution of the clothes to identify the correct face after occlusion. To speed up recovery from occlusion, we use the tracking results from each frame to update the motion trajectory of each face. When the system determines that an occlusion has occurred, it stops tracking the occluded face. The motion trajectory is used to predict the time (number of frames) and location (area) where the face will appear again. For a given face, the motion vector at frame $j$ is estimated as

$$V_x = x_{pos}(j) - x_{pos}(i), \qquad (4)$$

$$V_y = y_{pos}(j) - y_{pos}(i), \qquad (5)$$

where $i$ and $j$ are the previous and current frames, $x_{pos}$ and $y_{pos}$ are the coordinates of the face center, and $V_x$ and $V_y$ are the components of the motion vector of the face center. Assuming that the occluded face moves consistently with the same speed and direction, $V_x$ and $V_y$, the system initially searches at the predicted location. Because the occluded face may also change its direction of motion during occlusion, if the face is not found at the predicted location, the tracker continues searching in a circular area around the position of the face before occlusion. In the case where a face is occluded by an object in the scene, there is no clue for determining a search area. If the occlusion is partial, the mean shift tracking method can recover after the partial occlusion ends. However, if the face is completely occluded, which is indicated by the drop of the similarity value to almost zero, the system assumes that the face has disappeared from the scene. Searching for face reappearance is then accomplished by searching for candidate faces using the face detection algorithm.
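A sketch of this search strategy, built on Eqs. (4)–(5), is shown below: extrapolate the last motion vector to get the first place to look, then fall back to a widening circular sweep around the pre-occlusion position. The step and radius values are illustrative assumptions.

```python
import numpy as np

# Predicted search location from Eqs. (4)-(5), plus circular fallback.

def predict_position(prev_center, cur_center, frames_ahead=1):
    vx = cur_center[0] - prev_center[0]          # Eq. (4)
    vy = cur_center[1] - prev_center[1]          # Eq. (5)
    return (cur_center[0] + frames_ahead * vx,
            cur_center[1] + frames_ahead * vy)

def circular_search_offsets(max_radius, step=8):
    """Offsets around the last known position, nearest rings first."""
    offsets = [(0, 0)]
    for r in range(step, max_radius + 1, step):
        n = max(8, int(2 * np.pi * r / step))    # keep rings densely sampled
        for ang in np.linspace(0.0, 2 * np.pi, n, endpoint=False):
            offsets.append((int(round(r * np.cos(ang))),
                            int(round(r * np.sin(ang)))))
    return offsets
```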


After locating possible faces, the tracker chooses the face with the cloth model that best matches the occluded person, as follows:

$$\max_{p \in P} \{\mathrm{Bhattacharyya}(Co, Cp_p)\}, \qquad (6)$$

where the Bhattacharyya coefficient of $Co$ and $Cp_p$ is calculated by

$$\mathrm{Bhattacharyya}(Co, Cp_p) = \rho[Co, Cp_p]. \qquad (7)$$

$Co$ is the non-parametric distribution of the cloth colors of the occluded person, and $Cp_p$ is the corresponding non-parametric distribution of the cloth colors of each possible person $p$ detected in the search area. Similar to the non-parametric distribution of the face in Eq. (1), $Co$ for each person is obtained at the time of face detection by

$$Co = \{\dot{q}_u;\ u = 1 \ldots m\}, \qquad (8)$$

where

$$\dot{q}_u = C \sum_{i=1,j=1}^{x,y} k(\|\dot{px}_{ij}\|^2)\,\delta[b(\dot{px}_{ij}) - u], \qquad (9)$$

and $Cp_p$ is obtained similarly.
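The following sketch ties Eqs. (6)–(9) together: among candidate faces returned by the detector after total occlusion, it picks the one whose cloth distribution best matches the occluded person's cloth model. It reuses `color_distribution()` and `bhattacharyya()` from the Section 2.1 sketch; `cloth_region_below()` is a hypothetical helper that crops the cloth area beneath a detected face, and the acceptance threshold is our assumption (the paper takes the maximum and uses the similarity values to decide whether the person is new).

```python
# Sketch of the cloth-matching recovery step, Eq. (6).

def identify_reappeared(co_hist, candidate_boxes, frame_cbcr, accept=0.6):
    best_box, best_sim = None, accept              # threshold is illustrative
    for box in candidate_boxes:
        cb, cr = cloth_region_below(frame_cbcr, box)   # hypothetical helper
        cp = color_distribution(cb, cr)                # Eqs. (8)-(9)
        sim = bhattacharyya(co_hist, cp)               # Eq. (7)
        if sim > best_sim:                             # Eq. (6): take the max
            best_box, best_sim = box, sim
    return best_box   # None -> treat as a newly appearing person
```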

3. Experimental results

We tested our algorithm with more than thirty video clips captured both indoors and outdoors. Each video clip has multiple people walking around, occluding each other at times, and moving behind objects in the scene. The indoor scenes were captured in our laboratory, where walls, glass windows, and cabinets appear in the background. The outdoor scenes were captured in daylight, with trees and grass in the background. All the video clips were captured with a fixed camera, without any zooming or panning. In the following we present some results of the tracking under different scenarios and occlusions.

3.1. Occlusion by an object similar in color to the face

Fig. 7 shows an example of a face being occluded by an object (a hand) with color similar to the skin color of the face. The figure shows a selected set of frames from the sequence and the similarity values for the sequence. The significant decreases in the similarity values indicate the frames in which the hand occludes the face: frame nos. 17, 39, 51, 69, 105, 128, 141, 262, 288, 327, 386, and 401. The figure shows that the tracker still successfully tracks the face, even when the occluding object has a similar skin color. The reason for the successful tracking is that the distribution that represents the face includes more than just the skin color, e.g., facial features and spatial information.

Fig. 7. Plot of similarity values and sampled frames during tracking a face being occluded by a hand several times in a video clip. The drops in the similarity values indicate the occlusion events, which occur at frame nos. 17, 39, 51, 69, 105, 128, 141, 262, 288, 327, 386, and 401, as shown below the plot.

3.2. Occlusion by another object

Fig. 8 shows occlusion detection and recovery with two persons in the scene; the figure shows only frame nos. 1, 14, 24, 30, and 38. Initially, two faces are detected by the system. In frame no. 14, the first person moves right, close to a pole, and by frame no. 24 she is moving behind it. The occlusion grid does not indicate an occlusion, because the occluder is not another person, but the decrease in the similarity values for the first person indicates that an occlusion event has occurred. After frame no. 30, the tracker detects a new face; by matching the distribution of the color of the cloth, the tracker confirms that the first person has reappeared. The figure also shows that when the occlusion occurs, the similarity value drops rapidly, at frame no. 21.

Fig. 8. Example of occlusion with a non-face object and recovery when the person reappears. Frame nos. 1, 14, 24, 30, and 38 are shown below the plot.

3.3. Indoor occlusion by another person

Fig. 9 shows occlusion detection and recovery with three people in the scene; the figure shows only frame nos. 1, 9, 14, and 48. Three people are detected in the first frame, where their faces and cloth regions are masked to create the models. The third person moves closer to the second person in frame no. 9, and the face of the third person starts to be occluded by the second person at frame no. 14. The occlusion grid indicates that the second and third persons are involved in the occlusion, and the similarity values indicate that it is the third person who is occluded by the second. At frame no. 48, the tracker is successfully tracking the same face after it reappears. This is achieved by matching the distribution of the color of the cloth after reappearance to that of the occluded person. The plot of similarity values shows that, when the occlusion event happens, the similarity values of the occluded person drop before frame no. 16.

Fig. 9. Example of occlusion by a person and recovery when the person reappears. Frame nos. 1, 9, 14, and 48 are shown below the plot.

3.4. Outdoor occlusion by another person

Fig. 10 shows an outdoor tracking scenario with three occlusion events. The tracker succeeded in following each person before and after occlusion. In this scene, the second and third persons start walking towards each other; the second person walks to the right of the frame, occluding the third person. He then changes direction, walking back to the left and passing the third and the first persons. In this video clip, the second person turns his face, but the tracker is still able to follow him.

Fig. 10. Outdoor scene with three occlusions.

3.5. Adaptable face size

In this experiment we demonstrate the capability of the tracking algorithm to follow a tracked face even when the person moves towards or away from the camera. In this case the size of the face in the images changes, which leads to low similarity values if the size of the model is kept fixed. In our algorithm, during tracking, the size of the face window is adaptively changed within ±15% of the size of the face in the previous frame. This is done by using the generic skin-color model to detect the face region and find its bounding rectangle. During this process we constrain the size of the new bounding rectangle to within ±15% of the size of the face from the previous frame and calculate the face model using the new bounding rectangle. Fig. 11 shows the result of an experiment of tracking a face that moves towards and away from the camera. From the figure we can see that the tracker is able to follow the face and adapt the size of the bounding rectangle.

Fig. 11. Results of tracking a face moving forward and backward from the camera.

3.6. Occlusion recovery failure

Our occlusion recovery fails temporarily when a totally occluded person reappears with the face looking away from the camera (non-frontal). The reason is that, after total occlusion, the algorithm relies on face detection to search for faces in order to resume tracking. Fig. 12 shows an example where the algorithm could not resume tracking when the person reappears. For successful identification of occluded people after their reappearance, the distributions of the clothes' colors should differ from one person to another.

Fig. 12. Occlusion recovery failure of the third person when he reappears after occlusion with his face turned away from the camera.

Most of the processing time of the tracking system is spent in the mean shift tracking module, which is very fast. The occlusion recovery is needed only when occlusion occurs and does not require much computation. The system was implemented in MS C++ running on a 1.6 GHz PC and can process 352 × 240 video at 15 frames/s.

4. Conclusion

We presented an algorithm for tracking multiple people that is capable of recovering from both partial and total occlusions. The algorithm starts by locating the faces using face detection. While tracking, it detects occlusions and recovers from them using multiple cues. When occlusion ends, the algorithm recovers by utilizing the color distribution of the person's cloth to distinguish


between the people who were involved in the occlusion. The algorithm uses the previous motion direction and speed of the occluded face to limit the search space when occlusion ends. The experiments show that the algorithm is robust and can handle both partial and total occlusion. In the future we plan to study the problem of tracking using multiple cameras that cover different areas; this will involve handoffs between different cameras.

References

[1] M. Yang, N. Ahuja, D. Kriegman, Detecting faces in images: a survey, IEEE Trans. PAMI 24 (1) (2002) 34–58.
[2] T. Jebara, A. Pentland, Parameterized structure from motion for 3D adaptive feedback tracking of faces, in: IEEE Conference on CVPR, 1997, pp. 144–150.
[3] T. Cootes, K. Walker, C. Taylor, View-based active appearance models, in: Fourth International Conference on Automatic Face and Gesture Recognition, Grenoble, France, 2000, pp. 227–232.
[4] J. Strom, T. Jebara, S. Basu, A. Pentland, Real time tracking and modeling of faces: an EKF-based analysis by synthesis approach, in: Proceedings of the Modelling People Workshop at ICCV, 1999.
[5] Y. Wu, T.S. Huang, Non-stationary color tracking for vision-based human computer interaction, IEEE Trans. Neural Networks 13 (4) (2002) 948–960.
[6] A. Colmenarez, R. Lopez, T. Huang, 3D model-based head tracking, in: Visual Communications and Image Processing, 1997.
[7] K. Schwerdt, J. Crowley, Robust face tracking using color, in: Automatic Face and Gesture Recognition, Grenoble, France, 2000, pp. 90–95.
[8] S. Spors, R. Rabenstein, A real-time face tracker for color video, in: IEEE International Conference on ICASSP, Utah, 2001.
[9] R. Herpers, G.K. Derpanis, R.M.J. MacLean, A. Levin, D. Topalovic, L. Wood, A. Jepson, J.K. Tsotsos, Detection and tracking of faces in real-time environments, in: Proceedings of the IEEE International Workshop on Recognition, Analysis and Tracking of Faces and Gestures in Real-time Systems (RATFG-RTS), 1999, pp. 96–104.
[10] V. Vezhnevets, Face and facial feature tracking for natural human–computer interface, in: Proceedings of Graphicon, 2002.
[11] J. Yang, A. Waibel, A real-time face tracker, in: Proceedings of the Third Workshop on Applications of Computer Vision, 1996, pp. 142–147.
[12] L. Wang, T. Tan, W. Hu, Face tracking using motion-guided dynamic template matching, in: Fifth Asian Conference on Computer Vision, 2002.
[13] P. Fieguth, D. Terzopoulos, Color-based tracking of heads and other mobile objects at video frame rates, in: IEEE Conference on CVPR, 1997.
[14] C. Rasmussen, G.D. Hager, Joint probabilistic techniques for tracking objects using multiple visual cues, in: IEEE International Conference on Intelligent Robots and Systems, 1998, pp. 191–196.
[15] D. Comaniciu, V. Ramesh, P. Meer, Real-time tracking of non-rigid objects using mean shift, in: IEEE Conference on CVPR, 2000.
[16] D. Comaniciu, P. Meer, Mean shift analysis and applications, in: International Conference on Computer Vision, 1999, pp. 1197–1203.
[17] R. Hsu, M. Abdel-Mottaleb, Face detection in color images, IEEE Trans. PAMI 24 (5) (2002) 696–706.
[18] J. Cai, A. Goshtasby, C. Yu, Detecting human faces in color images, in: International Workshop on Multi-Media Database Management Systems, 1998, pp. 124–131.
[19] Q. Chen, H. Wu, M. Yachida, Face detection by fuzzy pattern matching, in: IEEE Fifth International Conference on Computer Vision, 1995, pp. 591–596.
[20] M. Collobert, R. Feraud, G. Le Tourneur, D. Bernier, J.E. Viallet, Y. Mahieux, D. Collobert, LISTEN: a system for locating and tracking individual speakers, in: International Conference on Automatic Face and Gesture Recognition, 1996, pp. 283–288.
[21] M. Hunke, A. Waibel, Face locating and tracking for human-computer interaction, in: Proceedings of the Twenty-Eighth ACSSC, 1994.
[22] K. Sobottka, I. Pitas, Segmentation and tracking of faces in color images, in: International Conference on Automatic Face and Gesture Recognition, 1996, pp. 236–241.
[23] H. Wu, T. Yokoyama, D. Pramadihanto, M. Yachida, Face and facial feature extraction from color images, in: International Conference on Automatic Face and Gesture Recognition, 1996, pp. 345–350.
[24] J. Yang, W. Lu, A. Waibel, Skin color modeling and adaptation, in: Proceedings of the Third Asian Conference on Computer Vision, 1998, pp. 687–694.
[25] D. Saxe, R. Foulds, Toward robust skin identification in video images, in: Second International Face and Gesture Recognition Conference, 1996.
[26] J.-C. Terrillon, M. David, S. Akamatsu, Automatic detection of human faces in natural scene images by use of a skin color model and of invariant moments, in: Proceedings of the Third International Conference on Face and Gesture Recognition, 1998.
[27] K. Imagawa, S. Lu, S. Igi, Color-based hands tracking system for sign language recognition, in: Proceedings of the Third International Conference on Face and Gesture Recognition, 1998.
[28] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: IEEE Conference on CVPR, 2001.
[29] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, in: Computational Learning Theory, Springer, Berlin, 1995.
[30] Intel Corp., OpenCV Library, http://www.intel.com/research/mrl/research/opencv.
[31] V. Vezhnevets, V. Sazonov, A. Andreeva, A survey on pixel-based skin color detection techniques, in: Proceedings of Graphicon, 2003, pp. 85–92.
[32] D.W. Scott, Multivariate Density Estimation, Wiley-Interscience, New York, 1992.
[33] A. Elgammal, L.S. Davis, Probabilistic framework for segmenting people under occlusion, in: Proceedings of the International Conference on Computer Vision, 2001.
[34] N.T. Siebel, S. Maybank, Fusion of multiple tracking algorithms for robust people tracking, in: Proceedings of the European Conference on Computer Vision, 2002, pp. 373–387.
[35] T. Zhao, R. Nevatia, Bayesian human segmentation in crowded situations, in: IEEE Conference on CVPR, 2003.
[36] A. Mittal, L. Davis, M2Tracker: a multi-view approach to segmenting and tracking people in a cluttered scene, Int. J. Comput. Vision 51 (2003) 189–203.
[37] M. Isard, J. MacCormick, BraMBLe: a Bayesian multiple-blob tracker, in: Proceedings of the International Conference on Computer Vision, 2001.
[38] J. MacCormick, A. Blake, A probabilistic exclusion principle for tracking multiple objects, Int. J. Comput. Vision 39 (2000) 57–71.
[39] H. Hey, R.F. Tobler, Lazy occlusion grid culling, Technical Report TR-186-2-99-12, Institute of Computer Graphics and Algorithms, Vienna University of Technology, March 1999.
[40] W.L.J. Han, D.Y.-F. Wang, Real-time multi-person tracking in video surveillance, in: Proceedings of the 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing, vol. 2, 2003, pp. 1144–1148.
[41] R. Liang, C. Chen, J. Bu, Real-time facial features tracker with motion estimation and feedback, in: Proceedings of the International Conference on Systems, Man and Cybernetics, vol. 4, 2003, pp. 3744–3749.
[42] J. Ruiz-del-Solar, A. Shats, R. Verschae, Real-time tracking of multiple persons, in: Proceedings of the 12th International Conference on Image Analysis and Processing, 2003.