J. Vis. Commun. Image R. 19 (2008) 426–436
Ball detection from broadcast soccer videos using static and dynamic features

V. Pallavi a,*, Jayanta Mukherjee a, Arun K. Majumdar a, Shamik Sural b

a Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur 721302, West Bengal, India
b School of Information Technology, Indian Institute of Technology, Kharagpur 721302, India
Article history: Received 30 January 2007; accepted 26 June 2008; available online 4 July 2008.

Keywords: Shot classification; Soccer analysis; Ball detection; Trajectory; Dynamic programming
Abstract

In this paper, we propose an approach for detecting the ball in broadcast soccer videos. We use hybrid techniques for identifying the ball in medium and long shots. Candidate ball positions are first extracted using features based on shape and size. For medium shots, the ball is identified by filtering the candidates with the help of motion information. In long shots, after motion based filtering of the non-ball candidates, a directed weighted graph is constructed for the remaining ball candidates. Each node in the graph represents a candidate and each edge links candidates in a frame with candidates in the next two consecutive frames. Finally, dynamic programming is applied to find the longest path of the graph, which gives the actual ball trajectory. Experiments with several soccer sequences show that the proposed approach is very efficient.

© 2008 Elsevier Inc. All rights reserved.
1. Introduction

Sports video analysis, especially of ball games like soccer, basketball and tennis, has always received much attention from researchers. Despite extensive research efforts on sports video, soccer video analysis remains a challenging task, and the analysis and summarization of soccer highlights is an active research topic. Leo et al. [7] and D'Orazio et al. [3] developed ball recognition systems for real soccer images. Kim et al. [6] proposed a method for three dimensional position estimation of a flying soccer ball from a monocular image sequence. However, only a few works [8,10,12,13] have been reported on ball detection in broadcast soccer videos. In this regard, ball trajectory analysis is considered an important component of soccer video summarization. Finding the ball position in a broadcast soccer video is difficult, because the features of the ball (its color, size and shape) vary over frames. The relative size of the ball is usually very small, and the ball may not be an ideal circle in a frame because of fast motion and illumination conditions. Many other objects in the field or in the crowd may look similar to a ball: some body parts resemble the shape of a ball, and letters like O or Q on the billboards may also be detected as a ball. The ball may even get merged with lines in the field, such as the midfield line, center circle and corner arc, or may be occluded by the players. Changes in field appearance from place to place and over time make ball detection harder still.
* Corresponding author.
E-mail addresses: [email protected] (V. Pallavi), [email protected] (J. Mukherjee), [email protected] (A.K. Majumdar), [email protected] (S. Sural).
doi:10.1016/j.jvcir.2008.06.007
As a result, it is difficult to define a property by which a ball can be uniquely identified among the surrounding objects present in a video frame. Fig. 1 shows some typical ball samples in broadcast soccer videos. Considering all these difficulties and complexities in detecting balls, different authors have reported various techniques for discriminating a ball from other objects of apparently similar shape and size. D'Orazio et al. [2,3] suggest a modified circular Hough transform with a neural network classifier to detect the ball in real soccer image sequences. The videos used by them were recorded by their own camera rather than taken from broadcasts, and the method fails if objects similar to the ball are present in the frame. Seo et al. [10] adopted a template matching and Kalman filter based approach to track the ball, but tracking results were not reported in the paper. Tong et al. [11] consider a condensation based technique for detecting and tracking the ball; they discriminate the ball based on a combination of color and shape evaluation in a coarse-to-fine process, but the algorithm fails under complex backgrounds. Yu et al. [13,14] use Kalman filter based trajectory mining and trajectory evaluation for tracking the ball in broadcast soccer videos. They generate a set of ball candidates by removing salient objects like the center circle, goalmouth and players; however, they preprocess the broadcast soccer video to obtain the field color range, line color range and the color ranges of both teams in order to remove them. Liang et al. [8] propose a scheme to track the ball in broadcast soccer videos. Their ball detection algorithm is based on the observation that the ball's color is white in long view shots. Their scheme has two stages: ball detection and ball tracking. In the ball detection stage, ball candidates are first extracted from several consecutive frames using color, texture and size cues. Then a weighted graph is constructed, with each node representing a candidate and each edge linking two candidates in adjacent frames.
Fig. 1. Ball samples in broadcast soccer videos.
The Viterbi algorithm is applied to obtain the ball's locations, and finally Kalman filter based template matching is used to track the ball.

In this paper, we propose a scheme for detecting and tracking the ball in broadcast soccer videos. In our work, we consider both long shots and medium shots for detecting the trajectory. It may be noted that, in earlier efforts, Liang et al. [8], Yu et al. [13] and Tong et al. [11] detect the ball only in long shots. But detection of the ball in medium shots is also important for the analysis of ball possession, goal detection, etc. We therefore detect the ball in both medium and long shots. In our approach, we first classify shots as close, medium or long, and then apply specific algorithms for detecting the ball in medium and long shots. Although our work on ball tracking in long shots bears some similarity to the ball detection algorithm suggested by Liang et al. [8], there are a number of important differences. Liang et al. [8] and Yu et al. [13,14] use the color of the ball as the main source of information to detect it. But the color of the ball depends to a large extent on the illumination conditions: the ball may take on a shade of green due to light reflected from the play field, and its color changes with sunlight and floodlights. In contrast, we use shape as the most important cue and detect the ball in a frame with the circular Hough transform [5]. D'Orazio et al. [2,3] also used the Hough transform to detect the ball, but they did not use broadcast soccer videos in their experiments, and the results in their papers indicate that the experiments were performed on medium shots; we detect the ball in both medium and long shots. Several ball candidates are obtained in a frame on applying the circular Hough transform. A few non-ball candidates are then filtered out using optical flow velocity, camera motion and background subtraction. Our ball candidate detection method differs from Liang et al.'s in that we use several filters based on spatial locations and direction of motion to remove the most probable non-ball candidates. To detect the ball in long shots, a weighted graph is constructed over all the consecutive frames of a long shot from the ball candidates left after filtering. The ball may not be identified in some intermediate frames by the Hough transform; to deal with such situations, an edge in the weighted graph links candidates in a frame with candidates in the next two consecutive frames, so that a frame which misses the ball can be dropped. In our method, weights are assigned to the edges based on the intensity correlation between the candidates, whereas Liang et al. use different weights for the nodes and the edges. Dynamic programming is then applied to find the longest path of the graph, which is the path of the ball. In long shots the ball may not always have a circular shape: light may produce self shadowing effects that distort its shape, and the ball may appear elongated in a few frames due to fast motion. In such situations the ball is not identified in the frame by the circular Hough transform. Even if the ball is missed in a frame, the trajectory obtained from the longest path of the graph still shows the actual ball locations in the rest of the frames.
The experiments performed by us show that our scheme can work even in bad playfield conditions. In the proposed method, the user selects the ball candidate in the first frame of the sequence; the longest path in the graph obtained for the selected node then gives the path of the ball. On the other hand, Liang et al. [8] used ball detection results
to initialize a Kalman filter for tracking the ball. Yu et al. [13,14] performed preprocessing to obtain the field color range, line color range and the color ranges of both teams in order to remove these non-ball objects; our algorithm does not require such preprocessing for filtering out the non-ball candidates. Seo et al. [10] and Yu et al. [13] used a Kalman filter based approach to generate candidate trajectories from which the ball trajectory is selected.

The rest of the paper is organized as follows. In Section 2, the methods for shot classification are described. Section 3 describes the detection of ball candidates and the elimination of non-ball candidates. In Section 4, we present our analysis of the ball trajectory using a dynamic programming approach. We present experimental results in Section 5 and finally conclude in Section 6.

2. Shot classification

A video can be represented as a union of coherent segments obtained by a temporal segmentation process. Each segment, known as a shot, is a continuous sequence of frames captured by the same camera. The frames within a shot represent a continuous action in time and space. Broadcasters use different shot types to convey certain semantics in different domains such as sports and news. While broadcasting soccer videos, directors usually want to convey a global view of the field, but certain events make the sport more dramatic and interesting. To maintain viewer interest, the camera zooms in on events like goal scoring and ball passing, and, since fans follow the players, the camera also zooms in on the faces of players during play. Frames in a soccer video can be classified into long shots, medium shots and close shots. A long shot captures a global view of the field. A medium shot shows a close-up view of one or more players in a specific part of the field, while a close shot shows an above-waist view of a single player.

2.1. Classification of soccer video shots

A soccer field has one distinct dominant color, green, which varies from stadium to stadium and with lighting conditions. It has been observed that in long shots, as the game proceeds, either grass dominates the entire frame as shown in Fig. 2a, or the crowd covers the upper part of the frame as in Fig. 2b. Long shots are therefore characterized by a high grass pixel ratio in the lower part of the frame. We divide a frame into twelve equal parts such that the width to height ratio of each part is 4:3, as shown in Fig. 3. For each frame we extract a region ABCD, shown in Fig. 3, which we call the field region [9]. This region is chosen because, while broadcasting the global view of a soccer match, broadcasters either show the field in the whole frame or the spectators occupy the region above AB. Thus, in a long shot, the field region is dominated by green.

2.2. Feature extraction for grass pixels

In order to extract features for classifying pixels as belonging to grass regions (denoted as grass pixels), we analyzed several videos of Euro Cup 2004 and the UEFA Champions League 2003, broadcast by ESPN and Star Sports, respectively. These matches were played under different lighting conditions at different venues. In our analysis, we used the YIQ color space.
After performing extensive experiments on 4500 frames from five different soccer matches, we noted that grass pixels have I values between I_min and I_max and Q values between Q_min and Q_max, where experimentally I_min = 25, I_max = 55, Q_min = 0 and Q_max = 12.
Fig. 2. Long shot views.
The grass pixels retain this I-Q range irrespective of the illumination conditions, the venues of the matches and the channels that broadcast them. Fig. 4 shows the corresponding region in I-Q space.
2.3. Classification of a frame

Observations from the grass pixel features are used to classify a frame as belonging to a long, medium or close shot. Histograms of the I and Q values of each frame are constructed and their peaks identified. Next, the number of field-region pixels lying in the range peak − 15 to peak + 15 is counted, and the ratio of these pixels to the field region is computed. This grass pixel ratio provides an estimate of the dominant color ratio in the field region. If the dominant color is green (i.e., I values between 25 and 55 and Q values between 0 and 12 in the field region) and the ratio lies between 0.75 and 1.0, the frame belongs to a long shot. Medium shots have a lower grass pixel ratio than long shots: if the grass pixel ratio in the field region lies between 0.5 and 0.75, the frame belongs to a medium shot; otherwise it belongs to a close shot. Frames whose dominant color ratio lies between 0.75 and 1.0 but whose dominant color is not green are kept in an unclassified shots category.

Fig. 3. Field region.

We tested the proposed scheme with soccer videos from Euro Cup 2004 and the UEFA Champions League 2003. The test data includes 6030 long view frames, 500 medium view frames and 1680 close view frames. The performance of our shot classification technique is measured using a confusion matrix, which captures the classification error or misclassification:

Misclassification = (No. of misclassified frames in a shot class) / (Total no. of frames in that shot class)

The number of frames classified into each of the four classes is shown in Table 1, and the confusion matrix of the proposed shot classification technique in Table 2, which gives the percentage of frames of each class that were misclassified into each of the other three classes. The percentage of true classification is 96.68% for long shots, 83.76% for medium shots and 87.63% for close shots. The overall classification error is 5.96%.
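To make the grass pixel test of Section 2.2 and the classification rule above concrete, here is a minimal Python sketch. It assumes standard RGB-to-YIQ conversion coefficients with I and Q on the same scale as the thresholds above, and a precomputed boolean mask for the field region ABCD of Fig. 3; all names are illustrative rather than taken from the authors' implementation.

```python
import numpy as np

# Experimentally determined grass ranges in YIQ (Section 2.2)
I_MIN, I_MAX = 25, 55
Q_MIN, Q_MAX = 0, 12

def rgb_to_iq(rgb):
    """I and Q channels of the YIQ transform for an RGB image in [0, 255]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (0.596 * r - 0.274 * g - 0.322 * b,
            0.211 * r - 0.523 * g + 0.312 * b)

def classify_frame(rgb_frame, field_mask):
    """Classify one frame as 'long', 'medium', 'close' or 'unclassified'.

    field_mask is a boolean mask selecting the field region ABCD of Fig. 3.
    """
    i, q = rgb_to_iq(rgb_frame.astype(np.float32))
    i, q = i[field_mask], q[field_mask]

    # Histogram peaks of I and Q inside the field region
    i_peak = np.bincount(np.clip(i, 0, 255).astype(int)).argmax()
    q_peak = np.bincount(np.clip(q, 0, 255).astype(int)).argmax()

    # Dominant color ratio: pixels within +/-15 of both peaks
    near_peak = (np.abs(i - i_peak) <= 15) & (np.abs(q - q_peak) <= 15)
    ratio = near_peak.mean()

    # Dominant color is "green" when the peaks fall in the grass range
    green = (I_MIN <= i_peak <= I_MAX) and (Q_MIN <= q_peak <= Q_MAX)
    if 0.75 <= ratio <= 1.0:
        return 'long' if green else 'unclassified'
    return 'medium' if ratio >= 0.5 else 'close'
```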
Fig. 4. Grass values.

Table 1
Results for shot classification technique (no. of frames, with % of total frames in parentheses)

True class     Predicted: Long shot   Medium shot   Close shot     Unclassified shot
Long shot      5830 (71.01)           32 (0.39)     24 (0.29)      144 (1.75)
Medium shot    37 (0.45)              418 (5.09)    38 (0.46)      7 (0.085)
Close shot     78 (0.95)              110 (1.33)    1472 (17.93)   20 (0.24)

Table 2
Confusion matrix for shot classification technique (% of misclassified frames)

True class     Predicted: Long shot   Medium shot   Close shot     Unclassified shot
Long shot      —                      0.53          0.39           2.39
Medium shot    7.4                    —             7.6            1.4
Close shot     4.64                   6.55          —              1.2

3. Ball detection

A ball in soccer videos is usually circular in shape. It has been observed that the ball generally retains its circular shape in medium shots because of the close view, though it may not always be circular in long shots due to lighting conditions, shadows, occlusion or the high velocity of the ball. We therefore apply different techniques to identify the ball in medium shots and long shots. A block diagram for ball detection in medium and long shots is shown in Fig. 5.

3.1. Ball candidate detection by Hough transform

Ball candidates are detected using the Hough transform for both medium and long shots. Before applying the Hough transform, the video frame is smoothed to reduce the amount of noise.
Then Sobel operators are applied to determine the edges and edge directions. Next, the circular Hough transform is applied and the resulting parameter space is convolved with a Mexican hat filter to enhance the centers of the circles. The parameter space is then thresholded and the centers of the circles are detected. After localization of the circle centers, the radii of the circles are computed: for a given center, accumulating in a one dimensional space (R-space) gives the coordinate corresponding to the radius of the concentric circles, and hence the most suitable circle for that center.
Fig. 5. Block diagram for ball detection in long and medium shots.
Fig. 6. Distribution of ball radius.
Fig. 7. Ball identified in a medium shot view.
In the example soccer videos associated with the UEFA Champions League 2003, Euro Cup 2004 and UEFA Champions League 2007, we observe that the radius of the ball in medium shots ranges between M_min and M_max, where empirically M_min = 10 and M_max = 15 pixels, while the radius of the ball in long shots ranges between L_min and L_max, where L_min = 1 and L_max = 4 pixels. These thresholds were obtained after experiments on 150 medium view and 150 long view frames of the test data; Fig. 6 shows the distribution of ball radius for these frames. The thresholds correspond to frames with a resolution of 352 × 288; however, it has been found experimentally that the threshold values scale proportionally with the frame size. Thus, after filtering out the circles whose radius falls outside the range (M_min, M_max) for medium shots and (L_min, L_max) for long shots, we are left with a few circles in the frame, which are the most probable ball candidates. Different filters are then applied to remove non-ball candidates from medium shots and long shots.
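As an illustration of this candidate extraction step, the sketch below uses OpenCV's cv2.HoughCircles with the radius bounds reported above. Note that OpenCV's gradient-based implementation differs from the Mexican hat enhanced parameter space described in the text, and the accumulator parameters here are illustrative guesses rather than the authors' values.

```python
import cv2

# Empirical radius bounds in pixels for 352 x 288 frames (Section 3.1)
MEDIUM_R = (10, 15)
LONG_R = (1, 4)

def ball_candidates(frame_bgr, long_shot):
    """Return a list of (x, y, r) circle candidates for one frame.

    Uses cv2.HoughCircles as a stand-in for the paper's Mexican hat
    enhanced circular Hough transform.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 1.5)   # smoothing before edge detection
    r_min, r_max = LONG_R if long_shot else MEDIUM_R
    circles = cv2.HoughCircles(
        gray, cv2.HOUGH_GRADIENT, dp=1,
        minDist=2 * r_max,        # candidate circles should not overlap
        param1=100,               # Canny high threshold (Sobel-based edges)
        param2=10,                # accumulator threshold, kept permissive
        minRadius=r_min, maxRadius=r_max)
    return [] if circles is None else [tuple(c) for c in circles[0]]
```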
3.2. Ball detection in medium shots

After obtaining a few ball candidates using the circular Hough transform, the non-ball candidates in a frame must be filtered out. The non-ball candidates may be body parts of players that resemble the shape of a ball, or circular regions on the billboards. Since the ball in a soccer match is a moving object, it has a significant velocity, and the velocity of the ball is expected to be greater than that of any other moving object in the frame. In some medium shots, however, the players following the ball may have a velocity similar to that of the ball. We therefore find the velocity of each ball candidate obtained by the Hough transform, using the optical flow method of Horn and Schunck [4]. Optical flow is the distribution of apparent velocities of brightness patterns in an image; it gives important information about the spatial arrangement of the objects viewed and the rate of change of that arrangement. For each candidate in a frame we compute the average magnitude of the optical flow velocities of its boundary pixels, and the candidate with the maximum velocity is identified as the ball region, as sketched below. A typical example of ball candidates detected in a medium shot, and the actual ball identified after removal of non-ball candidates, is shown in Fig. 7a and b, respectively.
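A minimal sketch of this velocity-based selection. Horn-Schunck flow is not part of OpenCV's core API, so dense Farnebäck flow is substituted here; the selection rule itself (maximum mean flow magnitude over boundary pixels) follows the text.

```python
import cv2
import numpy as np

def pick_ball_by_velocity(prev_gray, cur_gray, candidates):
    """Select the medium-shot ball as the candidate whose boundary pixels
    have the largest mean optical flow magnitude.

    candidates: non-empty list of (x, y, r) circles from the Hough step.
    The paper uses Horn-Schunck flow [4]; Farneback flow is substituted.
    """
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.hypot(flow[..., 0], flow[..., 1])
    h, w = mag.shape

    def boundary_speed(x, y, r):
        # Sample the circle's boundary pixels and average their flow magnitude
        theta = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)
        xs = np.clip((x + r * np.cos(theta)).astype(int), 0, w - 1)
        ys = np.clip((y + r * np.sin(theta)).astype(int), 0, h - 1)
        return mag[ys, xs].mean()

    return max(candidates, key=lambda c: boundary_speed(*c))
```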
3.3. Ball detection in long shots

The ball may not look circular in some long shot frames, as a result of which the technique used to find the ball in medium shot views does not perform well for long shot views. We therefore apply a different algorithm for ball detection in long shot views. After obtaining the regions that are circular in shape, we filter out the non-ball candidates based on certain heuristics, described below.
Fig. 8. Field and non-field region separator in a long shot view.
It has been observed that the non-ball candidates may belong to any of the following: certain regions of the crowd, figures in the logo of the broadcasting channel, regions on the billboards, some regions of the players, and regions of the central line in the playfield. Instead of trying to identify the actual ball candidate directly, we eliminate the most probable non-ball candidates.

3.3.1. Filtering the non-ball candidates

We remove the non-ball candidates as follows.

Removal from the channel's logo: The logo of the broadcasting channel usually appears in the top left corner of the broadcast video. We filter out all ball candidates that fall within the logo region.

Removal from the gallery region: It has been observed that a few probable ball candidates belong to regions in the crowd; even letters like O or Q on the billboards may look like a ball, so these candidates too have to be filtered out. The extra-field region includes the gallery and the billboards, and in long shot views the crowd may or may not be present. Fig. 8 shows that the lower edge of the billboard can be used to separate the field region from the non-field region. We apply the Hough transform for line detection to find the horizontal lines present in a frame, and consider the lowest, longest line as the separator between the field and non-field regions. This identifies a set of horizontal lines L = {l_1, l_2, ..., l_n} in a frame F, with midpoints {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, which includes both the line separating the billboard from the crowd and the line separating the billboard from the field. We select the line l_i with maximum y_k from the top of the frame whose length exceeds a threshold L_T, where L_T is greater than half the width of the frame; this is the line separating the billboard from the field. Fig. 8 shows the line separating the gallery and the playfield. In the corner area, billboard lines on both sides are present; the algorithm nevertheless selects the line with maximum y_k from the top of the frame, so the billboard along the width of the field is chosen. All ball candidates above this line are filtered out.

Removal from the midfield line: The midfield line, or central line, divides the field in half along its width. Because of the color, texture and width of this line, some regions on it get identified as circular regions. As the game proceeds, the ball is kicked from one end of the field to the other by players attempting to score; the camera tracks the ball, and in doing so many long shot views display the midfield line. Removal of ball candidates detected on this line is therefore very important. We expect the centers of candidates on the midfield line to have their x-coordinates within a range of 10 pixels, so we count the candidates whose x-coordinate values lie within such a range. If the number of such candidates is more than four, we remove all of them; if the actual ball lies within that region, the algorithm misses it.
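The midfield line filter admits a direct implementation. The sketch below discards any candidate belonging to a group of more than four candidates whose x-coordinates lie within 10 pixels of one another, which is one plausible reading of the rule above.

```python
def drop_midfield_candidates(candidates, x_range=10, min_cluster=5):
    """Remove circle candidates that line up vertically on the midfield line.

    candidates: list of (x, y, r). Any group of more than four candidates
    whose x-coordinates fall within x_range pixels of each other is assumed
    to lie on the midfield line and is discarded.
    """
    kept = []
    for c in candidates:
        cluster = [d for d in candidates if abs(d[0] - c[0]) <= x_range]
        if len(cluster) < min_cluster:   # fewer than five aligned candidates
            kept.append(c)
    return kept
```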
Fig. 9. Ball candidates in a long view frame before and after filtering.
Filtering out the candidates moving against the camera: Camera parameters are strongly related to ball movement. To maintain viewer interest, the camera tracks the ball, which is the focus of the game. We therefore hypothesize that ball candidates which move against the camera direction are non-ball regions. Since the ball moves from one goal area to the other, the camera moves either rightwards or leftwards, so we determine the direction of the camera and the directions of the ball candidates. To calculate the direction of the camera, we compute the optical flow direction of each pixel in a frame using the Horn and Schunck [4] method. We conclude that the camera moves rightwards if the majority of pixels in the frame have optical flow directions within +π/2 to −π/2; otherwise the camera moves leftwards. We then compute the direction of each ball candidate left after the above filters; the direction of a candidate is the summation of the directions of its boundary pixels. If the direction of a candidate lies between +π/2 and −π/2 and the majority of the pixels in the frame also lie between +π/2 and −π/2, the candidate is moving rightwards, in the direction of the camera. In this way we filter out the candidates that move against the camera (a sketch of this test follows). Fig. 9 shows the ball candidates detected in a long shot view before and after applying all the filters.
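A sketch of the direction test, assuming a dense flow field is available (e.g., from a Horn-Schunck implementation). Treating the summed boundary direction as an angle in (−π/2, +π/2) is our literal reading of the text and is marked as such in the code.

```python
import numpy as np

def moving_with_camera(flow, boundary_ys, boundary_xs):
    """True if a candidate moves in the same left/right direction as the camera.

    flow: dense optical flow field of shape (H, W, 2).
    boundary_ys, boundary_xs: index arrays of the candidate's boundary pixels.
    The camera is deemed to move rightwards when the majority of pixel flow
    directions fall within (-pi/2, +pi/2).
    """
    angles = np.arctan2(flow[..., 1], flow[..., 0])
    camera_right = np.mean(np.abs(angles) < np.pi / 2) > 0.5

    # Per the paper, the candidate's direction is the sum of its boundary
    # pixel directions; we test whether that sum lies in (-pi/2, +pi/2).
    cand_dir = np.sum(angles[boundary_ys, boundary_xs])
    cand_right = abs(cand_dir) < np.pi / 2
    return cand_right == camera_right
```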
4. Ball trajectory estimation by dynamic programming

Filtering out the non-ball regions from long shot views still leaves a few candidates that look similar to the ball, and it is very difficult to judge which of them is the ball. Unlike the non-ball candidates, however, a ball has a definite trajectory. We construct a directed weighted graph for these ball candidates and assume that the longest path of this graph gives the trajectory of the ball.

4.1. Directed graph construction

The few candidates left after filtering out the non-ball regions from T consecutive frames are considered, and a weighted graph is constructed for these ball candidates. We assume that the actual ball candidate in a frame has a correlation with the actual ball regions in the next consecutive frames. Each graph node represents a ball candidate, while each edge links candidates in a frame with candidates in the next two consecutive frames; each edge is assigned a weight given by the correlation between the candidates. A directed graph G(V, E) is constructed for the set of candidates {c_1, c_2, ..., c_n} ⊆ V of T consecutive frames {F_1, ..., F_T} after filtering the non-ball candidates. We consider candidates of k successive frames for the formation of the edges E of the graph G; k is set to 2 because the actual ball may not have been detected in an intermediate frame F_t. While shooting a soccer match, the sports videographer follows the ball; as a result, the ball's locations in any two adjacent frames F_{t+1} and F_{t+2} remain close to each other. Let c_i be a candidate in frame F_t and c_j a candidate in frame F_{t+1} or F_{t+2}. An edge c_i c_j is added to the graph G only if the spatial locations of the centers of c_i and c_j lie within a window W of width w/5 and height h/5, where w is the width and h the height of the frame. The window is chosen to be rectangular to preserve the aspect ratio of the frame. However, it is observed that when the ball is shot, its very high velocity means it may not lie within the window W in adjacent frames.

Each edge is now assigned a weight. Block matching (Fig. 10) is performed to estimate the correlation of a ball candidate in frame F_t with the candidates in F_{t+1} and F_{t+2}. I(c_i) is the intensity map of c_i, represented by a 16 × 16 block around its center (x_{c_i}, y_{c_i}).
Fig. 10. Assigning weight to the edges for block matching.
Each edge c_i c_j ∈ E of the graph G is assigned a weight as follows. The intensity maps of c_i and c_j, joined by the edge c_i c_j, are divided into non-overlapping 4 × 4 sub-blocks (bI_1(c_i), bI_2(c_i), ..., bI_16(c_i)) and (bI_1(c_j), bI_2(c_j), ..., bI_16(c_j)), respectively. Overlapping sub-blocks are not considered because they are computationally very expensive. Each sub-block of c_i is matched with every sub-block of c_j. The degree of matching of the blocks is denoted by CR, which is comparable to the Peak Signal to Noise Ratio (PSNR). CR for c_i and c_j is defined by

CR_{c_i,c_j} = 20 log ( r_{c_i,c_j} / sqrt(C_{c_i,c_j}) )        (1)

where

r_{c_i,c_j} = sqrt( (1/N) Σ (I(c_i) − Ī(c_j))² )        (2)

with I(c_i) the intensity map of c_i represented by a 16 × 16 block, Ī the average intensity, and N = 16;

C(bI(c_i), bI(c_j)) = (1/(n·n)) Σ (bI(c_i) − bI(c_j))²        (3)

where bI(c_i) and bI(c_j) are the 4 × 4 sub-blocks of I(c_i) and I(c_j), respectively, and n = 4. Finally,

C_{c_i,c_j} = max{ C(bI_a(c_i), bI_b(c_j)) }, for all a, b        (4)

where 1 ≤ a ≤ 16 and 1 ≤ b ≤ 16. Thus, the edge c_i c_j joining the ball candidates c_i in frame F_t and c_j in frame F_{t+1} is assigned the weight CR.

Algorithm: Graph Construction
Input:
  c_0 — actual ball candidate in frame F_1
  T — total number of consecutive frames
  c_N = {c_1, ..., c_n} — candidate set from frame F_2 to F_T
  C = c_0 ∪ c_N — candidate set from F_1 to F_T
  w(a, b) — weight of the edge joining a and b
  wt — width of the frame
  ht — height of the frame
Output: directed weighted graph G
Begin
  for each candidate c in F_i, where i = 1, 2, ..., T − 2
    choose a 16 × 16 pixel block around the center of c
    for each candidate d in F_{i+1} and F_{i+2}
      if |c_x − d_x| < wt/5 and |c_y − d_y| < ht/5 then
        choose a 16 × 16 pixel block around the center of d
        divide d into sixteen 4 × 4 pixel blocks → d_1[4,4], ..., d_16[4,4]
        for d_k = d_1 to d_16
          calculate CR of c and d_k
        end for
        w(c, d) = max(CR)
        add edge (c, d) to G
      end if
    end for
  end for
End
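The pseudocode can be made runnable roughly as follows. This sketch assumes grayscale frames and per-frame candidate lists of (x, y, r) circles, implements Eqs. (1)-(4) for the edge weight (the normalization in Eq. (2) follows our reading of the definitions), and applies the w/5 × h/5 window test; the data layout and helper names are ours, not the authors'.

```python
import numpy as np

def block_cr(gray_t, gray_next, ci, cj, block=16, sub=4):
    """CR weight (Eqs. 1-4) between candidate centers ci and cj.

    Intensity maps are 16 x 16 blocks around each center, split into
    non-overlapping 4 x 4 sub-blocks. Assumes candidates lie at least
    block/2 pixels from the frame border.
    """
    def patch(img, center):
        x, y = int(center[0]), int(center[1])
        return img[y - block // 2:y + block // 2,
                   x - block // 2:x + block // 2].astype(np.float64)

    I_i, I_j = patch(gray_t, ci), patch(gray_next, cj)
    r = np.sqrt(np.mean((I_i - I_j.mean()) ** 2))          # Eq. (2)
    subs = lambda I: [I[a:a + sub, b:b + sub]
                      for a in range(0, block, sub)
                      for b in range(0, block, sub)]
    C = max(np.mean((sa - sb) ** 2)                        # Eqs. (3)-(4)
            for sa in subs(I_i) for sb in subs(I_j))
    return 20 * np.log10((r + 1e-12) / np.sqrt(C + 1e-12))  # Eq. (1)

def build_graph(frames_gray, candidates_per_frame, w, h):
    """Directed weighted graph: edges from frame t to t+1 and t+2 whenever
    two candidate centers fall within a w/5 x h/5 window."""
    edges = {}                      # (t, i) -> list of ((t2, j), weight)
    T = len(frames_gray)
    for t in range(T - 2):
        for i, ci in enumerate(candidates_per_frame[t]):
            for dt in (1, 2):
                for j, cj in enumerate(candidates_per_frame[t + dt]):
                    if abs(ci[0] - cj[0]) < w / 5 and abs(ci[1] - cj[1]) < h / 5:
                        wgt = block_cr(frames_gray[t], frames_gray[t + dt], ci, cj)
                        edges.setdefault((t, i), []).append(((t + dt, j), wgt))
    return edges
```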
4.2. Ball trajectory estimation

The graph constructed from the ball candidates is a directed acyclic graph, and for a directed acyclic graph dynamic programming can be used to find the longest path between two given vertices (Alsuwaiyel [1]). We therefore use dynamic programming to find the longest path of our graph. Fig. 11 shows a weighted graph for a few ball candidates detected in four consecutive frames of a long shot. Vertices A to H represent ball candidates obtained after applying the ball detection and non-ball candidate filtering algorithms. The numbers on the edges represent the CR of the pair of vertices joined by each edge. Given that A is the starting ball candidate, the problem is to find the actual path of the ball from frame F_1 to F_4: out of the sixteen possible paths in Fig. 11, the actual path of the ball has to be determined. We find the longest path of the acyclic graph to extract the path of the ball.

Let G(V, E) be the directed acyclic graph constructed for the set of ball candidates obtained after applying the ball detection and non-ball candidate filtering algorithms over T consecutive frames, and let V_j^t denote the j-th vertex in the t-th frame. We find the longest path of the acyclic graph between s, the only vertex of the first frame, and each of the vertices in the T-th frame. We define

w_{i_{t1}, j_{t2}} = weight of the edge E_{i,j} joining vertex i in the t1-th frame and vertex j in the t2-th frame, if the edge is present; 0 otherwise.

We compute V_j^t, the value of the j-th vertex in the t-th frame, as

V_j^t = 0, if t = T;
V_j^t = max over all i of { (V_i^{t+1} + w_{i_{t+1}, j_t}), (V_i^{t+2} + w_{i_{t+2}, j_t}) }, otherwise.
The dynamic programming algorithm to find the longest path is as follows:

Algorithm: Ball Path Trajectory
Input:
  T — total number of frames
  s — starting ball candidate in the first frame
  Path_j^t — sum of edge weights of the optimal path starting at the j-th candidate in the t-th frame and ending at s
Output: optimal path
Iteration:
  for each candidate V_j^T in the T-th frame
    calculate Path_j^T
  end for
  Optimal path = max(Path_j^T, over all j)

Thus, given a source node, our ball path trajectory algorithm extracts the longest path, which is the path of the ball. The user selects the ball candidate in the first frame of the sequence by manually entering the labeled value of the node.
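A compact sketch of the longest path computation. The memoized recursion below is equivalent to the backward recurrence for V_j^t given above; it assumes the edge dictionary produced by the build_graph sketch earlier.

```python
from functools import lru_cache

def longest_ball_path(edges, source):
    """Longest (maximum-weight) path from `source` in the candidate DAG.

    edges: dict mapping node (t, i) -> list of ((t2, j), weight), as built
    by the build_graph sketch. Returns (total weight, trajectory), where
    trajectory is the list of (frame, candidate) nodes of the ball path.
    """
    @lru_cache(maxsize=None)
    def best(node):
        if node not in edges:                 # no outgoing edge: path ends here
            return (0.0, (node,))
        return max((w + best(nxt)[0], (node,) + best(nxt)[1])
                   for nxt, w in edges[node])

    weight, path = best(source)
    return weight, list(path)
```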
5. Experiments and results

We tested the proposed method of ball detection in medium and long shots on a match played between Real Madrid and Manchester United in the UEFA Champions League 2003 (Match 1) and a match between Chelsea and Liverpool in the UEFA Champions League 2007 (Match 2). These matches were broadcast by Star Sports and Ten Sports, respectively, and were recorded from TV using a Pinnacle grabber card. The software has been implemented in Visual C++.

5.1. Results for medium shots

We tested the proposed method on 24 medium shots of Match 1 and 22 medium shots of Match 2. Match 1 was played under varying lighting conditions: the first half of the match was played without floodlights, while the second half was played under floodlights. A total of 338 frames were selected from within the medium shots. The ball was clearly visible in 266 frames, partially visible or occluded in 24 frames, and not visible in 48 frames. We also tested our approach on the frames where the ball was not visible, to verify that the algorithm can detect the absence of the ball. The ball was detected with an accuracy of 83.10%, while the absence of the ball was detected with an accuracy of 91.66%. Table 3 shows the results of our ball detection algorithm for medium shots. The second column of the table gives the results of locating the ball using only shape information (applying the Hough transform); when more than one circular region was obtained in a frame, the region with the greatest radius was taken as the ball candidate, and no motion information was used. The third column gives the results when both shape and motion information were used, which clearly shows the improvement obtained from the motion information.

5.2. Results for long shots
Fig. 11. Weighted graph for four consecutive frames.
To evaluate the performance of our algorithm, we carried out experiments with several long shots of Match 1 and Match 2. Fig. 12 shows the trajectory of the ball obtained after finding the longest path using dynamic programming for a sequence of frames in the video. Figs. 13 and 14 show the trajectory of a frame sequence from frame number 35,900 to frame number 37,000.
Table 3
Ball detection results for medium shots

                                    Without motion (no. of frames)   With motion (no. of frames)
Ball detected (clearly visible)     143 (49.31%)                     241 (83.10%)
Ball detected (partially visible)   0 (0%)                           0 (0%)
Absence of ball detected            44 (91.66%)                      44 (91.66%)
Fig. 14. Ball trajectory for a long shot in y-plane.
We consider all the frames in this sequence at an interval of 2 frames. The circle in the figure represents the ball location obtained by our algorithm, while the dot represents the actual ball position. This frame sequence is of 8 s duration and includes 2 ball passes. The ball is identified accurately even in frames where it appears very small due to the distance; it is not located accurately in eleven frames. Fig. 15 shows the ball detection results for this sequence of frames, with the detected ball marked by a black rectangle. Our assumption that the ball is circular in shape and moves with the camera failed in one frame, where the ball moved opposite to the direction of the camera. In another frame the ball is only partially visible, and so is not identified by the circle detection algorithm. In yet another frame the ball is not visible, but our algorithm identifies a candidate which occludes it.

Fig. 12. Ball trajectory obtained for a sequence of frames using the ball path trajectory algorithm.
Fig. 13. Ball trajectory for a long shot in x-plane.

The results of a few other long shots are given in Table 4. We use recall and precision as the performance measures to evaluate our algorithm. They are defined as:

Recall = (No. of frames where the ball is truly detected) / (Total no. of frames where the ball is truly present)
Precision = (No. of frames where the ball is truly detected) / (Total no. of frames)
We say a ball is truly present in a frame if it can be located independently of its preceding or successive frames. A ball is truly detected if the algorithm detects the center of the ball. Table 4 shows that our algorithm gives an average recall of 94.2% and an average precision of 90.72%. Table 5 shows the deviation of the x- and y-coordinates of the actual ball candidates in the frames where the ball was not located.

We next evaluate our algorithm on the video clips used by Liang et al. [8] in their experiments. There are two video sequences from two different soccer matches: the first clip has 650 frames, while the second contains 719 long view frames. The playfield of the first sequence is poorly lit; part of the field is illuminated by sunlight, while the rest lies in the shadow of the stadium. Our algorithm shows good results even under such playfield conditions. The ground truth was obtained by manually checking the presence of the ball in each frame; a ball is said to be present in a frame if it can be located independently of its preceding or successive frames, whether partially or fully visible. Table 6 shows the results of our ball detection algorithm on both clips. The ball is detected with a recall of 98.03% for the first sequence and 98.26% for the second. We observed that the ball is not detected in frames where it lies on the midfield line. In 22 frames the ball was detected even though it was only partially visible. The ball is missed by the ball candidate detection algorithm due to occlusion in a few intermediate frames; still, the trajectory obtained from the longest path of the graph gives the true ball trajectory, because the algorithm skips these intermediate frames. This is because the correlation between the actual ball candidates in the current frame and its next-to-next frame is higher than the correlation between the ball candidate of the current frame and the non-ball candidates in its next frame.

In Match 1 we test with a frame interval of 20 frames, while Match 2 is tested with a frame interval of 2 frames; for the video clips of Liang et al. [8], however, we test every frame. Processing each frame in a long shot to obtain the ball trajectory is quite time consuming.
Fig. 15. Ball regions detected for the trajectories in Figs. 13 and 14.
Hence, instead of processing each frame, we process frames at regular intervals. Experiments were performed with different frame intervals to check the robustness of the algorithm. These experiments clearly indicate that our ball detection algorithm is very efficient for broadcast soccer videos and gives good results over a wide range of playfield conditions. The ball is identified accurately even in frames where it appears very small due to the distance. It takes about 27.2 s to process each frame to track the ball.
Table 4
Results of ball detection in long shots

Match Id   Frame range      Duration of shot (s)   Total (no. of frames)   Ball identified in (no. of frames)   Ball present in (no. of frames)   Recall (%)   Precision (%)
Match 1    23,800–24,200    18                     20                      18                                   19                                94.73        90
Match 1    30,300–30,760    20                     23                      21                                   22                                95.45        91.30
Match 1    34,800–35,400    23                     30                      28                                   30                                93.33        93.33
Match 1    35,900–37,000    40                     55                      51                                   53                                96.23        92.73
Match 1    37,020–37,400    18                     19                      18                                   19                                94.74        94.74
Match 1    40,500–40,900    18                     20                      20                                   20                                100          100
Match 1    41,400–41,940    22                     27                      27                                   27                                100          100
Match 2    9730–9770        1.5                    20                      16                                   20                                80.0         80.0
Match 2    10,460–10,632    8                      86                      75                                   86                                87.20        87.20
Match 2    11,810–11,854    2                      44                      44                                   44                                100.0        100.0
Match 2    15,334–15,400    2.5                    33                      31                                   33                                93.94        93.94
Match 2    17,200–17,310    4.5                    55                      36                                   38                                94.74        65.45
Table 5
Deviation of ball locations

Frame number   Ground truth (x, y)   Experimental (x, y)   Δx   Δy
23,940         (180, 102)            (193, 94)             13   8
30,660         (225, 87)             (242, 98)             17   11
36,040         (180, 125)            (237, 138)            57   13
36,080         (160, 125)            (144, 192)            16   67
36,200         (175, 155)            (197, 140)            22   15
37,300         (103, 129)            (143, 106)            40   23
Table 6
Results of ball detection in long shots

Video sequence                         Total (no. of frames)   Ball present in (no. of frames)   Ball identified in (no. of frames)   Recall (%)   Precision (%)
Sequence 1 (our approach)              650                     609                               597                                  98.03        91.85
Sequence 1 (Liang et al.'s approach)   650                     600                               513                                  76.7         89.7
Sequence 2 (our approach)              719                     689                               677                                  98.26        95.83
Sequence 2 (Liang et al.'s approach)   719                     631                               598                                  84.5         89.1
6. Conclusion

In this paper, we have presented an approach to detect the ball in broadcast soccer videos. We use a motion based approach for detecting the ball in medium shots, while a trajectory based approach is used for long shots. In medium shots, non-ball candidates are filtered out based on motion information; in long shots, they are filtered out based on size, position, velocity and direction of motion. The novelty of this approach lies in the techniques used to remove non-ball candidates based on static and dynamic features, and in applying dynamic programming to find the ball trajectory in long shots. The experimental results show that this new approach is very effective even under poor playfield conditions, and our algorithm does not require prior preprocessing of the videos. Experimental results show an accuracy of 83.10% for medium shots and an average recall of 94.2% for ball detection in long shots.

Acknowledgments

This work has been supported by research grants from the Department of Science and Technology, India under Grant No. SR/S3/EECE/024/2003. We are thankful to Dawei Liang and Yang Liu of the School of Computer Science and Technology, Harbin Institute of Technology, China for providing their video data to us.

References

[1] M.H. Alsuwaiyel, Algorithms Design Techniques and Analysis, vol. 7, World Scientific Publishing Co. Pte. Ltd., Singapore, 1999, pp. 205–209.
[2] T. D'Orazio, N. Ancona, G. Cicirelli, M. Nitti, A ball detection algorithm for real soccer image sequences, Proceedings of the International Conference on Pattern Recognition 1 (2002) 210–213.
[3] T. D'Orazio, C. Guaragnella, M. Leo, A. Distante, A new algorithm for ball recognition using circle Hough transform and neural classifier, Pattern Recognition 37 (2004) 393–408.
[4] B.K.P. Horn, B.G. Schunck, Determining optical flow, Artificial Intelligence 17 (1981) 185–203.
[5] P.V.C. Hough, Methods and Means for Recognizing Complex Patterns, U.S. Patent 3,069,654, 1962.
[6] T. Kim, Y. Seo, K.S. Hong, Physics-based 3D position analysis of a soccer ball from monocular image sequences, Sixth International Conference on Computer Vision (1998) 721–726.
[7] M. Leo, T. D'Orazio, P. Spagnolo, A. Distante, Wavelet and ICA preprocessing for ball recognition in soccer images, ICGST International Journal on Graphics, Vision and Image Processing 1 (2005) 53–58.
[8] D. Liang, Y. Liu, Q. Huang, W. Gao, A scheme for ball detection and tracking in broadcast soccer video, Pacific Rim Conference on Multimedia, LNCS 3767 (2005) 864–875.
[9] V. Pallavi, J. Mukherjee, A.K. Majumdar, S. Sural, Shot classification in soccer videos, Proceedings of Recent Trends in Information Systems 1 (2006) 216–219.
[10] Y. Seo, S. Choi, H. Kim, K. Hong, Where are the ball and players? Soccer game analysis with color-based tracking and image mosaic, Proceedings of the Ninth International Conference on Image Analysis and Processing 2 (1997) 196–203.
[11] X. Tong, H. Lu, Q. Liu, An effective and fast soccer ball detection and tracking method, Proceedings of the International Conference on Pattern Recognition 4 (2004) 795–798.
[12] A. Yamada, Y. Shirai, J. Miura, Tracking players and a ball in video image sequence and estimating camera parameters for 3D interpretation of soccer games, Proceedings of the 16th International Conference on Pattern Recognition 1 (2002) 303–306.
[13] X. Yu, C. Xu, H.W. Leong, Q. Tian, K.W. Wan, Trajectory-based ball detection and tracking with applications to semantic analysis of broadcast soccer video, Proceedings of the ACM Conference on Multimedia (2003) 11–20.
[14] X. Yu, H.W. Leong, C. Xu, Q. Tian, Trajectory-based ball detection and tracking in broadcast soccer video, IEEE Transactions on Multimedia 8 (6) (2006) 1164–1178.