Multi-stage segmentation of optical flow field

Jae Gark Choi*, Seong-Dae Kim

Signal Processing 54 (1996) 109-118

Abstract

This paper presents an object-based segmentation of the optical flow field and its motion estimation method. A major difficulty in estimating general motion is that it requires a large area of support in order to achieve a good estimation. Unfortunately, when the supporting area is large, it is very likely to contain multiple moving objects. Thus, object-based segmentation and general motion estimation are interdependent problems. To solve this problem, we propose a multi-stage segmentation method which progressively groups the flow field into segments according to a hierarchy of motion models. The basic idea is to group subregions that are uniform with respect to a simple motion model into regions that are uniform with respect to a more complex model, using a region-growing technique. With this method, the flow field can be segmented into several parabolic patches. For head-and-shoulder images, simulation results show that it can segment the flow field into meaningful regions such as head and shoulder.


* Corresponding author. Address: Department of Electrical Engineering, KAIST, 373-1 Kusong-Dong, Yusung-Gu, Taejon 305-701, South Korea. E-mail: [email protected].

Keywords: Optical flow; Segmentation; Motion estimation; Motion parameter

1. Introduction

The segmentation of the optical flow field between successive frames finds a wide variety of applications such as robotics, navigation and image coding. In particular, object-based segmentation is essential in the areas of object tracking and object-based motion-compensated (MC) coding [8, 9]. Optical flow based segmentation has been investigated by many authors [1, 2, 15]. In [1], Adiv proposed an algorithm which uses a generalized Hough transform technique. Using the Hough transform, the flow field is divided into segments which are consistent with moving rigid objects with roughly planar surfaces. Then these segments are grouped under the hypothesis that they are induced by the same 3D motion parameters of rigid objects. In [15], k-means clustering is used for motion segmentation. Clustering in parameter space is difficult for high-order motion models, because the components of a parameter set differ in importance and this effect has to be taken into account. On the other hand, Chang et al. [2] proposed a general MAP (maximum a posteriori probability) formulation for segmentation of image sequences using flow field and gray level information. They used iterated conditional modes (ICM) for the optimization. However, these algorithms require an excessive amount of memory or a heavy computational load. Moreover, the performances of these algorithms are limited by the 8-parameter model describing the motion of a planar surface in 3D space. In this paper, we propose a new scheme that is based on a succession of region-growing processes obtained by relaxing constraints on the motion model.


The region-growing algorithm is much simpler than the Hough transform or ICM, and a succession of such processes makes it easy to extend the motion model to more complex models which can result in a better description of the motion in a sequence.

Object-based segmentation of flow fields necessitates a motion model to describe the 3D general motion of moving objects. The more complex the motion model is, the larger the size of the describable region will be. At the same time, the more complex the motion model is, the larger the area of support required to achieve a good estimation. Unfortunately, when the supporting region is large, it is very likely to contain multiple moving objects. This is one of the major difficulties in object-based segmentation and motion estimation [3, 16]. To solve the problem, we propose a multi-stage segmentation scheme which is based on three stages. The approach is to segment the flow field into regions corresponding to the moving objects according to a hierarchy of motion models. To do this, a hierarchy of motion models is defined. In the first stage, the flow field is partitioned into segments having similar motion vectors. Each segment has a 2D translational motion and is denoted as a 2D translational patch. In the second stage, 2D translational patches which are consistent with a 3D rigid motion of a roughly planar surface are merged into segments. We call such a segment a planar patch. In the third stage, planar patches are grouped into homogeneous regions which are generated by roughly parabolic rigid objects with 3D motion. We call such a homogeneous region a parabolic patch. Hence the flow field is compactly segmented into parabolic patches. The above principle applies particularly well to head-and-shoulder images. For instance, the face of a person in head-and-shoulder images can be roughly represented as a parabolic surface, and the parabolic surface can be approximated by a collection of roughly planar surfaces. The shoulder can be described similarly.


In this paper, we assume that the flow field is given in advance using one of the existing estimation techniques [7, 10-12]. However, the existing techniques have an inherent noise problem, especially near motion boundaries [5]. To alleviate the noise effect in segmentation, we use a small threshold value in the first stage and postpone the grouping of noisy flow vectors to a post-processing step which is performed after the third stage. The paper is organized as follows. Section 2 describes the hierarchy of motion models used in our proposed algorithm. The multi-stage segmentation algorithm is introduced in Section 3. In Section 4, simulation results are shown to examine the performance of the proposed algorithm. Section 5 includes conclusions.


2. A hierarchy of motion models

As we know, the flow field (u, v) induced by a rigid body motion under orthographic projection is [13]

u = T_X − y Ω_Z + Z Ω_Y,
v = T_Y + x Ω_Z − Z Ω_X,    (1)

where x and y are the image coordinates. The 3D motion is represented by a small rotation R = [Ω_X, Ω_Y, Ω_Z]^T followed by a translation T = [T_X, T_Y, T_Z]^T, and Z is the scene depth. Eq. (1) defines an instantaneous optical flow vector. Clearly, an optical flow vector is not only determined by the translation T and rotation R of a moving object but is also influenced by the surface and shape of the object through the depth information Z. Depending upon the depth Z (i.e., the surface of the moving object), a hierarchy of motion models for the multi-stage segmentation scheme can be defined as follows.

Level 1: 2D translational patch model. We assume that any motion, including the 3D motion of a rigid object, can be approximated by a 2D translational motion in a small local area. Hence the flow field of Eq. (1) can be described by the 2D translational patch model in a small local area. The model is a simple transformation as follows:

u(x, y) = a_1,
v(x, y) = b_1.    (2)

It assumes that the 2D translational motion is constant within the small patch.

Level 2: Planar patch model. The planar patch model describes a 3D general motion of a rigid planar surface under orthographic projection. The planar surface is defined by the equation

Z = p_1 X + p_2 Y + p_3,    (3)

where X, Y and Z are the object-space coordinates. Under orthographic projection, Eq. (3) can be rewritten in the image-space coordinate system as

Z = p_1 x + p_2 y + p_3.    (4)

Substituting Eq. (4) into Eq. (1), we have

u(x, y) = a_1 + a_2 x + a_3 y,
v(x, y) = b_1 + b_2 x + b_3 y.    (5)

The model is an affine transformation which contains six parameters. It is used to represent the motion of a planar surface, which is approximated by a mosaic of 2D translational patches.

Level 3: Parabolic patch model. The parabolic patch model describes a 3D general motion of a rigid parabolic surface under orthographic projection. The parabolic surface is characterized by

Z = q_1 X² + q_2 Y² + q_3 XY + q_4 X + q_5 Y + q_6.    (6)

Under orthographic projection, Eq. (6) can be rewritten in the image-space coordinate system as

Z = q_1 x² + q_2 y² + q_3 xy + q_4 x + q_5 y + q_6.    (7)

Substituting Eq. (7) into Eq. (1), we have

u(x, y) = a_1 + a_2 x + a_3 y + a_4 x² + a_5 y² + a_6 xy,
v(x, y) = b_1 + b_2 x + b_3 y + b_4 x² + b_5 y² + b_6 xy.    (8)

The model is a quadratic transformation which has 12 parameters. It is used to represent the motion of a parabolic surface, which is approximated by a collection of roughly planar surfaces.
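To make the hierarchy concrete, the following minimal NumPy sketch (an illustration, not from the paper) evaluates the three model levels at given pixel coordinates; the parameter vectors play the roles of (a_1, b_1) in Eq. (2), (a_1, ..., b_3) in Eq. (5) and (a_1, ..., b_6) in Eq. (8).

```python
import numpy as np

def flow_translational(xy, a1, b1):
    """Level 1 (Eq. 2): constant flow over the patch."""
    n = xy.shape[0]
    return np.full(n, float(a1)), np.full(n, float(b1))

def flow_affine(xy, a, b):
    """Level 2 (Eq. 5): a = (a1, a2, a3), b = (b1, b2, b3)."""
    x, y = xy[:, 0], xy[:, 1]
    basis = np.stack([np.ones_like(x), x, y], axis=1)            # [1, x, y]
    return basis @ np.asarray(a), basis @ np.asarray(b)

def flow_quadratic(xy, a, b):
    """Level 3 (Eq. 8): a and b each hold six parameters."""
    x, y = xy[:, 0], xy[:, 1]
    basis = np.stack([np.ones_like(x), x, y, x * x, y * y, x * y], axis=1)
    return basis @ np.asarray(a), basis @ np.asarray(b)
```

Each level only adds columns to the same coordinate basis, which is why the least-squares fitting used later carries over unchanged from the affine to the quadratic case.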

3. Multi-stage segmentation algorithm

In this section, we describe our multi-stage segmentation algorithm. Fig. 1 illustrates the whole process, which consists of two major operations: preliminary segmentation and main segmentation. Preliminary segmentation is a procedure that divides the flow field into changed regions and unchanged regions. The main segmentation progressively partitions the flow vectors of the changed regions into moving objects. It consists of three segmentation stages: (1) 2D translational patch segmentation (first stage); (2) planar patch segmentation (second stage); and (3) parabolic patch segmentation (third stage). They are described in detail in the following subsections.

Fig. 1. Structure of the multi-stage segmentation algorithm.

3.1. Preliminary segmentation

Optical flow fields generated by conventional estimation techniques [7, 10, 12] are inaccurate due to false matches, local minima of the cost function and noisy intensity values [5, 14]. In particular, the estimated flow vectors of the background regions are not perfectly zero. In order to force the flow vectors of the background to zero, we combine the flow field with the results of change detection. The change detection technique is a straightforward solution for separating moving foreground objects from the static background. Our change detection algorithm is similar to that given in [4]. By preliminary segmentation, the flow field is divided into changed regions and unchanged regions. The unchanged regions are supposed to belong to the static background; hence, their flow vectors are forced to zero. The isolated changed regions are supposed to be moving objects. The main segmentation is applied to the changed regions only.
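A minimal sketch of this preliminary step is given below; it is an illustration rather than the authors' implementation, and it substitutes plain frame differencing with a hypothetical threshold `tau` for the change detection method of [4].

```python
import numpy as np

def preliminary_segmentation(frame1, frame2, flow, tau=10.0):
    """Split the flow field into changed/unchanged regions and
    force background (unchanged) flow vectors to zero.

    frame1, frame2 : (H, W) grey-level images
    flow           : (H, W, 2) optical flow field (u, v)
    tau            : change-detection threshold (hypothetical value)
    """
    changed = np.abs(frame2.astype(float) - frame1.astype(float)) > tau
    flow_out = flow.copy()
    flow_out[~changed] = 0.0   # unchanged regions belong to the static background
    return flow_out, changed   # main segmentation runs only on `changed`
```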

3.2. The first segmentation stage

Neighboring pixels with similar flow vectors are grouped together to form 2D translational patches. Each segmented patch is homogeneous with respect to 2D translational motion. Here centroid-linkage region growing [6], which is simple and gives good segmentation results, is used for the segmentation. Both the direction and the magnitude of the flow vectors are taken into account by using the similarity measure M_i, defined by

M_i = |d_{x,r} − d_{x,i}| + |d_{y,r} − d_{y,i}|,    (9)

where d_{x,r} and d_{y,r} are the x and y components of the local representative flow vector of a patch, and d_{x,i} and d_{y,i} are the x and y components of the flow vector d_i at the pixel under consideration. If M_i is less than a given threshold, the pixel is assigned to the patch. To reduce grouping mistakes due to noisy flow vectors, we use a small threshold value, which is set to 1. As a result, noisy vectors are grouped into very small segments. In order to prevent their adverse effect on subsequent segmentation procedures, further grouping of such segments is postponed to the post-processing, which is done after the third stage. The post-processing is described in Section 3.4.
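The sketch below illustrates this first stage under assumptions the paper leaves open (raster-scan seeding, a running-mean representative vector per patch, 4-connected growth); it applies the measure of Eq. (9) with the threshold of 1.

```python
import numpy as np
from collections import deque

def first_stage(flow, changed, thresh=1.0):
    """Group neighbouring pixels with similar flow vectors into
    2D translational patches (centroid-linkage region growing)."""
    H, W = changed.shape
    labels = np.full((H, W), -1, dtype=int)
    next_label = 0
    for sy in range(H):
        for sx in range(W):
            if not changed[sy, sx] or labels[sy, sx] != -1:
                continue
            rep = flow[sy, sx].astype(float).copy()   # representative (centroid) vector
            members = 1
            labels[sy, sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < H and 0 <= nx < W and changed[ny, nx] and labels[ny, nx] == -1:
                        d = flow[ny, nx]
                        m = abs(rep[0] - d[0]) + abs(rep[1] - d[1])     # Eq. (9)
                        if m < thresh:
                            labels[ny, nx] = next_label
                            rep = (rep * members + d) / (members + 1)   # update centroid
                            members += 1
                            queue.append((ny, nx))
            next_label += 1
    return labels
```

Noisy flow vectors fail the threshold test and end up in very small patches, exactly the segments whose grouping is deferred to the post-processing.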


3.3. The region merging method

In this subsection, the region merging technique, which is a major tool in the second and third stages, is described. Fig. 2 depicts the flow chart of the merging method. It is a procedure that groups subregions into larger regions based on merging criteria. An initial seed region grows progressively by appending its neighboring regions which are consistent with a motion of the seed region. This is repeated recursively until no more merging exists around the seed region. If there is no more merging, we find a new seed region and repeat the above-mentioned procedure.

Fig. 2. Flow chart of the region merging algorithm.

The procedure can be described more precisely in the following way:
1. Select an initial seed region R_i for growing. The largest region is chosen among the regions which are not yet merged into any of the already created segments, because the most reliable motion parameters are obtained when the seed region has the largest homogeneous area of support.
2. Find the merging candidates {R_j : j = 1, ..., m} of the seed region R_i. Only neighboring regions which are not yet assigned to any of the already created segments are considered as candidates. Here a region R_j is defined as a neighbor of the seed region R_i if there are two pixels p_i and p_j such that p_i ∈ R_i, p_j ∈ R_j and p_i ∈ N_4(p_j), where N_4(p) is the 4-neighborhood of pixel p.
3. Compute an optimal transformation corresponding to the set R_i ∪ R_j: {a_l°, b_l° : l = 1, ..., d}, where d is 3 or 6 depending upon the motion model considered.
4. Calculate the criterion measure σ(R_j) for a merging decision of candidate R_j. σ(R_j) is the standard deviation of the flow vectors in R_j from the optimal solution.
5. Test the merging of the candidate R_j into the seed region R_i. This is done by using the error measure σ(R_j). If σ(R_j) is less than a given threshold, the candidate R_j is merged into the seed region R_i.
6. After testing all candidates, update the seed region and go to step 2 if any one of the candidates {R_j : j = 1, ..., m} has been merged into the seed region R_i.
7. If no more merging exists around the seed region, find a new seed region and repeat steps 1-6.
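In code form, steps 1-7 reduce to the loop sketched below; the helpers `fit_model`, `sigma` and `neighbours` are hypothetical stand-ins for the least-squares fit and error measure of Section 3.4 and for the 4-neighborhood test of step 2 (they are assumed to close over the flow field).

```python
def merge_regions(regions, fit_model, sigma, neighbours, thresh):
    """Grow seed regions by merging motion-consistent neighbours (steps 1-7).

    regions    : dict  label -> set of (y, x) pixels
    fit_model  : callable(pixels) -> model parameters (affine or quadratic)
    sigma      : callable(params, pixels) -> error measure sigma(R_j)
    neighbours : callable(pixels, assigned) -> labels of unassigned regions
                 adjacent to the given pixel set
    """
    assigned = set()                  # labels already absorbed into a segment
    segments = []
    while True:
        free = [r for r in regions if r not in assigned]
        if not free:
            break
        seed = max(free, key=lambda r: len(regions[r]))   # step 1: largest free region
        seed_pixels = set(regions[seed])
        assigned.add(seed)
        merged = True
        while merged:                 # steps 2-6: grow until no candidate merges
            merged = False
            for cand in neighbours(seed_pixels, assigned):            # step 2
                params = fit_model(seed_pixels | regions[cand])       # step 3
                if sigma(params, regions[cand]) < thresh:             # steps 4-5
                    seed_pixels |= regions[cand]
                    assigned.add(cand)
                    merged = True
        segments.append(seed_pixels)  # step 7: start over with a new seed
    return segments
```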


3.4. The second and third segmentation stages

The second stage is to group 2D translational patches which are consistent with a rigid motion of a roughly planar surface into segments. The resulting segments are the planar patches. The flow vectors of a planar patch can be described by the planar patch model, which is an affine transformation. As previously mentioned, the planar patch model describes the 3D general motion of a rigid planar surface under orthographic projection. In order to group 2D translational patches into planar patches, we use the region merging technique described in Subsection 3.3. If a seed region R_i and a candidate R_j are consistent with the same affine transformation, they are merged together to create a planar patch. Consistency with an affine transformation is detected by computing optimal motion parameters and a related error measure as follows.

Computing an optimal affine transformation. Given the set of n flow vectors of R_i and R_j, we wish to compute, employing the least-squares criterion, the optimal affine transformation corresponding to this set. The error function to be minimized is

E(a_1, a_2, a_3, b_1, b_2, b_3) = Σ_{k=1}^{n} [ (u(x_k, y_k) − a_1 − a_2 x_k − a_3 y_k)² + (v(x_k, y_k) − b_1 − b_2 x_k − b_3 y_k)² ],    (10)

where the index k runs over the n flow vectors of the set R_i ∪ R_j. Taking partial derivatives with respect to a_1, ..., b_3 and equating them to 0, a set of six linear equations is obtained. If these equations are independent, their solution, denoted by a_1°, ..., b_3°, represents the optimal affine transformation.

Error measure for a merging decision. Substituting the optimal solution and the flow vectors contained in only the candidate R_j into Eq. (10) and using a normalization, a new error measure σ(R_j) is obtained for the candidate R_j:

σ(R_j) = [ E_{R_j}(a_1°, ..., b_3°) / n_j ]^{1/2},    (11)

where E_{R_j} denotes the error of Eq. (10) evaluated over the n_j flow vectors of R_j only.

σ(R_j) is an estimate of the standard deviation of the flow vectors of R_j from those predicted by the optimal transformation over the set R_i ∪ R_j. If σ(R_j) is less than a given threshold, the candidate R_j is merged into the seed region R_i. The threshold is empirically determined to be 1.2, which is slightly greater than the threshold of the first stage.
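A NumPy sketch of this computation is given below; it is an illustration, not the authors' code, and it lets `np.linalg.lstsq` solve the six normal equations implicitly. Fitting u and v separately minimizes exactly the sum in Eq. (10), since the a- and b-parameters are decoupled; the same routine with the quadratic basis of Eq. (8) serves the third stage.

```python
import numpy as np

def fit_affine(xy, uv):
    """Least-squares affine motion parameters (Eq. (10)) for the
    flow vectors uv at coordinates xy over R_i U R_j."""
    x, y = xy[:, 0], xy[:, 1]
    A = np.stack([np.ones_like(x), x, y], axis=1)       # [1, x, y]
    a, *_ = np.linalg.lstsq(A, uv[:, 0], rcond=None)    # a1, a2, a3
    b, *_ = np.linalg.lstsq(A, uv[:, 1], rcond=None)    # b1, b2, b3
    return a, b

def sigma_candidate(a, b, xy_j, uv_j):
    """Error measure of Eq. (11): RMS deviation of the candidate's
    flow vectors from the fitted transformation."""
    x, y = xy_j[:, 0], xy_j[:, 1]
    A = np.stack([np.ones_like(x), x, y], axis=1)
    res = (uv_j[:, 0] - A @ a) ** 2 + (uv_j[:, 1] - A @ b) ** 2
    return np.sqrt(res.mean())

# merge the candidate if sigma_candidate(...) < 1.2 (second-stage threshold)
```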

In the third stage, planar patches which are consistent with the same quadratic transformation are merged into parabolic patches. The flow vectors of a parabolic patch can be described by the parabolic patch model, which is a quadratic transformation. The stage is similar to the second stage except that a quadratic transformation is used instead of an affine transformation. After the third stage, the noisy small segments whose grouping was postponed still remain unassigned to any parabolic patch. These segments should therefore be merged into one of their neighboring patches. For each such segment, consistency with the quadratic transformations of the neighboring parabolic patches is checked, and the segment is then assimilated into the maximally consistent patch.
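A sketch of this post-processing is shown below; the helper callables (`fit_quadratic`, `sigma`, `neighbours`) are hypothetical names mirroring the routines above, not identifiers from the paper, and are assumed to close over the flow field.

```python
def assimilate_noisy_segments(noisy_segments, parabolic_patches,
                              fit_quadratic, sigma, neighbours):
    """Post-processing: merge each postponed noisy segment into the
    neighbouring parabolic patch it is maximally consistent with.

    noisy_segments, parabolic_patches : dict label -> set of (y, x) pixels
    fit_quadratic : callable(pixels) -> 12 quadratic parameters (Eq. (8))
    sigma         : callable(params, pixels) -> error measure (Eq. (11))
    neighbours    : callable(segment_pixels, patches) -> adjacent patch labels
    """
    for lab, pixels in noisy_segments.items():
        cands = neighbours(pixels, parabolic_patches)
        if not cands:
            continue
        # pick the patch whose quadratic motion predicts this segment best
        best = min(cands,
                   key=lambda c: sigma(fit_quadratic(parabolic_patches[c]), pixels))
        parabolic_patches[best] |= pixels
    return parabolic_patches
```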

4. Simulation results

In this section, the performance of the proposed method is examined with both synthetic and real images. The segmentation results for a synthetic flow field are shown in Fig. 3. Fig. 3(a) is a synthetic flow field which contains three complex motions such as translation and rotation of 3D surfaces.

Fig. 3. Simulation results of synthetic flow field: (a) synthetic flow field; (b) first segmentation result; (c) second segmentation result; (d) third segmentation result.


Fig. 4. Two images from ‘Claire’ sequence and optical flow field: (a) first image; (b) second image; and (c) flow field between (a) and (b).

The results of the three stages of the proposed method, shown in Fig. 3(b)-(d), demonstrate the role and importance of each of these stages. The flow field is divided into 2D translational patches and planar patches in Figs. 3(b) and (c), respectively. At the third stage, as can be seen in Fig. 3(d), the proposed scheme successfully divides the field into three segments. A real image sequence is used in the second simulation. Figs. 4(a) and (b) are two frames of the head-and-shoulder image sequence "Claire" in CIF (352 pixels x 288 lines) format. Fig. 4(c) shows the optical flow field between Figs. 4(a) and (b), which is generated by the method in [10]. We compare the proposed method with the conventional method of Adiv [1].

Fig. 5 shows the segmentation results of both Adiv's method and the proposed method. The results of the three stages of the proposed scheme are shown in Fig. 5(a)-(c); the segments of the three stages correspond to 2D translational patches, planar patches and parabolic patches, respectively. Fig. 5(d) is the segmentation result of Adiv's method. We see that the changed region is divided into three segments in the third stage of the proposed method, and into seven segments by Adiv's method. In particular, a person's face in head-and-shoulder images is segmented as one object. When the motion segmentation is utilized for coding applications such as object-based coding, a small number of segments with no increase of the MC error is desirable, because it reduces the data rate for the transmission of motion and contour parameters. Fig. 6 shows the predicted frame produced by object-based motion compensation and the motion-failure (MF) parts.


Fig. 5. Segmentation results: (a) first segmentation of proposed method; (b) second segmentation of proposed method; (c) third segmentation of proposed method; and (d) segmentation result of Adiv's method.

Table 1
Simulation results for the "Claire" sequence

               Adiv's method    Proposed method
PSNR (dB)      35.63            36.41
MF (%)         1.06             0.72
Time (s)       399              197

MF parts are detected by thresholding the absolute difference between the original image and the predicted image. In the experiment, the threshold is set to 15. The exact motion boundary in such a real sequence is not known, and hence we should compare the segmentation performance by indirect measures. The PSNR of the predicted frame and the ratio of MF parts over the entire image can be regarded as indirect measures, since they are related to the correctness of the estimated motion parameters. Table 1 depicts the simulation results for ten frames of the "Claire" sequence. The average PSNR of Adiv's method is 35.63 dB while that of the proposed scheme is 36.41 dB, a 0.8 dB improvement in the predicted frame. As also shown in Table 1, our method produces smaller MF parts than the conventional one. In addition, the average simulation time on a SUN SPARCstation 2 is also measured. We see that the proposed method needs a much shorter processing time than Adiv's method. It should be noted that the memory burden, which is one of the main defects of Adiv's method, is no longer troublesome in the proposed method.
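Both indirect measures are straightforward to compute; the sketch below assumes 8-bit images (peak value 255) and uses the MF threshold of 15 quoted above.

```python
import numpy as np

def psnr(original, predicted):
    """Peak signal-to-noise ratio of the motion-compensated prediction (dB)."""
    mse = np.mean((original.astype(float) - predicted.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def mf_ratio(original, predicted, thresh=15):
    """Fraction of motion-failure pixels: |original - predicted| > thresh."""
    diff = np.abs(original.astype(float) - predicted.astype(float))
    return np.mean(diff > thresh)
```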

Fig. 6. Motion compensated frames and MF parts: (a) predicted frame by the proposed method; (b) predicted frame by Adiv's method; (c) MF parts by the proposed method; and (d) MF parts by Adiv's method.


5. Conclusions

We have proposed an object-based segmentation method using a hierarchy of motion models. The method is based on a multi-stage segmentation that consists of three stages: (1) 2D translational patch segmentation; (2) planar patch segmentation; and (3) parabolic patch segmentation. Simulation results for head-and-shoulder images show that it produces meaningful segmentation results. Further research will focus on the application of the proposed segmentation method to object-based image coding. Since the efficiency of object-oriented coding depends on the size of the objects, a high coding efficiency can be expected.

References

[1] G. Adiv, "Determining three-dimensional motion and structure from optical flow generated by several moving objects", IEEE Trans. Pattern Anal. Machine Intell., Vol. 7, No. 4, July 1985, pp. 384-401.
[2] M.M. Chang, A.M. Tekalp and M.I. Sezan, "Motion-field segmentation using an adaptive MAP criterion", Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., Minneapolis, MN, 27-30 April 1993, pp. V33-V40.
[3] J.G. Choi, S.W. Lee and S.D. Kim, "Segmentation and motion estimation of moving objects for object-oriented coding", Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., Detroit, MI, 9-12 May 1995, pp. 2431-2434.
[4] N. Diehl, "Object-oriented motion estimation and segmentation in image sequences", Signal Processing: Image Communication, Vol. 3, No. 1, February 1991, pp. 23-56.
[5] F. Dufaux and F. Moscheni, "Motion estimation techniques for digital TV: A review and a new contribution", Proc. IEEE, Vol. 83, No. 6, June 1995, pp. 858-876.
[6] R.M. Haralick and L.G. Shapiro, Computer and Robot Vision, Vol. 1, Addison-Wesley, Reading, MA, 1992, Chapter 10, pp. 532-535.
[7] B.K.P. Horn and B.G. Schunck, "Determining optical flow", Artificial Intell., Vol. 17, 1981, pp. 185-203.
[8] M. Hotter and R. Thoma, "Image segmentation based on object oriented mapping parameter estimation", Signal Processing, Vol. 15, No. 3, October 1988, pp. 315-334.
[9] C. Labit and H. Nicolas, "Compact motion representation based on global features for semantic image sequence coding", Proc. Visual Commun. Image Process., Boston, MA, 11-13 November 1991, pp. 697-708.
[10] J.-H. Lee and S.-D. Kim, "Velocity field estimation using a weighted local optimization", IEICE Trans. Fundam., Vol. E76-A, No. 4, April 1993, pp. 661-663.
[11] J.-H. Lee and S.-D. Kim, "An error analysis of gradient-based methods", Signal Processing, Vol. 35, No. 2, January 1994, pp. 157-162.
[12] H.H. Nagel, "Displacement vectors derived from second-order intensity variation in image sequences", Comput. Vision Graphics Image Process., Vol. 21, 1983, pp. 85-117.
[13] G. Tziritas, "Recursive and/or iterative estimation of the two-dimensional velocity field and reconstruction of three-dimensional motion", Signal Processing, Vol. 16, No. 1, January 1989, pp. 53-72.
[14] S. Ullman, "Analysis of visual motion by biological and computer systems", IEEE Comput., Vol. 14, August 1981, pp. 57-69.
[15] J.Y.A. Wang and E.H. Adelson, "Representing moving images with layers", IEEE Trans. Image Process., Vol. 3, No. 5, September 1994, pp. 625-638.
[16] S.F. Wu and J. Kittler, "General motion estimation and segmentation", Proc. Visual Commun. Image Process., Lausanne, Switzerland, 1-4 October 1990, pp. 1198-1209.