Signal Processing: Image Communication 20 (2005) 582–594 www.elsevier.com/locate/image
Estimation of orientation tensors for simple signals by means of second-order filters

Klas Nordberg^a, Gunnar Farnebäck^b

^a Computer Vision Laboratory, Department of Electrical Engineering, Linköping University, Linköping, Sweden
^b Laboratory of Mathematics in Imaging, Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, MA, USA
Received 8 March 2005; accepted 14 March 2005
Abstract

Tensors have become a popular tool for representation of local orientation and can also be used for estimation of velocity. A number of computational approaches to tensor estimation have been presented which, however, are difficult to analyze or compare since there has been no common framework in which such analysis or comparisons can be made. In this article, we propose such a framework based on second-order filters and show how it applies to three different methods for tensor estimation. The framework contains a few conditions on the filters which are sufficient to obtain correctly oriented rank one tensors for the case of simple signals. It also allows the derivation of explicit expressions for the variation of the tensor across oriented structures which, e.g., can be used to formulate conditions for phase invariance.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Orientation estimation; Tensors; Second-order filters; Volterra series
1. Introduction

The orientation or structure tensor is a popular tool for low-level image processing and computer vision, where it has found two main applications: representation of local orientation in an n-dimensional Euclidean space and estimation of local motion from image sequences. In the first case, it
typically acts as a feature descriptor of edges, lines or planes in 2D or 3D image data, and a basic assumption is often that the local signal $g(\mathbf{x})$ is simple¹ (or intrinsically 1D):

$g(\mathbf{x}) = g'(\mathbf{x}^T \hat{m})$.   (1)

¹ A prime sign $'$ will in the following be used to denote one-variable versions of multi-variable functions.
Notice that this implies that $g$ is constant along any direction perpendicular to $\hat{m}$ and that its Fourier transform $G(\mathbf{u})$ vanishes everywhere
except on the line $\mathbf{u} = t\hat{m}$. Any practical implementation of the orientation tensor assumes a basis for the Euclidean space such that it can be represented in terms of a symmetric $n \times n$ matrix. In the following, we will not make any distinction between the tensor and its matrix representation. For the case of a simple signal, the corresponding tensor representation is

$T = \lambda\,\hat{m}\hat{m}^T, \quad \lambda \geq 0$.   (2)
From this equation it follows that $T$ is also positive semidefinite and at most of rank one in the case of simple signals.² The representation can be extended to local signals which are concentrated to a $k$-dimensional subspace in the Fourier domain, spanned by the orthonormal basis $\hat{m}_1, \ldots, \hat{m}_k$ (see for example [6]), and the corresponding tensor is then given by

$T = \lambda_1\,\hat{m}_1\hat{m}_1^T + \cdots + \lambda_k\,\hat{m}_k\hat{m}_k^T, \quad \lambda_1, \ldots, \lambda_k \geq 0$,   (3)
i.e., $T$ is now of rank at most $k$ ($k \leq n$). Applications of orientation tensors for Euclidean spaces include image enhancement [6], corner detection [7], curvature estimation [15], and segmentation [17].

² We are here using rank to indicate the number of eigenvalues which are sufficiently large not to be neglected.

In the case of motion estimation, the brightness constancy constraint implies that the local signal $g$ is constant (at least) along any line parallel to the local spatio-temporal motion vector $\tilde{v} = (v_1, v_2, 1)^T$. Consequently, in the Fourier domain $\tilde{v}$ is always perpendicular to the subspace where $G$ is non-zero. Given the previous interpretation of the tensor for general signals, $\tilde{v}$ is then perpendicular to all $\hat{m}_1, \ldots, \hat{m}_k$, from which it follows that

$T\tilde{v} = 0$.   (4)
This means that $T$, in this application, always has rank at most two, and if the rank is two then we can uniquely determine the corresponding local motion. However, if the moving local 2D signal is itself a simple signal, e.g., a line or an edge, then the corresponding local 3D signal $g$ is
also simple, Eq. (1). Consequently, $G(\mathbf{u})$ vanishes except on the line $\mathbf{u} = t\hat{m}$ and $T$ has rank one, Eq. (2). This implies that the motion cannot be uniquely determined from Eq. (4) alone, which is consistent with the fact that, in this case, the motion component parallel to the edge cannot be determined at the local scale (a situation often referred to as the aperture problem). This can be solved by integrating $T$ spatially over regions which are sufficiently large to contain non-simple signals, which leads to a resulting tensor of rank two and, again, the motion can be determined from Eq. (4).

In this article, we will look at the issue of estimating the orientation tensor. To date, a number of estimation methods have been presented, each one originating from different problem formulations, computational frameworks, or ideas about which aspects of the processing are important. As a consequence, the resulting methods appear rather different in terms of the local operations on the signal that are used to obtain the corresponding tensor.

In Section 2, we give a short review of three different methods for estimating local orientation tensors. The main contribution of this article is the formulation of a general framework for the computation of orientation tensors, Section 3. The proposed framework represents the computation of the tensor in terms of a second-order mapping on the local signal and, using the assumption that the local signal is simple, Eq. (1), we derive a set of sufficient conditions on the mapping which assure that the resulting tensor can be written as Eq. (2). In Section 4, we show that the three methods presented in Section 2 fit the proposed framework, i.e., despite their distinct implementations they can all be seen as different instantiations of one single approach. In Section 5, a simple tool for analysis or comparison of computational methods is derived in terms of a function which describes how $T$ varies across an oriented structure. This function can be used, e.g., for analyzing the response to an impulse line or a step edge, or for understanding the phase invariance properties of a method. Finally, in Section 6 we summarize and discuss various implications of the framework.
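To make the role of Eq. (4) concrete, the following NumPy sketch (not part of the original article; all names are illustrative) recovers the motion vector from a rank-two spatio-temporal tensor as the eigenvector of the smallest eigenvalue, normalized so that its temporal component equals one.

```python
import numpy as np

def velocity_from_tensor(T, eps=1e-9):
    """Recover (v1, v2) from a 3x3 spatio-temporal orientation tensor T.

    Assumes T has (numerical) rank two so that its null space is spanned by
    v~ = (v1, v2, 1)^T, cf. Eq. (4). Returns None in the aperture-problem case.
    """
    w, V = np.linalg.eigh(T)           # eigenvalues in ascending order
    v_tilde = V[:, 0]                  # eigenvector of the smallest eigenvalue
    if abs(v_tilde[2]) < eps:
        return None
    return (v_tilde / v_tilde[2])[:2]  # normalize the temporal component to 1

# Build a rank-two tensor whose null space is v~ = (1.0, -0.5, 1)^T.
v_tilde = np.array([1.0, -0.5, 1.0])
basis, _ = np.linalg.qr(np.column_stack([v_tilde, np.eye(3)]))  # orthonormalize
m1, m2 = basis[:, 1], basis[:, 2]      # two directions orthogonal to v~
T = 2.0 * np.outer(m1, m1) + 1.0 * np.outer(m2, m2)

print(velocity_from_tensor(T))         # approximately [ 1.  -0.5]
```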
2. Methods for estimation of orientation tensors

In this section, we will briefly present three different methods for estimating tensor-based descriptors of local orientation. For detailed presentations of these methods, see [8,6,3]. It should be mentioned that other methods for computing orientation tensors exist in the literature. The selection made here is based on presenting methods which have relatively different computational approaches and can be implemented in a straightforward way for arbitrary signal dimensionalities.

In the following derivations, the signal $g$ is an $n$-variable function which is assumed to be real-valued. The latter implies, e.g., that its Fourier transform $G$ is Hermitian, i.e., $G(-\mathbf{u}) = G^*(\mathbf{u})$. Furthermore, all integrals without explicit limits are taken over the entire $\mathbb{R}^n$.

2.1. Using outer products of the gradient

In [1] the problem of finding the dominant orientation in an $n$-dimensional local region represented by the function $g$ is formulated as a total least squares problem, where the dominant orientation is defined by a line through the origin of the Fourier domain, with direction vector $\hat{m}$, such that the total squared distance from any point to the line, integrated over the entire Fourier domain and weighted by the spectrum, is minimized. A solution is given by computing a matrix $T$ which in the Fourier domain can be formulated as

$T = \int \mathbf{u}\mathbf{u}^T\, W(\mathbf{u})\, |G(\mathbf{u})|^2\, d\mathbf{u}$,   (5)

where $\mathbf{u}$ is the $n$-dimensional frequency variable; the solution is then given by the eigenvector corresponding to the largest eigenvalue of $T$. $W$ is a real-valued weighting function which is used to reduce the influence of higher frequencies. To make the magnitude of $T$ rotation invariant, $W$ is assumed to be rotationally symmetric, $W(t\hat{m}) = W'(t)$, typically a Gaussian. To simplify the subsequent derivations, Eq. (5) can be rewritten as a double integral,

$T = \iint \delta(\mathbf{u} - \mathbf{v})\, \mathbf{u}\mathbf{v}^T\, G(\mathbf{u})\, G(-\mathbf{v})\, W(\mathbf{u})\, W(\mathbf{v})\, d\mathbf{u}\, d\mathbf{v} = -\iint \delta(\mathbf{u} + \mathbf{v})\, \mathbf{u}\mathbf{v}^T\, G(\mathbf{u})\, G(\mathbf{v})\, W(\mathbf{u})\, W(\mathbf{v})\, d\mathbf{u}\, d\mathbf{v}$.   (6)
For reasons which will be apparent shortly, the impulse function $\delta$ is approximated by a function $P$, which gives

$T = -\iint P(\mathbf{u} + \mathbf{v})\, \mathbf{u}\mathbf{v}^T\, G(\mathbf{u})\, G(\mathbf{v})\, W(\mathbf{u})\, W(\mathbf{v})\, d\mathbf{u}\, d\mathbf{v}$.   (7)

Under the assumption that $P$ is symmetric, Section A.1 shows that a spatial representation of this expression is given by

$T = (2\pi)^{2n} \int p(\mathbf{x})\, [\nabla(g * w)(\mathbf{x})]\, [\nabla(g * w)(\mathbf{x})]^T\, d\mathbf{x}$,   (8)

where $p$ and $w$ are the inverse Fourier transforms of $P$ and $W$, respectively, and $\nabla(g * w)$ is the local gradient of $g * w$. From this expression it is clear that $W$ serves as a regularizing function in the computation of the gradient of $g$. Furthermore, $P = \delta$ implies that the integration of the outer product of this gradient with itself is done over the entire signal. To obtain a localized $T$ we can, e.g., choose $P$ as a normalized Gaussian of suitable standard deviation $\sigma_p$. The two functions $P$ and $W$ are then typically chosen as

$P(\mathbf{u}) = \dfrac{1}{(\sqrt{2\pi}\,\sigma_p)^n}\, e^{-|\mathbf{u}|^2/2\sigma_p^2}, \qquad W(\mathbf{u}) = e^{-|\mathbf{u}|^2/2\sigma_w^2}$.   (9)
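As an illustration of Eq. (8) with the Gaussian choices in Eq. (9), the following NumPy/SciPy sketch (not from the original article) computes the tensor field for a 2D image; sigma_w and sigma_p here are spatial standard deviations, i.e., reciprocal to the Fourier-domain widths $\sigma_w$ and $\sigma_p$, and constant factors are ignored.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gradient_tensor_field(g, sigma_w=1.0, sigma_p=2.0):
    """Pointwise orientation tensor T(x) from outer products of gradients, cf. Eq. (8).

    sigma_w: std. dev. of the regularizing (derivative) Gaussian w,
    sigma_p: std. dev. of the averaging window p. Constant factors are ignored.
    """
    # Regularized gradient of g, i.e. the gradient of g * w.
    gy = gaussian_filter(g, sigma_w, order=(1, 0))
    gx = gaussian_filter(g, sigma_w, order=(0, 1))
    # Outer product of the gradient with itself, averaged with the window p.
    T = np.empty(g.shape + (2, 2))
    T[..., 0, 0] = gaussian_filter(gx * gx, sigma_p)
    T[..., 0, 1] = T[..., 1, 0] = gaussian_filter(gx * gy, sigma_p)
    T[..., 1, 1] = gaussian_filter(gy * gy, sigma_p)
    return T

# Simple signal g(x) = g'(x^T m) with m at 30 degrees: T should be close to rank one.
y, x = np.mgrid[0:64, 0:64]
m = np.array([np.cos(np.pi / 6), np.sin(np.pi / 6)])
g = np.cos(0.5 * (x * m[0] + y * m[1]))
print(np.linalg.eigvalsh(gradient_tensor_field(g)[32, 32]))
# one eigenvalue close to zero, one clearly positive
```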
However, the more localized $T$ is, i.e., the larger we choose $\sigma_p$, the more $T$ will deviate from the original formulation in Eq. (5). Notice that when the local signal $g$ is simple, Eq. (1), it follows immediately that $T$ is given by Eq. (2). The above formulation of the orientation tensor, Eq. (8), can also be derived by solving an optimization problem in the spatial domain, see [5]. Compare also with the formulation of the Harris corner detector [7].

2.2. Quadrature filters

The issue of ensuring phase invariance of the resulting tensor is addressed in [10], which derives the representation of local orientation of simple signals according to Eq. (2) from certain desired properties of the representation. A set of
quadrature filters is then defined in the Fourier domain according to

$F_k(\mathbf{u}) = R(u)\, \rho^2(\hat{u}^T\hat{n}_k)$,   (10)

$\rho(z) = z\,\mathrm{step}(z) = \dfrac{z + |z|}{2}, \qquad \mathbf{u} = u\hat{u}, \quad u = \|\mathbf{u}\|$,   (11)
where $\hat{n}_k$ are filter direction vectors, one for each filter. $R$ is a real-valued radial weighting function which should be chosen so that the resulting filters have a reasonable localization in the signal domain and are tuned to the signal's scale, e.g., as the lognormal function proposed in [10]. Notice that $F_k(\mathbf{u}) = 0$ when $\hat{u}^T\hat{n}_k < 0$, which means that the filters as well as the filter responses are complex valued in the signal domain. For a simple signal, Eq. (1), the magnitudes of the corresponding filter responses are given by

$|q_k| = \lambda\,(\hat{m}^T\hat{n}_k)^2 = \mathrm{trace}[\hat{N}_k^T T] = \langle T \,|\, \hat{N}_k \rangle$,   (12)
where $\hat{N}_k = \hat{n}_k\hat{n}_k^T$ and the expression in the right-hand side defines a scalar product on tensors. By using the magnitude of $q_k$, phase invariance is ensured for the case of single frequency signals. Following the presentation in [6], $T$ is then computed at point $\mathbf{x}$ as

$T(\mathbf{x}) = \sum_k |q_k(\mathbf{x})|\, \tilde{N}_k$,   (13)
where $q_k(\mathbf{x})$ are the responses from filter $F_k$ at point $\mathbf{x}$. $\tilde{N}_k$ is a symmetric tensor that is dual relative to $\hat{N}_k$, i.e.,

$\langle \tilde{N}_k \,|\, \hat{N}_l \rangle = \mathrm{trace}[\tilde{N}_k^T \hat{N}_l] = \delta_{kl}$.   (14)
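A minimal 2D sketch of Eqs. (10)-(14), assuming three filter directions and a lognormal-like radial function; it is not the implementation of [10,6], and the parameter choices are illustrative only.

```python
import numpy as np

def quadrature_tensor_field(g, R):
    """Orientation tensor field from quadrature filters, cf. Eqs. (10)-(14), 2D case.

    g: real 2D image; R: callable radial weighting function R(u), u = |u|.
    """
    rows, cols = g.shape
    uy = np.fft.fftfreq(rows) * 2 * np.pi
    ux = np.fft.fftfreq(cols) * 2 * np.pi
    U = np.stack(np.meshgrid(ux, uy), axis=-1)           # frequency vectors
    u_norm = np.linalg.norm(U, axis=-1)
    u_hat = U / np.where(u_norm == 0, 1, u_norm)[..., None]

    angles = [0, np.pi / 3, 2 * np.pi / 3]                # three filter directions (2D)
    n_hats = [np.array([np.cos(a), np.sin(a)]) for a in angles]

    # Dual tensors N~_k of N^_k = n_k n_k^T under <A|B> = trace(A^T B), Eq. (14).
    N_hat = [np.outer(n, n) for n in n_hats]
    Gram = np.array([[np.trace(A.T @ B) for B in N_hat] for A in N_hat])
    N_tilde = [sum(c * N_hat[j] for j, c in enumerate(row))
               for row in np.linalg.inv(Gram)]

    Gf = np.fft.fft2(g)
    T = np.zeros(g.shape + (2, 2))
    for n, Nt in zip(n_hats, N_tilde):
        Fk = R(u_norm) * np.maximum(u_hat @ n, 0) ** 2    # rho^2(u^ . n^), Eq. (10)
        qk = np.fft.ifft2(Gf * Fk)                        # complex filter response
        T += np.abs(qk)[..., None, None] * Nt             # Eq. (13)
    return T

# Usage sketch: a lognormal-like radial function tuned to angular frequency ~0.5.
R = lambda u: np.exp(-4 * np.log(np.maximum(u, 1e-6) / 0.5) ** 2)
```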
The number of filters in the set has to be equal to the number of independent elements of a symmetric $n \times n$ matrix, i.e., 3 for the 2D case and 6 for the 3D case.

2.3. Local polynomial expansion

A third method for estimating orientation tensors is presented in [2,3]. It is based on local polynomial expansions of the signal. At each point the signal is approximated by

$g_{\mathrm{model}}(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} + \mathbf{x}^T \mathbf{b} + c$,   (15)
where $\mathbf{x}$ are local coordinates and the expansion coefficients $A$, $\mathbf{b}$, and $c$ are determined by minimizing the weighted norm $\|g_{\mathrm{model}} - g\|^2_{w_a}$. The solution can be computed by convolution with the dual basis functions relative to the monomial basis functions in the signal model. The weighting function $w_a$ is called the applicability function and there is a certain degree of freedom in how to choose it. However, to avoid a directional bias for non-simple signals it needs to be isotropic. If it is also Cartesian separable, and therefore Gaussian, all filtering operations can be performed by filters which are Cartesian separable in the signal domain, which makes the computations very efficient. In the continuous case, with $w_a$ a normalized Gaussian with standard deviation $\sigma$, the filters can be computed explicitly and we get closed form expressions for $A$ and $\mathbf{b}$ as convolution responses,

$A(\mathbf{x}) = \left( \dfrac{1}{2\sigma^4}\, \mathbf{x}\mathbf{x}^T w_a(\mathbf{x}) - \dfrac{1}{2\sigma^2}\, I\, w_a(\mathbf{x}) \right) * g(\mathbf{x})$,   (16)

$\mathbf{b}(\mathbf{x}) = \dfrac{1}{\sigma^2}\, \mathbf{x}\, w_a(\mathbf{x}) * g(\mathbf{x})$.   (17)
In this specific case $\mathbf{b}$ and $A$ coincide with Gaussian derivatives and second derivatives, respectively. It should also be mentioned that the polynomial expansion is derived from a technique called normalized convolution [12,3], which also provides support for processing of uncertain or missing data by including certainty about the signal values in the weighting function.

From the polynomial expansion an orientation tensor can be computed as

$T = AA^T + \gamma\,\mathbf{b}\mathbf{b}^T$,   (18)

where $\gamma \geq 0$ is a parameter which controls the relative weight of the even and odd parts of the local signal. In Section 5.2 it is discussed how to make $T$ phase invariant for specific signal frequencies by tuning $\gamma$. For the simple signal case, Eq. (1), and with a sufficiently symmetric $w_a$, $A$ has at most rank one and is directed consistently with $\mathbf{b}$, so that $T$ has the form given by Eq. (2).
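The continuous Gaussian case lends itself to a compact sketch: since $A$ and $\mathbf{b}$ then coincide with Gaussian second- and first-derivative responses, Eq. (18) can be assembled as below (a simplified illustration, not the normalized-convolution implementation of [2,3]; constant factors and the sign of $\mathbf{b}$ are ignored, so the value of gamma is not directly comparable to $\gamma$ above).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def polynomial_expansion_tensor(g, sigma=2.0, gamma=0.25):
    """Orientation tensor T = A A^T + gamma b b^T from a local quadratic model,
    cf. Eqs. (16)-(18). With a Gaussian applicability, A and b reduce to Gaussian
    second- and first-derivative responses (constants and the sign of b ignored).
    """
    b = np.stack([gaussian_filter(g, sigma, order=(0, 1)),   # d/dx
                  gaussian_filter(g, sigma, order=(1, 0))],  # d/dy
                 axis=-1)
    A = np.empty(g.shape + (2, 2))
    A[..., 0, 0] = 0.5 * gaussian_filter(g, sigma, order=(0, 2))
    A[..., 1, 1] = 0.5 * gaussian_filter(g, sigma, order=(2, 0))
    A[..., 0, 1] = A[..., 1, 0] = 0.5 * gaussian_filter(g, sigma, order=(1, 1))
    return A @ np.swapaxes(A, -1, -2) + gamma * (b[..., :, None] * b[..., None, :])
```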
3. Orientation tensor estimation as a second-order filter

The thesis of this work is that it is possible to formulate a common framework in which the different methods for estimation of an orientation tensor can be placed. With such a framework at hand, it should be simpler to analyze a method which falls within the framework. A framework may also facilitate comparisons between different methods, and possibly also allow the synthesis of completely new methods for computing the tensor. However, the focus in this article is mainly on the formulation of the framework.

Steps in this direction have already been taken, e.g., [9], which attempts a qualitative comparison between the methods. A quantitative comparison between the gradient-based method and the quadrature filter-based method is presented in [11]. A framework based on Lie groups is given in [14], where conditions on the mapping from signal to tensor are derived which guarantee that a set of group transformations of the signal and of the tensor are in correspondence, e.g., that rotations of the signal induce the corresponding rotations of the tensor's eigenvectors. Second-order polynomial mappings for the 3D case are studied in detail.

A first clue as to how to characterize a common framework for the presented methods is given by the fact that all of the methods presented in Section 2, except the quadrature filter-based approach, compute the resulting tensor as second-order expressions of the local signal. However, the quadrature filter approach can be formulated as a second-order mapping from signal to descriptor by introducing suitable modifications. This can for example be done by changing Eq. (13) to

$T = \sum_k |q_k|^2\, \tilde{N}_k$   (19)
and at the same time modifying the quadrature filters to

$F_k(\mathbf{u}) = R(u)\, \rho(\hat{u}^T\hat{n}_k)$.   (20)
This will give the same resulting tensor for the simple signal case, disregarding a multiplicative constant, since the norm of $T$ now depends on the square of the signal's amplitude instead of being proportional to it [11].

The fact that all of the above methods (with the modified quadrature filter approach) can be formulated as second-order mappings from signal to tensor indicates that such mappings can be used as a common ground for computing the tensor. This implies that instead of applying a set of linear filters on the signal, followed by a sequence of various non-linear and linear operations on the filter results, we will see the entire mapping from signal to descriptor as one single operation that is described as a Volterra series with only a second-order term. See [16] for an overview of signal processing based on second-order filters.

If $g$ is the local (real-valued) $n$-dimensional signal, a general second-order convolution on $g$ can be defined as

$q(\mathbf{x}) = \iint f(\mathbf{y}, \mathbf{z})\, g(\mathbf{x} - \mathbf{y})\, g(\mathbf{x} - \mathbf{z})\, d\mathbf{y}\, d\mathbf{z}$,   (21)

where $f$ is the second-order filter. We can assume $f$ to be symmetric in the following sense:

$f(\mathbf{y}, \mathbf{z}) = f(\mathbf{z}, \mathbf{y})$,   (22)

since anti-symmetric terms in $f$ give no contribution to $q$. The right-hand side of Eq. (21) can be written in terms of the Fourier transforms of $f$ and $g$:

$q(\mathbf{x}) = (2\pi)^{-2n} \iint F(\mathbf{u}, \mathbf{v})\, G(\mathbf{u})\, G(\mathbf{v})\, e^{i(\mathbf{u}+\mathbf{v})^T\mathbf{x}}\, d\mathbf{u}\, d\mathbf{v}$,   (23)

where $F$ is the $2n$-dimensional Fourier transform of the filter function $f$, and $G$ is the $n$-dimensional Fourier transform of $g$. Notice that if $q$ is real for all real $g$ then

$F(-\mathbf{u}, -\mathbf{v}) = F^*(\mathbf{u}, \mathbf{v})$   (24)

and from Eq. (22) it follows that

$F(\mathbf{u}, \mathbf{v}) = F(\mathbf{v}, \mathbf{u})$.   (25)

Assuming a local view we can set $\mathbf{x} = 0$, $q = q(0)$, and get

$q = (2\pi)^{-2n} \iint F(\mathbf{u}, \mathbf{v})\, G(\mathbf{u})\, G(\mathbf{v})\, d\mathbf{u}\, d\mathbf{v}$.   (26)
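As a concrete discrete counterpart of Eqs. (21) and (26), the following sketch (not from the article) evaluates a second-order filter at x = 0 by a direct double sum over a small 1D neighbourhood; the kernel and patch are illustrative.

```python
import numpy as np

def second_order_response(g_patch, f_kernel):
    """Discrete counterpart of Eq. (21) evaluated at x = 0:
    q(0) = sum_{y,z} f(y, z) g(-y) g(-z).

    g_patch : samples g(y) for offsets y = -r..r around the evaluation point.
    f_kernel: (2r+1, 2r+1) array holding f(y, z); assumed symmetric, Eq. (22).
    """
    g_rev = g_patch[::-1]            # g_rev[i] holds g(-y_i)
    return g_rev @ f_kernel @ g_rev  # double sum: a quadratic form in the signal

# Example: a rank-one filter f(y, z) = d(y) d(z) built from a derivative kernel d,
# so q(0) is the squared discrete derivative at the patch centre.
d = np.array([0.0, -0.5, 0.0, 0.5, 0.0])    # central difference, offsets y = -2..2
f = np.outer(d, d)                          # symmetric second-order filter
patch = np.array([0.0, 1.0, 2.0, 3.0, 4.0]) # g(y) = y + 2, unit slope
print(second_order_response(patch, f))      # 1.0
```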
In the following, we will make the assumption that each element of the local orientation tensor can be computed as a second-order function on the local signal $g$. Consequently, to each element of the tensor, $T_{ij}$, there is a filter function $f_{ij}$ with a corresponding transform $F_{ij}$ and we can write

$T_{ij} = (2\pi)^{-2n} \iint F_{ij}(\mathbf{u}, \mathbf{v})\, G(\mathbf{u})\, G(\mathbf{v})\, d\mathbf{u}\, d\mathbf{v}$.   (27)

If we want to maintain the above assumption, the immediate question becomes: how should we choose the functions $F_{ij}$ to get a tensor $T$ which is useful for orientation representation? The approach chosen here is to consider the case of simple signals, Eq. (1). In this case $G(\mathbf{u})$ vanishes for all $\mathbf{u} \neq t\hat{m}$, and Eq. (27) then reduces to a double line integral along the lines $\mathbf{u} = t\hat{m}$ and $\mathbf{v} = \tau\hat{m}$:

$T_{ij} = (2\pi)^{-2} \iint F_{ij}(t\hat{m}, \tau\hat{m})\, G'(t)\, G'(\tau)\, dt\, d\tau$,   (28)

where $G'(t) = G(t\hat{m})$. A sufficient condition for $F_{ij}$ to produce $T = \lambda\,\hat{m}\hat{m}^T$ is then given by

$F_{ij}(t\hat{m}, \tau\hat{m}) = \hat{m}_i \hat{m}_j\, H(t, \tau)$,   (29)
where $H$ for the moment is arbitrary except for

$H(t, \tau) = H(\tau, t), \qquad H(-t, -\tau) = H^*(t, \tau)$   (30)
to make $T$ real. Substituting Eq. (29) into Eq. (28) gives

$T_{ij} = (2\pi)^{-2}\, \hat{m}_i \hat{m}_j \iint H(t, \tau)\, G'(t)\, G'(\tau)\, dt\, d\tau$,   (31)

$T = (2\pi)^{-2}\, \hat{m}\hat{m}^T \iint H(t, \tau)\, G'(t)\, G'(\tau)\, dt\, d\tau$.   (32)
It should be noted that the particular choice of $F_{ij}$ described above makes $\lambda$ independent of $\hat{m}$, which normally is a desirable property. In addition to this we also require that $H$ has the property

$\lambda = (2\pi)^{-2} \iint H(t, \tau)\, G'(t)\, G'(\tau)\, dt\, d\tau \geq 0 \quad \text{for all } G'$.   (33)

From this it follows that any set of functions $F_{ij}$ which meet the conditions in Eqs. (29), (30) and (33) will produce $T = \lambda\,\hat{m}\hat{m}^T$, with $\lambda \geq 0$, for the simple signal case. Let us summarize this main
result properly.

The rank one conditions. If the orientation tensor is computed as

$T_{ij} = (2\pi)^{-2n} \iint F_{ij}(\mathbf{u}, \mathbf{v})\, G(\mathbf{u})\, G(\mathbf{v})\, d\mathbf{u}\, d\mathbf{v}$,

then a sufficient condition to get $T = \lambda\,\hat{m}\hat{m}^T$, $\lambda \geq 0$, for a simple signal $g(\mathbf{x}) = g'(\mathbf{x}^T\hat{m})$ is

$F_{ij}(t\hat{m}, \tau\hat{m}) = \hat{m}_i \hat{m}_j\, H(t, \tau)$,

where

$H(t, \tau) = H(\tau, t), \qquad H(-t, -\tau) = H^*(t, \tau)$

and

$\lambda = (2\pi)^{-2} \iint H(t, \tau)\, G'(t)\, G'(\tau)\, dt\, d\tau \geq 0 \quad \text{for all } G'$.   (34)

At this point it may be appropriate to try to give an intuitive interpretation of the function $H$. First of all, it can be given a meaning only if the local signal is simple, Eq. (1). Second, the condition on $F_{ij}$ stated in Eq. (29) has been chosen to guarantee that $\lambda$ does not depend on $\hat{m}$. Third, $H$ is involved in the computation of $\lambda$ by weighting all possible products between pairs of frequency components in $g$.

This last observation allows us to restrict $H$ even further. In most practical cases we do not want $T$ to be dependent on the local DC-component. This is guaranteed if we impose the additional condition

$H(\omega, 0) = H(0, \omega) = 0$,   (35)
which should be valid for all $\omega$. Notice that, due to the symmetry of $H$, Eq. (30), the last condition does not contain two independent conditions. This last condition is not necessary for obtaining a $T$ which is of rank one in the case of simple signals, but in the following we will sometimes explicitly assume that also this condition is satisfied since it simplifies some of the derivations.
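A small numerical sanity check of these conditions (not part of the article): the candidate weighting function $H(t, \tau) = -t\tau\, e^{-(t^2 + \tau^2)/2}$ satisfies Eqs. (30), (33) and (35), and the sketch below verifies for a random real profile that the resulting $\lambda$ is non-negative and unaffected by an added DC level. The discretization and the choice of $H$ are illustrative only.

```python
import numpy as np

def lam(g_profile, H):
    """Discrete analogue of Eq. (33): lambda ~ sum_{t,tau} H(t,tau) G'(t) G'(tau),
    where G' is the DFT of the 1D profile g' (constant factors ignored)."""
    n = len(g_profile)
    t = np.fft.fftfreq(n) * 2 * np.pi        # angular frequency samples
    Gp = np.fft.fft(g_profile)               # Hermitian, since g' is real
    val = Gp @ H(t[:, None], t[None, :]) @ Gp
    return val.real                          # imaginary part vanishes by Eq. (30)

# A candidate H fulfilling Eqs. (30), (33) and (35).
H = lambda t, tau: -t * tau * np.exp(-(t**2 + tau**2) / 2)

rng = np.random.default_rng(0)
g_profile = rng.standard_normal(65)          # odd length: symmetric frequency grid
print(lam(g_profile, H) >= 0)                            # True: Eq. (33)
print(np.isclose(lam(g_profile + 10.0, H),               # True: the DC level has
                 lam(g_profile, H)))                     # no effect, Eq. (35)
```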
4. Establishing the framework

In this section, we will establish the proposed framework by showing that the three methods presented in Section 2, using the modified quadrature-based method, meet the requirements stated
in the previous section. This implies that for each method we first need to demonstrate that there exist second-order filters $F_{ij}$ such that $T$ can be written according to Eq. (28), and then that these filters satisfy Eqs. (29), (30) and (33).
4.1. Outer products of gradients

In this case we have already derived, in Eq. (7), that

$F_{ij}(\mathbf{u}, \mathbf{v}) = -P(\mathbf{u} + \mathbf{v})\, u_i v_j\, W(\mathbf{u})\, W(\mathbf{v})$,   (36)

from which it follows that

$F_{ij}(t\hat{m}, \tau\hat{m}) = -\hat{m}_i \hat{m}_j\, P'(t + \tau)\, t\tau\, W'(t)\, W'(\tau)$,   (37)

where $P'$ and $W'$ are the one-variable versions of $P$ and $W$. This means that

$H_1(t, \tau) = -P'(t + \tau)\, t\tau\, W'(t)\, W'(\tau)$   (38)

and the conditions in Eqs. (29) and (30) are thus satisfied. In Section A.2 it is shown that $H_1$ also meets the condition in Eq. (33) as soon as $p$ is a non-negative function.

4.2. Quadrature filters

In the case that $T$ is estimated as in Eq. (19), where the filter functions are given by Eq. (20), Section A.3 derives an expression for $F_{ij}$ given by

$F_{ij}(\mathbf{u}, \mathbf{v}) = R(u)\, R(v) \sum_k \dfrac{\rho(\hat{u}^T\hat{n}_k)\,\rho(-\hat{v}^T\hat{n}_k) + \rho(-\hat{u}^T\hat{n}_k)\,\rho(\hat{v}^T\hat{n}_k)}{2}\, \tilde{N}_{ij,k}$,   (39)

where $\mathbf{u} = u\hat{u}$, $\mathbf{v} = v\hat{v}$. In Section A.3 it is also shown that this formulation of $F_{ij}$ leads to

$F_{ij}(t\hat{m}, \tau\hat{m}) = \hat{m}_i \hat{m}_j\, R(|t|)\, R(|\tau|)\, \mathrm{step}(-t\tau)$,   (40)

which shows that Eqs. (29) and (30) are satisfied with

$H_2(t, \tau) = R(|t|)\, R(|\tau|)\, \mathrm{step}(-t\tau)$.   (41)

Furthermore, $H_2$ can be shown to meet the condition in Eq. (33), Section A.3.

4.3. Polynomial expansion

In the case of a polynomial expansion based estimation of $T$, a Fourier transformation of Eqs. (16) and (17) inserted into Eq. (18), followed by an identification with Eq. (27), gives (Section A.4)

$F_{ij}(\mathbf{u}, \mathbf{v}) = \dfrac{u_i v_j + v_i u_j}{2} \left( \dfrac{1}{4}\, \mathbf{u}^T\mathbf{v} - \gamma \right) W_a(\mathbf{u})\, W_a(\mathbf{v})$   (42)

and

$F_{ij}(t\hat{m}, \tau\hat{m}) = \hat{m}_i \hat{m}_j \left( \dfrac{1}{4}\, t\tau - \gamma \right) t\tau\, W_a'(t)\, W_a'(\tau)$,   (43)

which satisfies the conditions in Eqs. (29) and (30) for

$H_3(t, \tau) = \left( \dfrac{1}{4}\, t\tau - \gamma \right) t\tau\, W_a'(t)\, W_a'(\tau)$.   (44)

In Section A.4, it is shown that $H_3$ also satisfies Eq. (33).
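For reference, the three weighting functions can be written down directly in code; the sketch below implements Eqs. (38), (41) and (44) with Gaussian and lognormal-like profiles of unit width (these particular profile and parameter choices are illustrative and not those used in the original implementations).

```python
import numpy as np

# Illustrative profiles (Gaussian choices of unit width, lognormal-like R).
P1 = lambda t: np.exp(-t**2 / 2)             # P', window spectrum (gradient method)
W1 = lambda t: np.exp(-t**2 / 2)             # W', regularizing low-pass
R  = lambda t: np.exp(-2 * np.log(np.maximum(np.abs(t), 1e-9)) ** 2)
Wa = lambda t: np.exp(-t**2 / 2)             # W_a', Gaussian applicability spectrum
step = lambda z: np.heaviside(z, 0.0)

def H1(t, tau):                              # Eq. (38), outer products of gradients
    return -P1(t + tau) * t * tau * W1(t) * W1(tau)

def H2(t, tau):                              # Eq. (41), (modified) quadrature filters
    return R(t) * R(tau) * step(-t * tau)

def H3(t, tau, gamma=0.25):                  # Eq. (44), polynomial expansion
    return (0.25 * t * tau - gamma) * t * tau * Wa(t) * Wa(tau)
```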
5. The $\lambda$ function

For methods which fall within the proposed framework a deeper analysis can be made by considering how $\lambda$, the non-zero eigenvalue³ of $T$, depends on the local signal described by $g$. More precisely, assuming that $g$ is defined in terms of the single-variable function $g'$, Eq. (1), we can derive an expression for the variation of $\lambda$ when we move across the local structure. Hence, we write the local signal $g$ as

$g(\mathbf{x}) = g'(\mathbf{x}^T\hat{m} + x_0)$.   (45)

³ Strictly speaking, there is no guarantee that the eigenvalue $\lambda$ does not vanish for an arbitrary signal $g$.

This corresponds to a modification of the expression for $\lambda$ given in Eq. (33) according to

$\lambda(x_0) = (2\pi)^{-2} \iint H(t, \tau)\, G'(t)\, G'(\tau)\, e^{i(t + \tau)x_0}\, dt\, d\tau$,   (46)

where $\lambda$ now is written as a function of $x_0$ to emphasize that we are varying this parameter to move across the structure. Substituting the variables $t$ and $\tau$ for $r$ and $s$ according to $r = t + \tau$, $s = t - \tau$,
results in the following expression for $\lambda$:

$\lambda(x_0) = (2\pi)^{-2} \iint \dfrac{1}{2}\, H\!\left(\dfrac{r+s}{2}, \dfrac{r-s}{2}\right) G'\!\left(\dfrac{r+s}{2}\right) G'\!\left(\dfrac{r-s}{2}\right) e^{i r x_0}\, dr\, ds = (2\pi)^{-1} \int M(r)\, e^{i r x_0}\, dr$,   (47)

where

$M(r) = \dfrac{1}{4\pi} \int H\!\left(\dfrac{r+s}{2}, \dfrac{r-s}{2}\right) G'\!\left(\dfrac{r+s}{2}\right) G'\!\left(\dfrac{r-s}{2}\right) ds$.   (48)

Note that $M$ depends both on the function $H$, related to the filter, and on $G'$, which is related to the signal. It can also be interpreted as the Fourier transform of $\lambda(x_0)$. A few different cases of the function $g'$, with corresponding transform $G'$, are discussed below.

5.1. Impulse and step responses

If we choose $g'(x) = \delta(x)$ we will get a $\lambda(x_0)$ which is a type of "impulse line response" (or impulse hyperplane response in general). In this case $G'(u) = 1$, which leads to

$M_{\mathrm{imp}}(r) = \dfrac{1}{4\pi} \int H\!\left(\dfrac{r+s}{2}, \dfrac{r-s}{2}\right) ds$.   (49)

By computing $M_{\mathrm{imp}}$ for a particular method, or for two different methods, i.e., for one or several functions $H$, it will be possible to quantitatively analyze the performance of a method or to compare two methods in terms of their impulse line responses. In a similar way we can choose $g'$ as a step function, $g'(x) = \mathrm{step}(x)$, which implies $G'(u) = \pi\delta(u) - i/u$. Assuming that Eq. (35) is true, the corresponding function $M$ is

$M_{\mathrm{step}}(r) = -\dfrac{1}{\pi} \int \dfrac{H\!\left(\frac{r+s}{2}, \frac{r-s}{2}\right)}{(r+s)(r-s)}\, ds$.   (50)

5.2. Phase invariance

Let us consider a local signal which contains a single-frequency component. Assuming that Eq. (35) is valid, we can disregard any DC-component of the signal, and since also the amplitude is of no consequence for the following discussion we write $g'$ simply as $g'(x) = \cos(\omega x)$, which has a Fourier transform given by

$G'(u) = \pi\,[\delta(u - \omega) + \delta(u + \omega)]$.   (51)
The corresponding function $M$ can then be written

$M_{\mathrm{phase}}(r) = \dfrac{\pi}{4} \int H\!\left(\dfrac{r+s}{2}, \dfrac{r-s}{2}\right) \left[\delta\!\left(\dfrac{r+s}{2} - \omega\right) + \delta\!\left(\dfrac{r+s}{2} + \omega\right)\right] \left[\delta\!\left(\dfrac{r-s}{2} - \omega\right) + \delta\!\left(\dfrac{r-s}{2} + \omega\right)\right] ds$
$= \dfrac{\pi}{2}\, [H(\omega, \omega)\,\delta(r - 2\omega) + H(\omega, -\omega)\,\delta(r) + H(-\omega, \omega)\,\delta(r) + H(-\omega, -\omega)\,\delta(r + 2\omega)]$.   (52)

We can now formulate a condition on $H$ to make $\lambda(x_0)$ constant with respect to $x_0$. That situation occurs if and only if $H(\omega, \omega) = 0$, since this implies that also $H(-\omega, -\omega) = 0$, Eq. (30), and that $M_{\mathrm{phase}}(r)$ then only contains frequency components at $r = 0$. The assumption that the local signal contains only a single frequency is sometimes used for introducing phase and phase invariant descriptors [6]. In this context, it means that if we are interested in computational methods which produce a tensor that is phase invariant then we have to require that

$H(\omega, \omega) = 0$,   (53)

where $\omega$ is the frequency of the local signal. In practice, the local frequency $\omega$ will be different in different parts of the signal, and the last condition then has to be valid for all $\omega$.

At this point we can investigate the phase invariance properties of the three different methods. First of all, it is clear that the condition in Eq. (35) is valid for all three methods. For the first method, we also see that

$H_1(\omega, \omega) = -P'(2\omega)\, \omega^2\, [W'(\omega)]^2$.   (54)

Consequently, only for $\omega$ which are sufficiently large to make $P'$ much smaller than the other two factors can $H_1$ be said to approximately vanish.
To get a larger range of such $\omega$ it is then necessary to make $P'$ as narrow as possible. However, this will also increase the spatial support of the $F_{ij}$ which are used to compute $T$. In the case of the quadrature filter based method we get $H_2(\omega, \omega) = 0$ for all $\omega$, which means that this method is phase invariant for all frequencies. However, this is not the case for $H_3$:

$H_3(\omega, \omega) = \left(\dfrac{\omega^2}{4} - \gamma\right) \omega^2\, [W_a'(\omega)]^2$,   (55)

which implies that $T$ is phase invariant only for the frequency $\omega = 2\sqrt{\gamma}$.
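A short numerical illustration of this analysis (not from the article): for the single-frequency signal of Eq. (51), Eq. (46) reduces to a four-term sum over $t, \tau \in \{\pm\omega\}$, so that, up to a constant factor, $\lambda(x_0) = \tfrac{1}{2}[H(\omega, -\omega) + H(\omega, \omega)\cos 2\omega x_0]$. The sketch evaluates the ripple amplitude $|H(\omega, \omega)|$ for $H_2$ and $H_3$, using illustrative Gaussian and lognormal-like profiles.

```python
import numpy as np

# Weighting functions of Section 4 with illustrative profiles.
R = lambda t: np.exp(-2 * np.log(np.maximum(np.abs(t), 1e-9)) ** 2)
Wa = lambda t: np.exp(-t**2 / 2)
step = lambda z: np.heaviside(z, 0.0)
H2 = lambda t, tau: R(t) * R(tau) * step(-t * tau)                            # Eq. (41)
H3 = lambda t, tau, g=0.25: (0.25 * t * tau - g) * t * tau * Wa(t) * Wa(tau)  # Eq. (44)

def ripple(H, omega):
    """Amplitude of the x0-dependent part of lambda(x0) for g'(x) = cos(omega*x):
    lambda(x0) ~ H(omega,-omega) + H(omega,omega)*cos(2*omega*x0), cf. Eq. (52)."""
    return abs(H(omega, omega))

gamma = 0.25
for omega in (0.5, 2 * np.sqrt(gamma), 1.5):
    print(f"omega={omega:.3f}  H2 ripple={ripple(H2, omega):.2e}  "
          f"H3 ripple={ripple(lambda t, u: H3(t, u, gamma), omega):.2e}")
# H2: zero ripple for every omega (phase invariant);
# H3: zero ripple only at omega = 2*sqrt(gamma).
```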
6. Summary and discussion

In this article, we have formulated a computational framework which can deal with tensors for representation of orientation and estimation of velocity. The framework is based on a second-order filter $F_{ij}$ from signal to tensor, where the filter has to meet the three conditions, Eqs. (29), (30) and (33), which are sufficient for producing a rank one tensor in the case that the local signal is simple, Eq. (1). This implies that velocity representation is only covered for the case of moving lines or edges. An additional condition, Eq. (35), is also introduced to assure that the resulting tensor does not depend on the local DC-level. A key component in all these conditions is the two-variable function $H$ which describes how the products of pairs of frequency components of the simple signal are weighted to give $\lambda$, the non-zero eigenvalue of the tensor.

In Section 4 we have also seen that three methods which are presented in the literature fall within the proposed framework, each being described by its own function $H$. These functions are illustrated in Section A.5. It should be emphasized that the proposed framework is not limited to only these three methods. Examples of other methods which fall within the framework are the energy tensor [4] and the boundary tensor [13].

The main motivation for developing a framework of this type is that it enables the analysis and comparison of different computational approaches. To this end, Section 5 provides
formulations of characteristic functions which can be used, e.g., for analysis of impulse responses or phase invariance.

The proposed framework is sufficiently general to allow the formulation of various questions or investigations regarding methods for estimation of orientation tensors. These are all outside the scope of this article, but a few of them are mentioned here for future work in the area. First, it should be possible to use the framework for synthesizing new computational methods by demanding that the second-order filter $F_{ij}$, in addition to satisfying the critical conditions, has certain desirable properties for solving a particular problem. However, even though any such filter has a realization in the spatial domain, according to Eq. (21), it may not be possible to decompose it into reasonably simple computational steps, see [16]. To advance in this direction, a better understanding of the decomposition process in relation to the stipulated conditions is required. Second, an analysis restricted to local signals which are simple does not suffice for a full understanding of how the different methods compare. Consequently, it should be interesting to formulate a theory for orientation estimation which includes signals of higher intrinsic dimensionality than one.
Acknowledgments

This work has been performed within the WITAS project, funded by the Knut and Alice Wallenberg foundation, and the VISCOS project, funded by the Swedish Foundation for Strategic Research.
Appendix A

In this appendix, some of the mathematical derivations used in the previous discussions are presented in more detail.

A.1. Eq. (8) follows from Eq. (7)

Starting from Eq. (7) we replace the functions $P$, $G$ and $W$ with the explicit expressions of their
spatial counterparts $p$, $g$ and $w$:

$T = \iint \left[ \int p(\mathbf{x})\, e^{-i(\mathbf{u}+\mathbf{v})^T\mathbf{x}}\, d\mathbf{x} \int \nabla(g*w)(\mathbf{y})\, e^{-i\mathbf{u}^T\mathbf{y}}\, d\mathbf{y} \int \nabla^T(g*w)(\mathbf{z})\, e^{-i\mathbf{v}^T\mathbf{z}}\, d\mathbf{z} \right] d\mathbf{u}\, d\mathbf{v}$.   (56)

We already assume that $g$ is real-valued. In Section 2.1 it was assumed that $W$ is real-valued and symmetric, from which it follows that also $w$ is real-valued. By changing the order of integration in the previous expression, we therefore get

$T = \iiint p(\mathbf{x})\, \nabla(g*w)(\mathbf{y})\, \nabla^T(g*w)(\mathbf{z}) \iint e^{-i\mathbf{u}^T(\mathbf{x}+\mathbf{y})}\, e^{-i\mathbf{v}^T(\mathbf{x}+\mathbf{z})}\, d\mathbf{u}\, d\mathbf{v}\; d\mathbf{x}\, d\mathbf{y}\, d\mathbf{z}$
$= (2\pi)^{2n} \iiint p(\mathbf{x})\, \nabla(g*w)(\mathbf{y})\, \nabla^T(g*w)(\mathbf{z})\, \delta(\mathbf{x}+\mathbf{y})\, \delta(\mathbf{x}+\mathbf{z})\, d\mathbf{x}\, d\mathbf{y}\, d\mathbf{z}$
$= (2\pi)^{2n} \int p(-\mathbf{x})\, \nabla(g*w)(\mathbf{x})\, \nabla^T(g*w)(\mathbf{x})\, d\mathbf{x}$.   (57)

Finally, if we make the assumption that $P$ is symmetric, which implies that $p$ also is symmetric, we get the formulation presented in Eq. (8).
A.2. $H_1$ meets the condition in Eq. (33)

To see that $H_1$ meets the condition in Eq. (33), we consider the following integral:

$\iint H_1(t, \tau)\, G'(t)\, G'(\tau)\, dt\, d\tau = -\iint P'(t + \tau)\, t\tau\, W'(t)\, W'(\tau)\, G'(t)\, G'(\tau)\, dt\, d\tau = \iint P'(t - \tau)\, t\tau\, W'(t)\, W'(\tau)\, G'(t)\, G'(-\tau)\, dt\, d\tau$.   (58)

$P'$ is the Fourier transform of the spatial window function $p'$, i.e.,

$P'(t - \tau) = \int p'(x)\, e^{-i(t - \tau)x}\, dx$   (59)
and consequently

$\iint H_1(t, \tau)\, G'(t)\, G'(\tau)\, dt\, d\tau = \iiint p'(x)\, e^{-i(t-\tau)x}\, dx\; t\tau\, W'(t)\, W'(\tau)\, G'(t)\, G'(-\tau)\, dt\, d\tau$
$= \int p'(x) \left[ \int t\, W'(t)\, G'(t)\, e^{-itx}\, dt \right] \left[ \int \tau\, W'(\tau)\, G'(-\tau)\, e^{i\tau x}\, d\tau \right] dx = \int p'(x)\, c(x)\, \overline{c(x)}\, dx = \int p'(x)\, |c(x)|^2\, dx$,   (60)

where $c(x)$ is substituted for the integral over $t$. Clearly, the last expression is not negative if we also assume $p'(x) \geq 0$.

A.3. $F_{ij}$ and $H$ for quadrature filters

The filter responses $q_k$ from the quadrature filters are given by

$q_k = (2\pi)^{-n} \int F_k(\mathbf{u})\, G(\mathbf{u})\, d\mathbf{u} = (2\pi)^{-n} \int R(\|\mathbf{u}\|)\, \rho(\hat{u}^T\hat{n}_k)\, G(\mathbf{u})\, d\mathbf{u}$.   (61)

Inserted into Eq. (19) this gives

$T_{ij} = \sum_k (2\pi)^{-2n} \int R(\|\mathbf{u}\|)\, \rho(\hat{u}^T\hat{n}_k)\, G(\mathbf{u})\, d\mathbf{u}\; \overline{\int R(\|\mathbf{v}\|)\, \rho(\hat{v}^T\hat{n}_k)\, G(\mathbf{v})\, d\mathbf{v}}\; \tilde{N}_{ij,k}$
$= (2\pi)^{-2n} \iint \sum_k R(\|\mathbf{u}\|)\, R(\|\mathbf{v}\|)\, \rho(\hat{u}^T\hat{n}_k)\, \rho(\hat{v}^T\hat{n}_k)\, \tilde{N}_{ij,k}\, G(\mathbf{u})\, \overline{G(\mathbf{v})}\, d\mathbf{u}\, d\mathbf{v}$   (62)

and since $\rho$ is real-valued and we assume $R$ to be real-valued and $G$ to be Hermitian, this can be rewritten as ($-\mathbf{v}$ has been substituted for $\mathbf{v}$)

$T_{ij} = (2\pi)^{-2n} \iint \sum_k R(\|\mathbf{u}\|)\, R(\|\mathbf{v}\|)\, \rho(\hat{u}^T\hat{n}_k)\, \rho(-\hat{v}^T\hat{n}_k)\, \tilde{N}_{ij,k}\, G(\mathbf{u})\, G(\mathbf{v})\, d\mathbf{u}\, d\mathbf{v}$.   (63)
By observing the symmetry property of $F_{ij}$, Eq. (25), this filter is then given by

$F_{ij}(\mathbf{u}, \mathbf{v}) = R(\|\mathbf{u}\|)\, R(\|\mathbf{v}\|) \sum_k \dfrac{\rho(\hat{u}^T\hat{n}_k)\,\rho(-\hat{v}^T\hat{n}_k) + \rho(-\hat{u}^T\hat{n}_k)\,\rho(\hat{v}^T\hat{n}_k)}{2}\, \tilde{N}_{ij,k}$.   (64)

Substituting $\mathbf{u} = t\hat{m}$, $\mathbf{v} = \tau\hat{m}$ gives

$\hat{u} = \mathrm{sign}(t)\,\hat{m}, \qquad \hat{v} = \mathrm{sign}(\tau)\,\hat{m}$,   (65)

from which it follows, expanding the products using $\rho(z) = (z + |z|)/2$ and Eq. (65), that

$\dfrac{\rho(\hat{u}^T\hat{n}_k)\,\rho(-\hat{v}^T\hat{n}_k) + \rho(-\hat{u}^T\hat{n}_k)\,\rho(\hat{v}^T\hat{n}_k)}{2} = \dfrac{1 - \mathrm{sign}(t\tau)}{2}\,(\hat{m}^T\hat{n}_k)^2 = \mathrm{step}(-t\tau)\,(\hat{m}^T\hat{n}_k)^2$.   (66)

We can now write

$F_{ij}(t\hat{m}, \tau\hat{m}) = R(|t|)\, R(|\tau|)\, \mathrm{step}(-t\tau) \sum_k (\hat{m}^T\hat{n}_k)^2\, \tilde{N}_{ij,k}$   (67)

$= R(|t|)\, R(|\tau|)\, \mathrm{step}(-t\tau) \sum_k \langle \hat{m}\hat{m}^T \,|\, \hat{n}_k\hat{n}_k^T \rangle\, \tilde{N}_{ij,k}$.   (68)

From the duality relation between $\hat{N}_k = \hat{n}_k\hat{n}_k^T$ and $\tilde{N}_k$, Eq. (14), it follows that the sum in the last expression amounts to $\hat{m}_i\hat{m}_j$, and we finally arrive at the relation

$F_{ij}(t\hat{m}, \tau\hat{m}) = \hat{m}_i\hat{m}_j\, R(|t|)\, R(|\tau|)\, \mathrm{step}(-t\tau)$,   (69)

which means that $F_{ij}$ satisfies Eqs. (29) and (30) with

$H_2(t, \tau) = R(|t|)\, R(|\tau|)\, \mathrm{step}(-t\tau)$.   (70)

$H_2$ meets the condition in Eq. (33) since

$\iint H_2(t, \tau)\, G'(t)\, G'(\tau)\, dt\, d\tau = \iint R(|t|)\, R(|\tau|)\, \mathrm{step}(-t\tau)\, G'(t)\, G'(\tau)\, dt\, d\tau = \iint R(|t|)\, R(|\tau|)\, \mathrm{step}(t\tau)\, G'(t)\, G'(-\tau)\, dt\, d\tau$
$= \int_0^\infty \int_0^\infty R(|t|)\, R(|\tau|)\, G'(t)\, G'(-\tau)\, dt\, d\tau + \int_{-\infty}^0 \int_{-\infty}^0 R(|t|)\, R(|\tau|)\, G'(t)\, G'(-\tau)\, dt\, d\tau$
$= \int_0^\infty R(|t|)\, G'(t)\, dt \int_0^\infty R(|\tau|)\, G'(-\tau)\, d\tau + \int_{-\infty}^0 R(|t|)\, G'(t)\, dt \int_{-\infty}^0 R(|\tau|)\, G'(-\tau)\, d\tau$
$= c_1\,\overline{c_1} + c_2\,\overline{c_2} = |c_1|^2 + |c_2|^2 \geq 0$   (71)

for all functions $G'$.

A.4. $F_{ij}$ and $H$ for polynomial expansion

Let $T$ be estimated by means of a polynomial expansion using a normalized Gaussian applicability function $w_a(\mathbf{x})$ with corresponding Fourier transform $W_a(\mathbf{u})$,

$w_a(\mathbf{x}) = \dfrac{1}{(\sqrt{2\pi}\,\sigma)^n}\, e^{-|\mathbf{x}|^2/2\sigma^2}, \qquad W_a(\mathbf{u}) = e^{-\sigma^2|\mathbf{u}|^2/2}$.   (72)

If we assume a local view, $\mathbf{x} = 0$, Eqs. (16) and (17) can be rewritten

$A_{ij} = \int \left( \dfrac{1}{2\sigma^4}\, x_i x_j\, w_a(\mathbf{x}) - \dfrac{1}{2\sigma^2}\, \delta_{ij}\, w_a(\mathbf{x}) \right) g(-\mathbf{x})\, d\mathbf{x}$,   (73)

$b_i = \dfrac{1}{\sigma^2} \int x_i\, w_a(\mathbf{x})\, g(-\mathbf{x})\, d\mathbf{x}$.   (74)
These two relations can also be expressed by integrals in the Fourier domain as follows:

$A_{ij} = -\dfrac{1}{2(2\pi)^n} \int u_i u_j\, W_a(\mathbf{u})\, G(\mathbf{u})\, d\mathbf{u}$,   (75)

$b_i = -\dfrac{i}{(2\pi)^n} \int u_i\, W_a(\mathbf{u})\, G(\mathbf{u})\, d\mathbf{u}$.   (76)

According to Eq. (18), the elements of the tensor are then given by

$T_{ij} = \sum_k A_{ik} A_{jk} + \gamma\, b_i b_j = (2\pi)^{-2n} \iint \left( \dfrac{1}{4} \sum_k u_k v_k - \gamma \right) u_i v_j\, W_a(\mathbf{u})\, W_a(\mathbf{v})\, G(\mathbf{u})\, G(\mathbf{v})\, d\mathbf{u}\, d\mathbf{v}$
$= (2\pi)^{-2n} \iint u_i v_j \left( \dfrac{1}{4}\, \mathbf{u}^T\mathbf{v} - \gamma \right) W_a(\mathbf{u})\, W_a(\mathbf{v})\, G(\mathbf{u})\, G(\mathbf{v})\, d\mathbf{u}\, d\mathbf{v}$.   (77)

From Eq. (27), together with Eq. (25), we can now identify $F_{ij}$ as

$F_{ij}(\mathbf{u}, \mathbf{v}) = \dfrac{u_i v_j + v_i u_j}{2} \left( \dfrac{1}{4}\, \mathbf{u}^T\mathbf{v} - \gamma \right) W_a(\mathbf{u})\, W_a(\mathbf{v})$   (78)

and

$F_{ij}(t\hat{m}, \tau\hat{m}) = \hat{m}_i\hat{m}_j \left( \dfrac{1}{4}\, t\tau - \gamma \right) t\tau\, W_a'(t)\, W_a'(\tau)$.   (79)

Thus we get

$H_3(t, \tau) = \left( \dfrac{1}{4}\, t\tau - \gamma \right) t\tau\, W_a'(t)\, W_a'(\tau)$,   (80)

which satisfies Eqs. (29) and (30).
Fig. 1. The functions $H_1$, $H_2$ and $H_3$ as mesh plots (top) and contour plots (bottom).
Furthermore,

$\iint H_3(t, \tau)\, G'(t)\, G'(\tau)\, dt\, d\tau = \dfrac{1}{4} \iint t^2\tau^2\, W_a'(t)\, W_a'(\tau)\, G'(t)\, G'(\tau)\, dt\, d\tau - \gamma \iint t\tau\, W_a'(t)\, W_a'(\tau)\, G'(t)\, G'(\tau)\, dt\, d\tau$
$= \dfrac{1}{4} \iint t^2\tau^2\, W_a'(t)\, W_a'(\tau)\, G'(t)\, G'(-\tau)\, dt\, d\tau + \gamma \iint t\tau\, W_a'(t)\, W_a'(\tau)\, G'(t)\, G'(-\tau)\, dt\, d\tau$
$= \dfrac{1}{4} \int t^2\, W_a'(t)\, G'(t)\, dt \int \tau^2\, W_a'(\tau)\, G'(-\tau)\, d\tau + \gamma \int t\, W_a'(t)\, G'(t)\, dt \int \tau\, W_a'(\tau)\, G'(-\tau)\, d\tau$
$= \dfrac{1}{4}\, c_1\,\overline{c_1} + \gamma\, c_2\,\overline{c_2} = \dfrac{1}{4}\, |c_1|^2 + \gamma\, |c_2|^2 \geq 0$,   (81)

which proves that the condition in Eq. (33) holds for $\gamma \geq 0$.
A.5. Illustrations

This section presents Fig. 1 with some illustrations of the functions $H_1$, $H_2$ and $H_3$ which have been derived in the previous sections. The dashed lines in the contour plots describe the points where $H_k$ vanishes. For $H_1$ this is only on the axes $t = 0$ and $\tau = 0$. $H_3$ vanishes on both axes too, and also on the hyperbolas $t\tau = 4\gamma$. $H_2$ vanishes everywhere in the upper right and lower left quadrants, illustrated by white squares in the figure. Notice also that $H_1$ is positive in the upper left and lower right quadrants and negative in the other two quadrants.
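Plots of this kind can be regenerated directly from Eqs. (38), (41) and (44); the following matplotlib sketch (not part of the original article) uses illustrative Gaussian and lognormal-like profiles with unit parameters, so the exact shapes differ from those in Fig. 1.

```python
import numpy as np
import matplotlib.pyplot as plt

t, tau = np.meshgrid(np.linspace(-3, 3, 201), np.linspace(-3, 3, 201))

gauss = lambda x: np.exp(-x**2 / 2)
R = lambda x: np.exp(-2 * np.log(np.maximum(np.abs(x), 1e-9)) ** 2)  # lognormal-like
gamma = 0.25

H = {
    "H1": -gauss(t + tau) * t * tau * gauss(t) * gauss(tau),           # Eq. (38)
    "H2": R(t) * R(tau) * (t * tau < 0),                               # Eq. (41)
    "H3": (0.25 * t * tau - gamma) * t * tau * gauss(t) * gauss(tau),  # Eq. (44)
}

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, (name, Hk) in zip(axes, H.items()):
    ax.contourf(t, tau, Hk, levels=20)
    ax.contour(t, tau, Hk, levels=[0.0], linestyles="dashed", colors="k")  # zero set
    ax.set_title(name); ax.set_xlabel("t"); ax.set_ylabel("tau")
plt.tight_layout(); plt.show()
```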
References

[1] J. Bigün, G.H. Granlund, Optimal orientation detection of linear symmetry, in: Proceedings of the IEEE First International Conference on Computer Vision, London, June 1987, pp. 433–438.
[2] G. Farnebäck, Spatial domain methods for orientation and velocity estimation, Lic. Thesis LiU-Tek-Lic-1999:13, Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden, March 1999, Thesis No. 755, ISBN 91-7219-441-3.
[3] G. Farnebäck, Polynomial expansion for orientation and motion estimation, Ph.D. Thesis, Linköping University, SE-581 83 Linköping, Sweden, 2002, Dissertation No. 790, ISBN 91-7373-475-6.
[4] M. Felsberg, G.H. Granlund, POI detection using channel clustering and the 2D energy tensor, in: Pattern Recognition: 26th DAGM Symposium, Lecture Notes in Computer Science, vol. 3175, Tübingen, Germany, 2004, pp. 103–110.
[5] W. Förstner, E. Gülch, A fast operator for detection and precise location of distinct points, corners and centres of circular features, in: ISPRS Intercommission Workshop, Interlaken, June 1987, pp. 149–155.
[6] G.H. Granlund, H. Knutsson, Signal Processing for Computer Vision, Kluwer Academic Publishers, 1995, ISBN 0-7923-9530-1.
[7] C.G. Harris, M. Stephens, A combined corner and edge detector, in: Fourth Alvey Vision Conference, September 1988, pp. 147–151.
[8] B. Jähne, H. Haussecker (Eds.), Computer Vision and Applications, Academic Press, Dordrecht, 2000, ISBN 0-12-379777-2.
[9] B. Johansson, G. Farnebäck, A theoretical comparison of different orientation tensors, in: Proceedings SSAB02 Symposium on Image Analysis, SSAB, Lund, March 2002, pp. 69–73.
[10] H. Knutsson, Representing local structure using tensors, in: The Sixth Scandinavian Conference on Image Analysis, Oulu, Finland, June 1989, pp. 244–251; also Report LiTH-ISY-I-1019, Computer Vision Laboratory, Linköping University, Sweden, 1989.
[11] H. Knutsson, M. Andersson, What's so good about quadrature filters?, in: Proceedings of the International Conference on Image Processing, Barcelona, Spain, September 2003, invited paper.
[12] H. Knutsson, C.-F. Westin, Normalized and differential convolution: methods for interpolation and filtering of incomplete and uncertain data, IEEE, New York, June 1993, pp. 515–523.
[13] U. Köthe, Integrated edge and junction detection with the boundary tensor, in: Proceedings of the Ninth International Conference on Computer Vision, Nice, vol. 1, 2003, pp. 424–431.
[14] K. Nordberg, Signal representation and processing using operator groups, Ph.D. Thesis, Linköping University, SE-581 83 Linköping, Sweden, 1995, Dissertation No. 366, ISBN 91-7871-476-1.
[15] K. Nordberg, H. Knutsson, G. Granlund, Local curvature from gradients of the orientation tensor field, Report LiTH-ISY-R-1783, Computer Vision Laboratory, SE-581 83 Linköping, Sweden, August 1995.
[16] G.L. Sicuranza, Quadratic filters for signal processing, Proc. IEEE 80 (1) (August 1992) 1263–1285.
[17] C.-F. Westin, L.M. Lorigo, O.D. Faugeras, W.E.L. Grimson, S. Dawson, A. Norbash, R. Kikinis, Segmentation by adaptive geodesic active contours, in: Proceedings of the Third International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2000, Pittsburgh, 11–14 October 2000, pp. 266–275.