Pattern Recognition, Vol. 29, No. 8, pp. 1385-1399, 1996
Copyright © 1996 Pattern Recognition Society. Published by Elsevier Science Ltd. Printed in Great Britain. All rights reserved.
0031-3203/96 $15.00+.00
PII: 0031-3203(95)00161-1
DEFORMATION DETECTION WITH FREQUENCY MODULATION

BERTRAND COLLIN* and BERTRAND ZAVIDOVIQUE†‡

* CREA/Système de Perception, 16bis av. Prieur de la Côte d'Or, 94114 Arcueil, France
† IEF, Université de Paris Sud, 91405 Orsay, France
‡ Author for correspondence

(Received 1 November 1994; in revised form 31 October 1995; received for publication 24 November 1995)
Abstract--We describe a robust and knowledge-free method to perform deformation extraction from textured images. We first recall how the solid-mechanics approach of deformation leads to a non-separability of the couple "deformation/motion" in the case of elastic materials, but sets a distinction in the plastic case. We then present a frequency-modulation/demodulation based method to recognize the deformation within textured images. Choices, algorithm limits and suitable data are discussed. Eventually, results are presented in the static case, that is on a single image, and then in tracking deformed areas within image sequences. Copyright © 1996 Pattern Recognition Society. Published by Elsevier Science Ltd.

Image processing    Deformation analysis    Frequency modulation    Texture
1. INTRODUCTION

"Motion detection" has been extensively studied during the last decade. "Deformation" as such was hardly investigated, although it is paradoxically used in a more "active" way in many recognition techniques, like "warping" for dynamic programming or snakes (see Terzopoulos' work in general, for instance,(1) a multi-dimensional unusual extension(2) and a meta-active process where deliberate small motion deforms the velocity field(3)), and in visual inspection based on structured lighting, for instance. An interesting correlation is made by Milios,(4) where deformation emerges from unmatched segments by dynamic programming on their shape and orientation. Another is made by Fujimura,(5) where snakes and dynamic programming are used jointly. Deformation shows up intuitively, but definitely distinct from motion, in many applications often related to camouflage, as in the above-mentioned industrial control, or a dynamic body behind or among a wrapping environment (hares in tall grass, vehicles in foliage, traces after submersion, etc.). This means that shape might not exist or may not be immediately detectable. Although it is commonly referred to in areas of physics as different as solid mechanics and electromagnetism, the semantics of deformation is not clear. In this experiment, we try to understand what deformation really is in order to design suitable deformation detectors.

The guideline of our approach is as follows: referring to a dynamic process, one could try an initial premise, "any phenomenon not relating to motion relates to deformation", hence a definition. It would force a separation that unfortunately does not hold (see reference (6), however, where an attempt is made to separate deformation from motion aiming to exhibit rigid motion). Indeed, definitions from mechanics do illustrate to which extent deformation needs motion to appear. However, there are restrictions to this would-be intimacy between motion and deformation, and these limits are especially obvious from an image-processing point of view. From a single picture, motion is not readily observed; only its consequences appear: a ghost of deformation. In that static case, without any further a priori knowledge, a deformation is conjectured in referring to a surrounding regularity. This regularity is part of instinctive knowledge and can act as a filter to extract potential bent or warped zones.

Consequently, this paper starts with an analysis of the couple motion/deformation from the point of view of physics. It leads to conclude that texture is a likely prop of deformation. Eventually, a method and algorithms are derived from the latter analysis; they are described and commented on in the rest of the article, together with application examples.
2. DEFORMATION AND INFINITESIMAL MOTIONS

Transformations involved in a deformation are intuitively "quasi-reversible." It is a matter of fact that a misshapen object has not yet lost its shape: the initial shape has not completely disappeared and it might be recovered. This "quasi-reversibility" can be explained well through a solid-mechanics formulation.

2.1. The solid mechanics approach

In following the course of two neighbouring points $M_0$ and $M_0 + dM_0$, moving respectively to $M$ and $M + dM$, deformation indicates the increase $dM - dM_0$. With
$U$ the displacement of $M_0$ into $M$:

$$ dM - dM_0 = \frac{\partial U}{\partial M_0}\, dM_0 \quad\text{and}\quad \varepsilon = \frac{\partial U}{\partial M_0}, $$

where $\varepsilon$ is the tensor of deformation. It has been explicitly considered by Chen and Huang(7) after estimating and compensating for the global motion, when a model (superquadrics here) is adequately available.

The dynamic operator associated to a deformation tensor from a displacement field is the constraint tensor, which tells the evolution of an elastic solid body under external forces. It is defined in a volumic manner by slicing the body. The linking force between two parts after splitting is:

$$ dF = C\, d\Sigma, $$

where $d\Sigma$ is the part of the plane surface intercepted by the body. Projections of $C$ on the slicing plane and its normal $n$ are, respectively, the tangential and normal constraints. Whence the constraint tensor $\sigma$ is given by: $\sigma n = C$.

Solid mechanics defined relations between $\varepsilon$ and $\sigma$ to classify deformations. The classification happens to open different research areas. Hooke's law $\varepsilon = \sigma / E$ ($E$: Young's modulus) involves merely atomic displacements, thus is reversible. Linearity allows superimposition. This phase corresponds to elasticity and small actions. Another phase deals with an elasticity breaking off, introducing plasticity. A permanent state of deformation $\varepsilon_p$, so-called plastic, is superimposed on the linear Hooke deformation and remains in the material after the constraints have been removed. Such considerations are common in synthesis, see for example reference (8). They are seldom applied to analysis, with the exception of Pentland and Horowitz(9) for articulated patterns,(10) with the additional hypothesis: "nonrigid is coherent." This leads to state a principle of low amplitude.

While motion is actually involved in the very definition of deformation, one is dealing with a specific type of motion: it is infinitesimal and reversible in the elasticity case, where it supports deformation. However, the movement does not matter much in the plasticity case, where a permanent deformation is imprinted in the material. Thus, motion and deformation split in time, the latter being a consequence of the former. In both cases, deformation relies on the structure of bodies. Hence, two possible perceptions can result:

• finding deformations through motion;
• finding deformations from the structure and destructuring of materials.

2.2. From motion to deformation

With systems assumed open, one needs to define volumes where fields are computed: image windows can be used for that. They are frames within which images are known and outside of which images are irrelevant. Adapted from the well-known conservation principle

$$ \operatorname{div} j + \frac{\partial \rho}{\partial t} = 0 \tag{1} $$

as $j = \rho v$, where $j$ is the current density, $\rho$ the charge density and $v$ the charge velocity, the grey-level ($\mathcal{N}_g$) preservation equation implies:

$$ \operatorname{div}(\mathcal{N}_g v) + \frac{\partial \mathcal{N}_g}{\partial t} = 0, \tag{2} $$

where $v$ is the velocity of the pixel, thus

$$ \mathcal{N}_g \operatorname{div} v + v \cdot \operatorname{grad} \mathcal{N}_g + \frac{\partial \mathcal{N}_g}{\partial t} = 0. \tag{3} $$

The preservation equation restricts the case to transformations in which only motion or deformation change the pixel grey level. In the strict case of "constant grey-level motion," deformation shows up in the velocity field divergence. It would suffice to know initial and final parameter values of each component in the scene. When the velocity field is not conservative, ignoring the image formation process is all the more difficult; an additional assumption on joint consequences of deformation and grey-level formation is necessary. Other transforms can be involved, such as projections of 3D (three-dimensional) scenes on the image plane, which will not be taken into account in the example here, but were specifically addressed by Chen and Penna(11) for known objects and restricted motions. Considering that deformation induces a grey-level modification, one can describe evolutions of some nonconservative velocity field. If a pixel is merely a charged particle, one can understand that two pixels $P_1$ and $P_2$ with different grey levels, meeting at picture point $M$, produce a pixel $P_3$ in $M$ with grey level:

$$ P_3 = \mathcal{F}(P_1, P_2). $$

Knowing $\mathcal{F}$ is unlikely and this function would not be injective anyway: it translates the integration surface through which the grey level is built and therefore is not invertible. Let us nevertheless try to develop equation (3) and find at least specific cases to solve it. No further constraint can be added to the equation in the absence of knowledge about the phenomenon that originated the deformation, to yield a state relation between velocity components. It suffices to take a simple example. Consider a layer of oil* growing by flowing up to the sea surface. Assuming that the vertical component $V_z$ of the velocity becomes null at the sea level, and that $V_x$ and $V_y$ are null except at this level, the state relation between velocity components is:

$$ V_z\, S - \oint e\, V_r(r)\, r\, d\theta = 0, $$

* We can imagine that each oil particle can be marked differently from one another, so that an Eulerian approach of this fluid problem can be chosen.
with $S$ the section of the flowing tube and $e$ the layer thickness. If additionally it is a constant-speed layer rise, the radial component $V_r$ is given by:
$$ V_r(r) = \frac{S\, V_z}{2\pi\, r\, e} $$

and the conservation equation yields

$$ \mathcal{N}_g\!\left(\frac{\partial V_r}{\partial r} + \frac{V_r}{r}\right) + V_r\,\frac{\partial \mathcal{N}_g}{\partial r} + \frac{\partial \mathcal{N}_g}{\partial t} = 0 $$

or, by substitution,

$$ \frac{S\, V_z}{2\pi\, r\, e}\,\frac{\partial \mathcal{N}_g}{\partial r} + \frac{\partial \mathcal{N}_g}{\partial t} = 0 $$

if the reference point is exactly the center of the flow expansion. Thus, it means that one has no better-conditioned equation yet, as the flow expansion center is not part of the given knowledge and remains to be brought out. So the correct expression of the conservation equation should be written with a non-null tangential component:

$$ \mathcal{N}_g\!\left(\frac{\partial V_r}{\partial r} + \frac{V_r}{r} + \frac{1}{r}\frac{\partial V_\theta}{\partial \theta}\right) + V_r\,\frac{\partial \mathcal{N}_g}{\partial r} + \frac{V_\theta}{r}\,\frac{\partial \mathcal{N}_g}{\partial \theta} + \frac{\partial \mathcal{N}_g}{\partial t} = 0, $$
where $V_r$ is no longer a function of $r$ only. We revert to the case where these equations become unsolvable. Nothing proves iterative methods converge unless some knowledge on the center position is added somehow and then checked. Holt(12) deals with the pure expansion case based on feature point correspondence and Goldgof(13) outlines a method to decide between homothetic or nonhomothetic motion based on curvature changes.

2.3. Conclusion

Solid mechanics prompts an approach through motion that, in turn, sets up a conservative equation. In the simplest cases, general hypotheses actually bring typical equations of optical flow. Then, additional hypotheses are required. However, it is easy to build examples, still simple, where adding state equations, on the velocity for instance, does not yield better-conditioned equations.
Referring to a quite different type of regularization, slightly more concerned with frequencies, resolution, etc.,(14) it could be that most problems in this approach merely come from the observation scale. Indeed, in some cases, changing the resolution turns deformation into motion. We have underlined that the digitization process, via $\mathcal{F}(P_1, P_2)$, plays a major part in this transformation of motion into deformation. Over-sampling pictures would possibly reveal $P_1$ and $P_2$, while conversely subsampling would likely erase grey-level changes originated by pure deformations and leave mostly motion. Yet, this scale is imposed by the application. Consequently, in most cases, motion does not structure a scene to the extent that it would allow deformation analysis, simply because it does not separate clearly from deformation. Moreover, the plastic deformation model, in which a steady state of deformation remains, suggests that deformation is not a subfeature or secondary effect of motion. Then, whether it is elastic or plastic, deformation is a feature likely to be supported by the scene structure more than by motion, however dynamic the scene is. In this study, deformation is defined as any departure from a conjectured structure, for instance from a regular texture, or from any object's model if it were available. Such a "perceived deformation" is to be confirmed either in time (e.g. for most dynamic scenes a conservation assumption is made about the selected image pattern) or in space (either some common-sense knowledge is taken for granted, such as "chair-backs are supposed to be smooth, with no bumps, or it hurts...", or some object model is conjectured, etc.). Furthermore, as "image intensity" is unreliable at the considered scale (at least, is unknown), one is led to explore "image frequency", which is translated into texture in the present work. So let us try to look closer for this structure and its related deformation processes.

3. TEXTURE AND DEFORMATION
By analogy, one can remark that the deforming function acts on images as diffraction would spread out a point laser beam (see Fig. 1): a spatial slit would display a point as a spot.
Fig. 1. A diffraction in the frequency domain: a spatial slit spreads out the point beam of a monochromatic wave; deformation, a "frequency slit", spreads out the "point" monochromatic wave by frequency diffraction.
Deformation just spreads the frequency content of the image and makes a "frequency spot" from every frequency.

In another way, accurate surface inspection based on interferometry is common. A monochromatic source of light, perfectly known and structured, projected onto a material, reveals surface defects. Likewise, "moiré" patterns are routinely used to make early-stage scoliosis observable, for instance. That way, structured lighting helps to reveal deformation: of course, here plasticity is discussed and not elasticity. Now, in the case where image processing cannot rely on such structured lighting, another support needs to be involved for structuring deformation. Huang(15) models 3D surfaces while reconstructing them by an adaptive mesh. Jasinschi(16) outlines quite a similar idea by approximating a surface by triangles in a net, the bending of which is evaluated. Both indicate that constraints of the meshing process provide for some stiffness in the link with the image. However, the structure that is supposed to be noticeably deformed refers more to surfaces than to direct image content. By reverse analogy, we think that texture could play the role of the former active artificial lighting, allowing this passive analysis, necessary for generality of the applications. After Gabor, the Fourier spectrum becomes the unexpected candidate to localize any such phenomenon inside a picture. It will be shown that deformation is in evidence in the spectrum, but at the cost of a multistage analysis. This summarizes the idea of deformation extraction by frequency modulation methods.
3.1. A simple example (Fig. 2)

In the Fourier transform, from a spatial domain $(x, y)$ to a frequency domain $(\nu_x, \nu_y)$, the entire spatial signal contributes to the spectral power related to the $\nu_x, \nu_y$ frequencies: so, any deformation of the spatial signal affects all frequencies. Let us consider the following elementary example: $S(x)$ a signal, $\hat S(\nu)$ its Fourier transform and $S_d(x)$ a perturbed version:

$$ S_d(x) = \begin{cases} S(x) + C_0 & \text{if } x \in [x_0, x_0 + \Delta], \\ S(x) & \text{elsewhere,} \end{cases} $$

like a local rise in temperature leading to a very simple deformation. The Fourier transform of $S_d$ is:

$$ \hat S_d(\nu) = \hat S(\nu) + \Delta\, C_0\, \frac{\sin \pi\nu\Delta}{\pi\nu\Delta}. $$

The deformation is not observable at all frequencies, since $\sin \pi\nu\Delta$ becomes null for $\nu = k/\Delta$ ($k \in \mathbb{N}$). In the discrete 2D version suitable for image processing:

$$ \hat I_d(\nu_x, \nu_y) = \hat I(\nu_x, \nu_y) + C_0\, e^{-2i\pi(\nu_x a + \nu_y b)} \left[\frac{\sin \pi\nu_x (2a+1)}{\sin \pi\nu_x}\right] \left[\frac{\sin \pi\nu_y (2b+1)}{\sin \pi\nu_y}\right], $$

where $a$ and $b$ are the dimensions of the perturbation gate.
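To make the example concrete, the following short Python sketch (our own illustration, not part of the original experiments; the carrier, gate position, sizes and amplitude are arbitrary choices) builds a 2D sine-wave "texture", adds a constant perturbation on a small gate, and compares the two amplitude spectra: the energy that was concentrated at the carrier spreads into a sinc-shaped envelope around it.

```python
import numpy as np

# 2D sine-wave "texture" playing the role of a single carrier frequency.
N = 256
x, y = np.meshgrid(np.arange(N), np.arange(N), indexing="xy")
fc = (8 / N, 14 / N)                      # arbitrary carrier (cycles per pixel)
texture = np.sin(2 * np.pi * (fc[0] * x + fc[1] * y))

# Local "deformation": a constant C0 added on a small rectangular gate.
perturbed = texture.copy()
x0, y0, a, b, C0 = 100, 120, 10, 6, 0.8   # arbitrary gate position, size, amplitude
perturbed[y0:y0 + 2 * b + 1, x0:x0 + 2 * a + 1] += C0

# Amplitude spectra: the perturbation spreads energy around every carrier peak.
spec_ref = np.abs(np.fft.fftshift(np.fft.fft2(texture)))
spec_def = np.abs(np.fft.fftshift(np.fft.fft2(perturbed)))
spread = spec_def - spec_ref              # sinc-like envelope appears around the carriers
print("max spectral change:", spread.max())
```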
3.2. Capturing deformation from texture
The former elementary example reminds us that a deformation shows up on the whole spectrum. Moreover, the more localized the deformation, the larger the spectrum extends ($\mathcal{F}(\delta) = 1$). Hence, the following statements can be made:

• Texture is fundamental in the deformation analysis since it acts as a separating filter, providing multiple "image instances" of the deforming function.
• Texture is the passive counterpart of the active coherent sine waves.

The following model follows from the analysis above:
Fig. 2. A simple example and its results: (a) 2D sine wave and constant; (b) deformation localization.
A texture offers a set of carrier frequencies. The deformation results in a spectrum change around each carrier, according to some frequency modulation of each carrier by the same deformation signal. A given texture is described by:

$$ \mathcal{T} = \{ f_c^i,\ \text{carrier frequencies} \} $$

(being aware that $f_c^i$ is in fact a two-dimensional frequency vector, i.e. $f_c^i = f_{c,x}^i \nu_x + f_{c,y}^i \nu_y$); a deformation applying on the texture $\mathcal{T}$ makes a set of modulation envelopes appear around each $f_c^i$, including possible folds.

Therefore, after outlining this model of both texture and deformation, the limits of the analysis must be studied.

3.3. The modulation model limitations

Since deformation affects all the original peaks of the texture spectrum, as said before, it should be easy to determine both the amplitude and the position of this deformation. Nevertheless, two major limitations occur:

• Original frequencies of the pure texture are not known. Actually, it amounts to a frequency demodulation without knowing the carrier wave, asking for an a priori spectrum analysis or, if a model is available, for an adaptive identification.
• The texture signal was not created to inform about deformation, making our detector's life easier. Even in a very structured texture, many frequencies do coexist and a spatial deformation does modify all of them. When modifications become too large, different contributions interfere and it becomes impossible to identify them anymore.

Intuitively, the second limitation may originate from the diffraction-like phenomenon. We will assume that the deformation can be fully recovered if:
Given a texture $\mathcal{T} = \{f_c^i\}$ and a deformation $\mathcal{D}$, the spectrum width $W(\mathcal{D})$ of the deforming transformation is smaller than the half-width between two frequencies of the original signal:

$$ \text{recovered} \iff \forall i,\ \forall j \neq i, \quad \operatorname{dist}(f_c^i, f_c^j) > 2\, W(\mathcal{D}). $$

Such an assumption seems reasonable considering Carson's theorem.* Let us underline that in most cases it is possible to exhibit a few frequencies, isolated enough for this analysis to apply. This guarantees at least some information on the deformation. Moreover, one can find a set $S_h$ of harmonics in most given textures if they appear structured enough. $S_h$ is defined by $S_h = \{f_i \mid k f_i = f_0\}$. The frequency $f_0$ is related to the smallest periodicity of the textural patch. This means that information about deformation can be found at various scales. Consequently, the deformation periodization that appears serves the analysis by avoiding complex choices about the frequency band to select. However, accordingly, it forces the deformation to fit a given frequency domain and likewise constrains the texture to have a sparse enough spectrum. The deformation analysis algorithm that will now be described is therefore adapted to rather structural textures (see the example textures in Fig. 9) and, here again, to low-amplitude and limited-extent deformations, i.e. deformations whose modulation envelope is narrow enough. Lastly, let us come back to motion, hardly separable from deformation: thanks to the Fourier transform, under the assumptions above, small deformations combine well with translations, to which the Fourier spectrum is invariant. The proposed algorithms, although designed for a static analysis, can then be run on image sequences without modification, but for a dramatic increase in rapidity.

* Carson's theorem gives the bandwidth $W$ of a modulated signal: let $s(t)$ be the modulating signal, the maximum frequency of which is $f_m$, and let $m$ be the modulation index: $W = 2 f_m (m + 1)$. In our case we will consider small deformations that do not introduce modulation indexes greater than one.

4. DEFORMATION EXTRACTION ALGORITHM

Attempting a straight frequency demodulation to recover the deforming function would be naive:

• carrier frequencies are not accessible;
• unlike the mono-dimensional case, principles of a bidimensional modulation of two scalar fields are not well mastered.

One has to help the analysis by clearing pictures of all information that does not pertain to deformation. The method first tries to extract a very simplified grid, significant of some texture present in the image, where deformation looms out via local displacements of grid elements. Actually it amounts to keeping only a set of domains from the frequency spectrum. These domains feature the properties previously exploited. Let us study the characteristics to extract and how to select corresponding peaks. Afterwards, the latter results can be exploited to extract the information related to deformation.
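Referring back to the separability condition of Section 3.3, a tiny numerical check can be written as follows (our own illustration; the peak positions and the deformation bandwidth are made-up numbers, and W follows Carson's rule with modulation index m ≤ 1):

```python
import numpy as np

def separable(carriers, f_m, m=1.0):
    """Check dist(fc_i, fc_j) > 2*W for all carrier pairs, with W = 2*f_m*(m+1) (Carson)."""
    W = 2.0 * f_m * (m + 1.0)
    pts = np.asarray(carriers, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return bool((d > 2.0 * W).all()), W

# Example: three carrier peaks (cycles/pixel) and a slow deformation (f_m = 0.004).
ok, W = separable([(0.06, 0.00), (0.00, 0.08), (0.10, 0.10)], f_m=0.004)
print("Carson width:", W, "-> recoverable:", ok)
```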
4.1. Selecting frequencies and filtering

4.1.1. Characteristics of a carrier frequency. As already mentioned, a grid significant of a texture arrangement at a given scale implies that texture is not purely random: such arrangements exist, as does a set of carriers, hence the name structural grid. In fact, this grid does not need to typically represent the texture; the multigrid process brings robustness. A fortiori, unlike with most references already quoted, objects do not need to be extracted beforehand or known in advance, especially since they might not be present in the image; neither does any before- or after-motion state in the dynamic case. Thus, selected peaks to carry
deformation have the following characteristics:

(i) power: selected peaks need to be strong enough so that the deformation is likely entirely contained in them;
(ii) isolation: selected peaks need to be isolated from one another so that superimposition or intermodulation phenomena are likely negligible.

In the absence of any further information or model on the texture generation, the most powerful frequencies are searched for. However, within a frequency modulation process, a carrier frequency may be significantly attenuated, or may even disappear: we had better look for strong frequency domains.

4.1.2. The choice of a Gaussian filtering. This resembles very much a frequency-repartition search the way a Gabor transform does; in the present case, however, adaptiveness matters more than universal reconstruction ability. The power spectrum partition is not predetermined and adapted as an average; it results from applying some real classification technique, unlike the Gabor transform using a fixed tessellation of the Fourier space within Gaussian partitions. Each frequency domain can be simply described by:

• a central frequency;
• a confidence interval giving its extension in the Fourier space.

This is the first major point that leads to the use of Gaussian-like domains in order to perform the classification and the filtering.

Although the modulation envelope is inaccessible for a priori knowledge, we can figure out how it could be using two mono-dimensional deforming functions. Given a monochromatic sine wave in the direction $k$, a pure frequency modulation, as for instance a contraction within the direction $k$ of the signal, will affect the carrier $f_c^0$ in a way described in Fig. 3(a). There is no change in the direction of the monochromatic wave; the modulation frequencies $f_m^i$ are such that:

$$ \frac{f_{m,x}^i}{f_{c,x}^0} = \frac{f_{m,y}^i}{f_{c,y}^0}. $$

We can also think of a direction change that does not affect the carrier frequency; as described in Fig. 3(b), the modulation envelope will take place onto a circle, the radius of which is given by:

$$ r_m = \sqrt{(f_{c,x}^0)^2 + (f_{c,y}^0)^2}. $$

The superimposition of these two cases leads, to some extent, to Fig. 3(c). This type of elliptic shape can well be described using a Gaussian, and that constitutes the second point that urges us to use a Gaussian filtering. Beside these two major facts, we can also remember that:

• The Fourier transform of a Gaussian is known and easy to compute since it is a Gaussian. Filtering can be carried out in the spatial domain and very efficient implementations, both analog and digital, are available. It really matters to on-the-fly vision, preattentive or not; deformation is especially suitable for this type of vision.
• Gaussians are similarly considered within a Gabor scheme. At the price of adding orthogonality functions, should it be necessary, the textured signal could be at least partially reconstructed from spatial powers measured in pseudo-Gabor domains, to accumulate results from several (unidirectional) filters.

Nevertheless, other filtering distributions may be envisioned. It is known that the trade-off is between robustness and precision. As already mentioned, precision does not matter much here: it is already bound to the texture separating capability. A two-degrees-of-freedom $\chi^2$ type of filtering function would no doubt provide an excellent precision to the detriment of robustness. This precision would be even greater than the limits of the resolution power! Now it is obvious that such an "orthogonal" decomposition and recovering method does not cover every case. If the deforming function, such as a torsion, turns the carrier frequency into a modulation envelope whose elliptic shape cannot be assumed in the former decomposition, the domain resulting from the classification will be larger than the modulation envelope itself, as Fig. 4 explains. Thus, the filtering process will probably include frequencies that do not belong to the modulation envelope, and the result may be less accurate.
Fig. 3. A separation between deforming functions: (a) frequency modulation; (b) direction modulation; (c) general modulation.
Fig. 4. Larger domain than envelope.

Without any more valuable knowledge on the deforming function, a more accurate classification scheme cannot be derived; neither, probably, would deformation results be perceptible. We can now look closer at the classification process, since we have described precisely the results we intend to obtain.

4.2. Extracting the grid-like pattern

4.2.1. Power spectrum classification. Given the discrete power spectrum, we extract the set of frequencies $f^i$ ($i = 1 \ldots N$) corresponding to the $N$ highest amplitudes. These frequencies belong to both the set of carrier frequencies and the set of modulation envelopes. A frequency is simply represented by its coordinates $(f_x^i, f_y^i)$ and its power $p_i$; a domain $D^k$ is described by its center of gravity $(D_x^k, D_y^k)$ and its power $D^k$. We also compute the standard deviations in the direction $(D_x^k, D_y^k)$ and in the orthogonal direction $(-D_y^k, D_x^k)$, since these describe the confidence interval of the Gaussian filter relative to the domain. The clustering of two frequencies $f^i$ and $f^j$ results in a domain $D^k$, the parameters of which are given in equations (4). This simple scheme is also valid if extended to frequency/domain and domain/domain clustering:

$$ D_x^k = \frac{p_i f_x^i + p_j f_x^j}{p_i + p_j}, \qquad D_y^k = \frac{p_i f_y^i + p_j f_y^j}{p_i + p_j}, \qquad D^k = p_i + p_j. \tag{4} $$

On these bases, we compute a distance matrix from all frequency couples. Then we cluster two frequencies into a domain on the basis that their distance is less than a prefixed threshold $T_D$. This gives another distance matrix and the process can go on. It comes to an end when no more frequency couples or domain couples satisfying the threshold can be found. This is summarized in Fig. 5.
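The following Python sketch (our illustration; the threshold T_D and the sample peaks are arbitrary, and ties are resolved by merging the closest couple first) implements this power-weighted agglomeration of spectral peaks into domains, following equations (4):

```python
import numpy as np

def cluster_peaks(freqs, powers, T_D):
    """Greedily merge spectral peaks (and domains) closer than T_D, in the manner of eq. (4)."""
    doms = [(np.array(f, float), float(p)) for f, p in zip(freqs, powers)]
    while True:
        best, pair = None, None
        for i in range(len(doms)):
            for j in range(i + 1, len(doms)):
                dist = np.linalg.norm(doms[i][0] - doms[j][0])
                if dist < T_D and (best is None or dist < best):
                    best, pair = dist, (i, j)
        if pair is None:
            return doms                                   # no couple left under the threshold
        (ci, pi), (cj, pj) = doms[pair[0]], doms[pair[1]]
        merged = ((pi * ci + pj * cj) / (pi + pj), pi + pj)   # centre of gravity, summed power
        doms = [d for k, d in enumerate(doms) if k not in pair] + [merged]

# Example: peaks picked from a power spectrum (positions in cycles/pixel, with their powers).
domains = cluster_peaks([(0.060, 0.0), (0.062, 0.001), (0.0, 0.080)],
                        [9.0, 3.0, 7.0], T_D=0.01)
for centre, power in domains:
    print(centre, power)
```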
4.2.2. Filtering. The clustering results in a simple file giving, for every domain, its fundamental direction $(D_x^k, D_y^k)$ (also described by the angle $\theta$ between the center of gravity of the domain and the $f_x$ axis) and the dispersion around this direction. Let $\sigma_{f_1}$ and $\sigma_{f_2}$ be the standard deviations of this dispersion; the Gaussian is meant to be:

$$ g(f_x, f_y) = \exp\!\left(-\frac{\big((f_x - D_x^k)\cos\theta - (f_y - D_y^k)\sin\theta\big)^2}{2\sigma_{f_1}^2}\right) \exp\!\left(-\frac{\big((f_x - D_x^k)\sin\theta + (f_y - D_y^k)\cos\theta\big)^2}{2\sigma_{f_2}^2}\right), \tag{5} $$
which represents a mere change of coordinates by a rotation $\theta$. At that stage, a bank of Gaussian filters is ready for application to the image. Filters operate in the frequency domain by mere product, respectively, between the real and imaginary parts of the image and filter. The filter is extended to the whole Fourier plane by symmetry around the origin, since signals are real. Figure 6 displays the original image (a) and the resulting frequency* couples (b), then the filtered images (c). Pictures (d) and (e) show the differences between the Gabor fixed clustering (d) and the frequency modulation clustering (e). One can remark that some domains do not obviously match any structure representative of the texture. This is no problem for the rest of the algorithm: such results are so far from expectations that it is not difficult to find a way to clear them. This is explained and successfully tested at the end of Section 4.3.

4.2.3. Filter result adaptation. For obtaining the grid, i.e. a set of one-pixel-wide edges to mesh the picture following its texture, the picture must be filtered twice again. First, thanks to the directional band-pass nature of the previous filters, it is enough to keep local maxima in the spatial direction associated with the filtering one, to a depth corresponding to the frequency of the domain. Given a filtered† image coming from domain $D^k$, whose spatial direction $d$ follows the vector $d_x u_x + d_y u_y$, a pixel $M$ is a local maximum for this domain if:

$$ \forall P \ /\ \vec{MP} = \pm \alpha\, (d_x u_x + d_y u_y), \quad P \leq M, \quad\text{with } \alpha < \frac{1}{\sqrt{(D_x^k)^2 + (D_y^k)^2}}. $$

As already mentioned, direction changes may have occurred, leaving the search for local maxima very approximate even in the direction $d$, and it becomes worse in other directions due to plateaus. This can be seen in Fig. 7, where Fig. 7(a) depicts the filtered image and

* One can note that the frequency couples are indexed by real numbers. Due to the square image size $L$ and the discrete representation, a frequency $f$ is bound to be $f = n/L$, with $n \in [-L/2, L/2]$. As we only need the values of the Gaussian filter at discrete places (the image points), we can use a real frequency position as given by the clustering process using equation (5).

† We assume that the representation of discrete frequencies is in the range from 0 to ±1/2, as mentioned in the former footnote.
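As a concrete sketch of the filtering stage of Section 4.2.2 (our own illustration, not the authors' code; the domain centre, angle and dispersions below are placeholders standing for the clustering output), one oriented Gaussian is built over the discrete Fourier plane as in equation (5), mirrored about the origin, and applied by a product with the image spectrum:

```python
import numpy as np

def gaussian_domain_filter(img, centre, theta, sig1, sig2):
    """Band-pass an image around one spectral domain with the oriented Gaussian of eq. (5)."""
    L = img.shape[0]                                   # square image assumed
    f = np.fft.fftfreq(L)                              # discrete frequencies in [-1/2, 1/2)
    fx, fy = np.meshgrid(f, f, indexing="xy")

    def lobe(cx, cy):
        u = (fx - cx) * np.cos(theta) - (fy - cy) * np.sin(theta)
        v = (fx - cx) * np.sin(theta) + (fy - cy) * np.cos(theta)
        return np.exp(-u**2 / (2 * sig1**2)) * np.exp(-v**2 / (2 * sig2**2))

    # Symmetric lobes around +/- centre so that the filtered signal stays real.
    g = lobe(*centre) + lobe(-centre[0], -centre[1])
    return np.fft.ifft2(np.fft.fft2(img) * g).real

# Placeholder domain: centre (D_x, D_y), orientation and dispersions from the clustering step.
img = np.random.rand(256, 256)
out = gaussian_domain_filter(img, centre=(0.06, 0.02), theta=np.arctan2(0.02, 0.06),
                             sig1=0.01, sig2=0.005)
print(out.shape)
```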
Fig. 5. Classification scheme: the distance matrix over frequencies and domains is computed, the closest couples under a fixed threshold are clustered into domains, and the process loops until no such couple remains.
Fig. 6. Analysing a wool surface: (a) original image; (b) resulting frequency couples; (c) filtered images; (d) Gabor fixed clustering; (e) frequency modulation clustering.
Fig. 7. Extracting local maxima: (a) filtered image; (b) local maxima extraction.
Fig. 7(b) the local maximum extraction. So, for the second and last filter, every edge is shrunk in the direction $d$, keeping only the medial axis line. As already mentioned, some results are not satisfactory (see Section 4.2.2) and, to obtain a real bidimensional description of the texture network, results have to be associated with one another. To obtain correct final results, an a posteriori check of the frequency hypothesis is enough.

4.3. Hypothesis checking

The Gaussian filtering associated with the $D^k$ domain is valid if the described deformations are weak and the central frequency represents the periodic structure of the texture net well. Then, the cardinal of the edge set $\{C\}$ obeys conditions with respect to the frequency associated with $D^k$. If local maxima form lines with intervals of length $I$ between them, in the $d$ direction, the average number $N$ of such edges should be:

$$ N = \frac{L}{I}, $$

where $L$ is the size of the image in the $d$ direction. This results from band-pass filtering centered in $(D_x^k, D_y^k)$, so the ideal monochromatic plane wave for the domain should be:

$$ O(x, y) = \alpha\, e^{-2i\pi(D_x^k x + D_y^k y)}, $$

the wavelength of which should be equal to $I$. The numbers of edges crossing $Ox$ and $Oy$ are, respectively,

$$ N_x = L\, D_x^k, \qquad N_y = L\, D_y^k, $$

if we assume a square image. By mere connected-component labeling (eight-connectivity here), the number of edges in the picture is estimated, which allows one to check whether the $D^k$ domain corresponds to relevant information on texture. In such a case, the deformation should be present in these edges as well. Not only are the validities of the analysis and classification on the spectrum checked a posteriori, but results can be sorted by validity to keep only those associated with domains providing reliable texture information.

At this point, deformation becomes measurable. First, one can notice that band-filtering should have cleared both noise and slow variations (such as light changes) over the whole image. Filtering results are then reprocessed by mere local correlation associated with a domain for each image. More precisely, for the domain $D^k$ and the corresponding filtered image $J_k$, each pixel $J_k(x, y)$ is associated with the value:

$$ l_k(x, y) = \sum_{i=-n}^{n} \sum_{j=-n}^{n} J_k(x+i,\, y+j)\; J_k\!\left(x + i + \frac{1}{D_x^k},\; y + j + \frac{1}{D_y^k}\right), $$

where $n$ must satisfy $n \geq \max(1/D_x^k,\, 1/D_y^k)$. That way, the absence or presence of deformation is quickly detected, especially as correlation also receives very efficient implementations. Nevertheless, still more accurate results on both the amplitude and position of deformations can be extracted when a grid (i.e. two monochromatic plane waves) is considered. Results, so far, present parallel edges, and a grid can be obtained by composing any two different-direction results. It should then describe the bidimensional texture correctly. That is why a 2D grid is searched for rather than a more general mesh. Additionally, deformation would better be rendered similarly along both directions. As a consequence, edge sets with different directions but comparable periods will preferably be combined. Such couples do not exist, for instance, in the last application (marine pictures in Section 5.2). In any case, the algorithm sorts all valid "peaks" by their frequencies and tries the highest first. Still, involving more couples should improve the robustness and the overall knowledge of the local deformation, within the limits set in the introduction of Section 3.
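A compact Python sketch of this a posteriori check and of the local correlation measure (our illustration; the tolerance on the edge count and the window size n are arbitrary choices, and the inputs stand for the outputs of the previous stages):

```python
import numpy as np
from scipy import ndimage

def check_domain(edges, Dx, Dy, tol=0.5):
    """Compare the connected-component count of the edge map with N_x = L*Dx, N_y = L*Dy."""
    L = edges.shape[0]                                    # square image assumed
    _, n_found = ndimage.label(edges, structure=np.ones((3, 3)))   # eight-connectivity
    expected = L * max(abs(Dx), abs(Dy))
    return abs(n_found - expected) <= tol * expected, n_found, expected

def local_correlation(filtered, Dx, Dy, n):
    """l_k(x,y): windowed product of the filtered image with itself shifted by one period."""
    shift = (int(round(1.0 / Dy)), int(round(1.0 / Dx)))  # (rows, cols) = one period in y, x
    prod = filtered * np.roll(filtered, shift, axis=(0, 1))
    box = np.ones((2 * n + 1, 2 * n + 1))
    return ndimage.convolve(prod, box, mode="wrap")       # sum over the (2n+1)^2 window

# Placeholder inputs standing for the thinned edge map and a band-filtered image.
edges = np.zeros((128, 128), bool); edges[::16, :] = True
ok, found, expected = check_domain(edges, Dx=0.0, Dy=1 / 16)
corr = local_correlation(np.random.rand(128, 128), Dx=1 / 16, Dy=1 / 16, n=16)
print(ok, found, expected, corr.shape)
```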
4.4. Deformation extraction from an image

Following the testing of the filtering hypotheses that led to the sorting of the results, grid generation remains easy. The couple offering comparable periods, different directions, and the best score is selected. Hence, results such as in Fig. 8 exist, which need some post-processing. In the case of fixed images, unlike in the sequence case, what is considered a deformation has to be made explicit first. To that aim a so-called "base motif" is designed: it is the grid element (curvilinear "rectangle") with the highest occurrence. Both localization and amplitude of the deformation are defined relative to that base motif. The grid is coded as a set of curvilinear trapezoids, i.e. a list of six-tuples, each describing the three translations that build a pattern from a selected vertex of it. Hence the following algorithm:

• connected-component labeling of the grid "trapezoids" to obtain regions;
• extraction and polygonal approximation of every region outline;
• extraction of the upper-left corner in the coordinate system associated with the grid periods;
• calculation of the three translations (to the up-right, down-right and down-left corners).

That way, the problem is turned into a classical data-classification problem inside a 6D space (a sketch of this comparison is given at the end of this section). Many different techniques can then be tried, comparing the grid polygons to the base motif. Several have been tested, making it obvious that accurate evidence of deformation requires a calibration phase. Consequently, a simplification was attempted that should not lower the measure quality significantly, while still allowing quantitative estimations. A geometric deformation is presumed, i.e. the plane surface coated with the texture is put under constraints that change image point altitudes. The 3D representation consists of the changes incurred.
Fig. 8. Selecting couples.

• Spatial position search: each grid element is considered a set of four points, the altitudes of which are varied independently to be back-projected onto the image plane. This involves a mere symmetry and the intersection between the cone built that way and a constant-altitude plane. This results in estimated altitudes of every vertex of the image mesh.
• Motif rotation: grid elements are now rigid bodies able to rotate around two orthogonal axes. The couple of rotations giving the best projection results is estimated.
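As an illustration of the base-motif comparison of Section 4.4 (our sketch; representing the grid as an array of vertex coordinates is an assumption about the data layout, not the authors' file format), each cell is coded by its three translation vectors, the base motif is taken as the component-wise median, and the deviation from it gives a per-cell deformation score:

```python
import numpy as np

def cell_descriptors(vertices):
    """6D descriptor per grid cell: translations from the upper-left corner to the
    up-right, down-right and down-left corners (vertices: H x W x 2 array)."""
    ul = vertices[:-1, :-1]           # upper-left corner of each cell
    ur = vertices[:-1, 1:] - ul       # translation to the up-right corner
    dr = vertices[1:, 1:] - ul        # translation to the down-right corner
    dl = vertices[1:, :-1] - ul       # translation to the down-left corner
    return np.concatenate([ur, dr, dl], axis=-1)     # shape (H-1, W-1, 6)

def deformation_score(vertices):
    """Distance of every cell descriptor to the base motif (the most common cell shape)."""
    desc = cell_descriptors(vertices)
    base = np.median(desc.reshape(-1, 6), axis=0)    # base motif as the 6D median
    return np.linalg.norm(desc - base, axis=-1)

# Toy grid: a regular 10-pixel mesh with one vertex pushed aside (the "deformation").
gy, gx = np.meshgrid(np.arange(0, 100, 10), np.arange(0, 100, 10), indexing="ij")
verts = np.stack([gx, gy], axis=-1).astype(float)
verts[5, 5] += (3.0, -2.0)
score = deformation_score(verts)
print(np.unravel_index(score.argmax(), score.shape))  # a cell around (5, 5) stands out
```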
5. RESULTS
5.1. Static results

First of all, we introduce in Fig. 9 the set of textured images we have worked with. Several images are displayed below, distorted by image synthesis. Deformations result from a classical bilinear warping that induces simultaneously contracted and expanded zones. Each figure (Figs 10-14) represents the original image and a perspective view of the deformation surface. In Fig. 10 we have a contraction at the center of the image. It can be seen from the 3D representation that, being assimilated to a focal length increase, the deformation is correctly localized. Actually, overshoots on the edges come from the deformation representation and generation algorithm. The following example depicts the whole process, from spectrum inspection and classification (Figs 11-13) to the quantification of the detected deformation (Fig. 14). It also aims to prove that localization is accurate: the upper-right quarter of the image was magnified and the point altitudes translate the artificial amplification.
Fig. 9. An example of four textured images dedicated to analysis.

Figure 15 shows a real deformation: a sheet of painted paper was folded, introducing two deformations orthogonal to the main texture, and was then digitized. Note that the texture is well structured and regular, but the elementary motif is quite complex from the spectral point of view. Both localization and amplitude (up to a scale factor) look adequate.

5.2. Simpler extraction from image sequences

In this case, detection matters but the interest is in tracking image zones being deformed. Motion serves to detect deformation in a very easy way, because spotting motion is greatly helped by the grid. Indeed, deformations influence the modulation envelopes of texture frequencies, but these frequencies remain quite
Fig. 10. Result from a wool patch (Z qualifies the deformation amplitude; X and Y are the image reference frame).
Fig. 11. (a) Image of the spoiled back of a chair; (b) frequency peak classification results. The superimposed Gaussians (from one to six) indicate the extent of conjectured modulation envelopes.
Fig. 12. Effective results of the "Cortex filters" from one to six. Notice that results one and six are close to the zero frequency, giving unreliable information for the method. Numbers 2, 3, 4 and 5 are more directional and closer to the monochromatic-wave model.
Fig. 13. Local maxima extraction (top) on results of filters 2, 3, 4 and 5 and the same after thinning (bottom).
Fig. 14. (a) Filter "2" turns out to be the most "monochromatic", and filter "4" is selected for being orthogonal. (b) 1-2: curvilinear rectangle extraction on windows of (a); 3: comparison between the average rectangle prototype and the curvilinear rectangles: the black surface among white squares indicates the deformation rate. (c) Systematic comparison gives the deformation quantification.
Fig. 15. Result on a folded sheet of paper.
stable along the sequence, as the Fourier transform is shift-invariant. This leads to two conclusions:

• Spectral features are searched for at the beginning of a sequence. Their modification is only envisioned in case of a drastic evolution and is not likely to occur, since a major hypothesis is "no model break". For instance, an accelerated zoom on an image changes the spectral features of the whole image, and a break-off in extracting the grid is likely to happen as soon as frequencies are shifted enough to differ from the model computed initially.
Fig. 16. Tracking a deformed area within a sequence of images.
Fig. 17. Tracking a deformed area on the surface of the sea.
• Spectral features being constant in time, a logical grid comparison suffices to track deformation.

The algorithm is then very simple: the former procedure (static images) is executed on the first image of the sequence; using this single set of resulting frequencies, the grid is extracted from every image in the sequence. An "exclusive or" between the grids extracted at $t-1$ and $t$ puts the deformation in evidence (a sketch of this step is given at the end of this section).

Two test sequences are shown. The first deals with the same wool sample: an increasing pressure is applied on four distinct spots of the textured sample. After the maximum pressure has been reached, the pressure zones move before a final phase of pressure decrease. Deformed patterns do not jump into sight from the source images, but do in the resulting images. Figure 16(a) represents the texture at the beginning of the sequence; Fig. 16(b) is an array of results at different stages of the deformation process. The second example (Fig. 17), synthetic again, comes from the simulation of an object diving into a real sea (a piece of genuine satellite image). A traveling of the camera, with some pumping to make a more realistic animation, is recovered by mere correlation between grids. In that case, original images and tracking results are shown. Deformation is noticeable in both cases all along.
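A minimal sketch of this tracking step (our illustration; the binary grids are assumed to have been produced by the static pipeline of Section 4, run with the carrier frequencies selected on the first frame):

```python
import numpy as np

def track_deformation(grids):
    """XOR consecutive binary grids: the changed cells reveal the deformed areas."""
    changes = []
    for previous, current in zip(grids[:-1], grids[1:]):
        changes.append(np.logical_xor(previous, current))
    return changes

# Toy sequence: a regular grid whose lines shift locally in the second frame.
g0 = np.zeros((64, 64), bool); g0[::8, :] = True; g0[:, ::8] = True
g1 = g0.copy(); g1[24, 16:32] = False; g1[25, 16:32] = True   # one line bends locally
diff = track_deformation([g0, g1])[0]
print("changed pixels:", int(diff.sum()))                     # the moved segment stands out
```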
6. CONCLUSIONS

Two remarks deserve to be underlined as a conclusion about the frequency modulation method for deformation analysis introduced in this paper.

On the one hand, a frequency model does enforce limitations, mainly three:

• Texture must somehow be organized. That translates into a power spectrum with quite regularly spaced frequency peaks. Actually the fundamental signal, characteristic of the textured mesh, is not mandatory as long as a large enough spectral power related to some periodicity (harmonics) does exist. Searching preferably for signal-periodicity variations is a benefit of this method. It does not limit too much, since most real textures involve some type of marked regularity: human artefacts (fabrics, buildings, plantations), as well as natural phenomena where waves intervene (ocean surface, vegetation, dunes, etc.).
• Texture/deformation separability is guaranteed if the frequency spectrum of the former can be considered as a set of carriers. Spectral power peaks should be spaced widely enough that introducing a deformation does not bring any overlap. Actually, using an FM transmission analogy, the texture acts as a bidimensional transmitter that must manage its frequency distribution in such a way as to render "outbursts" from deformation "audible".
• Deformation remains within acceptable limits. Breaking off the texture net results in either wrong information (imaginary base motifs) or power loss and measure weakening (mediocre filtering results and then failure of following processes such as the trapezoid search). In short, deformation must not generate a change in the texture net larger than a "half period" of it.

This amounts to an uncertainty principle: precise localization of the deformation requires a short selected wavelength among the texture net, hence a low deformation power (power stands for spatial variation). Conversely, large deformations, resulting from a high deforming power, are recovered from the long waves in a less precise manner. Again, this fits frequency modulation principles, since too large a modulation implies overlapping and envelope-splitting pollution. As deformation is sampled by the texture net, this is nothing more than Shannon revisited. From a segmentation point of view, large deformations should in some ways appear like motion and then be detected by other methods, at least from a sequence; on a single image, such a deformation might extend to the whole texture, substituting for it to the point of becoming undetectable.

On the other hand, the method presented in this paper applies to a single image, although results are greatly improved on a sequence thanks to information redundancy. While this method does assume some regularity (a frequential concept) that deformation upsets, it does not assume any grey-level conservation of any kind. In that respect, it does not belong to optical flow techniques. Moreover, correlation techniques require a choice of the reference pattern: this amounts to processing a reduced part of the spectrum. Auto-regressive methods are in great need of directionality and accuracy compared with the energy-based classification on the whole spectrum. Extending the grey-level conservation to some texture-parameter conservation would not improve anything: most parameter fluctuations appear to be poorly correlated with deformation. Several models were tried, such as local histograms,(17) third-order moments(18) or the co-occurrence matrix.(19)

The present method has been successfully tried on a single SPOT image of a sea surface where some perturbation had occurred. It is now being used for more classical surface inspection.
A novel feature was studied and extracted, different from motion, closer to some phenomena already used in preattentive vision.(20) This is deformation: it triggers action by its sole detection, such as attention focusing on details or starting to track.

REFERENCES
1. D. Terzopoulos, A. Witkin and M. Kass, Constraints on deformable models: recovering 3D shape and nonrigid motion, Artif. Intell. 36(1), 91-123 (1988).
2. M. Takahata, M. Imai and S. Tsuji, Determining motion of non-rigid objects by active tubes, Int. Conf. Pattern Recognition A, 647-650 (1992).
3. R. Cipolla and A. Blake, Surface orientation and time to contact from image divergence and deformation, ECCV, 187-202 (1992).
4. E. E. Milios, Recovering shape deformation by an extended circular image representation, Proc. IEEE Conf. Comput. Vis., 20-29 (1988).
5. K. Fujimura, N. Yokoya and K. Yamamoto, Motion tracking of deformable objects based on energy minimization using multiscale dynamic programming, Int. Conf. Pattern Recognition A, 83-86 (1992).
6. S. Chaudhuri and S. Chatterjee, Motion analysis of a homogeneously deformable object using subset correspondences, Pattern Recognition 24, 739-745 (1991).
7. C. W. Chen and T. S. Huang, Nonrigid object motion and deformation estimation from three-dimensional data, Int. J. Imaging Syst. Technol. 2, 385-394 (1990).
8. D. Terzopoulos and K. Fleischer, Modeling inelastic deformation: viscoelasticity, plasticity, fracture, Proc. SIGGRAPH, 269-278 (1988).
9. B. Horowitz and A. Pentland, Recovery of non-rigid motion and structure, Proc. IEEE Conf. Comput. Vis. Pattern Recognition, 288-293 (June 1991).
10. A. Pentland and B. Horowitz, Recovery of nonrigid motion and structure, IEEE Trans. Pattern Anal. Mach. Intell. 13, 730-742 (1991).
11. S. S. Chen and M. Penna, Shape and motion of nonrigid bodies, Comput. Vis. Graphics Image Process. 36, 175-207 (1986).
12. R. J. Holt and A. N. Netravali, Motion of nonrigid objects from multiframe comparison, J. Visual Commun. Image Rep. 3, 255-271 (1992).
13. D. B. Goldgof, H. Lee and T. S. Huang, Motion analysis of nonrigid surfaces, Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognition, 375-380 (1988).
14. C. Millour, Contribution à la vision dynamique: une approche multi-résolutions et multi-traitements. PhD thesis, Université d'Orsay, Paris XI (March 1989).
15. W.-C. Huang and D. B. Goldgof, Adaptive-size meshes for rigid and nonrigid shape analysis and synthesis, IEEE Trans. Pattern Anal. Mach. Intell. 15(6), 611-616 (June 1993).
16. R. Jasinschi and A. Yuille, Nonrigid motion and Regge calculus, J. Opt. Soc. Am. A 6, 1088-1095 (1989).
17. G. E. Lowitz, Can a local histogram really map texture information? Pattern Recognition 16(2), 141-147 (1983).
18. A. Gagalowicz and C. Tournier-Lasserve, Third order model for non homogeneous natural textures, 8th ICPR Conf. 1, 409-411 (November 1986).
19. J. Parkkinen and E. Oja, Co-occurrence matrices and subspace methods in texture analysis, 8th Int. Conf. Pattern Recognition, Paris (October 1986).
20. V. Brecher, New techniques for patterned wafer inspection based on a model of human preattentive vision, Appl. Artif. Intell., SPIE 1708, 452-459 (1992).
About the Author--BERTRAND COLLIN graduated from the Ecole Normale Supérieure de Cachan. He received his Agrégation de Physique Appliquée in 1987 and graduated with a Ph.D. in Electrical Engineering from Paris XI University in November 1994. He is a principal research fellow at the Perception System Laboratory of the Armement Research Center (ETCA), also working with the Meudon Observatory on the investigation of the solar convective zone through image processing. His research deals with physics models in image processing algorithmics, mainly with deformation analysis, as an application carrying stealthy or faint event detection and tracking.

About the Author--BERTRAND ZAVIDOVIQUE is currently professor in the Electrical Engineering Department at Paris XI University. He is also a scientific advisor at the DRET/ETCA for problems of real-time processing and robotics, and heads the Perception System Laboratory. His research interests include perception systems, the impact of their internal organization on external efficiency, real-time implementation, architecture and programming methods within the frame of circuit integration. He holds an MS in Mathematics from Paris VII University. He received a Ph.D. in Computer Science from the University of Tours (France) and a Doctorate of Science in robot vision from the University of Franche-Comté (France). He has published more than two hundred scientific papers dealing with image processing, sensor fusion, computer architecture and intelligent control and learning.