Computer Vision and Image Understanding 115 (2011) 759–770
Contents lists available at ScienceDirect
Computer Vision and Image Understanding journal homepage: www.elsevier.com/locate/cviu
Markov random field based phase demodulation of interferometric images Dijia Wu ⇑, Kim L. Boyer Signal Analysis and Machine Perception Laboratory, Department of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
a r t i c l e
i n f o
Article history: Received 9 April 2010 Accepted 2 February 2011 Available online 17 February 2011 Keywords: Sign ambiguity 2p ambiguity Phase unwrapping Phase demodulation Fringe pattern analysis Interferometric imaging
a b s t r a c t We present a novel method to solve the sign ambiguity for phase demodulation from a single interferometric image that possibly contains closed fringes. The problem is formulated in a Markov random field (MRF) energy minimization framework with the assumption of phase gradient orientation continuity. The binary pairwise objective function is non-submodular and therefore its minimization is an NP-hard problem, for which we devise a multigrid hierarchy of quadratic pseudoboolean optimization problems that can be improved iteratively to approximate the optimal solution. We name the method MSARI algorithm, for Markov based sign ambiguity resolution in interferometry. Compared with traditional path-following phase demodulation methods, the new approach does not require any heuristic scanning strategy, is not subject to the propagation of error, and the extension to three dimensional fringe patterns is straightforward. A set of experiments with synthetic data and real prelens tear film interferometric images of the human eye demonstrate the effectiveness and robustness of the proposed algorithm as compared with existing state-of-the-art phase demodulation methods. Ó 2011 Elsevier Inc. All rights reserved.
1. Introduction Interferometric imaging superposes two or more light waves giving rise to the inference pattern caused by the phase differences between the waves. This constructive and destructive interference forms the interference fringes, which provide pointwise but indirect measurement of the physical quantity changing the optical path. It is widely used in such fields as astronomy, remote sensing, fiber optics, optical metrology, as an investigative technique to recover surface profiles or to infer object shape, deformation, or vibration [1–3]. A typical fringe pattern can be modeled mathematically as a sinusoidal fluctuation (Fig. 2a).
Iðx; yÞ ¼ aðx; yÞ þ bðx; yÞcosð/ðx; yÞÞ
ð1Þ
where a(x, y), b(x, y), represent the slowly varying background and local contrast, respectively. /(x, y) is the phase function obtained by the interferometer which is an analogue of the physical quantity being measured (shape, displacement, deformation, strain, etc.). It is obviously an ill-posed problem to fully reconstruct /(x, y) from the only measurable quantity I(x, y), because an infinite number of possible /(x, y) can satisfy Eq. (1). Specifically, /(x, y) and / (x, y) + 2kp for any integer k will generate exactly the same I(x, y) as /(x, y) owing to the even character and periodicity of the cosine function. These are referred to as the sign ambiguity and the 2p ambiguity, respectively. ⇑ Corresponding author. E-mail address:
[email protected] (D. Wu). 1077-3142/$ - see front matter Ó 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.cviu.2011.02.001
To remove phase ambiguities, many methods have been proposed. Phase stepping interferometry (PSI), first introduced by Brunning et al. [4], is one of the most commonly used techniques for reliable phase determination. It captures a number (typically 3–20) of phase-shifted interferograms each with an exactly known phase step obtained by linearly varying the path difference between test and reference beams. However, PSI requires that the physical quantity being measured and all other imaging conditions remain constant during the time needed to obtain these interferograms. This is impossible for the study of fast transient events such as the prelens tear film of the human eye that flows and thins quickly as described below in the experiment section. To address this, one may obtain a single carrier frequency interferogram by introducing a large tilt in the reference beam of a two-path interferometer or by projecting a linear ruling in profilometry:
Iðx; yÞ ¼ aðx; yÞ þ bðx; yÞcos½x0 x þ /ðx; yÞ
ð2Þ
where the carrier frequency x0 must be higher than the maximum frequency content of the phase in the carrier direction, i.e., x0 > j@/ (x, y)/@xj. This condition ensures that the total phase will grow or decrease monotonically, generating an open-fringe interferogram which is much easier to demodulate than a closed-fringe interferogram, by using well-known techniques such as the Fourier transform [5], synchronous spatial phase detection [6], and phaselocked loop [7]. Although this approach is appealing, introducing a spatial carrier will complicate the experimental procedure and may involve the use of more expensive equipment. Furthermore, the spatial carrier imposes a limit on the highest fringe frequency
760
D. Wu, K.L. Boyer / Computer Vision and Image Understanding 115 (2011) 759–770
that may be recovered. In that case, we have to recover the phase solely from the intensity map Eq. (1) with no additional information. In past years, different methods for the demodulation of single fringe pattern with closed fringes have been reported. Fringe counting [8] is probably one of the simplest phase demodulation methods. However, it relies on robust fringe skeletonization and extremum extraction which can be challenging, especially when the fringe pattern becomes noisy and complex. Regularized phase tracker (RPT) based methods [9–12] assume that the fringe pattern is locally spatially monochromatic. It sequentially demodulates the phase /(x, y) at each site (x, y) by optimizing a regularized nonlinear cost function. These path following demodulators solve both ambiguities in one formulation. However, they are known to have several drawbacks. First, these methods estimate /(x, y) based only on previously demodulated pixels in the neighborhood. This does not guarantee optimal results because only part of the neighboring information is known and used when each pixel is demodulated. Therefore, after the whole fringe pattern is demodulated, a refinement postprocessing is usually carried out to improve the results because all neighboring pixels have been estimated at this stage and better initialization can be made for optimization [13]. Second, the sequential demodulation method is subject to error propagation and results can be severely affected if some of the earlier pixels are demodulated incorrectly. Third, these methods often fail in complex interferograms that contain critical points such as local maxima, minima and saddles where the assumption of phase field /(x, y) being locally planar does not hold and the spurious phase sign problem occurs [9,13]. Hence, they require sophisticated and heuristic scanning strategies such as the fringe follower [9], the quality map [10] or the more recent frequency guided demodulation path [13]. These scanning strategies determine the demodulation sequence to process the critical points last based on the pixel intensity, energy function value, and spatial local frequency, respectively. The same problems also apply to the phase demodulation method through the successive computation of modulo 2p fringe orientation using a vector field regularized estimator [14] and the tile-based propagation method for recovering sign in closed fringe analysis [15]. More recently, Bioucas-Dias et al. proposed a new energy minimization framework to unwrap modulo 2p phase [16]. They proved that the exact optimal solution can be achieved via graph cuts [17,18] as long as the clique potential function is convex. This method produced very good results even in the presence of additive noise and aliasing. However, the sign ambiguity removal, which is the most challenging problem in phase demodulation from single closed fringe pattern, was not addressed in [16]. In this paper, we approach the problem of phase demodulation from a single interferogram by solving the two phase ambiguities separately as shown in Fig. 1. We use the phase unwrapping method in [16] to remove the 2p ambiguity and present a novel sign ambiguity resolution method. It formulates the sign ambiguity as a Markov random field (MRF) based energy minimization problem using the phase gradient orientation map, which is obtained and smoothed by the gradient vector flow [19]. We prove that the new objective function is supermodular and therefore can’t be readily mapped to the graph and solved by max-flow/min-cut algorithm. Instead, we minimize the objective function via extended roof duality [20–22] and propose a multigrid iterative optimization strategy to approximate the optimal solution. We call the proposed method MSARI for Markov based sign ambiguity resolution in interferometry. It does not require any sophisticated scanning strategy or heuristic postprocessing and provides better results than existing state-of-the-art closed fringe pattern analysis
methods. In addition, the extension to 3D volumes is straightforward and it is easy to incorporate a priori knowledge of the phase to be demodulated. Although motivated by the phase estimation from interferometric images, the presented optimization procedure can be applied to general computer vision optimization problems of non-submodular, binary pairwise energy functions based on MRF. The adaptive quadrature filter proposed in [23] for phase recovery from a single closed fringe pattern is also characterized as an energy minimization problem with MRF prior model. But the energy function formulated in [23] is complex-valued with more unknown variables including the in-phase and quadrature components of the fringe pattern as well as the local dominant frequency, whereas the energy function in MSARI only has the binary variables of the signs. In addition, the coarse to fine pyramid minimization used in [23] does not work in high frequency fringes that will be eliminated by the smoothing and subsampling operations. Instead, we use a different multigrid optimization scheme to avoid this problem, as discussed in Section 3.3. The remainder of the paper is organized as follows: In Section 2, we review the energy minimization framework used in modulo 2p phase unwrapping. The sign ambiguity resolution method and multi-grid optimization strategy are discussed in detail in Section 3. Experimental results on synthetic data and real world interferometric images of the prelens tear film of the human eye are given in Section 4 with discussion. Section 5 offers conclusions and outlines future work. 2. Review of modulo 2p phase unwrapping To simplify the problem, the fringe pattern Eq. (1) is usually preprocessed first to suppress the background a(x, y) and normalize the fringe amplitude b(x, y). Given that both terms vary slowly compared with modulating phase /(x, y), they can be removed by a high pass filter and the Hilbert transform [24], respectively. As a result, the fringe pattern is simplified to I(x, y) = cos(/(x, y)). As mentioned above, it is an ill-posed problem to estimate absolute phase /(x, y) from I(x, y) so that assumptions about /(x, y) are required to obtain unique solution. Assumption 1. The absolute value of the phase difference between any two neighboring pixels is less than p. The above assumption, also known as Itoh condition [25], supposes that the phase surface is continuous. It is used by many phase unwrapping algorithms to recover the absolute phase / from its modulo 2p phase /2p, where / = 2kp + /2p and 0 6 /2p < 2p, i.e., to remove the 2p ambiguity. Based on the assumption, phase unwrapping can be formulated as an integer programming problem that minimizes the phase jumps between first order neighboring pixels. The function to be minimized takes the form:
Eðkj/2p Þ ¼
X
VðD/hi;j Þ þ VðD/vi;j Þ
ð3Þ
ði;jÞ2I
where k fki;j 2 Z : ði; jÞ 2 Ig is the wrap-count image denoting 2p multiples, and I is the image grid. V() is a real function, the so-called clique potential. D/hi;j and D/vi;j denote the horizontal and vertical unwrapped phase differences, respectively.
D/hi;j ¼ 2pðki;jþ1 ki;j Þ þ ð/2pi;jþ1 /2pi;j Þ D/vi;j ¼ 2pðkiþ1;j ki;j Þ þ ð/2piþ1;j /2pi;j Þ
ð4Þ
If function V() is convex, such as the commonly used square function V(x) = x2, it can be proved that energy function E(kj/2p) (Eq. (3)) has the following desirable properties:
D. Wu, K.L. Boyer / Computer Vision and Image Understanding 115 (2011) 759–770
761
Fig. 1. The two step phase demodulation method. The sign ambiguity is solved first to obtain the modulo 2p phase /2p(x, y), which is unwrapped next to reconstruct the absolute phase /(x, y). 0
0
Theorem 1. If k is locally optimal, that is E(k) 6 E(k ) for all k = k + d, where d is any binary image d {di,j 2 {0, 1}:(i, j) 2 I}, then k is also globally optimal. Theorem 2. Let kt be the globally optimal minimizer of E(kj/2p) on Dt , where Dt fk : 0 6 ki;j 6 tg, then there exists a binary image d such that kt+1 = kt + d is the global minimizer of E(kj/2p) on Dtþ1 . Theorem 1 indicates that the energy function E(kj/2p) can be minimized by iteratively optimizing the following binary problem: tþ1
k
t
t
¼ k þ arg min Eðk þ dj/2p Þ d
ð5Þ
Theorem 2 assures that starting with k0 = 0, the above optimization will converge in at most K iterations to the global minimizer k⁄ of Eq. (3), where K is the maximum wrap count spanned by k⁄. We refer the reader to [16] for detailed proofs of Theorems 1 and 2. tþ1 t Let ki;j ¼ ki;j þ di;j and substitute it in Eq. (3). The resulting binary energy function can be rewritten as: t
Eðdjk ; /2p Þ ¼
X
Eh ðdi;jþ1 ; di;j Þ þ Ev ðdiþ1;j ; di;j Þ
ði;jÞ2I
Eh ðdi;jþ1 ; di;j Þ ¼ Vð2pðdi;jþ1 di;j Þ þ ch Þ Ev ðdiþ1;j ; di;j Þ ¼ Vð2pðdiþ1;j di;j Þ þ cv Þ t ðkiþ1;j
c h ¼ 2p
t
t ki;j Þ
ð6Þ
þ ð/2pi;jþ1 /2pi;j Þ
t
cv ¼ 2pðki;jþ1 ki;j Þ þ ð/2piþ1;j /2pi;j Þ All pairwise terms Eh() and Ev() of the binary energy function Eq. (6) follow the values below:
Eð0; 0Þ ¼ VðcÞ;
Eð0; 1Þ ¼ Vðc 2pÞ
Eð1; 1Þ ¼ VðcÞ;
Eð1; 0Þ ¼ Vðc þ 2pÞ
ð7Þ
where E represents Eh or Ev and, correspondingly, constant c denotes ch or cv. If function V(.) is convex, it is easy to prove that E(0, 0) + E(1, 1) 6 E(0, 1) + E(1, 0), the so-called regular condition [17]. Therefore, all pairwise terms of Eq. (6) are submodular and its global minimum can be computed in polynomial time as a min s–t cut in an appropriately mapped graph [22,26,17]. In summary, the phase unwrapping from modulo 2p phase is formulated as an energy minimization problem, which can be decomposed into a sequence of binary optimizations with each then minimized via graph cuts.
3. Sign ambiguity removal 3.1. Orientation based energy function Section 2 described a robust method to recover absolute phase / from its modulo 2p phase /2p. However, /2p is usually not directly available from a single closed fringe pattern I(x, y) = cos(/(x, y)) due to the sign ambiguity, as mentioned above. To remove the sign ambiguity, Assumption 1 is not sufficient because cos1(I(x, y)) satisfies the Itoh condition, but is usually not the solution for practical interferometry applications. Instead, the measured phase surface /(x, y) can be considered as a smooth function in many real problems [27,14], which means that not only / but also its gradient (/x, /y) is continuous. Accordingly, here we make the pffiffiffiffiffiffiffisecond assumption about the phase gradient orientation h:(j ¼ 1)
/y /x ffi þ j qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi ejh ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 2 2 /x þ /y /x þ /2y
ð8Þ
Assumption 2. The direction of the phase gradient [cosh, sinh] is continuous almost everywhere except for critical points including local maxima, minima and saddle points. By definition 8, the phase gradient orientation h is in the range of [0, 2p) with 2p periodicity, which means that h and h + 2kp point in exactly the same direction. Hence, the above continuity assumption also holds for orientation h, but with a 2p dislocation when h jumps between 0 and 2p. It is easy to show that the phase gradient orientation / is closely related to the image gradient orientation w, which can be directly calculated from the image:
Ix Iy ejw ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffi þ j qffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 2 2 Ix þ Iy Ix þ I2y
ð9Þ
where
Ix ðx; yÞ ¼ sinð/ðx; yÞÞ/x ðx; yÞ Iy ðx; yÞ ¼ sinð/ðx; yÞÞ/y ðx; yÞ
ð10Þ
As shown above, the phase gradient orientation h is exactly the same as or opposite to w depending on the sign of sin(/(x, y)) which is denoted as binary variable s(x, y):
sðx; yÞ ¼
1 sgnðsinð/ðx; yÞÞÞ 2
ð11Þ
762
D. Wu, K.L. Boyer / Computer Vision and Image Understanding 115 (2011) 759–770
where sgn(x) is the signum function. Given the continuity assumption of the phase gradient orientation, the image gradient orientation w has a jump of p wherever sign s(x, y) flips, as shown in Fig. 2d and e. To determine the sign image s(x, y), we formulate a binary energy function that minimizes the absolute differences of neighboring phase gradient orientation:
EðsjwÞ ¼
X
Vðhi hj Þ ¼
ði;jÞ2N
X
Vððsi sj Þp þ ðwi wj ÞÞ
ð12Þ
ði;jÞ2N
where N is a neighborhood system on image I. i and j are pairwise pixel indices. hi = wi + sip and hj = wj + sjp so that all pairwise terms in function 12 satisfy
Eð0; 0Þ ¼ VðcÞ;
Eð0; 1Þ ¼ Vðc pÞ
Eð1; 1Þ ¼ VðcÞ;
Eð1; 0Þ ¼ Vðc þ pÞ
ð13Þ
where c = wi wj. This form appears similar to Eq. (7) but differs in the choice of clique potential function V(). Because of the 2p periodic character of gradient orientation h as mentioned above, the clique potential V() measuring the angular distance between two orientations must also be 2p periodic so that V(2p) = V(0) = 0. For this reason, function V() is no longer convex and the regular condition does not hold. A simple example of this is that if two neighboring pixels have opposite image gradient orientations like wi = 0 and wj = p, then:
Eð0; 0Þ þ Eð1; 1Þ ¼ 2VðpÞ Eð0; 1Þ þ Eð1; 0Þ ¼ 2Vð0Þ ¼ 0 Clearly, E(1, 0) + E(0, 1) < E(0, 0) + E(1, 1) implying that the sign is more likely to flip between pixel i and j. Therefore, the energy function of Eq. (12) is non-submodular and its global minimum can’t be guaranteed by the standard graph-cut algorithm. In addition, it is easy to show that the energy function Eq. (12) is symmetric, which means EðsjwÞ ¼ EðsjwÞ. This global sign ambiguity usually has to be resolved using prior knowledge of the problem. For example, in our case tear film thickness is always nonnegative. 3.2. Improvement with gradient vector flow The energy function Eq. (12) defined above is based on image gradients, which may give rise to two potential problems. First, image derivatives can be sensitive to noise. Second, in those areas where image gradients approach zero, such as local minima, maxima, and saddle points, or regions of nearly constant brightness, the gradient orientation is either not well defined or determined by noise. This problem was also mentioned in [9,15] and two different scanning strategies were accordingly proposed to leave those difficult areas to be demodulated after their surrounding pixels were estimated. We adopted gradient vector flow (GVF) [19] to smooth the gradient vectors and extend them into homogeneous regions lacking
high frequency phase modulation by minimizing the following energy functional:
W¼ ¼
Z Z Z Z
lðu2x þ u2y þ v 2x þ v 2y Þ þ ðI2x þ I2y Þððu Ix Þ2 þ ðv Iy Þ2 Þdxdy lðjruj2 þ jrv j2 Þ þ jrIj2 jm Ij2 dx dy
ð14Þ
This minimization produces the gradient vector flow field
m = (u, v). In this formulation, when jrIj is small, the energy is dominated by the sum of squares of the partial derivatives of the vector field, yielding a slowly varying field with small jruj and jrvj; when jrIj is large, the integrand is dominated by the second term and minimized by setting v = rI. Regularization parameter l controls the tradeoff between the two terms. The smoothed gradient vector flow field m = (u, v) replaces rI = (Ix, Iy) in calculating the image gradient orientation w as shown in Eq. (9). The improvement from introducing GVF prior to phase demodulation will be validated in Section 4. 3.3. QPBO and multigrid optimization As shown in Section 3.1, the energy function of Eq. (12) is nonsubmodular, which turns the optimization into an N–P hard problem and the energy cannot be minimized by max-flow/min-cut algorithms. To tackle this issue, a majorize-minimize approximation method is used in [16]. It is similar to the algorithm proposed earlier in [28] that truncates the pairwise terms violating the regular condition to regular terms. This method is simple. However, it seems to work well only when relatively few terms need to be truncated [29]. Many other approximate optimization methods have been developed for non-submodular functions such as simulated annealing, belief propagation [30], or tree-reweighted message passing [31]. Here we focus on a promising approach called QPBO (Quadratic Pseudo-Boolean Optimization). This method, originally known as roof duality, was proposed in [20] and later improved in [21]. More recently, Kolmogorov et al. introduced and extended this optimization method to computer vision [22,26]. QPBO can be viewed as a generalization of the standard graph cut. It also maps the energy function onto a directed graph G = (V, E) and computes a minimum s t cut on it. But the difference is that for each binary variable si, two nodes s and s are added to V, which corresponds to variable si and its complement si ¼ 1 si , respectively. After running the minimum s t cut, the labeling of si is determined as 1 if s 2 T; s 2 S; 0 if s 2 S; s 2 T and unlabeled otherwise (S and T stand for source and sink subset, respectively). It is equivalent to the linear programming (LP) relaxation of the energy function where binary variables si 2 {0, 1} are replaced with linear constraints si 2 [0, 1]. It can be shown that this LP has a half-integer optimal solution ~si , i.e., ~si 2 f0; 1; 0:5g for each variable, where ~si ¼ 0:5 corresponds to a variable that is unlabeled. The partial labeling s produced by QPBO has the property that there exists a global minimum s⁄ of the objective function such
Fig. 2. Toy example. (a) I(x, y) = cos(/(x, y)), (b) original smooth /(x, y), (c) cos1(I(x, y)), (d) normalized image gradient (Ix, Iy), (e) normalized phase gradient (/x, /y).
D. Wu, K.L. Boyer / Computer Vision and Image Understanding 115 (2011) 759–770
that si ¼ si for all labeled pixels i, i.e., pixels with si = 0 or 1. In two special cases, QPBO can obtain a complete labeling of all variables: 1. If all pairwise terms of the energy function are submodular. 2. If all cycles in the energy function contain an even number of non-submodular pairwise terms. If the second condition holds, we can flip a subset of node pairs, i.e., swapping the meaning of 0 and 1 for some variables so that the original energy function becomes completely submodular. For the problem under discussion (Eq. (12)), it means that any closed path in the image has an even number of sign flips, which holds in ideal cases. However, in real images, the fringe pattern can be noisy and complex causing path inconsistency in sign flips so that QPBO may leave many variables unassigned. To obtain a complete labeling, we adopt the extension of QBPO proposed in [22] called QPBOI, where ‘‘I’’ stands for ‘‘improve’’. It takes an input labeling ~s and tries to improve its energy by fixing a subset of nodes to the labels specified by ~s and running QPBO. These operations guarantee not to increase the energy, and in practice do often decrease it. However, this method can converge to non-optimal stable solutions depending on the initialization ~s. To address this problem, we propose a multigrid optimization scheme to approximate the global minimum. The main idea is to divide the larger image into multiple overlapping smaller subimages and formulate the energy function Eq. (12) on each of these subimages with QPBOI optimization separately. The obtained results are then stitched together as the initialization of the QPBOI on the larger image. This procedure is repeated for multiple levels starting from the smallest subimages. The detailed optimization scheme is described in Algorithm 1. Algorithm 1. The multigrid optimization scheme of the proposed sign ambiguity resolution algorithm. Input: Fringe pattern image I(x, y), number of layers L and number of QPBOI iterations at each layer Nl (1 6 l 6 L). Output: Binary sign image s(x, y). Initialize l = 1 if l < L then Split the image into four subimages. for j = 1 to 4 do Pass each subimage to the next optimization level l + 1. endfor Stitch the obtained results of each subimage together and set it as s0(x, y). else Set s0(x, y) = 0. endif Starting with s0(x, y), run QPBOI for Nl times to approximate the optimal minimizer of Eq. (12). The reasons we use this multigrid optimization strategy are twofold. First, the optimization complexity reduces polynomially with decreasing image size, therefore it is more likely to reach the global minimum for a smaller image. Second, in the energy function, pixels only interact directly with neighboring pixels. Therefore, each subimage can be optimized with little impact from its separation from the other subimages. Due to the global sign ambiguity, as mentioned above, the global sign of each subimage must be verified as consistent in the stitching step. This can be achieved by maximizing the number of pixels with matching signs in the overlapping regions of adjoining subimages, which is not a difficult binary optimization problem, given the limited number of subimages.
763
Finally, we point out the differences between our method and several other related methods. First, the phase gradient orientation h and image gradient orientation w is similar to the fringe direction angle and fringe orientation angle, respectively, as defined in [32]. It solves the probability distributions of binary sign variables s(x, y) using the Gauss–Markov measure field method, which is essentially the same as the linear programming relaxation of the 0–1 integer programming problem formulated in this paper. However, in general, the rounded LP relaxation solution is usually non-optimal, and can be very non-optimal in the sense that the objective function value of the rounded LP relaxation is far away from the objective function value of the optimal integer solution. Second, our multigrid optimization scheme differs from the tilebased propagation method proposed in [15], which also divides the image into multiple smaller subimages. Specifically, our method does not rely on any heuristic path propagation strategy and has no restriction that each subimage or tile must be small enough to contain open fringes only. Our technique is also not the classical multigrid method, namely, the well known pyramid, where images are repeatedly smoothed and subsampled as shown in Fig. 3. That approach was used in [15,23,33] for faster convergence in phase demodulation algorithms. However, the classical multigrid method is limited by the resolution of the original image, because it usually requires at least two to three pixels’ width of fringe period for reliable phase reconstruction. Thus the subsampling process will introduce aliasing to high frequency fringes and cause the phase demodulation to fail [23].
3.4. Extension to 3D The energy based method of removing the sign ambiguity as introduced in this section can be easily extended to 3D fringe pattern analysis. The phase and image gradient orientation are each represented by three-dimensional unit vectors ~ v and ~u instead of angle h and w, where
½/x ; /y ; /z T ~ v ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi /2x þ /2y þ /2z
½Ix ; Iy ; Iz T ~ u ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi I2x þ I2y þ I2z
with ~ v ¼ ð2s 1Þ~u; s ¼ f0; 1g. Accordingly, the objective function Eq. (12) can be reformulated by maximizing the angle between neighboring phase gradient vectors ~ v.
EðsjwÞ ¼
X
Vð1 ~ v Ti ~ vjÞ
ð15Þ
ði;jÞ2N
where neighborhood N is extended from 4 or 8-connectivity on a 2D image to 6 or 26-connectivity in 3D. Eq. (15) is still a binary pairwise energy function in which each pairwise term takes the following values:
Eð0; 0Þ ¼ Vð1 cÞ Eð0; 1Þ ¼ Vð1 þ cÞ Eð1; 1Þ ¼ Vð1 cÞ Eð1; 0Þ ¼ Vð1 þ cÞ where c ¼ ~ uTi ~ uj 2 ½1; 1. When c < 0,E(0, 1) + E(1, 0) < E(0, 0) + E(1, 1), therefore the new energy function is also non-submodular and an optimization scheme similar to that described in Section 3.3 can be used. Similarly, the modulo 2p phase unwrapping method described in Section 2 can also be extended to 3D by simply adding pairwise neighboring pixel cliques in the third dimension into the energy function.
764
D. Wu, K.L. Boyer / Computer Vision and Image Understanding 115 (2011) 759–770
Fig. 3. The difference between (a) the proposed multigrid optimization scheme and (b) classic pyramid image processing.
4. Experimental results and discussion In this section, we present experiments to evaluate the performance of the proposed MSARI algorithm with synthetic data and real prelens tear film interferometric images. The choice of the number of layers L in the proposed multigrid optimization scheme depends on the image size, fringe pattern complexity, and noise level. In general, the results improve with increasing
L but the computational complexity rises, as well. To accelerate the algorithm, we usually set a larger number of QPBOI iterations, Nl, for bottom level subimages and a smaller number in the upper level to refine the results. In the following experiments, we set L = 4 and Nl = 30 for the bottom level (l = L) subimages, Nl = 15 for the middle level and Nl = 5 for the top level (l = 1) unless specified otherwise. l in the gradient vector flow algorithm is set to 0.2.
Fig. 4. Synthetic examples. The first row is /1 and the second row is /2. (a) original image I(x, y) = cos(/(x, y)); (b) image gradient orientation w(x, y) (modulo 2p); (c) obtained sign image s(x, y) after one iteration of QPBOI; (d) s(x, y) after two iterations of QPBOI.
Fig. 5. Synthetic example with additive noise. (a) original image I(x, y) = cos(/(x, y) + n(x, y)); (b) direct image gradient orientation w(x, y) (modulo 2p); (c) smoothed w(x, y) using GVF; (d) s(x, y) after 400 iterations of LBP; (e) s(x, y) after 400 iterations of traditional QPBOI (L = 1); (f) s(x, y) using multigrid QPBOI (L = 4).
765
D. Wu, K.L. Boyer / Computer Vision and Image Understanding 115 (2011) 759–770
Fig. 6. The objective energy function E(sjw) with iteration number of traditional QPBOI (L = 1), multigrid QPBOI (L = 4) and LBP. The left is for /1 and the right is for /2.
As discussed in Section 3.3, the clique potential should be a 2p-periodic function:
VðxÞ ¼ V 2p ðx 2kpÞ
ð16Þ
where jx 2kpj < p. In this study, we select the square function V2p(x) = x2 as used in [34] because it is simple and works well under the continuity assumption of the phase surface as given in Section 3.1. For discontinuous phase surfaces, non-convex clique potentials such as a truncated square function, or the potential used by Biouscas-Dias et al. [16] that penalizes larger amplitude discontinuities less than the quadratic, might be more desirable. 4.1. Synthetic data example Fig. 4 shows two phase images (256 256) to be demodulated. They are synthesized from original absolute phase surfaces formed by the following equations, respectively:
ðx78Þ2 þðy78Þ2 502
ðx178Þ2 þðy78Þ2 502
/1 ¼ 25e /2 ¼ 25e
þ 25e
þ 25e
þ 25e
ðx178Þ2 þðy178Þ2 502
ðx178Þ2 þðy178Þ2 502
ðx78Þ2 þðy78Þ2 502
ð17Þ þ 25e
ðx78Þ2 þðy178Þ2 502
values obtained by these three optimization methods with increasing iterations are plotted in Fig. 6. These methods converge to the energy values listed in Table 1. It is obvious that the multigrid QPBOI optimization of Algorithm 1 achieves better results in considerably fewer iterations. 4.2. Prelens tear film narrowband interferometry We now present results on a set of prelens tear film (PLTF) images captured with a modified Doane’s interferometer [35,36] provided by colleagues at The Ohio State University College of Optometry. It has been used to image full thickness fringes from the PLTF using narrow-band light and thus record the threedimensional spatio-temporal structure of the PLTF. A simplified diagram of the optical system is given in Fig. 7. Light from a
Table 1 The energy function values reached by different optimization methods. LBP and traditional QPBOI (L = 1) runs for 400 iterations, respectively. Multigrid QPBOI (L = 4) runs for 30, 15, 15, 5 iterations from the bottom to the top level.
ð18Þ
In these ideal noiseless cases, the optimization of the energy function Eq. (12) without using the multigrid scheme converges quickly in two iterations of QPBOI. In fact, the results are already close to the global minimizers after the first iteration as shown in Fig. 4c. This observation is consistent with the statement above that if the sign s(x, y) flips an even number of times along any closed path in the image, QPBO can obtain a complete labeling of all variables that is globally optimal. However, when images are noisy or fringe patterns are complex, the sign changes will become inconsistent along different paths. Now we add Gaussian noise with standard deviation 0.5 radians to both synthetic images. The proposed multigrid QPBOI optimization scheme is compared with max-product loopy belief propagation (LBP) [30] and traditional QPBOI. As shown in Fig. 5, the resulting image gradients w(x, y) become highly noisy, but can be effectively smoothed by using gradient vector flow (GVF). LBP converges to the local minima after 400 iterations as shown in Fig. 5d. The traditional QPBOI (L = 1) yields better results, but they are still not optimal, especially for the first image /1. The energy function
E(/1) E(/2)
lbp
qpboi (l = 1)
Multigrid qpboi (l = 4)
6.69 104 8.02 104
2.88 104 4.10 104
2.76 104 3.82 104
Fig. 7. A simplified diagram of the prelens tear film interferometric imaging system.
766
D. Wu, K.L. Boyer / Computer Vision and Image Understanding 115 (2011) 759–770
Fig. 8. Prelens tear film interferometric images I(x, y) = cos(/(x, y)).
Fig. 9. The resulting binary sign image s(x, y) with different optimization methods. Left: LBP, Middle: QPBOI (L=1), Right: multigrid QPBOI (L = 4).
compact tungsten-halide source S is reflected from a glass plate beam splitter B and this source is imaged at the center of curvature of the cornea by the large aperture f/0.8 lens L1. Light therefore strikes the cornea at normal incidence and so is reflected straight back along its path so that a further image of the source is formed near the center of L2, the objective lens of a video camera (Vc is the video sensor). This video camera is focused on infinity, so that the tear film is in focus when it lies in the focal plane of L1. Uncompressed video recordings were made at 30 images per second with an exposure duration of 30 ms. In each recording, the subject was asked to blink about 1 s and then to keep their eyes open for 19 s. The filter F is an 850 nm infrared filter, thus avoiding possible reflex tear stimulation from high luminance stimuli. A more detailed diagram and description of the optical system, including a second broad-spectral-band beam and video camera, ocular alignment system, and fixation point have been presented previously [36].
The example PLTF interferometric images (256 256) are shown in Fig. 8. The interference fringes, analogous to Newton’s
Table 2 The energy function values reached by different optimization methods. LBP and traditional QPBOI (L = 1) runs for 400 iterations, respectively. Multigrid QPBOI (L = 4) runs for 30,15,15,5 iterations from the bottom to the top level. Energy
lbp
qpboi (l = 1)
Multigrid qpboi (l = 4)
E(/1) E(/2) E(/3) E(/4) E(/5) E(/6) E(/7) E(/8)
4.16 104 4.19 104 4.05 104 4.35 104 3.76 104 4.38 104 4.22 104 4.46 104
1.64 104 1.55 104 2.14 104 1.74 104 1.22 104 1.66 104 1.35 104 2.53 104
1.52 104 1.33 104 1.92 104 1.62 104 1.08 104 1.42 104 1.35 104 2.28 104
D. Wu, K.L. Boyer / Computer Vision and Image Understanding 115 (2011) 759–770
rings, indicate the thickness of the tear film (modulo half the optical wavelength) on the surface of a contact lens in a wearer’s eye. The measurement of prelens tear film thickness and reconstruction of its surface has important implications for understanding tear physiology and fluid dynamics associated with dry eye syndromes in contact lens patients.
767
First, we compare the proposed multigrid QPBOI optimization method with LBP and traditional QPBOI. The binary sign images s(x, y) obtained by minimizing Eq. (12) with the three methods are shown in Fig. 9. They are consistent with our results on the synthetic data in that traditional QPBOI performs much better than LBP, however it is still likely to get trapped in local minima, which
Fig. 10. Modulo 2p phase images /2p(x, y) after the sign ambiguity removed. Left: FGRPT; Middle: IQT; Right: MSARI.
Fig. 11. Unwrapped phase surface /(x, y) after the 2p ambiguity removed. Left: FGRPT; Middle: IQT; Right: MSARI.
768
D. Wu, K.L. Boyer / Computer Vision and Image Understanding 115 (2011) 759–770
can be effectively improved by multigrid QPBOI in general. The energy function values achieved by the three optimization methods are listed in Table 2. Next, we compare our MSARI algorithm with the well known regularized phase tracker. The frequency guided scanning strategy [13] is adopted and a 9 9 neighborhood with a value of 0.3 for regularization parameter k are used for the experiments. Another phase demodulation method compared is used in XtremeFringeÒ, a state-of-the-art fringe pattern processing software developed by Indizen Optical Technologies. It is based on a 2-D isotropic quadrature transform (IQT) [27] with a path-following fringe orientation estimator [14]. Like all the other path-following phase demodulation methods, it relies on a scanning strategy such that the neighboring pixel with the highest image gradient jrIj is demodulated first. The regularization parameter l and neighborhood size C are adjusted with l = {0.5, 1, 2, 4} and C = {5, 9, 13}, and the best result is selected. The modulo 2p phase images /2p obtained by these three methods are compared in Fig. 10. As highlighted by the red arrows,
FGRPT results are much less consistent with the original fringe patterns, especially in the high frequency fringes compared with the other two methods. Because the RPT energy function is non-convex, the gradient descent [9] or Levenberg–Marquardt optimization algorithm [13] can converge to a local minimum. Therefore, when the local spatial frequency is high and phase changes too quickly, the initial estimation based on neighboring demodulated pixels may not be close enough to the global minimum for the optimization algorithm to find it. For IQT, the results show many broken or open 2p jump paths especially at local minima, maxima, and saddle points. Ideally, /2p should have 2p jumps along paths which are either closed or cross the image boundaries, otherwise phase discontinuities will be caused after unwrapping. Fig. 11 shows the reconstructed absolute phase surfaces / of the three methods. They are directly obtained by FGRPT, whereas the phase unwrapping algorithm in Section 2 is used for IQT and MSARI. Notice the considerable difference between the results of the three methods, especially from the fourth to the seventh samples. For better comparison, we plotted the contour lines at /
Fig. 12. Contours at /(x, y) = 2kp overlaid on original image I(x, y). Left: FGRPT; Middle: IQT; Right: MSARI.
Fig. 13. (a) The average data error ED (b) The number of pairs of neighboring pixels that do not satisfy the Itoh condition on the unwrapped phase surface /.
D. Wu, K.L. Boyer / Computer Vision and Image Understanding 115 (2011) 759–770
(x, y) = 2kp on original images I(x, y) of Fig. 8. As shown in Fig. 12, it is clear that the contour lines of MSARI are much more consistent with the ridge lines (cos(2kp) = 1) in the original images. To evaluate quantitatively the performance of the two methods, we compare the data fidelity and smoothness, which is measured respectively by the average data error ED, and the number of pairs of neighboring pixels on the unwrapped phase surface /(x, y) with absolute difference larger than or equal to p, i.e., breaking the Itoh condition.
X ED ¼ ðjcosð/ðx; yÞÞ Iðx; yÞjÞ=N
ð19Þ
x;y
where N is the number of pixels in the image. As indicated by by Fig. 13a, FGRPT generates the largest data error compared with IQT and our algorithm, which have zero error due to the exact integer programming problem they formulate. However, MSARI algorithm yields a significantly more continuous phase surface /(x, y) compared with IQT as shown in Fig. 13b. Despite the better performance of the proposed method, it is also less reliable in weak phase modulating fringe patterns where the image or phase gradients become unreliable, as shown in the left bottom corner of the last sample. The unwrapped phase surface /(x, y) is subject to global 2p and sign ambiguities as mentioned above, which are usually solved separately with prior knowledge. However, if /(x, y) is known a priori for some pixels (x, y), it can be incorporated into the proposed energy minimization framework easily by setting highly unbalanced unary terms for the corresponding variables to ensure that they are assigned to the known label. In the case of prelens tear film, if the tear film breaks up, the captured interferometric image will show dry areas where the thickness (phase) is absolutely zero
769
providing a locus of known /(x, y). Based on this and the fact that tear film thickness is never negative, the global sign and 2p ambiguity can be removed as shown in Fig. 14. The absolute tear film thickness d(x, y) can then be easily calculated from the unwrapped phase surface /(x, y) [36]:
dðx; yÞ ¼
/ðx; yÞk 4pn
ð20Þ
where k is the light wavelength equal to 850 nm (infrared) in our experiments and n is the refractive index of the tear (1.33). To our knowledge, this is the first time anyone has automatically and quantitatively measured the absolute thickness of a prelens tear film surface with interferometry. Finally, our MSARI algorithm is implemented in MATLAB with the QPBOI optimization package developed by Kolmogorov [37]. For an image of 256 256 in size, the average processing time is about 8 minutes on a Dell OptiPlex 755 Desktop with IntelÒ Core™ 2 Quad Processor Q6700 (2.66GHz) and 4GB RAM. It depends on the number of layers L and number of iterations at each layer Nl that we set, as well as the complexity of the fringe pattern. 5. Final comments Motivated by the problem of prelens tear film thickness measurement, we presented a novel approach to reconstruct the smooth phase surface from a single fringe pattern in an energy minimization framework based on image gradient orientation. The formulated objective function is non-submodular and therefore the traditional QPBOI algorithm can get trapped in local minima. To address this, we present a hierarchical multigrid optimization strategy to approximate the globally optimal solu-
Fig. 14. From left to right, the frame at 2, 4, 6, 8, 10 and 12 s after blink. The first row: original prelens tear film image with dry areas (zero phase) manually labeled as red; The second row: obtained sign image s(x, y); The third row: modulo 2p phase image /2p(x, y); The last row: absolute phase surface /(x, y) without global sign and 2p ambiguity. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
770
D. Wu, K.L. Boyer / Computer Vision and Image Understanding 115 (2011) 759–770
tion. The experiments show that this algorithm outperforms the path following based phase demodulation method. Ongoing work will include experiments on three dimensional space–time PLTF interferometric video to further improve the result by exploiting the temporal information. However, threedimensional phase gradient orientation estimation might be more susceptible to noise, and the computational cost will increase significantly for GVF computation and QPBOI optimization in three dimensions. To accelerate the algorithm, simpler and faster phase demodulation approaches can be applied first to obtain a reasonably good estimate to initialize our method. It is also desirable to select parameters L and Nl adaptively instead of manually to improve the tradeoff between performance and speed automatically. In addition, this work is based on the assumption of a continuous and smooth phase surface; it will be very challenging to extend it to a single closed fringe pattern of a discontinuous phase surface. Acknowledgments The authors are grateful to Drs. Ewen King-Smith and Jason Nichols of The Ohio State University College of Optometry for support, numerous helpful discussions, and access to data. References [1] D. Malacara, B.J. Thompson, Handbook of Optical Engineering, first ed., Taylor & Francis, Inc., 2001. [2] T. Kreis, Holographic Interferometry: Principles and Methods, first ed., VCH Publishers, Inc., 1996. [3] P. Hariharan, Optical Interferometry, second ed., Elsevier Inc., 2003. [4] J.H. Bruning, D.R. Herriott, J.E. Gallagher, D.P. Rosenfel, A.D. White, D.J. Brangaccio, Digital wavefront measuring interferometer for testing optical surfaces and lenses, Appl. Opt. 13 (1974) 2693–2703. [5] M. Takeda, H. Ina, S. Kobayashi, Fourier transform methods of fringe-pattern analysis for computer-based topography and interferometry, J. Opt. Soc. Am. 72 (1982) 156–160. [6] K.H. Womack, Interferometric phase measurement using spatial synchronous detection, Opt. Eng. 23 (1984) 391–395. [7] M. Servin, R. Rodriguez-Vera, Two dimensional phased locked loop demodulation of carrier frequency interferograms, J. Mod. Optics 40 (1993) 2087–2094. [8] C.J. Tay, C. Quan, F.J. Yang, X.Y. He, A new method for phase extraction from a single fringe pattern, Opt. Commun. 239 (2004) 251–258. [9] M. Servin, J.L. Marroquin, F.J. Cuevas, Fringe-follower regularized phase tracker for demodulation of closed-fringe interferograms, J. Opt. Soc. Am. A 18 (3) (2001) 689–695. [10] J.A. Quiroga, M. Servin, F. Cuevas, Modulo 2p fringe orientation angle estimation by phase unwrapping with a regularized phase tracking algorithm, J. Opt. Soc. Am. A 19 (8) (2002) 1524–1531. [11] M. Servin, J.L. Marroquin, J.A. Quiroga, Regularized quadrature and phase tracking from a single closed-fringe interferogram, J. Opt. Soc. Am. A 41 (2004) 411–419. [12] R. Legarda-Saenz, M. Rivera, Fast half-quadratic regularized phase tracking for nonnormalized fringe patterns, J. Opt. Soc. Am. A 23 (2006) 2724–2731.
[13] H. Wang, Q. Kemao, Frequency guided methods for demodulation of a single fringe pattern, Opt. Express 17 (2009) 15118–15127. [14] J. Villa, I.D. la Rosa, G. Miramontes, Phase recovery from a single fringe pattern using an orientational vector field regularized estimator, J. Opt. Soc. Am. A 22 (12) (2005) 2766–2773. [15] O.S. Dalmau-Cedeno, M. Rivera, R. Legarda-Saenz, Fast phase recovery from a single closed-fringe pattern, J. Opt. Soc. Am. A 25 (6) (2008) 1361–1370. [16] J.M. Bioucas-Dias, G. Valadao, Phase unwrapping via graph cuts, IEEE Trans. Image Process. 16 (3) (2007) 698–709. [17] V. Kolmogorov, R. Zabih, What energy functions can be minimized via graph cuts?, IEEE Trans Pattern Anal. Mach. Intell. 26 (2) (2004) 147–159. [18] Y. Boykov, O. Veksler, R. Zabih, Fast approximate energy minimization via graph cuts, IEEE Trans. Pattern Anal. Mach. Intell. 23 (11) (2001) 1222– 1239. [19] C. Xu, J.L. Prince, Snakes, shapes, and gradient vector flow, IEEE Trans. Image Process. 7 (3) (1998) 359–369. [20] P.L. Hammer, P. Hansen, B. Simeone, Roof duality, complementation and persistency in quadratic 0-1 optimization, Math. Program. 28 (2) (1984) 121– 155. [21] E. Boros, P.L. Hammer, X. Sun, Network Flows and Minimization of Quadratic Pseudo-Boolean Functions, Technical Report RRR 17-1991, RUTCOR. [22] C. Rother, V. Kolmogorov, V. Lempitsky, M. Szummer, Optimizing binary MRFs via extended roof duality, in: Proc. CVPR, 2007, pp. 1–8. [23] J.L. Marroquin, M. Servin, R. Rodriguez-Vera, Adaptive quadrature filters and the recovery of phase from fringe pattern images, J. Opt. Soc. Am. A 14 (8) (1997) 1742–1753. [24] J.A. Ourioga, M. Servin, Isotropic N-dimensional fringe pattern normalization, Opt. Commun. 224 (4–6) (2003) 221–227. [25] K. Itoh, Analysis of the phase unwrapping problem, Appl. Opt. 21 (14) (1982) 2470. [26] V. Kolmogorov, C. Rother, Minimizing nonsubmodular functions with graph cuts – a review, IEEE Trans. Pattern Anal. Mach. Intell. 29 (7) (2007) 1274– 1279. [27] K.G. Larkin, Natural demodulation of two-dimensional fringe patterns. II. Stationary phase analysis of the sprial phase quadrature transform, J. Opt. Soc. Am. A 18 (8) (2001) 1871–1881. [28] C. Rother, S. Kumar, V. Kolmogorov, A. Blake, Digital tapestry, in: Proc. CVPR, 2005, pp. 589–596. [29] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, C. Rother, A comparative study of energy minimization methods for markov random fields with smoothness-based priors, IEEE Trans. Pattern Anal. Mach. Intell. 30 (6) (2008) 1068–1080. [30] P.F. Felzenszwalb, D.P. Huttenlocher, Efficient belief propagation for early vision, Int. J. Comput. Vis. 70 (1) (2006) 19–32. [31] V. Kolmogorov, Convergent tree-reweighted message passing for energy minimization, IEEE Trans. Pattern Anal. Mach. Intell. 28 (10) (2006) 1568– 1583. [32] J.L. Marroquin, R. Rodriguez-Vera, M. Servin, Local phase from local orientation by solution of a sequence of linear systems, J. Opt. Soc. Am. A 15 (6) (1998) 1536–1544. [33] S. Botello, J.L. Marroquin, M. Rivera, Multigrid algorithms for processing fringepattern images, Appl. Opt. 37 (32) (1998) 7587–7595. [34] J. Dias, J. Leitão, The ZpM algorithm for interferometric image reconstruction in SAR/SAS, IEEE Trans. Image Process. 11 (4) (2002) 408–422. [35] M.G. Doane, An instrument for in vivo tear film interferometry, Optometry Vis. Sci. 66 (6) (1989) 383–388. [36] P.E. King-Smith, B.A. Fink, J.J. Nichols, K.K. Nichols, R.M. Hill, Interferometric imaging of the full thickness of the precorneal tear film, J. Opt. Soc. Am. A: Opt. Image Sci. Vis. 23 (9) (2006) 2097–2104. [37] V. Kolmogorov,
.