Applied Mathematics Letters 26 (2013) 774–779
Optimal sampling and curve interpolation via wavelets

Heeyoung Kim a,∗, Xiaoming Huo b

a AT&T Labs, Florham Park, NJ 07932, USA
b Georgia Institute of Technology, Atlanta, GA 30332, USA

Article history: Received 30 December 2012; received in revised form 4 March 2013; accepted 5 March 2013.

Keywords: Wavelets; Lipschitz conditions; Optimal sampling; Interpolation
Abstract. We propose a wavelet-based method for determining optimal sampling positions and inferring the underlying function from the samples when the function is known to be Lipschitz. We first propose a Lipschitz regularity-based statistical model for data sampled from a Lipschitz curve. We then propose a wavelet-based interpolation method for generating a Lipschitz curve through a given set of points, and derive the optimal sampling positions. © 2013 Elsevier Ltd. All rights reserved.
1. Introduction

Often data analysis must be conducted with only a few samples taken from a response surface (or function). For sound data-based decision making, it is important to determine the optimal sampling positions, so that we extract as much information as possible from a limited number of samples, and to accurately estimate the underlying function from these sampled points, so that we can infer the entire function as if we had observed a dense set of points. For example, in manufacturing, the actual form of a manufactured feature inevitably deviates from the ideal form specification. To ensure the quality of a manufactured part, one needs to measure the machined part and assess its compliance with tolerance specifications. To decide on part acceptance while minimizing cost and time, it is critical to determine the optimal locations on the part to inspect, in order to gain maximum information from a limited number of points, and to accurately estimate the underlying surface of the part from those measurements.

In this paper, we propose a wavelet-based method such that, when the underlying function is known to be Lipschitz, we can take a few samples and infer the entire function from them. Wavelets have emerged over the last two decades as a useful tool for the analysis of trends [1] as well as the detection of jumps and sharp cusps [2]. This paper uses wavelets as an intermediate tool to create a statistical model for Lipschitz functions. We use the facts that some specially designed wavelets are Lipschitz, and that the wavelet coefficients of a Lipschitz curve decay exponentially as a function of the scale index [3, Section 3.5.3].

The rest of the paper is organized as follows. In Section 2, we propose a Lipschitz regularity-based statistical model. In Section 3, we propose a wavelet-based interpolation method for generating Lipschitz curves. In Section 4, we derive the optimal sampling strategy. Finally, we conclude in Section 5.

2. Lipschitz regularity-based statistical model

In this section, we examine the mathematical properties of Lipschitz functions, and propose a statistical model for data sampled from a Lipschitz function using these properties. In Section 2.1, we review the Lipschitz regularity and wavelet
∗ Corresponding author. Tel.: +1 4046634773. E-mail address: [email protected] (H. Kim).
0893-9659/$ – see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.aml.2013.03.002
transforms, and present the properties of wavelet coefficients of Lipschitz functions. A statistical model for measurements taken from a Lipschitz function is proposed in Section 2.2.

2.1. Lipschitz regularity and wavelet transform

We briefly review Lipschitz regularity and wavelets, and present an important property of the wavelet coefficients of Lipschitz functions, which will be the foundation for our Lipschitz regularity-based statistical model. A function f(·) is called pointwise Lipschitz γ (γ > 0) [4] at a point x if there exist a constant K > 0 and a polynomial p_x(t) in the neighborhood of x, of degree ⌊γ⌋ (i.e., the largest integer no larger than γ), such that

|f(t) − p_x(t)| ≤ K |t − x|^γ,  ∀ t ∈ R.   (1)
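As a concrete illustration (ours, not from the paper), condition (1) can be checked numerically for a simple function: f(t) = |t|^{1/2} is pointwise Lipschitz γ = 0.5 at x = 0, with p_0(t) = 0 (degree ⌊0.5⌋ = 0) and K = 1.

```python
import numpy as np

# Sketch: verify (1) for f(t) = |t|^{1/2} at x = 0 on a grid,
# taking p_0(t) = 0 and K = 1 (the bound holds with equality here).
x, gamma, K = 0.0, 0.5, 1.0
t = np.linspace(-1.0, 1.0, 2001)
f = np.abs(t) ** 0.5
assert np.all(np.abs(f - 0.0) <= K * np.abs(t - x) ** gamma + 1e-12)
```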
A function is uniformly Lipschitz γ over [a, b] (a ≤ b) if for all x ∈ [a, b] there is a constant K (independent of x) such that (1) holds. The Lipschitz regularity of a function f is the supremum of γ such that f is uniformly Lipschitz γ. The essence of Lipschitz regularity is how well f can be locally approximated by a polynomial, whose degree is a natural indicator of smoothness.

The Lipschitz regularity can be efficiently measured by the wavelet transform. We first review some basics of wavelets. For a function f(x), its wavelet decomposition has the form

f(x) = Σ_{j∈I_L} α_j φ_j(x) + Σ_{i=L+1}^{L′} Σ_{j∈I_i} β_{ij} ψ_{ij}(x),   (2)

where the φ_j(x) are scaling functions at the coarsest scale, the ψ_{ij}(x) are wavelet functions, L is the coarsest scale, I_L is the set of location indices at the coarsest scale, I_i is the set of location indices at scale i, and i and j are the scale and location indices, respectively. Note that φ_j(x) and ψ_{ij}(x) can be derived by shifting and scaling φ(x) and ψ(x): φ_j(x) = φ(x − jc) and ψ_{ij}(x) = 2^{(i−L)/2} ψ(2^{i−L}x − jc). Typically, we choose φ and ψ to have finite support and to form an orthonormal basis of the function space under consideration. The Daubechies wavelets, which will be used in this paper, are one example of such bases. In (2), L′ determines how well (i.e., up to which fine scale of the multiresolution analysis) the experimenter wants the model to approximate the true curve. For fixed i ∈ {L + 1, L + 2, . . . , L′} and fixed x, Σ_{j∈I_i} ψ_{ij}(x) has only a finite number of nonzero terms. In particular, if we consider Daubechies wavelets with p vanishing moments, the number of nonzero terms is 2p + 1. It is also known that if ψ has p vanishing moments, then φ and ψ are roughly Lipschitz γ with γ ≈ 0.2p (see Chapter 7 of [4] for more details). Moreover, we can show that if ψ is Lipschitz γ, then ψ_{ij} is also Lipschitz γ, as in the following theorem.

Theorem 2.1. If ψ is uniformly Lipschitz γ with constant K, then ψ_{ij} (i > L) is also uniformly Lipschitz γ with constant K · 2^{(i−L)(γ+1/2)}.

Proof. Recall that we have

|ψ(t) − p_x(t)| ≤ K |t − x|^γ,

where p_x(t) is a polynomial as given in (1). One can easily verify the following:

|2^{(i−L)/2} ψ(2^{i−L}t) − 2^{(i−L)/2} p_{x·2^{i−L}}(2^{i−L}t)| ≤ K · 2^{(i−L)(γ+1/2)} |t − x|^γ.

The above is equivalent to the fact that ψ_{ij} is uniformly Lipschitz γ with constant K · 2^{(i−L)(γ+1/2)}.
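The scaling of the Lipschitz constant in Theorem 2.1 can be sanity-checked numerically. The sketch below (our illustration, not from the paper) uses the hat function ψ(t) = max(0, 1 − |t|), which is classically Lipschitz with γ = 1 and K = 1, and verifies that 2^{(i−L)/2}ψ(2^{i−L}t) with i − L = 1 has Lipschitz constant K · 2^{(i−L)(γ+1/2)} = 2^{3/2}.

```python
import numpy as np

def hat(t):
    """Hat function: Lipschitz gamma = 1 with constant K = 1."""
    return np.maximum(0.0, 1.0 - np.abs(t))

# Scaled version with i - L = 1: psi_1(t) = 2^{1/2} * psi(2 t).
t = np.linspace(-1.0, 1.0, 513)           # grid hits the kinks of psi(2t) exactly
g = np.sqrt(2.0) * hat(2.0 * t)
# Maximum slope over the grid = Lipschitz constant of the scaled function.
max_slope = np.max(np.abs(np.diff(g)) / np.diff(t))
assert np.isclose(max_slope, 2.0 ** 1.5)  # K * 2^{(i-L)(gamma+1/2)}, K = gamma = 1
```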
The following theorem will be used to establish the Lipschitz regularity-based statistical model in Section 2.2.

Theorem 2.2. In (2) with the choice of Daubechies wavelets with p vanishing moments, if we have

|β_{ij}| ≤ A · 2^{−(γ+1/2)(i−L)},   (3)

and ψ, φ are Lipschitz γ with constant K, then f(x) is Lipschitz γ with a constant determined by K, A, L′ − L, p, and the α_j's.

Proof. Similar to the proof of Theorem 2.1, we can show that:
1. Σ_{j∈I_L} α_j φ_j(x) is Lipschitz γ with constant K · sup_k Σ_{j=k}^{k+2p} |α_j|.
2. For fixed i ∈ [L + 1, L′], Σ_{j∈I_i} β_{ij} ψ_{ij}(x) is Lipschitz γ with constant KA(2p + 1). (Note that Theorem 2.1 is needed to establish this result.)

Overall, f(x) is Lipschitz γ with constant K · sup_k Σ_{j=k}^{k+2p} |α_j| + KA(2p + 1)(L′ − L).
The above theorem implies that if the wavelet coefficients of a curve decay exponentially as a function of the scale index, then the curve is Lipschitz.
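The decay condition (3) can be observed empirically. The following sketch (our illustration, using a hand-rolled orthonormal Haar transform rather than the Daubechies wavelets of the paper; Haar has p = 1 vanishing moment) samples f(x) = x, computes the maximum detail-coefficient magnitude at each scale, and recovers γ from the slope of log₂|β_{ij}| across scales.

```python
import numpy as np

def haar_detail_max(y):
    """Max |detail coefficient| per scale (finest first), orthonormal Haar DWT."""
    out, a = [], y.astype(float)
    while a.size > 1:
        d = (a[0::2] - a[1::2]) / np.sqrt(2.0)   # detail coefficients
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)   # approximation coefficients
        out.append(np.max(np.abs(d)))
    return np.array(out)

# f(x) = x is Lipschitz gamma = 1; by (3) the coefficients shrink by a factor
# 2^{-(gamma + 1/2)} per refinement, so log2|beta| changes by gamma + 1/2 per scale.
N = 256
f = np.arange(N) / N
m = haar_detail_max(f)                       # m[0] is the finest scale
slopes = np.log2(m[1:]) - np.log2(m[:-1])    # coarser minus finer: gamma + 1/2
gamma_hat = np.mean(slopes) - 0.5
assert abs(gamma_hat - 1.0) < 1e-8
```

For a linear signal the slope is exact; for noisy data one would fit the intercept (giving A) and slope (giving γ) by least squares at the finest scales, as suggested in Section 2.2.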
2.2. Lipschitz regularity-based statistical model

Taking advantage of Theorem 2.2, we establish a statistical model for measurements assumed to be taken from a Lipschitz curve. Let S_ℓ denote the measurement positions, ℓ = 1, 2, . . . , N, where N is the number of measurements, and let Y_ℓ denote the measurement value at S_ℓ. We propose the Lipschitz regularity-based model

Y_ℓ = f(S_ℓ) + ε_ℓ,  1 ≤ ℓ ≤ N,   (4)

where f(·) is given in (2) and the ε_ℓ's are random measurement errors. We assume that f is Lipschitz and that the error ε_ℓ is negligibly small relative to f(S_ℓ), so the behavior of the Y_ℓ's is mainly determined by the underlying function f. From the estimated coefficients of the wavelet functions, we can detect the Lipschitz regularity using the inequality (3). For (3) to hold, the wavelet coefficients must decay as a function of the scale i. The parameters A and γ can be estimated by measuring the intercept and the slope of log₂|β_{ij}| as a function of i − L at the finest scales.

3. Wavelet-based random curve interpolating algorithm

In practice, very often only a small number of sampled observations are available for inferring the underlying function. In this section, we propose a wavelet-based interpolation method such that the interpolating curve is uniformly Lipschitz. The interpolating curve has a clear advantage: even though only a small number of sampled points are available, we can use it to infer the entire curve as if we had observed a dense set of points.

Recall the Lipschitz regularity-based statistical model in (4). From this point on, {S_ℓ}_{ℓ=1}^N denotes a dense enough set of data positions (i.e., N is large enough), while a small subset of {S_ℓ}_{ℓ=1}^N is denoted by {s_ℓ}_{ℓ=1}^n, where n ≪ N. Similarly, y_ℓ is the counterpart of Y_ℓ. Assume that we have taken a small number of sample points at {s_ℓ}_{ℓ=1}^n. According to (2), we have a system of linear equations for the measurements at {s_ℓ}:

y_ℓ = f(s_ℓ) = Σ_{j∈I_L} α_j φ_j(s_ℓ) + Σ_{i=L+1}^{L′} Σ_{j∈I_i} β_{ij} ψ_{ij}(s_ℓ),  ℓ = 1, 2, . . . , n.   (5)
Note that the measurement error ε_ℓ is tentatively left out; we focus on the true surface for the moment. If we consider the dense set of measurements at {S_ℓ}, we have the complete system of equations

Y_ℓ = f(S_ℓ) = Σ_{j∈I_L} α_j φ_j(S_ℓ) + Σ_{i=L+1}^{L′} Σ_{j∈I_i} β_{ij} ψ_{ij}(S_ℓ),  ℓ = 1, 2, . . . , N.   (6)

Note that the equations in (5) are a subset of the equations in (6). We introduce notation that will facilitate the discussion. Let Y = (f(S_1), f(S_2), . . . , f(S_N))^T, α = (α_1, α_2, . . . , α_{2^L})^T, and β = (β_{ij})^T, i.e., β is a column vector containing all the β_{ij}'s. Let 𝚽_1 = (φ_j(S_ℓ))_{ℓj} and 𝚽_2 = (ψ_{ij}(S_ℓ))_{ℓ,ij}; that is, matrix 𝚽_2 contains all the values ψ_{ij}(S_ℓ) in (6). The system (6) can be rewritten as

Y = 𝚽_1 α + 𝚽_2 β.   (7)
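To make the matrix form Y = Φ₁α + Φ₂β in (7) concrete, here is a small sketch (ours, not from the paper) that builds an orthogonal Haar synthesis matrix for N = 8 via a standard recursive factorization, splits its columns into a coarse block and a detail block, and verifies that the matrix form reproduces the signal.

```python
import numpy as np

def haar_synthesis(N):
    """Orthogonal Haar synthesis matrix (columns = scaling/wavelet vectors)."""
    if N == 1:
        return np.eye(1)
    H = haar_synthesis(N // 2)
    avg = np.kron(H, np.array([[1.0], [1.0]])) / np.sqrt(2.0)        # coarse part
    det = np.kron(np.eye(N // 2), np.array([[1.0], [-1.0]])) / np.sqrt(2.0)
    return np.hstack([avg, det])

W = haar_synthesis(8)
assert np.allclose(W.T @ W, np.eye(8))       # the transform is orthogonal
rng = np.random.default_rng(0)
Y = rng.standard_normal(8)
coef = W.T @ Y                               # analysis: wavelet coefficients
Phi1, Phi2 = W[:, :2], W[:, 2:]              # coarse block and detail block
alpha, beta = coef[:2], coef[2:]
assert np.allclose(Y, Phi1 @ alpha + Phi2 @ beta)   # matrix form of (7)
```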
Let y = (y_1, y_2, . . . , y_n)^T, i.e., y is the subvector of Y consisting of the measurements at {s_ℓ}_{ℓ=1}^n. We use y^c to denote the complement of y within Y. Let Φ_1 and Φ_2 denote the submatrices of rows of 𝚽_1 and 𝚽_2 whose membership is consistent with {s_1, s_2, . . . , s_n} being a subset of {S_1, S_2, . . . , S_N}. Eq. (5) is then equivalent to

y = Φ_1 α + Φ_2 β.   (8)

Moreover, let Φ_1^c and Φ_2^c denote the matrices formed by the remaining rows of 𝚽_1 and 𝚽_2, i.e., those not included in Φ_1 and Φ_2, respectively. The complete system (7) (or (6)) can then be written as

\begin{pmatrix} y \\ y^c \end{pmatrix} = \begin{pmatrix} \Phi_1 & \Phi_2 \\ \Phi_1^c & \Phi_2^c \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix}.   (9)
Suppose that a few samples have been taken. We introduce a wavelet-based interpolating scheme which interpolates the sampled points and adopts the minimum energy principle. Recall that β can be generated so as to ensure the Lipschitz regularity, as illustrated in Section 2.2. Then in (8), the only unknown variable is α, which satisfies Φ_1 α = y − Φ_2 β. To interpolate the measurements, we propose to use the minimum ℓ₂-norm solution for α, which solves the optimization problem

min_α ∥α∥₂²  subject to  Φ_1 α = y − Φ_2 β.   (10)

The solution to the above problem is presented in the following lemma.

Lemma 3.1. Eq. (10) is a quadratic programming problem, which has the closed-form solution

α = Φ_1^T (Φ_1 Φ_1^T)^{−1} (y − Φ_2 β).   (11)
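When Φ₁ has full row rank, the closed form (11) coincides with the Moore–Penrose least-norm solution of the constraint. A quick numerical sketch (ours; arbitrary random matrices stand in for the wavelet matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 4, 8, 5                     # toy sizes: n samples, m alphas, k betas
Phi1 = rng.standard_normal((n, m))    # full row rank with probability one
Phi2 = rng.standard_normal((n, k))
y = rng.standard_normal(n)
beta = rng.standard_normal(k)         # beta generated beforehand, as in Section 2.2
rhs = y - Phi2 @ beta
alpha = Phi1.T @ np.linalg.solve(Phi1 @ Phi1.T, rhs)   # closed form (11)
assert np.allclose(Phi1 @ alpha, rhs)                  # constraint of (10) holds
assert np.allclose(alpha, np.linalg.pinv(Phi1) @ rhs)  # minimum-norm solution
```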
The proof is relegated to the Appendix. Using Lemma 3.1, we propose the following wavelet-based interpolating algorithm:
1. Generate β such that each β_{ij} satisfies (3) for given A and γ. This ensures the Lipschitz property.
2. Given the sampled points y, apply Lemma 3.1 to obtain α.
3. Obtain the interpolating function at the dense set of measurement positions via (7).
The above interpolating algorithm will be called minimum energy interpolation. By letting N → ∞, one interpolates f(x) nearly everywhere.

4. Optimal sampling strategy

Based on the Lipschitz regularity-based model and the wavelet-based interpolating algorithm of Sections 2 and 3, we determine the optimal sampling positions {s_ℓ}_{ℓ=1}^n. To derive the optimal sampling strategy, we assume the existence of a true signal consisting of dense measurements. A sampling strategy is then equivalent to selecting a subset of the true signal. Based on the selected subset, the minimum energy interpolation of Section 3 is applied to obtain an estimated signal. The optimal strategy minimizes the norm of the difference between the true and estimated signals. Under our wavelet-based model, it turns out that the optimal sampling positions are where the scaling functions attain their maxima. The rest of this section presents the details of the justification of our optimal sampling strategy.

Recall from (9) that the complete matrix associated with the discrete wavelet transform,

\begin{pmatrix} \Phi_1 & \Phi_2 \\ \Phi_1^c & \Phi_2^c \end{pmatrix},   (12)

is orthogonal.
We suppose that the true surface at the sampling positions is y = Φ_1 α_0 + Φ_2 β_0, where α_0 and β_0 are the wavelet coefficients of the true surface. Recall from Section 3 that, when the minimum energy interpolating algorithm is applied, one needs to generate β. Let β̃ denote such a generated β. Then we have y = Φ_1 α + Φ_2 β̃. From Lemma 3.1, we have

α = Φ_1^T(Φ_1Φ_1^T)^{−1}(y − Φ_2β̃) = Φ_1^T(Φ_1Φ_1^T)^{−1}(Φ_1α_0 + Φ_2β_0 − Φ_2β̃)
  = Φ_1^T(Φ_1Φ_1^T)^{−1}Φ_1α_0 + Φ_1^T(Φ_1Φ_1^T)^{−1}Φ_2(β_0 − β̃).

Let y^c = Φ_1^c α + Φ_2^c β̃. Substituting the above α into y^c, and using the fact that the matrix in (12) is orthogonal, i.e., 0 = Φ_1^cΦ_1^T + Φ_2^cΦ_2^T, we have

y^c = Φ_1^cΦ_1^T(Φ_1Φ_1^T)^{−1}Φ_1α_0 − Φ_2^cΦ_2^T(Φ_1Φ_1^T)^{−1}Φ_2(β_0 − β̃) + Φ_2^cβ̃.

Then the difference between the interpolated values and the truth is

y^c − (Φ_1^cα_0 + Φ_2^cβ_0) = Φ_1^c[Φ_1^T(Φ_1Φ_1^T)^{−1}Φ_1 − I]α_0 + Φ_2^c[Φ_2^T(Φ_1Φ_1^T)^{−1}Φ_2 + I](β̃ − β_0).
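The error decomposition above can be verified numerically on a toy orthogonal matrix (our sketch; a random orthogonal matrix stands in for the wavelet transform matrix in (12)):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, m = 12, 4, 5                        # toy sizes: m scaling columns
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))   # orthogonal "wavelet" matrix
Phi1, Phi2 = Q[:n, :m], Q[:n, m:]         # sampled rows
Phi1c, Phi2c = Q[n:, :m], Q[n:, m:]       # remaining rows
alpha0, beta0 = rng.standard_normal(m), rng.standard_normal(N - m)
beta_t = rng.standard_normal(N - m)       # a generated beta-tilde
y = Phi1 @ alpha0 + Phi2 @ beta0          # true surface at sampled positions
alpha = Phi1.T @ np.linalg.solve(Phi1 @ Phi1.T, y - Phi2 @ beta_t)
yc = Phi1c @ alpha + Phi2c @ beta_t       # interpolated values elsewhere
M1 = Phi1.T @ np.linalg.solve(Phi1 @ Phi1.T, Phi1) - np.eye(m)
M2 = Phi2.T @ np.linalg.solve(Phi1 @ Phi1.T, Phi2) + np.eye(N - m)
diff = Phi1c @ (M1 @ alpha0) + Phi2c @ (M2 @ (beta_t - beta0))
assert np.allclose(yc - (Phi1c @ alpha0 + Phi2c @ beta0), diff)
```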
We consider the quantity ∥y^c − (Φ_1^cα_0 + Φ_2^cβ_0)∥₂², the squared norm of the above difference; it is desirable for this quantity to be small. Since n ≪ N, the matrix (Φ_1^c, Φ_2^c) comprises a large proportion of the orthogonal matrix in (12). Hence ∥y^c − (Φ_1^cα_0 + Φ_2^cβ_0)∥₂² is minimized when the norms of the coefficient vectors M_1α_0 and M_2(β̃ − β_0) are minimized, where M_1 = Φ_1^T(Φ_1Φ_1^T)^{−1}Φ_1 − I and M_2 = Φ_2^T(Φ_1Φ_1^T)^{−1}Φ_2 + I. Since α_0 and (β̃ − β_0) are prefixed, to minimize the norms of these coefficient vectors we need to minimize the eigenvalues of M_1 and M_2.

Since Φ_1^T(Φ_1Φ_1^T)^{−1}Φ_1 is a projection matrix, the eigenvalues of M_1 in absolute value are 0's and 1's. They are minimized (in fact, M_1 reduces to the zero matrix) when Φ_1 has full column rank. Recall that Φ_1 has 2^L columns. If the rank of Φ_1 is k, the multiplicity of absolute value one in Φ_1^T(Φ_1Φ_1^T)^{−1}Φ_1 − I is 2^L − k. Clearly, a larger k is more desirable; the maximal possible k is the sample size, i.e., Φ_1 has full row rank.

It is more delicate to study the eigenvalues of M_2. Recall that Φ_1Φ_1^T + Φ_2Φ_2^T = I; hence Φ_1Φ_1^T and Φ_2Φ_2^T can be diagonalized simultaneously. The singular value decompositions of Φ_1 and Φ_2 can consequently be written as Φ_1 = UD_1V_1, Φ_2 = UD_2V_2, where U, V_1, and V_2 are orthogonal and D_1, D_2 are diagonal. Moreover, we must have D_1² + D_2² = I. Then M_2 = Φ_2^T(Φ_1Φ_1^T)^{−1}Φ_2 + I = V_2^TD_2U^T(UD_1²U^T)^{−1}UD_2V_2 + I = V_2^T(D_2D_1^{−2}D_2 + I)V_2 = V_2^TD_1^{−2}V_2. Hence minimizing the eigenvalues of M_2 is equivalent to maximizing the eigenvalues of Φ_1Φ_1^T, which is a hard numerical problem.

We consider a heuristic approach instead. Fig. 1(a) presents sixteen scaling functions, corresponding to the Symmlet with 6 vanishing moments at the coarsest level L = 4. Each x-coordinate corresponds to a row in the system (7). Intuitively, the eigenvalues of the matrix Φ_1Φ_1^T are maximized if the matrix Φ_1 is diagonally dominant: the diagonal entries are much bigger in absolute value than the off-diagonal entries. In Fig. 1(a), this corresponds to finding locations (x-coordinates) at which one scaling function takes a large value while the other scaling functions take values close to zero. Sixteen such positions are marked by dash-dotted vertical lines in the figure: they are the optimal sampling positions.
To illustrate the advantage of the proposed optimal sampling strategy, we compare the optimal sample points with some non-optimal sample points in Fig. 1(b). We assume that the black curve is the noisy real data, and that 16 data points can be sampled to estimate the baseline curve. The red circles are the data points sampled using the proposed
Fig. 1. (a) Optimal sampling positions: scaling functions in the wavelet decomposition; the corresponding maximum positions are 0.056, 0.119, 0.181, 0.244, 0.306, 0.369, 0.431, 0.494, 0.556, 0.619, 0.681, 0.744, 0.806, 0.869, 0.931, and 0.994. (b) Optimal versus non-optimal sample points: comparison of two interpolating curves; the red one is obtained using the optimal sample points (red circles) and the blue one is obtained using some non-optimal sample points (blue circles). The black curve is the noisy real data. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
optimal sampling strategy (the sampling positions are shown in Fig. 1(a)), and the blue circles are some non-optimal sample points obtained by shifting the optimal sampling positions to the left by 16. Using the sampled points, baseline curves are then estimated with the interpolating algorithm proposed in Section 3. The red and blue curves are based on the red and blue circles, respectively. We observe that the red curve, obtained using the optimal sample points, is more appealing than the blue curve, which shows the benefit of the proposed optimal sampling strategy.

5. Conclusion

We propose a wavelet-based method for determining optimal sampling positions and inferring the underlying function when it is known to be Lipschitz. We use the wavelet framework as an intermediate tool to create a statistical model for Lipschitz functions, whereas most existing work uses wavelets as a nonparametric smoothing tool. In this paper, we have focused on the case where the underlying boundary is a 1-D function f on the unit interval [0, 1]. The method can be extended straightforwardly to a 2-D surface if we can assume that the surface is a tensor product of two 1-D functions, although in reality such an assumption may be too optimistic. The tensor product simply means that h(x, y) = f(x) · g(y), x, y ∈ [0, 1], where f(x) and g(y) are uniformly Lipschitz γ > 0.

Acknowledgements

This project was partially supported by the National Science Foundation of the United States. The authors appreciate the valuable comments and suggestions from the referees.

Appendix. Proof of Lemma 3.1

Proof. Writing the condition in (10) in a generic form, we have

min_α ∥α∥₂²,
subject to Aα = b,

where A is a matrix of full row rank and b is a vector. We consider its Lagrangian:

L(α, λ) = α^Tα + Σ_{i=1}^r λ_i(A_iα − b_i) = α^Tα + λ^T(Aα − b),

where λ_i is the ith Lagrange multiplier (also the ith component of the vector λ), A_i is the ith row of A, and b_i is the ith entry of b. The above achieves the minimum if and only if

2α + A^Tλ = 0,  Aα = b.

This can be rewritten as

\begin{pmatrix} 2I & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \lambda \end{pmatrix} = \begin{pmatrix} 0 \\ b \end{pmatrix}.

Hence we have

\begin{pmatrix} \alpha \\ \lambda \end{pmatrix} = \begin{pmatrix} 2I & A^T \\ A & 0 \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ b \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\left(I - A^T(AA^T)^{-1}A\right) & A^T(AA^T)^{-1} \\ (AA^T)^{-1}A & -2(AA^T)^{-1} \end{pmatrix} \begin{pmatrix} 0 \\ b \end{pmatrix} = \begin{pmatrix} A^T(AA^T)^{-1}b \\ -2(AA^T)^{-1}b \end{pmatrix}.

For our purpose, we have A = Φ_1 and b = y − Φ_2β, which leads to (11). □
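The KKT system in the proof can be checked numerically (our sketch, with a random full-row-rank A):

```python
import numpy as np

rng = np.random.default_rng(0)
r, m = 3, 6
A = rng.standard_normal((r, m))        # full row rank with probability one
b = rng.standard_normal(r)
G = A @ A.T
alpha = A.T @ np.linalg.solve(G, b)    # alpha  = A^T (A A^T)^{-1} b
lam = -2.0 * np.linalg.solve(G, b)     # lambda = -2 (A A^T)^{-1} b
assert np.allclose(2.0 * alpha + A.T @ lam, 0.0)   # stationarity: 2a + A^T l = 0
assert np.allclose(A @ alpha, b)                   # feasibility: A a = b
```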
References

[1] D.L. Donoho, I.M. Johnstone, Adapting to unknown smoothness via wavelet shrinkage, Journal of the American Statistical Association 90 (1995) 1200–1224.
[2] Y. Wang, Jump and sharp cusp detection via wavelets, Biometrika 82 (1995) 385–397.
[3] B. Vidakovic, Statistical Modeling by Wavelets, Wiley, New York, 1999.
[4] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, San Diego, CA, 1998.