Journal of Complexity 32 (2016) 122–136
Approximation of piecewise Hölder functions from inexact information

Paweł M. Morkisz a,∗, Leszek Plaskota b

a AGH University of Science and Technology, Faculty of Applied Mathematics, Al. A. Mickiewicza 30, 30-059 Krakow, Poland
b Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, ul. S. Banacha 2, 02-097 Warsaw, Poland
∗ Corresponding author. E-mail addresses: [email protected] (P. Morkisz), [email protected] (L. Plaskota).

Article history: Received 5 March 2015; Accepted 9 September 2015; Available online 25 September 2015.

Keywords: Numerical approximation; Noisy data; Piecewise smooth functions; Adaptive algorithms.

Abstract. We study the L_p-approximation of functions f consisting of two smooth pieces separated by an (unknown) singular point s_f; each piece is r times differentiable and the rth derivative is Hölder continuous with exponent ϱ. The approximations use n inexact function values y_i = f(x_i) + e_i with |e_i| ≤ δ. Let 1 ≤ p < ∞. We show that then the minimal worst case error is proportional to max(δ, n^{−(r+ϱ)}) in the class of functions with uniformly bounded Hölder coefficients and discontinuity jumps |f(s_f^+) − f(s_f^−)|. This error is achieved by an algorithm that uses a new adaptive mechanism to approximate s_f, in which the number of adaptively chosen points x_i is only O(ln n). The use of adaption, p < ∞, and the uniform bound on the Hölder coefficients and the discontinuity jumps are crucial. If we restrict the class even further to continuous functions, then the same worst case result can be achieved also for p = ∞ using no more than (r − 1)_+ adaptive points. The results generalize earlier results that were obtained for exact information (where δ = 0) and f^{(r)} piecewise Lipschitz (where ϱ = 1).

© 2015 Elsevier Inc. All rights reserved.

1. Introduction

The problem of approximating smooth functions based on information about n function values has been very well studied. Much less explored, but more important for applications, is approximation of
functions that are piecewise smooth only. Note that a function f is piecewise smooth if its domain can be divided into subdomains such that f is smooth in each subdomain. We say that a scalar function g is (r, ϱ)-smooth in an interval [a, b] if g ∈ C^r([a, b]) and g^{(r)} is Hölder continuous with exponent ϱ ∈ (0, 1]. We consider the space F_{r,ϱ} of T-periodic functions f that consist of two (r, ϱ)-smooth pieces separated by an unknown singular point s_f. We stress that the periodicity is assumed only to simplify the presentation; after some technical modifications, all the results hold also for nonperiodic functions, cf. [15].

The L_p-approximation in the space F_{r,ϱ} (periodic or non-periodic) based on function evaluations has so far been studied assuming Lipschitz continuity of the rth derivatives (that is, for ϱ = 1) and exact information about function values, see, e.g., [1,10,15–17]. In [16], classes of piecewise smooth functions contained in F_{r,1} were identified for which the minimal worst case error is proportional to n^{−(r+1)}, which is also the minimal worst case error for (globally) smooth functions, cf. [11,21]. This holds for p < ∞ and for the Skorohod metric, but not for p = ∞. However, to obtain this error level, the use of adaption is necessary. Optimal algorithms use an adaptive mechanism to detect essential singularities s_f, i.e., those with ‘large’ discontinuity jumps of the derivatives of f. This mechanism basically relies on using divided differences on a uniform mesh, and on the fact that the largest divided difference covers s_f if the mesh-size is small enough. Approximation of continuous functions from F_{r,ϱ} ∩ C was studied in [15], where it was shown that then the worst case error of level n^{−(r+1)} can be obtained even for uniform approximation. Related work has also been done for other problems such as numerical integration [14], solving ODEs [6–9], or solving SDEs [18,19]. For approximation of piecewise smooth functions using other methods (including spectral methods) see, e.g., [2–5,20].

In the present paper we generalize the results of [15,16] in two directions. We allow an arbitrary Hölder exponent ϱ ∈ (0, 1] and, more importantly, information is obtained with some error. This means that instead of the exact values f(x_i) for i = 1, 2, . . . , n one obtains perturbed values y_i = f(x_i) + e_i, where |e_i| ≤ δ, cf. [12,13].

It is not difficult to see that in the presence of perturbations the minimal worst case error is at least proportional to max(δ, n^{−(r+ϱ)}). The question is whether this worst case error can be achieved by a particular algorithm for reasonable classes F ⊂ F_{r,ϱ}. The answer is not obvious, since the adaptive algorithms developed in [15,16] do not work well for inexact information. Indeed, in the presence of noise the divided differences calculated from noisy function values do not say anything about the location of s_f if the mesh-size is too small. On the other hand, localization of the essential singularities is crucial for constructing efficient approximations of the function.

For that reason we develop a new version of the adaptive mechanism for detecting singularities. In this mechanism, the divided differences are computed only in a preliminary nonadaptive step on a uniform mesh of size O(n^{−1}). Then a special extrapolation technique is applied to approximate an essential singularity (if it exists) with sufficient accuracy. This technique uses only O(ln n) adaptively chosen points.
Moreover, for δ = 0, which corresponds to exact information, it is even simpler than the detection procedure of [15,16]. The improved detection mechanism allows us to construct an adaptive algorithm using n function evaluations with precision δ that possesses the desired error properties in the worst case setting. In particular, its worst case error is proportional to max(δ, n^{−(r+ϱ)}) in the class of functions with uniformly bounded Hölder coefficients and discontinuity jumps |f(s_f^+) − f(s_f^−)|. For this result to hold, the use of adaption, p < ∞, and the uniform boundedness of the Hölder coefficients and the discontinuity jumps are crucial. Moreover, if we restrict the class even further to continuous functions, then the same worst case result can be achieved also for p = ∞ using no more than (r − 1)_+ adaptive points.

The results of this paper can also be interpreted as follows. In order to obtain an ε-approximation in the worst case setting for the classes described above it is necessary and sufficient to use n = Ω(ε^{−1/(r+ϱ)}) function evaluations with precision δ = O(ε).

In Section 2 we formally define our approximation problem and provide lower bounds on the minimal error. In Section 3 we show some auxiliary results. The adaptive algorithm is presented and its cost and error analysis is provided in Sections 4 and 5, respectively. Section 6 contains results of some numerical experiments.
2. Preliminaries and main result

For an integer r ≥ 0, 0 < ϱ ≤ 1, and a < b, denote by H_{r,ϱ}(a, b) the space of functions g : [a, b] → R such that g ∈ C^r([a, b]) and g^{(r)} is Hölder continuous with exponent ϱ,

c(g) := \sup_{a \le x < y \le b} \frac{|g^{(r)}(x) - g^{(r)}(y)|}{|x - y|^{\varrho}} < \infty.
Given T > 0, let F_{r,ϱ} = F_{r,ϱ}(T) be the space of functions f : R → R satisfying the following condition: there exist s_f ∈ [0, T) and g_f ∈ H_{r,ϱ}(0, T) such that

f(\ell T + s_f + x) = g_f(x) \quad \text{for all } \ell = 0, \pm 1, \pm 2, \ldots \text{ and } x \in [0, T).   (1)
That is, f is a ‘copy’ of g_f on each of the intervals (ℓT + s_f, (ℓ + 1)T + s_f] and f is right-continuous at ℓT + s_f. Hence all points that differ from each other by a multiple of T will be considered identical. For instance, if 0 < x_1 ≤ T < x_2 ≤ 2T, then the interval (x_1, x_2] will be identified with (x_1, T] ∪ (0, x_2 − T] ⊂ (0, T].

Denote by ∆_f^{(j)} the discontinuity jumps of the successive derivatives of f at the singular point s_f,

\Delta_f^{(j)} = f^{(j)}(s_f^+) - f^{(j)}(s_f^-) = g_f^{(j)}(0) - g_f^{(j)}(T), \quad 0 \le j \le r.
We are interested in approximating functions f ∈ F_{r,ϱ} in the L_p-norm, where 1 ≤ p ≤ ∞. The approximations are provided by algorithms that use inexact (noisy) values of f at finitely many points. Specifically, information about f consists of n values

y_j = f(x_j) + e_j, \quad 1 \le j \le n,

where e_j denotes noise. The noise is deterministic and bounded, |e_j| ≤ δ, where δ is nonnegative.

We distinguish nonadaptive information, where the n points x_j are given independently of f, and adaptive information, where the choice of the successive points x_j (as well as their number n) depends on the previously obtained noisy values of f. That is, x_1 is fixed and

x_j = x_j(y_1, y_2, \ldots, y_{j-1}) \quad \text{for } j \ge 2.
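For the simulations in Section 6 it is convenient to model such information programmatically. The following Python sketch is only an illustration of ours (the class name and the uniformly distributed noise are our choices; the noise in the model above is adversarial, not random):

```python
import random

class NoisyOracle:
    """Returns inexact values y = f(x) + e with |e| <= delta and counts
    the number of evaluations used (illustrative sketch only)."""
    def __init__(self, f, delta):
        self.f = f
        self.delta = delta
        self.count = 0          # number of function evaluations so far
    def __call__(self, x):
        self.count += 1
        return self.f(x) + random.uniform(-self.delta, self.delta)
```

An adaptive algorithm may compute its next sample point x_j from the values y_1, . . . , y_{j−1} returned so far, while a nonadaptive one fixes all points in advance.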
Any information will be formally identified with a multi-valued operator N. Specifically, for a given f ∈ F_{r,ϱ} we denote by N(f) the set of all possible information y = (y_1, y_2, . . . , y_n) about f. Then N : F_{r,ϱ} → Y, where Y is the range of N, i.e., Y = ∪_{f ∈ F_{r,ϱ}} N(f). Note that y ∈ N(f) is a formal way of saying that y is information about f. For instance, for nonadaptive information consisting of noisy evaluations of f at x_1, . . . , x_n with precision δ we have

N(f) = \{\, y \in \mathbb{R}^n : |y_i - f(x_i)| \le \delta, \ 1 \le i \le n \,\}.

If δ = 0, then N(f) is a singleton and information is exact. Having computed information y about f, an approximation to f is given as ϕ(y), where
ϕ : Y → L_p(0, T) is an arbitrary mapping. For given f and information y ∈ N(f), the error of approximating f equals

\| f - \varphi(y) \|_{L_p} = \left( \int_0^T |(f - \varphi(y))(x)|^p \, dx \right)^{1/p} \quad \text{if } 1 \le p < \infty,
and

\| f - \varphi(y) \|_{L_\infty} = \operatorname*{ess\,sup}_{0 < x \le T} |(f - \varphi(y))(x)|.
The worst case error of an algorithm ϕ using information N in a given class F ⊂ F_{r,ϱ} is defined as

e^{\mathrm{wor}}_p(\varphi, N; F) = \sup_{f \in F} \ \sup_{y \in N(f)} \| f - \varphi(y) \|_{L_p}.
We will distinguish the following classes F:

K = { f = c 1_R : c ∈ R },
H_{r,ϱ} = { f ∈ F_{r,ϱ} : c(g_f) ≤ 1, ∆_f^{(j)} = 0 for all 0 ≤ j ≤ r },
F^C_{r,ϱ} = { f ∈ F_{r,ϱ} : c(g_f) ≤ 1, ∆_f^{(0)} = 0 },
F^D_{r,ϱ} = { f ∈ F_{r,ϱ} : c(g_f) ≤ 1, |∆_f^{(0)}| ≤ 1 },
F̄_{r,ϱ} = { f ∈ F_{r,ϱ} : c(g_f) ≤ 1 }.

Obviously

K ⊂ H_{r,ϱ} ⊂ F^C_{r,ϱ} ⊂ F^D_{r,ϱ} ⊂ F̄_{r,ϱ} ⊂ F_{r,ϱ}.

Denote by N(n, δ) the class of all (adaptive) information N that use at most n function evaluations, each with precision δ. Let

r^{\mathrm{wor}}_p(n, \delta; F) = \inf\{\, e^{\mathrm{wor}}_p(\varphi, N; F) : \varphi \text{ uses } N \in N(n, \delta) \,\}
be the minimal worst case error in the class F that can be achieved by algorithms using information about at most n function values with precision δ. The following lower bounds on r^{wor}_p(n, δ; F) are rather obvious or can be derived from known results.

Proposition 1. For any n and δ ≥ 0 we have
(i) r^{wor}_p(n, δ; K) ≥ δ T^{1/p},
(ii) r^{wor}_p(n, δ; H_{r,ϱ}) ≥ a_{r,ϱ} n^{−(r+ϱ)} for some a_{r,ϱ} > 0,
(iii) r^{wor}_p(n, δ; F_{r,ϱ}) = ∞ if r ≥ 1.
Proof. To see (i) it is enough to notice that y = (0, . . . , 0) is information about the constant functions f_{±δ} = ±δ for any N with precision δ. Hence the error of any algorithm using N is at least ∥f_{+δ} − f_{−δ}∥_{L_p}/2 = δ T^{1/p}.

Since r^{wor}_p(n, δ; F) ≥ r^{wor}_p(n, 0; F), inequality (ii) immediately follows from known results about the minimal error for exact information, see, e.g., [11].

To show (iii) we use an argument similar to that in [16, Sec. 5.2], where the non-periodic case with ϱ = 1 and δ = 0 was considered. That is, let S(M) ⊂ F_{r,ϱ} be the family of functions f_s parameterized by s ∈ [0, T),

f_s(x) = \frac{M}{T} \left( x \, 1_{[0,s)}(x) + (x - T) \, 1_{[s,T)}(x) \right), \quad 0 < x \le T.

Let N be any (adaptive) information using no more than n function evaluations. Since for any fixed x we have that f_s(x) assumes only two different values depending on whether s ≤ x or s > x, the total number of points used by N for the class S(M) is at most 2^n − 1. Hence there is an interval [s_1, s_2] ⊂ (0, T) of length T 2^{−(n+1)} that does not contain any of these points. This means that N(f_{s_1}) = N(f_{s_2}) and therefore the error of any algorithm using N is at least ∥f_{s_1} − f_{s_2}∥_{L_p}/2 = M (T 2^{−(n+p+1)})^{1/p}. Since M is arbitrarily large, (iii) follows.

It is well known that r^{wor}_p(n, 0; H_{r,ϱ}) ≤ A_{r,ϱ} n^{−(r+ϱ)} for some A_{r,ϱ} > 0, see again [11]. Proposition 1(iii) says that this result cannot be generalized to piecewise Hölder functions F_{r,ϱ} with r ≥ 1.
Therefore from now on we restrict our considerations to the class F^D_{r,ϱ} of piecewise Hölder functions with uniformly bounded discontinuity jumps ∆_f^{(0)}, and the narrower class F^C_{r,ϱ} of piecewise Hölder and continuous functions. (Observe that ∆_f^{(0)} is uniformly bounded in F̄_{0,ϱ}, since then for any f we have |∆_f^{(0)}| = |g_f(0) − g_f(T)| ≤ T^ϱ.) By Proposition 1(i)–(ii) we have

r^{\mathrm{wor}}_p(n, \delta; F^D_{r,\varrho}) \ \ge \ r^{\mathrm{wor}}_p(n, \delta; F^C_{r,\varrho}) \ \ge \ \max\big( \delta T^{1/p}, \ a_{r,\varrho} n^{-(r+\varrho)} \big).
In the remaining part of the paper we prove that these inequalities are sharp, except the first inequality for p = ∞. (Indeed, it follows from [16] that in this case r^{wor}_∞(n, δ; F^D_{r,ϱ}) does not converge to zero.) That is, we aim to prove the following theorem, which is the main result of this paper.

Theorem 1.

r^{\mathrm{wor}}_p(n, \delta; F^D_{r,\varrho}) = \Theta\big( \max(\delta, n^{-(r+\varrho)}) \big), \quad 1 \le p < \infty,
r^{\mathrm{wor}}_\infty(n, \delta; F^C_{r,\varrho}) = \Theta\big( \max(\delta, n^{-(r+\varrho)}) \big).
To reach our goal, we construct an algorithm that possesses the desired error properties. But first, we need to show some auxiliary results on the error of interpolation/extrapolation of piecewise Hölder functions based on inexact function values.

3. Auxiliary results

Let m ≥ 2r + 1 and h = T/m. Let t_i = ih for all i. Denote by d_i the (exact) (r + 1)st order divided differences of f that are based on the values f(t_j),

d_i = f[t_i, \ldots, t_{i+r+1}] = \sum_{j=i}^{i+r+1} f(t_j) \prod_{\substack{k=i \\ k \ne j}}^{i+r+1} (t_k - t_j)^{-1},

and by d̃_i the (noisy) (r + 1)st order divided differences of f that are based on the values y_j = f(t_j) + e_j with |e_j| ≤ δ,

\tilde{d}_i = \tilde{f}[t_i, \ldots, t_{i+r+1}] = \sum_{j=i}^{i+r+1} y_j \prod_{\substack{k=i \\ k \ne j}}^{i+r+1} (t_k - t_j)^{-1}.
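In code, both d_i and d̃_i can be evaluated directly from this formula. The following Python sketch (our illustration, not taken from the paper) computes a divided difference from arbitrary distinct nodes and given, possibly noisy, values:

```python
def divided_difference(ts, ys):
    """Divided difference f[t_0, ..., t_n] computed from the formula above;
    ts are distinct nodes and ys the (possibly noisy) values at them."""
    d = 0.0
    for j in range(len(ts)):
        w = 1.0
        for k in range(len(ts)):
            if k != j:
                w *= ts[k] - ts[j]   # product of (t_k - t_j) over k != j
        d += ys[j] / w
    return d
```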
Lemma 1. If f ∈ H_{r,ϱ}(t_i, t_{i+r+1}), then

|\tilde{d}_i| \le \frac{c(g_f)(r+1)^{\varrho}}{(r+1)!} \, h^{\varrho-1} + \delta \, \frac{2^{r+1}}{(r+1)!} \, h^{-(r+1)}.
Proof. We use the triangle inequality

|\tilde{d}_i| \le |d_i| + |\tilde{d}_i - d_i|.

For the first component of the sum above we have

|d_i| = \frac{|f[t_{i+1}, \ldots, t_{i+r+1}] - f[t_i, \ldots, t_{i+r}]|}{t_{i+r+1} - t_i}
      = \frac{1}{r!} \, \frac{|f^{(r)}(\xi_1) - f^{(r)}(\xi_2)|}{t_{i+r+1} - t_i}
      \le \frac{c(g_f)}{r!} \, \frac{|\xi_1 - \xi_2|^{\varrho}}{t_{i+r+1} - t_i}
      \le \frac{c(g_f)}{r!} \, (t_{i+r+1} - t_i)^{\varrho-1}
      = \frac{c(g_f)(r+1)^{\varrho}}{(r+1)!} \, h^{\varrho-1},
and for the second component

|\tilde{d}_i - d_i| = h^{-(r+1)} \left| \sum_{j=0}^{r+1} e_{i+j} \prod_{\substack{\ell=0 \\ \ell \ne j}}^{r+1} (j - \ell)^{-1} \right|
\le \delta \, h^{-(r+1)} \sum_{j=0}^{r+1} \prod_{\substack{\ell=0 \\ \ell \ne j}}^{r+1} |j - \ell|^{-1}
= \delta \, \frac{2^{r+1}}{(r+1)!} \, h^{-(r+1)},

as claimed.
We now estimate the error of polynomial interpolation and extrapolation in the presence of noise. Let p_i and p̃_i be the polynomials of degree at most r interpolating f based on, respectively, exact and noisy values of f at t_i, t_{i+1}, . . . , t_{i+r}. For r ≥ 1 we define

\Lambda_r = \max_{0 \le t \le r} \sum_{k=0}^{r} \prod_{\substack{\ell=0 \\ \ell \ne k}}^{r} \left| \frac{t - \ell}{k - \ell} \right|,
\qquad
\beta_r = \max_{0 \le t \le r} \left| \prod_{k=0}^{r} (t - k) \right|,
\qquad
\bar{\Lambda}_r = \sum_{k=0}^{r} \prod_{\substack{\ell=0 \\ \ell \ne k}}^{r} \left| \frac{2r + 1 - \ell}{k - \ell} \right|.
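These constants depend only on r and can be tabulated numerically. A small sketch of ours (a dense grid stands in for the exact maxima):

```python
import math

def lam(r, t):
    # sum over k of prod_{l != k} |(t - l)/(k - l)|, cf. the definitions above
    return sum(math.prod(abs((t - l) / (k - l))
                         for l in range(r + 1) if l != k)
               for k in range(r + 1))

def constants(r, grid=10**4):
    ts = [r * i / grid for i in range(grid + 1)]
    Lambda_r = max(lam(r, t) for t in ts)
    beta_r = max(abs(math.prod(t - k for k in range(r + 1))) for t in ts)
    Lambda_bar_r = lam(r, 2 * r + 1)
    return Lambda_r, beta_r, Lambda_bar_r

print(constants(3))  # e.g. r = 3
```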
Lemma 2. Let f ∈ H_{0,ϱ}. Then for x ∈ [t_{i−1/2}, t_{i+1/2}]

|f(x) - \tilde{p}_i(x)| \le C_{0,\varrho}(f) \, h^{\varrho} + \delta, \qquad C_{0,\varrho}(f) = c(g_f) \, 2^{-\varrho},

and for x ∈ [t_{i−1}, t_{i−1/2}) ∪ (t_{i+1/2}, t_{i+1}]

|f(x) - \tilde{p}_i(x)| \le \bar{C}_{0,\varrho}(f) \, h^{\varrho} + \delta, \qquad \bar{C}_{0,\varrho}(f) = c(g_f).

Let f ∈ H_{r,ϱ} and r ≥ 1. Then for x ∈ [t_i, t_{i+r}]

|f(x) - \tilde{p}_i(x)| \le C_{r,\varrho}(f) \, h^{r+\varrho} + \delta \Lambda_r, \qquad C_{r,\varrho}(f) = c(g_f) \, \frac{\beta_r}{r^{1-\varrho} \, r!},

and for x ∈ [t_{i−r−1}, t_i) ∪ (t_{i+r}, t_{i+2r+1}]

|f(x) - \tilde{p}_i(x)| \le \bar{C}_{r,\varrho}(f) \, h^{r+\varrho} + \delta \bar{\Lambda}_r, \qquad \bar{C}_{r,\varrho}(f) = c(g_f) \, \frac{(2r+1)! \, (2r+1)^{\varrho}}{r \, (r!)^2}.
Proof. Since the case r = 0 is obvious, we only consider r ≥ 1. We again use the triangle inequality

|f(x) - \tilde{p}_i(x)| \le |f(x) - p_i(x)| + |\tilde{p}_i(x) - p_i(x)|.

If x ∈ [t_i, t_{i+r}], then for the first component of the sum above we have

|f(x) - p_i(x)| = |(x - t_i) \cdots (x - t_{i+r}) \, f[t_i, \ldots, t_{i+r}, x]|
\le \beta_r h^{r+1} \, \frac{|f[t_{i+1}, \ldots, t_{i+r}, x] - f[t_i, \ldots, t_{i+r-1}, x]|}{t_{i+r} - t_i}
\le \beta_r h^{r+1} \, \frac{c(g_f)}{r^{1-\varrho} \, r!} \, h^{\varrho-1} = C_{r,\varrho}(f) \, h^{r+\varrho},

and for the second component

|\tilde{p}_i(x) - p_i(x)| = \left| \sum_{k=i}^{i+r} e_k \prod_{\substack{\ell=i \\ \ell \ne k}}^{i+r} \frac{x - t_\ell}{t_k - t_\ell} \right| \le \delta \Lambda_r,   (2)

as claimed. In the case x ∈ [t_{i−r−1}, t_i) ∪ (t_{i+r}, t_{i+2r+1}] we proceed similarly.
Lemma 3. Let f ∈ F_{r,ϱ} with

s_f \in \begin{cases} (t_{i-1/2}, t_{i+1/2}] & \text{if } r = 0, \\ (t_i, t_{i+r}] & \text{if } r \ge 1. \end{cases}

Suppose that

|\tilde{d}_k| \le B h^{\varrho-1} \quad \text{for all } k.   (3)

Then for all x ∈ [t_{i−1}, t_{i+1}] if r = 0, or for all x ∈ [t_{i−r−1}, t_{i+2r+1}] if r ≥ 1, we have

|f(x) - \tilde{p}_i(x)| \le D_r(B, f) \, h^{r+\varrho} + \delta \Lambda_r,

where D_0(B, f) = c(g_f) + B and

D_r(B, f) = c(g_f) \, \frac{\beta_r (r+1)^{\varrho}}{r \, r!} + B \, (2^{r+1} - 1) \, \frac{(2r)!}{(r-1)!} \quad \text{for } r \ge 1.
Proof. We only consider s_f ≤ x, since the other case is similar. If r = 0, then

|f(x) - \tilde{p}_i(x)| \le |f(x) - p_{i+1}(x)| + |p_{i+1}(x) - \tilde{p}_{i+1}(x)| + |\tilde{p}_{i+1}(x) - \tilde{p}_i(x)| \le c(g_f) h^{\varrho} + \delta + B h^{\varrho} = (c(g_f) + B) h^{\varrho} + \delta,

as claimed. Assume now that r ≥ 1 and s_f ≤ x < t_{i+r}. Choose the smallest index j such that s_f ≤ t_j. Obviously i + 1 ≤ j ≤ i + r and x ∈ [t_{j−1}, t_{j+r}]. Then

|f(x) - \tilde{p}_i(x)| \le |f(x) - p_j(x)| + |p_j(x) - \tilde{p}_j(x)| + |\tilde{p}_j(x) - \tilde{p}_i(x)|.

Since s_f ∉ (t_j, t_{j+r}], we have

|f(x) - p_j(x)| \le c(g_f) \beta_r h^{r+1} \, \frac{1}{r!} \, \frac{(t_{j+r} - t_{j-1})^{\varrho}}{t_{j+r} - t_j} = C_{r,\varrho}(f) \left( 1 + \frac{1}{r} \right)^{\varrho} h^{r+\varrho}.   (4)

As in (2) we also have

|p_j(x) - \tilde{p}_j(x)| \le \delta \Lambda_r.   (5)
We estimate the remaining component |p̃_j(x) − p̃_i(x)|. For i + r + 1 ≤ k ≤ j + r,

(\tilde{f} - \tilde{p}_i)[t_i, \ldots, t_{i+r}, t_k] = \frac{y_k - \tilde{p}_i(t_k)}{(k-i)(k-i-1) \cdots (k-i-r) \, h^{r+1}}.   (6)

On the other hand,

|(\tilde{f} - \tilde{p}_i)[t_i, \ldots, t_{i+r}, t_k]| = |\tilde{f}[t_i, \ldots, t_{i+r}, t_k]| \le \max_{i \le \ell \le k-r-1} |\tilde{d}_\ell| \le B h^{\varrho-1},   (7)

where the first inequality follows from [15, Lemma 1] and the second one from (3). Then (6) and (7) give

|y_k - \tilde{p}_i(t_k)| \le \frac{(2r)!}{(r-1)!} \, B h^{r+\varrho}.   (8)

Now we write

|\tilde{p}_j(x) - \tilde{p}_i(x)| = \left| \sum_{k=j}^{j+r} (\tilde{p}_j(t_k) - \tilde{p}_i(t_k)) \prod_{\substack{\ell=j \\ \ell \ne k}}^{j+r} \frac{x - t_\ell}{t_k - t_\ell} \right|
\le \max_{j \le k \le j+r} |y_k - \tilde{p}_i(t_k)| \ \max_{0 \le t \le r+1} \sum_{k=0}^{r} \prod_{\substack{\ell=0 \\ \ell \ne k}}^{r} \left| \frac{t - \ell}{k - \ell} \right|.   (9)
The first max in (9) is estimated by (8). The second max is attained at t = r + 1 and equals

\sum_{k=0}^{r} \prod_{\substack{\ell=0 \\ \ell \ne k}}^{r} \frac{r + 1 - \ell}{|k - \ell|} = \sum_{k=0}^{r} \binom{r+1}{k} = 2^{r+1} - 1.

Hence

|\tilde{p}_j(x) - \tilde{p}_i(x)| \le \frac{(2r)!}{(r-1)!} \, (2^{r+1} - 1) \, B h^{r+\varrho}.   (10)
Summing up the right hand sides of (4), (5) and (10) we obtain the desired bound. For t_{i+r} ≤ x ≤ t_{i+2r+1} we proceed similarly as for s_f ≤ x < t_{i+r}; the only difference is that we take the polynomials p_j and p̃_j with j = i + r. The proof is complete.

4. The algorithm

We now describe an algorithm that uses at most n function evaluations with precision δ and has the worst case error proportional to max(δ, n^{−(r+ϱ)}) in the class F^D_{r,ϱ} for p < ∞, and in the class
F^C_{r,ϱ} for p ≤ ∞. The crucial parameter of the algorithm is h = T/m with m ≥ 2r + 1, which is the initial resolution of the mesh. The algorithm also uses ω = ω(h) satisfying 0 < ω ≤ (r + 1)h.

The algorithm first approximates the singular point s_f. This is done in three steps. In Step 1 the regular grid of size h and divided differences are used to localize s_f in an interval [u_1, v_1] of length (r + 1)h. In Step 2 this interval is further narrowed down to [u_2, v_2] of length ω, using extrapolation polynomials p̃_− and p̃_+ from the left of u_1 and from the right of v_1, respectively. The final Step 3 produces an interval [u_3, v_3] ⊆ [u_2, v_2] in which the difference |p̃_+ − p̃_−| is nonincreasing in [u_3, ξ] and nondecreasing in [ξ, v_3], where ξ is the final approximation of s_f. It will be clear from the analysis in Section 5 that the monotonicity is necessary to approximate the function in [u_1, v_1] with the desired accuracy. We also stress that if the singularity s_f is essential (i.e., relatively large compared to h and δ), then s_f ∈ [u_3, v_3]; otherwise it can be neglected.

The three steps can be formally described as follows.¹ As before, t_i = ih for all i.

Step 1. Evaluate the divided differences d̃_i = f̃[t_i, . . . , t_{i+r+1}] for 1 ≤ i ≤ m, and find

i^* = \arg\max_{1 \le i \le m} |\tilde{d}_i|.
Return u_1 := t_{i*} and v_1 := t_{i*+r+1}.

Step 2. Denote by p̃_− and p̃_+ the polynomials of degree ≤ r that interpolate the data (t_j, f̃(t_j)) for i* − r ≤ j ≤ i*, and for i* + r + 1 ≤ j ≤ i* + 2r + 1, respectively. Do the following iteration.

    u := u_1, v := v_1
    while v − u > ω do
        z_j := u + j(v − u)/(r + 2),  j = 1, 2, . . . , r + 1
        j* := arg max_{1 ≤ j ≤ r+1} |p̃_+(z_j) − p̃_−(z_j)|
        if |f̃(z_{j*}) − p̃_−(z_{j*})| ≤ |f̃(z_{j*}) − p̃_+(z_{j*})| then u := z_{j*} else v := z_{j*}
    end while

Return u_2 := u and v_2 := v.
¹ By arg max_j ψ_j (or arg min_j ψ_j) we mean any argument j maximizing (or minimizing) ψ_j over j.
Step 3. Do another iteration.

    u := u_2, v := v_2
    while there is a local max of |p̃_+ − p̃_−| in (u, v) do
        z := largest local max of |p̃_+ − p̃_−| in (u, v)
        if |f̃(z) − p̃_−(z)| ≤ |f̃(z) − p̃_+(z)| then u := z else v := z
    end while

Return u_3 := u and v_3 := v.

The final approximation to s_f is

\xi := \arg\min_{u_3 \le x \le v_3} |\tilde{p}_+(x) - \tilde{p}_-(x)|.
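The following Python sketch puts the three steps together. It is a simplification under several assumptions of ours: `oracle` returns a noisy value at each call (e.g., an instance of the NoisyOracle sketched in Section 2), the few extra samples needed near the boundary are taken directly instead of using periodicity, and Step 3 together with the final ξ is replaced by a direct grid minimization of |p̃_+ − p̃_−| on [u_2, v_2].

```python
import math

def lagrange(ts, ys):
    """Polynomial of degree <= len(ts)-1 interpolating the data (ts, ys)."""
    def p(x):
        s = 0.0
        for tk, yk in zip(ts, ys):
            w = 1.0
            for tl in ts:
                if tl != tk:
                    w *= (x - tl) / (tk - tl)
            s += yk * w
        return s
    return p

def detect_singularity(oracle, T, m, r, omega):
    h = T / m
    t = lambda i: i * h
    y = {i: oracle(t(i)) for i in range(-r, m + 2 * r + 2)}

    # Step 1: the largest noisy (r+1)st divided difference localizes s_f.
    def dd(i):
        return sum(y[i + j] / math.prod((k - j) * h
                   for k in range(r + 2) if k != j) for j in range(r + 2))
    i_star = max(range(1, m + 1), key=lambda i: abs(dd(i)))
    u, v = t(i_star), t(i_star + r + 1)

    # Extrapolating polynomials from the left of u_1 and the right of v_1.
    left = range(i_star - r, i_star + 1)
    right = range(i_star + r + 1, i_star + 2 * r + 2)
    p_minus = lagrange([t(j) for j in left], [y[j] for j in left])
    p_plus = lagrange([t(j) for j in right], [y[j] for j in right])

    # Step 2: narrow [u, v] down to length omega; one new sample per pass.
    while v - u > omega:
        zs = [u + j * (v - u) / (r + 2) for j in range(1, r + 2)]
        z = max(zs, key=lambda s: abs(p_plus(s) - p_minus(s)))
        fz = oracle(z)
        if abs(fz - p_minus(z)) <= abs(fz - p_plus(z)):
            u = z
        else:
            v = z

    # Step 3 and the final xi, approximated here by a grid search.
    grid = [u + k * (v - u) / 1000 for k in range(1001)]
    xi = min(grid, key=lambda s: abs(p_plus(s) - p_minus(s)))
    return u, v, xi, p_minus, p_plus
```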
Let N*_h be the information operator corresponding to our algorithm. An approximation ϕ*_h(y_h) to f for information y_h about f, i.e., for y_h ∈ N*_h(f), is now constructed as follows. In the interval [u_1, v_1) we extrapolate,

\varphi^*_h(y_h)(x) = \begin{cases} \tilde{p}_-(x) & \text{if } u_1 \le x < \xi, \\ \tilde{p}_+(x) & \text{if } \xi \le x < v_1. \end{cases}
Outside of [u_1, v_1) we apply piecewise polynomial interpolation of degree r that is based on r + 1 consecutive points t_i, . . . , t_{i+r} such that x ∈ [t_i, t_{i+r}) (or |x − t_i| ≤ h/2 if r = 0) and t_j ∉ (u_1, v_1) for i ≤ j ≤ i + r.

Remark 1. Observe that in Step 1 the sample points are selected nonadaptively, while in Step 2 and Step 3 they are chosen adaptively. Hence our algorithm is in general adaptive. For r = 0 or r = 1 the polynomial p̃_+ − p̃_− is of degree 0 or 1, respectively. Then |p̃_+ − p̃_−| does not have any local maximum in (u, v) and Step 3 is omitted.

5. Cost and error analysis

Our algorithm uses m function values in Step 1. In each iteration of Step 2 and Step 3 only one value is used. Hence the number of points used is at most

\left\lceil \frac{\ln\big( (r+1)h / \omega(h) \big)}{\ln\big( (r+2)/(r+1) \big)} \right\rceil

in Step 2 and at most (r − 1)_+ in Step 3. Hence, if ω = ω(h) ≥ kh^α for some fixed k and α, then the worst case number of points used by the algorithm asymptotically equals m = T/h, as h → 0^+.
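For concreteness, this Step 2 bound can be evaluated numerically; each iteration shrinks v − u at worst by the factor (r + 1)/(r + 2), which gives the O(ln n) count claimed in the Introduction. A small sketch of ours, with the choice ω(h) = h^{(r+ϱ)p+1} used in Proposition 2 below:

```python
import math

def step2_iterations(m, r, rho, p, T=1.0):
    h = T / m
    omega = h ** ((r + rho) * p + 1)
    return math.ceil(math.log((r + 1) * h / omega)
                     / math.log((r + 2) / (r + 1)))

for m in (10**2, 10**4, 10**6):
    print(m, step2_iterations(m, r=2, rho=0.5, p=2))
# the count grows like ln m although m grows by factors of 100
```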
5.1. Pointwise error

We now analyze the error |f(x) − ϕ*_h(y_h)(x)| for each x. We consider several cases depending on the location of the singularity s_f. Note that

(u_3, v_3] ⊆ (u_2, v_2] ⊆ (u_1, v_1],

where u_i, v_i for i = 1, 2, 3 are the points returned by Steps 1–3 of our algorithm, respectively. For Case A below we assume that

\delta \le b h^{r+\varrho} \quad \text{for some constant } b > 0.   (11)

The reason for this assumption will become clear later, see Remark 2.

Case A: s_f ∉ (u_1, v_1]. Then Lemma 1 together with (11) implies that for all i

|\tilde{d}_i| \le B_r(b, f) \, h^{\varrho-1} \quad \text{with} \quad B_r(b, f) = \frac{c(g_f)(r+1)^{\varrho} + b \, 2^{r+1}}{(r+1)!}.
Suppose that ϕ*_h(y)(x) = p̃_i(x) for some i. If x ∉ [u_1, v_1), then x ∈ [t_i, t_{i+r}) (or x ∈ [t_{i−1/2}, t_{i+1/2}) if r = 0) and [t_i, t_{i+r}] ∩ (u_1, v_1) = ∅. Then by Lemmas 2 and 3 the error of approximation at x is upper bounded by C_{r,ϱ}(f)h^{r+ϱ} + δΛ_r if s_f ∉ (t_i, t_{i+r}] (or if s_f ∉ (t_{i−1/2}, t_{i+1/2}] for r = 0), and by E_r(f)h^{r+ϱ} + δΛ_r otherwise, where E_r(f) = D_r(B_r(b, f), f).
On the other hand, if x ∈ [u_1, v_1), then again by Lemma 3 the error is bounded by C̄_{r,ϱ}(f)h^{r+ϱ} + δΛ̄_r if s_f ∉ (t_i, t_{i+r}] (or s_f ∉ (t_{i−1/2}, t_{i+1/2}] if r = 0), and by E_r(f)h^{r+ϱ} + δΛ_r otherwise.

Case B: s_f ∈ (u_1, v_1]. Without loss of generality we can assume that u_1 < s_f ≤ ξ ≤ v_1. By Lemma 2, for x ∉ [u_1, v_1) the error is upper bounded by C_{r,ϱ}(f)h^{r+ϱ} + δΛ_r, and for x ∈ [u_1, s_f) ∪ [ξ, v_1) by C̄_{r,ϱ}(f)h^{r+ϱ} + δΛ̄_r. Therefore from now on we assume that

x ∈ [s_f, ξ),   (12)

which is the most interesting case. We have three possibilities.

Case B1: s_f ∈ (u_1, v_1] \ (u_2, v_2]. Then in one of the iterations of Step 2 we must have had
|f̃(z_{j*}) − p̃_−(z_{j*})| ≤ |f̃(z_{j*}) − p̃_+(z_{j*})| and s_f ∈ (u, z_{j*}]. This implies that for all 1 ≤ j ≤ r + 1

|\tilde{p}_+(z_j) - \tilde{p}_-(z_j)| \le |\tilde{p}_+(z_{j^*}) - \tilde{p}_-(z_{j^*})| \le |\tilde{f}(z_{j^*}) - \tilde{p}_-(z_{j^*})| + |\tilde{f}(z_{j^*}) - \tilde{p}_+(z_{j^*})| \le 2 |\tilde{f}(z_{j^*}) - \tilde{p}_+(z_{j^*})| \le 2 \big( C_{r,\varrho}(f) h^{r+\varrho} + \delta \Lambda_r \big).

Using the fact that if a polynomial p of degree at most r satisfies |p(z_j)| ≤ a for 1 ≤ j ≤ r + 1, then |p(x)| ≤ (2^{r+1} − 1)a for all u_1 ≤ x < v_1, cf. (9), we obtain

|f(x) - \varphi^*_h(y)(x)| = |f(x) - \tilde{p}_-(x)| \le |f(x) - \tilde{p}_+(x)| + |\tilde{p}_+(x) - \tilde{p}_-(x)| \le 2^{r+1} \big( C_{r,\varrho}(f) h^{r+\varrho} + \delta \Lambda_r \big).

Case B2: s_f ∈ (u_2, v_2] \ (u_3, v_3]. Then in one of the iterations of Step 3 we must have had
|f̃(z) − p̃_−(z)| ≤ |f̃(z) − p̃_+(z)| and s_f ∈ (u, z].   (13)

This and (12) imply that

|\tilde{p}_+(x) - \tilde{p}_-(x)| \le \max\big( |\tilde{p}_+(z) - \tilde{p}_-(z)|, \ |\tilde{p}_+(s_f) - \tilde{p}_-(s_f)| \big),

since otherwise z would not be the largest local maximum of |p̃_+ − p̃_−| in the interval (u, v). Next, we have

|\tilde{p}_+(s_f) - \tilde{p}_-(s_f)| \le |f(s_f^+) - \tilde{p}_+(s_f)| + |f(s_f^-) - \tilde{p}_-(s_f)| + |f(s_f^+) - f(s_f^-)| \le 2 \big( \bar{C}_{r,\varrho}(f) h^{r+\varrho} + \delta \bar{\Lambda}_r \big) + |\Delta_f^{(0)}|,

and by (13)

|\tilde{p}_+(z) - \tilde{p}_-(z)| \le 2 |\tilde{f}(z) - \tilde{p}_+(z)| \le 2 \big( \bar{C}_{r,\varrho}(f) h^{r+\varrho} + \delta \bar{\Lambda}_r \big).

Finally,

|f(x) - \varphi^*_h(y)(x)| = |f(x) - \tilde{p}_-(x)| \le |f(x) - \tilde{p}_+(x)| + |\tilde{p}_+(x) - \tilde{p}_-(x)| \le 3 \big( \bar{C}_{r,\varrho}(f) h^{r+\varrho} + \delta \bar{\Lambda}_r \big) + |\Delta_f^{(0)}|.
Case B3: s_f ∈ (u_3, v_3]. Since the function |p̃_+ − p̃_−| does not have any local maximum in (u_3, v_3), it is non-increasing in [s_f, ξ). Hence again

|\tilde{p}_+(x) - \tilde{p}_-(x)| \le |\tilde{p}_+(s_f) - \tilde{p}_-(s_f)| \le 2 \big( \bar{C}_{r,\varrho}(f) h^{r+\varrho} + \delta \bar{\Lambda}_r \big) + |\Delta_f^{(0)}|

and

|f(x) - \varphi^*_h(y)(x)| = |f(x) - \tilde{p}_-(x)| \le |f(x) - \tilde{p}_+(x)| + |\tilde{p}_+(x) - \tilde{p}_-(x)| \le 3 \big( \bar{C}_{r,\varrho}(f) h^{r+\varrho} + \delta \bar{\Lambda}_r \big) + |\Delta_f^{(0)}|.

We now draw conclusions from our analysis, for the worst case and for the asymptotic settings.

5.2. Worst case setting

The pointwise error analysis can be summarized as follows. Suppose that (11) holds, i.e., δ ≤ bh^{r+ϱ}. Then the error |f(x) − ϕ*_h(y_h)(x)| is proportional to max(1, c(g_f))h^{r+ϱ} for x ∉ (u_2, v_2], and proportional to max(1, c(g_f))h^{r+ϱ} + |∆_f^{(0)}| for x ∈ (u_2, v_2], where v_2 − u_2 ≤ ω. We also have that the number n of function evaluations is proportional to h^{−1}, so that h^{r+ϱ} is proportional to n^{−(r+ϱ)}. These observations immediately yield the following result.

Proposition 2. Let 1 ≤ p < ∞. If δ satisfies (11) and ω(h) = h^{(r+ϱ)p+1}, then

e^{\mathrm{wor}}_p(\varphi^*_h, N^*_h; F^D_{r,\varrho}) = O(n^{-(r+\varrho)}).

(The choice of ω ensures that the interval (u_2, v_2], on which the error may additionally be of the order of the jump |∆_f^{(0)}| ≤ 1, contributes only O(ω^{1/p}) = O(h^{r+ϱ+1/p}) to the L_p error.)
It is worthwhile to mention that in the presence of singularities the upper bound of Proposition 2 cannot be achieved by nonadaptive algorithms, as was shown in [16]. It is also shown in [16] that for p = ∞ there is no algorithm with error converging to zero; hence the assumption p < ∞ is crucial.

Proposition 2 obviously holds also for the class F^C_{r,ϱ} ⊂ F^D_{r,ϱ}. However, since then ∆_f^{(0)} = 0, we can simplify our algorithm by putting ω(h) = (r + 1)h, thus avoiding the iterations in Step 2. For r = 0, 1 the algorithm is then nonadaptive, and for r ≥ 2 it uses at most r − 1 adaptively chosen points, independently of how small h is. Moreover, the upper bound holds even for p = ∞.

Proposition 3. If δ satisfies (11) and ω(h) = (r + 1)h, then

e^{\mathrm{wor}}_\infty(\varphi^*_h, N^*_h; F^C_{r,\varrho}) = O(n^{-(r+\varrho)}).
As shown in [15], for r ≥ 2 the use of adaptive information is crucial. Propositions 2 and 3 together with Proposition 1(i) already yield Theorem 1 of Section 2. Indeed, for given δ and n it is enough to set h = T /m with
m = m(n, \delta) = \min\left( \beta n, \ \frac{1}{T} \left( \frac{b}{\delta} \right)^{1/(r+\varrho)} \right) = \Theta\big( \min(n, \ \delta^{-1/(r+\varrho)}) \big)

for some fixed b > 0 and 0 < β < 1.
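In code, this parameter choice is a one-liner; the helper below is hypothetical (the name, the rounding, and the treatment of δ = 0 are our choices):

```python
import math

def choose_m(n, delta, r, rho, T, b=1.0, beta=0.5):
    """Mesh parameter m from the budget n and noise level delta,
    following the formula above; for delta = 0 only n is binding."""
    if delta == 0:
        return int(beta * n)
    return min(int(beta * n), math.ceil((b / delta) ** (1 / (r + rho)) / T))
```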
Remark 2. Observe that for fixed precision δ it does not make sense to use more than m_max = Θ(δ^{−1/(r+ϱ)}) function evaluations, cf. (11). Indeed, for m = m_max the maximum accuracy δ is achieved and a further increase of m does not decrease the error; see also the tests of Section 6.

[Fig. 1. Test functions f_1 and f_2.]

5.3. Asymptotic setting

Suppose that we are interested not in the worst case error, but rather in the behavior of the error for each individual function f ∈ F_{r,ϱ} and information y_h ∈ N*_h(f), as h → 0^+. Then simplified versions
of our algorithm achieve the rate n^{−(r+ϱ)}. Indeed, since for fixed f the influence of the size of the discontinuity jump ∆_f^{(0)} on the overall error disappears as h → 0^+, for the asymptotic error to converge as n^{−(r+ϱ)} it is again not necessary to perform Step 2. Therefore we have the following result.

Proposition 4. Let 1 ≤ p < ∞. If δ satisfies (11) and ω(h) = (r + 1)h, then for any f ∈ F_{r,ϱ} and y_h ∈ N*_h(f) we have
∥f − ϕ*_h(y_h)∥_{L_p} = O(n^{−(r+ϱ)}).

Here adaption is once more necessary if r ≥ 2, and the assumption p < ∞ is crucial, see again [16]. However, if in addition f is continuous, i.e., if ∆_f^{(0)} = 0, then the same result can be obtained asymptotically also for p = ∞ using a fully nonadaptive algorithm. Indeed, consider a modification of our algorithm that performs Step 1 and omits Step 2 and Step 3 altogether. The approximation ξ to s_f is then determined by minimizing |p̃_+(x) − p̃_−(x)| in the interval [u_1, v_1] obtained from Step 1. Denote such information and algorithm by N**_h and ϕ**_h.

Proposition 5. If δ satisfies (11), then for any f ∈ F_{r,ϱ} ∩ C(R) and information y_h ∈ N**_h(f) we have
∥f − ϕ**_h(y_h)∥_{L_∞} = O(n^{−(r+ϱ)}).

Proof. For a fixed f ∈ F_{r,ϱ} ∩ C(R) define

T_f(t) = \sum_{j=1}^{r} \Delta_f^{(j)} \, \frac{t^j}{j!},

and choose h_0 > 0 such that |T_f| is nonincreasing in [−(r + 1)h_0, 0] and nondecreasing in [0, (r + 1)h_0]. It suffices to show that the error is of order h^{r+ϱ} for any x satisfying u_1 < s_f ≤ x < ξ ≤ v_1. We have that f(x) − p̃_+(x) = O(h^{r+ϱ}) and
f(x) − p̃_−(x) = T_f(x − s_f) + O(h^{r+ϱ}),

where the factors in the O notation depend on f. Then for any h ≤ h_0 we have

|\tilde{p}_+(x) - \tilde{p}_-(x)| \le |f(x) - \tilde{p}_-(x)| + |f(x) - \tilde{p}_+(x)| = |T_f(x - s_f)| + O(h^{r+\varrho})
\le |T_f(\xi - s_f)| + O(h^{r+\varrho}) = |\tilde{p}_+(\xi) - \tilde{p}_-(\xi)| + O(h^{r+\varrho})
\le |\tilde{p}_+(s_f) - \tilde{p}_-(s_f)| + O(h^{r+\varrho}) = O(h^{r+\varrho}),
[Fig. 2. −log₁₀(err) vs. m for the function f_1: for δ = 0.5h^{r+1} and r = 1, 2, 3, 4 (top), and for r = 4 and δ = 0, 10^{−12}, 10^{−8}, 10^{−4} (bottom).]
where the second inequality follows from the monotonicity of |Tf |, and the third inequality follows from the definition of ξ . Hence
|f(x) - \varphi^{**}_h(y)(x)| = |f(x) - \tilde{p}_-(x)| \le |f(x) - \tilde{p}_+(x)| + |\tilde{p}_+(x) - \tilde{p}_-(x)| = O(h^{r+\varrho}),

as claimed.
6. Numerical results

Our algorithm ϕ*_h using information N*_h was implemented and tested with the help of the Wolfram Mathematica environment. Below we present test results for two particular periodic functions,

f_1(x) = \begin{cases} \sin(x - \pi), & 0 \le x < \pi, \\ \sin(x - \pi - 0.5), & \pi \le x \le 2\pi + 0.5, \end{cases}
\qquad
f_2(x) = \begin{cases} \sin(x), & 0 \le x < \pi, \\ \sin(x - \pi), & \pi \le x \le 3\pi. \end{cases}
Observe that f1 is discontinuous and f2 is continuous, see Fig. 1.
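For reference, the two test functions written out in Python (our transcription of the formulas above, reduced modulo the respective period):

```python
import math

def f1(x):                      # T = 2*pi + 0.5, jump at s_f = pi
    x %= 2 * math.pi + 0.5
    return math.sin(x - math.pi) if x < math.pi else math.sin(x - math.pi - 0.5)

def f2(x):                      # T = 3*pi, continuous, kink at s_f = pi
    x %= 3 * math.pi
    return math.sin(x) if x < math.pi else math.sin(x - math.pi)
```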
[Fig. 3. −log₁₀(err) vs. m for the function f_2: for δ = 0.5h^{r+1} and r = 1, 2, 3, 4 (top), and for r = 4 and δ = 0, 10^{−12}, 10^{−8}, 10^{−4} (bottom).]
We considered the L_2 error and the following two sets of parameters: (i) r = 1, 2, 3, 4 and δ = h^{r+1}/2, and (ii) r = 4 and δ = 0, 10^{−12}, 10^{−8}, 10^{−4}. For each function f_j, j = 1, 2, and each set of parameters we tracked the behavior of the worst case error with respect to noise, given by

\sup_{y_h \in N^*_h(f_j)} \| f_j - \varphi^*_h(y_h) \|_{L_2}, \qquad h = T/m,   (14)

as m → ∞. Specifically, for given m the quantity (14) was approximately determined as the maximum L_2 error over 10^4 runs of the algorithm. In each run the noise e_i was chosen randomly and independently from the uniform distribution on [−δ, δ]. The test results are presented in Figs. 2 and 3, respectively. The scales are logarithmic, −log₁₀(err) versus log₁₀ m, where err is the quantity (14). In all cases we observe the theoretical convergence rate max(δ, m^{−(r+ϱ)}).
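The experiment can be reproduced along the following lines; `approximate` (standing in for an implementation of ϕ*_h from Section 4) and `l2_error` (a numerical L_2 distance on [0, T]) are hypothetical placeholders passed in by the caller, not routines given in the paper:

```python
import random

def worst_case_error(f, T, m, r, delta, approximate, l2_error, runs=10**4):
    """Monte Carlo estimate of (14): the maximum L2 error over repeated
    runs with noise drawn uniformly from [-delta, delta]."""
    err = 0.0
    for _ in range(runs):
        oracle = lambda x: f(x) + random.uniform(-delta, delta)
        approx = approximate(oracle, T, m, r)   # hypothetical phi*_h
        err = max(err, l2_error(f, approx, T))  # hypothetical L2 norm
    return err
```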
Acknowledgments

This research was supported by the National Science Centre, Poland, under projects 2014/13/N/ST1/02370 (P. Morkisz) and 2013/09/B/ST1/04275 (L. Plaskota). A part of this work was done while P. Morkisz was visiting the Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, under the Ph.D. internship program of the Warsaw Center of Mathematics and Computer Science.

References

[1] F. Arandiga, A. Cohen, R. Donat, N. Dyn, Interpolation and approximation of piecewise smooth functions, SIAM J. Numer. Anal. 43 (2005) 41–57.
[2] F. Arandiga, A. Cohen, R. Donat, N. Dyn, B. Matei, Approximation of piecewise smooth functions and images by edge-adapted (ENO-EA) nonlinear multiresolution techniques, Appl. Comput. Harmon. Anal. 24 (2008) 225–250.
[3] E.J. Candès, D.L. Donoho, New tight frames of curvelets and optimal representations of objects with piecewise C² singularities, Comm. Pure Appl. Math. 57 (2004) 219–266.
[4] S. Engelberg, E. Tadmor, Recovery of edges from spectral data with noise—a new perspective, SIAM J. Numer. Anal. 46 (2008) 2620–2635.
[5] A. Gelb, Reconstruction of piecewise smooth functions from non-uniform grid point data, J. Sci. Comput. 30 (2007) 409–440.
[6] B. Kacewicz, P. Przybyłowicz, Optimal adaptive solution of initial-value problems with unknown singularities, J. Complexity 24 (2008) 455–476.
[7] B. Kacewicz, P. Przybyłowicz, Optimal adaptive solution of piecewise regular systems of IVPs with unknown switching hypersurface, Appl. Math. Comput. 228 (2014) 116–127.
[8] B. Kacewicz, P. Przybyłowicz, Optimal solution of a class of non-autonomous initial-value problems with unknown singularities, J. Comput. Appl. Math. 261 (2014) 364–377.
[9] B. Kacewicz, P. Przybyłowicz, Complexity of the derivative-free solution of systems of IVPs with unknown singularity hypersurface, J. Complexity 31 (2015) 75–97.
[10] Y. Lipman, D. Levin, Approximating piecewise-smooth functions, IMA J. Numer. Anal. 30 (2010) 1159–1183.
[11] E. Novak, Deterministic and Stochastic Error Bounds in Numerical Analysis, Lecture Notes in Math., vol. 1349, Springer, Berlin, 1988.
[12] L. Plaskota, Noisy Information and Computational Complexity, Cambridge Univ. Press, Cambridge, 1996.
[13] L. Plaskota, Noisy information: optimality, complexity, tractability, in: J. Dick, F.Y. Kuo, G.W. Peters, I.H. Sloan (Eds.), Monte Carlo and Quasi-Monte Carlo Methods 2012, Springer, 2013, pp. 173–209.
[14] L. Plaskota, G.W. Wasilkowski, Adaption allows efficient integration of functions with unknown singularities, Numer. Math. 102 (2005) 123–144.
[15] L. Plaskota, G.W. Wasilkowski, Uniform approximation of piecewise r-smooth and globally continuous functions, SIAM J. Numer. Anal. 47 (2009) 762–785.
[16] L. Plaskota, G.W. Wasilkowski, Y. Zhao, The power of adaption for approximating functions with singularities, Math. Comp. 77 (2008) 2309–2338.
[17] L. Plaskota, G.W. Wasilkowski, Y. Zhao, An adaptive algorithm for weighted approximation of singular functions over R, SIAM J. Numer. Anal. 51 (2013) 1470–1493.
[18] P. Przybyłowicz, Optimality of Euler-type algorithms for approximation of stochastic differential equations with discontinuous coefficients, Int. J. Comput. Math. 91 (2014) 1461–1479.
[19] P. Przybyłowicz, P. Morkisz, Strong approximation of solutions of stochastic differential equations with time-irregular coefficients via randomized Euler algorithm, Appl. Numer. Math. 78 (2014) 80–94.
[20] E. Tadmor, Filters, mollifiers and the computation of the Gibbs phenomenon, Acta Numer. 16 (2007) 305–378.
[21] J.F. Traub, G.W. Wasilkowski, H. Woźniakowski, Information-Based Complexity, Academic Press, New York, 1988.