Markov factors on average—An $L^2$ case

Len Bos
Department of Computer Science, University of Verona, Italy

Journal of Approximation Theory 241 (2019) 1–10

Received 30 November 2017; received in revised form 17 November 2018; accepted 8 January 2019; available online 15 January 2019. Communicated by Doron S. Lubinsky.
Abstract

The classical Markov polynomial inequality bounds the uniform norm of the derivative of a polynomial on an interval in terms of its degree squared and the norm of the polynomial itself, with the factor of degree squared being the optimal upper bound. Here we study what this factor should be on average, for random polynomials with independent $N(0,1)$ coefficients. In the case of the interval $[-1,1]$ and Jacobi weights defining an $L^2$ space, we show that this average factor is of order degree to the $3/2$, as compared to the degree-squared worst case upper bound.

Keywords: Markov inequality; Random polynomial; Average Markov factor
1. Introduction

The classical Markov inequality states that for all univariate polynomials $p \in \mathbb{R}[x]$,
$$|p'(x)| \le \frac{2}{b-a}\,(\deg(p))^2\, \|p\|_{[a,b]}, \qquad \forall x \in [a,b],$$
where the norm $\|p\|_{[a,b]}$ denotes the maximum, or uniform, norm on $[a,b]$. Such inequalities are also known in other norms, and in particular in $L^p$ norms for measures defined by a so-called doubling weight, i.e., ones for which the measure of an interval expanded by a factor of 2 is less than a constant times the measure of the original interval (see [1] for the precise definition).
Theorem 1.1 (Mastroianni and Totik [1, Thm. 7.4]). Suppose that $w$ is a doubling weight on $[-1,1]$ and that $1 \le r < \infty$. Then there is a constant $C_r$ such that for all polynomials $p(x)$ with real coefficients, i.e., $p \in \mathbb{R}[x]$,
$$\left\{\int_{-1}^{1} |p'(x)|^r\, w(x)\,dx\right\}^{1/r} \le C_r\,(\deg(p))^2 \left\{\int_{-1}^{1} |p(x)|^r\, w(x)\,dx\right\}^{1/r}.$$
In general, the factor $(\deg(p))^2$ is best possible; we will discuss this in more depth later. Alternatively, we introduce, for a given norm $\|\cdot\|$ on the polynomials, the Markov factor
$$M(p) := \frac{\|p'\|}{\|p\|}. \tag{1.1}$$
Then the above Markov inequalities state that (for the uniform norm and $L^p$ norms associated to a doubling weight)
$$\sup_{\deg(p)\le n} M(p) \le C n^2$$
for some constant $C$, i.e., that the worst case upper bound for $M(p)$ is $O((\deg(p))^2)$. In this work we will discuss the average value of $M(p)$ taken over $p$ a random polynomial of degree at most $n$, specifically using the $L^2$ norm over $[-1,1]$ with Jacobi weight
$$w_{\alpha,\beta}(x) := (1-x)^\alpha (1+x)^\beta, \qquad \alpha, \beta > -1.$$
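That the exponent 2 cannot be lowered in the uniform norm is classical: the Chebyshev polynomial $T_n$ satisfies $\|T_n\|_{[-1,1]} = 1$ while $\|T_n'\|_{[-1,1]} = |T_n'(\pm 1)| = n^2$, so $M(T_n) = n^2$ exactly. A minimal numerical check of this fact (our sketch, using numpy's Chebyshev utilities; the grid resolution is an arbitrary choice):

```python
# Numerical check that M(T_n) = n^2 in the uniform norm on [-1, 1]:
# the Chebyshev polynomial T_n is extremal for Markov's inequality.
import numpy as np
from numpy.polynomial import chebyshev as C

x = np.linspace(-1.0, 1.0, 200_001)        # fine grid for the sup norm
for n in (5, 10, 20):
    coef = np.zeros(n + 1)
    coef[n] = 1.0                          # coefficients of T_n in the T-basis
    Tn  = C.chebval(x, coef)
    Tnp = C.chebval(x, C.chebder(coef))    # derivative T_n'
    M = np.max(np.abs(Tnp)) / np.max(np.abs(Tn))
    print(n, M, n**2)                      # M equals n^2 (attained at x = ±1)
```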
2. General formulation

Aspects of this problem are best clarified by establishing a general framework. Suppose that $K \subset \mathbb{R}^d$ is compact and is the closure of its interior, and that $\mu$ is a finite Borel measure supported on $K$. We consider the $L^2$ norm
$$\|p\|_{L^2(\mu)} = \sqrt{\int_K p^2(x)\,d\mu(x)}.$$
We of course must establish precisely what we mean by a "random" polynomial. To this end, for degree $n$, let $B_n := \{P_1, P_2, \ldots, P_{N_n}\}$ be an orthonormal basis with respect to the measure $\mu$ for the polynomials of degree at most $n$ restricted to $K \subset \mathbb{R}^d$. Here $N_n := \binom{n+d}{n}$ denotes the dimension of this space, and we order the basis to be consistent with the degree, i.e., for $j < k$, $\deg(P_j) \le \deg(P_k)$. Then, by a random polynomial of degree at most $n$ we mean
$$p_a(x) = \sum_{i=1}^{N_n} a_i P_i(x)$$
where the coefficients $a_i$ are independent $N(\mu, \sigma^2)$ random variables with common normal density function which we will refer to as $f(a_i)$, $1 \le i \le N_n$. Here $a \in \mathbb{R}^{N_n}$ refers to the vector of coefficients $a = (a_1, \ldots, a_{N_n})$.

Remark 2.1. It is of course more general to allow an arbitrary basis, such as the monomials, and not just an orthonormal basis. However, this general case, although highly interesting, leads to a rather more complicated analysis. An orthonormal basis, on the other hand, lends itself to a complete analysis, and is nevertheless of interest. It is to this latter case that this paper is dedicated.

Then for the $j$th partial derivative we define the average Markov factors
$$M_n^j := \int_{\mathbb{R}^{N_n}} \frac{\|D_j p_a\|_{L^2(\mu)}}{\|p_a\|_{L^2(\mu)}}\, f(a_1) f(a_2) \cdots f(a_{N_n})\, da_1\, da_2 \cdots da_{N_n}. \tag{2.1}$$
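For orientation, the definition (2.1) is straightforward to realize numerically. The sketch below (illustrative, not from the paper) specializes to $d = 1$ and the Chebyshev measure treated in Example 2.3 below, draws a random coefficient vector with $N(0,1)$ entries, and confirms by Gauss–Chebyshev quadrature that, for an orthonormal basis, $\|p_a\|_{L^2(\mu)} = \|a\|_2$; this is the fact that drives the matrix formulation (2.2) below.

```python
# Illustrative sketch (not from the paper): a random polynomial p_a in the
# orthonormal Chebyshev basis {T_0/sqrt(2), T_1, ..., T_n} for the measure
# dmu = (2/pi) dx / sqrt(1 - x^2) on [-1, 1], with iid N(0,1) coefficients.
# Gauss-Chebyshev quadrature confirms ||p_a||_{L^2(mu)} = ||a||_2.
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(0)
n = 12
a = rng.standard_normal(n + 1)            # iid N(0,1) coefficients

coef = a.copy()
coef[0] /= np.sqrt(2.0)                   # p_a = a_0 T_0/sqrt(2) + a_1 T_1 + ...

m = 4 * n                                 # quadrature exact for degree <= 2m-1
xk = np.cos((2.0 * np.arange(1, m + 1) - 1.0) * np.pi / (2.0 * m))
norm_l2 = np.sqrt((2.0 / m) * np.sum(C.chebval(xk, coef) ** 2))
print(norm_l2, np.linalg.norm(a))         # equal to machine precision
```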
Now for any $p(x) = \sum_{i=1}^{N_n} a_i P_i(x)$ of degree at most $n$, $D_j p(x)$ is a polynomial of degree at most $n-1$ and so we may write
$$D_j p(x) = \sum_{i=1}^{N_{n-1}} b_i P_i(x)$$
for some coefficients $b_i$. The mapping of the vector of coefficients $a \in \mathbb{R}^{N_n}$ to $b \in \mathbb{R}^{N_{n-1}}$ is represented by a matrix $D_j \in \mathbb{R}^{N_{n-1} \times N_n}$ called the corresponding differentiation matrix. Using this notation we may write the Markov factor as
$$\frac{\|D_j p_a\|_{L^2(\mu)}}{\|p_a\|_{L^2(\mu)}} = \frac{\|D_j a\|_2}{\|a\|_2} \tag{2.2}$$
and the average Markov factor as
$$M_n^j = \int_{\mathbb{R}^{N_n}} \frac{\|D_j a\|_2}{\|a\|_2}\, f(a_1) f(a_2) \cdots f(a_{N_n})\, da_1\, da_2 \cdots da_{N_n}, \tag{2.3}$$
where, for a vector $b$, $\|b\|_2$ denotes the usual Euclidean norm of $b$.

Remark 2.2. The form (2.2) shows that the worst case Markov factor is
$$\sup_{\deg(p)\le n} \frac{\|D_j p\|_{L^2(\mu)}}{\|p\|_{L^2(\mu)}} = \sup_{a \in \mathbb{R}^{N_n}} \frac{\|D_j a\|_2}{\|a\|_2} = \|D_j\|_2,$$
the Euclidean operator norm of the matrix $D_j$.

Example 2.3. Consider the interval $[-1,1]$ with Chebyshev measure
$$d\mu := \frac{2}{\pi}\, \frac{1}{\sqrt{1-x^2}}\,dx.$$
The orthonormal polynomials are then
$$B_n = \left\{\frac{1}{\sqrt{2}}\, T_0(x),\; T_1(x),\; \ldots,\; T_n(x)\right\}$$
where the $T_j(x)$ are the classical Chebyshev polynomials (of the first kind). It is essentially a trigonometric identity that
$$T_k'(x) = k \begin{cases} 2 \displaystyle\sum_{j<k,\ j\ \mathrm{odd}} T_j(x), & k \text{ even}; \\[6pt] 2 \displaystyle\sum_{0<j<k,\ j\ \mathrm{even}} T_j(x) + \sqrt{2}\left(\frac{1}{\sqrt{2}}\, T_0(x)\right), & k \text{ odd}; \end{cases}$$
see e.g. [3, Chap. 1]. The resulting differentiation matrix has the form (for $n$ odd)
$$D = \begin{bmatrix}
0 & \sqrt{2} & 0 & 3\sqrt{2} & 0 & 5\sqrt{2} & 0 & \cdots & 0 & n\sqrt{2} \\
0 & 0 & 4 & 0 & 8 & 0 & 12 & \cdots & 2(n-1) & 0 \\
0 & 0 & 0 & 6 & 0 & 10 & 0 & 14 & \cdots & 2n \\
0 & 0 & 0 & 0 & 8 & & & & & \\
\vdots & & & & & \ddots & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2(n-1) & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2n
\end{bmatrix} \in \mathbb{R}^{n \times (n+1)}.$$
The first column of zeros reflects the fact that the derivative of $T_0(x)$ is zero and hence the constant term does not contribute to the derivative; it could be safely suppressed, if so desired. If we let $x \in \mathbb{R}^{n+1}$ be the vector of all ones, then
$$\|D\|_2 \ge \frac{\|Dx\|_2}{\|x\|_2}.$$
This latter expression, given the components of $D$, can be directly evaluated to be
$$\frac{\|Dx\|_2}{\|x\|_2} = \sqrt{n(2n+1)(n^2+n+3)/15},$$
so that $\|D\|_2 \ge \sqrt{n(2n+1)(n^2+n+3)/15} = O(n^2)$ and the worst case Markov factor has growth at least of order $n^2$.
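This example is easy to reproduce numerically. The following sketch (the helper name chebyshev_diff_matrix is ours, not the paper's) builds $D$ for odd $n$ directly from the derivative identity above, and checks both the closed-form value of $\|Dx\|_2/\|x\|_2$ for the all-ones vector and the resulting operator norm lower bound:

```python
# Sketch: the Chebyshev differentiation matrix of Example 2.3 (odd n),
# built from the expansion of T_k' displayed above.
import numpy as np

def chebyshev_diff_matrix(n: int) -> np.ndarray:
    """D maps coefficients in {T_0/sqrt(2), T_1, ..., T_n} to those of the
    derivative in {T_0/sqrt(2), T_1, ..., T_{n-1}}."""
    D = np.zeros((n, n + 1))
    for k in range(1, n + 1):                # column k: expansion of T_k'
        start = 0 if k % 2 == 1 else 1       # T_k' uses T_j of opposite parity
        for j in range(start, k, 2):
            # coefficient 2k on T_j for j > 0; the row for T_0/sqrt(2)
            # carries k*sqrt(2) since T_0 = sqrt(2) * (T_0/sqrt(2))
            D[j, k] = 2.0 * k if j > 0 else np.sqrt(2.0) * k
    return D

n = 7                                        # any odd n
D = chebyshev_diff_matrix(n)
x = np.ones(n + 1)
ratio = np.linalg.norm(D @ x) / np.linalg.norm(x)
print(ratio, np.sqrt(n * (2 * n + 1) * (n**2 + n + 3) / 15))  # the two agree
print(np.linalg.norm(D, 2) >= ratio)                          # True
```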
From now on we consider the unbiased case, i.e., we assume that the coefficients $a_j$ are $N(0, \sigma^2)$ with common density
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, \exp\!\big(-x^2/(2\sigma^2)\big).$$
Notice that, using the notation $da = da_1 \cdots da_{N_n}$, the joint density in
$$M_n^j = \int_{\mathbb{R}^{N_n}} \frac{\|D_j a\|_2}{\|a\|_2}\, f(a_1) f(a_2) \cdots f(a_{N_n})\, da$$
is radial, with $\|a\|_2 = r$ the radius. Hence we may switch to spherical coordinates and use the fact that the joint density $f(a_1) f(a_2) \cdots f(a_{N_n})$ must integrate to one, to obtain
$$M_n^j = \int_{S^{N_n - 1}} \|D_j x\|_2\, d\sigma(x) \tag{2.4}$$
where $x = a/r$ and $d\sigma(x)$ is normalized surface area on the unit sphere $S^{N_n - 1} \subset \mathbb{R}^{N_n}$. Note that we may express (2.4) in terms of a quadratic form as
$$M_n^j = \int_{S^{N_n - 1}} \sqrt{x^t Q x}\; d\sigma(x) \tag{2.5}$$
where $Q := D_j^t D_j$. Such averages of square roots of quadratic forms are a classic subject of investigation in statistics. Rivin [2] gives the following upper and lower bounds.

Proposition 2.4 (Rivin [2, Cor. 15]). For any vector $q \in \mathbb{R}^N$,
$$\frac{\Gamma(N/2)}{\sqrt{\pi}\, \Gamma((N+1)/2)} \le \frac{\displaystyle\int_{S^{N-1}} \sqrt{\textstyle\sum_{i=1}^N q_i^2 x_i^2}\; d\sigma(x)}{\|q\|_2} \le \frac{1}{\sqrt{N}}.$$

Using this we may conclude

Proposition 2.5. Setting $N = N_n$, we have
$$\frac{\Gamma(N/2)}{\sqrt{\pi}\, \Gamma((N+1)/2)} \le \frac{M_n^j}{\sqrt{\mathrm{tr}(Q)}} \le \frac{1}{\sqrt{N}}.$$
Proof. In (2.5) diagonalize $Q = P^t \Lambda P$ where $P$ is an orthogonal matrix and $\Lambda$ is the diagonal matrix of non-negative eigenvalues of $Q$, $\lambda_i$ say. Then, by the rotation invariance of $d\sigma$, (2.5) reduces to
$$M_n^j = \int_{S^{N-1}} \sqrt{\sum_{i=1}^N q_i^2 x_i^2}\; d\sigma(x)$$
with $q_i^2 := \lambda_i$, and
$$\|q\|_2 = \sqrt{\sum_{i=1}^N q_i^2} = \sqrt{\sum_{i=1}^N \lambda_i} = \sqrt{\mathrm{tr}(Q)}. \;\square$$
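Proposition 2.5 is easy to test by simulation. The sketch below (ours, reusing the chebyshev_diff_matrix helper from the sketch in Example 2.3) estimates $M_n^1$ for the Chebyshev case by Monte Carlo over $N(0,1)$ coefficient vectors and checks that the estimate falls between the two bounds; here $\mathrm{tr}(Q)$ is computed as the sum of squared entries of $D$ (cf. Lemma 2.6 below).

```python
# Sketch: Monte Carlo check of Proposition 2.5 in the Chebyshev case.
import numpy as np
from scipy.special import gammaln

n = 15                                     # odd, as in Example 2.3
D = chebyshev_diff_matrix(n)               # from the earlier sketch
N = n + 1                                  # N_n = n + 1 when d = 1

rng = np.random.default_rng(1)
A = rng.standard_normal((200_000, N))      # each row: iid N(0,1) coefficients
Mn = np.mean(np.linalg.norm(A @ D.T, axis=1) / np.linalg.norm(A, axis=1))

trQ = np.sum(D**2)                         # tr(D^t D): sum of squared entries
lower = np.exp(gammaln(N / 2.0) - gammaln((N + 1) / 2.0)) / np.sqrt(np.pi)
print(lower * np.sqrt(trQ), Mn, np.sqrt(trQ / N))   # lower <= Mn <= upper
```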
Bounding $M_n^j$ then requires a good estimate of $\mathrm{tr}(Q)$. This may be expressed in terms of the basis of orthonormal polynomials.

Lemma 2.6. We have
$$\mathrm{tr}(Q) = \mathrm{tr}(D_j^t D_j) = \sum_{i=1}^{N_n} \int_K (D_j P_i(x))^2\, d\mu(x).$$
Proof. Just note that $\mathrm{tr}(D_j^t D_j)$ is the sum of the squares of the entries of $D_j$, and that the columns of $D_j$ give the coefficients of the expansion of the derivative of the corresponding $P_i$ with respect to the orthonormal basis $B_n$. $\square$

3. Jacobi weights on [−1, 1]

In certain cases it is possible to use Lemma 2.6 to give an explicit expression for $\mathrm{tr}(Q)$. The case of Jacobi weights
$$w_{\alpha,\beta}(x) := (1-x)^\alpha (1+x)^\beta, \qquad \alpha, \beta > -1,$$
is one of these.

Let $P_n^{(\alpha,\beta)}(x)$ denote the classical orthogonal Jacobi polynomial of degree $n$. Then, as is well known (see e.g. [4, (4.3.3)]),
$$h_n := \int_{-1}^1 \big(P_n^{(\alpha,\beta)}(x)\big)^2\, w_{\alpha,\beta}(x)\,dx = \frac{2^{\alpha+\beta+1}}{2n+\alpha+\beta+1}\, \frac{\Gamma(n+\alpha+1)\,\Gamma(n+\beta+1)}{\Gamma(n+1)\,\Gamma(n+\alpha+\beta+1)}.$$
We let
$$\hat{P}_n^{(\alpha,\beta)}(x) := \frac{1}{\sqrt{h_n}}\, P_n^{(\alpha,\beta)}(x)$$
denote the orthonormalized Jacobi polynomials.

Proposition 3.1. For the Jacobi weights on $[-1,1]$,
$$\mathrm{tr}(D_j^t D_j) = \frac{(\lambda+1)\, n(n+1)(n+\lambda)(n+\lambda+1)}{8(\alpha+1)(\beta+1)},$$
where $\lambda := \alpha + \beta + 1$.
Proof. By Lemmas 2.6 and 3.3, we have
$$\mathrm{tr}(D_j^t D_j) = \sum_{k=1}^n \frac{(\lambda+1)\, k(k+\lambda)(2k+\lambda)}{4(\alpha+1)(\beta+1)}$$
$$= \frac{\lambda+1}{4(\alpha+1)(\beta+1)} \sum_{k=1}^n \left\{2k^3 + 3\lambda k^2 + \lambda^2 k\right\}$$
$$= \frac{\lambda+1}{4(\alpha+1)(\beta+1)} \left\{2\, \frac{n^2(n+1)^2}{4} + 3\lambda\, \frac{n(n+1)(2n+1)}{6} + \lambda^2\, \frac{n(n+1)}{2}\right\}$$
$$= \frac{\lambda+1}{4(\alpha+1)(\beta+1)}\, \frac{n(n+1)}{2} \left\{n(n+1) + \lambda(2n+1) + \lambda^2\right\}$$
$$= \frac{\lambda+1}{8(\alpha+1)(\beta+1)}\, n(n+1)(n+\lambda)(n+\lambda+1). \;\square$$
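As a quick sanity check (ours, again reusing the Chebyshev helper from Example 2.3's sketch): for $\alpha = \beta = -1/2$ one has $\lambda = 0$ and the formula of Proposition 3.1 collapses to $\mathrm{tr}(D^t D) = n^2(n+1)^2/2$, which matches the sum of squared entries of the explicit matrix $D$; combined with the upper bound of Proposition 2.5, $\sqrt{\mathrm{tr}(Q)/(n+1)} \approx n^{3/2}/\sqrt{2}$, this already suggests the $n^{3/2}$ growth established next.

```python
# Sketch: Proposition 3.1 at alpha = beta = -1/2 (lambda = 0), where the
# closed form reduces to n^2 (n+1)^2 / 2, versus the explicit Chebyshev D.
import numpy as np

for n in (5, 15, 45):                          # odd n for chebyshev_diff_matrix
    D = chebyshev_diff_matrix(n)
    trQ = np.sum(D**2)
    print(n, trQ, n**2 * (n + 1)**2 / 2,       # the two traces agree
          np.sqrt(trQ / (n + 1)) / n**1.5)     # -> 1/sqrt(2): n^{3/2} growth
```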
Using this formula we may establish the precise order of the average Markov factor for this case.

Theorem 3.2. For the $L^2$ norm with Jacobi weights there are constants $C_1, C_2 > 0$ such that
$$C_1 n^{3/2} \le M_n \le C_2 n^{3/2}. \tag{3.1}$$
Proof. By Proposition 2.5, as $N_n = n+1$, we have
$$\frac{\Gamma((n+1)/2)}{\sqrt{\pi}\,\Gamma((n+2)/2)}\, \sqrt{\mathrm{tr}(Q)} \le M_n \le \frac{\sqrt{\mathrm{tr}(Q)}}{\sqrt{n+1}}.$$
Given the formula of Proposition 3.1, the upper bound is clearly of order $n^{3/2}$. For the lower bound, just note that by Stirling's formula we have the asymptotic estimate
$$\Gamma(x) \approx \frac{\sqrt{2\pi}}{\sqrt{x}} \left(\frac{x}{e}\right)^x,$$
from which it follows that
$$\frac{\Gamma((n+1)/2)}{\sqrt{\pi}\,\Gamma((n+2)/2)} \ge C/\sqrt{n}$$
for some constant $C$, and hence the lower bound is also of order $n^{3/2}$. $\square$

Lemma 3.3. We have
$$\int_{-1}^1 \left(\frac{d}{dx}\, \hat{P}_n^{(\alpha,\beta)}(x)\right)^2 w_{\alpha,\beta}(x)\,dx = \frac{(\lambda+1)\, n(n+\lambda)(2n+\lambda)}{4(\alpha+1)(\beta+1)}.$$

Proof. We will use the facts that
$$\frac{d}{dx}\, P_n^{(\alpha,\beta)}(x) = \frac{n+\lambda}{2}\, P_{n-1}^{(\alpha+1,\beta+1)}(x) \qquad \text{(cf. [4, (4.21.7)])} \tag{3.2}$$
and that
$$P_n^{(\alpha,\beta)}(x) = \sum_{k=0}^n \binom{n+\alpha}{n-k} \binom{n+\beta}{k} \left(\frac{x-1}{2}\right)^k \left(\frac{x+1}{2}\right)^{n-k} \qquad \text{(cf. [4, (4.3.2)])} \tag{3.3}$$
from which it follows, in particular, that
$$\frac{d}{dx}\, P_n^{(\alpha,\beta)}(x) = \frac{n+\lambda}{2} \sum_{k=0}^{n-1} \binom{n+\alpha}{n-1-k} \binom{n+\beta}{k} \left(\frac{x-1}{2}\right)^k \left(\frac{x+1}{2}\right)^{n-1-k}.$$
Now note that by dividing we may write
$$\frac{d}{dx}\, P_n^{(\alpha,\beta)}(x) = (1-x^2)\, Q_n(x) + R_n(x)$$
with $\deg(Q_n) = (n-1) - 2 = n-3$ and
$$R_n(x) = A_n\, \frac{x+1}{2} + B_n\, \frac{1-x}{2}$$
for some constants $A_n$, $B_n$. Evaluating at $x = \pm 1$ shows that
$$A_n = \frac{d}{dx}\, P_n^{(\alpha,\beta)}(x)\Big|_{x=+1} = \frac{n+\lambda}{2}\, P_{n-1}^{(\alpha+1,\beta+1)}(+1) = \frac{n+\lambda}{2} \binom{n+\alpha}{n-1} \qquad \text{(cf. [4, (4.1.1)])}$$
and
$$B_n = \frac{d}{dx}\, P_n^{(\alpha,\beta)}(x)\Big|_{x=-1} = \frac{n+\lambda}{2}\, P_{n-1}^{(\alpha+1,\beta+1)}(-1) = \frac{n+\lambda}{2}\, (-1)^{n-1} \binom{n+\beta}{n-1} \qquad \text{(cf. [4, (4.1.4)])}.$$
Then
$$\int_{-1}^1 \left(\frac{d}{dx}\, P_n^{(\alpha,\beta)}(x)\right)^2 w_{\alpha,\beta}(x)\,dx = \int_{-1}^1 \left(\frac{d}{dx}\, P_n^{(\alpha,\beta)}(x)\right) \big((1-x^2)\, Q_n(x) + R_n(x)\big)\, w_{\alpha,\beta}(x)\,dx$$
$$= \frac{n+\lambda}{2} \int_{-1}^1 P_{n-1}^{(\alpha+1,\beta+1)}(x)\, Q_n(x)\, w_{\alpha+1,\beta+1}(x)\,dx + \int_{-1}^1 \left(\frac{d}{dx}\, P_n^{(\alpha,\beta)}(x)\right) R_n(x)\, w_{\alpha,\beta}(x)\,dx$$
$$= 0 + \int_{-1}^1 \left(\frac{d}{dx}\, P_n^{(\alpha,\beta)}(x)\right) R_n(x)\, w_{\alpha,\beta}(x)\,dx \qquad (\text{as } \deg(Q_n) = n-3)$$
$$= \frac{n+\lambda}{2} \int_{-1}^1 P_{n-1}^{(\alpha+1,\beta+1)}(x)\, \frac{n+\lambda}{2} \left\{\binom{n+\alpha}{n-1} \frac{1+x}{2} + (-1)^{n-1} \binom{n+\beta}{n-1} \frac{1-x}{2}\right\} w_{\alpha,\beta}(x)\,dx$$
$$= 2^{\alpha+\beta} \left(\frac{n+\lambda}{2}\right)^2 \left\{\binom{n+\alpha}{n-1} \int_{-1}^1 P_{n-1}^{(\alpha+1,\beta+1)}(x) \left(\frac{1-x}{2}\right)^{\alpha} \left(\frac{1+x}{2}\right)^{\beta+1} dx \right.$$
$$\left. +\; (-1)^{n-1} \binom{n+\beta}{n-1} \int_{-1}^1 P_{n-1}^{(\alpha+1,\beta+1)}(x) \left(\frac{1-x}{2}\right)^{\alpha+1} \left(\frac{1+x}{2}\right)^{\beta} dx \right\}. \tag{3.4}$$
In what follows we will make use of the classical Beta function
$$B(x,y) = \int_0^1 t^{x-1} (1-t)^{y-1}\,dt = \Gamma(x)\Gamma(y)/\Gamma(x+y).$$
Indeed, we calculate
$$\int_{-1}^1 P_{n-1}^{(\alpha+1,\beta+1)}(x) \left(\frac{1-x}{2}\right)^{\alpha} \left(\frac{1+x}{2}\right)^{\beta+1} dx$$
$$= \int_{-1}^1 \sum_{k=0}^{n-1} (-1)^k \binom{n+\alpha}{n-1-k} \binom{n+\beta}{k} \left(\frac{1-x}{2}\right)^{k+\alpha} \left(\frac{1+x}{2}\right)^{n-k+\beta} dx \qquad \text{(by (3.3))}$$
$$= \sum_{k=0}^{n-1} (-1)^k \binom{n+\alpha}{n-1-k} \binom{n+\beta}{k} \int_{-1}^1 \left(\frac{1-x}{2}\right)^{k+\alpha} \left(\frac{1+x}{2}\right)^{n-k+\beta} dx$$
$$= \sum_{k=0}^{n-1} (-1)^k \binom{n+\alpha}{n-1-k} \binom{n+\beta}{k}\, \big[2 B(k+\alpha+1,\, n+1-k+\beta)\big]$$
$$= 2 \sum_{k=0}^{n-1} (-1)^k \binom{n+\alpha}{n-1-k} \binom{n+\beta}{k}\, \frac{\Gamma(k+\alpha+1)\,\Gamma(n+1-k+\beta)}{\Gamma(n+2+\alpha+\beta)}$$
$$= \frac{2}{\Gamma(n+2+\alpha+\beta)} \sum_{k=0}^{n-1} (-1)^k\, \frac{\Gamma(n+\alpha+1)}{\Gamma(k+2+\alpha)\,\Gamma(n-k)}\, \frac{\Gamma(n+\beta+1)}{\Gamma(k+1)\,\Gamma(n+1-k+\beta)} \times \Gamma(k+\alpha+1)\,\Gamma(n+1-k+\beta)$$
$$= \frac{2\,\Gamma(n+\alpha+1)\,\Gamma(n+\beta+1)}{\Gamma(n+2+\alpha+\beta)\,\Gamma(n)} \sum_{k=0}^{n-1} (-1)^k \binom{n-1}{k} \frac{1}{k+\alpha+1}$$
$$= \frac{2\,\Gamma(n+\alpha+1)\,\Gamma(n+\beta+1)}{\Gamma(n+2+\alpha+\beta)\,\Gamma(n)}\, \frac{\Gamma(\alpha+1)\,\Gamma(n)}{\Gamma(n+\alpha+1)} \qquad \text{(by Lemma 3.4)}$$
$$= 2\, \frac{\Gamma(n+\beta+1)\,\Gamma(\alpha+1)}{\Gamma(n+2+\alpha+\beta)}. \tag{3.5}$$
Similarly, by changing variables $x' := -x$, we obtain
$$(-1)^{n-1} \int_{-1}^1 P_{n-1}^{(\alpha+1,\beta+1)}(x) \left(\frac{1-x}{2}\right)^{\alpha+1} \left(\frac{1+x}{2}\right)^{\beta} dx = 2\, \frac{\Gamma(n+\alpha+1)\,\Gamma(\beta+1)}{\Gamma(n+2+\alpha+\beta)}. \tag{3.6}$$
Substituting (3.5) and (3.6) into (3.4) we obtain
$$\int_{-1}^1 \left(\frac{d}{dx}\, P_n^{(\alpha,\beta)}(x)\right)^2 w_{\alpha,\beta}(x)\,dx = \frac{2^{\alpha+\beta+1}}{\Gamma(n+2+\alpha+\beta)} \left(\frac{n+\lambda}{2}\right)^2 \left\{\binom{n+\alpha}{n-1} \Gamma(n+\beta+1)\,\Gamma(\alpha+1) + \binom{n+\beta}{n-1} \Gamma(n+\alpha+1)\,\Gamma(\beta+1)\right\}$$
$$= 2^{\alpha+\beta+1} \left(\frac{n+\lambda}{2}\right)^2 \frac{\Gamma(n+\alpha+1)\,\Gamma(n+\beta+1)}{\Gamma(n+\lambda+1)\,\Gamma(n)} \left\{\frac{1}{\alpha+1} + \frac{1}{\beta+1}\right\},$$
after some simplification. Finally, by dividing by the normalization constant $h_n$ and simplifying, the result follows. $\square$
Lemma 3.4. For $\alpha > -1$,
$$\sum_{k=0}^{n-1} (-1)^k \binom{n-1}{k} \frac{1}{k+\alpha+1} = \frac{\Gamma(\alpha+1)\,\Gamma(n)}{\Gamma(n+\alpha+1)}.$$

Proof. Let
$$F(t) := \sum_{k=0}^{n-1} (-1)^k \binom{n-1}{k} \frac{t^{k+\alpha+1}}{k+\alpha+1}.$$
We need to evaluate $F(1)$. To this end, note that as $\alpha > -1$, $F(0) = 0$. Thus
$$F(1) = \int_0^1 F'(t)\,dt.$$
But
$$F'(t) = \sum_{k=0}^{n-1} (-1)^k \binom{n-1}{k}\, t^{k+\alpha} = t^{\alpha} \sum_{k=0}^{n-1} \binom{n-1}{k} (-t)^k = t^{\alpha} (1-t)^{n-1}$$
and so
$$F(1) = \int_0^1 t^{\alpha} (1-t)^{n-1}\,dt = B(\alpha+1, n) = \frac{\Gamma(\alpha+1)\,\Gamma(n)}{\Gamma(n+\alpha+1)}. \;\square$$
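Both lemmas are easy to corroborate numerically. The following sketch (ours) checks Lemma 3.3 by Gauss–Jacobi quadrature, using (3.2) for the derivative and the formula for $h_n$ from the beginning of this section, and checks Lemma 3.4 directly, via scipy.special routines:

```python
# Sketch: numerical spot-checks of Lemmas 3.3 and 3.4 for non-integer
# parameters, using scipy's Jacobi polynomial and quadrature routines.
import numpy as np
from scipy.special import comb, gammaln, roots_jacobi, eval_jacobi

a, b, n = 0.3, -0.4, 8
lam = a + b + 1

# Lemma 3.4: the alternating sum versus the Gamma-function ratio.
lhs = sum((-1)**k * comb(n - 1, k) / (k + a + 1) for k in range(n))
print(lhs, np.exp(gammaln(a + 1) + gammaln(n) - gammaln(n + a + 1)))

# Lemma 3.3: integrate (d/dx Phat_n)^2 w_{a,b} with Gauss-Jacobi nodes,
# exact here since the integrand is a polynomial of degree 2n - 2.
h_n = np.exp((a + b + 1) * np.log(2.0) - np.log(2 * n + a + b + 1)
             + gammaln(n + a + 1) + gammaln(n + b + 1)
             - gammaln(n + 1) - gammaln(n + a + b + 1))
x, w = roots_jacobi(n, a, b)                     # weight (1-x)^a (1+x)^b
dP = (n + lam) / 2.0 * eval_jacobi(n - 1, a + 1, b + 1, x)   # by (3.2)
print(np.sum(w * dP**2) / h_n,
      (lam + 1) * n * (n + lam) * (2 * n + lam) / (4 * (a + 1) * (b + 1)))
```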
4. Some comments on the monomial basis and the uniform norm

As mentioned previously, using for example the monomial basis leads, in general, to a complicated analysis problem. However, for the interval $K = [0,1]$ and a random polynomial of degree $n$ taken to mean
$$p_a(x) := \sum_{k=0}^n a_k x^k,$$
with the coefficients $a_k$ sampled i.i.d. from any distribution on $\mathbb{R}_+ := \{a \in \mathbb{R} : a \ge 0\}$ with common density function $f(x)$, the average Markov factor in the uniform norm may easily be calculated.
Proposition 4.1. Under the above assumptions, the average Markov factor
$$M_n := \int_{\mathbb{R}_+^{n+1}} \frac{\|p_a'(x)\|_K}{\|p_a(x)\|_K}\, f(a_0) f(a_1) \cdots f(a_n)\, da = \frac{n}{2}.$$

Proof. Since the coefficients of $p_a(x)$ and $p_a'(x)$ are all non-negative, we have
$$\|p_a\|_K = p_a(1) \quad\text{and}\quad \|p_a'\|_K = p_a'(1).$$
Hence
$$M_n = \int_{\mathbb{R}_+^{n+1}} \frac{n a_n + (n-1) a_{n-1} + \cdots + a_1}{a_n + a_{n-1} + \cdots + a_1 + a_0}\, f(a_0) f(a_1) \cdots f(a_n)\, da$$
$$= \left(\sum_{k=1}^n k\right) \int_{\mathbb{R}_+^{n+1}} \frac{a_n}{a_n + a_{n-1} + \cdots + a_1 + a_0}\, f(a_0) f(a_1) \cdots f(a_n)\, da$$
(since, the coefficients being i.i.d., each term $a_k/(a_n + \cdots + a_0)$ has the same expectation)
$$= \frac{n(n+1)}{2}\, \frac{1}{n+1} \int_{\mathbb{R}_+^{n+1}} \frac{a_n + a_{n-1} + \cdots + a_0}{a_n + a_{n-1} + \cdots + a_1 + a_0}\, f(a_0) f(a_1) \cdots f(a_n)\, da$$
$$= \frac{n(n+1)}{2} \times \frac{1}{n+1} \times 1 = \frac{n}{2}. \;\square$$

Although imposing positivity of the coefficients may seem to be excessively restrictive, it appears to be the case that, more generally, the average Markov factor for the monomial basis is indeed of order $n$. This will be the subject of a forthcoming work.
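Proposition 4.1 is likewise easy to confirm by simulation. The sketch below (ours) takes the coefficients to be exponentially distributed, one admissible choice of density on $\mathbb{R}_+$; by the proposition, the answer $n/2$ does not depend on that choice.

```python
# Sketch: Monte Carlo check of Proposition 4.1 with exponential coefficients
# (any common density on R_+ would do; the limit n/2 is distribution-free).
import numpy as np

rng = np.random.default_rng(2)
n = 10
A = rng.exponential(scale=1.0, size=(500_000, n + 1))   # columns: a_0, ..., a_n
k = np.arange(n + 1)
Mn = np.mean((A * k).sum(axis=1) / A.sum(axis=1))       # p_a'(1) / p_a(1)
print(Mn, n / 2)                                        # approximately equal
```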
References

[1] G. Mastroianni, V. Totik, Weighted polynomial inequalities with doubling and $A_\infty$ weights, Constr. Approx. 16 (2000) 37–71.
[2] I. Rivin, Surface area and other measures of ellipsoids, Adv. Appl. Math. 39 (2007) 409–427.
[3] T.J. Rivlin, The Chebyshev Polynomials, Wiley, 1974.
[4] G. Szegő, Orthogonal Polynomials, Amer. Math. Soc., 1939.