APPENDIX
B
Regularization Operators
The simplest case.
Kernel machines can be nicely presented within a regularization framework based on differential operators. Here we give an introduction to differential and pseudodifferential operators. A natural way of enforcing a “smooth solution” $f$ of a learning problem is to adopt a special expression of the parsimony principle that restricts the quick variations of $f$. In the simplest case in which $f : X \subset \mathbb{R} \to \mathbb{R}$ and $f \in L^2(X)$, one can introduce the index
\[
R = \int_X [f'(x)]^2\,dx = \int_X \frac{d}{dx}f(x)\cdot\frac{d}{dx}f(x)\,dx = \int_X Pf(x)\cdot Pf(x)\,dx = \langle Pf, Pf\rangle = \|f\|_P^2,
\]
with $P = d/dx$.
The index $\|f\|_P^2 \ge 0$ defines a seminorm $\|\cdot\|_P$ on $L^2(X)$. It has all the properties of a norm, except that $\|f\|_P = 0$ does not imply $f \equiv 0$, since it also holds for constant functions $f(x) \equiv c$. In case $X = \mathbb{R}$, we can promptly see that this way of measuring the degree of parsimony of $f$ imposes strong conditions on the asymptotic behavior of $f$. If $X = [a\,..\,b]$, then
\[
\int_a^b [f'(x)]^2\,dx = \int_a^b \frac{d}{dx}f(x)\cdot df(x) = \big[f(x)f'(x)\big]_a^b - \int_a^b f(x)\cdot f''(x)\,dx.
\]
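The integration-by-parts identity above can be checked numerically. The following is a minimal Python sketch, not part of the original text: the helper names `integrate` and `check_by_parts` and the two test functions are illustrative assumptions, and the integrals are approximated with a plain midpoint rule.

```python
import math

def integrate(g, a, b, n=20000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def check_by_parts(f, df, d2f, a, b):
    """Return (lhs, rhs) of  int f'^2 dx = [f f']_a^b - int f f'' dx."""
    lhs = integrate(lambda x: df(x) ** 2, a, b)
    boundary = f(b) * df(b) - f(a) * df(a)
    rhs = boundary - integrate(lambda x: f(x) * d2f(x), a, b)
    return lhs, rhs

# Case 1 (assumed example): f(x) = x^2 on [0, 1]; the boundary term is nonzero.
lhs1, rhs1 = check_by_parts(lambda x: x * x, lambda x: 2.0 * x, lambda x: 2.0, 0.0, 1.0)

# Case 2 (assumed example): f(x) = sin(pi x) on [0, 1]; f(0) = f(1) = 0.
lhs2, rhs2 = check_by_parts(
    lambda x: math.sin(math.pi * x),
    lambda x: math.pi * math.cos(math.pi * x),
    lambda x: -math.pi ** 2 * math.sin(math.pi * x),
    0.0, 1.0,
)

print(lhs1, rhs1)  # both close to 4/3
print(lhs2, rhs2)  # both close to pi^2 / 2
```

In the first case the boundary term $[f f']_0^1 = 2$ is needed to balance the two sides; in the second it vanishes because $f(0) = f(1) = 0$, so the index reduces to $-\int_a^b f f''\,dx$.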
If $f(a) = f(b) = 0$ then
\[
\int_a^b [f'(x)]^2\,dx = -\int_a^b f(x)\cdot f''(x)\,dx = \Big\langle f, -\frac{d}{dx}\frac{d}{dx} f\Big\rangle = \langle f, P^\star P f\rangle, \tag{B.0.1}
\]
where $P^\star := -d/dx$ is the adjoint operator of $P = d/dx$. Interestingly, once we assume the boundary condition that $f$ is null on its border, it turns out that $\|f\|_P$ is related to $L = P^\star P = -d^2/dx^2$.
The case P = ∇.
Now, let us consider the case of $X \subset \mathbb{R}^d$ in which we replace $P = d/dx$ with $P = \nabla$. As in the case $d = 1$, we still restrict the analysis to functions in $L^2(X)$. Assume $f, u \in L^2(X)$. We have
\[
\nabla\cdot(u\nabla f) = \nabla f\cdot\nabla u + u\nabla^2 f.
\]
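Now, as in the case of a single dimension, let us assume as boundary condition that $u$ vanishes on the boundary $\partial X$ of $X$. Then, by the divergence theorem, we get
\[
\int_X \nabla f\cdot\nabla u\,dx = \int_X \big(\nabla\cdot(u\nabla f) - u\nabla^2 f\big)\,dx = \int_{\partial X} u\nabla f\cdot dS - \int_X u\nabla^2 f\,dx = -\int_X u\nabla^2 f\,dx.
\]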
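This can be rewritten as $\langle\nabla f, \nabla u\rangle = \langle u, -\nabla\cdot\nabla f\rangle = \langle u, \nabla^\star\nabla f\rangle$, and then $\nabla^\star = -\nabla\cdot$. Now for $f = u$ we get
\[
\langle\nabla f, \nabla f\rangle = \langle f, -\nabla^2 f\rangle. \tag{B.0.2}
\]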
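Identity (B.0.2) can be illustrated on the unit square. The sketch below is only a numerical sanity check under stated assumptions: the test function $f(x, y) = \sin(\pi x)\sin(\pi y)$, which vanishes on the boundary, is chosen here for convenience, its derivatives are coded by hand, and `integrate2d` is a hypothetical helper based on the midpoint rule.

```python
import math

PI = math.pi

def integrate2d(g, n=200):
    """Midpoint rule for the integral of g(x, y) over the unit square."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return h * h * sum(g(x, y) for x in pts for y in pts)

# Assumed test function, null on the border of [0, 1]^2.
def f(x, y):
    return math.sin(PI * x) * math.sin(PI * y)

def fx(x, y):  # partial derivative with respect to x
    return PI * math.cos(PI * x) * math.sin(PI * y)

def fy(x, y):  # partial derivative with respect to y
    return PI * math.sin(PI * x) * math.cos(PI * y)

def laplacian_f(x, y):  # f_xx + f_yy = -2 pi^2 f
    return -2.0 * PI ** 2 * f(x, y)

# Left-hand side <grad f, grad f> and right-hand side <f, -lap f> of (B.0.2).
lhs = integrate2d(lambda x, y: fx(x, y) ** 2 + fy(x, y) ** 2)
rhs = integrate2d(lambda x, y: f(x, y) * (-laplacian_f(x, y)))

print(lhs, rhs)  # both close to pi^2 / 2
```

Both sides evaluate to approximately $\pi^2/2$. With a test function that does not vanish on $\partial X$, the surface integral would have to be added back.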
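Of course, as for $P = d/dx$, the above expression for $\|f\|_P$, which clearly generalizes (B.0.1), holds in case the function $f$ is identically null on its border.
The case P = ∇².
Now, we consider the case $P = \Delta = \nabla^2$. Interestingly, we can analyze this case by invoking the result discovered for $P = \nabla$. Given $u, v \in L^2(X)$, if $(\nabla u = 0) \wedge (\nabla v = 0)$ on $\partial X$, then we have $\langle\nabla u, \nabla v\rangle = \langle -\nabla^2 u, v\rangle$. If we exchange $u$ with $v$, we get $\langle\nabla v, \nabla u\rangle = \langle -\nabla^2 v, u\rangle$. Since $\langle\nabla u, \nabla v\rangle = \langle\nabla v, \nabla u\rangle$, we get $\langle\nabla^2 u, v\rangle = \langle\nabla^2 v, u\rangle$, that is, $\Delta$ is self-adjoint. As a consequence, we can determine $\|f\|_\Delta^2$ since
\[
\langle\Delta f, \Delta f\rangle = \langle f, \Delta(\Delta f)\rangle = \langle f, \Delta^2 f\rangle = \langle f, \nabla^4 f\rangle.
\]
Of course, this holds whenever $\nabla f = 0$ on $\partial X$.

Now, it is interesting to see what happens when we consider higher order differential operators. A crucial remark concerns the periodic structure that emerges in $P^m$. Beginning from $P = \nabla$ and $P^2 = \nabla\cdot\nabla$, it becomes natural to define $P^3 = \nabla(\nabla\cdot\nabla)$ and, therefore, the sequence
\[
P^0 = I,\quad P^1 = \nabla,\quad P^2 = \nabla\cdot\nabla,\quad P^3 = \nabla\nabla\cdot\nabla,\quad P^4 = \nabla\cdot\nabla\nabla\cdot\nabla,\ \ldots
\]
Now, let $a_\kappa \in \mathbb{R}^+$, with $\kappa \in \mathbb{N}_m$, and consider
\[
\Phi^e_m = \sum_{h=0}^{\lfloor m/2\rfloor} a_{2h}\nabla^{2h}
\quad\text{and}\quad
\Phi^o_m = \sum_{h=0}^{\lfloor m/2\rfloor} a_{2h+1}\nabla\nabla^{2h},
\]
where $P^{2h} = \Delta^h = \nabla^{2h}$ and $P^{2h+1} = \nabla\nabla^{2h}$ for $h = 0, \ldots, m$. Now, the operator $\Phi^e_m$ leads to
\[
\begin{aligned}
\langle\Phi^e_m f, \Phi^e_m f\rangle
&= \Big\langle \sum_{h=0}^{\lfloor m/2\rfloor} a_{2h}\nabla^{2h} f,\ \sum_{\kappa=0}^{\lfloor m/2\rfloor} a_{2\kappa}\nabla^{2\kappa} f \Big\rangle
 = \sum_{h=0}^{\lfloor m/2\rfloor}\sum_{\kappa=0}^{\lfloor m/2\rfloor} a_{2h}a_{2\kappa}\langle\nabla^{2\kappa} f, \nabla^{2h} f\rangle \\
&= \sum_{h=0}^{\lfloor m/2\rfloor}\sum_{\kappa=0}^{\lfloor m/2\rfloor} a_{2h}a_{2\kappa}\langle f, \nabla^{2\kappa}\nabla^{2h} f\rangle
 = \sum_{h=0}^{\lfloor m/2\rfloor}\sum_{\kappa=0}^{\lfloor m/2\rfloor} a_{2h}a_{2\kappa}\langle f, \nabla^{2(h+\kappa)} f\rangle.
\end{aligned}
\]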
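Likewise, for $\Phi^o_m$ we have
\[
\begin{aligned}
\langle\Phi^o_m f, \Phi^o_m f\rangle
&= \Big\langle \sum_{h=0}^{\lfloor m/2\rfloor} a_{2h+1}\nabla\nabla^{2h} f,\ \sum_{\kappa=0}^{\lfloor m/2\rfloor} a_{2\kappa+1}\nabla\nabla^{2\kappa} f \Big\rangle \\
&= \sum_{h=0}^{\lfloor m/2\rfloor}\sum_{\kappa=0}^{\lfloor m/2\rfloor} a_{2h+1}a_{2\kappa+1}\langle\nabla\nabla^{2h} f, \nabla\nabla^{2\kappa} f\rangle \\
&= -\sum_{h=0}^{\lfloor m/2\rfloor}\sum_{\kappa=0}^{\lfloor m/2\rfloor} a_{2h+1}a_{2\kappa+1}\langle\nabla^{2h} f, \nabla\cdot\nabla\nabla^{2\kappa} f\rangle \\
&= -\sum_{h=0}^{\lfloor m/2\rfloor}\sum_{\kappa=0}^{\lfloor m/2\rfloor} a_{2h+1}a_{2\kappa+1}\langle f, \nabla^{2h}\nabla^{2(\kappa+1)} f\rangle \\
&= -\sum_{h=0}^{\lfloor m/2\rfloor}\sum_{\kappa=0}^{\lfloor m/2\rfloor} a_{2h+1}a_{2\kappa+1}\langle f, \nabla^{2(h+\kappa+1)} f\rangle.
\end{aligned}
\]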
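The two double-sum expansions above, for the even-order and the odd-order operator, can be checked in one dimension, where $\nabla$ reduces to $d/dx$. The sketch below is illustrative only: the test function (a short sine series on $[0, 1]$, whose even derivatives all vanish at the endpoints), the coefficients in `A`, the choice $m = 2$, and all helper names are assumptions made for the example.

```python
import math

PI = math.pi
COEFFS = {1: 1.0, 3: 0.5}    # assumed f(x) = sin(pi x) + 0.5 sin(3 pi x)
A = [1.0, 0.5, 0.1, 0.05]    # assumed a_0, a_1, a_2, a_3
R = 1                        # floor(m / 2) for m = 2

def even_derivative(n):
    """Return x -> d^{2n} f / dx^{2n}, differentiated term by term."""
    return lambda x: sum(c * (-(k * PI) ** 2) ** n * math.sin(k * PI * x)
                         for k, c in COEFFS.items())

def odd_derivative(n):
    """Return x -> d^{2n+1} f / dx^{2n+1}, differentiated term by term."""
    return lambda x: sum(c * (-(k * PI) ** 2) ** n * k * PI * math.cos(k * PI * x)
                         for k, c in COEFFS.items())

def inner(g, h, n=2000):
    """Midpoint-rule approximation of the L2 inner product on [0, 1]."""
    step = 1.0 / n
    return step * sum(g((i + 0.5) * step) * h((i + 0.5) * step) for i in range(n))

f = even_derivative(0)

def even_op(x):   # sum_h a_{2h} d^{2h} f
    return sum(A[2 * h] * even_derivative(h)(x) for h in range(R + 1))

def odd_op(x):    # sum_h a_{2h+1} d^{2h+1} f
    return sum(A[2 * h + 1] * odd_derivative(h)(x) for h in range(R + 1))

pairs = [(h, k) for h in range(R + 1) for k in range(R + 1)]

even_lhs = inner(even_op, even_op)
even_rhs = sum(A[2 * h] * A[2 * k] * inner(f, even_derivative(h + k)) for h, k in pairs)

odd_lhs = inner(odd_op, odd_op)
odd_rhs = -sum(A[2 * h + 1] * A[2 * k + 1] * inner(f, even_derivative(h + k + 1))
               for h, k in pairs)

print(even_lhs, even_rhs)
print(odd_lhs, odd_rhs)
```

For this sine series every boundary term produced by integration by parts contains an even derivative of $f$, which is zero at both endpoints, so each pair of values agrees up to rounding. A function without this property would leave boundary terms that the expansions do not account for.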
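By definition, these operators give rise to the norm $\|f\|^2_{\Phi_m} := \|f\|^2_{\Phi^e_m} + \|f\|^2_{\Phi^o_m}$. The following proposition helps determine the adjoint of $\Phi_m$.

Proposition 1. Let $u, v \in C^{2n}(X \subset \mathbb{R}^d, \mathbb{R})$ be such that $\forall n \in \mathbb{N}$ and $\forall x \in \partial X$, $\nabla^n u(x) = v(x) = 0$. If $h = 2n$ then $(P^h)^\star = P^h$, and if $h = 2n + 1$ then $(P^h)^\star = -\nabla\cdot\tilde{\nabla}^{2n}$, where $\tilde{\nabla}^2 = \nabla\nabla\cdot$ (the gradient of the divergence, which acts on vector fields).

Proof. We start by noting that the proposition holds trivially for $h = 0$; in this case $P^h$ reduces to the identity. Then we discuss even and odd terms separately. We prove that for the even terms $P^{2n}$ is Hermitian. The proof is given by induction on $n$.
• Basis of induction. For $n = 1$, $P^2 = \nabla^2$ and $P^2$ is self-adjoint.
• Induction step. Since $\nabla^2$ is self-adjoint (basis of induction), because of the induction hypothesis $\langle\nabla^{2(n-1)}u, v\rangle = \langle u, \nabla^{2(n-1)}v\rangle$, and because of the conditions on the border $\partial X$, we have
\[
\langle\nabla^{2n}u, v\rangle = \langle\nabla^2(\nabla^{2(n-1)}u), v\rangle = \langle\nabla^{2(n-1)}u, \nabla^2 v\rangle = \langle u, \nabla^{2(n-1)}\nabla^2 v\rangle = \langle u, \nabla^{2n}v\rangle.
\]
Now, for the odd terms we prove that $(\nabla^{2n+1})^\star = -\nabla\cdot\tilde{\nabla}^{2n}$.
• Basis of induction. For $n = 0$, we have $P^1 = \nabla$ and $\nabla^\star = -\nabla\cdot$.
• Induction step. We get
\[
\langle\nabla^{2n+1}u, v\rangle = \langle\nabla\nabla^{2n}u, v\rangle = \langle\nabla^{2n}u, -\nabla\cdot v\rangle = \langle u, -\nabla^{2n}\nabla\cdot v\rangle = \langle u, -\nabla\cdot\tilde{\nabla}^{2n}v\rangle.
\]

Corollary 1. Let $u, v : X \subset \mathbb{R}^d \to \mathbb{R}$ be two analytic functions such that $\forall h \in \mathbb{N}$ and $\forall x \in \partial X$, $\nabla^{2h}u(x) = v(x) = 0$. Then
\[
(\Phi^e_m)^\star = \Phi^e_m = \sum_{h=0}^{\lfloor m/2\rfloor} a_{2h}\nabla^{2h}
\quad\text{and}\quad
(\Phi^o_m)^\star = -\sum_{h=0}^{\lfloor m/2\rfloor} a_{2h+1}\nabla\cdot\tilde{\nabla}^{2h}. \tag{B.0.3}
\]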
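Proof. For $m = 2r$, given any two functions that satisfy the hypotheses, from Proposition 1 we have
\[
\langle\Phi^e_m u, v\rangle = \Big\langle\sum_{h=0}^{r} a_{2h}\nabla^{2h}u, v\Big\rangle = \Big\langle u, \sum_{h=0}^{r} a_{2h}\nabla^{2h}v\Big\rangle = \langle u, (\Phi^e_m)^\star v\rangle.
\]
Likewise, for $m = 2r + 1$ we have
\[
\langle\Phi^o_m u, v\rangle = \Big\langle\sum_{h=0}^{r} a_{2h+1}\nabla^{2h+1}u, v\Big\rangle = \Big\langle u, -\sum_{h=0}^{r} a_{2h+1}\nabla\cdot\tilde{\nabla}^{2h}v\Big\rangle = \langle u, (\Phi^o_m)^\star v\rangle,
\]
where $\tilde{\nabla}^2 = \nabla\nabla\cdot$ acts on vector fields.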
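The distinct definitions of $\Phi^e_m$ and $\Phi^o_m$ for even and odd integers, together with the corresponding adjoint operators $(\Phi^e_m)^\star$ and $(\Phi^o_m)^\star$, make it possible to compute
\[
\langle\Phi_m f, \Phi_m f\rangle = \sum_{h=0}^{\lfloor m/2\rfloor}\Big(a_{2h}^2\langle\nabla^{2h}f, \nabla^{2h}f\rangle + a_{2h+1}^2\langle\nabla\nabla^{2h}f, \nabla\nabla^{2h}f\rangle\Big).
\]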
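Proposition 2. Let $m$ be an even number. Then
\[
\langle\Phi_m f, \Phi_m f\rangle = \langle f, (\Phi_m^\star\Phi_m)f\rangle = \Big\langle f, \sum_{h=0}^{m+1}(-1)^h a_h^2\nabla^{2h}f\Big\rangle.
\]
Proof. From straightforward application of the above propositions to $\Phi_m$,
\[
\begin{aligned}
\langle\Phi_m f, \Phi_m f\rangle
&= \sum_{h=0}^{m/2}\Big(a_{2h}^2\langle\nabla^{2h}f, \nabla^{2h}f\rangle + a_{2h+1}^2\langle\nabla\nabla^{2h}f, \nabla\nabla^{2h}f\rangle\Big) \\
&= \sum_{h=0}^{m/2}\Big(a_{2h}^2\langle f, \nabla^{4h}f\rangle + a_{2h+1}^2\langle f, -\nabla^{4h+2}f\rangle\Big) \\
&= \Big\langle f, \sum_{h=0}^{m+1}(-1)^h a_h^2\nabla^{2h}f\Big\rangle.
\end{aligned}
\]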
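One way to read Proposition 2 is through the action of the resulting operator on a single oscillating mode. In one dimension, $d^{2h}/dx^{2h}\sin(k\pi x) = (-(k\pi)^2)^h\sin(k\pi x)$, so the alternating signs cancel and each mode is scaled by the positive factor $\sum_h a_h^2 (k\pi)^{2h}$. The sketch below computes this factor; the function name `symbol` and the coefficient values are assumptions made for illustration, not taken from the text.

```python
import math

def symbol(a, k):
    """Factor by which sum_h (-1)^h a_h^2 d^{2h}/dx^{2h} scales sin(k*pi*x)."""
    w2 = (k * math.pi) ** 2
    # Each even derivative of order 2h contributes (-w2)^h on the sine mode.
    return sum((-1) ** h * a_h ** 2 * (-w2) ** h for h, a_h in enumerate(a))

a = [1.0, 0.5, 0.1, 0.05]          # assumed coefficients a_0, ..., a_3
eigs = [symbol(a, k) for k in range(1, 6)]

for k, e in enumerate(eigs, start=1):
    print(k, e)
```

The factors are positive and grow with the frequency index $k$, which matches the intent of the regularization term: quick variations of $f$ are penalized more heavily than slow ones.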
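Now we discuss a more general case in which
\[
\Phi_m = \sum_{h=0}^{m} a_h\left(\frac{\partial}{\partial x_1} + \frac{\partial}{\partial x_2} + \cdots + \frac{\partial}{\partial x_d}\right)^h = \sum_{h=0}^{m} a_h D^h = \sum_{h=0}^{m} a_h \sum_{|\alpha|=h}\frac{h!}{\alpha!}\left(\frac{\partial}{\partial x}\right)^{\alpha} \tag{B.0.4}
\]
with $|\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_d$. We use the multiindex notation, that is, $\alpha! = \alpha_1!\,\alpha_2!\cdots\alpha_d!$ and
\[
\left(\frac{\partial}{\partial x}\right)^{\alpha} = \frac{\partial^{\alpha_1}\partial^{\alpha_2}\cdots\partial^{\alpha_d}}{\partial x_1^{\alpha_1}\,\partial x_2^{\alpha_2}\cdots\partial x_d^{\alpha_d}}.
\]

Proposition 3. Let us consider the differential operator given by (B.0.4). Then
\[
(\Phi_m)^\star = \sum_{h=0}^{m}(-1)^h a_h \sum_{|\alpha|=h}\frac{h!}{\alpha!}\left(\frac{\partial}{\partial x}\right)^{\alpha}. \tag{B.0.5}
\]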
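Proof. Since the adjoint of a sum of operators is the sum of the adjoints, we can restrict the proof to the operator $\partial_x^\alpha$. So we just need to prove that $(\partial_x^\alpha)^\star = (-1)^{|\alpha|}\partial_x^\alpha$. For $|\alpha| = 0$, the proof is trivial. For $|\alpha| > 0$, under some regularity conditions on the space on which the operator acts, we can always write $\partial_x^\alpha = \partial_{x_{i_1}}\partial_{x_{i_2}}\cdots\partial_{x_{i_{|\alpha|}}}$, where the indices $i_1, \ldots, i_{|\alpha|}$ belong to $\{1, 2, \ldots, d\}$; for example, $\partial_{x_1}^{|\alpha|} = \partial_{x_1}\partial_{x_1}\cdots\partial_{x_1}$. From here it is immediate to see that
\[
\langle\partial_{x_{i_1}}\partial_{x_{i_2}}\cdots\partial_{x_{i_{|\alpha|}}}u, v\rangle = (-1)^{|\alpha|}\langle u, \partial_{x_{i_1}}\partial_{x_{i_2}}\cdots\partial_{x_{i_{|\alpha|}}}v\rangle.
\]
This ends the proof.

Let $M$ be the set of $d$-dimensional multiindices with length between $0$ and $m$, $M = \{\alpha = (\alpha_1, \ldots, \alpha_d) \mid 0 \le |\alpha| \le m\}$. Then $\Phi_m = \sum_{\alpha\in M} b_\alpha\partial_x^\alpha$, and the regularization term deriving from $\Phi_m$ is
\[
\langle\Phi_m f, \Phi_m f\rangle = \Big\langle\sum_{\alpha\in M} b_\alpha\partial_x^\alpha f, \sum_{\beta\in M} b_\beta\partial_x^\beta f\Big\rangle = \sum_{\alpha\in M}\sum_{\beta\in M}(-1)^{|\alpha|}b_\alpha b_\beta\langle f, \partial_x^\alpha\partial_x^\beta f\rangle.
\]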
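The multi-index expansion in (B.0.4) can be made concrete by enumerating the multi-indices of a given length together with their coefficients $h!/\alpha!$. The following sketch is a hedged illustration: the function names are hypothetical, and the check relies on the fact that $\partial_x^\alpha e^{w\cdot x} = w^\alpha e^{w\cdot x}$, so that applying $D^h$ to $e^{w\cdot x}$ must multiply it by $(w_1 + \cdots + w_d)^h$.

```python
from itertools import product
from math import factorial, prod

def multi_indices(d, h):
    """All alpha in N^d with |alpha| = alpha_1 + ... + alpha_d = h."""
    return [alpha for alpha in product(range(h + 1), repeat=d) if sum(alpha) == h]

def multinomial(alpha):
    """The coefficient h! / alpha!, with h = |alpha|."""
    coeff = factorial(sum(alpha))
    for a_i in alpha:
        coeff //= factorial(a_i)
    return coeff

def expand_D_power(d, h):
    """Map each alpha with |alpha| = h to its coefficient in D^h."""
    return {alpha: multinomial(alpha) for alpha in multi_indices(d, h)}

def adjoint_sign(alpha):
    """Sign (-1)^{|alpha|} picked up by the adjoint of the alpha-derivative."""
    return (-1) ** sum(alpha)

# D^3 in two variables: the familiar binomial coefficients 1, 3, 3, 1.
terms = expand_D_power(2, 3)
print(terms)

# Check on exp(w . x): sum_alpha (h!/alpha!) w^alpha equals (w_1 + ... + w_d)^h.
w = (0.5, -1.0, 2.0)   # assumed test vector
h = 4
lhs = sum(c * prod(w_i ** a_i for w_i, a_i in zip(w, alpha))
          for alpha, c in expand_D_power(len(w), h).items())
rhs = sum(w) ** h
print(lhs, rhs)
```

Brute-force enumeration over `range(h + 1)` per coordinate is adequate for small $d$ and $h$; it is used here only because it keeps the correspondence with the sum over $|\alpha| = h$ explicit.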
The above discussion on differential operators can be enriched in at least two different directions. First, we can consider an infinite number of differential terms ($m \to \infty$) and, second, we can replace the $a_k$ coefficients with functions $a_k : X \to \mathbb{R}$.