Accepted Manuscript Functional Reproducing Kernel Hilbert Spaces for Non-Point-Evaluation Functional Data
Rui Wang, Yuesheng Xu
PII: DOI: Reference:
S1063-5203(17)30074-X http://dx.doi.org/10.1016/j.acha.2017.07.003 YACHA 1209
To appear in:
Applied and Computational Harmonic Analysis
Received date: Revised date: Accepted date:
25 June 2016 17 June 2017 10 July 2017
Please cite this article in press as: R. Wang, Y. Xu, Functional Reproducing Kernel Hilbert Spaces for Non-Point-Evaluation Functional Data, Appl. Comput. Harmon. Anal. (2017), http://dx.doi.org/10.1016/j.acha.2017.07.003
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Functional Reproducing Kernel Hilbert Spaces for Non-Point-Evaluation Functional Data Rui Wang∗ and Yuesheng Xu†
Abstract Motivated by the need of processing non-point-evaluation functional data, we introduce the notion of functional reproducing kernel Hilbert spaces (FRKHSs). This space admits a unique functional reproducing kernel which reproduces a family of continuous linear functionals on the space. The theory of FRKHSs and the associated functional reproducing kernels are established. A special class of FRKHSs, which we call the perfect FRKHSs, are studied, which reproduce the family of the standard point-evaluation functionals and at the same time another different family of continuous linear (non-point-evaluation) functionals. The perfect FRKHSs are characterized in terms of features, especially for those with respect to integral functionals. In particular, several specific examples of the perfect FRKHSs are presented. We apply the theory of FRKHSs to sampling and regularized learning, where non-point-evaluation functional data are used. Specifically, a general complete reconstruction formula from linear functional values is established in the framework of FRKHSs. The average sampling and the reconstruction of vector-valued functions are considered in specific FRKHSs. We also investigate in the FRKHS setting the regularized learning schemes, which learn a target element from non-point-evaluation functional data. The desired representer theorems of the learning problems are established to demonstrate the key roles played by the FRKHSs and the functional reproducing kernels in machine learning from non-point-evaluation functional data. We finally illustrate that the continuity of linear functionals, used to obtain the non-point-evaluation functional data, on an FRKHS is necessary for the stability of the numerical reconstruction algorithm using the data.
Key words: Functional reproducing kernel Hilbert space, functional reproducing kernel, average sampling, regularized learning, non-point-evaluation functional data 2010 Mathematics Subject Classification: 46E22, 68T05, 94A20
1
Introduction
The main purpose of this paper is to introduce the notion of the functional reproducing kernel Hilbert space and discuss the essential ideas behind such spaces along with their applications to sampling and machine learning. The introduction of such a notion is motivated by the need of processing various data (not necessarily the point-evaluations of a function) emerging in practical applications. Many scientific problems come down to understanding a function from a finite set of its sampled data. Commonly used sampled data of a function to be constructed are its function values. Through out this paper, we reserve the term “function value” for the evaluation of a ∗
School of Mathematics, Jilin University, Changchun 130012, P. R. China. E-mail address:
[email protected]. Corresponding author. School of Data and Computer Science and Guangdong Provincial Key Lab of Computational Science, Sun Yat-sen University, Guangzhou 510275, P. R. China. This author is a Professor Emeritus of Mathematics, Syracuse University, Syracuse, NY 13244, USA. E-mail addresses:
[email protected],
[email protected] †
1
function at a point in its domain. We call such data the point-evaluation functional data of the function. The stability of the sampling process requires that the point-evaluation functionals are continuous. This leads to the notion of reproducing kernel Hilbert spaces (RKHSs) [5] which have become commonly used spaces for processing point-evaluation functional data. There is a bijective correspondence between a RKHS and its reproducing kernel [36]. Moreover, reproducing kernels are used to measure the similarity between elements in an input space. In machine learning, many effective schemes based upon reproducing kernels were developed and proved to be useful [16, 47, 50, 51, 58]. The theory of vector-valued RKHSs [44] received considerable attention in multi-task learning which concerns estimating a vector-valued function from its function values. In the framework of vector-valued RKHSs, kernel methods were proposed to learn multiple related tasks simultaneously, [21, 39]. Recently, to learn a function in a Banach space which has more geometric structures than a Hilbert space, the notion of reproducing kernel Banach spaces (RKBSs) was introduced in [64] and further developed in [53, 59, 65, 66]. The celebrated Shannon sampling formula can be regarded as an orthonormal expansion of a function in the Paley-Wiener space, which is a typical RKHS. Motivated by this fact, many general sampling theorems were established in RKHSs through frames or Riesz bases composed by reproducing kernels, [28, 29, 30, 35, 42]. There are many occasions in practical applications where point-evaluation functional data are not readily available. Sampling is to convert an analog signal into a sequence of numbers, which can then be processed digitally on a computer. These numbers are expected to be the values of the original signal at points in the domain of the signal. However, in most physical circumstances, one cannot measure the value f (x) of a signal f at the location x exactly, due to the non-ideal acquisition device at the sampling location. A more suitable form of the sampled data should be the weighted-average value of f in the neighborhood of the location x. The average functions may reflect the characteristic of the sampling devices. Such a sampling process is called an average sampling process [1, 55]. In signal analysis, sometimes what is known is the frequency information of a signal. These data, represented mathematically by the Fourier transform or wavelet transform, are clearly not the function values. When numerically solving a differential equation, the discrete values of the right-hand side are usually seen as the known information for obtaining an approximation of the exact solution. The data in this case may be expressed as evaluations of the differential operator applied to the exact solution at given points in its domain, which are not the function values of the exact solution. Another example concerns extended x-ray absorption fine structure spectroscopy (EXAFS), which is a useful atomic-scale probe for studying the environment of an atomic species in a chemical system. One key problem in EXAFS is to obtain the atomic radial distribution function, which describes how density varies as a function of the distance from a reference particle to the points in the space. The measured quantity used to reconstruct the atomic radial distribution function is the x-ray absorption coefficient as a function of the wave vector modulus for the photoelectron. Theoretically, this observed information is determined by an integral of the product of the atomic radial distribution function and functions reflecting the characteristics of the sample and the physics of the electron scattering process [14, 38]. All above mentioned data can be formulated mathematically as linear functional values of a function. The term “functional value” is reserved in this paper for the value of a functional applied to a function in its domain. Function values are a special case of linear functional values when the linear functional is the point-evaluation functional. Data, coming from non-pointevaluation functionals, are called the non-point-evaluation functional data. Large amount of such sampled data motivate us to consider constructing a function from its linear functional values other than its function values. There is considerable amount of work on processing non-point-evaluation functional data in the literature. The reconstruction of a function in the band-limited space, the finitely generated shift-invariant spaces and more generally the spaces of signals with finite rate of innovation from 2
its local average values were investigated, respectively, in [27, 56], [1, 3] and [55]. Sampling and reconstruction of vector-valued functions were considered in [7], where the sampled data were defined as the inner product of the function values and some given vectors. It is clear that these sampled data can also be treated as the linear functional values of the vector-valued function. A number of alternative estimators were introduced for the functional linear regression based upon the functional principal component analysis [10, 48, 60] or a regularization method in RKHSs [63]. Optimal approximation of a function f from a general RKHS by algorithms using only its finitely many linear functional values was widely studied in the area of information-based complexity [43, 57]. A quasi-interpolation scheme was established in [24] for linear functional data. In machine learning, regularized learning schemes were proposed for non-point-evaluation functional data. For example, in [22] the authors considered the learning algorithms with functional data. Learning a function from local average integration functional values was investigated in [58] in the framework of support vector machines. Regularized learning from general linear functional values in Banach spaces was considered in [38]. Regularized learning in the context of RKBSs was studied in [64, 65] with both data from point-evaluation functional and non-point-evaluation functional. Reconstructing a function in a usual RKHS (where only the point-evaluation functionals are continuous) from its non-point-evaluation functional data is not appropriate in general since the non-point-evaluation functionals used in the processing may not be continuous in the underlying RKHS. When constructing a function from its linear functional values, the sampled data are generally obtained with noise. To reduce the effects of the noise, we require that the sampling process be stable, that is, small errors in the sampled data will not result in large errors in the reconstructed function. Clearly, a usual RKHS that enjoys only the continuity of point-evaluation functionals is not a suitable candidate for processing a class of non-point-evaluation functionals. This demands the availability of a Hilbert space where specific non-point-evaluation functionals used in the processing are continuous. Such spaces are presently not available in the literature. Therefore, it is desirable to consider a Hilbert space where a specific class of linear functionals on it are continuous. The consideration of such a space leads to the notion of the functional reproducing kernel Hilbert spaces for non-point-evaluation functional data. The goal of this paper is to study the reproducing kernel Hilbert spaces for non-point-evaluation functional data. We shall introduce the notion of functional reproducing kernel Hilbert spaces by reviewing examples of data forms available in applications. Such a space ensures the continuity of a family of prescribed linear functionals on it. A functional reproducing kernel Hilbert space is expected to admit a reproducing kernel, which reproduces the linear functionals defining the space. We shall show that every functional reproducing kernel Hilbert space is isometrically isomorphic to a usual RKHS. Although this does not mean that the study of functional reproducing kernel Hilbert spaces can be trivially reduced to that of RKHSs, several important results about RKHSs can be transferred to functional reproducing kernel Hilbert spaces by the isometric isomorphism procedure. In the literature, the average sampling problem was usually investigated in spaces which are typical RKHSs. We shall point out that these spaces are in fact special functional reproducing kernel Hilbert spaces. We shall call such a space a perfect functional reproducing kernel Hilbert space, since it admits two different families of continuous linear functionals, among which one is the set of the standard point-evaluation functionals. The perfect functional reproducing kernel Hilbert spaces, especially the ones with respect to integral functionals, bring us special interest. Existing results about the average sampling are special cases of the sampling theorem which we establish in the functional reproducing kernel Hilbert space setting, with additional information on the continuity of the used integral functionals on the space. In machine learning, learning a function from its linear functional values was also considered in RKHSs or RKBSs. In the framework of the new spaces, we study the regularized learning schemes for learning an element from non-point-evaluation functional values. By establishing the representer 3
theorem, we shall show that the functional reproducing kernel Hilbert spaces are more suitable for learning from non-point-evaluation functional values than the usual RKHSs which are used in the existing literature. The theoretical results and specific examples in this paper all demonstrate that functional reproducing kernel Hilbert spaces provide a right framework for constructing an element from its non-point-evaluation continuous linear functional values. This paper is organized in nine sections. We review in section 2 several non-point-evaluation functional data often used in practical applications. Motivated by processing such forms of data, we introduce the notion of functional reproducing kernel Hilbert spaces in section 3. We also identify a functional reproducing kernel with each functional reproducing kernel Hilbert space and reveal the relation between the new type of spaces and the usual RKHSs. In section 4, we characterize the perfect functional reproducing kernel Hilbert spaces and the corresponding functional reproducing kernels and study the universality of these kernels. Motivated by the local averages of functions, we investigate in section 5 the perfect functional reproducing kernel Hilbert space with respect to a family of integral functionals and present two specific examples. Based upon the framework of functional reproducing kernel Hilbert spaces, we establish a general reconstruction formula of an element from its linear functional values in section 6 and develop the regularized learning algorithm for learning from non-point-evaluation functional values in section 7. In section 8, we comment on stability of a numerical reconstruction algorithm using linear functional values in a functional reproducing kernel Hilbert space. Finally, we draw a conclusion in section 9.
2
Non-point-evaluation functional data
Point-evaluation function values are often used to construct a function in the classical RKHS setting. However, we encounter in both theoretical analysis and practical applications large amount of data which are not point-evaluation function values. The purpose of this section is to review several such forms of data, which serve as motivation of introducing the notion of functional reproducing kernel Hilbert space. Classical methods of constructing a function are based on its finite samples. The finite empirical data of a function f from points in its domain X to R or C are usually taken as the values of f on the finite set {xj : j ∈ Nm } ⊆ X , where Nm := {1, 2, . . . , m}. We call them the point-evaluation functional data. Various approaches were developed for processing the pointevaluation functional data in the literature. The classical RKHSs are a class of function spaces suitable for the point-evaluation functional data. However, because of a variety of reasons, the available data in practical applications are not always in the form of point evaluations. We shall call data that are not point-evaluation function values the non-point-evaluation functional data. In most physical circumstances, although the values of a function f are expected to be sampled and processed on computers, one can not measure the value f (x) at the location x exactly because of the non-ideal acquisition device [1, 55]. Instead of the function values at the sample points, one obtains the finite local average values of f in the neighborhood of the sample points. Specifically, we suppose that {xj : j ∈ Nm } is a finite set of R and {uj : j ∈ Nm } is a set of nonnegative functions on R, where for each j ∈ Nm , the function uj supports in a neighborhood of xj . The sampled data have the form {Lj (f ) : j ∈ Nm }, where for each j ∈ Nm , Lj is defined by Lj (f ) := f (x)uj (x)dx, R
which can be viewed as a linear functional of f . We note that when the support of uj is small, the linear functional value Lj (f ) can be treated as an approximation of the point-evaluation function value of f at the location xj . 4
There are other practical occasions, where the available information is not point-evaluation functional data or the local average functional data. The frequency information of a signal are such non-point-evaluation functional data. Specifically, suppose that f is a 2π-periodic signal. The Fourier coefficients of f , defined by 2π 1 cj (f ) := √ f (x)e−ijx dx, j ∈ Z, 2π 0 are usually used to recover the signal f in the time domain. In some imaging applications, such as magnetic resonance imaging (MRI) [8, 25, 45, 52] and synthetic aperture radar (SAR) imaging, the sampled data may be collected as a series of non-uniform Fourier coefficients 2π 1 c˜j (f ) := √ f (x)e−iλj x dx, j ∈ Z, 2π 0 where the family {λj : j ∈ Z} of real numbers are not necessarily integers. When λj , j ∈ Z, are spaced logarithmically and centered around 0, the non-uniform Fourier coefficients represent a one-dimensional analogue of the spiral sampling that is sometimes used in MRI. These Fourier data that contain the information of the signal in the frequency domain can be taken as the non-point evaluation functional data. Another example concerns the x-ray absorption coefficients of an atomic radial distribution function in EXAFS spectroscopy [14, 38]. Within the EXAFS theory accepted generally, the x-ray absorption coefficient of the atomic radial distribution function f is described by
b
G(κ, r)f (r)dr,
χ(κ) := a
where G is the kernel defined by 2r − λ(κ)
G(κ, r) := 4πρφ(κ)e
sin(2κr + ψ(κ)).
Here, the constant ρ and the functions λ, φ, ψ are used to describe physical parameters of the Radon transform. Specifically, ρ is the sample density, λ is the mean-free-path of the photo-electron, φ is the scattering amplitude of the electron scattering process and ψ is the phase-shifts of the electron scattering process. To determine the atomic radial distribution function f , measurements are made giving values χ(κj ), j ∈ Nm . There were many methods developed in the literature to learn the atomic radial distribution function f from such sampled data. It is clear that for each κ, the quantity χ(κ) can be viewed as a linear functional value of the atomic radial distribution function f. We next present an example related to the differential functionals. For a set of non-negative integers α := [αk : k ∈ Nd ], we let |α| := k∈Nd αk . Suppose that Ω is an open subset of Rd and m is a positive integer. For α := [αk : k ∈ Nd ] with |α| ≤ m, we define the differential operator Dα for f ∈ C m (Ω) by |α| ∂ f (x), x := [xk : k ∈ Nd ]. Dα (f )(x) := ∂xα1 1 · · · ∂xαd d Let aα , |α| ≤ m, and g be given functions on Ω. When solving numerically the differential equation aα (x)Dα (f )(x) = g(x), x ∈ Ω, |α|≤m
5
with certain boundary conditions, the finite data with the form aα (xj )Dα (f )(xj ), j ∈ Nn , |α|≤m
are usually used to establish a numerical approximation of its exact solution f . In this case, what is known is the linear functional values of the solution f . The sampled data emerge in the cases discussed above are not the point-evaluation functional data but the non-point-evaluation functional ones. This will motivate us to consider estimating the target function f from the finite information {Lj (f ) : j ∈ Nm }, where Lj , j ∈ Nm , are linear functionals of f . Finally, we describe types of data used in the classical sampling. Motivated by the average sampling, one may consider reconstruction of a function f from non-point-evaluation functional data {Lj (f ) : j ∈ J}, where J is an index set and Lj , j ∈ J, is a sequence of linear functionals of f . For vector-valued functions taking values in a Hilbert space Y, there are two types of data used in the reconstruction formula. One concerns the point-evaluation operator data {f (xj ) : j ∈ J}, where each data point is a vector in Y. Another one consists of linear combinations of the components of the vector f (xj ), point-evaluation operator data. Following [7], we shall consider reconstructing a function f from the finite sampled data, defined as the inner product f (xj ), ξj Y , j ∈ J, where J is an index set and {ξj : j ∈ J} is a subset in Y. If we introduce a family of linear functionals Lj , j ∈ J, as Lj (f ) := f (xj ), ξj Y , j ∈ J, the above sampling problem can be taken as reconstructing an element f from its linear functional values Lj (f ), j ∈ J. It is known that RKHSs are ideal spaces for processing the point-evaluation functional data. However, these spaces are no longer suitable for processing non-point-evaluation functional data. Hence, it is desirable to introduce spaces suitable for processing such data. This is the focus of this paper.
3
The notion of functional reproducing kernel Hilbert spaces
We introduce in this section the notion of the functional reproducing kernel Hilbert space. We identify the classical RKHS as its special example. We present nontrivial examples of functional reproducing kernel Hilbert spaces with respect to integral functionals. In a way similar to an RKHS, each functional reproducing kernel Hilbert space has exactly one functional reproducing kernel to reproduce the functional values of elements in the space. Based upon the isometric isomorphism between a functional reproducing kernel Hilbert space and an RKHS, we characterize a functional reproducing kernel by its features. 1/2 Let H denote a Hilbert space with inner product ·, ·H and norm · H := ·, ·H . For a family F of linear functionals on H, we say that the norm of H is compatible with F if for each f ∈ H, L(f ) = 0 for all L ∈ F if and only if f H = 0. Definition 3.1. Let H be a Hilbert space and F a family of linear functionals on H. Space H is called a functional reproducing kernel Hilbert space with respect to F if the norm of H is compatible with F and each linear functional in F is continuous on H. In this paper, we shall use the abbreviation FRKHS to represent a functional reproducing kernel Hilbert space. It is convenient to associate a family F of linear functionals with an index set Λ, mainly, F := {Lα : α ∈ Λ}. 6
Two remarks are in order about this definition. First, unlike the typical RKHS, FRKHS is not always composed of functions, but general vectors. Second, for any Hilbert space H there always exists a family F of linear functionals such that H becomes an FRKHS with respect to F. In fact, we can choose the family F of linear functionals as the set of all the continuous linear functionals on H. That is, F = H∗ , the dual space of H. Since H∗ is isometrically anti-isomorphic to H, we can easily obtain that the norm of H is compatible with F and then H is an FRKHS with respect to F. Although any Hilbert space is an FRKHS with respect to some family of linear functionals, we are interested in finding an FRKHS with respect to a specific family of linear functionals. This is nontrivial and useful in practical applications. We begin with presenting several specific examples of FRKHSs to demonstrate that the notion of FRKHSs is not a trivial extension, but a useful development of the concept of RKHSs. The typical RKHS is a special example of FRKHSs. Let X be a set. The space H is chosen as a Hilbert space of functions from X to C and the family F of linear functionals on H is chosen as the set of the point-evaluation functionals defined by Lx (f ) := f (x), x ∈ X . Then it is clear that the norm of f ∈ H vanishes if and only if f , as a function, vanishes everywhere on X , which indicates that the norm of H is compatible with F. If each point-evaluation functional in F is continuous on H, the corresponding FRKHS H reduces to the classical RKHS. Another example of FRKHSs is the vector-valued RKHS, whose elements are functions taking range in a Hilbert space. Suppose that X is a set and Y is a Hilbert space with inner product ·, ·Y . A Hilbert space H of functions from X to Y is called a Y-valued RKHS if each pointevaluation operator on H is continuous. Let Λ := X × Y. We define for each (x, ξ) ∈ Λ the linear functional L(x,ξ) on H as (3.1) L(x,ξ) (f ) := f (x), ξY , f ∈ H, and set F := {L(x,ξ) : (x, ξ) ∈ Λ}. For each element f of H, its norm vanishes if and only if it, as a function, vanishes everywhere on X . The latter is equivalent to that there holds L(x,ξ) (f ) = 0, for any (x, ξ) ∈ Λ. Hence, the norm of H is compatible with F. The continuity of the point-evaluation operators on H leads directly to the continuity of the linear functionals in F. Consequently, by Definition 3.1 we conclude that H is an FRKHS with respect to F. We are particularly interested in FRKHSs with respect to a family of linear functionals not restricted to point-evaluation functionals. The next three examples concerning integral functionals show that the notion of FRKHSs is a useful development of the concept of classical RKHSs. Our interest in integral functionals is motivated by a practical scenario where the point-evaluation function values can not be measured exactly in practice due to inevitable measurement errors. Alternatively, it may be of practical advantage to use local averages of f as observed information [1, 38, 55, 56, 58]. Let X be a measure space with a positive measure μ. For 1 ≤ p < +∞, we denote by Lp (X , μ) the space of all the measurable functions f : X → C such that f Lp (X ,μ) :=
X
|f (x)|p dμ(x)
1/p < +∞.
In the case that μ is the Lebesgue measure on Rd and X is a measurable set of Rd , Lp (X , μ) will be abbreviated as Lp (X ). The first two concrete examples of FRKHSs with respect to integral functionals is related to L2 (X ). Let Λ be an index set. We suppose that the linear span of uα ∈ L2 (X ), α ∈ Λ, is dense in L2 (X ). For each α ∈ Λ we define a linear functional Lα for f ∈ L2 (X ) by Lα (f ) :=
X
f (x)uα (x)dx,
(3.2)
and set F := {Lα : α ∈ Λ}. We shall show that L2 (X ) is an FRKHS with respect to F. With the 7
help of the H¨older’s inequality, we have for each α ∈ Λ and each f ∈ L2 (X ) that |Lα (f )| ≤ f L2 (X ) uα L2 (X ) , which guarantees the well-definedness and the continuity of the functionals in F. Moreover, the density of the linear span of uα ∈ L2 (X ), α ∈ Λ, leads to the statement that Lα (f ) = 0 for all α ∈ Λ if and only if f L2 (X ) = 0. This shows that the norm of L2 (X ) is compatible with F. Consequently, by Definition 3.1 we conclude that L2 (X ) is an FRKHS with respect to the family F of linear functionals defined by (3.2) in terms of uα , α ∈ Λ. Two specific choices of the family of functions {uα : α ∈ Λ} lead to two FRKHSs which may be potentially important in practical applications. The first one is motivated by reconstructing a signal from its frequency components, which is one of the main themes in signal analysis. Fourier analysis has been proved to be an efficient approach for providing global energy-frequency distributions of signals. Because of its prowess and simplicity, Fourier analysis is considered as a most useful tool in signal analysis and many other application areas. In this example, we restrict ourselves to the case of periodic functions on R, where the frequency information of such a signal is represented by its Fourier coefficients. Specifically, we consider the Hilbert space L2 ([0, 2π]). Set Λ := Z and choose the functions uk , k ∈ Λ, as the Fourier basis functions. That is, 1 uk := √ eik· , k ∈ Λ. 2π Associated with these functions, we introduce a family of linear functionals on L2 ([0, 2π]) as 2π Lk (f ) := f (x)uk (x)dx, f ∈ L2 ([0, 2π]), k ∈ Λ, (3.3) 0
and set F := {Lk : k ∈ Λ}. It follows from the density of the Fourier basis functions that the space L2 ([0, 2π]) is an FRKHS with respect to F. Wavelet analysis is an important mathematical tool in various applications, such as signal processing, communication and numerical analysis. Various wavelet functions have been constructed for practical purpose in the literature [12, 17, 34, 37, 40]. Particularly, wavelet analysis provides an alternative representation of signals. The wavelet transform contains the complete information of a signal in both time and frequency domains at the same time. Our next example comes from the problem of recovering a signal from finite wavelet coefficients, which can be seen as the linear functional values of the signal. Suppose that ψ ∈ L2 (Rd ) is a wavelet function such that the sequence ψj,k , j ∈ Z, k ∈ Zd , defined by jd
ψj,k (x) := 2 2 ψ(2j x − k), x ∈ Rd , j ∈ Z, k ∈ Zd , constitutes an orthonormal basis for L2 (Rd ). We consider the space L2 (Rd ). To introduce a family of linear functionals, we let Λ := Z × Zd and for each (j, k) ∈ Λ set u(j,k) := ψj,k . Accordingly, the linear functionals are defined by f (x)u(j,k) (x)dx, f ∈ L2 (Rd ), (j, k) ∈ Λ. (3.4) L(j,k) (f ) := Rd
By the density of the orthonormal basis functions, we obtain directly that the space L2 (Rd ) is an FRKHS with respect to the family F := {L(j,k) : (j, k) ∈ Λ} of continuous linear functionals. 8
As a third concrete example of FRHKSs with respect to integral functionals, we consider the Paley-Wiener space on Rd . For a set of positive numbers Δ := {δj : j ∈ Nd }, we denote by BΔ (Rd ) the closed subspaces of functions in L2 (Rd ) which are bandlimited to the region [−δj , δj ]. IΔ := j∈Nd
That is,
BΔ (Rd ) := f ∈ L2 (Rd ) : suppfˆ ⊆ IΔ ,
ˆ and the with the inner product inherited from L2 (Rd ). Here, we define the Fourier transform h 1 d ˇ inverse Fourier transform h of h ∈ L (R ), respectively, by ˆ h(t)e−i(t,ω) dt, ω ∈ Rd , h(ω) := Rd
and ˇ h(ω) :=
1 (2π)d
Rd
h(t)ei(t,ω) dt, ω ∈ Rd ,
with (·, ·) being the standard inner product on Rd . By standard approximation process, the Fourier transform and its inverse can be extended to the functions in L2 (Rd ). It is well-known that the space BΔ (Rd ) is an RKHS. In other words, it is an FRKHS with respect to the family of the point-evaluation functionals. We shall show that it is also an FRKHS with respect to a family of the integral functionals to be defined next. Specifically, we suppose that u ∈ L2 (Rd ) satisfies the property that u ˆ(ω) = 0, a.e. ω ∈ IΔ . For each x ∈ Rd , we set ux := u(· − x) and introduce the functional Lx (f ) := f (t)ux (t)dt (3.5) Rd
and define F := {Lx : x ∈ Λ}, with Λ := Rd . We first show that the norm of BΔ (Rd ) is compatible with F. It suffices to verify for each f ∈ BΔ (Rd ) that if Lx (f ) = 0 for all x ∈ Rd , then f L2 (Rd ) = 0. By the Plancherel identity we have for any x ∈ Rd that u(ω)ei(x,ω) dω. fˆ(ω)ˆ Lx (f ) = IΔ
The density
span {ei(x,·) : x ∈ Rd } = L2 (IΔ )
ensures that the condition Lx (f ) = 0 for all x ∈ Rd is equivalent to that u(ω) = 0, a.e. ω ∈ IΔ . fˆ(ω)ˆ
(3.6)
By the assumption that u ˆ(ω) = 0, a.e. ω ∈ IΔ , (3.6) holds if and only if f L2 (Rd ) = 0. Thus, we have proved that the norm of BΔ (Rd ) is compatible with F. Moreover, applying the H¨older inequality to the definition of functional Lx yields that for each x ∈ Rd the functional Lx is continuous on BΔ (Rd ). According to Definition 3.1, space BΔ (Rd ) is an FRKHS with respect to the family F of linear functionals defined by (3.5). Finally, we describe a finite-dimensional FRKHS. Let H be a Hilbert space of dimension n and F := {Lα : α ∈ Λ} a family of linear functionals on H. We suppose that there exists a finite subset {αj : j ∈ Nn } of Λ such that Lαj , j ∈ Nn , are linearly independent. That is, if for {cj : j ∈ Nn } ⊆ C there holds cj Lαj (f ) = 0, for all f ∈ H, j∈Nn
9
then cj = 0, j ∈ Nn . It is known that any linear functional on a finite-dimensional normed space is continuous. Hence, the continuity of the linear functionals in F is clear. It suffices to verify that the norm of H is compatible with F. For each j ∈ Nn , as Lαj is continuous on H, there exists φj ∈ H such that Lαj (f ) = f, φj H , for all f ∈ H. (3.7) Then φj , j ∈ Nn , are linearly independent. In fact, if for some scalars cj , j ∈ Nn , j∈Nn cj φj = 0, by (3.7) we have that
cj Lαj (f ) = f, cj φj = 0, for all f ∈ H. j∈Nn
j∈Nn
H
This together with the linear independence of Lαj , j ∈ Nn , leads to that cj = 0 for all j ∈ Nn . That is, φj , j ∈ Nn , are linearly independent, which implies span {φj : j ∈ Nn } = H. Suppose that f ∈ H satisfies Lαj (f ) = 0 for all j ∈ Nn . We then get by (3.7) that f, φj H = 0, j ∈ Nn , which yields f H = 0. Consequently, we have proved that the norm of H is compatible with F. By Definition 3.1, we conclude that H is an FRKHS with respect to F. Observing from the examples presented above, we point out that the spaces L2 ([0, 2π]) and 2 L (Rd ) are FRKHSs but not classical RKHSs as they consist of equivalent classes of functions with respect to the Lebesgue measure. However, the space BΔ (Rd ) admits two different families of linear functionals on itself, one being the standard point-evaluation functionals and the other being the integral functionals having the form of (3.5). If the finite-dimensional FRKHS H consists of functions on an input set X , then all the point-evaluation functionals are continuous on H. Hence, the finite dimensional space H also admits the family of point-evaluation functionals, besides the family F. The FRKHSs having this property are especially desirable, which we shall study in the next two sections. We now turn to investigating the properties of general FRKHSs. A core in the theory of RKHSs lies in the celebrated result that there is a bijective correspondence between the reproducing kernel and the corresponding RKHS. We next establish a similar bijection in the framework of FRKHSs. We first identify the functional reproducing kernel to each FRKHS. Recall that an RKHS admits a reproducing kernel and the point-evaluation functional values f (x), x ∈ X , of a function f in the RKHS can be reproduced via its inner product with the kernel. In a manner similar to the RKHS, we show that for a general FRKHS there exists a kernel which can be used to reproduce the functional values Lα (f ), α ∈ Λ. Theorem 3.2. Suppose that H is a Hilbert space and F := {Lα : α ∈ Λ} is a family of linear functionals on H. If H is an FRKHS with respect to F, then there exists a unique operator K : Λ → H such that the following statements hold. (1) For each α ∈ Λ, K(α) ∈ H and Lα (f ) = f, K(α)H , for all f ∈ H.
(3.8)
(2) The linear span SK := span {K(α) : α ∈ Λ} is dense in H. (3) For each n ∈ N and each finite set Λn := {αj : j ∈ Nn } ⊆ Λ the matrix FΛn := [Lαk (K(αj )) : j, k ∈ Nn ] is hermitian and positive semi-definite. Proof. We shall construct an operator K : Λ → H which satisfies conditions (1)-(3). For each α ∈ Λ, since the functional Lα is continuous on H, by the Riesz representation theorem, there exists a unique element gα ∈ H such that Lα (f ) = f, gα H , for all f ∈ H. 10
We define the operator K from Λ to H by K(α) := gα , α ∈ Λ. Clearly, K satisfies statement (1). To prove statement (2), we suppose that f ∈ H satisfies f, K(α)H = 0, for all α ∈ Λ. This together with (3.8) ensures that Lα (f ) = 0 for all α ∈ Λ. By the hypothesis of this theorem, H is an FRKHS with respect to F. We get that the norm of H is compatible with F. Thus, we have that f = 0. This implies that SK = H. It remains to show statement (3). It follows from (3.8) that there holds for all α, β ∈ Λ Lα (K(β)) = K(β), K(α)H = K(α), K(β)H = Lβ (K(α)). Hence, for each n ∈ N and each finite set Λ n := {αj : j ∈ Nn } ⊆ Λ the matrix FΛn is hermitian. For each {cj : j ∈ Nn } ⊆ C, we let fn := j∈Nn cj K(αj ). We then observe from equation (3.8) that cj ck Lαk (K(αj )) = cj ck K(αj ), K(αk )H = fn , fn H ≥ 0, j∈Nn k∈Nn
j∈Nn k∈Nn
proving the desired result. It is known from equation (3.8) that one can use the operator K to reproduce the functional values Lα (f ), α ∈ Λ. Hence, we shall call equation (3.8) the reproducing property and K the functional reproducing kernel for FRKHS H. We present the functional reproducing kernels for the specific FRKHSs discussed earlier. We first consider the FRKHS identical to the RKHS. It is known [5] that an RKHS H consisting of functions on X has a reproducing kernel K from X × X to C such that f (x) = f, K(x, ·)H , f ∈ H, x ∈ X . The functional reproducing kernel can be represented via the classical reproducing kernel K by K(x) := K(x, ·), x ∈ X . In the remaining part of this paper, we shall use the conventional notation of the reproducing kernels for the classical RKHSs. A vector-valued RKHS H of functions from X to Y has an operator-valued reproducing kernel K from X ×X to the set of all the bounded linear operators from Y to Y, [39]. The operator-valued reproducing kernel satisfies the reproducing property as f (x), ξY = f, K(x, ·)ξH , f ∈ H, x ∈ X , ξ ∈ Y. As an FRKHS with respect to the family F of linear functionals defined by (3.1), space H has a functional reproducing kernel K represented via K by K(x, ξ)(y) := K(x, y)ξ, x, y ∈ X, ξ ∈ Y. The kernel for the FRKHS L2 (X ) with respect to the family F of the integral functionals can be represented by the functions uα , α ∈ Λ, appearing in (3.2). Recall that if the linear span of uα ∈ L2 (X ), α ∈ Λ, is dense in L2 (X ), then L2 (X ) is an FRKHS with respect to F. From the definition (3.2) of the integral functionals, we get that the functional reproducing kernel for the FRKHS L2 (X ) has the form K(α) := uα , α ∈ Λ. Particularly, the functional reproducing kernel for L2 ([0, 2π]) with respect to the family L of the linear functionals defined by (3.3) is determined by 1 K(j) := √ eij· , j ∈ Z. 2π
11
Similarly, the functional reproducing kernel for L2 (Rd ) with respect to the family L of the linear functionals defined by (3.4) coincides with the wavelet basis functions. That is, jd
K(j, k) := 2 2 ψ(2j (·) − k), j ∈ Z, k ∈ Zd . Another example of FRKHSs with respect to the family of the integral functionals is the PaleyWiener space of functions on Rd . Recall that BΔ (Rd ) is an RKHS with the translation-invariant reproducing kernel sin δj (xj − yj ) , x, y ∈ Rd . (3.9) K(x, y) := π(xj − yj ) j∈Nd
Let F be the family of the integral functionals defined by (3.5). We also verified earlier in this section that BΔ (Rd ) is an FRKHS with respect to such a set F. Let PBΔ (Rd ) denote the orthogonal projection from L2 (Rd ) onto BΔ (Rd ). Then, for each x ∈ Rd and each f ∈ BΔ (Rd ), there holds Lx (f ) = f, u(· − x)L2 (Rd ) = f, PBΔ (Rd ) u(· − x)L2 (Rd ) . This leads to the kernel K(x) = PBΔ (Rd ) u(· − x), x ∈ Rd . Our last example concerns the finite-dimensional FRKHS. If there exist n linearly independent elements in the family F of linear functionals on H, then the finite-dimensional Hilbert space H was proved to be an FRKHS with respect to the family F. To present the functional reproducing kernel, we suppose that {φj : j ∈ Nn } is a linearly independent sequence in H. For each α ∈ Λ, we can represent K(α) ∈ H as cj φj , for some sequence {cj : j ∈ Nn } ⊆ C. K(α) = j∈Nn
It follows from the reproducing property (3.8) that the sequence {cj : j ∈ Nn } satisfies the following system Lα (φk ) = cj φj , φk H , k ∈ Nn . j∈Nn
We denote by A the coefficients matrix, that is, A := [φk , φj H : j, k ∈ Nn ]. Due to the linear independence of φj , j ∈ Nn , matrix A is positive definite. Denoting by B := [bj,k : j, k ∈ Nn ] the inverse matrix of A, we obtain the functional reproducing kernel as follows: bj,k Lα (φk )φj , α ∈ Λ. K(α) = j∈Nn k∈Nn
Theorem 3.2 indicates that an FRKHS has a unique functional reproducing kernel. We now turn to the reverse problem. That is, associated with a given functional reproducing kernel, there exists a unique FRKHS. To show this, we first point out that Property (3) in Theorem 3.2 is an equivalent characterization for the functional reproducing kernel. Theorem 3.3. Suppose that Λ is a set and Z is a vector space. Let K be an operator from Λ to Z and F := {Lα : α ∈ Λ} be a family of linear functionals on the linear span SK := span {K(α) : α ∈ Λ} ⊆ Z. If for each n ∈ N and each finite set Λn := {αj : j ∈ Nn } ⊆ Λ, the matrix FΛn := [Lαk (K(αj )) : j, k ∈ Nn ] is hermitian and positive semi-definite, then there exists a unique FRKHS H with respect to F such that SK = H and for each f ∈ H and each α ∈ Λ, Lα (f ) = f, K(α)H . 12
Proof. We specifically construct the FRKHS and verify that it has the desired properties. We first construct a vector space based upon the given functional reproducing kernel. To this end, we define equivalent classes in the linear span SK . Specifically, two elements f1 , f2 ∈ SK are said to be equivalent provided that Lα (f1 ) = Lα (f2 ) holds for all α ∈ Λ. Accordingly, they induce a partition of SK by which SK is partitioned into a collection of disjoint equivalent classes. For an element f ∈ SK , we denote by f˜ the class equivalent to f and introduce the vector space H0 := {f˜ : f ∈ SK }. We then equip the vector space H0 with an inner product so that it becomes an inner product we define a bilinear mapping ·, ·H0 : H0 × H0 → C for space. For this purpose, f := j∈Nn cj K(αj ) and g := k∈Nm dk K(βk ) by f˜, g˜H0 :=
cj dk Lβk (K(αj )).
(3.10)
j∈Nn k∈Nm
From the representation f˜, g˜H0 =
⎛
d k L βk ⎝
⎞ cj K(αj )⎠ =
j∈Nn
k∈Nm
dk Lβk (f ),
k∈Nm
we observe that the bilinear mapping ·, ·H0 is independent of the choice of the equivalent class representation of elements in H0 for the first variable. Likewise, we may show that the bilinear mapping ·, ·H0 is also independent of the choice of the equivalent class representation of elements in H0 for the second variable. Thus, we conclude that the bilinear mapping (3.10) is well-defined. above is indeed an inner product on We next show that the bilinear mapping ·, ·H0 defined H0 . It can be verified for any f := j∈Nn cj K(αj ) and g := k∈Nm dk K(βk ) that there holds f˜, g˜H0 =
dk cj Lαj (K(βk )) = ˜ g , f˜H0 .
j∈Nn k∈Nm
Moreover, the positive semi-definiteness of FΛn leads directly to f˜, f˜H0 ≥ 0. Hence, the bilinear semi-inner product on H0 . It remains to check that if f˜, f˜H0 = 0 then f˜ = 0. mapping ·, ·H0 is a Suppose that f := j∈Nn cj K(αj ) satisfies f˜, f˜H0 = 0. By the Cauchy-Schwarz inequality we have for any α ∈ Λ that H | ≤ f˜, f˜H K(α), K(α) H = 0. |f˜, K(α) 0 0 0 Combining this with (3.10), we have for any α ∈ Λ that Lα (f ) =
H = 0, cj Lα (K(αj )) = f˜, K(α) 0
j∈Nn
which implies that Lα (f ) = 0 holds for all α ∈ Λ. The definition of the equivalent classes in SK implies that f˜ = 0. Consequently, we have established that H0 is an inner product space endowed with the inner product ·, ·H0 . Let H be the completion of H0 upon the inner product ·, ·H0 . We finally show that H is an FRKHS. For each f˜ ∈ H0 and each α ∈ Λ, we set Lα (f˜) := Lα (f ). Then, for any α ∈ Λ and any f˜ ∈ H0 we have that H . Lα (f˜) = f˜, K(α) 0 This implies that the linear functional Lα is continuous on H0 . For any F ∈ H, there exists a sequence {f˜n : n ∈ N} ⊆ H0 converging to F as n → ∞. By the continuity of the linear functional 13
Lα , we obtain that Lα (f˜n ), n ∈ N, is a Cauchy sequence. We then define Lα (F ) as the limit of the sequence Lα (f˜n ), n ∈ N, as n → ∞. That is, the linear functionals Lα , α ∈ Λ, can be extended to the Hilbert space H. Notice that H = F, K(α) H, Lα (F ) = lim f˜n , K(α) 0 n→∞
which leads to the continuity of the linear functional Lα on H. Moreover, this together with the density H0 = H yields that if Lα (F ) = 0 holds for all α ∈ Λ then F H = 0. This confirms that the norm of H is compatible with F. Consequently, we conclude that H is an FRKHS with respect to F. We abuse the notation by using K(α) for its equivalent class. Hence, we have that SK = H and for each F ∈ H and each α ∈ Λ, Lα (F ) = F, K(α)H . In addition, by the density, we also have the uniqueness of H. Theorem 3.3 is an analog for FRKHSs of Aronszajn’s theorem [5]. Combining Theorems 3.2 and 3.3, we conclude in the following corollary the equivalence between Property (3) in Theorem 3.2 and the definition of the functional reproducing kernel. Corollary 3.4. Suppose that Λ is a set and Z is a vector space. Let K be an operator from Λ to Z and F := {Lα : α ∈ Λ} be a family of linear functionals on the linear span SK := span {K(α) : α ∈ Λ} ⊆ Z. Then for each n ∈ N and each finite set Λn := {αj : j ∈ Nn } ⊆ Λ, the matrix FΛn := [Lαk (K(αj )) : j, k ∈ Nn ]
(3.11)
is hermitian and positive semi-definite if and only if K is the functional reproducing kernel for an FRKHS H with respect to F. Proof. We suppose that for each n ∈ N and each finite set Λn ⊆ Λ, the matrix FΛn defined by (3.11) is hermitian and positive semi-definite. By Theorem 3.3, there exists a unique FRKHS H with respect to F such that for each α ∈ Λ, K(α) ∈ H and the reproducing property (3.8) holds for all f ∈ H. That is, K is the functional reproducing kernel for H. Conversely, if H is an FRKHS with respect to F and K is its functional reproducing kernel, then we get Property (3) in Theorem 3.2 which states that the matrix FΛn defined by (3.11) is hermitian and positive semi-definite for each n ∈ N and each finite set Λn ⊆ Λ. In the sense of the equivalent characterization of the functional reproducing kernel, we obtain from Theorem 3.3 that associated with a given functional reproducing kernel, there exists a unique FRKHS. Consequently, we have established a bijective correspondence between a functional reproducing kernel and its corresponding FRKHS. We now present an example of functional reproducing kernels and identify its FRKHS. Let φn , n ∈ N, be a sequence of functions in C 1 [0, 1] such that φn (0) = 0, n ∈ N. We also suppose that for each x ∈ [0, 1] the sequences [φn (x) : n ∈ N] and [φn (x) : n ∈ N] belong to l2 (N) and the series φn (x)φn (y), (x, y) ∈ [0, 1] × [0, 1], n∈N
converges uniformly on [0, 1] × [0, 1]. With respect to the sequence φn , n ∈ N, satisfying the above conditions, we introduce an operator K : [0, 1] → C[0, 1] by φn (x)φn , x ∈ [0, 1]. (3.12) K(x) := n∈N
14
For each x ∈ [0, 1], we define a linear functional Lx on the linear span span {K(x) : x ∈ [0, 1]} by x Lx (f ) := f (t)dt, f ∈ span {K(x) : x ∈ [0, 1]}. 0
Set F := {Lx : x ∈ [0, 1]}. We shall show that K is a functional reproducing kernel with respect to F. For each x, y ∈ [0, 1], there holds y Ly (K(x)) = φn (x) φn (t)dt = φn (x)φn (y). 0
n∈N
n∈N
Then we have for each finite sets {xj : j ∈ Nm } ⊆ [0, 1] and {cj : j ∈ Nm } ⊆ C that ⎞⎛ ⎛ ⎞ ⎝ cj φn (xj )⎠ ⎝ ck φn (xk )⎠ = vl2 (N) ≥ 0, cj ck Lxk (K(xj )) = j∈Nm k∈Nm
n∈N
j∈Nm
k∈Nm
with vn := j∈Nm cj φn (xj ) and v := [vn : n ∈ N]. This yields that the matrix [Lαk (K(αj )) : j, k ∈ Nm ] is hermitian and positive semi-definite. Hence, we conclude that K is a functional reproducing kernel with respect to F. To identify the FRKHS of the functional reproducing kernel K defined by (3.12), we denote by W the closed subspace span {φn (x) : x ∈ [0, 1]} of l2 (N) and then introduce a Hilbert space an φn : [an : n ∈ N] ∈ W (3.13) H := n∈N
with the inner product f, gH :=
an bn , for f :=
n∈N
an φn , g :=
n∈N
bn φn .
n∈N
Moreover, the linear functionals in F will be extended to H by Lx (f ) = an φn (x), for f = an φn , x ∈ [0, 1]. n∈N
n∈N
It follows from the Cauchy-Schwarz inequality that for any x ∈ [0, 1] and any f ∈ H |Lx (f )| ≤
n∈N
1/2 |an |2
1/2 |φn (x)|2
n∈N
=
1/2 |φn (x)|2
f H ,
n∈N
which leads to the continuity of the linear functionals Lx , x ∈ [0, 1]. By the definition of the closed subspace W , we have for f = n∈N an φn that if Lx (f ) = 0 holds for all x ∈ [0, 1] then an = 0, n ∈ N, which leads to f = 0. This shows that the norm of H is compatible with F := {Lx : x ∈ [0, 1]}. Consequently, H is an FRKHS with respect to F. By (3.12) and the definition of the inner product, we get for any f ∈ H that Lx (f ) = f, K(x)H , which demonstrates that H is the FRKHS of K. It is known that every reproducing kernel has a feature map representation. This is the reason for which the reproducing kernels can be used to measure the similarity of any two inputs in machine learning. Specifically, K : X × X → C is a reproducing kernel on an input set X if and only if there exist a Hilbert space W and a mapping Φ : X → W such that K(x, y) = Φ(x), Φ(y)W , x, y ∈ X . 15
If span {Φ(x) : x ∈ X } = W , the RKHS of K can be determined by H := {w, Φ(·)W : w ∈ W } with the inner product
w, Φ(·)W , v, Φ(·)W
H
(3.14)
:= w, vW .
We shall present a characterization of a functional reproducing kernel by features. To this end, we suppose that K : Λ → Z is a functional reproducing kernel with respect to a family F = {Lα : α ∈ Λ} of linear functionals on SK ⊆ Z. Associated with K, we introduce a function K : Λ × Λ → C by K(α, β) := Lβ (K(α)), α, β ∈ Λ. (3.15) Note that K is a functional reproducing kernel if and only if K is a reproducing kernel on Λ. This together with the feature map representation of K leads directly the following characterization. Theorem 3.5. The operator K : Λ → Z is a functional reproducing kernel with respect to a family F = {Lα : α ∈ Λ} of linear functionals on SK ⊆ Z if and only if there exists a Hilbert space W and a mapping Φ : Λ → W such that Lβ (K(α)) = Φ(α), Φ(β)W , α, β ∈ Λ.
(3.16)
As in the theory of RKHSs, a Hilbert space W and a map Φ : Λ → W satisfying (3.16) are called, respectively, the feature space and the feature map of the functional reproducing kernel K. RKHSs are special cases of FRKHSs. To close this section, we shall show that there exists an isometric isomorphism between an FRKHS and an RKHS. This will be done by characterizing the RKHS of the reproducing kernel K defined by (3.15). Theorem 3.6. Let H be an FRKHS with respect to a family F = {Lα : α ∈ Λ} of linear functionals on H and K is the functional reproducing kernel for H. If K is the reproducing kernel on Λ defined by (3.15), then K and H are the feature map and feature space, respectively, of K and the RKHS of K is determined by H := {Lα (f ) : α ∈ Λ, f ∈ H}
(3.17)
with the inner product defined for f(α) := Lα (f ) and g(α) := Lα (g), α ∈ Λ, by f˜, g˜H := f, gH .
(3.18)
Proof. Combining the definition of K with the reproducing property (3.8) of K, we have for each α, β ∈ Λ that K(α, β) = K(α), K(β)H . This yields that K has K as its feature map and H as its feature space. As there holds the density span {K(α) : α ∈ Λ} = H, we have for each f˜ in the RKHS of K that there exists f ∈ H such that f˜(α) = f, K(α)H , α ∈ Λ, and there holds f˜, g˜ = f, gH , for g˜(α) = g, K(α)H , α ∈ Λ. This together with the reproducing property (3.8) leads to the representation of f˜ as f˜(α) = Lα (f ), α ∈ Λ. This proves the desired result.
16
Theorem 3.6 reveals an isometric isomorphism between the FRKHS H and the RKHS H, which is established in the next theorem. Theorem 3.7. Suppose that H is an FRKHS with respect to a family F = {Lα : α ∈ Λ} of linear functionals on H and K is the functional reproducing kernel for H. If K is the reproducing kernel on Λ defined by (3.15), then there is an isometric isomorphism between H and the RKHS H of K. Proof. We introduce the operator T : H → H for each f ∈ H by (T f )(α) := Lα (f ), α ∈ Λ. The representation (3.17) ensures that T is surjective. Moreover, it follows from (3.18) that there holds T f H = f H , which yields that T is an isometric isomorphism between H and H. Even though an FRKHS is isometrically isomorphic to a usual RKHS, one cannot reduce trivially the study of FRKHSs to that of RKHSs. Through the isometric isomorphism procedure, the new input space Λ may not have any useful structure. Moreover, we observe from the above discussion that the functions in the resulting RKHS and the corresponding reproducing kernel are obtained by taking the linear functionals on the original ones. The use of general functionals such as the integral functionals will bring difficulties to analyzing the new RKHS and its reproducing kernel.
4
Perfect FRKHSs
In this section, we study a special interesting class of FRKHSs (of functions on X ) with respect to two different families of linear functionals, one of which is the family of the standard pointevaluation functionals. We shall characterize the functional reproducing kernels of this type and study their universality. For a vector space V we say two families of linear functionals on V are equivalent if each linear functional in one family can be represented by a finite linear combination of linear functionals in another family. Throughout this paper we denote by F0 the family of point-evaluation functionals. Definition 4.1. Let a Hilbert space H of functions on X be an FRKHS with respect to a family F := {Lα : α ∈ Λ} of linear functionals on H, which is not equivalent to F0 . We call H a perfect FRKHS with respect to F if it is also an FRKHS with respect to F0 . It follows from Theorem 3.2 that a perfect FRKHS admits two functional reproducing kernels, which reproduce two different families of linear functionals, one being F0 . The following proposition gives the relation between the two functional reproducing kernels. Proposition 4.2. If H is a perfect FRKHS with respect to a family F := {Lα : α ∈ Λ} of linear functionals on H and if K and K are the corresponding functional reproducing kernels with respect to F and F0 , respectively, then K(α)(x) = Lα (K(x, ·)), x ∈ X , α ∈ Λ,
(4.1)
Proof. It follows from the reproducing properties of the two functional reproducing kernels that for each f ∈ H there hold Lα (f ) = f, K(α)H , α ∈ Λ and f (x) = f, K(x, ·)H , x ∈ X . 17
In particular, when we choose f := K(x, ·) for x ∈ X in the first equation, we obtain that Lα (K(x, ·)) = K(x, ·), K(α)H , x ∈ X , α ∈ Λ. Likewise, when we choose f := K(α) for α ∈ Λ in the second equation, we obtain that K(α)(x) = K(α), K(x, ·)H , x ∈ X , α ∈ Λ. The desired result of this proposition follows directly from the above two equations. Motivated by equation (4.1), we shall give an approach of constructing functional reproducing kernels. Proposition 4.3. If K is a reproducing kernel on X and F = {Lα , α ∈ Λ} is a family of continuous linear functionals on the RKHS H of K, then the operator K : Λ → H, given by (4.1), is a functional reproducing kernel for a perfect FRKHS with respect to F. Proof. Since for each α ∈ Λ the functional Lα is continuous on H, there exists gα ∈ H such that there holds (4.2) Lα (f ) = f, gα H , for any f ∈ H. To obtain an FRKHS with respect to F, we choose H := span{gα : α ∈ Λ} as the desired Hilbert space. Clearly, as a closed subspace of RKHS H, H is also an RKHS with determined by the reproducing kernel K K(x, ·) := PH (K(x, ·)), where PH denotes the orthogonal projection from H onto H. That is, space H is a perfect FRKHS with respect to F. By Proposition 4.2, we represent the functional reproducing kernel K as K(α)(x) = Lα (K(x, ·)), x ∈ X , α ∈ Λ. This together with equation (4.2) leads to K(α)(x) = gα , K(x, ·)H = gα , K(x, ·)H = Lα (K(x, ·)), x ∈ X , α ∈ Λ, which completes the proof. The perfect FRKHS obtained from the functional reproducing kernel K described in Proposition 4.3 is the closed linear span of the Riesz representers of F. Hence, it is, of course, a closed subspace of the RKHS H. In fact, space H itself is also a perfect FRKHS with respect to the family F ∪ F0 . We now turn to representing the functional reproducing kernel K for a perfect FRKHS H by the feature map. For this purpose, we let W and Φ : X → W be the feature space and the feature map of K, respectively. We assume that there holds the density condition span {Φ(x) : x ∈ X } = W.
(4.3)
Associated with the features Φ and W of K, we introduce a map Ψ from Λ to W as follows. Note α defined on W for each u ∈ W by that for each α ∈ Λ, the linear functional L α (u) := Lα (u, Φ(·)W ) L 18
is continuous since for each u ∈ W , α (u)| ≤ Lα u, Φ(·)W H = Lα uW . |L By the Riesz representation theorem we define Ψ by letting Ψ(α) be the unique element in W such that (4.4) Lα (u, Φ(·)W ) = u, Ψ(α)W , u ∈ W. We can represent K in terms of the maps Φ and Ψ. Proposition 4.4. Suppose that H is a perfect FRKHS with respect to a family F := {Lα : α ∈ Λ} of linear functionals on H. Let K and K denote the functional reproducing kernels with respect to F and F0 , respectively. If W is the feature space of K, Φ : X → W is its feature map with the density condition (4.3) and Ψ : Λ → W is defined by (4.4), then K(α) = Ψ(α), Φ(·)W , α ∈ Λ. Moreover, W and Ψ are, respectively, the feature space and the feature map of K, satisfying the density condition span {Ψ(α) : α ∈ Λ} = W. (4.5) Proof. Recalling the feature map representation K(x, y) = Φ(x), Φ(y)W , x, y ∈ X , we observe from (4.1) and the above feature map representation that K(α)(x) = Lα (K(x, ·)) = Lα (Φ(x), Φ(·)W ), x ∈ X , α ∈ Λ. By the definition of Ψ, we get that K(α)(x) = Ψ(α), Φ(x)W , x ∈ X , α ∈ Λ. This is the first part of this proposition. To prove the second part, for any α, β ∈ Λ, we find that Lβ (K(α)) = Lβ (Ψ(α), Φ(·)W ) = Ψ(α), Ψ(β)W . This yields that W is the feature space of K and Ψ is the feature map of K. To verify the density condition (4.5), it suffices to show that if u ∈ W satisfies u, Ψ(α)W = 0 for all α ∈ Λ, then u = 0. It follows from (4.4) that there hold Lα (u, Φ(·)W ) = 0 for all α ∈ Λ. Since the norm of H is compatible with F, we conclude that u, Φ(·)W H = 0, which is equivalent to uW = 0. Motivated by the representation of a functional reproducing kernel for a perfect FRKHS in Proposition 4.4, we now turn to characterizing the functional reproducing kernels of this type by its features. Theorem 4.5. An operator K : Λ → H is a functional reproducing kernel for a perfect FRKHS H with respect to a family F := {Lα : α ∈ Λ} of linear functionals on H if and only if there exist a Hilbert space W and two mappings Φ : X → W with the density condition (4.3) and Ψ : Λ → W with the density condition (4.5) such that K(α) = Ψ(α), Φ(·)W , α ∈ Λ.
(4.6)
Moreover, the perfect FRKHS of K has the form H := {u, Φ(·)W : u ∈ W }
(4.7)
u, Φ(·)W , v, Φ(·)W H := u, vW .
(4.8)
with the inner product
19
Proof. We first establish the characterization of a functional reproducing kernel for a perfect FRKHS. Suppose that K : Λ → H is a functional reproducing kernel for a perfect FRKHS H with respect to a family F := {Lα : α ∈ Λ} of linear functionals on H. According to the definition of the perfect FRKHS, H is a reproducing kernel Hilbert space with its reproducing kernel K. By theory of the classical reproducing kernel Hilbert space, we can choose a Hilbert space W as the feature space of the reproducing kernel K and a mapping Φ : X → W which satisfying the density condition (4.3) as its feature map. We then define the map Ψ according to (4.4). Hence, by Proposition 4.4, we get the representation of K as in (4.6). Conversely, we suppose that W is a Hilbert space, Φ : X → W satisfies the density condition (4.3) and Ψ : Λ → W satisfies the density condition (4.5), and K has the form (4.6). Associated with W and Φ we introduce a space in the form (4.7) of functions on X and define a bilinear mapping on H by equation (4.8). By the theory of classical RKHSs, we conclude that H is an RKHS. We shall show that H is a perfect FRKHS with respect to some family of linear functionals and the operator K is exactly the functional reproducing kernel for H. We now introduce a family of linear functionals on H. For each α ∈ Λ we define functional Lα on H by Lα (u, Φ(·)W ) := u, Ψ(α)W , u ∈ W. It is clear that Lα is linear. Moreover, we have that |Lα (u, Φ(·)W )| ≤ Ψ(α)W uW = Ψ(α)W u, Φ(·)W H , which ensures that Lα is continuous on H. We define a family of continuous linear functionals by F := {Lα : α ∈ Λ}. We next verify that the norm of H is compatible with F. Suppose that f := u, Φ(·)W ∈ H satisfies Lα (f ) = 0 for all α ∈ Λ. By the definition of functionals Lα , we have for all α ∈ Λ that u, Ψ(α)W = 0. This together with the density condition (4.5) leads to uW = 0 and then f H = 0. That is, the norm of H is compatible with F. We then conclude that H is a perfect FRKHS with respect to F. It remains to prove that the operator K with the form (4.6) is the functional reproducing kernel for H. It is clear that K(α) ∈ H, α ∈ Λ. For each α ∈ Λ and each f := u, Φ(·)W ∈ H, there holds f, K(α)H = u, Φ(·)W , Ψ(α), Φ(·)W H = u, Ψ(α)W = Lα (f ). This ensures that K is the functional reproducing kernel for H. We remark on the representation (4.6) of a functional reproducing kernel for a perfect FRKHS. For a general functional reproducing kernel, the feature map can only give the representation (3.16) for the values of the functionals Lβ , β ∈ Λ, at the kernel sections K(α), α ∈ Λ. Due to the special structure of the perfect FRKHS, we can give a more precise representation (4.6) for its functional reproducing kernel. We next present a specific example to illustrate the representation (4.6) of the functional reproducing kernel for a perfect FRKHS. To this end, we recall that the RKHS BΔ (Rd ) is also an FRKHS with respect to the family F of the integral functionals defined by (3.5). Hence, it is a perfect FRKHS. It was pointed out in section 3 that the functional reproducing kernel for BΔ (Rd ) has the form (4.9) K(x) = PBΔ (u(· − x)), x ∈ Rd . To give the feature representation (4.6) of kernel K, we first describe the choice of the space W as the feature space of the kernel K defined by (3.9) and the mapping Φ as its feature map. Let W := L2 (IΔ ). We define Φ : Rd → W for each x ∈ Rd by Φ(x) :=
1 ei(x,·) . (2π)d/2 20
Associated with W and Φ, the kernel K having the form (3.9) may be rewritten as K(x, y) = Φ(x), Φ(y)W , x, y ∈ Rd . This shows that W and Φ are, respectively, the feature space and the feature map of K. We next present the other map Ψ needed for the construction of K in terms of the function u appearing in (3.5). Specifically, for each x ∈ Rd , we set ˇei(x,·) . Ψ(x) := (2π)d/2 u By equation (4.9), we have for each x, y ∈ Rd that uei(x,·) , ei(y,·) W . K(x)(y) = (ˆ ue−i(x,·) χIΔ )∨ (y) = ˇ This together with the definition of Φ(y) and Ψ(x) leads to the formula K(x) = Ψ(x), Φ(·)W , x ∈ Rd for the kernel K. This is in the form of (4.6). Return to the general discussion of the perfect FRKHS. Associated with a functional reproducing kernel in the form (4.6) for a perfect FRKHS, there are two reproducing kernels
and
K(x, y) = Φ(x), Φ(y)W , x, y ∈ X
(4.10)
K(α, β) = Ψ(α), Ψ(β)W , α, β ∈ Λ.
(4.11)
As a direct consequence of Theorems 3.7 and 4.5, there exists an isometric isomorphism between We present this result below. the RKHSs of K and K. Corollary 4.6. Suppose that W is a Hilbert space and two mappings Φ : X → W and Ψ : Λ → W satisfy the density conditions (4.3) and (4.5), respectively. If the mapping K has the form (4.6) are, respectively, defined by (4.10) and (4.11), then K and K are and two mappings K and K the functional reproducing kernel and the reproducing kernel, respectively, for a perfect FRKHS is a reproducing kernel for an RKHS H. Moreover, there is an isometric isomorphism H and K between H and H. Proof. It is known from Theorem 4.5 that K is a functional reproducing kernel for a perfect FRKHS H with respect to F = {Lα : α ∈ Λ} defined by Lα (u, Φ(·)W ) := u, Ψ(α)W , u ∈ W. Moreover, K is the corresponding reproducing kernel for H with respect to the family of the leads directly to the fact that point-evaluation functionals. The feature map representation of K K is a reproducing kernel on Λ for the RKHS := {u, Ψ(·)W : u ∈ W }. H and Lβ , we get for all α, β ∈ Λ that By definitions of K, K K(α, β) = Ψ(α), Ψ(β)W = Lβ (Ψ(α), Φ(·)W ) = Lβ (K(α)). We then conclude from Theorem 3.7 that there is an isometric isomorphism between H and H.
21
To close this section, we consider universality of a functional reproducing kernel for a perfect FRKHS. Universality of classical reproducing kernels has attracted much attention in theory of machine learning [11, 41, 46, 54]. This important property ensures that a continuous target function can be uniformly approximated on a compact subset of the input space by the linear span of kernel sections. This is crucial for the consistence of the kernel-based learning algorithms, which generally use the representer theorem to learn a target function in the linear span of kernel sections. For learning a function from its functional values, we also have similar representer theorems, which will be presented in section 7. Below, we discuss universality of a functional reproducing kernel for a perfect FRKHS. Let a Hilbert space H of functions on a metric space X be a perfect FRKHS with respect to a family F of linear functionals on H. Suppose that the functional reproducing kernel K for H satisfies that for each α ∈ Λ, K(α) is continuous on X . We denote by C(Z) the Banach space of continuous functions on compact subset Z of X with the norm defined by f C(Z) := sup{|f (x)| : x ∈ Z}. We say that K is universal if for each compact subset Z of X , span {K(α) : α ∈ Λ} = C(Z), where for each α ∈ Λ, K(α) is taken as a function on Z and the closure is taken in the maximum norm of C(Z). To characterize universality of the functional reproducing kernel having the form (4.6), we first recall a convergence property of functions in an RKHS. That is, if a sequence fn converges to f in an RKHS, then fn , as a sequence of functions on X , converges pointwisely to f . As a direct consequence, we have the following result concerning the density in H. Lemma 4.7. Let H be an RKHS of continuous functions on a metric space X and M a subset of H. If M is dense in H, then for any compact subset Z of X there holds M = H, where the closure is taken in the maximum norm of C(Z). Proof. It suffices to prove that for any compact subset Z there holds H ⊆ M, where the closure is taken in the maximum norm of C(Z). Let f be any element in H. By the density of M in Hilbert space H, there exists a sequence fn in M converging to f . Due to the convergence property in RKHSs, we have for any compact subset Z of X that fn converges to f in C(Z). According to the arbitrariness of f in H, we conclude that H ⊆ M. We present below a characterization of universality of the functional reproducing kernel of the form (4.6). Theorem 4.8. Let X be a metric space, Λ a set and W a Hilbert space. Suppose that Φ : X → W is continuous and satisfies the density condition (4.3) and Ψ : Λ → W satisfies the density condition (4.5). Then the functional reproducing kernel K defined by (4.6) is universal if and only if there holds for any compact subset Z of X , span {u, Φ(·)W : u ∈ W } = C(Z).
(4.12)
Proof. We prove this theorem by employing Lemma 4.7. To this end, we identify the RKHS H and the subset M that appear in the lemma. It follows from Theorem 4.5 that the perfect FRKHS of the functional reproducing kernel K defined by (4.6) has the form H := {u, Φ(·)W : u ∈ W } with the inner product u, Φ(·)W , v, Φ(·)W H := u, vW . 22
Since Φ is continuous, the functions in H are all continuous. Define the linear span M by M := span {Ψ(α), Φ(·)W : α ∈ Λ}. The density condition (4.5) ensures that M is dense in H. Thus, according to Lemma 4.7, we get for any compact subset Z of X that M = H, where the closure is taken in the maximum norm of C(Z). By representation (4.6), we get that M = span {K(α) : α ∈ Λ}. Hence, the fact that M = H leads to span {K(α) : α ∈ Λ} = span {u, Φ(·)W : u ∈ W }. By the definition of universal kernels, we conclude that K is universal if and only if (4.12) holds for any compact subset Z of X . In addition to having a functional reproducing kernel K, a perfect FRKHS also has a classical reproducing kernel on X with the form K(x, y) = Φ(x), Φ(y)W , x, y ∈ X . The characterization of universality of classical reproducing kernels in [11, 41] shows that K is universal if and only if (4.12) holds for any compact subset Z of X . Hence, Theorem 4.8 reveals that the two kernels of a perfect FRKHS have the same universality.
5
Perfect FRKHSs with respect to integral functionals
Local average values of a function are often used as observed information in data analysis. For this reason, we investigate in this section the perfect FRKHS with respect to a family of integral functionals. To this end, we first recall the Bochner integral of a vector-valued function [19]. Let X be a measure space with a positive measure μ and Y a Hilbert space. To define the integral of a function from X to Y with respect to μ, we first introduce the integral of a simple function with the form f := j∈Nn χEj ξj , where ξj , j ∈ Nn , are distinct elements of Y, Ej , j ∈ Nn , are pairwise disjoint measurable sets and χEj denotes the characteristic function of Ej . If μ(Ej ) is finite whenever ξj = 0, the integral of f on a measurable set E is defined by f (x)dμ(x) := μ(Ej ∩ E)ξj . E
j∈Nn
A function f : X → Y is called measurable if there exists a sequence of simple functions fn : X → Y such that lim fn (x) − f (x)Y = 0, almost everywhere with respect to μ on X .
n→∞
A measurable function f : X → Y is called Bochner integrable if there exists a sequence of simple functions fn : X → Y such that lim fn (x) − f (x)Y dμ(x) = 0. n→∞ X
One can get by the above equation for any measurable set E that the sequence E fn (x)dμ(x) is a Cauchy sequence in Y. This together with the completeness of Y allows us to define the integral of f on E by f (x)dμ(x) := lim E
n→∞ E
23
fn (x)dμ(x).
It is known that a measurable function f : X → Y is Bochner integrable if and only if there holds f (x)Y dμ(x) < +∞. X
ξ, f (x)Y dμ(x) ≤ ξY f (x)Y dμ(x), for any ξ ∈ Y, X X Consequently, we can yielding that the functional X ·, f (x)Y dμ(x) is continuous on Y. comprehend the integral X f (x)dμ(x) as an element of Y satisfying f (x)dμ(x) = ξ, f (x)Y dμ(x), for any ξ ∈ Y. ξ, Notice that
X
X
Y
We introduce the integral functionals to be considered in this section. Let (X , M, μ) be a measure space. Suppose that Λ is an index set and {uα : α ∈ Λ} is a family of measurable functions on X . Associated with these functions, we define the family F := {Lα : α ∈ Λ} of integral functionals for a function f on X by Lα (f ) := f (x)uα (x)dμ(x), α ∈ Λ. (5.1) X
We now describe a construction of perfect FRKHSs with respect to F defined by (5.1) following Theorem 4.5. To this end, we suppose that W is a Hilbert space and a mapping Φ : X → W satisfies the density condition (4.3). The perfect FRKHSs will be constructed through a subspace of W . For each α ∈ Λ, we introduce a function from X to W by letting ψα (x) := uα (x)Φ(x), x ∈ X .
(5.2)
If the functions ψα , α ∈ Λ, are all Bochner integrable, then we define a closed subspace of W by ψα (x)dμ(x) : α ∈ Λ . (5.3) W := span X
We let PW denote the orthogonal projection from W to W . We then define the mapping Φ by Φ(x) := PW Φ(x), x ∈ X . : X → W in the The next lemma establishes the density condition involving the mapping Φ Hilbert space W . Lemma 5.1. Let W be a Hilbert space and Φ : X → W a mapping satisfying the density condition (4.3). If a family of functions uα , α ∈ Λ satisfies that ψα , α ∈ Λ, defined by (5.2), are all Bochner and Φ are defined as above, then there holds the density condition integrable and W . span {Φ(x) : x ∈ X} = W
(5.4)
define by (5.3) is a Proof. Since by hypothesis ψα , α ∈ Λ, are all Bochner integrable, space W satisfy closed subspace of W . We now show that equation (5.4) holds. To this end, we let u ∈ W that Φ(x), uW = 0 for any x ∈ X . By the definition of Φ, we have for each u ∈ W that Φ(x), uW = Φ(x), uW , x ∈ X . It follows from the hypothesis on u and the equation above that Φ(x), uW = 0 for any x ∈ X . This together with the density condition (4.3) of Φ leads to u = 0. Thus, we have proved the density condition (5.4). 24
We describe in the next theorem the construction of the perfect FRKHS with respect to F. Proposition 5.2. Let W be a Hilbert space and Φ : X → W a mapping satisfying the density condition (4.3). If a collection of functions uα : X → C, α ∈ Λ satisfies |uα (x)|Φ(x)W dμ(x) < ∞, for all α ∈ Λ, (5.5) X
are defined by (5.2) and (5.3), respectively, then the space ψα , α ∈ Λ, and W
:= u, Φ(·)W : u ∈ W H
(5.6)
with the inner product u, Φ(·)W , v, Φ(·)W H := u, vW , is a perfect FRKHS with respect to F := {Lα : α ∈ Λ} defined by (5.1) and its functional reproducing kernel K has the form ψα (x)dμ(x), Φ(·) , α ∈ Λ. (5.7) K(α) = X
W
Proof. We prove this theorem by employing Theorem 4.5. Specifically, we verify the hypothesis which satisfies the density of Theorem 4.5. For this purpose, we introduce a mapping Ψ : Λ → W . It follows from (5.5) that ψα , α ∈ Λ, is a sequence condition (4.5) with W being replaced by W by of Bochner integrable functions. This allows us to define the operator Ψ : Λ → W ψα (x)dμ(x), α ∈ Λ. Ψ(α) := X
, Ψ satisfies the density condition of W . By Lemma 5.1, the hypotheses of By definition of W Theorem 4.5 are satisfied. Consequently, by identifying for each α ∈ Λ, Lα (u, Φ(·) ) = u, Ψ(α)W , u ∈ W , W
:= u, Φ(·) with the inner product we conclude that the space H :u∈W W u, Φ(·) := u, vW , v, Φ(·)W H W is a perfect FRKHS with respect to F := {Lα : α ∈ Λ} and that the corresponding functional reproducing kernel has the form K(α) = Ψ(α), Φ(·) , α ∈ Λ. W has the form (5.6) and the functional reproducing We next show that the perfect FRKHS H kernel K has the form (5.7). Noting that u, Φ(x) = u, Φ(x)W for all u ∈ W , x ∈ X , W as in (5.6) and K as K(α) = Ψ(α), Φ(·)W , α ∈ Λ. This together with the we can represent H definition of Ψ leads to the form (5.7). It suffices to rewrite Lα , α ∈ Λ, in the form (5.1). Note that for each α ∈ Λ and each f = u, Φ(·)W ∈ H Lα (f ) = u, Ψ(α)W . Hence, by definitions of Ψ and ψα we have that u, uα (x)Φ(x)W dμ(x) = f (x)uα (x)dμ(x), Lα (f ) = X
X
which yields that Lα has the form (5.1) and thus, completes the proof. 25
Associated with the Hilbert space W and the mapping Φ : X → W in Proposition 5.2, the space H := {u, Φ(·)W : u ∈ W }, with the inner product u, Φ(·)W , v, Φ(·)W H := u, vW , given by (5.6) is a closed subspace of H. From the is an RKHS on X . The perfect FRKHS H representation of H, we obtain the following result. Corollary 5.3. Let W be a Hilbert space and Φ : X → W a mapping satisfying the density is a condition (4.3). If a collection of functions uα : X → C, α ∈ Λ, satisfies (5.5) and W subspace of W defined by (5.3), then the RKHS H := {u, Φ(·)W : u ∈ W } with the inner product u, Φ(·)W , v, Φ(·)W H := u, vW , is a perfect FRKHS with respect to F := {Lα : α ∈ Λ} = W. defined by (5.1) if and only if W defined by (5.6), of H is a Proof. It follows from Proposition 5.2 that the closed subspace H, = W, then we have H = H, which shows that H is just perfect FRKHS with respect to F. If W the perfect FRKHS. Conversely, we suppose that H is a perfect FRKHS with respect to F. For each u ∈ W , we set hα (u) :=
u, X
, α ∈ Λ.
ψα (x)dμ(x) W
= W, it suffices to verify that if u ∈ W satisfies hα (u) = 0 for any α ∈ Λ, then u = 0. To prove W By the definition of ψα , we have for each α ∈ Λ that u, uα (x)Φ(x)W dμ(x) = uα (x)u, Φ(x)W dμ(x). hα (u) = X
X
This together with the definition of Lα , α ∈ Λ, leads to hα (u) = Lα (u, Φ(·)W ). Since the norm of H is compatible with F, we have that hα (u) = 0 holds for any α ∈ Λ if and only if u, Φ(·)W H = 0, which is equivalent to u = 0. We now turn to presenting two specific examples of perfect FRKHSs with respect to the family of integral functionals defined by (5.1). These perfect FRKHSs come from widely used RKHSs. The first perfect FRKHS is constructed based on the translation invariant RKHSs. A reproducing kernel K : Rd × Rd → C is said to be translation invariant if for all a ∈ Rd , K(x−a, y −a) = K(x, y), for x, y ∈ Rd . In the scalar-valued case, a well-known characterization of translation invariant kernels was established in [9], which states that every continuous translation invariant kernel on Rd must be the Fourier transform of a finite nonnegative Borel measure on Rd , and vice versa. The characterization was generalized to the operator-valued case in [23]. We consider a special translation invariant reproducing kernel, for which the finite nonnegative Borel measure on Rd is absolutely continuous with respect to the Lebesgue measure. It follows from the Radon-Nikodym theorem that there exists a nonnegative Lebesgue integrable function ϕ such that K(x, y) = ei(x−y,t) ϕ(t)dt, x, y ∈ Rd . (5.8) Rd
We shall use Proposition 5.2 to construct a perfect FRKHS, which is a closed subspace of the translation invariant RKHS of K. To this end, we first present the feature space and the feature map of K. Associated with the nonnegative Lebesgue integrable function ϕ on Rd , we denote by L2ϕ (Rd ) the space of Lebesgue measurable functions f on Rd satisfying that ϕ1/2 f belongs to L2 (Rd ). The space L2ϕ (Rd ) is a Hilbert space endowed with the inner product f, gL2ϕ (Rd ) := f (t)g(t)ϕ(t)dt, f, g ∈ L2ϕ (Rd ). Rd
26
We choose W := L2ϕ (Rd ) and Φ(x) := ei(x,·) , x ∈ Rd . We need to verify that Φ(x), x ∈ Rd , belong to W and satisfy the density condition (4.3). Notice that for each x ∈ Rd 2 i(x,t) 2 Φ(x)W = |e | ϕ(t)dt = ϕ(t)dt. Rd
Rd
This together with the integrability of ϕ yields that Φ(x) belongs to W with the norm satisfying 1/2
Φ(x)W = ϕL1 (Rd ) .
(5.9)
Moreover, it is clear that the density condition (4.3) holds. To describe the integral functionals of the form (5.1) in this case, we introduce a family uα , α ∈ Λ, of Lebesgue integrable functions on Rd . Associated with ϕ and uα , α ∈ Λ, we define the inner product space uα : α ∈ Λ} , (5.10) H := (ϕu)∧ : u ∈ span {ˇ with the inner product
f, gH :=
u(t)v(t)ϕ(t)dt
(5.11)
Rd
for f = (ϕu)∧ and g = (ϕv)∧ . The closure in (5.10) is taken in W . As a consequence of Proposition 5.2, we get in the next theorem a construction of perfect FRKHSs with respect to a family of integral functionals defined by (5.1). Corollary 5.4. If ϕ is a nonnegative Lebesgue integrable function on Rd and uα , α ∈ Λ, is a family of Lebesgue integrable functions, then the space H defined by (5.10) with the inner product (5.11) is a perfect FRKHS with respect to F := {Lα : α ∈ Λ} defined by (5.1) and the functional uα ) ∧ . reproducing kernel is given by K(α) := (2π)d (ϕˇ Proof. In order to use the construction in Proposition 5.2, we first verify condition (5.5). According to equation (5.9) and the integrability of uα , α ∈ Λ, we have for each α ∈ Λ that 1/2 |uα (x)|Φ(x)W dx = ϕL1 (Rd ,Mn ) uα L1 (Rd ) < ∞. (5.12) Rd
of the form (5.6) and the corresponding Hence, by Proposition 5.2 we obtain a perfect FRKHS H functional reproducing kernel K of the form (5.7). and K by making use of the functions ϕ and uα , α ∈ Λ. Note We next turn to representing H that for each α ∈ Λ uα (x)Φ(x)dx = (2π)d u ˇα . Rd
in (5.6) of W can be represented as W = span {ˇ uα : α ∈ Λ} . There Then the closed subspace W d holds for each x ∈ R and each u ∈ W that u, Φ(x)W = e−i(x,t) u(t)ϕ(t)dt = (ϕu)∧ (x). Rd
= H and the and u, Φ(x)W into (5.6), we have that H Substituting the representations of W inner product of H can be represented as in (5.11). Moreover, the functional reproducing kernel K can be represented as uα (x)Φ(x)dx, Φ(·) = (2π)d (ϕˇ uα )∧ , α ∈ Λ. K(α) := Rd
W
27
We next present another perfect FRKHS, which is a closed subspace of a shift-invariant space. To this end, we first recall the shift-invariant spaces, which have been extensively studied in wavelet theory [17, 26], approximation theory [18, 31] and sampling [2, 3, 28]. Given φ ∈ L2 (Rd ) that satisfies ˆ + 2πj)|2 ≤ M, for almost every ξ, |φ(ξ (5.13) m≤ j∈Zd
for some m, M > 0, we construct the shift-invariant space by ⎧ ⎫ ⎨ ⎬ ck φ(· − k) : {ck : k ∈ Zd } ∈ l2 (Zd ) . V 2 (φ) := ⎩ d ⎭ k∈Z
Here, l2 (Zd ) is the Hilbert space of square-summable sequences on Zd . The function φ is called the generator of the shift-invariant space V 2 (φ). Clearly, V 2 (φ) is a closed subspace of L2 (Rd ) and condition (5.13) ensures that the sequence φ(· − k), k ∈ Zd , constitutes a Riesz basis of V 2 (φ). To guarantee that V 2 (φ) is an RKHS, additional conditions need to be imposed on the generator φ. Following [2], we first introduce a weight function ω on Rd , which is assumed to be continuous, symmetric and positive, and to satisfy the condition 0 < ω(x + y) ≤ ω(x)ω(y), for any x, y ∈ Rd . Moreover, we assume that ω satisfies the growth condition ∞ log ω(nk) n=1
n2
< ∞, for all k ∈ Zd .
We also recall the Wiener amalgam spaces W (Lpw ), 1 ≤ p ≤ ∞. For 1 ≤ p < ∞, a measurable function f is said to belong to W (Lpw ), if it satisfies ess sup{|f (x + k)|p ω(k)p : x ∈ [0, 1]d } < ∞. f pW (Lp ) := w
k∈Zd
For p = ∞, a measurable function f is said to belong to W (L∞ w ) if it satisfies := sup {ess sup{|f (x + k)|ω(k) : x ∈ [0, 1]d }} < ∞. f W (L∞ w ) k∈Zd
We denote by W0 (Lpw ) the subspace of continuous functions in W (Lpw ). It has been proved in [2] that if φ ∈ W0 (L1w ), the space V 2 (φ) is an RKHS with the reproducing kernel − k), x, y ∈ Rd , φ(x − k)φ(y (5.14) K(x, y) = k∈Zd
− k), k ∈ Zd , is the dual Riesz basis where φ ∈ V 2 (φ) satisfies the condition that the sequence φ(· d of φ(· − k), k ∈ Z . To see the reproducing property, for each cj φ(· − j) ∈ V 2 (φ), f := j∈Zd
we note that f, K(x, ·)L2 (Rd ) =
j∈Zd
− k) 2 d . cj φ(x − k)φ(· − j), φ(· L (R )
k∈Zd
Due to the biorthogonality relation − k) 2 d = δj,k , j, k ∈ Zd , φ(· − j), φ(· L (R ) 28
the above equation reduces to f, K(x, ·)L2 (Rd ) =
cj φ(x − j) = f (x).
j∈Zd
In order to use the construction described in Proposition 5.2, we choose a natural feature map representation for the reproducing kernel K. That is, W := V 2 (φ) and Φ(x) := K(x, ·), x ∈ Rd . Moreover, we introduce a family of Lebesgue measurable functions uα , α ∈ Λ, on Rd satisfying for each α ∈ Λ, ⎛ ⎞1/2 |uα (x)| ⎝ |φ(x − k)|2 ⎠ dx < ∞. (5.15) Rd
k∈Zd
Associated with uα , α ∈ Λ, we define the closed subspace of V 2 (φ) as ⎧ ⎫ ⎨ ⎬ ˜ − k) : α ∈ Λ . uα (t)φ(t − k)dt φ(· H := span ⎩ d ⎭ Rd
(5.16)
k∈Z
We next construct a perfect FRKHS with respect to a family of integral functionals defined by (5.1). Corollary 5.5. If φ ∈ W0 (L1w ) satisfies (5.13) and a family of measurable functions uα , α ∈ Λ, on Rd satisfies (5.15), then the closed subspace H defined by (5.16) of V 2 (φ) is a perfect FRKHS with respect to the family F := {Lα : α ∈ Λ} of integral functionals defined by (5.1) and the corresponding functional reproducing kernel is given by ˜ − k), α ∈ Λ, x ∈ Rd . K(α)(x) := uα (t)φ(t − k)dt φ(x (5.17) Rd
k∈Zd
Proof. To apply Proposition 5.2, we need to verify for each α ∈ Λ that uα Φ, as a W -valued function, is Bochner integrable. By the reproducing property, we obtain for any x ∈ Rd that − k), Φ(x)2 = K(x, x) = φ(x − k)φ(x W
k∈Zd
which leads to that for each α ∈ Λ,
Rd
|uα (x)|Φ(x)W dx =
Rd
⎛ |uα (x)| ⎝
⎞1/2 − k)⎠ φ(x − k)φ(x
dx.
(5.18)
k∈Zd
For each x ∈ Rd , we introduce two sequences a := {ak : k ∈ Zd } with ak := φ(x + k) and ˜ + k). Note that the dual generator φ˜ ∈ V 2 (φ) has the ˜k := φ(x a ˜ := {˜ ak : k ∈Zd } with a expansion φ˜ = k∈Zd bk φ(· − k), where the coefficients bk , k ∈ Zd , are the Fourier coefficients of the 2π-periodic function ⎛ b(ξ) := ⎝
⎞−1 ˆ + 2πj)|2 ⎠ |φ(ξ
j∈Zd
29
, ξ ∈ [−π, π]d .
Set b := {bk : k ∈ Zd }. Notice that for each k ∈ Zd a ˜k = bl φ(x + k − l) = bl ak−l = (a ∗ b)k . l∈Zd
l∈Zd
Since φ ∈ W0 (L1w ), we get that a ∈ l2 (Zd ). By Lemmas 2.8 and 2.11 in [2], we also have that ⎧ ⎫ ⎨ ⎬ |ck | < +∞ . b ∈ l1 (Zd ) := {ck : k ∈ Zd } : ⎩ ⎭ d k∈Z
Hence, the above equation leads to a ˜ ∈ l2 (Zd ) and ˜ al2 (Zd ) ≤ al2 (Zd ) bl1 (Zd ) . Then by (5.18), we obtain that
1/2
Rd
⎛
|uα (x)|Φ(x)W dx ≤ bl1 (Zd )
Rd
|uα (x)| ⎝
⎞1/2 |φ(x − k)|2 ⎠
dx,
k∈Zd
which together with (5.15) shows that the function uα Φ is Bochner integrable. By Proposition 5.2, we get the perfect FRKHS as a closed subspace of V 2 (φ) := span uα (x)Φ(x)dx : α ∈ Λ H Rd
with the functional reproducing kernel K(α) :=
Rd
uα (x)Φ(x)dx.
= H and K represented as in (5.17). According to (5.14) we get H To close this section, we give a sufficient and necessary condition which ensures that the FRKHS H appearing in (5.16) is identical to V 2 (φ), that is, ⎧ ⎫ ⎨ ⎬ ˜ − k) : α ∈ Λ = V 2 (φ). span uα (t)φ(t − k)dt φ(· (5.19) ⎩ d ⎭ Rd k∈Z
The condition stated in the following Lemma 5.6 will provide a useful tool to verify that a shiftinvariant space is an FRKHS. It allows us to consider sampling theorems and regularized learning schemes from integral functional data in shift-invariant spaces. For this purpose, we introduce a sequence of functions by letting for each α ∈ Λ ˆ + 2lπ), ξ ∈ [−π, π]d . gα (ξ) := u &α (ξ + 2lπ)φ(ξ l∈Zd
Lemma 5.6. If φ ∈ W0 (L1w ) satisfies (5.13) and for each α ∈ Λ, uα ∈ L2 (Rd ) satisfies (5.15), then the density condition (5.19) holds if and only if span {gα : α ∈ Λ} = L2 ([−π, π]d ).
30
(5.20)
Proof. By the definition of gα , α ∈ Λ, we have for each α ∈ Λ and k ∈ Zd that i(k,ξ) ˆ uα (x)φ(x − k)dx = u &α (ξ)φ(ξ)e dξ Rd Rd ˆ + 2lπ)ei(k,ξ) dξ = u &α (ξ + 2lπ)φ(ξ l∈Zd
[−π,π]d
= [−π,π]d
gα (ξ)ei(k,ξ) dξ = ck (gα ),
where we denote by ck (h) the k-th Fourier coefficients of h ∈ L2 ([−π, π]d ). Combining the above observation with the representation of the functions in V 2 (φ), we note that the density condition (5.19) holds if and only if span {{ck (gα ) : k ∈ Zd } : α ∈ Λ} = l2 (Zd ), which is equivalent to (5.20).
6
Sampling and reconstruction in FRKHSs
We describe in this section a reconstruction of an element in an FRKHS from its functional values. This study is motivated by the average sampling, which aims at recovering a function from its local average values represented as integral functional values. As pointed out in the classical sampling theory in RKHSs, the local average functionals used in the reconstruction should be continuous to ensure the stability of the sampling process. In practice, only finite local average functionals are used to establish a reconstruction of a function. However, to obtain a more precise approximation of the target function, more local average functionals are demanded. Hence, the average sampling should be considered in a space which ensures the continuity of the local average functionals with respect to all the points in the domain. Such a space is exactly an FRKHS with respect to the family of all the local average functionals. We establish the sampling theorem in the context of general FRKHSs and present explicit examples concerning the Paley-Wiener spaces. The average sampling has been widely studied in classical RKHSs, such as the Paley-Wiener spaces and the shift-invariant spaces [1, 55, 56] without the availability of FRKHSs. We point out that these spaces are FRKHSs with respect to the local average functionals and the existing sampling theorems are special cases of the main results to be presented in this section. The availability of a frame or a Riesz basis of a Hilbert space is convenient for complete reconstruction of a function. We review the concept of frames and Riesz bases of a Hilbert space according to [17, 20]. Let H be a separable Hilbert space with the inner product ·, ·H and let J be a countable index set. A sequence {fj : j ∈ J} ⊂ H is called a frame of H if there exist constants A, B with 0 < A ≤ B < +∞ such that ⎛ ⎞1/2 |f, fj H |2 ⎠ ≤ Bf H , for all f ∈ H. (6.1) Af H ≤ ⎝ j∈J
For a frame {fj : j ∈ J} of H, there exists a dual frame {gj : j ∈ J} ⊂ H, for which f, gj H fj , for each f ∈ H. f= j∈J
A frame {fj : j ∈ J} is a Riesz basis of H if it is minimal, that is, fj ∈ / span {fk : k ∈ J, k = j}, for each j ∈ J. 31
(6.2)
If {fj : j ∈ J} is a Riesz basis of H, there exists a unique sequence {gj : j ∈ J} ⊂ H satisfying (6.2) and we call it the dual Riesz basis of the original Riesz basis. Moreover, the dual Riesz basis {gj : j ∈ J} satisfies the biorthogonal condition that fj , gk H = δjk , for all j, k ∈ J, where δj,k is the Kronecker delta notation. We now consider reconstructing an element in an FRKHS in terms of its sampled functional values. Let H be an FRKHS with respect to a family F of linear functionals Lα , α ∈ Λ and K the functional reproducing kernel for H. As far as sampling is concerned, we assume that there exists a countable sequence αj ∈ Λ, j ∈ J, such that K := {K(αj ) : j ∈ J} constitutes a frame for H. For the purpose of constructing a dual frame of K, we define the frame operator T : H → H for each f ∈ H by f, K(αj )H K(αj ). (6.3) T f := j∈J
It is well-known [17, Page 58] that operator T is bounded, invertible, self-adjoint and strictly positive on H and the sequence {Gj : j ∈ J} with Gj := T −1 (K(αj )) is the dual frame of K. We shall show that the dual frame {Gj : j ∈ J} is also induced by a functional reproducing kernel. from Λ to H by For this purpose, we define operator K K(α) := T −1 (K(α)), for each α ∈ Λ
(6.4)
:= {f : f ∈ H} with the bilinear mapping ·, · defined by and introduce the space H H f, gH := T f, gH , f, g ∈ H.
(6.5)
is an inner product space Since operator T is self-adjoint and strictly positive, we conclude that H have the same elements although endowed with the inner product (6.5). The spaces H and H their norms are different. The next lemma reveals the equivalence of their norms. Lemma 6.1. Let H be an FRKHS with respect to a family F of linear functionals Lα , α ∈ Λ and K the functional reproducing kernel for H. If for a countable sequence αj , j ∈ J, in Λ, the defined set K constitutes a frame for H and T is the frame operator, then the norms of H and H, as above, are equivalent. Proof. It is known [17, Page 58] that there exist A, B > 0 such that for any f ∈ H, Af 2H ≤ T f, f H ≤ Bf 2H . This together with the fact f, f H = T f, f H yields the norm equivalence of H and H. is a Hilbert space since H is a Hilbert space. The following It follows from Lemma 6.1 that H is also an FRKHS with respect to F and has K as its functional proposition shows that H reproducing kernel. Proposition 6.2. Let H be an FRKHS with respect to a family F of linear functionals Lα , α ∈ Λ, and K the functional reproducing kernel for H. If for a countable sequence αj , j ∈ J, in Λ, the set K := {K(αj ) : j ∈ J} is also an FRKHS with constitutes a frame for H and T is the frame operator, then space H respect to F and K defined by (6.4) is the corresponding functional reproducing kernel for H. 32
established in Lemma 6.1, we can show that the Proof. By the norm equivalence of H and H norm of H is compatible with F since so is the norm of H. Combining the norm equivalence with the continuity of the functionals Lα , α ∈ Λ, on H, we obtain that these functionals are continuous The definition of the FRKHS ensures that H is an FRKHS with respect to F. on H. reproduces the functionals Lα (f ), for α ∈ Λ. By (6.5), It remains to show that the operator K the self-adjoint property of T and the reproducing property of the kernel K, we get for each f ∈ H that f, K(α) = T f, K(α)H = f, T K(α)H = f, K(α)H = Lα (f ), for all α ∈ Λ. H is the corresponding functional reproducing kernel for H. Hence, K We introduce the set
:= {K(α j ) : j ∈ J}. K
(6.6)
is the dual frame of K in H. We shall call operator K defined by (6.4) the dual functional Clearly, K reproducing kernel of K with respect to the set {αj : j ∈ J}. With the help of the dual functional reproducing kernel, we establish next the complete reconstruction formula of an element in an FRKHS from its functional values, which is the main theorem of this section. Theorem 6.3. Let H be an FRKHS with respect to the family F of linear functionals Lα , α ∈ Λ and K the functional reproducing kernel for H. If for a countable sequence αj , j ∈ J, in Λ, the is the dual functional reproducing kernel of K with respect set K constitutes a frame for H and K to the set {αj : j ∈ J}, then K defined by (6.6) is a dual frame of K in H and there holds for each f ∈ H, j ), f= Lαj (f )K(α (6.7) j∈J
Moreover, if K is a Riesz basis for H, where the reconstruction formula (6.7) holds also in H. is an orthonormal basis for the FRKHS H of K. then K is a dual Proof. It follows from the definition of the dual functional reproducing kernel that K frame of K in H. Then, by (6.2) we have that j ). f, K(αj )H K(α f= j∈J
This together with the reproducing property Lαj (f ) = f, K(αj )H , j ∈ J, leads to the reconstruction formula (6.7). Lemma 6.1 guarantees that the norms of H and H are equivalent. Moreover, the two spaces have the same elements. Therefore, the reconstruction formula (6.7) holds also in H. by using the In the case when K is a Riesz basis for H, we verify the orthonormality of K biorthogonality of Reisz basis K and its dual K, that is, k )H = δj,k , j, k ∈ J. K(αj ), K(α Indeed, for all j, k ∈ J, we have that k ) = T K(α j ), K(α k )H = K(αj ), K(α k )H . j ), K(α K(α H This with (6.8) proves the orthonormality of K. 33
(6.8)
In general, the reconstruction formula (6.7) holds only in the norm of an FRKHS. However, when an FRKHS is perfect, the series in (6.7) converges both pointwise and uniformly on any set where the reproducing kernel is bounded. This is due to the fact that the point-evaluation functionals are also continuous, when an FRKHS is perfect. This result is presented in the following corollary. Corollary 6.4. Suppose that a Hilbert space H of functions on X is a perfect FRKHS with respect to the family F of linear functionals Lα , α ∈ Λ. Let K and K be the functional reproducing kernel with respect to F and F0 , respectively. If for a countable sequence αj , j ∈ J, in Λ, the set K is the dual functional reproducing kernel of K with respect to the constitutes a frame for H and K set {αj : j ∈ J}, then there holds for each f ∈ H and each x ∈ X , j )(x). Lαj (f )K(α (6.9) f (x) = j∈J
Moreover, the series converges uniformly on the set where K(x, x) is bounded. Proof. Suppose that Jn , n ∈ Z, is a sequence of subsets of J such that Jn ⊆ Jn+1 , n ∈ Z. For each f ∈ H and each n ∈ Z, set j ). Lαj (f )K(α fn := j∈Jn
By Theorem 6.3, we have that the sequence fn , n ∈ Z, converges to f as n tends to infinity. It follows from the reproducing property of K that for any x ∈ X |f (x) − fn (x)| = |f − fn , K(x, ·)H | ≤ f − fn H K(x, ·)H . This leads directly to (6.9). Again, by the reproducing property of K, we have that ' ' K(x, ·)H = K(x, ·), K(x, ·)H = K(x, x), which leads to |f (x) − fn (x)| ≤ f − fn H
' K(x, x).
Hence, we conclude that fn converges uniformly to f on the set where K(x, x) is bounded. To apply Theorem 6.3, we have to find the frame in the form {K(αj ) : j ∈ J} for an FRKHS in order to establish the complete reconstruction formula. It is helpful to have a characterization of such a frame in terms of the features of the corresponding functional reproducing kernel. Lemma 6.5. Let Φ be a map from Λ to a Hilbert space W satisfying span {Φ(α) : α ∈ Λ} = W. If K is a functional reproducing kernel for an FRKHS H, defined by (3.16), then for a countable sequence αj , j ∈ J, in Λ, the set K := {K(αj ) : j ∈ J} constitutes a frame for H if and only if {Φ(αj ) : j ∈ J} forms a frame for W . Proof. We prove this theorem by establishing an isometric isomorphism between H and W . To this end, we let K(α, β) := Lβ (K(α)), α, β ∈ Λ. By Theorem 3.7, we have that f → L(·) (f ) is an isometric isomorphism between the FRKHS H of K and the RKHS H of K. Equation (3.16) shows that Φ is the feature map of the kernel K and W
34
is the corresponding feature space. We then see that u, Φ(·)W → u is an isometric isomorphism from H to W . We define the operator T : H → W as follows: For each f ∈ H, T f ∈ W satisfies T f, Φ(β)W = Lβ (f ), for all β ∈ Λ. Thus, T is an isometric isomorphism from H to W . By noting that T (K(α)) = Φ(α), we obtain immediately the desired result of this theorem. Motivated by Corollary 6.4, it is of practical importance to consider sampling and reconstruction in a perfect FRKHS. In this case, the frame {K(αj ) : j ∈ J} has the following characterization. Corollary 6.6. Let W be a Hilbert space, Φ : X → W satisfy span {Φ(x) : x ∈ X } = W. and Ψ : Λ → W satisfy
span {Ψ(α) : α ∈ Λ} = W.
(6.10)
If K is a functional reproducing kernel for a perfect FRKHS H, defined by (4.6), then for a countable sequence αj , j ∈ J, in Λ, the set K := {K(αj ) : j ∈ J} constitutes a frame for H if and only if {Ψ(αj ) : j ∈ J} forms a frame for W . Proof. In order to apply Lemma 6.5, we need to represent the functional reproducing kernel K in the form of (3.16). It is known that the linear functionals on the perfect FRKHS H are determined by Lα (u, Φ(·)W ) := u, Ψ(α)W , u ∈ W, α ∈ Λ. By the representation (4.6) of K, we have that Lβ (K(α)) = Ψ(α), Ψ(β)W , for all α, β ∈ Λ. That is, the mapping Ψ : Λ → W is the feature map of K, defined by (4.6). Noting that the density relation (6.10) holds, the desired result of this theorem follows directly from Lemma 6.5. Based upon the discussion above, we conclude that the frame {K(αj ) : j ∈ J} for an FRKHS H is important for establishing the complete reconstruction formula of functions in H. These desirable frames can be obtained by characterizing the frame {Φ(αj ) : j ∈ J} for the feature space W . In the light of this point, we shall consider the complete reconstruction of functions in two concrete FRKHSs.
6.1
Reconstruction of vector-valued functions in the Paley-Wiener space
In this subsection, we consider reconstructing a vector-valued function in the Paley-Wiener space. Recall that the Paley-Wiener space of Cn -valued functions on R is defined by
Bπ (R, Cn ) := f ∈ L2 (R, Cn ) : suppfˆ ⊆ [−π, π] with the inner product f, gBπ (R,Cn ) :=
fj , gj L2 (R) , for f := [fj : j ∈ Nn ], g := [gj : j ∈ Nn ].
j∈Nn
We wish to reconstruct f ∈ Bπ (R, Cn ) from its sampled data f (xj ), ξj Cn , (xj , ξj ) ∈ R × Cn , j ∈ Z. 35
Set Λ := R × Cn and for each (x, ξ) ∈ Λ, we introduce a linear functional on Bπ (R, Cn ) as L(x,ξ) (f ) := f (x), ξCn , f ∈ Bπ (R, Cn ). It was pointed out that the space Bπ (R, Cn ) is an FRKHS with respect to the family F of linear functionals L(x,ξ) , (x, ξ) ∈ Λ. The functional reproducing kernel K for Bπ (R, Cn ) is given by K(x, ξ) := diag(K(x, ·) : j ∈ Nn )ξ, (x, ξ) ∈ Λ, where K is the sinc kernel on R defined by K(x, y) :=
sin π(x − y) , x, y ∈ R. π(x − y)
(6.11)
We next describe the feature space W and the feature map Φ of the functional reproducing kernel K. Set W := L2 ([−π, π], Cn ) and define Φ : Λ → W by 1 Φ(x, ξ) := √ eix· ξ, (x, ξ) ∈ Λ. 2π
It follows from K(x, y) =
1 1 √ eix· , √ eiy· 2π 2π
L2 ([−π,π])
, x, y ∈ R,
that for all (x, ξ), (y, η) ∈ Λ, L(y,η) (K(x, ξ)) = K(x, ξ)(y), ηCn
1 1 iy· ix· √ e ξj , √ e ηj = . 2π 2π j∈N L2 ([−π,π]) n
This leads to L(y,η) (K(x, ξ)) = Φ(x, ξ), Φ(y, η)W , for all (x, ξ), (y, η) ∈ Λ. Thus, we conclude that W and Φ are the feature space and the feature map of the functional reproducing kernel K, respectively. Moreover, there holds the density condition span {Φ(x, ξ) : (x, ξ) ∈ Λ} = W. We now describe the reconstruction of a function f ∈ Bπ (R, Cn ) from its functional values L(xj ,ξj ) (f ), (xj , ξj ) ∈ Λ, j ∈ Z. This is done by finding a sampling set {(xj , ξj ) : j ∈ Z} ⊆ Λ such that {K(xj , ξj ) : j ∈ J} forms a frame for Bπ (R, Cn ). By Lemma 6.5 such a frame can be characterized by the frame {Φ(xj , ξj ) : j ∈ Z} for W . We choose the sampling set {(xj , ξj ) : j ∈ Z} as follows: We represent each j ∈ Z as j := nm + l, where m ∈ Z and l ∈ Zn := {0, 1, . . . , n − 1}. We then choose the sequence xj , j ∈ Z, in R such that for each l ∈ Zn , { √12π eixnm+l · : m ∈ Z} is a frame for L2 ([−π, π]). We pick n vectors ηl := [ηl,k : k ∈ Zn ], l ∈ Zn so that the matrix U := [ηl : l ∈ Zn ] is unitary. We then choose the sequence ξj , j ∈ Z, in Cn as ξnm+l := ηl for all l ∈ Zn and all m ∈ Z. We show below that {Φ(xj , ξj ) : j ∈ Z} constitutes a frame for W . ∈ Proposition 6.7. If the sequence {(xj , ξj ) : j ∈ Z} ⊆ Λ is chosen as above,
then {Φ(xj , ξj ) : j 1 ix · nm+l √ Z} constitutes a frame for W . Moreover, if for each l ∈ Zn , the sequence e :m∈Z 2π is a Riesz basis for L2 ([−π, π]), then {Φ(xj , ξj ) : j ∈ Z} forms a Riesz basis for W . 36
Proof. We prove that {Φ(xj , ξj ) : j ∈ Z} is a frame for W . By the definition of frames, it suffices to verify that there exist positive constants 0 < A ≤ B < +∞ such that for all u ∈ W , ⎛
AuW ≤ ⎝
⎞1/2 |u, Φ(xj , ξj )W |2 ⎠
≤ BuW .
(6.12)
j∈Z
According to the description of the sequence {(xj , ξj ) : j ∈ Z} ⊆ Λ we have for any u := [uk : k ∈ Zn ] ∈ W that
1 ixnm+l · 2 |u, Φ(xj , ξj )W | = η l,k uk , √ e 2π j∈Z l∈Zn m∈Z k∈Zn
For each l ∈ Zn , set vl :=
k∈Zn
L2 ([−π,π])
2 .
(6.13)
η l,k uk . Since { √12π eixnm+l · : m ∈ Z} is a frame for L2 ([−π, π]),
there exist positive constants 0 < Al ≤ Bl < +∞ such that 2 1 A2l vl 2L2 ([−π,π]) ≤ vl , √ eixnm+l · ≤ Bl2 vl 2L2 ([−π,π]) . 2π L2 ([−π,π]) m∈Z
Substituting the above estimates for all l ∈ Zn into (6.13), we get that u, Φ(xj , ξj ) 2 ≤ B 2 vl 2L2 ([−π,π]) ≤ vl 2L2 ([−π,π]) , A2 W j∈Z
l∈Zd
where A := min{Al : l ∈ Zn } and B := max{Bl : l ∈ Zn }. Note that there holds ⎛ ⎞ ⎝ vl 2L2 ([−π,π]) = η l,k ηl,j ⎠ uk , uj L2 ([−π,π]) . k∈Zn j∈Zn
l∈Zn
(6.14)
l∈Zd
(6.15)
l∈Zn
Equation (6.15) with the unitary assumption on U leads to vl 2L2 ([−π,π]) = uk 2L2 ([−π,π]) = u2W . l∈Zn
k∈Zn
Hence, by (6.14) we get the equivalence property (6.12). We next prove that {Φ(xj , ξj ) : j ∈ Z} is a Riesz basis for W if for each l ∈ Zn , { √12π eixnm+l · : m ∈ Z} is a Riesz basis for L2 ([−π, π]). It suffices to show that if there exists {cj : j ∈ Z} ∈ l2 (Z) such that cj Φ(xj , ξj ) = 0, (6.16) j∈Z
then cj = 0, j ∈ Z. It follows from (6.16) that for all j ∈ Z, cj Φ(xj , ξj ), Φ(xj , ξj )W = 0. j∈Z
There holds for any j := nm + l and any j := nm + l that 1 ixnm+l · 1 ixnm +l · Φ(xj , ξj ), Φ(xj , ξj )W = η l,k ηl ,k √ e ,√ e . 2π 2π L2 ([−π,π]) k∈Z n
37
(6.17)
Again, using the unitary property of U, we obtain that 1 1 Φ(xj , ξj ), Φ(xj , ξj )W = δl,l √ eixnm+l · , √ eixnm +l · . 2π 2π L2 ([−π,π])
(6.18)
Substituting (6.18) into (6.17), we obtain for any m ∈ Z and any l ∈ Zn that
1 ixnm+l · 1 ixnm +l · cnm+l √ e ,√ e = 0. 2π 2π 2 m∈Z L ([−π,π])
This together with the hypothesis that { √12π eixnm+l · : m ∈ Z} is a Riesz basis for L2 ([−π, π]) leads to 1 √ cnm+l eixnm+l · = 0. 2π m∈Z Since for any l ∈ ZN , { √12π eixnm+l · : m ∈ Z} is a Riesz basis for L2 ([−π, π]), we conclude that cnm+l = 0, for all m ∈ Z and l ∈ Zn , which completes the proof. Sampling sets {(xj , ξj ) : j ∈ Z} for {Φ(xj , ξj ) : j ∈ Z} being a Riesz basis for W were characterized in [6, 7] in terms of the generating matrix function. The authors of these papers considered the matrix Sturm-Liouville problems and applied the boundary control theory for hyperbolic dynamical systems in producing a wide class of matrix functions generating sampling sets. Such sampling sets can provide us the desired frames for Bπ (R, Cn ) having the form {K(xj , ξj ) : j ∈ Z}. By constructing the frame operator (6.3) and obtaining the dual functional by (6.4), we can build the reconstruction formula of f ∈ Bπ (R, Cn ) as reproducing kernel K j , ξj ). f (xj ), ξj Cn K(x (6.19) f= j∈Z
For the special case n = 1, Proposition 6.7 reduces to the well-known Shannon sampling theorem for the Paley-Wiener space of scalar-valued functions on R Bπ := {f ∈ L2 (R) : suppfˆ ⊆ [−π, π]}. The space Bπ is an RKHS with the sinc kernel defined by (6.11). As pointed out in the last section, the space W := L2 ([−π, π]) and the mapping Φ : R → W , defined by Φ(x) := √12π eix· , are the feature space and the feature map of the sinc kernel, respectively. Choose {j : j ∈ Z} as the sampling set. It is known that {Φ(j) : j ∈ Z} is just an orthonormal basis for W . Then with {K(j, ·) : j ∈ Z} forms an orthonormal basis for Bπ and the dual reproducing kernel K respect to {j : j ∈ Z} coincides with K. Hence, the reconstruction formula (6.19) reduces to the Shannon sampling formula f (x) =
j∈Z
6.2
f (xj )
sin π(j − x) , x ∈ R. π(j − x)
Reconstruction of a function using sampled local averages
We consider in this subsection recovering a scalar-valued function f on R from the sampled data determined by integral functionals. The local average sampling is used to reconstruct a function f from its local average values in the form Lxj (f ) := f (t)uxj (t)dt, xj ∈ R, j ∈ Z, R
38
where ux , x ∈ R, are nonnegative functions that satisfy ux (t)dt = 1 and supp ux ⊆ [x − δ, x + δ],
(6.20)
R
with δ being a positive constant. We call ux , x ∈ R, the average functions. We shall restrict ourselves to investigating the local average sampling in the framework of FRKHSs. The FRKHS to be considered is the Paley-Wiener space Bπ of scalar-valued functions on R. The space Bπ is a standard RKHS, in which the ideal sampling and the local average sampling are usually considered. The sinc kernel for Bπ , defined by (6.11), is a translation invariant reproducing kernel defined by 1 χ[−π,π] . The space W := L2 ([−π, π]) and the mapping Φ : R → W , defined (5.8) with ϕ := 2π 1 by Φ(x) := √2π eix· , are the feature space and the feature map of the sinc kernel, respectively. It is clear that the function ϕ and the average functions ux , x ∈ R, are all integrable. Hence, by Corollary 5.4 we obtain that the closed subspace H := f ∈ Bπ : fˇ ∈ span u∨ x χ[−π,π] : x ∈ R of Bπ is a perfect FRKHS with respect to the family F of the local average functionals Lx , x ∈ R. The closure in the representation of H is taken in W . Moreover, the functional reproducing kernel is given by ∧ K(x)(y) = [u∨ x χ[−π,π] ] (y), x, y ∈ R. For each x ∈ R, set Ψ(x) :=
√
2πu∨ x χ[−π,π] .
Then K can be represented as in (4.6). To relate our new theory to the existing average sampling theorems in the literature, we consider recovering functions in Bπ , rather than the closed subspace H, from the local average values. To this end, we recall Corollary 5.3, which states that Bπ itself is a perfect FRKHS if and only if there holds the density span {Ψ(x) : x ∈ R} = W. We restrict ourselves to this special case. Since Bπ is a perfect FRKHS, with the help of Corollary 6.6 we shall construct the sampling set {xj : j ∈ Z} such that the sequence {Ψ(xj ) : j ∈ Z} constitutes a frame for W . The next theorem gives a sufficient condition to ensure this. For any u ∈ W , we let Ru and Iu denote its real part and imaginary part, respectively. Proposition 6.8. Suppose that the sequence of nonnegative functions ux , x ∈ R, satisfy condition (6.20). If there exist positive constants A, B depending only on the sampling set {xj : j ∈ Z} and a positive constant δ such that for any sequence tj , j ∈ Z, with tj ∈ [xj − δ, xj + δ], {eitj · : j ∈ Z} forms a frame for W with the frame bounds A, B, then {Ψ(xj ) : j ∈ Z} constitutes a frame for W. Proof. We prove this result by establishing the equivalence condition (6.1) in the definition of a frame for W . That is, we estimate lower and upper bounds of the l2 -norm of the quantities Qj (u) := u, Ψ(xj )W , j ∈ Z, for any u ∈ W. Note that for any u ∈ W π 2 1 1 2 −ist u(t) uxj (s)e ds dt = |Qj (u)| = 2π R (2π)2 −π
2 u, eis· W ux (s)ds . j R
We decompose u into its odd and even parts. That is, u = uo + ue , where uo (x) =
u(x) − u(−x) u(x) + u(−x) , ue (x) = , x ∈ [−π, π]. 2 2 39
(6.21)
Substituting the decomposition u = uo + ue into (6.21), we get that 2 1 2 |Qj (u)| = F (s)u (s)ds + i G (s)u (s)ds , u x u x j j (2π)2 R R
(6.22)
where Fu (s) := Rue + iIuo , eis· W and Gu (s) := Iue − iRuo , eis· W , s ∈ R. Note that Fu and Gu are both real and continuous functions. By the mean value theorem and the hypothesis on ux , x ∈ R, there exist sj , s˜j ∈ [xj − δ, xj + δ] such that Fu (s)uxj (s)ds = Fu (sj ) and Gu (s)uxj (s)ds = Gu (˜ sj ). R
R
Substituting the above equations into (6.22), we obtain that |Qj (u)|2 =
1 [F 2 (sj ) + G2u (˜ sj )]. (2π)2 u
(6.23)
By the hypothesis of this theorem, both {eisj · : j ∈ Z} and {ei˜sj · : j ∈ Z} are frames for W with the frame bounds A, B. Hence, by the definition of Fu and Gu , we get that A2 Rue + iIuo 2W ≤ Fu2 (sj ) ≤ B 2 Rue + iIuo 2W j∈Z
and
A2 Iue − iRuo 2W ≤
G2u (˜ sj ) ≤ B 2 Iue − iRuo 2W .
j∈Z
By (6.23) and noting that u2W = Rue + iIuo 2W + Iue − iRuo 2W , we conclude that
⎛ ⎞1/2 1 1 |Qj (u)|2 ⎠ ≤ AuW ≤ ⎝ BuW . 2π 2π j∈Z
This completes the proof by the definition of a frame for W . Observing from Proposition 6.8, we note that frames for W with the form {eitj · : j ∈ Z} are useful for the study of frames with the form {Ψ(xj ) : j ∈ Z}. For a complete characterization of the former, one can see [13, 62]. By making use of the sampling set {xj : j ∈ Z}, we obtain the frame {K(xj ) : j ∈ Z} for Bπ . Accordingly, we get the complete reconstruction formula for f ∈ Bπ as j ), f= f (t)uxj (t)dt K(x j∈Z
R
is the dual functional reproducing kernel of K with respect to the set {xj : j ∈ Z}. In where K fact, Proposition 6.8 indicates that the above complete reconstruction formula does not depend on the exact choice of the average functions uxj , j ∈ Z, but only on the sampling set {xj : j ∈ Z} and the width δ of the support of the average functions. As applications of Proposition 6.8, we present two specific examples for choosing the sampling set {xj : j ∈ Z} and the average functions ux , x ∈ R.
40
Example 6.9. Let xj = j, j ∈ Z, and 0 < δ < 1/4. The well-known Kadec 1/4-theorem shows that for any sequence {tj : j ∈ Z} satisfying tj ∈ [xj − δ, xj + δ], j ∈ Z, the sequence {eitj · : j ∈ Z} forms a frame for L2 ([−π, π]) with the bounds 2π(cos(δπ) − sin(δπ))2 and 2π(2 − cos(δπ) + sin(δπ))2 . That is, the hypothesis of Proposition 6.8 is satisfied. Hence, we have by Proposition 6.8 that the corresponding sequence {Ψ(xj ) : j ∈ Z} constitutes a frame for W and then we can establish the complete reconstruction formula of functions in Bπ as in Theorem 6.3. In fact, the local average sampling theorem in this case was established in [56] for Bπ without the concept of FRKHSs. A generalized Kadec 1/4-theorem can help us extend this result. Specifically, we suppose that the sampling set {xj : j ∈ Z} satisfies that {eixj · : j ∈ Z} is a frame for W with bounds A, B and 0 < δ < 1/4 such that ( A 1 − cos(δπ) + sin(δπ) < . B The generalized Kadec 1/4-theorem states that for any {tj : j ∈ Z} with tj ∈ [xj − δ, xj + δ], j ∈ Z, {eitj · : j ∈ Z} forms a frame for W with bounds A 1−
(
B (1 − cos(δπ) − sin(δπ)) A
2 and B(2 − cos(δπ) + sin(δπ))2 .
Hence, the sampling set {xj : j ∈ Z} and δ in this case also satisfy the hypothesis of Proposition 6.8 and thus, the corresponding sequence {Ψ(xj ) : j ∈ Z} constitutes a frame for W . Example 6.10. We review an example considered in [56] and reinterpret it by Proposition 6.8. Specifically, we suppose that α, L are positive constants and 0 < < 1. It is known [20] that for any sampling set {xj : j ∈ Z} such that |xj − xk | ≥ α, j = k and |xj − j| ≤ L, the sequence {eixj · : j ∈ Z} forms a frame for W with bounds A, B depending only on α, L, . If we choose the sampling set {xj : j ∈ Z} satisfying the above condition with α, L, and 0 < δ < α/2, then we can verify that for any sequence tj , j ∈ Z, with tj ∈ [xj − δ, xj + δ], there holds |tj − tk | ≥ α − 2δ, j = k, |tj − j| ≤ L + δ. Hence, we conclude that {eitj · : j ∈ Z} forms a frame for W with the bounds A, B depending only on α, L, and δ. That is, the hypothesis of Proposition 6.8 is satisfied. Hence, we get that the corresponding sequence {Ψ(xj ) : j ∈ Z} constitutes a frame for W . The corresponding complete reconstruction formula of functions in Bπ can be built as in Theorem 6.3. Such a local average sampling theorem was also established in [56]. To end the discussion about the local average sampling in the Paley-Wiener space Bπ , we consider a special case where ux (t) := u(t − x), t ∈ R, with u ∈ L1 (R) and present a sufficient condition for {Ψ(xj ) : j ∈ J} to be a frame for W . u(t)| ≥ c for a positive constant c and for all t ∈ [−π, π], Proposition 6.11. Let u ∈ L1 (R) satisfy |ˇ and ux := u(· − x), x ∈ R. If {eixj · : j ∈ Z} is a frame for W , then {Ψ(xj ) : j ∈ Z} constitutes a frame for W .
41
Proof. We prove that {Ψ(xj ) : j ∈ Z} constitutes a frame for W by estimating the quantities ix· ˇ. Then we have for any w, Ψ(xj )W , j ∈ Z, for any w ∈ W . Notice that for any x ∈ R, u∨ x =e u w ∈ W that |w, Ψ(xj )W |2 = |wˇ uχ[−π,π] , eixj · W |2 . j∈Z
j∈Z
Since {eixj · : j ∈ Z} is a frame for W , there exist 0 < A ≤ B < ∞ such that uχ[−π,π] 2W ≤ |w, Ψ(xj )W |2 ≤ B 2 wˇ uχ[−π,π] 2W . A2 wˇ
(6.24)
j∈Z
By the assumptions on function u, we have that cwW ≤ wˇ uχ[−π,π] W ≤
1 uL1 (R) wW . 2π
Substituting the above estimate into (6.24), we obtain for any w ∈ W that ⎞1/2
⎛
A wW ≤ ⎝
|w, Ψ(xj )W |2 ⎠
≤ B wW ,
j∈Z B uL1 (R) . By the definition of a frame, the sequence {Ψ(xj ) : j ∈ Z} with A := Ac and B := 2π constitutes a frame for W .
To close this section, we remark that taking into account the continuity of the sampling process, FRKHSs are the right spaces for reconstruction of functions by using their non-pointevaluation functional data. The complete reconstruction formula in such spaces can be obtained by constructing frames in terms of the corresponding functional reproducing kernels. Characterizing the desired frames by features provides us a convenient approach for finding them. The specific examples discussed in this section include not only the existing sampling theorems concerning the non-point-evaluation functional data but also new ones. Hence, we point out that FRKHSs indeed provide an ideal framework for systematically study sampling and reconstruction with respect to non-point-evaluation functional data.
7
Regularized learning in FRKHSs
Regularization, an approach widely used in solving an ill-posed problem of learning an unknown target function from finite samples, has been a focus of attention in the field of machine learning [16, 39, 50, 51, 58]. We study in this section the regularized learning from finite non-pointevaluation functional data in the framework of FRKHSs. The resulting representer theorem shows that as far as the non-point-evaluation functional data are concerned, learning in FRKHSs has advantages over RKHSs. We begin with reviewing the regularized learning schemes for learning a function in an RKHS from finite point-evaluation functional data. Suppose that X is the input space. Let Q : Cm × Cm → R+ := [0, +∞] be a prescribed loss function, φ : R+ → R+ a regularizer and λ a positive regularization parameter. The regularized learning of a function in an RKHS H on X with the reproducing kernel K from its finite samples {(xj , yj ) : j ∈ Nm } ⊂ X × C may be formulated as the following minimization problem: inf {Q(f (x), y) + λφ(f H )} ,
f ∈H
42
(7.1)
where x := [xj : j ∈ Nm ], f (x) := [f (xj ) : j ∈ Nm ] and y := [yj : j ∈ Nm ]. Many popular learning schemes such as regularization networks and support vector machines correspond to the minimization problem (7.1) with different choices of Q and φ. For example, φ(t) := t2 , t ∈ R+ is a frequent choice. If the loss function Q is chosen as |f (xj ) − yj |2 , Q(f (x), y) := j∈Nm
the corresponding algorithm is called the regularization networks. We next choose the loss function as Q(f (x), y) := max(|f (xj ) − yj | − ε, 0), j∈Nm
where ε is a positive constant. In this case, the algorithm coincides with the support vector machine regression. The success of the regularized learning lies on the well-known representer theorem. This remarkable result shows that for certain choice of loss functions and regularizers the solution to the minimization problem (7.1) can be represented by a linear combination of the kernel sections K(xj , ·), j ∈ Nm . Then the original minimization problem in a potentially infinite-dimensional Hilbert space can be converted to one about finitely many coefficients emerging in the linear combination of the kernel sections. Moreover, since the reproducing kernel is used to measure the similarity between inputs, the representer theorem provides us with a reasonable output by making use of the input similarities. The representer theorem for the regularization networks in the scalar case dates from [32] and was generalized for non-quadratic loss functions and nondecreasing regularizers [4, 15, 49]. As the multi-task learning and learning in Banach spaces received considerable attention recently, the representer theorems in vector-valued RKHSs and RKBSs were also established [39, 64, 65, 66]. We state in the following theorem a general result for the solution to the minimization problem (7.1). To this end, we suppose that the loss function Q and the regularizer φ satisfy the following assumptions: (A1) The loss function Q is continuous and convex with respect to its first m variables. (A2) The regularizer φ is assumed to be continuous, nondecreasing, strictly convex and to satisfy lim φ(t) = +∞. t→+∞
Theorem 7.1. Let H be an RKHS on X with the reproducing kernel K. If Q and φ satisfy assumptions (A1) and (A2) then there exists a unique minimizer f0 of (7.1) and it has the form f0 = cj K(xj , ·) (7.2) j∈Nm
for some sequence cj , j ∈ Nm , in C. Substituting the representation (7.2) of the minimizer into (7.1), the original minimization problem in an RKHS is converted into the minimization problem about the parameters cj , j ∈ Nm . Especially in some occasions, with the help of the representer theorem, we can convert the minimization problem into a system of equations about the parameters. For the regularization networks, if K(xj , ·), j ∈ Nm , are linearly independent in H, then the parameters cj , j ∈ Nm , in (7.2) satisfy the linear equations ck K(xk , xj ) + λcj = yj , j ∈ Nm . k∈Nm
43
As the point-evaluation functional data are not readily available in many practical applications, it is desired to study the regularized learning from finite samples which are in general the linear functional values. Most of the existing work mainly focused on such learning problems in RKHSs and RKBSs, where the point-evaluation functionals are continuous [58, 63, 65]. A basic assumption of these results is that the finitely many linear functionals are continuous on a prescribed space, which guarantees the stability of the finite sampling process. Under this hypothesis the representer theorem was obtained. Specifically, in RKHSs, one can represent the solution of the regularization problem by a linear combination of the dual elements of the given linear functionals. In general, learning a function in a usual RKHS from its non-point-evaluation functional data is not appropriate. Two reasons account for this consideration. First of all, although the representer theorem still exists for the regularized learning from non-point-evaluation functional data in an RKHS, it represents the solution by making use of the dual elements of the continuous linear functionals other than the reproducing kernel. This will bring difficulties to the computation of the solution. Secondly, when values of only finite linear functionals are used to learn a target element, the underlying RKHS may ensure the continuity of the finite linear functionals. However, to improve the accuracy of approximation, more functional values should be used in the regularized learning scheme. An RKHS may not have the ability to guarantee the continuity of a family of linear functionals. The disadvantages of RKHSs for learning a function from its non-pointevaluation functional data lead us to considering regularized learning from non-point-evaluation functional data in the FRKHSs setting. Suppose that H is an FRKHS with respect to the set F := {Lα : α ∈ Λ} of linear functionals on H. Let Λm := [αj : j ∈ Nm ] ∈ Λm , y := [yj : j ∈ Nm ] ∈ Cm and for each f ∈ H set
LΛm (f ) := [Lαj (f ) : j ∈ Nm ].
We consider the regularized learning algorithm with respect to the values of linear functionals as inf {Q(LΛm (f ), y) + λφ(f H )} ,
f ∈H
(7.3)
where Q, φ and λ are defined by (7.1). The following theorem gives the existence and the uniqueness of the minimizer of (7.3) and the corresponding representer theorem. Although the theorem can be proved by arguments similar to the proof of Theorem 7.1, we shall show it by making use of the isometric isomorphism between H and an RKHS. Theorem 7.2. Suppose that H is an FRKHS with respect to the set F := {Lα : α ∈ Λ} of linear functionals on H and K is the functional reproducing kernel for H. If the loss function Q and the regularizer φ satisfy assumptions (A1) and (A2) then there exists a unique minimizer f0 of (7.3) and there exist cj ∈ C, j ∈ Nm , such that cj K(αj ). (7.4) f0 = j∈Nm
Proof. To apply Theorem 7.1, we construct an RKHS which is isometrically isomorphic to H. To this end, we introduce the operator K : Λ × Λ → C by K(α, β) := Lβ (K(α)), α, β ∈ Λ.
(7.5)
By Theorems 3.6 and 3.7, we have that K is a reproducing kernel on Λ and its RKHS HK , composed of functions in the form f˜(α) := Lα (f ), α ∈ Λ, f ∈ H, is isometrically isomorphic to H. For each f˜ ∈ HK set f˜(Λm ) := [Lαj (f ) : j ∈ Nm ]. Due to the isometrically isomorphic between H 44
and HK we have that f0 is the minimizer of (7.3) if and only if f˜0 with f˜0 (α) := Lα (f0 ), α ∈ Λ, is the minimizer of
inf Q(f˜(Λm ), y) + λφ(f˜HK ) . f˜∈HK
By Theorem 7.1 we observe that the unique minimizer f˜0 of the above optimization problem has the form cj K(αj , ·) f0 = j∈Nm
for some cj ∈ C, j ∈ Nm . Hence, we conclude that there exists a unique minimizer of (7.3). Moreover, since for each β ∈ Λ, ⎛ ⎞ cj K(αj , β) = cj Lβ (K(αj )) = Lβ ⎝ cj K(αj )⎠ , f0 (β) = j∈Nm
j∈Nm
j∈Nm
we get the representation of f0 as in (7.4). As a special case, we consider the regularization networks (7.3), where the regularizer φ is chosen as φ(t) := t2 , t ∈ R+ and the loss function Q is chosen as Q(LΛm (f ), ξ) := |Lαj (f ) − yj |2 . j∈Nm
It is clear that the loss function Q and the regularizer φ satisfies the hypothesis of Theorem 7.2. We then conclude by Theorem 7.2 that there exists a unique minimizer f0 of the regularization networks. Let K be the reproducing kernel defined by (7.5). From the proof of Theorem 7.2, we see that if K(αj ), j ∈ Nm , are linearly independent in H, then the parameters cj , j ∈ Nm , in (7.2) satisfy the linear equations ck K(αk , αj ) + λcj = yj , j ∈ Nm . k∈Nm
By the definition of K, the above equations are equivalent to ck Lαj (K(αk )) + λcj = yj , j ∈ Nm .
(7.6)
k∈Nm
We note that K(αj ), j ∈ Nm , are linearly independent if and only if the linear functionals Lαj , j ∈ Nm , are linearly independent. This result is stated in the following proposition. Lemma 7.3. Suppose that H is an FRKHS with respect to the set F := {Lα : α ∈ Λ} of linear functionals on H and K is the functional reproducing kernel for H. If {αj , j ∈ Nm } is a finite subset of distinct points in Λ, then the linear functionals Lαj , j ∈ Nm , are linearly independent if and only if K(αj ), j ∈ Nm , are linearly independent. Proof. The linear functionals Lαj , j ∈ Nm , are linearly independent if and only if there does not exist {cj : j ∈ Nm } ⊂ C, not all zero, such that cj Lαj (f ) = 0, for all f ∈ H. (7.7) j∈Nm
By the reproducing property of K, equation (7.7) is equivalent to
f, cj K(αj ) = 0, for all f ∈ H. j∈Nm
H
45
That is, the linear functionals Lαj , j ∈ Nm , are linearly independent if and only if there does not exist {cj : j ∈ Nn } ⊂ C, not all zero, such that cj K(αj ) = 0. j∈Nm
This is equivalent to the fact that K(αj ), j ∈ Nm , are linearly independent. By representation (7.4), we note that for the regularized learning in an FRKHS from nonpoint-evaluation functional data, the target element can be obtained by the functional reproducing kernel in a desired manner. That is, it can be viewed as a linear combination of the kernel sections. As shown in the system of equations (7.6), the finite number of coefficients in the representation (7.4) can be obtained by solving a minimization problem in a finite-dimensional space, which reduces to a linear system in some special cases. Furthermore, the desired result holds for all the finite set of linear functionals in the family F := {Lα : α ∈ Λ}. All these facts show that FRKHSs and functional reproducing kernels provide a right framework for investigating learning from non-point-evaluation functional data.
8
Stability of numerical reconstruction algorithms
When we use a numerical algorithm to reconstruct an element from non-point-evaluation functional data (its functional values), stability of the algorithm is crucial. In this section, we study stability of a numerical reconstruction algorithm using functional values in an FRKHS. As will be pointed out, the continuity of linear functionals, used to obtain the non-point-evaluation functional data, on an FRKHS is necessary for the algorithm to be stable. Numerical reconstruction algorithms are often developed for reconstructing a target element. There are two stages in the reconstruction. The first one is sampling used to obtain a finite set of non-point-evaluation functional data processed digitally on a computer. The second one is the reconstruction of an approximation of the target element from the resulting sampled data. With the number of the sampled data increasing, the numerical reconstruction algorithm is expected to provide a more and more accurate approximation of the target element. As an admissible numerical reconstruction algorithm, it needs to be stable. Such a property will ensure that the effect of noise emerging in both sampling and reconstruction is not amplified by the underlying numerical reconstruction algorithm. We now define precisely the notion of stability of a numerical reconstruction algorithm. Suppose that H is an FRKHS with respect to a family F := {Lα : α ∈ Λ} of linear functionals on := {αj : j ∈ J} is a subset of Λ with J being a countable index set. With respect to each H and Λ m := {αj : j ∈ Jm } and define the sampling operator finite set Jm of m elements in J, we set Λ m IΛ m : H → C by (8.1) IΛ m (f ) := [Lαj (f ) : j ∈ Jm ]. We denote by Am the reconstruction operator from Cm to H. The sampling operator and the reconstruction operator together describe a numerical reconstruction algorithm Am ◦IΛ m . Namely, Am (IΛ m (f )) describes a numerical algorithm for constructing an approximation of f . Definition 8.1. The numerical reconstruction algorithm Am ◦ IΛ m defined above is stable if there exist positive constants c and m0 such that for all f ∈ H, all positive integer m > m0 and all m of Λ, finite subset Λ Am (IΛ m (f ))H ≤ cf H . (8.2)
46
It follows from inequality (8.2) that if there is a small perturbation of f in H, then the approximation of f , constructed by a stable numerical reconstruction algorithm, will be close to the approximation without perturbation. Specifically, suppose that there is a small perturbation δf of f in H. Then we get the approximation Am (IΛ m (f + δf )) by the numerical reconstruction algorithm. Since the operators IΛ m and Am are linear, the error between Am (IΛ m (f + δf )) and Am (IΛ m (f ) can be estimated by inequality (8.2) as Am (IΛ m (f + δf )) − Am (IΛ m (f )) = Am (IΛ m (δf ))H ≤ cδf H . This implies that if the perturbation δf is small, then the error between the two approximations will be small. In other words, a stable numerical reconstruction algorithm will not result in large reconstruction errors when the input function f contains small errors. We present in the following lemma a sufficient condition for the stability of a numerical reconstruction algorithm. To this end, we denote by l2 (J) the Hilbert space of square-summable we define the sampling operator I : H → l2 (J) for f ∈ H by sequence on J. With respect to Λ Λ IΛ (f ) := [Lαj (f ) : j ∈ J]. Lemma 8.2. Let H be an FRKHS with respect to a family F := {Lα : α ∈ Λ} of linear functionals := {αj : j ∈ J} be a countable subset of Λ. Suppose that the operator I is continuous on H. Let Λ Λ on H and the operators IΛ m , Am are defined as above. If there exist positive constants c and m0 m of Λ, such that for all f ∈ H, all positive integer m > m0 and all finite subset Λ Am (IΛ m (f ))H ≤ cIΛ m (f )Cm , then the numerical reconstruction algorithm Am ◦ IΛ m is stable. Proof. It follows from the continuity of operator IΛ that there exists a positive constant c such that for all f ∈ H, (8.3) IΛ (f )l2 (J) ≤ c f H . By the definition of sampling operators and the norm of the space l2 (J), we get for all f ∈ H, for m that all m ∈ N and for all finite subset Λ ⎛ IΛ m (f )Cm = ⎝
⎞1/2 |Lαj (f )|2 ⎠
≤ IΛ (f )l2 (J) .
j∈Jm
This together with (8.3) leads to the estimate IΛ m (f )Cm ≤ c f H . Substituting the above inequality into the inequality in the assumption, we have for all f ∈ H, all m that positive integer m > m0 and all finite subset Λ Am (IΛ m (f ))H ≤ cc f H . Hence, this with the definition of the stability of the numerical reconstruction algorithm proves the desired result. To apply Lemma 8.2 to ensure the stability of the numerical reconstruction algorithm, we need m ⊆ Λ} of sampling operators and the family {Am : m ∈ N} of that both the family {IΛ m : Λ reconstruction operators are uniformly bounded. The continuity of the sampling operators on an FRKHS is necessary to ensure the uniform boundedness of the family of the sampling operators. 47
For the reconstruction operators, we shall show below that two classes of commonly used operators are uniformly bounded. The first numerical reconstruction algorithm that we consider is the truncated reconstruction of elements in an FRKHS from a countable set of non-point-evaluation functional data discussed in section 6. Let H be an FRKHS with respect to a family F of linear functionals {Lα : α ∈ Λ} := {αj : j ∈ J} is a countable and K the functional reproducing kernel for H. Suppose that Λ the dual set of Λ such that {K(αj ) : j ∈ J} constitutes a Riesz basis for H. We denote by K functional reproducing kernel of K with respect to the set {αj : j ∈ J}. For each finite subset m := {αj : j ∈ Jm }, the sampling operator I : H → Cm is defined by (8.1). Since only finite Λ Λm sampled data are used in practice, we consider the reconstruction operator Am from Cm to H defined by j ). Lαj (f )K(α Am (IΛ m (f )) := j∈Jm
The stability of the resulting numerical reconstruction algorithm for this case is established below. Proposition 8.3. Let H be an FRKHS with respect to the family F of linear functionals Lα , := {αj : j ∈ J} α ∈ Λ and K the functional reproducing kernel for H. If for a countable subset Λ of Λ, the set {K(αj ) : j ∈ J} constitutes a Riesz basis for H, then the numerical reconstruction algorithm Am ◦ IΛ m defined as above is stable. Proof. We prove this result by employing Lemma 8.2. To this end, we verify the hypothesis of Lemma 8.2 for this special case. Since {K(αj ) : j ∈ J} constitutes a Riesz basis for H, there exist positive constants A and B such that ⎛
Af H ≤ ⎝
⎞1/2 |f, K(αj )H |2 ⎠
≤ Bf H , for all f ∈ H.
(8.4)
j∈J
By the first inequality in (8.4) and the definition of Am , we have that ⎞1/2 ⎛ ) * 2 1 ⎝ j ), K(αk ) ⎠ . |Lαj (f )|2 K(α Am (IΛ m (f ))H ≤ A H k∈J j∈Jm
j ) : j ∈ J} and {K(αj ) : j ∈ J} in the Using the biorthogonal condition of the sequence pair {K(α right hand side of the inequality above yields that for all f ∈ H, all m ∈ N and all finite subset m, Λ ⎞1/2 ⎛ 1 ⎝ 1 |Lαj (f )|2 ⎠ = IΛ m (f )Cm . Am (IΛ m (f ))H ≤ A A j∈Jm
Employing the reproducing property of K in the second inequality in (8.4), we get that IΛ (f )l2 (J) ≤ Bf H , for all f ∈ H, yielding the continuity of operator IΛ . By Lemma 8.2, we conclude the desired result. We next consider the regularization network, discussed in section 7, which infers an approximation of a target element by solving the optimization problem ⎧ ⎫ ⎨ ⎬ inf (8.5) |Lαj (g) − yj |2 + λg2H . ⎭ g∈H ⎩ j∈Nm
48
Suppose that H is an FRKHS with respect to a family F of linear functionals {Lα : α ∈ Λ} on H := {αj : j ∈ J} be a countable subset of and K is the functional reproducing kernel for H. Let Λ m := {αj : j ∈ Jm }, the sampling operator I : H → Cm is defined Λ. For each finite subset Λ Λm by (8.1). For each f ∈ H, we let fΛ m be the unique minimizer of (8.5) with [yj : j ∈ Jm ] := IΛ m (f ). Accordingly, the reconstruction operator Am : Cm → H is defined by Am (IΛ m (f )) := fΛ m . The next result concerns the stability of the corresponding numerical reconstruction algorithm. To this end, we introduce another operator by considering the following regularization problem ⎧ ⎫ ⎨ ⎬ inf (8.6) |Lαj (g) − yj |2 + λg2H . ⎭ g∈H ⎩ j∈J
It follows from the theory of the Tikhonov regularization [33, Page 243] that if the operator IΛ : H → l2 (J) is continuous on H, then for any y := {yj : j ∈ J} ∈ l2 (J) there exists a unique minimizer g0 of the optimization problem (8.6). Furthermore, the minimizer g0 is given by the unique solution of the equation λg0 + IΛ∗ IΛ (g0 ) = IΛ∗ (y), where I ∗ denotes the adjoint operator of IΛ . We define the operator AΛ : l2 (J) → H by Λ
AΛ := (λI + IΛ∗ IΛ )−1 IΛ∗ . Here, we denote by I the identity operator on H. Proposition 8.4. Suppose that H is an FRKHS with respect to the family F := {Lα : α ∈ Λ} of := {αj : j ∈ J} of Λ is chosen to satisfy the linear functionals on H. If the countable subset Λ condition that the operator IΛ is continuous on H, then the corresponding numerical reconstruction algorithm Am ◦ IΛ m defined as above is stable. Proof. Again, we prove this result by employing Lemma 8.2. Notice that for each g ∈ H λg + IΛ∗ IΛ (g), gH = λg2H + IΛ (g)2H ≥ λg2H . By the Lax-Milgram theorem [61, Page 92], the estimate above ensures that the inverse operator (λI + I ∗ IΛ )−1 is continuous on H. According to the definition of AΛ , there exists a positive Λ constant c such that for all f ∈ H, AΛ (IΛ (f ))H ≤ c IΛ (f )l2 (J) .
(8.7)
m and each f ∈ H, we choose f˜ ∈ H to satisfy that For each finite subset Λ Lαj (f˜) = Lαj (f ), for j ∈ Jm and Lαj (f˜) = 0, for j ∈ / Jm .
(8.8)
By the definition of Am and AΛ , we get that Am (IΛ m (f )) = AΛ (IΛ (f˜)). m, This combined with (8.7) leads to that for all f ∈ H, for all m ∈ N and for all finite subset Λ Am (IΛ m (f ))H = AΛ (IΛ (f˜))H ≤ c IΛ (f˜)l2 (J) = c IΛ m (f )Cm , where the validity of the last equality is due to the relation (8.8). Hence, the stability of the corresponding numerical reconstruction algorithm follows directly from Lemma 8.2. 49
To close this section we present a specific example to illustrate Proposition 8.4. We first describe the FRKHS in which we consider the regularization network. Choose a family of nonnegative functions ux , x ∈ R which satisfy (6.20) with 0 < δ < 14 and the density condition 2 span {u∨ x χ[−π,π] : x ∈ R} = L ([−π, π]).
We then consider the family F := {Lx : x ∈ R} of the local average functionals Lx (f ) := f (t)ux (t)dt, x ∈ R. R
As pointed out in subsection 6.2, the Paley-Wiener space Bπ := {f ∈ L2 (R) : suppfˆ ⊆ [−π, π]} is an FRKHS with respect to F and the functional reproducing kernel K is given by ∧ K(x)(y) := [u∨ x χ[−π,π] ] (y), x, y ∈ R.
:= Z. For each finite subset Jm of m elements in Z, we set Λ m := {j : j ∈ Jm } and define Let Λ m the sampling operator IΛ m : H → C by IΛ m (f ) := [Lj (f ) : j ∈ Jm ], for f ∈ H. We define the reconstruction operator Am : Cm → H by letting Am (IΛ m (f )) denote the unique minimizer of ⎧ ⎫ ⎨ ⎬ inf |Lj (g) − Lj (f )|2 + λg2H . ⎭ g∈H ⎩ j∈Jm
We now show that Proposition 8.4 ensures the stability of the numerical reconstruction algorithm Am ◦IΛ m . To this end, we verify that the hypothesis of Proposition 8.4 is satisfied. Specifically, we is continuous on H. Example 6.9 prove that the operator IΛ , defined by IΛ (f ) := [Lj (f ) : j ∈ Λ], guarantees that {K(j) : j ∈ Z} constitutes a frame for Bπ . Hence, there exist a positive constant c such that ⎛ ⎞1/2 ⎝ |f, K(j)Bπ |2 ⎠ ≤ cf Bπ , for all f ∈ Bπ , j∈Z
which implies that the operator IΛ is continuous on Bπ . According to Proposition 8.4, we conclude that the numerical reconstruction algorithm Am ◦ IΛ m is stable.
9
Conclusion
Non-point-evaluation functional data are used frequently in data analysis. To efficiently analyze such data, we have introduced the notion of the FRKHS and established the bijective correspondence between an FRKHS and its functional reproducing kernel. Particular attention has been paid to the perfect FRKHS, which admits two families of continuous linear functionals with one being the family of point-evaluation functionals and the other being a particular family of Non-point-evaluation functionals. The characterization and specific examples of perfect FRKHSs with respect to the integral functionals are provided. To demonstrate the usefulness of FRKHSs, we have discussed sampling theorems and regularized learning schemes from non-point-evaluation functional data in the FRKHSs setting. We have considered stability of numerical reconstruction 50
algorithms in FRKHSs. This work provides an appropriate mathematical foundation for kernel methods when non-point-evaluation functional data are used. Acknowledgment: This research is supported in part by the Special Project on Highperformance Computing under the National Key R&D Program (No. 2016YFB0200602), by the United States National Science Foundation under grant DMS-1522332 and by the Natural Science Foundation of China under grants 11301208 and 11471013.
References [1] A. Aldroubi, Non-uniform weighted average sampling and reconstruction in shift-invariant and wavelet spaces, Appl. Comput. Harmon. Anal. 13 (2002), 151–161. [2] A. Aldroubi and K. Gr¨ ochenig, Nonuniform sampling and reconstruction in shift-invariant spaces, SIAM Rev. 43 (2001), 585–620. [3] A. Aldroubi, Q. Sun and W. S. Tang, Nonuniform average sampling and reconstruction in multiply generated shift-invariant spaces, Constr. Approx. 20 (2004), 173–189. [4] A. Argyriou, C. A. Micchelli and M. Pontil, When is there a representer theorem? Vector versus matrix regularizers, J. Mach. Learn. Res. 10 (2009), 2507–2529. [5] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc. 68 (1950), 337–404. [6] S. A. Avdonin and S. A. Ivanov, Families of Exponentials: The Method of Moments in Controllability Problems for Distributed Parameter Systems, Cambridge University Press, Cambridge, 1995. [7] S. A. Avdonin and S. A. Ivanov, Sampling and Interpolation problems for vector valued signals in the Paley-Wiener spaces, IEEE Trans. Signal Process. 56 (2008), 5435–5441. [8] J. J. Benedetto and H. C. Wu, Non-uniform sampling and spiral MRI reconstruction, Proc. SPIE 4119 (2000), 130–141. [9] S. Bochner, Lectures on Fourier Integrals. With an author’s supplement on monotonic functions, Stieltjes integrals, and harmonic analysis, Annals of Mathematics Studies, No. 42, Princeton University Press, New Jersey, 1959. [10] T. T. Cai and P. Hall, Prediction in functional linear regression, Ann. Statist. 34 (2006), 2159–2179. [11] A. Caponnetto, C. A. Micchelli, M. Pontil and Y. Ying, Universal multi-task kernels, J. Mach. Learn. Res. 9 (2008), 1615–1646. [12] Z. Chen, C. A. Micchelli and Y. Xu, Multiscale Methods for Fredholm Integral Equations, Cambridge University Press, Cambridge, 2015. [13] O. Christensen, An Introduction to Frames and Riesz Bases, Birkh¨ auser, Boston, 2002. [14] A. Corrias, G. Navarra, S. Seatzu and F. I. Utreras, Two-phase regularisation method in Hilbert spaces: description and application to EXAFS problems, Inverse Problems 4 (1988), 449–470. [15] D. D. Cox and F. O’Sullivan, Asymptotic analysis of penalized likelihood and related estimators, Ann. Statist. 18 (1990), 1676–1695. 51
[16] F. Cucker and S. Smale, On the mathematical foundations of learning, Bull. Amer. Math. Soc. 39 (2002), 1–49. [17] I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics 61, SIAM, Philadelphia, 1992. [18] C. de Boor, R. A. DeVore and A. Ron, The structure of finitely generated shift-invariant spaces in L2 (Rd ), J. Funct. Anal. 119 (1994), 37–78. [19] J. Diestel and J. J. Uhl, Jr., Vector Measures, American Mathematical Society, Providence, 1977. [20] R. J. Duffin and A. C. Schaeffer, A class of nonharmonic Fourier series, Trans. Amer. Math. Soc. 72 (1952), 341–366. [21] T. Evgeniou, C. A. Micchelli and M. Pontil, Learning multiple tasks with kernel methods, J. Mach. Learn. Res. 6 (2005), 615–637. [22] J. Fan, M. Yuan and D. X. Zhou, Learning with functional data, preprint. [23] P. A. Fillmore, Notes on Operator Theory, Van Nostrand Company, New York, 1970. [24] W. Gao and Z. Wu, Quasi-interpolation for linear functionals data, J. Comput. Appl. Math. 236 (2012), 3256–3264. [25] A. Gelb and G. Song, A frame theoretic approach to the nonuniform fast Fourier transform, SIAM J. Numer. Anal. 52 (2014), 1222–1242. [26] T. N. T. Goodman, S. L. Lee and W. S. Tang, Wavelets in wandering subspaces, Trans. Amer. Math. Soc. 338 (1993), 639–654. [27] K. Gr¨ochenig, Reconstruction algorithms in irregular sampling, Math. Comp. 59 (1992), 181– 194. [28] E. Hammerich, Sampling in shift-invariant spaces with Gaussian generator, Sampl. Theory Signal Image Process. 6 (2007), 71–86. [29] E. Hammerich, A generalized sampling theorem for frequency localized signals, Sampl. Theory Signal Image Process. 8 (2009), 127–146. [30] Y. M. Hong, J. M. Kim and K. H. Kwon, Sampling theory in abstract reproducing kernel Hilbert space, Sampl. Theory Signal Image Process. 6 (2007), 109–121. [31] R. Q. Jia, Stability of the shifts of a finite number of functions, J. Approx. Theory 95 (1998), 194–202. [32] G. Kimeldorf and G. Wahba, Some results on Tchebycheffian spline functions, J. Math. Anal. Appl. 33 (1971), 82–95. [33] R. Kress, Linear Integral Equations, Springer, New York, 1989. [34] S. Mallat, A Wavelet Tour of Signal Processing, Second edition, Academic Press, New York, 1999. [35] C. V. M. van der Mee, M. Z. Nashed and S. Seatzu, Sampling expansions and interpolation in unitarily translation invariant reproducing kernel Hilbert spaces, Adv. Comput. Math. 19 (2003), 355–372. 52
[36] J. Mercer, Functions of positive and negative type and their connection with the theory of integral equations, Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 209 (1909), 415–446. [37] C. A. Micchelli, Using the refinement equation for the construction of pre-wavelets, Numer. Algorithms 1 (1991), 75–116. [38] C. A. Micchelli and M. Pontil, A function representation for learning in Banach spaces, Learning Theory, 255–269, Lecture Notes in Computer Science 3120, Springer, Berlin, 2004. [39] C. A. Micchelli and M. Pontil, On learning vector-valued functions, Neural Comput. 17 (2005), 177–204. [40] C. A. Micchelli and Y. Xu, Using the matrix refinement equation for the construction of wavelets on invariant sets, Appl. Comput. Harmon. Anal. 1 (1994), 391–401. [41] C. A. Micchelli, Y. Xu and H. Zhang, Universal kernels, J. Mach. Learn. Res. 7 (2006), 2651–2667. [42] M. Z. Nashed and G. G. Walter, General sampling theorems for functions in reproducing kernel Hilbert spaces, Math. Control Signals Systems 4 (1991), 363–390. [43] E. Novak and H. Wo´zniakowski, Tractability of Multivariate Problems, Volume I: Linear Information, European Mathematical Society, Z¨ urich, 2008. [44] G. B. Pedrick, Theory of reproducing kernels for Hilbert spaces of vector valued functions, Technical Report 19, University of Kansas, 1957. [45] J. G. Pipe and P. Menon, Sampling density compensation in MRI: Rationale and an iterative numerical solution, Magn. Reson. Med., 41 (1999), 179–186. [46] T. Poggio, S. Mukherjee, R. Rifkin, A. Raklin and A. Verri. B, Uncertainty in geometric computations, J. Winkler and M. Niranjan, editors, Kluwer Academic Publishers, 22 (2002), 131–141. [47] T. Poggio and S. Smale, The mathematics of learning: dealing with data, Notices Amer. Math. Soc. 50 (2003), 537–544. [48] J. O. Ramsay and B. W. Silverman, Functional Data Analysis, Second edition, Springer, New York, 2005. [49] B. Sch¨olkopf, R. Herbrich and A. J. Smola, A generalized representer theorem, Proceeding of the Fourteenth Annual Conference on Computational Learning Theory and the Fifth European Conference on Computational Learning Theory, pp. 416–426. Springer-Verlag, London, UK, 2001. [50] B. Sch¨olkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, Mass, 2002. [51] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, 2004. [52] G. Song and A. Gelb, Approximating the inverse frame operator from localized frames, Appl. Comput. Harmon. Anal. 35 (2013), 94–110.
53
[53] G. Song, H. Zhang and F. J. Hickernell, Reproducing kernel Banach spaces with the l1 norm, Appl. Comput. Harmon. Anal. 34 (2013), 96–116. [54] I. Steinwart, On the influence of the kernel on the consistency of support vector machines, J. Mach. Learn. Res. 2 (2002), 67–93. [55] Q. Sun, Nonuniform average sampling and reconstruction of signals with finite rate of innovation, SIAM J. Math. Anal. 38 (2006/07), 1389–1422. [56] W. Sun and X. Zhou, Reconstruction of band-limited functions from local averages, Constr. Approx. 18 (2002), 205–222. [57] J. F. Traub, G. W. Wasilkowski and H. Wo´zniakowski, Information-Based Complexity, Academic Press, New York, 1988. [58] V. N. Vapnik, The Nature of Statistical Learning Theory, Second edition, Springer, New York, 1999. [59] Y. Xu and Q. Ye, Generalized Mercer kernels and reproducing kernel banach spaces, Memoirs of the American Mathematical Society, accepted for publication, 2016. [60] F. Yao, H. G. M¨ uller and J. L. Wang, Functional linear regression analysis for longitudinal data, Ann. Statist. 33 (2005), 2873–2903. [61] K. Yosida, Functional Analysis, Sixth edition, Springer, Berlin, 1980. [62] R. M. Young, An Introduction to Nonharmonic Fourier series, Academic Press, New York, 1980. [63] M. Yuan and T. T. Cai, A reproducing kernel Hilbert space approach to functional linear regression, Ann. Statist. 38 (2010), 3412–3444. [64] H. Zhang, Y. Xu and J. Zhang, Reproducing kernel Banach spaces for machine learning, J. Mach. Learn. Res. 10 (2009), 2741–2775. [65] H. Zhang and J. Zhang, Regularized learning in Banach spaces as an optimization problem: representer theorems, J. Global Optim. 54 (2012), 235–250. [66] H. Zhang and J. Zhang, Vector-valued reproducing kernel Banach spaces with applications to multi-task learning, J. Complexity 29 (2013), 195–215.
54