On the geometry of graph spaces
Brijnesh J. Jain, Technische Universität Berlin, Germany
Article history: Received 29 May 2015; Received in revised form 22 June 2016; Accepted 24 June 2016.
Keywords: Graphs; Graph metric; Orbit space; Pattern recognition
Abstract. Optimal alignment kernels are graph similarity functions defined as pointwise maximizers of a set of positive-definite kernels. Due to the max-operation, optimal alignment kernels are indefinite graph kernels. This contribution studies how the max-operation transforms the geometry of the associated feature space and how standard pattern recognition methods such as linear classifiers can be extended to those transformed spaces. The main result is the Graph Representation Theorem, stating that a graph is a point in some geometric space, called orbit space. This result shows that the max-operation transforms the feature space to a quotient by a group action. Orbit spaces are well investigated and easier to explore than the original graph space. We derive a number of geometric results, translate them to graph spaces, and show how the proposed results can be applied to statistical pattern recognition.
1. Introduction

In different fields of structural pattern recognition, such as computer vision, chemo- and bioinformatics, objects are naturally represented by attributed graphs [6,12,16,40]. One persistent problem is the gap between structural and statistical methods in pattern recognition [5,8]. This gap refers to a shortage of powerful mathematical methods that combine the advantages of structural representations with the advantages of statistical methods defined on Euclidean spaces. One reason for this gap is an insufficient understanding of the geometric structure of graph spaces.

The geometry of any space depends on the choice of the underlying distance function. For example, spaces endowed with an intrinsic metric have a richer geometric structure than spaces endowed with an arbitrary distance. The prime example is the Euclidean space, which has a fundamental status in mathematics. The abundant geometric structure of Euclidean spaces is introduced by an inner product that in turn induces the Euclidean metric.

Like inner products, optimal alignment kernels have the potential to introduce a less abundant but still well-developed geometric structure into graph spaces. The potential arises from the fact that the definition of optimal alignment kernels implicitly involves inner products. An optimal alignment kernel is a graph similarity function defined as a pointwise maximizer of a set of positive-definite kernels, where the maximum is taken over all possible one-to-one correspondences (alignments) between the nodes of both graphs. Since positive-definite kernels are not closed under the max-operation [1,44], optimal alignment kernels are indefinite graph kernels. Consequently, the geometric structure introduced by optimal alignment kernels is limited. Although it is well known that graph alignment kernels are indefinite, it is unclear how the max-operation transforms the geometry of the feature space associated with the underlying positive-definite kernels. In addition, it is also unclear how pattern recognition methods such as linear classifiers can be extended to those transformed feature spaces.
In this contribution, we study the geometry of graph spaces endowed with an optimal alignment kernel. The main result is the Graph Representation Theorem, stating that a graph is a point in some geometric space, called orbit space. This result shows that the max-operation transforms the associated feature space to a quotient of a Euclidean space by a group action. Orbit spaces are well investigated, provide access to a plethora of existing results, and are much easier to explore than graph spaces. We derive several geometric concepts and results, translate them to graph spaces, and show how the proposed results can be applied to generalize linear classifiers and other statistical pattern recognition methods to graph spaces. The proposed orbit space framework provides a mathematical foundation for narrowing the gap between structural and statistical pattern recognition and places existing learning methods in graph spaces on a theoretically sound basis.

The rest of this paper is structured as follows: Section 2 discusses related work. Section 3 introduces graph alignment spaces. In Sections 4 and 5, we present the Graph Representation Theorem and derive general geometric results. Section 6 places the results into the context of statistical pattern recognition. Finally, Section 7 concludes with a summary of the main results.

2. Related work

Graph alignment kernels and their induced distances form a common and widely applied class of graph (dis-)similarity functions [46]. Though graph alignment kernels are indefinite graph kernels, they have been applied to support vector learning by Geibel et al. [18,28] and shortly after by Fröhlich et al. [14]. The problem of computing such graph (dis-)similarities is referred to as the graph matching problem [6,12,40,46]. The graph matching problem can be formulated as a quadratic assignment problem [41], which is known to be NP-complete [17]. Consequently, the majority of work on graph alignment distances is devoted to devising efficient algorithms and heuristics for solving the underlying graph matching problem [46].

In contrast to the amount of work on practical and computational aspects of the graph matching problem, there is only little work on theoretical properties of graph spaces [23,31]. Hurshman and Janssen showed that graph spaces endowed with distances based on the maximum common subgraph induce a discrete topology on the set of isomorphism classes of graphs [23]. They then introduced and studied the concept of uniform continuity of graph parameters. Graph spaces endowed with an optimal alignment kernel were first investigated in [31], where the focus was on analytical properties for the differential machinery. A similar approach to that of [31] was followed by Feragen et al. [9,11,10] for tree shapes. Tree-shape spaces are also orbit spaces, but additionally include continuous transitions in the tree topology. This additional requirement leads to a substantially different geometric structure. Feragen et al. studied the geodesic structure of tree-shape spaces; as they correctly hypothesized, this structure transfers to graph spaces (see Theorem 4.7). More generally, this contribution can be placed into the context of statistical analysis of non-Euclidean spaces in the spirit of [42,45] for complex objects, for tree-structured data [10,45], and for shapes [7,37].
Examples of statistical methods adapted to graphs are the sample mean [19,29,30,36], central clustering algorithms [20,21,35,29], learning vector quantization [32,34], and generalized linear classifiers [25,24].

3. Graph alignment spaces

3.1. Attributed graphs

Let A be the set of node and edge attributes. We assume that A contains a distinguished symbol ν denoting the null-attribute.

Definition 3.1. An attributed graph is a triple X = (V, E, α) consisting of a finite set V ≠ ∅ of nodes, a set E ⊆ V × V of edges, and an attribute function α : V × V → A such that α(i, j) = ν if and only if (i, j) ∉ E for all distinct nodes i, j ∈ V.

The attribute function α assigns an attribute to each pair of nodes. Node attributes may take any value from A, edges have non-null attributes, and non-edges are labeled with the null-attribute ν. The node set of a graph X is occasionally referred to as VX, its edge set as EX, and its attribute function as αX. By GA we denote the set of attributed graphs with attributes from A. The order |X| of graph X is its number |VX| of nodes. Graphs can be directed or undirected. Attributes can take any values, including binary values, discrete symbols, numerical values, vectors, strings, and combinations thereof. Thus, the definition of an attributed graph is sufficiently general to cover a wide class of graphs, such as binary graphs from graph theory, weighted graphs, molecular graphs, protein structures, and many others (a code sketch of this definition is given below).

3.2. Optimal alignment kernels

The inner product induces a rich geometric structure on Euclidean spaces. In a similar way, we want to induce a geometric structure on graph spaces by adapting the inner product to graphs. The challenge is that, in general, neither the graph space GA nor the attribute set A is a vector space. Optimal alignment kernels provide a solution to this problem.
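Before turning to kernels, the following minimal Python sketch illustrates Definition 3.1 for real-valued attributes (A = R with null-attribute 0). The class name and layout are illustrative choices, not part of the paper's formalism.

```python
import numpy as np

# Minimal sketch of Definition 3.1 with real-valued attributes (A = R, null-attribute 0).
class AttributedGraph:
    def __init__(self, n):
        self.attr = np.zeros((n, n))   # attr[i, j] = alpha(i, j); 0 encodes the null-attribute

    def set_node(self, i, value):
        self.attr[i, i] = value        # node attributes sit on the diagonal

    def set_edge(self, i, j, value):
        if value == 0:
            raise ValueError("edges must carry a non-null attribute")
        self.attr[i, j] = value        # non-edges keep the null-attribute 0

# Example: two nodes with attribute 1.0 joined by an edge with attribute 0.5.
X = AttributedGraph(2)
X.set_node(0, 1.0)
X.set_node(1, 1.0)
X.set_edge(0, 1, 0.5)
```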
An optimal alignment kernel is not an inner product, but it induces an intrinsic metric on graphs in a similar manner as an inner product induces the Euclidean metric. To define optimal alignment kernels, we need two concepts: (i) the concept of alignment, and (ii) the concept of attribute kernel. We first introduce the concept of alignment. Recall that a partial map φ : U → V between two sets U and V is a map with domain dom(φ) ⊆ U.

Definition 3.2 (Alignment). Let X, Y ∈ GA be two graphs. An alignment between X and Y is an injective partial map φ : VX → VY such that

|dom(φ)| = min {|X|, |Y|}.    (1)
By A(X, Y) we denote the set of all alignments between X and Y.

Observe that the domain dom(φ) of an alignment φ is a proper subset of VX whenever |Y| < |X|. Eq. (1) demands that there is no injective partial map with a larger domain. Thus, an alignment between two graphs X and Y is a maximal one-to-one correspondence between the nodes of both graphs.

We continue by introducing the concept of an attribute kernel. An attribute kernel is a special node and edge similarity function.

Definition 3.3 (Attribute Kernel). Let A be the set of attributes with null-attribute ν ∈ A. An attribute kernel is a positive-definite kernel k : A × A → R with k(x, ν) = 0 for all x ∈ A.

An attribute kernel k has an associated feature map Φ : A → H into a Hilbert space H such that k(x, y) = Φ(x)ᵀΦ(y) for all x, y ∈ A. The feature map Φ sends the null-attribute ν ∈ A to the zero vector Φ(ν) = 0 in H. We show three standard examples of attribute kernels often used in applications.

Example 3.4.
1. Let A = Rᵖ. Then the standard inner product k₁(x, y) = xᵀy is a positive-definite kernel with feature map Φ = id.
2. Let A = {z₁, . . . , z_q} be a finite discrete set. Then k₂(z_i, z_j) = 1 if i = j and k₂(z_i, z_j) = 0 if i ≠ j is a positive-definite kernel with feature map Φ(z_i) = e_i, where e_i ∈ R^q is the ith standard basis vector.
3. Suppose that A = R^d × {z₁, . . . , z_q}. Then k((x, z_i), (y, z_j)) = k₁(x, y) + k₂(z_i, z_j) is a positive-definite kernel with feature map Φ(y, z_i) = (y, e_i), where e_i ∈ R^q is the ith standard basis vector.

An alignment φ between X and Y together with an attribute kernel k gives rise to the similarity score

κ_φ(X, Y) = Σ_{i,j ∈ dom(φ)} k(α_X(i, j), α_Y(φ(i), φ(j))).
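The similarity score κ_φ can be computed directly from the attribute matrices of both graphs. The following sketch is one possible implementation; graphs are assumed to be given as numpy attribute matrices (as in the earlier sketch), and the names alignment_score and product_kernel are illustrative.

```python
# Sketch of the similarity score kappa_phi for a fixed alignment phi.
def alignment_score(attr_X, attr_Y, phi, k):
    """phi is a dict encoding an injective partial map; its keys form dom(phi)."""
    dom = list(phi.keys())
    return sum(k(attr_X[i, j], attr_Y[phi[i], phi[j]]) for i in dom for j in dom)

# Example attribute kernel on A = R: the product kernel k(x, y) = x * y.
def product_kernel(x, y):
    return x * y
```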
In the definition of κ_φ, terms indexed by i, j ∈ dom(φ) refer to edge similarities if i ≠ j and to node similarities if i = j. The score κ_φ(X, Y) measures how well φ aligns the structure of graph X to the structure of graph Y. We may regard κ_φ as a positive-definite kernel defined on a subset of GA for which φ can be extended to a feasible alignment. Then we define an optimal alignment kernel as a maximum of a set of positive-definite kernels over all possible alignments.

Definition 3.5. Let k be an attribute kernel on the attribute set A. The optimal alignment kernel induced by k is a function κ : GA × GA → R of the form

κ(X, Y) = max_{φ ∈ A(X,Y)} κ_φ(X, Y).
An optimal alignment kernel is a structural similarity function on graphs that measures how well the structure of graph X can be aligned to the structure of graph Y . Since positive-definiteness is not closed under finite maximum operations, an optimal alignment kernel is not positive-definite [1,44], but an optimal alignment kernel κ induces an intrinsic metric on graphs in a similar way as an inner product induces the Euclidean metric. Definition 3.6. Let κ be an optimal alignment kernel on GA . The graph alignment distance induced by κ is defined as
δ : GA × GA → R,    (X, Y) ↦ √(κ(X, X) − 2κ(X, Y) + κ(Y, Y)).
The pair (GA , δ) is called graph alignment space. In the next section, we show that a graph alignment distance is indeed an intrinsic metric.
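The following brute-force Python sketch illustrates the optimal alignment kernel and the induced graph alignment distance, reusing alignment_score from the previous sketch. For simplicity both graphs are assumed to have the same order n, so every alignment is a permutation of {0, ..., n−1}; the enumeration is exponential and only meant to illustrate the definitions, not to be a practical graph matching algorithm.

```python
from itertools import permutations
from math import sqrt

def optimal_alignment_kernel(attr_X, attr_Y, k):
    # kappa(X, Y) = max over all alignments phi of kappa_phi(X, Y)
    n = len(attr_X)
    return max(
        alignment_score(attr_X, attr_Y, {i: p[i] for i in range(n)}, k)
        for p in permutations(range(n))
    )

def alignment_distance(attr_X, attr_Y, k):
    # delta(X, Y) = sqrt(kappa(X, X) - 2 kappa(X, Y) + kappa(Y, Y))
    return sqrt(
        optimal_alignment_kernel(attr_X, attr_X, k)
        - 2.0 * optimal_alignment_kernel(attr_X, attr_Y, k)
        + optimal_alignment_kernel(attr_Y, attr_Y, k)
    )
```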
4. Geometry of Euclidean graph spaces

This section introduces Euclidean graph spaces and studies their geometry. The main result is the Graph Representation Theorem, stating that Euclidean graph spaces are isometric to orbit spaces. This result forms the basis for gaining deep insight into the geometry of graph spaces.

4.1. Euclidean graph spaces

Euclidean graph spaces are a special class of graph alignment spaces. Studying Euclidean graph spaces is motivated by two factors: first, to lay the foundations for understanding the properties of arbitrary graph alignment spaces; and second, to form the basis for generalizing standard pattern recognition methods to graph spaces. Arbitrary graph alignment spaces are studied in Section 5, and the role of Euclidean graph spaces in statistical pattern recognition is discussed in Section 6.

Definition 4.1. A Euclidean graph space (GH, δ) of order n is a graph alignment space with the following properties:
1. |X| = n for all X ∈ GH.
2. H = R^d.
3. The distance δ is induced by the standard inner product on H.

The first property states that all Euclidean graphs have identical order n, that is, the same number of nodes. The second property states that the attribute set H is a finite-dimensional Euclidean space. The third property assumes that the attribute kernel k is the standard inner product on H. Then the kernel k induces an optimal alignment kernel κ, which in turn induces the graph alignment distance δ. From Definition 4.1 together with the definition of attribute kernels, it follows that ν = 0 is the null-attribute of H, because there is no other element z ∈ H satisfying k(x, z) = 0 for all x ∈ H. Throughout Section 4, we use the following notation without further mention:

Notation 4.2. By (GH, δ) we denote a Euclidean graph space of order n. The attribute set is of the form H = R^d. The graph alignment distance δ is induced by the optimal alignment kernel κ, which in turn is induced by the attribute kernel k : H × H → R. The attribute kernel k is the standard inner product on H.

4.2. The graph representation theorem

To state the Graph Representation Theorem, we need to introduce orbit spaces. Let (X, ∥·∥) be a Euclidean space and let Γ be a group with neutral element ε. An action of the group Γ on X is a map

φ : Γ × X → X,    (γ, x) ↦ γx

satisfying
1. (γ ∘ γ′)x = γ(γ′x)
2. εx = x
for all γ, γ′ ∈ Γ and for all x ∈ X. The group Γ acts isometrically on X if ∥x − y∥ = ∥γx − γy∥ for all x, y ∈ X and for all γ ∈ Γ. The orbit of x ∈ X under the action of Γ is the subset of X defined by [x] = {γx : γ ∈ Γ}. We write x′ ∈ [x] to denote that x′ is an element of the orbit [x].

Definition 4.3. The orbit space of a finite group Γ acting isometrically on a Euclidean space X is the quotient set
X/Γ = {[x] : x ∈ X} of all orbits. The natural projection map is defined by

π : X → X/Γ,    x ↦ [x].

In general, orbit spaces are defined for any group acting on some set. Here, we always assume that an orbit space complies with Definition 4.3.

Theorem 4.4 (Graph Representation Theorem). A Euclidean graph space (GH, δ) of finite order is isometric to an orbit space X/Γ.

Proof. We present a constructive proof.
1. Suppose that GH is of order n. Let X = H^{n×n} be the set of all (n × n)-matrices with elements from H. An attributed graph X = (V, E, α) can be represented by a matrix x = (x_{ij}) from X with elements x_{ij} = α(i, j) ∈ H for all i, j ∈ V.
2. The form of a matrix x representing graph X is generally not unique and depends on how the nodes are arranged in the diagonal of x. Different orderings of the nodes may result in different matrix representations. The set of all possible reorderings of the nodes of X is (isomorphic to) the symmetric group S_n, which in turn is isomorphic to the group Γ of all simultaneous row and column permutations of a matrix from X. Thus, we have a group action

Γ × X → X,    (γ, x) ↦ γx,

where γx denotes the matrix obtained by simultaneously permuting the rows and columns according to γ. Since Γ acts isometrically on X, we have an orbit space X/Γ with natural projection π : X → X/Γ according to Definition 4.3.
3. The quotient topology on X/Γ is the finest topology for which the projection map π is continuous. Thus, the open sets U of the quotient topology on X/Γ are those sets for which π⁻¹(U) is open in X.
4. We can endow X/Γ with the quotient distance defined by

d([x], [y]) = inf Σ_{i=1}^{k} ∥x_i − y_i∥,

where the infimum is taken over all finite sequences (x_1, . . . , x_k) and (y_1, . . . , y_k) with [x_1] = [x], [y_k] = [y], and [y_i] = [x_{i+1}] for all i ∈ {1, . . . , k − 1} (see [4], Example 5.21(6)). The distance d([x], [y]) is a pseudo-metric ([4], Lemma 5.20). Since Γ is finite and acts by isometries, the pseudo-metric is a metric of the form (see [4, Prop. 8.5(2)])

d([x], [y]) = min {∥x − y∥ : x ∈ [x], y ∈ [y]}
            = min {∥x − y∥ : x ∈ [x]}
            = min {∥x − y∥ : y ∈ [y]},

where y in the second row and x in the third row are arbitrarily chosen representations.
5. The quotient metric d([x], [y]) on X/Γ defined in Part 4 induces a topology that coincides with the quotient topology defined in Part 3.
6. The map
ω : GH → X/Γ,    X ↦ [x]

sends each graph X to the orbit [x] consisting of all representations of X. The map ω is surjective, because any matrix x ∈ X represents a valid graph X. The problem of dangling edges is solved by allowing nodes with null-attribute. Since the projection π : X → X/Γ is surjective, any orbit [x] is the image of some graph X under the map ω. The map ω is also injective: suppose that X and Y are non-isomorphic graphs with respective representations x and y. If x and y are in the same orbit, then there is an isomorphism between X and Y, which contradicts our assumption. Thus, ω is a bijection.
7. Let V_φ = dom(φ) denote the domain of an alignment φ ∈ A(X, Y) between graphs X and Y. Suppose that ω(X) = [x] and ω(Y) = [y]. We have
κ(X, Y) = max_{φ ∈ A(X,Y)} Σ_{i,j ∈ V_φ} α_X(i, j)ᵀ α_Y(φ(i), φ(j))
        = max_{φ ∈ A(X,Y)} Σ_{i,j ∈ V_φ} x_{ij}ᵀ y_{φ(i)φ(j)}
        = max_{γ ∈ Γ} xᵀ(γy).
From Proposition 4.11 it follows that κ(X, X) = xᵀx and κ(Y, Y) = yᵀy. From the definition of the graph alignment distance, it follows that

δ²(X, Y) = κ(X, X) − 2κ(X, Y) + κ(Y, Y)
         = xᵀx − 2 max_{γ ∈ Γ} xᵀ(γy) + yᵀy
         = min_{γ ∈ Γ} ∥x − γy∥²
         = d²([x], [y]).

This shows that the graph alignment distance coincides with the quotient metric.
8. The implication of Part 7 is twofold: (1) the distance δ(X, Y) induced by κ is a metric, and (2) the map ω : GH → X/Γ defined in Part 6 is an isometry. This completes the proof.
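Part 7 of the proof can be made concrete by a short sketch. It assumes real-valued attributes (H = R), so that a graph of order n is represented by an n × n numpy array, and realizes the group action as simultaneous row-and-column permutation; the function names are illustrative.

```python
import numpy as np
from itertools import permutations

def act(p, y):
    # group action: simultaneous row and column permutation of the representation y
    idx = list(p)
    return y[np.ix_(idx, idx)]

def quotient_metric(x, y):
    # delta(X, Y) = min over gamma of ||x - gamma(y)|| (Part 7 of the proof)
    n = x.shape[0]
    return min(np.linalg.norm(x - act(p, y)) for p in permutations(range(n)))
```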
Corollary 4.5 summarizes some useful results obtained in proving the Graph Representation Theorem. The results have also been proved in [31].

Corollary 4.5. Suppose that (GH, δ) is isometric to the orbit space X/Γ. Then the following properties hold:
1. The natural projection π : X → X/Γ is continuous.
2. The quotient distance d on X/Γ is a metric.
3. The map ω : GH → X/Γ, X ↦ [x] is a bijective isometry between the metric spaces (GH, δ) and (X/Γ, d).
4. The graph alignment distance δ is a metric on GH satisfying
   δ(X, Y) = min {∥x − y∥ : x ∈ ω(X), y ∈ ω(Y)}
           = min {∥x − y∥ : x ∈ ω(X)} for all y ∈ ω(Y)
           = min {∥x − y∥ : y ∈ ω(Y)} for all x ∈ ω(X)
   for all X, Y ∈ GH.
5. The optimal alignment kernel κ of GH satisfies
   κ(X, Y) = max {xᵀy : x ∈ ω(X), y ∈ ω(Y)}
           = max {xᵀy : x ∈ ω(X)} for all y ∈ ω(Y)
           = max {xᵀy : y ∈ ω(Y)} for all x ∈ ω(X)
   for all X, Y ∈ GH.

The Graph Representation Theorem implies that studying Euclidean graph spaces reduces to the study of orbit spaces
X/Γ. Though the analysis of orbit spaces is more general, we translate all results into graph spaces to make them directly accessible for statistical pattern recognition. For the sake of convenience, we introduce the following terminology:

Notation 4.6. By GH ≅ X/Γ we identify the graph space GH with the orbit space X/Γ via the bijective isometry ω : GH → X/Γ. We briefly write x ∈ X if ω(X) = [x]. In this case, we call the matrix x a representation of the graph X.

The next theorem summarizes some properties of Euclidean graph spaces that are useful for a statistical analysis of graphs. The assertions of Theorem 4.7 have been applied in the context of the sample mean of graphs [26].

Theorem 4.7. A Euclidean graph space (GH, δ) has the following properties:
1. (GH, δ) is a complete metric space.
2. (GH, δ) is a geodesic space.
3. (GH, δ) is locally compact.
4. Every closed bounded subset of (GH, δ) is compact.
Proof. By using GH ≅ X/Γ, it is sufficient to show that the assertions hold in the orbit space X/Γ.
1. Since the group Γ is finite, all orbits are finite and therefore closed subsets of X. The Euclidean space X is a finitely compact metric space. Then X/Γ is a complete metric space ([43], Theorem 8.5.2).
2. Since X is a finitely compact metric space and Γ is a discontinuous group of isometries, the assertion follows from [43], Theorem 13.1.5.
3. Since Γ is a finite and therefore compact group, the assertion follows from [3], Theorem 3.1.
4. Since GH is a complete, locally compact length space, the assertion follows from the Hopf–Rinow Theorem (see e.g. [4, Prop. 3.7]).

4.3. Length, angle, and orthogonality

This section introduces basic geometric concepts such as length, angle, and orthogonality of graphs.

Definition 4.8. The scalar multiplication on GH is a function

· : R × GH → GH,    (λ, X) ↦ λX,

where λX is the graph obtained by scalar multiplication of λ with all node and edge attributes of X.

In contrast to scalar multiplication on vectors, scalar multiplication on graphs is only positively homogeneous.
Proposition 4.9. Let λ ∈ R+ be a non-negative scalar. Then we have
κ(X , λY ) = λκ(X , Y ) for all X , Y ∈ GH . Proof. We revise the proof presented in [31], Prop. 9. From GH ∼ = X/Γ and Corollary 4.5 follows
κ(X , Y ) = max xT y : x ∈ X , where y ∈ Y is arbitrary but fixed. Suppose that κ(X , Y ) = xT0 y for some representation x0 ∈ X . Then we have xT0 y ≥ xT y for all x ∈ X . This implies xT0 (λy ) = λ xT0 y ≥ λ xT y = xT (λy )
for all non-negative scalars λ ∈ R+ . The assertion follows from λY = [λy] if Y = [y]. Using optimal alignment kernels, we can define the length of a graph in the usual way. Definition 4.10. The length ℓ(X ) of graph X ∈ GH is defined by
ℓ(X) = √κ(X, X).
The length of a graph can be determined efficiently, because the identity alignment maximizes the optimal alignment kernel. Proposition 4.11. The length of X is of the form
ℓ(X) = ∥x∥ = √κ(X, X)
for all x ∈ X . Proof. From Corollary 4.5 follows
κ(X, X) = max {xᵀx′ : x, x′ ∈ X}. We have xᵀx′ = ∥x∥ ∥x′∥ cos α,

where α is the angle between x and x′. Since Γ is a group of isometries acting on X, we have ∥x∥ = ∥x′∥ for all elements x and x′ from the same orbit. Thus, xᵀx′ is maximal if the angle α is minimal over all pairs of elements from X. The minimum angle is zero for pairs of identical elements. This shows the assertion.
The relationship between the length of a graph and the optimal alignment kernel is given by a weak form of the Cauchy–Schwarz inequality. To state the weak Cauchy–Schwarz inequality, we need the notion of positive dependence. We say, two graphs X , Y ∈ GH are positively dependent, if there is a non-negative scalar λ ≥ 0 such that Y = λX . Theorem 4.12 (Weak Cauchy–Schwarz). Let X , Y ∈ GH be two graphs. Then we have
|κ(X , Y )| ≤ ℓ(X ) · ℓ(Y ), where equality holds when X and Y are positively dependent. Proof. We revise the proof presented in [31], Theorem 11. From GH ∼ = X/Γ and Corollary 4.5 follows
κ(X , Y ) = max xT y : x ∈ X , y ∈ Y for all X , Y ∈ GH . Suppose that x0 ∈ X and y0 ∈ Y are representations such that κ(X , Y ) = x0 T y0 . From the standard Cauchy–Schwarz inequality together with Proposition 4.11 follows
|κ(X , Y )| = x0 T y0 ≤ ∥x0 ∥ ∥y0 ∥ = ℓ(X )ℓ(Y ). Next, we show the second assertion, that is, equality of the weak Cauchy–Schwarz inequality if X and Y are positively dependent. Suppose that X = λY for some λ ≥ 0. From the definition of the length of a graph together with Proposition 4.9 follows
λ2 ℓ2 (X ) = λ2 κ(X , X ) = κ(λX , λX ) = ℓ2 (λX ). Then by using Proposition 4.9 we obtain
|κ(X , λX )| = λκ(X , X ) = λℓ(X )ℓ(X ) = ℓ(X )ℓ(λX ).
We consider this Cauchy–Schwarz inequality as weak because equality holds only for positively dependent graphs. This is in contrast to the original Cauchy–Schwarz inequality in vector spaces, where equality holds when two vectors are linearly dependent. Observe that Proposition 4.11 can be regarded as a special case of the weak Cauchy–Schwarz inequality by setting Y = 1 · X. We apply the weak Cauchy–Schwarz inequality to define an angle ∠(X, Y) between two graphs X and Y. For this, we define the null-graph 0 ∈ GH as the graph that is mapped to [0] ∈ X/Γ via the map ω. The null-graph consists of isolated nodes with zero-attribute and without connection to any other node.

Definition 4.13. The cosine of the angle α between non-null graphs X and Y is defined by

cos α = κ(X, Y) / (ℓ(X) ℓ(Y)).
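The length of a graph and the angle between two graphs can be computed directly from the optimal alignment kernel. The following sketch reuses the hypothetical optimal_alignment_kernel helper from the earlier sketch; the function names are illustrative.

```python
import numpy as np

def length(attr_X, k):
    # Definition 4.10: l(X) = sqrt(kappa(X, X))
    return np.sqrt(optimal_alignment_kernel(attr_X, attr_X, k))

def cos_angle(attr_X, attr_Y, k):
    # Definition 4.13: cos(angle) = kappa(X, Y) / (l(X) l(Y)) for non-null graphs
    return optimal_alignment_kernel(attr_X, attr_Y, k) / (length(attr_X, k) * length(attr_Y, k))
```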
With the notion of angle, we can introduce orthogonality between graphs. Definition 4.14. Two graphs X and Y are orthogonal, if κ(X , Y ) = 0. A graph X is orthogonal to a subset U ⊆ GH , if
κ(X , Y ) − κ(X , Z ) = 0 for all Y , Z ∈ U. The length of a graph and the angle between two graphs are well-defined geometric concepts that may follow our intuition about length and angle in Euclidean spaces. The situation, however, is different for the concept of orthogonality. For further details, we refer to Section 6.2.2. 4.4. Geometry from a generic viewpoint This section studies the geometry of graph spaces from a generic viewpoint. 4.4.1. Preliminaries Suppose that the metric space (M , d) is either a Euclidean space or a graph alignment space. The underlying topology that determines the open subsets of M is the topology induced by the metric d. The open sets of the topology are all subsets that can be realized as the unions of open balls
B (z , ρ) = {x ∈ M : d(z , x) < ρ} , with center z ∈ M and radius ρ > 0. The following definition of generic property is taken from [10], Definition 2. Definition 4.15. A generic property in a metric space (M , d) is a property that holds on a dense open subset of M . We will use the following notations: Notation 4.16. Let U ⊆ X be a subset. By U◦ we denote the largest open set contained in U, by cl(U) the closure of U, and by ∂ U the boundary of U. 4.4.2. Dirichlet fundamental domains We assume that the group Γ is non-trivial. In the trivial case, we have X = X/Γ and therefore everything reduces to the geometry of Euclidean spaces. For every x ∈ X, we define the isotropy group of x as the set
Γ_x = {γ ∈ Γ : γx = x}. An ordinary point x ∈ X is a point with trivial isotropy group Γ_x = {ε}. A singular point is a point with non-trivial isotropy group. If x is ordinary, then all elements of the orbit [x] are ordinary points. A subset F of X is a fundamental set for Γ if and only if F contains exactly one point x from each orbit [x] ∈ X/Γ. A fundamental domain of Γ in X is a closed set D ⊆ X that satisfies
1. X = ⋃_{γ ∈ Γ} γD, and
2. γD° ∩ D° = ∅ for all γ ∈ Γ \ {ε}.
A proof of the next result can be found in [43], Theorem 6.6.13. Proposition 4.17. Let z ∈ X be ordinary. Then the set
Dz = {x ∈ X : ∥x − z ∥ ≤ ∥x − γ z ∥ for all γ ∈ Γ } is a fundamental domain, called Dirichlet fundamental domain centered at z.
From Proposition 4.17 it follows that the Dirichlet fundamental domains D_z centered at the different representations z of an ordinary graph Z are the cells of a Voronoi tessellation. Thus, the interior of every cell D_z contains those representations x of a graph X that are closer to the representation z of Z than to any other representation of Z.

Proposition 4.18. Let D_z be a Dirichlet fundamental domain centered at an ordinary point z. Then the following properties hold:
1. D_z is a convex polyhedral cone.
2. There is a fundamental set F_z such that D_z° ⊆ F_z ⊆ D_z.
3. We have z ∈ D_z°.
4. Every point x ∈ D_z° is ordinary.
5. Suppose that x, γx ∈ D_z for some γ ∈ Γ \ {ε}. Then x, γx ∈ ∂D_z.
6. γD_z = D_{γz} for all γ ∈ Γ.
7. The Dirichlet fundamental domain can be equivalently expressed as
   D_z = {x ∈ X : xᵀz ≥ xᵀγz for all γ ∈ Γ}.
Proof. 1. For each γ ≠ ε, we define the closed halfspace

H_γ = {x ∈ X : ∥x − z∥ ≤ ∥x − γz∥}.

Then the Dirichlet fundamental domain D_z is of the form

D_z = ⋂_{γ ∈ Γ} H_γ.
As an intersection of finitely many closed halfspaces, the set Dz is a convex polyhedral cone [15]. 2. [43], Theorem 6.6.11. 3. The isotropy group of an ordinary point is trivial. Thus z T z > z T γ z for all γ ∈ Γ \ {ε}. This shows that z lies in the interior of Dz . 4. Suppose that x ∈ Dz◦ is singular. Then the isotropy group Γx is non-trivial. Thus, there is a γ ∈ Γ \ {ε} with x = γ x. This implies x ∈ γ Dz ∩ Dz . Then x ∈ ∂ Dz is a boundary point of Dz by [43], Theorem 6.6.4. This contradicts our assumption x ∈ Dz◦ and shows that x is ordinary. 5. From x, γ x ∈ Dz follows ∥x − z ∥ = ∥γ x − z ∥. Since Γ acts by isometries, we have ∥x − z ∥ = ∥γ x − γ z ∥. Combining both equations yields ∥γ x − z ∥ = ∥γ x − γ z ∥. This shows that γ x ∈ ∂ Dz . Let γ ′ ∈ Γ be the inverse of γ . Since γ ̸= ε , we have γ ′ ̸= ε . Then
∥x − z∥ = ∥γx − z∥ = ∥γ′γx − γ′z∥ = ∥x − γ′z∥,

where the second equation follows from the isometry of the group action. Thus, ∥x − z∥ = ∥x − γ′z∥ shows that x ∈ ∂D_z.
6. Let x ∈ γD_z. We have ∥x − γz∥ ≤ ∥γ′x − γz∥ for all γ′ ∈ Γ, showing that x ∈ D_{γz}. Now assume that x ∈ D_{γz}. Let γ′ ∈ Γ be the inverse of γ. Then we have

∥x − γz∥ = ∥γ′x − γ′γz∥ = ∥γ′x − z∥

by the isometry of γ′. Hence, γ′x ∈ D_z and therefore γγ′x = x ∈ γD_z.
7. The following equivalences hold for all γ ∈ Γ: x ∈ D_z
⇔ ∥x − z∥² ≤ ∥x − γz∥²
⇔ ∥x∥² + ∥z∥² − 2xᵀz ≤ ∥x∥² + ∥γz∥² − 2xᵀγz
⇔ xᵀz ≥ xᵀγz.

The last equivalence uses that Γ acts on X by isometries. This shows the last property.

We first note that the following statements are independent of the particular choice of representation z of an ordinary graph Z, due to Property (6) of Proposition 4.18. From Property (1) together with Proposition 4.9 on non-negative scalar multiplication, it follows that a graph alignment space looks like a convex cone C_Z from the point of view of an ordinary graph Z. From Property (3) it follows that an ordinary graph Z is in the interior of its convex cone C_Z. Moreover, by Property (4) all graphs in the interior of the cone C_Z are ordinary graphs. Singular graphs are always boundary points of C_Z. Translated to Dirichlet fundamental domains D_z centered at a representation z ∈ Z, having a unique representation in D_z is a generic property. Graphs with more than one representation in D_z are singular graphs, and their different representations are all boundary points of D_z. The next result shows that all these properties are generic.

Corollary 4.19. Being an ordinary point is a generic property in X.

Proof. Suppose that z ∈ X is ordinary. Then there is a Dirichlet fundamental domain D_z. From Proposition 4.18 it follows that all points of the open set D_z° are ordinary. With z, all representatives from [z] are ordinary. Then all points of γD_z° are ordinary for every γ ∈ Γ. The assertion holds because the union ⋃_{γ ∈ Γ} γD_z° is open and dense in X.
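The characterization in Proposition 4.18(7) and the notion of ordinary point are easy to check numerically for small graphs. The following sketch again assumes real-valued attributes, realizes Γ as simultaneous row/column permutations, and uses the Frobenius inner product for matrices; all names are illustrative.

```python
import numpy as np
from itertools import permutations

def in_dirichlet_domain(x, z):
    # x lies in D_z iff <x, z> >= <x, gamma z> for all gamma (Proposition 4.18(7))
    n = z.shape[0]
    return all(
        np.sum(x * z) >= np.sum(x * z[np.ix_(list(p), list(p))])
        for p in permutations(range(n))
    )

def is_ordinary(z):
    # z is ordinary iff only the identity permutation fixes it (trivial isotropy group)
    n = z.shape[0]
    identity = tuple(range(n))
    return all(
        p == identity or not np.array_equal(z, z[np.ix_(list(p), list(p))])
        for p in permutations(range(n))
    )
```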
4.4.3. The Weak Graph Representation Theorem

Let ω : GH → X/Γ be the bijective isometry defined in Corollary 4.5. A graph Z ∈ GH is ordinary if there is an ordinary representation z ∈ Z. In this case, all representations of Z are ordinary. The following result is an immediate consequence of Corollary 4.19.

Corollary 4.20. Being an ordinary graph is a generic property in GH.

Proof. 1. The projection π : X → X/Γ is open and surjective. Then the image π(U) of an open and dense subset U ⊆ X is open and dense in X/Γ. To see this, observe that from π(X) = π(cl(U)) and surjectivity of π follows π(cl(U)) = X/Γ. From X/Γ = π(cl(U)) ⊆ cl(π(U)) follows that π(U) is open and dense in X/Γ.
2. There is an open and dense subset U ⊆ X of ordinary points, because being ordinary is a generic property in X. By definition, the graph Z represented by an ordinary point z ∈ Z is ordinary. Hence, all graphs from the set π(U) are ordinary. From the first part of the proof follows that π(U) is open and dense. Thus, being ordinary is also a generic property in X/Γ.

The Weak Graph Representation Theorem describes the shape of a graph space from a generic viewpoint.

Theorem 4.21 (Weak Graph Representation Theorem). Let (GH, δ) be a Euclidean graph space of order n. For each ordinary graph Z ∈ GH there is an injective map µ : GH → X into a Euclidean space (X, ∥·∥) such that
1. δ(Z, X) = ∥µ(Z) − µ(X)∥ for all X ∈ GH.
2. δ(X, Y) ≤ ∥µ(X) − µ(Y)∥ for all X, Y ∈ GH.
3. The closure D_µ = cl(µ(GH)) is a convex polyhedral cone in X.
4. We have D_µ° ⊊ µ(GH) ⊊ D_µ.
Proof. Let ω : GH → X/Γ be the bijective isometry defined in Corollary 4.5.
1. Suppose that Z ∈ GH is a graph with z ∈ ω(Z) = [z]. Since Z is ordinary, so is z by definition. Let D_µ = D_z be the Dirichlet fundamental domain centered at z, and let F_z ⊂ D_µ be a fundamental set.
2. The fundamental set F_z ⊆ X induces a bijection f : F_z → X/Γ that maps each element x to its orbit [x]. Then the map

µ : GH → X,    X ↦ f⁻¹(ω(X))

is injective as a composition of injective maps.
3. We show the first property. Let X be a graph. Then from Corollary 4.5 follows

δ(Z, X) = min {∥z′ − x∥ : z′ ∈ ω(Z)},

where x = µ(X). Since F_z is a subset of the Dirichlet fundamental domain D_z, we have δ(Z, X) = ∥z − x∥. This shows the assertion.
4. We show the second property. From the second part of this proof follows µ(X) = f⁻¹(ω(X)). This implies µ(X) ∈ ω(X). Thus, µ maps every X to exactly one representation x ∈ ω(X). Then the assertion follows from Corollary 4.5.
5. We have µ(GH) = F_z. Then the third and fourth property follow from Proposition 4.18.

We call the map µ : GH → X a cross section of GH along µ(Z). The polyhedral cone D_µ is the Dirichlet fundamental domain centered at µ(Z). Note that a cross section along Z is not unique. The first property of the Weak Graph Representation Theorem states that there is an isometry with respect to an ordinary graph Z into some Euclidean space. The second property states that the cross section µ is an expansion of the graph space. Properties 3 and 4 say that the image of a cross section along an ordinary graph is a dense subset of a convex polyhedral cone. A polyhedral cone is the intersection of finitely many half-spaces. Fig. 1 illustrates the statements of Theorem 4.21.

According to Theorem 4.21, a cross section µ along an ordinary graph Z is an isometry with respect to Z, but generally an expansion of the graph space. Next, we are interested in convex subsets U ⊆ GH such that µ is isometric on U, because these subsets have the same geometric properties as their convex images µ(U) in the Euclidean space X by isometry. To characterize such subsets, we introduce the notion of a cone circumscribing a ball for both metric spaces, the Euclidean space (X, ∥·∥) and the graph alignment space (GH, δ).

Definition 4.22. Suppose that the metric space (M, d) is either a Euclidean space or a graph alignment space. Let z ∈ M and let ρ > 0. A cone circumscribing a ball B(z, ρ) is a subset of the form
C(z, ρ) = {x ∈ M : ∃ λ > 0 s.t. λx ∈ B(z, ρ)}.

The next result states that a cross section along an ordinary graph induces a bijective isometry between cones circumscribing sufficiently small balls.
Fig. 1. Illustration of the Weak Graph Representation (WGR) Theorem. Suppose that µ : GH → X is a cross section along representation z = µ(Z ). The box on the left shows the graph alignment space GH together with graphs X , Y , Z ∈ GH . The box on the right depicts the image µ(GH ) as a region of the Euclidean space X together with the images x = µ(X ), y = µ(Y ), and z = µ(Z ). Property (1) of the WGR Theorem states that µ is isometric with respect to Z . Distances are preserved for δ(Z , X ) = ∥z − x∥ and δ(Z , Y ) = ∥z − y ∥ as indicated by the respective solid red lines. Property (2) of the WGR Theorem states that δ(X , Y ) ≤ ∥x − y ∥ as indicated by the shorter dashed red line connecting X and Y in GH compared to the longer dashed red line connecting the images x and y. Properties 3 and 4 of the WGR Theorem state that the closure Dµ of the image µ(GH ) in X (right box) is a convex polyhedral cone. Points of Dµ without pre-image in GH are boundary points as indicated by the holes in the dotted black line. Since a polyhedral cone can be regarded as a set of rays emanating from the origin, the side opposite of 0 of the Dirichlet fundamental domain Dµ is unbounded. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Theorem 4.23. A cross section µ : GH → X along an ordinary graph Z can be restricted to a bijective isometry from C(Z, ρ) onto C(z, ρ) for all ρ such that

0 < ρ ≤ ρ* = ½ min {∥z − x∥ : x ∈ ∂D_µ},

where z = µ(Z).

Proof. 1. According to Corollary 4.5, we have GH ≅ X/Γ. The group Γ is a discontinuous group of isometries. Suppose that z ∈ Z. Since Z is ordinary, the isotropy group Γ_z is trivial. Then the natural projection π : X → X/Γ induces an isometry from B(z, ρ) onto B(π(z), ρ) for all ρ such that

0 < ρ ≤ ¼ min {∥z − γz∥ : γ ∈ Γ \ {ε}}    (2)
by [43], Theorem 13.1.1. This implies an isometry from B (z , ρ) onto B (Z , ρ). 2. Since Z is ordinary, we have z ∈ Dµ◦ by Proposition 4.18. The ball B (z , ρ) is contained in the open set Dµ◦ for every radius ρ satisfying Eq. (2). To see this, we assume that B (z , ρ) contains a boundary point x of Dµ . Then there is a γ ∈ Γ \{ε} such that
∥x − γ z ∥ = ∥x − z ∥ ≤ ρ. We have
∥z − γz∥ = ∥z − x + x − γz∥ ≤ ∥z − x∥ + ∥γz − x∥ ≤ 2ρ.

Since ρ satisfies Eq. (2), we obtain a chain of inequalities of the form

½ ∥z − γz∥ ≤ ρ ≤ ¼ ∥z − γz∥.
This chain of inequalities in invalid, because z is ordinary and γ ̸= ε . From the contradiction follows B (z , ρ) ⊆ Dµ◦ . 3. Let ρ ∈ ]0, ρ ∗ ]. We show that C (z , ρ) ⊂ Dµ◦ . The cone C (z , ρ) is contained in Dµ due to part one of this proof and convexity of Dµ . Suppose that there is a point x ∈ C (z , ρ) ∩ ∂ Dµ . Then there is a γ ∈ Γ \ {ε} such that x lies on the hyperplane H separating the Dirichlet fundamental domains Dµ and γ Dµ . Consider the ray L+ x = {λx : 0 ≤ λ} ⊂ C (z , ρ). + Two cases can occur: (1) either L+ x ⊂ H or (2) Lx ∩ H = {x}. The first case contradicts that B (z , ρ) is in the interior of Dµ , ◦ because the ray L+ x passes through B (z , ρ). The second case contradicts convexity of Dµ . Thus we proved C (z , ρ) ⊂ Dµ . ∗ ′ ′ 4. Let ρ ∈ ]0, ρ ] and X , Y ∈ C (Z , ρ). Then there are positive scalars a, b > 0 such that X = aX and Y = bY are contained in B (Z , ρ) by definition of C (Z , ρ). Suppose that x′ ∈ X ′ and y ′ ∈ Y ′ are representations of X ′ and Y ′ such that x′ , y ′ ∈ B (z , ρ) ⊆ Dµ◦ . Since Γ is a subgroup of the general linear group, we have x′ = ax and y ′ = by with x ∈ X and y ∈ Y . From part three of this proof follows that x = x′ /a and y = y ′ /b are also contained in Dµ◦ . Applying Proposition 4.9
Fig. 2. Illustration of Theorem 4.23. The two columns represent cross sections µ along different graphs Z into the Euclidean space X. The restriction of µ to the isometry cone C (Y ) is an isometric isomorphism into the (round) hypercone C (z ), where z = µ(Z ) is the image of graph Z . The isometric and isomorphic cones are shaded in dark orange. The hypercone C (z ) is wider the more z is centered within its Dirichlet fundamental domain Dµ . (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
yields
κ(X ′ , Y ′ ) = (ax)T (by ) = ab xT y = ab · κ(X , Y ). This implies δ(X , Y ) = ∥x − y ∥. This shows a bijective isometry from C (Z , ρ) onto µ(C (Z , ρ)). 5. We show that µ(C (Z , ρ)) = C (z , ρ). From part four of this proof follows that µ(C (Z , ρ)) ⊆ C (z , ρ). It remains to show that C (z , ρ) ⊆ µ(C (Z , ρ)). Let x ∈ C (z , ρ). Then there is a scalar a > 0 such that x′ = ax is in the ball B (z , ρ) ⊂ C (z , ρ). From the first part of the proof follows that there is a graph X ′ ∈ B (X ′ , ρ) ⊂ C (Z , ρ) with x′ ∈ X ′ . By definition of C (Z , ρ), we have X = X ′ /a is also in C (Z , ρ). We need to show that µ(X ) = x. From the third part of this proof follows that C (z , ρ) ⊂ Dµ◦ . This implies that x ∈ Dµ◦ . From Proposition 4.18 follows that there is no other representation x′ ∈ X contained in Dµ . Since µ is surjective onto Dµ◦ according to the Weak Graph Representation Theorem, we have µ(X ) = x. This shows that C (z , ρ) ⊆ µ(C (Z , ρ)). The maximum radius ρ ∗ in Theorem 4.23 is half the minimum distance of z from the boundary of its Dirichlet fundamental domain Dµ . The circular cone C (z , ρ ∗ ) in X is wider the more centered z is within its Dirichlet fundamental domain Dµ . Then by isometry, the cone C (Z , ρ ∗ ) in GH is also wider. Note that for every generic graph Z , the circular cone C (z , ρ ∗ ) never collapses to a single ray. Fig. 2 visualizes Theorem 4.23. A direct implication of the previous discussion is a correspondence of basic geometric concepts between graph alignment spaces and their images in Euclidean spaces via cross sections. The next result summarizes some correspondences. Corollary 4.24. Let µ : GH → X be a cross section along an ordinary graph Z ∈ GH . Then the following statements hold for all X ∈ GH : 1. 2. 3. 4. 5.
κ(Z , X ) = µ(Z )T µ(X ) for all X ∈ GH . ℓ(X ) = ∥µ(X )∥ ^(Z , X ) = ^ (µ(Z ), µ(X )). Z and X are orthogonal ⇔ µ(Z ) and µ(X ) are orthogonal. Z is orthogonal to a subset U ⊆ GH ⇒ µ(Z ) is orthogonal to µ(U).
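A cross section as used in Theorem 4.21 and Corollary 4.24 can be sketched explicitly: every graph, given by some representation x, is mapped to the representation in its orbit that is closest to a fixed ordinary representation z, that is, to a point of the Dirichlet domain D_z. For non-generic graphs several representations may be equally close, in which case the choice is not unique, in line with the remarks following the theorem. The names below are illustrative.

```python
import numpy as np
from itertools import permutations

def cross_section(x, z):
    # map the graph represented by x to its orbit element closest to z
    n = x.shape[0]
    orbit = (x[np.ix_(list(p), list(p))] for p in permutations(range(n)))
    return min(orbit, key=lambda xp: np.linalg.norm(xp - z))
```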
5. Geometry of graph alignment spaces In the previous section, we studied the geometry of Euclidean graph spaces. This section shows that graph alignment spaces can be isometrically embedded into a subset of Euclidean graph spaces under certain assumptions.
Let (GA , δ) be a graph alignment space with attribute set A. The graph alignment distance δ is induced by attribute kernel k : A × A → R. The attribute kernel k has an associated feature map Φ : A → H into the feature space H such that k(x, y) = Φ (x)T Φ (y) for all x, y ∈ A. To ensure that the embedding of the graph alignment space (GA , δ) into a subset of some Euclidean graph space is isometric, we need to make three assumptions:
• Assumption 1: The feature space is of the form H = Rd . • Assumption 2: The graphs of GA are of bounded order n. • Assumption 3: The attribute kernel k is homogeneous. The third assumption, which will be defined next, serves to relax the identical order of Euclidean graphs to bounded order of attributed graphs. Definition 5.1 (Homogeneous Attribute Kernel). An attribute kernel k : A × A → R is homogeneous on attribute set A if k(x, y) ≥ 0 for all x, y ∈ A. The kernel k is inhomogeneous on A if it is not homogeneous on A. An example of a homogeneous attribute kernel is the kernel defined in Example 3.4(2). In contrast, the attribute kernel defined in Example 3.4(1) is inhomogeneous, because the kernel can take negative values. Note that homogeneity is not an inherent property of a kernel, but a property of how a kernel evaluates on a given attribute set. For further details, we refer to Section 6.4. Theorem 5.2. A graph alignment space satisfying Assumptions 1–3 is isometric to a subset of a Euclidean graph space of order n. Proof. 1. Let (GA , δ) be a graph alignment space satisfying Assumptions 1–3. The graph alignment distance δ is induced by an optimal alignment kernel κ . The kernel κ in turn is induced by a homogeneous attribute kernel k : A × A → R. Suppose that Φ : A → H is the feature map associated with the homogeneous attribute kernel k. We assume that the feature space is of the form H = Rd . 2. The feature map Φ gives rise to a Euclidean graph space (GH , δH ) of order n with attributes from the feature space H . The standard inner product on H is an attribute kernel that induces an optimal alignment kernel κH . The optimal alignment kernel κH in turn induces the graph alignment distance function δH (X , Y ). 3. We assume that all graphs from GA are of identical order n. If X ∈ GA is a graph of order m < n, we replace X by a graph X ′ of order n by augmenting X with n − m null-nodes. A null-node is an isolated node with null-attribute ν and without connection to any other node. This assumption identifies two graphs that only differ in the number of null-nodes. Proposition 5.3 shows that the optimal alignment kernel induced by the homogeneous attribute kernel k is invariant under addition and deletion of null-nodes. 4. Let f : GA → GH be the map that sends every graph X = (V , E , α) from GA to the graph Xf = V , E , αf such that αf (i, j) = Φ (α(i, j)) ∈ H for all i, j ∈ V . 5. We show that f satisfies δ(X , Y ) = δH (Xf , Yf ) for all X , Y ∈ GA . The optimal alignment kernel κ is of the form
κ(X, Y) = max { Σ_{i,j ∈ VX} k(α_X(i, j), α_Y(φ(i), φ(j))) : φ ∈ A(X, Y) }.
Note that the sum runs over VX , because all graphs from GA are of identical order by Part 3 of this proof. Since Φ is the associated feature map of the attribute kernel k, we have
k(α_X(i, j), α_Y(φ(i), φ(j))) = Φ(α_X(i, j))ᵀ Φ(α_Y(φ(i), φ(j)))
for all i, j ∈ VX and for all alignments φ ∈ A(X , Y ). By construction of the map f in Part 4 of this proof, we have
Φ(α_X(i, j))ᵀ Φ(α_Y(φ(i), φ(j))) = α_{Xf}(i, j)ᵀ α_{Yf}(φ(i), φ(j))

for all i, j ∈ VX and for all alignments φ ∈ A(X, Y). The inner product on the right-hand side of the last equation is the attribute kernel defined on H that induces the optimal alignment kernel κH. By combining the last three equations, we obtain κ(X, Y) = κH(Xf, Yf). Thus, we have
δ(X , Y )2 = κ(X , X ) − 2κ(X , Y ) + κ(Y , Y ) = κH (Xf , Xf ) − 2κH (Xf , Yf ) + κH (Yf , Yf ) = δH (Xf , Yf )2 . Part 4 and Part 5 show that the map f is an isometric embedding of (GA , δ) into a subset of (GH , δH ). To complete the proof of Theorem 5.2, we show that an optimal alignment kernel induced by a homogeneous attribute kernel is invariant under addition and deletion of null-nodes. Recall that null-nodes are isolated nodes with null-attribute.
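The embedding used in this proof can be sketched as follows: attributes are mapped through the feature map Φ of the attribute kernel, and graphs are padded to a common order n with null-nodes, that is, rows and columns whose entries are the zero vector Φ(ν) = 0. The feature map phi, its dimension d, and the target order n are assumptions of this sketch, not quantities fixed by the paper.

```python
import numpy as np

def embed_graph(attr, phi, n, d):
    """attr: m x m attribute matrix, phi: feature map into R^d, n >= m."""
    m = len(attr)                     # original order of the graph
    out = np.zeros((n, n, d))         # padded entries encode null-nodes (zero vectors)
    for i in range(m):
        for j in range(m):
            out[i, j] = phi(attr[i][j])
    return out
```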
Proposition 5.3. An optimal alignment kernel induced by a homogeneous attribute kernel is invariant under addition and deletion of null-nodes. Proof. 1. Let (GA , δ) be a graph alignment space of bounded order n. The graph alignment distance δ is induced by a optimal alignment kernel κ . The kernel κ in turn is induced by a homogeneous attribute kernel k : A × A → R. 2. Suppose that X , Y ∈ GA are two graphs without null-nodes. Let X ′ and Y ′ be two graphs obtained from X and Y , resp., by adding finitely many null-nodes. Then we have VX ⊆ VX ′ , EX ⊆ EX ′ , and αX ′ (i, j) = αX (i, j) for all i, j ∈ VX . The same relationships hold between Y ′ and Y . 3. Let φ : VX → VY be an optimal alignment between the original graphs X and Y . Any extension ψ of alignment φ to an alignment ψ : VX ′ → VY ′ between the augmented graphs X ′ and Y ′ has the property that either i ∈ dom(ψ) \ dom(φ) is a null-node in X ′ or ψ(i) ∈ VY ′ is a null-node in Y ′ , because the domain dom(φ) of alignment φ is maximal by definition. This implies κ(X , Y ) ≤ κ(X ′ , Y ′ ). 4. Suppose that ψ : VX ′ → VY ′ is an optimal alignment between X ′ and Y ′ . Then we have
κ(X′, Y′) = Σ_{i,j ∈ dom(ψ)} k(α_{X′}(i, j), α_{Y′}(ψ(i), ψ(j))).
5. Consider the set U = ψ −1 (VY ) ∩ VX consisting of all nodes of graph X that are aligned to nodes of graph Y . By construction, i ̸∈ U implies that either i ∈ VX ′ \ VX is a null-node in X ′ or ψ(i) ∈ VY ′ \ VY is a null-node in Y ′ . Then by definition of an attribute kernel, we have
k(α_{X′}(i, j), α_{Y′}(ψ(i), ψ(j))) = 0

for all j ∈ V_{X′}. Thus, we can express κ(X′, Y′) as a sum over pairs of nodes from U:

κ(X′, Y′) = Σ_{i,j ∈ U} k(α_{X′}(i, j), α_{Y′}(ψ(i), ψ(j))).
6. In general, we have |U| ≤ min {|VX | , |VY |}. This means that the restriction ψ|U is a partial injective map but not necessarily an alignment between X and Y , because the set U may not be maximal. 7. We show the existence of an alignment ϱ : VX → VY between X and Y such that ϱ|U = ψ|U . There is a subset V ⊆ VX satisfying U ⊆ V and |V | = min {|VX | , |VY |}. We distinguish between two cases: 1. V = VX : This implies |U| ≤ |V | = |VX | ≤ |VY | giving |V \ U| ≤ |VY \ φ(U)|. 2. V ( VX : This implies |U| ≤ |V | = |VY | < |VX | giving |V \ U| = |VY \ φ(U)|. From the implications of both cases follows the existence of an alignment ϱ between X and Y that extends ψ|U . From homogeneity of the attribute kernel follows that
κ(X′, Y′) = Σ_{i,j ∈ U} k(α_{X′}(i, j), α_{Y′}(ψ(i), ψ(j)))
          = Σ_{i,j ∈ U} k(α_{X′}(i, j), α_{Y′}(ϱ(i), ϱ(j)))
          ≤ Σ_{i,j ∈ V} k(α_{X′}(i, j), α_{Y′}(ϱ(i), ϱ(j)))
          = κ_ϱ(X, Y).

Thus, we have constructed an alignment ϱ between X and Y such that
κ(X , Y ) < κρ (X , Y ), which is impossible. Since κ(X , Y ) ≤ κ(X ′ , Y ′ ) by Part 2 of this proof, we finally have κ(X , Y ) = κ(X ′ , Y ′ ). This shows that the optimal alignment kernel κ induced by a homogeneous kernels is invariant under the addition of null-nodes. 8. It remains to show that κ is invariant under deletion of null-nodes. Suppose that X ′′ and Y ′′ are graphs obtained by deletion of some null-nodes from X ′ and Y ′ . From the previous part follows that κ(X , Y ) = κ(X ′ , Y ′ ) and κ(X , Y ) = κ(X ′′ , Y ′′ ) implying κ(X ′ , Y ′ ) = κ(X ′′ , Y ′′ ). This completes the proof. It is important to note that optimal alignment kernels induced by an inhomogeneous attribute kernel are not invariant under addition and deletion of null-nodes as illustrated in Fig. 3. 6. Statistical pattern recognition The goal of this section is to place the proposed concepts and results into the context of statistical pattern recognition.
Fig. 3. The effect of adding null-nodes on the optimal alignment kernel κ. Let A = R be the attribute set endowed with the attribute kernel k(x, y) = x · y. Right: The graph X has two nodes with attribute +1 connected by an edge. The attribute of the edge can be chosen arbitrarily. The graph Y has a single node with attribute −1. The dashed arrow shows an optimal alignment. Then evaluating the optimal alignment kernel at X and Y gives κ(X, Y) = −1. Left: Graphs X′ and Y′ are obtained by augmenting graphs X and Y with null-nodes such that |X′| = |Y′| = 3. Null-nodes are isolated nodes with null-attribute ν = 0, as shown by the gray-shaded balls. The dashed arrows depict an optimal alignment between the augmented graphs X′ and Y′. Evaluating the optimal alignment kernel at X′ and Y′ gives κ(X′, Y′) = 0, which is different from κ(X, Y) = −1. This shows that the optimal alignment kernel κ is not invariant under addition of null-nodes.
6.1. A first step towards statistical inference

A central path of statistical inference departs from the fundamental concept of mean, then leads via the normal distribution and the Central Limit Theorem to statistical estimation using the maximum likelihood method. The maximum likelihood method in turn is a fundamental approach that provides probabilistic interpretations to many pattern recognition methods. This central path is well-defined in Euclidean spaces, but becomes unclear in mathematically less structured spaces. Since an increasing amount of non-Euclidean data is being collected and analyzed in ways that have not been realized before, statistics is undergoing an evolution [38]. Examples of this evolution are contributions to the statistical analysis of shapes [2,7,22,37], complex objects [42,45], and tree-structured data [10,45].

The first step along the central path starts with the concept of mean. In Euclidean spaces, the sample mean of n data points x_1, . . . , x_n ∈ R^d is of the form

m = (1/n) Σ_{i=1}^{n} x_i.
In graph spaces, a well-defined addition compatible with the underlying graph alignment distance is unknown. Following an idea proposed by Fréchet [13], we exploit the fact that the sample mean m is the unique minimizer of the function

F_n(x) = Σ_{i=1}^{n} ∥x_i − x∥².
The optimization-based formulation of the sample mean requires no addition and is applicable in any metric space. Consequently, a sample mean of n graphs X_1, . . . , X_n is a graph that minimizes the sample Fréchet function

F_n(X) = Σ_{i=1}^{n} δ²(X_i, X).
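The sample Fréchet function is straightforward to evaluate once a graph alignment distance is available; the sketch below reuses the hypothetical alignment_distance helper from the earlier sketch. Minimizing over an explicit candidate set (here: the sample itself) yields only a set-median-style approximation of the sample mean; dedicated mean algorithms are cited in the text.

```python
def frechet_value(candidate, sample, k):
    # sample Frechet function: sum of squared alignment distances to the sample
    return sum(alignment_distance(candidate, attr_X, k) ** 2 for attr_X in sample)

def approximate_sample_mean(sample, k):
    # crude approximation: best candidate among the sample graphs themselves
    return min(sample, key=lambda cand: frechet_value(cand, sample, k))
```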
Basic issues of a sample mean of graphs include conditions for existence, statistical consistency, and uniqueness. To derive such conditions, Corollary 4.5, Theorem 4.7, the Weak Graph Representation Theorem, and Theorem 4.23 have been applied [26]. The special case of the sample mean of two graphs allows us to directly address pattern recognition problems, which are located at the other end of the central path. Since graph alignment spaces are geodesic spaces, any pair of graphs X and Y has a midpoint M satisfying

δ(X, M) = δ(Y, M) = ½ δ(X, Y).

In graph alignment spaces the concepts of midpoint and sample mean are equivalent [26]. Using the positively homogeneous scalar multiplication, we can generalize the equivalence between midpoint and sample mean of two graphs to the equivalence of the weighted midpoint and the weighted mean for non-negative weights [26]. The weighted mean is important for extending a number of standard pattern recognition methods to graph spaces. In standard pattern recognition, the weighted mean occurs in the update rule of different learning methods, such as self-organizing maps [39], k-means clustering [27], and learning vector quantization [39]. These update rules can be generalized
to graph spaces using the concept of the weighted mean of graphs [35,32,34,33]. The weighted midpoint property provides a geometric interpretation of the update rule.

6.2. Generalized linear classifiers

In the last section, we touched on the topic of pattern recognition using the notion of midpoint. In this section, we present a more general approach to extending standard pattern recognition methods to graph spaces. As an example, we extend linear classifiers to graph spaces using the orbit space framework. The basic ideas presented here can be applied to the majority of statistical pattern recognition methods that are based on the concept of gradient, such as, for example, neural network learning (deep learning), learning vector quantization [39], self-organizing maps [39], k-means clustering [27], and many more.

6.2.1. Generalized linear classifiers

We first consider linear classifiers. Let Y = {±1} be the set of class labels. A linear classifier h_{w,b} : R^d → Y is a function of the form

h_{w,b}(x) = +1 if wᵀx + b ≥ 0 and h_{w,b}(x) = −1 if wᵀx + b < 0,
where w ∈ R^d is the weight vector and b ∈ R is the bias. When extending linear classifiers to graph spaces [25,24], we replace the space R^d by a Euclidean graph space GH and the inner product on R^d by the optimal alignment kernel κ on GH. Then the analogue of a linear classifier in graph spaces is a generalized linear classifier h_{W,b} : GH → Y such that

h_{W,b}(X) = +1 if κ(W, f(X)) + b ≥ 0 and h_{W,b}(X) = −1 if κ(W, f(X)) + b < 0,

where W ∈ GH is the weight graph and b ∈ R is the bias.
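A minimal sketch of such a generalized linear classifier, reusing the hypothetical optimal_alignment_kernel helper from the earlier sketch, could look as follows. Here W is the attribute matrix of the weight graph, b is the bias, and the embedding f is taken to be the identity for simplicity.

```python
def generalized_linear_classifier(attr_X, W, b, k):
    # sign of kappa(W, X) + b replaces the sign of w^T x + b of a linear classifier
    return 1 if optimal_alignment_kernel(W, attr_X, k) + b >= 0 else -1
```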
where W ∈ GH is the weight graph and b ∈ R is the bias. 6.2.2. Geometric interpretation of generalized linear classifiers Next, we apply the concepts of length, angle and orthogonality for a geometric interpretation of generalized linear classifiers. For this, we begin with the geometric interpretation of linear classifiers. A linear classifier hw ,b : Rd → Y determines a hyperplane
Hw ,b = x ∈ Rd : xT w + b = 0
such that

1. w is orthogonal to Hw,b.
2. b/∥w∥ is the distance of 0 from Hw,b.
3. hw,b(x)/∥w∥ is the distance of x from Hw,b.

Next, we turn to the geometric interpretation of generalized linear classifiers. For this, we define the distance of a graph X ∈ GH from a subset U ⊆ GH by δ(X, U) = min_{Y ∈ U} δ(X, Y). A generalized linear classifier hW,b : GH → {±1} with ordinary W determines a generalized hyperplane

HW,b = {X ∈ GH : κH(X, W) + b = 0}

that gives rise to the following geometric interpretation [25,24]:

1. W is orthogonal to HW,b.
2. b/ℓ(W) = δ(0, HW,b).
3. hW,b(X)/ℓ(W) ≤ δ(X, HW,b).

Recall that 0 ∈ GH denotes the null-graph consisting of null-nodes only. The first geometric property holds, because
κH(Y, W) − κH(Z, W) = (κH(Y, W) + b) − (κH(Z, W) + b) = 0 for all Y, Z ∈ HW,b.

The second and third geometric properties follow from Proposition 4.18 and the Weak Graph Representation Theorem (see also [25]). The first two geometric properties are in line with the corresponding geometric properties of linear classifiers. In contrast, the third geometric property of generalized linear classifiers is weaker than its counterpart in Euclidean spaces. To explain this, recall that the distance hw,b(x)/∥w∥ of x from the hyperplane Hw,b corresponds to the distance of x from its orthogonal projection x⊥ on Hw,b. The situation is different in graph spaces, because there are graphs X that have no orthogonal projection on the generalized hyperplane HW,b. For precisely those graphs X, we have the strict inequality

hW,b(X)/ℓ(W) < δ(X, HW,b).
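The weakening of the third property is easiest to appreciate against the Euclidean case, where all three properties can be verified directly. The following sketch checks, for one hypothetical weight vector and test point, that the distance of x from Hw,b coincides with the distance to its orthogonal projection x⊥, computed via the standard formula |wT x + b|/∥w∥. In graph spaces this projection need not exist, which is precisely where the inequality becomes strict.

```python
import numpy as np

# A hypothetical linear classifier with weight vector w and bias b.
w = np.array([3.0, 4.0])          # ||w|| = 5
b = -10.0
x = np.array([6.0, 8.0])          # an arbitrary test point

# Orthogonal projection of x onto the hyperplane H = {z : w.z + b = 0}.
x_perp = x - ((w @ x + b) / (w @ w)) * w
assert np.isclose(w @ x_perp + b, 0.0)                       # x_perp lies on H

dist_via_projection = np.linalg.norm(x - x_perp)             # 8.0
dist_via_formula = abs(w @ x + b) / np.linalg.norm(w)        # 8.0
assert np.isclose(dist_via_projection, dist_via_formula)

assert np.isclose(abs(b) / np.linalg.norm(w), 2.0)           # distance of 0 from H
```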
6.2.3. Learning

This section applies the Weak Graph Representation Theorem for learning generalized linear classifiers. Again, we first consider the problem of learning a linear classifier. Suppose that (x1, y1), ..., (xn, yn) ∈ Rd × Y are n training examples. The goal of learning is to find parameters w∗ ∈ Rd and b∗ ∈ R such that the linear classifier hw∗,b∗(x) minimizes the empirical error function

E(w, b) = (1/n) Σ_{i=1}^{n} l(yi, hw,b(xi)),
where l(y, ŷ) is a differentiable loss function measuring the cost incurred by predicting ŷ when the true class is y. Since E is differentiable as a function of the parameters (w, b) ∈ Rd × R, gradient descent updates the parameters according to the rule

w′ = w − η ∇w E(w, b),    (3)
b′ = b − η ∇b E(w, b),    (4)

where η is the step-size (learning rate).
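As a minimal sketch of the update rules (3) and (4), the code below performs one gradient step with the logistic loss l(y, s) = log(1 + exp(−y s)), applied, as is standard, to the real-valued score wT x + b rather than to its sign. The loss choice and the function names are illustrative assumptions and not prescribed by the text.

```python
import numpy as np

def error_gradients(w, b, X, y):
    """Gradients of E(w, b) = (1/n) sum_i log(1 + exp(-y_i (w.x_i + b))).

    X has shape (n, d); y is an array of labels in {-1, +1}.
    """
    s = X @ w + b                                # scores w.x_i + b
    coeff = -y / (1.0 + np.exp(y * s))           # dl/ds for each example
    grad_w = (X * coeff[:, None]).mean(axis=0)   # nabla_w E(w, b)
    grad_b = coeff.mean()                        # nabla_b E(w, b)
    return grad_w, grad_b

def gradient_step(w, b, X, y, eta=0.1):
    """One application of the update rules (3) and (4)."""
    grad_w, grad_b = error_gradients(w, b, X, y)
    return w - eta * grad_w, b - eta * grad_b
```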
Next, we consider the problem of learning a generalized linear classifier. Suppose that (X1, y1), ..., (Xn, yn) ∈ GH × {±1} are n training graphs Xi with corresponding class labels yi. Learning a generalized linear classifier amounts to minimizing the error function

E(W, b) = (1/n) Σ_{i=1}^{n} l(yi, hW,b(Xi)),
where l(y, ŷ) is a differentiable loss function and W is ordinary. In contrast to Euclidean spaces, the concepts of derivative and gradient are unknown for functions on graphs. The solution is to select a cross section µ : GH → X of GH along w = µ(W). The Weak Graph Representation Theorem gives the equivalent error function

E(w, b) = (1/n) Σ_{i=1}^{n} l(yi, hw,b(µ(Xi))),

defined on the Euclidean space X × R.
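The sketch below spells out one iteration of the learning scheme described in this subsection, again in the simplified matrix setting used earlier. It realizes the cross section along w = µ(W) by aligning every training graph to the current weight graph before taking a Euclidean gradient step with the logistic loss; this concrete choice of cross section, the brute-force alignment, and the loss are assumptions made for illustration only.

```python
import itertools
import numpy as np

def align_to(W, X):
    """Representative of X optimally aligned to the weight graph W (brute force)."""
    n = W.shape[0]
    candidates = (np.eye(n)[list(p)] @ X @ np.eye(n)[list(p)].T
                  for p in itertools.permutations(range(n)))
    return max(candidates, key=lambda Xp: np.sum(W * Xp))

def learning_iteration(W, b, graphs, labels, eta=0.1):
    """One gradient step on the equivalent error function E(w, b)."""
    xs = np.array([align_to(W, X).ravel() for X in graphs])   # mu(X_i) as vectors
    y = np.asarray(labels, dtype=float)                       # labels in {-1, +1}
    w = W.ravel()                                             # w = mu(W)
    s = xs @ w + b
    coeff = -y / (1.0 + np.exp(y * s))                        # logistic loss on the score
    w_new = w - eta * (xs * coeff[:, None]).mean(axis=0)
    b_new = b - eta * coeff.mean()
    return w_new.reshape(W.shape), b_new                      # W' is represented by w'
```

Repeating this step re-aligns the training graphs to the updated weight graph, which corresponds to choosing a new cross section µ′ in the next iteration.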
The equivalent error function E(w, b) is exactly of the same form as the error function for linear classifiers. Therefore, we can apply the gradient descent rule defined in Eqs. (3) and (4) to obtain w′ ∈ X and b′ ∈ R. The updated weight vector w′ represents a graph W′ ∈ GH. The next iteration requires choosing a cross section µ′ along w′ = µ′(W′) that may differ from µ in the interior, that is, µ(GH)◦ ≠ µ′(GH)◦. This in turn may imply that µ′(Xi) ≠ µ(Xi) for some training graphs Xi. For convergence results and applications we refer to [25,24].

6.3. The role of graph alignment spaces

In the previous sections, we focused exclusively on the special class of Euclidean graphs. In this section, we consider the more general graph alignment spaces and their role in pattern recognition. For the sake of simplicity, we continue using generalized linear classifiers as an illustrative example.

Important for clarifying the different roles of graph alignment spaces and Euclidean graph spaces is the distinction between three types of spaces: the data space D, the feature space F, and the parameter space P. We illustrate the characteristics of the different spaces by considering the problem of filtering spam e-mails as an example. The task of spam-filtering is to classify incoming e-mails into one of two classes, spam and not-spam. Here, we apply linear classifiers to spam-filtering. The data space D consists of e-mails represented by a multi-set of words. Since it is not possible to directly define a linear function on D, we first assume a vocabulary with d words in lexicographic order. Then we define an embedding f : D → F into the feature space F = Zd. The map f sends every e-mail to a d-dimensional vector x ∈ F. Component xi of the feature vector x counts the occurrences of the word at position i in the vocabulary. The feature space F is a subset of the Euclidean space Rd. Thus, F determines the parameter space P = Rd × R of a linear classifier, where the parameters are of the form (w, b).

When extending linear classifiers to graph spaces, graph alignment spaces GA of bounded order n take the role of the data space D. For learning, we define an (isometric) embedding f : GA → F into some feature space F ⊆ GH. The space GH is the Euclidean graph space of order n. Finally, the parameter space of a generalized linear classifier takes the form P = GH × R.

6.4. The role of homogeneous attribute kernels

This section discusses the role of homogeneous attribute kernels in pattern recognition. An embedding f : GA → F ⊆ GH is not isometric in general. But we can construct an isometric embedding f if the attribute kernel on A is homogeneous. The restriction to homogeneous attribute kernels is primarily a theoretical limitation
with little impact on pattern recognition applications. For many discriminative methods in pattern recognition, it is more important to choose an appropriate function class from which a learning algorithm picks a final model than to devise an appropriate similarity function. For example, a function implemented by a feed-forward neural network is typically composed of several inner products and non-linear activation functions. Though inner products are often not an adequate choice of a similarity function on images, the function class implemented by a deep convolutional network architecture together with a suitable learning algorithm constitutes a powerful state-of-the-art method for image recognition.

There are different ways to cope with an inhomogeneous attribute kernel k on an attribute set A without modifying k. In the following, we present two approaches. For this, we assume that (GA, δ) is a graph alignment space endowed with a metric δ induced by the graph alignment kernel κ. The graph alignment kernel κ is in turn induced by an arbitrary attribute kernel k.

The first solution transforms the graph alignment kernel κ. This solution follows the proof of the Graph Representation Theorem and assumes that all graphs under consideration are of identical order by augmenting smaller graphs with null-nodes. This transformation leaves the attribute kernel k unchanged but may affect the graph alignment kernel κ, and therefore results in a different graph alignment space (G′A, δ) with different topological and geometrical properties than the original space (GA, δ). The Graph Representation Theorem and subsequent results hold for the transformed space (G′A, δ). It is important to note that adding null-nodes is merely a technical trick to simplify the mathematics, but need not be enforced in a practical setting.

The second solution exploits the fact that homogeneity is not an inherent property of an attribute kernel k, but a property of the kernel k when evaluated on some attribute set. Thus, the same kernel k can be homogeneous on one attribute set but inhomogeneous on another one. Based on this observation, the second solution maps the attribute set A to another attribute set A′ using an isometric transformation. Then the Graph Representation Theorem and subsequent results hold for the transformed space (GA′, δ). As an example, we assume that A ⊂ Rd is a bounded subset, which is often the case in pattern recognition applications. Suppose that the underlying attribute kernel is the standard inner product. Translating the attribute set A into the positive orthant of Rd results in an attribute set A′ for which the inner product is a homogeneous attribute kernel. Since translations are isometric transformations, the relevant topological and geometrical properties of the transformed graph alignment space remain unchanged.
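The translation argument can be illustrated for a finite sample of vector-valued attributes: shift all attribute vectors so that every component becomes non-negative and check that pairwise distances, and hence the relevant geometry, are unchanged. The helper names are illustrative; the sketch does not verify homogeneity itself, which depends on the formal definition given earlier in the paper.

```python
import numpy as np

def translate_to_positive_orthant(attrs):
    """Shift a finite set of attribute vectors so that every component is >= 0.

    attrs: array of shape (m, d), one row per attribute vector in A.
    """
    return attrs - attrs.min(axis=0)

def pairwise_distances(attrs):
    return np.linalg.norm(attrs[:, None, :] - attrs[None, :, :], axis=-1)

attrs = np.array([[-1.0, 2.0], [0.5, -3.0], [2.0, 0.0]])     # a bounded A in R^2
shifted = translate_to_positive_orthant(attrs)

assert (shifted >= 0).all()                                  # A' lies in the positive orthant
assert np.allclose(pairwise_distances(attrs),                # translations are isometries
                   pairwise_distances(shifted))
```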
7. Conclusion

This contribution studies geometric properties of graph alignment spaces in order to lay the foundation for a statistical theory of graphs. The main result is the Graph Representation Theorem stating that graphs are points of a geometric space, called orbit space. Based on this result, we derived geometric concepts such as length, angle, and orthogonality in graph spaces. In addition, we presented a weak form of the Cauchy–Schwarz inequality. From the view of a generic graph, a graph space is isometric to a subset of a convex polyhedral cone in the feature space. We placed the proposed concepts and results into the context of statistical pattern recognition and indicated how standard pattern recognition methods such as linear classifiers can be generalized to graph spaces. As a consequence of the orbit space framework, this contribution provides a mathematical foundation for narrowing the gap between structural and statistical pattern recognition and places existing learning methods in graph spaces on a theoretically sound basis.

References

[1] C. Berg, J.P.R. Christensen, P. Ressel, Harmonic Analysis on Semigroups, Springer, New York, 1984.
[2] A. Bhattacharya, R. Bhattacharya, Nonparametric Inference on Manifolds: With Applications to Shape Spaces, Cambridge University Press, 2012.
[3] G.E. Bredon, Introduction to Compact Transformation Groups, Elsevier, 1972.
[4] M.R. Bridson, A. Haefliger, Metric Spaces of Non-Positive Curvature, Springer, 1999.
[5] H. Bunke, S. Günter, X. Jiang, Towards bridging the gap between statistical and structural pattern recognition: Two new concepts in graph matching, in: Advances in Pattern Recognition, 2001.
[6] D. Conte, P. Foggia, C. Sansone, M. Vento, Thirty years of graph matching in pattern recognition, Int. J. Pattern Recognit. Artif. Intell. 18 (3) (2004) 265–298.
[7] I.L. Dryden, K.V. Mardia, Statistical Shape Analysis, Wiley, 1998.
[8] R.P. Duin, E. Pekalska, The dissimilarity space: Bridging structural and statistical pattern recognition, Pattern Recognit. Lett. 33 (7) (2012) 826–832.
[9] A. Feragen, F. Lauze, M. Nielsen, P. Lo, M. De Bruijne, M. Nielsen, Geometries on spaces of treelike shapes, in: Asian Conference on Computer Vision, 2011.
[10] A. Feragen, P. Lo, M. De Bruijne, M. Nielsen, F. Lauze, Towards a theory of statistical tree-shape analysis, IEEE Trans. Pattern Anal. Mach. Intell. 35 (2013) 2008–2021.
[11] A. Feragen, M. Nielsen, S. Hauberg, P. Lo, M. De Bruijne, F. Lauze, A geometric framework for statistics on trees, Technical Report, Department of Computer Science, University of Copenhagen, 2011.
[12] P. Foggia, G. Percannella, M. Vento, Graph matching and learning in pattern recognition in the last 10 years, Int. J. Pattern Recognit. Artif. Intell. 28 (1) (2014).
[13] M. Fréchet, Les éléments aléatoires de nature quelconque dans un espace distancié, Ann. Inst. Henri Poincaré (1948) 215–310.
[14] H. Fröhlich, J.K. Wegner, F. Sieker, A. Zell, Optimal assignment kernels for attributed molecular graphs, in: International Conference on Machine Learning, 2005.
[15] D. Gale, Convex polyhedral cones and linear inequalities, in: Activity Analysis of Production and Allocation, Vol. 13, 1951, pp. 287–297.
[16] X. Gao, B. Xiao, D. Tao, X. Li, A survey of graph edit distance, Pattern Anal. Appl. 13 (1) (2010) 113–129.
[17] M.R. Garey, D.S. Johnson, Computers and Intractability, W.H. Freeman, 2002.
[18] P. Geibel, B.J. Jain, F. Wysotzki, SVM learning with the SH inner product, in: European Symposium on Artificial Neural Networks, 2004.
[19] C.E. Ginestet, Strong consistency of Fréchet sample mean sets for graph-valued random variables, 2012. arXiv:1204.3183.
[20] S. Gold, A. Rangarajan, E. Mjolsness, Learning with preknowledge: clustering with point and graph matching distance measures, Neural Comput. 8 (4) (1996) 787–804.
[21] S. Günter, H. Bunke, Self-organizing map for clustering in the graph domain, Pattern Recognit. Lett. 23 (4) (2002) 405–417.
[22] S. Huckemann, T. Hotz, A. Munk, Intrinsic shape analysis: Geodesic PCA for Riemannian manifolds modulo isometric Lie group actions, Statist. Sinica 20 (2010) 1–100.
[23] M. Hurshman, J. Janssen, On the continuity of graph parameters, Discrete Appl. Math. 181 (2015) 123–129.
[24] B. Jain, Flip-flop sublinear models for graphs, in: Structural, Syntactic, and Statistical Pattern Recognition, 2014.
[25] B. Jain, Margin perceptrons for graphs, in: International Conference on Pattern Recognition, 2014.
[26] B. Jain, Properties of the sample mean in graph spaces and the majorize-minimize-mean algorithm, 2015. arXiv:1511.00871.
[27] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice-Hall, 1988.
[28] B. Jain, P. Geibel, F. Wysotzki, SVM learning with the Schur-Hadamard inner product for graphs, Neurocomputing 64 (2005) 93–105.
[29] B. Jain, K. Obermayer, On the sample mean of graphs, in: International Joint Conference on Neural Networks, 2008.
[30] B. Jain, K. Obermayer, Algorithms for the sample mean of graphs, in: Computer Analysis of Images and Patterns, 2009.
[31] B. Jain, K. Obermayer, Structure spaces, J. Mach. Learn. Res. 10 (2009) 2667–2714.
[32] B. Jain, K. Obermayer, Generalized learning graph quantization, in: Graph-Based Representations in Pattern Recognition, 2010.
[33] B. Jain, K. Obermayer, Graph quantization, Comput. Vis. Image Underst. 115 (7) (2011) 946–961.
[34] B. Jain, S.D. Srinivasan, A. Tissen, K. Obermayer, Learning graph quantization, in: Structural, Syntactic, and Statistical Pattern Recognition, 2010.
[35] B. Jain, F. Wysotzki, Central clustering of attributed graphs, Mach. Learn. 56 (1–3) (2004) 169–207.
[36] X. Jiang, A. Munger, H. Bunke, On media graphs: properties, algorithms, and applications, IEEE Trans. Pattern Anal. Mach. Intell. 23 (10) (2001) 1144–1151.
[37] D.G. Kendall, Shape manifolds, procrustean metrics, and complex projective spaces, Bull. Lond. Math. Soc. 16 (1984) 81–121.
[38] P.T. Kim, J.-Y. Koo, Comment on [22], Statist. Sinica 20 (2010) 72–76.
[39] T. Kohonen, Self-Organizing Maps, Springer-Verlag, 2001.
[40] L. Livi, A. Rizzi, The graph matching problem, Pattern Anal. Appl. 16 (3) (2013) 253–283.
[41] E.M. Loiola, N.M. de Abreu, P.O. Boaventura-Netto, P. Hahn, T. Querido, A survey for the quadratic assignment problem, European J. Oper. Res. 176 (2) (2007) 657–690.
[42] J.S. Marron, A.M. Alonso, Overview of object oriented data analysis, Biom. J. 56 (5) (2014) 732–753.
[43] J.G. Ratcliffe, Foundations of Hyperbolic Manifolds, Springer, 2006.
[44] J. Vert, The optimal assignment kernel is not positive-definite, 2008. arXiv:0801.4061.
[45] H. Wang, J.S. Marron, Object oriented data analysis: sets of trees, Ann. Statist. 35 (2007) 1849–1873.
[46] J. Yan, X.-C. Yin, W. Lin, C. Deng, H. Zha, X. Yang, A short survey of recent advances in graph matching, in: International Conference on Multimedia Retrieval, 2016.