Graphic sequences, distances and k-degree anonymity



Discrete Applied Mathematics 188 (2015) 25–31


Discrete Applied Mathematics journal homepage: www.elsevier.com/locate/dam

Julián Salas a,∗, Vicenç Torra a,b

a IIIA-CSIC, Consejo Superior de Investigaciones Científicas, Institut d’Investigació en Intel·ligència Artificial, Campus Universitat Autònoma de Barcelona, Spain
b School of Informatics, University of Skövde, Skövde, Sweden

Article info

Article history: Received 26 September 2013; received in revised form 26 September 2014; accepted 10 March 2015; available online 6 April 2015.

Keywords: Degree sequence; Edge rotation distance; Rectilinear distance; Regular graph; Graph distance; Graphic sequence; k-anonymity; k-degree anonymity

Abstract

In this paper we study conditions to approximate a given graph by a regular one. We obtain optimal conditions for several metrics: the edge rotation distance for graphs, and the rectilinear and Euclidean distances over degree sequences. We then require the approximation to have at least k copies of each value in the degree sequence; this is a property originating in data privacy called k-degree anonymity. We give a sufficient condition for a degree sequence to be graphic that depends only on its length and its maximum and minimum degrees. Using this condition we give an optimal solution of k-degree anonymity for the Euclidean distance when the sum of the degrees in the anonymized degree sequence is even. We present algorithms that may be used for obtaining all the mentioned anonymizations. © 2015 Elsevier B.V. All rights reserved.

1. Introduction

For modelling applications, it is very important to know the distance between isomorphism classes of graphs. Several distances have been defined and used in the literature. For example, in organic chemistry the maximum common subgraph distance was used to reflect the principle of minimal structural change, cf. [2,11,18]. Another measure of graph similarity is the edge rotation distance defined by Chartrand, Saba and Zou [3], also known as the reaction metric. In [5], the authors consider the rotation distance and present bounds on this distance between two graphs in terms of their greatest common subgraphs. Three different metrics derived from edge manipulation are studied in [7]: the edge move distance, the edge slide distance and the edge rotation distance. That work also observes a relation between the edge rotation distance between two graphs and the rectilinear distance between their degree sequences. In this work we consider the related problem of graph approximation, that is, given a graph G and a distance d, find the graph G′ at minimum distance from G that satisfies a set of constraints. More specifically, we consider the problem of approximating a graph with regular graphs, and we obtain the best approximations under the Euclidean and rectilinear metrics between degree sequences and also under the edge rotation distance for graphs. A similar but more complicated problem is that of approximating with a graph in which each vertex degree occurs with frequency at least k; this is a property originating in data privacy called k-degree anonymity.



∗ Corresponding author.

http://dx.doi.org/10.1016/j.dam.2015.03.005 0166-218X/© 2015 Elsevier B.V. All rights reserved.


In data privacy, a dataset satisfies k-anonymity with respect to a query when, for any value in the domain of the result of the query, there are at least k individuals with that value, cf. [14,16]. The concept of k-anonymity comes from the need to publish data for research or practical purposes while preserving the privacy of the individuals. It was common practice for organizations to release person-specific data after only removing explicit identifiers, such as name or social security number. This has proven insufficient, since combinations of a few characteristics, possibly obtained from different data sources, can often be used to link an individual to a data record and therefore disclose private information. Hence, to avoid this re-identification, Samarati and Sweeney proposed the concept of k-anonymity, in which the record of an individual in a database is cloaked among k − 1 other equal records. After the proliferation of social networks, the concept of anonymity was extended to graphs in order to model the privacy issues that arise from their publication. In [1], a model of attacks on anonymized social networks is proposed to learn whether edges exist between given targeted pairs of nodes, i.e. to know whether there is a relation between a pair of subjects. In [9], the authors define a class of queries Hi(x), of increasing power i, which report on the local structure of the graph around a node. The authors state that these queries are inspired by iterative vertex refinement, a technique originally developed to efficiently test for the existence of graph isomorphisms. The queries Hi(x) correspond to the degrees of the neighbours at distance i − 1 from vertex x; hence H1(x) corresponds to the vertex degree. In [12], the k-anonymization of the class of queries H1(x) is addressed under the name k-degree anonymization.
The authors define the graph anonymization problem as finding a k-anonymous supergraph of a given graph G, and relate the optimal cost of an anonymization to the rectilinear distance δ1. Note that this is not the only type of k-anonymization considered in graphs; other definitions exist to formalize other types of attacks, e.g. [15]. In this work, we further study the problem of finding a k-degree anonymous graph with two other metrics, one for degree sequences and one for graphs: the Euclidean distance δ2 and the edge rotation distance δER. The optimal k-anonymization of a vector of numbers can be computed in polynomial time, cf. [8], by using microaggregation, a widely used disclosure control technique in which records are aggregated into groups and the mean of the group to which an observation belongs is released, see [13]. This solution is not always meaningful for degree sequences, since the result may not be the degree sequence of a graph (i.e. a graphic sequence). For example, when the sum of all the values in the sequence is odd, it cannot be graphic. We give a new condition on the maximum and minimum values of a degree sequence that implies that the sequence is graphic; it can be used to guarantee that an optimal k-anonymization with respect to the metric δ2 can be obtained. It is worth noting that all these results apply to all kinds of graphs, not only to social networks. In this paper we look for theoretical results that give conditions for optimal k-degree anonymity. The data privacy literature discusses some approximations that have been used in experiments with some available graphs; see [17,4] for details. In Section 2 we review the main definitions needed in the rest of the paper. In Section 3, we consider the problem of approximating a graph with regular graphs, and we obtain the best approximations under the Euclidean and rectilinear metrics between degree sequences and also under the edge rotation distance for graphs.
In Section 4, we consider the problem of approximating with degrees of frequency at least k, that is, the case when k-degree anonymity is required. This section contains the main results: a sufficient condition for a degree sequence to be graphic that depends only on its length and its maximum and minimum degrees; an approximation of the k-degree anonymity problem when the subsets are all of size k except one; and an optimal approximation for subsets of any size greater than k, or otherwise an approximation within distance 4k² of the optimal with the metric δ2 when the degree sequence for the optimal partition is not graphic. In the last section we present sketches of algorithms based on our results that can be used to obtain k-degree anonymous sequences. The paper finishes with a summary.

2. Definitions

Let G denote a simple graph, i.e. without loops or multiple edges, V(G) its vertex set and E(G) its edge set. We assume that all degree sequences are in non-increasing order, i.e. if $s = (d_1, d_2, \dots, d_n)$ is the degree sequence of a graph G, then $d_1 \ge d_2 \ge \cdots \ge d_n$, and we denote by $v_i$ the vertex of degree $d_i$, that is, $d(v_i) = d_i$. A superindex denotes the frequency of a degree, e.g. $(d_1^{n_1}, d_2^{n_2}, \dots, d_l^{n_l})$ is a sequence in which each degree $d_i$ has $n_i$ repetitions. We write $d'(x)$ for $d_{G'}(x)$ and $d''(x)$ for $d_{G''}(x)$.

By considering the degree sequences of graphs with n vertices as vectors in the discrete Euclidean space $\mathbb{N}^n \subset \mathbb{R}^n$, they inherit the Euclidean distance
$$\delta_2(s, s') = \sqrt{\sum_{i=1}^{n} (d_i - d'_i)^2},$$
where $s = (d_1, d_2, \dots, d_n)$ is the degree sequence of a graph G and $s' = (d'_1, d'_2, \dots, d'_n)$ the degree sequence of a graph G′. Measuring the distance coordinate by coordinate, we obtain the rectilinear distance
$$\delta_1(s, s') = \sum_{i=1}^{n} |d_i - d'_i|.$$
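The two distances, and the frequency notation $(d_1^{n_1}, \dots, d_l^{n_l})$, can be made concrete with a short Python sketch. The helper names `expand`, `delta1` and `delta2` are illustrative and not taken from the paper; sequences are plain lists of equal length.

```python
import math


def expand(freq):
    """Turn the frequency notation (d_1^{n_1}, ..., d_l^{n_l}), given as
    [(d_1, n_1), ..., (d_l, n_l)], into the explicit sequence."""
    return [d for d, count in freq for _ in range(count)]


def delta1(s, t):
    """Rectilinear distance between two degree sequences of equal length."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length n")
    return sum(abs(x - y) for x, y in zip(s, t))


def delta2(s, t):
    """Euclidean distance between two degree sequences of equal length."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length n")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s, t)))


s = [3, 2, 2, 1]
t = expand([(2, 4)])  # the constant sequence (2^4)
print(delta1(s, t), delta2(s, t))
```

Here the two sequences differ by one in the first and last coordinates, so δ1 = 2 and δ2 = √2.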
The distance between two graphs G and G′ can also be measured by the number of edge operations that must be applied to G in order to obtain G′; if the operation is a rotation, we refer to the edge rotation distance δER. Let G and G′ be two graphs of the same order n = |V(G)| = |V(G′)| and sizes q = |E(G)| and q′ = |E(G′)|, not necessarily equal. The edge rotation distance δER(G, G′) is defined as 0 if G ≅ G′ and, otherwise, as the smallest positive integer r for which


there exists a sequence $G_0, G_1, \dots, G_r$ of graphs such that $G_0 \cong G$ and $G_r \cong G'$, where ≅ denotes isomorphism, and $G_i$ can be transformed into $G_{i+1}$ by an edge rotation, deletion or addition, for $i = 0, 1, \dots, r-1$, i.e. $G_{i+1} = G_i - xu + xv$, $G_{i+1} = G_i - xu$ or $G_{i+1} = G_i + xv$, for $xu \in E(G_i)$ and $xv \notin E(G_i)$. For graphs G and G′ of the same size the following inequality holds, cf. [5,7]:

$$\delta_{ER}(G, G') \ge \frac{1}{2}\sum_{i=1}^{n} |d_i - d'_i|. \qquad (1)$$
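The right-hand side of Eq. (1) only needs the two degree sequences, so it is cheap to evaluate. The following sketch, assuming graphs are given as edge lists on vertices 0, …, n − 1 (a representation chosen for illustration), computes that lower bound; it does not compute δER itself.

```python
from collections import Counter


def degree_sequence(n, edges):
    """Non-increasing degree sequence of a simple graph on vertices 0..n-1."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted((deg[v] for v in range(n)), reverse=True)


def rotation_lower_bound(n, edges_g, edges_h):
    """Right-hand side of Eq. (1): half the rectilinear distance between
    the degree sequences of two graphs of the same order and size."""
    if len(edges_g) != len(edges_h):
        raise ValueError("Eq. (1) is stated for graphs of the same size")
    s = degree_sequence(n, edges_g)
    t = degree_sequence(n, edges_h)
    return sum(abs(x - y) for x, y in zip(s, t)) / 2


star = [(0, 1), (0, 2), (0, 3)]  # K_{1,3}, degrees (3, 1, 1, 1)
path = [(0, 1), (1, 2), (2, 3)]  # P_4, degrees (2, 2, 1, 1)
print(rotation_lower_bound(4, star, path))
```

For the star and the path the bound equals 1, and it is attained: rotating the edge 2–0 around vertex 2 to 2–1 turns the star into the path 3–0–1–2.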

Let $a = \left\lfloor \frac{1}{n}\sum_{i=1}^{n} d_i \right\rfloor$ denote the lower arithmetic mean of a set of numbers $\{d_1, \dots, d_n\}$. Let $C, C' \subset V(G)$ be such that $C \cap C' = \emptyset$; we denote by $G[C]$ the graph induced by the set C, where $uv \in E(G[C])$ if and only if $uv \in E(G)$ and $u, v \in C$. By $G[C, C']$ we denote the induced bipartite graph with vertex set $C \cup C'$, where $uv \in E(G[C, C'])$ if and only if $uv \in E(G)$, $u \in C$ and $v \in C'$. If $X \subset E(G)$, we denote by $G - X$ the graph with vertex set $V(G - X) = V(G)$ and edge set $E(G - X) = E(G) \setminus X$.

3. Approximating with regular graphs

We begin by proving the following simple but useful lemma.

Lemma 3.1. Let $s = (d_1, d_2, \dots, d_n)$ be the degree sequence of a graph G and $C \subset V(G)$. If $\sum_{v_i\in C} d_i = a|C|$, then the degree sequence $s^*$ with degrees
$$d^*_i = \begin{cases} d_i & \text{if } v_i \notin C,\\ a & \text{if } v_i \in C,\end{cases}$$
is graphic.

Proof. Suppose $C = \{v_1, v_2, \dots, v_{|C|}\}$ is ordered so that the degrees are non-increasing, i.e. $d_i \ge d_j$ for $i < j$, and $\sum_{v_i\in C} d_i = a|C|$. If $d_i > a$ for some $v_i \in C$, then there must be $v_j \in C$ such that $d_j < a$. Since $d_i \ge a + 1 \ge d_j + 2$, the vertex $v_i$ has at least $d_i - 1 > d_j$ neighbours other than $v_j$, so there exists $x \in V(G)$, $x \ne v_j$, such that $xv_i \in E(G)$ and $xv_j \notin E(G)$. Therefore, by means of a rotation around x we get a graph $G' = G - xv_i + xv_j$ whose degree sequence $s'$ is the same as s except for $d'_i = d_i - 1$ and $d'_j = d_j + 1$, and the sum $\sum_{v_i\in C} d'_i$ remains the same as the sum $\sum_{v_i\in C} d_i$. Hence, by applying $\frac{1}{2}\sum_{i=1}^{|C|} |d_i - a|$ rotations we arrive at a graph $G^*$ with degree sequence $s^* = (d^*_i)$, where $\sum_{v_i\in C} d^*_i = \sum_{v_i\in C} d_i = a|C|$, $d^*_i = d_i$ if $v_i \notin C$ and $d^*_i = a$ if $v_i \in C$. Since the construction uses exactly this number of rotations and Eq. (1) gives the matching lower bound, we conclude that
$$\delta_{ER}(G, G^*) = \frac{1}{2}\sum_{v_i\in C} |d_i - a|. \qquad \square$$
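As a small worked illustration of the count in Lemma 3.1 (the degree values are hypothetical, chosen only for the arithmetic), take a set C whose vertices have degrees (5, 4, 2, 1). Then $\sum_{v_i\in C} d_i = 12 = 3 \cdot 4$, so $a = 3$ and
$$\delta_{ER}(G, G^*) = \frac{1}{2}\left(|5-3| + |4-3| + |2-3| + |1-3|\right) = \frac{1}{2}(2 + 1 + 1 + 2) = 3.$$
Each rotation lowers one degree above a and raises one degree below a by one, so three rotations suffice: two moving an edge end from the degree-5 vertex to the degree-1 vertex, giving (3, 4, 2, 3), and one from the degree-4 vertex to the degree-2 vertex, giving (3, 3, 3, 3).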

Next, we compare the approximations of a set of numbers $s = \{d_1, \dots, d_n\}$ by its median and by its lower arithmetic mean in terms of the edge rotation distance between the corresponding graphs. For this we need the following definitions. Let $C = \{v_1, v_2, \dots, v_{|C|}\} \subset V(G)$ be ordered so that the degrees are non-increasing, i.e. $d_i \ge d_j$ for $i < j$. Let a and m be, respectively, the lower arithmetic mean and the median of the degrees of the vertices in C; from now on we assume that $a > m$. Let $C_1 = \{v_i : i \le |C|/2\}$ and $C_2 = \{v_i : a > d_i \ge m,\ i \le |C|/2\}$.

Let $X_1 \subset E(G[C_1])$ be a maximal set of edges such that in $\tilde G = G - X_1$ all the vertices $\tilde v_i \in \tilde G[C_1]$ have degree $\tilde d_i$ at least m.

Theorem 3.1. Assume that $\sum_{v_i\in C} d_i = a|C|$. Let $s'$ be the following degree sequence:
$$d'_i = \begin{cases} d_i & \text{if } v_i \notin C,\\ m & \text{if } v_i \in C.\end{cases}$$
If $(a - m)|C|$ is even and there are $|\tilde X| = \frac{1}{2}\sum_{v_i\in C_1}(\tilde d_i - m)$ alternating paths of length 3 with both endpoints in $C_1$, then there is a graph $G'$ with degree sequence $s'$. Furthermore, $\delta_{ER}(G, G') \le \delta_{ER}(G, G^*)$ if and only if $|\tilde X| \le \sum_{v_i\in C_2}(a - d_i)$, where $G^*$ is the graph from Lemma 3.1.



Proof. First of all, rotate $\sum_{v_i\in C\setminus C_1}(m - d_i)$ edges. For simplicity of notation we again denote by G the graph obtained after this operation; hence all the degrees in C are at least m, and the vertices of $C \setminus C_1$ have degree m. Observe that $\sum_{v_i\in C_1} d_i = \sum_{v_i\in C_1}(d_i - m) + |C_1|m = (a - m)|C| + |C_1|m$. Therefore $\sum_{v_i\in C_1}(d_i - m) = (a - m)|C|$.



Note that $\tilde G = G - X_1$ is such that the following equation holds:
$$2|\tilde X| = \sum_{\tilde v_i\in C_1}(\tilde d_i - m) = \sum_{v_i\in C_1}(d_i - m) - 2|X_1|. \qquad (2)$$

By hypothesis there are $|\tilde X|$ alternating paths of length 3 with both endpoints in $C_1$. Let us denote them by $u_iv_iw_ix_i$, with $u_i, x_i \in C_1$, $u_iv_i, w_ix_i \in E(G)$ and $v_iw_i \notin E(G)$.


Hence, deleting $|\tilde X|$ edges $u_iv_i \in E(\tilde G)$ and rotating $|\tilde X|$ edges $w_ix_i \in E(\tilde G)$ to the edges $w_iv_i$, we obtain a graph $G'$ with degree sequence $s'$ (each such pair of operations lowers the degrees of $u_i$ and $x_i$ by one and leaves all other degrees unchanged).

For the second part of the proof, recall that $d_i \ge m$ for all $v_i \in C$, and note that
$$\delta_{ER}(G, G') = |X_1| + 2|\tilde X| = (a - m)\frac{|C|}{2} + |\tilde X|$$
from Eq. (2); on the other hand,
$$\delta_{ER}(G, G^*) = \frac{1}{2}\sum_{v_i\in C}|d_i - a| = \sum_{d_i \ge a}(d_i - a) = \sum_{d_i \le a}(a - d_i) = \sum_{v_i\in C_2}(a - d_i) + (a - m)\frac{|C|}{2}.$$
Therefore $\delta_{ER}(G, G') \le \delta_{ER}(G, G^*)$ if and only if $|\tilde X| \le \sum_{v_i\in C_2}(a - d_i)$. □

In general, for every value $t \le m$, if we suppose that $|E(G[C])| \ge (a - t)\frac{|C|}{2}$, then by deleting $(a - t)\frac{|C|}{2}$ edges from $G[C]$ and applying Lemma 3.1 we prove the following corollary.

Corollary 3.1. Let G and $G^*$ be as in Lemma 3.1, and let $\hat s$ be the following degree sequence:
$$\hat d_i = \begin{cases} d_i & \text{if } v_i \notin C,\\ t & \text{if } v_i \in C.\end{cases}$$
If $(a - t)|C|$ is even and $|E(G[C])| \ge (a - t)\frac{|C|}{2}$, then $\hat s$ is graphic.

Lemma 3.2. Let $s = (d_1, d_2, \dots, d_n)$ be the degree sequence of a graph G. Then either the degree sequence $s^* = (a^n)$ or $s^* = ((a+1)^n)$ is graphic.

Proof. Let $R = \sum_{i=1}^{n} d_i - an$; hence $0 \le R \le n - 1$. If both R and $n - R$ were odd, then n would be even, implying that $an + R = \sum_{i=1}^{n} d_i$ is odd, contradicting the hypothesis that s is the degree sequence of a graph. Hence, either R or $n - R$ is even. If R is even, say $R = 2t$, let $G' = G - \{e_1, \dots, e_t\}$ for $e_1, \dots, e_t \in E(G)$. Then the sum $\sum_{i=1}^{n} d'_i$ equals $an$, and by Lemma 3.1 a graph $G^*$ with degree sequence $s^* = (a^n)$ is obtained, yielding the conclusion. If R is odd and $n - R$ is even, say $n - R = 2t$, then n and a must be odd, hence the sequence $(a^n)$ is not graphic. Therefore, by adding t edges we obtain a graph $G'$ such that $\sum_{i=1}^{n} d'_i = (a+1)n$, and by Lemma 3.1 a graph $G^*$ with degree sequence $s^* = ((a+1)^n)$ is obtained. □

From Corollary 3.1 and Lemma 3.2 we obtain the following.

Corollary 3.2. The degree sequence $s' = (m^n)$ or $s' = ((m+1)^n)$ is graphic.

Theorem 3.2. For any given graph G with degree sequence s, the closest regular graphs and sequences to G and s, respectively, are the following under the different metrics:
(i) the sequence $s^* = (a^n)$ or $s^* = ((a+1)^n)$, for the metric $\delta_2$;
(ii) the sequence $s' = (m^n)$ or $s' = ((m+1)^n)$, for the metric $\delta_1$;
(iii) the graph $G'$ or $G^*$, for the distance $\delta_{ER}$, depending on whether $\frac{1}{2}\sum_{v_i\in C_1}(\tilde d_i - m) \le \sum_{v_i\in C_2}(a - d_i)$ or not.

Proof. (i) It is well known that the best approximation, with respect to the distance δ2, of a vector $(d_1, d_2, \dots, d_n) \in \mathbb{R}^n$ by a constant vector $(x, \dots, x)$ is obtained when x is the arithmetic mean of the coordinates $(d_1, d_2, \dots, d_n)$, cf. [6]. Therefore, in the discrete Euclidean space $\mathbb{N}^n$ the best approximation is the constant vector $A = (a, \dots, a)$ or $(a+1, \dots, a+1)$, where a is the lower arithmetic mean. (ii) For the distance δ1, the best approximation of a vector $(d_1, d_2, \dots, d_n) \in \mathbb{R}^n$ by a constant vector $(x, \dots, x)$ is obtained when x is a median of the coordinates $(d_1, d_2, \dots, d_n)$, cf. [6]. Hence in this case the best approximation is the constant vector $M = (m, \dots, m)$ or $(m+1, \dots, m+1)$, where m is the median of the set $\{d_1, d_2, \dots, d_n\}$. (iii) This follows from Theorem 3.1. □
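Parts (i) and (ii) reduce to choosing a single common degree. The sketch below follows the parity arguments of Lemma 3.2 and Corollary 3.2: it returns a or a + 1 for δ2, and m or m + 1 (whichever gives an even degree sum) for δ1. It assumes the input is the degree sequence of a simple graph, and, as an assumption not fixed by the text, it uses the lower median when n is even; the function names are hypothetical.

```python
from statistics import median_low


def mean_regular_degree(s):
    """Common degree x of the regular sequence (x^n) chosen for delta_2
    (Theorem 3.2(i)), using the parity argument of Lemma 3.2."""
    n, total = len(s), sum(s)
    a = total // n          # lower arithmetic mean
    R = total - a * n       # 0 <= R <= n - 1
    # R even: delete R/2 edges and equalise to a; R odd: then n - R is even,
    # so add (n - R)/2 edges and equalise to a + 1.
    return a if R % 2 == 0 else a + 1


def median_regular_degree(s):
    """Common degree for delta_1 (Theorem 3.2(ii), Corollary 3.2): m or m + 1,
    whichever gives an even degree sum x * n."""
    n = len(s)
    m = median_low(s)       # assumption: the lower median when n is even
    return m if (m * n) % 2 == 0 else m + 1


s = [2, 2, 2, 1, 1]         # degree sequence of the path P_5
print(mean_regular_degree(s), median_regular_degree(s))
```

For P_5 the lower mean is a = 1 but R = 3 is odd, so the sketch returns a + 1 = 2, the degree of the cycle C_5, matching the second case in the proof of Lemma 3.2.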



4. Approximating with degrees of frequencies k

Let G be a graph and C be a partition of V(G) such that $C = \bigcup_{j=1}^{l} C_j$. Let $a_j$ be the lower arithmetic mean of $\{d_i : v_i \in C_j\}$, let $r_j = \left(\sum_{x\in C_j} d(x)\right) - a_j|C_j|$ for all $j \le l$, and let e and $r'_l$ be such that $R = \sum_{1\le j\le l} r_j = e|C_l| + r'_l$, with $0 \le r'_l < |C_l|$. In the following theorem we give a k-degree anonymous degree sequence that approximates a given sequence s; it has the advantage of always being graphic, but it may not be very precise.
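The quantities $a_j$, $r_j$, R, e and $r'_l$ are simple integer arithmetic over the blocks of the partition. The sketch below assumes, for illustration, that the partition consists of consecutive blocks of a non-increasing sequence (l − 1 blocks of size k and a last block of size k + t, as in Theorem 4.1 below), and also builds the sequence that the proof of Theorem 4.1 reaches once all residues have been moved into $C_l$. The input sequence and the helper names are hypothetical.

```python
def split_blocks(s, k):
    """l - 1 consecutive blocks of size k and a last block of size k + t,
    where n = l*k + t and 0 <= t < k."""
    l, t = divmod(len(s), k)
    if l == 0:
        raise ValueError("need n >= k")
    sizes = [k] * (l - 1) + [k + t]
    blocks, start = [], 0
    for size in sizes:
        blocks.append(s[start:start + size])
        start += size
    return blocks


def residue_data(blocks):
    """a_j (lower means), r_j, R = sum of r_j, and e, r'_l with R = e|C_l| + r'_l."""
    a = [sum(b) // len(b) for b in blocks]
    r = [sum(b) - aj * len(b) for aj, b in zip(a, blocks)]
    R = sum(r)
    e, r_last = divmod(R, len(blocks[-1]))
    return a, r, R, e, r_last


def redistributed_sequence(blocks):
    """(a_1^k, ..., a_{l-1}^k, (a_l+e+1)^{r'_l}, (a_l+e)^{k+t-r'_l})."""
    a, _, _, e, r_last = residue_data(blocks)
    seq = [aj for aj, b in zip(a[:-1], blocks[:-1]) for _ in b]
    size_l = len(blocks[-1])
    seq += [a[-1] + e + 1] * r_last + [a[-1] + e] * (size_l - r_last)
    return seq


s = [5, 4, 4, 3, 3, 2, 2, 1, 1, 1]   # hypothetical non-increasing sequence
blocks = split_blocks(s, 3)           # [[5, 4, 4], [3, 3, 2], [2, 1, 1, 1]]
print(residue_data(blocks))           # ([4, 2, 1], [1, 2, 1], 4, 1, 0)
print(redistributed_sequence(blocks)) # [4, 4, 4, 2, 2, 2, 2, 2, 2, 2]
```

The redistributed sequence keeps the total degree sum (26 in the example), which is what allows Lemma 3.1 to realize it by rotations.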



Theorem 4.1. Let G be a graph with degree sequence $s = (d_1, d_2, \dots, d_n)$. Let k be a positive integer and suppose that $n = lk + t$ with $t < k$. Let C be a partition of V(G) such that $C = \bigcup_{j=1}^{l} C_j$, $|C_j| = k$ for all $j < l$ and $|C_l| = k + t$. Then the sequence $s^* = (a_1^k, a_2^k, \dots, a_{l-1}^k, (a_l + e + 1)^{k+t})$ or $s^* = (a_1^k, a_2^k, \dots, a_{l-1}^k, (a_l + e + 1 \pm 1)^{k+t})$ is graphic.


Proof. Let G be a graph with degree sequence $s = (d_1, d_2, \dots, d_n)$. From this sequence, by passing all the residues $r_j$ to $C_l$ by means of rotations, it is possible to obtain a graph $G'$ such that $\sum_{v_i\in C_j} d'_i = a_j|C_j|$ for $j < l$ and $\sum_{v_i\in C_l} d'_i = a_l|C_l| + \sum_{1\le j\le l} r_j$.



Then, by Lemma 3.1 we obtain the sequence $s^* = (a_1^k, a_2^k, \dots, a_{l-1}^k, (a_l + e + 1)^{r'_l}, (a_l + e)^{k+t-r'_l})$. If $k + t - r'_l$ is even, i.e. $k + t - r'_l = 2h$, and $a_j = a_l$ for all $j \ne l$, then we are in the case of Lemma 3.2. Hence we may suppose that $a_j > a_l$ for some $j < l$. Let $x, y \in C_l$; if $xy \notin E(G')$, then add xy to $E(G')$, i.e. let $G'' = G' + xy$. Otherwise, suppose that $xy \in E(G')$ for all $y \in C_l$. There exists $x' \in C_j$, $j \ne l$, such that $xx' \notin E(G')$, since otherwise $d'(x) = n - 1$, contradicting that $a_j > a_l$ for some $j < l$. Also, there is $y' \in N(x')$ such that $yy' \notin E(G')$, since otherwise $d'(y) > d'(x')$. In this case let $G'' = G' - x'y' + xx' + yy'$. Therefore, in both cases, $G''$ is a graph with the same degree sequence as $G'$ except for $d''(x) = d'(x) + 1$ and $d''(y) = d'(y) + 1$. This procedure can be applied to h pairs of vertices in $C_l$, obtaining a graph with degree sequence $(a_1^k, a_2^k, \dots, a_{l-1}^k, (a_l + e + 1)^{k+t})$. If $k + t - r'_l$ is odd, then by adding or deleting $k + t - r'_l$ edges of $G[C_l]$ we reduce to the first case, in which the remainder is even. Hence, in this case we may also obtain the sequences $s^* = (a_1^k, a_2^k, \dots, a_{l-1}^k, (a_l + e)^{k+t})$ and $s^* = (a_1^k, a_2^k, \dots, a_{l-1}^k, (a_l + e + 2)^{k+t})$. □

In order to guarantee that a better approximation can be given, we impose a condition on the maximum degree ∆.

Theorem 4.2. Let G be a graph with degree sequence $s = (d_1, d_2, \dots, d_n)$ and maximum degree ∆, as in Theorem 4.1. Suppose that $R = \sum_{1\le j\le l} r_j$ is even. If $\Delta^2 < |E(G \setminus C_l)| - 3R/2 - 1$, then $s^* = (a_1^k, a_2^k, \dots, a_l^{k+t})$ is graphic.

Proof. From Theorem 4.1, we obtain a graph $G'$ with degree sequence $s^* = (a_1^k, a_2^k, \dots, a_{l-1}^k, (a_l + e + 1)^{k+t})$ or $s^* = (a_1^k, a_2^k, \dots, a_{l-1}^k, (a_l + e + 1 \pm 1)^{k+t})$. Note that $G'$ is such that $|E(G' \setminus C_l)| = |E(G \setminus C_l)| - R$. Let $u, v \in C_l$ be such that there are edges $uu'$ and $vv'$ from $C_l$ to $G' \setminus C_l$. If $u'v'$ is not an edge of $E(G' \setminus C_l)$, then we can complement along the alternating path $uu'v'v$ in $G'$, decreasing the degrees of u and v by one and decreasing R by two, that is, $R' = R - 2$. Therefore we may suppose that $u'v' \in E(G' \setminus C_l)$. To find an alternating odd path, we need an edge $xy \in E(G' \setminus C_l)$ such that $u'x \notin E(G' \setminus C_l)$ and $v'y \notin E(G' \setminus C_l)$; then $G'$ contains the alternating path $uu'xyv'v$. Assuming that no edge $xy \in E(G' \setminus C_l)$ yields such an alternating path in $G'$, we get three cases: the edge xy has an endpoint that belongs to $N(u') \cap N(v')$, both endpoints of xy belong to $N(u') \setminus N(v')$, or both endpoints belong to $N(v') \setminus N(u')$. Any edge not of one of these types belongs to an alternating odd path from u to v. We denote $L = |N(u') \cap N(v')|$ and $M = \Delta - L$. Therefore the maximum number of edges of these three types that $G'$ can have is
$$1 + 2\,\frac{M(M+1)}{2} + \Delta L = 1 + M^2 + M + \Delta^2 - \Delta M.$$
The maximum of this function is attained when $M = \frac{\Delta - 1}{2}$, and making the substitution we obtain
$$1 + \frac{\Delta^2 - 2\Delta + 1}{4} + \frac{\Delta - 1}{2} + \Delta^2 - \Delta\,\frac{\Delta - 1}{2} \le \Delta^2 + 1.$$
That is, if $E(G' \setminus C_l)$ has more than $\Delta^2 + 1$ edges, we can find an alternating odd path from u to v. Since, by hypothesis, $\Delta^2 + 1 < |E(G \setminus C_l)| - 3R/2 = |E(G' \setminus C_l)| - R/2$, there is an edge $wz \in E(G' \setminus C_l)$ such that $u'w, v'z \notin E(G')$; therefore, by taking complements along the path $uu'wzv'v$ we remove two edges incident with $C_l$ and keep the degrees of the remaining vertices the same. We may take complements along R/2 such alternating paths, removing R edges that were incident with $C_l$, which yields that $s^* = (a_1^k, a_2^k, \dots, a_l^{k+t})$ is graphic. □
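The basic move in this proof (and in the proof of Theorem 4.3) is taking complements along an alternating path: its edges are deleted and its non-edges are added. The sketch below, on an adjacency-set representation chosen for illustration, assumes the path has distinct vertices and starts with an edge; for an odd-length path the two end vertices each lose one degree and every inner vertex keeps its degree. The vertex labels mirror the path $uu'xyv'v$ and are hypothetical.

```python
from collections import defaultdict


def make_graph(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def complement_along(adj, path):
    """Take complements along an alternating path p_0 p_1 ... p_m whose pairs
    p_0p_1, p_2p_3, ... are edges and p_1p_2, p_3p_4, ... are non-edges."""
    pairs = list(zip(path, path[1:]))
    for i, (u, v) in enumerate(pairs):          # validate before changing anything
        if (v in adj[u]) != (i % 2 == 0):
            raise ValueError("path does not alternate edge / non-edge")
    for i, (u, v) in enumerate(pairs):
        if i % 2 == 0:                          # edge: delete it
            adj[u].discard(v)
            adj[v].discard(u)
        else:                                   # non-edge: add it
            adj[u].add(v)
            adj[v].add(u)


# uu', xy and v'v are edges, while u'x and yv' are not ("u1" stands for u').
g = make_graph([("u", "u1"), ("x", "y"), ("v1", "v")])
before = {w: len(g[w]) for w in ["u", "u1", "x", "y", "v1", "v"]}
complement_along(g, ["u", "u1", "x", "y", "v1", "v"])
after = {w: len(g[w]) for w in before}
print(before)
print(after)
```

Only the degrees of the end vertices u and v drop by one, which is exactly how the proof removes two edges incident with $C_l$ per path.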

Another good approximation can be given using the following lemma, cf. [10].

Lemma 4.1 ([10]). Let G be an n-vertex graph with distinguished vertices s and t (not necessarily distinct), and suppose that the set of vertices adjacent to s is equal to the set of vertices adjacent to t. Suppose also that δ and ∆ are natural numbers such that the degrees of all vertices in G other than s and t lie in the range [δ, ∆], and such that s and t themselves have degree at least δ + 1. If $(\Delta - \delta + 1)^2 \le 4\delta(n - \Delta - 1)$, then there exists an edge-disjoint alternating path in G that starts at s, ends at t, and has length 1, 3, 5, or 7.

This lemma can be used to obtain a condition for a degree sequence to be graphic.

Theorem 4.3. Let $s = (d_1, d_2, \dots, d_n)$ be a degree sequence such that $(d_1 - d_n + 1)^2 \le 4d_n(n - d_1 - 1)$ and $\sum_{i=1}^{n} d_i$ is even. Then s is graphic.

Proof. The sequence $s'' = ((n-1)^n)$ is the degree sequence of the complete graph H on n vertices. By Corollary 3.1, taking $C = V(H)$, if $(n - 1 - t)n$ is even then the sequence $\hat s = (t^n)$ is graphic; hence $\hat s = (d_1^n)$ or $\hat s = ((d_1 + 1)^n)$ is graphic, depending on whether $(n - 1 - d_1)n$ or $(n - 1 - d_1 - 1)n$ is even. Let $G'$ be the graph with degree sequence $\hat s$; we suppose that $\hat s = (d_1^n)$, the proof for the other case being similar. We denote $R = \sum_{i=1}^{n}(d_1 - d_i)$. Note that R is even, since $nd_1$ is even and $\sum_{1\le i\le n} d_i$ is even by hypothesis.

Since the maximum and minimum degrees of $G'$ are both $d_1$, and $d_1 \ge d_n$, we know that $(d_1 - d_1 + 1)^2 \le (d_1 - d_n + 1)^2 \le 4d_n(n - d_1 - 1) \le 4d_1(n - d_1 - 1)$, and we may apply Lemma 4.1. Hence there is an alternating odd path between any $v'_i$ and $v'_h$ in $V(G')$; taking vertices $v'_i$ and $v'_h$ of degree $d_1$ and complementing along such a path, we decrease the degrees of $v'_i$ and $v'_h$ by one.


As the sum $\sum_{1\le i\le n}(d'_i - d_i)$, where $d'_i$ denotes the current degree of $v'_i$, is even at each step, we can repeat this operation $R/2$ times, for all vertices $v_j$ whose degree is greater than $d_j$, for all $j \le n$. Note that at each step the condition of Lemma 4.1 holds for the corresponding graph. At the end we obtain a graph G with degree sequence $s = (d_1, d_2, \dots, d_n)$, as desired. □
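The condition of Theorem 4.3 needs only n, $d_1$, $d_n$ and the parity of the sum. The sketch below evaluates it and, as an independent sanity check rather than part of the paper's method, compares it with the Erdős–Gallai characterization of graphic sequences on every non-increasing sequence with n ≤ 7. The function names are illustrative.

```python
from itertools import combinations_with_replacement


def satisfies_theorem_43(s):
    """Sufficient condition of Theorem 4.3: even sum and
    (d_1 - d_n + 1)^2 <= 4 d_n (n - d_1 - 1)."""
    s = sorted(s, reverse=True)
    n, d1, dn = len(s), s[0], s[-1]
    return sum(s) % 2 == 0 and (d1 - dn + 1) ** 2 <= 4 * dn * (n - d1 - 1)


def erdos_gallai(s):
    """Exact graphicality test (Erdős–Gallai) for non-negative integer sequences."""
    s = sorted(s, reverse=True)
    n = len(s)
    if sum(s) % 2:
        return False
    for k in range(1, n + 1):
        lhs = sum(s[:k])
        rhs = k * (k - 1) + sum(min(d, k) for d in s[k:])
        if lhs > rhs:
            return False
    return True


# Brute-force check: every small sequence passing the test is graphic.
for n in range(1, 8):
    for seq in combinations_with_replacement(range(n), n):
        if satisfies_theorem_43(seq):
            assert erdos_gallai(seq), seq
print(satisfies_theorem_43([2, 2, 1, 1]), satisfies_theorem_43([3, 3, 3, 3]))
```

The condition is only sufficient: (3, 3, 3, 3), the degree sequence of K_4, is graphic but fails it because n − d_1 − 1 = 0.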



As an immediate consequence of this theorem, we have a condition that may be checked in constant time to know in advance whether the anonymized degree sequence is graphic.

Corollary 4.1. Let G be a graph with degree sequence $s = (d_1, d_2, \dots, d_n)$. Let C be any partition of V(G) such that $C = \bigcup_{j=1}^{l} C_j$, $|C_j| = k_j$. Suppose that $\sum_{1\le j\le l} k_j a_j$ is even. If $(a_1 - a_l + 1)^2 \le 4a_l(n - a_1 - 1)$, then the sequence $s^* = (a_1^{k_1}, a_2^{k_2}, \dots, a_{l-1}^{k_{l-1}}, a_l^{k_l})$ is graphic.

This approximation works for any partition C of the vertices of a graph G, in particular when the subsets of the partition C have between k and 2k − 1 elements, which are the sizes of the subsets in an optimal partition with respect to the metric δ2. An optimal partition is a partition $C = \bigcup_{j=1}^{l} C_j$, $|C_j| = k_j$, such that the corresponding anonymized degree sequence $s_{OPT} = (a_1^{k_1}, a_2^{k_2}, \dots, a_{l-1}^{k_{l-1}}, a_l^{k_l})$ satisfies $\delta_2(s, s_{OPT}) \le \delta_2(s, s')$ for all k-anonymous degree sequences $s'$. Using the algorithm of [8], an optimal partition with respect to the metric δ2 can be found. Hence, if the inequality $(a_1 - a_l + 1)^2 \le 4a_l(n - a_1 - 1)$ holds for the arithmetic means $a_1$ and $a_l$ in a given optimal partition, we can guarantee optimal k-anonymity of a given graph G in terms of the distance δ2, or an approximation that is at distance at most $4k^2$ from the optimal.

Theorem 4.4. Let G be a graph with degree sequence $s = (d_1, d_2, \dots, d_n)$. Let k be an integer, and let the optimal partition $C = \bigcup_{j=1}^{l} C_j$ for approximating $(d_1, d_2, \dots, d_n)$ with the arithmetic means be such that $(a_1 - a_l + 1)^2 \le 4a_l(n - a_1 - 1)$. Then there is a k-anonymous degree sequence $s^*$ such that $\delta_2(s^*, s) < \delta_2(s, s_{OPT}) + 4k^2$.

Proof. Let G be a graph with degree sequence $s = (d_1, d_2, \dots, d_n)$, and let $C = \bigcup_{j=1}^{l} C_j$ be the optimal partition, with $|C_j| = k_j$ and $k \le k_j \le 2k - 1$ for all j. If $\sum_{1\le j\le l} k_j a_j$ is even, then by Corollary 4.1 the degree sequence $s_{OPT} = (a_1^{k_1}, a_2^{k_2}, \dots, a_{l-1}^{k_{l-1}}, a_l^{k_l})$ is graphic. If $\sum_{1\le j\le l} k_j a_j$ is odd, then there are $r_j$ and $k_t$ that are odd. Recall that $r_j = \sum_{x\in C_j} d(x) - a_j|C_j|$. If $j < t$, rotate an edge $wv_j$ to $wv_t$; if $t < j$, rotate an edge $wv_t$ to $wv_j$. In both cases we obtain a graph in which, for some t, $k_t$, $r_t$ and $R = \sum_{1\le j\le l} r_j$ are odd. Therefore, by adding $k_t - r_t$ or deleting $k_t + r_t$ edges we obtain a graph in which R is even; hence $s^* = (a_1^{k_1}, a_2^{k_2}, \dots, (a_t \pm 1)^{k_t}, \dots, a_{l-1}^{k_{l-1}}, a_l^{k_l})$ is graphic. In this case the distance is $\delta_2(s^*, s) = \delta_2(s, s_{OPT}) + k_t^2$. Hence, as $k_t < 2k$, we obtain $\delta_2(s^*, s) < \delta_2(s, s_{OPT}) + 4k^2$. □
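The sequence-level part of this construction can be sketched in Python. The partition step below is the standard dynamic program for optimal univariate microaggregation (contiguous groups of k to 2k − 1 sorted values minimizing the within-group sum of squared deviations), written in the spirit of [8] rather than as a transcription of it. The lower means $a_j$ are then taken and, if $\sum k_j a_j$ is odd, one odd-sized block is moved to $a_t \pm 1$; choosing the cheaper of these options is an assumption of this sketch. It only builds the sequence; graphicity is what Corollary 4.1 and Theorem 4.4 guarantee under the stated inequality, and the code does not check it.

```python
from fractions import Fraction


def optimal_blocks(s, k):
    """Contiguous partition of a non-increasing s into groups of k..2k-1
    elements minimising the total within-group squared deviation."""
    n = len(s)
    if n < k:
        raise ValueError("need len(s) >= k")
    pre, pre2 = [0], [0]
    for x in s:
        pre.append(pre[-1] + x)
        pre2.append(pre2[-1] + x * x)

    def sse(i, j):  # exact squared deviation of s[i:j] from its mean
        total = pre[j] - pre[i]
        return Fraction(pre2[j] - pre2[i]) - Fraction(total * total, j - i)

    best, cut = [None] * (n + 1), [0] * (n + 1)
    best[0] = Fraction(0)
    for j in range(k, n + 1):
        for g in range(k, min(2 * k - 1, j) + 1):
            i = j - g
            if best[i] is not None:
                cost = best[i] + sse(i, j)
                if best[j] is None or cost < best[j]:
                    best[j], cut[j] = cost, i
    blocks, j = [], n
    while j > 0:
        blocks.append(s[cut[j]:j])
        j = cut[j]
    return blocks[::-1]


def expand(means, blocks):
    return [m for m, b in zip(means, blocks) for _ in b]


def sq_dist(s, t):
    return sum((x - y) ** 2 for x, y in zip(s, t))


def k_degree_anonymous_sequence(s, k):
    s = sorted(s, reverse=True)
    blocks = optimal_blocks(s, k)
    means = [sum(b) // len(b) for b in blocks]          # lower means a_j
    if sum(m * len(b) for m, b in zip(means, blocks)) % 2:
        # Parity repair as in Theorem 4.4: shift one odd-sized block by +-1.
        options = []
        for t, b in enumerate(blocks):
            for step in (1, -1):
                if len(b) % 2 and means[t] + step >= 0:
                    options.append(means[:t] + [means[t] + step] + means[t + 1:])
        means = min(options, key=lambda ms: sq_dist(s, expand(ms, blocks)))
    return expand(means, blocks)


print(k_degree_anonymous_sequence([5, 4, 4, 3, 3, 2, 2, 1, 1, 1], 3))
```

For this hypothetical input the optimal blocks are [5, 4, 4], [3, 3, 2, 2] and [1, 1, 1], whose lower means give an odd sum (23); raising the first block to 5 is the cheapest repair and yields [5, 5, 5, 2, 2, 2, 2, 1, 1, 1].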

5. Algorithms

In this section we sketch three algorithms for obtaining the approximations studied in Sections 3 and 4. The third one, the algorithm for k-degree anonymity, obtains a k-degree anonymous sequence for a given degree sequence according to Theorem 4.4.

Algorithm 1 (for Lemma 3.1)
begin
  Order C such that its degrees are non-increasing; denote it C = c_1, ..., c_h.
  f = 1; l = h;
  while (f < l) do
    while (d(c_f) > a and d(c_l) < a) do
      find w ∈ N(c_f) \ N(c_l) with w ≠ c_l, and rotate the edge c_f w to w c_l.
    end while
    if d(c_f) = a let f = f + 1.
    if d(c_l) = a let l = l − 1.
  end while
end

Algorithm 2 (for Theorem 3.1)
begin
  Delete |X_1| edges between the vertices in C_1.
  Find |X̃| = (1/2) Σ_{v_i ∈ C_1} (d̃_i − m) alternating paths of length 3 with both endpoints in C_1, denoted u_i v_i w_i x_i.
  Delete the edges u_i v_i and rotate the edges w_i x_i to w_i v_i.
  Apply Algorithm 1 to this graph.
end
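Algorithm 1 translates directly into Python on an adjacency-set representation. In this sketch the representation and the names `make_graph`, `rotate` and `algorithm1` are illustrative assumptions. The pivot w is a neighbour of $c_f$ that is neither $c_l$ nor adjacent to $c_l$; by the counting argument in the proof of Lemma 3.1 such a w exists whenever $d(c_f) > a > d(c_l)$.

```python
from collections import defaultdict


def make_graph(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def rotate(adj, pivot, old, new):
    """Edge rotation around `pivot`: replace the edge pivot-old by pivot-new."""
    adj[pivot].remove(old)
    adj[old].remove(pivot)
    adj[pivot].add(new)
    adj[new].add(pivot)


def algorithm1(adj, C):
    """Give every vertex of C the degree a = (degree sum of C) / |C|.
    Returns a and the number of rotations performed."""
    total = sum(len(adj[v]) for v in C)
    if total % len(C):
        raise ValueError("Lemma 3.1 needs the degree sum of C to equal a|C|")
    a = total // len(C)
    c = sorted(C, key=lambda v: len(adj[v]), reverse=True)
    f, l, rotations = 0, len(c) - 1, 0
    while f < l:
        while len(adj[c[f]]) > a and len(adj[c[l]]) < a:
            # w is adjacent to c_f, and is neither c_l nor adjacent to c_l
            w = next(x for x in adj[c[f]] if x != c[l] and x not in adj[c[l]])
            rotate(adj, w, c[f], c[l])  # d(c_f) - 1, d(c_l) + 1, d(w) unchanged
            rotations += 1
        if len(adj[c[f]]) == a:
            f += 1
        if len(adj[c[l]]) == a:
            l -= 1
    return a, rotations


g = make_graph([(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)])
a, rotations = algorithm1(g, list(range(6)))
print(a, rotations, sorted(len(g[v]) for v in range(6)))
```

In the example the degrees (5, 2, 2, 1, 1, 1) have mean a = 2 and $\frac{1}{2}\sum|d_i - a| = 3$, so the sketch performs three rotations and ends with a 2-regular graph, as Lemma 3.1 predicts.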


Algorithm for k-degree anonymity (for Theorem 4.4)
begin
  Find the optimal partition C of V(G), using e.g. the optimal microaggregation in [8].
  Find and take complements along R/2 alternating paths in order to delete all r_j.
  Apply Algorithm 1 to each C_j.
end

6. Summary

In this paper, given a graph G of order n and degree sequence s, we answered the question of which regular graph G′ of the same order best approximates G under three different metrics: the edge rotation distance for graphs and the rectilinear and Euclidean distances for degree sequences. We presented a condition for a degree sequence to be graphic that depends only on its length and its maximum and minimum degrees. We obtained an approximation of the k-degree anonymity problem when the subsets are all of size k except one. We also obtained an optimal approximation for subsets of any size greater than k, or otherwise an approximation within distance 4k² of the optimal with the metric δ2 when the degree sequence for the optimal partition is not graphic. We outlined three algorithms based on these results that can be used in data privacy for graphs.

Acknowledgements

Partial support by the Spanish MEC projects ARES (CONSOLIDER INGENIO 2010 CSD2007-00004), ICWT (TIN2012-32757) and COPRIVACY (TIN2011-27076-C03-03) is acknowledged. Partial support of the European Project DwB (Grant Agreement Number 262608) is also acknowledged.

References

[1] L. Backstrom, C. Dwork, J.M. Kleinberg, Wherefore art thou R3579X?: Anonymized social networks, hidden patterns, and structural steganography, in: Proceedings of the 16th International Conference on World Wide Web (WWW’07), Alberta, Canada, May 2007, pp. 181–190.
[2] V. Baláž, V. Kvasníčka, J. Pospíchal, Two metrics in a graph theory modeling of organic chemistry, Discrete Appl. Math. 35 (1992) 1–19.
[3] G. Chartrand, F. Saba, H. Zou, Edge rotations and distance between graphs, Časopis Pěst. Mat. 110 (1985) 87–91.
[4] S. Chester, B.M. Kapron, G. Ramesh, G. Srivastava, A. Thomo, S. Venkatesh, Why Waldo befriended the dummy? k-Anonymization of social networks with pseudo-nodes, Soc. Netw. Anal. Min. 3 (3) (2013) 381–399.
[5] R.J. Faudree, R.H. Schelp, L. Lesniak, A. Gyárfás, J. Lehel, On the rotation distance of graphs, Discrete Math. 126 (1–3) (1994) 121–135.
[6] C. Gini, Le Medie, Unione Tipografico-Editrice Torinese, 1958, p. 168.
[7] W. Goddard, H.C. Swart, Distances between graphs under edge operations, Discrete Math. 161 (1996) 121–132.
[8] S.L. Hansen, S. Mukherjee, A polynomial algorithm for optimal univariate microaggregation, IEEE Trans. Knowl. Data Eng. 15 (4) (2003) 1043–1044.
[9] M. Hay, G. Miklau, D. Jensen, P. Weis, S. Srivastava, Anonymizing social networks, Tech. Rep., University of Massachusetts Amherst, 2007.
[10] M. Jerrum, B.D. McKay, A. Sinclair, When is a graphical sequence stable?, in: A. Frieze, T. Luczak (Eds.), Random Graphs, vol. 2, Wiley-Interscience, 1992.
[11] M.A. Johnson, Relating metrics, lines and variables defined on the space of graphs, in: Y. Alavi, G. Chartrand, L. Lesniak, C. Wall (Eds.), Graph Theory and its Application to Algorithms and Computer Science, Wiley, New York, 1985, pp. 457–470.
[12] K. Liu, E. Terzi, Towards identity anonymization on graphs, in: SIGMOD Conference, 2008, pp. 93–106.
[13] D. Pagliuca, G. Seri, Some results of individual ranking method on the system of enterprise accounts annual survey, Esprit SDC Project, Deliverable MI-3/D2, 1999.
[14] P. Samarati, Protecting respondents identities in microdata release, IEEE Trans. Knowl. Data Eng. 13 (6) (2001) 1010–1027.
[15] K. Stokes, V. Torra, Reidentification and k-anonymity: a model for disclosure risk in graphs, Soft Comput. 16 (10) (2012) 1657–1670.
[16] L. Sweeney, k-anonymity: a model for protecting privacy, Internat. J. Uncertain. Fuzziness Knowledge-Based Systems 10 (5) (2002) 557–570.
[17] H. Wu, J. Zhang, J. Yang, B. Wang, S. Li, A clustering bipartite graph anonymous method for social networks, J. Inf. Comput. Sci. 10 (18) (2013) 6031–6040.
[18] B. Zelinka, On a certain distance between isomorphism classes of graphs, Časopis Pěst. Mat. 100 (1975) 371–373.