The Journal of Systems and Software 86 (2013) 677–682. http://dx.doi.org/10.1016/j.jss.2012.10.901

Optimal univariate microaggregation with data suppression

Michael Laszlo*, Sumitra Mukherjee
Graduate School of Computer and Information Sciences, Nova Southeastern University, Fort Lauderdale, FL, United States
* Corresponding author. E-mail address: [email protected] (M. Laszlo).

Article history: Received 7 February 2012; Received in revised form 4 October 2012; Accepted 27 October 2012; Available online 12 November 2012.

Keywords: Data security; Privacy; Disclosure control; Microaggregation; Dynamic programming

Abstract: Microaggregation is a disclosure limitation method that provides security through k-anonymity by modifying data before release but does not allow suppression of data. We define the microaggregation problem with suppression (MPS) to accommodate data suppression, and present a polynomial-time algorithm, based on dynamic programming, for optimal univariate microaggregation with suppression. Experimental results demonstrate the practical benefits of suppressing a few carefully selected data points during microaggregation using our method.

© 2012 Elsevier Inc. All rights reserved.
1. Microaggregation problem with suppression

Agencies that provide data for statistical use employ disclosure limitation methods to protect the privacy of data subjects. These methods aim to reduce the risk of disclosure by either suppressing data or modifying data before release (see Adam and Wortmann, 1989; Duncan et al., 2011; Willenborg and De Waal, 2000 for a survey of such methods).

Microaggregation is a disclosure limitation method for protecting sensitive data through data modification. Under microaggregation, records are partitioned into groups that contain no fewer than a specified number k of records, and the actual attribute values of the records within each group are replaced by a common representative value prior to release. The group size restriction may help provide anonymity by ensuring that each data subject is indistinguishable from at least k − 1 other data subjects based on the released data (see Aggarwal et al., 2005; Ghinita et al., 2007; Samarati, 2001; Solanas et al., 2010; Sweeney, 2002; Templ and Meindl, 2008, 2010 for details on privacy protection through anonymization). Increasing the minimum group size k reduces disclosure risk at the expense of information loss due to data modification. The goal in microaggregation is to group similar records together so as to minimize information loss. Issues related to disclosure risks and information loss under microaggregation are investigated in Nin and Torra (2009) and Templ and Meindl (2008, 2010).

The focus of this paper is on a common version of the microaggregation problem (see e.g. Chang et al., 2007; Domingo-Ferrer et al., 2006, 2008;
Domingo-Ferrer and Mateo-Sanz, 2002; Hansen and Mukherjee, 2003; Heaton and Mukherjee, 2011; Kokolakis and Fouskakis, 2009; Laszlo and Mukherjee, 2005, 2009; Martínez-Ballesté et al., 2007; Oganian and Domingo-Ferrer, 2001; Solanas et al., 2010; Solanas and Martínez-Ballesté, 2006) where the attributes are numeric, actual attribute values are replaced by group means, and the sum of the squared Euclidean distances of the records from the group means they are replaced by is taken as the measure of information loss.

The microaggregation problem (MP) may be formally stated as follows: a data set with d numerical attributes is modeled as a set X of points in R^d. A k-partition Pk(X) is a partition of X in which every group contains at least k elements; that is, Pk(X) = {Ci} where each |Ci| ≥ k, the groups are pairwise disjoint (Ci ∩ Cj = ∅ for i ≠ j), and their union is X. The information loss for a group Ci is given by

$$SSE(C_i) = \sum_{x \in C_i} \lVert x - \bar{C}_i \rVert^2, \qquad \bar{C}_i = |C_i|^{-1} \sum_{x \in C_i} x,$$

where the group mean vector \bar{C}_i replaces the actual attribute values within the group. Given X and k, the microaggregation problem seeks an optimal k-partition Pk*(X) that minimizes the information loss

$$SSE(P_k(X)) = \sum_{C_i \in P_k(X)} SSE(C_i)$$

over all k-partitions of X.

A few data points may have a disproportionate influence on the partitions formed during microaggregation, thereby resulting in unacceptably high information loss. Removing a few judiciously selected points could significantly decrease the SSE in the remaining modified records that are released. We formulate the microaggregation problem with suppression (MPS) as a generalization of MP that allows the suppression of up to p points: given a point set X, a group size threshold k, and a suppression threshold p, the goal is to find a subset of suppressed points S ⊆ X with |S| ≤ p, together with a k-partition of the remaining points X − S, so as to minimize SSE(Pk*(X − S)).
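To make the information-loss measure concrete, the following short Java sketch (ours, purely illustrative; the class and method names are not from the paper) computes SSE(C) for a single group and the total SSE of a k-partition of univariate values.

    import java.util.List;

    // Illustrative sketch: SSE of one group and total information loss of a partition,
    // as defined above, specialized to the univariate case.
    public final class InformationLoss {

        /** Sum of squared deviations of a group's values from the group mean. */
        static double groupSse(List<Double> group) {
            double mean = group.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
            return group.stream().mapToDouble(v -> (v - mean) * (v - mean)).sum();
        }

        /** SSE of a k-partition: the sum of the per-group SSE values. */
        static double partitionSse(List<List<Double>> partition) {
            return partition.stream().mapToDouble(InformationLoss::groupSse).sum();
        }

        public static void main(String[] args) {
            // Toy 2-partition of {1, 2, 3, 8, 9} with k = 2: groups {1, 2, 3} and {8, 9}.
            List<List<Double>> partition = List.of(List.of(1.0, 2.0, 3.0), List.of(8.0, 9.0));
            System.out.println(partitionSse(partition)); // 2.0 + 0.5 = 2.5
        }
    }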
2. Related work

Standard microaggregation (MP) is known to be NP-hard (Oganian and Domingo-Ferrer, 2001), and MPS is at least as hard as MP insofar as it must also identify a subset S ⊆ X of points to be suppressed (i.e., MP reduces to MPS with suppression threshold p = 0). Several heuristic methods have been proposed to identify good k-partitions. These methods may be classified into two broad classes: fixed-size microaggregation and data-oriented microaggregation. Fixed-size microaggregation heuristics partition the points so that each group contains exactly k points. Extant fixed-size microaggregation methods are described in Chang et al. (2007), Defays and Anwar (1995), Domingo-Ferrer and Mateo-Sanz (2002) and Kokolakis and Fouskakis (2009). Data-oriented microaggregation methods form groups of variable size. By relaxing the constraint that every group must contain exactly k points, these methods potentially yield lower information loss. Data-oriented microaggregation heuristics have been proposed in Domingo-Ferrer et al. (2006), Domingo-Ferrer and Mateo-Sanz (2002), Domingo-Ferrer et al. (2008), Laszlo and Mukherjee (2005), Martínez-Ballesté et al. (2007), Solanas et al. (2010) and Solanas and Martínez-Ballesté (2006). Under such methods valid groups contain between k and 2k − 1 points. These limits on group size follow because groups of size less than k are forbidden, and any group of size greater than 2k − 1 can be partitioned into two groups of size at least k with a lower total sum of squared errors. Comparisons of microaggregation methods are presented in Domingo-Ferrer and Torra (2001) and Nin and Torra (2009). Approximation guarantees provided by microaggregation heuristics have been obtained in Domingo-Ferrer et al. (2008) and Laszlo and Mukherjee (2009).

While the multivariate microaggregation problem is NP-hard, it has been shown that in the univariate case an optimal partition corresponds to a shortest path in a network constructed on the sorted points (Hansen and Mukherjee, 2003). Domingo-Ferrer et al. (2006) observe that the shortest-path method proposed in Hansen and Mukherjee (2003) yields the best k-partition consistent with a given ordering of points in multidimensional space. They propose good heuristics for ordering points and demonstrate that this approach yields better solutions than extant heuristics. Other studies (see Heaton and Mukherjee, 2011) provide corroborating evidence that the exact univariate method, when applied to a good ordering of points in multidimensional space, yields k-partitions comparable to the best available microaggregation heuristics. This motivates our focus on univariate MPS (UMPS) in this paper. We believe that UMPS will provide a foundation for devising good heuristics for multidimensional microaggregation with suppression along similar lines as Domingo-Ferrer et al. (2006) and Heaton and Mukherjee (2011).

3. Our contributions

This paper is the first to accommodate data suppression in microaggregation (MP) and to formulate the generalized version MPS. Univariate MP (i.e., without suppression) is solvable in polynomial time (Hansen and Mukherjee, 2003), and we extend this result to univariate MPS (i.e., with suppression). Our main contributions are to show that UMPS is solvable in polynomial time, to present an efficient exact algorithm, and to demonstrate how practitioners might use our approach to decide whether to suppress points when implementing microaggregation. Section 4 presents our algorithm, based on dynamic programming, for identifying optimal k-partitions. Section 5 provides a complexity analysis of the algorithm. Section 6 demonstrates the applicability of our method, and Section 7 concludes with some observations on our contributions.
4. Univariate microaggregation with suppression

For notational convenience, we assume that the values X = {x1, x2, . . ., xn} are indexed in nondecreasing order, that is, xi ≤ xj if i < j. We use Xi,j ⊆ X to denote the set Xi,j = {xl | i ≤ l ≤ j}; note that Xi,j = ∅ if j < i. For any nonempty set C ⊆ X, the open convex hull of C is defined by H(C) = {x ∈ R | min(C) < x < max(C)}. A partition is said to have the consecutiveness property if for any pair of distinct groups Ci and Cj of the partition, H(Ci) ∩ H(Cj) = ∅. A partition has the strong consecutiveness property if the consecutiveness property holds and H(Cj) ∩ S = ∅ for all j, where S is the set of suppressed values. Intuitively, a partition has consecutiveness if its groups occupy nonoverlapping intervals of the real number line, and strong consecutiveness if, in addition, suppressed values appear only between successive groups and never within a group.

Theorem 1. An optimal partition with suppression has the strong consecutiveness property.

Proof. It is established in Domingo-Ferrer and Mateo-Sanz (2002) and Hansen and Mukherjee (2003) that optimal partitions have the consecutiveness property, so it suffices to show that no suppressed points lie in the open convex hull of any group. Let C be a group in a partition with mean μ, |C| = n, l = min(C), and u = max(C). Without loss of generality, let s be a suppressed value such that l < s ≤ μ < u (the following argument is analogous if μ ≤ s < u). We use the fact that the mean of (C − {l}) ∪ {s} is given by μ + n^{-1}(s − l). Then

$$\begin{aligned}
\Delta SSE &= SSE(C) - SSE\bigl((C - \{l\}) \cup \{s\}\bigr) \\
&= \Bigl[\sum_{x \in C - \{l\}} x^2 + l^2 - n\mu^2\Bigr] - \Bigl[\sum_{x \in C - \{l\}} x^2 + s^2 - n\bigl(\mu + n^{-1}(s - l)\bigr)^2\Bigr] \\
&= n\bigl[\bigl(\mu + n^{-1}(s - l)\bigr)^2 - \mu^2\bigr] - (s^2 - l^2) \\
&= (s - l)\bigl[2\mu + n^{-1}(s - l)\bigr] - (s - l)(s + l) \\
&= (s - l)\bigl[(\mu - s) + (\mu - l) + n^{-1}(s - l)\bigr] > 0,
\end{aligned}$$

since (μ − l) ≥ (s − l) > 0 and (μ − s) ≥ 0. It follows that in any optimal partition, every suppressed value s must lie between groups and not within the open convex hull of any group (i.e., l < s < u is excluded).

Due to strong consecutiveness, we can formulate a pair of mutually recursive relations that solves UMPS. In the following relations, F(i, r) is the cost (i.e., SSE) of an optimal solution of X1,i where at most r values of X1,i are suppressed. Accordingly, F(n, p) solves the original problem, yielding the cost of an optimal partition of X = X1,n where at most p values are suppressed:

$$F(i, r) = \min_{0 \le j \le \min\{i,\, r\}} \; G(i - j,\; r - j)$$

$$G(i, r) = \begin{cases}
0 & \text{if } i = 0 \\
\infty & \text{if } 0 < i < k \\
\min_{k \le j \le \min\{2k - 1,\, i\}} \bigl\{ SSE(X_{i-j+1,\,i}) + F(i - j,\; r) \bigr\} & \text{otherwise}
\end{cases}$$
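As a small illustration (our own toy instance, not from the experiments reported later), consider X = {1, 2, 3, 10} with k = 2 and p = 1. Without suppression, the best k-partition is {1, 2}, {3, 10}, so F(4, 0) = SSE({1, 2}) + SSE({3, 10}) = 0.5 + 24.5 = 25. With one suppression allowed, F(4, 1) = min{G(4, 1), G(3, 0)} = min{25, 2} = 2: the recurrence suppresses the rightmost value 10 and places the remaining values in the single group {1, 2, 3}, whose SSE is 2. Suppressing a single outlier thus reduces the optimal SSE from 25 to 2.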
Theorem 2. F(i, r) computes a minimum-cost partition of X1,i where up to r values are suppressed. Proof. We first observe that G(i, r) is the cost of an optimal partition of X1,i where at most r values are suppressed but the rightmost value xi of X1,i is included (i.e., not suppressed). This clearly holds when no values remain (i = 0) and when too few values remain to form a sufficiently large group (0 < i < k). In the general case, the
structure of an optimal k-partition of X1,i that includes xi comprises some group to which xi belongs together with the groups of an optimal k-partition (with up to r suppressions) of the remaining values. Theorem 1 implies that in any optimal solution, the group to which xi belongs comprises only rightmost points of X1,i. Since there always exists an optimal partition with groups of size between k and 2k − 1 inclusive, there are at most k groups to which xi can belong, hence at most k partitions conform to that structure: G(i, r) computes one of minimal SSE. Next, we observe that the structure of an optimal k-partition of X1,i with up to r suppressions is an optimal k-partition of X1,i−j that includes value xi−j, with the j rightmost values of X1,i suppressed, where 0 ≤ j ≤ min{i, r}. Hence F(i, r) computes a k-partition that conforms to that structure and is of minimal SSE.

An algorithm based on dynamic programming stores the solutions to subproblems in the (n + 1) × (p + 1) matrices F and G. Since G[i, r] depends on values of F belonging only to previous rows, and F[i, r] depends on values of G of row index no greater than i and column index no greater than r, F and G can be computed in row-major order provided G[i, r] is computed before F[i, r] for each pair of indexes i and r. The following code results:

UMPS(int n, int p) {
  for i = 0, ..., n
    for r = 0, ..., p {
      if (i = 0) G[i,r] ← 0;
      else if (i < k) G[i,r] ← ∞;
      else G[i,r] ← min j=k,...,min{2k−1,i} {SSE(Xi−j+1,i) + F[i−j, r]};
      F[i,r] ← min j=0,...,min{i,r} {G[i−j, r−j]};
    }
}

Although our focus has been on computing optimal costs, optimal partitions themselves can be recovered by maintaining subsolutions during the computation of F and G. The appendix explains how this is done.

5. Complexity

To avoid redundant calls to compute the SSE of groups, we precompute an (n − k + 1) × k table storing the SSE values of every contiguous sequence of values of size between k and 2k − 1, and then implement calls to the SSE function by lookup in this table. With regard to asymptotic running time, initialization of this SSE table takes O(nk²) time, and procedure UMPS runs in O(np·max(p, k)) time. This implies a total running time of O(nk² + np·max(p, k)). Under the reasonable assumption that k is small and p ∈ Ω(k), the algorithm runs in O(np²) time. The optimal univariate microaggregation algorithm (Hansen and Mukherjee, 2003) has complexity O(nk²). Hence, at marginally higher computational cost, it is possible to determine the impact of data suppression in univariate microaggregation while identifying the best values to suppress. Note that a single run yields optimal partitions for every r = 0, 1, . . ., p; the optimal SSE for suppression of up to r values is found in array element F[n, r]. On a problem instance with n = 10,000 data values, p = 50, and k = 20, our Java bytecode executes in about 2 s. Additionally, this analysis has assumed that the data values are passed to UMPS in nondecreasing order. Including the cost of sorting the data values amends the total complexity to O(max(n log n, nk² + np·max(p, k))) time.
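To make the dynamic program concrete, the following is a minimal, self-contained Java sketch of the procedure described above (our illustration, not the authors' implementation). It assumes the values are already sorted in nondecreasing order and, instead of the explicit (n − k + 1) × k SSE table, uses prefix sums of the values and of their squares so that the SSE of any contiguous run can be evaluated in constant time; the class and method names are ours.

    public final class UmpsSketch {

        /** F[i][r] = optimal SSE of the i smallest values (X1,i) with at most r suppressions. */
        public static double[][] solve(double[] x, int k, int p) {
            int n = x.length;
            // Prefix sums of values and of squared values: SSE of any contiguous run in O(1).
            double[] s = new double[n + 1];
            double[] s2 = new double[n + 1];
            for (int i = 1; i <= n; i++) {
                s[i] = s[i - 1] + x[i - 1];
                s2[i] = s2[i - 1] + x[i - 1] * x[i - 1];
            }
            final double INF = Double.POSITIVE_INFINITY;
            double[][] G = new double[n + 1][p + 1];
            double[][] F = new double[n + 1][p + 1];
            for (int i = 0; i <= n; i++) {
                for (int r = 0; r <= p; r++) {
                    if (i == 0) {
                        G[i][r] = 0.0;                 // empty prefix costs nothing
                    } else if (i < k) {
                        G[i][r] = INF;                 // too few values to form a group
                    } else {
                        // Rightmost group X[i-j+1..i] has size j, with k <= j <= min(2k-1, i).
                        double best = INF;
                        for (int j = k; j <= Math.min(2 * k - 1, i); j++) {
                            best = Math.min(best, sse(s, s2, i - j + 1, i) + F[i - j][r]);
                        }
                        G[i][r] = best;
                    }
                    // Suppress the j rightmost values of X1,i, with 0 <= j <= min(i, r).
                    double best = INF;
                    for (int j = 0; j <= Math.min(i, r); j++) {
                        best = Math.min(best, G[i - j][r - j]);
                    }
                    F[i][r] = best;
                }
            }
            return F;
        }

        /** SSE about the mean of the contiguous values x[a..b] (1-based, inclusive). */
        private static double sse(double[] s, double[] s2, int a, int b) {
            int m = b - a + 1;
            double sum = s[b] - s[a - 1];
            return (s2[b] - s2[a - 1]) - sum * sum / m;  // sum of squares minus m * mean^2
        }
    }

A call such as UmpsSketch.solve(x, k, p) returns the full matrix F, so, as noted above, a single run yields the optimal SSE for every suppression budget r = 0, 1, . . ., p in F[n][r].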
Fig. 1. Average RIL under different distributions.
Fig. 2. Number of suppressions required to achieve a desired reduction in information loss (mixture of normal distributions; x-axis: minimum group size k from 2 to 20; y-axis: number p of suppressions required; curves: 50%, 60%, 70%, and 80% reduction in IL).
6. Experimental results
We ran experiments using sets of n = 10,000 randomly generated values based on random variables with the following four distributions: uniform (between 0 and 1); normal (with μ = 0, σ = 1); log-normal (with the variable's natural logarithm having μ = 0, σ = 1); and a mixture of 3 equiprobable normal distributions with varying parameters (μ1 = 0, σ1 = 0.2; μ2 = 0.5, σ2 = 0.1; μ3 = 0.8, σ3 = 0.1). Thirty data sets were independently generated with each of these four distributions. UMPS was run on each data set with the group size threshold k varying from 2 to 20 (typical values of k used by data protection agencies range between 3 and 15) and the suppression threshold p varying from 0 to 50 (at most 0.5% of the n = 10,000 values are suppressed when p = 50). Consistent with previous studies, UMPS returns the suppressed set S and the normalized information loss IL(k, p) = SSE(Pk*(X − S))/SSE(X), that is, the ratio of the optimal SSE (in the released points) for given p and k to the SSE of the worst possible partition with no suppression (i.e., when all values belong to a single group). We define the
relative information loss RIL(k, p) = IL(k, p)/IL(k, 0) as the ratio of the IL in the released data (with up to p suppressions) to the IL that results under optimal microaggregation with no suppression; the lower the RIL, the greater the potential benefit of suppression.

Fig. 1 plots RIL averaged over the 30 runs for different values of k and p with the four distributions. The benefits of suppression during microaggregation are negligible when the values are uniformly distributed. For the remaining distributions, the suppression of just a few values significantly decreases RIL, but additional suppressions have minimal impact.

Our method allows a practitioner to decide whether suppressions are warranted for arbitrary distributions. Further, it helps determine the minimum number of suppressions sufficient to reduce the SSE of the unsuppressed points to a desired level. Fig. 2 plots the number of suppressions required to achieve a desired reduction in RIL in the released data when the data come from a mixture of normal distributions. For a given k, the percentage reduction in information loss achieved with p suppressions is computed as 100 × (1 − RIL(k, p)). For example, the figure shows that when the minimum group size is k = 5, suppressing only 2 values results in a 50% reduction; to attain a reduction of 80%, 7 values must be suppressed.

It is helpful to visualize the partitions that result as the point suppression threshold p varies. Fig. 3 depicts optimal solutions for a set of 48 real numbers drawn from a uniform distribution, where k = 4 and p ranges from 0 through 15. Each row p (zero-based indexing) depicts the optimal partition where at most p values are suppressed (so the first row, row 0, solves microaggregation without suppression). Successive groups of a partition are boxed alternately in light gray and dark gray; suppressed values are unboxed (shown on a white background). Fig. 4 depicts the analogous results for a set of 48 real numbers drawn from a normal distribution. As p increases, suppressed points tend to be drawn from the normal distribution's outliers, but not exclusively.

Fig. 3. Row p (zero-based indexing) depicts the solution for 48 points from a uniform distribution; k = 4.

Fig. 4. Row p (zero-based indexing) depicts the solution for 48 points from a normal distribution; k = 4.
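The quantities reported above can be reproduced along the following lines. This Java sketch is our reconstruction of the experimental procedure, not the authors' harness: it draws n values from the stated mixture of three normals, runs the dynamic program, and derives IL, RIL, and the percentage reduction from the optimal SSE values. The call to UmpsSketch.solve refers to the hypothetical sketch given after Section 5.

    import java.util.Arrays;
    import java.util.Random;

    public final class ExperimentSketch {

        public static void main(String[] args) {
            int n = 10_000, k = 5, p = 50;
            double[] means = {0.0, 0.5, 0.8};
            double[] sds = {0.2, 0.1, 0.1};
            Random rng = new Random(7);
            double[] x = new double[n];
            for (int i = 0; i < n; i++) {
                int c = rng.nextInt(3);                     // equiprobable mixture component
                x[i] = means[c] + sds[c] * rng.nextGaussian();
            }
            Arrays.sort(x);                                 // UMPS expects nondecreasing values

            double[][] F = UmpsSketch.solve(x, k, p);       // F[n][r] = optimal SSE with <= r suppressions
            double sseX = singleGroupSse(x);                // SSE(X): all values in one group
            double il0 = F[n][0] / sseX;                    // IL(k, 0): optimal microaggregation, no suppression
            for (int r = 0; r <= p; r++) {
                double il = F[n][r] / sseX;                 // IL(k, r)
                double ril = il / il0;                      // RIL(k, r)
                double reduction = 100.0 * (1.0 - ril);     // percentage reduction plotted in Fig. 2
                System.out.printf("r=%2d  IL=%.5f  RIL=%.3f  reduction=%.1f%%%n", r, il, ril, reduction);
            }
        }

        /** SSE of all values about their common mean (the worst-case single-group partition). */
        static double singleGroupSse(double[] x) {
            double sum = 0.0, sumSq = 0.0;
            for (double v : x) { sum += v; sumSq += v * v; }
            return sumSq - sum * sum / x.length;
        }
    }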
7. Conclusion

Microaggregation, a disclosure limitation method, seeks to partition data sets into sufficiently large groups while minimizing loss of information. By suppressing a small number of data values, it may be possible to significantly reduce the distortion in the released microaggregated data. Microaggregation with suppression seeks to minimize the SSE in the released data while suppressing no more than a bounded number of values specified by the suppression threshold p. This paper presents an efficient and uncomplicated algorithm, based on dynamic programming, for optimally solving univariate microaggregation with suppression (UMPS), extending the work of Hansen and Mukherjee (2003) on standard univariate microaggregation (i.e., without suppression). By inspecting a candidate set of points for suppression that our method identifies, a practitioner may decide whether suppressions are indeed warranted based on such considerations as the bias introduced into statistical estimates. At only marginally higher computational cost than standard univariate microaggregation, it is possible to determine the impact of data suppression while identifying the best values to be suppressed.

We are presently researching this method's use in heuristics for the NP-hard problem of microaggregating multivariate data with suppression: construct a linear ordering of the data points in space, and then apply the UMPS algorithm to this ordering. To illustrate how our method might be used for devising good heuristics for multidimensional microaggregation with suppression along similar lines as Domingo-Ferrer et al. (2006) and Heaton and Mukherjee (2011), we used the Tarragona dataset that has been used in several previous studies on microaggregation (Domingo-Ferrer et al., 2006; Domingo-Ferrer and Mateo-Sanz, 2002; Heaton and Mukherjee, 2011; Laszlo and Mukherjee, 2005). The data set contains 834 records with 13 numerical variables. The records are ordered using the nearest neighbor heuristic described in Domingo-Ferrer et al. (2006) as follows: start the order with a point farthest from the dataset's centroid, and then iteratively append, from the remaining points, the point nearest to the last appended point. UMPS is applied to the resulting set of ordered records. Fig. 5 displays the reduction in RIL as the number of suppressions increases, for k ranging from 2 to 20. Developing promising heuristic strategies for ordering multivariate points for microaggregation with suppression remains an open research challenge.

Fig. 5. Decrease in RIL with increasing number of suppressions (Tarragona data).
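The nearest neighbor ordering just described can be sketched as follows. This is our illustration of the heuristic attributed to Domingo-Ferrer et al. (2006), with class and method names of our own; the returned index order is what UMPS would then be applied to.

    import java.util.ArrayList;
    import java.util.List;

    public final class NearestNeighbourOrdering {

        /** Returns the row indices of 'points' in nearest-neighbour order. */
        static List<Integer> order(double[][] points) {
            int n = points.length, d = points[0].length;
            double[] centroid = new double[d];
            for (double[] pt : points)
                for (int j = 0; j < d; j++) centroid[j] += pt[j] / n;

            boolean[] used = new boolean[n];
            List<Integer> order = new ArrayList<>(n);
            int current = farthestFrom(centroid, points, used);    // seed: farthest from the centroid
            used[current] = true;
            order.add(current);
            for (int step = 1; step < n; step++) {
                current = nearestTo(points[current], points, used); // greedily append the nearest unused point
                used[current] = true;
                order.add(current);
            }
            return order;
        }

        private static int farthestFrom(double[] ref, double[][] pts, boolean[] used) {
            int best = -1; double bestDist = -1.0;
            for (int i = 0; i < pts.length; i++)
                if (!used[i] && dist2(ref, pts[i]) > bestDist) { bestDist = dist2(ref, pts[i]); best = i; }
            return best;
        }

        private static int nearestTo(double[] ref, double[][] pts, boolean[] used) {
            int best = -1; double bestDist = Double.POSITIVE_INFINITY;
            for (int i = 0; i < pts.length; i++)
                if (!used[i] && dist2(ref, pts[i]) < bestDist) { bestDist = dist2(ref, pts[i]); best = i; }
            return best;
        }

        /** Squared Euclidean distance between two points. */
        private static double dist2(double[] a, double[] b) {
            double s = 0.0;
            for (int j = 0; j < a.length; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
            return s;
        }
    }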
Appendix.

We present pseudocode for recovering an optimal partition from the values X1,n with up to p suppressions. The following version of UMPS augments the previous version with the (n + 1) × (p + 1) matrices FP and GP. Here FP[i, r] stores the optimal number of rightmost values to suppress in the problem instance n = i and p = r, and GP[i, r] stores the number of values in the rightmost group of an optimal partition. The values FP[i, r] and GP[i, r] need not be computed for i < k, assuming some k-partition exists (i.e., k ≤ n or n ≤ p).

UMPS(int n, int p) {
  for i = 0, ..., n
    for r = 0, ..., p {
      if (i = 0) G[i,r] ← 0;
      else if (i < k) G[i,r] ← ∞;
      else {
        j ← argmin j=k,...,min{2k−1,i} {SSE(Xi−j+1,i) + F[i−j, r]};
        G[i,r] ← SSE(Xi−j+1,i) + F[i−j, r];
        GP[i,r] ← j;
      }
      j ← argmin j=0,...,min{i,r} {G[i−j, r−j]};
      F[i,r] ← G[i−j, r−j];
      FP[i,r] ← j;
    }
}

Reconstruction is performed by the clusters procedure:

Partition clusters(n, p) {
  return clusters2(n, p, ∅);
}

In the clusters2 procedure, which follows, partition is an accumulator parameter that stores the partition constructed so far. If n ≤ p, the remaining n values are suppressed, so we simply return partition. Otherwise we obtain the number of rightmost values to suppress (r = FP[n, p]), then construct a new cluster C over the previous m = GP[n − r, p − r] values, and then call clusters2 recursively on the values to the left of C, where the number of allowed suppressions is p − r and the accumulated partition also contains the new cluster C:
Partition clusters2(n, p, partition) {
  if (n <= p) return partition;
  // suppress r rightmost values
  r = FP[n,p];
  // build new rightmost cluster
  C ← ∅;
  // the new cluster contains m values
  m = GP[n-r,p-r];
  // add the m values to cluster C
  for i = n-r-m+1 to n-r
    C ← C ∪ {xi};
  // recur on remaining leftmost n-r-m values
  return clusters2(n-r-m, p-r, partition ∪ {C});
}

References

Adam, N.R., Wortmann, J.C., 1989. Security control methods for statistical databases: a comparative study. ACM Computing Surveys 21 (4), 515–556.
Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., Zhu, A., 2005. Approximation algorithms for k-anonymity. Journal of Privacy Technology, Paper No. 20051120001, http://ilpubs.stanford.edu:8090/645/
Chang, C.C., Li, Y.C., Huang, W.H., 2007. TFRP: an efficient microaggregation algorithm for statistical disclosure control. Journal of Systems and Software 80 (11), 1866–1878.
Defays, D., Anwar, N., 1995. Micro-aggregation: a generic method. In: Proceedings of the 2nd International Symposium on Statistical Confidentiality, Eurostat, Luxemburg, pp. 69–78.
Domingo-Ferrer, J., Martínez-Ballesté, A., Mateo-Sanz, J.M., Sebé, F., 2006. Efficient multivariate data-oriented microaggregation. The VLDB Journal 15, 355–369.
Domingo-Ferrer, J., Mateo-Sanz, J.M., 2002. Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering 14 (1), 189–201.
Domingo-Ferrer, J., Sebe, F., Solanas, A., 2008. A polynomial-time approximation to optimal multivariate microaggregation. Computers & Mathematics with Applications 55 (4), 714–732.
Domingo-Ferrer, J., Torra, V., 2001. A quantitative comparison of disclosure control methods for microdata. In: Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies. Elsevier Science, North Holland, Amsterdam, pp. 111–133.
Duncan, G.T., Elliot, M., Salazar-González, J., 2011. Statistical Confidentiality: Principles and Practice. Springer, New York.
Ghinita, G., Karras, P., Kalnis, P., Mamoulis, N., 2007. Fast data anonymization with low information loss. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 758–769.
Hansen, S.L., Mukherjee, S., 2003. A polynomial algorithm for optimal univariate microaggregation. IEEE Transactions on Knowledge and Data Engineering 15 (4), 1043–1044.
Heaton, W.B., Mukherjee, S., 2011. New record ordering heuristics for disclosure control through microaggregation. In: Proceedings of the International Conference on Advances in Communication and Information Technology – CIT 201, December 01–02, 2011, Amsterdam, Netherlands.
Kokolakis, G., Fouskakis, D., 2009. Importance partitioning in micro-aggregation. Computational Statistics and Data Analysis 53 (7), 2439–2445.
Laszlo, M., Mukherjee, S., 2005. Minimum spanning tree partitioning algorithm for microaggregation. IEEE Transactions on Knowledge and Data Engineering 17 (7), 902–911.
Laszlo, M., Mukherjee, S., 2009. Approximation bounds for minimum information loss microaggregation. IEEE Transactions on Knowledge and Data Engineering 21 (11), 1643–1647.
Martínez-Ballesté, A., Solanas, A., Ferrer, J., Sanz, J., 2007. A genetic approach to multivariate microaggregation for database privacy. In: ICDEW'07 Proceedings of the IEEE 23rd International Conference on Data Engineering Workshop, pp. 180–185.
Nin, J., Torra, V., 2009. Analysis of the univariate microaggregation disclosure risk. New Generation Computing 27 (3), 197–214.
Oganian, A., Domingo-Ferrer, J., 2001. On the complexity of optimal microaggregation for statistical disclosure control. Statistical Journal of the United Nations Economic Commission for Europe 18 (4), 345–354.
Samarati, P., 2001. Protecting respondents' identities in microdata release. IEEE Transactions on Knowledge and Data Engineering 13 (6), 1010–1027.
Solanas, A., González-Nicolás, U., Martínez-Ballesté, A., 2010. A variable-MDAV-based partitioning strategy to continuous multivariate microaggregation with genetic algorithms. In: The 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, pp. 1–7.
Solanas, A., Martínez-Ballesté, A., 2006. V-MDAV: variable group size multivariate microaggregation. In: COMPSTAT 2006, Rome, pp. 917–925.
Sweeney, L., 2002. k-Anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems 10 (5), 557–570.
Templ, M., Meindl, B., 2008. Robustification of microdata masking methods and the comparison with existing methods. Lecture Notes in Computer Science, vol. 5262. Springer-Verlag, pp. 113–126.
Templ, M., Meindl, B., 2010. Practical applications in statistical disclosure control using R. In: Privacy and Anonymity in Information Management Systems: New Techniques for New Practical Problems. Springer-Verlag, Berlin, pp. 31–62.
Willenborg, L., De Waal, T., 2000. Elements of Statistical Disclosure Control. Springer-Verlag, New York.

Sumitra Mukherjee received his PhD in Decision and Information Systems from Carnegie Mellon University. He is a professor in the Graduate School of Computer and Information Sciences at Nova Southeastern University. His research interests include data security and artificial intelligence. His publications appear in such journals as Management Science, Journal of the American Statistical Association, IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Knowledge and Data Engineering, Discrete Applied Mathematics, Operations Research Letters, Optimization Letters, Pattern Recognition Letters, Environmental Management, Journal of Computing, and Annals of Operations Research.

Michael Laszlo, who received the Ph.D. degree in computer science from Princeton University, is a professor in the Graduate School of Computer and Information Sciences at Nova Southeastern University. He is the author of several textbooks on programming and computer graphics, and his work focuses on computer graphics, algorithms, and programming. His publications have appeared in such journals as IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Knowledge and Data Engineering, Algorithmica, Optimization Letters, Pattern Recognition Letters, and Operations Research Letters.