The controlled rounding problem: Complexity and computational experience



European Journal of Operational Research 65 (1993) 207-217 North-Holland


Theory and Methodology

The controlled rounding problem: Complexity and computational experience

James P. Kelly, Bruce L. Golden and Arjang A. Assad
College of Business and Management, University of Maryland, College Park, MD 20742, USA

Received December 1990; revised July 1991

Abstract: Controlled rounding is a procedure whereby tabular data gathered from respondents is perturbed in such a way as to preserve the anonymity of the respondents while maintaining the integrity of the data. Controlled rounding of two-dimensional tables can be represented by a straightforward network flow representation. Three-dimensional controlled rounding problems, on the other hand, have no pure network analog and have been shown to be NP-complete. A number of complexity results associated with the three-dimensional problem and its variants are discussed. In addition, we review several solution procedures for solving two- and three-dimensional problems. The algorithms reviewed are based on network flow models and linear programming techniques embedded within binary search procedures. We next turn to heuristics in order to reduce the running times of the solution procedures. Of these, simulated annealing is the most effective procedure for reducing the computational burden of the three-dimensional problem.

Keywords: Statistics: censoring; Integer programming: heuristics, applications; Linear programming: applications; Government: agencies

Introduction

Census organizations throughout the world have the responsibility of collecting, processing, and presenting data concerning various aspects of their populations. When publishing data in tabular form, it is often necessary for these organizations to present data that is rounded. In a rounded table, the actual numerical values are rounded to a nearest or appropriate multiple of an integer base. The need for rounding may arise when the data must be perturbed to prevent inadvertent statistical disclosure of confidential information. This is becoming increasingly important as the amount of data collected and processed by statistical bureaus around the world becomes greater [Fellegi 1972, Nargundkar 1972]. Roundings may also arise when presenting data of any kind in terms of thousands (or hundreds, etc.).

The controlled rounding problem arises from the following question: Given a frequency count table (two- or three-dimensional), representing characteristics of a specific population, with row sums, column sums, level totals, subtotals, and grand total, how can we publish accurate data while at the same time preserving the anonymity of those who responded? Collectors of data must guarantee anonymity to ensure that potential respondents will disclose information freely and without reservations. This is essential in order to maintain the completeness and, ultimately, the utility of the published data.

Correspondence to: B.L. Golden, College of Business and Management, University of Maryland, College Park, MD 20742, USA.

0377-2217/93/$06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved

One way to protect the confidentiality of the respondents is to slightly perturb the data. The perturbations must be large enough to prevent disclosure, while at the same time causing a minimal amount of data distortion. It should be pointed out that determining the effectiveness of any protection scheme, such as rounding, is a difficult and somewhat subjective task. Methods for determining the level of protection provided by a suppression scheme (in which data are omitted) are discussed in Gusfield (1988, 1990). These methods are based on a graph-theoretic approach to statistical security.

Probably the simplest approach for perturbing data is to round each original entry to the nearest multiple of some prespecified integer base. This deterministic rounding approach can cause large distortions of the data, since rounded sums may differ substantially from the sums in the original table. For example, consider the two-dimensional table shown in Figure 1. The 4 x 4 table consists of sixteen internal entries, four column sums, four row sums, and a grand total sum. Using base 5, every internal entry of the table must round to 0 or 5. The deterministic rounding of the table is also shown in Figure 1. The row and column sums associated with the first row and column change significantly (6 units) when rounded. This lack of control over the difference between the original and rounded sums is the reason why deterministic rounding is undesirable. For a perturbation method to be acceptable, it must provide control over these differences. This notion of control is the motivation behind controlled rounding.

A rounding is said to be controlled if the differences between the original entries and sums and the rounded entries and rounded sums are all less than some prespecified integer base b > 0. Note that this notion of control focuses on how sums behave under rounding, as even deterministic rounding ensures that internal entries are not perturbed by more than $\lfloor b/2 \rfloor$ in the rounding process.

Another perturbation technique is called random rounding, where each element is rounded to an adjacent multiple of a prespecified base b > 0. The probabilities of rounding to the lower or the upper multiple are both 1/2. This method tends to round sums composed of n entries to a rounded sum of nb/2, which may or may not control the differences between the original and rounded sums.
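The loss of control under deterministic rounding can be illustrated with a short sketch. The row of counts below is hypothetical (it is not the table of Figure 1), and the helper names `nearest_multiple` and `is_controlled` are chosen here for illustration only.

```python
def nearest_multiple(value, base):
    """Deterministic rounding: nearest multiple of base (ties round up)."""
    return base * ((value + base // 2) // base)


def is_controlled(original, rounded, base):
    """Controlled means the perturbation is strictly smaller than the base."""
    return abs(original - rounded) < base


base = 5
row = [3, 3, 3, 3]                      # hypothetical internal entries of one row
det = [nearest_multiple(v, base) for v in row]

# Every entry moves by less than the base, yet the row sum drifts too far.
entries_ok = all(is_controlled(v, r, base) for v, r in zip(row, det))
sum_ok = is_controlled(sum(row), sum(det), base)

# A rounding of the same row that keeps both entries and sum under control.
alt = [5, 5, 0, 0]
alt_ok = (all(is_controlled(v, r, base) for v, r in zip(row, alt))
          and is_controlled(sum(row), sum(alt), base))

print(det, entries_ok, sum_ok, alt_ok)
```

Each entry is individually close to its rounded value, but the errors accumulate in the sum; controlled rounding chooses the direction of each rounding so that the sums stay within one base of their original values as well.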

Nargundkar and Saveland (1972) proposed unbiased randomized rounding as a procedure to perturb the data. This method assigns probabilities to each rounding that encourage table entries to round to the same values as obtained by deterministic rounding. For example, if base 3 is used, then a 1 can round to 0 with probability 2/3 and can round to 3 with probability 1/3. The approach controls the rounding of the internal entries but can produce roundings whose sums greatly deviate from the original sums. Fellegi (1975) proposed a method which controlled the rounding procedure along one dimension. This procedure can control rounded sums associated with the columns, but it fails to simultaneously control the rounded sums associated with the table rows, or vice versa. In either case, the rounding of the grand total is uncontrolled. Furthermore, if three-dimensional tables are considered, then only a small fraction of the sums associated with the tables can be controlled by this approach. Causey (1979) improved upon Fellegi's approach by proposing a heuristic procedure which provided control in two dimensions in 93% of the tables he tested. Dalenius (1981) developed a procedure which could simultaneously control the sums associated with the columns and the rows of two-dimensional tables. However, this procedure is not easily applicable to large tables that have many entries which must be rounded, because it requires the user to determine a set of 'cycles' that cover all of the entries to be rounded. These cycles are composed of four internal entries such that if one moves around each cycle, alternately adding and subtracting an integer, then a rounded table results.

Figure 1. Example of deterministic rounding (base 5): the original table and its deterministic rounding

Controlled rounding, which evolved from the above procedures, is a technique which introduces slight perturbations into the internal entries and associated sums of the original data. These controlled perturbations provide an excellent approximation to the entries and sums of the original data. The end user receives a table which provides anonymous and complete data.

It has been estimated that approximately 90% of the hundreds of millions of tables published by the Bureau of the Census every ten years are two-dimensional; however, this still leaves millions of tables which have three or more dimensions, and the vast majority of these higher-dimensional tables are three-dimensional (Greenberg, 1988b). The two-dimensional problem has been widely studied and several elegant solution procedures exist. The three-dimensional controlled rounding problem, in contrast, is known to be NP-complete.

The plan of this paper is as follows: Section 1 defines and formulates the controlled rounding problem in two and three dimensions. For the two-dimensional case, a network flow formulation is presented. Complexity results associated with the three-dimensional case and its variants appear in Section 2. Section 3 reviews several algorithms which have been developed to solve the controlled rounding problem. In Section 4, computational experience comparing various algorithms is presented. Section 5 summarizes research related to the controlled rounding problem.


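1. Formulation of the controlled rounding problem

Consider a three-dimensional table or array A with non-negative integer entries $a_{ijk}$, where $1 \le i \le m$, $1 \le j \le n$, $1 \le k \le p$. We use the usual 'dot' notation for sums of the entries in the table, as defined below.

\begin{align*}
a_{\cdot jk} &= \sum_{i=1}^{m} a_{ijk} && \text{Column $j$ sum at level $k$.} \\
a_{i\cdot k} &= \sum_{j=1}^{n} a_{ijk} && \text{Row $i$ sum at level $k$.} \\
a_{ij\cdot} &= \sum_{k=1}^{p} a_{ijk} && \text{Shaft (vertical) sum.} \\
a_{\cdot\cdot k} &= \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ijk} && \text{Sum at level $k$.} \\
a_{\cdot j\cdot} &= \sum_{i=1}^{m} \sum_{k=1}^{p} a_{ijk} && \text{Column $j$ sum over various levels.} \\
a_{i\cdot\cdot} &= \sum_{j=1}^{n} \sum_{k=1}^{p} a_{ijk} && \text{Row $i$ sum over various levels.} \\
a_{\cdot\cdot\cdot} &= \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{k=1}^{p} a_{ijk} && \text{Grand total of all entries.}
\end{align*}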

Variables of the form $a_{ijk}$ are called internal variables and the remaining variables (variables with dots) are called summation variables. We refer to $a_{i\cdot\cdot}$, $a_{\cdot j\cdot}$, and $a_{\cdot\cdot k}$ as face sums of the A-matrix. We use the following notation:

N = The set of natural numbers {0, 1, 2, ...}.
b = A positive integer selected as the base.
$\lceil x \rceil$ = The smallest integer greater than or equal to x.
$\lfloor x \rfloor$ = The largest integer smaller than or equal to x.

For a fixed base b, we define two rounding operators as

$$R^-(a) = b \lfloor a/b \rfloor, \qquad R^+(a) = b \lceil a/b \rceil.$$

A rounding scheme that leaves elements of bN among original table entries or their sums invariant is called zero-restricted (Cox, 1987). The rounding operators $R^+$ and $R^-$ perform exactly this function: if a is a multiple of the base b, then $R^-(a) = R^+(a) = a$.
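As a small illustration, the two operators can be written with integer arithmetic only. The function names `r_minus` and `r_plus` are chosen here for readability and do not come from the paper.

```python
def r_minus(a, b):
    """R^-(a): the largest multiple of b that does not exceed a."""
    return b * (a // b)


def r_plus(a, b):
    """R^+(a): the smallest multiple of b that is not below a."""
    return b * -(-a // b)   # ceiling division without floating point


# A non-multiple has two adjacent candidate roundings ...
candidates = (r_minus(7, 3), r_plus(7, 3))
# ... while a multiple of the base is left unchanged (zero-restrictedness).
fixed = (r_minus(6, 3), r_plus(6, 3))
print(candidates, fixed)
```

Using `-(-a // b)` for the ceiling keeps the computation exact for arbitrarily large counts, which floating-point division would not guarantee.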

Zero-restricted three-dimensional controlled rounding (ZR-3DCR)

Given a three-dimensional m x n x p table A' and a base b as input, construct a table X of the same dimensions, with elements $x_{ijk}$, by the following steps:

1. Given a table A' with entries $a'_{ijk} \in N$ and a base b > 0, let $q_{ijk} = R^-(a'_{ijk})$ and set $a_{ijk} \leftarrow a'_{ijk} \pmod{b}$.

2. Given A, find a table Y with entries $y_{ijk}$ satisfying:

\begin{align*}
0 \le y_{ijk} &\le R^+(a_{ijk}) && \text{for all } i, j, k, \tag{1.1} \\
R^-(a_{\cdot jk}) \le y_{\cdot jk} &\le R^+(a_{\cdot jk}) && \text{for all } j, k, \tag{1.2} \\
R^-(a_{i\cdot k}) \le y_{i\cdot k} &\le R^+(a_{i\cdot k}) && \text{for all } i, k, \tag{1.3} \\
R^-(a_{ij\cdot}) \le y_{ij\cdot} &\le R^+(a_{ij\cdot}) && \text{for all } i, j, \tag{1.4} \\
R^-(a_{i\cdot\cdot}) \le y_{i\cdot\cdot} &\le R^+(a_{i\cdot\cdot}) && \text{for all } i, \tag{1.5} \\
R^-(a_{\cdot j\cdot}) \le y_{\cdot j\cdot} &\le R^+(a_{\cdot j\cdot}) && \text{for all } j, \tag{1.6} \\
R^-(a_{\cdot\cdot k}) \le y_{\cdot\cdot k} &\le R^+(a_{\cdot\cdot k}) && \text{for all } k, \tag{1.7} \\
R^-(a_{\cdot\cdot\cdot}) \le y_{\cdot\cdot\cdot} &\le R^+(a_{\cdot\cdot\cdot}), \tag{1.8} \\
\text{with } y_{ijk} &\in \{0, b\} && \text{for all } i, j, k. \tag{1.9}
\end{align*}

3. Return X = Y + Q. X is the zero-restricted three-dimensional controlled rounding (ZR-3DCR) of A'.

The above formulation is consistent with Kelly et al. (1990). A slightly different, but equivalent, formulation is given by Fagan, Greenberg and Hemmig (1988). A zero-restricted controlled rounding has traditionally been considered to be an ideal solution. Although not considered here, it is sometimes desirable to minimize the sum of the squares of the differences between the original and rounded entries (Cox and Ernst, 1987). If a problem fails to have a zero-restricted solution, then the constraints (1.1)-(1.8) are relaxed until a solution is found. The relaxation of constraints is discussed in Section 2.

Zero-restricted two-dimensional controlled rounding (ZR-2DCR)

Most of the work in the literature on controlled rounding addresses the two-dimensional case, which can be formulated by replacing (1.1)-(1.9) with (2.1)-(2.5). The notation used is completely analogous to the three-dimensional formulation.

\begin{align*}
R^-(a_{ij}) \le y_{ij} &\le R^+(a_{ij}) && \text{for all } i, j, \tag{2.1} \\
R^-(a_{\cdot j}) \le y_{\cdot j} &\le R^+(a_{\cdot j}) && \text{for all } j, \tag{2.2} \\
R^-(a_{i\cdot}) \le y_{i\cdot} &\le R^+(a_{i\cdot}) && \text{for all } i, \tag{2.3} \\
R^-(a_{\cdot\cdot}) \le y_{\cdot\cdot} &\le R^+(a_{\cdot\cdot}), \tag{2.4} \\
\text{with } y_{ij} &\in \{0, b\} && \text{for all } i, j. \tag{2.5}
\end{align*}

Previous work has demonstrated that this problem always admits a feasible solution that can be found efficiently with a network flow algorithm (Baranyai, 1975; Cox and Ernst, 1982). Baranyai (1979) and Cox (1987) both present simple polynomial algorithms that cleverly exploit the underlying network structure. In this discussion, we use an example to briefly describe the network flow model for this problem.

Consider the controlled rounding of the 3 x 4 table A shown in Figure 2 using base b = 3 (only internal entries are shown). The constraints (2.1)-(2.5) restrict the rounding Y, with entries $y_{ij}$, as follows:

$y_{11} = y_{14} = y_{24} = y_{31} = 0$, and all other $y_{ij} \in \{0, 3\}$; $y_{1\cdot} = y_{\cdot 3} = 3$; $y_{\cdot 1}, y_{\cdot 4} \in \{0, 3\}$; $y_{2\cdot}, y_{3\cdot} \in \{3, 6\}$; $y_{\cdot 2} = 6$; $y_{\cdot\cdot} = 12$.

If the above constraints are divided by 3 and we define $x_{ij} = y_{ij}/3$, then the constraints can be rewritten as:

$x_{11} = x_{14} = x_{24} = x_{31} = 0$, and all other $x_{ij} \in \{0, 1\}$; $x_{1\cdot} = x_{\cdot 3} = 1$; $x_{\cdot 1}, x_{\cdot 4} \in \{0, 1\}$; $x_{2\cdot}, x_{3\cdot} \in \{1, 2\}$; $x_{\cdot 2} = 2$; $x_{\cdot\cdot} = 4$.

Figure 2. Two-dimensional controlled rounding example (base 3)

Figure 3. Network representation of two-dimensional controlled rounding

To construct the network flow model reflecting these constraints, introduce a set I of m = 3 supply nodes and a set J of n = 4 demand nodes. Add an arc from i to j for every (i, j) pair in I x J unless $x_{ij}$ is constrained to be zero. The flow on arc (i, j), which is constrained to lie between 0 and 1, corresponds to the decision variable $x_{ij}$. To represent the row sum constraints ($x_{i\cdot}$), connect a super-source R to each $i \in I$ with lower and upper limits $R^-(a_{i\cdot})/b$ and $R^+(a_{i\cdot})/b$ on the arc flow. In a similar fashion, connect nodes in J to a super-sink C using arcs with upper and lower bounds $R^+(a_{\cdot j})/b$ and $R^-(a_{\cdot j})/b$. Finally, introduce a 'return arc' with flow limits as set by (2.4) (divided by b) from C to R to represent the flow in circulation form. Figure 3 shows the resulting network and specifies the upper and lower bound constraints on the flow in the ordered pair placed next to each arc. Any extreme flow, such as the one shown in Figure 4, is a feasible solution to the rounding problem.

Figure 4. A network solution to the example problem

The network representation allows two-dimensional controlled rounding problems to be solved effectively (in polynomial time) using standard network flow codes. The simplicity of this network flow representation motivated researchers to investigate the possibility of a similar representation for the three-dimensional problem. The results of these studies indicated that, although there are no pure network flow models for the three-dimensional problem, there are several equivalent model formulations which contain network constraints along with complicating linear side constraints. Thus, unlike two-dimensional rounding, three-dimensional controlled rounding does not easily lend itself to a network representation on which to base an effective solution technique. As the problem has been shown to be NP-complete, it does not possess a polynomial solution technique unless P = NP (Garey and Johnson, 1979). The complexity of the three-dimensional case is discussed in the next section.
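The circulation model described above can be sketched in a few dozen lines. This is an illustrative implementation under stated assumptions, not the code used in the cited papers: the table is hypothetical (it is not the table of Figure 2), lower bounds are removed with the standard excess/deficit transformation, and a plain breadth-first augmenting-path maximum flow stands in for a "standard network flow code". Entries are not reduced modulo b first; instead each arc (i, j) carries the bounds $\lfloor a_{ij}/b \rfloor$ and $\lceil a_{ij}/b \rceil$, which yields the same rounded table.

```python
from collections import deque


def controlled_round_2d(a, b):
    """Return a zero-restricted controlled rounding of table a, or None."""
    m, n = len(a), len(a[0])
    lo_ = lambda v: v // b            # R^-(v) / b
    hi_ = lambda v: -(-v // b)        # R^+(v) / b
    rows = [sum(r) for r in a]
    cols = [sum(r[j] for r in a) for j in range(n)]
    arcs = []                         # (tail, head, lower, upper)
    for i in range(m):
        arcs.append(("R", ("r", i), lo_(rows[i]), hi_(rows[i])))
        for j in range(n):
            arcs.append((("r", i), ("c", j), lo_(a[i][j]), hi_(a[i][j])))
    for j in range(n):
        arcs.append((("c", j), "C", lo_(cols[j]), hi_(cols[j])))
    arcs.append(("C", "R", lo_(sum(rows)), hi_(sum(rows))))   # return arc

    cap = {"S": {}, "T": {}}          # residual capacities

    def add(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)

    # Remove lower bounds: ship them in advance and record node imbalances.
    excess = {}
    for u, v, lo, hi in arcs:
        add(u, v, hi - lo)
        excess[v] = excess.get(v, 0) + lo
        excess[u] = excess.get(u, 0) - lo
    need = 0
    for v, e in excess.items():
        if e > 0:
            add("S", v, e)
            need += e
        elif e < 0:
            add(v, "T", -e)

    flow = 0
    while True:                       # shortest augmenting paths (BFS)
        parent = {"S": None}
        queue = deque(["S"])
        while queue and "T" not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "T" not in parent:
            break
        path, v = [], "T"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= delta
            cap[v][u] += delta
        flow += delta
    if flow != need:
        return None                   # no feasible circulation
    # Flow on arc (i, j) equals its upper bound minus the residual capacity.
    return [[b * (hi_(a[i][j]) - cap[("r", i)][("c", j)]) for j in range(n)]
            for i in range(m)]


table = [[3, 1, 2, 0], [2, 2, 1, 3], [0, 4, 1, 2]]   # hypothetical 3 x 4 table
rounded = controlled_round_2d(table, 3)
print(rounded)
```

Because all bounds are integers, the maximum flow found by augmenting paths is integral, so every arc flow lands on one of its two bounds and the result is a valid rounding. Which of the possibly many feasible roundings is returned depends on the order in which paths are found.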

2. Complexity of the controlled rounding problem

In this section, we discuss complexity issues associated with the three-dimensional controlled rounding problem. Relaxations of the original zero-restricted problem are also introduced.

It is usually assumed that all roundings use an arbitrary base b ≥ 3. In order to provide adequate confidentiality protection, data collection organizations generally use a rounding base greater than 2. For example, if base 2 is used, and in two different tables the same entry (representing the same data) rounds to a 0 and a 2, then it can be concluded that the original entry was 1. If a base greater than 2 is used, then some ambiguity always exists.

Kelly, Assad and Golden (1990) show that the zero-restricted controlled rounding problem is NP-complete for any base b ≥ 3. They prove this result by transforming an NP-complete variant of the three-dimensional matching problem to the zero-restricted controlled rounding problem. Pruhs (1989) has developed similar results that demonstrate the complexity of the three-dimensional controlled rounding problem. Specifically, Pruhs shows that the three-dimensional controlled rounding problem is NP-complete by developing a polynomial transformation from an edge coloring problem that is known to be NP-complete.

Figure 5. A table that has no zero-restricted controlled rounding (base 3)

Relaxations

It can easily be shown that not every three-dimensional table has a zero-restricted controlled rounding (Cox, 1987). For example, consider the table shown in Figure 5. The table has dimensions 3 x 3 x 2 and can be visualized by placing Level 1 directly above Level 2 (only internal entries are shown). If the table shown in Figure 5 has a base-3 zero-restricted controlled rounding, then a single 1 in Level 1 must round to a 3. However, this forces one of the 2's in Level 2 to round to 0. This, in turn, requires that another of the 1's in Level 1 also round to 3. If more than one 1 rounds to 3, we will violate the face constraint associated with Level 1. Therefore, the table has no base-3 zero-restricted controlled rounding.

Since some tables do not have zero-restricted controlled roundings, it is convenient to define several relaxations of the original problem specified by (1.1)-(1.9). It is only desirable to relax the zero-restricted controlled rounding problem if such a solution does not exist. Relaxations are established by allowing entries which are multiples of the base to increase to the next multiple of the base. Although it is reasonable to allow multiples of the base to decrease, this has historically not been considered by the US Bureau of the Census. A formulation and heuristic solution procedure for the problem variant in which multiples of the base can increase or decrease is given by Greenberg (1988). Entries and sums which are not multiples of the base are governed by the original constraints (1.1)-(1.9).

The controlled rounding problem has four major types of entries which may be candidates for relaxation. Internal entries of tables may be zero or nonzero multiples of the base. Likewise, sums such as $a_{\cdot\cdot k}$ or $a_{ij\cdot}$ can also be zero or nonzero multiples of the base. In general, it is preferable to change nonzero multiples of the base rather than true zero entries. Furthermore, among multiples of the base, internal entries should be changed before sums. Four common relaxations and the affected constraints are described below in increasing order of relaxation.

NonZero Internal Entry Relaxation (NZI-3DCR) allows internal entries which are nonzero multiples of the base to increase. NonZero Entry and Sum Relaxation (NZS-3DCR) is a relaxation of NZI-3DCR in that nonzero sums which are multiples of the base may also increase. Zero Internal Entry Relaxation (ZI-3DCR) is a relaxation of NZS-3DCR in which zero internal entries are permitted to increase. The most relaxed variant is Zero Sum Relaxation (ZS-3DCR), in which all internal entries and sums that are zero or nonzero multiples of the base can increase. The zero-restricted problem and the nature of its four relaxations are summarized in Table 1.

In practice, one would first seek a ZR-3DCR. If a specific problem did not have a zero-restricted solution, then the nonzero internal entries which are multiples of the base would be permitted to increase (NZI-3DCR). If the problem still remained infeasible, then the sums which were nonzero multiples of the base would also be permitted to increase (NZS-3DCR), and so forth. If a problem fails to have a ZS-3DCR solution, then it is said to have no controlled rounding, and some approximate solution which violates a small number of constraints is chosen as a rounding. This is an extremely rare occurrence and it is highly unlikely that any naturally generated table would fail to have a controlled rounding. However, three-dimensional tables that do not have controlled roundings have been constructed by Ernst (1989). These tables have over a thousand entries arranged in highly organized patterns, and thus the probability of duplicating these tables from a random sampling is extremely small.

Kelly, Assad and Golden (1990) show that the NZI-3DCR and NZS-3DCR relaxed controlled rounding problems are NP-complete for any base b ≥ 3. The authors also present examples of tables which demonstrate the need for the various relaxations of the original zero-restricted controlled rounding problem for any base b ≥ 2.
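The relaxation hierarchy and the infeasibility argument can be checked by exhaustive enumeration on a small table. The sketch below makes several assumptions: the 3 x 3 x 2 table is a hypothetical one built to follow the same argument as the text (1's on the diagonal of the first level, 2's on a shifted diagonal of the second) and is not claimed to be the table of Figure 5; the names `RELAXATIONS`, `bounds` and `find_rounding` are illustrative; and brute force is only practical for very small tables.

```python
from itertools import product

# May a multiple of the base rise to the next multiple? Flags are ordered as
# (nonzero entries, nonzero sums, zero entries, zero sums), as in Table 1.
RELAXATIONS = {
    "ZR":  (False, False, False, False),
    "NZI": (True,  False, False, False),
    "NZS": (True,  True,  False, False),
    "ZI":  (True,  True,  True,  False),
    "ZS":  (True,  True,  True,  True),
}


def bounds(value, b, is_sum, level):
    """Allowed range [lo, hi] for the rounded counterpart of value."""
    lo, hi = b * (value // b), b * -(-value // b)
    if lo == hi:                                  # value is a multiple of b
        nz_entry, nz_sum, z_entry, z_sum = RELAXATIONS[level]
        if value:
            relaxed = nz_sum if is_sum else nz_entry
        else:
            relaxed = z_sum if is_sum else z_entry
        if relaxed:
            hi += b
    return lo, hi


def find_rounding(a, b, level="ZR"):
    """Enumerate all candidate roundings of 3-D table a; None if none is valid."""
    dims = (len(a), len(a[0]), len(a[0][0]))
    cells = list(product(*map(range, dims)))
    groups = []          # one (member cells, (lo, hi)) pair per summation constraint
    for summed in product([False, True], repeat=3):
        if not any(summed):
            continue     # internal entries are handled through `options` below
        keys = product(*[[None] if s else range(d) for s, d in zip(summed, dims)])
        for key in keys:
            members = [c for c in cells
                       if all(k is None or k == x for k, x in zip(key, c))]
            total = sum(a[i][j][k] for i, j, k in members)
            groups.append((members, bounds(total, b, True, level)))
    options = []
    for i, j, k in cells:
        lo, hi = bounds(a[i][j][k], b, False, level)
        options.append(range(lo, hi + 1, b))      # one or two admissible values
    for choice in product(*options):
        y = dict(zip(cells, choice))
        if all(lo <= sum(y[c] for c in members) <= hi
               for members, (lo, hi) in groups):
            return y
    return None


# Hypothetical table a[i][j][k]: level k=0 holds 1's, level k=1 holds 2's.
a = [[[1 if i == j else 0, 2 if j == (i + 1) % 3 else 0] for j in range(3)]
     for i in range(3)]
results = {level: find_rounding(a, 3, level) for level in ("ZR", "NZI", "NZS")}
print({level: r is not None for level, r in results.items()})
```

For this table the zero-restricted problem has no solution, and NZI changes nothing because no internal entry is a nonzero multiple of 3. Once the row, column and level sums that are nonzero multiples of 3 are allowed to rise (NZS), a rounding exists, which mirrors the order in which the relaxations are tried in practice.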

Table 1
Relaxations of zero-restrictedness

Relaxation    Nonzero entries    Nonzero sums    Zero entries    Zero sums
ZR-3DCR       0                  0               0               0
NZI-3DCR      +                  0               0               0
NZS-3DCR      +                  +               0               0
ZI-3DCR       +                  +               +               0
ZS-3DCR       +                  +               +               +

+ indicates that a multiple of the base can increase. 0 indicates that a multiple of the base must remain constant.

3. Solution methods

Basically, two methods for solving the three-dimensional controlled rounding problem have been used and tested. The first method is a heuristic procedure that is based on a partial network representation of the problem. Minimum cost network flow problems are formulated which often (but not always) produce feasible roundings. In the second method, a search guided by linear programming is conducted. This procedure can also incorporate pre-processor heuristics to speed its convergence.

Network-based approach

George and Penny (1987) discuss their experience with the implementation of a controlled rounding solution procedure based on a network representation. They consider two-dimensional tables with additional subtotal constraints. The network used includes as many of the rounding constraints as possible, but cannot accommodate all of them. The authors claim that this partial representation is usually sufficient for generating feasible roundings. Furthermore, this approach can be extended to solve three-dimensional controlled rounding problems. Cox and George (1989) also discuss a network-based algorithm for finding controlled roundings of tables with subtotal constraints.

Fagan, Greenberg and Hemmig (1988) also developed a heuristic procedure for finding controlled roundings of three-dimensional tables. Their procedure is based on solving a sequence of minimum cost network flow problems, each of which partially models the controlled rounding problem. If the heuristic is successful, then a controlled rounding can be constructed from the sequence of solutions to the individual network problems. This algorithm is rather complex and the reader is referred to the above-mentioned paper for further details.

Linear program-based approach

The formulation of the rounding problem in three dimensions given by (1.1)-(1.9) can easily be transformed into a 0-1 integer program if one divides by b (Kelly et al., 1990). Consider the LP relaxation of this problem where the integrality constraints (1.9) are dropped. Clearly, an integral solution to the relaxed problem also solves the rounding problem. Moreover, since there is no objective function, only Phase I of the simplex method must be executed. The result is then checked for integrality. If the LP solution fails to be integral, an appropriate enumerative search technique is applied to locate an integer solution or to prove that no such solution exists. Each variable in the controlled rounding problem is constrained to lie between two consecutive integers and must therefore equal one of its two bounds at integrality. The search technique is a depth-first backtrack search over the binary tree resulting from setting individual variables at their lower or upper bounds. This procedure (ROUND & BACK) is detailed in Kelly et al. (1990). A highly desirable feature of this approach, which is absent from the network-based approaches discussed earlier, is that it either finds a solution to the rounding problem or proves that no such solution exists.
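Written out, the division by b gives the feasibility problem below. This is a sketch derived directly from (1.1)-(1.9); the symbol $x_{ijk} = y_{ijk}/b$ is borrowed from the two-dimensional example of Section 1 and is an assumption for the three-dimensional case. It uses the identities $R^-(a)/b = \lfloor a/b \rfloor$ and $R^+(a)/b = \lceil a/b \rceil$.

```latex
\begin{align*}
0 \le x_{ijk} &\le \lceil a_{ijk}/b \rceil
  && \text{for all } i, j, k, \\
\lfloor a_{\cdot jk}/b \rfloor \le x_{\cdot jk} &\le \lceil a_{\cdot jk}/b \rceil
  && \text{for all } j, k, \\
&\;\;\vdots
  && \text{(likewise for the sums in (1.3)-(1.7))} \\
\lfloor a_{\cdot\cdot\cdot}/b \rfloor \le x_{\cdot\cdot\cdot} &\le \lceil a_{\cdot\cdot\cdot}/b \rceil, \\
x_{ijk} &\in \{0, 1\}
  && \text{for all } i, j, k.
\end{align*}
```

The LP relaxation replaces the last line by $0 \le x_{ijk} \le 1$. Since every bound is an integer and the two bounds of a sum differ by at most one, a fractional LP solution can only be repaired by fixing variables at 0 or 1, which is what the backtrack search does.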

Heuristics

The experiments in the paper by Kelly et al. (1990) demonstrated that the ROUND & BACK procedure could effectively find controlled roundings. However, improvements in efficiency were still desirable. The original ROUND & BACK procedure formed a starting simplex basis with all of the decision variables (internal table entries) set to zero. As an alternative to this approach, the starting basis is modified by setting the nonbasic decision variables to either the upper bound of one or the lower bound of zero. This is accomplished through the assignment of values to basic slack variables associated with the decision variables. If these assignments are made in such a way as to satisfy all of the constraints (i.e., if a controlled rounding solution is used to initialize the basis), then the linear program will solve immediately (in a single iteration). While a controlled rounding solution is not known initially, a heuristic can be used to obtain a partial solution with which to build the starting basis. If the partial solution is close to an actual solution, then one would expect the number of simplex iterations required to determine a solution to be small. Three procedures called ROUND-ROUND & BACK, QUICK-ROUND & BACK, and ANNEAL-ROUND & BACK were designed to speed up the process of finding an initial solution to the linear program, since this step required approximately 99% of the processing time required to determine a typical rounding.

The ROUND heuristic. If each internal variable $a_{ijk}$ is rounded to the nearest multiple of the base, the resulting table represents an approximation to the controlled rounding of A. Deterministic rounding combined with the ROUND & BACK algorithm is called ROUND-ROUND & BACK (Kelly et al., 1990).

The QUICK heuristic. QUICK is another simple heuristic solution procedure for constructing a good starting basis. The search procedure equipped with this start-up heuristic is called QUICK-ROUND & BACK (Kelly et al., 1990). It is based on a sequential assignment of variables to one of their bounds (lower or upper). Each assignment is checked against a relaxed version of the controlled rounding constraints (1.2)-(1.8) and is accepted if these constraints are not violated. If all variables can be assigned in this way, the heuristic produces a controlled rounding and thereby solves the problem. Otherwise, some variables remain unassigned, as all allowable settings of these variables combined with previous settings are infeasible. In this case, the heuristic fails to solve the problem. However, if it still succeeds in setting most variables in a 'reasonable' manner, then the modifications required to convert the heuristic (partial) solution into a controlled rounding tend to be minor.

The ANNEAL heuristic. ANNEAL-ROUND & BACK, described in Kelly, Golden and Assad (1990), is a solution procedure which combines simulated annealing with the search technique based on linear programming. ANNEAL-ROUND & BACK was found computationally to be the most effective of the three heuristics. We therefore describe it in more detail.

Since we are searching for a controlled rounding which satisfies the constraints given by (1.1)-(1.9), it is only natural to choose the number of constraints violated as the objective function. A value of zero for this objective therefore indicates a feasible rounding that solves the controlled rounding problem. Heuristic methods for solving the controlled rounding problem that seek to decrease the number of constraints violated monotonically often fail, since local minima exist which do not satisfy all of the constraints. Simulated annealing provides a mechanism for accepting increases in the objective function in a systematic fashion to allow the heuristic search to move away from local minima. The reader is referred to the comprehensive bibliography by Collins, Eglese and Golden (1988) for further references on the simulated annealing process.

Simulated annealing uses an annealing 'schedule' or a sequence of temperatures $\{T_j\}$. When evaluating states (trial solutions) at temperature $T_j$, the procedure evaluates the change in the objective function $\delta C$ (change in the number of constraints violated) associated with moving to a new state. A change in the state is accomplished by altering the assignment of one of the internal variables $y_{ijk}$ from its current assignment (0 or b) to its other bound. This move is always made if $\delta C \le 0$, but only with probability $\exp(-\delta C / T_j)$ if $\delta C > 0$. The reader can easily verify that, for the controlled rounding problem, $\delta C$ is always an integer between -7 and 7, inclusive, because each internal variable appears in exactly seven of the summation constraints (1.2)-(1.8). This limited range of values for $\delta C$ and the use of resourceful data structures allow for the effective tailoring of simulated annealing to the rounding problem. The algorithm either locates a controlled rounding or outputs a table that constitutes a good approximation to the solution. In the latter case, the output from the simulated annealing procedure can be used to initialize the linear programming search procedure ROUND & BACK. This composite approach, where simulated annealing is used as a heuristic 'warm-start' procedure for the ROUND & BACK search, is called ANNEAL-ROUND & BACK (Kelly, Golden and Assad, 1990).
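The annealing move and acceptance rule described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the geometric cooling schedule, the parameter values, the start from the lower bounds, and the names `build_groups`, `violations` and `anneal` are all assumptions, and the cost is recomputed from scratch after each move instead of updating only the seven affected sums.

```python
import math
import random
from itertools import product


def build_groups(a, b):
    """Cells and (member cells, R^-, R^+) for each summation constraint (1.2)-(1.8)."""
    dims = (len(a), len(a[0]), len(a[0][0]))
    cells = list(product(*map(range, dims)))
    groups = []
    for summed in product([False, True], repeat=3):
        if not any(summed):
            continue                  # internal bounds hold by construction
        keys = product(*[[None] if s else range(d) for s, d in zip(summed, dims)])
        for key in keys:
            members = [c for c in cells
                       if all(k is None or k == x for k, x in zip(key, c))]
            total = sum(a[i][j][k] for i, j, k in members)
            groups.append((members, b * (total // b), b * -(-total // b)))
    return cells, groups


def violations(y, groups):
    """Objective C: the number of summation constraints violated by y."""
    return sum(not lo <= sum(y[c] for c in members) <= hi
               for members, lo, hi in groups)


def anneal(a, b, t0=2.0, cooling=0.95, stages=200, seed=0):
    """Return (rounding, violated constraints); zero violations means success."""
    rng = random.Random(seed)
    cells, groups = build_groups(a, b)
    floor = {(i, j, k): b * (a[i][j][k] // b) for i, j, k in cells}
    y = dict(floor)                                   # start at the lower bounds
    free = [c for c in cells if a[c[0]][c[1]][c[2]] % b]  # non-multiples may flip
    cost, t = violations(y, groups), t0
    for _stage in range(stages):
        if cost == 0 or not free:
            break
        for _move in range(len(free)):
            c = rng.choice(free)
            old = y[c]
            y[c] = floor[c] + b if old == floor[c] else floor[c]  # flip the bound
            delta = violations(y, groups) - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta                         # accept the move
            else:
                y[c] = old                            # reject and restore
            if cost == 0:
                break
        t *= cooling                                  # next temperature T_j
    return y, cost


table = [[[1, 2], [0, 1]], [[2, 0], [1, 1]]]   # hypothetical 2 x 2 x 2 table
rounding, remaining = anneal(table, 3)
print(remaining)
```

When `remaining` is positive the returned table is only an approximate rounding; in the composite procedure that output would seed the LP-based search instead of being used directly.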

4. Comparison of solution methods

Network-based

The network-based approaches of George and Penny (1987) and Fagan, Greenberg and Hemmig (1988) are actually heuristics which do not guarantee optimality. For this reason, it is unfair, if not impossible, to compare their performance to the performance of the LP-based approaches. (Nonetheless, results indicate that ANNEAL-ROUND & BACK is substantially more efficient.) A comparison between the network-based approaches is also difficult, since one must ascertain the cost of a suboptimal solution. George and Penny (1987) present sample results which indicate that their procedure can usually determine roundings efficiently for two-dimensional tables with subtotal constraints. Fagan, Greenberg and Hemmig (1988) present summary results which also show that their procedure is efficient and finds zero-restricted controlled roundings in 99% of the problems tested. The reader is referred to the papers where these methods are developed to further evaluate their performance.

LP-based

In this section, we compare ROUND-ROUND & BACK, QUICK-ROUND & BACK, and ANNEAL-ROUND & BACK (Kelly, Golden and Assad, 1990). The procedures all share the LP backtrack search ROUND & BACK, but differ in the heuristic used to obtain the initial table that feeds into ROUND & BACK. ROUND-ROUND & BACK uses deterministic rounding, QUICK-ROUND & BACK systematically fixes variables while maintaining feasibility with respect to a relaxed set of constraints, and ANNEAL-ROUND & BACK uses a simulated annealing heuristic to search for a rounding. ROUND-ROUND & BACK serves as the benchmark against which the other two algorithms are compared. Each procedure first tries to identify a feasible rounding using its own heuristic; if the heuristic fails, its output is used to initialize the simplex basis of the linear program, and the ROUND & BACK binary search procedure takes over to complete the solution.

Numerical results obtained from processing 32,500 randomly generated tables and 292 real-life tables are shown in Figure 6. Figure 6 shows the relative computation times of the three algorithms. The data are averaged over all tables of the same type: 60-cell tables (20,000 tables); cubical tables (12,500 tables) ranging from 4 x 4 x 4 to 8 x 8 x 8; and real data (obtained from the US Bureau of the Census). The generated tables had sparsities that ranged from 10% to 90%, with nonzero entries taken from a uniform distribution. The execution times (IBM PS/2 Model 50, 80286, 10 MHz) ranged from a fraction of a second for the small tables to approximately 100 seconds for the 8 x 8 x 8 tables. If the tables exceed 10 x 10 x 10, then an alternative heuristic method based on tabu search by Kelly, Golden and Assad (1991) is recommended. For a breakdown of the specific results for each type and size of table, the reader is referred to Kelly et al. (1990) and Kelly, Golden and Assad (1990).

As an exact solution procedure, the ANNEAL-ROUND & BACK algorithm is significantly faster than any previously known solution procedure for this class of problems. Moreover, compared to the other heuristics we tested, simulated annealing is significantly more successful in finding controlled rounding solutions on a stand-alone basis. Naturally, the LP-based search becomes unnecessary if the heuristic identifies a feasible solution.
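The "heuristic first, exact search as fallback" structure shared by the three procedures can be illustrated with a minimal Python sketch. It is not the authors' implementation: it works on a small two-dimensional table for brevity, it uses nearest-multiple rounding as the heuristic (in the spirit of the deterministic rounding in ROUND-ROUND & BACK), and it replaces the LP-based ROUND & BACK search with a plain exhaustive enumeration of floor/ceiling choices. The function names (`floor_ceil`, `is_controlled_rounding`, `nearest_rounding`, `backtrack`, `controlled_round`) are hypothetical, and the feasibility test assumes the usual zero-restricted definition: every cell, row sum, column sum, and the grand total must equal one of the two multiples of the base adjacent to its original value. Entries are assumed to be non-negative integers.

```python
def floor_ceil(value, base):
    """Return the two multiples of `base` adjacent to `value` (equal if exact)."""
    lo = (value // base) * base
    hi = lo if value % base == 0 else lo + base
    return lo, hi


def is_controlled_rounding(table, rounded, base):
    """Assumed zero-restricted test on cells, row sums, column sums, grand total."""
    rows, cols = len(table), len(table[0])
    for i in range(rows):
        for j in range(cols):
            if rounded[i][j] not in floor_ceil(table[i][j], base):
                return False
    for i in range(rows):
        if sum(rounded[i]) not in floor_ceil(sum(table[i]), base):
            return False
    for j in range(cols):
        original = sum(row[j] for row in table)
        if sum(row[j] for row in rounded) not in floor_ceil(original, base):
            return False
    total = sum(map(sum, table))
    return sum(map(sum, rounded)) in floor_ceil(total, base)


def nearest_rounding(table, base):
    """Heuristic: round each cell independently to the nearest multiple (ties up)."""
    def nearest(v):
        lo, hi = floor_ceil(v, base)
        return lo if v - lo < hi - v else hi
    return [[nearest(v) for v in row] for row in table]


def backtrack(table, base, start):
    """Exhaustive fallback (a stand-in for the LP-based search, not ROUND & BACK).

    Cells are tried in row-major order; for each cell the value proposed by
    the heuristic table `start` is tried first.
    """
    rows, cols = len(table), len(table[0])
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    rounded = [[0] * cols for _ in range(rows)]

    def extend(k):
        if k == len(cells):
            return is_controlled_rounding(table, rounded, base)
        i, j = cells[k]
        lo, hi = floor_ceil(table[i][j], base)
        options = [lo] if lo == hi else [lo, hi]
        if start[i][j] == hi:
            options.reverse()
        for choice in options:
            rounded[i][j] = choice
            if extend(k + 1):
                return True
        return False

    return [row[:] for row in rounded] if extend(0) else None


def controlled_round(table, base):
    """Try the heuristic alone; fall back to the exhaustive search if it fails."""
    candidate = nearest_rounding(table, base)
    if is_controlled_rounding(table, candidate, base):
        return candidate, "heuristic"
    return backtrack(table, base, candidate), "search"


if __name__ == "__main__":
    # Nearest rounding sends every cell to 0, which violates the grand total (8).
    print(controlled_round([[2, 2], [2, 2]], 5))
    # Here nearest rounding is already a controlled rounding.
    print(controlled_round([[1, 4], [4, 1]], 5))
```

The enumeration visits up to 2^n assignments for n cells, so it is only usable on toy tables; this is exactly the cost that motivates a stronger starting heuristic and an LP-guided search in the procedures compared above.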

5. Summary

The literature on two-dimensional controlled rounding problems shows that they can be solved effectively with standard network flow algorithms. For the three-dimensional problem, which is NP-complete, network-based procedures can be used

Figure 6. Relative computation times for search algorithms. [Chart: vertical axis, relative time (0 to 100); table groups: 60 Cells, Cubic Tables, Real Data; series: ROUND-ROUND & BACK, QUICK-ROUND & BACK, ANNEAL-ROUND & BACK.]


to search for roundings heuristically. However, these approaches are not optimal and may fail to find solutions when they exist. The problem can always be solved exactly using an LP-based search procedure. The efficiency of the LP-based search can be increased significantly by using preprocessor heuristics to provide the search with an initial starting point. Of the heuristics tested, the one based on simulated annealing achieves the best speed-up. This makes the LP-based binary search procedure with the simulated annealing heuristic the most effective approach developed to date for obtaining controlled roundings.

Currently, we are developing techniques for obtaining controlled roundings that minimize a norm between the original and rounded tables (Kelly, Golden and Assad, 1991). This problem is considerably more difficult than the feasibility problem and poses a significant research challenge.

References

Baranyai, Z. (1975), "On the factorization of the complete uniform hypergraph", in: A. Hajnal, R. Rado and V.T. Sós (eds.), Infinite and Finite Sets, Vol. I, North-Holland, Amsterdam, 91-98.
Baranyai, Z. (1979), "The edge-coloring of complete hypergraphs I", Journal of Combinatorial Theory B 26, 276-294.
Causey, B.D. (1979), "Approaches to statistical disclosure", in: Proceedings of the Social Statistics Section - American Statistical Association, [pages illegible].
Collins, N.E., Eglese, R.W., and Golden, B.L. (1988), "Simulated annealing - An annotated bibliography", American Journal of Mathematical and Management Sciences 8, 209-307.
Cox, L.H., and Ernst, L.R. (1982), "Controlled rounding", INFOR 20, 423-432.
Cox, L.H. (1987), "A constructive procedure for unbiased controlled rounding", Journal of the American Statistical Association 82, 520-524.
Cox, L.H., and George, J.A. (1989), "Controlled rounding for tables with subtotals", Annals of Operations Research 20, 141-157.
Dalenius, T. (1981), "A simple procedure for controlled rounding", Statistisk Tidskrift 3, 202-208.
Ernst, L. (1989), "Further applications of linear programming to sampling problems", in: Proceedings of the Survey Research Methods Section - American Statistical Association, 625-630.
Fellegi, I.P. (1972), "On the question of statistical confidentiality", Journal of the American Statistical Association 67, 7-18.
Fellegi, I.P. (1975), "Controlled random rounding", Survey Methodology, Statistics Canada 1, 123-133.
Fagan, J.T., Greenberg, B.V., and Hemmig, R.J. (1988), "Controlled rounding of three-dimensional tables", Statistical Research Division Report Series Census/SRD/RR-88/02, Bureau of the Census.
Garey, M.R., and Johnson, D.S. (1979), Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, New York.
George, J.A., and Penny, R.N. (1987), "Initial experience in implementing controlled rounding for confidentiality control", Proceedings of Bureau of the Census ARC 3, 253-262.
Greenberg, B.V. (1988a), "An alternative formulation of controlled rounding", Statistical Research Division Report Series Census/SRD/RR-88/01, Bureau of the Census.
Greenberg, B.V. (1988b), Private Communication, US Bureau of the Census, Statistical Research Division, Suitland, MD.
Gusfield, D. (1988), "A graph theoretic approach to statistical data security", SIAM Journal on Computing 17, 552-571.
Gusfield, D. (1990), "A little knowledge goes a long way: Faster detection of compromised data in 2-D tables", in: IEEE Symposium on Security and Privacy, 86-93.
Kelly, J.P., Assad, A.A., and Golden, B.L. (1990), "The controlled rounding problem: Relaxations and complexity issues", OR Spektrum 12, 129-138.
Kelly, J.P., Golden, B.L., and Assad, A.A. (1990), "Using simulated annealing to solve controlled rounding problems", ORSA Journal on Computing 2, 174-185.
Kelly, J.P., Golden, B.L., Assad, A.A., and Baker, E.K. (1990), "Controlled rounding of tabular data", Operations Research 38, 7?1-773.
Kelly, J.P., Golden, B.L., and Assad, A.A. (1991), "Large-scale controlled rounding using tabu search with strategic oscillation", Working Paper, College of Business and Administration, University of Colorado at Boulder, Boulder, CO.
Nargundkar, M.S., and Saveland, W. (1972), "Random rounding: A means of preventing disclosure of information about respondents in aggregate data", in: Proceedings of the Social Statistics Section - American Statistical Association, 382-385.
Pruhs, K. (1989), "The computational complexity of some rounding and survey overlap problems", in: Proceedings of the Survey Research Methods Section - American Statistical Association, 747-752.
