The risk to breach vote privacy by unanimous voting



Journal of Information Security and Applications 35 (2017) 168–174


Journal of Information Security and Applications journal homepage: www.elsevier.com/locate/jisa

Peter Ullrich
Universität Koblenz-Landau, Fachbereich 3, Mathematisches Institut, Universitätsstraße 1, 56070 Koblenz, Germany

Article info

Keywords: Breach of vote privacy; Unanimous voting; Small subsets of voters

Abstract

The paper studies the risk that all members of a set of voters vote unanimously and thereby breach the privacy of the voting procedure. This problem becomes relevant when the voting behavior of a small (sub)set of voters can be identified by the way they transmit their votes, e.g., when at least two ways to vote (such as ballot boxes, postal voting, and electronic voting) are admitted in theory but one of them is used by only a small minority of voters in practice. For a simple choice between “yes” and “no”, it turns out that as long as the probability of approval lies between 25% and 75%, the probability of a breach of vote privacy by unanimous voting is smaller than 1% if there are at least 17 voters, and smaller than 0.1% if there are at least 25 voters. If, however, the rate of approval or disapproval increases, even to values already observed in reality, then the probability of such a breach can no longer be neglected. Moreover, even small probabilities of a breach add up when several thousand such situations occur in parallel. Furthermore, if a three-valued situation “yes” – “no” – “abstention” is present then, depending on the concept of vote privacy, a breach becomes considerably more probable, even if the probability of approval remains within the bounds mentioned above.

© 2017 Elsevier Ltd. All rights reserved.

DOI: http://dx.doi.org/10.1016/j.jisa.2017.07.001
2214-2126/© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

One of the basic conditions a voting process has to fulfil is to guarantee privacy for all voters, in the sense that no one can find out how any individual voter has voted. Despite this principle, one has to take into account the possibility of unanimous results, which can occur even if the voters give their votes independently of each other. In these cases it is revealed either that every voter has said “yes” to the proposal or that every voter has said “no”. In both cases the privacy of the voting procedure is breached.

For large sets of voters it is, of course, very improbable that all voters say “yes” or all say “no” to the proposal. And one tends to accept such incidents if there are only very few voters who also know each other well enough that the loss of vote privacy is more or less fictitious. The phenomenon can no longer be neglected, however, if these two situations mix in the following way: Suppose that the total number of voters is large enough that a unanimous vote of the complete set is sufficiently improbable, but that within the complete set there is a small subset whose result can be identified for whatever reason. If this subset is small enough, then one has to consider the risk that its members vote unanimously, so that the individual voting behavior of each of them becomes known and the privacy of the vote is breached.

This problem is not new in principle: It arises as soon as there are at least two ways to transmit votes, such as ballot boxes and postal voting. In Germany, for example, legal texts concerning election procedures explicitly address it: The regulation for federal elections in Germany (“Bundeswahlordnung”) generally prescribes that the voting precincts (“Wahlbezirke”), i.e., the subsets of voters for which partial results of the election are determined, must encompass enough voters that it is impossible to find out how individual voters have voted [1, Section 12, (2), sentence 3]. For postal voting this prescription is made concrete by demanding that such a subset consist of at least 50 voters [1, Section 7, No. 1]. For voting with ballot boxes, however, smaller numbers are accepted in legal practice. The most extreme example is the “Hallig Gröde”, a small island in the North Sea where only 9 voters live. (In the last federal elections in 2013 all of them chose postal voting in order to protect the privacy of their votes. In the state elections of 2017, however, they used their ballot box again, which made them an object of public interest since their voting behavior differed considerably from the average: Three of the nine gave their votes to a party that received only 1.2% of all votes in the state of Schleswig-Holstein.)

E-voting, first of all, adds a further way to transmit votes, so the number of possibilities increases. But there is also a much more serious effect: When ballot boxes and postal voting are both used, one can shape not only the ballots themselves but also the envelopes for the ballots in exactly the same manner for both ways of transmission, so that they can be mixed and the votes become indistinguishable. For votes transmitted electronically, however, the different media make it technically difficult to mix them with the votes given in other ways and thereby hide the partial results for each way of transmission. Furthermore, because of its advantages for the voter, the introduction of e-voting may lead to a rapid decrease in the number of voters who use the other ways: In the 2012 elections of the German Informatics Society (“Gesellschaft für Informatik”, GI) for its council (“Präsidium”) and its board (“Vorstand”), for example, 2671 members used electronic voting but only 42 used postal voting, which is less than 2% of all voters.

In principle, however, the argument in the present article is not specific to e-voting or to any other way of transmitting votes. In particular, it does not depend on the concrete voting system and its implementation at all, apart from the single fact that there are at least two different ways to transmit votes. In contrast, the paper [2] describes a way to breach the privacy of a particular paper-ballot voting system by exploiting its particularities; according to those results, the votes of up to 96% of the voters can be correctly recovered. The present paper, on the other hand, is concerned with the breach of vote privacy for only small subsets of voters, namely those who transmitted their votes by the rarely used way.
But the only information used for this breach is that the voter under consideration has used this way of transmission, not any other information on the voting system or on the particular act of voting, e.g., the time when it took place.

For the sake of simplicity, the paper starts by examining the situation in which there is only one voting decision to be made. At first, only dichotomic voting is considered, i.e., only the two alternatives “yes” and “no” can be chosen (Section 3.1). The consequences of the theoretical result are evaluated on the basis of empirical data from elections of several scientific societies (Section 3.2). Then trichotomic voting, where besides approval and disapproval there is the possibility of abstention, is discussed in a similar way. It turns out that this requires a discussion of what “keeping vote privacy” should mean in this context (Section 4.1). A short generalization of this discussion to the case that three or more alternatives are presented to the voter is also given (Section 5). Subsequently, these results are generalized to the case of several voting decisions at a time, both when these are independent of (Section 6.1) and dependent on (Section 6.2) one another. Furthermore, an application is given for a vote on the scale of a state that is divided into many voting precincts. The paper closes with a discussion of the conclusions and of the possibilities to reduce the risk of breaching vote privacy in the way described above.

2. Notation and assumptions

For the discussion two assumptions are made:

• There exists a way to find out the probabilities with which the small set of voters under consideration takes the relevant actions. At least, one should be able to give bounds for these probabilities. This is the case, for example, when the small group of voters on average has the same voting behavior as the group of all voters and when the outcomes of previous voting procedures can be used as a predictor for the vote under consideration.
• All voters, in particular those from the small subset, give their votes independently of each other. This is plausible since the results of the voting process only become known after the possibility to vote has ended. In Section 6, however, the situation is also studied in which several voting procedures take place at a time or the whole community of voters is divided into several precincts where the probabilities are not necessarily equal.

Throughout this paper the following symbols will be used:

n for the number of members of the small set of voters (which is supposed to be strictly positive),
p for the probability that a vote is “yes” (or something similar, definitely positive),
q = 1 − p, in the dichotomic case, for the probability that a vote is not “yes”,
r, in the trichotomic case, for the probability that a vote is “no” (or something similar, definitely negative),
s = 1 − (p + r), in the trichotomic case, for the probability that a vote is “neither – nor”, and
P for the probability that a breach of vote privacy takes place by a unanimous vote of the n voters.

In the case of several decisions, their respective probabilities will be indicated by indices.

3. One single dichotomic vote

3.1. Determination of the probability

At first, consider the following situation: Each voter is confronted with only one voting decision and has exactly two alternatives: to say “yes” or to do the contrary, so to speak, to give a “non-yes”, regardless of whether this means that (s)he simply does not say “yes” or that (s)he explicitly says “no” (where in the latter case (s)he does not have the possibility of abstention). Let p denote the probability that a voter says “yes” and q = 1 − p the probability of the contrary.

Since it has been assumed that all members of the subset of voters under consideration vote independently, the multiplication rule for independent events gives that the probability that all n voters unanimously say “yes” equals

$$p^n,$$

whereas the probability that all n voters unanimously give a “non-yes” equals

$$q^n.$$

Since these two cases of unanimous voting are mutually exclusive, the resulting total probability of a breach of vote privacy by unanimous voting equals

$$P = p^n + q^n,$$

where p and q satisfy the condition p + q = 1. Therefore, as a function of p alone, one has

$$P = P(p) = p^n + (1 - p)^n.$$

This expression for P has the following properties:


• A proposal with probability p bears the same risk of a breach of vote privacy as a proposal with the complementary probability q = 1 − p. For example, a proposal to which 85% of the voters say “yes” is as dangerous with respect to a breach of vote privacy as a proposal to which only 15% of the voters say “yes”.
• For fixed n ≥ 2 the function

$$p \mapsto P(p) = p^n + (1 - p)^n$$

is strictly monotonically decreasing for p ≤ 50% and strictly monotonically increasing for p ≥ 50%. (For n = 1 one trivially has P(p) = 1 for all p.) Therefore it attains its minimum at p = 50%, which has the value

$$P(50\%) = (50\%)^n + (50\%)^n = 2 \cdot (1/2)^n = 1/2^{n-1}.$$

Remark ∗. Taking the above remarks together leads to the following conclusion: Let p∗ ≥ 50% be fixed. Then not only does P(p∗) = P(q∗) hold for q∗ := 1 − p∗ ≤ 50%, but this value is also an upper bound for P(p) for any p with q∗ ≤ p ≤ p∗ (and also for P(q) for any q with q∗ ≤ q ≤ p∗).

3.2. Numerical values

In the case of a proposal that is unreasonable in every voter's opinion (p = 0%, q = 100%) or in the case of a candidate who is everybody's darling (p = 100%, q = 0%), one does not need the formula deduced above to realize that a unanimous vote will surely take place. Even if one excludes these extreme cases, for fixed n one can make the probability P of a breach of vote privacy arbitrarily close to 100% by taking p very near (but not equal) to 0% or 100%. (If one wants $P \ge P_0$ with $P_0 < 1$, one simply has to choose $p \ge \sqrt[n]{P_0}$ or $p \le 1 - \sqrt[n]{P_0}$.) Therefore, in order to get a feeling for the size of the risk of a breach of vote privacy, one has to fix a-priori bounds for p and q. Taking Remark ∗ from Section 3.1 into account, this will be done by choosing a p∗ ≥ 50% and considering all p (and q = 1 − p) that lie between q∗ = 1 − p∗ ≤ 50% and p∗, so that the domain for p and q lies symmetrically around 50%. In the sequel the following two situations will be studied:

a. Let p∗ := 75%, i.e., consider the case that both p and q lie between 25% and 75%. This choice of p∗ can be justified as follows:
• It lies halfway between the mean value 50% and the extremal value 100%.
• The motivation for the present study comes from the election for the council of the GI, and the last published numbers of valid votes and of voters lead to probabilities between 30.657% and 69.343%, i.e., within the domain under consideration.
b. Let p∗ := 90%, i.e., consider the case that both p and q lie between 10% and 90%. At first sight, this may seem to be an extreme value. But the election for the council of the German Mathematical Society (“Deutsche Mathematiker-Vereinigung”, DMV) of the same year, 2010, shows that in an election for a scientific society even two persons may receive more than 90% of the votes, namely 91.10% and 92.98% [3].

For these two values of p∗ one gets Table 1 for the probability P of a breach of vote privacy, depending on the number n of voters in the small set. By Remark ∗ at the end of Section 3.1, the values given are upper bounds for P(p) if p lies between 25% and 75% or between 10% and 90%, respectively. For the sake of brevity, (almost) only those n are listed where the value drops below the next power of ten (10%, 1%, 0.1%, 0.01%).

Table 1
Exact values for p∗ = 75% and p∗ = 90% in case of a dichotomic vote.

n | P (case a.: p∗ = 75%) | n | P (case b.: p∗ = 90%)
1 | 100.000% | 1 | 100.000%
2 | 62.500% | 2 | 82.000%
⋮ | ⋮ | ⋮ | ⋮
4 | 32.031% | 10 | 34.868%
⋮ | ⋮ | ⋮ | ⋮
8 | 10.013% | 21 | 10.942%
9 | 7.509% | 22 | 9.848%
⋮ | ⋮ | ⋮ | ⋮
16 | 1.002% | 43 | 1.078%
17 | 0.752% | 44 | 0.970%
⋮ | ⋮ | ⋮ | ⋮
24 | 0.100% | 65 | 0.106%
25 | 0.075% | 66 | 0.096%
⋮ | ⋮ | ⋮ | ⋮
32 | 0.010% | 87 | 0.010%
33 | 0.008% | 88 | 0.009%

In particular, this table implies:
a. As long as one can assume that p lies between 25% and 75%, the probability of a breach of vote privacy by unanimous voting is smaller than 1% if there are at least 17 voters, and smaller than 0.1% if there are at least 25 voters.
b. If, however, the case p∗ = 90% has to be taken into account (think of a strong candidate for the presidency of a scientific society), the small subset of voters must contain at least 44 or 66 members, respectively, to keep the probability below the bounds mentioned above.

3.3. A rule of thumb

It is a simple task for a spreadsheet program to set up a similar list for another value of p∗ if this is necessary in an application. But there is also a simple rule of thumb for the number of voters needed to keep the risk of a breach of vote privacy by unanimous voting below given bounds: Since the case that p lies between 25% and 75% has been treated above, one may restrict attention to cases where p∗ is relatively close to 100%. To be more concrete, suppose that p∗ is of the form

$$p^* = 1 - \frac{1}{N} = \frac{N-1}{N}$$

for a natural number N. (The instances p∗ = 75% and p∗ = 90% discussed above are special cases of this situation with N = 4 and N = 10, respectively.) This means that on average N − 1 out of N voters say “yes” to the proposal, whereas 1 out of N voters gives a “non-yes”. Of course, one cannot infer from this that if there are exactly N members in the small set, then one of them will surely give a “non-yes” while the others say “yes”, so that a breach of vote privacy cannot take place. The value



$$(p^*)^N = \left(1 - \frac{1}{N}\right)^N$$

of the probability that a unanimous “yes” of the N voters takes place, however, increases with N and converges for N → ∞ to the constant

$$1/e = 0.3678794\ldots,$$

as already remarked by Leonhard Euler (1707–1783) [4, Section 125]. For the present purposes, where the case p∗ = 75%, i.e., N = 4, has already been studied, one can assume N ≥ 5, where one has the estimates

$$0.32768 = \frac{1024}{3125} \le (p^*)^N < 1/e < 0.36788.$$

Since $0 < q^* = 1 - p^* = \frac{1}{N} \le \frac{1}{5}$, one has $(q^*)^N \le \frac{1}{5^5} = 0.00032$. So for the probability P that a breach of vote privacy takes place in the case of N voters who say “yes” with probability $1 - \frac{1}{N}$, one has

$$32\% < P < 37\%$$

and therefore roughly

$$P \approx 1/3.$$

By analogous considerations one gets
• that in the case of 2N voters saying “yes” with probability $1 - \frac{1}{N}$, the probability of a breach of vote privacy lies between 10% and 14%,
• that in the case of 3N voters the probability lies between 3% and 5%, and
• that in the case of 4N voters the probability lies between 1% and 2%.

So if p∗ has the form

$$p^* = 1 - \frac{1}{N}$$

with N ≥ 5, then 3N voters suffice to get a probability of a breach of vote privacy that is smaller than 5%, but 4N voters are not enough to get a probability that is smaller than 1%.

4. One single trichotomic vote

Now the voting situation is changed: Again, each voter is confronted with only one voting decision. But (s)he now has three alternatives: (S)he can say “yes”, (s)he can say “no”, and (s)he can abstain (regardless of whether there is a formal category “abstention” or only the possibility to say neither “yes” nor “no”).

4.1. Discussion of the concept of vote privacy for the trichotomic situation

In a dichotomic situation it is clear what “vote privacy” means for a specific voter who has taken part in the voting process: It must be impossible to find out whether (s)he has said “yes” or given a “non-yes”. The minimal way to transfer this to the trichotomic situation described above is the following condition:

1. For each voter it must be impossible to find out whether (s)he has said “yes” or “no”.

But abstention is also a possible voting behavior, of equal standing with the two possibilities mentioned above. Therefore the transfer could also read as follows:

2. For each voter it must be impossible to find out which vote (s)he has given, regardless of whether this was a “yes”, a “no”, or an abstention (in the sense of giving neither a “yes” nor a “no”).

Now suppose that in a voting process a proposal was passed that some time later turns out to be a bad idea. Then no voter would like to be accused of having done nothing against the proposal, i.e., of not having voted “no”, regardless of whether (s)he has said “yes” or has abstained. The converse is also possible: Suppose that a member of a minority (of whatever kind) is a candidate for a council and does not receive the necessary number of votes. Then no voter would like to be accused of having done nothing for the candidate (and thereby having endangered an adequate representation in the council), i.e., of not having voted “yes”, regardless of whether (s)he has said “no” or has abstained. These considerations imply that a further possible transfer of the concept of vote privacy reads as follows:

3. For each voter it must be impossible to find out
• whether (s)he has voted “yes” (rather than one of the alternatives “no” and “abstention”),
• whether (s)he has not voted “yes”, i.e., has voted “no” or abstained,
• whether (s)he has voted “no” (rather than one of the alternatives “yes” and “abstention”),
• whether (s)he has not voted “no”, i.e., has voted “yes” or abstained, and also
• whether (s)he has abstained.

With respect to abstention the above list is asymmetric, and therefore one may think of adding the condition that for each voter it must be impossible to find out whether (s)he has not abstained (i.e., has voted “yes” or “no”), even if one could argue that this information is not as sensitive as the information of having said “yes” or of having said “no”. This enlarged list is equivalent to a very simple concept of vote privacy:

4. For each voter one can only find out that (s)he has participated in the voting process (and is therefore not allowed to vote a second time) by giving “yes”, “no”, or abstention, but not whether (s)he has or has not chosen any specific one of these three alternatives.

This definition intends that only the amount of information is stored that is necessary in principle for a correct voting procedure. This sounds plausible and is in the spirit of the section on vote privacy in [5]. Furthermore, the paper [2] shows how additional information, such as the time when the act of voting took place, can be used for an attack on vote privacy. The reader may therefore wonder why other alternatives for the concept of vote privacy have been discussed above at considerable length: The point is that the risk of a breach of vote privacy as defined in 4. turns out to be rather large even in the case p∗ = 75%, which has seemed harmless up to now, cf. Section 4.3.

For the convenience of the reader the four definitions are illustrated by Table 2, where an entry “X” indicates that the respective definition regards it as a breach of vote privacy to learn that the voter has undertaken the respective action:

Table 2
Alternative definitions of “vote privacy”.

Action of the voter | 1. | 2. | 3. | 4.
has given a “yes” | X | X | X | X
has given a “no” | X | X | X | X
has abstained | | X | X | X
has given a “yes” or abstained, i.e., not voted “no” | | | X | X
has given a “no” or abstained, i.e., not voted “yes” | | | X | X
has given a “yes” or a “no”, i.e., not abstained | | | | X
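The four columns of Table 2 can be made operational with a small brute-force check. The Python sketch below is only an illustration under the independence assumption of Section 2, not the author's method: for each definition it lists the sets of options whose use by all n voters would reveal a protected fact (one set per row marked “X”), enumerates all $3^n$ voting outcomes, and adds up the probability of those outcomes in which every voter's choice lies inside one such set. The names `REVEALED` and `breach_probability` are hypothetical. Since the enumeration grows like $3^n$, it is meant only for small n, e.g., as a cross-check of the closed-form expressions derived in Section 4.2.

```python
from itertools import product
from math import prod

YES, NO, ABSTAIN = "yes", "no", "abstain"

# For each definition of vote privacy (the columns of Table 2): the sets of
# options whose use by *all* n voters would reveal a protected fact about
# every voter. Singletons cover "has given a yes/no" and "has abstained";
# the pairs cover "not voted no", "not voted yes" and "not abstained".
REVEALED = {
    1: [{YES}, {NO}],
    2: [{YES}, {NO}, {ABSTAIN}],
    3: [{YES}, {NO}, {ABSTAIN}, {YES, ABSTAIN}, {NO, ABSTAIN}],
    4: [{YES}, {NO}, {ABSTAIN}, {YES, ABSTAIN}, {NO, ABSTAIN}, {YES, NO}],
}


def breach_probability(definition: int, n: int, p: float, r: float, s: float) -> float:
    """Probability that n independent voters with P(yes)=p, P(no)=r and
    P(abstain)=s reveal a fact protected by the given definition."""
    probs = {YES: p, NO: r, ABSTAIN: s}
    total = 0.0
    for votes in product(probs, repeat=n):  # all 3**n outcomes
        used = set(votes)
        # Breach: the options actually used fit inside one revealed set.
        if any(used <= revealed for revealed in REVEALED[definition]):
            total += prod(probs[v] for v in votes)
    return total


if __name__ == "__main__":
    p, r, s = 0.75, 0.125, 0.125  # hypothetical probabilities
    for d in (1, 2, 3, 4):
        print(d, [round(breach_probability(d, n, p, r, s), 4) for n in (2, 5, 10)])
```

One immediate consequence is visible for n = 2: two voters use at most two of the three options, so under definition 4 the sketch returns 1 (up to rounding) whatever p, r, and s are.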


4.2. Determination of the probability

Again, let p denote the probability that a voter says “yes”. Furthermore, let r denote the probability that (s)he says “no”, and let s denote the probability that (s)he abstains. Then obviously p + r + s = 1. Again, there is a subset of n voters whose voting behavior as a whole can be determined.

Determination for definition 1. of vote privacy. If vote privacy is defined in the way that it must be impossible to find out whether the voter under consideration has said “yes” or “no”, then a breach of vote privacy takes place if and only if all n members of the small subset of voters say “yes” or all n members say “no”. As in Section 3.1, the probability of such a unanimous vote, and therefore of a breach of vote privacy, equals

$$p^n + r^n.$$

Determination for definition 2. of vote privacy. Now the case of identifying abstention also has to be taken into consideration. Since the probability that all n voters abstain equals $s^n$, and since this event is mutually exclusive with the two events that all say “yes” or all say “no”, one analogously gets the probability of a breach of vote privacy as

$$p^n + r^n + s^n.$$

Determination for definition 3. of vote privacy. According to the list used for this definition, the following events lead to a breach of vote privacy:

i. All n members of the small subset vote “yes”.
ii. All n members of the small subset vote “no” or abstain.
iii. All n members of the small subset vote “no”.
iv. All n members of the small subset vote “yes” or abstain.
v. All n members of the small subset abstain.

Obviously, event i. is a special case of iv., iii. is a special case of ii., and v. is a special case of both ii. and iv.; to be precise, it equals the event that both ii. and iv. take place. Therefore, in order to calculate the probability that a breach of vote privacy takes place in the sense of the present definition, one can concentrate on the events ii. and iv. and ignore the events iii. and i., respectively. But since the events ii. and iv. are not mutually exclusive, one cannot simply add their probabilities. Instead one has to use the principle of inclusion and exclusion (also known as the sieve formula, cf. [6, p. 96]) and subtract from the sum the probability that both ii. and iv. take place, which, by the above, is the probability of event v. Event ii. takes place with probability $(r + s)^n$, event iv. with $(p + s)^n$, and event v. with $s^n$. So one gets

$$(p + s)^n + (r + s)^n - s^n$$

as the probability that vote privacy is breached in the sense of definition 3.

Determination for definition 4. of vote privacy. In comparison to definition 3., only one additional event has to be considered, namely:

vi. All n members of the small subset vote “yes” or “no”.

This event has probability $(p + r)^n$. It is already covered by definition 3. exactly in those cases in which all n members vote “yes” or all vote “no”, which happens with probability $p^n$ and $r^n$, respectively.

Therefore, using the result from above and once again the principle of inclusion and exclusion, one gets as the probability that vote privacy is breached in the sense of definition 4. the value

$$(p + s)^n + (r + s)^n - s^n + (p + r)^n - (p^n + r^n) = (p + r)^n + (p + s)^n + (r + s)^n - p^n - r^n - s^n.$$

4.3. Numerical values

4.3.1. Definitions 1. and 2.

Let a proposal be given for which, in the dichotomic situation, the probability of a “yes” vote equals $p_0$ and the probability of a “non-yes” vote equals $q_0 = 1 - p_0$. If one now creates the possibility of abstention, then the probability p of a “yes” vote in this trichotomic situation will not become larger than $p_0$. Suppose that p ≥ 50% holds. For the sum r + s of the probability r of a “no” vote and the probability s of abstaining one has r + s = 1 − p. By Section 4.2, the probability of a breach of vote privacy in the sense of definition 2. equals

$$p^n + r^n + s^n.$$

From the binomial formula and the fact that r and s are nonnegative, one concludes that $r^n + s^n \le (r + s)^n = (1 - p)^n$, so the probability in the trichotomic case is at most $p^n + (1 - p)^n$. Since $p \le p_0$ holds, p ≥ 50% is assumed, and P(p) does not decrease for p ≥ 50% (Section 3.1), this value is smaller than or equal to $p_0^n + (1 - p_0)^n$, which is the probability of a breach of vote privacy in the original dichotomic situation. Since for obvious reasons the probability of a breach of vote privacy in the sense of definition 1. is not larger than in the sense of definition 2., one obtains the following result: under the assumption p ≥ 50%, for these two definitions of vote privacy the risk in the trichotomic situation is not larger than in the dichotomic one.

4.3.2. Definitions 3. and 4.

For the other two definitions, in particular definition 4., however, the trichotomic situation may bear a considerably larger risk because of terms like $(p + r)^n$ and $(p + s)^n$ in the expression for the probability of a breach of vote privacy: Let the probability p of a “yes” vote in the trichotomic case be given. Then, as r + s = 1 − p and r and s are nonnegative, one of the values r and s must be greater than or equal to (1 − p)/2, so that the sum of this value and p is greater than or equal to (1 + p)/2. By Section 4.2, the nth power of this value gives a lower bound for the probability of a breach of vote privacy in the sense of definition 4. For the values p∗ = 75% and p∗ = 90% studied in Section 3.2, these bounds are given in Table 3. (In the present situation lower bounds suffice to clarify the problem. Therefore, in the above argument one can afford to disregard the risk that the breach of vote privacy is due to votes in the third category. This is reflected in the fact that the entries for n = 1 are smaller than 100%.)

Table 3
Lower bounds for p∗ = 75% and p∗ = 90% in case of a trichotomic vote using definition 4.

n | Lower bound (case a.: p∗ = 75%) | n | Lower bound (case b.: p∗ = 90%)
1 | 87.500% | 1 | 95.000%
⋮ | ⋮ | ⋮ | ⋮
17 | 10.331% | 44 | 10.467%
18 | 9.040% | 45 | 9.944%
⋮ | ⋮ | ⋮ | ⋮
34 | 1.067% | 89 | 1.041%
35 | 0.934% | 90 | 0.989%
⋮ | ⋮ | ⋮ | ⋮
51 | 0.110% | 134 | 0.104%
52 | 0.096% | 135 | 0.098%
⋮ | ⋮ | ⋮ | ⋮
69 | 0.010% | 180 | 0.010%
70 | 0.009% | 181 | 0.009%

Comparing this table with Table 1 shows that if the same probability p of a “yes” holds in the dichotomic and in the trichotomic situation, then the trichotomic situation needs numbers n of voters that are at least twice as large as in the dichotomic case in order to reduce the risk of a breach of vote privacy to the same level.
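The entries of Tables 1 and 3 and the rule of thumb of Section 3.3 can be recomputed in a few lines. The Python sketch below is an illustration only (the text itself mentions a spreadsheet); all function names are hypothetical. `smallest_n` returns the smallest n for which a probability falls strictly below a given bound, assuming the probability decreases in n. This strict threshold can differ from the table rows, which mark where the value rounded to three decimals changes: for example, $0.875^{69} \approx 0.00997\%$ is printed as 0.010% in Table 3. The sketch therefore only uses the bounds 10%, 1%, and 0.1%.

```python
def p_dichotomic(p: float, n: int) -> float:
    """Section 3.1: probability of a unanimous vote of n voters, p**n + (1-p)**n."""
    return p**n + (1 - p)**n


def p_definition4(p: float, r: float, s: float, n: int) -> float:
    """Section 4.2: breach probability in the sense of definition 4."""
    return (p + r)**n + (p + s)**n + (r + s)**n - p**n - r**n - s**n


def lower_bound_definition4(p: float, n: int) -> float:
    """Section 4.3.2: ((1 + p) / 2)**n bounds definition 4 from below."""
    return ((1 + p) / 2)**n


def smallest_n(prob, bound: float, n_max: int = 100_000) -> int:
    """Smallest n >= 1 with prob(n) < bound; assumes prob(n) decreases in n."""
    for n in range(1, n_max + 1):
        if prob(n) < bound:
            return n
    raise ValueError("bound not reached for n <= n_max")


def rule_of_thumb_holds(big_n_max: int = 2000) -> bool:
    """Section 3.3: for p* = 1 - 1/N with 5 <= N <= big_n_max, check that
    k*N voters (k = 1..4) give a breach probability in the stated ranges."""
    ranges = {1: (0.32, 0.37), 2: (0.10, 0.14), 3: (0.03, 0.05), 4: (0.01, 0.02)}
    for big_n in range(5, big_n_max + 1):
        for k, (lo, hi) in ranges.items():
            if not lo < p_dichotomic(1 - 1 / big_n, k * big_n) < hi:
                return False
    return True


if __name__ == "__main__":
    bounds = (0.1, 0.01, 0.001)
    for p_star in (0.75, 0.90):
        dich = [smallest_n(lambda n: p_dichotomic(p_star, n), b) for b in bounds]
        tri = [smallest_n(lambda n: lower_bound_definition4(p_star, n), b) for b in bounds]
        print(f"p* = {p_star:.0%}: dichotomic n = {dich}, definition 4 needs n >= {tri}")
    # Hypothetical split of the non-"yes" votes: exact value vs. lower bound.
    print(p_definition4(0.75, 0.125, 0.125, 20), lower_bound_definition4(0.75, 20))
    print(rule_of_thumb_holds())
```

For the dichotomic case the script should report the thresholds 9, 17, 25 (p∗ = 75%) and 22, 44, 66 (p∗ = 90%), and for the lower bound of definition 4 the thresholds 18, 35, 52 and 45, 90, 135, in line with the rows of Tables 1 and 3. For the hypothetical split r = s = 12.5%, the exact value is roughly twice the lower bound, because both $(p + r)^n$ and $(p + s)^n$ contribute.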

5. Generalization to more alternatives

The above discussions are concerned with voting situations with only two alternatives (possibly plus abstention): for or against a proposal or a candidate, for candidate A or for candidate B, .... In political elections, however, the number of possible alternatives is often considerably larger, for example with parties A, B, C, ... standing for election. Increasing the number of possibilities will in general reduce the probability that a specific party gets a vote. This argument already indicates that the situation is less dangerous than the dichotomic situation discussed in Section 3. In more detail, one can argue as follows: Make the (rather plausible) assumption that for each of the parties under consideration the probability of getting a vote is less than 75% and that the probability of abstention is also less than 75%. Then, as a thought experiment, one can group the parties and the possibility of abstention into two “coalitions”, so to speak, each of which has a probability between 25% and 75% of getting a vote. (If some alternative has a probability of at least 25%, it can form a coalition on its own; otherwise one adds alternatives to one coalition until its probability first reaches 25%, at which point it is still below 50%.) Every situation in which one of the parties gets all the votes from the small (sub)set of voters is clearly also a situation in which the “coalition” to which this party belongs gets all the votes. Therefore the probability of a breach of vote privacy by unanimous voting for one party is less than or equal to the probability of a breach by unanimous voting for one “coalition”. The two coalitions, however, define the simple situation of dichotomic voting discussed in Section 3. By the construction of the “coalitions” and the considerations in Section 3.2, one can use the values in the left half of Table 1 as upper bounds for the probability of a breach of vote privacy by giving all votes from the small (sub)set to just one party.

6. Several votes

Up to now only the simple situation has been considered in which one relatively small group of n voters had to make one (dichotomic or trichotomic) vote. Now composite situations are studied: For example, the n voters may face m such voting situations simultaneously, which are independent of one another (Section 6.1). Of course, there may also be dependence between the m voting possibilities, for example when the voters are only allowed to give a number of “yes” votes that is smaller than the number m of candidates for a position, cf. Section 6.2. Furthermore, an example is considered in which many of the simple voting situations take place at the same time (Section 6.3).

In the sequel, the situation is studied that m votes are given simultaneously. For i = 1, ..., m let $P_i$ denote the probability that the ith vote gives rise to a breach of vote privacy. Then $Q_i := 1 - P_i$ is the probability that no breach of privacy takes place for the ith vote.

6.1. Independent votes

If the m votes are given independently of each other, then the probability Q that no breach of vote privacy takes place at all equals the product of the $Q_i$. Therefore, for the probability P that a breach of vote privacy takes place at all, one gets the value

$$P = 1 - Q = 1 - \prod_{i=1}^{m} Q_i = 1 - \prod_{i=1}^{m} (1 - P_i).$$

If one is not interested in an exact formula but satisfied with an estimate, then one can instead use the following reasoning, which is also valid for dependent votes.

6.2. Dependent votes

On the one hand, for any i = 1, ..., m the probability P that a breach of vote privacy takes place for any of the votes is at least as large as the probability $P_i$ that a breach of privacy takes place for the ith vote. This implies

max Pi  P.

1 i m

On the other hand, the probability of the union of two events is always smaller than or equal to the sum of the probabilities of the single events, and by induction the same holds for any finite number of events. This implies

$$P \le \sum_{i=1}^{m} P_i.$$
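The two bounds, together with the exact formula of Section 6.1, can be checked numerically. The following minimal Python sketch is an illustration only: the function names are hypothetical, the exact formula assumes independent votes, and the single-vote probability uses the dichotomic expression $P_i = p_i^n + q_i^n$ with $q_i = 1 - p_i$ (all n voters vote "yes" or all vote "no").

```python
from math import prod


def breach_prob_dichotomic(p: float, n: int) -> float:
    """Probability that all n voters vote unanimously in a yes/no vote.

    P_i = p**n + q**n with q = 1 - p (all "yes" or all "no").
    """
    q = 1.0 - p
    return p ** n + q ** n


def breach_prob_independent(probs: list[float]) -> float:
    """Exact probability of at least one breach, assuming independent votes:
    P = 1 - prod(1 - P_i)  (Section 6.1)."""
    return 1.0 - prod(1.0 - p_i for p_i in probs)


def breach_bounds(probs: list[float]) -> tuple[float, float, float]:
    """Bounds that need no independence assumption (Section 6.2):
    max P_i <= P <= sum P_i <= m * max P_i."""
    lower = max(probs)
    union_bound = sum(probs)
    crude_bound = len(probs) * lower
    return lower, union_bound, crude_bound


if __name__ == "__main__":
    # Three simultaneous votes among n = 15 voters (illustrative approval rates).
    probs = [breach_prob_dichotomic(p, 15) for p in (0.5, 0.6, 0.75)]
    exact = breach_prob_independent(probs)
    lower, union_bound, crude_bound = breach_bounds(probs)
    print(lower <= exact <= union_bound <= crude_bound)  # True

    # Section 6.3 figures: 13,000 precincts, each with P_i = 1/16,384.
    precincts = [breach_prob_dichotomic(0.5, 15)] * 13_000
    print(sum(precincts))                      # 0.79345703125 (linear estimate)
    print(breach_prob_independent(precincts))  # about 0.55 if precincts are independent
```

The first check confirms the chain of inequalities for a few approval rates. The second applies the same functions to the figures of Section 6.3: the sum of the $P_i$, i.e. the linear estimate, is about 0.79, while the exact formula of Section 6.1, under the additional assumption that the precincts are independent of each other, yields about 0.55. Since the sum is only an upper bound, the exact value is smaller, but it remains far from negligible.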

One can improve the estimate

$$\max_{1 \le i \le m} P_i \le P \le \sum_{i=1}^{m} P_i$$

to an equation by use of the principle of inclusion and exclusion mentioned before. But in order to apply this formula one needs to know the probabilities of all double, triple, ... breaches of privacy, which are difficult to obtain in the situation of not necessarily independent votes. Furthermore, in applications one often meets the case that, in a dichotomic situation, exactly one of the values $p_i$ or $q_i$ ($i = 1, \ldots, m$) is larger than all the other ones. Consequently, as n increases, the term $P_i = p_i^n + q_i^n$ for this i dominates all others, so that the difference between the maximum and the sum becomes arbitrarily small. Additionally, if there are more candidates than the number of "yes" votes that can be given, the values of the $p_i$ tend to be relatively small, say, smaller than $p^* = 75\%$. At any rate, one can make use of the fact that

$$P \le m \cdot \max_{1 \le i \le m} P_i,$$

so that bounding the probabilities $P_i$ for the single votes from above gives an upper bound for the probability P for the simultaneous votes, even if this bound is worse by the factor m.

6.3. Application to a large scale situation

The situation of a small subset of, say, 9 (cf. Hallig Gröde) to 42 voters (cf. the 2012 elections of the GI) within a group of some hundreds or thousands of voters may not seem impressive at first sight. But large scale elections at state level can be interpreted as a parallelization of such a situation, namely in the several voting precincts into which the state is divided. Since the procedure for the election of the president of the Austrian republic attracted a lot of attention in 2016, its numerical data are used for an estimate of the problem: Austria is divided into about 13,000 voting precincts ("Wahlsprengel"), each of which encompasses about 500 voters. For each of these precincts the result of the election is determined and published separately as far as the votes given by use of ballot boxes are concerned. The votes given by postal voting, however, are tallied at the level of the states forming the Austrian republic.


On the one hand, this can simply be done by having them sent to one place in each of these states. On the other hand, there were specific problems with the postal votes in 2016. Therefore, as a thought experiment, one can imagine that at some point in the future postal voting could be replaced by e-voting, with the use of ballot boxes left as a second alternative. Suppose furthermore that each voting precinct has one ballot box which is tallied separately, since there are obvious difficulties in collecting the contents of the ballot boxes quickly enough at the level of a state. Assume furthermore, on the basis of the numerical data from the GI elections, that about 97% of the voters switch to e-voting, i.e., only 3% use the ballot boxes, which means about 15 voters in each voting precinct. Taking the data of the 2016 election as a basis, namely two almost equally strong candidates (at least at the level of the whole republic), one may have the impression that the probability of a breach of vote privacy at the level of one voting precinct is not far from the lower bound of

$$\frac{1}{2^{15-1}} = \frac{1}{16\,384}$$

discussed in Section 3.1. This seems comfortable. But there are about 13,000 voting precincts altogether! Combining the formula of Section 6.1 with the rule of thumb in Section 3.3 shows that the probability increases approximately linearly with the number of precincts, thus giving a probability of about

$$\frac{13\,000}{16\,384} \approx 80\%$$

that a breach of vote privacy would take place in any one of the voting precincts!

7. Possibilities to reduce the risk to breach vote privacy

At first sight, the scenario discussed in this paper may look somewhat harmless, since no details of the voting procedure are involved and no real attack on it takes place. But the fact that everything happens by pure chance and cannot be influenced by actions (or non-actions) during the voting process proper also makes it difficult to reduce the risk of a breach of vote privacy. Of course, it has been shown in Section 4.3 that it makes a difference whether the voting alternatives are dichotomic or trichotomic. But the reasoning also makes clear that the answer to the question whether it is better to allow abstention or not depends on the definition of "vote privacy" and should therefore be discussed among the voters before the regulations for the voting process are fixed. The parameters p (the probability of "yes") and n (the number of voters within the small set) are even more problematic:

One cannot avoid making proposals that meet with strong consent: the examples given in Section 3.2 came from elections within learned societies, to be sure. But recently the candidate for the leadership of a large party in Germany did in fact receive 100% of the votes of the party delegates, even if this took place in an open vote. The only parameter left is the number n of voters: one could be tempted, of course, to decree a minimum for this number. But the example of the Hallig Gröde from the Introduction (for which one cannot guarantee that the voters can leave it on every day of the year in order to reach another ballot box) and the discussion in Section 6.3 show that this can lead to serious technical problems, in particular when the small set of voters uses ballot boxes. In this situation, which can arise in the very near future when e-voting becomes so popular that only very few voters still go to the ballot boxes, there seem to be only two possibilities to cope with the risk of this kind of breach of vote privacy. Either one has to find a way to make it indistinguishable by which way the votes have been transmitted before tallying them, or one has to design a procedure which on the one hand ensures that the voters cannot vote several times, but on the other hand cannot be used afterwards to identify the persons who have given their votes.

Acknowledgements

My thanks go to my colleagues Rüdiger Grimm, University Koblenz-Landau, for his interest in the problem of the election of the council of the GI, and Friedrich Pukelsheim, University Augsburg, for his information on legal issues. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.