
APPROXIMATE MEASUREMENT OF VOTER PRIVACY LOSS IN AN ELECTION WITH PRECINCT REPORTS

Chris Crutchfield, David Molnar, and David Turner, University of California, Berkeley



  1. APPROXIMATE MEASUREMENT OF VOTER PRIVACY LOSS IN AN ELECTION WITH PRECINCT REPORTS Chris Crutchfield, David Molnar, and David Turner University of California, Berkeley

  2. Precinct Reports • In California, counties are required by law to release per-precinct election tallies. Source: San Francisco County SOV, Nov. 2004 Presidential Election

  3. Precinct Reports • Sometimes, due to absentee balloting, geography, or poor turnout, precincts may be very small (≤10 voters!). Source: Santa Cruz County SOV, Nov. 2004 Presidential Election

  4. Precinct Reports • On occasion, all the voters in a precinct vote the same way. • Total privacy loss! Source: Santa Cruz County SOV, Nov. 2004 Presidential Election

  5. Precinct Reports • Sometimes a precinct vote is not unanimous, but it is close. • If 9 people vote for Alice and 1 for Bob, how much privacy is lost? • Can we quantify it? Source: Santa Cruz County SOV, Nov. 2004 Presidential Election

  6. Goal • Can we devise a way to calculate the total voter privacy lost when a county releases its precinct tallies?

  7. Precinct Tallies • In our calculations, we use precinct tallies from both San Francisco County and Santa Cruz County. • Data was taken from both counties’ Statements of Vote for the November 2004 Presidential Election. • Both tallies are fairly one-sided: - In SF, 82% voted for Kerry. - In SC, 72% voted for Kerry. • So we should expect fairly high privacy loss.

  8. A Voting Privacy Measure • Measure devised by Coney, Hall, Vora, and Wagner (2005) • Gives an entropy-based measure of privacy loss for a voting system

  9. Measure by Coney et al. • Let V denote the random variable representing how a voter voted. • Let S denote the information available from sources other than the voting system. - For example, geographic location. • Let E denote the information leaked by the voting system and voting procedures. - For example, precinct tallies are information released through the voting procedures.

  10. Measure by Coney et al. • Definition (Perfectly Private) • A voting system is perfectly private if V is conditionally independent of E, given S. • Pr(V|S) = Pr(V|S,E)

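  11. Measure by Coney et al. • The Shannon entropy of a random variable X, $H(X) = -\sum_x \Pr(X = x) \log_2 \Pr(X = x)$, can be described as a measure of the uncertainty of X. • In the Coney et al. measure, the amount of privacy loss, L, is given by $L = \max_{\Pr(V)} \left[ H(V \mid S) - H(V \mid S, E) \right]$, where the maximum is taken over prior distributions of V.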
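
As a concrete reference point for the entropy definition, here is a minimal Python sketch of Shannon entropy in bits. The base-2 logarithm is an assumption (the slides do not state the base), and the name `shannon_entropy` is illustrative.

```python
import math

def shannon_entropy(probs):
    """H(X) = sum over x of Pr(X=x) * log2(1 / Pr(X=x)), in bits.

    Outcomes with probability 0 contribute nothing (0 * log 0 is taken as 0).
    """
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))    # fair two-way choice: 1.0
print(shannon_entropy([1.0, 0.0]))    # outcome known in advance: 0.0
print(shannon_entropy([0.72, 0.28]))  # lopsided vote: about 0.855
```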

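  12. Measure by Coney et al. • Note that for a perfectly private voting system, L = 0: conditional independence gives H(V|S) = H(V|S,E) for every prior, so $L = \max_{\Pr(V)} \left[ H(V \mid S) - H(V \mid S, E) \right] = 0$. • Of course, there can never be such a system in practice, since any published tally reveals some information about V.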
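
The quantity inside the maximum can be computed directly for one fixed prior. The sketch below is an illustration under stated assumptions, not the authors' code: it takes a small joint distribution over (V, S, E) as a Python dict and returns H(V|S) − H(V|S,E). It does not perform the maximization over priors, and the helper names are hypothetical.

```python
import math
from collections import defaultdict

def conditional_entropy(joint, target, given):
    """H(target | given) in bits.

    joint maps outcome tuples such as (v, s, e) to probabilities;
    target and given are tuples of positions within those outcome tuples.
    """
    p_given = defaultdict(float)  # marginal Pr(given = g)
    p_both = defaultdict(float)   # marginal Pr(given = g, target = t)
    for outcome, p in joint.items():
        g = tuple(outcome[i] for i in given)
        t = tuple(outcome[i] for i in target)
        p_given[g] += p
        p_both[(g, t)] += p
    # H(T | G) = sum over (g, t) of Pr(g, t) * log2(Pr(g) / Pr(g, t))
    return sum(p * math.log2(p_given[g] / p)
               for (g, _t), p in p_both.items() if p > 0)

def loss_for_prior(joint):
    """H(V|S) - H(V|S,E) for one fixed prior; outcome tuples are (v, s, e)."""
    V, S, E = 0, 1, 2
    return (conditional_entropy(joint, (V,), (S,))
            - conditional_entropy(joint, (V,), (S, E)))

# E is a coin flip unrelated to the vote: V is independent of E given S.
private = {(v, 0, e): 0.25 for v in (0, 1) for e in (0, 1)}
# E simply publishes the vote.
revealing = {(v, 0, v): 0.5 for v in (0, 1)}

print(loss_for_prior(private))    # 0.0
print(loss_for_prior(revealing))  # 1.0
```

The first case satisfies Pr(V|S) = Pr(V|S,E), so no information is lost under this prior; the second leaks the entire one-bit vote.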

  13. Our Measure • Instead of looking at a particular voter, we want to consider the privacy lost across all voters. • We need to modify the measure given by Coney et al. • Let Vi be the random variable for voter i’s vote. • Then let V be the joint random variable over all the Vi: • V = (V1, V2, …, Vn) • As before, $L = \max_{\Pr(V)} \left[ H(V \mid S) - H(V \mid S, E) \right]$, with the maximum now taken over joint distributions of V.

  14. Some Simplifying Assumptions • In order to compute L, we introduce some simplifying assumptions. • First, suppose that the effect of S is negligible, hence Pr(V|S) = Pr(V). • Second, suppose that all Vi are independent.
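
Under the independence assumption, the entropy of the joint vector V = (V1, …, Vn) is the sum of the per-voter entropies, and with S dropped the loss reduces to H(V) − H(V|E). A brute-force Python check of that additivity, assuming two candidates with Vi = 1 meaning a vote for the first one (a hypothetical encoding); the enumeration is exponential in n, so this is only for tiny examples.

```python
import itertools
import math

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def joint_entropy(ps):
    """H(V1, ..., Vn) by enumerating all 2^n vote vectors, where voter i
    independently has Pr(Vi = 1) = ps[i]."""
    joint = [math.prod(p if v else 1 - p for v, p in zip(votes, ps))
             for votes in itertools.product((0, 1), repeat=len(ps))]
    return entropy(joint)

ps = [0.5, 0.72, 0.9]
per_voter_sum = sum(entropy([p, 1 - p]) for p in ps)
print(joint_entropy(ps), per_voter_sum)  # equal up to floating-point rounding
```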

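  15. Our Measure • Consider the case of a county with only one precinct, where each Vi is 0 or 1. • Then releasing the tally X = V1 + V2 + … + Vn corresponds to E being the event X = k. • But it turns out that the prior on V that maximizes L is just • Pr(V1 = 1) = … = Pr(Vn-k = 1) = 0 • Pr(Vn-k+1 = 1) = … = Pr(Vn = 1) = 1/2 • Under this prior, learning X = k reveals all k uncertain votes, so L = k bits.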
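
The claim about the maximizing prior can be checked by brute force for a small precinct. A minimal Python sketch, assuming S is dropped and each voter's prior Pr(Vi = 1) is given separately; `single_precinct_loss` is a hypothetical name, and the enumeration is exponential in n.

```python
import itertools
import math

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def single_precinct_loss(ps, k):
    """H(V) - H(V | X = k) for one precinct, by enumerating all 2^n vote vectors.

    ps[i] is the prior Pr(Vi = 1); X = V1 + ... + Vn is the released tally.
    """
    prior, consistent = [], []
    for votes in itertools.product((0, 1), repeat=len(ps)):
        prob = math.prod(p if v else 1 - p for v, p in zip(votes, ps))
        prior.append(prob)
        if sum(votes) == k:
            consistent.append(prob)
    pr_tally = sum(consistent)  # Pr(X = k); assumed to be positive
    posterior = [p / pr_tally for p in consistent]
    return entropy(prior) - entropy(posterior)

n, k = 10, 9
degenerate = [0.0] * (n - k) + [0.5] * k  # the maximizing prior described above
uniform = [0.5] * n
print(single_precinct_loss(degenerate, k))  # 9.0
print(single_precinct_loss(uniform, k))     # about 6.68
```

The degenerate prior scores higher, but only because it assumes some voters are certain and the rest are coin flips, which motivates fixing a prior instead of maximizing over all of them.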

  16. Our Measure • Instead, we consider the measure $L' = H(V_U) - H(V_U \mid E)$, where VU is the uniform case: all Vi uniform. • One way to look at this: if we have no prior information S, the best we can do is assume each voter votes uniformly at random. • We’ll come back to this later.
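
Assuming L' = H(VU) − H(VU | E) with E the event X = k, the uniform case has a closed form: H(VU) = n bits, and given the tally every one of the C(n, k) consistent vote vectors is equally likely, so H(VU | E) = log2 C(n, k). A short Python sketch (function name hypothetical), applied to the unanimous and 9-to-1 precincts from the earlier slides:

```python
import math

def uniform_precinct_loss(n, k):
    """L' = n - log2 C(n, k) bits for one two-candidate precinct of n voters,
    k of whom voted for Alice, under the uniform prior."""
    return n - math.log2(math.comb(n, k))

print(uniform_precinct_loss(10, 10))  # unanimous: 10.0, every vote revealed
print(uniform_precinct_loss(10, 9))   # 9 for Alice, 1 for Bob: about 6.68
print(uniform_precinct_loss(10, 5))   # even split: about 2.02
```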

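  17. The Two-Candidate Model • Suppose we have m precincts, where precinct i has ni people, of whom ki vote for Alice (hence ni − ki vote for Bob). • Then we can define L' as follows: $L' = \sum_{i=1}^{m} \left[ n_i - \log_2 \binom{n_i}{k_i} \right]$ (under the uniform prior, precinct i starts with ni bits of uncertainty, and its tally leaves the $\binom{n_i}{k_i}$ consistent vote assignments equally likely). • Let’s try this out on some real data.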

  18. The Two-Candidate Model • Here L'/n, where n = n1 + … + nm is the total number of voters, denotes the “average privacy loss”: in a sense, the average amount of privacy each voter is expected to lose.
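
Under the uniform prior, votes in different precincts are independent, so H(VU) = Σ ni and H(VU | E) = Σ log2 C(ni, ki), and the county-wide L' is a sum of per-precinct terms. A minimal Python sketch of L' and the average L'/n, assuming n is the total number of voters; the function names and the example tallies are hypothetical and are not the San Francisco or Santa Cruz data.

```python
import math

def county_loss(tallies):
    """L' summed over precincts; tallies is a list of (n_i, k_i) pairs,
    with n_i voters in precinct i and k_i of them voting for Alice."""
    return sum(n - math.log2(math.comb(n, k)) for n, k in tallies)

def average_loss(tallies):
    """L'/n, where n is the total number of voters across all precincts."""
    total_voters = sum(n for n, _ in tallies)
    return county_loss(tallies) / total_voters

# Hypothetical tallies for illustration only, not data from the slides.
tallies = [(10, 10), (10, 9), (200, 150)]
print(county_loss(tallies), average_loss(tallies))
```

Because each precinct term lies between 0 and ni, the average under this prior always falls between 0 and 1.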

  19. The Multi-Candidate Model • However, many elections have more than two candidates. • Can we extend our model to l candidates? • As before, there are ni people in precinct i, of whom ki,j vote for candidate j.
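
The same reasoning extends to l candidates under a uniform prior: each voter contributes log2 l bits of prior entropy, and given the tallies ki,1, …, ki,l the consistent vote vectors are equally likely and number ni!/(ki,1! ⋯ ki,l!). So, assuming the uniform-prior definition of L' carries over, $L' = \sum_i \left[ n_i \log_2 l - \log_2 \frac{n_i!}{k_{i,1}! \cdots k_{i,l}!} \right]$. A Python sketch (names hypothetical):

```python
import math

def log2_arrangements(counts):
    """log2 of n! / (k_1! ... k_l!): the number of ways to assign the tallied
    votes to the n individual voters of a precinct (exact integer arithmetic)."""
    n = sum(counts)
    ways = math.factorial(n) // math.prod(math.factorial(k) for k in counts)
    return math.log2(ways)

def multi_candidate_loss(precincts, num_candidates):
    """Uniform-prior L' over precincts; each entry lists k_i1, ..., k_il."""
    return sum(sum(counts) * math.log2(num_candidates) - log2_arrangements(counts)
               for counts in precincts)

print(multi_candidate_loss([[9, 1]], 2))     # matches the two-candidate case: about 6.68
print(multi_candidate_loss([[6, 3, 1]], 3))  # hypothetical three-way split
```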

  20. The Multi-Candidate Model • Here we see that the “average privacy loss” figure grew by a large factor • What went wrong?

  21. Realistic Prior Model • Our initial assumptions that V is uniform and S is negligible turned out to be poor. • The multi-candidate case shows massive privacy loss because our model weights each candidate equally. • In the U.S., it’s unreasonable to assume that a voter will vote for each candidate with equal probability (in 2000, the Green Party received 2.7% of the national vote). • How can we update our model to include a realistic prior distribution?
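
A small calculation shows how equal weighting inflates the figure. Under the uniform prior, adding candidates who receive no votes in a precinct leaves the number of consistent vote assignments unchanged (0! = 1) but raises each voter's prior entropy from 1 bit to log2 l bits, so that precinct's L' grows by ni log2(l/2). A Python sketch using a hypothetical 9-to-1 precinct:

```python
import math

def uniform_loss(counts, num_candidates):
    """Uniform-prior L' for one precinct with per-candidate tallies `counts`."""
    n = sum(counts)
    ways = math.factorial(n) // math.prod(math.factorial(k) for k in counts)
    return n * math.log2(num_candidates) - math.log2(ways)

two = uniform_loss([9, 1], 2)            # about 6.68 bits
three = uniform_loss([9, 1, 0], 3)       # about 12.53 bits
five = uniform_loss([9, 1, 0, 0, 0], 5)  # larger still
print(two, three, five)
```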

  22. Realistic Prior Model • Suppose instead that S, the information available prior to the election, includes polling data that indicates how likely each voter is to vote for each candidate. • Example: in Santa Cruz County, we know prior to the election that around 72% of the voters will vote for Kerry. • Then each Vi has the distribution Pr(Vi = Kerry) = 0.72, Pr(Vi = Bush) = 0.28.
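
With this prior (and independent voters), the posterior given a precinct's tally is still uniform over the consistent vote vectors, so only the prior-entropy term changes: per precinct, H(V|S) − H(V|S,E) = ni h(0.72) − log2 C(ni, ki), where h is the binary entropy. A minimal Python sketch under those assumptions; the function names and example precincts are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a two-outcome distribution (p, 1 - p)."""
    return sum(q * math.log2(1 / q) for q in (p, 1 - p) if q > 0)

def prior_precinct_loss(n, k, p):
    """H(V|S) - H(V|S,E) for one precinct of n voters, k of whom voted for
    the favored candidate, when S says each voter independently favors that
    candidate with probability p.

    Every vote vector with exactly k favored votes has the same prior
    probability p**k * (1 - p)**(n - k), so given the tally the posterior is
    uniform over C(n, k) vectors and H(V|S,E) = log2 C(n, k).
    """
    return n * binary_entropy(p) - math.log2(math.comb(n, k))

print(prior_precinct_loss(10, 9, 0.5))     # uniform prior: about 6.68
print(prior_precinct_loss(10, 9, 0.72))    # 72/28 prior: about 5.23
print(prior_precinct_loss(100, 50, 0.72))  # surprising even split: negative
```

One point this sketch surfaces: when a tally is surprising under the prior, conditioning on it can leave more uncertainty than before, so a single precinct's value can be negative. Whether such precincts should offset others in a county total is a modeling choice.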

  23. Realistic Prior Model • The updated prior makes for a more realistic model.

  24. So, what does it all mean? • What does a number like 5% privacy loss mean? • How is this useful, then? • Riverside County, CA registrar of voters.

  25. Thank you! Any questions?

  26. Open Problems/Ideas • Precinct tallies are not necessarily part of the voting system; rather, they are part of the reporting procedure. • How can we extend this model to information leaked through the voting system itself? • Ballot language, other variables.
