Explore the connections between mathematics, computer science, and voting systems. Learn about electronic voting security risks, consensus rankings, and the Kemeny-Snell distance. Discover how computer science ideas can solve voting and societal problems.
Voting Problems and Computer Science Applications Fred Roberts, Rutgers University
What do Mathematics and Computer Science have to do with Voting?
Have you used Google lately? Did you know that Google has something to do with voting?
Have you tried buying a book online lately? • Did you get a message saying: If you are interested in this book, you might want to look at the following books as well? • Did you know that this has something to do with voting?
Have you ever heard of v-sis? • It’s a cancer-causing gene. • Computer scientists helped discover how it works. • How did they do it? • The answer also has something to do with voting. [Image: cancer cell]
Some connections between Computer Science and Voting are clearly visible. • Some people are working on plans to allow us to vote from home – over the Internet.
Electronic Voting • Security Risks in Electronic Voting • Could someone mount a “denial of service attack”? • That is, could someone flood your computer and those of other likely voters with so much spam that you couldn’t succeed in voting?
Electronic Voting • Security Risks in Electronic Voting • How can we prevent random loss of connectivity that would prevent you from voting? • How can your vote be kept private? • How can you be sure your vote is counted? • What will prevent you from selling your vote to someone else?
Electronic Voting • Security Risks in Electronic Voting • These are all issues in modern computer science research. • However, they are not what I want to talk about. • I want to talk about how ideas about voting systems can solve problems of computer science.
How do Elections Work? • Typically, everyone votes for their first choice candidate. • The votes are counted. • The person with the most votes wins. • Or, sometimes, if no one has more than half the votes, there is a runoff.
Sometimes Having More Information about Voters’ Preferences is Very Helpful • Sometimes it is helpful to have voters rank order all the candidates • From their top choice to their bottom choice.
Rankings • [Figure: an example ranking of Dennis Kucinich, Bill Richardson, John Edwards, Hillary Clinton, and Barack Obama] • Ties are allowed
Rankings • What if we have four voters and they give us the following rankings? Who should win?
Voter 1      Voter 2      Voter 3      Voter 4
Clinton      Clinton      Obama        Obama
Richardson   Kucinich     Edwards      Richardson
Edwards      Edwards      Richardson   Kucinich
Kucinich     Richardson   Kucinich     Edwards
Obama        Obama        Clinton      Clinton
Rankings • What if we have four voters and they give us the following rankings? • There is one added candidate. • Who should win?
Voter 1      Voter 2      Voter 3      Voter 4
Clinton      Clinton      Obama        Obama
Gore         Gore         Gore         Gore
Richardson   Kucinich     Edwards      Richardson
Edwards      Edwards      Richardson   Kucinich
Kucinich     Richardson   Kucinich     Edwards
Obama        Obama        Clinton      Clinton
Rankings
Voter 1      Voter 2      Voter 3      Voter 4
Clinton      Clinton      Obama        Obama
Gore         Gore         Gore         Gore
Richardson   Kucinich     Edwards      Richardson
Edwards      Edwards      Richardson   Kucinich
Kucinich     Richardson   Kucinich     Edwards
Obama        Obama        Clinton      Clinton
• Maybe someone who is everyone’s second choice is the best choice for winner. • Point: We can learn something from ranking candidates.
Consensus Rankings • How should we reach a decision in an election if every voter ranks the candidates? • What decision do we want? • A winner • A ranking of all the candidates that is in some sense a consensus ranking • This would be useful in some applications • Job candidates are ranked by each interviewer • Consensus ranking of candidates • Make offers in order of ranking • How do we find a consensus ranking?
Consensus Rankings • These two rankings are very close:
Clinton      Obama
Obama        Clinton
Edwards      Edwards
Kucinich     Kucinich
Richardson   Richardson
Consensus Rankings • These two rankings are very far apart:
Clinton      Obama
Richardson   Kucinich
Edwards      Edwards
Kucinich     Richardson
Obama        Clinton
Consensus Rankings • This suggests we may be able to make precise how far apart two rankings are. • How do we measure the distance between two rankings?
Consensus Rankings • Kemeny-Snell distance between rankings: twice the number of pairs of candidates i and j for which i is ranked above j in one ranking and below j in the other, plus the number of pairs that are strictly ordered in one ranking and tied in the other. • Example (y-z means y and z are tied):
a      b
x      y-z
y      x
z
• On {x,y}: +2 • On {x,z}: +2 • On {y,z}: +1 • d(a,b) = 5.
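The distance just defined can be sketched in a few lines of Python. This is only an illustration: the representation of a ranking as a list of tiers (top tier first, tied candidates sharing a tier) and the function names are assumptions made here, not something taken from the slides.

```python
from itertools import combinations

def tier_index(ranking):
    """Map each candidate to the index of its tier (0 = top tier)."""
    return {c: i for i, tier in enumerate(ranking) for c in tier}

def sign(v):
    return (v > 0) - (v < 0)

def kemeny_snell_distance(r1, r2):
    """Kemeny-Snell distance between two rankings (ties allowed).

    Each pair ordered oppositely contributes 2; each pair strictly
    ordered in one ranking and tied in the other contributes 1.
    """
    p1, p2 = tier_index(r1), tier_index(r2)
    if p1.keys() != p2.keys():
        raise ValueError("both rankings must cover the same candidates")
    total = 0
    for i, j in combinations(sorted(p1), 2):
        # sign(...) is -1, 0 (tied) or +1 for the pair's relative order
        total += abs(sign(p1[i] - p1[j]) - sign(p2[i] - p2[j]))
    return total

# The x, y, z example: a = x > y > z, b = (y tied with z) > x
a = [["x"], ["y"], ["z"]]
b = [["y", "z"], ["x"]]
print(kemeny_snell_distance(a, b))  # 5

# The "very close" pair of candidate rankings (only Clinton/Obama swapped)
r1 = [["Clinton"], ["Obama"], ["Edwards"], ["Kucinich"], ["Richardson"]]
r2 = [["Obama"], ["Clinton"], ["Edwards"], ["Kucinich"], ["Richardson"]]
print(kemeny_snell_distance(r1, r2))  # 2

# The "very far apart" pair: the second ranking is the first one reversed
r3 = [["Clinton"], ["Richardson"], ["Edwards"], ["Kucinich"], ["Obama"]]
print(kemeny_snell_distance(r3, r3[::-1]))  # 20
```

With five candidates there are 10 pairs, so 20 is the largest possible distance; the reversed ranking attains it.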
Consensus Rankings • One well-known consensus method: “Kemeny-Snell medians”. Given a set of rankings, find a ranking minimizing the sum of distances to the rankings in the set. • Kemeny-Snell medians are finding surprising new applications in CS. [Photo: John Kemeny, pioneer in time sharing in CS]
Consensus Rankings • Kemeny-Snell median: Given rankings $a_1, a_2, \ldots, a_p$, find a ranking $x$ so that $d(a_1,x) + d(a_2,x) + \cdots + d(a_p,x)$ is as small as possible. • $x$ can be a ranking other than $a_1, a_2, \ldots, a_p$. • Sometimes just called Kemeny median.
Consensus Rankings
a1        a2        a3
Fish      Fish      Chicken
Chicken   Chicken   Fish
Beef      Beef      Beef
• Median = a1. • If x = a1, then $d(a_1,x) + d(a_2,x) + d(a_3,x) = 0 + 0 + 2 = 2$, which is the minimum. • If x = a3, the sum is 4. • For any other x, the sum is at least 1 + 1 + 1 = 3.
Consensus Rankings
a1        a2        a3
Fish      Chicken   Beef
Chicken   Beef      Fish
Beef      Fish      Chicken
• Three medians = a1, a2, a3. • This is the “voter’s paradox” situation.
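For a handful of candidates, the median can be found by brute force: enumerate every ranking with ties allowed and keep those with the smallest total distance. The sketch below does exactly that. The tier-list representation and all function names are assumptions made for illustration, and the search is only feasible for very small candidate sets (13 rankings for 3 candidates, 75 for 4, 541 for 5).

```python
from itertools import combinations

def weak_orders(items):
    """Yield every ranking of items with ties allowed, as a list of tiers."""
    items = list(items)
    if not items:
        yield []
        return
    for size in range(1, len(items) + 1):
        for top in combinations(items, size):      # choose the top tier
            rest = [c for c in items if c not in top]
            for tail in weak_orders(rest):         # rank the remainder
                yield [list(top)] + tail

def distance(r1, r2):
    """Kemeny-Snell distance: 2 per reversed pair, 1 per half-tied pair."""
    p1 = {c: i for i, tier in enumerate(r1) for c in tier}
    p2 = {c: i for i, tier in enumerate(r2) for c in tier}
    sign = lambda v: (v > 0) - (v < 0)
    return sum(abs(sign(p1[i] - p1[j]) - sign(p2[i] - p2[j]))
               for i, j in combinations(sorted(p1), 2))

def kemeny_snell_medians(profile):
    """Return (minimum total distance, list of all rankings attaining it)."""
    candidates = sorted(c for tier in profile[0] for c in tier)
    costs = [(sum(distance(a, x) for a in profile), x)
             for x in weak_orders(candidates)]
    best = min(cost for cost, _ in costs)
    return best, [x for cost, x in costs if cost == best]

fcb = [["Fish"], ["Chicken"], ["Beef"]]
cfb = [["Chicken"], ["Fish"], ["Beef"]]
cbf = [["Chicken"], ["Beef"], ["Fish"]]
bfc = [["Beef"], ["Fish"], ["Chicken"]]

# Two voters say Fish > Chicken > Beef, one says Chicken > Fish > Beef.
print(kemeny_snell_medians([fcb, fcb, cfb]))  # (2, [[['Fish'], ['Chicken'], ['Beef']]])

# The cyclic "voter's paradox" profile: three medians, each with total 8.
best, medians = kemeny_snell_medians([fcb, cbf, bfc])
print(best, len(medians))  # 8 3
```

In the cyclic profile every voter's own ranking is a median, which is one way to see that the median need not be unique.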
Consensus Rankings
a1        a2        a3
Fish      Chicken   Beef
Chicken   Beef      Fish
Beef      Fish      Chicken
• Note that sometimes we wish to minimize $d(a_1,x)^2 + d(a_2,x)^2 + \cdots + d(a_p,x)^2$. • A ranking x that minimizes this is called a Kemeny-Snell mean. • In this example, there is one mean: the ranking declaring all three alternatives tied.
Consensus Rankings
a1        a2        a3
Fish      Chicken   Beef
Chicken   Beef      Fish
Beef      Fish      Chicken
• If x is the ranking declaring Fish, Chicken and Beef tied, then $d(a_1,x)^2 + d(a_2,x)^2 + d(a_3,x)^2 = 3^2 + 3^2 + 3^2 = 27$. • Not hard to show this is minimum.
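The claim that 27 is the minimum can be checked by exhaustive search over the 13 rankings (ties allowed) of three alternatives. The following is a minimal sketch under the same assumptions as any brute-force approach: rankings are lists of tiers, the helper names are made up for illustration, and the method does not scale beyond a few alternatives.

```python
from itertools import combinations

def weak_orders(items):
    """Yield every ranking of items with ties allowed, as a list of tiers."""
    items = list(items)
    if not items:
        yield []
        return
    for size in range(1, len(items) + 1):
        for top in combinations(items, size):
            rest = [c for c in items if c not in top]
            for tail in weak_orders(rest):
                yield [list(top)] + tail

def distance(r1, r2):
    """Kemeny-Snell distance: 2 per reversed pair, 1 per half-tied pair."""
    p1 = {c: i for i, tier in enumerate(r1) for c in tier}
    p2 = {c: i for i, tier in enumerate(r2) for c in tier}
    sign = lambda v: (v > 0) - (v < 0)
    return sum(abs(sign(p1[i] - p1[j]) - sign(p2[i] - p2[j]))
               for i, j in combinations(sorted(p1), 2))

def kemeny_snell_means(profile):
    """Return (minimum sum of squared distances, all rankings attaining it)."""
    candidates = sorted(c for tier in profile[0] for c in tier)
    costs = [(sum(distance(a, x) ** 2 for a in profile), x)
             for x in weak_orders(candidates)]
    best = min(cost for cost, _ in costs)
    return best, [x for cost, x in costs if cost == best]

a1 = [["Fish"], ["Chicken"], ["Beef"]]
a2 = [["Chicken"], ["Beef"], ["Fish"]]
a3 = [["Beef"], ["Fish"], ["Chicken"]]

print(kemeny_snell_means([a1, a2, a3]))  # (27, [[['Beef', 'Chicken', 'Fish']]])

# For comparison, choosing x = a1 gives 0^2 + 4^2 + 4^2 = 32.
print(sum(distance(a, a1) ** 2 for a in [a1, a2, a3]))  # 32
```

Squaring penalizes a ranking that is far from any single voter, which is why the all-tied ranking wins here even though it is not a median.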
Consensus Rankings • Theorem (Bartholdi, Tovey, and Trick, 1989; Wakabayashi, 1986): Computing the Kemeny-Snell median of a set of rankings is an NP-complete problem.
Consensus Rankings • Okay, so what does this have to do with practical computer science questions?
Consensus Rankings • I mean really practical computer science questions.
Google Example • Google is a “search engine” • It searches through web pages and rank orders them. • That is, it gives us a ranking of web pages from most relevant to our query to least relevant.
Meta-search • There are other search engines besides Google. • Wouldn’t it be helpful to use several of them and combine the results? • This is meta-search. • It is a voting problem. • Combine page rankings from several search engines to produce one consensus ranking. • Dwork, Kumar, Naor, Sivakumar (2000): Kemeny-Snell medians are good for spam resistance in meta-search (a page spams if it causes the meta-search to rank it too highly). • Approximation methods make this computationally tractable.
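To make the "voting" view of meta-search concrete, here is a small sketch that combines several engines' result lists with a Borda count, a classical positional voting rule. This is an assumption chosen for simplicity: it is not the Kemeny-Snell median, it is not claimed to be the method of Dwork et al., and it need not share the spam-resistance property mentioned above. The page names and the three engine lists are hypothetical.

```python
def borda_consensus(rankings):
    """Combine strict rankings of the same pages into one consensus list.

    A page in position pos (0 = best) of an n-page list earns n - 1 - pos
    points; pages are then sorted by total points, ties broken by name.
    """
    n = len(rankings[0])
    pages = set(rankings[0])
    scores = {page: 0 for page in pages}
    for ranking in rankings:
        if set(ranking) != pages or len(ranking) != n:
            raise ValueError("every engine must rank the same set of pages")
        for pos, page in enumerate(ranking):
            scores[page] += n - 1 - pos
    return sorted(scores, key=lambda page: (-scores[page], page))

# Hypothetical result lists from three search engines (the "voters").
engine_a = ["p1", "p2", "p3", "p4"]
engine_b = ["p2", "p1", "p3", "p4"]
engine_c = ["p1", "p3", "p2", "p4"]

print(borda_consensus([engine_a, engine_b, engine_c]))
# ['p1', 'p2', 'p3', 'p4']  (scores: p1 = 8, p2 = 6, p3 = 4, p4 = 0)
```

Real engines return different, partial lists of pages, so a practical meta-search would also need a rule for pages that some engines do not rank; that is left out of this sketch.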
Collaborative Filtering • Recommending books or movies • Combine book or movie ratings by various people • This too is voting • Produce a consensus ordered list of books or movies to recommend • Freund, Iyer, Schapire, Singer (2003): “Boosting” algorithm for combining rankings. • Related topic: Recommender Systems
Meta-search and Collaborative Filtering • A major difference from the election situation • In elections, the number of voters is large, number of candidates is small. • In CS applications, number of voters (search engines) is small, number of candidates (pages) is large. • This makes for major new complications and research challenges.
Have you ever heard of v-sis? • It’s a cancer-causing gene. • Computer scientists helped discover how it works. • How did they do it? • The answer also has something to do with voting.
Large Databases and Inference • Decision makers consult massive data sets. • The study of large databases and gathering of information from them is a major topic in modern computer science. • We will give an example from the field of Bioinformatics. • This lies at the interface between Computer Science and Molecular Biology
Large Databases and Inference • Real biological data often in form of sequences. • GenBank has over 7 million sequences comprising 8.6 billion “bases.” • The search for similarity or patterns has extended from pairs of sequences to finding patterns that appear in common in a large number of sequences or throughout the database: “consensus sequences” • Emerging field of “Bioconsensus”: applies consensus methods to biological databases.
Large Databases and Inference • Why look for such patterns? • Similarities between sequences or parts of sequences lead to the discovery of shared phenomena. • For example, it was discovered that the sequence for platelet-derived growth factor, which causes growth in the body, is 87% identical to the sequence for v-sis, the cancer-causing gene. • This led to the discovery that v-sis works by stimulating growth.
Large Databases and Inference DNA Sequences A DNA sequence is a sequence of “bases”: A = Adenine, G = Guanine, C = Cytosine, T = Thymine Example: ACTCCCTATAATGCGCCA
Large Databases and Inference • Example: bacterial promoter sequences studied by Waterman (1989):
RRNABP1: ACTCCCTATAATGCGCCA
TNAA:    GAGTGTAATAATGTAGCC
UVRBP2:  TTATCCAGTATAATTTGT
SFC:     AAGCGGTGTTATAATGCC
• Notice that if we are looking for patterns of length 4, each sequence has the pattern TAAT.
Large Databases and Inference • Example: However, suppose that we add another sequence:
M1 RNA:  AACCCTCTATACTGCGCG
• The pattern TAAT does not appear here. • However, it almost appears, since the pattern TACT appears, and this has only one mismatch from the pattern TAAT.
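The idea of a pattern that "almost appears" can be made precise by counting mismatches between the pattern and every window of the same length in a sequence. The sketch below is a minimal illustration; the function names are assumptions, and only substitutions (no insertions or deletions) are considered.

```python
def mismatches(u, v):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(u, v))

def best_match(sequence, pattern):
    """Return (fewest mismatches, leftmost 0-based start index attaining it)."""
    k = len(pattern)
    return min((mismatches(sequence[i:i + k], pattern), i)
               for i in range(len(sequence) - k + 1))

promoters = {
    "RRNABP1": "ACTCCCTATAATGCGCCA",
    "TNAA":    "GAGTGTAATAATGTAGCC",
    "UVRBP2":  "TTATCCAGTATAATTTGT",
    "SFC":     "AAGCGGTGTTATAATGCC",
    "M1 RNA":  "AACCCTCTATACTGCGCG",
}

for name, seq in promoters.items():
    count, start = best_match(seq, "TAAT")
    print(name, count, seq[start:start + 4])
# The first four sequences contain TAAT exactly (0 mismatches);
# M1 RNA's closest window is TACT, with 1 mismatch.
```

Summing the per-sequence mismatch counts gives a score for how well one pattern fits the whole collection.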