1.42k likes | 1.65k Views
Anonymous communications: High latency systems. Messaging anonymity & the traffic analysis of hardened systems. FOSAD 2010 – Bertinoro , Italy. Network identity today. Networking Relation between identity and efficient routing Identifiers: MAC, IP, email, screen name
E N D
Anonymous communications: High latency systems Messaging anonymity & the traffic analysis of hardened systems FOSAD 2010 – Bertinoro, Italy
Network identity today • Networking • Relation between identity and efficient routing • Identifiers: MAC, IP, email, screen name • No network privacy = no privacy! • The identification spectrum today “The Mess” we are in! Strong Identification Full Anonymity Pseudonymity
Network identity today (contd.) No anonymity No identification Weak identifiers easy to modulate Expensive / unreliable logs. IP / MAC address changes Open wifi access points Botnets Partial solution Authentication Open issues: DoS and network level attacks • Weak identifiers everywhere: • IP, MAC • Logging at all levels • Login names / authentication • PK certificates in clear • Also: • Location data leaked • Application data leakage
Ethernet packet format Anthony F. J. Levi - http://www.usc.edu/dept/engineering/eleceng/Adv_Network_Tech/Html/datacom/ No integrity or authenticity MAC Address
IP packet format 3.1. Internet Header Format A summary of the contents of the internet header follows: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Version| IHL |Type of Service| Total Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Identification |Flags| Fragment Offset | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Time to Live | Protocol | Header Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Source Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Destination Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Options | Padding | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Example Internet Datagram Header Figure 4. RFC: 791 INTERNET PROTOCOL DARPA INTERNET PROGRAM PROTOCOL SPECIFICATION September 1981 Link different packets together Weak identifiers No integrity / authenticity Same for TCP, SMTP, IRC, HTTP, ...
Outline –4 lectures • Objective: foundations of anonymity & traffic analysis • Why anonymity? • Lecture 1 – Unconditional anonymity • DC-nets in detail, Broadcast & their discontents • Lecture 2 – Black box, long term attacks • Statistical disclosure & its Bayesian formulation • Lecture 3 – Mix networks in detail • Sphinx format, mix-networks • Lecture 4 – Bayesian traffic analysis of mix networks
Anonymity in communications • Specialized applications • Electronic voting • Auctions / bidding / stock market • Incident reporting • Witness protection / whistle blowing • Showing anonymous credentials! • General applications • Freedom of speech • Profiling / price discrimination • Spam avoidance • Investigation / market research • Censorship resistance
Anonymity properties (1) • Sender anonymity • Alice sends a message to Bob. Bob cannot know who Alice is. • Receiver anonymity • Alice can send a message to Bob, but cannot find out who Bob is. • Bi-directional anonymity • Alice and Bob can talk to each other, but neither of them know the identity of the other.
Anonymity properties (2) • 3rd party anonymity • Alice and Bob converse and know each other, but no third party can find this out. • Unobservability • Alice and Bob take part in some communication, but no one can tell if they are transmitting or receiving messages.
Pseudonymity properties • Unlinkability • Two messages sent (received) by Alice (Bob) cannot be linked to the same sender (receiver). • Pseudonymity • All actions are linkable to a pseudonym, which is unlinkable to a principal (Alice)
Anonymity through Broadcast Point 1: Do not re-invent this E(Junk) Point 2: Many ways to do broadcast - Ring - Trees It has all been done (Buses) Simple receiver anonymity E(Message) Point 3:Is your anonymity system better than this? E(Junk) Point 4: What are the problems here? Coordination Sender anonymity Latency Bandwidth E(Junk) E(Junk)
Unconditional anonymity • DC-nets • Dining Cryptographers (David Chaum 1985) • Multi-party computation resulting in a message being broadcast anonymously • 2 twists • How to implement DC-nets through broadcast trees • How to achieve reliability? • Communication cost...
The Dining Cryptographers (1) • “Three cryptographers are sitting down to dinner at their favourite three-star restaurant. • Their waiter informs them that arrangements have been made with the maitre d'hotel for the bill to be paid anonymously. • One of the cryptographers might be paying for the dinner, or it might have been NSA (U.S. National Security Agency). • The three cryptographers respect each other's right to make an anonymous payment, but they wonder if NSA is paying.”
The Dining Cryptographers (2) I didn’t I paid Ron Adi Did the NSA pay? I didn’t Wit
The Dining Cryptographers (2) I didn’t ma = 0 Toss coin car br = mr + car + crw I paid mr = 1 Ron ba = ma + car + caw Toss coin crw Adi Combine: B = ba + br + bw = ma + mr +mw = mr (mod 2) Toss coin caw I didn’t mw = 0 bw = mw + crw + caw Wit
DC-nets • Generalise • Many participants • Larger message size • Conceptually many coins in parallel (xor) • Or: use +/- (mod 2|m|) • Arbitrary key (coin) sharing • Graph G: • nodes - participants, • edges - keys shared • What security?
The Dining Cryptographers (3) ma = 0 Toss m coins car br = mr+ car - crw(mod 2m) I want to send mr = message Ron ba = ma - car + caw (mod 2m) Toss m coins crw Adi Combine: B = ba + br + bw = ma + mr +mw = mr (mod 2m) Toss m coins caw mw = 0 bw = mw + crw - caw (mod 2m) Wit
Key sharing graph • Derive coins • cabi= H[Kab, i]for round i • Stream cipher (Kab) • Alice broadcasts • ba = cab + cac + ma C B A Shared key Kab
Key sharing graph – security (1) • If B and C corrupt • Alice broadcasts • ba = cab + cac+ ma • Adversary’s view • ba = cab + cac + ma • No Anonymity C B A Shared key Kab
Key sharing graph – security (2) • Adversary nodes partition the graph into a blue and green sub-graph • Calculate: • Bblue = ∑bj, j is blue • Bgreen = ∑bi, i is green • Substract known keys • Bblue+ Kred-blue = ∑mj • Bgreen + K’red-green = ∑mi • Discover the originating subgraph. • Reduction in anonymity C B A Anonymity set size = 4 (not 11 or 8!)
DC-net implementation • bi broadcast graph • Tree – independent of key sharing graph • = Key sharing graph – No DoS unless split in graph • Communication in 2 phases: • 1) Key sharing (off-line) • 2) Round sync & broadcast Combine: B = bi = mr (mod 2m) Aggregator
DC-net implementation (p2p) • bi broadcast graph • Tree – independent of key sharing graph • = Key sharing graph – No DoS unless split in graph • Communication in 2 phases: • 1) Key sharing (off-line) • 2) Round sync & broadcast (peer-to-peer?) • Ring? • Tree?
DC collisions ma = 0 Toss m coins car br = mr+ car - crw(mod 2m) I want to send mr = message1 Ron ba = ma - car + caw (mod 2m) Toss m coins crw Adi Combine: B = ba + br + bw = ma + mr +mw = collision (mod 2m) Toss m coins caw I want to send mw = message2 bw = mw + crw - caw (mod 2m) Wit
How to resolve collisions? • Ethernet: • detect collisions (redundancy) • Exponential back-off, with random re-transmission • Collisions do not destroy all information • B = ba + br + bw= ma+ mr +mw = = collision(mod m) = message 1 + message2(mod m) • N collisions can be decoded in N transmissions Cool!
Reliability – How? Trick 1: run 2 DC-net rounds in parallel Trick 2: retransmit if greater than average (1, m1) Round 1 (5, m1+m2+m4+m5+m6) (1, m2) Round 2 (3, m1+m2+m4) (0, 0) (2, m5+m6) (1, m4) Round 3 Round 5 (2, m2+m4) (1, m6) (1, m1) (1, m5) (1, m5) (1, m6) Round 4 (1, m4) (1, m2) (0, 0)
Other tricks with DC nets • Reliability DoS resistance! • How to protect a DC-network against disruption? • Solved problem? • DC in a Disco – “trap & trace” • Modern crypto: commitments, secret sharing, and banning • Note the homomorphism: collisions = addition • Can tally votes / histograms through DC-nets
DC-net shortcommings • Security is great! • Full key sharing graph perfect anonymity • Communication cost – BAD • (N broadcasts for each message!) • Naive: O(N2) cost, O(1) Latency • Not so naive: O(N) messages, O(N) latency • Ring structure for broadcast • Expander graph: O(N) messages, O(logN) latency? • Centralized: O(N) messages, O(1) latency • Not practical for large(r) N! • Local wireless communications? • Perfect Anonymity ?
Black-box, long term traffic analysis attacks “In the long run we are all dead” – Keynes
Anonymity so far... • DC-nets – setting • Single message from Alice to Bob, or between a set group • Real communications • Alice has a few friends that she messages often • Interactive stream between Alice and Bob (TCP) • Emails from Alice to Bob (SMTP) • Alice is not always on-line (network churn) • Repetition – patterns -> Attacks
Fundamental limits • Even perfect anonymity systems leak information when participants change • Remember: DC-nets do not scale well • Setting: • N senders / receivers – Alice is one of them • Alice messages a small number of friends: • RA in {Bob, Charlie, Debbie} • Through a MIX / DC-net • Perfect anonymity of size K • Can we infer Alice’s friends?
Setting • Alice sends a single message to one of her friends • Anonymity set size = KEntropy metric EA = log K • Perfect! • rA in RA= {Bob, Charlie, Debbie} Alice Anonymity System K-1 Sendersout of N-1others K-1 Receiversout of Nothers (Model as random receivers)
Many rounds • Observe many rounds in which Alice participates • Rounds in which Alice participates will output a message to her friends! • Infer the set of friends! T1 Alice Alice Alice Alice • rA1 • rA4 • rA2 • rA3 Anonymity System Anonymity System Anonymity System Anonymity System Others Others Others Others Others Others Others Others T2 T3 T4 ... Tt
Hitting set attack (1) • Guess the set of friends of Alice (RA’) • Constraint |RA’| = m • Accept if an element is in the output of each round • Downside: Cost • N receivers, m size – (N choose m) options • Exponential – Bad • Good approximations...
Statistical disclosure attack • Note that the friends of Alice will be in the sets more often than random receivers • How often? Expected number of messages per receiver: • μother = (1 / N) ∙ (K-1) ∙ t • μAlice = (1 / m) ∙ t + μother • Just count the number of messages per receiver when Alice is sending! • μAlice >μother
Comparison: HS and SDA • Parameters: N=20 m=3 K=5 t=45 KA={[0, 13, 19]} Round Receivers SDA SDA_error #Hitting sets 1 [15, 13, 14, 5, 9] [13, 14, 15] 2 685 2 [19, 10, 17, 13, 8] [13, 17, 19] 1 395 3 [0, 7, 0, 13, 5] [0, 5, 13] 1 257 4 [16, 18, 6, 13, 10] [5, 10, 13] 2 203 5 [1, 17, 1, 13, 6] [10, 13, 17] 2 179 6 [18, 15, 17, 13, 17] [13, 17, 18] 2 175 7 [0, 13, 11, 8, 4] [0, 13, 17] 1 171 8 [15, 18, 0, 8, 12] [0, 13, 17] 1 80 9 [15, 18, 15, 19, 14] [13, 15, 18] 2 41 10 [0, 12, 4, 2, 8] [0, 13, 15] 1 16 11 [9, 13, 14, 19, 15] [0, 13, 15] 1 16 12 [13, 6, 2, 16, 0] [0, 13, 15] 1 16 13 [1, 0, 3, 5, 1] [0, 13, 15] 1 4 14 [17, 10, 14, 11, 19] [0, 13, 15] 1 2 15 [12, 14, 17, 13, 0] [0, 13, 17] 1 2 16 [18, 19, 19, 8, 11] [0, 13, 19] 0 1 17 [4, 1, 19, 0, 19] [0, 13, 19] 0 1 18 [0, 6, 1, 18, 3] [0, 13, 19] 0 1 19 [5, 1, 14, 0, 5] [0, 13, 19] 0 1 20 [17, 18, 2, 4, 13] [0, 13, 19] 0 1 21 [8, 10, 1, 18, 13] [0, 13, 19] 0 1 22 [14, 4, 13, 12, 4] [0, 13, 19] 0 1 23 [19, 13, 3, 17, 12] [0, 13, 19] 0 1 24 [8, 18, 0, 10, 18] [0, 13, 18] 1 1 Round 16: Both attacks give correct result SDA: Can give wrong results – need more evidence
HS and SDA (continued) 25 [19, 4, 13, 15, 0] [0, 13, 19] 0 1 26 [13, 0, 17, 13, 12] [0, 13, 19] 0 1 27 [11, 13, 18, 15, 14] [0, 13, 18] 1 1 28 [19, 14, 2, 18, 4] [0, 13, 18] 1 1 29 [13, 14, 12, 0, 2] [0, 13, 18] 1 1 30 [15, 19, 0, 12, 0] [0, 13, 19] 0 1 31 [17, 18, 6, 15, 13] [0, 13, 18] 1 1 32 [10, 9, 15, 7, 13] [0, 13, 18] 1 1 33 [19, 9, 7, 4, 6] [0, 13, 19] 0 1 34 [19, 15, 6, 15, 13] [0, 13, 19] 0 1 35 [8, 19, 14, 13, 18] [0, 13, 19] 0 1 36 [15, 4, 7, 13, 13] [0, 13, 19] 0 1 37 [3, 4, 16, 13, 4] [0, 13, 19] 0 1 38 [15, 13, 19, 15, 12] [0, 13, 19] 0 1 39 [2, 0, 0, 17, 0] [0, 13, 19] 0 1 40 [6, 17, 9, 4, 13] [0, 13, 19] 0 1 41 [8, 17, 13, 0, 17] [0, 13, 19] 0 1 42 [7, 15, 7, 19, 14] [0, 13, 19] 0 1 43 [13, 0, 17, 3, 16] [0, 13, 19] 0 1 44 [7, 3, 16, 19, 5] [0, 13, 19] 0 1 45 [13, 0, 16, 13, 6] [0, 13, 19] 0 1 SDA: Can give wrong results – need more evidence
Disclosure attack family • Counter-intuitive • The larger N the easiest the attack • Hitting-set attacks • More accurate, need less information • Slower to implement • Sensitive to Model • E.g. Alice sends dummy messages with probability p. • Statistical disclosure attacks • Need more data • Very efficient to implement (vectorised) – Faster partial results • Can be extended to more complex models (pool mix, replies, ...) • Bayesian modelling of the problem • A systematic approach to traffic analysis
Traffic analysis is probabilistic inference (1) Profile Alice Alice • Full generative model: • Pick a profile for Alice Alice and a Profile for others Other from prior distributions (once!) • For each round: • Pick a multi-set of senders uniformly from the multi-sets of size K of possible senders • Pick a permutation M1 from kk uniformly at random • For each sender pick a receiver according to the probability in their respective profiles T1 Anonymity System • rA1 Alice Profile Other Others rother Other Matching M1 M Observation O1 = Senders ( Uniform) + receivers (as above)
Traffic analysis is probabilistic inference (2) Profile Alice Alice • Full generative model: • Pick a profile for Alice Alice and a Profile for others Other from prior distribution (once!) • For each round: • Pick a multi-set of senders in Oi uniformly from the multi-sets of size K of possible senders • Pick a permutation M1 from kk uniformly at random • For each sender pick a receiver rAlice and rother according to the probability in their respective profiles T1 Anonymity System • rA1 Alice Profile Other Others rother Other Matching M1 M Observation O1 = Senders ( Uniform) + receivers (as above)
Traffic analysis is probabilistic inference (3) Profile Alice Alice • The Disclosure attack inference problem: • “Given all known and observed variables (priors , M, observationsOi) determine the distribution over the hidden random variables (profiles Alice , Otherand matching Mi)” • Using information from all rounds, assuming profiles are stable • How? (a) maths = Bayesian inference (b) computation = Gibbs sampling T1 Anonymity System • rA1 Alice Profile Other Others rother Other Matching M1 M Observation O1 = Senders ( Uniform) + receivers (as above)
Lets apply probability theory • What are profiles? • Keep it simple: a multinomial distribution over all users. • E.g. Alice= [Bob: 0.4, Charlie: 0.3, Debbie: 0.3]Other= [Uniform(1/n-1)] • Prior distribution on multinomials: Dirichlet Distribution, i.e. samples from Dirichlet distribution are multinomial distributions • Other profile models are also possible: restrictions on number of contacts, cliques, mutual friends, … • Other quantities have known conditional probabilities. • What are those?
Traffic analysis is probabilistic inference (4) Profile Alice Alice • Full generative model: • Pick a profile for Alice Alice and a Profile for others Other from prior distribution (once!) • For each round: • Pick a multi-set of senders in Oi uniformly from the multi-sets of size K of possible senders • Pick a permutation M1 from kk uniformly at random • For each sender pick a receiver rAlice and rother according to the probability in their respective profiles • Pr[rA1| Alice,M1,O1] T1 Anonymity System Pr[Alice| ] • rA1 Alice Profile Other Others rother Other Pr[Other | ] • Pr[rother | other,M1,O1] Pr[O1 | K] Matching M1 M Pr[M1 | M] Observation O1 = Senders ( Uniform) + receivers (as above)
Re-arrange probabilities • What we have:Pr[Alice, Other,Mi , riA,riother| Oi, M, , K] == Pr[riA| Alice,M1,Oi] x Pr[riother| other,M1,Oi] x Pr[Mi| M] x Pr[Alice| ] x Pr[Other | ] • What we really want:Pr[Alice , Other, Mi |riA,riother, Oi, M, , K]
Revision: Bayes theorem • Pr[A=a, B=b] = Pr[A=a | B=b] x Pr[B = b] =Pr[B=b | A=a] x Pr[A = a] • Pr[A=a | B=b] = Pr[B=b | A=a ] x Pr[A=a ] / Pr[B = b] • Pr[B=b] = (A=a) Pr[A=a, B=b] = (A=a) Pr[A=a | B=b] x Pr[B = b] Derivation Inverse probability Meaning Forward probability Prior Normalising factor Normalising factor Large spaces = no direct computation
Apply Bayes A B • Pr[Alice , Other, Mi| riA,riother, Oi, M, , K] =Pr[ riA,riother| Alice , Other, MiOi, M, , K] xPr[Alice , Other, Mi| Oi, M, , K]Pr[ riA,riother| Alice , Other, MiOi, M, , K] xPr[Alice , Other, Mi | Oi, M, , K] A B A All riA,riother Large spaces = no direct computation
The good, the bad and the ugly • Good: can compute the probability sought up to a multiplicative constant • Pr[Alice , Other, Mi | riA,riother, Oi, M, , K] Pr[ riA,riother| Alice , Other, MiOi, M, , K] xPr[Alice , Other, Mi | Oi, M, , K] • Bad: Forget about computing the constant – in general • Ugly: Use computational methods to estimate quantities of interest for traffic analysis
Quantities of Interest • Full hidden state: Alice , Other, Mi • Marginal probabilities & distributions • Pr[Alice->Bob] – Are Alice and Bob friends? • Mx – Who is talking to whom at round x? • How can we get information about those? • Without computing the exact probability!
Revision: Sampling • Consider a distribution Pr[A=a] • Direct computation of mean: • E[A] = aPr[A=a] x a • Computation of mean through sampling: • Samples A1, A2, …, An A • E[A] iAi / n
Revision: Sampling (2) • Consider a distribution Pr[A=a] • Computation of mean through sampling: • Samples A1, A2, …, An A • E[A] iAi / n • More: • Estimates of variance • Estimate of percentiles – confidence intervals(0.5%, 2.5%, 50%, 97.5%, 99.5%)
Traffic analysis is probabilistic inference (5) Profile Alice Alice • The Disclosure attack inference problem: • “Given all known and observed variables (priors , M, observationsOi) determine the distribution over the hidden random variables (profiles Alice , Otherand matching Mi)” • Using information from all rounds, assuming profiles are stable • How? (a) maths = Bayesian inference (b) computation = Gibbs sampling T1 Anonymity System • rA1 Alice Profile Other Others rother Other Matching M1 M Observation O1 = Senders ( Uniform) + receivers (as above) Sample