Information Theory and the Security of Binary Data Perturbation
Poorvi Vora
Dept. of Computer Science, George Washington University
Statistical Database
• Database A:
  • $Q = \{q_1, q_2, \ldots, q_i, \ldots\}$ (queryable bits) and
  • $S = \{s_1, s_2, \ldots, s_i, \ldots\}$ (sensitive bits).
• Data collector B can ask for: $f_i(q_1, q_2, q_3, \ldots) = X_i$, with $q_j \in Q$
Poorvi Vora/CS/GWU
The statistical database security problem
• B can query multiple $f_i(q_1, q_2, q_3, \ldots) = X_i$, with $q_j \in Q$, and solve them simultaneously
• (Perfect zero-knowledge (zk) protocols do not leak additional information about $x_i$, but the $A_i$ are revealed; thus this is not a traditional cryptographic problem)
Random Data Perturbation (RDP)
Used in the public health community for twenty-odd years; can be used together with cryptographic techniques
• If $x_i$ is perturbed each time, the simultaneous equations are inconsistent: $f_i(q_1 + \delta_{1i}, q_2 + \delta_{2i}, q_3 + \delta_{3i}, \ldots) = X_i + \Delta_i$
• Security and attack characterization has been an open problem for 20+ years, despite many attempts (Denning, Adams, Duncan, … Landers).
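RDP
[Figure: two perturbation examples. Continuous: a salary (values shown: 25,000 and 40,000) with additive noise between -25,000 and 25,000, and curves labeled F(x) and G(x). Binary: the answer to "HIV?" (Yes) passes through a channel on {0, 1} whose transitions are labeled p = 1-q and q.]
Statistics over many records are accurate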
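The binary case of RDP can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the author's implementation: each stored bit is flipped independently with probability q and reported truthfully with probability 1 - q. The names `perturb` and `estimate_fraction`, the 10% prevalence, and q = 0.3 are illustrative choices.

```python
import random

def perturb(bit, q, rng):
    """Report the bit flipped with probability q, truthfully otherwise."""
    return bit ^ 1 if rng.random() < q else bit

def estimate_fraction(reported, q):
    """Estimate the true fraction of ones from perturbed bits.

    E[reported mean] = q + pi * (1 - 2q), so pi = (mean - q) / (1 - 2q).
    Requires q != 0.5.
    """
    mean = sum(reported) / len(reported)
    return (mean - q) / (1 - 2 * q)

rng = random.Random(0)
q = 0.3  # assumed flip probability
true_bits = [1 if rng.random() < 0.1 else 0 for _ in range(200_000)]
reported = [perturb(b, q, rng) for b in true_bits]

true_fraction = sum(true_bits) / len(true_bits)
estimated_fraction = estimate_fraction(reported, q)
print(f"true fraction:      {true_fraction:.4f}")
print(f"estimated fraction: {estimated_fraction:.4f}")
```

Any single reported bit is wrong 30% of the time in this setup, yet the aggregate estimate stays close to the true fraction, which is the sense in which statistics over many records remain accurate.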
Known Security Property of RDP
• $m$ repeated queries
• $\omega_m$: probability of error
• $\omega_m \to 0$ as $m \to \infty$
• Chernoff bound: $m = \ln(2/\delta) / (0.38\,\epsilon^2) \Rightarrow \omega_m < \delta$
• Probability of lie $= 0.5 - \epsilon$
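To make the bound concrete, the sketch below reads the expression on this slide as m = ln(2/delta) / (0.38 eps^2), with delta the target error probability and 0.5 - eps the probability of a lie. That reading of the symbols is an assumption. It then simulates an attacker who repeats one query m times and takes a majority vote, assuming each answer is perturbed independently. The function names and the values delta = 0.05, eps = 0.1 are illustrative.

```python
import math
import random

def chernoff_queries(delta, eps):
    """Repeated queries m = ln(2/delta) / (0.38 * eps^2), rounded up."""
    return math.ceil(math.log(2 / delta) / (0.38 * eps ** 2))

def majority_error_rate(m, eps, trials, rng):
    """Empirical probability that a majority vote over m answers is wrong.

    Each answer is a lie independently with probability 0.5 - eps
    (assumed model of the perturbation).
    """
    lie_prob = 0.5 - eps
    wrong = 0
    for _ in range(trials):
        lies = sum(1 for _ in range(m) if rng.random() < lie_prob)
        if 2 * lies >= m:  # ties are counted as errors (pessimistic)
            wrong += 1
    return wrong / trials

delta, eps = 0.05, 0.1
m = chernoff_queries(delta, eps)
error_rate = majority_error_rate(m, eps, trials=500, rng=random.Random(1))
print(f"m = {m}, empirical error rate = {error_rate:.4f}")
```

The number of repetitions grows as delta shrinks, which is the behavior the later corollary contrasts with.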
A simple inference attack
• Query 1: Female?
• Query 2: Over 40?
• Query 3: Losing Calcium?
Really asking about age and gender
How does one characterize all such attacks?
What can one say about security with respect to such attacks?
Our definitions
Definition: An inference attack is a set of queries $x$ not independent of the set of sensitive bits $S$, i.e. $I(S; x) \neq 0$.
Definition: A small error inference attack is one in which $\lim_{m \to \infty} \omega_m = 0$, where $\omega_m$ is the probability of error after $m$ queries.
Definition: The query complexity per bit of a query sequence $x$ of length $m$, as a means of distinguishing among $M$ possible values of $x$, is $\eta_m = m / \log_2 M$.
Recall attack example
• Query 1: Female?
• Query 2: Over 40?
• Query 3: Losing Calcium?
Query 3 checks the answers to Queries 1 and 2
It is a parity-check bit of sorts, but not quite
If Queries 1 and 2 are independent, the query complexity per bit is $\eta = 3/2$
Does the probability of error $\omega_m \to 0$ as $m \to \infty$?
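The value 3/2 follows directly from the definition of query complexity per bit, m / log2(M). A minimal sketch, assuming gender and age bracket are two independent bits so that M = 4 values are being distinguished by m = 3 queries (the function name is illustrative):

```python
import math

def query_complexity_per_bit(m, M):
    """Queries per bit: m queries used to distinguish among M values."""
    return m / math.log2(M)

# Three queries (Female? Over 40? Losing calcium?) for two target bits.
eta = query_complexity_per_bit(m=3, M=4)
print(eta)
```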
Our analogy (ISIT '03)
• All attacks are communication over a channel
• When attacks are codes: $x = f(S)$
• What B queries is a codeword bit
• What B receives is the transmitted codeword, which he decodes
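Shannon's theorems apply when $x = f(S)$ and $\eta$ is constant (ISIT '03)
Assuming
• $x = f(S)$ (including adaptive, related queries): queries are channel codes
• query complexity per bit $\eta$ constant: reliable transmission
Result: $\omega_m \to 0 \Rightarrow \eta \ge 1/C$, where $\omega_m$ is the probability of error and $C$ the channel capacity
Above this bound, $\omega_m \to 0$ exponentially; below it, $\omega_m$ increases exponentially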
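The bound 1/C can be evaluated numerically once a channel model is fixed. The sketch below assumes the perturbation acts as a binary symmetric channel whose crossover probability is the probability of a lie; that modeling choice and the function names are assumptions made for illustration, not taken from the slides.

```python
import math

def binary_entropy(p):
    """Binary entropy in bits; defined as 0 at p = 0 and p = 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(lie_prob):
    """Capacity in bits per query of a binary symmetric channel."""
    return 1.0 - binary_entropy(lie_prob)

def min_queries_per_bit(lie_prob):
    """Lower bound 1/C on queries per bit; infinite when C = 0."""
    c = bsc_capacity(lie_prob)
    return math.inf if c == 0.0 else 1.0 / c

for lie_prob in (0.1, 0.25, 0.4, 0.45):
    print(f"lie prob {lie_prob:.2f}: C = {bsc_capacity(lie_prob):.4f}, "
          f"1/C = {min_queries_per_bit(lie_prob):.1f}")
```

As the lie probability approaches 0.5 the capacity approaches zero, so the number of queries needed per bit grows without bound.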
What about the general zero-error inference attack?
• Not all inference attacks are codes, i.e. $x \neq f(S)$ in general.
• The query complexity per bit $\eta$ is not necessarily kept constant as $m \to \infty$, i.e. transmission is not necessarily reliable.
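Thm. 1
$\lim_{m \to \infty} \omega_m = 0 \Rightarrow \exists\, \{\eta'_m\}_{m=1}^{\infty}$ s.t. $\eta_m \ge \eta'_m \ \forall m$; $\lim_{m \to \infty} \eta'_m = 1/C$
Here $\omega_m$ is the probability of error, $\eta_m$ the query complexity per bit, and $C$ the channel capacity.
Proof modifies Shannon's proof of the converse of the channel coding theorem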
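The Proof
$\log_2 M = H(s^m) = H(s^m \mid y^m) + I(s^m; y^m)$
$\le 1 + E_m \log_2 M + I(s^m; y^m)$ (Fano's inequality, with $E_m$ the probability of error)
$\le 1 + E_m \log_2 M + mC$
$\Rightarrow \eta_m = m / \log_2 M \ge (1 - E_m) / (1/m + C) = \eta'_m$
$\lim_{m \to \infty} \eta'_m = 1/C$, since $E_m \to 0$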
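The limit in the last step can be checked numerically. The sketch below evaluates the lower bound (1 - E_m) / (1/m + C) for growing m. Both the capacity value 0.029 and the error sequence E_m = 1/m are hypothetical choices; any error sequence tending to zero gives the same limit.

```python
def eta_lower_bound(m, error_prob, capacity):
    """Lower bound (1 - E_m) / (1/m + C) on the query complexity per bit."""
    return (1.0 - error_prob) / (1.0 / m + capacity)

capacity = 0.029  # hypothetical channel capacity, bits per query
lengths = (10, 100, 1000, 10**4, 10**6)
# Hypothetical small-error attack with E_m = 1/m.
bounds = [eta_lower_bound(m, 1.0 / m, capacity) for m in lengths]
limit = 1.0 / capacity

for m, b in zip(lengths, bounds):
    print(f"m = {m:>7}: bound = {b:.4f}")
print(f"1/C = {limit:.4f}")
```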
Thm. 2
Small error attacks with constant $\eta > 1/C$ exist.
Proof: Follows from the channel coding theorem
Thm. 3
For data of entropy $H$, a stationary record sequence, $N_r$ records, and $\eta_m$ the number of queries per record,
$\lim_{m \to \infty} \omega_m = 0 \Rightarrow \exists\, \{\eta'_m\}_{m=1}^{\infty}$ s.t. $\eta_m \ge \eta'_m \ \forall m$; $\lim_{m \to \infty} \eta'_m = H/C$
Proof: Modification of the source-channel coding theorem
Proof
Given Theorem 1, smaller lengths can be shown to violate Shannon's source coding theorem when the data is stationary.
Corollary
$\eta_m \approx \ln 2 / (2\epsilon^2)$ when $p = 0.5 - \epsilon$, for any probability of error
• Different from the Chernoff bound: does not increase with a smaller probability of error
• This is the improvement bought over the repetition code
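A numeric comparison illustrates the corollary. The sketch assumes a binary symmetric channel with crossover probability 0.5 - eps, reads the corollary's expression as ln 2 / (2 eps^2), and reads the earlier Chernoff expression as ln(2/delta) / (0.38 eps^2). These readings, the function names, and eps = 0.1 are assumptions made for illustration.

```python
import math

def exact_queries_per_bit(eps):
    """1/C for a binary symmetric channel with crossover 0.5 - eps."""
    p = 0.5 - eps
    entropy = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1.0 / (1.0 - entropy)

def approx_queries_per_bit(eps):
    """Small-eps approximation ln 2 / (2 eps^2)."""
    return math.log(2) / (2 * eps ** 2)

def chernoff_queries(delta, eps):
    """Repetition-code cost ln(2/delta) / (0.38 eps^2) for error below delta."""
    return math.log(2 / delta) / (0.38 * eps ** 2)

eps = 0.1
exact = exact_queries_per_bit(eps)
approx = approx_queries_per_bit(eps)
print(f"1/C exact:         {exact:.2f}")
print(f"ln2 / (2 eps^2):   {approx:.2f}")
for delta in (0.05, 0.001, 1e-6):
    print(f"repetition, delta = {delta:g}: {chernoff_queries(delta, eps):.0f}")
```

The coded figure does not depend on delta, while the repetition figure keeps growing as delta shrinks.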
Where to?
• Block ciphers as channels for properties of the key (Filiol, ePrint 2003)
• Attacks on stream ciphers as codes over key bits (Johansson et al., Golic et al., Filiol et al.)
• It appears there is a framework (Vora, working documents):
  • all statistical attacks as channel communication
  • efficient attacks as codes
  • related-input (key, message) attacks as concatenated codes [1]
  • Wagner's Cryptanalytic Model (FSE '03) to determine inner codes
Do related-key attacks provide an improvement in efficiency over repeated-key attacks?
[1] Filiol shows the repeated-key attack on block ciphers as a concatenated code with the outer code as the repetition code
Also traffic analysis, e.g. Crowds (Reiter and Rubin; Lucent and AT&T)
• $N$ nodes; $C$ colluding
• $p_f$: probability of forwarding
• At node $i+1$:
  • Probability that node $i$ originated the message (probability of truth): $1 - p_f (N - C - 1)/N$
  • Probability of any other non-collaborating node originating the message: $p_f / N$
Observable information changes the pdf on the data of interest: the originator of the message
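The two probabilities above define a distribution over the non-colluding nodes, which can be checked to sum to one. A minimal sketch: the function name, the node indexing 0..N-C-1, and the example values N = 20, C = 1, p_f = 0.75 are illustrative assumptions.

```python
def predecessor_distribution(n, c, p_f, originator):
    """Distribution of the observed predecessor over the n - c honest nodes.

    The true originator is seen with probability 1 - p_f * (n - c - 1) / n;
    every other honest node is seen with probability p_f / n.
    """
    truth = 1 - p_f * (n - c - 1) / n
    other = p_f / n
    return [truth if j == originator else other for j in range(n - c)]

dist = predecessor_distribution(n=20, c=1, p_f=0.75, originator=3)
print(f"P(predecessor is originator) = {dist[3]:.4f}")
print(f"P(predecessor is a given other node) = {dist[0]:.4f}")
print(f"total probability = {sum(dist):.4f}")
```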
The Crowds protocol as a simplex channel
X → Y
$\Phi$: $X$ = set of originator nodes $\{0, \ldots, N-3\}$ → $Y$ = set of predecessor nodes $\{0, \ldots, N-3\}$; $\Phi(X) = Y$
Assumption: all senders equally likely
$P(Y = j \mid X = i) = p_{ij} = p_f / N$ for $i \neq j$; $= 1 - p_f (N-2)/N$ for $i = j$
The Crowds protocol
X → Y
$C = 1 + (N-2)p_f/N \, \log[1 - (N-2)p_f/N] + p_f/N \, \log[p_f/N]$
$= 2\log 2 / N$ if $p_f = 1$
$= 2\log 2 / N + (N-1)\epsilon^2$ if $p_f = 1 - \epsilon$
Average path length $= (1 - \epsilon)/\epsilon = O(1/\epsilon)$
The replay attack on Crowds
• Repetition code: resending the message along a different (randomly chosen) route
• How about attacks corresponding to other codes?
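The replay attack viewed as a repetition code can be sketched with a small simulation. Assumptions, none of which come from the slides: each replay gives the colluding node an independent observation of the predecessor, drawn with probability 1 - p_f (N - C - 1)/N for the true originator and p_f / N for each other honest node, and the attacker guesses the most frequently observed predecessor. All names and parameter values are illustrative.

```python
import random
from collections import Counter

def observe_predecessor(n, c, p_f, originator, rng):
    """One observation of the predecessor under the assumed channel model."""
    truth = 1 - p_f * (n - c - 1) / n
    if rng.random() < truth:
        return originator
    # Otherwise every other honest node is equally likely.
    others = [j for j in range(n - c) if j != originator]
    return rng.choice(others)

def replay_attack(n, c, p_f, originator, replays, rng):
    """Guess the originator as the most frequently observed predecessor."""
    seen = Counter(observe_predecessor(n, c, p_f, originator, rng)
                   for _ in range(replays))
    return seen.most_common(1)[0][0]

def success_rate(n, c, p_f, replays, trials, rng):
    """Fraction of trials in which the attacker's guess is correct."""
    hits = 0
    for _ in range(trials):
        originator = rng.randrange(n - c)
        if replay_attack(n, c, p_f, originator, replays, rng) == originator:
            hits += 1
    return hits / trials

rng = random.Random(7)
single = success_rate(n=20, c=1, p_f=0.75, replays=1, trials=400, rng=rng)
repeated = success_rate(n=20, c=1, p_f=0.75, replays=50, trials=400, rng=rng)
print(f"success with 1 observation:   {single:.3f}")
print(f"success with 50 replays:      {repeated:.3f}")
```

A single observation identifies the originator only about as often as the probability of truth, while repeated observations make the guess far more reliable, at the cost of many replays per message.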