
Information Theory and the Security of Binary Data Perturbation


Presentation Transcript


  1. Information Theory and the Security of Binary Data Perturbation Poorvi Vora, Dept. of Computer Science, George Washington University

  2. Statistical Database • Database A: • Q = {q1, q2, ..., qi, ...} (queryable bits) and • S = {s1, s2, ..., si, ...} (sensitive bits) • Data collector B can ask for: Xi = fi(q1, q2, q3, ...), qj ∈ Q Poorvi Vora/CS/GWU

  3. The statistical database security problem • Can query multiple Xi = fi(q1, q2, q3, ...), qj ∈ Q, and simultaneously solve the resulting system of equations • (Perfect ZK protocols do not leak additional information about the xi, but the Ai are revealed; thus this is not a traditional cryptographic problem)

  4. Random Data Perturbation (RDP) • Used in the public health community for twenty-odd years; can be used together with cryptographic techniques • If xi is perturbed each time, the simultaneous equations are inconsistent: fi(q1 + ε1i, q2 + ε2i, q3 + ε3i, ...) = Xi + εi • Security and attack characterization an open problem for 20+ years, despite many attempts (Denning, Adams, Duncan, ..., Landers)

  5. RDP • [Channel diagram: each binary value (e.g. "HIV? Yes", or salary thresholded at 25,000/40,000 with distributions F(x), G(x)) is reported truthfully (0→0, 1→1) with probability q and flipped with probability p = 1 − q] • Statistics over many records remain accurate
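The perturbation step on slide 5 can be sketched as a minimal simulation. The function names, the truth probability q = 0.75, and the 30% trait prevalence are illustrative choices, not from the talk:

```python
import random

def perturb(bit, q):
    """Report the true bit with probability q; flip it with probability p = 1 - q."""
    return bit if random.random() < q else 1 - bit

def debiased_mean(responses, q):
    """Recover the true proportion t from perturbed responses:
    E[response] = (1 - q) + (2q - 1) * t."""
    mean = sum(responses) / len(responses)
    return (mean - (1 - q)) / (2 * q - 1)

random.seed(0)
truth = [random.random() < 0.3 for _ in range(100_000)]   # 30% have the trait
q = 0.75
responses = [perturb(int(b), q) for b in truth]
print(debiased_mean(responses, q))   # close to the true proportion 0.3
```

Each individual response is deniable, yet the debiased estimate over many records is accurate, which is the point of the slide.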

  6. Known Security Property of RDP • m repeated queries, probability of error δm: m → ∞ ⇒ δm → 0 • Chernoff bound: m = [ln(2/δ)]/[0.38ε²] ⇒ δm < δ, where the probability of a lie is 0.5 − ε
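The repeated-query (repetition-code) attack behind slide 6's Chernoff bound can be simulated. A sketch under the slide's parameters (lie probability 0.5 − ε), with illustrative values δ = 0.01 and ε = 0.1:

```python
import math, random

def chernoff_queries(delta, eps):
    """Slide 6: number of repeated queries so that a majority vote errs
    with probability below delta when each answer lies with prob 0.5 - eps."""
    return math.ceil(math.log(2 / delta) / (0.38 * eps ** 2))

def majority_attack(true_bit, eps, m, rng):
    """Ask the same query m times; each response is truthful with prob 0.5 + eps."""
    votes = sum(true_bit if rng.random() > 0.5 - eps else 1 - true_bit
                for _ in range(m))
    return int(votes > m / 2)

rng = random.Random(1)
m = chernoff_queries(delta=0.01, eps=0.1)
errors = sum(majority_attack(1, 0.1, m, rng) != 1 for _ in range(1000))
print(m, errors / 1000)   # observed error rate well below delta
```

Note how m grows with ln(2/δ): driving the error down costs more queries, which is the contrast drawn later in the corollary on slide 18.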

  7. A simple inference attack • Query 1: Female? • Query 2: Over 40? • Query 3: Losing Calcium? • Really asking about age and gender • How does one characterize all such attacks? What can one say about security with respect to such attacks?

  8. Our definitions • Definition: An inference attack is a set of queries x not independent of the set of sensitive bits S, i.e. I(S; x) ≠ 0 • Definition: A small-error inference attack is one in which lim m→∞ δm = 0 • Definition: The query complexity per bit of a query sequence x of length m, as a means of distinguishing among M possible values of x, is μm = m/log2M

  9. Recall attack example • Query 1: Female? • Query 2: Over 40? • Query 3: Losing Calcium? • Query 3 checks the answers to Queries 1 and 2; a parity-check bit of sorts, but not quite • If Queries 1 and 2 are independent, μ = 3/2 • m → ∞ ⇒ δm → 0?

  10. Our analogy (ISIT '03) • All attacks are communication over a channel • When attacks are codes: x = f(S) • What B queries is a codeword bit • What B receives is the transmitted codeword, which he decodes

  11. Shannon's theorems apply when x = f(S) and μ is constant (ISIT '03) • Assuming x = f(S) (including adaptive, related queries): queries are channel codes • μ constant: reliable transmission • Result: δm → 0 ⟺ μ ≥ 1/C • Above this bound, δm → 0 exponentially; below it, δm increases exponentially
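For the binary channel of slide 5 with lie probability 0.5 − ε, the threshold 1/C of slide 11 can be evaluated directly. A sketch, using the standard binary-symmetric-channel capacity C = 1 − H2(0.5 − ε) in bits:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def queries_per_bit_threshold(eps):
    """1/C for a binary symmetric channel whose answers lie with prob 0.5 - eps.
    Below this many queries per sensitive bit the error cannot be driven to
    zero; above it, small-error attacks exist (slides 11, 13, 15)."""
    C = 1 - h2(0.5 - eps)
    return 1 / C

for eps in (0.05, 0.1, 0.2):
    print(eps, queries_per_bit_threshold(eps))
```

For small ε this threshold is close to ln 2/(2ε²), the expression that appears in the corollary on slide 18.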

  12. What about the general zero-error inference attack? • Not all inference attacks are codes, i.e. x ≠ f(S) • μ is not necessarily kept constant as m → ∞, i.e. transmission is not necessarily reliable

  13. Thm. 1 • lim m→∞ δm = 0 ⟹ ∃ {νm}∞m=1 s.t. μm ≥ νm for all m, and lim m→∞ νm = 1/C • Proof modifies the converse of Shannon's proof of the channel coding theorem

  14. The Proof • log2M = H(sm) = H(sm|ym) + I(sm; ym) • ≤ 1 + Em log2M + I(sm; ym) (Fano) • ≤ 1 + Em log2M + mC • ⇒ μm = m/log2M ≥ (1 − Em)/(1/m + C) = νm • lim m→∞ νm = 1/C
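The converse bound derived on slide 14 can be checked numerically. A sketch; the capacity value 0.029 (a BSC with lie probability 0.4) and the (m, error) pairs are illustrative:

```python
def complexity_lower_bound(m, error, C):
    """Slide 14's converse: mu_m = m/log2(M) >= (1 - E_m)/(1/m + C).
    As m grows and the decoding error E_m -> 0, the bound tends to 1/C."""
    return (1 - error) / (1 / m + C)

C = 0.029   # illustrative capacity, e.g. a BSC with lie probability 0.4
for m, err in [(10, 0.3), (100, 0.1), (10_000, 0.01)]:
    print(m, complexity_lower_bound(m, err, C))
print(1 / C)   # the limiting value
```

The bound climbs toward 1/C as m grows and the error shrinks, matching the statement of Theorem 1.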

  15. Thm. 2 • Small-error attacks with constant μ > 1/C exist • Proof: follows from the channel coding theorem

  16. Thm. 3 • For data of entropy H, a stationary record sequence, Nr records, and m the number of queries per record: lim m→∞ δm = 0 ⟹ ∃ {νm}∞m=1 s.t. μm ≥ νm for all m, and lim m→∞ νm = H/C • Proof: modification of the source-channel coding theorem

  17. Proof • Given Theorem 1, smaller lengths can be shown to violate Shannon's source coding theorem when the data is stationary

  18. Corollary • m ≥ ln 2/(2ε²) when the probability of a lie is 0.5 − ε, for any probability of error • Unlike the Chernoff bound, this does not increase with a smaller probability of error • This is the improvement bought over the repetition code
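The contrast the corollary draws can be made concrete by tabulating both bounds. A sketch; the δ and ε values are illustrative:

```python
import math

def chernoff_bound(delta, eps):
    """Slide 6: repetition-code (repeated-query) cost for error below delta."""
    return math.log(2 / delta) / (0.38 * eps ** 2)

def capacity_bound(eps):
    """Slide 18: information-theoretic floor ln2/(2 eps^2) queries per bit,
    independent of the target error probability."""
    return math.log(2) / (2 * eps ** 2)

eps = 0.1
for delta in (0.1, 0.01, 0.001):
    print(delta, chernoff_bound(delta, eps), capacity_bound(eps))
```

The repetition-code cost grows as δ shrinks, while the information-theoretic floor stays fixed; the gap is the improvement available over the repetition code.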

  19. Where to? • Block ciphers as channels for properties of the key (Filiol, ePrint 2003) • Attacks on stream ciphers as codes over key bits (Johansson et al., Golić et al., Filiol et al.) • It appears there is a framework (Vora, working documents): all statistical attacks as channel communication; efficient attacks as codes; related-input (key, message) attacks as concatenated codes¹; Wagner's cryptanalytic model (FSE '03) to determine inner codes • Do related-key attacks provide an improvement in efficiency over repeated-key attacks? • ¹Filiol shows the repeated-key attack on block ciphers is a concatenated code with the repetition code as the outer code

  20. Crowds • Also traffic analysis, e.g. Crowds (Reiter and Rubin, Lucent and AT&T) • N nodes, C colluding, pf the probability of forwarding • At node i+1, the probability that node i originated the message (probability of truth): 1 − pf(N−C−1)/N • Probability of any other non-collaborating node originating the message: pf/N • Observable information changes the pdf on the data of interest: the originator of the message

  21. The Crowds protocol as a simplex channel • Φ: X = set of originator nodes {0, ..., N−3} → Y = set of predecessor nodes {0, ..., N−3}, Φ(X) = Y • Assumption: all senders equally likely • P(Y = j | X = i) = pij = pf/N, i ≠ j; = 1 − pf(N−2)/N, i = j
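The transition probabilities on slide 21 can be assembled into a channel matrix. A sketch assuming the input/output alphabet is the N − 1 honest nodes under one colluder (the transcript's index set is garbled); with that choice every row sums to 1:

```python
def crowds_channel(N, p_f):
    """Channel matrix for Crowds with one colluding node (slide 21).
    Alphabet: the N-1 honest nodes (an assumption; the transcript's index
    set is garbled). P(Y=j | X=i) = p_f/N for j != i,
    and 1 - p_f*(N-2)/N for j == i."""
    n = N - 1
    return [[1 - p_f * (N - 2) / N if i == j else p_f / N
             for j in range(n)] for i in range(n)]

P = crowds_channel(N=10, p_f=0.8)
print([round(sum(row), 10) for row in P])   # every row sums to 1
```

The diagonal entry dominates each row: the observed predecessor is the single most likely originator, which is exactly the leakage the channel view captures.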

  22. The Crowds protocol • C = 1 + [(N−2)pf/N] log[1 − (N−2)pf/N] + (pf/N) log[pf/N] • = 2 log 2/N if pf = 1 • ≈ 2 log 2/N + (N−1)ε² if pf = 1 − ε • Average path length = (1 − ε)/ε = O(1/ε)
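The closed form on slide 22 is only partially recoverable from the transcript, but since the channel of slide 21 is symmetric, its capacity can also be evaluated numerically as C = log2(n) − H(row). A sketch under the same n = N − 1 alphabet assumption as above:

```python
import math

def crowds_capacity_bits(N, p_f):
    """Capacity in bits of the symmetric Crowds channel (slide 21), taking
    the N-1 honest nodes as the input/output alphabet (an assumption).
    For a symmetric channel, C = log2(n) - H(any row)."""
    n = N - 1
    d = 1 - p_f * (N - 2) / N      # P(predecessor = originator)
    o = p_f / N                    # each of the N-2 other honest nodes
    H_row = -d * math.log2(d) - (N - 2) * o * math.log2(o)
    return math.log2(n) - H_row

for N in (10, 100, 1000):
    print(N, crowds_capacity_bits(N, 0.8))
```

Capacity falls as pf grows toward 1: more forwarding means a noisier channel and hence better anonymity, at the cost of the longer O(1/ε) paths noted on the slide.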

  23. The replay attack on Crowds • Repetition code ≡ resending the message along a different (randomly chosen) route • How about attacks corresponding to other codes?
