Explore the use of Random Data Perturbation (RDP) to protect sensitive binary data in statistical databases, and its relationship to cryptographic techniques. Inference attacks on RDP are modeled as communication over a noisy channel, so Shannon's theorems bound the number of queries an attacker needs. Topics include small error attacks, query complexity per bit, the use of RDP in public health, and possible extensions to related-key attacks on ciphers and to traffic analysis of the Crowds protocol.
Information Theory and the Security of Binary Data Perturbation
Poorvi Vora
Dept. of Computer Science, George Washington University
Statistical Database
• Database A: $Q = \{q_1, q_2, \ldots, q_i, \ldots\}$ (queryable bits) and $S = \{s_1, s_2, \ldots, s_i, \ldots\}$ (sensitive bits).
• Data collector B can ask for: $f_i(q_1, q_2, q_3, \ldots) = X_i$, with $q_j \in Q$.
Poorvi Vora/CS/GWU
The statistical database security problem
• B can query multiple $f_i(q_1, q_2, q_3, \ldots) = X_i$, with $q_j \in Q$, and solve the resulting equations simultaneously.
• (Perfect zero-knowledge protocols do not leak additional information about $x_i$, but $A_i$ are revealed; thus this is not a traditional cryptographic problem.)
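To make the "simultaneously solve" step concrete, here is a minimal sketch in Python. It assumes three hypothetical sum queries over three queryable bits and answers that are not perturbed; the function name `consistent_assignments` and the query set are illustrative, not taken from the talk.

```python
from itertools import product

def consistent_assignments(queries, answers, n_bits):
    """Return every bit vector whose query results match the observed answers."""
    return [bits for bits in product((0, 1), repeat=n_bits)
            if all(f(bits) == x for f, x in zip(queries, answers))]

# Hypothetical queries: pairwise sums of three queryable bits.
queries = [lambda q: q[0] + q[1],
           lambda q: q[1] + q[2],
           lambda q: q[0] + q[2]]

secret = (1, 0, 1)                        # the bits the database holds
answers = [f(secret) for f in queries]    # exact, unperturbed answers
print(consistent_assignments(queries, answers, 3))
```

With exact answers only one assignment survives, so the individual bits are fully recovered from aggregate queries.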
Random Data Perturbation (RDP)
Used in the public health community for twenty-odd years; can be used together with cryptographic techniques.
• If $x_i$ is perturbed each time, the simultaneous equations are inconsistent: $f_i(q_1 + \delta_{1i}, q_2 + \delta_{2i}, q_3 + \delta_{3i}, \ldots) = X_i + \Delta_i$
• Characterizing its security and the attacks on it has been an open problem for 20+ years, despite many attempts (Denning, Adams, Duncan, … Landers).
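RDP
[Figure: two perturbation examples. Numeric data: a salary, with labels "Salary 25,000", "Salary 40,000", a range from -25,000 to 25,000, and distributions F(x) and G(x). Binary data: the answer to "HIV?" (Yes) passes through a binary channel on {0, 1} with transition probabilities q and p = 1 - q.]
Statistics over many records are accurate.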
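The claim that statistics over many records stay accurate can be illustrated with a small simulation. This is a sketch under assumptions: each true bit is flipped independently with a known probability (called `lie_prob` here), and the population rate and sample size are hypothetical values chosen for the example.

```python
import random

def perturb(bit, lie_prob, rng):
    """Flip the bit with probability lie_prob, otherwise report it truthfully."""
    return bit ^ 1 if rng.random() < lie_prob else bit

def estimate_true_rate(reported, lie_prob):
    """Invert E[reported] = rate * (1 - lie_prob) + (1 - rate) * lie_prob."""
    observed = sum(reported) / len(reported)
    return (observed - lie_prob) / (1 - 2 * lie_prob)

rng = random.Random(0)               # fixed seed so the run is repeatable
true_rate, lie_prob = 0.10, 0.30     # hypothetical values
truth = [1 if rng.random() < true_rate else 0 for _ in range(100_000)]
reported = [perturb(b, lie_prob, rng) for b in truth]
print(round(estimate_true_rate(reported, lie_prob), 3))
```

Each individual report is unreliable, yet the corrected average lands close to the true rate because the perturbation probability is public.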
Known Security Property of RDP
• $m$: number of repeated queries
• $E_m$: probability of error
• $E_m \to 0$ as $m \to \infty$
• Chernoff bound: $m = \ln(2/\delta) / (0.38\,\rho^2) \Rightarrow E_m < \delta$
• Probability of a lie $= 0.5 - \rho$
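The repeated-query attack can be checked numerically. The sketch below assumes the slide's bound reads m = ln(2/δ) / (0.38 ρ²), where ρ is the gap between the lie probability and 0.5 and δ is the target error; both symbol names are assumptions, as is the majority-vote decoding rule. It computes the exact majority-vote error in log space.

```python
from math import ceil, exp, lgamma, log

def chernoff_queries(rho, delta):
    """Number of repetitions suggested by the bound m = ln(2/delta) / (0.38 rho^2)."""
    return ceil(log(2 / delta) / (0.38 * rho ** 2))

def majority_error(m, lie_prob):
    """Exact probability that more than half of m independent answers are lies (m odd)."""
    total = 0.0
    for k in range(m // 2 + 1, m + 1):
        log_term = (lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)
                    + k * log(lie_prob) + (m - k) * log(1 - lie_prob))
        total += exp(log_term)
    return total

rho, delta = 0.1, 0.05               # hypothetical values
m = chernoff_queries(rho, delta)
if m % 2 == 0:                       # keep m odd so a majority vote never ties
    m += 1
print(m, majority_error(m, 0.5 - rho))
```

The exact error at this m is far below δ, which suggests the bound is conservative; the number of repetitions still grows as δ shrinks.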
A simple inference attack
• Query 1: Female?
• Query 2: Over 40?
• Query 3: Losing Calcium?
The third query is really asking about age and gender. How does one characterize all such attacks? What can one say about security with respect to such attacks?
Our definitions
Definition: An inference attack is a set of queries $x$ not independent of the set of sensitive bits $S$, i.e. $I(S; x) \neq 0$.
Definition: A small error inference attack is one in which $\lim_{m \to \infty} E_m = 0$, where $E_m$ is the probability of error after $m$ queries.
Definition: The query complexity per bit of a query sequence $x$ of length $m$, as a means of distinguishing among $M$ possible values of $x$, is $\eta_m = m / \log_2 M$.
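Both definitions can be evaluated on toy numbers. The sketch below is illustrative only: the joint distribution between a sensitive bit and a query answer is hypothetical, and the function names are not from the talk.

```python
from math import log2

def mutual_information(joint):
    """I(S; x) in bits, from a dict {(s, x): probability}."""
    ps, px = {}, {}
    for (s, x), p in joint.items():
        ps[s] = ps.get(s, 0.0) + p
        px[x] = px.get(x, 0.0) + p
    return sum(p * log2(p / (ps[s] * px[x]))
               for (s, x), p in joint.items() if p > 0)

def query_complexity_per_bit(m, num_values):
    """m queries used to distinguish among num_values possibilities."""
    return m / log2(num_values)

# Hypothetical joint distribution: the query answer is correlated with the sensitive bit.
correlated = {(0, 0): 0.6, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.2}
print(mutual_information(correlated))      # positive, so this query is an inference attack
print(query_complexity_per_bit(3, 4))      # three queries about two independent bits
```

Three queries that distinguish among four values give 3/2 queries per bit, the figure quoted for the Female / Over 40 / Losing Calcium example.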
Recall attack example
• Query 1: Female?
• Query 2: Over 40?
• Query 3: Losing Calcium?
Query 3 checks the answers to Queries 1 and 2. It is a parity-check bit of sorts, but not quite.
If Queries 1 and 2 are independent, the query complexity per bit is $\eta = 3/2$. Does the probability of error $E_m \to 0$ as $m \to \infty$?
Our analogy (ISIT ‘03)
• All attacks are communication over a channel
• When attacks are codes: $x = f(S)$
• What B queries is a codeword bit
• What B receives is the transmitted codeword, perturbed by the channel, which he decodes
Shannon’s theorems apply when $x = f(S)$ and $\eta$ is constant (ISIT ’03)
Assuming:
• $x = f(S)$ (including adaptive, related queries): queries are channel codes
• $\eta$ constant: reliable transmission
Result: $E_m \to 0$ requires $\eta \ge 1/C$, where $C$ is the channel capacity.
Above this bound, $E_m \to 0$ exponentially; below it, $E_m$ increases exponentially.
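To get a feel for the 1/C bound, the following sketch assumes the perturbation of a single bit behaves like a binary symmetric channel whose crossover probability is the probability of a lie. That channel model and the function names are assumptions made for illustration.

```python
from math import log2

def binary_entropy(p):
    """Entropy in bits of a coin with bias p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(lie_prob):
    """Capacity in bits per query of a binary symmetric channel."""
    return 1.0 - binary_entropy(lie_prob)

def min_queries_per_bit(lie_prob):
    """The bound 1/C on the query complexity per bit of a reliable attack."""
    return 1.0 / bsc_capacity(lie_prob)

for lie_prob in (0.1, 0.25, 0.4):
    print(lie_prob, round(min_queries_per_bit(lie_prob), 2))
```

As the lie probability approaches 0.5 the capacity falls toward zero and the number of queries needed per sensitive bit grows without bound.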
What about the general zero-error inference attack?
• Not all inference attacks are codes, i.e. in general $x \neq f(S)$.
• $\eta$ is not necessarily kept constant as $m \to \infty$, i.e. transmission is not necessarily reliable.
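Thm. 1
$\lim_{m \to \infty} E_m = 0 \Rightarrow \exists \{\beta_m\}_{m=1}^{\infty}$ s.t. $\forall i \ge m$, $\eta_i \ge \beta_m$; $\lim_{m \to \infty} \beta_m = 1/C$
Here $E_m$ is the probability of error, $\eta_m$ the query complexity per bit, and $C$ the channel capacity.
The proof modifies the proof of the converse of Shannon's channel coding theorem.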
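The Proof
$\log_2 M = H(s_m) = H(s_m \mid y_m) + I(s_m; y_m)$
$\le 1 + E_m \log_2 M + I(s_m; y_m)$ (Fano's inequality)
$\le 1 + E_m \log_2 M + mC$ (mutual information over $m$ channel uses is at most $mC$)
$\Rightarrow \eta_m = m / \log_2 M \ge (1 - E_m) / (1/m + C) = \beta_m$
$\lim_{m \to \infty} \beta_m = 1/C$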
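The last step of the proof can be checked numerically: the bound (1 - E_m) / (1/m + C) approaches 1/C once the error probability vanishes and m grows. The sketch below uses a hypothetical capacity and a hypothetical error sequence E_m = 1/m.

```python
def lower_bound(m, error_prob, capacity):
    """The proof's bound (1 - E_m) / (1/m + C) on the query complexity per bit."""
    return (1 - error_prob) / (1 / m + capacity)

capacity = 0.5                        # hypothetical channel capacity in bits per query
for m in (10, 100, 10_000, 1_000_000):
    error_prob = 1 / m                # hypothetical error sequence that tends to zero
    print(m, lower_bound(m, error_prob, capacity))
```

The printed values climb toward 1/C = 2 from below, matching the limit stated in the proof.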
Thm. 2
Small error attacks with constant $\eta > 1/C$ exist.
Proof: Follows from the channel coding theorem.
Thm. 3
For data of entropy $H$, a stationary record sequence, $N_r$ records, and $\eta_m$ the number of queries per record,
$\lim_{m \to \infty} E_m = 0 \Rightarrow \exists \{\beta_m\}_{m=1}^{\infty}$ s.t. $\forall i \ge m$, $\eta_i \ge \beta_m$; $\lim_{m \to \infty} \beta_m = H/C$
Proof: Modification of the source-channel coding theorem.
Proof
Given Theorem 1, smaller lengths can be shown to violate Shannon’s source coding theorem when the data is stationary.
Corollary
$\eta_m \approx \ln 2 / (2\rho^2)$ when $|p - 0.5| = \rho$ is small, for any probability of error.
• Unlike the Chernoff bound, this does not increase as the probability of error gets smaller.
• This is the improvement bought over the repetition code.
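A numeric comparison makes the corollary's point visible. The sketch assumes a binary symmetric channel with lie probability 0.5 - ρ, reads the Chernoff bound as m = ln(2/δ) / (0.38 ρ²), and reads the corollary as roughly ln 2 / (2ρ²) queries per bit; the symbol names ρ and δ are assumptions.

```python
from math import ceil, log, log2

def chernoff_queries(rho, delta):
    """Repetitions per bit for target error delta, per m = ln(2/delta) / (0.38 rho^2)."""
    return ceil(log(2 / delta) / (0.38 * rho ** 2))

def coded_queries_per_bit(rho):
    """Small-rho approximation ln 2 / (2 rho^2) of 1/C."""
    return log(2) / (2 * rho ** 2)

def exact_inverse_capacity(rho):
    """1/C for a binary symmetric channel with crossover probability 0.5 - rho."""
    p = 0.5 - rho
    entropy = -p * log2(p) - (1 - p) * log2(1 - p)
    return 1.0 / (1.0 - entropy)

rho = 0.1                             # hypothetical bias
for delta in (0.05, 0.001, 0.00001):
    print(delta, chernoff_queries(rho, delta), round(coded_queries_per_bit(rho), 1))
```

The repetition count keeps rising as the target error shrinks, while the coded figure stays fixed near 1/C.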
Where to?
• Block ciphers as channels for properties of the key (Filiol, ePrint 2003)
• Attacks on stream ciphers as codes over key bits (Johansson et al., Golic et al., Filiol et al.)
• It appears there is a framework (Vora, working documents):
  • all statistical attacks as channel communication
  • efficient attacks as codes
  • related-input (key, message) attacks as concatenated codes¹
  • Wagner’s Cryptanalytic Model (FSE ‘03) to determine inner codes
Do related-key attacks provide an improvement in efficiency over repeated-key attacks?
¹ Filiol shows the repeated-key attack on block ciphers as a concatenated code with the repetition code as the outer code.
Also traffic analysis, e.g. Crowds: Reiter and Rubin / Lucent and AT&T
• $N$ nodes; $C$ colluding; $p_f$ probability of forwarding
• At node $i+1$:
  • Probability that node $i$ originated the message (probability of truth): $1 - p_f (N - C - 1)/N$
  • Probability of any other non-collaborating node originating the message: $p_f / N$
• Observable information changes the pdf on the data of interest: the originator of the message
The Crowds protocol as a simplex channel
• $\Phi: X \to Y$, with $X$ = set of originator nodes $\{0, \ldots, N-3\}$ and $Y$ = set of predecessor nodes $\{0, \ldots, N-3\}$; $\Phi(X) = Y$
• Assumption: all senders equally likely
• $P(Y = j \mid X = i) = p_{ij} = p_f / N$ for $i \neq j$; $p_{ij} = 1 - p_f (N-2)/N$ for $i = j$
The Crowds protocol
Channel $X \to Y$ with capacity
C = 1 + (N-2) $p_f$/N log [1 - (N-2) $p_f$/N] + $p_f$/N log [$p_f$/N]
• = 2 log 2/N if $p_f = 1$
• 2 log 2/N + (N-1)2 if $p_f = 1 - \epsilon$
Average path length $= (1 - \epsilon)/\epsilon = O(1/\epsilon)$
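The channel view of Crowds can also be evaluated numerically. The sketch below takes the transition probabilities p_f/N (off-diagonal) and 1 - p_f(N-2)/N (diagonal) and assumes N - 1 candidate nodes so that each row sums to one; that node count is an assumption made here so the rows are normalized. For such a symmetric channel the capacity is log2 of the output alphabet size minus the entropy of one row, and the computed value is not claimed to match the slide's closed form.

```python
from math import log2

def crowds_row(n_nodes, p_f):
    """Transition probabilities from one originator to each of the n_nodes - 1 candidates."""
    stay = 1 - p_f * (n_nodes - 2) / n_nodes   # predecessor is the true originator
    other = p_f / n_nodes                      # predecessor is some other honest node
    return [stay] + [other] * (n_nodes - 2)

def crowds_capacity(n_nodes, p_f):
    """Capacity in bits of the symmetric channel: log2 |Y| minus the row entropy."""
    row = crowds_row(n_nodes, p_f)
    row_entropy = -sum(p * log2(p) for p in row if p > 0)
    return log2(len(row)) - row_entropy

for p_f in (0.5, 0.75, 1.0):
    print(p_f, round(crowds_capacity(10, p_f), 4))
```

A higher forwarding probability hides the originator better, which shows up as a lower capacity per observation.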
The replay attack on Crowds
• Repetition code: the message is resent along a different (randomly chosen) route each time
• How about attacks corresponding to other codes?
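The replay attack as a repetition code can be simulated directly. This sketch assumes each replay yields an independent observation of the predecessor drawn from the Crowds channel probabilities, with N - 1 candidate nodes (an assumption so the probabilities sum to one), and that the attacker decodes by picking the most frequently observed node.

```python
import random
from collections import Counter

def observe_predecessor(originator, n_nodes, p_f, rng):
    """One channel use: the true originator with probability 1 - p_f (N-2)/N, else another node."""
    if rng.random() < 1 - p_f * (n_nodes - 2) / n_nodes:
        return originator
    others = [j for j in range(n_nodes - 1) if j != originator]
    return rng.choice(others)

def replay_attack(originator, n_nodes, p_f, replays, rng):
    """Decode the repetition code by majority: return the most frequently seen predecessor."""
    counts = Counter(observe_predecessor(originator, n_nodes, p_f, rng)
                     for _ in range(replays))
    return counts.most_common(1)[0][0]

rng = random.Random(0)                # fixed seed so the run is repeatable
print(replay_attack(originator=3, n_nodes=10, p_f=0.75, replays=200, rng=rng))
```

A single observation points at the originator only 40% of the time in this setting, but a few hundred replays make the vote essentially certain.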