A model for data revelation
Poorvi Vora, Dept. of Computer Science, George Washington University
“Security” frameworks
Binary:
• Divide the world into trusted and untrusted parties
• Provide complete revelation of information or complete protection
E.g. multiparty computation, encrypted data
Poorvi Vora/CS/GWU
Even a statistic or aggregate reveals “private” information
Secure multiparty computation reveals f(x1, x2, ..., xn) and nothing more.
Yet this reveals information about all xi.
Thus, typical security assurances are not enough.
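A small sketch of why the output alone can leak inputs. It assumes, purely for illustration, that f is a plain sum; the slide does not fix f, and the function names and numbers here are hypothetical.

```python
def remaining_input(total, known_inputs):
    """Recover the one unknown input from the revealed sum and the other inputs.

    Assumption: f(x1, ..., xn) is a plain sum (not specified on the slide).
    """
    return total - sum(known_inputs)


def all_inputs_forced(total, n):
    """For n binary inputs, the sum alone fixes every input when it is 0 or n."""
    return total in (0, n)


# Made-up values: three parties compute only the sum of their inputs.
inputs = [52, 61, 47]
revealed = sum(inputs)

# Two colluding parties learn the third input exactly.
print(remaining_input(revealed, inputs[:2]))
```

Even without collusion, a sum of binary inputs that comes out as 0 or n reveals every individual input, which is the sense in which the protocol's guarantee ("nothing more than f") is not a privacy guarantee.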
What is privacy?
• Control over information
• Extent of information revelation
Tensions between:
• Access to aggregate information for the community vs. individual control
• Reputation vs. prejudice
Individual control requires more than binary security of personal information
Information is often given up for something in return:
• Safeway card
• Monthly charge to be kept out of phone books
• Information for community statistics:
  • Health statistics
  • Collaborative filtering/personalization in virtual communities
A model: introduce uncertainty
Maximum uncertainty (i.e. secrecy) corresponds to crypto protocols
• Alice and Bob determine:
  • a binary data point from Alice’s personal information, x
  • a probability of truth, p
  • a return, y
• Alice reveals a variable z = x with probability p
• Bob provides, in return, y
• z exists in the ether as Alice’s value x with probability p
This is not mutually exclusive with cryptographic protection (p = 0.5 is cryptographic)
Used in the public health community for twenty-odd years
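The revelation step can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes that when Alice does not tell the truth she reveals the flipped bit, and it adds the standard randomized-response correction for estimating a population fraction from many noisy answers. The names `reveal` and `estimate_true_fraction` are hypothetical.

```python
import random


def reveal(x, p, rng):
    """Return z = x with probability p, otherwise the flipped bit (assumption)."""
    return x if rng.random() < p else 1 - x


def estimate_true_fraction(responses, p):
    """Estimate the fraction of true 1s from noisy answers.

    Uses E[z] = p * f + (1 - p) * (1 - f), solved for f.
    At p = 0.5 the answers carry no information, so no estimate exists.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 reveals nothing; the fraction cannot be estimated")
    observed = sum(responses) / len(responses)
    return (observed - (1 - p)) / (2 * p - 1)


rng = random.Random(0)
population = [1] * 3000 + [0] * 7000      # made-up data: 30% have the attribute
answers = [reveal(x, 0.8, rng) for x in population]
print(estimate_true_fraction(answers, 0.8))
```

Each individual answer stays deniable, yet the aggregate remains usable, which is the trade the slide describes: uncertainty for Alice, statistics for Bob.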
Outcome
The protocol is a mathematical game between Alice and Bob.
The optimal situation is not when no information is revealed, but when Alice gets maximum benefit for her information.
Think about this: should women in Africa test for HIV when they will certainly not obtain any treatment for it?
An analogy
• The protocol is a communication channel
• The sender is Alice, the receiver (malicious?) Bob
• The probability of error is the probability of a lie
Security properties of randomization
• Repeated queries: error → 0 as n → ∞, and n → ∞ as error → 0
• Cost to the attacker increases without bound if the error is not bounded above zero
• This is a repetition code over the channel
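The repeated-query attack can be made concrete with a short calculation. This sketch assumes Bob asks the same binary question n times (n odd), that each answer is truthful independently with probability p, and that Bob decodes by majority vote; the function name is hypothetical.

```python
from math import comb


def majority_error(n, p):
    """Probability that a majority of n independent answers is wrong.

    Each answer is truthful with probability p; n must be odd so there are no ties.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of queries to avoid ties")
    lie = 1 - p
    # The vote is wrong when more than half of the answers are lies.
    return sum(comb(n, k) * lie**k * p**(n - k) for k in range((n + 1) // 2, n + 1))


for n in (1, 3, 11, 51):
    print(n, majority_error(n, 0.8))
```

For any p > 0.5 the error shrinks as n grows but never reaches zero for finite n, so driving the error toward zero drives the number of queries, and hence the attacker's cost, up without bound.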
Other attacks
Query 1: Graying?
Query 2: Balding?
Query 3: Weight?
Query 4: Sports?
Really asking about age and gender.
How does one characterize all such attacks?
What can one say about security with respect to such attacks?
An analogy
• The protocol is a communication channel
• The sender is Alice, the receiver (malicious?) Bob
• The probability of error is the probability of a lie
• The attributes that Bob wants to determine form the message
A simple attack
• Query 1: Female?
• Query 2: Over 40?
• Query 3: Losing calcium?
Query 3 checks the answers to Queries 1 and 2: it is a parity-check bit.
An analogy
• All attacks are communication over the channel
• Good attacks are codes
• What Bob queries is a codeword bit
• What he receives is the transmitted codeword, which he decodes
Shannon’s theorems apply
In fact, assuming any functions of Alice’s data points as queries (adaptive, related queries) and error probability → 0 as n → ∞:
The number of queries required per bit of entropy is asymptotically tightly bounded below by the inverse of the channel capacity.
Above this bound, the error tends exponentially to 0.
Below it, the error increases exponentially with n.
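To put numbers on the bound, the sketch below assumes the protocol behaves as a binary symmetric channel in which each answer is truthful with probability p, so the capacity is 1 - H(p), with H the binary entropy. That channel model is an assumption made for illustration, and the function names are hypothetical.

```python
from math import log2


def binary_entropy(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def channel_capacity(p):
    """Capacity in bits per query of a binary symmetric channel with truth probability p."""
    return 1 - binary_entropy(p)


def min_queries_per_bit(p):
    """Asymptotic lower bound on queries per bit of entropy: 1 / capacity."""
    capacity = channel_capacity(p)
    return float("inf") if capacity == 0 else 1 / capacity


for p in (0.5, 0.6, 0.8, 0.9, 1.0):
    print(p, channel_capacity(p), min_queries_per_bit(p))
```

At p = 0.5 the capacity is zero and no number of queries suffices, matching the earlier remark that p = 0.5 is the cryptographic case; at p = 1 one query per bit is enough.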
Questions
• How does one determine the entropy of a particular data set, or a general data set?
• What kinds of attacks are computationally feasible?
• This was a very powerful attacker. What are reasonable limits on the attacker’s abilities?
• Result in itself, independent of model.
• Partly published at Int. Symp. Info. Theory, 2003
• Journal paper in review, at website
Value-free model
• Human rights aspects covered through crypto protocols
• Necessary health information and community information can be gathered
• Consumer behaviour treated through this game
• Criticism: very adversarial model
Another application: anonymous delivery
Crowds: Reiter and Rubin/Lucent and AT&T
• At node i+1: node i is more likely than any other to be the true sender
• Receiver: Node i+1
• Message: sending node
• Received symbol: Node i
• Channel characteristic:
  • Probability that the true sender is Node i
  • Probability that other nodes are senders
• Traffic analysis/data mining: correlations among senders (communication across the channel, less efficient than some error-correcting code)
[Figure: network of nodes A, B, C, D, E] N nodes; pf = probability of forwarding
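The channel characteristic can be computed under a simplified reading of Crowds. The sketch assumes the sender always forwards to a uniformly chosen node, every later node forwards again with probability pf to a uniformly chosen node, and a small set of observing nodes records the predecessor the first time the path reaches one of them. The number of observers and the function name are hypothetical choices, not taken from the slide.

```python
def predecessor_is_sender_probability(n_nodes, p_forward, n_observers=1):
    """Probability that the node seen by an observer is the true sender.

    Simplified Crowds model (assumption): uniform forwarding over all nodes,
    observers never originate messages, and only the first sighting counts.
    """
    honest = n_nodes - n_observers
    # Chance that the very first hop lands on an observer.
    p_first = n_observers / n_nodes
    # Chance that a hop lands on an honest node that then forwards again.
    q = p_forward * honest / n_nodes
    # P(first sighting at hop k) = q**(k - 1) * p_first; sum the geometric series.
    p_seen = p_first / (1 - q)
    p_later = p_seen - p_first
    # At hop 1 the predecessor is certainly the sender; at later hops the
    # predecessor is a uniformly random honest node.
    return (p_first + p_later / honest) / p_seen


print(predecessor_is_sender_probability(5, 0.75))   # compare with 1/4 for a blind guess
```

Under these assumptions the observed predecessor is the single most likely sender, but far from certain, which is what makes the situation look like a noisy channel whose "received symbol" is node i.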
An example of model use to measure the value of information
with Yu-An Sun and Sumit Joshi
• Auction bids reveal much about an individual’s profile
• Consider the Vickrey (sealed, second-highest-bid) auction
• Optimal strategy: to bid one’s valuation
• Bids (and hence valuations) can be protected with secure multiparty computation
• But bids allow determination of market demand (efficient markets)
• Need for an aggregate value, not well-defined at the moment of the auction
Variably Private Vickrey – Bidding Round
Introduce uncertainty
• The seller announces a minimum sale price and a maximum randomization setting.
• Each bidder submits a sealed interval containing her bid. The size of the interval is her choice.
• A bidder is in the running with the high end of her interval, and committed to the low end.
Variably Private Vickrey – Revealing Round
• Bidders not in the running will reveal no more information on their valuations.
• The largest of the others will reveal which half of their interval contains their valuation.
Sale Price
[Figure: price diagram with labels “Buyer pays” and “Seller gets”; bracketed amount: divided among all bidders in proportion to the interval width]
Properties?
• Provides various demand statistics
• In general, the accuracy of future bid estimation is lower for more uncertainty
• Allows the bidder to vary uncertainty, and pay for it
• Allows the seller to obtain more than in a regular Vickrey auction, depending on how much the information is valued
• The bidder with the highest valuation still wins the auction as long as she can tolerate revealing her valuation to the extent required.
Summary
A model that we hope will:
• Provide choices not currently typically available to users
• Extend the security framework to include problems like those in statistical databases
• Provide a means of measuring uncertainty in situations where there is some uncertainty, rather than none or complete uncertainty
• Include other leakage from security-related protocols such as anonymous delivery and ciphers
• Be useful for measuring the economic value of information