Vanish: Increasing Data Privacy with Self-Destructing Data Roxana Geambasu, Tadayoshi Kohno, Amit Levy, et al. University of Washington USENIX Security Symposium, 2009 --- Presented by Joseph Del Rocco University of Central Florida
Outline • Distributed Hash Tables (DHT) • Data Destruction w/ Vanish • Motivations • Architecture • Implementations • Results • Contributions / Weaknesses / Improvements • References
Hash Tables (review) [5] • A tag or data item is hashed into a table index • Hashing functions: MD5, SHA-1, CRC32, FSB, HAS, etc. • Hash collisions are unavoidable (see the birthday paradox), so each index typically holds a linked list of entries (chaining)
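As a refresher, here is a minimal separate-chaining table in Python. SHA-1 as the hash function and the class and method names are illustrative choices, and Python lists stand in for the linked lists. With 5 keys and only 2 buckets, collisions are guaranteed, yet every key stays retrievable.

```python
import hashlib


class ChainedHashTable:
    """Toy hash table with separate chaining: each bucket is a list of (key, value) pairs."""

    def __init__(self, num_buckets: int = 8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key: str) -> int:
        # Hash the key with SHA-1, then reduce the 160-bit digest to a bucket index.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % len(self.buckets)

    def put(self, key: str, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # same key: overwrite in place
                return
        bucket.append((key, value))  # colliding keys simply share the bucket's list

    def get(self, key: str):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)


table = ChainedHashTable(num_buckets=2)  # 5 keys into 2 buckets forces collisions
for i in range(5):
    table.put(f"file{i}", i)
```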
Distributed Hash Tables • Hash tables… split up across machines • Key space is astronomically large ($2^{128}$ or $2^{160}$ keys); $2^{128} = 340282366920938463463374607431768211456 \approx 3.4 \times 10^{38}$ • In 2001, four main DHTs ignited research: - Chord (MIT) - CAN (Berkeley, AT&T) - Pastry (Rice, Microsoft) - Tapestry (Berkeley) • Key concerns: availability, scale, decentralization, churn!
Viewed as a Distributed Hash Table [3] [Figure: the hash table's key space, from 0 to $2^{128}-1$, partitioned among peer nodes] • Each peer node is responsible for a range of the hash table, determined by the peer's hash key • Location information about objects is placed at the peer with the closest key (replicated for information redundancy)
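One way to picture the closest-key rule is the Python sketch below. It assumes MD5 to produce 128-bit IDs (used only for placement, not security), circular distance on the ring as the meaning of "closest", and a replication factor for redundancy; the helper names key_id, ring_distance, and responsible_peers are hypothetical. Real DHTs define closeness differently (Chord uses successors, Kademlia uses XOR).

```python
import hashlib

BITS = 128          # key space [0, 2**128 - 1]
SPACE = 2 ** BITS


def key_id(name: str) -> int:
    # MD5 yields exactly 128 bits, which places the name in the key space.
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest(), "big")


def ring_distance(a: int, b: int) -> int:
    # Distance on the circular key space (wraps around at 2**BITS).
    d = abs(a - b)
    return min(d, SPACE - d)


def responsible_peers(key: int, peer_ids, replicas: int = 1) -> list:
    # Peers whose IDs are closest to the key hold its location info;
    # replicas > 1 stores the entry on several neighbors for redundancy.
    return sorted(peer_ids, key=lambda p: ring_distance(p, key))[:replicas]


peers = [key_id(f"10.0.0.{i}:6881") for i in range(8)]  # peers hashed from IP:port
holders = responsible_peers(key_id("song.mp3"), peers, replicas=3)
```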
DHT in action: put() [3] [Figure: peer nodes, each storing (K, V) pairs] A peer that wants to share a file calls insert(K1, V1). Operation: route the message, “I have the file,” to the node holding key K1
DHT in action: get() [3] [Figure: peer nodes, each storing (K, V) pairs] A peer calls retrieve(K1). Operation: retrieve the value V1 from the node holding key K1
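The put()/get() flow can be sketched as a toy in-memory ring in Python. Several assumptions are made here: a Chord-style rule (a key belongs to the first node clockwise from it) instead of "closest key", a tiny $2^{16}$ ID space, and nodes that know only their successor, so a request is forwarded hop by hop. All names (Node, find_owner, put, get) are hypothetical.

```python
import hashlib

M = 16  # tiny 2**16 identifier space so IDs stay readable (real DHTs use 128-160 bits)


def h(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big") % 2 ** M


class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor = None  # each node only knows the next node on the ring
        self.table = {}        # this node's slice of the distributed hash table


def in_half_open(x: int, a: int, b: int) -> bool:
    # True if x lies in the ring interval (a, b], wrapping past 2**M - 1.
    if a < b:
        return a < x <= b
    return x > a or x <= b  # wrapping interval (a == b means the whole ring)


def build_ring(ids):
    nodes = [Node(i) for i in sorted(ids)]
    for i, node in enumerate(nodes):
        node.successor = nodes[(i + 1) % len(nodes)]
    return nodes


def find_owner(start: Node, key: int):
    # Forward the request hop by hop until the key falls between a node and
    # its successor; that successor is the node holding the key.
    node, hops = start, 0
    while not in_half_open(key, node.id, node.successor.id):
        node, hops = node.successor, hops + 1
    return node.successor, hops


def put(start: Node, name: str, value) -> Node:
    owner, _ = find_owner(start, h(name))
    owner.table[h(name)] = value
    return owner


def get(start: Node, name: str):
    owner, _ = find_owner(start, h(name))
    return owner.table.get(h(name))


ring = build_ring([100, 20000, 40000, 60000])
owner = put(ring[0], "K1", "V1")  # insert(K1, V1) issued at node 100
```

A retrieve(K1) issued at any other node, e.g. get(ring[2], "K1"), is routed to the same owner and returns "V1".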
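Chord [4] • Keys and nodes get m-bit identifiers in $[0, 2^m)$ by hashing the filename or the node's IP address, respectively • With K keys and N nodes, each node stores roughly K / N keys • Finger table of node n, where successor(x) is the next node clockwise from x: $\text{finger}[i] = \text{successor}((n + 2^{i-1}) \bmod 2^m)$, for $1 \le i \le m$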
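The finger-table formula can be checked on a tiny ring. This Python sketch takes m = 3 and nodes {0, 1, 3}; the helper names successor and finger_table are illustrative, and a real Chord node would discover successors over the network rather than from a global list.

```python
def successor(ident: int, node_ids, m: int) -> int:
    # First node clockwise from ident: smallest node ID >= ident, else wrap to the smallest ID.
    ident %= 2 ** m
    later = [n for n in sorted(node_ids) if n >= ident]
    return later[0] if later else min(node_ids)


def finger_table(n: int, node_ids, m: int) -> list:
    # finger[i] = successor((n + 2**(i-1)) mod 2**m) for i = 1..m
    return [successor((n + 2 ** (i - 1)) % 2 ** m, node_ids, m) for i in range(1, m + 1)]


nodes = [0, 1, 3]  # m = 3 bits, so identifiers range over 0..7
finger_table(0, nodes, 3)  # [1, 3, 0]
finger_table(1, nodes, 3)  # [3, 3, 0]
finger_table(3, nodes, 3)  # [0, 0, 0]
```

Each finger roughly halves the remaining distance to a target, which is why lookups take O(log N) hops.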
CAN – Content Addressable Network [3] • Each peer is responsible for one zone, i.e., it stores all (key, value) pairs of that zone • Each peer knows the neighbors of its zone • Peers are randomly assigned to zones at startup; a zone is split if it is not empty • Dimension-ordered multi-hop routing
CAN: Object Publishing [3] Node I calls I::publish(K, V): (1) $a = h_x(K)$, $b = h_y(K)$, i.e., the object maps to the point $(x = a, y = b)$
CAN: Object Publishing (cont.) [3] Node I calls I::publish(K, V): (1) $a = h_x(K)$, $b = h_y(K)$ (2) route (K, V) to node J, whose zone contains the point $(a, b)$
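The two publishing steps can be sketched in Python under simplifying assumptions: a fixed 4 x 4 grid of equal zones (real CAN zones arise from repeated splits), hypothetical hash functions h_x and h_y built from SHA-256 with an axis label, and routing that ignores the wrap-around of CAN's torus-shaped coordinate space. The names coord, zone_of, route, and publish are illustrative.

```python
import hashlib

G = 4  # assumption: a fixed 4 x 4 grid of equal zones


def coord(key: str, axis: str) -> float:
    # Hypothetical h_x / h_y: hash the key together with an axis label, map to [0, 1).
    digest = hashlib.sha256(f"{axis}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2 ** 48


def zone_of(a: float, b: float) -> tuple:
    # (column, row) of the zone containing the point (a, b).
    return int(a * G), int(b * G)


def route(src: tuple, dst: tuple) -> list:
    # Dimension-ordered routing: move along x first, then y, one neighboring zone per hop.
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def publish(src: tuple, key: str, value, store: dict):
    a, b = coord(key, "x"), coord(key, "y")  # step (1): a = h_x(K), b = h_y(K)
    j = zone_of(a, b)
    path = route(src, j)                     # step (2): route (K, V) -> J
    store.setdefault(j, {})[key] = value     # J stores the pair for its zone
    return j, path


store = {}
j, path = publish((0, 0), "K1", "V1", store)  # node I owns zone (0, 0)
```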
Modern, Popular P2P Systems w/ DHTs • Vuze / Azureus (Java BitTorrent client) • BitTorrent DHT (based on KAD) • IBM WebSphere • Apache Cassandra • OpenDHT • Mainline • Kazaa, eDonkey, etc. (KAD) [Image: Dendrobates azureus (blue poison dart frog)]
Vuze (Azureus) Specifics • Each node is assigned a “random” 160-bit ID, a hash of its IP address and port, which fixes its position in the DHT index range • A client sends “put” messages to the 20 nodes closest to the key's hashed index in the DHT • Nodes re-put() entries from their local hash tables every 30 minutes to combat churn • Nodes supposedly remove key/value pairs older than 8 hours unless the originator re-puts them • So the originator node must re-put() for data to persist?
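The timing rules above can be modeled in a small Python sketch. Two points are assumptions, not confirmed Vuze behavior: "closest" is taken as Kademlia-style XOR distance, and a replica's 30-minute re-put is assumed to keep the originator's timestamp, so replication alone cannot keep an entry past 8 hours. All names are hypothetical.

```python
from dataclasses import dataclass

K_CLOSEST = 20             # put() targets the 20 closest nodes
REPLICATE_EVERY = 30 * 60  # nodes re-put local entries every 30 minutes (seconds)
EXPIRY = 8 * 3600          # entries dropped 8 hours after the originator's last put()


@dataclass
class Entry:
    value: bytes
    origin_time: float  # time of the originator's most recent put()


def closest_nodes(key_id: int, node_ids, k: int = K_CLOSEST) -> list:
    # Assumption: Kademlia-style XOR distance as the meaning of "closest".
    return sorted(node_ids, key=lambda n: n ^ key_id)[:k]


def replicate(local: dict, neighbor: dict) -> None:
    # Modeling assumption: a replica re-put keeps the originator's timestamp.
    for key, entry in local.items():
        neighbor[key] = Entry(entry.value, entry.origin_time)


def purge(local: dict, now: float) -> None:
    # Drop entries that the originator has not re-put within EXPIRY seconds.
    for key in [k for k, e in local.items() if now - e.origin_time > EXPIRY]:
        del local[key]


def alive_after(seconds: int) -> bool:
    # Originator puts "K1" once at t = 0 and then goes offline (never re-puts).
    table, neighbor = {"K1": Entry(b"share", 0.0)}, {}
    t = 0
    while t <= seconds:
        replicate(table, neighbor)
        purge(table, t)
        purge(neighbor, t)
        t += REPLICATE_EVERY
    return "K1" in table or "K1" in neighbor
```

Under these assumptions, alive_after(8 * 3600) is True and alive_after(9 * 3600) is False: without the originator, replication only spreads the entry, it does not extend its life.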
Vanish Motivations • Data is frequently cached/archived by email providers, ISPs, and network backup systems • It is often still available after account termination • Hard drives may be forensically examined (e.g., after a raid) • Laptops are stolen or taken in for repair • High-profile political scandals • Some argue that the right and ability to destroy data is as fundamental as privacy and liberty
Vanish Motivations (cont.) • The Hushmail email encryption service provided the cleartext contents of encrypted messages to the federal government • Trusted third-party services (e.g., Ephemerizer) supposedly destroy data after a timeout, but they never caught on… a trust issue? • Subpoenas…
Vanish Goals • Create a Vanishing Data Object (VDO) • The VDO becomes unreadable after a timeout, even if someone retroactively obtains a pristine copy of it made before expiration • It remains accessible until the timeout • Leverage existing infrastructure • NO passwords, keys, or special security hardware required of the user…
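Vanish Architecture • Encrypt data D with a random key K, producing ciphertext C • Use threshold secret sharing (T.S.S.) [6] to split K into N shares • Pick a random access key L and use a cryptographically secure PRNG keyed by L to derive N indices into the DHT • put() each share at its index in the DHT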
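Deriving the N share locations from the access key L can be sketched in Python. The choice of HMAC-SHA1 over a counter as the keyed PRNG is an assumption made for illustration; the slide does not say which generator Vanish uses. The function name derive_indices is hypothetical.

```python
import hashlib
import hmac
import secrets


def derive_indices(access_key: bytes, n: int) -> list:
    # HMAC-SHA1 over a counter serves as a keyed PRF: the same L always yields
    # the same n indices, while without L they are unpredictable. SHA-1's
    # 160-bit output matches the 160-bit ID space of Vuze.
    return [
        int.from_bytes(hmac.new(access_key, i.to_bytes(4, "big"), hashlib.sha1).digest(), "big")
        for i in range(n)
    ]


L = secrets.token_bytes(20)      # random access key L (stored in the VDO)
indices = derive_indices(L, 20)  # N = 20 DHT locations, one per share
```

Because the indices are a deterministic function of L, the VDO only has to carry L; anyone holding it can recompute where the shares live.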
Vanish Architecture (cont.) • The T.S.S. threshold (expressed as a threshold ratio) determines how many of the N shares are needed to reconstruct K • EX: N = 20, threshold = 10: any 10 of the 20 shares can be used • EX: N = 50, threshold = 50: you had better retrieve all 50 shares… • VDO = (L, C, N, threshold), which is sent or stored
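The threshold behavior follows from Shamir's scheme [6], sketched here in Python: K becomes the constant term of a random polynomial of degree threshold − 1, each share is a point on it, and any threshold points recover K by Lagrange interpolation at x = 0. The prime $2^{521} - 1$ and the names split and combine are illustrative choices; Vanish's actual implementation may differ.

```python
import random
import secrets

PRIME = 2 ** 521 - 1  # Mersenne prime, comfortably larger than a 128- or 256-bit key


def split(secret: int, n: int, threshold: int) -> list:
    # Random polynomial f of degree threshold - 1 with f(0) = secret; share i is (i, f(i)).
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]


def combine(shares: list) -> int:
    # Lagrange interpolation at x = 0 recovers f(0) = secret.
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret


K = secrets.randbits(128)                       # e.g., a 128-bit data key
shares = split(K, n=20, threshold=10)           # N = 20, threshold = 10
recovered = combine(random.sample(shares, 10))  # any 10 shares suffice
```

With only 9 shares, combine returns an unrelated value (except with negligible probability), which is exactly what makes lost shares fatal to the VDO.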
Vanish Decapsulation • Given a VDO: - extract the access key L - derive the locations of the shares of K - get() at least as many shares as the threshold requires - reconstruct K - decrypt C to obtain D • The number of retrieved shares must be ≥ the threshold
Benefits of Churn! • Nodes continually leave and re-enter the network • Supposedly 80% of IPs change within 7 days • Nodes change IDs (locations in the network) when their IPs change • Also, each node's hash table purges entries after some time period • So data should NOT last long at its original node…
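How churn interacts with the decapsulation threshold can be estimated with a back-of-the-envelope model in Python. It assumes, unrealistically, that each of the N shares survives independently with the same probability p at a given moment; the VDO is then readable iff at least threshold shares remain, a binomial tail. The function name and the value p = 0.6 are hypothetical.

```python
from math import comb


def vdo_readable_probability(n: int, threshold: int, p_share: float) -> float:
    # Idealized model: each of the n shares independently survives churn with
    # probability p_share; the VDO stays readable iff at least `threshold` survive.
    return sum(
        comb(n, k) * p_share ** k * (1 - p_share) ** (n - k)
        for k in range(threshold, n + 1)
    )


# N = 20 shares, each still present with (hypothetical) probability 0.6
for threshold in (5, 10, 15, 20):
    print(threshold, round(vdo_readable_probability(20, threshold, 0.6), 4))
```

For a fixed per-share survival probability, raising the threshold lowers the chance the VDO is still readable, so the threshold ratio trades robustness before the timeout against how sharply the data vanishes as shares disappear.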
The Big Question… [1][7] • How long does data last under churn? - Vuze = unclear… (7h, 3h, 2h …) - OpenDHT = 1 hour – 1 week - Kazaa = ~“several minutes” (2.5) • Refresh: use K, re-split it into new shares, use L to derive new indices, and re-put() the shares • “Naturally, refreshes require periodic Internet connectivity.” [1]
Results “two minor changes (<50 lines of code)”
Results “[With single share VDO model] … ~5% of VDOs continue to live long after 8 hours.” [1] “…the cause for the latter effect demands more investigation, we suspect that some of the single VDO keys are stored by DHT peers running non-default configurations.” [1] “These observations suggest that the naive (one share) approach [does not meet our goals] … thereby motivating our need for redundancy.” [1]
Contributions • The solution uses existing, popular, well-researched technology, in use since 2001 • Interesting idea: using a DHT as general-purpose temporary storage • No special security hardware or special operations required on the part of the user • Exploits the inherent “half-life” (churn) of nodes in the DHT, so data is eventually destroyed without any action by the user
Weaknesses • Requirements: - network connection (for put and get, but not for destruction) - complex analysis of DHT networks - “refresher” hardware for longer data lifetimes • Clearly not for all data! (network flooding) • The lifetime of data is neither mathematically determinate nor even guaranteed (it depends entirely on churn) • Assumes no hard copies of the data exist…
Improvement / Future Work • Instead of refresher hardware, DHT maintenance could refresh data automatically • Use many P2P networks in parallel, choosing the appropriate one based on its churn • Analyze the network with many data objects over very long timeout periods • Make sure the VDO is well encrypted, or someone could easily defeat the threshold
References • [1] Geambasu, R., et al. Vanish: Increasing Data Privacy with Self-Destructing Data. USENIX Security Symposium, 2009 • [2] Wiley, B. Distributed Hash Tables, Part I. Linux Journal, 2003. http://www.linuxjournal.com/article/6797 • [3] Hua, K. P2P Search. COP5711 Parallel & Distributed Databases, University of Central Florida, 2008. http://www.cs.ucf.edu/~kienhua/classes/ • [4] Chord (peer-to-peer). Wikipedia. http://en.wikipedia.org/wiki/Chord_%28peer-to-peer%29 • [5] Hash table. Wikipedia. http://en.wikipedia.org/wiki/Hash_table • [6] Shamir, A. How to Share a Secret. Communications of the ACM, 1979 • [7] Stutzbach, D., Rejaie, R. Characterizing Churn in P2P Networks. Technical Report, University of Oregon, 2005