A Framework for the Analysis of Mix-Based Steganographic File Systems
Claudia Diaz, Carmela Troncoso, Bart Preneel
K.U.Leuven / COSIC
Cambridge, January 28, 2009
Motivation
• Problem: we want to keep stored information secure (confidential)
• Encryption protects against the unwanted disclosure of information
  • but… it reveals the fact that hidden information exists!
  • User can be threatened / tortured / coerced to disclose the decryption keys (“coercion attack”)
• We need to hide the existence of files
• Property: plausible deniability
  • Allow users to deny believably that any further encrypted data is located on the storage device
  • If the password is not known, it is not possible to determine the existence of hidden files
Attacker model: one snapshot
• Attacker has never inspected the user’s computer before coercion
• Ability to coerce the user at any point in time
• User produces some keys
• Attacker inspects the user’s computer
• Game: if the attacker is able to determine that the user has not provided all her keys, the attacker wins
Anderson, Needham & Shamir (1998): first construction
• Use cover files such that a linear combination (XOR) of them reveals the information
• Password: subset of files to combine
• Hierarchy (various levels of security)
  • User can show some “low” security levels while hiding “high” security levels
  • Not possible to know whether she has revealed the keys to all existing levels
• Drawbacks:
  • File read operations have a high cost
  • Needs a lot of cover files to be secure (so that trying all combinations is computationally infeasible)
  • Assumes the adversary knows nothing about the plaintext
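The XOR idea on this slide can be illustrated with a minimal sketch. It hides a single secret in a subset of equally sized cover files; the function names, the cover-file size, and the choice to rewrite only the first file of the subset are assumptions made for illustration. The construction described on the slide supports several files and security levels, which this sketch does not attempt.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def hide(cover: list[bytes], subset: list[int], secret: bytes) -> None:
    """Rewrite one cover file so that the XOR of the subset equals `secret`.

    `subset` plays the role of the password (assumption: a list of indices).
    """
    others = [cover[i] for i in subset[1:]]
    # cover[subset[0]] = secret XOR (all other cover files in the subset)
    cover[subset[0]] = reduce(xor_bytes, others, secret)


def reveal(cover: list[bytes], subset: list[int]) -> bytes:
    """XOR the cover files named by the password to recover the secret."""
    return reduce(xor_bytes, (cover[i] for i in subset))


# Hypothetical usage: 8 random cover files of 16 bytes each.
cover_files = [secrets.token_bytes(16) for _ in range(8)]
password_subset = [1, 4, 6]
hide(cover_files, password_subset, b"sixteen byte msg")
recovered = reveal(cover_files, password_subset)
```

Reading the file requires touching every cover file in the subset, which is the high read cost listed under the drawbacks.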
Anderson, Needham & Shamir (1998): second construction
• Real files hidden in encrypted form in pseudo-random locations amongst random data
• Location derived from the name of the file and a password
• Collisions (birthday paradox) overwrite data:
  • Use only a small part of the storage capacity ( < )
  • Replication
    • All copies of a block need to be overwritten to lose the data
• Linear hierarchy: higher security levels need more replication
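A small sketch of how locations could be derived from a file name and a password, and of why collisions limit how full the store can be. The use of SHA-256, the `password|name|replica` input format, and both function names are assumptions for illustration, not details taken from the slide.

```python
import hashlib


def block_locations(name: str, password: str, num_blocks: int, replicas: int) -> list[int]:
    """Derive one pseudo-random block location per replica (hypothetical scheme)."""
    locations = []
    for r in range(replicas):
        digest = hashlib.sha256(f"{password}|{name}|{r}".encode()).digest()
        locations.append(int.from_bytes(digest[:8], "big") % num_blocks)
    return locations


def collision_probability(files: int, num_blocks: int) -> float:
    """Birthday bound: chance that at least two single-block files share a location."""
    p_no_collision = 1.0
    for i in range(files):
        p_no_collision *= (num_blocks - i) / num_blocks
    return 1.0 - p_no_collision


locs = block_locations("diary.txt", "correct horse", num_blocks=1_000_000, replicas=4)
p = collision_probability(files=1_000, num_blocks=1_000_000)
```

Even with only 1,000 blocks written into a million-block store, the collision probability is already substantial, which is why the scheme relies on low utilisation and replication.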
StegFS: McDonald & Kuhn (1999)
• Implemented as an extension of the Linux file system (Ext2fs)
• Hidden files are placed into unused blocks of a “normal” partition
• Normal files are overwritten with random data when deleted
  • Attacker cannot distinguish a deleted normal file from an encrypted hidden file
• Block allocation table with one entry per block on the partition:
  • Used blocks: entry encrypted with the same key as the data block
  • Unused blocks: random data
• The table helps in locating data and detecting corrupted blocks (lower security levels can still overwrite higher ones)
Attacker model: continuous observation
• What if the attacker can observe accesses to the store?
  • Remote or shared semi-trusted store
  • Distributed P2P system
• Same game as before:
  • Ability to coerce the user at any point in time
  • User produces keys to some security levels
  • Attacker inspects the user’s computer
  • If the attacker is able to determine that the user has not provided all her keys, the attacker wins
• BUT now the adversary has prior information (which blocks have been accessed/modified)
• Previous systems do not provide plausible deniability against this adversary model
Previous work where this adversary is relevant: P2P
• Distributed (P2P) steganographic file systems:
  • Mnemosyne: Hand and Roscoe (2002)
  • Mojitos: Giefer and Letchner (2002)
• Both propose dummy traffic to hide access patterns (no details provided)
Previous work where this adversary is relevant: Semi-trusted remote store
• Semi-trusted remote store: Zhou et al. (2004)
• Use of constant rate cover traffic (dummy accesses) to disguise file accesses
• Every time a block location is accessed, it is overwritten with different data (re-encrypted with a different IV)
  • Block updates no longer indicate file modifications
• Every time a file block is accessed, it is moved to another (empty) location
  • Protects against simple access frequency analysis
  • Relocations are low-entropy
• Broken by Troncoso et al. (2007) with traffic analysis attacks that find correlations between sets of accesses
  • Multi-block files are found prior to coercion if they are accessed twice
  • One-block files are found if accessed a few times
How it is broken (simplified version)
[Figure: block locations at time t1 and at time t2. Number pairs in extracted order: 1, 10; 2, 20; 3, 30; 4, 40; …; 10, 100; 20, 200; 30, 300; 40, 400]
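A rough sketch of the correlation idea behind this kind of attack. It assumes an observation model that is not spelled out on the slides: the adversary logs each access as a pair (location read, location written to) and groups accesses into time windows. The function name, the window structure, and the `min_blocks` threshold are all hypothetical, and real dummy traffic would add noise that this toy version ignores.

```python
def linked_windows(
    windows: list[list[tuple[int, int]]], min_blocks: int = 2
) -> list[tuple[int, int, list[int]]]:
    """Find pairs of windows where blocks written earlier are read again later.

    Each window is a list of (read_location, write_location) pairs.
    Returns (earlier_index, later_index, shared_locations) for every pair
    of windows that share at least `min_blocks` locations.
    """
    hits = []
    for i, earlier in enumerate(windows):
        destinations = {dst for _, dst in earlier}
        for j in range(i + 1, len(windows)):
            sources = {src for src, _ in windows[j]}
            common = destinations & sources
            if len(common) >= min_blocks:
                hits.append((i, j, sorted(common)))
    return hits


# Hypothetical log: a four-block file accessed in window 0 and again in window 2.
observed = [
    [(1, 10), (2, 20), (3, 30), (4, 40)],
    [(7, 55)],
    [(10, 100), (20, 200), (30, 300), (40, 400)],
]
links = linked_windows(observed)
```

The more blocks a file has, the larger the shared set, which is consistent with multi-block files being found after only two accesses.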
Can we provide plausible deniability against an adversary who monitors the store prior to coercion?
System model
• Files are stored on fixed-size blocks
• Blocks containing (encrypted) file data are indistinguishable from empty blocks containing random data
• Several levels of security (we assume hierarchical)
  • User discloses keys to some of these levels while keeping others hidden
• Data persistence: erasure codes for redundancy (impact on plausible deniability)
• Traffic analysis resistance
  • Constant rate dummy traffic
  • High entropy block relocation
[Diagram labels: Process user file requests; Generate dummy traffic (uniform)]
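A minimal sketch of constant-rate access scheduling as described in the system model: every time slot produces exactly one store access, which serves a pending file request if there is one and otherwise hits a uniformly random location. The function names, the queue of pending locations, and the "file"/"dummy" labels are assumptions for illustration; an outside observer would see only the locations, not the labels.

```python
import random
from collections import deque


def next_access(pending: deque, num_blocks: int, rng: random.Random) -> tuple[int, str]:
    """Choose the location for the next access slot."""
    if pending:
        # A user file request is waiting: serve one of its blocks.
        return pending.popleft(), "file"
    # Otherwise generate dummy traffic, uniform over the store.
    return rng.randrange(num_blocks), "dummy"


def run_slots(pending: deque, num_blocks: int, slots: int, seed: int = 0) -> list[tuple[int, str]]:
    """Produce one access per slot, so the rate is constant regardless of user activity."""
    rng = random.Random(seed)
    return [next_access(pending, num_blocks, rng) for _ in range(slots)]


# Hypothetical usage: two file blocks are requested, then the user goes idle.
trace = run_slots(deque([5, 9]), num_blocks=100, slots=6)
```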
User Login
• User logs in with security level s, by providing key uks
• Agent trial-decrypts every entry in the table
  • Files in security levels s or lower can be found in the table
  • Files in higher security levels are indistinguishable from random (empty)
• Agent starts making block accesses (either dummy or to retrieve files requested by the user)
• For each block, the agent performs an access cycle
[Figure label: Table]
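The trial-decryption step can be sketched as follows. This is a toy model only: the hash-based keystream stands in for a real cipher, and `MAGIC`, `seal`, `trial_decrypt`, `visible_entries`, and the entry layout are all hypothetical names and choices that do not come from the slides. It also assumes the agent holds one key per security level up to s.

```python
import hashlib
import secrets

MAGIC = b"SFS1"  # hypothetical marker that identifies a correctly decrypted entry


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode (illustration only, not a vetted cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def seal(key: bytes, payload: bytes) -> bytes:
    """Encrypt MAGIC + payload under `key` with a fresh nonce."""
    nonce = secrets.token_bytes(8)
    plain = MAGIC + payload
    stream = _keystream(key, nonce, len(plain))
    return nonce + bytes(a ^ b for a, b in zip(plain, stream))


def trial_decrypt(key: bytes, entry: bytes):
    """Return the payload if `key` opens the entry, else None."""
    nonce, body = entry[:8], entry[8:]
    stream = _keystream(key, nonce, len(body))
    plain = bytes(a ^ b for a, b in zip(body, stream))
    return plain[len(MAGIC):] if plain.startswith(MAGIC) else None


def visible_entries(keys: list[bytes], table: list[bytes]) -> dict[int, bytes]:
    """Trial-decrypt every table entry with every available key."""
    found = {}
    for index, entry in enumerate(table):
        for key in keys:
            payload = trial_decrypt(key, entry)
            if payload is not None:
                found[index] = payload
                break
    return found


low_key, high_key = b"L" * 16, b"H" * 16
table = [seal(low_key, b"lowfile1"), seal(high_key, b"highfil1"), secrets.token_bytes(20)]
seen_low = visible_entries([low_key], table)
seen_all = visible_entries([low_key, high_key], table)
```

With only the low-level key, the high-level entry and the random entry both fail trial decryption and look the same to the agent, which is the property the slide relies on.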
Block encryption
• Block containing a file in security level s
  • User key: uks
  • (One time) block key: bki
• Empty block, or block containing a file in a security level higher than s
[Figure labels: data; random]
Access cycle
[Figure label: Table]
Attack methodology
• Attacker profiles the system to extract:
  • Typical access sequences when the user is idle (dummy traffic)
  • Typical access sequences when the user is accessing a file
• Attacker monitors accesses and looks for sequences that look like file accesses
• Attacker coerces the user when a sequence indicates a possible file access (worst case scenario)
• Attacker obtains some user keys and inspects the computer
• Attacker combines the evidence obtained before and after coercion to try to determine if there are more user keys the user has not provided
• If the probability of undisclosed keys is high, deniability is low, and vice versa
Extracting information from the sequence of accesses to the store I
• Attacker profiles the system to extract typical access sequences when the user is accessing a file
[Figure: sequences of block accesses, labelled MixSFS]
Extracting information from the sequence of accesses to the store II
• Attacker profiles the system to extract:
  • Typical access sequences when the user is idle (dummy traffic)
• Establish a baseline for dummy traffic
• Analyze accesses to the store and find strong correlations (unlikely to be generated by dummy traffic)
• For big files, the area that goes over the baseline is much bigger than for dummy traffic (i.e., distinguishable)
Security metrics: unobservability
• Prior to coercion:
  • We define unobservability (U) as the probability of a file operation being undetectable by the adversary; i.e., the sequence of store accesses generated by a file operation is considered by the adversary as dummy traffic
Security metrics: deniability
• After coercion:
  • Percentage of empty blocks in the pool compared to the percentage in the whole store
  • Worst case scenario: coercion occurs immediately after a hidden file access, leaving a large number of “empty” blocks in the pool
• We define deniability (D) as the probability that the evidence collected by the adversary (before and after coercion) has been generated by dummy traffic (i.e., no evidence of hidden files)
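One way to make the pool-versus-store comparison concrete is a simple tail probability. The sketch below is a simplified proxy, not the metric D defined on the slide: it assumes that under pure dummy traffic the pool behaves like a uniform sample without replacement from the store, so the number of apparently empty blocks in the pool is hypergeometric. The function and parameter names are hypothetical.

```python
from math import comb


def tail_probability(store_blocks: int, store_empty: int, pool_size: int, pool_empty: int) -> float:
    """P[X >= pool_empty] for X ~ Hypergeometric(store_blocks, store_empty, pool_size).

    A small value means the pool holds more apparently empty blocks than
    uniform dummy traffic would plausibly produce.
    """
    total = comb(store_blocks, pool_size)
    upper = min(store_empty, pool_size)
    favourable = sum(
        comb(store_empty, j) * comb(store_blocks - store_empty, pool_size - j)
        for j in range(pool_empty, upper + 1)
    )
    return favourable / total


# Hypothetical numbers: half of a 1000-block store looks empty after coercion.
typical = tail_probability(1000, 500, pool_size=20, pool_empty=10)
suspicious = tail_probability(1000, 500, pool_size=20, pool_empty=18)
```

In the worst case named on the slide, a hidden file was just accessed, so the pool is unusually rich in blocks the disclosed keys cannot open and the tail probability drops sharply.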
Conclusions and open questions
• Conclusions
  • Hard to protect against traffic analysis, even using constant rate dummy traffic
  • Hard to conceal file accesses with dummy traffic that selects locations uniformly at random
  • When files occupy more blocks, access to them is harder to conceal
• Open questions
  • More sophisticated pattern recognition algorithms may extract more information from the sequence of accesses
  • Design of smarter traffic analysis strategies
  • Can such a system be implemented in practice?