Sequence-Aware Privacy Preserving Data-Leak Detection • Xiaokui Shu • 11/29/2011
Content • Applications of Privacy Preserving Data-Leak Detection (PDLD) • Challenges and our schema • Sequence-aware PDLD (SPDLD) • Implementation & evaluation
Application :: Outsourced Security Service • Service Provider: professional solution; value-added service provider (VASP); semi-honest (honest, but curious) • Customer: zero knowledge required; better business concentration • [Diagram labels: Internet, Customer’s network, Guest, Outsourced Security Service]
Application :: Introspective Security Service • Sensitive Data Owner: has knowledge of all sensitive data; distributes sensitive-data fingerprints to endpoints • DLD Endpoint: inside or outside the intranet; monitored to be data-leak-free • [Diagram labels: Intranet, Internet, Sensitive Data Owner, Endpoint, VPN Endpoint, Normal Endpoint]
Challenges • Accuracy: decrease both false positives and false negatives • Privacy: minimize the DLD executor’s knowledge of the sensitive data • Efficiency: real-time processing of traffic on PCs as well as at network gateways • Robustness: the ability to handle modified leaked data, or variants
SPDLD Schema • Robustness: extract local features to represent the sensitive data • Accuracy: take into account features of the sensitive data as well as the relationship among features • Privacy: hash/fingerprint values, samples • Efficiency: sample both sensitive data and network traffic to improve performance
SPDLD :: Whole View • [Diagram labels: Data Owner, DLD Executor, Sensitive Data, Network Traffic, Fingerprint Tape I, Fingerprint Tape II, Alignment]
SPDLD :: Basic Alignment w/o Sampling • [Figure: … Alignment Result …]
SPDLD :: Flow Sampling Requirement • No matter where we start, we should always have the same sample for an identical segment. • Flow ABCDEFGHIJKLMNOPQ, sample …FJM… • Flow CDEFGHIJKLMNOPQRS, sample …FJM…
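The requirement above can be illustrated with a small sketch. It assumes a winnowing-style rule (keep the leftmost minimum fingerprint of every sliding window), one-byte units, and CRC-32 as a stand-in fingerprint; none of these choices are confirmed by the slides, and the names `fingerprint` and `sample_positions` are hypothetical. Because each pick depends only on the contents of its window, any window that lies entirely inside a shared segment selects the same unit in both flows.

```python
import zlib


def fingerprint(unit: bytes) -> int:
    # Stand-in fingerprint (CRC-32); the talk uses Rabin fingerprints instead.
    return zlib.crc32(unit)


def sample_positions(flow: bytes, window: int) -> list:
    """Return positions whose fingerprint is the leftmost minimum of some window."""
    fps = [fingerprint(flow[i:i + 1]) for i in range(len(flow))]
    picked = set()
    for start in range(len(fps) - window + 1):
        win = fps[start:start + window]
        # The choice depends only on the window contents, not on where the flow starts.
        picked.add(start + win.index(min(win)))
    return sorted(picked)


a = b"ABCDEFGHIJKLMNOPQ"
b = b"CDEFGHIJKLMNOPQRS"
shared = b"CDEFGHIJKLMNOPQ"  # segment common to both flows

core = set(sample_positions(shared, 4))
in_a = {p - a.index(shared) for p in sample_positions(a, 4)}
in_b = {p - b.index(shared) for p in sample_positions(b, 4)}

# Every sample taken inside the shared segment shows up in both flows.
print(core <= in_a and core <= in_b)
```

Samples near the edges of a flow may still differ, since windows that straddle the boundary see different content.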
SPDLD :: Punching FingerprintTape • [Figure labels: FP Flow, sliding window, minimum fingerprint in the window, punched fingerprint in FingerprintTape, quasi-gap encoded in FingerprintTape, FPTape]
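A minimal sketch of how such a punching step might look, under stated assumptions: one minimum is kept per window (the evaluation lists "Number of minima: 5", which this sketch does not model), and a quasi-gap is encoded as the count of dropped fingerprints preceding each punched one. The tape layout and the names `punch_tape` and `punched_positions` are hypothetical, not taken from the talk.

```python
from collections import deque


def punch_tape(fps, window):
    """Keep the leftmost minimum of each sliding window.

    Returns a list of (gap, fingerprint) pairs, where gap is the number of
    dropped fingerprints between the previous punched one and this one.
    This encoding is an assumption made for illustration.
    """
    candidates = deque()  # indices whose values are non-decreasing front to back
    tape = []
    prev = -1  # index of the previously punched fingerprint
    for i, value in enumerate(fps):
        # Strict comparison keeps earlier equal values, so the front is the leftmost minimum.
        while candidates and fps[candidates[-1]] > value:
            candidates.pop()
        candidates.append(i)
        if candidates[0] <= i - window:
            candidates.popleft()  # front slid out of the window
        if i >= window - 1 and candidates[0] != prev:
            idx = candidates[0]
            tape.append((idx - prev - 1, fps[idx]))
            prev = idx
    return tape


def punched_positions(tape):
    """Decode the original positions of punched fingerprints from the gaps."""
    positions, pos = [], -1
    for gap, _ in tape:
        pos += gap + 1
        positions.append(pos)
    return positions


tape = punch_tape([5, 3, 8, 1, 9, 2, 7, 4], window=3)
print(tape)                     # [(1, 3), (1, 1), (1, 2)]
print(punched_positions(tape))  # [1, 3, 5]
```

Keeping the gap next to each punched fingerprint lets the aligner know how many units were skipped without seeing their values.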
SPDLD :: Advanced FingerprintTape • Quasi-gap encoding/decoding • Start flags bound for each FingerprintTape • Start position recorded
SPDLD :: Alignment • Needleman-Wunsch Algorithm • Dynamic programming • Gap penalty • Unit comparison function replaced to expand quasi-gaps • Implementation optimized for Python using a 1D array and multiple iterators
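For readers unfamiliar with the 1D-array idea, here is a minimal sketch of Needleman-Wunsch global alignment scoring that keeps a single row of the dynamic-programming table. It is not the author's implementation: it returns only the score, uses plain equality as the unit comparison, and does not expand quasi-gaps. The function name `nw_score` is hypothetical; the default scores are the match, mismatch and gap values listed among the system parameters.

```python
def nw_score(a, b, match=12, mismatch=-1, gap=-4):
    """Global alignment score of sequences a and b using one 1D row."""
    # row[j] holds the best score for aligning the processed prefix of a with b[:j].
    row = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        diag = row[0]      # value of cell (i-1, 0)
        row[0] = i * gap   # aligning a[:i] against an empty prefix
        for j in range(1, len(b) + 1):
            up = row[j]    # value of cell (i-1, j), about to be overwritten
            unit = match if a[i - 1] == b[j - 1] else mismatch
            row[j] = max(diag + unit,        # align a[i-1] with b[j-1]
                         up + gap,           # gap in b
                         row[j - 1] + gap)   # gap in a
            diag = up
    return row[-1]


print(nw_score([1, 2, 3], [1, 2, 3]))  # 36: three matches
print(nw_score([1, 2, 3], [1, 3]))     # 20: match, gap, match
```

A single row reduces memory from O(len(a) × len(b)) to O(len(b)); recovering the actual alignment, rather than just its score, needs extra bookkeeping that this sketch leaves out.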
Implementation & Evaluation • Implementation environment: Python 2.7 • Sensitive data: one paragraph from the source of the TCP/IP Wikipedia page • Leaked network traffic: whole source of the TCP/IP Wikipedia page • MediaWiki & WordPress
Implementation & Evaluation :: Parameters of the system • 3-byte shingles • 64-bit Rabin fingerprints • Window size: 100 • Number of minima: 5 • Unit scores in alignment: Match 12, Mismatch -1, Gap -4
Implementation & Evaluation :: Speed • My optimized Needleman–Wunsch implementation runs 2.5 times as fast as the naive (my previous) implementation • Comparison of set intersection, basic alignment, FingerprintTape
Background :: Shingling & Fingerprinting • [Figure labels: shingling, hashing]
Background :: Automata-based RE Matching • Evolution of pattern matching in NIDS • Boyer–Moore • Aho–Corasick: multi-pattern search • NFA / DFA automata: regular expression support • D2FA • CD2FA
Background :: List Alignment • Needleman-Wunsch • Dialign
SPDLD :: Shingling & Fingerprinting • Sensitive data: "SENSITIVE INFO" • Shingling yields shingles: "SENSITIV", "ENSITIVE", "NSITIVE ", "SITIVE I", "ITIVE IN", "TIVE INF", "IVE INFO" • Fingerprinting yields fingerprints: 658955, 452785, 123587, 754812, 458763, 885621, 645853
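The two steps on this slide can be sketched as follows. The shingle length is a parameter (the slide's example uses 8-character shingles, the evaluation uses 3-byte ones). The fingerprint here is a truncated BLAKE2b digest used purely as a 64-bit stand-in; it is not a Rabin fingerprint and will not reproduce the numbers shown on the slide. The names `shingles` and `fingerprint64` are hypothetical.

```python
import hashlib


def shingles(data: bytes, k: int) -> list:
    """Return every k-byte sliding window over data, in order."""
    return [data[i:i + k] for i in range(len(data) - k + 1)]


def fingerprint64(shingle: bytes) -> int:
    # Assumption: BLAKE2b truncated to 8 bytes stands in for a 64-bit Rabin fingerprint.
    digest = hashlib.blake2b(shingle, digest_size=8).digest()
    return int.from_bytes(digest, "big")


units = shingles(b"SENSITIVE INFO", 8)
print(units)  # 7 shingles, from b'SENSITIV' to b'IVE INFO'
fingerprint_list = [fingerprint64(s) for s in units]
print(len(fingerprint_list))  # 7
```

Keeping the fingerprints in a list rather than a set preserves their order, which is what the sequence-aware alignment relies on.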
Sequence-Aware PP-DLD • set • list • flow