802.11 User Fingerprinting
Jeff Pang, Ben Greenstein, Ramki Gummadi, Srini Seshan, and David Wetherall
(Most slides borrowed from Ben)
Location Privacy is at Risk
• Your MAC address: 00:0E:35:CE:1F:59
• [Figure: you and “the adversary” (a.k.a. some dude with a laptop), usually less than 100 m apart]
Are pseudonyms enough?
• MAC address now: 00:0E:35:CE:1F:59
• MAC address later: 00:AA:BB:CC:DD:EE
Implicit Identifiers Remain
• Consider one user at SIGCOMM 2004
• Visible in an “anonymized” trace: MAC addresses were scrubbed, so each is effectively a pseudonym
• Transferred 512 MB via BitTorrent, which meant crappy performance for everyone else
• Let’s call him Bob
• Can we figure out who Bob is?
Implicit Identifier: SSIDs
• SSIDs in probe requests: Windows XP and Mac OS X probe for your preferred networks by default
• Feature: the set of networks advertised in a traffic sample, determined by the user’s preferred networks list
• [Figure: Bob’s laptop sends an SSID probe for “roofnet”]
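As an illustration of how this feature could be pulled out of a traffic sample, here is a minimal sketch. The `Frame` record and its field names are hypothetical simplifications, not the API of any real capture library; a real tool would decode the SSID information element from raw probe-request frames.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


# Hypothetical, simplified view of a captured 802.11 frame (an assumption,
# not the API of any real capture library).
@dataclass
class Frame:
    subtype: str                # e.g. "probe_req", "data", "beacon"
    ssid: Optional[str] = None  # SSID element, if the frame carries one


def ssid_feature(sample: List[Frame]) -> FrozenSet[str]:
    """Set of SSIDs advertised in directed probe requests within one sample."""
    return frozenset(
        f.ssid
        for f in sample
        # Broadcast probes carry an empty SSID and reveal nothing, so skip them.
        if f.subtype == "probe_req" and f.ssid
    )


sample = [Frame("probe_req", "roofnet"), Frame("probe_req", ""), Frame("data")]
print(ssid_feature(sample))
```

Returning a `frozenset` keeps the feature hashable and ready for set-similarity comparisons against a training profile.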
What if Bob used pseudonyms?
• The “roofnet” probe occurred during a different session (under a different pseudonym) than the BitTorrent download
• We can no longer explicitly associate “roofnet” with poor network etiquette
• Can we do it implicitly?
Implicit Identifier: Network Destinations
• Feature: the set of IP <address, port> pairs in a traffic sample
• In the SIGCOMM trace, each destination pair was visited by only 1.15 users on average
• A user is likely to visit a site repeatedly (e.g., an email server)
• [Figure: Bob connects to an SSH/IMAP server at 159.16.40.45]
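A minimal sketch of the destination feature and of the “users per destination” statistic, assuming a hypothetical decoded `Packet` record and that the user’s own IP address is known for the sample. An average close to 1 (1.15 in the SIGCOMM trace) means most destinations are seen from a single user, which is what makes them identifying.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple

Dest = Tuple[str, int]  # (IP address, port)


# Hypothetical decoded IP packet; only the fields this sketch needs.
@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int


def netdest_feature(sample: List[Packet], user_ip: str) -> FrozenSet[Dest]:
    """Set of <address, port> destinations the user contacted in one sample."""
    return frozenset((p.dst_ip, p.dst_port) for p in sample if p.src_ip == user_ip)


def mean_users_per_destination(dests_by_user: Dict[str, Set[Dest]]) -> float:
    """Average number of distinct users that visited each destination."""
    visits = Counter(d for dests in dests_by_user.values() for d in set(dests))
    return sum(visits.values()) / len(visits) if visits else 0.0
```

For example, if Bob alone visits an SSH server and both Bob and another user visit one shared web server, the mean is (1 + 2) / 2 = 1.5.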
What if the network is encrypted?
• Link-layer encryption such as WPA hides IP addresses
• Is Bob safe now?
Implicit Identifier: Broadcast Packet Sizes
• Feature: the set of 802.11 broadcast packet sizes in a traffic sample
• E.g., Windows machines send NetBIOS naming advertisements; FileMaker and Microsoft Office advertise themselves
• In the SIGCOMM trace, there were only 16% more unique <application, size> tuples than unique sizes, so a size nearly identifies the application that sent it
• [Figure: Bob’s broadcast packet sizes: 239, 245, 257]
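Because frame sizes stay visible even under link-layer encryption, this feature only needs the destination address and the length of each frame. A minimal sketch, assuming a hypothetical `Frame` record; the second function shows how a “16% more tuples than sizes” style figure could be computed from labeled (application, size) pairs.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


# Hypothetical frame record: destination MAC and size in bytes.
@dataclass
class Frame:
    dst_mac: str
    length: int


def bcast_feature(sample: List[Frame]) -> FrozenSet[int]:
    """Set of broadcast frame sizes seen in one traffic sample."""
    return frozenset(f.length for f in sample if f.dst_mac.lower() == BROADCAST_MAC)


def extra_tuples_fraction(app_sizes: Iterable[Tuple[str, int]]) -> float:
    """Fraction by which unique (application, size) tuples outnumber unique sizes.

    A value of 0.16 corresponds to "16% more": sizes almost, but not quite,
    determine the application.
    """
    pairs = set(app_sizes)
    sizes = {size for _, size in pairs}
    return len(pairs) / len(sizes) - 1 if sizes else 0.0
```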
Implicit Identifier: MAC Protocol Fields
• Header bits (e.g., power management, order)
• Supported rates
• Offered authentication algorithms
• [Figure: Bob’s MAC protocol fields: 11, 4, 2, 1 Mbps, WEP, etc.]
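One way to make these heterogeneous fields fit the same set-based comparison as the other features is to encode each observed value as a string token. This is a sketch under that assumption; the `MacFields` record and the token format are hypothetical, not taken from the slides.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


# Hypothetical summary of MAC-layer fields decoded from one management frame.
@dataclass
class MacFields:
    power_mgmt: bool               # power-management header bit
    order: bool                    # order header bit
    rates_mbps: Tuple[float, ...]  # supported rates
    auth_algs: Tuple[str, ...]     # offered authentication algorithms


def fields_feature(sample: List[MacFields]) -> FrozenSet[str]:
    """Encode observed protocol-field values as string tokens forming a set."""
    feats = set()
    for f in sample:
        feats.add(f"power_mgmt={int(f.power_mgmt)}")
        feats.add(f"order={int(f.order)}")
        # Sort so the same rate list always yields the same token.
        feats.add("rates=" + ",".join(str(r) for r in sorted(f.rates_mbps)))
        feats.update(f"auth={a}" for a in f.auth_algs)
    return frozenset(feats)
```

These values mostly reflect the driver and hardware rather than the person, so on their own they are more useful for ruling identities out than for pinning one down.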
What else do implicit identifiers tell us?
• A pseudonym in the anonymized 802.11 traces from SIGCOMM 2004 probes for the SSID “djw”
• Searching WiGLE for “djw” in the Seattle area and then using Google pinpoints David J. Wetherall’s home (to within 200 ft)
Automating Implicit Identifiers
• TRAINING: collect some traffic known to be from Bob
• OBSERVATION: which traffic is from Bob?
Methodology
• Simulate “the adversary” using the SIGCOMM and UCSD traces
• Split each trace into training data and observation data
• Sample = 1 hour of traffic to/from a user
• Assume pseudonyms: samples cannot be linked to one another by MAC address
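The methodology can be sketched as two steps: cut the trace into one-hour, per-user samples, then split those samples into training and observation sets. The record layout and the split by a cutoff hour are assumptions for illustration; the slides do not say exactly how the split was made.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

SAMPLE_SECONDS = 3600  # one sample = 1 hour of a user's traffic

SampleKey = Tuple[str, int]  # (user, hour index)


def split_into_samples(
    frames: Iterable[Tuple[float, str, Any]]
) -> Dict[SampleKey, List[Any]]:
    """Group (timestamp_s, user, frame) records into per-user, per-hour samples.

    The user label is ground truth used only for scoring; under the pseudonym
    assumption the adversary cannot link one sample to another by MAC address.
    """
    samples: Dict[SampleKey, List[Any]] = defaultdict(list)
    for ts, user, frame in frames:
        samples[(user, int(ts // SAMPLE_SECONDS))].append(frame)
    return dict(samples)


def train_observe_split(
    samples: Dict[SampleKey, List[Any]], cutoff_hour: int
) -> Tuple[Dict[SampleKey, List[Any]], Dict[SampleKey, List[Any]]]:
    """Earlier hours become training data; later hours become observation data."""
    train = {k: v for k, v in samples.items() if k[1] < cutoff_hour}
    observe = {k: v for k, v in samples.items() if k[1] >= cutoff_hour}
    return train, observe
```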
Did this traffic sample come from Bob?
• Naïve Bayesian classifier: we say sample s (with features f_i) is from Bob if Pr[s from Bob | s has features f_i] > T, for a threshold T
• How do we convert implicit identifiers into features?
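Did this traffic sample come from Bob?
• Features: set similarity (Jaccard index) between the sample for validation and the profile from training, weighted by frequency so that rare elements count more than common ones
• [Figure: SSIDs ranked from rare to common: djw, linksys, IR_Guest, SIGCOMM_1]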
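The slide does not spell out the exact weighting, so this sketch assumes a smoothed inverse-document-frequency weight: an element seen in every reference sample weighs 0, and an element never seen before weighs the most. The weighted Jaccard index is then the total weight of the intersection divided by the total weight of the union.

```python
import math
from collections import Counter
from typing import Callable, Hashable, Iterable, Set


def make_rarity_weight(
    reference_sets: Iterable[Set[Hashable]],
) -> Callable[[Hashable], float]:
    """Weight each element by how rare it is across reference samples (assumed IDF form)."""
    sets = list(reference_sets)
    n = len(sets)
    doc_freq = Counter(e for s in sets for e in set(s))
    # Counter returns 0 for unseen elements, giving them the largest weight.
    return lambda e: math.log((n + 1) / (doc_freq[e] + 1))


def weighted_jaccard(
    sample: Iterable[Hashable],
    profile: Iterable[Hashable],
    weight: Callable[[Hashable], float],
) -> float:
    """Sum of weights in the intersection over sum of weights in the union."""
    s, p = set(sample), set(profile)
    denom = sum(weight(e) for e in s | p)
    if denom == 0:
        return 0.0
    return sum(weight(e) for e in s & p) / denom
```

With reference samples {SIGCOMM_1, linksys}, {SIGCOMM_1, IR_Guest}, and {SIGCOMM_1, djw}, the ubiquitous SIGCOMM_1 gets weight 0, so matching it earns nothing while matching the rare “djw” does.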
Individual Feature Accuracy
• 60% TPR with 99% FPR (TPR = true positive rate; FPR = false positive rate)
• Higher FPR, likely because the feature is not user-specific
• Still useful in combination with other features, to rule out identities
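TPR is the fraction of Bob’s samples that the classifier labels as Bob’s; FPR is the fraction of other users’ samples wrongly labeled as Bob’s. A small sketch of computing both at one threshold T, assuming classifier scores and ground-truth labels are already available; sweeping T traces out the trade-off between the two rates.

```python
from typing import Iterable, Tuple


def tpr_fpr(results: Iterable[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """results: (classifier score, sample truly from Bob?) pairs."""
    tp = fp = positives = negatives = 0
    for score, truly_bob in results:
        flagged = score > threshold  # classified as Bob's
        if truly_bob:
            positives += 1
            tp += int(flagged)
        else:
            negatives += 1
            fp += int(flagged)
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr
```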
Multi-feature Accuracy
• Samples from 1 in 4 users are identified more than 50% of the time with a 0.001 FPR
• [Figure: accuracy for feature combinations bcast + ssids; bcast + ssids + fields; bcast + ssids + fields + netdests]
Was Bob here today?
• Maybe…
• Suppose N users are present
• Over an 8-hour day there are 8N one-hour samples, i.e., 8N opportunities to misclassify a user’s traffic
• Instead, say Bob is present iff multiple samples are classified as his
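The reason for requiring multiple matching samples can be seen with a binomial calculation. This sketch assumes every sample is misclassified as Bob’s independently with the same per-sample FPR, which is a simplification; it computes the chance of at least k false positives among the 8N samples.

```python
from math import comb


def prob_at_least_k_false_positives(n_samples: int, fpr: float, k: int) -> float:
    """Pr[at least k of n_samples are misclassified as Bob's], assuming independence."""
    # 1 minus the binomial probability of seeing fewer than k false positives.
    below_k = sum(
        comb(n_samples, i) * fpr**i * (1 - fpr) ** (n_samples - i) for i in range(k)
    )
    return 1.0 - below_k


N = 25     # users present
n = 8 * N  # 8N one-hour samples over an 8-hour day
for k in (1, 2, 3):
    print(k, round(prob_at_least_k_false_positives(n, 0.001, k), 4))
```

With N = 25 and a 0.001 FPR, the expected number of false positives is 8 × 25 × 0.001 = 0.2. Declaring Bob present after a single match raises a false alarm roughly 18% of the time, while requiring two matches brings it below 2%.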
Was Bob here today?
• In a busy coffee shop with 25 concurrent users, more than half (54%) can be identified with 90% accuracy
• 27% can be identified with two 9s (99%) accuracy
• Median time to detect: 4 hours (4 samples)
Conclusion: Pseudonyms Are Insufficient
• 4 new implicit identifiers: netdests, ssids, fields, bcast
• The average user emits highly distinguishing identifiers
• The adversary can combine features
• Future work
  • Uncover more identifiers (timing, etc.)
  • Validate on longer, more diverse traces (SSIDs are stable in a home setting for at least 2 weeks)
  • Build a better link layer