
802.11 User Fingerprinting

Jeff Pang, Ben Greenstein, Ramki Gummadi, Srini Seshan, and David Wetherall. Most slides borrowed from Ben.


Presentation Transcript


  1. 802.11 User Fingerprinting • Jeff Pang, Ben Greenstein, Ramki Gummadi, Srini Seshan, and David Wetherall • Most slides borrowed from Ben

  2. Location Privacy is at Risk • Your MAC address: 00:0E:35:CE:1F:59 • Usually < 100 m • You • “The adversary” (a.k.a., some dude with a laptop)

  3. Are pseudonyms enough? • MAC address now: 00:0E:35:CE:1F:59 • MAC address later: 00:AA:BB:CC:DD:EE

  4. Implicit Identifiers Remain • Consider one user at SIGCOMM 2004 • Visible in an “anonymized” trace • MAC addresses scrubbed • Effectively a pseudonym • Transferred 512 MB via BitTorrent • => Crappy performance for everyone else • Let’s call him Bob • Can we figure out who Bob is?

  5. Implicit Identifier: SSIDs • SSIDs in Probe Requests • Windows XP, Mac OS X probe for your preferred networks by default • Set of networks advertised in a traffic sample • Determined by a user’s preferred networks list • Bob’s SSID probe: “roofnet”

  6. What if Bob used pseudonyms? • “roofnet” probe occurred during a different session than the BitTorrent download • Can no longer explicitly associate “roofnet” with poor network etiquette • Can we do it implicitly?

  7. Implicit Identifier: Network Destinations • Network Destinations • Set of IP <address, port> pairs in a traffic sample • In SIGCOMM, each visited by 1.15 users on average • A user is likely to visit a site repeatedly (e.g., an email server) • Bob’s SSH/IMAP server: 159.16.40.45

  8. What if network is encrypted? • Can’t see IP addresses through link-layer encryption like WPA • Is Bob safe now?

  9. Implicit Identifier: Broadcast Packet Sizes • Broadcast Packet Sizes • Set of 802.11 broadcast packet sizes in a traffic sample • E.g., Windows machines’ NetBIOS naming advertisements; FileMaker and Microsoft Office advertise themselves • In SIGCOMM, only 16% more unique <application, size> tuples than unique sizes • Bob’s broadcast packet sizes: 239, 245, 257

  10. Implicit Identifier: MAC Protocol Fields • MAC Protocol Fields • Header bits (e.g., power mgmt., order) • Supported rates • Offered authentication algorithms • Bob’s MAC protocol fields: 11, 4, 2, 1 Mbps; WEP; etc.

  11. What else do implicit identifiers tell us? • Anonymized 802.11 traces from SIGCOMM 2004 • A pseudonym • Search on Wigle for “djw” in the Seattle area • Google pinpoints David’s home (to within 200 ft) • David J. Wetherall

  12. Automating Implicit Identifiers • TRAINING: Collect some traffic known to be from Bob • OBSERVATION: Which traffic is from Bob?

  13. Methodology • Simulate using SIGCOMM and UCSD traces • Split trace into training data and observation data • Sample = 1 hour of traffic to/from a user • Assume pseudonyms
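A minimal sketch of this methodology, assuming a trace that has already been reduced to `(timestamp_seconds, user_id, feature)` tuples. That tuple layout and the names `hourly_samples` and `split_by_time` are hypothetical, chosen only for illustration; the slides do not describe the actual tooling. The `user_id` plays the role of ground truth: under pseudonyms the adversary would see it in the training data but not in the observation data.

```python
from collections import defaultdict

def hourly_samples(packets):
    """Group features into one sample per (user, hour).

    packets: iterable of (timestamp_seconds, user_id, feature) tuples.
    This layout is an assumption, not the format of the real traces.
    """
    samples = defaultdict(set)
    for ts, user, feature in packets:
        hour = int(ts // 3600)  # sample = 1 hour of traffic to/from a user
        samples[(user, hour)].add(feature)
    return dict(samples)

def split_by_time(samples, cutoff_hour):
    """Samples before cutoff_hour become training data, the rest observation data."""
    training = {k: v for k, v in samples.items() if k[1] < cutoff_hour}
    observation = {k: v for k, v in samples.items() if k[1] >= cutoff_hour}
    return training, observation

packets = [
    (10, "bob", "roofnet"),
    (20, "bob", "linksys"),
    (3700, "bob", "roofnet"),
    (7300, "alice", "SIGCOMM_1"),
]
samples = hourly_samples(packets)
training, observation = split_by_time(samples, cutoff_hour=1)
```

Splitting by time rather than at random keeps every training sample earlier than every observation sample, which matches an adversary who profiles Bob first and looks for him later.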

  14. Did this traffic sample come from Bob? • Naïve Bayesian classifier: we say sample s (with features f_i) is from Bob if Pr[s from Bob | s has features f_i] > T • How to convert implicit identifiers into features?
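The decision rule on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the per-feature likelihoods, the prior, and the function names are all hypothetical, and the features are treated as independent, which is what "naïve" means here.

```python
from math import prod

def posterior_from_bob(likelihood_bob, likelihood_other, prior_bob):
    """Pr[s from Bob | s has features f_i] under the naive independence assumption.

    likelihood_bob[i]   = Pr[f_i | sample is from Bob]       (hypothetical inputs)
    likelihood_other[i] = Pr[f_i | sample is from someone else]
    """
    p_bob = prior_bob * prod(likelihood_bob)
    p_other = (1.0 - prior_bob) * prod(likelihood_other)
    return p_bob / (p_bob + p_other)

def is_from_bob(likelihood_bob, likelihood_other, prior_bob, threshold):
    """Classify the sample as Bob's if the posterior exceeds the threshold T."""
    return posterior_from_bob(likelihood_bob, likelihood_other, prior_bob) > threshold

# Two features that are each much more likely under Bob than under other users.
p = posterior_from_bob([0.8, 0.5], [0.1, 0.2], prior_bob=0.5)
```

Raising the threshold T trades true positives for fewer false positives. A small prior (Bob is one of many users) also pulls the posterior down, so the same evidence that passes at a prior of 0.5 can fail at a prior of 0.01.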

  15. Did This Traffic Sample Come from Bob? • Features: set similarity (Jaccard index), weighted by frequency • [Figure: a sample for validation compared with a profile from training; example SSIDs on a rare-to-common scale: djw, linksys, IR_Guest, SIGCOMM_1]
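One way to read "Jaccard index, weighted by frequency" is sketched below. The slide does not give the weighting function, so the inverse-frequency weight `rarity` and the values in `ssid_freq` are assumptions made for illustration; the intent is only that a shared rare SSID such as "djw" counts for more than a shared common one such as "SIGCOMM_1".

```python
def weighted_jaccard(sample, profile, weight):
    """Weighted Jaccard index: weight of the intersection over weight of the union."""
    sample, profile = set(sample), set(profile)
    union = sample | profile
    if not union:
        return 0.0  # two empty sets: define similarity as 0
    shared = sample & profile
    return sum(weight(e) for e in shared) / sum(weight(e) for e in union)

# Hypothetical fraction of users that probe for each SSID (not from the trace).
ssid_freq = {"djw": 0.01, "linksys": 0.30, "IR_Guest": 0.05, "SIGCOMM_1": 0.90}

def rarity(ssid):
    """Assumed weight: rarer SSIDs get larger weights; unknown SSIDs count as rare."""
    return 1.0 / ssid_freq.get(ssid, 0.01)

share_rare = weighted_jaccard({"djw", "SIGCOMM_1"}, {"djw", "linksys"}, rarity)
share_common = weighted_jaccard({"djw", "SIGCOMM_1"}, {"SIGCOMM_1", "IR_Guest"}, rarity)
unweighted = weighted_jaccard({"a", "b"}, {"b", "c"}, lambda e: 1.0)
```

With these made-up frequencies, both comparisons share exactly one of three SSIDs, so the plain Jaccard index is 1/3 in each case. The weighted score is high when the shared SSID is the rare one and near zero when it is the common one.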

  16. Individual Feature Accuracy • 60% TPR at 1% FPR (99% true negative rate) • Higher FPR, likely due to not being user specific • Useful in combination with other features, to rule out identities

  17. Multi-feature Accuracy • Samples from 1 in 4 users are identified >50% of the time with 0.001 FPR • Feature combinations compared: bcast + ssids; bcast + ssids + fields; bcast + ssids + fields + netdests

  18. Was Bob here today? • Maybe… • Suppose N users present • Over an 8 hour day, 8*N opportunities to misclassify a user’s traffic • Instead, say Bob is present iff multiple samples are classified as his
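The arithmetic behind this slide can be sketched as follows. The per-sample false positive rate of 0.01 is a hypothetical value picked for illustration, and the binomial calculation assumes misclassifications are independent across samples, which real traffic may not satisfy.

```python
from math import comb

def expected_false_matches(n_users, hours, fpr):
    """Expected number of other users' hourly samples misclassified as Bob's."""
    return n_users * hours * fpr

def prob_at_least_k(n_samples, k, p):
    """Binomial tail: probability that at least k of n_samples are classified
    as Bob's when each is independently classified so with probability p."""
    return sum(
        comb(n_samples, i) * p**i * (1 - p) ** (n_samples - i)
        for i in range(k, n_samples + 1)
    )

fpr = 0.01  # assumed per-sample false positive rate, for illustration only
per_day = expected_false_matches(n_users=25, hours=8, fpr=fpr)

# Chance that one particular non-Bob user is mistaken for Bob over 8 hourly samples:
one_hit = prob_at_least_k(8, 1, fpr)   # a single matching sample is enough
two_hits = prob_at_least_k(8, 2, fpr)  # at least two matching samples required
```

Under these assumptions, requiring two matching samples instead of one cuts the per-user false alarm probability by more than an order of magnitude, at the cost of waiting longer before declaring that Bob is present.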

  19. Was Bob here today? • In a busy coffee shop with 25 concurrent users, more than half (54%) can be identified with 90% accuracy • 4 hour median to detect (4 samples) • 27% can be identified with 99% accuracy

  20. Conclusion: Pseudonyms Are Insufficient • 4 new identifiers: netdests, ssids, fields, bcast • Average user emits highly distinguishing identifiers • Adversary can combine features • Future • Uncover more identifiers (timing, etc.) • Validate on longer/more diverse traces (SSIDs stable in home setting for >= 2 weeks) • Build a better link layer
