(Un)Trustworthy Wireless: What your wireless traffic says about you… Jeff Pang with Ben Greenstein, Ramki Gummadi, Tadayoshi Kohno, David Wetherall (UW/Intel Seattle), and Srini
What are we trying to achieve? • Time to rethink privacy implications of wireless networks • Identify the shortcomings of current designs and how an adversary might exploit them • Propose some directions for thwarting these attacks • Initial focus on Wi-Fi, but aim is to address other protocols as well, e.g., Bluetooth, RFID, GSM
What is wireless privacy? • Traditional notions: • data encryption • user authentication • Anonymity is also important • Traditional notion not quite right: e2e privacy only • Data encryption doesn’t preserve anonymity • 3rd party can still track where a user goes, with whom he might be communicating, what sorts of data he might be exchanging, and what sorts of applications he might be running • traditionally known as traffic analysis, but much easier to do with ubiquitous wireless
What information is being leaked? • The link between a wireless card and its associated AP • Where a user has been • Thug tracks a user from the bank’s network to the dark alley’s network • Who has been in an area • Jealous boyfriend monitors girlfriend’s apartment network • Timestamps of user transmissions • When are people talking and how much are they saying (chatter) • Who is talking to whom? (assumes monitors at both edges) • A dermatologist shares records with an oncologist near patient X, ergo X may have melanoma
An initial problem statement • Adversary: • Can passively sniff all 802.11 traffic at various locations (e.g., café, library, your home, conference) • Goal: • Wants to know where you were and when you were there • Question: • Given a traffic sample, how accurately can an adversary classify it as belonging to you or not? • Assumptions: • Adversary has some traffic samples “known” to come from you (e.g., collected while sitting next to you) • Adversary has collected a library of traffic samples from other (random) users in the targeted locations
The obvious answer • Yes! • Trivially, by looking at MAC addresses • globally unique • always transmitted in the clear • But that is also trivially thwarted • Can change MAC address each time you associate to an AP • Suppose the next wireless driver patch does this • Knowledgeable users can do this themselves, of course • But is this a sufficient fix to advertise “improved privacy”? • Revised question: • How accurately can the adversary classify a traffic sample if MAC addresses change, say, each hour?
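The "change MAC address each time you associate" fix can be sketched as follows. This is a minimal illustration, not any particular driver's behavior: the function name `random_mac` is hypothetical, and actually applying the address to an interface is OS-specific and left out. The only fixed rule used here is that the first octet has the locally administered bit set and the multicast bit cleared.

```python
import secrets

def random_mac():
    """Return a random unicast, locally administered MAC address (hypothetical helper)."""
    octets = bytearray(secrets.token_bytes(6))
    # Set bit 0x02 (locally administered) and clear bit 0x01 (multicast)
    # so the address is a valid unicast address that is not vendor-assigned.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)

if __name__ == "__main__":
    # A fresh address would be drawn before each association.
    print(random_mac())
```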
Initial approach • Fairly generic machine learning algorithm: • Compute a “profile” based on known traffic from target user • Based on profile, generate features for each traffic sample • Use known traffic samples to train a naïve Bayes model (e.g., generate a probability table for each feature) • Given a new sample, model outputs a probability p that sample came from target user • Assume positive match if p > T, for some T • Two types of profile features: • 802.11 specific (ctrl pkt contents, driver timing behavior, etc.) • Ben Greenstein working on this • 802.11 agnostic (IP/application traffic features) • I’m working on this
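The pipeline above (per-feature probability tables, an output probability p, a match if p > T) can be sketched as below. This is an assumption-laden illustration, not the authors' implementation: the class name `NaiveBayes`, the helper `bucket`, the choice of 5 buckets for feature values in [0, 1], and the Laplace smoothing are all made up for the example.

```python
import math
from collections import Counter

def bucket(x, bins=5):
    """Discretize a feature value in [0, 1] into one of `bins` buckets (assumed scheme)."""
    return min(int(x * bins), bins - 1)

class NaiveBayes:
    """Two-class naive Bayes with one probability table per feature."""

    def __init__(self, n_features, bins=5):
        self.bins = bins
        # counts[class][feature index][bucket] -> number of training samples
        self.counts = {c: [Counter() for _ in range(n_features)] for c in (True, False)}
        self.totals = {True: 0, False: 0}

    def train(self, features, is_target):
        self.totals[is_target] += 1
        for i, x in enumerate(features):
            self.counts[is_target][i][bucket(x, self.bins)] += 1

    def prob_target(self, features):
        """Return p, the estimated probability that the sample came from the target user."""
        n = self.totals[True] + self.totals[False]
        log_score = {}
        for c in (True, False):
            # Class prior and per-feature likelihoods, both Laplace-smoothed.
            lp = math.log((self.totals[c] + 1) / (n + 2))
            for i, x in enumerate(features):
                cnt = self.counts[c][i][bucket(x, self.bins)]
                lp += math.log((cnt + 1) / (self.totals[c] + self.bins))
            log_score[c] = lp
        m = max(log_score.values())
        e_true = math.exp(log_score[True] - m)
        e_false = math.exp(log_score[False] - m)
        return e_true / (e_true + e_false)

def is_match(model, features, T=0.5):
    """Positive match if p > T; T = 0.5 is only a placeholder default."""
    return model.prob_target(features) > T
```

Each training sample is a list of feature values (for example, one similarity score per profile feature) plus a flag saying whether it is known to come from the target user.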
Initial features • Conjecture: the sites you visit identify you • e.g., only you visit slashdot, cnn, joe’s blog, etc. • Profile P: • Set of IP destinations we observe you talking to • Feature: • Set similarity of the IPs seen in the traffic sample S and your profile; i.e., • intersection(P, S)/union(P, S) • Higher scores mean the traffic sample visited more of the same sites
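The intersection-over-union score above is the Jaccard similarity of two sets. A minimal sketch, assuming destinations are represented as strings; the function name `jaccard` and the addresses (taken from the 192.0.2.0/24 documentation range) are illustrative only:

```python
def jaccard(profile, sample):
    """Set similarity |P ∩ S| / |P ∪ S|; defined here as 0.0 when both sets are empty."""
    profile, sample = set(profile), set(sample)
    union = profile | sample
    if not union:
        return 0.0
    return len(profile & sample) / len(union)

if __name__ == "__main__":
    P = {"192.0.2.1", "192.0.2.2", "192.0.2.3"}   # destinations in the user's profile
    S = {"192.0.2.2", "192.0.2.3", "192.0.2.4"}   # destinations seen in a traffic sample
    print(jaccard(P, S))  # two shared out of four distinct destinations
```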
Initial features (2) • Problem: User can mask IP packet contents • AP can use WEP/WPA • User might tunnel traffic through a VPN • Attempt to use other exposed features • Object sizes: previous work shows that the object sizes of a website identify it accurately • use packet timings to group packets into “objects” • feature: set similarity based on the set of object sizes users accessed • challenges: overlapping flows, dynamic web content • Other possibilities: infer site RTT, site bandwidth, etc. • Question: how well can we do?
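One simple way to use packet timings to group packets into "objects" is to start a new object whenever the link goes idle for longer than some gap. The sketch below assumes exactly that; the function name `object_sizes` and the 0.2 s gap are hypothetical choices, not values from the slides, and it makes no attempt to handle the overlapping-flows challenge noted above.

```python
def object_sizes(packets, gap=0.2):
    """Group (timestamp_seconds, size_bytes) packets into objects split at idle gaps.

    Returns the total size of each inferred object, in time order.
    """
    sizes = []
    current = 0
    last_t = None
    for t, size in sorted(packets):
        # An idle period longer than `gap` closes the current object.
        if last_t is not None and t - last_t > gap:
            sizes.append(current)
            current = 0
        current += size
        last_t = t
    if last_t is not None:
        sizes.append(current)
    return sizes
```

The resulting sizes can be turned into a set and compared against a profile with the same set-similarity measure used for IP destinations.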
Initial results • Setup • SIGCOMM ’04 Wireless traces • Wireless traffic from ~200 users across 3 days at the conference • Limitations: homogeneous location, biased user population, limited timeframe • Looking for volunteers to collect better data! • Build profiles and train model using traffic on the first day • For each hourly traffic sample in the 2nd and 3rd day: • For each user: • Can we determine if a sample comes from that user or not? • Metrics: • True positive rate • the fraction of samples from that user that are correctly classified • False positive rate • the fraction of samples not from that user that are misclassified • Tune the classification threshold T to trade off one for the other
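The two metrics and the effect of the threshold T can be illustrated with a short sketch. The function name `rates` and the example scores are made up for illustration and are not results from the traces.

```python
def rates(scores, labels, T):
    """Return (true positive rate, false positive rate) when p > T counts as a match.

    scores: model probabilities p, one per sample
    labels: True if the sample really came from the target user
    """
    tp = sum(1 for s, y in zip(scores, labels) if y and s > T)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s > T)
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    tpr = tp / pos if pos else 0.0
    fpr = fp / neg if neg else 0.0
    return tpr, fpr

if __name__ == "__main__":
    # Made-up scores: three samples from the target user, three from others.
    scores = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]
    labels = [True, True, True, False, False, False]
    for T in (0.3, 0.5, 0.8):
        print(T, rates(scores, labels, T))
```

Raising T lowers the false positive rate at the cost of missing more of the target user's samples, which is the trade-off the slide refers to.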
Some profiles better than others • [Figure, two panels: IP Destinations; Object Sizes]
Summary + Near Future Work • Using sites visited is one promising feature to identify users • Current inference of object sizes is insufficient as a stand-in when IP traffic is encrypted • But for some users, it does give positive information gain • Next steps: • Combine with other application traffic features like inferred RTT • Combine with 802.11 specific features, e.g., SSID broadcasts: • 43% of sources had at least one unique SSID