Physical Layer Attacks on Unlinkability in Wireless LANs Kevin Bauer*, Damon McCoy*, Ben Greenstein+, Dirk Grunwald*, Douglas Sicker* (* University of Colorado, + Intel Research Seattle)
Our Wireless World [Figure: a tcpdump capture in which each link-layer header is followed by private data: PrivatePhoto1.jpg, Home location=(47.28,…, Buddy list: Alice, Bob, …, PrivateVideo1.avi, Blood pressure: high] Our wireless devices reveal lots of information about us
Best Security Practices for 802.11 • Bootstrap: out-of-band (e.g., password, Wi-Fi Protected Setup); SSID: Bob’s Network, Key: 0x2384949…; Username: Alice, Key: 0x348190… • Discover: 802.11 probe (“Is Bob’s Network here?”), 802.11 beacon (“Bob’s Network is here”) • Authenticate and Bind: 802.11 auth (“Proof that I’m Alice”), 802.11 auth (“Proof that I’m Bob”) • Send Data: 802.11 header followed by data protected for confidentiality, authentication, and integrity
Problem: Short-Term Linking [Figure: a tcpdump trace in which frames from 12:34:56:78:90:ab (Alice -> AP, seqno 1 to 4) are interleaved with frames from 00:00:99:99:11:11 (seqno 102 to 104)] Easy to isolate packet streams using addresses and sequence numbers
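To make the point concrete, here is a minimal sketch of how an eavesdropper could isolate streams when link-layer headers are visible. The frame list reuses the two addresses and sequence numbers from the slide; the tuple layout and the function name `isolate_streams` are illustrative assumptions, not part of the talk.

```python
from collections import defaultdict

# Captured frames as (source MAC, sequence number) pairs, taken from the slide.
frames = [
    ("12:34:56:78:90:ab", 1),
    ("12:34:56:78:90:ab", 2),
    ("00:00:99:99:11:11", 102),
    ("12:34:56:78:90:ab", 3),
    ("00:00:99:99:11:11", 103),
    ("12:34:56:78:90:ab", 4),
    ("00:00:99:99:11:11", 104),
]

def isolate_streams(frames):
    """Group frames into per-device streams keyed by source address."""
    streams = defaultdict(list)
    for src, seqno in frames:
        streams[src].append(seqno)
    return dict(streams)

streams = isolate_streams(frames)
```

Each value in `streams` is one device's packet sequence, ready for the size and timing analysis discussed on the next slide.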
Problem: Short-Term Linking • Isolated data streams are susceptible to side-channel analysis using packet size and timing information • Exposes keystrokes, VoIP calls, webpages, movies, … • [Liberatore, CCS ’06; Pang, MobiCom ’07; Saponas, Usenix Security ’07; Song, Usenix Security ’01; Wright, IEEE S&P ’08; Wright, Usenix Security ’07] [Figure panels: device fingerprints, video compression signatures, keystroke timings; labels: transmission sizes, DFT]
Solution: Encrypt Entire Frames • Which packets are transmitted by which devices? • “SlyFi”, MobiSys ’08 • On average, 3-9 data streams overlap in each 100 ms interval • Unlinkability is achieved
Our Goal: Short-Term Linking Using Physical Layer Information • State-of-the-art methods require specialized and expensive hardware [Brik, Mobicom ’08; Danev, Usenix Security ’09] • We want to perform short-term transmitter packet linking using low-cost commodity hardware [Figure: a tcpdump trace in which packets with unknown senders (??? -> AP) are labeled by transmitter (Charlie -> AP, Alice -> AP)]
Talk Outline • ✓ Motivation and Goals • Physical Layer Packet Linking • Experimental Evaluation • Solution: Introduce Noise
Signal Strength Background • RSSI values can be obtained using commodity 802.11 radios and drivers • Received signal strength indication (RSSI) fades as transmissions travel farther [Figure: RSSI measured by the eavesdropper decreases with increasing distance; example readings of -50 dB, -65 dB, and -85 dB; noise floor marked]
Real World Signal Strength Behavior • Received signal strength is influenced by the transmitting device’s physical location [Figure axes: physical location, signal strength (dB)]
Packet Linking with Device Localization • We first try to link packets by location • RSSI values fluctuate due to environmental noise • Supervised learning algorithms: RSSI → location mapping • We use k-nearest neighbors [Bahl, Infocom ’00] • But localization requires training data, which is expensive and time-consuming to collect
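A minimal sketch of the k-nearest-neighbors idea, assuming the common formulation in which a packet's location is estimated as the mean position of the k training points closest to it in signal space. The training values, the three-sensor layout, and the function name `knn_locate` are hypothetical and only illustrate the mapping from RSSI vectors to locations.

```python
import math

def knn_locate(training, rssi, k=3):
    """Estimate (x, y) as the mean position of the k training points
    whose RSSI vectors are closest (Euclidean distance) to `rssi`."""
    nearest = sorted(training, key=lambda t: math.dist(t[0], rssi))[:k]
    x = sum(pos[0] for _, pos in nearest) / len(nearest)
    y = sum(pos[1] for _, pos in nearest) / len(nearest)
    return (x, y)

# Hypothetical training data: (RSSI vector from 3 sensors in dB, known (x, y) in metres).
training = [
    ((-50.0, -70.0, -80.0), (0.0, 0.0)),
    ((-52.0, -68.0, -79.0), (2.0, 0.0)),
    ((-75.0, -55.0, -60.0), (20.0, 10.0)),
    ((-78.0, -52.0, -62.0), (22.0, 10.0)),
]

estimate = knn_locate(training, (-51.0, -69.0, -80.0), k=2)
```

The need for the `training` list is exactly the cost the slide points out: every entry requires a measurement taken at a known position.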
An Unsupervised Approach • We’re not interested in mapping packets to locations, just linking packets to transmitters • Use a clustering algorithm to handle noise
More Details • Use k-means to classify packets by transmitter • n listening sensors • Feature vector: (RSSI_1, RSSI_2, … , RSSI_n) • k-means is probabilistic, so it may not find a globally optimal solution • Heuristic: Run 100 times to get a stable solution • Meets our goal: Requires only commodity 802.11 hardware, stock drivers, and no training
Talk Outline • ✓ Motivation and Goals • ✓ Physical Layer Packet Linking • Experimental Evaluation • Solution: Introduce Noise
Experimental Evaluation • Collect real signal strength data in a 75 m × 50 m office building • 5 passive monitors and 58 different measurement positions • Our dataset is available in the CRAWDAD wireless trace repository: http://crawdad.cs.dartmouth.edu/cu/rssi
Packet Clustering Accuracy • Adversary uses 5 sensors to record packets’ RSSI values • Vary the number of transmitters from 5 to 25 • Generate 100 random device configurations • Clustering accuracy > 75% for all experiments • The localization-based approach is less accurate (see paper for details) • k-means is very accurate at clustering packets using RSSI [Figure: clustering accuracy; higher = better] But is this good enough to enable interesting traffic analysis?
Website Fingerprinting Accuracy • Attack: Encrypted website fingerprinting using [Liberatore and Levine, CCS ’06] • Naïve Bayes classifier to identify websites after clustering packets • Simple traffic analysis task performs well [Figure: website fingerprinting accuracy; higher = better]
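As a rough illustration of the classification step, the sketch below applies a multinomial naive Bayes model with Laplace smoothing to the packet sizes of one clustered stream. This is a simplified stand-in, not the classifier from the cited work; the site names, packet sizes, uniform prior, and function names are all assumptions made for the example.

```python
import math
from collections import Counter

def train(profiles):
    """profiles: {site: list of observed packet sizes}. Returns per-site size counts."""
    return {site: Counter(sizes) for site, sizes in profiles.items()}

def classify(model, trace, alpha=1.0):
    """Return the site whose size distribution best explains `trace`,
    assuming a uniform prior and independent packet sizes."""
    vocab = set(trace)
    for counts in model.values():
        vocab.update(counts)
    best_site, best_score = None, -math.inf
    for site, counts in model.items():
        total = sum(counts.values())
        # Log-likelihood of the trace with add-alpha (Laplace) smoothing.
        score = sum(
            math.log((counts[size] + alpha) / (total + alpha * len(vocab)))
            for size in trace
        )
        if score > best_score:
            best_site, best_score = site, score
    return best_site

# Hypothetical training profiles: packet sizes (bytes) seen when loading each site.
model = train({
    "siteA": [1500] * 8 + [60] * 2,
    "siteB": [60] * 7 + [300] * 3,
})

# Packet sizes from one cluster, i.e. packets attributed to a single transmitter.
guess = classify(model, [1500, 1500, 60])
```

Clustering errors matter here because packets from another device that land in the same cluster distort the size distribution the classifier sees.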
Talk Outline • ✓ Motivation and Goals • ✓ Physical Layer Packet Linking • ✓ Experimental Evaluation • Solution: Introduce Noise
Solution: Vary Transmit Power • Intuition: We expect tight, separable clusters • Goal: Make the clusters overlap • Varying transmit power introduces more noise in RSSI • The cluster is now larger and more likely to overlap with other clusters, which introduces more clustering errors
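The effect can be sketched with a small simulation. It assumes an idealized model in which lowering transmit power by some number of dB lowers the RSSI at every sensor by the same amount; the RSSI values, the 10 dB range, and the function names are hypothetical.

```python
import math
import random

def vary_transmit_power(rssi_vectors, max_reduction_db, rng):
    """Apply one random power reduction per packet; in this idealized model
    the same dB offset is seen by every sensor."""
    noisy = []
    for vec in rssi_vectors:
        reduction = rng.uniform(0.0, max_reduction_db)
        noisy.append(tuple(r - reduction for r in vec))
    return noisy

def cluster_spread(vectors):
    """Mean distance of the vectors from their centroid (a rough cluster size)."""
    centroid = tuple(sum(dim) / len(vectors) for dim in zip(*vectors))
    return sum(math.dist(v, centroid) for v in vectors) / len(vectors)

rng = random.Random(0)
# Hypothetical device: 50 packets seen by three sensors with 1 dB of environmental noise.
packets = [
    (-50.0 + rng.gauss(0, 1), -65.0 + rng.gauss(0, 1), -85.0 + rng.gauss(0, 1))
    for _ in range(50)
]
noisy = vary_transmit_power(packets, 10.0, rng)
before, after = cluster_spread(packets), cluster_spread(noisy)
```

Comparing `before` and `after` shows the cluster growing once power is varied, which is what makes overlap with neighboring clusters more likely.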
Solution: Directional Antenna Intuition: Focus signal in different directions: creates “phantom” clusters Inexpensive “cantenna” 1 device, 4 distinct clusters • Using a directional antenna causes fluctuation in RSSI
Combined: Clustering Accuracy • 15 transmitters total • Vary number of devices that add noise • Decreases clustering accuracy from 80% to 50% • Traffic analysis accuracy decreases from 40% to 26% for devices that add noise Lower = Better • Both solutions decrease clustering accuracy
Other Potential Solutions • Anonymity (still) loves company: the more devices, the better; devices close together have similar clusters • Wireless cover traffic: devices transmit “dummy traffic” to frustrate side-channel attacks; cover traffic on a shared wireless medium degrades performance • Physical security, jamming, frequency hopping: performance implications, may not be effective • Physical layer info is hard to control
Conclusion • Wireless devices are becoming personal and pervasive • Information present at the physical layer can lead to privacy leaks • Short-term linking: Side-channel attacks • Defenses to mitigate attacks • Introducing additional noise reduces clustering accuracy • More research is needed to help address privacy risks exposed by the physical layer
How many sensors are enough? Almost no gain after three sensors
Empirical stream interleaving • Many streams interleaved at short timescales
Why use k-means? • k-means performs well with spherical patterns • It’s simple, yet it outperformed other clustering methods on our task
How does distance affect accuracy? • Two transmitters at different distances • Measured accuracy of k-means
What if the attacker doesn’t know k? Even if the attacker can only approximate k, the website fingerprinting attack can still perform well
Related Work • Device distinction: detects MAC spoofing [Faria, WISE ’06] (doesn’t generalize to k devices); uses multipath to detect spoofing [Patwari ’07] (uses non-commodity hardware) • RF fingerprinting: uses electromagnetic signature [Hall ’05] (uses expensive non-commodity hardware); uses modulation fingerprinting [Brik ’08, Danev ’09] (relies on signal analyzer hardware)
Clustering accuracy: F-measure • Weighted harmonic mean of precision and recall • 1. In terms of information retrieval: precision = tp / (tp + fp) and recall = tp / (tp + fn), where tp: true positives, fp: false positives, fn: false negatives • 2. In terms of classification: precision is the homogeneity of each cluster; recall is the extent to which packets are clustered together
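One common way to apply these definitions to a clustering is to count pairs of packets: a pair is a true positive if both packets share a cluster and a transmitter, a false positive if they share a cluster but not a transmitter, and a false negative if they share a transmitter but not a cluster. The sketch below assumes this pairwise counting and the balanced form F = 2PR / (P + R); the slides do not state the weighting used, and the labels in the example are hypothetical.

```python
from itertools import combinations

def pairwise_f_measure(true_ids, cluster_ids):
    """Return (precision, recall, F) over all pairs of packets."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_ids)), 2):
        same_device = true_ids[i] == true_ids[j]
        same_cluster = cluster_ids[i] == cluster_ids[j]
        if same_cluster and same_device:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_device:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f

# Hypothetical example: five packets from devices A and B, with one packet misclustered.
precision, recall, f = pairwise_f_measure(["A", "A", "A", "B", "B"], [0, 0, 1, 1, 1])
```

A perfect clustering scores 1.0 on all three values regardless of how the cluster labels are numbered, since only pair membership is compared.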
k-Means Clustering Algorithm • Input: Data set and number of clusters k • Initialization: Select initial cluster centroids by choosing k data points at random • Repeat until cluster membership is stable: • Compute the distance from each data point to each of the k centroids • Group the data points by their closest centroid • Compute the new cluster centroids • k-means minimizes the residual sum of squares
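The steps above can be sketched in plain Python as follows. The restart wrapper, which keeps the run with the lowest residual sum of squares, is one reading of the "run 100 times" heuristic mentioned earlier in the talk and is an assumption; the RSSI values are hypothetical.

```python
import math
import random

def kmeans(points, k, rng, max_iter=100):
    """One run of k-means. Returns (labels, centroids, rss)."""
    centroids = rng.sample(points, k)  # initialization: k data points chosen at random
    labels = None
    for _ in range(max_iter):
        # Assign each data point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # cluster membership is stable
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    rss = sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))
    return labels, centroids, rss

def kmeans_restarts(points, k, runs=100, seed=0):
    """Run k-means several times and keep the solution with the lowest RSS."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng) for _ in range(runs)), key=lambda r: r[2])

# Hypothetical RSSI vectors (dB) from two sensors for packets sent by two devices.
points = [(-50, -70), (-51, -71), (-49, -69), (-80, -40), (-81, -41), (-79, -39)]
labels, centroids, rss = kmeans_restarts(points, k=2)
```

Because the initial centroids are random, the numeric cluster labels can differ between seeds; only the grouping of packets is meaningful.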
Why does clustering perform better than localization for linking? • Surprising result • Training means it should be better, right? • But, localized packets have error (3.5 meters at the median) so we need to cluster the localized packets by their location predictions • Errors from localization and clustering steps are additive
Estimating k from data • $\mathrm{RSS} = \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$, where $\mu_i$ is the centroid of cluster $S_i$ • k-means tries to minimize the within-cluster residual sum of squares • Choose k such that the within-cluster sum of squares is minimized using cross validation • Works best when clusters are separable