Learn about Privacy-Preserving Measurement for Tor usage, its challenges, and Differential Privacy techniques used for accurate and secure data collection.
Understanding Tor Usage with Privacy-Preserving Measurement
ACM Internet Measurement Conference (IMC) 2018
Akshaya Mani* (Georgetown University), T Wilson-Brown* (UNSW Canberra Cyber, University of New South Wales), Rob Jansen (U.S. Naval Research Laboratory), Aaron Johnson (U.S. Naval Research Laboratory), Micah Sherr (Georgetown University)
*Co-first authors
Tor is an Anonymity Network
Who uses Tor, and how do they use it?
Challenges in Measuring Tor
Naïve solution: aggregate the statistics gathered across relays.
Gathering statistics poses privacy risks:
Machine compromise
Compulsion through subpoena
Published aggregate combined with background knowledge
Safe Tor Measurements with Differential Privacy
Differential privacy: publish the aggregate plus noise.
Minimizes and quantifies the privacy risk.
Provides good accuracy.
Systems: PrivEx [Elahi et al. CCS ’14], PrivCount [Jansen et al. CCS ’16], HisTorε [Mani et al. NDSS ’17], and PSC [Fenske et al. CCS ’17].
Used in this study: PrivCount and PSC.
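The "aggregate plus noise" idea can be sketched with the textbook Gaussian mechanism. This is a minimal illustration, not the noise procedure that PrivCount or PSC actually runs: the function names, the relay counts, and the parameter values (sensitivity 20, ε = 0.3, δ = 10⁻³) are assumptions made up for the example, and the calibration used is the classical one, which is only valid for ε < 1.

```python
import math
import random

def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (needs 0 < epsilon < 1)."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("classical calibration needs 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def noisy_aggregate(per_relay_counts, sensitivity, epsilon, delta, rng):
    """Sum the per-relay counts and add zero-mean Gaussian noise to the total."""
    true_total = sum(per_relay_counts)
    return true_total + rng.gauss(0.0, gaussian_sigma(sensitivity, epsilon, delta))

# Hypothetical per-relay counts and privacy parameters, for illustration only.
rng = random.Random(2018)
print(noisy_aggregate([1200, 950, 4300], sensitivity=20, epsilon=0.3, delta=1e-3, rng=rng))
```

A smaller ε or a larger sensitivity increases the noise scale, which is why the choice of bounds later in the talk matters for accuracy.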
Primer on PrivCount and PSC
Relays record statistics in encrypted counters.
Aggregation Parties perform crypto operations and output the aggregate plus noise.
Output satisfies (ε, δ)-differential privacy guarantees.
Proved secure in the UC framework.
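One way to see why no single Aggregation Party learns a relay's counter is additive secret sharing. The sketch below is a stand-in technique chosen for illustration; the slides only say "encrypted counters", and the real PrivCount and PSC protocols differ. The modulus, the function names, and the relay counts are all hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical modulus, large enough for the example counts

def split_into_shares(value, n_parties):
    """Split a counter into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def combine(shares):
    """Recover the shared value by adding all shares mod MODULUS."""
    return sum(shares) % MODULUS

def aggregate(relay_counts, n_parties=3):
    """Each party sums the shares it receives; only the combined total is revealed."""
    party_totals = [0] * n_parties
    for count in relay_counts:
        for i, share in enumerate(split_into_shares(count, n_parties)):
            party_totals[i] = (party_totals[i] + share) % MODULUS
    return combine(party_totals)

print(aggregate([5, 7, 11]))  # total across three hypothetical relays
```

Each individual share is uniformly random, so a party that sees only its own shares learns nothing about any single relay's count.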
PrivCount Queries
Supports counting queries, e.g., how many visits over Tor to Google, Amazon, and Facebook?
Does not support count-distinct queries, e.g., how many unique destinations visited over Tor?
PSC Queries
Private Set-union Cardinality: relays hold item sets {I1}, {I2}, . . ., {In}, and the Aggregation Parties compute |{I1} ∪ {I2} ∪ . . . ∪ {In}|.
Supports count of distinct values across relays, e.g., how many unique clients connected to Tor?
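The quantity PSC targets can be shown in the clear. The sketch below computes the set-union cardinality directly, with none of the privacy protections that PSC provides; the function name and the example client labels are assumptions for illustration.

```python
def union_cardinality(per_relay_items):
    """Count distinct items across all relays (cleartext, no privacy protection)."""
    seen = set()
    for items in per_relay_items:
        seen |= set(items)
    return len(seen)

# Hypothetical observations: the same client can appear at several relays.
relays = [{"client-a", "client-b"}, {"client-b", "client-c"}, {"client-c"}]
print(union_cardinality(relays))        # distinct clients across relays
print(sum(len(r) for r in relays))      # naive sum of per-relay counts overcounts
```

The gap between the two printed values is why a plain counting query cannot answer count-distinct questions.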
Safely Measuring Tor with PrivCount and PSC
PrivCount: differentially private output, governed by privacy parameters.
PSC (computes |{I1} ∪ {I2} ∪ . . . ∪ {In}|): differentially private output, governed by privacy parameters.
Tuning Privacy via Action Bounds
Proposed by Jansen et al. [CCS ’16].
An action bound limits the amount of network activity protected by differential privacy for an epoch (i.e., measurement period).
Major challenge: coming up with “reasonable” bounds that produce accurate results.
Differential Privacy (Aggregate + Noise)
Protecting “hypothetical” network-level users produces inaccurate results.
Example: Web Browsing with Tor Browser
What is the maximum number of domains that a regular user might access over Tor in 24 hours?
Action bound: 20 domains per day, which allows for 2-4 domains per hour for 5-10 hours.
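The arithmetic behind the 20-domain bound can be checked with a short sketch. It assumes the slide's "2-4 domains for 5-10 hours" means domains per hour of active browsing; the function names are hypothetical.

```python
ACTION_BOUND_DOMAINS_PER_DAY = 20  # bound stated on the slide

def domains_visited(domains_per_hour, hours_active):
    """Total domains a user visits in a day under a constant hourly rate."""
    return domains_per_hour * hours_active

def is_protected(domains_per_hour, hours_active, bound=ACTION_BOUND_DOMAINS_PER_DAY):
    """True if the day's activity fits within the action bound."""
    return domains_visited(domains_per_hour, hours_active) <= bound

for rate, hours in [(2, 10), (4, 5), (4, 10)]:
    print(rate, hours, domains_visited(rate, hours), is_protected(rate, hours))
```

Under this reading, 2 domains per hour for 10 hours and 4 domains per hour for 5 hours both land exactly on the bound, while heavier activity such as 4 domains per hour for 10 hours exceeds it.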
Deployment: PrivCount & PSC
3 Aggregation Parties
6 exit and 10 non-exit relays
3 operators
Countries shown: US, CA, FR
Measuring How Users use Tor (via PrivCount)
Measurement period: 4th – 5th Jan 2018. Exit weight: 1.5% of the total available exit weight in Tor.
Diagram labels: Guard, Exit, Web Server, Streams, TCP connections.
Tor Browser uses a new circuit for each unique domain in the address bar (or a new tab).
Initial stream: indicates the user's intended destination (e.g., example.com).
Subsequent streams: fetch embedded resources (e.g., images, scripts).
Measuring How Users use Tor (via PrivCount)
Measurement period: 4th – 5th Jan 2018. Exit weight: 1.5% of the total available exit weight in Tor.
Diagram labels, in extracted order: Exit Streams 2.1 billion; Subsequent; Initial 5%; Hostname; IP Address; 110.1%; Web; Other; Insignificant; 86.54%.
Result 1: Vast majority of Tor use is for web browsing.
Domain (Alexa Rank) Measurement (via PrivCount)
Measurement period: 31st Jan – 1st Feb 2018. Exit weight: 2.2%.
onionoo.torproject.org: 43.4% (chart label: 47.8).
The Android Tor client (Orbot) does an onionoo lookup for every relay in every circuit built by Tor.
Alexa top sites: ~63.8%.
Result 2: Alexa top sites represent the majority of destinations visited by Tor users.
Domain (Alexa Siblings) Measurement (via PrivCount)
Measurement period: 1st Feb – 2nd Feb 2018. Exit weight: 2.1%.
amazon.com: 8.6%.
We contacted Amazon but were unable to find the reason (due to lack of response).
Summary: How Users use Tor
Result 1: Vast majority of Tor use is for web browsing.
Result 2: Alexa top sites represent the majority of destinations visited by Tor users.
Other results:
Top Level Domains (via PrivCount). Result 3: The three main TLDs (.com, .org, and .net) make up the majority of the primary domains accessed by Tor users.
Unique SLDs & Alexa SLDs (via PSC). Result 4: A long tail exists in the distribution of sites accessed over Tor.
PSC Measurements: Statistical Analysis
Major challenge: extrapolating unique counts to the entire network. Are the items in our sample a fair representation of the entire network or not?
Using information about the frequency distribution of observed items, e.g., domains visited follow a power-law distribution [Krashakov et al. 2006, Adamic et al. 2012] (plot: log(P(k)) vs. log(k)).
Using Monte Carlo simulations for complicated distributions.
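Why unique counts do not scale linearly with the observed share of traffic can be illustrated with a small Monte Carlo run. This is only a sketch of the general idea, not the authors' extrapolation procedure: the Zipf-like popularity model, the per-visit observation probability, the function name, and every parameter value are assumptions for the example.

```python
import random
from itertools import accumulate

def simulate_unique_counts(n_domains, n_visits, exponent, observe_prob, rng):
    """Draw visits from a power-law popularity distribution and compare the
    number of distinct domains seen by the sample with the network-wide number."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_domains + 1)]
    cum_weights = list(accumulate(weights))
    visits = rng.choices(range(n_domains), cum_weights=cum_weights, k=n_visits)
    all_unique = set(visits)
    # Each visit is independently observed by the measuring relays.
    observed_unique = {d for d in visits if rng.random() < observe_prob}
    return len(observed_unique), len(all_unique)

rng = random.Random(1)
observed, total = simulate_unique_counts(
    n_domains=1000, n_visits=5000, exponent=1.0, observe_prob=0.02, rng=rng)
print(observed, total, observed / total)
```

Because popular domains are seen almost regardless of the sample size, dividing the observed unique count by the observation probability would badly overestimate the network-wide count; simulation under an assumed distribution is one way to correct for that.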
Measuring Distinct Tor Users (via PSC)
Assumption: each unique IP is a distinct Tor client (or user).
The assumption may be violated:
Mobile users with changing IP addresses
Users behind NAT
Bridges are also counted as clients
Measuring Distinct Tor Users (via PSC)
Measurement period: 12th – 15th Apr 2018.
Perform measurements using relays of different sizes: 0.42%, 0.88%, and 1.19%.
Suppose clients connect to a single guard: then we would expect 0.0088 × 148,174 / 0.0042 ≈ 310,460 unique clients, which is greater than 269,795.
Conclusion: client IPs connect to multiple guards.
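The single-guard check is a linear scaling by relay weight and can be reproduced directly. The sketch assumes that 148,174 is the unique-client count measured at 0.42% weight and that 269,795 is the count measured at 0.88%; that pairing is inferred from the formula on the slide, and the function name is hypothetical.

```python
def scale_by_weight(observed_count, weight_observed, weight_target):
    """Linear extrapolation that would hold if every client used exactly one guard."""
    return observed_count * weight_target / weight_observed

predicted = scale_by_weight(148_174, 0.0042, 0.0088)
measured = 269_795  # count the prediction is compared against on the slide

print(round(predicted))        # about 310,460
print(predicted > measured)    # the single-guard prediction overshoots
```

If each client contacted only one guard, doubling the measured weight should roughly double the unique count. Seeing fewer clients than predicted means some clients were already counted at the smaller weight, which is what contact with multiple guards would produce.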
Tor Client Model
Captures the behavior of Tor bridges, etc., using simulation.
Model: a set of p promiscuous clients connect to all guards; the remaining clients connect to g guards.
Result 1: ~6 to 11 million unique client IP addresses connect to Tor, a factor of 3-5 more than that reported by the Tor Metrics Portal (using heuristics).
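A toy version of this model shows how p and g affect what the measuring relays see. It is a sketch under simplifying assumptions, not the authors' simulation: each of a regular client's g guard choices independently lands on a measuring relay with probability w (the measured weight fraction), and the names and parameter values are hypothetical.

```python
import random

def expected_observed(n_clients, p, g, w):
    """Expected unique clients seen: all p promiscuous clients, plus regular
    clients with at least one of their g guards among the measuring relays."""
    return p + (n_clients - p) * (1.0 - (1.0 - w) ** g)

def simulate_observed(n_clients, p, g, w, rng):
    """Monte Carlo counterpart of expected_observed."""
    observed = p  # promiscuous clients contact every guard, so they are always seen
    for _ in range(n_clients - p):
        if any(rng.random() < w for _ in range(g)):
            observed += 1
    return observed

rng = random.Random(0)
print(expected_observed(20_000, 100, 3, 0.1))
print(simulate_observed(20_000, 100, 3, 0.1, rng))
```

Fitting p and g so that the predicted counts match measurements at several relay weights is one way such a model could be used to estimate the network-wide client total.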
Geopolitical Distribution of Tor Clients (via PrivCount)
Result 2: United States (US), Russia (RU), and Germany (DE) use Tor the most.
Geopolitical Distribution of Tor Clients (via PrivCount)
Tor Metrics Portal ranks United Arab Emirates (AE) second.
Surprisingly many clients come from sparsely populated islands, probably from Tor bridges, single onion services, and Tor network scanners.
Network Diversity of Tor Clients (via PrivCount)
Uses IPv4 and IPv6 datasets from CAIDA.
Hosting providers (e.g., Hetzner, DigitalOcean): probably from Tor bridges, single onion services, and Tor network scanners.
Client Results
Result 1: ~6 to 11 million unique client IP addresses connect to Tor.
Result 2: United States (US), Russia (RU), and Germany (DE) use Tor the most.
Other results:
Client IP churn (via PSC), which has never been measured before. Result 3: The number of new unique client IPs connecting to Tor decreases over time.
Unique country and AS count (via PSC). Result 4: We observe clients from about 125 (out of 252) countries and 11,882 (out of 59,597) ASes.
Very Brief Overview of Onion Services
Allows a user to offer a service without revealing its location (IP address).
Diagram labels: Alice, Bob, abc.onion, Distributed Hash Table [DHT].
Onion Services Results
Result 1: 90.9% (out of 134 million) onion service descriptor fetches failed, possibly due to bots with outdated address lists.
Result 2: The Ahmia onion service search engine contains 56.8% (out of 12.2 million) of the successfully fetched descriptors, so the majority of visits are to onion sites with publicly available addresses.
Result 3: Surprisingly, ~90% of attempted connections to onion services fail.
Challenges Faced
Coming up with reasonable action bounds.
Extrapolating PSC measurements to network-wide counts.
24 hours of waiting time between different measurements.
Differentially private noise can overwhelm the actual count, which requires repeating the measurement for multiple rounds.
Major Findings
Tor is predominantly used for web browsing.
Tor users visit Alexa sites as regular Internet users do.
~6 to 11 million unique IP addresses connect to Tor.
United States (US), Russia (RU), and Germany (DE) use Tor the most.
90.9% of onion address fetches fail.
~90% of attempted connections to onion services fail.
Data available at https://security.cs.georgetown.edu/measurement-study/
Understanding Tor Usage with Privacy-Preserving Measurement. Akshaya Mani, T Wilson-Brown, Rob Jansen, Aaron Johnson, Micah Sherr