
Understanding Tor Usage with Privacy-Preserving Measurement

Understanding Tor Usage with Privacy-Preserving Measurement. ACM Internet Measurement Conference (IMC) 2018. T Wilson-Brown∗ (UNSW Canberra Cyber, University of New South Wales), Rob Jansen (U.S. Naval Research Laboratory), Akshaya Mani∗ (Georgetown University), Aaron Johnson (U.S. Naval Research Laboratory), Micah Sherr (Georgetown University). ∗Co-first authors.


Presentation Transcript


  1. Understanding Tor Usage with Privacy-Preserving Measurement. ACM Internet Measurement Conference (IMC) 2018. T Wilson-Brown∗ (UNSW Canberra Cyber, University of New South Wales); Rob Jansen (U.S. Naval Research Laboratory); Akshaya Mani∗ (Georgetown University); Aaron Johnson (U.S. Naval Research Laboratory); Micah Sherr (Georgetown University). ∗Co-first authors.

  2. Tor is an Anonymity Network

  3. Who uses Tor, and how do they use it?

  4. Challenges in Measuring Tor. The naïve solution of gathering statistics poses privacy risks: machine compromise; compulsion through subpoena; and published aggregates combined with background knowledge.

  5. Differential Privacy (aggregate + noise). Safe Tor measurements with differential privacy: minimizes and quantifies the privacy risk; provides good accuracy. Systems: PrivEx [Elahi et al. CCS ’14], PrivCount [Jansen et al. CCS ’16], HisTorε [Mani et al. NDSS ’17], and PSC [Fenske et al. CCS ’17].

  6. Primer on PrivCount and PSC (aggregate + noise). Relays record statistics in encrypted counters. Aggregation parties perform crypto operations. Output satisfies (ε, δ)-differential privacy guarantees. Proved secure in the UC framework.
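The slides do not spell out how the counters are encrypted. One common way to get counters that reveal nothing until they are combined is additive blinding; the sketch below assumes that construction. The modulus `Q` and the function names are hypothetical, and this is an illustration of the idea rather than the PrivCount or PSC protocol itself.

```python
import secrets

Q = 2**61 - 1  # hypothetical modulus for counter arithmetic


def blind_counter(num_parties):
    """Pick one random blinding value per aggregation party.

    The relay starts its counter at the sum of the blinding values, so the
    stored counter looks uniformly random on its own.
    """
    shares = [secrets.randbelow(Q) for _ in range(num_parties)]
    initial = sum(shares) % Q
    return initial, shares


def tally(relay_counters, all_shares):
    """Combine blinded counters with the parties' shares to get the total."""
    blinded_total = sum(relay_counters)
    blinding_total = sum(s for shares in all_shares for s in shares)
    return (blinded_total - blinding_total) % Q


def demo():
    true_counts = [5, 7]  # events seen at two hypothetical relays
    counters, shares = [], []
    for count in true_counts:
        initial, relay_shares = blind_counter(num_parties=3)
        counters.append((initial + count) % Q)
        shares.append(relay_shares)
    return tally(counters, shares)
```

Only the network-wide sum is recovered in `demo()`; no single blinded counter exposes a relay's own count. A real deployment would also add the differentially private noise before the tally.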

  7. PrivCount Queries. Supports counting queries, e.g., how many visits over Tor to Google, Amazon, and Facebook? Does not support count-distinct queries, e.g., how many unique destinations are visited over Tor?

  8. PSC Queries. Private Set-union Cardinality: relays hold sets I1, I2, ..., In and the aggregation parties compute |I1 ∪ I2 ∪ ... ∪ In|. Supports count of distinct values across relays, e.g., how many unique clients connected to Tor?
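The quantity PSC targets can be illustrated in the clear. The sketch below is deliberately non-private and only shows why summing per-relay counts is not the same as counting distinct values; the relay sets use documentation-range IP addresses and are made up.

```python
def union_cardinality(relay_sets):
    """Number of distinct items seen by at least one relay."""
    return len(set().union(*relay_sets))


# Hypothetical client IPs observed at three relays (documentation range).
relay_sets = [
    {"192.0.2.1", "192.0.2.2"},
    {"192.0.2.2", "192.0.2.3"},
    {"192.0.2.1"},
]

sum_of_counts = sum(len(s) for s in relay_sets)  # double counts shared clients
distinct = union_cardinality(relay_sets)
```

Here `sum_of_counts` is 5 while `distinct` is 3, which is the gap a counting query cannot close and a set-union cardinality query can.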

  9. Safely Measuring Tor with PrivCount and PSC. PrivCount: differentially private, tuned via privacy parameters. PSC (computing |I1 ∪ I2 ∪ ... ∪ In|): differentially private, tuned via privacy parameters.

  10. Tuning Privacy via Action Bounds. Proposed by Jansen et al. [CCS ’16]. For an epoch (i.e., measurement period), an action bound limits the amount of network activity protected by differential privacy. Major challenge: coming up with “reasonable” bounds that produce accurate results.

  11. Differential Privacy (aggregate + noise). Protecting “hypothetical” network-level users produces inaccurate results.

  12. Example: Web Browsing with Tor Browser. What is the maximum number of domains that a regular user might access over Tor in 24 hours? Action bound: 20 domains per day, which allows for 2-4 domains per hour for 5-10 hours.
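How an action bound feeds into the noise can be sketched as follows. This assumes the classic Gaussian mechanism (valid for 0 < ε < 1) with the action bound used as the sensitivity; the calibration actually used in the deployment may differ, and the ε and δ values here are hypothetical.

```python
import math


def gaussian_sigma(action_bound, epsilon, delta):
    """Noise standard deviation for the classic Gaussian mechanism.

    The action bound plays the role of the sensitivity: the most one
    protected user can change the counter during an epoch.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return action_bound * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


EPSILON, DELTA = 0.3, 1e-3  # hypothetical privacy parameters

sigma_20 = gaussian_sigma(20, EPSILON, DELTA)    # 20 domains per day
sigma_200 = gaussian_sigma(200, EPSILON, DELTA)  # a 10x looser bound
```

The noise scales linearly with the bound, so protecting ten times more activity costs ten times more noise. That trade-off is why an overly generous bound can swamp the true count.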

  13. Deployment: PrivCount & PSC. 3 aggregation parties; 6 exits and 10 non-exits; 3 operators. Locations shown: US, CA, FR.

  14. How do users use Tor?

  15. Measuring How Users use Tor (via PrivCount). Measurement period: 4th – 5th Jan 2018. Exit weight: 1.5% of total available exit weight in Tor. [Figure: streams carried through a guard and an exit, with TCP connections from the exit to a web server.] Tor Browser uses a new circuit for each unique domain in the address bar (or a new tab). Initial stream: indicates the user's intended destination (e.g., example.com). Subsequent streams: fetch embedded resources (e.g., images, scripts).
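The initial/subsequent distinction can be sketched as a simple classifier. The assumption here is that the first stream on a circuit is the initial one; the input format (one list of stream targets per circuit) and the counter names are hypothetical. A real exit would increment privacy-preserving counters rather than plain ones.

```python
import ipaddress
from collections import Counter


def is_ip_literal(target):
    """True if the stream target is an IP address rather than a hostname."""
    try:
        ipaddress.ip_address(target)
        return True
    except ValueError:
        return False


def classify_streams(circuits):
    """Count initial vs. subsequent streams, splitting initial ones by target type."""
    counts = Counter()
    for streams in circuits:
        for position, target in enumerate(streams):
            if position == 0:
                counts["initial"] += 1
                kind = "initial_ip" if is_ip_literal(target) else "initial_hostname"
                counts[kind] += 1
            else:
                counts["subsequent"] += 1
    return counts


example = [
    ["example.com", "cdn.example.com", "static.example.net"],
    ["192.0.2.7"],
]
counts = classify_streams(example)
```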

  16. Measuring How Users use Tor (via PrivCount). Measurement period: 4th – 5th Jan 2018. Exit weight: 1.5% of total available exit weight in Tor. [Figure: breakdown of 2.1 billion exit streams into subsequent vs. initial streams, hostname vs. IP address targets, and web vs. other traffic; percentage labels as extracted: 5%, 110.1%, 86.54%; one category is marked insignificant.] Result 1: Vast majority of Tor use is for web browsing.

  17. Domain (Alexa Rank) Measurement (via PrivCount). Measurement period: 31st Jan – 1st Feb 2018. Exit weight: 2.2%. onionoo.torproject.org – 43.4% (additional figure label: 47.8). The Android Tor client (Orbot) does an onionoo lookup for every relay in every circuit built by Tor.

  18. Domain (Alexa Rank) Measurement (via PrivCount). Measurement period: 31st Jan – 1st Feb 2018. Exit weight: 2.2%. onionoo.torproject.org – 43.4% (additional figure label: 47.8). The Android Tor client (Orbot) does an onionoo lookup for every relay in every circuit built by Tor. Alexa top sites: ~63.8%. Result 2: Alexa top sites represent the majority of destinations visited by Tor users.

  19. Domain (Alexa Siblings) Measurement (via PrivCount). Measurement period: 1st Feb – 2nd Feb 2018. Exit weight: 2.1%. amazon.com – 8.6%. The authors contacted Amazon but could not determine the reason (due to lack of response).

  20. Summary: How Users use Tor. Result 1: Vast majority of Tor use is for web browsing. Result 2: Alexa top sites represent the majority of destinations visited by Tor users. Other results: Top Level Domains (via PrivCount), Result 3: The three main TLDs (.com, .org, and .net) make up the majority of the primary domains accessed by Tor users. Unique SLDs & Alexa SLDs (via PSC), Result 4: A long tail exists in the distribution of sites accessed over Tor.

  21. Who uses Tor?

  22. PSC Measurements: Statistical Analysis. Major challenge: extrapolating unique counts to the entire network. Are the items in our sample a fair representation of the entire network or not? Approach: use information about the frequency distribution of observed items, e.g., domains visited follow a power-law distribution [Krashakov et al. 2006, Adamic et al. 2012] (plot of log(P(k)) against log(k)); use Monte-Carlo simulations for complicated distributions.
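Why unique counts cannot simply be scaled by the observed fraction can be seen with a small Monte-Carlo sketch. It assumes visits are drawn from a Zipf-like power law and that each visit is observed independently with probability equal to the relay fraction; every parameter value is hypothetical and this is not the authors' simulation.

```python
import random
from itertools import accumulate


def simulate_unique(num_items, num_visits, exponent, fraction, seed=0):
    """Return (unique items seen in the sample, unique items seen network-wide)."""
    rng = random.Random(seed)
    # Power-law popularity: the item of rank r has weight 1 / r**exponent.
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_items + 1)]
    cum_weights = list(accumulate(weights))
    visits = rng.choices(range(num_items), cum_weights=cum_weights, k=num_visits)
    seen_all = set(visits)
    # Each visit lands on a measuring relay with probability `fraction`.
    seen_sample = {item for item in visits if rng.random() < fraction}
    return len(seen_sample), len(seen_all)


sample_unique, network_unique = simulate_unique(
    num_items=1000, num_visits=20000, exponent=1.0, fraction=0.05
)
naive_estimate = sample_unique / 0.05  # linear scaling by observed fraction
```

Popular items show up even in a small sample, so `naive_estimate` overshoots `network_unique`. Repeating the simulation over candidate parameters and keeping those that reproduce the observed sample count is one way to extrapolate.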

  23. Measuring Distinct Tor Users (via PSC). Assumption: each unique IP is a distinct Tor client (or user). This may be violated by mobile users with changing IP addresses and by users behind NAT. Bridges are also counted as clients.

  24. Measuring Distinct Tor Users (via PSC). Measurement period: 12th – 15th Apr 2018. Perform measurements using relays of different sizes (0.42%, 0.88%, and 1.19%). Suppose clients connected to a single guard: then 0.0088 × 148,174 / 0.0042 ≈ 310,460 unique clients would be expected, which exceeds the 269,795 observed. Conclusion: client IPs connect to multiple guards.
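The single-guard check is a proportional extrapolation and can be reproduced directly. This reads the slide as 148,174 unique IPs observed at the 0.42% relay set and 269,795 at the 0.88% set; the function name is hypothetical.

```python
def single_guard_prediction(count_small, frac_small, frac_large):
    """Unique clients expected at the larger relay set if each client used one guard.

    With one guard per client, the number of distinct clients seen scales
    linearly with the fraction of guard capacity being measured.
    """
    return count_small * frac_large / frac_small


predicted = single_guard_prediction(148_174, 0.0042, 0.0088)
observed = 269_795
shortfall = predicted - observed  # positive: fewer new clients than predicted
```

Because the larger relay set saw fewer distinct IPs than linear scaling predicts, some clients seen at the smaller set must also appear at the larger one. That overlap is what connecting to multiple guards produces.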

  25. Tor Client Model. Captures behavior of Tor bridges, etc., using simulation. Model: a set of p promiscuous clients connect to all guards; the remaining clients connect to g guards. Result 1: ~6 to 11 million unique client IP addresses connect to Tor, a factor of 3-5 more than that reported by the Tor Metrics Portal (using heuristics).
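A closed-form simplification of this model is sketched below. It assumes each non-promiscuous client picks its g guards independently, so it is seen by a relay set holding fraction f of guard selection probability with chance 1 - (1 - f)^g. The authors fit their model by simulation; the function names and any parameter values passed in are hypothetical.

```python
def expected_observed(total_clients, promiscuous, guards_per_client, fraction):
    """Expected unique client IPs seen by relays holding `fraction` of guard weight."""
    regular = total_clients - promiscuous
    # A regular client is seen if at least one of its g guards is measured.
    p_seen = 1.0 - (1.0 - fraction) ** guards_per_client
    # Promiscuous clients connect to all guards, so they are always seen.
    return promiscuous + regular * p_seen


def estimate_total(observed, promiscuous, guards_per_client, fraction):
    """Invert expected_observed to extrapolate a network-wide client count."""
    p_seen = 1.0 - (1.0 - fraction) ** guards_per_client
    return promiscuous + (observed - promiscuous) / p_seen
```

Under these assumptions, measurements at several relay sizes constrain p and g: the pair that best reproduces all observed counts gives the extrapolated total.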

  26. Geopolitical Distribution of Tor Clients (via PrivCount) Result 2: United States (US), Russia (RU), and Germany (DE) use Tor the most

  27. Geopolitical Distribution of Tor Clients (via PrivCount). Tor Metrics Portal ranks United Arab Emirates (AE) second. Surprisingly many clients come from scarcely populated islands, probably from Tor bridges, single onion services, and Tor network scanners.

  28. Network Diversity of Tor Clients (via PrivCount). Uses IPv4 and IPv6 datasets from CAIDA. Hosting providers (e.g., Hetzner, DigitalOcean) appear, probably from Tor bridges, single onion services, and Tor network scanners.

  29. Client Results. Result 1: ~6 to 11 million unique client IP addresses connect to Tor. Result 2: United States (US), Russia (RU), and Germany (DE) use Tor the most. Other results: Client IP churn (via PSC), which has never been measured before, Result 3: The number of new unique client IPs connecting to Tor decreases over time. Unique country and AS count (via PSC), Result 4: We observe clients from about 125 (out of 252) countries and 11,882 (out of 59,597) ASes.

  30. Very Brief Overview of Onion Services. Allows a user to offer a service without revealing its location (IP address). [Figure: Alice, Bob, the address abc.onion, and a distributed hash table (DHT).]

  31. Onion Services Results. Some results: Result 1: 90.9% (out of 134 million) onion service descriptor fetches failed, possibly due to bots with outdated address lists. Result 2: The Ahmia onion service search engine contains 56.8% (out of 12.2 million) of successfully fetched descriptors, so the majority of visits are to onion sites with publicly available addresses. Result 3: Surprisingly, ~90% of attempted connections to onion services fail.

  32. Challenges Faced. Coming up with reasonable action bounds. Extrapolating PSC measurements to network-wide counts. 24 hours of waiting time between different measurements. Differentially private noise can overwhelm the actual count, which requires repeating the measurement for multiple rounds.

  33. Major Findings. Tor is predominantly used for web browsing. Tor users visit Alexa sites as regular Internet users do. ~6 to 11 million unique IP addresses connect to Tor. United States (US), Russia (RU), and Germany (DE) use Tor the most. 90.9% of onion address fetches fail. ~90% of attempted connections to onion services fail. Data available at https://security.cs.georgetown.edu/measurement-study/ Understanding Tor Usage with Privacy-Preserving Measurement. Akshaya Mani, T Wilson-Brown, Rob Jansen, Aaron Johnson, Micah Sherr.

