Understanding Tor Usage with Privacy-Preserving Measurement

T Wilson-Brown∗ (UNSW Canberra Cyber, University of New South Wales), Rob Jansen (U.S. Naval Research Laboratory), Akshaya Mani∗ (Georgetown University), Aaron Johnson (U.S. Naval Research Laboratory), Micah Sherr (Georgetown University). ∗Co-first authors.

Presentation Transcript


  1. Understanding Tor Usage with Privacy-Preserving Measurement. T Wilson-Brown∗ (UNSW Canberra Cyber, University of New South Wales), Rob Jansen (U.S. Naval Research Laboratory), Akshaya Mani∗ (Georgetown University), Aaron Johnson (U.S. Naval Research Laboratory), Micah Sherr (Georgetown University). ∗Co-first authors.

  2. Tor is an Anonymity Network.

  3. Who uses Tor? And how do they use it?

  4. Challenges in Measuring Tor. Naïve solution: gathering statistics poses privacy risks. Risks: machine compromise; compulsion through subpoena; published aggregate + background knowledge.

  5. Safe Tor Measurements with Differential Privacy. Differential privacy: aggregate + noise. Minimizes and quantifies the privacy risk; provides good accuracy. Systems: PrivEx [Elahi et al., CCS ’14], PrivCount [Jansen et al., CCS ’16], HisTorε [Mani et al., NDSS ’17], and PSC [Fenske et al., CCS ’17]. Highlighted: PrivCount and PSC.

  6. Primer on PrivCount and PSC. Relays record statistics in encrypted counters. Aggregation parties perform crypto operations. The output (aggregate + noise) satisfies (ε, δ)-differential privacy guarantees. Proved secure in the UC framework.
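
The idea of hiding per-relay counts so that only a noisy total is revealed can be sketched with additive secret sharing. This is a minimal illustration, not the PrivCount or PSC protocol: the modulus `Q`, the function names, and the equal split of noise across relays are all assumptions made for the example.

```python
import random

Q = 2**61 - 1  # hypothetical modulus; all counter arithmetic is done mod Q


def blind_counter(true_count, noise, num_parties, rng):
    """Relay side: hide (count + noise) behind one random share per party."""
    shares = [rng.randrange(Q) for _ in range(num_parties)]
    blinded = (true_count + noise + sum(shares)) % Q
    return blinded, shares  # shares[j] would be sent to aggregation party j


def aggregate(blinded_counters, shares_by_relay):
    """Tally side: sum the blinded counters, then subtract every share."""
    total = sum(blinded_counters) % Q
    for shares in shares_by_relay:
        total = (total - sum(shares)) % Q
    # Map back to a signed value so that a negative noisy total is handled.
    return total if total <= Q // 2 else total - Q


def noisy_total(true_counts, sigma, num_parties, rng):
    """Each relay adds an equal slice of Gaussian noise (total std. dev. sigma).

    Rounding the noise to an integer is a simplification for this sketch.
    """
    per_relay_sigma = sigma / len(true_counts) ** 0.5
    blinded, all_shares = [], []
    for count in true_counts:
        noise = round(rng.gauss(0, per_relay_sigma))
        b, s = blind_counter(count, noise, num_parties, rng)
        blinded.append(b)
        all_shares.append(s)
    return aggregate(blinded, all_shares)
```

No single blinded counter or single party's shares reveals a relay's count in this sketch; only the sum over all relays, with the noise already folded in, can be recovered.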

  7. PrivCount Queries. Supports counting queries, e.g., how many visits over Tor to Google, Amazon, and Facebook? Does not support count-distinct queries, e.g., how many unique destinations are visited over Tor?

  8. PSC Queries. Private Set-union Cardinality: relays hold sets {I1}, {I2}, . . ., {In}, and the aggregation parties compute |{I1} ∪ {I2} ∪ . . . ∪ {In}|. Supports count of distinct values across relays, e.g., how many unique clients connected to Tor?
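
A plain, non-private sketch of the quantity PSC targets may help show why per-relay counts cannot simply be added. The function names and the example addresses (taken from documentation-reserved IP ranges) are hypothetical, and none of PSC's cryptography or noise is modelled here.

```python
def union_cardinality(relay_sets):
    """Number of distinct items seen by at least one relay."""
    seen = set()
    for items in relay_sets:
        seen.update(items)
    return len(seen)


def naive_sum(relay_sets):
    """Sum of per-relay distinct counts; items seen by several relays are double-counted."""
    return sum(len(set(items)) for items in relay_sets)


# Hypothetical observations: one client IP is seen by both relays.
relay_sets = [
    {"192.0.2.1", "192.0.2.2"},
    {"192.0.2.2", "198.51.100.7"},
]
print(union_cardinality(relay_sets), naive_sum(relay_sets))
```

The gap between the two numbers grows with the overlap between relays, which is why a count-distinct query needs a set-union protocol rather than a sum of counters.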

  9. Safely Measuring Tor with PrivCount and PSC. PrivCount: differentially private, with privacy parameters. PSC: computes |{I1} ∪ {I2} ∪ . . . ∪ {In}|; differentially private, with privacy parameters.

  10. Tuning Privacy via Action Bounds. Proposed by Jansen et al. [CCS ’16]. For an epoch (i.e., a measurement period), an action bound limits the amount of network activity protected by differential privacy. Major challenge: coming up with “reasonable” bounds that produce accurate results.

  11. Differential Privacy (aggregate + noise). Protecting “hypothetical” network-level users produces inaccurate results.

  12. Coming Up With Reasonable Action Bounds. (1) Consider reasonable activities a Tor user might perform in an epoch, e.g., web browsing with Tor Browser. (2) Determine how this activity translates to observable actions in Tor: connecting to a domain; sending or receiving entry/exit data. (3) Compute the maximum amount of network action for a reasonable amount of this activity over an epoch (we use 24 hours).

  13. Example: Web Browsing with Tor Browser. What is the maximum number of domains that a regular user might access over Tor in 24 hours? Action bound: 20 domains per day, which allows for 2-4 domains for 5-10 hours.
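
The arithmetic behind this bound, and how a bound might feed into the noise scale, can be sketched as follows. Two assumptions are made: "2-4 domains for 5-10 hours" is read as a per-hour rate, and the noise is calibrated with the textbook Gaussian mechanism. The slides do not say how PrivCount calibrates its noise, and the `epsilon` and `delta` values are placeholders, not the study's parameters.

```python
import math


def domains_per_day(domains_per_hour, hours):
    """Total domains contacted at a steady hourly rate."""
    return domains_per_hour * hours


def gaussian_sigma(sensitivity, epsilon, delta):
    """Textbook Gaussian-mechanism noise scale (valid for 0 < epsilon < 1)."""
    if not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


# Both ends of the range reach the same bound of 20 domains per day.
light_use = domains_per_day(2, 10)
heavy_use = domains_per_day(4, 5)

# Placeholder privacy parameters (assumptions, not taken from the slides).
sigma = gaussian_sigma(sensitivity=20, epsilon=0.3, delta=1e-3)
print(light_use, heavy_use, round(sigma, 1))
```

Because the noise scale grows linearly with the bound, doubling the protected activity doubles the noise, which is the accuracy trade-off the previous slides describe.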

  14. Deployment: PrivCount & PSC. 3 aggregation parties; 6 exits & 10 non-exits; 3 operators. Countries shown: US, CA, FR.

  15. How do users use Tor?

  16. Measuring How Users Use Tor (via PrivCount). Measurement period: 4th – 5th Jan 2018. Exit weight: 1.5% of total available exit weight in Tor. Diagram labels: Guard, Exit, Web Server, TCP connections, Streams. Uses a new circuit for each unique domain in the address bar (or a new tab). Destination ports requested are web ports (either 80 or 443). Result 1: Vast majority of Tor use is for web browsing. (ACM IMC ’18)

  17. Domain (Alexa Rank) Measurement (via PrivCount). Measurement period: 31st Jan – 1st Feb 2018. Exit weight: 2.2%. onionoo.torproject.org: 43.4% (the chart also shows the value 47.8). The Android Tor client (Orbot) does an onionoo lookup for every relay in every circuit built by Tor.

  18. Domain (Alexa Rank) Measurement (via PrivCount). Measurement period: 31st Jan – 1st Feb 2018. Exit weight: 2.2%. onionoo.torproject.org: 43.4% (the chart also shows the value 47.8). The Android Tor client (Orbot) does an onionoo lookup for every relay in every circuit built by Tor. Alexa top sites: ~80%. Result 2: Alexa top sites are a reasonable representation of destinations visited by Tor users.

  19. Summary: How Users Use Tor. Result 1: Vast majority of Tor use is for web browsing. Result 2: Alexa top sites are a reasonable representation of destinations visited by Tor users. Other results: Top-level domains (via PrivCount), Result 3: The three main TLDs (.com, .org, and .net) make up the majority of the primary domains accessed by Tor users. Unique SLDs & Alexa SLDs (via PSC), Result 4: A long tail exists in the distribution of sites accessed over Tor.

  20. Who uses Tor?

  21. PSC Measurements: Statistical Analysis. Major challenge: extrapolating unique counts to the entire network. Approach: use information about the frequency distribution of observed items, e.g., domains visited follow a power-law distribution [Krashakov et al. 2006, Adamic et al. 2012] (plot: log(P(k)) against log(k)). Determine the parameter and construct confidence intervals, using Monte Carlo simulations for complicated distributions.
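
A Monte Carlo treatment of this kind can be sketched as a forward simulation: draw visits from a power-law popularity distribution, keep only the fraction that the measuring relays would observe, and count the distinct items that remain. Everything here (the function names, the parameters, and the simple per-visit observation probability) is an assumption for illustration, not the authors' procedure.

```python
import random


def simulate_unique_observed(num_items, exponent, num_visits, observe_prob, rng):
    """One trial: draw power-law visits, keep each with probability observe_prob."""
    # Popularity of the item at a given rank is proportional to 1 / rank**exponent.
    weights = [1.0 / rank**exponent for rank in range(1, num_items + 1)]
    visits = rng.choices(range(num_items), weights=weights, k=num_visits)
    return len({item for item in visits if rng.random() < observe_prob})


def monte_carlo_interval(trials, level, rng, **model):
    """Empirical central interval of the unique count over repeated trials."""
    counts = sorted(
        simulate_unique_observed(rng=rng, **model) for _ in range(trials)
    )
    tail = int((1 - level) / 2 * trials)
    return counts[tail], counts[trials - 1 - tail]


rng = random.Random(2018)  # fixed seed so the run is repeatable
low, high = monte_carlo_interval(
    trials=100, level=0.95, rng=rng,
    num_items=500, exponent=1.0, num_visits=2000, observe_prob=0.02,
)
print(low, high)
```

To extrapolate, one would repeat this for candidate network-wide parameters and keep those whose simulated interval covers the measured unique count.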

  22. Measuring Distinct Tor Users (via PSC). Measurement period: 12th – 15th Apr 2018. Perform measurements using relays of different sizes (0.42% and 0.88%). If each client connected to a single guard, scaling up the smaller measurement would give 0.0088 × 148,174 / 0.0042 ≈ 310,460 unique clients, which is more than the 269,795 observed. Conclusion: Client IPs connect to multiple guards.
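
The single-guard check on this slide is a linear scaling, reproduced here with the slide's own numbers. The function name is hypothetical, and reading 0.0042 and 0.0088 as the weight fractions of the two relay sets is an assumption based on the formula shown.

```python
def expected_single_guard(observed_small, weight_small, weight_large):
    """If every client used exactly one guard, the unique clients seen would
    scale linearly with the observing relays' share of the network."""
    return observed_small * weight_large / weight_small


expected = expected_single_guard(148_174, 0.0042, 0.0088)
observed_large = 269_795

print(round(expected))            # 310460, matching the slide's ≈ 310,460
print(expected > observed_large)  # True: the single-guard assumption overshoots
```

The larger relay set sees fewer new clients than linear scaling predicts, which is what one would expect if some clients were already counted through another guard.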

  23. Tor Client Model. Model (evaluated using simulation): a set of p promiscuous clients connect to all guards (captures behavior of Tor bridges, etc.); the remaining clients connect to g guards. Yields ~8 million clients. Result 1: Tor has approximately 8 million daily users, a factor of four more than reported by the Tor Metrics Portal (using heuristics).
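
One way to read this model is as two linear equations in two unknowns. The sketch below is a closed-form simplification and not the authors' simulation: it assumes each regular client picks its g guards independently in proportion to weight, so it is seen by relays holding a weight fraction w with probability 1 − (1 − w)^g, while every promiscuous client is always seen. All names are hypothetical.

```python
def seen_fraction(weight, g):
    """Probability that a client choosing g guards picks at least one observing relay."""
    return 1.0 - (1.0 - weight) ** g


def expected_observed(total, p, g, weight):
    """Promiscuous clients are always seen; the rest are seen with seen_fraction."""
    return p + (total - p) * seen_fraction(weight, g)


def fit_model(obs_a, weight_a, obs_b, weight_b, g):
    """Solve for (total clients, promiscuous clients) from two measurements."""
    fa, fb = seen_fraction(weight_a, g), seen_fraction(weight_b, g)
    regular = (obs_b - obs_a) / (fb - fa)  # number of non-promiscuous clients
    p = obs_a - regular * fa
    return regular + p, p


# Round trip on synthetic values: generate two observations, then recover the inputs.
obs_a = expected_observed(1_000_000, 20_000, 3, 0.01)
obs_b = expected_observed(1_000_000, 20_000, 3, 0.02)
print(fit_model(obs_a, 0.01, obs_b, 0.02, 3))
```

The estimate depends strongly on the assumed g, and real measurements carry differentially private noise, so a fit like this gives a range rather than a single number.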

  24. Geopolitical Distribution of Tor Clients (via PrivCount). Result 2: United States (US), Russia (RU), and Germany (DE) use Tor the most.

  25. Geopolitical Distribution of Tor Clients (via PrivCount). However, the Tor Metrics Portal ranks the United Arab Emirates (AE) second. This seems to be an overestimate: it implies that ~4% of the AE population uses Tor daily. Possibly the majority of Tor clients from AE are partially blocked from using Tor.

  26. Network Diversity of Tor Clients (via PrivCount). Uses IPv4 and IPv6 datasets from CAIDA. Major challenge: per-AS differentially private noise was large for some ASes. Hosting providers (e.g., Hetzner, DigitalOcean): probably from Tor bridges, onion services, and Tor network scanners.

  27. Client Results. Result 1: Tor has approximately 8 million daily users. Result 2: United States (US), Russia (RU), and Germany (DE) use Tor the most. Other results: Client IP churn (via PSC), which has never been measured before. Result 3: Client IP churn rate decreases the longer we observe. Unique country and AS count (via PSC).

  28. Very Brief Overview of Onion Services. Allows a user to offer a service without revealing its location (IP address). Diagram labels: Alice, Bob, abc.onion, Distributed Hash Table [DHT].

  29. Onion Services Results. Some results: Result 1: 90.9% (out of 134 million) of onion service descriptor fetches failed. Result 2: The Ahmia onion service search engine contains 56.8% (out of 12.2 million) of successfully fetched descriptors.

  30. Challenges Faced. Coming up with reasonable action bounds. Extrapolating PSC measurements to network-wide counts. 24 hours of waiting time between different measurements. Differentially private noise can overwhelm the actual count; requires repeating the measurement for multiple rounds.

  31. Major Findings. Tor is predominantly used for web browsing. Tor users visit Alexa sites as regular Internet users do. Tor has approximately 8 million daily users. Tor has a decreasing client IP churn rate (it decreases the longer we observe). 90.9% of onion address fetches fail. Data available at https://security.cs.georgetown.edu/measurement-study/. Understanding Tor Usage with Privacy-Preserving Measurement: Akshaya Mani, T Wilson-Brown, Rob Jansen, Aaron Johnson, Micah Sherr.
