
Privacy-Preserving Dynamic Learning of Tor Network Traffic

This paper explores a privacy-preserving approach to understanding Tor network traffic through the use of the open-source tool PrivCount. The study proposes a more comprehensive model for Tor traffic, while ensuring user privacy in private Tor networks and simulations.



Presentation Transcript


  1. Privacy-Preserving Dynamic Learning of Tor Network Traffic • Presented by Matthew Taylor • Original authors: Rob Jansen, Matthew Traudt, and Nicholas Hopper

  2. Introduction

  3. Background Knowledge - Tor • Tor stands for The Onion Router • An anonymous communication protocol • Uses a network of volunteer relays to encrypt traffic and randomly “bounce it around”, hiding it from online surveillance • Each relay removes one layer of encryption and passes the rest on to the next relay, so no single relay knows both the original sender and the destination • About 6,000 volunteer relays as of 2018 • About 100 Gbit/s of traffic from over two million daily users

  4. Background Knowledge - Terminology • Circuit - a path through the Tor network • Stream - a TCP connection carried over a circuit • Relay - a router/node in the Tor network

  5. Motivation • Current Tor experimentation tools work, but their underlying models oversimplify Tor traffic • They currently model only single-file (320 KiB) and bulk (5 MiB) downloads and neglect the inner workings of Tor, e.g. they do not look at circuits • The authors believe a more comprehensive model of Tor traffic can be created

  6. Problem Gain a more comprehensive understanding of Tor traffic, without compromising the privacy of its users, in order to generate more accurate traffic in private Tor networks and simulations.

  7. Solution

  8. Measuring Tor • Uses the open-source tool PrivCount • PrivCount uses three types of nodes: one tally server, one or more share keepers, and one or more data collectors • Data collectors count events at their relays and add noise and random blinding values, so that an individual counter is unreadable on its own • Share keepers receive the blinding values from the data collectors and sum them • The tally server configures the other nodes and receives the results; summing the blinded counts and the share keepers' sums removes the blinding values, so at the end the tally server holds only the global noisy count of events • The deployment used 1 tally server, 3 share keepers, and 17 data collectors • Minimum privacy standards were met
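
The blinding idea on the slide above can be illustrated with a toy additive-blinding scheme. This is only a sketch under stated assumptions, not PrivCount's actual code: the class names, the 64-bit modulus, the Gaussian noise, and the `noise_for_demo` attribute are all hypothetical, and it assumes the noise is added at the data collectors.

```python
import random

MODULUS = 2 ** 64  # hypothetical modulus; all blinded arithmetic wraps around it


class ShareKeeper:
    """Holds only the running sum of the blinding shares it receives."""

    def __init__(self):
        self.share_sum = 0

    def receive_share(self, share):
        self.share_sum = (self.share_sum + share) % MODULUS


class DataCollector:
    """Keeps a counter that starts as noise plus random blinding values."""

    def __init__(self, share_keepers, noise_sigma, rng):
        noise = round(rng.gauss(0, noise_sigma))
        self.noise_for_demo = noise  # kept only so the demo can check the result
        self.counter = noise % MODULUS
        for keeper in share_keepers:
            blind = rng.randrange(MODULUS)
            self.counter = (self.counter + blind) % MODULUS
            # The keeper gets the negated value, so the two cancel when summed.
            keeper.receive_share((-blind) % MODULUS)

    def observe_event(self, count=1):
        self.counter = (self.counter + count) % MODULUS


def tally(collectors, share_keepers):
    """Sum blinded counters and keeper sums; the blinding values cancel out."""
    total = sum(dc.counter for dc in collectors)
    total += sum(sk.share_sum for sk in share_keepers)
    total %= MODULUS
    # Map back to a signed value, since noise can push a small count below zero.
    return total - MODULUS if total >= MODULUS // 2 else total


def run_demo(num_collectors=17, num_keepers=3, events_per_collector=10,
             noise_sigma=2.0, seed=1):
    rng = random.Random(seed)
    keepers = [ShareKeeper() for _ in range(num_keepers)]
    collectors = [DataCollector(keepers, noise_sigma, rng)
                  for _ in range(num_collectors)]
    for dc in collectors:
        dc.observe_event(events_per_collector)
    true_total = num_collectors * events_per_collector
    total_noise = sum(dc.noise_for_demo for dc in collectors)
    return true_total, tally(collectors, keepers), total_noise


if __name__ == "__main__":
    true_total, noisy_total, _ = run_demo()
    print("true:", true_total, "noisy aggregate:", noisy_total)
```

In this sketch no single counter or share sum reveals a local count, yet the final sum equals the true total plus the accumulated noise.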

  9. Measuring Tor Statistics • Tor measurements can be taken at entry relays and exit relays • Entry relays can measure circuits and clients but see no stream metadata; this information helps in producing accurate Tor client models • Exit relays can view streams, and therefore stream metadata, but not circuit or client data

  10. Tor Statistics Findings • 47% of clients are inactive at any given time • Only 6.5% of traffic was outbound to the wider Tor network • 73.5% of incoming traffic was to ports 80 and 443 • 55-59% of active circuits contain only one or two streams • 4-11% of active circuits carry 15 or more streams • 70-80% of streams receive less than 16 KiB inbound • 75-85% of streams send less than 1 KiB outbound

  11. Modelling Tor • Need to generate a model for the creation of streams and for the traffic on each stream • Modelled using a hidden Markov model (HMM): a Markov model whose states are treated as a “black box”, i.e. they are not observed directly and only the observations they emit are visible • Data is collected to make assumptions about the form of the model, e.g. the number of states, the connectivity between states, and the form of the observation distributions
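
To make the hidden Markov model idea concrete, here is a minimal sketch of a two-state HMM that emits "stream created" events. The states, symbols, and probabilities are invented for illustration and are not the paper's model or measured values. It shows sampling from the model and scoring an observation sequence with the standard forward algorithm.

```python
import random

# Hypothetical two-state model; every number here is an illustrative assumption.
STATES = ["active", "idle"]
START = {"active": 0.5, "idle": 0.5}
TRANSITION = {
    "active": {"active": 0.7, "idle": 0.3},
    "idle": {"active": 0.2, "idle": 0.8},
}
EMISSION = {
    "active": {"new_stream": 0.6, "no_stream": 0.4},
    "idle": {"new_stream": 0.05, "no_stream": 0.95},
}


def _draw(distribution, rng):
    """Draw one key from a {value: probability} mapping."""
    keys = list(distribution)
    weights = [distribution[k] for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


def sample_sequence(length, rng):
    """Generate hidden states and the observations they emit."""
    states, observations = [], []
    state = _draw(START, rng)
    for _ in range(length):
        states.append(state)
        observations.append(_draw(EMISSION[state], rng))
        state = _draw(TRANSITION[state], rng)
    return states, observations


def likelihood(observations):
    """Forward algorithm: probability of the observations, summed over all hidden paths."""
    first = observations[0]
    alpha = {s: START[s] * EMISSION[s][first] for s in STATES}
    for obs in observations[1:]:
        alpha = {
            s: EMISSION[s][obs] * sum(alpha[p] * TRANSITION[p][s] for p in STATES)
            for s in STATES
        }
    return sum(alpha.values())


if __name__ == "__main__":
    hidden, seen = sample_sequence(8, random.Random(0))
    print(seen)              # what an observer can measure
    print(likelihood(seen))  # how well the model explains it
```

An observer sees only the emitted symbols; training an HMM means adjusting the start, transition, and emission tables so that sequences like the measured ones become more likely.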

  12. Training the Model • A hidden Markov model is usually trained on a single dataset • To conserve privacy this could not be done, as the data needs to change every 10 minutes • PrivCount was instead used to supply fresh data for every training iteration, in the hope that the model converges

  13. Evaluating Models • Single file model - repeatedly downloads a file and sometimes pauses to simulate traffic; widely used but overly simplified • Protocol model - a web model that generates realistic traffic from HTTP archives, and a bulk model that uses a dataset of torrents to simulate real torrent traffic • PrivCount model - the model generated from all the data collected previously

  14. Evaluation • Used the TGen traffic generation tool to generate traffic, and Tor routing inside the Shadow simulator to test all models • PrivCount was used again to measure the results

  15. Results • The cumulative percentage distance from ground truth across all nine single counters was 703% for the single file model, 1001% for the protocol model, and 408% for the PrivCount model, meaning the PrivCount model matched Tor most closely but was still considerably off • The cumulative percentage distance from ground truth across the six histogram counters was 150% for the single file model, 56% for the protocol model, and 95% for the PrivCount model, so here the protocol model was closest • The PrivCount model was more computationally expensive
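
One plausible reading of "cumulative percentage distance" is the sum, over all counters, of each counter's absolute percentage error against ground truth, which would explain why the totals can exceed 100%. The sketch below assumes that definition; the paper's exact formula may differ, and the counter names and values are made up purely for illustration.

```python
def cumulative_percent_distance(model, truth):
    """Sum of per-counter absolute percentage errors (assumed definition)."""
    total = 0.0
    for name, true_value in truth.items():
        total += 100.0 * abs(model[name] - true_value) / abs(true_value)
    return total


if __name__ == "__main__":
    # Toy counters, not measurements from the paper.
    truth = {"circuits": 200, "streams": 1000}
    model = {"circuits": 150, "streams": 1100}
    print(cumulative_percent_distance(model, truth))  # 25% + 10%
```

Under this definition, a lower total means the simulated traffic tracks the measured counters more closely.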

  16. Criticism

  17. What was Good • All things considered, a good paper overall • Took care to conserve user privacy • Managed to produce an improved model

  18. What was not so Good • No direct measurements could be taken, so as not to compromise anonymity • The HMM was trained on multiple datasets via PrivCount instead of one, and as a result it did not converge but oscillated • The choice of an HMM itself: an HMM is one of the simplest Bayesian models, so perhaps Tor traffic should have been modelled using a more complex model • Resorted to simulations instead of real experiments
