
Privacy-Preserving Dynamic Learning of Tor Network Traffic

Privacy-Preserving Dynamic Learning of Tor Network Traffic. Adhirath Kapoor 931526192. BACKGROUND. Anonymity providing Communication System. How Tor Works ?. MOTIVATION. Current Research. Current Research only focuses on the single file model.


Presentation Transcript


  1. Privacy-Preserving Dynamic Learning of Tor Network Traffic Adhirath Kapoor 931526192

  2. BACKGROUND

  3. Anonymity-Providing Communication System

  4. How Tor Works

  5. MOTIVATION

  6. Current Research • Current research focuses only on the single file model. • This model is limited to a single file download and fails to account for the following: - TOR efficacy - TOR protocol dynamics - Website structural dependencies (embedded objects) - TOR network characteristics

  7. Research Question • How can we generate more accurate traffic flows for use in Tor experimentation tools and research?

  8. SOLUTION

  9. Contribution/Solution • Measuring TOR • Learning TOR Traffic • Building TOR Traffic • Evaluating TOR Traffic

  10. Measuring TOR

  11. Measuring TOR • This is done using PrivCount, an open-source, privacy-preserving distributed measurement system. • PrivCount has three components: - Tally Server - Share Keeper - Data Collector • PrivCount offers the following security properties: - Forward Secrecy - Measurement Aggregation - Private Measurement Results • The primary goal of measuring TOR was to improve on the accuracy of previously reported PrivCount measurement results.
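The division of labour between the three components can be illustrated with additive secret sharing. The sketch below is a simplified illustration, not PrivCount's actual code: the function names, the modulus, and the example counts are all assumptions, and it leaves out the noise a real deployment would add to the counters as well as all networking and encryption.

```python
import secrets

MODULUS = 2**64  # assumed size of the counter space; not taken from the slides


def blind_count(true_count, num_share_keepers):
    """Data Collector: hide a count behind one random share per Share Keeper."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_share_keepers)]
    blinded = (true_count + sum(shares)) % MODULUS
    return blinded, shares


def aggregate(true_counts, num_share_keepers):
    """Run one toy measurement round and return the network-wide total."""
    blinded_values = []
    keeper_sums = [0] * num_share_keepers  # each Share Keeper holds only a running sum
    for count in true_counts:
        blinded, shares = blind_count(count, num_share_keepers)
        blinded_values.append(blinded)
        for i, share in enumerate(shares):
            keeper_sums[i] = (keeper_sums[i] + share) % MODULUS
    # Tally Server: the random shares cancel, leaving only the aggregate.
    return (sum(blinded_values) - sum(keeper_sums)) % MODULUS


# Three hypothetical Data Collectors with made-up local counts.
print(aggregate([12, 7, 30], num_share_keepers=2))  # prints 49
```

In this toy version the Tally Server sees only blinded values and per-keeper sums, so it learns the total but not any single collector's count.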

  12. Measuring TOR (continued) • Entry relays can observe clients and circuits, and hence the distribution of circuits per client. • Exit relays can observe streams.

  13. Measurement Process • Two phases: - General Measurement - Traffic Model Measurement • General Measurement establishes accuracy and gives insight into the most important traffic classes. • Traffic Model Measurement focuses on measuring the stream and packet hidden Markov traffic models. - Uses exit-based observations (why?)

  14. Observations (contd.) • Active circuit: a circuit on which eight or more cells were sent. • Active client: a client with at least one active circuit. • 55 to 59 percent of active circuits carry only one or two streams, and about 21 to 25 percent carry 3 to 6 streams. • Another 9 to 11 percent of circuits carry 7 to 14 streams, and about 4 to 11 percent carry 15 or more streams. • This suggests that Tor Browser users may experience the web differently than non-Tor users. • A single circuit per client is the most common, followed by two, three, or four circuits per client. • Most Tor users use only a handful of circuits in an average 10 minutes. • Many Tor Browser users browse the web only lightly.
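The two definitions on this slide translate directly into a small classifier. The following is only a sketch of the definitions: the `(client_id, cells_sent)` record layout and the function names are hypothetical, and a privacy-preserving measurement would not keep per-client records in the clear like this.

```python
from collections import defaultdict

ACTIVE_CIRCUIT_MIN_CELLS = 8  # threshold stated on the slide


def is_active_circuit(cells_sent):
    """A circuit is active if eight or more cells were sent on it."""
    return cells_sent >= ACTIVE_CIRCUIT_MIN_CELLS


def active_circuits_per_client(circuits):
    """Map each active client to its number of active circuits.

    `circuits` is an iterable of (client_id, cells_sent) pairs; clients
    without any active circuit are omitted, matching the slide's definition.
    """
    counts = defaultdict(int)
    for client_id, cells_sent in circuits:
        if is_active_circuit(cells_sent):
            counts[client_id] += 1
    return dict(counts)


# Made-up observations for illustration only.
sample = [("a", 8), ("a", 3), ("b", 7), ("c", 20), ("c", 9)]
print(active_circuits_per_client(sample))  # prints {'a': 1, 'c': 2}
```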

  15. Learning TOR Traffic

  16. Learning TOR Traffic • In this case, exits are reliable (why?) • They can observe - Stream Model events - Packet Model events - Both Stream and Packet Models • So, PrivCount’s observations at exit relays are essential for learning both HMMs of TOR traffic.

  17. HMM • A hidden Markov model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unobservable (i.e. hidden) states. • A Markov process is a series of possible events in which the probability of each event depends only on the state attained in the previous event.
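The two statements on this slide can be written compactly. The notation here is an assumption introduced for illustration (it does not appear in the slides): s_t is the hidden state at step t, o_t the observation at step t, \pi the initial state distribution, a_{ij} the probability of moving from state i to state j, and b_j(o) the probability of observing o in state j.

```latex
% Markov property: the next state depends only on the current state.
P(s_t \mid s_{t-1}, s_{t-2}, \dots, s_1) = P(s_t \mid s_{t-1}) = a_{s_{t-1} s_t}

% Hidden states: each observation depends only on the state that emitted it.
P(o_t \mid s_1, \dots, s_t, o_1, \dots, o_{t-1}) = P(o_t \mid s_t) = b_{s_t}(o_t)

% Joint probability of a state path and its observations.
P(o_{1:T}, s_{1:T}) = \pi_{s_1} \, b_{s_1}(o_1) \prod_{t=2}^{T} a_{s_{t-1} s_t} \, b_{s_t}(o_t)
```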

  18. Measurement Process • Step 1: Initialize the Markov model. • Step 2: Observe TOR traffic and use PrivCount to count. • Step 3: Update the model. • Measuring the HMM path with PrivCount: - Observe inter-stream delays - Find the most likely HMM path (Viterbi) - Count HMM frequencies - Update HMM probabilities using a weight parameter

  19. Observations • For both the stream arrival model and the packet model: - The later models are generally better fits to new data than the earliest models. - Potentially anomalous measurement periods can cause some iterations to produce inferior models, which then continue to improve in following iterations. - Using the results of these measurements, the authors chose the stream and packet models with the best performance as the basis for the Shadow experiments.
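One iteration of this loop can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the two states, the "short"/"long" delay buckets, all probabilities, and in particular the blending rule `new = (1 - weight) * old + weight * observed` are hypothetical, since the slide says only that a weight parameter is used. The sketch also counts in the clear, whereas the measurement itself counts through PrivCount.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for an observation sequence."""
    best = [{s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev = max(states, key=lambda p: best[t - 1][p] + _log(trans_p[p][s]))
            best[t][s] = best[t - 1][prev] + _log(trans_p[prev][s]) + _log(emit_p[s][obs[t]])
            back[t][s] = prev
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def count_transitions(path, states):
    """Count how often each state-to-state transition occurs on the path."""
    counts = {a: {b: 0 for b in states} for a in states}
    for a, b in zip(path, path[1:]):
        counts[a][b] += 1
    return counts


def update_transitions(trans_p, counts, weight):
    """Blend old transition probabilities with observed frequencies (assumed rule)."""
    updated = {}
    for a, row in trans_p.items():
        total = sum(counts[a].values())
        if total == 0:  # state never left on this path: keep the old row
            updated[a] = dict(row)
            continue
        updated[a] = {b: (1 - weight) * row[b] + weight * counts[a][b] / total for b in row}
    return updated


# Hypothetical two-state model over bucketed inter-stream delays.
states = ("active", "idle")
start_p = {"active": 0.6, "idle": 0.4}
trans_p = {"active": {"active": 0.7, "idle": 0.3}, "idle": {"active": 0.4, "idle": 0.6}}
emit_p = {"active": {"short": 0.9, "long": 0.1}, "idle": {"short": 0.2, "long": 0.8}}

path = viterbi(["short", "short", "long", "long"], states, start_p, trans_p, emit_p)
new_trans_p = update_transitions(trans_p, count_transitions(path, states), weight=0.5)
print(path)  # prints ['active', 'active', 'idle', 'idle']
```

With this blending rule each updated row still sums to 1, and a smaller weight makes the model change more slowly between measurement rounds.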

  20. Building TOR Traffic

  21. Traffic Generator & Models • Tgen (Traffic Generator) - Based on action dependency graphs. - Used for collecting performance benchmarks. • Three Tgen configs are created: - Protocol Model - PrivCount Model - Single File Model
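The idea of an action dependency graph can be illustrated generically: each action runs only after the actions it depends on have finished. The sketch below is not Tgen's configuration format or API; the action names and the graph are hypothetical, and it uses only Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical action graph: each key maps to the set of actions it depends on.
action_graph = {
    "start": set(),
    "stream": {"start"},   # open a stream once the generator has started
    "pause": {"stream"},   # wait (e.g. user think time) after the stream
    "end": {"pause"},      # stop after the pause
}


def execution_order(graph):
    """Return the actions in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


print(execution_order(action_graph))  # prints ['start', 'stream', 'pause', 'end']
```

Different traffic models could then be expressed as different graphs over the same kinds of actions, which is one way to read the three configs listed on the slide.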

  22. Evaluating TOR Traffic

  23. Evaluation • Shadow is used to evaluate the whole four-step process. • At this stage, all three Tgen models are run in Shadow. • Shadow’s datasets are usually about 5 years old, so RIPE Atlas data was used instead.

  24. Results • The PrivCount model requires more computational and memory resources to run in Shadow than the single file and protocol models. • The generator times are negligible compared to the time to send and receive network traffic. • The generator produced fewer than 165 streams per circuit across all generated circuits.

  25. CRITICISM

  26. Critique • In the first stage, the authors do not say which traffic classes were found to be the most important in the general measurement phase. • An HMM is a generative, probabilistic model and can represent only a small fraction of the distributions over the space of possible sequences. • Due to their Markovian nature, HMMs do not take into account the sequence of states leading into any given state. • The algorithms used with HMMs are expensive in terms of memory and computation time. • Shadow imitates system behaviour and network processes in order to carry out experiments on a single machine, and this is heavily subject to delays. • The middle relay is useful because it can observe the difference between types of traffic. Middle relays were not included in this research.

  27. What could be a possible solution? • Machine learning can be an answer. • We can have a machine learning process with a learning phase followed by a prediction phase. • This model has two advantages over the one proposed in the paper: - It can detect the type of service being used. - It can detect whether the user is actually using the network.
