This research focuses on improving Tor traffic measurements to understand its efficacy, protocol dynamics, and network characteristics more accurately. The study utilizes PrivCount, offers security properties like Forward Secrecy, and involves phases like General Measurement and Traffic Model Measurement to evaluate Tor traffic flows. The research uses HMM models for learning Tor traffic patterns and evaluates the process using the Shadow tool. Results show the computational resources required by PrivCount and critique aspects of the methodology. Overall, the study aims to enhance Tor network research and measurement techniques.
Privacy-Preserving Dynamic Learning of Tor Network Traffic
Adhirath Kapoor 931526192
Current Research
• Current research focuses only on the single file model.
• This model is limited to a single file download and fails to explain the following:
  - Tor efficacy
  - Tor protocol dynamics
  - Website structural dependencies (embedded objects)
  - Tor network characteristics
Research Question
• How can we generate more accurate traffic flows for use in Tor experimentation tools and research?
Contribution/Solution
• Measuring Tor
• Learning Tor Traffic
• Building Tor Traffic
• Evaluating Tor Traffic
Measuring Tor
• Measurement is done using PrivCount, an open-source, privacy-preserving distributed measurement system.
• PrivCount has three components:
  - Tally Server
  - Share Keeper
  - Data Collector
• PrivCount offers the following security properties:
  - Forward Secrecy
  - Measurement Aggregation
  - Private Measurement Results
• The primary goal of measuring Tor was to improve upon the accuracy of previously reported PrivCount measurement results.
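The split of roles between Data Collectors, Share Keepers, and the Tally Server can be illustrated with a minimal sketch of additive blinding. This is not PrivCount's actual code: the modulus `Q`, the function names, and the per-relay counts are assumptions made for illustration, and the sketch leaves out any noise that would be added to keep the published results private.

```python
import secrets

Q = 2**61 - 1  # illustrative modulus for counter arithmetic (an assumption)

def blind_counter(true_count, num_share_keepers):
    """Data Collector side: hide a count behind one random share per Share Keeper."""
    shares = [secrets.randbelow(Q) for _ in range(num_share_keepers)]
    blinded = (true_count + sum(shares)) % Q
    return blinded, shares

def tally(blinded_values, share_keeper_sums):
    """Tally Server side: subtracting the Share Keepers' sums leaves only the aggregate."""
    return (sum(blinded_values) - sum(share_keeper_sums)) % Q

counts = [12, 7, 30]   # hypothetical per-relay counts
num_sk = 2
blinded_values = []
sk_sums = [0] * num_sk  # each Share Keeper only ever stores a running sum of shares
for c in counts:
    blinded, shares = blind_counter(c, num_sk)
    blinded_values.append(blinded)
    for i, s in enumerate(shares):
        sk_sums[i] = (sk_sums[i] + s) % Q

total = tally(blinded_values, sk_sums)
print(total)
```

In this sketch the Tally Server sees only blinded values and share sums, so it learns the total across relays but no individual relay's count.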
Measuring Tor (continued)
• Entry relays can observe clients and circuits, and hence the distribution of circuits per client.
• Exit relays can observe streams.
Measurement Process
• Two phases:
  - General Measurement
  - Traffic Model Measurement
• General Measurement is aimed at accuracy and gives insight into the most important traffic classes.
• Traffic Model Measurement focuses on measuring the stream and packet hidden Markov traffic models.
  - Uses exit-based observations (why?)
Observations (contd.)
• Active circuit: eight or more cells were sent on it.
• Active client: has at least one active circuit.
• 55 to 59 percent of active circuits carry only one or two streams, and about 21 to 25 percent of circuits carry 3 to 6 streams.
• Another 9 to 11 percent of circuits carry 7 to 14 streams, and about 4 to 11 percent carry 15 or more streams.
• This suggests that Tor Browser users may experience the web differently than non-Tor users.
• A single circuit per client is the most common, followed by two, three, or four circuits per client.
• Most Tor users use only a handful of circuits in an average 10-minute period.
• Many Tor Browser users only lightly browse the web.
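The definitions of active circuit and active client, and the stream-count ranges reported above, can be written down as a small sketch. The threshold and the bucket boundaries come from the slide; the function names and the sample data are hypothetical.

```python
from collections import Counter

ACTIVE_CELL_THRESHOLD = 8  # from the slide: eight or more cells sent

def is_active_circuit(cell_count):
    return cell_count >= ACTIVE_CELL_THRESHOLD

def active_circuits_per_client(cells_by_client):
    """Map each active client to its number of active circuits; inactive clients are dropped."""
    result = {}
    for client, cell_counts in cells_by_client.items():
        n = sum(1 for c in cell_counts if is_active_circuit(c))
        if n >= 1:  # an active client has at least one active circuit
            result[client] = n
    return result

def stream_bucket(streams):
    """Place a circuit's stream count into the ranges used on the slide."""
    if streams < 1:
        raise ValueError("expected at least one stream")
    if streams <= 2:
        return "1-2"
    if streams <= 6:
        return "3-6"
    if streams <= 14:
        return "7-14"
    return "15+"

# Hypothetical data: cells sent on each circuit, grouped by client.
cells_by_client = {"a": [3, 10, 25], "b": [2], "c": [8]}
per_client = active_circuits_per_client(cells_by_client)

# Hypothetical stream counts for six active circuits.
histogram = Counter(stream_bucket(n) for n in [1, 2, 5, 9, 20, 1])
print(per_client, dict(histogram))
```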
Learning Tor Traffic
• In this case, exits are reliable (why?)
• They can observe:
  - Stream model events
  - Packet model events
  - Both stream and packet models
• So PrivCount's observation of exit relays is essential for learning both HMMs of Tor traffic.
HMM
• A hidden Markov model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unobservable (i.e., hidden) states.
• A Markov process is a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.
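As a minimal sketch of this definition, the following samples from a two-state HMM. The state names, observation names, and all probabilities are made up for illustration and are not taken from the paper. The hidden state follows a Markov process, while an observer sees only the emitted symbols.

```python
import random

# Hypothetical model parameters; each row sums to 1.
START = {"burst": 0.6, "pause": 0.4}
TRANS = {
    "burst": {"burst": 0.7, "pause": 0.3},
    "pause": {"burst": 0.4, "pause": 0.6},
}
EMIT = {
    "burst": {"short": 0.5, "medium": 0.4, "long": 0.1},
    "pause": {"short": 0.1, "medium": 0.3, "long": 0.6},
}

def draw(dist, rng):
    """Draw one key from a {outcome: probability} mapping."""
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

def sample_hmm(length, rng):
    """Return (hidden_states, observations), each of the given length."""
    hidden, observed = [], []
    state = draw(START, rng)
    for _ in range(length):
        hidden.append(state)
        observed.append(draw(EMIT[state], rng))  # emission depends only on the current state
        state = draw(TRANS[state], rng)          # next state depends only on the current state
    return hidden, observed

hidden, observed = sample_hmm(10, random.Random(0))
print(observed)  # what a measurement would see; `hidden` stays unobserved
```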
Measurement Process
• Step 1: Initialize the Markov model.
• Step 2: Observe Tor traffic and use PrivCount to count.
• Step 3: Update the model.
Measure HMM path with PrivCount
  - Observe inter-stream delays
  - Find the most likely HMM path (Viterbi)
  - Count HMM frequencies
  - Update HMM probabilities using a weight parameter
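The loop above (find the most likely path, count frequencies, update with a weight) can be sketched for transition probabilities as follows. This is a simplified local illustration under stated assumptions: the two states, the delay categories, and all probabilities are hypothetical; counting happens in plain Python rather than through PrivCount; and the update is assumed to be a convex combination of old and observed values, which the slides do not specify.

```python
import math

def viterbi(obs, states, start, trans, emit):
    """Most likely hidden-state path for obs, computed in log space."""
    score = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            score[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][o])
            pointers[s] = best
        back.append(pointers)
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def transition_frequencies(path, states):
    """Observed transition frequencies along the path; None for states never left."""
    counts = {p: {s: 0 for s in states} for p in states}
    for a, b in zip(path, path[1:]):
        counts[a][b] += 1
    freqs = {}
    for p in states:
        total = sum(counts[p].values())
        freqs[p] = {s: counts[p][s] / total for s in states} if total else None
    return freqs

def update_transitions(trans, freqs, weight):
    """Assumed update rule: (1 - weight) * old + weight * observed."""
    new = {}
    for p, row in trans.items():
        if freqs[p] is None:
            new[p] = dict(row)  # no evidence for this state, keep the old row
        else:
            new[p] = {s: (1 - weight) * row[s] + weight * freqs[p][s] for s in row}
    return new

# Hypothetical model; observations are bucketed inter-stream delays.
STATES = ["burst", "pause"]
START = {"burst": 0.6, "pause": 0.4}
TRANS = {"burst": {"burst": 0.7, "pause": 0.3}, "pause": {"burst": 0.4, "pause": 0.6}}
EMIT = {"burst": {"short": 0.5, "medium": 0.4, "long": 0.1},
        "pause": {"short": 0.1, "medium": 0.3, "long": 0.6}}

path = viterbi(["short", "medium", "long"], STATES, START, TRANS, EMIT)
freqs = transition_frequencies(path, STATES)
new_trans = update_transitions(TRANS, freqs, weight=0.2)
print(path, new_trans)
```

A small weight makes each measurement round nudge the model rather than replace it, which is one way a single anomalous round could degrade the model only temporarily.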
Observations
For both the stream arrival model and the packet model:
  - The later models generally fit new data better than the earliest models.
  - Potentially anomalous measurement periods can cause some iterations to produce inferior models, which then continue to improve in following iterations.
  - Using the results of these measurements, the authors chose the stream and packet models with the best performance as the basis for Shadow experiments.
Traffic Generator & Models
• Tgen (Traffic Generator)
  - Based on action-dependency graphs.
  - Used for collecting performance benchmarks.
• Three Tgen configs are created:
  - Protocol Model
  - PrivCount Model
  - Single File Model
Evaluation
• To evaluate the whole four-step process, Shadow is used.
• At this stage, all three Tgen models are run on Shadow.
• The datasets Shadow usually uses are 5 years old, so RIPE Atlas data was used instead.
Results
• The PrivCount model requires more computational and memory resources to run in Shadow than the single file and protocol models.
• The generator times are negligible compared to the time to send and receive network traffic.
• The generator produced fewer than 165 streams per circuit across all generated circuits.
Critique
• In the first stage, the authors do not mention which traffic classes are the most important in the general measurement phase.
• An HMM is a generative, probabilistic model and can represent only a small fraction of distributions over the space of possible sequences.
• Due to their Markovian nature, HMMs do not take into account the sequence of states leading into any given state.
• All the algorithms used by HMMs are expensive in terms of memory and computation time.
• Shadow imitates system behaviour and network processes in order to carry out experiments on a single machine, and this is heavily subject to delays.
• The middle relay is useful because it can observe the difference between types of traffic. Middle relays were not included in this research.
What could be a possible solution?
• Machine learning can be an answer.
• We can have a machine learning process with a learning phase followed by a prediction phase.
• This model has two advantages over the one proposed in the paper:
  - It can detect the type of service being used.
  - It can detect whether the user is actually using the network.