RelSamp: Preserving Application Structure in Sampled Flow Measurements
Myungjin Lee, Mohammad Hajjat, Ramana Rao Kompella, Sanjay Rao
A plethora of Internet applications
[Figure: cycle around the Internet: 1) Emergence of new applications, 2) Measure/Monitor, 3) Characterization]
• Objectives
  • Re-provision networks
  • Detect undesirable behaviors of applications
  • Prepare network better against major application trends
Monitoring applications at an edge
[Figure: an edge router running Sampled NetFlow between the Internet and an enterprise network]
• Goal: Monitoring application behavior
  • Identify number of flows
  • Identify number of packets
• Current solution: Sampled NetFlow
  • Supported by most modern routers
• Key limitation: Application session structure gets distorted
  • Few sampled flows per application session
  • Few sampled packets per application session
Preserving application structure in flow measurements
• Benefit 1: Enables continuous monitoring of applications
  • Better understanding of communication patterns
  • Better understanding of characteristics (# of flows, packets)
• Benefit 2: Application classification becomes easier
  • Statistical machine learning techniques: SVM, C4.5, etc.
  • Social behavior-based classifier: BLINC
• Benefit 3: Detecting undesirable traffic patterns of an application
Contributions
• Introduce the notion of related sampling
  • Flows belonging to the same application session are sampled with higher probability
• Propose the RelSamp architecture for realizing related sampling
  • Uses three stages of sampling to preserve application structure
• Show efficacy in preserving application structure
  • Captures more flows per application session
  • Significant increase in application classification accuracy
Related sampling
[Figure: flows of App1, App2, and App3 shown in three panels: original application structure, Sampled NetFlow, and related sampling]
• Key idea: Sample more flows from fewer application sessions
Realizing related sampling
• Question 1: How to sample an application session?
• Question 2: How to sample packets within an application session?
Defining application session
• A sequence of packets from an application on a given host with inter-arrival time ≤ τ seconds
  • Packets may belong to different flows to different destinations
• Example 1: BitTorrent connections to several destinations within a short span of time constitute an application session
• Example 2: Web connections from a browser several seconds apart constitute different application sessions
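The definition above can be sketched offline as a gap-based split of a host's packets. This is a minimal illustration, not the authors' code: the function name `split_sessions` and the `(timestamp, host, flow_key)` tuple layout are assumptions, and the sketch groups by host only, so concurrent applications on one host would be merged into a single session.

```python
from collections import defaultdict


def split_sessions(packets, tau):
    """Group packets into per-host sessions using an inter-arrival threshold.

    packets: iterable of (timestamp, host, flow_key) tuples (hypothetical layout).
    tau: maximum gap in seconds between consecutive packets of one session.
    Returns {host: [session, ...]}, where each session is a list of
    (timestamp, flow_key) pairs. Simplification: packets are grouped per
    host, not per application.
    """
    sessions = defaultdict(list)
    last_seen = {}
    for ts, host, flow_key in sorted(packets, key=lambda p: p[0]):
        # Start a new session on the first packet or after a gap longer than tau.
        if host not in last_seen or ts - last_seen[host] > tau:
            sessions[host].append([])
        sessions[host][-1].append((ts, flow_key))
        last_seen[host] = ts
    return dict(sessions)


if __name__ == "__main__":
    trace = [(0.0, "h1", "a"), (1.0, "h1", "b"), (10.0, "h1", "c")]
    print(split_sessions(trace, tau=5.0))
```

With τ = 5 seconds in this toy trace, flows "a" and "b" fall into one session and flow "c" starts a second one, which mirrors the two examples on the slide.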
Sampling an application session
• One possible approach: Similar to Sampled NetFlow
  • Sample packets with some probability
  • Create an application session record if no record exists
  • Update the application session record
• Problem: Hard to do in an online fashion
  • No application session identifier (like flow key)
  • Need to know all flows that constitute an application session
  • DPI-based techniques are both difficult and incomplete
Our approach: sampling hosts
• Observation: A host is a superset of its application sessions
• Sample more flows from the same host
  • Flows originating at the same host close together in time typically belong to a few application sessions
  • About 80% of hosts run fewer than 2 applications in our study
• More details in the paper
RelSamp design
• Three-stage sampling process consisting of host, flow, and packet selection stages
• Host stage: hash-based sampling
  • No state maintained on a per-application basis
  • Many application sessions for a given host are possibly sampled
  • Change hash function periodically to track different hosts
• Flow and packet stages: random packet sampling
  • Controls fraction of flows sampled in an application session and packets sampled in a flow
• Post processing: Can separate flow records into application sessions using port-based/statistical classifiers
RelSamp architecture
[Figure: packet pipeline through a host-level bias stage (Ph), a flow-level bias stage (Pf), and a packet-level bias stage (Pp) into flow memory; the host stage checks whether H(SrcIP) falls within a selection range of the hash space]
• Ph = selection range / hash space
• Flow stage: if (random no. ≤ Pf && no flow record) create a flow record
• Packet stage: if (random no. ≤ Pp && flow record) update the flow record
• Tunable parameters: Ph, Pf, Pp
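The three stages on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the router implementation: the class name `RelSampSketch`, the SHA-256 hash, the 32-bit hash space, and the seed-based rehash are all hypothetical. It also assumes that packets from unselected hosts are dropped, that the packet creating a flow record is counted in it, and that each packet passes through either the flow stage or the packet stage, not both.

```python
import hashlib
import random

HASH_SPACE = 2 ** 32  # assumed size of the hash space


def host_hash(src_ip, seed):
    """H(SrcIP): map a source IP into the hash space (SHA-256 is an assumption)."""
    digest = hashlib.sha256(f"{seed}:{src_ip}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


class RelSampSketch:
    """Host, flow, and packet stages applied to one packet at a time."""

    def __init__(self, ph, pf, pp, seed=0, rng=None):
        self.pf, self.pp, self.seed = pf, pp, seed
        # Ph = selection range / hash space
        self.selection_range = int(ph * HASH_SPACE)
        self.rng = rng or random.Random()
        self.flow_memory = {}  # flow key -> packet and byte counters

    def host_selected(self, src_ip):
        """Host stage: deterministic for a given host and seed."""
        return host_hash(src_ip, self.seed) < self.selection_range

    def rehash(self, new_seed):
        """Change the hash function so that different hosts get tracked."""
        self.seed = new_seed

    def process(self, src_ip, flow_key, size):
        """Return True if the packet was recorded in flow memory."""
        if not self.host_selected(src_ip):
            return False
        record = self.flow_memory.get(flow_key)
        if record is None:
            # Flow stage: create a record with probability Pf.
            if self.rng.random() <= self.pf:
                self.flow_memory[flow_key] = {"packets": 1, "bytes": size}
                return True
            return False
        # Packet stage: update the existing record with probability Pp.
        if self.rng.random() <= self.pp:
            record["packets"] += 1
            record["bytes"] += size
            return True
        return False
```

Because the host decision depends only on the hash, every flow from a selected host gets a chance at the flow stage, which is what biases sampling toward flows of the same application session.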
Exploring parametric space
• Router sampling budget Pe = f(Ph, Pf, Pp)
  • Trade-off between accuracy of flow statistics and # flows/application session
• Parameters can be tuned depending on
  • Objective
  • Network environment
• Examples of tuning parameters by objective
  • Application classification: low Ph, high Pf, low Pp
  • Application characterization: lower Ph, high Pf, high Pp
  • Flow statistics of all flows: Ph = Pf = Pp = Pe
Evaluation goals
• Application characterization
  • Question 1: Is RelSamp effective at sampling more flows in an application session?
  • Question 2: Can RelSamp estimate statistics of an application session?
• Application classification
  • Question 3: Is sampling more flows in an application session beneficial for application classification?
Experimental setup
• Evaluation of effectiveness for capturing more flows
  • Trace 1: 1-hour packet trace collected at an edge
  • RelSamp configuration (other settings in paper): Capture more flows of app session from many hosts
    • , , ()
• Evaluation of application classification accuracy
  • Trace 2: 13-hour full-payload trace captured at a dorm network
  • RelSamp setting: Similar setting, but varies from 0.1 to 1.0
  • Classifiers: BLINC [SIGCOMM ’05], SVM, and C4.5
  • Ground truth is obtained using DPI-based classifier (tstat)
Flows per application session
[Figure: CDF of # captured flows / # total flows in an app session]
• More flows per app session
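The x-axis metric of this plot, captured flows over total flows per application session, can be computed as sketched below. The function names and the dictionary layout (session id mapped to a set of flow keys) are assumptions made for illustration; the slides do not describe how the authors computed the curve.

```python
def capture_ratios(total_flows, captured_flows):
    """Per-session ratio of captured flows to total flows.

    total_flows, captured_flows: {session_id: set of flow keys} (hypothetical layout).
    Sessions missing from captured_flows count as zero captured flows.
    """
    return {
        sid: len(captured_flows.get(sid, set()) & flows) / len(flows)
        for sid, flows in total_flows.items()
        if flows
    }


def empirical_cdf(values):
    """Return sorted (value, cumulative fraction) points for plotting a CDF."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


if __name__ == "__main__":
    total = {"s1": {"a", "b", "c", "d"}, "s2": {"e", "f"}}
    captured = {"s1": {"a", "b"}, "s2": {"e", "f"}}
    print(empirical_cdf(capture_ratios(total, captured).values()))
```

A sampler that preserves application structure shifts this curve to the right, since more sessions have a large fraction of their flows captured.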
Accuracy of BLINC classifier
[Figure: accuracy (%) versus sampling rate; annotation: ~50% increase]
• Note: classification results on flows using non-standard ports
Related work
• Flow Sampling [ToN ’06]
  • Samples flows once flow record is created
• Flow Slices [IMC ’05]
  • Focuses on controlling router resources (CPU and memory)
• cSamp [NSDI ’08]
  • Supports sampling of all traffic by coordinating various vantage points in a network
• FlexSample [IMC ’08]
  • Supports monitoring of traffic subpopulations, but needs to maintain extra state for approximate checking of predicates
Summary
• Introduced the notion of related sampling
  • Samples related flows in the same application session with higher probability
• Proposed the RelSamp architecture
  • Preserves application structure in sampled flow records
• Effective at preserving application session structure
  • 5-10x more flows per application session compared to Sampled NetFlow
  • Up to 50% higher classification accuracy than Sampled NetFlow
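Evaluation method of classification techniques
[Figure: a packet trace feeds the tstat DPI-based classifier, which produces the ground truth, and three samplers (RelSamp, Sampled NetFlow, Flow Sampling), which produce flow records 1, 2, and 3; the flow records go to a classification algorithm (e.g., BLINC, SVM, C4.5), and a report compares its output with the ground truth]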
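The final comparison step in this pipeline can be sketched as a per-flow label check against the DPI ground truth. The function name `score_classifier` and the flow-key-to-label dictionaries are hypothetical, and the accuracy definition used here (correct flows over classified flows that have a ground-truth label) is an assumption; the slides do not state the exact formula.

```python
def score_classifier(ground_truth, predicted):
    """Compare predicted application labels with ground-truth labels.

    ground_truth, predicted: {flow_key: application label} (hypothetical layout).
    Returns (number of correctly classified flows, accuracy), where accuracy
    is computed over flows present in both dictionaries (assumed definition).
    """
    common = [key for key in predicted if key in ground_truth]
    correct = sum(1 for key in common if predicted[key] == ground_truth[key])
    accuracy = correct / len(common) if common else 0.0
    return correct, accuracy


if __name__ == "__main__":
    truth = {"f1": "p2p", "f2": "web", "f3": "mail"}
    guess = {"f1": "p2p", "f2": "p2p"}
    print(score_classifier(truth, guess))
```

Running the same scoring over the flow records of each sampler gives one point per sampler and sampling rate, which is the shape of the comparison plots in this deck.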
Comparison with other solutions using BLINC
[Figure: # of accurately classified flows versus sampling rate]
• Note: classification results on flows using non-standard ports