Automatically Inferring Patterns of Resource Consumption in Network Traffic
Cristian Estan, Stefan Savage, George Varghese
University of California, San Diego
Who is using my link? Traffic Clusters - 2003
Looking at the traffic: too much data for a human. Do something smarter!
Looking at traffic aggregates
[Figure: traffic reports aggregated on single header fields (Src. IP, Dest. IP, Src. port, Dest. port, Protocol, Src. net, Dest. net), annotated with the questions "Where does the traffic come from?", "What apps are used?" and "Which network uses web and which one kazaa?", and with the observation "Most traffic goes to the dorms …"]
• Aggregating on individual packet header fields gives useful results, but
  • Traffic reports are not always at the right granularity (e.g. individual IP address, subnet, etc.)
  • They cannot show aggregates defined over multiple fields (e.g. which network uses which application)
• The traffic analysis tool should automatically find aggregates over the right fields at the right granularity
Ideal traffic report
[Figure: example report annotated with "Web is the dominant application", "The library is a heavy user of web", "That's a big flash crowd!" and "This is a Denial of Service attack !!"]
This paper is about giving the network administrator insightful traffic reports
Contributions of this paper
• Approach
• Definitions
• Algorithms
• System
• Experience
Approach
• Characterize the traffic mix by describing all important traffic aggregates
• Multidimensional aggregates (e.g. a flash crowd described by protocol, port number and IP address)
• Aggregates at the right level of granularity (e.g. computer, subnet, ISP)
• Traffic analysis is automated: it finds insightful data without human guidance
Definition: traffic clusters
• Traffic clusters are the multidimensional traffic aggregates identified by our reports
• A cluster is defined by a range for each field
• The ranges come from natural hierarchies (e.g. the IP prefix hierarchy), so the aggregates are meaningful
• Example
  • Traffic aggregate: incoming web traffic for the CS Dept.
  • Traffic cluster: ( SrcIP=*, DestIP in 132.239.64.0/21, Proto=TCP, SrcPort=80, DestPort in [1024,65535] )
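The cluster definition above can be sketched as a small data structure with one range per header field. This is only an illustration: the class name `Cluster`, its field names and the `matches` helper are assumptions made for this example, not part of AutoFocus.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cluster:
    """Hypothetical representation: one range per packet header field."""
    src_net: ipaddress.IPv4Network   # 0.0.0.0/0 plays the role of "*"
    dst_net: ipaddress.IPv4Network
    proto: Optional[str]             # None means any protocol
    src_ports: range
    dst_ports: range

    def matches(self, src_ip, dst_ip, proto, src_port, dst_port):
        """Return True if a packet header falls inside every range."""
        return (
            ipaddress.ip_address(src_ip) in self.src_net
            and ipaddress.ip_address(dst_ip) in self.dst_net
            and (self.proto is None or self.proto == proto)
            and src_port in self.src_ports
            and dst_port in self.dst_ports
        )


# The example cluster from the slide: incoming web traffic for the CS Dept.
cs_web_in = Cluster(
    src_net=ipaddress.ip_network("0.0.0.0/0"),
    dst_net=ipaddress.ip_network("132.239.64.0/21"),
    proto="TCP",
    src_ports=range(80, 81),
    dst_ports=range(1024, 65536),
)

# Made-up packet headers, used only to exercise the check.
inside = cs_web_in.matches("192.0.2.7", "132.239.64.10", "TCP", 80, 40000)
outside_prefix = cs_web_in.matches("192.0.2.7", "132.239.72.1", "TCP", 80, 40000)
wrong_proto = cs_web_in.matches("192.0.2.7", "132.239.64.10", "UDP", 80, 40000)
```

Using a prefix for addresses and a `range` for ports keeps each field tied to its natural hierarchy, which is what makes a cluster readable as a meaningful aggregate.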
Definition: traffic report
• Traffic reports give the volume of chosen traffic clusters
• To keep the report size manageable, describe only clusters above a threshold (e.g. H = total traffic / 20)
• To avoid redundant data, compress by omitting clusters whose traffic can be inferred (up to error H) from non-overlapping, more specific clusters in the report
• To highlight non-obvious aggregates, prioritize using an unexpectedness label
• Example
  • 50% of all traffic is web
  • Prefix B receives 20% of all traffic
  • The web traffic received by prefix B is 15% of all traffic instead of the expected 50% × 20% = 10%, so the unexpectedness label is 15% / 10% = 150%
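The arithmetic in the example can be written out as a short sketch. The function names `threshold` and `unexpectedness` are illustrative assumptions; the numbers are the ones on the slide, and the total of 1000 units is made up.

```python
def threshold(total_traffic, fraction=1 / 20):
    """Report threshold H, here the slide's example of total traffic / 20."""
    return total_traffic * fraction


def unexpectedness(actual_share, *marginal_shares):
    """Ratio of a cluster's actual share of traffic to the share expected
    if its fields were independent (the product of the marginal shares)."""
    expected = 1.0
    for share in marginal_shares:
        expected *= share
    return actual_share / expected


H = threshold(1000)                      # assumed total of 1000 traffic units
label = unexpectedness(0.15, 0.50, 0.20) # web traffic received by prefix B
print(f"H = {H:g}, unexpectedness = {label:.0%}")
```

A label far from 100% in either direction marks a cluster whose volume is not simply explained by its more general parents.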
Algorithms and theory
• Algorithms and theoretical bounds are in the paper
  • Unidimensional reports are easy to compute
  • Multidimensional reports are exponentially harder as we add more fields
• Next few slides
  • Example of unidimensional compression
  • Example of the structure of the multidimensional cluster space
Unidimensional report example: hierarchy (threshold = 100)
[Figure: IP prefix tree with the traffic of each node]
• /28: 10.0.0.0/28 = 500
• /29: 10.0.0.0/29 = 120, 10.0.0.8/29 = 380
• /30: 10.0.0.0/30 = 50, 10.0.0.4/30 = 70, 10.0.0.8/30 = 305, 10.0.0.12/30 = 75
• /31: 10.0.0.2/31 = 50, 10.0.0.4/31 = 70, 10.0.0.8/31 = 270, 10.0.0.10/31 = 35, 10.0.0.14/31 = 75
• Addresses: 10.0.0.2 and 10.0.0.3 (15 and 35), 10.0.0.4 and 10.0.0.5 (30 and 40), 10.0.0.8 = 160, 10.0.0.9 = 110, 10.0.0.10 = 35, 10.0.0.14 = 75
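Unidimensional report example: compression
[Figure: the prefix tree restricted to clusters above the threshold, before and after compression]
• Clusters above the threshold: 10.0.0.0/28 = 500, 10.0.0.0/29 = 120, 10.0.0.8/29 = 380, 10.0.0.8/30 = 305, 10.0.0.8/31 = 270, 10.0.0.8 = 160, 10.0.0.9 = 110
• 10.0.0.8/29 is kept because 380 - 270 ≥ 100
• 10.0.0.8/30 is omitted because 305 - 270 < 100
• Compressed report: 10.0.0.0/29 = 120, 10.0.0.8/29 = 380, 10.0.0.8 = 160, 10.0.0.9 = 110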
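The thresholding and compression steps of this example can be reproduced with a short bottom-up pass over the prefix tree. This is a minimal sketch under stated assumptions, not necessarily the algorithm given in the paper: the function names `prefix_totals` and `compress` are invented here, and the split of traffic inside the address pairs (.2, .3) and (.4, .5) is assumed, since only their totals of 50 and 70 affect the result.

```python
import ipaddress
from collections import defaultdict

# Per-address traffic from the slide. The split inside the pairs (.2, .3)
# and (.4, .5) is an assumption; only the pair totals matter here.
LEAVES = {
    "10.0.0.2": 15, "10.0.0.3": 35, "10.0.0.4": 30, "10.0.0.5": 40,
    "10.0.0.8": 160, "10.0.0.9": 110, "10.0.0.10": 35, "10.0.0.14": 75,
}


def prefix_totals(leaves, root="10.0.0.0/28"):
    """Sum traffic for every prefix between each address and the root.
    Assumes every address lies inside the root prefix."""
    root_net = ipaddress.ip_network(root)
    totals = defaultdict(int)
    for addr, traffic in leaves.items():
        net = ipaddress.ip_network(addr)      # a /32 for a single address
        while net.prefixlen > root_net.prefixlen:
            totals[net] += traffic
            net = net.supernet()              # one bit shorter
        totals[net] += traffic                # net is now the root prefix
    return dict(totals)


def compress(totals, threshold):
    """Keep a prefix only if its traffic exceeds, by at least the threshold,
    the traffic already explained by reported, more specific prefixes."""
    report = {}
    explained = defaultdict(int)
    # Longest prefixes first, so children are handled before their parents.
    for net in sorted(totals, key=lambda n: n.prefixlen, reverse=True):
        below = explained[net]
        if totals[net] - below >= threshold:
            report[net] = totals[net]
            below = totals[net]               # the whole prefix is now explained
        explained[net.supernet()] += below
    return report


totals = prefix_totals(LEAVES)
above = {str(n): t for n, t in totals.items() if t >= 100}
report = {str(n): t for n, t in compress(totals, 100).items()}
```

With the slide's threshold of 100, `above` holds the seven clusters that pass thresholding and `report` holds the four that survive compression, with single addresses printed as /32 prefixes.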
Multidimensional structure example
[Figure: two hierarchies. Source net: All traffic, then US and EU, then CA, NY, GB and DE. Application: All traffic, then Web and Mail. Node labels shown in the combined space: US Web; US, CA, Web]
• Nodes (clusters) have multiple parents
• Nodes (clusters) overlap
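Both properties can be illustrated with a toy two-field cluster space. The sketch below assumes that CA and NY sit under US and that GB and DE sit under EU, which is a reading of the figure's layout; the function names are invented for this example.

```python
# Assumed hierarchies, read off the figure labels; "All" stands for "All traffic".
SOURCE_PARENT = {"US": "All", "EU": "All", "CA": "US", "NY": "US", "GB": "EU", "DE": "EU"}
APP_PARENT = {"Web": "All", "Mail": "All"}


def parents(cluster):
    """A cluster has one parent per field that can still be generalized."""
    src, app = cluster
    result = []
    if src in SOURCE_PARENT:
        result.append((SOURCE_PARENT[src], app))
    if app in APP_PARENT:
        result.append((src, APP_PARENT[app]))
    return result


def chain(value, parent_of):
    """The value itself followed by all of its ancestors."""
    out = [value]
    while value in parent_of:
        value = parent_of[value]
        out.append(value)
    return out


def overlap(a, b):
    """Two clusters share traffic if, in every field, one value is an
    ancestor of (or equal to) the other."""
    def related(x, y, parent_of):
        return x in chain(y, parent_of) or y in chain(x, parent_of)
    return related(a[0], b[0], SOURCE_PARENT) and related(a[1], b[1], APP_PARENT)


us_web_parents = parents(("US", "Web"))
us_and_web_overlap = overlap(("US", "All"), ("All", "Web"))
ca_and_ny_overlap = overlap(("CA", "Web"), ("NY", "Web"))
```

Here ("US", "Web") has the two parents ("All", "Web") and ("US", "All"), and those two parents overlap without either containing the other. Neither situation can arise in a single prefix tree.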
System: AutoFocus
[Figure: system diagram with the components packet header trace, traffic parser, cluster miner, grapher and Web-based GUI, plus inputs labeled "names" and "categories"]
Structure of regular traffic mix
[Figure label: SD-NAP]
• Backups from CAIDA to tape server
• Semi-regular time pattern
• FTP from SLAC Stanford
• Scripps web traffic
• Web & Squid servers
• Large ssh traffic
• Steady ICMP probing from CAIDA
Analysis of unusual events
[Figure label: Site 2]
• UCSD to UCLA route change
• Sapphire/SQL Slammer worm
Conclusions
• Multidimensional traffic clusters using natural hierarchies describe traffic aggregates
• Traffic reports using thresholding automatically identify conspicuous resource consumption at the right granularity
• Compression produces compact traffic reports, and unexpectedness labels highlight non-obvious aggregates
• Our prototype system, AutoFocus, provides insights into the structure of regular traffic and unexpected events
Thank you!
Alpha version of AutoFocus downloadable from http://ial.ucsd.edu/AutoFocus/
Any questions?
Acknowledgements: NIST, NSF, Vern Paxson, David Moore, Liliana Estan, Jennifer Rexford, Alex Snoeren, Geoff Voelker
Bounds and running times
Open questions
• Are there tighter bounds for the size of the reports?
• Are there algorithms that produce smaller results?
• Are there algorithms that compute traffic reports more efficiently? In streaming fashion?
Delta reports
• Why repeat the same traffic report if the traffic doesn't change from one day to the next?
• Delta reports describe the clusters that increased or decreased by more than the threshold from one interval to the next
• On related traffic mixes, delta reports are much smaller than traffic reports
• Multidimensional compression is very hard for delta reports
• We have only an exponential algorithm for the cluster delta
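The thresholding part of a delta report can be sketched as a comparison of per-cluster volumes in two intervals. This illustration leaves out compression, which the slide describes as the hard part; the function name `delta_report` and the traffic figures are made up for the example.

```python
def delta_report(before, after, threshold):
    """Clusters whose traffic changed by more than the threshold.
    `before` and `after` map a cluster label to its traffic volume;
    a cluster missing from one interval counts as zero traffic there."""
    report = {}
    for cluster in before.keys() | after.keys():
        change = after.get(cluster, 0) - before.get(cluster, 0)
        if abs(change) > threshold:
            report[cluster] = change
    return report


# Hypothetical volumes for two consecutive days.
day1 = {"web": 500, "ssh": 120, "backup": 300}
day2 = {"web": 510, "ssh": 10, "backup": 300, "udp/1434": 400}

deltas = delta_report(day1, day2, threshold=100)
```

In this made-up case only the ssh drop and the new UDP port 1434 traffic are reported, while the unchanged web and backup clusters are left out.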
Greedy compression algorithm
Multidimensional report example: thresholding and compression
System details