Explore methodologies, measurements, and data cleaning strategies in analyzing P2P traffic, with insights on host distribution, connectivity, traffic volume, and more. Understand the dynamics, limitations, and impact of P2P protocols on network traffic.
Analyzing Peer-to-Peer Traffic Across Large Networks • Jia Wang • Joint work with Subhabrata Sen • AT&T Labs - Research
P2P applications • Distributed file sharing • Napster, Gnutella, FastTrack, eDonkey, DirectConnect… • Searching vs. data-fetching phases • All communications occur over default ports • SuperNodes and Hubs • Why is this interesting? • Large and growing traffic volume Analyzing peer-to-peer traffic across large networks
Outline • Methodology • Data collection • Characterization metrics • Analysis results • Traffic volume and overlay topology • System dynamics • Traffic characterization • P2P vs Web
Methodology • Challenges • Decentralized system • Transient peer membership • Some popular protocols are closed and proprietary • Large-scale passive measurement • Flow-level data from routers across a large tier-1 ISP backbone • Analyze both signaling and data-fetching traffic • 3 levels of granularity: IP, prefix, AS • P2P protocols (default ports) • FastTrack: 1214 (including Morpheus) • Gnutella: 6346/6347 • DirectConnect: 411/412
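As a minimal sketch of the port-based identification listed on this slide, the snippet below labels a flow by its port numbers. The port table comes from the slide; the function name and the rule that a match on either the source or the destination port is enough are assumptions made for illustration, not the authors' stated procedure.

```python
from typing import Optional

# Default ports as listed on the slide.
P2P_PORTS = {
    1214: "FastTrack",      # includes Morpheus
    6346: "Gnutella",
    6347: "Gnutella",
    411: "DirectConnect",
    412: "DirectConnect",
}


def classify_flow(src_port: int, dst_port: int) -> Optional[str]:
    """Return the P2P protocol for a flow, or None if neither port matches.

    Assumption: a flow counts as P2P when either endpoint uses a default port.
    """
    return P2P_PORTS.get(src_port) or P2P_PORTS.get(dst_port)


if __name__ == "__main__":
    print(classify_flow(1214, 40000))
    print(classify_flow(51000, 6347))
    print(classify_flow(80, 50000))
```

Because only port numbers are inspected, this kind of classifier needs no knowledge of the protocol itself, but it misses any peer configured to use a non-default port.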
Methodology Discussion • Advantages • Requires minimal knowledge of P2P protocols: port number • Large scale non-intrusive measurement • More complete view of P2P traffic • Allows localized analysis • Limitations • Flow-level data: no application-level details • Incomplete traffic flows • Other issues • DHCP, NAT, proxy • Host IP • Asymmetric IP routing
Measurements • Characterization • Overlay network topology • Traffic distribution • Dynamic behavior • Metrics • Host distribution • Host connectivity • Traffic volume • Mean bandwidth usage • Traffic pattern over time • Connection duration and on-time
Data cleaning • Invalid IPs • 10.0.0.0-10.255.255.255 • 172.16.0.0-172.31.255.255 • 192.168.0.0-192.168.255.255 • No matched prefixes in routing tables • Invalid AS numbers • > 64512 • Removed 4% of flows
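The cleaning rules above can be sketched as a single filter. The address ranges and the AS-number cutoff are taken from the slide; the function name, its argument layout, the `has_prefix` lookup hook, and the choice to drop a flow when either endpoint fails a check are hypothetical and only illustrate one way to apply the rules.

```python
import ipaddress
from typing import Callable

# Private address ranges listed on the slide, written as CIDR blocks.
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]
MAX_VALID_ASN = 64512  # the slide treats AS numbers > 64512 as invalid


def is_valid_flow(
    src_ip: str,
    dst_ip: str,
    src_asn: int,
    dst_asn: int,
    has_prefix: Callable[[str], bool] = lambda ip: True,
) -> bool:
    """Return True if the flow survives cleaning.

    `has_prefix` stands in for a routing-table lookup (hypothetical hook);
    the default accepts every address.
    """
    for ip in (src_ip, dst_ip):
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in PRIVATE_NETS):
            return False
        if not has_prefix(ip):
            return False
    return src_asn <= MAX_VALID_ASN and dst_asn <= MAX_VALID_ASN
```

For example, `is_valid_flow("10.1.2.3", "8.8.8.8", 7018, 15169)` is rejected because the source address is private.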
Overview of P2P traffic • 800 million flow records in total • FastTrack is the most popular system
Host distribution
Host connectivity FastTrack (9/14/2001) • Connectivity is very low for most hosts and very high for a few hosts • Distribution is less skewed at the prefix and AS levels
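To make the connectivity metric concrete, here is a small sketch that counts, for each host, the number of distinct hosts it exchanges flows with. That definition of connectivity, and the `(src, dst)` pair representation of a flow, are assumptions for illustration; the slides do not spell out the exact definition.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def host_connectivity(flows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each endpoint to the number of distinct peers seen in its flows."""
    peers: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in flows:
        peers[src].add(dst)
        peers[dst].add(src)
    return {host: len(seen) for host, seen in peers.items()}


if __name__ == "__main__":
    sample = [("a", "b"), ("a", "c"), ("b", "a"), ("a", "d")]
    print(host_connectivity(sample))
```

The same function works at the prefix or AS level if each endpoint is first mapped to its prefix or AS before being passed in.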
Traffic volume distribution FastTrack (9/14/2001) • Significant skews in traffic volume across granularities • A few entities source most of the traffic • A few entities receive most of the traffic
Mean bandwidth usage FastTrack (9/14/2001) • Upstream usage < downstream usage. Possible causes: • Asymmetric available bandwidth, e.g., DSL, cable • Users/ISPs rate-limiting upstream data transfers
Time-of-day effect FastTrack (9/14/2001 GMT) • Traffic volume exhibits a very strong time-of-day effect • Milder time-of-day variation in the number of hosts in the system
Host connection duration & on-time FastTrack (9/14/2001) thd=30min • Substantial transience: most hosts stay in the system for a short time • Distribution less skewed at the prefix and AS levels • Using per-cluster or per-AS indexing/caching nodes may help
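One plausible reading of "thd=30min" is an idle threshold: flows from the same host separated by a gap of at most 30 minutes belong to one session, and on-time is the total length of those sessions. The sketch below implements that reading only; the interpretation, the function name, and the `(start, end)` interval format in seconds are assumptions, not something the slide confirms.

```python
from typing import List, Tuple


def on_time(intervals: List[Tuple[int, int]], threshold: int = 30 * 60) -> int:
    """Total session time in seconds for one host.

    `intervals` holds (start, end) times of the host's flows. Flows whose
    gap is at most `threshold` seconds are merged into a single session.
    """
    if not intervals:
        return 0
    ordered = sorted(intervals)
    total = 0
    cur_start, cur_end = ordered[0]
    for start, end in ordered[1:]:
        if start - cur_end <= threshold:
            # Gap is small enough: extend the current session.
            cur_end = max(cur_end, end)
        else:
            # Gap exceeds the threshold: close the session and start a new one.
            total += cur_end - cur_start
            cur_start, cur_end = start, end
    return total + (cur_end - cur_start)
```

With this reading, two 10-minute flows separated by a 10-minute gap count as one 30-minute session, while the same flows separated by 40 minutes count as 20 minutes of on-time.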
Traffic characterization • The power law • May not be a suitable model for P2P traffic • Relationship between metrics • Traffic volume • Number of IPs • On-time • Mean bandwidth usage
Traffic volume vs. on-time FastTrack (9/14/2001): top 1% hosts (73% volume) • Volume heavy hitters tend to have long on-times • Hosts with short on-times contribute small traffic volumes
Connectivity vs. on-time FastTrack (9/14/2001): top 1% hosts (73% volume) • Hosts with high connectivity have long on-times • Hosts with short on-times communicate with few other hosts
P2P vs Web • Observations • 97% of prefixes contributing P2P traffic also contribute Web traffic • Heavy-hitter prefixes for P2P traffic tend to be heavy hitters for Web traffic • Prefix stability: the daily traffic volume (in %) from a prefix does not change over days • Experiments: the top 0.01%, 0.1%, 1%, 10% heavy hitters => 10%, 30%, 50%, 90% of the traffic volume
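The heavy-hitter figures above relate a top fraction of prefixes to the share of traffic they carry. A minimal sketch of that calculation follows; the function name and the rounding rule (at least one entity, rounding the count up) are assumptions for illustration, and the sample volumes are made up.

```python
import math
from typing import Sequence


def top_share(volumes: Sequence[float], fraction: float) -> float:
    """Fraction of total volume contributed by the top `fraction` of entities."""
    ranked = sorted(volumes, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    # Assumption: always include at least one entity and round the count up.
    k = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:k]) / total


if __name__ == "__main__":
    # Hypothetical per-prefix byte counts, not data from the study.
    sample = [50, 20, 10, 10, 5, 2, 1, 1, 1, 0]
    print(top_share(sample, 0.1))
```

Running the same function on each day's per-prefix volumes, with a fixed set of heavy hitters, gives one simple way to track the prefix stability mentioned above.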
Traffic stability (March 2002; panels: top 0.01% prefixes, top 1% prefixes) • P2P traffic contributed by the top heavy-hitter prefixes is more stable than either Web or total traffic
Summary • Measure and characterize P2P traffic across a large network • Three popular P2P systems • Significant increase in both number of users and traffic volume • Traffic distributions are highly skewed • High-level system dynamics • P2P is a significant but stable component of Internet traffic
Acknowledgement • AT&T Labs • Matt Grossglauser, Carsten Lund, Jennifer Rexford, Matt Roughan, Fred True • External • Steve Gribble