Network Measurements, Modeling and Simulations Kun-chan Lan Department of Computer Science and Information Engineering klan@csie.ncku.edu.tw
Some Admin stuff • Paper review list due next week • The references for the homework are posted on the course webpage • We will have a guest speaker on 3/18 to talk about how to do game measurements (related to your homework 2) – No class lecture
A quick survey… Why do you come to this class? What do you want to get out of this course?
Learn about ns-2? • Learn how to measure traffic? • Learn how to use an emulator? • Somebody suggested you try it out? • None of the above (you have no idea why you came here!)
Outline • Model and simulate Internet traffic • It’s hard to model and simulate the Internet • Use measurement to improve the realism of your model • We advocate trace-driven simulation • Internet and wireless measurements
What is a model? • Abstraction of real world • Base of a network simulation • Topology model • e.g. “a dumbbell topology” • Traffic model • “80% TCP + 20% UDP” • Queuing model • e.g. “FIFO”, “Fair queuing”, etc. • …..
Role of simulation • Based on some particular models • Topology: e.g. dumbbell vs. tree • Traffic: e.g. TCP vs. UDP • … • Widely used by researchers to study the Internet • Millions of hosts in different administrative domains • Simulation vs. experiment (Why simulation?) • Repeatability • Configurability • Scalability • Explore complicated scenarios • Study “future” applications/protocols/networks
What simulation doesn’t do • Realism • Details of the simulation matter! • It’s your responsibility to know what level of detail you need to capture in the simulation • Prove correctness of the model • Only for validation! • The value of a simulation relies on a good model
It’s hard to simulate the Internet • Network heterogeneity • Rapid and unpredictable change
Network heterogeneity • Topology • Link properties • Protocol • Traffic • All of the above matter when you do the simulation
Difficulty in modeling topology • Constantly changing • Routing change • Link/node up and down • ISPs typically do not make topological information available • There is no “typical” topology • Depends on what you are simulating
Difficulty in modeling links • Large diversity • Speed: e.g. modem vs. fiber optic link • Loss: e.g. copper wire vs. 802.11 • Transmission: point-to-point vs. broadcast • Latency: DSL vs. satellite links • Routing-dependent • Asymmetry
Difficulty in modeling protocol • Differences in implementations • 400 different TCP implementations • Different applications and different traffic mix
Difficulty in modeling traffic • Traffic is different everywhere • Effect of background traffic • Queuing, congestion • Some applications adapt to network conditions
Rapid and unpredictable changes • Change in TCP: Reno -> NewReno/SACK • Change in devices: PC -> handheld • Change in web: caching -> CDN • Change in killer application: • web -> p2p -> VoIP? • Change in physical layer: wired -> wireless
Coping strategy • OK, so it’s hard to simulate the Internet, but can we do something about it? • Yes • Systematically explore important parameters • Search for invariants
Network behavior as a function • Explore network behavior as a function of changing parameters • <observed traffic> = f(x1,x2,x3,…..) • Impossible to explore the whole set of parameters • Challenge: identify important parameters • Example parameters to which a simulation might be sensitive • Congestion • Topology • Router mechanism (routing, scheduling, etc.)
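A minimal sketch of why the whole parameter set cannot be explored: even a tiny grid multiplies quickly. The parameter names and values below are illustrative assumptions, not taken from the lecture, and `run_simulation` is a hypothetical placeholder for a real simulator run.

```python
import itertools

# Hypothetical parameter grid; names and values are illustrative only.
PARAMS = {
    "bottleneck_mbps": [1, 10, 100],
    "topology": ["dumbbell", "tree"],
    "queue": ["FIFO", "FQ"],
}

def parameter_grid(params):
    """Yield one configuration dict per combination of parameter values."""
    names = list(params)
    for values in itertools.product(*(params[name] for name in names)):
        yield dict(zip(names, values))

def run_simulation(config):
    """Hypothetical hook: replace with a call into your simulator."""
    raise NotImplementedError

configs = list(parameter_grid(PARAMS))
print(len(configs), "configurations to simulate")
```

Adding one more parameter with k values multiplies the number of runs by k, which is why identifying the important parameters comes first.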
Search for Invariants • Invariant: behavior that holds in a very wide range of environments • Examples • Diurnal patterns • Self-similarity • Poisson session arrival • Heavy-tailed distribution • Geographical topology • Extract invariants from real world data • Extensive measurements!
Outline • Model and simulate Internet traffic • It’s hard to model and simulate the Internet • Internet and wireless measurements • Case study: modeling heavy-hitter traffic
Why measure? • To tell us which behaviors are invariants and which are just artifacts of the system • A basis for realistic modeling and simulation • A common practice in other scientific disciplines (physics, biology, etc.)
A measurement plan • What questions do you want to answer? • Testbed setup • How to collect the traces? And for how long? • What to collect? (What are your performance metrics?) • Data analysis • All of these should be in your project report!
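Two of the invariants listed above (Poisson session arrivals and heavy-tailed sizes) can be turned into a synthetic workload. This is only a sketch: the rate, the Pareto shape of 1.2 and the 1000-byte minimum are assumed values chosen for illustration, and real values should come from measurements.

```python
import random

def poisson_arrivals(rate, n, rng):
    """Return n arrival times (seconds) of a Poisson process with the given rate."""
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)  # exponential gaps between sessions
        times.append(t)
    return times

def pareto_sizes(alpha, minimum, n, rng):
    """Return n Pareto-distributed sizes with shape alpha and scale minimum."""
    return [minimum * rng.paretovariate(alpha) for _ in range(n)]

rng = random.Random(1)  # fixed seed so runs are repeatable
arrivals = poisson_arrivals(rate=2.0, n=10000, rng=rng)
sizes = pareto_sizes(alpha=1.2, minimum=1000, n=10000, rng=rng)

largest = sorted(sizes, reverse=True)[:100]  # the largest 1% of sessions
print("share of bytes in the largest 1% of sessions:", sum(largest) / sum(sizes))
```

With a heavy tail, a small fraction of sessions typically carries a large fraction of the bytes, which is the kind of behavior a trace should be checked for.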
TCP over GPRS network How fair is TCP over GPRS?
Things I am going to tell you next • What can you measure? • Things that you need to know when you measure • Where can you get Internet traffic measurements for free?
Measure the Internet • What can you measure • Traffic • Routing • Topology • Performance • Multicast • Wireless/Mobility
Tools for measuring traffic • tcpdump/Ethereal (libpcap) • NetFlow • NeTraMet/RTG (SNMP)
tcpdump/Ethereal • tcpdump • Most commonly used packet collector • Based on the libpcap API • Output can be easily analyzed using awk/perl scripts • Ethereal • GUI-based • Supports various trace formats, including tcpdump, snoop, etc. • Supports various link-layer headers, including 802.11, ATM, etc. • tcpdpriv • A commonly used packet anonymizer (to share traces with others) • libpcap-based • Link-level headers are passed through unchanged.
Usage of tcpdump tcpdump [ -adeflnNOpqStvx ] [ -c count ] [ -F file ] [ -i interface ] [ -r file ] [ -s snaplen ] [ -T type ] [ -w file ] [ expression ] Must run as root or have sudo permission
<option> -i Listen on interface. If unspecified, tcpdump searches the system interface list for the lowest numbered, configured up interface (excluding loopback). -n Don't convert addresses (i.e., host addresses, port numbers, etc.) to names.
<option> -p Don't put the interface into promiscuous mode. -q Quick (quiet?) output. Print less protocol information so output lines are shorter. -r Read packets from file (which was created with the -w option). Standard input is used if file is ``-''.
<option> -w Write the raw packets to file rather than parsing and printing them out. They can later be printed with the -r option. Standard output is used if file is ``-''. -S Print absolute, rather than relative, TCP sequence numbers.
<option> -s Snarf snaplen bytes of data from each packet rather than the default of 68. 68 bytes is adequate for IP, ICMP, TCP and UDP but may truncate protocol information from name server and NFS packets. Packets truncated because of a limited snapshot are indicated in the output with ``[|proto]'', where proto is the name of the protocol level at which the truncation has occurred. Taking larger snapshots both increases the amount of time it takes to process packets and, effectively, decreases the amount of packet buffering. This may cause packets to be lost. Limit snaplen to the smallest number that will capture the protocol information you're interested in.
<option> -t Don't print a timestamp on each dump line. -tt Print an unformatted timestamp on each dump line. -v (Slightly more) verbose output. For example, the time to live and type of service information in an IP packet is printed. -vv Even more verbose output. For example, additional fields are printed from NFS reply packets. -x Print each packet in hex.
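The options above can be combined into one capture command. The sketch below only builds the argument list and does not start a capture; the interface name `eth0`, the file name `trace.pcap` and the snaplen of 96 are assumptions for illustration.

```python
import shlex

def tcpdump_command(interface, outfile, snaplen=68, expression=None):
    """Build a tcpdump argument list that writes raw packets to outfile."""
    argv = [
        "tcpdump",
        "-n",                # do not convert addresses to names
        "-i", interface,     # listen on this interface
        "-s", str(snaplen),  # bytes captured from each packet
        "-w", outfile,       # write raw packets instead of printing them
    ]
    if expression:
        argv.append(expression)  # filter expression passed as one argument
    return argv

cmd = tcpdump_command("eth0", "trace.pcap", snaplen=96, expression="tcp port 21")
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd)` by a user with root or sudo permission; the trace can then be read back with `-r trace.pcap`.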
<expression> • Selects which packets will be dumped. If no expression is given, all packets will be dumped. Otherwise, only packets for which expression is `true' will be dumped. • The expression consists of one or more primitives. Primitives usually consist of an id (name or number) preceded by one or more qualifiers. • There are three different kinds of qualifier: <type>, <dir> and <proto>
<qualifier> <type> • what kind of thing the id name or number refers to • Possible types are host, net and port • E.g., `host csie.ncku.edu.tw', `net 146.132', `port 20' • If there is no type qualifier, host is assumed.
<qualifier> <dir> • specify a particular transfer direction to and/or from id. • Possible directions are src, dst, src or dst and src and dst. • E.g., `src csie.ncku.edu.tw', `dst net 146.132', `src or dst port ftp-data'. • If there is no dir qualifier, src or dst is assumed
<qualifier> <proto> • restrict the match to a particular protocol. • Possible protos are: ether, fddi, ip, arp, rarp, decnet, lat, sca, moprc, mopdl, tcp and udp. • E.g., `ether src server1.ncku.edu.tw', `arp net 128.3', `tcp port 21'. • If there is no proto qualifier, all protocols consistent with the type are assumed. E.g., `src mail.ncku.edu.tw' means `(ip or arp or rarp) src mail.ncku.edu.tw'
Complex expression • Complex filter expressions are built up by using the words and, or and not to combine primitives. • E.g., `host csie.ncku.edu.tw and not port ftp and not port ftp-data'. • Identical qualifier lists can be omitted. E.g., `tcp dst port ftp or ftp-data or domain' == `tcp dst port ftp or tcp dst port ftp-data or tcp dst port domain'.
Allowable primitives • dst host host • src host host • host host • ether dst ehost • ether src ehost • ether host ehost • gateway host
Allowable primitives • dst net net • src net net • net net • net net mask mask • net net/len True if the IP address matches net with a netmask len bits wide. May be qualified with src or dst. • dst port port • src port port • port port
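What `net net/len` tests can be reproduced with Python's standard `ipaddress` module. This is a sketch of the matching rule only, not of how tcpdump implements it; the addresses are made-up examples built on the `146.132` prefix used earlier.

```python
import ipaddress

def matches_net(addr, net):
    """True if addr falls inside net, written as prefix/len (e.g. 146.132.0.0/16)."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network(net)

def src_net(packet, net):
    """Equivalent of `src net net/len` for a packet given as a dict with a 'src' key."""
    return matches_net(packet["src"], net)

packet = {"src": "146.132.5.9", "dst": "10.0.0.1"}  # hypothetical packet
print(matches_net(packet["src"], "146.132.0.0/16"))  # source is inside the /16
print(matches_net(packet["dst"], "146.132.0.0/16"))  # destination is not
```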
Allowable primitives • less length True if the packet has a length less than or equal to length. This is equivalent to: len <= length. • greater length • ip proto protocol • True if the packet is an ip packet of protocol type protocol. Protocol can be a number or one of the names icmp, igrp, udp, nd, or tcp. Note that the identifiers tcp, udp, and icmp are also keywords and must be escaped via backslash (\) • ether broadcast • ip broadcast
Allowable primitives • ether multicast • ip multicast • ip, arp, rarp, decnet short for: ether proto p, where p is one of the above protocols. • tcp, udp, icmp short for: ip proto p
Relation operator • expr relop expr • relop is one of >, <, >=, <=, =, != • expr is an arithmetic expression composed of integer constants, the normal binary operators [+, -, *, /, &, |], a length operator, and special packet data accessors. • To access data inside the packet, use the following syntax: proto [ expr : size ]. Proto is one of ether, fddi, ip, arp, rarp, tcp, udp, or icmp. E.g., tcp[0] means the first byte of the TCP header • For example, `ether[0] & 1 != 0' catches all multicast traffic. The expression `ip[0] & 0xf != 5' catches all IP packets with options.
Combining primitives • Primitives may be combined using: • Negation (`!' or `not'). • Concatenation (`&&' or `and'). • Alternation (`||' or `or'). • Negation has highest precedence. Alternation and concatenation have equal precedence and associate left to right. • If an identifier is given without a keyword, the most recent keyword is assumed. • E.g., `not host vs and ace' is short for `not host vs and host ace', which should not be confused with `not ( host vs or ace )'
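The two byte-accessor examples above can be checked by hand on raw bytes. The sketch below mimics `proto[expr:size]` with a small helper; the frames and headers are hand-made test data, not captured traffic.

```python
def field(data: bytes, offset: int, size: int = 1) -> int:
    """Value of proto[offset:size]: size bytes at offset, in network byte order."""
    return int.from_bytes(data[offset:offset + size], "big")

def is_ether_multicast(frame: bytes) -> bool:
    """ether[0] & 1 != 0: lowest bit of the first destination byte is set."""
    return (field(frame, 0) & 1) != 0

def ip_has_options(ip_packet: bytes) -> bool:
    """ip[0] & 0xf != 5: header length is not the 5 words of an option-free header."""
    return (field(ip_packet, 0) & 0xF) != 5

# Hand-made test data (destination MAC, source MAC, EtherType).
broadcast_frame = bytes.fromhex("ffffffffffff" "001122334455" "0800")
unicast_frame = bytes.fromhex("001122334455" "66778899aabb" "0800")
plain_ip = bytes([0x45]) + bytes(19)         # version 4, header length 5 words
ip_with_options = bytes([0x46]) + bytes(23)  # version 4, header length 6 words
tcp_header = bytes.fromhex("0050" "c350") + bytes(16)

print(is_ether_multicast(broadcast_frame), is_ether_multicast(unicast_frame))
print(ip_has_options(plain_ip), ip_has_options(ip_with_options))
print(field(tcp_header, 0, 2))  # tcp[0:2] is the source port
```

Broadcast frames match the multicast test as well, since the all-ones address also has the lowest bit of its first byte set.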
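The difference between the two readings of the last example is easy to see on a few packets. In this sketch a packet is just a dict with `src` and `dst` host names; `vs` and `ace` are the host names from the slide, while `bob` and `carol` are made up.

```python
def host(pkt, name):
    """`host name`: true if name is the source or the destination."""
    return name in (pkt["src"], pkt["dst"])

def not_vs_and_ace(pkt):
    """`not host vs and ace`, i.e. (not host vs) and (host ace)."""
    return (not host(pkt, "vs")) and host(pkt, "ace")

def not_vs_or_ace(pkt):
    """`not ( host vs or ace )`: neither vs nor ace is involved."""
    return not (host(pkt, "vs") or host(pkt, "ace"))

packets = [
    {"src": "ace", "dst": "bob"},    # ace talking to someone other than vs
    {"src": "bob", "dst": "carol"},  # neither vs nor ace
    {"src": "vs", "dst": "ace"},     # vs talking to ace
]
for pkt in packets:
    print(pkt, not_vs_and_ace(pkt), not_vs_or_ace(pkt))
```

The first filter keeps traffic of ace that does not involve vs; the second keeps only traffic that involves neither host.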
NetFlow • Built-in service on most Cisco routers/switches that run Cisco IOS • Provides flow-level information • First packet in a flow is used to build an entry in the cache • Per-interface basis • Useful for accounting/billing, traffic monitoring, user profiling, data mining, etc.
More on NetFlow • Typical cache size: 4K-128K (typical DRAM size: 2M-8M) • Need to use the cache efficiently • When to expire NetFlow cache entries • Idle time > t • Long-lived flows (duration > 30min) • TCP connections with FIN or RST • When the cache becomes full (applying some heuristics to age flows)
Management of NetFlow • NetFlow FlowCollector • can collect flow info from multiple NetFlow-enabled devices • data volume reduction through selective filtering and aggregation • store flow information for off-line analysis • NetFlow FlowAnalyzer • data visualization: graphical data display • data export to external applications (such as Excel) • NetFlow Server • collect flow statistics from multiple FlowCollectors • further summarize NetFlow statistics by enabling bi-directional consolidation • store NetFlow statistics in a common commercial RDBMS (can be queried via SQL later) • encrypt and compress NetFlow statistics
NeTraMet • Collects flow data via SNMP • Builds up packet and byte counts for traffic flows • Flows are defined by their end-point addresses • Addresses can be Ethernet addresses, IP addresses or a combination of both • Can specify a set of rules to filter the flows of interest • Runs under DOS or Unix
RTG • An SNMP statistics monitoring system • Commonly used by ISPs • Collects time-series SNMP data from a large number of interfaces • Runs as a daemon • All collected data is inserted into a relational database where complex queries and reports may be generated via SQL • Can poll at sub-one-minute intervals • Utilities are included to generate traffic reports, 95th percentile reports and graphical data plots
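The four expiry rules above can be sketched as a small flow cache. This is a toy model, not Cisco's implementation: the 15-second idle timeout, the tiny cache size and the "evict the least recently active flow" heuristic are all assumptions made for illustration.

```python
from dataclasses import dataclass

TCP_FIN, TCP_RST = 0x01, 0x04  # TCP flag bits

@dataclass
class FlowEntry:
    first_seen: float
    last_seen: float
    packets: int = 0
    bytes: int = 0

class FlowCache:
    def __init__(self, max_entries=4096, idle_timeout=15.0, active_timeout=1800.0):
        self.max_entries = max_entries
        self.idle_timeout = idle_timeout      # assumed value for "idle time > t"
        self.active_timeout = active_timeout  # long-lived flows: 30 minutes
        self.entries = {}                     # flow key -> FlowEntry
        self.exported = []                    # (key, entry) pairs removed from the cache

    def _export(self, key):
        self.exported.append((key, self.entries.pop(key)))

    def expire(self, now):
        """Export flows that have been idle too long or have lived too long."""
        for key, entry in list(self.entries.items()):
            idle = now - entry.last_seen > self.idle_timeout
            long_lived = now - entry.first_seen > self.active_timeout
            if idle or long_lived:
                self._export(key)

    def update(self, key, size, now, tcp_flags=0):
        """Account one packet; the first packet of a flow creates its entry."""
        self.expire(now)
        if key not in self.entries:
            if len(self.entries) >= self.max_entries:
                # Assumed aging heuristic: evict the least recently active flow.
                victim = min(self.entries, key=lambda k: self.entries[k].last_seen)
                self._export(victim)
            self.entries[key] = FlowEntry(first_seen=now, last_seen=now)
        entry = self.entries[key]
        entry.last_seen = now
        entry.packets += 1
        entry.bytes += size
        if tcp_flags & (TCP_FIN | TCP_RST):
            self._export(key)  # the TCP connection has ended

cache = FlowCache(max_entries=2)
cache.update("A", 100, now=0)
cache.update("B", 200, now=1)
cache.update("C", 50, now=2)                     # cache full: A is evicted
cache.update("B", 10, now=3, tcp_flags=TCP_FIN)  # FIN: B is exported
cache.update("D", 1, now=30)                     # C has been idle for 28 s
print([key for key, _ in cache.exported])
```

In a real router the flow key would be built from the packet headers and the interface; plain strings stand in for it here.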
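A 95th percentile report of the kind mentioned above can be computed from polled byte counters. This sketch is not RTG's code: it assumes a fixed polling interval, ignores counter wrap-around, and uses the nearest-rank definition of the percentile; the counter values are invented.

```python
def rates_bps(counters, interval_s):
    """Turn successive byte-counter readings into bit rates (bits per second)."""
    return [(b - a) * 8 / interval_s for a, b in zip(counters, counters[1:])]

def percentile_95(samples):
    """Nearest-rank 95th percentile: the largest sample left after dropping the top 5%."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]

counters = [0, 3000, 9000]          # hypothetical byte counts polled every 300 s
rates = rates_bps(counters, 300)
print(rates)
print(percentile_95(list(range(1, 101))))
```

Dropping the top 5% of samples means short bursts do not dominate the reported rate.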