UW-Madison - FlowScan and Rate Limiting Adventures. I2 Techs Conference May 17, 2001 Michael Hare. Presentation Overview. FlowScan Controlling ResNet traffic: Some experiences with rate limits. FlowScan: A Network Traffic Reporting and Visualization Tool.
UW-Madison - FlowScan and Rate Limiting Adventures I2 Techs Conference May 17, 2001 Michael Hare
Presentation Overview • FlowScan • Controlling ResNet traffic: Some experiences with rate limits
FlowScan: A Network Traffic Reporting and Visualization Tool FlowScan is a software package for open systems that is freely available under the terms of the GNU General Public License. It was primarily developed by Dave Plonka of UW-Madison. FlowScan analyses and reports on flow data exported by IP routers. FlowScan produces graph images which provide a continuous, near real-time view of network traffic across a network's border, suitable for web pages.
Background on Flows • The notion of flow profiling was introduced by the research community. • Today, for performance and accounting reasons, flow profiling is built into some networking devices. • Flow export is not yet standards-based; FlowScan uses flows defined and exported by Cisco's NetFlow feature.
Sample Flows - FTP An IP flow is a unidirectional series of IP packets of a given protocol, travelling between a source and destination, within a certain period of time.
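The flow definition above can be sketched in a few lines of Python. This is only an illustration, not FlowScan's or NetFlow's actual code: the `Packet` tuple, the `packets_to_flows` helper, the 30-second idle timeout, and the inclusion of port numbers in the flow key are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical packet record used only for this sketch.
Packet = namedtuple("Packet", "ts src dst proto sport dport size")

def packets_to_flows(packets, idle_timeout=30.0):
    """Group packets into unidirectional flows.

    A flow is keyed by (src, dst, proto, sport, dport); a gap longer than
    idle_timeout seconds (an assumed value) starts a new flow.
    """
    active = {}     # key -> flow currently being accumulated
    finished = []   # flows closed by the idle timeout
    for p in sorted(packets, key=lambda p: p.ts):
        key = (p.src, p.dst, p.proto, p.sport, p.dport)
        flow = active.get(key)
        if flow is not None and p.ts - flow["last"] > idle_timeout:
            finished.append(flow)
            flow = None
        if flow is None:
            flow = {"key": key, "first": p.ts, "last": p.ts,
                    "packets": 0, "bytes": 0}
            active[key] = flow
        flow["last"] = p.ts
        flow["packets"] += 1
        flow["bytes"] += p.size
    finished.extend(active.values())
    return finished

# An FTP control exchange: each direction is its own flow.
example = [
    Packet(0.0, "10.0.0.1", "10.0.0.2", "tcp", 1030, 21, 60),
    Packet(0.1, "10.0.0.2", "10.0.0.1", "tcp", 21, 1030, 60),
    Packet(0.2, "10.0.0.1", "10.0.0.2", "tcp", 1030, 21, 80),
    Packet(100.0, "10.0.0.1", "10.0.0.2", "tcp", 1030, 21, 60),
]
flows = packets_to_flows(example)
```

The late packet at t=100 falls outside the assumed timeout, so the client-to-server direction is reported as two separate flows and the server-to-client direction as a third.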
FlowScan • FlowScan maintains counters based upon flow classifications and periodically exports information into databases. • Counters are currently maintained based on these flow attributes: • Protocols (ICMP, TCP, UDP) • Services (FTP, SMTP, HTTP, P2P Apps) • Subnets (if desired) • AS pairs • Works with most Cisco and Riverstone RS routers • Compatibility with Juniper's routers and packet-sampling-based flows is in the planning stages (More on this later)
Some Uses for FlowScan • Short term network analysis lets you discover recent changes in network behavior. Graphs over a short time frame are based upon five-minute intervals. • Long term network analysis aids in capacity planning and traffic shaping efforts.
Short-Term Network Analysis: Red Hat 7.1 Release Events, such as the release of Red Hat 7.1, are visible as jumps in outbound traffic patterns. Outbound Computer Science traffic increased from 10 Mb/s to nearly 80 Mb/s almost instantly.
Short-Term Network Analysis: DoS Detection Network abuse, such as flood-based Denial of Service attacks, is visible as "stalagmites" and "stalactites". These would be hidden in coarser-grained long-term graphs. Since one flow is created for each series of packets between a source and a destination, portscans are common culprits for these “Flow Explosions”.
Short-Term Network Analysis: DoS Detection (cont) Difference in the number of hosts talking out vs. being talked to in a 5 minute period. Another scheme for detecting portscans unearths the huge number of probes initiated in a 48 hour period.
Long Term Analysis: Input/Output totals, 730 days prior to 12 May 2001 • The academic calendar year dramatically influences campus traffic levels, most notably in ResNet. Since the beginning of data collection in early 1999, ResNet users have typically been larger providers than consumers of Internet content. • Outbound traffic consistently exceeds our inbound traffic level, but this academic year’s inbound / outbound traffic patterns haven’t experienced the typical ‘doubling’ effect; access links are at or near capacity.
Long Term Analysis: Application totals, 365 days prior to 12 May 2001 Here, we get a glimpse of the rise and fall of Napster, the first ‘killer’ p2p app. Although Napster usage has declined, outbound traffic from ResNet has not.
UW-Madison: Napster vs. Gnutella Usage Mid-Dec through Mid-Jan was a quiet time on campus for Napster, as the primary users are not utilizing the network. Here, we clearly illustrate the declining usage of Napster and the increased usage of Gnutella. As with Napster, the campus appears to be a larger provider than consumer of Gnutella data.
UW-Madison: Napster vs. Gnutella Usage (cont) For the first time, Gnutella overtakes Napster as UW-Madison’s most popular P2P file swapping application.
Long Term Analysis: Peering, 730 days prior to 12 May 2001 FlowScan lets you monitor the effectiveness of your peering by reporting the next-hop source or destination ASes of your traffic. Our biggest peers are WiscNet and Abilene.
CampusIO Extension Modules: Top ASNs FlowScan can help you make informed peering and provisioning decisions by reporting the amount of traffic that other ASes source, sink, or carry for your institution. Above, our most popular origin (endpoint) peer is @Home. We are currently working on a peering arrangement.
CampusIO Extension Modules: Alerts To deal with DoS floods, alerts via pager and email were introduced. They are currently based on tolerances set in a configuration file. We are looking for ways to use AI-type heuristics to automate tolerances.
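A tolerance-based alert check of the kind described above might look like the following Python sketch. It is not the CampusIO alert module: the `check_tolerances` helper, the metric names, and the limits are hypothetical and chosen only to show the idea of comparing each five-minute sample against configured ceilings.

```python
def check_tolerances(sample, tolerances):
    """Return one alert message per metric that exceeds its configured limit.

    sample:     metric name -> value observed in the latest 5-minute interval
    tolerances: metric name -> maximum acceptable value (from a config file)
    """
    alerts = []
    for metric, limit in tolerances.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{metric}={value} exceeds tolerance {limit}")
    return alerts

# Hypothetical tolerances and a sample taken during a flow explosion.
tolerances = {"flows_per_interval": 500000, "icmp_mbps": 5}
sample = {"flows_per_interval": 1200000, "icmp_mbps": 1}
alerts = check_tolerances(sample, tolerances)
```

The returned messages would then be handed to whatever pager or email mechanism is in use.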
CampusIO Extension Modules: Top Talkers This output of FlowScan’s Top Talkers module (anonymized sample shown here) lets you see top bandwidth consumers and providers.
Implementing FlowScan in Large Scale (ISP) Networks WiscNet, Wisconsin’s statewide educational network, is currently researching several challenges of utilizing FlowScan in a large environment. • Limitations of the flow processor itself (FlowScan) • Limitations of the exporting hardware (Routers)
Limitations of Flow Processing: Flow Processing The UW-Madison campus collects flows from a Cisco 7507 and processes them on a 700 MHz P3. FlowScan almost falls behind during peak usage times, because there are too many flows to process. WiscNet handles 2 to 3 times the traffic of Madison, and will be collecting flows from multiple border routers and processing them on a 1 GHz machine. Without some course of action, it is doubtful that the processor will be able to keep up.
Limitations of Flow Processing: Flow Exporting Large ISPs tend to have devices with high-speed interfaces. Because of router CPU utilization, current hardware is not able to support full flow export on heavily utilized high-speed interfaces (OC12+). Running FlowScan in an environment with multiple edge routers, possibly with mixed vendors, adds complexity. Juniper routers do not support full scale flow exporting, but they do support a concept known as packet sampling.
Packet Sampling In order to reduce the CPU demands on their routers, Juniper utilized the concept of packet sampling; instead of considering each packet for flow export, they only examine a configurable percentage. UW-Madison campus recently evaluated a Juniper router, and found that with its current interfaces and amount of traffic processed, a sampling rate of 1 out of every 96 packets had to be set, otherwise the Juniper would become overburdened in flow export duties.
Packet Sampling (cont) With packet sampling, the produced graphs looked similar to the graphs produced during non-sampled periods.
The Bright Side of Packet Sampling As an added bonus, we saw the number of flows being exported from our router drop nearly 90%. FlowScan could easily keep up with this level of flows.
The Ugly Side of Packet Sampling Packet sampling broke some things we expected and more. • Our security team relied on the logs produced by the 1 to 1 flow exporting when investigating network abuse and techno-crimes. We no longer could provide a completely accurate view of our network traffic. • We lost the ability to detect DoS attacks based on the "stalagmites" and "stalactites" in the flow graphs, because we were only catching about 1/96th of the usually single-packet flow portscans.
More Ugly Things about Packet Sampling • FlowScan itself relies on the 1 to 1 flow exporting for application classification. The Napster and Passive FTP detection modules determined users by looking for patterns in the packets: • For Napster, look for a client talking to an index server before counting port 6699 traffic as Napster data. • For Passive FTP, look for established port 20 connections. Packet sampling gives us no guarantee that these packets will be sampled.
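The dependence on seeing every flow can be illustrated with a small Python sketch of the Napster rule described above. This is not FlowScan's Perl code: the `napster_bytes` helper, the flow dictionaries, and the addresses (including the index server address) are hypothetical.

```python
def napster_bytes(flows, index_servers):
    """Count port 6699 bytes only for hosts already seen talking to an index server.

    flows is a time-ordered list of dicts with src, dst, sport, dport, bytes.
    """
    clients = set()   # hosts observed contacting an index server
    total = 0
    for f in flows:
        if f["dst"] in index_servers:
            clients.add(f["src"])
        elif 6699 in (f["sport"], f["dport"]) and (
                f["src"] in clients or f["dst"] in clients):
            total += f["bytes"]
    return total

index_servers = {"192.0.2.10"}   # hypothetical index server address
flows = [
    {"src": "10.10.1.5", "dst": "192.0.2.10", "sport": 1040, "dport": 8888, "bytes": 500},
    {"src": "10.10.1.5", "dst": "198.51.100.7", "sport": 6699, "dport": 2000, "bytes": 1000000},
    {"src": "10.10.1.9", "dst": "198.51.100.8", "sport": 6699, "dport": 2001, "bytes": 2000000},
]
full_view = napster_bytes(flows, index_servers)
# If sampling misses the short index-server flow, the data flow goes uncounted.
sampled_view = napster_bytes(flows[1:], index_servers)
```

With every flow exported, the first client's transfer is attributed to Napster; once the small index-server flow is lost, the same transfer is no longer classified.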
Statistical Accuracy of Flow Sampling: Non Sampled Model I was surprised to find that more than 88% of our flows consisted of twelve packets or fewer, and 76% of six packets or fewer. Upon investigation, these were typically SYNs, ACKs, UDP, and small bits of web content.
Statistical Accuracy of Flow Sampling: Sampled Model In a sampled model, only 39% of our flows consisted of twelve packets or fewer, and only 27% of our flows were six packets or fewer. This was compared to 88% and 76% respectively in our non-sampled model.
Conclusions on Sampling • In our short eval period, the sampled application and input/output graphs appeared representative of the campus traffic, but the nature of the traffic being reported dramatically shifted. • Larger flows were over-represented, and smaller flows were under-represented. • Longer studies need to be done.
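One way to see why small flows are under-represented is to estimate the chance that a flow of n packets contributes at least one sampled packet. The Python sketch below assumes each packet is sampled independently with probability 1/96; that independence is an assumption for illustration and may not match how a particular router actually selects packets. The `p_flow_seen` name is hypothetical.

```python
def p_flow_seen(n_packets, one_in=96):
    """Probability that at least one of n_packets is sampled,
    assuming independent 1-in-`one_in` sampling per packet."""
    return 1.0 - (1.0 - 1.0 / one_in) ** n_packets

# Single-packet probe, a twelve-packet flow, and a thousand-packet transfer.
for n in (1, 12, 1000):
    print(n, round(p_flow_seen(n), 4))
```

Under this assumption a single-packet flow is exported about 1 time in 96, a twelve-packet flow roughly 12% of the time, and a thousand-packet flow almost always, which is consistent with the shift toward larger flows noted above.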
The Future of Flow Accounting: • FlowScan is currently coded in Perl for easy maintenance and portability. Further speed improvements may come from a rewrite in C, or from creating a codebase that can utilize multiple processors. • Running multiple FlowScan instances and aggregating totals collected by each flow processor. • Breaks stateful inspection. • Vendor support in hardware for 1 to 1 flow accounting.
FlowScan Information on FlowScan can be found here: http://net.doit.wisc.edu/~plonka/FlowScan/ The UW-Madison campus uses FlowScan to graph traffic patterns. The live site is available here: http://wwwstats.net.wisc.edu
FlowScan This concludes this portion of the presentation.
Controlling ResNet Traffic We started investigating rate limits in order to get a handle on ResNet usage. Napster at times comprised 50% of our outbound traffic. We first tried educating users to remove server functions of their Napster clients, but no change in network behavior was observed.
Rate Limiting • Once UW-Madison had FlowScan in place for measurement instrumentation, it became a great tool by which to gauge the effectiveness of configuration changes. • We needed to attain predictability for network costs, including bandwidth, engineering, and equipment resources.
Basic Types of Rate Limiting: Traffic Shaping • Traffic shaping - Traffic comes into a queue and is released at a specified rate, thereby smoothing the flow of traffic. This queuing introduces latency into the flow. (Juniper Networks)
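The queue-and-release behavior of shaping can be sketched as follows. This is a simplified Python model, not any vendor's implementation: the `shape` function is hypothetical, the queue is assumed to be unbounded, and packets are released strictly in arrival order at the configured rate.

```python
def shape(arrivals, rate_bps):
    """Release queued packets at rate_bps.

    arrivals: list of (arrival_time_s, size_bytes), sorted by time.
    Returns a list of (departure_time_s, queuing_delay_s) per packet.
    """
    link_free_at = 0.0   # time the previous packet finishes leaving
    out = []
    for t, size in arrivals:
        start = max(t, link_free_at)            # wait behind earlier packets
        finish = start + size * 8 / rate_bps    # serialization at shaped rate
        link_free_at = finish
        out.append((finish, finish - t))
    return out

# Three 1000-byte packets arrive together; the shaper is set to 8000 b/s.
shaped = shape([(0.0, 1000), (0.0, 1000), (0.0, 1000)], rate_bps=8000)
```

The burst leaves smoothly at one packet per second, and the growing delay (1, 2, then 3 seconds) is the latency that queuing introduces.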
Basic Types of Rate Limiting: Traffic Shaping • Advantages • Prevents congestion at aggregation points. • Available in a number of routers. • Disadvantages • Doesn't necessarily allow all available network capacity to be utilized. • Doesn’t allow "bursting" beyond the configured rate-limit, even if the average rate would conform to the limit.
Basic Types of Rate Limiting: Traffic Policing Traffic policing - Traffic comes into an interface, and a decision is made either to drop, pass, or mark the traffic (best effort/less than best effort). Queuing is not involved so it doesn't degrade performance of conformant traffic. (Juniper Networks)
Basic Types of Rate Limiting: Traffic Policing Hard policing causes an immediate drop of the packet, which causes retransmissions. Soft policing is the ability to defer the decision about whether or not to drop a given packet until that packet reaches a downstream router which is better informed as to whether or not congestion currently exists.
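A common way to implement policing is a token bucket, sketched below in Python. This is a generic illustration rather than the algorithm of any specific router: the `TokenBucketPolicer` class and its parameters are hypothetical, and "mark" stands in for soft policing, where the drop decision is left to a downstream device.

```python
class TokenBucketPolicer:
    """Token-bucket policer: no queue, each packet is judged on arrival."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0        # token refill rate in bytes/second
        self.burst = burst_bytes          # bucket depth in bytes
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = 0.0                   # time of the previous packet

    def police(self, now, size, soft=False):
        # Refill tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return "transmit"
        # Hard policing drops at once; soft policing only marks the packet.
        return "mark" if soft else "drop"

policer = TokenBucketPolicer(rate_bps=8000, burst_bytes=1500)
decisions = [
    policer.police(0.0, 1000),             # fits in the initial burst
    policer.police(0.0, 1000),             # bucket too low: hard drop
    policer.police(0.5, 1000),             # refilled by 500 bytes: conforms
    policer.police(0.5, 1000, soft=True),  # non-conformant: marked instead
]
```

Conformant packets are never delayed, which is the contrast with shaping, while excess packets are dropped or marked immediately.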
Practical Rate-Limit Methods: Aggregate Rate-Limiting Aggregate rate limits are usually enforced at some central point in the network. The rate-limit is applied to either a physical interface or to traffic defined by addressing or by application, for example, as can be defined using a Cisco Access Control List (ACL). Aggregate limits can be implemented with policing and/or shaping techniques.
Aggregate Rate-Limiting: Pros and Cons • Advantages • Relatively simple to configure. • Simple to enforce for the router hardware because most rate-limit implementations of this sort do not need to track the state of individual connections. • Disadvantages • Inability to track individual users, hosts, or application sessions. As such they can unfairly punish some users or applications by indiscriminately dropping their packets rather than those of others. • Decreases goodput by causing retransmissions.
Aggregate Rate-Limiting: Experiences • UW-Madison has experimented with Cisco's Committed Access Rate (CAR) limits on a 7507 router at our campus border. Although it effectively limited traffic to the specified level, it was reported that FTP users in the outside world were unable to even establish a connection to the rate-limited FTP servers because all of the returning ACK packets were dropped during high congestion.
Aggregate Rate-Limiting: Example • The following commands configured CAR on our Cisco border router to limit a user population's outbound traffic to 10 Mb/s:
    access-list 125 permit tcp 10.10.0.0 0.0.255.255 any
    interface (your interface)
     rate-limit output access-group 125 10000000 1000000 1000000 conform-action transmit exceed-action drop
Practical Rate-Limit Methods: Flow-Based Rate-Limiting • Flow-based rate limits conform individual traffic flows to a predetermined allocation of bandwidth. They are most effective nearest the population one wishes to control. As with aggregate limits, the rate-limit is applied to either a physical interface or to traffic defined by addressing or by application, and can also be implemented with policing and/or shaping techniques.
Flow-Based Rate-Limiting: Pros and Cons • Advantages • Somewhat fair in that they distribute packet drops across individual application sessions of users. • One user's session doesn't impinge on another's since each flow gets its own allocation. • There is a fine level of granularity of control, because each direction of individual streams can be affected. • Disadvantages • Individual users can't burst traffic within a single session. • Retransmissions are caused when packets are dropped, leading to decreased goodput. • Users that create more simultaneous sessions get more bandwidth. A local server can get a large percentage of available bandwidth. • There are some scalability issues, but this is improving with application-specific hardware support. • Doesn’t set any ‘hard’ limits. Bandwidth usage not guaranteed.
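The point that more simultaneous sessions yield more bandwidth follows directly from per-flow allocation, as this small Python sketch shows. The `per_user_ceiling` helper, the addresses, and the session counts are hypothetical; the 100 Kb/s per-flow figure is used only as an example value.

```python
from collections import Counter

def per_user_ceiling(flows, per_flow_limit_bps):
    """Upper bound on each source's total rate under a per-flow limit.

    flows: iterable of (source_host, destination_host) pairs, one per
    simultaneous session. Each session gets its own allocation.
    """
    sessions = Counter(src for src, _dst in flows)
    return {host: n * per_flow_limit_bps for host, n in sessions.items()}

# One ordinary user with a single session, one local server with eight.
flows = [("10.10.1.5", "198.51.100.7")]
flows += [("10.10.2.9", f"203.0.113.{i}") for i in range(1, 9)]
ceilings = per_user_ceiling(flows, per_flow_limit_bps=100000)
```

The single-session user is capped at 100 Kb/s while the eight-session host can reach 800 Kb/s, which is why a per-flow limit alone sets no hard ceiling per user.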
Flow-Based Rate-Limiting: Experiences • The UW-Madison campus currently has this implemented on a Riverstone RS to limit residence hall network (ResNet) traffic. This can also be done with Cisco gear such as the Catalyst 65xx with the requisite additional cards. • It was reported that ResNet users had difficulty with UDP based applications, although we had per-flow UDP limiting set to 10 Mb/s. The problems disappeared after completely removing the limits.
Flow-Based Rate-Limiting: Example Example: configuring rate-limits on a Riverstone aggregation router to limit a user population's outbound TCP flows to 100 Kb/s each. Also, limit flows that are likely to be Napster to 33.6 Kb/s. Consider outbound flows to campus destinations and to web servers to be "preferred", so only limit those to 10 Mb/s.
    acl resnet_napdata permit tcp 10.10.0.0/16 any 6699 any
    acl resnet_napdata permit tcp 10.10.0.0/16 any any 6699
    acl resnet_napdata permit tcp 10.10.0.0/16 any 6688 any
    acl resnet_napdata permit tcp 10.10.0.0/16 any any 6688
    acl resnet_tcp permit tcp 10.10.0.0/16 any
    acl resnet_tcp_preferred permit tcp 10.10.0.0/16 any any http
    acl resnet_tcp_preferred permit tcp 10.10.0.0/16 10.1.0.0/16
    acl resnet_tcp_preferred permit tcp 10.10.0.0/16 10.2.0.0/16
    acl resnet_tcp_preferred permit tcp 10.10.0.0/16 10.3.0.0/16
    acl resnet_tcp_preferred permit tcp any 10.10.0.0/16
    rate-limit resnet_tcp input acl resnet_tcp rate 100000 exceed-action drop-packets sequence 1
    rate-limit resnet_napdata input acl resnet_napdata rate 33600 exceed-action drop-packets sequence 3
    rate-limit resnet_tcp_preferred input acl resnet_tcp_preferred rate 10000000 exceed-action drop-packets sequence 4
    rate-limit resnet_napdata apply interface backbone
    rate-limit resnet_tcp apply interface backbone
    rate-limit resnet_tcp_preferred apply interface backbone
Results of Rate-Limit Implementations • Aggregate-based hard policing causes a steady 20 Mb/s output from ResNet. • Winter break: ResNet traffic is very low. • Flow-based hard policing lowers the traffic amount, but the level is not steady.
Practical Rate-Limit Methods: TCP Rate Control • TCP rate control shapes flows of TCP traffic at the same fine granularity available with other flow-based rate limits by manipulating TCP header fields, which are used to negotiate window size information, and by pacing response ACKs. • Sites such as UW-Whitewater have experience with the Packeteer PacketShaper product. Last weekend, UW-Madison began experimenting with a PacketShaper.
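The reason window manipulation can pace a TCP flow is that a sender can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by window size divided by round-trip time. The Python sketch below only illustrates that relationship; it is not how PacketShaper or any other product is implemented, and the function names and example figures are assumptions.

```python
import math

def window_for_rate(target_bps, rtt_s):
    """Advertised window (bytes) that bounds a flow to target_bps at a given RTT."""
    return target_bps * rtt_s / 8.0

def rate_for_window(window_bytes, rtt_s):
    """Upper bound on throughput (bits/second) for a given window and RTT."""
    return window_bytes * 8.0 / rtt_s

# Holding a flow to about 100 Kb/s over an assumed 80 ms round trip.
window = window_for_rate(100000, 0.080)
# The same bound read the other way: a 64 KB window over the same path.
bound = rate_for_window(65535, 0.080)
```

Here a window of roughly 1000 bytes holds the flow near 100 Kb/s without dropping packets, which is why this approach avoids the retransmissions caused by hard policing.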
TCP Rate Control: Pros and Cons • Advantages: • Maximizes goodput by minimizing packet retransmissions. • A mature commercial product implementing it is available. • TCP rate control could offer some protection against some obscure denial-of-service attacks which generate non-conforming TCP packets. • Disadvantages: • As the name implies, TCP rate control is TCP specific, and therefore must be augmented with other rate-limiting mechanisms. • There are scalability issues. PacketShaper, for example, must track state of connections and manipulate packets on the fly. • Patented, closed-source implementations.
TCP Rate Control: Experiences • At UW-Whitewater, connected via DS3 to WiscNet, the transfer rate with the PacketShaper set to 45 Mb/s was roughly 2/3 of the rate measured without the PacketShaper in the network. Simply having the device in the network slowed transfer rates. • The current maximum physical connection rate is 100 Mb/s Ethernet. Using PacketShaper in large networks is tricky. • Because it is basically a flow-based rate controller, those advantages and disadvantages apply as well.