Successful Bandwidth Management at Carnegie Mellon
Peter Hill & Kevin Miller
Internet2 Joint Techs – August 2003
About Us
• Responsible for campus backbone, Internet connection, 80% of data ports
• Internet connectivity (commodity & research) through Pittsburgh GigaPOP
• Historically, net exporter of data to Internet
• 8000 students, 1300 faculty, 2000 staff
• 3:1 computer:person ratio
• 10,000+ registered wireless cards
Timeline – Nov 2001
• Demand rising faster than willingness to supply
• P2P file-swapping allowed users to easily create high-volume servers
No Bandwidth Management
• Outbound bandwidth hits limits of OC-3 ATM interface
• High rate of packet loss: a single dropped ATM cell loses the whole packet
• TCP retransmits cause egress meltdown
• Upgraded physical connection to GbE
• Demand continues rising
Timeline – March 2002
• GbE link to GigaPOP uses 802.1Q trunk, separate VLANs for commodity vs. research traffic
• Router unable to rate-limit outbound traffic on a single VLAN
Emergency Solution
• Complex, ‘delicate’ engineering to split commodity vs. research traffic from internal core to border router
• Multiple OSPF processes
• Research route redistribution to OSPF
• Applied rate limits to commodity traffic upon ingress to border router
Emergency Solution
• 75Mbps rate limit – single queue, tail drop only
• Results weren’t pretty: high latency
[Graph not reproduced: outbound and inbound traffic]
Timeline – Summer 2002
• Messy solution to a complicated problem
• No discrimination between legitimate and discretionary traffic
• Research traffic unaffected, though
Tail Drop Issues
• School resumes: hard limits hit immediately
• P2P consuming a significant amount of bandwidth
• Users reporting problems accessing email from home
• Interactive sessions suffer (SSH, Telnet)
(Un)fair Bandwidth Access
• High priority traffic?
• Nine machines each used over 0.5% of traffic; 21% in total
(Un)fair Bandwidth Access
• On the same day, 47% of all traffic was easily classifiable as P2P
• 18% of traffic was HTTP
• Other traffic: 35%
• We believe ~28% is port-hopping P2P
Researching Solutions
• Middlebox traffic shapers
• Compared Allot NetEnforcer vs. Packeteer PacketShaper
• Determined NetEnforcer better matched our requirements
• Adds slight delays to constrain TCP
• Fair bandwidth distribution among flows
• Easier to configure policy
• Better performance when demand equals limit
Raised Bandwidth Limit
• Nov 2002: Campus limit raised to 120Mbps
Timeline – January 2003
• NetEnforcer deployed in October 2002
• Policy developed in late fall
• Implemented policy in early January
NetEnforcer Policy
• Technical bandwidth policy using NetEnforcer
• Used per-flow class-based fair bandwidth queuing
Traffic classes, from high priority to low priority:
• Network critical traffic
• Interactive (SSH, telnet); limited per-flow
• Traffic on well-known service ports (IMAP, HTTP)
• Non-classified traffic
• P2P traffic, classified by port number
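The port-based classification on this slide can be illustrated with a minimal Python sketch. This is not the NetEnforcer configuration: the class names, the `classify` function, and every port table are assumptions made for illustration. The slides name only SSH, telnet, IMAP, and HTTP; the network-critical and P2P port lists below are guesses at typical values.

```python
from enum import IntEnum


class TrafficClass(IntEnum):
    """Priority classes from the slide; lower value means higher priority."""
    NETWORK_CRITICAL = 0
    INTERACTIVE = 1
    WELL_KNOWN = 2
    UNCLASSIFIED = 3
    P2P = 4


# Hypothetical port tables, checked in priority order.
PORT_TABLES = [
    (TrafficClass.NETWORK_CRITICAL, {53}),       # DNS (assumed; not listed on the slide)
    (TrafficClass.INTERACTIVE, {22, 23}),        # SSH, telnet
    (TrafficClass.WELL_KNOWN, {80, 143}),        # HTTP, IMAP
    (TrafficClass.P2P, {1214, 6346}),            # common FastTrack/Gnutella defaults (assumed)
]


def classify(src_port: int, dst_port: int) -> TrafficClass:
    """Return the first class whose port table matches either endpoint."""
    ports = {src_port, dst_port}
    for traffic_class, table in PORT_TABLES:
        if ports & table:
            return traffic_class
    return TrafficClass.UNCLASSIFIED


if __name__ == "__main__":
    print(classify(40000, 22).name)
    print(classify(1214, 50000).name)
    print(classify(40000, 50000).name)
```

A classifier of this kind sees only port numbers, so P2P traffic that moves to an unlisted port falls into the non-classified class, and P2P traffic on port 80 is treated as HTTP.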
NetEnforcer Policy
• Improved interactive performance
• Fewer complaints about accessing campus services remotely
• Traffic consistently at 120Mbps
[Graph not reproduced: traffic over time, with winter recess marked]
Limits of Shaping Solution
• Per-host fair queuing not possible
• A user with 100 flows is allowed 100 times the bandwidth of a user with one flow
• Poor SSH X forwarding performance
• High latency for UDP gaming services
• Demand still high – socially, nothing had changed
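The 100-to-1 imbalance follows directly from dividing capacity equally among flows rather than among hosts. The sketch below works through the arithmetic under simplifying assumptions: every flow is always backlogged, all flows are in the same class, and the function names and host labels are hypothetical.

```python
def per_flow_shares(flows_by_host: dict[str, int], capacity_mbps: float) -> dict[str, float]:
    """Split capacity equally among flows, then total each host's flows."""
    total_flows = sum(flows_by_host.values())
    if total_flows == 0:
        return {}
    return {
        host: capacity_mbps * count / total_flows
        for host, count in flows_by_host.items()
    }


def per_host_shares(flows_by_host: dict[str, int], capacity_mbps: float) -> dict[str, float]:
    """Split capacity equally among hosts that have at least one flow."""
    active = [host for host, count in flows_by_host.items() if count > 0]
    if not active:
        return {}
    return {host: capacity_mbps / len(active) for host in active}


if __name__ == "__main__":
    demand = {"heavy-user": 100, "light-user": 1}
    by_flow = per_flow_shares(demand, capacity_mbps=101.0)
    by_host = per_host_shares(demand, capacity_mbps=101.0)
    print("per-flow ratio:", by_flow["heavy-user"] / by_flow["light-user"])
    print("per-host ratio:", by_host["heavy-user"] / by_host["light-user"])
```

Under per-flow fairness the host's share scales with the number of flows it opens, which is the behavior the slide describes; per-host fairness would remove that incentive.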
Tighter P2P Limits
• Software to classify P2P traffic at Layer 7 became available in February 2003
• Added Layer-7 classification
• Put absolute bandwidth cap on P2P
Timeline – February 2003
• Demand still close to limit
• Consider alternate approach: social engineering
• Identifying machines using high bandwidth
Solution
• Add a targeted user education/dialogue component to strictly technical solutions
• Blanket notices were/are ineffective
• Why not before?
• Usage accounting software was in development
• Only anecdotal information on ‘top talkers’
• Unsure of community reaction
• Didn’t know how easy it would be
Solution
• Created official Campus Bandwidth Usage Guidelines
• Established a daily per-host bandwidth usage limit
• 1GB per day for wired machines (outbound to commodity Internet)
• 250MB per day for wireless (to any destination)
• Published guidelines and requested comments from campus community
• Began direct notification of hosts exceeding guidelines on February 24, 2003
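The two daily limits count different traffic: wired hosts are measured on outbound commodity Internet traffic, wireless hosts on traffic to any destination. A minimal sketch of that check follows. The `DailyUsage` record, its field names, and the decimal interpretation of GB and MB are assumptions for illustration, not the campus accounting software.

```python
from dataclasses import dataclass

MB = 10**6  # assumption: decimal megabytes; the guidelines' exact unit is not stated here

WIRED_LIMIT_BYTES = 1000 * MB     # 1GB per day, outbound to commodity Internet
WIRELESS_LIMIT_BYTES = 250 * MB   # 250MB per day, to any destination


@dataclass
class DailyUsage:
    """Hypothetical per-host daily totals."""
    host: str
    wireless: bool
    commodity_out_bytes: int   # bytes sent to the commodity Internet
    any_dest_bytes: int        # bytes sent to any destination


def counted_bytes(usage: DailyUsage) -> int:
    """Pick the byte count the guideline applies to for this host type."""
    return usage.any_dest_bytes if usage.wireless else usage.commodity_out_bytes


def over_limit(usage: DailyUsage) -> bool:
    """True if the host exceeded its daily guideline."""
    limit = WIRELESS_LIMIT_BYTES if usage.wireless else WIRED_LIMIT_BYTES
    return counted_bytes(usage) > limit


if __name__ == "__main__":
    wired = DailyUsage("dorm-pc.example.edu", False, 1200 * MB, 5000 * MB)
    laptop = DailyUsage("laptop.example.edu", True, 50 * MB, 200 * MB)
    for usage in (wired, laptop):
        print(usage.host, "over limit" if over_limit(usage) else "within limit")
```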
Guideline Enforcement
• Accounting data generated
• ‘Argus’ utility captures raw packet data from egress span port
• Post-processing aggregates flows by host, sums bandwidth usage
• Nightly top-talkers report
Hostname          iMB   oMB     iKp    oKp    flows    iMbs  oMbs  ips  ops
web4.andrew.cmu.  3456  107621  51917  83038  1948667  0.1   2.0   120  192
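The post-processing step (aggregate flows by host, sum usage, rank the top talkers) can be sketched as below. This does not use Argus or its output format: the `Flow` record, the treatment of each record as one-directional, and the function names are simplifying assumptions for illustration.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """Hypothetical one-directional flow record."""
    src: str
    dst: str
    nbytes: int


def aggregate_by_host(flows: Iterable[Flow], campus_hosts: set[str]) -> dict[str, dict[str, int]]:
    """Sum inbound bytes, outbound bytes, and flow counts per campus host."""
    totals: dict[str, dict[str, int]] = defaultdict(
        lambda: {"in_bytes": 0, "out_bytes": 0, "flows": 0}
    )
    for flow in flows:
        if flow.src in campus_hosts:      # campus host is the sender
            totals[flow.src]["out_bytes"] += flow.nbytes
            totals[flow.src]["flows"] += 1
        if flow.dst in campus_hosts:      # campus host is the receiver
            totals[flow.dst]["in_bytes"] += flow.nbytes
            totals[flow.dst]["flows"] += 1
    return dict(totals)


def top_talkers(totals: dict[str, dict[str, int]], n: int = 10) -> list[tuple[str, dict[str, int]]]:
    """Rank hosts by outbound bytes, largest first."""
    return sorted(totals.items(), key=lambda item: item[1]["out_bytes"], reverse=True)[:n]


if __name__ == "__main__":
    campus = {"host-a.example.edu", "host-b.example.edu"}
    sample = [
        Flow("host-a.example.edu", "remote-1.example.net", 100),
        Flow("host-a.example.edu", "remote-2.example.net", 200),
        Flow("remote-3.example.net", "host-b.example.edu", 50),
        Flow("host-b.example.edu", "remote-1.example.net", 10),
    ]
    for host, usage in top_talkers(aggregate_by_host(sample, campus)):
        print(host, usage)
```

Ranking by outbound bytes matches the wired guideline, which is measured on outbound commodity traffic; a report for wireless hosts would need a different sort key.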
Guideline Enforcement
• Over-usage notification
• Initial mail was worded carefully, informed the owner of the quota and overage, and asked about the reasons for the usage
• Approximately a 0.25 FTE process
• Automated in July
• Disconnections handled like other abuse incidents
Positive Results
• Immediately noticed a decline in bandwidth utilization (within a few hours of first notifications)
[Graph not reproduced; annotations: "Applied Strict Layer-7 P2P Limits", "First Usage Notices Sent"]
Positive Results
• Very few negative responses
• Some hosts granted exclusions (official web servers)
• Many notified were completely unaware of bandwidth usage
• P2P programs left running in background
• Compromised machines
• Trojan FTP servers serving French films
• Favorite responses
• “i don't understand--what's bandwidth?? how do i reduce it?? what am i doing wrong?? what's going on???”
• “i have no idea what bandwidth is or how one goes about using it too much.”
Positive Results
[Graphs not reproduced; panel titles: "Now: (Summer)" and "Summary:"]
Timeline – May 2003
• With guideline enforcement, traffic drops by half
• NetEnforcer still has an impact on P2P – need to assess legitimate and discretionary P2P uses
Considerations
• Per-machine limits might turn addresses into an artificial commodity
• Role of Enterprise or Service Provider?
• Packet shaping tends to apply an Enterprise mindset – determine organization priorities
• Bandwidth quotas use a Service Provider mindset – how are quotas applied (port, machine, user?)
Questions?
• Argus: http://www.qosient.com/argus
• Related links/resources: http://www.net.cmu.edu/pres/jt0803
• Packet shaper evaluation
• Argus accounting
• Usage guidelines
• Peter Hill: peterjhill@cmu.edu
• Kevin Miller: kcm@cmu.edu