
Web100

Web100. Wendy Huntoon (PSC) and Jim Ferguson (NCSA), I2 Members Meeting, May 2002.

Presentation Transcript


  1. Web100 • Wendy Huntoon, PSC • Jim Ferguson, NCSA • I2 Members Meeting, May 2002

  2. Outline • Project Overview • Motivation: What is the problem • Web100 Collaboration • Progress to Date • Standardization Process • Code Release • Code Capabilities • Overview of Users • Web100 Resources

  3. Motivations: What’s the Problem? • High-performance flows run slower than line rate • Delays continue, or even increase, with higher bandwidth • TCP tuning issues are non-trivial • Poorly conceived stacks • Inadequate router/switch buffer queues • Slow start and the AIMD algorithm • Goal: eliminate or dramatically reduce the “wizard gap” • Need for a kernel instrumentation set for TCP variables
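The "slow start and AIMD" point can be made concrete with the widely cited steady-state approximation of Mathis et al. (1997): throughput ≈ (MSS / RTT) · C / √p, with C = √(3/2) and p the packet loss rate. The sketch below is an illustration added here, not part of the Web100 code; it assumes periodic loss and a sender that is limited neither by the receiver window nor by the application.

```python
import math

# Constant from the periodic-loss form of the Mathis et al. model (about 1.22).
C = math.sqrt(3 / 2)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state AIMD throughput in bits per second."""
    if rtt_s <= 0 or not 0 < loss_rate < 1:
        raise ValueError("rtt_s must be positive and loss_rate must lie in (0, 1)")
    return (mss_bytes * 8 / rtt_s) * C / math.sqrt(loss_rate)


def max_loss_rate(target_bps: float, mss_bytes: int, rtt_s: float) -> float:
    """Invert the model: the largest loss rate that still permits target_bps."""
    return (C * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


if __name__ == "__main__":
    p = max_loss_rate(1e9, 1460, 0.070)
    print(f"1 Gb/s at 70 ms RTT needs loss below {p:.1e}")
```

With a 1460-byte MSS and a 70 ms round trip, sustaining 1 Gb/s under this model tolerates a loss rate of only about 4 × 10⁻⁸, roughly one loss in 24 million packets. This is one reason AIMD makes long, fast paths so hard to fill.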

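  4. The Wizard Gap • TCP throughput over a long-haul path, wizards vs. non-wizards (years shown only where given): • Wizards 1 Mb/s, non-wizards 300 kb/s (ratio 3:1) • Wizards 10 Mb/s • 1995: wizards 100 Mb/s • Wizards 1 Gb/s, non-wizards 3 Mb/s (ratio 300:1) • Scientists and researchers are not happy with this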

  5. TCP tuning is painful debugging • Every one of these problems limits performance: • IP routing, long round-trip times • Improper MSS negotiation or path MTU discovery • IP packet reordering • Packet losses, congestion, lame hardware • TCP sender or receiver buffer space • Inefficient applications • Any one problem can mask all the others and confound all but the best (and few) tuning gurus • Need for better diagnostics and visibility into problems

  6. Goal and Method • Make it “easy” (transparent) for non-experts to achieve higher throughput • Enhance TCP with better (finer-grained) kernel instrumentation and automatic controls • Real-time triage to determine whether the sender, the receiver, and/or the network is the bottleneck

  7. Why Focus on TCP? • TCP has an ideal vantage point into the throughput problem space • TCP can identify the bottleneck subsystem(s) • TCP already measures the network (to some extent) • TCP can measure the application • TCP can adjust itself (auto-tuning feedback)

  8. Web100 Collaboration • Funded by the NSF • Currently in Year 2 of a 3-year grant • Cisco URP provided initial seed funding • Collaborators • PSC (Matt Mathis, R. Reddy, Janet Brown, John Heffner) • NCAR (Peter O’Neil, Marla Meehl) • NCSA (John Estabrook, Tanya Brethour, Stephen Engelhardt, Jim Ferguson)

  9. What is in the code • Web100 software consists of: • TCP Kernel Instrument Set (TCP-KIS) • Instruments coded directly into the operating system kernel • Derived Instrument Set (DIS) • Information derived from the KIS instruments • Application Code • Tools, applications, etc. that use the information provided by the KIS and DIS

  10. Kernel Instrument Set • Definition • A set of instruments designed to collect as much information as possible, so that a user can isolate the performance problems of a TCP connection • How it is implemented • Each instrument is a variable in a "stats" structure that is linked to the kernel socket structure • The Linux /proc interface exposes these instruments outside the kernel
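The design on this slide can be modeled in a few lines: each connection's socket carries a reference to a stats record, and a /proc-style file exposes that record to user space as text. This is a hypothetical Python model, not kernel code. The instrument names are illustrative, and the "name value" text layout is an assumption, since the slides do not describe the actual Web100 /proc file format.

```python
from dataclasses import dataclass, field, fields


@dataclass
class TcpStats:
    """Per-connection instruments (hypothetical names; a handful of the 125+)."""
    pkts_out: int = 0
    data_bytes_out: int = 0
    pkts_retrans: int = 0
    smoothed_rtt_ms: int = 0


@dataclass
class Socket:
    """Stand-in for the kernel socket; `stats` models the link to its stats structure."""
    local: tuple
    remote: tuple
    stats: TcpStats = field(default_factory=TcpStats)


def render_proc_text(sock: Socket) -> str:
    """Expose every instrument as a 'name value' line (an assumed text layout)."""
    return "".join(f"{f.name} {getattr(sock.stats, f.name)}\n" for f in fields(sock.stats))


def parse_proc_text(text: str) -> dict:
    """User-space side: turn the 'name value' lines back into a dictionary."""
    return {name: int(value) for name, value in (line.split() for line in text.splitlines())}
```

Keeping the stats in a separate record hung off the socket means the instruments live exactly as long as the connection. A reader only needs the connection to find all of its instruments.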

  11. What is the TCP-KIS? • TCP-KIS instruments group naturally into categories • Currently roughly 19 categories • More than 125 instruments have already been developed • For each instrument: • Precise (standards-ready) definition • Instrument code in the kernel • Implementation verification tests • Does the kernel implementation meet the definition? • Prototype diagnostic tool(s) to demonstrate functionality and effectiveness

  12. TCP-KIS • Basic instrumentation examples • Connection ID: the 5-tuple that uniquely identifies a connection • State: determines which protocol features or algorithms are enabled • Traffic out: aggregate statistics on the packets and data sent on a connection
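Two of these basic instruments, the connection ID and the traffic-out counters, can be sketched together. The 5-tuple is a natural key for per-connection records, and the counters are updated on every transmitted segment. All class, field, and counter names here are illustrative assumptions, not Web100 identifiers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionID:
    """5-tuple identifying a connection; frozen so it can be used as a dict key."""
    protocol: str
    local_addr: str
    local_port: int
    remote_addr: str
    remote_port: int


@dataclass
class TrafficOut:
    """Aggregate counters for segments sent on one connection (illustrative names)."""
    pkts_out: int = 0
    data_pkts_out: int = 0
    data_bytes_out: int = 0

    def on_segment_sent(self, payload_len: int) -> None:
        self.pkts_out += 1
        if payload_len > 0:  # pure ACKs count as packets but carry no data
            self.data_pkts_out += 1
            self.data_bytes_out += payload_len


# One TrafficOut record per connection, created on first use.
traffic_out = defaultdict(TrafficOut)
```

Because `ConnectionID` is frozen, two equal 5-tuples hash identically. Every segment on the same connection therefore updates the same record.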

  13. Local Sender Triage • Group of instruments associated with the local sender • Determine which subsystems are throttling TCP data transmission • Three parallel sets of instruments that measure: • Receiver Window • Network Congestion • Sender’s Availability
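Here is one way the three parallel measurements could feed a triage report. The sketch assumes the kernel accumulates, per connection, how long the sender was limited by the receiver window, by network congestion, and by its own lack of data. It then reports each share and the dominant limit. The function and parameter names are hypothetical.

```python
def sender_triage(time_rwin_s: float, time_cwnd_s: float, time_sender_s: float) -> dict:
    """Share of transmit time limited by each of the three triage categories."""
    total = time_rwin_s + time_cwnd_s + time_sender_s
    if total <= 0:
        raise ValueError("no transmit time recorded")
    return {
        "receiver window": time_rwin_s / total,
        "network congestion": time_cwnd_s / total,
        "sender": time_sender_s / total,
    }


def dominant_limit(shares: dict) -> str:
    """Name of the category that throttled the connection most of the time."""
    return max(shares, key=shares.get)


if __name__ == "__main__":
    shares = sender_triage(time_rwin_s=8.0, time_cwnd_s=1.5, time_sender_s=0.5)
    print(dominant_limit(shares), f"{shares['receiver window']:.0%}")
```

Read loosely, a receiver-window-dominated result points toward receiver buffer tuning, a congestion-dominated one toward the path, and a sender-dominated one toward the application or host.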

  14. Local Sender Groups • Other groups of instruments associated with the Local Sender: • Local Sender Congestion Model • Local Sender Loss Model • Local Sender Re-order Model • Local Sender RTT • Local Sender Segment Size • Local Sender Bottlenecks • Local Sender Tuning

  15. Other Instruments • Similar instruments for the Local Receiver • Observed Receiver instruments • Often inferred from the data stream • E.g., Observed Receiver: the receiver’s state is inferred from the ACK stream • Application Interface • Future instruments to collect statistics on how the application is using the network
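To illustrate how a receiver's state can be inferred from the ACK stream alone, the sketch tracks three things: the largest advertised window, repeated (duplicate) ACKs, and zero-window ACKs, which signal a full receive buffer. It is a simplified, hypothetical model. RFC 5681 applies more conditions than "same ack number and same window" before treating an ACK as a duplicate.

```python
from dataclasses import dataclass


@dataclass
class ObservedReceiver:
    """Receiver state inferred purely from the ACKs it sends back (simplified)."""
    last_ack: int = -1
    last_window: int = -1
    max_window: int = 0
    dup_acks: int = 0
    zero_window_acks: int = 0

    def on_ack(self, ack_no: int, window: int) -> None:
        # Simplification: an ACK repeating both the ack number and the window
        # is counted as a duplicate (real TCP applies more conditions).
        if ack_no == self.last_ack and window == self.last_window:
            self.dup_acks += 1
        if window == 0:  # receiver's buffer is full; the sender must stall
            self.zero_window_acks += 1
        self.max_window = max(self.max_window, window)
        self.last_ack, self.last_window = ack_no, window
```

Duplicate ACKs hint at loss or reordering on the forward path. Zero windows, or a small maximum window, point at the receiving host rather than the network.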

  16. Userland Distribution • Released independently of the kernel distribution • Currently at Alpha 1.1 • Version 1.2 release imminent • Consists of: • The web100 library • Command line utilities • GUI utilities

  17. Web100 Library • The Web100 kernel exposes critical TCP variables/instruments through /proc • The Web100 library provides functions to access these variables/instruments • Functions • Read the value of a variable/instrument • Take a snapshot of a group (enables atomic reading of a group of variables) • Modify tunable variables (e.g., send buffer size) • Etc.
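The library functions listed above follow a simple pattern: read one variable, snapshot a group atomically, and write tunables. The actual Web100 library is a C library. The Python sketch below models only that pattern, and its class, method, and variable names are hypothetical rather than the library's API. Taking two snapshots and subtracting them is a common way to turn cumulative counters into per-interval activity.

```python
import threading


class ConnectionVars:
    """Toy model of one connection's instruments with a library-style accessor API."""

    TUNABLE = {"snd_buf"}  # hypothetical: only the send buffer size may be written

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._vars = {"data_bytes_out": 0, "pkts_retrans": 0, "snd_buf": 65536}

    def bump(self, name: str, amount: int = 1) -> None:
        """Kernel side: increment a counter."""
        with self._lock:
            self._vars[name] += amount

    def read(self, name: str) -> int:
        """Read a single variable."""
        with self._lock:
            return self._vars[name]

    def snap(self, group: list) -> dict:
        """Copy a group under one lock so all values come from the same instant."""
        with self._lock:
            return {name: self._vars[name] for name in group}

    def write(self, name: str, value: int) -> None:
        """Modify a tunable variable; counters are read-only."""
        if name not in self.TUNABLE:
            raise PermissionError(f"{name} is not tunable")
        with self._lock:
            self._vars[name] = value


def delta(before: dict, after: dict) -> dict:
    """Change in each variable between two snapshots of the same group."""
    return {name: after[name] - before[name] for name in before}
```

Reading the group under a single lock is what makes the snapshot consistent. Reading variables one at a time could mix values from before and after a kernel update.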

  18. Utilities • Command line utilities • Useful in batch scripts • Serve as demo code for using the web100 library • GUI utilities • Based on GTK+ • Useful for troubleshooting network applications • Serve as examples for application developers

  19. GUI Sample Screens – DTB

  20. Connection Selector

  21. Looking at a Variable

  22. Timeline - Year 1 • Alpha code development • Establish User Support • www.web100.org • Initial User Community • Very limited to begin with. • Knowledgeable users, expected to provide technical input on the code. • Understand and develop applications.

  23. Timeline - Year 2 • Began standardization process. • Develop MIB • Submit to IETF • Develop public code • Fix bugs in alpha versions • Add instrumentation • Code release • Continue code development • Identify and add new instruments

  24. Code Releases to Date • Initial Release • Alpha 0.2, released May 23, 2001 • Alpha 0.3, released Sept. 19, 2001 • Alpha 1.0: separation of kernel and userland code • Kernel Patch: • Alpha 1.1 for Linux 2.4.16, released March 18, 2002 • Alpha 1.0, released March 1, 2002 • Alpha 1.0, released February 26, 2002 • Userland: • Alpha 1.1, released February 28, 2002 • Alpha 1.0, released February 26, 2002

  25. Timeline - Year 3 • New pathprobe diagnostic tool (work in progress, unreleased) • Add another 10-12 instruments • Review instruments and code with other wizards • Gain vendor support for ideas and code • Finalize IETF draft by the December IETF meeting

  26. Milestones • Over a year of testing by ~30 alpha testers • Including SLAC, ORNL, LBNL, and universities • www.net100.org • Modified Linux kernel supports Linux 2.4.16 • Separation between KIS and library functions • draft-ietf-tsvwg-tcp-mib-extension-00.txt • draft-ietf-ipngwg-rfc2012-update-01.txt

  27. Web100 Collaborator Activity • Rich Carlson, ANL • Tom Dunnigan, ORNL • Tom Hacker, U. of Michigan • Doug Chang, SLAC • Andreas Burkhardt & Matt Grob, Qualcomm • Larry Dunn & Scott Dier, Cisco/U. of Minnesota • Jason Lee, LBL

  28. Collaborator Assistance • Bugs! • Kernel • Utilities • Release • Request new features • Review and criticize documentation • Way too easy on us

  29. Collaborator Activity • Carlson/ANL is working on a troubleshooting guide for LANs • Set up a network of 13 identically equipped PIII machines, connected via a Cisco 5500 switch and running Web100-enabled Linux • Introduces typical network faults (duplex mismatches, other configuration errors) and analyzes the data for “signatures” of these faults • Modified Iperf 1.2 to collect Web100 variables and to support reverse flows

  30. Collaborator Activity • Dunnigan/ORNL has found web100 helpful for seeing losses/retransmissions and congestion avoidance parameters of individual TCP flows, and for tuning flows • Has developed a Web100-enabled ttcp • Has developed a daemon that logs web100 variables for designated paths when a flow closes • Has developed an autotuning daemon that uses web100 to tune flows, including modifications to web100 to support "event notification" so the daemon knows when a new flow/socket is opened

  31. Collaborator Activity • Hacker/U. Michigan has been using the web100 software to help tune and diagnose end-to-end network performance problems across the U-M campus network, as well as across Abilene, for the Visible Human and Atlas projects at U-M • Chang/SLAC is looking into a performance problem between Linux and Solaris machines

  32. Collaborator Activity • Qualcomm is using Web100 to measure TCP performance over certain types of high-speed wireless links under development. Web100 is partially integrated into some of Qualcomm's other tools, in that its output reports are published automatically in a format similar to those tools • Dunn/Cisco is currently using Web100 for a class at U. Minnesota, which includes accounts on a test machine at NCSA

  33. Collaborator Activity • Lee/LBL has obtained accounts at SLAC and ANL for WAN testing and has co-located one of LBL's machines in Washington, D.C. to test over SuperNet; this setup is still being tested • Keith Jackson at LBL has written Python wrappers for the Web100 calls using SWIG

  34. Web100 Summary • Main WWW site: www.web100.org • Freely available software distribution • www.web100.org/download • Hundreds of downloads • Please be cognizant of impacts on others • Please use, test, provide feedback, and contribute code • IETF standards process to benefit all • Attention is turning to working with OS vendors to incorporate the standardized enhancements into their stacks
