Pushing Up Performance for Everyone. Matt Mathis 7-Dec-99. Why do so few people get good network performance?. Context and history Architectural origins Approaches. The Wizard Gap. Past Performance Evolution. Wizards wrote standards Standard TCP could not go fast (1988)
Pushing Up Performance for Everyone Matt Mathis 7-Dec-99
Why do so few people get good network performance? • Context and history • Architectural origins • Approaches
Past Performance Evolution • Wizards wrote standards • Standard TCP could not go fast (1988) • Wizards enhanced systems • Stock systems could not go fast (1995) • Gurus tune systems (today) • Fast TCP is present • Badly mistuned by default
Ongoing Performance Evolution • More disciples tune and debug (tomorrow) • All netadmins and sysadmins? • Systems are tuned by default (future) • Web100… • Debugging will become “easy” (?)
Architecture • The Good news • TCP hides the net from the application • The Bad news • TCP hides the net
Architecture • The Good news • TCP hides the net from the application • The Bad news • TCP hides the net … including ALL bugs everywhere. • The only legal symptom is less than expected performance
You get poor performance if: • The application is inefficient • TCP is buggy • TCP is mistuned • The path is buggy • The path is congested • Routing is suboptimal • Especially on a long path. • Think: weakest link of an invisible chain
Closing the Wizard gap • Share the expertise • Train more disciples • Require less expertise • Systems should tune themselves • Better observability • Focused and efficient debugging • Documentation • Show that the world is improving
Share the expertise • Joint Techs meetings • TCP Tuning • In-depth presentation by Matt Mathis • DAST Application tutorials • See: dast.nlanr.net
Require less expertise • TCP Autotuning • Presentation by Matt Mathis • Web100 • Presentation by Basil Irwin • Online TCP debugging resources • See http://www.ncne.nlanr.net/TCP
Better Observability (Instrumentation) • Network Instrumentation and Visualization • Presentation by Mark Gates • Trace Analysis and Auto-Diagnosis • Presentation by Kathy Benninger • Better TCP instrumentation (Web100) • Just ask TCP why it is slow
Better Observability (Debugging methods) • Sweden - Pittsburgh path • Presentation by Greg Miller & Jerry Sobieski • iPerf tool • Presentation by Mark Gates • Existing tools and tool repositories • See: http://www.ncne.nlanr.net/tools • Still insufficient
Better Observability (Measurement) • Measurements from Seattle I2 Meeting • Presentation by Matt Zekauskas • Advanced Research and Engineering Atlas • Presentation by John Jamison • Many distributed measurement efforts • AMP, Surveyor, NIMI, etc.
Documentation • vBNS stats and measurement • Tutorial by Rick Wilder • NLANR MOAT vBNS traffic on NAI • See: moat.nlanr.net • Many benchmark efforts • Surveyor, AMP, NIMI, Web100… • HPC host census (?)
Conclusion • We need to find every bug that TCP hides • Now and always • We need to eliminate all irrelevant controls • Autotune TCP (and RED, etc)
Debugging flowchart • http://www.ncne.nlanr.net/TCP/debugging • Look at a trace and click to study symptoms • Ongoing evolution
Testrig kit • "Foolproof" TCP diagnosis starter kit with: • Simple diagnostic application • TCP trace collection tools • Visualization tools • Pointer to the debugging flowchart • With wrapper scripts around everything
TCP Debugging In-depth • Draft done at CAIDA this summer • Future NCNE On-site • 1, 2.5 and 5 hour versions • Basis for the debugging flowchart • Update from flowchart as it evolves • Interactive - Uses magicpoint/xplot
Trace Analysis and Auto-Diagnosis (TAAD) • Scan GigaPop traffic for mistuned TCP connections that fail to meet the model: rate = (MSS/RTT) * (C/sqrt(p)) • Running prototype • Use to direct other resources
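The model check on this slide can be sketched in a few lines. This is only an illustration, not the TAAD prototype. The function names, the default C = 1.0 (the slide only says C is a constant), and the 50% threshold for flagging a connection are all assumptions made for the example.

```python
import math


def model_rate_bps(mss_bytes, rtt_s, loss_prob, c=1.0):
    """Model rate from the slide: (MSS/RTT) * (C/sqrt(p)), in bits per second.

    c=1.0 is a placeholder assumption; the slide does not give a value for C.
    """
    if mss_bytes <= 0 or rtt_s <= 0:
        raise ValueError("MSS and RTT must be positive")
    if not 0 < loss_prob <= 1:
        raise ValueError("loss probability must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_prob))


def fails_model(measured_bps, mss_bytes, rtt_s, loss_prob, c=1.0, fraction=0.5):
    """Flag a connection whose measured rate is well below the model rate.

    The 'fraction' threshold is hypothetical, chosen only for illustration.
    """
    return measured_bps < fraction * model_rate_bps(mss_bytes, rtt_s, loss_prob, c)


if __name__ == "__main__":
    # Example path: 1460-byte MSS, 70 ms RTT, 0.01% loss.
    rate = model_rate_bps(1460, 0.070, 0.0001)
    print(f"model rate: {rate / 1e6:.1f} Mbit/s")
    print("flag 2 Mbit/s connection:", fails_model(2e6, 1460, 0.070, 0.0001))
```

A connection that falls far short of the model rate on a path with low measured loss is a candidate for host mistuning rather than congestion, which is how a scan like this could direct other debugging resources.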
Autotuning • Make TCP “do the right thing” by default • No unneeded user controls
Generate data points (AMP) • Nearly 100 systems already • Kernel TCP bug • Need to upgrade to FreeBSD 3.3 • Easy to create 100x1 data points • Can create 100x100 data points • Opportunity for NIMI
Generate OC-12 data points • Max Okumoto working at PSC for SDSC • Will start tuning selected paths
HPC Host Census • Use existing data from MCI OC-Xmon • Patterned after HWB big flow detection • Measure the number of fast hosts • Words needed to generalize to all of JET