QBSS Applications. Les Cottrell – SLAC. Presented at the Internet 2 Working Group on QBone Scavenger Service (QBSS), October 2001. www.slac.stanford.edu/grp/scs/talk/qbss-i2-oct01.ppt • Partially funded by the DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM); also supported by IUPAP
High Speed Bulk Throughput • Driven by: • Data-intensive science, e.g. data grids • HENP data rates, e.g.: • BaBar today has 500 TBytes of data and TB/day; by the end of the run in summer 2002: 3 TB/day, PB/yr, 40 MB/s • JLab is similar; FNAL has 2 similar experiments turning on • CERN/LHC: 1000 PBytes • A Boeing 747 has high throughput, BUT poor latency (~2 weeks) and is very people-intensive • [Figure: data volume growth compared with Moore's law] • So we need high-speed networks and the ability to utilize them • High speed today = several hundred GBytes/day to TB/day (100 GB/day ~ 10 Mb/s) • Today's networks have crossed the threshold where it is now possible to share data effectively via the network
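The rule of thumb "100 GB/day ~ 10 Mb/s" and the pairing of 3 TB/day with 40 MB/s are unit conversions. A minimal sketch, assuming decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and a perfectly sustained rate over 86,400 s/day:

```python
SECONDS_PER_DAY = 86_400


def gbytes_per_day_to_mbps(gb_per_day: float) -> float:
    """Average rate in Mbit/s needed to move gb_per_day GBytes in one day."""
    bits_per_day = gb_per_day * 1e9 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6


def mbytes_per_s_to_tbytes_per_day(mb_per_s: float) -> float:
    """Daily volume in TBytes produced by a sustained rate of mb_per_s MBytes/s."""
    return mb_per_s * 1e6 * SECONDS_PER_DAY / 1e12


print(f"100 GB/day ~ {gbytes_per_day_to_mbps(100):.2f} Mb/s")
print(f"40 MB/s ~ {mbytes_per_s_to_tbytes_per_day(40):.2f} TB/day")
```

100 GB/day works out to about 9.3 Mb/s, and 40 MB/s to about 3.5 TB/day. Real transfers are burstier than this, so the figures are lower bounds on the peak rate needed.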
Throughput quality improvements • TCP BW < MSS/(RTT*sqrt(loss)) • 80% annual improvement, ~factor 10/3yr • [Figure: throughput quality trends by region, including China; note E. Europe keeping up] • Macroscopic Behavior of the TCP Congestion Avoidance Algorithm, Mathis, Semke, Mahdavi, Ott, Computer Communication Review 27(3), July 1997
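The bound TCP BW < MSS/(RTT*sqrt(loss)) is what turns ping loss and RTT measurements into a throughput "quality" figure. A minimal sketch of the simplified form used on the slide (the order-1 constant in Mathis et al. is omitted); the MSS of 1460 bytes and RTT of 170 ms below are illustrative assumptions, not measurements:

```python
import math


def mathis_bound_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Upper bound on steady-state TCP throughput in bits/s:
    BW < MSS / (RTT * sqrt(loss)), with MSS converted from bytes to bits."""
    if rtt_s <= 0 or not 0 < loss <= 1:
        raise ValueError("need rtt_s > 0 and 0 < loss <= 1")
    return mss_bytes * 8 / (rtt_s * math.sqrt(loss))


# Illustrative path: 1460-byte MSS, 170 ms RTT, 0.1% packet loss.
print(f"{mathis_bound_bps(1460, 0.170, 0.001) / 1e6:.2f} Mb/s")
```

Because loss enters under a square root, cutting loss by a factor of 4 only doubles the bound, while halving the RTT also doubles it.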
Bandwidth changes with time (1 of 2) • Short term: competing cross-traffic from other users; factors of 3-5 observed within 1 minute • Long term: link and route upgrades; factors of 3-16 in 12 months • [Figure note: all hosts had 100 Mbps NICs] • Recently measured 105 Mbps SLAC > IN2P3 and 340 Mbps Caltech > SLAC with Gigabit Ethernet (GE)
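Typical results today • High throughput usually = big windows & multiple streams • Throughput improves ~linearly with the number of streams for small windows • Broke the 100 Mbps trans-Atlantic barrier • [Figure: throughput vs. number of streams for window sizes of 8kB, 16kB, 32kB, 64kB and 100kB; Solaris default window size marked]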
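Why throughput grows roughly linearly with streams for small windows: a window-limited TCP stream can send at most one window per round trip, so N streams send about N windows per RTT until the path's bandwidth-delay product (BDP) is filled. A minimal sketch; the 150 ms trans-Atlantic RTT is an assumed value for illustration:

```python
import math


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the path full."""
    return bandwidth_bps * rtt_s / 8


def per_stream_bps(window_bytes: int, rtt_s: float) -> float:
    """A window-limited TCP stream sends at most one window per round trip."""
    return window_bytes * 8 / rtt_s


def streams_needed(bandwidth_bps: float, rtt_s: float, window_bytes: int) -> int:
    """Parallel streams needed to fill the path when each stream is window-limited."""
    return math.ceil(bdp_bytes(bandwidth_bps, rtt_s) / window_bytes)


rtt = 0.150  # assumed trans-Atlantic RTT, for illustration only
print(f"BDP at 100 Mb/s: {bdp_bytes(100e6, rtt):,.0f} bytes")
print(f"one 64 kB stream: {per_stream_bps(64 * 1024, rtt) / 1e6:.1f} Mb/s")
print(f"64 kB streams needed: {streams_needed(100e6, rtt, 64 * 1024)}")
```

With these assumptions a single 64 kB window yields only about 3.5 Mb/s, and around 29 such streams (or one much larger window) are needed to fill 100 Mb/s.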
Impact on others • Make ping measurements with & without iperf loading • [Figure columns: Loss loaded (unloaded); RTT] • Looking at how to avoid impact, e.g. QBSS/LBE, application pacing, a control loop on stdev(RTT) that reduces streams; want to avoid scheduling
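The with/without comparison reduces each ping run to a loss percentage and an average RTT. A minimal sketch, assuming each run is a list of RTTs in ms with `None` marking a lost ping; the function names are hypothetical and do not correspond to the tools used here:

```python
from statistics import mean


def summarize(rtts_ms):
    """Return (loss %, mean RTT in ms) for one ping run; None marks a lost ping."""
    if not rtts_ms:
        raise ValueError("no pings in run")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    avg_rtt = mean(received) if received else float("nan")
    return loss_pct, avg_rtt


def impact(unloaded, loaded):
    """Compare a ping run taken under iperf load with one taken without load."""
    loss_u, rtt_u = summarize(unloaded)
    loss_l, rtt_l = summarize(loaded)
    return {
        "loss_unloaded_pct": loss_u,
        "loss_loaded_pct": loss_l,
        "rtt_ratio": rtt_l / rtt_u,  # >1 means the bulk transfer slowed other users
    }
```

Reporting loaded loss next to unloaded loss, as in the "Loss loaded (unloaded)" column, separates the damage caused by the bulk transfer from the path's background loss.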
HENP Experiment Model • World-wide collaborations are necessary for large undertakings • Regional computer centers in France, Italy, UK & US • Spending Euros on a data center at SLAC is not attractive • Leverage local equipment & expertise • Resources available to all collaborators • Requirements – bulk: • Bulk data replication (current goal > 100 MBytes/s) • Optimized cached read access to 10-100 GB from a 1 PB data set • Requirements – interactive: • Remote login, video conferencing, document sharing, joint code development, co-laboratory (remote operations, reduced travel, more humane shifts) • Modest bandwidth, often < 1 Mbps • Emphasis on quality of service & sub-second responses
Applications • The main network application focus today is on replication at multiple sites worldwide (mainly N. America, Europe and Japan) • Need a fast, secure, easy-to-use, extendable way to copy data between sites • Need interactive and real-time traffic at the same time, e.g. experiment control, video & voice conferencing • The HEP community has developed 2 major (freely available) applications to meet the replication need: bbftp and bbcp
Bbcp • [Figure: bbcp source, bbcp sink and bbcp agent, with data flowing from source to sink] • Peer-to-peer copy program with multiple (<=64) streams, large window support, secure password exchange (ssh control path, single-use passwords on the data path), and syntax similar to scp • C++ component design allows testing new algorithms (relatively easy to extend) • Peer-to-peer • No server: if you have the program, you have the service (usually no need for admins); any node can act as source or sink; 3rd-party copies • Provides sequential I/O (e.g. from /dev/zero, to pipe or tape or /dev/null) and progress reporting
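One way a multi-stream copier can split a file is to deal fixed-size blocks round-robin over the streams and let the sink place each block by its offset, so arrival order does not matter. This is a minimal sketch of that idea only, not bbcp's actual wire protocol; the block size and function names are hypothetical, and the 64-stream cap simply mirrors the limit stated above:

```python
def assign_blocks(file_size: int, block_size: int, n_streams: int):
    """Deal (offset, length) blocks round-robin over n_streams; one list per stream."""
    if not 1 <= n_streams <= 64:
        raise ValueError("n_streams must be between 1 and 64")
    plan = [[] for _ in range(n_streams)]
    for i, offset in enumerate(range(0, file_size, block_size)):
        length = min(block_size, file_size - offset)
        plan[i % n_streams].append((offset, length))
    return plan


def reassemble(received, file_size: int) -> bytes:
    """Sink side: write each (offset, chunk) into place, in any arrival order."""
    buf = bytearray(file_size)
    for offset, chunk in received:
        buf[offset:offset + len(chunk)] = chunk
    return bytes(buf)
```

Tagging every block with its offset is what lets streams run at different speeds without the sink having to reorder a byte stream.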
Application rate-limiting • bbcp has transfer rate limiting • Could feed network information (e.g. from Web100 or independent pinging) to bbcp to reduce/increase its transfer rate, or to change the number of parallel streams • [Figure panels: no rate limiting, 64KB window, 32 streams; 15MB/s rate limiting, 64KB window, 32 streams]
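A common way to implement transfer rate limiting is a token bucket: the sender earns credit at the target rate and waits whenever a write would overdraw it. This sketch shows that generic technique under stated assumptions; it is not bbcp's own limiter, and the class and parameter names are hypothetical:

```python
import time


class TokenBucket:
    """Token-bucket limiter: `rate` bytes/s on average, bursts up to `burst` bytes."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.clock = clock
        self.last = clock()

    def delay_for(self, nbytes: int) -> float:
        """Charge nbytes and return how long (s) to sleep before sending them."""
        now = self.clock()
        # Earn credit for the time elapsed since the last call, up to the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes  # may go negative: that is the debt to wait off
        return 0.0 if self.tokens >= 0 else -self.tokens / self.rate
```

A sender would create `TokenBucket(rate=15e6, burst=1e6)` for a 15 MB/s cap and call `time.sleep(bucket.delay_for(len(chunk)))` before each write. Feeding Web100 or ping information back into `rate` is one way to get the adaptive behavior suggested above.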
QBSS test bed with Cisco 7200s • Set up a QBSS testbed • Configure router interfaces • 3 traffic types: QBSS, BE, Priority • Define policy, e.g. QBSS > 1%, priority < 30% • Apply policy to router interface queues • [Figure: Cisco 7200s testbed with 10Mbps, 100Mbps, 100Mbps, 1Gbps and 100Mbps links]
Example of effects • Also tried: 1 stream for all flows, and priority at 30%
QBSS with Cisco 6500 • 6500s + Policy Feature Card (PFC) • Routing by PFC2, policing on switch interfaces • 2 queues, 2 thresholds each • QBSS assigned to its own queue with 5% of the bandwidth; this guarantees QBSS gets something • BE & Priority traffic in the 2nd queue with 95% of the bandwidth • Apply an ACL to the switch port to police Priority traffic to < 30% • [Figure: Cisco 6500s + MSFC/Sup2 testbed with 100Mbps and 1Gbps links; throughput vs. time showing BE (100%), Priority (30%) and QBSS (~5%)]
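The intended effect of this policy can be estimated with an idealized model of one congested link. The sketch makes assumptions not stated above: the two queues share the link in proportion to their weights, unused share flows to the other queue (work-conserving), and within queue 2 Priority is served before BE. Real 6500 queues use drop thresholds, so treat the numbers as a rough guide only:

```python
def allocate(capacity, qbss, be, priority, qbss_weight=0.05, priority_limit=0.30):
    """Idealized steady-state share of one link (same units as capacity).

    Assumed model: weighted, work-conserving service of 2 queues
    (queue 1 = QBSS, queue 2 = BE + Priority), Priority served first in queue 2."""
    pri = min(priority, priority_limit * capacity)  # ACL policer caps Priority
    q2_demand = pri + be                            # BE and Priority share queue 2
    # QBSS gets its guaranteed share, or everything queue 2 leaves unused.
    q1 = min(qbss, max(qbss_weight * capacity, capacity - q2_demand))
    q2 = min(q2_demand, capacity - q1)
    pri_served = min(pri, q2)
    return {"qbss": q1, "priority": pri_served, "be": q2 - pri_served}


# All three classes trying to send 100 Mb/s over a 100 Mb/s link:
print(allocate(100, qbss=100, be=100, priority=100))
```

Under these assumptions a saturated link splits about 5 / 30 / 65 (QBSS / Priority / BE), while a lone QBSS flow still gets the whole link. That is the scavenger behavior the policy is meant to produce.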
Impact on response time (RTT) • Run ping with iperf loading under various QoS settings; iperf ~93 Mbps • Without iperf, ping avg RTT ~300 usec (regardless of QoS) • iperf = QBSS, ping = BE or Priority: RTT ~550 usec • 70% greater than unloaded • iperf QoS = ping QoS (except Priority): RTT ~5 msec • > factor of 10 larger RTT than unloaded • If both ping & iperf have QoS = Priority, then ping RTT is very variable, since iperf is limited to 30% • RTT is quick when iperf is being limited, long when iperf transmits
Possible usage • Apply priority to lower-volume interactive voice/video-conferencing and real-time control • Apply QBSS to high-volume data replication • Leave the rest as Best Effort • Since 40-65% of bytes to/from SLAC come from a single application, we have modified it to enable setting of TOS bits • Need to identify bottlenecks and implement QBSS there • Bottlenecks tend to be at the edges, so we hope to try with a few HEP sites
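Enabling an application to set TOS bits usually means setting the IP_TOS socket option before sending. A minimal sketch for POSIX systems such as Linux, using the QBSS value TOS=32 (DSCP 8, binary 001000); the helper names are hypothetical, and this is not the code of the modified application:

```python
import socket

QBSS_TOS = 32  # TOS byte 0x20 = DSCP 8 (001000), the QBSS code point


def dscp(tos: int) -> int:
    """The DSCP is the upper 6 bits of the (former) TOS byte."""
    return tos >> 2


def marked_socket(tos: int, sock_type=socket.SOCK_STREAM) -> socket.socket:
    """Create an IPv4 socket whose outgoing packets carry the given TOS byte."""
    s = socket.socket(socket.AF_INET, sock_type)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return s
```

Marking happens at the sender, but it only helps if routers at the bottleneck honor the code point. That is why the slide stresses finding the bottlenecks and configuring QBSS there.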
SC2001 demo • Send data from SLAC/FNAL booth computers (emulating a tier 0 or 1 HENP site) to over 20 other sites with good connections in about 6 countries • Part of the bandwidth challenge proposal • Saturate the 2Gbps connection to the floor network • Apply QBSS to some sites, priority to a few, and Best Effort to the rest • See how QBSS works at high speeds
More Information • IEPM/PingER home site: • www-iepm.slac.stanford.edu/ • Bulk throughput site: • www-iepm.slac.stanford.edu/monitoring/bulk/ • Transfer tools: • http://dast.nlanr.net/Projects/Iperf/release.html • http://doc.in2p3.fr/bbftp/ • www.slac.stanford.edu/~abh/bbcp/ • http://hepwww.rl.ac.uk/Adye/talks/010402-ftp/html/sld015.htm • TCP Tuning: • www.ncne.nlanr.net/training/presentations/tcp-tutorial.ppt • www-didc.lbl.gov/tcp-wan.html • QBSS measurements • www-iepm.slac.stanford.edu/monitoring/qbss/measure.html
Requirements • HENP formed a Trans-Atlantic Network committee charged with projecting requirements • Projections do not include university, trans-Pacific, or research needs
bbftp • Implements an ftp-like user interface, with additions to allow large windows, multiple streams, and secure password exchange. • Has been in production use for more than a year • Is supported and being extended • http://doc.in2p3.fr/bbftp/
Bbcp: algorithms • Data pipelining • Multiple streams "simultaneously" pushed • Automatically adapts to router traffic shaping • Can control maximum rate • Can write to tape, read from /dev/zero, write to /dev/null, pipe • Check-pointing (resume failed transmission) • Coordinated buffers • All buffers same-sized end-to-end • Page-aligned buffers • Allows direct I/O on many file systems (e.g. Veritas)
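Check-pointing lets a failed transmission resume instead of restarting. A minimal local sketch of the general idea, assuming the checkpoint is simply the byte offset of the last block known to be on disk; the file layout and function name are hypothetical and this is not bbcp's implementation:

```python
import os


def copy_with_checkpoint(src, dst, ckpt, block_size=1 << 20):
    """Copy src to dst in fixed-size blocks, recording the offset of the last
    completed block in ckpt so an interrupted copy can resume from there."""
    offset = 0
    if os.path.exists(ckpt) and os.path.exists(dst):
        with open(ckpt) as f:
            offset = int(f.read().strip() or 0)
    mode = "r+b" if os.path.exists(dst) else "wb"
    with open(src, "rb") as fin, open(dst, mode) as fout:
        fin.seek(offset)
        fout.seek(offset)
        while block := fin.read(block_size):
            fout.write(block)
            fout.flush()
            os.fsync(fout.fileno())  # block is on disk before we record it
            offset += len(block)
            with open(ckpt, "w") as f:
                f.write(str(offset))
        fout.truncate(offset)  # drop stale bytes left by an older dst
    if os.path.exists(ckpt):
        os.remove(ckpt)  # copy complete: nothing left to resume
```

Recording the offset only after the block is synced means a crash can at worst repeat one block, never skip one.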
Bbcp: Security • Low cost, simple and effective security • Leveraging widely deployed infrastructure • If you can ssh there you can copy data • Sensitive data is encrypted • One time passwords and control information • Bulk data is not encrypted • Privacy sacrificed for speed • Minimal sharing of information • Source and Sink do not reveal environment
Bbcp: user interface & features • Familiar syntax • bbcp [ options ] source [ source [ … ] ] target • Sources and target can be anything • [[username@]hostname:]path • /dev/zero or /dev/null • Easy but powerful • Can gather data from multiple hosts • Many usability and performance options • Features: read from /dev/zero; write to tape, /dev/null, pipe; check-pointing; MD5 checksums; compression; transfer rate limiting; progress reporting; mark QoS/TOS bits
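The source/target form `[[username@]hostname:]path` can be split as follows. This is a sketch of the scp-style convention, not bbcp's parser. It assumes that a colon only introduces a host when no "/" precedes it, and the host name in the usage note is hypothetical:

```python
from typing import NamedTuple, Optional


class Spec(NamedTuple):
    user: Optional[str]
    host: Optional[str]
    path: str


def parse_spec(spec: str) -> Spec:
    """Split [[username@]hostname:]path; no 'host:' prefix means a local path."""
    head, sep, tail = spec.partition(":")
    if not sep or "/" in head:
        return Spec(None, None, spec)  # local path, e.g. /dev/zero
    user, at, host = head.rpartition("@")
    return Spec(user if at else None, host, tail)
```

For example, `parse_spec("alice@remote.example.org:/data/run1")` yields user `alice`, host `remote.example.org` and path `/data/run1`, while `/dev/null` stays a local path.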
Impact of cross-traffic on iperf between SLAC & NASA-GSFC • Best throughput about 44Mbps • Throughput varies by a factor of 5 or more from weekday daytime to night • Congested path
Using bbcp to make QBSS measurements • Run bbcp with source data from /dev/zero and dst = /dev/null, reporting throughput at 1-second intervals • with TOS=32 (QBSS) • After 20 s, run bbcp with no TOS bits specified (BE) • After 20 s, run bbcp with TOS=40 (priority) • After 20 more secs, turn off Priority • After 20 more secs, turn off BE
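The measurement steps amount to a schedule of overlapping flows. A sketch of that timeline, assuming each "after 20 s" is counted from the previous step and that the QBSS flow runs to the end of the test (its stop time is not given); the table and function are hypothetical helpers, not part of bbcp:

```python
# (start_s, stop_s, class, TOS value); stop_s=None means "runs to the end of the test"
SCHEDULE = [
    (0, None, "QBSS", 32),
    (20, 80, "BE", None),  # no TOS bits specified
    (40, 60, "Priority", 40),
]


def active_flows(t):
    """Classes with a bbcp transfer running at t seconds into the test."""
    return [cls for start, stop, cls, _tos in SCHEDULE
            if start <= t and (stop is None or t < stop)]


for t in range(0, 100, 10):
    print(t, active_flows(t))
```

Lining up the 1-second throughput reports against this timeline shows how the QBSS flow backs off as BE and then Priority traffic arrive, and recovers as they leave.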
Optimizing streams • Choose # streams to optimize throughput/impact • Measure RTT from Web100 • App controls # streams
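One step of such a control loop might look like the sketch below. Everything here is a hypothetical design: RTT samples may come from Web100 or from pinging, the thresholds are arbitrary multiples of a baseline jitter, and the rule (halve on high jitter, add one stream when quiet) is a swapped-in AIMD-style policy, not one taken from the slides:

```python
from statistics import stdev


def adjust_streams(streams, rtt_samples_ms, baseline_stdev_ms,
                   max_streams=64, high=2.0, low=1.2):
    """Return the new stream count given recent RTT samples (ms).

    Hypothetical policy: stdev(RTT) well above its unloaded baseline suggests
    queues are building and other users are hurt, so back off hard; jitter
    near the baseline suggests spare capacity, so probe upward by one."""
    if len(rtt_samples_ms) < 2:
        return streams  # not enough data to estimate jitter
    jitter = stdev(rtt_samples_ms)
    if jitter > high * baseline_stdev_ms:
        return max(1, streams // 2)
    if jitter < low * baseline_stdev_ms:
        return min(max_streams, streams + 1)
    return streams
```

Backing off multiplicatively and growing additively keeps the transfer from oscillating between too many streams and too few.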