Networking Research Overview • Micah Beck, Assoc. Prof., Computer Science; Director, LoCI Laboratory, University of Tennessee • SciDAC PI Mtg, 24 March 2004
SciDAC Networking Research Projects: Goals • Goal: Phase I • Develop data movement tools and infrastructures to support real-time, data-intensive SciDAC applications • Develop advanced network tools that enable SciDAC applications to efficiently measure, predict, and diagnose end-to-end performance (2 projects) • Develop and deploy cyber security tools to support group collaborations in grid infrastructures • Goal: Phase II • Deploy the advanced tools developed in Phase I in production infrastructures to support network-intensive SciDAC projects
Logistical Networking: Tools, Applications & Architecture • Micah Beck, Jack Dongarra, James S. Plank (University of Tennessee) • Rich Wolski (University of California, Santa Barbara) • http://loci.cs.utk.edu/scidac
Project Thrusts • Dongarra: Application Development Tools/Environments • NetSolve/GridSolve • Wolski: Network Monitoring/Prediction • Network Weather Service • Beck & Plank: Logistical Networking Infrastructure, Middleware & Support • Internet Backplane Protocol • Logistical Runtime System
Internet Backplane Protocol • Overlay intermediate node providing services based on enriched resources • Storage: file system, RAM, disk • Transfer: TCP (std, compressed), UDP (SABUL, mcast), SAN/WAN • Processing: primitive operations (alpha) • 100s of IBP depots deployed worldwide • 1.4 alpha release: persistent sockets; optional authentication, usage logging
Logistical Networking Tools • Logistical Runtime System (LoRS) • E2E Services: Fault tolerance (Reed-Solomon), encryption (AES), compression, high perf. data movement strategies • Library, command line, GUI, Web tools • Ported to all compute platforms (Cray OS problems) • Logistical Backbone (L-Bone) • depot monitoring, resource discovery • Logistical Distribution Network (LoDN) • directory services, content distribution • Java Web Start delivery of tools
SciDAC Application Impact • Terascale Supernova Initiative (A. Mezzacappa, ORNL; J. Blondin, NCSU; D. Swesty, SUNY Stony Brook) • Five 1.6 TB depots deployed at TSI sites • Energy Fusion Research (S. Klasky, PPPL) • Depots deployed on PPPL cluster nodes • Dataset transfers: O(1 TB) @ 1-400 Mb/s • Simulations at NERSC and ORNL • Control/viz at ORNL, NCSU, Stony Brook, PPPL • Transfers span ESnet, Abilene • CS/Physics collaboration, science getting done!
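To put the "O(1 TB) @ 1-400 Mb/s" figure above in perspective, the following back-of-the-envelope sketch computes ideal transfer times. It assumes a decimal terabyte (10^12 bytes) and ignores protocol overhead, retransmissions, and depot staging; the function name and the sample rates are illustrative, not taken from the slides.

```python
def transfer_time_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Ideal transfer time: payload bits divided by the sustained rate."""
    if rate_bits_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes * 8 / rate_bits_per_s


TB = 10**12    # assumption: decimal terabyte
MBPS = 10**6   # one megabit per second

if __name__ == "__main__":
    # Sample rates spanning the range quoted on the slide (chosen for illustration).
    for rate in (1, 100, 400):
        hours = transfer_time_seconds(TB, rate * MBPS) / 3600
        print(f"{rate:>4} Mb/s: {hours:,.1f} hours")
```

At the top of the quoted range a terabyte moves in a few hours; at the bottom it takes months, which is why staging data at intermediate depots matters.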
TSI Site Deployment: ORNL, NCSU, SUNY Stony Brook, NERSC, UCSD
SciDAC Technology Impact • Spanning heterogeneous networks • Ultrascale (10 Gbps) wide area transfers require specialized systems • Optically switched networks (e.g., DOE Science UltraNet) do not peer with IP • Serving scalable communities • Staging and caching at intermediate nodes • Processing data “in transit” • Common services on distributed data
Transit Networking Architecture • [Figure: layer diagram. Labels: Application; Transport; …; Network (IP); common interface; Transit; Link; Local; Physical; transfer, storage, processing]
INCITE: Edge-based Traffic Processing for High-Performance Networks • R. Baraniuk, E. Knightly, R. Nowak, R. Riedi (Rice University) • L. Cottrell, J. Navratil, W. Mathews (SLAC) • W. Feng, M. Gardner (LANL) • Web site: incite.rice.edu
INCITE Project • InterNet Control and Inference from The Edge • On-line tools to characterize and map host and network performance as a function of time, space, application, protocol, and service
INCITE Thrusts and Tools • Thrust 1: Multiscale traffic analysis and modeling techniques • wavelet, multifractal, connection-level models • Thrust 2: Inference and control algorithms for network paths, links, and routers • end-to-end path probing and modeling • network tomography and topology discovery • advanced high-speed protocols • Thrust 3: Data collection tools • active measurement infrastructure • passive application-layer measurement
pathChirp • Goal • estimate instantaneous available bandwidth (ABW) on an end-to-end network link • Basic probing paradigm • stream packets at some rate • no queuing delay when rate < ABW • queuing delay builds up when rate > ABW • Until now: tradeoff • high accuracy has required high-volume probing (inefficient) • Unique to pathChirp • variable-rate probe packet train (exponentially spaced chirp) • 10x more efficient than competing techniques
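The probing paradigm on the pathChirp slide can be sketched as follows: a single chirp sweeps a range of rates because each inter-packet gap is shorter than the previous one by a constant factor, and the rate at which queuing delay starts to build up gives a rough ABW estimate. This is a deliberately simplified illustration, not the pathChirp tool's actual estimator; the function names, the spread factor, and the "delay rises until the end of the chirp" rule are all assumptions made for the example.

```python
def chirp_gaps(n_packets, first_gap_s, spread_factor):
    """Inter-packet gaps for one chirp: each gap is the previous one
    divided by spread_factor, so spacing shrinks exponentially."""
    if n_packets < 2 or first_gap_s <= 0 or spread_factor <= 1:
        raise ValueError("need >= 2 packets, a positive gap, and spread_factor > 1")
    return [first_gap_s / spread_factor ** k for k in range(n_packets - 1)]


def probing_rates(packet_bits, gaps_s):
    """Instantaneous probing rate (bits/s) associated with each gap."""
    return [packet_bits / gap for gap in gaps_s]


def onset_rate(rates_bps, queuing_delays_s):
    """Simplified estimator (an assumption, not pathChirp's algorithm):
    find where the final run of strictly increasing queuing delays begins
    and return the probing rate at that point. Returns None if the delays
    never rise at the end of the chirp."""
    if not rates_bps or len(rates_bps) != len(queuing_delays_s):
        raise ValueError("rates and delays must be non-empty and equal in length")
    k = len(queuing_delays_s) - 1
    while k > 0 and queuing_delays_s[k - 1] < queuing_delays_s[k]:
        k -= 1
    if k == len(queuing_delays_s) - 1:
        return None  # no build-up observed: ABW exceeds the highest probed rate
    return rates_bps[k]


if __name__ == "__main__":
    gaps = chirp_gaps(n_packets=5, first_gap_s=0.008, spread_factor=2)
    rates = probing_rates(packet_bits=8000, gaps_s=gaps)
    delays = [0.0, 0.0, 0.001, 0.003]  # hypothetical measured queuing delays
    print(onset_rate(rates, delays))
```

Because one short train covers rates from 1 Mb/s to 8 Mb/s in this example, far fewer probe packets are needed than when sending a separate constant-rate stream per candidate rate.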
Network Tomography • From end-to-end measurements, infer internal topology and delay/loss characteristics
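As a small illustration of inferring internal characteristics from end-to-end measurements, the sketch below treats the simplest textbook case: probes traverse one shared link and then branch to two receivers, and losses on the three links are assumed independent. It is a generic example of loss tomography, not necessarily the method used in INCITE; the function name and the sample data are hypothetical.

```python
def infer_link_pass_rates(outcomes):
    """outcomes: list of (received_at_1, received_at_2) booleans, one per probe.

    With independent link losses, P(1 gets it) = s*b1, P(2 gets it) = s*b2,
    and P(both get it) = s*b1*b2, so the shared-link pass rate is
    s = P(1) * P(2) / P(both).
    Returns (shared, branch1, branch2) pass rates.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("no probes")
    a1 = sum(1 for r1, _ in outcomes if r1) / n
    a2 = sum(1 for _, r2 in outcomes if r2) / n
    a12 = sum(1 for r1, r2 in outcomes if r1 and r2) / n
    if a12 == 0:
        raise ValueError("no probe reached both receivers; rates are not identifiable")
    shared = a1 * a2 / a12
    return shared, a12 / a2, a12 / a1


if __name__ == "__main__":
    # Hypothetical probe outcomes observed only at the two receivers.
    probes = ([(True, True)] * 30 + [(True, False)] * 10
              + [(False, True)] * 30 + [(False, False)] * 30)
    print(infer_link_pass_rates(probes))
```

The point of the example is that none of the three links is measured directly: only the receivers' observations are used, yet the correlation between them exposes the shared link's behavior.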
TCP - Low Priority (TCP-LP) • Goal • utilize excess bandwidth in a non-intrusive fashion • Methodology • sender-side modification of TCP: delay-based approach • Applications • bulk data transfers • available bandwidth monitoring • P2P file sharing • High-speed TCP-LP • TCP-LP + HSTCP • implementation • Linux-2.4.22-web100 • experiments • Stanford - Ann Arbor • Stanford - Gainesville • [Figure: throughput comparison. TCP alone: 745.5 Kb/s. TCP plus TCP-LP: TCP 739.5 Kb/s, TCP-LP 109.5 Kb/s. TCP-LP is invisible to TCP.]
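The "delay-based approach" on the TCP-LP slide can be illustrated with a sender-side detector that treats rising one-way delay as an early sign of congestion, so a low-priority flow can back off before regular TCP sees any loss. This is a minimal sketch of the general idea only: the class name, the threshold fraction, and the smoothing gain are assumptions for illustration, and the reaction to a congestion signal (how the window is reduced) is left out.

```python
class DelayCongestionDetector:
    """Flags early congestion when the smoothed one-way delay rises above
    a fraction of the range between the smallest and largest delay seen."""

    def __init__(self, threshold_fraction=0.15, smoothing=0.125):
        if not 0 < threshold_fraction < 1 or not 0 < smoothing <= 1:
            raise ValueError("threshold_fraction in (0,1), smoothing in (0,1]")
        self.threshold_fraction = threshold_fraction  # assumed value
        self.smoothing = smoothing                    # assumed value
        self.d_min = None
        self.d_max = None
        self.smoothed = None

    def observe(self, one_way_delay_s):
        """Feed one delay sample; return True if congestion is inferred."""
        if self.smoothed is None:
            self.d_min = self.d_max = self.smoothed = one_way_delay_s
            return False
        self.d_min = min(self.d_min, one_way_delay_s)
        self.d_max = max(self.d_max, one_way_delay_s)
        # Exponentially weighted moving average of the delay samples.
        self.smoothed = ((1 - self.smoothing) * self.smoothed
                         + self.smoothing * one_way_delay_s)
        if self.d_max == self.d_min:
            return False  # no delay variation observed yet
        threshold = self.d_min + self.threshold_fraction * (self.d_max - self.d_min)
        return self.smoothed > threshold


if __name__ == "__main__":
    detector = DelayCongestionDetector()
    for sample in (0.010, 0.010, 0.011, 0.018, 0.025):  # hypothetical delays (s)
        print(sample, detector.observe(sample))
```

Reacting to delay rather than loss is what lets the low-priority flow yield early, consistent with the slide's claim that TCP-LP is invisible to TCP.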
Changes in network topology (BGP) can result in dramatic changes in performance • [Figure: snapshot of traceroute summary table (remote host by hour), with samples of traceroute trees generated from the table] • Notes: 1. Caltech misrouted via Los-Nettos 100 Mbps commercial net, 14:00-17:00. 2. ESnet/GEANT working on routes from 2:00 to 14:00. • [Figure: ABwE measurements, one per minute for 24 hours, Thursday 9 October 9:00am to Friday 10 October 9:01am; y-axis: Mbits/s. Plotted: dynamic BW capacity (DBC), cross-traffic (XT), and available BW = (DBC - XT). Annotations: drop in performance when the path changed from the original SLAC-CENIC-Caltech to SLAC-ESnet-LosNettos (100 Mbps)-Caltech; ESnet-LosNettos segment in the path (100 Mbits/s); back to original path; changes detected by IEPM-Iperf and ABwE.]
Crossing the Application/Network Divide • [Figure: sending data over the network. Stack labels: Application; Segmentation; TCP; Flow & Congestion Control; Checksums; IP; Fragmentation; Data Link; Network. Annotation at the Network level: "Network monitors focus here."] • Implications to the application? • Insights for high-performance network protocols?
TICKET and MAGNET+MUSE • TICKET: Traffic Information-Collecting Kernel with Exact Timing (tcpdump++) • MAGNeT: Monitor for Application-Generated Network Traffic • MUSE: MAGNET User-Space Environment • [Figure: sending data over the network. Stack labels: Application; Segmentation; TCP; Flow & Congestion Control; Checksums; IP; Fragmentation; Data Link; Network; marked with MUSE, MAGNET, and TICKET.] • For more information, go to www.lanl.gov/radiant/pubs.html
MAGNeT → MAGNET: Monitoring Apparatus for General kerNel-Event Tracing (at nanoscale granularity) • Why not extend monitoring to kernel events in general? Software Oscilloscope for Clusters and Grids • Debugging • e.g., identified Linux OS bug in the scheduler for SMPs. • Can be used to deploy, debug, and monitor the DOE UltraNet (UltraScienceNet), e.g., dynamic provisioning. • Performance Optimization • Improved performance of 10GigE adapters by 300%. Can improve end-to-end performance of DOE UltraNet. • Monitoring Grid Applications • Integrated MAGNET with SciDAC’s PERC TAU and SciDAC’s PERC SvPablo/Autopilot.* • Adaptive Resource-Aware Applications • SciDAC Deployment: PERC, Supernova Science Ctr, Transit Network Fabric + Terascale Supernova Initiative + Fusion Energy (emerging), and Earth Systems Grid II (emerging). * For more information, see M. Gardner, W. Deng, T. Markham, C. Mendes, W. Feng, and D. Reed, “A High-Fidelity Software Oscilloscope for Globus,” GlobusWorld 2004, Jan. 2004.
Bandwidth estimation: measurement methodologies and applications • k claffy (CAIDA), Constantinos Dovrolis (Georgia Tech)
Project goals • Develop techniques and public-domain tools for the estimation of end-to-end: • Network capacity (bottleneck bandwidth) • Available bandwidth (residual capacity) • Focus 1: non-intrusive, fast, and accurate techniques • Focus 2: high-bandwidth paths (up to 1 Gbps) • Compare and validate different tools in reproducible and realistic network conditions • Apply bandwidth estimation in transport and overlay routing problems • Disseminate research results at conferences and in journals
Main accomplishments • Pathrate: capacity estimation tool • Based on packet pairs and trains • Publications: Transactions on Networking (to appear in 2004) and Infocom 2001 • Pathload: available bandwidth estimation tool • Based on self-loading periodic streams • Publications at ACM SIGCOMM 2002 and PAM 2002 • Both tools are available at: www.pathrate.org • About 200 downloads per month (and increasing) • Able to measure up to 1 Gbps paths, even in the presence of interrupt coalescence • See publication at PAM 2004 • 1st Bandwidth Estimation workshop at CAIDA, Dec. 2003
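The packet-pair principle behind capacity estimation can be sketched briefly: two back-to-back packets are spread apart by the bottleneck link, so packet size divided by the arrival gap (the dispersion) approximates the bottleneck capacity. Cross traffic distorts individual pairs, so many pairs are sent and the dominant value is taken. The code below is a simplified illustration under those assumptions, not Pathrate's actual algorithm; the function names and the histogram bin width are hypothetical.

```python
from collections import Counter


def pair_estimates(packet_bits, dispersions_s):
    """One capacity estimate (bits/s) per packet pair: size / dispersion."""
    return [packet_bits / d for d in dispersions_s if d > 0]


def dominant_capacity(estimates_bps, bin_width_bps):
    """Round each estimate to the nearest multiple of bin_width_bps and
    return the most frequent value. A crude stand-in for mode detection."""
    if not estimates_bps or bin_width_bps <= 0:
        raise ValueError("need at least one estimate and a positive bin width")
    bins = Counter(round(e / bin_width_bps) for e in estimates_bps)
    top_bin, _count = bins.most_common(1)[0]
    return top_bin * bin_width_bps


if __name__ == "__main__":
    # Hypothetical dispersions for 1500-byte packets on a 100 Mb/s bottleneck;
    # two of the pairs were distorted by cross traffic.
    gaps = [0.00012, 0.00012, 0.00012, 0.00024, 0.00006]
    estimates = pair_estimates(packet_bits=12000, dispersions_s=gaps)
    print(dominant_capacity(estimates, bin_width_bps=10_000_000))
```

Taking the dominant value rather than the mean keeps a few expanded or compressed pairs from skewing the result.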
Main accomplishments (cont’d) • Created testbed at CAIDA with several high-bandwidth routers and switches and realistic cross traffic • Tested all existing open-source bandwidth estimation tools • Showed that, although several such tools exist, very few are accurate and consistent • Developed estimation technique for passive capacity estimation • See publications at IMC 2003 and PAM 2004 • Showed that per-hop capacity estimation tools (pathchar-like) are not accurate in the presence of layer-2 switches • See publication at Infocom 2003 • Created ANEMOS, a distributed system for automated on-line monitoring of many network paths • See publication at PAM 2003
Ongoing work • Created SOBAS, an automatic socket buffer sizing technique based on available bandwidth estimation • Basic idea: limit TCP window based on available bandwidth before the connection causes losses • Does not require changes in TCP • Develop estimation technique for the variation range of available bandwidth in different time scales • Variation range is crucial for some applications, including overlay routing • Evaluate the predictability of the available bandwidth process in Internet traffic • How far in the future can we predict the avail-bw with a given accuracy? • Use of bandwidth estimation in overlay network routing and in UltraScienceNet dynamic optical circuit bandwidth provisioning
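The basic idea stated for SOBAS, limiting the TCP window from an available-bandwidth estimate without changing TCP itself, can be sketched at the application level by sizing the socket buffer to the bandwidth-delay product. This is an illustration of that idea only, not the SOBAS implementation; the function names are hypothetical, the bandwidth and RTT inputs are assumed to come from some external estimator, and the operating system may clamp or adjust the value actually applied.

```python
import socket


def buffer_size_bytes(avail_bw_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: (bits/s * ms) / (8 bits * 1000 ms/s)."""
    if avail_bw_bps <= 0 or rtt_ms <= 0:
        raise ValueError("bandwidth and RTT must be positive")
    return max(1, avail_bw_bps * rtt_ms // 8000)


def apply_receive_buffer(sock: socket.socket, size_bytes: int) -> int:
    """Request a receive buffer of size_bytes and return what the OS reports.
    The reported value may differ from the request (kernel limits, doubling)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    # Hypothetical estimate: 100 Mb/s available on a path with a 50 ms RTT.
    size = buffer_size_bytes(avail_bw_bps=100_000_000, rtt_ms=50)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(size, apply_receive_buffer(s, size))
```

Capping the buffer near the bandwidth-delay product keeps the advertised window from growing to the point where the connection overruns the path and induces losses.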
Security and Policy for Group Collaboration • http://www.mcs.anl.gov/dsl/scidac/security/ • PIs: • Steven Tuecke (ANL) • Carl Kesselman (USC/ISI) • Miron Livny (U. Wisconsin) • Technologies involved: • Globus Toolkit • Condor
Problem • Scalable, fine-grain policy management for large, dynamic collaborations: • Large number of individually managed resources, each with own policies • Large number of users • Users and resources in different domains • Community policies on use of resources
Goals of this Project • Design, develop, and standardize tools for maintaining the structure of a collaboration • Take into account collaboration policy, user privileges, site policies, resource policies, etc. • Significantly improve the integration of local security environments • E.g., Kerberos • Instantiate our research results in a framework that makes them usable by a wide range of collaborative tools • Globus Toolkit, Condor • Work within the standards community to socialize and standardize our approaches • GGF, IETF, OASIS
Our Process • Engage with communities • Get feedback • Design and develop solutions • Integrate into community software • Standardize solutions for greater acceptance • Evaluate and guide emerging standards
Delivered Solutions • Fine-grained Policy R&D: • Community Authorization Service • Dynamic Policy Reconciliation • Site Security Integration: • KCA/Kx509 • Authorization Callouts • Grid Security Usability: • SimpleCA / Online CA / MiniCA • Online Credential Repository
Standards and Implementations • X.509 Proxy Certificates • GSSAPI extensions • Policy work: SAML, XACML • Policy Reconciliation CAS