Office of Science, MICS Division
Department of Energy

Project Quad Charts
High-Performance Networking Research Program

Program Manager: Thomas D. Ndousse
Tel: 301-903-9960
Email: tndousse@er.doe.gov
Bandwidth Estimation: Methodologies and Applications
k claffy, CAIDA at SDSC, and Constantinos Dovrolis, Univ. of Delaware
High-Performance Network Research SciDAC Project

Brief Summary of the Project
• Task 1: Develop accurate, fast, and non-intrusive bandwidth estimation (bwest) methodologies and measurement tools.
• Task 2: Compare and evaluate different bwest tools (both for end-to-end and per-hop bandwidth metrics), characterizing any observed errors.
• Task 3: Use bwest methodologies in transport protocols and applications to optimize throughput for high bandwidth-delay-product paths.
• Task 4: Prototype bwest middleware to monitor performance between network domains in real time.

The Novel Ideas
• Innovative end-to-end probing techniques to measure capacity (the maximum possible throughput on an empty path) and available bandwidth (the maximum throughput under current load):
  - Packet Train Dispersion (PTD)
  - Variable Packet Size (VPS)
  - Self-Loading Periodic Streams (SLoPS)
• Methodologies to check for overbuffered or underbuffered network paths.
• Smooth pacing in TCP, driven by bwest measurements.
• Smooth bwest-driven rate control for UDP-based applications.

Impact and Connections
• IMPACT: Allow scientific applications (transferring terabytes of data) to efficiently use high-performance networks.
  - Use explicit bwest measurements instead of implicit bwest via TCP's congestion control algorithms.
  - Provide easy-to-use tools for monitoring network path performance.
• CONNECTIONS:
  - Apply bwest methodologies to the Web100 and Net100 projects.
  - Correlate bwest with loss/delay (e.g., the PingER project).
  - Establish prototype bwest middleware in ESnet and for DOE labs and investigators.

Milestones/Dates/Status
• Compare and evaluate existing bwest tools:
  - Hop-by-hop tool survey                              Jun 01 - Aug 02
  - End-to-end tool survey                              Jun 01 - Jun 02
• Bandwidth measurement middleware:
  - Create/maintain testbed                             Jun 01 - Jun 04
  - Collect link characteristics                        Jun 01 - Jun 04
  - Correlate active/passive measurements               Jun 01 - Jun 04
• Capacity estimation tool (pathrate) v2.1.2            Dec 01   DONE
  - Add GUI to aid analysis of results                  Dec 02
• Available bandwidth tool (pathload)                   Mar 02
  - Paper at PAM'02                                     Mar 02
• Develop UDP-based rate-controlled file transfer
  application driven by bwest measurements              Dec 02
• Real-time path monitor using bwest middleware         Dec 03

MICS Program Manager: Thomas Ndousse
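A minimal sketch of two of the probing ideas named in this chart, packet-train dispersion for capacity and a SLoPS-style delay-trend test for available bandwidth. This is an illustration of the technique, not the pathrate/pathload implementation; timestamps and the 10% trend threshold are assumptions.

```python
# Sketch of two bwest ideas from the chart above: packet-train dispersion (PTD)
# for capacity and a SLoPS-style check for available bandwidth.
# arrival_times are assumed to be receiver-side timestamps in seconds.

def capacity_from_train(arrival_times, packet_size_bytes):
    """Estimate path capacity from the dispersion of a back-to-back packet train.

    With N packets of size L sent back to back, the bottleneck link spaces them
    by L/C, so C ~= (N - 1) * L / (t_last - t_first).
    """
    n = len(arrival_times)
    dispersion = arrival_times[-1] - arrival_times[0]
    if n < 2 or dispersion <= 0:
        raise ValueError("need >= 2 packets with positive dispersion")
    return (n - 1) * packet_size_bytes * 8 / dispersion   # bits per second


def stream_saturates_path(one_way_delays):
    """SLoPS-style test: if one-way delays trend upward across a periodic
    stream, the probing rate exceeds the available bandwidth (queues build).
    A tool like pathload would binary-search the probing rate around this test."""
    half = len(one_way_delays) // 2
    early = sum(one_way_delays[:half]) / half
    late = sum(one_way_delays[half:]) / (len(one_way_delays) - half)
    return late > early * 1.1   # crude 10% increase threshold (assumption)
```

In practice the receiver records the arrival timestamps and delays; the numbers fed to these functions would come from that measurement process.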
Security and Policy for Group Collaboration
Steven Tuecke, Argonne National Laboratory; Carl Kesselman, USC Information Sciences Institute; Miron Livny, U. Wisconsin, Madison
High-Performance Network Research SciDAC Project

The Novel Ideas
• Enable collaborative work with common security tools that address:
  - Large, geographically and organizationally distributed membership
  - Membership with diverse expertise, comprising different roles
  - Community resources with associated community policies
• Develop novel tools and approaches for:
  - Management of collaboration membership and resources: Online CA & Credential Repository (CR), local security integration
  - Management of roles and privileges: Community Authorization Service (CAS), restricted delegation
  - Integration into collaborative tools and environments

[Figure: Community Authorization Service workflow. 1. The user sends a CAS request with resource names and operations; the CAS checks user/group membership and collective policy ("Does the collective policy authorize this request for this user?"). 2. The CAS replies with a capability plus resource/collective CA and policy information. 3. The user sends a resource request authenticated with the capability; the resource checks whether the request is authorized by the capability and whether the CAS is authorized by local policy. 4. The resource replies.]

Impact and Connections
• IMPACT: We expect this project to result in:
  - Standardization of new PKI-based approaches to credential management, restricted delegation, and policy management
  - Development of security tools and services for collaboration
  - Widespread deployment and adoption of these approaches and tools
• CONNECTIONS:
  - This work builds on the Globus Toolkit's widely used Grid Security Infrastructure (GSI) and will be included in future Globus Toolkit releases.
  - To be used by numerous SciDAC collaboratories, including the DOE Science Grid, Particle Physics Data Grid, Earth Systems Grid, and Fusion Collaboratory.
  - Also to be used by many non-DOE projects worldwide, including NSF PACI DTF, NASA IPG, and the European Data Grid.

Milestones/Dates/Status
• Demonstrate CAS prototype @ SC'01                     November 2001
• Complete X.509 & GSS standards drafts                 February 2002
• Deliver draft-standard-conforming GSS                 April 2002
• Deliver CAS w/ simple policies                        May 2002
• Demonstrate Online CA & CR                            September 2002
• Complete Online CA & CR standards drafts              December 2002
• Finalize X.509 & GSS standards                        February 2003
• Deliver Online CA & CR                                March 2003
• Deliver CAS w/ rich policy & application support      May 2003
• Finalize Online CA & CR standards                     December 2003
• Deliver standards-based Online CA & CR                March 2004
• Deliver CAS w/ accounting support                     May 2004

Community Authorization Service, September 2001
MICS Program Manager: Thomas Ndousse
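The sketch below illustrates the four-step flow in the figure above: the user obtains a restricted capability from the CAS and presents it to the resource, which checks both its local policy and the capability. All class and field names are hypothetical; this is not the Globus/CAS API.

```python
# Illustrative sketch (not the Globus/CAS API) of the CAS workflow: steps 1-2
# issue a capability limited by community membership and collective policy,
# steps 3-4 enforce it at the resource alongside local policy.

from dataclasses import dataclass

@dataclass
class Capability:
    user: str
    resource: str
    operations: tuple        # e.g. ("read",)
    signed_by: str           # identity of the issuing CAS

class CommunityAuthorizationService:
    def __init__(self, name, membership, collective_policy):
        self.name = name
        self.membership = membership        # user -> group
        self.policy = collective_policy     # (group, resource) -> allowed operations

    def request(self, user, resource, operations):
        """Steps 1-2: check membership and collective policy, then return a
        capability restricted to the granted operations."""
        group = self.membership.get(user)
        allowed = self.policy.get((group, resource), set())
        granted = tuple(op for op in operations if op in allowed)
        if not granted:
            raise PermissionError("collective policy denies this request")
        return Capability(user, resource, granted, signed_by=self.name)

class Resource:
    def __init__(self, name, trusted_cas, local_policy):
        self.name = name
        self.trusted_cas = trusted_cas      # CAS identities trusted by local policy
        self.local_policy = local_policy    # operations the CAS may delegate at all

    def handle(self, capability, operation):
        """Steps 3-4: is the CAS authorized here, and does the capability
        cover the requested operation?"""
        if capability.signed_by not in self.trusted_cas:
            return "denied: CAS not authorized by local policy"
        if operation not in self.local_policy or operation not in capability.operations:
            return "denied: operation not covered by capability"
        return f"ok: {operation} on {self.name} for {capability.user}"
```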
INCITE: Edge-based Traffic Processing and Inference for High-Performance Networks
Richard Baraniuk, Rice University; Les Cottrell, SLAC; Wu-chun Feng, LANL
High-Performance Network Research SciDAC Project

Novel Ideas
• Multiscale/multifractal analysis for traffic bursts
• Efficient "packet chirp" and "fat boy" path probing
• Active and passive network tomography
• Monitor for Application-Generated Network Traffic (MAGNeT)
• Traffic Information Collecting Kernel with Exact Timing (TICKET)
• Augmented PingER

INCITE Summary
• Task 1: Multiscale traffic analysis and modeling
• Task 2: Inference algorithms for network paths and links
• Task 3: Network tomography
• Task 4: Active network measurement: PingER
• Task 5: Passive network measurement: MAGNeT, TICKET
• Task 6: Passive path monitoring and tomography toolkit

Impact and Connections
• IMPACT:
  - Optimize performance of demanding applications such as remote visualization and high-capacity data transfers
  - New understanding of the complex dynamics of large-scale, high-speed networks
  - New edge-based tools to characterize and map network performance as a function of space, time, application, protocol, and service
• CONNECTIONS:
  - Rice/SLAC/LANL synergy, SciDAC

Milestones
• Analysis, modeling, and inference
  - Multifractal, wavelet, and tomography theory           ongoing
  - Traffic analysis toolbox                               12/02
  - Passive path inference and tomography algorithms       10/03
• PingER
  - Add tomography, chirping, fat boy                      04/02
  - Port extended PingER to Rice/LANL                      10/02
  - Add new inference algorithms to PingER-NG              06/03
  - Evaluate, port PingER-NG to GIMI/NMF                   04/04
• MAGNeT/TICKET
  - MAGNeT, TICKET (alpha distribution)                    10/02
  - High-speed, high-utilization traffic traces            09/02
  - MAGNeT (public availability)                           06/03

incite.rice.edu
MICS Program Manager: Thomas Ndousse
Date Prepared: 10 Jan 02
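A small, hedged sketch of the multiscale-analysis idea behind Task 1: compute Haar-wavelet energies of a traffic count series at several dyadic scales and examine how log-energy grows with scale (the "logscale diagram"). The synthetic Poisson trace and the slope interpretation are my own illustration, not the INCITE toolbox.

```python
# Haar-wavelet energies across scales for a packet/byte count series: roughly
# linear growth of log2(energy) with scale is the signature of the scaling and
# burstiness that multifractal traffic models describe; a flat curve would
# indicate uncorrelated (Poisson-like) traffic.
import numpy as np

def haar_scale_energies(counts, num_scales=8):
    """Return mean squared Haar detail coefficients at scales 1..num_scales."""
    x = np.asarray(counts, dtype=float)
    energies = []
    for _ in range(num_scales):
        if len(x) < 2:
            break
        pairs = x[: len(x) // 2 * 2].reshape(-1, 2)
        details = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)   # Haar detail
        x = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)         # Haar approximation
        energies.append(np.mean(details ** 2))
    return np.array(energies)

# Example: bytes observed per 1 ms bin on an edge link (synthetic here).
counts = np.random.poisson(50, size=2 ** 14)
log_energy = np.log2(haar_scale_energies(counts))
slope = np.polyfit(np.arange(1, len(log_energy) + 1), log_energy, 1)[0]
print(f"logscale-diagram slope ~ {slope:.2f} (related to the scaling exponent)")
```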
Logistical Networking: Developing a communicative infrastructure with persistence
PIs: Micah Beck, Jack Dongarra, James S. Plank / Tennessee; Rich Wolski / UCSB
High-Performance Network Research SciDAC Project

Novel Ideas
• Storage is too cheap to hoard.
• Storage can be a scalably shared network resource.
• Logistical Networking gives applications and middleware uniform control over buffering and routing of data.
• Data storage and data transport can be viewed as points on a spectrum of data management mechanisms.
• Monitoring and prediction can replace reservation as a means of scheduling storage resources.
• End-to-end networking principles can apply to storage.

Tasks
• Develop/deploy network storage depots                 6-12 mos
• Develop layered storage stack and tools               12 mos
• Develop/validate scheduling techniques                12-18 mos
• Optimize application performance                      18-36 mos

Impact and Connections
• IMPACT:
  - Improved performance and scalability of data-intensive distributed applications
  - Greater ease and lower cost of deployment of new wide-area data management strategies
  - Dramatically improved flexibility in data-intensive collaboration
• CONNECTIONS:
  - SciDAC: Net100, Data Grid, Scalable Systems, Data Management, Computational Science (e.g., Climate, Supernovas)
  - Base: Network Monitoring, Data Grid, Transport Protocols, Storage Resource Management, IQ-ECho

Milestones/Dates/Status
• IBP applications demonstrated at SC'01
• exNode support in NetSolve
• Reliability/performance coscheduling alpha
• Allocation policy simulation
• Initial generalized caching infrastructure
• Initial logistical overlay network on ESnet
• Wide-area logistical peering mechanisms and policies
• Resolution for highly volatile storage resources
• Experimental IBP architectures
• Large-scale measurement and simulations

loci.cs.utk.edu
MICS Program Manager: Thomas Ndousse
Date Prepared: 1/10/02
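A hypothetical sketch of the exNode idea mentioned in the milestones: like a Unix inode, but the "blocks" are leased allocations on network storage depots (IBP servers), possibly replicated. The class names, fields, and depot addresses below are invented for illustration; this is not the LoCI exNode library or the IBP client API.

```python
# Toy exNode-style mapping of a logical file onto depot allocations.
from dataclasses import dataclass, field

@dataclass
class Allocation:
    depot: str          # e.g. "depotA.example.edu:6714" (hypothetical address)
    offset: int         # byte offset within the logical file
    length: int
    expires: float      # IBP allocations are time-limited leases

@dataclass
class ExNode:
    filename: str
    size: int
    mappings: list = field(default_factory=list)   # list of Allocation

    def add_replica(self, depot, offset, length, expires):
        self.mappings.append(Allocation(depot, offset, length, expires))

    def depots_for(self, offset):
        """Return every depot holding the byte at this offset, so a reader can
        fail over or pick the fastest copy."""
        return [a.depot for a in self.mappings
                if a.offset <= offset < a.offset + a.length]

# A 2 GB dataset stored as two 1 GB extents, each replicated on two depots.
ex = ExNode("climate-run-042.nc", size=2 * 10**9)
for depot in ("depotA.example.edu:6714", "depotB.example.edu:6714"):
    ex.add_replica(depot, offset=0, length=10**9, expires=1e9)
for depot in ("depotB.example.edu:6714", "depotC.example.edu:6714"):
    ex.add_replica(depot, offset=10**9, length=10**9, expires=1e9)
print(ex.depots_for(1_500_000_000))   # -> the two depots holding the second extent
```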
Net100: Developing network-aware operating systems
PIs: Wendy Huntoon/PSC, Tom Dunigan/ORNL, Brian Tierney/LBNL
High-Performance Network Research Base Project

Net100 Novel Ideas
• Net100 will tune network-unaware applications based on recent and current link characteristics.
• Net100 will tune more than just transport buffer sizes, including:
  - TCP AIMD parameters
  - DUP (duplicate ACK) threshold
  - Delayed ACK
• Net100 will determine optimal paths and whether to use multiple streams and/or multiple paths.
• The Net100 kernel uses passive monitoring from the Web100 kernel.

Tasks
• Develop/deploy network probes and sensors
• Develop a network metrics database
• Develop transport protocol optimizations
• Develop a network-tuning daemon

Impact and Connections
• IMPACT:
  - Increase throughput of bulk transfers over high-delay, high-bandwidth networks (like DOE's ESnet)
  - Select optimal paths and transport parameters for distributed (Grid) applications (e.g., GridFTP)
  - Provide a network performance database from active and passive monitoring
• CONNECTIONS:
  - SciDAC: Astrophysics, Bandwidth Estimation, Data Grid, INCITE, Logistical Networking
  - Base: Network Monitoring, Data Grid, Transport Protocols

Milestones/Dates/Status                                  Mon/Yr   DONE
• Network probes and sensors
  - Initial sensor and tool deployment                   12/01    12/01
  - Database design                                      4/02
  - Initial database implementation                      9/02
  - Final sensor/database                                6/03
• Transport protocol optimizations
  - Protocol analysis                                    11/02
  - Initial tuning daemon                                3/02
  - Bulk transfer tuning demos                           8/02
  - Final tuning daemon                                  6/03
• Multipath support
  - Analytical analysis                                  8/02
  - Proof-of-principle routing daemons                   12/02
  - Grid application demos                               4/03

www.net100.org
MICS Program Manager: Thomas Ndousse
Date Prepared: 1/7/02
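A hedged illustration of the tuning idea above, not the actual Net100 daemon: pick a TCP buffer size from a measured bandwidth-delay product and apply it to a socket before connecting. Net100 would take the path characteristics from its sensor database and Web100 kernel instrumentation; here they are plain function arguments.

```python
# Buffer sizing from bandwidth-delay product, applied to a socket.
import socket

def buffer_size_for_path(bandwidth_bps, rtt_seconds, cap_bytes=16 * 2**20):
    """Bandwidth-delay product in bytes, clamped to a sane range."""
    bdp = int(bandwidth_bps / 8 * rtt_seconds)
    return min(max(bdp, 64 * 1024), cap_bytes)

def tuned_socket(bandwidth_bps, rtt_seconds):
    size = buffer_size_for_path(bandwidth_bps, rtt_seconds)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set before connect() so the TCP window scale can be negotiated accordingly.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return s

# Example: a 622 Mb/s path with 70 ms RTT needs roughly a 5.4 MB window,
# far above the common 64 KB default of the era.
print(buffer_size_for_path(622e6, 0.070))
```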
Self-Configuring Network Monitor (SCNM): Developing a distributed passive network monitoring system
PIs: Brian Tierney/LBNL and Deb Agarwal/LBNL
High-Performance Network Research Base Project

Novel Ideas
• A secure monitoring infrastructure that applications can use to monitor the performance of their own data streams
• Passive: introduces traffic only in the form of monitoring data and requests for monitoring

Tasks Involved
• Develop a monitor activation mechanism
• Develop monitor software and hardware
• Develop data collection and display capabilities
• Deploy monitors
• Work with applications

Impact and Connections
• IMPACT:
  - Build a monitoring infrastructure that will aid in debugging distributed-application communication and support both active and passive monitoring
• CONNECTIONS:
  - SciDAC: Net100, DOE Science Grid, Astrophysics, Bandwidth Estimation, Data Grid, INCITE
  - Base: Network Monitoring, Data Grid, Transport Protocols
• URL: www-itg.lbl.gov/Net-Mon/Self-Config.html

Milestones/Dates/Status                                       Year
• Monitor daemon
  - Design base passive monitor daemon                        1
  - Activation mechanism integration                          1
  - Improvements to network drivers                           1
  - Improvements and enhancements to sensor mechanism         2 & 3
• Activation mechanisms
  - Design basic activation mechanism                         1
  - Develop and deploy full activation capabilities           2 & 3
• Results handling infrastructure
  - tcpdump viewing capabilities                              1
  - Develop improved data viewing capabilities                2 & 3
• Deployment of monitors
  - Deployment to initial ESnet sites (gig-E)                 1 - 3
  - Work with applications                                    2 & 3
  - Additional ESnet sites                                    2 & 3

MICS Program Manager: Thomas Ndousse
Date Prepared: 1/7/02
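A speculative sketch of the "monitor activation" idea: the application describes the flow it wants captured (a 5-tuple plus duration) and sends that request toward the far end so that monitors along the path can observe it and start capturing just that flow. The message format, JSON encoding, port number, and addresses below are all invented for illustration; the real SCNM activation protocol is defined by the project.

```python
# Hypothetical activation request sent along the data path.
import json, socket, time

ACTIVATION_PORT = 9999   # hypothetical port that in-path monitors watch for

def request_monitoring(src, dst, dst_port, duration_s, proto="tcp"):
    request = {
        "flow": {"src": src, "dst": dst, "dport": dst_port, "proto": proto},
        "duration": duration_s,
        "issued": time.time(),
    }
    payload = json.dumps(request).encode()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Addressed to the destination host so the packet traverses the same path
    # as the data stream and can be seen by monitors sitting on that path.
    sock.sendto(payload, (dst, ACTIVATION_PORT))
    sock.close()
    return request

# Ask for 60 seconds of capture of a bulk transfer to a (hypothetical) data server.
request_monitoring("198.51.100.10", "203.0.113.20", 2811, duration_s=60)
```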
High-Performance Transport Protocols
PI: Wu-chun (Wu) Feng, Los Alamos National Laboratory and The Ohio State University
High-Performance Network Research Base Project

The Novel Ideas
• Goal: significantly improve network performance in support of all computing environments, particularly grids and the NGI.
• TCP/IP: make the network fast but TCP-friendly.
  - Eliminate TCP's flow-control bottleneck by automatically tuning buffer sizes.
• RAPID: make the network more adaptable.
  - Smooth QoS support over a best-effort network.
  - User-settable reliability, providing a spectrum of QoS from unreliable UDP to reliable TCP.
• Dynamic Right-Sizing: TCP flow-control adaptation for grids and the Next-Generation Internet.
  - Automatically enhance network performance over the WAN by as much as an order of magnitude while abiding by TCP semantics.
• RAPID: Rate-Adjusting Protocol for Internet Delivery.
  - Provide smoother QoS support over the best-effort Internet for grids and the NGI while minimizing the need for widespread deployment of DiffServ or IntServ.

Impact and Connections
• IMPACT:
  - Dynamic Right-Sizing
    - Auto-tuned, order-of-magnitude increase in throughput
    - Vendor adoption, e.g., IRIX, Linux (still in the works)
    - Potential integration into GridFTP, Web100, Net100
  - RAPID
    - Sliding reliability semantics may result in adoption of RAPID by the LANL large-data visualization team
• CONNECTIONS:
  - Dynamic Right-Sizing: Web100, Net100, DOE Science Grid, Particle Physics Data Grid, Earth System Grid II
  - RAPID: the LANL large-data visualization team, previously sponsored by the DOE NGI Corridor One project. Others?

Milestones/Dates/Status                                     Mon/Yr   DONE
• Simulation: flow-control adaptation with Dynamic Right-Sizing
  - Protocol analysis & design (ns-2)                       12/01    12/01
  - Protocol testing & evaluation (rudimentary)             03/02    beta testing
• Implementation: flow-control adaptation with Dynamic Right-Sizing
  - Kernel space, Linux 2.4.x                               07/02    beta testing
  - User space, drsFTP                                      01/03    alpha testing
  - Protocol testing & evaluation (rudimentary)             03/03
  - Potential integration with GridFTP                      04/03
  - Deployment (kernel- and user-space)                     07/03
• Simulation: RAPID
  - Effect of packet spacing                                03/02    preliminaries
  - Definition of API to middleware                         03/02    preliminaries
  - Sliding reliability                                     07/03

MICS Program Manager: Thomas Ndousse-Fetter
January 16, 2002
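A minimal, hedged sketch in the spirit of the user-space Dynamic Right-Sizing work (not the drsFTP or kernel code): the receiver infers the sender's rate from what arrives per RTT and grows its receive buffer so the advertised window never becomes the bottleneck. The fixed RTT estimate, 64 KB starting buffer, and 2x headroom factor are assumptions for illustration.

```python
# Receiver-side buffer auto-tuning driven by observed per-RTT throughput.
import socket, time

def receive_with_right_sizing(sock, rtt_estimate_s=0.07, max_buf=16 * 2**20):
    """Read from an already-connected TCP socket, enlarging SO_RCVBUF whenever
    the amount received in one RTT suggests the window is about to limit the
    sender. Returns the total number of bytes received."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 64 * 1024)
    total, window_bytes, window_start = 0, 0, time.time()
    while True:
        chunk = sock.recv(64 * 1024)
        if not chunk:
            return total
        total += len(chunk)
        window_bytes += len(chunk)
        if time.time() - window_start >= rtt_estimate_s:
            # Keep the buffer at ~2x the bandwidth seen this RTT so the
            # sender's window has room to keep growing.
            needed = min(2 * window_bytes, max_buf)
            current = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
            if needed > current:
                sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, needed)
            window_bytes, window_start = 0, time.time()
```

Kernel-space DRS does the equivalent inside TCP itself, where resizing the advertised window mid-connection is more effective than a user-space buffer change.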
IQ-ECho: Interactive Quality of Service Across Heterogeneous Hardware/Software
PIs: Schwan, Ahamad, Eisenhauer, Yalamanchili, Georgia Institute of Technology
High-Performance Network Research Base Project

IQ-ECho Novel Ideas
• Integrated QoS management through quality attributes
• Dynamic code generation relocates application-level functionality to the most appropriate location
• Configurable protocols and kernel-level monitoring provide the system-level support required for online quality management
• Vertical programming allows extending platforms while programming applications
• Represent information flows as event streams in the event-based IQ-ECho middleware
• Use dynamic code generation to migrate application-level filtering/data processing to appropriate network locations
• Use network-level feedback to drive application-level quality-of-service adaptations
• http://www.cc.gatech.edu/systems/projects/IQECho

Impact and Connections
• IQ-ECho IMPACT:
  - Enable network-aware, adaptable applications
  - Cross-layer information exchanges will make effective runtime tradeoffs in quality vs. performance across the protocol, middleware, and application levels
  - Enable the creation of efficient and adaptable Grid data services
• CONNECTIONS:
  - Remote visualization (Supernova Visualization), source-based filtering (Oak Ridge), program monitoring and steering
  - Extensible cluster platforms (NSF, DOE)
  - Remote sensing, monitoring, and security (DARPA, NSF)

Milestones/Dates/Status                                      Mon/Yr   DONE
• Year 1
  - Performance attributes in ECho middleware                4/02
  - Select and implement sample application                  6/02
  - Create instrumentation for performance attributes        8/02
• Year 2
  - Evaluate and tune middleware                             3/03
  - Enable application for adaptation                        3/03
  - Extend/create configurable network protocols             6/03
• Year 3
  - Integrate IQ-ECho with Access Grid software              3/04
  - Demonstrate benefits in an Access Grid environment       6/04

MICS Program Manager: Thomas Ndousse
Date Prepared: 1/10/02
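An illustration only, not the ECho API (which is a C library): an event channel that carries quality attributes with each event and applies a relocatable filter that degrades data quality when network feedback reports tight bandwidth. In IQ-ECho the filter would be generated dynamically and pushed to the most appropriate host; here it is just a Python callable, and the attribute names and 100 Mb/s threshold are assumptions.

```python
# Event channel with quality attributes and a feedback-driven filter.
class EventChannel:
    def __init__(self):
        self.subscribers = []       # list of (filter, callback)

    def subscribe(self, callback, event_filter=None):
        self.subscribers.append((event_filter, callback))

    def submit(self, event, attributes):
        for event_filter, callback in self.subscribers:
            out = event_filter(event, attributes) if event_filter else event
            if out is not None:
                callback(out, attributes)

def downsample_when_constrained(event, attributes):
    """Quality/performance tradeoff: halve the data volume when the
    network-level feedback attribute reports low available bandwidth."""
    if attributes.get("avail_bw_mbps", 1000) < 100:
        return event[::2]           # e.g. drop every other sample of a frame
    return event

channel = EventChannel()
channel.subscribe(lambda ev, attrs: print(len(ev), attrs), downsample_when_constrained)
channel.submit(list(range(1000)), {"avail_bw_mbps": 45})    # constrained path
channel.submit(list(range(1000)), {"avail_bw_mbps": 600})   # unconstrained path
```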
PingER: Active end-to-end performance monitoring for the Research and Education communities
PI: Les Cottrell, SLAC
High-Performance Network Research Base Project

PingER Novel Ideas
• Low-impact network performance measurements to most of the Internet-connected world, providing delay, loss, and connectivity information over long time periods
• Network AND application high-throughput performance measurements, allowing comparisons and identification of bottlenecks
• Continuous, robust measurement, analysis, and web-based reporting of results, available worldwide
• Simple infrastructure enabling rapid deployment, location within an application host, and local site management to avoid security issues

Tasks
• Develop/deploy a simple, robust, ssh-based active end-to-end measurement and management infrastructure
• Develop analysis/reporting tools
• Integrate new application and network measurement tools into the infrastructure
• Compare and validate various tools, and determine regions of applicability

Impact and Connections
• IMPACT:
  - Increase network and Grid application bulk throughput over high-delay, high-bandwidth networks (like DOE's ESnet)
  - Provide troubleshooting information for networkers and users by identifying the onset and magnitude of performance changes, and whether they appear in the application or the network
  - Provide a network performance database, analysis, and navigable reports from active monitoring
• CONNECTIONS:
  - SciDAC: High Energy Nuclear Physics, Bandwidth Estimation, Data Grid, INCITE
  - Base: Network Monitoring, Data Grid, Transport Protocols

Milestones/Dates/Status                                   Mon/Yr   DONE
• Infrastructure development
  - Develop simple window tuning tool                     08/01    08/01
  - Initial infrastructure developed                      12/01    12/01
  - Infrastructure installed at one site                  01/02    01/02
  - Improve and extend infrastructure                     06/02
  - Deploy at a 2nd site                                  08/02
  - Evaluate GIMI/DMF alternatives                        10/02
  - Extend deployment to PPDG sites                       03/03
• Develop analysis/reporting tools
  - First version for standard apps                       02/02
• Integrate new application & network tools
  - GridFTP and demo                                      05/05
  - INCITE tools                                          08/02
  - BW measurement tools (e.g., pathload)                 01/03
• Compare & validate tools
  - GridFTP                                               09/02
  - BW tools                                              04/03

www-iepm.slac.stanford.edu
MICS Program Manager: Thomas Ndousse
Date Prepared: 1/7/02
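A hedged sketch of a PingER-style low-impact probe: a handful of pings per remote host per measurement interval, summarized into the loss and RTT statistics used for long-term trending. It assumes a Linux-style ping(8) that accepts -c (count) and -W (reply timeout in seconds); the real PingER infrastructure has its own probing, archiving, and reporting code.

```python
# Low-impact active probe: N pings, summarized into loss and RTT statistics.
import re, subprocess

def probe(host, count=10):
    rtts, lost = [], 0
    for _ in range(count):
        result = subprocess.run(["ping", "-c", "1", "-W", "1", host],
                                capture_output=True, text=True)
        match = re.search(r"time=([\d.]+) ms", result.stdout)
        if result.returncode == 0 and match:
            rtts.append(float(match.group(1)))
        else:
            lost += 1
    summary = {"host": host, "sent": count, "loss_pct": 100.0 * lost / count}
    if rtts:
        summary.update(min_rtt=min(rtts), avg_rtt=sum(rtts) / len(rtts),
                       max_rtt=max(rtts))
    return summary

print(probe("www.slac.stanford.edu", count=5))
```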
Stability Modeling and Control of Transport Protocols for High-Speed Data Grids
Nageswara S. Rao, Oak Ridge National Laboratory
High-Performance Network Research Base Project

The Novel Ideas
• Detailed analysis of transport dynamics using non-linear control and chaos theory; showed that TCP generates "complicated" phase-space attractors
• Developed the concept of grid network instruments that perform measurement and traffic engineering using lightweight in-situ modules; analytically showed their performance optimality
• Novel transport control methods for end-to-end control:
  - High throughput using concurrent window and graded control
  - Controlled dynamics using multiple-throttle methods

Summary: Understand and Control the End-to-End Transport Dynamics of High-Speed Grids
• Detailed analysis of transport processes
  - Rigorous treatment using non-linear control and chaos theory
• Develop provably effective transport methods for:
  - High throughput
  - End-to-end dynamics control
• Implement and test in grid environments

Impact and Connections
• IMPACT:
  - Provides controlled end-to-end dynamics for grids over wide-area networks, a significant step beyond the state of the art
  - Fundamentally new classes of transport methods based on sound analysis and experimentation; inexpensive and easy to use
  - Provides the needed quality of service for control over wide-area networks for data and instrument grids
• CONNECTIONS:
  - Net100 project: will use the proposed instruments and will provide certain measurement modules
  - The Terascale Supernova Initiative can significantly benefit from the proposed control methods; we are in communication

Milestones/Dates/Status
• Detailed rigorous analysis:
  - Attractor analysis                                  Feb 02 / Feb 03
  - Conditions of chaos                                 Apr 02 / Apr 03
• Grid network instrumentation design:
  - Sufficiency proofs of measurements                  Mar 02 / Mar 03
  - Detailed module design                              Jun 02
• Proof-of-concept implementations:
  - High throughput                                     Jul 02
  - Bounded higher-order delay moments                  Aug 02 / Sept 03
• Application and testing:
  - Identification of representative problem            Feb 03
  - Performance study                                   Sept 03

MICS Program Manager: Thomas D. Ndousse
Date Prepared: 01/09/02
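A rough, hedged illustration of the chaos-theoretic analysis mentioned above: reconstruct a phase-space trajectory from a congestion-window time series via time-delay (Takens-style) embedding, the standard first step before examining attractor structure. The cwnd trace below is synthetic (an AIMD sawtooth with random loss events) and the embedding parameters are arbitrary; the project's actual analysis is far more detailed.

```python
# Time-delay embedding of a (synthetic) congestion-window trace.
import numpy as np

def delay_embed(series, dim=3, lag=5):
    """Return the delay-embedded trajectory: rows are the points
    (x[t], x[t+lag], ..., x[t+(dim-1)*lag]) in the reconstructed phase space."""
    x = np.asarray(series, dtype=float)
    n = len(x) - (dim - 1) * lag
    return np.column_stack([x[i * lag : i * lag + n] for i in range(dim)])

# Synthetic AIMD-like cwnd trace: +1 per RTT, halved on a random "loss".
rng = np.random.default_rng(0)
cwnd, trace = 10.0, []
for _ in range(2000):
    cwnd = cwnd / 2 if rng.random() < 0.02 else cwnd + 1
    trace.append(cwnd)

trajectory = delay_embed(trace, dim=3, lag=5)
print(trajectory.shape)   # (1990, 3) points tracing the reconstructed attractor
```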
Pushing the Network Simulation Envelope
SSFnet: Creating a terascale network simulator that can model SciDAC applications
W. R. Wing, Oak Ridge National Laboratory
High-Performance Network Research Base Project

SSFnet Novel Ideas
• SSFnet will be the first network simulator with verifiable instrumentation
  - We plan to include (not model) the Net100/Web100 MIB
  - Net100/Web100 MIB data will be accumulated for direct comparison
• SSFnet will be the first production-quality distributed-memory simulator
  - The Domain Modeling Language (DML) will automate decomposition
• SSFnet will be the first simulator able to tackle SciDAC-scale problems

Tasks
• Verify shared-memory SSFnet on candidate architectures
• Develop initial distributed-memory (DM) version of SSFnet
• Develop and verify instrumentation
• Develop application-level IDE
• Distribute to the DOE network research community
• Develop 2nd-generation DM scheduler and DML

Impact and Connections
• IMPACT: SSFnet will be the first network simulator able to:
  - Fully model SciDAC terascale applications
  - Allow SciDAC developers to tune their applications to evolving mixed-technology network environments
  - Allow testing/confirmation of future SciDAC-developed network protocols
• CONNECTIONS: A key element of SSFnet's verifiability is our plan to directly incorporate the Net100/Web100 MIB in the simulator. Comparison of real-life MIB measurements with the SSF-instrumented MIB will provide confirmation of SSFnet simulation fidelity. However, this does require deployment of at least some SciDAC applications on Web100/Net100 platforms.

Milestones/Dates/Status
Proposed Milestone                                     Proposed Date   Actual Date
• Verify shared-memory architectures
  - IBM, Compaq, Solaris                               Q1 FY02         Complete
• Develop initial DM scheduler                         Q3 FY02
• Develop MIB instrumentation                          Q4 FY02
• Develop application-level IDE                        Q2 FY03
• Develop 2nd-gen DML-based scheduler                  Q4 FY03
• Distribute to DOE community                          Q4 FY03

MICS Program Manager: T. Ndousse
Date Prepared: 01/08/02
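A toy, hedged illustration of the "verifiable instrumentation" idea: a simulated TCP sender that accumulates Web100-style per-connection counters. The counter names (PktsOut, CongestionSignals, and so on) are modeled on the Web100 MIB but this is not SSFnet code; the point is only that the same counters a real kernel MIB exposes can be read out of a simulator for direct comparison.

```python
# Toy AIMD sender with MIB-style counters accumulated per connection.
import random

class InstrumentedTcpSender:
    def __init__(self):
        self.cwnd = 1.0
        self.mib = {"PktsOut": 0, "CongestionSignals": 0,
                    "MaxCwnd": 0, "ElapsedRtts": 0}

    def on_rtt(self, loss_occurred):
        """Advance the toy AIMD model by one RTT and update the counters."""
        self.mib["PktsOut"] += int(self.cwnd)
        self.mib["ElapsedRtts"] += 1
        if loss_occurred:
            self.mib["CongestionSignals"] += 1
            self.cwnd = max(1.0, self.cwnd / 2)
        else:
            self.cwnd += 1
        self.mib["MaxCwnd"] = max(self.mib["MaxCwnd"], int(self.cwnd))

random.seed(1)
sender = InstrumentedTcpSender()
for _ in range(1000):
    sender.on_rtt(loss_occurred=random.random() < 0.01)
print(sender.mib)   # compare against counters read from a real Web100 kernel
```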