Office of Science, MICS Division
Department of Energy

Project Quad Charts
High-Performance Networking Research Program

Program Manager: Thomas D. Ndousse
Tel: 301-903-9960
Email: tndousse@er.doe.gov
Bandwidth Estimation: Methodologies and Applications
k claffy, CAIDA at SDSC, and Constantinos Dovrolis, Univ. of Delaware
High-Performance Network Research SciDAC Project

Brief Summary of the Project
• Task 1: Develop accurate, fast, and non-intrusive bandwidth estimation (bwest) methodologies and measurement tools.
• Task 2: Compare and evaluate different bwest tools (both for end-to-end and per-hop bandwidth metrics), characterizing any observed errors.
• Task 3: Use bwest methodologies in transport protocols and applications to optimize throughput for high bandwidth-delay-product paths.
• Task 4: Prototype bwest middleware to monitor performance between network domains in real time.

The Novel Ideas
• Innovative end-to-end probing techniques to measure capacity (the maximum possible throughput on an empty path) and available bandwidth (the maximum throughput under current load):
  - Packet Train Dispersion (PTD)
  - Variable Packet Size (VPS)
  - Self-Loading Periodic Streams (SLoPS)
• Methodologies to check for overbuffered or underbuffered network paths.
• Smooth pacing in TCP, driven by bwest measurements.
• Smooth bwest-driven rate control for UDP-based applications.

Impact and Connections
• IMPACT: Allow scientific applications (transferring terabytes of data) to efficiently use high-performance networks.
  - Use explicit bwest measurements instead of implicit bwest via TCP's congestion control algorithms.
  - Provide easy-to-use tools for monitoring network path performance.
• CONNECTIONS:
  - Apply bwest methodologies to the Web100 and Net100 projects.
  - Correlate bwest with loss/delay (e.g., the PingER project).
  - Establish prototype bwest middleware in ESnet and for DOE labs and investigators.

Milestones/Dates/Status
• Compare and evaluate existing bwest tools:
  - Hop-by-hop tool survey                              Jun 01 - Aug 02
  - End-to-end tool survey                              Jun 01 - Jun 02
• Bandwidth measurement middleware:
  - Create/maintain testbed                             Jun 01 - Jun 04
  - Collect link characteristics                        Jun 01 - Jun 04
  - Correlate active/passive measurements               Jun 01 - Jun 04
• Capacity estimation tool (pathrate) v2.1.2            Dec 01   DONE
  - Add GUI to aid analysis of results                  Dec 02
• Available bandwidth tool (pathload)                   Mar 02
  - Paper at PAM'02                                     Mar 02
• Develop UDP-based rate-controlled file transfer
  application driven by bwest measurements              Dec 02
• Real-time path monitor using bwest middleware         Dec 03

MICS Program Manager: Thomas Ndousse
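A minimal sketch of two of the probing ideas named in this chart, packet-train dispersion for capacity and a SLoPS-style delay-trend test for available bandwidth. This is an illustration of the technique, not the pathrate/pathload implementation; timestamps and the 10% trend threshold are assumptions.

```python
# Sketch of two bwest ideas from the chart above: packet-train dispersion (PTD)
# for capacity and a SLoPS-style check for available bandwidth.
# arrival_times are assumed to be receiver-side timestamps in seconds.

def capacity_from_train(arrival_times, packet_size_bytes):
    """Estimate path capacity from the dispersion of a back-to-back packet train.

    With N packets of size L sent back to back, the bottleneck link spaces them
    by L/C, so C ~= (N - 1) * L / (t_last - t_first).
    """
    n = len(arrival_times)
    dispersion = arrival_times[-1] - arrival_times[0]
    if n < 2 or dispersion <= 0:
        raise ValueError("need >= 2 packets with positive dispersion")
    return (n - 1) * packet_size_bytes * 8 / dispersion   # bits per second


def stream_saturates_path(one_way_delays):
    """SLoPS-style test: if one-way delays trend upward across a periodic
    stream, the probing rate exceeds the available bandwidth (queues build).
    A tool like pathload would binary-search the probing rate around this test."""
    half = len(one_way_delays) // 2
    early = sum(one_way_delays[:half]) / half
    late = sum(one_way_delays[half:]) / (len(one_way_delays) - half)
    return late > early * 1.1   # crude 10% increase threshold (assumption)
```

In practice the receiver records the arrival timestamps and delays; the numbers fed to these functions would come from that measurement process.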
Security and Policy for Group Collaboration
Steven Tuecke, Argonne National Laboratory; Carl Kesselman, USC Information Sciences Institute; Miron Livny, U. Wisconsin, Madison
High-Performance Network Research SciDAC Project

The Novel Ideas
• Enable collaborative work with common security tools that address:
  - Large, geographically and organizationally distributed membership
  - Membership with diverse expertise, comprising different roles
  - Community resources with associated community policies
• Develop novel tools and approaches for:
  - Management of collaboration membership and resources: Online CA & Credential Repository (CR), local security integration
  - Management of roles and privileges: Community Authorization Service (CAS), restricted delegation
  - Integration into collaborative tools and environments

[Figure: Community Authorization Service workflow. 1. The user sends a CAS request with resource names and operations; the CAS checks user/group membership and collective policy ("Does the collective policy authorize this request for this user?"). 2. The CAS replies with a capability plus resource/collective CA and policy information. 3. The user sends a resource request authenticated with the capability; the resource checks whether the request is authorized by the capability and whether the CAS is authorized by local policy. 4. The resource replies.]

Impact and Connections
• IMPACT: We expect this project to result in:
  - Standardization of new PKI-based approaches to credential management, restricted delegation, and policy management
  - Development of security tools and services for collaboration
  - Widespread deployment and adoption of these approaches and tools
• CONNECTIONS:
  - This work builds on the Globus Toolkit's widely used Grid Security Infrastructure (GSI) and will be included in future Globus Toolkit releases.
  - To be used by numerous SciDAC collaboratories, including the DOE Science Grid, Particle Physics Data Grid, Earth Systems Grid, and Fusion Collaboratory.
  - Also to be used by many non-DOE projects worldwide, including NSF PACI DTF, NASA IPG, and the European Data Grid.

Milestones/Dates/Status
• Demonstrate CAS prototype @ SC'01                     November 2001
• Complete X.509 & GSS standards drafts                 February 2002
• Deliver draft-standard-conforming GSS                 April 2002
• Deliver CAS w/ simple policies                        May 2002
• Demonstrate Online CA & CR                            September 2002
• Complete Online CA & CR standards drafts              December 2002
• Finalize X.509 & GSS standards                        February 2003
• Deliver Online CA & CR                                March 2003
• Deliver CAS w/ rich policy & application support      May 2003
• Finalize Online CA & CR standards                     December 2003
• Deliver standards-based Online CA & CR                March 2004
• Deliver CAS w/ accounting support                     May 2004

Community Authorization Service, September 2001
MICS Program Manager: Thomas Ndousse
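The sketch below illustrates the four-step flow in the figure above: the user obtains a restricted capability from the CAS and presents it to the resource, which checks both its local policy and the capability. All class and field names are hypothetical; this is not the Globus/CAS API.

```python
# Illustrative sketch (not the Globus/CAS API) of the CAS workflow: steps 1-2
# issue a capability limited by community membership and collective policy,
# steps 3-4 enforce it at the resource alongside local policy.

from dataclasses import dataclass

@dataclass
class Capability:
    user: str
    resource: str
    operations: tuple        # e.g. ("read",)
    signed_by: str           # identity of the issuing CAS

class CommunityAuthorizationService:
    def __init__(self, name, membership, collective_policy):
        self.name = name
        self.membership = membership        # user -> group
        self.policy = collective_policy     # (group, resource) -> allowed operations

    def request(self, user, resource, operations):
        """Steps 1-2: check membership and collective policy, then return a
        capability restricted to the granted operations."""
        group = self.membership.get(user)
        allowed = self.policy.get((group, resource), set())
        granted = tuple(op for op in operations if op in allowed)
        if not granted:
            raise PermissionError("collective policy denies this request")
        return Capability(user, resource, granted, signed_by=self.name)

class Resource:
    def __init__(self, name, trusted_cas, local_policy):
        self.name = name
        self.trusted_cas = trusted_cas      # CAS identities trusted by local policy
        self.local_policy = local_policy    # operations the CAS may delegate at all

    def handle(self, capability, operation):
        """Steps 3-4: is the CAS authorized here, and does the capability
        cover the requested operation?"""
        if capability.signed_by not in self.trusted_cas:
            return "denied: CAS not authorized by local policy"
        if operation not in self.local_policy or operation not in capability.operations:
            return "denied: operation not covered by capability"
        return f"ok: {operation} on {self.name} for {capability.user}"
```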
INCITE: Edge-based Traffic Processing and Inference for High-Performance Networks
Richard Baraniuk, Rice University; Les Cottrell, SLAC; Wu-chun Feng, LANL
High-Performance Network Research SciDAC Project

Novel Ideas
• Multiscale/multifractal analysis for traffic bursts
• Efficient "packet chirp" and "fat boy" path probing
• Active and passive network tomography
• Monitor for Application-Generated Network Traffic (MAGNeT)
• Traffic Information Collecting Kernel with Exact Timing (TICKET)
• Augmented PingER

INCITE Summary
• Task 1: Multiscale traffic analysis and modeling
• Task 2: Inference algorithms for network paths and links
• Task 3: Network tomography
• Task 4: Active network measurement: PingER
• Task 5: Passive network measurement: MAGNeT, TICKET
• Task 6: Passive path monitoring and tomography toolkit

Impact and Connections
• IMPACT:
  - Optimize performance of demanding applications such as remote visualization and high-capacity data transfers
  - New understanding of the complex dynamics of large-scale, high-speed networks
  - New edge-based tools to characterize and map network performance as a function of space, time, application, protocol, and service
• CONNECTIONS:
  - Rice/SLAC/LANL synergy, SciDAC

Milestones
• Analysis, modeling, and inference
  - Multifractal, wavelet, and tomography theory           ongoing
  - Traffic analysis toolbox                               12/02
  - Passive path inference and tomography algorithms       10/03
• PingER
  - Add tomography, chirping, fat boy                      04/02
  - Port extended PingER to Rice/LANL                      10/02
  - Add new inference algorithms to PingER-NG              06/03
  - Evaluate, port PingER-NG to GIMI/NMF                   04/04
• MAGNeT/TICKET
  - MAGNeT, TICKET (alpha distribution)                    10/02
  - High-speed, high-utilization traffic traces            09/02
  - MAGNeT (public availability)                           06/03

incite.rice.edu
MICS Program Manager: Thomas Ndousse
Date Prepared: 10 Jan 02
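A small, hedged sketch of the multiscale-analysis idea behind Task 1: compute Haar-wavelet energies of a traffic count series at several dyadic scales and examine how log-energy grows with scale (the "logscale diagram"). The synthetic Poisson trace and the slope interpretation are my own illustration, not the INCITE toolbox.

```python
# Haar-wavelet energies across scales for a packet/byte count series: roughly
# linear growth of log2(energy) with scale is the signature of the scaling and
# burstiness that multifractal traffic models describe; a flat curve would
# indicate uncorrelated (Poisson-like) traffic.
import numpy as np

def haar_scale_energies(counts, num_scales=8):
    """Return mean squared Haar detail coefficients at scales 1..num_scales."""
    x = np.asarray(counts, dtype=float)
    energies = []
    for _ in range(num_scales):
        if len(x) < 2:
            break
        pairs = x[: len(x) // 2 * 2].reshape(-1, 2)
        details = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)   # Haar detail
        x = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)         # Haar approximation
        energies.append(np.mean(details ** 2))
    return np.array(energies)

# Example: bytes observed per 1 ms bin on an edge link (synthetic here).
counts = np.random.poisson(50, size=2 ** 14)
log_energy = np.log2(haar_scale_energies(counts))
slope = np.polyfit(np.arange(1, len(log_energy) + 1), log_energy, 1)[0]
print(f"logscale-diagram slope ~ {slope:.2f} (related to the scaling exponent)")
```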
Logistical Networking: Developing a communicative infrastructure with persistence
PIs: Micah Beck, Jack Dongarra, James S. Plank / Tennessee; Rich Wolski / UCSB
High-Performance Network Research SciDAC Project

Novel Ideas
• Storage is too cheap to hoard.
• Storage can be a scalably shared network resource.
• Logistical Networking gives applications and middleware uniform control over buffering and routing of data.
• Data storage and data transport can be viewed as points on a spectrum of data management mechanisms.
• Monitoring and prediction can replace reservation as a means of scheduling storage resources.
• End-to-end networking principles can apply to storage.

Tasks
• Develop/deploy network storage depots                 6-12 mos
• Develop layered storage stack and tools               12 mos
• Develop/validate scheduling techniques                12-18 mos
• Optimize application performance                      18-36 mos

Impact and Connections
• IMPACT:
  - Improved performance and scalability of data-intensive distributed applications
  - Greater ease and lower cost of deployment of new wide-area data management strategies
  - Dramatically improved flexibility in data-intensive collaboration
• CONNECTIONS:
  - SciDAC: Net100, Data Grid, Scalable Systems, Data Management, Computational Science (e.g., Climate, Supernovas)
  - Base: Network Monitoring, Data Grid, Transport Protocols, Storage Resource Management, IQ-ECho

Milestones/Dates/Status
• IBP applications demonstrated at SC'01
• exNode support in NetSolve
• Reliability/performance coscheduling alpha
• Allocation policy simulation
• Initial generalized caching infrastructure
• Initial logistical overlay network on ESnet
• Wide-area logistical peering mechanisms and policies
• Resolution for highly volatile storage resources
• Experimental IBP architectures
• Large-scale measurement and simulations

loci.cs.utk.edu
MICS Program Manager: Thomas Ndousse
Date Prepared: 1/10/02
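A hypothetical sketch of the exNode idea mentioned in the milestones: like a Unix inode, but the "blocks" are leased allocations on network storage depots (IBP servers), possibly replicated. The class names, fields, and depot addresses below are invented for illustration; this is not the LoCI exNode library or the IBP client API.

```python
# Toy exNode-style mapping of a logical file onto depot allocations.
from dataclasses import dataclass, field

@dataclass
class Allocation:
    depot: str          # e.g. "depotA.example.edu:6714" (hypothetical address)
    offset: int         # byte offset within the logical file
    length: int
    expires: float      # IBP allocations are time-limited leases

@dataclass
class ExNode:
    filename: str
    size: int
    mappings: list = field(default_factory=list)   # list of Allocation

    def add_replica(self, depot, offset, length, expires):
        self.mappings.append(Allocation(depot, offset, length, expires))

    def depots_for(self, offset):
        """Return every depot holding the byte at this offset, so a reader can
        fail over or pick the fastest copy."""
        return [a.depot for a in self.mappings
                if a.offset <= offset < a.offset + a.length]

# A 2 GB dataset stored as two 1 GB extents, each replicated on two depots.
ex = ExNode("climate-run-042.nc", size=2 * 10**9)
for depot in ("depotA.example.edu:6714", "depotB.example.edu:6714"):
    ex.add_replica(depot, offset=0, length=10**9, expires=1e9)
for depot in ("depotB.example.edu:6714", "depotC.example.edu:6714"):
    ex.add_replica(depot, offset=10**9, length=10**9, expires=1e9)
print(ex.depots_for(1_500_000_000))   # -> the two depots holding the second extent
```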
Net100: Developing network-aware operating systems
PIs: Wendy Huntoon/PSC, Tom Dunigan/ORNL, Brian Tierney/LBNL
High-Performance Network Research Base Project

Net100 Novel Ideas
• Net100 will tune network-unaware applications based on recent and current link characteristics.
• Net100 will tune more than just transport buffer sizes, including:
  - TCP AIMD parameters
  - DUP (duplicate ACK) threshold
  - Delayed ACK
• Net100 will determine optimal paths and whether to use multiple streams and/or multiple paths.
• The Net100 kernel uses passive monitoring from the Web100 kernel.

Tasks
• Develop/deploy network probes and sensors
• Develop a network metrics database
• Develop transport protocol optimizations
• Develop a network-tuning daemon

Impact and Connections
• IMPACT:
  - Increase throughput of bulk transfers over high-delay, high-bandwidth networks (like DOE's ESnet)
  - Select optimal paths and transport parameters for distributed (Grid) applications (e.g., GridFTP)
  - Provide a network performance database from active and passive monitoring
• CONNECTIONS:
  - SciDAC: Astrophysics, Bandwidth Estimation, Data Grid, INCITE, Logistical Networking
  - Base: Network Monitoring, Data Grid, Transport Protocols

Milestones/Dates/Status                                  Mon/Yr   DONE
• Network probes and sensors
  - Initial sensor and tool deployment                   12/01    12/01
  - Database design                                      4/02
  - Initial database implementation                      9/02
  - Final sensor/database                                6/03
• Transport protocol optimizations
  - Protocol analysis                                    11/02
  - Initial tuning daemon                                3/02
  - Bulk transfer tuning demos                           8/02
  - Final tuning daemon                                  6/03
• Multipath support
  - Analytical analysis                                  8/02
  - Proof-of-principle routing daemons                   12/02
  - Grid application demos                               4/03

www.net100.org
MICS Program Manager: Thomas Ndousse
Date Prepared: 1/7/02
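A hedged illustration of the tuning idea above, not the actual Net100 daemon: pick a TCP buffer size from a measured bandwidth-delay product and apply it to a socket before connecting. Net100 would take the path characteristics from its sensor database and Web100 kernel instrumentation; here they are plain function arguments.

```python
# Buffer sizing from bandwidth-delay product, applied to a socket.
import socket

def buffer_size_for_path(bandwidth_bps, rtt_seconds, cap_bytes=16 * 2**20):
    """Bandwidth-delay product in bytes, clamped to a sane range."""
    bdp = int(bandwidth_bps / 8 * rtt_seconds)
    return min(max(bdp, 64 * 1024), cap_bytes)

def tuned_socket(bandwidth_bps, rtt_seconds):
    size = buffer_size_for_path(bandwidth_bps, rtt_seconds)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set before connect() so the TCP window scale can be negotiated accordingly.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return s

# Example: a 622 Mb/s path with 70 ms RTT needs roughly a 5.4 MB window,
# far above the common 64 KB default of the era.
print(buffer_size_for_path(622e6, 0.070))
```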
Self-Configuring Network Monitor (SCNM): Developing a distributed passive network monitoring system
PIs: Brian Tierney/LBNL and Deb Agarwal/LBNL
High-Performance Network Research Base Project

Novel Ideas
• A secure monitoring infrastructure that applications can use to monitor the performance of their own data streams
• Passive: introduces traffic only in the form of monitoring data and requests for monitoring

Tasks Involved
• Develop a monitor activation mechanism
• Develop monitor software and hardware
• Develop data collection and display capabilities
• Deploy monitors
• Work with applications

Impact and Connections
• IMPACT:
  - Build a monitoring infrastructure that will aid in debugging distributed-application communication and support both active and passive monitoring
• CONNECTIONS:
  - SciDAC: Net100, DOE Science Grid, Astrophysics, Bandwidth Estimation, Data Grid, INCITE
  - Base: Network Monitoring, Data Grid, Transport Protocols
• URL: www-itg.lbl.gov/Net-Mon/Self-Config.html

Milestones/Dates/Status                                       Year
• Monitor daemon
  - Design base passive monitor daemon                        1
  - Activation mechanism integration                          1
  - Improvements to network drivers                           1
  - Improvements and enhancements to sensor mechanism         2 & 3
• Activation mechanisms
  - Design basic activation mechanism                         1
  - Develop and deploy full activation capabilities           2 & 3
• Results handling infrastructure
  - tcpdump viewing capabilities                              1
  - Develop improved data viewing capabilities                2 & 3
• Deployment of monitors
  - Deployment to initial ESnet sites (gig-E)                 1 - 3
  - Work with applications                                    2 & 3
  - Additional ESnet sites                                    2 & 3

MICS Program Manager: Thomas Ndousse
Date Prepared: 1/7/02
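A speculative sketch of the "monitor activation" idea: the application describes the flow it wants captured (a 5-tuple plus duration) and sends that request toward the far end so that monitors along the path can observe it and start capturing just that flow. The message format, JSON encoding, port number, and addresses below are all invented for illustration; the real SCNM activation protocol is defined by the project.

```python
# Hypothetical activation request sent along the data path.
import json, socket, time

ACTIVATION_PORT = 9999   # hypothetical port that in-path monitors watch for

def request_monitoring(src, dst, dst_port, duration_s, proto="tcp"):
    request = {
        "flow": {"src": src, "dst": dst, "dport": dst_port, "proto": proto},
        "duration": duration_s,
        "issued": time.time(),
    }
    payload = json.dumps(request).encode()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Addressed to the destination host so the packet traverses the same path
    # as the data stream and can be seen by monitors sitting on that path.
    sock.sendto(payload, (dst, ACTIVATION_PORT))
    sock.close()
    return request

# Ask for 60 seconds of capture of a bulk transfer to a (hypothetical) data server.
request_monitoring("198.51.100.10", "203.0.113.20", 2811, duration_s=60)
```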
High-Performance Transport Protocols
PI: Wu-chun (Wu) Feng, Los Alamos National Laboratory and The Ohio State University
High-Performance Network Research Base Project

The Novel Ideas
• Goal: significantly improve network performance in support of all computing environments, particularly grids and the NGI.
• TCP/IP: make the network fast but TCP-friendly.
  - Eliminate TCP's flow-control bottleneck by automatically tuning buffer sizes.
• RAPID: make the network more adaptable.
  - Smooth QoS support over a best-effort network.
  - User-settable reliability, providing a spectrum of QoS from unreliable UDP to reliable TCP.
• Dynamic Right-Sizing: TCP flow-control adaptation for grids and the Next-Generation Internet.
  - Automatically enhance network performance over the WAN by as much as an order of magnitude while abiding by TCP semantics.
• RAPID: Rate-Adjusting Protocol for Internet Delivery.
  - Provide smoother QoS support over the best-effort Internet for grids and the NGI while minimizing the need for widespread deployment of DiffServ or IntServ.

Impact and Connections
• IMPACT:
  - Dynamic Right-Sizing
    - Auto-tuned, order-of-magnitude increase in throughput
    - Vendor adoption, e.g., IRIX, Linux (still in the works)
    - Potential integration into GridFTP, Web100, Net100
  - RAPID
    - Sliding reliability semantics may result in adoption of RAPID by the LANL large-data visualization team
• CONNECTIONS:
  - Dynamic Right-Sizing: Web100, Net100, DOE Science Grid, Particle Physics Data Grid, Earth System Grid II
  - RAPID: the LANL large-data visualization team, previously sponsored by the DOE NGI Corridor One project. Others?

Milestones/Dates/Status                                     Mon/Yr   DONE
• Simulation: flow-control adaptation with Dynamic Right-Sizing
  - Protocol analysis & design (ns-2)                       12/01    12/01
  - Protocol testing & evaluation (rudimentary)             03/02    beta testing
• Implementation: flow-control adaptation with Dynamic Right-Sizing
  - Kernel space, Linux 2.4.x                               07/02    beta testing
  - User space, drsFTP                                      01/03    alpha testing
  - Protocol testing & evaluation (rudimentary)             03/03
  - Potential integration with GridFTP                      04/03
  - Deployment (kernel- and user-space)                     07/03
• Simulation: RAPID
  - Effect of packet spacing                                03/02    preliminaries
  - Definition of API to middleware                         03/02    preliminaries
  - Sliding reliability                                     07/03

MICS Program Manager: Thomas Ndousse-Fetter
January 16, 2002
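A minimal, hedged sketch in the spirit of the user-space Dynamic Right-Sizing work (not the drsFTP or kernel code): the receiver infers the sender's rate from what arrives per RTT and grows its receive buffer so the advertised window never becomes the bottleneck. The fixed RTT estimate, 64 KB starting buffer, and 2x headroom factor are assumptions for illustration.

```python
# Receiver-side buffer auto-tuning driven by observed per-RTT throughput.
import socket, time

def receive_with_right_sizing(sock, rtt_estimate_s=0.07, max_buf=16 * 2**20):
    """Read from an already-connected TCP socket, enlarging SO_RCVBUF whenever
    the amount received in one RTT suggests the window is about to limit the
    sender. Returns the total number of bytes received."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 64 * 1024)
    total, window_bytes, window_start = 0, 0, time.time()
    while True:
        chunk = sock.recv(64 * 1024)
        if not chunk:
            return total
        total += len(chunk)
        window_bytes += len(chunk)
        if time.time() - window_start >= rtt_estimate_s:
            # Keep the buffer at ~2x the bandwidth seen this RTT so the
            # sender's window has room to keep growing.
            needed = min(2 * window_bytes, max_buf)
            current = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
            if needed > current:
                sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, needed)
            window_bytes, window_start = 0, time.time()
```

Kernel-space DRS does the equivalent inside TCP itself, where resizing the advertised window mid-connection is more effective than a user-space buffer change.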
IQ-ECho: Interactive Quality of Service Across Heterogeneous Hardware/Software
PIs: Schwan, Ahamad, Eisenhauer, Yalamanchili, Georgia Institute of Technology
High-Performance Network Research Base Project

IQ-ECho Novel Ideas
• Integrated QoS management through quality attributes
• Dynamic code generation relocates application-level functionality to the most appropriate location
• Configurable protocols and kernel-level monitoring provide the system-level support required for online quality management
• Vertical programming allows extending platforms while programming applications
• Represent information flows as event streams in the event-based IQ-ECho middleware
• Use dynamic code generation to migrate application-level filtering/data processing to appropriate network locations
• Use network-level feedback to drive application-level quality-of-service adaptations
• http://www.cc.gatech.edu/systems/projects/IQECho

Impact and Connections
• IQ-ECho IMPACT:
  - Enable network-aware, adaptable applications
  - Cross-layer information exchanges will make effective runtime tradeoffs in quality vs. performance across the protocol, middleware, and application levels
  - Enable the creation of efficient and adaptable Grid data services
• CONNECTIONS:
  - Remote visualization (Supernova Visualization), source-based filtering (Oak Ridge), program monitoring and steering
  - Extensible cluster platforms (NSF, DOE)
  - Remote sensing, monitoring, and security (DARPA, NSF)

Milestones/Dates/Status                                      Mon/Yr   DONE
• Year 1
  - Performance attributes in ECho middleware                4/02
  - Select and implement sample application                  6/02
  - Create instrumentation for performance attributes        8/02
• Year 2
  - Evaluate and tune middleware                             3/03
  - Enable application for adaptation                        3/03
  - Extend/create configurable network protocols             6/03
• Year 3
  - Integrate IQ-ECho with Access Grid software              3/04
  - Demonstrate benefits in an Access Grid environment       6/04

MICS Program Manager: Thomas Ndousse
Date Prepared: 1/10/02
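An illustration only, not the ECho API (which is a C library): an event channel that carries quality attributes with each event and applies a relocatable filter that degrades data quality when network feedback reports tight bandwidth. In IQ-ECho the filter would be generated dynamically and pushed to the most appropriate host; here it is just a Python callable, and the attribute names and 100 Mb/s threshold are assumptions.

```python
# Event channel with quality attributes and a feedback-driven filter.
class EventChannel:
    def __init__(self):
        self.subscribers = []       # list of (filter, callback)

    def subscribe(self, callback, event_filter=None):
        self.subscribers.append((event_filter, callback))

    def submit(self, event, attributes):
        for event_filter, callback in self.subscribers:
            out = event_filter(event, attributes) if event_filter else event
            if out is not None:
                callback(out, attributes)

def downsample_when_constrained(event, attributes):
    """Quality/performance tradeoff: halve the data volume when the
    network-level feedback attribute reports low available bandwidth."""
    if attributes.get("avail_bw_mbps", 1000) < 100:
        return event[::2]           # e.g. drop every other sample of a frame
    return event

channel = EventChannel()
channel.subscribe(lambda ev, attrs: print(len(ev), attrs), downsample_when_constrained)
channel.submit(list(range(1000)), {"avail_bw_mbps": 45})    # constrained path
channel.submit(list(range(1000)), {"avail_bw_mbps": 600})   # unconstrained path
```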
PingER: Active end-to-end performance monitoring for the Research and Education communities
PI: Les Cottrell, SLAC
High-Performance Network Research Base Project

PingER Novel Ideas
• Low-impact network performance measurements to most of the Internet-connected world, providing delay, loss, and connectivity information over long time periods
• Network AND application high-throughput performance measurements, allowing comparisons and identification of bottlenecks
• Continuous, robust measurement, analysis, and web-based reporting of results, available worldwide
• Simple infrastructure enabling rapid deployment, location within an application host, and local site management to avoid security issues

Tasks
• Develop/deploy a simple, robust, ssh-based active end-to-end measurement and management infrastructure
• Develop analysis/reporting tools
• Integrate new application and network measurement tools into the infrastructure
• Compare and validate various tools, and determine regions of applicability

Impact and Connections
• IMPACT:
  - Increase network and Grid application bulk throughput over high-delay, high-bandwidth networks (like DOE's ESnet)
  - Provide troubleshooting information for networkers and users by identifying the onset and magnitude of performance changes, and whether they appear in the application or the network
  - Provide a network performance database, analysis, and navigable reports from active monitoring
• CONNECTIONS:
  - SciDAC: High Energy Nuclear Physics, Bandwidth Estimation, Data Grid, INCITE
  - Base: Network Monitoring, Data Grid, Transport Protocols

Milestones/Dates/Status                                   Mon/Yr   DONE
• Infrastructure development
  - Develop simple window tuning tool                     08/01    08/01
  - Initial infrastructure developed                      12/01    12/01
  - Infrastructure installed at one site                  01/02    01/02
  - Improve and extend infrastructure                     06/02
  - Deploy at a 2nd site                                  08/02
  - Evaluate GIMI/DMF alternatives                        10/02
  - Extend deployment to PPDG sites                       03/03
• Develop analysis/reporting tools
  - First version for standard apps                       02/02
• Integrate new application & network tools
  - GridFTP and demo                                      05/05
  - INCITE tools                                          08/02
  - BW measurement tools (e.g., pathload)                 01/03
• Compare & validate tools
  - GridFTP                                               09/02
  - BW tools                                              04/03

www-iepm.slac.stanford.edu
MICS Program Manager: Thomas Ndousse
Date Prepared: 1/7/02
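A hedged sketch of a PingER-style low-impact probe: a handful of pings per remote host per measurement interval, summarized into the loss and RTT statistics used for long-term trending. It assumes a Linux-style ping(8) that accepts -c (count) and -W (reply timeout in seconds); the real PingER infrastructure has its own probing, archiving, and reporting code.

```python
# Low-impact active probe: N pings, summarized into loss and RTT statistics.
import re, subprocess

def probe(host, count=10):
    rtts, lost = [], 0
    for _ in range(count):
        result = subprocess.run(["ping", "-c", "1", "-W", "1", host],
                                capture_output=True, text=True)
        match = re.search(r"time=([\d.]+) ms", result.stdout)
        if result.returncode == 0 and match:
            rtts.append(float(match.group(1)))
        else:
            lost += 1
    summary = {"host": host, "sent": count, "loss_pct": 100.0 * lost / count}
    if rtts:
        summary.update(min_rtt=min(rtts), avg_rtt=sum(rtts) / len(rtts),
                       max_rtt=max(rtts))
    return summary

print(probe("www.slac.stanford.edu", count=5))
```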
Stability Modeling and Control of Transport Protocols for High-Speed Data Grids
Nageswara S. Rao, Oak Ridge National Laboratory
High-Performance Network Research Base Project

The Novel Ideas
• Detailed analysis of transport dynamics using non-linear control and chaos theory; showed that TCP generates "complicated" phase-space attractors
• Developed the concept of grid network instruments that perform measurement and traffic engineering using lightweight in-situ modules; analytically showed their performance optimality
• Novel transport control methods for end-to-end control:
  - High throughput using concurrent window and graded control
  - Controlled dynamics using multiple-throttle methods

Summary: Understand and Control the End-to-End Transport Dynamics of High-Speed Grids
• Detailed analysis of transport processes
  - Rigorous treatment using non-linear control and chaos theory
• Develop provably effective transport methods for:
  - High throughput
  - End-to-end dynamics control
• Implement and test in grid environments

Impact and Connections
• IMPACT:
  - Provides controlled end-to-end dynamics for grids over wide-area networks, a significant step beyond the state of the art
  - Fundamentally new classes of transport methods based on sound analysis and experimentation; inexpensive and easy to use
  - Provides the needed quality of service for control over wide-area networks for data and instrument grids
• CONNECTIONS:
  - Net100 project: will use the proposed instruments and will provide certain measurement modules
  - The Terascale Supernova Initiative can significantly benefit from the proposed control methods; we are in communication

Milestones/Dates/Status
• Detailed rigorous analysis:
  - Attractor analysis                                  Feb 02 / Feb 03
  - Conditions of chaos                                 Apr 02 / Apr 03
• Grid network instrumentation design:
  - Sufficiency proofs of measurements                  Mar 02 / Mar 03
  - Detailed module design                              Jun 02
• Proof-of-concept implementations:
  - High throughput                                     Jul 02
  - Bounded higher-order delay moments                  Aug 02 / Sept 03
• Application and testing:
  - Identification of representative problem            Feb 03
  - Performance study                                   Sept 03

MICS Program Manager: Thomas D. Ndousse
Date Prepared: 01/09/02
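A rough, hedged illustration of the chaos-theoretic analysis mentioned above: reconstruct a phase-space trajectory from a congestion-window time series via time-delay (Takens-style) embedding, the standard first step before examining attractor structure. The cwnd trace below is synthetic (an AIMD sawtooth with random loss events) and the embedding parameters are arbitrary; the project's actual analysis is far more detailed.

```python
# Time-delay embedding of a (synthetic) congestion-window trace.
import numpy as np

def delay_embed(series, dim=3, lag=5):
    """Return the delay-embedded trajectory: rows are the points
    (x[t], x[t+lag], ..., x[t+(dim-1)*lag]) in the reconstructed phase space."""
    x = np.asarray(series, dtype=float)
    n = len(x) - (dim - 1) * lag
    return np.column_stack([x[i * lag : i * lag + n] for i in range(dim)])

# Synthetic AIMD-like cwnd trace: +1 per RTT, halved on a random "loss".
rng = np.random.default_rng(0)
cwnd, trace = 10.0, []
for _ in range(2000):
    cwnd = cwnd / 2 if rng.random() < 0.02 else cwnd + 1
    trace.append(cwnd)

trajectory = delay_embed(trace, dim=3, lag=5)
print(trajectory.shape)   # (1990, 3) points tracing the reconstructed attractor
```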
Pushing the Network Simulation Envelope
SSFnet: Creating a terascale network simulator that can model SciDAC applications
W. R. Wing, Oak Ridge National Laboratory
High-Performance Network Research Base Project

SSFnet Novel Ideas
• SSFnet will be the first network simulator with verifiable instrumentation
  - We plan to include (not model) the Net100/Web100 MIB
  - Net100/Web100 MIB data will be accumulated for direct comparison
• SSFnet will be the first production-quality distributed-memory simulator
  - The Domain Modeling Language (DML) will automate decomposition
• SSFnet will be the first simulator able to tackle SciDAC-scale problems

Tasks
• Verify shared-memory SSFnet on candidate architectures
• Develop initial distributed-memory (DM) version of SSFnet
• Develop and verify instrumentation
• Develop application-level IDE
• Distribute to the DOE network research community
• Develop 2nd-generation DM scheduler and DML

Impact and Connections
• IMPACT: SSFnet will be the first network simulator able to:
  - Fully model SciDAC terascale applications
  - Allow SciDAC developers to tune their applications to evolving mixed-technology network environments
  - Allow testing/confirmation of future SciDAC-developed network protocols
• CONNECTIONS: A key element of SSFnet's verifiability is our plan to directly incorporate the Net100/Web100 MIB in the simulator. Comparison of real-life MIB measurements with the SSF-instrumented MIB will provide confirmation of SSFnet simulation fidelity. However, this does require deployment of at least some SciDAC applications on Web100/Net100 platforms.

Milestones/Dates/Status
Proposed Milestone                                     Proposed Date   Actual Date
• Verify shared-memory architectures
  - IBM, Compaq, Solaris                               Q1 FY02         Complete
• Develop initial DM scheduler                         Q3 FY02
• Develop MIB instrumentation                          Q4 FY02
• Develop application-level IDE                        Q2 FY03
• Develop 2nd-gen DML-based scheduler                  Q4 FY03
• Distribute to DOE community                          Q4 FY03

MICS Program Manager: T. Ndousse
Date Prepared: 01/08/02
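A toy, hedged illustration of the "verifiable instrumentation" idea: a simulated TCP sender that accumulates Web100-style per-connection counters. The counter names (PktsOut, CongestionSignals, and so on) are modeled on the Web100 MIB but this is not SSFnet code; the point is only that the same counters a real kernel MIB exposes can be read out of a simulator for direct comparison.

```python
# Toy AIMD sender with MIB-style counters accumulated per connection.
import random

class InstrumentedTcpSender:
    def __init__(self):
        self.cwnd = 1.0
        self.mib = {"PktsOut": 0, "CongestionSignals": 0,
                    "MaxCwnd": 0, "ElapsedRtts": 0}

    def on_rtt(self, loss_occurred):
        """Advance the toy AIMD model by one RTT and update the counters."""
        self.mib["PktsOut"] += int(self.cwnd)
        self.mib["ElapsedRtts"] += 1
        if loss_occurred:
            self.mib["CongestionSignals"] += 1
            self.cwnd = max(1.0, self.cwnd / 2)
        else:
            self.cwnd += 1
        self.mib["MaxCwnd"] = max(self.mib["MaxCwnd"], int(self.cwnd))

random.seed(1)
sender = InstrumentedTcpSender()
for _ in range(1000):
    sender.on_rtt(loss_occurred=random.random() < 0.01)
print(sender.mib)   # compare against counters read from a real Web100 kernel
```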