OptIPuter System Software. Andrew A. Chien, SAIC Chair Professor, Computer Science and Engineering, UCSD; Director, Center for Networked Systems. September 2003
OptIPuter System Software Team • Challenge • ~20 Lead Researchers, Many More in Entire Team • Diverse Researcher Backgrounds and Focus • Broad Research Agenda, Abstract Shared Perspective • Process • Innumerable Phone Calls and 1-on-1 Meetings, Fall 2002-Spring 2003 • Team Meeting with UCSD and UCI Teams (October 4, 2002) • Straw Man OptIPuter System Software Architecture (January 2003) • Goals, Context, Organization, Relationship of Efforts • OptIPuter All Hands Meeting, February 6-7, 2003 • First Presentation to Entire Team • Feedback, Revision, Improvement, Deeper Understanding, Shared Perspective • Optical Signaling and Network Management Meeting (May 22, 2003) • Mambretti Organized • OptIPuter Software Architecture Version 1.0 (July 2003) • Structure Stabilized, Interfaces Becoming Concrete
λ’s Transform Distributed Systems • Key Technology Changes • Massive Bandwidth • 100-1000x Increases Wide-Area Systems • “End To End” λ-Connections • Private Networks, Guaranteed Bandwidth • Endpoints are Parallel Clusters • Large-Scale Network-Attached • Storage • Instruments • Displays • Other Peripherals • Grids and Flexible Wide-Area Sharing • Opportunities • Communication • Tight Wide-area Resource Coupling • Simpler Distributed Applications • Proactive Computing and Communication • Challenge is Abstractions, Technologies, and Protocols (SOFTWARE!) to Deliver these Capabilities to Applications
Towards Middleware for λ-Networked Systems • Globus Architecture (Layers, Top to Bottom) • Application • Collective: DUROC, GARA, Replica Catalogs, Metadata Servers, Brokers, Workflow • Resource: GRAM, GridFTP, GRIS, Co-allocation • Connectivity: Globus_IO/XIO & GSI • Fabric: Resource Access and Control (Computers, Storage, Networks) • Leverage Investment and Capabilities (e.g. Globus 2.2 and 3.0) • Carl Kesselman, OptIPuter Participant • Ian Foster, OptIPuter Frontier Advisory Board • Explore What Must Change • New Software/Protocols for Managing Lambdas • Simplify, Deliver Higher Performance and New Capabilities
OptIPuter Software Architecture for Distributed Virtual Computers v1.1 • [Layered Diagram; Groupings: DVC/Middleware; High-Speed Transport; Optical Signaling/Mgmt] • [Diagram Labels: OptIPuter Applications; Visualization; DVC #1, DVC #2, DVC #3; Higher Level Grid Services; Security Models; Data Services: DWTP; Real-Time Objects; Layer 5: SABUL, RBUDP, Fast, GTP; Grid and Web Middleware (Globus/OGSA/WebServices/J2EE); Node Operating Systems; Layer 4: XCP; λ-configuration, Net Management; Physical Resources]
OptIPuter Links Three Major Sets of Technology Activities • Distributed Virtual Computers • Provide Simple Abstractions • Aggregate Component Technology Capabilities • Surface Novel Capabilities • High-Speed Transport Protocols [Bannister’s Talk] • Long Thread of High Bandwidth-Delay Product Network Protocols • Span The Range “Reach” For Dedicated Optical Connections • Complete Integration with IP Network Management • Hybrid – to Local Packet-Switched Networks • Separate – End-to-end • Optical Network Signaling and Management [Mambretti’s Talk] • Single Domain and Inter-Domain • Hybrid Circuit and Packet-Switched Networks • Planning and Execution
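To make the "High Bandwidth-Delay Product" bullet concrete, here is a minimal Python sketch of the underlying arithmetic. The 10 Gb/s rate, 100 ms round-trip time, and 64 KiB window are illustrative assumptions rather than figures from the talk, and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> float:
    """Bytes that must be in flight to keep a path of this rate and RTT full."""
    return bandwidth_bps * rtt_ms / 1000 / 8


def window_limited_bps(window_bytes: int, rtt_ms: int) -> float:
    """Throughput ceiling when at most one window is delivered per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms


if __name__ == "__main__":
    # Assumed example path: one 10 Gb/s lambda with a 100 ms round-trip time.
    print(bdp_bytes(10_000_000_000, 100))      # bytes in flight needed
    # Assumed example sender: a fixed 64 KiB window on the same path.
    print(window_limited_bps(64 * 1024, 100))  # achievable bits per second
```

Under these assumed numbers, about 125 MB must be in flight to fill the path, while a fixed 64 KiB window would cap throughput near 5 Mb/s. That gap is the usual motivation for protocols designed for high bandwidth-delay product networks.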
Exploiting λ’s for an Application • Network View: Ad Hoc Connections • Applications Request λ-Connections • Network Recognizes High BW Flows and Configures • System View: Enclave of Resources and Connections • a Distributed Virtual Computer (a SYSTEM) • How to Specify, Implement, and Exploit?
DVC Examples • Virtual Cluster (Hide Complexity of Grid; Resource Flexibility) • Shared Single Domain (Spans Multiple) • Private Connections; Simple Network Naming • Simple Resource Discovery and Access • Uniform Performance Characteristics • Direct Access to Everything (Storage, Displays, etc.) • Real-Time Virtual Cluster for Distributed Collaborative Visualization • Grid Resources + Real-Time (TMO) • Collaborative Visualization Cluster • Grid Resources + Photonic Multicast or LambdaRAM (Leigh) • [Diagram Sites: SDSC; UCI or UIC; SIO/NCMIR; UCSD CSE]
Realizing Distributed Virtual Computers • Research Challenges • Application-driven Definition of Abstractions • Useful Collections which Match Application Paradigms and Needs • Incorporates New Collective Models • DVC Description • Namespaces, Communication, Performance, Real-Time, … • Standard Specifications; Most Applications Parameterize • Integration Of Component Technologies • Executing the DVC on a Grid • Planner That Identifies Resources • Selects from Virtual Grid Resources • Negotiates with Resource Managers and Brokers • Executor and Monitor for DVC • Acquires and Configures • Monitors for Failures and Performance • Adapts and Reconfigures
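The "DVC Description" and "Planner" bullets above can be illustrated with a small Python sketch. Everything in it is hypothetical: the `Resource` and `DVCSpec` types, their fields, and the greedy selection rule are assumptions made for illustration, not the OptIPuter design. Negotiation with resource managers and brokers is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Resource:
    # Hypothetical view of one grid resource offered to the planner.
    name: str
    kind: str          # e.g. "compute", "storage", "display"
    site: str
    link_gbps: float   # bandwidth of its dedicated connection


@dataclass
class DVCSpec:
    # Hypothetical parameterized DVC description.
    name: str
    needs: Dict[str, int]   # resource kind -> how many are required
    min_link_gbps: float    # performance requirement on every member


def plan(spec: DVCSpec, available: List[Resource]) -> List[Resource]:
    """Pick resources that satisfy the spec, best-connected first."""
    chosen: List[Resource] = []
    for kind, count in spec.needs.items():
        candidates = [r for r in available
                      if r.kind == kind and r.link_gbps >= spec.min_link_gbps]
        candidates.sort(key=lambda r: r.link_gbps, reverse=True)
        if len(candidates) < count:
            raise LookupError(f"not enough {kind} resources for {spec.name}")
        chosen.extend(candidates[:count])
    return chosen


if __name__ == "__main__":
    pool = [
        Resource("c1", "compute", "UCSD", 10.0),
        Resource("c2", "compute", "UIC", 1.0),
        Resource("s1", "storage", "SDSC", 10.0),
        Resource("d1", "display", "UIC", 2.5),
    ]
    spec = DVCSpec("viz", {"compute": 1, "storage": 1, "display": 1}, 2.0)
    print([r.name for r in plan(spec, pool)])
```

An executor and monitor would take the planner's output, acquire and configure the chosen resources, and call the planner again when a failure or performance problem requires reconfiguration.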
Current Storage Views • Network-attached Storage (NAS) • Filesystem protocols; Integrated Access-Control and Security • Low performance; Little Aggregation and Parallelism • Grid View: High-Level Storage Federation • GridFTP (Distributed File Sharing) • GSI-based Access/Authentication • Put/Get, Third-Party Transfers, Whole File and Segments • Single-System view: Lower-level storage federation • Secure Single System View • SAN – Block Level Disk and Controller Protocols • High Performance, Efficient sharing • Research Areas • Network-Attached Secure Disk • Direct Access File Systems
We Need a Distributed Storage Solution for e-Science • Distributed Data Generators • BIRN: Distributed Data, Intensive Analysis • 100GB Data Elements; Petabyte Data Sets • Comparative and Collective Analysis across Data Elements • Visualization of Multi-Scale Data Objects
Storage Research Directions • From Performance to Performability • Manage and Exploit Multi-Latency Performance • Parallel Performance, Stability, and Isolation • Integration of Device, Network, Site Reliability Concerns • OptIPuter Storage Directions • Application-Driven Design • Needs, Performance, Device/Site/Network Flexibility, Coding and Selection • Integrate Dynamic λ’s and SAN Networks • Peering, Protocol Interfacing, Performance • Performance Robust Storage • Erasure/Other Redundancy; Large-Scale Parallelism; Statistical Approaches to Performance Isolation • Secure Shared Storage: Threshold Cryptography Approach
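As one illustration of the "Erasure/Other Redundancy" bullet, the following Python sketch uses the simplest erasure code, a single XOR parity block, to survive the loss of any one stripe. This scheme is chosen for illustration only; the slide does not say which codes the project uses. The function names are hypothetical and the blocks are assumed to be of equal length.

```python
from typing import List, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(out):
            raise ValueError("all blocks must have the same length")
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def encode(data_blocks: Sequence[bytes]) -> List[bytes]:
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripes: Sequence[bytes], missing_index: int) -> bytes:
    """Rebuild the stripe at missing_index from all the surviving stripes."""
    survivors = [s for i, s in enumerate(stripes) if i != missing_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripes = encode([b"abcd", b"efgh", b"ijkl"])   # e.g. one stripe per site
    print(recover(stripes, 1))                       # rebuilds the lost block
```

With one stripe per device or site, a reader can also use the parity to stop waiting for the slowest stripe, which is one way redundancy can trade capacity for more predictable performance.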
OptIPuter Security Considerations • OptIPuter as a Computing Platform • Information Assurance and Security Needed for Applications • Current Plan: use Globus Security Infrastructure • OptIPuter as a Research Platform • Current Efforts • Distributed Security Services (Goodrich & Tamassia) • Incremental IP Trace-Back via Packet Marking for DOS Defense (Goodrich) • Enhanced Forensic Analysis By Design (Karin & Peisert) • Planned Efforts • Minimum Round Trip Latency Control (Goodrich) • Hardening Against Attacks by Multi-Path Routing (Goodrich, Karin) • End-to-End Application and Session Security Through Dedicated Lambdas (Karin) Source: Karin, UCSD and Goodrich, UCI
Multi-Lambda Security Opportunities • Security Frequently Defined Through Three Measures: • Integrity, Confidentiality, And Reliability (“Uptime”) • Can These Measures be Enhanced by Employing Multiple Lambdas? • Can Confidentiality be Improved by Dividing the Transmission Over Multiple Lambdas? • Fundamentally or Using “Cheap” Encryption? • Can Integrity be Ensured or Reliability Improved by Exploiting Redundancy? • Source Coding and Performance • Adaptive Techniques Source: Goodrich, Karin
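The question of whether dividing a transmission over multiple lambdas can improve confidentiality "fundamentally" can be illustrated with XOR secret splitting. In the Python sketch below, a message is split into n shares, one per lambda, and any n-1 shares reveal nothing about it. This is a textbook construction used for illustration under that assumption, not a scheme attributed to the authors, and the function names are hypothetical.

```python
import secrets
from typing import List, Sequence


def split_secret(message: bytes, n: int) -> List[bytes]:
    """Split message into n shares; all n are needed to reconstruct it."""
    if n < 2:
        raise ValueError("need at least two shares")
    # n-1 shares are uniformly random pads, one per lambda.
    pads = [secrets.token_bytes(len(message)) for _ in range(n - 1)]
    last = bytearray(message)
    for pad in pads:
        for i, byte in enumerate(pad):
            last[i] ^= byte
    # The final share is the message XORed with every pad.
    return pads + [bytes(last)]


def combine(shares: Sequence[bytes]) -> bytes:
    """XOR all shares together to recover the original message."""
    out = bytearray(len(shares[0]))
    for share in shares:
        for i, byte in enumerate(share):
            out[i] ^= byte
    return bytes(out)


if __name__ == "__main__":
    shares = split_secret(b"e-Science payload", 4)   # one share per lambda
    print(combine(shares))
```

The cost is visible in the sketch: n lambdas carry n times the traffic, and losing any one share loses the message. This is why the redundancy and source-coding questions on the slide are coupled to the confidentiality question.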
Vision – Real-Time Tightly Coupled Wide-Area Distributed Computing • Goals • High-precision Timings of Critical Actions • Tight Bounds on Response Times • Ease of Programming • High-Level Prog • Top-Down Design • Ease of Timing Analysis • [Figure: Real-Time Object Network; Dynamically Formed Distributed Virtual Computer] • Source: Kim, UCI
Real-Time: from LAN to WAN • Time-Triggered Message-Triggered Object (TMO) Middleware Subsystem Model that can be Easily Implemented on Both Windows and Linux Platforms • Developed a Global Time-Based Coordination for use in Fair and Efficient Distributed On-Line Game Systems and LAN Feasibility Demonstration • a Step towards Distributed OptIPuter Environment Demonstration • Paper will be Presented at IDPT 2003 Conference, December 2003 • No thread, No priority • High-level Programming Style • [Figure: Components of a C++ Object: var; TT Method 1 (AAC); TT Method 2 (AAC); Deadlines; Service Method 1; Service Method 2] • Source: Kim, UCI
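The "No thread, No priority" style can be sketched as follows: the programmer declares when each time-triggered method starts, how often it repeats, and its deadline, and a runtime derives the activations on a shared global timeline. This Python sketch is a simulation under assumed names and fields; it is not the TMO API, and execution times are supplied by the caller rather than measured.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TimeTriggeredMethod:
    # Hypothetical declaration of one time-triggered method.
    name: str
    start_ms: int                 # first activation on the global clock
    period_ms: int                # spacing between activations
    deadline_ms: int              # allowed time from activation to completion
    body: Callable[[int], int]    # given activation time, returns simulated run time (ms)


def run_timeline(methods: List[TimeTriggeredMethod],
                 until_ms: int) -> List[Tuple[int, str, bool]]:
    """Return (activation time, method name, deadline met) in time order."""
    log = []
    for m in methods:
        for t in range(m.start_ms, until_ms, m.period_ms):
            elapsed = m.body(t)
            log.append((t, m.name, elapsed <= m.deadline_ms))
    return sorted(log)


if __name__ == "__main__":
    methods = [
        TimeTriggeredMethod("sample", 0, 20, 5, lambda t: 3),
        TimeTriggeredMethod("render", 10, 30, 8, lambda t: 12 if t == 40 else 4),
    ]
    for entry in run_timeline(methods, 60):
        print(entry)
```

Because activation times and deadlines are declared as data, a timing analysis can inspect them directly without reasoning about thread priorities.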
TMO and OptIPuter Software • TMO will be Integrated into the Overall OptIPuter Software Architecture • Begin Design of TMO Programming Framework for the OptIPuter • Prototype Implementation of TMO Support on Linux Platforms, Including OptIPuter Visualization Cluster (UIC – Leigh, UCI – Jenks) • An API Wrapping the Services of the RT Middleware Enables High-Level RT Programming Without a New Compiler • [Figure: Two Nodes Exchanging data; Labels per Node: Middleware; FT Support; TMOSM; Kernel; Lambda mux / demux; Example Messages: "Let us start a chorus at 2pm", "e-Science"] • Source: Kim, UCI
Prophesy: Application Performance Modeling • [Figure Labels: Web-based GUI; Profiling & Instrumentation; Template Database; Model Builder; Performance Database; Actual Execution; Symbolic Predictor; Systems Database; Data Collection; Data Analysis; Databases] • Performance Modeling of Applications on OptIPuter • Cross Platform Comparison (vs. Traditional Grid & Parallel) • Yr1: Completed Data Analysis Module • Yr2: Work with Applications and High Speed Transport Protocols • Target Applications Include: • SIO Geophysical Data Visualization • NCMIR/BIRN Neuroscience Applications • Source: Taylor, TAMU
Summary • OptIPuter System Software Team Organization • Development of a Concrete, Shared Perspective • Organization into Tightly-Coupled Teams • OptIPuter Software Architecture 1.0 (July 2003) • Provides Focus on Key Problems, Clusters Related Activities • Framework for Integrating Diverse Capabilities, Identifying Gaps, Integrating and Delivering Solutions • Research Activity Clusters • Distributed Virtual Computers • Including Real-Time, Security, Storage, Performance Modeling • High Speed Transport Protocols • Optical Signaling and Network Management