OptIPuter Software Research and Architecture
Andrew A. Chien, Computer Science and Engineering, University of California, San Diego
OptIPuter Workshop, February 6-7, 2003
OptIPuter Software Research
• Key driving technology changes
  • Advent of massive bandwidth: orders-of-magnitude increases in both the local area and the wide area for wired systems
  • Lambda-programmed “end to end” connections that can be used as private networks and can provide guaranteed bandwidth
  • Endpoint machines that cannot terminate more than a single lambda, due to performance scaling
  • Large-scale network-attached storage, instruments, displays, and other peripherals
  • Grids and flexible wide-area sharing
• Key research areas suggest opportunities for new capabilities in
  • High-performance communication/data movement (bandwidth, time to bandwidth)
  • Tight coupling of data/storage with computing, visualization, and other devices across the wide area
  • Proactive use of communication, data, and compute resources to enhance applications
*** OptIPuter Software ***
Network Impact of Lambdas
• Optical “circuit switching” with DWDM
• Bandwidth: more from the same fiber infrastructure
• Dedicated: controllable latency, low jitter, predictable bandwidth
• Private: security, data integrity
• Avoid routing (cost, variable latency)
Exploiting λs for an Application
• Applications request λ-connections
• Networks/endpoints automatically recognize high-bandwidth flows and allocate/configure transparently
• Ad hoc point-to-point connections
A System View
• Patch panel computers? Array processors? Systolic processors?
• Connections form a virtual system abstraction
• How do we think of the computing elements and network connected together as a SYSTEM?
• Based on λ connections, what are the potential capabilities?
• => Scenarios for composition into a virtual computer
Scenario #1: Dynamically Formed Virtual Computer (VC)
• Dynamic Virtual Computer (DVC)
  • User (any entity or collection of entities) forms on-demand
  • Dynamic configuration of λ-network and binding of resources
• Possibilities
  • Centralized control/management of resources in virtual computer
  • Novel security properties for distributed resources
  • Novel performance properties for distributed resources
Scenario #2
• Pseudo-Static Virtual Computer (PSVC)
  • Administrator(s) cooperate to form PSVC configuration
  • Users (or any entity) can instantiate PSVC on-demand
  • Slower configuration of λ-network and binding of resources
• Possibilities
  • Centralized control/management of resources in virtual computer
  • Novel security properties for distributed resources
  • Novel performance properties for distributed resources
[Figure: Pseudo-static Configuration]
Scenario #3 ??
• Some devices can’t run at λ speeds; should they be left out?
  • Storage, instruments (microscopes?), frame buffers, “legacy devices”
• Enabling “slower” devices to participate in a virtual computer
  • Extend the capabilities of λs through traditional networks to these devices (or share λ connections)
  • “Direct access” to shared devices
• Preserve unique λ-capabilities
  • Dedicated: controllable latency, low jitter, predictable bandwidth
  • Private: security, data integrity
OptIPuter Software Research
• Near Term Goals and Activities
  • Define Testbeds and Support Use
    • Standard OptIPuter node and on-ramp network infrastructure
    • Define scope of testbed experiments and stability
  • Distributed Configuration Management for OptIPuter Systems (nodes, networks)
  • Control Plane Software for DWDM Management and Dynamic Setup
  • High Speed IP-based Protocols (RBUDP, SABUL, hsTCP, …)
  • Jumpstart application “rethinking” for λ-enabled environments
    • Computer science and application teams intimate with OptIPuter potentials and application needs
Long Term Goals
• System Models
  • Novel system mechanisms and abstractions; exploit/expose unique λ-capabilities
• Component Technologies
  • Communication
  • Security Models
  • Data Abstractions
  • Real-time Objects
  • λ-configuration management
  • Virtual Computer configuration management
• Technical foundation for widespread use
  • E.g., new capabilities, new models, radical new applications
  • Enable the driving applications (and many others)
  • Make easy, high-leverage use of OptIPuter capabilities
  • Demonstrate models for next-generation Distributed E-science
Component Technologies
• Communication protocols that deliver novel capabilities and make λ-based networks easy to use
  • Bandwidth, latency, parallel stripes
• Security models
  • Leverage λ-capabilities and support virtual computer models
  • Low-overhead integration of resources into virtual computer models and delivery of performance
• Proactive data placement, movement, and management supports new capabilities
  • Expend (“waste”) communication resources to enhance applications
  • Intelligently replicate and migrate data
  • Proactive optimization
• Real-time Virtual Computers for distributed applications
  • Ease programming, performance modeling
  • Enable novel applications
• Virtual Computer configuration and management
  • Integrates control plane management into resource management
OptIPuter Communication Challenges
• Terminating a Terabit Link in an Application System
  • --> Not a Router
  • Parallel Termination with Commodity Components
  • N 10GigE Links -> N Clustered Machines (Low Cost)
  • Community-Based Communication
• What Are:
  • Efficient Protocols to Move Data in Local, Metropolitan, Wide Area?
    • High Bandwidth, Low Startup, “Time to Bandwidth”
    • Dedicated Channels, Shared Endpoints
  • Good Parallel Abstractions for Communication?
    • Coordinate Management and Use of Endpoints and Channels
    • Convenient for Application, Storage System
  • Secure Communication Models for “Single System View”
    • Enabled by “Lambda” Private Channels
    • Exploit Flexible Dispersion of Data and Computation
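The “Time to Bandwidth” concern can be made concrete with a rough estimate of how long a window-based protocol needs before it fills a long, fat pipe. The sketch below is a textbook idealization rather than a model of any protocol named in these slides: it assumes the sending window doubles once per round trip and that there is no loss. The function name, the 1460-byte segment size, and the 100 ms round-trip time are illustrative assumptions.

```python
import math


def slow_start_time_to_bandwidth(link_gbps, rtt_s, mss_bytes=1460,
                                 initial_window_segments=1):
    """Idealized round trips (and seconds) for a doubling window to fill the pipe.

    Assumptions (not from the slides): the window doubles every RTT,
    no loss occurs, and the target is one bandwidth-delay product.
    """
    # Bandwidth-delay product: bytes that must be in flight to fill the link.
    bdp_bytes = link_gbps * 1e9 * rtt_s / 8
    bdp_segments = bdp_bytes / mss_bytes
    if bdp_segments <= initial_window_segments:
        return 0, 0.0
    # Number of doublings needed to grow from the initial window to the BDP.
    rounds = math.ceil(math.log2(bdp_segments / initial_window_segments))
    return rounds, rounds * rtt_s


if __name__ == "__main__":
    rounds, seconds = slow_start_time_to_bandwidth(link_gbps=10, rtt_s=0.1)
    print(f"{rounds} round trips, about {seconds:.1f} s before the link is full")
```

Under these assumptions a 10 Gbps path with a 100 ms round trip needs 17 round trips, about 1.7 seconds, before it runs at full rate. That startup cost is large compared with sub-second transfer targets, which is one way to read the emphasis on low startup.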
Communication Challenges (Example)
• Communicate FAST (Quick)
  • How to scale to a Terabit and sustain it
  • Parallel endpoints
  • TCP and alternatives
    • Psockets, SABUL 2.1, RBUDP 0.1, hsTCP, XCP
  • Bandwidth; Latency
  • Lightweight bypass protocols
    • FM, AM, BIP, Hamlyn, ST
• Communicate FAIR
  • How to share resources (contention at the endpoints, if not in the network)
  • Coexistence compatibility; robustness of application performance
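The “parallel endpoints” idea can be illustrated with a minimal striping sketch: a payload is cut into fixed-size blocks that are dealt round-robin to N endpoints, then reassembled in order at the receiver. This is only an illustration of the general technique, not the scheme used by any of the protocols listed above; the function names and the block size are hypothetical.

```python
def stripe(data: bytes, endpoints: int, block_size: int = 4) -> list:
    """Deal fixed-size blocks of `data` round-robin across `endpoints` stripes."""
    if endpoints < 1 or block_size < 1:
        raise ValueError("endpoints and block_size must be positive")
    stripes = [[] for _ in range(endpoints)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        stripes[block_index % endpoints].append(data[offset:offset + block_size])
    return stripes


def reassemble(stripes: list) -> bytes:
    """Rebuild the payload by reading one block from each stripe in turn."""
    out = []
    longest = max(len(s) for s in stripes)
    for round_index in range(longest):
        for s in stripes:
            # Later stripes may hold one block fewer than earlier ones.
            if round_index < len(s):
                out.append(s[round_index])
    return b"".join(out)


if __name__ == "__main__":
    payload = bytes(range(23))
    parts = stripe(payload, endpoints=4)
    assert reassemble(parts) == payload
    print([sum(len(b) for b in s) for s in parts])  # bytes carried per endpoint
```

In a real system each stripe would travel over its own link and machine, so ordering, loss recovery, and uneven endpoint speed become the hard parts; the sketch leaves all of that out.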
OptIPuter Storage Challenges
• DWDM Enables Uniform Performance View of Storage
  • How to Exploit Capability?
  • Other Challenges Remain: Security, Coherence, Parallelism
  • “Storage Is a Network Device”
• Storage Federation: Grid View (High-Level) vs Single-System (Low-Level)
  • Grid: GridFTP, NAS, w/ Access Control and Security in Protocol (Performance Challenges)
  • Single system: Secure Single System View, SAN direct access (Security Challenges)
  • Tradeoffs: Performance, Security, and Access Control
• Plentiful Bandwidth Enables Proactive Data Management
  • “Waste” storage, bandwidth, and computation to empower applications
  • Drive via models, speculation, application hints, replication and data movement
Storage Challenges (Example)
• Earthscope SAR Application
  • High-speed data integration/visualization
  • 32 gigabytes, delivered in less than 0.5 seconds
  • Presumed to be sourced from MANY disks distributed throughout the OptIPuter network
• How many disks? How many streams? What are the critical performance factors?
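A back-of-the-envelope answer to “How many disks? How many streams?” follows from the 32 gigabyte, 0.5 second target. The sketch below assumes decimal gigabytes, a sustained per-disk read rate of 50 MB/s, and 10 Gbps per stream; the per-disk rate in particular is an illustrative assumption, not a figure from the slides.

```python
import math


def required_resources(size_gigabytes, deadline_s,
                       disk_mb_per_s=50.0, stream_gbps=10.0):
    """Lower bounds on disks and streams needed to meet a delivery deadline.

    disk_mb_per_s and stream_gbps are assumed values for illustration;
    real devices deliver less than their nominal rate under contention.
    """
    bytes_per_s = size_gigabytes * 1e9 / deadline_s      # aggregate byte rate
    disks = math.ceil(bytes_per_s / (disk_mb_per_s * 1e6))
    streams = math.ceil(bytes_per_s * 8 / (stream_gbps * 1e9))
    return bytes_per_s, disks, streams


if __name__ == "__main__":
    rate, disks, streams = required_resources(32, 0.5)
    print(f"aggregate rate: {rate / 1e9:.0f} GB/s ({rate * 8 / 1e9:.0f} Gbps)")
    print(f"at least {disks} disks and {streams} streams under these assumptions")
```

Under these assumptions the target needs 64 GB/s (512 Gbps) in aggregate, which means at least 1280 disks and 52 streams of 10 Gbps. These are floors: startup cost, skew between disks, and stragglers all push the real numbers higher.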
Parallel Transfer Performance
• Assume physical network no longer the bottleneck
• Access time elements
  • 10 Gbps link: identify, authenticate, connect, transfer data, complete (~33 seconds)
  • 128 x 10 Gbps links (and storage): <same steps, parallel transfer> (~0.75 seconds)
• ...but disk and network variability + scaling become key issues
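The two access-time figures can be compared against the raw wire time alone. The sketch below assumes the 32 gigabyte (decimal) payload from the storage example and perfectly efficient links; the function name and the `overhead_s` parameter are illustrative and not taken from the slides.

```python
def transfer_seconds(size_gigabytes, link_gbps, links=1, overhead_s=0.0):
    """Ideal transfer time over `links` parallel links plus a fixed overhead."""
    bits = size_gigabytes * 8e9              # decimal gigabytes to bits
    aggregate_bps = link_gbps * 1e9 * links  # links assumed perfectly parallel
    return overhead_s + bits / aggregate_bps


if __name__ == "__main__":
    single = transfer_seconds(32, 10)
    parallel = transfer_seconds(32, 10, links=128)
    print(f"1 x 10 Gbps:   {single:.1f} s of pure transfer")
    print(f"128 x 10 Gbps: {parallel:.2f} s of pure transfer")
```

Pure transfer takes 25.6 seconds on one link and 0.2 seconds on 128 links. The gap to the quoted ~33 and ~0.75 seconds is consistent with the identify, authenticate, connect, and complete steps plus less-than-ideal throughput, although the slide does not break the totals down. In the parallel case the non-transfer share would account for more than half of the total, which fits the closing point that variability and scaling become the key issues.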
OptIPuter Software Architecture
• Approach:
  • Leverage advances in Grid software (e.g., Globus 2.2 and 3.0)
  • Add software/protocols/APIs for managing Lambdas
  • Explore what else must/can change
    • To capture the potential of Lambda networks
    • To simplify where it is now possible
    • To deliver higher performance
    • To deliver greater capability
OptIPuter Software Architecture v0.1
[Figure: architecture diagram with labels: OptIPuter Applications; Virtual Computer Abstraction; Security Models; Data Access Protocols; Real-Time Objects; Fast Protocols; “Classic” Grid Middleware; λ-setup, Mgmt; Node Operating System; Network Routers/Switches; Compute/Storage; Physical Resources]
• Network λ-configuration enables “virtual computer” view
• OptIPuter middleware technologies expose/exploit unique capabilities based on λs
• Virtual computer abstraction enables challenging, novel applications
OptIPuter Software Architecture
• Not a strict layering atop Globus
• Some features implemented as new services
  • λ-management and configuration
  • Security configuration services
  • Fast Protocols
  • Real-time objects
• Some features implemented as modifications to
  • Communication: Globus_IO/XIO and network management
  • Resource Management: GRAM/GARA/SNAP
  • Data Movement/Management: GASS/GridFTP/Replication
  • Security: GSI, GSS
OptIPuter SW Research Summary
• Near Term Goals and Activities
  • Define Testbeds and Support Use (HW, node SW, management, level of experimentation)
  • Control Plane Software
  • High Speed IP-based Protocols
  • CS and App teams “meeting of minds”
• Long Term Goals and Activities
  • System Models
    • Novel system abstractions; exploit/expose unique λ-capabilities
  • Component Technologies
    • Communication, Security Models, Data Abstractions, Real-time Objects, λ-management, Virtual Computer configuration management
  • Technical foundation for widespread use and great utility
    • E.g., new capabilities, new models, radical new applications
    • Enable the driving applications (and many others)
    • Demonstrate models for next-generation Distributed E-science