
OptIPuter System Software


Presentation Transcript


  1. OptIPuter System Software. Andrew A. Chien, Computer Science and Engineering, UCSD. January 2005 OptIPuter All-Hands Meeting

  2. OptIPuter Software Architecture for Distributed Virtual Computers v1.1 (architecture diagram, January 2003 OptIPuter All Hands Meeting). Layers, top to bottom:
  • OptIPuter Applications (Visualization) running over DVC #1, DVC #2, DVC #3
  • DVC/Middleware: Higher Level Grid Services, Security Models, Data Services (DWTP), Real-Time Objects; Grid and Web Middleware (Globus/OGSA/WebServices/J2EE); Node Operating Systems
  • High-Speed Transport: Layer 5 – SABUL, RBUDP, Fast, GTP; Layer 4 – XCP
  • Optical Signaling/Mgmt: λ-configuration, Net Management
  • Physical Resources

  3. OptIPuter Software Architecture (architecture diagram)
  • Distributed Applications/Web Services: Visualization, Telescience, SAGE, JuxtaView, Data Services, Vol-a-Tile, LambdaRAM
  • DVC API and DVC Runtime Library
  • DVC Core Services: DVC Configuration, DVC Services, DVC Communication, DVC Job Scheduling, Resource Identify/Acquire, Namespace Management, Security Management, High Speed Communication, Storage Services
  • Underlying components: Globus XIO, GSI, RobuStore, PIN/PDC, GRAM, GTP, XCP, UDT, CEP, LambdaStream, RBUDP

  4. System Software/Middleware Progress
  • Significant Progress in Key Areas!
  • A Unified Vision of Application Interface to the OptIPuter Middleware
    • Distributed Virtual Computer: Simpler Application Models, New Capabilities
    • 3-Layer Demonstration: JuxtaView/LambdaRAM Tiled Viz on DVC on Transports
  • Efficient Transport Protocols to Exploit High Speed Optical Networks
    • RBUDP/LambdaStream, XCP, GTP, CEP, SABUL/UDT
    • Single Streams, Converging Streams, Composite Endpoint Flows
    • Unified Presentation under XIO (single application API)
  • Performance Modeling: Characterization of Vol-a-Tile Performance on Small-Scale Configurations
  • Real-Time: Definition of a Real-Time DVC; Components for Layered RT Resource Management – IRDRM, RCIM
  • Storage: Design and Initial Simulation Evaluation of LT Code-based Techniques for Distributed Robust (low variance of access, guaranteed bandwidth) Storage
  • Security: Efficient Group Membership Protocols to Support Broadcast and Coordination across OptIPuters

  5. Cross Team Integration and Demonstrations
  • TeraBIT Juggling, 2-layer Demo [SC2004, November 8-12, 2004]
    • Distributed Virtual Computer, OptIPuter Transport Protocols (GTP)
    • Move data between OptIPuter Network Endpoints (UCSD, UIC, Pittsburgh)
    • Share efficiently; Good Flow Behavior, Maximize Transfer Speeds (saturate all receivers)
    • Configuration: 10 endpoints, 40+ nodes, 1000s of miles
    • Achieved 17.8 Gbps, a TeraBIT in less than one minute!
  • 3-layer Demo [AHM2005, January 26-27, 2005]
    • Visualization, Distributed Virtual Computer, OptIPuter Transport Protocols
  • 5-layer Demo [iGrid, September 26-28, 2005 ??]
    • Biomedical/Geophysical, Visualization, Distributed Virtual Computer, OptIPuter Transport Infrastructure, Optical Network Configuration

  6. OptIPuter Software “Stack” (stack diagram)
  • Applications (Neuroscience, Geophysics)
  • Visualization
  • Distributed Virtual Computer (Coordinated Network and Resource Configuration)
  • Novel Transport Protocols
  • Optical Network Configuration
  The 3-layer demo spans Visualization, DVC, and Transport Protocols; the 5-layer demo spans all five layers.

  7. Year 3 Goals
  • Integration and Demonstration of Capability
    • All Five Layers (Application, Visualization, DVC, Transport Protocols, Optical Network Control)
    • Across a Range of Testbeds
    • With Neuroscience and Geophysical Applications
  • Distributed Virtual Computer
    • Integrate with Network Configuration (e.g. PIN)
    • Deploy as Persistent OptIPuter Testbed Service
    • Alpha Release of DVC as a Library
  • Efficient Transport Protocols
    • LambdaStream: Implement, Analyze Effectiveness, Integrate with XIO
    • GTP: Release and Demonstrate at Scale; Analytic Stability Modeling
    • CEP: Implement and Evaluate Dynamic N-to-M Communication
    • SABUL/UDT: Integrate with XIO; Flexible Prototyping Toolkit
    • Unified Presentation under XIO (single application API)
  • Performance Modeling
    • Characterization of Vol-a-Tile, JuxtaView Performance on Wide-Area OptIPuter
  • Real-Time
    • Prototype RT DVC; Experiment: Remote Device Control within Campus-Scale OptIPuter
  • Storage
    • Prototype RobuSTore; Evaluate Using OptIPuter Testbeds and Applications
  • Security
    • Develop and Evaluate High Speed / Low Latency Network Layer Authentication and Encryption

  8. 10Gig WANs: Terabit Juggling
  • SC2004: 17.8 Gbps, a TeraBIT in < 1 minute!
  • SC2005: Juggle Terabytes in a Minute
  • Network map: UCSD/SDSC (SDSC, JSOE, CSE, SIO) and CENIC San Diego; CENIC Los Angeles; UCI; ISI/USC; PNWGP Seattle; UI at Chicago and StarLight Chicago; SC2004 Pittsburgh; trans-Atlantic link to NetherLight Amsterdam, NIKHEF, and U of Amsterdam; links range from 1 GE to 10 GE (2 GE to UCI and ISI/USC)

  9. 3-layer Integrated Demonstration
  • Visualization Application (JuxtaView + LambdaRAM)
  • System SW Framework (Distributed Virtual Computer)
  • System SW Transports (GTP, UDT, etc.)
  Nut Taesombut, Venkat Vishwanath, Ryan Wu, Freek Dijkstra, David Lee, Aaron Chin, Lance Long – UCSD/CSAG, UIC, UvA, UCSD/NCMIR, etc. January 2005, OptIPuter All Hands Meeting

  10. 3-Layer Demo Configuration (configuration diagram)
  • JuxtaView and LambdaRAM Client at NCMIR/San Diego; LambdaRAM Servers at EVL/Chicago and UvA/Amsterdam
  • High Bandwidth (2.5 Gbps, ~7 streams), Long Latencies, Two Configurations
  • GTP flows: NCMIR/San Diego – EVL/Chicago over NLR/CAVEWAVE (10G, 70 msec); campus GE to SDSC/San Diego (10G, 0.5 msec); transatlantic link to UvA/Amsterdam (4G, 100 msec)
  • Output video streaming to audiences
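As a quick sanity check on why these paths stress conventional transports, here is a small back-of-the-envelope calculation (illustrative arithmetic only, treating the quoted latencies as one-way delays) of how much data sits in flight at full rate on each link:

    # Bandwidth-delay products for the demo paths (illustrative arithmetic).
    paths = {
        "NCMIR - EVL over NLR/CAVEWAVE": (10e9, 0.070),   # bits/s, seconds
        "NCMIR - SDSC campus GE":        (10e9, 0.0005),
        "NCMIR - UvA transatlantic":     (4e9,  0.100),
    }
    for name, (bw, delay) in paths.items():
        in_flight_mib = bw * delay / 8 / 2**20
        print(f"{name}: ~{in_flight_mib:.1f} MiB in flight at full rate")

Tens of megabytes must be buffered and acknowledged across each wide-area path, which is the regime the DVC and transport layers below are designed for.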

  11. Distributed Virtual Computers Nut Taesombut and Andrew Chien University of California, San Diego January 2005 OptIPuter All-Hands Meeting

  12. Distributed Virtual Computer (DVC)
  • Application Request: Grid Resources AND Network Connectivity
    • Redline-style Specification, 1st-Order Constraint Language
  • DVC Broker Establishes the DVC
    • Binds End Resources, Switching, Lambdas
    • Leverages Grid Protocols for Security, Resource Access
  • DVC <-> Private Resource Environment, Surfaced through WSRF

  13. Distributed Virtual Computer (DVC)
  • Key Features
    • Single Distributed Resource Configuration Description and Binding
    • Simple Use of Optical Network Configuration and Grid Resource Binding
    • Single Interface to Diverse Communication Capabilities (Transport Protocols, Novel Communication Capabilities)
  • Using a DVC (see the sketch below)
    • Application presents a Resource Specification, requesting Grid Resources and Lambda Connectivity
    • DVC Broker selects Resources and Network Configuration
    • DVC Broker binds Resources, configures the Network, and returns the List of Bound Resources and Their Respective (Newly Created) IPs
    • Application uses these IPs to access the Created Network Paths
    • Application selects Communication Protocols and Mechanisms amongst the Bound Resources
    • Application executes
    • Application releases the DVC
  [Taesombut & Chien, UCSD]
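To make the "Using a DVC" sequence concrete, here is a minimal sketch of the request/bind/use/release cycle; the broker class, method names, and port number are invented stand-ins for illustration, not the released DVC library API:

    # Hypothetical illustration of the DVC request/bind/use/release cycle.
    class DVCBrokerStub:
        """Stand-in for the DVC Broker: selects resources, configures the
        network, and returns bound resources with newly created private IPs."""
        def request(self, spec):
            # In the real system the spec is a Redline-style constraint document
            # and binding uses Grid protocols (GRAM/GSI) plus lambda-path setup
            # (PIN). Here we simply pretend two resources were bound.
            return {"viz": "192.168.85.12", "str1": "192.168.85.13"}
        def release(self, bindings):
            pass  # would tear down paths and resource leases

    broker = DVCBrokerStub()

    # (1) Application presents a resource specification (end resources + links).
    spec = '''
      viz  ISA [type == "vizcluster"];
      str1 ISA [free-memory > 1700];
      Link1 ISA [restype = "conn"; ep1 = <viz>; ep2 = <str1>; bandwidth > 940]
    '''

    # (2) Broker selects resources, configures the network, and returns the
    #     bound resources with their (newly created) DVC IPs.
    bindings = broker.request(spec)

    # (3) Application uses those IPs to reach the provisioned paths, choosing
    #     its own transport (GTP, UDT, an XIO-wrapped driver, or plain TCP).
    endpoint = (bindings["str1"], 5000)   # the port number is illustrative
    print("would open a data connection to", endpoint)

    # (4) Application releases the DVC when done.
    broker.release(bindings)

In the real system the broker is surfaced through WSRF and binds resources with Grid protocols (GRAM, GSI), as described above.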

  14. JuxtaView and LambdaRAM on DVC Example
  (1) The application requests a Viz Cluster, Storage Servers, and High-Bandwidth Connectivity from the DVC Manager.
  Resource/Network Information Services (Globus MDS) – physical resources and network configuration:
    viz1: ncmir.ucsd.sandiego
    str1: rembrandt0.uva.amsterdam
    str2: rembrandt1.uva.amsterdam
    str3: rembrandt2.uva.amsterdam
    str4: rembrandt6.uva.amsterdam
    (rembrandt0, yorda0.uic.chicago) --- BW 1, LambdaID 3
    (rembrandt1, yorda0.uic.chicago) --- BW 1, LambdaID 4
    (rembrandt2, yorda0.uic.chicago) --- BW 1, LambdaID 5
    (rembrandt6, yorda0.uic.chicago) --- BW 1, LambdaID 17
  Application requirements and preferences (communication + end resources), in the Redline-style constraint language:
    [ viz ISA [type == "vizcluster"; InSet(special-device, "tiled display")];
      str1 ISA [free-memory > 1700; InSet(dataset, "rat-brain.rgba")];
      str2 ISA [free-memory > 1700; InSet(dataset, "rat-brain.rgba")];
      str3 ISA [free-memory > 1700; InSet(dataset, "rat-brain.rgba")];
      str4 ISA [free-memory > 1700; InSet(dataset, "rat-brain.rgba")];
      Link1 ISA [restype = "conn"; ep1 = <viz>; ep2 = <str1>; bandwidth > 940; latency <= 100];
      Link2 ISA [restype = "conn"; ep1 = <viz>; ep2 = <str2>; bandwidth > 940; latency <= 100];
      Link3 ISA [restype = "conn"; ep1 = <viz>; ep2 = <str3>; bandwidth > 940; latency <= 100];
      Link4 ISA [restype = "conn"; ep1 = <viz>; ep2 = <str4>; bandwidth > 940; latency <= 100] ]

  15. JuxtaView and LambdaRAM on DVC Example
  (2) The DVC Manager allocates end resources and communication:
  • Resource Binding (GRAM)
  • Lambda Path Instantiation (PIN) – the current demo doesn't yet include this
  • DVC IP Allocation – nodes at NCMIR/San Diego and UvA/Amsterdam receive DVC IPs 192.168.85.12 – 192.168.85.16 (diagram also shows the DVC Manager and PIN Server)

  16. JuxtaView and LambdaRAM on DVC Example
  (3) Create Resource Groups: a Storage Group and a Viz Group are formed over the DVC IPs 192.168.85.12 – 192.168.85.16 at NCMIR/San Diego and UvA/Amsterdam (DVC Manager diagram)

  17. JuxtaView and LambdaRAM on DVC Example
  (4) Launch Applications:
  • Launch LambdaRAM Servers (Storage Group, UvA/Amsterdam)
  • Launch JuxtaView/LambdaRAM Clients (Viz Group, NCMIR/San Diego)
  (Diagram: DVC Manager and DVC IPs 192.168.85.12 – 192.168.85.16)

  18. OptIPuter Component Technologies: Real-Time DVCs, Application Performance Analysis, High Speed Transports (CEP, LambdaStream, XCP, GTP, UDT), Storage, Security

  19. Vision – Real-Time Tightly Coupled Wide-Area Distributed Computing (a Real-Time Object network over a dynamically formed Distributed Virtual Computer). Goals:
  • High-Precision Timings of Critical Actions
  • Tight Bounds on Response Times
  • Ease of Programming (High-Level Programming, Top-Down Design)
  • Ease of Timing Analysis
  Source: Kim, UCI

  20. Real-Time DVC Architecture (layer diagram)
  • Real-Time Application (Real-Time Object Network) – the application expressed as real-time objects and links with various latency constraints
  • Real-Time Middleware – schedules and manages underlying resources to achieve the desired RT TMO behavior
  • Distributed Virtual Computer – a collection of resources with known performance and security capabilities, and control & management; provides simple resource and management abstractions and hides detailed resource management (i.e., network provisioning, machine reservation)
  • High Speed Protocols / Network Management / Basic Resource Management – libraries that realize initial configuration and ongoing management; controls and manages "single" resources

  21. Real-Time: from LAN to WAN • RT grid (or subgrid) ::= A grid (or subgrid) facilitating (RG1) Message communications with easily determinable tight latency bounds and (RG2) Computing node operations enabling easy guaranteeing of timely progress of threads toward computational milestones • RG1 realized via • Dedicated optical-path WAN • Campus networks, the LAN part of the RT grid, equipped with Time-Triggered (TT) Ethernet switches (a new research task in collaboration with Hermann Kopetz) Source: Kim, UCI

  22. Real-Time DVC
  (RD1) Message paths with easily determinable tight latency bounds.
  (RD2) In each computing or sensing-actuating site within the RT DVC, computing nodes must exhibit timing behaviors which do not differ from those of computing nodes in an isolated site by more than a few percent. Also, computing nodes in an RT DVC must enable easy procedures for assuring the very high probability of application processes and threads reaching important milestones on time. => Computing nodes must be equipped with appropriate infrastructure software, i.e., an OS kernel & middleware with easily analyzable QoS.
  (RD3) If representative computing nodes of two RT DVCs are connected via RT message paths, then the ensemble consisting of the two DVCs and the RT message paths is also an RT DVC.
  Source: Kim, UCI

  23. Middleware for Real-Time DVC (diagram)
  • Functions: acquisition of λ's; allocation of virtual λ's; coordination of msg-send timings; support for execution of applications via allocation of computing & communication resources within a DVC; on-demand creation of DVCs
  • Example applications: "Let us start a chorus at 2pm", "e-Science" (data flowing among sites)
  • Components: RGRM (RT grid resource management), RCIM (RT comm infrastructure mgt), IRDRM (Intra-RT-DVC resource mgt)
  • Basic Infrastructure Services: IRDRM agent, RCIM agent, Globus System, λ-Configuration, Net Management
  Source: Kim, UCI

  24. Progress
  • RCIM (RT comm infrastructure mgt)
    • Study of TT Ethernet began with the help of Hermann Kopetz
    • The 1st unit is expected to become available to us by June 2005
  • IRDRM (Intra-RT-DVC resource mgt)
    • TMO (Time-triggered Message-triggered Object) Support Middleware (TMOSM) adopted as a starting base
    • A significantly redesigned version (4.1) of TMOSM (for improved modularity, concurrency, and portability) has been developed; it runs on Linux, WinXP, and WinCE
    • An effort to extend TMOSM to fit the Jenks' cluster began
  • TMO programming model (diagram): components of a C++ object – TT Method 1 and TT Method 2, each with an AAC and deadlines, plus Service Method 1 and Service Method 2; no threads, no priorities; high-level programming style
  Source: Kim, UCI
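As a rough, language-shifted illustration of the TMO style described above (periodic time-triggered methods with per-activation deadlines and no application-level threads or priorities), the following Python sketch mimics the idea with a toy executor; TMOSM itself is C++ middleware and this is not its API:

    import time

    class ChorusTMO:
        """Toy analogue of a TMO: one time-triggered (TT) method with a period
        and a deadline, plus a service method callable by clients. Names,
        periods, and structure here are illustrative only."""
        period_s = 0.5      # TT method activation period
        deadline_s = 0.1    # completion deadline per activation

        def tt_method(self, now):
            pass            # body of the periodic, time-triggered action

        def service_method(self, request):
            return "ok"     # message-triggered service

    def run(tmo, activations=5):
        """Minimal executor standing in for the support middleware: activates
        the TT method on its period and checks its deadline."""
        next_t = time.monotonic()
        for _ in range(activations):
            start = time.monotonic()
            tmo.tt_method(start)
            if time.monotonic() - start > tmo.deadline_s:
                print("deadline miss detected")
            next_t += tmo.period_s
            time.sleep(max(0.0, next_t - time.monotonic()))

    run(ChorusTMO())

The point of the real middleware is that the application writes only the method bodies and timing specifications; activation, deadline monitoring, and resource management happen underneath.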

  25. Progress (cont.)
  • Programming model
    • An API wrapping the services of the RT middleware enables high-level RT programming (TMO) without a new compiler.
    • The notion of a Distance-Aware (DA) TMO, an attractive building block for RT wide-area DC applications, was created and a study of its realization began.
  • Application development experiments
    • Fair and efficient Distributed On-Line Game Systems and a LAN-based feasibility demonstration
    • Application of the global-time-based coordination principle
    • A step towards an OptIPuter environment demonstration
  • Publications
    • A paper on distributed on-line game systems in the IDPT2003 proceedings
    • A paper on distributed on-line game systems to appear in the ACM-Springer Journal on Multimedia Systems
    • A keynote paper on RT DVC in the AINA2004 proceedings
    • A paper on RT DVC middleware to appear in the WORDS2005 proceedings
  Source: Kim, UCI

  26. Year 3 Plan • RCIM (RT comm infrastructure mgt) • Development of middleware support for TT Ethernet • The 1st unit of TT Ethernet switch is expected to become available to us by June 2005. • IRDRM (Intra-RT-DVC resource mgt) • Extension of TMOSM to fit into clusters • Interfacing TMOSM to the Basic Infrastructure Services of OptIPuter Source: Kim, UCI

  27. Year 3 Plan • Application development experiments • An experiment for remote access and control within the UCI or UCSD campus • A step toward preparation of an experiment for remote access and control of electron microscopes at UCSD-NCMIR Source: Kim, UCI

  28. Performance Analysis and Monitoring of Vol-a-Tile • Use the Prophesy system to Instrument and Study Vol-a-Tile on a 5-node System • Evaluate the Performance Impact of Configuration (data servers, clients, network) Xingfu Wu <wuxf@cs.tamu.edu> [Wu & Taylor, TAMU]

  29. Comparison of Vol-a-Tile Configuration Scenarios Xingfu Wu <wuxf@cs.tamu.edu>

  30. Year 3+ Plans • Port the instrumented Vol-a-Tile to a large-scale OptIPuter testbed for analysis (3/2005) • Analyze the performance of the JuxtaView and LambdaRAM applications (6/2005) • Where possible, develop models of data accesses for the different visualization applications (9/2005) • Continue collaborating with Jason's group on viz applications (12/2005) Xingfu Wu <wuxf@cs.tamu.edu>

  31. High Speed Protocols

  32. High Performance Transport Problem
  • OptIPuter is Bridging the Gap Between High Speed Link Technologies and the Growing Demands of Advanced Applications
  • Transport Protocols Are the Weak Link
  • TCP Has Well-Documented Problems That Militate Against Its Achieving High Speeds
    • Slow Start Probing Algorithm
    • Congestion Avoidance Algorithm
    • Flow Control Algorithm
    • Operating System Considerations
    • Friendliness and Fairness Among Multiple Connections
  • These Problems Are the Foci of Much Ongoing Work
  • OptIPuter is Pursuing Four Complementary Avenues of Investigation
    • RBUDP Addresses Problems of Bulk Data Transfer
    • SABUL Addresses Problems of High Speed Reliable Communication
    • GTP Addresses Problems of Multiparty Communication
    • XCP Addresses Problems of General Purpose, Reliable Communication
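A back-of-the-envelope calculation makes the problem concrete (illustrative arithmetic using the well-known Mathis et al. steady-state approximation and the demo's 10 Gbps / ~70 ms San Diego - Chicago path; these are not project measurements):

    # Why stock TCP struggles at 10 Gbps over ~70 ms.
    MSS_BITS = 1460 * 8          # typical Ethernet MSS
    RTT = 0.070                  # seconds
    TARGET = 10e9                # bits/second

    bdp_bits = TARGET * RTT
    window_pkts = bdp_bits / MSS_BITS
    print(f"window needed: {window_pkts:,.0f} packets (~{bdp_bits/8/2**20:.0f} MiB)")

    # Mathis: rate ~ (MSS/RTT) * (1.22 / sqrt(p))  =>  p ~ (1.22*MSS/(RTT*rate))^2
    p = (1.22 * MSS_BITS / (RTT * TARGET)) ** 2
    print(f"max tolerable loss rate: ~{p:.1e}")

    # After one loss, congestion avoidance regrows the window ~1 packet per RTT.
    recovery_s = (window_pkts / 2) * RTT
    print(f"recovery from a single loss: ~{recovery_s/60:.0f} minutes")

Numbers of this kind (sub-10^-9 loss tolerance, tens of minutes to recover from one loss) are what motivate the RBUDP/LambdaStream, SABUL/UDT, GTP, and XCP work below.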

  33. OptIPuter Transport Protocols (taxonomy diagram)
  • Composite Endpoint Protocol (CEP) – efficient N-to-M communication, composed over the unicast protocols below
  • RBUDP / λ-stream – unicast over an allocated lambda (dedicated e2e path)
  • SABUL / UDT – unicast over shared, routed paths through standard routers
  • GTP – managed-group communication
  • XCP – unicast over shared, routed paths requiring enhanced routers

  34. Composite Endpoint Protocol (CEP) Eric Weigle and Andrew A. Chien Computer Science and Engineering University of California, San Diego OptIPuter All Hands Meeting, January 2005

  35. Composite-EndPoint Protocol (CEP)
  • Network Transfers Faster than Individual Machines: a Terabit flow? A 100 Gbit flow? A 10 Gbps flow with 1 Gbps NICs?
  • Clusters are a Cost-Effective Means to Terminate Fast Transfers
  • Support Flexible, Robust, General N-to-M Communication
  • Manage Heterogeneity, Multiple Transfers, Data Accessibility
  [Weigle & Chien, UCSD]

  36. Example
  • Move Data from a Heterogeneous Storage Cluster (N)
  • Exploit the Heterogeneous Network Structure and Dedicated Lambdas
  • Terminate in a Visualization Cluster (M); Render for a Tiled Display Wall (M)
  • The data flow is not easy for the application to handle; it may want to work locally to the storage cluster to offload checksum/buffering requirements or to avoid a contested link.

  37. Composite Endpoint Approach • Transfers Move Distributed Data • Provides hybrid memory/file namespace for any transfer request • Choose Dynamic Subset of Nodes to Transfer Data • Performance Management for Heterogeneity, Dynamic Properties Integrated with Fairness • API and Scheduling • API enables easy use • Scheduler handles performance, fairness, adaptation • Exploit Many Transport Protocols
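As a rough illustration of what a static composite-endpoint scheduler does, the sketch below splits one logical transfer across endpoint nodes in proportion to their link capacity; the function, its proportional-split policy, and the node names are invented for illustration and are not the CEP API:

    def split_transfer(total_bytes, node_gbps):
        """Assign byte ranges of one logical transfer to endpoint nodes in
        proportion to their (static) link capacity - a toy version of static
        composite scheduling."""
        total_cap = sum(node_gbps.values())
        items = sorted(node_gbps.items())
        plan, offset = [], 0
        for i, (node, gbps) in enumerate(items):
            if i == len(items) - 1:
                share = total_bytes - offset          # last node takes the remainder
            else:
                share = total_bytes * gbps // total_cap
            plan.append((node, offset, offset + share))
            offset += share
        return plan

    # Heterogeneous storage cluster feeding a visualization cluster:
    nodes = {"store01": 1, "store02": 1, "store03": 10}  # NIC speeds in Gbps
    for node, start, end in split_transfer(12 * 2**30, nodes):
        print(f"{node}: bytes [{start}, {end})")

The actual CEP scheduler additionally manages fairness across transfers and adapts to dynamic node and network conditions, which the Year 3 plans below call out explicitly.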

  38. CEP Efficiently Composes Heterogeneous and Homogeneous Cluster Nodes
  • Seamless Composition of Performance across Widely Varying Node Performance
  • High Composition Efficiency: demonstrated 32 Gbps from 1 Gbps nodes!
  • Efficiency Increasing as the Implementation Improves
  • Scaling Suggests 1000-Node Composites => Terabit Flows
  • Next Steps: Wide Area, Dynamic Network Performance

  39. Summary and Year 3 Plans
  • Current Scheduling Mechanism is Static
    • Selects nodes to move data
    • Handles static heterogeneity (node/link capabilities)
    • 32 Gbps in LAN
  • Simple API Specification
    • Ease of use; the scheduler takes care of the transfer
    • Allows Scatter/Gather with arbitrary constraints on data
  • Plans: 1H2005
    • XIO implementation: use GTP, TCP, other transports
    • Tuned WAN Performance
    • Dynamic Transfer Scheduling (adapt to network and node conditions)
  • Plans: 2H2005
    • Security, code stabilization, optimization
    • Initial Public Release
    • 5-layer Demo Participation
    • Better Dynamic Scheduling; De-centralization; Fault Tolerance

  40. LambdaStream Chaoyue Xiong, Eric He, Venkatram Vishwanath, Jason Leigh, Luc Renambot, Tadao Murata, Thomas A. DeFanti January 2005 OptIPuter All Hands Meeting

  41. LambdaStream (Xiong)
  • Applications Need High Bandwidth with Low Jitter
  • Idea: combine loss-based and rate-based techniques; predict the loss type and respond appropriately => good bandwidth and low jitter

  42. Loss Type Prediction
  • When packet loss occurs, the receiver computes the average receiving interval and uses it to classify the loss.
  • Loss types: (1) continuous decrease in receiving capability; (2) occurrence of congestion in the link; (3) sudden decrease in receiving capability or random loss.

  43. Incipient Undesirable Situation Avoidance (1)
  • When there is no loss, a longer receiving packet interval indicates link congestion or lower receiving capability.
  • Diagram: the sender emits packets wi and wi+1 with sending interval ∆ts; after the bottleneck router, the receiver observes receiving interval ∆tr.

  44. Incipient Undesirable Situation Avoidance (2)
  • Metric: ratio between the sending interval and the average receiving interval during one epoch.
  • Methods to improve precision: use a weighted addition of receiving intervals from the previous three epochs; exclude unusual samples.
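A small sketch of this metric (the outlier rule, the per-epoch weights, and the treatment of "previous three epochs" are invented for illustration; LambdaStream's actual constants and smoothing rule may differ):

    def epoch_ratio(send_interval, recv_intervals, history, weights=(0.5, 0.3, 0.2)):
        """Toy version of the epoch metric: ratio of the sending interval to a
        smoothed average receiving interval."""
        srt = sorted(recv_intervals)
        median = srt[len(srt) // 2]
        usual = [x for x in recv_intervals if x <= 2 * median]   # exclude unusual samples
        epoch_avg = sum(usual) / len(usual)

        history.append(epoch_avg)
        recent = history[-3:]                 # this epoch plus up to two previous ones
        w = weights[:len(recent)]
        smoothed = sum(wi * xi for wi, xi in zip(w, reversed(recent))) / sum(w)

        # A ratio noticeably below 1 means packets are spreading out at the
        # receiver: incipient congestion or reduced receiving capability.
        return send_interval / smoothed

    history = []
    print(epoch_ratio(100e-6, [101e-6, 99e-6, 250e-6, 102e-6], history))

A ratio that stays below 1 is the "incipient undesirable situation" of the previous slide, and the sender can slow its rate before losses occur.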

  45. Single Stream Experiment Result (1)

  46. Single Stream Experiment Result (2)

  47. Year 3 Plans • Development of XIO driver • Experiments with multiple streams • Integrate with TeraVision and SAGE. • Use formal modeling (Petri Net) to improve the scalability of the algorithm.

  48. Information Sciences Institute – OptIPuter Project Progress, January 18, 2005
  • Joe Bannister • Aaron Falk • Jim Pepin • Joe Touch

  49. OptIPuter XCP Progress [Bannister, Falk, Pepin, Touch, ISI]
  • Design of a Linux XCP port with Net100 tweaks
  • Makes most sense for end-systems only; little benefit from changing the OS for XCP routers
  • Strategy is to put XCP in the generic Linux 2.6 kernel, then port to Net100 (the Net100 optimizations are largely orthogonal to XCP)
  • Technical challenges exist in extending the Linux kernel to handle the 64-bit arithmetic needed for XCP
  • The Linux port is pending conclusion of ongoing design work to eliminate line-rate divide operations from the router
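"Eliminate line-rate divide operations" refers to keeping division out of the router's per-packet fast path. As a generic illustration of the usual trick (and only that: this is a hedged sketch, not the ISI design or XCP's actual feedback equations), a reciprocal can be computed once per control interval in fixed point and applied per packet with a multiply and a shift:

    SHIFT = 40                                  # fixed-point scale (illustrative)

    def reciprocal(divisor):
        """Computed once per control interval, where a divide is affordable."""
        return (1 << SHIFT) // divisor

    def per_packet_share(total_feedback, packet_bytes, recip):
        """Per-packet path: approximates total_feedback * packet_bytes /
        interval_bytes using only multiplies and a shift."""
        return (total_feedback * packet_bytes * recip) >> SHIFT

    interval_bytes = 1_250_000_000              # bytes observed in the control interval
    recip = reciprocal(interval_bytes)
    for size in (64, 1500, 9000):
        print(size, per_packet_share(100_000_000, size, recip))

In kernel C code the intermediate products must be kept within 64 bits (or split), which is the arithmetic challenge the slide mentions.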

  50. OptIPuter XCP Activities
  Workshops:
  • Aaron Falk, Ted Faber, Eric Coe, Aman Kapoor, and Bob Braden. "Experimental Measurements of the eXplicit Control Protocol." Second Annual Workshop on Protocols for Fast Long Distance Networks, February 16, 2004. http://www.isi.edu/isi-xcp/docs/falk-pfld04-slides-2-16-04.pdf
  • Aaron Falk. "User Application Requirements, Including End-to-End Issues." NASA Optical Network Testbeds Workshop, NASA Ames Research Center, August 9-11, 2004. http://duster.nren.nasa.gov/workshop7/report.html
  Papers:
  • Aaron Falk and Dina Katabi. "Specification for the Explicit Control Protocol (XCP)," draft-falk-xcp-00.txt (work in progress), October 2004. http://www.isi.edu/isi-xcp/docs/draft-falk-xcp-spec-00.txt
  • Aman Kapoor, Aaron Falk, Ted Faber, and Yuri Pryadkin. "Achieving Faster Access to Satellite Link Bandwidth." Submitted to Global Internet 2005, December 2004. http://www.isi.edu/isi-xcp/docs/kapoor-pep-gi2005.pdf
