Flexible and Efficient Control of Data Transfers for Loosely Coupled Components

Joe Shang-Chieh Wu (http://meou.us)
Department of Computer Science, University of Maryland, USA

What & How
• Obtain more accurate results by coupling existing (parallel) physical simulation components
• Different time and space scales for data produced in shared or overlapped regions
• Runtime decisions for which time-stamped data objects should be exchanged
• Performance might be a concern

Roadmap
• Approximate Match [Grid 2004]
• Collective Buffering [IPDPS 2007]
• Distributed Approximate Match + Eager Transfer [under submission]
• Conclusion

Matching is OUTSIDE components
Separate matching (coupling) information from the participating components
• Maintainability – Components can be developed/upgraded individually
• Flexibility – Change participants/components easily
• Functionality – Support variable-sized time interval numerical algorithms or visualizations

Basic Operation
[Figure: an exporter component and an importer component, each with arrays distributed among multiple processes. The exporter holds exported distributed arrays for T=1, T=2, T=3, and T=4. The importer requests the array for T = 2.5; the runtime-based approximate match library returns the matched array for T = 3, and the distributed array transfer library carries it into the imported distributed array.]

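Connection-Wise Approximate Match
• Separate codes from matching: the match rule lives in a configuration file
• Each connection lists a source, a sink, a precision, and a policy
• Example connection: exporter App0 (source) to importer App1 (sink)
• Example rule: for a requested time t, find t' in App0 such that (a) t ≤ t' ≤ t + 0.5 and (b) t' – t is minimized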
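The rule on this slide can be sketched in a few lines. This is a minimal illustration, not the library's actual interface: the `CONNECTIONS` table stands in for the configuration file (its layout is an assumption, since the slide does not show the file format), and `match_for_connection` is a hypothetical helper name.

```python
# Hypothetical stand-in for the configuration file: one entry per
# (source, sink) connection, kept outside the component codes.
CONNECTIONS = {
    ("App0", "App1"): {"precision": 0.5},
}


def match_for_connection(source, sink, t, exported_times):
    """Return t' with t <= t' <= t + precision that minimizes t' - t.

    Returns None when no exported timestamp falls inside the window.
    """
    precision = CONNECTIONS[(source, sink)]["precision"]
    candidates = [tp for tp in exported_times if t <= tp <= t + precision]
    # Every candidate satisfies t' >= t, so the smallest candidate
    # also has the smallest difference t' - t.
    return min(candidates, default=None)


# The request from the Basic Operation slide: T = 2.5 against T = 1..4.
print(match_for_connection("App0", "App1", 2.5, [1.0, 2.0, 3.0, 4.0]))
```

With the exported timestamps from the Basic Operation slide, the request for T = 2.5 selects the array for T = 3, since 3 is the only exported timestamp in the window [2.5, 3.0].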

Dissection of Execution Time
Execution time is composed of
• Computation time (Tcomp)
• Buffering time (Tbuf)
• Matched data transfer time (Ttran)
Tbuf matters when exporter components (data sources) run more slowly
Ttran matters when importer components (data sinks) run more slowly

Collective Buffering (when exporters run more slowly)
• The fastest export process sends runtime match results to slower processes in the same program
• Unnecessary memory copies can be avoided in slower processes
• Optimal state: only required exported data are buffered
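A toy model can make the saving concrete. The sketch below is not the authors' implementation: it only counts how many buffer copies one slow export process makes, given how far the match results forwarded by the fastest process already reach. The function name and all three parameters are hypothetical.

```python
def count_buffer_copies(export_times, matched_times, results_known_up_to):
    """Count the memory copies made by one slow export process.

    export_times: timestamps the process exports, in order.
    matched_times: set of timestamps that importers actually matched.
    results_known_up_to: match results for timestamps <= this value have
        already arrived from the fastest process (an assumed model).
    """
    copies = 0
    for t in export_times:
        if t <= results_known_up_to and t not in matched_times:
            continue  # known to be unneeded, so the copy is skipped
        copies += 1  # either needed, or not yet known to be unneeded
    return copies


times = [1, 2, 3, 4, 5, 6, 7, 8]
matched = {3, 6}
print(count_buffer_copies(times, matched, 0))  # no results known yet
print(count_buffer_copies(times, matched, 4))  # results known up to T=4
print(count_buffer_copies(times, matched, 8))  # all results known
```

With no forwarded results the process copies every export (8 copies). With results known through T=4 it makes 5 copies, and with all results known it makes only the 2 required copies, which corresponds to the optimal state named on the slide.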

Exporting Time for the Slowest Process
[Figure labels: Optimal State; Collective Buffering Result; Copy All; Copy Some; Only Copy Required Data]

Eager Transfer + Distributed Match (when importer runs more slowly)
• Bandwidth and latency both contribute to matched data transfer time
• Eager transfer, transferring predicted data in advance, solves the bandwidth issue
• Distributed approximate match, running on both exporter and importer, solves the latency issue

[Figure labels: Original; ET Only (eager transfer only); ET+DM (eager transfer with distributed match)]

Conclusion
• Runtime-based approximate match is a solution for coupling components with different time scales
• Performance can be improved
  • When the exporter runs more slowly, avoid unnecessary memory copies
  • When the importer runs more slowly, transfer predicted data and meta-data in advance

The End

Questions? (http://meou.us)


On-Demand Approach
• Import component makes request
• Perform approx match on export component, and then transfer matched data
• Need data transfer time (T3 – T2) and 2 one-way delays (T2 – T1)

Eager Transfer Only
• Get permission to push predicted data
• Transfer predicted data in advance
• Import component makes request
• Perform approx match on export component
• Need 2 one-way delays (T16 – T15)

Eager Transfer With Distributed Match
• …
• Transfer predicted data + meta-data in advance
• The import component's request becomes a local operation
• Local operation time (T26 – T25) is needed, independent of the one-way delay
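The three approaches differ in what the importer waits for after it issues a request. The sketch below restates the costs named on the slides as a simple model. It assumes the predicted data is the data that ends up matched, and all function and parameter names are hypothetical.

```python
def on_demand_wait(one_way_delay, transfer_time):
    """On-demand: two one-way delays plus the matched data transfer."""
    return 2 * one_way_delay + transfer_time


def eager_only_wait(one_way_delay):
    """Eager transfer only: data is already there, but the match still
    runs on the exporter, so two one-way delays remain."""
    return 2 * one_way_delay


def eager_with_distributed_match_wait(local_match_time):
    """Eager transfer with distributed match: the request is answered
    locally, independent of the one-way delay."""
    return local_match_time


# Illustrative numbers only (arbitrary time units, not measurements).
delay, transfer, local = 10, 50, 1
print(on_demand_wait(delay, transfer))
print(eager_only_wait(delay))
print(eager_with_distributed_match_wait(local))
```

Under these made-up numbers the waits are 70, 20, and 1 time units. The actual benefit depends on how often the prediction is correct, which this model does not capture.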

All Together

Supported matching policies
<importer request, exporter matched, desired precision> = <x, f(x), p>
• LUB: minimum f(x) with f(x) ≥ x
• GLB: maximum f(x) with f(x) ≤ x
• REG: f(x) minimizes |f(x)-x| with |f(x)-x| ≤ p
• REGU: f(x) minimizes f(x)-x with 0 ≤ f(x)-x ≤ p
• REGL: f(x) minimizes x-f(x) with 0 ≤ x-f(x) ≤ p
• FASTR: any f(x) with |f(x)-x| ≤ p
• FASTU: any f(x) with 0 ≤ f(x)-x ≤ p
• FASTL: any f(x) with 0 ≤ x-f(x) ≤ p
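The eight policies can be written as one small selection function over a list of exported timestamps. This is a sketch for illustration, not the library's API: `approximate_match` and its parameters are hypothetical names, and returning the first acceptable candidate for the FAST policies is one possible reading of "any f(x)".

```python
def approximate_match(policy, x, exported, p=0.0):
    """Return the matched timestamp f(x), or None if nothing qualifies."""
    upper = [t for t in exported if t >= x]
    lower = [t for t in exported if t <= x]
    if policy == "LUB":
        return min(upper, default=None)
    if policy == "GLB":
        return max(lower, default=None)

    # Acceptable regions for the precision-bounded policies.
    region = {
        "R": [t for t in exported if abs(t - x) <= p],  # both sides
        "U": [t for t in upper if t - x <= p],          # at or above x
        "L": [t for t in lower if x - t <= p],          # at or below x
    }
    if policy in ("REG", "REGU", "REGL"):
        side = policy[3:] or "R"
        # Closest candidate in the region; ties go to the earliest listed.
        return min(region[side], key=lambda t: abs(t - x), default=None)
    if policy in ("FASTR", "FASTU", "FASTL"):
        candidates = region[policy[4:]]
        # "Any" acceptable value: this sketch takes the first one found.
        return candidates[0] if candidates else None
    raise ValueError(f"unknown policy: {policy}")


exported = [1.0, 2.0, 3.0, 4.0]
for name in ("LUB", "GLB", "REG", "REGU", "REGL", "FASTR", "FASTU", "FASTL"):
    print(name, approximate_match(name, 2.75, exported, p=0.5))
```

For the request x = 2.75 with p = 0.5, the upper-side and two-sided policies select 3.0, GLB selects 2.0, and REGL and FASTL find no match because 2.0 lies 0.75 below the request.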
