Flexible and Efficient Control of Data Transfers for Loosely Coupled Components
Joe Shang-Chieh Wu
http://meou.us
Department of Computer Science
University of Maryland, USA
What & How
• Obtain more accurate results by coupling existing (parallel) physical simulation components
• Different time and space scales for data produced in shared or overlapped regions
• Runtime decisions for which time-stamped data objects should be exchanged
• Performance might be a concern
Roadmap
• Approximate Match [Grid 2004]
• Collective Buffering [IPDPS 2007]
• Distributed Approximate Match + Eager Transfer [under submission]
• Conclusion
Matching is OUTSIDE components
• Separate matching (coupling) information from the participating components
• Maintainability: components can be developed/upgraded individually
• Flexibility: participants/components can be changed easily
• Functionality: support numerical algorithms or visualizations with variable-sized time intervals
Basic Operation
• Arrays are distributed among multiple processes
• [Figure: an exporter component (exported distributed arrays at T=1, T=2, T=3, T=4) and an importer component (imported distributed array), connected through the distributed array transfer library and the runtime-based approximate match library. Labels: "Request Array for T = 2.5", "Matched Array for T = 3".]
Separate code from matching
• Connection-wise approximate match, specified in a configuration file with the fields Source, Sink, Precision, Policy
• Example connection: Exporter App0, Importer App1
• For a request at time t, find t’ in App0 such that (a) t ≤ t’ ≤ t + 0.5 and (b) t’ - t is minimized
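The connection-wise rule above can be sketched in a few lines of Python. This is only an illustration: the `Connection` record, the `CONNECTIONS` list, and `find_match` are hypothetical names, the real configuration file syntax is not shown on the slide, and the sketch assumes all exported timestamps are already known, whereas the actual library decides at runtime. The policy label "REGU" is borrowed from the policy list later in the deck, where it matches conditions (a) and (b).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Connection:
    """One row of a hypothetical coupling configuration (not the real file format)."""
    source: str       # exporter component
    sink: str         # importer component
    precision: float  # allowed distance between requested and matched timestamp
    policy: str       # name of the matching policy


# Hypothetical in-memory stand-in for the configuration file on the slide.
CONNECTIONS = [Connection(source="App0", sink="App1", precision=0.5, policy="REGU")]


def find_match(conn: Connection, t: float, exported: Sequence[float]) -> Optional[float]:
    """Return t' with t <= t' <= t + precision that minimizes t' - t, or None."""
    if conn.policy != "REGU":
        raise NotImplementedError(f"policy {conn.policy!r} is not part of this sketch")
    candidates = [tp for tp in exported if t <= tp <= t + conn.precision]
    return min(candidates, default=None)


# The importer asks for T = 2.5; the exporter has produced T = 1, 2, 3, 4.
matched = find_match(CONNECTIONS[0], 2.5, [1.0, 2.0, 3.0, 4.0])
print(matched)  # 3.0
```

Because the rule lives in the connection record rather than in App0 or App1, changing the precision or the policy would not require touching either component.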
Dissection of Execution Time
• Execution time is composed of computation time (Tcomp), buffering time (Tbuf), and matched data transfer time (Ttran)
• Tbuf matters when exporter components (data sources) run more slowly
• Ttran matters when importer components (data sinks) run more slowly
Collective Buffering (when exporters run more slowly)
• The fastest export process sends its runtime match results to the slower processes in the same program
• Unnecessary memory copies can be avoided in the slower processes
• Optimal state: only required exported data are buffered
Collective Buffering Result
• [Figure: Exporting Time for the Slowest Process. Labels: Optimal State, Copy All, Copy Some, Only Copy Required Data.]
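The effect of collective buffering on a slow export process can be illustrated with a toy model. It is not the protocol from the paper: `count_buffer_copies` and the parameter `known_matches_upto` are hypothetical, and the model simply assumes that a slow process must buffer every export whose match outcome it does not yet know, while outcomes forwarded by the fastest process let it skip exports that will never be requested.

```python
from typing import Iterable, Set


def count_buffer_copies(timestamps: Iterable[int],
                        matched: Set[int],
                        known_matches_upto: int) -> int:
    """Count memory copies made by one slow export process (toy model).

    For timestamps <= known_matches_upto the match result has already been
    received from the fastest process, so data is copied only if it is matched.
    For later timestamps the outcome is unknown and the data is buffered anyway.
    """
    copies = 0
    for t in timestamps:
        if t <= known_matches_upto:
            if t in matched:
                copies += 1   # required data: must be buffered
        else:
            copies += 1       # outcome unknown: buffer conservatively
    return copies


timestamps = range(1, 11)   # exports at T = 1 .. 10
matched = {3, 6, 9}         # exports the importer actually needs (assumed)

copy_all = count_buffer_copies(timestamps, matched, known_matches_upto=0)
copy_some = count_buffer_copies(timestamps, matched, known_matches_upto=5)
only_required = count_buffer_copies(timestamps, matched, known_matches_upto=10)
print(copy_all, copy_some, only_required)  # 10 6 3
```

The three values loosely mirror the "Copy All", "Copy Some", and "Only Copy Required Data" cases: the more match results a slow process receives ahead of time, the fewer copies it makes.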
Eager Transfer + Distributed Match (when importer runs more slowly)
• Bandwidth and latency both contribute to matched data transfer time
• Eager transfer, which transfers predicted data in advance, addresses the bandwidth issue
• Distributed approximate match, which runs on both exporter and importer, addresses the latency issue
[Figure: results comparing Original, ET Only, and ET+DM]
Conclusion
• Runtime-based approximate match is a solution for coupling components with different time scales
• Performance can be improved
• When the exporter runs more slowly, avoid unnecessary memory copies
• When the importer runs more slowly, transfer predicted data and meta-data in advance
On-Demand Approach
• The import component makes a request
• Perform approximate match on the export component, then transfer the matched data
• Needs data transfer time (T3 - T2) and 2 one-way delays (T2 - T1)
Eager Transfer Only
• Get permission to push predicted data
• Transfer predicted data in advance
• The import component makes a request
• Perform approximate match on the export component
• Needs 2 one-way delays (T16 - T15)
Eager Transfer With Distributed Match
• …
• Transfer predicted data + meta-data in advance
• The import component's request becomes a local operation
• Needs only the local operation time (T26 - T25), independent of the one-way delay
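The waiting time an importer sees under the three approaches can be compared with a back-of-the-envelope cost model. The functions and the numbers below are illustrative assumptions, not measurements from the talk. The model also assumes that the prediction is correct and that eagerly pushed data (and meta-data) has already arrived when the request is made.

```python
def on_demand_wait(one_way_delay_us: int, transfer_time_us: int) -> int:
    """Request travels to the exporter, the answer travels back, then data is transferred."""
    return 2 * one_way_delay_us + transfer_time_us


def eager_only_wait(one_way_delay_us: int) -> int:
    """Data was pushed in advance, but the match still runs on the exporter."""
    return 2 * one_way_delay_us


def eager_distributed_wait(local_match_time_us: int) -> int:
    """Data and meta-data were pushed in advance, so the match is a local operation."""
    return local_match_time_us


# Hypothetical values in microseconds.
delay, transfer, local = 5_000, 20_000, 100

print(on_demand_wait(delay, transfer))   # 30000
print(eager_only_wait(delay))            # 10000
print(eager_distributed_wait(local))     # 100
```

In this model eager transfer removes the bandwidth-dependent term, and distributed match removes the remaining dependence on network latency.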
Supported matching policies
<importer request, exporter matched, desired precision> = <x, f(x), p>
LUB    minimum f(x) with f(x) ≥ x
GLB    maximum f(x) with f(x) ≤ x
REG    f(x) minimizes |f(x)-x| with |f(x)-x| ≤ p
REGU   f(x) minimizes f(x)-x with 0 ≤ f(x)-x ≤ p
REGL   f(x) minimizes x-f(x) with 0 ≤ x-f(x) ≤ p
FASTR  any f(x) with |f(x)-x| ≤ p
FASTU  any f(x) with 0 ≤ f(x)-x ≤ p
FASTL  any f(x) with 0 ≤ x-f(x) ≤ p
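The eight policies can be written as one small function over a list of exported timestamps. This is a sketch under assumptions: `match_policy` is a hypothetical name, all exported timestamps are assumed to be available up front, and the FAST* policies are modeled as "return the first acceptable timestamp", which is just one way to satisfy "any f(x)".

```python
from typing import Optional, Sequence


def match_policy(policy: str, x: float, exported: Sequence[float],
                 p: float = 0.0) -> Optional[float]:
    """Return f(x) for the given policy, or None if no exported timestamp qualifies."""
    upper = [t for t in exported if t >= x]   # candidates with f(x) >= x
    lower = [t for t in exported if t <= x]   # candidates with f(x) <= x

    if policy == "LUB":
        return min(upper, default=None)
    if policy == "GLB":
        return max(lower, default=None)

    if policy in ("REG", "FASTR"):
        pool = [t for t in exported if abs(t - x) <= p]
    elif policy in ("REGU", "FASTU"):
        pool = [t for t in upper if t - x <= p]
    elif policy in ("REGL", "FASTL"):
        pool = [t for t in lower if x - t <= p]
    else:
        raise ValueError(f"unknown policy: {policy}")

    if not pool:
        return None
    if policy.startswith("FAST"):
        return pool[0]                           # any acceptable timestamp will do
    return min(pool, key=lambda t: abs(t - x))   # REG*: closest acceptable timestamp


exported = [1.0, 2.0, 3.0, 4.0]
print(match_policy("LUB", 2.75, exported))          # 3.0
print(match_policy("GLB", 2.75, exported))          # 2.0
print(match_policy("REG", 2.75, exported, p=0.5))   # 3.0
print(match_policy("REGL", 2.75, exported, p=0.5))  # None
```

The REG* policies search for the closest acceptable timestamp, while the FAST* policies can stop at the first one within the precision, which is presumably where the "FAST" prefix comes from.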