Simulation of Streaming Applications on Multicore Systems. Saurabh Gayen, Mark Franklin (PI), Eric J. Tyson, Roger D. Chamberlain Storage-Based Supercomputing Group Dept. of Computer Science and Engineering Washington University in St. Louis
Simulation of Streaming Applications on Multicore Systems
Saurabh Gayen, Mark Franklin (PI), Eric J. Tyson, Roger D. Chamberlain
Storage-Based Supercomputing Group
Dept. of Computer Science and Engineering
Washington University in St. Louis
Supported by Nat’l Science Foundation grant CCF-0427794
Problem domain
• High-performance streaming applications
  • Large streams of high-throughput data
  • Networking and communications
  • Scientific computing (offline AND online)
  • Media creation and playback
  • Data mining (e.g., bioinformatics, security)
• Hard to develop applications on multicore systems
  • Complex programming model (e.g., synchronization)
  • Other platforms can provide speedups (FPGA, DSP, NP)
• Devices are becoming more interconnected
  • Hard to simulate
  • Hard to debug
  • Hard to deploy
[Figure: interconnected FPGAs and network processors]
Overview
• Auto-Pipe and the X Language
• X-Sim: Federated System Simulator
• Example applications
• Status and future work
What is Auto-Pipe?
Auto-Pipe is made for…
• Complex heterogeneous systems
• Time and/or resource-constrained applications
• Partitioned, parallel algorithms
Auto-Pipe is…
• a set of tools used to create, test, build, deploy, and optimize distributed applications
[Figure: heterogeneous system of CPUs, an FPGA, and an NP connected over PCI]
The X Language
X language files are composed of:
• An algorithm description
  • Made of blocks and edges
• A processing architecture
  • Made of computation and interconnect resources
• A mapping of algorithm to architecture
[Figure: blocks A through E mapped onto CPU and FPGA resources]
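The three-part structure described on this slide can be illustrated with a small sketch. This is plain Python, not X language syntax, which the slides do not show. The `Application` class, the `check_mapping` helper, the resource names, and the edges between blocks A through E are all hypothetical, chosen only to show how an algorithm, an architecture, and a mapping relate to one another.

```python
from dataclasses import dataclass


@dataclass
class Application:
    """Hypothetical in-memory stand-in for the three parts of an X file."""
    blocks: set      # algorithm: block names
    edges: set       # algorithm: directed (producer, consumer) pairs
    resources: set   # architecture: computation resources
    links: set       # architecture: interconnects, stored as frozenset pairs
    mapping: dict    # mapping: block name -> resource name


def check_mapping(app):
    """Return a list of problems; an empty list means the mapping is consistent."""
    problems = []
    for block in sorted(app.blocks):
        target = app.mapping.get(block)
        if target is None:
            problems.append(f"block {block} is unmapped")
        elif target not in app.resources:
            problems.append(f"block {block} maps to unknown resource {target}")
    for src, dst in sorted(app.edges):
        a, b = app.mapping.get(src), app.mapping.get(dst)
        if a is None or b is None or a == b:
            continue  # unmapped blocks are reported above; local edges need no link
        if frozenset((a, b)) not in app.links:
            problems.append(f"edge {src}->{dst} needs a link between {a} and {b}")
    return problems


# Assumed example: a five-block chain spread over two CPUs and one FPGA.
app = Application(
    blocks={"A", "B", "C", "D", "E"},
    edges={("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")},
    resources={"cpu1", "cpu2", "fpga"},
    links={frozenset(("cpu1", "fpga")), frozenset(("cpu1", "cpu2"))},
    mapping={"A": "cpu1", "B": "cpu1", "C": "fpga", "D": "cpu2", "E": "cpu2"},
)
print(check_mapping(app))  # ['edge C->D needs a link between fpga and cpu2']
```

Keeping the mapping separate from the algorithm means the same blocks and edges can be re-targeted to a different architecture by changing only the `mapping` dictionary.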
X-Sim: Federated Simulation
• Platform-Specific Simulators
• Communication Link Models
[Figure: blocks gen1, gen2, sum, half, and out placed on proc[1], proc[2], and an FPGA, connected by PCI and shared memory (Sh. Mem.)]
X-Sim Mechanism
[Figure: blocks gen1, gen2, and sum on proc[1], proc[2], and the FPGA, connected by PCI and shared memory (Sh. Mem.). Labels shown: testpoint, in, out, avail, half, store; time axis from 0us to 1us; legend: Data file (D), Timestamp file (T).]
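The slides name data files, timestamp files, platform-specific simulators, and communication link models, but do not spell out how X-Sim combines them. One plausible reading, sketched below under that assumption, is that each block's output is recorded as a pair of traces (values and times), and a link model turns a producer's output times into input-availability times for the consumer. The `Trace` class, both functions, and all timing values are hypothetical; only the block names gen1, gen2, and sum come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    data: list        # stands in for the data file
    timestamps: list  # stands in for the timestamp file (microseconds)


def through_link(trace, latency_us, transfer_us):
    """Hypothetical link model: one item at a time, fixed transfer time and latency."""
    arrivals, link_free = [], 0.0
    for sent in trace.timestamps:
        start = max(sent, link_free)      # wait if the link is still busy
        link_free = start + transfer_us
        arrivals.append(link_free + latency_us)
    return Trace(list(trace.data), arrivals)


def simulate_sum(in_a, in_b, compute_us):
    """Replay two input traces through an adder block and emit its output trace."""
    out, block_free = Trace([], []), 0.0
    for a, ta, b, tb in zip(in_a.data, in_a.timestamps, in_b.data, in_b.timestamps):
        start = max(ta, tb, block_free)   # both inputs available and block idle
        block_free = start + compute_us
        out.data.append(a + b)
        out.timestamps.append(block_free)
    return out


# Assumed traces: gen1 reaches sum over a link, gen2 is local to sum.
gen1 = Trace([1, 2, 3], [0.0, 1.0, 1.5])
gen2 = Trace([10, 20, 30], [0.0, 0.5, 4.0])
gen1_at_sum = through_link(gen1, latency_us=2.0, transfer_us=1.0)
sum_out = simulate_sum(gen1_at_sum, gen2, compute_us=0.5)
print(sum_out.data)        # [11, 22, 33]
print(sum_out.timestamps)  # [3.5, 4.5, 5.5]
```

Because each stage only reads and writes traces, a different simulator could produce the traces for each platform, with only the link model connecting them.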
Example Application: test1
[Figure: results chart; labeled values: 1.93x, 1.87x]
Example Application: VERITAS Astrophysics
• Gamma-ray event parameterization
  • Active sources: galactic nuclei, pulsars
  • Transient sources: hypernovae, ...
• Lots of data: 20 TB/year
  • Want to process as fast as possible
  • Process whole DB for rare events
VERITAS algorithm
[Figure: algorithm block diagram; label shown: Pipe[i]]
2-Processor Mappings
• map2a: Vertical Partition
• map2b: Horizontal Partition
[Figure: both mappings onto proc[1] and proc[2]; labels shown: Front, FFT, LowPass, IFFT, Back]
3-Processor Mappings
• map3a: Vertical Partition
• map3b: Horizontal Partition
[Figure: both mappings onto proc[1], proc[2], and proc[3]; labels shown: Front, FFT, LowPass, IFFT, Back]
2 and 3 Processor Results
• VERITAS Configured with 6 Pipes
[Figure: speedup chart; labeled values: 1x, 1.83x, 1.73x, 2.07x, 2.74x]
SMP Performance Scaling
• VERITAS Configured with 16 Pipes
[Figure: speedup chart; labeled values: 1x, 1.94x, 2.81x, 3.79x, 6.84x, 11.75x, …]
Status and Future Work
• Currently
  • X-Sim is operational
• What’s next
  • Develop library of validated communication models
• Future directions
  • Develop X-Opt, an automated performance optimization tool
Acknowledgements
• Storage-Based Supercomputing Group
  • Michela Becchi, Justin Brown, Jim Buckley
  • Jeremy Buhler, Roger Chamberlain, Patrick Crowley
  • Mark Franklin (PI), Narayan Ganesan, Gregory Galloway
  • Saurabh Gayen, Eric Tyson
• Gamma Ray application: Jim Buckley / VERITAS collab.
• National Science Foundation CCF-0427794