HPC User Forum 2012 Panel on Potential Disruptive Technologies: Emerging Parallel Programming Approaches
Guang R. Gao, Founder, ET International Inc., Newark, Delaware, USA
ggao@etinternational.com
Who is ETI?
From the “Cool Vendors” report by Gartner (April 17, 2012):
[ET International, Newark, Delaware (www.etinternational.com). Analysis by Carl Claunch.
Why Cool: ET International delivers its dataflow-oriented ETI Swarm environment for garnering high efficiency from highly parallel software, based on the alternative ParalleX execution model. As highly parallel execution becomes essential to addressing the more substantial computing tasks that HPC users face today, progress is increasingly being stymied by the application's inability to keep all the parallel strands working productively. …]
Motivation
• Many-core is coming
• Current paradigms don't have the expressive power to harness concurrency
• Hardware is getting more heterogeneous
• Current hybrid programming techniques (OpenMP+MPI+OpenCL) are not maintainable: too complicated
• Caches are disappearing or becoming non-coherent
• Distributed memory is everywhere, and at different levels
• Fine-grained power management
• Use what you need and turn off/down the rest
• Failure is the norm
• Resilience must be baked into the whole stack (application, compiler, runtime, hardware)
• Increasing application computation/data irregularity
• Static scheduling can no longer properly load balance
ETI Vision
• We need new “Execution Models”!
• Leverage ETI’s deep and growing IP position based on 25+ years of applied R&D expertise and $20M+ in R&D software engineering and development
• (e.g. extensive system software base for Cyclops, CELL, SCC, Intel Runnemede, Intel x86-based machines, Adapteva, etc.)
• Provide high-performance SWARM software solutions to our OEMs, partners and direct customers
• Advance SWARM solutions to address optimization opportunities driven by heterogeneous multi-/many-core processing, including:
• Big Compute (Private HPC Cloud) systems
• Big Data HPC systems
• HPC embedded appliances
• etc.
Execution Paradigm Comparisons
MPI, OpenMP, OpenCL:
• Communicating Sequential Processes
• Bulk Synchronous
• Message Passing
SWARM:
• Asynchronous Event-Driven Tasks
• Dependencies
• Resources
• Active Messages
• Control Migration
[Figure: thread timelines for each paradigm, plotted against time, with a legend distinguishing active threads from waiting threads.]
SWARM Execution Overview
[Figure: SWARM task and resource cycle. Tasks with unsatisfied dependencies become enabled tasks once their dependencies are satisfied; enabled tasks are mapped to resources; available resources (CPUs and GPUs) are allocated, become resources in use, and return to the available pool when released.]
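The cycle on the SWARM Execution Overview slide (tasks become enabled once their dependencies are satisfied, enabled tasks are mapped to available resources, and resources are released on completion) can be illustrated with a small dependency-counting scheduler. This is only a sketch under simplifying assumptions: it is not SWARM's API, every task is assumed to take one time step, and the names `run`, `tasks`, and `resources` are hypothetical.

```python
from collections import deque


def run(tasks, resources):
    """Simulate dependency-driven scheduling in discrete rounds.

    tasks:     dict name -> (resource kind, list of dependency names)  [hypothetical format]
    resources: dict resource kind -> number of units available
    Returns a list of rounds; each round lists the tasks that ran together.
    """
    # Number of unsatisfied dependencies per task.
    pending = {name: len(deps) for name, (_, deps) in tasks.items()}
    # Reverse edges: which tasks are waiting on each task.
    dependents = {name: [] for name in tasks}
    for name, (_, deps) in tasks.items():
        for dep in deps:
            dependents[dep].append(name)

    # Tasks with no dependencies are enabled immediately.
    enabled = deque(name for name, count in pending.items() if count == 0)
    schedule = []

    while enabled:
        free = dict(resources)  # resources released in the last round are available again
        running, deferred = [], deque()
        while enabled:
            name = enabled.popleft()
            kind = tasks[name][0]
            if free.get(kind, 0) > 0:
                free[kind] -= 1          # resource allocated
                running.append(name)
            else:
                deferred.append(name)    # stays enabled, waits for a free resource
        if not running:
            raise RuntimeError("enabled tasks exist but no matching resource is available")
        schedule.append(running)
        enabled = deferred
        # Completion: satisfy dependencies and enable tasks whose count reaches zero.
        for name in running:
            for child in dependents[name]:
                pending[child] -= 1
                if pending[child] == 0:
                    enabled.append(child)

    if sum(len(r) for r in schedule) != len(tasks):
        raise ValueError("some tasks never became enabled (cyclic dependencies?)")
    return schedule


if __name__ == "__main__":
    demo = {
        "load": ("cpu", []),
        "fft": ("gpu", ["load"]),
        "stats": ("cpu", ["load"]),
        "report": ("cpu", ["fft", "stats"]),
    }
    for step, names in enumerate(run(demo, {"cpu": 1, "gpu": 1})):
        print(step, names)
```

In the demo, `fft` and `stats` run in the same round because they need different resource kinds, while `report` waits until both have completed. No global barrier is involved: a task starts as soon as its own dependencies and a suitable resource are available.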
Case Studies of Fine-Grain Execution Models
• Static Dataflow Model (1970s - )
• EARTH Model (1988 - )
• TNT Model and Cyclops-64 (2003 - )
• Codelet Model under Intel-led DARPA/UHPC
DARPA/Intel Runnemede Program
[Figure: program overview diagram, courtesy of the Intel DARPA UHPC Team. Headline goal: 1000X energy reduction through HW/SW co-design. Labels recoverable from the diagram: Execution Model; Event-driven codelets; Self-aware introspection; Code and data motion; System Management & Concurrency; Assured Operation; Resiliency; CPU <10% overhead; Checkpoint with Flash/CPM; Security Through Sandboxing; Productivity; Application Efficiency; Model-based; Goal-oriented; Self-morphing; Data Movement; Heterogeneous, Tightly-Coupled Simple Architecture; Interconnect Fabric; Heterogeneous & tapered; Memory; Large local memory; Overhauled DRAM mArch; Resilient memory.]
Our Collaborators: ET International, Inc.; University of Illinois
Barnes-Hut: SWARM vs OpenMP
[Figure: Barnes-Hut performance plot comparing three series: Ideal, SWARM, and OpenMP.]
SWARM/MPI Performance Comparison Consistent Speed-up from 2X to 14.5X
Cholesky Decomposition (SWARM vs MKL/ScaLAPACK)
Summary and Acknowledgements
• Summary (productivity observation)
• N-Body: 1 man-day, 3x
• G-500: 1 man-month, up to 14x
• Cholesky: 2 man-weeks, 1.5x
NOTE: the baseline is the performance of optimized code
• Acknowledgements
• Our Sponsors
• Our Collaborators and Colleagues
• My Host
• Others
Cholesky Profiles
[Figure: Cholesky execution profiles for SWARM and OpenMP.]