670 likes | 680 Views
This book explores the challenges of parallel programming and proposes a pattern language and framework approach to address them. It covers common approaches that do not work and provides high-level examples for composing patterns. The book emphasizes the need for architecture in software development and its impact on programming as a whole.
E N D
Architecting Parallel SoftwarewithPatterns Kurt Keutzer, EECS, Berkeley Tim Mattson, Intel and the PALLAS team: Bryan Catanzaro, Jike Chong, Matt Moskewicz, Michael Murphy, NR Satish, Bor-Yiing Su, Naryanan Sundaram, Youngmin Yi
Executive Summary • Our challenge in parallelizing applications really reflects a deeper more pervasive problem about inability to develop software in general • Corollary: Any highly-impactful solution to parallel programming should have significant impact on programming as a whole • Software must be architected to achieve productivity, efficiency, and correctness • SW architecture >> programming environments • >> programming languages • >> compilers and debuggers • (>>hardware architecture) • Key to architecture (software or otherwise) is design patterns and a pattern language • The desired pattern language should span the full range of design from application conceptualization to detailed software implementation • Resulting software design then uses a hierarchy of software frameworks for implementation • Application frameworks for application (e.g. CAD) developers • Programming frameworks for those who build the application frameworks
Outline • What doesn’t work • Common approaches to approaching parallel programming will not work • The scope and nature of the endeavor • Challenges in parallel programming are symptoms, not root causes • We need a full pattern language • Patterns and frameworks • Frameworks will be a (the?) primary medium of software development for application developers • Detail on Structural Patterns • Detail on Computational Patterns • High-level examples of composing patterns
Assumption #1: How not to develop parallel code Initial Code Re-code with more threads Profiler Not fast enough Performance profile Fast enough Lots of failures N PE’s slower than 1 Ship it 4
Steiner Tree Construction Time By Routing Each Net in Parallel
Assumption #2: This won’t help either Code in new cool language Re-code with cool language Profiler Not fast enough Performance profile Fast enough After 200 parallel languages where’s the light at the end of the tunnel? Ship it 6
ABCPL ACE ACT++ Active messages Adl Adsmith ADDAP AFAPI ALWAN AM AMDC AppLeS Amoeba ARTS Athapascan-0b Aurora Automap bb_threads Blaze BSP BlockComm C*. "C* in C C** CarlOS Cashmere C4 CC++ Chu Charlotte Charm Charm++ Cid Cilk CM-Fortran Converse Code COOL CORRELATE CPS CRL CSP Cthreads CUMULVS DAGGER DAPPLE Data Parallel C DC++ DCE++ DDD DICE. DIPC DOLIB DOME DOSMOS. DRL DSM-Threads Ease . ECO Eiffel Eilean Emerald EPL Excalibur Express Falcon Filaments FM FLASH The FORCE Fork Fortran-M FX GA GAMMA Glenda GLU GUARD HAsL. Haskell HPC++ JAVAR. HORUS HPC IMPACT ISIS. JAVAR JADE Java RMI javaPG JavaSpace JIDL Joyce Khoros Karma KOAN/Fortran-S LAM Lilac Linda JADA WWWinda ISETL-Linda ParLin Eilean P4-Linda Glenda POSYBL Objective-Linda LiPS Locust Lparx Lucid Maisie Manifold Mentat Legion Meta Chaos Midway Millipede CparPar Mirage MpC MOSIX Modula-P Modula-2* Multipol MPI MPC++ Munin Nano-Threads NESL NetClasses++ Nexus Nimrod NOW Objective Linda Occam Omega OpenMP Orca OOF90 P++ P3L p4-Linda Pablo PADE PADRE Panda Papers AFAPI. Para++ Paradigm Parafrase2 Paralation Parallel-C++ Parallaxis ParC ParLib++ ParLin Parmacs Parti pC pC++ PCN PCP: PH PEACE PCU PET PETSc PENNY Phosphorus POET. Polaris POOMA POOL-T PRESTO P-RIO Prospero Proteus QPC++ PVM PSI PSDM Quake Quark Quick Threads Sage++ SCANDAL SAM pC++ SCHEDULE SciTL POET SDDA. SHMEM SIMPLE Sina SISAL. distributed smalltalk SMI. SONiC Split-C. SR Sthreads Strand. SUIF. Synergy Telegrphos SuperPascal TCGMSG. Threads.h++. TreadMarks TRAPPER uC++ UNITY UC V ViC* Visifold V-NUS VPE Win32 threads WinPar WWWinda XENOOPS XPC Zounds ZPL Parallel Programming environments in the 90’s
Assumption #3: Nor this Initial Code Tune compiler Super-compiler Not fast enough Performance profile Fast enough 30 years of HPC research don’t offer much hope Ship it 8
Automatic parallelization? • Aggressive techniques such as speculative multithreading help, but they are not enough. • Ave SPECint speedup of 8% … will climb to ave. of 15% once their system is fully enabled. • There are no indications auto par. will radically improve any time soon. • Hence, I do not believe Auto-par will solve our problems. Results for a simulated dual core platform configured as a main core and a core for speculative execution. A Cost-Driven Compilation Framework for Speculative Parallelization of Sequential Programs, Zhao-Hui Du, Chu-Cheow Lim, Xiao-Feng Li, Chen Yang, Qingyu Zhao, Tin-Fook Ngai (Intel Corporation) in PLDI 2004
Outline • What doesn’t work • Common approaches to parallel programming will not work • The scope and nature of the endeavor • Challenges in parallel programming are symptoms, not root causes • We need a full pattern language • Patterns and frameworks • Frameworks will be a (the?) primary medium of software development for application developers • Detail on Structural Patterns • Detail on Computational Patterns • High-level examples of composing patterns
Reinvention of design? • In 1418 the Santa Maria del Fiore stood without a dome. • Brunelleschi won the competition to finish the dome. • Construction of the dome without the support of flying buttresses seemed unthinkable.
Innovation in architecture • After studying earlier Roman and Greek architecture, Brunelleschi drew on diverse architectural styles to arrive at a dome design that could stand independently http://www.templejc.edu/dept/Art/ASmith/ARTS1304/Joe1/ZoomSlide0010.html
Innovation in tools • Scaffolding for cupola • His construction of the dome design required the development of new tools for construction, as well as an early (the first?) use of architectural drawings (now lost). Mechanism for raising materials http://www.artist-biography.info/gallery/filippo_brunelleschi/67/
Innovation in use of building materials • Herringbone pattern bricks • His construction of the dome design also required innovative use of building materials. http://www.buildingstonemagazine.com/winter-06/art/dome8.jpg
Resulting Dome • Completed dome http://www.duomofirenze.it/storia/cupola_eng.htm
The point? • Challenges to design and build the dome of Santa Maria del Fiore showed underlying weaknesses of architectural understanding, tools, and use of materials • By analogy, parallelizing code should not have thrown us for such a loop. Our difficulties in facing the challenge of developing parallel software are a symptom of underlying weakness is in our abilities to: • Architect software • Develop robust tools and frameworks • Re-use implementation approaches • Time for a serious rethink of all of software design
What I’ve learned (the hard way) Software must be architected to achieve productivity, efficiency, and correctness SW architecture >> programming environments >> programming languages >> compilers and debuggers (>>hardware architecture) Discussions with superprogrammers taught me: Give me the right program structure/architecture I can use any programming language Give me the wrong architecture and I’ll never get there What I’ve learned when I had to teach this stuff at Berkeley: Key to architecture (software or otherwise) is design patterns and a pattern language Resulting software design then uses a hierarchy of software frameworks for implementation Application frameworks for application (e.g. CAD) developers Programming frameworks for those who build the application frameworks
Outline • What doesn’t work • Common approaches to parallel programming will not work • The scope and nature of the endeavor • Challenges in parallel programming are symptoms, not root causes • We need a full pattern language • Patterns and frameworks • Frameworks will be a (the?) primary medium of software development for application developers • Detail on Structural Patterns • Detail on Computational Patterns • High-level examples of composing patterns
Alexander’s Pattern Language Christopher Alexander’s approach to (civil) architecture: "Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice.“ Page x, A Pattern Language, Christopher Alexander Alexander’s 253 (civil) architectural patterns range from the creation of cities (2. distribution of towns) to particular building problems (232. roof cap) A pattern language is an organized way of tackling an architectural problem using patterns Main limitation: It’s about civil not software architecture!!!
Alexander’s Pattern Language (95-103) • Layout the overall arrangement of a group of buildings: the height and number of these buildings, the entrances to the site, main parking areas, and lines of movement through the complex. • 95. Building Complex • 96. Number of Stories • 97. Shielded Parking • 98. Circulation Realms • 99. Main Building • 100. Pedestrian Street • 101. Building Thoroughfare • 102. Family of Entrances • 103. Small Parking Lots
Family of Entrances (102) • May be part of Circulation Realms (98). • Conflict: • When a person arrives in a complex of offices or services or workshops, or in a group of related houses, there is a good chance he will experience confusion unless the whole collection is laid out before him, so that he can see the entrance of the place where he is going. Resolution: Lay out the entrances to form a family. This means: • 1) They form a group, are visible together, and each is visible from all the others. • 2) They are all broadly similar, for instance all porches, or all gates in a wall, or all marked by a similar kind of doorway. • May contain Main Entrance (110), Entrance Transition (112), Entrance Room (130), Reception Welcomes You (149).
Family of Entrances http://www.intbau.org/Images/Steele/Badran5a.jpg
Computational Patterns The Dwarfs from “The Berkeley View” (Asanovic et al.) Dwarfs form our key computational patterns
Patterns for Parallel Programming • PLPP is the first attempt to develop a complete patternlanguage for parallel software development. • PLPP is a great model for a pattern language for parallel software • PLPP mined scientific applications that utilize a monolithic application style • PLPP doesn’t help us much with horizontal composition • Much more useful to us than: Design Patterns: Elements of Reusable Object-Oriented Software, Gamma, Helm, Johnson & Vlissides, Addison-Wesley, 1995.
Structural programming patterns • In order to create more complex software it is necessary to compose programming patterns • For this purpose, it has been useful to induct a set of patterns known as “architectural styles” • Examples: • pipe and filter • event based/event driven • layered • Agent and repository/blackboard • process control • Model-view-controller
Our Pattern Language 2.0 Applications Choose you high level architecture? Guided decomposition Identify the key computational patterns – what are my key computations?Guided instantiation Choose your high level structure – what is the structure of my application? Guided expansion Task Decomposition ↔ Data Decomposition Group Tasks Order groups data sharing data access Graph Algorithms Dynamic Programming Dense Linear Algebra Sparse Linear Algebra Unstructured Grids Structured Grids Graphical models Finite state machines Backtrack Branch and Bound N-Body methods Circuits Spectral Methods Model-view controller Iteration Map reduce Layered systems Arbitrary Static Task Graph Pipe-and-filter Agent and Repository Process Control Event based, implicit invocation Productivity Layer Refine the structure - what concurrent approach do I use? Guided re-organization Digital Circuits Task Parallelism Graph algorithms Event Based Divide and Conquer Data Parallelism Geometric Decomposition Pipeline Discrete Event Utilize Supporting Structures – how do I implement my concurrency? Guided mapping Master/worker Loop Parallelism BSP Shared Queue Shared Hash Table SPMD Fork/Join CSP Distributed Array Shared Data Efficiency Layer Implementation methods – what are the building blocks of parallel programming? Guided implementation Semaphores Thread Creation/destruction Process Creation/destruction Message passing Collective communication Speculation Transactional memory Barriers Mutex
Our Pattern Language 2.0 Applications Choose you high level architecture? Guided decomposition Identify the key computational patterns – what are my key computations?Guided instantiation Choose your high level structure – what is the structure of my application? Guided expansion Task Decomposition ↔ Data Decomposition Group Tasks Order groups data sharing data access Berkeley View 13 dwarfs Garlan and Shaw Architectural Styles Graph Algorithms Dynamic Programming Dense Linear Algebra Sparse Linear Algebra Unstructured Grids Structured Grids Graphical models Finite state machines Backtrack Branch and Bound N-Body methods Circuits Spectral Methods Model-view controller Iteration Map reduce Layered systems Arbitrary Static Task Graph Pipe-and-filter Agent and Repository Process Control Event based, implicit invocation Productivity Layer Refine the structure - what concurrent approach do I use? Guided re-organization Digital Circuits Task Parallelism Graph algorithms Event Based Divide and Conquer Data Parallelism Geometric Decomposition Pipeline Discrete Event Utilize Supporting Structures – how do I implement my concurrency? Guided mapping Master/worker Loop Parallelism BSP Shared Queue Shared Hash Table SPMD Fork/Join CSP Distributed Array Shared Data Efficiency Layer Implementation methods – what are the building blocks of parallel programming? Guided implementation Semaphores Thread Creation/destruction Process Creation/destruction Message passing Collective communication Speculation Transactional memory Barriers Mutex
Outline • What doesn’t work • Common approaches to parallel programming will not work • The scope and nature of the endeavor • Challenges in parallel programming are symptoms, not root causes • We need a full pattern language • Patterns and frameworks • Frameworks will be a (the?) primary medium of software development for application developers • Detail on Structural Patterns • Detail on Computational Patterns • High-level examples of composing patterns
The future of parallel programming is not training Joe Coder to learn to write parallel programs The future of parallel programming is getting domain experts to use parallel application frameworks (without them ever knowing it) Louis Pasteur Inventor of Pasteurization Assumption #4
Patterns and Frameworks Developer Application Framework Developer Programming Framework Developer Framework Developer End User Target application Application Patterns Frameworks Application Patterns Application Framework Pattern Language Programming Framework ProgrammingPatterns SW Infrastructure Computation & Communication Framework Computation & Communication Patterns Platform HW target Today’s talk focuses only on the middle of the pattern language Hardware Architect
Example: Content-Based Image Retrieval Application Large number of untagged photos Want quick retrieval based on image query 34
Example:CBIR Layers Cellphone User Personal Search Application Patterns Frameworks Face Search Developer Feature Extraction& Classifier Application Patterns CBIR Application Framework Application Framework Developer Map Reduce Programming Framework SW Infrastructure Map Reduce ProgrammingPattern Pattern Language Map Reduce Programming Framework Developer CUDA Computation & Communication Framework Barrier/Reduction Computation & Communication Patterns CUDA Framework Developer Platform GeForce 8800 Hardware Architect
Eventually End-user, application programs 1 Application patterns & frameworks + Domain Experts 2 Parallel patterns & programming frameworks Application frameworks + Domain literate programming gurus (1% of the population). Parallel programming frameworks 3 Parallel programming gurus (1-10% of programmers) The hope is for Domain Experts to create parallel code with little or no understanding of parallel programming. Leave hardcore “bare metal” efficiency layer programming to the parallel programming experts
Today End-user, application programs 1 Application patterns & frameworks + Domain Experts Parallel patterns & programming frameworks 2 Application frameworks + Domain literate programming gurus (1% of the population). Parallel programming frameworks 3 Parallel programming gurus (1-10% of programmers) • For the foreseeable future, domain experts, application framework builders, and parallel programming gurus will all need to learn the entire stack. • That’s why you all need to be here today!
Definitions - 1 • Design Patterns: “Each design pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice.“ Page x, A Pattern Language, Christopher Alexander • Structural patterns: design patterns that provide solutions to problems associated with the development of program structure • Computational patterns: design patterns that provide solutions to recurrent computational problems
Definitions - 2 • Library: The software implementation of a computational pattern (e.g. BLAS) or a particular sub-problem (e.g. matrix multiply) • Software architecture: A composition of structural and compositional patterns • Framework: An extensible software environment (e.g. Ruby on Rails) organized around a software architecture (e.g. model-view-controller) that allows for programmer customization only in harmony with the software architecture • Domain specific language: A programming language (e.g. Matlab) that provides language constructs that particularly support a particular application domain. The language may also supply library support for common computations in that domain (e.g. BLAS). If the language is restricted to maintain fidelity to a one or more structural patterns and provides library support for common computations then it encompasses a framework (e.g. NPClick).
Architecting Parallel Software • Decompose Data • Identify data sharing • Identify data access • Decompose Tasks • Group tasks • Order Tasks Identify the Key Computations Identify the Software Structure
Identify the SW Structure Structural Patterns • Pipe-and-Filter • Agent-and-Repository • Event-based coordination • Iterator • MapReduce • Process Control • Layered Systems These define the structure of our software but they do not describe what is computed
Identify Key Computations Computational Patterns • Computational patterns describe the key computations but not how they are implemented
Architecting Parallel Software Decompose Tasks/Data Order tasks Identify Data Sharing and Access Identify the Key Computations Identify the Software Structure • Graph Algorithms • Dynamic programming • Dense/Spare Linear Algebra • (Un)Structured Grids • Graphical Models • Finite State Machines • Backtrack Branch-and-Bound • N-Body Methods • Circuits • Spectral Methods • Pipe-and-Filter • Agent-and-Repository • Event-based • Bulk Synchronous • MapReduce • Layered Systems • Arbitrary Task Graphs
Analogy: Architected Factory Raises appropriate issues like scheduling, latency, throughput, workflow, resource management, capacity etc.
Outline • What doesn’t work • Common approaches to parallel programming will not work • The scope and nature of the endeavor • Challenges in parallel programming are symptoms, not root causes • We need a full pattern language • Patterns and frameworks • Frameworks will be a (the?) primary medium of software development for application developers • Detail on Structural Patterns • Detail on Computational Patterns • High-level examples of composing patterns
Architecting Parallel Software • Decompose Tasks • Group tasks • Order Tasks • Decompose Data • Data sharing • Data access Identify the Key Computations Identify the Software Structure
Structure of Logic Optimization netlist Scan Netlist • Highest level structure is the • pipe-and-filter pattern • (just because it’s obvious doesn’t mean it’s not worth stating!) Build Data model Optimize circuit netlist