Universal Parallel Computing Research Center: Making parallel programming synonymous with programming • Marc Snir • The Illinois UPCRC Team
Parallel Processing @ Illinois • UPCRC • Illiac • Gigascale System Research Center • Petascale computing
Parallel Software is Expensive • More complex hardware – more dimensions to the problem of mapping applications to hardware • Good opportunity for subtle, hard-to-reproduce bugs • Significantly more complex testing problem • Immature development environments • Lack of trained manpower
Client Parallelism – a New Game • Must enable all programmers to write good parallel code • Can afford sophisticated development environments • Must focus on client application domain • Must focus on new quality metrics
Our Approach • Simple parallelism • Focus on simple forms of parallelism • Trade some generality and performance for productivity • Power tools • Leverage and strengthen app development frameworks • Empower tools with specification, analysis and domain information • Mass market can afford sophisticated tools.
UPCRC Themes • Applications, Patterns: New applications & catalog of parallel patterns • Programming Environment: Multiscale programming • Translation Environment: Multiscale compilation • Execution Environment: Virtual architecture to manage scalability & heterogeneity • Correctness: Did the system do what you wanted?
Applications
The Game • Focus on applications that • Improve the quality of human-human and human-computer interactions • Are likely to run on clients, not servers • Can benefit from additional CPU power
Understanding Information & Humans • Need performance for better low-level communication (speech, vision) [client – bandwidth and latency] • Need performance for higher level understanding of information [server – information is shared] • Need performance for higher level understanding of humans (goals, emotions, cognitive state, situation) [client – information is private and contextual] • Can use client performance to reduce application development time and improve flexibility and user experience • (Diagram labels: human, information; Find, Create, Understand, Communicate, Transform)
Understanding Humans • Q: How do I get a taxi to the airport? • “Intelligent” answer: “There are five taxi companies in town; do you want me to call one of them?” • Speech recognition (client); shallow text analysis (server); search (server) • Truly intelligent answer: “You do not need a taxi; your host plans to take you to the airport.” • Speech recognition (client); deep text analysis and reasoning (client) • Client advantage: connectivity; privacy; resident user context • Server advantage: global information; resource sharing
Dynamic Virtual Environments • Grand Theft Auto IV • “Sandbox” World • Free Interaction (within game-space) • High Quality Graphics • Halo 3 • First-Person Shooter • Constrained Interaction • Photorealistic Graphics (much precomputation) • World of Warcraft • Social Internet World • Completely Unconstrained (can build & share things) • Lower Quality Graphics • Multicore enables both flexibility and photorealism • (Diagram: a spectrum from dynamic, flexible “game” graphics to precomputed, rigid “film” graphics)
Videogame Production • Costly • Expensive: $10M/title • Slow: 3+ years/title • Compromises • Precomputed visibility – restricts viewer mobility and environment complexity • Precomputed lighting – restricts scene dynamics, user alterations • Precomputed motion – restricts movement to mocap data, rigging • Consequences • Significant development effort to achieve real-time rates • Dynamic social game-space quality lags that of solo/team shooter levels • Goal • Real-time ray tracing (seems doable with 32-64 cores)
Programming Environments
Multiscale Programming • Observation 1: Codes are written today at multiple scales: High level, domain specific languages (Mathematica); scripting languages (Python); program generators; frameworks; vanilla C++/C#/Java…; hand-tuned C, Fortran; etc. – different tradeoffs of generality, flexibility, programmer productivity and performance • One scale, one style does not fit all • Observation 2: Changing scale is expensive
Goal • Support multiple styles and multiple “scales” of programming in one integrated environment • Domain specific environments, (annotated) sequential code, explicit parallel code, tunable libraries • Hide parallelism, whenever possible (DSE) • Express parallelism using simple patterns (e.g., pipeline, master-slave), otherwise (DSE) • Enable easy refactoring of selected code components to “finer scale”, otherwise (deterministic parallelism) • Refactoring: Parallelize code without changing its semantics, with user in the loop • Reuse effort spent in parallelizing code (autotuning)
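To make the master-slave pattern concrete, here is a small Java sketch under our own assumptions: the class `MasterWorker` and its `run` method are hypothetical and use only standard `java.util.concurrent`. The master submits independent tasks to a fixed pool of workers and collects the results in input order, so the output does not depend on scheduling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical master-worker helper: farm out independent tasks, gather results in order. */
class MasterWorker {
    static <T, R> List<R> run(List<T> tasks, Function<T, R> work, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Master: hand every task to the worker pool.
            List<CompletableFuture<R>> pending = new ArrayList<>();
            for (T task : tasks) {
                pending.add(CompletableFuture.supplyAsync(() -> work.apply(task), pool));
            }
            // Master: collect results in submission order, not completion order.
            List<R> results = new ArrayList<>();
            for (CompletableFuture<R> f : pending) {
                results.add(f.join());
            }
            return results;
        } finally {
            pool.shutdown(); // lets the worker threads exit once the tasks are done
        }
    }
}
```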
Deterministic Parallel Languages • Sequential semantics model: execution outcome identical to sequential execution • Facilitates debugging & testing; easy to understand • Parallel performance model: user “knows” what executes in parallel and can predict performance • Fits most (but not all) transformational programs • Does not handle reactive (event driven) concurrent programs
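Here is a minimal illustration of sequential semantics, assuming the operation is associative: the standard `Arrays.parallelPrefix` computes the same prefix sums as the obvious sequential loop. The class name `SequentialSemantics` is made up for this sketch. A non-associative operator, such as floating-point addition, may not reproduce the sequential result exactly under a parallel scan. This is one reason such guarantees need language or library support.

```java
import java.util.Arrays;

/** Sketch: a parallel operation whose outcome equals that of the sequential loop. */
class SequentialSemantics {
    /** Reference semantics: the plain left-to-right loop. */
    static long[] prefixSumsSequential(long[] a) {
        long[] out = a.clone();
        for (int i = 1; i < out.length; i++) {
            out[i] += out[i - 1];
        }
        return out;
    }

    /** Parallel scan; because long addition is associative, the result is identical. */
    static long[] prefixSumsParallel(long[] a) {
        long[] out = a.clone();
        Arrays.parallelPrefix(out, Long::sum);
        return out;
    }
}
```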
Disciplined Sharing Philosophy • Introduce the least amount of nondeterminism necessitated by the problem specification • Ensure that nondeterminism is explicit – it always results from the use of nondeterministic synchronization operations • Require programs to be race-free: conflicting accesses to shared variables are synchronized • Detect all races and detect them as early as possible • Compile-time detection • Run-time detection
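The sketch below uses hypothetical names and shows both halves of this discipline in plain Java. In `countHits`, every conflicting access to the shared counter goes through an atomic operation, so the code is race-free and the result is deterministic. In `arrivalOrder`, the only nondeterminism is the order in which tasks finish, and it enters through one explicit synchronization object, a concurrent queue. Java itself does not reject racy code, so this discipline is purely the programmer's responsibility here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

class DisciplinedSharing {
    /** Race-free: all conflicting accesses to the counter are atomic, so the result is fixed. */
    static long countHits(int tasks, int perTask) {
        AtomicLong hits = new AtomicLong(); // a plain long with ++ here would be a data race
        CompletableFuture<?>[] running = new CompletableFuture<?>[tasks];
        for (int t = 0; t < tasks; t++) {
            running[t] = CompletableFuture.runAsync(() -> {
                for (int i = 0; i < perTask; i++) hits.incrementAndGet();
            });
        }
        CompletableFuture.allOf(running).join();
        return hits.get(); // always tasks * perTask
    }

    /**
     * Explicit nondeterminism: the enqueue order depends on scheduling, but it enters
     * the program only through this one concurrent queue; the set of entries is fixed.
     */
    static List<Integer> arrivalOrder(int tasks) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        CompletableFuture<?>[] running = new CompletableFuture<?>[tasks];
        for (int t = 0; t < tasks; t++) {
            final int id = t;
            running[t] = CompletableFuture.runAsync(() -> queue.add(id));
        }
        CompletableFuture.allOf(running).join();
        return new ArrayList<>(queue);
    }
}
```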
Exceptions and Speculation • Speculation can be used to enforce sequential semantics for code that cannot be safely parallelized • Accurate for small contexts, conservative for large contexts • Conservative implementation of speculation may destroy parallel performance model • Our approach: speculation used to handle code tagged by user to be conflict free or have rare conflicts • Warning/exception raised if conflict occurs in code tagged conflict-free or occurs frequently in code tagged “conflict-rare”
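The following hypothetical sketch shows one way speculation can work. The class `SpeculativeLoop`, its tags and the threshold are our assumptions, loosely mirroring the conflict-free and conflict-rare tags above. The sketch runs the loop `a[dst[i]] = a[src[i]] + 1` in parallel against a snapshot and then validates the run. It re-executes the loop sequentially if two iterations wrote the same cell or if an iteration read a cell written by an earlier iteration. A warning is recorded when a conflict hits a loop tagged conflict-free, or when conflicts become frequent in a loop tagged conflict-rare.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;

/** Hypothetical sketch: speculative parallel loop with sequential fallback. */
class SpeculativeLoop {
    enum Tag { CONFLICT_FREE, CONFLICT_RARE }

    private final Tag tag;
    private final double rareThreshold; // tolerated conflict rate for CONFLICT_RARE loops
    private int runs = 0;
    private int conflicts = 0;
    final List<String> warnings = new ArrayList<>();

    SpeculativeLoop(Tag tag, double rareThreshold) {
        this.tag = tag;
        this.rareThreshold = rareThreshold;
    }

    /** Runs "for i in order: a[dst[i]] = a[src[i]] + 1"; the result always matches sequential execution. */
    int[] execute(int[] input, int[] src, int[] dst) {
        int n = src.length;
        int[] snapshot = input.clone();
        int[] values = new int[n];
        // Speculative phase: all iterations in parallel, each reading the pre-loop snapshot.
        IntStream.range(0, n).parallel().forEach(i -> values[i] = snapshot[src[i]] + 1);

        // Validation: duplicate write targets, or a read of a cell an earlier iteration writes.
        Map<Integer, Integer> writer = new HashMap<>();
        boolean conflict = false;
        for (int i = 0; i < n && !conflict; i++) {
            conflict = writer.putIfAbsent(dst[i], i) != null;
        }
        for (int i = 0; i < n && !conflict; i++) {
            Integer w = writer.get(src[i]);
            conflict = w != null && w < i;
        }

        runs++;
        int[] result = input.clone();
        if (!conflict) {
            // Commit: write targets are distinct, so commit order does not matter.
            for (int i = 0; i < n; i++) result[dst[i]] = values[i];
        } else {
            // Discard speculative values and re-execute in sequential order.
            conflicts++;
            for (int i = 0; i < n; i++) result[dst[i]] = result[src[i]] + 1;
            report();
        }
        return result;
    }

    private void report() {
        if (tag == Tag.CONFLICT_FREE) {
            warnings.add("conflict in loop tagged conflict-free");
        } else if ((double) conflicts / runs > rareThreshold) {
            warnings.add("conflicts too frequent for loop tagged conflict-rare");
        }
    }
}
```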
“Annotations”: Parallel Control • Different syntax, same effect: implicit (#pragma parallel on a for-loop) or explicit (forall-loop) • Meaning: “I expect this loop to run in parallel; please let me know if I am wrong” (NOT “execute loop in parallel, no matter what”) • Weaker contracts: run in parallel unless an exception occurs; run in parallel most of the time • Directives preserve semantics and change the performance model • Same outcome as sequential loop execution • Iterations are executed in parallel, if independent • A dependency causes a (recoverable) exception • Need more than parallel loops (e.g., task graphs) • User (and/or run-time) should be able to provide more information for cheap dependence checking and performance enhancement
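The contract “run in parallel, but tell me if I am wrong” can be sketched as follows. The sketch assumes that the user, or a run-time inspector, supplies each iteration's read and write footprint so that dependence checking stays cheap. `ExpectParallel`, `forAll` and `DependenceException` are hypothetical names. The check runs before any iteration executes, and that is what makes the exception recoverable: the caller can simply rerun the loop sequentially and get the sequential outcome.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;
import java.util.function.IntFunction;
import java.util.stream.IntStream;

/** Hypothetical sketch of an "I expect this loop to be parallel" construct. */
class ExpectParallel {
    /** Recoverable: thrown before any iteration runs, so no state has changed. */
    static final class DependenceException extends RuntimeException {
        DependenceException(String message) { super(message); }
    }

    /**
     * Runs body(i) for i in [0, n) in parallel after checking user-declared footprints:
     * reads.apply(i) and writes.apply(i) list the cells iteration i reads and writes.
     */
    static void forAll(int n, IntFunction<int[]> reads, IntFunction<int[]> writes, IntConsumer body) {
        Map<Integer, Integer> writer = new HashMap<>();
        for (int i = 0; i < n; i++) {
            for (int cell : writes.apply(i)) {
                Integer prev = writer.putIfAbsent(cell, i);
                if (prev != null && prev != i) {
                    throw new DependenceException("iterations " + prev + " and " + i + " both write " + cell);
                }
            }
        }
        for (int i = 0; i < n; i++) {
            for (int cell : reads.apply(i)) {
                Integer w = writer.get(cell);
                if (w != null && w != i) {
                    throw new DependenceException("iteration " + i + " reads " + cell + ", written by " + w);
                }
            }
        }
        IntStream.range(0, n).parallel().forEach(body); // independent: order cannot matter
    }

    /** Usage: a[dst[i]] = 2 * a[src[i]]; returns true if it ran in parallel. */
    static boolean doubleInto(int[] a, int[] src, int[] dst) {
        IntConsumer body = i -> a[dst[i]] = 2 * a[src[i]];
        try {
            forAll(src.length, i -> new int[] {src[i]}, i -> new int[] {dst[i]}, body);
            return true;
        } catch (DependenceException e) {
            // Recover: same outcome as the sequential loop.
            for (int i = 0; i < src.length; i++) body.accept(i);
            return false;
        }
    }
}
```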
“Annotations”: Help with Race Prevention/Detection • Moves checks from run-time to compile-time, whenever possible • Helps users develop race-free code • Deterministic Parallel Java (Bocchino/V. Adve): Type system that enables compile-time verification of determinism. • Variables are partitioned into disjoint regions, according to type information • Read or write operations are constrained to a region • Compiler can prove accesses do not conflict
Example: Updating a Tree • Formally, we have dynamic types (e.g., Left, Right, …) • No run-time type information needed
Example: Updating a Tree

    class Tree{Parent} {
        region Left under Parent;
        region Right under Parent;
        Tree{Left} left{Left};
        Tree{Right} right{Right};
        Data data{Parent};
        void update()
            writes under Parent {
            data.compute();
            // writes under Tree.Left
            spawn left.update();
            // writes under Tree.Right
            spawn right.update();
        }
    }

Region tree: Parent, with children Tree.Left and Tree.Right • Tree.Left and Tree.Right are provably distinct for any Parent
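For comparison, here is the same update in plain Java with the standard fork/join framework. `Tree`, `computeData` and `UpdateTask` are illustrative names, and `computeData` is a hypothetical stand-in for `data.compute()`. The parallel structure matches the slide, but the safety argument does not carry over. Nothing in Java's type system stops `left` and `right` from aliasing the same subtree, and in that case the two updates would race. That is exactly the property the region annotations let the DPJ compiler prove.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class Tree {
    final Tree left, right;
    int data;

    Tree(int data, Tree left, Tree right) {
        this.data = data;
        this.left = left;
        this.right = right;
    }

    /** Hypothetical stand-in for data.compute(): touches only this node's field. */
    void computeData() { data *= 2; }

    /** Updates the whole tree; the two subtrees are processed in parallel. */
    void update() { ForkJoinPool.commonPool().invoke(new UpdateTask(this)); }

    int sum() {
        return data + (left == null ? 0 : left.sum()) + (right == null ? 0 : right.sum());
    }

    private static final class UpdateTask extends RecursiveAction {
        private final Tree node;

        UpdateTask(Tree node) { this.node = node; }

        @Override
        protected void compute() {
            if (node == null) return;
            node.computeData(); // "writes under Parent"
            // "spawn left.update(); spawn right.update()": safe only if the subtrees
            // are disjoint, an invariant that javac does not check.
            invokeAll(new UpdateTask(node.left), new UpdateTask(node.right));
        }
    }
}
```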
“Annotations”: User Management of Locality • High-level programming languages provide limited hooks to handle temporal locality and no hooks to handle spatial locality and prefetch • Annotations for race prevention can help: they restrict which variables may be touched by a thread during an epoch • Finer-grain and more dynamic information may be needed • Spectrum: compiler guesses (static analysis); compiler+run-time finds out (inspector-executor); user hints
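The inspector-executor point on that spectrum can be sketched as follows, assuming a gather loop `y[i] = 2 * x[idx[i]]` whose index array is known only at run time. The inspector (the hypothetical `inspect`) runs once per index pattern and orders iterations by the cell of `x` they touch. The executor then sweeps `x` in increasing index order, and the schedule can be reused across many executions. Because the iterations are independent, any order yields the same `y`. This is only one simple use of the idea; inspectors are also used to find dependences or to build communication schedules.

```java
import java.util.Comparator;
import java.util.stream.IntStream;

/** Hypothetical inspector-executor sketch for the gather loop y[i] = 2 * x[idx[i]]. */
class InspectorExecutor {
    /** Inspector: run once per index pattern; orders iterations by the x-cell they read. */
    static int[] inspect(int[] idx) {
        return IntStream.range(0, idx.length)
                .boxed()
                .sorted(Comparator.comparingInt(i -> idx[i])) // stable: ties keep original order
                .mapToInt(Integer::intValue)
                .toArray();
    }

    /** Executor: reuses the schedule, touching x in increasing index order. */
    static double[] execute(double[] x, int[] idx, int[] schedule) {
        double[] y = new double[idx.length];
        for (int i : schedule) {
            y[i] = 2.0 * x[idx[i]]; // iterations are independent, so any order gives the same y
        }
        return y;
    }
}
```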
Shared Nothing Parallelism (Agha) • Appropriate for “programming in the large”, reactive code, and distributed computing • On-chip parallelism handled no differently than large-scale concurrency • Two-level specification: • Behavior of individual actors • Global constraints (scheduling, QoS, reliability, etc.)
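A minimal actor in plain Java might look like the following. It is a sketch, not the API of any actor library, and `CounterActor`, `send` and `ask` are hypothetical names. The actor's state is confined to a single-threaded executor that serves as its mailbox, so nothing is shared. Other code can only send messages, and replies come back as futures.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Minimal actor sketch: private state, touched only by the actor's own mailbox thread. */
class CounterActor {
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private long count = 0; // confined to the mailbox thread; never shared

    /** Asynchronous, fire-and-forget message; messages are processed in FIFO order. */
    void send(long delta) { mailbox.execute(() -> count += delta); }

    /** Request/reply message: the reply arrives as a future. */
    CompletableFuture<Long> ask() {
        return CompletableFuture.supplyAsync(() -> count, mailbox);
    }

    /** Stops accepting messages; already queued ones still run. */
    void stop() { mailbox.shutdown(); }
}
```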
Translation
Multiscale Compilation Infrastructure • Supports transformations that happen at different times and in different ways • User-driven refactoring • Compile-time parallelization & optimization • Platform-dependent autotuning • Input-dependent autotuning • Issues: • Factoring in information created by external analysis tools (e.g., equivalence of different functions), and provided by run-time (e.g., inspector-executor, performance) • Managing data layout
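Platform-dependent autotuning can be as simple as timing a few candidate parameter values on the target machine and keeping the fastest. The sketch below assumes the tunable parameter is the sequential cutoff of a fork/join sum. `GrainAutotuner`, `Sum` and `tune` are hypothetical names. A production autotuner would also warm up the JIT, repeat measurements, and cache the choice per platform, or per input class for input-dependent tuning.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical autotuning sketch: time each candidate cutoff, keep the fastest. */
class GrainAutotuner {
    /** Parallel divide-and-conquer sum; ranges of at most `cutoff` elements run sequentially. */
    static final class Sum extends RecursiveTask<Long> {
        private final long[] a;
        private final int lo, hi, cutoff;

        Sum(long[] a, int lo, int hi, int cutoff) {
            this.a = a;
            this.lo = lo;
            this.hi = hi;
            this.cutoff = Math.max(1, cutoff); // a cutoff of 0 would recurse forever
        }

        @Override
        protected Long compute() {
            if (hi - lo <= cutoff) {
                long s = 0;
                for (int i = lo; i < hi; i++) s += a[i];
                return s;
            }
            int mid = (lo + hi) >>> 1;
            Sum left = new Sum(a, lo, mid, cutoff);
            left.fork(); // left half runs asynchronously
            long right = new Sum(a, mid, hi, cutoff).compute();
            return right + left.join();
        }
    }

    static long sum(long[] a, int cutoff) {
        return ForkJoinPool.commonPool().invoke(new Sum(a, 0, a.length, cutoff));
    }

    /** Times every candidate on this machine and returns the fastest one. */
    static int tune(long[] sample, int[] candidates, int repetitions) {
        int best = candidates[0];
        long bestNanos = Long.MAX_VALUE;
        for (int c : candidates) {
            long start = System.nanoTime();
            for (int r = 0; r < repetitions; r++) sum(sample, c);
            long elapsed = System.nanoTime() - start;
            if (elapsed < bestNanos) {
                bestNanos = elapsed;
                best = c;
            }
        }
        return best;
    }
}
```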
A New Compiler Infrastructure • Front-end: type systems and whole-program analysis • Annotated dependencies, drawing on additional user input (refactoring, annotations) and run-time feedback • Transformation plug-ins: optimistic transformations; domain-specific transformations; Program Dependence Graph (PDG) based parallelism transformations • Execution platform, which returns feedback
Execution
UPCRC View: Multiple Levels of Heterogeneity • Static heterogeneity in multi-core • Within platform • Small cores for parallel, big cores for sequential • Process variation • Across platforms • Multiple price points within generation, different tech generations • Dynamic heterogeneity in multi-core • Changing environment at application entry/exit • Changing available parallelism within application • Physical constraints (energy, temp) force hardware to adapt • May switch off core, operate at lower voltage/frequency • Need portability & scalability across static & dynamic heterogeneity
UPCRC View: QoS and Phys./App. Constraints • Quality of service increasingly important • Can often trade output quality for resource usage • Traditional soft real-time deadlines => need predictability • Multi-core resource sharing reduces predictability • Physical and Application constraints • Energy, temperature, aging… • Not all threads are created equal in parallel applications • Must express QoS requirements, tradeoffs in program • Must deliver QoS within constraints of dynamic multi-core system • Need HW supported Adaptive Run-Time
Defining the Virtual Machine (layered diagram) • Codlets, with QoS requirements: per app, per team, global • OS: protection, competitive resource management • Virtual machine: virtual cores, virtual memory • Run-time manager: collaborative resource management • Resource management; performance feedback • Physical cores: core attributes • Hardware: context management, data movement, concurrency model
UPCRC Themes: Correctness
Problems • “Dusty-deck” parallel code: how does it fit in the new environment? • Deducing the intended synchronization; transforming code into well-synchronized code (sw+hw) • Alternative: “sandboxing” • Feedback for correctness • Efficient enforcement of language constraints on interleavings • Good reporting of “anomalous interleavings” • Feedback for performance • Efficient translation from local HW view to global SW view
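As an illustration of run-time race reporting, here is a simplified sketch of the classic Eraser lockset algorithm, applied to a recorded trace. The algorithm comes from the literature and is not one named in this talk. `LocksetChecker` and its event format are hypothetical. Each shared variable keeps the set of locks held at every access since it became shared, and the checker reports the variable once that set becomes empty. The simplification does not distinguish reads from writes, so unsynchronized read-only sharing would be reported too.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified Eraser-style lockset checker over a trace of accesses. */
class LocksetChecker {
    private static final class VarState {
        final String owner;          // first thread to touch the variable
        Set<String> lockset = null;  // null while only the owner has accessed it
        boolean reported = false;

        VarState(String owner) { this.owner = owner; }
    }

    private final Map<String, VarState> vars = new HashMap<>();
    final List<String> reports = new ArrayList<>();

    /** Records that `thread` accessed `variable` while holding `locksHeld`. */
    void access(String thread, String variable, Set<String> locksHeld) {
        VarState s = vars.get(variable);
        if (s == null) {                          // first access: exclusive to this thread
            vars.put(variable, new VarState(thread));
            return;
        }
        if (s.lockset == null) {
            if (s.owner.equals(thread)) return;   // still thread-local, nothing to check
            s.lockset = new HashSet<>(locksHeld); // now shared: start refining the lockset
        } else {
            s.lockset.retainAll(locksHeld);       // keep only locks held at every shared access
        }
        if (s.lockset.isEmpty() && !s.reported) {
            s.reported = true;
            reports.add("possible race on " + variable);
        }
    }
}
```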
Summary • The world of computing is undergoing a paradigm shift • We (academia) have a unique opportunity to impact it • If we (industry+academia) fail then industry is in deep trouble • If we succeed then immersive intelligence will have become (the new) reality • Decades of research in parallel computing provide sound starting point • Provided that we remember this is a new game • Let the fun start
© 2008 Microsoft Corporation. All rights reserved.