PPoPP’06: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
New York, March 29-31, 2006
Conference Review
Presented by: Utku Aydonat
Outline • Conference overview • Brief summaries of sessions • Keynote speeches & Panel • Best paper
Conference Overview • History: 90, 91, 93, 95, 97, 99, 01, 03, 05, 06 • Primary focus: anything related to parallel programming • Algorithms • Communication • Languages • 8 sessions, 25 papers • Dominant topics: multicores, parallelization techniques
Conference Overview
PPoPP: Paper Acceptance Statistics

Year   Submitted   Accepted   Rate
2006       91          25      27%
2005       87          27      31%
2003       45          20      44%
1999       79          17      22%
1997       86          26      30%
Overview of Sessions • Communication • Languages • Performance Characterization • Shared Memory Parallelism • Atomicity Issues • Multicore Software • Transactional Memory • Potpourri
Session 1: Communication • “Collective Communication on Architectures that Support Simultaneous Communication over Multiple Links”, E. Chan, R. van de Geijn (UTexas), W. Gropp, R. Thakur (Argonne National Lab.) • Adapts MPI collective communication algorithms to supercomputer architectures that support simultaneous communication with multiple nodes • Theoretically, latency can be reduced; in practice the reduction is not achievable because of algorithmic constraints and overheads • “Performance Evaluation of Adaptive MPI”, Chao Huang, Gengbin Zheng (UIUC), Sameer Kumar (IBM T. J. Watson), Laxmikant Kale (UIUC) • Designs and evaluates AMPI, an MPI implementation that supports processor virtualization • Benefits: load balancing, adaptive overlapping, independence from the available number of processors, etc. (a minimal collective-communication sketch follows)
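For context, this is a minimal C sketch of the kind of MPI collective call these algorithms implement; it is illustrative only and is not taken from the paper. A single MPI_Reduce hides whatever topology the library chooses, including, on suitable hardware, several links used simultaneously.

/* Minimal sketch of a collective operation; not the paper's code. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, local, global;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    local = rank + 1;                    /* each node contributes a value  */
    /* One collective call; internally it may use a tree, a ring, or,
     * on multi-link hardware, several links at once. */
    MPI_Reduce(&local, &global, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %d\n", global);
    MPI_Finalize();
    return 0;
}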
Session 1: Communication • “Mobile MPI Programs on Computational Grids”,Rohit Fernandes, Keshav Pingali, Paul Stodghill (Cornell) • Checkpointing system for C programs using MPI • Able to take checkpoint on Alpha cluster and restart them on Windows • “RDMA Read Based Rendezvous Protocol for MPI over InfiniBand: Design Alternatives and Benefits”,Sayantan Sur, Hyun-Wook Jin, Lei Chai, Dhabaleswar K Panda (Ohio State) • A rendezvous protocol in MPI using RDMA read. • Increases communication / computation overlap. CARG
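The paper obtains overlap inside the MPI library via RDMA read; the sketch below only illustrates the user-visible idea of overlapping communication with computation, using standard nonblocking MPI calls rather than the paper's protocol.

/* Communication/computation overlap with nonblocking MPI; a generic
 * illustration, not the paper's RDMA-based implementation. */
#include <mpi.h>

void exchange_and_compute(double *sendbuf, double *recvbuf, int n,
                          int peer, double *work, int m) {
    MPI_Request reqs[2];
    MPI_Isend(sendbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(recvbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);
    for (int i = 0; i < m; i++)      /* computation proceeds while the */
        work[i] = work[i] * 2.0;     /* transfer is in flight          */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}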
Session 2: Languages • “Global-View Abstractions for User-Defined Reductions and Scans”, Steven J. Deitz, David Callahan, Bradford L. Chamberlain (Cray), Lawrence Snyder (U. of Washington) • Chapel, a programming language developed by Cray Inc. as part of the DARPA High-Productivity Computing Systems program • Global-view abstractions for user-defined reductions and scans • “Programming for Parallelism and Locality with Hierarchically Tiled Arrays”, Ganesh Bikshandi, Jia Guo, Daniel Hoeflinger (UIUC), Gheorghe Almasi (IBM T. J. Watson), Basilio B. Fraguela (Universidade da Coruña), Maria Jesus Garzaran, David Padua (UIUC), Christoph von Praun (IBM T. J. Watson) • Hierarchically Tiled Arrays (HTAs) make the tiling structure of arrays explicit • Reduction, mapping, scan, transpose, and shift operations are defined over tiles
MinK in Chapel
// mink is a user-defined reduction; it is invoked for each element of A
var minimums: [1..10] integer;
minimums = mink(integer, 10) reduce A;
HTA (figure slide illustrating the hierarchically tiled array structure; a rough C analogy follows)
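As a rough analogy only: an HTA makes tiling a first-class language concept, whereas in plain C the same blocked layout must be encoded by hand. The sketch below shows a 2x2 arrangement of 4x4 tiles; the tile sizes are arbitrary choices for illustration.

/* Hand-coded blocked layout: what HTAs express directly in the type. */
#define T 2   /* tiles per dimension     */
#define B 4   /* elements per tile side  */
double A[T][T][B][B];

/* Access element (i, j) of the flat 8x8 matrix through its tile. */
double get(int i, int j) {
    return A[i / B][j / B][i % B][j % B];
}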
Session 3: Performance Characterization • “Performance Characterization of Bio-molecular Simulations Using Molecular Dynamics”, Sadaf Alam, Pratul Agarwal, Al Geist, Jeffrey Vetter (ORNL) • Investigates performance bottlenecks of MD applications on supercomputers • Finds that the algorithm implementations do not scale • “On-line Automated Performance Diagnosis on Thousands of Processors”, Philip C. Roth (ORNL), Barton P. Miller (U. of Wisconsin, Madison) • A distributed and scalable performance-analysis tool • Can analyze a large application with 1024 processes and present the results as a folded graph
Session 3: Performance Characterization • “A Case Study in Top-Down Performance Estimation for a Large-Scale Parallel Application”, Ilya Sharapov, Robert Kroeger, Guy Delamarter (Sun Microsystems) • Performance estimation of HPC workloads on future architectures • Based on low-level analysis and scalability predictions • Predicts the performance of the Gyrokinetic Toroidal Code on Sun's future architectures
Session 4: Shared Memory Parallelism • “Hardware Profile-guided Automatic Page Placement for ccNUMA Systems”, Jaydeep Marathe, Frank Mueller (North Carolina State U.) • Profiles memory accesses and places pages accordingly • 20% performance improvement at 2.7% overhead • “Adaptive Scheduling with Parallelism Feedback”, Kunal Agrawal, Yuxiong He, Wen Jing Hsu, Charles Leiserson (MIT) • Allocates processors to jobs based on each job's past parallelism • Uses an R-trimmed mean as the feedback signal (sketched below)
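The sketch below shows the textbook trimmed mean (discard the R smallest and R largest samples, average the rest) as a robust feedback signal; the paper's exact "R-trimmed mean" may be defined differently, so treat this as an assumption.

/* Textbook trimmed mean; the paper's estimator may differ. */
#include <stdlib.h>

static int cmp(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

double trimmed_mean(double *samples, int n, int r) {
    double sum = 0.0;
    qsort(samples, n, sizeof(double), cmp);
    for (int i = r; i < n - r; i++)   /* drop the r lowest and r highest */
        sum += samples[i];
    return sum / (n - 2 * r);         /* caller must ensure n > 2*r */
}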
Session 4: Shared Memory Parallelism • “Predicting Bounds on Queuing Delay for Batch-scheduled Parallel Machines”, John Brevik, Daniel Nurmi, Rich Wolski (UCSB) • Binomial Method Batch Predictor (BMBP), which bases its predictions on past wait times • Uses the 95th percentile; its predictions are close to the wait times actually experienced • “Optimizing Irregular Shared-Memory Applications for Distributed-Memory Systems”, Ayon Basumallik, Rudolf Eigenmann (Purdue) • Converts OpenMP applications to MPI-based applications • Uses an inspector loop to find non-local accesses and reorders loops
OpenMP-to-MPI (figure slide illustrating the translation; a sketch of the inspector idea follows)
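The sketch below illustrates the general inspector/executor idea the paper builds on: a cheap pass over the index array records which elements each process will read, so non-local data can be fetched in bulk before the real loop runs. It is a generic illustration, not the paper's implementation; the range [lo, hi) stands for the locally owned section of the shared array.

/* Inspector pass: collect the non-local indices the executor loop
 * will touch, so they can be communicated in one batch. */
void inspect(const int *idx, int n, int lo, int hi,
             int *remote, int *nremote) {
    *nremote = 0;
    for (int i = 0; i < n; i++)            /* same traversal as the    */
        if (idx[i] < lo || idx[i] >= hi)   /* executor loop, but only  */
            remote[(*nremote)++] = idx[i]; /* records non-local reads  */
}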
Session 5: Atomicity Issues • “Proving Correctness of Highly-Concurrent Linearizable Objects”, Viktor Vafeiadis (U. of Cambridge), Maurice Herlihy (Brown U.), Tony Hoare (Microsoft Research Cambridge), Marc Shapiro (INRIA Rocquencourt & LIP6) • Proves the safety of concurrent objects using the rely-guarantee method • For every operation, each thread's rely condition must be satisfied, and each thread's guarantee condition must imply the other threads' rely conditions • “Accurate and Efficient Runtime Detection of Atomicity Errors in Concurrent Programs”, Liqiang Wang, Scott D. Stoller (SUNY at Stony Brook) • Instruments the program to obtain a profile of memory accesses • Builds a tree of the conflicting accesses and applies algorithms to check conflict- and view-equivalence (an example of the target bug class follows)
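For illustration, this is the classic kind of bug a runtime atomicity checker flags: every access is individually locked, so there is no data race, yet the read-check-write sequence as a whole is not atomic. The example is generic, not taken from the paper.

/* Atomicity violation despite correct per-access locking. */
#include <pthread.h>

int balance = 100;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void withdraw(int amount) {
    int b;
    pthread_mutex_lock(&m);
    b = balance;                 /* read  */
    pthread_mutex_unlock(&m);
    /* another thread may run here: the read-check-write is not atomic */
    if (b >= amount) {
        pthread_mutex_lock(&m);
        balance = b - amount;    /* write uses the stale value of b */
        pthread_mutex_unlock(&m);
    }
}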
Session 5: Atomicity Issues • “Scalable Synchronous Queues”, William N. Scherer III (U. of Rochester), Doug Lea (SUNY Oswego), Michael L. Scott (U. of Rochester) • Best Paper • Details are coming up (a naive blocking baseline is sketched below)
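To fix the semantics before the details: in a synchronous queue, put() does not return until a matching take() arrives. The sketch below is a deliberately naive blocking baseline, assuming a single producer and a single consumer; the paper's contribution is scalable, contention-friendly versions of exactly this abstraction.

/* Naive synchronous queue: producer and consumer rendezvous on one slot.
 * Assumes one producer and one consumer; not the paper's algorithm. */
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int slot, full = 0, taken = 0;

void put(int v) {
    pthread_mutex_lock(&m);
    while (full) pthread_cond_wait(&cv, &m);   /* one item at a time   */
    slot = v; full = 1; taken = 0;
    pthread_cond_broadcast(&cv);
    while (!taken) pthread_cond_wait(&cv, &m); /* wait for the consumer */
    pthread_mutex_unlock(&m);
}

int take(void) {
    int v;
    pthread_mutex_lock(&m);
    while (!full) pthread_cond_wait(&cv, &m);
    v = slot; full = 0; taken = 1;
    pthread_cond_broadcast(&cv);               /* release the producer */
    pthread_mutex_unlock(&m);
    return v;
}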
Session 6: Multicore Software • “POSH: A TLS Compiler that Exploits Program Structure”, Wei Liu, James Tuck, Luis Ceze, Wonsun Ahn (UIUC), Karin Strauss, Jose Renau (UCSC), Josep Torrellas (UIUC) • A thread-level speculation (TLS) compiler that divides the program into tasks and prunes the inefficient ones • Uses profiling to detect tasks that are likely to violate frequently • “High-performance IPv6 Forwarding Algorithm for Multi-core and Multithreaded Network Processors”, Xianghui Hu (U. of Sci. and Tech. of China), Xinan Tang (Intel), Bei Hua (U. of Sci. and Tech. of China) • A new IPv6 forwarding algorithm optimized for Intel NPU features • Achieves 10 Gbps forwarding speed for large routing tables with up to 400K entries
Session 6: Multicore Software • “MAMA! A Memory Allocator for Multithreaded Architectures”, Simon Kahan, Petr Konecny (Cray Inc.) • A memory allocator that aggregates requests to reduce fragmentation • Transforms contention into collaboration (a generic batching sketch follows) • Experiments with micro-benchmarks show that it works
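As background only: one generic way to turn per-allocation contention into amortized cooperation is to refill a per-thread cache from the shared pool in batches, paying for one lock acquisition per BATCH allocations. MAMA!'s actual mechanism (aggregating concurrent requests on Cray's multithreaded hardware) is different; this sketch shows only the contention-reduction idea, with fixed-size blocks and a shared_pool assumed to be populated elsewhere.

/* Batching refills of a per-thread cache; illustrative only. */
#include <pthread.h>
#include <stdlib.h>

#define BATCH 32
#define BLOCK_SZ 64

typedef struct node { struct node *next; } node_t;

static node_t *shared_pool;          /* global free list of blocks      */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static __thread node_t *local_cache; /* per-thread stash (GCC __thread) */

void *alloc_block(void) {
    if (!local_cache) {
        pthread_mutex_lock(&pool_lock);   /* one lock acquisition...    */
        for (int i = 0; i < BATCH && shared_pool; i++) {
            node_t *n = shared_pool;      /* ...amortized over up to    */
            shared_pool = n->next;        /* BATCH allocations          */
            n->next = local_cache;
            local_cache = n;
        }
        pthread_mutex_unlock(&pool_lock);
        if (!local_cache)
            return malloc(BLOCK_SZ);      /* pool empty: fall back      */
    }
    node_t *n = local_cache;
    local_cache = n->next;
    return n;
}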
Session 7: Transactional Memory • “A High Performance Software Transactional Memory System for a Multi-Core Runtime”, Bratin Saha, Ali-Reza Adl-Tabatabai, Richard L. Hudson (Intel), Chi Cao Minh, Ben Hertzberg (Stanford) • Maps each memory location to a unique lock and acquires all the relevant locks before committing a transaction (the address-to-lock mapping is sketched below) • Undo-logging, write-locking/read-versioning, cache-line conflict detection • “Exploiting Distributed Version Concurrency in a Transactional Memory Cluster”, Kaloian Manassiev, Madalin Mihailescu, Cristiana Amza (UofT) • A transactional memory system on commodity clusters for generic C++ and SQL applications • Diffs are applied by readers on demand and may conflict with writers
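The sketch below shows only the word-based STM idea of hashing every address into a global metadata table whose entries hold either a version number or a lock; real systems like the paper's layer undo logs, read-set validation, and contention management on top. Table size and hash are arbitrary illustrative choices.

/* Address-to-lock mapping of a word-based STM; a minimal sketch. */
#include <stdint.h>

#define TABLE_SIZE 1024   /* must be a power of two */
typedef struct { volatile uint32_t version_or_lock; } stm_word_t;
static stm_word_t table[TABLE_SIZE];

static stm_word_t *meta_for(void *addr) {
    uintptr_t a = (uintptr_t)addr >> 3;      /* ignore low bits     */
    return &table[a & (TABLE_SIZE - 1)];     /* hash into the table */
}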
Session 7: Transactional Memory • “Hybrid Transactional Memory”, Sanjeev Kumar (Intel), Michael Chu (U. of Mich.), Christopher Hughes, Partha Kundu, Anthony Nguyen (Intel) • Combines hardware and software TM • Extends DSTM • Conflict detection is based on loads and stores of the state field of the object wrapper and of the locator field
Session 8: Potpourri • “Fast and Transparent Recovery for Continuous Availability of Cluster-based Servers”, Rosalia Christodoulopoulou, Kaloian Manassiev (UofT), Angelos Bilas (U. of Crete), Cristiana Amza (UofT) • Recovery from failures in virtual shared-memory systems • Based on page replication on backup nodes • Failure-free overhead of 38%; recovery cost below 600 ms • “Minimizing Execution Time in MPI Programs on an Energy-Constrained, Power-Scalable Cluster”, Rob Springer, David K. Lowenthal, Barry Rountree (U. of Georgia), Vincent W. Freeh (North Carolina State U.) • Finds the combination of processor count and gear (CPU frequency setting) that minimizes execution time within the energy constraint • Found the optimal schedule for 50% of the programs while exploring only 7% of the search space
Session 8: Potpourri • “Teaching Parallel Computing to Science Faculty: Best Practices and Common Pitfalls”, David Joiner (Kean U.), Paul Gray (U. of Northern Iowa), Thomas Murphy (Contra Costa College), Charles Peck (Earlham College) • Experiences teaching parallel computing to science faculty, including at a community college
Keynote Speeches & Panel • “Parallel Programming and Code Selection in Fortress”, Guy L. Steele Jr., Sun Fellow, Sun Microsystems Laboratories • “Parallel Programming in Modern Web Search Engines”, Raymie Stata, Chief Architect for Search & Marketplace, Yahoo!, Inc. • “Software Issues for Multicore Systems”, Moderator: James Larus (Microsoft Research); Panelists: Saman Amarasinghe (MIT), Richard Brunner (AMD), Luddy Harrison (UIUC), David Kuck (Intel), Michael Scott (U. Rochester), Burton Smith (Microsoft), Kevin Stoodley (IBM)
Guy L. Steele: “Parallel Programming and Code Selection in Fortress” • Goal: to do for Fortran what Java did for C • Dynamic compilation • Platform independence • Security model, including type checking • Research funded in part by DARPA through its High Productivity Computing Systems program • Don't build the language, grow it • Make programming notation closer to mathematics • Ease the use of parallelism • Ask: can a feature be provided by a library rather than by the compiler? • Programmers (especially library writers) need not fear subroutines, functions, methods, and interfaces for performance reasons
Guy L. Steele: “Parallel Programming and Code Selection in Fortress” • Type system: objects and traits • Traits: like interfaces, but may contain code • Primitive types are first-class: booleans, integers, floats, and characters are all objects • Transactional access to shared variables • Fortress “loops” are parallel by default • Programming-language notation can become closer to mathematical notation
Panel: Software Issues for Multicore Systems • Performance-conscious languages: languages that increase programmer productivity while making it easier to optimize • New compiler opportunities • New languages that take performance seriously • Possible compiler support for using multicores for purposes other than parallelism • Security enforcement • Program introspection • Meanwhile, the vast majority of application programmers have no idea about parallelism • More dual-core chips in mid-2006, quad-core in 2007 (AMD) • Software architecture challenges (debugging, profiling, making multi-threading easier, etc.)
Panel: Software Issues for Multicore Systems • Some successes in using multicores (OS support, transactional memory, virtualization, efficient JVMs) • Parallel software systems must be architecturally much simpler than sequential ones if they are to have a chance of holding together • We will struggle before finally accepting that the cache abstraction does not scale • Efficient point-to-point communication is required • Most success will be achieved on nonstandard multicore platforms such as graphics processors, network processors, and signal processors, where there is less investment in caches • We need new applications to drive interest toward multicores • Where will the parallelism come from? (dataflow, reduce/map/scan, speculative parallelization, etc.)
Panel: Software Issues for Multicore Systems • The explicit sacrifice of single-thread performance in favor of parallel performance • Most vulnerable communities: • Those who have not previously been exposed to, or had a need for, parallel systems, for example: • Typical client software, mobile devices • Server transactions with significant internal complexity • Those who chronically need to drive maximum performance from their computer systems, for example: • High-performance computing • Gamers • Beyond 8 cores, we do not know whether multicores will be useful
Readings for the Future • “Optimizing Irregular Shared-Memory Applications for Distributed-Memory Systems”, Ayon Basumallik, Rudolf Eigenmann (Purdue) • “POSH: A TLS Compiler that Exploits Program Structure”, Wei Liu, James Tuck, Luis Ceze, Wonsun Ahn (UIUC), Karin Strauss, Jose Renau (UCSC), Josep Torrellas (UIUC) • “MAMA! A Memory Allocator for Multithreaded Architectures”, Simon Kahan, Petr Konecny (Cray Inc.) • “Hybrid Transactional Memory”, Sanjeev Kumar (Intel), Michael Chu (U. of Mich.), Christopher Hughes, Partha Kundu, Anthony Nguyen (Intel)