Understanding Parallel Computing: Hardware, Programming Models, and Performance Analysis

What you should get out of CS240A
In-depth understanding of:
• When is parallel computing useful?
• Parallel computing hardware options
  • Multi-cores, clusters, shared memory, cache/memory, I/O
• Programming models and tools
• Some important parallel applications and their algorithms
  • Where is the parallelism? How to manage it?
  • Tradeoffs with memory latency, communication, I/O
• Performance analysis (how to evaluate) and tuning
• Exposure to various open research questions

Summary: Memory Hierarchy
• Details of the machine are important for performance
  • Processor and memory system (not just parallelism)
  • Before you parallelize, make sure you’re getting good serial performance (Megaflops)
• Locality is at least as important as computation
  • Temporal: re-use of recently used data
  • Spatial: use of data near recently used data
• Machines have memory hierarchies
  • 100s of cycles to read from DRAM (main memory)
  • Caches are fast (small) memories that optimize the average case
• Can rearrange code/data to improve locality
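The effect of spatial locality can be sketched with a toy cache model. The cache below (fully associative, LRU, 4 lines of 8 elements) and the function name `count_misses` are assumptions chosen for illustration, not a model of any real machine. The same 16 x 16 row-major matrix is walked once by rows and once by columns.

```python
def count_misses(addresses, line_size=8, num_lines=4):
    """Count misses in a tiny fully associative LRU cache (hypothetical model)."""
    cache = []  # cache-line tags, least recently used first
    misses = 0
    for addr in addresses:
        tag = addr // line_size          # which cache line holds this address
        if tag in cache:
            cache.remove(tag)            # hit: refresh its LRU position below
        else:
            misses += 1
            if len(cache) == num_lines:
                cache.pop(0)             # evict the least recently used line
        cache.append(tag)
    return misses


N = 16  # N x N matrix stored row-major, one element per address
row_order = [i * N + j for i in range(N) for j in range(N)]  # unit stride
col_order = [i * N + j for j in range(N) for i in range(N)]  # stride N

row_misses = count_misses(row_order)
col_misses = count_misses(col_order)
print(row_misses, col_misses)
```

In this model the row-order walk misses once per cache line, while the column-order walk evicts every line before it is reused, so every access misses. The computation is identical; only the order of memory references changes.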

Questions You Should Be Able to Answer
• What is the key to understanding algorithm efficiency in our simple memory model?
• What is tiling?
• Why does block matrix multiply reduce the number of memory references?
• What are the BLAS?
• Why does loop unrolling improve uniprocessor performance?
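As a starting point for the tiling questions, here is a minimal sketch of blocked (tiled) matrix multiply next to the naive triple loop. The function names and the tile size `b` are illustrative assumptions. The idea is that if three b x b tiles fit in fast memory, each tile element loaded from slow memory is reused about b times, so slow-memory references drop by roughly a factor of b compared with the naive loop.

```python
def matmul_naive(A, B):
    """Reference triple loop for square matrices stored as lists of lists."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_tiled(A, B, b):
    """Same arithmetic, reordered so work proceeds one b x b tile at a time."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, b):              # tile row of C
        for jj in range(0, n, b):          # tile column of C
            for kk in range(0, n, b):      # tile along the inner dimension
                for i in range(ii, min(ii + b, n)):
                    for j in range(jj, min(jj + b, n)):
                        s = C[i][j]
                        for k in range(kk, min(kk + b, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
    return C


n = 6
A = [[float(i + j) for j in range(n)] for i in range(n)]
B = [[float(i * j + 1) for j in range(n)] for i in range(n)]
assert matmul_tiled(A, B, 4) == matmul_naive(A, B)
```

Pure Python will not show the speedup, since interpreter overhead dominates; the sketch only shows the loop structure that a C or Fortran version (or a tuned BLAS) would use.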

CS267 Lecture 3
Hardware and Programming Models
• Three basic conceptual models
  • Shared memory
  • Distributed memory
  • Data parallel, and hybrids of these
• Characteristics
  • Shared memory: impact of cache/consistency
    • Synchronization
  • Distributed memory:
    • Synchronization
    • How to communicate
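The distributed-memory style can be sketched without a cluster by letting workers exchange data only through explicit messages. The example below is an emulation under stated assumptions: Python threads stand in for processes, a `queue.Queue` stands in for the network, and `run_sum` is a hypothetical name. Real distributed-memory code would typically use MPI sends and receives instead.

```python
import queue
import threading


def run_sum(values, nworkers=4):
    """Sum values with workers that communicate only by sending messages."""
    to_root = queue.Queue()                       # channel into the root
    chunk = (len(values) + nworkers - 1) // nworkers

    def worker(rank):
        local = values[rank * chunk:(rank + 1) * chunk]  # this rank's own data
        to_root.put((rank, sum(local)))                  # explicit send

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(nworkers)]
    for t in threads:
        t.start()
    # The root receives exactly one message per worker. get() blocks until a
    # message arrives, so the receive also acts as the synchronization point.
    total = sum(to_root.get()[1] for _ in range(nworkers))
    for t in threads:
        t.join()
    return total


print(run_sum(list(range(100))))
```

Note that no lock appears: nothing is updated in place by more than one worker, and all coordination happens through the message channel.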

Programming and Parallelism Management
• Threads
  • Thread management
  • Synchronization
    • Locks, semaphores, condition variables, barriers
  • Correctness
• MPI
  • Coordination, communication primitives
• OpenMP
  • How to parallelize loops/regions
• MapReduce
  • Map, reduce, combine
  • Basic parameters
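Two of the synchronization primitives listed above, a lock and a barrier, can be sketched with Python's standard `threading` module. The histogram task and the name `parallel_histogram` are assumptions made for illustration. The lock protects the shared update; the barrier separates the "accumulate" phase from the "read the result" phase.

```python
import threading


def parallel_histogram(data, nthreads=4, nbins=4):
    counts = [0] * nbins                 # shared state, updated by all threads
    totals_seen = [None] * nthreads      # what each thread observes in phase 2
    lock = threading.Lock()
    barrier = threading.Barrier(nthreads)

    def work(tid):
        # Phase 1: count privately, then merge into shared state under the lock.
        local = [0] * nbins
        for x in data[tid::nthreads]:    # cyclic distribution of the input
            local[x % nbins] += 1
        with lock:                       # without this, updates could be lost
            for b in range(nbins):
                counts[b] += local[b]
        # Phase 2 must not start until every thread has merged its counts.
        barrier.wait()
        totals_seen[tid] = sum(counts)

    threads = [threading.Thread(target=work, args=(t,)) for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts, totals_seen


counts, totals_seen = parallel_histogram(list(range(20)))
print(counts, totals_seen)
```

Merging private counts once per thread, rather than locking on every element, also keeps contention on the lock low.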

Program Parallelization / Parallel Execution
• Program/data mapping
  • Program partitioning
  • Dependence analysis
  • Code/data distribution
• Scheduling of execution
  • Load balancing
• SPMD code
  • Owner-computes rule
• Loop transformation
  • Blocking, unrolling, skewing
  • Loop interchange
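SPMD code with the owner-computes rule can be sketched as follows: every rank runs the same function, and each rank writes only the elements it owns under a block distribution. The helper names `block_range` and `spmd_scale` are hypothetical, and the ranks are simulated here by a sequential loop rather than by real processes.

```python
def block_range(rank, nprocs, n):
    """Half-open range [lo, hi) owned by rank under a block distribution.

    The first n % nprocs ranks get one extra element, which keeps the
    load balanced to within one element.
    """
    base, extra = divmod(n, nprocs)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def spmd_scale(rank, nprocs, x, y, alpha):
    """Same program on every rank; each rank computes only the y[i] it owns."""
    lo, hi = block_range(rank, nprocs, len(x))
    for i in range(lo, hi):
        y[i] = alpha * x[i]


n, nprocs = 10, 4
x = list(range(n))
y = [None] * n
for rank in range(nprocs):      # stand-in for nprocs processes running at once
    spmd_scale(rank, nprocs, x, y, 2)
print([block_range(r, nprocs, n) for r in range(nprocs)], y)
```

Because the loop iterations have no dependences between them, the ranks could run in any order or concurrently with the same result.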

Parallelism in Scientific Computing
• Matrix multiplication
  • HW1. Partitioning vs. parallelism.
• Numerical methods for ODE/PDE
  • High-level view
  • Approximation with linear equations
  • Iterative methods
• Particle methods
• Where is the parallelism?
• How to manage parallelism? How to partition?
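To make "where is the parallelism?" concrete for iterative methods, here is a minimal sketch of Jacobi iteration for a linear system Ax = b. Jacobi is chosen here as one common example; the slides do not say which iterative method the course covers. The function name and the small test system are assumptions.

```python
def jacobi(A, b, iters=50):
    """Jacobi iteration; assumes A is strictly diagonally dominant."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        # Each new x[i] reads only the previous iterate, so the n updates
        # within one sweep are independent and could run in parallel.
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


A = [[4.0, 1.0], [2.0, 5.0]]
b = [6.0, 12.0]          # exact solution is x = [1, 2]
x = jacobi(A, b)
print(x)
```

The sweeps themselves are sequential: sweep k+1 needs the full result of sweep k, which is where a barrier or a communication step would go in a parallel version.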

Parallelism in Data-Intensive Computing
• Log analysis. HW2
• Parallel Boosted Regression Trees for Web Search Ranking. WWW 2011.
  • Where is the parallelism?
  • What is the scheduling model?
• Optimizing Parallel Algorithms for All Pairs Similarity Search. WSDM 2013.
  • Essentially a matrix multiplication problem
  • Where is the parallelism? How to utilize parallelism with better performance?
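The remark that all-pairs similarity search is essentially matrix multiplication can be sketched with sparse vectors: the pairwise dot products are the entries of the product of the vector matrix with its transpose. The code below is a generic inverted-index formulation written for illustration, not the algorithm from the WSDM paper; the function name, input layout, and threshold are assumptions.

```python
from collections import defaultdict


def all_pairs_similarity(vectors, threshold):
    """vectors: dict id -> dict feature -> weight (assumed unit-normalized).

    Returns {(u, v): dot product} for pairs whose score meets the threshold.
    """
    # Inverted index: feature -> list of (vector id, weight).
    index = defaultdict(list)
    for vid, vec in vectors.items():
        for feature, w in vec.items():
            index[feature].append((vid, w))

    # Each posting list contributes partial dot products independently of
    # the others, so posting lists are a natural unit of parallel work.
    scores = defaultdict(float)
    for postings in index.values():
        for a in range(len(postings)):
            for c in range(a + 1, len(postings)):
                (u, wu), (v, wv) = postings[a], postings[c]
                scores[(u, v)] += wu * wv
    return {pair: s for pair, s in scores.items() if s >= threshold}


docs = {
    "a": {"x": 1.0},
    "b": {"x": 0.6, "y": 0.8},
    "c": {"y": 1.0},
}
similar = all_pairs_similarity(docs, threshold=0.7)
print(similar)
```

Pairs that share no feature, such as "a" and "c" here, are never touched, which is the sparse analogue of skipping zero entries in a matrix product.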

MapReduce Optimization
• Incoop: MapReduce for Incremental Computations, ACM Cloud 2011.
  • Strategies for incremental computing
  • Cache results, build a dependence tree
  • Adaptive data partitioning (splits)
• A Platform for Scalable One-pass Analytics using MapReduce, SIGMOD 2011.
  • What is the cost of map-reduce execution?
  • What parameters are adjusted?
  • How to speed up map-reduce communication?
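One standard way to cut map-reduce communication is a combiner, which pre-aggregates each mapper's output before the shuffle. The sketch below is an in-memory toy, assuming a word-count job; it is not Hadoop code and not the technique from either paper above. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def map_phase(split):
    """Emit (word, 1) for every word in one input split."""
    return [(word, 1) for line in split for word in line.split()]


def combine(pairs):
    """Pre-aggregate one mapper's output locally, before the shuffle."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def reduce_phase(shuffled):
    """Sum the values for each key across all mappers."""
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals)


def word_count(splits, use_combiner):
    shuffled = []                        # stands in for data sent over the network
    for split in splits:
        pairs = map_phase(split)
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)
    return reduce_phase(shuffled), len(shuffled)


splits = [["a b a", "a"], ["b b"]]
plain_counts, plain_pairs = word_count(splits, use_combiner=False)
comb_counts, comb_pairs = word_count(splits, use_combiner=True)
print(plain_counts, plain_pairs, comb_pairs)
```

Both runs produce the same counts, but the combiner run shuffles one pair per distinct word per split instead of one pair per word occurrence. This only works because addition is associative and commutative.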

Graph Computation & Shared Memory Programming
• PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. OSDI 2012.
  • Programming model
  • How to partition graph computation?
• An Analysis of Linux Scalability to Many Cores. OSDI 2010.
  • Understand characteristics of shared memory architecture. True/false sharing.
  • Contention removal in locks and reference counters
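Contention removal on a shared counter can be sketched by splitting one hot counter into per-thread shards. This is a simplified illustration of the general idea, not the kernel mechanism described in the Linux scalability paper; the class name `ShardedCounter` is hypothetical.

```python
import threading


class ShardedCounter:
    """A counter split into shards so writers rarely touch the same lock."""

    def __init__(self, nshards):
        self._counts = [0] * nshards
        self._locks = [threading.Lock() for _ in range(nshards)]

    def add(self, shard, amount=1):
        # Writers contend only with other users of the same shard.
        with self._locks[shard]:
            self._counts[shard] += amount

    def value(self):
        # Reads are the slow path: they must visit every shard.
        total = 0
        for lock, i in zip(self._locks, range(len(self._counts))):
            with lock:
                total += self._counts[i]
        return total


counter = ShardedCounter(nshards=4)


def work(tid):
    for _ in range(1000):
        counter.add(tid)


threads = [threading.Thread(target=work, args=(t,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value())
```

In C, the shards would also be padded so that each sits on its own cache line; otherwise independent shards could still bounce the same line between cores (false sharing). Python gives no control over memory layout, so that part cannot be shown here.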

Parallelism in Internet services: Ask.com search engine example
[Figure: architecture diagram. Labels: client queries, traffic load balancer, frontends, hierarchical cache, clustering middleware, caches, page info, web page index, document abstract, document description, ranking, classification, structured DB, synchronization, fault tolerance.]
