CMPT 886: Special Topics in Operating Systems and Computer Architecture Dr. Alexandra Fedorova School of Computing Science SFU
Meet the Instructor • Ph.D. in Computer Science from Harvard, 2006 • Dissertation on operating system design for multicore processors • Concurrently with Ph.D., an intern at Sun Labs (3 years) • 9 US patent applications • First semester at SFU: Spring 2007 • Industrial partnership with Sun Microsystems
Course Topic • Multicore processors • New type of computer architecture • Dominates new processor market • Desktops, servers, mobile devices, etc. • Almost all chips will be multicore soon • Many research problems to solve • How to design software for these chips? • How to design the chips themselves? • How to structure hardware/software interaction?
Today • Introduction to multicore processors • Examples of research problems • Overview of the course
Conventional vs. Multicore [Diagram: a conventional processor with a single core (Core 0), an L1 cache, and an L2 cache, next to a multicore processor with Core 0 and Core 1, one L1 cache per core, and a shared L2 cache] • Conventional processor • Single core • Dedicated caches • One thread at a time • Multicore processors • At least two cores • Shared caches • Many threads simultaneously
The Multicore Revolution • Most new processors are multicore • intel.com: Most processors shipped are multicore: • 2006: 75% for desktops, 85% for servers • 2007: 90% for desktop and mobile, 100% for servers • Everyone’s doing it • Sun Microsystems Rock, Niagara 1, Niagara 2 • IBM Power4, Power5, Power6, Cell • AMD Quad Core (Barcelona) • Embedded: ARM
Why Multicore? • Power consumption is a huge problem • Multicore chips can potentially deliver much more computation per unit of power • Example: • Reduce CPU clock frequency by 20% • Power consumption drops by about 50% • Put two 0.8x-frequency cores on the same chip • Get 1.6 times the computation at the same power consumption
Superior Performance/Watt • Example: • Reduce CPU clock frequency by 20% • Power consumption drops by about 50% • Put two 0.8x-frequency cores on the same chip • Get 1.6 times the computation at the same power consumption [Diagram: Core 0 and Core 1, each at 0.8x frequency and 0.5x power, each with its own L1 cache, sharing one L2 cache]
Why Multicore? • Increasing processor clock speed (GHz) is inefficient • Increase clock speed by 20% • Power increases by ≈75% • How much does performance increase?
Multicore vs. Unicore • Multicore: • 1.6x throughput increase • No power consumption increase • Single-core: • 1.2x throughput increase • 1.75x power increase
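The numbers on the last few slides are consistent with a common first-order power model. The model is an assumption here, not something the slides state: dynamic power scales with capacitance, voltage squared, and frequency, and supply voltage scales roughly linearly with frequency. The 1.6x throughput figure additionally assumes a perfectly parallel workload whose performance scales linearly with clock frequency.

```latex
\begin{align*}
P &\propto C V^{2} f, \qquad V \propto f \;\Longrightarrow\; P \propto f^{3} \\
\text{one core at } 0.8f &:\quad 0.8^{3} = 0.512 \approx 0.5\times \text{ power} \\
\text{two cores at } 0.8f &:\quad 2 \times 0.8 = 1.6\times \text{ throughput}, \quad 2 \times 0.512 \approx 1.0\times \text{ power} \\
\text{one core at } 1.2f &:\quad 1.2\times \text{ throughput}, \quad 1.2^{3} = 1.728 \approx 1.75\times \text{ power}
\end{align*}
```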
[Chart: processor trends over time] • Transistor density is still rising • Clock speed is not • The extra transistors are used for parallelism: multicore processors • Source: Sutter, "The Free Lunch Is Over"
Multicore Potential • Multicores offer the potential to compute more efficiently • Applications and systems are not ready to realize that potential • What needs to be done? • A fundamental shift to parallel programming • New ways to manage resources in the operating system
What’s Important to Remember? • Massive parallelism • Good or bad? • Good: We can use the processor more efficiently • Bad: We don’t know how to make the most of it • Shared resources • Execution: functional units, queues, register files • Memory: L1 cache, L2 cache, interconnects • Good or bad? • Good: More efficient resource utilization (the reason for multicore) • Bad: Contention for resources [Diagram: Core 0 and Core 1, each with its own L1 cache, sharing one L2 cache]
Problems Addressed in Research • How to manage resource allocation? • Operating system solutions • Architectural (hardware) solutions • How to take advantage of parallelism? • Make concurrent programming easier (languages, performance tools, etc.) • Make concurrent programming automatic (automatic parallelization)
Managing Resource Allocation • New OS structures • Extensions to hardware architecture • Analytical performance modeling • New ways to write applications: can the application tell the OS how it uses resources? • New algorithms (attention, theoreticians and AI researchers!)
Operating Systems for Multicore Processors • Threads running concurrently compete for resources • Degree of contention depends on what the threads are doing • A is a database application (needs lots of L1 cache) • B is a web server (needs lots of L1 cache) • C is a cryptographic thread (needs little L1 cache) [Diagram: threads A, B, and C to be placed on Core 0 and Core 1, each core with its own L1 cache, sharing one L2 cache]
Challenges • How to find out threads’ resource requirements? • How to find out if threads will compete? • How to find out the effect of contention on performance? • What is the best way to schedule threads? [Diagram: threads A, B, and C to be placed on Core 0 and Core 1, each core with its own L1 cache, sharing one L2 cache]
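One possible answer to the scheduling question can be sketched as a greedy placement. This is only an illustration under strong assumptions: it presumes each thread's cache demand is already known as a single number (which is exactly the open problem the slide raises), and the function name `assign_threads`, the demand values, and the size limits are all hypothetical. Threads are taken in order of decreasing demand and each is placed on the core with the smallest demand so far, so that cache-hungry threads such as A and B tend to land on different cores.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 16 /* hypothetical limits, chosen for illustration */
#define MAX_CORES 8

/*
 * Greedy, contention-aware placement (illustrative only).
 * demand[i]  : assumed cache demand of thread i (hypothetical units)
 * core_of[i] : output, the core chosen for thread i
 */
void assign_threads(const int *demand, size_t n_threads,
                    size_t n_cores, size_t *core_of) {
    assert(n_threads <= MAX_THREADS);
    assert(n_cores >= 1 && n_cores <= MAX_CORES);

    size_t order[MAX_THREADS];
    int load[MAX_CORES] = {0};

    for (size_t i = 0; i < n_threads; i++) {
        order[i] = i;
    }

    /* Insertion sort of thread indices by descending demand. */
    for (size_t i = 1; i < n_threads; i++) {
        size_t t = order[i];
        size_t j = i;
        while (j > 0 && demand[order[j - 1]] < demand[t]) {
            order[j] = order[j - 1];
            j--;
        }
        order[j] = t;
    }

    /* Place each thread on the core with the least demand so far. */
    for (size_t k = 0; k < n_threads; k++) {
        size_t t = order[k];
        size_t best = 0;
        for (size_t c = 1; c < n_cores; c++) {
            if (load[c] < load[best]) {
                best = c;
            }
        }
        core_of[t] = best;
        load[best] += demand[t];
    }
}
```

With made-up demands of 8 for A, 7 for B, and 1 for C on two cores, A and B end up on different cores and C shares a core with B. A real scheduler would also have to measure or predict the demands and account for how contention actually affects performance.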
Problems Addressed in Research • How to manage resource allocation? • Operating system solutions • Architectural (hardware) solutions • How to take advantage of parallelism? • Make concurrent programming easier (languages, performance tools, etc.) • Make concurrent programming automatic (automatic parallelization)
Support for Concurrent Programming • Writing parallel code is difficult • Most people think serially • Deciding how to divide the work between threads is not always trivial • Parallel entities need to synchronize or communicate • A new paradigm for synchronization
Synchronization Hurts Performance • If a lock is not available, threads wait • Execution becomes serialized [Diagram: threads waiting on a lock that protects shared data]
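Coarse vs. Fine Synchronization
    /* The slide overlays two alternatives on one function; real code would use one of them. */
    /* counters_lock and counter_locks[] are assumed to be declared elsewhere. */
    int update_shared_counters(int *counters, int n_counters) {
        int i;
        coarse_lock_acquire(counters_lock);         /* coarse option: one lock for the whole array */
        for (i = 0; i < n_counters; i++) {
            fine_lock_acquire(counter_locks[i]);    /* fine option: one lock per counter */
            counters[i]++;
            fine_lock_release(counter_locks[i]);    /* fine option */
        }
        coarse_lock_release(counters_lock);         /* coarse option */
        return 0;
    }
• Coarse locks are easy to program but perform poorly • Fine locks perform well but are difficult to program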
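The two locking styles can be written out separately using POSIX mutexes. This is a minimal sketch, not the code from the slide: the function names `update_coarse`, `update_fine`, and `init_fine_locks`, and the array size, are assumptions made for illustration. It also assumes every caller uses the same style throughout, because the two lock sets do not protect against each other.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define N_COUNTERS 4 /* hypothetical size, chosen for illustration */

static int counters[N_COUNTERS];

/* Coarse-grained: a single lock protects the whole array. */
static pthread_mutex_t counters_lock = PTHREAD_MUTEX_INITIALIZER;

/* Fine-grained: one lock per counter. */
static pthread_mutex_t counter_locks[N_COUNTERS];

/* Must be called once, before any thread calls update_fine(). */
void init_fine_locks(void) {
    for (int i = 0; i < N_COUNTERS; i++) {
        int rc = pthread_mutex_init(&counter_locks[i], NULL);
        assert(rc == 0);
        (void)rc; /* silences the unused warning when NDEBUG is set */
    }
}

/* Coarse version: callers are serialized for the entire loop. */
void update_coarse(void) {
    pthread_mutex_lock(&counters_lock);
    for (int i = 0; i < N_COUNTERS; i++) {
        counters[i]++;
    }
    pthread_mutex_unlock(&counters_lock);
}

/* Fine version: callers wait only while another thread
 * is updating the very same counter. */
void update_fine(void) {
    for (int i = 0; i < N_COUNTERS; i++) {
        pthread_mutex_lock(&counter_locks[i]);
        counters[i]++;
        pthread_mutex_unlock(&counter_locks[i]);
    }
}
```

The coarse version needs one lock and one obvious critical section; the fine version allows more overlap between threads but multiplies the number of locks that must be created, initialized, and acquired in a consistent order.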
Transactional Memory To the Rescue! • Can we have the best of both worlds? • Good performance • Ease of programming • The answer is: • Transactional Memory (TM)
Transactional Memory (TM) • Programming model: • Extension to the language • Runtime and/or hardware support • Lets you do synchronization without locks • Performance of fine-grained locks • Ease of programming of coarse-grained locks
Transactional Memory vs. Locks
    int update_shared_counters(int *counters, int n_counters) {
        int i;
        ATOMIC_BEGIN();                  /* takes the place of coarse_lock_acquire(counters_lock) */
        for (i = 0; i < n_counters; i++) {
            counters[i]++;               /* no fine_lock_acquire/fine_lock_release(counter_locks[i]) needed */
        }
        ATOMIC_END();                    /* takes the place of coarse_lock_release(counters_lock) */
        return 0;
    }
• Transactional section • Looks like a coarse-grained lock • Acts like a fine-grained lock • Performance degrades only if there is a conflict
The Backend of TM [Diagram: concurrent transactions issue the operations read A, write B, read B, write A, write D, read C, write C, read E, write E, read D; when a conflict is detected, one transaction aborts and restarts]
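The abort-and-restart behavior can be illustrated on a much smaller scale with a compare-and-swap loop. To be clear, this is not transactional memory: it covers a single integer rather than an arbitrary set of reads and writes, and the function name `optimistic_increment` is hypothetical. It only shows the same optimistic pattern, which is to take a snapshot, compute, try to commit, and retry if another thread changed the data in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Optimistically add delta to *target.
 * Returns how many times the attempt had to be restarted.
 */
int optimistic_increment(atomic_int *target, int delta) {
    assert(target != NULL);

    int retries = 0;
    int seen = atomic_load(target); /* "begin": take a snapshot */

    /*
     * "commit": succeeds only if *target still equals the snapshot.
     * On failure, seen is refreshed with the current value and the
     * update is recomputed from it (the "abort and restart" step).
     * The weak form may also fail spuriously, which simply causes
     * one more retry.
     */
    while (!atomic_compare_exchange_weak(target, &seen, seen + delta)) {
        retries++;
    }
    return retries;
}
```

As on the slide, the cost appears only when there is a conflict: with no interference the loop normally commits on the first attempt, and under contention the retry count grows.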
State of TM • Still evolving • More work needed to make it usable and well performing • It is very real • Sun’s new Rock processor has TM support • Intel is very active
Summary • Multicore systems • They are everywhere: servers, desktops, small devices • Must understand them • Plenty of research on multicore systems • System software (OS, compilers, runtimes) • Architecture • Analytical modeling • Applications
Class Structure • Learn about multicore research • Read and critique papers • Paper summaries, presentations • Learn how to do multicore research • Discuss papers, think about new ideas • Analyze papers • Learn how to use research tools (2 homeworks) • Do multicore research • A research project
Research Project • A unique experience: getting a project done from start to end • Goal: generate a publication • Last year: two publications out of four projects • Gives you confidence as a grad student • Improves your resume • Challenging! You will learn a lot!
Your Expectations • Expect to work hard • But you’ll be glad later that you did • Papers will be difficult to read at first (3-5 hours/paper) • They will get easier with practice • Reward: You will be comfortable leading your own research in this area
Final Project • You can create your own topic • Or choose from a list of existing topics • Some projects are very well specified (like an undergraduate course project) • Others are more open-ended (hint: an opportunity to be creative) • We have systems and tools you’ll need for the project
Final Project (cont.) • Submit a project proposal in early February • Complete the project by early April • You have only two months • Have to work hard! • Expect to dedicate ≈15-20 hrs/week
Will I Succeed in this Course? • You have to work independently! • Take full responsibility for your project • I will help, but I cannot do it for you • I do not have all the answers • You will succeed if you are prepared to work hard • What you can or cannot do now does not matter • The course is designed to train you
Course Web Site • Syllabus • Wiki • Multicore portal • Technical documentation