


Presentation Transcript


1. “Amdahl's Law in the Multicore Era”
   Mark Hill and Mike Marty
   University of Wisconsin
   IEEE Computer, July 2008
   Presented by Dan Sorin

2. Introduction
   • Multicore is here → architects need to cope
   • Time to re-visit Amdahl’s Law: Speedup = 1 / [(1 - f) + f/s]
     • f = fraction of computation that’s parallel
     • s = speedup on parallel fraction
     • (a quick numeric check follows below)
   • Goal of paper is to gain insights
   • Not actually a “research paper”, per se
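
A quick numeric check of the formula on this slide (a minimal Python sketch; the function name and the example values of f and s are mine, not from the slides):

```python
def amdahl_speedup(f, s):
    """Amdahl's Law: f = fraction of computation that's parallel,
    s = speedup on the parallel fraction."""
    return 1.0 / ((1.0 - f) + f / s)

# Illustrative values: even with 90% of the work parallel, a 256x
# speedup on the parallel part gives under 10x overall -- the serial
# 10% dominates.
print(round(amdahl_speedup(0.90, 256), 1))  # ~9.7
print(round(amdahl_speedup(0.99, 256), 1))  # ~72.1
```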

3. System Model & Assumptions
   • Chip contains fixed number, say N, of “base core equivalents” (BCEs)
   • Can construct more powerful cores by fusing BCEs
   • Performance of core is function of number of BCEs it uses
     • Perf(1) < Perf(R) < R
     • In paper, assume Perf(R) = sqrt(R) (sketched in code below)
     • Why doesn’t Perf(R) = R?
   • Homogeneous vs. heterogeneous cores
     • Homogeneous: N/R cores per chip
     • Heterogeneous: 1 + (N - R) cores per chip
   • Rest of paper ignores/abstracts many issues
     • Shared caches (L2 and beyond), interconnection network
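
A minimal sketch of this slide's resource model, under the paper's Perf(R) = sqrt(R) assumption (the helper names are mine):

```python
import math

def perf(r):
    """Performance of a core built from r BCEs; the paper assumes sqrt(r)."""
    return math.sqrt(r)

def cores_homogeneous(n, r):
    """Homogeneous chip: all n BCEs spent on identical r-BCE cores."""
    return n // r

def cores_heterogeneous(n, r):
    """Heterogeneous chip: one r-BCE core plus (n - r) single-BCE cores."""
    return 1 + (n - r)

# Example with a 16-BCE budget and 4-BCE cores:
print(perf(4))                     # 2.0 -- a 4-BCE core is only 2x a base core
print(cores_homogeneous(16, 4))    # 4 cores
print(cores_heterogeneous(16, 4))  # 13 cores (1 big + 12 small)
```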

4. Homogeneous Cores
   • Reminder: N/R cores per chip
   • Data in Figures 2a & 2b shows (speedup model sketched in code below):
     • Speedups are often depressingly low, especially for large R
     • Even for large values of f, speedups are low
   • What’s intuition behind results?
     • For small R, chip performs poorly on sequential code
     • For large R, chip performs poorly on parallel code
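
For reference, the symmetric-chip speedup model behind Figures 2a and 2b, as I read it from the paper (sequential phase on one R-BCE core, parallel phase on all N/R cores), with the Perf(R) = sqrt(R) assumption baked in:

```python
import math

def speedup_symmetric(f, n, r):
    """Hill/Marty symmetric model: n/r identical cores of r BCEs each.
    Sequential phase runs on one core at perf(r) = sqrt(r);
    parallel phase runs on all n/r cores."""
    perf_r = math.sqrt(r)
    return 1.0 / ((1.0 - f) / perf_r + (f * r) / (perf_r * n))

# f = 0.975, n = 256 BCEs: an intermediate core size beats both extremes.
for r in (1, 16, 256):
    print(r, round(speedup_symmetric(0.975, 256, r), 1))  # ~34.7, ~46.5, 16.0
```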

5. Heterogeneous Cores
   • Reminder: 1 big core + (N - R) minimal cores per chip
   • Data in Figures 2c & 2d shows (speedup model sketched in code below):
     • Speedups are much better than for homogeneous cores
     • But still not doing great on parallel code
   • What’s intuition behind results?
     • For large f, can’t make good use of big core
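
Likewise, the asymmetric-chip model behind Figures 2c and 2d, as I read it from the paper: the sequential phase runs on the big core alone, and the parallel phase uses the big core plus the (N - R) base cores together:

```python
import math

def speedup_asymmetric(f, n, r):
    """Hill/Marty asymmetric model: one r-BCE core plus (n - r) base cores.
    Sequential phase: big core only, at perf(r) = sqrt(r).
    Parallel phase: big core plus all base cores together."""
    perf_r = math.sqrt(r)
    return 1.0 / ((1.0 - f) / perf_r + f / (perf_r + n - r))

# Same budget (f = 0.975, n = 256): far better than the homogeneous chip,
# and a fairly big sequential core pays off.
for r in (1, 16, 64, 256):
    print(r, round(speedup_asymmetric(0.975, 256, r), 1))  # ~34.7, ~97.6, 125.0, 16.0
```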

6. Somewhat Obvious Next Step
   • If homogeneous isn’t great and heterogeneous isn’t always great, can we dynamically adjust to the workload? (sketched in code below)
     • Assign more BCEs to big core when sequential
     • When parallel code, no need for big core
   • Data in Figures 2e and 2f show:
     • Yup, this was a good idea (best of both worlds)
   • Is this realistic, though?
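
And the dynamic model behind Figures 2e and 2f, as I read it from the paper: R BCEs fuse into one big core for sequential code, then split back into all N base cores for parallel code:

```python
import math

def speedup_dynamic(f, n, r):
    """Hill/Marty dynamic model: fuse r BCEs into one core (perf sqrt(r))
    for the sequential phase; use all n base cores for the parallel phase."""
    perf_r = math.sqrt(r)
    return 1.0 / ((1.0 - f) / perf_r + f / n)

# With f = 0.975 and n = 256, giving the sequential phase more BCEs only helps:
for r in (1, 16, 64, 256):
    print(r, round(speedup_dynamic(0.975, 256, r), 1))  # ~34.7, ~99.4, ~144.2, ~186.2
```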

7. Conclusions
   • Just because the world is now multicore, we can’t forget about single-core performance
     • Aside: interesting observation from a traditionally MP group
   • Cost-effectiveness matters
     • Sqrt(R) may seem bad, but may actually be fine
   • Amdahl is still correct – we’re limited by f
   • Dynamic provisioning of resources, if possible, is important

8. Questions/Concerns
   • Is this model too simplistic to be insightful?
     • Abstractions can be good, but can also be misleading
     • For example, this paper focuses on cores, when the real action is in the memory system and interconnection network
     • Concrete example: more cores require more off-chip memory bandwidth → having more cores than you can feed isn’t going to help you
   • Are the overheads for dynamic reconfiguration going to outweigh its benefits?
     • The CoreFusion paper does this, but it ain’t cheap or easy
   • What if a breakthrough in technology (e.g., from Prof. Dwyer’s research) removes the power wall?
     • Do we go back to big uniprocessors?
