The 50B Transistor Challenge
Mikko Lipasti
Department of Electrical and Computer Engineering
University of Wisconsin - Madison
IBM T.J. Watson Research Center, July 22 and 23, 2008
50B Transistors on a Chip?
• History
  • 1997 IEEE Computer Special Issue: 1B T/chip by 2007
    • 3 papers advocate a single fast core – CMU, Michigan, Wisconsin
    • IRAM – Berkeley
    • RAW – MIT
    • SMT – Washington
    • Multicore – Stanford
• 11 years later, 50x more transistors
  • We still need faster cores: computation
    • Fundamentally constrained by power
  • Will get more than one core: communication
    • Need efficient interconnects and coherent caches
  • Will get lots of on-chip memory
    • Need to think about new algorithms and new approaches to use it
(1) What Will We Do With 50B Transistors?
• 50B transistors/chip dramatically alters data centers
  • E.g., Nokia moving aggressively into services
  • Google, Yahoo, MSN each provision ~1M servers
  • Now provision for a 10x installed base (phones vs. PCs)
    • Witness recent problems with iPhone/MobileMe
• Impossible to anticipate applications
  • YouTube/Facebook/Flickr/Twitter
• Unstructured real-world data
  • Organize, search, extract semantic knowledge, mashups, …
• Existing and future server apps all benefit
(2) How Will We Design Chips with 50B Transistors?
• Three things that processors need to be good at:
  • Computation
  • Communication
  • Storage/Memory
• Focus on cost and nature of computation
• Focus on cost of communication
• Shift emphasis to memory
Cost of Computation
• Less than 10% of energy is spent on useful work
  • Energy per instruction (EPI) overhead has gotten out of hand
  • Need to rethink operand delivery [ICCD'07], queues [ISLPED'07], caches, register files, control, …
• Exploit program attributes
  • Solve hard problems via elimination
    • Macro-ops: no single-cycle operations [MICRO'03, HPCA'06]
    • Do the hard parts with narrow values [JILP'07]
  • Eliminate redundancy, excessive pipelines
    • Clever clock gating [ISLPED'06, ICCD'07]
    • Remove renaming, register file, clocked scheduler, pipelines [submitted]
• Goal: reduce EPI by 10x at fixed process technology and MIPS (see the sketch below)
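To see why the 10x EPI goal matters, here is a minimal back-of-the-envelope sketch. The numbers (100 nJ/instruction baseline, 1000 MIPS) are illustrative assumptions, not figures from the talk; the point is simply that at fixed throughput, power scales directly with EPI.

```python
# Back-of-the-envelope EPI arithmetic (hypothetical numbers, not from the talk).
# Power = EPI (energy per instruction) * instruction throughput, so at fixed
# MIPS a 10x EPI reduction translates directly into a 10x power reduction.

def power_watts(epi_nj: float, mips: float) -> float:
    """Power in watts for a given energy-per-instruction (nJ) and throughput (MIPS)."""
    instructions_per_second = mips * 1e6
    return epi_nj * 1e-9 * instructions_per_second

baseline_epi_nj = 100.0      # assumed baseline: 100 nJ per instruction
throughput_mips = 1000.0     # assumed fixed throughput: 1000 MIPS

baseline_power = power_watts(baseline_epi_nj, throughput_mips)
improved_power = power_watts(baseline_epi_nj / 10, throughput_mips)

print(f"baseline: {baseline_power:.1f} W, after 10x EPI cut: {improved_power:.1f} W")
# baseline: 100.0 W, after 10x EPI cut: 10.0 W
```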
Cost of Communication
• Reduce coherence overhead and speculation
  • Region coherence [ISCA'05, ASPLOS'06, HPCA'08] (sketched below)
• Exploit locality of communication patterns
  • Switched circuits [CALetters'07, NOCS'08]
  • On-chip multicasting [ISCA'08]
  • Multicast coherence [submitted]
• New technologies
  • Nanophotonic rings [HP Labs collaboration]
    • Massive bandwidth, speed-of-light latency
    • Lots of interesting problems to solve
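The sketch below is my own simplification of the region-coherence idea, not the published [ISCA'05] protocol: track sharing at a coarse region granularity (1 KB regions assumed here) so that requests to regions no other core has touched can skip the broadcast snoop entirely.

```python
# Simplified sketch of region-level coherence tracking (illustrative only).
# A request only triggers a broadcast snoop if some other core is known to
# cache blocks in the same coarse-grained region.

REGION_BITS = 10  # assumed 1 KB regions


def region_of(addr: int) -> int:
    return addr >> REGION_BITS


class RegionTracker:
    def __init__(self) -> None:
        # For each region, the set of cores known to cache blocks in it.
        self.sharers: dict[int, set[int]] = {}

    def record_access(self, core: int, addr: int) -> None:
        self.sharers.setdefault(region_of(addr), set()).add(core)

    def needs_snoop(self, core: int, addr: int) -> bool:
        # Broadcast only if some *other* core may hold blocks in this region.
        others = self.sharers.get(region_of(addr), set()) - {core}
        return bool(others)


tracker = RegionTracker()
tracker.record_access(core=0, addr=0x1000)
print(tracker.needs_snoop(core=0, addr=0x1040))  # False: region private to core 0
tracker.record_access(core=1, addr=0x1200)
print(tracker.needs_snoop(core=0, addr=0x1040))  # True: core 1 touched the region
```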
Emphasis on Memory
• In future processes, memory will be easier than logic
  • Reliability, variability: well-known solutions (ECC, sparing)
  • Interesting new technologies (PCRAM, etc.)
• Not caches -- diminishing returns
• Return to more regular, "memory-like" devices and logic?
  • Gate array, LUT, PLA
  • Majority of 50B T must not be switching
• Remembering is cheaper than computing
  • Revisit value locality/reuse/memoization? (sketched below)
  • New search algorithms:
    • TCAM accelerator [ICCD'08]: logic in memory -- but not IRAM!
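A software analogue of "remembering is cheaper than computing" is ordinary memoization. A hardware reuse buffer would key on (PC, input operands) rather than Python arguments, but the trade-off is the same: spend a table lookup to skip a recomputation. The loop body below is just a stand-in for a long-latency operation.

```python
# Value reuse via memoization: trade a bounded "reuse table" lookup for
# recomputation of a long-latency operation.

from functools import lru_cache


@lru_cache(maxsize=4096)          # the reuse table: bounded, evicts old entries
def expensive_op(x: int, y: int) -> int:
    # Stand-in for a long-latency computation (e.g., divide, transcendental).
    result = x
    for _ in range(10_000):
        result = (result * 31 + y) % 1_000_003
    return result


expensive_op(12, 7)               # computed and remembered
expensive_op(12, 7)               # served from the table: no recomputation
print(expensive_op.cache_info())  # hits=1, misses=1
```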
Unstructured Real-World Data
• Internet is exploding with data
  • Text
  • Semantic knowledge
  • Photo, video, audio
• It is all in digital form, but all we can do is view and copy it
  • Algorithms for analysis range from poor to nonexistent
  • Machine learning?
• Why not learn from nature?
Brains
• Human brain vs. Von Neumann machine
  • Face recognition: <500 ms
  • Neurons are slow:
    • Critical path is a handful of "gates"
  • Fundamentally different computational model
  • Made of shoddy, unreliable parts (see the averaging sketch below)
    "…neurons are noisy, unreliable devices, … the nervous system averages over many cells to compensate for these shoddy components." – Christof Koch
• We can build it. We have the technology.
MICRO-40 Panel: Computing Beyond Von Neumann
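The Koch quote in numbers: averaging many noisy, unreliable units yields a far more dependable output than any single "neuron." The noise model and population size below are illustrative assumptions only.

```python
# Redundancy over noisy units: the ensemble average is reliable even when
# each individual unit is not.

import random

TRUE_SIGNAL = 1.0
NOISE = 2.0          # each unit's output is wildly noisy
POPULATION = 1000    # number of redundant units averaged together

random.seed(0)


def noisy_unit() -> float:
    return TRUE_SIGNAL + random.gauss(0.0, NOISE)


single = noisy_unit()
ensemble = sum(noisy_unit() for _ in range(POPULATION)) / POPULATION

print(f"single unit: {single:+.2f}   population average: {ensemble:+.2f}")
# The average lands close to 1.0; error shrinks roughly as 1/sqrt(POPULATION).
```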
Brains (2)
• Human neocortex:
  • ~20B neurons, ~200T synapses
  • Structurally homogeneous
  • Hypothesis: runs a common algorithm
• Apply Architecture 101?
  • Abstraction layers
  • Hierarchy and replication
  • Simulation/analysis/synthesis
  • Massively parallel fault-tolerant hardware
• Best news: no need for parallel programming
  • Train vs. program (toy example below)
• Let's Build Brains!
MICRO-40 Panel: Computing Beyond Von Neumann
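"Train vs. program" in miniature: instead of writing the rule for logical AND, let a single perceptron learn it from examples. This is a toy illustration of the training paradigm, not a model of neocortex.

```python
# Learn logical AND from examples rather than programming the rule.

examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # logical AND

w = [0.0, 0.0]   # weights
bias = 0.0
lr = 0.1         # learning rate


def predict(x) -> int:
    return 1 if w[0] * x[0] + w[1] * x[1] + bias > 0 else 0


for _ in range(20):                       # a few passes over the training set
    for x, target in examples:
        error = target - predict(x)
        w[0] += lr * error * x[0]
        w[1] += lr * error * x[1]
        bias += lr * error

print([predict(x) for x, _ in examples])  # [0, 0, 0, 1] after training converges
```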
Summary
• Computation:
  • Reduce cost (EPI) by 10x
  • New algorithms
• Communication:
  • Streamline coherence protocols, interconnects
  • Exploit new technologies
• Storage/Memory:
  • Reliability/variability
  • Logic in memory/new algorithms
• Brain computing for unstructured real-world data
Questions? http://www.ece.wisc.edu/~pharm