General Purpose Processors as Processor Arrays

Peter Cappello, UC Santa Barbara

Presentation Transcript


  1. General Purpose Processors as Processor Arrays. Peter Cappello, UC Santa Barbara

  2. VLSI Design Forces in 1986 “Nature, to be commanded, must be obeyed.” (Sir Francis Bacon) • High performance → parallelism

  3. VLSI Design Forces in 1986 • High performance → parallelism

  4. VLSI Design Forces in 1986 • Power is scarce → limit resistive delay

  5. VLSI Design Forces in 1986 • Power is scarce → limit resistive delay → limit long communication

  6. VLSI Design Forces in 1986 • Power is scarce → limit resistive delay → limit long communication • Area is scarce → limit wire crossing

  9. VLSI Design Forces in 1986 • $$ are scarce → design is expensive → reuse components

  14. VLSI Design Forces in 1986 • In 2D systolic arrays, clock skew is an issue → wavefront arrays • Islands of synchrony in an ocean of asynchrony

  15. Processor Array Properties • Have multiple processors

  16. Processor Array Properties • Have multiple processors • Neighbors abut (no long wires)

  17. Processor Array Properties • Have multiple processors • Neighbors abut • Only neighbors communicate directly

  18. Processor Array Properties • Have multiple processors • Neighbors abut • Only neighbors communicate directly • Have a constant # of processor types

  19. Processor Array Properties • Have multiple processors • Neighbors abut • Only neighbors communicate directly • Have a constant # of processor types • Scale: larger problems → larger arrays
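
The first three properties can be illustrated with a small model. The following is a minimal sketch, assuming a square 2D mesh in which each processor talks only to its north, south, west, and east neighbors; the function names `neighbors` and `max_link_length` are hypothetical and do not come from the talk.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (row, column) of a processor in the mesh


def neighbors(cell: Coord, side: int) -> List[Coord]:
    """Return the abutting cells (north, south, west, east) inside a side x side mesh."""
    row, col = cell
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # Keep only the candidates that fall inside the array boundary.
    return [(r, c) for r, c in candidates if 0 <= r < side and 0 <= c < side]


def max_link_length(side: int) -> int:
    """Longest direct link in the mesh, measured in processor pitches."""
    longest = 0
    for row in range(side):
        for col in range(side):
            for r, c in neighbors((row, col), side):
                longest = max(longest, abs(r - row) + abs(c - col))
    return longest
```

Under this assumption the longest wire stays one processor pitch long no matter how large the array grows: `max_link_length(4)` and `max_link_length(64)` both return 1, which is the sense in which larger problems can be met with larger arrays.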

  20. No 3D PA Has Properties 1–5 • Enclose the 3D PA in a minimal sphere of radius r.

  21. No 3D PA Has Properties 1–5 • Scale the PA in all 3 dimensions.

  22. No 3D PA Has Properties 1–5 • Power consumption = Θ(r³).

  23. No 3D PA Has Properties 1–5 • Power consumption = Θ(r³). • Heat dissipation via surface = Θ(r²). • Power therefore outgrows the surface available to dissipate it as the array scales.
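
The scaling argument can be checked numerically. This is a rough sketch, assuming power is generated uniformly throughout the enclosing sphere and removed only through its surface; `power_to_surface_ratio` and its `power_density` parameter are hypothetical names used for illustration.

```python
import math


def power_to_surface_ratio(radius: float, power_density: float = 1.0) -> float:
    """Power generated inside a sphere divided by the surface area that must dissipate it."""
    volume = (4.0 / 3.0) * math.pi * radius ** 3   # grows as r^3
    surface = 4.0 * math.pi * radius ** 2          # grows as r^2
    return power_density * volume / surface        # simplifies to power_density * r / 3


if __name__ == "__main__":
    for r in (1.0, 10.0, 100.0):
        print(r, power_to_surface_ratio(r))
```

The ratio grows linearly in r, so every unit of surface must carry away more heat as the array scales in all three dimensions. No fixed cooling capacity per unit area can keep up.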

  24. VLSI Design Forces in 2006 “Nature, to be commanded, must be obeyed.” (Sir Francis Bacon) • Power is scarce → limit clock frequency → parallelism • Power is scarce → limit resistive delay → limit long communication

  25. Trends in GPP in 2006 • Chip multiprocessors (CMP) • Vector IRAM • Cell • TRIPS • RAW

  26. Trends in GPP in 2006 Chip Multiprocessors (CMP) • Parallel processors • Crossbar

  27. Trends in GPP in 2006 Vector IRAM – Vector Intelligent RAM • For mobile multimedia devices • Stream data processing • Combine GPP and DSP • Parallel – linear array • Crossbar

  28. Trends in GPP in 2006 Cell processor • “The Department of Energy said Wednesday that it had awarded I.B.M. a contract to build a supercomputer capable of 1,000 trillion calculations a second, using an array of 16,000 Cell processor chips that I.B.M. designed for the coming PlayStation 3 video game machine.” (NY Times, Sept. 7, 2006) • Parallel processors • BIU – bus interface unit • RMT – replacement management table • SL1 – 1st-level cache • PPE – PowerPC Element • SPE – Synergistic Processor Element • Element interconnect bus

  29. Trends in GPP in 2006 • TRIPS: Tera-op, Reliable, Intelligently adaptive Processing System • The following slides are taken from the talk “The Design and Implementation of the TRIPS Prototype Chip,” HotChips 17, Palo Alto, CA, August 2005.

  30. E – execution tile • R – register bank • D – 8KB data cache • I – instruction cache • G – global control

  31. Instructions execute as a data flow graph • An instruction’s output is another instruction’s input. • Minimize use of register/cache for intermediate values. • Register reads/writes access the register banks. • Loads/stores access the data cache banks.
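
Dataflow execution can be sketched in a few lines. The following is a toy model, not the TRIPS implementation: it assumes each instruction names its operand sources and fires as soon as all of them have produced values, so intermediate results pass directly from producer to consumer. `run_dataflow` and the instruction encoding are hypothetical.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: instruction name -> (operation, names of its operands).
Instr = Tuple[Callable[..., int], List[str]]


def run_dataflow(program: Dict[str, Instr], inputs: Dict[str, int]) -> Dict[str, int]:
    """Fire every instruction whose operands are available until none remain."""
    values = dict(inputs)      # values produced so far, keyed by producer name
    pending = dict(program)    # instructions that have not fired yet
    while pending:
        ready = [name for name, (_, srcs) in pending.items()
                 if all(src in values for src in srcs)]
        if not ready:
            raise ValueError("deadlock: some operand is never produced")
        for name in ready:     # all ready instructions fire in the same step
            op, srcs = pending.pop(name)
            values[name] = op(*(values[src] for src in srcs))
    return values


if __name__ == "__main__":
    program = {
        "t2": (operator.mul, ["t1", "c"]),   # consumes the output of t1
        "t1": (operator.add, ["a", "b"]),
    }
    print(run_dataflow(program, {"a": 2, "b": 3, "c": 4})["t2"])  # prints 20
```

Program order does not matter here: `t2` is listed first but waits until `t1` has produced its value, which is the property that lets the hardware schedule instructions by operand arrival.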

  32. Trends in GPP in 2006 RAW (MIT) • The following slides are taken from the RAW talk “Evaluating The Raw Microprocessor: Scalability and Versatility,” presented at the International Symposium on Computer Architecture, June 21, 2004.

  33. Replace the crossbar with a point-to-point, pipelined, routed network. [Figure: 16 ALUs and a register file (RF)]
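
A routed point-to-point network moves an operand tile by tile rather than across one long crossbar wire. The sketch below assumes dimension-ordered routing on a 2D mesh (columns first, then rows). The slide does not state which routing policy Raw uses, so treat the policy and the name `xy_route` as illustrative assumptions.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (row, column) of a tile


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Tiles visited from src to dst, moving along the row first, then along the column."""
    row, col = src
    path = [src]
    while col != dst[1]:                 # step east or west until the column matches
        col += 1 if dst[1] > col else -1
        path.append((row, col))
    while row != dst[0]:                 # then step north or south until the row matches
        row += 1 if dst[0] > row else -1
        path.append((row, col))
    return path


if __name__ == "__main__":
    path = xy_route((0, 0), (3, 3))
    print(len(path) - 1, "hops:", path)
```

Each hop spans only one tile, so wire length per pipeline stage stays constant while latency grows with the distance between the communicating tiles.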

  34. Distribute the Register File [Figure: 16 ALUs with distributed register files (RF)]

  35. Distribute the rest. [Figure labels: PC, Wide Fetch (16 inst), RF, I$, ALU, D$, Control, Unified Load/Store Queue] [ISCA99]

  36. Tiles! [Figure labels: PC, RF, I$, ALU, D$]

  37. Conclusions • VLSI-scalable microprocessors are possible. • Constant factors are beginning to give way to asymptotics: 16-ALU Raw (Oct 2002); 64-ALU Raw (now); 1,024-ALU Raw (2010); 32,768-ALU Raw (if Moore’s Law makes it to 2 nm). • There is an opportunity to make processors more “versatile,” i.e., to steal applications from custom chips. • Tiled processor architectures are a promising approach and merit further research.

  38. GPP Predictions: In 10 Years • Encapsulate registers/cache/processor into an array (RAW). • Partition off-chip memory: encapsulate memory & processor; safely increase parallel access (concurrent programming). • For GPPs targeting non-recursive applications (mobile multimedia): no bus; quasi-nearest-neighbor networks. • For GPPs targeting recursive applications (gaming, control): replace the bus with a lean, on-chip, short-diameter communication network. • 1 network-on-chip routes register/cache/instruction/control traffic. • Need >= 1K processors/chip to justify a network-on-chip.

  39. Predictions • Increasing complexity of applications and technology → increasing specialization of labor

  40. Predictions • Increasing complexity of applications and technology → increasing specialization of labor • The rate of increase in complexity is itself increasing over time → increasing adaptability is important!

  41. Yet another taxonomy! [Figure: architectural specificity (general to specific) versus reconfigurability (static to dynamic); entries: CCM, GPP, prototype ASIC, ASIC]

  43. [Figure: the taxonomy extended with communication latency and application specificity; axes: general to specific, reconfigurability (static to dynamic); entries: DP, TP, CCM, GPP, prototype ASIC, ASIC]

  44. [Figure: array of 16 FPGAs; labels: DP, communication topology, EDGE ISA (2D VLIW), with cores (FFT, RISC), high throughput (iterative)]
