
Modeling of Architectures Platform-based Design 5KK70 Henk Corporaal Bart Mesman Hamed Fatemi 2010



  1. Modeling of Architectures • Platform-based Design 5KK70 • Henk Corporaal, Bart Mesman, Hamed Fatemi • 2010

  2. Outline • We will look at models for Area, Delay and Energy • Processor structure • Register files - Register cell • Model (area, power, delay) • details for several register file configurations • Apply this to the Imagine architecture • Stream register file • Network

  3. Processor • Single processor • Instruction Memory (IM) • Controller • Processing Element (PE) • Register File (RF) • ALU • Data Memory (DM) • SIMD • Multiple PEs • VLIW • Multiple ALUs • Multi-Processor • Several processors • Connected by a bus or network [Figure labels: IM, Controller, PE, RF, ALU, DM, Network]

  4. Register File (RF) Area model, 1-bit cell • Assume: • p = number of ports • For a large RF, the row decoder is small compared to the cell area • 1-bit area = w*h (tracks) • If p is large [Figure: schematic of 1 register cell]
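
The cell-area trend on this slide can be sketched numerically. This is a minimal sketch, assuming (following Rixner et al., listed on slide 23) that every port adds one wire track to both the width and the height of the cell, so a cell with base size w by h tracks occupies (w + p)(h + p) tracks. The function name `cell_area` and the default values w = 4 and h = 3 are hypothetical, chosen only for illustration.

```python
def cell_area(p, w=4, h=3):
    """Area, in square tracks, of a 1-bit register cell with p ports.

    Assumption (not stated on the slide): each port adds one track to the
    cell width and one track to the cell height; w and h are made-up values.
    """
    return (w + p) * (h + p)


for p in (1, 3, 12, 48):
    area = cell_area(p)
    # The ratio area / p^2 tends to 1 as p grows: port wiring dominates.
    print(f"p={p:3d}  area={area:5d}  area/p^2={area / p**2:.2f}")
```

Under this assumption the base cell size w*h stops mattering once p is much larger than w and h, and the cell area grows roughly as p^2.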

  5. Register file (RF) Delay model • Delay (d): • Wire propagation delay • Fan-in/fan-out delay • Wire propagation dominates the delay when the number of ports is large • R = number of registers • Register file: R registers of b bits, assuming a square layout • Note: for N FUs (ALUs), p ~ 3N, R ~ N → d ~ N^(3/2)
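
The d ~ N^(3/2) note can be checked with a short sketch. It assumes that delay is proportional to the wire length across a square array of bR cells, and that each cell side grows linearly with the port count (w + p tracks). The names `rf_delay`, `w` and `r` (registers per ALU), the default values, and the omission of all technology constants are assumptions for illustration.

```python
import math


def rf_delay(n_alus, r=16, b=32, w=4):
    """Relative wire-propagation delay of a central RF serving n_alus ALUs.

    Assumptions: p = 3N ports, R = r*N registers of b bits, square layout,
    cell side of (w + p) tracks, delay proportional to the array side length.
    """
    p = 3 * n_alus
    R = r * n_alus
    cells_per_side = math.sqrt(b * R)
    return (w + p) * cells_per_side


# Quadrupling N should scale the delay by about 4^(3/2) = 8 for large N.
ratio = rf_delay(4096) / rf_delay(1024)
print(f"delay(4N) / delay(N) = {ratio:.3f}")
```

The factor p contributes N and the factor (bR)^(1/2) contributes N^(1/2), which together give the N^(3/2) growth.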

  6. Register file (RF) Power model • Power (P): • Proportional to the capacitance that must be switched for each access • In each access, every bit line and one word line switch → bit-line capacitance • Each port drives (bR)^(1/2) bit lines • Each bit line has length (h+p)(bR)^(1/2) • If p is large: power is dominated by wire capacitance • Note: for N FUs (ALUs), p ~ 3N, R ~ N → P ~ N^3
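
The power argument above can be sketched as a product of the quantities named on the slide: p ports, each driving (bR)^(1/2) bit lines of length (h + p)(bR)^(1/2). Treating switched capacitance as proportional to total bit-line length is the slide's own premise; the function name, the values of h, r and b, and the dropped constants are assumptions.

```python
import math


def rf_power(n_alus, r=16, b=32, h=3):
    """Relative switched bit-line capacitance per cycle for a central RF.

    Assumptions: p = 3N ports all active, R = r*N registers of b bits,
    capacitance proportional to wire length, technology constants omitted.
    """
    p = 3 * n_alus
    R = r * n_alus
    lines_per_port = math.sqrt(b * R)          # bit lines driven by one port
    line_length = (h + p) * math.sqrt(b * R)   # length of one bit line
    return p * lines_per_port * line_length


# Doubling N should scale the power by about 2^3 = 8 for large N.
ratio = rf_power(2000) / rf_power(1000)
print(f"power(2N) / power(N) = {ratio:.3f}")
```

The product simplifies to p(h + p)bR, so for large p it behaves like p^2 R, which is N^3 when p ~ 3N and R ~ N.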

  7. Register File organization • Processor with a one-level register file • Central (shared register file): ALU 1 … ALU N • DRF (distributed register file): ALU 1 … ALU N

  8. Comparing Area model of Central and Distributed RF • Central (shared) RF: • 2 read ports, 1 write port per ALU • R = rN: number of registers of b bits • r: number of registers per ALU • N: number of ALUs • DRF: • Only 2 ports per RF: one read, one write • This would give A(1 RF) ~ N • Area of the switch has the same cost complexity [Figure: square layout and organization of the DRF, including a 2N*N crossbar]

  9. Delay and Power models of central versus distributed RF Assume N ALUs • Central RF: • #registers R=rN • #ports p =3N • Large N • DRF: • Constant #registers per ALU • #ports p=2 (also constant!) • DRF has a fixed delay and power (per RF) • Wire propagation determines delay and power (for large N) • For large N
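
A rough side-by-side of the two organizations, as a sketch only. It assumes wire-dominated proportionalities (delay ~ (w + p)(bR)^(1/2), power ~ p(h + p)bR) and plugs in p = 3N, R = rN for the central RF and p = 2, R = r for each distributed RF. It models a single register file and leaves out the DRF switch, whose wires the slide says determine delay and power for large N. All function names and constants are hypothetical.

```python
import math


def delay(p, R, b=32, w=4):
    # Assumed model: wire length across a square array of b*R cells.
    return (w + p) * math.sqrt(b * R)


def power(p, R, b=32, h=3):
    # Assumed model: p ports, each switching sqrt(bR) bit lines of
    # length (h + p) * sqrt(bR), which simplifies to p * (h + p) * b * R.
    return p * (h + p) * b * R


def central(n_alus, r=16):
    # One shared RF: 3 ports per ALU, r registers per ALU.
    return delay(3 * n_alus, r * n_alus), power(3 * n_alus, r * n_alus)


def distributed(n_alus, r=16):
    # One RF of the DRF: 2 ports and r registers regardless of n_alus.
    return delay(2, r), power(2, r)


for n in (1, 4, 16, 64):
    cd, cp = central(n)
    dd, dp = distributed(n)
    print(f"N={n:2d}  central d={cd:9.0f} P={cp:12.0f}  DRF d={dd:5.0f} P={dp:6.0f}")
```

The per-RF numbers for the DRF stay flat as N grows, while the central RF values climb steeply; a complete comparison would also have to charge the DRF for its switch.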

  10. Register File • Register (memory) storage and communication between ALUs are the critical parts for area, energy and performance in a media processor. • Hierarchical register storage

  11. 2-level register files (Hierarchical) • RF1 serves the ALUs, while RF2 is used to cover the memory latency • Overall tendency for area is the same as with a one-level RF [Figure labels: Central and DRF organizations, each with RF2 (level 2), RF1 (level 1), ALU 1 … ALU N]

  12. Register Files • Processor with stream register files: • Replace each port into the memory staging RF with a stream buffer • All stream buffers share a single port into the memory staging RF, allowing that single physical port to act as many logical ports. [Figure labels: Central, ALU 1 … ALU N]
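
One way to picture a single physical port acting as many logical ports is the toy simulation below. It is a sketch under assumptions that are not on the slide: round-robin arbitration, a port that transfers a block of several words per grant, and clients that each consume one word per cycle. The function and parameter names are hypothetical.

```python
from collections import deque


def simulate(num_buffers=4, block_words=4, cycles=16):
    """Return (words delivered per logical port, number of port grants).

    Assumptions (not from the slides): round-robin arbitration, one block
    of block_words words per grant, each client drains one word per cycle.
    """
    buffers = [deque() for _ in range(num_buffers)]
    delivered = [0] * num_buffers
    grants = 0
    for cycle in range(cycles):
        # The single physical port serves exactly one stream buffer per cycle.
        target = cycle % num_buffers
        if not buffers[target]:
            buffers[target].extend(range(block_words))
            grants += 1
        # Every logical port hands at most one word per cycle to its client.
        for i, buf in enumerate(buffers):
            if buf:
                buf.popleft()
                delivered[i] += 1
    return delivered, grants


delivered, grants = simulate()
print("words delivered per logical port:", delivered)
print("physical port grants:", grants)
```

With four buffers and four-word blocks, each logical port delivers one word per cycle once its buffer has been filled for the first time, even though only one buffer accesses the staging RF in any given cycle.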

  13. Register Files • The payoff of the transformation into a stream architecture is that we can achieve an area proportional to N^2, since R2 (memory storage) only needs 1 port. We also have to add in the area of the stream buffers, which grows as N^2 with a very small constant. [Figure labels: DRF, ALU 1 … ALU N]

  14. Results: area per ALU (normalized to 1 ALU)

  15. Results: local delay

  16. Results: power overhead

  17. Imagine Architecture • Cell placement of Imagine • Die photo of Imagine

  18. Imagine Floorplan • 22 million transistors • 500 MHz • Area, Energy, Delay models • Clusters, Micro-controller, SRF, Network Interface

  19. Stream register File

  20. Network • Area of the network grows like that of the DRF switch • More details in the Khailany et al. paper [2003]

  21. Exploration Intra-cluster scaling

  22. Exploration Inter-cluster scaling

  23. End • More details: • Scott Rixner, William J. Dally, Brucek Khailany, Peter Mattson, Ujval J. Kapasi, and John D. Owens. Register Organization for Media Processing. In Proceedings of the 6th International Symposium on High-Performance Computer Architecture (HPCA), pages 375–386, Toulouse, France, January 2000. IEEE Computer Society. • Brucek Khailany, William Dally, Scott Rixner, Ujval Kapasi, John Owens, and Brian Towles. Exploring the VLSI Scalability of Stream Processors. In Proceedings of the Ninth Symposium on High Performance Computer Architecture (HPCA), pages 153–164, Anaheim, California, USA, February 2003. IEEE Computer Society.
