Platform based design 5KK70 MPSoC Platforms

Platform based design 5KK70 MPSoC Platforms. Overview and Cell platform Bart Mesman and Henk Corporaal. The Software Crisis. The first SW crisis. Time Frame: ’60s and ’70s Problem: Assembly Language Programming Computers could handle larger more complex programs

Presentation Transcript


  1. Platform based design 5KK70 MPSoC Platforms: Overview and Cell platform. Bart Mesman and Henk Corporaal

  2. The Software Crisis Platform Design. H.Corporaal and B. Mesman

  3. The first SW crisis • Time Frame: ’60s and ’70s • Problem: Assembly Language Programming • Computers could handle larger, more complex programs • Needed to get Abstraction and Portability without losing Performance • Solution: • High-level languages for von Neumann machines: FORTRAN and C

  4. The second SW crisis • Time Frame: ’80s and ’90s • Problem: Inability to build and maintain complex and robust applications requiring multi-million lines of code developed by hundreds of programmers • Computers could handle larger, more complex programs • Needed to get Composability and Maintainability • High performance was not an issue: left for Moore’s Law

  5. Solution • Object Oriented Programming • C++, C# and Java • Also… • Better tools • Component libraries, Purify • Better software engineering methodology • Design patterns, specification, testing, code reviews

  6. Today: Programmers are Oblivious to Processors • Solid boundary between Hardware and Software • Programmers don’t have to know anything about the processor • High-level languages abstract away the processors • Ex: Java bytecode is machine independent • Moore’s law does not require the programmers to know anything about the processors to get good speedups • Programs are oblivious of the processor -> work on all processors • A program written in the ’70s using C still works and is much faster today • This abstraction provides a lot of freedom for the programmers

  7. The third crisis: Powered by PlayStation

  8. Contents • Hammer your head against 4 walls • Or: Why Multi-Processor • Cell Architecture • Programming and porting • plus case-study

  9. Moore’s Law

  10. Single Processor SPECint Performance

  11. What’s stopping them? • General-purpose uni-cores have stopped historic performance scaling • Power consumption • Wire delays • DRAM access latency • Diminishing returns of more instruction-level parallelism

  12. Power density

  13. Power Efficiency (Watts/Spec)

  14. 1 clock cycle wire range

  15. Global wiring delay becomes dominant over gate delay

  16. Processor-Memory Performance Gap [Figure: performance (log scale, 1 to 1000) versus time (1980 to 2005); µProc (CPU) improves 55%/year (“Moore’s Law”), DRAM improves 7%/year; the processor-memory performance gap grows 50%/year] [Patterson]
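The growth rates on this slide can be checked with a few lines of arithmetic. The sketch below assumes both curves start from the same value and compound yearly; the function name `gap_after` is made up for illustration. Under that assumption the ratio grows by roughly 45% per year, close to the rounded 50% quoted on the slide.

```python
# Yearly growth rates read off the slide.
cpu_growth = 0.55    # processor performance: +55% per year
dram_growth = 0.07   # DRAM performance: +7% per year


def gap_after(years: int) -> float:
    """Processor/DRAM performance ratio after `years`, assuming it starts at 1."""
    return ((1 + cpu_growth) / (1 + dram_growth)) ** years


# How fast the gap itself grows from one year to the next.
yearly_gap_growth = gap_after(1) - 1

print(f"gap grows about {yearly_gap_growth:.0%} per year")
print(f"gap after 25 years (1980 to 2005): about {gap_after(25):.0f}x")
```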

  17. Now what? • Latest research drained • Tried every trick in the book • So: We’re fresh out of ideas • Multi-processor is all that’s left!

  18. MPSoC Issues • Homogeneous vs Heterogeneous • Shared memory vs local memory • Topology • Communication (Bus vs. Network) • Granularity (many small vs few large) • Mapping • Automatic vs manual parallelization • TLP vs DLP • Parallel vs Pipelined

  19. Multi-core

  20. Communication models: Shared Memory • Coherence problem • Memory consistency issue • Synchronization problem [Figure: processes P1 and P2 each read and write a common shared memory]

  21. SMP: Symmetric Multi-Processor • Memory: centralized with uniform access time (UMA) and bus interconnect, I/O • Examples: Sun Enterprise 6000, SGI Challenge, Intel [Figure: four processors, each with one or more cache levels, sharing main memory and the I/O system]

  22. DSM: Distributed Shared Memory • Nonuniform access time (NUMA) and scalable interconnect (distributed memory) [Figure: four nodes, each with a processor, cache and memory, attached to an interconnection network; main memory and the I/O system are also shown]

  23. Communication models: Message Passing • Communication primitives • e.g., send, receive library calls [Figure: processes P1 and P2 exchange messages through FIFOs using send and receive]
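A minimal sketch of the send/receive model, assuming two threads stand in for processes P1 and P2 and a bounded `queue.Queue` stands in for the FIFO. The helper names `send` and `receive` are illustrative, not a particular message-passing library.

```python
import queue
import threading

fifo = queue.Queue(maxsize=4)   # bounded FIFO channel from P1 to P2
received = []


def send(channel, msg):
    channel.put(msg)            # blocks while the FIFO is full


def receive(channel):
    return channel.get()        # blocks while the FIFO is empty


def p1():
    for value in range(5):
        send(fifo, value)
    send(fifo, None)            # sentinel: no more messages


def p2():
    while (msg := receive(fifo)) is not None:
        received.append(msg)


threads = [threading.Thread(target=p1), threading.Thread(target=p2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(received)
```

All communication is explicit here: P2 never touches P1's data, it only sees what arrives through the channel, and the blocking calls provide the synchronization.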

  24. Message passing communication [Figure: four nodes, each with a processor, cache, memory, DMA engine and network interface, attached to an interconnection network]

  25. Communication Models: Comparison • Shared-Memory • Compatibility with well-understood (language) mechanisms • Ease of programming for complex or dynamic communication patterns • Shared-memory applications; sharing of large data structures • Efficient for small items • Supports hardware caching • Message Passing • Simpler hardware • Explicit communication • Scalable!

  26. Three fundamental issues for shared memory multiprocessors • Coherence, about: Do I see the most recent data? • Consistency, about: When do I see a written value? • e.g. do different processors see writes at the same time (w.r.t. other memory accesses)? • Synchronization, about: How to synchronize processes? • how to protect access to shared data?

  27. Coherence problem, in Multi-Proc system [Figure: memory holds a = 100, b = 200; the CPU-1 cache holds a' = 550, b' = 200; the CPU-2 cache holds a'' = 100, b'' = 200]

  28. Potential HW Coherency Solutions • Snooping Solution (Snoopy Bus): • Send all requests for data to all processors (or local caches) • Processors snoop to see if they have a copy and respond accordingly • Requires broadcast, since caching information is at processors • Works well with bus (natural broadcast medium) • Dominates for small scale machines (most of the market) • Directory-Based Schemes • Keep track of what is being shared in one centralized place • Distributed memory => distributed directory for scalability (avoids bottlenecks) • Send point-to-point requests to processors via network • Scales better than Snooping • Actually existed BEFORE Snooping-based schemes

  29. Example Snooping protocol • 3 states for each cache line: • invalid, shared, modified (exclusive) • FSM per cache, receives requests from both processor and bus [Figure: four processors, each with a cache, sharing main memory and the I/O system]

  30. Cache coherence protocol • Write invalidate protocol for write-back cache • Showing state transitions for each block in the cache
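The state diagram itself is not part of this transcript. As a hedged illustration, the sketch below encodes the textbook three-state (invalid, shared, modified) write-invalidate protocol for a single block as two lookup tables, one for processor requests and one for snooped bus requests. The table names, request names and the `access` helper are assumptions made for this example and may differ from the diagram on the slide.

```python
I, S, M = "invalid", "shared", "modified"

# Processor side: (state, cpu op) -> (next state, request placed on the bus or None)
CPU_SIDE = {
    (I, "read"):  (S, "read_miss"),
    (I, "write"): (M, "write_miss"),
    (S, "read"):  (S, None),
    (S, "write"): (M, "write_miss"),   # other copies must be invalidated
    (M, "read"):  (M, None),
    (M, "write"): (M, None),
}

# Bus side: (state, snooped request) -> (next state, write back the dirty block?)
BUS_SIDE = {
    (I, "read_miss"):  (I, False),
    (I, "write_miss"): (I, False),
    (S, "read_miss"):  (S, False),
    (S, "write_miss"): (I, False),
    (M, "read_miss"):  (S, True),
    (M, "write_miss"): (I, True),
}


def access(states, cpu, op):
    """Apply one processor access to the block; return the number of write-backs."""
    next_state, request = CPU_SIDE[(states[cpu], op)]
    write_backs = 0
    if request is not None:
        # Every other cache snoops the request and updates its own copy.
        for other in range(len(states)):
            if other != cpu:
                states[other], dirty = BUS_SIDE[(states[other], request)]
                write_backs += dirty
    states[cpu] = next_state
    return write_backs


states = [I, I]                  # one block, two caches
access(states, 0, "read")        # CPU-1 reads
access(states, 1, "read")        # CPU-2 reads: both copies are shared
access(states, 0, "write")       # CPU-1 writes: CPU-2's copy is invalidated
after_write = list(states)
write_backs = access(states, 1, "read")   # CPU-2 reads again: CPU-1 writes back
print(after_write, states, write_backs)
```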

  31. Synchronization problem • Computer system of a bank has a credit process (P_c) and a debit process (P_d)

      /* Process P_c */         /* Process P_d */
      shared int balance        shared int balance
      private int amount        private int amount
      balance += amount         balance -= amount

      lw  $t0,balance           lw  $t2,balance
      lw  $t1,amount            lw  $t3,amount
      add $t0,$t0,$t1           sub $t2,$t2,$t3
      sw  $t0,balance           sw  $t2,balance
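The problem is that each update is a load, an arithmetic operation and a store, and the two processes can interleave between those steps. The sketch below replays the instruction sequences in a chosen order to make the lost update reproducible. The step names, the `run` helper and the starting values (balance 100, credit 50, debit 30) are made up for illustration.

```python
def run(schedule, balance=100, credit=50, debit=30):
    """Execute the load/compute/store steps of P_c and P_d in the given order."""
    mem = {"balance": balance}   # shared memory
    reg = {}                     # private registers of the two processes
    for step in schedule:
        if step == "c_load":
            reg["t0"] = mem["balance"]
        elif step == "c_add":
            reg["t0"] += credit
        elif step == "c_store":
            mem["balance"] = reg["t0"]
        elif step == "d_load":
            reg["t2"] = mem["balance"]
        elif step == "d_sub":
            reg["t2"] -= debit
        elif step == "d_store":
            mem["balance"] = reg["t2"]
        else:
            raise ValueError(f"unknown step: {step}")
    return mem["balance"]


# P_c runs to completion before P_d starts: both updates are applied.
serial = run(["c_load", "c_add", "c_store", "d_load", "d_sub", "d_store"])

# Both processes load the old balance before either stores: the credit is lost.
interleaved = run(["c_load", "d_load", "c_add", "d_sub", "c_store", "d_store"])

print(serial, interleaved)
```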

  32. Issues for Synchronization • Hardware support: • Uninterruptible instruction to fetch-and-update memory (atomic operation) • User-level synchronization operation(s) using this primitive • For large scale MPs, synchronization can be a bottleneck; techniques to reduce contention and latency of synchronization
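A user-level synchronization operation can be sketched with a lock around the read-modify-write. Here Python's `threading.Lock` stands in for the lock that real hardware would build on an atomic fetch-and-update instruction; the `Account` class and the iteration counts are assumptions chosen for the example.

```python
import threading


class Account:
    """Shared balance whose updates are serialized by a lock."""

    def __init__(self, balance=0):
        self.balance = balance
        self._lock = threading.Lock()

    def update(self, amount):
        # Only one thread at a time can run the load, add and store.
        with self._lock:
            self.balance += amount


def worker(account, amount, repeats):
    for _ in range(repeats):
        account.update(amount)


account = Account()
credit = threading.Thread(target=worker, args=(account, +1, 100_000))
debit = threading.Thread(target=worker, args=(account, -1, 100_000))
credit.start()
debit.start()
credit.join()
debit.join()

print(account.balance)
```

Because every update holds the lock, the credits and debits cancel exactly. The price is that all updates are serialized, which is the contention the slide warns about for large machines.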

  33. Cell

  34. What can it do?

  35. Cell/B.E. - the history • Sony/Toshiba/IBM consortium • Austin, TX – March 2001 • Initial investment: $400,000,000 • Official name: STI Cell Broadband Engine • Also goes by Cell BE, STI Cell, Cell • In production for: • PlayStation 3 from Sony • Mercury’s blades

  36. Cell blade

  37. Cell/B.E. – the architecture • 1 x PPE 64-bit PowerPC • L1: 32 KB I$ + 32 KB D$ • L2: 512 KB • 8 x SPE cores: • Local store: 256 KB • 128 x 128 bit vector registers • Hybrid memory model: • PPE: Rd/Wr • SPEs: Asynchronous DMA • EIB: 205 GB/s sustained aggregate bandwidth • Processor-to-memory bandwidth: 25.6 GB/s • Processor-to-processor: 20 GB/s in each direction

  38. Cell chip

  39. SPE

  40. SPE

  41. SPE pipeline

  42. Communication

  43. 8 parallel transactions

  44. C++ on Cell • 1: Send the code of the function to be run on SPE • 2: Send address to fetch the data • 3: DMA data in LS from the main memory • 4: Run the code on the SPE • 5: DMA data out of LS to the main memory • 6: Signal the PPE that the SPE has finished the function
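The six steps can be mimicked in a toy simulation. This is not Cell SDK code: a Python thread stands in for the SPE, byte-array slices stand in for DMA transfers, and a `threading.Event` stands in for the completion signal to the PPE. The class `ToySPE`, the kernel `invert` and the buffer layout are all hypothetical.

```python
import threading

LOCAL_STORE_SIZE = 256 * 1024   # SPE local store size quoted in the deck


class ToySPE:
    """Stand-in for one SPE: a private local store plus a worker thread."""

    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.local_store = bytearray(LOCAL_STORE_SIZE)
        self.done = threading.Event()

    def start(self, kernel, src, dst, size):
        if size > LOCAL_STORE_SIZE:
            raise ValueError("data does not fit in the local store")
        # Steps 1 and 2: the PPE sends the function and the data addresses.
        worker = threading.Thread(target=self._run, args=(kernel, src, dst, size))
        worker.start()
        return worker

    def _run(self, kernel, src, dst, size):
        # Step 3: "DMA" the input from main memory into the local store.
        self.local_store[:size] = self.main_memory[src:src + size]
        # Step 4: run the kernel on local data only (same-length result assumed).
        self.local_store[:size] = kernel(bytes(self.local_store[:size]))
        # Step 5: "DMA" the result from the local store back to main memory.
        self.main_memory[dst:dst + size] = self.local_store[:size]
        # Step 6: signal the PPE that the function has finished.
        self.done.set()


def invert(data):
    """Example kernel: invert every byte."""
    return bytes(255 - b for b in data)


main_memory = bytearray(64)
main_memory[0:16] = bytes(range(16))          # input buffer at address 0

spe = ToySPE(main_memory)
worker = spe.start(invert, src=0, dst=16, size=16)
spe.done.wait()                                # the PPE waits for the signal
worker.join()

print(list(main_memory[16:32]))
```

The point of the sketch is the data flow: the kernel only ever sees the local store, so every input and output has to be moved explicitly.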

  45. Porting C++ • 1: Detect & isolate kernels to be ported • 2: Replace kernels with C++ stubs • 3: Implement the data transfers and move kernels on SPEs • 4: Iteratively optimize SPE code

  46. Performance estimation • Based on Amdahl’s law … where • $K_i^{fr}$ = the fraction of the execution time for kernel $K_i$ • $K_i^{speed\text{-}up}$ = the speed-up of kernel $K_i$ compared with the sequential version

  47. Performance estimation • Based on Amdahl’s law: • Sequential use of kernels: • Parallel use of kernels: ?
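The formulas on these two slides did not survive in the transcript. The sketch below is therefore an assumption, not the authors' expression. For sequential use it applies the usual multi-kernel form of Amdahl's law: the unported fraction keeps its original time, and each kernel's fraction is divided by its speed-up. For parallel use, which the slide leaves as an open question, it shows one plausible estimate that assumes all kernels overlap perfectly, so only the slowest accelerated kernel counts. Function names and example numbers are made up.

```python
def speedup_sequential(kernels):
    """Estimated application speed-up when the kernels run one after another.

    `kernels` is a list of (fraction, speed_up) pairs: the fraction of the
    original execution time spent in the kernel, and the kernel's speed-up.
    """
    ported = sum(fraction for fraction, _ in kernels)
    if not 0.0 <= ported <= 1.0:
        raise ValueError("kernel fractions must sum to a value between 0 and 1")
    accelerated = sum(fraction / speed_up for fraction, speed_up in kernels)
    return 1.0 / ((1.0 - ported) + accelerated)


def speedup_parallel(kernels):
    """Assumed estimate when all kernels run fully in parallel with each other."""
    ported = sum(fraction for fraction, _ in kernels)
    if not 0.0 <= ported <= 1.0:
        raise ValueError("kernel fractions must sum to a value between 0 and 1")
    slowest = max((fraction / speed_up for fraction, speed_up in kernels), default=0.0)
    return 1.0 / ((1.0 - ported) + slowest)


# Hypothetical application: two kernels, each 40% of the run time.
example = [(0.4, 10.0), (0.4, 20.0)]
seq = speedup_sequential(example)
par = speedup_parallel(example)
print(f"sequential: {seq:.2f}x, parallel: {par:.2f}x")
```

In both estimates the 20% of the run time that is not ported caps the overall speed-up at 5x, however fast the kernels become.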

  48. MARVEL case-study • Multimedia content retrieval and analysis • For each picture, we extract the values for the features of interest: ColorHistogram, ColorCorrelogram, Texture, EdgeHistogram • Compares the image features with the model features and generates an overall confidence score • http://www.research.ibm.com/marvel

  49. MarCell = MARVEL on Cell • Identified 5 kernels to port on the SPEs: • 4 feature extraction algorithms • ColorHistogram (CHExtract) • ColorCorrelogram (CCExtract) • Texture (TXExtract) • EdgeHistogram (EHExtract) • 1 common concept detection, repeated for each feature

  50. MarCell – kernels speed-up
