
Synergistic Processing In Cell’s Multicore Architecture Michael Gschwind , et al.


Presentation Transcript


  1. Synergistic Processing In Cell’s Multicore Architecture. Michael Gschwind, et al. Presented by: Jia Zou. CS258 3/5/08

  2. Goal for Cell • Increase processor efficiency for the most performance per area • Reduce area per core to fit more cores in a given chip area • Take advantage of application parallelism • Aimed at data-processing-intensive applications

  3. Cell Architecture

  4. Design Philosophy • Simple cores, lots of them • Any complexity reduction directly translates into increased performance • Exploiting the compiler to eliminate hardware complexity • PPE serves as controller, SPE provides performance • PPE and SPEs share address translation and virtual memory architecture

  5. Synergistic Processing Unit

  6. Data alignment for Scalar and Vector Processing • SPU has no separate support for scalar processing • Unified scalar/SIMD register • Unified execution unit • Simpler control unit • Software-controlled data-alignment approach • Simplifies scalar data extraction, insertion, sharing between scalar and vector data • Increases compiler efficiency

  7. Scalar Layering

  8. Data-Parallel Conditional Execution

  9. Deterministic Data Delivery • Each SPE has a local store • 4 KB – 4 GB address range • Holds both instructions and data • All memory operations the SPU executes refer to the address space of this local store • Differs from cache memory in that it: • Has no cache-coherence problem • Offers low and deterministic access latency

  10. Statically Scheduled ILP • Instruction fetches are scheduled statically • Delivers up to two instructions per cycle • One to each execution complex • Static branch prediction: prepare-to-branch instruction => initiates instruction prefetch

  11. SPE Microarchitecture

  12. Design Goals and Decisions
