Comparison of Next Generation Gaming Architectures



  1. Comparison of Next Generation Gaming Architectures Presented By Dela Tsiagbe

  2. Introduction • Brief History of Gaming Platforms • Difference between consoles and personal computers • Look at actual Architecture • Comparison of Vendors • Summary

  3. History of gaming • Video gaming itself dates back to the 1960s and 1970s • Consoles such as the Magnavox Odyssey, Atari, and ColecoVision made gaming popular • NES • Storytelling

  4. Difference between Consoles and PCs • In the past, a PC had far more computing power than a console. • Consoles today require much more computing power. • In most cases, a console delivers more computing power for the price: you get more for your money with a gaming console than with a PC at the same price.

  5. Difference between Consoles and PCs (continued)
     • Xbox 360 Stats
     • Custom IBM PowerPC-based CPU
       * 3 symmetrical cores running at 3.2 GHz each
       * 2 hardware threads per core; 6 hardware threads total
       * 1 VMX-128 vector unit per core; 3 total
       * 128 VMX-128 registers per hardware thread
       * 1 MB L2 cache
     • CPU Game Math Performance
       * 9 billion dot product operations per second
     • Custom ATI Graphics Processor
       * 500 MHz
       * 10 MB embedded DRAM
       * 48-way parallel floating-point dynamically-scheduled shader pipelines
       * Unified shader architecture
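Several of the Xbox 360 CPU figures above follow from simple arithmetic. The sketch below recomputes them in Python. The constant names are illustrative, and the dot-product bound assumes at most one dot product issued per core per clock cycle, which the slide does not state.

```python
# Back-of-the-envelope check of the Xbox 360 CPU figures quoted on the slide.
CORES = 3
CLOCK_HZ = 3.2e9
THREADS_PER_CORE = 2
VMX_REGS_PER_THREAD = 128
VMX_REG_BITS = 128

# 3 cores x 2 hardware threads per core.
hardware_threads = CORES * THREADS_PER_CORE

# Size of the vector register file seen by one hardware thread, in bytes.
reg_file_bytes_per_thread = VMX_REGS_PER_THREAD * VMX_REG_BITS // 8

# ASSUMPTION (not on the slide): one dot product per core per cycle at most.
peak_dot_products_per_sec = CORES * CLOCK_HZ

print(hardware_threads)                  # 6
print(reg_file_bytes_per_thread)         # 2048
print(peak_dot_products_per_sec / 1e9)   # 9.6
```

Under that assumption the upper bound is 9.6 billion dot products per second, which is in line with the roughly 9 billion quoted on the slide.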

  6. Difference between Consoles and PCs (continued)
     • PS3 Stats
       * PowerPC-based core @ 3.2 GHz
       * 1 VMX vector unit per core
       * 512 KB L2 cache
       * 7 x SPE @ 3.2 GHz
       * 7 x 128 128-bit SIMD GPRs
       * 7 x 256 KB SRAM for SPE
       * 1 of 8 SPEs reserved for redundancy
     • Total floating-point performance: 218 GFLOPS

  7. Difference between Consoles and PCs (continued) • Things to consider: • Although a console has less memory, there is at most a minimal OS running in the background • Hardware compatibility is never a problem • There is very little overhead from the system itself.

  8. Types of processors • Xbox 360 - Xenon • PS3 - PowerPC Cell

  9. PS3 Schematics

  10. Xbox 360 Schematics

  11. PowerPC Instruction Set
      • li REG, VALUE
        – loads register REG with the number VALUE
      • add REGA, REGB, REGC
        – adds REGB to REGC and stores the result in REGA
      • addi REGA, REGB, VALUE
        – adds the number VALUE to REGB and stores the result in REGA
      • mr REGA, REGB
        – copies the value in REGB into REGA
      • or REGA, REGB, REGC
        – performs a logical "or" between REGB and REGC, and stores the result in REGA
      • ori REGA, REGB, VALUE
        – performs a logical "or" between REGB and VALUE, and stores the result in REGA
      • and, andi, xor, xori, nand, and nor
        – all of these follow the same pattern as "or" and "ori" for the other logical operations
      • ld REGA, 0(REGB)

  12. PowerPC Instruction Set
      • ld REGA, 0(REGB) (continued)
        – use the contents of REGB as the memory address of the value to load into REGA
      • lbz, lhz, and lwz
        – all of these follow the same format, but operate on bytes, halfwords, and words, respectively (the "z" indicates that they also zero out the rest of the register)
      • b ADDRESS
        – jump (or branch) to the instruction at address ADDRESS
      • bl ADDRESS
        – subroutine call to address ADDRESS
      • cmpd REGA, REGB
        – compare the contents of REGA and REGB, and set the bits of the status register appropriately
      • beq ADDRESS
        – branch to ADDRESS if the previously compared register contents were equal
      • bne, blt, bgt, ble, and bge
        – all of these follow the same form, but check for inequality, less than, greater than, less than or equal to, and greater than or equal to, respectively
      • std REGA, 0(REGB)
        – use the contents of REGB as the memory address to save the value of REGA into
      • stb, sth, and stw
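To make the semantics on the two instruction-set slides concrete, here is a toy interpreter in Python for a handful of the listed mnemonics. It is only a sketch: the tuple-based program encoding, the `run` function, and the use of list indices as branch targets are assumptions made for illustration and do not reflect real PowerPC encoding. Registers are plain Python integers rather than 64-bit values.

```python
def run(program, max_steps=10_000):
    """Interpret a list of (mnemonic, *operands) tuples and return the registers."""
    regs = {}        # register name -> integer value
    cmp_result = 0   # stand-in for the status register: -1, 0 or 1
    pc = 0           # index of the next instruction (stand-in for an address)
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, *args = program[pc]
        pc += 1
        steps += 1
        if op == "li":        # li REG, VALUE
            regs[args[0]] = args[1]
        elif op == "add":     # add REGA, REGB, REGC
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "addi":    # addi REGA, REGB, VALUE
            regs[args[0]] = regs[args[1]] + args[2]
        elif op == "mr":      # mr REGA, REGB
            regs[args[0]] = regs[args[1]]
        elif op == "or":      # or REGA, REGB, REGC
            regs[args[0]] = regs[args[1]] | regs[args[2]]
        elif op == "ori":     # ori REGA, REGB, VALUE
            regs[args[0]] = regs[args[1]] | args[2]
        elif op == "cmpd":    # cmpd REGA, REGB
            a, b = regs[args[0]], regs[args[1]]
            cmp_result = (a > b) - (a < b)
        elif op == "b":       # unconditional branch
            pc = args[0]
        elif op == "beq":     # branch if the last compare was equal
            if cmp_result == 0:
                pc = args[0]
        elif op == "bne":     # branch if the last compare was not equal
            if cmp_result != 0:
                pc = args[0]
        else:
            raise ValueError(f"unsupported mnemonic: {op}")
    return regs


# Hypothetical example program: sum the integers 1..5 into r3, then copy to r6.
SUM_1_TO_5 = [
    ("li", "r3", 0),             # 0: running total
    ("li", "r4", 1),             # 1: counter
    ("li", "r5", 6),             # 2: limit (exclusive)
    ("add", "r3", "r3", "r4"),   # 3: total += counter
    ("addi", "r4", "r4", 1),     # 4: counter += 1
    ("cmpd", "r4", "r5"),        # 5: compare counter with limit
    ("bne", 3),                  # 6: loop while counter != limit
    ("mr", "r6", "r3"),          # 7: copy the result
]

result = run(SUM_1_TO_5)
print(result["r3"], result["r6"])  # 15 15
```

The loop uses the same compare-then-branch pairing the slides describe: `cmpd` records the comparison, and the following `bne` acts on it.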

  13. CPU Specs
      • Three 3.2 GHz PowerPC cores
      • Shared 1 MB L2 cache, 8-way set associative
      • Per-core features
        – 2-issue per cycle, in-order, decoupled vector/scalar issue queue
        – 2 symmetric fine-grain hardware threads
        – L1 caches: 32K 2-way I$ / 32K 4-way D$
        – Execution pipelines
          • Branch Unit, Integer Unit, Load/Store Unit
          • VMX128 units: Floating Point Unit, Permute Unit, Simple Unit
          • Scalar FPU
      • VMX128 enhanced for game and graphics workloads
        – All execution units 4-way SIMD
        – 128 128-bit vector registers per thread
        – Custom dot-product instruction
        – Native D3D compressed data formats
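The "4-way SIMD" and "custom dot-product instruction" bullets can be illustrated with a small emulation. In this Python sketch, a four-element list stands in for one 128-bit vector register holding four 32-bit floats. The names `vec4_dot` and `dot_simd4` are hypothetical and are not VMX128 intrinsics.

```python
def vec4_dot(a, b):
    """Dot product of two 4-lane vectors, standing in for one SIMD instruction."""
    assert len(a) == 4 and len(b) == 4
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def dot_simd4(xs, ys):
    """Dot product of two equal-length sequences, consumed four lanes at a time."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for i in range(0, len(xs), 4):
        a = list(xs[i:i + 4])
        b = list(ys[i:i + 4])
        # Pad the final partial vector with zeros so every step is 4 lanes wide.
        pad = 4 - len(a)
        a += [0.0] * pad
        b += [0.0] * pad
        total += vec4_dot(a, b)
    return total


print(dot_simd4([1, 2, 3, 4, 5], [1, 1, 1, 1, 2]))  # 20.0
```

Each call to `vec4_dot` represents work that a 4-way SIMD unit could do on a whole register at once, which is why vector-heavy game math benefits from this design.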

  14. CPU Data Streams
      • High-bandwidth data streaming support with minimal cache thrashing
        – 128B cache line size (all caches)
        – Flexible set locking in L2
        – Write streaming:
          • L1s are write-through; writes do not allocate in L1
          • 4 uncacheable write-gathering buffers per core
          • 8 cacheable, non-sequential write-gathering buffers per core
        – Read streaming:
          • xDCBT data prefetch around L2, directly into L1
          • 8 outstanding loads/prefetches per core
      • Tight GPU data streaming integration (XPS)
        – XPS: "Xbox Procedural Synthesis"
        – GPU 128B read from L2
        – GPU low-latency cacheable writebacks to CPU
        – GPU shares D3D compressed data formats with CPU => at least 2x effective bus bandwidth for typical graphics data
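The "at least 2x effective bus bandwidth" claim comes down to moving fewer bytes per value. The slide does not say which compressed formats are meant, so the Python sketch below uses 16-bit floats purely as a stand-in: the same values take half the bytes of 32-bit floats, so twice as many fit in one 128-byte cache line.

```python
import struct

CACHE_LINE_BYTES = 128  # cache line size stated on the slide


def pack_f32(values):
    """Pack values as little-endian 32-bit floats."""
    return struct.pack(f"<{len(values)}f", *values)


def pack_f16(values):
    """Pack values as little-endian 16-bit floats (ASSUMED stand-in format)."""
    return struct.pack(f"<{len(values)}e", *values)


# 32 sample values that are exactly representable in 16-bit floating point.
values = [0.0, 0.5, 1.0, -2.0] * 8

full = pack_f32(values)
half = pack_f16(values)

values_per_line_f32 = CACHE_LINE_BYTES // 4
values_per_line_f16 = CACHE_LINE_BYTES // 2

print(len(full), len(half))                      # 128 64
print(values_per_line_f32, values_per_line_f16)  # 32 64
```

Real compressed formats trade precision or range for size, so whether the halved footprint is acceptable depends on the data being streamed.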

  15. GPU
      • 500 MHz graphics processor
        – 48 parallel shader cores (ALUs); dynamically scheduled; 32-bit IEEE floating point
        – 24 billion shader instructions per second
          • Superscalar design: vector, scalar and texture ops per instruction
        – Pixel fillrate: 4 billion pixels/sec (8 per cycle); 2x for depth/stencil only
          • AA: 16 billion samples/sec; 2x for depth/stencil only
        – Geometry rate: 500 million triangles/sec
        – Texture rate: 8 billion bilinear filtered samples/sec
      • 10 MB EDRAM: 256 GB/s fill
      • Direct3D 9.0-compatible
        – High-Level Shader Language (HLSL) 3.0+ support
      • Custom features
        – Memory export: particle physics, subdivision surfaces
        – Tiling acceleration: full-resolution Hi-Z, predicated primitives
        – XPS:
          • CPU cores can be slaved to GPU processing
          • GPU reads geometry data directly from L2
        – Hardware scaling for display resolution matching
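Most of the GPU throughput numbers above are the 500 MHz clock multiplied by a per-cycle rate. The Python sketch below recomputes the two rates the slide states directly and backs out the implied per-cycle figures for the others. The constant names are illustrative, and the shader rate assumes one instruction per ALU per cycle.

```python
CLOCK_HZ = 500e6        # GPU clock from the slide
SHADER_ALUS = 48        # parallel shader cores from the slide
PIXELS_PER_CYCLE = 8    # stated in the pixel fillrate bullet

# Rates the slide states, recomputed from the clock.
# ASSUMPTION: each ALU retires one shader instruction per cycle.
shader_instr_per_sec = CLOCK_HZ * SHADER_ALUS
pixel_fill_per_sec = CLOCK_HZ * PIXELS_PER_CYCLE

# Per-cycle figures implied by the remaining quoted rates.
aa_samples_per_pixel = 16e9 / pixel_fill_per_sec
triangles_per_cycle = 500e6 / CLOCK_HZ
texture_samples_per_cycle = 8e9 / CLOCK_HZ
edram_bytes_per_cycle = 256e9 / CLOCK_HZ

print(shader_instr_per_sec / 1e9)   # 24.0
print(pixel_fill_per_sec / 1e9)     # 4.0
print(aa_samples_per_pixel)         # 4.0
print(triangles_per_cycle)          # 1.0
print(texture_samples_per_cycle)    # 16.0
print(edram_bytes_per_cycle)        # 512.0
```

Read this way, the quoted figures are mutually consistent: 4 samples per pixel for anti-aliasing, one triangle and 16 texture samples per clock, and 512 bytes per clock of EDRAM fill.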

  16. GPU Block Diagram

  17. Software
      • SMP/SMT
        – Mainstream techniques
        – Everything is simplified by being symmetric
      • UMA
        – No partitioning headaches
      • OS
        – All 3 cores available for game developers
      • Standard APIs
        – Win32, OpenMP
        – Direct3D, HLSL
        – Assembly (CPU & Shader) supported: direct hardware access
      • Standard tools
        – XNA: PIX, XACT
        – Visual C++, works with multiple threads ...
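The point that "everything is simplified by being symmetric" can be sketched as a work split in which any worker can take any chunk. The slide names Win32 and OpenMP as the actual APIs. The Python version below uses `concurrent.futures` only as a stand-in, and the function names and the sum-of-squares workload are hypothetical. CPython threads do not give a real speedup on CPU-bound work, so this shows the structure only.

```python
from concurrent.futures import ThreadPoolExecutor

HARDWARE_THREADS = 6  # 3 cores x 2 hardware threads, per the earlier slides


def chunk_sum(chunk):
    """Work item: sum of squares of one chunk (hypothetical workload)."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=HARDWARE_THREADS):
    """Split data into one interleaved chunk per worker and combine the results."""
    # Every worker runs the same code on the same shared memory (SMP + UMA),
    # so the chunks can be handed out without caring which worker gets which.
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum, chunks))


print(parallel_sum_of_squares(list(range(10))))  # 285
```

On an asymmetric design, the split would also have to decide which kind of processor each chunk suits and how to move its data there, which is the "partitioning headache" the UMA bullet refers to.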
