1. Comparison of Next Generation Gaming Architectures Presented By
Dela Tsiagbe
2. Introduction Brief History of Gaming Platforms
Difference between consoles and personal computers
Look at actual Architecture
Comparison of Vendors
Summary
3. History of gaming Video gaming itself dates back to the 1960s and 1970s
Consoles such as the Magnavox Odyssey, Atari, and ColecoVision made gaming popular
NES
Storytelling
4. Difference between Consoles and PCs In the past, a PC had far more computing power than a console.
Today's consoles demand far more computing power than they used to.
In most cases a console delivers more computing power for the price: you get more for your money from a gaming console than from a PC at the same price.
5. Difference between Consoles and PCs (continued) Xbox 360 Stats
Custom IBM PowerPC-based CPU
* 3 symmetrical cores running at 3.2 GHz each
* 2 hardware threads per core; 6 hardware threads total
* 1 VMX-128 vector unit per core; 3 total
* 128 VMX-128 registers per hardware thread
* 1 MB L2 cache
CPU Game Math Performance
* 9 billion dot product operations per second
Custom ATI Graphics Processor
* 500 MHz
* 10 MB embedded DRAM
* 48-way parallel floating-point dynamically-scheduled shader pipelines
* Unified shader architecture
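The "9 billion dot product operations per second" figure can be roughly reproduced from the other numbers on this slide. The sketch below assumes each VMX-128 unit issues one dot-product instruction per clock cycle; that issue rate is an assumption and is not stated on the slide. The function names are illustrative only.

```python
# Back-of-the-envelope check of the CPU figures quoted on the slide.
# ASSUMPTION: one dot-product instruction per VMX-128 unit per clock cycle.

CORES = 3                       # symmetrical cores (from the slide)
CLOCK_HZ = 3_200_000_000        # 3.2 GHz per core (from the slide)
THREADS_PER_CORE = 2            # hardware threads per core (from the slide)
VECTOR_UNITS_PER_CORE = 1       # VMX-128 units per core (from the slide)
DOT_PRODUCTS_PER_CYCLE = 1      # assumed issue rate, not from the slide


def hardware_threads():
    """Total hardware threads across all cores."""
    return CORES * THREADS_PER_CORE


def peak_dot_products_per_second():
    """Upper bound on dot products per second under the assumption above."""
    return CORES * VECTOR_UNITS_PER_CORE * CLOCK_HZ * DOT_PRODUCTS_PER_CYCLE


if __name__ == "__main__":
    print(hardware_threads(), "hardware threads")
    print(peak_dot_products_per_second() / 1e9, "billion dot products per second")
```

Under that assumption the peak comes out at 9.6 billion per second, which is consistent with the slide's rounded figure of 9 billion.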
6. Difference between Consoles and PCs (continued) PS3 Cell Stats
* PowerPC-based core @ 3.2 GHz
* 1 VMX vector unit per core
* 512 KB L2 cache
* 7 x SPE @ 3.2 GHz
* 7 x 128 128-bit SIMD GPRs
* 7 x 256 KB SRAM for SPE
* 1 of 8 SPEs reserved for redundancy
* Total floating point performance: 218 GFLOPS
7. Difference between Consoles and PCs (continued) Things to consider:
Although there is less memory, there is only a minimal OS running in the background
Compatibility of hardware is never a problem
There is very little overhead from the system itself.
8. Types of processors Xbox 360 - Xenon
PS3 - PowerPC Cell
9. PS3 Schematics
10. Xbox 360 Schematics
11. PowerPC Instruction Set li REG, VALUE
loads register REG with the number VALUE
add REGA, REGB, REGC
adds REGB with REGC and stores the result in REGA
addi REGA, REGB, VALUE
adds the number VALUE to REGB and stores the result in REGA
mr REGA, REGB
copies the value in REGB into REGA
or REGA, REGB, REGC
performs a logical "or" between REGB and REGC, and stores the result in REGA
ori REGA, REGB, VALUE
performs a logical "or" between REGB and VALUE, and stores the result in REGA
and, andi, xor, xori, nand, and nor
all of these follow the same pattern as "or" and "ori" for the other logical operations
ld REGA, 0(REGB)
12. PowerPC Instruction Set use the contents of REGB as the memory address of the value to load into REGA
lbz, lhz, and lwz
all of these follow the same format, but operate on bytes, halfwords, and words, respectively (the "z" indicates that they also zero-out the rest of the register)
b ADDRESS
jump (or branch) to the instruction at address ADDRESS
bl ADDRESS
subroutine call to address ADDRESS
cmpd REGA, REGB
compare the contents of REGA and REGB, and set the bits of the status register appropriately
beq ADDRESS
branch to ADDRESS if the previously compared register contents were equal
bne, blt, bgt, ble, and bge
all of these follow the same form, but check for inequality, less than, greater than, less than or equal to, and greater than or equal to, respectively.
std REGA, 0(REGB)
use the contents of REGB as the memory address to save the value of REGA into
stb, sth, and stw
all of these follow the same format, but operate on bytes, halfwords, and words, respectively
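To make the register-level behaviour of the instructions above concrete, here is a toy interpreter in Python for a handful of the listed mnemonics (li, mr, add, addi, or, ori, cmpd, b, beq, bne, blt). It is only a sketch of the semantics described on these slides, not a PowerPC emulator: the tuple-based program format, the `run` function, and the label handling are all invented for illustration, and register widths, memory, and the real condition register layout are ignored.

```python
# Toy model of a few PowerPC mnemonics from the slides.
# HYPOTHETICAL format: a program is a list of tuples (mnemonic, operands...),
# and a bare string in the list marks a branch label.

def run(program, max_steps=10_000):
    """Execute the program and return the final register contents as a dict."""
    labels, code = {}, []
    for item in program:
        if isinstance(item, str):
            labels[item] = len(code)     # label points at the next instruction
        else:
            code.append(item)

    regs = {}          # register name -> integer value
    cmp_result = 0     # stand-in for the status bits: -1 (less), 0 (equal), 1 (greater)
    pc = 0
    for _ in range(max_steps):
        if pc >= len(code):
            return regs
        op, *args = code[pc]
        pc += 1
        if op == "li":
            regs[args[0]] = args[1]
        elif op == "mr":
            regs[args[0]] = regs[args[1]]
        elif op == "add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "addi":
            regs[args[0]] = regs[args[1]] + args[2]
        elif op == "or":
            regs[args[0]] = regs[args[1]] | regs[args[2]]
        elif op == "ori":
            regs[args[0]] = regs[args[1]] | args[2]
        elif op == "cmpd":
            a, b = regs[args[0]], regs[args[1]]
            cmp_result = (a > b) - (a < b)
        elif op == "b":
            pc = labels[args[0]]
        elif op == "beq":
            if cmp_result == 0:
                pc = labels[args[0]]
        elif op == "bne":
            if cmp_result != 0:
                pc = labels[args[0]]
        elif op == "blt":
            if cmp_result < 0:
                pc = labels[args[0]]
        else:
            raise ValueError(f"unsupported mnemonic: {op}")
    raise RuntimeError("step limit exceeded")


# Sum the integers 1 through 5 with a compare-and-branch loop.
SUM_LOOP = [
    ("li", "r3", 0),             # r3 = running total
    ("li", "r4", 1),             # r4 = counter
    ("li", "r5", 6),             # r5 = loop bound (exclusive)
    "loop",
    ("add", "r3", "r3", "r4"),   # total += counter
    ("addi", "r4", "r4", 1),     # counter += 1
    ("cmpd", "r4", "r5"),        # compare counter with bound
    ("bne", "loop"),             # repeat until they are equal
    ("mr", "r6", "r3"),          # copy the result into r6
]

if __name__ == "__main__":
    print(run(SUM_LOOP))
```

The loop shows the usual pairing of cmpd with a conditional branch: the compare sets the status bits, and the following bne reads them.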
13. CPU Specs Three 3.2 GHz PowerPC cores
Shared 1 MB L2 cache, 8-way set associative
Per-core features
* 2-issue per cycle, in-order, decoupled vector/scalar issue queue
* 2 symmetric fine-grain hardware threads
* L1 caches: 32K 2-way I$ / 32K 4-way D$
Execution pipelines
* Branch Unit, Integer Unit, Load/Store Unit
* VMX128 Units: Floating Point Unit, Permute Unit, Simple Unit
* Scalar FPU
VMX128 enhanced for game and graphics workloads
* All execution units 4-way SIMD
* 128 128-bit vector registers per thread
* Custom dot-product instruction
* Native D3D compressed data formats
14. CPU Data Streams High bandwidth data streaming support with minimal cache thrashing
128B cache line size (all caches)
Flexible set locking in L2
Write streaming:
L1s are write through, writes do not allocate in L1
4 uncacheable write gathering buffers per core
8 cacheable, non-sequential write gathering buffers per core
Read streaming:
xDCBT data prefetch around L2, directly into L1
8 outstanding load/prefetches per core
Tight GPU data streaming integration (XPS)
XPS Xbox Procedural Synthesis
GPU 128B read from L2
GPU low latency cacheable writebacks to CPU
GPU shares D3D compressed data formats with CPU => at least 2x effective bus bandwidth for typical graphics data
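The "at least 2x effective bus bandwidth" claim follows from moving fewer bytes per element. The slide does not list the actual D3D compressed formats, so the sketch below uses 16-bit half-precision floats purely as an assumed stand-in, and counts how many of the 128-byte cache lines mentioned above a small vertex buffer occupies in each encoding. The helper names are illustrative.

```python
import struct

CACHE_LINE_BYTES = 128   # cache line size quoted on the slide


def pack_vertices(vertices, fmt_char):
    """Pack (x, y, z, w) tuples little-endian with one struct format character.

    "f" is a 32-bit float, "e" is a 16-bit half-precision float.
    """
    return b"".join(struct.pack("<4" + fmt_char, *v) for v in vertices)


def cache_lines(num_bytes, line=CACHE_LINE_BYTES):
    """Whole cache lines needed to hold num_bytes (ceiling division)."""
    return -(-num_bytes // line)


# 64 four-component vertices whose values are exactly representable in 16 bits.
VERTICES = [(float(i), 0.5, -1.0, 1.0) for i in range(64)]
FULL = pack_vertices(VERTICES, "f")   # 16 bytes per vertex
HALF = pack_vertices(VERTICES, "e")   # 8 bytes per vertex (ASSUMED stand-in format)

if __name__ == "__main__":
    print(len(FULL), "bytes,", cache_lines(len(FULL)), "cache lines as 32-bit floats")
    print(len(HALF), "bytes,", cache_lines(len(HALF)), "cache lines as 16-bit floats")
```

Halving the element size halves the number of cache lines transferred for the same vertex count, which is where a 2x effective bandwidth figure would come from under this assumption.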
15. GPU 500 MHz graphics processor
48 parallel shader cores (ALUs); dynamically scheduled; 32-bit IEEE floating point
24 billion shader instructions per second
Superscalar design: vector, scalar and texture ops per instruction
Pixel fillrate: 4 billion pixels/sec (8 per cycle); 2x for depth / stencil only
AA: 16 billion samples/sec; 2x for depth / stencil only
Geometry rate: 500 million triangles/sec
Texture rate: 8 billion bilinear filtered samples / sec
10 MB EDRAM => 256 GB/s fill
Direct3D 9.0-compatible
High-Level Shader Language (HLSL) 3.0+ support
Custom features
Memory export: Particle physics, Subdivision surfaces
Tiling acceleration: Full resolution Hi-Z, Predicated Primitives
XPS:
CPU cores can be slaved to GPU processing
GPU reads geometry data directly from L2
Hardware scaling for display resolution matching
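Most of the throughput figures on this slide are the 500 MHz clock multiplied by a per-cycle rate. The sketch below recomputes the two rates whose per-cycle numbers appear on the slide (48 ALUs, 8 pixels per cycle) and derives the per-cycle rates implied by the other quoted totals. The derived values are simple divisions of slide figures, not published specifications.

```python
# Relating the GPU throughput figures on the slide to the 500 MHz clock.

GPU_CLOCK_HZ = 500_000_000     # 500 MHz (from the slide)
SHADER_ALUS = 48               # parallel shader cores (from the slide)
PIXELS_PER_CYCLE = 8           # from the slide

# Totals quoted on the slide, per second.
QUOTED = {
    "shader_instructions": 24_000_000_000,
    "pixels": 4_000_000_000,
    "aa_samples": 16_000_000_000,
    "triangles": 500_000_000,
    "bilinear_samples": 8_000_000_000,
}


def per_second(per_cycle, clock_hz=GPU_CLOCK_HZ):
    """Throughput per second for a unit that completes per_cycle items each clock."""
    return per_cycle * clock_hz


def implied_per_cycle(per_second_rate, clock_hz=GPU_CLOCK_HZ):
    """Per-cycle rate implied by a quoted per-second total."""
    return per_second_rate // clock_hz


if __name__ == "__main__":
    print("shader instructions/s:", per_second(SHADER_ALUS))
    print("pixels/s:", per_second(PIXELS_PER_CYCLE))
    for name, rate in QUOTED.items():
        print(name, "per cycle:", implied_per_cycle(rate))
```

The quoted anti-aliasing rate works out to 32 samples per cycle, that is, 4 samples for each of the 8 pixels per cycle.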
16. GPU Block Diagram
17. Software SMP/SMT
Mainstream techniques
Everything is simplified by being symmetric
UMA
No partitioning headaches
OS
All 3 cores available for game developers
Standard APIs
Win32, OpenMP
Direct3D, HLSL
Assembly (CPU & Shader) supported - direct hardware access
Standard tools
XNA: PIX, XACT
Visual C++, works with multiple threads ...
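Because the cores are symmetric and memory is uniform, work can be divided evenly across the 6 hardware threads with no partitioning by core type. The slide names Win32 and OpenMP as the APIs; the sketch below uses Python's standard thread pool instead, purely to illustrate the symmetric split. The function names are illustrative, and Python threads do not give true parallel speedup for CPU-bound work, so this shows only the decomposition pattern.

```python
from concurrent.futures import ThreadPoolExecutor

HARDWARE_THREADS = 3 * 2   # 3 cores x 2 hardware threads per core (from the slides)


def chunk(items, parts):
    """Split items into `parts` contiguous slices of near-equal size."""
    size, extra = divmod(len(items), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        slices.append(items[start:end])
        start = end
    return slices


def sum_of_squares(part):
    """Work done by one worker on its slice."""
    return sum(x * x for x in part)


def parallel_sum_of_squares(values, workers=HARDWARE_THREADS):
    """Give every worker an equal share, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunk(values, workers)))


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(100))))
```

Every worker runs the same function on an equal-sized slice, which is the simplification the slide attributes to a symmetric design.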