Playstation3 cluster experience
Submitted to: J. Lee
By DharaKumari Patel
INDEX
– Introduction
– Hardware
– Software
– Communication
– Algorithms
– Applications
– Benchmarks
– Conclusion
INTRODUCTION
The multi-core approach was deemed the next big change in processor technology. Challenges to overcome:
– Memory Wall
  1. Streaming DMA architecture
  2. 3-level memory model: Main Storage, Local Storage, Register Files
– Power Wall
  1. High frequency at a low operating voltage with advanced power management
– Efficiency Wall
  1. Highly optimized implementation
  2. Large shared register files, SIMD and deeper pipelines
HARDWARE
PS3 Cluster
– 8 PS3s in a private VLAN (10.0.0.X)
– GigE between the nodes
– DHCP server on the front node, along with IP masquerading, host name setup, etc.
– Accessible only through the front node
SOFTWARE
Installed on PS3:
– Fedora Core 5
  1. Linux Kernel v2.6.16: no built-in Cell BE support
  2. Recommended by IBM for SDK 2.0
  3. Minimum installation with extra packages
– SDK 2.0
Installed for PS3 cluster:
– MPICH2 (MPI 2.0 standard)
  1. Compatibility issues with PowerPC architecture
SOFTWARE
SDK & Kernel Recompilation
– Two different compilers for PPE and SPE
– SDK includes compilers and debuggers, SIMD libraries, Full System Simulator, etc.
– Kernel needs to be recompiled for huge TLB pages
  1. Improves performance
  2. Fewer translations from virtual addresses to physical addresses
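To make the "fewer translations" point concrete, the sketch below counts how many pages, and therefore how many distinct virtual-to-physical translations, are needed to map a buffer. The page sizes used in the note that follows (4 KB base pages, 16 MB huge pages) are assumptions for illustration and are not stated on the slide; the function name `pages_needed` is hypothetical.

```c
#include <stddef.h>

/* Number of pages needed to map `bytes` bytes with pages of
 * `page_size` bytes, rounded up. Each page requires its own
 * virtual-to-physical translation. */
size_t pages_needed(size_t bytes, size_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}
```

Under those assumed sizes, a 64 MB buffer needs 16384 translations with 4 KB pages but only 4 with 16 MB pages, which is why far fewer TLB entries are touched.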
COMMUNICATION
Element Interconnect Bus (EIB)
– Heart of the Cell's intra-chip communication
– Runs at half the processor clock speed, with a peak bandwidth of 204.8 GB/s
– 4 rings, 2 clockwise and 2 counter-clockwise, each 16 bytes wide
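The 204.8 peak figure can be reproduced with simple arithmetic. The sketch below assumes a 3.2 GHz processor clock and at most 128 bytes moved per bus cycle across the EIB. Neither value appears on the slide, so treat both as assumptions; the function name `eib_peak_gb_per_s` is hypothetical.

```c
/* Peak EIB bandwidth in GB/s. The bus runs at half the processor
 * clock; bytes_per_cycle is the assumed maximum number of bytes
 * moved per bus cycle. */
double eib_peak_gb_per_s(double cpu_clock_ghz, double bytes_per_cycle)
{
    double bus_clock_ghz = cpu_clock_ghz / 2.0;
    return bus_clock_ghz * bytes_per_cycle;
}
```

With the assumed inputs, `eib_peak_gb_per_s(3.2, 128.0)` works out to 1.6 GHz × 128 bytes = 204.8 GB/s.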
ALGORITHMS
Basic algorithm to split data across several SPEs:
    /* SPU_SPLIT_NUM is the per-SPE chunk length (size / num_spes in the example) */
    for (i = 0; i < num_spes; i++) {
        offset = size - ((num_spes - i) * (size / num_spes));
        for (k = 0; k < SPU_SPLIT_NUM; k++) {
            spu_data[i][k] = data[k + offset];   /* one input buffer per SPE */
        }
    }
E.g. size = 1024, SPEs = 4 gives offsets 0, 256, 512, 768.
Basic algorithm to “stitch” calculated data from several SPEs:
    for (i = 0; i < num_spes; i++) {
        offset = size - ((num_spes - i) * (size / num_spes));
        for (k = 0; k < SPU_SPLIT_NUM; k++) {
            result[k + offset] = spu_result[i][k];   /* one result buffer per SPE */
        }
    }
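The split and stitch steps can be sketched as self-contained C that runs on an ordinary host. This is a minimal illustration under stated assumptions: plain loops stand in for the DMA transfers a real SPE program would use, the element type is `float`, and the helper names `chunk_offset`, `split_chunk` and `stitch_chunk` are hypothetical. Only the offset formula is taken from the slide.

```c
#include <stddef.h>

/* Start index of the chunk handled by SPE i (offset formula from the slide). */
size_t chunk_offset(size_t size, size_t num_spes, size_t i)
{
    return size - (num_spes - i) * (size / num_spes);
}

/* Copy SPE i's slice of `data` into that SPE's private buffer `chunk`,
 * which must hold at least size / num_spes elements. */
void split_chunk(const float *data, size_t size, size_t num_spes,
                 size_t i, float *chunk)
{
    size_t offset = chunk_offset(size, num_spes, i);
    size_t chunk_len = size / num_spes;
    for (size_t k = 0; k < chunk_len; k++)
        chunk[k] = data[k + offset];
}

/* Copy SPE i's computed `chunk` back into its place in `result`. */
void stitch_chunk(float *result, size_t size, size_t num_spes,
                  size_t i, const float *chunk)
{
    size_t offset = chunk_offset(size, num_spes, i);
    size_t chunk_len = size / num_spes;
    for (size_t k = 0; k < chunk_len; k++)
        result[k + offset] = chunk[k];
}
```

Calling `split_chunk` and then `stitch_chunk` for every `i` from 0 to `num_spes - 1` reproduces the input when `size` is a multiple of `num_spes`. If it is not, the formula starts the first chunk at `size % num_spes`, so the leading elements fall outside every chunk and would need separate handling.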
BENCHMARKS
Cluster (Optimized), with variable nodes and matrix sizes:
– Normal (Just PPE)
– Addition of Two Arrays
– Multiplication of Two Arrays
– Copy of Two Arrays
– Triad (Add, Multiply) + Copy
PS3 Node Scaling
Comparison to a Desktop processor
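The array operations listed above can be sketched as plain scalar C loops. The slide does not define the kernels, so everything here is an assumption: multiplication is taken to be element-wise, the triad is taken to be `a[i] = b[i] + q * c[i]`, and all function names are hypothetical. The sketch shows only the arithmetic being measured, with no timing, SIMD or SPE offloading.

```c
#include <stddef.h>

/* Copy: a[i] = b[i] */
void array_copy(double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i];
}

/* Addition of two arrays: a[i] = b[i] + c[i] */
void array_add(double *a, const double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Multiplication of two arrays (assumed element-wise): a[i] = b[i] * c[i] */
void array_mul(double *a, const double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] * c[i];
}

/* Triad (assumed form, one add and one multiply): a[i] = b[i] + q * c[i] */
void array_triad(double *a, const double *b, const double *c,
                 double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}
```

Each kernel does very little arithmetic per element loaded, so measured throughput is likely to reflect how quickly data can be moved rather than raw compute speed.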
CONCLUSION
– Cluster setup is similar to setting up any other generic cluster
– A Cell cluster can provide good, scalable performance
– Programming the Cell has a very steep learning curve
– Algorithms and incoming data have to be manipulated to take full advantage of the Cell