
Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting

Presentation Transcript


1. Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting
   Bill Dally, Computer Systems Laboratory, Stanford University
   December 10, 2002

2. Overview
• Where we are today
  • First-year goal was met: demonstrated feasibility on a single node
  • Feedback from the site visit team was very positive
  • Potential for a big impact on scientific computing
  • But still much to do!
• Key FY03 goals
  • Get long-term software infrastructure in place
    • Select approach; implement a baseline Brook-to-SSS compiler
  • Multi-node versions that scale
    • Language, compiler, simulator
  • Tackle hard problems: 3-D, irregular neighborhoods / sparse matrix solve
    • Language support, numerics support, evaluate on the simulator
  • Refine architecture
    • Cluster organization, aspect ratio, register organization, memory organization
  • Industrial partner
    • Start serious discussions, outreach to build support, close a partner in ’04

3. But first, let’s review our overall goal: exploit the capabilities of VLSI to realize cost-effective scientific computing.

4. The big picture
• VLSI technology enables us to put TeraOPS on a chip
  • Conventional general-purpose architectures cannot exploit this
  • The problem is bandwidth
• Streams expose locality and concurrency
  • Operations are performed in record order, not operation order as with vectors (see the sketch below)
  • Enables compiler optimization at a larger scale than scalar processing
• A stream architecture achieves high arithmetic intensity
  • Intensity = arithmetic rate / bandwidth
  • Bandwidth hierarchy, compound stream operations
• A Streaming Supercomputer is feasible
  • 100 GFLOPS (64-bit) on a chip, 1 TFLOPS single-board computer, PFLOPS systems
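
To make the record-order point concrete, here is a minimal C sketch (illustrative only; the record type, kernel, and names are hypothetical, not SSS or Brook code) of processing a stream of records one at a time, keeping all per-record arithmetic local so the only memory traffic is one record read and one record write:

```c
#include <stddef.h>

/* Hypothetical stream record: all fields travel through the kernel together. */
typedef struct { double x, y, z, q; } Particle;

/* Record-order processing: each record is loaded once, all arithmetic on it is
 * done from local (register/SRF-like) storage, and one result is stored.
 * A vector machine would instead sweep operation-by-operation over whole
 * arrays, spilling intermediates to memory between operations. */
void scale_particles(const Particle *in, Particle *out, size_t n, double k)
{
    for (size_t i = 0; i < n; i++) {
        Particle p = in[i];                 /* one record read  */
        p.x *= k; p.y *= k; p.z *= k;       /* local arithmetic */
        p.q *= k;
        out[i] = p;                         /* one record write */
    }
}
```

In the sense of the intensity formula above, this kernel does roughly 4 FLOPs per record against 2 * sizeof(Particle) = 64 bytes moved, about 0.06 FLOP/byte; real stream kernels raise that ratio by doing far more arithmetic per record between the read and the write.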

5. Review – What is the SSS Project About?
• Exploit streams to give a 100x improvement in performance/cost for scientific applications vs. ‘cluster’ supercomputers
  • From 100 GFLOPS PCs to TFLOPS single-board computers to PFLOPS supercomputers
• Use a layered programming system to simplify development and tuning of applications
  • Stream languages
  • Streaming virtual machine
• Demonstrated feasibility of streaming scientific computing in year 1
• Refine architecture and programming system in year 2
  • Demonstrate realistic applications (3D, irregular)
  • Build a usable compiler
  • Resolve architecture questions: aspect ratio, conditional execution, sparse clusters, register organization, memory system, etc.
• Build a prototype and demonstrate CITS applications in years 3–6
  • With industrial and government partners
  • Broaden our base of support

6. Software Infrastructure
• Compiler
  • Decide on flow from Brook -> SVM -> SSS
  • Select base compiler
    • ORC, Gnu, SUIF, Tendra, others…
  • “Spike” a simple program from Brook -> SSS (see the sketch below)
  • Optimizations
• SVM simulator
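
As an illustration of what a compiler “spike” might push through the flow first, here is the kind of trivial single-kernel program one would use, written as a plain C loop purely for exposition since no SSS Brook source appears in this deck (in Brook it would be a single elementwise stream kernel):

```c
#include <stddef.h>

/* Minimal elementwise kernel of the sort a Brook -> SVM -> SSS "spike" would
 * carry end-to-end before attempting real applications: out = a*x + y. */
void saxpy(size_t n, float a, const float *x, const float *y, float *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a * x[i] + y[i];
}
```

The point of a spike is not the kernel itself but exercising every stage of the flow (parsing, SVM code generation, SSS code generation) on something small enough to debug by hand.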

7. 3-D Applications
• StreamFLO
• StreamFEM
• StreamMD/GROMACS

8. Irregular Grids
• Need an application
• Brook support for variable degree (see the sketch below)
• Architecture/run-time support
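
For a sense of what “variable degree” means for the data layout, here is a hedged C sketch (names and layout are hypothetical, chosen only for illustration) of a CSR-style irregular neighborhood structure, the shape that variable-length-record support in the language and run time would have to stream efficiently:

```c
#include <stddef.h>

/* Hypothetical irregular-grid connectivity in CSR form: node i's neighbors are
 * nbr[offset[i]] .. nbr[offset[i+1]-1], so every "record" has a different
 * (variable) degree rather than a fixed-width neighborhood. */
typedef struct {
    size_t        n_nodes;
    const size_t *offset;   /* length n_nodes + 1 */
    const size_t *nbr;      /* length offset[n_nodes] */
} IrregularGrid;

/* Gather-style sparse update: y[i] = sum over neighbors j of w[j] * x[j],
 * the access pattern behind an irregular stencil or a sparse
 * matrix-vector product. */
void sparse_gather(const IrregularGrid *g, const double *w,
                   const double *x, double *y)
{
    for (size_t i = 0; i < g->n_nodes; i++) {
        double acc = 0.0;
        for (size_t k = g->offset[i]; k < g->offset[i + 1]; k++) {
            size_t j = g->nbr[k];
            acc += w[j] * x[j];
        }
        y[i] = acc;
    }
}
```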

9. Multi-Node Execution
• Brook support
• Manual partitioning for the first step (see the sketch below)
• Simple application on the SVM simulator
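
As a sketch of what manual partitioning could look like for a first multi-node step (the function and block layout are hypothetical; communication is left out entirely), here is a 1-D block decomposition in C that gives each node a contiguous range of cells to own, with halo exchange to be added by hand around each stencil sweep:

```c
#include <stddef.h>

/* Owned range of a 1-D domain on one node: cells [first, first + count). */
typedef struct { size_t first, count; } Block;

/* Hand-rolled block partition of n cells over p nodes: the first (n % p) nodes
 * get one extra cell so block sizes differ by at most one. Halo cells on
 * either side of the block would be refreshed from neighboring nodes before
 * each sweep (not shown). */
Block partition_1d(size_t n, int p, int rank)
{
    size_t base = n / (size_t)p, rem = n % (size_t)p, r = (size_t)rank;
    Block b;
    b.first = r * base + (r < rem ? r : rem);
    b.count = base + (r < rem ? 1 : 0);
    return b;
}
```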

10. Industrial Partner
• Candidates
  • Cray, IBM, Sun, HP, SGI, Intel
• Initial discussion
  • Present the SSS project and results to date
  • Discuss collaboration models
  • Identify next steps

11. Outreach
• National labs
  • Los Alamos
  • Livermore
  • Sandia
• Other government
  • NASA
  • DARPA
  • DoD (Charlie Holland)
  • AFOSR
• User communities

12. Software Fall 02 Goals
• Brook
  • Multi-node issues:
    • Synchronization primitives
    • Data partitioning
    • Variable-length records
• SVM
  • Multi-node simulator
  • Performance numbers for 3 apps
• Compilation
  • Pick new infrastructure & design compiler (Reservoir)
  • Generate SVM code from Brook (StreamC to SVM)
  • SVM to {SMP, graphics, SSS} (SVM is SMP)
• Run time (software services)
  • Identify issues
• Issues
  • Variable-length records? With stencils?

13. Software Win 02 Goals – Brook
• Carefully define the semantics of the operators
• Work on the “views of memory” abstraction
  • Support for partitioning, shared memory, naming, fitting into the stream abstraction
• Support for irregular neighborhoods
• Multithreaded version (Christos)
• Concrete winter goals [Ian/Frank]
  • Review of the language [Pat]
  • Partitioning (UPC)
  • Multi-node/multithreaded version
  • Irregular support – with an application
  • PPoPP paper
  • MD on BRT

14. Software Win 02 Goals – SVM
• Finish prototype single-node implementation [Done]
  • Compiler issue
• Implement multi-node version
  • w/ multi-node app
  • Start with one that runs on one processor [Francois]
  • Multithreaded on SMP – on SGI [+]
  • Cluster version [++]
• SVM-to-simulator path
  • Mattan – not an intermediate between Brook and SSS

15. Software Win 02 Goals (3 of 3)
• Start regular meetings
• Compiler
  • Decide on flow from Brook -> SVM -> SSS [Mattan]
    • Requirements
  • Select base compiler [Jayanth]
    • ORC, Gnu, SUIF, Tendra, others…
  • “Spike” a simple program from Brook -> SSS [Mattan/Jayanth ++]
    • Brook to Nvidia
  • Optimizations [Spring]
• Run time
  • Write a white paper

16. Application Fall 02 Goals
• StreamMD
  • Migrate to GROMACS
• StreamFLO
  • Complete
  • 3D
• StreamFEM
  • 3D
  • Sparse LA
• Scalability – multiple nodes
• Look at Sierra, Purple benchmarks: PPM, Sweep3D

17. Application Win 02 Goals
• StreamFLO [Fatica]
  • Partitioned version; scalable
  • Convert to 3D
• StreamFEM [Barth]
  • Partitioned version; scalable
  • Convert to 3D
  • Sparse LA
• StreamMD [Eric/student]
  • Migrate to GROMACS [Vijay Pande/Michael Levitt groups]
  • Redo inner (force) and outer (neighbor) loops
  • Partitioned version; scalable
  • Finish port to NV30: build cluster and Folding@home
• Model applications [Ron/Frank]
  • Model PDEs with sparse matrix solves
  • An irregular application [Ron/Frank]
• Look at Sierra, Purple benchmarks: PPM, Sweep3D [delay]

18. Architecture Fall 02 Goals
• Simulator
  • Multi-node working
  • Indexable SRF
  • Scalar processor
• Point studies
  • Conditionals
  • Aspect ratio
  • Indexable SRF
  • Add & store (remote ops in general)
  • Iterative operations & extended precision
• Network
  • Spec
• Flesh out I/O
• App studies

19. Architecture Win 02 Goals
• Single-node simulator [Jung-Ho, Knight]
  • 64-bit support, MULADD, scalar processor
• Multi-node simulator [Jung-Ho, Abhishek]
  • Network model
  • Multi-node mechanisms
• Point studies
  • Aspect ratio
    • SSE vs. VLIW
  • Conditional execution [Mattan/Ujval]
  • Sparse clusters
  • SRF organization [Nuwan]
  • Cache alternatives [Jung Ho]
  • Add-and-store study [Jung Ho]
  • I/O
  • Iterative operations [Francois]

20. Special Win 02 Goals
• Fix website [Pat]
  • Public and private websites
• Name that computer
  • Mississippi
  • Axios
  • Submit names to Mattan
  • Bill, Pat, Bill to choose
• Project party

21. Winter Quarter Meeting Schedule
• 1/7 – Ron – Anything
• 1/14 – Francois/Mattan – What is SVM
• 1/21 – Fatica – 3D Flo
• 1/28 – Pat – RTSL partitioning
• 2/4 – Bill Carlson [Pat] – UPC
• 2/11 – Francois/Ian – Discussion of targets: SSS/CG/MPI
• 2/18 – Tim B. – Irregular grid
• 2/25 – Mattan – Compilation infrastructure
• 3/4 – Jung Ho – Add & store
• 3/11 – Bill – Wrapup

22. Papers
• Architecture
  • Indexable SRFs (Nuwan)
  • Streaming Supercomputer overview (Tim K.)
  • Streaming on conventional CPUs (Mattan)
  • Conditionals (Ujval)
  • Remote ops (Jung Ho)
  • Aspect ratio (?)
    • Data parallel (SSE) vs. ILP (VLIW)
• Software
  • Design of Brook (Ian)
  • Data-parallel programming on graphics HW (Pat)
    • Brook to CG
  • Compiler
• Apps
  • GROMACS
  • StreamFEM (Tim2)
  • Overview (Bill and Pat)
