Architecture of Parallel Computers CSC / ECE 506 Summer 2006 Introduction / Overview. 5/22/2006 Dr Steve Hunter. Architecture of Parallel Computers. Taught jointly by Dr Ed Gehringer and Dr Steve Hunter Course days Monday 4:00 – 6:45 Wednesday 4:00 – 5:15
Architecture of Parallel Computers
CSC / ECE 506, Summer 2006
Introduction / Overview
5/22/2006
Dr Steve Hunter
Architecture of Parallel Computers
• Taught jointly by Dr Ed Gehringer and Dr Steve Hunter
• Course days
  • Monday 4:00 – 6:45
  • Wednesday 4:00 – 5:15
• Goal: Understand the interaction of hardware and software with respect to parallel systems design and implementation.
• Textbook: “Parallel Computer Architecture”, by Culler and Singh
• Selected papers possible
Architecture of Parallel Computers
• Steve’s Info:
  • NCSU Adjunct Professor
  • IBM Corporation
  • Website: http://www.ee.duke.edu/~shunter/
  • email: hunters@us.ibm.com
• Academic
  • Auburn University BSEE
  • NC State University MSEE
  • Duke University PhD
• IBM Corporation
  • IBM Networking Division 14 years
  • Systems and Technology Group 8 years
• Areas of Interest
  • Systems and Network Architecture and Technology
  • Computer and Network Performance and Dependability
  • Server Clustering and Software Dependability
Course Outline (Tentative)
http://courses.ncsu.edu/csc506/lec/052/lectures/syllabus.html
What is Parallel Computer Architecture?
• A Parallel Computer is a collection of processing elements that cooperate to solve large problems fast
• Some broad issues:
  • Resource Allocation:
    • how large a collection?
    • how powerful are the elements?
    • how much memory?
  • Data access, Communication and Synchronization
    • how do the elements cooperate and communicate?
    • how are data transmitted between processors?
    • what are the abstractions and primitives for cooperation?
  • Performance and Scalability
    • how does it all translate into performance?
    • how does it scale?
Historical Perspective
• Parallel computing was represented by competing models and corresponding unique architectures, with no clear path for growth
• Competing Methods
  • Dataflow
  • Systolic Arrays
  • SIMD (bit serial)
  • Shared Memory
  • Message passing
• Confusion over which model to use paralyzed parallel software development
• Section 1.2 shows several architectures.
  • Shared-Memory Multiprocessors
    • Bus-based; Crossbar-based; MIN-based (multistage interconnection network)
  • Message Passing Machines (Hypercube)
  • IBM SP2 Architecture
Why Study Parallel Computer Architecture?
• Role of a computer architect:
  • To design and engineer the various levels of a computer system to maximize performance and programmability within limits of technology and cost.
• Parallelism:
  • Provides an alternative to a faster clock for performance
  • Applies at all levels of system design
  • Is a fascinating perspective from which to view architecture
  • Traditionally, it has centered on processing elements in the same locality
  • However, greater networking bandwidth is expanding parallelism over greater distances.
Parallel Computation: Why and Why Not?
• Pros
  • Performance
  • Cost-effectiveness (commodity parts)
  • Smooth upgrade path
  • Fault Tolerance
• Cons
  • Difficult to parallelize applications
  • Requires automatic parallelization or parallel program development
  • Software!
Is Parallel Computing Inevitable?
• Application demands: (the need for computing cycles)
  • Petroleum (reservoir analysis)
  • Automotive (crash simulation, drag analysis, combustion efficiency)
  • Aeronautics (airflow analysis, engine efficiency, structural mechanics, electromagnetism)
  • Computer-aided design
  • Pharmaceuticals (molecular modeling)
  • Visualization
    • in all of the above
    • entertainment (films like Toy Story, The Hulk)
    • architecture (walk-throughs and rendering)
  • Financial modeling (yield and derivative analysis)
  • Search Engines
  • etc.
Application Trends
[Figure: cycle between "New Applications" and "More Performance"]
• Application demand for performance fuels advances in hardware, which enables new applications, which...
  • Cycle drives exponential increase in microprocessor performance
  • Drives parallel architecture harder
    • most demanding applications
• Range of performance demands
  • Need range of system performance with progressively increasing cost
Speedup
• Speedup (p processors) = $\frac{\text{Performance (p processors)}}{\text{Performance (1 processor)}}$
• For a fixed problem size (input data set), performance = 1/time
• Speedup fixed problem (p processors) = $\frac{\text{Time (1 processor)}}{\text{Time (p processors)}}$
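The two speedup definitions on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the timings (120 s on one processor, 20 s on eight) are hypothetical and are not measurements from the course.

```python
def speedup_from_performance(perf_p: float, perf_1: float) -> float:
    """Speedup = Performance(p processors) / Performance(1 processor)."""
    if perf_p <= 0 or perf_1 <= 0:
        raise ValueError("performance values must be positive")
    return perf_p / perf_1


def speedup_from_times(time_1: float, time_p: float) -> float:
    """Fixed problem size: Speedup = Time(1 processor) / Time(p processors)."""
    if time_1 <= 0 or time_p <= 0:
        raise ValueError("times must be positive")
    return time_1 / time_p


# Hypothetical timings for the same input data set (assumed, not measured).
time_1 = 120.0  # seconds on 1 processor
time_p = 20.0   # seconds on p = 8 processors

# For a fixed problem size, performance = 1 / time, so both forms agree
# (up to floating-point rounding).
by_time = speedup_from_times(time_1, time_p)
by_perf = speedup_from_performance(1.0 / time_p, 1.0 / time_1)
print(f"speedup from times:       {by_time:.2f}")
print(f"speedup from performance: {by_perf:.2f}")
```

With these assumed numbers the speedup on 8 processors is 6, which is below the processor count.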
Is Parallel Computing Inevitable?
• Technology Trends
  • Chip technology continues to increase in density
  • Driving up the frequency of single-core designs requires too much power
  • Use of commodity or off-the-shelf technology for low costs
  • Multi-core processing becoming common among mainstream microprocessors (e.g., AMD, IBM, Intel)
  • Greater interconnect bandwidth becoming generally available
    • Standard interconnects: InfiniBand, 10Gb Ethernet
• Architecture Trends
  • Packaging parallel solutions in a common chassis
    • e.g., Blade servers (IBM, HP, Dell, etc.)
  • Software being packaged for mainstream solutions
    • e.g., Windows Compute Cluster Server 2003
  • High availability commonly achieved by clustering of processing elements
Is Parallel Computing Inevitable?
• Economics
  • The falling cost of low-end servers (dual- and quad-socket), combined with high-bandwidth interconnects, is driving applications to be parallel
  • Commodity microprocessors are not only fast but CHEAP
    • Development costs tens of millions of dollars
    • BUT, many more are sold compared to supercomputers
  • Crucial to take advantage of the investment, and use the commodity building block
  • Multiprocessors being pushed by software vendors (e.g. database) as well as hardware vendors
  • Standardization makes small, bus-based SMPs a commodity
  • Desktop: few smaller processors versus one larger one?
  • Multiprocessor on a chip?
Scale Up vs Scale Out Model
[Figure: Scale Up / SMP Computing (Large SMP: x455, x445) versus Scale Out / Distributed Computing (Large Parallel Clusters: High Density Rack Mount xSeries 335 / eServer 325, BladeCenter™)]
Blade Server Example - BladeCenter
• BladeCenter (Nov 2002)
  • 7U Chassis Form Factor
  • Highest Density, Lowest Cost
  • Super power efficient, Consolidated Management
• BladeCenter T (March 2004)
  • 8U Chassis Form Factor
  • Highly rugged, Telco AC/DC, Long Life, NEBS, Air Filtration
• BladeCenter H (Jan 2006)
  • 9U Chassis Form Factor
  • Ultra High Performance
  • 4x IB / 10Gb Backplane
  • New Management Module
• Compatible Set of Blades and Switches
• Applications listed on the slide: Web hosting/serving; FSS; File/Print; Geophysical Analysis; Collaboration; Graphic Rendering; Telco/Core Applications; Government; Military; Rugged Industrial; DC; Medical; HPC Applications; Technical Clusters; Virtual Enterprise Solutions; Future I/O
• One family, many applications, many environments, long-term investment protection
• BladeCenter: Simply Smarter IT
Blade Server Example – BladeCenter H
• Fourteen Blades in a 9U Chassis Form Factor
• Blade and switch compatibility across BladeCenter and BladeCenter-T
• High performance networking fabrics
  • New high performance switches and blade I/O
  • Corresponding bridge bays for protocol translation
• Power Enhancements
  • Four front load 2900W Power Supplies
BladeCenter Overview
• Switching Modules
  • Ethernet
  • Fibre Channel
  • InfiniBand
• Blade I/O Card (or local drive)
  • I/O card matches switch technology in corresponding slot
BladeCenter H Architecture
[Figure: Blades 1 through 14 connected to HS Switches 1–4, I/O Bridges (including I/O Bridge 3 / SM3 and I/O Bridge 4 / SM4), Switch Modules 1–2, and Mgmt Modules 1–2]
• High-speed Switch
  • Ethernet or InfiniBand
  • 4x (16 wire) blade links
  • 4x (16 wire) bridge links
  • 1x (4 wire) Mgmt links
  • Uplinks: Up to 12x links for IB and at least four 10Gb links for Ethernet
• I/O Bridge
  • e.g., Ethernet, Fibre Channel, Passthru
  • Dual 4x (16 wire) wiring internally to each HSSM
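The link widths quoted on this slide follow from simple per-lane arithmetic, sketched below. Two things here are assumptions rather than statements from the deck: that each lane uses 4 wires (one differential pair per direction, which is consistent with "1x (4 wire)" and "4x (16 wire)"), and that each lane signals at the InfiniBand single data rate of 2.5 Gbit/s (which is consistent with the "4x (10G)" figure quoted elsewhere in the deck). The function names are hypothetical.

```python
# Assumption: 4 wires per lane (one differential pair in each direction).
WIRES_PER_LANE = 4
# Assumption: single data rate (SDR) signalling, 2.5 Gbit/s per lane.
SDR_GBPS_PER_LANE = 2.5


def link_wires(lanes: int) -> int:
    """Number of wires in a link that is `lanes` lanes wide."""
    return lanes * WIRES_PER_LANE


def link_signalling_gbps(lanes: int) -> float:
    """Raw signalling rate of the link in Gbit/s under the SDR assumption."""
    return lanes * SDR_GBPS_PER_LANE


# Link widths mentioned on the slide: 1x (Mgmt), 4x (blade/bridge), 12x (IB uplink).
for lanes in (1, 4, 12):
    print(f"{lanes}x link: {link_wires(lanes)} wires, "
          f"{link_signalling_gbps(lanes)} Gbit/s signalling")
```

The rates are raw signalling rates; usable data bandwidth is lower once link encoding overhead is taken into account.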
InfiniBand on BladeCenter H: Enabling High Performance and Virtualized I/O
• Expanding BladeCenter Ecosystem with Cisco Systems
  • Switch module and daughter card designed for BladeCenter H
  • Daughter card provides dual port 4x (10G) InfiniBand connectivity to each blade
• Help Reduce Data Center Complexity
  • Reduce the number of adapters, cables, and switch ports required
  • Manage the addition or removal of I/O or storage bandwidth centrally
  • Enable users to adjust resources on demand without downtime
• High Performance Computing Features
  • Leverages RDMA to deliver low latency performance
  • Delivers higher bandwidth connectivity (160 Gbps to chassis)
  • Achieve blade port consolidation through remote I/O
  • I/O Virtualization via Cisco VFrame
• BladeCenter H InfiniBand Solution provides high-speed, low latency solutions while lowering TCO
The End
Grid Example
[Figure labels: Intra-Grids, Extra-Grids, Inter-Grids; Cactus; (SF) Express; NTG Project; Grid; VPN; NAS/SAN; Fin. Services; MFG. Timeline annotations: 2003; Commerce with Trusted Partners; 2006+ "Full Commercialization" with unknown partners]
Courtesy of Ellen Stokes