This article provides a brief history of high-performance computing (HPC) and explores the shift from sequential to parallel computing, the rise of clusters and Beowulf Clusters, the role of grid and peer-to-peer computing, and the impact of commodity hardware. It also discusses the challenges and advancements in HPC architecture and software development.
Technical computing: Observations on an ever-changing, occasionally repetitious, environment. Los Alamos National Laboratory, 17 May 2002
A brief, simplified history of HPC • Sequential & data parallelism using shared memory, Cray’s Fortran computers 60-02 (US:90) • 1978: VAXen threaten general purpose centers… • NSF response: form many centers 1988 - present • SCI: Search for parallelism to exploit micros 85-95 • Scalability: “bet the farm” on clusters. Users “adapt” to clusters, aka multi-computers, with the LCD (lowest common denominator) program model, MPI. >95 • Beowulf Clusters adopt standardized hardware and Linus’s software to create a standard! >1995 • “Do-it-yourself” Beowulfs impede new structures and threaten general purpose (g.p.) centers >2000 • 1997-2002: Let’s tell NEC they aren’t “in step”. • High speed networking enables peer2peer computing and the Grid. Will this really work?
Outline • Retracing scientific computing evolution: Cray, SCI & “killer micros”, ASCI, & Clusters kick in. • Current taxonomy: cluster flavors • Déjà vu: the rise of commodity computing. Beowulfs are a replay of VAXen c1978 • Centers: 2+1/2 at NSF; BRC on CyberInfrastructure urges 650M/year • Role of Grid and Peer-to-peer • Will commodities drive out or enable new ideas?
DARPA SCI: c1985-1995; prelude to DOE’s ASCI • Motivated by Japanese 5th Generation … note the creation of MCC • Realization that “killer micros” were • Custom VLSI and its potential • Lots of ideas to build various high performance computers • Threat and potential sale to military
Steve Squires & G Bell at our “Cray” at the start of DARPA’s SCI c1984.
What Is the System Architecture? (GB c1990) [Figure: architecture chart with alternatives crossed out; surviving labels: SIMD, GRID]
Processor Architectures? VECTORS or VECTORS • CS view: MISC >> CISC >> Language-directed RISC >> Super-scalar >> Extra-Long Instruction Word. Caches: mostly alleviate need for memory B/W • SC designers' view: RISC >> VCISC (vectors) >> Massively parallel (SIMD) (multiple pipelines). Memory B/W = perf.
The Bell-Hillis Bet c1991: Massive (>1000) Parallelism in 1995 [Figure: TMC vs. world-wide supers, compared on three measures: applications, petaflops / mo., and revenue]
Results from DARPA’s SCI c1983 • Many research and construction efforts … virtually all new hardware efforts failed except Intel and Cray. • DARPA directed purchases… screwed up the market, including the many VC funded efforts. • No Software funding! • Users responded to the massive power potential with LCD software. • Clusters, clusters, clusters using MPI. • It’s not scalar vs vector, it’s memory bandwidth! • 6-10 scalar processors = 1 vector unit • 16-64 scalars = a 2 – 6 processor SMP
Dead Supercomputer Society: ACRI; Alliant; American Supercomputer; Ametek; Applied Dynamics; Astronautics; BBN; CDC; Convex; Cray Computer; Cray Research; Culler-Harris; Culler Scientific; Cydrome; Dana/Ardent/Stellar/Stardent; Denelcor; Elexsi; ETA Systems; Evans and Sutherland Computer; Floating Point Systems; Galaxy YH-1; Goodyear Aerospace MPP; Gould NPL; Guiltech; Intel Scientific Computers; International Parallel Machines; Kendall Square Research; Key Computer Laboratories; MasPar; Meiko; Multiflow; Myrias; Numerix; Prisma; Tera; Thinking Machines; Saxpy; Scientific Computer Systems (SCS); Soviet Supercomputers; Supertek; Supercomputer Systems; Suprenum; Vitesse Electronics
What a difference 25 years AND spending >10x makes! • ESRDC: 40 Tflops, 640 nodes (8 - 8GFl P.vec/node) • LLNL 150 Mflops machine room c1978
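The 40 Tflops figure can be checked from the node counts on the slide. The sketch below assumes that "8 - 8GFl P.vec/node" means eight vector processors per node at 8 Gflops peak each; the variable and function names are illustrative, not from the talk.

```python
# Peak-rate arithmetic from the slide's figures (a sketch, not a benchmark).
NODES = 640                  # nodes in the ESRDC machine (from the slide)
VECTOR_PROCS_PER_NODE = 8    # assumed reading of "8 - 8GFl P.vec/node"
GFLOPS_PER_PROC = 8.0        # assumed peak Gflops per vector processor
LLNL_1978_MFLOPS = 150.0     # LLNL machine room c1978 (from the slide)


def peak_tflops(nodes, procs_per_node, gflops_per_proc):
    """Aggregate peak rate in Tflops for a cluster of identical nodes."""
    return nodes * procs_per_node * gflops_per_proc / 1000.0


es_tflops = peak_tflops(NODES, VECTOR_PROCS_PER_NODE, GFLOPS_PER_PROC)
# Convert Tflops to Mflops before dividing by the 1978 figure.
speedup = es_tflops * 1e6 / LLNL_1978_MFLOPS

print(f"{es_tflops:.2f} Tflops peak, about {speedup:,.0f}x the 1978 machine room")
```

Under these assumptions the product is 40.96 Tflops, consistent with the slide's rounded "40 Tflops".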
Computer types [Figure: chart of computer types (micros and vector; mainframes, multis, WSs, PCs) against connectivity (WAN/LAN, SAN, DSM, SM). Surviving labels: Networked supers…; VPP uni; NEC mP; NEC super; Cray X…T (all mPv); old world; clusters; GRID & P2P; Legion; Condor; Beowulf; NT clusters; T3E; SP2 (mP); NOW; SGI DSM clusters & SGI DSM]
Top500 taxonomy… everything is a cluster aka multicomputer • Clusters are the ONLY scalable structure • Cluster: n interconnected computer nodes operating as one system. Nodes: uni- or SMP. Processor types: scalar or vector. • MPP = miscellaneous, not massive (>1000), SIMD or something we couldn’t name • Cluster types. Implied message passing. • Constellations = clusters of >=16 P, SMP • Commodity clusters of uni or <=4 Ps, SMP • DSM: NUMA (and COMA) SMPs and constellations • DMA clusters (direct memory access) vs msg. pass • Uni- and SMP vector clusters: Vector Clusters and Vector Constellations
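The cluster types above can be read as a small classification rule. The following is a rough sketch of that rule, not the Top500's own definition: the `Cluster` type and `classify` function are hypothetical names, the DSM and DMA categories are left out, and scalar nodes with 5 to 15 processors (which the slide does not name) simply fall through to plain "cluster".

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    nodes: int            # number of interconnected computer nodes
    procs_per_node: int   # 1 = uniprocessor node, >1 = SMP node
    vector: bool = False  # True if the processors are vector units


def classify(c: Cluster) -> str:
    """Name a cluster using the slide's thresholds (illustrative only)."""
    if c.nodes < 2:
        return "not a cluster"
    # Constellations are clusters whose SMP nodes have 16 or more processors.
    base = "constellation" if c.procs_per_node >= 16 else "cluster"
    if c.vector:
        return "vector " + base
    # Commodity clusters use uniprocessor nodes or SMPs of at most 4.
    if c.procs_per_node <= 4:
        return "commodity cluster"
    return base


for example in (Cluster(64, 2), Cluster(8, 32), Cluster(640, 8, vector=True)):
    print(example, "->", classify(example))
```

The thresholds are the only part taken from the slide; everything else is a design choice made to keep the example short.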
Linux - a web phenomenon • Linus Torvalds writes news reader for his PC • Puts it on the internet for others to play with • Others add to it, contributing to open source software • Beowulf adopts early Linux • Beowulf adds Ethernet drivers for essentially all NICs • Beowulf adds channel bonding to kernel • Red Hat distributes Linux with Beowulf software • Low level Beowulf cluster management tools added
The Challenge leading to Beowulf • NASA HPCC Program begun in 1992 • Comprised Computational Aero-Science and Earth and Space Science (ESS) • Driven by need for post-processing data manipulation and visualization of large data sets • Conventional techniques imposed long user response time and shared resource contention • Cost low enough for dedicated single-user platform • Requirement: 1 Gflops peak, 10 Gbyte, < $50K • Commercial systems: $1000/Mflops or $1M/Gflops
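The gap between the requirement and commercial pricing is easy to quantify. The sketch below uses only the numbers on the slide; the variable names are illustrative.

```python
# Price/performance gap implied by the Beowulf requirement (figures from the slide).
MFLOPS_PER_GFLOPS = 1000

target_budget_dollars = 50_000.0       # "< $50K"
target_peak_gflops = 1.0               # "1 Gflops peak"
commercial_dollars_per_mflops = 1000.0 # "$1000/Mflops"

# Dollars per Mflops that the requirement allows at most.
target_dollars_per_mflops = target_budget_dollars / (
    target_peak_gflops * MFLOPS_PER_GFLOPS
)
# How much cheaper per Mflops the target is than commercial systems.
improvement = commercial_dollars_per_mflops / target_dollars_per_mflops

print(f"Target: ${target_dollars_per_mflops:.0f}/Mflops, "
      f"{improvement:.0f}x better than commercial pricing")
```

In other words, the requirement asked for roughly a 20-fold improvement in price/performance over commercial systems of the time.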
The Virtuous Economic Cycle drives the PC industry… & Beowulf [Figure: cycle diagram. Labels: Standards; Volume; Competition; Greater availability @ lower cost; Attracts suppliers; Innovation; Creates apps, tools, training; Utility/value; Attracts users; DOJ]
Lessons from Beowulf • An experiment in parallel computing systems • Established vision: low-cost, high-end computing • Demonstrated effectiveness of PC clusters for some (not all) classes of applications • Provided networking software • Provided cluster management tools • Conveyed findings to broad community • Tutorials and the book • Provided design standard to rally community! • Standards beget: books, trained people, software … virtuous cycle that allowed apps to form • Industry begins to form beyond a research project (Courtesy Thomas Sterling, Caltech.)
Clusters: Next Steps • Scalability… • They can exist at all levels: personal, group, … centers • Clusters challenge centers… given that smaller users get small clusters
Disk Evolution (scale: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta) • Capacity: 100x in 10 years; 1 TB 3.5” in 2005; 20 TB? in 2012?! • System on a chip • High-speed SAN • Disk replacing tape • Disk is super computer!
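A growth rate of 100x per decade implies a fixed annual factor, which can be used to sanity-check the 2012 guess. This is a minimal sketch assuming steady exponential growth from the slide's 1 TB in 2005; the function names are illustrative.

```python
def annual_growth_factor(total_factor: float, years: float) -> float:
    """Per-year multiplier that compounds to total_factor over the given years."""
    return total_factor ** (1.0 / years)


def projected_capacity_tb(start_tb: float, start_year: int, year: int,
                          yearly_factor: float) -> float:
    """Capacity in TB at `year`, assuming constant exponential growth."""
    return start_tb * yearly_factor ** (year - start_year)


yearly = annual_growth_factor(100.0, 10.0)  # 100x in 10 years (from the slide)
cap_2012 = projected_capacity_tb(1.0, 2005, 2012, yearly)

print(f"Annual growth: {100 * (yearly - 1):.0f}% per year")
print(f"Trend-line capacity in 2012: {cap_2012:.1f} TB")
```

The trend works out to roughly 58% per year and about 25 TB in 2012, so the slide's "20 TB?" is in line with, and slightly below, its own 100x-per-decade rule.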
Intermediate Step: Shared Logic • Brick with 8-12 disk drives • 200 mips/arm (or more) • 2xGbps Ethernet • General purpose OS • 10k$/TB to 100k$/TB • Shared: sheet metal, power, support/config, security, network ports • These bricks could run applications, e.g. SQL, Mail… • Examples: Snap ~1TB 12x80GB NAS; NetApp ~.5TB 8x70GB NAS; Maxtor ~2TB 12x160GB NAS; IBM TotalStorage ~360GB 10x36GB NAS
RLX “cluster” in a cabinet • 366 servers per 44U cabinet • Single processor • 2 - 30 GB/computer (24 TBytes) • 2 - 100 Mbps Ethernets • ~10x perf*, power, disk, I/O per cabinet • ~3x price/perf • Network services… Linux based • Footnote *: 42, 2 processors, 84 Ethernet, 3 TBytes
Computing in small spaces @ LANL (RLX cluster in building with NO A/C) • 240 processors @ 2/3 GFlops • Fill the 4 racks -- gives a Teraflops
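The "fill the 4 racks" claim can be checked with simple arithmetic. The sketch below assumes each rack holds the 366 single-processor servers quoted on the RLX cabinet slide and that every processor delivers the same 2/3 GFlops; the constant names are illustrative.

```python
# Aggregate peak rate for the LANL RLX installation (a sketch under stated assumptions).
GFLOPS_PER_PROCESSOR = 2.0 / 3.0   # "@ 2/3 GFlops" (from the slide)
INSTALLED_PROCESSORS = 240         # currently installed (from the slide)
SERVERS_PER_RACK = 366             # assumed: figure from the RLX cabinet slide
RACKS = 4                          # "Fill the 4 racks"

installed_gflops = INSTALLED_PROCESSORS * GFLOPS_PER_PROCESSOR
# One processor per server is assumed, per "Single processor" on the RLX slide.
full_gflops = RACKS * SERVERS_PER_RACK * GFLOPS_PER_PROCESSOR

print(f"Installed: {installed_gflops:.0f} GFlops")
print(f"Four full racks: {full_gflops:.0f} GFlops (about {full_gflops / 1000:.2f} Tflops)")
```

With these assumptions the installed system peaks at about 160 GFlops and four full racks at about 976 GFlops, which matches the slide's "a Teraflops".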
“The network becomes the system.” - Bell, 2/10/82 Ethernet announcement with Noyce (Intel) and Liddle (Xerox) • “The network becomes the computer.” - SUN slogan >1982 • “The network becomes the system.” - GRID mantra c1999
Computing SNAP built entirely from PCs: a space, time (bandwidth), & generation scalable environment [Figure: network diagram. Labels: Legacy mainframe & minicomputer servers & terminals; Portables; Wide-area global network; Mobile nets; Wide & local area networks for terminal, PC, workstation, & servers; Person servers (PCs); Scalable computers built from PCs; Centralized & departmental uni- & mP servers (UNIX & NT); Centralized & departmental servers built from PCs; TC=TV+PC home ... (CATV or ATM or satellite); ???]
The virtuous cycle of bandwidth supply and demand [Figure: cycle diagram. Labels: Increased demand; Increase capacity (circuits & bw); Create new service; Lower response time; Standards; Incompence ?. Services: Telnet & FTP; EMAIL; WWW; Audio; Video; Voice!]
Internet II concerns given $0.5B cost • Very high cost • $(1 + 1) / GByte to send on the net; Fedex and 160 GByte shipments are cheaper • DSL at home is $0.15 - $0.30 • Disks cost $1/GByte to purchase! • Low availability of fast links (last mile problem) • Labs & universities have DS3 links at most, and they are very expensive • Traffic: Instant messaging, music stealing • Performance at desktop is poor • 1- 10 Mbps; very poor communication links
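The comparison between sending data over the net and shipping disks can be made concrete. The sketch below reads "$(1 + 1) / GByte" as $1 at each end (an assumption), takes 1 GByte as 10^9 bytes, and treats the shipping fee as a caller-supplied hypothetical since the slide gives no figure for it; the function names are illustrative.

```python
def transfer_hours(gbytes: float, link_mbps: float) -> float:
    """Hours to move `gbytes` over a link running at a sustained `link_mbps`."""
    bits = gbytes * 8e9                 # 1 GByte taken as 10^9 bytes
    seconds = bits / (link_mbps * 1e6)
    return seconds / 3600.0


def network_cost(gbytes: float, dollars_per_gbyte: float = 2.0) -> float:
    """Cost to send over the net at the slide's $(1 + 1)/GByte."""
    return gbytes * dollars_per_gbyte


def shipping_is_cheaper(gbytes: float, shipping_fee: float) -> bool:
    """Compare a flat shipping fee (hypothetical input) against network cost."""
    return shipping_fee < network_cost(gbytes)


SHIPMENT_GBYTES = 160.0                 # "160 GByte shipments" (from the slide)

for mbps in (1.0, 10.0):                # desktop link range quoted on the slide
    hours = transfer_hours(SHIPMENT_GBYTES, mbps)
    print(f"{SHIPMENT_GBYTES:.0f} GBytes at {mbps:.0f} Mbps: {hours:.1f} hours")
print(f"Network cost: ${network_cost(SHIPMENT_GBYTES):.0f}")
```

At the slide's 1-10 Mbps desktop rates, a 160 GByte transfer takes from about a day and a half to about two weeks, and costs about $320 at $2/GByte, which is the basis for the claim that shipping disks is cheaper.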
Scalable computing: the effects • They come in all sizes; incremental growth 10 or 100 to 10,000 (100X for most users); debug vs run; problem growth • Allows compatibility heretofore impossible. 1978: VAX chose Cray Fortran. 1987: The NSF centers went to UNIX • Users chose sensible environment • Acquisition and operational costs & environments • Cost to use as measured by user’s time • The role of gp centers e.g. NSF, statex is unclear. Necessity for support? • Scientific Data for a given community… • Community programs and data • Manage GRID discipline • Are clusters ≈ Gresham’s Law? Drive out alts.