This article explores the evolution of sequential and data parallelism in computing, from Fortran computers to the adoption of clusters, such as Beowulfs. It also discusses the role of centers, the emergence of grids, and the potential impact of commodity computers on innovation.
Crays, Clusters, Centers and Grids Gordon Bell (gbell@microsoft.com) Bay Area Research Center Microsoft Corporation
Summary • Sequential & data parallelism using shared-memory Fortran computers, 1960-90 • Search for parallelism to exploit micros, 1985-95 • Users adapted to clusters, aka multicomputers, via the LCD (lowest common denominator) programming model, MPI, >1995 • Beowulf standardized clusters of standard hardware and software, >1998 • “Do-it-yourself” Beowulfs impede new structures and threaten centers, >2000 • High-speed nets kicking in to enable the Grid.
Outline • Retracing scientific computing evolution: Cray, DARPA SCI & “killer micros”, clusters kick in • Current taxonomy: cluster flavors • Déjà vu, the rise of commodity computing: Beowulfs are a replay of VAXen c1980 • Centers • Role of Grid and peer-to-peer • Will commodities drive out new ideas?
DARPA Scalable Computing Initiative c1985-1995; ASCI • Motivated by Japanese 5th Generation • Realization that “killer micros” were • Custom VLSI and its potential • Lots of ideas to build various high performance computers • Threat and potential sale to military
Steve Squires & G Bell at our “Cray” at the start of DARPA’s SCI.
Dead Supercomputer Society • ACRI, Alliant, American Supercomputer, Ametek, Applied Dynamics, Astronautics, BBN, CDC, Convex, Cray Computer, Cray Research, Culler-Harris, Culler Scientific, Cydrome, Dana/Ardent/Stellar/Stardent, Denelcor, Elexsi, ETA Systems, Evans and Sutherland Computer, Floating Point Systems, Galaxy YH-1, Goodyear Aerospace MPP, Gould NPL, Guiltech, Intel Scientific Computers, International Parallel Machines, Kendall Square Research, Key Computer Laboratories, MasPar, Meiko, Multiflow, Myrias, Numerix, Prisma, Tera, Thinking Machines, Saxpy, Scientific Computer Systems (SCS), Soviet Supercomputers, Supertek, Supercomputer Systems, Suprenum, Vitesse Electronics
DARPA Results • Many research and construction efforts … virtually all failed. • DARPA-directed purchases … screwed up the market, including the many VC-funded efforts. • No software funding. • Users responded to the massive power potential with LCD (lowest common denominator) software. • Clusters, clusters, clusters using MPI. • It’s not scalar vs vector, it’s memory bandwidth! • 6-10 scalar processors = 1 vector unit • 16-64 scalars = a 2-6 processor SMP
Top500 taxonomy… everything is a cluster aka multicomputer • Clusters are the ONLY scalable structure • Cluster: n interconnected computer nodes operating as one system. Nodes: uni- or SMP. Processor types: scalar or vector. • MPP = miscellaneous, not massive (>1000), SIMD or something we couldn’t name • Cluster types. Implied message passing. • Constellations = clusters of >=16 P, SMP • Commodity clusters of uni or <=4 Ps, SMP • DSM: NUMA (and COMA) SMPs and constellations • DMA clusters (direct memory access) vs msg. pass • Uni- and SMP vector clusters: Vector Clusters and Vector Constellations
The Challenge leading to Beowulf • NASA HPCC Program begun in 1992 • Comprised Computational Aero-Science and Earth and Space Science (ESS) • Driven by need for post-processing data manipulation and visualization of large data sets • Conventional techniques imposed long user response time and shared resource contention • Cost low enough for dedicated single-user platform • Requirement: • 1 Gflops peak, 10 Gbyte, < $50K • Commercial systems: $1000/Mflops or $1M/Gflops
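The price/performance gap implied by the requirement above can be worked out directly. The following is a minimal sketch using only the figures quoted on the slide; the variable and function names are illustrative assumptions, not part of the original talk.

```python
# Figures quoted on the slide; all names here are illustrative only.
BEOWULF_BUDGET_USD = 50_000         # "< $50K"
BEOWULF_PEAK_GFLOPS = 1.0           # "1 Gflops peak"
COMMERCIAL_USD_PER_MFLOPS = 1_000   # "$1000/Mflops"


def usd_per_gflops(total_usd, peak_gflops):
    """Cost of one Gflops of peak performance."""
    return total_usd / peak_gflops


# 1 Gflops = 1000 Mflops, so $1000/Mflops is $1M/Gflops.
commercial_usd_per_gflops = COMMERCIAL_USD_PER_MFLOPS * 1_000
target_usd_per_gflops = usd_per_gflops(BEOWULF_BUDGET_USD, BEOWULF_PEAK_GFLOPS)

# How much cheaper per Gflops the target had to be than commercial systems.
improvement = commercial_usd_per_gflops / target_usd_per_gflops

print(f"commercial: ${commercial_usd_per_gflops:,.0f}/Gflops")
print(f"target:     ${target_usd_per_gflops:,.0f}/Gflops")
print(f"required price/performance improvement: {improvement:.0f}x")
```

Under these numbers the target is a 20x improvement in peak price/performance over the commercial systems of the time.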
Linux - a web phenomenon • Linus Torvalds, a bored Finnish graduate student, writes a news reader for his PC, using the Unix model • Puts it on the internet for others to play with • Others add to it, contributing to open source software • Beowulf adopts early Linux • Beowulf adds Ethernet drivers for essentially all NICs • Beowulf adds channel bonding to kernel • Red Hat distributes Linux with Beowulf software • Low-level Beowulf cluster management tools added
The Virtuous Economic Cycle drives the PC industry… & Beowulf • [Cycle diagram labels: Attracts suppliers; Greater availability @ lower cost; Competition; Volume; Standards; DOJ; Utility/value; Innovation; Creates apps, tools, training; Attracts users]
BEOWULF-CLASS SYSTEMS • Cluster of PCs • Intel x86 • DEC Alpha • Mac Power PC • Pure M2COTS • Unix-like O/S with source • Linux, BSD, Solaris • Message passing programming model • PVM, MPI, BSP, homebrew remedies • Single user environments • Large science and engineering applications
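The message-passing model named above can be illustrated without a cluster. The sketch below is not MPI or PVM: it assumes Python threads standing in for nodes and `queue.Queue` mailboxes standing in for the network, and the function name `run_ranks` is hypothetical. It only shows the pattern of ranks computing on their own slice of the data and sending partial results to rank 0.

```python
import queue
import threading


def run_ranks(n_ranks, data):
    """Each 'rank' sums its own slice and sends the partial sum to rank 0."""
    inboxes = [queue.Queue() for _ in range(n_ranks)]  # one mailbox per rank
    result = {}

    def worker(rank):
        chunk = data[rank::n_ranks]           # static cyclic decomposition
        partial = sum(chunk)
        if rank == 0:
            total = partial
            for _ in range(n_ranks - 1):      # receive one message per other rank
                _source, value = inboxes[0].get()
                total += value
            result["total"] = total
        else:
            inboxes[0].put((rank, partial))   # "send" the partial sum to rank 0

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


total = run_ranks(4, list(range(100)))
print(total)
```

On a real Beowulf the ranks would be separate processes on separate PCs and the mailboxes would be replaced by library send/receive calls; the explicit decomposition and explicit communication are the part that carries over.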
Interesting “cluster” in a cabinet • 366 servers per 44U cabinet • Single processor • 2 - 30 GB/computer (24 TBytes) • 2 - 100 Mbps Ethernets • ~10x perf*, power, disk, I/O per cabinet • ~3x price/perf • Network services… Linux based *42, 2 processors, 84 Ethernet, 3 TBytes
Lessons from Beowulf • An experiment in parallel computing systems • Established vision: low-cost, high-end computing • Demonstrated effectiveness of PC clusters for some (not all) classes of applications • Provided networking software • Provided cluster management tools • Conveyed findings to broad community • Tutorials and the book • Provided design standard to rally community! • Standards beget: books, trained people, software … virtuous cycle that allowed apps to form • Industry begins to form beyond a research project • Courtesy, Thomas Sterling, Caltech.
Direction and concerns • Commodity clusters are evolving to be mainline supers • Beowulf do-it-yourself effect is like VAXen… clusters have taken a long time. • Will they drive out or undermine centers? • Or is computing so complex as to require a center to manage and support complexity? • Centers: • Data warehouses • Community centers e.g. weather • Will they drive out a diversity of ideas? Assuming there are some?
The virtuous cycle of bandwidth supply and demand • [Cycle diagram labels: Increased demand; Increase capacity (circuits & bw); Create new service; Lower response time; Standards] • Services: Telnet & FTP, email, WWW, audio, video, voice!
Map of Gray Bell Prize results Redmond/Seattle, WA single-thread single-stream tcp/ip via 7 hops desktop-to-desktop …Win 2K out of the box performance* New York Arlington, VA San Francisco, CA 5626 km 10 hops
The Promise of SAN/VIA: 10x in 2 years http://www.ViArch.org/ • Yesterday: • 10 MBps (100 Mbps Ethernet) • ~20 MBps tcp/ip saturates 2 cpus • round-trip latency ~250 µs • Now: • Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet,… • Fast user-level communication • tcp/ip ~100 MBps, 10% cpu • round-trip latency is 15 µs • 1.6 Gbps demoed on a WAN
SNAP … c1995 • Scalable Network And Platforms • A View of Computing in 2000+ • We all missed the impact of WWW! • Gordon Bell, Jim Gray
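To see how the bandwidth and latency figures on the SAN/VIA slide combine, here is a minimal sketch. The cost model (one round trip of latency plus message size divided by bandwidth) is an assumption made for illustration, as are all names; only the 10 MBps / 250 µs and 100 MBps / 15 µs figures come from the slide.

```python
MB = 1_000_000  # bytes per decimal megabyte


def exchange_time_s(message_bytes, bandwidth_bytes_per_s, round_trip_latency_s):
    """Assumed model: one round trip of latency plus serialization time."""
    return round_trip_latency_s + message_bytes / bandwidth_bytes_per_s


# "Yesterday" and "Now" figures as quoted on the slide.
YESTERDAY = {"bandwidth_bytes_per_s": 10 * MB, "round_trip_latency_s": 250e-6}
NOW = {"bandwidth_bytes_per_s": 100 * MB, "round_trip_latency_s": 15e-6}

for size in (1_000, 1 * MB):
    old = exchange_time_s(size, **YESTERDAY)
    new = exchange_time_s(size, **NOW)
    print(f"{size:>9} bytes: {old * 1e6:10.1f} us -> {new * 1e6:10.1f} us "
          f"({old / new:.1f}x faster)")
```

Under this model a 1 KB message improves by about 14x, mostly from the latency reduction, while a 1 MB transfer improves by about 10x, almost entirely from the faster wire.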
How Will Future Computers Be Built? Thesis: SNAP: Scalable Networks and Platforms • Upsize from desktop to world-scale computer • based on a few standard components Because: • Moore’s law: exponential progress • Standardization & Commoditization • Stratification and competition When: Sooner than you think! • Massive standardization gives massive use • Economic forces are enormous
Computing SNAP built entirely from PCs • A space, time (bandwidth), & generation scalable environment • [Diagram labels: legacy mainframe & minicomputer servers & terminals; portables; wide-area global network; mobile nets; wide & local area networks for terminals, PCs, workstations, & servers; person servers (PCs); scalable computers built from PCs; centralized & departmental uni- & mP servers (UNIX & NT); centralized & departmental servers built from PCs; TC=TV+PC home ... (CATV or ATM or satellite); ???]
GB plumbing from the baroque: evolving from the 2 dance-hall model • [Diagram: Mp, S, Pc with S.fiber ch., Ms, S.Cluster, S.WAN vs. MpPcMs with S.Lan/Cluster/Wan]
Grids: Why? • The problem or community dictates a Grid • Economics… thief or scavenger • Research funding… that’s where the problems are
The Grid… including P2P • GRID was/is an exciting concept … • They can/must work within a community, organization, or project. What binds it? • “Necessity is the mother of invention.” • Taxonomy… interesting vs necessity • Cycle scavenging and object evaluation (e.g. seti@home, QCD, factoring) • File distribution/sharing aka IP theft (e.g. Napster, Gnutella) • Databases &/or programs and experiments (astronomy, genome, NCAR, CERN) • Workbenches: web workflow chem, bio… • Single, large problem pipeline… e.g. NASA. • Exchanges… many sites operating together • Transparent web access aka load balancing • Facilities-managed PCs operating as a cluster!
Some observations • Clusters are purchased, managed, and used as a single, one-room facility. • Clusters are the “new” computers. They present unique, interesting, and critical problems… then Grids can exploit them. • Clusters & Grids have little to do with one another… Grids use clusters! • Clusters should be a good simulation of tomorrow’s Grid. • Distributed PCs: Grids or Clusters? • Perhaps some clusterable problems can be solved on a Grid… but it’s unlikely. • Lack of understanding of clusters & variants • Socio-, political, eco- issues wrt the Grid.
Déjà vu • ARPAnet: c1969 • To use remote programs & data • Got FTP & mail. Machines & people overloaded. • NREN: c1988 • BW => Faster FTP for images, data • Latency => Got http://www… • Tomorrow => Gbit communication BW, latency • <’90 Mainframes, minis, PCs/WSs • >’90 very large, dep’t, & personal clusters • VAX: c1979 one computer/scientist • Beowulf: c1995 one cluster ∑PCs /scientist • 1960s batch: opti-use allocate, schedule, $ • 2000s GRID: opti-use allocate, schedule, $ (… security, management, etc.)
Modern scalable switches … also hide a supercomputer • Scale from <1 to 120 Tbps • 1 Gbps ethernet switches scale to 10s of Gbps, scaling upward • SP2 scales from 1.2
CMOS Technology Projections • 2001 • logic: 0.15 um, 38 Mtr, 1.4 GHz • memory: 1.7 Gbits, 1.18 access • 2005 • logic: 0.10 um, 250 Mtr, 2.0 GHz • memory: 17.2 Gbits, 1.45 access • 2008 • logic: 0.07 um, 500 Mtr, 2.5 GHz • memory: 68.7 Gbits, 1.63 access • 2011 • logic: 0.05 um, 1300 Mtr, 3.0 GHz • memory: 275 Gbits, 1.85 access
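The projections above imply compound annual growth rates that are easier to compare than the raw figures. This is a small sketch that derives them from the 2001 and 2011 rows only; the table values are copied from the slide, while the helper names are illustrative assumptions.

```python
import math

# (year, logic transistors in millions, clock in GHz, memory in Gbits), from the slide
PROJECTIONS = [
    (2001, 38, 1.4, 1.7),
    (2005, 250, 2.0, 17.2),
    (2008, 500, 2.5, 68.7),
    (2011, 1300, 3.0, 275),
]


def annual_growth(start_value, end_value, years):
    """Compound annual growth factor between two projected values."""
    return (end_value / start_value) ** (1.0 / years)


def doubling_time_years(growth_per_year):
    """Years needed to double at a constant annual growth factor."""
    return math.log(2) / math.log(growth_per_year)


first, last = PROJECTIONS[0], PROJECTIONS[-1]
span = last[0] - first[0]
growth = {
    "transistors": annual_growth(first[1], last[1], span),
    "clock": annual_growth(first[2], last[2], span),
    "memory_bits": annual_growth(first[3], last[3], span),
}

for name, factor in growth.items():
    print(f"{name:12s} {factor:.2f}x/year, doubles every "
          f"{doubling_time_years(factor):.1f} years")
```

On these figures, transistor count and memory density grow far faster than clock rate, which is consistent with the deck's emphasis on multiprocessors and systems-on-a-chip rather than faster single processors.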
Future Technology Enablers • SOCs: system-on-a-chip • GHz processor clock rate • VLIW • 64-bit processors • scientific/engineering application • address spaces • Gbit DRAMs • Micro-disks on a board • Optical fiber and wave division multiplexing communications (free space?)
The End • How can GRIDs become a non-ad hoc computer structure? • Get yourself an application community!
Volume drives simple, cost to standard platforms • [Figure labels: performance (axis); Stand-alone Desktops; PCs]
In 5-10 years we can/will have: • more powerful personal computers • processing 10-100x; multiprocessors-on-a-chip • 4x resolution (2K x 2K) displays to impact paper • Large, wall-sized and watch-sized displays • low-cost storage of one terabyte for personal use • adequate networking? PCs now operate at 1 Gbps • ubiquitous access = today’s fast LANs • Competitive wireless networking • One-chip, networked platforms e.g. light bulbs, cameras • Some well-defined platforms that compete with the PC for mind (time) and market share: watch, pocket, body implant, home (media, set-top) • Inevitable, continued cyberization… the challenge… interfacing platforms and people.
Linus’s & Stallman’s Law: Linux everywhere, aka Torvalds Stranglehold • Software is or should be free • All source code is “open” • Everyone is a tester • Everything proceeds a lot faster when everyone works on one code • Anyone can support and market the code for any price • Zero-cost software attracts users! • All the developers write code
ISTORE Hardware Vision • System-on-a-chip enables computer, memory, without significantly increasing size of disk • 5-7 year target: • MicroDrive: 1.7” x 1.4” x 0.2” 2006: ? • 1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek • 2006: 9 GB, 50 MB/s ? (1.6X/yr capacity, 1.4X/yr BW) • Integrated IRAM processor • 2x height • Connected via crossbar switch • growing like Moore’s law • 16 Mbytes; 1.6 Gflops; 6.4 Gops • 10,000+ nodes in one rack! 100/board = 1 TB; 0.16 Tf
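The 2006 MicroDrive figures and the per-board totals above follow from the stated growth rates. The sketch below reproduces that arithmetic from the slide's own numbers; the names are illustrative assumptions, and decimal units (1 GB = 1000 MB) are assumed.

```python
# 1999 MicroDrive baseline and growth rates quoted on the slide.
BASE_YEAR, TARGET_YEAR = 1999, 2006
BASE_CAPACITY_MB = 340
BASE_BANDWIDTH_MB_S = 5
CAPACITY_GROWTH_PER_YEAR = 1.6
BANDWIDTH_GROWTH_PER_YEAR = 1.4

years = TARGET_YEAR - BASE_YEAR

# Compound growth over seven years.
capacity_2006_mb = BASE_CAPACITY_MB * CAPACITY_GROWTH_PER_YEAR ** years
bandwidth_2006_mb_s = BASE_BANDWIDTH_MB_S * BANDWIDTH_GROWTH_PER_YEAR ** years

# Per-board totals: 100 nodes, each with the slide's 9 GB disk and 1.6 Gflops.
NODES_PER_BOARD = 100
board_capacity_tb = NODES_PER_BOARD * 9 / 1000      # 9 GB per node
board_tflops = NODES_PER_BOARD * 1.6 / 1000         # 1.6 Gflops per node

print(f"2006 capacity:  {capacity_2006_mb / 1000:.1f} GB")
print(f"2006 bandwidth: {bandwidth_2006_mb_s:.1f} MB/s")
print(f"per board:      {board_capacity_tb:.1f} TB, {board_tflops:.2f} Tflops")
```

The extrapolation lands close to the slide's rounded 9 GB and 50 MB/s, and 100 nodes per board give about 0.9 TB (rounded to 1 TB on the slide) and 0.16 Tflops.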
The Disk Farm? or a System On a Card? • The 500 GB disc card (14") • An array of discs • Can be used as: 100 discs, 1 striped disc, 50 FT discs, ... etc. • LOTS of accesses/second of bandwidth • A few disks are replaced by 10s of Gbytes of RAM and a processor to run apps!!