NSF Visit Gordon Bell www.research.microsoft.com/~gbell Microsoft Research 4 October 2002
Topics • How much have things changed since CISE was formed in 1986, and how much remains the same? • 10-year base case @ CRA's Grand Challenges? http://www.google.com/search?sourceid=navclient&q=cra+grand+challenges • GB MyLifeBits: storing one's entire life for recall, home media, etc. • Clusters, Grids, and Centers… the challenge is apps • Supercomputing directions
Messages… • The Grand Challenge for CISE is to work on applications in science, engineering, and bio/medicine/health care (e.g. NIH). • Databases versus grepping. Revolution needed. Performance from software >= Moore's Law • The big challenge moving forward will be managing and exploiting all the storage. • Supercomputing: Cray. Gresham's Law • Build on industry standards and efforts. Grid and "web services" must cooperate. • Whatever happened to the first Grand Challenges? • Minimize grant overhead… site visits.
IBM Sets Up Biotech Research Center U.S.-based IBM recently set up a biotechnology research and development center in Taiwan -- IBM Life Sciences Center of Excellence -- the company's first in the Asia Pacific region… the center will provide computation solutions and services from an integrated bio-information database linked to resources around the world. Local research institutes working in cooperation with the center include Academia Sinica, the Institute for Information Industry and National Yang Ming University. From HPCWire 30 September 2002
Retrospective: CISE formed in 1986 • CISE spent about $100 million on research in 1987 • Q: What areas of software research do you think will be the most vital in the next decade? • A: Methods to design and build large programs and data bases in a distributed environment are central. • Q: What software research areas are funded? • A: We fund what the community considers to be important … object-oriented languages, data bases, & human interfaces; semantics; formal methods of design and construction; connectionism; and data and knowledge bases, including concurrency. We aren’t funding applications.
Software Productivity c1986 • I believe the big gains in software will come about by eliminating the old style of programming, by moving to a new paradigm, rather than magic tools or techniques to make the programming process better. VisiCalc and Lotus 1-2-3 are good examples of a dramatic improvement in programming productivity. In essence, programming is eliminated and the work put in the hands of the users. • These breakthroughs are unlikely to come from the software research community, because they aren't involved in real applications. Most likely they will come from people trained in another discipline who understand enough about software to be able to carry out the basic work that is ultimately turned over to the software engineers to maintain and evolve.
Software Productivity c1986 • Q: The recent Software Engineering Conference featured a division of opinion on mechanized programming. … developing a programming system to write programs can automate many of the mundane tasks… • A: Mechanized programming is recreated and renamed every few years. In the beginning, it meant a compiler. The last time, it was called automatic programming. A few years ago it was program generators and the programmer's workbench. The better it gets, the more programming you do!
Parallelism c1986 • To show my commitment to parallel processing, for the next 10 years I will offer two $1000 annual awards for the best operational scientific or engineering program with the most speedup ... • Q: What … do you expect from parallelism in the next decade? • A: Our goal is obtaining a factor of 100 … within the decade and a factor of 10 within five years. 10 will be easy because it is inherently in most applications right now. The hardware will clearly be there if the software can support it or the users can use it. • Many researchers think this goal is aiming too low. They think it should be a factor of 1 million within 15 years. However, I am skeptical that anything more than our goal will be achieved.
Computing Research Association Grand Challenges. Gordon Bell, Microsoft Research, 26 June 2002 [Chart: "Goodness" vs. time, 2000 to 2012, contrasting "Grand Challengeland", the base case (no challenge: the next decade of systems follows industry's evolutionary path… ¿Que sera sera?), and "Death and Doldrums".]
In a decade, the evolution: we can count on • Moore's Law provides ≈50-100x performance at constant $; ~20% $ decrease/year => ½ per 5 years • Terabyte personal stores => personal db managers • Astronomical-sized (by current standards) databases! • Paper-quality screens on watches, tablets… walls • DSL wired, 3-4G/802.11 nets (>10 Mbps) access • Network Services: finally, computers can use|access the web. "It's the Internet, Stupid." • Enabler of intra-, extra-, inter-net commerce • Finally EDI/Exchanges/Markets • Ubiquity rivaling the telephone. • Challenge: an instrument to supplant the phone? • Challenge: affordability for everyone on the planet, <$1500/year • Personal authentication to access anything of value • Murphy's Law continues with larger and more complex systems, requiring better fundamental understanding. An opportunity and need for "Autonomic Computing"
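As a sanity check on these compounding rates, a minimal sketch in Python (the 18-month doubling period and the 13%/yr comparison rate are my assumptions, not the slide's; a steady 20%/yr decline actually halves prices in about 3 years, while "½ per 5 years" corresponds to roughly 13%/yr):

```python
# Back-of-envelope check of decade-scale compounding.
import math

def performance_gain(years, doubling_months=18.0):
    """Multiplicative performance gain after `years` of Moore's Law."""
    return 2.0 ** (years * 12.0 / doubling_months)

def halving_years(annual_decline):
    """Years until price halves at a given steady annual decline rate."""
    return math.log(0.5) / math.log(1.0 - annual_decline)

print(f"10-yr performance gain: {performance_gain(10):.0f}x")       # ~101x
print(f"price halves in {halving_years(0.20):.1f} yr at 20%/yr")    # ~3.1 yr
print(f"price halves in {halving_years(0.13):.1f} yr at 13%/yr")    # ~5.0 yr
```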
In a decade, the evolution: we are likely to "have" • 120M computers/yr; world computer population >1B • shipments increasing with decreasing price: 2x / -50% • X% are discarded; the result is 1 billion installed • Smaller personals w/phones… video @ PDA $ • Almost-adequate speech communication for commands, limited dictation, note taking, segmenting/indexing video • Vision capable of tracking each individual in a relatively large crowd. With identity, everybody's location is known, everywhere, anytime.
Inevitable wireless nets… body, home, …x-area nets will create new opportunities • Need to construct environments of platforms, networking protocols, and programming environments for each kind of net • Each net has to research its own sensor/effector structure as f(application), e.g. body, outdoor, building • The taxonomy includes these alternative dimensions: • network function • master|slave vs. distributed… currently peripheral nets • permanent|dynamic • indoor|outdoor • size and spatial diameter • bandwidth and performance • sensor/effector types • security and noise immunity
New environments can support a wide range of new apps • Continued evolution of personal monitoring and assistance for health and personal care at all ages • Personal platforms that provide "total recall" and will assist (25% of the population) in solving problems • Platforms for changing education will be available. Limiters: authoring tools & standards; content • Transforming the scientific infrastructure is needed! • petabyte databases, petaflops performance • shared data notebooks across instruments and labs • new ways of performing experiments and • new ways of programming/visualizing and storing data. • Serendipity: something really new, like we get every decade but didn't predict, will occur.
R & D Challenges • Engineering, evolutionary construction, and non-trivial maintenance of billion-node, fractal nets ranging from space, continent, campus, and local nets … to in-body nets • Increasing information flows & a vast sea of data • Large disks everywhere! personal to large servers, across all apps • Akin to the vast tape libraries that are never read (bit rot) • A modern healthcare system that each of us would be happy with, or at least unafraid of, being admitted into. Cf. today's islands of automation and instruments (incompatible systems) floating on a sea of paper moved around by people who maintain a bloated and inefficient "services" industry/economy.
MyLifeBits: the challenge of a 0.001-1 petabyte lifetime PC. Cyberizing everything I've written, said, and presented (incl. video), photos of physical objects, & a few things I've read, heard, seen, and might "want to see" on TV
"The PC is going to be the place where you store the information … really the center of control“ Billg 1/7/2001 MyLifeBits is an “on-going” project following CyberAll to “cyberize” all of personal bits! • Memory recall of books, CDs, communication, papers, photos, video • Photos of physical object collections • Elimination of all physical stores & objects • Content source for home media: ambiance, entertainment, communication, interaction Freestyle for CDs, photos, TV content, videos Goal: to understand the 1 TByte PC: need, utility, cost, feasibility, challenge & tools.
Storing all we've read, heard, & seen (middle column pairs units, e.g. "2-10 M/G" = 2-10 MB/day = 2-10 GB/4 yr)

Human data-type             /hr      /day (/4 yr)   /lifetime
read text, few pictures     200 K    2-10 M/G       60-300 G
speech text @120 wpm        43 K     0.5 M/G        15 G
speech @1 KBps              3.6 M    40 M/G         1.2 T
stills w/voice @100 KB      200 K    2 M/G          60 G
video-like 50 Kb/s POTS     22 M     .25 G/T        25 T
video 200 Kb/s VHS-lite     90 M     1 G/T          100 T
video 4.3 Mb/s HDTV/DVD     1.8 G    20 G/T         1 P
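The rows follow from simple rate arithmetic. A sketch that reproduces a few of them, assuming roughly 10 captured hours per day and an 80-year lifetime (both assumptions are mine, not the slide's):

```python
# Reproduce the table's per-hour / per-day / per-lifetime estimates.
HOURS_PER_DAY = 10        # assumed hours of capture per day
LIFETIME_YEARS = 80       # assumed lifespan

rates_bytes_per_hour = {
    "read text, few pictures": 200e3,
    "speech text @120 wpm":     43e3,
    "speech @1 KBps":           1e3 * 3600,
    "video 4.3 Mb/s HDTV/DVD":  4.3e6 / 8 * 3600,   # bits/s -> bytes/hr
}

for name, per_hour in rates_bytes_per_hour.items():
    per_day = per_hour * HOURS_PER_DAY
    per_life = per_day * 365 * LIFETIME_YEARS
    print(f"{name:25s} {per_hour:9.2e} B/hr "
          f"{per_day:9.2e} B/day {per_life:9.2e} B/lifetime")

# e.g. speech @1 KBps -> 3.6e6 B/hr, 3.6e7 B/day, ~1.1e12 B: the table's
# 3.6 M, 40 M, and 1.2 T entries to within rounding.
```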
A "killer app" for the terabyte, lifetime PC? • MyLifeBits demonstrates the need for lifetime memory! • MODI (Microsoft Office Document Imaging)! The most significant Office™ addition since HTML. • Technology to support the vision: • Guarantee that data will live forever! • A single index that includes mail, conversations, web accesses, and books! • E-books… e-magazines reach critical mass! • Telephony and audio capture are needed • Photo & video "index serving" • More meta-information … Office, photos • Lots of GUIs to improve ease-of-use
The Clusters – GRID Era. CCGSC 2002, Lyon, France, September 2002
Same observations as in 2000 • GRID was/is an exciting concept … • They can/must work within a community, organization, or project. Apps need to drive. • "Necessity is the mother of invention." • Taxonomy… interesting vs. necessary: • cycle scavenging and object evaluation (e.g. seti@home, QCD) • file distribution/sharing for IP theft, e.g. Napster • databases &/or programs for a community (astronomy, bioinformatics, CERN, NCAR) • workbenches: web workflow for chem, bio… • exchanges… many sites operating together • a single, large, objectified pipeline… e.g. NASA. • Grid as a cluster platform! Transparent & arbitrary access, including load balancing. Web SVCs
Grid, n. An arbitrarily distributed cluster platform: a geographical and multi-organizational collection of diverse computers dynamically configured as cluster platforms responding to arbitrary, ill-defined jobs "thrown" at them. • Costs are not necessarily favorable, e.g. disks are less expensive than the cost to transfer data. • Latency and bandwidth are non-deterministic, thereby changing cluster characteristics. • Once a large body of data exists for a job, it is inherently bound to (set into) fixed resources. • Large datasets & I/O-bound programs need to be with their data or be database accesses… • But are there resources there to share?
Bright spots… near-term, user focus: a lesson for Grid suppliers • Tony Hey, head of UK scientific computing: apps-based funding versus tools-based funding; a web-services-based Grid & data orientation. • David Abramson: Nimrod. • Parameter scans… other low-hanging fruit • Encapsulate apps! "Excel"-style language/control mgmt. • "Legacy apps are programs that users just want, and there's no time or resources to modify the code … independent of age, author, or language, e.g. Java." • Andrew Grimshaw: Avaki. Making the Legion vision real. A reality check. • Lip: 4 pairs of "web services"-based apps • Gray et al.: SkyService and TerraService • Goal: providing a web service must be as easy as publishing a web page… and it will occur!!!
SkyServer: delivering a web service to the astronomy community. A prototype for other sciences? Gray, Szalay, et al. First paper on the SkyServer: http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.pdf http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.doc Later, more detailed paper for the database community: http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.pdf http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.doc
What can be learned from SkyServer? • It's about data, not about harvesting flops • 1-2 hr. query programs versus 1 wk. programs based on grep • 10-minute runs versus 3-day computes & searches • Database viewpoint: 100x speed-ups • Avoid costly re-computation and searches • Use indices and PARALLEL I/O. Read/Write >> 1. • Parallelism is automatic and transparent, and just depends on the number of computers/disks. • Limited experience and talent to use databases.
Some science is hitting a wall: FTP and GREP are not adequate (Jim Gray) • You can GREP 1 GB in a minute. You can GREP 1 TB in 2 days. You can GREP 1 PB in 3 years. • You can FTP 1 MB in 1 sec. You can FTP 1 GB/min … 1 TB in 2 days and 1K$ … 1 PB in 3 years and 1M$. • 1 PB ~ 10,000 >> 1,000 disks. At some point you need indices to limit search, plus parallel data search and analysis. • Goal, using databases: make it easy to publish (record structured data), find data anywhere in the network, get the subset you need, and explore datasets interactively. • The database becomes the file system!!!
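To make the indices point concrete, a minimal sketch using SQLite from Python; the objects table, mag column, and data volume are hypothetical stand-ins for a SkyServer-like catalog, not its actual schema:

```python
# Toy version of the grep-vs-database point: a B-tree index turns a
# full "grep-style" scan into a range seek.
import random
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, ra REAL, mag REAL)")
con.executemany(
    "INSERT INTO objects (ra, mag) VALUES (?, ?)",
    [(random.uniform(0, 360), random.uniform(10, 25)) for _ in range(500_000)],
)

query = "SELECT COUNT(*) FROM objects WHERE mag BETWEEN 12.0 AND 12.1"

t0 = time.perf_counter()
con.execute(query).fetchone()                 # full table scan
scan_ms = (time.perf_counter() - t0) * 1e3

con.execute("CREATE INDEX idx_mag ON objects(mag)")
t0 = time.perf_counter()
con.execute(query).fetchone()                 # index range seek
seek_ms = (time.perf_counter() - t0) * 1e3

print(f"full scan: {scan_ms:.1f} ms; indexed: {seek_ms:.1f} ms")
```

Even on this toy table the indexed query is typically orders of magnitude faster; SkyServer's 100x gains come from the same effect at terabyte scale, with parallel disks doing the remaining work.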
Network concerns • Very high cost: $(1 + 1)/GByte to send on the net; FedEx and 160 GByte shipments are cheaper • Disks cost $1-2/GByte to purchase!!! • DSL at home is $0.15 - $0.30 • Low availability of fast links (the last-mile problem) • Labs & universities have DS3 links at most, and they are very expensive • Traffic: instant messaging, music stealing • Performance at the desktop is poor • 1-10 Mbps; very poor communication links • Manage: trade in fast links for cheap links!!
Gray's $2.4K, 1 TByte Sneakernet, aka the Disk Brick [Chart: cost, time, and speed to move a terabyte; the cost of a "Sneaker-Net" TB] • We now ship NTFS/SQL disks. • Not a good format for Linux. • Ship NFS/CIFS/ODBC servers (not disks). • Plug the "disk" into the LAN. • DHCP, then file- or DB-serve… • A Web Service in the long term. Courtesy of Jim Gray, Microsoft Bay Area Research
[Table: cost and time of Sneakernet vs. alternatives. Courtesy of Jim Gray, Microsoft Bay Area Research]
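The table's conclusion can be rechecked with a few lines of arithmetic, a sketch using the deck's ballpark figures ($(1+1)/GByte on the net, 160 GByte disks, ~$2.4K per shipped terabyte); the link speeds and one-day courier time are my assumptions:

```python
# Sneakernet vs. the network for moving 1 TB.
import math

TB_BYTES = 1e12

def transfer_days(nbytes, megabits_per_sec):
    """Days to move `nbytes` at a sustained link rate."""
    return nbytes * 8 / (megabits_per_sec * 1e6) / 86_400

for link, mbps in [("1 Mbps desktop", 1), ("10 Mbps LAN", 10),
                   ("45 Mbps DS3", 45)]:
    print(f"{link:15s}: {transfer_days(TB_BYTES, mbps):6.1f} days, "
          f"~$2,000 in network charges ($1 + $1 per GByte)")

disks = math.ceil(TB_BYTES / 160e9)   # how many 160 GByte disks for 1 TB
print(f"Sneakernet     : {disks} disks, ~$2,400, about a day by courier")
```

Even a DS3-class link takes about two days and costs as much as simply buying the disks, which is the whole Disk Brick argument.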
Grids: real and "personal". Two carrots, one downside: a bet. • Bell will match any Gordon Bell Prize (parallelism, performance, or performance/cost) winner's prize that is based on "Grid platform technology". • I will bet any individual or set of individuals in the Grid research community up to $5,000 that a Grid application will not win the above by SC2005.
Technical computing: observations on an ever-changing, occasionally repetitious environment
A brief, simplified history of HPC • Sequential & data parallelism using shared memory: Cray's Fortran computers, '60-'02 (US: '90) • 1978: VAXen threaten general-purpose centers… • NSF response: form many centers, 1988-present • SCI: the search for parallelism to exploit micros, '85-'95 • Scalability: "bet the farm" on clusters. Users "adapt" to clusters, aka multi-computers, with the LCD programming model, MPI. >'95 • Beowulf clusters adopt standardized hardware and Linux software to create a standard! >1995 • "Do-it-yourself" Beowulfs impede new structures and threaten g.p. centers >2000 • 1997-2002: let's tell NEC they aren't "in step". • High-speed networking enables peer-to-peer computing and the Grid. Will this really work?
What is the system architecture? (GB c1990) [Diagram: candidate architectures, most crossed out; SIMD and GRID among the labels.]
Processor architectures? Both views converge on VECTORS. • CS view: MISC >> CISC >> language-directed >> RISC >> super-scalar >> extra-long instruction word. Caches mostly alleviate the need for memory B/W. • SC designer's view: RISC >> VCISC (vectors) >> massively parallel (SIMD), multiple pipelines. Memory B/W = perf.
Results from DARPA's SCI c1983 • Many research and construction efforts … virtually all new hardware efforts failed except Intel's and Cray's. • DARPA-directed purchases screwed up the market, including the many VC-funded efforts. • No software funding! • Users responded to the massive power potential with LCD software. • Clusters, clusters, clusters using MPI. Beowulf! • It's not scalar vs. vector, it's memory bandwidth! • 6-10 scalar processors = 1 vector unit • 16-64 scalars = a 2-6 processor SMP
Dead Supercomputer Society: ACRI, Alliant, American Supercomputer, Ametek, Applied Dynamics, Astronautics, BBN, CDC, Convex, Cray Computer, Cray Research, Culler-Harris, Culler Scientific, Cydrome, Dana/Ardent/Stellar/Stardent, Denelcor, Elexsi, ETA Systems, Evans and Sutherland Computer, Floating Point Systems, Galaxy YH-1, Goodyear Aerospace MPP, Gould NPL, Guiltech, Intel Scientific Computers, International Parallel Machines, Kendall Square Research, Key Computer Laboratories, MasPar, Meiko, Multiflow, Myrias, Numerix, Prisma, Tera, Thinking Machines, Saxpy, Scientific Computer Systems (SCS), Soviet Supercomputers, Supertek, Supercomputer Systems, Suprenum, Vitesse Electronics
What a difference 25 years AND spending >10x makes! [Photos: the Earth Simulator (ESRDC), 40 Tflops, 640 nodes (8 × 8 GFlops vector processors per node), beside LLNL's 150 Mflops machine room c1978.]
Japanese Earth Simulator • Spectacular results for $400M. • A year-to-year gain of 10x: the greatest gain since the first (1987) Gordon Bell Prize. • Performance is 10x the nearest entrant. • Performance/cost is 3x the nearest entrant. • RAP (real application performance) is >60% of peak; other machines are typically 10% of peak. • Programming was done in HPF (Fortran), which the US research community abandoned. • NCAR was right in wanting to purchase an NEC super.
Computer types by connectivity [Diagram: machines arranged along a connectivity axis from WAN/LAN through SAN and DSM to SM, and from micros to vectors. Networked supers (VPP uni, NEC mP, NEC super, Cray X…T, all mPv) at the SM/vector end; GRID & P2P (Legion, Condor) at the WAN/LAN/micro end; clusters (Beowulf, NT clusters, T3E, SP2 (mP), NOW, SGI DSM clusters & SGI DSM) in between; the old world of mainframes, multis, WSs, and PCs.]
The Challenge leading to Beowulf • The NASA HPCC Program, begun in 1992, comprised Computational Aero-Science and Earth and Space Science (ESS) • Driven by the need for post-processing data manipulation and visualization of large data sets • Conventional techniques imposed long user response times and shared-resource contention • Cost low enough for a dedicated single-user platform • Requirement: 1 Gflops peak, 10 GByte, < $50K • Commercial systems: $1000/Mflops, i.e. $1M/Gflops
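The price/performance gap implied by those two numbers is worth making explicit (straight arithmetic from the slide's figures):

```python
# The gap that motivated Beowulf, from the slide's own numbers.
commercial_dollars_per_gflops = 1000 * 1000   # $1000/Mflops = $1M/Gflops
beowulf_dollars_per_gflops = 50_000 / 1       # $50K target for 1 Gflops peak

gap = commercial_dollars_per_gflops / beowulf_dollars_per_gflops
print(f"commercial HPC : ${commercial_dollars_per_gflops:,}/Gflops")
print(f"Beowulf target : ${beowulf_dollars_per_gflops:,.0f}/Gflops")
print(f"gap            : {gap:.0f}x cheaper")   # -> 20x
```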
The Virtuous Economic Cycle drives the PC industry… & Beowulf [Cycle diagram linking: volume, standards (and the DOJ), competition, innovation, greater availability @ lower cost, utility/value, creation of apps, tools, and training, attracting users, attracting suppliers, and back to volume.]
Lessons from Beowulf • An experiment in parallel computing systems • Established a vision: low-cost, high-end computing • Demonstrated the effectiveness of PC clusters for some (not all) classes of applications • Provided networking software • Provided cluster management tools • Conveyed findings to the broad community • Tutorials and the book • Provided a design standard to rally the community! • Standards beget books, trained people, software … the virtuous cycle that allowed apps to form • An industry begins to form beyond a research project. Courtesy of Thomas Sterling, Caltech.
Clusters: Next Steps • Scalability… they can exist at all levels: personal, group, … centers • Clusters challenge centers, given that smaller users get small clusters
Computing in small spaces @ LANL (RLX cluster in a building with NO A/C): 240 processors @ 2/3 GFlops each; filling the 4 racks gives a teraflops
Internet II concerns, given its $0.5B cost • Very high cost: $(1 + 1)/GByte to send on the net; FedEx and 160 GByte shipments are cheaper • DSL at home is $0.15 - $0.30 • Disks cost $1/GByte to purchase! • Low availability of fast links (the last-mile problem) • Labs & universities have DS3 links at most, and they are very expensive • Traffic: instant messaging, music stealing • Performance at the desktop is poor • 1-10 Mbps; very poor communication links
Scalable computing: the effects • They come in all sizes; incremental growth from 10 or 100 to 10,000 (100X for most users); debug vs. run; problem growth • Allows compatibility heretofore impossible: 1978, VAX chose Cray Fortran; 1987, the NSF centers went to UNIX • Users chose the sensible environment • Acquisition and operational costs & environments • Cost to use, as measured by the user's time • The role of g.p. centers, e.g. NSF and state centers, is unclear. Necessity for support? • Scientific data for a given community… • Community programs and data • Managed GRID discipline • Are clusters ≈ Gresham's Law? They drive out alternatives.