The Future Of Research Clouds. Dan Reed Scalable and Multicore Computing Strategist Managing Director, Data Center Futures Microsoft Corporation. Sapir–Whorf: Context And Research. Sapir–Whorf Hypothesis (SWH) Language influences the habitual thought of its speakers
The Future Of Research Clouds Dan Reed Scalable and Multicore Computing Strategist Managing Director, Data Center Futures Microsoft Corporation
Sapir–Whorf: Context And Research • Sapir–Whorf Hypothesis (SWH) • Language influences the habitual thought of its speakers • Scientific computing analog • Available systems shape research agendas • Consider some past examples • VAX 11/780 and UNIX • Workstations and Ethernet • PCs and web • Inexpensive clusters and web services • Today’s examples • Multicore, sensors, clouds and services …
Today’s Truisms (2008) • Bulk computing is almost free • …but software and power are not • Inexpensive sensors are ubiquitous • …but data fusion remains difficult • Moving lots of data is {still} hard • …we’re missing trans-terabit/second networks • People are really expensive! • …and robust software remains labor intensive • Scientific challenges are complex • …and social engineering is not our forte
The Transductive Continuum • Dynamic adaptation • Computing, data and bandwidth • Deep integration and software services • Edge devices and large centers • Context-aware information • The right information at the right time • The cloud follows you everywhere • Business, home, play …
Economics Drive Change • Moore’s “Law” favored consumer commodities • Economics drove enormous improvements • Specialized processors and mainframes faltered • The commodity software industry was born • Implications • Consumer product space defines outcomes • Follow the money • Clouds are a consequence of commodity exponentials • Inexpensive storage, broadband networks and multicore processors Source: Jim Larus
Services Transformation • “SaaS”: Service Delivery • “SOA”: Service Composition • “Web 2.0”: Service Experience • Software + Services across Web, Desktop, Devices and Enterprise
The Microsoft Services Foundation Sharepoint.Microsoft.com Zurich .Net Online Office Labs Plus over 150 more sites and services MBS Online
Embarrassingly Parallel Processing • When applications are hosted • Even sequential ones are embarrassingly parallel • Few dependencies among users • Moore’s benefits accrue to platform owner • 2x processors → • ½ servers (+ ½ power, space, cooling …) • Or 2x service at the same cost • Tradeoffs not entirely one-sided due to • latency, bandwidth, privacy, off-line considerations • capital investment, security, programming problems Source: Jim Larus
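The "2x processors" arithmetic above can be sketched as a back-of-the-envelope calculation. This is only an illustration: the load and per-server capacity figures are hypothetical, and the sketch assumes requests from independent users divide evenly across servers, ignoring the latency, bandwidth and other tradeoffs the slide lists.

```python
import math


def servers_needed(total_load: float, per_server_capacity: float) -> int:
    """Servers required to carry a load made of independent user requests."""
    return math.ceil(total_load / per_server_capacity)


# Hypothetical figures, not taken from the talk.
load = 10_000.0    # requests per second across all hosted users
capacity = 50.0    # requests per second one server handles today

servers_before = servers_needed(load, capacity)      # fleet size today
servers_after = servers_needed(load, 2 * capacity)   # same load, 2x processors

# Alternatively, keep the fleet and serve twice the load at the same cost.
service_with_same_fleet = servers_before * 2 * capacity
```

Under these assumptions the fleet halves from 200 to 100 servers, or the original 200 servers carry 20,000 requests per second.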
Cloud Application Frameworks • Spanned by three points, each defining an approach: OS Virtualization, Parallel Frameworks, Software as a Service • Exploit parallelism • Ease deployment • Placed in this space: Amazon S3/EC2; RightScale, GigaSpaces, Elastra, 3Tera; cohesive; Hadoop over EC2; caroline; Hadoop; Astoria; Microsoft SSDS; Microsoft Mesh; GFS, BigTable, MapReduce; AppEngine • What goes here? Source: Dennis Gannon
Clouds And Software • Open source • Hadoop (MapReduce) • Linux • Amazon Web Services (AWS) • Elastic Compute Cloud (EC2) • Simple Storage Service (S3) • SimpleDB • Google • Gears, Gmail, Apps, Lively • Microsoft • Dryad and DryadLINQ • Windows Server • SQL Server Data Services (SSDS) • Live Mesh • Office Live
Software As A Service (SaaS) • Exchange, Sharepoint • Live Meeting • Microsoft Office
Live Mesh Stay Up-to-Date Simple to Share Anywhere Access Devices Acting Together Platform Services
Live Mesh Tech Preview • Distributed sharing • Devices and content • Cloud storage • Publish/subscribe • Decentralized data bus • FeedSync dev.live.com/feedsync • Developer APIs • AJAX, JavaScript • .Net, REST blogs.msdn.com/livemesh www.livemesh.com
Behind The Mesh • Live Mesh experiences: Mesh clients (mobile, web, desktop), Mesh folders, news feeds, 3rd-party services / apps • Live Developer Platform (developer stack): MeshFX (protocols & APIs); cloud and client MOE (service composition runtime); developer tools & services • Live Services platform: rendezvous & transport, connectivity services, identity & directory, activities & news, synchronized storage, … • Cloud infrastructure services: service provisioning & management, computation, storage, management & operations
The Data Explosion • Simulations • Archives • Literature • Experiments • From petabytes to exabytes
Data and Information What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it. ~Herbert Simon
Our Data Models Are In Rapid Flux [Figure labels: data distribution and data convergence; irregular vs. regular; unstructured vs. structured; limited data vs. many petabytes; local analysis vs. distributed analysis; research questions, Internet, traditional databases, fusion]
Dryad (MSR SVC) • Coarse grain data flow • DAG specification • Superset of Map-Reduce • Communication substrates • NTFS, TCP, shared memory • Failure recovery • Scheduling • SkyServer query example • 3-way join to find gravitational lens Source: Michael Isard et al
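To make the coarse-grain data flow idea above concrete, here is a minimal single-process sketch in Python. It is not Dryad's API: `run_dag`, `vertices`, `edges` and the sample vertex functions are hypothetical names chosen for illustration. Dryad itself places vertices on cluster machines, connects them through files, TCP or shared memory, and handles scheduling and failure recovery, none of which is modelled here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(vertices, edges, inputs):
    """Run each vertex once, after all of its upstream vertices.

    vertices: name -> function taking one argument per upstream output
    edges:    name -> list of upstream vertex names (the DAG specification)
    inputs:   name -> precomputed data for source vertices
    """
    results = dict(inputs)
    for name in TopologicalSorter(edges).static_order():
        if name in results:          # source data, nothing to compute
            continue
        upstream = [results[p] for p in edges.get(name, ())]
        results[name] = vertices[name](*upstream)
    return results


# A tiny hypothetical job: filter one input, then join it with another.
vertices = {
    "keep_even": lambda xs: [x for x in xs if x % 2 == 0],
    "join": lambda a, b: sorted(set(a) & set(b)),
}
edges = {"keep_even": ["read_a"], "join": ["keep_even", "read_b"]}
inputs = {"read_a": [1, 2, 3, 4, 6], "read_b": [2, 6, 7]}

results = run_dag(vertices, edges, inputs)
```

A Map-Reduce job fits the same shape as a DAG with a fixed map, sort and reduce structure, which is one way to read the "superset of Map-Reduce" bullet.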
Dryad Architecture • Data plane: files, TCP, FIFO, network • Control plane: job manager, job schedule, NS, PD • Vertices (V) run on the cluster [Figure: job manager driving vertices across cluster machines] Source: Mihai Budiu
Dryad Software Stack [Figure: layered stack; labels listed from top to bottom] • Machine learning; sed, awk, grep, etc.; legacy code; queries • C#, C++, Perl, PSQL, SSIS, SQL Server, vectors • Distributed Shell, DryadLINQ • Job queueing, monitoring • Dryad • Distributed File System, CIFS/NTFS • Cluster Services • Windows Server (on each machine) Source: Mihai Budiu
SkyServer Example (Lensing) • Inputs: u: objid, color; n: objid, neighborobjid [partition by objid] • Stage 1: select u.color, n.neighborobjid from u join n where u.objid = n.objid • Intermediate output: (u.color, n.neighborobjid) [re-partition by n.neighborobjid] [order by n.neighborobjid] • Stage 2: select u.objid from u join <temp> where u.objid = <temp>.neighborobjid and |u.color - <temp>.color| < d [distinct] [merge outputs] Source: Michael Isard et al
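The two-stage SkyServer query above can be mimicked in a few lines of single-machine Python. This is a sketch only: the function name `lensing_candidates` and the sample data are hypothetical, `u` is modelled as a dictionary from objid to color, and the partitioning steps that Dryad would distribute across machines are reduced to a plain sort.

```python
def lensing_candidates(u, n, d):
    """Return distinct objids whose color is within d of a neighbor's color.

    u: dict mapping objid -> color
    n: list of (objid, neighborobjid) pairs
    d: color-difference threshold
    """
    # Stage 1: join u with n on objid, emitting (u.color, n.neighborobjid).
    temp = [(u[objid], nb) for objid, nb in n if objid in u]
    # Stand-in for "re-partition by / order by n.neighborobjid".
    temp.sort(key=lambda pair: pair[1])
    # Stage 2: join back to u on neighborobjid and keep close colors.
    matches = {nb for color, nb in temp if nb in u and abs(u[nb] - color) < d}
    # "distinct" and "merge outputs" collapse to one sorted list here.
    return sorted(matches)


# Hypothetical sample data.
u = {1: 0.10, 2: 0.12, 3: 0.50, 4: 0.11}
n = [(1, 2), (1, 3), (2, 4), (3, 4)]
candidates = lensing_candidates(u, n, d=0.05)
```

With this sample data, objects 2 and 4 each have a neighbor of similar color, so both are reported.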
DryadLINQ: From LINQ To Dryad • LINQ: .NET Language Integrated Query • Declarative SQL-like programming with C# and Visual Studio • Easy expression of data parallelism • Elegant and unified data model • Automatic query plan generation • Distributed query execution by Dryad • Example LINQ query: var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); • Resulting query plan, executed by Dryad: logs → where → select Source: Yuan Yu et al
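For readers less familiar with LINQ, the log-filtering query above has a close analog in a Python generator expression. This is a loose, single-machine sketch and not DryadLINQ: the `LogEntry` fields and the `parse_logs` name are assumptions, since the slide shows only the constructor call.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    raw: str  # hypothetical field; the slide does not show LogEntry's contents


def parse_logs(logs):
    """Lazily drop comment lines (the 'where') and wrap the rest (the 'select')."""
    return (LogEntry(line) for line in logs if not line.startswith("#"))


entries = list(parse_logs(["# header", "GET /", "#note", "POST /a"]))
```

Like a LINQ query, the generator is declarative and evaluated lazily; the difference the slide emphasizes is that DryadLINQ turns such a query into a plan that Dryad executes across a cluster.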
Dryad And Map-Reduce • Dryad • Execution layer • Arbitrary DAG • Plug-in policies • Program (graph generation) • Complex (features) • New (< 2 years) • Still growing • Distribution being considered • Map-Reduce • Execution + application model • Map+Sort+Reduce • Few policies • Program=Map+Reduce • Simple • Mature (> 4 years) • Widely deployed • Hadoop research.microsoft.com/research/sv/Dryad Source: Mihai Budiu
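The "Map+Sort+Reduce" and "Program=Map+Reduce" bullets above can be illustrated with a small in-memory sketch. It is not Hadoop's or Google's API: `map_reduce`, `word_mapper` and `count_reducer` are hypothetical names, and everything runs in one process.

```python
from itertools import groupby
from operator import itemgetter


def map_reduce(records, mapper, reducer):
    """Apply mapper to every record, sort by key, then reduce each key group."""
    pairs = [kv for rec in records for kv in mapper(rec)]   # map
    pairs.sort(key=itemgetter(0))                           # sort
    return {
        key: reducer(key, [value for _, value in group])    # reduce
        for key, group in groupby(pairs, key=itemgetter(0))
    }


def word_mapper(line):
    """Emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield (word, 1)


def count_reducer(key, values):
    """Sum the counts collected for one word."""
    return sum(values)


counts = map_reduce(["the cloud", "the mesh the cloud"], word_mapper, count_reducer)
```

The user supplies only the two functions, while the fixed map, sort and reduce pipeline is the whole execution model. That is the contrast with Dryad, where the program generates an arbitrary graph.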
SQL Server Data Services (SSDS) • SSDS cloud access • SOAP/REST Web 2.0 interface • SQL-like queries (LINQ) • Visual Studio support • Scalable data hosting • Data model • Customer → SSDS account (1..N) → Authority (1..N) → Container (0..N) → Entity (0..N) www.microsoft.com/sql/dataservices/default.mspx msdn.microsoft.com/en-us/sqlserver/dataservices/default.aspx
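The containment hierarchy in the SSDS data model above can be sketched with plain Python dataclasses. This is an illustration of the cardinalities on the slide, not the SSDS client API: every class, field and function name here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str
    properties: dict = field(default_factory=dict)  # hypothetical property bag


@dataclass
class Container:
    name: str
    entities: list = field(default_factory=list)    # 0..N entities


@dataclass
class Authority:
    name: str
    containers: list = field(default_factory=list)  # 0..N containers


@dataclass
class Account:
    name: str
    authorities: list                                # 1..N authorities

    def __post_init__(self):
        if not self.authorities:
            raise ValueError("an account needs at least one authority")


def count_entities(account):
    """Walk account -> authority -> container -> entity and count the leaves."""
    return sum(
        len(container.entities)
        for authority in account.authorities
        for container in authority.containers
    )


books = Container("books", [Entity("b1", {"title": "As We May Think"}), Entity("b2")])
account = Account("lab", [Authority("library", [books, Container("empty")])])
total = count_entities(account)
```

The empty container and the validation in `Account` mirror the 0..N and 1..N bounds shown on the slide.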
MSR Engagement Plans • Dryad options (being considered) • Academic software release • Hosted instance at Microsoft • Research explorations • Live Mesh and SSDS • Now in public beta • Additional features coming • Other developments • Watch this space
Consider These Cloud Services Challenges • Environmental responsibility • Managing under the 100 MW envelope • Adaptive systems management • Provisioning 25,000 servers • Hardware: at most one week after delivery • Software: at most a few hours • Resilience during a blackout/disaster • Data center failure • Service rollover for 20M customers • Programming the entire data center • Power, environmentals, provisioning, resilience…
Research And Academic Clouds • Build to suit or rent to use? • In academia, our penchant is construction • … but we do not do software well • and we struggle to maintain what we build • Consider the difficulties we’ve had • Production operations and infrastructure reliability • Software complexity and feature creep • Is it time for a new model? • Renting our hardware/software infrastructure • Focusing research funding on science
New MSR Shared Infrastructure • Loosely coupled clusters • Latency-tolerant computations • Tightly coupled clusters • Latency-constrained computations • Large-scale storage • Repositories for common data • Active analysis and processing • Systems experimentation • Systems as research • External facing services • Web services and applications • Multipurpose systems • 1140 unique systems • Dual 2.33 GHz quad-core • GigE interconnects • Low latency systems • 128 unique systems • Dual 3 GHz quad-core • GigE and DDR Infiniband • Large-scale storage • ~1 PB total storage • 60 dual 2.33 GHz quad-core • 5x70 750 GB SAS (RAID5) • GigE interconnect
Microsoft Data Center Futures (DCF) • Research prototypes • Each testing 1-2 innovative ideas • Sufficient scale for validation • Exemplar technical themes • Purpose built hardware • Interconnect cost and capability • Alternative (non-disk) storage • Power, packaging and cooling • Reliability and monitoring • Software adaptation • Validated with driving problems • Current and future applications
Performability Thresholds • Changing our assumptions • Design to expect failure • Design as energy quanta [Figure: performance vs. elapsed time, with a utility threshold marking the useful lifetime]
Some Data Center Technical Issues • Cooling technologies • Operating points • phase change, liquid, … • New packaging technologies • Optoelectronics • Memory stacking, … • New storage models/algorithms • FLASH, PCM, … • Locality-aware algorithms • The speed of light is pretty slow • Intelligent power management • Adaptation and power down • System adaptation/integration • Reliability/power as first class objects
Data Center Futures Technical Leadership • Dennis Gannon • Director, Application Futures • Jim Larus • Director, Software Architecture • James Hamilton • System Architect • TBD • Director, Hardware Architecture • TBD • Director, Environments and Power
Memex: Still Prescient “Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and to coin one at random, “memex” will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.” ~Vannevar Bush “As We May Think,” Atlantic Monthly, 1945
Research In The Cloud • A Future Research Environment with both Software + Services • Visualization and analysis services • Scholarly communications • Domain-specific services • Search books & citations • Blogs & social networking • Instant messaging • Reference management • Identity • Mail • Project management • Notification • Document store • Storage/data services • Knowledge management • Compute services virtualization • Knowledge discovery The Microsoft Technical Computing mission to reduce time to scientific insights is exemplified by the June 13, 2007 release of a set of four free software tools designed to advance AIDS vaccine research. The code for the tools is available now via CodePlex, an online portal created by Microsoft in 2006 to foster collaborative software development projects and host shared source code. Microsoft researchers hope that the tools will help the worldwide scientific community take new strides toward an AIDS vaccine. Source: Tony Hey
Academic Research Implications • Consider integrated, holistic designs • Integrative and reductionist approaches • Consider data centers and clouds • Take the long-term view • Ten-year horizons and new applications • Think about appropriate scales • Much larger than normally considered • Investigate the hard problems • Technically and politically
Microsoft, Academia and Clouds • What would make a difference? • Software access • Hosted services • Hosted data • Hardware and software • We want to hear from you
The Cambrian Explosion • Most phyla appear • Sponges, archaeocyathids, brachiopods • Trilobites, primitive mollusks, echinoderms • Indeed, most appeared quickly! • Tommotian and Atdabanian • … as little as five million years • Lessons for computing writ large • It doesn’t take long when conditions are right • Raw materials and environment • Leave fossil records if you want to be remembered!
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.