Microsoft Azure Research Engagement. Dennis Gannon, Roger Barga, Jeff Mendenhall. Outline. Part 1. Context setting Microsoft’s goals for this project Defining the cloud and differentiating it from supercomputers Part 2. Engagement strategy
Microsoft Azure Research Engagement Dennis Gannon, Roger Barga, Jeff Mendenhall
Outline • Part 1. Context setting • Microsoft’s goals for this project • Defining the cloud and differentiating it from supercomputers • Part 2. Engagement strategy • Tutorials, workshops, sample applications and consulting • Part 3. Windows Azure • Architecture • Use cases and programming models
Microsoft’s Goals for this Project • Demonstrate that a client+cloud model can revolutionize research and learning • Illustrate that cloud computing is a cost-effective and easy-to-use way to outsource select components of research infrastructure • Provide feedback from research community to our product groups • Establish the Microsoft Cloud Computing platform as leader and trendsetter for basic research
The Cloud • A model of computation and data storage based on “pay as you go” access to “unlimited” remote data center capabilities • A cloud infrastructure provides a framework to manage scalable, reliable, on-demand access to applications • A cloud is the “invisible” backend to many of our mobile applications • Historical roots in today’s Internet apps • Search, email, social networks • File storage (Live Mesh, MobileMe, Flickr, …)
Clouds are built on Data Centers • Range in size from “edge” facilities to megascale. • Economies of scale • Approximate costs for a small size center (1000 servers) and a larger, 100K server center. Each data center is 11.5 times the size of a football field
Advances in DC deployment • Conquering complexity. • Building racks of servers & complex cooling systems all separately is not efficient. • Package and deploy into bigger units
Data Center vs Supercomputers • Scale • Blue Waters = 40K 8-core “servers” • Roadrunner = 13K Cell + 6K AMD servers • MS Chicago Data Center = 50 containers = 100K 8-core servers. • Network Architecture • Supercomputers: Clos “Fat Tree” InfiniBand • Low latency – high bandwidth protocols • Data Center: IP based • Optimized for Internet Access • Data Storage • Supers: separate data farm • GPFS or other parallel file system • DCs: use disk on node + memcache + databases [Figures: Fat tree network; Standard Data Center Network]
Common HPC Programming Paradigm • Domain decomposition • Spreads vital data across all nodes • Each spatial cell exists in one memory • Except possible ghost or halo cells • Single node failure • Causes blockage of entire simulation • Data is lost and must be recovered • Checkpointing is the de facto HPC solution • Periodically write all data to secondary storage • Given failures, one can compute an optimal interval
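The slide does not say which formula gives the optimal checkpoint interval. One commonly cited first-order estimate is Young's approximation, interval ≈ sqrt(2 × checkpoint cost × mean time between failures). The sketch below assumes that formula; the function name and the sample numbers are illustrative and not from the talk.

```python
import math


def young_checkpoint_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order estimate of the compute time between checkpoints.

    checkpoint_cost_s: seconds needed to write one checkpoint (assumed known).
    mtbf_s: mean time between failures of the whole machine, in seconds.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


# Hypothetical numbers: a 5-minute checkpoint and a 24-hour system MTBF.
interval_s = young_checkpoint_interval(300.0, 24 * 3600.0)
print(f"checkpoint roughly every {interval_s / 3600.0:.1f} hours")
```

The estimate shrinks as the machine grows: more nodes means a shorter system MTBF, so checkpoints must be written more often.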
Cloud Data Architectures • A close integration of data with computation • “Move the computation to the data” – Jim Gray • Data is stored on server disks • Optimized more for reads than writes • Data replication • 3 to 5 copies of each data object • Copies are distributed • Unstructured data • “Blob” storage: basic metadata + binary object • Streaming data from instruments • Structured data • Tables – billions of rows and columns • Table partitioned into blocks of rows and blocks are distributed and replicated. • Databases – replicated relational databases
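A minimal sketch of the table layout described above: rows are cut into blocks and each block is copied to several servers. The round-robin placement, the function names, and the server names are assumptions made for illustration; real systems choose placement with far more care (failure domains, load, and so on).

```python
from typing import Dict, List


def partition_rows(num_rows: int, block_size: int) -> List[range]:
    """Split row indices 0..num_rows-1 into consecutive blocks of rows."""
    return [
        range(start, min(start + block_size, num_rows))
        for start in range(0, num_rows, block_size)
    ]


def place_replicas(num_blocks: int, servers: List[str], copies: int = 3) -> Dict[int, List[str]]:
    """Assign each block to `copies` distinct servers, round-robin (illustrative only)."""
    if copies > len(servers):
        raise ValueError("need at least as many servers as copies")
    return {
        block_id: [servers[(block_id + k) % len(servers)] for k in range(copies)]
        for block_id in range(num_blocks)
    }


blocks = partition_rows(num_rows=10, block_size=4)
placement = place_replicas(len(blocks), ["s0", "s1", "s2", "s3", "s4"], copies=3)
for block_id, rows in enumerate(blocks):
    print(block_id, list(rows), placement[block_id])
```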
Data Center vs Supercomputer Apps • Supercomputer • Highly parallel, tightly synchronized MPI simulations • Supercomputer or Cloud • Large scale, loosely coupled data analysis • Cloud • Scalable, parallel, resilient web services [Diagram labels: HPC Supercomputer; Data Center based Cloud; Internet; Map Reduce Data Parallel; MPI communication]
The Cloud Landscape • Infrastructure as a Service (IaaS) • Provide a data center and a way to host client VMs and data. • Platform as a Service (PaaS) • Provide a programming environment to build a cloud application • The cloud deploys and manages the app for the client • Software as a Service (SaaS) • Delivery of software from the cloud to the desktop
Platform as a Service • An application development, deployment and management fabric. • User programs web service front end and computational & Data Services • Framework manages deployment and scale out • No need to manage VM images • Examples: Microsoft Azure, Google App Engine, RightScale, SalesForce, Rollbase, Bungee, Cloudera [Diagram labels: App User; Internet; Web Access Layer; App Developer; PaaS Dev/Deploy Fabric; Fabric Controller; Data & Compute Layer; VMs on Server 1 to Server n]
The Cloud Landscape Infrastructure as a Service Software as a Service Platform as a Service
The Future: an Explosion of Data • Sources: Experiments, Simulations, Archives, Literature, Instruments • Petabytes, doubling every 2 years • The Challenge: Enable Discovery. Deliver the capability to mine, search and analyze this data in near real time. • Enhance our Lives: participate in our own health care; augment experience with deeper understanding.
Changing Nature of Discovery • Complex models • Multidisciplinary interactions • Wide temporal and spatial scales • Large multidisciplinary data • Real-time streams • Structured and unstructured • Distributed communities • Virtual organizations • Socialization and management • Diverse expectations • Client-centric and infrastructure-centric http://research.microsoft.com/en-us/collaboration/fourthparadigm/
Changing the way we do research • The Branscomb Pyramid [Pyramid labels: Supercomputer Users; Have own small cluster or servers; The Rest of Us] • The Rest of Us • Use laptops. • Our data collections are not as big as we wished. • Our tools are limited. • Paradigm shifts for research • Google, Yahoo and MS proved the power of the cloud with search. • The game changer: the ability to query anything, anytime, anywhere. • The cloud is also designed to support very large numbers of users or communities • Data collections are the first step. • The second step: build the apps that run on client devices and the cloud that can exploit these collections.
The Clients+Cloud Platform • At one time the “client” was a PC + browser. Now • The Phone • The laptop/tablet • The TV/Surface/Media wall • And the future • The instrumented room • Aware and active surfaces • Voice and gesture recognition • Knowledge of where we are • Knowledge of our health
The Cloud as an extension of your desktop and other client devices • Today • Cloud storage for your data files synchronized across all your machines (MobileMe, Live Mesh, Flickr, etc.) • Your collaboration space (Sakai, SharePoint) • Cloud-enabled apps (Google Apps, Office Live) • Tomorrow (or even sooner) • The lens that magnifies the power of the desktop • Operate on a table with a billion rows in Excel • Matlab analysis of a thousand images in parallel
Our Metrics of Success • Projects that advance scientific discovery through novel uses of cloud technology • New ways to expose and explore community data collections • Advances in client + cloud tools and programming models • Finding cloud applications that reach beyond the “traditional” e-Science community • With NSF we build a model for and examples of SUSTAINABLE cloud services, tools and communities
Things we need from NSF • Help with a time line for the expected progress of the program • How can we give NSF help on formulating a CFP? • Agreement on the nature of the program and our shared goals
Windows Azure: Building Community around Cloud Computing for Research
CCF Academic Research Engagement: Resources to Build Community around Cloud Computing for Research • PowerPoint tutorial for a general overview of Windows Azure • Whitepaper that presents a technical overview and best practices for developing and deploying research services on Windows Azure • Benchmark suite as a guide to application architects and developers • Host reference data sets for research, based on research value/interest • Kickoff Workshop and Annual All Hands Meeting (AHM) at MSR • Technical engagement team, accessible via ccfengage@microsoft.com (tbc) • Community website, regularly updated with technical content, blogs, community supplied content, Q&A, etc.
Azure Tutorial(s) Extended version of SuperComputing’09 tutorial with deep dives on Azure storage, including Blobs, Tables, XDrives, and new Azure features (85 slides) Available January 11th 2010
CCF Academic Research Engagement: Readily Available Online Content
Whitepaper: Resource for Decision Makers and Developers • Presents the following: Overview of Azure; How we built select research applications; Best practices for developing applications and deploying research services; Links to source code intended to accelerate development • Introduces benchmarks and outlines results to inform application development. • Available February 1st, 2010
Reference Data Initiative • Thematic Focus – goal is to have the top two or three research collections on Azure in each thematic area: Health & Bio; Energy & Environment; Computer Science • Tool Ecosystem for Managing Data Collections • Sustainability and Egress Guarantees
Azure Benchmark: A Resource for Programmers and Architects to Understand Azure • "There are lies, damn lies and then there are performance measures." J. Gray • Storage throughput, networking, and role tests. • Guide for decision makers (when to use) and Developers (how to use).
Azure Benchmark: A Resource for Programmers and Architects to Understand Azure • Extensible Test Harness • Suite of tests, able to select and schedule repeated runs, catalog results. • Guide for decision makers (when to use) and Developers (how to use). • Microbenchmarks • Storage throughput, networking, and role tests. • End-to-End Algorithm Benchmarks • Spectrum of distributed algorithms, from tightly coupled to totally decoupled • Illustrates scalability for pleasingly parallel algorithms and overheads (limits) of current network architecture and I/O architecture (coordination through queues, latency to storage fabric). • Targeted Benchmarks on unique Azure Features • Failure recovery (inject fault, measure time to automatically restart worker)
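A rough sketch of what "schedule repeated runs, catalog results" could look like for a microbenchmark. This is not the Azure benchmark suite itself: the harness function, the result fields, and the in-memory copy used as the workload are all placeholders. A real test would call the storage or networking operation being measured.

```python
import statistics
import time
from typing import Callable, Dict, List


def run_benchmark(name: str, fn: Callable[[], None], repeats: int = 5) -> Dict[str, object]:
    """Time `fn` several times and return one catalog entry with summary statistics."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "name": name,
        "repeats": repeats,
        "min_s": min(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }


def copy_buffer() -> None:
    """Placeholder workload: copy 1 MB in memory (stands in for a storage call)."""
    data = bytes(1_000_000)
    _ = bytearray(data)


catalog = [run_benchmark("copy_1MB_in_memory", copy_buffer, repeats=5)]
for entry in catalog:
    print(entry)
```

Reporting min, median and max rather than a single mean makes outliers from shared infrastructure easier to spot.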
CCF Academic Research Engagement Supporting a Community of Researchers Search Examples Menu Data Source Menu Application Menu Community Colleagues Colleagues’ projects Whitepapers Azure Ocean Blast on Azure Azure benchmarks Projects Current Projects Recent Projects Archived Projects Getting Started Sandbox to Experiment with Research Services Resources Tutorials Whitepapers Hands on Labs Code Samples Services for Research Applications Quick Links Need Help Account Management My Account
Windows Azure in a Nutshell • Provide a brief overview of Windows Azure • Additional information in the Technical Tutorial, tentatively scheduled for January. • Examples of Research Services on Windows Azure • Illustration of Research Services
A bunch of machines in a data center Azure FC Owns this Hardware Highly-available Fabric Controller (FC)
Each VM Has… • At Minimum • CPU: 1.5-1.7 GHz x64 • Memory: 1.7GB • Network: 100+ Mbps • Local Storage: 500GB • Up to • CPU: 8 Cores • Memory: 14.2 GB • Local Storage: 2+ TB
FC Then Installs the Azure Platform Compute Storage
Windows Azure Compute Service: A closer look [Diagram labels: HTTP; Load Balancer; Web Role (IIS; ASP.NET, WCF, etc.); Worker Role (main() { … }); Agent; Fabric; VM]
Suggested Application Model: Using queues for reliable messaging • Web Role (ASP.NET, WCF, etc.): 1) Receive work 2) Put work in queue • Worker Role (main() { … }): 3) Get work from queue 4) Do work • To scale, add more of either
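The four steps can be imitated locally with Python's standard `queue` and `threading` modules. This is only an in-process analogy, not Azure code: `web_role_receive`, `worker_role_main`, and the squaring "work" are made-up stand-ins, and a real deployment would use the Azure queue service between separate role instances.

```python
import queue
import threading

work_queue: "queue.Queue" = queue.Queue()   # stands in for the Azure queue
results = {}
results_lock = threading.Lock()


def web_role_receive(job_id: int, payload: int) -> None:
    """Steps 1 and 2: receive work and put it in the queue."""
    work_queue.put((job_id, payload))


def worker_role_main() -> None:
    """Steps 3 and 4: get work from the queue and do it."""
    while True:
        item = work_queue.get()
        if item is None:                    # sentinel: no more work for this worker
            work_queue.task_done()
            break
        job_id, payload = item
        with results_lock:
            results[job_id] = payload * payload   # placeholder computation
        work_queue.task_done()


# "To scale, add more of either": here, three workers share one queue.
workers = [threading.Thread(target=worker_role_main) for _ in range(3)]
for w in workers:
    w.start()
for job_id in range(10):
    web_role_receive(job_id, job_id)
for _ in workers:
    work_queue.put(None)
work_queue.join()
for w in workers:
    w.join()
print(sorted(results.items()))
```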
Scalable, Fault Tolerant Applications • Queues are the application glue • Decouple parts of application, easier to scale independently; • Resource allocation, different priority queues and backend servers • Mask faults in worker roles (reliable messaging). • Use Inter-role communication for performance (PDC’09) • TCP communication between role instances • Define your ports in the service models
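One way a queue can mask worker faults is a visibility timeout: a message taken by a worker is hidden rather than removed, and it reappears if the worker never deletes it. The toy class below simulates that idea with an explicit clock so the behavior is deterministic. `LeasedQueue` and its methods are hypothetical names, not the Azure queue API.

```python
from typing import Dict, Optional, Tuple


class LeasedQueue:
    """Toy queue in which a fetched message is hidden until its timeout expires."""

    def __init__(self) -> None:
        self._messages: Dict[int, str] = {}
        self._visible_at: Dict[int, float] = {}
        self._next_id = 0

    def put(self, body: str) -> int:
        msg_id = self._next_id
        self._next_id += 1
        self._messages[msg_id] = body
        self._visible_at[msg_id] = 0.0
        return msg_id

    def get(self, now: float, visibility_timeout: float) -> Optional[Tuple[int, str]]:
        """Return the oldest visible message and hide it for `visibility_timeout` seconds."""
        for msg_id in sorted(self._messages):
            if self._visible_at[msg_id] <= now:
                self._visible_at[msg_id] = now + visibility_timeout
                return msg_id, self._messages[msg_id]
        return None

    def delete(self, msg_id: int) -> None:
        """Called by a worker only after the work has completed."""
        self._messages.pop(msg_id, None)
        self._visible_at.pop(msg_id, None)


q = LeasedQueue()
q.put("render frame 7")
first = q.get(now=0.0, visibility_timeout=30.0)    # worker A takes it, then crashes
hidden = q.get(now=10.0, visibility_timeout=30.0)  # still hidden from worker B
retry = q.get(now=31.0, visibility_timeout=30.0)   # timeout passed: visible again
q.delete(retry[0])                                 # worker B finishes and deletes
done = q.get(now=100.0, visibility_timeout=30.0)
print(first, hidden, retry, done)
```

Because a message can be delivered more than once under this scheme, the work itself should be safe to repeat.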
Storage Blob REST API Queue Table Load Balancer
Azure Storage Service: A closer look [Diagram labels: HTTP; Blobs, Drives, Tables, Queues; Application; Storage; Compute; Fabric; …]
Windows Azure Storage: Points of interest • Storage types • Blobs: simple interface for storing named files along with metadata for each file • Drives: durable NTFS volumes • Tables: entity-based storage; not relational, but entities that each contain a set of properties • Queues: reliable message-based communication • Access • Data is exposed via .NET and RESTful interfaces • Data can be accessed by: • Windows Azure apps • Other on-premise applications or cloud applications
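To make "entities, which contain a set of properties" concrete, here is a small in-memory imitation of an entity table. Entities are addressed by a partition key and a row key, as in Azure table storage, and entities in the same table need not share the same properties. The helper functions and the sensor data are invented for illustration; this does not call any Azure API.

```python
from typing import Any, Dict, List, Tuple

Entity = Dict[str, Any]
table: Dict[Tuple[str, str], Entity] = {}   # (partition key, row key) -> properties


def insert_entity(partition_key: str, row_key: str, **properties: Any) -> None:
    """Store one entity; there is no fixed schema, so any properties are accepted."""
    table[(partition_key, row_key)] = dict(properties)


def query_partition(partition_key: str) -> List[Entity]:
    """Return all entities in one partition, ordered by row key."""
    return [
        entity
        for (pk, _), entity in sorted(table.items(), key=lambda kv: kv[0])
        if pk == partition_key
    ]


insert_entity("sensor-A", "2010-01-11T00:00", temperature_c=4.5)
insert_entity("sensor-A", "2010-01-11T01:00", temperature_c=4.1, salinity_psu=35.0)
insert_entity("sensor-B", "2010-01-11T00:00", depth_m=120)

rows = query_partition("sensor-A")
print(rows)
```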
Windows Azure Drives • Provides a durable NTFS volume for Windows Azure applications to use: a VHD of up to 1TB, 4 drives per VM • Enables the following scenarios • Gives applications NTFS semantics to manage state • Helps migrate existing NTFS applications to the cloud • Durability and survival of data on VM failover • A Windows Azure Drive is really a Blob • Mount Page Blob as D:\ • All writes to drive are made durable to the Page Blob • Drive made durable through standard page blob replication • Drive persists as a page blob even when not mounted
How Windows Azure Drives Work • Mount page blob as a drive via lease mechanism • Writes committed to blob store before returning • Reads can be served from local cache or from blob store (cache miss) • WA Drive Commands: Create/Format Drive, Mount/Unmount Drive, Snapshot Drive, Copy Drive [Diagram labels: VM; Application; Drive X:; OS; Local Cache; Lease; MyBlob; Windows Azure Blob Store]
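The read and write paths described here resemble a write-through cache. The sketch below models that behavior with plain dictionaries: a write reaches the durable store before it returns, and a read tries the local cache first. `BlobBackedDrive` is a hypothetical class for illustration; it ignores leases, snapshots, and everything else the real drive does.

```python
from typing import Dict


class BlobBackedDrive:
    """Toy write-through drive: a local cache in front of a durable page store."""

    def __init__(self, blob_store: Dict[int, bytes]) -> None:
        self._blob = blob_store              # stands in for the durable page blob
        self._cache: Dict[int, bytes] = {}   # local cache, lost when the VM goes away
        self.cache_hits = 0
        self.cache_misses = 0

    def write(self, page: int, data: bytes) -> None:
        self._blob[page] = data              # commit to the blob store first
        self._cache[page] = data             # then refresh the local cache

    def read(self, page: int) -> bytes:
        if page in self._cache:
            self.cache_hits += 1
            return self._cache[page]
        self.cache_misses += 1               # cache miss: fall back to the blob store
        data = self._blob[page]
        self._cache[page] = data
        return data


page_blob: Dict[int, bytes] = {}
drive = BlobBackedDrive(page_blob)
drive.write(0, b"state-v1")

# Simulated VM failover: a new instance mounts the same blob with an empty cache.
remounted = BlobBackedDrive(page_blob)
first_read = remounted.read(0)    # served from the blob store
second_read = remounted.read(0)   # served from the local cache
print(first_read, remounted.cache_misses, remounted.cache_hits)
```

Because every write lands in the blob store before returning, the data survives the loss of the cache, which is the durability property the slide calls out.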