Programming the Cloud: Cloud Services for Science. Tony Hey, Corporate Vice President, Microsoft External Research
Tony Hey – An Introduction Commander of the Order of the British Empire
Worldwide External Research Community and Geographic Outreach Advanced Research Tools and Services
Emergence of a Fourth Research Paradigm • Thousand years ago – Experimental Science • Description of natural phenomena • Last few hundred years – Theoretical Science • Newton’s Laws, Maxwell’s Equations… • Last few decades – Computational Science • Simulation of complex phenomena • Today – Data-Intensive Science • Scientists overwhelmed with data sets from many different sources • Data captured by instruments • Data generated by simulations • Data generated by sensor networks • eScience is the set of tools and technologies to support data federation and collaboration • For analysis and data mining • For data visualization and exploration • For scholarly communication and dissemination Astronomy has been one of the first disciplines to embrace data-intensive science with the Virtual Observatory (VO), enabling highly efficient access to data and analysis tools at a centralized site. The image shows the Pleiades star cluster from the Digitized Sky Survey combined with an image of the moon, synthesized within the WorldWide Telescope service. Science must move from data to information to knowledge. (With thanks to Jim Gray)
Accelerating time to insight with Advanced Research Tools and Services. Our goal is to accelerate research by collaborating with academic communities to use computer science research technologies. We also aim to help scientists spend less time on IT issues and more time on discovery by creating open tools and services based on Microsoft platforms and productivity software.
What is Cloud Computing? A Definition: • Cloud Computing means using a remote data center to manage scalable, reliable, on-demand access to applications • Providing Applications and Infrastructure over the Internet • Scalable means: • Possibly millions of simultaneous users of the app. • Exploiting thousand-fold parallelism in the app. • Reliable means on-demand; 5 “nines” available right now • Applications span the continuum from client to the cloud Three New Aspects to Cloud Computing: • Illusion of infinite computing resources available on demand • Elimination of an upfront commitment by cloud users • Ability to pay for use of computing resources on a short-term basis as needed
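To make the 5 “nines” figure concrete, the arithmetic can be sketched as below. This assumes that 5 “nines” means 99.999% availability measured over a 365-day year; the function name is hypothetical and chosen only for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines.

    Assumption: N nines means an unavailable fraction of 10**-N,
    so 5 nines corresponds to 99.999% availability.
    """
    unavailable_fraction = 10.0 ** (-nines)
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "nines:", round(downtime_minutes_per_year(n), 3), "minutes/year")
```

Under these assumptions, five nines leaves roughly five minutes of downtime per year, which is why reliability at this level has to be built into the platform rather than handled by hand.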
The Data Center Landscape • Range in size from “edge” facilities to mega scale • Unprecedented economies of scale • Approximate costs for a small size center (1K servers) and a larger, 50K server center (cost table not reproduced; data courtesy of James Hamilton) • Each data center is 11.5 times the size of a football field
Advances in Data Center Deployment Conquering complexity • Building racks of servers & complex cooling systems all separately is not efficient. • Package and deploy into bigger units • 3 Sockets: Power, Cooling, Bandwidth
Why is this not just the same as Supercomputing? • A Supercomputer is designed to scale a single application for a single user. • Optimized for peak performance of hardware • Batch operation is not “on-demand” • Reliability is secondary • If MPI fails, application crashes • Build check-pointing into application • Most Data Center applications run continuously (as services) • Most Cloud Applications are immediate, scalable and persistent • The Cloud is also a platform for massive data analysis • Not a replacement for leading edge supercomputers • The Programming model must support scalability in two dimensions • Thousands of simultaneous users of the same applications • Applications that require thousands of cores for each use * Dan Reed
Programming the Cloud • Infrastructure as a Service (IaaS): Provide a way to host virtual machines on demand • Platform as a Service (PaaS): You write an Application to Cloud APIs and the platform manages and scales it for you • Software as a Service (SaaS): Delivery of software to the desktop from the Cloud (Diagram labels: Azure™ Services Platform; Infrastructure as a Service; Platform as a Service; Software as a Service)
A Spectrum of Application Models • Axes: Constraints in the App Model (More Constrained to Less Constrained) and Automated Management Services (Less Automation to More Automation) • Google App Engine: Traditional Web Apps; Auto Scaling and Provisioning • Amazon AWS: VMs Look Like Hardware; No Limit on App Model; User Must Implement Scalability and Failover • Microsoft Azure: .NET CLR/Windows Only; Choice of Language; Some Auto Failover/Scale (but needs declarative application properties) • Force.com: SalesForce Biz Apps; Auto Scaling and Provisioning
Azure Abstract Programming Model (Diagram labels: Public Internet; Load Balancer; Front-end Web Role; Worker Role(s); Azure Services (storage); Switches; Load-balancers; Highly-available Fabric Controller; In-band communication – software control)
Roles: Scalable, Fault Tolerant, Stateless Scalability • Queue length directly reflects how well backend processing is keeping up with overall workload • Queues decouple different parts of the application, making it easier to scale parts of the application independently • Flexible resource allocation, different priority queues and separation of backend servers to process different queues • Roles are mostly stateless processes running on a core • Web Roles provide web service access to the app by the users. Web roles generate tasks for worker roles • Worker Roles do “heavy lifting” and manage data in tables/blobs • Communication is through queues. • The number of role instances should dynamically scale with load
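The web role / worker role pattern on this slide can be illustrated with a minimal in-process sketch. It uses Python's standard `queue` and `threading` modules rather than any Azure API; the function names, the squaring "work", and the `tasks_per_instance` scaling rule are all assumptions made for illustration.

```python
import math
import queue
import threading


def web_role(tasks, requests):
    """Front end: turn each user request into a task on the shared queue."""
    for request in requests:
        tasks.put(request)


def worker_role(tasks, results, lock):
    """Stateless back end: pull tasks until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: shut this worker down
            break
        value = item * item  # stand-in for the "heavy lifting"
        with lock:
            results[item] = value  # stand-in for writing to tables/blobs


def desired_instances(queue_length, tasks_per_instance=10):
    """Hypothetical scaling rule: one worker per N queued tasks, at least one."""
    return max(1, math.ceil(queue_length / tasks_per_instance))


def run(requests, n_workers=3):
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=worker_role, args=(tasks, results, lock))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    web_role(tasks, requests)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(run([1, 2, 3, 4]))
    print(desired_instances(25))
```

Because the roles share nothing except the queue, the number of workers can change without touching the front end, and the queue length is the only signal the scaling rule needs.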
The Azure Fabric • Consists of a (large) group of machines all managed by software called the fabric controller • The fabric controller is replicated across a group of five to seven machines and owns all of the resources in the fabric • Because it can communicate with a fabric agent on every computer, the controller is aware of every Windows Azure application running on the fabric
Azure Storage: Blobs, Tables, Queues, and SQL Data Services (relational) • The simplest way to store data in Windows Azure storage is to use blobs • A blob contains binary data • A storage account can have one or more containers, each of which holds one or more blobs • Tables hold some number of entities. • An entity contains zero or more properties • Queues provide scalability • Queues provide a buffer to absorb traffic bursts • Reduce the impact of individual component failures Blobs can be big, up to 50 gigabytes each. They can also have associated metadata.
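The containment hierarchy described above (account, containers, blobs with metadata; tables, entities, properties) can be pictured with a small in-memory model. This is not the Azure storage API: every class and method name here is hypothetical, and the size limit simply encodes the 50 gigabyte figure from the slide, read as 50 × 10^9 bytes.

```python
from dataclasses import dataclass, field

MAX_BLOB_BYTES = 50 * 10**9  # assumption: "50 gigabytes" read as decimal GB


@dataclass
class Blob:
    data: bytes
    metadata: dict = field(default_factory=dict)  # optional associated metadata


@dataclass
class Container:
    blobs: dict = field(default_factory=dict)  # blob name -> Blob


@dataclass
class Entity:
    properties: dict = field(default_factory=dict)  # zero or more properties


@dataclass
class StorageAccount:
    containers: dict = field(default_factory=dict)  # container name -> Container
    tables: dict = field(default_factory=dict)      # table name -> list of Entity

    def put_blob(self, container, name, data, metadata=None,
                 max_bytes=MAX_BLOB_BYTES):
        """Store binary data under container/name, enforcing the size limit."""
        if len(data) > max_bytes:
            raise ValueError("blob exceeds size limit")
        box = self.containers.setdefault(container, Container())
        box.blobs[name] = Blob(data, dict(metadata or {}))

    def insert_entity(self, table, **properties):
        """Append an entity with the given properties to a table."""
        self.tables.setdefault(table, []).append(Entity(dict(properties)))


if __name__ == "__main__":
    account = StorageAccount()
    account.put_blob("inputs", "tree.txt", b"(A,B);", {"owner": "demo"})
    account.insert_entity("tracking", job="j1", status="queued")
    print(account)
```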
Scientific Applications in the Cloud: SaaS for Science. Investigating the use of commercial Cloud services for scientific research. Example Applications: • PhyloD, a computationally intensive science application that was not previously available as a service. Moving to the Cloud so researchers around the world can have access. • MATLAB, client application making use of Azure blob storage. MATLAB could also be hosted in the Cloud, with the appropriate licensing. • Excel, demonstrating seamless interaction between familiar client tools and the Cloud, where data is stored in Azure tables and Azure computations can be invoked for analysis. The data is from sensors distributed throughout one of our Data Centers.
PhyloD as an Azure Service • Statistical tool used to analyze DNA of HIV from large studies of infected patients • PhyloD was developed by Microsoft Research and has had high impact • Small but important group of researchers • 100’s of HIV and HepC researchers actively use it • 1000’s of research communities rely on these results (Image: cover of PLoS Biology, November 2008) • Typical job, 10 – 20 CPU hours with extreme jobs requiring 1K – 2K CPU hours • Very CPU efficient • Requires a large number of test runs for a given job (1 – 10M tests) • Highly compressed data per job (~100 KB per job) Highlights Azure’s potential for agile deployment of science-related services that scale. Courtesy of Roger Barga
PhyloD as an Azure Service • Web role copies input tree, predictor and target files to blob storage, enqueues INITIAL work item and updates tracking tables. (Diagram labels: Blob Storage; Work Item Queue (INITIAL); Tracking Tables)
PhyloD as an Azure Service • Worker role copies the input files to its local storage, computes p-values for a subset of the allele-codon pairs, copies the partial results back to blob storage and updates the tracking tables.
PhyloD as an Azure Service • Web role serves the final results from blob storage and status reports from tracking tables.
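The three steps on the preceding slides can be traced in a small in-memory sketch. Plain dictionaries and a deque stand in for blob storage, the tracking tables and the work item queue; none of this is PhyloD or Azure code. The `COMPUTE` work item kind, the chunking scheme, and the placeholder `fake_p_value` function are assumptions introduced only so that the flow runs end to end.

```python
from collections import deque


def submit_job(job_id, pairs, blobs, work_queue, tracking, chunk_size=2):
    """Web role: store inputs, enqueue the INITIAL work item, start tracking."""
    blobs[f"{job_id}/input"] = list(pairs)
    work_queue.append(("INITIAL", job_id, chunk_size))
    tracking[job_id] = {"status": "queued", "done": 0, "total": 0}


def fake_p_value(pair):
    """Placeholder statistic; NOT the PhyloD computation."""
    allele, codon = pair
    return 1.0 / (1 + len(allele) + codon)


def worker_step(blobs, work_queue, tracking):
    """Worker role: process one work item from the queue."""
    kind, job_id, arg = work_queue.popleft()
    pairs = blobs[f"{job_id}/input"]  # copy of the input files
    if kind == "INITIAL":
        # Assumed behavior: split the job into subsets of allele-codon pairs.
        chunks = [(s, min(s + arg, len(pairs)))
                  for s in range(0, len(pairs), arg)]
        tracking[job_id].update(status="running", total=len(chunks))
        for chunk in chunks:
            work_queue.append(("COMPUTE", job_id, chunk))
    else:
        start, stop = arg
        partial = {p: fake_p_value(p) for p in pairs[start:stop]}
        blobs[f"{job_id}/partial/{start}"] = partial  # partial results
        row = tracking[job_id]
        row["done"] += 1
        if row["done"] == row["total"]:
            row["status"] = "complete"


def fetch_results(job_id, blobs, tracking):
    """Web role: serve status from tracking and merged results from blobs."""
    merged = {}
    for key in sorted(blobs):
        if key.startswith(f"{job_id}/partial/"):
            merged.update(blobs[key])
    return tracking[job_id]["status"], merged


if __name__ == "__main__":
    blobs, tracking, work_queue = {}, {}, deque()
    submit_job("job1", [("A02", 1), ("A02", 2), ("B57", 3)],
               blobs, work_queue, tracking)
    while work_queue:
        worker_step(blobs, work_queue, tracking)
    print(fetch_results("job1", blobs, tracking))
```

Each worker step reads its inputs from storage and writes its outputs back, so any worker instance can pick up any work item, which is what lets the number of workers scale with the size of the job.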
Project JUNIOR: Demonstrating the Value of Cloud Services for Science. Led by Newcastle University (Prof. Watson) and supported by External Research. Goal: Investigate applicability of Clouds for scientific research • Build a working prototype for a thin slice (use-cases in chemo-informatics) • Utilize Microsoft technologies to build science-related services • Investigating additional Scientific Cloud Services to raise abstraction level for applications
Reference Scientific Data Sets on Azure Ocean Science data on Azure SDS-relational • Two terabytes of coastal and model data Computational finance data on SDS-relational • BATS, daily tick data for stocks (10 years) • XBRL call report for banks (10,000 banks) Currently working with IRIS to store select seismic data on Azure. IRIS consortium based in Seattle (NSF) collects and distributes global seismological data. • Data sets requested by researchers worldwide • Includes HD videos, seismograms, images, data from major seismic events. • High research value worldwide, frequently used in academia and lectures • Consume in any language, any tool, any platform in S+S scenarios
A world where all data is linked … • A knowledge ecosystem: • A richer authoring experience • An ecosystem of services • Semantic storage • Open, Collaborative, Interoperable, and Automatic • Data/information is inter-connected through machine-interpretable information (e.g. paper X is about star Y) • Social networks are a special case of ‘data meshes’ Attribution: Chris Bizer
…and stored/processed/analyzed in the Cloud. Vision of Future Research Environment with both Software + Services (Diagram labels: visualization and analysis services; scholarly communications; domain-specific services; search; books; citations; blogs & social networking; reference management; instant messaging; identity; mail; project management; notification; document store; storage/data services; knowledge management; compute services; virtualization; knowledge discovery) The Microsoft Technical Computing mission to reduce time to scientific insights is exemplified by the June 13, 2007 release of a set of four free software tools designed to advance AIDS vaccine research. The code for the tools is available now via CodePlex, an online portal created by Microsoft in 2006 to foster collaborative software development projects and host shared source code. Microsoft researchers hope that the tools will help the worldwide scientific community take new strides toward an AIDS vaccine.
Additional Resources: research.microsoft.com/en-us/collaboration/tools Windows Azure: http://www.windowsazure.com • Dryad; DryadLINQ • Computational Biology Toolkit • Enables and accelerates fundamental advances in biology • F# • Collaboration with the academic and research community on F#’s typed functional and object-oriented programming on the .NET platform • Plug-ins for Office • Ontology Add-in for Word • Article Authoring Add-in for Word • Chem4Word – Chemistry Drawing in Word • Microsoft Electronic Journals Service • Open XML Document Viewer • Software Engineering Tools • Spec#: Program verifier for C# extended with design by contract • VCC: Program verifier for Concurrent C • PEX: automatic unit testing tool for .NET • CHESS: Unit testing tools for concurrent Win32 executables and .NET