ES19: Under the Hood: Inside The Cloud Computing Hosting Environment
Erick Smith, Development Manager, Microsoft Corporation
Chuck Lenzmeier, Architect, Microsoft Corporation
Purpose Of This Talk/Agenda • Introduce the fabric controller • Introduce the service model • Give some insight into how it all works • Describe the workings at the data center level • Then zoom in to a single machine
Deploying A Service Manually • Resource allocation • Machines must be chosen to host the roles of the service • Fault domains, update domains, resource utilization, hosting environment, etc. • Procure additional hardware if necessary • IP addresses must be acquired • Provisioning • Machines must be set up • Virtual machines created • Applications configured • DNS set up • Load balancers must be programmed • Upgrades • Locate the appropriate machines • Update the software/settings as necessary • Only bring down a subset of the service at a time • Maintaining service health • Software faults must be handled • Hardware failures will occur • Logging infrastructure is provided to diagnose issues • This is ongoing work… you're never done
Windows Azure Fabric Controller
[Diagram: a highly available Fabric Controller manages the data center. Each node — a VM or a physical machine — runs WS08 (with or without the hypervisor), a control agent, and service roles. The FC uses in-band communication with the agents for software control, and out-of-band communication with the switches and load balancers for hardware control.]
Windows Azure Automation • Fabric Controller (FC) • Maps declarative service specifications to available resources • Manages the service life cycle starting from bare metal • Maintains system health and satisfies SLAs • What's special about it • Model-driven service management • Enables a utility-model shared fabric • Automates hardware management
[Diagram: the FC is told "what" is needed and makes it happen across the fabric, programming the load balancers and switches.]
Fabric Controller • Owns all the data center hardware • Uses the inventory to host services • Similar to what a per machine operating system does with applications • The FC provisions the hardware as necessary • Maintains the health of the hardware • Deploys applications to free resources • Maintains the health of those applications
Modeling Services
[Diagram: a service template — public Internet traffic through a load balancer to a front-end web role and a background process role — automatically maps to the service model, which is built from fundamental services: load balancer, channel, endpoint, interface, directory, resource.]
What You Describe In Your Service Model… • The topology of your service • The roles and how they are connected • Attributes of the various components • Operating system features required • Configuration settings • Describe exposed interfaces • Required characteristics • How many fault/update domains you need • How many instances of each role
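In the Windows Azure SDK the service model is expressed as declarative XML; as a language-neutral illustration, here is a rough Python sketch of the kind of information the model carries. Every name here (ServiceModel, Role, the field names) is invented for illustration and is not the actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Role:
    """One role in the service topology (names are illustrative)."""
    name: str                                     # e.g. "FrontEndWeb"
    instances: int                                # copies the FC keeps running
    endpoints: list = field(default_factory=list) # exposed interfaces
    settings: dict = field(default_factory=dict)  # configuration settings

@dataclass
class ServiceModel:
    """Declarative description the FC maps onto data center resources."""
    name: str
    roles: list
    fault_domains: int    # how many failure units to spread across
    update_domains: int   # how many slices an upgrade may take down

model = ServiceModel(
    name="MyCloudService",
    roles=[
        Role("FrontEndWeb", instances=4, endpoints=["http:80"]),
        Role("BackgroundWorker", instances=2),
    ],
    fault_domains=2,
    update_domains=2,
)
```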
Fault/Update Domains • Allow you to specify what portion of your service can be offline at a time • Fault domains are based on the topology of the data center • Example: a switch failure • Statistical in nature • Update domains are determined by what percentage of your service you will take out at a time for an upgrade • You may experience outages from both at the same time • The system considers fault domains when allocating service roles • Example: don't put all roles in the same rack • The system considers update domains when upgrading a service • Allocation is across fault domains
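For a concrete (and purely illustrative) picture of the two axes, the snippet below spreads six instances of one role across three fault domains and two update domains, so that neither a single rack failure nor a single upgrade step can take the whole role offline:

```python
# Purely illustrative placement of 6 instances over 3 fault domains (rows)
# and 2 update domains (columns); no row or column holds the whole role.
FAULT_DOMAINS, UPDATE_DOMAINS = 3, 2
for i in range(6):
    print(f"web_{i}: fault domain {i % FAULT_DOMAINS}, "
          f"update domain {i % UPDATE_DOMAINS}")
```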
Dynamic Configuration Settings • Purpose: Communicate settings to service roles • There is no “registry” for services • Application configuration settings • Declared by developer • Set by deployer • System configuration settings • Pre-declared, same kinds for all roles • Instance ID, fault domain ID, update domain ID • Assigned by the system • In both cases, settings accessible at run time • Via call-backs when values change
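A minimal sketch of this call-back pattern, assuming an invented SettingsStore type rather than the real SDK API:

```python
# Invented SettingsStore type illustrating run-time settings callbacks;
# this is not the Windows Azure SDK API.
class SettingsStore:
    def __init__(self, initial):
        self._values = dict(initial)
        self._listeners = []

    def on_change(self, callback):
        """Register a call-back invoked whenever settings change."""
        self._listeners.append(callback)

    def apply(self, updates):
        """The platform pushes new values; the role reacts without restarting."""
        self._values.update(updates)
        for cb in self._listeners:
            cb(dict(self._values))

store = SettingsStore({"InstanceId": "3", "FaultDomainId": "1", "LogLevel": "info"})
store.on_change(lambda s: print("settings now:", s))
store.apply({"LogLevel": "debug"})   # e.g. the deployer changes a value
```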
Windows Azure Service Lifecycle • Goal is to automate the life cycle as much as possible
[Diagram: the developer and developer/deployer kick off the cycle; the remaining lifecycle stages are automated.]
Lifecycle Of A Windows Azure Service • Resource allocation • Nodes are chosen based on constraints encoded in the service model • Fault domains, update domains, resource utilization, hosting environment, etc. • VIPs/LBs are reserved for each external interface described in the model • Provisioning • Allocated hardware is assigned a new goal state • FC drives hardware into goal state • Upgrades • FC can upgrade a running service • Maintaining service health • Software faults must be handled • Hardware failures will occur • Logging infrastructure is provided to diagnose issues
Resources Come From Our Shared Pool • Primary goal – find a home for all role instances • Essentially a constraint satisfaction problem • Allocate instances across “fault domains” • Example constraints include • Only roles from a single service can be assigned to a node • Only a single instance of a role can be assigned to a node • Node must contain a compatible hosting environment • Node must have enough resources remaining • Service model allows for simple hints as to the resources the role will utilize • Node must be in the correct fault domain • Nodes should only be considered if healthy • A machine can be sub-partitioned into VMs • Performed as a transaction
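A minimal sketch of the constraint-filtering step, with invented node fields; the real FC evaluates these constraints for every role instance and commits the whole allocation as one transaction:

```python
# Illustrative sketch of the node-filtering step; every field name here is
# invented. The real FC solves the whole allocation as one transaction.
def candidate_nodes(nodes, service, role, need_cpu, env, used_fds):
    out = []
    for n in nodes:
        if not n["healthy"]:
            continue                          # consider only healthy nodes
        if n["service"] not in (None, service):
            continue                          # roles from a single service per node
        if role in n["roles"]:
            continue                          # one instance of a role per node
        if n["env"] != env:
            continue                          # compatible hosting environment
        if n["free_cpu"] < need_cpu:
            continue                          # enough resources remaining
        if n["fault_domain"] in used_fds:
            continue                          # spread across fault domains
        out.append(n)
    return out

nodes = [
    {"healthy": True, "service": None, "roles": set(), "env": "vm",
     "free_cpu": 2.0, "fault_domain": 0},
    {"healthy": True, "service": None, "roles": set(), "env": "vm",
     "free_cpu": 2.0, "fault_domain": 1},
]
print(candidate_nodes(nodes, "MyCloudService", "FrontEndWeb", 1.5, "vm", {0}))
```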
Key FC Data Structures
[Diagram: the service's Service Description, Role Description, and Role Instance Description map to a Logical Service, Logical Role, and Logical Role Instance; logical role instances are assigned to a Logical Node, which is backed by a Physical Node.]
Maintaining Node State
[Diagram: for each Logical Node and its Logical Role Instances, the FC tracks both a goal state and a current state; the logical node is backed by a Physical Node.]
The FC Provisions Machines… • FC maintains a state machine for each node • Various events cause node to move into a new state • FC maintains a cache about the state it believes each node to be in • State reconciled with true node state via communication with agent • Goal state derived based on assigned role instances • On a heartbeat event the FC tries to move the node closer to its goal state (if it isn’t already there) • FC tracks when goal state is reached • Certain events clear the “in goal state” flag
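The following sketch shows the shape of that heartbeat loop, with invented state names; on each heartbeat the node is moved one step toward its goal state and the "in goal state" flag is recomputed:

```python
# Sketch of heartbeat-driven reconciliation; the state names are invented.
GOAL_ORDER = ["bare", "os_imaged", "app_deployed", "running"]

def provision(node, state):
    print(f"node {node['id']}: moving to {state}")   # stand-in for real work

def on_heartbeat(node):
    cur, goal = GOAL_ORDER.index(node["current"]), GOAL_ORDER.index(node["goal"])
    if cur < goal:                        # one step closer per heartbeat
        node["current"] = GOAL_ORDER[cur + 1]
        provision(node, node["current"])
    node["in_goal_state"] = node["current"] == node["goal"]

node = {"id": 7, "current": "bare", "goal": "running"}
while not node.get("in_goal_state"):
    on_heartbeat(node)
```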
…And Other Data Center Resources • Virtual IPs (VIPs) are allocated from a pool • Load balancer (LB) setup • VIP and dedicated IP (DIP) pools are programmed automatically • DIPs are marked in/out of service as the FC's belief about the state of role instances changes • LB probing is set up to communicate with the agent on the node, which has real-time info on the health of the role • Traffic is only routed to roles ready to accept it • Routing information is sent to the agent to configure routes based on the network configuration • Redundant network gear is in place for high availability
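A toy illustration of the probe-driven routing decision, with example addresses; only DIPs whose agent reports the role ready are returned to the load balancer:

```python
# Sketch of probe-driven routing: the LB sends traffic only to DIPs whose
# node agent reports the role instance ready. Addresses are examples.
agent_health = {"10.0.0.4": True, "10.0.0.5": False, "10.0.0.6": True}

def routable_dips(dips, is_ready):
    return [d for d in dips if is_ready(d)]

print(routable_dips(agent_health, agent_health.get))
# -> ['10.0.0.4', '10.0.0.6']; 10.0.0.5 is marked out of service
```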
The FC Keeps Your Service Running • Windows Azure FC monitors the health of roles • FC detects if a role dies • A role can indicate that it is unhealthy • Upon learning a role is unhealthy • The current state of the node is updated appropriately • The state machine kicks in again to drive the node back into its goal state • Windows Azure FC monitors the health of the host • If the node goes offline, the FC will try to recover it • If a failed node can't be recovered, the FC migrates role instances to a new node • A suitable replacement location is found • Existing role instances are notified of the configuration change
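Sketched below, under invented names, is the recovery path: mark the instance failed, pick a suitable healthy node, and hand the instance back to the same goal-state machinery that drives ordinary provisioning:

```python
# Invented names throughout; this sketches recovery, not the FC's real code.
def handle_failure(instance, nodes, notify):
    instance["current"] = "failed"               # update current state
    new_node = next(n for n in nodes if n["healthy"] and n["free"])
    instance["node"] = new_node["id"]            # migrate to replacement
    instance["goal"] = "running"                 # reconciliation restarts it
    notify(instance)  # existing instances learn of the configuration change

nodes = [{"id": 1, "healthy": False, "free": True},
         {"id": 2, "healthy": True, "free": True}]
inst = {"node": 1, "current": "running", "goal": "running"}
handle_failure(inst, nodes, lambda i: print("reconfigured:", i))
```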
How Upgrades Are Handled • FC can upgrade a running service • Resources deployed to all nodes in parallel • Done by updating one “update domain” at a time • Update domains are logical and don’t need to be tied to a fault domain • Goal state for a given node is updated when the appropriate update domain is reached • Two modes of operation • Manual • Automatic • Rollbacks are achieved with the same basic mechanism
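A compact sketch of the rolling-upgrade loop (illustrative, not the FC's code): each update domain is updated and verified before the next one is touched, and a rollback runs the same loop with the previous version:

```python
# Sketch of a rolling upgrade: one update domain at a time, verifying each
# batch before moving on. A rollback reuses the same loop with the old bits.
def rolling_upgrade(instances, new_version, verify):
    for ud in sorted({i["update_domain"] for i in instances}):
        batch = [i for i in instances if i["update_domain"] == ud]
        for inst in batch:
            inst["version"] = new_version     # take down, update, restart
        if not all(verify(i) for i in batch):
            return False                      # manual mode would pause here
    return True

instances = [{"update_domain": i % 2, "version": 1} for i in range(4)]
print(rolling_upgrade(instances, 2, lambda i: i["version"] == 2))   # True
```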
Behind The Scenes Work • Windows Azure provisions and monitors hardware elements • Compute nodes, TOR/L2 switches, LBs, access routers, and node OOB control elements • Hardware life cycle management • Burn-in tests, diagnostics, and repair • Failed hardware taken out of pool • Application of automatic diagnostics • Physical replacement of failed hardware • Capacity planning • On-going node and network utilization measurements • Proven process for bringing new hardware capacity online
Service Isolation And Security • Your services are isolated from other services • Can access resources declared in model only • Local node resources – temp storage • Network end-points • Isolation using multiple mechanisms • Automatic application of windows security patches • Rolling operating system image upgrades
Windows Azure FC Is Highly Available • FC is a cluster of 5-7 replicas • Replicated state with automatic failover • A new primary picks up seamlessly from a failed replica • Even if all FC replicas are down, services continue to function • Rolling upgrade support for the FC itself • The FC cluster is modeled and controlled by a utility "root" FC
[Diagram: a client node's FC agent talks to the object model on the primary FC node; the replication system propagates state from the primary's disk to the secondary FC nodes' disks — the primary holds both committed and uncommitted state, the secondaries hold committed state.]
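The committed/uncommitted split in the diagram is the classic quorum-replication idea; here is a toy sketch, not the FC's actual protocol:

```python
# Toy quorum replication; a write is committed once a majority of replicas
# have persisted it, so a new primary never loses committed state.
class Replica:
    def __init__(self, up=True):
        self.up, self.log = up, []
    def persist(self, entry):
        if self.up:
            self.log.append(entry)     # durable on this replica's disk
        return self.up

def commit(replicas, entry):
    acks = sum(1 for r in replicas if r.persist(entry))
    return acks > len(replicas) // 2   # majority acked => committed

replicas = [Replica(), Replica(), Replica(up=False), Replica(), Replica()]
print(commit(replicas, "checkpoint-42"))   # True: 4 of 5 is a quorum
```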
Windows Azure Fabric Is Highly Available • The network has redundancy built in • Redundant switches, load balancers, and access routers • Services are deployed across fault domains • Load balancers route traffic to active nodes only • Windows Azure FC state is check-pointed periodically • Can roll back to previous checkpoints • Guards against corrupted FC state, loss of all replicated state, and operator errors • FC state is stored on multiple replicas across fault domains
Service Life-cycle • PDC release • Automated service deployment • Three service templates • Support for changing the number of running instances • Simple service upgrades/downgrades • Automated service failure discovery and recovery • External VIP address/DNS name per service • Service network isolation enforcement • Automated hardware management • Includes automated network load-balancer management • For 2009 • Ability to model more complex applications • Richer service life-cycle management • Richer network management
Summary • Windows Azure automates most functions • System takes care of running and keeping services up • Service owner in control • Self-management model through portal • Secure and highly-available platform • Built-in data center management • Capacity planning • Hardware and network management
Virtual Computing Environment • Multi-tenancy with security and isolation • Improved ‘performance/watt/$’ ratio • Increased operations automation • Hypervisor-based virtualization • Highly efficient and scalable • Leverages hardware advances
High-Level Architecture
[Diagram: the host partition runs Server Core as the host OS, with drivers and the virtualization stack (VSP); each guest partition runs Server Enterprise as the guest OS, with applications and the virtualization stack (VSC). Partitions communicate over VMBUS on top of the hypervisor, which runs directly on the hardware (CPU, NIC, disks).]
Image-Based Deployment • Images are virtual hard disks (VHDs) • Offline construction and servicing of images • Separate operating system and service images • Same deployment model for root partition
Image-Based Deployment
[Diagram: the host partition layers a host-partition differencing VHD on an HV-enabled Server Core base VHD; each guest partition layers a guest-partition differencing VHD on a Server Enterprise base VHD plus an application VHD built from its app package (App1-App3); a maintenance OS is used for servicing.]
Rapid And Reliable Provisioning • Deployment of images is just file copy • No installation • Background process • Multicast • Image caching for quick update and rollback • Servicing is an offline process • Dynamic allocation based on business needs • Net: High availability at lower cost
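A toy model of the differencing-VHD idea from the previous slides: the base image is immutable and cached, per-node state lives in a thin layer on top, and rollback is just discarding that layer. Dictionaries stand in for VHDs here:

```python
# Dictionaries stand in for VHD layers; purely illustrative.
base_vhd = {"os": "ServerCore", "patch_level": "2008-10"}   # read-only, cached

def boot_image(base, diff):
    image = dict(base)   # the base layer is shared and never modified
    image.update(diff)   # the differencing layer holds per-node changes
    return image

node_a = boot_image(base_vhd, {"role": "FrontEndWeb", "instance": 3})
rolled_back = boot_image(base_vhd, {})   # rollback = drop the diff layer
print(node_a, rolled_back, sep="\n")
```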
Windows Azure Compute Instance • Tech Preview offers one virtual machine type • Platform: 64-bit Windows Server 2008 • CPU: 1.5-1.7 GHz x64 equivalent • Memory: 1.7 GB • Network: 100 Mbps • Transient local storage: 250 GB • Windows Azure storage also available: 50 GB • Full service model supports more virtual machine types • Expect to see more options post-PDC
Windows Azure Virtualization • Hypervisor • Efficient: exploits the latest processor virtualization features (e.g., SLAT, large pages) • Scalable: NUMA-aware for scalability • Small: consumes few resources • Host/guest operating system • Windows Server 2008 compatible • Optimized for the virtualized environment • I/O performance shared equally between virtual machines
Second-Level Address Translation • SLAT requires less of the expensive hypervisor intervention associated with shadow page tables (SPTs) • Allows more CPU cycles to be spent on real work • Releases the memory allocated for SPTs • SLAT supports large page sizes (2 MB and 1 GB)
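A back-of-envelope model of why SLAT helps: both translation steps are walked by the hardware, so the hypervisor no longer has to intercept guest page-table updates to keep shadow tables in sync. The page-table dictionaries below are stand-ins:

```python
# Stand-in page tables; with SLAT the hardware performs both walks.
guest_pt  = {0x1000: 0x8000}    # guest virtual  -> guest physical
nested_pt = {0x8000: 0x42000}   # guest physical -> machine physical

def translate_slat(gva):
    gpa = guest_pt[gva]         # first-level walk (hardware)
    return nested_pt[gpa]       # second-level walk (hardware, no VM exit)

print(hex(translate_slat(0x1000)))   # 0x42000
```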
NUMA Support • The system is divided into small groups of processors (NUMA nodes) • Each node has dedicated memory (local) • Nodes can access memory residing in other nodes (remote), but with extra latency
NUMA Scalability • NUMA-aware for virtual machine scalability • Hypervisor schedules resources to improve performance characteristics • Assign “near” memory to virtual machine • Select “near” logical processor for virtual processor
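A minimal sketch of "near" placement, with a made-up NUMA topology: pick a node with enough local memory, then choose a logical processor from the same node:

```python
# Invented topology; illustrates "near" memory + "near" CPU selection.
numa_nodes = [
    {"id": 0, "free_mb": 512,  "cpus": [0, 1]},
    {"id": 1, "free_mb": 4096, "cpus": [2, 3]},
]

def place_vm(nodes, need_mb):
    for n in nodes:
        if n["free_mb"] >= need_mb:
            n["free_mb"] -= need_mb
            return n["id"], n["cpus"][0]   # local memory + near logical CPU
    return None                            # else fall back to remote memory

print(place_vm(numa_nodes, 1700))          # -> (1, 2)
```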
More Hypervisor Optimizations • Scheduler • Tuned for datacenter workloads (ASP.NET, etc.) • More predictability and fairness • Tolerate heavy I/O loads • Intercept reduction • Spin lock enlightenments • Reduce TLB flushes • VMBUS bandwidth improvement
Summary • Automated, reliable deployment • Streamlined and consistent • Verifiable through offline provisioning • Efficient, scalable hypervisor • Maximizing CPU cycles on customer applications • Optimized for datacenter workload • Reliable and secure virtualization • Compute instances are isolated from each other • Predictable and consistent behavior
Related Content • Related PDC sessions • A Lap Around Cloud Services • Architecting Services For The Cloud • Cloud Computing: Programming In The Cloud • Related PDC labs • Windows Azure Hands-on Labs • Windows Azure Lounge • Web site http://www.azure.com/windows
Evals & Recordings Please fill out your evaluation for this session at: This session will be available as a recording at: www.microsoftpdc.com
Q&A Please use the microphones provided
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.