Building Scalable, Global, and Highly Available Web Apps Name Title Microsoft Corporation
Agenda
• Design for High Availability
• Design for High Scalability
• Design for Performance
Assumptions
• You know the basics
  • Windows Azure Web/Worker Roles
  • SQL Database
  • Windows Azure Storage
  • Asynchronous Programming
  • Windows Azure diagnostics
• You have deployed a service to Windows Azure
• Everything can and will (eventually) break
Why do services fail? Increased workload Failure Hardware Network Platform Service Transient conditions Human Upgrades
What do we mean by available?
• Same functionality
• Degraded functionality
• Failsafe
Basics – what you get for free
• Elasticity: easily deploy compute resources and scale up and down
• Automated Service Management: Windows Azure will (automatically) recover bad nodes
• Fault Domains: Windows Azure deploys services across fault boundaries
• Storage Resilience: 3 copies of storage maintained
Fault Tolerance
• When Windows Azure breaks, it fixes itself! Can your service?
• Codifying Operations
• Upgrade Domains: configure in ServiceDefinition.csdef
    <ServiceDefinition name="RedDir" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition" upgradeDomainCount="3">
• Transient Datacenter Conditions: do you have Retry Logic?
What did you mean, retry logic?
• Transient conditions in the datacenter/network/service
• Example: SQL Azure Error 40501: The service is currently busy. Retry the request after 10 seconds.
• Transient Fault Handling Framework: http://windowsazurecat.com/2011/02/transient-fault-handling-framework/
• Retry against anything that might be external and have transient conditions*:
  • SQL Database
  • Windows Azure Storage
  • Service Bus
  • 3rd Party Services
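The retry idea on this slide can be sketched in a few lines. This is a minimal illustration, not the Transient Fault Handling Framework itself: `TransientError`, `retry`, and `flaky_query` are hypothetical names, and the backoff numbers are assumptions chosen for the example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a failure that is worth retrying."""


def retry(operation, max_attempts=4, base_delay=0.5, max_delay=10.0, sleep=time.sleep):
    """Call operation(); on TransientError, wait and try again.

    The wait doubles after each failed attempt (capped at max_delay) and is
    jittered so that many clients do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))


# Usage: a fake dependency that is "busy" twice and then succeeds.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("The service is currently busy.")
    return "rows"


# The demo passes a no-op sleep so it does not actually wait.
result = retry(flaky_query, sleep=lambda seconds: None)
```

Only errors known to be transient should be retried; a permanent failure such as a bad credential should propagate immediately rather than be hidden behind repeated attempts.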
Retry demo
Service Specific Implementations
• Does your service fail without that platform service?
• Can your service use the same platform services from another data center?
• Can your service not use that platform service temporarily?
Site Failover
• If a site-specific dependency is out, fail over to another site
• Easy: Use Traffic Manager
• Hard: Code your own
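For the "code your own" option, the core decision is a priority-ordered health check. The sketch below is one possible shape under stated assumptions: `pick_site` and the probe callback are hypothetical, the host names are borrowed from the Traffic Manager slide, and a real probe would issue a network request instead of consulting a set.

```python
def pick_site(sites, is_healthy):
    """Return the first site, in priority order, whose health probe passes."""
    for site in sites:
        try:
            if is_healthy(site):
                return site
        except OSError:
            continue  # a probe that fails at the network level counts as unhealthy
    raise RuntimeError("no healthy site available")


# Usage with a fake probe: the primary site is marked as down.
sites = ["foo-us.cloudapp.net", "foo-europe.cloudapp.net", "foo-asia.cloudapp.net"]
down = {"foo-us.cloudapp.net"}
chosen = pick_site(sites, lambda site: site not in down)
```

The hard part that this sketch leaves out is everything around the decision: how often to probe, how to avoid flapping between sites, and how to keep data in the secondary site usable.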
Site Failover demo
Upgrade Strategies: VIP Swap
[Diagram: DNS (foo.com), Load Balancer, V1 at Foo.cloudapp.net (Production), V2 at GUID.cloudapp.net (Test, Staging)]
Upgrade Strategies: Upgrade
[Diagram: DNS (foo.com), Load Balancer, Foo1.cloudapp.net (Production) with Web and Worker instances running a mix of V1 and V2]
Upgrade Strategies: New Service & Swap DNS
[Diagram: DNS (foo.com), Foo1.cloudapp.net (Production), Foo2.cloudapp.net (Production)]
What is wrong with this?
• It is better to have 50 x 1GB databases than 1 x 50GB database
[Diagram: Web Role, SQL Database, callout "Scale me out too", labels 1 and n]
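Splitting one large database into many small ones requires a stable rule for deciding which database holds a given record. A minimal sketch of hash-based routing follows; `shard_for`, the key format, and the `customers_NN` database naming are assumptions made for illustration, not something the slides prescribe.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index in [0, shard_count) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # The built-in hash() is randomized per process for strings, so a
    # cryptographic digest is used to keep the mapping stable across machines.
    return int.from_bytes(digest[:8], "big") % shard_count


def database_name(key, shard_count=50):
    """Hypothetical naming scheme: customers_00 through customers_49."""
    return f"customers_{shard_for(key, shard_count):02d}"


# Usage: every web role instance computes the same database for the same key.
name = database_name("customer-42")
```

Plain modulo routing is simple but reshuffles most keys when the shard count changes; a lookup table or consistent hashing is a common alternative when shards are added often.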
What about this?
• Everything needs to scale
[Diagram: Load Balancer, Web Role, Worker Role, Queue (Q), SQL Database, Table Storage, Blob Storage]
Synchronous Design Pattern
• Each thread dedicated to one outstanding request
• Block on each step of “the work” done for each request, then respond & repeat
This approach scales poorly
• Each outstanding request is stored on a thread stack
• Threads block even when there is work to be done
• Adding a thread enables only one additional concurrent request
[Diagram: Web App Front End, Middle Tier, SQL Azure, WA Storage. A thread takes Client Request #1 and blocks on “The Work” #1 until Response #1 arrives, then sends Client Response #1; Client Request #2 is left waiting as time passes.]
Asynchronous Design Pattern
• Each thread picks up work whenever it is ready
• A thread handling one request may handle another before the first one completes
This approach scales well
• Client requests tracked explicitly in app’s data structures
• Threads never block while there is work to be done
• Each thread can handle possibly many concurrent requests
• But bookkeeping & synchronization can be difficult…
[Diagram: Web App Front End, Middle Tier, SQL Azure, WA Storage. Threads sharing a Context interleave Client Request #1 and #2, “The Work” #1 and #2, Response #1 and #2, and Client Response #1 and #2.]
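The difference between the two patterns shows up clearly in a small experiment. The sketch below uses Python's asyncio as a stand-in for the asynchronous programming model the deck assumes; `handle_request` is hypothetical, and `asyncio.sleep` simulates a slow call to SQL Azure or storage.

```python
import asyncio
import time


async def handle_request(request_id, work_seconds):
    # "await" hands the thread back to the event loop while the work is
    # pending, so the same thread can pick up other requests in the meantime.
    await asyncio.sleep(work_seconds)
    return f"response #{request_id}"


async def main():
    started = time.perf_counter()
    # 20 requests, each waiting 0.1 s on simulated I/O, served by one thread.
    responses = await asyncio.gather(
        *(handle_request(i, 0.1) for i in range(1, 21))
    )
    return responses, time.perf_counter() - started


responses, elapsed = asyncio.run(main())
```

Handled one at a time on a blocking thread, the same 20 requests would need about 2 seconds; here the waits overlap, so the total stays close to the duration of a single request.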
What’s Windows Azure Cache?
• Use spare memory on your VMs as a high-performance cache
• Distributed cache cluster co-located with existing roles, or use dedicated roles
• Named caches with high availability option and notifications
• Supports the Memcached protocol
Why Windows Azure Cache?
• Faster
  • No external service calls (additional network hops)
  • Co-located in roles
• Cheaper
  • No external service calls (additional cost)
  • Use spare memory that you already paid for
• More reliable
  • Your service is running = cache is available
  • No throttling as in a co-tenant environment
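A common way to use such a cache is the cache-aside pattern: look in the cache first and call the backing store only on a miss. The sketch below is an assumption-laden illustration, not the Windows Azure Cache API: `CacheAside` is a hypothetical class, and an in-process dictionary stands in for the distributed cache.

```python
import time


class CacheAside:
    """Serve reads from memory and fall back to a loader on a miss or expiry."""

    def __init__(self, load, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load          # function that fetches from the real store
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]        # hit: no external service call
        value = self._load(key)    # miss or expired: go to the store
        self._entries[key] = (value, now + self._ttl)
        return value


# Usage: the loader is called once, the second read is served from memory.
store_calls = []


def load_profile(user_id):
    store_calls.append(user_id)
    return {"id": user_id}


profiles = CacheAside(load_profile, ttl_seconds=30.0)
first = profiles.get("u1")
second = profiles.get("u1")
```

The time-to-live bounds how stale a cached value can get; data that changes often needs a shorter TTL or explicit invalidation on write.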
Cache demo
Why Performance Matters
• More responsive applications
• Faster page load times: 8 seconds vs. 3 seconds?
• Higher interactivity – new type of applications
• Better user experience – more $$$
Thinking Globally
• Network latency
  • Put compute closer to user.
  • Put data closer to user.
• Global availability
  • Datacenter outages.
  • Synchronizing data.
Content Delivery Network (CDN)
• High-bandwidth global blob content delivery
• 24 locations globally (US, Europe, Asia, Australia and South America), and growing
• Same experience for users no matter how far they are from the geo-location where the storage account is hosted
• Blob service URL vs CDN URL:
  • Windows Azure Blob URL: http://images.blob.core.windows.net/
  • Windows Azure CDN URL: http://<id>.vo.msecnd.net/
  • Custom Domain Name for CDN: http://cdn.contoso.com/
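Because the blob URL and the CDN URL differ only in the host name, an application can rewrite links when it renders a page. A minimal sketch, assuming a hypothetical helper `to_cdn_url` and using the example hosts sally.blob.core.windows.net and guid01.vo.msecnd.net:

```python
from urllib.parse import urlsplit, urlunsplit


def to_cdn_url(blob_url, cdn_host):
    """Swap the blob storage host for the CDN host, keeping the path intact."""
    parts = urlsplit(blob_url)
    if not parts.netloc.endswith(".blob.core.windows.net"):
        raise ValueError(f"not a blob service URL: {blob_url}")
    return urlunsplit(
        (parts.scheme, cdn_host, parts.path, parts.query, parts.fragment)
    )


# Usage: the container and blob name are unchanged, only the host differs.
cdn_url = to_cdn_url(
    "http://sally.blob.core.windows.net/images/pic1.jpg",
    "guid01.vo.msecnd.net",
)
```

Keeping the CDN host in configuration rather than in code makes it easy to switch to a custom domain such as cdn.contoso.com later.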
Windows Azure CDN
• To Enable CDN:
  • Register for CDN via Dev Portal
  • Set container images to public
[Diagram: GET http://guid01.vo.msecnd.net/images/pic1.jpg (404); Content Delivery Network edge locations (http://guid01.vo.msecnd.net/) holding pic1.jpg with a TTL; Windows Azure Blob Service (http://sally.blob.core.windows.net/) serving http://sally.blob.core.windows.net/images/pic1.jpg]
Windows Azure Traffic Manager
• Direct users to the service in the closest region with the Windows Azure Traffic Manager
[Diagram: Traffic Manager (Policies, Monitoring) in front of foo.cloudapp.net, with foo-us.cloudapp.net, foo-europe.cloudapp.net, and foo-asia.cloudapp.net; DNS response 1.2.3.4]
Traffic Manager demo
Summary
• Windows Azure gives you high availability capabilities for free
  • Think about scaling out
  • Handle transient conditions
• Design for scalability
  • Asynchronous pattern
  • Scale out
• Design for maximum performance & reach
  • Caching, CDN, Traffic Manager, etc.