Architecting Applications for High Scalability: Leveraging the Windows Azure Platform • Scott Densmore • Sr. Software Development Engineer • Microsoft patterns & practices
About you (an assumption) • You… • are a developer • know C# • have a basic understanding of Windows Azure
Goals for this session • Learn what is available in Windows Azure to help you build scalable systems • (Re)discover helpful design patterns • Learn about practical techniques • Identify (and avoid) potential problems
DEMO TailSpin Surveys
Take the survey http://tailspindemo.cloudapp.net/survey/fabrikam/slovenia
Where should my application live? Location
Windows Azure Traffic Manager (diagram: deployments labeled with latencies of 50 ms, 100 ms and 200 ms)
Windows Azure Traffic Manager • Fault Tolerance: redirects traffic to another deployment based on availability • Round Robin: traffic routed to deployments based on a fixed ratio • Performance: directs the user to the best / closest deployment • Load balancing across multiple Hosted Services • Integrated in the Windows Azure Platform portal
Windows Azure Traffic Manager • Multiple factors determine DNS resolution • Configured by Microsoft: Geo-IP mapping, periodic performance measurement • Configured by service owner: policy (Performance, Failover, Geo, Ratio), monitoring • Currently in CTP
Windows Azure CDN • Integrated with Storage • Delivery from Windows Azure Compute instances • HTTPS support • CTP of Smooth Streaming
Managing CDN Content Expiration • Default behavior is to fetch once and cache for up to 72 hrs • Modify cache control blob header to control the TTL • x-ms-blob-cache-control: public, max-age=<value in seconds> • Think hours, days or weeks • Higher numbers reduce cost and latency via CDN & downstream caches
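The header named on this slide can be built from a TTL as in the minimal sketch below. Only the header name and the `public, max-age=<seconds>` form come from the slide; the helper name `blob_cache_control` is hypothetical, and actually sending the header to Blob Storage is left out.

```python
from datetime import timedelta


def blob_cache_control(ttl: timedelta) -> dict:
    """Return the cache-control blob header with max-age in whole seconds."""
    seconds = int(ttl.total_seconds())
    if seconds <= 0:
        raise ValueError("TTL must be positive")
    return {"x-ms-blob-cache-control": f"public, max-age={seconds}"}


# "Think hours, days or weeks": a one-week TTL.
print(blob_cache_control(timedelta(weeks=1)))
```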
Managing CDN Content Expiration • Use versioned URLs to expire content on demand • HTML served by the app references the current version: <img src="http://azXXXX.vo.msecnd.net/images/logo.2011-05-29.png"/> • The CDN and Blob Storage each hold logo.2011-05-01.png and logo.2011-05-29.png • Enables easy rollback and A/B testing
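A minimal sketch of the versioned-URL idea: the version is part of the file name, so switching the version the app emits is what "expires" the old content, and rollback is just emitting the older date again. The host is the placeholder from the slide; the function names are hypothetical.

```python
from datetime import date

CDN_BASE = "http://azXXXX.vo.msecnd.net/images"  # placeholder host from the slide


def versioned_url(name: str, ext: str, version: date) -> str:
    """Embed the version date in the blob name, e.g. logo.2011-05-29.png."""
    return f"{CDN_BASE}/{name}.{version.isoformat()}.{ext}"


def img_tag(name: str, ext: str, version: date) -> str:
    """Render the <img> element the app would serve for that version."""
    return f'<img src="{versioned_url(name, ext, version)}"/>'


current = date(2011, 5, 29)
previous = date(2011, 5, 1)
print(img_tag("logo", "png", current))   # new content
print(img_tag("logo", "png", previous))  # rollback: point at the older blob
```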
Who is using my application? Identity
Shared access signatures • Provide direct access to content • Can be time-bound or revoked on demand • Also works for write access (e.g. user-generated content)
Shared access signatures • 1. "I am Bob & I want X" • 2. Service (Hosted Compute) prepares a Shared Access Signature (SAS) to X using the securely stored storage account key • 3. Service returns SAS (signed HTTPS URL) • 4. Bob uses SAS to access X directly from Blob Storage for reduced latency & compute load • X is a non-public blob (e.g. paid or ad-funded content)
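The flow above can be illustrated with a generic time-bound, HMAC-signed URL. This is not the actual Windows Azure SAS format or its string-to-sign; the query parameter names (`expires`, `perm`, `sig`), the key, the host and all function names are assumptions made for the sketch.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

ACCOUNT_KEY = b"demo-key-kept-on-the-service"  # hypothetical secret, never sent to Bob


def _sign(path: str, expires: int, perm: str, key: bytes) -> str:
    """HMAC-SHA256 over the fields that the URL grants."""
    message = f"{perm}\n{expires}\n{path}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


def issue_signed_url(base_url, path, ttl_seconds, perm="r", key=ACCOUNT_KEY, now=None):
    """Steps 2-3: the service signs a time-bound grant and returns the URL."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    query = urlencode({"expires": expires, "perm": perm,
                       "sig": _sign(path, expires, perm, key)})
    return f"{base_url}{path}?{query}"


def verify_signed_url(url, key=ACCOUNT_KEY, now=None):
    """Step 4, storage side: accept only unexpired, untampered URLs."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        perm = params["perm"][0]
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    return hmac.compare_digest(_sign(parsed.path, expires, perm, key), sig)


url = issue_signed_url("https://storage.example.invalid", "/private/x.mp4", ttl_seconds=300)
print(url, verify_signed_url(url))
```

Because the signature covers the path, the permission and the expiry, changing any of them invalidates the URL; revoking on demand would need an additional server-side policy that this sketch does not model.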
Where is the bottleneck? Balancing load
User session • Session is not affinitized – Load Balancer • Session in Windows Azure • Session Providers • SQL Azure • Table Storage • Windows Azure AppFabric Caching • JavaScript on the client • ViewState (hidden fields)
Windows Azure AppFabric Caching • Out-of-the-box ASP.NET providers for session state & page output caching • Extremely low latency with the local cache • Local cache lets you use spare memory in your Web tier, while the Caching tier gives you a predictable distributed cache
Windows Azure AppFabric Caching • Caches any managed object (CLR objects, rows, XML, binary data…) • Only requirement is that the object should be serializable • Easily integrates into existing applications • Same managed interfaces as Windows Server AppFabric Caching • Secured by the Access Control Service
Key Caching Patterns • Reference Data • A version of the authoritative data, refreshed periodically • Large number of accesses, mostly read • Example – Product catalogs • Activity-oriented Data • Data generated as part of the app activity, typically logged back to a backend datastore • Needs read, write access • Example – Shopping cart, Session State • Resource-oriented Data • Authoritative data, modified by transactions, temporal in nature • Needs frequent read, limited write access • Example – Flight Inventory, Stock Quotes
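For the reference-data pattern (a copy of the authoritative data, refreshed periodically, mostly read), a minimal in-process sketch could look like this. It does not use the AppFabric Caching API; the class name, the loader callback and the injected clock are all assumptions for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReferenceDataCache:
    """Cache-aside with a TTL: serve cached copies, reload when they go stale."""

    def __init__(self, load: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load          # reads the authoritative store
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expiry)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]        # still fresh: no trip to the backend
        value = self._load(key)    # missing or stale: refresh from the source
        self._entries[key] = (value, now + self._ttl)
        return value


catalog = ReferenceDataCache(load=lambda sku: {"sku": sku, "price": 10}, ttl_seconds=600)
print(catalog.get("widget"))
```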
Partition the application • Multiple web sites • Choose the right number of instances and instance size • Monitor and scale your application without redeploying • Use async processing (Worker Roles)
Calculating survey results • Two approaches • Retrieve all the surveys to date at a fixed time interval, recalculate and then save the summary data over the existing data • Retrieve the survey data since the last time the task ran and update the summary results
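The two approaches can be compared in a small sketch. The shape of an answer, a `(sequence number, chosen option)` pair, and both function names are assumptions; the slide does not describe how Tailspin stores or identifies answers.

```python
from collections import Counter
from typing import Iterable, Tuple

Answer = Tuple[int, str]  # (sequence number, chosen option): hypothetical shape


def recalculate_all(answers: Iterable[Answer]) -> Counter:
    """Approach 1: read every answer to date and rebuild the summary."""
    return Counter(option for _, option in answers)


def update_incrementally(summary: Counter, last_seen: int,
                         answers: Iterable[Answer]) -> int:
    """Approach 2: fold in only answers newer than the last run.

    Returns the new high-water mark to store for the next run.
    """
    newest = last_seen
    for seq, option in answers:
        if seq > last_seen:
            summary[option] += 1
            newest = max(newest, seq)
    return newest


answers = [(1, "yes"), (2, "no"), (3, "yes")]
summary = Counter()
mark = update_incrementally(summary, 0, answers[:2])   # first run
mark = update_incrementally(summary, mark, answers)    # later run sees one new answer
print(summary == recalculate_all(answers), mark)
```

The first approach is simpler and self-correcting but rereads everything on each run; the second does less work per run but depends on tracking the high-water mark reliably.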
MapReduce algorithm • Original concepts come from the map and reduce functions used in functional languages (Haskell, F#, Erlang) • Parallelizes operations on a large dataset and speeds up processing by using multiple compute nodes • Dryad is Microsoft’s implementation
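A toy illustration of the map and reduce phases, counting survey options. Threads in one process stand in for compute nodes here, which is an assumption of the sketch; it says nothing about how Dryad or the demo distribute work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List


def map_chunk(chunk: List[str]) -> Counter:
    """Map phase: each worker summarizes its own slice independently."""
    return Counter(chunk)


def merge(left: Counter, right: Counter) -> Counter:
    """Reduce phase: combine two partial summaries."""
    return left + right


def map_reduce(items: List[str], chunk_size: int = 2, workers: int = 4) -> Counter:
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge, partials, Counter())


print(map_reduce(["yes", "no", "yes", "maybe", "yes"]))
```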
SQL Azure • Partition (or shard) your data across databases • Spreads load across multiple database instances • Avoid hitting database size limits • Parallelized queries across more nodes • Improved query performance on commodity hardware • Partitioning scheme varies per data set
SQL Azure (diagram: Hosted Compute with per-tenant databases: Tenant 1, Tenant 2, Tenant 3)
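One possible partitioning scheme, sketched under assumptions: tenants are mapped to databases by a stable hash of the tenant name. The shard names and `shard_for` are hypothetical, and since the scheme varies per data set, a lookup table or a range-based mapping may fit better in practice.

```python
import hashlib
from typing import Sequence

SHARDS = ["surveys-db-0", "surveys-db-1", "surveys-db-2"]  # hypothetical database names


def shard_for(tenant: str, shards: Sequence[str] = SHARDS) -> str:
    """Pick a database for a tenant using a hash that is stable across processes."""
    digest = hashlib.sha256(tenant.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


for tenant in ("adatum", "fabrikam", "contoso"):
    print(tenant, "->", shard_for(tenant))
```

Python's built-in `hash()` is avoided because it is randomized per process for strings. Note that plain modulo hashing moves most tenants when the number of shards changes.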
Table storage • Don’t be afraid to de-normalize data • Only two indexes in a table • Partition Key • Row Key • They are not really tables, think of them as Entity bags (key / value storage)
Paging with table storage • Use the ContinuationToken along with the Take operation in your query • The ContinuationToken only accesses the next page of data • To implement forward and back you will need a stack of ContinuationTokens
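The stack-of-tokens idea can be sketched against a stand-in table. `FakeTable` and `Pager` are hypothetical and do not use the storage client library; real continuation tokens are opaque values, modeled here as plain offsets.

```python
from typing import List, Optional, Tuple


class FakeTable:
    """Stand-in for a table query: returns one page plus a token for the next."""

    def __init__(self, rows: List[int]):
        self._rows = list(rows)

    def query(self, take: int, token: Optional[int] = None) -> Tuple[List[int], Optional[int]]:
        start = 0 if token is None else token
        end = start + take
        next_token = end if end < len(self._rows) else None
        return self._rows[start:end], next_token


class Pager:
    """Forward/back paging when tokens can only point forward."""

    def __init__(self, table: FakeTable, page_size: int):
        self._table = table
        self._size = page_size
        self._history: List[Optional[int]] = []  # tokens that opened each shown page
        self._next: Optional[int] = None

    def _load(self, token: Optional[int]) -> List[int]:
        page, self._next = self._table.query(self._size, token)
        return page

    def first(self) -> List[int]:
        self._history = [None]
        return self._load(None)

    def next(self) -> Optional[List[int]]:
        if self._next is None:
            return None                # already on the last page
        self._history.append(self._next)
        return self._load(self._next)

    def back(self) -> Optional[List[int]]:
        if len(self._history) <= 1:
            return None                # already on the first page
        self._history.pop()            # drop the current page's token
        return self._load(self._history[-1])


pager = Pager(FakeTable([1, 2, 3, 4, 5]), page_size=2)
print(pager.first(), pager.next(), pager.back())
```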
Table storage best practices • Limit large scans and expect continuation tokens for queries that scan • Entity Group Transactions: batch to reduce costs and get transaction semantics • Do not reuse DataServiceContext across multiple logical operations • Discard DataServiceContext on failures
Table storage best practices • AddObject/AttachTo can throw an exception if the entity is already being tracked • A query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
Blob storage • Blobs can be anything • Pictures, docs, etc. • HTML • XML • JSON objects
Paging with blob storage • Each item (survey answer) is stored as a blob (JSON) in a container • A blob is used to maintain a list of the items (survey answers) as they were entered, by id • Use an inverted tick count to generate the id of the answer to make it unique and ordered
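A sketch of an inverted tick count, assuming .NET-style ticks (100 ns units since 0001-01-01) subtracted from the value of `DateTime.MaxValue.Ticks` and zero-padded so that string order matches numeric order. The function names are hypothetical. Newer answers get smaller ids and therefore sort first; two answers created within the same tick would collide, so uniqueness needs an extra suffix that this sketch does not add.

```python
from datetime import datetime, timezone
from typing import Optional

MAX_TICKS = 3155378975999999999  # value of .NET DateTime.MaxValue.Ticks
_EPOCH = datetime(1, 1, 1)


def ticks(moment: datetime) -> int:
    """100-nanosecond intervals since 0001-01-01 for a naive UTC datetime."""
    delta = moment - _EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10


def inverted_tick_id(moment: Optional[datetime] = None) -> str:
    """Fixed-width id that sorts newest-first when compared as a string."""
    if moment is None:
        moment = datetime.now(timezone.utc).replace(tzinfo=None)
    return f"{MAX_TICKS - ticks(moment):019d}"


older = inverted_tick_id(datetime(2011, 5, 1))
newer = inverted_tick_id(datetime(2011, 5, 29))
print(sorted([older, newer])[0] == newer)  # the newer answer lists first
```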
Blob storage best practices • Use the parallel block upload count to reduce latency when uploading a blob • The client library uses a default timeout of 90 s – use a size-based timeout • Snapshots – for block or page reuse, issue block and page uploads in place of the UploadXXX methods in the Storage Client