
Inside Windows Azure Storage : what's new and under the hood deep dive



Presentation Transcript


  1. SAC-961T Inside Windows Azure Storage: what's new and under the hood deep dive Brad Calder General Manager Microsoft Corporation

  2. Agenda • Windows Azure Storage Today • What’s new? • Blobs, Tables and Queues features • Storage Analytics • Geo-Replication • Windows Azure Storage Internals

  3. Windows Azure Storage Today • Geographically distributed across 3 regions • Thousands of services/applications • Anywhere, anytime access to your data • Durability and scalability • 70 petabytes of raw storage today • Grows to more than 200 petabytes by the start of 2012

  4. Running on Windows Azure Storage • Facebook and Twitter near real-time search • Microsoft Zune media storage and delivery • Telemetry for Kinect • Game saves in the cloud

  5. Running on Windows Azure Storage: Bing real-time Facebook/Twitter search ingestion engine • Bing Ingestion Engine (Azure service): index Facebook/Twitter data within 15 seconds of an update • Peak of 40,000 requests/sec; 2-3 billion requests per day • Took 1 developer 2 months to design, build and release to production • Facebook/Twitter data stored into blobs • Ingestion engine processes the blobs • Annotates with auth/spam/adult scores and content classification, expands links, etc. • Uses Tables heavily for indexing • Queues to manage workflow • Results stored back into blobs • Bing takes the resulting blobs and folds them into the search index [Diagram: user postings and status updates flow into the ingestion engine's VMs, which use Windows Azure Blobs, Queues and Tables]

  6. What’s new for Blobs, Tables and Queues

  7. Windows Azure Storage • Abstractions • Blobs – File system in the cloud • Tables – Massively scalable structured storage • Queues – Reliable storage and delivery of messages • Drives – Durable NTFS volumes for Windows Azure applications • Easy client access • Easy to use REST APIs and Client Libraries • Existing NTFS APIs for Windows Azure Drives

  8. Windows Azure Storage Account • User creates a globally unique storage account name • Choose the primary location to host storage account • “North Central US”, “South Central US” • “North Europe”, “Europe West” • “South East Asia”, “East Asia”

  9. Windows Azure Data Storage Concepts • Blobs, tables and queues all live under a storage account • Blobs are grouped in containers: https://<account>.blob.core.windows.net/<container> • Entities are stored in tables: https://<account>.table.core.windows.net/<table> • Messages are stored in queues: https://<account>.queue.core.windows.net/<queue>
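The URL pattern above can be captured in a small helper. This is a sketch, not part of any Windows Azure client library: `StorageEndpoints`, `StorageService` and `ResourceUri` are hypothetical names, and the 3-24 character lowercase-letters-and-digits check reflects the storage account naming rule rather than anything stated on this slide.

```csharp
using System;
using System.Text.RegularExpressions;

public enum StorageService { Blob, Table, Queue }

// Hypothetical helper; not part of any Windows Azure client library.
public static class StorageEndpoints
{
    // Account names: 3-24 characters, lowercase letters and digits only.
    private static readonly Regex AccountName = new Regex(@"^[a-z0-9]{3,24}\z");

    // Builds https://<account>.<service>.core.windows.net/<resource>, where the
    // resource is a container (Blob), a table (Table) or a queue (Queue).
    public static Uri ResourceUri(string account, StorageService service, string resource)
    {
        if (account == null || !AccountName.IsMatch(account))
            throw new ArgumentException("invalid storage account name", nameof(account));
        string serviceName = service.ToString().ToLowerInvariant();   // "blob", "table" or "queue"
        return new Uri($"https://{account}.{serviceName}.core.windows.net/{resource}");
    }
}
```

For example, `ResourceUri("sally", StorageService.Blob, "thumbnails")` yields `https://sally.blob.core.windows.net/thumbnails`, the container that appears in the log example later in the deck.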

  10. Windows Azure Blobs • A highly scalable and durable file system in the cloud • Store files as blobs and associate metadata with them • Blobs can be up to 200 GB in size • Upload/Download Blobs • Provides continuation for large uploads • Provides range reads • Strong Consistency and Optimistic Concurrency • Conditional operations – If-Match, If-Unmodified-Since, etc. • Snapshot Blob • Create versions/backups of your blobs • Lease Blob • Exclusive write lease
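Optimistic concurrency with If-Match works as a compare-and-swap on the ETag. The sketch below models that rule in memory, assuming a hypothetical `InMemoryBlobStore` class; it does not call the Windows Azure Blob service, and returning `null` stands in for the 412 (Precondition Failed) response a conditional PUT receives when the ETag is stale.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a blob container; not the Windows Azure API.
public class InMemoryBlobStore
{
    private class Entry { public byte[] Data; public string ETag; }

    private readonly Dictionary<string, Entry> blobs = new Dictionary<string, Entry>();
    private long version;

    // Unconditional write: always succeeds and returns the blob's new ETag.
    public string Put(string name, byte[] data)
    {
        version++;
        string etag = "\"0x" + version.ToString("X") + "\"";   // quoted, as HTTP ETags are
        blobs[name] = new Entry { Data = data, ETag = etag };
        return etag;
    }

    // Conditional write, like a PUT carrying an If-Match header: succeeds only if
    // the caller's ETag is still current. Returns the new ETag, or null where the
    // service would answer 412 (Precondition Failed).
    public string PutIfMatch(string name, byte[] data, string ifMatch)
    {
        if (!blobs.TryGetValue(name, out Entry current) || current.ETag != ifMatch)
            return null;
        return Put(name, data);
    }

    // Read the data together with the ETag the caller will later send in If-Match.
    public byte[] Get(string name, out string etag)
    {
        Entry e = blobs[name];
        etag = e.ETag;
        return e.Data;
    }
}
```

A writer typically reads the blob and its ETag, modifies the data, and writes it back with If-Match; on a 412 it re-reads and retries, so concurrent writers never silently overwrite each other.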

  11. Windows Azure Blobs – What is new? • Support for efficient resume in browsers and streaming media players, which require: • Range requests of the open-ended form “Range: bytes=100-” • An “Accept-Ranges” response header • Quoted ETags
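To make the open-ended form concrete, here is a hedged sketch of how a server might interpret `Range: bytes=100-` and describe the bytes it returns. `RangeHeader` is a hypothetical helper; it handles only a single `start-` or `start-end` range and deliberately rejects suffix ranges (`bytes=-500`) and multi-range requests, which real HTTP servers may also support.

```csharp
using System;

// Hypothetical helper illustrating single-range parsing; not a library API.
public static class RangeHeader
{
    // Parses "bytes=100-199" or the open-ended "bytes=100-" into inclusive offsets
    // for a blob of blobLength bytes. Returns false for unsupported forms
    // (suffix or multiple ranges) and for unsatisfiable ranges.
    public static bool TryParse(string value, long blobLength, out long start, out long end)
    {
        start = end = -1;
        const string prefix = "bytes=";
        if (value == null || !value.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)) return false;
        string spec = value.Substring(prefix.Length);
        int dash = spec.IndexOf('-');
        if (dash <= 0 || spec.IndexOf(',') >= 0) return false;   // suffix or multi-range
        if (!long.TryParse(spec.Substring(0, dash), out start)) return false;
        string last = spec.Substring(dash + 1);
        if (last.Length == 0) end = blobLength - 1;              // "bytes=100-": to the end
        else if (!long.TryParse(last, out end)) return false;
        end = Math.Min(end, blobLength - 1);                     // clamp to the blob size
        return start <= end;
    }

    // Value for the Content-Range header of the 206 (Partial Content) response.
    public static string ContentRange(long start, long end, long total) =>
        $"bytes {start}-{end}/{total}";
}
```

For a 1,000-byte blob, `bytes=100-` resolves to offsets 100 through 999, and the partial response would carry `Content-Range: bytes 100-999/1000`. A player interrupted at byte 100 can therefore resume without knowing the blob's length in advance.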

  12. Windows Azure Tables • Scalable Structured Storage • Store Tables with billions of entities and TBs of data • Provides flexible schema (NoSQL) • Data Model • A table is a set of entities (rows) • An entity is a set of properties (columns) • Familiar and Easy to use API • OData Protocol • WCF Data Services - .NET classes and LINQ

  13. Windows Azure Tables – What is new? • Query Projection ($select) • Project only selected columns • Upsert Entity • InsertOrReplace • InsertOrMerge • Insert entity • Update entity • Merge • Replace • Delete entity • Query entity • Entity Group Transactions

  14. Windows Azure Tables - Projection

    // Assumes: using System; using System.Data.Services.Common;
    // DataServiceKey tells the WCF Data Services client which properties form the key.
    [DataServiceKey("PartitionKey", "RowKey")]
    public class Customer
    {
        public string PartitionKey { get; set; }   // Customer Name
        public string RowKey { get; set; }         // Customer Phone Number
        public DateTime CustomerSince { get; set; }
        public double TotalPurchase { get; set; }
        public string State { get; set; }
        // 100 more properties including profile picture etc.
    }

    // Partial entity defined here
    [DataServiceKey("PartitionKey", "RowKey")]
    public class CustomerDiscount
    {
        public string PartitionKey { get; set; }
        public string RowKey { get; set; }
        public double TotalPurchase { get; set; }
    }

  15. Windows Azure Tables - Projection

    // Assumes: using System.Linq; using Microsoft.WindowsAzure.StorageClient;
    // context is a TableServiceContext.
    // Select partial entities by choosing properties to be projected
    var query = (from entity in context.CreateQuery<CustomerDiscount>("Customers" /* Table Name */)
                 select new CustomerDiscount
                 {
                     PartitionKey = entity.PartitionKey,
                     RowKey = entity.RowKey,
                     TotalPurchase = entity.TotalPurchase
                 }).AsTableServiceQuery<CustomerDiscount>();

    foreach (CustomerDiscount customer in query)
    {
        // Calculate the discount to be given based on total purchases made
    }

  16. Windows Azure Tables - Upsert

    // When a user logs in from a mobile device, register the user using upsert
    Customer customer = new Customer("Thomas Anderson", "555-555-0100");
    customer.Address = "4567 Main St. Redmond 48188";
    customer.State = "Washington";

    // Note: AttachTo is called without an ETag, which indicates
    // that this is an Upsert command
    context.AttachTo("Customers" /* Table Name */, customer);
    context.UpdateObject(customer);

    // No SaveChangesOptions indicates that the MERGE verb will be used,
    // giving InsertOrMerge semantics
    context.SaveChanges();

    // Alternatively, use SaveChangesOptions.ReplaceOnUpdate for InsertOrReplace semantics.
    // But InsertOrReplace will overwrite TotalPurchase if it existed.
    // context.SaveChanges(SaveChangesOptions.ReplaceOnUpdate);
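The difference between the two upsert flavours is easiest to see at the REST level, where an entity is just the set of properties in the request. The following in-memory model is an illustration only (`UpsertModel` and the sample property values are hypothetical); it shows why InsertOrReplace loses a property such as TotalPurchase that the request did not include, while InsertOrMerge keeps it.

```csharp
using System.Collections.Generic;

// Hypothetical model of upsert semantics, keyed by (PartitionKey, RowKey).
public class UpsertModel
{
    private readonly Dictionary<(string PartitionKey, string RowKey), Dictionary<string, object>> rows =
        new Dictionary<(string PartitionKey, string RowKey), Dictionary<string, object>>();

    // InsertOrReplace: the stored entity becomes exactly the properties sent;
    // anything not sent (for example TotalPurchase) is gone afterwards.
    public void InsertOrReplace(string pk, string rk, Dictionary<string, object> props) =>
        rows[(pk, rk)] = new Dictionary<string, object>(props);

    // InsertOrMerge: insert if the entity does not exist; otherwise overwrite
    // only the properties sent and keep all the others.
    public void InsertOrMerge(string pk, string rk, Dictionary<string, object> props)
    {
        if (!rows.TryGetValue((pk, rk), out var existing))
        {
            rows[(pk, rk)] = new Dictionary<string, object>(props);
            return;
        }
        foreach (var p in props) existing[p.Key] = p.Value;
    }

    public IReadOnlyDictionary<string, object> Get(string pk, string rk) => rows[(pk, rk)];
}
```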

  17. Windows Azure Queues • Provides reliable message delivery • Programming semantics – Ensures that a message can be processed at least once • Put message into the queue • Get message makes the message invisible in queue for a specified invisibility timeout • Delete message once done processing to remove message from queue • If worker crashes, message becomes visible for another worker to process
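The Put/Get/Delete cycle and the crash case can be simulated with an explicit clock. `VisibilityQueue` is a hypothetical in-memory model, not the Windows Azure Queue API; among other simplifications, the real Delete Message call also requires the pop receipt returned by Get Message, which this sketch omits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory queue illustrating at-least-once delivery.
public class VisibilityQueue
{
    public class Message
    {
        public int Id;
        public string Content;
        public DateTime VisibleAt;   // hidden from Get until this time
        public int DequeueCount;     // how many times it has been handed out
    }

    private readonly List<Message> messages = new List<Message>();
    private int nextId;

    // Put: the new message is immediately visible.
    public void Put(string content, DateTime now) =>
        messages.Add(new Message { Id = nextId++, Content = content, VisibleAt = now });

    // Get: hand out the first visible message and hide it until now + timeout.
    public Message Get(DateTime now, TimeSpan visibilityTimeout)
    {
        Message msg = messages.FirstOrDefault(m => m.VisibleAt <= now);
        if (msg == null) return null;
        msg.VisibleAt = now + visibilityTimeout;
        msg.DequeueCount++;
        return msg;
    }

    // Delete: called by the worker only after processing has finished.
    public void Delete(int id) => messages.RemoveAll(m => m.Id == id);

    public int Count => messages.Count;
}
```

Because a message whose worker crashed is handed out again, the same work item can be processed more than once; that is what "at least once" means, and it is why queue-driven processing should be idempotent.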

  18. Windows Azure Queues – What is new? • Allow larger messages to be stored in queue • Message size has been increased to 64 KB • Allow worker to treat the invisibility timeout as a lease • Lease can be renewed on a queue message • Allow worker to update contents of queue message • Enable efficient continuation on worker failure • Schedule work at a future time • “PUT Message” takes invisibility timeout

  19. Windows Azure Queue Update Message Example • A web role puts work items into an Azure queue and worker roles process them • At 7:00 AM a worker role gets a message with a 5-minute visibility timeout, so it expires at 7:05 AM • The worker periodically stores progress information in the message content • At 7:04 AM it extends the visibility timeout by another 5 minutes, so the message now expires at 7:09 AM • If the worker fails, the message becomes visible again at 7:09 AM; another worker role gets it (new expiry 7:14 AM), retrieves the progress from the queue message and resumes
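The timeline above can be replayed with a single message whose visibility time doubles as a renewable lease. `LeasedMessage` is a hypothetical model; it assumes, as the 7:04 to 7:09 AM step suggests, that the new timeout passed to Update Message is measured from the time of the update. Creating a message with a future `visibleAt` corresponds to the new option of passing an invisibility timeout to Put Message to schedule work later.

```csharp
using System;

// Hypothetical model of one queue message; not the Windows Azure Queue API.
public class LeasedMessage
{
    public string Content { get; private set; }
    public DateTime VisibleAt { get; private set; }   // hidden from Get until this time

    // Put Message: a future visibleAt schedules the work for later.
    public LeasedMessage(string content, DateTime visibleAt)
    {
        Content = content;
        VisibleAt = visibleAt;
    }

    // Get Message: succeeds only if the message is visible, then hides it for `timeout`.
    public bool TryGet(DateTime now, TimeSpan timeout)
    {
        if (now < VisibleAt) return false;
        VisibleAt = now + timeout;
        return true;
    }

    // Update Message: store progress in the content and renew the lease by
    // resetting the visibility timeout relative to `now`.
    public void Update(string progress, DateTime now, TimeSpan timeout)
    {
        Content = progress;
        VisibleAt = now + timeout;
    }
}
```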

  20. Windows Azure Storage Analytics

  21. Storage Analytics • Goal • Enable customers to understand and debug their usage of storage • Capabilities • Logs • Enable customers to get a trace of all executed Blob, Table and Queue requests against their storage accounts • Metrics • Enable customers to get an hourly summary of key statistics about the traffic to their Blobs, Tables and Queues

  22. Storage Analytics – Why turn on logging? • Provides ability to answer commonly asked questions: • Did a specific request make it to the storage service and how long did it take? • What Client IP issued a “Delete container” request and when? • How many requests were issued by a specific client or to a specific set of objects? • list goes on and on…

  23. Storage Analytics Logs • Log records for requests are stored in Windows Azure Blobs • The Log blobs are text files with one log entry per line • Each blob can contain one to many request records • A request typically appears in the log within 15 minutes after it completes execution • Configure the logging levels separately for • Blob, Table and Queues • read (GET), write (PUT/POST/MERGE), delete (DELETE) requests or any combination • Best effort logging
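Each service's logging setting is effectively a combination of three flags. The sketch below is a hypothetical model (`LoggingOperations` and `LoggingPolicy` are illustrative names, not the service's configuration API) that classifies a request by HTTP verb using the mapping on the slide and decides whether it falls under the configured categories. Verbs the slide does not mention are treated as not logged here.

```csharp
using System;

// Hypothetical per-service logging setting (one value each for Blob, Table, Queue).
[Flags]
public enum LoggingOperations
{
    None = 0,
    Read = 1,     // GET
    Write = 2,    // PUT, POST, MERGE
    Delete = 4,   // DELETE
    All = Read | Write | Delete
}

public static class LoggingPolicy
{
    // Maps an HTTP verb to its logging category, following the slide's mapping.
    public static LoggingOperations Classify(string httpVerb)
    {
        switch (httpVerb.ToUpperInvariant())
        {
            case "GET": return LoggingOperations.Read;
            case "PUT":
            case "POST":
            case "MERGE": return LoggingOperations.Write;
            case "DELETE": return LoggingOperations.Delete;
            default: return LoggingOperations.None;   // not covered by this sketch
        }
    }

    // True if a request with this verb falls under the enabled categories.
    public static bool ShouldLog(LoggingOperations enabled, string httpVerb)
    {
        LoggingOperations category = Classify(httpVerb);
        return category != LoggingOperations.None && (enabled & category) == category;
    }
}
```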

  24. Storage Analytics Data Fields Logged • The following are some of the fields logged for each record: Request Status, HTTP Status Code, Client IP, User Agent, Referrer, Client Request ID, ETag, LMT (Last Modified Time), Log Version, Accessing Account, Owner Account, Service Type, Request URL, Object Key, Request ID, Operation Number, Request Version, Operation Type, Start Time, Application End to End Latency, Storage Server Latency, Authentication Type, Request Packet Size, Request Header Size, Response Packet Size, Response Header Size, Request MD5, Server MD5, Conditions Used

  25. Log Entry Example • Log entry in blob:
1.0;2011-07-28T18:02:40.6271789Z;PutBlob;Success;201;28;21;authenticated;sally;sally;blob;"http://sally.blob.core.windows.net/thumbnails/lake.jpg?timeout=30000";"/sally/thumbnails/lake.jpg";fb658ee6-6123-41f5-81e2-4bfdc178fea3;0;201.9.10.20;2009-09-19;438;100;223;0;100;;"66CbMXKirxDeTr82SXBKbg==";"0x8CE1B67AD25AA05";Thursday, 28-Jul-11 18:02:40 GMT;;;;"req12345"
• Log Version: 1.0 • Start Time: 2011-07-28T18:02:40.6271789Z • Operation Type: PutBlob • Status: Success • HTTP Status Code: 201 • Application E2E Latency (milliseconds): 28 • Storage Server Latency (milliseconds): 21 • Accessing Account: sally • Owner Account: sally • Service Type: blob • Request URL: PUT http://sally.blob.core.windows.net/thumbnails/lake.jpg • Object Key: /sally/thumbnails/lake.jpg • Request ID: fb658ee6-6123-41f5-81e2-4bfdc178fea3 • Operation Number: 0 • Request Version: 2009-09-19 • Client IP: 201.9.10.20 • Client Request ID: req12345
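Since each log entry is a single semicolon-delimited line, a small parser is enough to pull out fields such as the operation type or the two latencies. This sketch assumes the version 1.0 layout shown above, where fields that can contain arbitrary text (request URL, object key, MD5, ETag, client request ID) are wrapped in double quotes; it does not handle escaped quotes inside a quoted field. `LogEntryParser` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical parser for one Storage Analytics log line (format version 1.0 assumed).
public static class LogEntryParser
{
    // The sample entry from the slide, verbatim.
    public const string SampleLine =
        "1.0;2011-07-28T18:02:40.6271789Z;PutBlob;Success;201;28;21;authenticated;sally;sally;blob;" +
        "\"http://sally.blob.core.windows.net/thumbnails/lake.jpg?timeout=30000\";\"/sally/thumbnails/lake.jpg\";" +
        "fb658ee6-6123-41f5-81e2-4bfdc178fea3;0;201.9.10.20;2009-09-19;438;100;223;0;100;;" +
        "\"66CbMXKirxDeTr82SXBKbg==\";\"0x8CE1B67AD25AA05\";Thursday, 28-Jul-11 18:02:40 GMT;;;;\"req12345\"";

    // Splits on ';' except inside double quotes; the quotes themselves are dropped.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        foreach (char c in line)
        {
            if (c == '"') inQuotes = !inQuotes;      // enter or leave a quoted field
            else if (c == ';' && !inQuotes)
            {
                fields.Add(current.ToString());      // end of a field (may be empty)
                current.Clear();
            }
            else current.Append(c);
        }
        fields.Add(current.ToString());              // last field has no trailing ';'
        return fields;
    }
}
```

With the sample entry, field 2 (zero-based) is `PutBlob`, fields 5 and 6 are the E2E and server latencies (28 and 21 ms), and field 29 is the client request ID `req12345`.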

  26. Storage Analytics – Why turn on Metrics? • Provides ability to answer commonly asked questions: • How many transactions did my service issue per hour over the past week? • How many anonymous Get Blob requests were issued to my storage account? • My application is not performing as expected, what is the availability and performance of storage for a given time period? • list goes on and on…

  27. Storage Analytics – Metrics • Transaction metrics are aggregated for every 1-hour interval and stored in Windows Azure Tables • Example metrics • Total Transactions • Availability • % Success, % Network Errors, % Timeout, % Throttled, etc. • Average Latency (Application E2E and Storage Server latency) • Total Ingress • Total Egress • Blob, Table or Queue summary and per-REST-API metrics • Capacity metrics are provided only for Blobs at this time • Updated once a day • Capacity and # of objects

  28. Storage Analytics – Example using Metrics • A client application running in the cloud started experiencing slow table access • Compare Application E2E latency with Storage Server latency • Application E2E latency: time for the input to be transferred to the storage service, plus time for the storage service to process the request and compute the result, plus time taken for the application to retrieve the result • Storage Server latency: from when the request arrives at the storage service until it is done processing

  29. Compare Application E2E Latency to Storage Server Latency

  30. Compare Application E2E Latency to Storage Server Latency [Chart: latency (ms) and total transactions]

  31. Root Causing the Issue • They then looked at their application performance counters and profiling to find • High CPU utilization • High Memory usage • Frequent Garbage Collection cycles • Reason for the difference between E2E latency and Server latency • Took a long time for application to retrieve the results of a query • Their resolution was to: • Increase number of VM instances • Move to larger VM instances • Move to Server GC
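The diagnosis above rests on one subtraction: Application E2E latency minus Storage Server latency is the time spent outside the storage service, in the network and in the client. A minimal sketch, assuming hourly averages have already been read out of the metrics tables into a hypothetical `HourlyLatency` shape (the real table schema is not modeled):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for one hour of averaged latency metrics.
public class HourlyLatency
{
    public DateTime Hour;
    public double AvgE2ELatencyMs;      // Application E2E latency
    public double AvgServerLatencyMs;   // Storage Server latency
}

public static class LatencyAnalysis
{
    // Time spent outside the storage service: request transfer plus the
    // application retrieving the result.
    public static double OutsideServerMs(double e2eMs, double serverMs) => e2eMs - serverMs;

    // Hours whose gap exceeds the threshold point at the client or network
    // rather than at the storage service.
    public static List<DateTime> SuspectHours(IEnumerable<HourlyLatency> hours, double thresholdMs) =>
        hours.Where(h => OutsideServerMs(h.AvgE2ELatencyMs, h.AvgServerLatencyMs) > thresholdMs)
             .Select(h => h.Hour)
             .ToList();
}
```

In the log example earlier, 28 ms E2E against 21 ms on the server leaves 7 ms outside the service; a gap that grows to hundreds of milliseconds while server latency stays flat points at the client, as it did here with CPU, memory and GC pressure.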

  32. Storage Analytics Summary • Separate Namespace • Logs • Stored as blobs in a separate Blob container in the storage account being monitored • http://account.blob.core.windows.net/$logs/ • Metrics • Stored as entities in separate metrics tables in the storage account being monitored • http://account.table.core.windows.net/$Metrics* • Isolation • $logs and $Metrics have separate resource limits and throttling from the rest of the storage account traffic • Cost • Capacity to keep the data • Transactions for generating & accessing analytics data • A retention policy, specified in days, can be set on both logs and metrics • Deleting data via the retention policy does not incur transaction cost

  33. announcing Geo-replication

  34. Geo-replication • Data is geo-replicated across data centers hundreds of miles apart • Turned on right now for Blob and Table data (Queues will follow in CY12) • Provides data durability in the face of major data center disasters • Data is only geo-replicated within a region • User chooses the primary location during account creation • The other location in the region is the secondary location • Asynchronous geo-replication • Off the critical path of live requests • Location pairs: North Central US and South Central US; North Europe and Europe West; East Asia and South East Asia

  35. Geo-replication • Is there a cost for geo-replication? • Geo-replication included in current price of Storage • Geo-replication is on by default for all storage accounts • Can turn off for whole storage account • Though no price savings if you turn it off • To disable (turn off) geo-replication contact Microsoft Windows Azure Support • But note, if you turn geo-rep off and then back on • Data transfer egress rates apply to re-bootstrap the data from primary to secondary data center. No additional charge after the re-bootstrap is done.

  36. Geo-Failover • Existing URL works after failover: Azure DNS is updated so that http://account.blob.core.windows.net/ resolves to the secondary location (in the diagram, South Central US fails over to North Central US) • Failover trigger – failover would only be used if the primary could not be recovered • Asynchronous geo-replication – may lose recent updates during failover • Data is typically geo-replicated within minutes, though there is no SLA guarantee
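Two properties of geo-replication matter to an application: writes are acknowledged before they reach the secondary, and failover changes DNS rather than the account URL. A toy model makes both visible. `GeoReplicatedAccount` is hypothetical and says nothing about how the service is built; the location pairs come from the geo-replication slide, and South Central US failing over to North Central US mirrors the failover diagram.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of asynchronous geo-replication plus DNS-based failover.
public class GeoReplicatedAccount
{
    // Location pairs within a region, as listed on the geo-replication slide.
    private static readonly Dictionary<string, string> Pairs = new Dictionary<string, string>
    {
        ["North Central US"] = "South Central US", ["South Central US"] = "North Central US",
        ["North Europe"] = "Europe West",          ["Europe West"] = "North Europe",
        ["East Asia"] = "South East Asia",         ["South East Asia"] = "East Asia",
    };

    private readonly List<string> primaryUpdates = new List<string>();
    private readonly List<string> secondaryUpdates = new List<string>();

    public string Secondary { get; }
    public string DnsTarget { get; private set; }   // where the unchanged account URL resolves

    public GeoReplicatedAccount(string primary)
    {
        DnsTarget = primary;
        Secondary = Pairs[primary];
    }

    // A write is acknowledged once the primary has it; geo-replication happens later.
    public void Write(string update) => primaryUpdates.Add(update);

    // Background geo-replication: ship up to maxUpdates pending updates, in order.
    public void Replicate(int maxUpdates)
    {
        int n = Math.Min(maxUpdates, primaryUpdates.Count - secondaryUpdates.Count);
        secondaryUpdates.AddRange(primaryUpdates.GetRange(secondaryUpdates.Count, n));
    }

    // Failover: repoint DNS at the secondary. Returns how many acknowledged
    // updates had not been replicated yet and are therefore lost.
    public int Failover()
    {
        int lost = primaryUpdates.Count - secondaryUpdates.Count;
        DnsTarget = Secondary;
        return lost;
    }

    public IReadOnlyList<string> SecondaryUpdates => secondaryUpdates;
}
```

Updates that were acknowledged but not yet shipped when failover happens are exactly the "recent updates" the slide warns may be lost.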

  37. Windows Azure Storage Internals

  38. Design Goals • Highly Available Storage with Strong Consistency • Provide access to data in face of hardware failures • Durability • Replicate data several times within and across data centers • Scalability • Need to scale to exabytes and beyond • Automatically load balance data to meet peak traffic demands • Provide a global namespace to access data around the world • Additional details can be found in an upcoming paper: • “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011

  39. Windows Azure Storage Stamps • Access blob storage via the URL: http://<account>.blob.core.windows.net/ [Diagram: the Storage Location Service and two storage stamps; data access reaches each stamp through its load balancer (LB); each stamp has Front-Ends, a Partition Layer and a DFS Layer; intra-stamp replication within each DFS Layer and inter-stamp (geo) replication between the stamps]

  40. Storage Stamp Architecture – DFS Layer • All data from the Partition Layer is stored in files (extents) in the DFS layer • Each extent is replicated 3 times across different fault and upgrade domains • Checksum all stored data • Verified on every client read • Scrubbed every few days • Re-replicate on disk/node/rack failure or checksum mismatch • Load balancing • 3 replicas are randomly allocated across a candidate set of servers based on available resources • Any of the 3 replicas can be read, and reads are load balanced across them • Use a journal drive to keep write latencies low [Diagram: Distributed File System (DFS) Layer with Paxos-replicated masters (M) and DFS Servers]
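Two of the DFS mechanisms above lend themselves to a short sketch: choosing three replica locations that share neither a fault domain nor an upgrade domain, and verifying a stored checksum on read. Everything here is hypothetical (`DfsServer`, `Extents`, the greedy selection and the use of SHA-256); the slides do not specify the allocation algorithm or the checksum function.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical description of a DFS server's placement-relevant attributes.
public class DfsServer
{
    public string Name;
    public int FaultDomain;
    public int UpgradeDomain;
    public long FreeBytes;
}

public static class Extents
{
    // Random allocation among servers with room for the extent, never placing two
    // replicas in the same fault domain or the same upgrade domain (greedy pass).
    public static List<DfsServer> ChooseReplicas(IEnumerable<DfsServer> candidates, long extentBytes,
                                                 Random rng, int replicas = 3)
    {
        var chosen = new List<DfsServer>();
        foreach (var s in candidates.Where(c => c.FreeBytes >= extentBytes).OrderBy(_ => rng.Next()))
        {
            if (chosen.Any(c => c.FaultDomain == s.FaultDomain || c.UpgradeDomain == s.UpgradeDomain))
                continue;
            chosen.Add(s);
            if (chosen.Count == replicas) return chosen;
        }
        throw new InvalidOperationException("not enough independent servers for the requested replicas");
    }

    // Checksum stored alongside the data (SHA-256 chosen for this sketch only).
    public static byte[] Checksum(byte[] data)
    {
        using (var sha = SHA256.Create()) return sha.ComputeHash(data);
    }

    // Verified on every read: a mismatch means this replica is corrupt, so the read
    // should be served from another replica and the extent re-replicated.
    public static bool VerifyOnRead(byte[] data, byte[] storedChecksum) =>
        Checksum(data).SequenceEqual(storedChecksum);
}
```

The greedy pass can fail on sparse clusters even when a valid placement exists; a production allocator would search more carefully and weigh more resources than free space.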

  41. Storage Stamp Architecture – Partition Layer • Provide transaction semantics and strong consistency for high level data abstractions • Stores and reads the objects to/from extents in the DFS layer • Provides inter-stamp (geo) replication by shipping logs to other stamps • Scalable object index via partitioning [Diagram: Partition Layer with a Partition Master, Lock Service and Partition Servers, above the DFS Layer with Paxos-replicated masters (M) and DFS Servers]

  42. Storage Stamp Architecture – Front End Layer • Stateless servers • Authentication + authorization • Request routing [Diagram: Front End Layer (FE servers) above the Partition Layer (Partition Master, Lock Service, Partition Servers) and the DFS Layer (Paxos-replicated masters M, DFS Servers)]

  43. Storage Stamp Architecture [Diagram: an incoming write request flows through the Front End Layer, the Partition Layer (Partition Master, Lock Service, Partition Servers) and the DFS Layer (Paxos-replicated masters M, DFS Servers), and an Ack is returned]

  44. Partition Layer – Scalable Object Index • 100s of Billions of blobs, entities, messages across all accounts can be stored in a stamp • Need to efficiently enumerate, query, get, and update them • Traffic pattern can be highly dynamic • Hot objects, peak load, traffic bursts, etc • Need a scalable index for the objects that can • Spread the index across 100s of servers • Dynamically load balance • Dynamically change what servers are serving each part of the index based on load

  45. Scalable Object Index via Partitioning • Partition Layer maintains an internal Object Index Table for each data abstraction • Blob Index: contains all blob objects for all accounts in a stamp • Entity Index: contains all entities for all accounts in a stamp • Message Index: contains all messages for all accounts in a stamp • Scalability is provided for each Object Index • Monitor load to each part of the index to determine hot spots • Index is dynamically split into thousands of Index RangePartitions based on load • Index RangePartitions are automatically load balanced across servers to quickly adapt to changes in load

  46. Partition Layer – Index Range Partitioning • Split index into RangePartitions based on load • Can only split at PartitionKey boundaries • PartitionMap tracks Index RangePartition assignment to partition servers • Front-End caches the PartitionMap to route user requests • Each part of the index is assigned to only one Partition Server at a time [Diagram: in a storage stamp, the Partition Master splits the Blob Index into A-H (PS1), H'-R (PS2) and R'-Z (PS3); the Front-End Server caches this Partition Map and routes requests to Partition Servers PS1, PS2 and PS3]
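A Front-End's cached PartitionMap behaves like a sorted list of key ranges with one owner each, so routing is a binary search. In this hypothetical sketch each RangePartition is stored by its inclusive high key, which matches the slide's A-H / H'-R / R'-Z notation (H' being the first key after H); a single string stands in for the real index key, and `Split` assumes the caller passes a PartitionKey boundary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical PartitionMap; keys are compared ordinally.
public class PartitionMap
{
    // RangePartitions in key order, each stored by its inclusive high key; the last
    // one has HighKey == null, meaning "to the end of the key space".
    private readonly List<(string HighKey, string Server)> ranges;

    public PartitionMap(IEnumerable<(string HighKey, string Server)> orderedRanges)
    {
        ranges = new List<(string HighKey, string Server)>(orderedRanges);
        if (ranges.Count == 0 || ranges[ranges.Count - 1].HighKey != null)
            throw new ArgumentException("the last range must be open-ended (HighKey == null)");
    }

    private static bool AtOrBelow(string key, string highKey) =>
        highKey == null || string.CompareOrdinal(key, highKey) <= 0;

    // Binary search for the first range whose high key is >= key.
    private int IndexOf(string key)
    {
        int lo = 0, hi = ranges.Count - 1;
        while (lo < hi)
        {
            int mid = (lo + hi) / 2;
            if (AtOrBelow(key, ranges[mid].HighKey)) hi = mid;
            else lo = mid + 1;
        }
        return lo;
    }

    // Front-End routing: exactly one Partition Server serves each key.
    public string Route(string key) => ranges[IndexOf(key)].Server;

    // Split the RangePartition containing splitKey: keys <= splitKey stay on the
    // current server and the rest of the range moves to newServer.
    public void Split(string splitKey, string newServer)
    {
        int i = IndexOf(splitKey);
        var (high, server) = ranges[i];
        if (high != null && string.CompareOrdinal(splitKey, high) == 0) return;  // already a boundary
        ranges[i] = (splitKey, server);
        ranges.Insert(i + 1, (high, newServer));
    }
}
```

After `Split("M", "PS4")`, keys up to M stay on PS2 and keys after M up to R are served by PS4. In this model only the map changes; no data is touched.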

  47. Partition Layer – Automatic RangePartition Load Balancing • Load balancing is triggered based on hot RangePartitions or Partition Servers • No data is moved on disk for the reassignment • Only the index assignment for the Partition Servers changes [Diagram: Front-Ends FE 1-3 behind a VIP; based on server load, the Partition Master unassigns a RangePartition from one of Partition Servers 1-4 and reassigns it to another, while the data stays in the DFS Layer]
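One load-balancing step can be sketched as: find the busiest Partition Server, and reassign its hottest RangePartition to the least loaded server, editing only the assignment. `RangePartitionBalancer` and its single-step policy are assumptions for illustration; the slides do not describe the Partition Master's actual heuristics.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical single load-balancing step of a Partition Master.
public static class RangePartitionBalancer
{
    // assignment: RangePartition -> Partition Server (the PartitionMap).
    // load: RangePartition -> recently observed load.
    // Returns the RangePartition that was reassigned, or null if no move helps.
    public static string RebalanceOnce(Dictionary<string, string> assignment,
                                       IReadOnlyDictionary<string, double> load,
                                       IEnumerable<string> servers)
    {
        // Aggregate RangePartition load per server (servers with nothing assigned get 0).
        var serverLoad = servers.ToDictionary(s => s, s => 0.0);
        foreach (var kv in assignment) serverLoad[kv.Value] += load[kv.Key];

        string hottest = serverLoad.OrderByDescending(kv => kv.Value).First().Key;
        string coolest = serverLoad.OrderBy(kv => kv.Value).First().Key;
        if (hottest == coolest) return null;

        // Hottest RangePartition on the hottest server.
        string candidate = assignment.Where(kv => kv.Value == hottest)
                                     .OrderByDescending(kv => load[kv.Key])
                                     .Select(kv => kv.Key)
                                     .FirstOrDefault();

        // Skip moves that would merely shift the hot spot to another server.
        if (candidate == null || serverLoad[coolest] + load[candidate] >= serverLoad[hottest])
            return null;

        // Unassign from the hottest server and reassign to the coolest one.
        // Only the map entry changes; the data stays where it is in the DFS layer.
        assignment[candidate] = coolest;
        return candidate;
    }
}
```

The guard refuses a move that would just make the destination the new hot spot, which happens when one RangePartition carries most of a server's load; splitting that RangePartition first, as on the previous slide, is the natural next step.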

  48. Scalability of Data Abstractions • Namespace for accessing storage • http://<accountName>.<type>.core.windows.net/partitionName • How to scale out storage for your service • Understand the scalability targets at 2 levels: • Scalability targets of a single storage account • Scalability targets for Blobs, Table Entities and Queues within a storage account

  49. Scalability of Storage Accounts • Namespace for accessing storage • http://<accountName>.<type>.core.windows.net/partitionName • How to scale out storage for your service • Understand the scalability targets at 2 levels: • Scalability targets of a single storage account • Account Scalability Targets • Capacity – Up to 100 TBs • Transactions – Up to 5000 entities per second • Bandwidth – Up to 3 gigabits per second • Partition data across storage accounts to go beyond these targets

  50. Scalability of Objects within an Account • Namespace for accessing storage • http://<accountName>.<type>.core.windows.net/partitionName • How to scale out storage for your service • Understand the scalability targets at 2 levels: • Scalability targets for Blobs, Table Entities and Queues within a storage account • Single Blob – up to 60MBytes per second • Single PartitionKey in a Table – up to 500 entities per second • Single Queue - up to 500 messages per second
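The targets above turn into a quick sizing calculation: divide the required throughput or capacity by the per-unit target and round up. A sketch using the figures quoted on these slides (the class and method names are hypothetical, and the targets are the values shown here, which may change over time):

```csharp
using System;

// Hypothetical back-of-the-envelope planner based on the quoted scalability targets.
public static class CapacityPlanner
{
    public const double AccountCapacityTB = 100;
    public const double AccountEntitiesPerSec = 5000;
    public const double AccountGbitPerSec = 3;
    public const double PartitionKeyEntitiesPerSec = 500;
    public const double QueueMessagesPerSec = 500;

    // Smallest whole number of units that covers `need` at `perUnit` each.
    private static int CeilDiv(double need, double perUnit) => (int)Math.Ceiling(need / perUnit);

    // Accounts needed: whichever account-level target binds first (at least one).
    public static int AccountsNeeded(double capacityTB, double entitiesPerSec, double gbitPerSec) =>
        Math.Max(1, Math.Max(CeilDiv(capacityTB, AccountCapacityTB),
                    Math.Max(CeilDiv(entitiesPerSec, AccountEntitiesPerSec),
                             CeilDiv(gbitPerSec, AccountGbitPerSec))));

    // Distinct PartitionKeys a table's traffic must be spread across.
    public static int PartitionKeysNeeded(double entitiesPerSec) =>
        Math.Max(1, CeilDiv(entitiesPerSec, PartitionKeyEntitiesPerSec));

    // Queues needed to carry a given message rate.
    public static int QueuesNeeded(double messagesPerSec) =>
        Math.Max(1, CeilDiv(messagesPerSec, QueueMessagesPerSec));
}
```

For example, 250 TB, 20,000 entities per second and 7 Gbit/s needs at least 4 storage accounts (transactions are the binding constraint), and 20,000 entities per second in one table must be spread over at least 40 PartitionKeys.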
